last  page PLNT4610/PLNT7690 Bioinformatics
Lecture 12, part 1 of 3
next page

December 1, 2009

GENE ARRAYS



GENE ARRAY LINKS
Bibliography on Microarray Data Analysis [http://www.nslij-genetics.org/microarray/]  


A. Gene Array Technology

1. How do gene arrays work?
2. Types of experiments
3. Types of data
4. What are we trying to learn from gene arrays?

B. Experimental design and normalization

1. Sources of experimental variation
2. Normalization

C. Grouping genes with similar expression patterns

1. Cluster analysis
2. Self-organizing maps


A. Gene Array Technology

It has become common in many model systems to sequence large numbers of cDNAs from an organism. Craig Venter at the NIH realized that single sequencing reactions on clones from a cDNA library provide a rapid means of identifying the different classes of genes present in an mRNA population. Although sequence data from a single reaction are likely to contain errors, the error rate of automated sequencing methods is now far less than one error per hundred bases, which is more than good enough to identify a sequence from several hundred bases of sequence.

Sequences derived from one-pass sequencing of libraries are referred to as Expressed Sequence Tags, or ESTs.

The existence of large sets of ESTs opens the door for studying gene expression on a large scale.

Animation: Microarray Tutorial at University Health Network Microarray Centre, Toronto.
http://www.microarrays.ca/info/tutorials.html

1. How do gene arrays work?

Gene array experiments are sometimes referred to as "reverse Northerns". In Northern blots, RNA is blotted onto a filter and hybridized with a probe to detect a particular species of mRNA as a distinct band or spot. In gene array hybridization, cDNAs are spotted onto a filter or slide and hybridized with a probe made from an mRNA population. Usually, probes are made by reverse-transcribing mRNA into single-stranded cDNA in the presence of labeled nucleotides. The labeled probe, therefore, is a population of cDNA molecules representing the original mRNA population. Probes are hybridized with filters containing cDNAs spotted in a 2-dimensional array. The amount of hybridization to a given clone represents the amount of mRNA present for the corresponding gene.

Gene array technology:   measures mRNA levels for thousands of genes in

Gene arrays consist of hundreds or thousands of cDNAs spotted onto microscope slides (microarray) or nylon filters (macroarray). cDNAs are chosen from EST collections, so the sequences, and usually the identities, of the genes in the array are known.

In a gene array experiment, an mRNA population is isolated from cells. The population is labeled by synthesizing cDNAs complementary to the mRNAs, using reverse transcriptase and labeled nucleotides. The resulting cDNA population is then hybridized to the array.


gene x - strongly expressed; high abundance transcript

gene y - moderately expressed; medium abundance transcript

gene z - weakly expressed; low abundance transcript

Each transcript base pairs with the complementary DNA for its corresponding gene on the array.

Signal strength is proportional to the abundance of each mRNA


WARNING! Each one of these steps contributes to experimental variation.


a. Gene arrays

Each gene on an array is represented as either


b. cDNA probes
Gene array experiments typically attempt to compare gene expression levels in different tissues or conditions, or at different times after a treatment. RNA is extracted from each tissue, condition, or treatment, and RNA samples are diluted so that each sample contains the same concentration of RNA. To create a single-stranded probe, RNA is added to a reaction mix containing oligo dT primers (which can base pair with the polyA tail on mRNA), reverse transcriptase (RNA-dependent DNA polymerase), and labeled nucleotides. Commonly, nucleotides are tagged either with fluorescent labels such as Cy3 and Cy5, or with digoxigenin (DIG), which can be detected by chemiluminescence. In principle, for every mRNA molecule in the original RNA population, a single-stranded labeled cDNA will be produced, complementary to the mRNA. The higher the concentration of a particular mRNA, the more of the corresponding cDNA will be present.

c. Hybridization and washing

Incorporation of label into each probe is quantified, and probes are diluted so that all are at an equal concentration. Usually, a duplicate filter or microarray is prepared for each probe to be assayed. Probes are hybridized separately with each array. Filter arrays are incubated with probe and washed in much the same way as is done for Southern or Northern blotting. For glass microarrays, hybridization is done under a coverslip, and slides are washed by dipping into wash solutions. Commercially-produced arrays come in cassettes, in which hybridization, washing, and detection are done.
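The step of diluting probes to an equal concentration can be sketched as a small calculation based on C1 x V1 = C2 x V2. This is only an illustration: the function name, the probe names, and the measured values are hypothetical, and diluting every probe down to the lowest measured concentration is an assumption, not a stated protocol.

```python
def dilution_plan(concentrations, final_volume):
    """Volumes of probe and diluent that bring every probe to the lowest concentration.

    concentrations: dict of probe name -> measured labeled-probe concentration.
    final_volume: volume of diluted probe wanted for each hybridization.
    Units are arbitrary but must be consistent across probes.
    """
    target = min(concentrations.values())
    if target <= 0:
        raise ValueError("concentrations must be positive")
    plan = {}
    for name, conc in concentrations.items():
        # From C1 * V1 = C2 * V2: V1 = V2 * C2 / C1
        probe_volume = final_volume * target / conc
        plan[name] = (probe_volume, final_volume - probe_volume)
    return plan

# Made-up label incorporation measurements for three probes.
measured = {"control": 50.0, "treated": 80.0, "treated_late": 40.0}
for name, (probe, diluent) in dilution_plan(measured, final_volume=100.0).items():
    print(name, probe, diluent)
```

The most dilute probe is used undiluted, and every other probe receives proportionally more diluent.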

d. Data acquisition
Hybridized probe is detected by fluorescence in a slide reader using confocal laser microscopy. The raw intensity of each spot is measured by a CCD camera, and the data are acquired as a TIFF image.
 

2. Types of experiments

Single label experiment

The simplest type of gene array experiment is the single label experiment. Duplicate arrays are hybridized with probes made using a single label. To allow comparison between treatments, controls must be included in the probes and on the arrays to act as hybridization standards.

From Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc. Natl. Acad. Sci. USA 93(20): 10614-10619.

Expression of human genes was measured in RNA populations from cells grown at 37°C (-Heat shock) or 43°C (+Heat shock). White boxes: genes whose expression changes with heat shock. Red boxes: genes activated by heat shock. Green boxes: genes suppressed by heat shock.
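One way to picture the role of hybridization standards in a single label experiment: if the same control spots are present on every array, each array can be rescaled so that its controls agree with those on the other arrays. The following is a minimal sketch, not the method used in the study cited above. The function name, the spot names, the intensities, and the choice of the mean control intensity as the scaling reference are all illustrative assumptions.

```python
from statistics import mean

def scale_to_controls(array, control_ids, target=1.0):
    """Rescale one array so that its control spots average `target`.

    array: dict mapping spot name -> raw intensity (hypothetical layout).
    control_ids: names of the spots that act as hybridization standards.
    """
    control_mean = mean(array[c] for c in control_ids)
    if control_mean <= 0:
        raise ValueError("control spots must have a positive mean intensity")
    return {spot: value / control_mean * target for spot, value in array.items()}

# Made-up intensities for two arrays; ctrl1 and ctrl2 are the standards.
minus_heat = {"ctrl1": 100.0, "ctrl2": 300.0, "geneA": 400.0}
plus_heat = {"ctrl1": 200.0, "ctrl2": 600.0, "geneA": 2400.0}

controls = ["ctrl1", "ctrl2"]
norm_minus = scale_to_controls(minus_heat, controls)
norm_plus = scale_to_controls(plus_heat, controls)

# After scaling, geneA can be compared between the two arrays.
print(norm_plus["geneA"] / norm_minus["geneA"])
```

In this made-up case the raw intensities suggest a 6-fold change in geneA, but the second array hybridized twice as strongly overall, so the scaled values indicate a 3-fold change.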

Double label experiments

Another approach to comparing expression between two conditions is the double label experiment. For example, in work from Patrick Brown's lab at Stanford, cDNA probes were made from yeast cells grown in the presence of either galactose or glucose. To distinguish between signals from the two probes, a different fluorescently tagged nucleotide, either Cy3 or Cy5, was added during reverse transcription of each RNA population. Cy3 has emission maxima at 565 and 615 nm, while Cy5 has an emission peak at 670 nm. Replicate experiments were done in which the dyes were switched. By scanning the arrays twice, once for Cy3 and once for Cy5, a composite image can be generated in which the ratio of the two dyes, and hence the ratio of transcripts in the two growth conditions, can be measured. In pseudocolor images, spots in the array representing genes that are more strongly expressed in the presence of galactose are shown in green, and spots representing genes more strongly expressed in the presence of glucose are shown in red.

http://www.pnas.org/cgi/content/full/94/24/13057/F1
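The ratio measurement and the dye-swap replicate described above can be sketched as follows. This is an illustration only: the intensities are made up, the function names are hypothetical, and expressing ratios as log2 values averaged over the two dye orientations is a common convention assumed here, not something stated in the cited work.

```python
from math import log2

def log_ratio(galactose, glucose):
    """log2(galactose / glucose) for one spot; positive means higher in galactose."""
    return log2(galactose / glucose)

def dye_swap_average(forward, swapped):
    """Average the log2 ratio for one spot over both dye orientations.

    forward: (Cy3, Cy5) intensities with galactose labeled Cy3 and glucose Cy5.
    swapped: (Cy3, Cy5) intensities with glucose labeled Cy3 and galactose Cy5.
    """
    fwd = log_ratio(galactose=forward[0], glucose=forward[1])
    swp = log_ratio(galactose=swapped[1], glucose=swapped[0])
    return (fwd + swp) / 2

# Made-up intensities for one spot, as (Cy3, Cy5) pairs.
forward = (800.0, 100.0)   # galactose in Cy3, glucose in Cy5
swapped = (200.0, 400.0)   # glucose in Cy3, galactose in Cy5

print(dye_swap_average(forward, swapped))
```

Averaging in log space means that a bias favoring one dye raises the ratio in one orientation and lowers it by the same amount in the other, so it tends to cancel.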


3. Types of data

Gene array studies tend to generate two different types of data. Studies in which two or more conditions are compared at a time generate discrete state data. Often it is critical to follow the expression of a gene over time after a treatment. In timecourse experiments, the expression of each gene in response to two or more treatments is measured over time. For example, in the timecourse at right, the solid blue and red dashed curves might represent the expression levels for a gene in response to two different drugs.



There is a whole family of problems in normalization of data and controlling for components of experimental variation.

To put things into perspective, if the experiment was repeated 4 times, the timecourse above represents
2 treatments x 6 times x 4 replicates = 48 probes hybridized to 48 duplicate arrays

to generate the data.
Although the data for each replicate are averaged, there is often a great deal of variation in the results, which can potentially negate any meaning. Therefore, extraordinary measures must be taken to minimize experimental variation at each step in the procedure, in order to minimize the overall variation.
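The count of 48 hybridizations quoted above can be checked by enumerating the design. The treatment names below are placeholders; only the numbers of treatments, timepoints, and replicates come from the example.

```python
from itertools import product

treatments = ["drug A", "drug B"]   # 2 treatments (names are placeholders)
timepoints = range(6)               # 6 sampling times
replicates = range(1, 5)            # experiment repeated 4 times

# One probe, hybridized to its own array, for every combination.
hybridizations = list(product(treatments, timepoints, replicates))
print(len(hybridizations))  # 48
```

Adding a third treatment or a seventh timepoint multiplies the total, which is why array experiments grow quickly in size.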

4. What are we trying to learn from gene arrays?

The primary goal of gene array experiments is to generate expression information for every gene in the array, under some set of conditions. Expression may be studied in different tissues, under different conditions, or at different times after a treatment. The kind of results that are sought in gene array experiments can be illustrated as follows:

In the example, timecourse data are generated for each gene in an array. The raw data consist of a series of expression curves for timecourses, or histograms where other types of treatments are being compared. The goal is usually to find which groups of genes have the most similar expression patterns. In the example, two genes in the array (hatched background) show a gradual induction over the period of the timecourse. Two other genes (shaded background) show a biphasic response with two distinct periods of strong expression.
 

Key questions:
  • Which genes are expressed differentially between condition A and condition B?
  • How can genes be grouped according to similarities in expression patterns?
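As a first intuition for the second question, the similarity between two expression profiles can be scored with a correlation coefficient. The sketch below uses Pearson correlation on made-up timecourses. The gene names and values are illustrative, and correlation is only one of several possible similarity measures.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / denom

# Made-up expression levels at six timepoints.
profiles = {
    "gene1": [1, 2, 3, 4, 5, 6],     # gradual induction
    "gene2": [2, 4, 6, 8, 10, 12],   # gradual induction, higher amplitude
    "gene3": [1, 6, 2, 1, 6, 2],     # biphasic response
}

print(pearson(profiles["gene1"], profiles["gene2"]))
print(pearson(profiles["gene1"], profiles["gene3"]))
```

The two gradually induced genes score close to 1 even though their amplitudes differ, while the biphasic gene scores close to 0 against either of them. Pearson correlation compares the shape of the curves, not their absolute levels.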

B. Experimental design and normalization

It is critical to realize that every experimental step in a procedure contributes to the final experimental error. Therefore, one should conceptualize the data as a set of observations, each with a measurable amount of variation. In the figure, error bars represent the standard error of each measurement. The goal can then be restated as that of setting up the experiment in such a way as to minimize the final standard error in the observations. For some timepoints in which there is little true difference, a difference can only be detected when the standard error for both treatments is small. For other timepoints where the differences are large, higher standard errors will still allow the detection of the difference between two treatments.
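The relationship between standard error and detectability can be made concrete with a small calculation. The sketch below computes the standard error of the mean for replicate measurements and applies a rough rule of thumb: a difference counts as detectable only if it exceeds twice the combined standard error. That threshold is an assumption for illustration, not a formal statistical test, and all values are made up.

```python
from math import sqrt
from statistics import mean, stdev

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def detectable(a, b, k=2.0):
    """True if the difference in means exceeds k combined standard errors.

    k = 2 is an illustrative threshold, not a substitute for a proper test.
    """
    combined = sqrt(sem(a) ** 2 + sem(b) ** 2)
    return abs(mean(a) - mean(b)) > k * combined

# Made-up replicate measurements for one gene at one timepoint, two treatments.
small_gap_noisy = ([10.0, 14.0, 8.0, 12.0], [12.0, 9.0, 15.0, 12.0])
small_gap_tight = ([10.9, 11.0, 11.1, 11.0], [11.9, 12.0, 12.1, 12.0])
large_gap_noisy = ([10.0, 14.0, 8.0, 12.0], [30.0, 26.0, 34.0, 30.0])

for label, (a, b) in [("small gap, noisy", small_gap_noisy),
                      ("small gap, tight", small_gap_tight),
                      ("large gap, noisy", large_gap_noisy)]:
    print(label, detectable(a, b))
```

With these numbers the small difference is detected only when the replicates agree closely, while the large difference is detected despite noisy replicates.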

1. Sources of experimental variation

Making a list of factors that contribute to experimental error is essentially the same as making a list of steps in the gene array experiment. However, several points are worth highlighting.
 

BIOLOGICAL REPLICATES ARE THE SINGLE MOST EFFECTIVE WAY TO GET GOOD GENE EXPRESSION RESULTS!

In the next section we will see that there is an almost endless list of ways to massage the data. The most heroic analytical methods are no substitute for the simple step of doing several biological replicates.
  • In each biological replicate, the entire experiment, such as different treatments of a batch of cells, plants or animals, sampling of different tissues from different conditions, followed by extraction of RNA, is repeated.
  • The RNA samples from different biological replicates are NOT mixed for a single hybridization. Rather, a separate labeling and hybridization is carried out for EACH REPLICATE.
  • Technical replicates, in which the same RNA sample is labeled and hybridized more than once, only control for differences in handling. Biological replicates include all sources of biological and experimental variation. Therefore, they are more realistic.
  • As the number of biological replicates increases, the uncertainty in the averaged expression values (the standard error of the mean) decreases.
Gene chips are getting cheaper all the time, often less than $100 per chip. The excuse that you can't do biological replicates because it is too expensive no longer obtains.
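The payoff from additional biological replicates can be sketched with the usual formula for the standard error of a mean, SD / sqrt(n). The standard deviation of 2.0 used here is an arbitrary illustrative value, and the calculation assumes that replicates are independent.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean for n independent replicates."""
    if n < 1:
        raise ValueError("need at least one replicate")
    return sd / sqrt(n)

sd = 2.0  # assumed replicate-to-replicate standard deviation (arbitrary units)
for n in (1, 2, 4, 8, 16):
    print(n, round(standard_error(sd, n), 3))
```

Under these assumptions, quadrupling the number of replicates halves the standard error, so the first few replicates buy the largest improvement.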


