Saccharomyces cerevisiae Metabolic Example


Data set: yeast - S.cerevisiae.tab

This experiment presents an analysis of an annotated gene expression data set for the budding yeast Saccharomyces cerevisiae. The data consist of 79 DNA microarray hybridization measurements that include, for example, the diauxic shift (12 experiments), sporulation (14 experiments), and heat shock (6 experiments). The data set was previously used by Brown and coauthors (2000) in a study of the utility of various machine learning approaches. From the data, we considered 186 genes from the three functional classes represented by the highest number of genes: respiration (30 genes), cytoplasmic ribosomes (121 genes), and proteasome (35 genes).

As already mentioned in the paper, with 79 attributes one could visualize and explore 3081 different two-dimensional scatterplots, one for each pair of attributes. For these, VizRank projection scores ranged from 98.78 (the best projection) to 47.22 (the worst one). Interestingly, the top ten projections all included an attribute from the sporulation measurements, with the second attribute representing a measurement from either the heat shock or the diauxic shift experiments. The best two projections are shown in Figures 1.a and 1.b. The two scatterplots indicate that a single gene expression measurement during sporulation can clearly separate genes of the proteasome functional group from those of the cytoplasmic ribosomes or respiration groups. To further separate the latter two functional groups, an additional attribute is required from either a diauxic shift (Figure 1.a) or a heat shock experiment (Figure 1.b). For comparison, Figure 2 shows two less successful data projections.
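To make the exhaustive scatterplot search concrete, the following is a minimal sketch of how attribute pairs could be enumerated and ranked. It assumes a leave-one-out k-nearest-neighbour criterion (the share of a point's nearest neighbours that belong to its own class), which is only an illustration and may differ from the score VizRank actually computes. The function names `knn_score` and `rank_scatterplots` and the toy data are hypothetical and do not come from the yeast data set.

```python
from itertools import combinations
from math import comb, dist


def knn_score(points, labels, k=3):
    """Leave-one-out k-NN score on a 0-100 scale: the mean fraction of
    each point's k nearest neighbours that share its class label."""
    total = 0.0
    for i, p in enumerate(points):
        # Sort every other point by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        nearest = others[:k]
        total += sum(labels[j] == labels[i] for j in nearest) / len(nearest)
    return 100.0 * total / len(points)


def rank_scatterplots(rows, labels, k=3):
    """Score every attribute pair; return (score, (a, b)) tuples, best first."""
    n_attrs = len(rows[0])
    ranked = []
    for a, b in combinations(range(n_attrs), 2):
        points = [(row[a], row[b]) for row in rows]
        ranked.append((knn_score(points, labels, k), (a, b)))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Number of candidate scatterplots for 79 attributes: "79 choose 2".
n_scatterplots = comb(79, 2)

# Made-up toy data: attributes 0 and 1 carry the class signal,
# attribute 2 is unrelated to the class.
rows = [
    (0.10, 0.20, 0.5),
    (0.20, 0.10, 0.1),
    (0.15, 0.25, 0.9),
    (0.90, 0.80, 0.4),
    (0.80, 0.90, 0.2),
    (0.85, 0.95, 0.8),
]
labels = ["A", "A", "A", "B", "B", "B"]
ranked = rank_scatterplots(rows, labels, k=2)
```

On the toy data, the pair of informative attributes ends up at the head of `ranked`, mirroring how the best-scoring projections surface in the ranking above. The same loop over 79 attributes would visit all 3081 pairs.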

Our observation that gene expression during the diauxic shift can characterize two of our three functional groups, cytoplasmic ribosomes and respiration, has already been reported (DeRisi et al., 1997), and confirms the ability of VizRank to identify relevant projections and find interesting attributes. Both projections in Figure 1 also include an outlier, which in both cases is the gene YDR069C (ubiquitin isopeptidase). Interestingly, YDR069C is one of the genes that Brown et al. (2000) list as consistently misclassified and report to be loosely associated with its functional group and regulated differently from the rest of the proteasome.

 
Figure 1. The two best scatterplot projections found by VizRank, with scores 98.78 (a) and 98.45 (b).

 

 
Figure 2. Two less successful scatterplot projections. VizRank ranked them with scores 80.30 (a) and 47.22 (b).

 

We also investigated the same data set using radviz visualizations with four attributes. Since the overall number of such projections for our data set was large (4,507,503), VizRank was run with a search heuristic and evaluated only the 10,000 most promising projections. Among these, we first found that most projections that separated genes of different functional groups well (score higher than 95) used attributes from at least two different types of experiments (for instance, sporulation and diauxic shift). There is no suitable projection in which separation could be achieved with all the measurements coming from the same type of experiment. Such a result is biologically relevant, as it speaks to the minimal number of experiments needed to define gene function in this domain. The best projection found by VizRank is shown in Figure 3.a. It offers a perfect separation of classes and an easy interpretation of the influence of attributes: "spo5 11" separates genes of the cytoplasmic ribosomes functional group from the other two groups, while attributes "diau e" and "spo-mid" enable us to clearly distinguish the proteasome group from the respiration group. For illustration, a less interesting radviz projection is shown in Figure 3.b.
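The size of the radviz search space can be reproduced with a short calculation. The sketch below rests on one reading that matches the quoted total: each choice of four attributes ("79 choose 4") admits three distinct placements of the anchors around the circle once rotations and mirror images are treated as the same projection. It also includes the usual radviz mapping, in which a data point is placed at the weighted average of equally spaced anchor positions. The function names are hypothetical, and the input values are assumed to be already scaled to [0, 1].

```python
from itertools import permutations
from math import comb, cos, pi, sin


def radviz_orderings(attrs):
    """Yield circular orderings of attrs, counting rotations and
    mirror images of an arrangement only once."""
    first, rest = attrs[0], attrs[1:]
    for perm in permutations(rest):
        # Fixing the first anchor removes rotations; keeping only
        # perm[0] < perm[-1] keeps one ordering of each mirror pair.
        if len(perm) < 2 or perm[0] < perm[-1]:
            yield (first, *perm)


def radviz_point(values):
    """Map one row of non-negative values to 2-D: anchors are spaced
    evenly on the unit circle and the point is their weighted average."""
    n = len(values)
    total = sum(values)
    if total == 0:
        return (0.0, 0.0)  # no attribute pulls the point; place it at the centre
    x = sum(v * cos(2 * pi * i / n) for i, v in enumerate(values)) / total
    y = sum(v * sin(2 * pi * i / n) for i, v in enumerate(values)) / total
    return (x, y)


# Distinct anchor arrangements for one set of four attributes.
orderings = list(radviz_orderings((0, 1, 2, 3)))

# Attribute subsets times arrangements per subset.
n_radviz = comb(79, 4) * len(orderings)
```

A point whose only non-zero value belongs to the first attribute lands exactly on that attribute's anchor, while a point with equal values on all four attributes lands at the centre. This is why a projection such as the one in Figure 3.a can be read in terms of which attributes pull each group of genes toward their anchors.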

 
Figure 3. Two radviz projections. Figure 3.a shows a perfect separation of all three functional groups (VizRank score 99.96), while Figure 3.b discriminates only between proteasome and respiration genes (VizRank score 73.16).


References:

Brown, M.P., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C., Furey, T.S., Ares, M., Haussler, D. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines, Proceedings of the National Academy of Sciences, 97(1), 262-267.

DeRisi, J., Iyer, V., Brown, P. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale, Science, 278, 680-686.