This experiment reports results from the analysis of an annotated gene expression data set on budding
yeast, Saccharomyces cerevisiae. The data consist of 79 different
DNA microarray hybridization measurements that, for example,
include the diauxic shift (12 experiments), sporulation (14 experiments), and
heat shock (6 experiments). The data set was previously used by Brown and coauthors (2000) in a study of
the utility of various machine learning approaches. From the data,
we considered 186 genes from the three functional classes that were represented
by the largest number of genes: respiration (30 genes), cytoplasmic ribosomes
(121 genes), and proteasome (35 genes).
As already mentioned in the paper, with 79 attributes one could visualize and explore 3081
different two-dimensional scatterplots, one for each pair of attributes. For these, VizRank projection scores ranged from 98.78 (the best
projection) to 47.22 (the worst one). Interestingly, the top ten
projections all included an attribute coming from measurements on sporulation,
with a second attribute representing a measurement from either heat shock or
diauxic shift experiments. The two best projections are shown in
Figures 1.a and 1.b. The two scatterplots indicate
that a single gene expression measurement during sporulation can clearly
separate genes from the proteasome functional group from those from the
cytoplasmic ribosomes or respiration. To further separate the latter two
functional groups, an additional attribute is required from either a diauxic shift (Figure 1.a) or
a heat shock experiment (Figure 1.b). For comparison, Figure 2 shows two less successful data projections.
Our observation that gene expression during
diauxic shift can characterize two of our three functional groups,
cytoplasmic ribosomes and respiration, has already been
reported (DeRisi et al., 1997), and confirms the ability of VizRank to identify
relevant projections and find interesting attributes. Both projections in Figure 1 also include an outlier, which
in both cases is the gene YDR069C (ubiquitin isopeptidase).
Interestingly, YDR069C is on the list of genes consistently
misclassified by Brown et al. (2000), who reported it to be loosely associated with
its functional group and regulated differently from the rest of the proteasome.
Figure 1. The two best scatterplot projections found by VizRank, with scores of 98.78 (a) and 98.45 (b).
Figure 2. Two less successful scatterplot projections, which VizRank ranked with scores of 80.30 (a) and 47.22 (b).
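The count of 3081 scatterplots quoted above is simply the number of ways to choose 2 of the 79 attributes. The sketch below checks that figure and illustrates how attribute pairs could be ranked by class separation. It is only an illustration: the scoring function of VizRank is not described on this page, so the leave-one-out k-nearest-neighbour accuracy used here is an assumed stand-in, and the names knn_loo_score and rank_scatterplots are hypothetical.

```python
from itertools import combinations
from math import comb


def knn_loo_score(points, labels, k=3):
    """Leave-one-out k-NN accuracy (in percent) on 2-D points.

    Assumed stand-in for a projection score; not VizRank's own measure.
    """
    correct = 0
    for i, (xi, yi) in enumerate(points):
        # Squared Euclidean distance from point i to every other point.
        neighbours = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, labels[j])
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        # Majority vote among the k closest neighbours.
        votes = {}
        for _, label in neighbours[:k]:
            votes[label] = votes.get(label, 0) + 1
        predicted = max(sorted(votes), key=votes.get)
        correct += predicted == labels[i]
    return 100.0 * correct / len(points)


def rank_scatterplots(data, labels, k=3):
    """Score every attribute pair and return [(score, (a, b)), ...], best first.

    data is a list of rows (one per gene), each a list of attribute values.
    """
    n_attrs = len(data[0])
    ranked = []
    for a, b in combinations(range(n_attrs), 2):
        points = [(row[a], row[b]) for row in data]
        ranked.append((knn_loo_score(points, labels, k), (a, b)))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Number of attribute pairs, i.e. candidate scatterplots, for 79 attributes.
print(comb(79, 2))  # 3081
```

With 186 genes and 3081 pairs, an exhaustive evaluation of this kind is cheap, which is consistent with all scatterplots having been scored here.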
We also investigated the same data set using radviz visualizations with four
attributes. Since the overall number of such projections for our data set was
large (4,507,503), VizRank was run with a search heuristic and evaluated only the 10,000 most promising projections. Among
these, we first found that most projections that separated genes of
different functional groups well (score higher than 95) used attributes from at
least two different types of experiments (for instance, sporulation and diauxic
shift). Among the evaluated projections, none achieved a suitable separation
with all the measurements coming from the same type of experiment. This result
is biologically relevant, as it indicates the minimal number of experiments needed
to define gene function in this domain. The best projection found by VizRank is
shown in Figure 3.a. It offers a perfect separation of classes
and an easy interpretation of the influence of attributes: "spo5 11" separates genes of the cytoplasmic ribosomes functional
group from the other two groups, while attributes "diau e" and
"spo-mid" enable us to clearly distinguish the proteasome group from
the respiration group. For illustration, a less interesting radviz projection is shown in Figure 3.b.
Figure 3. Two radviz projections. Figure 3.a shows a perfect separation of all three functional groups (VizRank score 99.96), while Figure 3.b discriminates only between proteasome and respiration genes (VizRank score 73.16).
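The figure of 4,507,503 four-attribute radviz projections can be reproduced by counting attribute subsets and, for each subset, the distinct orders in which its anchors can be placed around the circle (ignoring rotations and mirror images). The sketch below does this count and shows the textbook radviz mapping of a data row to a point. It is a minimal illustration under assumptions: the page does not spell out how projections were counted or how values were scaled, so the min-max scaling and the function names circular_orderings and radviz_point are hypothetical.

```python
from itertools import permutations
from math import comb, cos, pi, sin


def circular_orderings(attrs):
    """Distinct anchor arrangements on a circle.

    Rotations are removed by fixing the first attribute; mirror images
    are removed by skipping a tail whose reverse was already kept.
    """
    first, rest = attrs[0], tuple(attrs[1:])
    kept = []
    for tail in permutations(rest):
        if tail[::-1] not in kept:
            kept.append(tail)
    return [(first,) + tail for tail in kept]


def radviz_point(values, mins, maxs):
    """Map one row to 2-D using the standard radviz formula.

    Anchors are evenly spaced on the unit circle in the order given;
    each value is min-max scaled to [0, 1] (an assumed convention).
    """
    n = len(values)
    weights = [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(values, mins, maxs)
    ]
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0)  # no attribute pulls the point anywhere
    x = sum(w * cos(2 * pi * i / n) for i, w in enumerate(weights)) / total
    y = sum(w * sin(2 * pi * i / n) for i, w in enumerate(weights)) / total
    return (x, y)


n_subsets = comb(79, 4)                                      # 1,502,501
n_layouts = len(circular_orderings(("a", "b", "c", "d")))   # 3
print(n_subsets * n_layouts)  # 4507503
```

A point with a high value on one attribute and low values on the others is pulled toward that attribute's anchor, which is what makes the roles of individual attributes in a radviz plot easy to read.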
References:
Brown, M.P., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C., Furey, T.S., Ares, M., Haussler, D. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines, Proceedings of the National Academy of Sciences, 97(1), 262–267.
DeRisi, J., Iyer, V., Brown, P. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale, Science, 278, 680–686.