Supported Visualization Methods

In principle, VizRank can be applied to any geometric visualization method in which examples are visualized as points in a two-dimensional space and attribute values influence only the position of a point, not its size, shape, or color (symbol properties can, however, be used to represent the class value). Examples of such methods are scatterplot, radviz, polyviz, and gridviz (Grinstein et al., 2001). We implemented two of these methods in Orange: scatterplot and radviz.

Scatterplot

The scatterplot, with all its variants (Harris, 1999), is one of the oldest and most widely used visualization methods. In its basic form, it depicts the relation between two continuous attributes, which are represented by a pair of perpendicular coordinate axes. Each data example is shown as a point in the plane whose position is determined by its values of the two selected attributes. More attributes can be visualized by mapping them to the color, size, and shape of the point. When visualizing a larger data set, however, the points can overlap substantially, and these additional attributes may not be perceived reliably. Below is a scatterplot of two features from the budding yeast Saccharomyces cerevisiae data set (Brown et al., 2000).
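To make the placement rule concrete, the following Python sketch positions each example by its values of two chosen attributes. As a rough illustration of the overlap problem, it also counts how many examples land in a grid cell shared with another example. The function names and the `resolution` grid, which stands in for screen pixels, are hypothetical choices made for illustration; they are not part of Orange or VizRank.

```python
from collections import Counter


def scatter_positions(data, x_attr, y_attr):
    """Position of each example: its values of the two selected attributes."""
    return [(row[x_attr], row[y_attr]) for row in data]


def to_cells(points, resolution=100):
    """Min-max scale both axes and snap each point to a resolution x resolution grid."""
    def scale(values):
        lo, hi = min(values), max(values)
        if hi == lo:  # constant axis: every point lands in cell 0
            return [0] * len(values)
        return [int((v - lo) / (hi - lo) * (resolution - 1)) for v in values]

    xs = scale([x for x, _ in points])
    ys = scale([y for _, y in points])
    return list(zip(xs, ys))


def overlapping_examples(points, resolution=100):
    """Number of examples drawn in a grid cell shared with at least one other example."""
    counts = Counter(to_cells(points, resolution))
    return sum(c for c in counts.values() if c > 1)


if __name__ == "__main__":
    # Three attributes per example; plot attribute 0 against attribute 2.
    data = [[0, 5, 0], [0, 7, 0], [1, 9, 1], [0.5, 3, 0.5]]
    points = scatter_positions(data, 0, 2)
    print(points)
    # A coarser grid (fewer "pixels") makes more examples overlap.
    print(overlapping_examples(points), overlapping_examples(points, resolution=2))
```

The attribute not chosen for the axes (column 1 here) has no effect on the positions; in a plain scatterplot it could only be shown through symbol properties.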

Radviz

Radviz (Hoffman et al., 1997), short for radial visualization, represents examples as points inside a unit circle. The visualized attributes correspond to anchor points spaced evenly around the circumference of the circle. The method is easiest to understand through a physical analogy with springs. A data example described by m attributes is attached to m springs, one per attribute: one end of each spring is fixed at the attribute's anchor on the circumference, and the other at the position of the data point inside the circle. The stiffness of each spring (its constant in Hooke's law) is determined by the corresponding attribute value: the greater the value, the stiffer the spring. The data point is then placed where the spring forces balance, that is, where their sum is zero. Before visualization, the values of each attribute are usually scaled to the interval [0, 1] so that all attributes are equally important in "pulling" the data point. Some properties of the radviz method are:

Points whose attribute values are all approximately equal after standardization lie close to the center of the circle.

Points whose values are approximately equal for each pair of attributes on opposite sides of the circle also lie close to the center, because the pulls of each such pair cancel out.

If one attribute value is much larger than the values of the other attributes, the point lies close to that attribute's anchor on the circumference.
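The spring model reduces to a simple formula. If spring j has stiffness s_j and is anchored at a_j, its force on the point p is s_j (a_j - p); setting the sum of these forces to zero gives p = (sum_j s_j a_j) / (sum_j s_j), a weighted average of the anchors. The following Python sketch computes radviz positions this way and can be used to check the three properties listed above. It assumes plain lists of numeric values, min-max scaling as the standardization, and anchors that start at angle 0 and proceed counterclockwise. The function names are hypothetical, this is not Orange's implementation, and placing an example whose scaled values are all zero at the center is an assumed convention, since the spring model leaves that case undefined.

```python
import math


def minmax_scale(data):
    """Scale every attribute (column) to [0, 1]; a constant column maps to 0."""
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


def anchors(m):
    """Anchor points of m attributes, evenly spaced on the unit circle."""
    return [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
            for j in range(m)]


def radviz_point(values, anchor_points):
    """Position where the spring forces values[j] * (a_j - p) sum to zero.

    Solving sum_j values[j] * (a_j - p) = 0 for p gives the weighted
    average p = sum_j values[j] * a_j / sum_j values[j].
    Values are assumed non-negative (as they are after min-max scaling).
    """
    total = sum(values)
    if total == 0:
        # All stiffnesses are zero: the equilibrium is undefined, so the
        # point is placed at the center (an assumed convention).
        return (0.0, 0.0)
    x = sum(v * ax for v, (ax, _) in zip(values, anchor_points)) / total
    y = sum(v * ay for v, (_, ay) in zip(values, anchor_points)) / total
    return (x, y)


def radviz(data):
    """Radviz positions of all examples (rows) after min-max scaling."""
    scaled = minmax_scale(data)
    points = anchors(len(data[0]))
    return [radviz_point(row, points) for row in scaled]


if __name__ == "__main__":
    a = anchors(4)
    # Equal values: the point lies at (approximately) the center.
    print(radviz_point([0.5, 0.5, 0.5, 0.5], a))
    # Opposite attributes with equal values cancel out: again near the center.
    print(radviz_point([1.0, 0.2, 1.0, 0.2], a))
    # One dominant value: the point lies close to the first anchor at (1, 0).
    print(radviz_point([1.0, 0.01, 0.01, 0.01], a))
```

Because the position is a weighted average of anchors with non-negative weights, every point falls inside the circle. Scaling all of an example's values by the same factor leaves its position unchanged, which is why only the relative attribute values matter.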

Both the appearance of a radviz visualization and its usefulness depend largely on which attributes are visualized and how they are ordered around the circle. There are m! possible orderings of m attributes, but orderings that differ only by a rotation or a mirror reflection produce equivalent projections. Each ordering has m rotations, each of which can also be mirrored, so every distinct projection corresponds to 2m orderings. The number of different projections with m attributes (for at least three attributes) is therefore m!/(2m) = (m-1)!/2.
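The count (m-1)!/2 can be checked by brute force for small m. The Python sketch below reduces every ordering to a canonical representative, the lexicographically smallest of its rotations and mirror images, and counts the distinct representatives. It is an illustrative check with hypothetical function names, valid for m of at least three, and not a description of how VizRank enumerates projections.

```python
from itertools import permutations
from math import factorial


def canonical(order):
    """Lexicographically smallest rotation or mirror image of an ordering."""
    m = len(order)
    variants = []
    for seq in (order, order[::-1]):  # the ordering and its mirror image
        for k in range(m):            # all m rotations of each
            variants.append(seq[k:] + seq[:k])
    return min(variants)


def count_projections_brute_force(m):
    """Number of orderings of m attributes that differ beyond rotation/mirroring."""
    return len({canonical(p) for p in permutations(range(m))})


def count_projections_formula(m):
    """(m - 1)! / 2, valid for m >= 3."""
    return factorial(m - 1) // 2


if __name__ == "__main__":
    for m in range(3, 8):
        print(m, count_projections_brute_force(m), count_projections_formula(m))
```

For six attributes, for example, this gives 5!/2 = 60 distinct projections.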

The figure below shows a radviz plot of four attributes from the budding yeast Saccharomyces cerevisiae data set (Brown et al., 2000). These four attributes perfectly separate all three functional groups and provide a clear interpretation. Genes in the cytoplasmic ribosome functional group (green points) have high values of "spo5 11" and low values of the other features. The proteasome and respiration groups are separated by the "spo-mid" and "diau f" features: proteasome genes are highly expressed during sporulation but not during the diauxic shift, while the opposite is true for the respiration functional group.

References:

Brown, M.P., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C., Furey, T.S., Ares, M., Haussler, D. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proceedings of the National Academy of Sciences, 1, 262-267.

Grinstein, G., Trutschl, M., Cvek, U. (2001) High-dimensional visualizations. Proceedings of the Visual Data Mining Workshop, KDD.

Harris, R.L. (1999) Information Graphics: A comprehensive illustrated reference. New York, Oxford Press, 290-297.

Hoffman, P.E., Grinstein, G., Marx, K., Grosse, I., Stanley, E. (1997) DNA Visual and Analytic Data Mining. IEEE Visualization 1997, 1, 437-441.