VizRank can in principle be applied to any geometric visualization method in which examples are visualized as points in a two-dimensional space and the attribute values influence only the position of a point, not its size, shape or color (symbol properties can, however, be used to represent the class value). Examples of such methods are scatterplot, radviz, polyviz, and gridviz (Grinstein et al., 2001). We implemented two such methods in Orange: scatterplot and radviz.
Scatterplot
Scatterplot with all its variants (Harris, 1999) is one of the oldest and most
utilized visualization methods. In its basic form, it depicts the relation
between two continuous attributes. Attributes are represented with a pair of
perpendicular coordinate axes. Each data example is shown as a point in the
plane whose position is determined by the values of the two selected
attributes. The number of visualized attributes can be increased by mapping
them to the color, size and shape of the visualized point. Note, however, that when visualizing
a larger data set, the points can overlap substantially and the additional
attributes may not be perceived successfully. Below is a scatterplot of two features from the budding yeast Saccharomyces cerevisiae data set (Brown et al., 2000).
Radviz
Radviz (Hoffman et al., 1997), which stands for radial visualization, is a method where
the examples are represented by points inside a unit circle. The visualized
attributes correspond to points equidistantly distributed along the
circumference of the circle. The easiest way to understand the method is by
using a physical analogy with multiple springs. For
visualizing each data example represented by m attributes, m springs are
used, one spring for each attribute. One end of each spring is attached to the
attribute's position on the circumference, and the other to the position of the
data point inside the circle. The stiffness of each spring in terms of Hooke's
law is determined by the corresponding attribute value - the greater the
attribute value, the greater the stiffness. The data point is then placed at
the position where the sum of all spring forces equals 0. Prior to visualizing,
the values of each attribute are usually standardized to the interval between 0
and 1 to make all the attributes equally important in "pulling" the data
point. Some properties of the radviz method are:
All the points that have approximately equal values of all the attributes after standardization lie close to the center of the circle.
Points that have approximately equal values for attributes that lie
on opposite sides of the circle will also lie close to the center.
If one attribute value is much larger than the values of the other attributes,
then the point will lie close to the point on the circumference of the circle
which corresponds to this attribute.
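The spring analogy and the properties listed above can be illustrated with a minimal Python sketch. It assumes that each spring has zero rest length, so the force balance sum_j x_j (S_j - u) = 0 gives the point u as the average of the anchor positions S_j weighted by the attribute values x_j. The function names (min_max_scale, anchors, radviz_point) are hypothetical and are not part of Orange; placing a point with all-zero values at the center is likewise a convention chosen for this sketch.

```python
import math

def min_max_scale(column):
    """Standardize one attribute's values to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant attribute: no spread to scale, map everything to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def anchors(m):
    """Positions of m attributes spaced equidistantly on the unit circle."""
    return [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
            for j in range(m)]

def radviz_point(values):
    """Equilibrium position of one example with standardized attribute values.

    Assumes zero-rest-length springs whose stiffness equals the attribute
    value, so the point is the stiffness-weighted mean of the anchors.
    """
    total = sum(values)
    if total == 0:
        # No spring pulls on the point; the center is used by convention here.
        return (0.0, 0.0)
    pts = anchors(len(values))
    x = sum(v * ax for v, (ax, _) in zip(values, pts)) / total
    y = sum(v * ay for v, (_, ay) in zip(values, pts)) / total
    return (x, y)

# Equal values of all attributes: the point lies at the center.
center = radviz_point([0.5, 0.5, 0.5, 0.5])

# Equal values on opposite sides of the circle: also at the center.
balanced = radviz_point([0.9, 0.1, 0.9, 0.1])

# One dominant attribute: the point lies close to that attribute's anchor.
dominant = radviz_point([1.0, 0.01, 0.01, 0.01])
```

With four attributes the anchors are at (1, 0), (0, 1), (-1, 0) and (0, -1), so the three example points correspond to the three properties in the order they are listed.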
The visualization of a given data set, and also its usefulness, largely depends
on the selection of visualized attributes and their ordering around the circle
perimeter. The total number of possible orderings of m attributes is m!,
but some of them are equivalent up to a rotation or image mirroring. Hence, it
can be shown that the total number of different projections with m attributes
is (m-1)!/2.
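The count can be checked by brute force for small m. The sketch below enumerates all m! orderings and treats two orderings as the same projection when one can be obtained from the other by rotation or mirroring; the helper names are hypothetical, and the check is only meaningful for m of at least 3.

```python
from itertools import permutations
from math import factorial

def canonical(order):
    """Smallest tuple among all rotations of an ordering and of its mirror image."""
    n = len(order)
    candidates = []
    for seq in (order, order[::-1]):        # the ordering and its mirror image
        for k in range(n):                  # every rotation around the circle
            candidates.append(seq[k:] + seq[:k])
    return min(candidates)

def count_distinct_orderings(m):
    """Number of attribute orderings that differ by more than rotation or mirroring."""
    return len({canonical(p) for p in permutations(range(m))})

# Compare the brute-force count with the closed form (m-1)!/2.
for m in range(3, 7):
    assert count_distinct_orderings(m) == factorial(m - 1) // 2
```

Because the enumeration itself grows as m!, this check is practical only for a handful of attributes.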
The figure below shows a radviz plot of four attributes from the budding yeast Saccharomyces cerevisiae data set (Brown et al., 2000). We can see that these four attributes perfectly separate all three functional groups and provide a clear interpretation. Genes that belong to the cytoplasmic ribosomes functional group (green points) have a high value of "spo5 11" and low values of the other features. The proteasome and respiration groups are discriminated by the "spo-mid" and "diau f" features: proteasome genes are highly expressed during sporulation but not during the diauxic shift, while the opposite is true for the respiration functional group.
References:
Brown, M.P., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C., Furey, T.S., Ares, M., Haussler, D. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines, Proceedings of the National Academy of Sciences, 1, 262–267.
Grinstein, G., Trutschl, M., Cvek, U. (2001) High-dimensional visualizations, Proceedings of the Visual Data Mining Workshop, KDD.
Harris, R.L. (1999) Information Graphics: A comprehensive illustrated reference, New York, Oxford Press, 290–297.
Hoffman, P.E., Grinstein, G., Marx, K., Grosse, I., Stanley, E. (1997) DNA Visual and Analytic Data Mining, IEEE Visualization 1997, 1, 437–441.