The core of the decomposition algorithm is a single-step
decomposition which decomposes a tabulated function y=F(X) into two
possibly less complex tabulated functions G and H, so that
y=G(A,c) and c=H(B). The resulting functions G and H have to
be consistent with F. For the purpose of decomposition, the set of
features X is partitioned into two disjoint subsets A and B,
referred to as the free set and the bound set, respectively. For a
given feature partition, single-step decomposition discovers a
new feature c and a tabular representation of H and G.
For example, consider a function y=F(x1,x2,x3) as given in the following table. The input features x1, x2, and the
output feature y can take the values lo, med, hi;
input feature x3 can take the values lo, hi.
Suppose that we want to discover a description of a new feature
c=H(x2,x3). For this purpose, we represent F by a partition
matrix that uses the values of x1 for row labels, and the
combinations of values of x2 and x3 for column labels
(Figure .a). Partition matrix entries with no
corresponding instance in the tabulated representation of F are denoted
with "-" and treated as don't-care.
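As a concrete sketch, the partition matrix can be built in a few lines of Python. The table below is a reconstruction, not the paper's original: it is generated from the interpretation y = MAX(x1, MIN(x2, x3)) under the ordering lo < med < hi (the interpretation the example eventually arrives at), with the entry F(hi,lo,hi) removed so that one don't-care appears:

```python
# Reconstruct a tabulated F as a dict from (x1, x2, x3) tuples to y,
# then lay out its partition matrix for the bound set {x2, x3}.
from itertools import product

RANK = {"lo": 0, "med": 1, "hi": 2}          # value ordering lo < med < hi
X1 = X2 = ["lo", "med", "hi"]
X3 = ["lo", "hi"]

# Hypothetical stand-in for the example table: y = MAX(x1, MIN(x2, x3)).
F = {(a, b, c): max(a, min(b, c, key=RANK.get), key=RANK.get)
     for a, b, c in product(X1, X2, X3)}
del F[("hi", "lo", "hi")]                    # one undefined instance

# Rows are labelled by x1 (free set), columns by (x2, x3) pairs (bound set);
# missing instances are shown as "-" (don't-care).
columns = list(product(X2, X3))
matrix = {x1: [F.get((x1, x2, x3), "-") for (x2, x3) in columns] for x1 in X1}

print(columns)
for x1 in X1:
    print(x1, matrix[x1])
```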
Each column in the partition matrix denotes the behavior of F for a
specific combination of x2 and x3. Two columns are called
compatible if, in every row, their entries are either equal or at
least one of the two is a don't-care. The decomposition has to assign
each column a label that corresponds to a value of c. Compatible
columns may be assigned the same label; to preserve consistency,
incompatible columns must be assigned different labels.
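The compatibility test is straightforward to state in code; a minimal sketch, with columns as lists of row entries and "-" as the don't-care mark:

```python
def compatible(col_a, col_b, dont_care="-"):
    """Two columns are compatible if, in every row, their entries are
    equal or at least one of the two is a don't-care."""
    return all(a == b or dont_care in (a, b) for a, b in zip(col_a, col_b))

# Rows ordered x1 = lo, med, hi.  The first pair matches up to a don't-care;
# the second pair differs in the first row, so the columns are incompatible.
print(compatible(["lo", "med", "hi"], ["lo", "med", "-"]))   # True
print(compatible(["hi", "hi", "hi"], ["lo", "med", "-"]))    # False
```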
The partition matrix column labels are found by coloring an
incompatibility graph. This graph has a distinct node
for each column of the partition matrix, and two nodes are connected if
the corresponding columns of the partition matrix are incompatible. For
instance, the nodes <hi,hi> and <lo,hi> are connected because
their corresponding columns are incompatible due to the entries in the
first row (hi ≠ lo).
With an optimal coloring of the incompatibility graph, the new feature
c obtains the minimal set of values needed for a consistent derivation
of H and G from F. The optimal coloring of the graph above requires three different colors, i.e., three abstract values for c. The tabulated functions G and H can then
be derived straightforwardly from the labelled partition matrix. The resulting decomposition is therefore:
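As a sketch of the labelling step, the following snippet greedily colors the incompatibility graph of the worked example. The six column vectors are reconstructed from the example (rows ordered x1 = lo, med, hi, with "-" for the undefined entry F(hi,lo,hi)); greedy coloring is only a heuristic, but here it reaches the optimum of three colors:

```python
# Columns of the partition matrix, keyed by (x2, x3).
cols = {
    ("lo", "lo"):  ["lo", "med", "hi"],
    ("lo", "hi"):  ["lo", "med", "-"],
    ("med", "lo"): ["lo", "med", "hi"],
    ("med", "hi"): ["med", "med", "hi"],
    ("hi", "lo"):  ["lo", "med", "hi"],
    ("hi", "hi"):  ["hi", "hi", "hi"],
}

def compatible(a, b):
    return all(u == v or "-" in (u, v) for u, v in zip(a, b))

# Greedy coloring: give each column the smallest color not already used
# by an incompatible, previously colored column.
color = {}                                    # column -> value of c
for key, col in cols.items():
    used = {color[k] for k in color if not compatible(cols[k], col)}
    color[key] = min(set(range(len(cols))) - used)

print(color)
print(len(set(color.values())))               # three values for c
```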
The following can be observed:
The tabulated functions G and H are overall smaller than the
original function F.
By combining the features x2 and x3 we obtained a new
feature c and the corresponding tabulated function H, which can be
interpreted as c=MIN(x2, x3). Similarly, the function G that
relates x1 and c to y can be interpreted as y=MAX(x1, c).
The decomposition generalizes some undefined entries of the
partition matrix. For example, F(hi,lo,hi) is
generalized to hi because the column <lo,hi> has the same
label as columns <lo,lo> and <hi,lo>.
Finding the Best Feature Partition
So far, we have assumed that the partition of the features into the
free and bound sets is given. However, for each function F there are many
possible partitions, each one yielding a different feature c and a
different pair of functions G and H. In feature transformation, it
is important to identify partitions that lead to simple but useful
new features. Typically, we prefer features with a small number of
values, and those that yield functions G and H of low
complexity.
The simplest measure for finding good feature partitions is the
number of values required for the new feature c: when
decomposing F, a set of candidate partitions is examined and the one
that yields a c with the smallest set of possible values is selected
for decomposition.
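This selection can be sketched as follows, under a simplifying assumption: the table is complete (no don't-cares), so compatibility reduces to column equality and the minimal number of values for c is simply the number of distinct partition-matrix columns. The worked example's F is again reconstructed from y = MAX(x1, MIN(x2, x3)):

```python
from itertools import combinations, product

RANK = {"lo": 0, "med": 1, "hi": 2}
DOMAIN = {"x1": ["lo", "med", "hi"], "x2": ["lo", "med", "hi"], "x3": ["lo", "hi"]}
FEATURES = list(DOMAIN)

# Complete table for the worked example, y = MAX(x1, MIN(x2, x3)).
F = {(a, b, c): max(a, min(b, c, key=RANK.get), key=RANK.get)
     for a, b, c in product(*DOMAIN.values())}

def column(bound, bound_vals):
    """One partition-matrix column: F's behaviour over the free features."""
    free = [f for f in FEATURES if f not in bound]
    col = []
    for fv in product(*(DOMAIN[f] for f in free)):
        full = dict(zip(bound, bound_vals))
        full.update(zip(free, fv))
        col.append(F[tuple(full[f] for f in FEATURES)])
    return tuple(col)

def n_values_of_c(bound):
    """Values needed for c = number of distinct columns (complete table)."""
    return len({column(bound, bv)
                for bv in product(*(DOMAIN[f] for f in bound))})

scores = {b: n_values_of_c(b) for b in combinations(FEATURES, 2)}
best = min(scores, key=scores.get)
print(scores)
print(best)    # the bound set {x2, x3} yields the fewest values for c
```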
An alternative measure for the selection of partitions can be based on
the complexity of functions. Let I(F) denote the number of bits
needed to encode a function F. Then, the best
partition is the one that minimizes the overall complexity of newly
discovered functions H and G, i.e., the partition with minimal
I(H)+I(G). The complexity of G and H should be lower than that
of F, and decomposition takes place only if I(H)+I(G) < I(F) for
the best partition.
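As an illustration only (this encoding is an assumption, not necessarily the measure HINT uses), I can be taken as the bits needed to list a table's output column, i.e. I = rows · log2(output values). On the worked example this criterion favors decomposition:

```python
from math import log2

def I(n_rows, n_out_values):
    """Assumed complexity measure: bits to list the output column of a
    tabulated function with n_rows entries over n_out_values outputs."""
    return n_rows * log2(n_out_values)

# Worked example: F has 3*3*2 = 18 rows; H(x2,x3) has 3*2 = 6 rows;
# G(x1,c) has 3*3 = 9 rows; y and c each take 3 values.
i_f, i_h, i_g = I(18, 3), I(6, 3), I(9, 3)
print(i_h + i_g, "<", i_f, "->", i_h + i_g < i_f)
```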
The number of possible partitions increases exponentially with the
number of input features. To limit the complexity of search, the
decomposition should examine only a subset of partitions. In HINT,
only partitions with at most b features in the bound set are
examined, where b is a user-defined parameter usually set to 2
or 3.
The partition matrix can be very large. However, it is never
explicitly represented in the program as HINT derives the
incompatibility graph directly from the
data. The partition matrix is thus a formal construct used to
explain and analyze the decomposition algorithm. Coloring the
incompatibility graph is another potential bottleneck, since optimal
graph coloring is in general intractable; HINT therefore uses an
efficient heuristic algorithm for graph coloring.
Discovering Feature Hierarchies
With all the above ingredients, decomposition enables the
discovery of feature hierarchies. A given data set is first
pre-processed to remove redundant features and feature values. Next,
single-step decomposition is used to decompose the pre-processed
tabulated function F into G and H. This process is recursively
repeated on G and H until they cannot be decomposed further,
i.e., until any further decomposition would increase the overall
complexity of the resulting functions.
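Under the same simplifying assumptions as before (a complete table, bound sets of size 2, and decomposition accepted only when c needs fewer values than the raw bound-set combinations), the whole recursive scheme can be sketched as:

```python
from itertools import combinations, product

def decompose(F, domains):
    """Recursively decompose a complete tabulated function.

    F maps tuples of feature values (in the key order of `domains`) to y;
    domains maps feature name -> list of values.  Returns (G, g_domains,
    steps), where each step is (bound_pair, new_feature_name, H)."""
    features = list(domains)
    if len(features) < 3:
        return F, domains, []

    def columns_for(bound):
        free = [f for f in features if f not in bound]
        cols = {}
        for bv in product(*(domains[f] for f in bound)):
            col = []
            for fv in product(*(domains[f] for f in free)):
                full = dict(zip(bound, bv))
                full.update(zip(free, fv))
                col.append(F[tuple(full[f] for f in features)])
            cols[bv] = tuple(col)
        return cols

    # Examine bound sets of size 2; prefer the fewest values for c.
    best = min(combinations(features, 2),
               key=lambda b: len(set(columns_for(b).values())))
    cols = columns_for(best)
    labels = {col: i for i, col in enumerate(sorted(set(cols.values())))}
    if len(labels) >= len(cols):      # c no simpler than the raw pair: stop
        return F, domains, []

    c_name = "c_" + "_".join(best)
    H = {bv: labels[col] for bv, col in cols.items()}
    free = [f for f in features if f not in best]
    G = {}
    for point, y in F.items():
        assign = dict(zip(features, point))
        c_val = H[tuple(assign[f] for f in best)]
        G[tuple(assign[f] for f in free) + (c_val,)] = y
    g_domains = {f: domains[f] for f in free}
    g_domains[c_name] = sorted(set(H.values()))
    G, g_domains, steps = decompose(G, g_domains)
    return G, g_domains, [(best, c_name, H)] + steps

# Driver on the worked example, y = MAX(x1, MIN(x2, x3)).
RANK = {"lo": 0, "med": 1, "hi": 2}
domains = {"x1": ["lo", "med", "hi"], "x2": ["lo", "med", "hi"], "x3": ["lo", "hi"]}
F = {p: max(p[0], min(p[1], p[2], key=RANK.get), key=RANK.get)
     for p in product(*domains.values())}

G, g_domains, steps = decompose(F, domains)
print([(bound, name) for bound, name, _ in steps])   # one step on (x2, x3)
```

The result is the hierarchy from the example: a single intermediate feature over {x2, x3} with three values, with the remaining function G relating x1 and the new feature to y.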