The core of the decomposition algorithm is a single-step
decomposition which decomposes a tabulated function y=F(X) into two
possibly less complex tabulated functions G and H, so that
y=G(A,c) and c=H(B). The resulting functions G and H have to
be consistent with F. For the purpose of decomposition, the set of
features X is partitioned into two disjoint subsets A and B,
referred to as the free set and the bound set, respectively. For a
given feature partition, single-step decomposition discovers a
new feature c and a tabular representation of H and G.
For example, consider a function y=F(x1,x2,x3) as given in the following table. The input features x1, x2, and the
output feature y can take the values lo, med, hi;
input feature x3 can take the values lo, hi.
Suppose that we want to discover a description of a new feature
c=H(x2,x3). For this purpose, we represent F by a partition
matrix that uses the values of x1 for row labels, and the
combinations of values of x2 and x3 for column labels
(Figure .a). Partition matrix entries with no
corresponding instance in the tabulated representation of F are denoted
with "-" and treated as don't-care.
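As a concrete sketch, the partition matrix can be built in a few lines of Python. The table below is a reconstruction, not the paper's original: it is generated from the interpretation y = MAX(x1, MIN(x2, x3)) under the ordering lo < med < hi (the interpretation the example eventually arrives at), with the entry F(hi,lo,hi) removed so that one don't-care appears:

```python
# Reconstruct a tabulated F as a dict from (x1, x2, x3) tuples to y,
# then lay out its partition matrix for the bound set {x2, x3}.
from itertools import product

RANK = {"lo": 0, "med": 1, "hi": 2}          # value ordering lo < med < hi
X1 = X2 = ["lo", "med", "hi"]
X3 = ["lo", "hi"]

# Hypothetical stand-in for the example table: y = MAX(x1, MIN(x2, x3)).
F = {(a, b, c): max(a, min(b, c, key=RANK.get), key=RANK.get)
     for a, b, c in product(X1, X2, X3)}
del F[("hi", "lo", "hi")]                    # one undefined instance

# Rows are labelled by x1 (free set), columns by (x2, x3) pairs (bound set);
# missing instances are shown as "-" (don't-care).
columns = list(product(X2, X3))
matrix = {x1: [F.get((x1, x2, x3), "-") for (x2, x3) in columns] for x1 in X1}

print(columns)
for x1 in X1:
    print(x1, matrix[x1])
```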
Each column in the partition matrix denotes the behavior of F for a
specific combination of x2 and x3. Two columns are called
compatible if, in every row, their entries are either equal or at
least one of the two is a don't-care. The decomposition has to assign
each column a label that corresponds to a value of c. Compatible
columns may be assigned the same label; to preserve consistency,
incompatible columns must be assigned different labels.
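The compatibility test is straightforward to state in code; a minimal sketch, with columns as lists of row entries and "-" as the don't-care mark:

```python
def compatible(col_a, col_b, dont_care="-"):
    """Two columns are compatible if, in every row, their entries are
    equal or at least one of the two is a don't-care."""
    return all(a == b or dont_care in (a, b) for a, b in zip(col_a, col_b))

# Rows ordered x1 = lo, med, hi.  The first pair matches up to a don't-care;
# the second pair differs in the first row, so the columns are incompatible.
print(compatible(["lo", "med", "hi"], ["lo", "med", "-"]))   # True
print(compatible(["hi", "hi", "hi"], ["lo", "med", "-"]))    # False
```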
The partition matrix column labels are found by coloring an
incompatibility graph. This graph has a distinct node
for each column of the partition matrix, and two nodes are connected if
the corresponding columns of the partition matrix are incompatible. For
instance, the nodes <hi,hi> and <lo,hi> are connected because
their corresponding columns are incompatible due to the entries in the
first row (hi ≠ lo).
With an optimal coloring of the incompatibility graph, the new feature
c obtains the minimal set of values needed for a consistent derivation
of H and G from F. The optimal coloring of the graph above requires three different colors, i.e., three abstract values for c. The tabulated functions G and H can then
be derived straightforwardly from the labelled partition matrix. The resulting decomposition is therefore:
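As a sketch of the labelling step, the following snippet greedily colors the incompatibility graph of the worked example. The six column vectors are reconstructed from the example (rows ordered x1 = lo, med, hi, with "-" for the undefined entry F(hi,lo,hi)); greedy coloring is only a heuristic, but here it reaches the optimum of three colors:

```python
# Columns of the partition matrix, keyed by (x2, x3).
cols = {
    ("lo", "lo"):  ["lo", "med", "hi"],
    ("lo", "hi"):  ["lo", "med", "-"],
    ("med", "lo"): ["lo", "med", "hi"],
    ("med", "hi"): ["med", "med", "hi"],
    ("hi", "lo"):  ["lo", "med", "hi"],
    ("hi", "hi"):  ["hi", "hi", "hi"],
}

def compatible(a, b):
    return all(u == v or "-" in (u, v) for u, v in zip(a, b))

# Greedy coloring: give each column the smallest color not already used
# by an incompatible, previously colored column.
color = {}                                    # column -> value of c
for key, col in cols.items():
    used = {color[k] for k in color if not compatible(cols[k], col)}
    color[key] = min(set(range(len(cols))) - used)

print(color)
print(len(set(color.values())))               # three values for c
```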
The following can be observed:
The tabulated functions G and H are overall smaller than the
original function F.
By combining the features x2 and x3 we obtained a new
feature c and the corresponding tabulated function H, which can be
interpreted as c=MIN(x2, x3). Similarly, the function G that
relates x1 and c to y can be interpreted as y=MAX(x1, c).
The decomposition generalizes some undefined entries of the
partition matrix. For example, F(hi,lo,hi) is
generalized to hi because the column <lo,hi> has the same
label as columns <lo,lo> and <hi,lo>.
Finding the Best Feature Partition
So far, we have assumed that the partition of the features into the
free and bound sets is given. However, for each function F there are many
possible partitions, each one yielding a different feature c and a
different pair of functions G and H. In feature transformation, it
is important to identify partitions that lead to simple but useful
new features. Typically, we prefer features with a small number of
values, and those that yield functions G and H of low
complexity.
The simplest measure for finding good feature partitions is the
number of values required for the new feature c: when
decomposing F, a set of candidate partitions is examined and the one
that yields a c with the smallest set of possible values is selected
for decomposition.
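This selection can be sketched as follows, under a simplifying assumption: the table is complete (no don't-cares), so compatibility reduces to column equality and the minimal number of values for c is simply the number of distinct partition-matrix columns. The worked example's F is again reconstructed from y = MAX(x1, MIN(x2, x3)):

```python
from itertools import combinations, product

RANK = {"lo": 0, "med": 1, "hi": 2}
DOMAIN = {"x1": ["lo", "med", "hi"], "x2": ["lo", "med", "hi"], "x3": ["lo", "hi"]}
FEATURES = list(DOMAIN)

# Complete table for the worked example, y = MAX(x1, MIN(x2, x3)).
F = {(a, b, c): max(a, min(b, c, key=RANK.get), key=RANK.get)
     for a, b, c in product(*DOMAIN.values())}

def column(bound, bound_vals):
    """One partition-matrix column: F's behaviour over the free features."""
    free = [f for f in FEATURES if f not in bound]
    col = []
    for fv in product(*(DOMAIN[f] for f in free)):
        full = dict(zip(bound, bound_vals))
        full.update(zip(free, fv))
        col.append(F[tuple(full[f] for f in FEATURES)])
    return tuple(col)

def n_values_of_c(bound):
    """Values needed for c = number of distinct columns (complete table)."""
    return len({column(bound, bv)
                for bv in product(*(DOMAIN[f] for f in bound))})

scores = {b: n_values_of_c(b) for b in combinations(FEATURES, 2)}
best = min(scores, key=scores.get)
print(scores)
print(best)    # the bound set {x2, x3} yields the fewest values for c
```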
An alternative measure for the selection of partitions can be based on
the complexity of functions. Let I(F) denote the number of bits
needed to encode a function F. Then, the best
partition is the one that minimizes the overall complexity of newly
discovered functions H and G, i.e., the partition with minimal
I(H)+I(G). The complexity of G and H should be lower than that
of F, and decomposition takes place only if I(H)+I(G) < I(F) for
the best partition.
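As an illustration only (this encoding is an assumption, not necessarily the measure HINT uses), I can be taken as the bits needed to list a table's output column, i.e. I = rows · log2(output values). On the worked example this criterion favors decomposition:

```python
from math import log2

def I(n_rows, n_out_values):
    """Assumed complexity measure: bits to list the output column of a
    tabulated function with n_rows entries over n_out_values outputs."""
    return n_rows * log2(n_out_values)

# Worked example: F has 3*3*2 = 18 rows; H(x2,x3) has 3*2 = 6 rows;
# G(x1,c) has 3*3 = 9 rows; y and c each take 3 values.
i_f, i_h, i_g = I(18, 3), I(6, 3), I(9, 3)
print(i_h + i_g, "<", i_f, "->", i_h + i_g < i_f)
```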
The number of possible partitions increases exponentially with the
number of input features. To limit the complexity of search, the
decomposition should examine only a subset of partitions. In HINT,
only partitions with at most b features in the bound set are
examined, where b is a user-defined parameter usually set to 2
or 3.
The partition matrix can be very large. However, it is never
explicitly represented in the program as HINT derives the
incompatibility graph directly from the
data. The partition matrix is thus a formal construct used to
explain and analyze the decomposition algorithm. Coloring the
incompatibility graph is another potential bottleneck, since optimal
graph coloring is in general intractable; HINT therefore uses an
efficient heuristic algorithm for graph coloring.
Discovering Feature Hierarchies
With all the above ingredients, decomposition enables the
discovery of feature hierarchies. A given data set is first
pre-processed to remove redundant features and feature values. Next,
single-step decomposition is used to decompose the pre-processed
tabulated function F into G and H. This process is recursively
repeated on G and H until they cannot be decomposed further,
i.e., until any further decomposition would increase the overall
complexity of the resulting functions.
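Under the same simplifying assumptions as before (a complete table, bound sets of size 2, and decomposition accepted only when c needs fewer values than the raw bound-set combinations), the whole recursive scheme can be sketched as:

```python
from itertools import combinations, product

def decompose(F, domains):
    """Recursively decompose a complete tabulated function.

    F maps tuples of feature values (in the key order of `domains`) to y;
    domains maps feature name -> list of values.  Returns (G, g_domains,
    steps), where each step is (bound_pair, new_feature_name, H)."""
    features = list(domains)
    if len(features) < 3:
        return F, domains, []

    def columns_for(bound):
        free = [f for f in features if f not in bound]
        cols = {}
        for bv in product(*(domains[f] for f in bound)):
            col = []
            for fv in product(*(domains[f] for f in free)):
                full = dict(zip(bound, bv))
                full.update(zip(free, fv))
                col.append(F[tuple(full[f] for f in features)])
            cols[bv] = tuple(col)
        return cols

    # Examine bound sets of size 2; prefer the fewest values for c.
    best = min(combinations(features, 2),
               key=lambda b: len(set(columns_for(b).values())))
    cols = columns_for(best)
    labels = {col: i for i, col in enumerate(sorted(set(cols.values())))}
    if len(labels) >= len(cols):      # c no simpler than the raw pair: stop
        return F, domains, []

    c_name = "c_" + "_".join(best)
    H = {bv: labels[col] for bv, col in cols.items()}
    free = [f for f in features if f not in best]
    G = {}
    for point, y in F.items():
        assign = dict(zip(features, point))
        c_val = H[tuple(assign[f] for f in best)]
        G[tuple(assign[f] for f in free) + (c_val,)] = y
    g_domains = {f: domains[f] for f in free}
    g_domains[c_name] = sorted(set(H.values()))
    G, g_domains, steps = decompose(G, g_domains)
    return G, g_domains, [(best, c_name, H)] + steps

# Driver on the worked example, y = MAX(x1, MIN(x2, x3)).
RANK = {"lo": 0, "med": 1, "hi": 2}
domains = {"x1": ["lo", "med", "hi"], "x2": ["lo", "med", "hi"], "x3": ["lo", "hi"]}
F = {p: max(p[0], min(p[1], p[2], key=RANK.get), key=RANK.get)
     for p in product(*domains.values())}

G, g_domains, steps = decompose(F, domains)
print([(bound, name) for bound, name, _ in steps])   # one step on (x2, x3)
```

The result is the hierarchy from the example: a single intermediate feature over {x2, x3} with three values, with the remaining function G relating x1 and the new feature to y.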