
Car Dataset

car.data (example set with 1728 instances, C4.5 format)
car.c45-names (C4.5 names file)

Creator: Marko Bohanec
Donors to UCI ML Repository: Marko Bohanec, Blaz Zupan
Date: June, 1997

Past Usage

The hierarchical decision model, from which this dataset is derived, was first presented in

M. Bohanec and V. Rajkovic: Knowledge acquisition and explanation for multi-attribute decision making. In 8th Intl Workshop on Expert Systems and their Applications, Avignon, France. pages 59-78, 1988.

Within machine learning, this dataset was used for the evaluation of HINT (Hierarchy INduction Tool). The results are presented in

B. Zupan, M. Bohanec, I. Bratko, J. Demsar (1997) Machine learning by function decomposition. In (D. Fisher, ed.) Proc. ICML-97, pages 421-429. Morgan-Kaufmann.

and show that HINT is able to completely reconstruct the original hierarchical model. The paper further compares the generalization capability of HINT and C4.5 using learning curves for both systems, where p is the percentage of examples used for learning and the y axis shows the classification accuracy when all remaining examples are classified.

Relevant Information

The Car Evaluation Database was derived from a simple hierarchical decision model originally developed for the demonstration of DEX (M. Bohanec, V. Rajkovic: Expert system for decision making. Sistemica 1(1), pp. 145-157, 1990). The model evaluates cars according to a hierarchical attribute structure.

The features used in the structure are:

   CAR                      car acceptability
   . PRICE                  overall price
   . . buying               buying price
   . . maint                price of the maintenance
   . TECH                   technical characteristics
   . . COMFORT              comfort
   . . . doors              number of doors
   . . . persons            capacity in terms of persons to carry
   . . . lug_boot           the size of luggage boot
   . . safety               estimated safety of the car

and can use the following sets of values:

   CAR              unacc, acc, good, v-good
   . PRICE          v-high, high, med, low
   . . BUYING       v-high, high, med, low
   . . MAINT        v-high, high, med, low
   . TECH           poor, satisf, good, v-good
   . . COMFORT      bad, acc, good, v-good
   . . . DOORS      2, 3, 4, 5-more
   . . . PERSONS    2, 4, more
   . . . LUG_BOOT   small, med, big
   . . SAFETY       low, med, high

The model includes three intermediate concepts (PRICE, TECH, COMFORT). In the original model, every higher-level feature is related to its lower-level descendants by a set of examples; the example sets that define the target concept and each intermediate concept are listed under "Datasets from the structured model".

The Car Evaluation Database contains examples with the structural information removed, i.e., it directly relates CAR to the six input attributes buying, maint, doors, persons, lug_boot, safety. Because the underlying concept structure is known, this database may be particularly useful for testing constructive induction and structure discovery methods.

Statistics

Number of Instances: 1728 (instances completely cover the attribute space)
Number of Attributes: 6
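
The claim that the instances completely cover the attribute space can be checked by counting: the six attributes have 4, 4, 4, 3, 3 and 3 values, and 4 x 4 x 4 x 3 x 3 x 3 = 1728. The following is a minimal Python sketch of that count. The name DOMAINS is only illustrative, and the value spellings follow this page, which may differ from the spellings used in car.data.

```python
from itertools import product

# Attribute domains as listed on this page (spellings in car.data may differ).
DOMAINS = {
    "buying": ["v-high", "high", "med", "low"],
    "maint": ["v-high", "high", "med", "low"],
    "doors": ["2", "3", "4", "5-more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# Each combination of attribute values is one point of the attribute space.
attribute_space = list(product(*DOMAINS.values()))

print(len(DOMAINS), len(attribute_space))  # 6 1728
```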

Class distribution:

Class    N      N[%]
------------------------
unacc    1210   70.023%
acc       384   22.222%
good       69    3.993%
v-good     65    3.762%
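
As a quick arithmetic check, the percentages follow directly from the class counts and the total of 1728 instances. The sketch below recomputes them; the variable names are illustrative only.

```python
# Class counts as given in the class distribution table.
counts = {"unacc": 1210, "acc": 384, "good": 69, "v-good": 65}

total = sum(counts.values())

# Share of each class in percent, rounded to three decimals as in the table.
percentages = {label: round(100 * n / total, 3) for label, n in counts.items()}

for label, share in percentages.items():
    print(f"{label:8s}{counts[label]:6d}{share:9.3f}%")
```

The largest class, unacc, sets the baseline: a classifier that always predicts unacc is correct on 70.023% of the instances.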


Datasets from the structured model

Examples for car:

PRICE    TECH     CAR
------------------------
v-high   poor     unacc
high     poor     unacc
med      poor     unacc
low      poor     unacc
v-high   satisf   unacc
high     satisf   unacc
med      satisf   acc
low      satisf   acc
v-high   good     unacc
high     good     acc
med      good     acc
low      good     good
v-high   v-good   unacc
high     v-good   acc
med      v-good   v-good
low      v-good   v-good

Examples for comfort:

doors   persons  lug_boot  COMFORT
----------------------------------
2       2        small     bad
3       2        small     bad
4       2        small     bad
5-more  2        small     bad
2       4        small     acc
3       4        small     acc
4       4        small     acc
5-more  4        small     acc
2       more     small     bad
3       more     small     acc
4       more     small     acc
5-more  more     small     acc
2       2        med       bad
3       2        med       bad
4       2        med       bad
5-more  2        med       bad
2       4        med       acc
3       4        med       acc
4       4        med       good
5-more  4        med       v-good
2       more     med       acc
3       more     med       good
4       more     med       v-good
5-more  more     med       v-good
2       2        big       bad
3       2        big       bad
4       2        big       bad
5-more  2        big       bad
2       4        big       good
3       4        big       good
4       4        big       v-good
5-more  4        big       v-good
2       more     big       good
3       more     big       v-good
4       more     big       v-good
5-more  more     big       v-good

Examples for price:

buying  maint   PRICE
----------------------
v-high  v-high  v-high
high    v-high  v-high
med     v-high  high
low     v-high  high
v-high  high    v-high
high    high    high
med     high    high
low     high    med
v-high  med     high
high    med     high
med     med     med
low     med     low
v-high  low     high
high    low     high
med     low     low
low     low     low

Examples for tech:

COMFORT  safety   TECH
------------------------
bad      low      poor
acc      low      poor
good     low      poor
v-good   low      poor
bad      med      poor
acc      med      satisf
good     med      good
v-good   med      good
bad      high     poor
acc      high     good
good     high     v-good
v-good   high     v-good
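
The four example sets above can be chained to rebuild the flat dataset: PRICE is looked up from buying and maint, COMFORT from doors, persons and lug_boot, TECH from COMFORT and safety, and finally CAR from PRICE and TECH. The following is a minimal Python sketch of that composition, not the original DEX or HINT implementation. The helper table and the function car are hypothetical names introduced here, and the output strings are transcribed from the tables above in their printed row order.

```python
from collections import Counter
from itertools import product

LEVELS = ["v-high", "high", "med", "low"]        # buying, maint, PRICE
DOORS = ["2", "3", "4", "5-more"]
PERSONS = ["2", "4", "more"]
LUG_BOOT = ["small", "med", "big"]
SAFETY = ["low", "med", "high"]
COMFORT_VALUES = ["bad", "acc", "good", "v-good"]
TECH_VALUES = ["poor", "satisf", "good", "v-good"]


def table(inputs, outputs):
    """Map input tuples to outputs; the first input varies fastest,
    matching the row order of the printed example sets."""
    keys = [tuple(reversed(k)) for k in product(*reversed(inputs))]
    values = outputs.split()
    assert len(keys) == len(values)
    return dict(zip(keys, values))


PRICE = table([LEVELS, LEVELS], """
    v-high v-high high high   v-high high high med
    high high med low         high high low low""")

COMFORT = table([DOORS, PERSONS, LUG_BOOT], """
    bad bad bad bad   acc acc acc acc         bad acc acc acc
    bad bad bad bad   acc acc good v-good     acc good v-good v-good
    bad bad bad bad   good good v-good v-good good v-good v-good v-good""")

TECH = table([COMFORT_VALUES, SAFETY], """
    poor poor poor poor   poor satisf good good   poor good v-good v-good""")

CAR = table([LEVELS, TECH_VALUES], """
    unacc unacc unacc unacc   unacc unacc acc acc
    unacc acc acc good        unacc acc v-good v-good""")


def car(buying, maint, doors, persons, lug_boot, safety):
    """Evaluate one car by walking the hierarchy bottom-up."""
    price = PRICE[buying, maint]
    comfort = COMFORT[doors, persons, lug_boot]
    tech = TECH[comfort, safety]
    return CAR[price, tech]


# Classify every point of the attribute space and count the classes.
distribution = Counter(
    car(*x) for x in product(LEVELS, LEVELS, DOORS, PERSONS, LUG_BOOT, SAFETY)
)
print(distribution)
```

Enumerating all 1728 attribute combinations this way yields 1210 unacc, 384 acc, 69 good and 65 v-good, which agrees with the class distribution given under Statistics.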