Representing structural patterns:
— Plain classification rules
— Decision trees
— Rules with exceptions
— Relational solutions
— Trees for numeric prediction
— Instance-based representation

Reading material: Chapter 3 of the textbook by Witten
Ways of representing patterns: decision trees, rules, instance-based, …

“Knowledge” representation:
— Representation determines the inference method
— Understanding the output is the key to understanding the underlying learning methods
— Different types of output suit different learning problems (e.g. classification, regression, …)
For classification, a natural choice is to use the same form as the input.
Outlook   Humidity  Play
Sunny     High      No
Sunny     Normal    Yes
Overcast  High      Yes
Overcast  Normal    Yes
Rainy     High      No
Rainy     Normal    No
How to choose the right attributes?
Decision trees
— “Divide and conquer” approach
— Nodes usually compare an attribute’s value with a constant
— Other possibilities: compare the values of two attributes, or test a function of one or more attributes
— Leaves assign a classification, a set of classifications, or a probability distribution
— An unknown instance is routed down the tree
Nominal attribute:
— Number of children usually equals the number of values, so the attribute is tested at most once, unless its values are divided into two subsets
Numeric attribute:
— Test whether the value is greater or less than a constant; the attribute may be tested several times on one path
— Three-way (or multi-way) splits are also possible:
  Integer: less than, equal to, greater than
  Real: below, within, above
Missing values:
1. Assign “missing” a separate attribute value
2. Assign the instance to the most popular branch
3. Split the instance into pieces that receive weight according to the fraction of training instances that go down each branch; the classifications from the leaf nodes are combined using the weights that have percolated to them (only works for some schemes)
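As an illustration, here is a minimal Python sketch of routing an instance down a tree of nominal tests, including the weighted-split treatment of missing values (option 3 above); the Node class, branch weights, and attribute names are hypothetical, not from any particular library:

    from collections import Counter

    class Node:
        def __init__(self, attribute=None, children=None, label=None, branch_weights=None):
            self.attribute = attribute        # attribute tested at this node
            self.children = children or {}    # attribute value -> child Node
            self.label = label                # class label if this is a leaf
            self.branch_weights = branch_weights or {}  # fraction of training instances per branch

    def classify(node, instance, weight=1.0):
        # Return a Counter mapping class -> weight for one instance.
        if node.label is not None:            # leaf: all weight goes to its class
            return Counter({node.label: weight})
        value = instance.get(node.attribute)
        if value is None:                     # missing value: split the instance into pieces
            counts = Counter()
            for v, child in node.children.items():
                counts += classify(child, instance, weight * node.branch_weights[v])
            return counts
        return classify(node.children[value], instance, weight)

    # A tiny tree for the Outlook/Humidity table above.
    humidity = Node(attribute="Humidity",
                    children={"High": Node(label="No"), "Normal": Node(label="Yes")},
                    branch_weights={"High": 0.5, "Normal": 0.5})
    root = Node(attribute="Outlook",
                children={"Sunny": humidity,
                          "Overcast": Node(label="Yes"),
                          "Rainy": Node(label="No")},
                branch_weights={"Sunny": 1/3, "Overcast": 1/3, "Rainy": 1/3})

    print(classify(root, {"Outlook": "Sunny", "Humidity": "Normal"}))  # all weight on Yes
    print(classify(root, {}))  # Outlook missing: a weighted vote across all branches

(The branch weights here are made-up fractions; a real learner would estimate them from the training data.)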
Classification rules
— Precondition: a series of tests, usually logically ANDed together
— Consequent: the class, set of classes, or probability distribution assigned by the rule
— Individual rules are often logically ORed together
— Conflicts arise if different conclusions apply to the same instance
The weather problem
Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

A decision list is a set of rules to be interpreted in sequence. For instance:
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
Interpreted as a decision list, the rules correctly classify all the instances.
Note: applying the fourth rule out of sequence leads to a wrong conclusion, e.g. for the instance (rainy, cool, normal, true), whose class is no.
An unordered set of rules may overlap and lead to different conclusions for the same instance.
What should we do if a conflict arises?
— Go with the most popular rule.
What should we do if no rule applies to a test instance?
— Go with the most frequent class in the training set.
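A minimal Python sketch of executing the decision list above (the rule representation is illustrative): rules are tried in order, so the first match wins and no conflict can arise, and a default class covers instances that no rule matches:

    # Each rule: (list of (attribute, value) tests, predicted class).
    rules = [
        ([("outlook", "sunny"), ("humidity", "high")], "no"),
        ([("outlook", "rainy"), ("windy", "true")], "no"),
        ([("outlook", "overcast")], "yes"),
        ([("humidity", "normal")], "yes"),
    ]
    DEFAULT = "yes"  # "if none of the above then play = yes"

    def predict(instance):
        for tests, cls in rules:  # first matching rule wins: order resolves conflicts
            if all(instance.get(a) == v for a, v in tests):
                return cls
        return DEFAULT            # no rule applies: fall back to the default class

    print(predict({"outlook": "sunny", "humidity": "high", "windy": "false"}))  # no
    # The instance for which the fourth rule alone would be wrong:
    print(predict({"outlook": "rainy", "humidity": "normal", "windy": "true"}))  # no (rule 2 fires first)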
Special case: boolean class
— Assumption: if an instance does not belong to class “yes”, it belongs to class “no”
— Trick: only learn rules for class “yes” and use a default rule for “no”
— The order of the rules is not important. No conflicts!
Rules in disjunctive normal form:
If x = 1 and y = 1 then class = a
If z = 1 and w = 1 then class = a
The rule set is simply a disjunction of conjunctions, so the order of execution carries no information.
From trees to rules
— Easy: convert a tree into a set of rules by collecting the conditions at every node on the path from the root to each leaf
  The consequent is the class assigned by the leaf
— Produces rules that are unambiguous
— But the resulting rules are unnecessarily complex
  Pruning removes the redundant rules
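A Python sketch of the conversion (the nested-dict tree encoding is an assumption, not a standard format): collect the tests along each root-to-leaf path:

    # A tiny tree as nested dicts; a plain string is a leaf label.
    tree = {"attr": "Outlook",
            "children": {"Sunny": {"attr": "Humidity",
                                   "children": {"High": "No", "Normal": "Yes"}},
                         "Overcast": "Yes",
                         "Rainy": "No"}}

    def tree_to_rules(node, path=()):
        # Yield one (conditions, class) rule per root-to-leaf path.
        if isinstance(node, str):             # leaf: emit the accumulated tests
            yield list(path), node
            return
        for value, child in node["children"].items():
            yield from tree_to_rules(child, path + ((node["attr"], value),))

    for conditions, cls in tree_to_rules(tree):
        tests = " and ".join(f"{a} = {v}" for a, v in conditions)
        print(f"If {tests} then play = {cls}")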
From rules to trees
— Difficult: transforming a rule set into a tree
  A tree cannot easily express the disjunction between rules
— Example: rules that test different attributes
  If a and b then x
  If c and d then x
— The symmetry needs to be broken: the corresponding tree contains identical subtrees
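For instance, assuming binary attributes and a default class “not x”, a tree equivalent to these two rules must replicate the subtree that tests c and d (a sketch):

    a = yes:
        b = yes: x
        b = no:
            c = yes:
                d = yes: x
                d = no: not x
            c = no: not x
    a = no:
        c = yes:
            d = yes: x
            d = no: not x
        c = no: not x

The subtree testing c and d appears twice: this is the replicated subtree problem.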
Decision trees vs. classification rules:

Decision trees                       Classification rules
Hard to add structure                Easy to add extra rules
Hard to describe disjunctive rules   Easy for disjunctive rules
No conflicting rules                 Conflicting rules might appear
Interpretation: follow the path      Interpretation: order of rules matters
Association rules
— Association rules can predict any attribute, or combinations of attributes
— Not intended to be used together as a set, due to the huge number of possible rules
— The output needs to be restricted to show only the most predictive associations
Issue: how to find those rules?
Support (also called coverage): the number of instances the rule predicts correctly
Confidence (also called accuracy): the number of correct predictions, as a proportion of all the instances the rule applies to
— Example: If temperature = cool then humidity = normal
  All 4 cool days have normal humidity, so support = 4 and confidence = 100%
Minimum support and confidence are prespecified (e.g. 58 rules with support ≥ 2 and confidence ≥ 95% for the weather data)
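A minimal Python sketch of computing support and confidence for that rule over the 14-instance weather table above (the function and variable names are hypothetical):

    # The 14 weather instances: (outlook, temperature, humidity, windy, play).
    ROWS = [
        ("sunny", "hot", "high", "false", "no"),
        ("sunny", "hot", "high", "true", "no"),
        ("overcast", "hot", "high", "false", "yes"),
        ("rainy", "mild", "high", "false", "yes"),
        ("rainy", "cool", "normal", "false", "yes"),
        ("rainy", "cool", "normal", "true", "no"),
        ("overcast", "cool", "normal", "true", "yes"),
        ("sunny", "mild", "high", "false", "no"),
        ("sunny", "cool", "normal", "false", "yes"),
        ("rainy", "mild", "normal", "false", "yes"),
        ("sunny", "mild", "normal", "true", "yes"),
        ("overcast", "mild", "high", "true", "yes"),
        ("overcast", "hot", "normal", "false", "yes"),
        ("rainy", "mild", "high", "true", "no"),
    ]
    ATTRS = ["outlook", "temperature", "humidity", "windy", "play"]
    weather = [dict(zip(ATTRS, row)) for row in ROWS]

    def support_confidence(data, antecedent, consequent):
        applies = [r for r in data if all(r[a] == v for a, v in antecedent.items())]
        correct = [r for r in applies if all(r[a] == v for a, v in consequent.items())]
        return len(correct), (len(correct) / len(applies) if applies else 0.0)

    # "If temperature = cool then humidity = normal" -> (4, 1.0)
    print(support_confidence(weather, {"temperature": "cool"}, {"humidity": "normal"}))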
Interpreting association rules
The rule
If windy = false and play = no then outlook = sunny and humidity = high
is not the same as the pair of rules
If windy = false and play = no then outlook = sunny
If windy = false and play = no then humidity = high
but it implies both of them, and it also implies
If humidity = high and windy = false and play = no then outlook = sunny
Rules with exceptions
— Idea: allow rules to have exceptions
Example: a rule for the iris data
If petal-length ∈ [2.45, 4.45) then Iris-versicolor
New instance:

Sepal length  Sepal width  Petal length  Petal width  Type
5.1           3.5          2.6           0.2          Iris-setosa

— Modified rule:
If petal-length ∈ [2.45, 4.45) then Iris-versicolor
EXCEPT if petal-width < 1.0 then Iris-setosa
Exceptions to exceptions can be added…
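A minimal Python sketch of evaluating the modified rule (the function is illustrative): the exception is tested inside the scope of the main rule, so it takes precedence:

    def classify_iris(petal_length, petal_width):
        if 2.45 <= petal_length < 4.45:
            if petal_width < 1.0:       # the EXCEPT clause overrides the main rule
                return "Iris-setosa"
            return "Iris-versicolor"    # the main rule's conclusion
        return None                     # the rule does not apply at all

    print(classify_iris(2.6, 0.2))      # the new instance above -> Iris-setosa
    print(classify_iris(4.0, 1.3))      # -> Iris-versicolor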
Advantages of using exceptions
— Rules can be updated incrementally
  Easy to incorporate new data
  Easy to incorporate domain knowledge
— People often think in terms of exceptions (and like to be treated as exceptions)
— Each conclusion can be considered just in the context of the rules and exceptions that lead to it
  This locality property is important for understanding large rule sets
Rules involving relations
— Rules that compare an attribute value to a constant are called “propositional”, as they have the same expressive power as propositional logic
— What if the problem involves relationships between examples (e.g. the family tree problem from above)?
A propositional solution
Width  Height  Sides  Class
2      4       4      Standing
3      6       4      Standing
4      3       4      Lying
7      8       3      Standing
7      6       3      Lying
2      9       4      Standing
9      1       3      Lying
10     2       3      Lying

If width ≥ 3.5 and height < 7.0 then lying
If height ≥ 3.5 then standing
A relational solution:
If width > height then lying
If height > width then standing

Properties of the relational solution:
— Compares attributes with each other
— Generalizes better to new data
— Standard relations: =, <, >
— But: learning relational rules is costly
— Simple solution: add extra attributes (e.g. a binary attribute: is width < height?)
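A Python sketch of that simple solution (attribute names are hypothetical): compute the derived binary attribute once, and the relational rule becomes an ordinary propositional test:

    blocks = [
        {"width": 2, "height": 4, "sides": 4},
        {"width": 4, "height": 3, "sides": 4},
        # ... the remaining rows of the table above
    ]

    for b in blocks:
        # One extra propositional attribute captures the relation.
        b["width_lt_height"] = b["width"] < b["height"]

    # The relational rules now read as plain propositional ones:
    for b in blocks:
        print("standing" if b["width_lt_height"] else "lying")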
Trees for numeric prediction:
A combination of decision trees and regression.
— Regression: the process of computing an expression that predicts a numeric quantity
— Regression tree: a “decision tree” whose leaves each predict a numeric quantity
  The predicted value is the average value of the training instances that reach the leaf
— Model tree: a “regression tree” with linear regression models at the leaf nodes
  The linear models approximate a nonlinear continuous function piecewise
Example: predicting the performance of a CPU.
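A Python sketch contrasting the two kinds of leaf (the class names and coefficients are made up for illustration; MMAX and CACH are attributes of the CPU performance data):

    class RegressionLeaf:
        # A regression-tree leaf: predicts the mean of its training targets.
        def __init__(self, targets):
            self.value = sum(targets) / len(targets)
        def predict(self, instance):
            return self.value

    class ModelLeaf:
        # A model-tree leaf: predicts with a linear model fitted at the leaf.
        def __init__(self, coefficients, intercept):
            self.coefficients = coefficients   # attribute -> weight
            self.intercept = intercept
        def predict(self, instance):
            return self.intercept + sum(w * instance[a]
                                        for a, w in self.coefficients.items())

    leaf = RegressionLeaf([120.0, 135.0, 150.0])
    print(leaf.predict({}))                             # 135.0, the mean
    model = ModelLeaf({"MMAX": 0.004, "CACH": 0.5}, intercept=10.0)
    print(model.predict({"MMAX": 32000, "CACH": 32}))   # 10 + 128 + 16 = 154.0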
Instance-based representation
1. Training instances are searched for the instance that most closely resembles the new instance
2. The instances themselves represent the knowledge
3. Also called instance-based learning
— The similarity function defines what is “learned”: ‘lazy’ learning
— Methods: nearest-neighbor, k-nearest-neighbor, …
The distance function
— One numeric attribute: distance is the difference between the two attribute values
— Several numeric attributes: Euclidean distance, with normalized attributes
— Nominal attributes: distance is set to 1 if the values are different, 0 if they are equal
Issue: are all attributes equally important?
— Weight the attributes if necessary
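A minimal Python sketch of the distance function and a nearest-neighbor lookup as just described (attributes are assumed to be normalized already; the optional weighting is an assumption):

    import math

    def distance(a, b, numeric, weights=None):
        # Euclidean distance over mixed numeric/nominal attributes.
        total = 0.0
        for attr in a:
            w = (weights or {}).get(attr, 1.0)          # optional attribute weight
            if attr in numeric:
                d = a[attr] - b[attr]                   # numeric: the difference
            else:
                d = 0.0 if a[attr] == b[attr] else 1.0  # nominal: 0 or 1
            total += w * d * d
        return math.sqrt(total)

    def nearest_neighbor(query, training, numeric):
        return min(training, key=lambda t: distance(query, t["x"], numeric))

    train = [{"x": {"height": 0.20, "outlook": "sunny"}, "y": "no"},
             {"x": {"height": 0.90, "outlook": "rainy"}, "y": "yes"}]
    print(nearest_neighbor({"height": 0.25, "outlook": "sunny"},
                           train, numeric={"height"})["y"])   # no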
