Representing structural patterns




Representing structural patterns:


  • Plain classification rules

  • Decision trees

  • Rules with exceptions

  • Relational solutions

  • Trees for numeric prediction

  • Instance-based representation


Reading Material:

  • Chapter 3 of the textbook by Witten



Ways of representing patterns: decision trees, rules, instance-based, …

  • "Knowledge" representation

— Representation determines the inference method

— Understanding the output is the key to understanding the underlying learning methods

— Different types of output for different learning problems (e.g. classification, regression, …)

For classification, a natural choice is to use the same form as the input.




Outlook     Humidity   Play
Sunny       High       No
Sunny       Normal     Yes
Overcast    High       Yes
Overcast    Normal     Yes
Rainy       High       No
Rainy       Normal     No
How to choose the right attributes?



Decision trees

— "Divide-and-conquer" approach
— Nodes involve comparing an attribute's value with a constant (other kinds of node test are possible)
— Leaves assign a classification, a set of classifications, or a probability distribution
— An unknown instance is routed down the tree until it reaches a leaf
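As a small illustration, the sketch below shows how an instance is routed down a tree of attribute-value tests until a leaf assigns a class. The tree, encoding, and attribute names are assumptions, hand-built to match the Outlook/Humidity fragment above rather than learned from data.

# Minimal sketch: routing an instance down a decision tree of
# attribute-value tests. Illustrative hand-built tree, not a learned one.

tree = ("Outlook", {                        # internal node: test Outlook
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",                      # leaf: class assigned directly
    "Rainy":    "No"})

def classify(node, instance):
    # Follow branches until a leaf (a plain string) is reached.
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[instance[attribute]]
    return node

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))   # -> Yes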



Nominal attribute
  • Number of children usually equals the number of values; the attribute is usually tested at most once on a path, unless its values are divided into subsets

Numeric attribute: test whether the value is greater or less than a constant
  • May be tested several times on a path
  • Three-way (or multi-way) splits are also possible:
    Integer: less than, equal to, greater than
    Real: below, within, above

Missing values:

1. Assign "missing" a separate value

2. Assign the instance to the most popular branch

3. Split the instance into pieces that receive weight according to the fraction of training instances that go down each branch

Classifications from the leaf nodes are then combined using the weights that have percolated down to them (this only works for some schemes)
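A rough sketch of option 3 follows. The tree encoding, branch fractions, and attribute names are illustrative assumptions, not the textbook's implementation: when the tested attribute is missing, the instance is split across all branches in proportion to the training instances that went down each, and the class weights that reach the leaves are summed.

# Sketch: handling a missing attribute value by splitting the instance
# into weighted pieces (option 3 above). Tree and fractions are assumptions.

from collections import defaultdict

# Each internal node: (attribute, {value: (fraction_of_training_data, subtree)})
tree = ("Outlook", {
    "Sunny":    (5/14, ("Humidity", {"High": (3/5, "No"), "Normal": (2/5, "Yes")})),
    "Overcast": (4/14, "Yes"),
    "Rainy":    (5/14, "No")})

def classify_weighted(node, instance, weight=1.0, totals=None):
    totals = defaultdict(float) if totals is None else totals
    if isinstance(node, str):                  # leaf: accumulate the percolated weight
        totals[node] += weight
        return totals
    attribute, branches = node
    value = instance.get(attribute)
    if value is None:                          # missing value: split over all branches
        for fraction, subtree in branches.values():
            classify_weighted(subtree, instance, weight * fraction, totals)
    else:
        classify_weighted(branches[value][1], instance, weight, totals)
    return totals

print(dict(classify_weighted(tree, {"Humidity": "Normal"})))   # Outlook is missing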

Classification rules

— Pre-condition (antecedent): a series of tests, usually logically ANDed together
— Consequent: a class, set of classes, or probability distribution assigned by the rule
— Individual rules are often logically ORed together

Conflicts arise if rules with different conclusions apply to the same instance

The weather problem

Outlook     Temperature   Humidity   Windy   Play
Sunny       Hot           High       False   No
Sunny       Hot           High       True    No
Overcast    Hot           High       False   Yes
Rainy       Mild          High       False   Yes
Rainy       Cool          Normal     False   Yes
Rainy       Cool          Normal     True    No
Overcast    Cool          Normal     True    Yes
Sunny       Mild          High       False   No
Sunny       Cool          Normal     False   Yes
Rainy       Mild          Normal     False   Yes
Sunny       Mild          Normal     True    Yes
Overcast    Mild          High       True    Yes
Overcast    Hot           Normal     False   Yes
Rainy       Mild          High       True    No


A decision list is a set of rules interpreted in sequence. For instance:

If outlook=sunny and humidity=high then play=no

If outlook=rainy and windy=true then play=no

If outlook=overcast then play=yes

If humidity=normal then play=yes

If none of the above then play=yes

Interpreted as a decision list, these rules correctly classify all the instances above.

Note: applying the fourth rule out of sequence, in isolation, leads to a wrong conclusion for some instances!
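A minimal sketch of interpreting the list above in sequence (conditions written as attribute-value pairs; the encoding is an assumption, not the textbook's): the first rule whose conditions all hold fires, which is exactly why the fourth rule, taken on its own, can misclassify an instance that an earlier rule would have caught.

# Sketch: a decision list tried in order; the first matching rule decides.

rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": "true"},    "no"),
    ({"outlook": "overcast"},                  "yes"),
    ({"humidity": "normal"},                   "yes"),
    ({},                                       "yes"),   # default: "none of the above"
]

def classify(instance):
    for conditions, cls in rules:
        if all(instance.get(a) == v for a, v in conditions.items()):
            return cls

instance = {"outlook": "rainy", "temperature": "cool",
            "humidity": "normal", "windy": "true"}        # actual class: no
print(classify(instance))                                 # -> no (rule 2 fires first)
print("yes" if instance["humidity"] == "normal" else "?") # rule 4 alone -> yes (wrong)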

An unordered set of rules may overlap, and different rules may lead to different conclusions for the same instance.

What to do if a conflict arises?

  • Go with the most popular rule.

What if no rule applies to a test instance?

  • Go with the most frequent class in the training set.



Special case: boolean class

— Assumption: if an instance does not belong to class "yes", it belongs to class "no"
— Trick: only learn rules for class "yes" and use a default rule for "no"
— Order of rules is not important, and no conflicts can occur!

Rules in disjunctive normal form:

If x = 1 and y = 1 then class = a

If z = 1 and w = 1 then class = a

The rule set reads as a single logical expression; no execution order needs to be specified.
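A tiny sketch of that reading, using the variable names from the rules above: the two rules become one boolean expression in disjunctive normal form, evaluated order-independently.

# The two rules above as one boolean expression in disjunctive normal form.

def is_class_a(x, y, z, w):
    return (x == 1 and y == 1) or (z == 1 and w == 1)

print(is_class_a(1, 1, 0, 0))   # True  -> class a
print(is_class_a(0, 0, 0, 1))   # False -> default class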


From trees to rules

— Easy: convert a tree into a set of rules by generating one rule per leaf, ANDing the conditions on the path from the root to that leaf
  The consequent is the class assigned by the leaf
— Produces rules that are unambiguous (order does not matter)
— But the resulting rules are unnecessarily complex

Pruning can remove redundant tests and rules
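A sketch of the conversion, assuming the same nested-tuple tree encoding used earlier (the tree itself is an illustrative reconstruction of the weather tree, not taken from the text): one rule is emitted per leaf by collecting the tests along the path from the root.

# Sketch: enumerate root-to-leaf paths, producing one rule per leaf.

tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rainy":    ("Windy", {"True": "No", "False": "Yes"})})

def tree_to_rules(node, path=()):
    if isinstance(node, str):                          # leaf: emit one rule
        yield list(path), node
        return
    attribute, branches = node
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, path + ((attribute, value),))

for conditions, cls in tree_to_rules(tree):
    tests = " and ".join(f"{a}={v}" for a, v in conditions)
    print(f"If {tests} then play={cls}")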

From rules to trees

Difficult: transforming a rule set into a tree

  • A tree cannot easily express the disjunction between rules

Example: rules that test different attributes

— Symmetry needs to be broken
— The corresponding tree contains identical (replicated) subtrees. Example:

  • If a and b then x

  • If c and d then x


Decision trees vs. classification rules

Decision tree                          Classification rules
Hard to add structure                  Easy to add extra rules
Hard to describe disjunctive rules     Easy to express disjunctive rules
No conflicting rules                   Conflicting rules might appear
Classification follows a single path   Order of interpretation is important

Association rules

— Association rules can predict any attribute, or combinations of attributes
— Not intended to be used together as a set, because of the huge number of possible rules
— Output needs to be restricted to show only the most predictive associations

Issue: how to find those rules?

Support: the number of instances the rule predicts correctly

Confidence: the number of correct predictions, as a proportion of all instances the rule applies to
(also called coverage and accuracy)

— Example: If temperature = cool then humidity = normal
  There are 4 cool days, all with normal humidity, so support = 4 and confidence = 100%

Minimum support and confidence are pre-specified (e.g. 58 rules with support ≥ 2 and confidence ≥ 95% for the weather data)
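A quick sketch of counting support and confidence for that rule directly from the weather table above; the tuple encoding of the data is the only assumption.

# Sketch: support and confidence of "If temperature=cool then humidity=normal"
# counted over the 14 weather instances above.

weather = [
    # (outlook, temperature, humidity, windy, play)
    ("sunny", "hot", "high", False, "no"),      ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),  ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),  ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),  ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),   ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),("rainy", "mild", "high", True, "no"),
]

applies = [row for row in weather if row[1] == "cool"]      # antecedent holds
correct = [row for row in applies if row[2] == "normal"]    # consequent also holds

support    = len(correct)                    # instances predicted correctly
confidence = len(correct) / len(applies)     # proportion of instances it applies to
print(support, confidence)                   # -> 4 1.0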



Interpreting association rules

The double-consequent rule

If windy=false and play=no then outlook=sunny and humidity=high

is not just shorthand for the two separate rules

If windy=false and play=no then outlook=sunny

If windy=false and play=no then humidity=high

It implies both of them, and it also implies

If humidity=high and windy=false and play=no then outlook=sunny

Rules with exceptions


— Idea: allow rules to have exceptions

Example: rule for iris data



If petal-length in [2.45, 4.45) then Iris-versicolor

New instance:




Sepal length   Sepal width   Petal length   Petal width   Type
5.1            3.5           2.6            0.2           Iris-setosa

— Modified rule:



If petal-length in [2.45,4.45) then Iris-versicolor

EXCEPT if petal-width < 1.0 then Iris-setosa

Exceptions to exceptions can be added…
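A sketch of how the rule and its exception might be evaluated (the nesting scheme is an assumption; only the iris rule above is encoded): the exception is tested only in the context of the rule it modifies, and the new instance from the table is used as input.

# Sketch: evaluating the iris rule together with its EXCEPT clause.

def classify(petal_length, petal_width):
    if 2.45 <= petal_length < 4.45:           # main rule
        if petal_width < 1.0:                 # EXCEPT clause, checked only here
            return "Iris-setosa"
        return "Iris-versicolor"
    return None                               # no rule applies

print(classify(petal_length=2.6, petal_width=0.2))   # -> Iris-setosa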


Advantages of using exceptions

— Rules can be updated incrementally
    Easy to incorporate new data
    Easy to incorporate domain knowledge

  • People often think in terms of exceptions (and like to be treated as exceptions)
  • Each conclusion can be considered just in the context of the rules and exceptions that lead to it

This locality property is important for understanding large rule sets


Rules involving relations

— Rules that compare an attribute's value with a constant are called "propositional", because they have the same expressive power as propositional logic
— What if the problem involves relationships between attributes, or between examples (e.g. a family-tree problem)?



A propositional solution

Width   Height   Sides   Class
2       4        4       Standing
3       6        4       Standing
4       3        4       Lying
7       8        3       Standing
7       6        3       Lying
2       9        4       Standing
9       1        3       Lying
10      2        3       Lying

If width ≥3.5 and height < 7.0 then lying

If height 3.5 then standing

A relational solution:

If width > height then lying

If height > width then standing

— Compares attributes with each other
— Generalizes better to new data
— Standard relations: =, <, >
— But: learning relational rules is costly
— Simple workaround: add extra derived attributes (e.g. a binary attribute: is width < height?), as sketched below
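A sketch of that workaround, using the blocks data from the table above: a derived boolean attribute turns the relational rule into an ordinary propositional test.

# Sketch: a derived attribute lets a propositional rule express the
# relational rule "if width > height then lying".

blocks = [(2, 4, "standing"), (3, 6, "standing"), (4, 3, "lying"),
          (7, 8, "standing"), (7, 6, "lying"),    (2, 9, "standing"),
          (9, 1, "lying"),    (10, 2, "lying")]

for width, height, cls in blocks:
    is_taller = height > width                        # derived binary attribute
    predicted = "standing" if is_taller else "lying"  # propositional test on it
    print(width, height, is_taller, predicted, cls)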



Trees for numeric prediction:

A combination of decision trees and regression.

Regression: the process of computing an expression that predicts a numeric quantity

Regression tree: a "decision tree" in which each leaf predicts a numeric quantity

  • The predicted value is the average value of the training instances that reach the leaf

Model tree: a "regression tree" with linear regression models at the leaf nodes

  • The linear models approximate a nonlinear continuous function piecewise

Example: predicting the performance of a CPU.
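A small sketch of the difference between the two kinds of leaf, using made-up numbers rather than the CPU data: a regression-tree leaf returns the mean of the training targets that reached it, while a model-tree leaf fits and applies a linear model.

# Sketch: regression-tree leaf (mean) vs. model-tree leaf (linear model).
# The training values at this leaf are made-up illustrative numbers.

leaf_data = [(1.0, 10.0), (2.0, 14.0), (3.0, 18.0), (4.0, 22.0)]   # (x, target y)

def regression_leaf(x):
    ys = [y for _, y in leaf_data]
    return sum(ys) / len(ys)                   # predict the average target

def model_leaf(x):
    xs = [p for p, _ in leaf_data]
    ys = [y for _, y in leaf_data]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in leaf_data) \
            / sum((a - mean_x) ** 2 for a in xs)
    intercept = mean_y - slope * mean_x
    return intercept + slope * x               # predict with a fitted linear model

print(regression_leaf(3.5), model_leaf(3.5))   # -> 16.0 20.0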

Instance-based representation

1. Training instances are searched for the instance that most closely resembles the new instance

2. The instances themselves represent the knowledge

3. Also called instance-based learning

— The similarity function defines what is "learned": "lazy" learning

— Methods: nearest-neighbor, k-nearest-neighbor, …

The distance function

— One numeric attribute: distance is the difference between the two attribute values

— Several numeric attributes: Euclidean distance, with the attributes normalized

— Nominal attributes: distance is 1 if the values are different, 0 if they are equal

Issue: are all attributes equally important?

  • Weight the attributes if necessary (a distance-function sketch follows)
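A sketch of the mixed distance function just described (normalized numeric differences plus 0/1 nominal differences) and a simple nearest-neighbor lookup; the attribute names, data, and normalization range are illustrative assumptions.

# Sketch: distance over mixed numeric/nominal attributes and 1-nearest-neighbor.

import math

# (temperature in deg C, humidity as nominal value, class)
training = [(30, "high", "no"), (21, "normal", "yes"),
            (18, "normal", "yes"), (28, "high", "no")]
TEMP_MIN, TEMP_MAX = 18, 30                          # used to normalize to [0, 1]

def distance(a, b):
    d_temp = (a[0] - b[0]) / (TEMP_MAX - TEMP_MIN)   # normalized numeric difference
    d_hum  = 0 if a[1] == b[1] else 1                # nominal: 0 if equal, 1 otherwise
    return math.sqrt(d_temp ** 2 + d_hum ** 2)       # Euclidean combination

def nearest_neighbor(instance):
    return min(training, key=lambda row: distance(row, instance))[2]

print(nearest_neighbor((20, "normal")))              # -> yes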

