# Representing structural patterns

Representing structural patterns:

• Plain Classification rules

• Decision Tree

• Rules with exceptions

• Relational solution

• Tree for Numerical Prediction

• Instance-based representation

• Chapter 3 of the textbook by Witten

Ways of representing patterns: decision trees, rules, instance-based, …

• “Knowledge” representation

— The representation determines the inference method

— Understanding the output is the key to understanding the underlying learning methods

— Different types of output for different learning problems (e.g. classification, regression, …)

For classification, a natural choice is to use the same form as the input.

| Outlook | Humidity | Play |
| --- | --- | --- |
| Sunny | High | No |
| Sunny | Normal | Yes |
| Overcast | High | Yes |
| Overcast | Normal | Yes |
| Rainy | High | No |
| Rainy | Normal | No |

How to choose the right attributes?

## Decision trees

— “Divide-and-conquer” approach

— Nodes involve comparing an attribute’s value with a constant

— Other possibilities exist (e.g. comparing the values of two attributes, or using a function of one or more attributes)

— Leaves assign classification, set of classifications, or probability distribution

— Unknown instance is routed down the tree

Nominal attribute

• Number of children usually equals the number of values; a nominal attribute is therefore usually tested only once, unless its values are divided into two subsets

Numeric attribute: test whether value is greater or less than constant

• May be tested several times

Three-way split (or multi-way split)

Integer: less than, equal to, greater than

Real: below, within, above

Missing values:

1. Assign “missing” a separate value

3. Split instance into pieces that receive weight according to fraction of training instances that go down each branch

Classifications from leave nodes are combined using the weights that have percolated to them (only works for some special schemes)
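The third strategy can be sketched as follows. This is a toy illustration, not the textbook's implementation: the branch weights and leaf class distributions are made up.

```python
# Sketch: classifying an instance whose split attribute is missing.
# The instance is divided into pieces weighted by the fraction of training
# instances that went down each branch; the leaf class distributions are
# then combined using those weights.

def classify_with_missing(branches):
    """branches: list of (weight, leaf_class_distribution) pairs."""
    combined = {}
    for weight, dist in branches:
        for cls, prob in dist.items():
            combined[cls] = combined.get(cls, 0.0) + weight * prob
    return max(combined, key=combined.get)

# Suppose 60% of training instances went left (mostly "yes")
# and 40% went right (mostly "no") -- illustrative numbers:
pieces = [
    (0.6, {"yes": 0.9, "no": 0.1}),
    (0.4, {"yes": 0.2, "no": 0.8}),
]
print(classify_with_missing(pieces))  # yes: 0.6*0.9 + 0.4*0.2 = 0.62 vs 0.38
```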

## Classification rules

— Pre-condition (antecedent): a series of tests

— Tests are usually logically ANDed together

— Consequent: a class, set of classes, or probability distribution assigned by the rule

— Individual rules are often logically ORed together

Conflicts arise if different conclusions apply

The weather problem
| Outlook | Temperature | Humidity | Windy | Play |
| --- | --- | --- | --- | --- |
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Overcast | Hot | High | False | Yes |
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Overcast | Cool | Normal | True | Yes |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Rainy | Mild | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |
| Overcast | Mild | High | True | Yes |
| Overcast | Hot | Normal | False | Yes |
| Rainy | Mild | High | True | No |

A Decision List is a set of rules to be interpreted in sequence. For instance

If outlook=sunny and humidity=high then play=no

If outlook=rainy and windy=true then play=no

If outlook=overcast then play=yes

If humidity=normal then play=yes

If none of the above then play=yes

Interpreted as a decision list, the rules correctly classify all the instances

Note: applied out of sequence, the fourth rule leads to a wrong conclusion (e.g. for the instance outlook=rainy, temperature=cool, humidity=normal, windy=true, it predicts play=yes, but the correct class is no)
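The five rules above, evaluated strictly in sequence, can be sketched as (instances are dicts of attribute values):

```python
def play(instance):
    """Decision list: the first rule whose tests succeed fires."""
    if instance.get("outlook") == "sunny" and instance.get("humidity") == "high":
        return "no"
    if instance.get("outlook") == "rainy" and instance.get("windy"):
        return "no"
    if instance.get("outlook") == "overcast":
        return "yes"
    if instance.get("humidity") == "normal":
        return "yes"
    return "yes"  # default rule: "if none of the above"

# Rainy, cool, normal humidity, windy: rule 2 fires before rule 4 can,
# so the list correctly answers "no" even though rule 4 alone says "yes".
print(play({"outlook": "rainy", "temperature": "cool",
            "humidity": "normal", "windy": True}))  # no
```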

Unordered set of rules may overlap and lead to different conclusions

for the same instance

What should we do if a conflict arises?

• Go with the most popular rule.

If no rules applied to a test instance, what should we do?

• Go with the most frequent class in the training set?

## Special case: boolean class

— Assumption: if instance does not belong to class “yes”, it belongs to class “no”

— Trick: only learn rules for class “yes” and use default rule for “no”

— Order of rules is not important. No conflicts!
Rules in disjunctive normal form:
If x = 1 and y = 1 then class = a

If z = 1 and w = 1 then class = a

No execution order is implied; the rules can be applied in any order.

## From trees to rules

— Easy: convert a tree into a set of rules by collecting the conditions tested at every node on the path from the root to each leaf

The consequent is the class assigned by the leaf

— Produces rules that are unambiguous

— Resulting rules are unnecessarily complex

Pruning to remove redundant rules

## From rules to trees

Difficult: transforming a rule set into a tree

• Tree cannot easily express disjunction between rules

Example: rules that test different attributes

— Symmetry needs to be broken

— The corresponding tree contains identical subtrees (the “replicated subtree” problem). Example:

• If a and b then x

• If c and d then x

Decision trees vs. classification rules

| Decision trees | Classification rules |
| --- | --- |
| Hard to add structure | Easy to add extra rules |
| Hard to describe disjunctive rules | Easy for disjunctive rules |
| No conflicting rules | Conflicting rules might appear |
| Classification follows a path | Order of interpretation is important |

## Association rules

— Association rules can predict any attribute and combinations of attributes

— Not intended to be used together as a set due to the huge number of possible rules

— Output needs to be restricted to show only the most predictive associations.

Issue: How to find those rules?
Support: the number of instances the rule predicts correctly

Confidence: the number of correct predictions, as a proportion of all instances the rule applies to (also called coverage and accuracy, respectively)
— Example: If temperature = cool then humidity = normal

There are 4 cool days, all with normal humidity: support = 4, confidence = 100%

Minimum support and confidence are pre-specified (e.g. 58 rules with support ≥ 2 and confidence ≥ 95% for the weather data)
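Support and confidence for that rule can be computed directly from the 14-instance weather table above:

```python
# Each instance: (outlook, temperature, humidity, windy, play)
weather = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

# Rule: if temperature = cool then humidity = normal
covered = [r for r in weather if r[1] == "cool"]         # rule applies
correct = [r for r in covered if r[2] == "normal"]       # rule predicts right
support = len(correct)
confidence = len(correct) / len(covered)
print(support, confidence)  # 4 1.0
```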

Interpreting association rules

If humidity = high and windy = false and play = no then outlook = sunny

is not the same as

If windy=false and play=no then outlook=sunny

If windy=false and play=no then humidity=high

However, taken together, those two rules are equivalent to

If windy=false and play=no then outlook=sunny and humidity=high

## Rules with exceptions

— Idea: allow rules to have exceptions

Example: rule for iris data

If petal-length in [2.45, 4.45) then Iris-versicolor

New instance:

| Sepal length | Sepal width | Petal length | Petal width | Type |
| --- | --- | --- | --- | --- |
| 5.1 | 3.5 | 2.6 | 0.2 | Iris-setosa |

— Modified rule:

If petal-length in [2.45,4.45) then Iris-versicolor

EXCEPT if petal-width < 1.0 then Iris-setosa

Exceptions to exceptions can be added…

— Rules can be updated incrementally

Easy to incorporate new data

Easy to incorporate domain knowledge

• People often think in terms of exceptions (and like to be treated as exceptions)

• Each conclusion can be considered just in the context of the rules and exceptions that lead to it

This locality property is important for understanding large rule sets
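The modified iris rule above might be sketched as nested conditions, so that the exception stays local to the rule it refines:

```python
def classify_iris(petal_length, petal_width):
    """Rule with an exception: the EXCEPT clause is checked only inside
    the context of the rule it modifies, never as a global rule."""
    if 2.45 <= petal_length < 4.45:
        if petal_width < 1.0:          # EXCEPT clause
            return "Iris-setosa"
        return "Iris-versicolor"
    return None  # no rule applies

# The new instance (petal length 2.6, petal width 0.2) triggers the exception:
print(classify_iris(2.6, 0.2))  # Iris-setosa
print(classify_iris(4.0, 1.3))  # Iris-versicolor
```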

## Rules involving relations

— Rules that compare an attribute’s value with a constant are called “propositional”, as they have the same expressive power as propositional logic

— What if problem involves relationships between examples (e. g. family tree problem from above)?

## A propositional solution

| Width | Height | Sides | Class |
| --- | --- | --- | --- |
| 2 | 4 | 4 | Standing |
| 3 | 6 | 4 | Standing |
| 4 | 3 | 4 | Lying |
| 7 | 8 | 3 | Standing |
| 7 | 6 | 3 | Lying |
| 2 | 9 | 4 | Standing |
| 9 | 1 | 3 | Lying |
| 10 | 2 | 3 | Lying |

If width ≥3.5 and height < 7.0 then lying

If height 3.5 then standing

A relational solution:

If width > height then lying

If height > width then standing

— Comparing attributes with each other

— Generalizes better to new data

— Standard relations: =, <, >

— But: learning relational rules is costly

— Simple solution: add an extra attribute (e.g. a binary attribute that is true when width < height)
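That trick can be illustrated on the blocks table above. The derived attribute name `narrower` is hypothetical; the point is that, once it is added, a plain propositional test expresses the relational rule:

```python
blocks = [
    # (width, height, sides, class) -- the blocks table above
    (2, 4, 4, "standing"), (3, 6, 4, "standing"), (4, 3, 4, "lying"),
    (7, 8, 3, "standing"), (7, 6, 3, "lying"), (2, 9, 4, "standing"),
    (9, 1, 3, "lying"), (10, 2, 3, "lying"),
]

def with_derived(width, height, sides):
    """Augment an instance with the derived binary attribute width < height."""
    return {"width": width, "height": height, "sides": sides,
            "narrower": width < height}

# A propositional test on the derived attribute == the relational rule:
for w, h, s, cls in blocks:
    inst = with_derived(w, h, s)
    predicted = "standing" if inst["narrower"] else "lying"
    assert predicted == cls
print("all 8 blocks classified correctly")
```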

## Trees for numeric prediction

A combination of decision tree and regression.

Regression: the process of computing an expression that predicts a numeric quantity

Regression tree: “decision tree” where each leaf predicts a numeric quantity

• Predicted value is the average value of training instances that reach the leaf

Model tree: “regression tree” with linear regression models at the leaf nodes

The linear models at the leaves approximate a nonlinear continuous function piecewise

Example: predicting CPU performance.
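The difference between the two leaf types can be sketched on toy data. The leaf's (cache size, performance) pairs are invented for illustration, not taken from the textbook's CPU dataset:

```python
# A regression-tree leaf predicts a constant (the mean target of its
# training instances); a model-tree leaf fits a linear model instead.
leaf = [(8, 30.0), (16, 50.0), (32, 90.0)]  # hypothetical (x, y) pairs

# Regression-tree leaf: the mean target value.
avg = sum(y for _, y in leaf) / len(leaf)

# Model-tree leaf: a least-squares line y = a*x + b.
mean_x = sum(x for x, _ in leaf) / len(leaf)
num = sum((x - mean_x) * (y - avg) for x, y in leaf)
den = sum((x - mean_x) ** 2 for x, _ in leaf)
a = num / den
b = avg - a * mean_x

x_new = 24
print(round(avg, 2))            # 56.67, the same for every instance in the leaf
print(round(a * x_new + b, 2))  # 70.0, the linear model adapts to x_new
```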

## Instance-based representation

1. The training instances are searched for the one that most closely resembles the new instance

2. The instances themselves represent the knowledge

3. Also called instance-based learning

— The similarity function defines what is “learned”: “lazy” learning

— Methods: nearest-neighbor, k-nearest-neighbor, …

## The distance function

— One numeric attribute: distance is the difference between two attribute values

— Several numeric attributes: Euclidean distance for normalized attributes.

— Nominal attributes: distance is set to 1 if values are different, 0 if they are equal

Issue: are all attributes equally important?

• Weight the attributes if necessary
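A distance function combining the rules above, used by a simple nearest-neighbor classifier, might look like the sketch below. The attribute names and the normalization range are hypothetical:

```python
import math

def distance(a, b, numeric_ranges):
    """Mixed distance: numeric attributes are normalized by their range;
    nominal attributes contribute 0 if equal, 1 if different."""
    total = 0.0
    for attr, (lo, hi) in numeric_ranges.items():
        total += ((a[attr] - b[attr]) / (hi - lo)) ** 2
    for attr in a:
        if attr not in numeric_ranges:       # nominal attribute
            total += 0.0 if a[attr] == b[attr] else 1.0
    return math.sqrt(total)

def nearest_neighbor(query, training, labels, numeric_ranges):
    """Return the label of the training instance closest to the query."""
    dists = [distance(query, t, numeric_ranges) for t in training]
    return labels[dists.index(min(dists))]

ranges = {"temperature": (60.0, 90.0)}  # hypothetical range for normalization
train = [{"temperature": 85.0, "outlook": "sunny"},
         {"temperature": 65.0, "outlook": "rainy"}]
labels = ["no", "yes"]
print(nearest_neighbor({"temperature": 70.0, "outlook": "rainy"},
                       train, labels, ranges))  # yes
```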
