Lab #1
Lesson 1.2 Activity: Exploring a dataset
Open the contact-lenses dataset.
1. How many instances are there?
5
8
24
32
2. How many attributes are there?
4
5
12
24
3. How many possible values are there for the age attribute?
2
3
5
8
4. Which of these attributes has "reduced" as a possible value?
age
spectacle-prescrip
astigmatism
tear-prod-rate
contact-lenses
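The counts asked for above can also be cross-checked outside the Explorer by reading an ARFF file's header and data section. This Python sketch assumes a simple ARFF layout (one @attribute declaration per line, nominal values in braces, no quoted names or sparse rows). The function name summarize_arff and the trimmed three-row sample, modelled on the nominal weather data, are illustrative only and not part of Weka.

```python
def summarize_arff(text):
    """Count data instances and collect attribute declarations from a simple ARFF string.

    Assumes one @attribute per line, nominal value sets written as {a, b, c},
    and no quoted names or sparse rows.
    """
    attributes = {}  # attribute name -> list of nominal values (None if not nominal)
    instances = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                attributes[name] = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                attributes[name] = None  # numeric, string, date, ...
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            instances += 1  # every non-blank line after @data is one instance


    return instances, attributes


# Trimmed, weather-style sample: only three data rows are included.
SAMPLE = """\
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
"""

n, attrs = summarize_arff(SAMPLE)
print(n)                  # 3
print(len(attrs))         # 5
print(attrs["outlook"])   # ['sunny', 'overcast', 'rainy']
# Which attribute lists "normal" among its values?
print([name for name, vals in attrs.items() if vals and "normal" in vals])  # ['humidity']
```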
Lesson 1.3 Activity: The Iris dataset
Open the iris dataset.
1. How many instances are there?
100
150
200
250
2. How many attributes are there?
4
5
7
10
3. How many possible values does the class attribute have?
1
2
3
50
4. Do an image search on the web to find pictures of Iris setosa, Iris virginica and Iris versicolor to see what the different types look like.
5. Label these images of irises according to their type by choosing the correct sequence:
[Images (a), (b) and (c): three iris flowers]
(a) setosa (b) virginica (c) versicolor
(a) setosa (b) versicolor (c) virginica
(a) versicolor (b) virginica (c) setosa
(a) virginica (b) versicolor (c) setosa
6. Does the class Iris-setosa tend to have high or low values of sepallength?
low
high
7. Does the class Iris-virginica tend to have high or low values of petalwidth?
low
high
8. Which of these attributes, taken by itself, gives the best indication of the class?
sepallength
sepalwidth
petalwidth
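Questions 6 to 8 are meant to be answered from the Explorer's histograms, but comparing per-class means is a quick numeric check of the same idea. The sketch below is a hypothetical helper (class_means is not a Weka function) run on made-up rows with generic class labels, not on iris.arff.

```python
from collections import defaultdict


def class_means(rows, attribute_index, class_index=-1):
    """Mean of one numeric attribute, computed separately for each class value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        label = row[class_index]
        sums[label] += row[attribute_index]
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Made-up rows (attr0, attr1, class); not taken from iris.arff.
ROWS = [
    (1.0, 10.0, "A"),
    (3.0, 12.0, "A"),
    (5.0, 11.0, "B"),
    (7.0, 13.0, "B"),
]

print(class_means(ROWS, 0))  # {'A': 2.0, 'B': 6.0}
print(class_means(ROWS, 1))  # {'A': 11.0, 'B': 12.0}
```

Here attr0 separates A from B cleanly (values 1 to 3 versus 5 to 7 do not overlap), while attr1's ranges overlap; that contrast is what to look for when judging which attribute best indicates the class.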
9. Examine the header of the iris ARFF file. When was the dataset first used?
1936
1973
1980
1988
10. Weka can read Comma Separated Values (.csv) files if you select the appropriate File Format in the "Open" file dialog. Find out by experiment how Weka determines the attribute names and value sets: create a small spreadsheet, save it in Comma Separated Values (.csv) format, and load it into Weka.
11. What should be the first row of a Comma Separated Values (.csv) format file that contains the nominal Weather data?
outlook,temperature,humidity,windy,play
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
rainy,mild,high,TRUE,no
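As a rough model of what the experiment in question 10 tends to reveal, the sketch below treats the first row as attribute names and infers each column's type from its values: numeric if every value parses as a number, otherwise nominal with the distinct values seen. This is a loose imitation under those stated assumptions, not Weka's CSV loader, whose type rules and options may differ; infer_attributes and the colour/size/price sample are hypothetical.

```python
import csv
import io


def infer_attributes(csv_text):
    """Map each column name to 'numeric' or to its list of distinct nominal values."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row supplies the attribute names
    columns = [[] for _ in header]
    for row in reader:
        for i, value in enumerate(row):
            columns[i].append(value)
    attributes = {}
    for name, values in zip(header, columns):
        try:
            for v in values:
                float(v)  # raises ValueError on the first non-numeric value
            attributes[name] = "numeric"
        except ValueError:
            # Distinct values in first-seen order form the nominal value set.
            attributes[name] = list(dict.fromkeys(values))
    return attributes


SAMPLE_CSV = "colour,size,price\nred,small,3\nblue,large,5\nred,large,4\n"
print(infer_attributes(SAMPLE_CSV))
# {'colour': ['red', 'blue'], 'size': ['small', 'large'], 'price': 'numeric'}
```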
Lesson 1.4 Activity: Using J48
Open the glass dataset, go to the Classify panel, choose the J48 tree classifier, and run it (with default parameters).
1. Use the confusion matrix to determine how many headlamps instances were misclassified as build wind float.
1
2
3
6
7
2. Open the labor dataset, go to the Classify panel, and run the J48 classifier (with default parameters). What is the percentage of correctly classified instances?
3. Now turn pruning off in the J48 configuration panel by setting unpruned to True and run it again. What is the percentage of correctly classified instances now?
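Weka prints the confusion matrix with one row per actual class and one column per predicted class, with a legend such as "a = build wind float" mapping letters to class names. The sketch below shows how to read one off-diagonal cell (question 1) and how the "correctly classified" percentage (questions 2 and 3) follows from the diagonal. The helper names and the 3-class matrix are invented for illustration; they are not Weka output.

```python
def cell(matrix, labels, actual, predicted):
    """Count of instances of class `actual` that were classified as `predicted`.

    Rows are actual classes and columns are predicted ("classified as") classes.
    """
    return matrix[labels.index(actual)][labels.index(predicted)]


def percent_correct(matrix):
    """Diagonal (correctly classified) instances as a percentage of all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Hypothetical 3-class matrix; the numbers are invented for illustration.
LABELS = ["a", "b", "c"]
MATRIX = [
    [5, 1, 0],
    [2, 6, 1],
    [0, 3, 4],
]

print(cell(MATRIX, LABELS, "c", "b"))     # 3: actual c, classified as b
print(round(percent_correct(MATRIX), 2))  # 68.18 (15 of 22 on the diagonal)
```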
Lesson 1.5 Activity: Using filters
Download and open the anneal dataset.
1. How many attributes does it have?
2. Apply the unsupervised attribute filter RemoveUseless. How many attributes does the dataset have now?
3. Identify one of the attributes that was removed by clicking Undo and then Apply again. Why was it removed?
The attribute name was too short
Only one of the attribute's values actually appears in the dataset
The attribute only had two possible values
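The sketch below is modelled on RemoveUseless's documented behaviour: it drops attributes that take fewer than two distinct values in the data, plus nominal attributes whose distinct values exceed a percentage of the non-missing values (a threshold assumed here to default to 99, as the filter's maxVariancePercentage option). The function name, the use of None for missing values, and the toy rows are assumptions for illustration only.

```python
def useless_attributes(rows, names, nominal, class_index=-1, max_variance_percentage=99.0):
    """Names of attributes a RemoveUseless-style filter would drop.

    rows: list of tuples, with None marking a missing value.
    nominal: set of attribute names that are nominal.
    """
    class_index %= len(names)
    dropped = []
    for j, name in enumerate(names):
        if j == class_index:
            continue  # the class attribute is left alone
        present = [row[j] for row in rows if row[j] is not None]
        distinct = set(present)
        if len(distinct) < 2:
            dropped.append(name)  # constant (or entirely missing) attribute
        elif name in nominal and 100.0 * len(distinct) / len(present) > max_variance_percentage:
            dropped.append(name)  # nominal attribute where almost every value is different
    return dropped


NAMES = ["a", "b", "c", "class"]
NOMINAL = {"a", "c", "class"}
ROWS = [
    ("x", 1.0, "p", "yes"),
    ("x", 2.0, "q", "no"),
    ("x", None, "r", "yes"),
]

print(useless_attributes(ROWS, NAMES, NOMINAL))  # ['a', 'c']
```

Attribute a is dropped because only one of its values ever appears; c is dropped because every instance has a different value; the numeric attribute b is kept.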
Open the glass dataset.
4. Apply the unsupervised attribute filter Normalize. What is the new range (i.e. minimum and maximum) of the Na attribute?
[-1, 1]
[0, 1]
[-∞, ∞]
5. Undo the change and bring up the Normalize filter's configuration panel. Set the scale to 3 and the translation option to 1. Apply the filter again. What is the Na attribute's range now?
[1, 4]
[0, 1]
[1, 3]
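A minimal sketch of min-max normalization, assuming the filter first maps each numeric attribute onto [0, 1] and then applies value × scale + translation, which is how the Normalize filter's description presents its scale and translation options. The function and sample values are hypothetical, and returning the translation for a constant column is an arbitrary choice made here, not necessarily what Weka does.

```python
def normalize(values, scale=1.0, translation=0.0):
    """Map values onto [0, 1] by min-max scaling, then apply scale and translation."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [translation for _ in values]  # assumption: constant columns map to the lower bound
    return [(v - lo) / span * scale + translation for v in values]


print(normalize([10.0, 12.0, 14.0]))                            # [0.0, 0.5, 1.0]
print(normalize([10.0, 12.0, 14.0], scale=2.0, translation=-1.0))  # [-1.0, 0.0, 1.0]
```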
6. Undo the change and check that you have reverted to the original dataset. Now apply the unsupervised attribute filter Standardize. What are the new mean and standard deviation of the K attribute?
The mean is 0.497 and the standard deviation is 0.652
The mean is -0.762 and the standard deviation is 8.76
The mean is 1 and the standard deviation is 0
The mean is 0 and the standard deviation is 1
The mean is 1.518 and the standard deviation is 0.003
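Standardizing replaces each value x with (x - mean) / standard deviation, so the transformed attribute is centred on zero with unit spread. This sketch uses the sample standard deviation (n - 1 denominator); whether Weka divides by n or n - 1 is an assumption here, and the sample values are arbitrary.

```python
import statistics


def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    return [(v - mean) / sd for v in values]


z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# Mean and standard deviation of z are 0 and 1 up to floating-point rounding.
print(statistics.mean(z), statistics.stdev(z))
```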
(Optional) 7. Undo all changes to the glass dataset again. Now determine which attribute set gives the highest classification accuracy using J48.
removing Fe, Si, Al, K
removing Fe, Mg, RI
removing Fe, Si, Mg, K
Lesson 1.6 Activity: Finding misclassified instances
Open the iris dataset.
1. Choose the J48 tree classifier, and run it (with default parameters). How many instances are misclassified?
1
2
4
6
2. Visualize the classifier errors by right-clicking the entry in the Result list and choosing "Visualize classifier errors". Use the visualization to determine the instance numbers of the misclassified instances. Which are they?
15, 73, 92, 98, 109, 119
4, 8, 91, 98, 109, 119
15, 72, 92, 98, 108, 120
3. Now switch the classifier to SimpleLogistic, which you will find in the functions category, and run it (with default parameters). How many instances are misclassified now?
3
6
9
12
15
4. Which instances of type Iris-versicolor are misclassified as Iris-virginica?
15, 73, 92, 98, 109, 119
15, 73, 119, 132, 135, 147, 148
80, 92
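Given the actual and predicted class of every instance, listing the misclassified instances (questions 2 and 4) is a simple filter. This sketch numbers instances from 1, which is an assumption: check how your Weka version numbers instances in the error visualization before comparing. The function and the five made-up labels below are illustrative, not real classifier output.

```python
def misclassified(actual, predicted, actual_class=None, predicted_class=None):
    """1-based numbers of instances whose prediction differs from the actual class.

    Optionally keep only one kind of error, e.g. actual_class="Iris-versicolor",
    predicted_class="Iris-virginica".
    """
    numbers = []
    for number, (a, p) in enumerate(zip(actual, predicted), start=1):
        if a == p:
            continue  # correctly classified
        if actual_class is not None and a != actual_class:
            continue
        if predicted_class is not None and p != predicted_class:
            continue
        numbers.append(number)
    return numbers


# Made-up labels for five instances; not real Weka output.
ACTUAL = ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
PREDICTED = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"]

print(misclassified(ACTUAL, PREDICTED))                                        # [2, 4]
print(misclassified(ACTUAL, PREDICTED, "Iris-versicolor", "Iris-virginica"))  # [2]
```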