



Lab #1
Lesson 1.2 Activity: Exploring a dataset

Open the contact-lenses dataset.



1. How many instances are there?

 5
 8
 24
 32


2. How many attributes are there?

 4
 5
 12
 24


3. How many possible values are there for the age attribute?

 2
 3
 5
 8


4. Which of these attributes has reduced as a possible value?

 age
 spectacle-prescrip
 astigmatism
 tear-prod-rate
 contact-lenses

Lesson 1.3 Activity: The Iris dataset

Open the iris dataset.



1. How many instances are there?

 100
 150
 200
 250


2. How many attributes are there?

 4
 5
 7
 10


3. How many possible values does the class attribute have?

 1
 2
 3
 50

4. Do an image search on the web to find pictures of Iris setosa, Iris virginica, and Iris versicolor to see what the different types look like.


5. Label these images of irises according to their type by choosing the correct sequence:

(a)


(b)


(c)



 (a) setosa (b) virginica (c) versicolor
 (a) setosa (b) versicolor (c) virginica
 (a) versicolor (b) virginica (c) setosa
 (a) virginica (b) versicolor (c) setosa


6. Does the class Iris-setosa tend to have high or low values of sepallength?

 low
 high

7. Does the class Iris-virginica tend to have high or low values of petalwidth?

 low
 high


8. Which of these attributes, taken by itself, gives the best indication of the class?

 sepallength
 sepalwidth
 petalwidth

9. Examine the Iris ARFF file header. When was the dataset first used?

 1936
 1973
 1980
 1988
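
An ARFF file begins with comment lines (starting with %), followed by @relation, one @attribute line per attribute, and an @data section with one instance per line. The sketch below reads those parts from a small invented ARFF text; the title, attribute names, and data rows are made up for illustration and are not taken from any Weka dataset. It handles only this simple layout, not the full ARFF format.

```python
# Toy ARFF text: every name and value here is hypothetical.
ARFF_TEXT = """\
% Title: Toy flower data (hypothetical)
% Source: made up for this sketch
@relation toy

@attribute length numeric
@attribute colour {red, blue}
@attribute class {a, b}

@data
1.0,red,a
2.5,blue,b
3.1,red,a
"""


def summarize_arff(text):
    """Return (comments, attributes, instance_count) for simple ARFF text."""
    comments, attributes, instances = [], [], 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            # Header comments often hold the dataset's history and sources.
            comments.append(line.lstrip("% ").strip())
        elif in_data:
            instances += 1
        elif line.lower().startswith("@attribute"):
            # "@attribute <name> <type or {value list}>"
            _, name, spec = line.split(None, 2)
            attributes.append((name, spec))
        elif line.lower().startswith("@data"):
            in_data = True
    return comments, attributes, instances


comments, attributes, instance_count = summarize_arff(ARFF_TEXT)
print(comments)
print(attributes)
print(instance_count)
```

Opening the real iris ARFF file in a text editor shows the same three parts, and the header comments are where to look for the answer to question 9.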

10. Weka can read Comma Separated Values (.csv) format files by selecting the appropriate File Format in the "Open" file dialog. Ascertain by experiment how Weka determines the attribute names and value sets by creating a small spreadsheet file, saving it in Comma Separated Values (.csv) format, and loading it into Weka.


11. What should be the first row of a Comma Separated Values (.csv) format file that contains the nominal Weather data?

 outlook,temperature,humidity,windy,play
 sunny,hot,high,FALSE,no
 sunny,hot,high,TRUE,no
 rainy,mild,high,TRUE,no
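
For comparison with what you observe in Weka, here is a minimal sketch of one common convention for reading a .csv file: the first row is treated as the column names, and the distinct cells in each column form that column's value set. This is not Weka's loader, and the column names and rows are invented; check by experiment whether Weka behaves the same way.

```python
import csv
import io

# Hypothetical spreadsheet contents saved as CSV.
CSV_TEXT = """\
colour,size,label
red,small,yes
blue,large,no
red,large,yes
"""


def describe_csv(text):
    """Return the header row and the sorted distinct values of each column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first row is assumed to hold the names
    values = {name: set() for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            values[name].add(cell)
    return header, {name: sorted(cells) for name, cells in values.items()}


header, value_sets = describe_csv(CSV_TEXT)
print(header)
print(value_sets)
```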

Lesson 1.4 Activity: Using J48

Open the glass dataset, go to the Classify panel, choose the J48 tree classifier, and run it (with default parameters).



1. Use the confusion matrix to determine how many headlamps instances were misclassified as build wind float?

 1
 2
 3
 6
 7


2. Open the labor dataset, go to the Classify panel, and run the J48 classifier (with default parameters). What is the percentage of correctly classified instances?


3. Now turn pruning off in the J48 configuration panel by setting unpruned to True and run it again. What is the percentage of correctly classified instances now?
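
The questions above rely on two readings of a confusion matrix. In Weka's output each row is an actual class and each column is the class the instances were "classified as", so a misclassification count is an off-diagonal cell and the percentage of correctly classified instances is the diagonal sum divided by the total. The sketch below uses an invented three-class matrix with hypothetical labels x, y, and z; the numbers do not come from the glass or labor datasets.

```python
# Hypothetical confusion matrix: rows = actual class, columns = predicted class.
labels = ["x", "y", "z"]
matrix = [
    [5, 1, 0],  # actual x
    [2, 7, 1],  # actual y
    [0, 3, 6],  # actual z
]


def misclassified_as(matrix, labels, actual, predicted):
    """Count instances of class `actual` that were predicted as `predicted`."""
    return matrix[labels.index(actual)][labels.index(predicted)]


def percent_correct(matrix):
    """Percentage of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


print(misclassified_as(matrix, labels, "z", "y"))  # 3
print(percent_correct(matrix))                     # 72.0
```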


Lesson 1.5 Activity: Using filters

Download and open the anneal dataset.

1. How many attributes does it have?

 

2. Apply the unsupervised attribute filter RemoveUseless. How many attributes does the dataset have now?




3. Identify one of the attributes that was removed by clicking Undo and then Apply. Now figure out why it was removed.

 The attribute name was too short
 Only one of the attribute's values actually appears in the dataset
 The attributes only had two possible values

Open the glass dataset.



4. Apply the unsupervised attribute filter Normalize. What is the new range (i.e. minimum and maximum) of the Na attribute?

 [-1, 1]
 [0, 1]
 [-∞, ∞]

5. Undo the change and bring up the Normalize filter's configuration panel. Set the scale to 3 and the translation option to 1. Apply the filter again. What is the Na attribute's range now?

 [1, 4]
 [0, 1]
 [1, 3]
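
As a rough sketch of what a min-max normalization with scale and translation options does, the function below first maps values onto [0, 1] and then multiplies by the scale and adds the translation. The sample values are invented, and the handling of a constant column is an assumption of this sketch rather than a statement about Weka's behaviour.

```python
def normalize(values, scale=1.0, translation=0.0):
    """Min-max normalize, then stretch by `scale` and shift by `translation`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption for this sketch: a constant column maps to the translation.
        return [translation for _ in values]
    return [(v - lo) / (hi - lo) * scale + translation for v in values]


sample = [10.0, 12.5, 15.0, 20.0]  # hypothetical attribute values

print(normalize(sample))                              # [0.0, 0.25, 0.5, 1.0]
print(normalize(sample, scale=2.0, translation=-1.0)) # [-1.0, -0.5, 0.0, 1.0]
```

With scale 2 and translation -1 the output spans [-1, 1]; the same arithmetic applies to the settings in question 5.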

6. Undo the change and check that you have reverted to the original dataset. Now apply the unsupervised attribute filter Standardize. What are the new mean and standard deviation of the K attribute?

 The mean is 0.497 and the standard deviation is 0.652
 The mean is -0.762 and the standard deviation is 8.76
 The mean is 1 and the standard deviation is 0
 The mean is 0 and the standard deviation is 1
 The mean is 1.518 and the standard deviation is 0.003
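
Standardizing an attribute means subtracting its mean and dividing by its standard deviation. The sketch below does this for an invented list of values. It uses the sample standard deviation (dividing by n - 1), which is an assumption of this example; compare the result with the statistics Weka shows after applying the filter.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 divisor
    return [(v - mean) / sd for v in values]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values
z_scores = standardize(data)

print(round(statistics.mean(z_scores), 6))
print(round(statistics.stdev(z_scores), 6))
```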

7. (Optional) Undo all changes to the glass dataset again. Now determine which attribute set gives the highest classification accuracy using J48.

 removing Fe, Si, Al, K
 removing Fe, Mg, RI
 removing Fe, Si, Mg, K

Lesson 1.6 Activity: Finding misclassified instances

Open the iris dataset.



1. Choose the J48 tree classifier, and run it (with default parameters). How many instances are misclassified?

 1
 2
 4
 6

2. Visualize the classifier errors by right-clicking on the Result list, and use the visualization to determine the instance numbers of the misclassified instances. Which are they?

 15, 73, 92, 98, 109, 119
 4, 8, 91, 98, 109, 119
 15, 72, 92, 98, 108, 120

3. Now switch the classifier to SimpleLogistic, which you will find in the functions category, and run it (with default parameters). How many instances are misclassified now?

 3
 6
 9
 12
 15

4. Which instances of type Iris-versicolor are misclassified as Iris-virginica?

 15, 73, 92, 98, 109, 119
 15, 73, 119, 132, 135, 147, 148
 80, 92
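
The idea behind questions 2 and 4 can be sketched as a comparison of actual and predicted labels, instance by instance. The labels below are invented, and numbering instances from 1 is an assumption of this sketch; check the numbering used in Weka's visualization before comparing results.

```python
# Hypothetical actual and predicted class labels, one pair per instance.
actual = ["a", "a", "b", "b", "c", "c"]
predicted = ["a", "b", "b", "b", "a", "c"]


def misclassified(actual, predicted, actual_label=None,
                  predicted_label=None, first_index=1):
    """Return instance numbers where the prediction differs from the truth.

    The optional labels restrict the result to one kind of confusion,
    for example instances of class "c" that were predicted as "a".
    """
    hits = []
    for i, (a, p) in enumerate(zip(actual, predicted), start=first_index):
        if a == p:
            continue
        if actual_label is not None and a != actual_label:
            continue
        if predicted_label is not None and p != predicted_label:
            continue
        hits.append(i)
    return hits


print(misclassified(actual, predicted))            # [2, 5]
print(misclassified(actual, predicted, "c", "a"))  # [5]
```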

