
7SDQC0 The Asker · Computer Science

Decision Trees, Entropy and Information Gain

An important class of machine learning models is decision trees: you can use them for both classification and regression. Decision trees can mimic the way humans reason. Below is a table of data gathered from a recent census in Ontario, Canada. The study recorded the following features:

- AGE: a continuous feature listing the age of the individual
- EDUCATION: a categorical feature listing the highest education award achieved by the individual (high school, bachelors, doctorate)
- MARITAL STATUS: never married, married, divorced
- OCCUPATION: transport = works in the transportation industry; professional = doctors, lawyers, etc.; agriculture = works in the agricultural industry; armed forces = is a member of the armed forces
- ANNUAL INCOME: the target feature, with 3 levels (<25K, 25K–50K, >50K)

ID  AGE  EDUCATION    STATUS         OCCUPATION    INCOME
1   39   bachelors    never married  transport     25K–50K
2   50   bachelors    married        professional  25K–50K
3   18   high school  never married  agriculture   <25K
4   28   bachelors    married        professional  25K–50K
5   37   high school  married        agriculture   25K–50K
6   24   high school  never married  armed forces  <25K
7   52   high school  divorced       transport     25K–50K
8   40   doctorate    married        professional  >50K

a) Compute the entropy of this dataset.

b) Calculate the information gain for the features EDUCATION, MARITAL STATUS, and OCCUPATION (based on entropy).

Community Answer
PHNIVS The First Answerer

a) Entropy measures the impurity (randomness) of a dataset with respect to its target feature. It is given by:

E(D) = -sum_{i=1}^{n} P_i log2(P_i)

where E denotes the entropy, n is the number of classes of the target feature, and P_i is the proportion of examples in dataset D belonging to class i. Note that the dataset entropy is computed over the target feature (INCOME), not over the descriptive features, and the logarithm is base 2 so the result is in bits.

For the given dataset: P(<25K) = 2/8, P(25K–50K) = 5/8 and P(>50K) = 1/8. Therefore,

E(D) = -[(2/8) log2(2/8) + (5/8) log2(5/8) + (1/8) log2(1/8)] = 0.500 + 0.424 + 0.375 = 1.299 bits

b) For each feature, partition the dataset by the feature's values, compute the remainder (the weighted average entropy of the partitions), and subtract it from E(D):

IG(D, f) = E(D) - sum_v (|D_v| / |D|) E(D_v)

EDUCATION: high school = rows {3, 5, 6, 7} with incomes {<25K, 25K–50K, <25K, 25K–50K}, entropy 1.000; bachelors = rows {1, 2, 4}, all 25K–50K, entropy 0; doctorate = row {8}, entropy 0.
rem = (4/8)(1.000) + (3/8)(0) + (1/8)(0) = 0.500
IG(D, EDUCATION) = 1.299 - 0.500 = 0.799 bits

MARITAL STATUS: never married = rows {1, 3, 6} with incomes {25K–50K, <25K, <25K}, entropy 0.918; married = rows {2, 4, 5, 8} with incomes {25K–50K x3, >50K}, entropy 0.811; divorced = row {7}, entropy 0.
rem = (3/8)(0.918) + (4/8)(0.811) + (1/8)(0) = 0.750
IG(D, MARITAL STATUS) = 1.299 - 0.750 = 0.549 bits

OCCUPATION: transport = rows {1, 7}, both 25K–50K, entropy 0; professional = rows {2, 4, 8} with incomes {25K–50K x2, >50K}, entropy 0.918; agriculture = rows {3, 5} with incomes {<25K, 25K–50K}, entropy 1.000; armed forces = row {6}, entropy 0.
rem = (3/8)(0.918) + (2/8)(1.000) = 0.594
IG(D, OCCUPATION) = 1.299 - 0.594 = 0.704 bits

EDUCATION has the highest information gain, so it would be chosen as the root split of the tree.
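As a check on the arithmetic, the entropy and information-gain calculations can be sketched in Python. This is a minimal sketch: the dataset is hard-coded from the table above (ID and AGE omitted since they are not used here), and the function names and column indices are choices of this sketch, not part of the question.

```python
from collections import Counter
from math import log2

# Census sample from the table: (EDUCATION, STATUS, OCCUPATION, INCOME).
# INCOME (last column) is the target feature.
data = [
    ("bachelors",   "never married", "transport",    "25K-50K"),
    ("bachelors",   "married",       "professional", "25K-50K"),
    ("high school", "never married", "agriculture",  "<25K"),
    ("bachelors",   "married",       "professional", "25K-50K"),
    ("high school", "married",       "agriculture",  "25K-50K"),
    ("high school", "never married", "armed forces", "<25K"),
    ("high school", "divorced",      "transport",    "25K-50K"),
    ("doctorate",   "married",       "professional", ">50K"),
]

def entropy(labels):
    """Shannon entropy (base 2, in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature_idx, target_idx=-1):
    """H(target) minus the weighted entropy of each feature partition."""
    total = entropy([r[target_idx] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[feature_idx] for r in rows}:
        part = [r[target_idx] for r in rows if r[feature_idx] == value]
        remainder += len(part) / n * entropy(part)
    return total - remainder

print(round(entropy([r[-1] for r in data]), 3))  # dataset entropy: 1.299
print(round(information_gain(data, 0), 3))       # EDUCATION:       0.799
print(round(information_gain(data, 1), 3))       # MARITAL STATUS:  0.549
print(round(information_gain(data, 2), 3))       # OCCUPATION:      0.704
```

Running this reproduces the hand-computed values and confirms that EDUCATION gives the largest gain.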