**Decision Trees, Entropy, and Information Gain**

**An important class of machine learning models is decision trees: they can be used for both classification and regression, and they mirror the way humans reason. Below is a table of data gathered from a recent census in Ontario, Canada. The study recorded the following features:**

- **AGE: a continuous feature, the age of the individual;**
- **EDUCATION: a categorical feature, the highest education award achieved by the individual (high school, bachelors, doctorate);**
- **MARITAL STATUS: never married, married, divorced;**
- **OCCUPATION: transport = works in the transportation industry; professional = doctors, lawyers, etc.; agriculture = works in the agricultural industry; armed forces = a member of the armed forces;**
- **ANNUAL INCOME: the target feature, with 3 levels (<25K, 25K–50K, >50K).**

| ID | AGE | EDUCATION | STATUS | OCCUPATION | INCOME |
|----|-----|-------------|---------------|--------------|---------|
| 1 | 39 | bachelors | never married | transport | 25K–50K |
| 2 | 50 | bachelors | married | professional | 25K–50K |
| 3 | 18 | high school | never married | agriculture | <25K |
| 4 | 28 | bachelors | married | professional | 25K–50K |
| 5 | 37 | high school | married | agriculture | 25K–50K |
| 6 | 24 | high school | never married | armed forces | <25K |
| 7 | 52 | high school | divorced | transport | 25K–50K |
| 8 | 40 | doctorate | married | professional | >50K |

**a) Compute the entropy of this dataset.**

**b) Calculate the information gain (based on entropy) for the features EDUCATION, MARITAL STATUS, and OCCUPATION.**


Community Answer


a) Entropy measures the impurity (randomness) of a dataset with respect to its class labels. It is computed as

$$E(D) = -\sum_{i=1}^{n} P_i \log_2(P_i)$$

where $n$ is the number of levels of the target feature and $P_i$ is the probability (relative frequency) of level $i$ in the dataset $D$.

The entropy of the dataset is taken over the target feature INCOME. Its levels are distributed as P(<25K) = 2/8 (IDs 3, 6), P(25K–50K) = 5/8 (IDs 1, 2, 4, 5, 7), and P(>50K) = 1/8 (ID 8). Therefore:

$$E(D) = -\left[\tfrac{2}{8}\log_2\tfrac{2}{8} + \tfrac{5}{8}\log_2\tfrac{5}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8}\right] = 0.5 + 0.424 + 0.375 \approx 1.299 \text{ bits}$$

b) The information gain of a feature $d$ is the dataset entropy minus the weighted entropy of the partitions induced by $d$:

$$IG(d, D) = E(D) - \sum_{v \in \text{levels}(d)} \frac{|D_v|}{|D|}\, E(D_v)$$

- EDUCATION: high school (IDs 3, 5, 6, 7) has incomes {<25K, 25K–50K, <25K, 25K–50K}, so $E = 1.0$; bachelors (IDs 1, 2, 4) and doctorate (ID 8) are pure, so $E = 0$. Remainder $= \tfrac{4}{8}(1.0) = 0.5$, giving $IG = 1.299 - 0.5 \approx 0.799$.
- MARITAL STATUS: never married (IDs 1, 3, 6) gives $E = 0.918$; married (IDs 2, 4, 5, 8) gives $E = 0.811$; divorced (ID 7) is pure. Remainder $= \tfrac{3}{8}(0.918) + \tfrac{4}{8}(0.811) = 0.75$, giving $IG = 1.299 - 0.75 \approx 0.549$.
- OCCUPATION: transport (IDs 1, 7) and armed forces (ID 6) are pure; professional (IDs 2, 4, 8) gives $E = 0.918$; agriculture (IDs 3, 5) gives $E = 1.0$. Remainder $= \tfrac{3}{8}(0.918) + \tfrac{2}{8}(1.0) = 0.594$, giving $IG = 1.299 - 0.594 \approx 0.704$.

EDUCATION has the highest information gain, so it would be chosen as the root split.
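The calculations above can be checked with a short script. This is a minimal sketch using only the Python standard library; the tuples below are a transcription of the census table (the `25K-50K` strings for the middle income level are an assumption from the question's stated target levels), and the function names `entropy` and `information_gain` are my own, not from any library.

```python
from collections import Counter
from math import log2

# Transcription of the census table: (EDUCATION, STATUS, OCCUPATION, INCOME).
# AGE is omitted since parts a) and b) only use the categorical features.
data = [
    ("bachelors",   "never married", "transport",    "25K-50K"),  # ID 1
    ("bachelors",   "married",       "professional", "25K-50K"),  # ID 2
    ("high school", "never married", "agriculture",  "<25K"),     # ID 3
    ("bachelors",   "married",       "professional", "25K-50K"),  # ID 4
    ("high school", "married",       "agriculture",  "25K-50K"),  # ID 5
    ("high school", "never married", "armed forces", "<25K"),     # ID 6
    ("high school", "divorced",      "transport",    "25K-50K"),  # ID 7
    ("doctorate",   "married",       "professional", ">50K"),     # ID 8
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature_idx, target_idx=3):
    """IG = E(D) minus the size-weighted entropy of each partition."""
    total = entropy([r[target_idx] for r in rows])
    remainder = 0.0
    for value in {r[feature_idx] for r in rows}:
        subset = [r[target_idx] for r in rows if r[feature_idx] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return total - remainder

print(f"E(D)           = {entropy([r[3] for r in data]):.3f}")  # 1.299
print(f"IG(EDUCATION)  = {information_gain(data, 0):.3f}")      # 0.799
print(f"IG(STATUS)     = {information_gain(data, 1):.3f}")      # 0.549
print(f"IG(OCCUPATION) = {information_gain(data, 2):.3f}")      # 0.704
```

The script confirms that EDUCATION yields the largest information gain of the three candidate features.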