Decision Trees, Entropy and Information Gain
An important class of machine learning models is decision trees: they can be used for both classification and regression, and they mirror the way humans reason about decisions. Below is a table of data gathered from a recent census in Ontario, Canada. Each individual is described by the following features:
- AGE: a continuous feature giving the age of the individual
- EDUCATION: a categorical feature giving the highest educational qualification achieved by the individual (high school, bachelors, doctorate)
- MARITAL STATUS: never married, married, divorced
- OCCUPATION: transport (works in the transportation industry), professional (doctors, lawyers, etc.), agriculture (works in the agricultural industry), armed forces (member of the armed forces)
- ANNUAL INCOME: the target feature, with three levels (<25K, 25K–50K, >50K)
| ID | AGE | EDUCATION   | STATUS        | OCCUPATION   | INCOME  |
|----|-----|-------------|---------------|--------------|---------|
| 1  | 39  | bachelors   | never married | transport    | 25K–50K |
| 2  | 50  | bachelors   | married       | professional | 25K–50K |
| 3  | 18  | high school | never married | agriculture  | <25K    |
| 4  | 28  | bachelors   | married       | professional | 25K–50K |
| 5  | 37  | high school | married       | agriculture  | 25K–50K |
| 6  | 24  | high school | never married | armed forces | <25K    |
| 7  | 52  | high school | divorced      | transport    | 25K–50K |
| 8  | 40  | doctorate   | married       | professional | >50K    |
a) Compute the entropy of this dataset.
b) Calculate the information gain (based on entropy) for the features EDUCATION, MARITAL STATUS, and OCCUPATION.
a) Entropy measures the impurity (randomness) of a dataset with respect to its target feature. It is defined as

$E(D) = -\sum_{i=1}^{n} P_i \log_2(P_i)$

where $E(D)$ denotes the entropy (impurity) of dataset $D$, $n$ is the number of levels of the target feature, and $P_i$ is the proportion of examples in $D$ whose target takes level $i$.

For the given dataset the target is ANNUAL INCOME, with P(<25K) = 2/8, P(25K–50K) = 5/8, and P(>50K) = 1/8. Therefore

$E(D) = -\left[\tfrac{2}{8}\log_2\tfrac{2}{8} + \tfrac{5}{8}\log_2\tfrac{5}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8}\right] = 0.500 + 0.424 + 0.375 \approx 1.299$ bits.
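b) The information gain of a feature $A$ is the drop in entropy obtained by partitioning the dataset on $A$:

$IG(D, A) = E(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|}\, E(D_v)$

Below is a minimal Python sketch that encodes the table above and computes $E(D)$ together with the information gain of the three categorical features. The column names, the `data` encoding, and the helper functions `entropy` and `information_gain` are assumptions made for this illustration, not part of any library.

```python
from collections import Counter
from math import log2

# The census table above, one dict per row (AGE omitted: the questions only
# involve the categorical features and the INCOME target).
data = [
    {"EDUCATION": "bachelors",   "STATUS": "never married", "OCCUPATION": "transport",    "INCOME": "25K-50K"},
    {"EDUCATION": "bachelors",   "STATUS": "married",       "OCCUPATION": "professional", "INCOME": "25K-50K"},
    {"EDUCATION": "high school", "STATUS": "never married", "OCCUPATION": "agriculture",  "INCOME": "<25K"},
    {"EDUCATION": "bachelors",   "STATUS": "married",       "OCCUPATION": "professional", "INCOME": "25K-50K"},
    {"EDUCATION": "high school", "STATUS": "married",       "OCCUPATION": "agriculture",  "INCOME": "25K-50K"},
    {"EDUCATION": "high school", "STATUS": "never married", "OCCUPATION": "armed forces", "INCOME": "<25K"},
    {"EDUCATION": "high school", "STATUS": "divorced",      "OCCUPATION": "transport",    "INCOME": "25K-50K"},
    {"EDUCATION": "doctorate",   "STATUS": "married",       "OCCUPATION": "professional", "INCOME": ">50K"},
]

def entropy(rows, target="INCOME"):
    """E(D) = -sum_i P_i * log2(P_i), taken over the levels of the target feature."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, feature, target="INCOME"):
    """IG(D, A) = E(D) - sum_v |D_v|/|D| * E(D_v), splitting D on feature A."""
    total = len(rows)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [row for row in rows if row[feature] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder

print(f"E(D) = {entropy(data):.3f} bits")
for feature in ("EDUCATION", "STATUS", "OCCUPATION"):
    print(f"IG({feature}) = {information_gain(data, feature):.3f} bits")
```

Run as written, the sketch reports E(D) ≈ 1.299 bits, IG(EDUCATION) ≈ 0.799, IG(STATUS) ≈ 0.549, and IG(OCCUPATION) ≈ 0.704 bits, which matches a hand calculation from the table; a decision tree using this criterion would therefore split on EDUCATION first.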