### Q3. Theory (10 marks)

Consider training a binary decision tree using entropy splits.
(a) Prove that the decrease in entropy by a split on a binary yes/no feature can never be greater than 1 bit.
(b) Generalize this result to the case of arbitrary multiway branching.
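Let $p = \text{yes}/\text{total}$ be the fraction of yes examples. The sum of yes and no is equal to total, so $\text{no}/\text{total} = 1 - p$, and the entropy in bits is

$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p).$$

Because yes can never exceed total, $0 \le p \le 1$, and $H(p)$ cannot be greater than 1 bit; the maximum of 1 bit is reached at $p = 1/2$. So ba ...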
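The bounds in (a) and (b) can be checked numerically. The following is a minimal sketch, not a proof. It relies on a standard identity: the entropy decrease of a split equals the mutual information between the split feature and the class label. That quantity is at most the entropy of the feature, which is at most $\log_2 k$ bits for a $k$-way split (1 bit when $k = 2$). The function names `entropy` and `information_gain` and the toy data are illustrative assumptions and do not come from the original question.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - children


# Binary feature, four equally likely classes (hypothetical toy data).
# The parent entropy is 2 bits, yet the split can remove only 1 bit.
binary_feature = [0, 0, 1, 1]
four_classes = ["a", "b", "c", "d"]
gain_binary = information_gain(binary_feature, four_classes)

# Three-way feature that determines the class exactly: the gain
# reaches the multiway bound log2(3).
three_way_feature = ["r", "g", "b", "r", "g", "b"]
three_classes = ["a", "b", "c", "a", "b", "c"]
gain_three_way = information_gain(three_way_feature, three_classes)

print(gain_binary, gain_three_way, math.log2(3))
```

The first case shows that the limit comes from the feature rather than from the labels: the parent node holds 2 bits of entropy, but a yes/no question can resolve at most 1 of them.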