Simulation. The following problem requires some programming in R, or a comparable programming language.

10. The Kullback-Leibler (KL) divergence of two distributions p_X and p_Y is defined as

    D(p_X, p_Y) = \sum_x p_X(x) \log( p_X(x) / p_Y(x) ).

KL divergence is a measure of discrepancy between two distributions, with D = 0 for p_X = p_Y and D > 0 for p_X ≠ p_Y. Using KL divergence, we then define the mutual information (MI) of two jointly distributed random variables as

    I(X, Y) = D(p_XY, p_X p_Y),

that is, the KL divergence between the joint distribution p_XY and the product of the marginals p_X and p_Y. The latter is the hypothetical joint distribution under independence. In this example we use MI to judge whether two variables X, Y are independent or not.

We record n pairs (X_i, Y_i) ~ p_XY, i = 1, ..., n, independently (the data). Here X_i ∈ {1, 2, 3, 4} and Y_i ∈ {1, 2, 3}. The following (4 x 3) table summarizes the data by reporting n_XY(x, y) = #{i : (X_i, Y_i) = (x, y)}, the count of observations with X_i = x and Y_i = y.

Table 1. Counts n_XY(x, y) (center block), n_X(x) (right column) and n_Y(y) (bottom row).

             y=1   y=2   y=3 | n_X(x)
    x=1       10     9     2 |   21
    x=2       10     7     2 |   19
    x=3       17    11     5 |   33
    x=4       22    18     7 |   47
    ---------------------------------
    n_Y(y)    59    45    16 | n=120

The row totals are n_X(x) = #{i : X_i = x} and similarly the column totals report n_Y(y) = #{i : Y_i = y}. Let f_XY(x, y) = n_XY(x, y)/n denote the (relative) frequencies, and similarly for f_X(x) and f_Y(y). We use f_X as an estimate p̂_X for p_X, and f_Y as an estimate p̂_Y for p_Y. Then

    \hat{I} = \sum_{x,y} f_XY(x, y) \log( f_XY(x, y) / ( f_X(x) f_Y(y) ) )

serves as an estimate for I(X, Y). In the following questions we implement a possible approach to decide whether to report X ⊥ Y (independence) or X ⊥̸ Y. The logic is:

1. If X ⊥ Y were true, then I(X, Y) = 0.
2. Instead of I(X, Y) we can only evaluate Î(X, Y). It is okay for Î > 0, but Î should not be "too large" if X ⊥ Y were true.
3. To judge how much is "too large," we carry out a small simulation: we generate a hypothetical repeat of the experiment, generating X_i ~ f_X and Y_i ~ f_Y independently, i = 1, ..., n, and evaluate Î*. Repeat this simulation M = 100 times and record the M evaluations of Î* (see the footnote about zero counts). Let Î*_(m), m = 1, ..., M, denote the ordered list of those M evaluations. We use Î* = Î*_(95) to draw the line and decide what is "too large."
4. If Î is "too large," i.e., Î > Î*, report X ⊥̸ Y. Otherwise we report X ⊥ Y.

Footnote: We lucked out here with all counts being non-zero. If any count were 0, we could just replace it by 0.5.
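As a cross-check on step 2, Î can be computed directly from the definitions above. The following short Python sketch (offered here only as a language-neutral stand-in, not the course's R solution) evaluates Î from the counts in Table 1:

```python
import math

# Counts n_XY(x,y) from Table 1: rows x = 1..4, columns y = 1..3
counts = [[10,  9, 2],
          [10,  7, 2],
          [17, 11, 5],
          [22, 18, 7]]

n  = sum(sum(row) for row in counts)                     # total sample size n = 120
f  = [[c / n for c in row] for row in counts]            # joint frequencies f_XY(x,y)
fx = [sum(row) for row in f]                             # marginal f_X(x)
fy = [sum(f[x][y] for x in range(4)) for y in range(3)]  # marginal f_Y(y)

# Ihat = sum_{x,y} f_XY(x,y) * log( f_XY(x,y) / (f_X(x) * f_Y(y)) )
Ihat = sum(f[x][y] * math.log(f[x][y] / (fx[x] * fy[y]))
           for x in range(4) for y in range(3))
print(round(Ihat, 4))   # prints 0.0042
```

All twelve counts are non-zero here, so no zero-cell adjustment is needed for this computation.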
The logic of our algorithm is an indirect argument: if in fact X ⊥ Y were true, then Î > Î* is unlikely. Therefore we accept Î > Î* as evidence "beyond reasonable doubt" against X ⊥ Y. In this setup, answer the following questions:

10a. Evaluate Î (step 2, above). Mark the choice closest to your answer. You can use the R function Ih() below this problem (you need not use it).
(a) 0.005  (b) 0.01  (c) 0.02  (d) 0.04  (e) 0.10

10b. Carry out the simulation described in step 3. Find Î*. See the R code fragments sim() and Ihmstar() shown below this problem (you need not use them). Mark the choice closest to your result.
(a) 0.005  (b) 0.01  (c) 0.02  (d) 0.04  (e) 0.10

10c. If X ⊥ Y is true, then what is Pr(Î > Î*)? An approximate argument is okay. Mark the choice closest to your answer.
(a) 0.005  (b) 0.01  (c) 0.05  (d) 0.10  (e) 0.50

10d. In light of (b) and (c), and following steps 1-4 above, what do you report?
(a) X ⊥ Y  (b) X ⊥̸ Y  (c) cannot decide  (d) need more data  (e) none of these

Note: In this problem we carried out a test of the hypothesis H0: X ⊥ Y. We used Î(X, Y) as a test statistic. The decision rule to reject H0 when Î > Î* defined the rejection region. We will talk much more about the concept of hypothesis tests later in the course.

Below are three R scripts to evaluate Î and to carry out the simulation in 10b. The argument of Ih(nxy) is the (4 x 3) table of counts n_XY, and the function returns Î(X, Y). The arguments of sim(M, n, fxh, fyh) are the simulation size M, the sample size n, and the estimated marginals f_X and f_Y; the function returns an (M x 1) vector of Î(X, Y) values for the M simulations. (The sample size n is passed to sim() explicitly so that the function does not rely on a global variable.)

Ih <- function(nxy)
{ ## input:  nxy = (nx x ny) table of counts
  ## output: MI estimate Ih(X,Y)
  n  <- sum(nxy)
  f  <- nxy/n
  fx <- apply(f, 1, sum)        # marginal frequencies of X
  fy <- apply(f, 2, sum)        # marginal frequencies of Y
  fx[fx==0] <- 0.01             # guard against zero frequencies
  fy[fy==0] <- 0.01             # (see the footnote about zero counts)
  f[f==0]   <- 0.01
  fxfy <- fx %*% t(fy)          # product of the marginals
  I <- sum(f*log(f/fxfy))
  return(I)
}

sim <- function(M, n, fxh, fyh)
{ ## input:  M = # simulations, n = sample size,
  ##         fxh = marginal on X, fyh = marginal on Y
  ## output: Ihm = (M x 1) vector of Ih(X,Y)
  nx  <- length(fxh)
  ny  <- length(fyh)
  Ihm <- rep(0, M)              # initialize
  for(m in 1:M) {
    xm <- sample(1:nx, n, prob=fxh, replace=TRUE)
    ym <- sample(1:ny, n, prob=fyh, replace=TRUE)
    nxym <- table(factor(xm, levels=1:nx),  # factor() keeps empty
                  factor(ym, levels=1:ny))  # categories in the table
    Ihm[m] <- Ih(nxym)
  }
  return(Ihm)
}

Ihmstar <- function(Ihm)
{ ## input:  vector Ihm[1..M] of simulated Ih(X,Y) values
  ## output: threshold Ih* for the test
  M <- length(Ihm)
  istar <- round(0.95*M)
  Is <- sort(Ihm)[istar]        # 95th of the M=100 ordered values
  return(Is)
}
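The simulation of step 3 can likewise be sketched independently of the R fragments above. The Python block below (again only a stand-in, not the course's official code) draws M = 100 hypothetical data sets of size n = 120 under independence, with X_i ~ f_X and Y_i ~ f_Y, and reports the threshold Î* as the 95th of the ordered simulated values; zero cells are replaced by 0.5 counts, as the footnote suggests:

```python
import math
import random

fx = [21/120, 19/120, 33/120, 47/120]   # estimated marginal f_X from Table 1
fy = [59/120, 45/120, 16/120]           # estimated marginal f_Y from Table 1
n, M = 120, 100

def ihat(counts):
    """MI estimate from a table of counts; zero cells are replaced by 0.5."""
    c = [[max(v, 0.5) for v in row] for row in counts]
    tot = sum(sum(row) for row in c)
    f = [[v / tot for v in row] for row in c]
    fxh = [sum(row) for row in f]
    fyh = [sum(f[x][y] for x in range(len(f))) for y in range(len(f[0]))]
    return sum(f[x][y] * math.log(f[x][y] / (fxh[x] * fyh[y]))
               for x in range(len(f)) for y in range(len(f[0])))

random.seed(1)                          # fixed seed for a reproducible run
sims = []
for _ in range(M):
    xm = random.choices(range(4), weights=fx, k=n)  # X* ~ f_X
    ym = random.choices(range(3), weights=fy, k=n)  # Y* ~ f_Y, independently
    counts = [[0] * 3 for _ in range(4)]
    for x, y in zip(xm, ym):
        counts[x][y] += 1                           # tabulate n*_XY(x,y)
    sims.append(ihat(counts))

istar = sorted(sims)[94]                # Ihat*_(95): 95th of 100 ordered values
print(round(istar, 3))
```

By construction, istar is the empirical 95th percentile of the simulated null distribution of Î; its exact value varies from run to run with the random seed.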