QUESTION
(Data exercise.) You should use Stata for parts (j) and (k) of this problem.
Are rental rates influenced by the student population in a college town? Let rent be the average monthly rent paid on rental units in a college town in the United States. Let pop denote the total city population, avginc the average city income, and pctstu the student population as a percentage of the total population.
One model to test for a relationship is
ln(rent) = β0 + β1 ln(pop) + β2 ln(avginc) + β3 pctstu + u
a) State the null hypothesis that size of the student body relative to the population has no ceteris paribus effect on monthly rents. State the alternative that there is an effect. (Write the hypotheses as statements about the relevant model parameter(s).)
b) What signs do you expect for β1 and β2?
Here is the equation estimated using 1990 data from rental.dta for 64 college towns (with standard errors in parentheses):
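\[ \begin{array}{l} \widehat{\text{lnrent}} = \underset{(.844)}{.043} + \underset{(.039)}{.066}\,\text{lnpop} + \underset{(.081)}{.507}\,\text{lnavginc} + \underset{(.0017)}{.0056}\,\text{pctstu} \\ n = 64, \quad R^{2} = .458 \end{array} \]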
Here are the summary statistics and correlation matrix for the four variables:
    Variable |  Obs       Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------
      lnrent |   64   6.026034     .200436   5.717028   6.829794
       lnpop |   64   11.16897    .6245325   10.16119   13.35808
    lnavginc |   64   10.04073    .2556954   9.133676   10.93857
      pctstu |   64   27.84786    13.61892   11.45658   71.20982
             |   lnrent    lnpop lnavginc   pctstu
-------------+------------------------------------
      lnrent |   1.0000
       lnpop |   0.2195   1.0000
    lnavginc |   0.6029   0.3692   1.0000
      pctstu |   0.0598  -0.5869  -0.3127   1.0000
c) Interpret the slope coefficient of 0.0056 on pctstu. Be careful when stating the units.
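A quick numeric check can help with the units in part (c). The sketch below assumes the usual reading of a log-level model: because the dependent variable is ln(rent) and pctstu is measured in percentage points, 100 times the coefficient approximates the percent change in rent per one-point increase in pctstu. The function name `pct_change_in_rent` is made up for illustration.

```python
import math

B3_HAT = 0.0056  # estimated coefficient on pctstu, from the reported equation

def pct_change_in_rent(delta_pctstu, b3=B3_HAT):
    """Return (approximate, exact) percent change in rent for a change of
    delta_pctstu percentage points in pctstu, holding lnpop and lnavginc fixed."""
    approx = 100 * b3 * delta_pctstu                  # 100 * b3 * (change in pctstu)
    exact = 100 * (math.exp(b3 * delta_pctstu) - 1)   # exact change implied by the log model
    return approx, exact

approx_1pt, exact_1pt = pct_change_in_rent(1)
approx_10pt, exact_10pt = pct_change_in_rent(10)
print(f"+1 point:   approx {approx_1pt:.3f}%, exact {exact_1pt:.3f}%")
print(f"+10 points: approx {approx_10pt:.3f}%, exact {exact_10pt:.3f}%")
```

The approximate and exact figures stay close for small changes in pctstu and drift apart as the change grows.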
d) If lnavginc were omitted from the regression, what do you think would happen to the coefficient on pctstu? Explain briefly.
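One way to reason about part (d) is through the sign of the omitted-variable bias. By standard OLS algebra, the pctstu coefficient in the short regression equals the long-regression coefficient plus β̂2 times the pctstu coefficient from a regression of lnavginc on lnpop and pctstu. That auxiliary coefficient has the same sign as the partial correlation between lnavginc and pctstu given lnpop, which can be computed from the correlation matrix above. The sketch below is only a sign check under those assumptions, not the regression itself; `partial_corr` is a hypothetical helper.

```python
import math

# Pairwise correlations copied from the correlation matrix in the problem.
R_INC_STU = -0.3127   # corr(lnavginc, pctstu)
R_INC_POP = 0.3692    # corr(lnavginc, lnpop)
R_STU_POP = -0.5869   # corr(pctstu, lnpop)
B2_HAT = 0.507        # estimated coefficient on lnavginc

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after partialling out z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Partial correlation of lnavginc and pctstu, holding lnpop fixed.
r_partial = partial_corr(R_INC_STU, R_INC_POP, R_STU_POP)

# Sign of the bias = sign(coefficient on omitted variable) * sign(partial correlation).
bias_sign = math.copysign(1, B2_HAT * r_partial)
print(f"partial corr = {r_partial:.4f}, bias sign = {bias_sign:+.0f}")
```

A negative bias sign suggests the pctstu coefficient would fall when lnavginc is dropped; part (j) is where this can be confirmed with the actual data.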
e) Using the estimates from above and one of the critical values below, test the hypothesis stated in part (a) at the 5% level. Explain why you chose the critical value that you did.
[Hint: you will need to calculate the t-statistic for β̂3.]
Stata note: the expression invttail(df,p) gives the value t of a t-distributed random variable T with df degrees of freedom for which Pr(T > t) = p. So, it gives us the “critical value” that leaves probability p in one tail of a t-distribution with df degrees of freedom.
. display invttail(60,.005)
2.660283
. display invttail(60,.01)
2.3901195
. display invttail(60,.025)
2.0002978
. display invttail(60,.05)
1.6706489
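The test in part (e) can be sketched numerically as follows. This assumes the two-sided hypotheses H0: β3 = 0 against H1: β3 ≠ 0, so a 5% test puts .025 in each tail and uses the invttail(60,.025) value quoted above; the degrees of freedom are n minus the number of slopes minus one. Variable names are illustrative.

```python
N_OBS = 64
N_SLOPES = 3                      # lnpop, lnavginc, pctstu
DF = N_OBS - N_SLOPES - 1         # residual degrees of freedom

B3_HAT, SE_B3 = 0.0056, 0.0017    # estimate and standard error from the reported equation
CRIT_5PCT_TWO_SIDED = 2.0002978   # invttail(60, .025), copied from the Stata output above

t_b3 = B3_HAT / SE_B3             # t-statistic for H0: beta3 = 0
reject_h0 = abs(t_b3) > CRIT_5PCT_TWO_SIDED

print(f"df = {DF}, t = {t_b3:.3f}, reject H0 at 5%: {reject_h0}")
```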
f) Now focus on the slope coefficient on lnpop (β1). Construct a (two-sided) 90% confidence interval for β1. (Continue to use the estimates from part (b) and choose the appropriate critical values from part (e).) Write a statement that explains this 90% confidence interval.
g) Construct a (two-sided) 95% confidence interval for β1. (Again, choose the appropriate critical value from part (e).)
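The intervals in parts (f) and (g) follow the usual form: estimate ± critical value × standard error. The sketch below assumes a two-sided 90% interval uses invttail(60,.05) and a two-sided 95% interval uses invttail(60,.025), both copied from the Stata output in part (e); `conf_int` is a hypothetical helper.

```python
B1_HAT, SE_B1 = 0.066, 0.039      # estimate and standard error for lnpop

# Two-sided critical values for 60 degrees of freedom, from the Stata output in part (e).
CRIT = {90: 1.6706489,            # invttail(60, .05)
        95: 2.0002978}            # invttail(60, .025)

def conf_int(estimate, se, crit):
    """Two-sided confidence interval: estimate +/- crit * se."""
    half_width = crit * se
    return estimate - half_width, estimate + half_width

ci90 = conf_int(B1_HAT, SE_B1, CRIT[90])
ci95 = conf_int(B1_HAT, SE_B1, CRIT[95])

print(f"90% CI: ({ci90[0]:.6f}, {ci90[1]:.6f})")
print(f"95% CI: ({ci95[0]:.6f}, {ci95[1]:.6f})")
```

Whether each interval contains zero is the piece of information that part (h) relies on.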
h) Based on your answers to (f) and (g), what can you conclude about the (2-sided) p-value for β̂1?
A. p > .10
B. p = .10
C. p = .05
D. .05 < p < .10
E. p < .05
i) Use the output in (b) to calculate the t-statistic for β̂1. It is: __________. Now use the output from the Stata command:
. display ttail(60,1.70)
.04715491
Stata note: the expression ttail(df,t) gives the probability p = Pr(T > t) for a t-distributed random variable T with df degrees of freedom. So, it gives us the probability in one tail of a t-distribution with df degrees of freedom above the value t.
Use this information (including knowledge of the t-statistic you calculated) to obtain the (two-sided) p-value associated with β̂1. Your answer here should be consistent with your answer to part (h).
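The arithmetic for part (i) can be sketched as below. It assumes the two-sided p-value is twice the one-tail probability and reuses the ttail(60,1.70) value printed above. Note that the quoted tail probability is for t = 1.70, a rounding of the computed t-statistic, so the resulting p-value is approximate.

```python
B1_HAT, SE_B1 = 0.066, 0.039      # estimate and standard error for lnpop
ONE_TAIL_AT_1_70 = 0.04715491     # ttail(60, 1.70), copied from the Stata output above

t_b1 = B1_HAT / SE_B1             # t-statistic for H0: beta1 = 0
p_two_sided = 2 * ONE_TAIL_AT_1_70  # both tails of the symmetric t-distribution

print(f"t = {t_b1:.4f}, two-sided p is roughly {p_two_sided:.4f}")
```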
j) Now estimate the regression in part (b) yourself and also estimate the model omitting lnavginc. Use the output to verify your answers to parts (d), (e), (f), (g), (h) & (i).
Stata notes:
The data set rental.dta contains data from two years, 1980 and 1990.
The variable “year” takes on one of two values (80 or 90).
To estimate the regression for the 1990 data only, type “if” after the last variable in the regression statement and then the expression “year==90”. Note that there is no comma before “if”:
regress y x1 x2 x3 if year==90
You should run the command once with the default significance level of 5% (and 95% confidence intervals).
Then run it again with the option (typed after a comma): , level(90)
to get output corresponding to the 10% significance level and 90% confidence intervals.
k) Based on your regression output from part (j), which of the three variables (lnpop, lnavginc, pctstu) are statistically significant at each of the following levels? (List the variables in each case.)
A. a 10% level?
B. a 5% level?
C. a 1% level?
D. a 0.1% level?
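Before running the regression, the classification in part (k) can be previewed from the rounded estimates reported in part (b). This is only a rough sketch: the actual Stata output in part (j) carries more digits, and lnpop in particular sits close to the 10% cutoff. The 10%, 5%, and 1% critical values are the invttail(60, ·) values from part (e); the 0.1% value of 3.460 is not given in this problem and is taken from a standard t-table for 60 degrees of freedom, so treat it as an assumption.

```python
# (estimate, standard error) pairs from the reported equation in part (b).
ESTIMATES = {
    "lnpop": (0.066, 0.039),
    "lnavginc": (0.507, 0.081),
    "pctstu": (0.0056, 0.0017),
}

# Two-sided critical values for 60 degrees of freedom.
CRIT_TWO_SIDED = {
    "10%": 1.6706489,   # invttail(60, .05)
    "5%": 2.0002978,    # invttail(60, .025)
    "1%": 2.660283,     # invttail(60, .005)
    "0.1%": 3.460,      # assumption: standard t-table value, not from the problem
}

# For each level, list the variables whose |t| exceeds the critical value.
significant = {
    level: [name for name, (b, se) in ESTIMATES.items() if abs(b / se) > crit]
    for level, crit in CRIT_TWO_SIDED.items()
}

for level, names in significant.items():
    print(f"{level}: {', '.join(names)}")
```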