1. There is a formula for the sample size n, given margin of error m and confidence level C, for a population proportion:

\[ n = \frac{1}{4}\left(\frac{z^*}{m}\right)^2 \]

What assumption is made for the 1/4 in the formula above? Please give a brief 1-2 sentence explanation of your choice.
(a). Random guess
(b). Assume the future sample proportion is 1/2
(c). m is half the length of the confidence interval

2. Which one is NOT a linear regression model? Please give a brief 1-2 sentence explanation of your choice.
(a). \( y_i = \beta_0 + \exp(\beta_1 x_i) + \varepsilon_i, \quad i = 1, 2, \dots, n \)
(b). \( y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \quad i = 1, 2, \dots, n \)
(c). \( y_i = \beta_0 \exp(x_i) + \beta_2 x_i^7 + \varepsilon_i, \quad i = 1, 2, \dots, n \)

3. Suppose X and Y have linear correlation coefficient r = 0.5, and there are 77 observations. What is the test statistic for the hypothesis test \( H_0: \beta_1 = 0 \) vs. \( H_a: \beta_1 \neq 0 \), where \( \beta_1 \) comes from the simple linear regression model below? Please give a brief 1-2 sentence explanation of your choice.

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

(a). Not enough information
(b). 5
(c). 0.25

[Figure: residual plot of ehat against yhat]

4. What can you see from the graph above? Please give a brief 1-2 sentence explanation of your choice.
(a). Non-constant variance
(b). Dependent error terms
(c). Non-linearity

5. Which model is more likely to have the smaller \( R^2 \)? Please give a brief 1-2 sentence explanation of your choice.

A: \( Y = \beta_0 + \beta_1 X_1 + \varepsilon \)
B: \( Y = \tilde{\beta}_0 + \tilde{\beta}_1 X_1 + \tilde{\beta}_2 X_2 + \tilde{\varepsilon} \)

where Y and \( X_1 \) in models A and B are the same.
(a). Not enough information
(b). Model A
(c). Model B

6. Suppose we have designed an experiment to compare 5 different treatments on the weight gain of rats' livers, and \( \mu_i \) is the population mean weight gain of rats under treatment i. Which one below is not a contrast? Please give a brief 1-2 sentence explanation of your choice.
(a). \( \mu_2 = \mu_4 \)
(b). \( 2\mu_3 - \mu_4 = \mu_1 - 2\mu_2 \)
(c). \( \mu_3 = (\mu_2 + 2\mu_4)/3 \)

Problem 2 (18 points)
Suppose we have the partial R output below. Please answer the questions and show your steps.

            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ---     0.1780   0.850    0.402
x            -0.2552        ---  -1.418    0.166

Residual standard error: 1.006 on 30 degrees of freedom
Multiple R-squared: 0.06284, Adjusted R-squared: 0.0316

(a). [3 points] What is the SSE of the model?
(b). [3 points] What is the linear correlation coefficient between x and y?
(c). [3 points] Suppose we know \( \bar{y} \) is 1.44. What is \( \bar{x} \)?
(d). [3 points] Find the P-value for \( H_0: \beta_1 = 0.5 \) vs. \( H_a: \beta_1 < 0.5 \).
(e). [3 points][Bonus] What is the 95% CI for \( \hat{y} \) at x = 3?
(f). [3 points] Based on the output, can you give a reasonable guess of the probability that y at x = 2 is larger than 0.2?

Problem 3 (8 points)
Suppose Y is annual income ($1000/year), \( X_1 \) is educational level (number of years of schooling), \( X_2 \) is number of years of work experience, and \( X_3 \) is gender (\( X_3 = 0 \) for male, \( X_3 = 1 \) for female). After fitting a linear regression to the collected data, and assuming the estimated parameters are the true values, we have

\[ Y = 15 + 0.8X_1 + 0.5X_2 - 3X_3 + \varepsilon, \quad \text{where } \varepsilon \sim N(0, 3^2) \]

(a). [2 points] What is the average difference in annual income between women and men if their other conditions are the same?
(b). [2 points] What is the average annual income of a female with 10 years of education and 10 years of work experience? What about a male under the same conditions?
(c). [2 points] What is the probability that a female with 16 years of education and no work experience will earn more than $30,000/year?
(d). [2 points] Suppose a female has 15 years of education and 4 years of work experience. How many more years of work experience are needed for her expected annual income to be no less than $28,000?

Problem 4 (12 points)
A fisheries biologist is interested in determining a set of optimal conditions for growing hatchery trout. The two factors that are most easily controlled at the hatchery are water temperature (A) and fungicide (B).
The biologist designs an experiment consisting of 2 different water temperatures, 3 different levels of fungicide, and 5 observations on each of the temperature-fungicide combinations. The following summary data resulted for the response variable y = weight of a hatchery trout.

Source of Variation   df    Sum Squares   Mean Square   F-value
Water Temperature     ( )   ( )           72            ( )
Fungicide             ( )   18            ( )           ( )
Interaction           ( )   ( )           5             ( )
Error                 ( )   100           ( )

(a). [4 points] Fill in the missing entries above.
(b). [5 points] Test the null hypothesis of no interaction between water temperature and level of fungicide. If appropriate, also perform tests of the main effects for the two factors, Water Temperature and Fungicide. Use significance level \( \alpha = 5\% \).
(c). [3 points] What conclusion will you draw based on the analysis in part (b)?

Problem 5 (12 points)
Rats were given one of four different diets at random, and the response measure was liver weight as a percentage of body weight. There are 5 observations for Treatment 1, 6 for Treatment 2, 6 for Treatment 3 and 8 for Treatment 4. Below is some R output from the analysis:

> m<-lm(Liver~as.factor(Treatment),data=rat)
> anova(m)
                     Df  Sum Sq  Mean Sq F value Pr(>F)
as.factor(Treatment)  3     (4) 0.192736     (2)    ---
Residuals           (1)     (3)      ---
> summary(m)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     3.75    0.07688  48.720 1010µ4.

Problem 6 (10 points)
Use the data in ‘P6 data.txt’, which has a single response Y and three predictors \( X_1 \), \( X_2 \) and \( X_3 \).
(a). [4 points] Perform the backward and forward variable selection procedures using AIC as the criterion for this data set. Please report your final models first, then show your R commands and the related output.
(b). [3 points] In this specific case, which variable selection procedure (backward or forward) is better? Why?
(c). [3 points] Find the AIC, BIC and \( C_p \) for the model lm(Y ~ X1 + X3).

Problem 7 (42 points)
Pine oleoresin is obtained by tapping the trunks of pine trees.
Tapping is done by cutting a hole in the bark and collecting the resin that oozes out. This experiment compares four shapes for the holes and the efficacy of acid treating the holes. Twenty-four pine trees are selected at random from a plantation, and the 24 trees are assigned at random to the eight combinations of hole shape (circular, diagonal slash, check, rectangular) and acid treatment (yes or no). The response (y) is total grams of resin collected from the hole. The data are available in ‘resin.csv’.
(a). [2 points] Summarize the data graphically (interaction plots and profile plots) and describe what you can see from the graphs.
(b). [2 points] Build a two-way ANOVA containing all the main effects and the interaction effects. Are all the effects significant?
(c). [7 points][Bonus] If some effects are not significant, what does that mean? Please show how to interpret the p-value in the first row of your ANOVA table. A general answer found elsewhere is not good enough; I need you to state your understanding for this specific case (the specific test, specific sample size, specific population and so on).
(d). [3 points] Please check all the assumptions of this two-way ANOVA (the model in part (b)) graphically and comment specifically on each assumption.
(e). [3 points] Do you think a transformation of the data is necessary? If yes, how? If no, why?
(f). [3 points] Based on the model suggested in part (e), please simplify the model by dropping the insignificant terms (you need to check the assumptions whenever you have a new or different model).
(g). [2 points] Please draw conclusions from your two-way ANOVA.
(h). [20 points] Now that you have completed the full analysis for this problem, please compile a brief statistical report presenting it as a whole. You can follow the format of the example report. The report should be no more than 3 pages (including all the essential graphs).
Hint: the total of 20 points consists of three parts: 1. conclusion (5 points); 2. way of analysis (8 points); 3. format (7 points).
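
As a starting point for Problem 7, the workflow in parts (a), (b) and (d) could be sketched in R roughly as follows. This is only an illustration under assumptions: the column names `shape`, `acid` and `resin` and the helper name `analyze_resin` are hypothetical, since the problem does not state how ‘resin.csv’ is laid out.

```r
# Sketch only: 'shape', 'acid' and 'resin' are assumed column names;
# the problem does not say how 'resin.csv' is organized.
analyze_resin <- function(resin_data) {
  resin_data$shape <- as.factor(resin_data$shape)
  resin_data$acid  <- as.factor(resin_data$acid)

  # (a) Interaction plot: mean resin yield per shape, one trace per acid level.
  with(resin_data, interaction.plot(shape, acid, resin))

  # (b) Two-way ANOVA with both main effects and the shape-by-acid interaction.
  fit <- aov(resin ~ shape * acid, data = resin_data)
  print(summary(fit))

  # (d) Standard diagnostic plots (residuals vs fitted, normal Q-Q, ...).
  old_par <- par(mfrow = c(2, 2))
  plot(fit)
  par(old_par)

  invisible(fit)
}

# Usage, assuming the file sits in the working directory:
# fit <- analyze_resin(read.csv("resin.csv"))
```

If the diagnostic plots suggest a transformation, as part (e) asks you to consider, the same function can be reused after adding the transformed response to the data frame under the assumed `resin` column name.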