Introductory Statistics Unit

HOW TO PRESENT YOUR ASSIGNMENT
Answers should be typed directly into this Word document. The assignment must be uploaded to the iLearn link as a pdf document.
SUBMISSION INSTRUCTIONS
This assessment must be submitted as a pdf file via iLearn. One assignment should be submitted (by any group member) per group. Hard copies will not be accepted or marked. If you cannot access or view the submission via iLearn, you have not correctly submitted your work and will need to do so prior to the due date. (If you require assistance submitting through iLearn, you may lodge a OneHelp Ticket or refer to the IT help page.)
LATE SUBMISSIONS
Late submissions will be penalised at 20% per day up to 4 days (weekend inclusive). If a student submits an assessment task 5 or more days after the due date without grounds for special consideration (see Disruptions to Studies Policy), a record of submission will be made but the student will receive zero marks for the assessment task.
RETENTION OF RECORDS
Students are required to keep a copy of all items submitted or completed for the purpose of assessment or evaluation until the end of the grade appeal period.
Question 1 is to be jointly completed by the group. Question 2 has three parts. Each part should be completed by a different group member and then checked by all group members.
Question   Print names of students who completed question   Signatures   Marks Possible   Marks Allocated
1                                                                        24
2a                                                                       12
2b                                                                       12
2c                                                                       12
Total                                                                    60
Question 1 (24 marks)
A researcher decided to investigate the relationship between student enrolment numbers and teaching staff in remote schools. The following Minitab output was obtained using information recorded on student enrolment numbers and teaching staff in 2014 at 25 very remote schools in Australia. Use this to answer parts a. and b.
Figure i:
Figure ii:
Figure iii:
a. Comment on the validity of each of the three assumptions for the linear model by referring to each of the plots above:
i:
ii:
iii:
b. The largest residual was for Goodooga Central School which had the equivalent of 9.4 teaching staff in 2014 and 35 enrolments in 2014. Use this information, and the Fitted Line Plot above, to calculate the residual for Goodooga Central School and explain clearly what is unusual about Goodooga Central School.
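A residual is the observed value minus the value on the fitted line at the same enrolment. The arithmetic can be sketched in Python as below. The function names `residual` and `goodooga_residual` are hypothetical, and the fitted value has to be read off the Fitted Line Plot, which is not reproduced in this text, so it is left as an argument rather than filled in.

```python
def residual(observed: float, fitted: float) -> float:
    """Residual = observed response minus fitted (predicted) response."""
    return observed - fitted


def goodooga_residual(fitted_at_35: float) -> float:
    """Residual for Goodooga Central School (9.4 teaching staff, 35 enrolments).

    `fitted_at_35` is the height of the fitted line at Enrolments = 35.
    It must be read from the Fitted Line Plot; no value is assumed here.
    """
    return residual(9.4, fitted_at_35)
```

A positive result means the school had more teaching staff than the line predicts for its enrolment; a negative result means fewer.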

Question 1 continued
The following output was obtained using information from 2014 on 24 of the very remote schools (excluding Goodooga Central). Use this information to answer part c.
Regression Analysis: Teachers 2014 versus Enrolments 2014
Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
1.04378   88.27%   87.73%      86.39%
Coefficients
Term              Coef      SE Coef   T-Value   P-Value
Constant          0.504     0.383     1.31      0.203
Enrolments 2014   0.09146   0.00711   ******    ******
c. Why do you think that Goodooga Central School was excluded from this analysis?
d. Use an appropriate hypothesis test to determine whether the number of student enrolments at very remote schools in 2014 was a useful predictor of the number of teaching staff:
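One way to approach the test of whether enrolments is a useful predictor is a t-test on the slope, with null hypothesis that the slope is zero. The sketch below is an illustration only, not the required Minitab working. It computes the test statistic that is starred out in the output as Coef divided by SE Coef, and checks the same formula against the 2008 slope row later in this assignment, where Minitab reports 8.99. The function name `t_statistic` is hypothetical. The degrees of freedom are assumed to be n - 2 = 22 for a simple linear regression on 24 schools, and the critical value 2.074 is the two-sided 5% value for 22 degrees of freedom taken from standard t tables.

```python
def t_statistic(coef: float, se_coef: float) -> float:
    """t = estimated coefficient / its standard error (tests H0: coefficient = 0)."""
    return coef / se_coef


# 2014 slope row from the output above: Coef = 0.09146, SE Coef = 0.00711.
t_2014 = t_statistic(0.09146, 0.00711)

# Assumption: simple linear regression on 24 schools, so df = n - 2.
df = 24 - 2

# Two-sided 5% critical value of t with 22 df, from standard tables.
T_CRIT_22 = 2.074

print(round(t_2014, 2))          # 12.86
print(abs(t_2014) > T_CRIT_22)   # True

# Sanity check of the formula on the 2008 slope row (Minitab reports 8.99).
print(round(t_statistic(0.08825, 0.00982), 2))  # 8.99
```

The hypotheses, the P-value and the conclusion in context still need to be written out in the answer.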

Question 1 continued
A previous study of the 24 schools from parts c. and d. had been conducted in 2008. The following output was obtained using some of the information recorded on these 24 schools in 2008. Use this output, together with your answers to parts c. and d. to answer the remainder of this question.
Regression Analysis: Teachers 2008 versus Enrolments 2008
Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
1.52778   78.59%   77.61%      73.91%
Coefficients
Term              Coef      SE Coef   T-Value   P-Value
Constant          0.764     0.577     1.32      0.199
Enrolments 2008   0.08825   0.00982   8.99      0.000
e. In 2008, did the school with the highest number of student enrolments also have the highest number of teachers?
f. How many students (approximately) were enrolled at the school with the highest number of teachers in 2008?
g. Explain why the model for predicting teaching staff in 2014 will give more reliable predictions than the model for predicting teaching staff in 2008.
h. Make the following predictions, if possible. If not, explain why your predictions would not be valid.
i. Predict the number of teachers in 2008 at a very remote Australian school which had 50 students enrolled.
ii. Predict the number of teachers in 2014 at a very remote Australian school which had 50 students enrolled.
iii. Predict the number of teachers in 2014 at a very remote Australian school which had 150 students enrolled.
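The arithmetic behind the three predictions above can be sketched in Python as follows, using the Constant and slope coefficients from the two Coefficients tables. The function name `predict_teachers` and its `observed_range` parameter are hypothetical. The range of enrolments in the data is not stated in the output, so the range check is left optional; a number printed here is only the value on the fitted line and does not by itself show that the prediction is valid.

```python
from typing import Optional, Tuple


def predict_teachers(
    intercept: float,
    slope: float,
    enrolments: float,
    observed_range: Optional[Tuple[float, float]] = None,
) -> Optional[float]:
    """Predicted teaching staff from a fitted line.

    Returns None when `observed_range` is given and `enrolments` falls
    outside it, since that would be extrapolation beyond the data.
    """
    if observed_range is not None:
        low, high = observed_range
        if not (low <= enrolments <= high):
            return None
    return intercept + slope * enrolments


# (Constant, slope) taken from the Coefficients tables above.
MODEL_2008 = (0.764, 0.08825)
MODEL_2014 = (0.504, 0.09146)

print(round(predict_teachers(*MODEL_2008, 50), 2))   # 5.18
print(round(predict_teachers(*MODEL_2014, 50), 2))   # 5.08
print(round(predict_teachers(*MODEL_2014, 150), 2))  # 14.22 (no range check applied)
```

Passing the smallest and largest enrolments seen in the data as `observed_range` makes the function refuse any prediction that lies outside them.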

Question 2 (36 marks)
An educational researcher carried out a study into standardised examinations for high school students. 150 Year 7 students were randomly selected from schools across Australia in 2011 and various information was recorded on these students over a four-year period. Some of the variables recorded were:
Variable Description
ID Student ID
School_Type School type attended by student:
1 = Government, 2 = Catholic, 3 = Independent
School_Enrolment School enrolment of school attended by student:
1 = Co-educational, 2 = Girls, 3 = Boys
Maths_Year7 Student’s mark on a standardised Mathematics exam in Year 7
Maths_Year11 Student’s mark on a standardised Mathematics exam in Year 11
English_Year7 Student’s mark on a standardised English exam in Year 7
English_Year11 Student’s mark on a standardised English exam in Year 11
The data are stored in the Excel file: ExamMarks.xlsx which is on iLearn. Use this data file to complete Question 2.
Each part of this question must be labelled, presented in order, and neatly word processed. Untidy work will not be marked. Each part (a., b. and c.) should be answered by a different student from your group.
Each part of this question uses a different section of the data set, which should be extracted (either by sorting the data or splitting the worksheet). Each part asks about the difference between average marks for a particular group of students. Since the worksheet contains data on both English and Mathematics marks, each part should address both English and Mathematics marks (i.e. one hypothesis test on English marks and one hypothesis test on Mathematics marks for each part).
Minitab output for these analyses, along with any other appropriate graphical output, should be cut and pasted neatly into the space provided for each part. Please note that output that is not clearly labelled will not receive any marks (e.g. Group 1 is not a clear label, whereas Co-educational School is a clear label).
On the following page provided for each part, a report on these analyses should be written up. One report only is required for each part, with the report addressing both English and Mathematics scores in regard to the Research Question. Reports should follow the format described in the report writing document on iLearn, with particular attention to the section ‘A Short Guide to Report Writing for STAT170 students’. Each report will have an Introduction, a Methods section, a Results section and a Conclusion, as outlined in the report writing document. Reports should not be more than one A4 page in length.
Research Questions for Q2:
a. Amongst students attending boys only schools, did the average marks on standardised exams change significantly between Years 7 and 11?
b. Amongst Year 11 students, were the average marks on standardised exams for students at Government schools different to the average marks for students at Independent schools?
c. Amongst Year 7 students, were the average marks on standardised exams for students at boys only schools different to the average marks for students at girls only schools?
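One reading of the three research questions is that part a. compares the same students at two time points (a paired design), while parts b. and c. compare two separate groups of students (a two-sample design). The sketch below shows the two test statistics in plain Python so the difference is visible. It is an illustration only and does not replace the Minitab output the assignment requires. The function names are hypothetical, the two-sample version uses the unpooled (Welch) form as an assumption, and the marks in the example are made up, not taken from ExamMarks.xlsx.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and df, based on the differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def two_sample_t(group1, group2):
    """Unpooled (Welch) two-sample t statistic and its approximate df."""
    n1, n2 = len(group1), len(group2)
    v1 = stdev(group1) ** 2 / n1  # squared standard error of mean 1
    v2 = stdev(group2) ** 2 / n2  # squared standard error of mean 2
    t = (mean(group1) - mean(group2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up marks purely to exercise the functions; NOT from ExamMarks.xlsx.
year7 = [50, 60, 70]
year11 = [55, 63, 78]
print(paired_t(year7, year11))         # same students, two time points

school_a = [1, 2, 3]
school_b = [2, 4, 6]
print(two_sample_t(school_a, school_b))  # two independent groups
```

The paired version works on one column of differences, which is why it needs both marks for every student; the two-sample version treats the groups as unrelated and allows them to differ in size.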

Part a Output:

Part a Report
Introduction:
Methods:
Results:
Conclusion:

Part b Output:

Part b Report
Introduction:
Methods:
Results:
Conclusion:

Part c Output:

Part c Report
Introduction:
Methods:
Results:
Conclusion: