# Tests your basic sample analysis skills, using spreadsheet software.

This coursework tests your basic sample analysis skills, using spreadsheet software. You may use Minitab or Excel at your preference. Your answers are to be presented in an essay/report format, for which you will use a word processor. In writing your report, please:

1) state and explain all assumptions, on which your answers are based;

2) clearly indicate your answer/recommendations

3) support any answers with the appropriate calculations to arrive at the answer

4)no evidence of use of excel will result in a fail mark for this assignment and therefore the coursework component of the module;

5) include selected printouts of formulae underlying computed values. Failure to demonstrate you have created appropriate formulations on excel will be severely penalised. Despite the fact that you will be submitting the Excel file as well, your report is a stand-alone document, meaning a reader should not be required to look at the Excel file to understand your analysis, findings and recommendations

6) please note that adequate usage of the excel calculations in the report is important. This means that the key data/findings need to be included in the report and appropriate referencing needs to be done, i.e. the relevant cell/table/range in the relevant tab of the excel file mentioned at the point of the report when it should be consulted.

The report will have a maximum of 5 pages (including any Appendixes; penalties will be applied for longer submissions – you are required to develop your judgement on what is and isn’t important). The file contacting the data for this coursework is posted on Moodle. There are two excel sheets in the file, use the data contained in the first sheet to answer question 1 and use the data contained in the second sheet to answer question 2.

Question 1.

A group of 44 students was asked to guess, to the nearest metre, the width of the lecture hall in which they were sitting. The true width of the hall was 13.1 metres. The students’ guesses are reported in the first excel spreadsheet in the file “Coursework dataset”. Answer the following questions:

a) Create a frequency table of the values of the guesses. Also, include relative

frequencies (percentages) and cumulative percentages in the table. Comments on your findings.

b) Calculate the sample mean, median and standard deviation of the guesses.

Comment on the characteristics of the sample of guesses.

c) Compute the sample skewness and kurtosis of the guesses and comment on the characteristic of the sample that you can learn from these statistics.

Question 2.

The second spreadsheet of the “Coursework dataset” file contains a sample of wealth data for the UK population. The data is presented in a grouped format. Please answer the following questions regarding this sample:

a) Discuss what would be the best way to present the data graphically.

b) Construct a histogram of wealth and comment on wealth distribution.

c) Compute the relevant measures of central tendency (hint: you’ll need to make an assumption regarding the size of the final class) and a measure of variation for this data.

d) Discuss and justify which of the metrics of central tendency above would be more appropriate to describe the data.

e) Compute the first and the third quartile of wealth and construct a boxplot. Comment on your results.

f) Compute the sample skewness and kurtosis and comment on the features of the sample distribution. Is it symmetric?

g) Calculate the proportion of population with a wealth above £100,000.

h) Calculate the probability that an individual selected at random will aheva wealth of no more than £ 50,000.

Question 3.

The third spreadsheet of the “Coursework dataset” file contains data on the salaries of employees in a bank, as well as some other variables which might have an impact on the employee’s salaries. Answer all the following questions:

a) Draw a histogram of SALARY. What does it tell you?

b) For each variable in the data set explain which type of variable it is (categorical, numerical…)

c) Calculate the sample mean, median, standard deviation and first and third quartiles of all the variables in column B to F. Also, draw boxplots of the variables side by side. What can you infer?

d) Draw a scatterplot with SALARY on the horizontal (X) axis and one of the other variables on the vertical (Y) axis. Do this for all the variables in columns B to G. Comment on your findings.

e) Prepare a summary table with the correlations between the variables and discuss the extent of these correlations.

You will need to submit a Word document with the report (see instructions above) and an Excel file with the calculations.

Question weighting:

Question 1 = 10%

Question 2 = 40%

Question = 50%.