QUA401 Assessment 2: Group Project
TfL Case study assignment
Deadline: 16th December 2020 at 23:59
Word count: 1,500 to 2,000, not including tables and appendices.
You are the new Director of Operations for Transport for London’s (TfL’s) ‘Santander’ Bike Hire Scheme. The docking stations all over London secure the bikes that customers return, but those bikes are then often not in the best places for customers to hire them again at the start of the next day. Every night, TfL therefore redistributes bikes to the docks where they are needed. The number of bikes hired is recorded automatically each day, and you have the data for each day’s hires between July 2016 and August 2020, together with the daily temperature for every day in the sample. Your job is to analyse this large dataset to address the project tasks outlined on pages 2-4. Your research assistant has also given you access to a real-time interactive map of London, which she thought might help to broaden the picture for you. It can be accessed via this link:
You are required to:
- Work in groups of FOUR from WITHIN your own Webinar class. Please assemble your groups as soon as possible.
- IMPORTANT: YOU MUST EMAIL THE FULL NAMES OF YOUR GROUP MEMBERS TO THE TUTOR WHO TAKES YOUR WEBINAR CLASS NO LATER THAN 20th NOVEMBER. OTHERWISE, THE TUTOR WILL ALLOCATE YOU TO A GROUP AT RANDOM.
- Download the QUA401 Assessment 2 Excel Database workbook from the Assessment area on Blackboard. This is the project dataset.
- Answer all the questions in the project brief as clearly and fully as possible.
- You must use MS Excel to answer all the questions in the project brief, and report your findings in an Analysis Report submitted in MS Word.
MANDATORY SUBMISSION REQUIREMENTS
1. Analysis Report: This report must be in MS Word format and address the requirements of the project brief. It should include all relevant tables, calculations and screenshots taken from your Excel workbook. The document should be presented in the following format: Title page (including the names of all group members); Introduction; Main body (addressing the project tasks outlined on pages 2-4); Conclusion. It should not contain the original data or verbatim copies of the questions; include only the relevant content from the workbook. Please ONLY SUBMIT ONE MS WORD REPORT PER GROUP.
2. Excel Workbook: You must submit the workbook containing your tables, calculations and formulae together with the Analysis Report. Please ONLY SUBMIT ONE MS EXCEL WORKBOOK PER GROUP. We will not accept your submission without this workbook, because we need it to validate and verify your findings. Please ensure that you show all of your workings here; otherwise you will lose marks, even if your answers are correct.
3. Student Contribution: IMPORTANT: The grade for this Group Project will be shared equally amongst all members, so you will all receive the SAME GRADE for the work. Nevertheless, you must clearly indicate, by name, which parts of the project each of you was individually responsible for.
***Important Note*** Please submit both deliverables to the same submission point in Blackboard. Although you are also submitting your Excel workbook, it is crucial that we can fully assess your work based solely on your Analysis Report; we should not need to open your workbook to understand it. In summary, your MS Word submission should stand on its own as a fully complete report.
- From the full set of 1,494 data points supplied, calculate: (i) the mean (average) and median daily number of bikes hired; (ii) the standard deviation of the number of bikes hired; and (iii) to the nearest whole number, the 95% confidence interval for the population mean of bike hires, assuming the standard deviation derived in (ii) above. [10%]
- Using the full set of data supplied from July 2016 to August 2020, generate a histogram to show the distribution of the Number of Bicycles hired over this period. We recommend using the Data Analysis add-in available within Excel to do so. What do you observe about the nature of this distribution? Do you think the Number of Bicycles is normally distributed? What reasons can you suggest that might explain the pattern of this distribution? [10%]
- Why is the assumption that data are normally distributed so important for data analysis? [5%]
- Using the last three weeks of data in the dataset (shown below), use the FORECAST.LINEAR function to predict bicycle usage (Number of Bicycles) for the next 7 days, based on its linear relationship with Date. [5%]
| No. | Season | Days | Date | Temperature | Number of Bicycles |
|-----|--------|------|------|-------------|--------------------|
- Comment on your results. How might these data be used to manage the allocation of work and resources within TfL? [5%]
- Research the nature and use of the RANDBETWEEN function in Excel. You can find out about it using the Help function within any Excel spreadsheet; guidance is also available in the Assessment 2 Excel Database. Use this function to generate a random sample of 500 values between the minimum and maximum values in the ‘Number of Bicycles’ column. Remember that every random sample a group creates will, by definition, be different! [5%]
- Using this sample, calculate the mean, median and sample standard deviation. Comment on the differences, if any, between these measures and the corresponding measures for the main dataset (Task 1(a)). [10%]
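The Task 1(a) measures can be cross-checked outside Excel. The Python sketch below is illustrative only (the brief requires Excel, and the name `summary_with_ci` is hypothetical). It assumes the sample standard deviation, as returned by Excel's `STDEV.S`, and a normal-based interval, mean ± z × s / sqrt(n), whose margin is what Excel's `CONFIDENCE.NORM` returns; using `STDEV.P` or `CONFIDENCE.T` instead would shift the figures slightly.

```python
import math
import statistics

def summary_with_ci(values, confidence=0.95):
    """Mean, median, sample standard deviation and a normal-based confidence
    interval for the population mean (hypothetical helper, for cross-checking)."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Assumption: sample SD, as Excel's STDEV.S (use statistics.pstdev for STDEV.P)
    sd = statistics.stdev(values)
    # Two-sided critical value: about 1.96 for 95%, i.e. Excel's NORM.S.INV(0.975)
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Same margin as Excel's CONFIDENCE.NORM(1 - confidence, sd, n)
    margin = z * sd / math.sqrt(n)
    return {
        "mean": mean,
        "median": median,
        "sd": sd,
        # Rounded to the nearest whole number, as the task asks
        "ci": (round(mean - margin), round(mean + margin)),
    }
```

With the 1,494 daily counts loaded into a list (for example `hires`, a hypothetical name), `summary_with_ci(hires)["ci"]` returns the interval rounded to whole numbers.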
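For the histogram in Task 1(b), Excel's Data Analysis Histogram tool produces a frequency table from a list of bin limits. As we understand its counting rule, each value is placed in the first bin whose upper limit is greater than or equal to it, and anything above the last limit goes into a final "More" bin. The sketch below reproduces that rule so the Excel frequency table can be checked; the function name and the choice of bin limits are assumptions.

```python
from bisect import bisect_left

def excel_style_histogram(values, bin_upper_bounds):
    """Frequency table using the counting rule of Excel's Histogram tool (as we
    understand it): each value goes into the first bin whose upper bound is
    greater than or equal to it; values above the last bound go into 'More'."""
    bins = sorted(bin_upper_bounds)
    counts = [0] * (len(bins) + 1)  # final slot is the 'More' bin
    for v in values:
        counts[bisect_left(bins, v)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))
```

A call such as `excel_style_histogram(hires, range(0, 80_001, 5_000))` uses a 5,000-hire bin width purely as an example; the shape of the distribution can look quite different with wider or narrower bins.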
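`FORECAST.LINEAR(x, known_y's, known_x's)` fits an ordinary least-squares line to the known points and evaluates it at `x`. A hedged Python equivalent for checking the 7-day forecast is sketched below; `forecast_linear` and `forecast_next_days` are hypothetical helper names. Dates are converted with `date.toordinal()` rather than Excel serial numbers; because the two differ only by a constant, the fitted predictions are the same.

```python
from datetime import date, timedelta

def forecast_linear(x, known_ys, known_xs):
    """Ordinary least-squares line through (known_xs, known_ys), evaluated at x.
    Argument order follows Excel's FORECAST.LINEAR(x, known_y's, known_x's)."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(known_xs, known_ys))
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x

def forecast_next_days(dates: list[date], hires: list[float], days_ahead: int = 7):
    """Forecast hires for the `days_ahead` days after the last known date.
    Dates become day numbers via date.toordinal(); Excel serial numbers differ
    from these by a constant, which does not change the fitted predictions."""
    known_xs = [d.toordinal() for d in dates]
    last = max(dates)
    forecasts = []
    for k in range(1, days_ahead + 1):
        day = last + timedelta(days=k)
        forecasts.append((day, forecast_linear(day.toordinal(), hires, known_xs)))
    return forecasts
```

Passing the 21 dates and hire counts from the last three weeks as `dates` and `hires` would return seven (date, forecast) pairs to compare with the Excel results.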
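`RANDBETWEEN(bottom, top)` returns a whole number drawn with equal probability from `bottom` to `top` inclusive, and Excel recalculates it whenever the workbook recalculates, so a sample is usually frozen with Paste Special > Values. The sketch below mimics this with Python's `random.randint`, which is also inclusive at both ends, and then compares the sample's mean, median and sample standard deviation with the full dataset's. The helper names are hypothetical.

```python
import random
import statistics

def randbetween_sample(bottom, top, size=500, seed=None):
    """Return `size` whole numbers drawn uniformly from bottom..top inclusive,
    the same range Excel's RANDBETWEEN(bottom, top) covers for each cell."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [rng.randint(bottom, top) for _ in range(size)]

def describe(values):
    """Mean, median and sample standard deviation (Excel STDEV.S)."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }

def compare(sample, full_data):
    """Difference (random sample minus full dataset) for each summary measure."""
    s, f = describe(sample), describe(full_data)
    return {k: s[k] - f[k] for k in s}
```

For example, `sample = randbetween_sample(min(hires), max(hires))` followed by `compare(sample, hires)`. Because every whole number in the range is equally likely, the sample mean will tend towards the midpoint (min + max) / 2, which need not be close to the mean of the real data.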
Irrespective of the conclusions you reached in your answers to Tasks 1(a) and 1(b) above, for this task you must assume that the number of bicycles hired per day is normally distributed.
The origins of TfL’s ‘Santander’ Bike Hire Scheme go back to July 2010. Please read the following article, published on the BBC website earlier this year:
TfL believes that the number of daily bike hires from the start of the scheme on 31 July 2010 until last summer (27 August 2020) is approximately normally distributed, with a mean of 25,941.87 and a standard deviation of 9,527.853. Use these figures in the tasks that follow.
- Calculate the probability that the daily number of bike hires lies between 20,000 and 40,000. [5%]
- On 9 July 2015, when the entire Tube network was closed because of a strike, bikes were hired 73,094 times, making it the busiest day since the scheme began. The day with the fewest hires was 19 December 2010, with just 2,764. Presumably it was cold and miserable that day! Calculate the probabilities of more than 70,000 hires or fewer than 2,700 hires. [5%]
- In more recent years, TfL has faced increased costs of operating the bike hire service, as well as losses from damage to and theft of bicycles. Under current austerity measures, it is also facing major cuts to its budget, and it has had increasing problems with bicycles not being returned to their original locations soon enough. Secret internal TfL policy minutes recently leaked to the London Evening Standard newspaper indicate that TfL fears that, within five years, it will not be able to meet its target of satisfying 99% of the demand for bicycle hire. Calculate the maximum number of daily bike hires TfL can satisfy in order not to breach this target. What are the implications of this for TfL? [5%]
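The three normal-distribution calculations above use only the model stated by TfL (mean 25,941.87, standard deviation 9,527.853). Python's `statistics.NormalDist` provides the same cumulative distribution and inverse that Excel exposes as `NORM.DIST(x, mean, sd, TRUE)` and `NORM.INV(p, mean, sd)`, so the sketch below can be used to check the spreadsheet answers. Treating the 99% target as the 99th percentile of daily demand is our reading of the task, not something the brief states explicitly.

```python
from statistics import NormalDist

# Mean (m) and standard deviation (s) stated by TfL in the brief
hires = NormalDist(mu=25_941.87, sigma=9_527.853)

# P(20,000 < X < 40,000)
# Excel: =NORM.DIST(40000, m, s, TRUE) - NORM.DIST(20000, m, s, TRUE)
p_between = hires.cdf(40_000) - hires.cdf(20_000)

# Upper and lower tails; the two events cannot both occur on one day,
# so the probability of either is their sum
p_above_70k = 1 - hires.cdf(70_000)   # Excel: =1 - NORM.DIST(70000, m, s, TRUE)
p_below_2700 = hires.cdf(2_700)       # Excel: =NORM.DIST(2700, m, s, TRUE)
p_either = p_above_70k + p_below_2700

# Daily hires exceeded on only 1% of days (99th percentile)
# Excel: =NORM.INV(0.99, m, s)
capacity_99 = hires.inv_cdf(0.99)

for label, value in [
    ("P(20,000 < X < 40,000)", p_between),
    ("P(X > 70,000)", p_above_70k),
    ("P(X < 2,700)", p_below_2700),
    ("P(X > 70,000 or X < 2,700)", p_either),
]:
    print(f"{label}: {value:.6f}")
print(f"99th percentile of daily hires: {capacity_99:,.0f}")
```

The tail probabilities are reported both separately and combined, since the question can be read either way.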
For this task, you must perform analyses on (i) the whole dataset, from 26/07/2016 to 27/08/2020, and (ii) a subset containing the last 365 days of the dataset, from 29/08/2019 to 27/08/2020:
- Using correlational analysis, establish whether there is a relationship between temperature and the number of bikes hired for (i) the whole dataset and (ii) the 365-day subset. Visualise both relationships in Excel using scatter plots. Insert these scatter plots on separate worksheets and maximise their size for visual clarity. Comment on the difference, if any, between the respective correlation coefficients. [10%]
- Using Regression Analysis, further explore the relationship between temperature and bicycle hires for both the whole and the 365-day datasets. Interpret the outcomes and produce an appropriate narrative to accompany the statistics:
(i) Comment on the Regression equations.
(ii) Comment on the R² values and the Significance values (also known as the p-values).
(iii) Using your regression output for the 365-day dataset, state the formula that represents the equation of the regression line. Use this formula to predict the expected daily bicycle hires if the temperature were to rise to 33 degrees Celsius.
(iv) Out of the two regression models, which do you think describes the data better? Support your answer. [15%]
- Which other data do you think would be useful to further aid your analyses? Where might this data be obtained from? Do you think that adding further variables to explain levels of daily bicycle hire will always give better predictive forecasts? [10%]