
Items to be Submitted: R scripts, R environment files, modified CSV files, interactive app for Section 3 (optional), documentation.

Individual or Group Assessment: ☒ Individual ☐ Group

Module Convenor Office Hours/Opportunities for Advice and Feedback: Workshops are dedicated to answering all questions related to the course, e.g., worksheet exercises, R workshops, understanding the material, and understanding the assignment. Contact the module convenor if there are any issues that cannot be solved in the workshops.

1. What is the Purpose of This Assessment?

The general aim of this coursework is for students to apply descriptive, predictive, and prescriptive analytics (supported by visualisation techniques) in order to explore data, as well as develop models, which ultimately contribute to data-driven decision-making for two different problem domains and data sets (LO1, LO2, LO4). Students are expected to document and present their findings in a 4,000-word report that is worth 100% of their grade. All data, as well as the saved R scripts that highlight data manipulation and management, must be stored and submitted along with the documentation (LO3).

In more detail, the documentation should be split into three sections:

Section one requires the use of inferential statistics and dimension reduction techniques in order to extract components from a survey and use the component scores in several analyses (LO1). Students are expected to critically analyse their results, manage and manipulate the data (LO3), as well as illustrate their findings using visualisation techniques (LO2).

Section two requires students to train at least two types of machine learning algorithms (or regression models) in order to support data-driven decision making (LO4).

Section three requires students to report on the results of sections one and two in a document that presents the findings for a layman audience, with explicit recommendations based on the analyses (LO4). This section needs to be particularly rich in visualisation (LO2).

The following table shows which of the module learning outcomes are being assessed in this assignment. Use this table to help you see the connection between this assessment and your learning on the module.

Module Learning Outcomes Being Assessed
LO1: Select and use appropriate statistical tools to analyse data
LO2: Demonstrate effective use of data visualisation techniques
LO3: Formulate data management strategies for business data analytics
LO4: Critically analyse a problem domain and apply the data analytics approach to support data-driven decision making

2. What is the Task for This Assessment?

Task (attach a separate briefing document if required): See briefing document

3. What is Required of Me in this Assessment?

Guidelines/Details of How to Prepare Your Submission: See briefing document

The Assessment Criteria to be Used for Marking This Piece of Work: Refer to the marking criteria rubric at the end of this document.

Self-Regulation: Make Sure That You… Include all your R scripts, and keep a log of what you are doing so you do not have to repeat yourself. Add comments to highlight what you are doing in each step. The main report document will be submitted separately from the data files (there will be two different submission entries on Blackboard).

Three Key Pieces of Advice Based on the Feedback Given to the Previous Cohort who Completed This Assignment: N/A

For Group Work Only: Elements of Group Working:
☐ Classroom Briefing by Module Convenor
☐ Regular Meetings of All Team Members
☐ Record and Keep Evidence of Meetings (agenda/minutes)
☐ Record Attendance and Member Contributions
☐ Team Reflection Document
☐ Submit Peer Assessment

Required Formatting Guidelines: Use Harvard Style Formatting

Word Limit/Guidance and Penalty Applied: The absolute word limit for the documentation is 4,000 words. The suggested number of words for each section is 1,000 words for Section 1, 1,200 words for Section 2, and 1,800 words for Section 3. This is not a strict requirement as long as you stay within the absolute word limit. Tables count towards the word limit, but figures do not. The R script, which is handed in separately, does not count towards the word limit.

Referencing Style: Use Harvard Style Referencing.

Guidance on Academic Misconduct (including using Turnitin practice area): The work you produce must be your own, or that of members of your group if it is a group assessment. You are encouraged to put a draft of your work through the Turnitin practice area to satisfy yourself that the work is your own original work. You can find this in your module area on Blackboard. You can seek advice from the Module Convenor or your Programme Administrator.

4. What Resources Might I Use to Prepare My Work?

Data analytics is an extremely popular field of study, so you will have no problem finding information from various sources. The lecture slides, worksheets, R workshops, videos, and audio information that I provide should be your starting point. From there I recommend you look at books (your textbook, as well as Field et al., 2012, which is the other recommended textbook), online tutorials (particularly for examples in R), and journal and conference papers (the last two are particularly important for understanding how the results of an analysis are presented).

Section 1 – Descriptive Analytics

Introduction

Case study: Student satisfaction is a KPI for most, if not all, higher education institutes. There is a range of reasons why students may or may not be satisfied with their courses. The Turkiye Student Evaluation Data Set gives us a small insight into the complexities that drive student experience. You have been hired by a company called HigherEdCo ltd. as a higher education consultant to perform several multivariate analyses that will indicate the factors that impact student experience (according to the data collected). Your analysis should be approached critically, and both variable and method selections should be justified. You MUST reduce the dimensionality of this dataset.

The dataset you will be working with is the Turkiye Student Evaluation Data Set (Gunduz & Fokoue, 2013). The dataset is made up of the following variables:

instr: Instructor’s identifier; values taken from {1,2,3}
class: Course code (descriptor); values taken from {1-13}
repeat: Number of times the student is taking this course; values taken from {0,1,2,3,…}
attendance: Code of the level of attendance; values from {0, 1, 2, 3, 4}
difficulty: Level of difficulty of the course as perceived by the student; values taken from {1,2,3,4,5}
Q1: The semester course content, teaching method and evaluation system were provided at the start.
Q2: The course aims and objectives were clearly stated at the beginning of the period.
Q3: The course was worth the amount of credit assigned to it.
Q4: The course was taught according to the syllabus announced on the first day of class.
Q5: The class discussions, homework assignments, applications and studies were satisfactory.
Q6: The textbook and other courses resources were sufficient and up to date.
Q7: The course allowed field work, applications, laboratory, discussion and other studies.
Q8: The quizzes, assignments, projects and exams contributed to helping the learning.
Q9: I greatly enjoyed the class and was eager to actively participate during the lectures.
Q10: My initial expectations about the course were met at the end of the period or year.
Q11: The course was relevant and beneficial to my professional development.
Q12: The course helped me look at life and the world with a new perspective.
Q13: The Instructor’s knowledge was relevant and up to date.
Q14: The Instructor came prepared for classes.
Q15: The Instructor taught in accordance with the announced lesson plan.
Q16: The Instructor was committed to the course and was understandable.
Q17: The Instructor arrived on time for classes.
Q18: The Instructor has a smooth and easy to follow delivery/speech.
Q19: The Instructor made effective use of class hours.
Q20: The Instructor explained the course and was eager to be helpful to students.
Q21: The Instructor demonstrated a positive approach to students.
Q22: The Instructor was open and respectful of the views of students about the course.
Q23: The Instructor encouraged participation in the course.
Q24: The Instructor gave relevant homework assignments/projects, and helped/guided students.
Q25: The Instructor responded to questions about the course inside and outside of the course.
Q26: The Instructor’s evaluation system (midterm and final questions, projects, assignments, etc.) effectively measured the course objectives.
Q27: The Instructor provided solutions to exams and discussed them with students.
Q28: The Instructor treated all students in a right and objective manner.

It’s up to you to choose your independent and dependent variable(s), as well as the tests you will run. However, everything you do needs to be justified, i.e., you need to explain why you chose to use that particular test, why you treated a certain variable as, e.g., categorical, and why you transformed variables (if applicable). In short, a good project will critically analyse the results obtained and identify its limitations. The more detailed and exhaustive your analysis, the more likely you are to score a high grade (see marking scheme). Make sure to include figures and tables to support your findings.
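
As a rough illustration of the dimension-reduction requirement above, the sketch below runs a principal component analysis on 28 Likert-style items and saves the component scores to a CSV file. It is only one possible approach, not a model answer: the data are simulated (the real survey file is not loaded here), the two latent traits, the helper make_item(), and the use of the Kaiser criterion (eigenvalue > 1) are all assumptions made for illustration, and the brief leaves the choice of method and retention rule to you to justify.

```r
# Minimal PCA sketch on SIMULATED data (assumption: 200 respondents, 28 items
# scored 1-5). Swap `items` for the Q1-Q28 columns of the real survey.
set.seed(42)
n <- 200
latent_course <- rnorm(n)  # hypothetical "course" trait driving Q1-Q12
latent_instr  <- rnorm(n)  # hypothetical "instructor" trait driving Q13-Q28

# Hypothetical helper: turn a latent trait into a noisy 1-5 Likert response
make_item <- function(latent) {
  raw <- 3 + latent + rnorm(n, sd = 0.7)
  pmin(pmax(round(raw), 1), 5)
}

items <- as.data.frame(cbind(
  sapply(1:12,  function(i) make_item(latent_course)),
  sapply(13:28, function(i) make_item(latent_instr))
))
names(items) <- paste0("Q", 1:28)

# PCA on standardised items
pca <- prcomp(items, center = TRUE, scale. = TRUE)
eigenvalues <- pca$sdev^2
prop_var    <- eigenvalues / sum(eigenvalues)  # variance explained per component

# Kaiser criterion: one heuristic among several (scree plot, parallel analysis)
n_keep <- sum(eigenvalues > 1)

# Component scores for the retained components, appended to the survey items
scores <- as.data.frame(pca$x[, seq_len(n_keep), drop = FALSE])
survey_with_scores <- cbind(items, scores)

# Written to a temporary folder here; in the project this would be the
# exploratory.csv file in your working directory
out_path <- file.path(tempdir(), "exploratory.csv")
write.csv(survey_with_scores, out_path, row.names = FALSE)
```

The loadings in pca$rotation show which questions contribute to each component, and the columns of scores can then serve as variables in later tests. Whether PCA or a factor-analytic method is more appropriate for this survey is itself a choice that needs justification in the report.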

Expected Project Output

In the end you need to submit a section with the following headings:
1. Introduction (briefly what your aim was for the analysis and your research question)
2. Process (what types of statistical testing did you use to answer the research question and the rationale for using said methods)
3. Results (the results of all the analyses, including figures, tables, and test outputs)

The output needs to be written for an academic audience. You must also submit:
• Your new csv file with any new variables you extracted/modified from the data. You must name this exploratory.csv
• Your R script that shows, with comments, step by step the process you took to analyse the data. You must name this exploratory.R
• Anything else that you feel is relevant is welcome (but not required)

Section 2 – Predictive Analytics

Case study: You have been hired as a consultant to provide data-driven recommendations to the marketing department of the German-Hellenic bank. The bank has supplied you with anonymised data (the data we will be using has been supplied by Moro et al. (2014), and can be found on the UCI website). Here is a list of the variables:

Input variables:

# bank client data:
1 – age (numeric)
2 – job: type of job (categorical: 'admin.', 'blue collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self employed', 'services', 'student', 'technician', 'unemployed', 'unknown')
3 – marital: marital status (categorical: 'divorced', 'married', 'single', 'unknown'; note: 'divorced' means divorced or widowed)
4 – education (categorical: 'basic.4y', 'basic.6y', 'basic.9y', 'high.school', 'illiterate', 'professional.course', 'university.degree', 'unknown')
5 – default: has credit in default? (categorical: 'no', 'yes', 'unknown')
6 – balance: Account balance
7 – housing: has housing loan? (categorical: 'no', 'yes', 'unknown')
8 – loan: has personal loan? (categorical: 'no', 'yes', 'unknown')

# related with the last contact of the current campaign:
9 – contact: contact communication type (categorical: 'cellular', 'telephone')
10 – month: last contact month of year (categorical: 'jan', 'feb', 'mar', …, 'nov', 'dec')
11 – day_of_week: last contact day of the week (categorical: 'mon', 'tue', 'wed', 'thu', 'fri')
12 – duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

# other attributes:
13 – campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 – pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
15 – previous: number of contacts performed before this campaign and for this client (numeric)
16 – poutcome: outcome of the previous marketing campaign (categorical: 'failure', 'nonexistent', 'success')

# social and economic context attributes

Output variable (desired target):
17 – y: has the client subscribed a term deposit? (binary: 'yes', 'no')

Note that variable 12 should be discarded. The original data set has outcome variable 17 as the desired output. However, you do not necessarily need to focus on this variable. You are expected to explore other relationships in the data and present interesting findings to your client (hint: look at balance, for example). You should build multiple models for comparison, but present two final models on two different outcome variables. All actions taken need to be critically analysed and justified. The more detailed and exhaustive your analysis, the more likely you are to score a high grade (see marking scheme). Your outcome variables can be categorical, continuous, or a mix of both (i.e., one model as a classification model, one as a regression model).

Expected Project Output

In the end you need to submit a section with the following headings:
1. Introduction (briefly what your aim was for the analysis, along with your research question)
2. Process (what types of models did you use to answer the research question and the rationale for using said modelling techniques).

3. Results (the results of the analyses and the models, including model performance and model comparisons).

All the output needs to be written for an academic audience. You must also submit:
• Your new csv file with any new variables you extracted/modified from the data. You must name this analysis.csv
• Your R script that shows, with comments, step by step the process you took to analyse the data. You must name this analysis.R
• An R Shiny app or anything else you feel is relevant (optional)
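
As a rough illustration of the Section 2 brief (discard variable 12, then present models on two different outcome variables), the sketch below drops duration, fits a logistic regression for y and a linear regression for balance, and scores both on a held-out test set. It is a sketch under stated assumptions rather than a recommended solution: the data frame is simulated with only a handful of the listed variables, the 70/30 split, the 0.5 classification threshold, and the choice of predictors are arbitrary, and the brief expects you to compare several model types and justify your own choices.

```r
# Minimal modelling sketch on SIMULATED data (assumption: a small subset of the
# listed variables; replace `bank` with the real data set).
set.seed(1)
n <- 500
bank <- data.frame(
  age      = round(runif(n, 18, 80)),
  housing  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  campaign = rpois(n, 2) + 1,
  duration = round(rexp(n, 1 / 250)),
  balance  = round(rnorm(n, 1500, 1000))
)
# Simulated outcome, loosely tied to balance and campaign (illustrative only)
p <- plogis(-1 + 0.0005 * bank$balance - 0.2 * bank$campaign)
bank$y <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

# Variable 12 (duration) is not known before a call, so it is discarded
bank$duration <- NULL

# Hold out 30% of rows for evaluation (split ratio is an arbitrary choice)
train_idx <- sample(seq_len(n), size = 0.7 * n)
train <- bank[train_idx, ]
test  <- bank[-train_idx, ]

# Model 1: classification of y with logistic regression
clf <- glm(y ~ age + housing + campaign + balance,
           data = train, family = binomial)
pred_prob  <- predict(clf, newdata = test, type = "response")
pred_class <- ifelse(pred_prob > 0.5, "yes", "no")
accuracy   <- mean(pred_class == test$y)

# Model 2: regression on balance with a linear model
reg  <- lm(balance ~ age + housing + campaign, data = train)
rmse <- sqrt(mean((test$balance - predict(reg, newdata = test))^2))
```

Accuracy alone can be misleading when one class dominates, so a confusion matrix or similar per-class measures are worth reporting alongside it. The same train/test split can be reused when comparing further model types so that the comparison is like for like.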

Section 3 – Prescriptive Analytics

In this section you will form data-driven recommendations using your findings from Section 1 and Section 2 for your two clients. You can expect your audience to be a layman audience with little to no understanding of statistics and modelling. Therefore, unlike the results in Sections 1 and 2, your report needs to be written in such a way that a layman audience can understand it. Ultimately you need to make a convincing argument that states how your client should proceed based on your findings. You are expected to take a critical approach, using the results obtained both to generate recommendations and to identify limitations. You can include an interactive R Shiny app, which is optional but will increase your likelihood of delivering a more robust solution.

Expected Project Output

In the end you need to submit a document with the following headings:

1. Client: HigherEdCo ltd. – Executive summary
2. Aims and Objectives
3. Analysis
4. Recommendations
5. Limitations

1. Client: German-Hellenic Bank. – Executive summary
2. Aims and Objectives
3. Analysis
4. Recommendations