Data Science & Knowledge Discovery: Final Project
In the third phase of the project, we will focus on modeling the data and evaluating the model. You may employ any open-source libraries (and programming languages) to perform the analysis.
Note that you need to go beyond exploratory data analysis and perform data mining using supervised or unsupervised learning approaches. You must also either compare a couple of simple modeling approaches (such as a decision tree or kNN) or perform a detailed analysis with a complex model (such as an ANN), and analyze the results with multiple metrics.
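As an illustration of what analyzing results with multiple metrics can look like, the following is a minimal sketch in plain Python. The labels and the two sets of predictions (`pred_model_a`, `pred_model_b`) are invented toy values, not project data, and the helper names are hypothetical. It assumes a binary problem with the positive class encoded as 1.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]    # actual positive, predicted positive
    fp = pairs[(False, True)]   # actual negative, predicted positive
    fn = pairs[(True, False)]   # actual positive, predicted negative
    tn = pairs[(False, False)]  # actual negative, predicted negative
    return tp, fp, fn, tn

def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels and predictions, invented for illustration only.
y_true       = [1, 0, 1, 1, 0, 0, 1, 0]
pred_model_a = [1, 0, 1, 0, 0, 0, 1, 1]
pred_model_b = [1, 1, 1, 1, 0, 1, 1, 0]

for name, preds in [("model_a", pred_model_a), ("model_b", pred_model_b)]:
    print(name, confusion_counts(y_true, preds), binary_metrics(y_true, preds))
```

In this toy case both sets of predictions reach the same accuracy but differ in precision and recall, which is why a single metric is rarely enough to compare models.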
• Select and apply one or more modeling techniques (depending on the model complexity, multiple different models should be employed)
• Calibrate model settings to optimize results
• Perform additional data preparation if necessary
• Explain to others why you selected the model
• Explain the setting parameters you chose (high level description only)
• Describe the initial results and any adjustments you made to the model afterwards
• Describe any adjustments you made to the data as a result
• Describe final results
• Evaluate one or more models for effectiveness
• Determine whether the defined objectives were achieved
• Make a decision regarding data mining results before deploying to the field
• Supervised learning models can be evaluated using a held-out part of your data (discuss different evaluation strategies and their results)
• Describe the results using reliable metrics (e.g., an error/confusion matrix)
• If the model is unsupervised (no labeled data for testing), evaluate your results using a relevant performance measure
• Interpret the results from a business perspective and provide your comments
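The calibration and evaluation steps above can be sketched as follows. This is a minimal plain-Python illustration, not a required approach: it compares a single hold-out split with 5-fold cross-validation for a small kNN classifier and uses the cross-validated accuracy to choose k. The two-cluster dataset is synthetic and generated only for this example, and all function names are hypothetical. In a real project an open-source library would normally replace these hand-written helpers.

```python
import random
from collections import Counter

def knn_predict(train, query, k):
    """Predict the majority label among the k nearest training points."""
    # Squared Euclidean distance is sufficient for ranking neighbours.
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def holdout_accuracy(data, k, test_fraction=0.25):
    """Strategy 1: train on the first part of the data, test on the rest."""
    cut = int(len(data) * (1 - test_fraction))
    return accuracy(data[:cut], data[cut:], k)

def kfold_accuracy(data, k, folds=5):
    """Strategy 2: every point is used for testing exactly once."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        scores.append(accuracy(train, test, k))
    return sum(scores) / folds

# Synthetic two-class data (assumption for illustration): two Gaussian clusters.
rng = random.Random(0)
data = [
    ((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label)
    for label, center in ((0, 0.0), (1, 3.0))
    for _ in range(40)
]
rng.shuffle(data)  # shuffle so that splits contain both classes

candidates = (1, 3, 5)  # odd values avoid voting ties with two classes
for k in candidates:
    print(f"k={k}: hold-out={holdout_accuracy(data, k):.2f}, "
          f"5-fold CV={kfold_accuracy(data, k):.2f}")

best_k = max(candidates, key=lambda k: kfold_accuracy(data, k))
print("k selected by cross-validation:", best_k)
```

Note that a score used to select a setting tends to be optimistic for that setting, so a final estimate is better taken from data that played no part in the selection.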
The report needs to be at least 500 words long, excluding figures, tables, lists, etc. Your writing should be clear, engaging, technically sound, and written in an academic style.