For this assignment, you will work on a real data science project. The goal of the project is to go through the complete data science process to answer questions you have about a topic of your own choosing. You will acquire the data, design your visualizations, run statistical analyses, and communicate the results.
There are a few milestones for your final project.
Date | Assignment |
---|---|
November 10 | Project proposal |
December 01 | Project update |
December 15 | Project report |
December 16 | Slides |
December 16 | Project presentation |
Note that:
The proposal should be received by November 10, 2020.
The update (HW4) should be received by December 01, 2020.
The final report is composed of two parts:
The most important deliverable of your project is the set of RMarkdown and compiled HTML files, due by December 15, 2020. They should detail the steps you took in developing your solution, including how you collected the data, the alternative solutions you tried, the statistical methods you used, and the insights you gained. How you got there is as important as your final results! Your RMarkdown and HTML files are the place to describe and document the space of possibilities you explored at each step of your project. We strongly advise you to include many visualizations.
Your RMarkdown should include the following topics. Depending on your project type, the amount of discussion you devote to each of them will vary:
Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.
Related Work: Anything that inspired you, such as a paper, a website, or something we discussed in class.
Initial Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?
Data: Source, scraping method, cleanup, etc.
Exploratory Analysis: What visualizations did you use to look at your data in different ways? Justify the decisions you made, and show any major changes to your ideas. How did you reach these conclusions?
Modeling: What statistical methods did you consider? Why did you choose a given model? What about competing approaches?
Final Analysis: What did you learn about the data? How did you answer the questions? How can you justify your answers?
As this will be your only chance to describe your project in detail, make sure that your RMarkdown file and compiled HTML file are standalone documents that fully describe your process and results. For instructions on how to submit, please see Submission Instructions below.
As a side note, the following scale will be used to grade projects:
We expect you to write high-quality, readable R code in your RMarkdown file. You should strive to do things the right way and think about aspects such as reproducibility, efficiency, and data cleaning. We also expect you to document your code.
Each team will prepare a seven-minute presentation showing a demo of its project and/or some slides.
We will strictly enforce the time limit, so please make sure you do not run over. Use principles of good storytelling and presentation to get your key points across. Focus the majority of your presentation on your main contributions rather than on technical details. What do you feel is the best part of your project? What insights did you gain? What is the single most important thing you would like your audience to take away? Make sure it is front and center rather than at the end.
We use the following criteria to evaluate oral presentations:
The project presentations will take place during class on December 16, 2020, and slides should be submitted the same day. For instructions on how to submit, please see Submission Instructions below.
Fill in this form.
TBA
TBA
TBA