Project

For this assignment, you will work on a real data science project. The goal of the project is to go through the complete data science process to answer questions you have about some topic of your own choosing. You will acquire the data, design your visualizations, run statistical analysis, and communicate the results.

Project Milestones

There are a few milestones for your final project.

Date         Assignment
November 10  Project proposal
December 01  Project update
December 15  Project report
December 16  Slides
December 16  Project presentation

Note that:

  • No extensions will be given for any of the project due dates except for medical reasons.
  • Projects submitted after the final due date will not be graded.

Project proposal

The proposal should be received by November 10, 2020.

  • The title can be changed at a later date.
  • Each team (or individual if working alone) only needs to submit one proposal.
  • At this stage, we reserve the right to reject a project proposal if it is not judged satisfactory. But the goal is mostly to set you on a path to succeed!

Project update

The update (HW4) should be received by December 01, 2020.

  • Make sure that the title is final at this point.
  • Each team (or individual if working alone) only needs to submit one update.
  • At this stage, we reserve the right to schedule a meeting to provide additional guidance if the update is not judged satisfactory.

Project report

The final report is composed of two parts:

  • RMarkdown and compiled HTML files describing the project in detail.
  • The slides from your presentation.

RMarkdown and compiled HTML

The most important deliverable of your project is the set of RMarkdown and compiled HTML files, due by December 15, 2020. They should detail the steps you took in developing your solution, including how you collected the data, the alternative solutions you tried, the statistical methods you used, and the insights you gained. Just as important as your final results is how you got there! Your RMarkdown and HTML files are the place to describe and document the space of possibilities you explored at each step of your project. We strongly advise you to include many visualizations.

Your RMarkdown should include the following topics. Depending on your project type, the amount of discussion you devote to each of them will vary:

  • Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.

  • Related Work: Anything that inspired you, such as a paper, a website, or something we discussed in class.

  • Initial Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?

  • Data: Source, scraping method, cleanup, etc.

  • Exploratory Analysis: What visualizations did you use to look at your data in different ways? Justify the decisions you made, and show any major changes to your ideas. How did you reach these conclusions?

  • Modeling: What are the different statistical methods you considered? Why did you choose a given model? How about competing approaches?

  • Final Analysis: What did you learn about the data? How did you answer the questions? How can you justify your answers?

As this will be your only chance to describe your project in detail, make sure that your RMarkdown file and compiled HTML file are standalone documents that fully describe your process and results. For instructions on how to submit, please see Submission Instructions below.

As a side note, the following scale will be used to grade projects:

  • Overview, motivation, related work, research questions: 10/100
  • Data sources and description: 10/100
  • Exploratory data analysis: 35/100
  • Answering the research questions and final analysis: 35/100
  • Aesthetics, English writing, project organization, code quality: 10/100
  • Additionally, you have the opportunity to earn bonuses of up to 20/100 (e.g., by using interactive visualization or presenting your work in especially interesting ways, for instance by complementing your report with a website generated with markdown).

Code

We expect you to write high-quality and readable R code in your RMarkdown file. You should strive for doing things the right way and think about aspects such as reproducibility, efficiency, cleaning data, etc. We also expect you to document your code.
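As an illustration of these expectations, here is a minimal sketch of a documented, reproducible cleaning step as it might appear in an RMarkdown chunk. The data frame `raw`, its columns, and the helper `clean_data()` are hypothetical stand-ins, not part of the assignment; your own project will use its own data and functions.

```r
# Hypothetical stand-in for project data; replace with your own source.
raw <- data.frame(
  group = c("a", "a", "b", "b", NA),
  value = c(1.2, NA, 3.4, 2.8, 5.0)
)

# Drop rows with any missing value and report how many were removed,
# so the reader of the knitted report can see what the cleaning did.
clean_data <- function(df) {
  keep <- complete.cases(df)
  message(sum(!keep), " incomplete row(s) removed")
  df[keep, , drop = FALSE]
}

clean <- clean_data(raw)

# Summarise the cleaned data by group using base R only.
group_means <- aggregate(value ~ group, data = clean, FUN = mean)
print(group_means)

# Record the R and package versions used, which helps others reproduce the results.
sessionInfo()
```

Wrapping each step in a small, commented function keeps the analysis readable, and ending the document with `sessionInfo()` is one common way to document the environment in which the report was knitted.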

Project presentation

Each team will prepare a seven-minute presentation showing a demo of your project and/or some slides.

We will strictly enforce the time limit, so please make sure you do not run over. Use principles of good storytelling and presentations to get your key points across. Focus the majority of your presentation on your main contributions rather than on technical details. What do you feel is the best part of your project? What insights did you gain? What is the single most important thing you would like your audience to take away? Make sure it is front and center rather than at the end.

We use the following criteria to evaluate oral presentations:

  • Content (4/20)
  • Organization (4/20)
  • Teamwork (4/20; projects done individually automatically earn the 4 points)
  • Visuals (4/20)
  • Presentation mechanics (4/20)

The project presentations will take place during class on December 16, 2020, and slides should be submitted the same day. For instructions on how to submit, please see Submission Instructions below.

Submission Instructions

How to submit the project proposal (due November 10, 2020)

Fill in this form.

How to submit the project update (due December 01, 2020)

TBA

How to submit the data, RMarkdown and compiled HTML files (due December 15, 2020)

TBA

How to submit the presentation slides (due December 16, 2020)

TBA

Grading

  • While the project proposal is not part of the grade, it is MANDATORY to proceed.
  • The project update, namely HW4, is worth 10% of the overall course grade.
  • The report and the presentation represent 60% of the overall course grade:
    • 50% for your RMarkdown files and the knitted report. This includes the quality of your data analysis and R code, the complexity and level of difficulty of your project, and the completeness and overall functionality of your analysis.
    • 10% for your presentation and the quality of its storytelling aspects.

Data sources / project ideas
