Data Analysis Process

This process will help you understand, explore and use your data intelligently so that you make the most of the information you're given.

Five steps:

  • Question
  • Wrangle
  • Explore
  • Draw conclusions
  • Communicate

Question

The data analysis process always starts with asking questions. Sometimes, you're already given a data set and glance over it to figure out good questions to ask. Other times, your questions come first, which will determine what kinds of data you'll gather later.

In both cases, you should be thinking:

  • What am I trying to find out?
  • Is there a problem I'm trying to solve?

Examples:

  • What are the characteristics of students who pass their projects?
  • How can I better stock my store with products people want to buy?

In the real world, you often deal with multiple sets of massive amounts of data, all in different forms. The right questions can really help you focus on relevant parts of your data and direct your analysis towards meaningful insights.

Wrangle

Once you have your questions, you'll need to wrangle your data to help you answer them. That means making sure you have all the data you need, and that it is of high quality.

There are three parts to this step:

  1. You gather your data. If you were already given the data, all you need to do is open it, for example by importing it into a Jupyter notebook. If you weren't provided data, you need to think carefully about what data would be most helpful in answering your questions and then collect it from all the sources available.
  2. You assess your data to identify any problems in your data's quality or structure.
  3. You clean your data. This often involves modifying, replacing, or moving data to ensure that your data set is as high quality and well-structured as possible.

This wrangling step is all about getting the data you need in a form that you can work with.
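
As a rough illustration, the three parts above (gather, assess, clean) could be sketched in plain Python. Everything here is an assumption made for the example: the column names, the sample rows, and the choice to drop problem rows rather than repair them.

```python
import csv
import io

# Hypothetical raw data standing in for a file you were given.
RAW_CSV = """student_id,hours_studied,project_score
1,12,88
2,,73
3,7,not graded
2,,73
4,15,95
"""


def gather(text):
    """Gather: read the CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def assess(rows):
    """Assess: count missing hours, non-numeric scores, and duplicate rows."""
    seen = set()
    problems = {"missing": 0, "non_numeric": 0, "duplicates": 0}
    for row in rows:
        key = tuple(row.values())
        if key in seen:
            problems["duplicates"] += 1
        seen.add(key)
        if row["hours_studied"] == "":
            problems["missing"] += 1
        if not row["project_score"].isdigit():
            problems["non_numeric"] += 1
    return problems


def clean(rows):
    """Clean: drop duplicates and unusable rows, then convert text to numbers."""
    cleaned, seen = [], set()
    for row in rows:
        key = tuple(row.values())
        if key in seen:
            continue  # skip exact duplicate rows
        seen.add(key)
        if row["hours_studied"] == "" or not row["project_score"].isdigit():
            continue  # skip rows we cannot use as-is
        cleaned.append({
            "student_id": int(row["student_id"]),
            "hours_studied": float(row["hours_studied"]),
            "project_score": int(row["project_score"]),
        })
    return cleaned


rows = gather(RAW_CSV)
problems = assess(rows)
clean_rows = clean(rows)
print(problems)
print(clean_rows)
```

Dropping rows is only one possible cleaning choice. Depending on your questions, filling in missing values or correcting them at the source may be more appropriate.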

Explore

Exploring involves finding patterns in your data, visualizing relationships in your data and just building intuition about what you're working with. After exploring, you can do things like remove outliers and create new and more descriptive features from existing data, also known as feature engineering.
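
A minimal sketch of the two ideas just mentioned, removing outliers and engineering a new feature, might look like the following. The records, the 1.5 × IQR rule for flagging outliers, and the score_per_hour feature are all assumptions chosen for illustration, not a prescribed method.

```python
import statistics

# Hypothetical cleaned records: hours studied and project score per student.
records = [
    {"hours_studied": 5.0, "project_score": 60},
    {"hours_studied": 8.0, "project_score": 72},
    {"hours_studied": 10.0, "project_score": 80},
    {"hours_studied": 12.0, "project_score": 85},
    {"hours_studied": 14.0, "project_score": 90},
    {"hours_studied": 90.0, "project_score": 75},  # looks like a data-entry error
]

# Flag outliers with the common 1.5 * IQR rule (one of several possible rules).
hours = [r["hours_studied"] for r in records]
q1, _median, q3 = statistics.quantiles(hours, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
kept = [r for r in records if low <= r["hours_studied"] <= high]

# Feature engineering: derive a new, more descriptive column from existing ones.
featured = [
    {**r, "score_per_hour": round(r["project_score"] / r["hours_studied"], 2)}
    for r in kept
]

for r in featured:
    print(r)
```

Whether a flagged value is a genuine error or an interesting exception is a judgment call, so it is worth looking at each one before removing it.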

Often, modifying and engineering your data properly, and even creatively, can significantly increase the quality of your analysis. As you become more familiar with your data in this exploratory data analysis (EDA) step, you'll often revisit previous steps.

Example: you might discover new problems in your data and go back to wrangle them. Or you might discover exciting, unexpected patterns and decide to refine your questions.

The data analysis process isn't always linear. This exploratory step in particular is very intertwined with the rest of the process. It's usually where you discover and learn the most about your data.

Draw conclusions

After you've done your exploratory data analysis, you want to draw conclusions or even make predictions.

Example: predicting which students will fail a project so you can reach out to those students, or predicting which products are most likely to sell so you can stock your store appropriately.
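
To make the student example concrete, here is a deliberately naive sketch: it compares the two groups and then uses the midpoint between their average study hours as a cut-off for flagging students who may need help. The data and the midpoint rule are hypothetical, and a real analysis would check such a rule against data it has not seen before trusting it.

```python
import statistics

# Hypothetical history of past students: (hours_studied, passed).
history = [(4, False), (6, False), (7, False), (10, True), (12, True), (14, True)]

pass_hours = [hours for hours, passed in history if passed]
fail_hours = [hours for hours, passed in history if not passed]

# Conclusion: compare the average study time of the two groups.
mean_pass = statistics.mean(pass_hours)
mean_fail = statistics.mean(fail_hours)

# Naive prediction rule: the midpoint between the two group means.
threshold = (mean_pass + mean_fail) / 2


def predict_at_risk(hours_studied):
    """Return True if a student's study hours fall below the cut-off."""
    return hours_studied < threshold


print(f"Passing students averaged {mean_pass:.1f} hours")
print(f"Failing students averaged {mean_fail:.1f} hours")
print("Student with 5 hours at risk?", predict_at_risk(5))
print("Student with 11 hours at risk?", predict_at_risk(11))
```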

Communicate

Finally, you need to communicate your results to others. This is one of the most important skills you can develop. Your analysis is only as valuable as your ability to communicate it. You often need to justify and convey meaning in the insights you found. Or, if your end goal is to build a system, like a movie recommender or a news feed ranking algorithm, you usually share what you've built, explain how you reached your design decisions and report how well it performs.

You can communicate results in many ways:

  • Reports
  • Slide decks
  • Blog Posts
  • Emails
  • Presentations