This short lesson summarizes the topics we covered in this section and why they'll be important to you as a data scientist.
In this section, we spent time getting comfortable with pandas and getting some more practice with exploratory data analysis. Some of the key takeaways:
- For non-trivial datasets you'll usually want to store your data in pandas data structures rather than native Python lists and dictionaries
- Pandas has a range of great features for easily importing data from a variety of sources, including CSV files, Excel files, JSON, SQL, and Python dictionaries
- Pandas `Series` and `DataFrame` classes have a bunch of powerful methods for munging data
- Pandas also has a range of methods for applying descriptive statistics to Series and DataFrames
- Finally, by wrapping Matplotlib, Pandas also provides some very convenient plotting capabilities for quickly visualizing data
- We also got some experience working with the Ames Housing dataset, and set up accounts on Kaggle - a really useful resource for practicing data scientists.
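
As a quick refresher, the takeaways above can be sketched in a few lines of pandas. This is a minimal illustration rather than a recap of any specific lab: the toy data and the column names (`neighborhood`, `sale_price`, `lot_area`) are made up for the example and are not taken from the Ames Housing dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
data = {
    "neighborhood": ["A", "A", "B", "B"],
    "sale_price": [200000, 250000, 150000, 170000],
    "lot_area": [8000, 9500, 6000, 7200],
}

# Build a DataFrame directly from a Python dictionary.
df = pd.DataFrame(data)

# Descriptive statistics on a single Series.
mean_price = df["sale_price"].mean()
summary = df["sale_price"].describe()

# Munging: group rows by neighborhood and take the median price of each group.
median_by_neighborhood = df.groupby("neighborhood")["sale_price"].median()

# Munging: keep only the rows whose sale price is above a threshold.
expensive = df[df["sale_price"] > 180000]

print(summary)
print(median_by_neighborhood)
print(expensive)
```

With Matplotlib installed, a call such as `df["sale_price"].plot(kind="hist")` should draw a quick histogram of the same column, which is the kind of convenience plotting mentioned in the list.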