
pandas-cookbook's Introduction

Pandas Cookbook

This is the code repository for Pandas Cookbook, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

About the Book

This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way.

The pandas library is massive, and it’s common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands like one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter.

Instructions and Navigation

All of the code is organized into folders, one per chapter. Each folder name ends with the chapter number, for example, Chapter02.

The code will look like the following:

>>> import pandas as pd
>>> employee = pd.read_csv('data/employee.csv')
>>> max_dept_salary = employee.groupby('DEPARTMENT')['BASE_SALARY'].max()

Pandas is a third-party package for the Python programming language and, as of the printing of this book, is on version 0.20. Currently, Python has two major supported releases, versions 2.7 and 3.6. Python 3 is the future, and it is now highly recommended that all scientific computing users of Python use it, as Python 2 will no longer be supported in 2020. All examples in this book have been run and tested with pandas 0.20 on Python 3.6.

In addition to pandas, you will need the visualization libraries matplotlib (version 2.0) and seaborn (version 0.8) installed. A major dependency of pandas is the NumPy library, which forms the basis of most of the popular Python scientific computing libraries.

There is a wide variety of ways to install pandas and the rest of the libraries mentioned here, but by far the simplest is to install the Anaconda distribution. Created by Continuum Analytics, it packages together all the popular libraries for scientific computing in a single downloadable file available on Windows, macOS, and Linux. Visit the download page to get the Anaconda distribution (https://www.anaconda.com/download).

In addition to all the scientific computing libraries, the Anaconda distribution comes with Jupyter Notebook, which is a browser-based program for developing in Python, among many other languages. All of the recipes for this book were developed inside of a Jupyter Notebook and all of the individual notebooks for each chapter will be available for you to use.

It is possible to install all the necessary libraries for this book without the Anaconda distribution. If you are interested, visit the pandas Installation page (http://pandas.pydata.org/pandas-docs/stable/install.html).


Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781784393878

pandas-cookbook's People

Contributors

dominicpereira92, packt-itservice, packtutkarshr


pandas-cookbook's Issues

Ch 5 Translating SQL WHERE clauses section.

Enjoying the book!

In the "There's More" section of "Translating SQL WHERE clauses", the negation causes pandas to return salaries < $80,000 and > $120,000, so it doesn't emulate a SQL BETWEEN.
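The behaviour described in this report can be reproduced on a small scale. The following is a minimal sketch, not the book's code: the salary values are made up for illustration, and it only shows how Series.between and its negation behave.

```python
import pandas as pd

# Hypothetical salaries, chosen to sit on and around the 80,000-120,000 bounds.
salaries = pd.Series([60000, 80000, 100000, 120000, 150000])

# Series.between is inclusive on both ends by default, like SQL BETWEEN.
in_range = salaries.between(80000, 120000)

# Negating the mask selects everything outside the range instead.
out_of_range = ~in_range

print(salaries[in_range].tolist())      # values inside the range
print(salaries[out_of_range].tolist())  # values outside the range
```

Dropping the ~ operator is what keeps the filter equivalent to a SQL BETWEEN.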

Chapter 10, important typo

crime_sort.resample('QS-MAR')['IS_CRIME', 'IS_TRAFFIC'].sum().head()

Page 412 describes 'QS-MAR' as quarters starting in March, while it should read "ending" in March.
It's important because the resultant table starts at the beginning of December, which is not possible under the latter scenario and caused some confusion for me.
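One way to check how pandas interprets the alias is to generate a few dates with it and to resample a small series. This is only a sketch: the daily series below is made-up data standing in for the crime dataset, which is not reproduced here.

```python
import pandas as pd

# Four consecutive labels generated with the 'QS-MAR' alias.
quarter_starts = pd.date_range('2012-01-01', periods=4, freq='QS-MAR')
print(quarter_starts.strftime('%Y-%m-%d').tolist())

# Hypothetical daily counts beginning in early January 2012.
daily = pd.Series(1, index=pd.date_range('2012-01-02', '2012-06-30', freq='D'))

# Each bin is labelled with the first day of its quarter.
quarterly = daily.resample('QS-MAR').sum()
print(quarterly.index[0])
```

The generated labels fall on the first day of March, June, September, and December, so a series that begins in January lands in a first bin labelled with the preceding December 1.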

File won't download

When I try to download the file by clicking the download button it just opens another webpage with unformatted csv text information. How do I actually download the file?

Chapter 11, In [49]

The code, as written, does not produce the output shown in Out [49] when run with IPython in a Jupyter notebook.

In the scatter plot, color= expects one input, not a list of two inputs.
In the line plot, the x axis has no labels.

ValueError: Table tracks not found

I solved this problem by trying different path expressions for the chinook.db file in create_engine(), which eventually produced a useful error message. The following line works if the string I called "path_to_file" holds the absolute path of the directory that contains chinook.db:

engine = create_engine('sqlite:////' + path_to_file + 'chinook.db')

For absolute paths, create_engine expects four slashes (////), not three (///).
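A likely reason the relative URL fails is that SQLite creates a new, empty database when the file is not found at the given location, so the table lookup then fails. The sketch below uses only the standard library to guard against that. The helpers sqlite_url and list_tables are hypothetical names introduced here, and demo.db is a throwaway file standing in for chinook.db.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def sqlite_url(db_path):
    """Return an SQLAlchemy-style URL for an existing SQLite file (hypothetical helper)."""
    absolute = Path(db_path).resolve()
    if not absolute.is_file():
        # Connecting to a missing file would silently create an empty database.
        raise FileNotFoundError(f"No database file at {absolute}")
    # On macOS/Linux the absolute path starts with '/', giving four slashes in total.
    return "sqlite:///" + absolute.as_posix()


def list_tables(db_path):
    """List table names using only the standard-library sqlite3 module."""
    with closing(sqlite3.connect(str(db_path))) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Demo with a throwaway database standing in for chinook.db.
demo_dir = Path(tempfile.mkdtemp())
demo_db = demo_dir / "demo.db"
with closing(sqlite3.connect(str(demo_db))) as conn:
    conn.execute("CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT)")
    conn.commit()

url = sqlite_url(demo_db)
tables = list_tables(demo_db)
print(url)
print(tables)
```

The resulting string can be passed to create_engine, and checking list_tables first confirms that the file being opened really contains the expected table.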

This was my original statement of the problem:

from sqlalchemy import create_engine
engine = create_engine('sqlite:///chinook.db')
tracks = pd.read_sql_table('tracks', engine)
tracks.head()

InvalidRequestError Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/sql.py in read_sql_table(table_name, con, schema, index_col, coerce_float, parse_dates, columns, chunksize)
239 try:
--> 240 meta.reflect(only=[table_name], views=True)
241 except sqlalchemy.exc.InvalidRequestError:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sqlalchemy/sql/schema.py in reflect(self, bind, schema, views, only, extend_existing, autoload_replace, **dialect_kwargs)
4148 "Could not reflect: requested table(s) not available "
-> 4149 "in %r%s: (%s)" % (bind.engine, s, ", ".join(missing))
4150 )

InvalidRequestError: Could not reflect: requested table(s) not available in Engine(sqlite:///chinook.db): (tracks)

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in ()
----> 1 tracks = pd.read_sql_table('tracks', engine)
2 tracks.head()

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/sql.py in read_sql_table(table_name, con, schema, index_col, coerce_float, parse_dates, columns, chunksize)
240 meta.reflect(only=[table_name], views=True)
241 except sqlalchemy.exc.InvalidRequestError:
--> 242 raise ValueError("Table %s not found" % table_name)
243
244 pandas_sql = SQLDatabase(con, meta=meta)

ValueError: Table tracks not found

Website seems to have changed.

Chapter 9, Comparing President Trump's and Obama's approval ratings

The code below no longer works:

base_url = 'http://www.presidency.ucsb.edu/data/popularity.php?pres={}'
trump_url = base_url.format(45)

df_list = pd.read_html(trump_url)
len(df_list)

It results in ValueError: No tables found

Can you share the data in csv format or modify the source code to work with the present website?

Vehicles dataset?

In Ch. 5 - EDA, there are examples with the vehicles.csv dataset, but I cannot seem to locate it among the data files. Is there a URL for the actual data? I couldn't locate it on fueleconomy.gov.

Thanks

Chapter 1 > notebook

Use .dtypes.value_counts() instead; .get_dtype_counts() is deprecated.
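As a minimal sketch of the suggested replacement, assuming a small made-up DataFrame rather than one of the book's datasets:

```python
import pandas as pd

# Hypothetical frame: two integer columns and one float column.
df = pd.DataFrame({
    'units': [1, 2, 3],
    'year': [2015, 2016, 2017],
    'price': [9.5, 7.25, 3.0],
})

# Count how many columns share each dtype.
dtype_counts = df.dtypes.value_counts()
print(dtype_counts)
```

The result is a Series indexed by dtype, with the number of columns of each dtype as its values.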
