Visualizations and Word Clouds

Table of Contents

  1. Introduction
    1. Last Week
      1. Data Science
      2. Python
        1. Data Types
        2. Methods
        3. Packages
    2. Visualization
      1. Common Graphs
      2. Packages
  2. Getting Started
  3. Choosing Colors
  4. Adding Labels
  5. Summary

Introduction

Welcome to week 2! Last week we covered a lot. We started with an overview of data science, from both a career and a workflow perspective, and then covered the basic data types of Python. Today we are going to pick up where we left off and take a further look at creating some stunning visuals!

Last Week

The Typical Data Science Workflow

Python Data Types

Packages

We also saw how to load packages into Python in order to use the additional functions and methods stored within them.
Recall:

Visualization

Common Graphs

Packages
  • plotly
  • matplotlib
  • pandas (Primarily a spreadsheet [Excel-like] package, but has some built-in visualizations using matplotlib)
  • folium
  • seaborn (Makes everything prettier!)
import pandas
travel_df = pandas.read_excel('./cities.xlsx')
cities = travel_df.to_dict('records')
cities[0]
{'City': 'Solta', 'Country': 'Croatia', 'Population': 1700, 'Area': 59}
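
As a rough illustration of what to_dict('records') produces, here is a minimal sketch that builds a two-row DataFrame in memory instead of reading cities.xlsx. The name sample_df is made up for this example; the values are copied from the first two rows of the cities data.

```python
import pandas as pd

# Two rows copied from the cities data used in this lesson.
sample_df = pd.DataFrame({
    "City": ["Solta", "Greenville"],
    "Country": ["Croatia", "USA"],
    "Population": [1700, 84554],
    "Area": [59, 68],
})

# 'records' yields one dictionary per row, keyed by column name.
sample_records = sample_df.to_dict("records")
print(sample_records[0])
```

Each element of the resulting list is an ordinary dictionary, so sample_records[0]['City'] works the same way as with any other dict.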

Getting Started

Let's pull in some data!

Importing Modules

Technically we already did this (we imported the pandas package up above), but let's do it once again just as a reminder.
This time we'll also alias the pandas package as pd. It may not seem like a lot, but saving those four letters of typing adds up if we're using it frequently.

import pandas as pd

Here are the official blurbs about modules and importing from the Python documentation:

"If you quit from the Python interpreter and enter it again, the definitions you have made (functions and variables) are lost. Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as creating a script. As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several programs without copying its definition into each program.

To support this, Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a module; definitions from a module can be imported into other modules or into the main module (the collection of variables that you have access to in a script executed at the top level and in calculator mode)."
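
To make the idea of importing definitions from a module concrete, here is a small sketch that uses only the standard library. It shows three import forms that all end up referring to the same module object, the importlib.import_module() function as a programmatic alternative to the import statement, and the __path__ attribute that Python uses to tell packages apart from plain modules. The choice of math and json is only for illustration.

```python
import importlib
import json            # json is a package (a directory of modules)
import math            # math is a plain module
import math as m       # an alias is just a second name for the same module
from math import sqrt  # pull one definition into the current namespace

# All of these names point at the same underlying objects.
alias_is_same_module = m is math
sqrt_is_same_function = sqrt is math.sqrt

# importlib.import_module() imports a module from a string name.
dynamic_is_same_module = importlib.import_module("math") is math

# Packages carry a __path__ attribute; plain modules do not.
json_is_package = hasattr(json, "__path__")
math_is_package = hasattr(math, "__path__")

print(alias_is_same_module, sqrt_is_same_function, dynamic_is_same_module)
print(json_is_package, math_is_package)
```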

Packages (collections of Modules) https://docs.python.org/3/reference/import.html

"It’s important to keep in mind that all packages are modules, but not all modules are packages. Or put another way, packages are just a special kind of module. Specifically, any module that contains a __path__ attribute is considered a package."

"Python code in one module gains access to the code in another module by the process of importing it. The import statement is the most common way of invoking the import machinery, but it is not the only way. Functions such as importlib.import_module() and built-in __import__() can also be used to invoke the import machinery."

df = pd.read_excel('cities.xlsx')
df.head() #Preview the first 5 rows of the dataframe
   City                Country    Population  Area
0  Solta               Croatia          1700    59
1  Greenville          USA             84554    68
2  Buenos Aires        Argentina    13591863  4758
3  Los Cabos           Mexico         287651  3750
4  Walla Walla Valley  USA             32237    33

Common Pandas methods

  • df.head() #Preview the first 5 rows of the dataframe
  • df.head(10) #Preview the first 10 rows
  • df.tail() #Preview the last 5 rows
  • df.columns #Returns an Index of the column names. Notice that this is an attribute, not a method/function; there are no parentheses
  • df.info() #Prints column names, data types, the number of rows and memory usage
  • df[col] #Returns a particular column of the dataframe, where col is the name of the column
  • df[col].value_counts() #Returns a frequency count of entries within the column in descending order
  • df[col].unique() #Returns an array of unique entries within the column
  • df[col].nunique() #Returns the number of unique entries within the column as an integer
  • df[[col1, col2]] #Returns the dataframe with only those columns indicated
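
The methods listed above can be tried without the spreadsheet. The sketch below rebuilds the five preview rows as an in-memory DataFrame (the name cities_df is an assumption made for this example) and calls several of them.

```python
import pandas as pd

# The five rows shown in the df.head() preview, typed in by hand.
cities_df = pd.DataFrame({
    "City": ["Solta", "Greenville", "Buenos Aires", "Los Cabos", "Walla Walla Valley"],
    "Country": ["Croatia", "USA", "Argentina", "Mexico", "USA"],
    "Population": [1700, 84554, 13591863, 287651, 32237],
    "Area": [59, 68, 4758, 3750, 33],
})

first_two = cities_df.head(2)                         # first 2 rows
column_names = list(cities_df.columns)                # attribute, so no parentheses
country_counts = cities_df["Country"].value_counts()  # rows per country
n_countries = cities_df["Country"].nunique()          # number of distinct countries
city_and_area = cities_df[["City", "Area"]]           # DataFrame with two columns

print(column_names)
print(country_counts)
print(n_countries)
```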
%matplotlib inline

Making a Bar Chart

Let's make a bar chart of cities and their population.

df['Population'].plot(kind='bar')

Hmmm, it would sure be nice to have the actual names of the cities on our graph! To do this, we have to tell Pandas what feature we want to use as the index for the dataframe. The index is shown on the left edge and can be thought of as the row names.
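
To see what changing the index does before plotting anything, here is a minimal sketch on a three-row DataFrame (small_df is a made-up name; the values come from the cities data). It assumes nothing beyond pandas itself.

```python
import pandas as pd

small_df = pd.DataFrame({
    "City": ["Solta", "Greenville", "Buenos Aires"],
    "Population": [1700, 84554, 13591863],
})

# By default the rows are labeled 0, 1, 2, ...
default_labels = list(small_df.index)

# set_index returns a new DataFrame whose row labels are the city names.
by_city = small_df.set_index("City")
city_labels = list(by_city.index)

# The original DataFrame is left unchanged.
still_has_city_column = "City" in small_df.columns

# With city names as the index, rows can be looked up by name.
solta_population = by_city.loc["Solta", "Population"]

print(default_labels)
print(city_labels)
print(solta_population)
```

Because pandas uses the index for the tick labels of a bar chart, the city names then appear on the axis in place of the row numbers.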

df.set_index('City')['Population'].plot(kind='bar')

Better. Let's also change this to a horizontal bar chart so that the cities are easier to read.

df.set_index('City')['Population'].plot(kind='barh') #Notice barh instead of bar

Great! I want to make my chart all orange though!

Choosing Colors

Here are a few helpful resources for getting started:
http://colorbrewer2.org/#type=sequential&scheme=YlOrRd&n=3
https://matplotlib.org/api/colors_api.html

df.set_index('City')['Population'].plot(kind='barh', color='Orange')
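
The color argument accepts more than plain names. As a small sketch of how matplotlib interprets color specifications, the snippet below uses helper functions from matplotlib.colors to convert between a named color, a hex string, and an RGB tuple, and to check whether a string is a valid color at all. The particular colors are only examples.

```python
from matplotlib import colors

# A named color converted to its hex string.
orange_hex = colors.to_hex("orange")

# A hex string converted to an (r, g, b) tuple of floats between 0 and 1.
orange_rgb = colors.to_rgb("#ffa500")

# is_color_like reports whether matplotlib can interpret a value as a color.
tab_orange_ok = colors.is_color_like("tab:orange")
nonsense_ok = colors.is_color_like("not-a-color")

print(orange_hex)
print(orange_rgb)
print(tab_orange_ok, nonsense_ok)
```

Any value that passes is_color_like should be usable as the color argument of plot.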

We can do way better than that though!
Check out what you can do with the seaborn package and its color palettes!

import seaborn as sns
sns.set_style("darkgrid") #Load in some snazzier visual settings to spice things up

See https://seaborn.pydata.org/tutorial/aesthetics.html for all too many options

df.set_index('City')['Population'].plot(kind='barh') #Same code, prettier graph thanks to Seaborn!

sns.palplot(sns.light_palette((210, 90, 60), input="husl")) #Previewing a color scheme


sns.palplot(sns.dark_palette("muted purple", input="xkcd")) #Another color scheme!


sns.palplot(sns.color_palette("Paired")) #And another

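
A palette is easier to reuse once you know what it is underneath. The sketch below, which assumes seaborn is installed, shows that the object returned by sns.dark_palette behaves like a list of colors and can be converted to hex strings. The n_colors argument is written out explicitly here for clarity.

```python
import seaborn as sns

# Build the same purple palette previewed above, asking for 6 shades.
purples = sns.dark_palette("muted purple", n_colors=6, input="xkcd")

# The palette behaves like a list: it has a length and can be indexed.
number_of_shades = len(purples)
darkest = purples[0]

# as_hex() gives the same colors as hex strings, handy for other tools.
purple_hex_codes = purples.as_hex()

print(number_of_shades)
print(purple_hex_codes)
```

Since each entry is an ordinary color value, a single shade such as purples[-1] can also be passed wherever one color is expected.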

Those purples were amazing! Let's incorporate them into our graph.

dark_purples = sns.dark_palette("muted purple", input="xkcd")
df.set_index('City')['Population'].plot(kind='barh', color = dark_purples)

Adding Labels

We need another module for this one.

import matplotlib.pyplot as plt
#Same Initial Code
dark_purples = sns.dark_palette("muted purple", input="xkcd")
df.set_index('City')['Population'].plot(kind='barh', color = dark_purples)

#Now Add a title
plt.title('Cities by Population')

#Label the X-Axis
plt.xlabel('Population in Tens of Millions')

Once more for good measure...

This time let's make everything bigger!

#Same Initial Code
dark_purples = sns.dark_palette("muted purple", input="xkcd") 
df.set_index('City')['Population'].plot(kind='barh', color = dark_purples, figsize=(15,12)) #Bigger graph

#Now Add a title
plt.title('Cities by Population', fontsize=22)

#Label the X-Axis
plt.xlabel('Population in Tens of Millions', fontsize=16)

#Enlarge the Y-Axis Label
plt.ylabel('City', fontsize=16)

#Enlarge the City Names themselves
plt.yticks(fontsize=14)
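
The plt functions used above act on whichever chart was drawn most recently. An alternative, sketched below under the assumption that you would rather be explicit, is to keep the Axes object that plot returns and call its own labeling methods. The name demo_df and its three rows are made up for this example, and the matplotlib.use("Agg") line is only there so the sketch runs as a plain script without a display; leave it out in a Jupyter Notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

demo_df = pd.DataFrame({
    "City": ["Solta", "Greenville", "Los Cabos"],
    "Population": [1700, 84554, 287651],
})

# plot returns the matplotlib Axes that the bars were drawn on.
ax = demo_df.set_index("City")["Population"].plot(kind="barh", figsize=(8, 4))

# The Axes has its own methods for titles, axis labels, and tick sizes.
ax.set_title("Cities by Population", fontsize=22)
ax.set_xlabel("Population", fontsize=16)
ax.set_ylabel("City", fontsize=16)
ax.tick_params(axis="y", labelsize=14)

print(ax.get_title())
```

Holding on to ax becomes useful once a notebook cell draws more than one chart, because each chart can then be labeled independently.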

Summary

So there you have it! Quick and easy visuals and how to import data from .xlsx or .csv files. Bon voyage!

Remember:

#Import the module
import pandas as pd
#Load a spreadsheet as a DataFrame using Pandas
df = pd.read_excel(filename)  
#or  
df = pd.read_csv(filename)
# Make sure that graphs show up in Jupyter Notebook:  
%matplotlib inline 

df.set_index(x_col)[y_col].plot(kind='barh') #Create a horizontal bar chart of y_col, labeled by x_col!
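
This lesson only loaded an .xlsx file, so here is a minimal sketch of the .csv route. To keep it self-contained, it reads the CSV text from an in-memory string through io.StringIO instead of a file on disk; with a real file you would pass the filename to pd.read_csv. The two rows are copied from the cities data.

```python
import io
import pandas as pd

# CSV text standing in for the contents of a .csv file.
csv_text = """City,Country,Population,Area
Solta,Croatia,1700,59
Greenville,USA,84554,68
"""

# read_csv accepts a filename or any file-like object.
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df.shape)
print(csv_df.head())
```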
