Statistical visualizations with seaborn

Scatterplots, as we briefly saw in our introductory lessons and labs, display the values of two sets of data along two dimensions. Each dot represents one observation: its position on the X (horizontal) axis gives the value of one variable, and its position on the Y (vertical) axis gives the value of the other. This makes scatterplots useful for studying the relationship between two variables. It is common to encode even more information with colors or marker shapes, for example to show groups or a third variable.

The sample scatter plot above shows the relationship between income and a health index for a sample of data. The overall trend suggests that income and the health of individuals may be related, although a scatter plot on its own shows an association, not cause and effect.
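
As a minimal matplotlib sketch of these ideas, the example below plots the first five rows of the iris dataset used later in this lesson (not the income/health data from the figure). The `x` and `y` values place each dot, and the `c` argument maps a third variable onto a color scale; the `viridis` colormap is an arbitrary choice:

```python
import matplotlib.pyplot as plt

# Values copied from the first five rows of the iris dataset used later in this lesson
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6]
petal_length = [1.4, 1.4, 1.3, 1.5, 1.4]

fig, ax = plt.subplots()
# x and y position each dot; c maps a third variable onto a color scale
points = ax.scatter(sepal_length, sepal_width, c=petal_length, cmap="viridis")
fig.colorbar(points, ax=ax, label="petal_length")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
plt.show()
```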

Creating Scatter Plots in Seaborn

As seen earlier, scatter plots are simple to draw in matplotlib using the .scatter() method. In this lesson, we shall use a different plotting library available in Python called seaborn. Seaborn is built on top of matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics. Here is some of the functionality that seaborn offers out of the box:

  • An API for examining relationships between multiple variables
  • Support for using categorical variables and their aggregate statistics
  • Visualizing univariate or bivariate distributions
  • High-level abstractions for structuring multi-plot grids
  • Concise control over matplotlib figure styling with several built-in themes
  • Tools for choosing color palettes that faithfully reveal patterns in your data
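
Two of the items above, built-in themes and color palettes, each take a single call. The sketch below assumes seaborn 0.11 or later, where `sns.set_theme()` is available (older releases use `sns.set()`); the `whitegrid` style and the `deep` palette are just example choices:

```python
import matplotlib
import seaborn as sns

# Apply one of seaborn's built-in themes; it updates matplotlib's rcParams,
# so every figure drawn afterwards picks up the new style
sns.set_theme(style="whitegrid")

# Request a named seaborn palette; the result behaves like a list of RGB tuples
palette = sns.color_palette("deep", n_colors=3)
print(palette.as_hex())

# The whitegrid style switches matplotlib's grid lines on
print(matplotlib.rcParams["axes.grid"])
```

Because the theme is stored in matplotlib's global settings, it is usually enough to call it once near the top of a notebook rather than before every plot.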

Let's focus on scatter plots for now. To use seaborn, we first import it alongside matplotlib, which seaborn uses to do the actual drawing:

import matplotlib.pyplot as plt
import seaborn as sns

Seaborn provides a number of example datasets for practice and exploration, which sns.load_dataset() fetches by name. We shall load the famous iris dataset to draw the scatter plots in this lesson.

# Load the iris dataset into a pandas dataframe
iris_data = sns.load_dataset('iris')
# View the head of dataset
iris_data.head()
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
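
sns.load_dataset() returns an ordinary pandas DataFrame. It downloads the data over the network the first time and caches it locally, so the sketch below instead rebuilds just the five rows printed above by hand, which lets it run offline. It shows how to inspect the frame and how selecting one column, as iris_data["sepal_length"] does in the regplot() calls later in this lesson, yields a pandas Series:

```python
import pandas as pd

# Only the five rows shown by iris_data.head(), typed in by hand so this runs offline;
# with network access, sns.load_dataset("iris") returns the full table instead
iris_head = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

print(iris_head.shape)    # (number of rows, number of columns)
print(iris_head.dtypes)   # the four measurements are floats; species holds strings

# Selecting one column with [] gives a pandas Series, one value per observation
sepal_length = iris_head["sepal_length"]
print(type(sepal_length).__name__, sepal_length.tolist())
```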

A detailed list of all the datasets included with seaborn is available at this GitHub repository.

One way to make a scatterplot with seaborn is the regplot() function (comparable to .scatter() in matplotlib). Here is an example showing regplot() in action with the most basic settings. At minimum, regplot() needs the X and Y positions of the points, given either as two sequences of values (lists, arrays or pandas Series) or as two column names together with data=. By default it also fits and draws a linear regression line, which we remove with fit_reg=False.

Let's draw a scatter plot between the sepal length and sepal width using regplot.

# Use seaborn to draw a scatter plot between the sepal length and sepal width columns from the dataset
sns.regplot(x=iris_data["sepal_length"], y=iris_data["sepal_width"], fit_reg=False);

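
To see exactly what fit_reg controls, the sketch below draws the same data twice on side-by-side matplotlib Axes (the ax= argument places a regplot on an existing Axes). It passes column names with data= instead of two Series, which regplot() also accepts, and it again uses only the five rows printed earlier so that it runs offline:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The five iris rows shown earlier; pass iris_data instead to plot the full dataset
iris_head = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
})

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Default settings: scatter points plus a fitted linear regression line
# and a shaded confidence band around it
sns.regplot(data=iris_head, x="sepal_length", y="sepal_width", ax=left)
left.set_title("fit_reg=True (default)")

# fit_reg=False keeps only the scatter points
sns.regplot(data=iris_head, x="sepal_length", y="sepal_width", fit_reg=False, ax=right)
right.set_title("fit_reg=False")
plt.show()
```

The left panel contains one extra line object (the regression fit); the right panel contains only the points.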

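Seaborn also lets you customize the color, size and shape of the markers. The marker argument of regplot() sets the marker shape, while scatter_kws takes a dictionary of keyword arguments that is passed on to matplotlib's scatter(). Use it to set the size (s), color and transparency (alpha) of the markers, as shown in the example below: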

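# More marker customization: scatter_kws is passed on to matplotlib's scatter()
sns.regplot(x=iris_data["sepal_length"],
            y=iris_data["sepal_width"],
            fit_reg=False,
            scatter_kws={"color": "darkblue",  # marker color
                         "alpha": 0.3,         # transparency: 0 is invisible, 1 is opaque
                         "s": 200})            # marker area in points squared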
plt.show()

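
To check what scatter_kws does, the sketch below adds the marker argument to get diamond-shaped points and then reads the settings back from the drawn points. It uses only the five rows printed earlier and draws onto an explicit Axes; relying on ax.collections[0] being the scatter points is an assumption that holds when nothing else has been drawn on that Axes:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The five iris rows shown earlier; pass iris_data instead to plot the full dataset
iris_head = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
})

fig, ax = plt.subplots()
sns.regplot(data=iris_head, x="sepal_length", y="sepal_width", fit_reg=False,
            marker="D",  # diamond markers instead of the default circles
            scatter_kws={"color": "darkblue", "alpha": 0.3, "s": 200},
            ax=ax)

# regplot draws the points as a single matplotlib PathCollection,
# and the scatter_kws values end up as properties of that collection
points = ax.collections[0]
print(points.get_alpha(), points.get_sizes())
plt.show()
```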

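Once you understand how to make a basic scatterplot with seaborn and how to customize marker shape and color, you will probably want the color to correspond to a categorical variable (a group). This is possible with the hue argument, which names the column whose values are mapped to colors. regplot() has no hue argument, so the example below switches to lmplot(), which draws the same kind of plot but takes column names together with a data= DataFrame.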

# Use the 'hue' argument to color the points by a categorical variable
sns.lmplot(x="sepal_length", 
           y="sepal_width", 
           data=iris_data, 
           fit_reg=False, 
           hue='species', 
           legend=True);
plt.show()

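
Newer seaborn releases (0.9 and later) also provide sns.scatterplot(), an axes-level function dedicated to scatter plots. It is an alternative to the lmplot() approach above, not something the lesson itself uses: because it never fits a regression there is no fit_reg to switch off, and its style argument can map the same column to marker shapes. The sketch below runs on the five setosa rows printed earlier; with the full iris_data, each of the three species would get its own color and marker:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The five iris rows shown earlier (all setosa); pass iris_data for all three species
iris_head = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "species": ["setosa"] * 5,
})

# hue maps species to color and style maps the same column to marker shape
ax = sns.scatterplot(data=iris_head, x="sepal_length", y="sepal_width",
                     hue="species", style="species", s=80)
plt.show()
```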

Contributors

shakeelraja
