Statistical Data Analysis in Python

Intermediate Tutorial, PyTennessee 2014, 23 February 2014

Christopher Fonnesbeck - Vanderbilt University School of Medicine

This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects. Much of the work involved in analyzing data lies in importing, cleaning and transforming it in preparation for analysis. Therefore, the first half of the tutorial consists of an overview of basic and intermediate Pandas usage, showing how to manipulate datasets in memory effectively. This includes tasks such as indexing, alignment, join/merge methods, date/time types, and handling of missing data.

In the second half, participants will be introduced to statistical data modeling using some of the advanced functions in NumPy, SciPy and Pandas. This will include fitting data to probability distributions, estimating relationships among variables using linear and non-linear models, and a brief introduction to bootstrapping and kernel density estimation. Each section of the tutorial will involve hands-on manipulation and analysis of sample datasets, which will be provided to attendees in advance.
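Index alignment and missing-data handling, two of the tasks mentioned above, can be illustrated with a minimal sketch. The Series names and values below are hypothetical toy data, not taken from the tutorial's own datasets.

```python
import pandas as pd

# Hypothetical toy data: two Series whose index labels only partly overlap
counts_2012 = pd.Series([10, 20, 30], index=["a", "b", "c"])
counts_2013 = pd.Series([1, 2, 3], index=["b", "c", "d"])

# Arithmetic aligns on index labels; a label present in only one Series yields NaN
total = counts_2012 + counts_2013  # b -> 21.0, c -> 32.0, a and d -> NaN

# Missing values can then be dropped or filled explicitly
complete_only = total.dropna()
zero_filled = total.fillna(0)

# Alternatively, treat a missing label as 0 during the addition itself
filled = counts_2012.add(counts_2013, fill_value=0)  # a -> 10, d -> 3

print(total)
print(filled)
```

Using `add(..., fill_value=0)` rather than filling afterwards keeps the values that exist in only one Series (for example, `a` keeps 10) instead of turning them into 0.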

Data Wrangling with Pandas (60 min)

  • Introduction to NumPy arrays
  • Series and DataFrame objects
  • Indexing, data selection and subsetting
  • Hierarchical indexing
  • Reading and writing files
  • Date/time types
  • String conversion
  • Missing data
  • Data summarization
  • Reshaping DataFrame objects
  • Pivoting
  • Data aggregation and GroupBy operations
  • Merging and joining DataFrame objects
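Several of the wrangling topics listed above (merging, GroupBy aggregation, pivoting and missing data) can be combined in one short sketch. The `visits` and `patients` tables and their columns are hypothetical examples, not the tutorial's sample datasets.

```python
import pandas as pd

# Hypothetical long-format data: repeated scores per patient, one value missing
visits = pd.DataFrame({
    "patient": [1, 1, 2, 3, 3, 3],
    "week": [0, 2, 0, 0, 2, 4],
    "score": [40.0, 35.0, 50.0, 45.0, None, 30.0],
})
# Hypothetical per-patient attributes
patients = pd.DataFrame({
    "patient": [1, 2, 3],
    "treatment": ["drug", "placebo", "drug"],
})

# Merge on the shared key column (an inner join by default)
merged = visits.merge(patients, on="patient")

# GroupBy aggregation; mean() skips missing values by default
mean_by_treatment = merged.groupby("treatment")["score"].mean()

# Pivot long data to wide: one row per patient, one column per week
wide = merged.pivot(index="patient", columns="week", values="score")

print(mean_by_treatment)
print(wide)
```

Weeks a patient was not observed show up as NaN in the pivoted table, which is why missing-data handling usually accompanies reshaping.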

Statistical Data Modeling (50 min)

  • Fitting data to probability distributions
  • Kernel density estimation
  • Ordinary least squares regression
  • Logistic regression
  • Bootstrapping
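A compact sketch of several of the modeling topics above, using only NumPy and SciPy on simulated data. The data-generating model (intercept 2.0, slope 0.5, unit-variance noise) is an assumption made for illustration; the tutorial's own notebooks may use different data and libraries. OLS is solved here with `np.linalg.lstsq`, and the bootstrap is a percentile interval from resampled (x, y) pairs.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated data (assumed model): y = 2.0 + 0.5 * x + Normal(0, 1) noise
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=n)

# 1. Fit a normal distribution to y by maximum likelihood (returns loc, scale)
mu_hat, sigma_hat = stats.norm.fit(y)

# 2. Gaussian kernel density estimate of y, evaluated on a grid
kde = stats.gaussian_kde(y)
grid = np.linspace(y.min(), y.max(), 100)
density = kde(grid)

# 3. Ordinary least squares: design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # beta = [intercept, slope]

# 4. Nonparametric bootstrap of the slope: resample rows with replacement
n_boot = 1000
slopes = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, n, size=n)
    b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    slopes[i] = b[1]

# 95% percentile interval for the slope
ci_low, ci_high = np.percentile(slopes, [2.5, 97.5])

print(f"normal fit: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
print(f"OLS: intercept={beta[0]:.2f}, slope={beta[1]:.2f}")
print(f"bootstrap 95% CI for slope: ({ci_low:.3f}, {ci_high:.3f})")
```

Resampling whole (x, y) pairs, rather than residuals, avoids assuming the noise is identically distributed across x, at the cost of somewhat wider intervals.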

Software Requirements

To follow along interactively with the tutorial, you should have the following Python packages installed:

These can most easily be installed via conda on the Anaconda Python distribution, or with the Scipy Superpack on OS X 10.9 (Mavericks).
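Before the session, a quick way to confirm which packages are importable is a small script like the one below. It checks only NumPy, SciPy and Pandas, the packages named in the tutorial description; the full requirements list may include others, and the `check_packages` helper is a hypothetical illustration.

```python
import importlib

# Packages named in the tutorial description; the complete list may be longer
REQUIRED = ["numpy", "scipy", "pandas"]


def check_packages(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Some modules do not define __version__
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```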
