
unsiap-python-oct-2019's Introduction

Big Data Analytics with Python

The material in this repository was presented at a training workshop at the UN Statistical Institute for Asia and the Pacific (SIAP) in Chiba, Japan, on October 16-17, 2019. The sessions were part of the Theory and Practices in Official Statistics for Monitoring SDGs training organized by SIAP.

Course Outline and Goals

The goal of the course is to introduce participants to using Python for data science tasks such as data ingestion, data analysis and machine learning, with a focus on processing large-scale datasets. Unlike regular online courses, which tend to rely on toy problems, this course uses real-life datasets and case studies to challenge participants with real-world data science problems. Since this is a 2-day (8-hour) course, the aim is to introduce participants to the concepts rather than to cover them in detail. The following topics will be covered:

  • Day 1 [Python for Data Science]: During the first day, participants will be given a crash course on Python programming. The rest of the day will focus on generating data using Python by accessing APIs and scraping web pages.
  • Day 2 [Machine Learning and Big Data in Python]: On the second day, we will go through how to tackle Machine Learning (ML) problems using Python. Participants will also be shown a demonstration of processing a large-scale dataset using Python.
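
As a small taste of the Day 1 material on accessing APIs, the sketch below parses a JSON payload of the kind a web API might return and computes a simple summary. It is not taken from the course notebooks: the payload, its field names (indicator, records, country, value) and the helper parse_records are all hypothetical and chosen only for illustration. To keep the example runnable offline, the response is stored in a string rather than fetched over the network.

```python
import json
from statistics import mean

# Hypothetical payload standing in for an API response. The field names
# are made up for this illustration and do not come from a real service.
SAMPLE_RESPONSE = """
{
  "indicator": "example_indicator",
  "records": [
    {"country": "A", "value": 10.0},
    {"country": "B", "value": 14.0},
    {"country": "C", "value": null}
  ]
}
"""


def parse_records(raw_json):
    """Return (country, value) pairs, skipping records with missing values."""
    payload = json.loads(raw_json)
    return [
        (rec["country"], rec["value"])
        for rec in payload["records"]
        if rec["value"] is not None
    ]


records = parse_records(SAMPLE_RESPONSE)
average = mean(value for _, value in records)

print(records)
print(average)
```

With a real API, the string would typically come from an HTTP response body instead, and the filtering step would need to match whatever convention that API uses for missing values.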

Delivery Style

Since we have only 8 hours to cover the material, this course is intended as an introduction to state-of-the-art tools for data science in Python rather than as in-depth training. The course will therefore use the following approaches to deliver the material:

  • Lecture: PowerPoint slides will be used to introduce key concepts.
  • Follow-along coding: Participants will be provided with a pre-prepared Jupyter Notebook so that they can follow along with the course instructor.
  • Coding exercises: Short programming exercises will be given to participants to enable them to practice key concepts.
  • Demonstrations: Due to time limitations, in some cases the instructor will give demonstrations so that participants can appreciate how some concepts are implemented in practice.

Repository Setup

The main materials in this repository are source code (src), PowerPoint slides and data. All the source code lives in the src folder. The rest of the folders are organized by topic (e.g., machine learning) and contain data as well as other useful resources. Note that where the data files are very large, the data is not available in the repository due to GitHub storage limitations. Most of the PowerPoint slides are also large and are not included in the repository; instead, you can find up-to-date PowerPoint slides here. All the code uses Python 3.

Pre-course Training Materials

In the Big Data Analytics with Python course, we will use the Python programming language to interact with data. To ensure that participants gain the most from the course, we require that you have basic skills in Python. Luckily, the internet is full of very good introductory Python courses. Please see below for two such courses that you can go through. In addition to Python, a basic understanding of GitHub is also required for this course. See the Github section below for tutorials.

Introduction to Python

Below are links to two free Python courses. You need only do one of them, but you can do both if you wish. Each will take less than 5 hours of your time. Once you finish the course(s), you will have the prerequisite Python knowledge to gain the most from the course.

  1. Free Udemy Python Course

  2. Another Free Udemy Python Course

Github

We will use GitHub for tracking our code and submitting exercises. As such, it's important that you familiarize yourself with GitHub. Refer to the links below for GitHub training materials.

  1. Github tutorial on Youtube
  2. Github tutorial
