lgaalves / appliedmathschoollectures

Lectures on "crime and political corruption analysis using data mining, machine learning and complex networks" at the School of Applied Mathematics in the Institute of Mathematics and Computer Science at University of São Paulo

Home Page: https://github.com/lgaalves/school_crime_and_corruption_analysis

License: Other

Jupyter Notebook 98.64% HTML 1.11% CSS 0.02% Python 0.23%
python jupyter-notebook web-scraping data-mining data-science big-data machine-learning complex-networks community-detection corruption-networks

Crime and political corruption analysis using data mining, machine learning and complex networks

There has been a remarkable increase in the amount of data stored by private and public organizations. On one hand, these huge amounts of data enable a detailed historical review of the processes under investigation; on the other hand, this excess of data makes it harder to extract summarized information and to make good decisions supported by well-established empirical facts. This modern phenomenon has been called big data, and understanding these systems and extracting patterns from the data requires a multidisciplinary approach. With this in mind, the course at the School of Applied Mathematics in the Institute of Mathematics and Computer Science at the University of São Paulo addresses topics that draw on computer science, statistics, and physics. We will focus on the following topics:

  • Introduction to Python;
  • Web scraping;
  • Data mining;
  • Machine learning;
  • Complex networks.

Using these tools, we will focus on two issues that are of great relevance in Brazil: predicting homicides in cities and describing the mechanism behind political corruption networks. In the first topic, we will use machine learning techniques to predict the number of crimes in Brazilian cities. In the second topic, we will use complex networks to describe the interaction between politicians investigated in corruption scandals in Brazil from 1987 to 2014.

Any comments, questions, or concerns can be directed to:

Course Syllabus

This course is divided into several modules, each with a set of Jupyter notebooks that teach the concepts.

Basics, Collections and Files (Day 1)

  1. Jupyter Notebook
  2. Basic Data Types
  3. Flow Control
  4. Errors
  5. Lists, Tuples, and Sets
  6. File I/O
  7. Section Review (Optional)

Imports, Plots, Functions, Dictionaries, and Web Scraping (Day 2)

  1. The Python Standard Library
  2. Data Visualization
  3. Functions
  4. Review (Optional)
  5. Dictionaries
  6. Review (Optional)
  7. Mini-Project
  8. Web Scraping

Data Mining, Statistics, and Data Analysis (Day 3)

  1. Statistical analysis with Python
  2. Bootstrapping MC chains
  3. More stats with Python
  4. The Bootstrap
  5. Structured Data Analysis Pt1
  6. Structured Data Analysis Pt2
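As a rough illustration of the bootstrap idea listed in this module, the sketch below resamples a small sample with replacement and reports a percentile confidence interval for the mean. The data values, the function name bootstrap_mean_ci, and the parameter choices are assumptions made up for this example; they are not taken from the course notebooks.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n values from the original sample, with replacement.
        resample = rng.choices(sample, k=n)
        means.append(statistics.mean(resample))
    means.sort()
    # Read the interval bounds off the sorted resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements, invented for this example.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]
low, high = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.mean(data):.2f}")
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

Only the standard library is used here so the sketch runs anywhere; in practice a NumPy-based version would likely be faster for large samples.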

Machine Learning Part I (Day 4)

  1. Data Loading
  2. Introduction to Scikit Learn
  3. Unsupervised Transforms
  4. Cross-validation and Grid Search
  5. Preprocessing

Machine Learning Part II (Day 5)

  1. Linear Models for Regression
  2. Linear Models for Classification
  3. Trees
  4. Random Forests
  5. Gradient Boosting
  6. Homicides Prediction

Complex Network and Analysis of Corruption Networks (Day 6)

  1. Network Basics
  2. Analysis of Structural Properties
  3. Network Visualization and Queries on Networks
  4. Network Analysis from Data
  5. Corruption Network
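To give a feel for the network basics in this module, here is a minimal sketch that builds a small undirected network and computes node degrees and edge density. It assumes that two people are linked whenever they are named in the same scandal, which is one common way to build such a network and not necessarily the construction used in the course notebooks. The scandal labels and the names A to E are invented toy data.

```python
from itertools import combinations

# Hypothetical toy data: each scandal maps to the people named in it.
scandals = {
    "scandal_1": ["A", "B", "C"],
    "scandal_2": ["C", "D"],
    "scandal_3": ["D", "E"],
}


def build_network(groups):
    """Link every pair of people that appear together in some group."""
    adjacency = {}
    for members in groups.values():
        for person in members:
            adjacency.setdefault(person, set())
        for u, v in combinations(members, 2):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


network = build_network(scandals)

# Degree: number of distinct neighbors of each node.
degree = {node: len(neighbors) for node, neighbors in network.items()}

# Each undirected edge is counted once from each endpoint.
n_nodes = len(network)
n_edges = sum(degree.values()) // 2

# Density: fraction of all possible node pairs that are actually linked.
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

print("degrees:", degree)
print("edges:", n_edges, "density:", density)
```

A plain dictionary of sets keeps the example dependency-free; a dedicated library such as igraph offers the same quantities along with layout and community-detection routines.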

Social Network Analysis Using igraph and leidenalg (Extra)

  1. Network Basics
  2. Social Networks
  3. Complex Networks Models
  4. Community Detection

Software Installation

This bootcamp uses the Anaconda Python 3.7 distribution.

You must have Anaconda Python 3.7 installed before the first day of class.
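One quick way to confirm which Python version is active is a short check like the sketch below. The helper name meets_requirement is hypothetical, and treating any version at or above 3.7 as acceptable is an assumption; the course text only names Python 3.7.

```python
import sys

# Version named in the course instructions.
REQUIRED = (3, 7)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`.

    Assumption: newer versions are treated as acceptable.
    """
    return tuple(version_info[:2]) >= required


print("Running Python", sys.version.split()[0])
if meets_requirement():
    print("Version check passed.")
else:
    print("Please install the Anaconda Python 3.7 distribution.")
```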

Downloading Course Materials

The course materials can be downloaded from the repository's GitHub page. Download the zip file, unzip it onto your Desktop, and rename the resulting directory to school-of-applied-math.
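The unzip-and-rename step can also be scripted. The sketch below assumes the zip file has already been downloaded and that it contains a single top-level folder; the function name unpack_course_zip and the archive file name in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_course_zip(zip_path, dest_dir, new_name="school-of-applied-math"):
    """Extract the archive into dest_dir and rename its top-level folder."""
    zip_path, dest_dir = Path(zip_path), Path(dest_dir)
    target = dest_dir / new_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    with zipfile.ZipFile(zip_path) as archive:
        # Names inside a zip archive always use "/" as the separator.
        top_level = {name.split("/")[0] for name in archive.namelist()}
        if len(top_level) != 1:
            raise ValueError("expected a single top-level folder in the archive")
        archive.extractall(dest_dir)

    extracted = dest_dir / top_level.pop()
    if extracted != target:
        extracted.rename(target)
    return target


# Example usage (the archive file name is hypothetical):
# unpack_course_zip(Path.home() / "Downloads" / "course-materials.zip",
#                   Path.home() / "Desktop")
```

Checking for an existing target before extracting avoids overwriting a folder from an earlier download.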

Usage of Course Materials

This text and the majority of the course will be conducted with Jupyter Notebook (http://jupyter.org). Jupyter Notebook is a 'web-based interactive computational environment', meaning that it allows you to write and execute Python code in a web page on your own computer. Jupyter Notebook is a relatively new tool, and we believe it is an excellent way to teach the basics of Python programming and computational data analysis.

Jupyter Notebook is installed by default with the Anaconda Python distribution and can be launched from the Anaconda Navigator program.

Location and period of the course:

Period: July 1 to July 6, 2019.

Hours: 08:00 to 12:00

Location: Institute of Mathematics and Computer Science, University of São Paulo (rooms in Block 3).

Approval Criteria: 85% attendance and completion of the proposed activities.

Target Audience: Senior-year undergraduate and postgraduate students in applied mathematics, statistics, computer science, and physics who are interested in data science.

Number of vacancies: 20

Enrollment Period: 04/15/2019 to 05/30/2019.
