bcy0123's Projects

cookbook

A repository of machine learning code written for reusability.

dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

econml

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.

flask-api

In this repo I show how to simply create an API for your machine learning models in Python.

geatpy

A high-performance GEA (genetic and evolutionary algorithm) framework for Python. Stars and forks are welcome.

graphembedding

Implementation and experiments of graph embedding algorithms: DeepWalk, LINE (Large-scale Information Network Embedding), node2vec, SDNE (Structural Deep Network Embedding), and struc2vec.

keras-mmoe

A Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)

obsp_ls

Order batching and sequencing problem.

recurrsive-feature-selection-logistic-regression-

Data extraction and exploration: This is a brief analysis of the per-pitch dataset. The atbat data is joined with the pitches data to bring in 7 additional columns. This is a left join between the pitches and atbat data (i.e. pitches LEFT JOIN atbats), so it contains all the rows in the pitches data. The first plot is a correlation plot in which darker red tones show a positive correlation among the variables, darker blue tones show a negative correlation, and greyish tones show little or no correlation. A few subsequent plots show the scores of home and away games and how they vary with attendance and with delay in the start of the game. These plots show relationships in the data, its distribution, and the position of outliers. For example, attendance between 20,000 and 30,000 seems to be correlated with high scores in both away and home games. This exploration can be built upon further.

Data manipulation: The feature of interest is the "Event" variable, the outcome of each pitch. There are 30 possible events, which complicates the analysis. The feature is therefore converted into a binary feature with the value 0 if the event is a 'Single', 'Walk', 'Double', 'Home Run', 'Hit By Pitch', 'Field Error', 'Intent Walk' or 'Triple', and the value 1 otherwise. In addition, variables that do not contribute to the correlations or variation in the data are dropped, including certain keys/IDs and certain categorical variables. The remaining numerical variables are then brought to a single scale. The scaling has a major impact on the modeling and analysis that follow.

Initial model: The initial model consists of 45 features and almost 3 million rows of data. The data is split into two partitions, a training set and a testing set, in a 70:30 ratio. A logistic regression algorithm is trained on the training set and performs well on it: the accuracy is 99.4%. However, there is a large number of features, which are difficult to analyse and can cause overfitting to the noise in the training data. A recursive algorithm that drops one weak variable in each iteration is therefore also used. This algorithm reduces the number of features to 15 without any decrease in training accuracy. Perhaps not all 45 features are required for training, and the number of features could be reduced even below 15 for further optimization. The algorithm has not yet been tested on the testing set. The goal is to improve the algorithm using the training set and only then test it, which preserves the integrity of the evaluation on the testing set.
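The binary conversion of "Event" and the drop-one-weak-variable loop described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual code: the description does not say how a "weak" variable is identified, so ranking by smallest absolute coefficient is an assumption here, and the names `binarize_event`, `fit_logistic` and `recursive_elimination`, as well as the synthetic data, are hypothetical. Features are assumed to be on a common scale already, as the description requires.

```python
import math
import random

# Events the description maps to 0; every other event maps to 1.
ZERO_EVENTS = {"Single", "Walk", "Double", "Home Run",
               "Hit By Pitch", "Field Error", "Intent Walk", "Triple"}


def binarize_event(event):
    """Collapse the multi-valued Event column into the binary target."""
    return 0 if event in ZERO_EVENTS else 1


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent.

    Returns one weight per feature followed by the bias term.
    """
    n_rows, n_features = len(X), len(X[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for row, target in zip(X, y):
            # zip stops at the shorter sequence, so the bias is added separately.
            z = weights[-1] + sum(w * x for w, x in zip(weights, row))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            error = 1.0 / (1.0 + math.exp(-z)) - target
            for j, x in enumerate(row):
                grad[j] += error * x
            grad[-1] += error
        weights = [w - lr * g / n_rows for w, g in zip(weights, grad)]
    return weights


def recursive_elimination(X, y, names, keep):
    """Refit and drop the weakest feature until `keep` features remain."""
    X = [list(row) for row in X]  # copy so the caller's data is untouched
    names = list(names)
    while len(names) > keep:
        weights = fit_logistic(X, y)
        # Assumed criterion: the smallest absolute coefficient is "weakest".
        weakest = min(range(len(names)), key=lambda j: abs(weights[j]))
        names.pop(weakest)
        for row in X:
            row.pop(weakest)
    return names


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, already-scaled data: only the first column drives the label.
    X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
    y = [1 if row[0] > 0 else 0 for row in X]
    print(recursive_elimination(X, y, ["signal", "noise_a", "noise_b"], keep=1))
```

On a dataset of the size described (45 features, almost 3 million rows), a library implementation would be the practical choice; the loop above only shows the shape of the procedure.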

rtb-papers

A collection of research and survey papers on real-time bidding (RTB) based display advertising techniques.

xlnt

Cross-platform user-friendly xlsx library for C++14

zhihu

This repo contains the source code for my personal column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented in Python 3.6. It includes natural language processing and computer vision projects, such as text generation, machine translation, deep convolutional GAN, and other hands-on code.
