bcy0123 Goto Github PK

followers: 0 following: 8 repos: 30 gists: 0

Name: bcy0123

Type: User

bcy0123's Projects

baidu-map-house-price-visualization

Baidu Maps house price visualization

branch-and-bound-algorithms

Uses branch-and-bound algorithms to solve NP-hard problems on a cluster of workstations.

cookbook

A repository of machine learning code written for reusability

deploying-machine-learning-models

Example Repo for the Udemy Course "Deployment of Machine Learning Models"

dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

dvc_learning

econml

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.

flask-api

In this repo I show how to simply create an API for your machine learning models in Python

geatpy

A high-performance genetic and evolutionary algorithm (GEA) framework for Python. Stars and forks are welcome.

gnn_review

Reading report on a GNN survey

graphembedding

Implementation and experiments of graph embedding algorithms: DeepWalk, LINE (Large-scale Information Network Embedding), node2vec, SDNE (Structural Deep Network Embedding), struc2vec

kdd2019_hetgnn

Code for HetGNN

keras-mmoe

A Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)

learningpyspark

Code base for the Learning PySpark book (in preparation)

lihang-code

Code implementations for the book 《统计学习方法》 (Statistical Learning Methods)

machine-learning-from-scratch

Concise implementations of common machine learning algorithms

machinelearninglecturenotes

Lecture notes for Zhang Zhihua's Introduction to Machine Learning MOOC

ntu-hsuantienlin-machinelearning

obsp_ls

order batching and sequencing problem

python_data_analysis_and_mining_action

Code notes for the book 《Python数据分析与挖掘实战》 (Python Data Analysis and Mining in Action)

pytorch-gnn

An implementation of GNN based on PyTorch

pytorch-tutorial

Build your neural network easily and quickly

recurrsive-feature-selection-logistic-regression-

Data extraction and exploration: This is a brief analysis of the per-pitch dataset. The atbat data is joined with the pitches data to bring in 7 additional columns. This is a left join between the pitches and atbat data (i.e. pitches LEFT JOIN atbats), so it retains all the rows in the pitches data. The first plot is a correlation plot in which darker red tones show a positive correlation among the variables, while darker blue tones show a negative correlation. Greyish tones show little or no correlation. A few subsequent plots show the scores of home and away games and how they vary with attendance and with delay in the start of the game. These plots show relationships in the data and its distribution, as well as the position of outliers. For example, attendance between 20,000 and 30,000 seems to be correlated with high scores in both away and home games. This exploration can be built upon further.

Data manipulation: The feature of interest is the "Event" variable, the outcome of each pitch. There are 30 possible events, which complicates the analysis. The feature is therefore converted into a binary feature with the value 0 if the event is a 'Single', 'Walk', 'Double', 'Home Run', 'Hit By Pitch', 'Field Error', 'Intent Walk' or 'Triple', and the value 1 otherwise. Moreover, variables that do not contribute to the correlations or variation in the data are dropped, including certain keys/IDs and certain categorical variables. The remaining numerical variables are then brought to a single scale. The scaling has a major impact on the modeling and analysis that follow.

Initial model: The initial model uses 45 features and almost 3 million rows of data. The data is split into a training set and a testing set in a 70:30 ratio. A logistic regression model is trained on the training set and performs well on it: the training accuracy is 99.4%. However, the large number of features is difficult to analyse and can cause overfitting to the noise in the training data. A recursive algorithm that drops one weak variable per iteration is therefore also used. It reduces the number of features to 15 without any decrease in training accuracy. Perhaps not all 45 features are required for training, and the number could be reduced even below 15 for further optimization. The algorithm has not yet been tested on the testing set. The goal is to improve it using the training set first and only then test it, which preserves the integrity of the evaluation on the testing set.
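The workflow described above (binary encoding of the event, scaling, a 70:30 split, logistic regression, and recursive removal of one weak feature per iteration) can be sketched roughly as follows. This is a dependency-free illustration, not the repository's code: the data is synthetic, all function names are hypothetical, 'Strikeout' is only an assumed example of an event outside the listed set, and "weak" is assumed to mean the feature with the smallest absolute weight, which the description does not specify. In practice a library routine such as scikit-learn's RFE would be the more usual choice.

```python
import math
import random

# Events mapped to 0, as listed in the description; every other event maps to 1.
ZERO_EVENTS = {"Single", "Walk", "Double", "Home Run", "Hit By Pitch",
               "Field Error", "Intent Walk", "Triple"}


def binarize_event(event):
    """Return 0 for the listed events and 1 otherwise."""
    return 0 if event in ZERO_EVENTS else 1


def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def take(rows, cols):
    """Keep only the columns whose indices are in `cols`."""
    return [[row[j] for j in cols] for row in rows]


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of rows whose predicted class matches the label."""
    hits = sum((sum(wj * xj for wj, xj in zip(w, xi)) + b > 0) == (yi == 1)
               for xi, yi in zip(X, y))
    return hits / len(y)


def recursive_elimination(X, y, keep):
    """Refit and drop the smallest-|weight| feature until `keep` remain."""
    active = list(range(len(X[0])))
    while len(active) > keep:
        w, _ = fit_logistic(take(X, active), y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


# Synthetic stand-in data: 6 features, of which only the first two drive the label.
random.seed(0)
raw = [[random.random() for _ in range(6)] for _ in range(200)]
labels = [1 if row[0] + row[1] > 1.0 else 0 for row in raw]

X = min_max_scale(raw)            # scaling precedes the split, as in the description
cut = len(X) * 7 // 10            # 70:30 split
X_train, y_train = X[:cut], labels[:cut]
X_test, y_test = X[cut:], labels[cut:]

# Feature selection uses the training set only; the test set is touched once at the end.
selected = recursive_elimination(X_train, y_train, keep=2)
w, b = fit_logistic(take(X_train, selected), y_train)
test_acc = accuracy(take(X_test, selected), y_test, w, b)
print("selected feature indices:", selected)
print("held-out accuracy:", test_acc)
```

Selecting features on the training partition alone and scoring the held-out partition only once mirrors the stated goal of keeping the test set untouched until the model is final.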

rtb-papers

A collection of research and survey papers on real-time bidding (RTB) based display advertising techniques.

samplemod

saprk_learning

watchlist

xlnt

Cross-platform user-friendly xlsx library for C++14

zhihu

This repo contains the source code from my personal column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented in Python 3.6. It includes natural language processing and computer vision projects such as text generation, machine translation, deep convolutional GAN, and other hands-on code.
