aster - a bot to write kaggle baseline kernels

Aster is a Python-based bot (or module) that can write baseline starter kernels for competitions or datasets hosted on Kaggle. At present it works with two types of datasets: numerical datasets (with continuous and/or categorical columns) and text datasets with a single text/document field.

Key features

Can create kernels for both competitions and datasets
Can create kernels for datasets with binary or multiclass classification targets
Can create kernels for text datasets and numerical datasets
Performs quick exploration, preprocessing, feature engineering, and modelling
Adapts the visuals to the data, for example by generating word clouds for text data and pair plots for numerical datasets
Uses a config to create new kernels

How Aster Works

Aster first reads the inputs given in the user's config and detects the types of columns present in the dataset. Based on this information, Aster dynamically chooses the most relevant code and text templates and appends them to the baseline kernel. For example, if the dataset belongs to the text classification category, Aster generates word clouds and skips correlation charts, pair plots, and categorical variable distributions. If the dataset is not a text classification dataset, Aster picks the templates relevant to numerical data, such as distributions of categorical variables and missing value treatment.
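The selection step described above can be pictured with a small sketch. This is not Aster's actual code: the function name `select_templates`, the template names, and the `column_types` mapping are all hypothetical and only illustrate how a config tag and detected column types could drive the choice of sections.

```python
def select_templates(config, column_types):
    """Pick section templates from the config tag and detected column types.

    Hypothetical sketch: `column_types` maps a column name to one of
    "continuous", "categorical" or "text". None of these names come from
    Aster itself.
    """
    # Sections assumed to apply to every kernel.
    templates = ["environment_preparation", "load_dataset",
                 "dataset_snapshot", "target_distribution"]

    if config.get("_TAG", "num") == "doc":
        # Text dataset: word clouds instead of correlation and pair plots.
        templates += ["tfidf_vectorizer", "wordcloud"]
    else:
        # Numerical dataset: pick charts that match the columns present.
        templates += ["missing_values", "variable_types"]
        kinds = list(column_types.values())
        if "categorical" in kinds:
            templates.append("categorical_distributions")
        if kinds.count("continuous") >= 2:
            templates += ["correlations", "pairplot"]

    templates += ["train_test_split", "modelling", "submission"]
    return templates


if __name__ == "__main__":
    print(select_templates({"_TAG": "doc", "_TEXT_COL": "text"},
                           {"text": "text"}))
```

The point of the sketch is only the branching: the `_TAG` value decides between the text and numerical paths, and the column types refine the numerical path.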

Detailed table of contents

Aster creates the following contents, depending on the type of data.

1. Environment Preparation
2. Quick Exploration
     2.1 Load Dataset
     2.2 Dataset Snapshot and Summary
     2.3 Target Variable Distribution
     2.4 Missing Values
     2.5 Variable Types
     2.6 Variable Correlations
3. Preprocessing
     3.1 Label Encoding
     3.2 Missing Values Treatment
     3.3 Feature Engineering (text fields)
         3.3.1 TF-IDF Vectorizer
         3.3.2 Top Keywords - Wordcloud
     3.4 Train Test Split
4. Modelling
     4.1 Logistic Regression
     4.2 Decision Tree
     4.3 Random Forest
     4.4 ExtraTrees Classifier
     4.5 Extreme Gradient Boosting
5. Feature Importance
6. Model Ensembling
     6.1 A Simple Blender
7. Creating Submission
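The outline does not say how the simple blender combines the models. One common approach, shown here purely as an assumption, is to average the class probabilities predicted by each model and pick the most likely class. The function names `blend_probabilities` and `predict_labels` are hypothetical and the sketch uses plain Python lists rather than any particular modelling library.

```python
def blend_probabilities(model_probs, weights=None):
    """Average class probabilities across models (assumed blending scheme).

    model_probs: one entry per model, each a list of rows, where a row holds
    the predicted probability of every class for one sample.
    weights: optional per-model weights; equal weights are used by default.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    total = float(sum(weights))

    blended = []
    # zip(*model_probs) groups the rows that describe the same sample.
    for rows in zip(*model_probs):
        n_classes = len(rows[0])
        blended.append([
            sum(w * row[c] for w, row in zip(weights, rows)) / total
            for c in range(n_classes)
        ])
    return blended


def predict_labels(blended, classes):
    """Return the class with the highest blended probability per sample."""
    return [classes[max(range(len(row)), key=row.__getitem__)]
            for row in blended]


if __name__ == "__main__":
    logistic = [[0.9, 0.1], [0.4, 0.6]]
    forest = [[0.7, 0.3], [0.2, 0.8]]
    blended = blend_probabilities([logistic, forest])
    print(predict_labels(blended, ["no", "yes"]))
```

Passing `weights` lets a stronger model count for more in the average; with equal weights this reduces to a plain mean.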

Usage : example 1

from aster.aster import aster

config = {"COMPETITION": "titanic",
          "_TARGET_COL": "Survived",
          "_ID_COL": "PassengerId"}

ast = aster(config)  # aster object with config
ast._prepare()  # prepare the kernel
ast._push()  # push the kernel to Kaggle

Usage : example 2

from aster.aster import aster

config = {"COMPETITION": "spooky-author-identification",
          "_TARGET_COL": "author",
          "_ID_COL": "id",
          "_TAG": "doc",
          "_TEXT_COL": "text"}

ast = aster(config)  # aster object with config
ast._prepare()  # prepare the kernel
ast._push()  # push the kernel to Kaggle

config examples

Aster uses a config of key-value pairs to write kernels for different datasets. Not all keys are mandatory; most of them are optional. The following table lists each key.

Key	Example Value	Default	Optional/Mandatory	Definition
DATASET	iris	""	optional	Name of the dataset to be used
COMPETITION	titanic	""	optional	Name of the competition
_TARGET_COL	Survived	""	mandatory	target column name
_ID_COL	PassengerId	""	optional	id column name
_TRAIN_FILE	train	train	optional	name of the train file
_TEST_FILE	test	test	optional	name of the test file
_TAG	doc	num	optional (only for text)	doc : text dataset, num : numerical dataset
_TEXT_COL	text	""	optional (only for text)	name of the column containing text data
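The defaults in the table can be read as a base config that user-supplied values are merged over. The sketch below illustrates that reading; `DEFAULTS` mirrors the table, while `resolve_config` and its checks are hypothetical helpers, not part of Aster's API.

```python
# Default values copied from the table above.
DEFAULTS = {
    "DATASET": "",
    "COMPETITION": "",
    "_TARGET_COL": "",
    "_ID_COL": "",
    "_TRAIN_FILE": "train",
    "_TEST_FILE": "test",
    "_TAG": "num",
    "_TEXT_COL": "",
}


def resolve_config(user_config):
    """Merge user values over the defaults and check the mandatory key.

    Hypothetical helper: Aster may validate its config differently.
    """
    unknown = set(user_config) - set(DEFAULTS)
    if unknown:
        raise KeyError("unknown config keys: %s" % sorted(unknown))

    config = {**DEFAULTS, **user_config}

    # _TARGET_COL is the only key the table marks as mandatory.
    if not config["_TARGET_COL"]:
        raise ValueError("_TARGET_COL is mandatory")
    # The table lists two tags: doc (text dataset) and num (numerical dataset).
    if config["_TAG"] not in ("num", "doc"):
        raise ValueError("_TAG must be 'num' or 'doc'")
    return config


if __name__ == "__main__":
    print(resolve_config({"COMPETITION": "titanic",
                          "_TARGET_COL": "Survived",
                          "_ID_COL": "PassengerId"}))
```

With the Titanic config from example 1, the merged result keeps the three supplied values and fills in `train`, `test`, and `num` for the keys that were left out.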

Example Kernels generated by Aster

1. Binary Classification on Numerical Data - Competition Data

Titanic Baseline Kernel

2. Multi Classification on Text Data - Competition Data

Spooky Author Baseline Kernel

3. Classification - Non Competition Data

Iris Dataset
Diabetes Dataset
Mushrooms Dataset

Installation

Aster can be installed directly from GitHub using the following commands:

git clone https://github.com/shivam5992/aster.git
cd aster
python setup.py install

Future Work

Dynamic Code Selection Improvements
Add More Content
- Automated Feature Engineering
- Hyperparameter Tuning
Extend Datatypes
- Regression Problems - Numerical Data
- Image Classification
