anilsener

followers: 56 following: 181 repos: 169 gists: 4

Name: Anil Sener (Anıl Şener)

Type: User

Anil Sener (Anıl Şener)'s Projects

2018-machinelearning-lectures-esa

Machine Learning Lectures at the European Space Agency (ESA) in 2018

aas

Code to accompany Advanced Analytics with Spark from O'Reilly Media

air_tranportation_statistics_data_inteview_case_study

Solutions to an Air Transportation Statistics data analysis case study for a Lead Data Engineering position

airsim

Open source simulator based on Unreal Engine for autonomous vehicles from Microsoft AI & Research

alluxio

Alluxio, formerly Tachyon, Unify Data at Memory Speed

amazon-emr-management-guide

The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this repo or by making proposed changes & submitting a pull request.

apache-spark-2x-machine-learning-cookbook

Apache Spark 2x Machine Learning Cookbook, published by Packt

apache-spark-deep-learning-cookbook

Apache Spark Deep Learning Cookbook, published by Packt

awesome-machine-learning

A curated list of awesome Machine Learning frameworks, libraries and software.

awesome-scala

A community driven list of useful Scala libraries, frameworks and software.

aws-devops-essential

In a few hours, learn how to use various AWS services effectively to improve developer productivity and reduce the overall time to market for new product capabilities.

axa-insurance-telematics-kaggle

I developed this case study in only 7 days with PySpark (Spark 1.6.0) SQL and MLlib, using a Databricks cluster and AWS. Random Forest achieved 90% AUC without the Trip Matching (Repeated Trips) feature. Ensembles of RF, GBT, and Logistic Regression, together with outlier elimination, could improve this result. There are two versions of my code: test execution and full execution. Because AWS costs exceeded my budget, I stopped training my model(s) on the full dataset. A PPT presents my outputs from the test execution. The full-data execution code is a slightly different, more production-ready version; in it, I had to apply Databricks table caching to the TRAIN and TEST data tables to obtain acceptable performance.