Git Product home page Git Product logo

kaggle-talkingdata-adtracking-fraud-detection's Introduction

Kaggle TalkingData Ad Tracking Fraud Detection Challenge

Scripts for competition https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection

Final ranking #55 with private leaderboard score 0.9824250

  • Aggregate factors from Baris' kernel computed on all data with test_supplement
  • Prev/next click times (1-, 2-, 3-step) computed on all data with test_supplement
  • LIBFFM out-of-fold using 5 folds
  • LGBM on 50% data (using all is_attributed=1 and 50% of is_attributed=0)
  • Average of a few LGBM models trained on different subsets of data and factors
  • Training with 96GB RAM
  • Store data column-wise in binary formats for fast loading
  • HOCON configurations are awesome

EDA preprocessing/eda.ipyndb - distributions of frequencies in train.csv

Merge test datasets preprocessing/merge_test_sets.ipynb - script for merging test datasets test.csv test_supplement.csv

Baris kernel https://www.kaggle.com/bk0000/non-blending-lightgbm-model-lb-0-977?scriptVersionId=3224614 https://www.kaggle.com/aharless/kaggle-runnable-version-of-baris-kanber-s-lightgbm

Next/prev clicks without hashing trick https://www.kaggle.com/asydorchuk/nextclick-calculation-without-hashing-trick

Submissions

model dump train_set params val AUC lb AUC factors
lgbm lgbm_08 d_8_9_h_4_5_9_... it=250 lr=0.2 md=3 nl=7 scw=300 0.983xx 0.9795 baris
lgbm lgbm_09 na_10pct it=500 lr=0.1 md=3 nl=7 scw=na 0.98498 0.9803 baris
lgbm lgbm_11 d_8_9_h_4_5_9_... it=2000 lr=0.01 md=3 nl=7 scw=300 0.98332 0.9791 baris
cbst cbst_09 na_10pct it=500 lr=0.05 md=6 rms=0.7 0.98221 0.9766 baris
lgbm lgbm_12 na_10pct it=500 lr=0.1 md=3 nl=7 scw=na 0.98564 0.9806 baris + t2
lgbm lgbm_13 na_10pct it=1000 lr=0.1 md=3 nl=7 scw=na 0.98610 0.9810 baris + t2
lgbm lgbm_14 na_10pct it=1000 lr=0.1 md=3 nl=7 scw=na 0.9855x 0.9810 baris + t2 + t3
lgbm lgbm_15 na_10pct it=1370 lr=0.1 md=3 nl=7 scw=na 0.98567 0.9811 baris + t2 + t3
lgbm lgbm_16 na_10pct it=1360 lr=0.1 md=3 nl=7 scw=na 0.98544 0.9809 baris + t2 + libffm
lgbm lgbm_17 na_10pct it=820 lr=0.1 md=4 nl=15 scw=na 0.98580 0.9810 baris + t2 + libffm
blend logit, weights=1.0 - 0.9812 blend lgbm_13..lgbm_17
lgbm lgbm_18 na_20pct it=1500 lr=0.1 md=3 nl=7 scw=na 0.98571 0.9808 baris + t2 + libffm + tc2
lgbm lgbm_19 na_50pct it=2500 lr=0.1 md=3 nl=7 scw=na 0.98595 0.9811 baris + t2
blend logit, weights=1.0 - 0.9813 blend lgbm_15 + lgbm_19
lgbm lgbm_19 na_50pct it=1500, all attributed - 0.9810 baris + t2 + libffm
lgbm lgbm_20 na_50pct_2 it=1500, all attributed - 0.9810 baris + t2 + libffm
blend logit, weights=1.0 - 0.9813 blend lgbm_13..lgbm_20

Factors strength (lgbm_19)

feature gain split
libffm_oof 82.34410715832054 363
ip_app_device_os_t_next 4.858614532478393 555
app 4.285458537390102 1057
ip_nunique_channel 1.7134587347643337 173
channel 1.2185210564277669 2220
os 0.9917807676129159 1516
hour 0.8615137822058627 1078
ip_nunique_app 0.5879513478431967 157
ip_nunique_device 0.5397789939897564 190
ip_day_hour_count 0.461685682523349 191
ip_app_count 0.42650563120670515 99
ip_app_device_os_t_next_2 0.35302018803960517 194
ip_device_os_nunique_app 0.3238393490783555 198
ip_day_nunique_hour 0.30262413717998854 76
ip_app_os_count 0.2384258533635993 84
ip_device_os_cumcount_app 0.11690381200637338 58
ip_app_device_os_t_prev 0.07133512725949863 95
device 0.06228725573269818 70
ip_app_nunique_os 0.06172545078968752 141
ip_cumcount_os 0.05417631528656273 44
ip_app_channel_mean_hour 0.04849352774733603 144
day 0.017391508495188064 61
app_nunique_channel 0.014128035383901352 35
ip_app_os_var_hour 0.01325167381501645 52
ip_app_device_os_t_prev_2 0.012127423364797763 60
ip_app_channel_var_day 0.011027400971738753 39
ip_day_channel_var_hour 0.009866716722734953 50

kaggle-talkingdata-adtracking-fraud-detection's People

Contributors

stys avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.