Finished basic data cleaning. Tried making a benchmark using kNN.
converted all training data into one file
- Data cleaning code is in dcp.py
- Besides weather_data, traffic_data and time, I have also included the previous 3 gaps of each data point as features. Just a reminder.
- didi_concat_data.ipynb joins the training data from different days into a single csv file.
- My kNN code is in didi_knn_on_training_set.ipynb
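For reference, the "previous 3 gaps" features could be built roughly like this. It is a minimal plain-Python sketch, not the code in dcp.py; the function name `add_previous_gaps` and the input layout (a list of gap values ordered by time slot) are assumptions made for the example.

```python
def add_previous_gaps(gaps, n_lags=3):
    """Return one feature row per time slot: the n_lags preceding gap values.

    `gaps` is a list of gap values ordered by time slot (hypothetical layout).
    Slots without enough history are padded with None.
    """
    rows = []
    for i, gap in enumerate(gaps):
        # lags[0] is the gap one slot back, lags[1] two slots back, and so on
        lags = [gaps[i - k] if i - k >= 0 else None for k in range(1, n_lags + 1)]
        rows.append({"gap": gap, "prev_gaps": lags})
    return rows


rows = add_previous_gaps([4, 0, 7, 2, 5])
```

Rows near the start of the series have no full history, so they are padded with None here; dropping them or filling with a default would be equally reasonable choices.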
The final MAE on the training set is 1.538
To run didi_knn_on_training_set.ipynb, you only need to unzip TrainingData.zip
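As a rough picture of what a kNN benchmark scored with MAE involves, here is a dependency-free sketch. It is not the notebook's code: the function names, the choice of Euclidean distance, the value of k and the toy data are all assumptions for illustration.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    # Euclidean distance from the query to every training row
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_absolute_error(y_true, y_pred):
    """MAE: the mean of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (made up): each row is [prev_gap_1, prev_gap_2, prev_gap_3]
train_X = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [10, 10, 10]]
train_y = [1.0, 2.0, 3.0, 10.0]
test_X = [[2, 2, 3], [9, 9, 9]]
test_y = [2.0, 9.0]

preds = [knn_predict(train_X, train_y, q, k=2) for q in test_X]
mae = mean_absolute_error(test_y, preds)
```

The sketch scores on rows that were not used for fitting. When kNN is scored on its own training rows, each point counts among its own neighbours, so that number tends to look better than the error on unseen data.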
That's all.
kNN benchmark on test-set-1 has been uploaded to UdaGroup
final MAE score = 9.6
- Rewrite dcp.py into a better-looking form
- Add weekdays/weekend and price as new features
- Implement feature selection
- Try better regression algorithms
- Find a way to include POI as features
That's all.
add weekdays/weekend and price as new features
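A weekday/weekend flag can be derived from the date alone. The sketch below is one possible way to do it, not necessarily how dcp.py does it; the function name `is_weekend` and the example dates are arbitrary, and public holidays are not handled.

```python
from datetime import date


def is_weekend(day):
    """Return 1 for Saturday/Sunday, 0 otherwise (date.weekday(): Monday is 0)."""
    return 1 if day.weekday() >= 5 else 0


# Four consecutive example dates, starting on a Friday
flags = [is_weekend(date(2016, 1, d)) for d in (1, 2, 3, 4)]
```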
implement feature selection
the most useful features are the three previous gaps and the two previous counts
MAE reduced to 5.3
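The log does not say which feature selection method was used. As one simple illustration, features can be ranked by the absolute value of their Pearson correlation with the target. This is a swapped-in technique chosen for the sketch; the function names, column names and toy values are all made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has zero variance; treat its correlation as 0
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(columns, target):
    """Sort feature names by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy columns (made up); names are illustrative only
columns = {
    "prev_gap_1": [1, 2, 3, 4, 5],
    "is_weekend": [0, 1, 0, 1, 0],
    "price": [5, 3, 4, 1, 2],
}
target = [1.1, 2.0, 2.9, 4.2, 5.0]
ranking = rank_features(columns, target)
```

Correlation only captures linear, one-feature-at-a-time relationships, so a ranking like this is a starting point; checking the MAE after dropping each candidate feature is a more direct test.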
data cleaning finished, csv files have been uploaded to the Udacity Slack channel