
dat_sf_10's Introduction

Hi, I'm Ari 👋

I'm a Data Scientist, Strategist, and AI Technology Leader who is constantly experimenting, innovating, and tinkering...

  • 🔭 I joined Beyond Limits' CTO Office in 2021 and departed in September 2023.
  • 🔭 Since then, I have taken on interim AI consulting and advisory work.
  • ⚡ I'm always listening to podcasts as I stroll through the city.
  • 😅 Fun fact: I've gone globetrotting around many countries and ended up scaling Mount Everest, a bit past Base Camp...

I'm best reached via email. I'm always open to interesting conversations and collaboration.



dat_sf_10's People


Forkers

anilprasad

dat_sf_10's Issues

HW4 Review

Hi Ari,

Sorry for the late review, but I love your code style! All of your functions are well generalized and reused throughout the assignment, which makes the notebook much easier to read through.

I'm not sure I follow your approach to Q3. I understood the question to ask for n_components=2, but if I read your code correctly, you keep the variance of n_components=5 and plot on two axes. Perhaps that was the right call: both of your graphs look quite a bit more spread out than mine with n_components=2 (i.e., more distinct clusters).
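For reference, here's a minimal sketch of the two readings of Q3, using synthetic data since the assignment dataset isn't reproduced here: reducing straight to two components versus keeping five and plotting only the first two. The first two principal components come out the same either way; keeping five only changes how much variance is retained for anything downstream.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))                 # stand-in for the real data

# Reading 1: reduce directly to two components.
X2 = PCA(n_components=2, svd_solver="full").fit_transform(X)

# Reading 2: keep five components, then plot only the first two axes.
X5 = PCA(n_components=5, svd_solver="full").fit_transform(X)

# The first two principal components are identical either way.
assert np.allclose(X2, X5[:, :2])
```

So a 2-D scatter of the first two axes looks the same under both readings; any visual difference between two notebooks is more likely down to preprocessing than to the choice of n_components.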

Wish I could give you more constructive criticism, but I'm afraid I don't have much to critique! :]

@kebaler @ghego @craigsakuma

HW2

Excellent...

HW6 Review

I enjoyed reading your report and I think it helped me understand the details of the assignment much better (in particular the beginning exploratory analysis). A few observations/thoughts:

  • When transforming to categorical variables, you didn't use the suggested functions from the scikit-learn preprocessing package. What you did looks like it worked fine, but it's worth looking into, as it could be faster in the future. I did mine with preprocessing.LabelEncoder().
  • For the decision tree section: decision trees inherently overfit the data, so it's useful to set a max depth or prune the tree in some way.
  • For your k-fold cross-validation, you find the number of folds that yields the maximum accuracy, which isn't really the point of k-folds (the idea is to get the best estimate of your out-of-sample accuracy).
  • Your explanation of learning curves wasn't all that thorough (though I did go in not really knowing what they are at all).
  • The notebook could have used a bit more explanatory text throughout, but you explained all the main points.
  • I like your crosstab tables; I'm definitely going to start using those.
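To make the first three points concrete, here is a hedged sketch using the modern scikit-learn API (the course-era cross_validation module has since moved to model_selection); the data is synthetic, since the assignment dataset isn't shown here.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# 1) Encode a categorical column as integers. Classes are sorted first,
#    so blue=0, green=1, red=2.
le = LabelEncoder()
encoded = le.fit_transform(["red", "green", "blue", "green", "red"])
# -> array([2, 1, 0, 1, 2])

# 2) Cap tree depth to curb the decision tree's tendency to overfit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # synthetic stand-in data
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# 3) k-fold CV estimates out-of-sample accuracy; k is not a knob to tune
#    for the highest score.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```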

@ghego, @craigsakuma, @kebaler

HW3 review

Hi, apologies for the lateness.

Import looks good.

I like how, right from the start, you normalized the time to seconds with def convert_to_sec(frame).
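The body of convert_to_sec isn't shown in the review, so the following is only one plausible implementation, assuming a 'time' column of 'HH:MM:SS' strings (both column names are assumptions):

```python
import pandas as pd

def convert_to_sec(frame):
    """Add a 'seconds' column converted from an 'HH:MM:SS' time column.

    One plausible sketch; the column names are assumptions.
    """
    frame = frame.copy()
    frame["seconds"] = pd.to_timedelta(frame["time"]).dt.total_seconds()
    return frame

df = convert_to_sec(pd.DataFrame({"time": ["00:01:30", "00:02:15"]}))
# df["seconds"] -> [90.0, 135.0]
```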

Next you start plotting right away, using the class_id feature as the unique value.

In the plotting exercise you use the plots to identify outliers and possibly eliminate data; that seems to help confirm you're on the right path for the next step.

Once you've filtered the data, you plot again to understand the shape of the curve.

Next you normalize the dataset so it has a range you can work with.
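A common way to do this is min-max scaling, which maps each column into [0, 1]; a small sketch (the actual normalization used in the notebook isn't shown here):

```python
import pandas as pd

df = pd.DataFrame({"a": [2.0, 4.0, 6.0], "b": [10.0, 20.0, 30.0]})

# Min-max scaling: each column is rescaled into [0, 1].
normalized = (df - df.min()) / (df.max() - df.min())
# normalized["a"] -> [0.0, 0.5, 1.0]
```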

You compute the correlation using Pearson, which measures the linear correlation between variables and yields values between -1 and 1.
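In pandas this is a one-liner; a small self-contained example with made-up columns:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4],
                   "y": [2, 4, 6, 8],    # perfectly linear in x -> +1
                   "z": [4, 3, 2, 1]})   # perfectly anti-linear -> -1
corr = df.corr(method="pearson")         # every entry lies in [-1, 1]
```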

Now you go back and normalize the problems you identified as targets earlier.

I'm not sure what happens in the class-per-time section; it seems to restructure the dataset based on those normalized targets.

Next you use PCA, yes! I'll need more time to understand the rest.

Good job!

@ghego, @craigsakuma, @kebaler

HW1 Review by Matt Lichti

Hi Ari,

Your homework looks really good. I like how you displayed the results in pandas tables, which makes them very easy to read. Your code for loading the data into a SQL database and querying it is much cleaner than mine.

We got the same results for the date with the most logins, but we interpreted finding the hour with the most logins a bit differently. I found the hour of the day with the most total logins throughout the entire period, while you found the combination of hour and date with the most logins.
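The two readings differ only in the grouping key. A sketch with made-up timestamps (the column name login_time is an assumption):

```python
import pandas as pd

logins = pd.DataFrame({"login_time": pd.to_datetime([
    "2012-03-01 09:10", "2012-03-01 09:45",
    "2012-03-02 09:05", "2012-03-02 17:30",
])})

hours = logins["login_time"].dt.hour
dates = logins["login_time"].dt.date

# Hour of day with the most logins over the whole period:
busiest_hour = hours.value_counts().idxmax()            # -> 9

# (date, hour) combination with the most logins:
busiest_pair = logins.groupby([dates, hours]).size().idxmax()
# -> (datetime.date(2012, 3, 1), 9)
```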

@ghego, @craigsakuma, @kebaler, @akamlani
