
dat_sf_10's Introduction

Hi, I'm Ari 👋

I'm a Data Scientist, Strategist, and AI Technology Leader who is constantly experimenting, innovating, and tinkering...

  • 🔭 I joined Beyond Limits' CTO Office in 2021 and departed in September 2023.
  • 🔭 Since then, I have taken on interim AI consulting and advisory work.
  • ⚡ I'm always listening to podcasts as I stroll through the city.
  • 😅 Fun fact: I've gone globetrotting around many countries and ended up scaling Mount Everest, a bit past Base Camp...

I'm best reached via email. I'm always open to interesting conversations and collaboration.



dat_sf_10's People


Forkers

anilprasad

dat_sf_10's Issues

HW4 Review

Hi Ari,

Sorry for the late review, but I love your code style! All of your functions are well generalized and reused throughout the assignment, which makes the notebook much easier to read through.

I'm not sure I follow your approach to Q3. I understood the question to ask for n_components=2, but if I read your code correctly, you keep the variance of n_components=5 and plot on two axes. Perhaps that was the right call: both of your graphs look quite a bit more spread out than mine with n_components=2 (i.e., more distinct clusters).
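For reference, here's a minimal sketch of the two readings of Q3, using synthetic data since the assignment dataset isn't reproduced here: reducing straight to two components versus keeping five and plotting only the first two. The first two principal components come out the same either way; keeping five only changes how much variance is retained for anything downstream.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))                 # stand-in for the real data

# Reading 1: reduce directly to two components.
X2 = PCA(n_components=2, svd_solver="full").fit_transform(X)

# Reading 2: keep five components, then plot only the first two axes.
X5 = PCA(n_components=5, svd_solver="full").fit_transform(X)

# The first two principal components are identical either way.
assert np.allclose(X2, X5[:, :2])
```

So a 2-D scatter of the first two axes looks the same under both readings; any visual difference between two notebooks is more likely down to preprocessing than to the choice of n_components.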

Wish I could give you more constructive criticism, but I'm afraid I don't have much to critique! :]

@kebaler @ghego @craigsakuma

HW2

Excellent...

HW6 Review

I enjoyed reading your report and I think it helped me understand the details of the assignment much better (in particular the beginning exploratory analysis). A few observations/thoughts:

  • When transforming to categorical variables, you didn't use the suggested functions from the scikit-learn preprocessing package. What you did looks like it worked fine, but it's worth looking into, as it could be faster in the future. I did mine with preprocessing.LabelEncoder().
  • For the decision tree section: decision trees inherently overfit the data, so it's useful to set a max depth or prune the tree in some way.
  • For your k-fold cross-validation, you find the number of folds that yields the maximum accuracy, which isn't really the point of k-folds (the idea is to get the best estimate of your out-of-sample accuracy).
  • Your explanation of learning curves wasn't all that thorough (though I did go in not really knowing what they are at all).
  • The notebook could have used a bit more explanatory text throughout, but you explained all the main points.
  • I like your crosstab tables; I'm definitely going to start using those.
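To make the first three points concrete, here is a hedged sketch using the modern scikit-learn API (the course-era cross_validation module has since moved to model_selection); the data is synthetic, since the assignment dataset isn't shown here.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# 1) Encode a categorical column as integers. Classes are sorted first,
#    so blue=0, green=1, red=2.
le = LabelEncoder()
encoded = le.fit_transform(["red", "green", "blue", "green", "red"])
# -> array([2, 1, 0, 1, 2])

# 2) Cap tree depth to curb the decision tree's tendency to overfit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # synthetic stand-in data
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# 3) k-fold CV estimates out-of-sample accuracy; k is not a knob to tune
#    for the highest score.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```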

@ghego, @craigsakuma, @kebaler

HW3 review

Hi, apologies for the lateness.

Import looks good.

I like how, right from the start, you normalized the time to seconds with def convert_to_sec(frame).
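The body of convert_to_sec isn't shown in the review, so the following is only one plausible implementation, assuming a 'time' column of 'HH:MM:SS' strings (both column names are assumptions):

```python
import pandas as pd

def convert_to_sec(frame):
    """Add a 'seconds' column converted from an 'HH:MM:SS' time column.

    One plausible sketch; the column names are assumptions.
    """
    frame = frame.copy()
    frame["seconds"] = pd.to_timedelta(frame["time"]).dt.total_seconds()
    return frame

df = convert_to_sec(pd.DataFrame({"time": ["00:01:30", "00:02:15"]}))
# df["seconds"] -> [90.0, 135.0]
```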

Next you start plotting right away, using the class_id feature as the unique value.

In the plotting exercise you use the plots to identify outliers and possibly eliminate data; that seems to help confirm you're on the right path for the next step.

Once you've filtered the data, you plot again to understand the shape of the curve.

Next you normalize the dataset so it has a range you can work with.
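A common way to do this is min-max scaling, which maps each column into [0, 1]; a small sketch (the actual normalization used in the notebook isn't shown here):

```python
import pandas as pd

df = pd.DataFrame({"a": [2.0, 4.0, 6.0], "b": [10.0, 20.0, 30.0]})

# Min-max scaling: each column is rescaled into [0, 1].
normalized = (df - df.min()) / (df.max() - df.min())
# normalized["a"] -> [0.0, 0.5, 1.0]
```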

You compute the correlation using Pearson, which measures the linear correlation between variables and yields values between -1 and 1.
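In pandas this is a one-liner; a small self-contained example with made-up columns:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4],
                   "y": [2, 4, 6, 8],    # perfectly linear in x -> +1
                   "z": [4, 3, 2, 1]})   # perfectly anti-linear -> -1
corr = df.corr(method="pearson")         # every entry lies in [-1, 1]
```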

Now you go back and normalize the problems you identified as targets earlier.

I'm not sure what happens in the class-per-time section; it seems to restructure the dataset based on those normalized targets.

Next you use PCA, yes! I'll need more time to understand the rest.

Good job!

@ghego, @craigsakuma, @kebaler

HW1 Review by Matt Lichti

Hi Ari,

Your homework looks really good. I like how you displayed the results in pandas tables, which makes them very easy to read. Your code for loading the data into a SQL database and querying it is much cleaner than mine.

We got the same results for the date with the most logins, but we interpreted finding the hour with the most logins a bit differently. I found the hour of the day with the most total logins throughout the entire period, while you found the combination of hour and date with the most logins.
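The two readings differ only in the grouping key. A sketch with made-up timestamps (the column name login_time is an assumption):

```python
import pandas as pd

logins = pd.DataFrame({"login_time": pd.to_datetime([
    "2012-03-01 09:10", "2012-03-01 09:45",
    "2012-03-02 09:05", "2012-03-02 17:30",
])})

hours = logins["login_time"].dt.hour
dates = logins["login_time"].dt.date

# Hour of day with the most logins over the whole period:
busiest_hour = hours.value_counts().idxmax()            # -> 9

# (date, hour) combination with the most logins:
busiest_pair = logins.groupby([dates, hours]).size().idxmax()
# -> (datetime.date(2012, 3, 1), 9)
```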

@ghego, @craigsakuma, @kebaler, @akamlani
