yeti-thesis-project's People

Contributors

chaostewart, cnorrisjones, liuyejia, oschulte, zeruniverse

Forkers

liuyejia

yeti-thesis-project's Issues

implement Schuckers' metric

Evaluate using Schuckers' metric (i.e. the correlation between the predicted ranking and the actual ranking according to number of games).

  • For each player, export their leaf node
  • For each leaf, export its equation (currently manual)
  • For each player, apply the equation to get a predicted score (number of games played, probability of playing > 0 games).
  • Use the predicted score to rank players. Compute the Spearman rank correlation with the ranking according to the observed metric.
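The last step could be sketched as follows. This is a minimal illustration rather than the project's actual script: the function names are made up here, ties receive the average of their positions, and the correlation is computed as the Pearson correlation of the two rank vectors, which is the standard definition of Spearman's coefficient.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(predicted, observed):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(observed)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one ranking is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical example: predicted scores vs. observed games played.
rho = spearman([0.9, 0.1, 0.5, 0.3], [300, 0, 120, 200])
```

If SciPy is available, `scipy.stats.spearmanr` should give the same coefficient and also reports a p-value.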

upload datasets

Can we

  • upload the NBA data to GitHub?
  • upload the NBA + NHL data to the Prague relational repository? Or first to the database server?

GitHub issue fix

Hi

Please associate your ~/.ssh/id_rsa key with your GitHub configuration so that you don't get a publickey error.

conference paper

Some notes for the conference paper

  • I see in your thesis there has been discussion about grouping NBA players into positions. The tree in a sense discovers groupings of positions. e.g.

For basketball, many clustering approaches focus on defining appropriate roles or positions for a player.

  • meet with Max and Galen about learning model trees

  • explain why plus-minus is important in the tree

  • submit to KDD workshop?

report on WHL prediction

Hi Yeti,

Thanks for doing this work. It looks like an interesting result, especially that SVM does so much better. A few suggestions:

  • How about a neural net? A deep neural net?

  • In your report, please include a brief description of the dataset, maybe some sample lines from the data file. Also a description of what you want to predict.

  • How does this relate to Wilson's result?

  • How does this relate to Schuckers' work? I guess we should get a script for computing the correlations with TOI performance as he did.

draft data questions

  1. Why are there CSS ranks missing (e.g. in chao_draft.join_skater_and_season_stats_10_years_view we go from 249 to 242)?
  2. Why are there 668 players at maximum rank in chao_draft.norm_dataset_for_lmt but only 666 players with a null CSS rank in chao_draft.join_skater_and_season_stats_10_years_CSS_null?
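A quick way to investigate the first question is to list which ranks are absent from the exported column. The sketch below is only an illustration: it assumes the CSS ranks have already been pulled from the view into a Python list of integers, with `None` standing in for SQL NULL, and the function name is hypothetical.

```python
def missing_ranks(ranks):
    """Return the integers absent between the smallest and largest observed rank."""
    present = {r for r in ranks if r is not None}  # ignore NULL ranks
    if not present:
        return []
    return [r for r in range(min(present), max(present) + 1) if r not in present]


def count_null(ranks):
    """Count entries with no rank at all (NULL in the database)."""
    return sum(1 for r in ranks if r is None)


# Hypothetical sample: ranks 3, 5 and 6 are missing, one player has no rank.
sample = [1, 2, 4, 7, None]
gaps = missing_ranks(sample)
nulls = count_null(sample)
```

Comparing `count_null` on both tables might also help narrow down the 668 versus 666 discrepancy in the second question.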

regarding Schuckers' metric

Link to Schuckers' paper: https://github.com/sfu-cl-lab/Yeti-Thesis-Project/blob/master/papers/1559-Draft-by-Numbers.pdf

  1. Schuckers used the Spearman rank correlation as a measure of association and predictive ability. Can I calculate this correlation by following these instructions: https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php ?
  2. Schuckers used data from 3 consecutive NHL drafts (1998, 1999, and 2000) to build the model, then predicted each player's performance in the 2 subsequent out-of-sample years (2001 and 2002). Should we do it the same way?
  3. Table 4 and Table 5 in Schuckers' paper display results for "Drafted Players" and "All Players" respectively. The definitions of "drafted players" and "all players" are a bit unclear to me.
    @oschulte
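The split described in question 2 could look roughly like this. It is a sketch under assumptions: each player is represented as a dictionary, and the `draft_year` key is a hypothetical field name, not a column taken from the project's tables.

```python
def split_by_draft_year(players, train_years=(1998, 1999, 2000), test_years=(2001, 2002)):
    """Split players into a model-building set and an out-of-sample set by draft year."""
    train = [p for p in players if p["draft_year"] in train_years]
    test = [p for p in players if p["draft_year"] in test_years]
    return train, test


# Hypothetical records; only the draft year matters for the split.
players = [
    {"name": "A", "draft_year": 1998},
    {"name": "B", "draft_year": 2000},
    {"name": "C", "draft_year": 2001},
    {"name": "D", "draft_year": 2003},  # outside both windows, so dropped
]
train, test = split_by_draft_year(players)
```

The model would be fitted on `train` only, and the rank correlation computed on `test`.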

Please install WEKA on your machine at work...

Hi @oschulte Oliver,
Connecting to our database in WEKA is non-trivial. I uploaded an instruction file for this particular task. Please check out: https://github.com/sfu-cl-lab/Yeti-Thesis-Project/blob/master/How%20to%20connect%20to%20MySql%20database%20in%20WEKA.md

I can't make the connection work on my laptop for now because of the multiple hops. Nor can I export the databases I need (chao_draft or ckm_and_exception_mining) to my local machine. So, if I can't come up with a better solution in the next 24 hours, I suggest that the easiest way to be able to use WEKA in our next meeting is for you to install it on the desktop in your office, or to come to my desktop in the lab.

Thank you for your understanding!

verify CSS rank

  1. Check the websites mentioned by Schuckers to find the CSS rank for the missing players.
  2. if we can't find it, is that because they are low ranked (late pick)?
  3. How did we get Cescin? This means that for the European players, we have a different ranking. Cescin is converted European rank. I wonder if we can find the original scouting ranking for Europeans.

build decision tree (e.g. J48)

Could try using a discrete prediction. Probably with three classes:

  1. NHL player with > 0 games
  2. NHL player with 0 < games < 160 (whatever the official threshold is)
  3. NHL player with > 160 games

Problem 1 seems especially natural, compared to the somewhat artificial threshold of 160.
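One possible way to derive these discrete targets from the number of games played is sketched below. It rests on assumptions: the function names and class labels are made up for illustration, zero games is treated as an implied extra class, the threshold of 160 is a parameter rather than a confirmed official value, and a player with exactly the threshold number of games is placed in the top class.

```python
def played_any(games):
    """Binary target for problem 1: did the player appear in at least one NHL game?"""
    return games > 0


def games_class(games, threshold=160):
    """Three-way target; the threshold default of 160 is an assumption."""
    if games <= 0:
        return "none"
    if games < threshold:
        return "below_threshold"
    return "at_or_above_threshold"  # assumption: exactly `threshold` games counts here


# Hypothetical games-played values.
labels = [games_class(g) for g in (0, 12, 160, 400)]
```

The resulting labels could then be used as the class attribute when training a decision tree such as J48.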

If we implement Schuckers' ranking metric (#4), then we can evaluate both regression trees and decision trees in terms of how they rank players. For that we may have to combine decision trees with logistic regression. I wonder if that exists? In R perhaps?

Sloan Abstract

  • write outline
  • make result figures with d_i^2
  • make plots

skaters' position of 'W' or 'F' from eliteprospects.com

Player stats crawled from eliteprospects.com are saved in the table chao_draft.elite_prospects_skaters_stats_1998_2008_original. About 150 of the players have the position 'W' or 'F'. How do we determine whether they are 'L', 'R' or 'C'? Note that even though their 'Shoots' information is known, shooting left- or right-handed does not determine whether a player's position is 'L' or 'R'.

Other datasets

  • Euro 2016 data (see Google Drive, Maity email)
  • Canadian Women's soccer (Peter Chow-White)
  • dataset from Russ (?? source)
