Comments (13)

scientes avatar scientes commented on August 23, 2024

If I could get a little more info, I'd like to work on this using Scrapy as the scraping framework, if you don't mind.

What do you mean by reflecting the changes in the data folder? Should I keep a log of the updates made to the data, or just write a script that updates the data daily?

from best11-fantasycricket.

roysti10 avatar roysti10 commented on August 23, 2024

Any web scraping tool is welcome. We currently have ODI records for each player under zip (match IDs and dates), zip2 (batting records), bowl, and wk. As the current players play further matches, those folders should be updated accordingly.
As of now you only need to update the zip folder, which contains match IDs and dates for each player. If a new non-retired player's records have been put up, and/or the current players have played a new ODI series, we would like you to add them. For example, the Pakistan and England players played a series recently, so we would like those records updated.

Everything must be scraped from howstat.com.
You can also update zip2, bowl, and wk using the scoring table in Dataset.md, but for a PR, updating the zip folder is sufficient as of now.

roysti10 avatar roysti10 commented on August 23, 2024

I would suggest writing a script that updates it.

scientes avatar scientes commented on August 23, 2024

Two things:

  1. I'd like to be assigned.
  2. Is it a problem if I use the shortened version of the player name as the filename for now? For the full name I'd also need to crawl the player page and link the information together, which would be kind of complicated for now.

Also, it's more or less finished; I'm just fixing bugs at the moment:
https://github.com/scientes/Best11-Fantasycricket/tree/webcrawler

Currently it recrawls everything, which is a problem I need to fix later. (At the moment I'm using an HTTP cache for development so pages aren't fetched twice, but that doesn't help here.)

So: do I need to filter out retired players, or do you want all of them?
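
One possible way to avoid recrawling everything, sketched under assumptions: keep the match IDs that are already stored and request only the IDs that are new. The names `select_new_matches`, `stored`, and `listed` are hypothetical and not taken from the crawler branch, and the match IDs are made up.

```python
def select_new_matches(stored_ids, listed_ids):
    """Return the match IDs that are listed on the site but not stored yet.

    The order of first appearance in listed_ids is kept and duplicates are dropped.
    """
    seen = set(stored_ids)
    new_ids = []
    for match_id in listed_ids:
        if match_id not in seen:
            seen.add(match_id)
            new_ids.append(match_id)
    return new_ids


# Hypothetical IDs, for illustration only.
stored = ["m001", "m002"]
listed = ["m001", "m002", "m003", "m003", "m004"]
to_crawl = select_new_matches(stored, listed)
print(to_crawl)
```

Only the IDs in `to_crawl` would then need to be fetched, regardless of whether an HTTP cache is enabled.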

roysti10 avatar roysti10 commented on August 23, 2024
  1. Done.
  2. Not at all. I only used the long player names because that is how I scraped them.

As for retired players, I don't mind keeping them. It's your call if you want to remove them.

roysti10 avatar roysti10 commented on August 23, 2024

For non-retired players, check this out: http://www.howstat.com/cricket/Statistics/Players/PlayerListCurrent.asp
Once players are removed from this list, they are considered retired, as per http://www.howstat.com/cricket/Statistics/Players/PlayerMenu.asp
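
The rule above (a player who is missing from the current list counts as retired) could be applied roughly as follows. This is a minimal sketch: `split_retired` and the player names are hypothetical, and it assumes both lists have already been scraped into plain Python lists.

```python
def split_retired(all_players, current_players):
    """Split all known players into (active, retired).

    A player is treated as retired when they do not appear in current_players.
    """
    current = set(current_players)
    known = set(all_players)
    active = sorted(p for p in known if p in current)
    retired = sorted(p for p in known if p not in current)
    return active, retired


# Made-up names, for illustration only.
everyone = ["player1", "player2", "player3"]
on_current_list = ["player1", "player3"]
active, retired = split_retired(everyone, on_current_list)
print(active, retired)
```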

scientes avatar scientes commented on August 23, 2024

Ah, thanks.
Another issue:
Git is creating problems for me because the total number of files is very large: there are 5038 players in total. Wouldn't it be better to have one file per folder and just filter on usage, or should I just push all 5k files (in the future about 3x that, due to zip2, bowl, and wk)?
With zip2 and the rest I'm a bit lost on how to calculate them because I'm not familiar with cricket at all, but those categories are easy to implement with the current crawler.

roysti10 avatar roysti10 commented on August 23, 2024

I didn't understand what you mean by one file per folder. If you could elaborate on that, it would be helpful.
Once the zip folder is implemented, zip2, bowl, and wk are just a simple function away, so it's fine if you don't implement zip2, bowl, and wk.

scientes avatar scientes commented on August 23, 2024

Well, I mean that currently you generate one file per player per folder (a bit fewer, because not everyone is in bowl and wk, to my knowledge). With 5000 or so players in total, you generate approximately 15k-20k files containing maybe 20-30 MB altogether. That's a lot of files for so little data. It would probably be wise to store all data for zip in one file, all data for zip2 in one file, and so on. I mean, you are already using pandas, so why bother splitting up the data instead of just filtering in pandas?
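
A minimal sketch of what filtering a single combined file in pandas might look like. The column names (`player`, `matchid`, `date`), the helper `matches_for`, and every value in the sample are assumptions made for illustration; they are not the project's actual schema or data.

```python
import io

import pandas as pd

# Hypothetical contents of a combined zip.csv: one row per player and match.
ZIP_CSV = """player,matchid,date
player1,m001,2020-01-01
player1,m002,2020-01-03
player2,m001,2020-01-01
"""


def matches_for(frame, player):
    """Return the rows of the combined table that belong to one player."""
    return frame[frame["player"] == player].reset_index(drop=True)


zip_df = pd.read_csv(io.StringIO(ZIP_CSV))
player1_rows = matches_for(zip_df, "player1")
print(player1_rows)
```

With this layout, one file per category replaces thousands of per-player files, and the per-player view is recovered with a single boolean filter.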

roysti10 avatar roysti10 commented on August 23, 2024

Ohh, you mean like one file each, called zip.csv, zip2.csv, bowl.csv, and wk.csv?

roysti10 avatar roysti10 commented on August 23, 2024

If this is the case, then how will you adjust for each player?
Are you suggesting something like

player   matches
player1  matchid1
         matchid2
         matchid3
player2  matchid1

and so on?

This actually sounds doable.
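
The layout sketched above could be written and read back with the standard `csv` module, for example as below. This is only an illustration: it assumes the player name is repeated on every row (rather than left blank on continuation rows) so that each row can be filtered on its own, and the helpers `write_long` and `read_long` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def write_long(player_matches, handle):
    """Write one (player, matchid) row per match, repeating the player name."""
    writer = csv.writer(handle)
    writer.writerow(["player", "matchid"])
    for player, match_ids in player_matches.items():
        for match_id in match_ids:
            writer.writerow([player, match_id])


def read_long(handle):
    """Group the rows back into a dict mapping player to a list of match IDs."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["player"]].append(row["matchid"])
    return dict(grouped)


data = {
    "player1": ["matchid1", "matchid2", "matchid3"],
    "player2": ["matchid1"],
}
buffer = io.StringIO()  # stands in for a file such as zip.csv
write_long(data, buffer)
buffer.seek(0)
restored = read_long(buffer)
print(restored == data)
```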

scientes avatar scientes commented on August 23, 2024

Yes that was my idea.

roysti10 avatar roysti10 commented on August 23, 2024

Closed! Thanks to @scientes
