Comments (7)

guoguo12 commented on July 18, 2024

Thanks for your input! I'm really glad this project has been useful!

I wasn't able to duplicate the problem above on my computer.

untitled

As you might expect, this is very troubling. About a month ago, there was a pull request targeting a different issue that I also couldn't reproduce. It's like other people are getting different HTML pages from Billboard's servers than me, which shouldn't be happening.

Do me a favor, please—run this script and tell me what you get.

import json
import requests

url = 'http://www.billboard.com/charts/hot-100'
headers_current = {'User-Agent': 'billboard.py (https://github.com/guoguo12/billboard-charts)'}

req = requests.get(url, headers=headers_current)

# Print the response headers, then the request headers that were actually sent.
# print() with a single argument works on both Python 2 and Python 3.
print(json.dumps(dict(req.headers), sort_keys=True, indent=4, separators=(',', ': ')))
print(json.dumps(dict(req.request.headers), sort_keys=True, indent=4, separators=(',', ': ')))

This script sends an HTTP GET request to Billboard's servers and prints the response and request headers to stdout in JSON format. What I got was this.

Not sure if this will help, but it's worth a shot. Let me know if you have any other ideas as to why this might be happening.

from billboard-charts.

brycematsuda commented on July 18, 2024

Here's my output: http://pastebin.com/4Gy49tbi

The only notable differences I see are the server ("server": "ECS (cpm/F9B6)") and the cache hits ("x-cache-hits": "HIT (5)"); other than that, the headers are roughly the same. I'm not very familiar with how HTTP requests work at the moment, so I'm not certain what might be happening. I tried fooling with user agents on this site, but both my browser and the Windows FF/Chrome browsers came back with roughly the same info.

I also ran the script again this morning and am still getting all-null albums.
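
For anyone comparing two header dumps by hand as above, a small helper can list only the headers that differ. This is a minimal sketch, not part of billboard.py: the function name `diff_headers` is hypothetical, and apart from the two values quoted in this comment, the header values below are placeholders rather than real output.

```python
def diff_headers(mine, theirs):
    """Return {name: (mine_value, theirs_value)} for headers that differ.

    Names are lowercased before comparing, because HTTP header names
    are case-insensitive.
    """
    a = {k.lower(): v for k, v in mine.items()}
    b = {k.lower(): v for k, v in theirs.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# "ECS (cpm/F9B6)" and "HIT (5)" are quoted in this thread; every other
# value here is a made-up placeholder for illustration.
mine = {"Server": "ECS (cpm/F9B6)", "X-Cache-Hits": "HIT (5)", "Content-Type": "text/html"}
theirs = {"server": "example-server", "x-cache-hits": "example-value", "content-type": "text/html"}

for name, (left, right) in diff_headers(mine, theirs).items():
    print("{}: {!r} != {!r}".format(name, left, right))
```

Feeding it `dict(req.headers)` from two machines would narrow the comparison to the fields worth looking at.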

guoguo12 commented on July 18, 2024

Hmm. Well, I'm not sure where to go from here. I'm not familiar with the intricacies of HTTP either, but I'm guessing the content might vary based on the client's IP address.

I can think of two possible options. We can put something like this in:

if chartInfoSoup.contents[3].string:
    album = chartInfoSoup.contents[3].string.strip()
elif chartInfoSoup.contents[4].string:
    album = chartInfoSoup.contents[4].string.strip()
else:
    album = None
# This might not work for songs without album names on my end.

Alternatively, we can rewrite the code to ignore the line breaks, maybe using regex. Let me know what you think is best.
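
The second option could also be done without regex, by skipping whitespace-only nodes instead of relying on a fixed index. The following is only a sketch under assumptions: the helper `first_text` is hypothetical, and plain lists of strings stand in for the `.string` values of `chartInfoSoup.contents`; the two layouts are invented to mimic one page with an extra line break and one without.

```python
def first_text(strings, start=0):
    """Return the first non-blank string at or after index `start`, stripped.

    `strings` stands in for the `.string` values of a parsed element's
    children. None and whitespace-only values (such as stray line breaks)
    are skipped. Returns None if nothing usable is found.
    """
    for value in strings[start:]:
        if value is not None and value.strip():
            return value.strip()
    return None


# Two hypothetical layouts of the same row; layout_b has an extra line break.
layout_a = ["\n", "Song", "\n", "Album", "\n"]
layout_b = ["\n", "Song", "\n", "\n", "Album", "\n"]

print(first_text(layout_a, start=3))
print(first_text(layout_b, start=3))
```

Like the if/elif version, this would pick up whatever text follows when a song has no album name, so it shares the caveat noted in the comment above.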

brycematsuda commented on July 18, 2024

I was leaning towards the first option to keep things simple for now. Also, a lot of people seem to think that parsing HTML with regex screams bloody murder, so maybe we'll hold back on it for now, since the Billboard HTML code is pretty big.

It's been about 24 hours or so and I haven't run into any problems, so I'll put up a PR. If anything comes up we can reopen this.

guoguo12 commented on July 18, 2024

Merged. Thank you for your help!

I've given you full access to the repository. If there are any fixes or improvements you want to make in the future, feel free to do so.

brycematsuda commented on July 18, 2024

Oh wow, I wasn't expecting that, thanks again!

To be honest, I think you've gotten the main stuff nailed down at the moment. The only other feature I was thinking about implementing with the data we can get is determining whether an entry rose or fell in the ranks from the previous week, or whether it's a new entry or re-entry. That should be pretty easy to do, since we already have the necessary info to determine it.

guoguo12 commented on July 18, 2024

Actually, that's already sort of included. Each song has attributes lastPos and peakPos for last position and peak position on the chart. There's also a weeks attribute for number of weeks on chart.

image
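
As a rough illustration of how those attributes could answer the rose/fell question, here is a sketch that does not use the library itself. The `Entry` tuple is a hypothetical stand-in for a chart entry, and two things are assumptions not confirmed in this thread: that an entry exposes its current position as `rank`, and that `lastPos` is 0 when the song was not on the previous week's chart.

```python
from collections import namedtuple

# Hypothetical stand-in for a chart entry. Only lastPos, peakPos and weeks
# are named in this thread; `rank` is an assumed attribute name.
Entry = namedtuple("Entry", ["rank", "lastPos", "peakPos", "weeks"])


def movement(entry):
    """Classify an entry relative to the previous week.

    Assumes lastPos == 0 means the song was absent from last week's chart.
    """
    if entry.lastPos == 0:
        # Absent last week: first week on the chart is new, otherwise a re-entry.
        return "new" if entry.weeks <= 1 else "re-entry"
    if entry.rank < entry.lastPos:
        return "rose"  # a smaller number is a better position
    if entry.rank > entry.lastPos:
        return "fell"
    return "steady"


# Made-up entries for illustration.
print(movement(Entry(rank=3, lastPos=7, peakPos=3, weeks=5)))
print(movement(Entry(rank=40, lastPos=0, peakPos=40, weeks=1)))
```

If the library marks absent songs differently, only the first branch would need to change.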
