jonchar / ma-scraper
Scraper for the Metal Archives.
Hi, I am just getting into web scraping and am experimenting with metal-archives.com. I just found your ma-scraper here and it looks super helpful for learning!
However, the code no longer seems able to get data from MA (it was working just a few days ago). Maybe this is because MA moved to a new server? From their homepage:
Maintenance / 2018-10-19 14:50
The site will be migrating to a new server tonight at midnight EDT / 4 am UTC. There will be some downtime. We'll try to make the process as quick and smooth as possible.
Specifically, the problem seems to be with
r = requests.get(BASEURL + RELURL + letter, params=payload)
I am unable to get any data with requests; it just returns
<Response [403]>
I have absolutely no idea how to make the scraper work at this point, so I was wondering whether someone more knowledgeable, like you, has an idea of what's going on.
Thank you very much for your work!
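One common (but here unconfirmed) cause of a blanket 403 is the server rejecting the default client User-Agent. A minimal sketch of sending a browser-like User-Agent is below, written with the standard library's urllib so it runs without extra dependencies. The header value, `BROWSER_HEADERS`, `build_letter_request`, and `fetch_text` are hypothetical illustrations; `base_url` and `rel_url` stand in for the script's `BASEURL` and `RELURL`, whose real values are not shown in this thread.

```python
import urllib.parse
import urllib.request

# Assumption: the 403 comes from User-Agent filtering. This is a common cause
# of 403s when scraping, not a confirmed diagnosis for this site.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) "
                  "Gecko/20100101 Firefox/128.0",
}


def build_letter_request(base_url, rel_url, letter, payload):
    """Build (without sending) a GET request for one letter page.

    base_url and rel_url are placeholders for the script's BASEURL and RELURL.
    """
    url = base_url + rel_url + letter + "?" + urllib.parse.urlencode(payload)
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_text(request, timeout=30):
    """Send the request and return the body; urlopen raises HTTPError on a 403."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With requests, the equivalent is to pass the same dictionary through the `headers` argument: `requests.get(BASEURL + RELURL + letter, params=payload, headers=BROWSER_HEADERS)`. If the 403 persists with browser-like headers, the block is probably based on something other than the User-Agent.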
Hello,
First of all, thank you so much for this. I was trying to make a MA scraper myself, particularly for lyrics and reviews, but couldn't get it to work. Maybe I should pick a less tricky (and less interesting) site to try web-scraping the first time!
Anyway, after running MA_review_scraper.py, everything was copied except for the ReviewContent. When I ran a small subset of the code on a sample review, it printed the review successfully:
import requests
from bs4 import BeautifulSoup

url = "https://www.metal-archives.com/reviews/Death/Scream_Bloody_Gore/598/CactusSlaughter/400395"
r = requests.get(url)
# Create a BeautifulSoup object from the HTML
review_soup = BeautifulSoup(r.text, "html.parser")
# The first <h3> holds the review title; drop its last 6 characters
review_title = review_soup.find_all("h3")[0].text.strip()[:-6]
# The review body sits in <div class="reviewContent">
review = review_soup.find_all("div", {"class": "reviewContent"})[0].text
print(review)
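To check whether a page downloaded by the full script actually contains the reviewContent div, independent of which BeautifulSoup parser is used, a stdlib-only sketch follows. It assumes the review body is the text inside `<div class="reviewContent">`, as in the snippet above; `ReviewContentParser` and `extract_review` are hypothetical names, not part of ma-scraper.

```python
from html.parser import HTMLParser


class ReviewContentParser(HTMLParser):
    """Collect the text inside any <div class="reviewContent">, including nested tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while inside the target div (counts nested divs)
        self.found = False    # whether a reviewContent div was seen at all
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1   # a div nested inside the review body
        elif "reviewContent" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.found = True

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_review(html):
    """Return the review text, or None if the page has no reviewContent div."""
    parser = ReviewContentParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() if parser.found else None
```

If `extract_review` returns `None` for a page saved by the full script, the page itself probably lacked the div (for example, an error or rate-limit page), which would point away from a parsing problem.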
Hello! I tried to use MA_band_scraper.py to get the full list of bands, but the code was crashing with the error json.decoder.JSONDecodeError: Expecting value: line 4 column 11 (char 66). After some digging, I found that the problem is in the request's response: the value after "sEcho": is empty. Here is how it looks:
{
"iTotalRecords": 11835,
"iTotalDisplayRecords": 11835,
"sEcho": ,
"aaData": [
[
"<a href='https://www.metal-archives.com/bands/A_--_Solution/3540442600'>A // Solution</a>",
"United States",
... a lot of strings
To fix it, I manually inserted a value into each such response, since setting it in the payload (lines 36-38) does not help. Worse, the payload does not seem to work at all: every chunk of band data contains the same first 500 bands. The results change only when the letter changes; for a given letter, every chunk is the same. Here is the code with my small changes: https://gist.github.com/ramskyi/8d831e561d835ef0659bcfb8788ca4e0
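The manual workaround above can be automated, and the "same 500 bands in every chunk" symptom can be detected. The response fields (sEcho, iTotalRecords, aaData) match the legacy DataTables server-side format, in which the server echoes back the sEcho value the client sent; an empty value therefore likely means no sEcho reached the server, though that is an inference. This is a minimal sketch; `parse_band_page`, `chunk_is_repeat`, and the default echo value of 0 are hypothetical choices.

```python
import json
import re

# Matches an empty sEcho value such as '"sEcho": ,', which is invalid JSON.
EMPTY_SECHO = re.compile(r'("sEcho"\s*:\s*)(?=[,}])')


def parse_band_page(text, echo=0):
    """Parse one JSON chunk, first filling an empty sEcho with `echo`."""
    fixed = EMPTY_SECHO.sub(lambda m: m.group(1) + str(echo), text, count=1)
    return json.loads(fixed)


def chunk_is_repeat(previous_page, page):
    """True if a non-empty chunk has exactly the same rows as the previous one.

    Identical chunks for consecutive offsets suggest the paging parameters
    in the payload are being ignored by the server.
    """
    rows = page.get("aaData", [])
    return bool(rows) and rows == previous_page.get("aaData", [])
```

A response that already carries a numeric sEcho is left untouched, so the patch is safe to apply to every chunk.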
Hi Jon, thank you for your brilliant data analyzer for MA :-) However, I've hit a small problem and I'm not sure how to get around it:
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported.
The error occurs at:
code
I'm just running the notebook as it is - no changes, other than an updated .csv file from MA. Everything works like a charm, until the routine gets to clustering.
I've tried for days to figure out what this means and tried numerous things, but to no avail. Please bear with me - I haven't programmed for 18+ years, so I'm a bit rusty.
Hope you can help, since your little program seems really awesome!
Best regards
Kim
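This KeyError comes from a pandas behaviour change: since pandas 1.0, indexing with `.loc[list_of_labels]` raises KeyError if any label is missing, where older versions returned NaN rows with a warning. The exact failing line is not shown here, so the sketch below is generic. `df.reindex(labels)` reproduces the old NaN-filled result; the hypothetical helper `select_existing` instead keeps only the labels that exist and reports the missing ones, which helps find which rows or columns disappeared from the updated .csv.

```python
import pandas as pd


def select_existing(df, labels, axis=0):
    """Select only labels present on the given axis; also return the missing ones.

    Avoids the KeyError raised by df.loc[labels] when some labels are absent.
    """
    index = df.index if axis == 0 else df.columns
    present = [label for label in labels if label in index]
    missing = [label for label in labels if label not in index]
    selected = df.loc[present] if axis == 0 else df.loc[:, present]
    return selected, missing
```

Printing `missing` before the clustering step shows which expected labels are absent from the new data.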