Comments (2)
Depending on your needs, you could just pull it from the JSON part of the page, for example: .albumRelease[0].musicReleaseFormat
I'm seeing this in the application/ld+json tag in the page markup, but where do I find it in the scraper results? I'm not seeing it in the AlbumInfo response. If there's a way to avoid making multiple scraper requests for the Album Info and then also the digital product price, that'd be really helpful.
Plus, it's trivial to use startsWith instead of a strict equality check to still get a positive match on "Digital AlbumDigital Album", but I figured this response from the scraper deserved a bug report at least.
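For anyone who wants to try the JSON route without waiting on the scraper, here is a minimal sketch of reading .albumRelease[0].musicReleaseFormat straight from the page HTML. The function names (extractJsonLd, firstReleaseFormat) and the sample HTML are made up for illustration and are not part of bandcamp-scraper; the value "DigitalFormat" is only a placeholder, and a real page may carry more than one release entry.

```javascript
// Sketch: pull the application/ld+json block out of a page's HTML string.
// Uses a regex rather than a full HTML parser, which is fine for a quick check
// but is not robust against unusual markup.
function extractJsonLd(html) {
  const match = html.match(
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/i
  );
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (err) {
    return null; // malformed JSON: treat it as missing
  }
}

// Read the format of the first release, or null when it is absent.
function firstReleaseFormat(jsonLd) {
  const releases = (jsonLd && jsonLd.albumRelease) || [];
  return releases.length > 0 ? releases[0].musicReleaseFormat || null : null;
}

// Hypothetical sample page, heavily trimmed.
const sampleHtml = `
<html><head>
<script type="application/ld+json">
{"@type": "MusicAlbum", "albumRelease": [{"musicReleaseFormat": "DigitalFormat"}]}
</script>
</head></html>`;

const format = firstReleaseFormat(extractJsonLd(sampleHtml));
console.log(format);
```

This only helps if you already have the raw HTML in hand, so it does not by itself remove the need for a second request.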
from bandcamp-scraper.
Further debugging seems to show that the releases where this is occurring actually do have two .buyItemPackageTitle spans inside the release list item.
Markup for a result without the issue:
<li class="buyItem digital">
<h3 class="hd">
<button class='download-link buy-link' type="button">
<span class="buyItemPackageTitle primaryText">Digital Album</span>
</button>
<div class="digitaldescription secondaryText"> Streaming + Download </div>
</h3>
...
</li>
Markup returned for a result with the duplicate text issue:
<li class="buyItem digital">
<h3 class="hd">
<button class='download-link buy-link' type="button">
<span class="buyItemPackageTitle primaryText">Digital Album</span>
</button>
<span class="buyItemPackageTitle primaryText you-own-this">Digital Album</span>
<div class="digitaldescription secondaryText"> Streaming + Download </div>
</h3>
...
</li>
This is from a dump of the html variable returned by the get function and passed into the parser function here: https://github.com/masterT/bandcamp-scraper/blob/master/lib/index.js#L58
First, I don't own this. Second, how would the scraper know that if the request is being made from node? Seems like a weird edge case, but I am seeing this behavior consistently on specific URLs.
Either way, I assume this is the cause of the duplicated text. I'm going to try to debug this further but I just wanted to post this as an update to my initial report that there wasn't duplicate text.
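To illustrate why two spans could end up as one doubled string: cheerio's .text() returns the combined text of every element in the matched set, so a selector that matches both .buyItemPackageTitle spans would produce "Digital AlbumDigital Album". Whether the parser actually selects the text this way is an assumption on my part. The sketch below avoids cheerio and uses a small regex on the markup above, just to show the difference between joining every match and taking the first one; packageTitles is a hypothetical helper.

```javascript
// Collect the text of every span whose class list contains buyItemPackageTitle.
// A regex is enough for this small, known snippet; it is not a general HTML parser.
function packageTitles(html) {
  const pattern = /<span class="[^"]*\bbuyItemPackageTitle\b[^"]*">([^<]*)<\/span>/g;
  return Array.from(html.matchAll(pattern), (m) => m[1].trim());
}

// The markup from the result with the duplicate text issue.
const duplicatedMarkup = `
<li class="buyItem digital">
<h3 class="hd">
<button class='download-link buy-link' type="button">
<span class="buyItemPackageTitle primaryText">Digital Album</span>
</button>
<span class="buyItemPackageTitle primaryText you-own-this">Digital Album</span>
<div class="digitaldescription secondaryText"> Streaming + Download </div>
</h3>
</li>`;

const titles = packageTitles(duplicatedMarkup);
const joined = titles.join(''); // what concatenating every match would give
const first = titles[0];        // what restricting to the first match would give
console.log(titles.length, joined, first);
```

If that assumption holds, narrowing the selector to the span inside the button, or to the first match only, would be one way to avoid the duplicate.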
Also, I'm not sure what's happening with this line, const $ = cheerio.load(html), but by the time I dump the data variable defined here, the duplicate text is present:
{
products: [
{
imageUrls: [],
name: 'Digital AlbumDigital Album',
nameFallback: '',
format: 'Digital AlbumDigital Album',
formatFallback: '',
priceInCents: 350,
currency: 'EUR',
offerMore: true,
soldOut: false,
nameYourPrice: false,
description: 'Includes unlimited streaming via the free Bandcamp app, plus high-quality download in MP3, FLAC and more.'
}
]
}
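Until this is fixed in the scraper, a workaround on the consumer side could look like the sketch below. collapseDoubled and isDigitalAlbum are hypothetical helpers, not part of bandcamp-scraper, and the product object is trimmed to the two affected fields from the dump above. Note that collapseDoubled would also shorten a legitimate name that happens to be two identical halves, so it is a stopgap at best.

```javascript
// If a string is exactly two back-to-back copies of the same text, return one copy.
// Otherwise return the string unchanged.
function collapseDoubled(text) {
  const half = text.length / 2;
  if (Number.isInteger(half) && half > 0 && text.slice(0, half) === text.slice(half)) {
    return text.slice(0, half);
  }
  return text;
}

// Looser match using startsWith, so the doubled name still counts as a digital album.
function isDigitalAlbum(product) {
  return product.name.startsWith('Digital Album');
}

// Trimmed copy of the affected product from the dump.
const product = {
  name: 'Digital AlbumDigital Album',
  format: 'Digital AlbumDigital Album'
};

const cleaned = {
  ...product,
  name: collapseDoubled(product.name),
  format: collapseDoubled(product.format)
};

console.log(cleaned.name, cleaned.format, isDigitalAlbum(product));
```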