Comments (2)

ian-pvd commented on June 15, 2024

Depending on your needs, you could just pull it from the JSON part of the page, for example:

.albumRelease[0].musicReleaseFormat

I'm seeing this in the application/ld+json script tag in the page markup, but where do I find it in the scraper results? I'm not seeing it in the AlbumInfo response. If there's a way to avoid making separate scraper requests for the Album Info and then for the digital product price, that'd be really helpful.
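For reference, here is a minimal sketch of reading that field straight from page HTML that has already been fetched. It assumes the scraper response does not expose the field. The helper names extractJsonLd and releaseFormats are hypothetical, the sample page is made up, and the regex is only there to keep the sketch dependency-free; an HTML parser would be more robust.

```javascript
// Hypothetical helper: pull the first application/ld+json block out of raw page HTML.
// A regex keeps this sketch dependency-free; a real HTML parser is more robust.
function extractJsonLd(html) {
  const match = html.match(
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/i
  );
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (err) {
    return null; // malformed JSON is treated the same as a missing block
  }
}

// Hypothetical helper: list the musicReleaseFormat of each albumRelease entry.
function releaseFormats(jsonLd) {
  const releases =
    jsonLd && Array.isArray(jsonLd.albumRelease) ? jsonLd.albumRelease : [];
  return releases.map((release) => release.musicReleaseFormat);
}

// Made-up sample page for illustration, not real Bandcamp output.
const samplePage = `
<html><head>
<script type="application/ld+json">
{"@type": "MusicAlbum", "albumRelease": [{"musicReleaseFormat": "DigitalFormat"}]}
</script>
</head></html>`;

const formats = releaseFormats(extractJsonLd(samplePage));
console.log(formats[0]);
```

This only saves a request if the raw HTML is already available to the caller, which is an assumption about how the scraper is being used.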

Plus, it's trivial to use startsWith instead of a strict equality check to still get a positive match on "Digital AlbumDigital Album", but I figured this response from the scraper deserved a bug report at least.
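A minimal sketch of that startsWith check, assuming the product object carries a name field as in the scraper output. The function name isDigitalAlbumProduct and its default label are hypothetical.

```javascript
// Hypothetical check: treat a product as the digital album when its name
// begins with the expected label, so a doubled label still matches.
function isDigitalAlbumProduct(product, label = 'Digital Album') {
  return typeof product.name === 'string' && product.name.startsWith(label);
}

console.log(isDigitalAlbumProduct({ name: 'Digital Album' })); // true
console.log(isDigitalAlbumProduct({ name: 'Digital AlbumDigital Album' })); // true
console.log(isDigitalAlbumProduct({ name: 'Compact Disc (CD)' })); // false
```

A prefix match is looser than strict equality, so it would also accept any other product whose name happens to start with the same label.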

from bandcamp-scraper.

ian-pvd commented on June 15, 2024

Further debugging seems to show that the releases where this is occurring actually do have two .buyItemPackageTitle spans inside the release list item.

Markup for a result without the issue:

<li class="buyItem digital">
    <h3 class="hd">
        <button class='download-link buy-link' type="button">
            <span class="buyItemPackageTitle primaryText">Digital Album</span>
        </button>
        <div class="digitaldescription secondaryText">  Streaming + Download </div>
    </h3>
    ...
</li>

Markup returned for a result with the duplicate text issue:

<li class="buyItem digital">
    <h3 class="hd">
        <button class='download-link buy-link' type="button">
            <span class="buyItemPackageTitle primaryText">Digital Album</span>
        </button>
        <span class="buyItemPackageTitle primaryText you-own-this">Digital Album</span>
        <div class="digitaldescription secondaryText">  Streaming + Download </div>
    </h3>
    ...
</li>

This is from a dump of the html variable returned by the get function and passed into the parser function here: https://github.com/masterT/bandcamp-scraper/blob/master/lib/index.js#L58

First, I don't own this. Second, how would the scraper know that if the request is being made from node? Seems like a weird edge case, but I am seeing this behavior consistently on specific URLs.

Either way, I assume this is the cause of the duplicated text. I'm going to try to debug this further, but I wanted to post this as an update to my initial report, which said there wasn't duplicate text.
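To illustrate why two matching spans would produce the doubled string: cheerio's .text() returns the combined text of every element in the matched set, so a selector that hits both spans yields both labels run together. The sketch below reproduces that effect without dependencies. It uses a regex instead of cheerio, the helper name packageTitles is hypothetical, and it is an assumption that the scraper reads the text of all .buyItemPackageTitle matches rather than only the first.

```javascript
// Markup trimmed from the second example above.
const buyItemMarkup = `
<li class="buyItem digital">
    <h3 class="hd">
        <button class='download-link buy-link' type="button">
            <span class="buyItemPackageTitle primaryText">Digital Album</span>
        </button>
        <span class="buyItemPackageTitle primaryText you-own-this">Digital Album</span>
    </h3>
</li>`;

// Hypothetical helper: collect the text of every span whose class list
// includes buyItemPackageTitle. Regex is for illustration only.
function packageTitles(markup) {
  const pattern =
    /<span[^>]*class=["'][^"']*\bbuyItemPackageTitle\b[^"']*["'][^>]*>([^<]*)<\/span>/g;
  return Array.from(markup.matchAll(pattern), (m) => m[1].trim());
}

const titles = packageTitles(buyItemMarkup);
console.log(titles.join('')); // Digital AlbumDigital Album
console.log(titles[0]); // Digital Album
```

With cheerio itself, narrowing the selection with .first() before calling .text() would be one way to read only the first span.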

Also, I'm not sure what's happening with this line const $ = cheerio.load(html), but by the time I dump the data variable defined here, the duplicate text is present:

{
  products: [
    {
      imageUrls: [],
      name: 'Digital AlbumDigital Album',
      nameFallback: '',
      format: 'Digital AlbumDigital Album',
      formatFallback: '',
      priceInCents: 350,
      currency: 'EUR',
      offerMore: true,
      soldOut: false,
      nameYourPrice: false,
      description: 'Includes unlimited streaming via the free Bandcamp app, plus high-quality download in MP3, FLAC and more.'
    }
  ]
}
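Until the cause is fixed, a consumer-side workaround could collapse the doubled values after the fact. This is a rough sketch only: collapseDoubled and normalizeProduct are hypothetical names, and the approach would also shorten a legitimate name that happens to be the same text written twice with no separator.

```javascript
// Hypothetical helper: if a string is exactly two copies of the same text,
// return a single copy; otherwise return the string unchanged.
function collapseDoubled(text) {
  if (typeof text !== 'string' || text.length === 0 || text.length % 2 !== 0) {
    return text;
  }
  const half = text.slice(0, text.length / 2);
  return half + half === text ? half : text;
}

// Hypothetical helper: apply the fix to the two affected fields of a product.
function normalizeProduct(product) {
  return {
    ...product,
    name: collapseDoubled(product.name),
    format: collapseDoubled(product.format),
  };
}

// Fields copied from the dump above.
const fixedProduct = normalizeProduct({
  name: 'Digital AlbumDigital Album',
  format: 'Digital AlbumDigital Album',
  priceInCents: 350,
  currency: 'EUR',
});
console.log(fixedProduct.name); // Digital Album
```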
