bandcamp-scraper's Introduction

bandcamp-scraper

A scraper for https://bandcamp.com

The scraper allows you to:

  • search artist, album, track, fan, label
  • get album urls from an artist url
  • get album info from an album url
  • get album products from an album url
  • get artist info from an artist url

Why?

Because Bandcamp has shut down its public API and doesn't plan to reopen it.

https://bandcamp.com/developer

Installation

npm i --save bandcamp-scraper

Usage

search(params, callback)

Search any resources that match the given params.query for the current params.page.

  • params Object - query String - page Integer (default 1)
  • callback Function(error, searchResults)

Search Results

An array of resources that have different properties depending on their type property: artist, album, track, fan, or label.

Every resource matches the search-result JSON schema.

Example

const bandcamp = require('bandcamp-scraper')

const params = {
  query: 'Coeur de pirate',
  page: 1
}

bandcamp.search(params, function (error, searchResults) {
  if (error) {
    console.log(error)
  } else {
    console.log(searchResults)
  }
})

View example with output.

getAlbumsWithTag(params, callback)

Search for albums with the tag params.tag for the current params.page.

  • params Object - tag String - page Integer (default 1)
  • callback Function(error, tagResults)

Tag Results

An array of album information. Matches the tag-result JSON schema.

Example

const bandcamp = require('bandcamp-scraper')

const params = {
  tag: 'nuwrld',
  page: 1
}

bandcamp.getAlbumsWithTag(params, function (error, tagResults) {
  if (error) {
    console.log(error)
  } else {
    console.log(tagResults)
  }
})

View example with output.

getAlbumUrls(artistUrl, callback)

Retrieves the album URLs from an artist URL. Note: for Bandcamp labels, you may want to use the getArtistUrls function first to retrieve the list of signed artists.

  • artistUrl String
  • callback Function(error, albumUrls)

Example

const bandcamp = require('bandcamp-scraper')

const artistUrl = 'http://musique.coeurdepirate.com/'
bandcamp.getAlbumUrls(artistUrl, function (error, albumUrls) {
  if (error) {
    console.log(error)
  } else {
    console.log(albumUrls)
  }
})

View example with output.

getAlbumProducts(albumUrl, callback)

Retrieves all the album's products from its URL.

  • albumUrl String
  • callback Function(error, albumProducts)

Album Products

An array of album products that matches the album-product JSON schema.

Example

const bandcamp = require('bandcamp-scraper')

const albumUrl = 'http://musique.coeurdepirate.com/album/blonde'
bandcamp.getAlbumProducts(albumUrl, function (error, albumProducts) {
  if (error) {
    console.log(error)
  } else {
    console.log(albumProducts)
  }
})

View example with output.

getAlbumInfo(albumUrl, callback)

Retrieves the album's info from its URL.

  • albumUrl String
  • callback Function(error, albumInfo)

Album Info

An Object that represents the album's info. It matches the album-info JSON schema.

Example

const bandcamp = require('bandcamp-scraper')

const albumUrl = 'http://musique.coeurdepirate.com/album/blonde'
bandcamp.getAlbumInfo(albumUrl, function (error, albumInfo) {
  if (error) {
    console.log(error)
  } else {
    console.log(albumInfo)
  }
})

View example with output.

getArtistUrls(labelUrl, callback)

Retrieves an array of artist URLs from a label's URL for further scraping.

  • labelUrl String
  • callback Function(error, artistUrls)

Example

const bandcamp = require('bandcamp-scraper')

const labelUrl = 'https://randsrecords.bandcamp.com'
bandcamp.getArtistUrls(labelUrl, function (error, artistsUrls) {
  if (error) {
    console.log(error)
  } else {
    console.log(artistsUrls)
  }
})

View example with output.
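
For labels, getArtistUrls and getAlbumUrls can be chained to list the albums of every signed artist. The helper below is only a sketch: getLabelAlbumUrls is a hypothetical name, not part of bandcamp-scraper, and it assumes the scraper object is passed in and exposes the two callbacks documented above.

```javascript
// Hypothetical helper (not part of bandcamp-scraper): collects the album URLs
// of every artist signed to a label. `scraper` is expected to expose
// getArtistUrls and getAlbumUrls with the callback signatures documented above.
function getLabelAlbumUrls (scraper, labelUrl, callback) {
  scraper.getArtistUrls(labelUrl, function (error, artistUrls) {
    if (error) return callback(error)
    if (artistUrls.length === 0) return callback(null, [])

    const albumUrls = []
    let pending = artistUrls.length // artists still waiting for a response
    let failed = false // ensures the callback is called only once on error

    artistUrls.forEach(function (artistUrl) {
      scraper.getAlbumUrls(artistUrl, function (error, urls) {
        if (failed) return
        if (error) {
          failed = true
          return callback(error)
        }
        albumUrls.push(...urls)
        pending -= 1
        if (pending === 0) callback(null, albumUrls)
      })
    })
  })
}

// Usage (requires network access):
// const bandcamp = require('bandcamp-scraper')
// getLabelAlbumUrls(bandcamp, 'https://randsrecords.bandcamp.com', console.log)
```

The requests for all artists are started at once, so the order of the collected URLs depends on which responses arrive first.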

getArtistInfo(artistUrl, callback)

Retrieves the artist's info from its URL.

  • artistUrl String
  • callback Function(error, artistInfo)

Artist Info

An Object that represents the artist's info. It matches the artist-info JSON schema.

Example

const bandcamp = require('bandcamp-scraper')

const artistUrl = 'http://musique.coeurdepirate.com'
bandcamp.getArtistInfo(artistUrl, function (error, artistInfo) {
  if (error) {
    console.log(error)
  } else {
    console.log(artistInfo)
  }
})

View example with output.

getTrackInfo(trackUrl, callback)

Retrieves the track info from its URL.

  • trackUrl String
  • callback Function(error, trackInfo)

Track Info

An Object that represents the track's info. It matches the track-info JSON schema.

Example

const bandcamp = require('bandcamp-scraper')

const trackUrl = 'https://dafnez.bandcamp.com/track/serenade'
bandcamp.getTrackInfo(trackUrl, function (error, trackInfo) {
  if (error) {
    console.log(error)
  } else {
    console.log(trackInfo)
  }
})

Test

Feature tests run daily via a scheduled GitHub Actions workflow, so we know when the scraper breaks.

Run the test:

npm test

Contributing

Contributions are welcome! Please open an issue first.

License

MIT.

bandcamp-scraper's People

Contributors

arlenpeiffer, bonghead420, idbentley, mastert, mattbierner, maxp-hover, nukeop, pierluigi, seizeweb, soundofjw

bandcamp-scraper's Issues

Wish there was a way to get a more comprehensive list of labels

I'm using this tool to try and build an index of labels on bandcamp, sorted by genre.

I have a script that takes a query ("metal") and paginates through the results until they're empty, selecting only those that are labels. The problem is that it only finds 8 or so results - all in the first 2 pages, and there are only 6 total pages of results.

I don't think this is a flaw in your tool so much as a limitation of Bandcamp, which doesn't offer this functionality. But I cannot imagine any reason they would not want to. It seems so obvious that there should be a categorized list of labels, but there isn't.

I would be open to contributing some code if you can think of a way to achieve this. Or if you have any hunches that would be appreciated too.
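
A rough sketch of the pagination loop described above, for reference. searchLabels and maxPages are hypothetical names introduced here for illustration, not scraper options, and the scraper is passed in as an argument. It only filters what search already returns, so it does not work around the small number of labels Bandcamp exposes.

```javascript
// Hypothetical helper: walks the search pages until one comes back empty,
// keeping only results whose `type` is 'label'.
// `maxPages` is an assumed safety limit, not a bandcamp-scraper option.
function searchLabels (scraper, query, callback, maxPages = 50) {
  const labels = []

  function fetchPage (page) {
    if (page > maxPages) return callback(null, labels)
    scraper.search({ query: query, page: page }, function (error, results) {
      if (error) return callback(error)
      if (results.length === 0) return callback(null, labels) // no more pages
      results.forEach(function (resource) {
        if (resource.type === 'label') labels.push(resource)
      })
      fetchPage(page + 1)
    })
  }

  fetchPage(1)
}

// Usage (requires network access):
// const bandcamp = require('bandcamp-scraper')
// searchLabels(bandcamp, 'metal', console.log)
```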

Ability to get an album's tags

This is a nice project, but I am surprised it doesn't include the tags from getAlbumInfo. It will probably not be that difficult to get it from the HTML page.

I will hopefully be submitting a PR for this in the near future.

Relative Module not Found - Cheerio.js

I've got a gridsome (vue) project where I'm trying to use the bandcamp scraper and I'm getting the following error:

ERROR  Failed to compile with 2 errors                                      16:27:13

This relative module was not found:

* ./package in ./node_modules/cheerio-req/node_modules/cheerio/index.js, ./node_modules/scrape-it/node_modules/cheerio/index.js

I've tried a few different fixes but I'm still learning and I feel like I'm missing something obvious

parseAlbumRawInfo return null

It looks like Bandcamp updated the definition of the variable TralbumData in the <script> tag of the page. That makes the function parseAlbumRawInfo return null, which fails the validation. Does anyone have an idea how to fix this? @soundofjw

getAlbumInfo bug with "preview" albums

Hi!

I think I might have found a bug when trying to retrieve info for a release that is in "preview" on Bandcamp (i.e. when the whole tracklist is available, but only a couple of tracks are actually playable).

I tried with 3 releases, and I get the same error for all 3 of them:
Validation error on album info: data.tracks[0] should have required property 'name'

Here are the 3 example releases if you want to check out for yourself (bear in mind, that based on when you check out these links, these releases might be out and not be in "preview" anymore) :
https://lobstertheremin.com/album/girl-like-me-ep
https://venusexmachina.com/album/lux
https://biosphere.bandcamp.com/album/angels-flight

Whereas it works perfectly for all "non-preview" releases.

PS: if indeed this is something you'd like fixed, I'm happy to have a go at contributing.

TypeError('Parameter "url" must be a string, not ' + typeof url);

bandcamp.getAlbumInfo('https://fasterthanmusic.bandcamp.com/album/pr4kaneokastrna-ep', function (error, albumUrls) {
  if (error) {
    console.log(error);
  } else {
    console.log(albumUrls.imageUrl);
  }
});

TypeError: Parameter "url" must be a string, not undefined
at Url.parse (url.js:103:11)
at urlParse (url.js:97:13)
at Url.resolve (url.js:654:29)
at Object.urlResolve [as resolve] (url.js:650:40)
at Object.convert (C:\Users[User]\node_modules\bandcamp-scraper\lib\htmlParser.js:235:34)
at C:\Users\Stefa\node_modules\scrape-it\lib\index.js:180:38
at iterateObject (C:\Users[User]\node_modules\iterate-object\lib\index.js:25:17)
at handleDataObj (C:\Users[User]\node_modules\scrape-it\lib\index.js:130:9)
at C:\Users[User]\node_modules\scrape-it\lib\index.js:155:32
at iterateObject (C:\Users[User]\node_modules\iterate-object\lib\index.js:25:17)

Every other URL works fine, but this one produces the error above. I don't know whether it's my fault or the scraper isn't working as it should.

Issue with Webpack

First off, thank you for creating this repo!! I am excited to use it. I brought it into a project and I am seeing the following errors:

./node_modules/bandcamp-scraper/node_modules/ajv/lib/async.js
Critical dependency: the request of a dependency is an expression

./node_modules/bandcamp-scraper/node_modules/ajv/lib/async.js
Critical dependency: the request of a dependency is an expression

./node_modules/bandcamp-scraper/node_modules/ajv/lib/compile/index.js
Critical dependency: the request of a dependency is an expression

Can you recreate this error in a webpack project? It looks like it's related to ajv:
https://stackoverflow.com/questions/42908116/webpack-critical-dependency-the-request-of-a-dependency-is-an-expression

FYI, I have already reset my npm cache

update .gitignore

Should add package-lock.json to .gitignore...

I would be happy to make this change.

Request: Add Release Date to getAlbumsWithTag

First of all, thank you very much for this project. I review music for a certain unnamed publication, and this repository has saved me countless hours.

It would be incredibly helpful if the getAlbumsWithTag function also returned the release date, or even just the release year. This would be very useful for tracking genre trends, new genres, and more.

I am just beginning to learn Python - shit I barely got this code to work myself - so I have no idea of the complexity of my request. I also know whoever reads this is likely incredibly busy, but I figured I'd toss the idea out anyway.

I do also apologize if this function is already possible!

Cheers,

OZ
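
One possible workaround, sketched under assumptions: the getAlbumInfo output shown in another issue on this page includes raw.album_release_date and raw.current.release_date, so the year could be read from the album info of each tag result. getReleaseYear is a hypothetical helper, and these raw fields come from Bandcamp's page data, so they may change or be missing.

```javascript
// Hypothetical helper: extracts the release year from an album info object
// as returned by getAlbumInfo. It relies on the `raw.album_release_date` and
// `raw.current.release_date` fields seen in sample output, which are not
// guaranteed to be present.
function getReleaseYear (albumInfo) {
  const raw = albumInfo && albumInfo.raw
  const dateString = raw &&
    (raw.album_release_date || (raw.current && raw.current.release_date))
  if (!dateString) return null

  const date = new Date(dateString)
  return Number.isNaN(date.getTime()) ? null : date.getUTCFullYear()
}

// Usage (requires network access):
// const bandcamp = require('bandcamp-scraper')
// bandcamp.getAlbumInfo(albumUrl, function (error, albumInfo) {
//   if (!error) console.log(getReleaseYear(albumInfo))
// })
```

This costs one extra request per album, since the date is read from the album page rather than from the tag results.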

getArtistUrls() returns empty array for some labels

Example:

getArtistUrls('http://rufdug.bandcamp.com/', (err, result) => {
   assert(result.length === 0) // !!!
})

This label has 16 releases currently. Many other labels I've tried produce the same issue.

I'm using 1.4.1.

Thanks.

Search only for a particular type

Currently the search function returns all types of information. Is there a clean way to only search for one particular type—for example, albums?
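
Since every search result carries a type property (artist, album, track, fan, or label), one option is to filter the results after the search returns. A minimal sketch, where filterByType is a hypothetical helper rather than part of the library:

```javascript
// Hypothetical helper: keeps only the search results of the given type.
// Valid types per the README: 'artist', 'album', 'track', 'fan', 'label'.
function filterByType (searchResults, type) {
  return searchResults.filter(function (resource) {
    return resource.type === type
  })
}

// Usage (requires network access):
// const bandcamp = require('bandcamp-scraper')
// bandcamp.search({ query: 'Coeur de pirate', page: 1 }, function (error, searchResults) {
//   if (!error) console.log(filterByType(searchResults, 'album'))
// })
```

This filters on the client side only, so a page can come back with few or no results of the requested type.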

Support for labels?

Is there a way to also add support for returning the full list of albums for a label?

It seems as though bandcamp.getAlbumUrls(artistUrl, _) only returns the albums published as the artist name matching the label's name. Alternatively, a method that returns the list of artists for a label, as an entry point for further scraping.

Error Cannot read property 'title' of null

.../bandcamp-scraper/lib/htmlParser.js:351
      object.name = albumInfo.title;
                             ^

TypeError: Cannot read property 'title' of null
    at /.../bandcamp-scraper/lib/htmlParser.js:351:30
    at Array.reduce (native)
    at Object.exports.parseAlbumProducts (.../bandcamp-scraper/lib/htmlParser.js:341:24)
    at .../bandcamp-scraper/lib/index.js:63:33
    at Function.<anonymous> (.../tinyreq/lib/index.js:58:9)
    at res (.../assured/lib/index.js:27:12)
    at IncomingMessage.<anonymous> (.../tinyreq/lib/index.js:101:13)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:188:7)
    at endReadableNT (_stream_readable.js:975:12)

getAlbumInfo crashes server if invalid url is provided

getAlbumInfo throws TypeError: Cannot read property 'replace' of undefined if the provided URL is not a proper Bandcamp route. It seems to be caused by line 279 in htmlParser.js:

raw = raw.replace('" + "', '')

I think returning an empty string as a fallback may fix it. Happy to submit a PR if that's cool!
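
Until the parser handles this case, a caller could check the URL shape before calling getAlbumInfo. The sketch below is not a fix for the library: isAlbumUrl is a hypothetical helper, and the assumption that album pages live under an /album/<slug> path is taken from the example URLs on this page. A URL that passes the check may still not be a Bandcamp page.

```javascript
// Hypothetical pre-check: returns true only for absolute http(s) URLs whose
// path looks like /album/<slug>. This is a conservative guess at the shape
// of album pages, not a guarantee that the page exists.
function isAlbumUrl (value) {
  try {
    const url = new URL(value)
    return (url.protocol === 'http:' || url.protocol === 'https:') &&
      /^\/album\/[^/]+\/?$/.test(url.pathname)
  } catch (error) {
    return false // not a parseable absolute URL
  }
}

// Usage (requires network access):
// const bandcamp = require('bandcamp-scraper')
// if (isAlbumUrl(albumUrl)) {
//   bandcamp.getAlbumInfo(albumUrl, function (error, albumInfo) { /* ... */ })
// }
```

A try/catch around the getAlbumInfo call itself would not help here, because the error is thrown later, inside the request callback.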

Return license information of albums and tracks

I do not see any license information returned by getAlbumInfo and it would be nice to provide that information.

For example, running:

var bandcamp = require("bandcamp-scraper");
var albumUrl='https://bit-rot.bandcamp.com/album/twisted-pair';
console.log(">>>", albumUrl);
bandcamp.getAlbumInfo(albumUrl, function(error, albumInfo) {
  if (error) { console.log(error); }
  else { console.log(albumInfo); }
});

returns:

>>> https://bit-rot.bandcamp.com/album/twisted-pair
{ artist: 'bit rot',
  title: 'Twisted Pair',
  imageUrl: 'https://f4.bcbits.com/img/a1538849378_2.jpg',
  tracks:
   [ { name: 'Uplink',
       url: 'https://bit-rot.bandcamp.com/track/uplink',
       duration: '06:19' },
     { name: 'Driver',
       url: 'https://bit-rot.bandcamp.com/track/driver',
       duration: '04:34' },
     { name: 'Psychadelic Death Trip',
       url: 'https://bit-rot.bandcamp.com/track/psychadelic-death-trip',
       duration: '06:18' },
     { name: 'POST',
       url: 'https://bit-rot.bandcamp.com/track/post',
       duration: '02:22' } ],
  raw:
   { current:
      { purchase_url: null,
        release_date: '24 Jan 2018 00:00:00 GMT',
        new_desc_format: 1,
        selling_band_id: 1888597831,
        set_price: 7,
        killed: null,
        purchase_title: null,
        minimum_price_nonzero: 7,
        title: 'Twisted Pair',
        new_date: '24 Jan 2018 02:32:19 GMT',
        featured_track_id: 286527331,
        minimum_price: 0,
        is_set_price: null,
        upc: null,
        credits: 'https://celebornidril.bandcamp.com/',
        private: null,
        art_id: 1538849378,
        require_email: null,
        id: 1214178877,
        band_id: 1888597831,
        about: 'Collaborations between bit rot & Celebornidril',
        require_email_0: null,
        download_pref: 2,
        publish_date: '24 Jan 2018 03:19:17 GMT',
        audit: 0,
        type: 'album',
        download_desc_id: null,
        auto_repriced: null,
        artist: null,
        mod_date: '17 Sep 2018 17:50:13 GMT' },
     is_preorder: null,
     album_is_preorder: null,
     album_release_date: '24 Jan 2018 00:00:00 GMT',
     preorder_count: null,
     hasAudio: true,
     art_id: 1538849378,
     trackinfo: [ [Object], [Object], [Object], [Object] ],
     playing_from: 'album page',
     featured_track_id: 286527331,
     initial_track_num: null,
     packages: null,
     url: 'http://bit-rot.bandcamp.com/album/twisted-pair',
     defaultPrice: 7,
     freeDownloadPage:
      'https://bandcamp.com/download?id=1214178877&ts=1550427274.1409455241&tsig=2e8c8dec6b5ffd439741a5698ac690d4&type=album',
     FREE: 1,
     PAID: 2,
     artist: 'bit rot',
     item_type: 'album',
     id: 1214178877,
     last_subscription_item: null,
     has_discounts: null,
     is_bonus: null,
     play_cap_data: null,
     client_id_sig: null,
     is_purchased: null,
     items_purchased: null,
     is_private_stream: null,
     is_band_member: null,
     licensed_version_ids: null,
     package_associated_license_id: null,
     tralbum_collect_info: { show_collect: true, show_wishlist_tooltip: false } },
  url: 'https://bit-rot.bandcamp.com/album/twisted-pair' }

The album is put under a CC-BY-SA license but I don't see that reflected in the returned data. I do see a licensed_version_ids and package_associated_license_id but I'm not sure if that's relevant to the license the album is put under and they're both null in this case.

Request: Add getLabelInfo()

I would love to be able to provide a labelUrl to a function and get info about the label, like its name, bio, links, etc. Any plans to add this @masterT?

Thank you!

Typos in README.md

I noticed there are a few typos in the README file. I'd be happy to correct them.

Some albumUrl's invalid?

Hello and thanks for this great package.

I don't see what's different about certain URLs or label/artist profiles that would affect this, but a call like:

bandcamp.getAlbumInfo('https://yantmusicuk.bandcamp.com/?label=3961057738&tab=artists/album/contravention-ep-sk11x006', ...)

loads data just fine, but another URL (also returned from getArtistInfo) does not return data:

bandcamp.getAlbumInfo('https://borderonerecords.bandcamp.com/?label=3961057738&tab=artists/album/zener-diode-volt001a', ...)

You'll notice that if you point your browser to the first URL, the album page loads, whereas the second URL redirects to the artist's album grid.

When I click on a link to the album in question, the URL looks different from that returned from getArtistInfo, so I'm wondering if perhaps something's changed and needs to be updated?

Thanks again.

CORS issue when used in a webpage

I'm trying to use this API inside a React app created with create-react-app.
Here is my code:

const bandcamp = require("bandcamp-scraper");
var albumUrl = 'https://patthebunny.bandcamp.com/album/ceschi-pat-the-bunny-split-12-and-zine';

bandcamp.getAlbumInfo(albumUrl, function(error, albumInfo) {
  if (error) {
    console.log(error);
  } else {
    console.log(albumInfo);
  }
});

Here is the error message printed to the console.

localhost/:1 Access to fetch at 'https://patthebunny.bandcamp.com/album/ceschi-pat-the-bunny-split-12-and-zine' from origin 'http://localhost:3000' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

The request is occurring within your library so I have no way to set no-cors request mode.

Fix search result schema json

Validation has started to complain about an invalid schema in the search-result schema. The problem is with oneOf, which is now expected as #/oneOf, so the #/ prefix must be moved from each child object's $ref to the parent key, like this:

"#/oneOf": [
  { "$ref": "definitions/artist" },
  { "$ref": "definitions/album" },
  { "$ref": "definitions/track" },
  { "$ref": "definitions/fan" },
  { "$ref": "definitions/label" }
]

getAlbumInfo breaks

Hi there,

The same error is returned no matter which album url I'm passing.

SyntaxError: Unexpected EOF at line 1 column 2 of the JSON5 data. Still to read: ""
at error (.../node_modules/json5/lib/json5.js:56:25)
at word (.../node_modules/json5/lib/json5.js:393:13)
at value (.../node_modules/json5/lib/json5.js:493:56)
at Object.parse (.../node_modules/json5/lib/json5.js:508:18)
at Object.exports.parseAlbumInfo (.../node_modules/bandcamp-scraper/lib/htmlParser.js:281:24)
at .../node_modules/bandcamp-scraper/lib/index.js:51:34
at Function.<anonymous> (.../node_modules/tinyreq/lib/index.js:58:9)
at res (.../node_modules/assured/lib/index.js:27:12)
at IncomingMessage.<anonymous> (.../node_modules/tinyreq/lib/index.js:101:13)
at IncomingMessage.emit (events.js:327:22) {
at: 1,
lineNumber: 1,
columnNumber: 2
}
Validation error on album info: data should have required property 'raw' {
tags: [
{ name: 'pop' },
{ name: 'amour' },
{ name: 'coeur de pirate' },
{ name: 'french' },
{ name: 'french pop' },
{ name: 'grosse boîte' },
{ name: 'montreal' },
{ name: 'piano pop' },
{ name: 'Montréal' }
],
artist: 'Cœur de pirate',
title: 'Blonde',
imageUrl: 'https://f4.bcbits.com/img/a1328452291_2.jpg',
tracks: [
{
name: 'Lève les voiles',
url: 'http://musique.coeurdepirate.com/track/l-ve-les-voiles',
duration: '01:12'
},
{
name: 'Adieu',
url: 'http://musique.coeurdepirate.com/track/adieu',
duration: '02:27'
},
{
name: 'Danse et danse',
url: 'http://musique.coeurdepirate.com/track/danse-et-danse',
duration: '03:10'
},
{
name: 'Golden Baby',
url: 'http://musique.coeurdepirate.com/track/golden-baby',
duration: '03:07'
},
{
name: 'Ava',
url: 'http://musique.coeurdepirate.com/track/ava',
duration: '03:16'
},
{
name: "Loin d'ici",
url: 'http://musique.coeurdepirate.com/track/loin-dici',
duration: '02:43'
},
{
name: 'Les amours dévouées',
url: 'http://musique.coeurdepirate.com/track/les-amours-d-vou-es',
duration: '02:27'
},
{
name: 'Place de la République',
url: 'http://musique.coeurdepirate.com/track/place-de-la-r-publique',
duration: '04:11'
},
{
name: 'Cap Diamant',
url: 'http://musique.coeurdepirate.com/track/cap-diamant',
duration: '02:43'
},
{
name: 'Verseau',
url: 'http://musique.coeurdepirate.com/track/verseau',
duration: '03:53'
},
{
name: 'Saint-Laurent',
url: 'http://musique.coeurdepirate.com/track/saint-laurent',
duration: '03:14'
},
{
name: 'La petite mort',
url: 'http://musique.coeurdepirate.com/track/la-petite-mort',
duration: '03:49'
}
],
url: 'http://musique.coeurdepirate.com/album/blonde'
}
null

Duplicate Product Format String

When using getAlbumProducts, some URLs return duplicated strings for the format prop.

For example:

bandcamp.getAlbumProducts('https://bandcamp.prspct.nl/album/the-hardcore-party-ep', function (error, albumProducts) {
    console.log(albumProducts);
});

This consistently returns "Digital AlbumDigital Album" as the format. I'm not sure how this is happening, since the .buyItemPackageTitle element only contains this text once.

This seems to happen to certain URLs consistently, ex:

I'm using a random URL out of a set of 1000 for debugging in my app, and I'm seeing this ~5% of the time.

It also seems to happen to the name prop for some URLs, and I'm also seeing the string "Full Digital Discography" doubled.
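
As a stopgap on the caller's side, a doubled value can be detected and collapsed after the fact. This is a sketch only: collapseDoubledString is a hypothetical helper, it does not address the cause in the scraper, and it would also shorten a legitimate value that happens to be an exact repetition of its first half.

```javascript
// Hypothetical helper: if a string is exactly its first half repeated twice
// (e.g. 'Digital AlbumDigital Album'), return a single copy. Anything else
// is returned unchanged.
function collapseDoubledString (value) {
  if (typeof value !== 'string' || value.length === 0 || value.length % 2 !== 0) {
    return value
  }
  const half = value.slice(0, value.length / 2)
  return half + half === value ? half : value
}

// Usage (requires network access):
// const bandcamp = require('bandcamp-scraper')
// bandcamp.getAlbumProducts(albumUrl, function (error, albumProducts) {
//   if (error) return console.log(error)
//   albumProducts.forEach(function (product) {
//     product.format = collapseDoubledString(product.format)
//     product.name = collapseDoubledString(product.name)
//   })
// })
```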
