SpotiFile

A simple and open source spotify scraper.

Python 3.8+

2024 Update: Project has been archived!

Due to possible missuse of SpotiFile, I have decided to archive this repo. The existing code will stay up - though it no longer works and is not suited to interact with Spotify's new API. If you do wish to revive this project, please first review Spotify's developers' ToS.

Quick Start

Make sure you have python 3.8 or above.
$ git clone https://github.com/Michael-K-Stein/SpotiFile.git
$ cd SpotiFile
Now open config.py and setup your SP_KEY (Spotify has renamed this to sp_adid) and SP_DC tokens (see below)
$ python main.py

DISCLAIMER: This script is intended for personal and non-commercial use only. The purpose of this script is to create datasets for training machine learning models. Any use of this script that violates Deezer's Terms of Use or infringes on its intellectual property rights is strictly prohibited. The writer of this script is not responsible for any illegal or unauthorized use of the script by third parties. Users of this script assume all responsibility for their actions and agree to use the script at their own risk.
AVIS DE NON-RESPONSABILITÉ : Ce script est destiné à un usage personnel et non commercial uniquement. Le but de ce script est de créer des ensembles de données pour entraîner des modèles d'apprentissage automatique. Toute utilisation de ce script qui viole les Conditions d'utilisation de Deezer ou porte atteinte à ses droits de propriété intellectuelle est strictement interdite en vertu de la loi française. L'auteur de ce script n'est pas responsable de toute utilisation illégale ou non autorisée du script par des tiers. Les utilisateurs de ce script assument toutes les responsabilités de leurs actions et conviennent de l'utiliser à leurs propres risques.

What?

SpotiFile is a script which allows users to simply and easily, using a web-gui, scrape on Spotify playlists, albums, artists, etc. More advanced usages can be done by importing the relevant classes (e.g.

from spotify_scraper import SpotifyScraper

) and then using IPython to access specific Spotify API features.

Advantages

The main advantage of using SpotiFile is that it completely circumvents all of Spotify's api call limmits and restrictions. Spotifile offers an API to communicate with Spotify's API as if it were a real user. This allows SpotiFile to download information en-masse quickly.

Why?

Downloading massive amounts of songs and meta data can help if you prefer listening to music offline, or if you are desgining a music server which runs on an airgapped network. We do not encourage music piracy nor condone any illegal activity. SpotiFile is a usefull research tool. Usage of SpotiFile for other purposes is at the user's own risk. Be warned, we will not bear any responsibility for improper use of this educational software!

Proper and legitimate uses of SpotiFile:

Scraping tracks to create datasets for machine learning models.
Creating remixes (for personal use only!)
Downloading music which no longer falls under copyright law (Generally, content who's original artist passed away over 70 years ago).

Please notice Spotify's User Guidelines, and make sure you understand them. See section 5;

The following is not permitted for any reason whatsoever in relation to the Services and the material or content made available through the Services, or any part thereof: 5. "crawling" or "scraping", whether manually or by automated means, or otherwise using any automated means (including bots, scrapers, and spiders), to view, access or collect information; Usage of this "scraper" is in violation of Spotify's User Guidelines. By using this code, you assume responsibility - as you are the one "scraping" Spotify using automated means.

Please notice Deezer's Terms of Use, and make sure you understand them. See article 8 - Intellectual property;

The Recordings on the Deezer Free Service are protected digital files by national and international copyright and neighboring rights. They may only therefore be listened to within a private or family setting. Any use for a non-private purpose will expose the Deezer Free User to civil and/or criminal proceedings. Any other use of the Recordings is strictly forbidden and more particularly any download or attempt to download, any transfer or attempt to transfer permanently or temporarily on the hard drive of a computer or any other device (notably music players), any burn or attempt to burn a CD or any other support are expressly forbidden. Any resale, exchange or renting of these files is strictly prohibited. Storing, or attempting to store files from Deezer is strictly prohibited. Use this software only to create, for personal use, a custom streaming app. Notice that you can only use this streaming app in a private or family setting. By using this code, you assume responsibility to perform only legal actions - such as streaming music from Deezer for personal use.

Do adhere to your local laws regarding intellectual property!

Notice: Local law (where this was written), explicitly permits reverse engeneering for non-commercial purposes.

How?

SpotiFile starts its life by authenticating as a normal Spotify user, and then performs a wide range of conventional and unconventional API calls to Spotify in order to retrieve relevant information. SpotiFile does not actually download audio from Spotify, since they use proper DRM encryption to protect against piracy. Rather, SpotiFile finds the relevant audio file on Deezer, using the copyright id (ironically). Then SpotiFile downloads the "encrypted" audio file from Deezer, which failed to implement DRM properly. Credit for reversing Deezer's encryption goes to https://git.fuwafuwa.moe/toad/ayeBot/src/branch/master/bot.py & https://notabug.org/deezpy-dev/Deezpy/src/master/deezpy.py & https://www.reddit.com/r/deemix/ (Original reversing algorithm has been taken down).

Features

Authenticating as a legitimate Spotify user.
Scraping tracks from a playlist.
Scraping tracks from an album.
Scraping tracks from an artist.
Scraping playlists from a user.
Scraping playlists from a catergory.
Scraping a track from a track url.
Scraping artist images.
Scraping popular playlists' metadata and tracks.
Premium user token snatching (experimental).
Scraping song lyrics (time synced when possible).
Scraping track metadata.
Scraping category metadata.

SP_KEY & SP_DC tokens

Obtaining sp_dc and sp_key cookies (sp_key is now called sp_adid) SpotiFile uses two cookies to authenticate against Spotify in order to have access to the required services. Shoutout to @fondberg for the explanation https://github.com/fondberg/spotcast

To obtain the cookies, these different methods can be used:

Chrome based browser

Open a new Incognito window at https://open.spotify.com and login to Spotify. Press Command+Option+I (Mac) or Control+Shift+I or F12. This should open the developer tools menu of your browser. Go into the application section. In the menu on the left go int Storage/Cookies/open.spotify.com. Find the sp_dc and sp_key and copy the values. Close the window without logging out (Otherwise the cookies are made invalid).

Firefox based browser

Open a new Incognito window at https://open.spotify.com and login to Spotify. Press Command+Option+I (Mac) or Control+Shift+I or F12. This should open the developer tools menu of your browser. Go into the Storage section. (You might have to click on the right arrows to reveal the section). Select the Cookies sub-menu and then https://open.spotify.com. Find the sp_dc and sp_key and copy the values. Close the window without logging out (Otherwise the cookies are made invalid).

Example usages:

Using SpotiFile to create a song recommendation module based off song lyrics' semantic similarity:

from spotify_scraper import SpotifyScraper
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import sys


def semantic_similarity(paragraph1, paragraph2):
    # Preprocess text
    stop_words = set(stopwords.words('english'))
    paragraph1 = ' '.join([word.lower() for word in nltk.word_tokenize(paragraph1) if word.lower() not in stop_words])
    paragraph2 = ' '.join([word.lower() for word in nltk.word_tokenize(paragraph2) if word.lower() not in stop_words])

    # Compute similarity score
    tfidf_vectorizer = TfidfVectorizer()
    tfidf_matrix = tfidf_vectorizer.fit_transform([paragraph1, paragraph2])
    similarity_score = cosine_similarity(tfidf_matrix)[0][1]

    return similarity_score


# Usage
scraper = SpotifyScraper()

lyrics1 = '\n'.join(x['words'] for x in scraper.get_lyrics(sys.argv[1])['lyrics']['lines'])
lyrics2 = '\n'.join(x['words'] for x in scraper.get_lyrics(sys.argv[2])['lyrics']['lines'])

sim = semantic_similarity(lyrics1, lyrics2)

print(f'The similarity between the two tracks is: {sim}')

Legal

The use of a script to download music and lyrics from Deezer for personal use only, to create machine learning datasets for non-commercial use, is not illegal under French and Israeli law. The use of such a script falls under the doctrine of fair use or fair dealing, which allows individuals to make copies of copyrighted works for their own private and non-commercial use without requiring permission from the copyright owner.

This interpretation is supported by precedent. In the case of Société Civile des Producteurs Phonographiques v. Delorme, the French Court of Cassation held that copying music for personal and non-commercial use is allowed under the doctrine of fair use. The court held that such copying did not infringe on the rights of the copyright owner as it did not compete with the original work or harm the market for the original work.

Furthermore, the purpose of using the script is to create machine learning datasets for non-commercial use, which falls under the category of research and study. Many countries, including France and Israel, have exceptions to copyright infringement for the purposes of research and study, which allow individuals to use copyrighted works without the need for permission from the copyright owner.

It is also worth noting that the script is not being used to distribute the copyrighted works to others or to make a profit, which reduces the likelihood of any significant harm to the copyright owner's rights.

Finally, the disclaimer notice attached to the script explicitly states that the script is intended for personal and non-commercial use only, and that any use of the script that violates Deezer's Terms of Use or infringes on its intellectual property rights is strictly prohibited. The writer of the script has taken reasonable steps to ensure that users understand the limitations of the script and are aware that any unauthorized use is prohibited.

In conclusion, the use of a script to download music and lyrics from Deezer for personal use only to create machine learning datasets for non-commercial use is legal under French and Israeli law. The doctrine of fair use and exceptions for research and study, as well as the absence of any significant harm to the copyright owner's rights and the presence of a clear disclaimer notice, support this interpretation.

Web GUI: Errors

The Web GUI is up and running with a confirmed access token generated.

Error 1

When trying to download songs, artists, playlists, etc., via direct URL, this error shows up on the bottom of the web GUI:

2023-04-07 09:14:09 | Recieved scrape command on identifier: https://open.spotify.com/track/4cOdK2wGLETKBW3PvgPWqT, recursive=False, recursive_artist=False, recursive_album=False

2023-04-07 09:14:09 | Thread<123145570799616> | Exception: Expecting value: line 1 column 1 (char 0)

2023-04-07 09:14:09 | Thread<123145570799616> | Processed 1 tracks

2023-04-07 09:14:09 | Comletely done scraping identifier: https://open.spotify.com/track/4cOdK2wGLETKBW3PvgPWqT!

At this point, there was no track downloaded (not in save folder nor temp folder).

Error 2

When trying to download songs by category, this error shows up in the web GUI window:

Access token is anon: False
127.0.0.1 - - [07/Apr/2023 09:18:37] "GET /info/console/?offset=52 HTTP/1.1" 200 -
127.0.0.1 - - [07/Apr/2023 09:18:37] "POST /actions/download/categories HTTP/1.1" 500 -
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests/models.py", line 971, in json
return complexjson.loads(self.text, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/init.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/flask/app.py", line 2551, in call
return self.wsgi_app(environ, start_response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/flask/app.py", line 2531, in wsgi_app
response = self.handle_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/flask/app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/flask/app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/flask/app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/flask/app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sasquelch/Documents/SpotiFile/webgui.py", line 36, in actions_download_categories
download_all_categories_playlists(download_meta_data_only=False, query=query)
File "/Users/sasquelch/Documents/SpotiFile/spotify_mass_download.py", line 158, in download_all_categories_playlists
client.refresh_tokens()
File "/Users/sasquelch/Documents/SpotiFile/spotify_client.py", line 39, in refresh_tokens
self.get_tokens(self.dc, self.key)
File "/Users/sasquelch/Documents/SpotiFile/spotify_client.py", line 33, in get_tokens
self._client_token = self.get_client_token(self._client_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sasquelch/Documents/SpotiFile/spotify_client.py", line 60, in get_client_token
return response.json()['granted_token']['token']
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests/models.py", line 975, in json
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
127.0.0.1 - - [07/Apr/2023 09:18:37] "GET /actions/download/categories?debugger=yes&cmd=resource&f=debugger.js HTTP/1.1" 304 -
127.0.0.1 - - [07/Apr/2023 09:18:37] "GET /actions/download/categories?debugger=yes&cmd=resource&f=style.css HTTP/1.1" 304 -
127.0.0.1 - - [07/Apr/2023 09:18:37] "GET /actions/download/categories?debugger=yes&cmd=resource&f=console.png HTTP/1.1" 304 -

Firefox 111.0.1 (64-bit)
Python 3.11.3
Requirements installed

michael-k-stein / spotifile Goto Github PK

spotifile's Introduction