Comments (3)
This one is a bit more tricky... There's an API for JW broadcasting, and there's an API for downloading publications. But I haven't seen any API for articles and pages on the website, and I wouldn't think there is any either, because that would be overkill.
That would mean we need a web page scraper. And that would mean it could break whenever there's an update to the layout etc. of the webpage.
I know there's interest in scraping jw.org, not only for downloading a bunch of audio, but also for things like a jw.org news client for Kodi etc. It would be nice, but it's a bit of a project on its own.
I'll take a look at how the audio recordings are handled, but chances are all solutions are too fragile.
from jw-scripts.
May I ask why you need this, and how Python-savvy you are?
Yeah, if you can get hold of the document ID, there is an API to download the MP3s. The catch is getting the ID. I'm giving you an unorthodox quick fix here, and it only works for web articles. Tweak it to suit your needs.
#!/usr/bin/env python3
# Run the program with a jw.org URL as an argument to
# download all recordings that are referenced on that page
import json
import re
import sys
import urllib.error
import urllib.request

lang = 'E'
api_url = 'https://apps.jw.org/GETPUBMEDIALINKS?output=json&alllangs=0&fileformat=MP3&langwritten=' + lang + '&txtCMSLang=' + lang + '&docid='

# Fetch the page and collect every document ID it references
data = urllib.request.urlopen(sys.argv[1]).read().decode('utf-8')
matches = re.finditer('data-page-id="mid([^"]*)"', data)
ids = set(x.group(1) for x in matches)  # set() removes duplicates

for i in ids:
    try:
        print('requesting data about', i)
        response = urllib.request.urlopen(api_url + i)
    except urllib.error.URLError:
        continue  # the request for this ID failed, skip it
    tree = json.loads(response.read().decode('utf-8'))
    file_url = tree['files'][lang]['MP3'][0]['file']['url']  # Assuming there's only one MP3
    file_title = tree['files'][lang]['MP3'][0]['title']
    file_name = re.sub(r'[<>:"|?*/\\\0]', '', file_title) + '.mp3'  # NTFS safe
    print('downloading', file_title)
    urllib.request.urlretrieve(file_url, filename=file_name)
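If the regex turns out to be too brittle, here is a rough sketch of the same steps split into small functions that can be tested without touching the network. It uses the standard library's `html.parser` instead of a regex to find the `data-page-id` attributes. The names `PageIdParser`, `extract_doc_ids`, `build_api_url` and `list_mp3s` are made up for this example, and the response layout is only assumed from what the script above reads (`files` → language → `MP3`), so treat it as a starting point rather than a description of the API.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Endpoint copied from the script above; not verified independently.
API_BASE = 'https://apps.jw.org/GETPUBMEDIALINKS'


class PageIdParser(HTMLParser):
    """Collects whatever follows 'mid' in every data-page-id attribute."""

    def __init__(self):
        super().__init__()
        self.ids = set()  # a set drops duplicate IDs

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'data-page-id' and value and value.startswith('mid'):
                self.ids.add(value[len('mid'):])


def extract_doc_ids(html):
    """Return the set of document IDs referenced in an HTML string."""
    parser = PageIdParser()
    parser.feed(html)
    return parser.ids


def build_api_url(doc_id, lang='E'):
    """Build the same query string the script above concatenates by hand."""
    query = urlencode({
        'output': 'json',
        'alllangs': 0,
        'fileformat': 'MP3',
        'langwritten': lang,
        'txtCMSLang': lang,
        'docid': doc_id,
    })
    return API_BASE + '?' + query


def list_mp3s(tree, lang='E'):
    """Return (title, url) pairs for every MP3 entry, or [] if there are none.

    Assumes the response shape used by the script above.
    """
    entries = tree.get('files', {}).get(lang, {}).get('MP3', [])
    return [(entry['title'], entry['file']['url']) for entry in entries]
```

Unlike the quick fix, `list_mp3s` returns every MP3 entry instead of only the first one, and it returns an empty list when the expected keys are missing rather than raising a `KeyError`.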