Dear brother, This is not an issue... more of a feature request ^_^

download experiences audio about jw-scripts HOT 3 CLOSED

jongkok commented on August 14, 2024

download experiences audio

from jw-scripts.

Comments (3)

allejok96 commented on August 14, 2024

This one is a bit more tricky... There's an API for JW Broadcasting, and there's an API for downloading publications. But I haven't seen any API for articles and pages on the website, and I doubt there is one, because that would be overkill.

That would mean we need a web page scraper. And that would mean it could break whenever the layout of the web page is updated.

I know there's interest in scraping jw.org, not only for downloading a bunch of audio, but also for things like a jw.org news client for Kodi etc... It would be nice, but it's a bit of a project on its own.

I'll take a look at how the audio recordings are handled, but chances are all solutions are too fragile.


allejok96 commented on August 14, 2024

May I ask why you need this, and how Python-savvy you are?


allejok96 commented on August 14, 2024

Yeah, if you can get hold of the document ID there is an API to download the MP3s... But the catch is getting the ID... I'm giving you an unorthodox quick fix here, and it only works for web articles. Tweak it to suit your needs.

#!/usr/bin/env python3
# Run the program with a jw.org URL as an argument to
# download all recordings that are referenced in that page
import sys, re, json
import urllib.error, urllib.request

lang = 'E'
api_url = 'https://apps.jw.org/GETPUBMEDIALINKS?output=json&alllangs=0&fileformat=MP3&langwritten=' + lang + '&txtCMSLang=' + lang + '&docid='
data = urllib.request.urlopen(sys.argv[1]).read().decode('utf-8')
matches = re.finditer(r'data-page-id="mid([^"]*)"', data)
ids = set(x.group(1) for x in matches)  # set() removes duplicates

for i in ids:
    try:
        print('requesting data about', i)
        response = urllib.request.urlopen(api_url + i)
    except urllib.error.URLError:
        continue  # The request for this ID failed (e.g. no recording); skip it

    tree = json.loads(response.read().decode('utf-8'))
    file_url = tree['files'][lang]['MP3'][0]['file']['url']  # Assuming there's only one MP3
    file_title = tree['files'][lang]['MP3'][0]['title']
    file_name = re.sub(r'[<>:"|?*/\\\0]', '', file_title) + '.mp3'  # NTFS safe
    print('downloading', file_title)
    urllib.request.urlretrieve(file_url, filename=file_name)
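
The quick fix above does its scraping and file naming inline. Below is a minimal sketch of those two offline steps as separate functions, so they can be checked without touching the network. The function names `extract_doc_ids` and `safe_filename` are hypothetical, and the sample HTML is made up for illustration. The `data-page-id="mid..."` pattern is taken from the script above and may stop matching whenever the site layout changes.

```python
import re

# Pattern copied from the script above; an assumption about the page layout.
PAGE_ID_RE = re.compile(r'data-page-id="mid([^"]*)"')
# Characters rejected in Windows/NTFS file names, plus control characters.
UNSAFE_RE = re.compile(r'[<>:"|?*/\\\x00-\x1f]')


def extract_doc_ids(html):
    """Return the unique document IDs found in a page, sorted."""
    return sorted(set(PAGE_ID_RE.findall(html)))


def safe_filename(title, ext='.mp3'):
    """Strip unsafe characters from a title and append an extension."""
    cleaned = UNSAFE_RE.sub('', title).strip()
    # Fall back to a placeholder if nothing usable is left.
    return (cleaned or 'untitled') + ext


if __name__ == '__main__':
    # Invented sample markup, not real jw.org output.
    sample = ('<article data-page-id="mid123"></article>'
              '<div data-page-id="mid123"></div>'
              '<div data-page-id="mid456"></div>')
    print(extract_doc_ids(sample))
    print(safe_filename('What is "faith"?'))
```

Sorting the IDs is only there to make the output order predictable; the original `set()` works just as well for downloading.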
