garret1317 / yt-dlp-rajiko

improved radiko.jp extractor for yt-dlp

Home Page: https://427738.xyz/yt-dlp-rajiko/

License: Other

Python 100.00%

radiko yt-dlp yt-dlp-plugins

yt-dlp-rajiko's Issues

gRPC APIs

the website has one gRPC API - BatchGetActors
does about what the name suggests - gets a list (batch) of metadata about the presenters (actors) of the specified programme

ties into the mystery of minds-r somewhat (#1) as it uses one of the numbers

the one here isn't particularly interesting, but the mobile app and https://radiko.jp/mobile have a lot more that are quite a bit more interesting

they revamped the app, imo it's only a matter of time before they do the site as well
best to not get caught with our trousers down when it actually happens

i am not familiar with gRPC
yt-dlp libraries can't speak it
hmmm

https://blog.davidvassallo.me/2018/10/27/pentesting-grpc-web-recon-and-reverse-engineering/ and the linked gist are a good start point

some apis do appear very similar to existing json ones, to the point where i could probably write a reasonable .proto file for them if i knew how
unfortunately i dont

https://protobuf-decoder.netlify.app/
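
as a starting point for poking at responses without a .proto file, here's a minimal sketch of a schema-less decoder. it assumes the standard gRPC-Web framing (1 flag byte + 4-byte big-endian length per frame, flag bit 0x80 marking the trailer frame) and the standard protobuf wire format. the function names are made up for illustration, and it doesn't handle the base64 "grpc-web-text" variant or compressed frames. whether radiko's endpoints use exactly this framing is not confirmed here.

```python
import struct


def split_grpc_web_frames(body: bytes):
    """Yield (is_trailer, payload) for each frame in a gRPC-Web response body."""
    pos = 0
    while pos + 5 <= len(body):
        # 1 flag byte followed by a 4-byte big-endian payload length
        flags, length = struct.unpack_from(">BI", body, pos)
        pos += 5
        yield bool(flags & 0x80), body[pos:pos + length]
        pos += length


def read_varint(buf: bytes, pos: int):
    """Decode one protobuf varint starting at pos; return (value, new_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_fields(buf: bytes):
    """Schema-less decode: list of (field_number, wire_type, raw_value)."""
    fields, pos = [], 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:    # length-delimited (string, bytes, nested message)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields
```

length-delimited values can be fed back into `decode_fields` to see whether they look like nested messages, which is roughly what the decoder site linked above does.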

paginate search

search currently only returns the first few results, not all of them

from the discord:

OnDemandPagedList if you know how many videos each playlist page will yield (and it's constant)
InAdvancePagedList if you know how many total videos/pages are in the playlist before pagination
generator if you know neither of these things

Adding to this, note that paged lists only work if the URL for each page is static, in addition to knowing the number of videos per page. So sites with cursors, like youtube, are forced to use generators. The advantage of a paged list over a generator is that the user can do -I 100-120 etc. without downloading the earlier pages.
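
to make the trade-off concrete, here's a stdlib-only sketch (not yt-dlp code). `fetch_page`, `PAGE_SIZE` and `TOTAL_RESULTS` are hypothetical stand-ins for the real search api. `get_range` shows why a known, constant page size lets you jump straight to items 100-120, and `iterate_all` is the generator fallback. in an actual extractor the page function would presumably be handed to yt-dlp's `OnDemandPagedList` instead of the hand-rolled `get_range`.

```python
PAGE_SIZE = 20        # assumed constant number of results per page
TOTAL_RESULTS = 130   # stand-in for whatever the real API would return

fetched_pages = []    # records which pages were requested, for demonstration


def fetch_page(page_num):
    """Hypothetical page fetcher: yields the results of one 0-indexed page."""
    fetched_pages.append(page_num)
    start = page_num * PAGE_SIZE
    for index in range(start, min(start + PAGE_SIZE, TOTAL_RESULTS)):
        yield f"result-{index + 1}"


def get_range(first, last):
    """Return items first..last (1-indexed, inclusive), fetching only the pages needed."""
    first_page = (first - 1) // PAGE_SIZE
    last_page = (last - 1) // PAGE_SIZE
    items = []
    for page_num in range(first_page, last_page + 1):
        for offset, item in enumerate(fetch_page(page_num)):
            position = page_num * PAGE_SIZE + offset + 1
            if first <= position <= last:
                items.append(item)
    return items


def iterate_all():
    """Generator fallback for when neither page size nor total count is known."""
    page_num = 0
    while True:
        page = list(fetch_page(page_num))
        if not page:
            return
        yield from page
        page_num += 1
```

with these numbers, `get_range(100, 120)` only touches pages 4 and 5, whereas the generator has to walk through everything before it.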

don't download from stations w/ no timefree (nhk)

nhk doesn't have timefree, and it says as much on the programme pages
(use nhk radiru instead)

download doesn't fail though, you just get a loop of elevator music and an apology that it's not available (lasting the duration of the programme)
would prefer not to have that, just wastes space and time

iirc theres timefree/areafree availability in the stations xml, grab and store that somehow, probably in the same place as station regions, then fail in the sameish way as for not aired/not finished

maybe suggest radiru? but wouldn't really work for not nhk lol
though don't know if there are others with no timefree
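
rough sketch of the check, going off memory of the stations xml. the element names (`station`, `id`, `timefree`, `areafree`), the "1"/"0" values, and the sample station ids are all assumptions to be checked against the real file, and `TimeFreeUnavailable` is a made-up exception (a real extractor would more likely raise yt-dlp's `ExtractorError` with `expected=True`).

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt shaped like the stations XML; real element names may differ.
SAMPLE_STATIONS_XML = """
<region>
  <stations>
    <station><id>TBS</id><areafree>1</areafree><timefree>1</timefree></station>
    <station><id>JOAK</id><areafree>0</areafree><timefree>0</timefree></station>
  </stations>
</region>
"""


class TimeFreeUnavailable(Exception):
    """Raised when a station is known not to offer timefree."""


def parse_availability(xml_text):
    """Map station id -> {'timefree': bool, 'areafree': bool}."""
    root = ET.fromstring(xml_text.strip())
    availability = {}
    for station in root.iter("station"):
        availability[station.findtext("id")] = {
            "timefree": station.findtext("timefree") == "1",
            "areafree": station.findtext("areafree") == "1",
        }
    return availability


def check_timefree(station_id, availability):
    """Fail early instead of downloading the apology loop."""
    info = availability.get(station_id)
    # Unknown stations are let through; only a known "no timefree" is an error.
    if info is not None and not info["timefree"]:
        raise TimeFreeUnavailable(f"{station_id} does not offer timefree")
```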

law-abiding citizen mode

as the title says
no geo bypass

pain to test lol (need to use vpn, thats exactly what i was trying to escape)
will need to add stuff to:

- grab token from the js
- throw proper geo restriction errors when something isn't in your area

probably not too excruciating but the extractor was written with the assumption that nothing will ever be blocked lol
could maybe get it merged into dlp proper

also as a nice to have:

im not paying for it though (unless it works legit outside japan)

cache formats?

should maybe cache the results from https://radiko.jp/v3/station/stream/{device}/{station}.xml

similar vein to #16 - dont want to hammer the servers too hard
+ may as well cache since it basically never changes

maybe cache station details?

for an hour or two lets say
seems awfully silly to keep on downloading the exact same thing over and over
wastes requests, wastes time
station info is unlikely to change very often
and if it does, oh well you'll just have to wait a bit (or clear cache)
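
the idea for both this and the formats cache is just "remember it with a timestamp, refetch when it's too old". a minimal in-memory sketch, with `TTLCache` being a made-up name. a real version would presumably persist to disk (e.g. via yt-dlp's own cache, storing the timestamp next to the data) rather than live in a dict.

```python
import time


class TTLCache:
    """Tiny time-limited cache: entries older than ttl_seconds are refetched."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the expiry can be tested
        self._entries = {}      # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch() if missing or stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value

    def clear(self):
        """Manual escape hatch for when the station info really did change."""
        self._entries.clear()
```

usage would look something like `cache.get(("stream", device, station), lambda: download_stream_xml(device, station))`, where `download_stream_xml` is whatever does the actual request.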

extractor arg: stream device

currently the device is hardcoded to aSmartPhone7a
works nice, always "happy path"
but there are other devices, with different streams, that it might be interesting to try instead
would certainly make it easier to test new ones

also would give me a place to write down all the devices ive found so far

this wouldn't affect auth, just which streams are available
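
sketch of what the arg would boil down to: the device only changes which stream list gets requested. the url pattern is the one from the "cache formats?" issue, and `KNOWN_DEVICES` only lists names that appear in these notes (whether aSmartPhone8 is valid here is an assumption). in the extractor the value would presumably come from yt-dlp's `_configuration_arg`. the arg name "device" is hypothetical.

```python
DEFAULT_DEVICE = "aSmartPhone7a"

# Only device names mentioned in these notes; not an exhaustive or verified list.
KNOWN_DEVICES = ("aSmartPhone7a", "aSmartPhone8", "pc_html5")


def stream_list_url(station, device=None):
    """Build the stream list URL for a station, defaulting to the usual device."""
    device = device or DEFAULT_DEVICE
    return f"https://radiko.jp/v3/station/stream/{device}/{station}.xml"


def is_known_device(device):
    """Unknown devices should still be allowed (that's the point), just flagged."""
    return device in KNOWN_DEVICES
```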

chapters with played songs

use noa apis to make chapters for songs played on the programme
wont work very well for non-music programmes lol

https://radiko.jp/v3/feed/pc/noa/FMT.xml - live
https://radiko.jp/v2/api/noa?station_id=INT&ft=20230616000000&to=20230616050000 - specific range
there are limits to how far back in time it'll go, and how big a range you can have - not sure what these actually are

can i modify the metadata on the fly during a live stream?
very probably not, so no live then

timefree probably doable though
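
for timefree, the conversion itself is simple once the noa data is parsed. this sketch assumes the songs are already available as (played_at, title, artist) tuples (the actual xml/json field names aren't covered here) and produces dicts with `start_time`, `end_time` and `title`, which is the shape yt-dlp uses for chapters. each chapter runs until the next song starts, so talk segments get lumped in with the preceding song.

```python
from datetime import datetime


def songs_to_chapters(songs, programme_start, programme_end):
    """Turn (played_at, title, artist) tuples into chapter dicts.

    Times are datetimes; songs outside the programme are dropped.
    """
    duration = (programme_end - programme_start).total_seconds()
    in_range = sorted(
        (song for song in songs if programme_start <= song[0] < programme_end),
        key=lambda song: song[0],
    )
    chapters = []
    for index, (played_at, title, artist) in enumerate(in_range):
        start = (played_at - programme_start).total_seconds()
        if index + 1 < len(in_range):
            # A chapter ends where the next song begins
            end = (in_range[index + 1][0] - programme_start).total_seconds()
        else:
            end = duration
        chapters.append({
            "start_time": start,
            "end_time": end,
            "title": f"{artist} - {title}",
        })
    return chapters
```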

station button embeds

some station websites, e.g. TBS ラジオ, FM COCOLO, 文化放送 have embedded radiko players

https://radiko.jp/button-embed/live/libs/radiko-button-player.min.js is the js
https://radiko.jp/res/app/external/web/playback_permission.json is the stations it's embedded on - not sure if it forbids playback on other sites

app is xExternalWebStations, it uses a different key - relatively obvious in the js
it's still region locked same as pc_html5 though

could be nice to "see" the embeds and extract the relevant station - just url_result though, nothing actually new
same as ShareIE but with embeds basically

mark as-live streams as such

arg implemented
most of the timefree formats arent actually downloadable though, because they're streamed to you as live
so we need ffmpeg

probably need to have a list of whats compatible with what, a la yt-dlp's extractor

Originally posted by @garret1317 in #17 (comment)

share embeds

the player has a share button that lets you share links to the programme you're listening to

you can get urls, which are already covered in ShareIE, but you can also get embeds

(amusingly/annoyingly, the share dialogue uses your system time, which if you're bypassing the geoblock, doesn't necessarily match japan time. in my case, it's all 8 hours off)

hardest part of this might actually be finding an embed in the wild to test on

question dump (why are things the way they are)

this issue was previously a massive todo list for everything
however, that's unwieldy, so the list items have been split off into their own issues

- the mystery of minds-r
  - used somewhat in the grpc api BatchGetActors
    - literally just the number though, dont see why they have the https://minds-r.org/people/
  - who/what is ARTIST COMMONS? is minds-r their domain?
    - https://acoms.jp
- what does the l in the m3u8 query mean?
  - why is it 60 in v8
- endless in m3u8 query?
- why is v8 user agent hardcoded to Android 10 on Pixel 4 XL
- are there iphone apis?
  - definitely not getting research device unless theres one on ebay for like £10 lmao
  - would it mitmproxy? can i install user CAs? are they respected?
- https://www.gnus-inc.com/ - what else have they made? any similarities?

please do reply if you know the answers to any of these

package/load v8 key properly

dont think the way im doing it currently is the right way
don't know how to actually do it right though

How to designate region code

I know that this is a ridiculous question, but I couldn't understand how to designate a region code to bypass radiko's geo-blocking.

decouple device spoofing from main extractor

ie, have the key, all the random info code, all in a separate file
then, e.g. the aSmartPhone8 branch can just be one file, if you want to be aSmartPhone7a or pc_html5 you can just swap out that one file

i think all you'd need is

key
auth1 info: X-Radiko-App, X-Radiko-App-Version, X-Radiko-Device, X-Radiko-User? User-Agent
auth2 info: ~~X-Radiko-AuthToken~~ X-Radiko-Location X-Radiko-Connection ~~X-Radiko-Partialkey~~
X-Radiko-AuthToken can be done by the main thing, that's just "send what you got from auth1"
X-Radiko-Partialkey could also be done in the main one - just gib the key and let the main extractor handle it
X-Radiko-User is just random - but maybe could be overridden for eg premium support - note, dont know if premium actually uses this, haven't had the opportunity to look at it yet

will have to think about this
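
one possible shape for the split, as a sketch only. `DeviceProfile`, `partial_key` and all the example values are made up. the header names are the ones listed above. the partial key step assumes the commonly described scheme where auth1 replies with an offset and a length into the key and the client sends back that slice base64-encoded; treat that as an assumption to verify against the current code.

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Everything device-specific, so swapping devices means swapping one object."""
    key: bytes
    app: str
    app_version: str
    device: str
    user_agent: str
    connection: str

    def auth1_headers(self, user_id):
        return {
            "X-Radiko-App": self.app,
            "X-Radiko-App-Version": self.app_version,
            "X-Radiko-Device": self.device,
            "X-Radiko-User": user_id,
            "User-Agent": self.user_agent,
        }

    def auth2_headers(self, location):
        # AuthToken and Partialkey are added by the main extractor
        return {
            "X-Radiko-Location": location,
            "X-Radiko-Connection": self.connection,
        }


def partial_key(profile, offset, length):
    """Main-extractor side: slice the profile's key and base64 it (assumed scheme)."""
    return base64.b64encode(profile.key[offset:offset + length]).decode()


# Placeholder values only; none of these are real keys or version strings.
EXAMPLE_PROFILE = DeviceProfile(
    key=b"0123456789abcdef",
    app="example_app",
    app_version="0.0.0",
    device="example_device",
    user_agent="ExampleAgent/0.0",
    connection="wifi",
)
```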

switch to v8 auth

it's pretty much just sitting there in the aSmartPhone8 branch

#10
not sure i'm importing the token right though, need to look up what's the accepted standard for this

also, is it really worth the bother when v7 keeps on working fine?

law-abiding citizen mode: premium account support

account needed
not paying for it myself because i'd still have to pirate the thing im paying for (not available outside japan)

(mobile) programme support

mobile app and mobile web give programmes their own pages
so you can see all available episodes easily, and be recommended other similar ones

extracting these would be very good indeed - won't have to use a convoluted search query to download weekly episodes (and other stuff that happens to match 🚎)

here's a show page on mobile web
https://radiko.jp/mobile/r_seasons/10001894

note- you need a japanese ip/vpn (rajiko deliberately doesnt work), and to spoof your user agent to something mobile
or just use curl :p

it appears to be next.js, so we can get most of the interesting stuff from the nice <script id="__NEXT_DATA__" type="application/json"> block at the end
it will block you if youre outside japan, but fortunately thats only client side - so downloading the page will work whatever
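
pulling the block out is a few lines with the stdlib. this is a sketch assuming the usual next.js layout (a json blob inside that script tag, with page data under `props.pageProps`). what's actually inside pageProps on radiko's pages isn't covered here. inside an extractor, yt-dlp's `_search_nextjs_data` helper should do the same job.

```python
import json
import re

NEXT_DATA_RE = re.compile(
    r'<script[^>]+id="__NEXT_DATA__"[^>]*>(?P<json>.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON from a page, or raise ValueError."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ block found")
    return json.loads(match.group("json"))


def page_props(html):
    """Convenience accessor for the standard props.pageProps location."""
    return extract_next_data(html).get("props", {}).get("pageProps", {})
```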

looks like everything has new (actual) ids, not sure what to do about that
while the desktop site's still non-renovated, could resolve to regular #!/ts/ urls by converting the air time
and i guess start using those ids once desktop makes the switch

here's an app show page

that one only has currently available eps and recommendations

here's another (not all shown)

has current eps and future, but no recs

if you follow the show, it lets you choose which station you want

e.g. whole jfn
just one station
partial jfn

^to deal with this, have an extractor arg: preferred stations
a list of which station( id)s to try in descending order of preference
so for eg. JFN shows you can prioritise the stations with best sound quality
and the extractor picks the first one that's available
if there is no arg, default to the key/"home" station - mobile web js has it, so should be ok
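
the selection logic described above is small enough to sketch. `pick_station` is a made-up name, and the station ids in the usage line are placeholders (apart from FMT, which shows up elsewhere in these notes).

```python
def pick_station(available, preferred=(), home=None):
    """Pick which station to download a multi-station programme from.

    available: station ids the programme can currently be fetched from
    preferred: user's list, in descending order of preference
    home: the programme's key/"home" station, used when no preference matches
    """
    for station in preferred:
        if station in available:
            return station
    if home in available:
        return home
    # Nothing matched: fall back to whatever is listed first, if anything
    return next(iter(available), None)
```

e.g. `pick_station(["FMT", "STATION_B"], preferred=["STATION_C", "STATION_B"], home="FMT")` gives `"STATION_B"`, and with no `preferred` it gives `"FMT"`.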

that's not available on mobile web (yet?) though, so don't think i can implement it any time soon - app remains uncooperative with mitmproxy

(todo: compare the entire JFN, find ones with best sound quality)
(note: don't know why my phone screenshots display so huge, or what to do about it)

unified time handling might be a good idea

some of it's a bit fragmented and hacky at the moment, might be a good idea to make a big unified one that uses datetime and can handle all the formats needed

todo:specific examples (ctrl-f for "hack")
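
a possible core for the unified version: everything becomes a timezone-aware datetime in JST as early as possible, and only gets turned back into strings at the edges. the sketch covers the 14-digit format seen in the timefree/noa urls in these notes. the function names are made up, and the `#!/ts/` url layout (station id, then start time) is an assumption based on how those urls look.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan has no DST, so a fixed offset is enough


def parse_radiko_time(value):
    """Parse a YYYYMMDDHHMMSS string as an aware datetime in JST."""
    return datetime.strptime(value, "%Y%m%d%H%M%S").replace(tzinfo=JST)


def format_radiko_time(moment):
    """Format any aware datetime as YYYYMMDDHHMMSS in JST."""
    return moment.astimezone(JST).strftime("%Y%m%d%H%M%S")


def timefree_url(station, start):
    """Build a desktop-style timefree URL (assumed layout) from a start time."""
    return f"https://radiko.jp/#!/ts/{station}/{format_radiko_time(start)}"
```

keeping everything aware also sidesteps the "system time isn't japan time" class of problem, since comparisons against `datetime.now(timezone.utc)` just work.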

auth fails on newer yt-dlp!

[RadikoTimeFree] Authenticating: step 1
[debug] [RadikoTimeFree] please send a part of key
ERROR: Response.info() is deprecated, use Response.headers; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
[RadikoTimeFree] JP22: Authenticating: step 2
ERROR: [RadikoTimeFree] 20230729000000: Unable to download webpage: HTTP Error 401: Unauthorized (caused by <HTTPError 401: Unauthorized>); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/usr/local/bin/yt-dlp/yt_dlp/extractor/common.py", line 715, in extract
    ie_result = self._real_extract(url)
  File "/home/g/.config/yt-dlp/plugins/yt-dlp-rajiko/yt_dlp_plugins/extractor/radiko.py", line 809, in _real_extract
    auth_data = self._auth(region)
  File "/home/g/.config/yt-dlp/plugins/yt-dlp-rajiko/yt_dlp_plugins/extractor/radiko.py", line 492, in _auth
    token = self._negotiate_token(station_region)
  File "/home/g/.config/yt-dlp/plugins/yt-dlp-rajiko/yt_dlp_plugins/extractor/radiko.py", line 463, in _negotiate_token
    auth2 = self._download_webpage("https://radiko.jp/v2/api/auth2", station_region,
  File "/usr/local/bin/yt-dlp/yt_dlp/extractor/common.py", line 1118, in _download_webpage
    return self.__download_webpage(url_or_request, video_id, note, errnote, None, fatal, *args, **kwargs)
  File "/usr/local/bin/yt-dlp/yt_dlp/extractor/common.py", line 1069, in download_content
    res = getattr(self, download_handle.__name__)(url_or_request, video_id, **kwargs)
  File "/usr/local/bin/yt-dlp/yt_dlp/extractor/common.py", line 903, in _download_webpage_handle
    urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query, expected_status=expected_status)
  File "/usr/local/bin/yt-dlp/yt_dlp/extractor/common.py", line 860, in _request_webpage
    raise ExtractorError(errmsg, cause=err)

  File "/usr/local/bin/yt-dlp/yt_dlp/networking/_urllib.py", line 437, in _send
    res = opener.open(urllib_req, timeout=float(request.extensions.get('timeout') or self.timeout))
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 4060, in urlopen
    return self._request_director.send(req)
  File "/usr/local/bin/yt-dlp/yt_dlp/networking/common.py", line 90, in send
    response = handler.send(request)
  File "/usr/local/bin/yt-dlp/yt_dlp/networking/_helper.py", line 203, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/yt-dlp/yt_dlp/networking/common.py", line 301, in send
    return self._send(request)
  File "/usr/local/bin/yt-dlp/yt_dlp/networking/_urllib.py", line 442, in _send
    raise HTTPError(UrllibResponseAdapter(e.fp), redirect_loop='redirect error' in str(e)) from e
yt_dlp.networking.exceptions.HTTPError: HTTP Error 401: Unauthorized

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/yt-dlp/yt_dlp/extractor/common.py", line 847, in _request_webpage
    return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query))
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 4079, in urlopen
    raise _CompatHTTPError(e) from e
yt_dlp.networking.exceptions._CompatHTTPError: HTTP Error 401: Unauthorized
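
the first ERROR line in the log says `Response.info()` is deprecated in favour of `Response.headers`, and it appears during auth step 1. a plausible reading (not confirmed by the log itself) is that the auth1 response headers were being read via `.info()`, and the 401 at step 2 follows from that. a small compatibility helper like this would work with both old and new response objects. `response_headers` is a made-up name.

```python
def response_headers(response):
    """Return a response's headers, preferring the non-deprecated attribute.

    Falls back to .info() only for response objects that have no .headers.
    """
    headers = getattr(response, "headers", None)
    if headers is not None:
        return headers
    return response.info()
```

the auth code would then read the token via `response_headers(auth1_handle)` instead of calling `.info()` directly.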