cosmichorrordev / subwinder
License: GNU Affero General Public License v3.0
An ergonomic Python library for the opensubtitles.org API
License: GNU Affero General Public License v3.0
I believe timeouts are tested for a duration and that's pretty much it. It would be good to try to get examples of other API errors being returned as well, since I will likely make changes to how requesting is handled soon.
So this whole library is centered around being as ergonomic as possible. To stay true to this, classes' constructors should be the most common way to create them. Take, for example, `Media`, which will almost always be constructed using `.from_file("<file>")`. Instead, it would likely be better for the constructor to build the class from the file, and to have a `.from_parts(...)` classmethod to build it from each of the individual members. By doing this the same functionality is exposed, but the more common usage is now easier. This is a breaking change though, so it should be done before the next major release.
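A minimal sketch of what this could look like. Note that `_special_hash` and `_init_members` are hypothetical stand-ins for the real hashing and member setup, and the actual member names may differ:

```python
import os.path


def _special_hash(filepath):
    # Hypothetical stand-in for the real hash in subwinder.hashers
    raise NotImplementedError


class Media:
    def __init__(self, filepath):
        # The common case: derive every member from the file on disk
        hash = _special_hash(filepath)
        size = os.path.getsize(filepath)
        dirname, filename = os.path.split(filepath)
        self._init_members(hash, size, dirname, filename)

    @classmethod
    def from_parts(cls, hash, size, dirname=None, filename=None):
        # The less common case: build from the individual members directly
        media = cls.__new__(cls)
        media._init_members(hash, size, dirname, filename)
        return media

    def _init_members(self, hash, size, dirname, filename):
        self.hash = hash
        self.size = size
        self.dirname = dirname
        self.filename = filename
```

With this layout `Media("/path/to/file")` covers the common case while `Media.from_parts(...)` keeps the piecewise construction path available.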
Some people may want the option to directly get all the results returned by the `SearchSubtitles` and `GuessMovieFromString` endpoints, instead of having the results limited to a single value by `asw.search_subtitles(...)` and `asw.guess_media(...)`. This can easily be done by having an alternate function that orders the results to match the provided queries and returns them without any ranking. To prevent duplicate code, the ranked variant can just use the unranked function under the hood before applying the `ranking_func`.
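The layering could look something like this sketch, where `rank_results` and `default_rank` are hypothetical names for the pattern, not the library's actual functions:

```python
def rank_results(grouped_results, queries, ranking_func):
    # The unranked variant would return grouped_results directly, where
    # grouped_results[i] is the full result list for queries[i]; the
    # ranked variant just reduces each list to a single pick
    return [
        ranking_func(results, query)
        for results, query in zip(grouped_results, queries)
    ]


def default_rank(results, _query):
    # Stand-in ranking function: pick the highest-scored result, if any
    return max(results, key=lambda r: r["score"], default=None)
```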
Testing is done through pytest; there are some good examples in the `tests` directory. Make sure to mock out any API calls.
Currently this library only exposes some of the information from the API in English (such as the long language names); however, it would be best to expose other languages as well. This could be done with some mild tweaking to the `lang` module, as well as changing the endpoints to give the option to pass in other languages (or maybe passing it in to the whole `SubWinder` or `AuthSubWinder` class).
Trying to be as strict as possible, it would be nice to have a meaningful error returned from the library when `.report_movie(...)` is called with a `SearchResult` that wasn't matched by a hash. I'm not sure what type of response you get out of the API in that case, but it would be nice to prevent requesting the API incorrectly in the first place.
This is a tracking issue for finishing exposing all (maybe not all) of the API endpoints through this library. These endpoints are detailed here.
Implemented | Endpoint | Method | Notes
---|---|---|---
Yes | ServerInfo | `server_info` | Also exposes `daily_download_limit`
Yes | LogIn | `_login` | Happens automatically on entering `with`
Yes | LogOut | `_logout` | Happens automatically on exiting `with`
Yes | SearchSubtitles | `search_subtitles` | Works for searching from movie's hash and size, but not the other options. EDIT: it's now working for all the current desired search options
No | SearchToMail | | Could implement as a parameter option for `search_subtitles` like `search_to_mail=True`; this only takes movie's hash and size though, so it will be a limited search
Yes | CheckSubHash | `check_subtitles` | Might not be useful externally, so consider only exposing internally. Only use seems to be for uploading subtitles
Not planned | CheckMovieHash | | The search usage seems to be already covered by `search_subtitles` using the SearchSubtitles endpoint, and verifying whether a movie is included in the hash for inserting it seems to be covered by the CheckMovieHash2 endpoint
Not planned? | CheckMovieHash2 | | Possibly useful for uploading subtitles; however, SearchSubtitles may already cover this use
No | InsertMovieHash | | Useful to expose externally; would probably need its own method
No | TryUploadSubtitles | | Only expose internally to call before the UploadSubtitles endpoint
No | UploadSubtitles | | Should be external; there's a lot of work to get all the information used in this call, but most of that should be automated
No | DetectLanguage | | Will this be useful? Should it be exposed?
Yes | PreviewSubtitles | `preview_subtitles` | Very useful since it gives a preview of the subtitles without adding to the download count. EDIT: implemented in the `preview_subtitles` method
Yes | DownloadSubtitles | `download_subtitles` | Done pretty well, but should be stricter about verifying the encodings used, and should take more than just `SearchResult`s
Yes | ReportWrongMovieHash | `report_movie` |
No | ReportWrongImdbMovie | | Similar use to ReportWrongMovieHash and would be useful as well
Yes | GetSubLanguages | `lang` submodule | Also has the method `get_languages` on `SubWinder`, implemented in 4bd517a
Not planned | GetAvailableTranslations | | Apparently only supported for subdownloader and oscar
Not planned | GetTranslation | | Like above, only supported for subdownloader and oscar
No | SearchMoviesOnIMDB | | Apparently should not be used very often, and should only be needed for uploading subtitles. Try to only use internally to prevent people from using it too often
No | GetIMDBMovieDetails | | Same story as above; shouldn't be needed, so don't expose it (not planned)
No | InsertMovie | | Small use case, so potentially expose it, but low priority
Yes | SubtitlesVote | `vote` | Should be easy to implement; ensure that the rank is from 1 to 10. EDIT: implemented with the `vote` method
Yes | GetComments | `get_comments` | Would be nice to open up to more than `SearchResult`s when other endpoints are finished; just needs an `idsubtitle`
Yes | AddComment | `add_comment` | Needs to work with existing objects. EDIT: implemented with the `add_comment` method
No | AddRequest | | Would be useful eventually; not sure how often requests are fulfilled
No | SetSubscribeUrl | | Would be useful; may be bundled into subscribing to a hash?
No | SubscribeToHash | | Useful for waiting for subtitles; expose externally or through another method?
No | QuickSuggest | | Use this as a limit parameter of SuggestMovie? Returns different info and takes a language while SuggestMovie does not, though
Yes | SuggestMovie | `suggest_media` | Could be useful for getting the IMDb id instead of SearchMoviesOnIMDB, depending on how accurate it is. EDIT: implemented as `suggest_media`
Yes | GetUserInfo | `user_info` | Returns information about the logged-in user
Yes | GuessMovieFromString | `guess_media` | Returns `BestGuess`
Yes | AutoUpdate | `auto_update` | Useful for programs that use this library, so would be good to implement. EDIT: implemented with the `auto_update` method
Yes | NoOperation | `ping` | Used for keeping the user logged in if desired
subwinder
version: 1.0.1
This isn't a bug per se, but `.report_movie(...)` is a deceptive name since it's not reporting the movie, only the search result that matched for the movie. It would make a lot more sense for it to be something like `.report_search_result(...)` or `.report_subtitles(...)`, to the point that it should likely be renamed to avoid confusion in the future.
N/A
Set up GitHub Actions to automatically check that the library passes:

- `flake8` linting
- `black` formatting
- `pytest` test suite

Keeping the examples stable seems to be a relatively common problem from how quickly I normally change them, so it would be good to test all of them. This might not be too hard if I can import the example into the test and mock out `_request.request` with all the expected calls. Some of them assume a certain setup with hardcoded values that may need to be changed though.
Some people seem to like the ability to download the release assets from the release page, so it would be a nice feature to automatically build and upload the release assets to the release page when a release tag is pushed to the repo.
Just to ensure everything is well tested it would be good to see how the API behaves when the daily download limit is hit. From what I've seen it looks like the remaining downloads value is updated rather slowly so this could cause problems if the download limit kicks in before the value is updated. It would also just be good to see what value is returned when the download limit is hit regardless.
Currently, all the ranking functions take the raw response instead of some sort of object defined in the library. This has come up as a slight pain point when working on #34. It would be nice to have these ranking functions take the `MediaInfo` flavor of objects in some format similar to what is returned. To move forward, the path would likely be:
Again, working on an example has helped me get a better idea of a couple of pain points in the library. I realized that one common pattern may be saving the information for some downloaded `SearchResult` somewhere, since you want to retain the ability to vote/comment/report them after actually being able to use them. However, the only thing that should be required for these methods is `SubtitlesInfo`. It would still be nice to accept a `SearchResult`, but the methods should be modified to take `SubtitlesInfo` objects as well.
So this will likely apply to people who are creating interactive interfaces rather than scripts/extensions/plugins that run in a quick session, but it would be nice to have an option to automatically re-authenticate with the API after a timeout (15 minutes). This could be a param on `AuthSubwinder` where, if credentials are passed in as parameters, they are stored in some way or another, and nothing special has to be done for env vars. I think the easiest solution is to keep track of when the last request was and re-auth if it's over the 15-minute limit. The more robust way would be to try the request, automatically re-auth if the API throws an auth error, then retry the request. If there are consistently auth errors then the request should fail and bubble up the error instead of retrying forever.
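The timestamp-based approach could be sketched like this, assuming hypothetical `_login()` and `_send()` internals that every call funnels through (these are not the library's actual method names):

```python
import time

SESSION_TIMEOUT = 15 * 60  # seconds; the API logs sessions out after 15 minutes


class ReauthMixin:
    def __init__(self):
        self._last_request = time.monotonic()

    def _login(self):
        # Would re-send stored credentials to the LogIn endpoint
        raise NotImplementedError

    def _send(self, method, *params):
        # Would perform the actual XML-RPC call
        raise NotImplementedError

    def _request(self, method, *params):
        now = time.monotonic()
        if now - self._last_request > SESSION_TIMEOUT:
            # The session has likely expired, so re-authenticate first
            self._login()
        self._last_request = now
        return self._send(method, *params)
```

The more robust retry-on-auth-error variant would wrap `_send` in a try/except instead of watching the clock.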
This library is a bit light on documentation. Now that it's getting closer to having a polished release it would be good to finish up some of the documentation which leaves:
There are two main problems I have with using the wiki for the library documentation (and with GitHub wikis in general). First, people are not able to open pull requests against the wiki if they find something wrong, which raises the barrier to entry for community-driven documentation, so all changes are up to the repo collaborators. Second, the wiki is not tracked in-tree with the rest of the repo, which means that if a release involves changes to the wiki, it's hard for people to see the wiki from previous releases.
There are a couple of possible band-aids that may improve this situation a bit.
So there's plenty of good documentation in the repo, but basically no doc comments in the library itself, which can be painful since doc comments are what normally pop up from tools like `jedi`. It will involve duplicating some of the documentation, but it would definitely be helpful to include documentation within the library itself.
So working through adding another example I ended up with this snippet:

```python
from subwinder import AuthSubwinder, Media
from itertools import repeat

LANG = "en"
media = [Media("/path/to/file1"), Media("/path/to/file2")]

with AuthSubwinder() as asw:
    asw.search_subtitles(list(zip(media, repeat(LANG))))
```

It would be nice to directly accept the `zip` instead of having to expand it to a `list` before passing it in.
After thinking about some of the more complex examples I could add (#39), I've come to realize that it would actually be nice to expose the default ranking functions in the public API. I originally hid them because I thought there wouldn't be any reason to rely on them: either you would use the default ranking function or run your own custom one. However, I now realize that exposing the default allows a custom ranking function to cleanly fall back on the default if it doesn't find any matches, which gives a suitable fallback without adding a lot of complexity.
opensubtitles.org has a 24-hour download limit for most users. This could lead to problems where this library tries to download over the limit and the request fails. To prevent this, the library should check the number of downloads remaining before downloading, and outright fail if the requested amount exceeds the amount remaining. This prevents a request from downloading some amount and then failing without explicitly providing context on how many succeeded and how many failed.
To make this nice for the library user, it would be good to have an obvious method that returns the download limits and the amount remaining, so that the library user can chunk requests if the requested downloads go over the limit. This information is currently tucked away in the `server_info` method under the keys `'global_24h_download_limit'`, `'client_24h_download_count'`, `'client_download_quota'`, and `'client_24h_download_limit'`.
```python
with AuthSubWinder(...) as asw:
    # Do things to get the SearchResults to download as `results`
    daily_remaining = asw.download_limits().remaining_24h
    download_paths = asw.download(results[:daily_remaining])
    if daily_remaining < len(results):
        # Do something to keep track of the ones still needing to be downloaded
        downloaded = results[:daily_remaining]
        pending_download = results[daily_remaining:]
```
There were changes to make `dirname` and `filename` in `Media` and `MediaInfo` private members with getters and setters. However, some of the doc examples still use `dirname` and `filename` directly, which is incorrect.
Currently `request` uses a string for the `method` used to call the endpoint. This is error-prone: a typo in the string causes a failure. Furthermore, most editors can autocomplete enum options, which gives a better dev experience. Beyond that, using a string can cause methods to not get tested accurately, since with the call mocked out an error would never be raised for a mistyped method, whereas an enum forces a consistent method string.
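A sketch of what the enum could look like; the names are assumptions and only a few endpoints are shown:

```python
from enum import Enum


class Endpoints(Enum):
    SERVER_INFO = "ServerInfo"
    LOG_IN = "LogIn"
    SEARCH_SUBTITLES = "SearchSubtitles"
    NO_OPERATION = "NoOperation"


def request(endpoint, *params):
    # A typo like `Endpoints.SERVR_INFO` now fails at attribute lookup
    # instead of silently sending a bad method string over the wire
    if not isinstance(endpoint, Endpoints):
        raise TypeError("endpoint must be an Endpoints member")
    return endpoint.value  # stand-in for the real XML-RPC dispatch
```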
Currently `search_subtitles` only returns one result for each query, but it could be nice to have multiple results for each query. This might not be required since the library user can specify a custom `ranking_function`.
It would be nice to try out `typing`'s type hints. They're not strictly enforced, but they give a good reference for what things should be, which should be nice both for developing on and for using the library.
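For example, a hinted signature for something like `search_subtitles` might read as follows; the exact types are assumptions:

```python
from typing import List, Optional, Tuple


# "Media" and "SearchResult" are forward references standing in for the
# library's classes; the hints document the expected shapes without
# enforcing anything at runtime
def search_subtitles(
    queries: List[Tuple["Media", str]],
) -> List[Optional["SearchResult"]]:
    ...
```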
Documentation is missing for several of the helper functions, including `subwinder.hashers.special_hash`, `subwinder.hashers.md5_hash`, and `subwinder.utils.extract`.
It would be very convenient if pushing a release tag to the repo automatically published a new build to PyPI. There are some good examples to go off of for this, so hopefully it doesn't take too long to accomplish.
This will make releases for even small changes, like bumping dependencies, nearly painless.
This can be done with classmethods, as detailed in this response. Ideally the default class will just take the info with an optional `filepath` in `Movie`, and then there would be a classmethod `.from_file(<filepath>)` to construct it from a file.
While having good documentation is nice, it can still be hard to figure out how to piece all of it together into an actual program. Adding some common examples to demonstrate this (along with some more niche things you can do with the library) could help get people started using the library. This is doubly true with how much this library is starting to differ from the official API reference.
To reduce friction from tests failing on linting/formatting/testing, it would be nice to include a checklist of what needs to be run for a pull request to succeed, just to ensure people are aware of the standards enforced in this library.
subwinder
version: master
Importing different things using `from` pulls them into the current namespace, which can in turn accidentally leak private portions of the library into the public interface. Fixing this should just involve switching things like `from subwinder.foo import bar` to `import subwinder.foo.bar as bar`, I think.
The API currently exposes everything through a mix of 2-letter (ISO 639-1) and 3-letter (ISO 639-2) language codes. Ideally, all the parameters used in this wrapper should be exclusively 2-letter codes, to standardize. Ensure that this is the case.
Currently there is no logging whatsoever in the library. This can be painful, since I'm sure some people would like visibility into the inner workings of the API to help troubleshoot. A good start would be to log API requests, raise warnings around any abnormal responses, and log any errors.
I have no past experience in setting up logging for a library, so it will take some time to figure out best practices and how it should be exposed to the user. Ideally, log preferences should be set by library users, and users should be able to isolate what log messages come from the library and enable/disable them as desired.
Currently the library uses a hardcoded language list. This is nice for saving on requests, but it is also quite limiting. To keep the list dynamically updated, it would be nice to have a single language handler that only fetches the language list when needed and stores it, since the language list is very rarely updated. It should refetch periodically (hourly?) so that long-running processes still function correctly. This can live outside of the base and auth subwinders (it could use the base subwinder to get the language list). The syntax could be something like:

```python
from lang import convert, formats

convert("en", from_format=formats.lang_2, to_format=formats.lang_3)
```

`formats` is an enum with the `lang_2`, `lang_3`, and `long` options.
After thinking over it for a while, I think this library is pretty stable with the functionality it already possesses, so I went ahead and pushed the changes to PyPI, since quite a bit was updated from the 0.4.0 release. To mark this, it would be good to add installation instructions to the README, remove the warning, and publish some brief release notes covering the direction for future work and what is currently implemented.
subwinder
version: master
The tests currently don't work in isolation. This is due to language-list caching, which isn't handled in the same manner for all tests; because of this, some tests depend on earlier tests faking the language list. Setting up an `autouse` fixture might help here for the default case, with another fixture to disable the language-list cache, since that is the least common case.
Currently, CI testing only runs on Ubuntu latest. To ensure everything works fine on other platforms it would be good to add other targets to CI testing.
subwinder
version: current masterCurrently if there's a server error then it gets raised as a protocol error in the xmlrpc request. Currently this will crash the program, but this is too inconsistent to be tested well. Ideally catch the error and set a custom status message to make it look like any other response.
Too inconsistent to reproduce.
subwinder
version: current master
(131e210)

So I've seen this go a couple of ways so far. On an empty string (`""`), `GuessMovie` returns an empty list, resulting in a `TypeError` when a value is looked up by key. The other is a string that just doesn't return any matches (`"asjdlfkjweifadsfjk"`); in this case it returns a pretty bare `dict` (see below). Both of these affect both the ranked and unranked variants. I suppose the best option from here would be to return `None` for either one, since there isn't enough information for a `GuessMediaResult` or any `MediaInfo` derivatives. The other option would be for the unranked variant to return a `GuessMediaResult` that has no information (full of empty lists and `None`), which might be better since callers will likely be expecting a `GuessMediaResult` and should be able to handle a lack of information.
```json
{
    "asjdlfkjweifadsfjk": {
        "GuessIt": {
            "title": "asjdlfkjweifadsfjk",
            "type": "movie"
        },
        "GetIMDBSuggest": []
    }
}
```
```python
from subwinder import AuthSubwinder

with AuthSubwinder() as asw:
    asw.guess_media([""])
    asw.guess_media(["asjdlfkjweifadsfjk"])
```
subwinder
version: current master
Since we're about to allow some breaking changes with the next major release, it would make sense to hide `ranking` and `request` externally. Currently, there should be no use for either of them to be public, and if anything they will just pollute the import options in peoples' editors. `ranking` is already documented, and `request`'s only function is incorrectly private. Beyond that, `_force_path` in `utils` is also incorrectly private, and it would make sense to have an `_internal_utils` module to address this issue.
So even though we're all consenting adults here: because `filename` and `dirname` in `MediaInfo` and `Media` should always be paths, it would be good to make them private and force everything through getters and setters, no matter how much I loathe that style. While this is being done, it would be nice to add a `.get_filepath(...)` as well; this will need to raise an error if `filename` or `dirname` is `None`, since a valid path can't be generated then. Kind of a small papercut, but just make sure it's documented.
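A sketch of the private-members-plus-properties shape, including the error from `get_filepath()`. `MediaInfoSketch` is a hypothetical stand-in mirroring the proposal, not the library's actual class:

```python
from pathlib import Path


class MediaInfoSketch:
    def __init__(self, dirname=None, filename=None):
        self._dirname = dirname
        self._filename = filename

    @property
    def dirname(self):
        return self._dirname

    @dirname.setter
    def dirname(self, value):
        self._dirname = value

    @property
    def filename(self):
        return self._filename

    @filename.setter
    def filename(self, value):
        self._filename = value

    def get_filepath(self):
        # A valid path can't be generated from missing pieces
        if self._dirname is None or self._filename is None:
            raise ValueError("both dirname and filename must be set")
        return Path(self._dirname) / self._filename
```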
I started using pathlib
in other projects and I really favor it for not just keeping all paths as strings, and for dealing with paths in a more ergonomic/readable. This should be a pretty painless change: however, because the stored information will be exposed through different objects this will be considered a breaking change.
It would be nice to include a reminder to run linters/formatters/unit testing along with having a checklist that documentation was updated and note that changes will be assumed to be licensed under AGPLv3 since the project is.
Switch from `type`-checking objects to using `isinstance`, to allow users to inherit from the base classes and provide extra functionality while still being able to supply these objects to methods that require specific types.
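The difference in one small sketch; the class names are placeholders, not the library's:

```python
class SearchResultSketch:
    pass


class CustomResult(SearchResultSketch):
    """A user subclass that adds extra functionality."""


def accepts_result(obj):
    # isinstance accepts subclasses, so user-derived classes still work;
    # `type(obj) is SearchResultSketch` would reject CustomResult
    if not isinstance(obj, SearchResultSketch):
        raise TypeError("expected a SearchResultSketch")
    return True
```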
None of the classes implement `__repr__`, which can make debugging moderately harder. Add descriptive `__repr__` methods so that printing the different classes is more informative.
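A sketch of the conventional shape (the class and members here are illustrative, not the library's):

```python
class MediaSketch:
    def __init__(self, hash, size):
        self.hash = hash
        self.size = size

    def __repr__(self):
        # Mirrors the eval-able convention used by many stdlib classes
        return f"{type(self).__name__}(hash={self.hash!r}, size={self.size!r})"
```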
Currently, there can be a problem when the author specified the wrong encoding on subtitles when uploading, causing `utils.extract(...)` (which is used by `.download_subtitles(...)` and `.preview_subtitles(...)`) to fail. However, only `.preview_subtitles(...)` needs to extract the string, while `.download_subtitles(...)` can just directly write the bytes to a file.
For this reason, it makes sense to switch `extract` to return bytes and decode only within `.preview_subtitles(...)`. It would also be nice to try to provide a fallback for a failing encoding. Right now I've only seen `UnicodeDecodeError`, but it would be nice to have a common error to look for.
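A sketch of the split, assuming the API payload is base64-encoded gzip (as the current `extract` handles) and using latin-1 as a never-fails fallback; both function names here are illustrative:

```python
import base64
import gzip


def extract(b64_gzipped):
    # Return raw bytes; .download_subtitles(...) can write these directly,
    # while .preview_subtitles(...) decodes them itself
    return gzip.decompress(base64.b64decode(b64_gzipped))


def decode_with_fallback(data, encoding, fallback="latin-1"):
    # Try the author-specified encoding first, then a permissive fallback
    # (latin-1 maps every byte, so this can't raise UnicodeDecodeError)
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data.decode(fallback)
```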
The API docs mention, for security, to use HTTPS and/or send the password as its MD5 hash. I've heard mixed things on sending the hash versus the password, but there's no reason not to, since it avoids sending out the plaintext password in case the connection gets forced back to HTTP somehow.
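Hashing the password before login is a one-liner with the standard library (the function name is illustrative):

```python
import hashlib


def hash_password(password):
    # Send the hex MD5 digest in place of the plaintext, so a downgrade to
    # plain HTTP never exposes the password itself
    return hashlib.md5(password.encode("utf-8")).hexdigest()
```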
Currently `MediaInfo` and `SearchResult` optionally take `dirname` and `filename` in `.from_data(...)`. Aside from not being used much, this isn't really required in most cases and can just be covered by calling `.set_dirname(...)` and `.set_filename(...)` instead, so my vote would be to remove this to make the behavior more consistent.
Currently, the library throws an error in `asw.download_subtitles(...)` if `download_dir` isn't set and there is no original directory information associated with the `SearchResult`. It might make sense to fall back to the current working directory for the download, since that behavior is used by several of the objects and functions from the standard library that write to files.