cosmichorrordev / subwinder
License: GNU Affero General Public License v3.0
An ergonomic Python library for the opensubtitles.org API
License: GNU Affero General Public License v3.0
I believe timeouts are tested for a duration and that's pretty much it. It would be good to try to get examples of other API errors being returned as well, since I will likely make changes to how requesting is handled soon.
So this whole library is centered around being as ergonomic as possible. To stay true to this, classes' constructors should be the most common way to create them. Take, for example, `Media`, which will almost always be constructed using `.from_file("<file>")`. Instead, it would likely be better for the constructor to build the class from the file, and to have a `.from_parts(...)` classmethod to build it from each of the individual members. By doing this the same functionality is exposed, but the more common usage is now easier. This is a breaking change though, so it should be done before the next major release.
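A minimal sketch of what this could look like. Note that `_special_hash` and `_init_members` are hypothetical stand-ins for the real hashing and member setup, and the actual member names may differ:

```python
import os.path


def _special_hash(filepath):
    # Hypothetical stand-in for the real hash in subwinder.hashers
    raise NotImplementedError


class Media:
    def __init__(self, filepath):
        # The common case: derive every member from the file on disk
        hash = _special_hash(filepath)
        size = os.path.getsize(filepath)
        dirname, filename = os.path.split(filepath)
        self._init_members(hash, size, dirname, filename)

    @classmethod
    def from_parts(cls, hash, size, dirname=None, filename=None):
        # The less common case: build from the individual members directly
        media = cls.__new__(cls)
        media._init_members(hash, size, dirname, filename)
        return media

    def _init_members(self, hash, size, dirname, filename):
        self.hash = hash
        self.size = size
        self.dirname = dirname
        self.filename = filename
```

With this layout `Media("/path/to/file")` covers the common case while `Media.from_parts(...)` keeps the piecewise construction path available.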
Some people may want the option to directly get all the results returned by the `SearchSubtitles` and `GuessMovieFromString` endpoints, instead of having the results limited to a single value by `asw.search_subtitles(...)` and `asw.guess_media(...)`. This can easily be done by having an alternate function that orders the results to match the provided queries and returns them without any ranking. To prevent duplicate code, the ranked variant can just use the unranked function under the hood before applying the `ranking_func`.
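The layering could look something like this sketch, where `rank_results` and `default_rank` are hypothetical names for the pattern, not the library's actual functions:

```python
def rank_results(grouped_results, queries, ranking_func):
    # The unranked variant would return grouped_results directly, where
    # grouped_results[i] is the full result list for queries[i]; the
    # ranked variant just reduces each list to a single pick
    return [
        ranking_func(results, query)
        for results, query in zip(grouped_results, queries)
    ]


def default_rank(results, _query):
    # Stand-in ranking function: pick the highest-scored result, if any
    return max(results, key=lambda r: r["score"], default=None)
```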
Testing is done through pytest; there are some good examples in the `tests` directory. Make sure to mock out any API calls.
Currently this library only exposes some of the information from the API in English (such as the long language names); however, it would be best to expose other languages as well. This could be done with some mild tweaking to the `lang` module, as well as changing the endpoints to give the option to pass in other languages (or maybe passing it in to the whole `SubWinder` or `AuthSubWinder` class).
Trying to be as strict as possible, it would be nice to have a meaningful error returned from the library when `.report_movie(...)` is called with a `SearchResult` that wasn't matched by a hash. I'm not sure what type of response you get out of the API in that case, but it would be nice to prevent requesting the API incorrectly in the first place.
This is a tracking issue for finishing exposing all (maybe not all) of the API endpoints through this library. These endpoints are detailed here.
Implemented | Endpoint | Method | Notes
---|---|---|---
Yes | ServerInfo | `server_info` | Also exposes `daily_download_limit`
Yes | LogIn | `_login` | Happens automatically on entering `with`
Yes | LogOut | `_logout` | Happens automatically on exiting `with`
Yes | SearchSubtitles | `search_subtitles` | Works for searching from movie's hash and size, but not the other options. EDIT: it's now working for all the current desired search options
No | SearchToMail | | Could implement as a parameter option for `search_subtitles` like `search_to_mail=True`; this only takes movie's hash and size though, so it will be a limited search
Yes | CheckSubHash | `check_subtitles` | Might not be useful externally, so consider only exposing internally. Only use seems to be for uploading subtitles
Not planned | CheckMovieHash | | The search usage seems to be already covered by `search_subtitles` using the SearchSubtitles endpoint, and verifying whether a movie is included in the hash for inserting it seems to be covered by the CheckMovieHash2 endpoint
Not planned? | CheckMovieHash2 | | Possibly useful for uploading subtitles; however, SearchSubtitles may already cover this use
No | InsertMovieHash | | Useful to expose externally; would probably need its own method
No | TryUploadSubtitles | | Only expose internally to call before the UploadSubtitles endpoint
No | UploadSubtitles | | Should be external; there's a lot of work to get all the information used in this call, but most of that should be automated
No | DetectLanguage | | Will this be useful? Should it be exposed?
Yes | PreviewSubtitles | `preview_subtitles` | Very useful since it gives a preview of the subtitles without adding to the download count. EDIT: implemented in the `preview_subtitles` method
Yes | DownloadSubtitles | `download_subtitles` | Done pretty well, but should be stricter about verifying the encodings used, and should take more than just `SearchResult`s
Yes | ReportWrongMovieHash | `report_movie` |
No | ReportWrongImdbMovie | | Similar use to ReportWrongMovieHash and would be useful as well
Yes | GetSubLanguages | `lang` submodule | Also has the method `get_languages` on `SubWinder`, implemented in 4bd517a
Not planned | GetAvailableTranslations | | Apparently only supported for subdownloader and oscar
Not planned | GetTranslation | | Like above, only supported for subdownloader and oscar
No | SearchMoviesOnIMDB | | Apparently should not be used very often, and should only be needed for uploading subtitles. Try to only use internally to prevent people from using it too often
No | GetIMDBMovieDetails | | Same story as above; shouldn't be needed, so don't expose it (not planned)
No | InsertMovie | | Small use case, so potentially expose it, but low priority
Yes | SubtitlesVote | `vote` | Should be easy to implement; ensure that the rank is from 1 to 10. EDIT: implemented with the `vote` method
Yes | GetComments | `get_comments` | Would be nice to open up to more than `SearchResult`s when other endpoints are finished; just needs an `idsubtitle`
Yes | AddComment | `add_comment` | Needs to work with existing objects. EDIT: implemented with the `add_comment` method
No | AddRequest | | Would be useful eventually; not sure how often requests are fulfilled
No | SetSubscribeUrl | | Would be useful; may be bundled into subscribing to a hash?
No | SubscribeToHash | | Useful for waiting for subtitles; expose externally or through another method?
No | QuickSuggest | | Use this as a limit parameter of SuggestMovie? Returns different info and takes a language while SuggestMovie does not, though
Yes | SuggestMovie | `suggest_media` | Could be useful for getting the IMDb id instead of SearchMoviesOnIMDB, depending on how accurate it is. EDIT: implemented as `suggest_media`
Yes | GetUserInfo | `user_info` | Returns information about the logged-in user
Yes | GuessMovieFromString | `guess_media` | Returns `BestGuess`
Yes | AutoUpdate | `auto_update` | Useful for programs that use this library, so would be good to implement. EDIT: implemented with the `auto_update` method
Yes | NoOperation | `ping` | Used for keeping the user logged in if desired
subwinder
version: 1.0.1
This isn't a bug per se, but `.report_movie(...)` is a deceptive name since it's not reporting the movie, only the search result that matched for the movie. It would make a lot more sense for it to be something like `.report_search_result(...)` or `.report_subtitles(...)`, to the point that it should likely be renamed to avoid confusion in the future.
N/A
Set up GitHub Actions to automatically check that the library passes:

- `flake8` linting
- `black` formatting
- `pytest` test suite

Keeping the examples stable seems to be a relatively common problem from how quickly I normally change them, so it would be good to test all of them. This might not be too hard if I can import the example into the test and mock out `_request.request` with all the expected calls. Some of them assume a certain setup with hardcoded values that may need to be changed though.
Some people seem to like the ability to download the release assets from the release page, so it would be a nice feature to automatically build and upload the release assets to the release page when a release tag is pushed to the repo.
Just to ensure everything is well tested it would be good to see how the API behaves when the daily download limit is hit. From what I've seen it looks like the remaining downloads value is updated rather slowly so this could cause problems if the download limit kicks in before the value is updated. It would also just be good to see what value is returned when the download limit is hit regardless.
Currently, all the ranking functions take the raw response instead of some sort of object defined in the library. This has come up as a slight pain point when working on #34. It would be nice to have these ranking functions take the `MediaInfo` flavor of objects in some format similar to what is returned. To move forward, the path would likely be:
Again, working on an example has helped me get a better idea of a couple of pain points in the library. I realized that one common pattern may be saving the information for some downloaded `SearchResult` somewhere, since you want to retain the ability to vote/comment/report them after actually being able to use them. However, the only thing that should be required for these methods is `SubtitlesInfo`. It would still be nice to accept a `SearchResult`, but the methods should be modified to take `SubtitlesInfo` objects as well.
So this will likely apply to people who are creating interactive interfaces rather than scripts/extensions/plugins that run in a quick session, but it would be nice to have an option to automatically re-authenticate with the API after a timeout (15 minutes). This could be a param on `AuthSubwinder` where, if credentials are passed in as parameters, they are stored in some way or another, and nothing special has to be done for env vars. I think the easiest solution is to keep track of when the last request was and re-auth if it's over the 15-minute limit. The more robust way would be to try the request, automatically re-auth if the API throws an auth error, then retry the request. If there are consistently auth errors then the request should fail and bubble up the error instead of retrying forever.
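The timestamp-based approach could be sketched like this, assuming hypothetical `_login()` and `_send()` internals that every call funnels through (these are not the library's actual method names):

```python
import time

SESSION_TIMEOUT = 15 * 60  # seconds; the API logs sessions out after 15 minutes


class ReauthMixin:
    def __init__(self):
        self._last_request = time.monotonic()

    def _login(self):
        # Would re-send stored credentials to the LogIn endpoint
        raise NotImplementedError

    def _send(self, method, *params):
        # Would perform the actual XML-RPC call
        raise NotImplementedError

    def _request(self, method, *params):
        now = time.monotonic()
        if now - self._last_request > SESSION_TIMEOUT:
            # The session has likely expired, so re-authenticate first
            self._login()
        self._last_request = now
        return self._send(method, *params)
```

The more robust retry-on-auth-error variant would wrap `_send` in a try/except instead of watching the clock.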
This library is a bit light on documentation. Now that it's getting closer to having a polished release it would be good to finish up some of the documentation which leaves:
There are two main problems I have with using the wiki for the library documentation (and with GitHub wikis in general). First, people are not able to open pull requests against the wiki if they find something wrong, which raises the barrier to entry for community-driven documentation, so all changes are up to the repo collaborators. Second, the wiki is not tracked in-tree with the rest of the repo, which means that if a release involves changes to the wiki, it's hard for people to see the wiki from previous releases.
There are a couple of possible band-aids that may improve this situation a bit.
So there's plenty of good documentation in the repo, but basically no doc comments in the library itself, which can be painful since doc comments are what normally pop up from tools like `jedi`. It will involve duplicating some of the documentation, but it would definitely be helpful to include documentation within the library itself.
So working through adding another example I ended up with this snippet:

```python
from subwinder import AuthSubwinder, Media
from itertools import repeat

LANG = "en"
media = [Media("/path/to/file1"), Media("/path/to/file2")]

with AuthSubwinder() as asw:
    asw.search_subtitles(list(zip(media, repeat(LANG))))
```

It would be nice to directly accept the `zip` instead of having to expand it to a `list` before passing it in.
After thinking about some of the more complex examples I could add (#39), I've come to realize that it would actually be nice to expose the default ranking functions in the public API. I originally hid them because I thought there wouldn't be any reason to rely on them: either you would use the default ranking function or run your own custom one. However, I now realize that exposing the default allows a custom ranking function to cleanly fall back on the default if it doesn't find any matches, which gives a suitable fallback without adding a lot of complexity.
opensubtitles.org has a 24-hour download limit for most users. This could lead to problems where this library tries to download over the limit and the request fails. To prevent this, the library should check the number of downloads remaining before downloading, and outright fail if the requested amount exceeds the amount remaining. This prevents a request from downloading some amount and then failing without explicitly providing context on how many succeeded and how many failed.
To make this nice for the library user, it would be good to have an obvious method that returns the download limits and the amount remaining, so that the library user can chunk requests if the requested downloads go over the limit. This information is currently tucked away in the `server_info` method under the keys `'global_24h_download_limit'`, `'client_24h_download_count'`, `'client_download_quota'`, and `'client_24h_download_limit'`.
```python
with AuthSubWinder(...) as asw:
    # Do things to get the SearchResults to download as `results`
    daily_remaining = asw.download_limits().remaining_24h
    download_paths = asw.download(results[:daily_remaining])
    if daily_remaining < len(results):
        # Do something to keep track of the ones still needing to be downloaded
        downloaded = results[:daily_remaining]
        pending_download = results[daily_remaining:]
```
There were changes to make `dirname` and `filename` in `Media` and `MediaInfo` private members with getters and setters. However, some of the doc examples still use `dirname` and `filename` directly, which is incorrect.
Currently `request` uses a string for the `method` used to call the endpoint. This is error-prone: a typo in the string causes a failure. Furthermore, most editors can autocomplete enum options, which gives a better dev experience. Beyond that, using a string can cause methods to not get tested accurately, since with the call mocked out an error would never be raised for a mistyped method, whereas an enum forces a consistent method string.
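A sketch of what the enum could look like; the names are assumptions and only a few endpoints are shown:

```python
from enum import Enum


class Endpoints(Enum):
    SERVER_INFO = "ServerInfo"
    LOG_IN = "LogIn"
    SEARCH_SUBTITLES = "SearchSubtitles"
    NO_OPERATION = "NoOperation"


def request(endpoint, *params):
    # A typo like `Endpoints.SERVR_INFO` now fails at attribute lookup
    # instead of silently sending a bad method string over the wire
    if not isinstance(endpoint, Endpoints):
        raise TypeError("endpoint must be an Endpoints member")
    return endpoint.value  # stand-in for the real XML-RPC dispatch
```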
Currently `search_subtitles` only returns one result for each query, but it could be nice to have multiple results for each query. This might not be required since the library user can specify a custom `ranking_function`.
It would be nice to try out `typing`'s type hints. They're not strictly enforced, but they give a good reference for what things should be, which should be nice both for developing on and for using the library.
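For example, a hinted signature for something like `search_subtitles` might read as follows; the exact types are assumptions:

```python
from typing import List, Optional, Tuple


# "Media" and "SearchResult" are forward references standing in for the
# library's classes; the hints document the expected shapes without
# enforcing anything at runtime
def search_subtitles(
    queries: List[Tuple["Media", str]],
) -> List[Optional["SearchResult"]]:
    ...
```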
Documentation is missing for several of the helper functions, including `subwinder.hashers.special_hash`, `subwinder.hashers.md5_hash`, and `subwinder.utils.extract`.
It would be very convenient if pushing a release tag to the repo automatically published a new build to PyPI. There are some good examples to go off of for this, so hopefully it doesn't take too long to accomplish.
This will make releases for even small changes, like bumping dependencies, nearly painless.
This can be done with classmethods, as detailed in this response. Ideally the default class will just take the info with an optional `filepath` in `Movie`, and then there would be a classmethod `.from_file(<filepath>)` to construct it from a file.
While having good documentation is nice, it can still be hard to figure out how to piece all of it together into an actual program. Adding some common examples to demonstrate this (along with some more niche things you can do with the library) could help get people started using the library. This is doubly true with how much this library is starting to differ from the official API reference.
To reduce friction from tests failing on linting/formatting/testing, it would be nice to include a checklist of what needs to be run for a pull request to succeed, just to ensure people are aware of the standards enforced in this library.
subwinder
version: master
Importing different things using `from` pulls them into the current namespace, which can in turn accidentally leak private portions of the library into the public interface. Fixing this should just involve switching things like `from subwinder.foo import bar` to `import subwinder.foo.bar as bar`, I think.
The API currently exposes everything through a mix of 2-letter (ISO 639-1) and 3-letter (ISO 639-2) language codes. Ideally, all the parameters used in this wrapper should be exclusively 2-letter codes, to standardize. Ensure that this is the case.
Currently there is no logging whatsoever in the library. This can be painful, since I'm sure some people would like visibility into the inner workings of the API to help troubleshoot. A good start would be to log API requests, raise warnings around any abnormal responses, and log any errors.
I have no past experience in setting up logging for a library, so it will take some time to figure out best practices and how it should be exposed to the user. Ideally, log preferences should be set by library users, and users should be able to isolate what log messages come from the library and enable/disable them as desired.
Currently the library uses a hardcoded language list. This is nice for saving on requests, but it is also quite limiting. To keep the list dynamically updated, it would be nice to have a single language handler that only fetches the language list when needed and stores it, since the language list is very rarely updated. It should refetch periodically (hourly?) so that long-running processes still function correctly. This can live outside of the base and auth subwinders (it could use the base subwinder to get the language list). The syntax could be something like:

```python
from lang import convert, formats

convert("en", from_format=formats.lang_2, to_format=formats.lang_3)
```

`formats` is an enum with the `lang_2`, `lang_3`, and `long` options.
After thinking over it for a while, I think this library is pretty stable with the functionality it already possesses, so I went ahead and pushed the changes to PyPI, since quite a bit was updated from the 0.4.0 release. To mark this, it would be good to add installation instructions to the README, remove the warning, and publish some brief release notes covering the direction for future work and what is currently implemented.
subwinder
version: master
The tests currently don't work in isolation. This is due to language-list caching, which isn't handled in the same manner for all tests; because of this, some tests depend on earlier tests faking the language list. Setting up an `autouse` fixture might help here for the default case, with another fixture to disable the language-list cache, since that is the least common case.
Currently, CI testing only runs on Ubuntu latest. To ensure everything works fine on other platforms it would be good to add other targets to CI testing.
subwinder
version: current masterCurrently if there's a server error then it gets raised as a protocol error in the xmlrpc request. Currently this will crash the program, but this is too inconsistent to be tested well. Ideally catch the error and set a custom status message to make it look like any other response.
Too inconsistent to reproduce.
subwinder
version: current master
(131e210)

So I've seen this go a couple of ways so far. On an empty string (`""`), `GuessMovie` returns an empty list, resulting in a `TypeError` when a value is looked up by key. The other is a string that just doesn't return any matches (`"asjdlfkjweifadsfjk"`); in this case it returns a pretty bare `dict` (see below). Both of these affect both the ranked and unranked variants. I suppose the best option from here would be to return `None` for either one, since there isn't enough information for a `GuessMediaResult` or any `MediaInfo` derivatives. The other option would be for the unranked variant to return a `GuessMediaResult` that has no information (full of empty lists and `None`), which might be better since callers will likely be expecting a `GuessMediaResult` and should be able to handle a lack of information.
```json
{
    "asjdlfkjweifadsfjk": {
        "GuessIt": {
            "title": "asjdlfkjweifadsfjk",
            "type": "movie"
        },
        "GetIMDBSuggest": []
    }
}
```
```python
from subwinder import AuthSubwinder

with AuthSubwinder() as asw:
    asw.guess_media([""])
    asw.guess_media(["asjdlfkjweifadsfjk"])
```
subwinder
version: current master
Since we're about to allow some breaking changes with the next major release, it would make sense to hide `ranking` and `request` externally. Currently, there should be no use for either of them to be public, and if anything they will just pollute the import options in peoples' editors. `ranking` is already documented, and `request`'s only function is incorrectly private. Beyond that, `_force_path` in `utils` is also incorrectly private, and it would make sense to have an `_internal_utils` module to address this issue.
So even though we're all consenting adults here: because `filename` and `dirname` in `MediaInfo` and `Media` should always be paths, it would be good to make them private and force everything through getters and setters, no matter how much I loathe that style. While this is being done, it would be nice to add a `.get_filepath(...)` as well; this will need to raise an error if `filename` or `dirname` is `None`, since a valid path can't be generated then. Kind of a small papercut, but just make sure it's documented.
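A sketch of the private-members-plus-properties shape, including the error from `get_filepath()`. `MediaInfoSketch` is a hypothetical stand-in mirroring the proposal, not the library's actual class:

```python
from pathlib import Path


class MediaInfoSketch:
    def __init__(self, dirname=None, filename=None):
        self._dirname = dirname
        self._filename = filename

    @property
    def dirname(self):
        return self._dirname

    @dirname.setter
    def dirname(self, value):
        self._dirname = value

    @property
    def filename(self):
        return self._filename

    @filename.setter
    def filename(self, value):
        self._filename = value

    def get_filepath(self):
        # A valid path can't be generated from missing pieces
        if self._dirname is None or self._filename is None:
            raise ValueError("both dirname and filename must be set")
        return Path(self._dirname) / self._filename
```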
I started using pathlib
in other projects and I really favor it for not just keeping all paths as strings, and for dealing with paths in a more ergonomic/readable. This should be a pretty painless change: however, because the stored information will be exposed through different objects this will be considered a breaking change.
It would be nice to include a reminder to run linters/formatters/unit testing along with having a checklist that documentation was updated and note that changes will be assumed to be licensed under AGPLv3 since the project is.
Switch from `type`-checking objects to using `isinstance`, to allow users to inherit from the base classes and provide extra functionality while still being able to supply these objects to methods that require specific types.
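The difference in one small sketch; the class names are placeholders, not the library's:

```python
class SearchResultSketch:
    pass


class CustomResult(SearchResultSketch):
    """A user subclass that adds extra functionality."""


def accepts_result(obj):
    # isinstance accepts subclasses, so user-derived classes still work;
    # `type(obj) is SearchResultSketch` would reject CustomResult
    if not isinstance(obj, SearchResultSketch):
        raise TypeError("expected a SearchResultSketch")
    return True
```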
None of the classes implement `__repr__`, which can make debugging moderately harder. Add descriptive `__repr__` methods so that printing the different classes is more informative.
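A sketch of the conventional shape (the class and members here are illustrative, not the library's):

```python
class MediaSketch:
    def __init__(self, hash, size):
        self.hash = hash
        self.size = size

    def __repr__(self):
        # Mirrors the eval-able convention used by many stdlib classes
        return f"{type(self).__name__}(hash={self.hash!r}, size={self.size!r})"
```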
Currently, there can be a problem when the author specified the wrong encoding on subtitles when uploading, causing `utils.extract(...)` (which is used by `.download_subtitles(...)` and `.preview_subtitles(...)`) to fail. However, only `.preview_subtitles(...)` needs to extract the string, while `.download_subtitles(...)` can just directly write the bytes to a file.
For this reason, it makes sense to switch `extract` to return bytes and decode only within `.preview_subtitles(...)`. It would also be nice to try to provide a fallback for a failing encoding. Right now I've only seen `UnicodeDecodeError`, but it would be nice to have a common error to look for.
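A sketch of the split, assuming the API payload is base64-encoded gzip (as the current `extract` handles) and using latin-1 as a never-fails fallback; both function names here are illustrative:

```python
import base64
import gzip


def extract(b64_gzipped):
    # Return raw bytes; .download_subtitles(...) can write these directly,
    # while .preview_subtitles(...) decodes them itself
    return gzip.decompress(base64.b64decode(b64_gzipped))


def decode_with_fallback(data, encoding, fallback="latin-1"):
    # Try the author-specified encoding first, then a permissive fallback
    # (latin-1 maps every byte, so this can't raise UnicodeDecodeError)
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data.decode(fallback)
```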
The API docs mention, for security, to use HTTPS and/or send the password as its MD5 hash. I've heard mixed things on sending the hash versus the password, but there's no reason not to, since it avoids sending out the plaintext password in case the connection gets forced back to HTTP somehow.
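Hashing the password before login is a one-liner with the standard library (the function name is illustrative):

```python
import hashlib


def hash_password(password):
    # Send the hex MD5 digest in place of the plaintext, so a downgrade to
    # plain HTTP never exposes the password itself
    return hashlib.md5(password.encode("utf-8")).hexdigest()
```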
Currently `MediaInfo` and `SearchResult` optionally take `dirname` and `filename` in `.from_data(...)`. Aside from not being used much, this isn't really required in most cases and can just be covered by calling `.set_dirname(...)` and `.set_filename(...)` instead, so my vote would be to remove this to make the behavior more consistent.
Currently, the library throws an error in `asw.download_subtitles(...)` if `download_dir` isn't set and there is no original directory information associated with the `SearchResult`. It might make sense to fall back to the current working directory for the download, since that behavior is used by several of the objects and functions from the standard library that write to files.