
paperswithcode-client's Introduction

paperswithcode.com API client

This is a client for the PapersWithCode read/write API.

The client covers the entire API. It wraps all API models in Python objects and communicates with the API by passing those objects back and forth.

Documentation can be found on the ReadTheDocs website.

It is published to the Python Package Index and can be installed by simply calling pip install paperswithcode-client.

Quick usage example

To install:

pip install paperswithcode-client

To list papers indexed on Papers with Code:

from paperswithcode import PapersWithCodeClient

client = PapersWithCodeClient()
papers = client.paper_list()
print(papers.results[0])
print(papers.next_page)

For full docs please see our ReadTheDocs page.
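The paper_list call above returns one page at a time; next_page holds the number of the next page, or None on the last page. As an illustration, here is a minimal sketch of a pagination helper (collect_all_pages is a hypothetical helper, not part of the client; it only assumes each page object has .results and .next_page, as shown above):

```python
from typing import Callable, List, Optional


def collect_all_pages(fetch_page: Callable[[int], object], max_pages: int = 100) -> List[object]:
    """Accumulate .results across pages.

    fetch_page(page) must return an object with a .results list and a
    .next_page attribute (the next page number, or None on the last page).
    """
    collected: List[object] = []
    page: Optional[int] = 1
    while page is not None and page <= max_pages:
        batch = fetch_page(page)
        collected.extend(batch.results)
        page = batch.next_page
    return collected
```

With a client in scope this could be used as collect_all_pages(lambda p: client.paper_list(page=p)); mind the server's rate limits when paging through large result sets.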

How to mirror your competition

Papers with Code offers a mirroring service for ongoing competitions that allows competition administrators to automatically upload the results to Papers with Code using an API.

To use the API in write mode, you'll first need to obtain an API token.

Using the API token you'll be able to use the client in write mode:

from paperswithcode import PapersWithCodeClient

client = PapersWithCodeClient(token="your_secret_api_token")

To mirror a live competition, you'll need to make sure the corresponding task (e.g. "Image Classification") exists on Papers with Code. You can use the search to check if it exists, and if it doesn't, you can add a new task on the Task addition page.
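The same check can be done programmatically. Here is a small sketch (task_exists is a hypothetical helper; it only assumes each task object exposes a .name attribute):

```python
def task_exists(tasks, name: str) -> bool:
    """Case-insensitive check whether a task with the given name appears in a
    list of task objects (each exposing a .name attribute)."""
    wanted = name.strip().lower()
    return any(t.name.strip().lower() == wanted for t in tasks)
```

For example, task_exists(client.task_list(q="image classification").results, "Image Classification") would report whether the task is already listed, assuming the client's task_list supports a q search parameter.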

If you cannot find your dataset on the website, you can create it with the API like this:

from paperswithcode.models.dataset import *
client.dataset_add(
    DatasetCreateRequest(
        name="VeryTinyImageNet",
    )
)

Now we are ready to programmatically create the competition on Papers with Code. Here is an example of how we would do this for a fictional VeryTinyImageNet dataset.

from paperswithcode import PapersWithCodeClient
from paperswithcode.models.evaluation.synchronize import *

client = PapersWithCodeClient(token="your_secret_api_token")

r = EvaluationTableSyncRequest(
    task="Image Classification",
    dataset="VeryTinyImageNet",
    description="Optional description of your challenge in markdown format",
    metrics=[
        MetricSyncRequest(
            name="Top 1 Accuracy",
            is_loss=False,
        ),
        MetricSyncRequest(
            name="Top 5 Accuracy",
            is_loss=False,
        )
    ],
    results=[
        ResultSyncRequest(
            metrics={
                "Top 1 Accuracy": "85",
                "Top 5 Accuracy": "95"
            },
            paper="",
            methodology="My Unpublished Model Name",
            external_id="competition-submission-id-4321",
            evaluated_on="2020-11-20",
            external_source_url="https://my.competition.com/leaderboard/entry1"
        ),
        ResultSyncRequest(
            metrics={
                "Top 1 Accuracy": "75",
                "Top 5 Accuracy": "81"
            },
            paper="https://arxiv.org/abs/1512.03385",
            methodology="ResNet-50 (baseline)",
            external_id="competition-submission-id-1123",
            evaluated_on="2020-09-20",
            external_source_url="https://my.competition.com/leaderboard/entry2"
        )
    ]
)

client.evaluation_synchronize(r)

This is going to add two entries to the leaderboard: a ResNet-50 baseline referenced by the provided arXiv paper link, and an unpublished entry for the model My Unpublished Model Name.

To break it down a bit more:

metrics=[
    MetricSyncRequest(
        name="Top 1 Accuracy",
        is_loss=False,
    ),
    MetricSyncRequest(
        name="Top 5 Accuracy",
        is_loss=False,
    )
],

This defines two global metrics that will be used in the leaderboard. The table is ranked by the first metric provided. The parameter is_loss indicates whether the metric is a loss metric, i.e. whether smaller is better. Since in this case both are accuracy metrics, where higher is better, we set is_loss=False, which produces the correct sort order in the table.
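The effect of is_loss on ordering can be illustrated locally with a small sorting sketch (rank_rows is a hypothetical helper for illustration only, not the server's actual ranking code):

```python
def rank_rows(rows, metric: str, is_loss: bool):
    """Rank leaderboard rows by one metric: ascending if it is a loss
    (smaller is better), descending otherwise (higher is better)."""
    return sorted(rows, key=lambda r: float(r[metric]), reverse=not is_loss)
```

For an accuracy metric (is_loss=False) a row scoring "85" sorts above one scoring "75"; for a loss metric such as RMSE (is_loss=True) the order would flip.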

An individual row in the leaderboard is represented by:

ResultSyncRequest(
    metrics={
        "Top 1 Accuracy": "85",
        "Top 5 Accuracy": "95"
    },
    paper="",
    methodology="My Unpublished Model Name",
    external_id="competition-submission-id-4321",
    evaluated_on="2020-11-20",
    external_source_url="https://my.competition.com/leaderboard/entry1"
)

metrics is simply a dictionary of metric values, one for each of the global metrics. The paper parameter can be a link to an arXiv paper, a conference paper, or a paper page on Papers with Code. Any code associated with the paper will be linked automatically. The methodology parameter should contain a model name that is informative to the reader. external_id is your ID for this submission; it should be unique and is used to merge results on repeated calls if they have changed. evaluated_on is the date (in YYYY-MM-DD format) on which the method was evaluated; we use this to create progress graphs. Finally, external_source_url is the URL to your website, ideally linking back to this individual entry. It will be linked in the "Result" column of the leaderboard and will enable users to navigate back to your website.

Finally, this line of code:

client.evaluation_synchronize(r)

This executes the request against our API and returns the ID of your leaderboard on Papers with Code. You can then access it at https://paperswithcode.com/sota/<your_leaderboard_id> or find it using the site search.

To keep your Papers with Code leaderboard in sync, simply re-post all the entries in your competition at regular intervals. If a row already exists, it will be merged and no duplicates will be created.
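A periodic re-sync can then rebuild the full request from your current leaderboard and post it again. The following sketch maps your own row records (assumed here to be plain dicts) onto the ResultSyncRequest fields shown above; rows_to_results is a hypothetical helper, not part of the client:

```python
def rows_to_results(rows):
    """Map local leaderboard rows (dicts) onto the keyword arguments of
    ResultSyncRequest: metrics, paper, methodology, external_id,
    evaluated_on and external_source_url. paper defaults to "" for
    unpublished entries."""
    return [
        dict(
            metrics=row["metrics"],
            paper=row.get("paper", ""),
            methodology=row["methodology"],
            external_id=row["external_id"],
            evaluated_on=row["evaluated_on"],
            external_source_url=row["external_source_url"],
        )
        for row in rows
    ]
```

On each sync run you could then pass results=[ResultSyncRequest(**kw) for kw in rows_to_results(my_rows)] into the EvaluationTableSyncRequest; because external_id is stable across runs, repeated posts merge rather than duplicate.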

For in-depth API docs please refer to our ReadTheDocs page.

By using the API you agree that any competition data you submit will be licensed under CC-BY-SA 4.0.

If you need any help, contact us at [email protected].

paperswithcode-client's People

Contributors

alefnula, automata, lambdaofgod, rstojnic


paperswithcode-client's Issues

Invalid OPENAPI specification

Hi,

I didn't find a better place to report this, but I wanted to publish it somewhere.
If you open
https://paperswithcode.com/api/v1/docs/?format=openapi
you'll see that "operationId": "repositories_read" is duplicated,
so Swagger/OpenAPI tools can't generate a proper client.
I guess it's a trivial renaming fix, but it would be nice to add automated validation somewhere in the future.

Cannot fetch more pages of area task list

client.area_task_list("computer-vision") works fine, whereas

client.area_task_list("computer-vision", page=2) throws


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-82-e71e2ef59317> in <module>
----> 1 client.area_task_list("computer-vision", page=2)

~/.local/lib/python3.8/site-packages/tea_client/handler.py in wrapper(self, *args, **kwargs)
     16     def wrapper(self, *args, **kwargs):
     17         try:
---> 18             return func(self, *args, **kwargs)
     19         except HttpClientError as e:
     20             if e.status_code == 401:

~/.local/lib/python3.8/site-packages/paperswithcode/client.py in area_task_list(self, area_id, page, items_per_page)
    334             Tasks: Tasks object.
    335         """
--> 336         return self.__page(
    337             self.http.get(
    338                 f"/areas/{area_id}/tasks/",

~/.local/lib/python3.8/site-packages/paperswithcode/client.py in __page(cls, result, page_model)
     79         previous_page = result["previous"]
     80         if previous_page is not None:
---> 81             previous_page = cls.__parse(previous_page)
     82         return page_model(
     83             count=result["count"],

~/.local/lib/python3.8/site-packages/paperswithcode/client.py in __parse(url)
     70         else:
     71             q = parse.parse_qs(p.query)
---> 72             return q["page"][0]
     73 
     74     @classmethod

KeyError: 'page'

This should not happen: the results of the first query say there is a next page, and they also say there are 934 results for computer vision, yet the maximum number of retrieved results is 500.

getting papers list doesn't work appropriately

I have tried querying
https://paperswithcode.com/api/v1/papers/?title=generative+adversarial+networks

and it returns entries sorted by publishing date, not by relevance.

I have also tried different parameters such as q, and it didn't work either. In fact, title and q exhibit the same behaviour.

can you please check @alefnula ?

Missing dependency doing setup.py

(pwc) rjt-mbp:paperswithcode-client rjt$ python setup.py install
Traceback (most recent call last):
File "setup.py", line 3, in <module>
from paperswithcode import version
File "/Users/rjt/Documents/software/paperswithcode-client/paperswithcode/__init__.py", line 3, in <module>
from paperswithcode.client import PapersWithCodeClient
File "/Users/rjt/Documents/software/paperswithcode-client/paperswithcode/client.py", line 5, in <module>
from tea_client.http import HttpClient
ModuleNotFoundError: No module named 'tea_client'

400: Bad Request

I ran the exact sample code in the README, replacing the token with my API token and the dataset name with my target dataset, and I've received the following error:
"tea_client.errors.HttpClientError: HttpClientError(400: Bad Request.)"

Is the sample code still working? Is there something else that I need to change in the code?

Doesn't return the full list of results

When calling paper_result_list(), it only returns the results from the first page. Looking at the source code, it currently only does:
self.http.get(f"/evaluations/{evaluation_id}/results/")
But for all later pages it should also call self.http.get(f"/evaluations/{evaluation_id}/results/?page=2") and so on.
Please fix this bug or provide a new API, thanks!
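In the spirit of the reporter's suggestion, a workaround could page manually until the API reports no next page. This is a hypothetical sketch (fetch_all_results is not part of the client; it only assumes each page is a dict with 'results' and 'next' keys, as the raw API returns):

```python
def fetch_all_results(get_page):
    """Accumulate the 'results' lists across pages.

    get_page(page) must return a dict with a 'results' list and a 'next'
    entry that is None on the last page.
    """
    collected, page = [], 1
    while True:
        data = get_page(page)
        collected.extend(data["results"])
        if data.get("next") is None:
            return collected
        page += 1
```

For example, fetch_all_results(lambda p: client.http.get(f"/evaluations/{evaluation_id}/results/?page={p}")), noting that client.http is an internal attribute of the client and may change between releases.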

Getting Latest or Query by Date

Hello,

Is it presently possible to query by date, or to get papers ordered starting with the most recent dates? Presently I'm not sure if one can use the client to get ordered results (though I did see a pull request). Additionally, when you do order the results by published date, it sorts ascending, with the first papers dated (I believe) 1951 and the last papers being null values.

Currently I'm guessing the best way forward is to keep track of where specific date ranges fall in the paginated results, or to download the entire dataset and manipulate it from there. If there is a better way, I'm eager to know.

Thanks!

Error in task paper listing

I have tried to use client.task_paper_list() to obtain the papers for different tasks, but some of them yield the following error:

ValidationError(400: Request validation error.)

As an example, I printed the number of papers or an error message for the tasks listed for the query 'monocular', with the following output:

3d-object-detection-from-monocular-images: 5
indoor-monocular-depth-estimation: 3
Error for monocular-3d-human-pose-estimation
Error for monocular-3d-object-detection
monocular-3d-object-localization: 3
monocular-cross-view-road-scene-parsing-road: 1
monocular-cross-view-road-scene-parsing: 1
monocular-depth-estimation: 283
monocular-visual-odometry: 46

I verified on the webpage that these tasks have papers associated with them.

HttpClientTimeout(500: Timeout exceeded)

First, I have created an API token.
Next, I want to query papers by title using this code:
from paperswithcode import PapersWithCodeClient
client = PapersWithCodeClient(token="my_token")
papers = client.paper_list(q=title)

but I got HttpClientTimeout error:

tea_client.errors.HttpClientTimeout: HttpClientTimeout(500: Timeout exceeded)

Is there any way to prevent the read timeout, or is it possible to modify the timeout value?

Are the evaluation results extracted automatically from research papers using an automated system, or are they submitted and curated by the end-users, such as researchers and developers?

Dear PapersWithCode maintainers @alefnula @lambdaofgod @rstojnic @mkardas,

As a student researcher at Queen's University, I am interested in understanding how evaluation results are obtained and added to the PapersWithCode platform. Are the evaluation results extracted automatically from research papers using an automated system, or are they submitted and curated by the end-users, such as researchers and developers? For example, end-users can add new evaluation results to the HELM dataset:
[screenshot]

I believe this information will help me better comprehend the reliability and scope of the evaluation results presented on the platform, as well as the potential opportunities for contributing to the platform as a researcher.

Thank you for your time and assistance in clarifying this matter.

Best regards,

Jimmy

paper_list query returns different results than on the website

When I run the query for "utrecht university" on the website (https://paperswithcode.com/search?q_meta=&q_type=&q=utrecht+university), I get 5 results in total:
[screenshot]
When I run the same query, I get only 3 results and some of them are different results:
[screenshot]

For example the paper I marked (see this link also: https://physics.paperswithcode.com/paper/long-lived-non-equilibrium-interstitial-solid) isn't even in the original query I did on the website. Is this an issue or am I using this wrong?

ordering by published field doesn't work

This gets papers ordered by title:
https://paperswithcode.com/api/v1/papers/?ordering=title

This doesn't get papers ordered by published:
https://paperswithcode.com/api/v1/papers/?ordering=published

paper_repository_list() fails with `TypeError: ModelMetaclass object argument after ** must be a mapping, not str`

I've noticed that calling the method paper_repository_list() (and other methods that follow the same structure) fails. From what I've come to conclude, the problem is that in the return line return [Repository(**r) for r in self.http.get(f"/papers/{paper_id}/repositories/")], the HTTP request returns a dict such as:

{'next': None,
 'previous': None,
 'results': [{'url': 'https://github.com/andreev-io/Simulated-Annealing',
   'is_official': False,
   'description': 'Implementation of the 1983 Simulated Annealing paper (Kirkpatrick et al.) in Rust. Combinatorial optimization, traveling salesman.',
   'stars': 1,
   'framework': 'none'}]}

so looping over that only yields the keys of the dict instead of the list of results. I believe that adding the key 'results' should fix the problem, as the list comprehension would then loop over each list element (a dict):
return [Repository(**r) for r in self.http.get(f"/papers/{paper_id}/repositories/")["results"]]

Querying next page doesn't seem to work

papers_page = client.paper_list(page=2, items_per_page=50)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/anaconda3/envs/pwc/lib/python3.7/site-packages/tea_client/handler.py", line 18, in wrapper
return func(self, *args, **kwargs)
File "/opt/anaconda3/envs/pwc/lib/python3.7/site-packages/paperswithcode_client-0.0.1-py3.7.egg/paperswithcode/client.py", line 98, in paper_list
File "/opt/anaconda3/envs/pwc/lib/python3.7/site-packages/paperswithcode_client-0.0.1-py3.7.egg/paperswithcode/client.py", line 74, in __page
File "/opt/anaconda3/envs/pwc/lib/python3.7/site-packages/paperswithcode_client-0.0.1-py3.7.egg/paperswithcode/client.py", line 65, in __parse
KeyError: 'page'

Hierarchy of method/tasks

It would be great if we could have a function that gives the children/parents of a task/method, as we can see on the website.

Documentation

Hello, I was hoping to get some better documentation for this API. While it's a great resource, it seems that it's focused on getting updated SOTA comparisons.

My goal was to get all the data from the NAS ImageNet classification page and recreate a graph similar to the one at the top, but with some additional data. Unfortunately, I don't see a way of doing this through the API. It even looks like I'd have to actually go and scrape the page itself for the desired data (but then I'm missing the release date, which is only shown as a year).

Some additional documentation would be really helpful for me and for those in the future trying simply to get specific data instead of doing SOTA comparisons.

If it's currently possible to do what I'm requesting, please let me know! Thank you!

How to access the unlisted datasets in PWC?

I discovered that the main dataset page mentions the availability of up to 9,753 machine-learning datasets:
[screenshot]
However, upon navigating through the pages from page 1 to page 100, I found no way to access the datasets not listed within the first 100 pages. Even when I manually attempted to access pages beyond 100, the website returned the same dataset list as page 100.
[screenshot]
Could you please advise if there is a method to retrieve datasets beyond the first 100 pages? Your assistance in this matter would be greatly appreciated. @alefnula @lambdaofgod @rstojnic @mkardas
