
wikidata's Introduction

Wikidata client library for Python

Latest PyPI version

Documentation Status

GitHub Actions

This package provides easy APIs to use Wikidata for Python.

>>> from wikidata.client import Client
>>> client = Client()  # doctest: +SKIP
>>> entity = client.get('Q20145', load=True)
>>> entity
<wikidata.entity.Entity Q20145 'IU'>
>>> entity.description
m'South Korean singer and actress'
>>> image_prop = client.get('P18')
>>> image = entity[image_prop]
>>> image
<wikidata.commonsmedia.File 'File:KBS "The Producers" press conference, 11 May 2015 10.jpg'>
>>> image.image_resolution
(820, 1122)
>>> image.image_url
'https://upload.wikimedia.org/wikipedia/commons/6/60/KBS_%22The_Producers%22_press_conference%2C_11_May_2015_10.jpg'

wikidata's People

Contributors

admp, dahlia, hcordobest, joker234, k----n, nelson-liu, yorwba


wikidata's Issues

Wikidata identifier to name

Hi, is there any way to retrieve the name of an entity's property value? E.g.:

from wikidata.client import Client
client = Client()
entity = client.get(wiki_id)
nationality_prop = client.get('P27',load=True)
nationality = entity[nationality_prop]

I would like to retrieve the value of the entity's country of citizenship (the country's name) instead of its description.
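For what it's worth, judging from the other examples on this page, the value you get back for an item-valued property is itself an Entity, so its label should give the name you are after. A minimal sketch (using the item from the README example above; the printed value is a guess):

from wikidata.client import Client

client = Client()
entity = client.get('Q20145', load=True)          # item from the README example above
nationality_prop = client.get('P27', load=True)   # country of citizenship
nationality = entity[nationality_prop]            # this value is itself an Entity
print(nationality.label)                          # should print the country's name, e.g. 'South Korea'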

url encode issue

Some URLs contain special characters, so it would be better to quote the path before constructing the URL for the request.

In client.py, line 189, change the line to:

url = urllib.parse.urljoin(self.base_url, urllib.parse.quote(path))
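For illustration, a minimal sketch of what the proposed quoting does; the base URL and path below are made up for the example, not taken from client.py:

import urllib.parse

base_url = 'https://www.wikidata.org/wiki/'   # hypothetical base URL
path = 'Q20145 draft.json'                    # hypothetical path containing a space

url = urllib.parse.urljoin(base_url, urllib.parse.quote(path))
print(url)
# https://www.wikidata.org/wiki/Q20145%20draft.json

One caveat: quote() also percent-encodes characters such as ':' unless they are listed in its safe parameter, so the safe set would need to match the paths the client actually builds.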

Speed of looking up properties

I am looping through entities and looking up multiple properties for each (7 in my real project, 3 in the attached toy example). Each property slows it down, so it will take hours to go through all the entities. Is there a way to speed this up please?

from wikidata.client import Client

client = Client()  # doctest: +SKIP

p_givenname = client.get('P735')
p_surname = client.get('P734')
p_dob = client.get('P569')


def get_entity(wikidata_id):
    entity = client.get(wikidata_id, load=True)

    givenname = entity[p_givenname].label
    surname = entity[p_surname].label
    dob = entity[p_dob]
    print ('%s %s %s' % (givenname, surname, dob))

w_ids = ['Q498805',
         'Q482745',
         'Q186',
         'Q1363428',
         'Q299700',
         'Q196223',
         'Q488828',
         'Q490120']


import datetime as dt
n0 = dt.datetime.now()
for w_id in w_ids:
    get_entity(w_id)
n1 = dt.datetime.now()
print ('elapsed time: ', n1 - n0)
print ('record count: ', len(w_ids))
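Not part of the original report, just a possible direction: each property lookup can trigger another HTTP round trip, so most of the elapsed time here is network latency. One sketch of a workaround, using only the standard library because the thread-safety of wikidata.client.Client is not documented, is to fetch the raw entity documents concurrently from Wikidata's public Special:EntityData endpoint and read labels/claims from the JSON directly:

import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

USER_AGENT = 'example-script/0.1 (contact: example@example.org)'  # placeholder contact

def fetch_entity_json(wikidata_id):
    # Public endpoint; returns the full entity document as JSON.
    url = 'https://www.wikidata.org/wiki/Special:EntityData/%s.json' % wikidata_id
    req = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)['entities'][wikidata_id]

w_ids = ['Q498805', 'Q482745', 'Q186', 'Q1363428']

with ThreadPoolExecutor(max_workers=4) as pool:
    for entity in pool.map(fetch_entity_json, w_ids):
        label = entity['labels'].get('en', {}).get('value')
        dob_claims = entity['claims'].get('P569', [])
        print(label, '-', len(dob_claims), 'date-of-birth claim(s)')

If the fields you need are fixed, a single SPARQL query against the public query service that returns all of them for all IDs at once would likely be faster still.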

Not an Issue : regarding documentation on using Wikidata

Hello,
Firstly this is not an issue with the code. Sorry for posting this under issues.

I wanted to get some additional documentation on using the module.

Suppose I want to get a list of all the entities with a certain property.
The current documentation (Wikidata client library for Python) only shows how to get a property given an entity.
I'm trying to find if I can get a list of entities with the mentioned property.

e.g. get a list of countries (entities) given the property for country 'P17'.
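This doesn't appear to be covered by the client library itself; the usual route is the public Wikidata Query Service (SPARQL), which is separate from this package. A rough sketch, assuming the standard endpoint at https://query.wikidata.org/sparql:

import json
import urllib.parse
import urllib.request

query = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P17 ?country .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""
url = 'https://query.wikidata.org/sparql?' + urllib.parse.urlencode(
    {'query': query, 'format': 'json'})
req = urllib.request.Request(url, headers={
    'User-Agent': 'example-script/0.1 (contact: example@example.org)'})  # placeholder contact
with urllib.request.urlopen(req) as resp:
    for row in json.load(resp)['results']['bindings']:
        print(row['item']['value'], row['itemLabel']['value'])

The item IDs in the results can then be passed to client.get() if you want to keep working with this library's Entity objects.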

I'm not sure if this already exists. Please let me know if it exists.
I would appreciate it if the module had improved documentation (see issue #10).

Thanks,
@sobalgi.

Method to get qualifiers

[screenshot: elon_musk_educated_at]

Code:

client = Client()  # doctest: +SKIP
entity = client.get('Q317521', load=True)

client = Client()
prop = client.get('P69', load=True)

res = entity.getlist(prop)
print(res[0])
print(res[0].label)

client = Client()
start_time = res[0].get('P580')
print(start_time)

entity2 = client.get('Q105424537', load=True)
prop2 = client.get('P580', load=True)
print(prop2)
print(prop2.label)
print(type(prop2))
print(prop2.get('start time'))
print(prop2.get('P580'))

Output:

<wikidata.entity.Entity Q105424537>
Smith School of Business
None
<wikidata.entity.Entity P580 'start time'>
start time
<class 'wikidata.entity.Entity'>
None
None
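As far as I can tell the qualifiers are not exposed through getlist(); they are present in the raw entity JSON under each claim, though, so one workaround (standard library only, outside this package's API) is to read them from Wikidata's Special:EntityData endpoint. The property and item IDs below are the ones from the code above:

import json
import urllib.request

url = 'https://www.wikidata.org/wiki/Special:EntityData/Q317521.json'
req = urllib.request.Request(url, headers={
    'User-Agent': 'example-script/0.1 (contact: example@example.org)'})  # placeholder contact
with urllib.request.urlopen(req) as resp:
    entity = json.load(resp)['entities']['Q317521']

# Each P69 ("educated at") claim carries its qualifiers, keyed by property ID.
for claim in entity['claims'].get('P69', []):
    school = claim['mainsnak']['datavalue']['value']['id']
    qualifiers = claim.get('qualifiers', {})
    start_times = [q['datavalue']['value']['time']
                   for q in qualifiers.get('P580', [])   # P580 = start time
                   if 'datavalue' in q]
    print(school, start_times)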

How to download English data only?

Hi, wondering if it's possible to do a .get but for English language data points only? I noticed a majority of the data being downloaded is non-English and probably won't be useful for my use case.

Thanks!
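The client doesn't seem to offer a language filter, but the underlying MediaWiki API does: wbgetentities takes a languages parameter that restricts labels, descriptions and aliases to the languages you list (claims are not affected). A sketch using only the standard library, outside this package:

import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    'action': 'wbgetentities',
    'ids': 'Q20145',
    'languages': 'en',
    'props': 'labels|descriptions|claims',
    'format': 'json',
})
req = urllib.request.Request(
    'https://www.wikidata.org/w/api.php?' + params,
    headers={'User-Agent': 'example-script/0.1 (contact: example@example.org)'})  # placeholder contact
with urllib.request.urlopen(req) as resp:
    entity = json.load(resp)['entities']['Q20145']

print(entity['labels']['en']['value'])
print(entity['descriptions']['en']['value'])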

Empty response (JSONDecodeError) when sending many requests in a row

Version

0.7.0

Problem

When sending several (10~100) requests in a row, some requests fail non-deterministically with the following error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Upon closer investigation, the actual response is a 429,

  • with "reason": Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy, and
  • with the following "body":
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
	* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
.footer { clear: both; margin-top: 14%; border-top: 1px solid #e5e5e5; background: #f9f9f9; padding: 2em 0; font-size: 0.8em; text-align: center; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
.content-text { overflow: hidden; overflow-wrap: break-word; word-wrap: break-word; -webkit-hyphens: auto; -moz-hyphens: auto; -ms-hyphens: auto; hyphens: auto; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645ad; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: sans-serif; }
.text-muted { color: #777; }
</style>
<div class="content" role="main">
	<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf-logo.png" srcset="https://www.wikimedia.org/static/images/wmf-logo-2x.png 2x" alt="Wikimedia" width="135" height="101">
</a>
<h1>Error</h1>
<div class="content-text">
	<p>Our servers are currently under maintenance or experiencing a technical problem.

	Please <a href="" title="Reload this page" onclick="window.location.reload(false); return false">try again</a> in a few&nbsp;minutes.</p>

<p>See the error message at the bottom of this page for more&nbsp;information.</p>
</div>
</div>
<div class="footer"><p>If you report this error to the Wikimedia System Administrators, please include the details below.</p><p class="text-muted"><code>Request from 122.216.10.145 via cp5012 cp5012, Varnish XID 477962109<br>Upstream caches: cp5012 int<br>Error: 429, Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy at Sun, 17 Jul 2022 22:28:20 GMT</code></p>
</div>
</html>

Root cause

This library doesn't follow Wikimedia's user-agent policy, specifically:

<client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.

which leads to temporary rate limiting/blacklisting of the agent:

Requests from disallowed user agents may instead encounter a less helpful error message like this:
Our servers are currently experiencing a technical problem. Please try again in a few minutes.

See also: https://meta.wikimedia.org/wiki/User-Agent_policy

Solution

Set a User-Agent header compliant with the above policy, e.g.:

>>> import urllib.request
>>> od = urllib.request.OpenerDirector()
>>> od.addheaders 
[('User-agent', 'Python-urllib/3.9')]
>>> 
>>> import wikidata
>>> wikidata.__version__
'0.7.0'
>>> 
>>> import sys
>>> od.addheaders = [  # a list of (header, value) pairs, not a dict
...     ("Accept", "application/sparql-results+json"),
...     ("User-Agent", "wikidata-based-bot/%s (https://github.com/dahlia/wikidata ; [email protected]) python/%s.%s.%s Wikidata/%s" % (wikidata.__version__, sys.version_info.major, sys.version_info.minor, sys.version_info.micro, wikidata.__version__)),
... ]
>>>
>>> od.addheaders
[('Accept', 'application/sparql-results+json'), ('User-Agent', 'wikidata-based-bot/0.7.0 (https://github.com/dahlia/wikidata ; [email protected]) python/3.9.13 Wikidata/0.7.0')]
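Applying that to this library: the tracebacks quoted elsewhere on this page show the client calling self.opener.open(url), so a Client instance appears to carry a urllib opener whose addheaders list can be replaced. This is only a sketch based on that assumption, not a documented API:

import sys
import wikidata
from wikidata.client import Client

user_agent = ('my-wikidata-tool/0.1 (https://example.org; mail@example.org) '  # placeholder contact
              'Wikidata/%s python/%d.%d.%d'
              % (wikidata.__version__, *sys.version_info[:3]))

client = Client()
client.opener.addheaders = [('User-Agent', user_agent)]  # assumes the opener attribute is public
entity = client.get('Q20145', load=True)
print(entity.label)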

urllib.error.HTTPError: HTTP Error 400: Bad Request

from wikidata.client import Client
client = Client()
entity = client.get('20145', load=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\wikidata\client.py", line 139, in get
    entity.load()
  File "C:\ProgramData\Anaconda3\lib\site-packages\wikidata\entity.py", line 239, in load
    result = self.client.request(url)
  File "C:\ProgramData\Anaconda3\lib\site-packages\wikidata\client.py", line 193, in request
    response = self.opener.open(url)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

wikidata.datavalue.DatavalueError: 9: time precision other than 11 or 14 is unsupported

This date of birth is just the year 1950 instead of the full date. How would I get the partial data 1950 without this exception?

>>> from wikidata.client import Client                                              
>>> client = Client()  # doctest: +SKIP                                             
>>> prop_dob = client.get('P569')                                                      
>>> entity = client.get('Q4794599')
>>> entity[prop_dob]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/z/.local/lib/python3.5/site-packages/wikidata/entity.py", line 160, in __getitem__
    result = self.getlist(key)
  File "/home/z/.local/lib/python3.5/site-packages/wikidata/entity.py", line 191, in getlist
    for snak in (claim['mainsnak'] for claim in claims)]
  File "/home/z/.local/lib/python3.5/site-packages/wikidata/entity.py", line 191, in <listcomp>
    for snak in (claim['mainsnak'] for claim in claims)]
  File "/home/z/.local/lib/python3.5/site-packages/wikidata/client.py", line 178, in decode_datavalue
    return decode(self, datatype, datavalue)
  File "/home/z/.local/lib/python3.5/site-packages/wikidata/datavalue.py", line 127, in __call__
    return method(client, datavalue)
  File "/home/z/.local/lib/python3.5/site-packages/wikidata/datavalue.py", line 210, in time
    datavalue
wikidata.datavalue.DatavalueError: 9: time precision other than 11 or 14 is unsupported: {'type': 'time', 'value': {'precision': 9, 'timezone': 0, 'after': 0, 'before': 0, 'time': '+1950-01-01T00:00:00Z', 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}}
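Until the decoder supports other precisions, one workaround (not part of this library; it just reads the raw claim JSON from the public Special:EntityData endpoint) is to take the year out of the time string yourself; precision 9 means "year" in the Wikibase data model:

import json
import urllib.request

url = 'https://www.wikidata.org/wiki/Special:EntityData/Q4794599.json'
req = urllib.request.Request(url, headers={
    'User-Agent': 'example-script/0.1 (contact: example@example.org)'})  # placeholder contact
with urllib.request.urlopen(req) as resp:
    entity = json.load(resp)['entities']['Q4794599']

for claim in entity['claims'].get('P569', []):   # P569 = date of birth
    value = claim['mainsnak']['datavalue']['value']
    # The time string looks like '+1950-01-01T00:00:00Z'; with precision 9
    # only the year part is meaningful.
    year = int(value['time'][:value['time'].index('-', 1)])
    print('precision', value['precision'], '->', year)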

http error 404

Hello,
I've been running this package for more than two weeks on a project, but for a few days now I've been getting this error on all of my machines (Ubuntu 20.04). It seems to me that Wikidata changed some server port, address, or API.

  File ".../lib/python3.9/site-packages/wikidata/client.py", line 140, in get
    entity.load()
  File ".../lib/python3.9/site-packages/wikidata/entity.py", line 261, in load
    result = self.client.request(url)
  File ".../lib/python3.9/site-packages/wikidata/client.py", line 200, in request
    raise e
  File ".../lib/python3.9/site-packages/wikidata/client.py", line 194, in request
    response = self.opener.open(url)
  File "/usr/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

SyntaxError: invalid syntax (Python 2.7)

Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

from wikidata.client import Client
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/Wikidata-0.6.1-py2.7.egg/wikidata/client.py", line 86
base_url: str=WIKIDATA_BASE_URL,
^
SyntaxError: invalid syntax

errors when querying wikidata due to cbk-zam

In entity.py (line 42) there is a call into Babel:
return Locale.parse(locale.replace('-', '_'))

while Babel's parse_locale function (in core.py) can't parse the value cbk-zam, which Wikidata (and Wikipedia) use (https://cbk-zam.wikipedia.org/).

You can work around it by manually changing that function in Babel, but for a proper solution a bigger change should probably be considered.

(To see this bug, just get the entity 'Q30' and try to print its label.)

entity.getlist(key) fails for "no value" entries

Example code:

from wikidata.client import Client
client = Client()
hong_kong = client.get('Q8646')
locator_map_image = client.get('P242')
hong_kong.getlist(locator_map_image)

Error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lib/python3.6/site-packages/wikidata/entity.py", line 191, in getlist
    for snak in (claim['mainsnak'] for claim in claims)]
  File "/lib/python3.6/site-packages/wikidata/entity.py", line 191, in <listcomp>
    for snak in (claim['mainsnak'] for claim in claims)]
KeyError: 'datavalue'

I believe this is due to the "no value" entry in the list for the locator map image of Hong Kong. In this case it appears to be an error in the underlying data, but the function should handle it gracefully. Since the docstring says "Return all values associated to the given key property in sequence.", such "no value" entries should probably simply be ignored.

unable to parse birth date

When I run:

from wikidata.client import Client

wikidata_client = Client()
entity = wikidata_client.get('Q1893392', load=True)
p = wikidata_client.get('P21')
gender = entity[p].label
p = wikidata_client.get('P569')
birthdate = entity[p].label  # <---- fails here

The error is:
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 7, in <module>
    birthdate = entity[p].label
  File "C:\Python37\lib\site-packages\wikidata\entity.py", line 160, in __getitem__
    result = self.getlist(key)
  File "C:\Python37\lib\site-packages\wikidata\entity.py", line 191, in getlist
    for snak in (claim['mainsnak'] for claim in claims)]
  File "C:\Python37\lib\site-packages\wikidata\entity.py", line 191, in <listcomp>
    for snak in (claim['mainsnak'] for claim in claims)]
  File "C:\Python37\lib\site-packages\wikidata\client.py", line 178, in decode_datavalue
    return decode(self, datatype, datavalue)
  File "C:\Python37\lib\site-packages\wikidata\datavalue.py", line 127, in __call__
    return method(client, datavalue)
  File "C:\Python37\lib\site-packages\wikidata\datavalue.py", line 210, in time
    datavalue
wikidata.datavalue.DatavalueError: 9: time precision other than 11 or 14 is unsupported: {'type': 'time', 'value': {'time': '+1905-01-01T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0, 'precision': 9, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}}

Add support for Proleptic Julian calendar

When fetching the date of birth (P569) of Pythagoras (item Q10261) I got a DatavalueError: ... unsupported calendarmodel for time datavalue. Can you add/adjust the calendar model to support dates of birth BCE? Many thanks.

ImportError: No module named 'wikidata' (Python 3.5)

Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

from wikidata.client import Client
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named 'wikidata'

SSL error

After installing and running the demonstration code I get the following error after the first two lines

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)>

Entity.lists() is not robust to unknown locales

See, for instance:

client = Client()
client.get("Q5", load=True).lists()
*** babel.core.UnknownLocaleError: unknown locale 'io'

I know that I can manually access all the data, but it'd be nice if .lists() was a bit more robust...

How do you get the value of an "instance" property?

Hi, I'm new to wikidata in general.

I can't figure out how to get the start/end dates of someone who held office.
For example, for president George Washington (Q23), there is a property P39 (offices held), and within that I can get that he was a president (Q11696). But that item itself has no start/end date properties (P580, P582). However, on the Wikidata page those properties are definitely there for each office held.

In my code I'm doing something like:

p39 = client.get('P39')
george = client.get('Q23')
george.get(p39)

which returns <wikidata.entity.Entity Q11696 'President of the United States'>,
which is my dead end. This is the same object you'd get if I just did
client.get('Q11696'), because the item has no relation to the original item Q23.

Even the SPARQL to figure out the dates for a held office is kinda crazy. You have to do something like:

?pres p:P39 ?position_held_statement .
?position_held_statement ps:P39 wd:Q11696 .
?position_held_statement pq:P580 ?start .

Any help is appreciated. Thanks.
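For reference, here is the fragment above wrapped into a complete query and run from Python against the public query service (not this library's API); the IDs are the ones from the question:

import json
import urllib.parse
import urllib.request

query = """
SELECT ?start ?end WHERE {
  wd:Q23 p:P39 ?statement .            # George Washington, position held
  ?statement ps:P39 wd:Q11696 .        # President of the United States
  OPTIONAL { ?statement pq:P580 ?start . }
  OPTIONAL { ?statement pq:P582 ?end . }
}
"""
url = 'https://query.wikidata.org/sparql?' + urllib.parse.urlencode(
    {'query': query, 'format': 'json'})
req = urllib.request.Request(url, headers={
    'User-Agent': 'example-script/0.1 (contact: example@example.org)'})  # placeholder contact
with urllib.request.urlopen(req) as resp:
    for row in json.load(resp)['results']['bindings']:
        print(row.get('start', {}).get('value'), '->', row.get('end', {}).get('value'))

The client library itself doesn't seem to expose statement qualifiers, which is essentially the "Method to get qualifiers" issue above.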

Entities cannot be pickled

Seems like this would require moving away from WeakValueDictionary?

In [1]: from wikidata.client import Client

In [2]: import pickle

In [3]: c = Client()

In [4]: example = c.get("Q42")

In [5]: with open("test", "wb") as f:
   ...:     pickle.dump(example, f)
   ...:
   ...:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-966820a5735d> in <module>
      1 with open("test", "wb") as f:
----> 2     pickle.dump(example, f)
      3
      4

AttributeError: Can't pickle local object 'WeakValueDictionary.__init__.<locals>.remove'
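A possible workaround in the meantime, assuming what needs to be persisted is just which entity it is: pickle the entity ID (a plain string) and re-fetch the Entity from a Client after unpickling, which sidesteps the weak-reference machinery entirely. A sketch:

import pickle
from wikidata.client import Client

client = Client()
entity_id = 'Q42'
example = client.get(entity_id)

# Persist the ID rather than the Entity object itself.
with open('entity-id.pickle', 'wb') as f:
    pickle.dump(entity_id, f)

# Later / elsewhere: reload the ID and re-fetch the entity.
with open('entity-id.pickle', 'rb') as f:
    restored = Client().get(pickle.load(f))
print(restored)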

ImportError: No module named 'typing'

Hello,

It seems that the pip requirements are missing the typing dependency:
ImportError: No module named 'typing'

In [1]: from wikidata.client import Client
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-1b52aaa17050> in <module>()
----> 1 from wikidata.client import Client

/home/.../lib/python3.4/site-packages/wikidata/client.py in <module>()
      6 import json
      7 import logging
----> 8 from typing import (TYPE_CHECKING,
      9                     Callable, Mapping, MutableMapping, Optional, Sequence,
     10                     Union, cast)

ImportError: No module named 'typing'

It can be fixed by hand with pip install typing, but it might be cleaner to add it to the setup requirements.

DatavalueError: unsupported calendarmodel for time datavalue

from wikidata.client import Client
from wikidata.client import Entity

client = Client()
entity = client.get("Q220", load=True)
for _, values in entity.iterlists():
  print(values)
---------------------------------------------------------------------------
DatavalueError                            Traceback (most recent call last)
<ipython-input-39-8af3489602dd> in <module>()
     10         yield str(value.label)
     11 
---> 12 list(generate_related_entities_labels("Q220"))

6 frames
/usr/local/lib/python3.7/dist-packages/wikidata/datavalue.py in time(self, client, datavalue)
    166         if cal != 'http://www.wikidata.org/entity/Q1985727':
    167             raise DatavalueError('{!r} is unsupported calendarmodel for time '
--> 168                                  'datavalue'.format(cal), datavalue)
    169         try:
    170             time = value['time']

DatavalueError: 'http://www.wikidata.org/entity/Q1985786' is unsupported calendarmodel for time datavalue: {'type': 'time', 'value': {'time': '-0753-04-21T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0, 'precision': 11, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985786'}}

Would you mind a rewrite?

Hey I wanted to use wikidata previously and wrote my own little script. Would you accept a pull request if I polished that up a bit?

I don't really understand why you use all those different modules and programming techniques, like caches, or why you need to create a client to do an HTTP request for you. If you could give me a list of features you want, I can try to reach feature parity.
