dahlia / wikidata
Wikidata client library for Python
Home Page: https://pypi.org/project/Wikidata/
License: GNU General Public License v3.0
How to check whether "P18" exists in the claims or not?
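A minimal sketch, assuming the raw claim JSON layout the Wikidata API returns (and that a loaded entity carries): a property exists exactly when its ID is a key of the 'claims' mapping. The helper name is mine, not the library's.

```python
def has_property(entity_data, property_id):
    """True when the entity's raw 'claims' mapping contains property_id."""
    return property_id in entity_data.get('claims', {})

# Stub of the JSON an entity carries after loading:
sample = {'claims': {'P18': [{'mainsnak': {}}], 'P31': []}}
print(has_property(sample, 'P18'))   # True
print(has_property(sample, 'P242'))  # False
```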
@dahlia
Seems like this would require moving away from WeakValueDictionary?
In [1]: from wikidata.client import Client
In [2]: import pickle
In [3]: c = Client()
In [4]: example = c.get("Q42")
In [5]: with open("test", "wb") as f:
...: pickle.dump(example, f)
...:
...:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-966820a5735d> in <module>
1 with open("test", "wb") as f:
----> 2 pickle.dump(example, f)
3
4
AttributeError: Can't pickle local object 'WeakValueDictionary.__init__.<locals>.remove'
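A workaround sketch until the WeakValueDictionary question is settled: pickle the plain data the entity wraps (its ID plus raw JSON) rather than the Entity object itself, and re-wrap it with client.get() after loading. The dict layout below is a stub for illustration.

```python
import pickle

# Pickle only plain data; the Entity's Client holds an unpicklable
# WeakValueDictionary, but its ID and raw JSON round-trip fine.
state = {'id': 'Q42',
         'data': {'labels': {'en': {'value': 'Douglas Adams'}}}}
blob = pickle.dumps(state)
restored = pickle.loads(blob)
print(restored['id'])  # Q42
```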
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
from wikidata.client import Client
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/Wikidata-0.6.1-py2.7.egg/wikidata/client.py", line 86
base_url: str=WIKIDATA_BASE_URL,
^
SyntaxError: invalid syntax
(The library uses Python 3 function annotations such as base_url: str, so it cannot be imported under Python 2.)
Hey I wanted to use wikidata previously and wrote my own little script. Would you accept a pull request if I polished that up a bit?
I don't really understand why you use all those different modules and programming techniques, like caches, or why you need to create a client to do an HTTP request for you. If you could give me a list of features you want, I can try to reach feature parity.
In entity.py (line 42) Babel is used:
return Locale.parse(locale.replace('-', '_'))
However, Babel's parse_locale function in core.py cannot parse the value cbk-zam, which Wikidata (and Wikipedia) use (https://cbk-zam.wikipedia.org/).
You can work around it by manually patching that function in Babel, but for a proper solution a bigger change should probably be considered.
(To reproduce the bug, just get the entity 'Q30' and try to print its label.)
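A defensive sketch of what such a bigger change could look like: fall back to the raw locale string whenever Babel cannot parse it. This is a hypothetical wrapper, not the library's actual code.

```python
def parse_locale_safe(locale):
    """Parse with Babel when possible; otherwise keep the raw tag."""
    try:
        from babel import Locale
        return Locale.parse(locale.replace('-', '_'))
    except Exception:  # UnknownLocaleError, ValueError, or Babel missing
        return locale

print(parse_locale_safe('cbk-zam'))  # falls back to the raw string
```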
Hello,
It seems that the pip requirements are missing the typing dependency:
ImportError: No module named 'typing'
In [1]: from wikidata.client import Client
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-1b52aaa17050> in <module>()
----> 1 from wikidata.client import Client
/home/.../lib/python3.4/site-packages/wikidata/client.py in <module>()
6 import json
7 import logging
----> 8 from typing import (TYPE_CHECKING,
9 Callable, Mapping, MutableMapping, Optional, Sequence,
10 Union, cast)
ImportError: No module named 'typing'
It can be fixed by hand with pip install typing,
but it might be cleaner to declare it in the setup.
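A sketch of that fix in packaging terms, assuming setuptools: declare the typing backport with a PEP 508 environment marker so it is only pulled in on interpreters that predate the stdlib typing module (added in Python 3.5).

```python
# setup.py sketch (assuming setuptools)
from setuptools import setup

setup(
    name='Wikidata',
    install_requires=[
        'typing; python_version < "3.5"',  # backport only where needed
    ],
)
```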
When I run:
wikidata_client = Client()
entity = wikidata_client.get('Q1893392', load=True)
p = wikidata_client.get('P21')
gender = entity[p].label
p = wikidata_client.get('P569')
birthdate = entity[p].label # <---- Fail here
The error is:
Traceback (most recent call last):
File "C:\Python37\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 7, in
birthdate = entity[p].label
File "C:\Python37\lib\site-packages\wikidata\entity.py", line 160, in getitem
result = self.getlist(key)
File "C:\Python37\lib\site-packages\wikidata\entity.py", line 191, in getlist
for snak in (claim['mainsnak'] for claim in claims)]
File "C:\Python37\lib\site-packages\wikidata\entity.py", line 191, in
for snak in (claim['mainsnak'] for claim in claims)]
File "C:\Python37\lib\site-packages\wikidata\client.py", line 178, in decode_datavalue
return decode(self, datatype, datavalue)
File "C:\Python37\lib\site-packages\wikidata\datavalue.py", line 127, in call
return method(client, datavalue)
File "C:\Python37\lib\site-packages\wikidata\datavalue.py", line 210, in time
datavalue
wikidata.datavalue.DatavalueError: 9: time precision other than 11 or 14 is unsupported: {'type': 'time', 'value': {'time': '+1905-01-01T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0, 'precision': 9, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}}
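A workaround sketch (my helper, not the library's API): read the raw time datavalue and degrade gracefully. In the Wikibase data model, precision 9 means "year", and the signed prefix of the time string is the year itself.

```python
def coarse_time(datavalue):
    """Return a year string for precision-9 values, else the raw time."""
    value = datavalue['value']
    time_str = value['time']                       # '+1905-01-01T00:00:00Z'
    year = int(time_str[:time_str.index('-', 1)])  # sign + year digits
    if value['precision'] == 9:                    # year precision
        return str(year)
    return time_str

sample = {'type': 'time',
          'value': {'time': '+1905-01-01T00:00:00Z', 'timezone': 0,
                    'before': 0, 'after': 0, 'precision': 9,
                    'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}}
print(coarse_time(sample))  # 1905
```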
How to get instance_of information?
currently, at least for me, it is not obvious
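With the live library, "instance of" is property P31, so `entity[client.get('P31')]` should resolve it. As an offline sketch on a stub of the raw entity JSON, the P31 claim targets can also be read directly (the helper name is mine):

```python
def instance_of_ids(entity_data):
    """Item IDs targeted by the entity's P31 ('instance of') claims."""
    claims = entity_data.get('claims', {}).get('P31', [])
    return [c['mainsnak']['datavalue']['value']['id'] for c in claims]

sample = {'claims': {'P31': [
    {'mainsnak': {'datavalue': {'value': {'id': 'Q5'},
                                'type': 'wikibase-entityid'}}}]}}
print(instance_of_ids(sample))  # ['Q5']
```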
from wikidata.client import Client
client = Client()
entity = client.get('20145', load=True)
Traceback (most recent call last):
File "", line 1, in
File "C:\ProgramData\Anaconda3\lib\site-packages\wikidata\client.py", line 139, in get
entity.load()
File "C:\ProgramData\Anaconda3\lib\site-packages\wikidata\entity.py", line 239, in load
result = self.client.request(url)
File "C:\ProgramData\Anaconda3\lib\site-packages\wikidata\client.py", line 193, in request
response = self.opener.open(url)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
(Note that '20145' lacks the 'Q' prefix; entity IDs must be passed as 'Q20145', otherwise the request URL is malformed and the API answers 400.)
Hello,
I've been running this package for more than two weeks on a project, but for the past few days I get this error on all of my machines (Ubuntu 20.04). It seems to me that Wikidata changed some sort of server port, address, or API.
File ".../lib/python3.9/site-packages/wikidata/client.py", line 140, in get
entity.load()
File ".../lib/python3.9/site-packages/wikidata/entity.py", line 261, in load
result = self.client.request(url)
File ".../lib/python3.9/site-packages/wikidata/client.py", line 200, in request
raise e
File ".../lib/python3.9/site-packages/wikidata/client.py", line 194, in request
response = self.opener.open(url)
File "/usr/lib/python3.9/urllib/request.py", line 523, in open
response = meth(req, response)
File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
response = self.parent.error(
File "/usr/lib/python3.9/urllib/request.py", line 561, in error
return self._call_chain(*args)
File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/usr/lib/python3.9/urllib/request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
As in the title, is there a way to get the "also known as" field in Wikidata?
Thanks
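"Also known as" corresponds to the 'aliases' field of the raw entity JSON (with the live library, that JSON is what the entity holds after load()). An offline sketch on a stub, with a helper name of my own:

```python
def aliases(entity_data, lang='en'):
    """Alias strings for one language from the raw entity JSON."""
    return [a['value'] for a in entity_data.get('aliases', {}).get(lang, [])]

sample = {'aliases': {'en': [{'language': 'en', 'value': 'Douglas Noel Adams'},
                             {'language': 'en', 'value': 'DNA'}]}}
print(aliases(sample))  # ['Douglas Noel Adams', 'DNA']
```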
Hi, I'm new to wikidata in general.
I can't figure out how to get the start/end dates of someone who held office.
For example, president George Washington (Q23), there is a property P39 (offices held) and within that I can get that he was a president (Q11696). But that item itself has no start/end date property (P580, P582). However, on the wikidata page there is definitely those properties for each office held.
In my code I'm doing something like:
p39 = client.get('P39')
george = client.get('Q23')
george.get(p39)
which returns <wikidata.entity.Entity Q11696 'President of the United States'>
which is my dead end. This is the same object you'd get if I just did
client.get('Q11696')
because the returned item has no relation back to the original item Q23.
Even the SPARQL to figure out the dates for a held office is kinda crazy. You have to do something like:
?pres p:P39 ?position_held_statement .
?position_held_statement ps:P39 wd:Q11696 .
?position_held_statement pq:P580 ?start .
any help is appreciated. thanks.
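The start/end dates live as qualifiers on the P39 statement itself, not on the target item, so they have to be read from the raw claim JSON. An offline sketch on a stub of such a statement (the helper is mine):

```python
def qualifier_times(claim, qualifier_id):
    """Time strings of a given qualifier (e.g. P580) on one statement."""
    quals = claim.get('qualifiers', {}).get(qualifier_id, [])
    return [q['datavalue']['value']['time'] for q in quals]

claim = {'mainsnak': {'datavalue': {'value': {'id': 'Q11696'}}},
         'qualifiers': {'P580': [{'datavalue': {'value':
             {'time': '+1789-04-30T00:00:00Z', 'precision': 11}}}]}}
print(qualifier_times(claim, 'P580'))  # ['+1789-04-30T00:00:00Z']
```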
Some URLs contain special characters, so it is better to quote the path before constructing the URL for the request.
In client.py, line 189,
change to: url = urllib.parse.urljoin(self.base_url, urllib.parse.quote(path))
0.7.0
When sending several (10~100) requests in a row, some requests fail, without determinism, with the following error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Upon closer investigation, the actual response is a 429, "Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy", and:
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
.footer { clear: both; margin-top: 14%; border-top: 1px solid #e5e5e5; background: #f9f9f9; padding: 2em 0; font-size: 0.8em; text-align: center; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
.content-text { overflow: hidden; overflow-wrap: break-word; word-wrap: break-word; -webkit-hyphens: auto; -moz-hyphens: auto; -ms-hyphens: auto; hyphens: auto; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645ad; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: sans-serif; }
.text-muted { color: #777; }
</style>
<div class="content" role="main">
<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf-logo.png" srcset="https://www.wikimedia.org/static/images/wmf-logo-2x.png 2x" alt="Wikimedia" width="135" height="101">
</a>
<h1>Error</h1>
<div class="content-text">
<p>Our servers are currently under maintenance or experiencing a technical problem.
Please <a href="" title="Reload this page" onclick="window.location.reload(false); return false">try again</a> in a few minutes.</p>
<p>See the error message at the bottom of this page for more information.</p>
</div>
</div>
<div class="footer"><p>If you report this error to the Wikimedia System Administrators, please include the details below.</p><p class="text-muted"><code>Request from 122.216.10.145 via cp5012 cp5012, Varnish XID 477962109<br>Upstream caches: cp5012 int<br>Error: 429, Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy at Sun, 17 Jul 2022 22:28:20 GMT</code></p>
</div>
</html>
This library doesn't follow Wikimedia's user-agent policy, specifically:
<client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]
. Parts that are not applicable can be omitted.
which leads to temporary rate limiting/blacklisting of the agent:
Requests from disallowed user agents may instead encounter a less helpful error message like this:
Our servers are currently experiencing a technical problem. Please try again in a few minutes.
See also: https://meta.wikimedia.org/wiki/User-Agent_policy
Set a User-Agent header compliant with the above policy, e.g.:
>>> import urllib.request
>>> od = urllib.request.OpenerDirector()
>>> od.addheaders
[('User-agent', 'Python-urllib/3.9')]
>>>
>>> import wikidata
>>> wikidata.__version__
'0.7.0'
>>>
>>> import sys
>>> od.addheaders = [
...     ("Accept", "application/sparql-results+json"),
...     ("User-Agent", "wikidata-based-bot/%s (https://github.com/dahlia/wikidata ; [email protected]) python/%s.%s.%s Wikidata/%s" % (wikidata.__version__, sys.version_info.major, sys.version_info.minor, sys.version_info.micro, wikidata.__version__)),
... ]
>>>
>>> od.addheaders
[('Accept', 'application/sparql-results+json'), ('User-Agent', 'wikidata-based-bot/0.7.0 (https://github.com/dahlia/wikidata ; [email protected]) python/3.9.13 Wikidata/0.7.0')]
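A sketch of wiring a policy-compliant User-Agent into the library, assuming the Client exposes the `opener` attribute seen in the tracebacks above. The bot name, URL, and mail address are placeholders you must replace with your own.

```python
import urllib.request

# Placeholder identity; the Wikimedia policy wants client/version plus
# contact information in the User-Agent header.
ua = ('my-wikidata-bot/1.0 (https://example.org/bot; [email protected]) '
      'Wikidata/0.7.0')
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', ua)]   # must be a list of tuples
# client = Client(); client.opener = opener  # wire it into the client
print(dict(opener.addheaders)['User-Agent'])
```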
Code:
client = Client() # doctest: +SKIP
entity = client.get('Q317521', load=True)
client = Client()
prop = client.get('P69', load=True)
res = entity.getlist(prop)
print(res[0])
print(res[0].label)
client = Client()
start_time = res[0].get('P580')
print(start_time)
entity2 = client.get('Q105424537', load=True)
prop2 = client.get('P580', load=True)
print(prop2)
print(prop2.label)
print(type(prop2))
print(prop2.get('start time'))
print(prop2.get('P580'))
Output:
<wikidata.entity.Entity Q105424537>
Smith School of Business
None
<wikidata.entity.Entity P580 'start time'>
start time
<class 'wikidata.entity.Entity'>
None
None
(The start time is stored as a qualifier on the P69 statement itself, not as a property of the target entity Q105424537, which is why .get('P580') on it returns None.)
After installing and running the demonstration code I get the following error after the first two lines
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)>
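A workaround sketch: hand the HTTPS handler an explicit SSL context. Whether the client picks up the globally installed opener is an assumption here; assigning the opener to client.opener (the attribute seen in the tracebacks above) is the direct route.

```python
import ssl
import urllib.request

ctx = ssl.create_default_context()
# ctx.load_verify_locations(certifi.where())  # if using the certifi package
opener = urllib.request.build_opener(
    urllib.request.HTTPSHandler(context=ctx))
urllib.request.install_opener(opener)  # affects urlopen-based requests
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```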
Example code:
from wikidata.client import Client
client = Client()
hong_kong = client.get('Q8646')
locator_map_image = client.get('P242')
hong_kong.getlist(locator_map_image)
Error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/lib/python3.6/site-packages/wikidata/entity.py", line 191, in getlist
for snak in (claim['mainsnak'] for claim in claims)]
File "/lib/python3.6/site-packages/wikidata/entity.py", line 191, in <listcomp>
for snak in (claim['mainsnak'] for claim in claims)]
KeyError: 'datavalue'
I believe that this is due to the no value entry in the list for the locator map image of Hong Kong. In this case it appears to be an error in the underlying data, but the function should handle that gracefully. Since the docstring says "Return all values associated to the given key property in sequence.", such no value entries should probably simply get ignored.
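A sketch of the graceful handling the docstring implies: skip snaks whose snaktype is not 'value' ('novalue' and 'somevalue' snaks carry no 'datavalue' key at all). This is my filter, not the library's code.

```python
def values_only(claims):
    """Datavalues of the claims that actually carry a value."""
    return [c['mainsnak']['datavalue']
            for c in claims
            if c['mainsnak'].get('snaktype') == 'value']

claims = [{'mainsnak': {'snaktype': 'novalue'}},
          {'mainsnak': {'snaktype': 'value',
                        'datavalue': {'type': 'string', 'value': 'x.svg'}}}]
print(len(values_only(claims)))  # 1
```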
from wikidata.client import Client
from wikidata.client import Entity
client = Client()
entity = client.get("Q220", load=True)
for _, values in entity.iterlists():
print(values)
---------------------------------------------------------------------------
DatavalueError Traceback (most recent call last)
<ipython-input-39-8af3489602dd> in <module>()
10 yield str(value.label)
11
---> 12 list(generate_related_entities_labels("Q220"))
6 frames
/usr/local/lib/python3.7/dist-packages/wikidata/datavalue.py in time(self, client, datavalue)
166 if cal != 'http://www.wikidata.org/entity/Q1985727':
167 raise DatavalueError('{!r} is unsupported calendarmodel for time '
--> 168 'datavalue'.format(cal), datavalue)
169 try:
170 time = value['time']
DatavalueError: 'http://www.wikidata.org/entity/Q1985786' is unsupported calendarmodel for time datavalue: {'type': 'time', 'value': {'time': '-0753-04-21T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0, 'precision': 11, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985786'}}
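A tolerant sketch of how the decoder could behave instead: accept the Julian calendar model (Q1985786) and hand the raw time string back to the caller rather than raising. This is a hypothetical replacement, not the library's decoder.

```python
JULIAN = 'http://www.wikidata.org/entity/Q1985786'
GREGORIAN = 'http://www.wikidata.org/entity/Q1985727'

def lenient_time(datavalue):
    """Return the raw time string; caller converts Julian dates if needed."""
    value = datavalue['value']
    if value['calendarmodel'] not in (GREGORIAN, JULIAN):
        raise ValueError('unsupported calendar model')
    return value['time']

sample = {'type': 'time', 'value': {'time': '-0753-04-21T00:00:00Z',
          'precision': 11, 'calendarmodel': JULIAN}}
print(lenient_time(sample))  # -0753-04-21T00:00:00Z
```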
See, for instance:
client = Client()
client.get("Q5", load=True).lists()
*** babel.core.UnknownLocaleError: unknown locale 'io'
I know that I can manually access all the data, but it'd be nice if .lists()
was a bit more robust...
Hi, is there any way to retrieve an entity's property's name? eg.
from wikidata.client import Client
client = Client()
entity = client.get(wiki_id)
nationality_prop = client.get('P27',load=True)
nationality = entity[nationality_prop]
I would like to retrieve entity's country of citizenship's value instead of description
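The value shown on the Wikidata page is the target entity's label, so with the live library `nationality.label` should give it. An offline sketch on a stub of the target's raw JSON (the helper name is mine):

```python
def label_of(entity_data, lang='en'):
    """The entity's display label for one language."""
    return entity_data['labels'][lang]['value']

target = {'labels': {'en': {'value': 'France'}}}
print(label_of(target))  # France
```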
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
from wikidata.client import Client
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named 'wikidata'
This date of birth is just the year 1950 instead of the full date. How would I get the partial data 1950 without this exception?
>>> from wikidata.client import Client
>>> client = Client() # doctest: +SKIP
>>> prop_dob = client.get('P569')
>>> entity = client.get('Q4794599')
>>> entity[prop_dob]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/z/.local/lib/python3.5/site-packages/wikidata/entity.py", line 160, in __getitem__
result = self.getlist(key)
File "/home/z/.local/lib/python3.5/site-packages/wikidata/entity.py", line 191, in getlist
for snak in (claim['mainsnak'] for claim in claims)]
File "/home/z/.local/lib/python3.5/site-packages/wikidata/entity.py", line 191, in <listcomp>
for snak in (claim['mainsnak'] for claim in claims)]
File "/home/z/.local/lib/python3.5/site-packages/wikidata/client.py", line 178, in decode_datavalue
return decode(self, datatype, datavalue)
File "/home/z/.local/lib/python3.5/site-packages/wikidata/datavalue.py", line 127, in __call__
return method(client, datavalue)
File "/home/z/.local/lib/python3.5/site-packages/wikidata/datavalue.py", line 210, in time
datavalue
wikidata.datavalue.DatavalueError: 9: time precision other than 11 or 14 is unsupported: {'type': 'time', 'value': {'precision': 9, 'timezone': 0, 'after': 0, 'before': 0, 'time': '+1950-01-01T00:00:00Z', 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}}
Hello,
Firstly this is not an issue with the code. Sorry for posting this under issues.
I wanted to get some additional documentation on using the module.
Suppose I want to get a list of all the entities with a certain property.
The current documentation Wikidata client library for Python only shows how to get a property given an entity.
I'm trying to find if I can get a list of entities with the mentioned property.
e.g. get a list of countries (entities) given the property for country 'P17'.
I'm not sure if this already exists. Please let me know if it exists.
I would appreciate if the module had an improved documentation Issue #10 .
Thanks,
@sobalgi.
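As far as I can tell, the client itself is entity-oriented and has no reverse lookup; "all entities with property X" queries are normally sent to the Wikidata Query Service SPARQL endpoint instead. A sketch that only builds the request URL (performing the request needs network access):

```python
import urllib.parse

SPARQL_ENDPOINT = 'https://query.wikidata.org/sparql'
query = 'SELECT ?item WHERE { ?item wdt:P17 ?value . } LIMIT 10'
url = SPARQL_ENDPOINT + '?' + urllib.parse.urlencode(
    {'query': query, 'format': 'json'})
# urllib.request.urlopen(url) would then return the matches as JSON.
print(url.startswith('https://query.wikidata.org/sparql?query='))  # True
```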
I am looping through entities and looking up multiple properties for each (7 in my real project, 3 in the attached toy example). Each property slows it down, so it will take hours to go through all the entities. Is there a way to speed this up please?
from wikidata.client import Client
client = Client() # doctest: +SKIP
p_givenname = client.get('P735')
p_surname = client.get('P734')
p_dob = client.get('P569')
def get_entity(wikidata_id):
    entity = client.get(wikidata_id, load=True)
    givenname = entity[p_givenname].label
    surname = entity[p_surname].label
    dob = entity[p_dob]
    print('%s %s %s' % (givenname, surname, dob))
w_ids = ['Q498805',
'Q482745',
'Q186',
'Q1363428',
'Q299700',
'Q196223',
'Q488828',
'Q490120']
import datetime as dt
n0 = dt.datetime.now()
for w_id in w_ids:
    get_entity(w_id)
n1 = dt.datetime.now()
print ('elapsed time: ', n1 - n0)
print ('record count: ', len(w_ids))
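One way to cut the per-entity round trips: the MediaWiki API's wbgetentities action accepts up to 50 IDs per call, so one request can replace dozens of individual loads. A sketch that only builds such a batched request URL:

```python
import urllib.parse

API = 'https://www.wikidata.org/w/api.php'
ids = ['Q498805', 'Q482745', 'Q186']
url = API + '?' + urllib.parse.urlencode({
    'action': 'wbgetentities',
    'ids': '|'.join(ids),       # one batched request for all entities
    'props': 'labels|claims',
    'format': 'json',
})
# json.load(urllib.request.urlopen(url)) then yields {'entities': {...}}.
print('action=wbgetentities' in url)  # True
```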
e.g. something like this:
id = client.get_id('Physics')
id = 'Q2902'
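As far as I can tell there is no such label lookup in the client, but the MediaWiki API's wbsearchentities action provides it. A sketch that only builds the request URL (no network):

```python
import urllib.parse

API = 'https://www.wikidata.org/w/api.php'
url = API + '?' + urllib.parse.urlencode({
    'action': 'wbsearchentities',
    'search': 'Physics',
    'language': 'en',
    'format': 'json',
})
# The JSON response's 'search' list holds candidate IDs, e.g. 'Q413'.
print('action=wbsearchentities' in url)  # True
```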
Hi, wondering if it's possible to do a .get but for English language data points only? I noticed a majority of the data being downloaded is non-English and probably won't be useful for my use case.
Thanks!
Hi. Is this library GPLv3 or GPLv3+ (=or later)?
When fetching the date of birth (P569) of Pythagoras (item Q10261) I got a DatavalueError: ... unsupported calendarmodel for time datavalue. Can you add/adjust the calendar model to support dates of birth BCE? Many thanks.
As in the title, can I get all properties of an entity? Thanks.
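A sketch on a stub of the raw entity JSON: all property IDs of an entity are the keys of its 'claims' dict (with the live library, entity.lists() and iterlists() walk the same data). The helper name is mine.

```python
def property_ids(entity_data):
    """Sorted property IDs present on the entity."""
    return sorted(entity_data.get('claims', {}))

sample = {'claims': {'P31': [], 'P569': [], 'P18': []}}
print(property_ids(sample))  # ['P18', 'P31', 'P569']
```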