wdscholia / scholia

Wikidata-based scholarly profiles

Home Page: https://scholia.toolforge.org

License: Other

Python 16.78% HTML 10.60% JavaScript 67.34% CSS 0.30% Jupyter Notebook 4.93% Dockerfile 0.04%

wikidata scientometrics bibliography code4lib sparql bibliometrics fairdata citations wikicite datacuration dataviz linked-open-data latex bibtex hacktoberfest literature

scholia's Introduction

Scholia is a Python package and webapp for interacting with scholarly information in Wikidata.

Installation

Scholia can be installed directly from GitHub with:

$ python3 -m pip install git+https://github.com/WDscholia/scholia

It can be installed in development mode with:

$ git clone https://github.com/WDscholia/scholia
$ cd scholia
$ pip install --editable .

Webapp

As a webapp, it currently runs from Wikimedia Toolforge, a facility provided by the Wikimedia Foundation. It is accessible from

https://scholia.toolforge.org/

The webapp displays scholarly profiles for individual researchers, research topics, organizations, journals, works, events, awards and so on. For instance, the scholarly profile for psychologist Uta Frith is accessible from

https://scholia.toolforge.org/author/Q8219

The information displayed on the page is only what is available in Wikidata.

Run locally after installing with pip:

$ scholia run

Script

It is possible to use methods of the scholia package as a script:

$ python -m scholia.query twitter-to-q fnielsen
Q20980928

Contributing

A simple way to get up and running is to launch Scholia via Gitpod, which installs the dependencies listed in requirements.txt automatically and launches the web app via runserver.py.

See file CONTRIBUTING.rst for technical details on how to improve Scholia.

References

Scholia's page about itself: https://scholia.toolforge.org/topic/Q45340488
Wikidata overview page about Scholia: https://www.wikidata.org/wiki/Wikidata:Scholia
Lane Rasberry, Egon Willighagen, Finn Nielsen, Daniel Mietchen, "Robustifying Scholia: paving the way for knowledge discovery and research assessment through Wikidata", Research Ideas and Outcomes, 2019, 5: e35820. https://doi.org/10.3897/rio.5.e35820
Finn Årup Nielsen, Daniel Mietchen, Egon Willighagen, "Scholia and scientometrics with Wikidata", Joint Proceedings of the 1st International Workshop on Scientometrics and 1st International Workshop on Enabling Decentralised Scholarly Communication, 2017. http://ceur-ws.org/Vol-1878/article-03.pdf
Finn Årup Nielsen, Daniel Mietchen, Egon Willighagen, "Scholia, Scientometrics and Wikidata", The Semantic Web: ESWC 2017 Satellite Events, 2017. DOI: 10.1007/978-3-319-70407-4_36. https://link.springer.com/content/pdf/10.1007%2F978-3-319-70407-4_36.pdf

scholia's Issues

Handle taxa hierarchy in the topic aspect

For Work also list a table with citing articles sorted by number of citations

Like you get on Google Scholar... this is the query that can be used:

#defaultView:Table
SELECT ?work ?work_label (count(?citing_work) as ?count) WHERE {
  ?work wdt:P2860 wd:Q27065423 .
  ?citing_work wdt:P2860 ?work . 
  ?work rdfs:label ?work_label . filter (lang(?work_label) = 'en')
} group by ?work ?work_label
order by desc(?count)
limit 20

Where you can replace wd:Q27065423 with $work
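
A minimal sketch of that substitution, assuming the query is kept as a template string on the Python side. Python's string.Template happens to use the same $work placeholder syntax. The helper name citing_works_query and the identifier check are hypothetical and only for illustration; they are not taken from the Scholia code base.

```python
import re
from string import Template

# Query from the issue text, with the example item replaced by $work.
CITING_WORKS = Template("""\
SELECT ?work ?work_label (count(?citing_work) as ?count) WHERE {
  ?work wdt:P2860 wd:$work .
  ?citing_work wdt:P2860 ?work .
  ?work rdfs:label ?work_label . filter (lang(?work_label) = 'en')
} group by ?work ?work_label
order by desc(?count)
limit 20
""")


def citing_works_query(qid):
    """Return the query for one work (hypothetical helper, not Scholia API)."""
    # Only accept plain Wikidata item identifiers such as Q27065423,
    # so that arbitrary text cannot be injected into the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError("not a Wikidata item identifier: %r" % qid)
    return CITING_WORKS.substitute(work=qid)
```

Calling citing_works_query("Q27065423") gives back the query shown above for that item.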

Provide link to full text (if available)

Maybe you can provide a link to the full text (P953), if that is given in the record, just below the title?

most cited articles for a journal (venue)

Use this query:

SELECT ?work ?work_label (COUNT(?citing_work) AS ?count) WHERE {
  ?work wdt:P1433 wd:Q6294930.
  ?citing_work wdt:P2860 ?work.
  ?work rdfs:label ?work_label.
  FILTER((LANG(?work_label)) = "en")
}
GROUP BY ?work ?work_label
ORDER BY DESC(?count)
LIMIT 20

Using the Journal of Cheminformatics (wd:Q6294930) as an example.

Histogram view for venue

Histogram view for venue per suggestion of @egonw https://twitter.com/egonwillighagen/status/835776043239817216

Add search to toolbar

The search form should be visible on all pages.

Versioneer

Add versioneer

Use labels instead of URI in map in authors

The selector uses URIs, see https://tools.wmflabs.org/scholia/author/Q18921408 for instance. Labels should be used.

Use arXiv API instead of HTML

Change get_metadata in arxiv.py to use API.

Citation network for topics

UTF-8 problem with bibtex

Scholia generates bibtex files with UTF-8 characters that bibtex cannot understand. Bibtex's sorting and author name extraction fail. Either users are forced to switch to biber or something must be done, e.g., a conversion table and accent conversion.
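
One way the conversion-table idea could look, as a rough sketch only: decompose each character with unicodedata and map the combining mark to a LaTeX accent command. The table covers just a handful of accents and letters, and the name to_bibtex_ascii is hypothetical rather than an existing Scholia function.

```python
import unicodedata

# Combining mark -> LaTeX accent command (small illustrative subset).
ACCENTS = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis
    "\u030a": "r",   # ring above
    "\u030c": "v",   # caron
    "\u0327": "c",   # cedilla
}

# Letters that do not decompose into base letter plus combining mark.
SPECIAL = {
    "ø": "{\\o}", "Ø": "{\\O}",
    "æ": "{\\ae}", "Æ": "{\\AE}",
    "ß": "{\\ss}",
}


def to_bibtex_ascii(text):
    """Replace known accented characters with LaTeX escapes (hypothetical helper)."""
    out = []
    for char in text:
        if char in SPECIAL:
            out.append(SPECIAL[char])
            continue
        decomposed = unicodedata.normalize("NFD", char)
        if len(decomposed) == 2 and decomposed[1] in ACCENTS:
            # e.g. A + ring above becomes {\r{A}}
            out.append("{\\%s{%s}}" % (ACCENTS[decomposed[1]], decomposed[0]))
        else:
            # Unknown characters are passed through unchanged.
            out.append(char)
    return "".join(out)
```

Characters outside the table are left as they are, so this reduces the problem rather than removing it.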

Collaborating organization for organizations

Collaborating organization for organizations suggested by Egon Willighagen https://twitter.com/egonwillighagen/status/807230336224677888

select (count(distinct ?work) as ?count) ?affiliation2 ?affiliation2_label where {
  ?work (p:P50|p:P2093) ?author_statement . 
  ?author_statement ps:P50 wd:Q20895241 .
  ?author_statement pq:P1416 wd:Q19845644 .
  {
    ?work p:P2093 ?author2_statement . 
    ?author2_statement ps:P2093 ?author2 .
    ?author2_statement pq:P1416 ?affiliation2 .
  } UNION {
    ?work p:P50 ?author2_statement . 
    ?author2_statement ps:P50 ?author2 .
    ?author2_statement pq:P1416 ?affiliation2 .
  }
  ?affiliation2 rdfs:label ?affiliation2_label . filter (lang(?affiliation2_label) = 'en')
  FILTER (?affiliation2 != wd:Q19845644)
  FILTER (?author2 != wd:Q20895241)
} group by ?affiliation2 ?affiliation2_label
order by desc(?count) asc(?affiliation2_label)

Show all authors in list of publications and sorted

The list of publications shows only P50 authors, not P2093 authors, and the authors are not sorted according to P1545.
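
A small sketch of the sorting step, assuming the P50 and P2093 authors have already been fetched and merged into (name, series ordinal) pairs. P1545 values are strings in Wikidata, so they are converted to integers before comparing. The function order_authors is hypothetical.

```python
def order_authors(authors):
    """Sort (name, series_ordinal) pairs; authors without a usable ordinal go last.

    Hypothetical helper: the input is assumed to combine P50 and P2093
    authors, each with the P1545 qualifier value as a string or None.
    """
    def key(pair):
        name, ordinal = pair
        if ordinal is not None and ordinal.isdigit():
            # Numeric comparison, so that "10" sorts after "2".
            return (0, int(ordinal), name)
        return (1, 0, name)

    return [name for name, _ in sorted(authors, key=key)]
```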

Page with sponsor

Scholia could have a page with sponsor/funder.

Most prolific authors in a venue

Again with JCheminf as example:

SELECT ?author ?author_label (count(?work) as ?count) WHERE {
  ?work wdt:P50 ?author ;
        wdt:P1433 wd:Q6294930.
  ?author rdfs:label ?author_label . filter (lang(?author_label) = 'en')
} group by ?author ?author_label
order by desc(?count)
limit 20

Collaborating institutions for author

Collaborating institutions for author suggested by Egon Willighagen https://twitter.com/egonwillighagen/status/807230510267244544

select (count(distinct ?work) as ?count) ?affiliation2 ?affiliation2_label where {
  ?work (p:P50|p:P2093) ?author_statement . 
  ?author_statement ps:P50 wd:Q20895241 .
  ?author_statement pq:P1416 ?affiliation .
  ?affiliation rdfs:label ?affiliation_label . filter (lang(?affiliation_label) = 'en')
  {
    ?work p:P2093 ?author2_statement . 
    ?author2_statement ps:P2093 ?author2 .
    ?author2_statement pq:P1416 ?affiliation2 .
  } UNION {
    ?work p:P50 ?author2_statement . 
    ?author2_statement ps:P50 ?author2 .
    ?author2_statement pq:P1416 ?affiliation2 .
  }
  ?affiliation2 rdfs:label ?affiliation2_label . filter (lang(?affiliation2_label) = 'en')
  FILTER (?affiliation2 != ?affiliation)
  FILTER (?author2 != wd:Q20895241)
} group by ?affiliation2 ?affiliation2_label
order by desc(?count) asc(?affiliation2_label)

Additional aspects

occupations
academic fields
methods

Add articles about person/ organization/ funder/ journal/ etc.

Such that articles like
https://www.wikidata.org/wiki/Q21145375 (i.e. for which there is a P921 statement)
pop up in the scholia view for the objects of those P921 statements, i.e.
https://tools.wmflabs.org/scholia/author/Q92714
and
https://tools.wmflabs.org/scholia/author/Q4829027
for the above example.

Use QID for aspects as well

e.g.
https://tools.wmflabs.org/scholia/Q2085381/
as an equivalent to
https://tools.wmflabs.org/scholia/publisher/ .

In query results, provide option to replace Wikidata links with Scholia links

Allow traveling from one Scholia page to another one by replacing - optionally - the links to Wikidata items with links to Scholia profiles.
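
A possible shape for the link rewriting, as a sketch under assumptions: the aspect (author, topic, and so on) has to be supplied by the caller, because a bare Wikidata URL does not say which Scholia aspect fits the item. The function name to_scholia_url is hypothetical.

```python
import re

# Matches both page URLs (/wiki/Q...) and entity URIs (/entity/Q...).
WIKIDATA_ITEM = re.compile(
    r"^https?://www\.wikidata\.org/(?:wiki|entity)/(Q[1-9][0-9]*)$"
)


def to_scholia_url(url, aspect, base="https://scholia.toolforge.org"):
    """Rewrite a Wikidata item URL to a Scholia URL (hypothetical helper).

    URLs that do not point to a Wikidata item are returned unchanged.
    """
    match = WIKIDATA_ITEM.match(url)
    if match is None:
        return url
    return "%s/%s/%s" % (base, aspect, match.group(1))
```

For example, to_scholia_url("https://www.wikidata.org/wiki/Q8219", "author") gives the Uta Frith profile URL mentioned in the README.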

Expand scholia template on Wikidata

I set up a very basic version of a Scholia template:
https://www.wikidata.org/wiki/Template:Scholia ,
test corpus at
https://www.wikidata.org/wiki/User:Daniel_Mietchen/Bioinformatics .

Add co-funders to sponsor page

For instance:

# Co-funders
select (count(?work) as ?count) ?sponsor ?sponsorLabel where {
  ?work wdt:P859 wd:Q22329431 .
  ?work wdt:P859 ?sponsor .
  filter (wd:Q22329431 != ?sponsor)
  service wikibase:label { bd:serviceParam wikibase:language "en,de,fr,es,ja,it,da,sv,zh" }
}
group by ?sponsor ?sponsorLabel
order by desc(?count)

Citation per year for author

Citation per year for author suggested by Egon Willighagen https://twitter.com/egonwillighagen/status/807281048006651904

#defaultView:BarChart
#defaultView:Table
SELECT ?year (count(distinct ?citing_work) as ?count) WHERE {
  ?work wdt:P50 wd:Q20895241 .
  ?citing_work wdt:P2860 ?work . 
  # to remove self-citations: minus { ?citing_work wdt:P50 wd:Q20895241 }
  ?citing_work wdt:P577 ?date .
  BIND(str(YEAR(?date)) AS ?year)
} group by ?year
order by desc(?year)

maybe max(?birthdate)?

The SPARQL queries for the "venue" aspect list the birthdates of workers, and use:

(min(?birthdates) as ?birthdate)

But if multiple dates are given for someone, such as one with only a year, that date defaults to January 1, which is the minimum. So it seems to me that "min(?birthdates)" returns the least precise information?

Missing distinct in Co-occurring topics

Non-humans should have curly braces in bibtex generation

Display license icons in bibliography

For references whose Wikidata item has P275 set to an open license (i.e. CC BY or CC BY-SA) or CC0/PD, display the corresponding Creative Commons icon.

Show list of co-topics

Basic error handling

If a SPARQL query times out, this currently produces an error that Scholia should catch and replace with a useful boilerplate message.

Likewise, if the SPARQL query produced no results, this should be stated in a standardized fashion.
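
A sketch of how both cases could be turned into standard messages, assuming the response follows the SPARQL JSON results layout (results, then bindings) and that the caller already knows whether the request timed out. The function name and the message texts are hypothetical.

```python
def result_message(results, timed_out=False):
    """Return a boilerplate message, or None when there are rows to show.

    Hypothetical helper: `results` is the parsed JSON of a SPARQL
    response (or None), `timed_out` is set by the calling code.
    """
    if timed_out:
        return "The query timed out. Please try again later."
    if not results:
        bindings = []
    else:
        bindings = results.get("results", {}).get("bindings", [])
    if not bindings:
        return "The query returned no results."
    return None
```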

Suborganization affiliation should be one or more

Suborganization affiliation should be one or more: ?suborganization wdt:P361 wd:Q1269766 to ?suborganization wdt:P361+ wd:Q1269766

iframe heights are too big for some queries

iframe heights are too big for some queries, leaving a lot of white space.

Check paths for organization queries

For instance, Max Planck Society
https://tools.wmflabs.org/scholia/organization/Q158085
should aggregate from across the different Max Planck Institutes but currently does not.

Display license information in article/ journal/ publisher/ author/ organization aspects

e.g. as per this sample query:

#defaultView:Table
SELECT ?work ?workLabel 
(min(?dates) as ?date) 
(sample(?pages_) as ?pages) 
(sample(?venue_labels) as ?venue) 
(sample(?license_labels) as ?license) 
(group_concat(?author_label; separator=", ") as ?authors)  
WHERE {
  ?work wdt:P50 wd:Q20980928 .
  ?work wdt:P50 ?author .
  ?author rdfs:label ?author_label . filter (lang(?author_label) = 'en')
  
  optional { ?work wdt:P577 ?dates }.
  optional { ?work wdt:P1104 ?pages_ }.
  optional { ?work wdt:P275 ?licenses }.
  optional { ?work wdt:P1433 ?venues . ?venues rdfs:label ?venue_labels . filter (lang(?venue_labels) = 'en') }.
  optional { ?work wdt:P275 ?licenses . ?licenses rdfs:label ?license_labels . filter (lang(?license_labels) = 'en') }.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr,de,ru,es,zh,ja". }  
} group by ?work ?workLabel
order by desc(?date)

Fix zero count issue in SPARQL return for bar charts

For bar chart plotting years with zero count are not rendered. The SPARQL query can be modified to include zero count years.

This issue is also discussed on Stackoverflow: https://stackoverflow.com/questions/40454268/how-can-get-default-return-values-for-a-sparql-query-when-counting
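
The issue proposes changing the SPARQL query itself. As an alternative, not the approach described above, the gaps could also be padded on the client before plotting. The sketch below assumes the counts arrive as a mapping from year strings to integers; fill_missing_years is a hypothetical name.

```python
def fill_missing_years(counts):
    """Return (year, count) pairs for every year in the covered range.

    Hypothetical helper: `counts` maps year strings such as "2015" to
    counts. Years without an entry are added with a count of zero.
    """
    if not counts:
        return []
    years = [int(year) for year in counts]
    return [
        (str(year), counts.get(str(year), 0))
        for year in range(min(years), max(years) + 1)
    ]
```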

"Citations by year" for authors should not be distinct on citing works

Show non-english label if no English label exists

At least for works, as not everything is published in English and translating publication titles seems wrong. Unfortunately I don't know how to support any language. I stumbled upon this because of https://www.wikidata.org/wiki/Q27044176. To support at least some other languages, change

SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

to

SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr,de,ru,es,zh,ja". }

and/or try to get the title via the title property https://www.wikidata.org/wiki/Property:P1476.

Topics for journals

@egonw suggests topics for journals https://twitter.com/egonwillighagen/status/815143507639828480

Look into tweeting Scholia entries

some inspiration in http://kitchingroup.cheme.cmu.edu/blog/2016/08/25/Automated-bibtex-entry-tweeting/ .

Co-authors

Too lazy for pull request; suggest using co-authors in sci articles AND books, using:

  VALUES ?type { wd:Q13442814 wd:Q571 } .
  ?work wdt:P31 ?type .

List of journals in topic pages

A list of relevant journals on the topic page

Switch DOI handling to all-uppercase

as per the recent switch in practice on Wikidata.
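
A minimal sketch of such a normalization, assuming DOIs may arrive with a resolver URL or a "doi:" prefix. The function normalize_doi and the list of prefixes are hypothetical.

```python
# Prefixes that are stripped before uppercasing (an assumed, non-exhaustive list).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value):
    """Strip a known prefix and return the DOI in uppercase (hypothetical helper)."""
    doi = value.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip().upper()
```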

Allow for zooming between specific topic and general area

For instance,
https://tools.wmflabs.org/scholia/topic/Q18123741
lists basically only papers directly annotated with P921=Q18123741, which is good, but it would be good to have an option (think
https://tools.wmflabs.org/scholia/topic/Q18123741&zoom=+2 )
to aggregate results from some subclasses of infectious diseases as well.
Conversely, it would be nice to be able to zoom out to diseases or medicine more generally.

Think about a logo

In
https://www.wikidata.org/wiki/Template:Scholia ,
I have used
https://commons.wikimedia.org/wiki/File:Sonar_tracking_of_tungsten_ball_underneath_research_vessel_for_calibration_(16824332958).jpg but this is probably not what we want in the long term.

citations per author

citations per author:

#defaultView:BarChart
select ?year (sum(?citations_per_author_) as ?citations_per_author) ?researcher_label where {
 {
    select ?researcher_label ?work ?year (count(distinct ?citing_work) / count(distinct ?researcher_of_paper) as ?citations_per_author_) where {
      { ?researcher wdt:P108 wd:Q24283660 . } union { ?researcher wdt:P1416 [ wdt:P361* wd:Q24283660 ] .  }
      ?work (wdt:P50|wdt:P2093) ?researcher_of_paper .
      # ?work wdt:P31 wd:Q13442814 .
      ?work wdt:P50 ?researcher .
      ?citing_work wdt:P2860 ?work .
      ?work wdt:P577 ?date . 
      bind(str(year(?date)) as ?year) 
      ?researcher rdfs:label ?researcher_label . filter(lang(?researcher_label) = 'en')
    } 
    group by ?work ?researcher_label ?year
  }
}
group by ?year ?researcher_label 
order by ?year

{ ?work wdt:P921/wdt:P31*/wdt:P279* wd:Q52 . }
union { ?work wdt:P921/wdt:P361+ wd:Q52 . }
union { ?work wdt:P921/wdt:P1269+ wd:Q52 . }

Property path:

?work wdt:P921/( wdt:P31*/wdt:P279* | wdt:P361+ | wdt:P1269+ ) wd:Q52 .

Specialized search page

Specialized search page for "advanced" search where the category can be specified. It is started here: http://127.0.0.1:8100/search Missing functionality:

Link from the search bar for enter, see #1460
Display the label of the Wikidata item.
Enable paging.
Make the search "intelligent", so if a recognized pattern, e.g., arXiv identifier or DOI is entered/copy-pasted it will search with http://127.0.0.1:8100/arxiv-to-quickstatements
- #1474
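
The pattern recognition in the last item could start out roughly like this. It is only a sketch: the regular expressions are assumptions that cover new-style arXiv identifiers, DOIs and Wikidata QIDs, and classify_query is a hypothetical name, not an existing Scholia function.

```python
import re

# New-style arXiv identifiers such as 1703.04222 or arXiv:1703.04222v2.
ARXIV = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(?:v\d+)?$", re.IGNORECASE)
# DOIs, optionally with a resolver URL or "doi:" prefix.
DOI = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)
QID = re.compile(r"^Q[1-9]\d*$")


def classify_query(text):
    """Return a (kind, value) pair for a search string (hypothetical helper)."""
    text = text.strip()
    match = ARXIV.match(text)
    if match:
        return ("arxiv", match.group(1))
    match = DOI.match(text)
    if match:
        return ("doi", match.group(1).upper())
    if QID.match(text):
        return ("qid", text)
    # Anything else is treated as an ordinary text search.
    return ("text", text)
```

The search view could then dispatch on the first element of the pair, for instance sending "arxiv" results to the arxiv-to-quickstatements page.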