wdscholia / scholia

Wikidata-based scholarly profiles

Home Page: https://scholia.toolforge.org

License: Other

Python 16.78% HTML 10.60% JavaScript 67.34% CSS 0.30% Jupyter Notebook 4.93% Dockerfile 0.04%
wikidata scientometrics bibliography code4lib sparql bibliometrics fairdata citations wikicite datacuration dataviz linked-open-data latex bibtex hacktoberfest literature

scholia's Introduction

Scholia


Scholia is a Python package and webapp for interacting with scholarly information in Wikidata.

Installation

Scholia can be installed directly from GitHub with:

$ python3 -m pip install git+https://github.com/WDscholia/scholia

It can be installed in development mode with:

$ git clone https://github.com/WDscholia/scholia
$ cd scholia
$ pip install --editable .

Webapp

As a webapp, it currently runs from Wikimedia Toolforge, a facility provided by the Wikimedia Foundation. It is accessible from

https://scholia.toolforge.org/

The webapp displays scholarly profiles for individual researchers, research topics, organizations, journals, works, events, awards and so on. For instance, the scholarly profile for psychologist Uta Frith is accessible from

https://scholia.toolforge.org/author/Q8219

The information displayed on the page is only what is available in Wikidata.

Run locally after installing with pip:

$ scholia run

Script

It is possible to use methods of the scholia package as a script:

$ python -m scholia.query twitter-to-q fnielsen
Q20980928
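
The same lookup can be driven from Python by calling the documented command line through the standard library. This is only a sketch: the helper names `twitter_to_q_command` and `twitter_to_q` are hypothetical, and the actual call assumes Scholia is installed and Wikidata is reachable.

```python
import subprocess
import sys


def twitter_to_q_command(twitter_name):
    """Build the argument list for the documented scholia.query script call."""
    return [sys.executable, "-m", "scholia.query", "twitter-to-q", twitter_name]


def twitter_to_q(twitter_name):
    """Run the script and return whatever it prints, stripped of whitespace.

    Hypothetical helper: it needs the scholia package and network access.
    """
    completed = subprocess.run(
        twitter_to_q_command(twitter_name),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return completed.stdout.strip()
```

Calling `twitter_to_q("fnielsen")` runs the command shown above in a subprocess.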

Contributing

A simple way to get up and running is to launch Scholia via Gitpod, which installs the dependencies listed in requirements.txt automatically and launches the web app via runserver.py.

See file CONTRIBUTING.rst for technical details on how to improve Scholia.

scholia's People

Contributors

adafede, alexanderpico, ammar257ammar, awakenting, bittu426, carlinmack, cthoyt, curibe, daniel-mietchen, danielflopez, dartar, daxserver, egonw, faresh9, fehrhart, fgkolf, fnielsen, guyfawcus, khanspers, lahire, larsgw, librerli, maxlath, mkutmon, nichtich, nintendofan885, oolonek, physikerwelt, tholzheim, wolfgangfahl

scholia's Issues

For Work also list a table with citing articles sorted by number of citations

Like you get on Google Scholar... this is the query that can be used:

#defaultView:Table
SELECT ?work ?work_label (count(?citing_work) as ?count) WHERE {
  ?work wdt:P2860 wd:Q27065423 .
  ?citing_work wdt:P2860 ?work . 
  ?work rdfs:label ?work_label . filter (lang(?work_label) = 'en')
} group by ?work ?work_label
order by desc(?count)
limit 20

Where you can replace wd:Q27065423 with $work
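
A minimal sketch of that substitution in Python, using `string.Template` from the standard library so that `$work` is the placeholder. The names `CITING_WORKS_QUERY` and `render_citing_works_query` are made up for illustration, and the Q-identifier check is an added assumption, not something Scholia is known to do this way.

```python
import re
from string import Template

# The query from above, with the example item replaced by the $work placeholder.
CITING_WORKS_QUERY = Template("""\
#defaultView:Table
SELECT ?work ?work_label (count(?citing_work) as ?count) WHERE {
  ?work wdt:P2860 wd:$work .
  ?citing_work wdt:P2860 ?work .
  ?work rdfs:label ?work_label . filter (lang(?work_label) = 'en')
} group by ?work ?work_label
order by desc(?count)
limit 20
""")


def render_citing_works_query(work_qid):
    """Fill in the placeholder after checking the value looks like a Q-identifier."""
    if not re.fullmatch(r"Q[1-9][0-9]*", work_qid):
        raise ValueError("not a Wikidata Q-identifier: %r" % (work_qid,))
    return CITING_WORKS_QUERY.substitute(work=work_qid)
```

Rendering with `"Q27065423"` reproduces the query as written above; the validation keeps arbitrary text out of the SPARQL string.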

most cited articles for a journal (venue)

Use this query:

SELECT ?work ?work_label (COUNT(?citing_work) AS ?count) WHERE {
  ?work wdt:P1433 wd:Q6294930.
  ?citing_work wdt:P2860 ?work.
  ?work rdfs:label ?work_label.
  FILTER((LANG(?work_label)) = "en")
}
GROUP BY ?work ?work_label
ORDER BY DESC(?count)
LIMIT 20

Using the Journal of Cheminformatics (wd:Q6294930) as an example.

UTF-8 problem with bibtex

Scholia generates BibTeX files with UTF-8 characters that BibTeX cannot handle, so BibTeX's sorting and author name extraction fail. Either users are forced to switch to biber, or something must be done, e.g., a conversion table and accent conversion.
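
One way the conversion table and accent conversion could look, as a rough Python sketch. It is not Scholia's implementation: the tables `SPECIAL` and `ACCENTS` and the function `to_bibtex_ascii` are hypothetical and cover only a handful of characters. Anything not covered is passed through unchanged.

```python
import unicodedata

# Characters that get a dedicated LaTeX macro (small, incomplete table).
SPECIAL = {
    "ø": r"{\o}", "Ø": r"{\O}",
    "æ": r"{\ae}", "Æ": r"{\AE}",
    "å": r"{\aa}", "Å": r"{\AA}",
    "ß": r"{\ss}",
}

# Unicode combining marks mapped to LaTeX accent macro names.
ACCENTS = {
    "\u0301": "'",   # acute
    "\u0300": "`",   # grave
    "\u0308": '"',   # diaeresis / umlaut
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u030c": "v",   # caron
    "\u0327": "c",   # cedilla
}


def to_bibtex_ascii(text):
    """Replace known accented characters with brace-protected LaTeX macros."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in SPECIAL:
            out.append(SPECIAL[ch])
        else:
            parts = unicodedata.normalize("NFD", ch)
            if len(parts) == 2 and parts[0].isascii() and parts[1] in ACCENTS:
                macro = ACCENTS[parts[1]]
                if macro.isalpha():
                    # Letter-named macros take a braced argument: {\v{c}}
                    out.append("{\\" + macro + "{" + parts[0] + "}}")
                else:
                    out.append("{\\" + macro + parts[0] + "}")
            else:
                out.append(ch)  # unknown character: leave untouched
    return "".join(out)
```

For example, `to_bibtex_ascii("Müller")` gives `M{\"u}ller`. Wrapping each macro in braces keeps the accented letter as a single unit for BibTeX's name parsing.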

Collaborating organization for organizations

Collaborating organization for organizations suggested by Egon Willighagen https://twitter.com/egonwillighagen/status/807230336224677888

select (count(distinct ?work) as ?count) ?affiliation2 ?affiliation2_label where {
  ?work (p:P50|p:P2093) ?author_statement . 
  ?author_statement ps:P50 wd:Q20895241 .
  ?author_statement pq:P1416 wd:Q19845644 .
  {
    ?work p:P2093 ?author2_statement . 
    ?author2_statement ps:P2093 ?author2 .
    ?author2_statement pq:P1416 ?affiliation2 .
  } UNION {
    ?work p:P50 ?author2_statement . 
    ?author2_statement ps:P50 ?author2 .
    ?author2_statement pq:P1416 ?affiliation2 .
  }
  ?affiliation2 rdfs:label ?affiliation2_label . filter (lang(?affiliation2_label) = 'en')
  FILTER (?affiliation2 != wd:Q19845644)
  FILTER (?author2 != wd:Q20895241)
} group by ?affiliation2 ?affiliation2_label
order by desc(?count) asc(?affiliation2_label)

more prolific authors in some venue

Again with JCheminf as example:

SELECT ?author ?author_label (count(?work) as ?count) WHERE {
  ?work wdt:P50 ?author ;
        wdt:P1433 wd:Q6294930.
  ?author rdfs:label ?author_label . filter (lang(?author_label) = 'en')
} group by ?author ?author_label
order by desc(?count)
limit 20

Collaborating institutions for author

Collaborating institutions for author suggested by Egon Willighagen https://twitter.com/egonwillighagen/status/807230510267244544

select (count(distinct ?work) as ?count) ?affiliation2 ?affiliation2_label where {
  ?work (p:P50|p:P2093) ?author_statement . 
  ?author_statement ps:P50 wd:Q20895241 .
  ?author_statement pq:P1416 ?affiliation .
  ?affiliation rdfs:label ?affiliation_label . filter (lang(?affiliation_label) = 'en')
  {
    ?work p:P2093 ?author2_statement . 
    ?author2_statement ps:P2093 ?author2 .
    ?author2_statement pq:P1416 ?affiliation2 .
  } UNION {
    ?work p:P50 ?author2_statement . 
    ?author2_statement ps:P50 ?author2 .
    ?author2_statement pq:P1416 ?affiliation2 .
  }
  ?affiliation2 rdfs:label ?affiliation2_label . filter (lang(?affiliation2_label) = 'en')
  FILTER (?affiliation2 != ?affiliation)
  FILTER (?author2 != wd:Q20895241)
} group by ?affiliation2 ?affiliation2_label
order by desc(?count) asc(?affiliation2_label)

Add co-funders to sponsor page

For instance:

# Co-funders
select (count(?work) as ?count) ?sponsor ?sponsorLabel where {
  ?work wdt:P859 wd:Q22329431 .
  ?work wdt:P859 ?sponsor .
  filter (wd:Q22329431 != ?sponsor)
  service wikibase:label { bd:serviceParam wikibase:language "en,de,fr,es,ja,it,da,sv,zh" }
}
group by ?sponsor ?sponsorLabel
order by desc(?count)

Citation per year for author

Citation per year for author suggested by Egon Willighagen https://twitter.com/egonwillighagen/status/807281048006651904

#defaultView:BarChart
#defaultView:Table
SELECT ?year (count(distinct ?citing_work) as ?count) WHERE {
  ?work wdt:P50 wd:Q20895241 .
  ?citing_work wdt:P2860 ?work . 
  # to remove self-citations: minus { ?citing_work wdt:P50 wd:Q20895241 }
  ?citing_work wdt:P577 ?date .
  BIND(str(YEAR(?date)) AS ?year)
} group by ?year
order by desc(?year)    

maybe max(?birthdate)?

The SPARQL queries for the "venue" list the birthdates of workers and use:

(min(?birthdates) as ?birthdate)

But if multiple dates are given for someone, for example one with only year precision, that date defaults to January 1, which is the minimum. It seems to me that "min(?birthdates)" therefore returns the least precise information?
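
The effect can be illustrated in Python with made-up values. The sketch assumes each candidate date comes with its Wikidata precision code (9 for year, 11 for day), which in SPARQL would have to be fetched separately, e.g. via wikibase:timePrecision on the full value node. The function name and the sample dates are hypothetical.

```python
from datetime import date

YEAR_PRECISION = 9   # Wikidata precision code for "year"
DAY_PRECISION = 11   # Wikidata precision code for "day"


def most_precise_birthdate(candidates):
    """Pick the date with the highest precision from (date, precision) pairs."""
    best_date, _precision = max(candidates, key=lambda pair: pair[1])
    return best_date


# Hypothetical person with two birthdate statements.
candidates = [
    (date(1950, 1, 1), YEAR_PRECISION),   # only the year is known
    (date(1950, 6, 15), DAY_PRECISION),   # full date is known
]

naive_minimum = min(d for d, _ in candidates)        # the year-only value wins
preferred = most_precise_birthdate(candidates)       # the full date wins
```

Here `naive_minimum` is 1950-01-01 and `preferred` is 1950-06-15, which is the behaviour the issue asks about. Note that max(?birthdates) alone would also pick the full date in this case, but not when the statements disagree on the year.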

Basic error handling

If a SPARQL query times out, this currently produces an error that Scholia should catch and replace with a useful boilerplate message.

Likewise, if the SPARQL query produced no results, this should be stated in a standardized fashion.
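
A small Python sketch of the two cases. It assumes the query runner is a callable that raises `TimeoutError` on a timeout and otherwise returns a dictionary in the standard SPARQL JSON results layout (`results` / `bindings`). The function name and the message texts are placeholders, not Scholia's wording.

```python
TIMEOUT_MESSAGE = "The query timed out. Please try again later."
EMPTY_MESSAGE = "The query returned no results."


def summarize_sparql_outcome(run_query):
    """Return a standard message for timeouts and empty results, else None.

    run_query is a hypothetical zero-argument callable that performs the
    request and returns the parsed SPARQL JSON response.
    """
    try:
        response = run_query()
    except TimeoutError:
        return TIMEOUT_MESSAGE
    bindings = response.get("results", {}).get("bindings", [])
    if not bindings:
        return EMPTY_MESSAGE
    return None  # nothing to report; the caller renders the rows
```

Returning `None` for the normal case keeps the message logic separate from the rendering of actual results.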

Display license information in article/ journal/ publisher/ author/ organization aspects

e.g. as per this sample query:

#defaultView:Table
SELECT ?work ?workLabel 
(min(?dates) as ?date) 
(sample(?pages_) as ?pages) 
(sample(?venue_labels) as ?venue) 
(sample(?license_labels) as ?license) 
(group_concat(?author_label; separator=", ") as ?authors)  
WHERE {
  ?work wdt:P50 wd:Q20980928 .
  ?work wdt:P50 ?author .
  ?author rdfs:label ?author_label . filter (lang(?author_label) = 'en')
  
  optional { ?work wdt:P577 ?dates }.
  optional { ?work wdt:P1104 ?pages_ }.
  optional { ?work wdt:P275 ?licenses }.
  optional { ?work wdt:P1433 ?venues . ?venues rdfs:label ?venue_labels . filter (lang(?venue_labels) = 'en') }.
  optional { ?work wdt:P275 ?licenses . ?licenses rdfs:label ?license_labels . filter (lang(?license_labels) = 'en') }.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr,de,ru,es,zh,ja". }  
} group by ?work ?workLabel
order by desc(?date)    

Show non-english label if no English label exists

This matters at least for works, since not everything is published in English and translating publication titles seems wrong. Unfortunately, I don't know how to support every language. I stumbled upon this because of https://www.wikidata.org/wiki/Q27044176. To support at least some other languages, change

SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 

to

SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr,de,ru,es,zh,ja". } 

and/or try to get the title via the title property, https://www.wikidata.org/wiki/Property:P1476.
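
The fallback order can be sketched in Python as follows. The function `pick_label` and the input shape (a dictionary from language code to label) are assumptions for illustration. The extra "any language" step goes beyond what the label service does, since the service falls back to the Q-identifier when none of the listed languages has a label.

```python
# "ja" is the Wikidata language code for Japanese.
LANGUAGE_PREFERENCE = ("en", "fr", "de", "ru", "es", "zh", "ja")


def pick_label(labels, qid, preference=LANGUAGE_PREFERENCE):
    """Choose a display label from a {language code: label} mapping."""
    for language in preference:
        if language in labels:
            return labels[language]
    if labels:
        # No preferred language available: show any existing label
        # instead of translating or hiding the title.
        return next(iter(labels.values()))
    return qid  # no label at all: fall back to the identifier
```

With labels available only in German, `pick_label({"de": "Ein Titel"}, "Q1")` returns the German title rather than the bare identifier.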

Co-authors

Too lazy for pull request; suggest using co-authors in sci articles AND books, using:

  VALUES ?type { wd:Q13442814 wd:Q571 } .
  ?work wdt:P31 ?type . 

citations per author

citations per author:

#defaultView:BarChart
select ?year (sum(?citations_per_author_) as ?citations_per_author) ?researcher_label where {
 {
    select ?researcher_label ?work ?year (count(distinct ?citing_work) / count(distinct ?researcher_of_paper) as ?citations_per_author_) where {
      { ?researcher wdt:P108 wd:Q24283660 . } union { ?researcher wdt:P1416 [ wdt:P361* wd:Q24283660 ] .  }
      ?work (wdt:P50|wdt:P2093) ?researcher_of_paper .
      # ?work wdt:P31 wd:Q13442814 .
      ?work wdt:P50 ?researcher .
      ?citing_work wdt:P2860 ?work .
      ?work wdt:P577 ?date . 
      bind(str(year(?date)) as ?year) 
      ?researcher rdfs:label ?researcher_label . filter(lang(?researcher_label) = 'en')
    } 
    group by ?work ?researcher_label ?year
  }
}
group by ?year ?researcher_label 
order by ?year

Display "main subject" for more aspects

Publishers, organizations, funders could in principle all have some form of P921 display based on aggregation of article and/or journal-level P921 statements.

Add documentation to the SPARQL queries

It would probably be good to enrich the "Edit query on Wikidata.org" links with brief documentation of the SPARQL queries, so that SPARQL and/or Wikidata newbies can use them as a playground.

Use property paths instead of unions where possible

Investigate whether property paths work better than unions, and switch to them where they do.

For instance, topic identification:

{ ?work wdt:P921/wdt:P31*/wdt:P279* wd:Q52 . }
union { ?work wdt:P921/wdt:P361+ wd:Q52 . }
union { ?work wdt:P921/wdt:P1269+ wd:Q52 . }

Property path:

?work wdt:P921/( wdt:P31*/wdt:P279* | wdt:P361+ | wdt:P1269+ ) wd:Q52 .
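
For comparing the two forms, both patterns can be generated from the same list of alternatives. The following Python sketch does only string assembly. The function names are hypothetical and nothing here measures which form the query service evaluates faster.

```python
TOPIC_ALTERNATIVES = ("wdt:P31*/wdt:P279*", "wdt:P361+", "wdt:P1269+")


def topic_union_pattern(topic_qid, alternatives=TOPIC_ALTERNATIVES):
    """One group graph pattern per alternative, joined with 'union'."""
    blocks = [
        "{ ?work wdt:P921/" + alt + " wd:" + topic_qid + " . }"
        for alt in alternatives
    ]
    return "\nunion ".join(blocks)


def topic_path_pattern(topic_qid, alternatives=TOPIC_ALTERNATIVES):
    """A single triple pattern using '|' inside the property path."""
    return (
        "?work wdt:P921/( " + " | ".join(alternatives) + " ) wd:" + topic_qid + " ."
    )
```

Calling either function with `"Q52"` reproduces the corresponding snippet above, so both variants of a query can be built from one template and timed against each other.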
