andreaspacher / openeditors

Webscraping data about editors of scientific journals.

Home Page: https://openeditors.ooir.org/

License: Creative Commons Zero v1.0 Universal

R 100.00%

science scientometrics meta-science journals editors publishers academia research bibliometrics

openeditors's People

jeanbaptisteb daniel-mietchen bmkramer jeroenbaas nwilliam868 mwaiton nuest javis25 mitcreos stefanocoretta yundongxie

openeditors's Issues

Some wrong ROR IDs

We noticed a few wrong ROR IDs while creating an India-specific subset of the Open Editors dataset (editors1_ror_and_countries.csv and editors2_ror_and_countries.csv).

Two typical examples:

A) Indian Institute of Science, Bangalore: every record for this premier Indian institute carries the wrong ROR ID (https://ror.org/05j873a45). This ID actually belongs to the Indian Institute of Soil Science (IISS, भाकृअनुप-भारतीय मृदा विज्ञान संस्थान; website: http://www.iiss.nic.in/index.html).

B) Christian Medical College Vellore, Vellore, India: every record for this institute carries the wrong ROR ID (https://ror.org/01vj9qy35). This ID actually belongs to Christian Medical College, Ludhiana, another CMC in a different city and state in India (website: http://cmcludhiana.in/medical_college/).

Possible reason:

A call to the ROR API with the affiliation parameter for Indian Institute of Science (https://api.ror.org/organizations?filter=country.country_code:IN&affiliation=Indian+Institute+of+Science) returns a handful of candidates (around 14). The JSON response begins as follows:
++++++++++
{"number_of_results":10,"items":[{"substring":"Indian Institute of Science","score":0.92,"matching_type":"COMMON TERMS","chosen":true,"organization":{"id":"https://ror.org/05j873a45","name":"Indian Institute of Soil Science","email_address":null,"ip_addresses":[],"established":1988,"types":["Facility"],"relationships":[{"label":"Indian Council of Agricultural Research","type":"Parent","id":"https://ror.org/04fw54a43"}],"addresses":[{"lat":23.309722,"lng":77.403056,"state":null,"state_code":null,"city":"Bhopal","geonames_city":{"id":1275841,"city":"Bhopal","geonames_admin1":{"name":"Madhya Pradesh","id":1264542,"ascii_name":"Madhya Pradesh","code":"IN.35"},"geonames_admin2":{"name":"Bhopāl","id":1275842,"ascii_name":"Bhopal","code":"IN.35.444"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iiss.nic.in/index.html"],"aliases":[],"acronyms":["IISS"],"status":"active","wikipedia_url":"https://en.wikipedia.org/wiki/Indian_Institute_of_Soil_Science","labels":[{"label":"भाकृअनुप-भारतीय मृदा विज्ञान संस्थान","iso639":"hi"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0000 9288 3664"]},"Wikidata":{"preferred":null,"all":["Q18125957"]},"GRID":{"preferred":"grid.464869.1","all":"grid.464869.1"}}}},{"substring":"Indian Institute of Science","score":0.84,"matching_type":"PHRASE","chosen":false,"organization":{"id":"https://ror.org/04dese585","name":"Indian Institute of Science 
Bangalore","email_address":null,"ip_addresses":[],"established":1909,"types":["Education"],"relationships":[],"addresses":[{"lat":13.021275,"lng":77.565769,"state":null,"state_code":null,"city":"Bengaluru","geonames_city":{"id":1277333,"city":"Bengaluru","geonames_admin1":{"name":"Karnataka","id":1267701,"ascii_name":"Karnataka","code":"IN.19"},"geonames_admin2":{"name":"Bangalore Urban","id":1277331,"ascii_name":"Bangalore Urban","code":"IN.19.572"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iisc.ernet.in/"],"aliases":[],"acronyms":["IISc"],"status":"active","wikipedia_url":"http://en.wikipedia.org/wiki/Indian_Institute_of_Science","labels":[{"label":"ఇండియన్ ఇన్ స్టిట్యూట్ ఆఫ్ సైన్స్","iso639":"te"},{"label":"இந்திய அறிவியல் கழகம்","iso639":"ta"},{"label":"ਭਾਰਤੀ ਵਿਗਿਆਨ ਅਦਾਰਾ","iso639":"pa"},{"label":"ഇന്ത്യൻ ഇൻസ്റ്റിറ്റ്യൂട്ട് ഓഫ് സയൻസ്","iso639":"ml"},{"label":"ಭಾರತೀಯ ವಿಜ್ಞಾನ ಸಂಸ್ಥೆ","iso639":"kn"},{"label":"भारतीय विज्ञान संस्थान","iso639":"hi"},{"label":"ભારતીય વિજ્ઞાન સંસ્થા","iso639":"gu"},{"label":"ভারতীয় বিজ্ঞান সংস্থা","iso639":"bn"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0001 0482 5067"]},"FundRef":{"preferred":"100007780","all":["100007780","100007871","100008044","100009935"]},"OrgRef":{"preferred":null,"all":["37533"]},"Wikidata":{"preferred":null,"all":["Q948720"]},"GRID":{"preferred":"grid.34980.36","all":"grid.34980.36"}}}},........
++++++++++++++++++++

The reason for the wrong ROR ID in this case is now clear: the first candidate, Indian Institute of Soil Science, is the one marked "chosen":true, so it was picked up by the matching process, even though the correct institute appears second with a lower score. In our experience, requiring score = 1.0 is a safer condition than chosen == true when extracting ROR IDs through the API (although I am not sure whether you obtained the ROR IDs through the API or by some other means).
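
A minimal Python sketch of the stricter rule suggested above, using two trimmed-down candidates from the quoted response. The function name `pick_ror_id` and the `min_score` parameter are hypothetical, and this is not the code used to build the dataset; it only illustrates how trusting the chosen flag and requiring a perfect score can disagree.

```python
from typing import Optional


def pick_ror_id(items: list[dict], min_score: float = 1.0) -> Optional[str]:
    """Return the ROR ID of the highest-scoring candidate that reaches
    min_score, or None when no candidate is confident enough."""
    confident = [it for it in items if it.get("score", 0.0) >= min_score]
    if not confident:
        return None  # leave the record unmatched and flag it for manual review
    best = max(confident, key=lambda it: it["score"])
    return best["organization"]["id"]


# Two candidates from the response quoted above, reduced to the relevant fields.
items = [
    {"score": 0.92, "chosen": True,
     "organization": {"id": "https://ror.org/05j873a45",
                      "name": "Indian Institute of Soil Science"}},
    {"score": 0.84, "chosen": False,
     "organization": {"id": "https://ror.org/04dese585",
                      "name": "Indian Institute of Science Bangalore"}},
]

# Trusting the "chosen" flag selects the Soil Science institute.
chosen_only = next(
    (it["organization"]["id"] for it in items if it["chosen"]), None
)

# Requiring score == 1.0 selects nothing, so the record stays unmatched.
strict = pick_ror_id(items)
```

With this rule the record for Indian Institute of Science would remain without a ROR ID rather than receive a wrong one, at the cost of fewer automatic matches.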

Of the 10316 records with India as the affiliated country, 8170 have ROR IDs. Among those 8170, we initially found 455 records (about 5.6%) with wrong ROR IDs.

I am attaching a CSV file containing these 455 records. The rorORI column holds the ROR ID as it appears in the dataset, and the rorOEM column holds the corrected ROR ID fetched for our subset.

no-match-report.csv

Add LICENSE

to clarify reusability

Harvest from EPIsciences

https://www.episciences.org/

Create a release of this repo

So as to facilitate reuse.

Here is a description of the workflow to do this via Zenodo.

Add harvesting of Copernicus Publications

Hi! 👋 Great project!

Do you accept PRs for new publishers? In my field, Copernicus Publications would be particularly interesting and is missing from the list.

https://publications.copernicus.org/open-access_journals/journals_by_subject.html provides a list of links to editorial boards, and could be a good starting point for harvesting.

Add harvesting from SciPost

https://scipost.org/?tab=journals has a list of journals

https://scipost.org/colleges/ has the "editorial colleges". Seems like some colleges are editors for more than one journal. Please advise how you would model that in the data.
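
One possible way to model this, sketched in Python under the assumption that the data stays in a flat, one-row-per-editor-per-journal layout: expand each college into one row for every (editor, journal) pair. The `EditorRow` type, the `expand_college` function, and all names in the example are hypothetical placeholders, not actual SciPost data or the project's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditorRow:
    editor: str
    college: str
    journal: str


def expand_college(college: str, journals: list[str],
                   editors: list[str]) -> list[EditorRow]:
    """Emit one row per (editor, journal) pair so that a college serving
    several journals fits a flat table."""
    return [EditorRow(editor, college, journal)
            for journal in journals
            for editor in editors]


# Placeholder values for illustration only.
rows = expand_college(
    college="Example College",
    journals=["Journal A", "Journal B"],
    editors=["Editor One", "Editor Two"],
)
```

Keeping the college name as its own column preserves the information that the journal assignment was derived from college membership.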

Add harvesting of OJS journals

Open Journal Systems is one of the main platforms for independent journals, often diamond OA and researcher-led. These OJS servers are all around, of course, and not that easy to find (no single "publisher"), so a list of URLs might be needed. And harvesting would be tricky because of the possibly varying structure of the information on the websites.

Some random examples:

Maybe find many of these pages with https://duckduckgo.com/?q=ojs+about%2FeditorialTeam&t=ffsb&ia=web ?

This would add a significant "other part" of the academic publishing system.
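
As a rough sketch of the "list of URLs" idea, the snippet below turns a curated list of OJS journal base URLs into candidate editorial team pages, assuming the journals use the about/editorialTeam path that appears in the search query above. The function name `editorial_team_url` and the example.org addresses are hypothetical, and individual installations may use a different path.

```python
def editorial_team_url(journal_base_url: str) -> str:
    """Build the candidate editorial team page for one OJS journal.

    Assumes the about/editorialTeam path; this is a guess that would
    need checking per installation.
    """
    return journal_base_url.rstrip("/") + "/about/editorialTeam"


# Placeholder base URLs; a real list would have to be curated by hand.
journal_base_urls = [
    "https://journals.example.org/index.php/abc/",
    "https://ojs.example.org/xyz",
]

candidate_pages = [editorial_team_url(url) for url in journal_base_urls]
```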

Non ASCII characters support

Hi,
Bravo for this great initiative!
I suppose you already know that non-ASCII strings are not well supported in your data. Non-ASCII characters seem to be filtered out of the strings: they are either erased or replaced.
Examples from this search: https://openeditors.ooir.org/index.php?editor_query=Nantes
. Journal title: 'Archives de Pdiatrie' should be 'Archives de Pédiatrie' > character erased
. University name: 'Universit de Nantes; Nantes, France' should be 'Université de Nantes; Nantes, France' > character erased
. Editor name: 'Francois Galgani' should be 'François Galgani' > character 'ç' replaced by 'c'

If all characters could be preserved in Unicode, it would be perfect!
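
The two symptoms can be reproduced with a short Python sketch. This is only a guess at the kind of conversion involved, not a diagnosis of the project's actual pipeline: dropping non-ASCII characters outright erases them, while decomposing them first keeps the base letter. The strings are written with escape sequences so the example does not depend on the file's own encoding.

```python
import unicodedata

title = "Archives de P\u00e9diatrie"   # 'Archives de Pédiatrie'
name = "Fran\u00e7ois Galgani"         # 'François Galgani'

# Encoding to ASCII and ignoring errors erases the accented character.
erased = title.encode("ascii", errors="ignore").decode("ascii")

# Decomposing first (NFKD) splits 'ç' into 'c' plus a combining cedilla;
# dropping the non-ASCII remainder then leaves the plain base letter.
replaced = (
    unicodedata.normalize("NFKD", name)
    .encode("ascii", errors="ignore")
    .decode("ascii")
)

# A UTF-8 round trip keeps every character intact.
preserved = title.encode("utf-8").decode("utf-8")
```

If something similar happens during scraping or export, reading and writing the data as UTF-8 throughout would avoid both effects.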
