
openeditors's People

Contributors

andreaspacher, bmkramer, daniel-mietchen


openeditors's Issues

Some wrong ROR IDs

While building an India-specific subset of the Open Editors dataset (editors1_ror_and_countries.csv and editors2_ror_and_countries.csv), we noticed a number of wrong ROR IDs.

Two typical examples:

A) Indian Institute of Science, Bangalore: every record for this premier Indian institute carries the wrong ROR ID, https://ror.org/05j873a45. That ID actually belongs to the Indian Institute of Soil Science (IISS, भाकृअनुप-भारतीय मृदा विज्ञान संस्थान; website: http://www.iiss.nic.in/index.html).

B) Christian Medical College Vellore, Vellore, India: every record for this institute carries the wrong ROR ID, https://ror.org/01vj9qy35. That ID actually belongs to Christian Medical College, Ludhiana, another CMC in a different city and state of India (website: http://cmcludhiana.in/medical_college/).

Possible reasons:

An affiliation query to the ROR API for Indian Institute of Science, such as https://api.ror.org/organizations?filter=country.country_code:IN&affiliation=Indian+Institute+of+Science, returns a few results (around 14) with the following data in JSON format:
++++++++++
{"number_of_results":10,"items":[{"substring":"Indian Institute of Science","score":0.92,"matching_type":"COMMON TERMS","chosen":true,"organization":{"id":"https://ror.org/05j873a45","name":"Indian Institute of Soil Science","email_address":null,"ip_addresses":[],"established":1988,"types":["Facility"],"relationships":[{"label":"Indian Council of Agricultural Research","type":"Parent","id":"https://ror.org/04fw54a43"}],"addresses":[{"lat":23.309722,"lng":77.403056,"state":null,"state_code":null,"city":"Bhopal","geonames_city":{"id":1275841,"city":"Bhopal","geonames_admin1":{"name":"Madhya Pradesh","id":1264542,"ascii_name":"Madhya Pradesh","code":"IN.35"},"geonames_admin2":{"name":"Bhopāl","id":1275842,"ascii_name":"Bhopal","code":"IN.35.444"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iiss.nic.in/index.html"],"aliases":[],"acronyms":["IISS"],"status":"active","wikipedia_url":"https://en.wikipedia.org/wiki/Indian_Institute_of_Soil_Science","labels":[{"label":"भाकृअनुप-भारतीय मृदा विज्ञान संस्थान","iso639":"hi"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0000 9288 3664"]},"Wikidata":{"preferred":null,"all":["Q18125957"]},"GRID":{"preferred":"grid.464869.1","all":"grid.464869.1"}}}},{"substring":"Indian Institute of Science","score":0.84,"matching_type":"PHRASE","chosen":false,"organization":{"id":"https://ror.org/04dese585","name":"Indian Institute of Science Bangalore","email_address":null,"ip_addresses":[],"established":1909,"types":["Education"],"relationships":[],"addresses":[{"lat":13.021275,"lng":77.565769,"state":null,"state_code":null,"city":"Bengaluru","geonames_city":{"id":1277333,"city":"Bengaluru","geonames_admin1":{"name":"Karnataka","id":1267701,"ascii_name":"Karnataka","code":"IN.19"},"geonames_admin2":{"name":"Bangalore Urban","id":1277331,"ascii_name":"Bangalore Urban","code":"IN.19.572"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iisc.ernet.in/"],"aliases":[],"acronyms":["IISc"],"status":"active","wikipedia_url":"http://en.wikipedia.org/wiki/Indian_Institute_of_Science","labels":[{"label":"ఇండియన్ ఇన్ స్టిట్యూట్ ఆఫ్ సైన్స్","iso639":"te"},{"label":"இந்திய அறிவியல் கழகம்","iso639":"ta"},{"label":"ਭਾਰਤੀ ਵਿਗਿਆਨ ਅਦਾਰਾ","iso639":"pa"},{"label":"ഇന്ത്യൻ ഇൻസ്റ്റിറ്റ്യൂട്ട് ഓഫ് സയൻസ്","iso639":"ml"},{"label":"ಭಾರತೀಯ ವಿಜ್ಞಾನ ಸಂಸ್ಥೆ","iso639":"kn"},{"label":"भारतीय विज्ञान संस्थान","iso639":"hi"},{"label":"ભારતીય વિજ્ઞાન સંસ્થા","iso639":"gu"},{"label":"ভারতীয় বিজ্ঞান সংস্থা","iso639":"bn"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0001 0482 5067"]},"FundRef":{"preferred":"100007780","all":["100007780","100007871","100008044","100009935"]},"OrgRef":{"preferred":null,"all":["37533"]},"Wikidata":{"preferred":null,"all":["Q948720"]},"GRID":{"preferred":"grid.34980.36","all":"grid.34980.36"}}}},........
++++++++++++++++++++

This makes the reason for the wrong ROR ID clear: the process picked the first result, Indian Institute of Soil Science, which is returned with chosen=true and a score of 0.92, rather than Indian Institute of Science Bangalore (score 0.84, chosen=false). We have also observed that, to be on the safe side, score=1.0 is a better condition than chosen==true for extracting ROR IDs through the API (although I am not sure whether you also obtain ROR IDs through the API or fetch them by some other means).
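The difference between the two selection rules can be sketched in Python against a trimmed copy of the response quoted above. This is only an illustration: the function names `pick_chosen` and `pick_by_score` and the `min_score` parameter are hypothetical, the sample keeps just the fields needed, and nothing here is known to match how the dataset's ROR IDs were actually assigned.

```python
from typing import Optional

# Trimmed copy of the two items shown in the ROR response above.
sample = {
    "items": [
        {"score": 0.92, "matching_type": "COMMON TERMS", "chosen": True,
         "organization": {"id": "https://ror.org/05j873a45",
                          "name": "Indian Institute of Soil Science"}},
        {"score": 0.84, "matching_type": "PHRASE", "chosen": False,
         "organization": {"id": "https://ror.org/04dese585",
                          "name": "Indian Institute of Science Bangalore"}},
    ]
}


def pick_chosen(response: dict) -> Optional[str]:
    """Return the ID of the first item flagged chosen=true, else None."""
    for item in response.get("items", []):
        if item.get("chosen"):
            return item["organization"]["id"]
    return None


def pick_by_score(response: dict, min_score: float = 1.0) -> Optional[str]:
    """Return the ID of the highest-scoring item at or above min_score, else None."""
    candidates = [item for item in response.get("items", [])
                  if item.get("score", 0.0) >= min_score]
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item["score"])
    return best["organization"]["id"]


print(pick_chosen(sample))    # the Soil Science ID, i.e. the wrong institute
print(pick_by_score(sample))  # None: no item reaches score 1.0
```

With the stricter rule the affiliation stays unmatched instead of receiving a wrong ID, so such records can be reviewed by hand or matched with a more specific query.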

Initially we found 455 India-specific records with wrong ROR IDs among the 8170 records that have a ROR ID (out of 10316 records whose affiliated country is India).

I am attaching a CSV file containing these 455 records. The rorORI column holds the ROR ID as given in the dataset, and rorOEM holds the corrected ROR ID as fetched for our subset of the data.

no-match-report.csv
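A report like this could be derived by comparing the two ID columns row by row. The following is a minimal sketch, assuming a CSV that carries both `rorORI` and `rorOEM`; the `affiliation` column name, the function `mismatched_rows`, and the second sample row (a correctly matched record) are hypothetical and only there for illustration.

```python
import csv
import io

# Two-row sample; the first row uses the IDs discussed in case A above.
SAMPLE_CSV = """affiliation,rorORI,rorOEM
"Indian Institute of Science, Bangalore",https://ror.org/05j873a45,https://ror.org/04dese585
Indian Institute of Soil Science,https://ror.org/05j873a45,https://ror.org/05j873a45
"""


def mismatched_rows(csv_text: str) -> list:
    """Return the rows whose dataset ROR ID differs from the re-fetched one."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if row["rorORI"].strip() != row["rorOEM"].strip()]


rows = mismatched_rows(SAMPLE_CSV)
for row in rows:
    print(row["affiliation"], row["rorORI"], "->", row["rorOEM"])
```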

Add harvesting of OJS journals

Open Journal Systems (OJS) is one of the main platforms for independent journals, which are often diamond OA and researcher-led. OJS servers are scattered across many hosts and are not that easy to find (there is no single "publisher"), so a list of URLs might be needed. Harvesting would also be tricky because the structure of the information on the websites can vary.

Some random examples:

Maybe find many of these pages with https://duckduckgo.com/?q=ojs+about%2FeditorialTeam&t=ffsb&ia=web ?

This would add a significant "other part" of the academic publishing system.
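Given a list of journal base URLs, the candidate editorial-team pages could be derived as in the sketch below. It assumes the `about/editorialTeam` path from the search query above applies to each journal; the function name `editorial_team_url` and the example.org URLs are hypothetical, and real installations may use a different path or page layout.

```python
from urllib.parse import urljoin


def editorial_team_url(journal_base_url: str) -> str:
    """Append the assumed OJS editorial-team path to a journal base URL."""
    # A trailing slash makes urljoin extend the path instead of replacing its last segment.
    base = journal_base_url.rstrip("/") + "/"
    return urljoin(base, "about/editorialTeam")


# Hypothetical journal base URLs, with and without a trailing slash.
journals = [
    "https://example.org/index.php/myjournal",
    "https://example.org/index.php/otherjournal/",
]
for journal in journals:
    print(editorial_team_url(journal))
```

Each resulting page would still need its own parsing, since the editorial-team information is free-form text whose structure varies between journals.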

Non ASCII characters support

Hi,
Bravo for this great initiative!
I suppose you already know that non-ASCII strings are not well supported in your data. Non-ASCII characters seem to be filtered out of the strings: they are either erased or replaced.
Example from this search: https://openeditors.ooir.org/index.php?editor_query=Nantes
. Journal title: 'Archives de Pdiatrie' should be 'Archives de Pédiatrie' > character erased
. University name: 'Universit de Nantes; Nantes, France' should be 'Université de Nantes; Nantes, France' > character erased
. Editor name: 'Francois Galgani' should be 'François Galgani' > character 'ç' replaced by 'c'

If all characters could be preserved in Unicode, it would be perfect!
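Both symptoms can be reproduced in a few lines of Python. This is only a guess at the kind of conversion that could produce them, not a description of the project's actual pipeline; the function names are hypothetical.

```python
import unicodedata


def strip_non_ascii(text: str) -> str:
    """Drop every non-ASCII character (reproduces the 'erased' symptom)."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def fold_to_ascii(text: str) -> str:
    """Decompose accented letters, then drop the accents (the 'replaced' symptom)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


def roundtrip_utf8(text: str) -> str:
    """Encode and decode as UTF-8, which preserves every character."""
    return text.encode("utf-8").decode("utf-8")


title = "Archives de P\u00e9diatrie"  # 'Archives de Pédiatrie'
editor = "Fran\u00e7ois Galgani"      # 'François Galgani'

print(strip_non_ascii(title))   # Archives de Pdiatrie
print(fold_to_ascii(editor))    # Francois Galgani
print(roundtrip_utf8(title) == title)
```

Reading and writing every file as UTF-8, with no ASCII conversion step in between, would keep the original spelling.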
