tamulib / scholars-discovery

This project forked from vivo-community/scholars-discovery

License: MIT License

Java 85.63% HTML 13.36% Dockerfile 0.15% JavaScript 0.81% Shell 0.05%

scholars-discovery's Introduction

scholars-discovery

VIVO Scholars Discovery is a middleware project that pulls VIVO content into its own search index (Solr) and then exposes that content via a RESTful service endpoint.

Various frontend applications are available (or can be built) to display the content as read-only websites. Existing frontend applications include:

VIVO Scholars Angular

API

Scholars Middleware REST Service API Documentation

Background

The Scholars Discovery project was initiated by the Scholars@TAMU project team at Texas A&M University (TAMU) Libraries. In support of the Libraries’ goal of enabling and contextualizing the discovery of scholars and their expertise across disciplines, the Scholars team at the TAMU Office of Scholarly Communications (OSC) proposed the Scholars version 2 project, which focuses on deploying (1) a new public-facing, read-only layer, (2) a faceted search engine, (3) data reuse options, and (4) search engine optimization. Digital Initiative (DI) at TAMU Libraries collaborated with the OSC to design and implement the current system architecture, including Scholars Discovery and VIVO Scholars Angular. At a later stage, the Scholars Discovery project was adopted by the VIVO Community’s VIVO Scholar Task Force.

Technology

The Scholars Discovery system is first and foremost an ETL system that extracts data from VIVO's triplestore, transforms triples into flattened documents, and loads the documents into Solr. The Solr index is then exposed via a REST API and a GraphQL API as nested JSON. A secondary feature is a persistent, configurable discovery layout for rendering a UI.

Extraction from VIVO is done via configurable harvesters that make SPARQL requests to the triplestore for a collection of objects, followed by SPARQL requests for each property value of the target document. The SPARQL requests can be found in src/main/resources/templates/sparql. The transformation is granular: the triples resulting from each SPARQL request are converted into one property of a flattened document. This document is then saved into a heterogeneous Solr collection. The configuration of the Solr collection can be found in solr/config. To represent a flattened document as a nested JSON response, the field values are indexed with a relationship identifier convention: [value]::[id], [value]::[id]::[id], etc. During serialization, the document model is traversed, parsing each Solr field value and constructing the nested JSON.
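The relationship identifier convention can be illustrated with a minimal Java sketch that splits one indexed field value into its label and trailing identifiers. The class name NestedValueParser and the output keys "label" and "ids" are hypothetical and are not taken from the project's source; how the identifiers relate to parent and child objects is left open here because the text above does not spell it out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Scholars Discovery.
final class NestedValueParser {

    // Delimiter used by the [value]::[id] convention described above.
    private static final String DELIMITER = "::";

    // Splits "value::id1::id2" into a label and the list of identifiers that follow it.
    static Map<String, Object> parse(String fieldValue) {
        String[] parts = fieldValue.split(Pattern.quote(DELIMITER));
        Map<String, Object> node = new LinkedHashMap<>();
        // The first segment is the human-readable value.
        node.put("label", parts[0]);
        // Every remaining segment is kept, in order, as an identifier.
        List<String> ids = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
        node.put("ids", ids);
        return node;
    }

    public static void main(String[] args) {
        System.out.println(parse("Journal of Testing::n123::n456"));
    }
}
```

A serializer could use the identifiers collected this way to group related field values into one nested JSON object per identifier.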

Here is a list of some dependencies used:

Configuration

The basic Spring Boot application configuration can be found at src/main/resources/application.yml. Here you can configure basic server and Spring settings as well as custom configuration for Scholars Discovery. Several configuration POJOs represent these settings. They can be found in src/main/java/edu/tamu/scholars/middleware/config/model and src/main/java/edu/tamu/scholars/middleware/auth/config.

Assets

Assets are hosted at /file/:id/:filename, and their location is configured with middleware.assets-location.

Tested options are:

Assets stored in src/main/resources/assets

middleware.assets-location: classpath:/assets

Assets stored externally

middleware.assets-location: file:/scholars/assets

Harvesting

Harvesting is configured via middleware.harvesters and represented by HarvesterConfig. For each harvester, a bean is created that specifies the type of harvester and the document types it maps to. The reference implementation is the local triplestore harvester.

Indexing

Indexing is configured via middleware.indexers and represented by IndexerConfig. For each indexer, a bean is created that specifies the type of indexer and the document types it indexes. The reference implementation is the Solr indexer.

The application can be configured to harvest and index on startup via middleware.index.onStartup, and on a cron schedule via middleware.index.cron. Indexing is done in batches for performance; the batch size can be tuned via middleware.index.batchSize.
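To illustrate what batched indexing means in practice, here is a minimal Java sketch that splits a list of documents into batches of a given size. The class BatchPartitioner and its method are hypothetical names chosen for this example; the actual indexer in the project may batch differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the project's indexer.
final class BatchPartitioner {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Each inner list would be sent to the index in one request.
        System.out.println(partition(List.of("doc1", "doc2", "doc3", "doc4", "doc5"), 2));
    }
}
```

A larger batch size means fewer round trips to Solr at the cost of more memory per request, which is the trade-off middleware.index.batchSize lets you tune.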

Solr

Solr is configured via spring.data.solr.

Development Instructions

Install Maven
Install Docker
Start Solr

   cd solr && docker build --tag=scholars/solr . && docker run -d -p 8983:8983 scholars/solr && cd ..

Build and Run the application

   mvn clean install
   mvn spring-boot:run

Note: Custom application configuration can be achieved by providing a location and an optional profile, such as:

   mvn spring-boot:run -Dspring-boot.run.profiles=dev -Dspring-boot.run.config.location=/some/directory/

...where an application-dev.yml exists in the /some/directory/ directory.

Docker Deployment

docker build -t scholars/discovery .

docker run -d -p 9000:9000 -e SPRING_APPLICATION_JSON="{\"spring\":{\"data\":{\"solr\":{\"host\":\"http://localhost:8983/solr\"}}},\"ui\":{\"url\":\"http://localhost:3000\"},\"vivo\":{\"base-url\":\"http://localhost:8080/vivo\"},\"middleware\":{\"allowed-origins\":[\"http://localhost:3000\"],\"index\":{\"onStartup\":false},\"export\":{\"individualBaseUri\":\"http://localhost:3000/display\"}}}" scholars/discovery

The environment variable SPRING_APPLICATION_JSON will override properties in application.yml.
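The nested JSON in SPRING_APPLICATION_JSON corresponds to the same dotted property names used elsewhere in this README (for example middleware.index.onStartup). The following Java sketch shows that correspondence by flattening a nested map into dotted keys. It is an illustration of the naming scheme only, not Spring Boot's own implementation, and PropertyFlattener is a hypothetical class name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; Spring Boot performs its own binding.
final class PropertyFlattener {

    // Converts a nested map into a flat map keyed by dotted property names.
    static Map<String, Object> flatten(Map<String, Object> nested) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Object value, Map<String, Object> flat) {
        if (value instanceof Map) {
            // Nested objects extend the key with ".child".
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                String key = prefix.isEmpty() ? entry.getKey().toString() : prefix + "." + entry.getKey();
                flattenInto(key, entry.getValue(), flat);
            }
        } else if (value instanceof List) {
            // List elements extend the key with "[index]".
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenInto(prefix + "[" + i + "]", list.get(i), flat);
            }
        } else {
            flat.put(prefix, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> index = new LinkedHashMap<>();
        index.put("onStartup", false);
        Map<String, Object> middleware = new LinkedHashMap<>();
        middleware.put("allowed-origins", List.of("http://localhost:3000"));
        middleware.put("index", index);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("middleware", middleware);
        System.out.println(flatten(root));
    }
}
```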

Verify Installation

With the above installation instructions, the following service endpoints can be verified:

The HAL (Hypertext Application Language) explorer can be used to browse scholars-discovery resources.

scholars-discovery's People

Contributors


wwelling

scholars-discovery's Issues

Make "View All" as default for all individual views

Reorder discovery view tabs

people, research overview, publications, grants, awards, courses, concepts, ideas, organizations

Add new badge for "URL"

https://app.abstract.com/projects/dff00c78-4bb7-4ffe-b0f1-35cbc7f4c92f/branches/ba0c186b-e529-4726-9056-eda096bbac70/commits/47c4988273e4fe49988e362d486d1cb022a76471/files/519fcad7-9751-4f96-b486-6937177aa8bb/layers/DF093475-12E1-4577-8487-01A17F7934D3?collectionId=33e5a222-6309-4f6e-b20a-6bedbaf42dd2&collectionLayerId=69bd2932-bdc9-4c76-80e4-2e96df6b9645&mode=build&selected=658629294-2B49CBC6-8236-45FD-A0FF-431B2CF52C61

Aside tags for person display view

Upgrade dependencies

Update document directory view result template to render tags

Person Teaching Tab

If the publication is in a new “Teaching Material” type (http://vivo.library.tamu.edu/ontology/TAMU#TeachingMaterial), publish them under “Teaching” tab of faculty profile.

The format is the same as “Institutional Repository Items.”

New search result view: Research Overview

Result: full text search of "research overview" only

Update label to "PubMed ID"

PubMed Central ID => PubMed ID

Patents sorting by year

Refactor Publication Individual Section Tags to UN Sustainable Development Goals

Searchable more facets

Make the facets searchable when you click the “more” button, instead of blindly clicking through the pages

Altmetric API call using "Key" properties

For "The Conversation" articles

Prefer key for altmetric lookup.
https://vivo.library.tamu.edu/vivo/display/n402148SE

Person Browse and Discovery Views should have UN Sustainable Development Goals facet

Will reuse research tags property.

Update document discovery view result template to render tags

Add software tab in faculty profile view

https://vivo.library.tamu.edu/vivo/display/n419842SE

Will require a list of nested software objects on person with relevant properties. Needs SPARQL to retrieve the values. Then update the person display view with an additional tab for software.

UN Sustainable Development Goal widget on dashboard

A widget would be comprised of all the goals w/icons and be interactive to navigate to a filtered list of persons or publications with designated research tag matching goals.

https://www.un.org/sustainabledevelopment/news/communications-material/

Image only for reference and not mock-up.

Add tags to articles

Harvest and index tag on document. Update individual view subsection accordingly.

https://vivo.library.tamu.edu/vivo/display/n342196SE
https://vivo.library.tamu.edu/vivo/display/n367437SE
https://vivo.library.tamu.edu/vivo/display/n359194SE
https://vivo.library.tamu.edu/vivo/display/n126047SE
https://vivo.library.tamu.edu/vivo/display/n280600SE
https://vivo.library.tamu.edu/vivo/display/n289688SE
https://vivo.library.tamu.edu/vivo/display/n289081SE

*subsection dropdown for selecting filter
vivo-community/scholars-angular#97

*aside tags for publication display view
#16

*tag facet for publication discovery view
#17

Future Research Idea UI

https://vivo.library.tamu.edu/vivo/display/nidea00000001

See word document and abstract.

Tune Grant Discovery

Discovery search tune: Add Grant Abstract
Current: Grants (sort by relevancy score)
· Grant title (1pt)
· Awarded by (1pt)
· contributors (1pt)
· Abstract (1pt)

Person individual view to render widget of UN SDGs from research tag

Image only for reference and not mock-up.

In The News - sort by date (descending)

sort by date (descending order)

(see Tracy Hammond’s news tab, https://scholars.library.tamu.edu/vivo/display/ne852c439/Persons/News?in%20the%20news.page=1&in%20the%20news.size=10)

Person's Grants/Awards - Principal Investigator should always be listed at the top

Example: https://scholars.library.tamu.edu/vivo/display/ne852c439/Persons/Grants%2FAwards

• Principal Investigator
• Co-Principal Investigator
• Investigator
• Collaborator
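One way to express this ordering is a comparator that ranks roles by their position in a fixed list. The Java sketch below assumes the bullet list above is the desired display order and that unknown roles should sort last; ContributorRoleOrder is a hypothetical class, not code from the project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not part of Scholars Discovery.
final class ContributorRoleOrder {

    // Assumed display order, taken from the list in this issue.
    private static final List<String> ROLE_ORDER = List.of(
        "Principal Investigator",
        "Co-Principal Investigator",
        "Investigator",
        "Collaborator");

    // Roles missing from ROLE_ORDER rank after all known roles.
    static int rank(String role) {
        int index = ROLE_ORDER.indexOf(role);
        return index >= 0 ? index : ROLE_ORDER.size();
    }

    // Returns a sorted copy; List.sort is stable, so ties keep their input order.
    static List<String> sortRoles(List<String> roles) {
        List<String> sorted = new ArrayList<>(roles);
        sorted.sort(Comparator.comparingInt(ContributorRoleOrder::rank));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortRoles(List.of("Collaborator", "Investigator", "Principal Investigator")));
    }
}
```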

Concept's Research Area Of - faculty names need bullet point

https://scholars.library.tamu.edu/vivo/display/nfst00910345/Concepts/Overview

Upgrade dependencies

Track search keywords for data analysis

a. We were keeping track of users’ search keywords in VIVO, but the new Scholars has lost this function
b. We need to keep track of the keywords users search for, for data analysis

In The News - date mismatch

publication date doesn’t match between VIVO and Scholars

Add Copy To for Grant Abstract to include in full text searching

Grant’s contributor property doesn’t provide bullet points for faculty

https://scholars.library.tamu.edu/vivo/display/n90786197/Relationships/View%20All

Ensure "URI/URL" is exposed in Document display view

For this item (Properties not displaying in UI (URI, URL)), here is an example:

All handles are not visible in Scholars.

https://vivo.library.tamu.edu/vivo/display/n348911SE

https://scholars.library.tamu.edu/vivo/display/n348911SE/Documents/View%20All

Add emailto on Future Research Idea main content template

Something like a prominent button. The "To" address should be the email of the person the idea belongs to.

Organizational hierarchy

A top-level organization needs to recursively collect its child organizations for inclusion. Include all faculty of child organizations.

The harvester should afford recursive SPARQL lookups.
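The recursion described here can be sketched in Java with in-memory maps standing in for the SPARQL lookups. The class OrganizationHierarchy, the two map parameters, and the sample identifiers are all hypothetical; a real harvester would resolve children and faculty by querying the triplestore.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; maps stand in for SPARQL lookups.
final class OrganizationHierarchy {

    // Collects the faculty of an organization and of all its descendants.
    static Set<String> collectFaculty(String orgId,
                                      Map<String, List<String>> childrenByOrg,
                                      Map<String, List<String>> facultyByOrg) {
        Set<String> faculty = new LinkedHashSet<>();
        collect(orgId, childrenByOrg, facultyByOrg, faculty, new HashSet<>());
        return faculty;
    }

    private static void collect(String orgId,
                                Map<String, List<String>> childrenByOrg,
                                Map<String, List<String>> facultyByOrg,
                                Set<String> faculty,
                                Set<String> visited) {
        // Guard against cycles in the organization graph.
        if (!visited.add(orgId)) {
            return;
        }
        faculty.addAll(facultyByOrg.getOrDefault(orgId, List.of()));
        for (String child : childrenByOrg.getOrDefault(orgId, List.of())) {
            collect(child, childrenByOrg, facultyByOrg, faculty, visited);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = Map.of("college", List.of("deptA", "deptB"));
        Map<String, List<String>> faculty = Map.of(
            "college", List.of("dean"), "deptA", List.of("alice"), "deptB", List.of("carol"));
        System.out.println(collectFaculty("college", children, faculty));
    }
}
```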

Tune search results

Search Result Tune:
*search term case-insensitive
*boost on term frequency, not on field
*ability to supply sort for tie breaker

People (sort by relevancy score):
-Boost with last name and first name
• Research areas (2 pt)
• Overview (2 pt)
• Publication titles (1 pt)

Publications (sort by relevancy score)
-no boost
• Publication titles (1pt)
• Abstracts (1pt)
• Journal title (1pt)
• Keywords

Grants (sort by relevancy score)
-no boost
• Grant title (1pt)
• Awarded by (1pt)
• contributors (1pt)

Awards (sort by relevancy score)
-no boost
• Award title (1pt)
• Conferred by (1pt)

Courses (sort by relevancy score)
-no boost
• Course title (1pt)
• Participants (1pt)
• Add Participant facet

Concepts (sort by relevancy score)
-no boost
• Concept (1pt)
• Research area of (1pt)

Research Overview (sort by relevancy score):
-no boost
• Overview (1 pt)

Research Idea (sort by relevancy score):
-no boost
• Idea title (1 pt)
• Idea owner (1 pt)
• Description (1 pt)
• keywords (1 pt)

Dashboard update link to “Tutorials: How to Update”

Add text of “Tutorials: How to Update” in the front page. Hyperlink the text to https://scholars.library.tamu.edu/howto_scholars/

I will send an update for the linked page (above) with multiple pdf tutorials.

Internet publication - template update

a. https://scholars.library.tamu.edu/vivo/display/n402148SE/Documents/View%20All
b. Currently we don’t use Note. We want to use Note next to the current citation
c. Example: Ros, A. (2019). The bias hiding in your library – An article submitted to The Conversation

Update publication for altmetric score and citation count

harvest and index on document

Add "Advised capstone (non-thesis) project" under "works by students"

Will now be three types; dissertation, master's thesis, and advised capstone project. Capstone will be custom type of document and not inferred from text like dissertation and master's thesis.

Example for this: Add "Advised capstone (non-thesis) project" under "works by students"

https://vivo.library.tamu.edu/vivo/display/ncapstone000001

Requires adding advisedBy to Document and capstoneAdvisedOf to Person.

Customize number range facets

Altmetric

range 50
minimum 1 
maximum 500

0    (n)
1-49
50-99
...
450-499
500+ (n)

Citation

range 100
minimum 1 
maximum 1000

0    (n)
1-99 (n)
100-199
...
1000+ (n)
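The bucket labels listed above follow a regular pattern: a zero bucket, a first bucket starting at the minimum of 1, fixed-width buckets, and an open-ended bucket at the maximum. A minimal Java sketch of that pattern is shown below. RangeFacetLabels is a hypothetical class, and how the buckets map onto Solr range facet parameters is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Scholars Discovery.
final class RangeFacetLabels {

    // Builds labels "0", "1-(gap-1)", "gap-(2*gap-1)", ..., "max+".
    static List<String> labels(int gap, int max) {
        if (gap <= 1 || max < gap) {
            throw new IllegalArgumentException("expected gap > 1 and max >= gap");
        }
        List<String> labels = new ArrayList<>();
        labels.add("0");
        // The first bucket starts at the minimum of 1 rather than 0.
        labels.add("1-" + (gap - 1));
        for (int start = gap; start < max; start += gap) {
            int end = Math.min(start + gap, max) - 1;
            labels.add(start + "-" + end);
        }
        // Everything at or above the maximum falls into one open-ended bucket.
        labels.add(max + "+");
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(labels(50, 500));   // Altmetric settings from this issue
        System.out.println(labels(100, 1000)); // Citation settings from this issue
    }
}
```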

Add range facet support

Missing property (indexed but not displaying) - Chapter Book Title

https://scholars.library.tamu.edu/vivo/display/n94628SE/Documents/View%20All

Expected
Book Title: Mask for Scholars

Add stem filters to Solr managed schema

<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
</analyzer>

Update "Discovery" label to "Advanced Search" label in the main entry page

Update person publication subsection templates to render tags

Add new publication types (report, internet publication, theses)

Webpage - https://vivo.library.tamu.edu/vivo/display/n402148SE
Thesis - https://vivo.library.tamu.edu/vivo/display/n401399SE
Report - https://vivo.library.tamu.edu/vivo/display/n286121SE

Update views accordingly. Same as award template for search result with note appended.

Missing College and School Organizations on Publication Author Organization

4.  Discovery > Publication > facet > Author organization
  a.  Facet: Author organization > missing School of Law
  b.  Missing College level organizations

Also, determine if any other organization types are missing. Possibly provide a count of Author Organizations.

*ui widget for number range
vivo-community/scholars-angular#100

*afford number range facet to solr
vivo-community#196

Make badges in templates open in a new tab

target="_blank"

Add "In the News" tab in faculty profile

https://vivo.library.tamu.edu/vivo/display/nidea00000001
https://vivo.library.tamu.edu/vivo/individual?uri=http%3A%2F%2Fscholars.library.tamu.edu%2Fvivo%2Findividual%2Fnhttp%3A%2F%2Fnews.cci.fsu.edu%2Fcci-in-the-news%2Fischool-doc-student-dong-joon-lee-spends-summer-at-purdue-university

See abstract.

Will require a list of nested news objects on person with relevant properties. Needs SPARQL to retrieve the values. Then update the person display view with an additional tab for In the News.