scholars-discovery

VIVO Scholars Discovery is a middleware project that pulls VIVO content into its own search index (Solr) and then exposes that content via a RESTful service endpoint.

Various frontend applications are available (or can be built) to display the content as read-only websites. Existing frontend applications include:

  1. VIVO Scholars Angular

API

Scholars Middleware REST Service API Documentation

Background

The Scholars Discovery project was initiated by the Scholars@TAMU project team at Texas A&M University (TAMU) Libraries. In support of the Libraries’ goal of enabling and contextualizing the discovery of scholars and their expertise across disciplines, the Scholars team at the TAMU Office of Scholarly Communications (OSC) proposed the Scholars version 2 project, which focuses on deploying (1) a new public-facing, read-only layer, (2) a faceted search engine, (3) data reuse options, and (4) search engine optimization. Digital Initiative (DI) at TAMU Libraries collaborated with the OSC to design and implement the current system architecture, including Scholars Discovery and VIVO Scholars Angular. At a later stage, the Scholars Discovery project was adopted by the VIVO Community’s VIVO Scholar Task Force.

Technology

The Scholars Discovery system is first and foremost an ETL system that extracts data from VIVO's triplestore, transforms triples into flattened documents, and loads the documents into Solr. The Solr index is then exposed through a REST API and a GraphQL API as nested JSON. A secondary feature is a persistent, configurable discovery layout for rendering a UI.

Extraction from VIVO is done via configurable harvesters that make a SPARQL request to the triplestore for a collection of objects, followed by further SPARQL requests for each property value of the target document. The SPARQL requests can be found in src/main/resources/templates/sparql. The transformation is granular: the triples returned by each SPARQL request are converted into one property of a flattened document. This document is then saved into a heterogeneous Solr collection. The configuration of the Solr collection can be found in solr/config. To represent a flattened document as a nested JSON response, the field values are indexed with a relationship identifier convention: [value]::[id], [value]::[id]::[id], etc. During serialization, the document model is traversed, parsing each Solr field value and constructing the nested JSON.
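The relationship identifier convention can be illustrated with a small, self-contained sketch. This is not the project's serializer: the class name NestedValueSketch, the method nest, the "id" and "label" keys, and the sample values are all hypothetical. It also assumes that in a [value]::[id]::[id] entry the first id refers to the parent object and the last id to the object itself, which the text above does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of scholars-discovery.
final class NestedValueSketch {
    static final String DELIMITER = "::";

    // Builds one object per "label::id" entry, then attaches every
    // "label::parentId::id" entry to the object whose id equals parentId.
    // ASSUMPTION: the parent id precedes the object's own id.
    static List<Map<String, Object>> nest(List<String> parents, String childKey, List<String> children) {
        Map<String, Map<String, Object>> byId = new LinkedHashMap<>();
        for (String entry : parents) {
            String[] parts = split(entry, 2);
            Map<String, Object> node = new LinkedHashMap<>();
            node.put("id", parts[1]);
            node.put("label", parts[0]);
            node.put(childKey, new ArrayList<Map<String, Object>>());
            byId.put(parts[1], node);
        }
        for (String entry : children) {
            String[] parts = split(entry, 3);
            Map<String, Object> parent = byId.get(parts[1]);
            if (parent == null) {
                continue; // no matching parent in this document; skip the entry
            }
            Map<String, Object> child = new LinkedHashMap<>();
            child.put("id", parts[2]);
            child.put("label", parts[0]);
            @SuppressWarnings("unchecked")
            List<Map<String, Object>> siblings = (List<Map<String, Object>>) parent.get(childKey);
            siblings.add(child);
        }
        return new ArrayList<>(byId.values());
    }

    // Splits a field value on the delimiter and checks the expected part count.
    private static String[] split(String entry, int expectedParts) {
        String[] parts = entry.split(DELIMITER, -1);
        if (parts.length != expectedParts) {
            throw new IllegalArgumentException("Expected " + expectedParts + " parts in: " + entry);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up flattened field values for a single document.
        List<String> authors = Arrays.asList("Smith, J.::a1", "Jones, K.::a2");
        List<String> organizations = Arrays.asList("Chemistry::a1::o1", "Biology::a2::o2");
        System.out.println(nest(authors, "organizations", organizations));
    }
}
```

The point of the convention is that a flat, multi-valued Solr field carries enough information to regroup child values under their parents without a second lookup.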

Here is a list of some dependencies used:

  1. Spring Boot
  2. Apache Jena
  3. Apache Solr

Configuration

The basic Spring Boot application configuration can be found at src/main/resources/application.yml. Here you can configure basic server and Spring settings as well as custom configuration for Scholars Discovery. Several configuration POJOs represent these settings. They can be found in src/main/java/edu/tamu/scholars/middleware/config/model and src/main/java/edu/tamu/scholars/middleware/auth/config.

Assets

Assets are served at /file/:id/:filename from the location configured by middleware.assets-location.

Tested options are:

Assets stored in src/main/resources/assets

middleware.assets-location: classpath:/assets

Assets stored externally

middleware.assets-location: file:/scholars/assets

Harvesting

Harvesting is configured via middleware.harvesters and represented by HarvesterConfig. For each harvester, a bean is created that specifies the type of harvester and the document types it maps to. The reference implementation is the local triplestore harvester.

Indexing

Indexing is configured via middleware.indexers and represented by IndexerConfig. For each indexer, a bean is created that specifies the type of indexer and the document types it indexes. The reference implementation is the Solr indexer.

The application can be configured to harvest and index on startup (middleware.index.onStartup) and on a cron schedule (middleware.index.cron). Indexing is done in batches for performance; the batch size can be tuned via middleware.index.batchSize.

Solr

Solr is configured via spring.data.solr.

Development Instructions

  1. Install Maven
  2. Install Docker
  3. Start Solr
   cd solr && docker build --tag=scholars/solr . && docker run -d -p 8983:8983 scholars/solr && cd ..
  4. Build and run the application
   mvn clean install
   mvn spring-boot:run
  • Note: Custom application configuration can be achieved by providing a location and an optional profile, such as:
   mvn spring-boot:run -Dspring-boot.run.profiles=dev -Dspring-boot.run.config.location=/some/directory/
  • ...where an application-dev.yml exists in the /some/directory/ directory

Docker Deployment

docker build -t scholars/discovery .
docker run -d -p 9000:9000 -e SPRING_APPLICATION_JSON="{\"spring\":{\"data\":{\"solr\":{\"host\":\"http://localhost:8983/solr\"}}},\"ui\":{\"url\":\"http://localhost:3000\"},\"vivo\":{\"base-url\":\"http://localhost:8080/vivo\"},\"middleware\":{\"allowed-origins\":[\"http://localhost:3000\"],\"index\":{\"onStartup\":false},\"export\":{\"individualBaseUri\":\"http://localhost:3000/display\"}}}" scholars/discovery

The environment variable SPRING_APPLICATION_JSON will override properties in application.yml.

Verify Installation

With the above installation instructions, the following service endpoints can be verified:

  1. HAL Explorer (http://localhost:9000/explorer)
  2. REST API (http://localhost:9000/individual)
  3. REST API Docs (http://localhost:9000/api)

The HAL (Hypertext Application Language) Explorer can be used to browse scholars-discovery resources.
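As a rough way to script this check, the sketch below requests each of the three paths and prints the HTTP status code. It assumes Java 11 or later (for java.net.http) and that the service is reachable at http://localhost:9000, matching the port mapping in the Docker command above; the class name EndpointCheck and its methods are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a quick smoke test; not part of scholars-discovery.
final class EndpointCheck {
    // Paths taken from the list above.
    static final String[] PATHS = {"/explorer", "/individual", "/api"};

    // Joins the base URL with each path, tolerating a trailing slash on the base.
    static List<URI> endpoints(String baseUrl) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        List<URI> uris = new ArrayList<>();
        for (String path : PATHS) {
            uris.add(URI.create(base + path));
        }
        return uris;
    }

    public static void main(String[] args) throws Exception {
        // ASSUMPTION: default base URL; pass a different one as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9000";
        HttpClient client = HttpClient.newHttpClient();
        for (URI uri : endpoints(baseUrl)) {
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            // The body is discarded; only the status code is of interest here.
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            System.out.println(response.statusCode() + " " + uri);
        }
    }
}
```

The default client does not follow redirects, so a 3xx status may appear for some paths; a connection error most likely means the container is not running or the port is not mapped.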

scholars-discovery's Issues

Searchable more facets

Make the facets searchable when you click the “more” button, instead of blindly clicking through the pages

Add tags to articles

Tune Grant Discovery

Discovery search tune: Add Grant Abstract
Current: Grants (sort by relevancy score)
· Grant title (1pt)
· Awarded by (1pt)
· contributors (1pt)
· Abstract (1pt)

Track search keywords for data analysis

a. We were keeping track of users’ search keywords in VIVO, but the new Scholars has lost that function
b. We need to keep track of the keywords users search with, for data analysis

Organizational hierarchy

A top-level organization requires recursively getting its child organizations for inclusion. Include all faculty of child organizations.

The harvest should afford recursive SPARQL lookups.

Tune search results

Search Result Tune:
*search term case-insensitive
*boost on term frequency, not on field
*ability to supply sort for tie breaker

People (sort by relevancy score):
-Boost with last name and first name
• Research areas (2 pt)
• Overview (2 pt)
• Publication titles (1 pt)

Publications (sort by relevancy score)
-no boost
• Publication titles (1pt)
• Abstracts (1pt)
• Journal title (1pt)
• Keywords

Grants (sort by relevancy score)
-no boost
• Grant title (1pt)
• Awarded by (1pt)
• contributors (1pt)

Awards (sort by relevancy score)
-no boost
• Award title (1pt)
• Conferred by (1pt)

Courses (sort by relevancy score)
-no boost
• Course title (1pt)
• Participants (1pt)
• Add Participant facet

Concepts (sort by relevancy score)
-no boost
• Concept (1pt)
• Research area of (1pt)

Research Overview (sort by relevancy score):
-no boost
• Overview (1 pt)

Research Idea (sort by relevancy score):
-no boost
• Idea title (1 pt)
• Idea owner (1 pt)
• Description (1 pt)
• keywords (1 pt)

Customize number range facets

Altmetric

range 50
minimum 1 
maximum 500

0    (n)
1-49
50-99
...
450-499
500+ (n)

Citation

range 100
minimum 1 
maximum 1000

0    (n)
1-99 (n)
100-199
...
1000+ (n)
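
The bucketing requested above can be sketched as follows. This is only an illustration of the two examples in this issue, not an implementation from the project: the class name RangeFacetSketch and its methods are hypothetical, and it assumes that every count below the minimum falls into the "0" bucket, as it does when the minimum is 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the requested facet buckets.
final class RangeFacetSketch {

    // Lists every bucket label: "0", then fixed-width ranges, then "<maximum>+".
    static List<String> labels(int range, int minimum, int maximum) {
        if (range <= 0) {
            throw new IllegalArgumentException("range must be positive");
        }
        List<String> labels = new ArrayList<>();
        labels.add("0");
        for (int start = 0; start < maximum; start += range) {
            int lower = Math.max(start, minimum);            // first bucket starts at the minimum
            int upper = Math.min(start + range, maximum) - 1; // last bucket stops below the maximum
            labels.add(lower + "-" + upper);
        }
        labels.add(maximum + "+");
        return labels;
    }

    // Returns the label of the bucket that a single count falls into.
    static String labelFor(int count, int range, int minimum, int maximum) {
        if (range <= 0) {
            throw new IllegalArgumentException("range must be positive");
        }
        if (count < minimum) {
            return "0"; // ASSUMPTION: anything below the minimum is shown as 0
        }
        if (count >= maximum) {
            return maximum + "+";
        }
        int start = (count / range) * range;
        int lower = Math.max(start, minimum);
        int upper = Math.min(start + range, maximum) - 1;
        return lower + "-" + upper;
    }

    public static void main(String[] args) {
        System.out.println(labels(50, 1, 500));    // Altmetric settings from this issue
        System.out.println(labels(100, 1, 1000));  // Citation settings from this issue
    }
}
```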

Add "In the News" tab in faculty profile

https://vivo.library.tamu.edu/vivo/display/nidea00000001
https://vivo.library.tamu.edu/vivo/individual?uri=http%3A%2F%2Fscholars.library.tamu.edu%2Fvivo%2Findividual%2Fnhttp%3A%2F%2Fnews.cci.fsu.edu%2Fcci-in-the-news%2Fischool-doc-student-dong-joon-lee-spends-summer-at-purdue-university

See abstract.

This will require a list of nested news objects on the person, with the relevant properties. SPARQL is needed to retrieve the values. Then update the person display view with an additional tab for In the News.
