
Publishing Data Repository (oar-pdr)

This repository provides the implementation of the NIST Publishing Data Repository (PDR) platform, the technology that provides the NIST Data Publishing Repository (DPR).

The software provided in this repository has two main parts:

  1. The Landing Page Service -- this converts JSON metadata descriptions into viewable HTML presentations. It is written in TypeScript using the Angular web application framework.
  2. Publishing services -- this set of services, written in Python, provides web services that support the publishing process, including providing pre-publication metadata and preserving datasets.

Contents

java/      --> Java source code (none at this time)
python/    --> Python source code for the metadata and preservation
                services
angular/   --> Angular (TypeScript) source code for the landing page
                service
scripts/   --> Tools for running the services and running all tests
oar-build/ --> general OAR build system support (do not customize)
docker/    --> Docker containers for building and running tests

Prerequisites

Landing page service

As a JavaScript/TypeScript application, this product is built and run using node and npm. Both become available by installing:

  • node 8.9.0 or higher

All prerequisite JavaScript modules are provided via the npm build tool. See angular/package.json for a listing of the primary dependencies and angular/package-lock.json for a complete listing of all dependencies.

Publishing services

The publishing services are built and run using Python (supporting versions 2.7.11 through 2.7.13).

The oar-metadata package is a prerequisite that is configured as a git submodule of this package. This means that after you clone the oar-pdr git repository, you should use git submodule to pull the oar-metadata package into it:

git submodule update --init

See oar-metadata/README.md for a list of its prerequisites.

In addition to oar-metadata and its prerequisites, this package requires the following third-party packages:

  • multibag-py v0.4 or later
  • bagit v1.6.X
  • fs v2.X.X
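Once installed, a quick way to confirm that these prerequisites are importable is a small stdlib-only check. This is a convenience sketch, and the module names are assumptions (e.g. multibag-py is assumed to import as multibag):

```python
import importlib.util

# Module names are assumptions: multibag-py is assumed to import as
# "multibag"; bagit and fs are assumed to import under their own names.
required = ["multibag", "bagit", "fs"]
missing = [m for m in required if importlib.util.find_spec(m) is None]
print("missing prerequisites:", ", ".join(missing) if missing else "none")
```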

Acquiring prerequisites via Docker

As an alternative to explicitly installing the prerequisites to run the tests, the docker directory contains scripts for building Docker containers with these installed. Running the docker/run.sh script will build the containers (caching them locally), start a container, and put the user in a bash shell inside it. From there, one can run the tests or use the jq and validate tools to interact with metadata files.

Building and Testing the software

This repository provides two specific software products:

  • pdr-lps -- the Landing Page Service
  • pdr-publish -- the publishing services

Simple Building with makedist

As with all standard OAR repositories, the software products can be built simply via the makedist script, assuming the prerequisites are installed:

  scripts/makedist

The built products will be written into the dist subdirectory (created by makedist); each will be written into a zip-formatted file with a name formed from the product name and a version string.

The individual products can be built separately by specifying the product names as arguments, e.g.:

  scripts/makedist pdr-lps
  scripts/makedist pdr-publish

Additional options are available; use the -h option to view the details:

  scripts/makedist -h

Simple Testing with testall

Assuming the prerequisites are installed, the testall script can be used to execute all unit and integration tests:

  scripts/testall

As with makedist, you can run the tests for the different products separately by listing the desired product names as arguments to testall. Running testall -h will explain the available command-line options.

Building and Testing Using Native Tools

The makedist and testall scripts are simply wrappers around the native build tools for the products--namely, npm and python. You can use these tools directly to build and test. Consult the README.md files in the angular and python directories for more details.

Building and Testing Using Docker

Like all standard OAR repositories, this repository supports the use of Docker to build the software and run its tests. (This method is used at NIST in production operations.) The advantage of the Docker method is that it is not necessary to first install the prerequisites; these are installed automatically into the Docker containers.

To build the software via a docker container, use the makedist.docker script:

  scripts/makedist.docker

Similarly, testall.docker runs the tests in a container:

  scripts/testall.docker

Like their non-docker counterparts, these scripts accept product names as arguments.

Running the services

Consult the README.md files in the angular and python directories for details on how to launch the services provided by the software products.

License and Disclaimer

This software was developed by employees and contractors of the National Institute of Standards and Technology (NIST), an agency of the Federal Government and is being made available as a public service. Pursuant to title 17 United States Code Section 105, works of NIST employees are not subject to copyright protection in the United States. This software may be subject to foreign copyright. Permission in the United States and in foreign countries, to the extent that NIST may hold copyright, to use, copy, modify, create derivative works, and distribute this software and its documentation without fee is hereby granted on a non-exclusive basis, provided that this notice and disclaimer of warranty appears in all copies.

THE SOFTWARE IS PROVIDED 'AS IS' WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESSED, IMPLIED, OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY THAT THE SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND FREEDOM FROM INFRINGEMENT, AND ANY WARRANTY THAT THE DOCUMENTATION WILL CONFORM TO THE SOFTWARE, OR ANY WARRANTY THAT THE SOFTWARE WILL BE ERROR FREE. IN NO EVENT SHALL NIST BE LIABLE FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO, DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES, ARISING OUT OF, RESULTING FROM, OR IN ANY WAY CONNECTED WITH THIS SOFTWARE, WHETHER OR NOT BASED UPON WARRANTY, CONTRACT, TORT, OR OTHERWISE, WHETHER OR NOT INJURY WAS SUSTAINED BY PERSONS OR PROPERTY OR OTHERWISE, AND WHETHER OR NOT LOSS WAS SUSTAINED FROM, OR AROSE OUT OF THE RESULTS OF, OR USE OF, THE SOFTWARE OR SERVICES PROVIDED HEREUNDER.

oar-pdr's People

Contributors

afrold, chuanlin, chuanlin2018, deoyani, dependabot[bot], elmiomar, grg2, katrinleinweber, rayplante, skolli853780

oar-pdr's Issues

Angular/Typescript Code Conventions

We are currently in the process of putting into place stricter code style conventions for our Java code, so it is appropriate that we start setting down similar conventions for our Typescript code. This issue is for proposing a set of such conventions.

This proposal assumes these policies build on the conventions set out for Angular projects (e.g. file naming conventions, class naming conventions, etc.). Exceptions to these conventions are expected (though rare) where they improve readability.

Spacing

Indentation and Line Length

  1. Standard indentation is 4 spaces; indentation must not include TAB (Ctrl-I) characters.
  2. Avoid statements longer than 120 characters. Longer statements should be either continued onto another line with line-breaks that optimize readability or otherwise broken into multiple, shorter statements.
  3. The last character of a source file should be a new-line (i.e. "return") character.

Classes, Functions, and Decorators

  1. Each decorator of a class, function, or variable (e.g. @Component({...})) should begin on a new line.
  2. Decorators should be considered conceptually part of the signature they decorate: there should be no blank line between a decorator block and its class or function signature.
  3. When a decorator includes an object argument (e.g. {...}), each property of the object should appear on a separate line and be indented by the standard amount (4 spaces); the closing object brace (}) and parenthesis ()) should appear on their own line.

This example illustrates the above three recommendations:

@Component({
    moduleId: module.id,
    selector: 'pdr-headbar',
    templateUrl: 'headbar.component.html',
    styleUrls: ['headbar.component.css']
})
export class HeadbarComponent {
  1. When a function signature fits on one line, the opening brace ({) should appear on the same line as the signature. In this case, it is recommended that a blank line be inserted after the brace and before the first line of the body for better readability.
  2. When a function signature does not fit on a single line:
    • continue the signature onto multiple lines, breaking after the comma that follows an argument
    • subsequent argument lines should be indented to the column where the first argument appears
    • the closing function parenthesis should appear on the same line as the last argument
    • the opening function body brace should appear on its own line, aligned with the start of the function signature
    • the return type may appear on the same line as the last argument or on its own line; in the latter case, it should be indented four spaces from the start of the function signature

Multi-line function signature:

    public setDownloadStatus(resid: string, filePath: string, 
                             downloadedStatus: string = "downloaded") : boolean 
    {
        this.restore();
  1. When a class signature fits on one line, the opening brace should appear on the same line as the signature. In this case, it is recommended that a blank line be inserted after the brace and before the first line of the body for better readability.

Comments and Documentation

The term "in-line documentation" here refers to comment blocks that could be extracted and converted into human-readable API documentation (as with javadoc for Java code). We do not currently use a documentation extractor with our Typescript code; however, we may in the future. Nevertheless, extractable documentation markup can be highly readable and provides conventions for indicating what is being described.

  1. In-line class, interface, function, and variable documentation should follow the Java in-line documentation conventions, using /** ... */ comment blocks.
  2. In-line class, interface, function, and variable documentation should appear immediately above both the signature and any decorators; the comment opener, /**, should be indented to align with the start of the class, interface, function, or variable.

This example illustrates the above two recommendations:

/**
 * a data structure describing a file in the cart.  A CartEntryData object, in effect, is a NerdmComp 
 * that *must* have the filePath property and is expected to have some additional 
 * data cart-specific properties.  
 */
export interface DataCartItem {

    /**
     * a local identifier for the resource.  This must not start with an underscore, as such names are reserved.
     */
    resId? : string;
  1. In-line function documentation is recommended for any function marked with the public modifier.
  2. In-line class documentation is recommended for any exported class, interface or function.
  3. In-line variable documentation is recommended for properties of an exported interface.
  4. In-line class documentation is highly recommended for any exported component class (i.e., marked with the @Component decorator). It should briefly summarize what visually appears in the component when it is rendered.

Example Component documentation:

/**
 * A component for displaying access to landing page tools in a menu.
 * 
 * Items include:
 * * links to the different sections of the landing page
 * * links to view or export metadata
 * * information about usage (like Citation information in a pop-up)
 * * links for searching for similar resources
 */
@Component({
    selector: 'tools-menu',
    templateUrl: 'tools-menu.html',
    styleUrls: ['./toolmenu.component.css']
})
export class ToolMenuComponent implements OnChanges {

jq docker install failing: automatic LF/CRLF conversion is foiling checksums

The metadata/docker directory contains Docker containers for installing all the needed dependencies and running the metadata tests. We found that the container that installs jq is failing because of a checksum mismatch. This occurs when the repository is cloned from usnistgov/oar-pdr.

The metadata/docker/jq directory contains the files for building the jq container. The Dockerfile downloads jq as a binary from the official jq site, so it also does a checksum check on the downloaded executable. A file storing the binary's checksum, jq-sha256sum.txt, is included in the jq directory. To check for unexpected (or malicious) changes to the checksum file, a checksum check is also done on the checksum file itself, comparing the copy in our repository with the one on the jq web site. Unfortunately, git will sometimes convert line endings in text files between the LF and CRLF conventions, depending on the platform one is working on. This occurred somewhere in the merging into the integration branch. As a result, the file is no longer byte-for-byte identical to the checksum file at the jq website, and the check fails.
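The failure mode is easy to reproduce: sha256 operates on raw bytes, so converting even a single LF to CRLF yields a completely different digest. A minimal sketch (the checksum-file content below is made up for illustration):

```python
import hashlib

# Hypothetical checksum-file content with Unix (LF) line endings:
lf_bytes = b"5c5e4eb4e8a4e2b0a7a4b8c1  jq-linux64\n"
# The same text after git's automatic LF -> CRLF conversion:
crlf_bytes = lf_bytes.replace(b"\n", b"\r\n")

# One changed byte is enough to change the whole digest:
print(hashlib.sha256(lf_bytes).hexdigest())
print(hashlib.sha256(crlf_bytes).hexdigest())
```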

This extra test of the checksum file is probably redundant and can be removed. It was meant to catch unnoticed changes not only in the binary but also in the checksum file, either in the repo or at its original website; but on further reflection, I'm not sure it provides any extra protection over just having a copy of the checksum file in the repo.

Not matching schema causes error only at render step for NERDm records

We made some (nominally) NERDm records and uploaded to the dev API, and they were accepted. It was only when trying to render them with the angular server /lps endpoint that we discovered a problem - the page returned an error and did not render the record. The error trace said something about a value being undefined.

After some slow investigation and looking at the schema, I found that a) the problem was caused by us not complying with the schema, and b) it was because our records contained the top-level key "keywords": string[] instead of the required "keyword": string[].

We already made the change to use the key keyword on our end, but I wonder what the appropriate place to validate the records is? At render-time is not ideal. Would it make sense to validate as part of the ingestion, returning an error if the record does not match the schema? I feel like that is a better guarantee of consistency than asking submitters to validate before PUT/PATCH on a record.
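For what it's worth, an ingest-time check for this particular mistake is cheap. A stdlib-only sketch (the function name and error messages are illustrative, not part of the PDR API):

```python
def check_keyword(record: dict) -> list:
    """Return a list of error strings for the NERDm 'keyword' requirement."""
    errors = []
    if "keyword" not in record:
        errors.append("missing required property: keyword")
        if "keywords" in record:
            errors.append("found 'keywords'; the schema requires 'keyword'")
    elif not (isinstance(record["keyword"], list)
              and all(isinstance(k, str) for k in record["keyword"])):
        errors.append("'keyword' must be an array of strings")
    return errors

# The record from this report would be flagged at ingest:
print(check_keyword({"keywords": ["thermodynamics"]}))
```

Rejecting records like this at PUT/PATCH time would surface the error long before render time.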

Why does every US Government agency need a different one of these?

I'm just wondering why every government agency needs to have a different one of these...

It'd be nice to have a standard way government can build a data repository, so we can all use the same technology in the same way.
