
openhds-rest's People

Contributors

benjamin-heasly, brucemacleod, munk, wolfe-lienhardt

openhds-rest's Issues

Refactor ResourceLinkAssembler to avoid instanceof

ResourceLinkAssembler uses instanceof to add extra links for objects of type ExtIdIdentifiable. This is brittle design.

Refactor ResourceLinkAssembler around an interface with separate implementations for UuidIdentifiable and ExtIdIdentifiable. Let the appropriate implementation get autowired into each controller. Dynamic dispatch will then replace instanceof.
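A minimal sketch of the dispatch idea in plain Java, without Spring wiring. The names `LinkAssembler`, `UuidLinkAssembler`, `ExtIdLinkAssembler`, `SampleLocation`, the "external" rel, and the URL layout are all hypothetical. It also assumes that ExtIdIdentifiable extends UuidIdentifiable, which the real interfaces may not do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the project's entity interfaces.
interface UuidIdentifiable { String getUuid(); }
interface ExtIdIdentifiable extends UuidIdentifiable { String getExtId(); }

// One assembler contract; each controller would receive the implementation
// that matches its entity type, so no instanceof check is needed.
interface LinkAssembler<T extends UuidIdentifiable> {
    Map<String, String> linksFor(T entity, String basePath);
}

class UuidLinkAssembler<T extends UuidIdentifiable> implements LinkAssembler<T> {
    @Override
    public Map<String, String> linksFor(T entity, String basePath) {
        Map<String, String> links = new LinkedHashMap<>();
        links.put("self", basePath + "/" + entity.getUuid());
        return links;
    }
}

class ExtIdLinkAssembler<T extends ExtIdIdentifiable> extends UuidLinkAssembler<T> {
    @Override
    public Map<String, String> linksFor(T entity, String basePath) {
        // Start from the UUID links, then add the extra extId link.
        Map<String, String> links = super.linksFor(entity, basePath);
        links.put("external", basePath + "/external/" + entity.getExtId());
        return links;
    }
}

// Hypothetical entity used only to exercise the sketch.
class SampleLocation implements ExtIdIdentifiable {
    @Override public String getUuid() { return "uuid-1"; }
    @Override public String getExtId() { return "loc-a"; }
}
```

A controller for an ExtIdIdentifiable entity would hold an `ExtIdLinkAssembler`, and every other controller a `UuidLinkAssembler`.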

Support bulk POST as well as bulk GET

I (Ben) would like to allow bulk POST of data for populating the system. This will make it easier to integrate with large existing datasets, for project bootstrapping or migration.

Bulk requests will only make sense for POST, not PUT, because each request will handle multiple records at once.

As with bulk GETs, this should make use of the same XML and JSON message converter beans used by the rest of the application.

This should use the same conventions for representing streams/collections that the application currently uses for GET. For XML, the root element has a plural entity name and contains a list of elements representing single entity records:

<entities>
  <entity>…</entity>
  …
</entities>

For JSON there is an array at the root, containing an object element for each single entity record:

[{}, {}, …]

The implementation will need to use an XML or JSON stream parser to identify and buffer each whole record in memory. It can then pass each record to the appropriate message converter bean for unmarshalling.

This approach will be similar to what clients have to do when they parse bulk data. For example, we used a similar approach for the openhds-tablet in Bioko.

The implementation might be a subclass of EntityIterator, perhaps named StreamParsingEntityIterator. The constructor could take an InputStream, an ObjectMapper, and a helper able to identify and buffer each whole record in memory, perhaps named StreamRecordParser.
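As a rough sketch of the "identify and buffer each whole record" step for the JSON case, here is a plain-JDK iterator that splits a root array of objects into one String per record. The class name `JsonRecordIterator` is hypothetical. It assumes well-formed input whose root array contains only objects, and it leaves unmarshalling to whatever converter receives each String.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: reads the stream incrementally so that only one
// record is buffered in memory at a time.
class JsonRecordIterator implements Iterator<String> {
    private final Reader reader;
    private String next;

    JsonRecordIterator(Reader reader) {
        this.reader = reader;
        this.next = readRecord();
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) throw new NoSuchElementException();
        String record = next;
        next = readRecord();
        return record;
    }

    // Returns the text of the next top-level object, or null at end of input.
    private String readRecord() {
        try {
            StringBuilder buffer = new StringBuilder();
            int depth = 0;            // nesting depth of '{' ... '}'
            boolean inString = false; // braces inside string values don't count
            boolean escaped = false;  // previous char was a backslash in a string
            int c;
            while ((c = reader.read()) != -1) {
                char ch = (char) c;
                // Between records: skip '[', ',', ']' and whitespace.
                if (depth == 0 && ch != '{') continue;
                buffer.append(ch);
                if (inString) {
                    if (escaped) escaped = false;
                    else if (ch == '\\') escaped = true;
                    else if (ch == '"') inString = false;
                } else if (ch == '"') {
                    inString = true;
                } else if (ch == '{') {
                    depth++;
                } else if (ch == '}') {
                    depth--;
                    if (depth == 0) return buffer.toString();
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A StreamParsingEntityIterator could wrap something like this and hand each buffered record to the ObjectMapper. Wrapping the input in a BufferedReader would avoid character-at-a-time reads from the underlying stream.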

It would be nice to implement this behavior in a Spring message converter. That way bulk inputs could be handled "magically" and REST controller methods could be implemented cleanly in terms of EntityIterator.

If this doesn't work out, we can implement this behavior explicitly with controller methods that accept the HTTP request and chew through it directly.

Work out database isolation for tests

Currently each integration test must clean up the databases to prevent leaking into other tests.

This is not the way.

We can use Spring DbUnit support to set up and tear down the database for each test. I'm holding off on this because it will be a good synergy with Wolfe's master's project.

Revise SampleDataGenerator to generate structured data sets of various sizes.

Some motivation:

  1. Currently, SampleDataGenerator can generate one arbitrary data set, just enough to pass integration tests that require pre-existing data. The records created are ad-hoc and a bit messy.
  2. SampleDataGenerator has become a large class which imports all of the repositories. Since it touches all entities, it's a hot spot for merge conflicts.
  3. In order to generate valid data, SampleDataGenerator duplicates some service logic.
  4. It would be useful to generate structured data sets of various sizes, to facilitate project bootstrapping, client development, and performance testing.
  5. The SampleDataGenerator needs to know the order in which it's safe to create or delete entities.

So I'd like to revise the SampleDataGenerator.

Some goals:

  1. Accept command line arguments that specify data set size:
    • default is to make sure essential data are in place (like the first User), but don't mess with existing data
    • a size argument can specify "generate a data set with at least this many records"
  2. Generate structured data in meaningful chunks, like whole-families.
  3. Use the services instead of duplicating service logic.
  4. Don't import every service. Break the problem into smaller chunks, perhaps LocationGenerator and FamilyGenerator
  5. Don't hard-code the ordering of entity creation / deletion. Instead, let smaller chunks like LocationGenerator declare some ordering information which the SampleDataGenerator can obey.
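One way goal 5 might work, sketched in plain Java: each chunk generator declares which generators must run before it, and the SampleDataGenerator derives a safe creation order with a topological sort. The `ChunkGenerator` interface, `NamedGenerator`, and `GeneratorOrdering` are hypothetical names, not existing classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical contract: a generator names itself and its prerequisites.
interface ChunkGenerator {
    String name();
    List<String> dependsOn();
}

class NamedGenerator implements ChunkGenerator {
    private final String name;
    private final List<String> dependsOn;

    NamedGenerator(String name, List<String> dependsOn) {
        this.name = name;
        this.dependsOn = dependsOn;
    }

    @Override public String name() { return name; }
    @Override public List<String> dependsOn() { return dependsOn; }
}

class GeneratorOrdering {
    // Depth-first topological sort. Throws if a dependency is unknown
    // or if the declared dependencies form a cycle.
    static List<String> creationOrder(List<ChunkGenerator> generators) {
        Map<String, ChunkGenerator> byName = new LinkedHashMap<>();
        for (ChunkGenerator g : generators) byName.put(g.name(), g);
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String name : byName.keySet()) {
            visit(name, byName, done, inProgress, ordered);
        }
        return ordered;
    }

    private static void visit(String name, Map<String, ChunkGenerator> byName,
                              Set<String> done, Set<String> inProgress,
                              List<String> ordered) {
        if (done.contains(name)) return;
        ChunkGenerator generator = byName.get(name);
        if (generator == null) {
            throw new IllegalStateException("Unknown generator " + name);
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Dependency cycle at " + name);
        }
        for (String dependency : generator.dependsOn()) {
            visit(dependency, byName, done, inProgress, ordered);
        }
        inProgress.remove(name);
        done.add(name);
        ordered.add(name); // all prerequisites are already in the list
    }
}
```

Deletion could walk the same list in reverse, so dependents are removed before the entities they reference.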

Add H5S links using extIds in addition to Uuids

Currently, we have H5S links that follow UUID references between entities.

Some entities also have extId ids. It would be useful to expose links based on extIds as well.

One caveat: extIds can change and won't always be unique. So extId links will have to be resolved like queries (0 or many results), rather than canonical locations for unique resources.

Factor LocationRestController PUT and POST behavior into superclass

LocationRestController implements proof of concept PUT and POST behavior.

We should implement most of this once, in the entity controller superclass.

Some controller subclasses may override to un-support these methods. More generally, we can protect these operations with authorizations at the service level.

Create JpaRepository type hierarchy that mirrors Services

I think it would make sense to have a repository hierarchy that mirrored the new service hierarchy where things like "findByExtId()" could be written once in the "AuditableCollectedRepository" and used by all entity specific repositories where finding by extId would be helpful.

Add a Dockerfile

With Spring Boot, our app config and deployment is pretty simple.

So it should be easy for us to write a Dockerfile that specifies a Docker image, which will make the app really easy to deploy.

Version 1 is to write the Dockerfile with simple config like we use for testing. This may bundle in dependencies like MySQL. Then set up DockerHub integration to trigger re-builds of the image whenever we commit to master.

Version 2 is to set up integration with our CI service (see #65) so that we can automatically deploy a container after each successful build.

Set up continuous integration

Set up a continuous build and deployment server. This will facilitate testing and collaboration.

It looks like people like Travis as a free, easy to use CI service:
http://stackshare.io/travis-ci

Version 1 is to set up the GitHub integration so that all pushes trigger the integration tests, using gradle test or similar.

Make site properties configurable

Instead of static site properties that can only be modified at deployment time, we should make site properties queryable and configurable through the REST API. This will facilitate integration with clients that need to discover the site properties.

This will also move towards my (Ben's) goal of deploying a vanilla instance of the openhds-rest server and doing all the site baseline/config/bootstrapping through the REST API. Down with obscure config files!

SiteProperty can be a simple UUID entity with a name, string-value pair.

At startup, we can pre-populate the SiteProperty table with required properties found in the default site.properties file. These must be present for the application to work.

Then at runtime, the values can be customized and added to by a user with sufficient privileges.

Support queries by ExtId

For those entities that implement ExtIdentifiable, expose REST queries based on extId.

Note that extIds are not always unique, so extId queries must be allowed to return collections.
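To illustrate the "collections, not single records" point, here is a small in-memory sketch. `ExtIdIndex` is a hypothetical stand-in for a repository query. The key property is that a lookup returns a list with zero, one, or many entries and never assumes uniqueness.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index from extId to every entity that currently carries it.
class ExtIdIndex<T> {
    private final Map<String, List<T>> byExtId = new HashMap<>();

    void add(String extId, T entity) {
        byExtId.computeIfAbsent(extId, k -> new ArrayList<>()).add(entity);
    }

    // Always returns a list: empty when nothing matches, possibly many entries.
    List<T> findByExtId(String extId) {
        List<T> matches = byExtId.getOrDefault(extId, Collections.<T>emptyList());
        return Collections.unmodifiableList(matches);
    }
}
```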

ProjectCodes should accommodate code look-up as well as set-based value constraints.

The existing openhds-server has two kinds of "code" configuration in two files:

  • codes.properties enables simple name -> value lookup
  • value-constraint.xml enables constraints of the form "is value x a member of set Y?"

The ProjectCodes model and service should be extended to accommodate both operations.

The ProjectCodes model currently has

  • codeName
  • codeValue

It should add

  • codeGroup
  • description

This model will accommodate all the data found in codes.properties and value-constraint.xml.

The ProjectCodes service should then expose operations of the form

  • lookup (codeName) -> value
  • predicate (codeGroup, value x) -> is x one of the codeValues in codeGroup?

The lookup assumes codeNames are still unique -- codeGroup is metadata, not a namespace.
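A minimal in-memory sketch of the two operations, assuming the four-field model proposed above. The class names `ProjectCode` and `ProjectCodeService` and the method name `isValueInGroup` are hypothetical. A real service would sit on top of the repository.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model with the two existing and two proposed fields.
class ProjectCode {
    final String codeName;
    final String codeValue;
    final String codeGroup;
    final String description;

    ProjectCode(String codeName, String codeValue, String codeGroup, String description) {
        this.codeName = codeName;
        this.codeValue = codeValue;
        this.codeGroup = codeGroup;
        this.description = description;
    }
}

class ProjectCodeService {
    private final List<ProjectCode> codes = new ArrayList<>();

    void add(ProjectCode code) {
        // codeNames stay unique across all groups:
        // codeGroup is metadata, not a namespace.
        if (lookup(code.codeName) != null) {
            throw new IllegalArgumentException("Duplicate codeName " + code.codeName);
        }
        codes.add(code);
    }

    // lookup (codeName) -> value, or null when the name is unknown.
    String lookup(String codeName) {
        for (ProjectCode code : codes) {
            if (code.codeName.equals(codeName)) return code.codeValue;
        }
        return null;
    }

    // predicate (codeGroup, value) -> is value one of the codeValues in codeGroup?
    boolean isValueInGroup(String codeGroup, String value) {
        for (ProjectCode code : codes) {
            if (code.codeGroup.equals(codeGroup) && code.codeValue.equals(value)) return true;
        }
        return false;
    }
}
```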

So in summary, adding codeGroups to the model and service operations will make ProjectCodes more expressive and useful.

Adding a description to each code will just be handy. It's a good idea from the existing openhds-server.

Salt and hash passwords for Users and FieldWorkers

Currently User and FieldWorker passwords are stored as plain text.

Incorporate bcrypt (or better?) to salt and hash passwords.

Make the "password" fields transient and ignored by the database.

Only persist the salted, hashed "passwordHash" fields.
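The sketch below shows the salt-and-hash flow using only the JDK. Note the swap: bcrypt is not part of the standard library, so this uses PBKDF2 (`PBKDF2WithHmacSHA256`) instead. A bcrypt implementation from a third-party library would fill the same role. The class name `PasswordHasher`, the "salt:hash" storage format, and the iteration count are assumptions for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

class PasswordHasher {
    private static final int ITERATIONS = 100_000; // illustrative work factor
    private static final int KEY_BITS = 256;

    // Returns "salt:hash", both Base64-encoded. Only this string would be
    // persisted as passwordHash; the raw password is never stored.
    static String hash(char[] password) {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt); // fresh random salt per password
        return encode(salt) + ":" + encode(derive(password, salt));
    }

    static boolean matches(char[] password, String stored) {
        String[] parts = stored.split(":");
        byte[] salt = Base64.getDecoder().decode(parts[0]);
        byte[] expected = Base64.getDecoder().decode(parts[1]);
        // MessageDigest.isEqual compares without short-circuiting on the
        // first differing byte.
        return MessageDigest.isEqual(expected, derive(password, salt));
    }

    private static byte[] derive(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is unavailable on this JVM", e);
        }
    }

    private static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }
}
```

Because each call to `hash` draws a new salt, two users with the same password end up with different stored values.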

Add "raw" REST results

For large data transfers, we should add "raw" resource flavors.

These should be based on repository query methods that accept an insertDate range and return a Stream of results.

The results should be Marshalled incrementally and written directly to the response body.

The results should never be loaded into memory all at once.
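A rough sketch of incremental marshalling in plain Java: records are pulled from a Stream one at a time and written straight to the output, so the full result set is never held in memory. `RawResultWriter` and the `marshal` function parameter are hypothetical. In the application the marshalling would come from the existing message converters and the Writer would be backed by the response body.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.function.Function;
import java.util.stream.Stream;

class RawResultWriter {
    // Writes the results as a JSON array, one record at a time.
    static <T> void writeJsonArray(Stream<T> results,
                                   Function<T, String> marshal,
                                   Writer out) {
        // try-with-resources closes the stream, which matters when it is
        // backed by an open database cursor.
        try (Stream<T> closing = results) {
            out.write("[");
            Iterator<T> records = closing.iterator();
            boolean first = true;
            while (records.hasNext()) {
                if (!first) out.write(",");
                out.write(marshal.apply(records.next()));
                first = false;
            }
            out.write("]");
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```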

Replace EntityControllerRegistry with Spring EntityLinks

I wrote an EntityControllerRegistry class which helps us map from entities to controllers. This makes it easy to build links for any given entity instance.

Turns out this is a solved problem with EntityLinks. So let's use it.

A nice thing about EntityLinks is that the mapping from entity class to controller class is done with controller-level annotations, not explicit code.

UuidRestController should support DELETE

Allow clients to make DELETE requests for single records.

Internally, this would mark the record as "voided" and not actually delete it.

As a consequence, GET responses should exclude voided records.
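The void-instead-of-delete behavior might look like this in a plain Java sketch. `VoidableRecord` and `VoidingStore` are hypothetical in-memory stand-ins for an entity and its service. The `voided` flag is the only piece taken from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VoidableRecord {
    final String uuid;
    boolean voided;

    VoidableRecord(String uuid) {
        this.uuid = uuid;
    }
}

class VoidingStore {
    private final Map<String, VoidableRecord> records = new LinkedHashMap<>();

    void save(VoidableRecord record) {
        records.put(record.uuid, record);
    }

    // DELETE marks the record as voided and keeps it in storage.
    // Returns false if there was no live record to void.
    boolean delete(String uuid) {
        VoidableRecord record = records.get(uuid);
        if (record == null || record.voided) return false;
        record.voided = true;
        return true;
    }

    // GET of a single record treats voided records as not found.
    VoidableRecord findOne(String uuid) {
        VoidableRecord record = records.get(uuid);
        return (record == null || record.voided) ? null : record;
    }

    // GET of a collection excludes voided records.
    List<VoidableRecord> findAll() {
        List<VoidableRecord> live = new ArrayList<>();
        for (VoidableRecord record : records.values()) {
            if (!record.voided) live.add(record);
        }
        return live;
    }

    // Number of records physically stored, voided or not.
    int storedCount() {
        return records.size();
    }
}
```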

Refactor Domain Hierarchy?

Do we want to change the hierarchy of the OpenHDS Domain classes to

UuidIdentifiable -> Auditable -> AuditableCreated (No extId) -> AuditableCollected (Has extId)

This would mean that anything collected would have extIds and that the "ExtIdentifiable" type may be redundant.

This hierarchy also means that UuidIdentifiable could be a class and not an interface.

The design would be cleaner and more straightforward, but I feel like I am overlooking something major in terms of motivation for the current design, with UuidIdentifiable and ExtIdentifiable being separate interfaces.

I assumed the current design is the way it is because of the fact that there are 'type irregularities' like FieldWorker who has an extId but isn't collectable. I think it is desirable to have cases like this 'squashed' into the proposed hierarchy (i.e. making FieldWorker have a FieldWorkerId instead of an extId, etc).

We spoke about this in person - but I'd like to come to a solid conclusion on steps forward.

Create Super Tests for Services

Create a set of integration tests that test the fields shared among OpenHDS entities which can be extended for each specific entity.

This will allow us to write only 1 set of tests for the fundamental functionality of all the entities like get, get all, etc.

Entity specific implementations will come later as the domain documentation and specification is created with the services.

Add a category field for ErrorLogs

In Bioko we found it useful to query/filter for ErrorLogs by some category, like form type.

Add a category string to the ErrorLog model. Also expose this string as a query parameter in the ErrorLog resource.

Detect update conflicts? With record revision numbers?

Currently, changes to entities can be tracked by insertDateTime and lastModifiedDateTime. But update conflicts are not explicitly detected. Rather, the last update always wins.

Should the service layer attempt to detect revision conflicts between records? For example, it might use an optimistic locking design, in which clients submitting updates must also submit an expected timestamp or revision number as part of the update record. The update would only succeed if the expected value matched the current value in the server database.

Would lastModifiedDateTime timestamps be sufficient for detecting revision conflicts? I suspect these would lead to some racey edge cases. Should we then add an explicit revision number to each record?
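To make the explicit-revision option concrete, here is a minimal optimistic locking sketch in plain Java. `VersionedRecord`, `VersionedStore`, and `RevisionConflictException` are hypothetical names. An update must carry the revision the client last saw, and it is rejected when that no longer matches the stored revision.

```java
import java.util.HashMap;
import java.util.Map;

class RevisionConflictException extends RuntimeException {
    RevisionConflictException(String message) {
        super(message);
    }
}

class VersionedRecord {
    final String uuid;
    final String content;
    final long revision;

    VersionedRecord(String uuid, String content, long revision) {
        this.uuid = uuid;
        this.content = content;
        this.revision = revision;
    }
}

class VersionedStore {
    private final Map<String, VersionedRecord> records = new HashMap<>();

    synchronized VersionedRecord create(String uuid, String content) {
        VersionedRecord created = new VersionedRecord(uuid, content, 0);
        records.put(uuid, created);
        return created;
    }

    // Succeeds only if expectedRevision matches the stored revision; the
    // check and the write happen under one lock, so two racing updates
    // cannot both pass the check.
    synchronized VersionedRecord update(String uuid, String content, long expectedRevision) {
        VersionedRecord current = records.get(uuid);
        if (current == null) {
            throw new IllegalArgumentException("No record " + uuid);
        }
        if (current.revision != expectedRevision) {
            throw new RevisionConflictException(
                    "Expected revision " + expectedRevision + " but found " + current.revision);
        }
        VersionedRecord updated = new VersionedRecord(uuid, content, current.revision + 1);
        records.put(uuid, updated);
        return updated;
    }
}
```

A counter changes on every successful write, whereas two updates can land within the same timestamp resolution, which is one of the racy cases a revision number avoids. JPA offers a `@Version` annotation for this pattern, which may be worth evaluating before hand-rolling anything.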

Add Location Registration

Add the ability to PUT or POST a new Location. This should use a LocationRegistration DTO. This will be part of the "inbound" boundary of the application.

HATEOAS link for each entity UUID

Generalize HATEOAS link building to apply to OpenHDS entities in a generic way. Each entity representation should contain:

  • actual content
  • "self" link
  • UUID links corresponding to "stubs" produced by our ShallowCopier

Swagger to allow exploration of the API

Swagger provides a UI for exploring REST APIs that works well with Spring Boot applications. As I understand it, it infers the various paths from the controllers and constructs examples for the request body. We use it at my work to provide easy, discoverable documentation for anyone who wants to integrate with a service, whether it's an internal team or an external vendor.

Replay Bioko data into openhds-rest

As a real-world test of openhds-rest, I (Ben) want to replay a large dataset from the Bioko CIMS project into an openhds-rest instance.

This will require an ETL tool.

The tool must read records from the existing Bioko CIMS form database, convert each record to a JSON entity registration, and POST it to the openhds-rest instance.

This would be a good test for bulk POSTs. See #68.

Pentaho Kettle is one possibility for this ETL work.

Talend Open Studio looks nicer to me.

Populate ProjectCodes repository at startup

When the server starts, it should read default project codes from a properties file, and add each code to the project code repository if it doesn't already exist.

This will ensure that required codes exist.

This should never over-write codes that have already been customized by a project.
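The "add if missing, never over-write" rule can be sketched with the JDK's Properties and Map types. `DefaultCodeLoader` is a hypothetical name, and a plain Map stands in for the project code repository.

```java
import java.util.Map;
import java.util.Properties;

class DefaultCodeLoader {
    // Adds each default only when no code with that name exists yet.
    // Returns how many codes were added.
    static int addMissingDefaults(Properties defaults, Map<String, String> repository) {
        int added = 0;
        for (String name : defaults.stringPropertyNames()) {
            // putIfAbsent leaves an existing (possibly customized) value alone
            // and returns null only when it actually inserted the default.
            if (repository.putIfAbsent(name, defaults.getProperty(name)) == null) {
                added++;
            }
        }
        return added;
    }
}
```

Running this at every startup is safe because a second pass finds every name already present and adds nothing.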

Tests to verify correct/sensible rendering of HAL+Json, HAL+Xml

It took some fussing with message converters and dependencies to get good rendering of responses with HAL+Json and HAL+Xml. So we should add some integration tests that verify we are still getting good rendering.

Check for things like:

  • Paged responses include embedded collections with generic names like "locations", not implementation-specific names like "locationList".
  • Xml Links look like single elements with "rel" and "href" attributes (not nested elements)

English documentation of the REST API

The repository should include English prose documentation of the REST API.

This should include a broad overview, some expected usages, a summary of all current endpoints, and pictures.

Sample Registrations

Add /sampleRegistration resources for each entity.

These should serve up templates for clients that want to submit registrations, using JSON or XML as requested by the client.

These should include "flat" entities with all top-level fields represented. "uuid" fields that reference related entities should be filled in to reference the "UNKNOWN" entity records.

Factor out common test behaviors for entity controllers

Based on the current LocationRestControllerTest and UserRestControllerTest, factor out common behaviors to test for all RestControllers. These should include:

  • paged queries with a few parameters
  • single records at canonical location with correct and incorrect ids
  • PUT and POST new records with valid and invalid content and ids
  • PUT and POST update records with valid and invalid content and ids
  • correctly rejected unauthenticated user
  • correctly rejected unauthorized user
  • JSON input and output
  • XML input and output

And where appropriate:

  • single records at external id location with correct and incorrect ids

What "complex" operations should openhds-rest support?

From a domain point of view, we need to know what operations openhds-rest needs to support.

Some simple operations, like create or update a single entity, seem obvious.

Others are more complex. For example, in the CIMS Bioko project, we had a concept of a "household", which incorporated all of the Census entities in a typical pattern, like Location 1:1 SocialGroup, and Relationships only recorded between household members and heads.

For Bioko, we had a dedicated endpoint that could register an Individual as a household "head" or "member", with many side effects following the household pattern. The "head" and "member" registrations are two examples of "complex" operations.

Should openhds-rest support these complex operations for household head and member? If so, should it explicitly model a "household"?

Are there other "complex" operations that openhds-rest should support?

Add "home" controller

Add a "home" controller that clients can hit first. This should return a simple greeting for content.

It should return links to each known controller. The "rels" should be plural entity names and the links should point to the base path for each corresponding controller.

It should also return a "self" link.

Use cases, documentation, and tests for essential functionality.

We would like to gather requirements related to essential functionality. These should be motivated by the health and demographics domain. They should have the flavor, "openhds-rest must be able to do at least such-and-such".

We should document these requirements in the repository. We should also write well-commented integration tests which play out each story and verify expected behavior.

Work out correct/sensible rendering of HAL as Json and Xml

Currently the application returns HAL-style Json and Xml which contains correct data and links.

But the text rendering looks odd to me. I want to figure out if this needs to be corrected, or if this is correctly obeying some HAL conventions.

For example, for paged data, the Json "_embedded" field contains a sub-field named after the run time type of a Java object, in this case locationList. This seems brittle. Shouldn't the sub-field have a well-known name that's independent of Java types?

_embedded: {
  locationList:
    0:  {
      uuid: "76ae08cb-a64d-44cc-9073-0212273c9ac3"
      insertBy: {
        uuid: "bb1bb44e-8b30-476e-ba2e-a7fdbfaa3a1e"
...

The same data in Xml is called "content" instead of "_embedded" and contains extra nested tags. Shouldn't the Xml resemble the Json more closely?

<content>
  <content>
    <uuid>76ae08cb-a64d-44cc-9073-0212273c9ac3</uuid>
    <insertBy>
      <uuid>bb1bb44e-8b30-476e-ba2e-a7fdbfaa3a1e</uuid>
...

Extract interface for ExtIdentifiable Entities

We should explicitly model ExtIds instead of leaving it up to each Entity class to declare extId fields on an ad-hoc basis.

This will help with H5S link building, making it easier to automatically add "rels" based on extIds.

It will also allow us to factor API queries by ExtId and put this controller logic in one place.

Add OpenHDS events module

Add the events module, which keeps track of system events and facilitates integration with external systems.

IT for HATEOAS link traversal

Write an integration test to prove that we can follow links from the home controller and access entity data.

The test may not hard-code or build any Urls! It may only look up Urls by parsing HAL Json and looking for well known "rels" like "self" and "locations".

For example:

  1. get from the home controller
  2. follow "locations" rel and get
  3. follow "self" rel of the first location listed and get
  4. follow "insertBy" rel to the location's user and get
  5. make sure the user has a "self" rel.

Refactor findBy insert/void/modified service methods

Currently the service methods for findByInsertBy(), findByVoidBy(), findByModifiedBy() and their respective date methods all have very similar logic and heavy code duplication.

I will refactor the implementation beneath the public-facing methods so that common functionality is factored out and reused between the methods.
