
ega-data-api's People

Contributors

afoix, anandmohan777, ashutoshebi, blankdots, dtitov, gariem, jbygdell, jorizci, juhtornr, nanjiangshu, omllobet, silverdaz, viklund

ega-data-api's Issues

Decrypt the key stored in the database and get it using fileId

The logic for fetching the key from the database has changed for the profile spring.profiles.active=db. It now requires two datasources:

  1. The first datasource has a file_key table, which stores the mapping of FileId to KeyId.
  2. The second datasource has an encryption_key table, which stores the mapping of KeyId to the actual key.

The key stored in the database is encrypted. To decrypt it, we use the decrypt method of the AesCtr256Ega class and pass it the database key together with the password used as the decryption key.
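
The two-step lookup described above can be sketched roughly as follows. This is only an illustration: FileKeyLookup is a hypothetical class, the two maps stand in for the file_key and encryption_key tables, and the decryptor function stands in for the AesCtr256Ega decrypt method, whose real signature is not shown in this issue.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

/** Hypothetical sketch of the FileId -> KeyId -> decrypted key lookup. */
final class FileKeyLookup {
    // Stand-in for the file_key table: FileId -> KeyId.
    private final Map<String, String> fileKeyTable;
    // Stand-in for the encryption_key table: KeyId -> encrypted key.
    private final Map<String, String> encryptionKeyTable;
    // Stand-in for the AesCtr256Ega decrypt method (assumed shape: encrypted key + password).
    private final BiFunction<String, char[], String> decryptor;

    FileKeyLookup(Map<String, String> fileKeyTable,
                  Map<String, String> encryptionKeyTable,
                  BiFunction<String, char[], String> decryptor) {
        this.fileKeyTable = fileKeyTable;
        this.encryptionKeyTable = encryptionKeyTable;
        this.decryptor = decryptor;
    }

    /** Returns the decrypted key for a file, or empty if either mapping is missing. */
    Optional<String> keyForFile(String fileId, char[] password) {
        return Optional.ofNullable(fileKeyTable.get(fileId))
                .map(encryptionKeyTable::get)
                .map(encrypted -> decryptor.apply(encrypted, password));
    }
}
```

Returning an empty Optional when a mapping is missing keeps the "no key for this file" case separate from a decryption failure, which the decryptor is free to signal with an exception.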

Update spring-security-oauth2 version due to security issue

Description

GitHub notified us that our current version of spring-security-oauth2 has a security issue and that we should update it. Read more at:

https://github.com/EGA-archive/ega-data-api/network/alert/ega-data-api-htsget/pom.xml/org.springframework.security.oauth:spring-security-oauth2/open
https://github.com/EGA-archive/ega-data-api/network/alert/ega-data-api-dataedge/pom.xml/org.springframework.security.oauth:spring-security-oauth2/open

Definition of Done

spring-security-oauth2 is updated, all current tests pass, any code changes are reviewed by two developers and a PR is merged.

How to test

All current tests should pass. No need to implement new tests for this.

RES microservice doesn't support POSIX filesystem anymore

Description

The RES microservice should support storage backends with Cleversafe, S3 and POSIX interfaces. In the past it was possible to specify in the configuration which storage backend is supposed to be used. For example:

service.archive.class = GenericArchiveServiceImpl

The correct ArchiveService implementation was then chosen using this annotation:

    @Value("${service.archive.class}")
    String archiveImplBean;

However, this was removed in this commit and the POSIX storage backend doesn't work anymore. This should be fixed.
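
As an illustration of choosing an implementation by a configured name, here is a minimal plain-Java sketch. It does not use Spring, and both the ArchiveService interface shown here and the ArchiveServiceSelector class are hypothetical stand-ins for the project's actual types.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the project's ArchiveService interface. */
interface ArchiveService {
    String describe();
}

/** Hypothetical registry that picks an implementation by its configured name. */
final class ArchiveServiceSelector {
    private final Map<String, ArchiveService> servicesByName = new HashMap<>();

    void register(String name, ArchiveService service) {
        servicesByName.put(name, service);
    }

    /** Looks up the implementation named by the service.archive.class property. */
    ArchiveService select(String configuredName) {
        ArchiveService service = servicesByName.get(configuredName);
        if (service == null) {
            // Fail fast so that a misconfigured backend is visible at startup.
            throw new IllegalArgumentException(
                    "No ArchiveService registered for service.archive.class=" + configuredName);
        }
        return service;
    }
}
```

Failing fast on an unknown name means a deployment that configures a backend the build no longer contains is noticed at startup rather than on the first download.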

Definition of Done

The POSIX filesystem storage backend works and there are unit tests that verify it. The code has to be reviewed by two developers and a PR has to be merged.

How to test

Create a unit test that mocks the FileDatabase service and a POSIX fs, and verifies that RES can encrypt and decrypt data from the POSIX fs.

Define the logging requirements for audit trail

Description

We need a proper audit trail that logs some mandatory information about data use. Gather a list of the things that we currently log in the Data API and discuss with EBI and CRG what they log.

Definition of Done

The document contains a list of things that we currently have in the Local EGA code, and the list is reviewed by CEGA.

How to test

The document does not need to be tested. Peer review.

Documentation for Data-Out

Description

We would like to document the Data-Out part of LocalEGA, describing how to use it and how the components are tied together.
@AlexanderSenf @anandmohan777 - please provide some support for this issue, or point to existing documentation if any is available.

DoD (Definition of Done)

The documentation is publicly available. A recommended place to upload it is: https://localega.readthedocs.io/en/latest/

Testing

Peer Review

Bring support for REMS dataset authorisation to Data API codebase

Description

We have implemented support for REMS dataset authorisation for the Data API at CSC (Tryggve milestone 3). However, the code is not merged into this codebase. We should merge and publish the code.

Definition of Done

Merge the existing code and implement unit test cases that verify that the data authorisation works. The code has to be reviewed by two developers and a PR has to be merged.

How to test

Implement unit test cases that verify that the data authorisation works.

Add TLS termination

Description

All communication between our services needs to be TLS terminated, including cluster-internal communication. As a result, we cannot rely on external PKI providers and thus must be able to supply a self-signed CA root certificate unique to each deployment site.

Proposed solution

Add options for each service to set a TLS certificate and the corresponding CA root certificate.
These certificates should be mounted at runtime.

Definition of Done

All service communication is TLS terminated

Implement unit tests for RES microservice REST controller

Description

Implement proper unit tests for the RES microservice REST controller that make sure that all endpoints work as expected. All external components, such as the connection to the Key service and the S3 or POSIX FS backend, should be mocked.

There are existing test cases in this repository and most likely it makes sense to utilise these as unit test cases.

Definition of Done

The unit tests are implemented, reviewed by two developers and a PR is merged.

How to test

The implemented test cases pass.

Key service unit tests fail

After a fresh git clone, the unit tests fail and mvn test says:

Results :

Failed tests: 
  MyCipherConfigTest.testGetAsciiArmouredKeys:91 null

Tests run: 34, Failures: 1, Errors: 0, Skipped: 0

This is the assert that fails. Investigate why this happens and fix the problem.

HEX conversions in RES

Description

This is related to the crypt4gh header. In the database it is stored as HEX, but RES expects bytes. Moreover, when the header is decrypted in the parseHeader function, the session key and IV are encoded into base64, but LocalEGAServiceImpl expects HEX via getInputStream ... This creates confusion about encoding and decoding.

Can we operate with just HEX for now, or base64? We don't care much about the format; let us just agree on one and use it.
A solution is proposed in https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-res/src/main/java/eu/elixir/ega/ebi/reencryptionmvc/service/internal/LocalEGAArchiveServiceImpl.java

Definition of Done

Allow RES to process headers as HEX strings from the db, regardless of the programming language used for encoding. Add proper error handling, as it is currently difficult to spot where and why the header was not properly decrypted.
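
A minimal sketch of a strict HEX codec with explicit error reporting is shown below. HexCodec is a hypothetical helper, not a class from this repository. It reports the offset of the first invalid character, so a malformed header can be told apart from a decryption failure.

```java
/** Hypothetical helper: strict HEX <-> bytes conversion with descriptive errors. */
final class HexCodec {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Decodes a HEX string (upper or lower case); throws on any malformed input. */
    static byte[] decode(String hex) {
        if (hex == null) {
            throw new IllegalArgumentException("HEX string is null");
        }
        String s = hex.trim();
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("HEX string has odd length: " + s.length());
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                int offset = hi < 0 ? 2 * i : 2 * i + 1;
                throw new IllegalArgumentException("Non-HEX character at offset " + offset);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Encodes bytes as a lower-case HEX string. */
    static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(DIGITS[(b >> 4) & 0x0f]).append(DIGITS[b & 0x0f]);
        }
        return sb.toString();
    }
}
```

Accepting both upper and lower case on input while always emitting lower case keeps the stored format independent of whichever language or library wrote the header.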

How to test

Unit and integration tests are added for headers that match the desired format and for those that don't.

Disable Hystrix timeout

Currently, we have the configuration property hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds: 30000 in the microservices. This doesn't work, at least for RES, when a download takes more than 30 seconds.

After discussion with @anandmohan777, we decided to disable timeouts in all of the microservices by using the hystrix.command.default.execution.timeout.enabled: false property.

Add authentication to all microservices

Description

All microservices should require proper authentication, even if they are not exposed to the outside.

Proposed solution

There are at least two options:

  1. We can validate the access token that the user passes to the frontend service (Dataedge or Zuul), for example by verifying the token signature.

  2. We can implement a new authentication mechanism that is used internally in the Data API.

Definition of Done

The feature is implemented, tested, reviewed and merged.

Env variables missing in application YAML

Description

Some environment variables required to run FileDatabase and DataEdge are missing from application.yml. This task adds them.
These settings were added in the test/m4 branch; the task is to bring them to master:
https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-filedatabase/src/main/resources/application.yml
https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-dataedge/src/main/resources/application.yml

Definition of Done

Updated application.yml for FileDatabase and DataEdge

How to test

The new env variables can be used. Unit and integration tests pass for the no-oss profile.

htsget API in DataEdge: error message when index file is missing

Description

DataEdge performs genomic queries via HTSJDK to answer htsget requests. This only works if there is an index file available for the requested BAM/CRAM/VCF/BCF file. Create an htsget-compliant error message in the ticket server (/ticket/) response if there is no index.
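
For illustration, the sketch below builds an error body in the shape the htsget specification uses for error responses: a JSON object with an "htsget" member holding "error" and "message". Using the NotFound error type (HTTP 404) for a missing index is an assumption made for this example, and HtsgetError is a hypothetical class.

```java
/** Hypothetical sketch of an htsget-style error response for a missing index file. */
final class HtsgetError {
    final int httpStatus;
    final String body;

    private HtsgetError(int httpStatus, String body) {
        this.httpStatus = httpStatus;
        this.body = body;
    }

    /** Builds a NotFound error; which error type fits a missing index is an assumption. */
    static HtsgetError notFound(String message) {
        String json = "{\"htsget\":{\"error\":\"NotFound\",\"message\":\""
                + escape(message) + "\"}}";
        return new HtsgetError(404, json);
    }

    // Minimal JSON escaping: backslashes and double quotes only.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

In a real service a JSON library would normally produce the body. The string building here only keeps the example free of dependencies.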

Definition of Done

The correct error message (according to the htsget spec) is implemented, there are unit tests that verify that the error is returned if there is no index file, the code is reviewed by two developers and the PR is merged.

How to test

Create unit tests that perform queries to the htsget ticket server endpoint and ensure that the correct error message is returned.

Fix AES encryption/decryption in RES microservice

Description

When we were deploying the Data API at CSC, we were not able to get AES encryption working with the current RES implementation. It worked with the older (~6 months) RES code, so we should make sure that AES still works.

We also need to create unit tests that verify that AES encryption/decryption works.

Definition of Done

There are unit tests that verify that AES works.

How to test

Create unit tests that mock all external communication, such as the FileDatabase microservice and storage backend(s), and verify that the RES microservice can encrypt and decrypt data using the AES128 and AES256 algorithms.
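
A round-trip check of the kind such a unit test could perform is sketched below, using only the JDK's javax.crypto API. The CTR mode, the random key and IV, and the AesRoundTrip class itself are assumptions for illustration. The actual RES cipher setup and key handling are not shown in this issue.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: encrypt then decrypt with AES and compare with the input. */
final class AesRoundTrip {
    private static final SecureRandom RANDOM = new SecureRandom();

    private static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        RANDOM.nextBytes(bytes);
        return bytes;
    }

    /** Runs AES/CTR in the given Cipher mode (ENCRYPT_MODE or DECRYPT_MODE). */
    static byte[] apply(int mode, byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES operation failed", e);
        }
    }

    /** True if the ciphertext differs from the input and decrypts back to it. */
    static boolean roundTrips(int keyBits, byte[] plain) {
        byte[] key = randomBytes(keyBits / 8); // 16 bytes for AES128, 32 for AES256
        byte[] iv = randomBytes(16);           // AES block size
        byte[] encrypted = apply(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] decrypted = apply(Cipher.DECRYPT_MODE, key, iv, encrypted);
        return !Arrays.equals(encrypted, plain) && Arrays.equals(decrypted, plain);
    }
}
```

A test would call roundTrips(128, data) and roundTrips(256, data) with non-empty data, while the mocked FileDatabase and storage backend supply the bytes that RES would otherwise read remotely.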

Bytes Length is confusing

Description

When retrieving the file from DataEdge, the implementation extracts 16 bytes: https://github.com/EGA-archive/ega-data-api/blob/master/ega-data-api-dataedge/src/main/java/eu/elixir/ega/ebi/dataedge/service/internal/RemoteFileServiceImpl.java#L782. crypt4gh needs to extract 32 bytes, as shown here: https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-dataedge/src/main/java/eu/elixir/ega/ebi/dataedge/service/internal/RemoteFileServiceImpl.java#L782

Then there are bytes removed again here: https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-dataedge/src/main/java/eu/elixir/ega/ebi/dataedge/service/internal/RemoteFileServiceImpl.java#L159

Then there is this prefix: https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-dataedge/src/main/java/eu/elixir/ega/ebi/dataedge/service/internal/RemoteFileServiceImpl.java#L774 of 16 bytes.

The explanation is here: https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-dataedge/src/main/java/eu/elixir/ega/ebi/dataedge/service/internal/RemoteFileServiceImpl.java#L188-L190 ?

The implementation is confusing and could be better in the way it handles distributing plain and encrypted files.

Definition of Done

Figure out a flexible solution for dealing with file distribution via DataEdge.

How to test

Unit and integration tests for different file formats, encrypted or not, with different algorithms.

Keyserver format check

Description

We should implement a check in the Keyserver that verifies that the loaded key conforms to the standard, e.g.

-----BEGIN PGP PRIVATE KEY BLOCK-----
Version: PGPy v0.4.3
xcaGBFwaI1ABEACnhcEON/zsnmpPYpPm8bNfonlmuVXQyGwYS9KuPTTqUPwrTeEV

vs

-----BEGIN PGP PRIVATE KEY BLOCK-----
Version: PGPy v0.4.3

xcaGBFwaI1ABEACnhcEON/zsnmpPYpPm8bNfonlmuVXQyGwYS9KuPTTqUPwrTeEV

According to the standard (https://tools.ietf.org/html/rfc4880#section-6.2), the blank line is required.

This is not an actual bug, but an additional check that should be done so that problems are properly reported when loading the configuration.

Definition of Done

Check if the key is in the proper format before loading it in the keyring.
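
One way such a check could look is sketched below. ArmorFormatCheck is a hypothetical helper that only tests for the blank separator line between the armor headers and the armored data. It does not validate the key material itself.

```java
/** Hypothetical sketch: checks for the blank line after the armor headers. */
final class ArmorFormatCheck {
    /**
     * Returns true if the armor headers (lines containing ": ") are followed
     * by a blank line before the base64 data starts.
     */
    static boolean hasBlankLineAfterHeaders(String armored) {
        String[] lines = armored.split("\\r?\\n", -1);
        if (!lines[0].startsWith("-----BEGIN PGP ")) {
            return false; // not an armor header line
        }
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.trim().isEmpty()) {
                return true;  // blank separator found
            }
            if (!line.contains(": ")) {
                return false; // data started without a blank separator
            }
        }
        return false; // headers only, no separator and no data
    }
}
```

Applied to the two examples above, the first (no blank line after the Version header) is rejected and the second is accepted.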

How to test

Unit tests are added to verify that improperly formatted keys are rejected.

Allow custom schemas and table names for FileDatabase

Description

Currently the FileDatabase API allows custom schema names, but the tables for Files, File2Databse and IndexFiles have fixed names. We would like to make them configurable, e.g. via env variables.

In test/m4 branch we added https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-filedatabase/src/main/java/eu/elixir/ega/ebi/downloader/domain/entity/File.java#L38 to specify the view used in LocalEGA for the files table, however we would like to make it configurable.

The same idea applies to: https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-filedatabase/src/main/java/eu/elixir/ega/ebi/downloader/domain/entity/FileDataset.java#L38
and https://github.com/EGA-archive/ega-data-api/blob/test/m4/ega-data-api-filedatabase/src/main/java/eu/elixir/ega/ebi/downloader/domain/entity/FileIndexFile.java#L38 and the rest of the tables.

Definition of Done

Add @Table(name = <configurable_table_name>) to the FileDatabase API and update the configuration for both the no-oss and default profiles. <configurable_table_name> should have a default value, but can be modified via ENV variables.
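
The "default value, overridable via ENV" part can be sketched as a small resolver. TableNames and the variable names used in the usage note are hypothetical. Note also that Java annotation attributes must be compile-time constants, so a name resolved at runtime would have to be applied by some other mechanism than a literal in @Table (for example a custom naming strategy).

```java
import java.util.Map;

/** Hypothetical helper: resolves a table name from the environment with a default. */
final class TableNames {
    /**
     * Returns the value of the given variable from env, or defaultName
     * if the variable is unset or blank.
     */
    static String resolve(Map<String, String> env, String variable, String defaultName) {
        String value = env.get(variable);
        if (value == null || value.trim().isEmpty()) {
            return defaultName;
        }
        return value.trim();
    }
}
```

A call such as TableNames.resolve(System.getenv(), "FILE_TABLE_NAME", "file") would then pick up an override if one is set. Both the variable name and the default are made up for this example.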

How to test

Unit tests and Integration tests pass.

Retry fails on no-oss profile

Description

There seems to be a retry issue in DataEdge that affects the no-oss profile.
Full log: Untitled.txt

Definition of Done

Retry does not affect the no-oss profile's ability to query a service.

How to test

Integration tests for the no-oss profile.

Define FileDatabase interface with OpenAPI

Description

Write an OpenAPI specification for the FileDatabase service that covers all the endpoints in the service and their queries and responses.

Definition of Done

The OpenAPI specification is reviewed by two developers.

How to test

The specification does not need to be tested.

Create a plan for how to deliver file-level permissions to Data API

This is an important long-term issue. The addition of API calls to fetch file level permissions violates the architecture of the Data API. [I apologize for never really describing the architecture and its goals and motivations .. I will do that shortly!]

The correct way to resolve this issue is: remove any external API calls from data-edge as well as htsget; move that call to the filedatabase service; resolve it internally, but leave the htsget and data-edge implementation untouched.

One of the aims of the Data API architecture is to have a unified, simple, and small code base for all API-supported functionality, which can be deployed without changes at any node (including EBI). An architectural decision was made to outsource any interfaces with local infrastructure to a supporting microservice: in the case of listing files per dataset, that is the filedatabase service. If a local site wants to fetch file information from an API rather than a database, the architecturally correct location is file-database. That way the main functionality-providing code (data-edge, htsget) can remain identical.

The reason is to keep the main components of the code as simple and universal as possible. There shouldn't be 'dead' code (only used by a single deployment) in the main part of the API, even if it is just an extra implementation class! The first aim should be to assess whether deviating/new code is really necessary; and secondly to externalize it into the appropriate sub-microservice (that is the reason file-database, and key, ... even exist: to externalize differences in deployment and environment and to unify and simplify the main code components).

Async doesn't work with Eureka

Expected Behavior

With the code below, AsyncRestTemplate is not able to resolve the FILEDATABASE_SERVICE service through Eureka service discovery.
AsyncRestTemplate.postForEntity(FILEDATABASE_SERVICE + "/log/download/", entity, String.class);

The same call works with RestTemplate.postForEntity(...)

Current Behavior

Throws java.net.UnknownHostException at this line:
https://github.com/EGA-archive/ega-data-api/blob/70bfd58080b912d8681c8e177355ef00e66b7f81/ega-data-api-dataedge/src/main/java/eu/elixir/ega/ebi/shared/service/internal/RemoteDownloaderLogServiceImpl.java#L72

Possible Solution

Used @Async to write the logs asynchronously and changed AsyncRestTemplate to RestTemplate
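
The idea of keeping the blocking client and moving the call onto another thread can be illustrated without Spring, as in the sketch below. AsyncLogSender is hypothetical, and the Consumer stands in for the blocking RestTemplate.postForEntity(...) call. In the actual fix this role is played by Spring's @Async.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical sketch: run a blocking log call asynchronously on an executor. */
final class AsyncLogSender {
    // Stand-in for the blocking RestTemplate.postForEntity(...) call.
    private final Consumer<String> blockingPost;
    private final Executor executor;

    AsyncLogSender(Consumer<String> blockingPost, Executor executor) {
        this.blockingPost = blockingPost;
        this.executor = executor;
    }

    /** Submits the log entry and returns immediately; the future completes after the post. */
    CompletableFuture<Void> send(String logEntry) {
        return CompletableFuture.runAsync(() -> blockingPost.accept(logEntry), executor);
    }
}
```

The caller is not blocked by the logging request, while the request itself still goes through the same blocking client that is known to resolve the service name.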

Make it possible for Data API to act as an htsget proxy

We have a use case in the Tryggve project where a user wants to stream data from multiple htsget endpoints but doesn't have network access to all of them for security reasons (they are not visible to the public network), while the Data API servers can access each other's endpoints.

Therefore we need a proxy service that fetches the data from another endpoint, caches it and serves the content to the original requestor. So when a user makes the following query to the Data API

GET https://data-api.csc.fi/another-endpoint/data/reads

Data API should then fetch the data from another endpoint:

GET https://another-endpoint.se/data/reads

And serve the content to the original requestor.

To make this work, endpoint names and URLs need to be mapped in a database or in configuration. In the Data API the flow would be as follows:

  1. Data API gets the query
  2. Data API does a database/configuration lookup and figures out the URL that has to be queried
  3. Data API checks if it already has the data cached
  4. Depending on the result:
     a) Data API makes the query to the other endpoint and returns the data
     b) Data API returns the data from the cache
  5. Data API caches the data if it's not already cached
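
The flow above can be sketched as a small lookup-then-cache component. HtsgetProxy is a hypothetical class: the endpoint map stands in for the database or configuration lookup, the fetcher function stands in for the HTTP GET to the other endpoint, and the in-memory map is the simplest possible cache (no expiry, no size limit).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of the proxy flow: resolve URL, check cache, fetch, cache. */
final class HtsgetProxy {
    // Endpoint name -> base URL (stand-in for the database/configuration lookup).
    private final Map<String, String> endpointUrls;
    // Performs the remote GET for a full URL (stand-in for an HTTP client).
    private final Function<String, byte[]> fetcher;
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    HtsgetProxy(Map<String, String> endpointUrls, Function<String, byte[]> fetcher) {
        this.endpointUrls = endpointUrls;
        this.fetcher = fetcher;
    }

    /** Handles e.g. endpointName "another-endpoint" and path "/data/reads". */
    byte[] handle(String endpointName, String path) {
        String baseUrl = endpointUrls.get(endpointName);
        if (baseUrl == null) {
            throw new IllegalArgumentException("Unknown endpoint: " + endpointName);
        }
        String url = baseUrl + path;
        // Returns the cached bytes if present; otherwise fetches and caches them.
        return cache.computeIfAbsent(url, fetcher);
    }
}
```

With a mapping from another-endpoint to https://another-endpoint.se, two identical requests for /data/reads trigger only one remote fetch.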

Most likely the Zuul service is the right component for this feature.

Create development environment for the Data API

Description

Currently it is quite difficult to boot the whole Data API for development. There should be an easy way to start all the microservices to ease development. This could be done, for example, with Docker Compose (though this is not necessary if we prefer to use Maven on the local machine).

We should also have a separate configuration for the development environment. For example, we should expose JVM debug ports and tune Eureka service discovery timeouts.

Definition of Done

An easy way to boot the Data API on a local machine exists and the solution is documented (e.g. in the README file).

How to test

No tests are needed for this issue.

java.net.SocketException Connection Reset on Download

I am using EgaDemoClient.jar and am getting a Connection reset when trying to download a request. Java has access to the internet and the EGA login is successful, but the request fails when downloading. Specifically, I am trying to download dataset EGAD00000000013.

I tried varying numbers of threads and all attempts failed.

It should be noted that the shell says that the client is version 2.2.3, while the downloaded files indicate it is version 2.2.2.

Add new endpoint to Keyservice which enumerates standards (PGP, AES etc.)

Description

Add a new endpoint /keys/formats that returns a list of supported key formats.

Definition of Done

Implement a new endpoint in KeyController that checks which formats are supported and returns a list of supported formats.

How to test

Implement a unit test that queries the endpoint and validates the result.

getPublic key URL does not work with keyType for crypt4gh

Description

The current endpoint for retrieving the public key does not work for crypt4gh using keyType: https://github.com/EGA-archive/ega-data-api/blob/master/ega-data-api-res/src/main/java/eu/elixir/ega/ebi/reencryptionmvc/service/internal/KeyServiceImpl.java#L66

That line should be restTemplate.getForEntity(KEYS_SERVICE + "/keys/retrieve/{keyId}/public", String.class, id).getBody(); as done in the test/m4 branch.

Definition of Done

Retrieving the public key for crypt4gh should work, as should retrieval for other key types such as AES.

How to test

Retrieval of the public key is possible and is covered by unit and integration tests.
