neicnordic / bifrost

This is a service that enables job submissions to secure compute platforms from the open internet

Python 93.02% Shell 6.98%

bifrost's Issues

Verify that files have been transferred successfully by verifying md5sum

Describe pain points or successes here until completion.

Figure out how to integrate local-EGA

Local EGA has been suggested for use in the final full-scale version in both the schizophrenia implementation plan (point 10) and the imputation implementation plan (point 11). This issue serves as a placeholder until we can look at it more closely.

Split by chromosome script

The splitByChromosome.sh script was added because the imputation server expects each input file to contain only a single chromosome. As the name suggests, the script splits VCF files into one file per chromosome.
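
The splitting idea can be illustrated with a minimal Python sketch. This is not the splitByChromosome.sh implementation, only an assumed equivalent of what it does: header lines (starting with `#`) are repeated in every output, and records are grouped by their first tab-separated column. The function name `split_vcf_by_chromosome` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_vcf_by_chromosome(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group VCF lines by chromosome, repeating the header in every group."""
    header: List[str] = []
    records: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column header lines belong to every output.
            header.append(line)
        elif line.strip():
            # The first tab-separated column (CHROM) names the chromosome.
            chrom = line.split("\t", 1)[0]
            records[chrom].append(line)
    return {chrom: header + body for chrom, body in records.items()}


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tA\tG\t.\tPASS\t.\n",
        "2\t200\t.\tC\tT\t.\tPASS\t.\n",
        "1\t300\t.\tG\tA\t.\tPASS\t.\n",
    ]
    for chrom, vcf_lines in split_vcf_by_chromosome(example).items():
        print(chrom, len(vcf_lines))
```

The sketch keeps everything in memory, which is fine for illustration but probably not for full-size VCF files; a real script would stream each record to its per-chromosome output file.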

Encrypting input files before transfer

The input files for the imputation project are sensitive and should thus be encrypted before transfer to TSD is initiated. Currently the idea is to use the crypt4gh tool for this.
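
A possible way to call the crypt4gh command-line tool from Python is sketched below. It assumes the `crypt4gh` executable is installed and that its `encrypt` subcommand reads from standard input, writes to standard output, and accepts `--sk` and `--recipient_pk`; check `crypt4gh --help` for the installed version. The function names, the key paths, and the `.c4gh` suffix are assumptions made for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def build_encrypt_command(secret_key: Path, recipient_pubkey: Path) -> List[str]:
    """Return the argv for `crypt4gh encrypt` (reads stdin, writes stdout)."""
    return [
        "crypt4gh", "encrypt",
        "--sk", str(secret_key),
        "--recipient_pk", str(recipient_pubkey),
    ]


def encrypt_file(source: Path, secret_key: Path, recipient_pubkey: Path) -> Path:
    """Encrypt `source` into `<source>.c4gh` and return the new path."""
    target = source.with_name(source.name + ".c4gh")
    command = build_encrypt_command(secret_key, recipient_pubkey)
    with source.open("rb") as plain, target.open("wb") as encrypted:
        # check=True raises CalledProcessError if crypt4gh exits non-zero.
        subprocess.run(command, stdin=plain, stdout=encrypted, check=True)
    return target
```

Depending on how the secret key was generated, the tool may ask for a passphrase, so an unattended run would need to account for that.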

Write M2 documentation

Document everything before moving on to the next milestone.

Cron job for file transfer to compute/scratch disk

After the files have been successfully transferred to TSD, they need to be moved from the /tsd/pXXX/data/durable/... disk to the disk where the compute will happen.
A cron job will handle this by triggering a script that performs the move.
Below is a preliminary outline of the steps that need to be done.

Make a cron job that runs a script that does the following:
Verify that files have been transferred successfully by verifying md5sum
Initiate file transfer of input files and yaml file to uniquely named directory
Log these events in a log file
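
The three steps above could look roughly like the following Python sketch, which the cron job would invoke. It is an assumption about how the script might be structured, not the actual implementation: the names `md5sum`, `stage_job`, the logger name, and the `job-<uuid>` directory naming are all hypothetical, and the source and destination directories are passed in as parameters rather than hard-coded.

```python
import hashlib
import logging
import shutil
import uuid
from pathlib import Path
from typing import Dict

logger = logging.getLogger("bifrost.transfer")  # hypothetical logger name


def md5sum(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_job(source_dir: Path, compute_root: Path, expected: Dict[str, str]) -> Path:
    """Verify every expected file, then copy the whole job to a unique directory.

    `expected` maps file names in `source_dir` to their expected MD5 digests.
    """
    for name, checksum in expected.items():
        actual = md5sum(source_dir / name)
        if actual != checksum.lower():
            logger.error("md5 mismatch for %s: expected %s, got %s", name, checksum, actual)
            raise ValueError(f"md5 mismatch for {name}")
        logger.info("md5 verified for %s", name)

    # A random UUID keeps concurrent or repeated submissions from colliding.
    job_dir = compute_root / f"job-{uuid.uuid4().hex}"
    job_dir.mkdir(parents=True)
    for item in sorted(source_dir.iterdir()):
        if item.is_file():
            shutil.copy2(item, job_dir / item.name)
            logger.info("copied %s to %s", item.name, job_dir)
    return job_dir
```

To write the events to a log file, the calling script could configure logging once, for example with `logging.basicConfig(filename=..., level=logging.INFO)`. Verification happens before any copying so that a corrupted or incomplete transfer never reaches the compute disk.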

And also write documentation when this is done:

Write documentation

Start job

Describe pain points or success here.

Figure out how to use ARC for the final full scale Nordic release

The plan is to use ARC to submit queries in a way that would make it possible to do computations on several of the sensitive compute infrastructures within one query.

ARC can delegate jobs to the correct sensitive compute infrastructure, submit them to the clusters, and collect the results.

ARC can also handle queuing and resource management.

How this would work in practice needs to be discussed.

The known obstacles at the moment are with authentication/authorization and privilege management.

Initiate file transfer of input files and yaml file to uniquely named directory

Describe pain points or successes here until completion.

Job config file fields

I suggest we use the YAML file format for the job config file, since it is readable and has a simple syntax.

I suggest that the config file should contain the following:

Job type: imputation, or whatever kind of analysis the schizophrenia project does
Input file md5sum: "45f6esf435se4f56w". This is a mechanism to verify that the files have been transferred correctly. It addresses both incomplete transfers and files corrupted during transfer.

What more should it contain? Is there a better solution to the incomplete/corrupted files issue?
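
As a starting point for the discussion, here is a minimal sketch of how the two suggested fields could be validated once the YAML file has been parsed into a mapping (for instance with PyYAML's `yaml.safe_load`, which is an assumption about the parser). The key names `job_type` and `input_md5sum`, the `JobConfig` class, and `parse_job_config` are hypothetical. The check relies on an MD5 digest being 32 hexadecimal characters.

```python
import re
from dataclasses import dataclass
from typing import Any, Mapping

MD5_PATTERN = re.compile(r"[0-9a-f]{32}")


@dataclass(frozen=True)
class JobConfig:
    job_type: str
    input_md5sum: str


def parse_job_config(raw: Mapping[str, Any]) -> JobConfig:
    """Validate an already-parsed config mapping and return a JobConfig."""
    missing = [key for key in ("job_type", "input_md5sum") if key not in raw]
    if missing:
        raise ValueError(f"missing config fields: {', '.join(missing)}")
    checksum = str(raw["input_md5sum"]).lower()
    if not MD5_PATTERN.fullmatch(checksum):
        raise ValueError("input_md5sum must be 32 hexadecimal characters")
    return JobConfig(job_type=str(raw["job_type"]), input_md5sum=checksum)
```

Rejecting a malformed checksum at parse time means a typo in the config file is reported before any file verification is attempted. Quoting the checksum in the YAML file, as in the example above, keeps the parser from interpreting it as a number.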

Write M3 documentation

Document everything before moving on to the next milestone.

Query submission requirements

What input data do we need per project?

Imputation project:
VCF files.

Schizophrenia project:
13/03 - No input data from the outside, only query file with parameters.

Verify md5sum and start job

This is all in progress. The issue will be closed once enough documentation has been written.

Verify file integrity
Start job
Log these events
Write documentation
