
bifrost's People

Contributors

oskarvid, radmilak

bifrost's Issues

Split by chromosome script

The splitByChromosome.sh script was added because the imputation server expects each input file to contain only a single chromosome. As the name suggests, the script splits VCF files into one file per chromosome.
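To illustrate the idea, here is a minimal Python sketch of the same kind of split. It is not the actual splitByChromosome.sh; the function name and the output naming scheme are assumptions made for illustration. It assumes an uncompressed, tab-separated VCF that is small enough to hold in memory.

```python
from collections import defaultdict
from pathlib import Path


def split_vcf_by_chromosome(vcf_path, out_dir):
    """Write one VCF per chromosome and return {chromosome: output path}.

    Hypothetical helper: assumes an uncompressed VCF that fits in memory.
    """
    vcf_path = Path(vcf_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    header = []                  # "##" meta lines plus the "#CHROM" column line
    records = defaultdict(list)  # chromosome name -> data lines

    with vcf_path.open() as vcf:
        for line in vcf:
            if line.startswith("#"):
                header.append(line)
            elif line.strip():
                # The first tab-separated column of a data line is CHROM.
                chrom = line.split("\t", 1)[0]
                records[chrom].append(line)

    outputs = {}
    for chrom, lines in records.items():
        # Output naming is an assumption: <input stem>.<chromosome>.vcf
        out_path = out_dir / f"{vcf_path.stem}.{chrom}.vcf"
        out_path.write_text("".join(header + lines))
        outputs[chrom] = out_path
    return outputs
```

Every output file repeats the full header, so each one should remain a valid VCF on its own.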

Cron job for file transfer to compute/scratch disk

After the files have been successfully transferred to TSD, they need to be moved from the /tsd/pXXX/data/durable/... disk to the disk where the computation will happen.
A cron job will handle this by triggering a script that performs the move.
Below is a preliminary outline of the steps that need to be done.

  • Make a cron job that runs a script that does the following:
  • Verify that files have been transferred successfully by verifying md5sum
  • Initiate file transfer of input files and yaml file to uniquely named directory
  • Log these events in a log file

And also write documentation when this is done:

  • Write documentation
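The steps outlined above could be sketched roughly as follows. This is only a sketch under assumptions: the function names, the `job-<uuid>` directory naming and the logger name are made up for illustration, and it copies the files rather than moving them so that the originals stay on the durable disk.

```python
import hashlib
import logging
import shutil
import uuid
from pathlib import Path

log = logging.getLogger("transfer")  # hypothetical logger name


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_job(input_files, config_file, scratch_root):
    """Verify checksums, then copy inputs and config to a unique directory.

    input_files maps each input path to its expected MD5 hex digest.
    Returns the new job directory, or None if any checksum does not match.
    """
    for path, expected in input_files.items():
        actual = md5sum(path)
        if actual != expected:
            log.error("md5 mismatch for %s: expected %s, got %s",
                      path, expected, actual)
            return None
        log.info("md5 verified for %s", path)

    # A random UUID keeps directory names unique across cron runs.
    job_dir = Path(scratch_root) / f"job-{uuid.uuid4().hex}"
    job_dir.mkdir(parents=True)
    for path in list(input_files) + [config_file]:
        shutil.copy2(path, job_dir)
        log.info("copied %s to %s", path, job_dir)
    return job_dir
```

Nothing is copied unless every checksum matches, so a partially transferred upload never reaches the compute disk. Cron would then only need to run a small wrapper that calls `stage_job` and sends the log to a file.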

Figure out how to use ARC for the final full scale Nordic release

The plan is to use ARC to submit queries in a way that would make it possible to do computations on several of the sensitive compute infrastructures within one query.

ARC can delegate jobs to the correct sensitive compute infrastructure and submit the job to the clusters and collect the results.

ARC can also handle queuing and resource management.

How this would work in practice needs to be discussed.

The known obstacles at the moment are authentication/authorization and privilege management.

Job config file fields

I suggest we use the YAML file format for the job config file; it is readable and has a simple syntax.

I suggest that the config file should contain the following:

  • Job type: imputation or whatever kind of analysis that the schizophrenia project does
  • Input file md5sum: "45f6esf435se4f56w". This is a mechanism to verify that the files have been transferred correctly. It covers both incomplete transfers and files corrupted during the transfer.

What more should it contain? Is there a better solution to the incomplete/corrupted files issue?
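As a starting point for discussion, here is a small Python sketch of what reading and checking such a config could look like. The key names `job_type` and `input_md5sum` are hypothetical spellings of the two fields proposed above. The parser only understands flat `key: value` lines, which is a tiny subset of YAML; a real YAML library would be the natural choice in practice. The check relies on the fact that an MD5 digest is 32 hexadecimal characters.

```python
REQUIRED_FIELDS = ("job_type", "input_md5sum")  # hypothetical key names

# Hypothetical example config; the digest is the MD5 of empty input.
EXAMPLE_CONFIG = """
# Job config
job_type: imputation
input_md5sum: "d41d8cd98f00b204e9800998ecf8427e"
"""


def parse_job_config(text):
    """Parse flat 'key: value' lines (a small subset of YAML) into a dict."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip().strip('"')
    return config


def validate_job_config(config):
    """Raise ValueError if a required field is missing or the md5 is malformed."""
    missing = [field for field in REQUIRED_FIELDS if not config.get(field)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    md5 = config["input_md5sum"].lower()
    if len(md5) != 32 or any(ch not in "0123456789abcdef" for ch in md5):
        raise ValueError("input_md5sum is not a 32-character hex digest")
    return config


config = validate_job_config(parse_job_config(EXAMPLE_CONFIG))
```

Validating the digest format up front catches typos in the config before any file is hashed.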

Query submission requirements

What input data do we need per project?

Imputation project:
VCF files.

Schizophrenia project:
13/03 - No input data from the outside, only a query file with parameters.

Verify md5sum and start job

This is all in progress. I will close this issue once enough documentation has been written.

  • Verify file integrity
  • Start job
  • Log these events
  • Write documentation
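A minimal sketch of the verify-then-start step, assuming the job is launched as an external command. The function name, the logger name and the way the command is passed in are assumptions, not the existing job submit script.

```python
import hashlib
import logging
import subprocess

log = logging.getLogger("job")  # hypothetical logger name


def verify_and_start(input_path, expected_md5, command):
    """Start `command` only if the input file matches its expected MD5.

    Returns the command's exit code, or None if verification failed.
    """
    digest = hashlib.md5()
    with open(input_path, "rb") as handle:
        # Read in 1 MiB chunks so large inputs are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)

    if digest.hexdigest() != expected_md5:
        log.error("md5 mismatch for %s, job not started", input_path)
        return None

    log.info("md5 verified for %s, starting job: %s", input_path, command)
    result = subprocess.run(command, check=False)
    log.info("job finished with exit code %d", result.returncode)
    return result.returncode
```

Returning None on a failed check, rather than an exit code, lets the caller tell "never started" apart from "started and failed".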

"No supported encryption method" when decrypting scz inputs

When I run the readConfig.py script on an encrypted "schizophrenia" input file, it prints the error message "No supported encryption method" and then continues with the rest of the script. This error is not printed when running an imputation job.

Job submit script

The job submit script has now been created. It still needs to be edited to include all of the schizophrenia-related code; currently it has some placeholders.

Decide where to transfer input files

When the input files have been transferred to TSD, they need to be moved to a scratch disk of some sort. It is not yet known where this will be.
