curly-octo-train's Issues
Start dataset pipeline guide document
Add tests to pre-commit hooks
Or run them only when pushing to the upstream
Introduce structured configs for independent scripts
Set up pre-commit and add instructions
Support hydra multirun threading
Allow copying the profile instead of creating a symbolic link
On mounted storage, creating symlinks might not be allowed, so it would be good to be able to switch easily between linking and copying the simulation profile:
https://superuser.com/questions/1337257/clients-cant-create-symlinks-on-samba-share
When copying, check whether the file already exists and compare MD5 hashes to prevent unnecessary work.
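A minimal sketch of the link-or-copy switch with the MD5 check described above. The names `md5sum`, `place_profile`, and the `copy` flag are hypothetical and not taken from the repository; the real profile handling may differ.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def place_profile(source: Path, target: Path, copy: bool = False) -> bool:
    """Link or copy the profile (hypothetical helper).

    Returns True if the target was created or rewritten, False if it was
    already up to date and nothing was done.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    if copy:
        # Replace a leftover symlink so the copy does not write through it.
        if target.is_symlink():
            target.unlink()
        # Skip the copy when an identical file is already in place.
        if target.is_file() and md5sum(source) == md5sum(target):
            return False
        shutil.copy2(source, target)
        return True
    # Symlink mode: keep an existing link that already points at the source.
    if target.is_symlink() and target.resolve() == source.resolve():
        return False
    if target.exists() or target.is_symlink():
        target.unlink()
    target.symlink_to(source.resolve())
    return True
```

Calling `place_profile(src, dst, copy=True)` twice in a row copies on the first call and returns `False` on the second, since the hashes match.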
Move cmake to mamba dependencies and update readme
Create a script for random sampling from FASTQ file using the specified percentage of reads
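One possible shape for such a script, as a sketch only. It assumes plain four-line FASTQ records (no wrapped sequence lines) and keeps each read independently with the given probability, so the number of reads kept is only approximately the requested percentage. The function names and the `seed` parameter are hypothetical.

```python
import random
from itertools import islice
from typing import Iterator, Optional, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, ...]]:
    """Yield FASTQ records as tuples of four raw lines (assumed layout)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def sample_fastq(src: TextIO, dst: TextIO, percentage: float,
                 seed: Optional[int] = None) -> int:
    """Write roughly `percentage` percent of reads from src to dst.

    Returns the number of reads written.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = random.Random(seed)  # seeded for reproducible subsets
    keep_probability = percentage / 100
    kept = 0
    for record in read_fastq(src):
        if rng.random() < keep_probability:
            dst.writelines(record)
            kept += 1
    return kept
```

Because the input is streamed record by record, memory use stays constant regardless of file size. If an exact read count is required, reservoir sampling would be an alternative.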
Upgrade pre-commit package versions
We want to upgrade the pre-commit packages, as the file is somewhat dated
Implement filtering by chromosome in sequencing simulation
Similar to the filtering in the assembler and graph dataset creation, we should be able to create a sequencing experiment restricted to chromosomes matching a filter
Use hardlink instead of symlink for simulator profile
It turns out that using hardlinks solves the problem for mounted storage much more simply than an intermediate copy
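A sketch of what the hardlink variant could look like, using only `os.link` from the standard library. The helper name `link_profile` is hypothetical. Note that hard links only work when source and target are on the same filesystem; otherwise `os.link` raises `OSError`.

```python
import os
from pathlib import Path


def link_profile(source: Path, target: Path) -> None:
    """Make `target` a hard link to `source` (hypothetical helper)."""
    if target.is_symlink():
        # Replace any earlier symlink, dangling or not.
        target.unlink()
    elif target.exists():
        # Nothing to do if the target already shares the source's inode.
        if os.path.samefile(source, target):
            return
        target.unlink()
    target.parent.mkdir(parents=True, exist_ok=True)
    os.link(source, target)
```

The call is idempotent: running it again on an existing hard link leaves the file untouched.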
Update the pipeline config
Creating the dataset in discrete stages is computationally expensive. We need to update the global pipeline config so that a dataset can be created in one continuous step, with cleanup after every dataset is created. Store appropriate metadata along the way to enable reconstruction of the data, evaluation, and cleanup
Remove the raw ZIP from the dataset and use the raw directory for the actual PT files
Previously, all the input data from which the PT graph file was constructed was saved in "raw/" as a ZIP file.
We want to remove it for a couple of reasons:
- In practice, we don't have any use for it
- It only duplicates the data from the assembly step (maybe we can keep some metadata to know the origin, but the entire package is unnecessary)
- It does not fit well with how datasets in PyTorch and PyG work (there, we want to place the PT file in the raw directory)
Use hydra config for the CHM13 download
Use a hydra config to simplify downloading CHM13 with custom options