gooserocket's Issues
Find all start and stop codons in an RNA/DNA sequence
- Methionine usually indicates the start of a protein, encoded as AUG
- UAA, UAG, and UGA are usually the stop codons
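A minimal sketch of what this could look like, assuming the sequence is a plain string of bases and that DNA input should have T treated as U. The names `CodonKind` and `find_start_stop_codons` are hypothetical, not existing code in the repo. It scans every offset, so hits from all three reading frames are reported together.

```rust
/// Kind of codon found in a sequence (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodonKind {
    Start,
    Stop,
}

/// Returns `(offset, kind)` for every start/stop codon in `seq`.
/// Every offset is checked, so all three reading frames are covered.
pub fn find_start_stop_codons(seq: &str) -> Vec<(usize, CodonKind)> {
    // Normalise: uppercase, and read DNA `T` as RNA `U`.
    let bases: Vec<u8> = seq
        .bytes()
        .map(|b| match b.to_ascii_uppercase() {
            b'T' => b'U',
            other => other,
        })
        .collect();

    bases
        .windows(3)
        .enumerate()
        .filter_map(|(offset, codon)| match codon {
            [b'A', b'U', b'G'] => Some((offset, CodonKind::Start)),
            [b'U', b'A', b'A'] | [b'U', b'A', b'G'] | [b'U', b'G', b'A'] => {
                Some((offset, CodonKind::Stop))
            }
            _ => None,
        })
        .collect()
}
```

For `ATGAAATAG` this reports a start at offset 0 and stops at offsets 1 and 6. The stop at offset 1 is in a different reading frame from the start, so a caller that cares about open reading frames would still need to filter by `offset % 3`.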
On ROLLBACK_COMPLETE, delete stack
ServiceError(ServiceError { source: Unhandled(Unhandled { source: ErrorMetadata { code: Some("ValidationError"), message: Some("Stack:arn:aws:cloudformation:us-east-1:208577793574:stack/gr-jupyter-nb-1/0dcb6520-e6c6-11ed-bad7-0ada475864a5 is in ROLLBACK_COMPLETE state and can not be updated."), extras: Some({"aws_request_id": "94cfc515-a1fe-4b5f-b24b-461b658e954e"}) }, meta: ErrorMetadata { code: Some("ValidationError"), message: Some("Stack:arn:aws:cloudformation:us-east-1:208577793574:stack/gr-jupyter-nb-1/0dcb6520-e6c6-11ed-bad7-0ada475864a5 is in ROLLBACK_COMPLETE state and can not be updated."), extras: Some({"aws_request_id": "94cfc515-a1fe-4b5f-b24b-461b658e954e"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {"x-amzn-requestid": "94cfc515-a1fe-4b5f-b24b-461b658e954e", "date": "Sat, 29 Apr 2023 19:50:51 GMT", "content-type": "text/xml", "content-length": "421"}, body: SdkBody { inner: Once(Some(b"<ErrorResponse xmlns="http://cloudformation.amazonaws.com/doc/2010-05-15/\">\n \n Sender\n ValidationError
\n Stack:arn:aws:cloudformation:us-east-1:208577793574:stack/gr-jupyter-nb-1/0dcb6520-e6c6-11ed-bad7-0ada475864a5 is in ROLLBACK_COMPLETE state and can not be updated.\n \n 94cfc515-a1fe-4b5f-b24b-461b658e954e\n\n")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. }) } })
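A rough sketch of the decision logic, kept separate from the SDK calls so it can be tested without AWS. `DeployAction` and `plan_deploy` are hypothetical names, and the status is assumed to arrive as the plain CloudFormation status string (or `None` if the stack does not exist). A stack in ROLLBACK_COMPLETE can only be deleted, so that case maps to delete-then-create.

```rust
/// What the CLI should do for a stack, given its current status.
/// Hypothetical type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum DeployAction {
    Create,
    Update,
    DeleteThenCreate,
}

/// `current_status` is `None` when the stack does not exist yet.
pub fn plan_deploy(current_status: Option<&str>) -> DeployAction {
    match current_status {
        None => DeployAction::Create,
        // A failed first create leaves the stack here; it cannot be updated.
        Some("ROLLBACK_COMPLETE") => DeployAction::DeleteThenCreate,
        // Simplification: every other status is treated as updatable.
        Some(_) => DeployAction::Update,
    }
}
```

In-progress and failed-delete states are not handled here and would probably need their own arms.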
CLI command to spin up an AWS Batch compute environment and job queue for grabbing data.
- Fargate with spot instances
More fine-grained tuning of CF create/update-stack
At the moment, create/update always passes in the capabilities for creating IAM resources. Not a security concern, but it would probably be more correct to pass them in only for the stacks that need them.
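One possible heuristic, sketched under loud assumptions: the template body is available as a string, and any `AWS::IAM::` resource type means the stack needs an IAM capability. `required_capabilities` is a hypothetical helper. It returns `CAPABILITY_NAMED_IAM` on the assumption that this also covers templates whose IAM resources are not custom-named; that should be checked against the CloudFormation docs before relying on it.

```rust
/// Returns the capability names to pass to create/update-stack.
/// Heuristic only: a plain substring scan of the template body.
pub fn required_capabilities(template_body: &str) -> Vec<&'static str> {
    if template_body.contains("AWS::IAM::") {
        // Assumption: NAMED_IAM is accepted for unnamed IAM resources too.
        vec!["CAPABILITY_NAMED_IAM"]
    } else {
        Vec::new()
    }
}
```

Nested stacks and macros (which need other capabilities) are out of scope for this sketch.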
Jupyter notebook server should be robust
Should be able to restart on sporadic failures and handle OOMs etc.
Putting this into a systemd service might be a solution. Would need to figure out how to bake the service file into the image (s3).
Better cli shutdown
- ./cli shutdown all: Should delete all gr-jupyter-nb-* stacks
- ./cli shutdown: Should just delete user deployed stacks
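A sketch of the stack selection for the two commands, assuming stack names have already been listed and that user stacks follow a gr-jupyter-nb-<username> naming scheme (an assumption; today only gr-jupyter-nb-1 is deployed). `stacks_to_delete` is a hypothetical helper: `None` stands for `shutdown all`, `Some(user)` for plain `shutdown`.

```rust
const NOTEBOOK_PREFIX: &str = "gr-jupyter-nb-";

/// Picks which stacks a shutdown should delete.
/// `user == None` selects every notebook stack, `Some(u)` only that user's.
pub fn stacks_to_delete<'a>(stack_names: &[&'a str], user: Option<&str>) -> Vec<&'a str> {
    stack_names
        .iter()
        .copied()
        .filter(|name| match user {
            None => name.starts_with(NOTEBOOK_PREFIX),
            // Exact match on the suffix, so "bob" does not match "bobby".
            Some(u) => name.strip_prefix(NOTEBOOK_PREFIX) == Some(u),
        })
        .collect()
}
```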
Add tracing and export to honeycomb.io
Replace all println's with calls to tracing. Should be mostly drag and drop.
Using println will be tough to work with as soon as we start increasing the number of instances that are up at any given time. Need a place to aggregate all emitted telemetry.
Datadog, newrelic, cloudwatch all work but Honeycomb is nice cause it's free and you can run SQL queries on the emitted data right out of the box
Tasks:
- Replace printlns with tracing calls
- API token secret needs to be saved somewhere accessible when deploying resources and testing local dev.
- Add honeycomb tracing layer as part of tracing init
- (optionally) instrument some of the existing functions
Rename notebook instance stack so it's deployed per user
This way each user can deploy their own stack. At the moment it just unconditionally deploys a gr-jupyter-nb-1. Probably want to fix this by appending a username so the stack becomes gr-jupyter-nb-<username>. Username can be parsed from aws sts get-caller-identity.
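A sketch of the name derivation, assuming the caller ARN returned by get-caller-identity looks like arn:aws:iam::<account>:user/<username> and that the part after the last `/` is good enough as a username. `stack_name_for_caller` is a hypothetical helper. Characters other than ASCII letters and digits are replaced with `-`, since stack names only allow alphanumerics and hyphens.

```rust
/// Builds a per-user stack name from the caller's ARN.
/// Returns `None` when the ARN has no `/`-separated name (e.g. the root user).
pub fn stack_name_for_caller(caller_arn: &str) -> Option<String> {
    let (_, user) = caller_arn.rsplit_once('/')?;
    if user.is_empty() {
        return None;
    }
    // Replace anything that is not an ASCII letter or digit with '-'.
    let sanitized: String = user
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '-' })
        .collect();
    Some(format!("gr-jupyter-nb-{sanitized}"))
}
```

For assumed roles the last path segment is the session name rather than a username, so that case may want different handling.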
Expire temporary objects in raw data
For data that isn't necessary to keep long term
- Raw data that won't be used anymore
- Temp processed results
Set up s3://gr-data/tmp with an object lifecycle policy
Genome alignment service/implementation
Either writing a shell call or setting up a lambda/ecs service to run something that already exists:
- https://github.com/BenLangmead/bowtie2
- https://github.com/bwa-mem2/bwa-mem2
- https://github.com/DaehwanKimLab/hisat2
- https://bioinformaticshome.com/tools/rna-seq/descriptions/GSNAP.html#gsc.tab=0
- https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html
Or handrolling one of the simpler algorithms:
- Needleman-Wunsch
- Smith-Waterman
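If handrolling, Needleman-Wunsch is probably the easiest starting point. A minimal sketch that computes only the global alignment score (no traceback), with a linear gap penalty; the function name and the scoring parameters are assumptions for illustration. It keeps only two rows of the DP table, so memory is proportional to the shorter input if that one is passed as `b`.

```rust
/// Global alignment score of `a` against `b` (Needleman-Wunsch, linear gaps).
/// Only the score is returned; recovering the alignment needs the full table.
pub fn needleman_wunsch_score(
    a: &[u8],
    b: &[u8],
    match_score: i32,
    mismatch: i32,
    gap: i32,
) -> i32 {
    // Row 0: aligning an empty prefix of `a` against j bases of `b` costs j gaps.
    let mut prev: Vec<i32> = (0..=b.len() as i32).map(|j| j * gap).collect();
    let mut curr = vec![0; b.len() + 1];

    for (i, &x) in a.iter().enumerate() {
        // Column 0: i + 1 bases of `a` against nothing.
        curr[0] = (i as i32 + 1) * gap;
        for (j, &y) in b.iter().enumerate() {
            let diag = prev[j] + if x == y { match_score } else { mismatch };
            let up = prev[j + 1] + gap; // gap in `b`
            let left = curr[j] + gap; // gap in `a`
            curr[j + 1] = diag.max(up).max(left);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

With match = 1, mismatch = -1, gap = -1, aligning `ACGT` against `AGT` scores 2 (three matches and one gap). Smith-Waterman differs mainly in clamping each cell at 0 and taking the maximum over the whole table.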
Set up openvpn server
AC:
- Be able to turn it on and off via cli
- Be able to update it
- Build the image with ImageBuilder
- OpenVPN instance SG only allows traffic on port 1194
- Fix up existing security groups so traffic is allowed only from the VPC
Initial struct representing DNA/RNA sequence
AC
- Should be able to handle arbitrarily large sequences
- Init with file
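One way to meet both criteria is to stream bases from a reader instead of loading the whole file. A rough sketch, assuming the input is a plain text file of bases with optional whitespace or newlines (no FASTA headers); `Sequence`, `from_file`, `from_reader`, and `bases` are hypothetical names.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// A DNA/RNA sequence backed by a buffered reader rather than memory.
pub struct Sequence<R: Read> {
    reader: BufReader<R>,
}

impl Sequence<File> {
    /// Opens a sequence file without reading it into memory.
    pub fn from_file(path: impl AsRef<Path>) -> io::Result<Self> {
        Ok(Self::from_reader(File::open(path)?))
    }
}

impl<R: Read> Sequence<R> {
    /// Wraps any reader (file, stdin, in-memory buffer for tests).
    pub fn from_reader(reader: R) -> Self {
        Sequence {
            reader: BufReader::new(reader),
        }
    }

    /// Lazily yields uppercased bases, skipping ASCII whitespace.
    pub fn bases(self) -> impl Iterator<Item = io::Result<u8>> {
        self.reader.bytes().filter_map(|byte| match byte {
            Ok(b) if b.is_ascii_whitespace() => None,
            Ok(b) => Some(Ok(b.to_ascii_uppercase())),
            Err(e) => Some(Err(e)),
        })
    }
}
```

Memory use stays at the size of the read buffer regardless of file size. Validation of base characters is left out here.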
Setup browser connection to ec2 instance
Either ssh tunnelling or openvpn instance
OpenVPN is probably better for connecting to multiple instances
Programmatically build jupyter image
Bake the deps install into CloudFormation
Replicate https://doi.org/10.1038/nbt.2957
Things that need to be implemented:
- Some sort of RNA seq rust struct to store arbitrary sequences
- Write the api calls to be able to query public databases for RNA seq data (and cache into s3) (use serde json)
- Write code to calculate measures
- Avg measurement precision
- Titration order consistency
- Recovery of the expected A/B mixing ratio
- differential expression
- mutual information of sample titration
Impl s3 read/write utils
Create a gr-infra/src/aws/s3.rs and add fns to be able to read and write objects from s3. Take a look at how gr-infra/src/aws/cloudformation.rs is implemented. You will want to bring in this crate https://crates.io/crates/aws-sdk-s3
Function to kick off new ImageBuilder pipeline build
Set up credentials via metadata service on AWS ec2 instances
Need to be able to run aws api calls from inside the notebook instance. Configure aws credentials in jupyter notebook ec2 instance via https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-metadata.html
Setup an exome analysis pipeline
Given some person's genome/sequencing reads
- Align: Use a reference genome to find out the location of each read in the genome
- Variant calling: Get differences between target genome and the reference genome
- Annotation: For each variant, search public data for its potential functional effects
- Filter: Filter annotations for most likely to be disease causing
- Prioritization: Supplement variant analysis with target's clinical history and family history.