The gooserocket from eltonlaw

Find all start and stop codons in an RNA/DNA sequence

Methionine usually indicates the start of a protein, encoded as AUG
UAA, UAG, and UGA are usually the stop codons
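
A minimal sketch of what the scan could look like, assuming the sequence arrives as a plain string. The names `CodonKind` and `find_codons` are hypothetical, not taken from the repo. It checks every offset (so all three reading frames) and treats T as U so DNA input works too.

```rust
/// Kind of codon found while scanning (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodonKind {
    Start,
    Stop,
}

/// Report (0-based position, kind) for every start or stop codon.
/// Every offset is checked, so hits from all three reading frames are mixed.
/// DNA input is handled by mapping T to U; case is ignored.
pub fn find_codons(seq: &str) -> Vec<(usize, CodonKind)> {
    // Normalise to upper-case RNA letters.
    let bases: Vec<u8> = seq
        .bytes()
        .map(|b| match b.to_ascii_uppercase() {
            b'T' => b'U',
            other => other,
        })
        .collect();

    bases
        .windows(3)
        .enumerate()
        .filter_map(|(i, w)| match w {
            b"AUG" => Some((i, CodonKind::Start)),
            b"UAA" | b"UAG" | b"UGA" => Some((i, CodonKind::Stop)),
            _ => None,
        })
        .collect()
}
```

Callers that only care about one reading frame can filter on `position % 3`. Collecting into a `Vec` assumes the sequence fits in memory; a streaming version would need to carry the last two bases across chunk boundaries.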

On ROLLBACK_COMPLETE, delete stack

ServiceError(ServiceError { source: Unhandled(Unhandled { source: ErrorMetadata { code: Some("ValidationError"), message: Some("Stack:arn:aws:cloudformation:us-east-1:208577793574:stack/gr-jupyter-nb-1/0dcb6520-e6c6-11ed-bad7-0ada475864a5 is in ROLLBACK_COMPLETE state and can not be updated."), extras: Some({"aws_request_id": "94cfc515-a1fe-4b5f-b24b-461b658e954e"}) }, meta: ErrorMetadata { code: Some("ValidationError"), message: Some("Stack:arn:aws:cloudformation:us-east-1:208577793574:stack/gr-jupyter-nb-1/0dcb6520-e6c6-11ed-bad7-0ada475864a5 is in ROLLBACK_COMPLETE state and can not be updated."), extras: Some({"aws_request_id": "94cfc515-a1fe-4b5f-b24b-461b658e954e"}) } }), raw: Response { inner: Response { status: 400, version: HTTP/1.1, headers: {"x-amzn-requestid": "94cfc515-a1fe-4b5f-b24b-461b658e954e", "date": "Sat, 29 Apr 2023 19:50:51 GMT", "content-type": "text/xml", "content-length": "421"}, body: SdkBody { inner: Once(Some(b"<ErrorResponse xmlns="http://cloudformation.amazonaws.com/doc/2010-05-15/\">\n \n Sender\n ValidationError\n Stack:arn:aws:cloudformation:us-east-1:208577793574:stack/gr-jupyter-nb-1/0dcb6520-e6c6-11ed-bad7-0ada475864a5 is in ROLLBACK_COMPLETE state and can not be updated.\n \n 94cfc515-a1fe-4b5f-b24b-461b658e954e\n\n")), retryable: true } }, properties: SharedPropertyBag(Mutex { data: PropertyBag, poisoned: false, .. }) } })
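
One possible shape for the fix, as a sketch only: decide what to do from the stack's current status before calling update. `DeployAction` and `plan_deploy` are hypothetical names, and the status is assumed to have been fetched already (`None` meaning the stack does not exist). Only `ROLLBACK_COMPLETE` is special-cased here, since that is the state in the error above.

```rust
/// What the CLI should do for a stack (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum DeployAction {
    Create,
    Update,
    DeleteThenCreate,
}

/// Pick an action from the current stack status.
/// `None` means the stack was not found.
pub fn plan_deploy(status: Option<&str>) -> DeployAction {
    match status {
        None => DeployAction::Create,
        // A stack in ROLLBACK_COMPLETE cannot be updated, so it has to be
        // deleted and created again.
        Some("ROLLBACK_COMPLETE") => DeployAction::DeleteThenCreate,
        // Assumption: every other status is treated as updatable. In-progress
        // and failed states would likely need their own handling.
        Some(_) => DeployAction::Update,
    }
}
```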

CLI command to spin up an AWS Batch compute environment and job queue for grabbing data.

Fargate with spot instances

More fine-grained tuning of CF create/update-stack

At the moment, create/update just always pass in capabilities for creating IAM resources. Not a security concern but it would probably be more correct to have it passed in only for the stacks that need it.
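
A rough sketch of one way to do this, assuming the template body is available as a string before the call. `required_capabilities` is a hypothetical helper, and the substring check is a heuristic, not a real template parse.

```rust
/// Return the capabilities a template appears to need.
/// Heuristic: any "AWS::IAM::" resource type triggers CAPABILITY_IAM.
/// Limitation: templates that give IAM resources custom names need
/// CAPABILITY_NAMED_IAM instead, which this sketch does not detect.
pub fn required_capabilities(template_body: &str) -> Vec<&'static str> {
    if template_body.contains("AWS::IAM::") {
        vec!["CAPABILITY_IAM"]
    } else {
        Vec::new()
    }
}
```

The returned strings would still need mapping onto whatever capability type the create/update call expects.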

Jupyter notebook server should be robust

Should be able to restart on sporadic failures, handle OOMs, etc.

Putting this into a systemd service might be a solution; would need to figure out how to bake the service file into the image (S3).

Better cli shutdown

./cli shutdown all: Should delete all gr-jupyter-nb-* stacks
./cli shutdown: Should just delete user deployed stacks
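
The selection logic for the two modes could be sketched like this, assuming the stack names have already been listed and that user stacks follow the gr-jupyter-nb-<username> naming proposed in the rename issue. `ShutdownScope` and `stacks_to_delete` are hypothetical names.

```rust
pub const STACK_PREFIX: &str = "gr-jupyter-nb-";

/// Which stacks a shutdown should target (hypothetical type for this sketch).
pub enum ShutdownScope<'a> {
    /// `./cli shutdown all`
    All,
    /// `./cli shutdown`, limited to the calling user's stack
    User(&'a str),
}

/// Filter a list of stack names down to the ones that should be deleted.
pub fn stacks_to_delete<'a>(names: &[&'a str], scope: &ShutdownScope) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| match scope {
            ShutdownScope::All => name.starts_with(STACK_PREFIX),
            // Exact match on the suffix so "al" does not match "alice".
            ShutdownScope::User(user) => name.strip_prefix(STACK_PREFIX) == Some(*user),
        })
        .collect()
}
```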

Add tracing and export to honeycomb.io

Replace all printlns with calls to tracing. Should be mostly drag and drop.

Using println will be tough to work with as soon as we start increasing the number of instances that are up at any given time. Need a place to aggregate all emitted telemetry.

Datadog, New Relic, and CloudWatch all work, but Honeycomb is nice because it's free and you can run SQL queries on the emitted data right out of the box.

Tasks:

Replace printlns with tracing calls
API token secret needs to be saved somewhere accessible when deploying resources and testing local dev.
Add honeycomb tracing layer as part of tracing init
(optionally) instrument some of the existing functions

Rename notebook instance stack so it's deployed per user

This way each user can deploy their own stack. At the moment it unconditionally deploys gr-jupyter-nb-1. Probably want to fix this by appending a username so the stack becomes gr-jupyter-nb-<username>. The username can be parsed from the output of aws sts get-caller-identity.
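
A sketch of the parsing step, assuming the `Arn` field from get-caller-identity has already been extracted as a string (for an IAM user it looks like arn:aws:iam::<account-id>:user/<name>). `username_from_arn` and `notebook_stack_name` are hypothetical names. Stack names only allow letters, digits and hyphens, so anything else in the username is replaced with a hyphen.

```rust
/// Take the last path segment of the ARN's resource part as the username.
pub fn username_from_arn(arn: &str) -> Option<&str> {
    if !arn.starts_with("arn:") {
        return None;
    }
    // ARN layout: arn:partition:service:region:account-id:resource
    let resource = arn.splitn(6, ':').nth(5)?;
    let name = resource.rsplit('/').next()?;
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}

/// Build the per-user stack name, replacing characters that are not
/// allowed in a CloudFormation stack name.
pub fn notebook_stack_name(arn: &str) -> Option<String> {
    let user = username_from_arn(arn)?;
    let safe: String = user
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '-' })
        .collect();
    Some(format!("gr-jupyter-nb-{}", safe))
}
```

For an assumed-role ARN this picks up the session name, which may or may not be what we want; that case is left open here.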

Expire temporary objects in raw data

For data that isn't necessary to keep long term

Raw data that won't be used anymore
Temp processed results

Setup s3://gr-data/tmp with object lifecycle policy

Genome alignment service/implementation

Either writing a shell call or setting up a lambda/ecs service to run something that already exists:

Or handrolling one of the simpler algorithms:

Needleman-Wunsch
Smith-Waterman
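
For a sense of scale on the handrolled route, here is a minimal Needleman-Wunsch sketch that computes only the global alignment score (no traceback), with a linear gap penalty. The function name and the scoring parameters are assumptions for illustration.

```rust
/// Global alignment score of `a` against `b` (Needleman-Wunsch).
/// Uses two rows of the DP table, so memory is O(len(b)).
pub fn needleman_wunsch_score(
    a: &str,
    b: &str,
    match_score: i32,
    mismatch: i32,
    gap: i32,
) -> i32 {
    let a = a.as_bytes();
    let b = b.as_bytes();

    // Row 0: aligning an empty prefix of `a` against b[..j] costs j gaps.
    let mut prev: Vec<i32> = (0..=b.len()).map(|j| j as i32 * gap).collect();

    for i in 1..=a.len() {
        let mut curr = vec![0; b.len() + 1];
        // Column 0: aligning a[..i] against an empty prefix costs i gaps.
        curr[0] = i as i32 * gap;
        for j in 1..=b.len() {
            let sub = if a[i - 1] == b[j - 1] { match_score } else { mismatch };
            curr[j] = (prev[j - 1] + sub) // align a[i-1] with b[j-1]
                .max(prev[j] + gap)       // gap in b
                .max(curr[j - 1] + gap);  // gap in a
        }
        prev = curr;
    }
    prev[b.len()]
}
```

Smith-Waterman uses the same recurrence but clamps each cell at zero and takes the maximum over the whole table, which gives a local rather than global alignment. Both are quadratic in time, so whole-genome inputs would still point towards an existing aligner.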

Set up openvpn server

AC:

Be able to turn it on and off via cli
Be able to update it
Build the image with ImageBuilder
OpenVPN instance SG only allows traffic on port 1194
Fix up existing security groups so traffic is allowed only from the VPC

Initial struct representing DNA/RNA sequence

AC:

Should be able to handle arbitrarily large sequences (stream the data rather than loading it all into memory)
Init with file
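
A sketch of what such a struct might look like, assuming the file holds raw bases with optional line breaks and no header lines. `Sequence`, `from_file` and `next_chunk` are hypothetical names. The struct never holds the whole sequence; it wraps a buffered reader and hands out fixed-size chunks.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// A DNA/RNA sequence read lazily from any byte source.
pub struct Sequence<R: Read> {
    reader: BufReader<R>,
}

impl Sequence<File> {
    /// Init with a file on disk.
    pub fn from_file<P: AsRef<Path>>(path: P) -> io::Result<Self> {
        Ok(Sequence::new(File::open(path)?))
    }
}

impl<R: Read> Sequence<R> {
    pub fn new(source: R) -> Self {
        Sequence {
            reader: BufReader::new(source),
        }
    }

    /// Read up to `max` bases, skipping whitespace and upper-casing.
    /// An empty result means the end of the sequence was reached.
    pub fn next_chunk(&mut self, max: usize) -> io::Result<Vec<u8>> {
        let mut chunk = Vec::with_capacity(max);
        let mut byte = [0u8; 1];
        while chunk.len() < max {
            // One-byte reads are served from the BufReader's buffer.
            if self.reader.read(&mut byte)? == 0 {
                break; // end of input
            }
            if !byte[0].is_ascii_whitespace() {
                chunk.push(byte[0].to_ascii_uppercase());
            }
        }
        Ok(chunk)
    }
}
```

Being generic over `Read` means the same struct could in principle sit on top of an S3 GetObject body, provided it is exposed as a reader.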

s3::HeadObject https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html
s3::GetObject https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.htm
s3::PutObject https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html

Align: Use a reference genome to find out the location of each read in the genome
- #9
Variant calling: Get differences between target genome and the reference genome
Annotation: For each variant, search public data for its potential functional effects
Filter: Filter annotations for most likely to be disease causing
Prioritization: Supplement variant analysis with target's clinical history and family history.
