- Current members: Nicole Yoon, Kicheol Kim, Junhee Yoon, Ka-kyung Kim, Hyunmin Kim
- Please leave a message in the Discussions tab if you have any questions or requests
- Please use the Docker image to analyze the data. The AWS module is ready; ask a member for credentials if you need AWS access
- Our data is located in an S3 bucket
-
Finding potential biomarkers and therapeutic targets to help multiple sclerosis patients. Reference: Cell type-specific transcriptomics identifies neddylation as a novel therapeutic target in multiple sclerosis
-
Phase 1
- Extracting significant signals from the dataset and finding biomarkers for early detection and progression
- Discovering therapeutic targets based on the biological dataset
-
Phase 2
- Finding and developing concrete business ideas or practical use cases so that this project helps patients
- S3 bucket (ask members for access)
- NAS for main data distribution
- Please refer to this repository for the controller: Snakemake GUI Controller
- Related docker sources:
| Image name | Location |
| --- | --- |
| snakemake-gui-controller-image | Link |
- Please refer to this repository for AWS usage: AWS module repository
- Related docker sources:
| Image name | Location |
| --- | --- |
| activation-score-batch-image | Link |
| deg-pipeline-batch-image | Link |
| feature-extraction-batch-image | Link |
- Please refer to this repository for Notebook usage: Notebook repository
-
Usage of docker container
-
Four images are needed to run the services (notebook, pipelines, Celery, and Redis)
-
We use a Docker registry to distribute the images; please refer to here
-
Docker compose option
docker-compose -f docker-compose.yaml up --build # build the images from the source code and start
# or
docker-compose -f docker-compose.example.yaml up # start using images from the registry
Jupyter notebook container
# Access Jupyter notebook
# Open this URL in your browser after docker-compose up
http://localhost:8888/token_number
Pipeline container
# Open this URL in your browser after docker-compose up
http://localhost/
multiple_sclerosis_proj's Issues
Get MSigDB data
To compute pathway activation scores, we need a local copy of the MSigDB dataset
Keeping the concept of this project/fleshing out the contents
MLOps strategy
- Think about an MLOps strategy with Hydra + AWS + GitHub Actions
Composing up Jupyter notebook
Replace bash script with boto3 for aws_module
- Boto3 lets us control AWS from Python, so the current bash script should be replaced with boto3
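As a rough illustration of what such a replacement could look like, the sketch below uploads every file under a local directory to S3. The helper name `upload_directory`, the bucket name, and the key prefix are hypothetical and not taken from this repository; `upload_file(Filename, Bucket, Key)` is the standard boto3 S3 client call. The client is passed in as an argument so the helper can be tried out without AWS credentials.

```python
import os


def upload_directory(s3_client, local_dir, bucket, prefix=""):
    """Upload every file under local_dir to the bucket and return the object keys."""
    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Build the key from the path relative to local_dir,
            # using forward slashes as S3 keys conventionally do.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = f"{prefix.rstrip('/')}/{rel}" if prefix else rel
            s3_client.upload_file(Filename=path, Bucket=bucket, Key=key)
            uploaded.append(key)
    return uploaded


# Hypothetical usage (requires boto3 and configured AWS credentials):
#   import boto3
#   upload_directory(boto3.client("s3"), "./data", "my-example-bucket", "normalized")
```

Passing the client in, rather than creating it inside the function, also makes it straightforward to substitute a stub in tests.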
Upload normalized expression to S3 and R code to the repo
- Need to upload normalized expression data for each dataset (CD4, CD8, CD14, etc.) to S3
- Need to upload the R code (DESeq2) that was used to get the DEGs
Need help for ML/AI optimization
@kicheolkim It would be good to have a new person for AI/ML support. We have a working knowledge of the area, but we need more advanced optimization techniques for the ML model. What do you think? Should we find another person for this project?
Testing pull request with jupyter notebook
- ReviewNB was newly set up for the repo and needs to be tested with Jupyter notebook code
Data/engineering preparation including github repository
- Set up the GitHub repo
- Prepare data on the local PC or on AWS (S3 bucket)
- Use Feather instead of CSV
- Make a test set of gene expression data from random samples
- Establish an engineering strategy (IDE, Jupyter notebook)
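The random-sample test set mentioned above could be built along the following lines. This is only a sketch under assumptions: it supposes the expression table has genes as rows and samples as columns, with the gene ID in the first column, and the function name `sample_columns` is hypothetical.

```python
import random


def sample_columns(header, rows, n_samples, seed=0):
    """Return a reduced table keeping the gene ID column and n_samples random sample columns.

    header: list of column names; the first column is assumed to be the gene ID.
    rows:   list of rows, each a list aligned with header.
    """
    sample_idx = list(range(1, len(header)))
    if n_samples > len(sample_idx):
        raise ValueError("n_samples exceeds the number of sample columns")
    # A fixed seed keeps the test set reproducible across collaborators.
    chosen = sorted(random.Random(seed).sample(sample_idx, n_samples))
    keep = [0] + chosen
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows
```

Sorting the chosen indices preserves the original column order, which keeps the reduced table easy to compare against the full one.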
Keep updating engineering portion for collaborative environment
Establishing initial data analysis strategy
dataset summary (what we have, what we can do), data analysis strategy
github repository restructuring
Reporting initial result of data analysis
AWS lambda role in the project
Think about archiving
tar cvfz - ./ | aws s3 cp - s3://openkbc-test-lambda-bucket/test1.tar.gz
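For comparison, a Python version of the same archiving step might look like the sketch below. It assumes boto3 is available and uses the S3 client's `upload_fileobj(Fileobj, Bucket, Key)` call; the helper names `make_tar_gz` and `archive_to_s3` are hypothetical. The archive is built in memory, so this form only suits directories that fit comfortably in RAM.

```python
import io
import tarfile


def make_tar_gz(directory):
    """Pack directory into an in-memory .tar.gz and return a rewound BytesIO."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        # arcname="." stores paths relative to the directory, similar to `tar cvfz - ./`.
        archive.add(directory, arcname=".")
    buffer.seek(0)
    return buffer


def archive_to_s3(s3_client, directory, bucket, key):
    """Upload the archive; s3_client is expected to be boto3.client("s3")."""
    s3_client.upload_fileobj(make_tar_gz(directory), bucket, key)


# Hypothetical usage (requires boto3 and configured AWS credentials):
#   import boto3
#   archive_to_s3(boto3.client("s3"), "./", "openkbc-test-lambda-bucket", "test1.tar.gz")
```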
Celery + Redis for waiting snakemake
- Need progress page for monitoring snakemake log in controller
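One way a progress page could derive its numbers is by scanning the Snakemake log for its step counter. The sketch below assumes the log contains lines of the form `3 of 10 steps (30%) done`; the function name `latest_progress` is hypothetical, and how the controller actually stores or streams the log is not specified here.

```python
import re

# Matches progress lines such as "3 of 10 steps (30%) done".
_PROGRESS = re.compile(r"(\d+) of (\d+) steps \([^)]*\) done")


def latest_progress(log_text):
    """Return (done, total) from the last progress line in the log, or None if absent."""
    matches = _PROGRESS.findall(log_text)
    if not matches:
        return None
    done, total = matches[-1]
    return int(done), int(total)
```

A Celery task could call such a helper periodically on the log file and publish the result for the page to poll.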
Finding other datasets for MS
Publications, GTEx, and GEO, for validation or other analyses
Developing pipeline controller
Pipelines run in a container through a Snakemake workflow. A controller is needed so that users can change the configuration and supply input data dynamically, without stopping and restarting the container.
Building database strategy for external cohorts
- AWS Aurora, DynamoDB, etc.
Replacing pipeline/notebook containers with Batch/SageMaker in the AWS module (transition to AWS environment)
Currently, we use a Docker instance to deploy the notebook; it could be replaced with AWS SageMaker, and the pipeline container could possibly be deployed on ECS.
CSRF secret embedded in code
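A common way to address this kind of issue is to read the secret from the environment instead of the source tree. The sketch below is generic: the variable name `SECRET_KEY` and the helper `load_secret_key` are assumptions, and how the controller actually consumes its secret is not shown here.

```python
import os
import secrets


def load_secret_key(env_var="SECRET_KEY"):
    """Read the secret from the environment; fall back to a random per-process key."""
    value = os.environ.get(env_var)
    if value:
        return value
    # Fallback intended for local development only: anything signed with this
    # key (such as CSRF tokens) becomes invalid when the process restarts.
    return secrets.token_hex(32)
```

With docker-compose, the variable could be supplied through an `environment:` entry or an env file that stays out of version control.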
Memory issue in DEG pipeline
The DEG pipeline runs out of memory when reading all files
- The Docker container has a default memory limit (2g).
- We attempted to raise the limit to 4g, but it failed (reference: docker/compose#4513)
- The code needs to be optimized
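One possible direction for the optimization is to stream the input files row by row instead of loading them all at once. The sketch below is illustrative only: it assumes tab-separated files with a header row, a gene ID in the first column, and a numeric count in the second, which may not match the pipeline's real inputs; `iter_counts` and `total_counts` are hypothetical names.

```python
import csv


def iter_counts(paths):
    """Yield (gene_id, count) pairs one row at a time across all files."""
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            next(reader, None)  # skip the header row
            for row in reader:
                if len(row) >= 2:
                    yield row[0], float(row[1])


def total_counts(paths):
    """Accumulate per-gene totals while holding only one row in memory at a time."""
    totals = {}
    for gene, count in iter_counts(paths):
        totals[gene] = totals.get(gene, 0.0) + count
    return totals
```

Because only the running totals are kept, peak memory depends on the number of genes rather than on the number or size of the input files.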
Adding R package environment in Mac conda env
# Error occurred in Mac conda
dyld: Library not loaded: @rpath/libreadline.6.2.dylib
Solution1 (FAILED)
Suggested solution: conda/conda/issues/6183 (github reference)
mamba update -c rdonnellyr -c main --all
Solution2 (Partly Succeeded)
Suggested solution: ContinuumIO/anaconda-issues/issues/11184 (github reference)
mamba create --name new-env r-essentials=3.6.1 # the environment works, but DESeq2 does not
Solution3 (Succeeded)
Suggested solution: Create an r-essentials environment and use the installer script for the R packages (DESeq2, tximport). This is a temporary solution for R packages.
# First, create conda env with r-essentials
mamba create --name new-env r-essentials=3.6.1
# Install packages with Rscript
Rscript notebook/installers/installer_Rpackage.R