hbvouroboros's Introduction

HBVouroboros automates sequencing-based HBV genotyping and expression profiling

HBVouroboros uses RNA-sequencing reads to infer HBV genotype, quantify HBV transcript expression, and perform variant calling of HBV genomes.

HBVouroboros, distributed under the GPL-3 license, is available at https://github.com/bedapub/HBVouroboros.

Installation and usage

Download the source code

git clone https://github.com/bedapub/HBVouroboros.git

Set up the conda environment

## set up the conda environment
cd envs; conda env create; cd -
## if the environment already exists, use the command below to update it
## conda env update
conda activate HBVouroboros

Run an example

An out-of-the-box example can be run by starting the Snakemake pipeline.

snakemake -j 99 --configfile config/config_template.yaml --use-envmodules ## use --use-conda if no R module is present

Run the pipeline with your own data

Create a config file by copying the template.

cp config/config_template.yaml config/config.yaml

Next, modify the config/config.yaml file to specify a sample annotation file, and make other changes if necessary.

Run HBVouroboros using unmapped reads from a Biokit output directory

This feature is currently disabled. It may be re-enabled in the future.

Validating the sensitivity and specificity of HBVouroboros with RNAsim2

We created RNAsim2, an RNA-seq simulator, to validate the sensitivity and specificity of HBVouroboros. See RNAsim2/README.md for details.

Known issues and solutions

What to do if conda environment initialization takes too long?

The commands above use the default conda solver. If conda is too slow, consider using mamba, a drop-in replacement for conda.

If you encounter other issues, please report them via GitHub Issues.

hbvouroboros's People

Contributors

accio, dependabot[bot], dingailum, lippunej, milad4849

Forkers

dingailum

hbvouroboros's Issues

Adding user-friendly report

Use Rmarkdown and/or a Jupyter notebook to add a user-friendly report that includes

  • mapping statistics
  • genotyping
  • visualization of genomes and features
  • SNP/SV information

See the biokit pipeline for inspiration.

Remove wrappers and expose the workflow as a repository

Currently, Snakemake workflows are wrapped by Python wrappers. For instance, HBVouroboros_build_refgenomes.py wraps build_refgenomes/Snakefile in the HBVouroboros package. This works, but it adds significant development overhead, because the HBVouroboros package must be reinstalled after every change for the change to take effect.

An alternative is to drop the package and instead expose the workflow directly in a repository, as suggested by the Snakemake documentation. The internal mpsnake pipeline also provides an example.

Mapped read files are empty

I tried to run the pipeline with a dataset, but had several problems.

First I ran the pipeline without adjusting the config, except for the sample annotation file. After that I tried setting doPerSamp to True. With both configurations the following files are empty and the pipeline fails (see the error message below):

  • 02_Sample_mapped_reads_1.fq.gz
  • 02_Sample_mapped_reads_2.fq.gz
# running normalization on reads: $VAR1 = [
          [
            '/gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz'
          ],
          [
            '/gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz'
          ]
        ];


Tuesday, May 23, 2023: 16:10:39 CMD: /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl --seqType fq --JM 10G  --max_cov 200 --min_cov 1 --CPU 1 --output /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/perSamp_trinity/02_Sample/trinity/insilico_read_normalization --max_CV 10000  --left /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz --right /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz --pairs_together  --PARALLEL_STATS
-prepping seqs
Converting input files. (both directions in parallel)CMD: seqtk-trinity seq -A -R 1  <(gunzip -c /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz) >> left.fa
CMD: seqtk-trinity seq -A -R 2  <(gunzip -c /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz) >> right.fa
Error, no records were correctly parsed from /dev/fd/63Thread 1 terminated abnormally: Error, cmd: seqtk-trinity seq -A -R 1  <(gunzip -c /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz) >> left.fa died with ret 1280 at /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl line 793.
Error, no records were correctly parsed from /dev/fd/63Thread 2 terminated abnormally: Error, cmd: seqtk-trinity seq -A -R 2  <(gunzip -c /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz) >> right.fa died with ret 1280 at /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl line 793.
Error, conversion thread failed at /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl line 336.
Error, cmd: /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl --seqType fq --JM 10G  --max_cov 200 --min_cov 1 --CPU 1 --output /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/perSamp_trinity/02_Sample/trinity/insilico_read_normalization --max_CV 10000  --left /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz --right /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz --pairs_together  --PARALLEL_STATS   died with ret 7424 at /home/ferraing/scratch/conda/envs/HBVouroboros/bin/Trinity line 2869.
        main::process_cmd("/gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt"...) called at /home/ferraing/scratch/conda/envs/HBVouroboros/bin/Trinity line 3422
        main::normalize("/gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FD"..., 200, ARRAY(0x55f6c69af7b0), ARRAY(0x55f6c69af7f8)) called at /home/ferraing/scratch/conda/envs/HBVouroboros/bin/Trinity line 3362
        main::run_normalization(200, ARRAY(0x55f6c69af7b0), ARRAY(0x55f6c69af7f8)) called at /home/ferraing/scratch/conda/envs/HBVouroboros/bin/Trinity line 1384
[Tue May 23 16:10:39 2023]
Error in rule run_trinity_perSamp:
    jobid: 125
    input: results/02_Sample_mapped_reads_1.fq.gz, results/02_Sample_mapped_reads_2.fq.gz
    output: results/perSamp_trinity/02_Sample/trinity/Trinity.fasta

I also tried to set doInputRef and doPerSamp to True, but then the pipeline couldn't start at all.

MissingInputException in rule get_ref_strain_gb_inpt in file /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/workflow/rules/align_reads.smk, line 335:
Missing input files for rule get_ref_strain_gb_inpt:
    output: results/inpt/inpt_strain.gb
    affected files:
        AB064313

As a reference, I used the sampleAnnotation file under the .test folder; with this file the pipeline always worked.
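
One way to catch this situation early is to count the records in the mapped-read FASTQ files before the assembly step runs. The sketch below is only an illustration and is not part of HBVouroboros: the function names count_fastq_records and find_empty_fastq are hypothetical, and the code assumes gzip-compressed files with well-formed four-line FASTQ records.

```python
import gzip
from pathlib import Path


def count_fastq_records(path):
    """Count records in a gzip-compressed FASTQ file, assuming 4 lines per record."""
    n_lines = 0
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            n_lines += 1
    return n_lines // 4


def find_empty_fastq(paths):
    """Return the paths that are missing, zero bytes in size, or hold no records."""
    empty = []
    for path in map(Path, paths):
        if not path.exists() or path.stat().st_size == 0:
            empty.append(str(path))
        elif count_fastq_records(path) == 0:
            empty.append(str(path))
    return empty
```

Calling find_empty_fastq on results/02_Sample_mapped_reads_1.fq.gz and results/02_Sample_mapped_reads_2.fq.gz would list whichever of the two files contains no reads, which separates "no reads mapped to HBV for this sample" from a genuine Trinity failure.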

Issues to be solved before publication

first priority

  1. Convert the steps in build_refgenomes.smk into a separate Python script that the user only needs to run when HBVdb needs updating.
  2. Improve the visuals of the HTML report with CSS.
  3. Make sure the Docker image is correctly built and pushed.

second priority

  1. rename biokit to bksnake
  2. document correct_bam
  3. check consistency: use shell directly if necessary, not run: shell(shell), for instance in varscan_vc.smk
  4. separate snakemake (.smk) files from python functions (.py)
  5. Update reference data in repository

multiqc with --force

I would use the --force option in multiqc. If the same data set is run several times without this option, multiqc creates a new output folder with a different name each time, which makes it impossible to satisfy the rule that expects the multiqc HTML report.
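
As a rough sketch of this suggestion, the multiqc call could be assembled with --force so that repeated runs overwrite the existing report. The helper names build_multiqc_command and to_shell_string, and the directory names in the usage note, are hypothetical. --force, --outdir and --filename are existing MultiQC command-line options, but how the HBVouroboros rule actually invokes multiqc is an assumption here.

```python
import shlex


def build_multiqc_command(input_dir, outdir, filename="multiqc_report.html", force=True):
    """Build a multiqc argument list; force=True adds --force to overwrite old reports."""
    cmd = ["multiqc", input_dir, "--outdir", outdir, "--filename", filename]
    if force:
        cmd.append("--force")
    return cmd


def to_shell_string(cmd):
    """Quote and join an argument list so it can be pasted into a shell command."""
    return shlex.join(cmd)
```

For example, to_shell_string(build_multiqc_command("results", "results/multiqc")) yields a single command line ending in --force. Fixing the report filename keeps the output path stable across reruns, which is what a Snakemake rule with a declared HTML output needs.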
