
google / deepconsensus



License: BSD 3-Clause "New" or "Revised" License

Languages: Python 77.96%, Shell 1.18%, Dockerfile 0.27%, Jupyter Notebook 20.59%
Topics: bioinformatics, transformers, deep-learning, long-read-sequencing

deepconsensus's Introduction

DeepConsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.

This results in greater yield of high-quality reads. See yield metrics for results on three full SMRT Cells with different chemistries and read length distributions.

Usage

See the quick start for how to run DeepConsensus, along with guidance on how to shard and parallelize most effectively.

ccs settings matter

To get the most out of DeepConsensus, we highly recommend running ccs with the parameters given in the quick start. By default, ccs filters out reads below a predicted quality of 20, and those reads cannot be rescued by DeepConsensus afterwards. The runtime of ccs is low enough that this extra step is worth doing whenever you use DeepConsensus.
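
For example, a minimal sketch (the exact recommended parameters are in the quick start; the file names and thread count here are placeholders):

# --all keeps the reads that ccs's default Q20 filter would drop, so that
# DeepConsensus can attempt to rescue them later.
ccs --all -j 16 subreads.bam ccs.bam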

Compute setup

The recommended compute setup for DeepConsensus is to shard each SMRT Cell into at least 500 shards, each of which can run on a 16-CPU machine (or smaller). We find that having more than 16 CPUs available for each shard does not significantly improve runtime. See the runtime metrics page for more information.
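
A rough sketch of one shard, assuming ccs and actc releases that accept --chunk i/N (the same pattern as the tutorial issue further down this page); the file names, shard count, and model path are placeholders:

# Process shard ${i} of 500; run this for i in 1..500, e.g. as a cluster job array.
i=1

# Generate CCS reads for this shard only (the input subreads.bam needs a .pbi index).
ccs --all --chunk "${i}"/500 -j 16 subreads.bam "ccs.${i}.bam"

# Align the subreads back to this shard's CCS reads.
actc -j 16 subreads.bam "ccs.${i}.bam" "subreads_to_ccs.${i}.bam"

# Polish this shard with DeepConsensus; 16 CPUs per shard is sufficient.
deepconsensus run \
  --subreads_to_ccs="subreads_to_ccs.${i}.bam" \
  --ccs_bam="ccs.${i}.bam" \
  --checkpoint=model/checkpoint \
  --output="output.${i}.fastq" \
  --cpus 16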

Where does DeepConsensus fit into my pipeline?

After a PacBio sequencing run, DeepConsensus is meant to be run on the subreads to create new corrected reads in FASTQ format that can take the place of the CCS/HiFi reads for downstream analyses.
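
As a hedged illustration (not part of the DeepConsensus workflow itself; the reference and file names are placeholders), the DeepConsensus FASTQ simply stands in for the HiFi reads, e.g. when mapping with pbmm2 ahead of variant calling:

# Map the DeepConsensus reads the same way you would map HiFi reads.
pbmm2 align --preset CCS --sort reference.fasta deepconsensus.fastq aligned.bam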

For variant-calling downstream

For context, we are the team that created and maintains both DeepConsensus and DeepVariant. For variant calling with DeepVariant, we tested different models and found that the best performance is with DeepVariant v1.5 using the normal pacbio model rather than the model trained on DeepConsensus v0.1 output. We plan to include DeepConsensus v1.2 outputs when training the next DeepVariant model, so if there is a DeepVariant version later than v1.5 when you read this, we recommend using that latest version.

For assembly downstream

We have confirmed that v1.2 outperforms v0.3 in terms of downstream assembly contiguity and accuracy. See the assembly metrics page for details.

How to cite

If you are using DeepConsensus in your work, please cite:

DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer

How DeepConsensus works

DeepConsensus overview diagram

Watch How DeepConsensus Works for a quick overview.

See this notebook to inspect some example model inputs and outputs.

Installation

From pip package

If you're on a GPU machine:

pip install deepconsensus[gpu]==1.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"

If you're on a CPU machine:

pip install deepconsensus[cpu]==1.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"

From Docker image

For GPU:

sudo docker pull google/deepconsensus:1.2.0-gpu

For CPU:

sudo docker pull google/deepconsensus:1.2.0

From source

git clone https://github.com/google/deepconsensus.git
cd deepconsensus
source install.sh

If you have a GPU, run source install-gpu.sh instead. Currently the only difference is that the GPU version installs tensorflow-gpu instead of intel-tensorflow.

(Optional) After source install.sh, if you want to run all unit tests, you can do:

./run_all_tests.sh

Disclaimer

This is not an official Google product.

NOTE: the content of this research code repository (i) is not intended to be a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.

deepconsensus's People

Contributors

akolesnikov, anastasiyabl, danielecook, kishwarshafin, marianattestad, pichuan


deepconsensus's Issues

General performance question

Hi,

This is a great tool and we really want to put it into production. However, after I tried a small-scale run, the main issue was the time it takes to run deepconsensus itself (the last step). It took 616 seconds to process only 1000 ZMWs. I used one V100 GPU with 32 GB memory and 16 CPUs on our HPC system. I don't know how much time is needed to finish one regular SMRT Cell and generate HiFi reads; at this rate, 8M ZMWs would take about 2 months to get the results.

I would like to get your input on how to speed up the last step: more GPUs? More CPUs? Or adjusting the batch size?

Thanks

Jack

I0216 11:21:42.449043 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 58.32757616043091 seconds
I0216 11:21:42.460514 140621518309184 quick_inference.py:570] Processed 100 ZMWs in 62.885846 seconds
I0216 11:22:43.300720 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 56.282196283340454 seconds
I0216 11:22:43.311169 140621518309184 quick_inference.py:570] Processed 200 ZMWs in 123.736485 seconds
I0216 11:23:44.079205 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 56.101356506347656 seconds
I0216 11:23:44.090897 140621518309184 quick_inference.py:570] Processed 300 ZMWs in 184.516218 seconds
I0216 11:24:41.855663 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 53.60238575935364 seconds
I0216 11:24:41.864543 140621518309184 quick_inference.py:570] Processed 400 ZMWs in 242.289871 seconds
I0216 11:25:45.063106 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 58.29405975341797 seconds
I0216 11:25:45.074118 140621518309184 quick_inference.py:570] Processed 500 ZMWs in 305.499436 seconds
I0216 11:26:47.627478 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 57.61990475654602 seconds
I0216 11:26:47.638337 140621518309184 quick_inference.py:570] Processed 600 ZMWs in 368.063655 seconds
I0216 11:27:50.719364 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 58.07153558731079 seconds
I0216 11:27:50.729454 140621518309184 quick_inference.py:570] Processed 700 ZMWs in 431.154773 seconds
I0216 11:28:52.719674 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 57.31240630149841 seconds
I0216 11:28:52.730889 140621518309184 quick_inference.py:570] Processed 800 ZMWs in 493.156202 seconds
I0216 11:29:55.011658 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 57.378705739974976 seconds
I0216 11:29:55.021590 140621518309184 quick_inference.py:570] Processed 900 ZMWs in 555.446918 seconds
I0216 11:30:55.729195 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 56.30019426345825 seconds
I0216 11:30:55.740186 140621518309184 quick_inference.py:570] Processed 1000 ZMWs in 616.165509 seconds

pip install deepconsensus==0.1.0 fails...

Hello,

I don't have sudo privileges on my server and I would love to try using deepconsensus. I have attempted installing deepconsensus through the recommended method.

The following commands:

pip install deepconsensus==0.1.0
pip install deepconsensus

returns

ERROR: Could not find a version that satisfies the requirement deepconsensus (from versions: none)
ERROR: No matching distribution found for deepconsensu

The server details are as follows:
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Regards,
Sangjin

Error: StopIteration

Hi,
I'm getting this error. What is the cause of this?
Thanks.

singularity run -W /data -B /scratch/projects/bin/deepconsensus/model:/model -B "$(pwd)" /scratch/projects/bin/deepconsensus/deepconsensus_0.3.1.sif deepconsensus run --batch_size=1024 --batch_zmws=100 --cpus 1 --max_passes 20 --subreads_to_ccs=blah.subreads_to_ccs.0018.bam --ccs_bam=blah.ccs.0018.bam --checkpoint=/model/checkpoint --output=blah.fastq

=================================================================
Total params: 8,942,667
Trainable params: 8,942,667
Non-trainable params: 0
_________________________________________________________________
I0809 10:21:26.338892 140397309962048 model_utils.py:231] Setting hidden size to transformer_input_size.
I0809 10:21:26.339057 140397309962048 quick_inference.py:484] Finished initialize_model.
I0809 10:21:26.339549 140397309962048 quick_inference.py:738] Model setup took 1.790560245513916 seconds.
Traceback (most recent call last):
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/preprocess/utils.py", line 981, in proc_feeder
    ccs_bam_read = next(ccs_bam_h)
  File "pysam/libcalignmentfile.pyx", line 1874, in pysam.libcalignmentfile.AlignmentFile.__next__
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/bio/bin/deepconsensus", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/cli.py", line 111, in run
    app.run(main, flags_parser=parse_flags)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/cli.py", line 102, in main
    app.run(quick_inference.main, argv=passed)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/inference/quick_inference.py", line 814, in main
    outcome_counter = run()
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/inference/quick_inference.py", line 762, in run
    for zmw, subreads, dc_config in input_file_generator:
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/inference/quick_inference.py", line 428, in stream_bam
    for input_data in proc_feeder():
RuntimeError: generator raised StopIteration

install problem

There are errors, shown in the attached screenshot, when running pip install deepconsensus==0.1.0.

Training tutorial?

Curious if there will be a tutorial (sorry if I missed it somewhere in the repo) for training a custom DeepConsensus model for PacBio HiFi reads other than human? I tried DeepConsensus on bacterial PacBio HiFi/CCS reads, and as expected, it does not perform as well as it does for human.

Missing majority of ZMWs after running lima to search for adapters

Hi, I'm not sure if you can help me with this, but I just want to raise an issue I've encountered after filtering adapters with lima. I'm not entirely sure why lima filtered out the majority of the 'deepconsensus hifi reads'. Below are the scripts for deepconsensus and lima:

DC

cmd4="module purge && module load deepconsensus/0.3.1 && deepconsensus run --checkpoint=/cluster/home/dc_model_0.3/checkpoint --ccs_bam=${outDir}/${outFilePrefix}.${SLURM_ARRAY_TASK_ID}.bam --subreads_to_ccs=${outDir}/${outFilePrefix%.ccs}.subreads_to_ccs.${SLURM_ARRAY_TASK_ID}.bam --output=${outDir}/${outFilePrefix%.ccs}.deepconsensus.${SLURM_ARRAY_TASK_ID}.fastq --cpus ${THREAD}"

Lima

lima --num-threads 84 --split-bam-named --same --ccs ${ID}.deepconsensus.fastq /cluster/home/lima_pbmarkdup/pb_pcr_adapter.fa ${ID}.deepconsensus.lima.fastq

Here is the output summary from lima:

ZMWs input (A) : 1925775
ZMWs above all thresholds (B) : 13042 (0.68%)
ZMWs below any threshold (C) : 1912733 (99.32%)
ZMW marginals for (C):
Below min length : 199 (0.01%)
Below min score : 648496 (33.90%)
Below min end score : 648496 (33.90%)
Below min passes : 0 (0.00%)
Below min score lead : 648496 (33.90%)
Below min ref span : 1912733 (100.00%)
Without SMRTbell adapter : 0 (0.00%)
ZMWs for (B):
With same pair : 13042 (100.00%)
Coefficient of correlation : 0.00%
ZMWs for (A):
Allow diff pair : 1925775 (100.00%)
Allow same pair : 1925775 (100.00%)
Reads for (B):
Above length : 13042 (100.00%)
Below length : 0 (0.00%)

Thank you, I appreciate your help!

Lower number of >Q30 average quality reads for v1.1 compared to v0.3

Hi all,

I am assembling a genome of a land snail that has extreme repeat content (~85%) and a large genome size (6.6 Gb). My mean insert size is 8 kb and I have data from six SMRT Cells.

I have run DeepConsensus (CPU only) on my six SMRT Cells using v0.3 and on two of the SMRT Cells using v1.1. I have noticed I have gotten more reads with >Q20 average quality using v1.1 but a lower number of reads that are >Q30 compared to v0.3. Histograms of average read quality are attached.
v0.3.qchist.txt
v1.1.qchist.txt

Manual inspection of the same reads from either version confirmed that most reads had longer regions of lower quality in v1.1 than v0.3. First 100 reads for v0.3 and v1.1 attached (.txt extension for github upload).
v0.3_smrtcell1_100.fastq.txt
v1.1_smrtcell1_100.fastq.txt

This was surprising to me as my expectation was that the >Q20 yield would remain relatively constant between versions but >Q30 yield would increase.

Perhaps this is the result of the lower insert length or the high repeat content of this library? I would appreciate hearing the DeepConsensus team's thoughts on this discrepancy. Any help would be appreciated!

Thanks!
Mason

Running deepconsensus results in "free(): invalid pointer" error

I installed deepconsensus via pip in a virtualenv like this:

virtualenv /apps/deepconsensus/1.0.0/python-3.8.2_cpu
source /apps/deepconsensus/1.0.0/python-3.8.2_cpu/bin/activate
pip install pyyaml==5.4.1 'deepconsensus[cpu]==1.0.0'

I used pyyaml==5.4.1 since tf-models-official 2.10.0 requires pyyaml<6.0,>=5.1. I'm using Python 3.8.2.

When I run deepconsensus, even just for the help message, it fails. deepconsensus -h resulted in this error message:

2022-11-10 12:51:17.016037: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
*** Error in `/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python': free(): invalid pointer: 0x00007f075c296c80 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x7f078b91d329]
/lib64/libstdc++.so.6(_ZNSt6locale5_Impl16_M_install_facetEPKNS_2idEPKNS_5facetE+0x142)[0x7f075c000192]
/lib64/libstdc++.so.6(_ZNSt6locale5_ImplC1Em+0x1e3)[0x7f075c0005e3]
/lib64/libstdc++.so.6(+0x71555)[0x7f075c001555]
/lib64/libpthread.so.0(+0x620b)[0x7f078c37920b]
/lib64/libstdc++.so.6(+0x715a1)[0x7f075c0015a1]
/lib64/libstdc++.so.6(_ZNSt6localeC2Ev+0x13)[0x7f075c0015e3]
/lib64/libstdc++.so.6(_ZNSt8ios_base4InitC2Ev+0xbc)[0x7f075bffe43c]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so(+0xb1150)[0x7f075bdd0150]
/lib64/ld-linux-x86-64.so.2(+0xf9c3)[0x7f078c7d59c3]
/lib64/ld-linux-x86-64.so.2(+0x1459e)[0x7f078c7da59e]
/lib64/ld-linux-x86-64.so.2(+0xf7d4)[0x7f078c7d57d4]
/lib64/ld-linux-x86-64.so.2(+0x13b8b)[0x7f078c7d9b8b]
/lib64/libdl.so.2(+0xfab)[0x7f078c16ffab]
/lib64/ld-linux-x86-64.so.2(+0xf7d4)[0x7f078c7d57d4]
/lib64/libdl.so.2(+0x15ad)[0x7f078c1705ad]
/lib64/libdl.so.2(dlopen+0x31)[0x7f078c170041]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyImport_FindSharedFuncptr+0x16b)[0x539abb]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyImport_LoadDynamicModuleWithSpec+0x159)[0x503e69]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x501a23]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x46f563]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(PyVectorcall_Call+0x5c)[0x439d8c]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x76d8)[0x42a308]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalCodeWithName+0xadf)[0x4e171f]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyFunction_Vectorcall+0x90)[0x438570]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x422821]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x5f91)[0x428bc1]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x421571]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x422821]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x1fb5)[0x424be5]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x421571]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x422821]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x15af)[0x4241df]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x421571]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x422821]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x15af)[0x4241df]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x421571]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x422821]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x15af)[0x4241df]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x421571]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x437f74]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyObject_CallMethodIdObjArgs+0xf1)[0x439831]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(PyImport_ImportModuleLevelObject+0x3fd)[0x502c8d]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x5ee426]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x437c24]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x76d8)[0x42a308]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalCodeWithName+0xadf)[0x4e171f]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyFunction_Vectorcall+0x90)[0x438570]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x422821]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x15af)[0x4241df]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalCodeWithName+0xadf)[0x4e171f]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyFunction_Vectorcall+0x90)[0x438570]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x437f74]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyObject_CallMethodIdObjArgs+0xf1)[0x439831]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(PyImport_ImportModuleLevelObject+0x4e6)[0x502d76]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x6e78)[0x429aa8]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalCodeWithName+0xadf)[0x4e171f]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(PyEval_EvalCode+0x23)[0x4e1b43]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x5efe34]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python[0x46f563]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(PyVectorcall_Call+0x5c)[0x439d8c]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalFrameDefault+0x76d8)[0x42a308]
/zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/bin/python(_PyEval_EvalCodeWithName+0xadf)[0x4e171f]
======= Memory map: ========
00400000-006f3000 r-xp 00000000 00:31 37327427                           /zapps7/python/3.8.2/gcc-9.2.0/bin/python3.8
008f2000-008f3000 r--p 002f2000 00:31 37327427                           /zapps7/python/3.8.2/gcc-9.2.0/bin/python3.8
008f3000-0092b000 rw-p 002f3000 00:31 37327427                           /zapps7/python/3.8.2/gcc-9.2.0/bin/python3.8
0092b000-0094c000 rw-p 00000000 00:00 0 
01dd2000-03209000 rw-p 00000000 00:00 0                                  [heap]
7f0754000000-7f0754021000 rw-p 00000000 00:00 0 
7f0754021000-7f0758000000 ---p 00000000 00:00 0 
7f075bd1f000-7f075bdbc000 r--p 00000000 00:31 77736144                   /zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so
7f075bdbc000-7f075bf0b000 r-xp 0009d000 00:31 77736144                   /zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so
7f075bf0b000-7f075bf7f000 r--p 001ec000 00:31 77736144                   /zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so
7f075bf7f000-7f075bf80000 ---p 00260000 00:31 77736144                   /zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so
7f075bf80000-7f075bf85000 r--p 00260000 00:31 77736144                   /zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so
7f075bf85000-7f075bf8f000 rw-p 00265000 00:31 77736144                   /zapps7/deepconsensus/1.0.0/python-3.8.2_cpu/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so
zsh: abort      deepconsensus -h

Is there something different I should have done when installing?

Model not found for Quick start

I followed the "Quick start for DeepConsensus" and downloaded the test data. However, when I run the command:
gsutil cp gs://brain-genomics-public/research/deepconsensus/models/v0.1/* "${MODEL_DIR}"/
it reports "no matches found".

I also wonder what "checkpoint-50" means for "python3 -m deepconsensus.scripts.run_deepconsensus". How do I get a checkpoint file if I want to run my own data?

download params.json but Connection refused.

When I download the model with the command gsutil cp -r gs://brain-genomics-public/research/deepconsensus/models/v0.3/model_checkpoint/params.json , I get the following error. How should I solve it? Is there any other way to download the params.json file?
INFO 0909 10:04:28.460050 retry_util.py] Retrying request, attempt #20...
INFO 0909 10:05:00.531276 retry_util.py] Retrying request, attempt #21...
INFO 0909 10:05:32.603370 retry_util.py] Retrying request, attempt #22...
OSError: Connection refused.

HPC resource managment not working with deepconsensus

Hello, I have a few partitions in the HPC at my work, and some of them have nodes with multiple GPUs. Since this is a shared space, I would like not to block a whole node when I can use just 1 or 2 GPUs, but deepconsensus uses all of the GPUs available even if I limit the number of requested GPUs in SLURM. Is there a way to prevent deepconsensus from using all the available resources?

params.json and checkpoint sample files

Hi,

I'm interested in running deepconsensus on a couple of different HiFi projects. I'm able to set up and run the sample datasets on CPU and GPU. The problem arises when I try to run my own datasets, in that the guidelines for deepconsensus do not describe how to create the params.json or checkpoint files which are needed for running deepconsensus run. I might be missing something though. Please let me know if I can provide any additional information for troubleshooting my issue.

run issue

I have some issues running deepconsensus on our CentOS 7 Linux cluster. Any advice would be highly appreciated.

python3.6 -m deepconsensus.scripts.run_deepconsensus --input_subreads_aligned=aligned.pbmm2_to_ccs.bam --input_subreads_unaligned=m64046_210915_224507.subreads.bam --input_ccs_fasta=m64046_210915_224507.ccs.fasta --output_directory=out --checkpoint=${CHECKPOINT_PATH}
Traceback (most recent call last):
  File "/projects/dazzler/pippel/prog/pkgs/deepconsensus/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/projects/dazzler/pippel/prog/pkgs/deepconsensus/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/projects/dazzler/pippel/prog/pkgs/deepconsensus/lib/python3.6/site-packages/deepconsensus/__init__.py", line 35, in <module>
    from nucleus.io import bed
  File "/projects/dazzler/pippel/prog/pkgs/deepconsensus/lib/python3.6/site-packages/nucleus/io/bed.py", line 62, in <module>
    from nucleus.io import genomics_reader
  File "/projects/dazzler/pippel/prog/pkgs/deepconsensus/lib/python3.6/site-packages/nucleus/io/genomics_reader.py", line 68, in <module>
    from nucleus.io.python import tfrecord_reader
ImportError: /lib64/libm.so.6: version 'GLIBC_2.23' not found (required by /projects/dazzler/pippel/prog/pkgs/deepconsensus/lib/python3.6/site-packages/nucleus/io/python/tfrecord_reader.so)

I also got the following two errors during the installation:

  ERROR: Command errored out with exit status 1:
   command: /projects/dazzler/pippel/prog/pkgs/deepconsensus/bin/python3.6 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-pwwbu26e/google-nucleus_2aee0a6f3e0749a692932a2d8a48e902/setup.py'"'"'; __file__='"'"'/tmp/pip-install-pwwbu26e/google-nucleus_2aee0a6f3e0749a692932a2d8a48e902/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open
)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-s20xid5d
       cwd: /tmp/pip-install-pwwbu26e/google-nucleus_2aee0a6f3e0749a692932a2d8a48e902/
  Complete output (3 lines):
  /projects/dazzler/pippel/prog/pkgs/deepconsensus/lib/python3.6/site-packages/setuptools/version.py:1: UserWarning: Module google was already imported from None, but /tmp/pip-install-pwwbu26e/google-nucleus_2aee0a6f3e0749a692932a2d8a48e902 is being added to sys.path
    import pkg_resources
  This package does not support wheel creation.
  ----------------------------------------
  ERROR: Failed building wheel for google-nucleus
  Running setup.py clean for google-nucleus
  Building wheel for avro-python3 (setup.py) ... done
  Created wheel for avro-python3: filename=avro_python3-1.9.2.1-py3-none-any.whl size=43512 sha256=ca414c03c5335f98f1cff074ad5f8399c4ad5781fa3d6aac92b2107fd5e138cb
  Stored in directory: /lustre/projects/dazzler/pippel/prog/pkgs/deepconsensus/cache/pip/wheels/4e/08/0c/727bff8f20fedbdeb8a2c5214e460b214d41c10dc879cf6dac
  Building wheel for crcmod (setup.py) ... done
  Created wheel for crcmod: filename=crcmod-1.7-cp36-cp36m-linux_x86_64.whl size=36011 sha256=8bf271e9143d8486dfc10aad8fd01ff27c61328616118bf24fb4dd872f3016dd
  Stored in directory: /lustre/projects/dazzler/pippel/prog/pkgs/deepconsensus/cache/pip/wheels/ac/bb/07/adfb4ffd0aaace2022ea25c082a7cdc688b10d30e86d6d2fde
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.12-py3-none-any.whl size=73051 sha256=aab46bf6b1503207873dc81c1e09f5e1febf195e534636f4d833c2be1004384f
  Stored in directory: /lustre/projects/dazzler/pippel/prog/pkgs/deepconsensus/cache/pip/wheels/77/47/e4/44a4ba1b7dfd53faaa35f59f1175e123b213ff401a8a56876b
  Building wheel for oauth2client (setup.py) ... done
  Created wheel for oauth2client: filename=oauth2client-3.0.0-py3-none-any.whl size=106375 sha256=006538ef14effae42f9742b82416925bd713eaa35098036891603963ae88a86e
  Stored in directory: /lustre/projects/dazzler/pippel/prog/pkgs/deepconsensus/cache/pip/wheels/85/84/41/0db9b5f02fab88d266e64a52c5a468a3a70f6d331e75ec0e49
  Building wheel for py-cpuinfo (setup.py) ... done
  Created wheel for py-cpuinfo: filename=py_cpuinfo-8.0.0-py3-none-any.whl size=22258 sha256=c2de11c9b239c87ac3f1fd52183e4ec7a541cc0d84108408ce2723d2de89c6b2
  Stored in directory: /lustre/projects/dazzler/pippel/prog/pkgs/deepconsensus/cache/pip/wheels/3e/e1/d9/9b782b170e5272d6500cee4d29dd6c724598b22dc399d81d01
Successfully built avro-python3 crcmod kaggle oauth2client py-cpuinfo
Failed to build google-nucleus
Installing collected packages: tensorflow-model-optimization, tensorflow-hub, tensorflow-datasets, tensorflow-addons, tensorflow, sentencepiece, scipy, pyyaml, pymongo, pydot, pyarrow, py-cpuinfo, psutil, pandas, opencv-python-headless, oauth2client, mock, matplotlib, kaggle, ipython, hdfs, google-cloud-bigquery, google-api-python-client, gin-config, fastavro, Cython, crcmo
d, contextlib2, avro-python3, tf-models-official, ml-collections, google-nucleus, apache-beam, deepconsensus
    Running setup.py install for google-nucleus ... done
  DEPRECATION: google-nucleus was installed using the legacy 'setup.py install' method, because a wheel could not be built for it. A possible replacement is to fix the wheel build issue reported above. You can find discussion regarding this at https://github.com/pypa/pip/issues/8368.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
crossmap 0.4.2 requires bx-python, which is not installed.
crossmap 0.4.2 requires pyBigWig, which is not installed.

Thanks,
Martin

Error: model/params.json not found

Hi,
I'm running the below and getting this error:

singularity run deepconsensus_0.3.1.sif deepconsensus run ... [per the quick start guidelines]
...
tensorflow.python.framework.errors_impl.NotFoundError: model/params.json; No such file or directory

actc and kinetic data orientation

Hi, actc is used to align subreads to ccs, but does actc reverse the orientation of the kinetic (ip/pw) data in the BAM file it outputs? Or does DeepConsensus do this? I'm assuming either actc or DeepConsensus would have to do this to ensure that the kinetic data is in the correct orientation with the subreads-to-ccs alignment.

Thanks.

question about the permutation invariance in the MSA input to the DeepConsensus model

Hello,

Thank you for this interesting work. I have a question regarding the input to the model. From what I can tell, you convert subread sequences from a multiple sequence alignment into input tensors (adding additional information as well). What I cannot understand is whether your model is invariant to permutations of the order of subread sequences in the MSA. When you trained the model, how did you address this issue? Did you use some sort of default order of the subread sequences in the MSA?

A tutorial for running deepconsensus

My computer hardware looks like this:

OS: Ubuntu 20.04.3 LTS (x86_64)
Python version: Python 3.8.10
CPUs: i7 10700k(8c16t, SkyLake)
Memory: 32G
GPU: 1 NVIDIA RTX A4000 8G

Install the requirement packages

Create an environment for deepconsensus using conda

mamba create -n deepconsensus -c bioconda -c conda-forge python=3.8 pbcore pbbam pbccs pbmm2 parallel jq gcc pycocotools bioconda::seqtk bioconda::unimap bioconda::bedtools bioconda::minimap2 bioconda::extracthifi bioconda::zmwfilter bioconda::pysam bioconda::samtools=1.10 bioconda::pyfastx=0.8.4

Download actc for read mapping

wget https://github.com/PacificBiosciences/align-clr-to-ccs/releases/download/0.1.0/actc 
chmod u+x actc
mv actc PATH/miniconda3/envs/deepconsensus/bin

Install DeepConsensus[GPU] using pip

conda activate deepconsensus
pip install deepconsensus[gpu]==0.2.0

Prepare all the needed input files for DeepConsensus

Get the ccs.bam

ccs --all -j 15 raw.subreads.bam out.ccs.bam

Get the subreads_to_ccs.bam

Tips

If you use actc to map the subreads to ccs without chunking, you may encounter this error when running deepconsensus.

I0324 19:48:00.776319 140117319313216 quick_inference.py:492] Processed a batch of 100 ZMWs in 62.39794731140137 seconds
I0324 19:48:00.808807 140117319313216 quick_inference.py:570] Processed 7000 ZMWs in 4584.726703 seconds
Process ForkPoolWorker-1061:
Traceback (most recent call last):
  File "/home/wanglab/miniconda3/envs/deepconsensus/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/home/wanglab/miniconda3/envs/deepconsensus/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/home/wanglab/miniconda3/envs/deepconsensus/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/wanglab/miniconda3/envs/deepconsensus/lib/python3.8/multiprocessing/connection.py", line 405, in _send_bytes
    self._send(buf)
  File "/home/wanglab/miniconda3/envs/deepconsensus/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

This error seems to be caused by the number of stream processors reaching an upper limit as the iterations accumulate. To avoid this, chunk the data when using actc.

Chunking your subreads.bam

### Generating all command lines using shell
for i in {1..1000}; do echo 'actc -j 1 raw.subreads.bam out.ccs.bam subreads_to_ccs.'${i}'.bam --chunk '${i}'/1000' ; done > actc_chunk.job

### Submitting all scripts in parallel using parallel
parallel -j 15 < actc_chunk.job

### Index all the subreads_to_ccs.${i}.fasta
for i in {1..1000}; do echo 'samtools faidx subreads_to_ccs.'${i}'.fasta' ; done > samtools_index.job

parallel -j 15 < samtools_index.job

Get the model for Deepconsensus

mkdir deepconsensus_model && cd deepconsensus_model
wget https://storage.googleapis.com/brain-genomics-public/research/deepconsensus/models/v0.2/params.json
wget https://storage.googleapis.com/brain-genomics-public/research/deepconsensus/models/v0.2/checkpoint-50.index
wget https://storage.googleapis.com/brain-genomics-public/research/deepconsensus/models/v0.2/checkpoint-50.data-00000-of-00001

Run DeepConsensus

for i in {1..1000};
do
deepconsensus run \
  --subreads_to_ccs=subreads_to_ccs.${i}.bam  \
  --ccs_fasta=subreads_to_ccs.${i}.fasta \
  --checkpoint=deepconsensus_model/checkpoint-50 \
  --output=output.${i}.fastq \
  --batch_zmws=100
done

Merge the output

cat output.*.fastq > total.fastq

Error (ccs software) - No space left on device (tmp file).

Dear @pichuan,

Using the Docker system, I installed and ran the tests successfully with the second version of the software on our cluster.
Now, during tests with real data, we found some issues in the ccs software.
Usually, we run the software on the nodes, and the output is written to the front-end.
The front-end has more than 1 PB of space, while the nodes only have about 60 GB.
The tmp files seem to be saved on the node, right? Is it possible to relocate these temp files to another path?

Below, you can consult the error.

| 20220123 11:27:44.871 | FATAL | Could not write BAM record to /tmp/13552.1.all.q/thread.7_0.ccs.bam
| 20220123 11:27:44.982 | FATAL | Caught existing deep IO exception, ignoring thread 13
| 20220123 11:27:44.985 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.985 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.986 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.987 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.987 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.987 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.987 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.987 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.987 | FATAL | Previous exception in Stage DraftPolish. Pumping buffers empty!
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Exception thrown in CCSWF
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.988 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.989 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:44.990 | FATAL | Previous exception in DraftStage, aborting thread 13
| 20220123 11:27:45.283 | FATAL | Previous exception in DraftStage, aborting thread 9
| 20220123 11:27:45.285 | FATAL | Previous exception in DraftStage, aborting thread 9
| 20220123 11:27:45.405 | FATAL | Previous exception in DraftStage, aborting thread 5
| 20220123 11:27:45.406 | FATAL | Previous exception in DraftStage, aborting thread 5
| 20220123 11:27:45.669 | FATAL | Previous exception in DraftStage, aborting thread 0
| 20220123 11:27:45.672 | FATAL | Previous exception in DraftStage, aborting thread 0
| 20220123 11:27:45.684 | FATAL | Previous exception in DraftStage, aborting thread 14
| 20220123 11:27:45.686 | FATAL | Previous exception in DraftStage, aborting thread 14
| 20220123 11:27:45.864 | FATAL | Previous exception in DraftStage, aborting thread 4
| 20220123 11:27:45.868 | FATAL | Previous exception in DraftStage, aborting thread 4
| 20220123 11:27:45.955 | FATAL | Previous exception in DraftStage, aborting thread 28
| 20220123 11:27:45.957 | FATAL | Previous exception in DraftStage, aborting thread 28
| 20220123 11:27:46.128 | FATAL | Previous exception in DraftStage, aborting thread 29
| 20220123 11:27:46.130 | FATAL | Previous exception in DraftStage, aborting thread 29
| 20220123 11:27:46.157 | FATAL | Previous exception in DraftStage, aborting thread 3
| 20220123 11:27:46.159 | FATAL | Previous exception in DraftStage, aborting thread 3
| 20220123 11:27:46.223 | FATAL | Previous exception in DraftStage, aborting thread 6
| 20220123 11:27:46.224 | FATAL | Previous exception in DraftStage, aborting thread 6
| 20220123 11:27:46.293 | FATAL | Previous exception in DraftStage, aborting thread 24
| 20220123 11:27:46.296 | FATAL | Previous exception in DraftStage, aborting thread 24
| 20220123 11:27:46.549 | FATAL | Previous exception in DraftStage, aborting thread 16
| 20220123 11:27:46.551 | FATAL | Previous exception in DraftStage, aborting thread 16
| 20220123 11:27:46.867 | FATAL | Previous exception in DraftStage, aborting thread 7
| 20220123 11:27:46.868 | FATAL | Previous exception in DraftStage, aborting thread 7
| 20220123 11:27:46.894 | FATAL | Previous exception in DraftStage, aborting thread 15
| 20220123 11:27:46.897 | FATAL | Previous exception in DraftStage, aborting thread 15
| 20220123 11:27:46.959 | FATAL | Previous exception in DraftStage, aborting thread 20
| 20220123 11:27:46.963 | FATAL | Previous exception in DraftStage, aborting thread 20
| 20220123 11:27:47.092 | FATAL | Previous exception in DraftStage, aborting thread 8
| 20220123 11:27:47.095 | FATAL | Previous exception in DraftStage, aborting thread 8
| 20220123 11:27:47.176 | FATAL | Previous exception in DraftStage, aborting thread 27
| 20220123 11:27:47.177 | FATAL | Previous exception in DraftStage, aborting thread 27
| 20220123 11:27:47.538 | FATAL | Previous exception in DraftStage, aborting thread 23
| 20220123 11:27:47.542 | FATAL | Previous exception in DraftStage, aborting thread 23
| 20220123 11:27:47.653 | FATAL | Previous exception in DraftStage, aborting thread 21
| 20220123 11:27:47.655 | FATAL | Previous exception in DraftStage, aborting thread 19
| 20220123 11:27:47.657 | FATAL | Previous exception in DraftStage, aborting thread 19
| 20220123 11:27:47.658 | FATAL | Previous exception in DraftStage, aborting thread 21
| 20220123 11:27:47.731 | FATAL | Previous exception in DraftStage, aborting thread 10
| 20220123 11:27:47.733 | FATAL | Previous exception in DraftStage, aborting thread 10
| 20220123 11:27:47.781 | FATAL | Previous exception in DraftStage, aborting thread 1
| 20220123 11:27:47.783 | FATAL | Previous exception in DraftStage, aborting thread 1
| 20220123 11:27:47.789 | FATAL | Previous exception in DraftStage, aborting thread 17
| 20220123 11:27:47.791 | FATAL | Previous exception in DraftStage, aborting thread 17
| 20220123 11:27:47.915 | FATAL | Previous exception in DraftStage, aborting thread 18
| 20220123 11:27:47.916 | FATAL | Previous exception in DraftStage, aborting thread 18
| 20220123 11:27:47.933 | FATAL | Previous exception in DraftStage, aborting thread 22
| 20220123 11:27:47.934 | FATAL | Previous exception in DraftStage, aborting thread 22
| 20220123 11:27:48.426 | FATAL | Previous exception in DraftStage, aborting thread 12
| 20220123 11:27:48.427 | FATAL | Previous exception in DraftStage, aborting thread 12
| 20220123 11:27:48.475 | FATAL | Previous exception in DraftStage, aborting thread 26
| 20220123 11:27:48.478 | FATAL | Previous exception in DraftStage, aborting thread 26
| 20220123 11:27:50.215 | FATAL | Previous exception in DraftStage, aborting thread 25
| 20220123 11:27:50.218 | FATAL | Previous exception in DraftStage, aborting thread 25
| 20220123 11:27:50.302 | FATAL | Previous exception in DraftStage, aborting thread 11
| 20220123 11:27:50.305 | FATAL | Previous exception in DraftStage, aborting thread 11
| 20220123 11:27:51.195 | FATAL | Previous exception in DraftStage, aborting thread 2
| 20220123 11:27:51.199 | FATAL | Previous exception in DraftStage, aborting thread 2
| 20220123 11:27:52.068 | FATAL | ccs ERROR: [pbbam] BAM writer ERROR: could not write record:
file: /tmp/13552.1.all.q/thread.7_0.ccs.bam.tmp
reason: No space left on device

Best Regard

André

deepconsensus-0.2.0 uses a lot of system RAM, so maybe best to split jobs into pieces, example provided

Maybe someone finds this useful for breaking up deepconsensus jobs (pretty much following the quick start guide), except sorting the BAM file with samtools, then indexing with samtools, followed by breaking the job input files into 48 parts. If one has access to a cluster, this is very helpful for submitting deepconsensus on one GPU and one CPU core to keep system RAM usage low. I had found that deepconsensus-0.2.0 would quickly use a lot of system RAM, so for me the steps below were necessary. I also have access to two compute nodes with 8 Nvidia A10 cards each.

# let's say you start with a BAM file called 9.bam that contains the subreads to be converted to ccs/HiFi reads
MIMALLOC_PAGE_RESET=0 MIMALLOC_LARGE_OS_PAGES=1 ccs --all -j 34 9.bam 9.ccs.bam

# use pbindex to index the BAM file if needed (https://github.com/PacificBiosciences/align-clr-to-ccs/issues/2)
pbindex 9.bam

# use actc 0.1.0 with 34 cores to align the subreads to the ccs reads
~/bin/actc-0.1.0 -j 34 9.bam 9.ccs.bam 9.subreads_to_ccs.bam

# sort the alignments
samtools sort -@34 9.subreads_to_ccs.bam > 9.subreads_to_ccs_sorted.bam

# index the alignments
samtools index -@34 9.subreads_to_ccs_sorted.bam

# make a FAI index
samtools faidx 9.subreads_to_ccs.fasta

# get the sorted header
samtools view -H 9.subreads_to_ccs_sorted.bam|grep "^@SQ"|cut -f 2-3 |perl -pe "s/SN://g" |perl -pe "s/LN://g" \
> regions

# break into 48 equal parts
fspec=regions
num_files=48

# Work out lines per file.
total_lines=$(wc -l <${fspec})
((lines_per_file = (total_lines + num_files - 1) / num_files))

# Split the actual file, maintaining lines.
split --lines=${lines_per_file} --numeric-suffixes=1 ${fspec}


# get a BED file of the ccs regions to extract from each of the 48 parts
# also get the read names
# use the read names to get just the FASTA for the 48 ccs parts

for i in `ls x??`
do
    awk '{print $1,0,$2}' OFS='\t' ${i} > ${i}.bed
    cut -f 1 ${i} > ${i}-reads
    seqtk seq -l0 9.subreads_to_ccs.fasta |paste - - |grep -f "${i}-reads" |tr '\t' '\n' > ${i}.fasta
    samtools faidx ${i}.fasta
done

# break the subreads aligned to ccs reads into 48 parts

for i in `ls x??`
do
    samtools view -@20 -h 9.subreads_to_ccs_sorted.bam --regions=${i}.bed |samtools view -@20 -b > ${i}.bam
    samtools index -@20 ${i}.bam
done

This is a SLURM script to submit to a cluster that has Nvidia A10 GPUs; the same could be done if you have other GPUs.

dc.sh

#!/bin/bash
#
#----------------------------------------------------------------
# running a multiple independent jobs
#----------------------------------------------------------------
#


#  Defining options for slurm how to run
#----------------------------------------------------------------
#
#SBATCH --job-name=dc
#
#Number of CPU cores to use within one node
#SBATCH -c 1
#
#Define the number of hours the job should run.
#Maximum runtime is limited to 10 days, ie. 240 hours
#SBATCH --time=0:30:00
#
#Define the amount of RAM used by your job in GigaBytes
#In shared memory applications this is shared among multiple CPUs
#SBATCH --mem=24G
#
#Do not requeue the job in the case it fails.
#SBATCH --no-requeue
#
# Define number of GPUs
#SBATCH --partition gpu
#SBATCH --gres=gpu:1
#SBATCH --constraint=A10
#
#Do not export the local environment to the compute nodes
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV

# load the respective software module(s) you intend to use
#----------------------------------------------------------------
module load python/3.7
module load cuda/11.2.2
module load cudnn/8.1
. "/nfs/scistore16/itgrp/jelbers/miniconda3/etc/profile.d/conda.sh"

# slurm ids are 1,2,3,4,5,6,7,8,9 whilst 
# object names are 01,02,03,04,05,06,07,08,09
# so modify with this if statement
if [[ ${SLURM_ARRAY_TASK_ID} -lt 10 ]]
then
    i=`echo "${SLURM_ARRAY_TASK_ID}"|perl -pe "s/(\d+)/0\1/g"`
else
    i=${SLURM_ARRAY_TASK_ID}
fi

export PATH="/nfs/scistore12/itgrp/jelbers/.local/bin:$PATH"

cd /nfs/scistore16/itgrp/jelbers/deepconsensus_quick_start/9.all

hostname
echo "--subreads_to_ccs=x${i}.bam"
echo "--ccs_fasta=x${i}.fasta"
echo "--output=9.deepconsensus.${i}.fastq"
deepconsensus run --cpus 1 --subreads_to_ccs=x${i}.bam \
--ccs_fasta=x${i}.fasta \
--checkpoint=/nfs/scistore16/itgrp/jelbers/deepconsensus_quick_start/model/checkpoint-50 \
--output=9.deepconsensus.${i}.fastq

Submit to the SLURM scheduler:

sbatch --array=1-48 dc.sh

Alignment loss function

Thanks for the very interesting work.

I was wondering about the alignment loss used to train the model. It is clear that indels can shift the whole predicted sequence, and then a loss like cross-entropy explodes from small mistakes. I thought a CTC loss would work in this scenario, but you developed a new alignment loss for this task. I was wondering if you could elaborate on why this alignment loss is needed, or why CTC is not viable here.

Quick start example code question

In the quick start guide, you've got this code:

# Only subread alignments to the correct molecule were retained.
# We used samtools and awk to filter incorrect alignments using the below
# command:
samtools view -h "aligned.subreads.bam" | \
  awk '{ if($1 ~ /^@/) { print; } else { split($1,A,"/"); \
  split($3,B,"/"); if(A[2]==B[2]) { split(A[3],C,"_"); \
  print $0 "\tqs:i:" C[1]; } } }' | samtools view -b > "subreads_to_ccs.bam"

It seems to be checking that subreads are aligned against the corresponding ccs read and that each has a qs:i: tag that matches the start position from the name. However, at least for my version of pbmm2 (1.5.0), all that already seems to be true, meaning this code doesn't seem to do anything. And it's really slow, too - for my test case, with maybe 32 GB of ccs sequence, pbmm2 ran in ~20h on a 64-core machine. But this will take maybe 3x as long, and it's not like I can throw cores at it to speed it up (when I tried adding -@ # to samtools, it'd just core out).

Is this code actually necessary for anything, or can I just remove this step?

(I'll note it also seems to not be doing anything for your test data, although I didn't thoroughly check that.)

Merging aligned fastq output with original ccs BAM

Is there a recommended approach / best practices for merging aligned deepconsensus fastq output with the original ccs BAM file, so that aligned deepconsensus reads are re-associated with the tags from the original ccs BAM (eg, ec, np, sn, zm, etc tags)?

(This issue could also be fixed by making deepconsensus output a BAM instead of a FASTQ.)

Error: ModuleNotFoundError: No module named 'pandas._libs.interval'

Hi,
Getting this error with the new deepconsensus 1.0.0. I'm running the exact same command that worked for the previous deepconsensus version. I'm running from the Docker image.

2022-10-11 16:17:35.711032: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/pandas/__init__.py", line 30, in <module>
    from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
ModuleNotFoundError: No module named 'pandas._libs.interval'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/bio/bin/deepconsensus", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/cli.py", line 111, in run
    app.run(main, flags_parser=parse_flags)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/cli.py", line 99, in main
    from deepconsensus.inference import quick_inference
  File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 53, in <module>
    import pandas as pd
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/pandas/__init__.py", line 34, in <module>
    raise ImportError(
ImportError: C extension: No module named 'pandas._libs.interval' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

Advice for dealing with barcode multiplexing

Hi,

I'd like to use deepconsensus on a set of HiFi datasets that were generated on a Sequel II but with barcoded multiplexing. My understanding is that without running the CCS step first, detecting barcodes can be difficult. Do you have a recommended pipeline for going from a barcoded subread BAM file to the subset of necessary inputs for CCS/actc/deepconsensus? Thank you for your time and advice.

no subreads available

Hi,

I am working with sequencing output coming from a Sequel IIe, meaning that I only have the ccs.bam files available and not the subreads.bam.
Can DeepConsensus work without the alignment of subreads to ccs using actc in the preliminary step?

Thanks

pbmm2 using too much memory

My machine goes down when I run the pbmm2 alignment. The machine has 500 GB of memory and the HiFi data is 130 GB. How can I solve this?
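Not an official answer, but the sort step of pbmm2 is usually what drives peak memory, and its thread and sort-memory options can be capped; a hedged sketch with placeholder file names (check pbmm2 align --help for the exact option names in your version):

# Cap alignment threads (-j), sort threads (-J), and per-sort-thread memory (values are illustrative).
pbmm2 align --preset CCS --sort -j 16 -J 4 --sort-memory 4G \
  reference.fasta hifi.ccs.bam hifi.aligned.bam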

How to improve run time?

I tried to solve my issue #10 by creating a Singularity container and it seems to work on our compute cluster. However, it is very slow. Do you have any advice on how to speed up the deepconsensus step?

I created a toy data set, with the following specs:

2.0G m54345U_211128_022942.chunk0.subreads.bam
60M  m54345U_211128_022942.chunk0.ccs.fasta
1.8G subreads_to_ccs.bam

I started deepconsensus (on a 24-core machine with 250 GB RAM) with the default args:

$SING_CMD python3 -m deepconsensus.scripts.run_deepconsensus --input_subreads_aligned=subreads_to_ccs.bam --input_subreads_unaligned=split/m54345U_211128_022942.chunk0.subreads.bam --input_ccs_fasta=ccs/m54345U_211128_022942.chunk0.ccs.fasta --output_directory=deepconsensus --checkpoint=${CHECKPOINT_PATH}

After almost 7 hours of runtime it is still in step 2 (2_generate_input).
It is also using only a single thread. This is a snapshot of htop:

74354 pippel     20   0 70.6G 64.8G  100M R 100. 25.8  6h26:45 python3 -m deepconsensus.preprocess.generate_input --merged_datasets_path=deepconsensus/1_merge_datasets --output_path=deepconsensus/2_generate_input --input_ccs_fasta=ccs/m54345U_211128_022942.chunk0.ccs.fas
74160 pippel     20   0 70.6G 64.8G  100M S 100. 25.8  6h37:12 python3 -m deepconsensus.preprocess.generate_input --merged_datasets_path=deepconsensus/1_merge_datasets --output_path=deepconsensus/2_generate_input --input_ccs_fasta=ccs/m54345U_211128_022942.chunk0.ccs.fas

Additionally, I do get the following tensorflow error that might be related to my problem:

 $SING_CMD python3 -m deepconsensus.preprocess.generate_input --helpfull
2021-12-03 09:26:35.456162: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /.singularity.d/libs
2021-12-03 09:26:35.456208: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

Is deepconsensus using a GPU? Any advice is highly appreciated.

Thanks,
Martin

Use of samtools and awk to filter incorrect alignments.

I was trying to use the pipeline on one of my samples, but I am not able to understand the filtering process.

It is not present in the quick start guide, but it is in the paper.
Do I need to use this filtering step with the latest version of DeepConsensus, or has it been incorporated into the new release?

Command from Paper
samtools view -h "aligned.subreads.bam" |
awk '{ if($1 ~ /^@/) { print; } else { split($1,A,"/");
split($3,B,"/"); if(A[2]==B[2]) { split(A[3],C,"_");
print $0 "\tqs:i:" C[1]; } } }'

Also, in this command the input is aligned.subreads.bam, but no output file is mentioned for this step.

- Will it produce an output? (I assume it should, since awk is piped after the samtools view command.)
- If output is produced in this step, in which format will it be? Currently it prints to the terminal, and I see that the next step (the DeepConsensus run) takes a .bam file for subreads.aligned.bam, whereas awk will produce a file that is not a .bam (if I am not wrong).
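On the output question: the awk step just writes SAM text to stdout, so a common pattern is to pipe it straight back into samtools to produce a BAM for the next step; a minimal sketch, with filtered.subreads.bam as a placeholder name:

samtools view -h "aligned.subreads.bam" \
  | awk '{ if($1 ~ /^@/) { print; } else { split($1,A,"/");
    split($3,B,"/"); if(A[2]==B[2]) { split(A[3],C,"_");
    print $0 "\tqs:i:" C[1]; } } }' \
  | samtools view -b -o filtered.subreads.bam -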

Chunking with deepconsensus

Hi,
Is there a way to do chunking within deepconsensus? For datasets where ccs was previously run in chunks that were then merged, this would help, because deepconsensus could then do its own chunking rather than requiring the ccs chunking to be redone on the original subreads.

deepconsensus.scripts.run_deepconsensus errors

I got the errors shown below when I ran python3 -m deepconsensus.scripts.run_deepconsensus --input_subreads_aligned=P05TYD21354128-1_r64114_20210126_052631_1_A02.subreads_to_ccs.bam --input_subreads_unaligned=DRC_chr3_20Mb_rmTail.P05TYD21354128-1_r64114_20210126_052631_1_A02.bam --input_ccs_fasta=P05TYD21354128-1_r64114_20210126_052631_1_A02.ccs.fasta --output_directory=1.tmp --checkpoint=01.DeepConsusModel
(error screenshot attached as an image)
I don't know how to solve it. Could you help me? Thanks!

pbccs parameters

Hello,

In the online methods, you indicate that you didn't apply any filtering based on read quality for the CCS reads, and that reads of all qualities were included in the training dataset from the pbccs output. Can you share the exact command line you used to generate the ccs file?

Regards,
Alex
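Not the authors' exact command, but for orientation: the usual adjustment described for DeepConsensus is to relax pbccs's default Q20 cutoff so that lower-quality CCS reads are kept; a hedged illustration with placeholder file names and flag values (the quick start has the recommended settings):

ccs subreads.bam ccs.bam --min-rq=0.88 -j 16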

Error detecting params.json using docker in debian (10) HPC

docker run google/deepconsensus:1.1.0 deepconsensus run --subreads_to_ccs=m54274Ue_220814_163631.aligned.subreads.bam --ccs_bam=m54274Ue_220814_163631.hifi_S3_reads.bam --checkpoint=model/checkpoint --output=m54274Ue_220814_163631_deepcon.output.fastq
Traceback (most recent call last):
File "/opt/conda/envs/bio/bin/deepconsensus", line 8, in
sys.exit(run())
File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/cli.py", line 111, in run
app.run(main, flags_parser=parse_flags)
File "/opt/conda/envs/bio/lib/python3.9/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/opt/conda/envs/bio/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/cli.py", line 102, in main
app.run(quick_inference.main, argv=passed)
File "/opt/conda/envs/bio/lib/python3.9/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/opt/conda/envs/bio/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 842, in main
outcome_counter = run()
File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 703, in run
params = model_utils.read_params_from_json(checkpoint_path=FLAGS.checkpoint)
File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/models/model_utils.py", line 405, in read_params_from_json
json.load(tf.io.gfile.GFile(json_path, 'r')))
File "/opt/conda/envs/bio/lib/python3.9/json/init.py", line 293, in load
return loads(fp.read(),
File "/opt/conda/envs/bio/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
self._preread_check()
File "/opt/conda/envs/bio/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.NotFoundError: model/params.json; No such file or directory

I am getting this error even though I have all of the model files:
checkpoint.data-00000-of-00001
checkpoint.index
params.json
in the model dir.
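One likely cause, offered as a guess: the docker run command above does not mount the working directory, so model/params.json and the BAM files exist on the host but are not visible inside the container. A minimal sketch of bind-mounting the current directory and running from it:

docker run -v "$PWD":/data -w /data google/deepconsensus:1.1.0 \
  deepconsensus run \
  --subreads_to_ccs=m54274Ue_220814_163631.aligned.subreads.bam \
  --ccs_bam=m54274Ue_220814_163631.hifi_S3_reads.bam \
  --checkpoint=model/checkpoint \
  --output=m54274Ue_220814_163631_deepcon.output.fastq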

Can deepconsensus run on an ARM machine?

Hello, deepconsensus team!
Thanks for doing an amazing job for HiFi sequencing. The tool works well on x86 CPU or GPU machines.
However, when I tried to install it on an ARM HPC, installing the base requirements packages failed.
Do you have plans to port deepconsensus to ARM machines?

Two alignments from the same subread to the CCS read

Hello,

I was exploring the test dataset and the alignment before applying DeepConsensus to my own dataset. I noticed that the subread m64014_181209_091052/3146438/0_11098 was aligned in both the forward and reverse directions to m64014_181209_091052/3146438/ccs.

I would assume that both alignments will be used for consensus calling by deepconsensus, which I believe should not be the case. Or does deepconsensus have an internal filter to account for double counting of subreads for the target CCS read?

Regards,
Sangjin
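Not an answer on the internal filtering, but a quick way to see how common such double alignments are in a subreads_to_ccs.bam, and whether the extra alignment records are flagged as secondary or supplementary:

# List subread names with more than one alignment record.
samtools view subreads_to_ccs.bam | cut -f1 | sort | uniq -c | awk '$1 > 1' | head
# Same count restricted to primary alignments (secondary/supplementary records excluded).
samtools view -F 0x900 subreads_to_ccs.bam | cut -f1 | sort | uniq -c | awk '$1 > 1' | head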

BrokenPipeError: [Errno 32] Broken pipe

$time deepconsensus run --subreads_to_ccs=aligned.subreads_to_ccs.bam --ccs_fasta=HiFiCCS.fasta --checkpoint=/home/models/checkpoint-50 --output=./dc/out.fastq --batch_zmws=100 --cpus=18 >& report.log

2022-01-21 12:09:55.926395: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-21 12:09:55.939376: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
I0121 12:09:56.290417 139850387081024 quick_inference.py:358] Finished initialize_model.
I0121 12:09:56.290970 139850387081024 quick_inference.py:530] Model setup took 0.43245720863342285 seconds.
I0121 12:10:06.089605 139850387081024 quick_inference.py:436] Using multiprocessing: cpus is 18.
WARNING:tensorflow:From /tools/deep/lib/python3.8/site-packages/official/nlp/transformer/attention_layer.py:54: DenseEinsum.init (from official.nlp.modeling.layers.dense_einsum) is deprecated and will be removed in a future version.
Instructions for updating:
DenseEinsum is deprecated. Please use tf.keras.experimental.EinsumDense layer instead.
W0121 12:10:14.684705 139850387081024 deprecation.py:341] From /deep/lib/python3.8/site-packages/official/nlp/transformer/attention_layer.py:54: DenseEinsum.init (from official.nlp.modeling.layers.dense_einsum) is deprecated and will be removed in a future version.
Instructions for updating:
DenseEinsum is deprecated. Please use tf.keras.experimental.EinsumDense layer instead.
/deep/lib/python3.8/site-packages/deepconsensus/inference/quick_inference.py:242: RuntimeWarning: divide by zero encountered in log10
quality_scores = -10 * np.log10(error_prob)
I0121 12:12:50.063338 139850387081024 quick_inference.py:492] Processed a batch of 100 ZMWs in 163.97370052337646 seconds
I0121 12:12:50.078274 139850387081024 quick_inference.py:570] Processed 100 ZMWs in 173.787042 seconds
I0121 12:15:39.261736 139850387081024 quick_inference.py:492] Processed a batch of 100 ZMWs in 161.06492471694946 seconds
.
.
.
.
I0122 02:50:06.622056 139850387081024 quick_inference.py:492] Processed a batch of 100 ZMWs in 228.55985140800476 seconds
I0122 02:50:06.664712 139850387081024 quick_inference.py:570] Processed 26500 ZMWs in 52810.373480 seconds
Process ForkPoolWorker-4783:
Traceback (most recent call last):
  File "/anaconda3/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/anaconda3/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/anaconda3/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/anaconda3/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/anaconda3/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Strand of Fastq output

Hi,
DeepConsensus takes a CCS BAM file as one of its inputs.

When the CCS BAM file is already aligned to the genome, does deepconsensus reverse complement the sequence for reads that align to the reverse strand of the genome, such that the output FASTQ of that molecule will be in the original unaligned orientation?

From what I can see, I think this is not happening, which is OK, but I just wanted to confirm, because it means that inputting an unaligned CCS BAM versus an aligned CCS BAM will produce different FASTQ output: reads that aligned to the reverse strand will have the reverse orientation in the deepconsensus FASTQ output.

Docker or Singularity

Hi,

First of all thank you for this amazing program.
I'm working on a cluster running CentOS 7. I tried to install the software several times, always with errors in the pip/Python distributions. Could you make a Docker and/or Singularity image available for DeepConsensus?
Best Regards
Andre
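For what it's worth, Singularity can pull the published Docker images directly and convert them to a .sif; a minimal sketch, with the tag shown only as an example:

singularity pull docker://google/deepconsensus:1.2.0
singularity exec deepconsensus_1.2.0.sif deepconsensus --help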

Support FOFN original subread input, document CLI

Behavior I expected

Accept FOFN input for subreads (--input_subreads_unaligned=subreads.fofn).

Very frequently PacBio subread data is spread across directories and files according to flow cell and run architecture. This is the standard format in which PacBio reads are delivered to customers by service providers. The solution to this is the FOFN format (see for example PBMM2 documentation).

Behavior I observed

Command-line arguments are not documented, so the expected input of --input_subreads_unaligned is unclear; when I passed a FOFN I received an error that the FOFN file did not have a SAM header. The example workflow does show BAM input but does not otherwise describe the inputs.

Reprocessing, merging, and housekeeping related to transformations on very large BAM files is a notable overhead and makes deepconsensus less useful.

Background

I am working with a rather large dataset (several TB) that involves combining across multiple PB BAM files from different flow cells. Therefore I have used the commonly used FOFN (file of file names) format as input to the PBMM2 step. Accepting FOFN is standard for PB tools.

I got to the deepconsensus step itself, however, before observing that only BAM appears to be supported for the unaligned input subreads.

I am currently using a workaround of pbmerge from the PB toolkit to prepare a single unmapped BAM file from my subreads. This single BAM can then presumably be passed to deepconsensus.
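For anyone hitting the same limitation, that workaround looks roughly like the sketch below; file names are placeholders, and whether pbmerge also accepts a FOFN directly should be checked against its own help:

# Merge per-run subread BAMs into one unaligned BAM to pass to deepconsensus.
pbmerge -o merged.subreads.bam run1.subreads.bam run2.subreads.bam
# pbmerge may also take a FOFN listing the BAMs (unverified; check pbmerge --help):
# pbmerge -o merged.subreads.bam subreads.fofn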

What would help

I suggest some options for addressing this issue, at various levels of effort:

  • Best: Explicitly accept FOFN input for unaligned subreads, fully document CLI including help strings.
  • Medium: Document CLI including help strings, update the quickstart.md to reflect the requirement for a single subread input BAM, including a pbmerge step for multiple BAM files.
  • Minimal: Update the quickstart.md to reflect the requirement for a single subread input BAM, including a pbmerge step for multiple BAM files.

error in 1_merge_datasets step

Dear all,
I'm a PhD student working with a series of PacBio CCS reads, and I noticed the release of the DeepConsensus preprint. This tool can improve the accuracy of CCS reads generated by pbccs. I installed it on our lab server and ran into the following error when using it:

$ INPUTS="$(pwd)"
$ OUTPUTS="$(pwd)"
$ CHECKPOINT_PATH="/home/data/vip21/jgs/test_deepconsensus/models/checkpoint-50"
$ python3 -m deepconsensus.scripts.run_deepconsensus --input_subreads_aligned=${INPUTS}/subreads_to_ccs.bam --input_subreads_unaligned=${INPUTS}/subreads.bam --input_ccs_fasta=${INPUTS}/ccs.fasta --output_directory=${OUTPUTS} --checkpoint=${CHECKPOINT_PATH}

***** Running the command:*****
python3 -m deepconsensus.preprocess.merge_datasets   --input_bam=/home/data/vip21/jgs/test_deepconsensus/subreads_to_ccs.bam   --input_unaligned_bam=/home/data/vip21/jgs/test_deepconsensus/subreads.bam   --output_path=/home/data/vip21/jgs/test_deepconsensus/1_merge_datasets   --inference=true
*******************************

2021-09-04 09:01:31.543001: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-09-04 09:01:31.543076: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
I0904 09:01:43.412593 140136974808896 fn_api_runner_transforms.py:548] ==================== <function annotate_downstream_side_inputs at 0x7f71432847b8> ====================
I0904 09:01:43.413714 140136974808896 fn_api_runner_transforms.py:548] ==================== <function fix_side_input_pcoll_coders at 0x7f71432848c8> ====================
I0904 09:01:43.414398 140136974808896 fn_api_runner_transforms.py:548] ==================== <function lift_combiners at 0x7f7143284950> ====================
I0904 09:01:43.414686 140136974808896 fn_api_runner_transforms.py:548] ==================== <function expand_sdf at 0x7f71432849d8> ====================
I0904 09:01:43.416237 140136974808896 fn_api_runner_transforms.py:548] ==================== <function expand_gbk at 0x7f7143284a60> ====================
I0904 09:01:43.417087 140136974808896 fn_api_runner_transforms.py:548] ==================== <function sink_flattens at 0x7f7143284b70> ====================
I0904 09:01:43.417660 140136974808896 fn_api_runner_transforms.py:548] ==================== <function greedily_fuse at 0x7f7143284bf8> ====================
I0904 09:01:43.420369 140136974808896 fn_api_runner_transforms.py:548] ==================== <function read_to_impulse at 0x7f7143284c80> ====================
I0904 09:01:43.420574 140136974808896 fn_api_runner_transforms.py:548] ==================== <function impulse_to_input at 0x7f7143284d08> ====================
I0904 09:01:43.420840 140136974808896 fn_api_runner_transforms.py:548] ==================== <function inject_timer_pcollections at 0x7f7143284ea0> ====================
I0904 09:01:43.421363 140136974808896 fn_api_runner_transforms.py:548] ==================== <function sort_stages at 0x7f7143284f28> ====================
I0904 09:01:43.421562 140136974808896 fn_api_runner_transforms.py:548] ==================== <function window_pcollection_coders at 0x7f7142d02048> ====================
I0904 09:01:43.425726 140136974808896 statecache.py:154] Creating state cache with size 100
I0904 09:01:43.427177 140136974808896 fn_api_runner.py:2011] Created Worker handler <apache_beam.runners.portability.fn_api_runner.EmbeddedWorkerHandler object at 0x7f7142cd9cf8> for environment urn: "beam:env:embedded_python:v1"

I0904 09:01:43.427564 140136974808896 fn_api_runner.py:974] Running (((((ref_AppliedPTransform_write_merged_subreads/Write/WriteImpl/DoOnce/Impulse_50)+(ref_AppliedPTransform_write_merged_subreads/Write/WriteImpl/DoOnce/FlatMap(<lambda at core.py:2639>)_51))+(ref_AppliedPTransform_write_merged_subreads/Write/WriteImpl/DoOnce/Map(decode)_53))+(ref_AppliedPTransform_write_merged_subreads/Write/WriteImpl/InitializeWrite_54))+(ref_PCollection_PCollection_33/Write))+(ref_PCollection_PCollection_34/Write)
I0904 09:01:43.482085 140136974808896 fn_api_runner.py:974] Running (((ref_AppliedPTransform_read_unaligned_reads/Read/_SDFBoundedSourceWrapper/Impulse_23)+(read_unaligned_reads/Read/_SDFBoundedSourceWrapper/ParDo(SDFBoundedSourceDoFn)/PairWithRestriction))+(read_unaligned_reads/Read/_SDFBoundedSourceWrapper/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction))+(ref_PCollection_PCollection_13_split/Write)
I0904 09:01:43.515367 140136974808896 fn_api_runner.py:974] Running ((((ref_PCollection_PCollection_13_split/Read)+(read_unaligned_reads/Read/_SDFBoundedSourceWrapper/ParDo(SDFBoundedSourceDoFn)/Process))+(ref_AppliedPTransform_reshuffle_unaligned_reads/AddRandomKeys_26))+(ref_AppliedPTransform_reshuffle_unaligned_reads/ReshufflePerKey/Map(reify_timestamps)_28))+(reshuffle_unaligned_reads/ReshufflePerKey/GroupByKey/Write)
2021-09-04 09:01:43.533782: W nucleus/io/sam_reader.cc:115] Unknown tag pb: in header line, ignoring: @HD	VN:1.5	SO:unknown	pb:3.0.7
I0904 09:01:43.533916 140124420044544 genomics_reader.py:208] Reading /home/data/vip21/jgs/test_deepconsensus/subreads.bam with NativeSamReader
I0904 10:52:22.286793 140136974808896 fn_api_runner.py:974] Running ((((((reshuffle_unaligned_reads/ReshufflePerKey/GroupByKey/Read)+(ref_AppliedPTransform_reshuffle_unaligned_reads/ReshufflePerKey/FlatMap(restore_timestamps)_33))+(ref_AppliedPTransform_reshuffle_unaligned_reads/RemoveRandomKeys_34))+(ref_AppliedPTransform_get_unaligned_read_name_35))+(ref_AppliedPTransform_group_by_read_name/pair_with_1_38))+(group_by_read_name/Flatten/Transcode/1))+(group_by_read_name/Flatten/Write/1)
Traceback (most recent call last):
  File "/home/data/vip21/miniconda3/envs/deepconsensus/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/data/vip21/miniconda3/envs/deepconsensus/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/data/vip21/miniconda3/envs/deepconsensus/lib/python3.6/site-packages/deepconsensus/scripts/run_deepconsensus.py", line 246, in <module>
    app.run(main)
  File "/home/data/vip21/miniconda3/envs/deepconsensus/lib/python3.6/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/data/vip21/miniconda3/envs/deepconsensus/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/data/vip21/miniconda3/envs/deepconsensus/lib/python3.6/site-packages/deepconsensus/scripts/run_deepconsensus.py", line 238, in main
    example_width=EXAMPLE_WIDTH)
  File "/home/data/vip21/miniconda3/envs/deepconsensus/lib/python3.6/site-packages/deepconsensus/scripts/run_deepconsensus.py", line 220, in run_deepconsensus
    run_command(command, dry_run=dry_run, log_file=log_file)
  File "/home/data/vip21/miniconda3/envs/deepconsensus/lib/python3.6/site-packages/deepconsensus/scripts/run_deepconsensus.py", line 184, in run_command
    raise RuntimeError(f'Command failed: \n{command}\n')
RuntimeError: Command failed: 
python3 -m deepconsensus.preprocess.merge_datasets   --input_bam=/home/data/vip21/jgs/test_deepconsensus/subreads_to_ccs.bam   --input_unaligned_bam=/home/data/vip21/jgs/test_deepconsensus/subreads.bam   --output_path=/home/data/vip21/jgs/test_deepconsensus/1_merge_datasets   --inference=true

My ccs.fasta file is about 800 MB, my subreads_to_ccs.bam file is 8.9 GB, and my subreads.bam is about 28 GB.
Can anyone help me to solve this problem?
Best,
Guo-Song

general performance question and --cpu flag is not working

Hi Developers,

First, thanks for the great tool. It really improves the reads, and the assemblies as well.

However, the compute requirements are still challenging. We have Intel(R) Xeon(R) CPU E5-2680 v3 nodes with 24 cores.
By default the CPU version requests 23 cores, but it never reaches 23 threads. On average the jobs run with 12-16 threads in total, because many threads are stalling (waiting for I/O?). Do you have any suggestions for how to improve performance?
Copying the input data to a local SSD, or changing batch_zmws or batch_size, did not change anything. Do you think copying the data into /dev/shm could help?

I also tried to change the number of CPUs with the --cpu flag, but that does not work. DeepConsensus always uses (number of cores - 1) threads, even when the log reports a change according to the --cpu user input.

I tried the GPU version as well. Is there an option to restrict which GPUs are used? We have nodes with 2 GPUs, but sometimes other users are occupying one of them. Any advice would be highly appreciated.

Thanks,
Martin
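On the GPU question, not an official answer: TensorFlow-based tools generally honor CUDA_VISIBLE_DEVICES, so a job can be pinned to one of the two cards at the shell level; a minimal sketch with placeholder file names:

# Expose only GPU 0 to this process (use CUDA_VISIBLE_DEVICES=1 for the second card).
CUDA_VISIBLE_DEVICES=0 deepconsensus run \
  --subreads_to_ccs=subreads_to_ccs.bam \
  --ccs_bam=ccs.bam \
  --checkpoint=model/checkpoint \
  --output=output.fastq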

Docker - tensorflow error

Hi @MariaNattestad,

In recent months I have used deepconsensus v0.2.0 without problems.
Now I intend to move to version 0.3.0.
The Docker image seems to work OK.
I downloaded the model directly from the repo at ./deepconsensus/testdata/model.

When I run deepconsensus, the process stops with the error below:

I0708 11:35:33.810553 140663252805440 quick_inference.py:727] Using multiprocessing: cpus is 30.
I0708 11:35:33.820718 140663252805440 quick_inference.py:459] Loading model/checkpoint
2022-07-08 11:35:33.827062: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 
I0708 11:35:33.948971 140663252805440 networks.py:358] Condensing input.
2022-07-08 11:35:34.684176: W tensorflow/core/util/tensor_slice_reader.cc:96] Could not open model/checkpoint: DATA_LOSS: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 92, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: Unable to open table file model/checkpoint: DATA_LOSS: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/deepconsensus", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.8/dist-packages/deepconsensus/cli.py", line 111, in run
    app.run(main, flags_parser=parse_flags)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.8/dist-packages/deepconsensus/cli.py", line 102, in main
    app.run(quick_inference.main, argv=passed)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.8/dist-packages/deepconsensus/inference/quick_inference.py", line 814, in main
    outcome_counter = run()
  File "/usr/local/lib/python3.8/dist-packages/deepconsensus/inference/quick_inference.py", line 734, in run
    loaded_model, model_params = initialize_model(
  File "/usr/local/lib/python3.8/dist-packages/deepconsensus/inference/quick_inference.py", line 476, in initialize_model
    checkpoint.restore(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/util.py", line 2537, in restore
    status = self.read(save_path, options=options)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/util.py", line 2417, in read
    result = self._saver.restore(save_path=save_path, options=options)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/tracking/util.py", line 1423, in restore
    reader = py_checkpoint_reader.NewCheckpointReader(save_path)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 96, in NewCheckpointReader
    error_translator(e)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 40, in error_translator
    raise errors_impl.DataLossError(None, None, error_message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file model/checkpoint: DATA_LOSS: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

Can you help me with the error?

Best Regards
André

Workflow questions

Hi, Thanks for a great tool. A few basic workflow questions:

  1. Why do you use pbmm2 to align subreads to ccs, instead of just using the subreads BAM together with the CCS BAM files made by pbccs, and matching them using read names?

  2. Can deepconsensus output BAM files instead of FASTQ?

Missing about 30% of ZMWs in output

Hi,
I'm running the below and found that ~30% of ZMWs are missing from the deepconsensus FASTQ output, even though I see them in the input CCS bam and input subreads BAM:

deepconsensus_0.3.1.sif deepconsensus run --batch_size=1024 --batch_zmws=100 --cpus 4 --max_passes 20 --subreads_to_ccs=subreads.bam --ccs_bam=ccs.bam --checkpoint=/model/checkpoint

Is this expected behavior? Is there any way to see in the logs why many ZMWs are not in the output?

PS: I don't think this is due to deepconsensus output reads having lower quality than the threshold of Q20, because I'm using ccs BAM input with --min-rq=0.99. I know you recommend lower than that, but if anything, inputting ccs BAM with reads > ccs rq 0.99 should not have 30% of reads failing to have a consensus from Deepconsensus.

PPS: I manually ran deepconsensus on the ccs and subreads_to_ccs of one ZMW that failed to be output by deepconsensus and I got this: failed_quality_filter=1. In CCS, the rq of this ZMW was rq:f:0.994125.
Does Deepconsensus have a more stringent definition of read quality, such that it outputs fewer ZMWs than CCS?
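One way to get a first answer from the logs, assuming the per-ZMW outcomes are reported as key=value counters like the failed_quality_filter=1 seen above (the log file name and the exact counter names are assumptions):

# Tally any counter-style key=value entries in the run log.
grep -o '[a-z_]\+=[0-9]\+' deepconsensus.log | sort | uniq -c | sort -rn | head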

CPU installation on HPC, no sudo, no docker/singularity

Hello! 👋

I am trying to install DeepConsensus on an HPC environment (no GPU) without root permissions, and without Docker/Singularity access. I am approaching this by making a deepconsensus venv and trying to install with a modified install script. I have done the following steps:

  1. python3 -m venv $SCRATCH/venvs/deepconsensus_venv_1
  2. source deepconsensus_venv_1/bin/activate
  3. source install_edit.sh, where install_edit.sh skips the apt-get steps and just upgrades pip and installs requirements.txt plus intel-tensorflow in the venv

Here are the contents of install_edit.sh:

#!/bin/bash
# Copyright (c) 2021, Google Inc.
# All rights reserved.
# 
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
# 
# 1. Redistributions of source code must retain the above copyright notice, this
#    list of conditions and the following disclaimer.
# 
# 2. Redistributions in binary form must reproduce the above copyright notice,
#    this list of conditions and the following disclaimer in the documentation
#    and/or other materials provided with the distribution.
# 
# 3. Neither the name of Google Inc. nor the names of its contributors
#    may be used to endorse or promote products derived from this software without
#    specific prior written permission.
# 
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Usage:  source install.sh
#
# This script installs all the packages required to build DeepConsensus.
#
# This script will run as-is on Ubuntu 20.04.
#
# We also assume that apt-get is already installed and available.

function note_build_stage {
  echo "========== [$(date)] Stage '${1}' starting"
}

# Update package list
################################################################################

# Install pip
################################################################################
python3 -m pip install --upgrade pip

# Update PATH so that newly installed pip is the one we actually use.
export PATH="$SCRATCH/venvs/deepconsensus_venv_1/bin:$PATH"
echo "$(pip --version)"

# Install python packages used by DeepConsensus.
################################################################################
python3 -m pip install -r requirements.txt
python3 -m pip install "intel-tensorflow>=2.4.0,<=2.7.0"

And here is the output from running that install script:

(deepconsensus_venv_1) [labueg@login04 /lustre/fs5/vgl/scratch/labueg/deepconsensus]$ source install_edit.sh 
Collecting pip
  Using cached pip-22.1.2-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.2.3
    Uninstalling pip-20.2.3:
      Successfully uninstalled pip-20.2.3
Successfully installed pip-22.1.2
pip 22.1.2 from /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages/pip (python 3.8)
Collecting numpy>=1.19
  Using cached numpy-1.23.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
Collecting pandas>=1.1
  Using cached pandas-1.4.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
Collecting tf-models-official<=2.7.0,>=2.4.0
  Using cached tf_models_official-2.7.0-py2.py3-none-any.whl (1.8 MB)
Collecting ml_collections>=0.1.0
  Using cached ml_collections-0.1.1.tar.gz (77 kB)
  Preparing metadata (setup.py) ... done
Collecting absl-py>=0.13.0
  Using cached absl_py-1.1.0-py3-none-any.whl (123 kB)
Collecting pysam
  Using cached pysam-0.19.1.tar.gz (3.9 MB)
  Preparing metadata (setup.py) ... done
Collecting python-dateutil>=2.8.1
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1
  Using cached pytz-2022.1-py2.py3-none-any.whl (503 kB)
Collecting py-cpuinfo>=3.3.0
  Using cached py-cpuinfo-8.0.0.tar.gz (99 kB)
  Preparing metadata (setup.py) ... done
Collecting opencv-python-headless
  Using cached opencv_python_headless-4.6.0.66-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (48.3 MB)
Collecting tensorflow-datasets
  Using cached tensorflow_datasets-4.6.0-py3-none-any.whl (4.3 MB)
Collecting gin-config
  Using cached gin_config-0.5.0-py3-none-any.whl (61 kB)
Collecting tensorflow-hub>=0.6.0
  Using cached tensorflow_hub-0.12.0-py2.py3-none-any.whl (108 kB)
Collecting tensorflow-text>=2.7.0
  Using cached tensorflow_text-2.9.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)
Collecting Cython
  Using cached Cython-0.29.30-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Collecting oauth2client
  Using cached oauth2client-4.1.3-py2.py3-none-any.whl (98 kB)
Collecting scipy>=0.19.1
  Using cached scipy-1.8.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.6 MB)
Collecting six
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting tensorflow-model-optimization>=0.4.1
  Using cached tensorflow_model_optimization-0.7.2-py2.py3-none-any.whl (237 kB)
Collecting pycocotools
  Using cached pycocotools-2.0.4-cp38-cp38-linux_x86_64.whl
Collecting kaggle>=1.3.9
  Using cached kaggle-1.5.12.tar.gz (58 kB)
  Preparing metadata (setup.py) ... done
Collecting tensorflow>=2.7.0
  Using cached tensorflow-2.9.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.7 MB)
Collecting matplotlib
  Using cached matplotlib-3.5.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
Collecting seqeval
  Using cached seqeval-1.2.2.tar.gz (43 kB)
  Preparing metadata (setup.py) ... done
Collecting tensorflow-addons
  Using cached tensorflow_addons-0.17.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting pyyaml>=5.1
  Using cached PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (701 kB)
Collecting google-api-python-client>=1.6.7
  Using cached google_api_python_client-2.51.0-py2.py3-none-any.whl (8.6 MB)
Collecting psutil>=5.4.3
  Using cached psutil-5.9.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (284 kB)
Collecting sacrebleu
  Using cached sacrebleu-2.1.0-py3-none-any.whl (92 kB)
Collecting Pillow
  Using cached Pillow-9.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting sentencepiece
  Using cached sentencepiece-0.1.96-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting tf-slim>=1.1.0
  Using cached tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)
Collecting contextlib2
  Using cached contextlib2-21.6.0-py2.py3-none-any.whl (13 kB)
Collecting google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5
  Using cached google_api_core-2.8.2-py3-none-any.whl (114 kB)
Collecting google-auth-httplib2>=0.1.0
  Using cached google_auth_httplib2-0.1.0-py2.py3-none-any.whl (9.3 kB)
Collecting google-auth<3.0.0dev,>=1.16.0
  Using cached google_auth-2.8.0-py2.py3-none-any.whl (164 kB)
Collecting httplib2<1dev,>=0.15.0
  Using cached httplib2-0.20.4-py3-none-any.whl (96 kB)
Collecting uritemplate<5,>=3.0.1
  Using cached uritemplate-4.1.1-py2.py3-none-any.whl (10 kB)
Collecting certifi
  Using cached certifi-2022.6.15-py3-none-any.whl (160 kB)
Collecting requests
  Using cached requests-2.28.0-py3-none-any.whl (62 kB)
Collecting tqdm
  Using cached tqdm-4.64.0-py2.py3-none-any.whl (78 kB)
Collecting python-slugify
  Using cached python_slugify-6.1.2-py2.py3-none-any.whl (9.4 kB)
Collecting urllib3
  Using cached urllib3-1.26.9-py2.py3-none-any.whl (138 kB)
Collecting grpcio<2.0,>=1.24.3
  Using cached grpcio-1.47.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB)
Collecting tensorboard<2.10,>=2.9
  Using cached tensorboard-2.9.1-py3-none-any.whl (5.8 MB)
Collecting tensorflow-io-gcs-filesystem>=0.23.1
  Using cached tensorflow_io_gcs_filesystem-0.26.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.4 MB)
Collecting libclang>=13.0.0
  Using cached libclang-14.0.1-py2.py3-none-manylinux1_x86_64.whl (14.5 MB)
Collecting protobuf<3.20,>=3.9.2
  Using cached protobuf-3.19.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting keras-preprocessing>=1.1.1
  Using cached Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting opt-einsum>=2.3.2
  Using cached opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Collecting google-pasta>=0.1.1
  Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting flatbuffers<2,>=1.12
  Using cached flatbuffers-1.12-py2.py3-none-any.whl (15 kB)
Collecting h5py>=2.9.0
  Using cached h5py-3.7.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
Collecting gast<=0.4.0,>=0.2.1
  Using cached gast-0.4.0-py3-none-any.whl (9.8 kB)
Collecting wrapt>=1.11.0
  Using cached wrapt-1.14.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (81 kB)
Collecting packaging
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting tensorflow-estimator<2.10.0,>=2.9.0rc0
  Using cached tensorflow_estimator-2.9.0-py2.py3-none-any.whl (438 kB)
Requirement already satisfied: setuptools in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from tensorflow>=2.7.0->tf-models-official<=2.7.0,>=2.4.0->-r requirements.txt (line 3)) (49.2.1)
Collecting keras<2.10.0,>=2.9.0rc0
  Using cached keras-2.9.0-py2.py3-none-any.whl (1.6 MB)
Collecting astunparse>=1.6.0
  Using cached astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting termcolor>=1.1.0
  Using cached termcolor-1.1.0.tar.gz (3.9 kB)
  Preparing metadata (setup.py) ... done
Collecting typing-extensions>=3.6.6
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting dm-tree~=0.1.1
  Using cached dm_tree-0.1.7-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (142 kB)
Collecting kiwisolver>=1.0.1
  Using cached kiwisolver-1.4.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
Collecting pyparsing>=2.2.1
  Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
Collecting cycler>=0.10
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting fonttools>=4.22.0
  Using cached fonttools-4.33.3-py3-none-any.whl (930 kB)
Collecting pyasn1-modules>=0.0.5
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting pyasn1>=0.1.7
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting rsa>=3.1.4
  Using cached rsa-4.8-py3-none-any.whl (39 kB)
Collecting colorama
  Using cached colorama-0.4.5-py2.py3-none-any.whl (16 kB)
Collecting regex
  Using cached regex-2022.6.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (764 kB)
Collecting portalocker
  Using cached portalocker-2.4.0-py2.py3-none-any.whl (16 kB)
Collecting tabulate>=0.8.9
  Using cached tabulate-0.8.10-py3-none-any.whl (29 kB)
Collecting scikit-learn>=0.21.3
  Using cached scikit_learn-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (31.2 MB)
Collecting typeguard>=2.7
  Using cached typeguard-2.13.3-py3-none-any.whl (17 kB)
Collecting toml
  Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting etils[epath]
  Using cached etils-0.6.0-py3-none-any.whl (98 kB)
Collecting importlib-resources
  Using cached importlib_resources-5.8.0-py3-none-any.whl (28 kB)
Collecting promise
  Using cached promise-2.3.tar.gz (19 kB)
  Preparing metadata (setup.py) ... done
Collecting tensorflow-metadata
  Using cached tensorflow_metadata-1.9.0-py3-none-any.whl (51 kB)
Collecting dill
  Using cached dill-0.3.5.1-py2.py3-none-any.whl (95 kB)
Collecting wheel<1.0,>=0.23.0
  Using cached wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Collecting googleapis-common-protos<2.0dev,>=1.56.2
  Using cached googleapis_common_protos-1.56.3-py2.py3-none-any.whl (211 kB)
Collecting cachetools<6.0,>=2.0.0
  Using cached cachetools-5.2.0-py3-none-any.whl (9.3 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting charset-normalizer~=2.0.0
  Using cached charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Collecting joblib>=1.0.0
  Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Collecting threadpoolctl>=2.0.0
  Using cached threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collecting werkzeug>=1.0.1
  Using cached Werkzeug-2.1.2-py3-none-any.whl (224 kB)
Collecting tensorboard-data-server<0.7.0,>=0.6.0
  Using cached tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB)
Collecting markdown>=2.6.8
  Using cached Markdown-3.3.7-py3-none-any.whl (97 kB)
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Using cached google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB)
Collecting tensorboard-plugin-wit>=1.6.0
  Using cached tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB)
Collecting zipp
  Using cached zipp-3.8.0-py3-none-any.whl (5.4 kB)
Collecting text-unidecode>=1.3
  Using cached text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
Collecting requests-oauthlib>=0.7.0
  Using cached requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Collecting importlib-metadata>=4.4
  Using cached importlib_metadata-4.12.0-py3-none-any.whl (21 kB)
Collecting oauthlib>=3.0.0
  Using cached oauthlib-3.2.0-py3-none-any.whl (151 kB)
Using legacy 'setup.py install' for ml_collections, since package 'wheel' is not installed.
Using legacy 'setup.py install' for pysam, since package 'wheel' is not installed.
Using legacy 'setup.py install' for kaggle, since package 'wheel' is not installed.
Using legacy 'setup.py install' for py-cpuinfo, since package 'wheel' is not installed.
Using legacy 'setup.py install' for seqeval, since package 'wheel' is not installed.
Using legacy 'setup.py install' for termcolor, since package 'wheel' is not installed.
Using legacy 'setup.py install' for promise, since package 'wheel' is not installed.
Installing collected packages: text-unidecode, termcolor, tensorboard-plugin-wit, sentencepiece, pytz, pysam, pyasn1, py-cpuinfo, libclang, keras, gin-config, flatbuffers, dm-tree, zipp, wrapt, wheel, werkzeug, urllib3, uritemplate, typing-extensions, typeguard, tqdm, toml, threadpoolctl, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, tabulate, six, rsa, regex, pyyaml, python-slugify, pyparsing, pyasn1-modules, psutil, protobuf, portalocker, Pillow, oauthlib, numpy, kiwisolver, joblib, idna, gast, fonttools, etils, dill, Cython, cycler, contextlib2, colorama, charset-normalizer, certifi, cachetools, absl-py, tf-slim, tensorflow-model-optimization, tensorflow-hub, scipy, sacrebleu, requests, python-dateutil, promise, packaging, opt-einsum, opencv-python-headless, ml_collections, keras-preprocessing, importlib-resources, importlib-metadata, httplib2, h5py, grpcio, googleapis-common-protos, google-pasta, google-auth, astunparse, tensorflow-metadata, tensorflow-addons, scikit-learn, requests-oauthlib, pandas, oauth2client, matplotlib, markdown, kaggle, google-auth-httplib2, google-api-core, tensorflow-datasets, seqeval, pycocotools, google-auth-oauthlib, google-api-python-client, tensorboard, tensorflow, tensorflow-text, tf-models-official
  Running setup.py install for termcolor ... done
  Running setup.py install for pysam ... done
  Running setup.py install for py-cpuinfo ... done
  Running setup.py install for promise ... done
  Running setup.py install for ml_collections ... done
  Running setup.py install for kaggle ... done
  Running setup.py install for seqeval ... done
Successfully installed Cython-0.29.30 Pillow-9.1.1 absl-py-1.1.0 astunparse-1.6.3 cachetools-5.2.0 certifi-2022.6.15 charset-normalizer-2.0.12 colorama-0.4.5 contextlib2-21.6.0 cycler-0.11.0 dill-0.3.5.1 dm-tree-0.1.7 etils-0.6.0 flatbuffers-1.12 fonttools-4.33.3 gast-0.4.0 gin-config-0.5.0 google-api-core-2.8.2 google-api-python-client-2.51.0 google-auth-2.8.0 google-auth-httplib2-0.1.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 googleapis-common-protos-1.56.3 grpcio-1.47.0 h5py-3.7.0 httplib2-0.20.4 idna-3.3 importlib-metadata-4.12.0 importlib-resources-5.8.0 joblib-1.1.0 kaggle-1.5.12 keras-2.9.0 keras-preprocessing-1.1.2 kiwisolver-1.4.3 libclang-14.0.1 markdown-3.3.7 matplotlib-3.5.2 ml_collections-0.1.1 numpy-1.23.0 oauth2client-4.1.3 oauthlib-3.2.0 opencv-python-headless-4.6.0.66 opt-einsum-3.3.0 packaging-21.3 pandas-1.4.3 portalocker-2.4.0 promise-2.3 protobuf-3.19.4 psutil-5.9.1 py-cpuinfo-8.0.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycocotools-2.0.4 pyparsing-3.0.9 pysam-0.19.1 python-dateutil-2.8.2 python-slugify-6.1.2 pytz-2022.1 pyyaml-6.0 regex-2022.6.2 requests-2.28.0 requests-oauthlib-1.3.1 rsa-4.8 sacrebleu-2.1.0 scikit-learn-1.1.1 scipy-1.8.1 sentencepiece-0.1.96 seqeval-1.2.2 six-1.16.0 tabulate-0.8.10 tensorboard-2.9.1 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.9.1 tensorflow-addons-0.17.1 tensorflow-datasets-4.6.0 tensorflow-estimator-2.9.0 tensorflow-hub-0.12.0 tensorflow-io-gcs-filesystem-0.26.0 tensorflow-metadata-1.9.0 tensorflow-model-optimization-0.7.2 tensorflow-text-2.9.0 termcolor-1.1.0 text-unidecode-1.3 tf-models-official-2.7.0 tf-slim-1.1.0 threadpoolctl-3.1.0 toml-0.10.2 tqdm-4.64.0 typeguard-2.13.3 typing-extensions-4.2.0 uritemplate-4.1.1 urllib3-1.26.9 werkzeug-2.1.2 wheel-0.37.1 wrapt-1.14.1 zipp-3.8.0
Collecting intel-tensorflow<=2.7.0,>=2.4.0
  Using cached intel_tensorflow-2.7.0-cp38-cp38-manylinux2010_x86_64.whl (186.4 MB)
Requirement already satisfied: absl-py>=0.4.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (1.1.0)
Requirement already satisfied: flatbuffers<3.0,>=1.12 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (1.12)
Requirement already satisfied: google-pasta>=0.1.1 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (0.2.0)
Requirement already satisfied: numpy>=1.14.5 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (1.23.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (1.47.0)
Collecting tensorflow-estimator<2.8,~=2.7.0rc0
  Using cached tensorflow_estimator-2.7.0-py2.py3-none-any.whl (463 kB)
Requirement already satisfied: six>=1.12.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (1.16.0)
Requirement already satisfied: protobuf>=3.9.2 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (3.19.4)
Requirement already satisfied: opt-einsum>=2.3.2 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (3.3.0)
Requirement already satisfied: tensorboard~=2.6 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (2.9.1)
Requirement already satisfied: h5py>=2.9.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (3.7.0)
Requirement already satisfied: wheel<1.0,>=0.32.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (0.37.1)
Requirement already satisfied: termcolor>=1.1.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (1.1.0)
Requirement already satisfied: gast<0.5.0,>=0.2.1 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (0.4.0)
Requirement already satisfied: libclang>=9.0.1 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (14.0.1)
Requirement already satisfied: wrapt>=1.11.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (1.14.1)
Requirement already satisfied: astunparse>=1.6.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (1.6.3)
Requirement already satisfied: keras-preprocessing>=1.1.1 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (1.1.2)
Collecting keras<2.8,>=2.7.0rc0
  Using cached keras-2.7.0-py2.py3-none-any.whl (1.3 MB)
Requirement already satisfied: typing-extensions>=3.6.6 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (4.2.0)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.21.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from intel-tensorflow<=2.7.0,>=2.4.0) (0.26.0)
Requirement already satisfied: markdown>=2.6.8 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (3.3.7)
Requirement already satisfied: requests<3,>=2.21.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (2.28.0)
Requirement already satisfied: setuptools>=41.0.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (49.2.1)
Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (0.6.1)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (1.8.1)
Requirement already satisfied: werkzeug>=1.0.1 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (2.1.2)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (0.4.6)
Requirement already satisfied: google-auth<3,>=1.6.3 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (2.8.0)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (5.2.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (4.8)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (0.2.8)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (1.3.1)
Requirement already satisfied: importlib-metadata>=4.4 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from markdown>=2.6.8->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (4.12.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (2.0.12)
Requirement already satisfied: certifi>=2017.4.17 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (2022.6.15)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (1.26.9)
Requirement already satisfied: idna<4,>=2.5 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (3.3)
Requirement already satisfied: zipp>=0.5 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (3.8.0)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (0.4.8)
Requirement already satisfied: oauthlib>=3.0.0 in /lustre/fs5/vgl/scratch/labueg/venvs/deepconsensus_venv_1/lib/python3.8/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.6->intel-tensorflow<=2.7.0,>=2.4.0) (3.2.0)
Installing collected packages: tensorflow-estimator, keras, intel-tensorflow
  Attempting uninstall: tensorflow-estimator
    Found existing installation: tensorflow-estimator 2.9.0
    Uninstalling tensorflow-estimator-2.9.0:
      Successfully uninstalled tensorflow-estimator-2.9.0
  Attempting uninstall: keras
    Found existing installation: keras 2.9.0
    Uninstalling keras-2.9.0:
      Successfully uninstalled keras-2.9.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.9.1 requires keras<2.10.0,>=2.9.0rc0, but you have keras 2.7.0 which is incompatible.
tensorflow 2.9.1 requires tensorflow-estimator<2.10.0,>=2.9.0rc0, but you have tensorflow-estimator 2.7.0 which is incompatible.
Successfully installed intel-tensorflow-2.7.0 keras-2.7.0 tensorflow-estimator-2.7.0

I then ran run_all_tests.sh on a compute node via slurm and the full output is in this gist: https://gist.github.com/abueg/421ca972563b5c32825cde17525a49bf

There is this exception in the output, which leads me to think the installation failed:

Exception ignored in: <function Pool.__del__ at 0x7fe60613b820>
Traceback (most recent call last):
  File "/vggpfs/fs3/vgl/store/labueg/anaconda3/lib/python3.8/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/vggpfs/fs3/vgl/store/labueg/anaconda3/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/vggpfs/fs3/vgl/store/labueg/anaconda3/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/vggpfs/fs3/vgl/store/labueg/anaconda3/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/vggpfs/fs3/vgl/store/labueg/anaconda3/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor

Would this error arise from the tensorflow dependency problem, in which case should I try to install tensorflow 2.9.1? Or have I gone wrong somewhere else?

There is also this error before the exception: [E::idx_find_and_load] Could not retrieve index file for 'deepconsensus/testdata/human_1m/subreads_to_ccs.bam'. There is no index file for that BAM in the deepconsensus/testdata/human_1m/ directory, so should I index it prior to running the tests?

Any help would be appreciated, thank you in advance!

Error: ValueError: Shapes (2640, 280) and (560, 280) are incompatible

Hi,
I'm getting the below error:

Total params: 9,525,067
Trainable params: 9,525,067
Non-trainable params: 0
_________________________________________________________________
Traceback (most recent call last):
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 130, in restore
    assigned_variable = resource_variable_ops.shape_safe_assign_variable_handle(
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 308, in shape_safe_assign_variable_handle
    shape.assert_is_compatible_with(value_tensor.shape)
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/framework/tensor_shape.py", line 1291, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2640, 280) and (560, 280) are incompatible

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/bio/bin/deepconsensus", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/cli.py", line 111, in run
    app.run(main, flags_parser=parse_flags)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/cli.py", line 102, in main
    app.run(quick_inference.main, argv=passed)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/inference/quick_inference.py", line 814, in main
    outcome_counter = run()
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/inference/quick_inference.py", line 734, in run
    loaded_model, model_params = initialize_model(
  File "/opt/conda/envs/bio/lib/python3.8/site-packages/deepconsensus/inference/quick_inference.py", line 476, in initialize_model
    checkpoint.restore(
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 2537, in restore
    status = self.read(save_path, options=options)
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 2417, in read
    result = self._saver.restore(save_path=save_path, options=options)
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 1468, in restore
    base.CheckpointPosition(
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 295, in restore
    restore_ops = trackable._restore_from_checkpoint_position(self)  # pylint: disable=protected-access
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 1060, in _restore_from_checkpoint_position
    current_position.checkpoint.restore_saveables(tensor_saveables,
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 349, in restore_saveables
    new_restore_ops = functional_saver.MultiDeviceSaver(
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 415, in restore
    restore_ops = restore_fn()
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 398, in restore_fn
    restore_ops.update(saver.restore(file_prefix, options))
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 112, in restore
    restore_ops[saveable.name] = saveable.restore(
  File "/home/user01/.local/lib/python3.8/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 133, in restore
    raise ValueError(
ValueError: Received incompatible tensor with shape (560, 280) when attempting to restore variable with shape (2640, 280) and name model/transformer_input_condenser/kernel/.ATTRIBUTES/VARIABLE_VALUE.
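
One generic thing worth checking here (an assumption, not a confirmed diagnosis): the error is raised while restoring the checkpoint into a freshly built model, so the shapes can only line up if the `--checkpoint` prefix and the `params.json` sitting next to it come from the same model download. A quick sanity check, with `MODEL_DIR` as a placeholder for wherever the model was unpacked:

```bash
# Placeholder path; point this at the directory the DeepConsensus model was downloaded to.
MODEL_DIR=/path/to/deepconsensus_model

# The checkpoint files and params.json should live together and come from the same release.
ls -l "${MODEL_DIR}"
# Expected (roughly): checkpoint.data-00000-of-00001  checkpoint.index  params.json
```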

DeepConsensus 0.3 --min_quality has no default?

Dear DeepConsensus developers,

Thanks for the update. I really like the performance improvements in release 0.3.

Quick question: why is the --min_quality flag not set to 20 by default?
Because of the great yield, it was a bit hard to notice that I had several reads with an error rate >> 2%.

I usually run DC (v0.3) only on reads that have 97%-99.9% consensus accuracy (for time reasons).
Without setting the --min_quality flag to 20, I get the following results (filtered afterwards: >=Q20 = pass, otherwise fail):

| file | num_seqs | sum_len | min_len | avg_len | max_len | Q1 | Q2 | Q3 | sum_gap | N50 | Q20(%) | Q30(%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DC_OUT_fail.fq | 33,456 | 511,999,446 | 577 | 15,303.7 | 49,575 | 12,407 | 14,738 | 17,771 | 0 | 15,831 | 81.25 | 54.75 |
| DC_OUT_pass.fq | 796,187 | 12,815,329,195 | 510 | 16,095.9 | 56,419 | 13,044 | 15,533 | 18,748 | 0 | 16,679 | 98.36 | 94.94 |
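
(These per-file statistics look like `seqkit stats -a` output; if so, a command along these lines, using the file names from the table above, reproduces them.)

```bash
# All-statistics mode adds the quartile, N50 and Q20/Q30 percentage columns shown above.
seqkit stats -a DC_OUT_pass.fq DC_OUT_fail.fq
```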

When I add --min_quality 20 to the deepconsensus run, I get the expected results:

| file | format | type | num_seqs | sum_len | min_len | avg_len | max_len | Q1 | Q2 | Q3 | sum_gap | N50 | Q20(%) | Q30(%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ./m54345U_220708_140613_v03/DCIN_GE0.97_LE0.999/DC_OUT.deepconsensus.fq.gz | FASTQ | DNA | 796,188 | 12,815,347,763 | 510 | 16,095.9 | 56,419 | 13,044 | 15,533 | 18,748 | 0 | 16,679 | 98.36 | 94.94 |
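
For anyone else who wants the Q20 cutoff back, the flag just needs to be added to the run command. Below is a sketch based on the current quick-start invocation; all paths are placeholders, and the exact input flags may be spelled differently in v0.3 (only --min_quality is the flag discussed here).

```bash
# Placeholder inputs; see the quick start for how these files are produced.
deepconsensus run \
  --subreads_to_ccs=subreads_to_ccs.bam \
  --ccs_bam=ccs.bam \
  --checkpoint="${MODEL_DIR}/checkpoint" \
  --min_quality=20 \
  --output=DC_OUT.deepconsensus.fq.gz
```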

Just for completeness: a very small fraction of the DC-improved reads ended up with a lower QV than the corresponding input reads. When I try to rescue those that initially had >=Q20, I get the following "rescued" read stats:

| file | format | type | num_seqs | sum_len | min_len | avg_len | max_len | Q1 | Q2 | Q3 | sum_gap | N50 | Q20(%) | Q30(%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ./m54345U_220708_140613_v03/DCIN_GE0.97_LE0.999/DC_OUT.ccsRescued.fq.gz | FASTQ | DNA | 2,136 | 31,701,185 | 573 | 14,841.4 | 38,323 | 11,889.5 | 14,319 | 17,213.5 | 0 | 15,400 | 95.24 | 90.55 |

Once again thanks for the great tool. I just posted this message so that others are aware of this behaviour.
The attached figure compares unfiltered QV values for pbccs v6.4, deepconsensus v0.2, and deepconsensus v0.3 (from left to right). The red dots highlight reads <Q20.

Cheers,
Martin
