
chiron's People

Contributors

allenday, biogeek, etheleon, haotianteng, nmiculinic, puneith, puneithk

chiron's Issues

Types mixed up in function parameters

In this line the function read_signal takes an argument normalize='median' by default. From looking through its usage inside the function, it seems it can only take the value 'median' or 'mean'.

However, when parameters are passed to this function on this line, it is passed a boolean value. Passing a boolean value like this means that the signal is not being normalised. This could possibly be a big issue?
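The failure mode described above can be reproduced in a few lines. This is only a sketch, not Chiron's read_signal: normalize_lenient and normalize_strict are hypothetical names, and the centre-and-scale arithmetic is an assumption made for illustration. It shows how a boolean falls through the string comparisons untouched, and how validating the argument up front would surface the mistake.

```python
import statistics

VALID_MODES = ("median", "mean")


def normalize_lenient(signal, normalize="median"):
    """Hypothetical stand-in: unrecognised values silently skip normalisation."""
    signal = [float(x) for x in signal]
    if normalize == "median":
        center = statistics.median(signal)
    elif normalize == "mean":
        center = statistics.mean(signal)
    else:
        # True, False, or a misspelt string all end up here.
        return signal
    spread = statistics.pstdev(signal)
    return [(x - center) / spread for x in signal]


def normalize_strict(signal, normalize="median"):
    """Same arithmetic, but reject anything that is not a known mode."""
    if normalize not in VALID_MODES:
        raise ValueError(
            "normalize must be one of %r, got %r" % (VALID_MODES, normalize)
        )
    return normalize_lenient(signal, normalize)


raw = [100.0, 102.0, 98.0, 110.0]
unchanged = normalize_lenient(raw, normalize=True)  # boolean: nothing happens
centered = normalize_strict(raw, normalize="median")
```

With the strict variant, a call such as normalize_strict(raw, normalize=True) raises ValueError instead of returning the raw signal.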

Unknown exit perhaps related to tensorflow

Environment: Fedora23 x64, 32core server w/256GB RAM, no GPU (not a good one, ATI Radeon 5000 series?), on a fresh pyenv python environment -> 2.7.15.
Installed via pip:
pip install chiron
pip install tensorflow
pip install -e git+https://github.com/tqdm/tqdm.git@master#egg=tqdm
Run exits after signal processing with the following:

Subdirectory processing:: 1it [00:05,  5.42s/it]
2018-06-24 16:33:45.873239: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
chiron_recall/2016-04-04_CJ5:00, ?it/s]
Real time:2.250 Systime:0.213 Usertime:2.229

Run command:
chiron call -i /media/Data_2/Sequences/Nanopore/2016_04_04_CJ5-1/uploaded -o chiron_recall/2016-04-04_CJ5 --batch_size 1000 --beam 50

This may be a CPU threading issue. Is there a switch to select GPU or CPU? I could not find one in --help. Passing -t 32 has no effect on the error.

Illumina Data Availability

Hi Haotian,

Nice work on Chiron. It looks very promising. I was able to obtain the E.coli fast5 files from genomicsresearch.org, but I do not see the corresponding MiSeq data. It also does not seem to be on Genbank yet. Is it possible to post the MiSeq data for E.coli on the genomicsresearch.org site? Do you happen to know when it might be available on Genbank?

Thanks!

Cannot basecall anything

I'm trying to start Chiron, but I cannot basecall anything. I've started it with:
python chiron/entry.py call -i chiron/example_data/ -o /tmp/xxx, run from a clone of my GitHub repository (I had to submit a few fixes for it to run without exceptions). I get /tmp/xxx with 5 empty directories (raw, ...).

What went wrong? How can I debug it further?

question about training

About nanoraw re-squiggle
I use this command line:
nanoraw genome_resquiggle /home/human/f5/ /home/human/fa/1.fasta --bwa-mem-executable /home/qhli/anaconda2/envs/tensorflow-gpu/bin/bwa --overwrite
Is this correct? The command does not report any errors.

chiron_ file_batch.py
I use this command line:
python /chiron/utils/file_batch.py --input /home/qhli/human/f5 --output /home/qhli/human/process --length 400 --mode dna
the result:
File batch transfer completed, 0 batches have been processed
2 files scussesfully read, 0 files failed.

But when I use the re-squiggled fast5 files, no batches are produced, which leads to ValueError: string_input_producer requires a non-null input tensor when I run chiron_train.py.

Therefore, could you please give me some advice? Thank you very much!
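The ValueError quoted above is consistent with the training step receiving an empty file list, which matches the "0 batches have been processed" message. A minimal sketch of a pre-flight check follows; find_training_files and require_training_files are hypothetical helpers, and the "*.signal" pattern is an assumption that should be replaced with whatever the batching step actually writes.

```python
import glob
import os
import tempfile


def find_training_files(data_dir, pattern="*.signal"):
    """Return matching files, sorted. The default pattern is an assumption."""
    return sorted(glob.glob(os.path.join(data_dir, pattern)))


def require_training_files(data_dir, pattern="*.signal"):
    """Fail early, with a readable message, when the input directory is empty."""
    files = find_training_files(data_dir, pattern)
    if not files:
        raise FileNotFoundError(
            "no files matching %r in %r; check the output of the batching step"
            % (pattern, data_dir)
        )
    return files


# Demonstration on a throwaway empty directory.
empty_dir = tempfile.mkdtemp()
try:
    require_training_files(empty_dir)
    outcome = "files found"
except FileNotFoundError:
    outcome = "empty input detected"
```

Running a check like this before training points at the data preparation step rather than at TensorFlow.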

Potential bug in write_output

This line will result in an error if concise=true is passed to the function. The variable path_reads is only set if concise=false (which is the default).
There is a similar issue with path_meta on this line.
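The pattern behind this report can be shown in isolation. The sketch below is not the real write_output: output_paths_buggy and output_paths_fixed are hypothetical, and the directory names "reads" and "meta" are placeholders. It shows a variable assigned in only one branch, and one possible fix that gives it a value on every path.

```python
import os


def output_paths_buggy(folder, concise=False):
    """Hypothetical reduction of the bug: names are bound only when not concise."""
    if not concise:
        path_reads = os.path.join(folder, "reads")
        path_meta = os.path.join(folder, "meta")
    # With concise=True neither name was ever assigned.
    return path_reads, path_meta


def output_paths_fixed(folder, concise=False):
    """One possible fix: bind both names before branching."""
    path_reads = None
    path_meta = None
    if not concise:
        path_reads = os.path.join(folder, "reads")
        path_meta = os.path.join(folder, "meta")
    return path_reads, path_meta


try:
    output_paths_buggy("out", concise=True)
    buggy_result = "no error"
except UnboundLocalError:
    buggy_result = "UnboundLocalError"

fixed_result = output_paths_fixed("out", concise=True)
```

Callers of the fixed variant still need to handle None, so whether this is the right fix depends on what concise output is meant to write.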

Issues training a model

Hello,

I've been having trouble training a model with the latest git release. I've resquiggled my training set and extracted the signal/label files without any issue. However, when I try to run chiron_train.py, I end up with the following error:

python new_chiron/chiron/chiron/chiron_train.py -i $HOME/signal -o $HOME/log -m test

Traceback (most recent call last):
  File "/home/new_chiron/chiron/chiron/chiron_train.py", line 184, in <module>
    run(hparam.HParams(**args.__dict__)) 
  File "/home/new_chiron/chiron/chiron/chiron_train.py", line 125, in run
    train(hparam)
  File "/home/new_chiron/chiron/chiron/chiron_train.py", line 75, in train
    for_valid=False)
  File "/stornext/HPCScratch/home/new_chiron/chiron/chiron/chiron_queue_input.py", line 130, in inputs
    filename_queue = tf.train.string_input_producer(filenames)
  File "/home/.conda/envs/chiron_basecaller_env/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 217, in string_input_producer
    raise ValueError(not_null_err)
ValueError: string_input_producer requires a non-null input tensor

I tried running chiron_rcnn_train.py too, but that failed with the following error:

python new_chiron/chiron/chiron/chiron_rcnn_train.py -i $HOME/signal -o $HOME/log -m test

(200, 1, 400, 1)
Traceback (most recent call last):
  File "/home/new_chiron/chiron/chiron/chiron_rcnn_train.py", line 137, in <module>
    run(args)
  File "/home/new_chiron/chiron/chiron/chiron_rcnn_train.py", line 108, in run
    train()
  File "/home/new_chiron/chiron/chiron/chiron_rcnn_train.py", line 49, in train
    opt = model.train_step(ctc_loss, FLAGS.step_rate, global_step=global_step)
AttributeError: 'module' object has no attribute 'train_step'

Any advice would be appreciated! Thanks.

Issues in chiron_label

This module has a note at the top saying it is a temporary script. Is it still temporary or not? If so should we delete it?

If we are not deleting it, there are multiple bugs in the function label.

  • The variable seq_length is used multiple times but is never declared.
  • This line calls the function get_label, but all this function does is return None.

AttributeError: 'tqdm' object has no attribute 'pos'

Environment: Fedora23 x64 , pyenv python environment -> 2.7.15.
Installed via pip:
pip install chiron
pip install tensorflow
got the following Traceback:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/rbutler/.pyenv/versions/2.7.15/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/rbutler/.pyenv/versions/2.7.15/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/rbutler/.pyenv/versions/2.7.15/lib/python2.7/site-packages/chiron/chiron_eval.py", line 267, in run_listener
    worker_fn()
  File "/home/rbutler/.pyenv/versions/2.7.15/lib/python2.7/site-packages/chiron/chiron_eval.py", line 239, in worker_fn
    for name in tqdm(file_list,desc = "Logits inferencing.",position = 0):
  File "/home/rbutler/.pyenv/versions/2.7.15/lib/python2.7/site-packages/tqdm/_tqdm.py", line 990, in __iter__
    self.close()
  File "/home/rbutler/.pyenv/versions/2.7.15/lib/python2.7/site-packages/tqdm/_tqdm.py", line 1087, in close
    self._decr_instances(self)
  File "/home/rbutler/.pyenv/versions/2.7.15/lib/python2.7/site-packages/tqdm/_tqdm.py", line 446, in _decr_instances
    if inst.pos > abs(instance.pos):
AttributeError: 'tqdm' object has no attribute 'pos'

bare install of python, pip list:

Package           Version
----------------- ---------
absl-py           0.2.2
astor             0.6.2
backports.weakref 1.0.post1
bleach            1.5.0
chiron            0.4
enum34            1.1.6
funcsigs          1.0.2
futures           3.2.0
gast              0.2.0
grpcio            1.12.1
h5py              2.8.0
html5lib          0.9999999
mappy             2.11
Markdown          2.6.11
mock              2.0.0
numpy             1.14.5
pandas            0.23.1
patsy             0.5.0
pbr               4.0.4
pip               10.0.1
protobuf          3.6.0
python-dateutil   2.7.3
pytz              2018.4
scipy             1.1.0
setuptools        39.0.1
six               1.11.0
statsmodels       0.9.0
tensorboard       1.8.0
tensorflow        1.8.0
termcolor         1.1.0
tqdm              4.23.4
Werkzeug          0.14.1
wheel             0.31.1

NotFoundError when calling

I tried to call without further training but received a NotFoundError. Extraction works fine; afterwards Chiron loads the checkpoint but cannot find certain entries in it.
The pip version works fine.
Maybe checkpoints are not portable between systems, or has the code changed since the checkpoint was created?
Full Error Message in File attached:
chiron_error.txt

Cannot restore default model

commit: 0dbd1c6 ((current master as of now))

2018-04-28 15:55:11.432647: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key res_layer4/branch1/conv1/weights not found in checkpoint

My guess...somebody changed the default settings and things break.

python chiron/entry.py call -e fasta --beam 0 --batch_size 25 -i chiron/example_data/ -o /tmp/cch4

data available

How can I get the training data described in the paper, especially the E. coli data?

No sequence in FASTQ output files when basecalling with own model

Describe the bug
When basecalling with own model, the segments files are empty and the result fastq files only contain the header, but not the sequence. The basecalling works with the default provided model. There were no errors when training a new model, and the stdout of training looks like this:

Step 0/100 Epoch 0, batch number 400, loss: 435.982 edit_distance: 3.430 edit_distance_val 3.208 Elapsed Time/step: 24.688
Step 10/100 Epoch 0, batch number 2600, loss: 237.681 edit_distance: 1.737 edit_distance_val 1.902 Elapsed Time/step: 6.834
Step 20/100 Epoch 0, batch number 4800, loss: 104.462 edit_distance: 0.761 edit_distance_val 0.792 Elapsed Time/step: 5.719
Step 30/100 Epoch 0, batch number 7000, loss: 61.681 edit_distance: 0.731 edit_distance_val 0.745 Elapsed Time/step: 5.278
Step 40/100 Epoch 0, batch number 9200, loss: 56.102 edit_distance: 0.819 edit_distance_val 0.820 Elapsed Time/step: 5.992
Step 50/100 Epoch 0, batch number 11400, loss: 55.746 edit_distance: 0.864 edit_distance_val 0.868 Elapsed Time/step: 7.603
Step 60/100 Epoch 0, batch number 13600, loss: 55.831 edit_distance: 0.869 edit_distance_val 0.867 Elapsed Time/step: 7.876
Step 70/100 Epoch 0, batch number 15800, loss: 55.905 edit_distance: 0.862 edit_distance_val 0.865 Elapsed Time/step: 7.416
Step 80/100 Epoch 0, batch number 18000, loss: 55.787 edit_distance: 0.862 edit_distance_val 0.864 Elapsed Time/step: 7.012
Step 90/100 Epoch 0, batch number 20200, loss: 54.375 edit_distance: 0.867 edit_distance_val 0.859 Elapsed Time/step: 6.740
Model ../tmp/chiron/training_out///////test_interactive_new saved.
Reads number 306575

The basecalling does not give an error, and its stderr and stdout look like this:

Subdirectory processing:: 1it [02:44, 164.03s/it]
INFO:tensorflow:Graph was finalized.████████████████████████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [02:43<00:00, 12.20it/s]
INFO:tensorflow:Graph was finalized.
2018-09-11 09:16:06.717571: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-11 09:16:08.188803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:89:00.0
totalMemory: 22.38GiB freeMemory: 22.21GiB
2018-09-11 09:16:08.188856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-11 09:16:08.522758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-11 09:16:08.522813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-09-11 09:16:08.522823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-09-11 09:16:08.523328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21551 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:89:00.0, compute capability: 6.1)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Restoring parameters from ../tmp/chiron/training_out/test_interactive_new/final.ckpt-100
INFO:tensorflow:Restoring parameters from ../tmp/chiron/training_out/test_interactive_new/final.ckpt-100
Logits(batches): 100.0%|########################################| 800/623
ctc(batches): 100.0%|########################################| 800/623
logits(files): 100.0%|########################################| 1999/1999
ctc(files): 100.0%|########################################| 1999/1999
../tmp/chiron/basecalling/train_small
Real time:23461.442 Systime:9469.731 Usertime:39756.178

Any advice on where to start looking where it goes wrong?

Test Issue

Test. I was unaware Issues cannot be deleted in Github. Pls ignore.

chiron_train.py

When I run chiron_train.py to train the model, it reports the error below.

(hn_py27) huangneng@bio9:~/Chiron/chiron$ python chiron_train.py -i /home/huangneng/data/chiron_data/ecoli/data.genomicsresearch.org/Projects/basecall/Ecoli-S10/file_batch/ -o ~/log/ -m train_test
WARNING:tensorflow:From /home/huangneng/anaconda3/envs/hn_py27/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Filling queue with 500000 signals before starting to train. This will take some time.(200, 1, 512, 1)
2018-04-18 16:11:25.236146: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Model init finished, begin training.

Traceback (most recent call last):
  File "chiron_train.py", line 185, in <module>
    run(hparam.HParams(**args.__dict__))
  File "chiron_train.py", line 126, in run
    train(hparam)
  File "chiron_train.py", line 104, in train
    loss_val, _ = sess.run([ctc_loss, opt], feed_dict=feed_dict)
  File "/home/huangneng/anaconda3/envs/hn_py27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/huangneng/anaconda3/envs/hn_py27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1125, in _run
    self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
  File "/home/huangneng/anaconda3/envs/hn_py27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 427, in __init__
    self._fetch_mapper = _FetchMapper.for_fetch(fetches)
  File "/home/huangneng/anaconda3/envs/hn_py27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 245, in for_fetch
    return _ListFetchMapper(fetch)
  File "/home/huangneng/anaconda3/envs/hn_py27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 352, in __init__
    self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
  File "/home/huangneng/anaconda3/envs/hn_py27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 253, in for_fetch
    return _ElementFetchMapper(fetches, contraction_fn)
  File "/home/huangneng/anaconda3/envs/hn_py27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 286, in __init__
    (fetch, type(fetch), str(e)))
TypeError: Fetch argument <tensorflow.python.training.adam.AdamOptimizer object at 0x7f7ad7a7cfd0> has invalid type <class 'tensorflow.python.training.adam.AdamOptimizer'>, must be a string or Tensor. (Can not convert a AdamOptimizer into a Tensor or Operation.)

training data format

Hi hao,
While reading the code, I got a little confused about the ".label" format of the training data. Is each line in the label file the bases of the whole sequence, or a k-mer? If you don't mind, could you give me an example of the training data? Thank you.

ImportError: No module named chiron

When calling "python entry.py", the error "ImportError: No module named chiron" occurs.

The line "from chiron import chiron_eval" raises it. Changing it to "import chiron_eval" fixes that line, but the same error then appears on many other lines.
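An error like this usually means the directory that contains the package is not on Python's import path, which tends to happen when a script is started from inside the package directory. The sketch below demonstrates one workaround on a throwaway package; add_package_parent_to_path and the package name demo_pkg_for_import_sketch are hypothetical, and nothing here is taken from Chiron's own code.

```python
import importlib
import os
import sys
import tempfile


def add_package_parent_to_path(package_dir):
    """Put the directory that contains the package on sys.path."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


# Build a tiny package on disk so the example is self-contained.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg_for_import_sketch")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "helper.py"), "w") as handle:
    handle.write("VALUE = 42\n")

# Without this call, the absolute import below would fail with ImportError.
add_package_parent_to_path(pkg_dir)
importlib.invalidate_caches()
helper = importlib.import_module("demo_pkg_for_import_sketch.helper")
```

Installing the package, or launching the script from the repository root, avoids editing sys.path by hand and leaves the absolute imports unchanged.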

Tensorflow version

Hi,
been trying to install Chiron, unsuccessfully.
I tried both the git and the pip installation routine, both failed on the tensorflow requirement.
We have a cluster-wide installation for python 2.7.10 as well as 2.7.14 - in both cases, tensorflow=1.0.1 could not be found via pip.

pip search tensorflow:
tensorflow (1.6.0rc0)

And that breaks the installation.


Collecting chiron
  Using cached chiron-0.3.tar.gz
Collecting tensorflow==1.0.1 (from chiron)
  Could not find a version that satisfies the requirement tensorflow==1.0.1 (from chiron) (from versions: )
No matching distribution found for tensorflow==1.0.1 (from chiron)

Did you train a BNLSTM model?

Hi, Haotian. The default model is now based on an LSTM. Did you train a BNLSTM model? If so, could you share it with me? Training a new model for 20000 steps takes a long time, and when I tried, training was interrupted at step 7340, possibly because of a hardware problem.

which function is used to extract raw signal data from fast5

hi:
I want to know which function extracts the raw signal and then trims it. Since the front part of the raw signal is not related to the nucleotides, how do you trim the raw signal? Which code file contains these operations? Thank you.

unable to run 'chiron/entry.py call'

I am getting a bunch of errors when I try to call using the sample data. I wonder if the pretrained model is missing?

Kind regards

Andy

(Chiron_env) $ gb
* master 548c493 [origin/master] Docstring in chiron_train module.

(Chiron_env) $ pwd
/Users/ad/workSpace/UCSC/nanoPore/Chiron

(Chiron_env) $ python chiron/entry.py call -i chiron/example_data/ -o ../Chiron.out/ 2>&1 | tee entry.py.call.error.out.txt

generates a lot of messages like

W tensorflow/core/framework/op_kernel.cc:993] Not found: Key block1dilate_layer1/dilate_branch/filter_branch/filter/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key block1dilate_layer1/dilate_branch/filter_branch/filter_bn/filter_bn_offset not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key block1dilate_layer1/dilate_branch/filter_branch/filter_bn/filter_bn_scale not found in checkpoint

eventually I get the following stack trace

Caused by op u'save/RestoreV2_12', defined at:
  File "chiron/entry.py", line 86, in <module>
    main()
  File "chiron/entry.py", line 81, in main
    args.func(args)
  File "chiron/entry.py", line 23, in evaluation
    chiron_eval.run(args)
  File "/Users/ad/workSpace/UCSC/nanoPore/Chiron/chiron/chiron_eval.py", line 228, in run
    time_dict = unix_time(evaluation)
  File "/Users/ad/workSpace/UCSC/nanoPore/Chiron/chiron/utils/unix_time.py", line 23, in unix_time
    function(*args, **kwargs)
  File "/Users/ad/workSpace/UCSC/nanoPore/Chiron/chiron/chiron_eval.py", line 147, in evaluation
    saver = tf.train.Saver()
  File "/Users/ad/workSpace/pythonEnv/Chiron_env/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1040, in __init__
    self.build()
  File "/Users/ad/workSpace/pythonEnv/Chiron_env/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1070, in build
    restore_sequentially=self._restore_sequentially)
  File "/Users/ad/workSpace/pythonEnv/Chiron_env/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 675, in build
    restore_sequentially, reshape)
  File "/Users/ad/workSpace/pythonEnv/Chiron_env/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 402, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/Users/ad/workSpace/pythonEnv/Chiron_env/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 242, in restore_op
    [spec.tensor.dtype])[0])
  File "/Users/ad/workSpace/pythonEnv/Chiron_env/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 668, in restore_v2
    dtypes=dtypes, name=name)
  File "/Users/ad/workSpace/pythonEnv/Chiron_env/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/Users/ad/workSpace/pythonEnv/Chiron_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/ad/workSpace/pythonEnv/Chiron_env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): Key block1dilate_layer1/dilate_branch/filter_branch/filter/weights not found in checkpoint
	 [[Node: save/RestoreV2_12 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_12/tensor_names, save/RestoreV2_12/shape_and_slices)]]

entry.py.call.error.out.txt

RNA model - no output

Hi,

I tried to use Chiron with the RNA model, but all output files are empty. It works fine with the DNA model though.

I looked into the source code, and in the evaluation function it seems that the prediction from the CTC decoder is empty. I don't know where the problem comes from. Any ideas?

Thank you

Illumina E.coli S10 raw fastq availability

Hi Haotian,

Thank you for uploading the Ecoli S10 assemblies. However, I was hoping to get the raw fastq files. Would it be possible to upload those? I see the TB sequences on Genbank, but I do not yet see E.coli.

Unable to basecall with non-default model

Hello,

I’ve been trying to train Chiron (v0.3, GPU) with a custom dataset. I’ve created a model using chiron_rcnn_train.py (without any apparent issues), but basecalling against this model has been failing with the following error:

NotFoundError (see above for traceback): Key BDLSTM_rnn/cell_0/bidirectional_rnn/fw/lstm_cell/biases not found in checkpoint [[Node: save/RestoreV2_2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_2/tensor_names, save/RestoreV2_2/shape_and_slices)]] [[Node: save/RestoreV2_35/_49 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_195_save/RestoreV2_35", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

I’ve examined the TF checkpoint file for the created model (with inspect_checkpoint.py) and it’s at odds with the provided DNA_default model. It’s missing the following fields:
BDLSTM_rnn/cell_0/bidirectional_rnn/bw/lstm_cell/biases (DT_FLOAT) [400]
BDLSTM_rnn/cell_0/bidirectional_rnn/bw/lstm_cell/biases/Adam (DT_FLOAT) [400]
BDLSTM_rnn/cell_0/bidirectional_rnn/bw/lstm_cell/biases/Adam_1 (DT_FLOAT) [400]
BDLSTM_rnn/cell_0/bidirectional_rnn/bw/lstm_cell/weights (DT_FLOAT) [356,400]
BDLSTM_rnn/cell_0/bidirectional_rnn/bw/lstm_cell/weights/Adam (DT_FLOAT) [356,400]
BDLSTM_rnn/cell_0/bidirectional_rnn/bw/lstm_cell/weights/Adam_1 (DT_FLOAT) [356,400]
BDLSTM_rnn/cell_0/bidirectional_rnn/fw/lstm_cell/biases (DT_FLOAT) [400]
BDLSTM_rnn/cell_0/bidirectional_rnn/fw/lstm_cell/biases/Adam (DT_FLOAT) [400]
BDLSTM_rnn/cell_0/bidirectional_rnn/fw/lstm_cell/biases/Adam_1 (DT_FLOAT) [400]
BDLSTM_rnn/cell_0/bidirectional_rnn/fw/lstm_cell/weights (DT_FLOAT) [356,400]
BDLSTM_rnn/cell_0/bidirectional_rnn/fw/lstm_cell/weights/Adam (DT_FLOAT) [356,400]
BDLSTM_rnn/cell_0/bidirectional_rnn/fw/lstm_cell/weights/Adam_1 (DT_FLOAT) [356,400]
BDLSTM_rnn/cell_1/bidirectional_rnn/bw/lstm_cell/biases (DT_FLOAT) [400]
BDLSTM_rnn/cell_1/bidirectional_rnn/bw/lstm_cell/biases/Adam (DT_FLOAT) [400]
BDLSTM_rnn/cell_1/bidirectional_rnn/bw/lstm_cell/biases/Adam_1 (DT_FLOAT) [400]
BDLSTM_rnn/cell_1/bidirectional_rnn/bw/lstm_cell/weights (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_1/bidirectional_rnn/bw/lstm_cell/weights/Adam (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_1/bidirectional_rnn/bw/lstm_cell/weights/Adam_1 (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_1/bidirectional_rnn/fw/lstm_cell/biases (DT_FLOAT) [400]
BDLSTM_rnn/cell_1/bidirectional_rnn/fw/lstm_cell/biases/Adam (DT_FLOAT) [400]
BDLSTM_rnn/cell_1/bidirectional_rnn/fw/lstm_cell/biases/Adam_1 (DT_FLOAT) [400]
BDLSTM_rnn/cell_1/bidirectional_rnn/fw/lstm_cell/weights (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_1/bidirectional_rnn/fw/lstm_cell/weights/Adam (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_1/bidirectional_rnn/fw/lstm_cell/weights/Adam_1 (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_2/bidirectional_rnn/bw/lstm_cell/biases (DT_FLOAT) [400]
BDLSTM_rnn/cell_2/bidirectional_rnn/bw/lstm_cell/biases/Adam (DT_FLOAT) [400]
BDLSTM_rnn/cell_2/bidirectional_rnn/bw/lstm_cell/biases/Adam_1 (DT_FLOAT) [400]
BDLSTM_rnn/cell_2/bidirectional_rnn/bw/lstm_cell/weights (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_2/bidirectional_rnn/bw/lstm_cell/weights/Adam (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_2/bidirectional_rnn/bw/lstm_cell/weights/Adam_1 (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_2/bidirectional_rnn/fw/lstm_cell/biases (DT_FLOAT) [400]
BDLSTM_rnn/cell_2/bidirectional_rnn/fw/lstm_cell/biases/Adam (DT_FLOAT) [400]
BDLSTM_rnn/cell_2/bidirectional_rnn/fw/lstm_cell/biases/Adam_1 (DT_FLOAT) [400]
BDLSTM_rnn/cell_2/bidirectional_rnn/fw/lstm_cell/weights (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_2/bidirectional_rnn/fw/lstm_cell/weights/Adam (DT_FLOAT) [300,400]
BDLSTM_rnn/cell_2/bidirectional_rnn/fw/lstm_cell/weights/Adam_1 (DT_FLOAT) [300,400]
rnn_fnn_layer/bias (DT_FLOAT) [100]
rnn_fnn_layer/bias/Adam (DT_FLOAT) [100]
rnn_fnn_layer/bias/Adam_1 (DT_FLOAT) [100]
rnn_fnn_layer/bias_class (DT_FLOAT) [5]
rnn_fnn_layer/bias_class/Adam (DT_FLOAT) [5]
rnn_fnn_layer/bias_class/Adam_1 (DT_FLOAT) [5]
rnn_fnn_layer/weights (DT_FLOAT) [2,100]
rnn_fnn_layer/weights/Adam (DT_FLOAT) [2,100]
rnn_fnn_layer/weights/Adam_1 (DT_FLOAT) [2,100]
rnn_fnn_layer/weights_class (DT_FLOAT) [100,5]
rnn_fnn_layer/weights_class/Adam (DT_FLOAT) [100,5]
rnn_fnn_layer/weights_class/Adam_1 (DT_FLOAT) [100,5]
Instead containing:
global_step (DT_INT32) []
logit_bias (DT_FLOAT) [5]
logit_bias/Adam (DT_FLOAT) [5]
logit_bias/Adam_1 (DT_FLOAT) [5]
logit_weights (DT_FLOAT) [256,5]
logit_weights/Adam (DT_FLOAT) [256,5]
logit_weights/Adam_1 (DT_FLOAT) [256,5]
Any advice would be appreciated.
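A comparison like the one described above can be automated once both sets of variable names are available as plain strings. The sketch below is only an illustration: drop_optimizer_slots and missing_keys are hypothetical helpers, and the two name lists are short hand-copied excerpts from this report rather than the output of a real checkpoint reader.

```python
def drop_optimizer_slots(names):
    """Remove Adam slot variables, which are only needed to resume training."""
    return [n for n in names if not n.endswith(("/Adam", "/Adam_1"))]


def missing_keys(graph_names, checkpoint_names):
    """Names the graph asks for that the checkpoint does not provide."""
    return sorted(set(graph_names) - set(checkpoint_names))


# Excerpt of what the basecalling graph tries to restore (from the error text).
graph_names = [
    "BDLSTM_rnn/cell_0/bidirectional_rnn/fw/lstm_cell/biases",
    "rnn_fnn_layer/weights_class",
]

# What the custom checkpoint was reported to contain.
checkpoint_names = drop_optimizer_slots([
    "global_step",
    "logit_bias",
    "logit_bias/Adam",
    "logit_bias/Adam_1",
    "logit_weights",
    "logit_weights/Adam",
    "logit_weights/Adam_1",
])

absent = missing_keys(graph_names, checkpoint_names)
```

An empty result would mean the checkpoint can satisfy the graph; here every name the graph asks for is absent, which suggests the two scripts build different networks.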

Thanks!

chiron-0.3.tar.gz and Pypi have invalid setup.py

  File "/private/var/folders/3_/7gn1zhl111q5c1cx1dvr2pt00000gp/T/pip-install-tjw5y3nx/chiron/setup.py", line 10
    """
      ^
SyntaxError: invalid syntax

This is OK in current master, but your pip install instructions fail due to this error.

TensorFlow Version

Hi, Haotianteng.

Can you tell me which version of TensorFlow you used to run Chiron? Or does the TensorFlow version not matter? In my experience the version is very important, because different versions of TensorFlow have different APIs. If my TensorFlow version does not match yours, I may not be able to install Chiron.

OP_REQUIRES failed at save_restore_v2_ops.cc:184

Describe the bug
script fails with:
...
INFO:tensorflow:Restoring parameters from chiron/model/DNA_default/model.ckpt-209071
INFO:tensorflow:Restoring parameters from chiron/model/DNA_default/model.ckpt-209071
2019-02-12 20:07:14.446087: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key res_layer1/branch1/conv1_bn/offset not found in checkpoint

To Reproduce
python chiron/entry.py call -i chiron/example_data/DNA/ -m chiron/model/DNA_default/ -o out/

Environment (please complete the following information):
Ubuntu
2xGTX2080ti
chiron: master branch: f21d7be
conda with recent packages

problems about accuracies

Dear Hao Tianteng,
I have two questions, one about the provided pre-trained model and one about training.

  1. I used your DNA model for testing. The testing script I used is chiron_eval.py, and the test data is the data you provide: Lambda_eval_0001.signal and Lambda_eval_0001.label. But the identity rate is only 0.02%. After that I tested with other data, and some reads reach 60%. The overall performance is still not satisfactory and does not reach the accuracy mentioned in the paper. What could be the reason?

  2. I tried to train a new model. The dataset is the pass.tar.gz you provide. With the first method, chiron_rcnn_train.py, the loss decreases from 500 to 4 and then stops decreasing, but the trained model is not accurate. After that I used chiron_train.py. The loss decreases to about 90 and then stops decreasing.

Thank you for your patience to read my issues. I really look forward to your reply.

Bests,
Li Qiuhui

Master still broken on GPU

Traceback (most recent call last):
  File "/usr/local/bin/chiron", line 11, in <module>
    load_entry_point('chiron==0.4', 'console_scripts', 'chiron')()
  File "/usr/local/lib/python3.5/dist-packages/chiron-0.4-py3.5.egg/chiron/entry.py", line 90, in main
    args.func(args)
  File "/usr/local/lib/python3.5/dist-packages/chiron-0.4-py3.5.egg/chiron/entry.py", line 25, in evaluation
    chiron_eval.run(args)
  File "/usr/local/lib/python3.5/dist-packages/chiron-0.4-py3.5.egg/chiron/chiron_eval.py", line 361, in run
    time_dict = unix_time(evaluation)
  File "/usr/local/lib/python3.5/dist-packages/chiron-0.4-py3.5.egg/chiron/utils/unix_time.py", line 21, in unix_time
    function(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/chiron-0.4-py3.5.egg/chiron/chiron_eval.py", line 211, in evaluation
    decode_predict_op, decode_prob_op, decode_idx_op, decode_queue_size = decoding_queue(logits_queue)
  File "/usr/local/lib/python3.5/dist-packages/chiron-0.4-py3.5.egg/chiron/chiron_eval.py", line 317, in decoding_queue
    prob = path_prob(q_logits)
  File "/usr/local/lib/python3.5/dist-packages/chiron-0.4-py3.5.egg/chiron/chiron_eval.py", line 100, in path_prob
    logits_diff = tf.slice(top2_logits, [0, 0, 0], [bsize, seg_len, 1]) - tf.slice(
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 561, in slice
    return gen_array_ops._slice(input_, begin, size, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3125, in _slice
    name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 499, in apply_op
    repr(values), type(values).__name__))
TypeError: Expected int32 passed to parameter 'size' of op 'Slice', got [Dimension(1000), Dimen

This happens when running on GPU. On CPU it appears to be doing something and does not break, but it warns about exceeding 10% of system memory (which makes no sense judging by htop; the machine had 600GB of RAM).

Parallel run & architecture warnings

Hello,

I was wondering if one can run basecalling in parallel? What are the expected run times? I'm calling a 300k read 2Gb Nanopore run and it's still not done after 24h.

I'm also getting some warnings after installing chiron with pip - would this matter for basecalling, or is it only relevant for training? Thanks!

tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

Do training and calling use different models?

I find that model training calls the function run() in chiron_rcnn_train.py, while basecalling calls the function run() in chiron_eval.py. But the structures of the two neural nets in these files seem to be different: chiron_rcnn_train has no RNN layers, whereas chiron_eval does.

Mutable Default Argument in CNN module

In function conv_layer the parameter stride on this line has a mutable default argument [1, 1, 1, 1]. Is this intentional?

This article explains why this is a problem with the following example:

def append_to(element, to=[]):
    to.append(element)
    return to

my_list = append_to(12)
print(my_list)
#[12]
my_other_list = append_to(42)
print(my_other_list)
#[12, 42]

A new list is created once when the function is defined, and the same list is used in each successive call.
Python’s default arguments are evaluated once when the function is defined, not each time the function is called (as they are in, say, Ruby). This means that if you use a mutable default argument and mutate it, you will have mutated that object for all future calls to the function as well.

If this is the behaviour you want (i.e. continually adding elements to that initial list), then that's fine and we can leave it as is.
But if you want the list to be recreated every time the function is called, I would suggest changing it to this:

def conv_layer(indata, ksize, padding, training, name, dilate=1,
               strides=None, bias_term=False, active=True,
               BN=True, active_function='relu'):
    if strides is None:
        strides = [1, 1, 1, 1]
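
To illustrate the same None-sentinel pattern on the smaller append_to example from above, here is a minimal sketch (not code from the Chiron repository). With the sentinel, each call that omits the second argument gets its own fresh list:

```python
def append_to(element, to=None):
    # Build a new list on every call when the caller does not pass one,
    # instead of sharing a single default list across calls.
    if to is None:
        to = []
    to.append(element)
    return to


first = append_to(12)
second = append_to(42)
print(first)
print(second)
```

The two calls no longer share state, so the second list does not carry over the element added by the first call.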

Function getcnnfeature()

Hi Teng, I am afraid we cannot continue the discussion in a closed issue, so I have opened a new issue with a new name.

I know the getcnnfeature() function is called whether or not the number of RNN layers is greater than 0. But in my opinion, getcnnfeature() doesn't include any operations such as matrix multiplication or convolution, just some reshape operations. Can getcnnfeature() really produce the CNN output from the signal input?
Sincerely,
Qian

Error with basecalling with non-default model

Hello,

I have been training Chiron v0.4.2 with a custom dataset. I created a model using chiron_rcnn_train.py (without any apparent issues), but basecalling against this model has been failing with the error below. I trained the model for 20000 steps and have attached the log output. In addition, I modified the checkpoint file to read: model_checkpoint_path: "final.ckpt-20000". The command that I am using to basecall with the non-default model is:

chiron call -i /n/scratch2/dct7/Run12_raw_reads/1 -o test_model --model /home/dct7/chiron/lib/python2.7/site-packages/chiron/model/HIV

Any suggestions?

Thanks,
Damien
training-30411904.txt

WARNING:tensorflow:From /home/dct7/chiron/lib/python2.7/site-packages/chiron/chiron_eval.py:399: __init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
WARNING:tensorflow:From /home/dct7/chiron/lib/python2.7/site-packages/chiron/chiron_eval.py:400: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
2019-01-03 07:10:12.480454: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-03 07:10:26.695457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:08:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-01-03 07:10:26.933233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:09:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-01-03 07:10:26.933596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1
2019-01-03 07:10:27.621913: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-03 07:10:27.621946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1
2019-01-03 07:10:27.621951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y
2019-01-03 07:10:27.621954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N
2019-01-03 07:10:27.622517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10757 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:08:00.0, compute capability: 3.7)
2019-01-03 07:10:27.623001: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10757 MB memory) -> physical GPU (device: 1, name: Tesla K80, pci bus id: 0000:09:00.0, compute capability: 3.7)
WARNING:tensorflow:From /home/dct7/chiron/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py:804: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
WARNING:tensorflow:ParseError: 1:24 : Expected string but found: '\xe2'
WARNING:tensorflow:/home/dct7/chiron/lib/python2.7/site-packages/chiron/model/HIV/checkpoint: Checkpoint ignored
2019-01-03 07:12:27.951255: W tensorflow/core/kernels/queue_base.cc:285] _1_fifo_queue: Skipping cancelled dequeue attempt with queue not closed
2019-01-03 07:12:27.951701: W tensorflow/core/kernels/queue_base.cc:285] _1_fifo_queue: Skipping cancelled dequeue attempt with queue not closed
2019-01-03 07:12:27.954637: W tensorflow/core/kernels/queue_base.cc:285] _1_fifo_queue: Skipping cancelled dequeue attempt with queue not closed
2019-01-03 07:12:27.954767: W tensorflow/core/kernels/queue_base.cc:285] _1_fifo_queue: Skipping cancelled dequeue attempt with queue not closed
2019-01-03 07:12:27.954779: W tensorflow/core/kernels/queue_base.cc:285] _1_fifo_queue: Skipping cancelled dequeue attempt with queue not closed
2019-01-03 07:12:27.955063: W tensorflow/core/kernels/queue_base.cc:285] _1_fifo_queue: Skipping cancelled dequeue attempt with queue not closed
('model_default_path', '/home/dct7/chiron/lib/python2.7/site-packages/chiron/model/DNA_default')
CNN output has the segment length 300, and 256 channels
Traceback (most recent call last):
  File "/home/dct7/chiron/bin/chiron", line 11, in <module>
    sys.exit(main())
  File "/home/dct7/chiron/lib/python2.7/site-packages/chiron/entry.py", line 91, in main
    args.func(args)
  File "/home/dct7/chiron/lib/python2.7/site-packages/chiron/entry.py", line 26, in evaluation
    chiron_eval.run(args)
  File "/home/dct7/chiron/lib/python2.7/site-packages/chiron/chiron_eval.py", line 408, in run
    time_dict = unix_time(evaluation)
  File "/home/dct7/chiron/lib/python2.7/site-packages/chiron/utils/unix_time.py", line 21, in unix_time
    function(*args, **kwargs)
  File "/home/dct7/chiron/lib/python2.7/site-packages/chiron/chiron_eval.py", line 233, in evaluation
    saver.restore(sess, tf.train.latest_checkpoint(FLAGS.model))
  File "/home/dct7/chiron/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1534, in restore
    raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.
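
One possible reading of the "ParseError: 1:24 : Expected string but found: '\xe2'" warning in the log above: 0xE2 is the first byte of a UTF-8 typographic quote such as “ (bytes E2 80 9C), and column 24 is exactly where the opening quote sits on a line that begins with `model_checkpoint_path: `. If the checkpoint file was edited in a tool that substitutes curly quotes, the file would be ignored and the restore path would be None, which matches the final ValueError. The sketch below is only a diagnostic aid under that assumption; `find_non_ascii` is a hypothetical helper, not part of Chiron or TensorFlow, and the demo file is a stand-in for the real checkpoint file.

```python
import os
import tempfile


def find_non_ascii(path):
    """Return (line_number, column, byte_value) for every non-ASCII byte in a file."""
    hits = []
    with open(path, "rb") as handle:
        for line_no, line in enumerate(handle, start=1):
            for col, byte in enumerate(bytearray(line), start=1):
                if byte > 0x7F:
                    hits.append((line_no, col, byte))
    return hits


# Demo: a temporary file mimicking a checkpoint file saved with curly quotes.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "checkpoint")
with open(demo_path, "wb") as handle:
    handle.write(u"model_checkpoint_path: \u201cfinal.ckpt-20000\u201d\n".encode("utf-8"))

hits = find_non_ascii(demo_path)
print(hits[0])  # (1, 24, 226), i.e. byte 0xE2 at line 1, column 24
```

If the real checkpoint file reports any hits, retyping the quotes as plain ASCII `"` in a plain-text editor would be the first thing to try.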

run entry.py call report error

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:
Source code / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • OS: [Ubuntu 18.04]
  • GPU [GTX 1080]
  • Chiron Version [0.4.2]
  • Tensorflow Version [1.8.0-gpu]
  • Python Version [2.7]
  • Other related packages version

Additional context
Add any other context about the problem here.

['call', '-i', 'chiron/example_data/DNA/', '-o', 'exp', '-m', '/home/tiger/codes/Chiron/chiron/model/DNA_default']
('model_default_path', '/home/tiger/codes/Chiron/chiron/model/DNA_default')
Subdirectory processing:: 0it [00:00, ?it/s] INFO:root:chiron/example_data/DNA/read5.fast5 has no reference. | 0/5 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "chiron/entry.py", line 99, in <module>
    main()
  File "chiron/entry.py", line 91, in main
    args.func(args)
  File "chiron/entry.py", line 24, in evaluation
    extract(FLAGS)
  File "/home/tiger/codes/Chiron/chiron/utils/extract_sig_ref.py", line 77, in extract
    if (FLAGS.test_number is not None) and (count >=FLAGS.test_number):
AttributeError: 'Namespace' object has no attribute 'test_number'
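
The AttributeError above suggests that the argument namespace passed from the call subcommand never had a test_number attribute set. A minimal sketch of one defensive workaround, assuming that is the cause: read the attribute with getattr and a default. The helper name `reached_test_limit` is hypothetical and this is not the project's actual fix.

```python
import argparse


def reached_test_limit(flags, count):
    # getattr with a default avoids AttributeError when the parser that
    # produced `flags` never defined a test_number option.
    test_number = getattr(flags, "test_number", None)
    return test_number is not None and count >= test_number


# Namespace without test_number, as in the failing `call` invocation.
call_flags = argparse.Namespace(input="chiron/example_data/DNA/", output="exp")
without_limit = reached_test_limit(call_flags, 3)

# Namespace where the option exists and the limit has been reached.
limited_flags = argparse.Namespace(test_number=2)
with_limit = reached_test_limit(limited_flags, 3)

print(without_limit, with_limit)
```

An alternative with the same effect would be to give the option a default of None on every subparser whose namespace reaches extract().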

Training issues about new version of Chiron

Hi Teng,

Thanks for your previous advice and updates. I used wget to download the data.genomicsresearch.org/Projects/basecall/Human_CHR19/file_batch data. After that, I ran "python chiron/chiron_train.py -i /home/louqian/Chiron/traindata/ -o /home/louqian/Chiron/trainlog/ -m train_test" to train my new model, but there was an error; I have attached the output as error1.txt.
I found that the inference function defined in the chiron_model file takes five parameters, but chiron_train passes only 4 parameters when it calls this function. Maybe this is a bug. I also tried another training method, which uses chiron_rcnn_train.py: I used chiron/utils/raw.py to generate the tfrecords, signal and label data, but I still can't train it. Can you try these two methods again? Any advice will be appreciated.

Sincerely,
Qian

Issue with chiron_train.py

Hello,

Thanks so much for the earlier advice! I've used file_batch.py to convert my training set of fast5 files into batches, but I'm still running into issues with chiron_train.py (the segment length is set to 512 in both cases).

python chiron_train.py --data-dir new_fast5 --log-dir model_update --model-name bb12_new

Filling queue with 500000 signals before starting to train. This will take some time.(200, 1, 512, 1)
Traceback (most recent call last):
  File "chiron_train.py", line 184, in <module>
    run(hparam.HParams(**args.__dict__)) 
  File "chiron_train.py", line 125, in run
    train(hparam)
  File "chiron_train.py", line 81, in train
    step = opt.minimize(loss,global_step = global_step)
NameError: global name 'loss' is not defined

error

Hello,
I keep getting this error. I would appreciate some help.

(chiron-env) -bash-4.2$ python chiron/chiron_rcnn_train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
(400, 1, 300, 1)
[400, 300, 256]
Traceback (most recent call last):
  File "chiron/chiron_rcnn_train.py", line 131, in <module>
    run(flags)
  File "chiron/chiron_rcnn_train.py", line 115, in run
    train()
  File "chiron/chiron_rcnn_train.py", line 70, in train
    opt = train_step(ctc_loss,global_step = global_step)
TypeError: train_step() takes at least 2 arguments (2 given)
(chiron-env) -bash-4.2$
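
The message "takes at least 2 arguments (2 given)" looks contradictory, but Python 2 counts keyword arguments among the "given" ones. It typically means a required positional argument is missing while a keyword argument was supplied. The sketch below reproduces the pattern with a hypothetical signature; the real train_step parameters are not shown in this report, so `lr` is an assumption made purely for illustration.

```python
def train_step(loss, lr, global_step=None):
    # Hypothetical signature: two required positional arguments
    # plus one optional keyword argument.
    return ("minimize", loss, lr, global_step)


try:
    # One positional plus one keyword: the second required positional is missing.
    train_step("ctc_loss", global_step=0)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Supplying every required positional argument resolves the error.
result = train_step("ctc_loss", 0.001, global_step=0)
```

Under Python 3 the message names the missing argument directly, which makes this kind of mismatch between a function definition and its call site easier to spot.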

Chiron base calling error using new model file

ok got it "Fixed it by modifying chiron_eval.py file". thanks

Hi,

I am using TensorFlow version 1.1.0 and Chiron in a virtual env.

I have successfully trained Chiron on our own nanopore sequencing data (RNA), but I get the following error when I try to basecall with the trained model.

I am guessing this is due to a change in TensorFlow version and model file output format, or am I missing something else?

Please suggest a workaround.

Following is the complete error log.

NotFoundError (see above for traceback): Key BDLSTM_rnn/cell_2/bidirectional_rnn/bw/lstm_cell/biases not found in checkpoint
[[Node: save/RestoreV2_8 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_8/tensor_names, save/RestoreV2_8/shape_and_slices)]]
[[Node: save/RestoreV2_43/_35 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_250_save/RestoreV2_43", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

Any help will be greatly appreciated
Raja

training problem

I used the dataset from the website named "pass.tar.gz" and ran file_batch.py to generate the batch file. Then I ran chiron_train.py to train the model. Apart from the file directory parameters, I used the default settings for all other parameters. It always outputs "No valid path found" and the loss is Inf. Can you help me solve the problem? Thanks

base call failed after training

Describe the bug
Dear Developer,
I'm trying to train Chiron on some data for my model organism. It is a fungus, and its distinguishing characteristic is that there is no methylation at all in the genome.

To Reproduce
These are the commands that I used to create the model and run it:

tombo resquiggle fast5/ Fus.contigs.polished.methilated.SRR1609157.sorted.vcf.fasta --processes 4 --num-most-common-errors 5 --dna --overwrite
python3 /usr/local/lib/python3.5/dist-packages/chiron/utils/raw.py --input fast5/ --output b

python3 /usr/local/lib/python3.5/dist-packages/chiron/chiron_rcnn_train.py --data_dir chiron_output --log_dir modelFus --model_name Fus -n 20000 &> logTraining

python3 /usr/local/lib/python3.5/dist-packages/chiron/entry.py call -i fast5/ -o chiron_call/ -m Fus

However, after the last step generated about 40k reads, I tried to map the reads to the genome and, to my surprise, only 3 out of 40k reads mapped. I think that something went wrong somewhere. Can you please help me?

Environment (please complete the following information):

  • OS: 22~16.04.1-Ubuntu SMP
  • GPU Tesla P100-PC
  • Chiron Version 0.4.2
  • Tensorflow Version 1.12.0rc0
  • Python Version python 3.5

Additional context
I used the latest version of tombo to resquiggle the reads

basecall_group

The default basecall_group changed to cwDTWCorrected_000, but none of the provided sample fast5 files at https://data.genomicsresearch.org/Projects/train_set_all/ have this key.

Also, within the description of the basecall_group parameter, "Basecall_1D_000" is described as default. This key does not provide "read_start_rel_to_raw", but "duration" and "start_time". Maybe "start_time" is the same here? For me, "RawGenomeCorrected_000" is the only option that works. Is there somewhere a documented distinction between the options for this parameter?

Among the 34,383 provided sample fast5 files, these are the numbers of files containing each key:
default(cwDTWCorrected_000): 0
RawGenomeCorrected_000: 29,849
Basecall_1D_000: 34,383 (all)

Attention decoder implementation

So, when you implemented the attention decoder, you just plugged in attention_loss as the prediction error?

Another question, in attention_loss function, you describe "label_len:[batch_size] label length, the symbol is included." Do we have to include "end" symbol at the end of each label batch as we pass to this command?

Is it possible to provide a training code with attention mechanism used? I would like to make sure I use the chiron model with attention right. (I want to see how much of a difference there is with attention and non-attention mechanism and see mathematically what is causing that). This would help me GREATLY!

Plus I face this error:

WARNING:tensorflow:From /home/conjugacy/Downloads/project/utils/attention.py:87: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
Traceback (most recent call last):
  File "train_test.py", line 104, in <module>
    error = prediction(logits, seq_length, y)
  File "train_test.py", line 82, in prediction
    error = attention_loss(logits,seq_length,label,FLAGS.batch_size)
  File "/home/conjugacy/Downloads/project/utils/attention.py", line 26, in attention_loss
    logits,attention_list=attention_decoder(inputs,seq_len,cell,max_label_len=max_seq_len,label=label_dense)
  File "/home/conjugacy/Downloads/project/utils/attention.py", line 183, in attention_decoder
    y_hat = tf.one_hot(label[:,index],depth=5,on_value=1.0,off_value=0.0,axis=-1)
TypeError: 'SparseTensor' object is not subscriptable

Originally posted by @nbathreya in #75 (comment)

attention.py

Hi. I am very new to TensorFlow, ML and Chiron. I have a few basic questions about the implementation of attention in the Chiron model.

  • Is this used instead of rnn layers or after all bi-directional rnn layers or is it used with one rnn to encode and then attention to decode and minimize its loss?
  • In attention.py, the attention_decoder outputs logits and attention_list which is taken to calculate the loss. However, the attention_loss function outputs another attention_list. what is this list?
  • Is the loss from attention_loss used directly into the adam optimizer or a ctc loss is further calculated and then fed into adam optimizer?

It would be of the greatest help if I could get some feedback from you. I look forward to hearing back from you as soon as possible.

Nagendra

Test

| Device | Greedy decoder rate (bp/s) | Beam search decoder rate (bp/s), beam_width=50 |
| --- | --- | --- |
| CPU | 21 | 17 |
| GPU | 1652 | 1204 |
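
As a rough way to turn the rates in this table into wall-clock estimates, the sketch below divides a total base count by a decoding rate. It assumes throughput stays constant over the whole run and ignores file loading and other overhead, so treat the result as an order-of-magnitude guide only. The function name and the 1,000,000-base example are illustrative assumptions, not figures from the report.

```python
# Decoding rates in bases per second, copied from the table above.
RATES_BP_PER_S = {
    ("CPU", "greedy"): 21,
    ("CPU", "beam_search"): 17,
    ("GPU", "greedy"): 1652,
    ("GPU", "beam_search"): 1204,
}


def estimated_hours(total_bases, rate_bp_per_s):
    # Seconds needed at a constant rate, converted to hours.
    return total_bases / float(rate_bp_per_s) / 3600.0


example_bases = 1000000  # illustrative size, not taken from the report
for (device, decoder), rate in sorted(RATES_BP_PER_S.items()):
    hours = estimated_hours(example_bases, rate)
    print("%s %s: %.2f h" % (device, decoder, hours))
```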

AttributeError: 'tqdm' object has no attribute 'pos' appears at different stages

Hi Haotian,

I ran into the following error, and it seems to pop up at different times across multiple runs of

$ chiron call -i 736688-175.fast5/ -o ./ -m ~/h.liu/scripts/chiron/chiron/model/DNA_default/ -b 4  -t 6

/home/913/hl0222/.local/lib/python2.7/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Subdirectory processing:: 1it [01:04, 64.80s/it]
2018-07-18 16:32:16.327732: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-18 16:32:16.563024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:84:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-07-18 16:32:16.731698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:85:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-07-18 16:32:16.732188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-07-18 16:32:16.732255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2018-07-18 16:32:16.732287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y
2018-07-18 16:32:16.732309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y
2018-07-18 16:32:16.732338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:84:00.0, compute capability: 3.7)
2018-07-18 16:32:16.732363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
Logits inferencing.:   0%|          | 4/3996 [05:31<91:56:43, 82.92s/it]                  Exception
Logits inferencing:   0%|          | 0/83 [00:00<?, ?it/s] in <bound method tqdm.__del__ of CTC decoding.:   0%|          | 3/3996 [05:31<122:37:42, 110.56s/it]> ignored7, 27.79s/it]Traceback (most recent call last):
  File "/home/913/hl0222/.local/bin/chiron", line 11, in <module>]decoded_q=0, logits_q=0]
    load_entry_point('chiron==0.4', 'console_scripts', 'chiron')()
  File "/home/913/hl0222/.local/lib/python2.7/site-packages/chiron/entry.py", line 91, in main
    args.func(args)
  File "/home/913/hl0222/.local/lib/python2.7/site-packages/chiron/entry.py", line 26, in evaluation
    chiron_eval.run(args)
  File "/home/913/hl0222/.local/lib/python2.7/site-packages/chiron/chiron_eval.py", line 398, in run
    time_dict = unix_time(evaluation)
  File "/home/913/hl0222/.local/lib/python2.7/site-packages/chiron/utils/unix_time.py", line 21, in unix_time
    function(*args, **kwargs)
  File "/home/913/hl0222/.local/lib/python2.7/site-packages/chiron/chiron_eval.py", line 307, in evaluation
    break
  File "/home/913/hl0222/.local/lib/python2.7/site-packages/tqdm-4.23.4-py2.7.egg/tqdm/_tqdm.py", line 878, in __exit__
    self.close()
  File "/home/913/hl0222/.local/lib/python2.7/site-packages/tqdm-4.23.4-py2.7.egg/tqdm/_tqdm.py", line 1087, in close
    self._decr_instances(self)
  File "/home/913/hl0222/.local/lib/python2.7/site-packages/tqdm-4.23.4-py2.7.egg/tqdm/_tqdm.py", line 446, in _decr_instances
    if inst.pos > abs(instance.pos):
AttributeError: 'tqdm' object has no attribute 'pos'

Although it seems to be a tqdm-specific issue, I have no clue how to fix it.
I hope you can help me sort it out.

Thanks a lot.

Huanlee

Training Data

I found the supporting data at gigadb.org (http://gigadb.org/dataset/100425, which is incorrectly linked at academic.oup.com, by the way).
I found the signal and label data, but a mandatory tfrecords file appears to be missing. Or can I use any tfrecord file?
