yasas1994 / jaeger
Jaeger is a quick and precise tool for detecting phages in sequence assemblies.
License: MIT License
A weird bug I just encountered that only affects input sequences of certain lengths:
import jaegeraa.lib
model=jaegeraa.lib.Predictor()
input = 'GTAACCGTCGAGAGAGATTTGGCTATGCCAATGAGCCCTGAGGATCCCGCTAGTGGTTTACTGTTCACTATTGGGGATATTCATGAGAATGGCAGGAATGTGAGCGTGGTTGAGAGTAGGTTGCTCAATGGTCGAGTGCCTTTTAGAGCTGGAGACTTACGAAACATGAGCTACAATTACTTTATGGAGTTCGTGAGGATCTACGCAACTATCTATATGGAGAATCAACAGCAACTCGTGGCTAAGCTTTCAGGAGATGATTACGAAAGCTCTTCATCATCGTTTCCCGAGAATGAGGAATTGGAATTTGACTTCCTAGCCCAAGCACACAATGGTGTGTACCTAACGATAGAGGAAGTTGTAGCTAAATTTGAGTCAATGAAATTCTCGGGAAAACAACTCAATGCTGAAATTGAAAAATTCGAAAGAATTGGAGTTGATGGATGGAGAACTAACAAAGCTCTCTCCTTTAATGATTTGGTCAAAAGGTTTTGTGGATGCTGCTTAGGTGATGACTGTAACTTTGATTTCCACTATCGAACTTTATTCAAAGTGCTAATAGAGAATAAGCAAATCCCAGCCTACAAGTGTATGGTTCTCCATAAAGTGAATCCAGATAGAATGAAGACTCAGATAAAGATGGTGAACGGGTACACTTTGGAAACAATGTTTAAGACTTTGAACCCTCTCACCATTTTCTTATATCTGGTTTTTGTGCTGAAATGTGGTATTAGTGCCGACAATGTATGTTTATCGTACCAATTATTTGCTATGAATGACGCAGAGCAAGTTGAATTTGAAATTGAAGATTCTTTGCGTCTGGATGAACAGGTACAAATTGGTCAATACTCATGCTATGTTTGGCCTAGTGTCGGAAAATTCTATCCGGAAATTCTGGCGAAGAGAGGTTGCATTGCTGTGAATGATGGAACTACATTTTATATTTTCGTTTCAAGTTCACAGATAGATAAAATTCACCCAGAAGCAGCGTGGTCGGATATGCTACAAGGAGTAGGCAGAAGAGGAGTCGATATTTTAAGTATAGCTGGTCCAACAAAAACCAAGTTTCTGATAAAACATGTGGAAAGTTGTTACGAAACTCTTAAGAGTCCGGAAGATTGGAAAGCTAAATGCAAAGAGTACTATGAGTCCATAAGCTTATATGAGTACATTCTCTTACTGATGGCAGTTGGGTCTCGAGCTGGAATTGAAACCCAGAGGATGAGTAAATATCAGGCCCGAAAGAACAAAATTAGAATGCCAGAAGTGTTGGAGAAGTACATTGAAGTTGAGAAAGCGACCATAGGAAAGCTGTCAAAACCAGCCAAGACCTGTCTAGCAATTGGTGCCGGAGTGGCTATTTTTGGAGTTCTAGCGGGGCTAGGAGTCGGTCTATATAAATTGATAACTCATTTTTCTAAGACCGACTCAGAAGACAATGACATTGAAATAGATGATCTAGTCCCGGAGATGAGTGGAGCTCATGCTTCTGATGAGAATGTTACCACATATGCTGTCAGGAGACAAGTTCCAAAGGTGCGACTAGCCAAACAATTCAAAGTTCGCTCGTCACCAAGCCCATCAGACAATGAACAACCAAAAGTAGATATTCTAGTGCCTGAAATGACAGGGTGCCATGCCAGTGATGAACACCTCACCAAGCATTTTACAAAAAGGAGAGTCACCATGAAGAGAGTTGGAGCTGTCAAGGAATCACACATTGTGACATATGACGAGAATACTCCACATGTGAGACTCATCAGAAATCTGAGAAGAACACGCTTGGCGAGAGCTATTAAGCAAATGGCACAACTTGGAGAACTACCGGACACATTGTCAGAAATTCAAGTGTGGCAACAATATGTAGTGGACAAAGGTATCAGACCAGCTGAACATACAACAGATTTTAGACTCTTCTCAGCTATAGCTGATCAGGAACAAGAGGATCCAGAAGAAATCAATATGGCGAGTGGAGAAACGATGAAATTTGA
CGAAAACAAGTACAATGAGATAGTCCAAGTCGTCAAAGGGATATCGCCAACTAAATCTGACATAGTGACAATGACTACTAAAGGAGCCCACCATACGGCGATCAAGCAGGTTCGAATTGGATACAAAAGTTTAGACAAGGATCCGAATATGGTGAGCATACTTTCTAACCAACTAACCAAAATTAGTTGTGTAATTTTGAACGTGACTCCTGGTAGAACGGCGTACCTAAACGTCATGAGGTTGTGTGGGACATTTGTTGTGTGCCCAGCCCATTATCTAGAAGCTCTAGAAGAGGATGACACGATTTACTTCATATCCTTTTCTGTCTGTATTAAACTCAGATTTCAACCAGACAGAGTGACATTAGTCAACACTCATCAAGATCTTGTAGTGTGGGATTTGGGTAATTCAGTACCACCGGCTATTGACGTTTTGAGCATGATACCAACCGTGGCAGATTGGGACAAGTTTCAAGATGGCCCTGGTGCTTTTGGTGTGACAAAGTACAATGCTCGGTATCCAACAAATTACATAAATACTCTTGATATGATTGAGAGAATCCGAGCCGACACTCAGAACCCCACGGGCATATACAAAATGCTCAACTCCGATCACACAATCACCACAGGTCTTAGATATCAGATGTACTCATTAGAAGGATTCTGTGGTGGGCTGATACTACGGGCTTGCACTAGAATGGTTAGAAAGATTGTGGGACTTCATGTAGCTGCTAGTGCAAATCACGCTATGGGATATGCAGAATGTCTGGTGCAAGAAGATCTTAAACATGCTATAAATAAGCTGTCACCAGATGCAAGGAGTTTAATTATCGGACATCTCAATCCCAAAGTAGAAACAGCCACAAAACAGTGTGGAATTGTGAGGAGCCTTGGAAGTCTAGGGTGCCACGGAAAGGTTACAAGTGAGGACGTGGCGATGACTGCAACAAAGACCACGATCAGAAAGTCTAGAATTTATGGTCTTGTTGGAGATATCAAAACAGAACCCTCAATTTTACATGCTCATGACCCACGTCTCCCTGAGGATCAGATTGGAAAGTGGGACCCAGTGTTTGAAGCTGCCTTGAAGTATGGAACAAGAATAGAACCATTCCCCATTGAAGAAATTCTTGAAGTGGAAGATCATTTATCTATTATACTTAAAGGCATGGACAATACTCTCAAGAAAAGAAATGTCAACAATCTTGAAGTTGGGATAAACGGAATAGATCAATCAGATTATTGGCTTCAGATAGAGACAAATACTTCTCCTGGGTGGCCCTACACAAAAAGAAAACCGAAGGGAGCTGAAGGAAAGAAATGGTTGTTCAAAGAGGTTGGGAACTACCCCTCCGGGAAACCCATTCTAGAAATGGAGGACTCAGGACTCATTGAGAGCTACAATAAAATGTTGAGAGATGCCAAACAGGGTGTAGCTCCCATTGTGGTTACTGTGGAGTGCCCAAAAGATGAACGCAGAAAGTTAAGTAAGATCTACGAACAACCAGCCACCAGGACTTTCACGATTCTCCCGCCTGAAATAAACATTCTCTTTAGGCAATATTTTGGTGACTTTGCCGCCATGATAATGACTAATAGATCAAAATTATTCTGTCAGGTTGGGATAAATCCAGAGAATATGGAATGGAGTGATCTAATGCATGAGTTCCTCCACAAGTCAACACATGGCTTTGCTGGAGACTACTCAAAATTTGATGGAATTGGAGATCCTCAGATTTATCATTCCATAACTCAGGTGGTAAATAACTGGTACGATGATGGGGAAGAAAATGCCAGGACACGTCACGCACTAATTAGTAGTATAATACATAGAGAGGGTATAGTTAAGGAGTATCTTTTCCAGTATTGTCAGGGAATGCCTTCTGGTTTTGCCATGACAGTCATTTTCAACTCCTTCGTGAATTATTACTATTTAGCTATGGCGTGGATGAATTTAATCTCACACTCACCATTGAGTCCCCAATCCACGGTTAGAGATT
TCGACAACTATTGTAAGGTAGTAGTTTATGGGGACGATAACATAGTTTCAGTAGATTTGAACTTTCTAGAATATTACAACCTTAGGACTGTAGCAGCTTATTTGTCTCAATTTGGAGTAACGTACACAGATGACGCAAAGAATCCGATTGAGAAAAGTGTGCCTTTCGTAGAAATAACTTCTGTTTCATTTCTTAAGCGTAGGTGGGTGCCCTTGGGTGGAAGACTTTCAACTATTTACAAGGCACCTTTGGACAAAACTAGCATAGAGGAGCGCCTTCATTGGATAAGGGAGTGCGATAATGACATCGAAGCTCTCAATCAGAATATTGAAAGCGCCCTATATGAAGCAAGCATTCATGGAAAGATCTACTTTGGTGATCTCCTTCAGAGGATCCGGATTGCTTGTGACGCTGTGATGATCCCAGTTCCATCAGTAACATTTAAGGATTGTCACAAAAGGTGGTGGGCTTCCATGACTGGAGGAGCTTTAGATCCAGCTAGTCTAAGTCGGTTGTACTTGGCCGCCGAGAACCAGTTGGTCGACACTCGGAAAGTGTGGAAAGATCGCTTCCTTGGTGAGGATAGGTCTTTAATAGACATGCTGAAGTCAGCTCGTGCTGTTCCTCTAGCTGCCTATCATGTATAAGCCTCACGACTCTGTGCAGAGTATAACAGCACGACCCCAGGTTATCGATAAGTCATGTTGGTAGTCGTCAAGTAAGAATGGGACAGAAAAGAGATTGGAACTTTTAGGATGGAACATCAGTAAACCTACGGGAAACAGAGCTATGGAACTCCCAAGTACTGTAGGTCCCTATTGGTAGTTCACTAAAAGTAACCTTCTGTGTATGATCCCTACCCTGAGTGAACGACAGAAATATGATACACGAGTACTCTCATTAGAGAGAACCGGATTCCACATTGTGGAATCTCCCAGGAATTGACCTGGGTTCCTCACGAAAGTGAGGCGACAACTTGGTCGAAAAACAAGTTCAGTTTAGTTGAGAC'
predictions=model.predict(input,stride=10,fragsize=3000,batch=100)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/macma/220530_kraken2-simreads/scripts/230629/Jaeger/jaegeraa/lib.py", line 47, in predict
for c,(a,s,d,f,l) in enumerate(extract_pred_entry(self.model,idataset)):
File "/projects/macma/220530_kraken2-simreads/scripts/230629/Jaeger/jaegeraa/postprocessing.py", line 30, in extract_pred_entry
for prob,y_pred,id_,pos_,is_last_,index_,clen_ in get_predictions(idataset,model):
File "/projects/macma/220530_kraken2-simreads/scripts/230629/Jaeger/jaegeraa/postprocessing.py", line 9, in get_predictions
logits = model(batch[0]).numpy()
File "/home/AD/macma/miniconda3/envs/jaeger/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/AD/macma/miniconda3/envs/jaeger/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 7209, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "add_5" "f"(type Add).
{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [21,250,128] vs. [21,249,128] [Op:AddV2]
Call arguments received by layer "add_5" " f"(type Add):
• inputs=['tf.Tensor(shape=(21, 250, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)', 'tf.Tensor(shape=(21, 250, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)']
Some of the lengths that don't work:
108,109,120,121,132,133,144,145,156,157,168,169,180,181,192,193,204,205,216,217,228,229,240,241,252,253,264,265,276,277,288,289,300,301,312,313,324,325,336,337,348,349,360,361,372,373,384,385,396,397,408,409,420,421,432,433,444,445,456,457,468,469,480,481,492,493,504,505,516,517,528,529,540,541,552,553,564,565,576,577,588,589,600,601,612,613,624,625,636,637,648,649,660,661,672,673,684,685,696,697,708,709,720,721,732,733,744,745,756,757,768,769,780,781,792,793,804,805,816,817,828,829,840,841,852,853,864,865,876,877,888,889,900,901,912,913,924,925,936,937,948,949,960,961,972,973,984,985,996,997,1008,1009,1020,1021,1032,1033,1044,1045,1056,1057,1068,1069,1080,1081,1092,1093,1104,1105,1116,1117,1128,1129,1140,1141,1152,1153,1164,1165,1176,1177,1188,1189,1200,1201,1212,1213,1224,1225,1236,1237,1248,1249,1260,1261,1272,1273,1284,1285,1296,1297,1308,1309,1320,1321,1332,1333,1344,1345,1356,1357,1368,1369,1380,1381,1392,1393,1404,1405,1416,1417,1428,1429,1440,1441,1452,1453,1464,1465,1476,1477,1488,1489,1500,1501,1512,1513,1524,1525,1536,1537,1548,1549,1560,1561,1572,1573,1584,1585,1596,1597,1608,1609,1620,1621,1632,1633,1644,1645,1656,1657,1668,1669,1680,1681,1692,1693,1704,1705,1716,1717,1728,1729,1740,1741,1752,1753,1764,1765,1776,1777,1788,1789,1800,1801,1812,1813,1824,1825,1836,1837,1848,1849,1860,1861,1872,1873,1884,1885,1896,1897,1908,1909,1920,1921,1932,1933,1944,1945,1956,1957,1968,1969,1980,1981,1992,1993,2004,2005,2016,2017,2028,2029,2040,2041,2052,2053,2064,2065,2076,2077,2088,2089,2100,2101,2112,2113,2124,2125,2136,2137,2148,2149,2160,2161,2172,2173,2184,2185,2196,2197,2208,2209,2220,2221,2232,2233,2244,2245,2256,2257,2268,2269,2280,2281,2292,2293,2304,2305,2316,2317,2328,2329,2340,2341,2352,2353,2364,2365,2376,2377,2388,2389,2400,2401,2412,2413,2424,2425,2436,2437,2448,2449,2460,2461,2472,2473,2484,2485,2496,2497,2508,2509,2520,2521,2532,2533,2544,2545,2556,2557,2568,2569,2580,2581,2592,2593,2604,2605,2616,2617,2628,2629,2640,2641,2652,2653,2664,2665,2676,2677,
2688,2689,2700,2701,2712,2713,2724,2725,2736,2737,2748,2749,2760,2761,2772,2773,2784,2785,2796,2797,2808,2809,2820,2821,2832,2833,2844,2845,2856,2857,2868,2869,2880,2881,2892,2893,2904,2905,2916,2917,2928,2929,2940,2941,2952,2953,2964,2965,2976,2977,2988,2989,3000,3001
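Every length in this list has the form 12k or 12k+1. The sketch below is one possible explanation, and it rests on two assumptions that have not been checked against the Jaeger source. First, the predictor splits each strand into three codon frames the same way the `process_string_textline` function in the training script further down does, by taking every third overlapping trinucleotide. Second, the network then shrinks each frame by a factor of 4, rounding down. That factor is inferred from the traceback: a 3000 bp fragment gives 1000, 999 and 999 codons, which would become 250, 249 and 249. Under these assumptions, the frames reach the `add_5` layer with different lengths exactly when the length is 0 or 1 mod 12. `safe_length` is a hypothetical helper, not part of Jaeger.

```python
# Hypothesis sketch (assumptions, not taken from the Jaeger source):
#  * each strand is split into 3 codon frames by taking every third
#    overlapping trinucleotide, as process_string_textline does;
#  * the network then shrinks each frame by a factor of 4, rounding down
#    (inferred from 1000/999 codons -> 250/249 in the traceback).

def frame_lengths(seq_len):
    """Number of codons in frames 1-3 of a sequence of length seq_len."""
    n_trigrams = max(seq_len - 2, 0)  # overlapping 3-mers in the sequence
    return [len(range(start, n_trigrams, 3)) for start in range(3)]


def downsampled(seq_len, factor=4):
    """Frame lengths after the assumed floor-division downsampling."""
    return [n // factor for n in frame_lengths(seq_len)]


def shapes_mismatch(seq_len, factor=4):
    """True if the frames would reach the Add layer with different lengths."""
    return len(set(downsampled(seq_len, factor))) > 1


def safe_length(seq_len, factor=4):
    """Hypothetical helper: largest length <= seq_len without a mismatch."""
    while seq_len > 0 and shapes_mismatch(seq_len, factor):
        seq_len -= 1
    return seq_len


# Lengths in the reported range that the hypothesis predicts will fail.
failing = [n for n in range(108, 3002) if shapes_mismatch(n)]
print(failing[:6])                              # [108, 109, 120, 121, 132, 133]
print(frame_lengths(3000), downsampled(3000))   # [1000, 999, 999] [250, 249, 249]
print(all(n % 12 in (0, 1) for n in failing))   # True
print(safe_length(3000))                        # 2999
```

If this hypothesis holds, `fragsize=3000` in the call above is itself 0 mod 12, so every full-length fragment would hit the mismatch. A fragment size such as 2999 (or trimming short inputs to `safe_length(len(seq))`) would avoid it. A proper fix would more likely pad or crop the frames to a common length inside the model.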
Hi Yasas,
I was poking around your Jaeger model, but I am running into an issue and was hoping you might know how to fix it. When I try to train the model, the script reads FASTA records from a file on the fly and feeds them to the model on the GPU. However, at the end of epoch 1 (the start of epoch 2), the file pointer is not reset to the beginning of the file, so the records cannot be read again and training is interrupted with this warning:
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 600 batches). You may need to use the repeat() function when building your dataset.
Here is the modified version of your code that I was using:
import jaegeraa.lib
from jaegeraa.nnlib.cmodel import JaegerModel
from jaegeraa.nnlib.layers import WRes_model
from jaegeraa.utils import get_compressed_file_handle
from jaegeraa.preprocessing import fasta_gen, codon_mapper, process_string, c_mapper
from jaegeraa.postprocessing import extract_pred_entry, per_class_preds, average_per_class_score, get_class, pred2string
import tensorflow as tf

def fasta_entries(input_file_handle):
    num = 0
    for i in input_file_handle:
        if i.startswith('>'):
            num += 1
    input_file_handle.seek(0)
    return num

def process_string_textline(string, t1=codon_mapper(), t3=c_mapper(), onehot=True, label_onehot=True, numclasses=4):
    x = tf.strings.split(string, sep=',')
    label = tf.strings.to_number(x[0], tf.int32)
    label = tf.cast(label, dtype=tf.int32)
    forward_strand = tf.strings.bytes_split(x[1])  # split the string
    reverse_strand = t3.lookup(forward_strand[::-1])
    tri_forward = tf.strings.ngrams(forward_strand, ngram_width=3, separator='')
    tri_reverse = tf.strings.ngrams(reverse_strand, ngram_width=3, separator='')
    f1 = t1.lookup(tri_forward[::3])
    f2 = t1.lookup(tri_forward[1::3])
    f3 = t1.lookup(tri_forward[2::3])
    r1 = t1.lookup(tri_reverse[::3])
    r2 = t1.lookup(tri_reverse[1::3])
    r3 = t1.lookup(tri_reverse[2::3])
    if label_onehot:
        label = tf.one_hot(label, depth=numclasses, dtype=tf.float32, on_value=1, off_value=0)
    return {"forward_1": f1, "forward_2": f2, "forward_3": f3, "reverse_1": r1, "reverse_2": r2, "reverse_3": r3}, label

mode = 'GPU'
device = "/gpu:0"
BATCH_SIZE = 10
input_fh = get_compressed_file_handle('../test_lab.fasta')
num = fasta_entries(input_fh)
stratergy = tf.distribute.OneDeviceStrategy(device)
input_dataset = tf.data.Dataset.from_generator(fasta_gen(input_fh, fragsize=100, stride=100, num=num),
                                               output_signature=(tf.TensorSpec(shape=(), dtype=tf.string)))
idataset = input_dataset.map(process_string_textline,
                             num_parallel_calls=tf.data.AUTOTUNE).batch(BATCH_SIZE, num_parallel_calls=tf.data.AUTOTUNE).prefetch(5)
inputs, outputs = WRes_model(input_shape=(None,))
model = JaegerModel(inputs=inputs, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam())
model.fit(idataset, epochs=3)
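`tf.data.Dataset.from_generator` calls the callable it receives again each time the dataset is iterated, so epoch 2 asks the generator to start over. At that point `input_fh` is still at end-of-file after epoch 1. The sketch below shows one possible fix. It assumes, as the `from_generator` call implies, that `fasta_gen(...)` returns a callable that reads from the handle each time it is called. The idea is to wrap that callable so it first rewinds the handle with `seek(0)`, the same trick `fasta_entries` already uses. `rewinding` and `toy_fasta_gen` are hypothetical names. `toy_fasta_gen` only stands in for jaegeraa's `fasta_gen` so that the example runs without Jaeger or TensorFlow.

```python
import io
from typing import IO, Callable, Iterator


def rewinding(file_handle: IO[str],
              gen_factory: Callable[[], Iterator[str]]) -> Callable[[], Iterator[str]]:
    """Wrap a generator factory so that every call starts at the top of the file."""
    def gen() -> Iterator[str]:
        file_handle.seek(0)   # rewind before each pass (i.e. each epoch)
        yield from gen_factory()
    return gen


def toy_fasta_gen(file_handle: IO[str]) -> Callable[[], Iterator[str]]:
    """Hypothetical stand-in for jaegeraa's fasta_gen: yields 'label,sequence'
    strings, treating the header as the label (single-line sequences only)."""
    def gen() -> Iterator[str]:
        label = None
        for line in file_handle:
            line = line.strip()
            if line.startswith(">"):
                label = line[1:]
            elif label is not None and line:
                yield f"{label},{line}"
    return gen


fasta = io.StringIO(">0\nACGTACGT\n>1\nTTGACCAA\n")

plain = toy_fasta_gen(fasta)
print(list(plain()))  # ['0,ACGTACGT', '1,TTGACCAA']
print(list(plain()))  # [] -> the handle is already at EOF, like epoch 2

fixed = rewinding(fasta, toy_fasta_gen(fasta))
print(list(fixed()))  # ['0,ACGTACGT', '1,TTGACCAA']
print(list(fixed()))  # ['0,ACGTACGT', '1,TTGACCAA']
```

In the training script, this would mean passing `rewinding(input_fh, fasta_gen(input_fh, fragsize=100, stride=100, num=num))` to `from_generator` in place of the bare `fasta_gen(...)` call. If the fragments fit in memory, there is a simpler alternative: add `.cache()` after `.map(...)` and before `.batch(...)`. Epochs after the first are then served from the cache instead of re-reading the file.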