jaeger's People

Contributors

lingyi-owl, tish05, yasas1994


jaeger's Issues

Model can't take sequences of certain length as input

A weird bug I just encountered that only occurs for input sequences of certain lengths:

import jaegeraa.lib
model=jaegeraa.lib.Predictor()
input = 'GTAACCGTCGAGAGAGATTTGGCTATGCCAATGAGCCCTGAGGATCCCGCTAGTGGTTTACTGTTCACTATTGGGGATATTCATGAGAATGGCAGGAATGTGAGCGTGGTTGAGAGTAGGTTGCTCAATGGTCGAGTGCCTTTTAGAGCTGGAGACTTACGAAACATGAGCTACAATTACTTTATGGAGTTCGTGAGGATCTACGCAACTATCTATATGGAGAATCAACAGCAACTCGTGGCTAAGCTTTCAGGAGATGATTACGAAAGCTCTTCATCATCGTTTCCCGAGAATGAGGAATTGGAATTTGACTTCCTAGCCCAAGCACACAATGGTGTGTACCTAACGATAGAGGAAGTTGTAGCTAAATTTGAGTCAATGAAATTCTCGGGAAAACAACTCAATGCTGAAATTGAAAAATTCGAAAGAATTGGAGTTGATGGATGGAGAACTAACAAAGCTCTCTCCTTTAATGATTTGGTCAAAAGGTTTTGTGGATGCTGCTTAGGTGATGACTGTAACTTTGATTTCCACTATCGAACTTTATTCAAAGTGCTAATAGAGAATAAGCAAATCCCAGCCTACAAGTGTATGGTTCTCCATAAAGTGAATCCAGATAGAATGAAGACTCAGATAAAGATGGTGAACGGGTACACTTTGGAAACAATGTTTAAGACTTTGAACCCTCTCACCATTTTCTTATATCTGGTTTTTGTGCTGAAATGTGGTATTAGTGCCGACAATGTATGTTTATCGTACCAATTATTTGCTATGAATGACGCAGAGCAAGTTGAATTTGAAATTGAAGATTCTTTGCGTCTGGATGAACAGGTACAAATTGGTCAATACTCATGCTATGTTTGGCCTAGTGTCGGAAAATTCTATCCGGAAATTCTGGCGAAGAGAGGTTGCATTGCTGTGAATGATGGAACTACATTTTATATTTTCGTTTCAAGTTCACAGATAGATAAAATTCACCCAGAAGCAGCGTGGTCGGATATGCTACAAGGAGTAGGCAGAAGAGGAGTCGATATTTTAAGTATAGCTGGTCCAACAAAAACCAAGTTTCTGATAAAACATGTGGAAAGTTGTTACGAAACTCTTAAGAGTCCGGAAGATTGGAAAGCTAAATGCAAAGAGTACTATGAGTCCATAAGCTTATATGAGTACATTCTCTTACTGATGGCAGTTGGGTCTCGAGCTGGAATTGAAACCCAGAGGATGAGTAAATATCAGGCCCGAAAGAACAAAATTAGAATGCCAGAAGTGTTGGAGAAGTACATTGAAGTTGAGAAAGCGACCATAGGAAAGCTGTCAAAACCAGCCAAGACCTGTCTAGCAATTGGTGCCGGAGTGGCTATTTTTGGAGTTCTAGCGGGGCTAGGAGTCGGTCTATATAAATTGATAACTCATTTTTCTAAGACCGACTCAGAAGACAATGACATTGAAATAGATGATCTAGTCCCGGAGATGAGTGGAGCTCATGCTTCTGATGAGAATGTTACCACATATGCTGTCAGGAGACAAGTTCCAAAGGTGCGACTAGCCAAACAATTCAAAGTTCGCTCGTCACCAAGCCCATCAGACAATGAACAACCAAAAGTAGATATTCTAGTGCCTGAAATGACAGGGTGCCATGCCAGTGATGAACACCTCACCAAGCATTTTACAAAAAGGAGAGTCACCATGAAGAGAGTTGGAGCTGTCAAGGAATCACACATTGTGACATATGACGAGAATACTCCACATGTGAGACTCATCAGAAATCTGAGAAGAACACGCTTGGCGAGAGCTATTAAGCAAATGGCACAACTTGGAGAACTACCGGACACATTGTCAGAAATTCAAGTGTGGCAACAATATGTAGTGGACAAAGGTATCAGACCAGCTGAACATACAACAGATTTTAGACTCTTCTCAGCTATAGCTGATCAGGAACAAGAGGATCCAGAAGAAATCAATATGGCGAGTGGAGAAACGATGAAATTTGA
CGAAAACAAGTACAATGAGATAGTCCAAGTCGTCAAAGGGATATCGCCAACTAAATCTGACATAGTGACAATGACTACTAAAGGAGCCCACCATACGGCGATCAAGCAGGTTCGAATTGGATACAAAAGTTTAGACAAGGATCCGAATATGGTGAGCATACTTTCTAACCAACTAACCAAAATTAGTTGTGTAATTTTGAACGTGACTCCTGGTAGAACGGCGTACCTAAACGTCATGAGGTTGTGTGGGACATTTGTTGTGTGCCCAGCCCATTATCTAGAAGCTCTAGAAGAGGATGACACGATTTACTTCATATCCTTTTCTGTCTGTATTAAACTCAGATTTCAACCAGACAGAGTGACATTAGTCAACACTCATCAAGATCTTGTAGTGTGGGATTTGGGTAATTCAGTACCACCGGCTATTGACGTTTTGAGCATGATACCAACCGTGGCAGATTGGGACAAGTTTCAAGATGGCCCTGGTGCTTTTGGTGTGACAAAGTACAATGCTCGGTATCCAACAAATTACATAAATACTCTTGATATGATTGAGAGAATCCGAGCCGACACTCAGAACCCCACGGGCATATACAAAATGCTCAACTCCGATCACACAATCACCACAGGTCTTAGATATCAGATGTACTCATTAGAAGGATTCTGTGGTGGGCTGATACTACGGGCTTGCACTAGAATGGTTAGAAAGATTGTGGGACTTCATGTAGCTGCTAGTGCAAATCACGCTATGGGATATGCAGAATGTCTGGTGCAAGAAGATCTTAAACATGCTATAAATAAGCTGTCACCAGATGCAAGGAGTTTAATTATCGGACATCTCAATCCCAAAGTAGAAACAGCCACAAAACAGTGTGGAATTGTGAGGAGCCTTGGAAGTCTAGGGTGCCACGGAAAGGTTACAAGTGAGGACGTGGCGATGACTGCAACAAAGACCACGATCAGAAAGTCTAGAATTTATGGTCTTGTTGGAGATATCAAAACAGAACCCTCAATTTTACATGCTCATGACCCACGTCTCCCTGAGGATCAGATTGGAAAGTGGGACCCAGTGTTTGAAGCTGCCTTGAAGTATGGAACAAGAATAGAACCATTCCCCATTGAAGAAATTCTTGAAGTGGAAGATCATTTATCTATTATACTTAAAGGCATGGACAATACTCTCAAGAAAAGAAATGTCAACAATCTTGAAGTTGGGATAAACGGAATAGATCAATCAGATTATTGGCTTCAGATAGAGACAAATACTTCTCCTGGGTGGCCCTACACAAAAAGAAAACCGAAGGGAGCTGAAGGAAAGAAATGGTTGTTCAAAGAGGTTGGGAACTACCCCTCCGGGAAACCCATTCTAGAAATGGAGGACTCAGGACTCATTGAGAGCTACAATAAAATGTTGAGAGATGCCAAACAGGGTGTAGCTCCCATTGTGGTTACTGTGGAGTGCCCAAAAGATGAACGCAGAAAGTTAAGTAAGATCTACGAACAACCAGCCACCAGGACTTTCACGATTCTCCCGCCTGAAATAAACATTCTCTTTAGGCAATATTTTGGTGACTTTGCCGCCATGATAATGACTAATAGATCAAAATTATTCTGTCAGGTTGGGATAAATCCAGAGAATATGGAATGGAGTGATCTAATGCATGAGTTCCTCCACAAGTCAACACATGGCTTTGCTGGAGACTACTCAAAATTTGATGGAATTGGAGATCCTCAGATTTATCATTCCATAACTCAGGTGGTAAATAACTGGTACGATGATGGGGAAGAAAATGCCAGGACACGTCACGCACTAATTAGTAGTATAATACATAGAGAGGGTATAGTTAAGGAGTATCTTTTCCAGTATTGTCAGGGAATGCCTTCTGGTTTTGCCATGACAGTCATTTTCAACTCCTTCGTGAATTATTACTATTTAGCTATGGCGTGGATGAATTTAATCTCACACTCACCATTGAGTCCCCAATCCACGGTTAGAGATT
TCGACAACTATTGTAAGGTAGTAGTTTATGGGGACGATAACATAGTTTCAGTAGATTTGAACTTTCTAGAATATTACAACCTTAGGACTGTAGCAGCTTATTTGTCTCAATTTGGAGTAACGTACACAGATGACGCAAAGAATCCGATTGAGAAAAGTGTGCCTTTCGTAGAAATAACTTCTGTTTCATTTCTTAAGCGTAGGTGGGTGCCCTTGGGTGGAAGACTTTCAACTATTTACAAGGCACCTTTGGACAAAACTAGCATAGAGGAGCGCCTTCATTGGATAAGGGAGTGCGATAATGACATCGAAGCTCTCAATCAGAATATTGAAAGCGCCCTATATGAAGCAAGCATTCATGGAAAGATCTACTTTGGTGATCTCCTTCAGAGGATCCGGATTGCTTGTGACGCTGTGATGATCCCAGTTCCATCAGTAACATTTAAGGATTGTCACAAAAGGTGGTGGGCTTCCATGACTGGAGGAGCTTTAGATCCAGCTAGTCTAAGTCGGTTGTACTTGGCCGCCGAGAACCAGTTGGTCGACACTCGGAAAGTGTGGAAAGATCGCTTCCTTGGTGAGGATAGGTCTTTAATAGACATGCTGAAGTCAGCTCGTGCTGTTCCTCTAGCTGCCTATCATGTATAAGCCTCACGACTCTGTGCAGAGTATAACAGCACGACCCCAGGTTATCGATAAGTCATGTTGGTAGTCGTCAAGTAAGAATGGGACAGAAAAGAGATTGGAACTTTTAGGATGGAACATCAGTAAACCTACGGGAAACAGAGCTATGGAACTCCCAAGTACTGTAGGTCCCTATTGGTAGTTCACTAAAAGTAACCTTCTGTGTATGATCCCTACCCTGAGTGAACGACAGAAATATGATACACGAGTACTCTCATTAGAGAGAACCGGATTCCACATTGTGGAATCTCCCAGGAATTGACCTGGGTTCCTCACGAAAGTGAGGCGACAACTTGGTCGAAAAACAAGTTCAGTTTAGTTGAGAC'
predictions=model.predict(input,stride=10,fragsize=3000,batch=100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/projects/macma/220530_kraken2-simreads/scripts/230629/Jaeger/jaegeraa/lib.py", line 47, in predict
    for c,(a,s,d,f,l) in enumerate(extract_pred_entry(self.model,idataset)):
  File "/projects/macma/220530_kraken2-simreads/scripts/230629/Jaeger/jaegeraa/postprocessing.py", line 30, in extract_pred_entry
    for prob,y_pred,id_,pos_,is_last_,index_,clen_ in get_predictions(idataset,model):
  File "/projects/macma/220530_kraken2-simreads/scripts/230629/Jaeger/jaegeraa/postprocessing.py", line 9, in get_predictions
    logits = model(batch[0]).numpy()
  File "/home/AD/macma/miniconda3/envs/jaeger/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/AD/macma/miniconda3/envs/jaeger/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 7209, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "add_5" "f"(type Add).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [21,250,128] vs. [21,249,128] [Op:AddV2]

Call arguments received by layer "add_5" "                 f"(type Add):
  • inputs=['tf.Tensor(shape=(21, 250, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)', 'tf.Tensor(shape=(21, 250, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)', 'tf.Tensor(shape=(21, 249, 128), dtype=float32)']

Some of the lengths that don't work:

108,109,120,121,132,133,144,145,156,157,168,169,180,181,192,193,204,205,216,217,228,229,240,241,252,253,264,265,276,277,288,289,300,301,312,313,324,325,336,337,348,349,360,361,372,373,384,385,396,397,408,409,420,421,432,433,444,445,456,457,468,469,480,481,492,493,504,505,516,517,528,529,540,541,552,553,564,565,576,577,588,589,600,601,612,613,624,625,636,637,648,649,660,661,672,673,684,685,696,697,708,709,720,721,732,733,744,745,756,757,768,769,780,781,792,793,804,805,816,817,828,829,840,841,852,853,864,865,876,877,888,889,900,901,912,913,924,925,936,937,948,949,960,961,972,973,984,985,996,997,1008,1009,1020,1021,1032,1033,1044,1045,1056,1057,1068,1069,1080,1081,1092,1093,1104,1105,1116,1117,1128,1129,1140,1141,1152,1153,1164,1165,1176,1177,1188,1189,1200,1201,1212,1213,1224,1225,1236,1237,1248,1249,1260,1261,1272,1273,1284,1285,1296,1297,1308,1309,1320,1321,1332,1333,1344,1345,1356,1357,1368,1369,1380,1381,1392,1393,1404,1405,1416,1417,1428,1429,1440,1441,1452,1453,1464,1465,1476,1477,1488,1489,1500,1501,1512,1513,1524,1525,1536,1537,1548,1549,1560,1561,1572,1573,1584,1585,1596,1597,1608,1609,1620,1621,1632,1633,1644,1645,1656,1657,1668,1669,1680,1681,1692,1693,1704,1705,1716,1717,1728,1729,1740,1741,1752,1753,1764,1765,1776,1777,1788,1789,1800,1801,1812,1813,1824,1825,1836,1837,1848,1849,1860,1861,1872,1873,1884,1885,1896,1897,1908,1909,1920,1921,1932,1933,1944,1945,1956,1957,1968,1969,1980,1981,1992,1993,2004,2005,2016,2017,2028,2029,2040,2041,2052,2053,2064,2065,2076,2077,2088,2089,2100,2101,2112,2113,2124,2125,2136,2137,2148,2149,2160,2161,2172,2173,2184,2185,2196,2197,2208,2209,2220,2221,2232,2233,2244,2245,2256,2257,2268,2269,2280,2281,2292,2293,2304,2305,2316,2317,2328,2329,2340,2341,2352,2353,2364,2365,2376,2377,2388,2389,2400,2401,2412,2413,2424,2425,2436,2437,2448,2449,2460,2461,2472,2473,2484,2485,2496,2497,2508,2509,2520,2521,2532,2533,2544,2545,2556,2557,2568,2569,2580,2581,2592,2593,2604,2605,2616,2617,2628,2629,2640,2641,2652,2653,2664,2665,2676,2677,
2688,2689,2700,2701,2712,2713,2724,2725,2736,2737,2748,2749,2760,2761,2772,2773,2784,2785,2796,2797,2808,2809,2820,2821,2832,2833,2844,2845,2856,2857,2868,2869,2880,2881,2892,2893,2904,2905,2916,2917,2928,2929,2940,2941,2952,2953,2964,2965,2976,2977,2988,2989,3000,3001
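
The lengths listed above all appear to satisfy n % 12 in {0, 1}. A minimal sketch of one possible explanation follows, under stated assumptions. The trigram slicing mirrors the `[::3]`, `[1::3]`, `[2::3]` pattern in the preprocessing code quoted in the training issue on this page. The factor-of-4 floor-division downsampling is a guess inferred from the 250 vs. 249 shapes in the traceback, not something confirmed from the model source. `frame_lengths` and `frames_mismatch` are hypothetical helper names, not part of jaegeraa.

```python
def frame_lengths(n):
    """Lengths of the three codon frames for a sequence of n bases."""
    trigrams = max(n - 2, 0)  # number of overlapping 3-mers in n bases
    # Frame i corresponds to trigrams[i::3]; range() gives the slice length.
    return [len(range(i, trigrams, 3)) for i in range(3)]


def frames_mismatch(n, reduction=4):
    """True if the frames differ in length after an ASSUMED floor-division
    downsampling by `reduction` (inferred from the traceback shapes)."""
    reduced = {length // reduction for length in frame_lengths(n)}
    return len(reduced) > 1


# Scan the same range of lengths that the report covers.
failing = [n for n in range(108, 3002) if frames_mismatch(n)]

print(frame_lengths(3000))  # [1000, 999, 999]
print(failing[:6])          # [108, 109, 120, 121, 132, 133]
```

If the assumption holds, a fragment of 3000 bases gives frames of 1000, 999 and 999 codons, which reduce to 250, 249 and 249 and would explain why the Add layer receives incompatible shapes.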

Problems with training

Hi Yasas,
I was poking around your Jaeger model, but I am running into an issue that I was hoping you might know how to fix. When I try to train the model, the script reads FASTA records from a file on the fly and feeds them to the model on the GPU. However, at the end of epoch 1 the file pointer is not reset to the beginning of the file before epoch 2 starts, so there is nothing left to read and training terminates with this message:
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 600 batches). You may need to use the repeat() function when building your dataset.

Here is the modified version of your code that I was using:

import jaegeraa.lib
from jaegeraa.nnlib.cmodel import JaegerModel
from jaegeraa.nnlib.layers import WRes_model
from jaegeraa.utils import get_compressed_file_handle
from jaegeraa.preprocessing import fasta_gen, codon_mapper, process_string, c_mapper
from jaegeraa.postprocessing import extract_pred_entry, per_class_preds, average_per_class_score, get_class, pred2string
import tensorflow as tf 

def fasta_entries(input_file_handle):
    num = 0
    for i in input_file_handle: 
        if i.startswith('>'):
            num+=1
    input_file_handle.seek(0)
    return num

def process_string_textline(string, t1=codon_mapper(), t3=c_mapper(),onehot=True, label_onehot=True, numclasses=4):
    
    x = tf.strings.split(string, sep=',')

    label= tf.strings.to_number(x[0], tf.int32)
    label= tf.cast(label, dtype=tf.int32)

    forward_strand = tf.strings.bytes_split(x[1])#split the string 
    reverse_strand = t3.lookup(forward_strand[::-1])

    tri_forward = tf.strings.ngrams(forward_strand,ngram_width=3,separator='')
    tri_reverse = tf.strings.ngrams(reverse_strand,ngram_width=3,separator='')

    f1=t1.lookup(tri_forward[::3])
    f2=t1.lookup(tri_forward[1::3])
    f3=t1.lookup(tri_forward[2::3])

    r1=t1.lookup(tri_reverse[::3])
    r2=t1.lookup(tri_reverse[1::3])
    r3=t1.lookup(tri_reverse[2::3])

    if label_onehot:
        label = tf.one_hot(label, depth=numclasses, dtype=tf.float32, on_value=1, off_value=0)

    return {"forward_1": f1, "forward_2": f2, "forward_3": f3, "reverse_1": r1, "reverse_2" : r2, "reverse_3" : r3 }, label


mode = 'GPU'
device = "/gpu:0"
BATCH_SIZE = 10


input_fh = get_compressed_file_handle('../test_lab.fasta')
num = fasta_entries(input_fh)
stratergy = tf.distribute.OneDeviceStrategy(device)

input_dataset = tf.data.Dataset.from_generator(fasta_gen(input_fh,fragsize=100,stride=100,num=num),
                                                    output_signature=(tf.TensorSpec(shape=(), dtype=tf.string)))

idataset = input_dataset.map(process_string_textline,
                            num_parallel_calls=tf.data.AUTOTUNE).batch(BATCH_SIZE, num_parallel_calls=tf.data.AUTOTUNE).prefetch(5)

inputs, outputs = WRes_model(input_shape=(None,))
model = JaegerModel(inputs=inputs, outputs=outputs)
model.compile(optimizer = tf.keras.optimizers.Adam())
model.fit(idataset, epochs=3)
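
For context, `tf.data.Dataset.from_generator` expects a callable and invokes it again each time the dataset is iterated, so a generator that reads from an already-exhausted file handle yields nothing on the second pass. The sketch below shows the general idea of rewinding the handle at the start of every pass, in plain Python and without TensorFlow. The internals of `fasta_gen` are not shown in this report, so `fasta_records` and `rewinding_generator` are hypothetical stand-ins, and it is an assumption that the handle returned by `get_compressed_file_handle` supports `seek(0)`.

```python
import io


def fasta_records(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rewinding_generator(handle):
    """Return a zero-argument callable that rewinds the handle on each call."""
    def gen():
        handle.seek(0)  # reset the file pointer so every pass starts at the top
        yield from fasta_records(handle)
    return gen


# An in-memory file stands in for the real FASTA file.
handle = io.StringIO(">a\nACGT\n>b\nGGCC\n")
gen = rewinding_generator(handle)

epoch_1 = list(gen())  # first pass over the file
epoch_2 = list(gen())  # second pass sees the same records again
leftover = list(fasta_records(handle))  # no rewind: handle is at EOF, so empty
```

A callable of this shape could be passed to `from_generator` in place of one that closes over a handle that is never rewound. Calling `repeat()` on the dataset, as the warning suggests, only helps if each new pass can actually produce data.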
