Hi , I would like to ask about ""num_outputs": {"data": [1,2], "classes": [79,1], "siz

Asking about num_outputs in config about returnn HOT 8 CLOSED

rwth-i6 commented on August 11, 2024

Asking about num_outputs in config

from returnn.

Comments (8)

sanakhamekhem commented on August 11, 2024

Also, how can I extract the log probabilities of the recognized frames so that I can decode them with external decoder software?
Thanks.

pvoigtlaender commented on August 11, 2024

Hi,

The [1,2] and [2,1] entries in data and sizes are best left untouched. The 79 means we have 79 characters (excluding the CTC blank), so yes, it is correct to change it to 107 if you have 107 characters.
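As a rough sketch of that change, assuming the config entry is a JSON-style dictionary like the one quoted in the question: only the first entry of `classes` is replaced. The helper name `set_num_classes` is hypothetical and not part of RETURNN, and the `sizes` value is inferred from the reply above because the quoted config is cut off.

```python
import copy

# num_outputs as quoted in the question. The "sizes" entry is an assumption
# inferred from the reply, since the quoted config is truncated.
num_outputs = {"data": [1, 2], "classes": [79, 1], "sizes": [2, 1]}


def set_num_classes(outputs, num_chars):
    """Return a copy with the character count replaced (hypothetical helper).

    num_chars excludes the CTC blank, matching the convention described above.
    """
    updated = copy.deepcopy(outputs)
    updated["classes"][0] = num_chars
    return updated


new_outputs = set_num_classes(num_outputs, 107)
print(new_outputs)
```

The `data` and `sizes` entries are copied unchanged, which matches the advice not to touch them.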

To extract posteriors, you can use the config_fwd of the mdlstm example.
Then you can do a very simple decoding (without language model, or lexicon) using decode.py to see if it is working.

sanakhamekhem commented on August 11, 2024

Hi, thank you for your response.
I have used this code to decode:
```python
#!/usr/bin/env python

import h5py
import numpy

# One character label per line; the line index is used as the class index.
with open("chars.txt") as f:
  chars = [l.strip() for l in f.readlines()]
print(chars)
with h5py.File("mdlstm_real_test.h5", "r") as f:
  x = f["inputs"][...]
  # Best-scoring class index for every frame.
  x = numpy.argmax(x, axis=1)
  print('x: ', x)
  x = [chars[idx] for idx in x]
  print('char:', x)
  lens = f["seqLengths"][...]
  tags = f["seqTags"][...]
  start = 0
  for tag, len_ in zip(tags, lens):
    # Collapse runs of identical frame labels within this sequence.
    y = []
    last_char = None
    for c in x[start:start+len_]:
      if last_char != c:
        y.append(c)
        last_char = c
    # "|" marks a word boundary and is printed as a space.
    y = [" " if c == "|" else c for c in y]
    output = "".join(y).strip()
    print(tag, output)
    start += len_
```

The char list: ['', '', 'aaA', 'aaE', 'aeA', 'aeE'.....]

But x contains: char: ['|', '|', '|', '|', '|', '|', '|', '|', '|', '|',................
and the decoding produces no output.
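A minimal sketch of why this happens, reusing only the collapse logic from the script above: if the best label for every frame is "|", the run collapses to a single "|", which becomes a space and is then stripped, leaving an empty string. The function name `collapse` and the toy inputs are made up for illustration.

```python
def collapse(frame_chars):
    """Collapse repeated frame labels and map "|" to a space.

    Mirrors the inner loop of the decoding script; the name is hypothetical.
    """
    merged = []
    last_char = None
    for c in frame_chars:
        if c != last_char:
            merged.append(c)
            last_char = c
    merged = [" " if c == "|" else c for c in merged]
    return "".join(merged).strip()


all_bars = collapse(["|"] * 10)          # every frame predicts "|"
mixed = collapse(["a", "a", "|", "b"])   # a toy sequence with real labels
print(repr(all_bars))
print(repr(mixed))
```

So an empty line after each tag is consistent with the network predicting "|" for every frame. This sketch does not show why the network does so.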

pvoigtlaender commented on August 11, 2024

Hi,

please first make sure that the demo including decoding (using the original set of characters) works for you. Is this the case?

sanakhamekhem commented on August 11, 2024

I have not tested it yet. I will verify, thanks.

sanakhamekhem commented on August 11, 2024

I have one more question: when we train the MDLSTM on line images that are written from right to left, as is the case for Arabic text, are there any parameters to change in the code or config?
Thanks

pvoigtlaender commented on August 11, 2024

We don't have such an option unfortunately. One possibility would be to explicitly flip the images and overwrite them with the flipped versions, or do the flipping when converting the raw images to hdf5.

sanakhamekhem commented on August 11, 2024

I did this when converting to hdf5:
img = imread(img_name)
rimg = cv2.flip(img, 1)
....................
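For illustration only, here is a dependency-free sketch of what a horizontal flip does to a line image: the column order of every row is reversed, so the right-hand end of the line ends up on the left. `cv2.flip(img, 1)` performs the same mirroring on an image array. The function name `flip_horizontal` and the toy pixel values are hypothetical.

```python
def flip_horizontal(image):
    """Mirror an image left-to-right.

    image is a list of rows, each row a list of pixel values
    (a stand-in for the array returned by an image reader).
    """
    return [list(reversed(row)) for row in image]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = flip_horizontal(image)
print(flipped)
```

Flipping twice returns the original image, which is a quick sanity check that only the column order changed.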
