
pylaia's Introduction

PyLaia

PyLaia is a device-agnostic, PyTorch-based deep learning toolkit for handwritten document analysis.

It is also a successor to Laia.


Python: 3.8+ PyTorch: 1.13.0+ pre-commit: enabled Code style: black

Get started by having a look at our Wiki!

Several (mostly undocumented) examples of its use are provided at PyLaia-examples.

Installation

To install PyLaia, follow this recipe:

git clone https://github.com/jpuigcerver/PyLaia
cd PyLaia
pip install -e .

Please note that the CUDA version of nnutils (nnutils-pytorch-cuda) is installed by default. If you do not have a GPU, you should install the CPU version (nnutils-pytorch).

The following Python scripts will be installed on your system:

  • pylaia-htr-create-model
  • pylaia-htr-train-ctc
  • pylaia-htr-decode-ctc
  • pylaia-htr-netout

Acknowledgments

Work on this toolkit was financially supported by the Pattern Recognition and Human Language Technology (PRHLT) Research Center.

BibTeX

@misc{puigcerver2018pylaia,
  author = {Joan Puigcerver and Carlos Mocholí},
  title = {PyLaia},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/jpuigcerver/PyLaia}},
  commit = {commit SHA}
}

pylaia's People

Contributors

ahector, carmocca, jpuigcerver, la0, starride-teklia, stweil, vbosch, yschneider-sinneria


pylaia's Issues

Inference model for English words

Hi there,

I'm new to this repo and I wanted to ask if there are any inference models I can use for handwritten text recognition of English words? If yes, how do I use them?

Thank you!

Print stats on the dataset before training

It could be useful to have some stats on the dataset before training, something like the following (a rough sketch is included after the list):

  • number of images (training/validation sets)
  • number of characters/words (training/validation sets)
  • character distribution (training set)
  • unknown characters
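
A rough, self-contained sketch of such a report (plain Python, no PyLaia APIs; the file names and the <space> delimiter are assumptions borrowed from other issues below, and the table is assumed to already be tokenized into symbols):

from collections import Counter

def table_stats(path, delimiter="<space>"):
    # Table format assumed: "<image_id> <sym> <sym> ...", one line per image.
    n_lines = 0
    sym_counter = Counter()
    for line in open(path, encoding="utf-8"):
        parts = line.split()
        if not parts:
            continue
        n_lines += 1
        sym_counter.update(parts[1:])  # everything after the image id
    n_syms = sum(sym_counter.values())
    n_words = sym_counter[delimiter] + n_lines  # roughly one word per line plus one per <space>
    print(f"{path}: {n_lines} images, {n_syms} symbols, ~{n_words} words")
    print("character distribution:", sym_counter.most_common(10))

table_stats("tr.txt")   # training set
table_stats("va.txt")   # validation set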

Looking for maintainers?

Hi, I'm a software engineer at Teklia; we are currently using PyLaia and planning to develop new features.
Would it be possible to be added as contributors so that we can publish patches and update/maintain the code a bit? Thanks.
Solène

compute_loss error get NoneType

Note to prospective users of PyLaia:
if you encounter this error, it is most likely because your text contains extra characters that are not listed in your syms_ctc.txt file.
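
Before digging into the trace below, a quick sanity check along those lines (a sketch, not part of PyLaia; the file names follow the paths in the log and may differ in your setup):

def load_syms(path):
    # syms_ctc.txt format: "<symbol> <integer id>" per line.
    return {line.split()[0] for line in open(path, encoding="utf-8") if line.strip()}

def missing_symbols(table_path, syms_path):
    syms = load_syms(syms_path)
    missing = set()
    for line in open(table_path, encoding="utf-8"):
        missing.update(s for s in line.split()[1:] if s not in syms)
    return missing

print(missing_symbols("tr.txt", "syms_ctc.txt"))  # should print an empty set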

2020-08-13 08:24:04,191 INFO laia.common.arguments : 
{'add_logsoftmax_to_loss': True,
 'batch_size': 10,
 'checkpoint': 'ckpt.lowest-valid-cer*',
 'delimiters': ['@'],
 'gpu': 1,
 'img_dirs': ['/content/getuigenissen/imgs/textlines_h128'],
 'iterations_per_update': 1,
 'learning_rate': 0.0003,
 'logging_also_to_stderr': 20,
 'logging_config': None,
 'logging_file': '/content/getuigenissen/log/training-trace.log',
 'logging_level': 20,
 'logging_overwrite': False,
 'max_epochs': None,
 'max_nondecreasing_epochs': 20,
 'model_filename': 'model',
 'momentum': 0,
 'num_rolling_checkpoints': 3,
 'print_args': True,
 'save_checkpoint_interval': 10,
 'seed': 74565,
 'show_progress_bar': True,
 'syms': <_io.TextIOWrapper name='/content/getuigenissen/imgs/syms_ctc.txt' mode='r' encoding='UTF-8'>,
 'tr_txt_table': <_io.TextIOWrapper name='/content/getuigenissen/imgs/tr.txt' mode='r' encoding='UTF-8'>,
 'train_path': '/content/getuigenissen',
 'train_samples_per_epoch': None,
 'use_baidu_ctc': False,
 'use_distortions': False,
 'va_txt_table': <_io.TextIOWrapper name='/content/getuigenissen/imgs/va.txt' mode='r' encoding='UTF-8'>,
 'valid_samples_per_epoch': None}
2020-08-13 08:24:04,297 INFO laia.common.loader : Loaded model /content/getuigenissen/model
2020-08-13 08:24:12,390 INFO laia : Training data transforms:
Compose(
    vision.Convert(mode=L)
    vision.Invert()
    ToTensor()
)
Train:  24% 114/485 [01:58<06:24,  1.04s/it]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/engine/engine.py", line 232, in exception_catcher
    yield
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/engine/trainer.py", line 222, in compute_loss
    loss = self._criterion(batch_output, batch_target, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/losses/ctc_loss.py", line 340, in forward
    acts, labels, act_lens, label_lens = CTCPrepare.apply(
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/losses/ctc_loss.py", line 140, in forward
    labels = torch.tensor(
TypeError: an integer is required (got type NoneType)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/pylaia-htr-train-ctc", line 4, in <module>
    __import__('pkg_resources').run_script('laia==0.1.0', 'pylaia-htr-train-ctc')
  File "/usr/local/lib/python3.8/dist-packages/pkg_resources/__init__.py", line 665, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.8/dist-packages/pkg_resources/__init__.py", line 1463, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/EGG-INFO/scripts/pylaia-htr-train-ctc", line 279, in <module>
    experiment.run()
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/experiments/experiment.py", line 76, in run
    self._tr_engine.run()
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/hooks/action.py", line 29, in wrapper
    return func(
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/engine/trainer.py", line 126, in run
    self._run_epoch()
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/engine/engine.py", line 224, in _run_epoch
    self._run_iteration(it, batch)
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/engine/trainer.py", line 180, in _run_iteration
    batch_loss = self.compute_loss(batch, batch_output, batch_target)
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/engine/trainer.py", line 228, in compute_loss
    return loss
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.8/dist-packages/laia-0.1.0-py3.8.egg/laia/engine/engine.py", line 240, in exception_catcher
    raise_from(wrapper, e)
  File "/usr/local/lib/python3.8/dist-packages/torch/_six.py", line 53, in raise_from
    raise value from from_value
laia.engine.engine_exception.EngineException: Exception "TypeError('an integer is required (got type NoneType)')" raised during epoch 0, iteration 114. The batch that caused the exception was: ['32505198-73', '32505199-31', '32500853-08', '32501271-05', '32505219-03', '32505217-04', '32505165-26', '32500973-14', '32500665-39', '32505171-19']

Decoding nn output with dictionary and keyword spotting

I cannot find any actual examples or documentation of how to use PyLaia for keyword spotting or for decoding the output with a dictionary and a language model.
Are there any examples of how to do that?
I would be grateful for any info.

Question: Is there a parameter to change the data augmentation probability?

Hi Joan Puigcerver

I'm trying to figure out how I can modify the probability with which the distortions are applied, in order to reproduce the exact setup used in your ICDAR 2017 work. I am checking the code of the Dilation and Erosion morphology techniques, but I do not yet understand how to set the chance of applying these transformations to 0.5 (or whether this is already the case); maybe I can use the alpha and beta parameters.

Thanks,
Dayvid

Add a command for evaluation

I would like to create a new command, pylaia-htr-evaluate, to compute CER/WER; a rough sketch of the metric computation is included after the example output below.

  • Command:
pylaia-htr-evaluate --config evaluation_config.yaml
  • Configuration file:
eval:
   labels:
      train: path/to/train_labels.txt
      val: path/to/val_labels.txt
      test: path/to/test_labels.txt
   predictions:
      train: path/to/train_predictions.txt
      val: path/to/val_predictions.txt
      test: path/to/test_predictions.txt
  • Output formatted in markdown:
| Set   | CER (%) | WER (%) |
|-------|---------|---------|
| train | 2.40    | 8.10    |
| val   | 7.45    | 19.75   |
| test  | 6.55    | 18.2    |
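
As a starting point, a minimal sketch of the metric computation itself (plain Python, no PyLaia APIs; file names are placeholders and both tables are assumed to be in "<id> <token> <token> ..." format):

def levenshtein(a, b):
    # Standard edit distance between two token sequences.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def load_table(path):
    with open(path, encoding="utf-8") as f:
        return {p[0]: p[1:] for p in (line.split() for line in f) if p}

def error_rate(labels, predictions):
    errors = total = 0
    for key, ref in labels.items():
        hyp = predictions.get(key, [])
        errors += levenshtein(ref, hyp)
        total += len(ref)
    return 100.0 * errors / max(total, 1)

labels = load_table("path/to/test_labels.txt")
preds = load_table("path/to/test_predictions.txt")
print(f"WER (%): {error_rate(labels, preds):.2f}")  # with word tokens; use character tokens for CER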

Issues using images with more than one Channel

At the moment I am trying to use PyLaia with images that have more than one channel.

Unfortunately, the training script pylaia-htr-train-ctc loads both training and validation images with image_channels=1 (for example, in line 168 of that script). If I leave this value at 1, although the model is set to have 3 channels and the images used have 3 channels, I get the following exception:

Exception "AssertionError('Input image depth (1) does not match the expected (3)')"

If I set the value to 3 I get errors from the padding_collater.py:

File "/usr/local/lib/python3.7/dist-packages/laia-0.1.0-py3.7.egg/laia/data/padding_collater.py", line 31, in _get_max_size_and_check_batch_tensor
assert maxv == expected_shape[d] and minv == expected_shape[d]

Is there a way to work around this issue?
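
For what it's worth, a possible workaround while the loader is hard-coded to one channel (a sketch using Pillow; paths are placeholders): convert the images to grayscale up front so they match image_channels=1.

from pathlib import Path
from PIL import Image

src, dst = Path("imgs_rgb"), Path("imgs_gray")
dst.mkdir(exist_ok=True)
for path in src.glob("*.jpg"):
    Image.open(path).convert("L").save(dst / path.name)  # "L" = single-channel, 8-bit grayscale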

Fix pylaia-create-model-htr

The script is wrong.

This should be a fixed input height (passed as an argument):

hyperparameters = (args.lstm_hidden_size,

This should be the number of channels of the input images (passed as an option, default = 1):

args.lstm_num_layers, len(syms),

The options regarding the LSTM layers should go there:

256, 5, 0.5, 0.5)

The script should accept the same options as the original Laia script (one should be able to create a model with more layers, enable/disable dropout, etc.).
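
A sketch of the kind of interface the issue asks for (the option names are hypothetical, not the actual PyLaia CLI):

import argparse

parser = argparse.ArgumentParser(description="pylaia-htr-create-model (interface sketch)")
parser.add_argument("fixed_input_height", type=int,
                    help="fixed height of the input images")
parser.add_argument("--num_input_channels", type=int, default=1,
                    help="number of channels of the input images")
parser.add_argument("--lstm_hidden_size", type=int, default=256)
parser.add_argument("--lstm_num_layers", type=int, default=5)
parser.add_argument("--lstm_dropout", type=float, default=0.5)
parser.add_argument("--linear_dropout", type=float, default=0.5)
args = parser.parse_args()
print(args)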

Error about missing symbols in the symbols table

Hi, maybe I don't completely understand the structure of the required files, but I tried to guess it by looking at the code.

The command I use is:
./laia/scripts/htr/train_ctc.py syms.txt [./train_rus2/images/] tr.txt val.txt

Where
./train_rus2/images/ contains jpg files with names such as 00000003_1.jpg, etc.

tr.txt and val.txt contain strings like that:
00000003_1 въ доме московскаго универ
00000003_10 валентина
(<image filename without .jpg> <string that corresponds to the image>)

and syms.txt contains:

⊗ 0
) 1
+ 2
/ 3
0 4
1 5
2 6
3 7
4 8
5 9
6 10
7 11
8 12
9 13
[ 14
] 15
i 16
k 17
l 18
| 19
× 20
ǂ 21
а 22
б 23
в 24
г 25
д 26
е 27
ж 28
з 29
и 30
й 31
к 32
л 33
м 34
н 35
о 36
п 37
р 38
с 39
т 40
у 41
ф 42
х 43
ц 44
ч 45
ш 46
щ 47
ъ 48
ы 49
ь 50
э 51
ю 52
я 53
і 54
– 55
. 56
<space> 57

What I get is

[2021-03-03 16:08:03, ... ERROR laia.data.transforms.text.transforms] Could not find ... in the symbols table
[2021-03-03 16:08:03,856 ERROR laia.data.transforms.text.transforms] Could not find "20" in the symbols table
[2021-03-03 16:08:04,049 CRITICAL laia] Uncaught exception:
Traceback (most recent call last):
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_exception.py", line 27, in exception_catcher
    yield
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_module.py", line 115, in compute_loss
    batch_loss = self.criterion(batch_y_hat, batch_y, **kwargs)
  File "/home/pchela/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pchela/TextRecognition/PyLaia/laia/losses/ctc_loss.py", line 102, in forward
    y = torch.tensor(list(itertools.chain.from_iterable(y)), dtype=torch.int)
TypeError: an integer is required (got type NoneType)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./laia/scripts/htr/train_ctc.py", line 204, in <module>
    main()
  File "./laia/scripts/htr/train_ctc.py", line 200, in main
    run(**args)
  File "./laia/scripts/htr/train_ctc.py", line 128, in run
    trainer.fit(engine_module, datamodule=data_module)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 468, in fit
    results = self.accelerator_backend.train()
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py", line 61, in train
    results = self.train_or_test()
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 66, in train_or_test
    results = self.trainer.train()
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 490, in train
    self.run_sanity_check(self.get_model())
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 697, in run_sanity_check
    _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 613, in run_evaluation
    output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 178, in evaluation_step
    output = self.trainer.accelerator_backend.validation_step(args)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py", line 77, in validation_step
    output = self.trainer.model.validation_step(*args)
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/htr_engine_module.py", line 72, in validation_step
    result = super().validation_step(batch, *args, **kwargs)
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_module.py", line 150, in validation_step
    batch_loss = self.compute_loss(batch, batch_y_hat, batch_y)
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_module.py", line 119, in compute_loss
    return batch_loss
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_exception.py", line 29, in exception_catcher
    raise EngineException(
laia.engine.engine_exception.EngineException: Exception "TypeError('an integer is required (got type NoneType)')" raised during epoch=0, global_step=0 with batch=['00000004_20', '00000003_22', '00000003_8', '00000004_34', '00000003_5', '00000003_36', '00000004_48', '00000003_0']
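
One hedged guess at the cause, in line with the compute_loss note above: the transcripts contain raw words ("въ", "доме", ...) while syms.txt only lists single characters, so the symbol lookup fails. Assuming PyLaia expects one symbol per whitespace-separated token, here is a sketch that rewrites a raw-text table into space-separated character symbols (file names are placeholders):

def tokenize_table(src_path, dst_path, space_token="<space>"):
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            image_id, _, text = line.rstrip("\n").partition(" ")
            syms = []
            for word in text.split():
                if syms:
                    syms.append(space_token)
                syms.extend(word)  # one symbol per character
            dst.write(f"{image_id} {' '.join(syms)}\n")

tokenize_table("tr.txt", "tr_tokenized.txt")
tokenize_table("val.txt", "val_tokenized.txt")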

Support for PyTorch 1.8.1 (or PyTorch >= 1.7.0 in general)

When I install PyTorch 1.8.1 with CUDA 11.1 support (from https://pytorch.org/get-started/locally: pip3 install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html), I get the following error when running e.g. pylaia-htr-decode-ctc:

Traceback (most recent call last):
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 567, in _build_master
    ws.require(__requires__)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 884, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 775, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (torch 1.8.1+cu111 (/usr/local/anaconda3/lib/python3.8/site-packages), Requirement.parse('torch<1.7.0,>=1.6.0'), {'nnutils-pytorch'})

It seems that nnutils-pytorch does not support torch >= 1.7.0?

Problem when resuming training

When trying to continue an already started training (I finished a few epochs, evaluated the results and wanted to continue training), I got the following error:

2019-03-06 15:03:33,653 INFO laia.common.loader : Loaded checkpoint train/experiment.ckpt.lowest-valid-cer-12
Traceback (most recent call last):
  File "/data1/home/beat.wolf/anaconda3/envs/tf/bin/pylaia-htr-train-ctc", line 4, in <module>
    __import__('pkg_resources').run_script('laia==0.1.0', 'pylaia-htr-train-ctc')
  File "/data1/home/beat.wolf/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/data1/home/beat.wolf/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/data1/home/beat.wolf/anaconda3/envs/tf/lib/python3.6/site-packages/laia-0.1.0-py3.6.egg/EGG-INFO/scripts/pylaia-htr-train-ctc", line 237, in <module>
    os.path.join(args.train_path, "experiment.{}".format(args.checkpoint))
  File "/data1/home/beat.wolf/anaconda3/envs/tf/lib/python3.6/site-packages/laia-0.1.0-py3.6.egg/laia/common/loader.py", line 138, in load_by
    pattern, key=key, reverse=reverse
  File "/data1/home/beat.wolf/anaconda3/envs/tf/lib/python3.6/site-packages/laia-0.1.0-py3.6.egg/laia/common/loader.py", line 98, in load_by
    return self.load(filepath)
  File "/data1/home/beat.wolf/anaconda3/envs/tf/lib/python3.6/site-packages/laia-0.1.0-py3.6.egg/laia/common/loader.py", line 132, in load
    set_rng_state(state.pop("rng"), self._device)
  File "/data1/home/beat.wolf/anaconda3/envs/tf/lib/python3.6/site-packages/laia-0.1.0-py3.6.egg/laia/common/random.py", line 32, in set_rng_state
    torch.set_rng_state(state["torch_cpu"])
  File "/data1/home/beat.wolf/anaconda3/envs/tf/lib/python3.6/site-packages/torch/random.py", line 14, in set_rng_state
    default_generator.set_state(new_state)
TypeError: expected a torch.ByteTensor, but got torch.cuda.ByteTensor

Bug in memory_meter.py

The code that you added to log the memory used by the model produces exceptions:

--- Logging error ---
Traceback (most recent call last):
  File "/home/jpuigcerver/src/PyLaia/laia/logging.py", line 32, in emit
    msg = self.format(record)
  File "/usr/lib/python3.5/logging/__init__.py", line 830, in format
    return fmt.format(record)
  File "/usr/lib/python3.5/logging/__init__.py", line 567, in format
    record.message = record.getMessage()
  File "/usr/lib/python3.5/logging/__init__.py", line 328, in getMessage
    msg = str(self.msg)
  File "/home/jpuigcerver/src/PyLaia/laia/logging.py", line 47, in __str__
    return str(self.fmt).format(*self.args, **self.kwargs)
  File "/home/jpuigcerver/src/PyLaia/laia/hooks/meters/memory_meter.py", line 48, in value
    if torch.cuda.is_available()
  File "/home/jpuigcerver/src/PyLaia/laia/hooks/meters/memory_meter.py", line 30, in get_gpu_memory
    if int(pid) == os.getpid():
ValueError: invalid literal for int() with base 10: "b'9412"
Call stack:
  File "./src/python/train_ctc.py", line 179, in <module>
    engine_wrapper.run()
  File "/home/jpuigcerver/src/PyLaia/laia/engine/htr_engine_wrapper.py", line 113, in run
    self._tr_engine.run()
  File "/home/jpuigcerver/src/PyLaia/laia/hooks/action.py", line 31, in wrapper
    **{k: v for k, v in kwargs.items() if k in argspec.args}
  File "/home/jpuigcerver/src/PyLaia/laia/engine/trainer.py", line 138, in run
    self._run_epoch()
  File "/home/jpuigcerver/src/PyLaia/laia/engine/engine.py", line 244, in _run_epoch
    self._call_hooks(EPOCH_END, epoch=self._epochs)
  File "/home/jpuigcerver/src/PyLaia/laia/engine/engine.py", line 176, in _call_hooks
    hook(*args, caller=self, **kwargs)
  File "/home/jpuigcerver/src/PyLaia/laia/hooks/action.py", line 31, in wrapper
    **{k: v for k, v in kwargs.items() if k in argspec.args}
  File "/home/jpuigcerver/src/PyLaia/laia/engine/htr_engine_wrapper.py", line 198, in _epoch_summary
    self.logger.info(", ".join(self._summary_format), **self._summary_params)
  File "/usr/lib/python3.5/logging/__init__.py", line 1279, in info
    self._log(INFO, msg, args, **kwargs)
  File "/home/jpuigcerver/src/PyLaia/laia/logging.py", line 72, in _log
    level=level, msg=msg, args=(), exc_info=exc_info, extra=extra
  File "/usr/lib/python3.5/logging/__init__.py", line 1415, in _log
    self.handle(record)
  File "/usr/lib/python3.5/logging/__init__.py", line 1425, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.5/logging/__init__.py", line 1487, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.5/logging/__init__.py", line 855, in handle
    self.emit(record)
  File "/home/jpuigcerver/src/PyLaia/laia/logging.py", line 36, in emit
    self.handleError(record)
Message: <laia.logging.FormatMessage object at 0x7f0011551978>
Arguments: ()

ValueError: Input images must have a fixed height of 16 pixels, found [15, 16]

I get an error while running the script pylaia-htr-train-ctc. Here is the log:


[2022-02-09 06:55:50,368 INFO laia] Arguments: {'syms': '/content/kzh/syms_ctc.txt', 'img_dirs': ['/content/kzh/imgs/PYLAIA_PREPARED'], 'tr_txt_table': '/content/kzh/tr.txt', 'va_txt_table': '/content/kzh/va.txt', 'common': CommonArgs(seed=74565, train_path='', model_filename='model_h128', experiment_dirname='experiment', monitor=<Monitor.va_cer: 'va_cer'>, checkpoint=None), 'data': DataArgs(batch_size=10, color_mode=<ColorMode.L: 'L'>), 'train': TrainArgs(delimiters=['<space>'], checkpoint_k=3, resume=False, early_stopping_patience=20, gpu_stats=False, augment_training=False), 'optimizer': OptimizerArgs(name=<Name.RMSProp: 'RMSProp'>, learning_rate=0.0003, momentum=0.0, weight_l2_penalty=0.0, nesterov=False), 'scheduler': SchedulerArgs(active=False, monitor=<Monitor.va_cer: 'va_cer'>, patience=10, factor=0.1), 'trainer': TrainerArgs(gradient_clip_val=0.0, process_position=0, num_nodes=1, num_processes=1, gpus=1, auto_select_gpus=False, tpu_cores=None, progress_bar_refresh_rate=1, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=1, max_epochs=1000, min_epochs=1, max_steps=None, min_steps=None, limit_train_batches=1.0, limit_val_batches=1.0, limit_test_batches=1.0, val_check_interval=1.0, flush_logs_every_n_steps=100, log_every_n_steps=50, accelerator=None, sync_batchnorm=False, precision=32, weights_summary='full', weights_save_path=None, num_sanity_val_steps=2, truncated_bptt_steps=None, profiler=None, benchmark=False, deterministic=False, reload_dataloaders_every_epoch=False, replace_sampler_ddp=True, terminate_on_nan=False, prepare_data_per_node=True, plugins=None, amp_backend='native', amp_level='O2', distributed_backend=None, automatic_optimization=None, move_metrics_to_cpu=False, enable_pl_optimizer=True)}
[2022-02-09 06:55:50,918 INFO laia] Installed:
[2022-02-09 06:55:51,001 INFO laia.common.loader] Loaded model model_h128
[2022-02-09 06:55:51,002 INFO laia.engine.data_module] Training data transforms:
ToImageTensor(
  vision.Convert(mode=L),
  vision.Invert(),
  ToTensor()
)
[2022-02-09 06:55:51,004 WARNING py.warnings] UserWarning: Checkpoint directory experiment exists and is not empty. With save_top_k=3, all files in this directory will be deleted when a checkpoint is saved!
[2022-02-09 06:55:51,006 WARNING py.warnings] UserWarning: You have set progress_bar_refresh_rate < 20 on Google Colab. This may crash. Consider using progress_bar_refresh_rate >= 20 in Trainer.
[2022-02-09 06:55:51,061 INFO lightning] GPU available: True, used: True
[2022-02-09 06:55:51,061 INFO lightning] TPU available: False, using: 0 TPU cores
[2022-02-09 06:55:51,061 INFO lightning] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[2022-02-09 06:55:55,327 WARNING py.warnings] UserWarning: Experiment logs directory experiment exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
[2022-02-09 06:55:55,341 INFO lightning] 
   | Name                    | Type                  | Params
-------------------------------------------------------------------
0  | model                   | LaiaCRNN              | 9.6 M 
1  | model.conv              | Sequential            | 92.5 K
2  | model.conv.0            | ConvBlock             | 160   
3  | model.conv.0.conv       | Conv2d                | 160   
4  | model.conv.0.activation | LeakyReLU             | 0     
5  | model.conv.0.pool       | MaxPool2d             | 0     
6  | model.conv.1            | ConvBlock             | 4.6 K 
7  | model.conv.1.conv       | Conv2d                | 4.6 K 
8  | model.conv.1.activation | LeakyReLU             | 0     
9  | model.conv.1.pool       | MaxPool2d             | 0     
10 | model.conv.2            | ConvBlock             | 13.9 K
11 | model.conv.2.conv       | Conv2d                | 13.9 K
12 | model.conv.2.activation | LeakyReLU             | 0     
13 | model.conv.2.pool       | MaxPool2d             | 0     
14 | model.conv.3            | ConvBlock             | 27.7 K
15 | model.conv.3.conv       | Conv2d                | 27.7 K
16 | model.conv.3.activation | LeakyReLU             | 0     
17 | model.conv.4            | ConvBlock             | 46.2 K
18 | model.conv.4.conv       | Conv2d                | 46.2 K
19 | model.conv.4.activation | LeakyReLU             | 0     
20 | model.sequencer         | ImagePoolingSequencer | 0     
21 | model.rnn               | LSTM                  | 9.5 M 
22 | model.linear            | Linear                | 33.3 K
23 | criterion               | CTCLoss               | 0     
-------------------------------------------------------------------
9.6 M     Trainable params
0         Non-trainable params
9.6 M     Total params
[2022-02-09 06:55:55,721 CRITICAL laia] Uncaught exception:
Traceback (most recent call last):
  File "/content/PyLaia/laia/engine/engine_exception.py", line 27, in exception_catcher
    yield
  File "/content/PyLaia/laia/engine/engine_module.py", line 148, in validation_step
    batch_y_hat = self.model(batch_x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/PyLaia/laia/models/htr/laia_crnn.py", line 118, in forward
    x = self.sequencer(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/PyLaia/laia/nn/image_pooling_sequencer.py", line 53, in forward
    "Input images must have a fixed "
ValueError: Input images must have a fixed height of 16 pixels, found [15, 16]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/pylaia-htr-train-ctc", line 33, in <module>
    sys.exit(load_entry_point('laia', 'console_scripts', 'pylaia-htr-train-ctc')())
  File "/content/PyLaia/laia/scripts/htr/train_ctc.py", line 200, in main
    run(**args)
  File "/content/PyLaia/laia/scripts/htr/train_ctc.py", line 128, in run
    trainer.fit(engine_module, datamodule=data_module)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 468, in fit
    results = self.accelerator_backend.train()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 66, in train
    results = self.train_or_test()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 66, in train_or_test
    results = self.trainer.train()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 490, in train
    self.run_sanity_check(self.get_model())
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 697, in run_sanity_check
    _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 613, in run_evaluation
    output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop.py", line 178, in evaluation_step
    output = self.trainer.accelerator_backend.validation_step(args)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 90, in validation_step
    output = self.__validation_step(args)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 98, in __validation_step
    output = self.trainer.model.validation_step(*args)
  File "/content/PyLaia/laia/engine/htr_engine_module.py", line 72, in validation_step
    result = super().validation_step(batch, *args, **kwargs)
  File "/content/PyLaia/laia/engine/engine_module.py", line 148, in validation_step
    batch_y_hat = self.model(batch_x)
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/content/PyLaia/laia/engine/engine_exception.py", line 34, in exception_catcher
    ) from e
laia.engine.engine_exception.EngineException: Exception "ValueError('Input images must have a fixed height of 16 pixels, found [15, 16]',)" raised during epoch=0, global_step=0 with batch=['7_44_825', '10_35_229', '2_51_124', '13_17_158', '10_26_124', '7_0_439', '12_45_126', '10_4_131', '10_5_399', '13_16_155']
absl-py                 1.0.0
asn1crypto              0.24.0
cachetools              4.2.4
certifi                 2021.10.8
charset-normalizer      2.0.11
contextvars             2.4
cryptography            2.1.4
cycler                  0.11.0
dataclasses             0.8
docstring-parser        0.13
fsspec                  2022.1.0
future                  0.18.2
google-auth             2.6.0
google-auth-oauthlib    0.4.6
grpcio                  1.43.0
idna                    2.6
immutables              0.16
importlib-metadata      4.8.3
jsonargparse            4.1.4
keyring                 10.6.0
keyrings.alt            3.0
kiwisolver              1.3.1
laia                    1.0.1.dev0      /content/PyLaia
Markdown                3.3.6
matplotlib              3.3.4
natsort                 8.1.0
nnutils-pytorch         1.6.0
numpy                   1.19.5
oauthlib                3.2.0
Pillow                  8.4.0
pip                     21.3.1
protobuf                3.19.4
pyasn1                  0.4.8
pyasn1-modules          0.2.8
pybind11                2.9.1
pycrypto                2.6.1
PyGObject               3.26.1
pyparsing               3.0.7
python-apt              1.6.5+ubuntu0.7
python-dateutil         2.8.2
pytorch-lightning       1.1.0.dev0
pyxdg                   0.25
PyYAML                  6.0
requests                2.27.1
requests-oauthlib       1.3.1
rsa                     4.8
scipy                   1.5.4
screen-resolution-extra 0.0.0
SecretStorage           2.3.1
setuptools              59.6.0
six                     1.11.0
tensorboard             2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.1
textdistance            4.2.2
torch                   1.6.0
torchvision             0.7.0
tqdm                    4.62.3
typing_extensions       4.0.1
urllib3                 1.26.8
Werkzeug                2.0.3
wheel                   0.30.0
xkit                    0.0.0
zipp                    3.6.0

Whole script in colab notebook: https://colab.research.google.com/drive/1Pxd_rzZ50LQhm0ZWrHwfiNcnqnXYdvC1?usp=sharing
Please, help!

The loss is NaN or ± inf

While training the model, I got the following error:

(pyl2) pchela@main-computational-machine-2:~/TextRecognition/PyLaia$ ./laia/scripts/htr/train_ctc.py syms.txt [./train_rus2/images/] tr.txt val.txt 

[2021-03-06 09:50:40,183 INFO laia] Arguments: {'syms': 'syms.txt', 'img_dirs': ['./train_rus2/images/'], 'tr_txt_table': 'tr.txt', 'va_txt_table': 'val.txt', 'common': CommonArgs(seed=74565, train_path='', model_filename='model', experiment_dirname='experiment', monitor=<Monitor.va_cer: 'va_cer'>, checkpoint=None), 'data': DataArgs(batch_size=8, color_mode=<ColorMode.L: 'L'>), 'train': TrainArgs(delimiters=['<space>'], checkpoint_k=3, resume=False, early_stopping_patience=20, gpu_stats=False, augment_training=False), 'optimizer': OptimizerArgs(name=<Name.RMSProp: 'RMSProp'>, learning_rate=0.0005, momentum=0.0, weight_l2_penalty=0.0, nesterov=False), 'scheduler': SchedulerArgs(active=False, monitor=<Monitor.va_loss: 'va_loss'>, patience=5, factor=0.1), 'trainer': TrainerArgs(gradient_clip_val=0, process_position=0, num_nodes=1, num_processes=1, gpus=None, auto_select_gpus=False, tpu_cores=None, progress_bar_refresh_rate=1, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=1, max_epochs=1000, min_epochs=1, max_steps=None, min_steps=None, limit_train_batches=1.0, limit_val_batches=1.0, limit_test_batches=1.0, val_check_interval=1.0, flush_logs_every_n_steps=100, log_every_n_steps=50, accelerator=None, sync_batchnorm=False, precision=32, weights_summary='top', weights_save_path=None, num_sanity_val_steps=2, truncated_bptt_steps=None, profiler=None, benchmark=False, deterministic=False, reload_dataloaders_every_epoch=False, replace_sampler_ddp=True, terminate_on_nan=False, prepare_data_per_node=True, plugins=None, amp_backend='native', amp_level='O2', distributed_backend=None, automatic_optimization=None, move_metrics_to_cpu=False, enable_pl_optimizer=True)}

[2021-03-06 09:50:41,546 INFO laia] Installed:
[2021-03-06 09:50:41,605 INFO laia.common.loader] Loaded model model
[2021-03-06 09:50:41,606 INFO laia.engine.data_module] Training data transforms:
ToImageTensor(
  vision.Convert(mode=L),
  vision.Invert(),
  ToTensor()
)
[2021-03-06 09:50:41,656 INFO lightning] GPU available: True, used: False
[2021-03-06 09:50:41,656 INFO lightning] TPU available: False, using: 0 TPU cores
[2021-03-06 09:50:41,657 WARNING py.warnings] UserWarning: GPU available but not used. Set the --gpus flag when calling the script.
[2021-03-06 09:50:41,678 INFO lightning]
  | Name      | Type     | Params
---------------------------------------
0 | model     | LaiaCRNN | 4.8 M
1 | criterion | CTCLoss  | 0
---------------------------------------
4.8 M     Trainable params
0         Non-trainable params
4.8 M     Total params

[2021-03-06 09:50:42,408 INFO laia] VA sanity check: 100% 2/2 [00:00<00:00, 2.91it/s]
[2021-03-06 09:50:46,026 CRITICAL laia] Uncaught exception:
Traceback (most recent call last):
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_exception.py", line 27, in exception_catcher
    yield
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_module.py", line 118, in compute_loss
    raise ValueError("The loss is NaN or ± inf")
ValueError: The loss is NaN or ± inf

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./laia/scripts/htr/train_ctc.py", line 204, in <module>
    main()
  File "./laia/scripts/htr/train_ctc.py", line 200, in main
    run(**args)
  File "./laia/scripts/htr/train_ctc.py", line 128, in run
    trainer.fit(engine_module, datamodule=data_module)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 468, in fit
    results = self.accelerator_backend.train()
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py", line 61, in train
    results = self.train_or_test()
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 66, in train_or_test
    results = self.trainer.train()
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 521, in train
    self.train_loop.run_training_epoch()
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 546, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 698, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 480, in optimizer_step
    optimizer.step(closure=train_step_and_backward_closure)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 226, in step
    optimizer.step(closure=closure, *args, **kwargs)
  File "/home/pchela/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/pchela/.local/lib/python3.8/site-packages/torch/optim/rmsprop.py", line 66, in step
    loss = closure()
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 688, in train_step_and_backward_closure
    result = self.training_step_and_backward(
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 778, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 327, in training_step
    training_step_output = self.trainer.accelerator_backend.training_step(args)
  File "/home/pchela/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py", line 69, in training_step
    output = self.trainer.model.training_step(*args)
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/htr_engine_module.py", line 37, in training_step
    result = super().training_step(batch, *args, **kwargs)
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_module.py", line 132, in training_step
    batch_loss = self.compute_loss(batch, batch_y_hat, batch_y)
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_module.py", line 119, in compute_loss
    return batch_loss
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/pchela/TextRecognition/PyLaia/laia/engine/engine_exception.py", line 29, in exception_catcher
    raise EngineException(
laia.engine.engine_exception.EngineException: Exception "ValueError('The loss is NaN or ± inf')" raised during epoch=0, global_step=3 with batch=['/home/pchela/TextRecognition/PyLaia/train_rus2/images/00000003_37.jpg', '/home/pchela/TextRecognition/PyLaia/train_rus2/images/00000003_35.jpg', '/home/pchela/TextRecognition/PyLaia/train_rus2/images/00000003_39.jpg', '/home/pchela/TextRecognition/PyLaia/train_rus2/images/00000003_33.jpg', '/home/pchela/TextRecognition/PyLaia/train_rus2/images/00000003_4.jpg', '/home/pchela/TextRecognition/PyLaia/train_rus2/images/00000003_34.jpg', '/home/pchela/TextRecognition/PyLaia/train_rus2/images/00000003_32.jpg', '/home/pchela/TextRecognition/PyLaia/train_rus2/images/00000003_38.jpg']

No errors about missing tokens in syms.txt here.

Update the documentation in the wiki

We should update the documentation to describe:

  • how to format the dataset;
  • how to create and train a model;
  • how to run a prediction on an image from a trained model;
  • how to create an ARPA language model and combine it with a PyLaia model.

Branch cleanup

Hey @jpuigcerver,

I was thinking we could do some branch cleanup. I'm asking you first in case you have local changes which have not been pushed. We could also have master be called master again.

I'm talking about deleting egs, icfhr2014kws, kws_gw, refactor_crnn_model, and refactor_kws_egs since they don't have any changes. Also possibly PyTorch-0.3.1

Wrong torch version?

I was trying to resume a training, switched to the 'refactor_kws_egs_master' branch, and pylaia-htr-train-ctc raised an ImportError:

Traceback (most recent call last):
  File "/labos/alumnos/jaizaber/workw/RDNN-HTR-PY/bin/pylaia-htr-train-ctc", line 4, in <module>
    __import__('pkg_resources').run_script('laia==0.1.0', 'pylaia-htr-train-ctc')
  File "/labos/alumnos/jaizaber/workw/RDNN-HTR-PY/lib64/python3.6/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/labos/alumnos/jaizaber/workw/RDNN-HTR-PY/lib64/python3.6/site-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/labos/alumnos/jaizaber/workw/RDNN-HTR-PY/lib/python3.6/site-packages/laia-0.1.0-py3.6.egg/EGG-INFO/scripts/pylaia-htr-train-ctc", line 13, in <module>
    import laia.common.logging as log
  File "/labos/alumnos/jaizaber/workw/RDNN-HTR-PY/lib64/python3.6/site-packages/laia-0.1.0-py3.6.egg/laia/__init__.py", line 21, in <module>
    import laia.data
  File "/labos/alumnos/jaizaber/workw/RDNN-HTR-PY/lib64/python3.6/site-packages/laia-0.1.0-py3.6.egg/laia/data/__init__.py", line 4, in <module>
    from laia.data.image_data_loader import ImageDataLoader
  File "/labos/alumnos/jaizaber/workw/RDNN-HTR-PY/lib64/python3.6/site-packages/laia-0.1.0-py3.6.egg/laia/data/image_data_loader.py", line 5, in <module>
    from laia.data.padding_collater import PaddingCollater
  File "/labos/alumnos/jaizaber/workw/RDNN-HTR-PY/lib64/python3.6/site-packages/laia-0.1.0-py3.6.egg/laia/data/padding_collater.py", line 11, in <module>
    from torch.utils.data.dataloader import numpy_type_map
ImportError: cannot import name 'numpy_type_map'

That's because numpy_type_map was removed in recent versions of Torch. I downgraded Torch to 1.0.1 and Torchvision to 0.2.2 and it worked. I hope this helps someone who runs into the same problem.

Lattice output is wrong

The sign of the log-probability should be inverted, since lattices have "costs" in their arcs, not log-probabilities (i.e. cost = - logprob).

row, row + 1, col + 1, float(output[row, col]), col + 1, digits=digits
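
The sign change in isolation, as a tiny self-contained example (not the actual patch):

import math

logprob = math.log(0.25)   # example log-probability taken from the network output
cost = -logprob            # lattice arcs store costs, i.e. cost = -logprob
print(f"logprob = {logprob:.4f}, cost = {cost:.4f}")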

Fix issue with recent versions of jsonargparse

We have an error with recent versions of jsonargparse.
To avoid it, I pinned an old version (4.7) in requirements.txt, but it would be better to fix the code to be compatible with the latest versions (4.18). I also need to fix the test related to this issue.

Here is the error I get:

Traceback (most recent call last):
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/bin/pylaia-htr-train-ctc", line 11, in <module>
    load_entry_point('laia', 'console_scripts', 'pylaia-htr-train-ctc')()
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/laia/scripts/htr/train_ctc.py", line 198, in main
    args = get_args()
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/laia/scripts/htr/train_ctc.py", line 185, in get_args
    args = parser.parse_args(argv, with_meta=False)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/deprecated.py", line 112, in patched_parse
    cfg = parse_method(*args, _skip_check=_skip_check, **kwargs)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/core.py", line 364, in parse_args
    cfg, unk = self.parse_known_args(args=args, namespace=cfg)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/core.py", line 237, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/usr/lib/python3.8/argparse.py", line 2018, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/usr/lib/python3.8/argparse.py", line 1958, in consume_optional
    take_action(action, args, option_string)
  File "/usr/lib/python3.8/argparse.py", line 1886, in take_action
    action(self, namespace, argument_values, option_string)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/actions.py", line 173, in __call__
    self.apply_config(parser, cfg, self.dest, values)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/actions.py", line 190, in apply_config
    cfg_file = parser.parse_path(value, **kwargs)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/core.py", line 533, in parse_path
    parsed_cfg = self.parse_string(
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/deprecated.py", line 112, in patched_parse
    cfg = parse_method(*args, _skip_check=_skip_check, **kwargs)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/core.py", line 578, in parse_string
    cfg = self._load_config_parser_mode(cfg_str, cfg_path, ext_vars, previous_config.get())
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/core.py", line 621, in _load_config_parser_mode
    return self._apply_actions(cfg_dict, prev_cfg=prev_cfg)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/core.py", line 1229, in _apply_actions
    value = self._check_value_key(action, value, action_dest, prev_cfg)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/core.py", line 1279, in _check_value_key
    value = action._check_type(value, cfg=cfg)  # type: ignore
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/typehints.py", line 366, in _check_type
    val = adapt_typehints(val, self._typehint, **kwargs)
  File "/home/users/starride/git_repos/pylaia_bump_pytorch/.env/lib/python3.8/site-packages/jsonargparse/typehints.py", line 574, in adapt_typehints
    adapt_kwargs_n = {**adapt_kwargs, 'prev_val': prev_val[n]} if isinstance(prev_val, list) else adapt_kwargs
IndexError: list index out of range

Contradictory requirements?

The requirements.txt after I clone PyLaia says "python 3.7 and torchvision >= 0.14".

But torchvision 0.14 requires Python >= 3.7.2.

Maybe I don't understand it. I'm genuinely confused.

UserWarning: nnutils does not seem to be installed, masking cannot be used

Hello, after installing PyLaia using the small script you provide, on an Ubuntu 20.04 machine with a Python 3.7 virtual environment, and trying to run some experiments, the following error appears:

/home/dvillanova/tfgenv/lib/python3.7/site-packages/laia-0.1.0-py3.7.egg/laia/models/htr/conv_block.py:56: UserWarning: nnutils does not seem to be installed, masking cannot be used
  "nnutils does not seem to be installed, masking cannot be used"
Traceback (most recent call last):
  File "/home/dvillanova/tfgenv/bin/pylaia-htr-create-model", line 4, in <module>
    __import__('pkg_resources').run_script('laia==0.1.0', 'pylaia-htr-create-model')
  File "/home/dvillanova/tfgenv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/dvillanova/tfgenv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1464, in run_script
    exec(code, namespace, namespace)
  File "/home/dvillanova/tfgenv/lib/python3.7/site-packages/laia-0.1.0-py3.7.egg/EGG-INFO/scripts/pylaia-htr-create-model", line 223, in <module>
    model = LaiaCRNN(**parameters)
  File "/home/dvillanova/tfgenv/lib/python3.7/site-packages/laia-0.1.0-py3.7.egg/laia/models/htr/laia_crnn.py", line 75, in __init__
    sequencer=image_sequencer, columnwise=not vertical_text
  File "/home/dvillanova/tfgenv/lib/python3.7/site-packages/laia-0.1.0-py3.7.egg/laia/nn/image_pooling_sequencer.py", line 31, in __init__
    raise ImportError("nnutils does not seem installed")
ImportError: nnutils does not seem installed

This does not make much sense, as nnutils-pytorch was specifically installed and it is present in the requirements.txt file you provide. Do you have any insight into what could be missing?
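
A quick way to check whether the extension actually imports against the installed torch (the module name nnutils_pytorch is an assumption based on the package name):

import torch
print("torch:", torch.__version__)
try:
    import nnutils_pytorch  # assumed import name of the nnutils-pytorch package
    print("nnutils_pytorch imported OK")
except ImportError as exc:
    print("nnutils_pytorch import failed:", exc)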

Fix tests for multiprocessing

The bump to PyTorch 1.13 broke some tests related to multiprocessing on CPU and GPU. We get the following errors:

  • torch.multiprocessing.spawn.ProcessRaisedException
  • AttributeError: 'LightningDistributedDataParallel' object has no attribute '_sync_params'

On these tests:

tests/callbacks/learning_rate_test.py:# TODO: fix test with num_processes=2
tests/callbacks/training_timer_test.py:# TODO: fix test with num_processes=2
tests/loggers/epoch_csv_logger_test.py:# TODO: fix test with num_processes=2
tests/scripts/htr/decode_ctc_test.py:# TODO: fix test with nprocs=2
tests/scripts/htr/netout_test.py:# TODO: fix test with nprocs=2
tests/scripts/htr/train_ctc_test.py:# TODO: fix "ddp_cpu" mode
tests/scripts/htr/train_ctc_test.py:# TODO: fix "ddp" mode
tests/scripts/htr/train_ctc_test.py:# TODO: fix first assertion

I skipped the tests for now, but I need to investigate why we are getting this error and how to fix it.

height of images

I'm running two models with the default hyperparameters specified in iam-htr on my own data.
That data contains images of text-line rectangles.
If I don't rescale them to a height of 128, I frequently get samples that are ignored for the loss computation; this does not happen when I rescale my images to height 128. Any reason why this might happen?
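
One plausible explanation is that, without rescaling, some short line images end up with fewer time steps after the pooling layers than their transcripts have symbols, so the CTC loss skips them; rescaling to height 128 also scales the width up proportionally. A sketch of that pre-processing step (Pillow, placeholder paths):

from pathlib import Path
from PIL import Image

target_h = 128
src, dst = Path("textlines"), Path("textlines_h128")
dst.mkdir(exist_ok=True)
for path in src.glob("*.png"):
    img = Image.open(path)
    w, h = img.size
    # Resize to a fixed height while keeping the aspect ratio.
    img.resize((max(1, round(w * target_h / h)), target_h)).save(dst / path.name)

The warnings in question: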

2020-08-13 09:37:59,471 WARNING laia.losses.ctc_loss : The following samples in the batch were ignored for the loss computation: ['32500665-32', '32505213-12']
2020-08-13 09:38:01,027 WARNING laia.losses.ctc_loss : The following samples in the batch were ignored for the loss computation: ['32505235-01']
2020-08-13 09:38:03,464 WARNING laia.losses.ctc_loss : The following samples in the batch were ignored for the loss computation: ['32504708-17']
2020-08-13 09:38:15,463 WARNING laia.losses.ctc_loss : The following samples in the batch were ignored for the loss computation: ['32505213-08']
2020-08-13 09:38:16,740 WARNING laia.losses.ctc_loss : The following samples in the batch were ignored for the loss computation: ['32501299-03']
2020-08-13 09:38:19,458 WARNING laia.losses.ctc_loss : The following samples in the batch were ignored for the loss computation: ['32505198-74']
2020-08-13 09:38:22,154 WARNING laia.losses.ctc_loss : The following samples in the batch were ignored for the loss computation: ['32505158-51']
2020-08-13 09:38:26,983 WARNING laia.losses.ctc_loss : The following samples in the batch were ignored for the loss computation: ['32505184-07']

FWIW:

Decode on iam-htr experiment

Hi, I trained the model from the iam-htr example, but when I try to run the decode script it gives the following error:

./src/decode_net.sh 
ERROR: File "data/lang/lines/char/aachen/te.txt" does not exist!

I did solve this one by changing the path to the Puigcerver-partition te.txt file, but then another missing-file error comes up.
Do you have a running version of the decode part of the iam-htr example?
Thank you,
Manu

Error when running IAM example

Hi, I'm trying to run the iam-htr example but get this error when the training is about to start. Do you know what the reason might be?

Train:   0%|          | 0/617 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/mcarbonell/Documents/PyLaia/laia/engine/engine.py", line 232, in exception_catcher
    yield
  File "/home/mcarbonell/Documents/PyLaia/laia/engine/trainer.py", line 222, in compute_loss
    loss = self._criterion(batch_output, batch_target, **kwargs)
  File "/home/mcarbonell/virtualenvs/pylaia/lib/python3.5/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mcarbonell/Documents/PyLaia/laia/losses/ctc_loss.py", line 341, in forward
    acts, target, act_lens, valid_indices if err_indices else None
  File "/home/mcarbonell/Documents/PyLaia/laia/losses/ctc_loss.py", line 143, in forward
    device=torch.device("cpu"),
TypeError: an integer is required (got type NoneType)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mcarbonell/Documents/PyLaia/egs/iam-htr/../../pylaia-htr-train-ctc", line 279, in <module>
    experiment.run()
  File "/home/mcarbonell/Documents/PyLaia/laia/experiments/experiment.py", line 76, in run
    self._tr_engine.run()
  File "/home/mcarbonell/Documents/PyLaia/laia/hooks/action.py", line 31, in wrapper
    **{k: v for k, v in kwargs.items() if k in argspec.args}
  File "/home/mcarbonell/Documents/PyLaia/laia/engine/trainer.py", line 126, in run
    self._run_epoch()
  File "/home/mcarbonell/Documents/PyLaia/laia/engine/engine.py", line 224, in _run_epoch
    self._run_iteration(it, batch)
  File "/home/mcarbonell/Documents/PyLaia/laia/engine/trainer.py", line 180, in _run_iteration
    batch_loss = self.compute_loss(batch, batch_output, batch_target)
  File "/home/mcarbonell/Documents/PyLaia/laia/engine/trainer.py", line 228, in compute_loss
    return loss
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/mcarbonell/Documents/PyLaia/laia/engine/engine.py", line 240, in exception_catcher
    raise_from(wrapper, e)
  File "<string>", line 3, in raise_from
laia.engine.engine_exception.EngineException: Exception "TypeError('an integer is required (got type NoneType)',)" raised during epoch 0, iteration 0. The batch that caused the exception was: ['b03-092-03', 'c02-078-05', 'a03-040-01', 'r03-030-05', 'a04-003-03', 'f03-191-01', 'a02-120-08', 'e07-079-08', 'c06-076-07', 'g04-011-07']
Train:   0%|                                                                                                                                                                        | 0/617 [00:00<?, ?it/s]

issue with cuda environments (1.6/1.5) with nnutils or the like

Hello @jpuigcerver .
I've been trying to build a handwritten text recognition model with PyLaia on Google Colab on Dutch transcriptions, but I'm failing to do so.
You can see the progress here: https://github.com/DIGI-VUB/HTR-tests/blob/master/pylaia_getuigenissen.ipynb (a whole lot of installation trouble before the interesting part, which is at the end).

The part where it is failing: Google Colab by default has CUDA 10.1 with PyTorch 1.6, with the following CUDA driver and NVIDIA environment:

| NVIDIA-SMI 450.57  Driver Version: 418.67  CUDA Version: 10.1 |

Unfortunately your nnutils module (https://github.com/jpuigcerver/nnutils) is incompatible with PyTorch 1.6, so I install PyTorch 1.5 and nnutils 0.6.0 with pip:
!pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

Next, after quite a bit of installing PyLaia's dependencies, I build the model and get the following error message:

2020-08-12 13:56:43,139 INFO laia.common.arguments : 
{'add_logsoftmax_to_loss': True,
 'batch_size': 10,
 'checkpoint': 'ckpt.lowest-valid-cer*',
 'delimiters': ['@'],
 'gpu': 1,
 'img_dirs': ['/content/getuigenissen/imgs/textlines_h128'],
 'iterations_per_update': 1,
 'learning_rate': 0.0003,
 'logging_also_to_stderr': 20,
 'logging_config': None,
 'logging_file': '/content/getuigenissen/log/training-trace.log',
 'logging_level': 20,
 'logging_overwrite': False,
 'max_epochs': None,
 'max_nondecreasing_epochs': 20,
 'model_filename': 'model',
 'momentum': 0,
 'num_rolling_checkpoints': 3,
 'print_args': True,
 'save_checkpoint_interval': 10,
 'seed': 74565,
 'show_progress_bar': True,
 'syms': <_io.TextIOWrapper name='/content/getuigenissen/imgs/syms_ctc.txt' mode='r' encoding='UTF-8'>,
 'tr_txt_table': <_io.TextIOWrapper name='/content/getuigenissen/imgs/tr.txt' mode='r' encoding='UTF-8'>,
 'train_path': '/content/getuigenissen',
 'train_samples_per_epoch': None,
 'use_baidu_ctc': False,
 'use_distortions': False,
 'va_txt_table': <_io.TextIOWrapper name='/content/getuigenissen/imgs/va.txt' mode='r' encoding='UTF-8'>,
 'valid_samples_per_epoch': None}
2020-08-12 13:56:43,214 INFO laia.common.loader : Loaded model /content/getuigenissen/model
2020-08-12 13:56:46,104 INFO laia : Training data transforms:
Compose(
    vision.Convert(mode=L)
    vision.Invert()
    ToTensor()
)
Train:   0% 0/485 [00:00<?, ?it/s]CUDA error : 35 (CUDA driver version is insufficient for CUDA runtime version)

Would you be so kind as to either help me figure out how to debug this, or provide nnutils wheels built with PyTorch for cu101?
Thanks for the help.
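
For debugging this kind of mismatch, a small check of what the installed PyTorch build expects versus what the driver can actually serve may help; a minimal sketch that only prints information and makes no assumption about the PyLaia setup:

import torch

# Report the CUDA runtime this PyTorch build was compiled against,
# and whether the installed driver can actually serve it.
print("PyTorch version:", torch.__version__)
print("CUDA runtime (build):", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))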

Line Images not being created for every Image

Hi! When I run the portion of the script that uses TextFeats to create the line images, it isn't creating a line image for every single ID. When I then run htr-train-ctc I get a slew of warnings because line images are missing for a lot of the text entries, even though the text itself is all correct and present in train_gt.txt. Any suggestions for what might be happening?
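
A quick way to narrow this down is to list the IDs from the ground-truth table that have no corresponding line image on disk. A minimal sketch, assuming the first token of each train_gt.txt line is the image ID and that line images are stored as <id>.png in a single directory (paths and extension are placeholders to adjust):

import os

gt_path = "train_gt.txt"        # hypothetical path to the ground-truth table
img_dir = "data/lines"          # hypothetical directory with the extracted line images

with open(gt_path, encoding="utf-8") as f:
    ids = [line.split()[0] for line in f if line.strip()]

missing = [i for i in ids if not os.path.isfile(os.path.join(img_dir, i + ".png"))]
print(f"{len(missing)} of {len(ids)} IDs have no line image")
for i in missing[:20]:
    print(i)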

tqdm bug: "Set changed size during iteration"

Since your latest commits, I get this warning from tqdm:

/home/jpuigcerver/.local/lib/python2.7/site-packages/tqdm/_monitor.py:89: TqdmSynchronisationWarning: Set changed size during iteration (see tqdm/tqdm#481)
TqdmSynchronisationWarning)

I've also been getting some segmentation faults, but I'm not sure whether they are related to this, since I have not identified what produces the segfault.
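
One workaround commonly suggested for this warning is to disable tqdm's background monitor thread, which is where the warning in tqdm/_monitor.py originates; note this only silences the warning and is unrelated to the segmentation faults:

from tqdm import tqdm

# Disable tqdm's monitor thread; set this before any progress bar is created.
tqdm.monitor_interval = 0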

Hugging Face Hub integration

It's really great to see so much work going into open-source HTR! It could be cool to have closer integration with the Hugging Face Hub. I know you are already sharing models there, but having a direct method for downloading and pushing models to the Hub would be super nice and could help increase the usage of open models.

Happy to offer some guidance on the best approach for doing this if it is of interest 🤗
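
For readers landing here, downloading a model repository from the Hub is already a one-liner with the huggingface_hub client; a minimal sketch (the repository id is a placeholder, substitute a real one):

from huggingface_hub import snapshot_download

# Download every file of the model repository into the local cache
# and return the path of the local copy.
local_dir = snapshot_download(repo_id="some-org/some-pylaia-model")  # placeholder repo id
print("Model files downloaded to:", local_dir)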

Non-editable installation is broken

The release package of PyLaia v1.0.0 cannot be installed as a regular non-editable module. There are two main issues that leave the resulting installation in a broken state:

  • Folders laia/scripts and laia/models lack the needed __init__.py to be recognized as Python modules and, hence, they are ignored and not installed by pip.
  • The method get_installed_versions() relies on the presence of requirements.txt. pip will not install that file, so the method will fail (a possible alternative is sketched below).

Installing PyLaia as an editable module is not an option for us as we want to make the software available to multiple users.
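
For the second point, a possible direction (a sketch only, not the project's actual code, and assuming the distribution is installed under the name "laia") is to read the installed package metadata instead of a requirements.txt file:

import re
from importlib.metadata import PackageNotFoundError, requires, version

def get_installed_versions(dist_name: str = "laia") -> dict:
    """Collect installed versions of a distribution's declared dependencies
    from package metadata, so no requirements.txt is needed at runtime."""
    found = {}
    for req in requires(dist_name) or []:
        # Strip version specifiers and environment markers,
        # e.g. "torch (>=1.13.0)" -> "torch".
        name = re.split(r"[\s;<>=!~(\[]", req, maxsplit=1)[0]
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

if __name__ == "__main__":
    print(get_installed_versions())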

bad tensor format

I was trying to run the iam-htr example and I received the following error:

Train:   0%|          | 0/617 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/usr/local/bin/pylaia-htr-train-ctc", line 4, in <module>
    __import__('pkg_resources').run_script('laia==0.1.0', 'pylaia-htr-train-ctc')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/laia-0.1.0-py2.7.egg/EGG-INFO/scripts/pylaia-htr-train-ctc", line 279, in <module>
    experiment.run()
  File "/usr/local/lib/python2.7/dist-packages/laia-0.1.0-py2.7.egg/laia/experiments/experiment.py", line 76, in run
    self._tr_engine.run()
  File "/usr/local/lib/python2.7/dist-packages/laia-0.1.0-py2.7.egg/laia/hooks/action.py", line 31, in wrapper
    **{k: v for k, v in kwargs.items() if k in argspec.args}
  File "/usr/local/lib/python2.7/dist-packages/laia-0.1.0-py2.7.egg/laia/engine/trainer.py", line 126, in run
    self._run_epoch()
  File "/usr/local/lib/python2.7/dist-packages/laia-0.1.0-py2.7.egg/laia/engine/engine.py", line 224, in _run_epoch
    self._run_iteration(it, batch)
  File "/usr/local/lib/python2.7/dist-packages/laia-0.1.0-py2.7.egg/laia/engine/trainer.py", line 180, in _run_iteration
    batch_loss = self.compute_loss(batch, batch_output, batch_target)
  File "/usr/local/lib/python2.7/dist-packages/laia-0.1.0-py2.7.egg/laia/engine/trainer.py", line 228, in compute_loss
    return loss
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python2.7/dist-packages/laia-0.1.0-py2.7.egg/laia/engine/engine.py", line 240, in exception_catcher
    raise_from(wrapper, e)
  File "/usr/local/lib/python2.7/dist-packages/torch/_six.py", line 117, in raise_from
    raise value
laia.engine.engine_exception.EngineException: Exception "RuntimeError("Tensor for argument #2 'targets' is on CPU, but expected it to be on GPU (while checking arguments for ctc_loss_gpu)",)" raised during epoch 0, iteration 0. The batch that caused the exception was: ['b01-073-03', 'h04-007-04', 'g06-045m-05', 'b06-015-10', 'e01-050-03', 'f04-028-00', 'a01-038-09', 'c02-026-06', 'a01-128-04', 'a01-043-05']

Any ideas how to fix it?

Specs: Debian 9, Python 3.7
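
The message itself is PyTorch's generic complaint about CTC-loss inputs living on different devices. Outside of PyLaia, the pattern that triggers it and the usual fix look roughly like this (a sketch with arbitrary shapes, not PyLaia's code):

import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# (T, N, C) log-probabilities on the GPU, targets accidentally left on the CPU.
log_probs = torch.randn(50, 4, 20, device=device).log_softmax(2)
targets = torch.randint(1, 20, (4, 10))                  # CPU tensor
input_lengths = torch.full((4,), 50, dtype=torch.long)
target_lengths = torch.full((4,), 10, dtype=torch.long)

# Moving the targets to the same device as the activations resolves the error.
loss = F.ctc_loss(log_probs, targets.to(device), input_lengths, target_lengths)
print(loss.item())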

Error in run_epoch

Hi,
First, I would like to congratulate you on this awesome work. It is very well coded!
Now I think there is an error at this line:

There should be an "else", and the next two lines should be unindented; otherwise, I believe we end up in an infinite loop.

Thank you for your answer.
