glat's Introduction

GLAT

Implementation for the ACL 2021 paper "Glancing Transformer for Non-Autoregressive Neural Machine Translation".

Requirements

  • Python >= 3.7
  • Pytorch >= 1.5.0
  • Fairseq 1.0.0a0
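
The repository bundles its own Fairseq (1.0.0a0) copy together with train.py and fairseq_cli/, so the commands below are run from the repository root. A minimal environment setup might look like the following sketch; the conda usage and the exact PyTorch build are illustrative assumptions, not part of the original instructions.

# Illustrative setup; pick a PyTorch build that matches your CUDA version.
conda create -n glat python=3.7 -y
conda activate glat
pip install "torch>=1.5.0"
git clone https://github.com/FLC777/GLAT.git
cd GLAT    # run the commands in the following sections from this directory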

Preparation

Train an autoregressive Transformer according to the instructions in Fairseq.

Use the trained autoregressive Transformer to generate target sentences for the training set.
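
A rough sketch of this distillation step (not spelled out in the original README): decode the train split with the trained AT teacher and keep the hypothesis lines as the new target side. Here at_data_dir, at_checkpoint, and the beam size are illustrative placeholders; input_dir, src, and tgt are the variables used in the binarization command below.

at_data_dir=path_to_binarized_data_of_the_AT_teacher
at_checkpoint=path_to_trained_AT_checkpoint
python3 fairseq_cli/generate.py ${at_data_dir} --path ${at_checkpoint} \
    --gen-subset train --beam 5 > train.decode.out
# Keep the BPE segmentation so the distilled text matches the provided dictionaries.
# Extract the hypothesis lines (prefixed with "H-"), restore the original sentence
# order, and pair them with the original source side (train.${src}).
grep ^H- train.decode.out | sort -t- -k2 -n | cut -f3 > ${input_dir}/train.${tgt}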

Binarize the distilled training data.

input_dir=path_to_raw_text_data
data_dir=path_to_binarized_output
src=source_language
tgt=target_language
python3 fairseq_cli/preprocess.py --source-lang ${src} --target-lang ${tgt} --trainpref ${input_dir}/train \
    --validpref ${input_dir}/valid --testpref ${input_dir}/test --destdir ${data_dir}/ \
    --workers 32 --src-dict ${input_dir}/dict.${src}.txt --tgt-dict ${input_dir}/dict.${tgt}.txt

Train

  • For training GLAT
save_path=path_for_saving_models
python3 train.py ${data_dir} --arch glat --noise full_mask --share-all-embeddings \
    --criterion glat_loss --label-smoothing 0.1 --lr 5e-4 --warmup-init-lr 1e-7 --stop-min-lr 1e-9 \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 --optimizer adam --adam-betas '(0.9, 0.999)' \
    --adam-eps 1e-6 --task translation_lev_modified --max-tokens 8192 --weight-decay 0.01 --dropout 0.1 \
    --encoder-layers 6 --encoder-embed-dim 512 --decoder-layers 6 --decoder-embed-dim 512 --fp16 \
    --max-source-positions 1000 --max-target-positions 1000 --max-update 300000 --seed 0 --clip-norm 5 \
    --save-dir ${save_path} --src-embedding-copy --length-loss-factor 0.05 --log-interval 1000 \
    --eval-bleu --eval-bleu-args '{"iter_decode_max_iter": 0, "iter_decode_with_beam": 1}' \
    --eval-tokenized-bleu --eval-bleu-remove-bpe --best-checkpoint-metric bleu \
    --maximize-best-checkpoint-metric --decoder-learned-pos --encoder-learned-pos \
    --apply-bert-init --activation-fn gelu --user-dir glat_plugins
  • For training GLAT+CTC
save_path=path_for_saving_models
python3 train.py ${data_dir} --arch glat_ctc --noise full_mask --share-all-embeddings \
    --criterion ctc_loss --label-smoothing 0.1 --lr 5e-4 --warmup-init-lr 1e-7 --stop-min-lr 1e-9 \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 --optimizer adam --adam-betas '(0.9, 0.999)' \
    --adam-eps 1e-6 --task translation_lev_modified --max-tokens 8192 --weight-decay 0.01 --dropout 0.1 \
    --encoder-layers 6 --encoder-embed-dim 512 --decoder-layers 6 --decoder-embed-dim 512 --fp16 \
    --max-source-positions 1000 --max-target-positions 1000 --max-update 300000 --seed 0 --clip-norm 2 \
    --save-dir ${save_path} --length-loss-factor 0 --log-interval 1000 \
    --eval-bleu --eval-bleu-args '{"iter_decode_max_iter": 0, "iter_decode_with_beam": 1}' \
    --eval-tokenized-bleu --eval-bleu-remove-bpe --best-checkpoint-metric bleu \
    --maximize-best-checkpoint-metric --decoder-learned-pos --encoder-learned-pos \
    --apply-bert-init --activation-fn gelu --user-dir glat_plugins

Inference

  • The default setting without self re-ranking
checkpoint_path=path_to_your_checkpoint
python3 fairseq_cli/generate.py ${data_dir} --path ${checkpoint_path} --user-dir glat_plugins \
    --task translation_lev_modified --remove-bpe --max-sentences 20 --source-lang ${src} --target-lang ${tgt} \
    --quiet --iter-decode-max-iter 0 --iter-decode-eos-penalty 0 --iter-decode-with-beam 1 --gen-subset test
  • Generating with self re-ranking using a beam of 5
checkpoint_path=path_to_your_checkpoint
python3 fairseq_cli/generate.py ${data_dir} --path ${checkpoint_path} --user-dir glat_plugins \
    --task translation_lev_modified --remove-bpe --max-sentences 20 --source-lang ${src} --target-lang ${tgt} \
    --quiet --iter-decode-max-iter 0 --iter-decode-eos-penalty 0 --iter-decode-with-beam 5 --gen-subset test

The script for averaging checkpoints is scripts/average_checkpoints.py.
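
For example, averaging the last few epoch checkpoints could look like this (--inputs, --num-epoch-checkpoints, and --output are the script's standard options; averaging 5 checkpoints is an illustrative choice, not a value taken from the paper):

python3 scripts/average_checkpoints.py --inputs ${save_path} \
    --num-epoch-checkpoints 5 --output ${save_path}/checkpoint_avg.pt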

Thanks to dugu9sword for contributing part of the code.

glat's People

Contributors

FLC777


glat's Issues

KeyError: 'bleu'

@FLC777 Hello, thank you very much for open-sourcing the code. When I run the glat_CTC script, the model can train and save checkpoints, but the following errors are reported from time to time, after which the program stops and is killed (error screenshots were attached to the original issue).

About the CTC/NPD implementation

Hello thanks for your work!

There are three length prediction methods mentioned in your paper:
 ① the method used in Mask-Predict
 ② NPD
 ③ CTC

I noticed that the length prediction method in this repository is the implementation of ①, shown below:

def forward_length_prediction(self, length_out, encoder_out, tgt_tokens=None):

Do you have any plans to release the CTC or NPD implementations?

Thanks a lot!

train command gives error related to 'torch_imputer'

I got this error while running the train command:
2023-09-12 15:47:46.250592: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-12 15:47:47.042422: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/media/administrator/0adda20a-df3e-4c0c-a4fc-ef1c1c039914/EXPERIMENTS/GLAT-main/train.py", line 14, in
cli_main()
File "/media/administrator/0adda20a-df3e-4c0c-a4fc-ef1c1c039914/EXPERIMENTS/GLAT-main/fairseq_cli/train.py", line 440, in cli_main
parser = options.get_training_parser()
File "/media/administrator/0adda20a-df3e-4c0c-a4fc-ef1c1c039914/EXPERIMENTS/GLAT-main/fairseq/options.py", line 36, in get_training_parser
parser = get_parser("Trainer", default_task)
File "/media/administrator/0adda20a-df3e-4c0c-a4fc-ef1c1c039914/EXPERIMENTS/GLAT-main/fairseq/options.py", line 216, in get_parser
utils.import_user_module(usr_args)
File "/media/administrator/0adda20a-df3e-4c0c-a4fc-ef1c1c039914/EXPERIMENTS/GLAT-main/fairseq/utils.py", line 459, in import_user_module
importlib.import_module(module_name)
File "/home/administrator/anaconda3/envs/itv2/lib/python3.9/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1030, in _gcd_import
File "", line 1007, in _find_and_load
File "", line 986, in _find_and_load_unlocked
File "", line 680, in _load_unlocked
File "", line 850, in exec_module
File "", line 228, in _call_with_frames_removed
File "/media/administrator/0adda20a-df3e-4c0c-a4fc-ef1c1c039914/EXPERIMENTS/GLAT-main/glat_plugins/init.py", line 2, in
from .models import *
File "/media/administrator/0adda20a-df3e-4c0c-a4fc-ef1c1c039914/EXPERIMENTS/GLAT-main/glat_plugins/models/init.py", line 2, in
from .glat_ctc import *
File "/media/administrator/0adda20a-df3e-4c0c-a4fc-ef1c1c039914/EXPERIMENTS/GLAT-main/glat_plugins/models/glat_ctc.py", line 19, in
from torch_imputer.imputer import best_alignment
ModuleNotFoundError: No module named 'torch_imputer'

Please let me know how to resolve this.

size mismatch for encoder.embed_tokens.weight

Thanks for your code. An error occurred while attempting to evaluate on the test dataset.

fairseq plugins loaded...
2021-11-12 08:25:29 | INFO | fairseq_cli.generate | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': 'glat_plugins', 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False}, 'common_eval': {'_name': None, 'path': '/data/wbxu/GLAT_Checkpoints/checkpoint_best.pt', 'post_process': 'subword_nmt', 'quiet': True, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 2, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'tpu': False}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': None, 'batch_size': 20, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 20, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.25], 'stop_min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'model_parallel_size': 1, 'distributed_rank': 0}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 
'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 0, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': None, 'task': {'_name': 'translation_lev_modified', 'data': '/data/wbxu/DSLP-main/data-bin/wmt14.en-de_kd', 'source_lang': 'en', 'target_lang': 'de', 'load_alignments': False, 'left_pad_source': False, 'left_pad_target': False, 'max_source_positions': 1024, 'max_target_positions': 1024, 'upsample_primary': -1, 'truncate_source': False, 'num_batch_buckets': 0, 'train_subset': 'train', 'dataset_impl': None, 'required_seq_len_multiple': 1, 'eval_bleu': False, 'eval_bleu_args': '{}', 'eval_bleu_detok': 'space', 'eval_bleu_detok_args': '{}', 'eval_tokenized_bleu': False, 'eval_bleu_remove_bpe': None, 'eval_bleu_print_samples': False, 'noise': 'random_delete', 'start_p': 0.5, 'minus_p': 0.2, 'total_up': 300000}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': True}, 'optimizer': None, 'lr_scheduler': {'_name': 'fixed', 'force_anneal': None, 'lr_shrink': 0.1, 'warmup_updates': 0, 'lr': [0.25]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None}
2021-11-12 08:25:29 | INFO | fairseq.tasks.translation | [en] dictionary: 39840 types
2021-11-12 08:25:29 | INFO | fairseq.tasks.translation | [de] dictionary: 39840 types
2021-11-12 08:25:29 | INFO | fairseq_cli.generate | loading model(s) from /data/wbxu/GLAT_Checkpoints/checkpoint_best.pt
/home/wbxu/anaconda3/envs/GLATEnv/lib/python3.7/site-packages/omegaconf/omegaconf.py:579: UserWarning: update() merge flag is is not specified, defaulting to False.
For more details, see https://github.com/omry/omegaconf/issues/367
  stacklevel=1,
Traceback (most recent call last):
  File "fairseq_cli/generate.py", line 408, in <module>
    cli_main()
  File "fairseq_cli/generate.py", line 404, in cli_main
    main(args)
  File "fairseq_cli/generate.py", line 49, in main
    return _main(cfg, sys.stdout)
  File "fairseq_cli/generate.py", line 102, in _main
    num_shards=cfg.checkpoint.checkpoint_shard_count,
  File "/data/wbxu/GLAT-main/fairseq/checkpoint_utils.py", line 304, in load_model_ensemble
    state,
  File "/data/wbxu/GLAT-main/fairseq/checkpoint_utils.py", line 358, in load_model_ensemble_and_task
    model.load_state_dict(state["model"], strict=strict, model_cfg=cfg.model)
  File "/data/wbxu/GLAT-main/fairseq/models/fairseq_model.py", line 115, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home/wbxu/anaconda3/envs/GLATEnv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Glat:
	size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([39847, 512]) from checkpoint, the shape in current model is torch.Size([39840, 512]).
	size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([39847, 512]) from checkpoint, the shape in current model is torch.Size([39840, 512]).
	size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([39847, 512]) from checkpoint, the shape in current model is torch.Size([39840, 512]).

Problem of inference speed

I tested the inference speed of the GLAT model against an autoregressive model, and the speedup is actually no more than 2x.
Your paper reports about a 15.3x speedup (Table 1); please provide the inference scripts.
Thanks!

backpropagation

Hi, I would like to ask whether replacing the encoder output will have an impact on backpropagation.

about mask_id at line 41 of "glat_ctc.py"

I noticed that the blank_id used in the generator file and in ctc_loss is unk_idx=3. Why is "pad_id=1" used as the masked_id in the "get_align_target" function of "glat_ctc.py"? Shouldn't it be blank_id?

cannot reproduce results on wmt14 ende distill

I followed the instructions but cannot reproduce the results on WMT14 En-De (distilled). With the provided generation script I only get
BLEU4 = 16.65, 48.6/22.0/11.4/6.3 (BP=1.000, ratio=1.014, syslen=65432, reflen=64506)
After re-ranking with an AT model, I get
BLEU4 = 19.93, 52.2/25.6/14.2/8.3 (BP=1.000, ratio=1.013, syslen=65313, reflen=64506)
This is still far from the 26.55 reported in the paper.
I used the provided training script and trained the model on 8 V100 GPUs. The training log shows that the glat accuracy is pretty high at the end of training.
epoch 151: 1050 / 1993 loss=3.323, nll_loss=1.371, glat_accu=0.758, glat_keep=0.073, glat_context_p=0.3, word_ins=3.162, length=3.223, ppl=10.01, wps=132588, ups=2.16, wpb=61302.6, bsz=2018.2, num_updates=300000, lr=5.7735e-05, gnorm=1.455, clip=0, train_wall=46, gb_free=6.8, wall=139238
But the valid loss is also very high.
valid | epoch 151 | valid on 'valid' subset | loss 7.412 | nll_loss 5.888 | glat_accu 0 | glat_keep 0 | glat_context_p 0 | word_ins 7.252 | length 3.209 | ppl 170.31 | wps 242820 | wpb 41551 | bsz 1500 | num_updates 300000 | best_loss 6.511

About Data Set Selection

Hello, while reviewing your code I found that you did not specify which dataset to use. I read your paper and downloaded the dataset mentioned there (e.g., WMT14 En-De). However, for the src-dict and tgt-dict parameters of preprocess.py, I could not find dict.en.txt and dict.de.txt in the dataset. Sorry to bother you, but I hope you can provide the dataset (or the dictionaries).

The argument "--cpu" is not supported

I want to use a device without a GPU to translate the test set, but it seems that a GPU is required even though I passed the "--cpu" argument.
My command is:

$ python3 fairseq_cli/generate.py ${data_dir} --path ${checkpoint_path}/checkpoint_best.pt --user-dir glat_plugins \
>     --task translation_lev_modified --remove-bpe --max-sentences 20 --source-lang ${src} --target-lang ${tgt} \
>     --quiet --iter-decode-max-iter 0 --iter-decode-eos-penalty 0 --iter-decode-with-beam 1 --gen-subset test --save-dir transcheck \
>     --cpu

and it responds:

fairseq plugins loaded...
2021-11-05 20:48:08 | INFO | fairseq_cli.generate | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': True, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': 'glat_plugins', 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False}, 'common_eval': {'_name': None, 'path': 'smallmodel/checkpoint_best.pt', 'post_process': 'subword_nmt', 'quiet': True, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'tpu': False}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': None, 'batch_size': 20, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 20, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.25], 'stop_min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': 'transcheck', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'model_parallel_size': 1, 'distributed_rank': 0}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 
'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 0, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': None, 'task': {'_name': 'translation_lev_modified', 'data': 'cantonese-mandarin/pre-processed', 'source_lang': 'can', 'target_lang': 'man', 'load_alignments': False, 'left_pad_source': False, 'left_pad_target': False, 'max_source_positions': 1024, 'max_target_positions': 1024, 'upsample_primary': -1, 'truncate_source': False, 'num_batch_buckets': 0, 'train_subset': 'train', 'dataset_impl': None, 'required_seq_len_multiple': 1, 'eval_bleu': False, 'eval_bleu_args': '{}', 'eval_bleu_detok': 'space', 'eval_bleu_detok_args': '{}', 'eval_tokenized_bleu': False, 'eval_bleu_remove_bpe': None, 'eval_bleu_print_samples': False, 'noise': 'random_delete', 'start_p': 0.5, 'minus_p': 0.2, 'total_up': 300000}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': True}, 'optimizer': None, 'lr_scheduler': {'_name': 'fixed', 'force_anneal': None, 'lr_shrink': 0.1, 'warmup_updates': 0, 'lr': [0.25]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None}
2021-11-05 20:48:08 | INFO | fairseq.tasks.translation | [can] dictionary: 10168 types
2021-11-05 20:48:08 | INFO | fairseq.tasks.translation | [man] dictionary: 10168 types
2021-11-05 20:48:08 | INFO | fairseq_cli.generate | loading model(s) from smallmodel/checkpoint_best.pt
2021-11-05 20:48:10 | INFO | fairseq.data.data_utils | loaded 1,000 examples from: cantonese-mandarin/pre-processed/test.can-man.can
2021-11-05 20:48:10 | INFO | fairseq.data.data_utils | loaded 1,000 examples from: cantonese-mandarin/pre-processed/test.can-man.man
2021-11-05 20:48:10 | INFO | fairseq.tasks.translation | cantonese-mandarin/pre-processed test can-man 1000 examples
Traceback (most recent call last):                                                                                                                                         
  File "fairseq_cli/generate.py", line 408, in <module>
    cli_main()
  File "fairseq_cli/generate.py", line 404, in cli_main
    main(args)
  File "fairseq_cli/generate.py", line 49, in main
    return _main(cfg, sys.stdout)
  File "fairseq_cli/generate.py", line 206, in _main
    constraints=constraints,
  File "/data/home/db72687/Documents/glat/fairseq/tasks/fairseq_task.py", line 501, in inference_step
    models, sample, prefix_tokens=prefix_tokens, constraints=constraints
  File "/data/home/db72687/anaconda3/envs/glat/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/data/home/db72687/Documents/glat/fairseq/iterative_refinement_generator.py", line 212, in generate
    prev_decoder_out, encoder_out, **decoder_options
  File "/data/home/db72687/Documents/glat/fairseq/models/nat/nonautoregressive_transformer.py", line 138, in forward_decoder
    step=step,
  File "/data/home/db72687/anaconda3/envs/glat/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/home/db72687/Documents/glat/fairseq/models/nat/fairseq_nat_model.py", line 46, in wrapper
    self, normalize=normalize, encoder_out=encoder_out, *args, **kwargs
  File "/data/home/db72687/Documents/glat/fairseq/models/nat/nonautoregressive_transformer.py", line 239, in forward
    embedding_copy=(step == 0) & self.src_embedding_copy,
  File "/data/home/db72687/Documents/glat/fairseq/models/nat/nonautoregressive_transformer.py", line 301, in extract_features
    x = cat_x.index_select(dim=0, index=torch.arange(bsz * seq_len).cuda() * 2 +
  File "/data/home/db72687/anaconda3/envs/glat/lib/python3.7/site-packages/torch/cuda/__init__.py", line 149, in _lazy_init
    _check_driver()
  File "/data/home/db72687/anaconda3/envs/glat/lib/python3.7/site-packages/torch/cuda/__init__.py", line 54, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

About the CTC implementation

Hi, thanks for your brilliant work!

I have read your paper and want to reproduce the results of GLAT model with CTC loss.

However, I noticed that there is no implementation of the proposed glancing training with the CTC loss, where you use the LCS distance between Y and Ŷ.

I had some trouble implementing this (mainly the selection of Y and the LCS calculation for tensors), and I am wondering whether you could kindly release this part of the code.

Thanks a lot!

Training Speed

Hi, thanks for your awesome work!

When I use the command line here (https://github.com/FLC777/GLAT#train) to train the model on 8 V100 GPUs, the time cost of each epoch increases rapidly (epoch 1: 10 min -> epoch 50: 120 min). Is there something wrong with my command? What am I supposed to do?

Thanks very much!
hemingkx

About CTC

I'd like to ask: can I compute the loss with cross-entropy on the generated ctc_align sequence instead of torch.nn.CTCLoss() in the code?

Training fails with the following error

 File "/GLAT/glat_plugins/criterions/glat_loss.py", line 150, in forward
    utils.item(l["loss"].data / l["factor"])
  File "/GLAT/fairseq/utils.py", line 293, in item
    return tensor.item()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Which part of your code copies the encoder hidden when glancing?

Hi,
I read your paper and code with great interest. You said in the paper that during glancing, the sampled tokens are replaced with target embeddings in the decoder input, while the unsampled tokens use the encoder output.
However, in the code, the unglanced tokens are still masked.
Am I missing something?

       glat_info = None
        if glat and tgt_tokens is not None:
            with torch.no_grad():
                with torch_seed(rand_seed):
                    word_ins_out = self.decoder(
                        normalize=False,
                        prev_output_tokens=prev_output_tokens,
                        encoder_out=encoder_out,
                    )
                pred_tokens = word_ins_out.argmax(-1)
                same_num = ((pred_tokens == tgt_tokens) & nonpad_positions).sum(1)
                input_mask = torch.ones_like(nonpad_positions)
                bsz, seq_len = tgt_tokens.size()
                for li in range(bsz):
                    target_num = (((seq_lens[li] - same_num[li].sum()).float()) * glat['context_p']).long()
                    if target_num > 0:
                        input_mask[li].scatter_(dim=0, index=torch.randperm(seq_lens[li])[:target_num].cuda(), value=0)
                input_mask = input_mask.eq(1)
                input_mask = input_mask.masked_fill(~nonpad_positions,False)
                glat_prev_output_tokens = prev_output_tokens.masked_fill(~input_mask, 0) + tgt_tokens.masked_fill(input_mask, 0)
                # this line here
                
                glat_tgt_tokens = tgt_tokens.masked_fill(~input_mask, self.pad)

                prev_output_tokens, tgt_tokens = glat_prev_output_tokens, glat_tgt_tokens

                glat_info = {
                    "glat_accu": (same_num.sum() / seq_lens.sum()).item(),
                    "glat_context_p": glat['context_p'],
                }
