yuanhy1997 / seqdiffuseq
Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation [NAACL 2024]
Home Page: https://arxiv.org/abs/2212.10325
Hi,
I successfully trained a model on IWSLT De-En. Now I have some questions about inference:
decoder_attention_mask is not in the args. Should I just comment it out? (Line 124 in ed56ca4)
bash ./inference_scrpts/iwslt_inf.sh path-to-ckpts/ema_0.9999_280000.pt path-to-save-results path-to-ckpts/alpha_cumprod_step_260000.npy
Hi, thanks for your great work. I would like to train your model on wiki-auto for the text simplification task. I found the data you used in the Google Drive link provided in this repo, but I noticed that the original wiki-auto dataset has fewer than 677k sequences. I haven't found detailed instructions in your article. Could you share how you obtained this data?
Hey guys, like the other issues, this one is about sharing the code for research purposes. Would it be possible for you to send me the code, or do you plan to make it public in the near future? My mail is [email protected]
Could you please provide the evaluation code?
Hi, I tried to use your script (ccd.sh) to reproduce the Table 1 results on the Commonsense Conversation Dataset, but my reproduced results (BLEU: 0.154, Rouge-L: 6.38) are far below your reported values (BLEU: 1.02, Rouge-L: 8.59). Could you check whether the hyperparameters in ccd.sh are the optimal ones that you used? It would also help if you could provide the evaluation scripts for computing BLEU and Rouge-L (currently the inference_scripts only save the test outputs, with no metric evaluation, if I ran them correctly). Besides, are there any model checkpoints and test outputs available?
Hello! I am a student working on diffusion and seq2seq models. I am very interested in the code for your work. If possible, could you share it with me? My email is: [email protected]
How can I use the mbr-decode?
Hi, I am trying to reproduce your results in Table 3, especially for SeqDiffuSeq without the Adaptive Noise Schedule. Here is how I reproduce it:
I removed these two lines:
I used your default training and sampling scripts (sqrt noise schedule with uniform timestep sampling) and ran sampling with different checkpoints, e.g. the 280K and 400K checkpoints.
However, my best result is about 18, which is quite different from your reported number of 28.94. May I ask what I should change to get the reported number? BTW, I can reproduce the result of SeqDiffuSeq.
Hello, I would like to ask how to solve this problem.
When I run the code for inference, I get:
Traceback (most recent call last):
File "inference_main.py", line 209, in <module>
main()
File "inference_main.py", line 124, in main
if args.decoder_attention_mask:
AttributeError: 'Namespace' object has no attribute 'decoder_attention_mask'
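One possible workaround for the traceback above, offered as a sketch rather than the authors' fix: read the flag with `getattr` and a default, so that a missing argument is treated as disabled. The parser below is a stand-in; only the name `decoder_attention_mask` comes from the traceback, and defaulting it to `False` is an assumption about the intended behavior.

```python
import argparse


def get_flag(args, name, default=False):
    """Return args.<name> if it exists, otherwise a default instead of raising."""
    return getattr(args, name, default)


# Hypothetical parser that, like the one in the traceback, never defines the flag.
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=8)
args = parser.parse_args([])

# Assumption: a missing flag means "do not use a decoder attention mask".
use_mask = get_flag(args, "decoder_attention_mask")
print(use_mask)
```

Whether disabling the mask matches the setting used for the reported results is not stated in the source, so this only avoids the crash.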
Hi authors,
Thank you for such a well written repository! Really appreciate researchers like you all.
We are interested in your work and are trying to run your code on our data. However, we are encountering this issue. While we dig into this, could you please also take a look (since you would be more well-versed)?
FWIW, we have not changed anything except the dataloader_utils.py file with code to load in our data.
Thank you!
Traceback (most recent call last):
File "main.py", line 107, in <module>
main()
File "main.py", line 95, in main
warmup=args.warmup,
File "SeqDiffuSeq/trainer.py", line 177, in run_loop
self.run_step(batch, cond)
File "SeqDiffuSeq/trainer.py", line 196, in run_step
self.forward_backward(batch, cond)
File "SeqDiffuSeq/trainer.py", line 265, in forward_backward
losses = compute_losses()
File "SeqDiffuSeq/src/modeling/diffusion/respace.py", line 98, in training_losses
return super().training_losses(self._wrap_model(model), *args, **kwargs)
File "SeqDiffuSeq/src/modeling/diffusion/gaussian_diffusion.py", line 376, in training_losses
x_start_mean = model.model.module.get_embeds(input_ids)
File ".pyenv/versions/3.7.14/lib/python3.7/site-packages/torch/nn/modules/module.py", line 948, in __getattr__
type(self).__name__, name))
AttributeError: 'TransformerNetModel_encoder_decoder' object has no attribute 'module'
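The error above is what one would expect if the model is not wrapped in a DistributedDataParallel-style container at that point (for example in a single-process run), so `.module` does not exist. That diagnosis is an assumption, not something the source confirms. A minimal sketch of a helper that works with or without the wrapper follows; `unwrap`, `PlainModel`, and `Wrapper` are hypothetical names used only for illustration.

```python
def unwrap(model):
    """Return the underlying model whether or not it is wrapped (e.g. by DDP)."""
    return getattr(model, "module", model)


class PlainModel:
    # Stand-in for the bare network; get_embeds mirrors the call in the traceback.
    def get_embeds(self, input_ids):
        return [float(i) for i in input_ids]


class Wrapper:
    # Stand-in for a DistributedDataParallel-style wrapper exposing `.module`.
    def __init__(self, module):
        self.module = module


plain = PlainModel()
wrapped = Wrapper(plain)

# Both calls reach the same get_embeds implementation.
print(unwrap(plain).get_embeds([1, 2]))
print(unwrap(wrapped).get_embeds([1, 2]))
```

In the failing line this would correspond to replacing `model.model.module.get_embeds(...)` with a call through such a helper.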
When I tried to run the training script, I was told that mpi4py was missing, so I installed it:
pip install mpi4py
Looking in indexes: http://mirrors.aliyun.com/pypi/simple
Collecting mpi4py
Using cached http://mirrors.aliyun.com/pypi/packages/2e/1a/1393e69df9cf7b04143a51776727dd048586781bca82543594ab439e2eb4/mpi4py-3.1.5.tar.gz (2.5 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: mpi4py
Building wheel for mpi4py (pyproject.toml) ... done
Created wheel for mpi4py: filename=mpi4py-3.1.5-cp38-cp38-linux_x86_64.whl size=6024408 sha256=64ef1c54d03ecb2c862c4e57da02d6dd8d9e33673ad3948afafca08d60edfd64
Stored in directory: /root/.cache/pip/wheels/9d/2a/7e/c6575a1d595c7d8cce796177f1b9827975c5b48b31e28f25b9
Successfully built mpi4py
Installing collected packages: mpi4py
Successfully installed mpi4py-3.1.5
Then I re-ran the training script, and there was no output at all.
bash ./train_scripts/iwslt_en_de.sh 0 de en
I waited for a while, but the program still didn't output anything. I don't know what's wrong. My operating system is Ubuntu. Is it possible that it's an MPI problem?
I should run “python bleu_eval.py ... ...”
Hi,
when I run your code, I get the output like:
| loss | 0.0201 |
| loss_q0 | 0.0202 |
| loss_q1 | 0.0201 |
| loss_q2 | 0.0202 |
| loss_q3 | 0.0199 |
| mse | 0.0201 |
| mse_q0 | 0.0202 |
| mse_q1 | 0.0201 |
| mse_q2 | 0.0202 |
| mse_q3 | 0.0199 |
What is the difference between loss and loss_qi for i in [0, 3]?
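If the logging follows the convention of the improved-diffusion codebase that many diffusion trainers are derived from (an assumption; the source does not confirm it for this repository), `loss` is the mean over the whole batch, while `loss_q0` to `loss_q3` are the means over the samples whose diffusion timestep falls into each quarter of the timestep range. A toy sketch of that bookkeeping, with the hypothetical helper `quartile_means`:

```python
from collections import defaultdict


def quartile_means(losses, timesteps, num_timesteps):
    """Average per-sample losses overall and per timestep quartile."""
    buckets = defaultdict(list)
    for loss, t in zip(losses, timesteps):
        quartile = int(4 * t / num_timesteps)  # 0..3 for t in [0, num_timesteps)
        buckets[f"loss_q{quartile}"].append(loss)
    out = {"loss": sum(losses) / len(losses)}
    for key, values in buckets.items():
        out[key] = sum(values) / len(values)
    return out


# Made-up numbers: one sample per quartile of a 2000-step schedule.
stats = quartile_means(
    losses=[0.4, 0.2, 0.1, 0.3],
    timesteps=[10, 600, 1200, 1900],
    num_timesteps=2000,
)
print(stats)
```

Under this reading, nearly identical values across all four quartiles, as in the log above, would mean the loss barely depends on the noise level.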
Hi, when I decode with the trained BPE tokenizer, I found that the 'skip_special_tokens' parameter does not work: the final generated text still includes '</s>', '<s>', '<pad>', etc.
Did you run into this problem? I know the 'replace' function can be used to work around it, but is that really reasonable?
Thanks for sharing your code!
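The string-level workaround mentioned above can be sketched as follows, assuming the special tokens appear literally as `<s>`, `</s>`, and `<pad>` in the decoded text. The helper `strip_special_tokens` is hypothetical and does not explain why `skip_special_tokens` had no effect.

```python
import re

SPECIAL_TOKENS = ("<s>", "</s>", "<pad>")  # assumed token strings


def strip_special_tokens(text, tokens=SPECIAL_TOKENS):
    """Remove literal special-token strings and collapse leftover whitespace."""
    pattern = "|".join(re.escape(tok) for tok in tokens)
    cleaned = re.sub(pattern, " ", text)
    return " ".join(cleaned.split())


print(strip_special_tokens("<s> guten morgen </s> <pad> <pad>"))
```

This removes every literal occurrence, so it would also alter genuine text that happens to contain one of these strings.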
Hi,
I'm confused about the EMA implementation. Could you tell me whether my understanding is correct?
There are two ways to do EMA. The first one is
1. after optimizer.step() you obtain model_new
2. ema_model_new = a * ema_model_old + (1-a) * model_new # a is ema_decay
3. model_new = copy(ema_model_new)
The second one is:
1. after optimizer.step() you obtain model_new
2. ema_model_new = a * ema_model_old + (1-a) * model_new # a is ema_decay
The difference between these two ways is:
May I ask which one you implement? I assume it is the second one, since the training outputs don't change when I change ema_decay.
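The two variants described in this issue can be compared on a toy example. This is only a sketch with a single scalar "parameter" and a fake optimizer step; it says nothing about which variant the repository implements. `ema_update` and `train` are hypothetical names.

```python
def ema_update(ema_params, model_params, decay):
    """Return decay * ema_old + (1 - decay) * model_new for each parameter."""
    return {
        key: decay * ema_params[key] + (1 - decay) * model_params[key]
        for key in ema_params
    }


def train(steps, decay, copy_back):
    model = {"w": 0.0}
    ema = dict(model)
    for _ in range(steps):
        model["w"] += 1.0  # stand-in for optimizer.step()
        ema = ema_update(ema, model, decay)
        if copy_back:
            # First variant: the live weights are overwritten by the EMA weights.
            model = dict(ema)
    return model, ema


# Second variant: the trained weights do not depend on the decay.
print(train(3, 0.5, copy_back=False)[0], train(3, 0.9, copy_back=False)[0])
# First variant: the trained weights change with the decay.
print(train(3, 0.5, copy_back=True)[0], train(3, 0.9, copy_back=True)[0])
```

In the second variant the EMA copy never feeds back into training, which is consistent with the observation above that training outputs stay the same when ema_decay changes.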
Hi,
thank you again for your clean code. Here I have a question about the decoder_nll loss.
According to your code, you calculate the decoder_nll loss in this way:
x_start_mean = model.model.module.get_embeds(input_ids)
x_start = self.get_x_start(x_start_mean, std)
decoder_nll = self.token_discrete_loss(x_start, get_logits, input_ids, mask=loss_mask)
def token_discrete_loss(self, x_t, get_logits, input_ids, mask=None):
    logits = get_logits(x_t)  # bsz, seqlen, vocab
    loss_fct = th.nn.CrossEntropyLoss(reduction="none", ignore_index=-100)
    decoder_nll = loss_fct(logits.view(-1, logits.size(-1)), input_ids.view(-1)).view(
        input_ids.shape
    )
    if mask is not None:
        decoder_nll[mask == 0] = 0
        decoder_nll = decoder_nll.sum(dim=-1) / mask.sum(dim=-1)
    else:
        decoder_nll = decoder_nll.mean(dim=-1)
    return decoder_nll
This means that x_start is the noisy word embedding, and you calculate the cross entropy between the noisy word embedding and input_ids. However, the Diffusion-LM paper says, "We now describe the inverse process of rounding a predicted x0 back to discrete text" (second sentence in Section 4.2). It seems they use the predicted x_start (model_output) rather than the noisy x_start.
I know the original code also implements it this way, but it confuses me. Why don't we replace x_start in the function self.token_discrete_loss with model_out? The noisy word embedding x_start should be very close to the original word embedding, since at the beginning we add only a little noise, so we don't need to calculate its loss. Instead, we should make sure the predicted x_start (model_out) is close to the word embedding.
You have to use the most recent .npy schedule file saved before .pt model weight file.
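The rule quoted above can be sketched as a small file-selection helper. It assumes the file-name patterns seen in the commands on this page (`ema_<decay>_<step>.pt` and `alpha_cumprod_step_<step>.npy`); the function name `latest_schedule_before` and the strict "earlier than" comparison are my reading of "saved before", not something the source confirms.

```python
import re


def latest_schedule_before(ckpt_name, schedule_names):
    """Pick the schedule file with the largest step strictly below the checkpoint step."""
    ckpt_step = int(re.search(r"_(\d+)\.pt$", ckpt_name).group(1))
    candidates = []
    for name in schedule_names:
        match = re.search(r"_step_(\d+)\.npy$", name)
        # Strictly earlier: a literal reading of "saved before". Whether an
        # equal step also works is the question raised in this issue.
        if match and int(match.group(1)) < ckpt_step:
            candidates.append((int(match.group(1)), name))
    return max(candidates)[1] if candidates else None


schedules = [
    "path-to-ckpts/alpha_cumprod_step_240000.npy",
    "path-to-ckpts/alpha_cumprod_step_260000.npy",
    "path-to-ckpts/alpha_cumprod_step_280000.npy",
]
print(latest_schedule_before("path-to-ckpts/ema_0.9999_280000.pt", schedules))
```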
Regarding this sentence: does it mean that the step of the .pt weight file must be greater than that of the .npy noise schedule file? Can't the same step be used for both? What happens with the command "bash ./inference_scrpts/iwslt_inf.sh path-to-ckpts/ema_0.9999_280000.pt path-to-save-results path-to-ckpts/alpha_cumprod_step_280000.npy"?
Hello! Thanks for your code sharing.
When I run your default inference code:
bash ./inference_scrpts/iwslt_inf.sh path-to-ckpts/ema_0.9999_280000.pt path-to-save-results path-to-ckpts/alpha_cumprod_step_260000.npy
I met a problem:
created 6800 samples
sampling complete
Traceback (most recent call last):
File "inference_main.py", line 210, in <module>
main()
File "inference_main.py", line 159, in main
write_outputs(args=args,
File "inference_main.py", line 192, in write_outputs
with open(output_file_basepath, "w") as text_fout:
FileNotFoundError: [Errno 2] No such file or directory: 'path-to-ckpts/ema_0.9999_280000.pt.samples_6750.steps-2000.clamp-no_clamp-normal_10708.txt'
So how can I find the txt file?
I would appreciate it if you can share your idea.
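One plausible cause of the FileNotFoundError above, not confirmed by the source, is that the directory part of the output path does not exist relative to the working directory, so `open(..., "w")` cannot create the file. A minimal sketch of a writer that creates the parent directory first; `open_for_write` is a hypothetical helper and the demo uses a temporary directory.

```python
import os
import tempfile


def open_for_write(path):
    """Create the parent directory if needed, then open the file for writing."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return open(path, "w")


# Demo in a temporary directory; the nested folder does not exist beforehand.
base = tempfile.mkdtemp()
out_path = os.path.join(base, "path-to-ckpts", "samples.txt")
with open_for_write(out_path) as fout:
    fout.write("hello\n")
print(os.path.exists(out_path))
```

If the placeholder `path-to-ckpts` was passed literally, replacing it with the real checkpoint directory would likely be needed as well.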
Hi, thanks for your great work. I would like to train your model on WMT14, but I couldn't find the data in the Google Drive link provided in this repo. Could you please double-check it? Thanks.
Hi @Yuanhy1997 ,
Thanks for your great work.
I wonder whether you tried using the weights of pre-trained BART to initialize your model. I didn't find these results in the paper, although I can see the code has a knob to keep pre-trained weights.
Best,
Chiyu
I applied this adaptive technique by modifying my code, but I get the following error:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss. If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable)
Can you give me a hint? About half using 0 and half using the Zt of the previous prediction?
Thanks!
Line 19 in ed56ca4
I want to ask whether the result 29.83, which is compared with AR and CNAT, was calculated by this code. And is this result on the test set or the validation set?
Thanks~
Thanks for the work! For all datasets, could you upload the checkpoints used for the results in the paper?
Hi, I am very interested in your work. Could you share the source code with me? Thanks a lot!
My email is: [email protected]