adymaharana / storydalle

327.0 327.0 26.0 7.38 MB

License: MIT License

Python 99.21% Shell 0.79%

storydalle's Issues

what is the sentence embedding?

Thanks for sharing this great work. I have a question about the code: there is a variable 'sent_embeds' whose data comes from the file 'descriptions_vec_512.pkl'. What is this file, and how is it generated? What is the purpose of the variable 'sent_embeds'?

From what I understand, this model should take the text tokens and the source frame as inputs to generate the subsequent frames, so I cannot understand why it uses 'desciptions_vec_512.pkl' as additional input.
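
One way to see what the file holds is to load it and look at its top-level structure. The following is a minimal sketch, assuming the file is a standard Python pickle; `describe_pickle` is a hypothetical helper and not part of the repository, and it says nothing about how the embeddings were produced.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle file and summarize its top-level object."""
    # Only unpickle files from a source you trust: pickle can run code on load.
    with Path(path).open("rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "shape"):
        # Array-like objects (for example NumPy arrays) expose a shape.
        summary["shape"] = tuple(obj.shape)
    elif hasattr(obj, "__len__"):
        # Containers such as dicts and lists expose a length.
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        # A few keys show how the entries are indexed.
        summary["sample_keys"] = list(obj)[:3]
    return summary
```

Calling `describe_pickle("../data/pororo/descriptions_vec_512.pkl")` would report the container type and size, which is usually enough to tell whether the file maps frame or description identifiers to fixed-length vectors.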

Failed to download PororoSV dataset

Hi, I have tried to download PororoSV from https://drive.google.com/file/d/11Io1_BufAayJ1BpdxxV2uJUvCcirbrNc/view?usp=sharing several times, but the download fails after 13.6GB with 'permission denied'. Does anyone else have the same problem?

OOM issues

Hey, thanks for the interesting work.
I was trying to run train_story.sh, but ran into out-of-memory errors on an NVIDIA V100. Would you be able to share the configuration you ran it on, and are there ways to decrease the GPU memory requirements?

Checkpoint files of other datasets

Hello,

Is it possible to upload the paper's checkpoint files for the DiDeMo and Flintstones datasets somewhere? This would be super nice!

Bug in mega-story-dalle & tokenizer

Hi,
I am trying to train the mega-dalle version, but there is one bug in the dataloader, e.g. in flintstones_dataloader.py in line 160 there is:

tokens.append(self.tokenizer.encode(text.lower()))

while the TextTokenizer has no encode function. I changed it to tokenize and modified it slightly as follows:

t = self.tokenizer.tokenize(text.lower())
if len(t) > 64: 
    t = t[:64]
t = t[1:(len(t)-1)]
t += [0]*(64-len(t))
tokens.append(t)

Now there is no error, but I wonder whether this is the way it was supposed to be.
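
For comparison, the truncate, strip, and pad steps can be written as one small function. This is only a sketch under stated assumptions: the tokenizer output is taken to be a plain list of integer ids whose first and last entries are start and end markers, and `pad_token_ids` is a hypothetical name, not a function from the repository. Unlike the snippet above, it strips the end markers before truncating, so up to 64 content tokens survive instead of 62.

```python
def pad_token_ids(ids, max_len=64, pad_id=0, strip_ends=True):
    """Return a list of exactly max_len token ids."""
    ids = list(ids)
    if strip_ends:
        # Assumption: the first and last ids are start/end markers.
        ids = ids[1:-1]
    # Truncate after stripping so no content slots are wasted on markers.
    ids = ids[:max_len]
    # Right-pad with pad_id up to the fixed length.
    return ids + [pad_id] * (max_len - len(ids))


print(pad_token_ids([101, 5, 6, 7, 102], max_len=4))  # [5, 6, 7, 0]
```

Whether the start and end markers should be dropped at all, and whether 0 is the right padding id, depends on how the model was trained, so those defaults are assumptions carried over from the snippet in the issue.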

The flintstone dataset

Hi, thanks for your awesome work! I noticed that you proposed the FlintstonesSV dataset based on the original Flintstones dataset, where the video frames are sampled. Could you please provide the original, complete Flintstones dataset for better video generation? I cannot find it anywhere.
I'm looking forward to your reply, thanks!

Inference Instructions

Hi, thank you for your work on StoryDALL-E. I need to experiment with the outputs, so could you share your inference instructions?

PrefixTuningDalle not exist

I am trying to gather generated images from your best-performing checkpoint, and I faced this error.

(ldm) root@2157b047841c:/home/my/storydalle/story-dalle# bash infer_story.sh pororo
Evaluating on Pororo
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/my/storydalle/story-dalle/./infer_t2i.py:24 in <module>                                │
│                                                                                                  │
│    21                                                                                            │
│    22 import logging                                                                             │
│    23 import os, torch                                                                           │
│ ❱  24 from dalle.models import PrefixTuningDalle, StoryDalle, PromptDalle                        │
│    25 import torchvision                                                                         │
│    26 import torchvision.transforms as transforms                                                │
│    27 import pytorch_lightning as pl                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ImportError: cannot import name 'PrefixTuningDalle' from 'dalle.models' (/home/my/storydalle/story-dalle/dalle/models/__init__.py)

It seems that dalle/models/__init__.py in the current version doesn't export that class.

I may have missed something. If so, I hope for your reply.
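
A quick way to confirm which of the imported names a package actually exposes is to check them one by one instead of relying on the first ImportError. The sketch below uses only the standard library; `missing_names` is a hypothetical helper, and it only diagnoses the problem rather than supplying the missing class.

```python
import importlib


def missing_names(module_name, names):
    """Return the names that are not attributes of the imported module."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Demonstration on a standard-library module.
print(missing_names("math", ["sqrt", "not_a_function"]))  # ['not_a_function']
```

Run from the story-dalle directory, `missing_names("dalle.models", ["PrefixTuningDalle", "StoryDalle", "PromptDalle"])` would list every class that the import line expects but the package does not define.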

Contents of each file in Pororo datasets

Hello, thanks for your great work!

There are many files in the Pororo dataset you shared, and I would like to know what each file contains. For example, does label.py mark the characters that appear in each image, and what does the masks folder contain? Could you provide more details? Thanks very much!

When to Release DiDeMoSV dataset

Hi, thanks a lot for your great work! Could you please tell us when you plan to release the DiDeMoSV dataset?

ImportError

Hello there!
I have read your paper, and I think it is very good.
So I attempted to run the code, but unfortunately I encountered an error. Here are the details:

Traceback (most recent call last):
  File "./train_t2i.py", line 24, in <module>
    from dalle.models import PrefixTuningDalle, Dalle, PromptDalle, StoryDalle
ImportError: cannot import name 'PrefixTuningDalle' from 'dalle.models'

Can I get the PrefixTuningDalle file?

Thank you:)

where can I install vfid?

Thanks for your awesome work! I see "from vfid.fid_score import fid_score" in your code. Where can I install the vfid library, or where can I find its code?

No such file or directory: '../data/pororo/descriptions_vec_512.pkl'

Thank you for sharing your work. I'd like to train the model, but when I ran the command "bash train_story.sh pororo", it failed with the error in the title. Could you kindly check the data.zip file of the Pororo dataset? Thank you very much!

Difficulties reproducing results on Flintstones

Hi, I am currently working on reproducing StoryDALL-E on Flintstones.
I used the provided code and trained the mega model for 50 epochs with a learning rate of 1e-5.
The FID is 32 and my generated results are poor compared to Figure 4 in your paper. Below are my generated images; the ones on the right are ground truth.

I think the reason may be that I am using the inference code from your training codebase. Am I using the correct method to generate images?

pixels = model.sample_images(texts, src_images).cpu().transpose(1, -1).transpose(-1, -2)

I would appreciate any advice you can give me.
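
To check what the two chained transposes in that line do, it can help to trace a shape through them by hand. The sketch below works on plain shape tuples and relies only on the fact that `transpose(dim0, dim1)` swaps two axes; the output layout of `sample_images` is not stated in this issue, so both input layouts shown are assumptions, and the helper names are hypothetical.

```python
def swap_axes(shape, a, b):
    """Return the shape after swapping axes a and b (negative indices allowed)."""
    shape = list(shape)
    shape[a], shape[b] = shape[b], shape[a]
    return tuple(shape)


def after_both_transposes(shape):
    """Mirror .transpose(1, -1) followed by .transpose(-1, -2) on a shape."""
    return swap_axes(swap_axes(shape, 1, -1), -1, -2)


# Assumed channels-last input (batch, height, width, channels):
print(after_both_transposes((8, 64, 48, 3)))  # (8, 3, 64, 48)
# Assumed channels-first input (batch, channels, height, width):
print(after_both_transposes((8, 3, 64, 48)))  # (8, 48, 3, 64)
```

For a 4D tensor the pair is equivalent to the permutation (0, 3, 1, 2), so it yields a channels-first image batch only if the sampler returns channels-last data. If the saved images look scrambled, the layout entering this line is one thing worth verifying.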

Trained classifier weights for Flintstones in story-dalle/eval_char_clf.sh

Hi @adymaharana, I'm very excited to see your excellent work and I'm replicating it. I'm evaluating the model on the Flintstones dataset, but I can't find the classifier weights for calculating Char-F1 and F-Acc. Can you provide the classifier weights? Thank you so much. Very urgent!

https://github.com/adymaharana/storydalle/blob/132dd19f7277dae36c16c5630792deb12fa5a09f/story-dalle/eval_char_clf.sh#L3C201-L3C201

Weights for Mega version

Hi,
Are you going to release the weights of the pretrained DALL-E Mega version for all the datasets?
I would be grateful for your response.
