adymaharana / storydalle
License: MIT License
Thanks for sharing this great work. I have a question about the code: the variable 'sent_embeds' is loaded from the file 'desciptions_vec_512.pkl'. What is this file, and how is it generated? What is the purpose of 'sent_embeds'?
As I understand it, the model should take the text tokens and the source frame as inputs and generate the subsequent frames, so I do not understand why it uses 'desciptions_vec_512.pkl' as an additional input.
Hi, I have tried several times to download PororoSV from https://drive.google.com/file/d/11Io1_BufAayJ1BpdxxV2uJUvCcirbrNc/view?usp=sharing, but the download fails after 13.6 GB with a 'permission denied' error. Does anyone else have the same problem?
Hey, thanks for the interesting work.
I was trying to run train_story.sh but ran into memory issues on an NVIDIA V100. Could you share the configuration you ran it on, and are there ways to reduce the GPU memory requirements?
Hello,
Is it possible to upload the paper's checkpoint files for the DiDeMo and Flintstone datasets somewhere? That would be super nice!
Hi,
I am trying to train the mega-dalle version, but there is a bug in the dataloader, e.g. in flintstones_dataloader.py.
Line 160 reads:
tokens.append(self.tokenizer.encode(text.lower()))
but TextTokenizer has no encode function. I replaced it with tokenize and adjusted the result slightly, as follows:
t = self.tokenizer.tokenize(text.lower())
if len(t) > 64:
    t = t[:64]
t = t[1:(len(t)-1)]
t += [0]*(64-len(t))
tokens.append(t)
Now there is no error, but I wonder whether this is how it was supposed to work.
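To make the workaround above easier to check, here is a minimal, self-contained sketch of the same truncate, strip, and pad steps. The function name `pad_token_ids`, the pad id of 0, and the reading that the first and last ids are special start/end tokens are assumptions for illustration only; none of them is confirmed by the repository.

```python
from typing import List


def pad_token_ids(ids: List[int], max_len: int = 64, pad_id: int = 0) -> List[int]:
    """Mirror the snippet above: truncate, drop first and last id, right-pad."""
    ids = list(ids[:max_len])      # keep at most max_len ids
    ids = ids[1:len(ids) - 1]      # drop first and last id (assumed special tokens)
    return ids + [pad_id] * (max_len - len(ids))  # pad back up to max_len


# Hypothetical token ids, chosen only to show the shape of the result.
example = pad_token_ids([101, 7, 8, 9, 102])
print(len(example), example[:5])
```

Note that for inputs longer than 64 ids, truncation happens before the first and last ids are dropped, so the dropped last id is an ordinary token rather than an end marker, and only 62 real ids survive. Whether that matches the intended preprocessing is exactly the open question in this issue.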
Hi, thanks for your awesome work! I noticed that you built the FlintstonesSV dataset from the original Flintstones dataset by sampling video frames. Could you please provide the original, complete Flintstones dataset for better video generation? I cannot find it anywhere.
I'm looking forward to your reply. Thanks!
Hi, thank you for your work on StoryDALL-E. I need to experiment with its outputs, so could you share your inference instructions?
I am trying to collect generated images from your best-performing checkpoint and ran into this error:
(ldm) root@2157b047841c:/home/my/storydalle/story-dalle# bash infer_story.sh pororo
Evaluating on Pororo
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/my/storydalle/story-dalle/./infer_t2i.py:24 in <module> │
│ │
│ 21 │
│ 22 import logging │
│ 23 import os, torch │
│ ❱ 24 from dalle.models import PrefixTuningDalle, StoryDalle, PromptDalle │
│ 25 import torchvision │
│ 26 import torchvision.transforms as transforms │
│ 27 import pytorch_lightning as pl │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ImportError: cannot import name 'PrefixTuningDalle' from 'dalle.models' (/home/my/storydalle/story-dalle/dalle/models/__init__.py)
It seems that dalle/models/__init__.py in the current version does not provide PrefixTuningDalle.
I may have missed something. If so, I would appreciate your reply.
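One quick way to confirm which names a package actually exposes is to import it and test each name. This is a minimal sketch using only the standard library; the helper name `missing_names` is hypothetical, and the commented `dalle.models` call assumes the repository is importable from the current directory.

```python
import importlib
from typing import Iterable, List


def missing_names(module_name: str, names: Iterable[str]) -> List[str]:
    """Return the names that the imported module does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Demonstrated on a standard-library module so the sketch runs anywhere.
print(missing_names("math", ["sqrt", "PrefixTuningDalle"]))

# Inside the repository, the equivalent check would be:
# missing_names("dalle.models", ["PrefixTuningDalle", "StoryDalle", "PromptDalle"])
```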
Hello, thanks for your great work!
There are many files in your shared Pororo dataset, and I would like to know what each file contains. For example, does label.py mark the characters that appear in each image, and what does the masks folder contain? Could you provide more details? Thanks very much!
Hi, thanks a lot for your great work! Could you please tell us your plan for releasing the DiDeMoSV dataset?
Hello there!
Hello, I have read your paper and think it is very good.
I tried to run the code but, unfortunately, encountered an error. Here are the details:
Traceback (most recent call last):
  File "./train_t2i.py", line 24, in <module>
    from dalle.models import PrefixTuningDalle, Dalle, PromptDalle, StoryDalle
ImportError: cannot import name 'PrefixTuningDalle' from 'dalle.models'
Could you share the file that defines PrefixTuningDalle?
Thank you:)
Thanks for your awesome work! I see "from vfid.fid_score import fid_score" in your code. How can I install the vfid library, or where can I find its code?
Hi, I am currently working on reproducing StoryDALL-E on Flintstones.
I used the provided code and trained the mega model for 50 epochs with a learning rate of 1e-5.
The FID is 32, and my generation results are poor compared to Figure 4 in your paper. Below are my generated images; the ones on the right are the ground truth.
I think the reason may be that I am using the inference code from your training codebase. Am I using the correct method to generate images?
pixels = model.sample_images(texts, src_images).cpu().transpose(1, -1).transpose(-1, -2)
I would appreciate any advice you can give me.
Hi @adymaharana, I'm very excited to see your excellent work, and I'm replicating it. I'm evaluating the model on the Flintstones dataset, but I can't find the classifier weights for calculating Char-F1 and F-Acc. Could you provide the classifier weights? Thank you so much. This is very urgent!
Hi,
Are you going to release the weights of the pretrained DALL-E Mega version for all the datasets?
I would be grateful for your response.