Comments (17)
Following in this thread since it's also related to video loading:
Shouldn't there be a frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) call inside the video_to_tensor function?
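For anyone following along, here is a minimal sketch of what that conversion does, written with plain NumPy so it runs without cv2. The helper name `bgr_to_rgb` is hypothetical and is not part of the repo; it only assumes frames arrive as (H, W, 3) arrays in BGR order, which is what cv2 returns when reading video.

```python
import numpy as np

def bgr_to_rgb(frame: np.ndarray) -> np.ndarray:
    """Reverse the last (channel) axis of an (H, W, 3) frame: BGR -> RGB.

    Hypothetical helper for illustration; for a 3-channel uint8 frame it
    produces the same result as cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).
    """
    # Slicing with ::-1 yields a view with negative strides, so copy it
    # into a contiguous array before handing it to other libraries.
    return np.ascontiguousarray(frame[..., ::-1])

# A single "pure blue" pixel as cv2 would return it: (B, G, R) = (255, 0, 0)
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)  # the blue value now sits in the last channel
```

Without the swap, red and blue are exchanged in every frame, so reconstructions would look tinted even if training itself works.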
from magvit2-pytorch.
Yep, and I think it's still default in cv2.
from magvit2-pytorch.
Hi @iejMac I'd like to follow along if that's okay. It would be great if you could share any changes you make to the codebase to allow for larger scale training. I'm happy to share any weights I generate to help people get started with pretrained models.
from magvit2-pytorch.
Oh I also noticed one thing - is there a reason we don't normalize the pixels before passing it into the model? Or did I just not catch where that's done?
from magvit2-pytorch.
@lucidrains ah yeah, ToTensor does, but your VideoDataset doesn't do that and that's what I was using to test (was getting loss ~O(1e5)).
magvit2-pytorch/magvit2_pytorch/data.py
Line 159 in b2f105b
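A rough sketch of why missing normalization could produce a loss of that magnitude: a mean squared error computed on raw [0, 255] pixels is 255² ≈ 65,000 times larger than the same error on [0, 1] pixels. The function name `normalize_frames` and the random data below are assumptions for illustration only, not code from the repo.

```python
import numpy as np

def normalize_frames(frames: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels in [0, 255] to float32 in [0, 1] (hypothetical helper)."""
    return frames.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
# Fake 8-frame clips of 16x16 RGB pixels, standing in for a target and a reconstruction
target = rng.integers(0, 256, size=(8, 16, 16, 3), dtype=np.uint8)
recon = rng.integers(0, 256, size=(8, 16, 16, 3), dtype=np.uint8)

# MSE on raw pixel values versus MSE on normalized values
mse_raw = np.mean((target.astype(np.float64) - recon.astype(np.float64)) ** 2)
mse_norm = np.mean(
    (normalize_frames(target).astype(np.float64)
     - normalize_frames(recon).astype(np.float64)) ** 2
)

# The two differ by a constant factor of about 255 ** 2
ratio = mse_raw / mse_norm
```

So an unnormalized reconstruction loss in the 1e4 to 1e5 range is roughly what one would expect from a normalized loss of order 1.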
from magvit2-pytorch.
Yes I'm using LFQ from that. The main question I have about config is: can we figure out a parametrization of VideoTokenizer (given all the params you added) that corresponds to something like MAGVIT2-small, so we can do some nice test runs?
Let's start out with 8-frame videos at 25 FPS. Given that, what are reasonable params for layers and other values in order to get decent results?
With the setup I sent above the loss curve/reconstructions look like this and it usually gets a 'nan' at some point (that's where it ended):
![Screenshot 2023-11-26 at 3 45 59 PM](https://private-user-images.githubusercontent.com/61431446/285690857-38e63560-4fba-41cb-a8bb-71cb6c11857a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTYyNDYwMTcsIm5iZiI6MTcxNjI0NTcxNywicGF0aCI6Ii82MTQzMTQ0Ni8yODU2OTA4NTctMzhlNjM1NjAtNGZiYS00MWNiLWE4YmItNzFjYjZjMTE4NTdhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA1MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNTIwVDIyNTUxN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNkZWM2ZjI2ZTdiMDk3NzM0NzI0NGYxOTExMGZkNjkzMjY2NzBjNmFjYjNmNmIyYjI3MmQ0OTdlOGU2ZTM0N2YmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.YHtIqVUbnzeN2V4ml35uI17BqSwG84Jj-smrbylkVyU)
from magvit2-pytorch.
@iejMac shoot, i normalized for gifs, but not mp4s.. thank you Maciej!
from magvit2-pytorch.
cool, was just wondering if you have something on hand. I'll try to read/play around and I'll report here if I come up with something. Also for lowish-effort video dataloading video2numpy could be a good option! It's pretty fast and does all the normal preprocessing for you. Maybe I'll make a PR for that if you're interested.
from magvit2-pytorch.
@mudtriangle got it, put in a quick fix here (just doing it in tensor space, as i'm not familiar with cv2 enough)
from magvit2-pytorch.
@iejMac oh hey! yea, should be complete, probably one or two more bugs left to iron out
i would stick with LFQ initially, as that was what the magvit2 paper proposed, although some people have reported better results with FSQ. i put it into one repository so we can test them against each other and find out
from magvit2-pytorch.
@jpfeil will do!
Ok I think I'm mostly set up (had to port this code to a repo with a different style). My first question is - do we have some prepared configs (like what layers, how many frames, what fps etc.) which roughly correspond to some models they trained in the paper? Just so we can compare.
For reference, currently I'm using the equivalent of this:
from magvit2_pytorch import VideoTokenizer

tokenizer = VideoTokenizer(
    image_size = 128,
    init_dim = 64,
    max_dim = 512,
    layers = (
        'residual',
        'compress_space',
        ('consecutive_residual', 2),
        'compress_space',
        ('consecutive_residual', 2),
        'linear_attend_space',
        'compress_space',
        ('consecutive_residual', 2),
        'attend_space',
        'compress_time',
        ('consecutive_residual', 2),
        'compress_time',
        ('consecutive_residual', 2),
        'attend_time',
    )
)
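As a quick sanity check on a config like this, the sketch below counts the compression layers and derives the overall downsampling factors. It assumes each 'compress_space' and 'compress_time' layer halves the corresponding dimension, which is an assumption about the layer behavior rather than something stated in this thread; `count_layers` is a hypothetical helper.

```python
# Same layer spec as the config above
layers = (
    'residual',
    'compress_space',
    ('consecutive_residual', 2),
    'compress_space',
    ('consecutive_residual', 2),
    'linear_attend_space',
    'compress_space',
    ('consecutive_residual', 2),
    'attend_space',
    'compress_time',
    ('consecutive_residual', 2),
    'compress_time',
    ('consecutive_residual', 2),
    'attend_time',
)

def count_layers(layers, name):
    """Count entries named `name`; entries are either 'name' or ('name', arg)."""
    total = 0
    for layer in layers:
        layer_name = layer if isinstance(layer, str) else layer[0]
        if layer_name == name:
            total += 1
    return total

# Assumption: every compress layer downsamples its axis by a factor of 2
space_factor = 2 ** count_layers(layers, 'compress_space')
time_factor = 2 ** count_layers(layers, 'compress_time')

image_size = 128
latent_size = image_size // space_factor

print(f"spatial downsampling: {space_factor}x, temporal: {time_factor}x")
print(f"latent spatial size for {image_size}px input: {latent_size}")
```

Under that assumption, this config compresses space 8x and time 4x, so a 128 px input maps to a 16 x 16 latent grid per compressed time step.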
from magvit2-pytorch.
@iejMac oh hey, what is the typical normalization for video? i think .ToTensor() here should bring it to [0, 1]?
from magvit2-pytorch.
@iejMac are you using the LFQ from this repo? the main claim of this paper is that this new quantization method helps them scale to more codes and better generation scores. if i had to sum up the paper, it would be, use independent binary latents + mostly convolutions
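To make "independent binary latents" concrete, here is a minimal NumPy sketch of the core idea: each latent dimension is quantized to -1 or +1 on its own, and the resulting bit pattern is read as an integer code index. This is an illustration only, not the LFQ implementation used by this repo; it leaves out the training losses and the straight-through gradient, and the bit ordering (dimension 0 as the least significant bit) is an arbitrary choice made here.

```python
import numpy as np

def lfq_quantize(z: np.ndarray):
    """Quantize each latent dimension independently to -1 or +1.

    Returns the quantized latents and an integer code index per vector.
    With D dimensions the implicit codebook has 2 ** D entries, and no
    embedding table lookup is needed to find the nearest code.
    """
    bits = z > 0                                  # one binary decision per dimension
    quantized = np.where(bits, 1.0, -1.0)
    powers = 2 ** np.arange(z.shape[-1])          # dimension i contributes 2 ** i
    indices = (bits * powers).sum(axis=-1)
    return quantized, indices

# Two latent vectors with 3 dimensions each, so 2 ** 3 = 8 possible codes
z = np.array([[0.3, -1.2, 0.7],
              [-0.1, -0.5, 2.0]])
quantized, indices = lfq_quantize(z)
```

Because the codebook size grows as 2 ** D with no table to store, this style of quantizer can reach very large vocabularies, which matches the scaling claim summarized above.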
from magvit2-pytorch.
@iejMac yup, i can get some of the hyperparameters in line with the paper's, probably Tuesday (currently in the middle of another project)
from magvit2-pytorch.
@iejMac would greatly appreciate it!
from magvit2-pytorch.
@mudtriangle there's a BGR format?
from magvit2-pytorch.
Hello, I'm wondering if there's been any progress on aligning the hyperparameter/architecture config with the magvit-v2 paper.
from magvit2-pytorch.