Comments (7)
I want to finetune the model using multiple videos, same prompt each video. Which .yaml file should i use?
This will be a bit different in the next version release, but for now use video_folder.yaml. You should be good to go by using a folder containing .mp4 files paired with .txt files, same as image training.
from text-to-video-finetuning.
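As a quick sanity check on the folder layout described above, a short Python sketch can confirm that every .mp4 has a matching .txt caption before you start training. The folder name train_data is just a placeholder; point it at your own dataset directory.

```python
from pathlib import Path

# Hypothetical folder name -- point this at your own dataset directory.
DATA_DIR = Path("train_data")

def check_pairs(data_dir: Path) -> list[str]:
    """Return the stems of every .mp4 that lacks a matching .txt caption."""
    missing = []
    for video in sorted(data_dir.glob("*.mp4")):
        if not video.with_suffix(".txt").exists():
            missing.append(video.stem)
    return missing

if __name__ == "__main__" and DATA_DIR.exists():
    unpaired = check_pairs(DATA_DIR)
    if unpaired:
        print("Videos missing captions:", ", ".join(unpaired))
    else:
        print("All videos have matching .txt captions.")
```

Running this before a long training job catches the easy-to-miss case of a clip with no caption file.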
Great, thank you. Do you think 10 short videos (around 10-30 seconds each) is a decent amount?
Also, the steps are set to 50k by default, which will take ages even on an A100. Isn't this overkill? Is 2,500 enough?
@ImBadAtNames2019 You can choose an arbitrary amount of steps for your specific use case (it totally depends on your train data).
Length should be relevant to the data you want to capture. Here's an example of how you should be looking at your train set.
Say you have a 10-second clip of a car driving down a road, a cat crosses the street 5 seconds into the video, and your prompt is "a car driving down the road" — you just told the model that this clip is entirely about a car 🙂 .
Make sure your prompts are relevant to the clip. The training script will handle the rest.
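To make the per-clip captioning concrete, here's a minimal Python sketch that writes one .txt caption file next to each video — the clip names and captions are hypothetical; the point is that each caption describes only what its own clip shows:

```python
from pathlib import Path

# Hypothetical clip names and captions -- each caption should describe
# only what its own clip actually shows.
CAPTIONS = {
    "car.mp4": "a car driving down the road",
    "cat.mp4": "a cat crossing the street",
}

def write_captions(data_dir: Path, captions: dict[str, str]) -> list[Path]:
    """Write one .txt caption next to each listed .mp4; return the paths."""
    written = []
    for video_name, caption in captions.items():
        txt_path = (data_dir / video_name).with_suffix(".txt")
        txt_path.write_text(caption, encoding="utf-8")
        written.append(txt_path)
    return written

if __name__ == "__main__" and Path("train_data").exists():
    print(write_captions(Path("train_data"), CAPTIONS))
```

The training script then picks up each .mp4/.txt pair automatically, so keeping this mapping honest is the whole game.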
Thanks for your help and amazing work. Do you have other general advice on this?
from text-to-video-finetuning.
I'll chime in, another way of prompting is just to use a single fixed prompt for all the videos (similar to textual inversion/dreambooth). Then you can invoke your style at inference time by adding the prompt you finetuned with. This strategy can definitely overfit though, so it's probably best to keep the learning rate low.
Some other advice from my experiments:
- If you change the resolution or fps from their base (256 and 8, respectively), you'll need to unfreeze more relevant parts of the model
- Higher n_sample_frames leads to better temporal consistency
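Putting that advice together, a hedged sketch of the relevant config values might look like the fragment below. The key names are illustrative assumptions, not the repo's confirmed schema, so check the actual video_folder.yaml for the real field names:

```yaml
# Illustrative values only -- the key names here are assumptions; check
# the real video_folder.yaml in the repo for the exact schema.
train_data:
  path: "./train_data"    # folder of paired .mp4 / .txt files
  n_sample_frames: 16     # higher -> better temporal consistency
  width: 256              # base resolution; changing it means unfreezing
  height: 256             #   more relevant parts of the model
  fps: 8                  # base fps, for the same reason
learning_rate: 5.0e-6     # keep low when training on one fixed prompt
max_train_steps: 2500     # tune per dataset; the 50k default is rarely needed
```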
Amazing, thank you.