chengtan9907 / OpenSTL
OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
Home Page: https://openstl.readthedocs.io/en/latest/
License: Apache License 2.0
Nice work! Is early stopping an available option in OpenSTL?
Hello, thank you very much for your contributions to STL models; they have helped me a great deal.
I found a problem similar to mine in the issue "kth dataset question" (https://github.com/chengtan9907/OpenSTL/issues/28, @JCChen778), but I still have not solved it and would appreciate your guidance.
My problem: when running the visualization I get the following error:
E:\Anaconda3\envs\pytorch-gpu-openSTL\python.exe H:/OpenSTL-master/tools/visualizations/vis_video.py -d mmnist -w work_dirs/Debug --index 0 --save_dirs fig_mmnist_vis
Traceback (most recent call last):
File "H:/OpenSTL-master/tools/visualizations/vis_video.py", line 136, in
main()
File "H:/OpenSTL-master/tools/visualizations/vis_video.py", line 45, in main
assert os.path.isdir(args.work_dirs)
AssertionError
Process finished with exit code 1
I began to suspect that the work_dir path was incorrect, so I adjusted the command following the visualization docs (docs/en/visualization/video_visualization.md):
-d mmnist -w work_dirs/Debug --index 0 --save_dirs fig_mmnist_vis
but the problem remains. Below is the directory structure I got after downloading the package from the official site and training; is it the same as yours? Thank you.
OpenSTL-master
|--tools
| |--visualizations
| | |--vis_video.py
| |--work_dirs
| | |--Debug
| | | |--checkpoints
| | | | |--lastest.pth
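One likely cause (my assumption; the maintainers may have a different answer): the -w path is resolved relative to the current working directory, so the script has to be launched from the directory containing work_dirs, or be given an absolute path. A minimal check:

import os

work_dirs = "work_dirs/Debug"  # the path passed via -w; a placeholder here
# The assertion in vis_video.py fails when this does not resolve to an
# existing directory; printing the absolute path shows what is checked.
print(os.path.abspath(work_dirs), os.path.isdir(work_dirs))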
Hi,
I would like to know why PSNR was not used to evaluate performance on TaxiBJ.
Thanks,
Mareeta
Hi, thank you for releasing the code.
I encountered some problems while training and evaluating on the KittiCaltech benchmark.
Table 2 of the paper says there are 2042 training samples and 1983 test samples.
But I got 3738 training samples (from the KITTI train split), and I don't know which split of Caltech Pedestrian is used for the test samples.
Could you please share the details of the KittiCaltech benchmark?
Thanks a lot.
Thank you for the code; it has helped me a great deal in running baselines. While using it I hit a problem: for the MIM model, I changed train.py from one validation pass per training pass to four validation passes per training pass, and the four validation results differ. As I understand it, with identical input data the four validation results should be identical.
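If the validation path involves dropout, a missing model.eval() call, or non-deterministic CUDA kernels, repeated evaluation can legitimately differ. A sketch of the usual PyTorch determinism settings (general practice, not OpenSTL-specific):

import random
import numpy as np
import torch

def seed_everything(seed: int = 42):
    # Fix every RNG so repeated validation passes see identical randomness.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducible cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False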
I ran experiments on some of the datasets you provide and got the expected SSIM and PSNR on all of them except t2m_5625:
I suspect that is why the paper (the journal version of SimVP) does not report SSIM or PSNR for this dataset. Could you please tell me why this happens? Thank you!
Hi, compliments on this project!
I simply executed all the cells of the example Google Colab notebook, without any modifications, and got stuck on cell "3.2 Setup the experiment".
The error I got is:
ModuleNotFoundError Traceback (most recent call last)
[<ipython-input-14-1efd4377d66c>](https://localhost:8080/#) in <cell line: 1>()
----> 1 from openstl.api import BaseExperiment
2 from openstl.utils import create_parser
3
4 args = create_parser().parse_args([])
5 config = args.__dict__
1 frames
[/content/OpenSTL/openstl/api/train.py](https://localhost:8080/#) in <module>
8 import numpy as np
9 from typing import Dict, List
---> 10 from fvcore.nn import FlopCountAnalysis, flop_count_table
11
12 import torch
ModuleNotFoundError: No module named 'fvcore'
If I try to install fvcore using !pip install fvcore, I get:
Requirement already satisfied: fvcore in /usr/local/lib/python3.10/dist-packages/fvcore-0.1.5.post20221221-py3.10.egg (0.1.5.post20221221)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from fvcore) (1.22.4)
Requirement already satisfied: yacs>=0.1.6 in /usr/local/lib/python3.10/dist-packages/yacs-0.1.8-py3.10.egg (from fvcore) (0.1.8)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from fvcore) (6.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from fvcore) (4.65.0)
Requirement already satisfied: termcolor>=1.1 in /usr/local/lib/python3.10/dist-packages (from fvcore) (2.3.0)
Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages (from fvcore) (8.4.0)
Requirement already satisfied: tabulate in /usr/local/lib/python3.10/dist-packages (from fvcore) (0.8.10)
Requirement already satisfied: iopath>=0.1.7 in /usr/local/lib/python3.10/dist-packages/iopath-0.1.10-py3.10.egg (from fvcore) (0.1.10)
Requirement already satisfied: typing_extensions in /usr/local/lib/python3.10/dist-packages (from iopath>=0.1.7->fvcore) (4.6.3)
Requirement already satisfied: portalocker in /usr/local/lib/python3.10/dist-packages/portalocker-2.7.0-py3.10.egg (from iopath>=0.1.7->fvcore) (2.7.0)
So it looks like the package is already installed, but there is some issue with the notebook's configuration.
How can I fix the issue and make the tutorial notebook work?
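Two workarounds that often help in this situation (assumptions on my part, not an official fix): restart the Colab runtime so the freshly installed package is picked up, or register the .egg path manually, since .egg installs are only importable once they are on sys.path:

import sys

# Path taken from the 'Requirement already satisfied' line above.
sys.path.append('/usr/local/lib/python3.10/dist-packages/'
                'fvcore-0.1.5.post20221221-py3.10.egg')
import fvcore  # should now resolve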
Equation 2 states that SA and DA are combined with a Kronecker product, while Figure 4 shows a Hadamard product. In the actual code the two appear equivalent because DA is 1×1, but this part of the paper is a bit confusing.
Hello, I am trying to train SimVP on KTH, but the tooling for KTH seems insufficient.
After unzipping the .zip files, I got a lot of .avi videos. However, judging from the dataloader script, directories like 'boxing' should contain images of shape (1000, 1000), yet I only have .avi files whose frames are of shape (120, 160).
Could you please tell me whether there are any steps I forgot to take before training on the KTH dataset? Thank you!
Thank you for sharing.
I encountered some issues while using the Caltech dataset. The dataset downloaded from your network disk contains many empty files with a size of 0 B. If possible, could you please check the test data and upload the *_gzip.hkl and *_fileidx.hkl files again?
Hello! First of all, thank you very much for this excellent codebase.
Over the past two days I have tried to run your code and hit some problems:
2. The command in the visualization docs (docs/en/visualization/video_visualization.md),
python tools/visualizations/vis_video.py -d mmnist -w work_dirs/EXP/ --index 0 --save_dirs fig_mmnist_vis, may be wrong.
I tried to follow this command format and got an error:
Traceback (most recent call last):
File "tools/visualizations/vis_video.py", line 136, in <module>
main()
File "tools/visualizations/vis_video.py", line 50, in main
base_dir = base_dir.split(method_list[0])[0]
ValueError: empty separator
When I remove the trailing '/' from 'work_dirs/EXP/', the visualization runs normally.
So I believe the correct command format should be (see also the normalization sketch after this command):
python tools/visualizations/vis_video.py -d mmnist -w work_dirs/EXP --index 0 --save_dirs fig_mmnist_vis
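For what it's worth, normalizing the argument would make both forms work; a one-line sketch (my suggestion, not the repository's actual fix):

import os

work_dirs = "work_dirs/EXP/"             # user-supplied, may end with '/'
work_dirs = os.path.normpath(work_dirs)  # -> 'work_dirs/EXP'
# The later base_dir.split(...) then no longer sees an empty separator.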
3. I tried to load the KTH dataset (provided as videos) using the method in the docs:
bash tools/prepare_data/download_kth.sh
python tools/train.py -d kth --lr 1e-3 -c configs/kth/PredRNNv2.py --ex_name kth-prev2
and got the error:
NotADirectoryError: [Errno 20] Not a directory: './data/kth/boxing/person13_boxing_d4_uncomp.avi'
I then downloaded the KTH dataset provided by the original PredRNN repository (provided as images) and ran:
python tools/train.py -d kth --lr 1e-3 -c configs/kth/PredRNNv2.py --ex_name kth-prev2
Part of the data loads, but training quickly runs out of memory.
I am training on a 24 GB RTX 3090, and I never hit this problem when training with the original PredRNN repository.
Finally, if possible, could you share the KTH dataset you used via Baidu Netdisk? Thank you very much!
Dear Authors,
The pixel values in the experiments are typically normalized to a range (i.e., [0, 1]). Is there a reason you use a plain 2D convolution layer, without an activation (such as a sigmoid/tanh layer that would squash values into that range), as the decoder's last layer?
On the other hand, how do you normalize the output values for visualization? Do you clip them, or use another method?
Thanks!
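For reference, the two usual ways to bring unbounded outputs back into [0, 1] for display (common practice; I am not claiming this is the authors' pipeline):

import numpy as np

pred = np.random.randn(10, 1, 64, 64) * 0.1 + 0.5     # stand-in decoder output
clipped = np.clip(pred, 0.0, 1.0)                     # option 1: clip to range
rescaled = (pred - pred.min()) / (pred.ptp() + 1e-8)  # option 2: min-max rescale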
The download link for the KittiCaltech Pedestrian dataset is no longer accessible. Could you please provide a new download link?
I tried the following code and it hangs.
What's the right way to do distributed training on a custom dataset?
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "4"
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
custom_training_config = {
'pre_seq_length': config.seq_len,
'aft_seq_length': config.seq_len,
'total_length': config.seq_len + config.seq_len,
'batch_size': config.batch_size,
'val_batch_size': config.batch_size,
'epoch': config.epochs,
'lr': 0.001,
'metrics': ['mse', 'mae'],
"fp16": True,
"dist": True,
"launcher": "pytorch",
'ex_name': 'custom_exp',
'dataname': 'custom',
'in_shape': [config.seq_len, 1, config.input_shape, config.input_shape],
}
...
args = create_parser().parse_args([])
config = args.__dict__
# update the training config
config.update(custom_training_config)
# update the model config
config.update(custom_model_config)
exp = BaseExperiment(args, dataloaders=(dataloader, dataloader, dataloader))
print('>'*35 + ' training ' + '<'*35)
exp.train()
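A possible cause (my reading, not verified against OpenSTL internals): with 'launcher': 'pytorch', each GPU is expected to run its own process with its own RANK, so hard-coding RANK=0 while WORLD_SIZE=4 leaves the process group waiting forever for the other three ranks. The usual pattern is to let a launcher such as torchrun --nproc_per_node=4 spawn the processes and read the rank from the environment:

import os
import torch.distributed as dist

# Under torchrun, RANK / LOCAL_RANK / WORLD_SIZE are set per process by the
# launcher; init_process_group then synchronizes all workers.
dist.init_process_group(backend="nccl", init_method="env://")
rank = int(os.environ["RANK"])
local_rank = int(os.environ["LOCAL_RANK"])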
Dear Authors,
Thank you for your inspiring work. I have some questions regarding the loss function and model architecture.
MSE loss can produce blurry results. Would it help to replace it with an L1 loss?
It is commonly believed that vanilla 2D convolution does not suffice to capture spatio-temporal correlations, yet your model seems to handle this well. I couldn't find a detailed explanation in your paper. Could you please explain it?
With kind regards.
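For anyone wanting to try the swap described above, in a standard PyTorch training step it is a one-line change (a sketch, not OpenSTL's actual training loop):

import torch
import torch.nn.functional as F

pred = torch.rand(2, 10, 1, 64, 64)    # stand-in prediction and target
target = torch.rand(2, 10, 1, 64, 64)
# L1 penalizes large errors less harshly than MSE and often gives slightly
# sharper frames at the cost of per-pixel accuracy.
loss = F.l1_loss(pred, target)         # instead of F.mse_loss(pred, target)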
Hello.
/*******/OpenSTL/data has the following files:
ls data/moving_mnist/
mnist_cifar_test_seq.npy mnist_cifar_test_seq.npy.tar mnist_test_seq.npy train-images-idx3-ubyte.gz
I get the error message below when I run the mmnist case using the command given in your documentation; 'dataloader.py' does not seem to have a handler for mmnist.
File "//OpenSTL/tools/train.py", line 40, in <module>
exp = BaseExperiment(args)
File "//OpenSTL/openstl/api/train.py", line 48, in __init__
self._preparation()
File "//OpenSTL/openstl/api/train.py", line 129, in _preparation
self._get_data()
File "//OpenSTL/openstl/api/train.py", line 199, in _get_data
get_dataset(self.args.dataname, self.config)
File "/**/OpenSTL/openstl/utils/main_utils.py", line 151, in get_dataset
return load_data(config)
File "//OpenSTL/openstl/datasets/dataloader.py", line 43, in load_data
raise ValueError(f'Dataname {dataname} is unsupported')
ValueError: Dataname mmnist is unsupported
Thank you for releasing the code!
We tried to train the SimVP model with your default settings, and the results we got were worse than expected. This is the command we ran:
python tools/non_dist_train.py -d mmnist -m SimVP --model_type gsta --lr 1e-3 --ex_name mmnist_simvp_gsta --epoch 600
The MSE on the test set is 47.128. Can you check whether anything is wrong with our settings? And is there a better hyperparameter setting we should use to reproduce the results in your paper?
Hi! I appreciate your wonderful project.
I have a question about an nni package installation error.
I encountered a "ResolvePackageNotFound: nni" error when I ran "conda create" in my environment.
I checked this repository's previous pull requests for this problem,
and I noticed that a fix was made in #15.
However, I also noticed that the change was reverted in commit b678838.
Was this change intended?
Thank you in advance.
Hello, I recently tried training PredRNN, MAU, and other methods on the MovingMNIST dataset, but my results differ from those in the paper. For example, the paper reports an MSE of 25.04±0.08 for PredRNN, much lower than the 30.64±0.10 reported for MAU. Yet across several training runs, PredRNN's MSE never dropped below 30, while MAU did better than the reported result, reaching around 27. Where might my training have gone wrong, and how can I obtain results similar to those in your paper?
Here are the commands I used for training:
python tools/non_dist_train.py -d mmnist -c configs/mmnist/PredRNN.py --ex_name mmnist_predrnn
python tools/non_dist_train.py -d mmnist -c configs/mmnist/MAU.py --ex_name mmnist_mau
It is a great honor to learn from your code; the results I obtain when reproducing the Moving MNIST experiments closely match those in your paper. My current work imitates the movement and prediction of mice during predation, using a custom dataset whose shape and data structure follow the Moving MNIST test set (the mnist_test_seq.npy format). My data-loader code is below, together with an example of my dataset. Confusingly, after several parameter adjustments, the training and test losses both decrease from values in the thousands, and the image-quality metrics are enormous (a seven-digit MSE). I have been unable to resolve this and would be very grateful for your help.
import cv2
import gzip
import numpy as np
import os
import random
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from openstl.datasets.utils import create_loader


def load_mnist(root, data_name='mnist'):
    # Load train dataset.
    file_map = {
        'mnist': 'moving_mnist/mnist_train_seq.npy',
    }
    path = os.path.join(root, file_map[data_name])
    mnist = np.load(path)
    mnist = mnist.transpose(1, 0, 2, 3)
    return mnist


def load_fixed_set(root, data_name='mnist'):
    # Load the test dataset
    file_map = {
        'mnist': 'moving_mnist/mnist_test_seq.npy',
    }
    path = os.path.join(root, file_map[data_name])
    dataset = np.load(path)
    dataset = dataset.transpose(1, 0, 2, 3)
    return dataset


class MovingMNIST(Dataset):
    def __init__(self, root, is_train=True, data_name='mnist',
                 n_frames_input=10, n_frames_output=10,
                 transform=None, target_transform=None, use_augment=False):
        super(MovingMNIST, self).__init__()
        self.dataset = None
        self.root = os.path.expanduser(root)
        self.transform = transform
        self.target_transform = target_transform
        self.is_train = is_train
        self.data_name = data_name
        if self.is_train:
            self.mnist = load_mnist(root, data_name)
        else:
            self.dataset = load_fixed_set(root, data_name)
        self.n_frames_input = n_frames_input
        self.n_frames_output = n_frames_output
        self.n_frames_total = self.n_frames_input + self.n_frames_output
        self.use_augment = use_augment
        self.mean = 0
        self.std = 1

    def _augment_seq(self, imgs, crop_scale=0.94):
        """Augmentations for video"""
        _, _, h, w = imgs.shape  # original shape, e.g., [10, 1, 64, 64]
        imgs = F.interpolate(imgs, scale_factor=1 / crop_scale, mode='bilinear')
        _, _, ih, iw = imgs.shape
        # Random Crop
        x = np.random.randint(0, ih - h + 1)
        y = np.random.randint(0, iw - w + 1)
        imgs = imgs[:, :, x:x + h, y:y + w]
        # Random Flip
        if random.randint(-2, 1) > 0:
            imgs = torch.flip(imgs, dims=(2, 3))  # rotation 180
        elif random.randint(-2, 1) > 0:
            imgs = torch.flip(imgs, dims=(2,))  # vertical flip
        elif random.randint(-2, 1) > 0:
            imgs = torch.flip(imgs, dims=(3,))  # horizontal flip
        return imgs

    def __getitem__(self, index):
        # need to iterate over time
        def _transform_time(data):
            new_data = None
            for i in range(data.size(0)):
                img = Image.fromarray(data[i].numpy(), mode='L')
                new_data = self.transform(img) if new_data is None else torch.cat([self.transform(img), new_data], dim=0)
            return new_data

        if self.is_train:
            input, output = self.mnist[index, :self.n_frames_input], self.mnist[index, self.n_frames_input:self.n_frames_total]
        else:
            input, output = self.dataset[index, :self.n_frames_input], self.dataset[index, self.n_frames_input:self.n_frames_total]
        if self.transform is not None:
            input = _transform_time(input)
        if self.target_transform is not None:
            output = _transform_time(output)
        # Reshape the input and output tensors
        input = torch.from_numpy(np.expand_dims(input, axis=1)).float()
        output = torch.from_numpy(np.expand_dims(output, axis=1)).float()
        return input, output

    def __len__(self):
        if self.is_train:
            return len(self.mnist)
        else:
            return len(self.dataset)


def load_data(batch_size, val_batch_size, data_root, num_workers=4, data_name='mnist',
              pre_seq_length=10, aft_seq_length=10, in_shape=[10, 1, 64, 64],
              distributed=False, use_augment=False, use_prefetcher=False, drop_last=True):
    image_size = in_shape[-1] if in_shape is not None else 64
    train_set = MovingMNIST(root=data_root, is_train=True, data_name=data_name,
                            n_frames_input=pre_seq_length,
                            n_frames_output=aft_seq_length, use_augment=False)
    test_set = MovingMNIST(root=data_root, is_train=False, data_name=data_name,
                           n_frames_input=pre_seq_length,
                           n_frames_output=aft_seq_length, use_augment=False)
    dataloader_train = create_loader(train_set,
                                     batch_size=batch_size,
                                     shuffle=True, is_training=True,
                                     pin_memory=True, drop_last=True,
                                     distributed=distributed, use_prefetcher=use_prefetcher)
    dataloader_vali = create_loader(test_set,
                                    batch_size=val_batch_size,
                                    shuffle=False, is_training=False,
                                    pin_memory=True, drop_last=drop_last,
                                    distributed=distributed, use_prefetcher=use_prefetcher)
    dataloader_test = create_loader(test_set,
                                    batch_size=val_batch_size,
                                    shuffle=False, is_training=False,
                                    pin_memory=True, drop_last=drop_last,
                                    distributed=distributed, use_prefetcher=use_prefetcher)
    return dataloader_train, dataloader_vali, dataloader_test


if __name__ == '__main__':
    from openstl.utils import init_dist
    os.environ['LOCAL_RANK'] = str(0)
    os.environ['RANK'] = str(0)
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12357'
    dist_params = dict(launcher='pytorch', backend='nccl', init_method='env://', world_size=1)
    init_dist(**dist_params)
    dataloader_train, _, dataloader_test = \
        load_data(batch_size=4,
                  val_batch_size=4,
                  data_root='../../data/',
                  num_workers=4,
                  data_name='mnist',
                  pre_seq_length=10,
                  aft_seq_length=10,
                  distributed=False,
                  use_prefetcher=False)
    print(len(dataloader_train), len(dataloader_test))
    for item in dataloader_train:
        print(item[0].shape, item[1].shape)
        break
    for item in dataloader_test:
        print(item[0].shape, item[1].shape)
        break
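One plausible explanation for the seven-digit MSE (my guess from the loader above, not a confirmed diagnosis): this loader returns raw uint8 pixel values in [0, 255], while as far as I can tell OpenSTL's own Moving MNIST loader feeds the model values scaled to [0, 1]; squared errors on the 0-255 scale are roughly 255^2 ≈ 65,000 times larger. A one-line change in __getitem__ would test this:

# Scale raw uint8 frames into [0, 1] before returning them (assumption:
# the benchmark metrics expect this convention).
input = torch.from_numpy(np.expand_dims(input, axis=1)).float() / 255.0
output = torch.from_numpy(np.expand_dims(output, axis=1)).float() / 255.0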
Can I use the toolkit to make predictions? I only see interfaces for training, validation, and testing.
When I enter a competition, I don't have labels for the test set; can I still use the trained model to predict the outputs?
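As a workaround, the bare model can be run directly on unlabeled clips (a sketch; the checkpoint path, the SimVP choice, and the exact state-dict layout are assumptions):

import torch
from openstl.models.simvp_model import SimVP_Model

model = SimVP_Model((10, 1, 64, 64))  # (T, C, H, W) must match training
# 'checkpoints/latest.pth' is a placeholder for your trained weights; the
# file may store the state dict directly or nested under a key.
model.load_state_dict(torch.load('checkpoints/latest.pth', map_location='cpu'))
model.eval()

with torch.no_grad():
    frames = torch.rand(1, 10, 1, 64, 64)  # an unlabeled input clip
    pred = model(frames)                   # predicted future frames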
Thank you for doing such a wonderful job. I have a question: does the predicted length of the SimVP model need to be the same as the input length, or can they differ? For example, can we input 12 frames from the past and predict 4 frames in the future?
Hi, first of all thank you for the great work on SimVP and a really nice repository.
I currently want to train a SimVP model on ocean data to forecast ocean transparency. I found that training on a T4 GPU is really slow on my data, with one iteration taking approx. 0.7 seconds, so I decided to train on multiple GPUs. But I hit a problem: every process creates its own copy of the Dataset, and training ends with SIGKILL, caused by running out of RAM while loading a separate Dataset for every GPU.
I have no experience with distributed training, so maybe I messed something up. If you have any ideas on how to solve this, it would be very helpful!
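One common way to avoid per-process dataset copies (a general suggestion, not an OpenSTL feature) is to memory-map the array so every DDP worker shares the single on-disk copy; the class and file names below are hypothetical:

import numpy as np
from torch.utils.data import Dataset

class OceanClips(Dataset):
    def __init__(self, path='ocean.npy', t_in=10, t_out=10):
        # mmap_mode='r' keeps the array on disk; each worker pages in only
        # the clips it touches instead of holding a full copy in RAM.
        self.data = np.load(path, mmap_mode='r')
        self.t_in, self.t_out = t_in, t_out

    def __len__(self):
        return self.data.shape[0] - (self.t_in + self.t_out) + 1

    def __getitem__(self, idx):
        clip = np.asarray(self.data[idx:idx + self.t_in + self.t_out])
        return clip[:self.t_in].copy(), clip[self.t_in:].copy()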
This numpy.array() call is turning out to be a bottleneck. Please see the cProfile output below:
I am using a custom dataset (following the steps in the Colab example) and DDP (8 GPUs).
Any idea why this becomes a bottleneck for training?
Hello researchers. First of all, thank you for sharing this library with the scientific community. I feel it is going to be really valuable for much future research. I adapted the example you provided to one of my datasets with my custom dataloader, and it worked very well, with impressive results from SimVP+gSTA. I found some issues, though:
First, the notebook on git isn't the same as the one you provided on Google Drive. I think the git version might be older.
Second, the normalization procedure in the custom dataset might not be working correctly. I changed it because it had the batch dimension in the reshape, but this is the dataset, where the batch is not yet defined:
mean = self.data.mean(axis=(0, 1, 2, 3)).reshape(1, -1, 1, 1)
std = self.data.std(axis=(0, 1, 2, 3)).reshape(1, -1, 1, 1)
self.data = (self.data - mean) / std
self.mean = mean
self.std = std
Third, in the notebook you mention that the range should be [0, 1]; this normalization doesn't achieve that. I kept it as is and the results were very impressive, though I usually work with [0, 1]. I'll probably test with [0, 1] later, using a min-max scaler instead of the standard deviation.
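For that later test, a drop-in min-max variant of the snippet above could look like this (a sketch reusing the same attribute names; global rather than per-channel scaling is an assumption):

# Global min-max scaling to [0, 1]; keep the offsets so predictions can be
# mapped back to physical units afterwards.
dmin, dmax = self.data.min(), self.data.max()
self.data = (self.data - dmin) / (dmax - dmin)
self.mean, self.std = dmin, dmax - dmin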
Hello!
Thanks for your excellent program. I installed it following the given steps, but when I run the training example you provide, I get the following error:
(OpenSTL) heweibing@ubuntusrv:~/DeepLearning/OpenSTL$ python tools/train.py -d mmnist --lr 1e-3 -c configs/mmnist/simvp/SimVP_gSTA.py --ex_name mmnist_simvp_gsta
Traceback (most recent call last):
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/tools/train.py", line 7, in <module>
from openstl.api import BaseExperiment
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/openstl/api/__init__.py", line 3, in <module>
from .train import BaseExperiment
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/openstl/api/train.py", line 15, in <module>
from openstl.core import Hook, metric, Recorder, get_priority, hook_maps
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/openstl/core/__init__.py", line 7, in <module>
from .optim_scheduler import get_optim_scheduler
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/openstl/core/optim_scheduler.py", line 4, in <module>
from timm.optim.adafactor import Adafactor
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/site-packages/timm/__init__.py", line 2, in <module>
from .models import create_model, list_models, is_model, list_modules, model_entrypoint, \
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/site-packages/timm/models/__init__.py", line 28, in <module>
from .maxxvit import *
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/site-packages/timm/models/maxxvit.py", line 216, in <module>
@dataclass
^^^^^^^^^
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/dataclasses.py", line 1221, in dataclass
return wrap(cls)
^^^^^^^^^
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/dataclasses.py", line 1211, in wrap
return _process_class(cls, init, repr, eq, order, unsafe_hash,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/dataclasses.py", line 959, in _process_class
cls_fields.append(_get_field(cls, name, type, kw_only))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/dataclasses.py", line 816, in _get_field
raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'timm.models.maxxvit.MaxxVitConvCfg'> for field conv_cfg is not allowed: use default_factory
Do you know what causes this error, and how I can fix it?
I would greatly appreciate any assistance you could provide.
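For context (my reading of the trace, not a maintainer's answer): this looks like the known incompatibility between older timm releases and Python 3.11's stricter dataclass checks, where mutable dataclass defaults raise a ValueError. Upgrading timm, or running under Python 3.10, usually resolves it:

pip install -U timm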
Release of OpenSTL V0.3.0, with various new features and released models and logs of spatio-temporal predictive learning (STL) methods; see docs/en/model_zoos. Updates ongoing!
Hi, thank you for your code and your excellent work!
I have hit some problems reproducing SimVP-sSTA*10 from your paper, which should achieve an MSE of 15.0.
Before reproducing SimVP-sSTA*10, I trained SimVP-sSTA (i.e., 200 epochs), which matched the 26.6 MSE reported in your paper, so I believe the code I downloaded and my configuration are correct.
However, when I train SimVP-sSTA for 2,000 epochs (i.e., the SimVP-sSTA*10 model), it reaches a poor MSE of 24.5. I tried to work out the cause of the inconsistent reproduction, but the paper gives no further detail on how SimVP-sSTA is trained for 2,000 epochs.
I hope you can answer this question and provide more training details about SimVP-sSTA*10 so that I can reproduce your result.
Thanks!
Hi all,
First of all, thank you for the amazing work and codebase. I was experimenting with training the model on the Moving MNIST dataset in Google Colab, but had training times of over an hour per epoch. This is strange, considering the "V1" SimVP only took about 2 minutes per epoch. Did anyone have a similar issue? Could the problem be that Google Colab only provides roughly 10 GB of RAM?
Thanks in advance!
Really, thank you for your excellent contribution to STL.
However, I cannot forecast with the SimVP model when the time steps of the input and output differ, so I'd like to ask for a solution.
E.g.,
the input of the model is B * T1 * C1 * H * W,
and I'd like an output of shape B * T2 * C2 * H * W, where T2 is longer than T1.
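A common workaround when T2 > T1 (a sketch of standard practice, not a built-in OpenSTL feature) is to roll the model out autoregressively, feeding its own predictions back in until enough frames accumulate; note this only covers the T2 > T1 case with C2 = C1:

import torch

def rollout(model, frames, t2):
    # frames: (B, T1, C, H, W); assumes the model maps T1 frames to T1 frames.
    preds = []
    total = 0
    cur = frames
    with torch.no_grad():
        while total < t2:
            out = model(cur)   # predict the next T1 frames
            preds.append(out)
            total += out.shape[1]
            cur = out          # feed predictions back in
    return torch.cat(preds, dim=1)[:, :t2]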
Thank you for sharing the code.
I was trying to reproduce the results of the SimVP model on TaxiBJ, and I found I needed a minor change to the code in 'config_utils.py' to start training:
temp_config_file = tempfile.NamedTemporaryFile(dir=temp_config_dir, suffix=fileExtname, delete=False)
I don't know whether this affected my reproduction. I tried to reproduce the TaxiBJ results using code similar to the mmnist example, with TaxiBJ training epochs set to 50:
python tools/non_dist_train.py -d taxibj -c ./configs/taxibj/simvp/SimVP_gSTA.py --ex_name taxibj_simvp_gsta
The result of the reproduction is mse: 0.41158363223075867. Can you check whether anything is wrong with my settings?
In utils/main_utils/update_config, it seems the config is overridden by the default args. So when I change the learning rate in the config, it prints 'overwrite config key -- lr: 0.005 -> 0.001', and the actual learning rate is always 0.001.
What's the purpose of the update_config function?
def update_config(args, config, exclude_keys=list()):
    """update the args dict with a new config"""
    assert isinstance(args, dict) and isinstance(config, dict)
    for k in config.keys():
        if args.get(k, False):
            if args[k] != config[k] and k not in exclude_keys:
                print(f'overwrite config key -- {k}: {config[k]} -> {args[k]}')
            else:
                args[k] = config[k]
        else:
            args[k] = config[k]
    print(args)
    return args
Hey @chengtan9907, are there any pretrained models you could share?
It would help kickstart learning on newer datasets, as well as act as a stand-in set of weights for building pipelines.
Thanks for your valuable work! I notice that you provide experimental results for a lot of models on various datasets. However, the code for some methods (e.g., ConvLSTM) is not provided officially, so I wonder whether you implemented them yourselves. If so, would you mind uploading that code? If not, a link would also be quite helpful. Thank you!
Hello, researchers.
Thank you for sharing the valuable code with us. I understand that most modules could be used on categorical data by adding a softmax to the end of the models and using categorical metrics (accuracy, F-score, etc.).
Is this in your plans for the library?
May I ask how you divided the training and test sets for this dataset? In order to compare results with those in the model zoo, I would like to split the data the same way you did. Could you provide the taxibj/dataset.npz file?
Thanks for the great framework.
I would like to use it to forecast the motion of the liver in 2D MRI slices. Training seems to work, with the loss approaching 0.002 on average on the validation set.
However, the predicted images are really noisy, and I don't understand why.
My code is as follows:
from typing import Any, Optional

import torch
import pytorch_lightning as pl
from pytorch_lightning.utilities.types import STEP_OUTPUT
from openstl.models.simvp_model import SimVP_Model
from torch.nn.functional import mse_loss


class VideoTransformer(pl.LightningModule):
    def __init__(self) -> None:
        super().__init__()
        self.model = SimVP_Model((10, 1, 32, 64))

    def forward(self, x: Any) -> Any:
        y_hat = self.model(x)
        return y_hat

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = mse_loss(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = mse_loss(y_hat, y)
        self.log('val_loss', loss)
        return loss

    def configure_optimizers(self) -> Any:
        return torch.optim.Adam(self.parameters(), lr=1e-4)
Hello, thank you very much for your excellent work!
I followed the steps in the documentation:
bash tools/prepare_data/download_mmnist.sh
python tools/train.py -d mmnist --lr 1e-3 -c configs/mmnist/simvp/SimVP_gSTA.py --ex_name mmnist_simvp_gsta
and found that the train loss in the log is only around 0.02:
2023-07-04 09:20:07,531 - val mse:103.34159088134766, mae:280.3106689453125
2023-07-04 09:20:07,532 - Intermediate result: 103.34159088134766 (Index 7)
2023-07-04 09:20:07,533 - Epoch: 8, Steps: 625 | Lr: 0.0000815 | Train Loss: 0.0216427 | Vali Loss: 0.0252299
2023-07-04 09:22:15,720 - val mse:102.59400177001953, mae:253.41189575195312
2023-07-04 09:22:15,722 - Intermediate result: 102.59400177001953 (Index 8)
2023-07-04 09:22:15,722 - Epoch: 9, Steps: 625 | Lr: 0.0000923 | Train Loss: 0.0209375 | Vali Loss: 0.0250474
2023-07-04 09:24:23,421 - val mse:99.95042419433594, mae:237.94679260253906
2023-07-04 09:24:23,423 - Intermediate result: 99.95042419433594 (Index 9)
2023-07-04 09:24:23,423 - Epoch: 10, Steps: 625 | Lr: 0.0001043 | Train Loss: 0.0200122 | Vali Loss: 0.0244020
This does not match the train loss in mmnist_simvp_s_gsta_one_ep200.log, downloaded from the documentation:
2023-02-16 23:03:11,358 - val mse:88.54080200195312, mae:213.62892150878906
2023-02-16 23:03:11,360 - Epoch: 8, Steps: 625 | Lr: 0.0000815 | Train Loss: 13.5592894 | Vali Loss: 0.0216210
2023-02-16 23:06:08,100 - val mse:85.24305725097656, mae:203.82467651367188
2023-02-16 23:06:08,102 - Epoch: 9, Steps: 625 | Lr: 0.0000923 | Train Loss: 13.0065427 | Vali Loss: 0.0208163
2023-02-16 23:09:04,901 - val mse:85.50586700439453, mae:193.38180541992188
2023-02-16 23:09:04,902 - Epoch: 10, Steps: 625 | Lr: 0.0001043 | Train Loss: 12.7741899 | Vali Loss: 0.0208791
Could you help me understand this difference?
First of all, thank you very much for your work.
I saw your update about the TaxiBJ dataset, which shows the dataset's format in tools\prepare_data\download_taxibj.sh. I downloaded the dataset from https://github.com/chengtan9907/OpenSTL/releases/download/v0.1.0/taxibj_dataset.zip and looked at the format of the .npz file.
My problem: looking at the X_train array in the .npz, its structure is (20461, 4, 2, 32, 32), a 5D tensor. But the original authors' explanation of the dataset describes a 4D tensor, so I would like to ask what the dimensions of this 5D structure represent.
Just thought I'd let people know that I created a simple web/browser demo using ONNX Runtime Web.
Inference takes about 2 seconds for the 10-frame Moving MNIST model using the Wasm backend. It would probably be significantly faster with the WebGL backend, but that lacks quantization op support.
The quantized model comes in at less than 50 MB and seems to match the accuracy of the non-quantized (>150 MB) model.
Here's the code and demo:
My next step is to create a version of the training and conversion notebook that can train on arbitrary videos that the user uploads.
(I'm also wondering whether some sort of diffusion-like process could be used to prevent the increasing blurriness as we predict further into the future. Maybe an "un-diffusion" process could actually just be "embedded" as extra frames between the "actual" frames during training? Or maybe it would need to be a separate model. If anyone has thoughts on this, I'd love to hear them.)
Thanks to the paper authors for publishing and open-sourcing this!
Release of OpenSTL V0.2.0 for various spatio-temporal predictive learning (STL) tasks:
- Renamed to OpenSTL instead of SimVPv2, with module name refactoring.
- Support for the 5.625deg, 2.8125deg, and 1.40625deg settings.
- Updated docs/en documents for the basic usages and new features of V0.2.0.
- Support for --find_unused_parameters in DDP training.
Thanks for the great work! But I am confused by the results reported in your OpenSTL paper; they seem to differ from the results in the original papers. For example, the MSE of SimVP on MovingMNIST is 23.8 in the original paper, but 32.15 in this repo. The same discrepancy appears for other methods, such as PredRNN++. Could you explain this? Are the results reproduced by the OpenSTL framework simply not the same as the original ones, or are there other reasons? Thanks in advance!
Best
Hello,
I was following tutorial.ipynb in the examples/ directory. This example does single-GPU training; I wanted to extend it to multi-GPU (single-node) training. How can I do this?
I have tried the following approach so far, but I am not sure it's correct:
I tried adding the following parameters to custom_training_config:
'use_gpu': True,
'dist': True,
'launcher': 'pytorch',
After this I would need to specify environment variables such as RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT. What value should I specify for RANK? Shouldn't it be set automatically while spawning multiple processes?
Thank you for sharing.
I am very interested in your work. If possible, could you share the pretrained SimVP models on the TaxiBJ dataset with me?
Hello,
I was getting a shape error while using the SimVP model. The relevant part of the stack trace:
File "/home/tarun360/venv/lib/python3.9/site-packages/OpenSTL-0.3.0-py3.9.egg/openstl/models/simvp_model.py", line 43, in forward
hid = self.hid(z)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/OpenSTL-0.3.0-py3.9.egg/openstl/models/simvp_model.py", line 245, in forward
z = self.enc[i](z)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/OpenSTL-0.3.0-py3.9.egg/openstl/models/simvp_model.py", line 208, in forward
z = self.block(x)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/OpenSTL-0.3.0-py3.9.egg/openstl/modules/simvp_modules.py", line 222, in forward
self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) * self.attn(self.norm1(x)))
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
return F.batch_norm(
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/functional.py", line 2450, in batch_norm
return torch.batch_norm(
RuntimeError: running_mean should contain 384 elements not 512
I am using a custom dataset with shape (11, 228, 574).
The model parameters are:
custom_training_config = {
    'pre_seq_length': 10,
    'aft_seq_length': 10,
    'total_length': 10 + 10,
    'batch_size': 8,
    'val_batch_size': 8,
    'epoch': 1,
    'lr': 5e-4,
    'sched': 'onecycle',
    'metrics': ['mse', 'mae'],
    'ex_name': 'custom_exp',
    'dataname': 'custom',
    'in_shape': [8, 11, 228, 574],
    # GPU
    'use_gpu': True,
    # distributed training
    'find_unused_parameters': True,
    'dist': True,
    'launcher': 'pytorch',
    # use float 16
    'fp16': True,
}
custom_model_config = {
    'method': 'SimVP',
    "spatio_kernel_enc": 3,
    "spatio_kernel_dec": 3,
    # Here, we directly set these parameters
    'model_type': 'gSTA',
    'N_S': 4,
    'N_T': 8,
    'hid_S': 64,
    'hid_T': 256
}
Any help on how I can get rid of the shape error? Is the input shape of (11, 228, 574) unsupported, with only certain shapes supported?
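For debugging shape issues like this, it can help to probe the bare model outside the training loop (a sketch; the shapes below just mirror the config above, and the padding suggestion is an assumption):

import torch
from openstl.models.simvp_model import SimVP_Model

# Feed one dummy batch through the model to localize the mismatch.
in_shape = (8, 11, 228, 574)          # (T, C, H, W) from the config above
model = SimVP_Model(in_shape, hid_S=64, hid_T=256, N_S=4, N_T=8,
                    model_type='gSTA')
x = torch.rand(2, *in_shape)          # (B, T, C, H, W)
y = model(x)
print(y.shape)
# If this fails, try H and W padded to multiples of 4 (e.g., 228 x 576),
# since the N_S=4 encoder downsamples the spatial dims twice.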