chengtan9907 / OpenSTL
OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
Home Page: https://openstl.readthedocs.io/en/latest/
License: Apache License 2.0
Nice work! Is early stopping an available option in OpenSTL?
Hello, thank you very much for your contributions to STL models; they have helped me a great deal.
I found a problem similar to mine in the issue "kth dataset question" (https://github.com/chengtan9907/OpenSTL/issues/28, @JCChen778), but I still have not solved it and would appreciate your guidance.
My problem: when running the visualization I get the following error:
E:\Anaconda3\envs\pytorch-gpu-openSTL\python.exe H:/OpenSTL-master/tools/visualizations/vis_video.py -d mmnist -w work_dirs/Debug --index 0 --save_dirs fig_mmnist_vis
Traceback (most recent call last):
File "H:/OpenSTL-master/tools/visualizations/vis_video.py", line 136, in
main()
File "H:/OpenSTL-master/tools/visualizations/vis_video.py", line 45, in main
assert os.path.isdir(args.work_dirs)
AssertionError
Process finished with exit code 1
I began to suspect that the work_dir path was incorrect, so I adjusted the command following the visualization docs (docs/en/visualization/video_visualization.md):
-d mmnist -w work_dirs/Debug --index 0 --save_dirs fig_mmnist_vis
but the problem remains. Below is the directory structure I got after downloading the package from the official site and training; is it the same as yours? Thank you.
OpenSTL-master
|--tools
| |--visualizations
| | |--vis_video.py
| |--work_dirs
| | |--Debug
| | | |--checkpoints
| | | | |--lastest.pth
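One likely cause (my assumption; the maintainers may have a different answer): the -w path is resolved relative to the current working directory, so the script has to be launched from the directory containing work_dirs, or be given an absolute path. A minimal check:

import os

work_dirs = "work_dirs/Debug"  # the path passed via -w; a placeholder here
# The assertion in vis_video.py fails when this does not resolve to an
# existing directory; printing the absolute path shows what is checked.
print(os.path.abspath(work_dirs), os.path.isdir(work_dirs))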
Hi,
I would like to know why PSNR was not used to evaluate performance on TaxiBJ.
Thanks,
Mareeta
Hi, thank you for releasing the code.
I encountered some problems while training and evaluating on the KittiCaltech benchmark.
Table 2 of the paper says there are 2042 training samples and 1983 test samples.
But I got 3738 training samples (from the KITTI train split), and I don't know which split of Caltech Pedestrian is used for the test samples.
Could you please share the details of the KittiCaltech benchmark?
Thanks a lot.
Thank you for the code; it has helped me a great deal in running baselines. While using it I hit a problem: for the MIM model, I changed train.py from one validation pass per training pass to four validation passes per training pass, and the four validation results differ. As I understand it, with identical input data the four validation results should be identical.
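If the validation path involves dropout, a missing model.eval() call, or non-deterministic CUDA kernels, repeated evaluation can legitimately differ. A sketch of the usual PyTorch determinism settings (general practice, not OpenSTL-specific):

import random
import numpy as np
import torch

def seed_everything(seed: int = 42):
    # Fix every RNG so repeated validation passes see identical randomness.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducible cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False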
I ran experiments on some of the datasets you provide and got the expected SSIM and PSNR on all of them except t2m_5625:
I suspect that is why the paper (the journal version of SimVP) does not report SSIM or PSNR for this dataset. Could you please tell me why this happens? Thank you!
Hi, compliments on this project!
I simply executed all the cells of the example Google Colab notebook, without any modifications, and got stuck on cell "3.2 Setup the experiment".
The error I got is:
ModuleNotFoundError Traceback (most recent call last)
[<ipython-input-14-1efd4377d66c>](https://localhost:8080/#) in <cell line: 1>()
----> 1 from openstl.api import BaseExperiment
2 from openstl.utils import create_parser
3
4 args = create_parser().parse_args([])
5 config = args.__dict__
1 frames
[/content/OpenSTL/openstl/api/train.py](https://localhost:8080/#) in <module>
8 import numpy as np
9 from typing import Dict, List
---> 10 from fvcore.nn import FlopCountAnalysis, flop_count_table
11
12 import torch
ModuleNotFoundError: No module named 'fvcore'
If I try to install fvcore using !pip install fvcore, I get:
Requirement already satisfied: fvcore in /usr/local/lib/python3.10/dist-packages/fvcore-0.1.5.post20221221-py3.10.egg (0.1.5.post20221221)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from fvcore) (1.22.4)
Requirement already satisfied: yacs>=0.1.6 in /usr/local/lib/python3.10/dist-packages/yacs-0.1.8-py3.10.egg (from fvcore) (0.1.8)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from fvcore) (6.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from fvcore) (4.65.0)
Requirement already satisfied: termcolor>=1.1 in /usr/local/lib/python3.10/dist-packages (from fvcore) (2.3.0)
Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages (from fvcore) (8.4.0)
Requirement already satisfied: tabulate in /usr/local/lib/python3.10/dist-packages (from fvcore) (0.8.10)
Requirement already satisfied: iopath>=0.1.7 in /usr/local/lib/python3.10/dist-packages/iopath-0.1.10-py3.10.egg (from fvcore) (0.1.10)
Requirement already satisfied: typing_extensions in /usr/local/lib/python3.10/dist-packages (from iopath>=0.1.7->fvcore) (4.6.3)
Requirement already satisfied: portalocker in /usr/local/lib/python3.10/dist-packages/portalocker-2.7.0-py3.10.egg (from iopath>=0.1.7->fvcore) (2.7.0)
So it looks like the package is already installed, but there is some issue with the notebook's configuration.
How can I fix the issue and make the tutorial notebook work?
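Two workarounds that often help in this situation (assumptions on my part, not an official fix): restart the Colab runtime so the freshly installed package is picked up, or register the .egg path manually, since .egg installs are only importable once they are on sys.path:

import sys

# Path taken from the 'Requirement already satisfied' line above.
sys.path.append('/usr/local/lib/python3.10/dist-packages/'
                'fvcore-0.1.5.post20221221-py3.10.egg')
import fvcore  # should now resolve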
Equation 2 states that SA and DA are combined with a Kronecker product, while Figure 4 shows a Hadamard product. In the actual code the two appear equivalent because DA is 1×1, but this part of the paper is a bit confusing.
Hello, I am trying to train SimVP on KTH, but the tooling for KTH seems insufficient.
After unzipping the .zip files, I got a lot of .avi videos. However, judging from the dataloader script, directories like 'boxing' should contain images of shape (1000, 1000), yet I only have .avi files whose frames are of shape (120, 160).
Could you please tell me whether there are any steps I forgot to take before training on the KTH dataset? Thank you!
Thank you for sharing.
I encountered some issues while using the Caltech dataset. The dataset downloaded from your network disk contains many empty files with a size of 0 B. If possible, could you please check the test data and upload the *_gzip.hkl and *_fileidx.hkl files again?
Hello! First of all, thank you very much for this excellent codebase.
Over the past two days I have tried to run your code and hit some problems:
2. The command in the visualization docs (docs/en/visualization/video_visualization.md),
python tools/visualizations/vis_video.py -d mmnist -w work_dirs/EXP/ --index 0 --save_dirs fig_mmnist_vis, may be wrong.
I tried to follow this command format and got an error:
Traceback (most recent call last):
File "tools/visualizations/vis_video.py", line 136, in <module>
main()
File "tools/visualizations/vis_video.py", line 50, in main
base_dir = base_dir.split(method_list[0])[0]
ValueError: empty separator
When I remove the trailing '/' from 'work_dirs/EXP/', the visualization runs normally.
So I believe the correct command format should be (see also the normalization sketch after this command):
python tools/visualizations/vis_video.py -d mmnist -w work_dirs/EXP --index 0 --save_dirs fig_mmnist_vis
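For what it's worth, normalizing the argument would make both forms work; a one-line sketch (my suggestion, not the repository's actual fix):

import os

work_dirs = "work_dirs/EXP/"             # user-supplied, may end with '/'
work_dirs = os.path.normpath(work_dirs)  # -> 'work_dirs/EXP'
# The later base_dir.split(...) then no longer sees an empty separator.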
3. I tried to load the KTH dataset (provided as videos) using the method in the docs:
bash tools/prepare_data/download_kth.sh
python tools/train.py -d kth --lr 1e-3 -c configs/kth/PredRNNv2.py --ex_name kth-prev2
and got the error:
NotADirectoryError: [Errno 20] Not a directory: './data/kth/boxing/person13_boxing_d4_uncomp.avi'
I then downloaded the KTH dataset provided by the original PredRNN repository (provided as images) and ran:
python tools/train.py -d kth --lr 1e-3 -c configs/kth/PredRNNv2.py --ex_name kth-prev2
Part of the data loads, but training quickly runs out of memory.
I am training on a 24 GB RTX 3090, and I never hit this problem when training with the original PredRNN repository.
Finally, if possible, could you share the KTH dataset you used via Baidu Netdisk? Thank you very much!
Dear Authors,
The pixel values in the experiments are typically normalized to a range (i.e., [0, 1]). Is there a reason you use a plain 2D convolution layer, without an activation (such as a sigmoid/tanh layer that would squash values into that range), as the decoder's last layer?
On the other hand, how do you normalize the output values for visualization? Do you clip them, or use another method?
Thanks!
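For reference, the two usual ways to bring unbounded outputs back into [0, 1] for display (common practice; I am not claiming this is the authors' pipeline):

import numpy as np

pred = np.random.randn(10, 1, 64, 64) * 0.1 + 0.5     # stand-in decoder output
clipped = np.clip(pred, 0.0, 1.0)                     # option 1: clip to range
rescaled = (pred - pred.min()) / (pred.ptp() + 1e-8)  # option 2: min-max rescale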
The download link for the KittiCaltech Pedestrian dataset is no longer accessible. Could you please provide a new download link?
I tried the following code and it hangs.
What's the right way to do distributed training on a custom dataset?
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "4"
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
custom_training_config = {
'pre_seq_length': config.seq_len,
'aft_seq_length': config.seq_len,
'total_length': config.seq_len + config.seq_len,
'batch_size': config.batch_size,
'val_batch_size': config.batch_size,
'epoch': config.epochs,
'lr': 0.001,
'metrics': ['mse', 'mae'],
"fp16": True,
"dist": True,
"launcher": "pytorch",
'ex_name': 'custom_exp',
'dataname': 'custom',
'in_shape': [config.seq_len, 1, config.input_shape, config.input_shape],
}
...
args = create_parser().parse_args([])
config = args.__dict__
# update the training config
config.update(custom_training_config)
# update the model config
config.update(custom_model_config)
exp = BaseExperiment(args, dataloaders=(dataloader, dataloader, dataloader))
print('>'*35 + ' training ' + '<'*35)
exp.train()
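A possible cause (my reading, not verified against OpenSTL internals): with 'launcher': 'pytorch', each GPU is expected to run its own process with its own RANK, so hard-coding RANK=0 while WORLD_SIZE=4 leaves the process group waiting forever for the other three ranks. The usual pattern is to let a launcher such as torchrun --nproc_per_node=4 spawn the processes and read the rank from the environment:

import os
import torch.distributed as dist

# Under torchrun, RANK / LOCAL_RANK / WORLD_SIZE are set per process by the
# launcher; init_process_group then synchronizes all workers.
dist.init_process_group(backend="nccl", init_method="env://")
rank = int(os.environ["RANK"])
local_rank = int(os.environ["LOCAL_RANK"])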
Dear Authors,
Thank you for your inspiring work. I have some questions regarding the loss function and model architecture.
MSE loss can produce blurry results. Would it help to replace it with an L1 loss?
It is commonly believed that vanilla 2D convolution does not suffice to capture spatio-temporal correlations, yet your model seems to handle this well. I couldn't find a detailed explanation in your paper. Could you please explain it?
With kind regards.
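For anyone wanting to try the swap described above, in a standard PyTorch training step it is a one-line change (a sketch, not OpenSTL's actual training loop):

import torch
import torch.nn.functional as F

pred = torch.rand(2, 10, 1, 64, 64)    # stand-in prediction and target
target = torch.rand(2, 10, 1, 64, 64)
# L1 penalizes large errors less harshly than MSE and often gives slightly
# sharper frames at the cost of per-pixel accuracy.
loss = F.l1_loss(pred, target)         # instead of F.mse_loss(pred, target)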
Hello.
/*******/OpenSTL/data has the following files:
ls data/moving_mnist/
mnist_cifar_test_seq.npy mnist_cifar_test_seq.npy.tar mnist_test_seq.npy train-images-idx3-ubyte.gz
I get the error message below when I run the mmnist case using the command given in your documentation; 'dataloader.py' does not seem to have a handler for mmnist.
File "//OpenSTL/tools/train.py", line 40, in <module>
exp = BaseExperiment(args)
File "//OpenSTL/openstl/api/train.py", line 48, in __init__
self._preparation()
File "//OpenSTL/openstl/api/train.py", line 129, in _preparation
self._get_data()
File "//OpenSTL/openstl/api/train.py", line 199, in _get_data
get_dataset(self.args.dataname, self.config)
File "/**/OpenSTL/openstl/utils/main_utils.py", line 151, in get_dataset
return load_data(config)
File "//OpenSTL/openstl/datasets/dataloader.py", line 43, in load_data
raise ValueError(f'Dataname {dataname} is unsupported')
ValueError: Dataname mmnist is unsupported
Thank you for releasing the code!
We tried to train the SimVP model with your default settings, and the results we got were worse than expected. This is the command we ran:
python tools/non_dist_train.py -d mmnist -m SimVP --model_type gsta --lr 1e-3 --ex_name mmnist_simvp_gsta --epoch 600
The MSE on the test set is 47.128. Can you check whether anything is wrong with our settings? And is there a better hyperparameter setting we should use to reproduce the results in your paper?
Hi! I appreciate your wonderful project.
I have a question about an nni package installation error.
I encountered a "ResolvePackageNotFound: nni" error when I ran "conda create" in my environment.
I checked this repository's previous pull requests for this problem,
and I noticed that a fix was made in #15.
However, I also noticed that the change was reverted in commit b678838.
Was this change intended?
Thank you in advance.
Hello, I recently tried training PredRNN, MAU, and other methods on the MovingMNIST dataset, but my results differ from those in the paper. For example, the paper reports an MSE of 25.04±0.08 for PredRNN, much lower than the 30.64±0.10 reported for MAU. Yet across several training runs, PredRNN's MSE never dropped below 30, while MAU did better than the reported result, reaching around 27. Where might my training have gone wrong, and how can I obtain results similar to those in your paper?
Here are the commands I used for training:
python tools/non_dist_train.py -d mmnist -c configs/mmnist/PredRNN.py --ex_name mmnist_predrnn
python tools/non_dist_train.py -d mmnist -c configs/mmnist/MAU.py --ex_name mmnist_mau
It is a great honor to learn from your code; the results I obtain when reproducing the Moving MNIST experiments closely match those in your paper. My current work imitates the movement and prediction of mice during predation, using a custom dataset whose shape and data structure follow the Moving MNIST test set (the mnist_test_seq.npy format). My data-loader code is below, together with an example of my dataset. Confusingly, after several parameter adjustments, the training and test losses both decrease from values in the thousands, and the image-quality metrics are enormous (a seven-digit MSE). I have been unable to resolve this and would be very grateful for your help.
import cv2
import gzip
import numpy as np
import os
import random
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from openstl.datasets.utils import create_loader


def load_mnist(root, data_name='mnist'):
    # Load train dataset.
    file_map = {
        'mnist': 'moving_mnist/mnist_train_seq.npy',
    }
    path = os.path.join(root, file_map[data_name])
    mnist = np.load(path)
    mnist = mnist.transpose(1, 0, 2, 3)
    return mnist


def load_fixed_set(root, data_name='mnist'):
    # Load the test dataset
    file_map = {
        'mnist': 'moving_mnist/mnist_test_seq.npy',
    }
    path = os.path.join(root, file_map[data_name])
    dataset = np.load(path)
    dataset = dataset.transpose(1, 0, 2, 3)
    return dataset


class MovingMNIST(Dataset):
    def __init__(self, root, is_train=True, data_name='mnist',
                 n_frames_input=10, n_frames_output=10,
                 transform=None, target_transform=None, use_augment=False):
        super(MovingMNIST, self).__init__()
        self.dataset = None
        self.root = os.path.expanduser(root)
        self.transform = transform
        self.target_transform = target_transform
        self.is_train = is_train
        self.data_name = data_name
        if self.is_train:
            self.mnist = load_mnist(root, data_name)
        else:
            self.dataset = load_fixed_set(root, data_name)
        self.n_frames_input = n_frames_input
        self.n_frames_output = n_frames_output
        self.n_frames_total = self.n_frames_input + self.n_frames_output
        self.use_augment = use_augment
        self.mean = 0
        self.std = 1

    def _augment_seq(self, imgs, crop_scale=0.94):
        """Augmentations for video"""
        _, _, h, w = imgs.shape  # original shape, e.g., [10, 1, 64, 64]
        imgs = F.interpolate(imgs, scale_factor=1 / crop_scale, mode='bilinear')
        _, _, ih, iw = imgs.shape
        # Random Crop
        x = np.random.randint(0, ih - h + 1)
        y = np.random.randint(0, iw - w + 1)
        imgs = imgs[:, :, x:x + h, y:y + w]
        # Random Flip
        if random.randint(-2, 1) > 0:
            imgs = torch.flip(imgs, dims=(2, 3))  # rotation 180
        elif random.randint(-2, 1) > 0:
            imgs = torch.flip(imgs, dims=(2,))  # vertical flip
        elif random.randint(-2, 1) > 0:
            imgs = torch.flip(imgs, dims=(3,))  # horizontal flip
        return imgs

    def __getitem__(self, index):
        # need to iterate over time
        def _transform_time(data):
            new_data = None
            for i in range(data.size(0)):
                img = Image.fromarray(data[i].numpy(), mode='L')
                new_data = self.transform(img) if new_data is None else torch.cat([self.transform(img), new_data], dim=0)
            return new_data

        if self.is_train:
            input, output = self.mnist[index, :self.n_frames_input], self.mnist[index, self.n_frames_input:self.n_frames_total]
        else:
            input, output = self.dataset[index, :self.n_frames_input], self.dataset[index, self.n_frames_input:self.n_frames_total]
        if self.transform is not None:
            input = _transform_time(input)
        if self.target_transform is not None:
            output = _transform_time(output)
        # Reshape the input and output tensors
        input = torch.from_numpy(np.expand_dims(input, axis=1)).float()
        output = torch.from_numpy(np.expand_dims(output, axis=1)).float()
        return input, output

    def __len__(self):
        if self.is_train:
            return len(self.mnist)
        else:
            return len(self.dataset)


def load_data(batch_size, val_batch_size, data_root, num_workers=4, data_name='mnist',
              pre_seq_length=10, aft_seq_length=10, in_shape=[10, 1, 64, 64],
              distributed=False, use_augment=False, use_prefetcher=False, drop_last=True):
    image_size = in_shape[-1] if in_shape is not None else 64
    train_set = MovingMNIST(root=data_root, is_train=True, data_name=data_name,
                            n_frames_input=pre_seq_length,
                            n_frames_output=aft_seq_length, use_augment=False)
    test_set = MovingMNIST(root=data_root, is_train=False, data_name=data_name,
                           n_frames_input=pre_seq_length,
                           n_frames_output=aft_seq_length, use_augment=False)
    dataloader_train = create_loader(train_set,
                                     batch_size=batch_size,
                                     shuffle=True, is_training=True,
                                     pin_memory=True, drop_last=True,
                                     distributed=distributed, use_prefetcher=use_prefetcher)
    dataloader_vali = create_loader(test_set,
                                    batch_size=val_batch_size,
                                    shuffle=False, is_training=False,
                                    pin_memory=True, drop_last=drop_last,
                                    distributed=distributed, use_prefetcher=use_prefetcher)
    dataloader_test = create_loader(test_set,
                                    batch_size=val_batch_size,
                                    shuffle=False, is_training=False,
                                    pin_memory=True, drop_last=drop_last,
                                    distributed=distributed, use_prefetcher=use_prefetcher)
    return dataloader_train, dataloader_vali, dataloader_test


if __name__ == '__main__':
    from openstl.utils import init_dist
    os.environ['LOCAL_RANK'] = str(0)
    os.environ['RANK'] = str(0)
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12357'
    dist_params = dict(launcher='pytorch', backend='nccl', init_method='env://', world_size=1)
    init_dist(**dist_params)
    dataloader_train, _, dataloader_test = \
        load_data(batch_size=4,
                  val_batch_size=4,
                  data_root='../../data/',
                  num_workers=4,
                  data_name='mnist',
                  pre_seq_length=10,
                  aft_seq_length=10,
                  distributed=False,
                  use_prefetcher=False)
    print(len(dataloader_train), len(dataloader_test))
    for item in dataloader_train:
        print(item[0].shape, item[1].shape)
        break
    for item in dataloader_test:
        print(item[0].shape, item[1].shape)
        break
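One plausible explanation for the seven-digit MSE (my guess from the loader above, not a confirmed diagnosis): this loader returns raw uint8 pixel values in [0, 255], while as far as I can tell OpenSTL's own Moving MNIST loader feeds the model values scaled to [0, 1]; squared errors on the 0-255 scale are roughly 255^2 ≈ 65,000 times larger. A one-line change in __getitem__ would test this:

# Scale raw uint8 frames into [0, 1] before returning them (assumption:
# the benchmark metrics expect this convention).
input = torch.from_numpy(np.expand_dims(input, axis=1)).float() / 255.0
output = torch.from_numpy(np.expand_dims(output, axis=1)).float() / 255.0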
Can I use the toolkit to make predictions? I only see interfaces for training, validation, and testing.
When I enter a competition, I don't have labels for the test set; can I still use the trained model to predict the outputs?
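As a workaround, the bare model can be run directly on unlabeled clips (a sketch; the checkpoint path, the SimVP choice, and the exact state-dict layout are assumptions):

import torch
from openstl.models.simvp_model import SimVP_Model

model = SimVP_Model((10, 1, 64, 64))  # (T, C, H, W) must match training
# 'checkpoints/latest.pth' is a placeholder for your trained weights; the
# file may store the state dict directly or nested under a key.
model.load_state_dict(torch.load('checkpoints/latest.pth', map_location='cpu'))
model.eval()

with torch.no_grad():
    frames = torch.rand(1, 10, 1, 64, 64)  # an unlabeled input clip
    pred = model(frames)                   # predicted future frames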
Thank you for doing such a wonderful job. I have a question: does the predicted length of the SimVP model need to be the same as the input length, or can they differ? For example, can we input 12 frames from the past and predict 4 frames in the future?
Hi, first of all thank you for the great work on SimVP and a really nice repository.
I currently want to train a SimVP model on ocean data to forecast ocean transparency. I found that training on a T4 GPU is really slow on my data, with one iteration taking approx. 0.7 seconds, so I decided to train on multiple GPUs. But I hit a problem: every process creates its own copy of the Dataset, and training ends with SIGKILL, caused by running out of RAM while loading a separate Dataset for every GPU.
I have no experience with distributed training, so maybe I messed something up. If you have any ideas on how to solve this, it would be very helpful!
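One common way to avoid per-process dataset copies (a general suggestion, not an OpenSTL feature) is to memory-map the array so every DDP worker shares the single on-disk copy; the class and file names below are hypothetical:

import numpy as np
from torch.utils.data import Dataset

class OceanClips(Dataset):
    def __init__(self, path='ocean.npy', t_in=10, t_out=10):
        # mmap_mode='r' keeps the array on disk; each worker pages in only
        # the clips it touches instead of holding a full copy in RAM.
        self.data = np.load(path, mmap_mode='r')
        self.t_in, self.t_out = t_in, t_out

    def __len__(self):
        return self.data.shape[0] - (self.t_in + self.t_out) + 1

    def __getitem__(self, idx):
        clip = np.asarray(self.data[idx:idx + self.t_in + self.t_out])
        return clip[:self.t_in].copy(), clip[self.t_in:].copy()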
This numpy.array() call is turning out to be a bottleneck. Please see the cProfile output below:
I am using a custom dataset (following the steps in the Colab example) and DDP (8 GPUs).
Any idea why this becomes a bottleneck for training?
Hello researchers. First of all, thank you for sharing this library with the scientific community. I feel it is going to be really valuable for much future research. I adapted the example you provided to one of my datasets with my custom dataloader, and it worked very well, with impressive results from SimVP+gSTA. I found some issues, though:
First, the notebook on git isn't the same as the one you provided on Google Drive. I think the git version might be older.
Second, the normalization procedure in the custom dataset might not be working correctly. I changed it because it had the batch dimension in the reshape, but this is the dataset, where the batch is not yet defined:
mean = self.data.mean(axis=(0, 1, 2, 3)).reshape(1, -1, 1, 1)
std = self.data.std(axis=(0, 1, 2, 3)).reshape(1, -1, 1, 1)
self.data = (self.data - mean) / std
self.mean = mean
self.std = std
Third, in the notebook you mention that the range should be [0, 1]; this normalization doesn't achieve that. I kept it as is and the results were very impressive, though I usually work with [0, 1]. I'll probably test with [0, 1] later, using a min-max scaler instead of the standard deviation.
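For that later test, a drop-in min-max variant of the snippet above could look like this (a sketch reusing the same attribute names; global rather than per-channel scaling is an assumption):

# Global min-max scaling to [0, 1]; keep the offsets so predictions can be
# mapped back to physical units afterwards.
dmin, dmax = self.data.min(), self.data.max()
self.data = (self.data - dmin) / (dmax - dmin)
self.mean, self.std = dmin, dmax - dmin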
Hello!
Thanks for your excellent program. I installed it following the given steps, but when I run the training example you provide, I get the following error:
(OpenSTL) heweibing@ubuntusrv:~/DeepLearning/OpenSTL$ python tools/train.py -d mmnist --lr 1e-3 -c configs/mmnist/simvp/SimVP_gSTA.py --ex_name mmnist_simvp_gsta
Traceback (most recent call last):
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/tools/train.py", line 7, in <module>
from openstl.api import BaseExperiment
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/openstl/api/__init__.py", line 3, in <module>
from .train import BaseExperiment
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/openstl/api/train.py", line 15, in <module>
from openstl.core import Hook, metric, Recorder, get_priority, hook_maps
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/openstl/core/__init__.py", line 7, in <module>
from .optim_scheduler import get_optim_scheduler
File "/home/disk15t/heweibing/DeepLearning/OpenSTL/openstl/core/optim_scheduler.py", line 4, in <module>
from timm.optim.adafactor import Adafactor
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/site-packages/timm/__init__.py", line 2, in <module>
from .models import create_model, list_models, is_model, list_modules, model_entrypoint, \
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/site-packages/timm/models/__init__.py", line 28, in <module>
from .maxxvit import *
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/site-packages/timm/models/maxxvit.py", line 216, in <module>
@dataclass
^^^^^^^^^
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/dataclasses.py", line 1221, in dataclass
return wrap(cls)
^^^^^^^^^
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/dataclasses.py", line 1211, in wrap
return _process_class(cls, init, repr, eq, order, unsafe_hash,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/dataclasses.py", line 959, in _process_class
cls_fields.append(_get_field(cls, name, type, kw_only))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/anaconda3/envs/OpenSTL/lib/python3.11/dataclasses.py", line 816, in _get_field
raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'timm.models.maxxvit.MaxxVitConvCfg'> for field conv_cfg is not allowed: use default_factory
Do you know what causes this error, and how I can fix it?
I would greatly appreciate any assistance you could provide.
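For context (my reading of the trace, not a maintainer's answer): this looks like the known incompatibility between older timm releases and Python 3.11's stricter dataclass checks, where mutable dataclass defaults raise a ValueError. Upgrading timm, or running under Python 3.10, usually resolves it:

pip install -U timm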
Release of OpenSTL V0.3.0, with various new features and released models and logs of spatio-temporal predictive learning (STL) methods; see docs/en/model_zoos. Updates ongoing!
Hi, thank you for your code and your excellent work!
I have hit some problems reproducing SimVP-sSTA*10 from your paper, which should achieve an MSE of 15.0.
Before reproducing SimVP-sSTA*10, I trained SimVP-sSTA (i.e., 200 epochs), which matched the 26.6 MSE reported in your paper, so I believe the code I downloaded and my configuration are correct.
However, when I train SimVP-sSTA for 2,000 epochs (i.e., the SimVP-sSTA*10 model), it reaches a poor MSE of 24.5. I tried to work out the cause of the inconsistent reproduction, but the paper gives no further detail on how SimVP-sSTA is trained for 2,000 epochs.
I hope you can answer this question and provide more training details about SimVP-sSTA*10 so that I can reproduce your result.
Thanks!
Hi all,
First of all, thank you for the amazing work and codebase. I was experimenting with training the model on the Moving MNIST dataset in Google Colab, but had training times of over an hour per epoch. This is strange, considering the "V1" SimVP only took about 2 minutes per epoch. Did anyone have a similar issue? Could the problem be that Google Colab only provides roughly 10 GB of RAM?
Thanks in advance!
Really, thank you for your excellent contribution to STL.
However, I cannot forecast with the SimVP model when the time steps of the input and output differ, so I'd like to ask for a solution.
E.g.,
the input of the model is B * T1 * C1 * H * W,
and I'd like an output of shape B * T2 * C2 * H * W, where T2 is longer than T1.
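A common workaround when T2 > T1 (a sketch of standard practice, not a built-in OpenSTL feature) is to roll the model out autoregressively, feeding its own predictions back in until enough frames accumulate; note this only covers the T2 > T1 case with C2 = C1:

import torch

def rollout(model, frames, t2):
    # frames: (B, T1, C, H, W); assumes the model maps T1 frames to T1 frames.
    preds = []
    total = 0
    cur = frames
    with torch.no_grad():
        while total < t2:
            out = model(cur)   # predict the next T1 frames
            preds.append(out)
            total += out.shape[1]
            cur = out          # feed predictions back in
    return torch.cat(preds, dim=1)[:, :t2]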
Thank you for sharing the code.
I was trying to reproduce the results of the SimVP model on TaxiBJ, and I found I needed a minor change to the code in 'config_utils.py' to start training:
temp_config_file = tempfile.NamedTemporaryFile(dir=temp_config_dir, suffix=fileExtname, delete=False)
I don't know whether this affected my reproduction. I tried to reproduce the TaxiBJ results using code similar to the mmnist example, with TaxiBJ training epochs set to 50:
python tools/non_dist_train.py -d taxibj -c ./configs/taxibj/simvp/SimVP_gSTA.py --ex_name taxibj_simvp_gsta
The result of the reproduction is mse: 0.41158363223075867. Can you check whether anything is wrong with my settings?
In utils/main_utils/update_config, it seems the config is overridden by the default args. So when I change the learning rate in the config, it prints 'overwrite config key -- lr: 0.005 -> 0.001', and the actual learning rate is always 0.001.
What's the purpose of the update_config function?
def update_config(args, config, exclude_keys=list()):
    """update the args dict with a new config"""
    assert isinstance(args, dict) and isinstance(config, dict)
    for k in config.keys():
        if args.get(k, False):
            if args[k] != config[k] and k not in exclude_keys:
                print(f'overwrite config key -- {k}: {config[k]} -> {args[k]}')
            else:
                args[k] = config[k]
        else:
            args[k] = config[k]
    print(args)
    return args
Hey @chengtan9907, are there any pretrained models you could share?
It would help kickstart learning on newer datasets, as well as act as a stand-in set of weights for building pipelines.
Thanks for your valuable work! I notice that you provide experimental results for a lot of models on various datasets. However, the code for some methods (e.g., ConvLSTM) is not provided officially, so I wonder whether you implemented them yourselves. If so, would you mind uploading that code? If not, a link would also be quite helpful. Thank you!
Hello, researchers.
Thank you for sharing the valuable code with us. I understand that most modules could be used on categorical data by adding a softmax to the end of the models and using categorical metrics (accuracy, F-score, etc.).
Is this in your plans for the library?
May I ask how you divided the training and test sets for this dataset? In order to compare results with those in the model zoo, I would like to split the data the same way you did. Could you provide the taxibj/dataset.npz file?
Thanks for the great framework.
I would like to use it to forecast the motion of the liver in 2D MRI slices. Training seems to work, with the loss approaching 0.002 on average on the validation set.
However, the predicted images are really noisy, and I don't understand why.
My code is as follows:
from typing import Any, Optional

import torch
import pytorch_lightning as pl
from pytorch_lightning.utilities.types import STEP_OUTPUT
from openstl.models.simvp_model import SimVP_Model
from torch.nn.functional import mse_loss


class VideoTransformer(pl.LightningModule):
    def __init__(self) -> None:
        super().__init__()
        self.model = SimVP_Model((10, 1, 32, 64))

    def forward(self, x: Any) -> Any:
        y_hat = self.model(x)
        return y_hat

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = mse_loss(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = mse_loss(y_hat, y)
        self.log('val_loss', loss)
        return loss

    def configure_optimizers(self) -> Any:
        return torch.optim.Adam(self.parameters(), lr=1e-4)
Hello, thank you very much for your excellent work!
I followed the steps in the documentation:
bash tools/prepare_data/download_mmnist.sh
python tools/train.py -d mmnist --lr 1e-3 -c configs/mmnist/simvp/SimVP_gSTA.py --ex_name mmnist_simvp_gsta
and found that the train loss in the log is only around 0.02:
2023-07-04 09:20:07,531 - val mse:103.34159088134766, mae:280.3106689453125
2023-07-04 09:20:07,532 - Intermediate result: 103.34159088134766 (Index 7)
2023-07-04 09:20:07,533 - Epoch: 8, Steps: 625 | Lr: 0.0000815 | Train Loss: 0.0216427 | Vali Loss: 0.0252299
2023-07-04 09:22:15,720 - val mse:102.59400177001953, mae:253.41189575195312
2023-07-04 09:22:15,722 - Intermediate result: 102.59400177001953 (Index 8)
2023-07-04 09:22:15,722 - Epoch: 9, Steps: 625 | Lr: 0.0000923 | Train Loss: 0.0209375 | Vali Loss: 0.0250474
2023-07-04 09:24:23,421 - val mse:99.95042419433594, mae:237.94679260253906
2023-07-04 09:24:23,423 - Intermediate result: 99.95042419433594 (Index 9)
2023-07-04 09:24:23,423 - Epoch: 10, Steps: 625 | Lr: 0.0001043 | Train Loss: 0.0200122 | Vali Loss: 0.0244020
This does not match the train loss in mmnist_simvp_s_gsta_one_ep200.log, downloaded from the documentation:
2023-02-16 23:03:11,358 - val mse:88.54080200195312, mae:213.62892150878906
2023-02-16 23:03:11,360 - Epoch: 8, Steps: 625 | Lr: 0.0000815 | Train Loss: 13.5592894 | Vali Loss: 0.0216210
2023-02-16 23:06:08,100 - val mse:85.24305725097656, mae:203.82467651367188
2023-02-16 23:06:08,102 - Epoch: 9, Steps: 625 | Lr: 0.0000923 | Train Loss: 13.0065427 | Vali Loss: 0.0208163
2023-02-16 23:09:04,901 - val mse:85.50586700439453, mae:193.38180541992188
2023-02-16 23:09:04,902 - Epoch: 10, Steps: 625 | Lr: 0.0001043 | Train Loss: 12.7741899 | Vali Loss: 0.0208791
Could you help me understand this difference?
First of all, thank you very much for your work.
I saw your update about the TaxiBJ dataset, which shows the dataset's format in tools\prepare_data\download_taxibj.sh. I downloaded the dataset from https://github.com/chengtan9907/OpenSTL/releases/download/v0.1.0/taxibj_dataset.zip and looked at the format of the .npz file.
My problem: looking at the X_train array in the .npz, its structure is (20461, 4, 2, 32, 32), a 5D tensor. But the original authors' explanation of the dataset describes a 4D tensor, so I would like to ask what the dimensions of this 5D structure represent.
Just thought I'd let people know that I created a simple web/browser demo using ONNX Runtime Web.
Inference takes about 2 seconds for the 10-frame Moving MNIST model using the Wasm backend. It would probably be significantly faster with the WebGL backend, but that lacks quantization op support.
The quantized model comes in at less than 50 MB and seems to match the accuracy of the non-quantized (>150 MB) model.
Here's the code and demo:
My next step is to create a version of the training and conversion notebook that can train on arbitrary videos that the user uploads.
(I'm also wondering whether some sort of diffusion-like process could be used to prevent the increasing blurriness as we predict further into the future. Maybe an "un-diffusion" process could actually just be "embedded" as extra frames between the "actual" frames during training? Or maybe it would need to be a separate model. If anyone has thoughts on this, I'd love to hear them.)
Thanks to the paper authors for publishing and open-sourcing this!
Release of OpenSTL V0.2.0 for various spatio-temporal predictive learning (STL) tasks:
- Renamed to OpenSTL instead of SimVPv2, with module name refactoring.
- Support for the 5.625deg, 2.8125deg, and 1.40625deg settings.
- Updated docs/en documents for the basic usages and new features of V0.2.0.
- Support for --find_unused_parameters in DDP training.
Thanks for the great work! But I am confused by the results reported in your OpenSTL paper; they seem to differ from the results in the original papers. For example, the MSE of SimVP on MovingMNIST is 23.8 in the original paper, but 32.15 in this repo. The same discrepancy appears for other methods, such as PredRNN++. Could you explain this? Are the results reproduced by the OpenSTL framework simply not the same as the original ones, or are there other reasons? Thanks in advance!
Best
Hello,
I was following tutorial.ipynb in the examples/ directory. This example does single-GPU training; I wanted to extend it to multi-GPU (single-node) training. How can I do this?
I have tried the following approach so far, but I am not sure it's correct:
I tried adding the following parameters to custom_training_config:
'use_gpu': True,
'dist': True,
'launcher': 'pytorch',
After this I would need to specify environment variables such as RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT. What value should I specify for RANK? Shouldn't it be set automatically while spawning multiple processes?
Thank you for sharing.
I am very interested in your work. If possible, could you share the pretrained SimVP models on the TaxiBJ dataset with me?
Hello,
I was getting a shape error while using the SimVP model. The relevant part of the stack trace:
File "/home/tarun360/venv/lib/python3.9/site-packages/OpenSTL-0.3.0-py3.9.egg/openstl/models/simvp_model.py", line 43, in forward
hid = self.hid(z)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/OpenSTL-0.3.0-py3.9.egg/openstl/models/simvp_model.py", line 245, in forward
z = self.enc[i](z)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/OpenSTL-0.3.0-py3.9.egg/openstl/models/simvp_model.py", line 208, in forward
z = self.block(x)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/OpenSTL-0.3.0-py3.9.egg/openstl/modules/simvp_modules.py", line 222, in forward
self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) * self.attn(self.norm1(x)))
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
return F.batch_norm(
File "/home/tarun360/venv/lib/python3.9/site-packages/torch/nn/functional.py", line 2450, in batch_norm
return torch.batch_norm(
RuntimeError: running_mean should contain 384 elements not 512
I am using a custom dataset with shape (11, 228, 574).
The model parameters are:
custom_training_config = {
    'pre_seq_length': 10,
    'aft_seq_length': 10,
    'total_length': 10 + 10,
    'batch_size': 8,
    'val_batch_size': 8,
    'epoch': 1,
    'lr': 5e-4,
    'sched': 'onecycle',
    'metrics': ['mse', 'mae'],
    'ex_name': 'custom_exp',
    'dataname': 'custom',
    'in_shape': [8, 11, 228, 574],
    # GPU
    'use_gpu': True,
    # distributed training
    'find_unused_parameters': True,
    'dist': True,
    'launcher': 'pytorch',
    # use float 16
    'fp16': True,
}
custom_model_config = {
    'method': 'SimVP',
    "spatio_kernel_enc": 3,
    "spatio_kernel_dec": 3,
    # Here, we directly set these parameters
    'model_type': 'gSTA',
    'N_S': 4,
    'N_T': 8,
    'hid_S': 64,
    'hid_T': 256
}
Any help on how I can get rid of the shape error? Is the input shape of (11, 228, 574) unsupported, with only certain shapes supported?
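For debugging shape issues like this, it can help to probe the bare model outside the training loop (a sketch; the shapes below just mirror the config above, and the padding suggestion is an assumption):

import torch
from openstl.models.simvp_model import SimVP_Model

# Feed one dummy batch through the model to localize the mismatch.
in_shape = (8, 11, 228, 574)          # (T, C, H, W) from the config above
model = SimVP_Model(in_shape, hid_S=64, hid_T=256, N_S=4, N_T=8,
                    model_type='gSTA')
x = torch.rand(2, *in_shape)          # (B, T, C, H, W)
y = model(x)
print(y.shape)
# If this fails, try H and W padded to multiples of 4 (e.g., 228 x 576),
# since the N_S=4 encoder downsamples the spatial dims twice.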