
lart's Introduction

On the Benefits of 3D Pose and Tracking for Human Action Recognition (CVPR 2023)

Code repository for the paper "On the Benefits of 3D Pose and Tracking for Human Action Recognition".
Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph Feichtenhofer, Jitendra Malik.


This repository provides the code implementation for our paper LART (Lagrangian Action Recognition with Tracking), including installation, training, and evaluation on datasets, as well as demo code that runs on any YouTube video.

Installation

After installing the PyTorch 2.0 dependency, you can install our lart package directly as:

pip install git+https://github.com/brjathu/LART
Step-by-step instructions
conda create -n lart python=3.10
conda activate lart
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install -e .[demo]

If you only want to train the model and are not interested in running the demo on videos, you do not need to install the packages required for the demo code (pip install -e .).


Demo on videos

We have provided a Colab notebook to run our model on any YouTube video. Please see the demo notebook for details. Since Colab has a 15 GB memory limit, the dev branch runs at half precision. If you want to run the demo at full precision, please clone the repo and run the demo locally.

Our action recognition model uses PHALP to track people in videos, and then uses the tracklets to classify actions. pip install -e .[demo] will install the necessary dependencies for running the demo code. Now, run demo.py to reconstruct, track, and recognize humans in any video. The input video source may be a video file, a folder of frames, or a YouTube link:

# Run on video file
python scripts/demo.py video.source="assets/jump.mp4"

# Run on extracted frames
python scripts/demo.py video.source="/path/to/frames_folder/"

# Run on a youtube link (depends on pytube working properly)
python scripts/demo.py video.source=\'"https://www.youtube.com/watch?v=xEH_5T9jMVU"\'

The output directory (./outputs by default) will contain a video rendering of the tracklets and a .pkl file containing the tracklets with 3D pose, shape, and action labels. Please see the PHALP repository for details. The demo model uses MViT as its backend and is a single-person model. The demo code requires about 32 GB of memory to run the SlowFast code. [TODO: Demo with Hiera backend.]
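For reference, a minimal sketch of how one might inspect that output file. This assumes the .pkl is written with joblib (as the PHALP postprocessor code quoted in the issues below suggests); the path and the dictionary layout are illustrative, not guaranteed.

import joblib

# Path is illustrative; the demo writes its outputs under ./outputs by default.
results = joblib.load("outputs/results/demo_jump.pkl")

# The file is expected to be a dictionary keyed by frame, with per-frame tracking
# results (3D pose, shape, action scores); inspect the keys to see what is stored.
for frame_name, frame_data in list(results.items())[:3]:
    print(frame_name, sorted(frame_data.keys()))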

Training and Evaluation

Download the datasets

We are releasing about 1.5 million tracks of people from the Kinetics 400 and AVA datasets. Run the following command to download the data from Dropbox and extract it to the data folder. This will download preprocessed data for the ava_val, ava_train, and kinetics_train splits (preprocessed data for AVA-Kinetics and MultiSports will be released soon). These trajectories contain backbone features as well as ground-truth annotations and pseudo ground-truth annotations. For more details, see DATASET.md.

./scripts/download_data.sh

Please follow the "Demo on videos" section before proceeding with the "Training and Evaluation" section. This is essential to download the required files.

Train LART model

For single-node training, run the following command. This will evaluate the model at every epoch and compute the mean average precision on the validation set.

# # LART full model. 
python lart/train.py -m \
--config-name lart.yaml \
trainer=ddp_unused \
task_name=LART \
trainer.devices=8 \
trainer.num_nodes=1 \

Evaluate LART model

First, download the pretrained model by running the following command. This will download the pretrained model to the ./logs/LART_Hiera/ folder.

./scripts/download_checkpoints.sh

Then run the following command to evaluate the model on the validation set. This will compute the mean average precision on the validation set (AVA-K evaluation will be released soon). For AVA, the provided checkpoint should give 45.1 mAP.

# # LART full model. 
python lart/train.py -m \
--config-name lart.yaml \
trainer=ddp_unused \
task_name=LART_eval \
train=False \
trainer.devices=8 \
trainer.num_nodes=1 \
[email protected] \
configs.weights_path=logs/LART_Hiera/0/checkpoints/epoch_002-EMA.ckpt \

Please specify a different configs.weights_path to evaluate your own trained model. Note, however, that every checkpoint is evaluated during training, and the results are saved to the ./logs/<MODEL_NAME>/0/results/ folder.


Results

The following table shows the performance of our model on the AVA v2.2 dataset.

Model                     Pretrain        mAP
SlowFast R101, 8×8        K400            23.8
SlowFast 16×8 +NL         K600            27.5
X3D-XL                    K600            27.4
MViTv1-B-24, 32×3         K600            28.7
Object Transformer        K600            31.0
ACAR R101, 8×8 +NL        K700            33.3
MViT-L↑312, 40×3          IN-21K+K400     31.6
MaskFeat                  K600            38.8
Video MAE                 K400            39.5
Hiera                     K700            43.3
LART - MViT               K400            42.3 (+2.8)
LART - Hiera              K400            45.1 (+2.5)

Citation

If you find this code useful for your research, or use data generated by our method, please consider citing the following paper:

@inproceedings{rajasegaran2023benefits,
  title={On the Benefits of 3D Pose and Tracking for Human Action Recognition},
  author={Rajasegaran, Jathushan and Pavlakos, Georgios and Kanazawa, Angjoo and Feichtenhofer, Christoph and Malik, Jitendra},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={640--649},
  year={2023}
}

lart's Issues

Questions about encoding

Hi. I have some questions about how you encoded the SMPL parameters.

  • If you already have it as a 229-dim vector, what is the reason for reducing the dimensionality to 128?
  • How were the SMPL predictions from PHALP encoded into a 229-dim vector in the first place in the dataset? I assume it was simple concatenation (see the sketch below), but wanted to know whether any additional processing was done.
  • Why subtract the mean and divide by the standard deviation of the SMPL poses instead of directly using the PHALP predictions? Wouldn't that cause a skew in the image space?
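For what it's worth, a rough sketch of the encoding being asked about, under my own assumptions (not confirmed by the authors): the 229 dimensions are read here as 24 SMPL joint rotations stored as 3×3 matrices (216), 10 shape betas, and a 3D location, and the normalized vector is projected to the 128-dim model width by a learned linear layer.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical decomposition of the 229-dim vector: 24 rotation matrices (216)
# + 10 shape betas + 3D location (3). This is an assumption, not the repo's code.
pose_rotmats = rng.standard_normal((24, 3, 3))
betas = rng.standard_normal(10)
location = rng.standard_normal(3)
smpl_vec = np.concatenate([pose_rotmats.reshape(-1), betas, location])  # shape (229,)

# Normalizing by dataset mean/std whitens the features so that a linear projection
# to the model width (128 here) is better conditioned.
mean, std = np.zeros(229), np.ones(229)   # placeholders for dataset statistics
smpl_norm = (smpl_vec - mean) / (std + 1e-8)

proj = rng.standard_normal((229, 128)) / np.sqrt(229)  # stands in for a learned linear layer
token = smpl_norm @ proj                                # 128-dim input token
print(token.shape)  # (128,)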

Numpy upgrade error for training and evaluating

Hi, I failed to run the training and evaluation scripts with numpy == 1.26.4, getting the following error:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (97148,) + inhomogeneous part. Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

So I changed line 81 of phalp_action_dataset.py from self.track2video = np.array(self.track2video) to self.track2video = np.asarray(self.track2video, dtype=object), which solved it. Might this be caused by the numpy upgrade?
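For context, a minimal reproduction of the behaviour change (a toy example, not the repository's data): NumPy 1.24+ refuses to build ragged object arrays implicitly, so the explicit dtype=object is needed.

import numpy as np

# A ragged (inhomogeneous) list: the sub-lists have different lengths.
track2video = [[0, 1], [2], [3, 4, 5]]

# On NumPy >= 1.24 this raises:
# np.array(track2video)  # ValueError: setting an array element with a sequence...

# Requesting an object array explicitly restores the old behaviour.
track2video = np.asarray(track2video, dtype=object)
print(track2video.shape)  # (3,)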

If you are happy with this, I will open a pull request with the fix.

Many thanks.

Fine-tune on a custom dataset

@brjathu Thank you for sharing this amazing work.
I have a question regarding fine-tuning on a custom dataset. I have run the demo on some videos to get the results_temporal_fast pkl files. I modified them to use my GT labels instead of the 80 labels of the AVA dataset.

I noticed that the demo provides ways to predict both AVA labels and Kinetics labels. I'm not sure whether I should replace the AVA part or the Kinetics part with my GT. The code seems to use pseudo-labels for the AVA label space as well, so the dimension of the pseudo-labels does not match that of my GT labels. Should I get rid of the pseudo-label part, since I have GT annotations for each frame?

My other question is: how can I use the pretrained model that you provide and fine-tune it on my dataset, which has a smaller number of classes?
Thanks
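Not an authoritative answer, but a minimal sketch of the usual pattern for reusing a pretrained checkpoint with a smaller label space: drop the weights whose shapes no longer match (the old 80-way head) and load the rest with strict=False. The module names here (ActionModel, action_head) are hypothetical stand-ins, not LART's actual classes.

import torch
import torch.nn as nn

NUM_MY_CLASSES = 12  # hypothetical size of the custom label space

class ActionModel(nn.Module):
    """Toy stand-in for the action recognition model."""
    def __init__(self, feat_dim=256, num_classes=80):
        super().__init__()
        self.encoder = nn.Linear(229, feat_dim)
        self.action_head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.action_head(self.encoder(x))

pretrained_state = ActionModel(num_classes=80).state_dict()  # stands in for the downloaded checkpoint
model = ActionModel(num_classes=NUM_MY_CLASSES)

# Keep only parameters whose shapes still match; this drops the 80-way head,
# which is then trained from scratch on the custom labels.
target_state = model.state_dict()
filtered = {k: v for k, v in pretrained_state.items()
            if k in target_state and v.shape == target_state[k].shape}
missing, unexpected = model.load_state_dict(filtered, strict=False)
print("re-initialized parameters:", missing)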

How to create a custom dataset?

May I have the code for creating custom training data from a dataset in the same format as the AVA dataset (RGB and labels)? Thank you so much.

An error occurred when running the code on colab

Hello, first of all thanks for your contribution. When I run this code on Colab: !python scripts/demo.py video.source="assets/jump.mp4" +half=True, I get an INFO message about the OpenGL_accelerate module. Then, in the "Visualize the reconstructions" step of the Colab interface, there is no processed video after the code runs; only the original video is output. Please answer, thank you!

License

Hello,

Great project!
I am a bit curious, however: no license seems to be mentioned. Is this intended?

No module named 'sklearn.linear_model'

LART-main$ CUDA_VISIBLE_DEVICES=2 python scripts/demo.py video.source="assets/jump.mp4" +half=True
Traceback (most recent call last):
File "/mnt/745425dc-5d5b-4490-9d38-57169dcb2ab3/wxl/LART-main/scripts/demo.py", line 12, in
from phalp.trackers.PHALP import PHALP
File "/home/user/anaconda3/envs/lart/lib/python3.10/site-packages/phalp/trackers/PHALP.py", line 19, in
from sklearn.linear_model import Ridge
ModuleNotFoundError: No module named 'sklearn.linear_model'

slowfast needs scikit-learn instead of sklearn, so how can I solve this issue?

Cannot reproduce results on ava

Hi authors,
I edited the scripts/demo.py file, changing small_w and small_h from 100 and 200 to 25 and 50:
cfg.phalp.small_w = 25
cfg.phalp.small_h = 50
then ran
python scripts/demo.py video.source="assets/jump.mp4"
My visualization results are the same as the one in the README.
But when I followed the README to evaluate AVA, the mAP is very low.
This is the evaluation of the downloaded checkpoint:

srun -u -p 3dmr-head3d --gres=gpu:8 python lart/train.py -m \
--config-name lart.yaml \
trainer=ddp_unused \
task_name=LART_eval \
train=False \
trainer.devices=8 \
trainer.num_nodes=1 \
[email protected] \
configs.weights_path=logs/LART_Hiera/0/checkpoints/epoch_002-EMA.ckpt \
...
2.0210105724788336
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│           mAP :           │     2.021010637283325     │
│      val/loss/action      │    0.40725961327552795    │
│       val/loss/loca       │            0.0            │
│       val/loss/pose       │            0.0            │
│          val/mAP          │     2.021010637283325     │
└───────────────────────────┴───────────────────────────┘

Full Log:
eval_epoch_002-EMA.log

I also trained a new model with this command:

# # LART full model. 
python lart/train.py -m \
--config-name lart.yaml \
trainer=ddp_unused \
task_name=LART \
trainer.devices=8 \
trainer.num_nodes=1 \

After 29 epochs, the mAP is 14.80.

mAP : 14.804414425674029
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│           mAP :           │    14.804414749145508     │
│      val/loss/action      │    0.3739081025123596     │
│       val/loss/loca       │            0.0            │
│       val/loss/pose       │            0.0            │
│          val/mAP          │    14.804414749145508     │
└───────────────────────────┴───────────────────────────┘

Other information:
TensorBoard: (screenshot)

The evaluation and training data do not seem to be broken:

ls data/ava_val | wc -l
97148
ls data/ava_train | wc -l
300000
ls data/kinetics_train | wc -l
1213908
du -sh data/ava_val.tar.gz 
26G     data/ava_val.tar.gz
du -sh data/*.tar.gz
27G     data/ava_train-aa.tar.gz
27G     data/ava_train-ab.tar.gz
27G     data/ava_train-ac.tar.gz
7.6G    data/ava_train-ad.tar.gz
26G     data/ava_val.tar.gz
16G     data/kinetics_train-aa.tar.gz
16G     data/kinetics_train-ab.tar.gz
16G     data/kinetics_train-ac.tar.gz
16G     data/kinetics_train-ad.tar.gz
16G     data/kinetics_train-ae.tar.gz
16G     data/kinetics_train-af.tar.gz
16G     data/kinetics_train-ag.tar.gz
16G     data/kinetics_train-ah.tar.gz
16G     data/kinetics_train-ai.tar.gz
16G     data/kinetics_train-aj.tar.gz
16G     data/kinetics_train-ak.tar.gz
16G     data/kinetics_train-al.tar.gz
2.2G    data/kinetics_train-am.tar.gz
ls -l data/*.tar.gz 
-rw-r--r-- 1 fansiming fansiming 28210381202 Apr 15 14:59 data/ava_train-aa.tar.gz
-rw-r--r-- 1 fansiming fansiming 28240788377 Apr 15 13:14 data/ava_train-ab.tar.gz
-rw-r--r-- 1 fansiming fansiming 28194066666 Apr 15 14:19 data/ava_train-ac.tar.gz
-rw-r--r-- 1 fansiming fansiming  8139723131 Apr 15 12:11 data/ava_train-ad.tar.gz
-rw-r--r-- 1 fansiming fansiming 27415205124 Apr 15 14:34 data/ava_val.tar.gz
-rw-r--r-- 1 fansiming fansiming 16830571176 Apr 15 14:03 data/kinetics_train-aa.tar.gz
-rw-r--r-- 1 fansiming fansiming 16956700310 Apr 15 12:49 data/kinetics_train-ab.tar.gz
-rw-r--r-- 1 fansiming fansiming 17023606161 Apr 15 12:59 data/kinetics_train-ac.tar.gz
-rw-r--r-- 1 fansiming fansiming 17065002575 Apr 15 13:44 data/kinetics_train-ad.tar.gz
-rw-r--r-- 1 fansiming fansiming 17073904502 Apr 15 12:20 data/kinetics_train-ae.tar.gz
-rw-r--r-- 1 fansiming fansiming 16948081754 Apr 15 14:43 data/kinetics_train-af.tar.gz
-rw-r--r-- 1 fansiming fansiming 17005286776 Apr 15 13:25 data/kinetics_train-ag.tar.gz
-rw-r--r-- 1 fansiming fansiming 17023338499 Apr 15 12:30 data/kinetics_train-ah.tar.gz
-rw-r--r-- 1 fansiming fansiming 16889507647 Apr 15 12:06 data/kinetics_train-ai.tar.gz
-rw-r--r-- 1 fansiming fansiming 17093008481 Apr 15 13:54 data/kinetics_train-aj.tar.gz
-rw-r--r-- 1 fansiming fansiming 16957338722 Apr 15 12:40 data/kinetics_train-ak.tar.gz
-rw-r--r-- 1 fansiming fansiming 16926483947 Apr 15 13:35 data/kinetics_train-al.tar.gz
-rw-r--r-- 1 fansiming fansiming  2348990464 Apr 15 13:16 data/kinetics_train-am.tar.gz

Any suggestion?

KeyError: 'apperance_index'

Hi, thank you for your excellent work.

I followed this issue to make PHALP save features for my dataset's videos. I had to set save_fast_tracks = True in phalp/configs/base.py, as well as save_fast_tracks = true in outputs/.hydra/config.yaml, to make the post_process function execute. However, I'm getting the following error:

Error executing job with overrides: ['video.source=assets/gymnasts_short.mp4', '+half=True']
Traceback (most recent call last):
  File "/home/lecun/LART_dev/scripts/demo.py", line 114, in main
    lart_model.postprocessor.run_lart(pkl_path)
  File "/home/lecun/LART_dev/venv/lib/python3.10/site-packages/phalp/visualize/postprocessor.py", line 102, in run_lart
    final_visuals_dic  = self.post_process(final_visuals_dic, save_fast_tracks=self.cfg.post_process.save_fast_tracks, video_pkl_name=video_pkl_name)
  File "/home/lecun/LART_dev/venv/lib/python3.10/site-packages/phalp/visualize/postprocessor.py", line 42, in post_process
    for idx, appe_idx in enumerate(smoothed_fast_track_['apperance_index']):
KeyError: 'apperance_index'

I'm using the default configs that come with the LART repository. I'm also using the dev branch to run my code in half precision.

Do you have any suggestion for why this is happening?

Thanks

Demo not generating sequences

The demo Colab notebook seems broken. Everything installs fine, but on running the demo:

!python scripts/demo.py video.source="assets/jump.mp4" +half=True
No sequences are generated, hence:

FileNotFoundError: [Errno 2] No such file or directory: 'outputs/results_temporal_videos/'

Difference between smoothed track and non smoothed track

Hi,

In phalp/visualize/postprocessor.py, in the run_lart() function, there is this code:

final_visuals_dic = joblib.load(phalp_pkl_path)

where the function reads the tracks generated by PHALP from videos. By my understanding, the phalp_pkl_path file contains all the features of all tracks in the target video. However, some lines of code later, the post_process() function is called, which in turn calls:

self.phalp_tracker.pose_predictor.smooth_tracks()

where all the tracks are smoothed.

What is the difference between smoothed and non-smoothed tracks? Can I use the features of non-smoothed tracks in my application?

Thanks

configuration problem

Hi, author. I want to know the training-time configuration. Can you tell me how much memory you used and how long training takes with such a configuration?

cannot download full dataset from dropbox

Downloading through Chrome succeeds after two hours, but the file size is 11.3 GB < 15.8 GB.
Downloading through wget, the process always stops after around 1-2 hours, and the file is truncated without any error message being thrown.
Is there another source, like Google Drive?

Sequences are slightly lagged?

The output video shows the original video on the left and, on the right, the reconstructed video with SMPL meshes for the humans. When the system is run on a video with very fast human actions (e.g. hands moving extremely rapidly), there appears to be a perceptible lag between the LHS and the RHS.

Do you know why? Is it known to be constant (in which case it does not matter; one can just align the two), or does it vary with some parameters (e.g. time, complexity of preceding actions)?

Dependency issues on Windows

I'm running into multiple dependency issues on Windows Anaconda. It's not just CUDA (I also installed CUDA 11.7 since it was running into CUDA_HOME issues beforehand). Some are with detectron2 and neural-renderer-pytorch

Can't unpickle the generated pickle files

I generated output from my own video clip, running in a Python 3.10.12 notebook on Google Colab. (sys.version within the notebook agrees with what is produced by !python --version.) I used the given command:

!python scripts/demo.py video.source="../../4dhumans/inputs/round5_clip.mp4"

However, I cannot read any of the pkl files in outputs/results_temporal_fast. I get:

import pickle as pkl

with open("demo_round5_clip_100_38.pkl", "rb") as f:
    z = pkl.load(f)

==>

UnpicklingError                           Traceback (most recent call last)
[<ipython-input-21-802b467ad1f5>](https://localhost:8080/#) in <cell line: 1>()
      1 with open("demo_round5_clip_100_38.pkl", "rb") as f:
----> 2   z = pkl.load(f)

UnpicklingError: invalid load key, '\x0c'.

Even running python -m pickletools <filename> gives an error:

191: \x94         MEMOIZE    (as 22)
  192: \x88         NEWTRUE
  193: \x8c         SHORT_BINUNICODE 'numpy_array_alignment_bytes'
  222: \x94         MEMOIZE    (as 23)
  223: K            BININT1    16
  225: u            SETITEMS   (MARK at 69)
  226: b        BUILD
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.10/pickletools.py", line 2883, in <module>
    dis(args.pickle_file[0], args.output, None,
  File "/usr/lib/python3.10/pickletools.py", line 2448, in dis
    for opcode, arg, pos in genops(pickle):
  File "/usr/lib/python3.10/pickletools.py", line 2285, in _genops
    raise ValueError("at position %s, opcode %r unknown" % (
ValueError: at position 227, opcode b'\x0c' unknown

Any clue?
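One hedged observation: the pickletools dump above mentions numpy_array_alignment_bytes, which is written by joblib's pickler, and the PHALP postprocessor loads these files with joblib (see the run_lart snippet quoted in an earlier issue). So a quick thing to try, assuming the file was indeed saved with joblib:

import joblib

# joblib can read (optionally compressed) files that plain pickle.load rejects
# with "invalid load key"; the filename is the one from the question above.
data = joblib.load("demo_round5_clip_100_38.pkl")
print(type(data))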

Problem with Downloading Dependencies

I am having some trouble downloading dependencies for running the demo. FYI, I am running this on Windows and in a Conda environment.

The first problem is that I get this error when I try running pip install -e .[demo]:

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Users\ahnji\anaconda3\lib\site-packages\colorama\ansitowin32.py", line 59, in closed
    return stream.closed
ValueError: underlying buffer has been detached
ERROR: Failed building wheel for detectron2

The second problem is that if I install detectron2 separately and get past this error, I reach a stage where the OpenGL library causes a problem along the lines of:

raise ImportError("Unable to load OpenGL library", *err.args)
ImportError: ('Unable to load OpenGL library', "Could not find module 'OSMesa' (or one of its dependencies)

Can you help me resolve this issue?

Dear author, I am always unable to install the detectron2 dependency

The error is as follows:
ERROR: Could not build wheels for detectron2, which is required to install pyproject.toml-based projects

I am a beginner in computer vision. I want to try to run the demo by myself, but errors keep occurring during installation of the environment. I guess it is a problem of version incompatibility. If you can take the time to help me solve this problem, I will be very grateful.

Pre trained model not found

Hello author, during model initialization some pretrained models need to be loaded. Where can I download these models?
e.g.:
FileNotFoundError: [Errno 2] No such file or directory: '/home/zck/.cache/phalp/3D/smpl_mean_params.npz'
code:
mean_params = np.load(cfg.MODEL.SMPL_HEAD.SMPL_MEAN_PARAMS) # '/home/zck/.cache/phalp/3D/smpl_mean_params.npz'
init_body_pose = torch.from_numpy(mean_params['pose'].astype(np.float32)).unsqueeze(0)
init_betas = torch.from_numpy(mean_params['shape'].astype('float32')).unsqueeze(0)
init_cam = torch.from_numpy(mean_params['cam'].astype(np.float32)).unsqueeze(0)
self.register_buffer('init_body_pose', init_body_pose)
self.register_buffer('init_betas', init_betas)
self.register_buffer('init_cam', init_cam)

CUDA OUT OF MEMORY

Can a 4090 run the demo?
(lart) lxz@a4061-MS-7E06:~/projects/LART$ python scripts/demo.py video.source="assets/jump.mp4" +half=True
[02/26 14:54:37] INFO No OpenGL_accelerate module loaded: No module named 'OpenGL_accelerate' acceleratesupport.py:17
[2024-02-26 14:54:38,556][pytorch_lightning.utilities.migration.utils][INFO] - Lightning automatically upgraded your loaded checkpoint from v1.8.1 to v2.2.0.post0. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/4DHumans/logs/train/multiruns/hmr2/0/checkpoints/epoch=35-step=1000000.ckpt
WARNING: You are using a SMPL model, with only 10 shape coefficients.
[2024-02-26 14:54:40,603][phalp.trackers.PHALP][INFO] - Loading Predictor model...
[2024-02-26 14:54:40,730][phalp.trackers.PHALP][INFO] - Loading Detection model...
[2024-02-26 14:54:44,187][detectron2.checkpoint.detection_checkpoint][INFO] - [DetectionCheckpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_h/f328730692/model_final_f05665.pkl ...
[2024-02-26 14:54:45,620][detectron2.checkpoint.detection_checkpoint][INFO] - [DetectionCheckpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/model_final_2d9806.pkl ...
[2024-02-26 14:54:45,709][phalp.trackers.PHALP][INFO] - Setting up Visualizer...
[2024-02-26 14:54:45,954][phalp.utils.io][INFO] - Number of frames: 171
[2024-02-26 14:54:46,181][phalp.trackers.PHALP][INFO] - Setting up DeepSort...
[2024-02-26 14:54:46,181][phalp.trackers.PHALP][INFO] - Saving tracks at : outputs//results/jump
Tracking : jump ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% eta : 0:00:00 time elapsed : 0:00:49
[2024-02-26 14:55:36,588][pytorch_lightning.utilities.migration.utils][INFO] - Lightning automatically upgraded your loaded checkpoint from v1.8.1 to v2.2.0.post0. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/4DHumans/logs/train/multiruns/hmr2/0/checkpoints/epoch=35-step=1000000.ckpt
WARNING: You are using a SMPL model, with only 10 shape coefficients.
[2024-02-26 14:55:38,351][main][INFO] - Loading Predictor model...
[2024-02-26 14:55:38,651][phalp.trackers.PHALP][INFO] - Loading Detection model...
[2024-02-26 14:55:41,734][detectron2.checkpoint.detection_checkpoint][INFO] - [DetectionCheckpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_h/f328730692/model_final_f05665.pkl ...
[2024-02-26 14:55:43,150][detectron2.checkpoint.detection_checkpoint][INFO] - [DetectionCheckpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/model_final_2d9806.pkl ...
[2024-02-26 14:55:43,309][phalp.trackers.PHALP][INFO] - Setting up Visualizer...
[2024-02-26 14:55:44,810][slowfast.visualization.predictor][INFO] - Start loading model weights.
[2024-02-26 14:55:44,810][slowfast.utils.checkpoint][INFO] - Loading network weights from /home/lxz/.cache/phalp/ava/mvit.pyth.
missing keys: []
unexpected keys: []
[2024-02-26 14:55:45,121][slowfast.visualization.predictor][INFO] - Finish loading model weights
Error executing job with overrides: ['video.source=assets/jump.mp4', '+half=True']
Traceback (most recent call last):
File "/home/lxz/projects/LART/scripts/demo.py", line 103, in main
lart_model.postprocessor.run_lart(pkl_path)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/phalp/visualize/postprocessor.py", line 102, in run_lart
final_visuals_dic = self.post_process(final_visuals_dic, save_fast_tracks=self.cfg.post_process.save_fast_tracks, video_pkl_name=video_pkl_name)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/phalp/visualize/postprocessor.py", line 36, in post_process
smoothed_fast_track_ = self.phalp_tracker.pose_predictor.smooth_tracks(fast_track_, moving_window=True, step=32, window=32)
File "/home/lxz/projects/LART/lart/utils/wrapper_phalp.py", line 226, in smooth_tracks
fast_track = self.add_slowfast_features(fast_track)
File "/home/lxz/projects/LART/lart/utils/wrapper_phalp.py", line 203, in add_slowfast_features
task_ = SlowFastWrapper(t_, cfg, list_of_all_frames, mid_bbox_, video_model, center_crop=center_crop)
File "/home/lxz/projects/LART/lart/utils/wrapper_pyslowfast.py", line 61, in SlowFastWrapper
task = video_model(task)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/visualization/predictor.py", line 110, in call
preds, feats = self.model(inputs, bboxes)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/video_model_builder.py", line 1239, in forward
x, thw = blk(x, thw)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/fairscale/nn/checkpoint/checkpoint_activations.py", line 171, in _checkpointed_forward
return original_forward(module, *args, **kwargs)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/attention.py", line 547, in forward
x_block, thw_shape_new = self.attn(x_norm, thw_shape)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/attention.py", line 407, in forward
attn = cal_rel_pos_spatial(
File "/home/lxz/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/attention.py", line 112, in cal_rel_pos_spatial
attn[:, :, sp_idx:, sp_idx:].view(B, -1, q_t, q_h, q_w, k_t, k_h, k_w)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.25 GiB. GPU 0 has a total capacty of 23.62 GiB of which 4.13 GiB is free. Including non-PyTorch memory, this process has 18.56 GiB memory in use. Of the allocated memory 17.68 GiB is allocated by PyTorch, and 428.70 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Couple questions (video length, HIERA)

Hi @brjathu .

Great work! I had some questions regarding video input requirements/processing and Hiera:

  • In the paper, it says you take a sequence of 12 frames for prediction (correct me if I am wrong). What is the FPS of the training videos? I am confused about how the video processing works, as I want to see whether it can take in long videos of people performing multiple actions and correlate that over time (~1k+ frames). Does LART have context from previous batches of frames, or does it attend to one batch at a time?
  • I saw in a previous issue someone asking for the Hiera backend. I see it here now: https://github.com/facebookresearch/hiera/tree/main
    Will this be added to this repo soon as well?

Training code needs the "Demo on videos" stuffs

Hello, congratulations on your work!

I cloned the repository and followed the instructions in the "Training and Evaluation" section. However, I encountered a "ModuleNotFoundError: No module named 'lart.ActivityNet'" error, indicating that the 'lart.ActivityNet' file was missing. I resolved this error by following the additional instructions provided in the "Demo on videos" section, which allowed me to download the remaining files. I suggest adding a note to the README advising users to follow the "Demo on videos" section before proceeding with the "Training and Evaluation" section.

Cuda Out of Memory Error when Running the Demo Script

Hi,

Thanks for releasing this great work! When running the demo script on two 4090 GPUs, I get the following error:

File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/attention.py", line 112, in cal_rel_pos_spatialattn[:, :, sp_idx:, sp_idx:].view(B, -1, q_t, q_h, q_w, k_t, k_h, k_w) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.25 GiB (GPU 0; 23.65 GiB total capacity; 17.74 GiB already allocated; 3.99 GiB free; 19.04 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I wonder if you have any suggestion on the possible reasons leading to the error?

Yufei

models

Hi, author. Will you release the trained model?

About AVA datasets

Hello,

First of all, thank you for the interesting research you presented!

Of the benchmarks you used, I'm interested in the AVA dataset.

I downloaded the raw AVA videos, so how can I create image frames that match the annotations you published?

I cannot match the humans with the GT bounding boxes.

What should I do? I will wait for your reply.

Best regards,
Chanwoo
