
episodic-memory's People

Contributors

coolbay, devanshk, ebyrne, frostinassiky, miguelmartin75, satwikkottur, srama2512, tanghaotommy, vincentcartillier



episodic-memory's Issues

Is there any way to just download the NLQ subset?

Hi, thanks for sharing your work. In the download script, the choices for --benchmark are 'EM', 'FHO' and 'AV'. How can I download just the NLQ subset without downloading all the videos of 'EM'? Thanks for your time, have a nice day.

VQ2D evaluation on validation split takes 70+ hours

Hello, I am using the VQ2D evaluation script with the default config params (siam_rcnn_residual+kys, data.split="val", data.num_processes=2) on an A100 GPU, but it takes 70+ hours.

Is this expected?

Some clip eval durations:

====> Data uid: val_0000000074 | search window :     411 frames | clip read time:   0.01 mins | detection time:   0.51 mins | peak signal time:   0.00 mins | tracking time:   5.03 mins
====> Data uid: val_0000000075 | search window :     251 frames | clip read time:   0.02 mins | detection time:   0.31 mins | peak signal time:   0.00 mins | tracking time:   0.63 mins
====> Data uid: val_0000000076 | search window :     522 frames | clip read time:   0.01 mins | detection time:   0.65 mins | peak signal time:   0.00 mins | tracking time:   0.82 mins
====> Data uid: val_0000000065 | search window :    1230 frames | clip read time:   0.01 mins | detection time:   1.64 mins | peak signal time:   0.00 mins | tracking time:  16.46 mins
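
For a rough sanity check, here is a back-of-the-envelope estimate (a sketch only; the clip count below is a hypothetical placeholder, not the actual size of the val split) showing how the per-clip tracking time dominates the total runtime:

# Back-of-the-envelope runtime estimate (sketch).
# NOTE: num_clips is a hypothetical placeholder; substitute the real number of val clips.
num_clips = 1000          # hypothetical
avg_minutes_per_clip = 5  # rough average of the per-clip durations logged above
num_processes = 2         # data.num_processes from the config
total_hours = num_clips * avg_minutes_per_clip / (60 * num_processes)
print(f"~{total_hours:.0f} hours")  # ~42 hours under these assumptions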

MQ empty annotation

Hello,

I am facing the following issue while evaluating the predictions:


  1. Retrieval evaluation starts!

a. Generate retrieval!
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/rdai/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/home/rdai/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/rdai/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 593, in __call__
    return self.func(*args, **kwargs)
  File "/home/rdai/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 253, in __call__
    for func, args, kwargs in self.items]
  File "/home/rdai/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 253, in <listcomp>
    for func, args, kwargs in self.items]
  File "/data/stars/user/rdai/PhD_work/Ego4d/code/episodic-memory/MQ/Evaluation/ego4d/generate_retrieval.py", line 88, in _gen_retrieval_video
    df = rm_other_category(df, test_anno['annotations'], classes)
  File "/data/stars/user/rdai/PhD_work/Ego4d/code/episodic-memory/MQ/Evaluation/ego4d/generate_retrieval.py", line 73, in rm_other_category
    df_v = pd.concat(df_v)
  File "/home/rdai/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 225, in concat
    copy=copy, sort=sort)
  File "/home/rdai/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 259, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate
"""

Then I checked the clip_annotations.json generated by Convert_annotations.py, and its 'annotations' field is empty.
I am predicting with the SlowFast features from Ego4D and the baseline code in this repo.
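
For reference, this ValueError is exactly what pandas raises when pd.concat is given an empty list, which is what happens in rm_other_category once a clip has no annotations of the evaluated categories. A minimal reproduction (sketch):

import pandas as pd

# pd.concat on an empty list reproduces the error in the traceback above.
try:
    pd.concat([])
except ValueError as e:
    print(e)  # "No objects to concatenate"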

Video to image conversion (no valid frames for specific samples)

For the VQ2D task, during the conversion from videos to images with the command below:

python convert_videos_to_images.py \
  --annot-paths data/vq_train.json data/vq_val.json \
  --save-root data/images \
  --ego4d-videos-root $EGO4D_VIDEOS_DIR \
  --num-workers 10 # Increase this for speed

I get the error "No valid frames to read for" for the samples below:
4f11d6b8-1f78-4d99-9b34-7b89bc007b25
28ff9118-e876-497c-9901-cb89a62509fc
f4faa012-2e68-4f51-b054-27deaff4204c
e93fdf27-c14c-4794-b4b7-20685b40156f
b1faab00-8666-40e4-8038-c9060e83e6cd
a8175d36-481a-4c4d-9928-f1d31cb2254b
bf7c866e-4623-420c-af44-b9ff5c2fa66c
01111831-9107-43c4-bf0e-6b26e9b32a2b
0a3097fc-baed-4d11-a4c9-30f07eb91af6
609d9772-daa2-45e9-b07a-1ffdadb942b4
111ee12a-6af2-4bfc-8586-e706568fd078
339442c2-7d71-4c1e-821c-b955bdd31f44
0d2231f9-d67a-4df0-b36f-40cbf374ed03
60f4b15e-b23e-4c7f-9cb3-f51ac164bb23
407216c9-b0f1-4fe1-af09-9405a7657953
8f693e8a-5ea4-4b0f-93fd-787fe5f481df
d4c28488-a494-405d-bc8d-27a074aa9c5c

Is this related to my download problem, or has anyone else encountered the same error?

Thanks in advance

VQ3D: vq2d siam_rcnn_residual_kys_val.json file missing?

Hi,

I am trying to run VQ3D and following the steps mentioned in the VQ3D readme file.
It seems VQ3D requires the VQ2D results, as I see the following argument used a few times:
--vq2d_results data/vq2d_results/siam_rcnn_residual_kys_val.json

Do I have to run the VQ2D project to generate the above file, or can you share it?

Thanks

vq_test_unannotated.json seems to be missing

python convert_videos_to_clips.py \
  --annot-paths data/vq_val.json data/vq_test_unannotated.json \
  --save-root data/clips \
  --ego4d-videos-root $EGO4D_VIDEOS_DIR \
  --num-workers 10 # Increase this for speed

The command above requires this JSON file, which seems to be missing in my case.

VQ3D PnP good poses much lower than reported

Hi,

I'm following the data preparation steps for VQ3D to generate camera poses. I fixed a few issues along the way, but the proportion of good poses I get is much lower than the 2% reported in the paper. I get around 11-25 good poses out of 9000 frames, and sometimes 0 good poses. I have the following questions:

  1. I'm assuming the low number of good poses is due to errors in previous steps (number of matches, or even the intrinsic parameters). Is there any way I can check my results before the PnP step?

  2. Since the paper reports results with predicted poses, is it possible to release the computed poses from the paper instead of us rerunning the code / steps ourselves?

EgoTracks files missing issue

I think the latest commit to EgoTracks is missing some files - for example, lt_tracking_metrics.py and others.
Because of this, the training script returns an error.
Maybe the latest commit is not the latest version posted? Or maybe there was confusion over multiple versions?

If this is not the latest version, could you commit the latest one? Or, if there was confusion over multiple versions, could you point out which version the training script runs with?

Thanks!

Leaderboard MQ_baseline differences

Hello, I'm confused by the published MQ_baseline results. There are two MQ_baseline entries on the leaderboard: the first has Recall 44.24 and average_mAP 23.99, but the other has 24.25 and 5.68.

As I understand it, MQ_baseline is the official published baseline. Why is it listed twice?

Also, I cloned the official GitHub repo and trained the official model using SlowFast features, getting results similar to 24.25 and 5.68 (the worse MQ_baseline). So which setup produces the "better" MQ_baseline? Is it the same model and the same repo? I'd like to reproduce it correctly.

https://eval.ai/web/challenges/challenge-page/1626/leaderboard/3913

Thanks!

VQ 3D and 2D annotation consistency?

Hi,

I have a question regarding the consistency between the vq2d annotations and the 3d annotations. I tried pulling out one 3d annotation (using the center) and projecting it down to frames in the GT response track. Specifically, I followed the code here to first convert the 3d centroid to the frame coordinate system, and inverted the operation here to project to 2d space. Comparing the projected 2d centroid with the GT bounding box, I found they are not even close on the 2D image plane.
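
For clarity, the projection I am applying is the standard pinhole model. A sketch (the pose and intrinsics conventions here are my own assumptions and may differ from the VQ3D code, e.g. camera-to-world vs. world-to-camera):

import numpy as np

# Sketch: project a 3D world point into the image, assuming T_wc is a 4x4
# world-to-camera pose and K is the 3x3 pinhole intrinsics matrix.
def project_point(p_world, T_wc, K):
    p_h = np.append(p_world, 1.0)      # homogeneous 3D point
    p_cam = (T_wc @ p_h)[:3]           # point in camera coordinates
    u, v, w = K @ p_cam                # pinhole projection
    return np.array([u / w, v / w])    # pixel coordinates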

I have checked the accuracy of the poses by rendering images with them, and they visually look fine (btw, I get blank RGB images with the visualization code, but I can render depth maps). I'm using the GT 2d results, so the only possible error source I can identify here is the camera pose. How should I interpret this result, and have I done anything wrong here? Should I assume the camera pose is simply not accurate enough?

To replicate:
clip_uid: "1eb995af-0fdd-4f0f-a5ba-089d4b8cb445"
I end up with 3 valid queries in the vq3d val set. I have tried them all and none of them is close.

This is one of the examples I have; the projected 2d centroid is not even inside the frame.
image

DOWNLOADING THE DATASET

Hello,

Thank you for the wonderful work and for making the dataset publicly available. I have been trying to download the dataset for a month but have been unable to. I signed the license agreement, and it says on the website that I will receive the credentials after ~48 hours, but I haven't received anything yet. I have signed the agreement thrice but have yet to be successful.
Can you please help me? I may be missing something.

Thanks

MQ missing annotation files for the baseline

Hi,

For the MQ baseline, there are files like clip_annotations.json and moment_classes_idx.json (from this code) that are missing from the repo (and I could not find them in the official dataset). Could you provide them or a way to generate them? Thanks!

Best,
Junwei

MQ baseline inference warnings

Hi,
I followed the latest commands to generate clip_annotations.json and to train. During inference on the validation set, I encountered quite a few of these warnings:

Infer.py:145: RuntimeWarning: invalid value encountered in true_divide
  ovr = inter / (lengths[i] + lengths[order[1:]] - inter)

It seems the NMS failed on some samples. Is this the expected behavior? Inference did finish without errors, but the evaluation on validation is quite bad: only 0.05 Average-mAP (vs. 0.24 in the paper?).
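
For context, one way this warning can arise (a minimal numpy sketch; I am assuming some segments end up with zero temporal length, which the quoted line does not guard against) is a 0/0 in the IoU denominator:

import numpy as np

# Minimal reproduction (sketch): if the kept segment and a candidate both have
# zero length, the IoU denominator is 0 and 0/0 yields NaN, emitting the
# "invalid value encountered" RuntimeWarning seen above.
starts = np.array([7.0, 7.0])
ends = np.array([7.0, 7.0])          # zero-length segments (assumed, for illustration)
lengths = ends - starts
order = np.array([0, 1])
i = order[0]
inter = np.maximum(
    0.0,
    np.minimum(ends[i], ends[order[1:]]) - np.maximum(starts[i], starts[order[1:]]),
)
ovr = inter / (lengths[i] + lengths[order[1:]] - inter)   # -> [nan], with the warning
print(ovr)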

My inference command:

$ python Infer.py --use_xGPN  --is_train false --dataset ego4d --feature_path /youtu_pedestrian_detection/junweil/ego4d/ego4d_data/v1/slowfast8x8_r101_k400/  --checkpoint_path ../baseline_slowfast_vsgn_yesxgpn  --output_path ../baseline_slowfast_vsgn_yesxgpn_output --clip_anno ../../../ego4d_data/v1/annotations/moments_official_clip_annotations.json --moment_classes moment_classes_idx.json --batch_size 2 --infer_datasplit validation

Evaluation command:

$ python Eval.py --dataset ego4d --output_path ../baseline_slowfast_vsgn_yesxgpn_output --out_prop_map true --eval_stage all  --clip_anno ../../../ego4d_data/v1/annotations/moments_official_clip_annotations.json --moment_classes moment_classes_idx.json --infer_datasplit validation

Evaluation outputs:

{'dataset': 'ego4d', 'is_train': 'true', 'out_prop_map': 'true', 'feature_path': '/mnt/sdb1/Datasets/Ego4d/action_feature_canonical', 'clip_anno': '../../../ego4d_data/v1/annotations/moments_official_clip_annotations.json', 'moment_classes': 'moment_classes_idx.json', 'checkpoint_path': 'checkpoint', 'output_path': '../baseline_slowfast_vsgn_yesxgpn_output', 'prop_path': 'proposals', 'prop_result_file': 'proposals_postNMS.json', 'detect_result_file': 'detections_postNMS.json', 'retrieval_result_file': 'retrieval_postNMS.json', 'detad_sensitivity_file': 'detad_sensitivity', 'batch_size': 32, 'train_lr': 5e-05, 'weight_decay': 0.0001, 'num_epoch': 30, 'step_size': 15, 'step_gamma': 0.1, 'focal_alpha': 0.01, 'nms_alpha_detect': 0.46, 'nms_alpha_prop': 0.75, 'nms_thr': 0.4, 'temporal_scale': 928, 'input_feat_dim': 2304, 'bb_hidden_dim': 256, 'decoder_num_classes': 111, 'num_levels': 5, 'num_head_layers': 4, 'nfeat_mode': 'feat_ctr', 'num_neigh': 12, 'edge_weight': 'false', 'agg_type': 'max', 'gcn_insert': 'par', 'iou_thr': [0.5, 0.5, 0.7], 'anchor_scale': [1, 10], 'base_stride': 1, 'stitch_gap': 30, 'short_ratio': 0.4, 'clip_win_size': 0.38, 'use_xGPN': False, 'use_VSS': False, 'num_props': 200, 'tIoU_thr': [0.1, 0.2, 0.3, 0.4, 0.5], 'eval_stage': 'all', 'infer_datasplit': 'validation'}
---------------------------------------------------------------------------------------------
2. Detection evaluation starts!
---------------------------------------------------------------------------------------------
a. Generate detections!
/youtu_pedestrian_detection/junweil/ego4d/mq_task/official_code/MQ/Evaluation/ego4d/generate_detection.py:14: RuntimeWarning: invalid value encountered in double_scalars
  return float(Aand) / Aor
/youtu_pedestrian_detection/junweil/ego4d/mq_task/official_code/MQ/Evaluation/ego4d/generate_detection.py:14: RuntimeWarning: invalid value encountered in double_scalars
  return float(Aand) / Aor
...
b. Evaluate the detection results!
{'take_photo_/_record_video_with_a_camera': 0, 'hang_clothes_in_closet_/_on_hangers': 1, 'browse_through_clothing_items_on_rack_/_shelf_/_hanger': 2, 'withdraw_money_from_atm_/_operate_atm': 3, 'stir_/_mix_ingredients_in_a_bowl_or_pan_(before_cooking)': 4, 'wash_hands': 5, 'clean_/_wipe_other_surface_or_object': 6, 'put_away_(or_take_out)_ingredients_in_storage': 7, 'throw_away_trash_/_put_trash_in_trash_can': 8, 'turn-on_/_light_the_stove_burner': 9, 'arrange_/_organize_items_in_fridge': 10, 'converse_/_interact_with_someone': 11, 'climb_up_/_down_a_ladder': 12, 'plaster_wall_/_surface': 13, 'paint_using_paint_brush_/_roller': 14, 'use_a_vacuum_cleaner_to_clean': 15, 'use_phone': 16, 'watch_television': 17, 'dismantle_other_item': 18, 'drill_into_wall_/_wood_/_floor_/_metal': 19, 'fix_other_item': 20, 'stir_/_mix_food_while_cooking': 21, 'knead_/_shape_/_roll-out_dough': 22, 'clean_/_wipe_kitchen_appliance': 23, 'arrange_/_organize_other_items': 24, 'cut_dough': 25, 'fix_wiring': 26, 'cut_other_item_using_tool': 27, 'read_a_book_/_magazine_/_shopping_list_etc.': 28, 'clean_/_wipe_a_table_or_kitchen_counter': 29, 'walk_down_stairs_/_walk_up_stairs': 30, 'place_items_in_shopping_cart': 31, 'browse_through_groceries_or_food_items_on_rack_/_shelf': 32, '"clean_/_repair_small_equipment_(mower,_trimmer_etc.)"': 33, 'use_hammer_/_nail-gun_to_fix_nail': 34, 'measure_wooden_item_using_tape_/_ruler': 35, 'mark_item_with_pencil_/_pen_/_marker': 36, 'use_a_laptop_/_computer': 37, 'fry_other_food_item': 38, 'put_away_(or_take_out)_food_items_in_the_fridge': 39, 'count_money_before_paying': 40, 'pack_food_items_/_groceries_into_bags_/_boxes': 41, 'pay_at_billing_counter': 42, 'browse_through_accessories_on_rack_/_shelf': 43, 'cut_thread_/_paper_/_cardboard_using_scissors_/_knife_/_cutter': 44, 'dig_or_till_the_soil_with_a_hoe_or_other_tool': 45, '"level_ground_/_soil_(eg._using_rake,_shovel,_etc)"': 46, 'pack_soil_into_the_ground_or_a_pot_/_container': 47, 'plant_seeds_/_plants_/_flowers_into_ground': 48, 'enter_a_supermarket_/_shop': 49, 'browse_through_other_items_on_rack_/_shelf': 50, 'exit_a_supermarket_/_shop': 51, '"put_on_safety_equipment_(e.g._gloves,_helmet,_safety_goggles)"': 52, 'stand_in_the_queue_/_line_at_a_shop_/_supermarket': 53, 'weigh_food_/_ingredient_using_a_weighing_scale': 54, 'arrange_/_organize_clothes_in_closet/dresser': 55, 'fold_clothes_/_sheets': 56, 'fry_dough': 57, 'remove_food_from_the_oven': 58, 'water_soil_/_plants_/_crops': 59, 'play_board_game_or_card_game': 60, 'clean_/_sweep_floor_with_broom': 61, 'eat_a_snack': 62, 'make_coffee_or_tea_/_use_a_coffee_machine': 63, 'fill_a_pot_/_bottle_/_container_with_water': 64, 'drink_beverage': 65, 'cut_open_a_package_(e.g._with_scissors)': 66, 'serve_food_onto_a_plate': 67, 'wash_dishes_/_utensils_/_bakeware_etc.': 68, 'prepare_or_apply_cement_/_concrete_/_mortar': 69, 'move_/_shift_around_construction_material': 70, 'rinse_/_drain_other_food_item_in_sieve_/_colander': 71, 'clean_/_wipe_/_oil_metallic_item': 72, 'pack_other_items_into_bags_/_boxes': 73, 'peel_a_fruit_or_vegetable': 74, '"cut_/_chop_/_slice_a_vegetable,_fruit,_or_meat"': 75, 'cut_/_trim_grass_with_a_lawnmower': 76, '"try-out_/_wear_clothing_items_(e.g._shirt,_jeans,_sweater)"': 77, 'move_/_shift_/_arrange_small_tools': 78, 'play_a_video_game': 79, 'do_some_exercise': 80, 'put_away_(or_take_out)_dishes_/_utensils_in_storage': 81, 'chop_/_cut_wood_pieces_using_tool': 82, 'look_at_clothes_in_the_mirror': 83, 'fix_/_remove_/_replace_a_tire_or_wheel': 84, 
'remove_weeds_from_ground': 85, 'harvest_vegetables_/_fruits_/_crops_from_plants_on_the_ground': 86, 'fix_pipe_/_plumbing': 87, 'smooth_wood_using_sandpaper_/_sander_/_tool': 88, 'load_/_unload_a_washing_machine_or_dryer': 89, 'cut_tree_branch': 90, 'collect_/_rake_dry_leaves_on_ground': 91, 'cut_/_trim_grass_with_other_tools': 92, 'smoke_cigar_/_cigarette_/_vape': 93, 'iron_clothes_or_sheets': 94, 'wash_vegetable_/_fruit_/_food_item': 95, 'taste_food_while_cooking': 96, 'compare_two_clothing_items': 97, 'dig_or_till_the_soil_by_hand': 98, 'drive_a_vehicle': 99, 'trim_hedges_or_branches': 100, 'interact_or_play_with_pet_/_animal': 101, '"try-out_/_wear_accessories_(e.g._tie,_belt,_scarf)"': 102, 'tie_up_branches_/_plants_with_string': 103, 'arrange_pillows_on_couch_/_chair': 104, '"make_the_bed_/_arrange_pillows,_sheets_etc._on_bed"': 105, 'put_food_into_the_oven_to_bake': 106, 'fix_bonnet_/_engine_of_car': 107, 'write_notes_in_a_paper_/_book': 108, 'hang_clothes_to_dry': 109}
[INIT] Loaded annotations from validation subset.
        Number of ground truth instances: 4296
        Number of predictions: 104200
        Fixed threshold for tiou score: [0.1, 0.2, 0.3, 0.4, 0.5]
Warning: No predictions of label 'remove_food_from_the_oven' were provided.
Warning: No predictions of label 'put_food_into_the_oven_to_bake' were provided.
[RESULTS] Performance on Ego4D detection task.
Average-mAP: 0.05360927271911966
mAPs are [0.07966768 0.06482197 0.05193738 0.04003949 0.03157985]
mAP at tIoU 0.1 is 0.07966767691516692
mAP at tIoU 0.2 is 0.06482196566549295
mAP at tIoU 0.3 is 0.05193738098020502
mAP at tIoU 0.4 is 0.04003949305069336
mAP at tIoU 0.5 is 0.03157984698404002
Detection evaluation finishes!

---------------------------------------------------------------------------------------------
3. Retrieval evaluation starts!
---------------------------------------------------------------------------------------------
a. Generate retrieval!
b. Evaluate the retrieval results!
[INIT] Loaded annotations from validation subset.
        Number of ground truth instances: 521
        Number of predictions: 521
        Fixed threshold for tiou score: [0.1, 0.2, 0.3, 0.4, 0.5]
Rank 1x @ tIoU 0.3 is 0.32146182495344505
Rank 2x @ tIoU 0.3 is 0.449487895716946
Rank 3x @ tIoU 0.3 is 0.5139664804469274
Rank 4x @ tIoU 0.3 is 0.5616852886405959
Rank 5x @ tIoU 0.3 is 0.5926443202979516
Rank 1x @ tIoU 0.5 is 0.239292364990689
Rank 2x @ tIoU 0.5 is 0.3466014897579143
Rank 3x @ tIoU 0.5 is 0.4029329608938548
Rank 4x @ tIoU 0.5 is 0.44972067039106145
Rank 5x @ tIoU 0.5 is 0.4788175046554935
Rank 1x @ tIoU 0.7 is 0.15409683426443202
Rank 2x @ tIoU 0.7 is 0.2164804469273743
Rank 3x @ tIoU 0.7 is 0.24581005586592178
Rank 4x @ tIoU 0.7 is 0.2723463687150838
Rank 5x @ tIoU 0.7 is 0.2886405959031657
[[0.32146182 0.4494879  0.51396648 0.56168529 0.59264432]
 [0.23929236 0.34660149 0.40293296 0.44972067 0.4788175 ]
 [0.15409683 0.21648045 0.24581006 0.27234637 0.2886406 ]]
Detection evaluation finishes!

Thanks,
Junwei

Metrics for EgoTracks

Given the predicted bboxes, how do we go about generating the metrics used for benchmarking? I can't seem to find the implementation in this repo. Is it going to be released soon? Thanks!

VQ3D: reproducing the baseline - extracted videos not found in annotations

Hi,

I am trying to reproduce the VQ3D results following the instructions. I had a problem running the step "Get intrinsics for each clip": the extracted video frames in "./video_sfm" are not found in the annotations "/data/v1/3d/train_vq3d.json" and "/data/v1/3d/val_vq3d.json".

I counted the videos: there are 91 in "./video_sfm", whereas the total number of videos in "train_vq3d.json" and "val_vq3d.json" is 69.

Errors:
~/episodic-memory/VQ3D/camera_pose_estimation$ python get_median_intrinsics.py --input_dir data/v1/videos_sfm/ --input_dir_greedy data/v1/videos_sfm_greedy/ --annotation_dir data/v1/3d/ --output_filename data/v1/scan_to_intrinsics.json

Traceback (most recent call last):
File "get_median_intrinsics.py", line 59, in
scan_uid=dataset[video_uid]
KeyError: '047c5e54-444f-463e-b677-be38e706a9ea'

I am not sure whether I did something wrong during frame extraction.
Thanks.

Where to find vq3d_<split>.json?

Hi,

Thanks for releasing the dataset! I'm following the steps for processing the VQ3D subset. However, I am having trouble finding the vq3d_<split>.json file; it's not in the annotations folder I downloaded. Could you help me find this file?

MQ - duplicate annotations issue

Hello :) Thank you for maintaining this repository.

I would like to ask about an issue caused by duplicate annotations. Since multiple annotators label the narrations, there are multiple GT action instances corresponding to one real action instance. As shown in the image below, there are three GT action instances for one actual action instance. Some annotations overlap by 99%, while others overlap only slightly, due to the ambiguous temporal boundaries of action instances.

I checked that the official evaluation code in this repo does not handle such duplicate cases; this results in underestimating the performance of precise models. May I ask whether there is any progress on improving the evaluation protocol?

image

A significant number of NLQ queries have a response window of 0 seconds.

Thanks for this wonderful work!

A significant number of NLQ queries have a response window of 0 seconds according to the provided annotations. In particular, 1304 (8.59%) of the windows in the train and validation sets have a 0s duration. According to the standard evaluation method (https://github.com/EGO4D/episodic-memory/blob/main/NLQ/VSLNet/utils/evaluate_ego4d_nlq.py), these windows will always produce an IoU of 0 irrespective of the predictions. Is this a problem with the dataset?
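
To make the reasoning concrete, here is a sketch of the standard 1D temporal IoU (not necessarily the exact code in evaluate_ego4d_nlq.py): with a zero-length ground-truth window the intersection has measure zero, so the IoU is 0 no matter what is predicted.

def temporal_iou(pred_start, pred_end, gt_start, gt_end):
    # Standard 1D IoU between a predicted and a ground-truth window (sketch).
    inter = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou(10.0, 12.0, 11.0, 11.0))  # 0.0, even though the prediction covers the instant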

Moreover, a large number of windows have very small durations. In particular, 2641 (17.41%) of the windows in the train and validation sets have a duration of less than 1 second.

I was wondering whether these are annotation errors. How should we handle these cases?

NLQ task 2D-TAN yaml file missing

Hi,

Thank you for this wonderful work. For the NLQ baseline 2D-TAN, I can't find the configuration file experiments/ego4d/2D-TAN-40x40-K9L4-pool-window-std-sf.yaml. Is it the same as one of the YAML files for another dataset?

EgoTracks dataset download not working even with proper credentials

Hello. I have received credentials and have successfully downloaded the Ego4D viz dataset and annotations with the following command:
ego4d --output_directory="~/scratch/data/tracking/ego4d" --datasets viz annotations
I can also confirm that the metadata CSV loads properly, even for the full_scale dataset.
However, I am not able to download the egotracks dataset.
ego4d --output_directory="~/scratch/data/tracking/ego4d" --datasets egotracks

This gives the error:
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

Is there a reason why this might be going wrong?

VQ2D evaluation: No matching checkpoint file found

Running evaluate_vq2d.py yields the following error:
Traceback (most recent call last):
  File "evaluate_vq2d.py", line 309, in main
    metrics, predictions = evaluate_vq_parallel(annotations, cfg)
  File "evaluate_vq2d.py", line 268, in evaluate_vq_parallel
    list_of_outputs = list(tqdm.tqdm(pool.imap(_mp_aux_fn, mp_inputs), total=N))
  File "/home/astar/.local/lib/python3.8/site-packages/tqdm/std.py", line 1196, in __iter__
    for obj in iterable:
  File "/home/astar/anaconda3/envs/ego4d_vq2d/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
Exception: No matching checkpoint file found
The model file and data files, including the JSON files, are all here. What is this checkpoint file I am missing?

Ambiguity in NLQ annotations

Great project! I was looking through the annotations for the NLQ task and noticed that there might be multiple instances in the video that answer the given query. In the paper, it seems that queries are chosen such that the answers are unambiguous.

image

An example of such ambiguity is in video id: 3534864b-2289-4aaf-b3ed-10eeeee7acd2 and query: "Where did I put the scooper". The ground truth is given to be around 1675s.

image
However, the scooper is seen to be placed onto the tabletop and subsequently on the weighing scale at around timestamp 1785s.

image

These seem to be appropriate responses to the query that differ from the ground truth. They also seem to fall within the time interval of the clip.

MQ problem formulation

Hi,

I am trying to better understand the exact problem formulation for the Moments Queries task.

The paper (page 26 of the arxiv version) states:

Given an egocentric video V, and a query action category c, the goal is to retrieve all the instances of this action category in the video, assuming that the query is made at the end of the video.

At the same time, the challenge page states the following:

Input: a video clip
Output: a set of predicted action instances, each with the following fields
Action category: the query, representing what type of action is happening in the video segment (e.g., cooking)
Action temporal boundaries: the timestamp when the action starts and the timestamp when the action finishes (e.g., 35'04'' - 43'35'')
Confidence score: the probability that this predicted action is correct (e.g., 0.84)
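
To make the listed output fields concrete, here is a hypothetical example of one predicted action instance (the field names are my own illustration, not the official submission schema):

# Hypothetical illustration of a single predicted action instance.
prediction = {
    "label": "cooking",            # action category (the query)
    "segment": [2104.0, 2615.0],   # start and end time of the instance, in seconds
    "score": 0.84,                 # confidence that the prediction is correct
}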

It is not clear whether the queries should be considered as input, given after having observed the video, or whether the problem should be treated as standard temporal action detection.
In fact, the queries seem to be missing from the test data.

Could someone clarify this issue?

Thanks,
Federico

Where is 'nlq_test_unannotated.json'?

Hi,

I have downloaded the 'annotations' dataset, where I can find 'nlq_train.json' and 'nlq_val.json'. But I don't see any 'nlq_test_unannotated.json'. I was wondering how I can create the dataset for the test split and evaluate according to this link.

Replicating the NLQ code: a problem with 'num_workers'

I tried changing the value of num_workers, but this error always happens. I need help!

Traceback (most recent call last):
  File "main.py", line 240, in <module>
    main(configs, parser)
  File "main.py", line 29, in main
    dataset = gen_or_load_dataset(configs)
  File "E:\project\episodic-memory-main\NLQ\VSLNet\utils\data_gen.py", line 324, in gen_or_load_dataset
    train_set = dataset_gen_bert(
  File "E:\project\episodic-memory-main\NLQ\VSLNet\utils\data_gen.py", line 260, in dataset_gen_bert
    process.start()
  File "E:\software_for_progamme\anaconda3\envs\vslnet\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "E:\software_for_progamme\anaconda3\envs\vslnet\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\software_for_progamme\anaconda3\envs\vslnet\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "E:\software_for_progamme\anaconda3\envs\vslnet\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\software_for_progamme\anaconda3\envs\vslnet\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'dataset_gen_bert.<locals>.worker'
PS E:\project\episodic-memory-main\NLQ\VSLNet> Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\software_for_progamme\anaconda3\envs\vslnet\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "E:\software_for_progamme\anaconda3\envs\vslnet\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
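
For context, this error is a general limitation of the spawn start method used on Windows: multiprocessing must pickle the worker function, and a function defined inside another function (like dataset_gen_bert.<locals>.worker) cannot be pickled. A minimal reproduction (sketch):

import multiprocessing as mp

def outer():
    def worker():  # nested function, analogous to dataset_gen_bert.<locals>.worker
        pass
    p = mp.Process(target=worker)
    p.start()      # spawn must pickle `worker` -> AttributeError: Can't pickle local object
    p.join()

if __name__ == "__main__":
    mp.set_start_method("spawn")  # the default start method on Windows
    outer()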

VQ3D and VQ2D annotation mismatch?

Hi,

I'm trying to generate ground truth from the VQ3D script and visualize the bbox, at least in 2D. However, the assertion often fails on this line here. It looks like the 2D annotation is there, but the ai (annotation index?) is mismatched. I am not sure if I made any mistakes. Any idea how to quickly fix this problem?

An example of mismatch in val set:
video_uid: 0066ab25-04ad-41b2-89ab-283d2bfa1c4b
clip_uid: 6c641082-044e-46a7-ad5f-85568119e09e
ai: 1
qset_id: 3

VQ2D val set visualizations are exported for only 12 clips

When I run the evaluation script with the default params, example_{x}_graph.png, example_{x}_rt.mp4, and example_{x}_sw.mp4 visuals are exported only for x = 0 to 11 (note that the number of processes is 12).

How can I make it export visuals for all clips instead of just the last 12? The current implementation overwrites the exported visuals for some reason.

Clips in VQ2D annotation seem to be different from benchmark clips.

Hello!
I've checked val_annot.json.gz in vq2d and found that 'video_frame_number' and the (clip) 'frame_number' point to different frames, as shown in the first image.
But when I use clips extracted from the full_scale videos (using convert_videos_to_clips.py), rather than the benchmark clips I downloaded directly from Ego4D, they seem to return frames within the same range, as in the second image.

Is this right? Are the benchmark clips not meant for the VQ2D annotations?

image

image
