This framework selects the most suitable frames from a video dataset so that not every frame has to be annotated. It works best on video data that is already partially annotated, where the goal is to pick the most useful samples for further annotation and so improve the performance of the model.
This step sorts the videos by time of day.
This step calculates the appropriate sampling rate for each video and samples the most representative frames for training.
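The framework's actual sampling criterion is not spelled out here, so the following is only a minimal sketch of one plausible approach: derive a per-video stride from the video length and a target frame budget, then take evenly spaced frame indices. The names `sampling_stride`, `sample_indices`, and `target_frames` are hypothetical and not part of the framework.

```python
import math


def sampling_stride(total_frames: int, target_frames: int) -> int:
    """Return the gap between sampled frames (assumed rule: uniform stride)."""
    if total_frames <= 0 or target_frames <= 0:
        raise ValueError("frame counts must be positive")
    # Round up so that we never sample more than target_frames frames.
    return max(1, math.ceil(total_frames / target_frames))


def sample_indices(total_frames: int, target_frames: int) -> list:
    """Return evenly spaced frame indices for one video."""
    stride = sampling_stride(total_frames, target_frames)
    return list(range(0, total_frames, stride))


if __name__ == "__main__":
    # A 100-frame video with a budget of 10 frames is sampled every 10 frames.
    print(sample_indices(100, 10))
```

Uniform spacing is the simplest stand-in for "most representative"; a content-based criterion (for example, frame differences) could replace `sample_indices` without changing the later steps.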
This step converts the initial text annotations into TOML files.
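As an illustration of the text-to-TOML conversion, here is a hedged sketch. The input layout (one box per line as `label x_min y_min x_max y_max`) and the TOML keys (`frame`, `boxes`) are assumptions made for this example; the framework's real annotation format and schema may differ. The TOML is written by hand because the Python standard library has no TOML writer.

```python
def parse_annotation_line(line: str) -> dict:
    """Parse one assumed-format line: 'label x_min y_min x_max y_max'."""
    label, *coords = line.split()
    if len(coords) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(coords)}: {line!r}")
    x_min, y_min, x_max, y_max = (int(value) for value in coords)
    return {
        "label": label,
        "x_min": x_min,
        "y_min": y_min,
        "x_max": x_max,
        "y_max": y_max,
    }


def to_toml(frame_name: str, boxes: list) -> str:
    """Render one frame's boxes as TOML text (hypothetical schema).

    Labels and frame names are assumed to contain no quotes or backslashes.
    """
    lines = [f'frame = "{frame_name}"', ""]
    for box in boxes:
        lines.append("[[boxes]]")  # one array-of-tables entry per box
        lines.append(f'label = "{box["label"]}"')
        for key in ("x_min", "y_min", "x_max", "y_max"):
            lines.append(f"{key} = {box[key]}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    box = parse_annotation_line("car 10 20 110 220")
    print(to_toml("frame_000010.jpg", [box]))
```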
Select the annotations that correspond to the selected frames (Step 2) from the TOML files produced in Step 3.
Select the images that correspond to the selected frames (Step 2) from the image folder.
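Both selection steps amount to matching files by frame name. The sketch below assumes that annotations and images share a file stem with the selected frames (for example `frame_000010.toml` and `frame_000010.jpg`); that naming convention and the function name `copy_matching` are hypothetical.

```python
from pathlib import Path
import shutil


def copy_matching(selected_stems, source_dir, dest_dir, suffix):
    """Copy '<stem><suffix>' from source_dir to dest_dir for each selected stem.

    Returns (copied_file_names, missing_stems) so that gaps are visible.
    """
    source_dir = Path(source_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for stem in sorted(selected_stems):
        src = source_dir / f"{stem}{suffix}"
        if src.is_file():
            shutil.copy2(src, dest_dir / src.name)
            copied.append(src.name)
        else:
            missing.append(stem)
    return copied, missing
```

The same function could be called once with `suffix=".toml"` for the annotations and once with the image suffix for the image folder. Returning the missing stems makes it easy to spot selected frames that have no annotation or image.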
Convert the selected TOML files to CVAT format so that the selected frames can be re-annotated.
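For orientation, the sketch below builds a small XML document in the style of the "CVAT for images 1.1" layout (an `annotations` root with `image` elements containing `box` elements). It takes already-parsed frame dictionaries as input; their keys (`name`, `width`, `height`, `boxes`, and so on) are assumptions for this example, and a real CVAT import may need additional metadata such as task labels.

```python
import xml.etree.ElementTree as ET


def to_cvat_xml(frames: list) -> str:
    """Build CVAT-style image annotation XML from parsed frame dicts.

    Each frame dict is assumed to look like:
    {"name": str, "width": int, "height": int,
     "boxes": [{"label": str, "x_min": int, "y_min": int,
                "x_max": int, "y_max": int}, ...]}
    """
    root = ET.Element("annotations")
    ET.SubElement(root, "version").text = "1.1"
    for image_id, frame in enumerate(frames):
        image = ET.SubElement(
            root,
            "image",
            id=str(image_id),
            name=frame["name"],
            width=str(frame["width"]),
            height=str(frame["height"]),
        )
        for box in frame["boxes"]:
            # CVAT boxes use top-left (xtl, ytl) and bottom-right (xbr, ybr).
            ET.SubElement(
                image,
                "box",
                label=box["label"],
                occluded="0",
                xtl=str(box["x_min"]),
                ytl=str(box["y_min"]),
                xbr=str(box["x_max"]),
                ybr=str(box["y_max"]),
            )
    return ET.tostring(root, encoding="unicode")
```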