akarazniewicz / cocosplit

Stars: 238 · Watchers: 4 · Forks: 92 · Size: 10 KB

Simple tool to split COCO annotations into train/test datasets.

Python 100.00%

coco deeplearning datapreprocessing

cocosplit's People

Contributors

Stargazers

Watchers

Forkers

rotgrunblau himanshug396 favervn waspinator shawnnew andydmh ldpalcu dell-research-harvard krishnapsrinivasan dontcryme nnmhuy ngoc-nguyen-nis 1512262 ikramullahniazi hammedb197 mahmoudhanoraa dodoii d-misra allansp84 leesing9 juanmed nguyen-nhat-truong wyx19980727 dome2322 burro-robotics quantumalaviya inderjeetrao cpwan dhuntdegould tjaffri acrousi stefan-matcovici modovado acqto smihael yudongli90 azaleabrowns puck2020 alluprasad tudoulei xolbynz kopaldeep skumarr53 nilsonsales ahmad-ra bijonguha prabhakar-sivanesan deepss-ai mohitburkule hassan11196 mathijsnl praveen5733 gmarziano natriumpikant msaqib17 lisa-janssen smandal94 arunavameister pitkoro adamsau gurpreetshanky arielliu408 samchapman94 themantalope fama0 jav-rez rubyw123 gillesdep theerawatramchuen maroueneo pythonzz0622 matthewleechen lyf6 danieleceul zhaogl123 senbeiiii danielhonies nihalbaig0 fangarenotgnu neosis marinhure sompoonlind klinsc sahandv junetida domenicobarretta miyakawa13 mancubus77

cocosplit's Issues

requirements.txt may need to be updated

Python version: 3.7.7
OS: Windows
Running pip install sklearn fails with this error:

error: metadata-generation-failed

Encountered error while generating package metadata.

Installing scikit-learn instead solved it:
pip install scikit-learn

unrecognized arguments and info = coco['info']

I ran into the following two problems; please help me solve them:

- When I type the full command:
python cocosplit.py --having-annotaions -s 0.8 D:/IMG_from_VID/COCO_Datasets/data/trainval.json train.json test.json

It prints:

usage: cocosplit.py [-h] -s SPLIT [--having-annotations]
coco_annotations train test

cocosplit.py: error: unrecognized arguments: --having-annotaions

- When I remove the argument "--having-annotaions", it shows:

Traceback (most recent call last):
  File "cocosplit.py", line 52, in <module>
    main(args)
  File "cocosplit.py", line 30, in main
    info = coco['info']
KeyError: 'info'

Thanks
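
Two separate things seem to be going on here. The usage line spells the flag --having-annotations, while the command above spells it --having-annotaions, which explains the first error. The KeyError suggests the annotation file has no top-level 'info' section. A minimal sketch of a more tolerant loader follows; the helper name load_coco_sections is hypothetical and is not part of cocosplit, and treating 'info' and 'licenses' as optional is an assumption about what the rest of the script needs.

```python
def load_coco_sections(coco):
    """Return the top-level COCO sections, tolerating missing optional ones.

    Hypothetical helper, not taken from cocosplit itself.
    """
    info = coco.get("info", {})          # assumed optional: default to empty dict
    licenses = coco.get("licenses", [])  # assumed optional: default to empty list
    images = coco["images"]              # required for any split
    annotations = coco["annotations"]
    categories = coco["categories"]
    return info, licenses, images, annotations, categories


# An annotation file without 'info' or 'licenses' no longer raises KeyError.
example = {"images": [], "annotations": [], "categories": []}
info, licenses, images, annotations, categories = load_coco_sections(example)
```

Another workaround under the same assumption is to add an empty "info": {} entry to the JSON file before running the script.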

Multiclass argument removes annotations from images

I have a dataset with multiple classes of objects in each image. I used the multi-class argument on the command line to split an annotation file into training and testing sets, and found that it removes some annotations from images. For example, an image that originally had 7 annotations had only 4 in the output. I don't know if this is the intended behaviour, but it is a problem: if the model correctly segmented and classified 7/7 objects during testing, the labels would mark 3 of them as incorrect.

--multi-class not splitting images with multiple boxes correctly

I would like to use the --multi-class option, but I noticed that it only moves the image and its first bounding box declaration into the validation set. If the image has more boxes, they are left orphaned in the training set file. Removing the flag allows that same image to have all its box definitions in the validation set. I checked that the same bug exists in the ahmad-ra repository, so it is not something introduced by the merge.

I don't know if this is the best way of fixing it, but I took the working (non-multi-class) code as an example and changed the multi-class path to use filter_annotations, in the hope of getting all the annotations for each image. That worked.
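
The idea described above, selecting annotations by the images that ended up in a split, can be sketched roughly as follows. This is not the repository's filter_annotations; the helper name annotations_for_images is hypothetical, and the sketch assumes the standard COCO fields "id" on images and "image_id" on annotations.

```python
def annotations_for_images(annotations, images):
    """Keep every annotation whose image_id belongs to one of the given images."""
    image_ids = {img["id"] for img in images}
    return [ann for ann in annotations if ann["image_id"] in image_ids]


# Image 1 has two boxes, image 2 has one; both boxes follow image 1.
all_annotations = [
    {"id": 10, "image_id": 1, "category_id": 3},
    {"id": 11, "image_id": 1, "category_id": 5},
    {"id": 12, "image_id": 2, "category_id": 3},
]
val_images = [{"id": 1, "file_name": "a.png"}]
val_annotations = annotations_for_images(all_annotations, val_images)
```

Because the selection is driven by image ids rather than by individual annotations, no box can be left behind in the other split.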

--multi-class option

I tried to use this repository to split a dataset of annotated images for instance segmentation (about 30 classes), and I noticed that the same image can be present in both the training and the test set.
Is this normal?
Is the split done on the basis of annotations rather than images (so that an image with multiple annotations can end up in both the training and the test set)?
Thank you!
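
One way to confirm the overlap is to compare the image ids of the two output files. The following is only a sketch: the helper name shared_image_ids is hypothetical, and it assumes both outputs are COCO dictionaries with an "images" list.

```python
def shared_image_ids(train_coco, test_coco):
    """Return the sorted image ids that appear in both splits."""
    train_ids = {img["id"] for img in train_coco["images"]}
    test_ids = {img["id"] for img in test_coco["images"]}
    return sorted(train_ids & test_ids)


train_coco = {"images": [{"id": 1}, {"id": 2}, {"id": 3}]}
test_coco = {"images": [{"id": 3}, {"id": 4}]}
overlap = shared_image_ids(train_coco, test_coco)
```

An empty result means the split is clean at the image level; any id returned points to an image whose annotations were divided between the two files.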

License

Hi, I wonder what is the license for this repository? Would it be possible to add a proper license if there isn't one?

File Not Found Error

When running the program I get the error:

FileNotFoundError: [Errno 2] No such file or directory: 'data/coco/images/frame_014171_d3.PNG'

Is there a way for the script to ignore missing images?
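
A possible workaround is to drop the missing images from the annotation file before splitting. The sketch below is an assumption-laden preprocessing step, not a cocosplit feature: the helper name drop_missing_images is hypothetical, and it assumes each image's "file_name" is a path relative to a directory you pass in.

```python
import os
import tempfile


def drop_missing_images(coco, image_dir):
    """Return a copy of the COCO dict without images whose files are absent."""
    kept = [
        img for img in coco["images"]
        if os.path.isfile(os.path.join(image_dir, img["file_name"]))
    ]
    kept_ids = {img["id"] for img in kept}
    cleaned = dict(coco)  # shallow copy; the input dict is not modified
    cleaned["images"] = kept
    # Annotations of removed images are removed as well.
    cleaned["annotations"] = [
        ann for ann in coco["annotations"] if ann["image_id"] in kept_ids
    ]
    return cleaned


# Demonstration with one file that exists and one that does not.
with tempfile.TemporaryDirectory() as image_dir:
    open(os.path.join(image_dir, "a.png"), "wb").close()
    coco = {
        "images": [{"id": 1, "file_name": "a.png"},
                   {"id": 2, "file_name": "missing.png"}],
        "annotations": [{"id": 10, "image_id": 1}, {"id": 11, "image_id": 2}],
    }
    cleaned = drop_missing_images(coco, image_dir)
```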

Does not maintain proper id indexes

ids are supposed to start from 1 and increase sequentially. Many dataloaders and libraries for COCO will give errors and not work with this haphazard id indexing.
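
Until the tool handles this, the ids can be renumbered in a post-processing step. This is a minimal sketch under two assumptions: every annotation's "image_id" refers to an image present in the same file, and sequential ids starting at 1 are what the downstream loader expects. The helper name reindex_ids is hypothetical.

```python
def reindex_ids(coco):
    """Return a copy with image and annotation ids renumbered from 1."""
    # Map each old image id to its new sequential id.
    image_id_map = {
        img["id"]: new_id
        for new_id, img in enumerate(coco["images"], start=1)
    }
    images = [dict(img, id=image_id_map[img["id"]]) for img in coco["images"]]
    annotations = [
        dict(ann, id=new_id, image_id=image_id_map[ann["image_id"]])
        for new_id, ann in enumerate(coco["annotations"], start=1)
    ]
    result = dict(coco)  # shallow copy; other sections are left untouched
    result["images"] = images
    result["annotations"] = annotations
    return result


split = {
    "images": [{"id": 17, "file_name": "a.png"}, {"id": 42, "file_name": "b.png"}],
    "annotations": [{"id": 903, "image_id": 42}, {"id": 77, "image_id": 17}],
}
reindexed = reindex_ids(split)
```

Category ids are deliberately left alone, since changing them would change the meaning of the labels.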

Cross Validation

Is it possible to do cross validation with this?

My first question: is the split random?

My second question: say I have 1000 annotations in ONE JSON file. For the first training session I would like to use annotations 1-800 for training and 801-1000 for validation; for the next session I would like to use annotations 201-1000 for training and 1-200 for validation. Is this possible?

Thank you
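
The rotating scheme described in the question is k-fold cross validation. A rough sketch of how folds could be generated outside the tool is shown below; it splits by image rather than by annotation, the helper name k_fold_image_splits is hypothetical, and the fold count and seed are illustrative values.

```python
import random


def k_fold_image_splits(images, k=5, seed=0):
    """Yield (train_images, val_images) pairs, one per fold."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)  # seeded, so folds are repeatable
    # Deal the shuffled images into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [img for j, fold in enumerate(folds) if j != i for img in fold]
        yield train, val


images = [{"id": n} for n in range(1, 11)]
splits = list(k_fold_image_splits(images, k=5, seed=0))
```

Each image lands in exactly one validation fold, so with k=5 every training session uses 80% of the images for training and the remaining 20% for validation. The annotations for each fold can then be selected by image id.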

Split many categories

When training and testing sets are generated in an 80/20 ratio and there is more than one category, are the categories balanced? That is, would each category in the JSON file also be split 80/20?
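
Whether a given split is balanced can be checked after the fact by counting annotations per category in each output file. The sketch below is only a diagnostic, not a description of how cocosplit balances classes; the helper name train_share_per_category is hypothetical.

```python
from collections import Counter


def train_share_per_category(train_annotations, test_annotations):
    """Return {category_id: fraction of that category's annotations in train}."""
    train_counts = Counter(a["category_id"] for a in train_annotations)
    test_counts = Counter(a["category_id"] for a in test_annotations)
    shares = {}
    for category_id in set(train_counts) | set(test_counts):
        total = train_counts[category_id] + test_counts[category_id]
        shares[category_id] = train_counts[category_id] / total
    return shares


train_annotations = [{"category_id": 1}] * 4 + [{"category_id": 2}] * 1
test_annotations = [{"category_id": 1}] * 1 + [{"category_id": 2}] * 1
shares = train_share_per_category(train_annotations, test_annotations)
```

A value close to 0.8 for every category would indicate that an 80/20 split is balanced per class; large deviations indicate it is not.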

Same result every time?

Does your code reproduce the same results every time it is executed with the same parameters?
By "same results" I mean: will the images in train.json be the same if I re-run your code?
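
In general, a random split is only repeatable when the random number generator is seeded. The sketch below illustrates this with the standard library; it does not show how cocosplit itself splits, and the helper name split_images and the seed value are illustrative assumptions. If the tool relies on scikit-learn's train_test_split, the equivalent knob there is the random_state parameter.

```python
import random


def split_images(images, train_fraction, seed):
    """Shuffle with a fixed seed and cut the list into train and test parts."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


images = [{"id": n} for n in range(1, 11)]
train_a, test_a = split_images(images, 0.8, seed=42)
train_b, test_b = split_images(images, 0.8, seed=42)
```

With the same input order, fraction, and seed, both calls return identical train and test lists.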
