akarazniewicz / cocosplit
Simple tool to split COCO annotations into train/test datasets.
Python version: 3.7.7
OS: Windows
Running pip install sklearn produces this error:
error: metadata-generation-failed
Encountered error while generating package metadata.
Installing scikit-learn instead solved it (the PyPI package name is scikit-learn; sklearn is only the import name):
pip install scikit-learn
I encountered this issue; please help me solve it:
- When I type the full command:
python cocosplit.py --having-annotaions -s 0.8 D:/IMG_from_VID/COCO_Datasets/data/trainval.json train.json test.json
It shows this mess:
usage: cocosplit.py [-h] -s SPLIT [--having-annotations]
                    coco_annotations train test
cocosplit.py: error: unrecognized arguments: --having-annotaions
- When I remove the argument "--having-annotaions", it shows:
Traceback (most recent call last):
File "cocosplit.py", line 52, in <module>
main(args)
File "cocosplit.py", line 30, in main
info = coco['info']
KeyError: 'info'
Thanks
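Two separate problems show up here. The first error comes from the misspelled flag: the usage line lists --having-annotations, while the command passed --having-annotaions. The second is a KeyError because the input JSON has no top-level 'info' key and line 30 of cocosplit.py reads coco['info'] unconditionally. A minimal sketch of a more tolerant loader, assuming 'info', 'licenses' and 'categories' may be treated as optional (coco_sections and load_coco are hypothetical helper names, not part of cocosplit):

```python
import json


def coco_sections(coco):
    """Return (info, licenses, images, annotations, categories) from a COCO dict.

    Assumption: 'info', 'licenses' and 'categories' are optional and default
    to empty values; 'images' and 'annotations' are still required.
    """
    info = coco.get("info", {})            # missing in many exported datasets
    licenses = coco.get("licenses", [])
    categories = coco.get("categories", [])
    return info, licenses, coco["images"], coco["annotations"], categories


def load_coco(path):
    """Read a COCO JSON file and unpack it with coco_sections()."""
    with open(path, "r", encoding="utf-8") as f:
        return coco_sections(json.load(f))
```

When writing the output files, the same empty defaults can be written back so the resulting train.json and test.json still contain every top-level section.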
I have a dataset with multiple classes of objects in each image. I used the multi-class argument on the command line to split an annotation file into training and testing sets, and found that this argument caused images to lose some of their annotations: for example, an image that originally had 7 annotations had only 4 in the output. I don't know if this is the intended behaviour, but it is a problem: if during testing the model segmented and classified all 7 objects correctly, the labels would say 3 of them were incorrect.
I would like to use the --multi-class option, but I noticed that it only moves the image and its first bounding box declaration into the validation set. If the image has more boxes, those are left orphaned in the training set file. Removing the flag puts all of that image's box definitions in the validation set. I checked that the same bug exists in the ahmad-ra repository, so it was not introduced by the merge.
I don't know if this is the best way to fix it, but I took the working code as an example and modified the multi-class-specific code to use filter_annotations, in hopes of getting all the annotations for each image. That did work.
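The fix described above amounts to selecting annotations by image rather than by position: once the images of a split are chosen, every annotation whose image_id belongs to one of them goes with it. A minimal sketch of that selection (annotations_for_images is a hypothetical name, similar in spirit to the filter_annotations function mentioned in the report, not its actual code):

```python
def annotations_for_images(annotations, images):
    """Return every annotation whose image_id belongs to one of `images`.

    All boxes of a selected image are kept, not only the first one, so no
    annotation is left orphaned in the other split.
    """
    image_ids = {int(img["id"]) for img in images}  # set for O(1) lookups
    return [a for a in annotations if int(a["image_id"]) in image_ids]
```

Calling it once with the training images and once with the validation images gives two annotation lists that together cover every annotation of those images exactly once.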
I tried to use this repository to split a dataset of images annotated for instance segmentation (the problem has about 30 classes), and I noticed that the same image can be present in both the training and test sets.
Is this normal?
Is the split done per annotation rather than per image (so that an image with multiple annotations can end up in both the training and test sets)?
Thank you!
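Whatever the cause, the symptom is easy to check on the output files. A minimal sketch, assuming train and test are the parsed contents of train.json and test.json (shared_image_ids is a hypothetical helper): it collects image ids from both the 'images' list and the annotations' image_id fields, so orphaned annotations are caught too.

```python
def shared_image_ids(train, test):
    """Return the sorted image ids that appear in both splits.

    An empty list means no image leaks between train and test. Ids are taken
    from 'images' and from each annotation's 'image_id', so an annotation
    left behind in the wrong file is also reported.
    """
    def ids(split):
        from_images = {img["id"] for img in split.get("images", [])}
        from_anns = {a["image_id"] for a in split.get("annotations", [])}
        return from_images | from_anns

    return sorted(ids(train) & ids(test))
```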
Hi, what is the license for this repository? Would it be possible to add a proper license if there isn't one?
When running the program I get the error:
FileNotFoundError: [Errno 2] No such file or directory: 'data/coco/images/frame_014171_d3.PNG'
Is there a way for the script to ignore missing images?
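One workaround is to prune image entries whose files are missing before running the step that fails, and to drop their annotations at the same time so none are orphaned. A minimal sketch, assuming each image's file_name is relative to a directory you pass in (drop_missing_images and image_dir are hypothetical names, not cocosplit options):

```python
import os


def drop_missing_images(images, annotations, image_dir):
    """Remove images whose file does not exist under image_dir.

    Returns (kept_images, kept_annotations, missing_file_names). Annotations
    that point at a removed image are dropped as well.
    """
    kept, missing = [], []
    for img in images:
        if os.path.isfile(os.path.join(image_dir, img["file_name"])):
            kept.append(img)
        else:
            missing.append(img["file_name"])
    kept_ids = {img["id"] for img in kept}
    kept_anns = [a for a in annotations if a["image_id"] in kept_ids]
    return kept, kept_anns, missing
```

Printing the returned missing list before writing the new JSON makes it clear how many images were skipped.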
Ids are supposed to start from 1 and increase sequentially. Many COCO dataloaders and libraries raise errors and do not work with this haphazard id indexing.
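A post-processing pass can renumber each output file. A minimal sketch (reindex_ids is a hypothetical helper): image and annotation ids are reassigned as 1, 2, 3, ... in list order, and every annotation's image_id is remapped to the new image id. It assumes the annotations only reference images present in the same file, which holds when the split keeps all annotations with their images.

```python
def reindex_ids(images, annotations):
    """Renumber image and annotation ids sequentially from 1.

    Returns new lists; the input dicts are not modified. Raises KeyError if
    an annotation references an image that is not in `images`.
    """
    image_map = {}
    new_images = []
    for new_id, img in enumerate(images, start=1):
        image_map[img["id"]] = new_id
        new_images.append({**img, "id": new_id})
    new_anns = [
        {**ann, "id": new_id, "image_id": image_map[ann["image_id"]]}
        for new_id, ann in enumerate(annotations, start=1)
    ]
    return new_images, new_anns
```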
Is it possible to do cross-validation with this?
My first question: is the split random?
My second question: say I have 1000 annotations in ONE JSON file. For the first training session I would like to use annotations 1-800 for training and 801-1000 for validation; for the next session I would like to use annotations 201-1000 for training and 1-200 for validation. Is this possible?
Thank you
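The rotation described above is k-fold cross-validation. A minimal sketch, assuming the folds are built over images rather than annotations so that all annotations of an image stay in the same fold (kfold_image_splits is a hypothetical helper, not a cocosplit option). Folds are contiguous slices; with shuffle=False, 1000 items and k=5, fold 0 validates on items 1-200 and trains on 201-1000.

```python
import random


def kfold_image_splits(images, k=5, shuffle=True, seed=0):
    """Yield (train_images, val_images) for each of k folds.

    Folds are contiguous slices of the (optionally shuffled) image list, so
    every image is used for validation exactly once.
    """
    if not 2 <= k <= len(images):
        raise ValueError("k must be between 2 and the number of images")
    order = list(images)
    if shuffle:
        random.Random(seed).shuffle(order)  # fixed seed -> same folds every run
    n = len(order)
    bounds = [round(i * n / k) for i in range(k + 1)]
    for i in range(k):
        val = order[bounds[i]:bounds[i + 1]]
        train = order[:bounds[i]] + order[bounds[i + 1]:]
        yield train, val
```

Each fold's annotation lists can then be built by keeping every annotation whose image_id is in that fold's images.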
When training and testing sets are generated in an 80/20 ratio and there is more than one category, are the sets balanced? That is, would each category in the JSON file be split 80/20?
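Per-category balance requires a stratified split. A minimal sketch, and not necessarily how cocosplit's multi-class option works: images are grouped by a single label and each group is cut at the requested ratio. The simplifying assumption is that an image is labeled by the category of its first annotation, so images containing several categories are only approximately balanced; stratified_image_split is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_image_split(images, annotations, train_ratio=0.8, seed=0):
    """Split images so each group is divided roughly train_ratio : rest.

    Assumption: each image is grouped by the category_id of its first
    annotation; images without annotations form their own group (key None).
    """
    first_cat = {}
    for ann in annotations:
        first_cat.setdefault(ann["image_id"], ann["category_id"])
    groups = defaultdict(list)
    for img in images:
        groups[first_cat.get(img["id"])].append(img)
    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(groups, key=str):  # deterministic group order
        group = groups[key]
        rng.shuffle(group)
        cut = round(len(group) * train_ratio)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test
```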
I am wondering whether your code reproduces the same results every time it is executed with the same parameters.
By the same results I mean that the images in train.json will be the same if I re-execute your code.
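Whether a run is repeatable depends on how the underlying shuffle is seeded, which the thread does not state. A minimal sketch of the general behavior, assuming the split is a pseudo-random shuffle (split_images is a hypothetical helper): with seed=None the split changes between runs, while a fixed integer seed applied to the same input order always gives the same train and test images.

```python
import random


def split_images(images, train_ratio=0.8, seed=None):
    """Randomly split images into (train, test).

    seed=None seeds from system entropy, so runs differ; a fixed integer
    seed makes the split reproducible for the same input order.
    """
    order = list(images)
    random.Random(seed).shuffle(order)
    cut = round(len(order) * train_ratio)
    return order[:cut], order[cut:]
```

Note that reproducibility also requires the images to be read in the same order each time, which is the case when the same JSON file is loaded.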