Comments (3)
I'm sure there's a better way that actually uses iterative_train_test_split, but here is my quick and dirty fix. It collects all the files that were assigned to validation, removes them from the training set, and only then grabs the annotations for each set. This does mean the split won't match the requested ratio and will be very different between the two modes. Priority goes to the test set.
if args.multi_class:
    annotation_categories = funcy.lmap(lambda a: int(a['category_id']), annotations)

    # bottleneck 1
    # remove classes that have only one sample, because they can't be split into the training and testing sets
    annotation_categories = funcy.lremove(lambda i: annotation_categories.count(i) <= 1, annotation_categories)
    filtered_annotations = funcy.lremove(lambda i: i['category_id'] not in annotation_categories, annotations)

    X_train, y_train, X_test, y_test = iterative_train_test_split(np.array([filtered_annotations]).T, np.array([annotation_categories]).T, test_size=1 - args.split)

    img_train = filter_images(images, X_train.reshape(-1))
    img_test = filter_images(images, X_test.reshape(-1))

    # give priority to the test set: drop every test image from the training images
    image_test_ids = funcy.lmap(lambda i: int(i['id']), img_test)
    img_train = funcy.lremove(lambda a: int(a['id']) in image_test_ids, img_train)

    # grab all annotations belonging to the images of each set
    anns_train = filter_annotations(annotations, img_train)
    anns_test = filter_annotations(annotations, img_test)

    save_coco(args.train, info, licenses, img_train, anns_train, categories)
    save_coco(args.test, info, licenses, img_test, anns_test, categories)

    print("Saved {} entries in {} and {} in {}".format(len(anns_train), args.train, len(anns_test), args.test))
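For anyone who wants to try the "better way" mentioned above, one possible starting point is to stratify whole images instead of single annotations. The following is only a rough sketch, assuming standard COCO dicts with 'id', 'image_id' and 'category_id' keys; image_label_matrix is a hypothetical helper and not part of cocosplit. It builds one multi-hot label row per image, which could then presumably be passed as the label matrix to iterative_train_test_split (with the image ids as X) so that each image ends up in exactly one set. That last step is not shown and has not been tested here.

```python
def image_label_matrix(images, annotations, categories):
    """Build one multi-hot category row per image.

    Hypothetical helper, not part of cocosplit. Assumes COCO-style dicts.
    """
    cat_ids = sorted(int(c['id']) for c in categories)
    col = {cid: j for j, cid in enumerate(cat_ids)}   # category id -> column
    image_ids = [int(img['id']) for img in images]
    row = {iid: i for i, iid in enumerate(image_ids)}  # image id -> row

    matrix = [[0] * len(cat_ids) for _ in image_ids]
    for ann in annotations:
        # mark the category as present on the annotation's image
        matrix[row[int(ann['image_id'])]][col[int(ann['category_id'])]] = 1
    return image_ids, matrix


# Toy data: image 1 has two classes, image 2 has one, image 3 has none.
images = [{'id': 1}, {'id': 2}, {'id': 3}]
categories = [{'id': 10}, {'id': 20}]
annotations = [
    {'id': 100, 'image_id': 1, 'category_id': 10},
    {'id': 101, 'image_id': 1, 'category_id': 20},
    {'id': 102, 'image_id': 2, 'category_id': 10},
]

image_ids, matrix = image_label_matrix(images, annotations, categories)
print(image_ids)
print(matrix)
```

Images without any annotation get an all-zero row, so they would need separate handling before stratifying.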
from cocosplit.
Looks like that was an incomplete fix. With that, any image that has multiple boxes in the validation set is created correctly there, but it is also created in the training set. The problem seems to be that the annotations are split properly by class, but the split doesn't take into account that some of those annotations belong to the same file.
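To make the overlap concrete, here is a small toy illustration, not taken from cocosplit: split_overlap and drop_test_images are hypothetical helper names, and the annotation dicts are made-up COCO-style records. It shows how an annotation-level split can put the same image into both sets, and how dropping the test images from the training annotations (giving priority to the test set) removes the overlap.

```python
def split_overlap(anns_train, anns_test):
    """Return the image ids that appear in both annotation lists."""
    train_ids = {int(a['image_id']) for a in anns_train}
    test_ids = {int(a['image_id']) for a in anns_test}
    return train_ids & test_ids


def drop_test_images(anns_train, anns_test):
    """Remove every training annotation whose image is also in the test set."""
    test_ids = {int(a['image_id']) for a in anns_test}
    return [a for a in anns_train if int(a['image_id']) not in test_ids]


# Image 1 carries two boxes; an annotation-level split sends one to each set.
anns_train = [
    {'id': 100, 'image_id': 1, 'category_id': 10},
    {'id': 102, 'image_id': 2, 'category_id': 10},
]
anns_test = [
    {'id': 101, 'image_id': 1, 'category_id': 20},
]

leaked = split_overlap(anns_train, anns_test)
cleaned = drop_test_images(anns_train, anns_test)
print(sorted(leaked))
print([a['id'] for a in cleaned])
```

A check like split_overlap returning an empty set after splitting is a cheap way to confirm that no image is shared between the two output files.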
@fama0 This seems to work well, thanks!