ssaru / convert2yolo Goto Github PK

The purpose of this project is to convert VOC annotation XML files to the YOLO Darknet training file format.

Convert2Yolo

Converts object detection annotations to the YOLO Darknet format.

Supported datasets:

COCO
VOC
UDACITY Object Detection
KITTI 2D Object Detection

Prerequisites

pip3 install -r requirements.txt

Required Parameters

Each dataset requires some parameters.

See example.py.

--datasets
- one of COCO / VOC / UDACITY / KITTI
```
--datasets COCO
```
--img_path
- a directory path, not a file path
```
--img_path ./example/kitti/images/
```

--label

a directory path, not a file path

(some datasets provide labels as a single *.json or *.csv file; in that case, pass the file path)

--label ./example/kitti/labels/

--label ./example/kitti/labels/label.json

or

--label ./example/kitti/labels/label.csv

--convert_output_path
- a directory path, not a file path
```
--convert_output_path ./
```
--img_type
- the image file extension, such as .png or .jpg
```
--img_type ".jpg"
```
--manifest_path
- directory where the manifest file (the list of image paths needed to train a YOLO model in the Darknet framework) is written
```
--manifest_path ./
```
--cls_list_file (*.names)
- a *.names file containing the class names, one per line; refer to the darknet *.names files
```
--cls_list_file voc.names
```
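
As a rough illustration of how the flags above fit together, here is a minimal `argparse` sketch. It is not the actual `example.py`; which flags are required and the default values shown are assumptions made for this sketch.

```python
import argparse


def build_parser():
    """Declare the flags described above (illustrative, not the real example.py)."""
    parser = argparse.ArgumentParser(description="Convert annotations to YOLO format")
    parser.add_argument("--datasets", required=True,
                        choices=["COCO", "VOC", "UDACITY", "KITTI"])
    parser.add_argument("--img_path", required=True, help="image directory")
    parser.add_argument("--label", required=True,
                        help="label directory, or a *.json / *.csv file")
    parser.add_argument("--convert_output_path", required=True,
                        help="directory for the generated YOLO *.txt files")
    # The defaults below are assumptions for this sketch only.
    parser.add_argument("--img_type", default=".jpg", help='e.g. ".jpg" or ".png"')
    parser.add_argument("--manifest_path", default="./",
                        help="directory where the manifest file is written")
    parser.add_argument("--cls_list_file", required=True, help="*.names file")
    return parser


args = build_parser().parse_args([
    "--datasets", "KITTI",
    "--img_path", "./example/kitti/images/",
    "--label", "./example/kitti/labels/",
    "--convert_output_path", "./",
    "--cls_list_file", "names.txt",
])
print(args.datasets, args.img_path, args.img_type)
```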

*.names file example

aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
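
The following sketch shows one way to turn such a file into class indices, assuming the usual Darknet convention that a class id is the zero-based line number of its name. The helper name `load_class_names` is hypothetical and not part of this repository.

```python
VOC_NAMES = """aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
"""


def load_class_names(text):
    """Return the class names in file order, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


names = load_class_names(VOC_NAMES)
# The class id written to a YOLO label file is the position in this list.
print(len(names), names.index("person"))
```

For a real file, the text can be read with `open(path, encoding="utf-8").read()` and passed to the same function.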

Example

1. example command

python3 example.py --datasets [COCO/VOC/KITTI/UDACITY] --img_path <image_path> --label <label path or annotation file> --convert_output_path <output path> --img_type [".jpg" / ".png"] --manifest_path <output manifest file path> --cls_list_file <*.names file path>

>>
ex) python3 example.py --datasets KITTI --img_path ./example/kitti/images/ --label ./example/kitti/labels/ --convert_output_path ./ --img_type ".jpg" --manifest_path ./ --cls_list_file names.txt

2. VOC datasets

description of dataset directory

Suppose the VOC dataset is located at ~/VOC and the VOC folder contains the VOCdevkit folder.

Here is the structure of VOCdevkit:

VOCdevkit

$ tree -L 2
.
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    └── SegmentationObject

We use only the Annotations and JPEGImages folders.

Annotations : Object Detection label folder
JPEGImages : Image data

Annotations

$ tree -L 1
.
├── 2007_000027.xml
├── 2007_000032.xml
├── 2007_000033.xml
...
├── 2012_004319.xml
├── 2012_004326.xml
├── 2012_004328.xml
├── 2012_004329.xml
├── 2012_004330.xml
└── 2012_004331.xml

JPEGImages

.
├── 2007_000027.jpg
├── 2007_000032.jpg
├── 2007_000033.jpg
...
├── 2012_004328.jpg
├── 2012_004329.jpg
├── 2012_004330.jpg
└── 2012_004331.jpg

make `*.names` file

Now make a *.names file in ~/VOC/.

Refer to the darknet voc.names file:

aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor

VOC datasets convert to YOLO format

Now run the example code.

This example saves the YOLO labels to ~/YOLO/ and sets manifest_path to ./

make YOLO folder

$ mkdir ~/YOLO

VOC convert to YOLO

python3 example.py --datasets VOC --img_path ~/VOC/VOCdevkit/VOC2012/JPEGImages/ --label ~/VOC/VOCdevkit/VOC2012/Annotations/ --convert_output_path ~/YOLO/ --img_type ".jpg" --manifest_path ./ --cls_list_file ~/VOC/voc.names

>>
VOC Parsing:   |████████████████████████████████████████| 100.0% (17125/17125) Complete
YOLO Generating:|████████████████████████████████████████| 100.0% (17125/17125)Complete
YOLO Saving:   |████████████████████████████████████████| 100.0% (17125/17125) Complete

Result

Now check the result files (~/YOLO/, ./manifest.txt).

~/YOLO/

$ tree -L 1
>>
...
├── 2012_004326.txt
├── 2012_004328.txt
├── 2012_004329.txt
├── 2012_004330.txt
└── 2012_004331.txt

2012_004331.txt

$ cat 2012_004331.txt

>>
14 0.31 0.34 0.212 0.547
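
Each line of a label file follows the Darknet layout `class_id x_center y_center width height`, with the four box values normalized by the image size. A minimal sketch of the conversion from a VOC-style pixel box follows; the function names are hypothetical, and the tool's own rounding or pixel-offset conventions may differ.

```python
def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel box (corner format) to normalized center format."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / float(img_w)
    height = (ymax - ymin) / float(img_h)
    return x_center, y_center, width, height


def yolo_line(class_id, box):
    """Format one label line: class id followed by the four normalized values."""
    return "{} {:.3f} {:.3f} {:.3f} {:.3f}".format(class_id, *box)


# Hypothetical 500x375 image with one box for class id 14.
box = voc_box_to_yolo(100, 50, 300, 200, img_w=500, img_h=375)
print(yolo_line(14, box))
```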

./manifest.txt

$ cat ./manifest.txt

>>
~/VOC/VOCdevkit/VOC2012/JPEGImages/2010_000420.jpg
~/VOC/VOCdevkit/VOC2012/JPEGImages/2010_003674.jpg
~/VOC/VOCdevkit/VOC2012/JPEGImages/2012_002128.jpg
...
~/VOC/VOCdevkit/VOC2012/JPEGImages/2009_000104.jpg
~/VOC/VOCdevkit/VOC2012/JPEGImages/2012_000212.jpg

3. COCO datasets

description of dataset directory

Suppose the COCO dataset is located at ~/COCO and the COCO folder contains the annotations and val2017 folders.

Here is the structure of annotations and val2017:

annotations

$ cd ~/COCO/annotations/
$ tree -L 1
.
└── instances_val2017.json

val2017

.
├── 000000000139.jpg
├── 000000000285.jpg
├── 000000000632.jpg
├── 000000000724.jpg
...
├── 000000581357.jpg
├── 000000581482.jpg
├── 000000581615.jpg
└── 000000581781.jpg

make `*.names` file

Now make a *.names file in ~/COCO/.

Refer to the darknet coco.names file:

person
bicycle
car
motorbike
aeroplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
sofa
pottedplant
bed
diningtable
toilet
tvmonitor
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
motorcycle
potted plant
dining table
tv
couch
airplane

COCO datasets convert to YOLO format

Now run the example code.

This example saves the YOLO labels to ~/YOLO/ and sets manifest_path to ./

make YOLO folder

$ mkdir ~/YOLO

COCO convert to YOLO

python3 example.py --datasets COCO --img_path ~/COCO/val2017/ --label ~/COCO/annotations/instances_val2017.json --convert_output_path ~/YOLO/ --img_type ".jpg" --manifest_path ./ --cls_list_file ~/COCO/coco.names

>>
COCO Parsing:  |████████████████████████████████████████| 100.0% (36781/36781) Complete
YOLO Generating:|████████████████████████████████████████| 100.0% (4952/4952)  Complete
YOLO Saving:   |████████████████████████████████████████| 100.0% (4952/4952)  Complete

Result

Now check the result files (~/YOLO/, ./manifest.txt).

~/YOLO/

.
├── 000000000139.txt
├── 000000000285.txt
├── 000000000632.txt
├── 000000000724.txt
...
├── 000000581206.txt
├── 000000581317.txt
├── 000000581357.txt
├── 000000581482.txt
├── 000000581615.txt
└── 000000581781.txt

000000581781.txt

46 0.446 0.557 0.465 0.209
46 0.517 0.851 0.363 0.128
46 0.939 0.05 0.122 0.071
46 0.786 0.027 0.11 0.054
46 0.171 0.247 0.19 0.139
46 0.865 0.773 0.27 0.372
46 0.111 0.552 0.215 0.333
46 0.51 0.744 0.376 0.207
46 0.811 0.377 0.25 0.36
46 0.955 0.388 0.09 0.181
46 0.195 0.333 0.153 0.224
46 0.036 0.183 0.065 0.357
46 0.496 0.45 0.389 0.132
46 0.499 0.52 0.998 0.956

./manifest.txt

~/COCO/val2017/000000289343.jpg
~/COCO/val2017/000000061471.jpg
~/COCO/val2017/000000472375.jpg
~/COCO/val2017/000000520301.jpg
~/COCO/val2017/000000579321.jpg
~/COCO/val2017/000000494869.jpg
...
~/COCO/val2017/000000097585.jpg
~/COCO/val2017/000000429530.jpg
~/COCO/val2017/000000031749.jpg
~/COCO/val2017/000000284282.jpg

TODO

Refactoring (Release v2.0.0)

Add strict type annotations to the code
Separate class responsibilities more strictly
Rewrite README.md to be more helpful for first-time users
Relax the overly strict validation checks so that trivial errors do not abort the conversion
Support multiprocessing
Skip unwanted object classes

convert2yolo's Issues

Why is the number of output txt files smaller than the number of inputs? (4952/5000 from coco-instances_val2017)

I converted instances_val2017.json from the COCO dataset. There are 5000 images in the validation set, but I only got 4952 output txt files. I am wondering why there is a difference (I expected 5000 output txt files).
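
One possible explanation, not confirmed in this thread, is that some images listed in the JSON have no annotation entries, so no label file is produced for them. The sketch below counts such images using the standard COCO keys (`images`, `annotations`, `image_id`); the tiny dictionary is made-up sample data.

```python
def images_without_annotations(coco):
    """Return file names of images that no annotation entry refers to."""
    annotated = {ann["image_id"] for ann in coco.get("annotations", [])}
    return [img["file_name"] for img in coco.get("images", [])
            if img["id"] not in annotated]


# Made-up sample in the COCO layout; a real file would be loaded with json.load().
sample = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 1,
                     "bbox": [10, 20, 30, 40]}],
}
print(images_without_annotations(sample))
```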

design abstract class

Remove the ifs in voc.py please

https://github.com/SsaRu/convert2Yolo/blob/441ea073f6f3c7b6197677cc05605fbb41181135/voc.py#L16

Thanks for your share! However, there is a problem in the lines I pointed at. You never get the arguments in that voc.py. Just remove the ifs and the script runs fine!

'ascii' codec can't encode character

Hi,

First of all, thank you for making this script.

I downloaded the 2014 dataset from the COCO website and ran the command below. The process starts, but a character encoding error appears at about 3% progress.

How can I resolve this problem?

python3 example.py --datasets COCO --img_path /coco/images/train2014/ --label /coco/annotations/instances_train2014.json --convert_output_path ./ --img_type ".jpg" --manipast_path ../ --cls_list_file ../data/coco_15c.names

COCO Parsing Result : False, msg : ERROR : 'ascii' codec can't encode character '\u2588' in position 17: ordinal not in range(128), moreInfo : <class 'UnicodeEncodeError'> Format.py 366
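
The character `'\u2588'` is the block used to draw the progress bar, so this error suggests the output stream uses an ASCII encoding. Setting the environment variable `PYTHONIOENCODING=utf-8` before running is one workaround. Another is to fall back to a plain character, as in this sketch (the function `pick_bar_fill` is hypothetical, not part of the repository):

```python
import sys


def pick_bar_fill(encoding, preferred="\u2588", fallback="#"):
    """Return `preferred` if the stream encoding can represent it, else `fallback`."""
    try:
        preferred.encode(encoding or "ascii")
        return preferred
    except (UnicodeEncodeError, LookupError):
        return fallback


# Choose the fill character from the current stdout encoding.
fill = pick_bar_fill(sys.stdout.encoding)
bar = fill * 10 + "-" * 30
sys.stdout.write("|{}| 25.0%\n".format(bar))
```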

Convert only specific category

Hi,
Thanks for the effort.
Let's say I am working with the VOC dataset.
As far as I understand, this repository converts all the annotations in the Annotations folder to YOLO format. Is it possible to convert only a specific category (e.g. person) and separate the images and annotations into a different folder?

Thanks
Rahul

convert2Yolo specific classes

Hi,

I would like to use the coco dataset for YOLO, but only specific classes and not the entire dataset.

Is it possible to do that with convert2Yolo?
I tried to create a coco.names file containing only the classes that interest me, but it didn't work.
When I used convert2yolo, I got this error :

COCO parsing : 100.0% complete

Yolo Generating Result : False, msg : ERROR : 'hot dog' is not in list, moreInfo : <class 'ValueError'> Format.py

Hot dog is not a class that I wrote in coco.names.
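
The error message is what Python's `list.index` raises when a name is missing from the list. A minimal sketch of skipping unlisted classes instead of failing is shown below; the object layout (`name`, `box`) and the function name are assumptions for illustration, not the structures used in Format.py.

```python
def keep_wanted(objects, wanted_names):
    """Keep only objects whose class is listed, re-indexed by position in the list."""
    index = {name: i for i, name in enumerate(wanted_names)}
    kept = []
    for obj in objects:
        if obj["name"] in index:  # silently skip classes that are not wanted
            kept.append((index[obj["name"]], obj["box"]))
    return kept


objects = [
    {"name": "person", "box": (0.5, 0.5, 0.2, 0.4)},
    {"name": "hot dog", "box": (0.3, 0.6, 0.1, 0.1)},
]
print(keep_wanted(objects, ["person", "car"]))
```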

VOC for image with no objects is fatal

I don't think it should be a requirement that every image has 1 or more labelled objects.

Code works fine (and outputs a 0-length YOLO output) simply by changing Format.py from returning error condition to printing a warning:

                 if len(objects) == 0:
-                    return False, "number object zero"
+                    #return False, "number object zero"
+                    print("WARNING: no objects in file: {}".format(filename))

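Save only the desired classes

Hello.
Thank you for sharing this great code.
I ran it and it works well.

I do have one question.
I saved the images and annotations for the car, bus, and motorbike classes of the VOC dataset separately.
So that only car, bus, and motorbike would appear when converting the annotations to YOLO format, I put only those three classes ("car,bus,motorbike") in the voc.names file, but I got an error.
So, keeping the voc.names file as it is, how should I modify the code?

Thank you very much for your help.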

Solve issue 8

VOC Parsing error about 'utf-8'

I was trying to run your example.py, and gave all relevant arguments to it. This is what I get as an error:

VOC Parsing Result : False, msg : ERROR : 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte, moreInfo : <class 'UnicodeDecodeError'> Format.py 232

Do you have any idea why this could be?
Thanks.
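
A byte `0xff` at position 0 is typical of a file that is not UTF-8 text (JPEG images, for example, start with the bytes `FF D8`), so one possible cause is a non-XML file sitting in the label directory. That is a guess rather than a confirmed diagnosis. A small sketch of restricting the input to `.xml` files:

```python
def xml_files_only(filenames):
    """Return only the names ending in .xml (case-insensitive), sorted."""
    return sorted(name for name in filenames if name.lower().endswith(".xml"))


# Made-up directory listing mixing annotations with other files.
listing = ["2007_000027.xml", "2007_000027.jpg", "Thumbs.db", "2007_000032.XML"]
print(xml_files_only(listing))
```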

Why is the output split into separate txt files?

Hello,
I use the VOC to YOLO conversion and I have two classes.
The output for the same image is split into two txt files, one with all 0s and one with all 1s.
It cannot be used for YOLO training.
Thanks!

Example code outdated

Hi,

I tried to copy paste the example code from your readme but I found out (from the error messages and help) that you renamed these arguments

--image_path became --img_path
--image_type became --img_type
--clas_list_file became --cls_list_file

I prepared a pull request but Git tells me I don't have access to your repo

VOC Parsing Result : False, msg : ERROR : , moreInfo : <class 'StopIteration'> Format.py 224

When I run the example.py this is the error i get

A lot of path and file handling problems

There are a lot of path problems in your code. For example you should combine the paths with a method, not just with the + operator.

And the output path for the text files is wrong in the convert2Yolo.py. It should be:

result_outpath = str(label_dir + xml_name[:-3] + "txt")

Or for example you are not just reading .xml files.
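
The suggestion about joining paths could look like the sketch below, which uses `os.path.join` and `os.path.splitext` instead of string concatenation and slicing. The function name is hypothetical.

```python
import os


def label_output_path(label_dir, xml_name):
    """Build <label_dir>/<stem>.txt from an annotation file name."""
    stem, _ext = os.path.splitext(os.path.basename(xml_name))
    return os.path.join(label_dir, stem + ".txt")


print(label_output_path("out", "2012_004331.xml"))
print(label_output_path("out/", "some/dir/2012_004331.xml"))
```

Unlike `xml_name[:-3] + "txt"`, this does not depend on the extension being exactly three characters long, and it works whether or not the directory ends with a separator.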

img_type suggestion

I would add more flexibility to the images types (at the moment I need to choose one). Images could be .jpg, .png, .jpeg, etc. in the same dataset.
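
A sketch of what this could look like: given the stem of a label file, try several extensions and return the first image that exists. The function name and the extension list are assumptions for illustration.

```python
def find_image(stem, available, extensions=(".jpg", ".jpeg", ".png")):
    """Return the first file in `available` matching stem + extension, or None."""
    lookup = {name.lower(): name for name in available}
    for ext in extensions:
        hit = lookup.get((stem + ext).lower())
        if hit is not None:
            return hit
    return None


# Made-up directory listing with mixed image types.
files = ["000021.png", "000022.JPG", "000023.jpeg"]
print(find_image("000021", files), find_image("000022", files))
```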

ERROR : 'dining table' is not in list

After running the script i got this error msg

YOLO Generating Result : False, msg : ERROR : 'dining table' is not in list, moreInfo : <class 'ValueError'> Format.py 704

any solution ?

VOC Parsing Result : False, msg : ERROR : , moreInfo : <class 'StopIteration'> Format.py 224

Hello,

I get this error when trying to parse VOC. Can somebody help me with that? Thanks
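
One common way to get a bare `StopIteration` in Python is calling `next()` on `os.walk()` for a path that does not exist or is not a directory, because the generator then yields nothing. Whether line 224 of Format.py does exactly this is an assumption. The sketch below checks the path first so the failure is easier to read:

```python
import os


def list_files(path):
    """Return the file names directly inside `path`, failing loudly if it is not a directory."""
    if not os.path.isdir(path):
        raise NotADirectoryError("expected an existing directory, got: {}".format(path))
    _dirpath, _dirnames, filenames = next(os.walk(path))
    return sorted(filenames)


# A path that should not exist, to show the silent failure mode of os.walk().
missing = os.path.join(os.getcwd(), "no_such_dir_for_this_sketch")
try:
    next(os.walk(missing))
except StopIteration:
    print("os.walk() yielded nothing for", missing)
```

If this is the cause, checking that `--img_path` and `--label` point to existing directories (not files) is a reasonable first step.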

manifast/manipast typo

grep -iE "manifast|manipast" *py

Is this meant to be manifest?

.git file is too big

There are nearly 240MB inside .git/objects. Could you please clean them up? Otherwise we can't use git clone directly, since it downloads everything, including this giant file.

Thanks~

Problem with running code

I have this issue with following line of code.
python example.py --datasets VOC --img_path invoices-PascalVOC-export/JPEGImages --label invoices-PascalVOC-export/Annotations --convert_output_path yolo_converted --img_type ".jpg" --manipast_path yolo_converted --cls_list_file C:\Users\fkhalil\primeStone\deep_net\darknet\data\invoiceLabels\invoices-PascalVOC-export\classes.names

Error message:
VOC Parsing Result : False, msg : ERROR : , moreInfo : <class 'StopIteration'>  Format.py       224

Note: Classes contains the names of all the classes I used in the VoTT application to label the images.

Better error handling

Hello,
I am trying to convert over 500,000 files, and each little error stops the processing, so I then have to hunt for the missing item in my xml files. For example, it appears that one or more of my xml files were missing the tag, so instead of identifying the file(s), the script throws an error and quits.
In another very common error, the tag is missing in one or more files.
Another error type: the value in the classes file does not match the name value in the xml file. This is very common in case of a typo. For example, one of my files could have a misspelling, but the system stops processing.

Ideally, since it is very common to process tens of thousands of files, a better way of handling this would be to ( a ) print the name of the offending file on the screen ( b ) or better still, rename the offending file, by adding a suffix, so that one can go through all invalid files and fix them.

Hope you'll take my suggestion.

Right now, I'm processing 500K files and am on my 15th iteration, as it stops after every single error
Thanks
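
The behaviour requested here can be sketched as a loop that records the failing file and keeps going. This is only an illustration of the idea; `convert_all` and `convert_one` are hypothetical names, not functions of this repository.

```python
def convert_all(items, convert_one):
    """Convert every (name, payload) pair, collecting errors instead of stopping."""
    results, errors = {}, []
    for name, payload in items:
        try:
            results[name] = convert_one(payload)
        except Exception as exc:  # report the offending file and continue
            errors.append((name, "{}: {}".format(type(exc).__name__, exc)))
    return results, errors


classes = ["car", "bus"]
items = [("a.xml", "car"), ("b.xml", "cra"), ("c.xml", "bus")]  # "cra" is a typo
results, errors = convert_all(items, classes.index)
print(results)
for name, message in errors:
    print("WARNING: skipped {} ({})".format(name, message))
```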

Error in VOC Parsing

I'm converting this dataset (https://www.di.ens.fr/willow/research/headdetection/) from xml to txt to use it in YOLO but I'm getting this error.
VOC Parsing Result : False, msg : ERROR : 'NoneType' object has no attribute 'text', moreInfo : <class 'AttributeError'> Format.py 258

this is line 258 in Format.py
tmp = {
"name": _object.find("name").text
}

Do you know how to solve it?
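
`find("name")` returns `None` when an `<object>` element has no `<name>` child, and accessing `.text` on `None` raises this `AttributeError`. A minimal sketch of guarding against that with the standard `xml.etree.ElementTree` module (the sample XML is made up, and skipping such objects is one choice among several):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<annotation>
  <object><name>head</name></object>
  <object><bndbox><xmin>1</xmin></bndbox></object>
</annotation>"""


def object_names(xml_text):
    """Return the class names of all objects, skipping objects without a usable <name>."""
    root = ET.fromstring(xml_text)
    names = []
    for obj in root.iter("object"):
        node = obj.find("name")
        if node is None or not (node.text or "").strip():
            continue  # no <name> child, or an empty one
        names.append(node.text.strip())
    return names


print(object_names(SAMPLE))
```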

Why an extra 6 classes?

I saw that in a previous github issue, 6 classes had to be added to the 80 existing classes in the coco.names list (though they are only rewordings of objects already in the dataset):

motorcycle
potted plant
dining table
tv
couch
airplane

This causes some of the .txt annotation files to have classes numbered up to 86, which causes errors when I try to run known implementations of YOLO (like the AlexeyAB repository, which expects classes only up to 80). Can somebody please explain?
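
One way to avoid the extra ids, sketched below, is to map the COCO spellings onto the Darknet spellings before looking up the index, so the list can stay at 80 entries. The pairing in `ALIASES` reflects the rewordings mentioned above; the function name is hypothetical and this is not how the repository currently works.

```python
# COCO annotation name -> name used in the darknet coco.names file
ALIASES = {
    "motorcycle": "motorbike",
    "potted plant": "pottedplant",
    "dining table": "diningtable",
    "tv": "tvmonitor",
    "couch": "sofa",
    "airplane": "aeroplane",
}


def class_index(name, names):
    """Look up a class id, translating aliased names first."""
    return names.index(ALIASES.get(name, name))


# First five entries of coco.names, enough for a demonstration.
names = ["person", "bicycle", "car", "motorbike", "aeroplane"]
print(class_index("motorcycle", names), class_index("airplane", names))
```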

Running this code on python 2.7 gives a syntax error

Hello,
Thank you very much for the effort you put into this code. I tried to run it on Python 2.7 with the command: python example.py --datasets VOC --img_path ~/VOC/Images/ --label ~/VOC/Annotations/ --convert_output_path ~/VOC/YOLO/ --img_type ".jpg" --manipast_path ./ --cls_list_file ~/VOC/voc.names and got the following syntax error:
Traceback (most recent call last):
File "example.py", line 11, in <module>
from Format import VOC, COCO, UDACITY, KITTI, YOLO
File "/home/abanoub/Desktop/convert2Yolo-master/Format.py", line 71
print('\r%s|%s| %s%% (%s/%s) %s' % (prefix, bar, percent, iteration, total, suffix), end = '\r')
^
SyntaxError: invalid syntax

Do you know what would be a solution to this problem?
Cheers!

@raggot

@raggot
How did you fix this problem:
python3 example.py --datasets KITTI --img_path /KITTI/images_new --label /KITTI/labels --convert_output_path ~/kitti --img_type ".jpg" --cls_list_file KITTI/my-classes.names
KITTI Parsing Result : False, msg : ERROR : , moreInfo : <class 'StopIteration'> Format.py 484

Please share. Thank You

Originally posted by @lionverve2015 in #6 (comment)

ERROR : 'utf-8' codec can't decode byte 0xb0 in position 37

Hi,

I have the following issue, do you know where the issue is?
VOC Parsing Result : False, msg : ERROR : 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte, moreInfo : <class 'UnicodeDecodeError'> Format.py 233

I am using the VOTT Tool to label my pictures. With the VOTT I am exporting the labels to Pascal VOC in order to convert it to Yolo.

Can you help please?

Thank you very much!
Khani
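
Byte `0xb0` is, for instance, the degree sign in Windows-1252 and Latin-1, so one possibility is that the exported XML files are not UTF-8 encoded. This is a guess about the cause. The sketch below decodes raw bytes with a fallback encoding; the function name and the choice of `cp1252` as the fallback are assumptions.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Try each encoding in turn; as a last resort, replace undecodable bytes."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


# Bytes that are valid Windows-1252 but not valid UTF-8.
raw = b"<name>25\xb0C sensor</name>"
print(decode_with_fallback(raw))
```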

More specific example

Hello @ssaru ,

I'm trying to execute the script like this:
python example.py --datasets KITTI --img_path ./images/000021.jpg --label ./images/000021.txt --convert_output_path test.txt --img_type ".jpg" --manipast_path manipast.txt --cls_list_file names.txt

But it doesn't work. I don't understand what is manipast_path.
Would it be possible to add a more specific example where all the files are present in the git repository?

Thank you.

A slight change needed in the convert2Yolo.py script

If for any reason an XML file would not have "object" entity, the script chokes at

I'd suggest closing the XML file before removing it from the "xml_path".

Thanks.
