
coco-caption's People

Contributors

elliottd, endernewton, hao-fang, iaalm, peteanderson80, ramakrishnavedantam928, tylin, vrama91


coco-caption's Issues

Tokenizer: `OSError: [Errno 2] No such file or directory`

As in #22 (comment), I got

OSError: [Errno 2] No such file or directory: 'java'

when running the evaluation, even though Java was definitely installed. This Stack Overflow answer discusses it: https://stackoverflow.com/a/55675914/2332296. I was able to solve it by setting shell=True and changing cmd in

p_tokenizer = subprocess.Popen(cmd, cwd=path_to_jar_dirname, stdout=subprocess.PIPE, shell=True)

from

cmd = ['java', '-cp', 'stanford-corenlp-3.4.1.jar', 'edu.stanford.nlp.process.PTBTokenizer', '-preserveLines', '-lowerCase', 'tmpWS5p0Z']

(where 'tmpWS5p0Z' is the name of the tempfile.NamedTemporaryFile that is created), into:

cmd = ['/abs/path/to/java -cp /abs/path/to/stanford-corenlp-3.4.1.jar edu.stanford.nlp.process.PTBTokenizer -preserveLines -lowerCase /abs/path/to/temporary_file']
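For reference, a hedged sketch of the full workaround inside pycocoevalcap/tokenizer/ptbtokenizer.py (the absolute paths are placeholders; path_to_jar_dirname and tmp_file are the names used by the current code):

import subprocess

# Build one shell command string with absolute paths and let the shell
# resolve the java binary itself via shell=True.
cmd = ['/abs/path/to/java -cp /abs/path/to/stanford-corenlp-3.4.1.jar '
       'edu.stanford.nlp.process.PTBTokenizer -preserveLines -lowerCase '
       + tmp_file.name]

p_tokenizer = subprocess.Popen(cmd, cwd=path_to_jar_dirname,
                               stdout=subprocess.PIPE, shell=True)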

Add SPICE

Hi,

Thank you for this code base. The SPICE metric seems to be getting quite a bit of use, including on the coco leaderboard. Rather than maintaining a separate fork that includes the SPICE metric, can I create a pull request to include it in this repo?

Thanks

CIDEr range

What's the CIDEr score range? Is it not from 0 to 1?

METEOR comms error?

Hi,

you read the score back from METEOR jar on this line:
https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/meteor/meteor.py#L68

but I believe METEOR wants to give the score twice (and it should be read twice), because it first gives back the score for all calls of SCORE, and then it gives the average score over all calls to SCORE. This function doesn't seem to be used right now in the implementation, but if anyone wanted to use it just to evaluate a single sentence against a few references (e.g. me :) ), then the call

score = float(self.meteor_p.stdout.readline().strip())

should be repeated once right after the first one (both reads give the same result). In any case, just a quick note; this code does not seem to be used at the moment.
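A hedged sketch of that change inside meteor.py's single-sentence scoring method (variable names are assumed from the linked code):

self.meteor_p.stdin.write('{}\n'.format(eval_line))
self.meteor_p.stdin.flush()
# first line: the score for this EVAL call
score = float(self.meteor_p.stdout.readline().strip())
# second line: the aggregate score over all EVAL calls so far; for a single
# sentence both reads return the same value, but the second read keeps the
# stdout stream in sync with the jar
score = float(self.meteor_p.stdout.readline().strip())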

Add support for all Meteor languages

The evaluation toolkit only supports Meteor evaluation in English, but image description is now being studied in non-English languages such as German (in at least two studies), Chinese, Japanese, and Turkish.

Would it be possible to extend Meteor support for other languages? I imagine this would involve two changes:

  • Replace the bundled version of Meteor with a bash script download of the entire package
  • Add a --language option to the evaluation script (default=en); a sketch of this change is below
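A hedged sketch of the second change, threading a language code through to the jar's existing -l flag (the language parameter and its default are assumptions, and non-English scoring would additionally require the full Meteor package from the first change):

import os
import subprocess
import threading

METEOR_JAR = 'meteor-1.5.jar'

class Meteor:
    # Hypothetical constructor: expose the jar's '-l' flag so the toolkit can
    # score captions in languages other than English.
    def __init__(self, language='en'):
        self.meteor_cmd = ['java', '-jar', '-Xmx2G', METEOR_JAR,
                           '-', '-', '-stdio', '-l', language, '-norm']
        self.meteor_p = subprocess.Popen(self.meteor_cmd,
                                         cwd=os.path.dirname(os.path.abspath(__file__)),
                                         stdin=subprocess.PIPE,
                                         stdout=subprocess.PIPE,
                                         stderr=subprocess.PIPE)
        # Used to guarantee thread safety
        self.lock = threading.Lock()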

would the system work if refs has only one caption

I notice that coco-caption/pycocoevalcap/bleu/bleu.py has the line 'assert(len(ref)) > 1'; in my case I have only one ref per image. Would the evaluation pipeline work if I simply commented out this line? If not, could you point out the necessary modifications to make the code compatible with only one ref?

Thanks.

How to contribute

Hi,
Thank you very much for the evaluation script. For my applications, I needed a Python 3.5 version of this script, so I converted the code to be compatible with Python 3.5 and was wondering whether I could create a pull request to include it either in this repo or somewhere else.

Thanks again

Can I use coco-caption for Flickr dataset?

I am comparing different architectures for image caption generation for my thesis, using the Flickr8K and Flickr30K datasets. Now I need to evaluate the scores. Can I use coco-caption for the Flickr datasets? How can I use it? Are there any instructions?

Meteor metric crash

Hi,

I'm trying to use the METEOR metric but I'm facing some issues: The pycocoevalcap/meteor/meteor.py crashes at line 42, because self.meteor_p.stdout.readline() returns an empty string:

ValueError: could not convert string to float:

It seems that the meteor_p subprocess doesn't receive the eval_line at line 40. I've seen issue #1, which had a similar problem with the Meteor script, but I don't know how it was solved or whether it was due to different reasons than mine.

I have the same problem both when running cocoEvalCapDemo.ipynb and when running the metrics on my own data. I use the meteor-1.5.jar that is included in your package. If I use it directly from the command line (without the Python wrapper), it works. The rest of the metrics work properly.

Do you have any ideas/suggestions about this error?

Thank you very much

Python-3.x support

Hi all,
I'd like to know whether you have plans to port the codebase to Python 3. Since most people have switched to Python 3, it would be nice to have Python 3 support so that other projects that depend on coco-caption (e.g. ImageCaptioning PyTorch) can also be implemented in Python 3.

Thanks!

Wrong image id to score associations

Hi,

I noticed that COCOEvalCap zips the image ids and scores together so that individual scores can be accessed:
https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/eval.py#L66

I think this is done incorrectly. The scores are ordered according to the output of multiple independent calls to dict.keys() in each scorer, e.g.
https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/cider/cider.py#L33

However I think dict.keys() returns keys in an arbitrary order.

In my experience using the code, the associations in imgToEval were wrong, which caused me some confusion. An easy fix would be to replace each call to dict.keys() with sorted(dict.keys()).
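A minimal sketch of that fix, using cider.py as the example scorer (the eval.py variable names scs and m are assumptions based on the linked lines; treat this as illustrative rather than a tested patch):

# pycocoevalcap/cider/cider.py, compute_score: iterate a deterministic order
imgIds = sorted(gts.keys())          # was: imgIds = gts.keys()

# pycocoevalcap/eval.py: zip the per-image scores back to ids using the
# same deterministic ordering when assigning them
self.setImgToEvalImgs(scs, sorted(gts.keys()), m)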

Refer to: CIDEr is 0 if eval on one sample

Hi, in my previous issue #9, the provided solution does not seem to work.

The code provided here still won't work with one image (test/ref pair).

It seems to me that, in evalscripts.py, the df_mode parameter is not doing anything. The 'coco-val-df' setting in params.json is not used, and there is no code that loads 'coco-val-df.p'.

Thanks!

CIDEr bigger than 1

Is it possible for CIDEr to go above 1? I got values like 1.06, 1.29, 1.31, 1.35, while BLEU, METEOR and ROUGE are all below 1.

METEOR eats up memory?

I've found that yaoli posted about a problem with METEOR here. It says that the METEOR script fails to kill the sub-process and may eat up memory. Is that truly an issue?
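For anyone hitting this, a hedged sketch of an explicit cleanup method a wrapper could add (this method is not in the current meteor.py; self.meteor_p and self.lock are the attribute names used there):

# Hypothetical cleanup for the Meteor wrapper: terminate the jar's subprocess
# explicitly instead of relying on garbage collection.
def close(self):
    with self.lock:
        if self.meteor_p is not None:
            self.meteor_p.stdin.close()
            self.meteor_p.kill()
            self.meteor_p.wait()
            self.meteor_p = None

def __del__(self):
    self.close()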

The source of meteor-1.5.jar

Hello,

I had been using the same JAR file in one of my projects for a couple of years. Today I discovered that this JAR file's -stdio mode behaves quite differently from the ones below:

It took me my whole day to understand what is going on, and the same difference seems to be the cause of many commented/uncommented code sections in several Python wrappers.

So do you remember whether you modified the source code of the original METEOR before building the JAR that you are shipping?
EDIT: @endernewton apparently the code was modified, according to your comment. Would it be possible to share the modification then?

Let me try to clarify the behavior difference as well:

In -stdio mode, we provide the METEOR binary with a list of lines in the following format:

SCORE ||| ref words ||| hyp words

and for each segment we obtain a set of stats in return. For this part, all binaries produce exactly the same output.

According to the official documentation, we now have to provide the so-called EVAL lines. To be honest, the official documentation is ambiguous about this aspect, i.e. it is not clear whether an EVAL line per segment (Method1) or a single EVAL line for all segments (Method2) should be provided.

Your Python wrapper does the latter (Method2): EVAL ||| stats_1 ||| stats_2 ||| ... ||| stats_N\n
Other wrappers do the former (Method1):

EVAL ||| stats returned for segment 1
EVAL ||| stats returned for segment 2
...
EVAL ||| stats returned for segment N

Your modified version first produces the segment scores and then a final score, which seems to be the actual METEOR score. This score is not equal to the mean of the segment scores because of the fragmentation penalty. In this case, that final precious line of your modified JAR matches the score you get if you run METEOR in non-stdio mode. Good.

With the original code, however, Method2 does not even work: it produces only the score for the first segment and then stops, since it cannot parse the rest of the line, probably waiting for \n. Method1 works, but it does not produce that final score line that you presumably added to your branch. So all the other wrappers around GitHub that use the original code take the mean of the segment scores, and that score is not penalized, i.e. not comparable to the cocoeval tools. Unpenalized METEOR is ~1 point better than the actual METEOR score in one of my German test cases.
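To make the difference concrete, here is a hedged sketch of the Method2 exchange as this repo's wrapper performs it (meteor_p is assumed to be the already-started -stdio subprocess and stats the list of stat strings returned by the SCORE step; the final readline only succeeds with the modified JAR):

# Method2 (this repo's wrapper, requires the modified meteor-1.5.jar):
# one EVAL line carrying the stats of all N segments.
eval_line = 'EVAL ||| ' + ' ||| '.join(stats)
meteor_p.stdin.write('{}\n'.format(eval_line))
meteor_p.stdin.flush()
segment_scores = [float(meteor_p.stdout.readline().strip()) for _ in stats]
final_score = float(meteor_p.stdout.readline().strip())  # penalized corpus score

# Method1 (original JAR): one EVAL line per segment; no final corpus line is
# printed, so wrappers built on it fall back to the unpenalized mean of the
# segment scores.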

In short, I think you need to clarify these points in README.md and also, if possible, provide the sources of your modified JAR. I never imagined that you would be shipping a modified JAR.

UPDATE: You actually do mention this briefly in the README; I did not check it until I discovered the issue. But since the modification changes the stdio API, I think it deserves a bit more explanation.

Thank you!

How to deal with IOError: [Errno 32] Broken pipe

I downloaded coco-caption two days ago. Unfortunately, I ran into trouble when running metrics.py. Computing BLEU works fine, but when computing METEOR I get the following error:
computing METEOR score...
Traceback (most recent call last):
File "metrics.py", line 203, in
test_cocoeval()
File "metrics.py", line 199, in test_cocoeval
valid_score, test_score = score_with_cocoeval(samples_valid, samples_test, engine)
File "metrics.py", line 92, in score_with_cocoeval
valid_score = scorer.score(gts_valid, samples_valid, engine.valid_ids)
File "/home/tuyunbin/vedio caption/arctic-capgen-vid-master/cocoeval.py", line 42, in score
score, scores = scorer.compute_score(gts, res)
File "/home/tuyunbin/vedio caption/arctic-capgen-vid-master/pycocoevalcap/meteor/meteor.py", line 37, in compute_score
stat = self._stat(res[i][0], gts[i])
File "/home/tuyunbin/vedio caption/arctic-capgen-vid-master/pycocoevalcap/meteor/meteor.py", line 55, in _stat
self.meteor_p.stdin.write('{}\n'.format(score_line))
IOError: [Errno 32] Broken pipe
Finally, I followed Li Yao's README advice and added self.meteor_p.kill() at line 45 of meteor.py, but the result is still the same, and now I am very anxious.

Steps

Hi there, I'm new to this. I'm trying to run an example using COCO, and I found this project on the internet. Can you send me the steps to run it? I'm confused. I have Python with the COCO files.

KeyError 'info' when loading resFile

I have seen a similar error ('type') when loading the annotation file, but the workaround does not solve this one. Maybe this is related to the typo in coco.py's info()?

CIDEr

Hi,

Is the CIDEr metric calculated by this package CIDEr or CIDEr-D?

Thanks!

PythonAPI : unable to load annotations into a text file

I am unable to create an array and load the annotations into a text file for later use in my program.

import os  # note: os must be imported for the path checks below
from PythonAPI.pycocotools import coco

if not os.path.exists('./dataset/COCO/KeyPointFiles'):
    os.makedirs('./dataset/COCO/KeyPointFiles')

annTypes = ('instances', 'captions', 'person_keypoints')
annotations_type = annTypes[2]

for mode in range(0, 1):
    if mode == 1:
        datatype = 'val2014'
        ann_file_ = './dataset/COCO/annotations/%s_%s.json' % (annotations_type, datatype)
    else:
        datatype = 'train2014'
        ann_file_ = './dataset/COCO/annotations/%s_%s.json' % (annotations_type, datatype)

    my_coco = coco.COCO(ann_file_)
    my_ann = my_coco.anns
    rev_id = -1
    prev_count = 1
    cnt = 0

    for i in range(1, len(my_ann)):  # TODO: check len
        current_id = my_ann[i]['image_id']

Here, for current_id, I am getting a KeyError. The exact error is:

loading annotations into memory...
Done (t=4.58s)
creating index...
index created!
Traceback (most recent call last):
  File "get_annotations.py", line 38, in <module>
    current_id = my_ann[i]['image_id']
KeyError: 1

Can someone please help me? I have no idea why I am facing this error and have been stuck on it for hours.

COCOEvalCap.evaluate() results collection and assignment

Adding the line

print gts.keys() == imgIds

to COCOEvalCap.evaluate() prints False.

In score, scores = scorer.compute_score(gts, res), the scores are collected in the order of gts.keys(), while in COCOEvalCap.setImgToEvalImgs() they are assigned to imgIds in order. Doesn't this mean the results are assigned to the wrong images?
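A hedged sketch of how the two stages could be kept consistent, assuming each scorer is also changed to iterate sorted(gts.keys()) as suggested in the issue "Wrong image id to score associations" above (variable names are assumptions based on eval.py):

score, scores = scorer.compute_score(gts, res)
# 'scores' would then follow sorted(gts.keys()), so assign with the same
# deterministic ordering instead of a fresh, arbitrary dict.keys() call
self.setImgToEvalImgs(scores, sorted(gts.keys()), method)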

KeyError: 'type' when loading annFile

Hi,

I am trying to use cocoEvalCapDemo.ipynb, and running coco = COCO(annFile) throws the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-7f514b15e885> in <module>()
      1 # create coco object and cocoRes object
----> 2 coco = COCO(annFile)
      3 cocoRes = coco.loadRes(resFile)

/Users/lakshay/Documents/NYU/Studies/Fall17/cv/project/coco-caption-master/pycocotools/coco.pyc in __init__(self, annotation_file)
     74             print datetime.datetime.utcnow() - time_t
     75             self.dataset = dataset
---> 76             self.createIndex()
     77 
     78     def createIndex(self):

/Users/lakshay/Documents/NYU/Studies/Fall17/cv/project/coco-caption-master/pycocotools/coco.pyc in createIndex(self)
     91         cats = []
     92         catToImgs = []
---> 93         if self.dataset['type'] == 'instances':
     94             cats = {cat['id']: [] for cat in self.dataset['categories']}
     95             for cat in self.dataset['categories']:

KeyError: 'type'

I have no clue why this happens. How can I fix it?

Semgrex Class Not Found Exception

Even after installing Stanford CoreNLP 3.9.2, copying the core and models JAR files into lib, and adding all the CoreNLP JARs to the CLASSPATH, the following issue occurs:

Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/semgraph/semgrex/SemgrexPattern
at edu.anu.spice.SpiceParser.<init>(SpiceParser.java:64)
at edu.anu.spice.SpiceScorer.scoreBatch(SpiceScorer.java:70)
at edu.anu.spice.SpiceScorer.main(SpiceScorer.java:60)
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.semgraph.semgrex.SemgrexPattern
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 3 more
Traceback (most recent call last):
File "test_results.py", line 50, in
metrics_dict = nlgeval.compute_metrics(references, hypotheses)
File "/home/shaunak/ProjectSSSP/FULL_S/caption_generator_resnet/nlg_eval_master/nlgeval/init.py", line 299, in compute_metrics
score, scores = scorer.compute_score(refs, hyps)
File "/home/shaunak/ProjectSSSP/FULL_S/caption_generator_resnet/nlg_eval_master/nlgeval/pycocoevalcap/spice/spice.py", line 70, in compute_score
cwd=os.path.dirname(os.path.abspath(__file__)))
File "/home/shaunak/anaconda2/lib/python2.7/subprocess.py", line 186, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/home/shaunak/ProjectSSSP/FULL_S/caption_generator_resnet/nlg_eval_master/nlgeval/pycocoevalcap/spice/tmp/tmpVBAF4B', '-cache', '/home/shaunak/ProjectSSSP/FULL_S/caption_generator_resnet/nlg_eval_master/nlgeval/pycocoevalcap/spice/cache', '-out', '/home/shaunak/ProjectSSSP/FULL_S/caption_generator_resnet/nlg_eval_master/nlgeval/pycocoevalcap/spice/tmp/tmpp_GQD4', '-subset', '-silent']' returned non-zero exit status 1
python2 -W ignore test_results.py 26.40s user 0.85s system 193% cpu 14.052 total

I tried this with Java 8 and Java 9.

Little typo in coco.py info()

At line 115 of coco.py, in the function info():

for key, value in self.datset['info'].items():
    print '%s: %s'%(key, value)

self.datset is not defined.
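The fix is presumably just the attribute name (datset -> dataset):

# presumed fix: 'datset' -> 'dataset'
for key, value in self.dataset['info'].items():
    print '%s: %s'%(key, value)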

CIDEr is 0 if eval on one sample

If you evaluate on only one sample, say a dummy test sample extracted from "captions_val2014.json", calling the eval code will always give a CIDEr score of 0. If given more than one sample, CIDEr works normally.

The other metrics are valid given one sample; only CIDEr is affected.

Attached test outputs:
loading annotations into memory...
0:00:00.760903
creating index...
index created!

[{u'image_id': 203564, u'id': 37, u'caption': u'A bicycle replica with a clock as the front wheel.'}]

using 1/1 predictions
Loading and preparing results...
DONE (t=0.03s)
creating index...
index created!
tokenization...
PTBTokenizer tokenized 60 tokens at 1344.11 tokens per second.
PTBTokenizer tokenized 11 tokens at 281.16 tokens per second.
setting up scorers...
computing Bleu score...
{'reflen': 10, 'guess': [10, 9, 8, 7], 'testlen': 10, 'correct': [10, 9, 8, 7]}
ratio: 0.9999999999
Bleu_1: 1.000
Bleu_2: 1.000
Bleu_3: 1.000
Bleu_4: 1.000
computing METEOR score...
METEOR: 1.000
computing Rouge score...
ROUGE_L: 1.000
computing CIDEr score...
CIDEr: 0.000

subprocess.CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', ...] returned non-zero exit status 1

I downloaded coco-caption from https://github.com/ruotianluo/coco-caption/pulls
Environment: ubuntu18.04, openjdk version "1.8.0_312", stanford-corenlp-3.4.1, python3.6

The other scores are OK, but an error occurred when calculating the SPICE score.
What should I do? Thank you!

spice.py
69 # Start job
78 subprocess.check_call(spice_cmd, cwd=os.path.dirname(os.path.abspath(__file__)))

subprocess.CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar'...] returned non-zero exit status 1

Error while using coco-caption to evaluate

Hi,
I tried to use coco-caption to evaluate my neuraltalk2 results, but this error occurred:

Loading and preparing results...
Traceback (most recent call last):
  File "myeval.py", line 29, in <module>
    cocoRes = coco.loadRes(resFile)
  File "/home/mozhdeh/Documents/neuraltalk2-master/coco-caption/pycocotools/coco.py", line 318, in loadRes
    if 'caption' in anns[0]:
IndexError: list index out of range
/home/mozhdeh/torch/install/bin/luajit: ./misc/utils.lua:17: attempt to index local 'file' (a nil value)
stack traceback:
        ./misc/utils.lua:17: in function 'read_json'
        ./misc/net_utils.lua:202: in function 'language_eval'
        eval.lua:167: in function 'eval_split'
        eval.lua:173: in main chunk
        [C]: in function 'dofile'
        ...hdeh/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x55cbcb3e8570

Any solution?

can not get the result showed in demo

Hi, I am running the demo script "cocoEvalCapDemo.py", but the result I got is as follows:

loading annotations into memory...
0:00:00.654000
creating index...
index created!
Loading and preparing results...
DONE (t=0.03s)
creating index...
index created!
tokenization...
PTBTokenizer tokenized 61268 tokens at 525144.89 tokens per second.
PTBTokenizer tokenized 10892 tokens at 160108.14 tokens per second.
setting up scorers...
computing Bleu score...
{'reflen': 9855, 'guess': [9893, 8893, 7893, 6893], 'testlen': 9893, 'correct':
[5732, 2510, 1043, 423]}
ratio: 1.00385591071
Bleu_1: 0.579
Bleu_2: 0.404
Bleu_3: 0.279
Bleu_4: 0.191
computing METEOR score...
METEOR: 0.195
computing Rouge score...
ROUGE_L: 0.396
computing CIDEr score...
CIDEr: 0.600
CIDEr: 0.600
Bleu_4: 0.191
Bleu_3: 0.279
Bleu_2: 0.404
Bleu_1: 0.579
ROUGE_L: 0.396
METEOR: 0.195
.....

The values of BLEU and the other metrics are different from the values shown in the notebook "cocoEvalCapDemo.ipynb". I am not sure what the problem is. Is the ipynb file out of date, or is it something else?

It might be a problem of java.

It might be a problem of java.
Try this:

  1. Install java on your linux system, if it wasn't installed.
  2. In the initialization function of meteor.py:
def __init__(self):
    self.meteor_cmd = ['java', '-jar', '-Xmx2G', METEOR_JAR, '-', '-', '-stdio', '-l', 'en', '-norm']
    # change this line: join the command into a string and pass shell=True
    self.meteor_p = subprocess.Popen(' '.join(self.meteor_cmd),
                                     cwd=os.path.dirname(os.path.abspath(__file__)),
                                     stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE,
                                     stderr=subprocess.PIPE,
                                     shell=True)
    # Used to guarantee thread safety
    self.lock = threading.Lock()

Originally posted by @brucejing-github in #17 (comment)

question about Bleu script

Would the current BLEU pipeline work if some picture has more (or fewer) than 5 captions?

For example, in eval.py
scorer.compute_score(gts, res), where gts is {key: caps} and caps is a list of variable length, depending on how many ground-truth captions a picture gets.

Also, why is the Bleu between 0 and 1?

Inconsistent SPICE scores

In pycocoevalcap/spice/spice.py, line 29 sorts img_ids before the SPICE metrics are generated. The resulting metrics are then extracted and mapped to the corresponding image ids in pycocoevalcap/eval.py by the setImgToEvalImgs() function; however, that function uses the unsorted img_ids list. This mismatch between the sorted and unsorted lists causes the per-image SPICE metrics to be mapped to the wrong image ids. It does not affect the overall SPICE metric; it is only a misalignment between the computed per-image SPICE metrics and their image ids. For consistency, imgIds should not be sorted in pycocoevalcap/spice/spice.py.
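A minimal sketch of the suggested change in pycocoevalcap/spice/spice.py, assuming the compute_score structure described above (illustrative, not a tested patch):

# Keep the image ids in the same order that eval.py's setImgToEvalImgs()
# will later use when assigning per-image results, i.e. do not sort here.
imgIds = list(gts.keys())   # was: imgIds = sorted(gts.keys())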

OSError: [Errno 2] No such file or directory

When I run cocoEvalCapDemo.ipynb, it throws an exception:

setting up scorers...
Traceback (most recent call last):
  File "cocoEvalDemo.py", line 31, in <module>
    cocoEval.evaluate()
  File "/home/fuleying/coco-caption/pycocoevalcap/eval.py", line 31, in evaluate
    gts  = tokenizer.tokenize(gts)
  File "/home/fuleying/coco-caption/pycocoevalcap/tokenizer/ptbtokenizer.py", line 52, in tokenize
    stdout=subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

question about the bleu and meteor

def compute_score(self, gts, res):

    assert(gts.keys() == res.keys())
    imgIds = gts.keys()

    bleu_scorer = BleuScorer(n=self._n)
    for id in imgIds:
        hypo = res[id]
        ref = gts[id]

        # Sanity check.
        assert(type(hypo) is list)
        assert(len(hypo) == 1)
        assert(type(ref) is list)
        assert(len(ref) >= 1)

        bleu_scorer += (hypo[0], ref)

    #score, scores = bleu_scorer.compute_score(option='shortest')
    score, scores = bleu_scorer.compute_score(option='closest', verbose=1)
    # score, scores = bleu_scorer.compute_score(option='average', verbose=1)

    # return (bleu, bleu_info)
    return score, scores

I see that the function returns two values, score and scores, and I found that mean(scores) is not equal to score. I want to know what these two return values mean and under what circumstances mean(scores) == score. The same question applies to CIDEr.

def compute_score(self, gts, res):
    assert(gts.keys() == res.keys())
    imgIds = gts.keys()
    scores = []

    eval_line = 'EVAL'
    self.lock.acquire()
    for i in imgIds:
        assert(len(res[i]) == 1)
        stat = self._stat(res[i][0], gts[i])
        eval_line += ' ||| {}'.format(stat)

    self.meteor_p.stdin.write('{}\n'.format(eval_line).encode())
    self.meteor_p.stdin.flush()
    for i in range(0,len(imgIds)):
        scores.append(float(self.meteor_p.stdout.readline().strip()))
    score = float(self.meteor_p.stdout.readline().strip())
    self.lock.release()

    return score, scores

and below are my results:

{'testlen': 14006, 'reflen': 14927, 'guess': [14006, 12389, 10773, 9166], 'correct': [2367, 22, 1, 0]}
ratio: 0.9382997253298762
Bleu_1:  0.1582435446030457
Bleu_2:  0.016220982225013343
Bleu_3:  0.0028384843308123897
Bleu_4:  2.198519789887133e-07
METEOR:  0.04443493208767419
ROUGE_L: 0.16704389834453118
CIDEr:   0.028038780435183798
{'testlen': 14006, 'reflen': 14927, 'guess': [14006, 12389, 10773, 9166], 'correct': [2367, 22, 1, 0]}
ratio: 0.9382997253298762
     val_Bleu_1    val_Bleu_2    val_Bleu_3    val_Bleu_4  val_METEOR  \
0  1.312883e-01  2.181574e-03  1.214780e-04  1.884038e-08    0.046652

   val_ROUGE_L  val_CIDEr
0     0.167044   0.028039

We find that only the CIDEr and ROUGE_L values are equal.
I hope to get your help, thanks.

self.dataset has no key 'categories'

Has this code not been updated in quite some time? Also, can you suggest a good alternative with Python >= 3.7 support?

I tried to run the provided IPython notebook, but while loading I get a KeyError. The annotation JSON file does not have the following keys: (a) type, (b) categories.

Please let me know. Thanks !!

documentation

@tylin hello!
I want to use your toolkit to evaluate my output. Can you provide any documentation or steps on how to calculate the BLEU score?

Thanks in advance!!!!

ValueError: invalid literal for float()

When I use the coco-caption evaluation code, the BLEU 1-4 scores are normal, but the script always reports errors for METEOR. How can I deal with this?
valid_score = scorer.score(gts_valid, samples_valid, dataset.valid_ids)
File /cocoeval.py", line 72, in score
print 'computing %s score...'%(scorer.method())
File "/pycocoevalcap/meteor/meteor.py", line 42, in compute_score
#print('{}\n'.format(eval_line))
ValueError: invalid literal for float(): 28.0 16.0 0.0 0.0 9.0 9.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 7.0 9.0 9.0

OSError: [Errno 13] Permission denied When I try to run cocoEvalCapDemo.ipynb

When I try to run cocoEvalCapDemo.ipynb, an error occurs in block 4.
The error is as follows:

tokenization...


OSError Traceback (most recent call last)
in ()
9 # evaluate results
10 # SPICE will take a few minutes the first time, but speeds up due to caching
---> 11 cocoEval.evaluate()

/home/lz/DOWNLOAD/coco-caption-master/pycocoevalcap/eval.py in evaluate(self)
30 print 'tokenization...'
31 tokenizer = PTBTokenizer()
---> 32 gts = tokenizer.tokenize(gts)
33 res = tokenizer.tokenize(res)
34

/home/lz/DOWNLOAD/coco-caption-master/pycocoevalcap/tokenizer/ptbtokenizer.py in tokenize(self, captions_for_image)
41 # ======================================================
42 path_to_jar_dirname=os.path.dirname(os.path.abspath(__file__))
---> 43 tmp_file = tempfile.NamedTemporaryFile(delete=False, dir=path_to_jar_dirname)
44 tmp_file.write(sentences)
45 tmp_file.close()

/home/lz/anaconda3/envs/coco-caption/lib/python2.7/tempfile.pyc in NamedTemporaryFile(mode, bufsize, suffix, prefix, dir, delete)
473 flags |= _os.O_TEMPORARY
474
--> 475 (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
476 try:
477 file = _os.fdopen(fd, mode, bufsize)

/home/lz/anaconda3/envs/coco-caption/lib/python2.7/tempfile.pyc in _mkstemp_inner(dir, pre, suf, flags)
242 file = _os.path.join(dir, pre + name + suf)
243 try:
--> 244 fd = _os.open(file, flags, 0600)
245 _set_cloexec(fd)
246 return (fd, _os.path.abspath(file))

OSError: [Errno 13] Permission denied: '/home/lz/DOWNLOAD/coco-caption-master/pycocoevalcap/tokenizer/tmprn5mrh'

I need some help! Thanks in advance!
