audio-captioning / dcase-2020-baseline
Audio captioning baseline system for DCASE 2020 challenge.
Home Page: http://dcase.community/challenge2020/task-automatic-audio-captioning
License: Other
Hi! Thank you for this great repository. I have a quick question regarding the pretrained weights for the baseline system (https://zenodo.org/record/3697687#.YKJp7-spCuo). I do not see this mentioned explicitly (sorry if I missed it), but were these weights obtained by training on the training split of Clotho v1 or v2? Are the results here reported for v1?
Thanks so much!
Greta
Hi
I tried to solve this myself, but without success.
OS:
Ubuntu 18.04 LTS
java -version
output:
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
Error message:
computing SPICE score...
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
... 7 more
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/main.py in <module>
56
57 if __name__ == '__main__':
---> 58 main()
59
60 # EOF
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/main.py in main()
52 if settings['workflow']['dnn_training'] or \
53 settings['workflow']['dnn_evaluation']:
---> 54 method.method(settings)
55
56
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/processes/method.py in method(settings)
451 settings_data=settings['dnn_training_settings']['data'],
452 settings_io=settings['dirs_and_files'],
--> 453 indices_list=indices_list)
454 logger_inner.info('Evaluation done')
455
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/processes/method.py in _do_evaluation(model, settings_data, settings_io, indices_list)
177 logger_main.info('Evaluation done')
178
--> 179 metrics = evaluate_metrics(captions_pred, captions_gt)
180
181 for metric, values in metrics.items():
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/eval_metrics.py in evaluate_metrics(prediction_file, reference_file, nb_reference_captions)
287 ground_truths.append([reference_dict[file_name][cap] for cap in cap_names])
288
--> 289 metrics, per_file_metrics = evaluate_metrics_from_lists(predictions, ground_truths)
290
291 total_metrics = combine_single_and_per_file_metrics(
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/eval_metrics.py in evaluate_metrics_from_lists(predictions, ground_truths, ids)
159 write_json(pred, pred_file)
160
--> 161 metrics, per_file_metrics = evaluate_metrics_from_files(pred_file, ref_file)
162
163 # Delete temporary files
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/eval_metrics.py in evaluate_metrics_from_files(pred_file, ref_file)
108 cocoEval = COCOEvalCap(coco, cocoRes)
109 cocoEval.params['audio_id'] = cocoRes.getAudioIds()
--> 110 cocoEval.evaluate()
111
112 # Make dict from metrics
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/eval.py in evaluate(self, verbose)
60 if verbose:
61 print('computing %s score...'%(scorer.method()))
---> 62 score, scores = scorer.compute_score(gts, res)
63 if type(method) == list:
64 for sc, scs, m in zip(score, scores, method):
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/spice/spice.py in compute_score(self, gts, res)
73 ]
74 subprocess.check_call(spice_cmd,
---> 75 cwd=os.path.dirname(os.path.abspath(__file__)))
76
77 # Read and process results
~/anaconda3/lib/python3.7/subprocess.py in check_call(*popenargs, **kwargs)
361 if cmd is None:
362 cmd = popenargs[0]
--> 363 raise CalledProcessError(retcode, cmd)
364 return 0
365
CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/home/paniquex/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/spice/tmp/tmpi2453jj0', '-cache', '/home/paniquex/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/spice/cache', '-out', '/home/paniquex/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/spice/tmp/tmpq7uq3kif', '-subset', '-silent']' returned non-zero exit status 1.
When I try to run main.py from the root directory with python main.py, I get this error:
File "main.py", line 34, in <module>
main()
File "main.py", line 28, in main
settings=settings['logging'])
KeyError: 'logging'
Also, when I try to run python processes/dataset.py, I get this error:
KeyError Traceback (most recent call last)
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/processes/dataset.py in <module>
173
174 if __name__ == '__main__':
--> 175 main()
176
177 # EOF
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/processes/dataset.py in main()
151
152 init_loggers(verbose=verbose,
--> 153 settings=settings['dirs_and_files']['logging'])
154
155 logger_main = logger.bind(is_caption=False, indent=1)
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/tools/printing.py in init_loggers(verbose, settings)
59 record['extra']['indent'] == i and not record['extra']['is_caption'])
60
---> 61 logging_path = Path(settings['root_dir'],
62 settings['logger_dir'])
63
KeyError: 'root_dir'
Could you please tell me how to solve this problem? Thanks!
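Both KeyError tracebacks above come from a settings dictionary that lacks a key the code indexes directly. As a minimal sketch (not part of the baseline), the helper below reports which nested keys are absent before the code reaches them. The function name and the example values are hypothetical; the key paths are the ones that appear in the tracebacks, which may come from different versions of the settings files.

```python
def missing_setting_paths(settings, required_paths):
    """Return the required key paths that are absent from a nested settings dict."""
    missing = []
    for path in required_paths:
        node = settings
        for key in path:
            # Stop at the first level where the key cannot be found.
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


# Key paths copied from the tracebacks above (hypothetical grouping).
REQUIRED = [
    ("logging",),
    ("dirs_and_files", "logging", "root_dir"),
    ("dirs_and_files", "logging", "logger_dir"),
]

# Made-up settings for illustration; in practice this would be the loaded YAML.
example_settings = {"dirs_and_files": {"logging": {"logger_dir": "logs"}}}

for path in missing_setting_paths(example_settings, REQUIRED):
    print("missing setting:", " -> ".join(path))
```

A check like this only shows which key is absent; which settings file should define it depends on the version of the repository being run.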
I wonder if anyone else has faced errors like these when doing evaluation:
subprocess.CalledProcessError: Command '['/usr/java/jdk-16.0.2/bin/java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/data1/coco-caption/pycocoevalcap/spice/tmp/tmpe6faasnb', '-cache', '/data1/coco-caption/pycocoevalcap/spice/cache/1634609092.4812038', '-out', '/data1/coco-caption/pycocoevalcap/spice/tmp/tmpv2_jep2k', '-subset', '-silent']' returned non-zero exit status 1.
or
File "D:\dcase-2021-baseline-master\coco_caption\pycocoevalcap\tokenizer\ptbtokenizer.py", line 58, in tokenize
stdout=subprocess.PIPE)
File "C:\Users\admin\miniconda3\envs\dcase-2021-baseline-master\lib\subprocess.py", line 800, in __init__
restore_signals, start_new_session)
File "C:\Users\admin\miniconda3\envs\dcase-2021-baseline-master\lib\subprocess.py", line 1207, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2]
After installing 'Java SE Development Kit 8u361' instead of a higher version, the problem was solved.
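Since the fix reported here was switching to Java 8, it can help to check which Java the evaluation will actually launch. The sketch below is an illustration, not part of the baseline: it parses the text printed by java -version (which the JVM writes to stderr) into a major version number. Both function names are hypothetical.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier report themselves as 1.x, for example 1.8.0_242.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major(executable="java"):
    """Run `java -version` and parse its output; None if java is not found."""
    try:
        result = subprocess.run([executable, "-version"],
                                capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # The version banner goes to stderr; stdout is appended just in case.
    return java_major_version(result.stderr + result.stdout)


# Version strings of the form seen on this page.
print(java_major_version('openjdk version "1.8.0_242"'))
print(java_major_version('java version "16.0.2" 2021-07-20'))
```

Calling installed_java_major() in the same environment that runs main.py shows whether the java found on the PATH is version 8 or something newer.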
RuntimeError: Error opening 'data/clotho_audio_files/development/Distorted AM Radio noise.wav': File contains data in an unknown format.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 57, in <module>
main()
File "main.py", line 40, in main
settings_dirs_and_files=settings['dirs_and_files'])
File "/src/processes/dataset.py", line 111, in create_dataset
dir_downloaded_audio)
File "/src/tools/dataset_creation.py", line 288, in create_split_data
mono=settings_audio['to_mono'])
File "/src/tools/file_io.py", line 83, in load_audio_file
offset=offset, duration=duration)[0]
File "/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py", line 147, in load
y, sr_native = __audioread_load(path, offset, duration, dtype)
File "/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py", line 171, in __audioread_load
with audioread.audio_open(path) as input_file:
File "/usr/local/lib/python3.7/dist-packages/audioread/__init__.py", line 116, in audio_open
raise NoBackendError()
audioread.exceptions.NoBackendError
Sorry, solved now!
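The poster does not say what the fix was. One possible cause of "File contains data in an unknown format" followed by NoBackendError is an audio file that was truncated or corrupted during download, and that can be checked without any audio library. The sketch below is an assumption-laden diagnostic, not the baseline's code: it only tests for the standard RIFF/WAVE header, so valid variants such as RF64 would also be flagged. The function names are hypothetical.

```python
from pathlib import Path


def looks_like_wav(path):
    """Return True if the file starts with a standard RIFF/WAVE header."""
    with open(path, "rb") as handle:
        header = handle.read(12)
    # Bytes 0-3 are the RIFF tag, bytes 8-11 the WAVE form type.
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def suspicious_wav_files(directory):
    """List *.wav files in a directory whose header does not look like WAV."""
    return sorted(p for p in Path(directory).glob("*.wav")
                  if not looks_like_wav(p))
```

For example, suspicious_wav_files('data/clotho_audio_files/development') would list files worth downloading again, assuming the directory layout shown in the error message.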
Hi. According to http://dcase.community/challenge2020/task-automatic-audio-captioning#submission, participants are required to submit the output of their audio captioning method for each of the 1043 files in the evaluation split.
Could you please release the code for preparing a DCASE2020 Task 6 submission?
Thanks a lot.
According to your README, if I want to evaluate the model I have trained, I should change the settings in 'dirs_and_files.yaml', 'model_baseline.yaml' and 'method_baseline.yaml'.
Then, I find the settings in 'dirs_and_files.yaml' as follows:
Is this right? In fact, I am puzzled by 'checkpoint_model_name'. What should it be in this case?
In 'model_baseline.yaml', I changed the setting to 'use_pre_trained_model: Yes'. However, in 'method_baseline.yaml', I did not find any settings for evaluation. It looks like a training-only YAML file. What should I change in 'method_baseline.yaml' to run evaluation?
When running evaluation, this error can occur:
subprocess.CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', <some more commands>, '-subset', '-silent']' returned non-zero exit status 1.
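A non-zero exit from spice-1.0.jar can have several causes. One of them appears in another report on this page: NoClassDefFoundError for org/json/simple/parser/ParseException, which means Java could not find a class from the json-simple library on its classpath. The sketch below checks whether the expected dependency jars are present. It assumes that the dependencies sit in a lib directory next to spice-1.0.jar and that their file names start with the prefixes shown; both are assumptions to verify against your checkout, and the function name is hypothetical.

```python
from pathlib import Path


def missing_spice_dependencies(spice_dir,
                               name_prefixes=("json-simple", "stanford-corenlp")):
    """Return the jar name prefixes with no matching file in <spice_dir>/lib.

    The lib/ location and the prefixes are assumptions, not confirmed by the
    repository documentation.
    """
    lib_dir = Path(spice_dir) / "lib"
    jar_names = [p.name for p in lib_dir.glob("*.jar")] if lib_dir.is_dir() else []
    return [prefix for prefix in name_prefixes
            if not any(name.startswith(prefix) for name in jar_names)]
```

Running missing_spice_dependencies('coco_caption/pycocoevalcap/spice') from the repository root returns an empty list when a jar matching every prefix is found.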
When I try to run main.py from the root directory with python main.py, I get this error:
File "main.py", line 77, in <module>
main()
File "main.py", line 73, in main
method.method(settings)
File "dcase-2020-baseline/processes/method.py", line 504, in method
indices_list=indices_list)
File "dcase-2020-baseline/processes/method.py", line 327, in _do_training
grad_norm_val=settings_training['grad_norm']['value'])
File "dcase-2020-baseline/tools/model.py", line 130, in module_epoch_passing
norm_type=grad_norm)
File "lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList
In the logger file, this error appears at the beginning of training the model.
Could you please tell me how to solve this problem? Thanks!
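In the traceback above, torch.stack receives the list of per-parameter gradient norms, so the error means that list was empty: no parameter reaching clip_grad_norm_ had a gradient at that point. Why that happens in this run is not shown here. As a hedged workaround sketch, the hypothetical helper below skips clipping when there is nothing to clip; the clipping function is passed in so the snippet itself does not import torch.

```python
def clip_if_any_grads(parameters, clip_fn, max_norm, norm_type=2.0):
    """Call clip_fn only on parameters that currently hold a gradient.

    Returns clip_fn's result, or None when no parameter has a gradient.
    """
    with_grad = [p for p in parameters if getattr(p, "grad", None) is not None]
    if not with_grad:
        # Nothing to clip, so avoid stacking an empty list of norms.
        return None
    return clip_fn(with_grad, max_norm=max_norm, norm_type=norm_type)


# Intended use with PyTorch (not executed here):
# clip_if_any_grads(model.parameters(), torch.nn.utils.clip_grad_norm_,
#                   max_norm=1.0, norm_type=2.0)
```

If the helper returns None on every step, the underlying problem is that backward() is not producing gradients, and skipping the clip only hides it.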
First of all, thanks for putting the code together!
I have an issue when trying to reproduce the results reported on the DCASE task website.
Running main.py with settings
workflow:
dataset_creation: Yes
dnn_training: No
dnn_evaluation: Yes
using the pre-trained model (https://doi.org/10.5281/zenodo.3697687), I get results that differ from the ones reported in
I am currently getting the following:
bleu_1 : 0.3893
bleu_2 : 0.1353
bleu_3 : 0.0542
bleu_4 : 0.0150
meteor : 0.0839
rouge_l: 0.2622
cider : 0.0737
spice : 0.0332
spider : 0.0535
Should I get the same results? Is there anything I am missing?
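One quick sanity check on the numbers listed above: SPIDEr is defined as the mean of CIDEr and SPICE, so the last line can be recomputed from the two before it. The sketch below does only that. It shows whether the listed metrics are internally consistent and does not explain why they differ from the reported results.

```python
# Values copied from the metrics listed above.
reported = {"cider": 0.0737, "spice": 0.0332, "spider": 0.0535}


def spider_from(cider, spice):
    """SPIDEr is the arithmetic mean of CIDEr and SPICE."""
    return (cider + spice) / 2.0


recomputed = spider_from(reported["cider"], reported["spice"])
difference = abs(recomputed - reported["spider"])
print(f"recomputed SPIDEr: {recomputed:.5f}")
print(f"difference from listed value: {difference:.5f}")
```

A difference within rounding error suggests the evaluation pipeline combined the metrics as expected, which points to the data, vocabulary, or weights as places to look for the discrepancy.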
In the baseline, 'baseline_dcase.py' contains the line h_encoder: Tensor = self.encoder(x)[:, -1, :].unsqueeze(1).expand(-1, self.max_out_t_steps, -1). Why does the baseline keep only the last step of the encoder output (index -1 along the second dimension)?
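To make the indexing in that line concrete: [:, -1, :] selects the last element along the second axis, and unsqueeze(1).expand(...) then repeats that single vector max_out_t_steps times. The sketch below mimics this with plain nested lists so it runs without PyTorch. It assumes the encoder output is laid out as (batch, time, features), which is not confirmed here, and the function name is hypothetical. In recurrent encoders the last step is commonly used as a fixed-length summary of the input; whether that is the authors' reasoning is not stated on this page.

```python
def last_step_repeated(encoder_output, max_out_t_steps):
    """Mimic x[:, -1, :].unsqueeze(1).expand(-1, T, -1) on nested lists.

    encoder_output is indexed as [batch][time][features] (assumed layout).
    """
    return [[list(sequence[-1]) for _ in range(max_out_t_steps)]
            for sequence in encoder_output]


# Two items in the batch, three time steps each, two features per step.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]],
]

context = last_step_repeated(batch, max_out_t_steps=4)
# Resulting shape: (batch, max_out_t_steps, features)
print(len(context), len(context[0]), len(context[0][0]))
print(context[0])
```

Unlike this list version, which copies the vector, expand in PyTorch returns a view, so the repetition costs no extra memory.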