
audio-captioning / dcase-2020-baseline


Audio captioning baseline system for DCASE 2020 challenge.

Home Page: http://dcase.community/challenge2020/task-automatic-audio-captioning

License: Other

Python 99.62% Shell 0.38%
audio-captioning audio-signal-processing captioning dcase dcase2020 deep-learning deep-neural-networks machine-learning machine-listening signal-processing

dcase-2020-baseline's People

Contributors

dependabot[bot], dr-costas, lippings

dcase-2020-baseline's Issues

Problems with computing SPICE score.

Hi

I tried to solve this myself, but without success.

OS:
Ubuntu 18.04 LTS

java -version output:

openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)

Error message:

computing SPICE score...
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
  at java.lang.Class.getDeclaredMethods0(Native Method)
  at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
  at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
  at java.lang.Class.getMethod0(Class.java:3018)
  at java.lang.Class.getMethod(Class.java:1784)
  at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
  at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
  ... 7 more
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/main.py in <module>
     56 
     57 if __name__ == '__main__':
---> 58     main()
     59 
     60 # EOF

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/main.py in main()
     52     if settings['workflow']['dnn_training'] or \
     53             settings['workflow']['dnn_evaluation']:
---> 54         method.method(settings)
     55 
     56 

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/processes/method.py in method(settings)
    451             settings_data=settings['dnn_training_settings']['data'],
    452             settings_io=settings['dirs_and_files'],
--> 453             indices_list=indices_list)
    454         logger_inner.info('Evaluation done')
    455 

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/processes/method.py in _do_evaluation(model, settings_data, settings_io, indices_list)
    177     logger_main.info('Evaluation done')
    178 
--> 179     metrics = evaluate_metrics(captions_pred, captions_gt)
    180 
    181     for metric, values in metrics.items():

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/eval_metrics.py in evaluate_metrics(prediction_file, reference_file, nb_reference_captions)
    287         ground_truths.append([reference_dict[file_name][cap] for cap in cap_names])
    288 
--> 289     metrics, per_file_metrics = evaluate_metrics_from_lists(predictions, ground_truths)
    290 
    291     total_metrics = combine_single_and_per_file_metrics(

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/eval_metrics.py in evaluate_metrics_from_lists(predictions, ground_truths, ids)
    159     write_json(pred, pred_file)
    160 
--> 161     metrics, per_file_metrics = evaluate_metrics_from_files(pred_file, ref_file)
    162 
    163     # Delete temporary files

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/eval_metrics.py in evaluate_metrics_from_files(pred_file, ref_file)
    108     cocoEval = COCOEvalCap(coco, cocoRes)
    109     cocoEval.params['audio_id'] = cocoRes.getAudioIds()
--> 110     cocoEval.evaluate()
    111 
    112     # Make dict from metrics

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/eval.py in evaluate(self, verbose)
     60             if verbose:
     61                 print('computing %s score...'%(scorer.method()))
---> 62             score, scores = scorer.compute_score(gts, res)
     63             if type(method) == list:
     64                 for sc, scs, m in zip(score, scores, method):

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/spice/spice.py in compute_score(self, gts, res)
     73         ]
     74         subprocess.check_call(spice_cmd, 
---> 75             cwd=os.path.dirname(os.path.abspath(__file__)))
     76 
     77         # Read and process results

~/anaconda3/lib/python3.7/subprocess.py in check_call(*popenargs, **kwargs)
    361         if cmd is None:
    362             cmd = popenargs[0]
--> 363         raise CalledProcessError(retcode, cmd)
    364     return 0
    365 

CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/home/paniquex/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/spice/tmp/tmpi2453jj0', '-cache', '/home/paniquex/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/spice/cache', '-out', '/home/paniquex/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/coco_caption/pycocoevalcap/spice/tmp/tmpq7uq3kif', '-subset', '-silent']' returned non-zero exit status 1.
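
The NoClassDefFoundError above means the JVM could not find org.json.simple.parser.ParseException on its classpath when starting spice-1.0.jar. A minimal sketch for checking whether any jar in a directory provides that class is given below. It relies only on the fact that jar files are zip archives; the function name jars_containing is hypothetical, and the thread does not confirm which directory the class is expected to come from.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, search_dir):
    """Return the jar files under search_dir that contain class_name.

    class_name uses dots, e.g. 'org.json.simple.parser.ParseException'.
    """
    # Inside a jar, a class is stored as a path ending in '.class'.
    entry = class_name.replace('.', '/') + '.class'
    hits = []
    for jar_path in sorted(Path(search_dir).rglob('*.jar')):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    hits.append(jar_path)
        except zipfile.BadZipFile:
            # A truncated or corrupt jar cannot be opened as a zip archive.
            continue
    return hits
```

Calling jars_containing('org.json.simple.parser.ParseException', 'coco_caption/pycocoevalcap/spice') from the repository root (the path taken from the traceback) returns an empty list if no jar in that directory tree provides the class.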

Issues with running main.py

When I try to run main.py from the root directory with the command python main.py, I get this error:

  File "main.py", line 34, in <module>
    main()
  File "main.py", line 28, in main
    settings=settings['logging'])
KeyError: 'logging'

Also, when I try to run python processes/dataset.py, I get this error:

KeyError                                  Traceback (most recent call last)
~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/processes/dataset.py in <module>
    173 
    174 if __name__ == '__main__':
--> 175     main()
    176 
    177 # EOF

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/processes/dataset.py in main()
    151 
    152     init_loggers(verbose=verbose,
--> 153                  settings=settings['dirs_and_files']['logging'])
    154 
    155     logger_main = logger.bind(is_caption=False, indent=1)

~/Documents/Automated_Audio_Captioning_DCASE2020/dcase-2020-baseline/tools/printing.py in init_loggers(verbose, settings)
     59             record['extra']['indent'] == i and not record['extra']['is_caption'])
     60 
---> 61     logging_path = Path(settings['root_dir'],
     62                         settings['logger_dir'])
     63 

KeyError: 'root_dir'

Could you please tell me how to solve this problem? Thanks!
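
Both tracebacks above are KeyErrors raised while reading the loaded settings dictionary. A minimal sketch for listing which nested keys are absent before the code runs is shown below. The helper name missing_key_paths is hypothetical; the key names in the usage note come from the tracebacks, but which layout a given checkout of the repository expects is an assumption.

```python
def missing_key_paths(settings, required_paths):
    """Return the key paths from required_paths that are absent in settings.

    Each path is a tuple of keys, e.g. ('dirs_and_files', 'root_dir').
    """
    missing = []
    for path in required_paths:
        node = settings
        for key in path:
            # Stop at the first level where the key cannot be found.
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing
```

For example, missing_key_paths(settings, [('logging',), ('dirs_and_files', 'logging', 'root_dir')]) reports which of the two lookups from the tracebacks would fail for the settings that were actually loaded.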

Reminder about JAVA jdk version issue in doing evaluation

I wonder if anyone else has faced errors like these when doing evaluation:
subprocess.CalledProcessError: Command '['/usr/java/jdk-16.0.2/bin/java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/data1/coco-caption/pycocoevalcap/spice/tmp/tmpe6faasnb', '-cache', '/data1/coco-caption/pycocoevalcap/spice/cache/1634609092.4812038', '-out', '/data1/coco-caption/pycocoevalcap/spice/tmp/tmpv2_jep2k', '-subset', '-silent']' returned non-zero exit status 1.
or
  File "D:\dcase-2021-baseline-master\coco_caption\pycocoevalcap\tokenizer\ptbtokenizer.py", line 58, in tokenize
    stdout=subprocess.PIPE)
  File "C:\Users\admin\miniconda3\envs\dcase-2021-baseline-master\lib\subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "C:\Users\admin\miniconda3\envs\dcase-2021-baseline-master\lib\subprocess.py", line 1207, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2]

After installing 'Java SE Development Kit 8u361' instead of a higher version, the problem was solved.
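
Since the report above attributes the failure to the installed Java version, a small check of that version before running the evaluation may help. The sketch below is an illustration, not part of the baseline: the function names are hypothetical, and it only assumes that java -version prints a line containing version "..." (as in the openjdk output quoted earlier on this page) and that releases up to Java 8 report themselves as 1.x.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier use the 1.x scheme, e.g. "1.8.0_242" means Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Return the major version of the `java` found on PATH, or None."""
    java = shutil.which('java')
    if java is None:
        # No java executable on PATH at all.
        return None
    result = subprocess.run([java, '-version'], capture_output=True, text=True)
    # `java -version` normally writes to stderr; check both streams.
    return parse_java_major(result.stderr + result.stdout)
```

A return value of None from installed_java_major() means that no java executable was found on PATH or that its output could not be parsed.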

unknown file format when using clotho dataset

RuntimeError: Error opening 'data/clotho_audio_files/development/Distorted AM Radio noise.wav': File contains data in an unknown format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 57, in <module>
    main()
  File "main.py", line 40, in main
    settings_dirs_and_files=settings['dirs_and_files'])
  File "/src/processes/dataset.py", line 111, in create_dataset
    dir_downloaded_audio)
  File "/src/tools/dataset_creation.py", line 288, in create_split_data
    mono=settings_audio['to_mono'])
  File "/src/tools/file_io.py", line 83, in load_audio_file
    offset=offset, duration=duration)[0]
  File "/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py", line 147, in load
    y, sr_native = __audioread_load(path, offset, duration, dtype)
  File "/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py", line 171, in __audioread_load
    with audioread.audio_open(path) as input_file:
  File "/usr/local/lib/python3.7/dist-packages/audioread/__init__.py", line 116, in audio_open
    raise NoBackendError()
audioread.exceptions.NoBackendError
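
The traceback shows that the file could not be recognised as audio by the first loader, and that the audioread fallback then found no usable backend. One thing that can be ruled out quickly is a damaged or incomplete file. The sketch below checks only the 12-byte RIFF/WAVE header; it is a rough sanity check under the assumption that the dataset files are ordinary RIFF WAV files, and the function name is hypothetical.

```python
def looks_like_wav(path):
    """Return True if the file starts with a RIFF header of type WAVE."""
    with open(path, 'rb') as handle:
        header = handle.read(12)
    # Bytes 0-3 are the chunk id 'RIFF', bytes 8-11 the format 'WAVE'.
    return len(header) == 12 and header[:4] == b'RIFF' and header[8:12] == b'WAVE'
```

A False result for 'data/clotho_audio_files/development/Distorted AM Radio noise.wav' would suggest re-downloading or re-extracting the file; a True result does not guarantee that the rest of the file is intact.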

About Evaluation settings

According to your README, if I want to evaluate the model I have trained, I should change the settings in 'dirs_and_files.yaml', 'model_baseline.yaml' and 'method_baseline.yaml'.

I found the following settings in 'dirs_and_files.yaml':
[image]
Is this right? In particular, I am puzzled by 'checkpoint_model_name'. What should it be set to in this case?

In 'model_baseline.yaml', I set 'use_pre_trained_model: Yes'. However, I did not find any evaluation settings in 'method_baseline.yaml'; it looks like a complete training configuration. What should I change in 'method_baseline.yaml' for evaluation?

Potential SPICE calculation issues

When running the evaluation, this error can occur:
subprocess.CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', <some more commands> '-subset', '-silent']' returned non-zero exit status 1.

RuntimeError: stack expects a non-empty TensorList

When I try to run main.py from the root directory with the command python main.py, I get this error:

  File "main.py", line 77, in <module>
    main()
  File "main.py", line 73, in main
    method.method(settings)
  File "dcase-2020-baseline/processes/method.py", line 504, in method
    indices_list=indices_list)
  File "dcase-2020-baseline/processes/method.py", line 327, in _do_training
    grad_norm_val=settings_training['grad_norm']['value'])
  File "dcase-2020-baseline/tools/model.py", line 130, in module_epoch_passing
    norm_type=grad_norm)
  File "lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

According to the log file, this error appears at the very beginning of training.
Could you please tell me how to solve this problem? Thanks!
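
The failing line in the traceback stacks one norm per parameter, so the error means that the list of parameters reaching that line was empty. A hedged way to confirm this is to materialise the parameters and count how many carry a gradient just before clipping. The helper below is a hypothetical diagnostic, not a fix from the maintainers; it uses duck typing on the grad attribute, so it works on anything that exposes one.

```python
def parameters_with_grad(parameters):
    """Materialise an iterable of parameters, keeping those with a gradient.

    Returning a list also avoids handing an already-consumed generator
    to the clipping call.
    """
    return [p for p in parameters if getattr(p, 'grad', None) is not None]


# Possible use inside a training loop (assumes a PyTorch module `model`
# and clipping values `max_norm` and `norm_type` defined elsewhere):
#
#     params = parameters_with_grad(model.parameters())
#     if params:
#         torch.nn.utils.clip_grad_norm_(params, max_norm=max_norm,
#                                        norm_type=norm_type)
#     else:
#         print('No parameter has a gradient; was backward() called?')
```

If the list is empty, the cause lies before the clipping call (for example, no backward pass has populated the gradients yet), which this sketch can reveal but not resolve.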

Reproducibility of results

First of all, thanks for putting the code together!

I have an issue when trying to reproduce the results reported on the DCASE task website.

Running main.py with settings

workflow:
  dataset_creation: Yes
  dnn_training: No
  dnn_evaluation: Yes

using the pre-trained model (https://doi.org/10.5281/zenodo.3697687), I get results that differ from the ones reported in

http://dcase.community/challenge2020/task-automatic-audio-captioning#results-for-the-development-dataset

I am currently getting the following:
bleu_1 : 0.3893
bleu_2 : 0.1353
bleu_3 : 0.0542
bleu_4 : 0.0150
meteor : 0.0839
rouge_l: 0.2622
cider : 0.0737
spice : 0.0332
spider : 0.0535

Should I get the same results? Is there anything I am missing?
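
As a quick internal consistency check on the numbers above: SPIDEr is defined as the mean of CIDEr and SPICE, so the reported spider value can be recomputed from the other two. The function name below is hypothetical.

```python
def spider_score(cider, spice):
    """SPIDEr is the arithmetic mean of the CIDEr and SPICE scores."""
    return (cider + spice) / 2.0


# Values copied from the list above.
print(spider_score(0.0737, 0.0332))
```

The result agrees with the reported spider value of 0.0535 to within rounding, so the metrics are at least consistent with each other; this says nothing about why they differ from the published ones.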

About the baseline net

In the baseline, 'baseline_dcase.py' contains the line h_encoder: Tensor = self.encoder(x)[:, -1, :].unsqueeze(1).expand(-1, self.max_out_t_steps, -1). Why does the baseline keep only the last step of the encoder output (index -1 along the second dimension)?
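
To make the indexing in that line concrete, here is a small sketch of what the expression computes, using plain nested lists in place of tensors so that it runs without PyTorch. It assumes the encoder output is laid out as (batch, time, features); the function name is hypothetical.

```python
def last_step_repeated(encoder_out, max_out_t_steps):
    """Mimic x[:, -1, :].unsqueeze(1).expand(-1, max_out_t_steps, -1).

    encoder_out is a list (batch) of lists (time) of feature lists.
    For every batch item, the feature vector of the last time step is
    repeated max_out_t_steps times. Unlike expand, which returns a view,
    this copies the values.
    """
    return [
        [list(sequence[-1]) for _ in range(max_out_t_steps)]
        for sequence in encoder_out
    ]


# One batch item, three time steps, two features per step.
encoder_out = [[[1, 2], [3, 4], [5, 6]]]
print(last_step_repeated(encoder_out, 2))  # [[[5, 6], [5, 6]]]
```

In other words, the decoder receives the same vector at every output step: the encoder's output at its final input step.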
