Git Product home page Git Product logo

kaggle-birdcall-6th-place's Introduction

Cornell Birdcall Identification Competition

6th place solution to the Cornell Birdcall Identification Challenge hosted on Kaggle.

Context

In this competition, you will identify a wide variety of bird vocalizations in soundscape recordings. Due to the complexity of the recordings, they contain weak labels. There might be anthropogenic sounds (e.g., airplane overflights) or other bird and non-bird (e.g., chipmunk) calls in the background, with a particular labeled bird species in the foreground. Bring your new ideas to build effective detectors and classifiers for analyzing complex soundscape recordings!

Evaluation

The hidden test_audio directory contains approximately 150 recordings in mp3 format, each roughly 10 minutes long. They will not all fit in a notebook's memory at the same time. The recordings were taken at three separate remote locations in North America. Sites 1 and 2 were labeled in 5 second increments and need matching predictions, but due to the time consuming nature of the labeling process the site 3 files are only labeled at the file level. Accordingly, site 3 has relatively few rows in the test set and needs lower time resolution predictions.

Scores were evaluated based on their row-wise micro averaged F1 score.

Solution

My approach was

  • 3 stages of training to gradually remove noise in labels
  • SED style training and inference as I introduced in an introductory notebook.
  • Ensemble of 11 EMA models trained with the whole dataset / whole extended dataset

Details are described in kaggle discussion.

First stage

  • single PANNs model (Cnn14_DecisionLevelAtt)
  • BCE on clipwise_output and also on max(framewise_output, axis=0) (max pool on time axis) and sum them up with the weight 1.0 and 0.5 each.
  • Adam + CosineAnnealingLR, 55 epochs of training
  • train with randomly cropped 30s chunk
  • validate on randomly cropped 30s chunk
  • Augmentations on raw waveform.
    • NoiseInjection (max noise amplitude 0.04)
    • PitchShift (max pitch level 3)
    • RandomVolume (max db level 4)

Find missing label

After 5fold training of fast stage, I conducted SED inference on the whole train dataset and get time interval labels. Although this label was still too noisy, it was useful enough to find missing labels in the training dataset.

Second stage

  • SED model with ResNeSt50 encoder, attention pooling head of PANNs (torch.clamp -> torch.tanh)
  • BCE on clipwise_output and also on max(framewise_output, axis=0) (max pool on time axis) and sum them up with the weight 1.0 and 0.5 each.
  • Adam + CosineAnnealingLR, 55 epochs of training
  • Add additional secondary_labels found after stage1
  • 3 channels input - (normal log-melspectrogram, PCEN, librosa.power_to_db(melspec ** 1.5))
  • train with randomly cropped 20s chunk
  • validate on randomly cropped 30s chunk
  • Augmentations on raw waveform.
    • NoiseInjection (max noise amplitude 0.04)
    • PitchShift (max pitch level 3)
    • RandomVolume (max db level 4)

Third stage

  • SED model with ResNeSt50 encoder / EfficientNet-B0 encoder, attention pooling head of PANNs (torch.clamp -> torch.tanh)
  • BCE on clipwise_output and also on max(framewise_output, axis=0) (max pool on time axis) and sum them up with the weight 1.0 and 0.5 each. For EfficientNet-B0 encoder, I used FocalLoss in the same way as I did with BCE.
  • Adam + CosineAnnealingLR, 55 epochs of training
  • Add additional secondary_labels found after stage1
  • Correct labels using the prediction of stage2 model.
  • 3 channels input - (normal log-melspectrogram, PCEN, librosa.power_to_db(melspec ** 1.5))
  • train with randomly cropped 20s chunk
  • validate on randomly cropped 30s chunk
  • Augmentations on raw waveform.
    • NoiseInjection (max noise amplitude 0.04)
    • PitchShift (max pitch level 3)
    • RandomVolume (max db level 4)

To reproduce the result

NOTE: This is from my stupidity, but I've been embedding a bug regarding the seed fixation for a long time and found that in the very late stage, which was too late to fix. What I mean is that, you may not get exactly the same result as I got, although it's highly likely to get similar result.

0. Data preparation

Put the original dataset and extended dataset in input/birdsong-recognition.

Run make prepare command in this directory(the directory where this README.md is placed). This will perform resampling on the datasets, which takes a couple of hours.

1. training

make train

(Optional) Stage 1 only

make train-stage1

(Optional) Stage 1 and Stage 2

Since stage2 depends on the output of stage1, it cannot be run without stage1

make train-stage1
make train-stage2

Kaggle Solution

kaggle-birdcall-6th-place's People

Contributors

koukyo1994 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.