The RAVDESS dataset is a database of emotional speech and song that contains 7356 files. Further details can be found at: https://zenodo.org/record/1188976
The model takes MFCC features with a shape of (300, 200) as input and feeds them to a ResNet50 base model followed by a softmax layer with 6 nodes. The model is trained from scratch, without pretrained weights.
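
The architecture described above could be sketched roughly as follows. This is a minimal sketch and not the original training code: it assumes Keras (via TensorFlow), a single channel axis added to the (300, 200) MFCC matrix, global average pooling before the softmax layer, and the Adam optimizer. None of these details are stated in this description; the names `INPUT_SHAPE`, `NUM_CLASSES` and `build_model` are hypothetical.

```python
INPUT_SHAPE = (300, 200, 1)  # assumption: MFCC matrix plus one channel axis
NUM_CLASSES = 6              # from the description: softmax layer with 6 nodes


def build_model():
    """Build a ResNet50 classifier with randomly initialised weights."""
    # Imported lazily so the constants above can be read without TensorFlow.
    from tensorflow import keras

    base = keras.applications.ResNet50(
        include_top=False,        # drop the default 1000-class head
        weights=None,             # random init, i.e. trained from scratch
        input_shape=INPUT_SHAPE,
        pooling="avg",            # assumption: global average pooling
    )
    outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    model = keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer="adam",         # assumption: the optimizer is not stated
        loss="categorical_crossentropy",
        metrics=[keras.metrics.CategoricalAccuracy(), keras.metrics.AUC()],
    )
    return model
```

Calling `build_model().summary()` prints the layer stack; the final layer should have an output shape of `(None, 6)`.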
The model achieved a categorical accuracy of 80% and an AUC of 0.94.
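
For readers unfamiliar with the metric, categorical accuracy is the fraction of samples whose highest-scoring class matches the one-hot label. The sketch below illustrates the computation in plain Python. The function name `categorical_accuracy` is hypothetical, and the labels and scores are made-up toy values, not outputs of this model.

```python
def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the one-hot label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = 0
    for true_row, pred_row in zip(y_true, y_pred):
        # Index of the largest entry in each row (argmax).
        true_class = max(range(len(true_row)), key=true_row.__getitem__)
        pred_class = max(range(len(pred_row)), key=pred_row.__getitem__)
        hits += int(true_class == pred_class)
    return hits / len(y_true)


# Toy data (made up for illustration): 5 samples, 6 classes.
toy_labels = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
toy_scores = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],
    [0.10, 0.60, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.80, 0.05, 0.03, 0.02],
    [0.10, 0.10, 0.10, 0.50, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.10, 0.20, 0.40],  # predicted class 5, true class 4
]

toy_accuracy = categorical_accuracy(toy_labels, toy_scores)
print(toy_accuracy)  # 4 of 5 toy samples match, so this prints 0.8
```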