Name: Banala Saritha
Type: User
Company: India
Bio: Speaker Recognition and Identification, Meta-learning, Few Shot Learning & Speech Processing, Speech-activity-detection, T-F Representations.
Location: National Institute of Technology
Banala Saritha's Projects
3D convolutional neural network for video classification
kaldi-asr/kaldi is the official location of the Kaldi project.
AS-pVAD: A Real-time Personalized Voice Activity Detection Network With Attentive Score Loss
This repository implements the encoder and decoder model with attention for OCR
Audio classification is a popular topic; here I implement several models using TensorFlow and Keras.
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
Meta-Learning for Few Shot Learning
Code and slides of my YouTube series called "Audio Signal Processing for Machine Learning"
Bidirectional LSTM + CRF (Conditional Random Fields) in TensorFlow
Channel Attention Convolutional Recurrent Neural Network for Few-Shot Speaker Identification
The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
A modified ResNet with channel attention mechanism
End-to-End Chinese Speaker Identification
A novel Clustering algorithm by measuring Direction Centrality (CDC) locally. It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish between internal and boundary points. The boundary points generate enclosed cages to bind the connections of internal points.
protein-sequence-based drug discovery; dilated convolutions, recurrent interpretation to time-distributed output
TensorFlow code for the paper Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data (https://arxiv.org/abs/1602.05875)
Another project for classifying COVID and non-COVID patients from cough sounds, using a CRNN-Attention model with the sound data converted into image data
CRNN (Convolutional Recurrent Neural Network), with optional STN (Spatial Transformer Network), in TensorFlow; multi-GPU supported.
Keyword spotting on Arm Cortex-M Microcontrollers
Lightweight CRNN for OCR (including handwritten text) with depthwise separable convolutions and spatial transformer module [keras+tf]
Convolutional Recurrent Neural Networks (CRNN) for Scene Text Recognition
Code and slides for the "Deep Learning (For Audio) With Python" course on TheSoundOfAI YouTube channel.
Learning differentiable temporal resolution on time-series data.
Sound event detection with depthwise separable and dilated convolutions.