Hi 👋, I'm Rishabbh

Data scientist and consultant with experience across multiple industries, including retail, manufacturing, FMCG, banking (fintech), insurance and consumer business.

I'm currently:

🔭 Working on deep learning projects: transfer learning, pre-trained models (transformers), incremental models, layer pruning, quantization and distillation, and language models for downstream NLP tasks such as document parsing pipelines (banking sector), summarization, Q&A, NLU/NLG and context-based auto-completion. On the engineering side: REST APIs (Flask endpoints), deployment and productionization, Docker images and containers, Kubernetes, GCR (GCP), CI/CD pipelines, GitHub hooks/actions (pre/post-commit, workflows), DVC pipelines, MongoDB, Google OCR, Label Studio and transformer model interpretability.
🌱 Learning autoencoders, self-supervised learning, optimization, time series analysis using deep learning (DeepState model, GluonTS), linear programming (optimization), anomaly detection, feature learning, data compression techniques (SVD, matrix factorization), MLOps (ML pipelines), model interpretability (Explicable AI), ablation studies, and TextRank (a graph representation of text ranked with PageRank)
👀 Interested in research work and consulting assignments, and in sharing with, learning from and exploring open source communities. Motivated to build projects across AI/ML/DL with a focus on deploying them to production.
👯 Looking forward to collaborating on ML/DL projects and Kaggle competitions

About my work:

👨‍💻 All of my projects are available at https://github.com/Rishabbh-Sahu
💬 Ask me about NLU/NLP, intent/sequence/text/email classification, NER (named entity recognition), sentence/document/semantic similarity, information retrieval (tf-idf or context-based), time series analysis (forecasting), deep learning, data augmentation, dialog systems (voice models), PLMs (pre-trained language models), model ensembling/stacking, feature selection methods, segmentation, tokenization (text), optimization, recommendation engines, customer-360 analysis, statistics, retail analytics, supply chain analytics, dimensionality reduction, regularization techniques, bias and variance, DOE (design of experiments: ANOVA, t/F/chi-squared/z-tests), KS test, sampling methods, A/B testing, crowd-sourcing (Toloka, MTurk etc.), BigQuery, SQL
📫 You can reach me on www.linkedin.com/in/rishabbh-sahu-pmp

Languages and Tools:

Rishabbh's Projects

explicable_ai

Understanding how a variety of ML models work and how to interpret them.

extractive_summarization

Extractive summarization based on the significant sentences in an article. Significant sentences are identified by word frequency counts. The spaCy library is used for text pre-processing.
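A minimal sketch of frequency-based sentence scoring, to make the idea concrete. It is not the repo's code: it swaps spaCy for plain regular expressions so it runs with the standard library alone, and the `summarize` function and the tiny `STOP_WORDS` set are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; spaCy ships a fuller one).
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in",
              "is", "it", "of", "on", "the", "to", "with"}


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count how often each non-stop word appears in the whole text.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)

    # Score a sentence by summing the frequencies of its words
    # (stop words are absent from the counter, so they contribute 0).
    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

Calling `summarize(article_text, n_sentences=3)` returns the three sentences whose words occur most often in the article. Longer sentences tend to score higher with this raw sum, so normalizing by sentence length is a common refinement.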

ignore_email_address_classifier

Important vs. ignore email classifier based on incoming email addresses. A BERT tokenizer handles tokenization and a CNN serves as the model. Framework: TensorFlow 2.4.

information_retrieval

Given a document, identifies the closest documents in a list using a tf-idf matrix and cosine similarity.
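The approach can be sketched in pure Python as follows. This is an illustration under assumptions, not the repo's implementation: the function names (`tfidf_vectors`, `cosine`, `closest_documents`) are hypothetical, and the smoothed idf formula is one common variant among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf, so terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: count * idf[t] for t, count in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_documents(query_index, docs, top_k=2):
    """Return (index, similarity) pairs for the documents closest to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [(i, cosine(query, vec))
              for i, vec in enumerate(vectors) if i != query_index]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]
```

`closest_documents(0, docs)` ranks every other document against the first one. Documents that share no terms with the query get a similarity of exactly 0.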

intent_and_slot_classification

One of the main NLU tasks is to understand the intents (sequence classification) and slots (entities within the sequence). This repo helps classify both together using a joint (multitask) model. BERT_SMALL is used, and it can be swapped for any other BERT variant.

learning_pathways

A repository of links that help in understanding concepts, algorithms, etc.

linear_programming

Solving linear programming problems using a Python optimizer interface. This repo lets you add multiple decision variables and constraints easily.
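For readers new to the topic, here is a small worked example. The description above does not say which optimizer interface the repo wraps, so this sketch assumes SciPy's `scipy.optimize.linprog`. The problem itself (maximize 3x + 5y under three constraints) is a textbook-style illustration, not taken from the repo.

```python
from scipy.optimize import linprog

# Maximize 3x + 5y. linprog minimizes, so the objective is negated.
c = [-3, -5]

# Inequality constraints in the form A_ub @ [x, y] <= b_ub:
#   x        <= 4
#       2y   <= 12
#   3x + 2y  <= 18
A_ub = [[1, 0],
        [0, 2],
        [3, 2]]
b_ub = [4, 12, 18]

# Both decision variables are non-negative and unbounded above.
bounds = [(0, None), (0, None)]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)

x, y = result.x
best_value = -result.fun  # undo the sign flip applied to the objective
print(f"x={x:.2f}, y={y:.2f}, objective={best_value:.2f}")
```

The feasible region is a polygon, and the maximum is reached at the vertex x = 2, y = 6, where the objective equals 36. Adding a decision variable means adding one entry to `c` and `bounds` and one column to `A_ub`.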

semantic_lookalike_transformers

Finding lookalike sentences by leveraging the semantic similarities that transformer models learn during pre-training. I've used cosine similarity as an angular distance measure applied over sent2vec embeddings.
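For reference, cosine similarity and angular distance are related but not identical. The standard definitions for two sentence embeddings u and v are given below. Whether the repo ranks by raw cosine similarity or converts it to an angle is not stated here, so treat the second formula as one common convention rather than a description of the repo.

```latex
\[
\operatorname{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert},
\qquad
d_{\text{ang}}(u, v) = \frac{1}{\pi} \arccos\bigl(\operatorname{sim}(u, v)\bigr) \in [0, 1]
\]
```

Both give the same ranking of lookalike candidates, since arccos is monotonically decreasing: the highest cosine similarity corresponds to the smallest angular distance.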