Git Product home page Git Product logo

Sankar Sundaram's Projects

algorithms icon algorithms

Minimal examples of data structures and algorithms in Python

azure-sdk-for-net icon azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.

barcode_scanner_python icon barcode_scanner_python

Linux Python code to read from a USB barcode scanner directly. A symbol LS2208 was used for testing.

brew icon brew

🍺 The missing package manager for macOS (or Linux)

cloud-composer-mssql-dataflow-bigquery icon cloud-composer-mssql-dataflow-bigquery

This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from a Microsoft SQL Server to BigQuery. The diagrams below demonstrate the workflow pipeline.

combating-twitter-hate-speech-using-ml-and-nlp icon combating-twitter-hate-speech-using-ml-and-nlp

Using NLP and ML, make a model to identify hate speech (racist or sexist tweets) in Twitter. Problem Statement: Twitter is the biggest platform where anybody and everybody can have their views heard. Some of these voices spread hate and negativity. Twitter is wary of its platform being used as a medium to spread hate. You are a data scientist at Twitter, and you will help Twitter in identifying the tweets with hate speech and removing them from the platform. You will use NLP techniques, perform specific cleanup for tweets data, and make a robust model. Domain: Social Media Analysis to be done: Clean up tweets and build a classification model by using NLP techniques, cleanup specific for tweets data, regularization and hyperparameter tuning using stratified k-fold and cross validation to get the best model. Content: id: identifier number of the tweet Label: 0 (non-hate) /1 (hate) Tweet: the text in the tweet Tasks: Load the tweets file using read_csv function from Pandas package. Get the tweets into a list for easy text cleanup and manipulation. To cleanup: Normalize the casing. Using regular expressions, remove user handles. These begin with '@’. Using regular expressions, remove URLs. Using TweetTokenizer from NLTK, tokenize the tweets into individual terms. Remove stop words. Remove redundant terms like ‘amp’, ‘rt’, etc. Remove ‘#’ symbols from the tweet while retaining the term. Extra cleanup by removing terms with a length of 1. Check out the top terms in the tweets: First, get all the tokenized terms into one large list. Use the counter and find the 10 most common terms. Data formatting for predictive modeling: Join the tokens back to form strings. This will be required for the vectorizers. Assign x and y. Perform train_test_split using sklearn. We’ll use TF-IDF values for the terms as a feature to get into a vector space model. Import TF-IDF vectorizer from sklearn. Instantiate with a maximum of 5000 terms in your vocabulary. Fit and apply on the train set. Apply on the test set. Model building: Ordinary Logistic Regression Instantiate Logistic Regression from sklearn with default parameters. Fit into the train data. Make predictions for the train and the test set. Model evaluation: Accuracy, recall, and f_1 score. Report the accuracy on the train set. Report the recall on the train set: decent, high, or low. Get the f1 score on the train set. Looks like you need to adjust the class imbalance, as the model seems to focus on the 0s. Adjust the appropriate class in the LogisticRegression model. Train again with the adjustment and evaluate. Train the model on the train set. Evaluate the predictions on the train set: accuracy, recall, and f_1 score. Regularization and Hyperparameter tuning: Import GridSearch and StratifiedKFold because of class imbalance. Provide the parameter grid to choose for ‘C’ and ‘penalty’ parameters. Use a balanced class weight while instantiating the logistic regression. Find the parameters with the best recall in cross validation. Choose ‘recall’ as the metric for scoring. Choose stratified 4 fold cross validation scheme. Fit into the train set. What are the best parameters? Predict and evaluate using the best estimator. Use the best estimator from the grid search to make predictions on the test set. What is the recall on the test set for the toxic comments? What is the f_1 score?

covid19-cases-prediction-india icon covid19-cases-prediction-india

I made prediction of covid19 cases in India for all the three cases (Confirmed, Recovered and deceased cases) that give the future covid19 cases prediction using Supervised Machine Learning specifically Regression Algorithm. The prediction is done on both the current datasets and the previous datasets of covid19. Also represents the data in graphical representation using various libraries of python.

devops-exercises icon devops-exercises

Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

diagrams icon diagrams

:art: Diagram as Code for prototyping cloud system architectures

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.