shankat2020 Goto Github PK

followers: 0.0 following: 3.0 repos: 83.0 gists: 2.0

Name: Sankar Sundaram

Type: User

Company: Consulting Group

Location: NJ

Sankar Sundaram's Projects

algorithms

Minimal examples of data structures and algorithms in Python

azure-dataengineering-certification

Databricks notebooks for AZ Data Engineer certification path

azure-iot-sdk-node

A Node.js SDK for connecting devices to Microsoft Azure IoT services

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.

azure4everyone-samples

azuredatabricksbestpractices

Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs

azuredatabrickspostmancollection

A collection of Azure Databricks API requests

barcode_scanner_python

Linux Python code to read from a USB barcode scanner directly. A symbol LS2208 was used for testing.

barcodescanner

:mag_right: A simple and beautiful barcode scanner.

brew

🍺 The missing package manager for macOS (or Linux)

bulding-a-prediction-model-for-salary-hike-of-employee-by-using-simple-linear-regression-in-python

Predicting the Salary Hike of an employee by using "years of experience" as the independent variable and building a simple linear regression model to form the linear equation relationship between the two variables and predicting the future salary hike with the help of that linear equation.

car-assembly

cloud-ai-analytics

This Repo contain details related to Data Engineering tech stacks in GCP

cloud-composer-mssql-dataflow-bigquery

This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from a Microsoft SQL Server to BigQuery. The diagrams below demonstrate the workflow pipeline.

coding-interview-university

A complete computer science study plan to become a software engineer.

combating-twitter-hate-speech-using-ml-and-nlp

Using NLP and ML, make a model to identify hate speech (racist or sexist tweets) in Twitter. Problem Statement: Twitter is the biggest platform where anybody and everybody can have their views heard. Some of these voices spread hate and negativity. Twitter is wary of its platform being used as a medium to spread hate. You are a data scientist at Twitter, and you will help Twitter in identifying the tweets with hate speech and removing them from the platform. You will use NLP techniques, perform specific cleanup for tweets data, and make a robust model. Domain: Social Media Analysis to be done: Clean up tweets and build a classification model by using NLP techniques, cleanup specific for tweets data, regularization and hyperparameter tuning using stratified k-fold and cross validation to get the best model. Content: id: identifier number of the tweet Label: 0 (non-hate) /1 (hate) Tweet: the text in the tweet Tasks: Load the tweets file using read_csv function from Pandas package. Get the tweets into a list for easy text cleanup and manipulation. To cleanup: Normalize the casing. Using regular expressions, remove user handles. These begin with '@’. Using regular expressions, remove URLs. Using TweetTokenizer from NLTK, tokenize the tweets into individual terms. Remove stop words. Remove redundant terms like ‘amp’, ‘rt’, etc. Remove ‘#’ symbols from the tweet while retaining the term. Extra cleanup by removing terms with a length of 1. Check out the top terms in the tweets: First, get all the tokenized terms into one large list. Use the counter and find the 10 most common terms. Data formatting for predictive modeling: Join the tokens back to form strings. This will be required for the vectorizers. Assign x and y. Perform train_test_split using sklearn. We’ll use TF-IDF values for the terms as a feature to get into a vector space model. Import TF-IDF vectorizer from sklearn. Instantiate with a maximum of 5000 terms in your vocabulary. Fit and apply on the train set. Apply on the test set. Model building: Ordinary Logistic Regression Instantiate Logistic Regression from sklearn with default parameters. Fit into the train data. Make predictions for the train and the test set. Model evaluation: Accuracy, recall, and f_1 score. Report the accuracy on the train set. Report the recall on the train set: decent, high, or low. Get the f1 score on the train set. Looks like you need to adjust the class imbalance, as the model seems to focus on the 0s. Adjust the appropriate class in the LogisticRegression model. Train again with the adjustment and evaluate. Train the model on the train set. Evaluate the predictions on the train set: accuracy, recall, and f_1 score. Regularization and Hyperparameter tuning: Import GridSearch and StratifiedKFold because of class imbalance. Provide the parameter grid to choose for ‘C’ and ‘penalty’ parameters. Use a balanced class weight while instantiating the logistic regression. Find the parameters with the best recall in cross validation. Choose ‘recall’ as the metric for scoring. Choose stratified 4 fold cross validation scheme. Fit into the train set. What are the best parameters? Predict and evaluate using the best estimator. Use the best estimator from the grid search to make predictions on the test set. What is the recall on the test set for the toxic comments? What is the f_1 score?

consume-api-python-demo

Example project demonstrating how to consume open web APIs

coursera

Quiz & Assignment of Coursera

covid19-cases-prediction-india

I made prediction of covid19 cases in India for all the three cases (Confirmed, Recovered and deceased cases) that give the future covid19 cases prediction using Supervised Machine Learning specifically Regression Algorithm. The prediction is done on both the current datasets and the previous datasets of covid19. Also represents the data in graphical representation using various libraries of python.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.