kauvinlucas

followers: 10 following: 7 repos: 16 gists: 17

Name: Kauvin Lucas

Type: User

Bio: Data Analytics / Data Engineering / Artificial Intelligence

Location: Bolivia

Blog: kauvinlucas.com

Hi and welcome 👋

Repository list by topic:

Machine Learning Engineering
- Predicting Car Accident Severity
- Optimizing a Machine Learning Pipeline in Azure

Data Engineering
- PySpark stateful processing with Twitter API and Apache Kafka
- Running a Spark on Kubernetes application

Data Analytics
- Maven Unicorn Challenge
- ENEM 2019 microdata analysis with PySpark

Course Notes & Other Useful Material
- Big Data Science notes

Contributions
- Spark Study Club (Data Engineering LATAM)

Kauvin Lucas's Projects

big-data-science-notes

My notes on each module of Big Data Science, an online course offered by Semantix Brasil.

dio-analise-de-dados-com-pandas

In this repository, I present notebooks for exploratory data analysis and data visualization built in Python with the Pandas and Matplotlib libraries. This repository is a response to a challenge from the Digital Innovation One platform.

dio-google-cloud-dataproc

This repository contains the word-count files generated on Google Cloud by a Python script running inside Google DataProc, a managed cloud Big Data ecosystem. This repository is a response to a challenge from the Digital Innovation One platform.

docker-bigdata

Big Data Ecosystem Docker

fifa18-all-player-statistics

A complete catalog of every player in FIFA 18 and their full statistics.

imersao

jupyter-spark-enem-2019

In this project, I analyzed the scores of ENEM 2019, a standardized test used for admission to Brazilian colleges, in the context of existing socioeconomic disparities among participants. PySpark was used for data ingestion and transformation. Pandas, Statsmodels, Matplotlib/Seaborn/Folium, and Scikit-learn were used for descriptive analysis and data visualization.

kauvinlucas

maven-unicorn-challenge

A Python web app that serves a dashboard, built as my submission to the "Maven Unicorn Challenge", a data visualization challenge by Maven Analytics.

optimizing-a-pipeline-in-azure

The main goal of this project was to build and optimize an Azure ML pipeline using the Python SDK and a provided Scikit-learn Logistic Regression model to solve a classification problem. HyperDrive was used to tune the model's hyperparameters. The result was then compared with an Azure AutoML run to see which approach produces the better-tuned model.
