Abir Oumghar's Projects
Explorations of k-means clustering for Big Data, featuring sequential, streaming, and distributed implementations tailored for scalability and efficiency.
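As a rough illustration of the streaming variant mentioned above, here is a minimal single-pass (online) k-means sketch in plain Python. It is not taken from the repository: the function name `streaming_kmeans`, the choice to seed centroids with the first `k` points, and the running-mean update are all assumptions made for this example.

```python
def streaming_kmeans(points, k):
    """One-pass online k-means (hypothetical helper, not the repo's code).

    Each incoming point is assigned to its nearest centroid, and that
    centroid is moved toward the point with step size 1 / (points seen),
    which keeps it equal to the running mean of its assigned points.
    """
    stream = iter(points)
    # Assumption: the first k points of the stream seed the centroids.
    centroids = [list(next(stream)) for _ in range(k)]
    counts = [1] * k

    for p in stream:
        # Index of the nearest centroid by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[j] += 1
        eta = 1.0 / counts[j]
        centroids[j] = [c + eta * (a - c) for c, a in zip(centroids[j], p)]

    return centroids, counts


if __name__ == "__main__":
    data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    centers, sizes = streaming_kmeans(data, k=2)
    print(centers, sizes)
```

Because each point is seen once and only `k` centroids and counters are kept in memory, the same update rule works on data too large to load at once. The result does depend on the order of the stream and on the seeding.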
ChemNLP-MaterialsAnalysis: Enhancing materials chemistry research with advanced NLP. Key features: integrates with arXiv and PubChem datasets; applies BERT embeddings, KMeans clustering, and dimensionality reduction (t-SNE, UMAP, PCA); uses pickle for efficient data handling; aims for deeper insights and accelerated discovery in materials science.
This project presents a movie recommendation system utilizing the AutoRec model with Cornac, aimed at delivering personalized movie recommendations based on user preferences.
A Python study of how PCA, t-SNE, and UMAP affect clustering of PubMed data, with BBC News and Web Content as optional datasets. It examines the influence of dimensionality reduction on K-means cluster quality.
EnergyClusterAnalytics is an academic project that showcases the power of unsupervised learning in analyzing residential electricity consumption. Utilizing PCA, clustering, and Binary Segmentation Search, it identifies unique consumption patterns to inform energy management strategies.
Embark on a time series journey exploring electricity usage patterns with XGBoost and Random Forest. Using UC Irvine's data plus weather and holiday insights, this project aims to forecast demand and enhance energy planning. Dive into our predictive analytics adventure for smarter energy management.
html code
This repo contains Jupyter notebooks that facilitate ETL processes on Google Cloud Platform, with practical examples and a ready-to-use dataset for easy adaptation and testing in any GCP environment.
A deep dive into optimizing MNIST digit predictions using semi-supervised learning with just 100 labeled samples. Utilizes pseudo-labels to bridge the gap between labeled and unlabeled data, leveraging TensorFlow for model implementation. A compact showcase of enhancing model accuracy with minimal labeled data.
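The pseudo-labeling idea described above can be sketched without any deep learning framework. The example below swaps the project's TensorFlow model for a simple nearest-centroid classifier so that it stays self-contained. All names (`centroids_by_class`, `predict_with_margin`, `pseudo_label`), the margin-based confidence score, and the `threshold` and `rounds` parameters are assumptions for illustration, not the repository's implementation.

```python
def centroids_by_class(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_with_margin(cents, x):
    """Return (nearest label, confidence in [0, 1]); needs >= 2 classes."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in cents.items()
    )
    (d1, best), (d2, _) = dists[0], dists[1]
    # Assumed confidence score: relative gap between the two closest classes.
    return best, (d2 - d1) / (d2 + d1 + 1e-12)


def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.5, rounds=3):
    """Grow the labeled set with confident predictions on unlabeled data."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        cents = centroids_by_class(X, y)
        remaining = []
        for x in pool:
            label, conf = predict_with_margin(cents, x)
            if conf >= threshold:
                X.append(x)          # accept the pseudo-label
                y.append(label)
            else:
                remaining.append(x)  # too uncertain, keep unlabeled
        if len(remaining) == len(pool):
            break                    # nothing new was labeled
        pool = remaining
    return centroids_by_class(X, y), pool
```

In a neural-network setting the same loop would typically use the softmax probability as the confidence score and retrain the model each round. The threshold controls the trade-off between adding more training data and adding wrong labels.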
This repository explores the application of supervised learning techniques to two key domains: banking credit data and relational datasets (Cora, CiteSeer, PubMed). It aims to tackle real-world challenges through a comparative analysis of methods such as Naive Bayes, KNN, SVM, and more, all implemented in Python.