The emnist-classifier from xavierspycy

Photo by Chris Ried on Unsplash

graph TD;
    MyDomainKnowledge[My Domain Knowledge]-->MachineLearning[Machine Learning];
    MachineLearning[Machine Learning]-->StatisticalLearning[Statistical Learning];
    MachineLearning[Machine Learning]-->DeepLearning[Deep Learning];
    MyDomainKnowledge[My Domain Knowledge]-->SoftwareDevelopment[Software Development];
    DeepLearning[Deep Learning]-->ComputerVision[Computer Vision];
    DeepLearning[Deep Learning]-->NaturalLanguageProcessing[Natural Language Processing];
    DeepLearning[Deep Learning]-->Multimodality;

Properties	Skills
Domain Knownledge
Language
Data Analysis
Databases
Self-developed package
Statistic Learning Tools
Machine Learning Libraries
Deep Learning Frameworks
Visualization techniques

Curriculum Vitae

English | 中文版

Projects / Experience(s)

Currently working on : LLM Applications
Internship(s):
- Dec, 2023 - Feb, 2024:
  - AIGC Algorithm Intern @
  - Deployed SOTA algorithms such as Depth Estimation, Retrieval-augmented Generation, etc.
NumPy-Based Projects
PyTorch-Based Projects
TensorFlow-Based Projects
- Awesome Tutorials for TensorFlow2

NumPy-Based Projects

⭐ Self-Developed Library using NumPy

Various NumPy-based projects have been successfully integrated into my own open-source Python library, named MLForce. This library is readily accessible on GitHub and the PyPI Community.

Photo by Google DeepMind on Unsplash

⭐ Multilayer Perceptron from Scratch using NumPy

This project embodies a robust implementation of multilayer perceptron classifiers, entirely built upon the powerful NumPy library. We have successfully demonstrated its efficacy in our own unique task. Moving forward, our primary objective is to gradually enhance our model's versatility, ensuring it operates optimally across a diverse array of use cases.

Advantages of our implementation:

Easy to construct:

layers = [
    Input(input_dim=2),
    Dense(units=4, activation='leaky_relu', init='kaiming_normal', init_params={'mode': 'out'}),
    Dense(units=3, activation='hardswish', init='xavier_normal'),
    Dense(units=2, activation='relu', init='kaiming_normal', init_params={'mode': 'in'}),
    Dense(units=1, activation='tanh', init='xavier_uniform')
]
mlp = MultilayerPerceptron(layers)

Easy and stable to train

mlp.compile(optimizer='Adam',
            metrics=['MeanSquareError'])
mlp.fit(X, y, epochs=3, batch_size=8, use_progress_bar=True)

Loss

Great results

Decision boundary

Capability of dealing with complex datasets (10 classes, 128 features, 50,000 samples)

Smooth optimization procedure in 600 epochs

The architecture of this model is:

layers = [
    Input(input_dim=128),
    Dense(units=120, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.25),
    Dense(units=112,  init='kaiming_uniform'),
    Dropout(dropout_rate=0.20),
    Dense(units=96, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.15),
    Dense(units=64, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.10),
    Dense(units=48, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.05),
    Dense(units=32, activation='elu', init='kaiming_uniform'),
    Dense(units=24, activation='elu', init='kaiming_uniform'),
    Dense(units=16, activation='elu', init='kaiming_uniform'),
    Dense(units=10, activation='softmax')
]
mlp = MultilayerPerceptron(layers)
optimizer = Adam(lr=1e-3, weight_decay=0.2)
scheduler = MultiStepLR(optimizer, milestones=[20, 40, 60, 80], gamma=0.8)
earlystopping = EarlyStopping(accuracy, patience=10, mode='max', restore_best_weights=True, start_from_epoch=20)
mlp.compile(optimizer=optimizer,
            metrics=['CrossEntropy', 'Accuracy'],
            scheduler=scheduler
)
mlp.fit(X_train, y_train, 
        epochs=90, batch_size=128, 
        validation_data=(X_test, y_test), use_progress_bar=True, 
        callbacks=[earlystopping]
)

⭐ Non-negative Matrix Factorization using NumPy

This project implements nine different Non-negative Matrix Factorization (NMF) algorithms and compares the robustness of each algorithm to five various types of noise in real-world data applications.

Well-reconstructed effects

Image reconstruction

Sufficient experiments

We conduct a seires of experiments, thus when developing your own algorithms, these results could act as a baseline. The results of the experiments (2 datasets $\times$ 5 noise types $\times$ 2 noise levels $\times$ 5 random seeds implicitly) are displayed in the repository.

Flexible development
Our development framework empowers you to effortlessly create your own NMF algorithms with minimal Python scripting:

import numpy as np
from algorithm.nmf import BasicNMF

class ExampleNMF(BasicNMF):
    name = 'Example'
    # To tailor a unique NMF algorithm, subclass BasicNMF and redefine matrix_init and update methods.
    def matrix_init(self, X, n_components, random_state=None):
        # Implement your initialization logic here.
        # Although we provide built-in methods, crafting a bespoke initialization can markedly boost performance.
        # D, R = <your_initialization_logic>
        # D, R = np.array(D), np.array(R)
        return D, R  # Ensure D, R are returned.

    def update(self, X, **kwargs):
        # Implement the logic for iterative updates here.
        # Modify self.D, self.R as per your algorithm's logic.
        # flag = <convergence_criterion>
        return flag  # Return True if converged, else False.

Mature pipeline

Our framework offers well-established pipelines, accommodating both standard and customized NMF tests. To utilize our existing NMF implementations, simply integrate them into our pipeline as demonstrated below:

from algorithm.pipeline import Pipeline

pipeline = Pipeline(nmf='L1NormRegularizedNMF', 
                    dataset='ORL',
                    reduce=1,
                    noise_type='uniform',
                    noise_level=0.02,
                    random_state=3407, 
                    scaler='MinMax')
# Run the pipeline
pipeline.execute()
pipeline.evaluate()

For personalized NMF models, the nmf parameter accepts a BasicNMF object. You can seamlessly insert your own NMF model into our pipeline to evaluate its performance:

pipeline = Pipeline(nmf=ExampleNMF(),
                    # Insert remaining configurations
)

Multiprocessing experiments (Latest release)

🚀 Latest Release: We've harnessed the power of multiprocessing for extensive experiments, significantly enhancing efficiency. This approach has halved the overall experiment duration, reducing it to 30% ~ 50% of the time it would take to run each experiment sequentially.

For a comprehensive analysis of your algorithm, our platform enables conducting multiple experiments across various datasets:

from algorithm.pipeline import Experiment

exp = Experiment()
# Once you build the data container
# You can choose an NMF algorithm and execute the experiment
exp.choose('L1NormRegularizedNMF')
# This step is very time-consuming, please be patient.
# If you achieve a better performance, congratulations! 
# You can share your results with us.
# Similarly, you can replace 'L1NormRegularizedNMF' with other your customized NMF algorithm
exp.execute()

Interactive algorithm interface (Latest release)

Demo

Note that the initial parameter in these experiments can also be a BasicNMF object, allowing the direct integration of your custom NMF model for thorough evaluation and testing.

DON'T HESITATE TO DEVELOP YOUR OWN ALGORITHM!!!

PyTorch-Based Projects

Photo by Alex Knight on Unsplash

⭐ EMNIST Handwritten Character Classification

This project aims to reproduce various convolutional neural networks and modify them to our specific requirements.

We trained our models on a subset of the original datasets, which accounted for approximately 12% of the data. Surprisingly, our pre-trained models demonstrated remarkable generalization capabilities. During testing on the entire dataset, they exhibited excellent performance, showcasing their ability to achieve similar results when trained on just 10% of the data as compared to training on 100%.

Moreover, our models have proven to be transferable to downstream tasks, such as the MNIST datasets. By employing the architectures we implemented, we were able to attain an accuracy of over 99% on both the PyTorch and Kaggle MNIST datasets. Furthermore, by partially utilizing the trained parameters and unfreezing the parameters of the fully connected layers, we achieved an impressive accuracy of over 95%.

Performance of different CNNs on the training set

	AlexNet	VGGNet	SpinalNet	ResNet
Accuracy	87.95%	89.80%	87.15%	89.28%
Precision	87.62%	90.01%	86.18%	89.24%
Recall	87.95%	89.80%	87.15%	89.28%
F1 score	86.59%	88.42%	85.28%	88.30%

Performance of different CNNs on the test set

	AlexNet	VGGNet	SpinalNet	ResNet
Accuracy	86.96%	87.24%	85.92%	86.88%
Precision	85.55%	86.43%	85.92%	86.88%
Recall	86.96%	87.24%	85.92%	86.88%
F1 score	85.58%	85.66%	84.07%	85.68%

Effects of one model

⭐ CAT - A Visual-Text Multimodal Classifier

This project involves a multi-label multi-classification problem. We deployed four pre-trained image models and two pre-trained text models. To enhance performance, we developed 12 multi-modal models using self-attention and cross-attention mechanisms. The project poster showcases some valuable techniques and intriguing discoveries.

CAT (Convolution, Attention and Transformer) architecture

Project Poster

⭐ Robust Traniners for Noisy Labels

This project is an experimental repository focusing on dealing with datasets containing a high level of noisy labels (50% and above). This repository features experiments conducted on the FashionMNIST and CIFAR datasets using the ResNet34 as the baseline classifier.

The repository explores various training strategies ('trainers'), including ForwardLossCorrection, CoTeaching, JoCoR, and O2UNet. Specifically, for datasets with unknown transition matrices, DualT is employed as the Transition Matrix Estimator. Given the computational complexity and practical performance considerations, our experiments primarily focus on ForwardLossCorrection and CoTeaching. We conducted multiple experiments with different random seeds to compare these two methods.

Initial explorations on FashionMNIST0.5 with JoCoR and O2UNet have shown promising results. This repository serves as a resource for those interested in robust machine learning techniques under challenging conditions of high label noise.

Meaningful Loss Trends

Loss Trend 1

Loss Trend 2

Persuasive Results

Actual Transition Matrix of FashionMNIST0.5

0.5	0.2	0.3
0.3	0.5	0.2
0.2	0.3	0.5

Estimated Transition Matrix of FashionMNIST0.5

0.473	0.209	0.309
0.306	0.485	0.232
0.221	0.306	0.460

Actual Transition Matrix of FashionMNIST0.6

0.4	0.3	0.3
0.3	0.4	0.3
0.3	0.3	0.4

Estimated Transition Matrix of FashionMNIST0.6

0.407	0.295	0.298
0.297	0.394	0.308
0.301	0.310	0.388

TensorFlow-Based Projects

⭐ Awesome Tutorials for TensorFlow2

Certification:

Natural Language Processing Specialization - DeepLearning.AI, Oct 2023 - Badge

Deep Learning Specialization - DeepLearning.AI, Aug 2023- Badge

Mathematics for Machine Learning and Data Science Specialization - DeepLearning.AI, Aug 2023- Badge

Applied Data Science with Python Specialization - University of Michigan, Jul 2023 - Badge

Machine Learning Specialization - DeepLearning.AI, Stanford University, Jul 2023 - Badge

Mathematics for Machine Learning Specialization - Imperial College London, Jun 2023 - Badge

Expressway to Data Science: Python Programming Specialization - University of Colorado Boulder, Dec 2022 - Badge

Python 3 Programming Specializationn - University of Michigan, Dec 2022 - Badge

Introduction to Scripting in Python Specializationn - Rice University, Nov 2022 - Badge

Statistics with Python Specialization - University of Michigan, Nov 2022 - Badge

Excel Skills for Data Analytics and Visualization Specialization - Macquarie University, Oct 2022 - Badge

Python for Everybody Specialization - University of Michigan, Oct 2022 - Badge

Excel Skills for Business Specialization - Macquarie University, Sep 2022 - Badge

How to Reach me:

Email: [email protected]

LinkedIn: Jiarui XU

xavierspycy / emnist-classifier Goto Github PK

emnist-classifier's Introduction