
emnist-classifier's Introduction



graph TD;
    MyDomainKnowledge[My Domain Knowledge]-->MachineLearning[Machine Learning];
    MachineLearning[Machine Learning]-->StatisticalLearning[Statistical Learning];
    MachineLearning[Machine Learning]-->DeepLearning[Deep Learning];
    MyDomainKnowledge[My Domain Knowledge]-->SoftwareDevelopment[Software Development];
    DeepLearning[Deep Learning]-->ComputerVision[Computer Vision];
    DeepLearning[Deep Learning]-->NaturalLanguageProcessing[Natural Language Processing];
    DeepLearning[Deep Learning]-->Multimodality;
Properties / Skills

  • Domain Knowledge: Deep Learning, Natural Language Processing, Software Development
  • Language: Python
  • Data Analysis: NumPy, Pandas, SciPy, Matplotlib
  • Databases: MySQL
  • Self-developed package: mlforce (Machine Learning Force)
  • Statistical Learning Tools: R programming, RStudio
  • Machine Learning Libraries: Scikit-Learn
  • Deep Learning Frameworks: PyTorch, TensorFlow, HuggingFace
  • Visualization Techniques: Tableau, PowerBI, D3, Tulip, yEd, Gephi

Curriculum Vitae

English | Chinese (中文版)

Projects / Experience

NumPy-Based Projects

⭐ Self-Developed Library using NumPy

Various NumPy-based projects have been integrated into my own open-source Python library, MLForce. The library is available on GitHub and on PyPI.
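
Getting started is straightforward (a minimal sketch; the import name is assumed to match the PyPI package name mlforce):

# Install from PyPI first:
#   pip install mlforce
# The import name below is an assumption based on the package name.
import mlforce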


This project provides a robust implementation of multilayer perceptron classifiers built entirely on NumPy. We have demonstrated its effectiveness on our own task; going forward, our primary objective is to gradually broaden the model's versatility so that it performs well across a diverse range of use cases.

Advantages of our implementation:

  • Easy to construct:
layers = [
    Input(input_dim=2),
    Dense(units=4, activation='leaky_relu', init='kaiming_normal', init_params={'mode': 'out'}),
    Dense(units=3, activation='hardswish', init='xavier_normal'),
    Dense(units=2, activation='relu', init='kaiming_normal', init_params={'mode': 'in'}),
    Dense(units=1, activation='tanh', init='xavier_uniform')
]
mlp = MultilayerPerceptron(layers)
  • Easy and stable to train
mlp.compile(optimizer='Adam',
            metrics=['MeanSquareError'])
mlp.fit(X, y, epochs=3, batch_size=8, use_progress_bar=True)

(Figure: training loss curve)
  • Great results

(Figure: decision boundary)
  • Capable of handling complex datasets (10 classes, 128 features, 50,000 samples)

(Figure: smooth optimization over 600 epochs)

The architecture of this model is:

layers = [
    Input(input_dim=128),
    Dense(units=120, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.25),
    Dense(units=112, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.20),
    Dense(units=96, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.15),
    Dense(units=64, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.10),
    Dense(units=48, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.05),
    Dense(units=32, activation='elu', init='kaiming_uniform'),
    Dense(units=24, activation='elu', init='kaiming_uniform'),
    Dense(units=16, activation='elu', init='kaiming_uniform'),
    Dense(units=10, activation='softmax')
]
mlp = MultilayerPerceptron(layers)
optimizer = Adam(lr=1e-3, weight_decay=0.2)
scheduler = MultiStepLR(optimizer, milestones=[20, 40, 60, 80], gamma=0.8)
earlystopping = EarlyStopping(accuracy, patience=10, mode='max', restore_best_weights=True, start_from_epoch=20)
mlp.compile(optimizer=optimizer,
            metrics=['CrossEntropy', 'Accuracy'],
            scheduler=scheduler
)
mlp.fit(X_train, y_train, 
        epochs=90, batch_size=128, 
        validation_data=(X_test, y_test), use_progress_bar=True, 
        callbacks=[earlystopping]
)

This project implements nine Non-negative Matrix Factorization (NMF) algorithms and compares each algorithm's robustness to five types of noise found in real-world data.

  • Well-reconstructed effects

(Figure: image reconstruction)

  • Sufficient experiments

We conducted a series of experiments whose results can serve as baselines when you develop your own algorithms. The full results (2 datasets $\times$ 5 noise types $\times$ 2 noise levels $\times$ 5 random seeds) are available in the repository.

  • Flexible development
    Our development framework empowers you to effortlessly create your own NMF algorithms with minimal Python scripting:
import numpy as np
from algorithm.nmf import BasicNMF

class ExampleNMF(BasicNMF):
    name = 'Example'
    # To tailor a unique NMF algorithm, subclass BasicNMF and redefine matrix_init and update methods.
    def matrix_init(self, X, n_components, random_state=None):
        # Implement your initialization logic here.
        # Although we provide built-in methods, crafting a bespoke initialization can markedly boost performance.
        # D, R = <your_initialization_logic>
        # D, R = np.array(D), np.array(R)
        return D, R  # Ensure D, R are returned.

    def update(self, X, **kwargs):
        # Implement the logic for iterative updates here.
        # Modify self.D, self.R as per your algorithm's logic.
        # flag = <convergence_criterion>
        return flag  # Return True if converged, else False.

  • Mature pipeline

Our framework offers well-established pipelines, accommodating both standard and customized NMF tests. To utilize our existing NMF implementations, simply integrate them into our pipeline as demonstrated below:

from algorithm.pipeline import Pipeline

pipeline = Pipeline(nmf='L1NormRegularizedNMF', 
                    dataset='ORL',
                    reduce=1,
                    noise_type='uniform',
                    noise_level=0.02,
                    random_state=3407, 
                    scaler='MinMax')
# Run the pipeline
pipeline.execute()
pipeline.evaluate()

For personalized NMF models, the nmf parameter accepts a BasicNMF object. You can seamlessly insert your own NMF model into our pipeline to evaluate its performance:

pipeline = Pipeline(nmf=ExampleNMF(),
                    # Insert remaining configurations
)

  • Multiprocessing experiments (Latest release)

🚀 Latest Release: We've harnessed multiprocessing for extensive experiments, significantly improving efficiency: the overall experiment duration drops to roughly 30% to 50% of the time it would take to run each experiment sequentially.

For a comprehensive analysis of your algorithm, our platform enables conducting multiple experiments across various datasets:

from algorithm.pipeline import Experiment

exp = Experiment()
# Once you have built the data container,
# you can choose an NMF algorithm and execute the experiment.
exp.choose('L1NormRegularizedNMF')
# This step is very time-consuming; please be patient.
# If you achieve better performance, congratulations!
# You are welcome to share your results with us.
# Similarly, you can replace 'L1NormRegularizedNMF' with your own customized NMF algorithm.
exp.execute()

  • Interactive algorithm interface (Latest release)

(Figure: interactive demo)

Note that the first argument in these experiments can also be a BasicNMF object, allowing you to plug your custom NMF model directly into the experiment for thorough evaluation and testing.
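
For example, reusing the ExampleNMF subclass sketched earlier:

exp = Experiment()
exp.choose(ExampleNMF())  # pass an instance instead of a registered algorithm name
exp.execute()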

DON'T HESITATE TO DEVELOP YOUR OWN ALGORITHM!!!

PyTorch-Based Projects


This project reproduces various convolutional neural networks and adapts them to our specific requirements.

We trained our models on a subset comprising approximately 12% of the original datasets. Surprisingly, the pre-trained models generalized remarkably well: when tested on the entire dataset, they performed on par with models trained on the full data, despite having seen only about a tenth of it.

Moreover, the models transfer well to downstream tasks such as MNIST. Training the architectures we implemented from scratch attains over 99% accuracy on both the PyTorch and Kaggle MNIST datasets; reusing the trained parameters and fine-tuning only the fully connected layers still achieves over 95% accuracy.
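
As a rough illustration of that transfer step (a sketch, not the project's actual code; a torchvision ResNet stands in for our own trained models):

import torch
import torch.nn as nn
from torchvision import models

# Stand-in backbone; in the project, our own pre-trained weights would be loaded instead.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every parameter, then replace the fully connected head with a fresh,
# trainable layer for the 10 MNIST classes.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the unfrozen head is handed to the optimizer.
# (Grayscale MNIST images must be expanded to 3 channels before entering this backbone.)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)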

Performance of different CNNs on the training set:

           AlexNet   VGGNet    SpinalNet   ResNet
Accuracy   87.95%    89.80%    87.15%      89.28%
Precision  87.62%    90.01%    86.18%      89.24%
Recall     87.95%    89.80%    87.15%      89.28%
F1 score   86.59%    88.42%    85.28%      88.30%

Performance of different CNNs on the test set:

           AlexNet   VGGNet    SpinalNet   ResNet
Accuracy   86.96%    87.24%    85.92%      86.88%
Precision  85.55%    86.43%    85.92%      86.88%
Recall     86.96%    87.24%    85.92%      86.88%
F1 score   85.58%    85.66%    84.07%      85.68%

(Figure: effects of one model)

This project tackles a multi-label, multi-class classification problem. We deployed four pre-trained image models and two pre-trained text models, and to boost performance we built 12 multi-modal models using self-attention and cross-attention mechanisms, sketched below. The project poster showcases some valuable techniques and intriguing discoveries.
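
A minimal sketch of the cross-attention fusion idea (the class name, dimensions, and head count are illustrative assumptions, not the project's actual architecture):

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    # Text features attend to image features; the symmetric direction works analogously.
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, image_feats):
        # Queries come from the text modality; keys and values from the image modality.
        fused, _ = self.attn(text_feats, image_feats, image_feats)
        return self.norm(text_feats + fused)  # residual connection + layer norm

# Usage: (batch, seq_len, dim) features from the respective pre-trained encoders.
fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 16, 256), torch.randn(2, 49, 256))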

(Figure: CAT (Convolution, Attention and Transformer) architecture)

(Figure: project poster)

This project is an experimental repository focused on datasets with a high level of noisy labels (50% and above). It features experiments on the FashionMNIST and CIFAR datasets using ResNet34 as the baseline classifier.

The repository explores various training strategies ('trainers'), including ForwardLossCorrection, CoTeaching, JoCoR, and O2UNet. For datasets with unknown transition matrices, DualT is employed as the transition matrix estimator. Given computational complexity and practical performance, our experiments focus primarily on ForwardLossCorrection and CoTeaching, which we compare across multiple random seeds.
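
To make the ForwardLossCorrection idea concrete, here is a minimal sketch (our repository's implementation may differ): the model's predicted clean-label distribution is pushed through the noise transition matrix T, where T[i][j] = P(observed label j | true label i), before computing the loss against the noisy labels.

import torch
import torch.nn.functional as F

def forward_corrected_loss(logits, noisy_labels, T):
    # Predicted distribution over *clean* labels.
    clean_probs = F.softmax(logits, dim=1)
    # Predicted distribution over *noisy* labels, via the transition matrix.
    noisy_probs = clean_probs @ T
    # Negative log-likelihood against the observed (noisy) labels.
    return F.nll_loss(torch.log(noisy_probs + 1e-12), noisy_labels)

# e.g., the actual FashionMNIST0.5 transition matrix listed below:
T = torch.tensor([[0.5, 0.2, 0.3],
                  [0.3, 0.5, 0.2],
                  [0.2, 0.3, 0.5]])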

Initial explorations on FashionMNIST0.5 with JoCoR and O2UNet have shown promising results. This repository serves as a resource for those interested in robust machine learning techniques under challenging conditions of high label noise.

  • Meaningful Loss Trends

(Figures: loss trends of the two methods)
  • Persuasive Results

Actual transition matrix of FashionMNIST0.5:
0.5  0.2  0.3
0.3  0.5  0.2
0.2  0.3  0.5

Estimated transition matrix of FashionMNIST0.5:
0.473  0.209  0.309
0.306  0.485  0.232
0.221  0.306  0.460

Actual transition matrix of FashionMNIST0.6:
0.4  0.3  0.3
0.3  0.4  0.3
0.3  0.3  0.4

Estimated transition matrix of FashionMNIST0.6:
0.407  0.295  0.298
0.297  0.394  0.308
0.301  0.310  0.388

TensorFlow-Based Projects


Certifications (Coursera):

Natural Language Processing Specialization - DeepLearning.AI, Oct 2023 - Badge

Deep Learning Specialization - DeepLearning.AI, Aug 2023 - Badge

Mathematics for Machine Learning and Data Science Specialization - DeepLearning.AI, Aug 2023 - Badge

Applied Data Science with Python Specialization - University of Michigan, Jul 2023 - Badge

Machine Learning Specialization - DeepLearning.AI, Stanford University, Jul 2023 - Badge

Mathematics for Machine Learning Specialization - Imperial College London, Jun 2023 - Badge

Expressway to Data Science: Python Programming Specialization - University of Colorado Boulder, Dec 2022 - Badge

Python 3 Programming Specialization - University of Michigan, Dec 2022 - Badge

Introduction to Scripting in Python Specialization - Rice University, Nov 2022 - Badge

Statistics with Python Specialization - University of Michigan, Nov 2022 - Badge

Excel Skills for Data Analytics and Visualization Specialization - Macquarie University, Oct 2022 - Badge

Python for Everybody Specialization - University of Michigan, Oct 2022 - Badge

Excel Skills for Business Specialization - Macquarie University, Sep 2022 - Badge


How to Reach me:

Email: [email protected]

LinkedIn: Jiarui XU

Thank you for visiting ❤️

