Photo by Chris Ried on Unsplash
graph TD;
MyDomainKnowledge[My Domain Knowledge]-->MachineLearning[Machine Learning];
MachineLearning[Machine Learning]-->StatisticalLearning[Statistical Learning];
MachineLearning[Machine Learning]-->DeepLearning[Deep Learning];
MyDomainKnowledge[My Domain Knowledge]-->SoftwareDevelopment[Software Development];
DeepLearning[Deep Learning]-->ComputerVision[Computer Vision];
DeepLearning[Deep Learning]-->NaturalLanguageProcessing[Natural Language Processing];
DeepLearning[Deep Learning]-->Multimodality;
Properties | Skills |
---|---|
Domain Knowledge | |
Language | |
Data Analysis | |
Databases | |
Self-developed package | |
Statistical Learning Tools | |
Machine Learning Libraries | |
Deep Learning Frameworks | |
Visualization Techniques | |
- Currently working on: LLM Applications
- Internship(s):
Various NumPy-based projects have been integrated into my own open-source Python library, named MLForce. The library is available on GitHub and PyPI.
This project is an implementation of multilayer perceptron classifiers built entirely on NumPy. We have demonstrated its efficacy on our own task. Moving forward, our primary objective is to broaden the model's versatility so that it performs well across a wider range of use cases.
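To give a feel for what a NumPy-only dense layer involves, here is a minimal sketch of one ReLU layer with Kaiming-normal initialization, a forward pass, and the matching backward pass. It is not taken from MLForce; the function names `kaiming_normal`, `dense_forward`, and `dense_backward` are hypothetical and chosen only for illustration.

```python
import numpy as np

def kaiming_normal(fan_in, fan_out, rng):
    """He-normal initialization: zero mean, std = sqrt(2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def dense_forward(X, W, b):
    """Affine transform followed by ReLU; Z is returned for the backward pass."""
    Z = X @ W + b
    return np.maximum(Z, 0.0), Z

def dense_backward(dA, Z, X, W):
    """Gradients of a ReLU dense layer w.r.t. W, b, and the layer input X."""
    dZ = dA * (Z > 0)      # ReLU passes gradient only where Z was positive
    dW = X.T @ dZ
    db = dZ.sum(axis=0)
    dX = dZ @ W.T
    return dW, db, dX

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))          # batch of 8 samples, 2 features
W = kaiming_normal(2, 4, rng)
b = np.zeros(4)
A, Z = dense_forward(X, W, b)
dW, db, dX = dense_backward(np.ones_like(A), Z, X, W)
```

Stacking several such layers and feeding `dX` of one layer in as `dA` of the previous one is the usual way backpropagation is chained by hand.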
Advantages of our implementation:
- Easy to construct:
layers = [
    Input(input_dim=2),
    Dense(units=4, activation='leaky_relu', init='kaiming_normal', init_params={'mode': 'out'}),
    Dense(units=3, activation='hardswish', init='xavier_normal'),
    Dense(units=2, activation='relu', init='kaiming_normal', init_params={'mode': 'in'}),
    Dense(units=1, activation='tanh', init='xavier_uniform')
]
mlp = MultilayerPerceptron(layers)
- Easy and stable to train
mlp.compile(optimizer='Adam',
            metrics=['MeanSquareError'])
mlp.fit(X, y, epochs=3, batch_size=8, use_progress_bar=True)
- Great results
- Capability of dealing with complex datasets (10 classes, 128 features, 50,000 samples)
The architecture of this model is:
layers = [
    Input(input_dim=128),
    Dense(units=120, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.25),
    Dense(units=112, init='kaiming_uniform'),
    Dropout(dropout_rate=0.20),
    Dense(units=96, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.15),
    Dense(units=64, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.10),
    Dense(units=48, activation='elu', init='kaiming_uniform'),
    Dropout(dropout_rate=0.05),
    Dense(units=32, activation='elu', init='kaiming_uniform'),
    Dense(units=24, activation='elu', init='kaiming_uniform'),
    Dense(units=16, activation='elu', init='kaiming_uniform'),
    Dense(units=10, activation='softmax')
]
mlp = MultilayerPerceptron(layers)
optimizer = Adam(lr=1e-3, weight_decay=0.2)
scheduler = MultiStepLR(optimizer, milestones=[20, 40, 60, 80], gamma=0.8)
earlystopping = EarlyStopping(accuracy, patience=10, mode='max', restore_best_weights=True, start_from_epoch=20)
mlp.compile(optimizer=optimizer,
            metrics=['CrossEntropy', 'Accuracy'],
            scheduler=scheduler
            )
mlp.fit(X_train, y_train,
        epochs=90, batch_size=128,
        validation_data=(X_test, y_test), use_progress_bar=True,
        callbacks=[earlystopping]
        )
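The learning-rate schedule configured above can be read as a step function. The sketch below assumes that MLForce's MultiStepLR follows the common convention (as in PyTorch's scheduler of the same name) of multiplying the rate by gamma once each milestone epoch is reached. The helper `multistep_lr` is hypothetical and not part of the library.

```python
def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate in effect at `epoch` (0-indexed) under multi-step decay."""
    # Count how many milestones have already been reached.
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

milestones = [20, 40, 60, 80]
# One learning rate per epoch for the 90-epoch run shown above.
schedule = [multistep_lr(1e-3, milestones, 0.8, e) for e in range(90)]
```

Under this assumption the rate stays at 1e-3 for the first 20 epochs and ends the run at 0.8^4 (about 0.41) of its initial value.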
This project implements nine different Non-negative Matrix Factorization (NMF) algorithms and compares the robustness of each algorithm to five types of noise in real-world data applications.
We conducted a series of experiments whose results can serve as a baseline when you develop your own algorithms. The results of the experiments (2 datasets
- Flexible development
Our development framework empowers you to effortlessly create your own NMF algorithms with minimal Python scripting:
import numpy as np
from algorithm.nmf import BasicNMF
class ExampleNMF(BasicNMF):
    name = 'Example'
    # To tailor a unique NMF algorithm, subclass BasicNMF and redefine matrix_init and update methods.
    def matrix_init(self, X, n_components, random_state=None):
        # Implement your initialization logic here.
        # Although we provide built-in methods, crafting a bespoke initialization can markedly boost performance.
        # D, R = <your_initialization_logic>
        # D, R = np.array(D), np.array(R)
        return D, R  # Ensure D, R are returned.
    def update(self, X, **kwargs):
        # Implement the logic for iterative updates here.
        # Modify self.D, self.R as per your algorithm's logic.
        # flag = <convergence_criterion>
        return flag  # Return True if converged, else False.
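As an illustration of what an update step might compute, the sketch below applies the classic Lee and Seung multiplicative updates for the Frobenius-norm objective, using the same D (dictionary) and R (representation) naming as the template. It is a standalone NumPy example that does not depend on BasicNMF; the function `multiplicative_update` and the random test matrix are hypothetical.

```python
import numpy as np

def multiplicative_update(X, D, R, eps=1e-10):
    """One multiplicative update for min ||X - D R||_F^2 with D, R >= 0."""
    # eps guards against division by zero; the updates keep D and R non-negative.
    R = R * (D.T @ X) / (D.T @ D @ R + eps)
    D = D * (X @ R.T) / (D @ R @ R.T + eps)
    return D, R

rng = np.random.default_rng(3407)
X = rng.random((20, 30))     # non-negative data matrix
D = rng.random((20, 5))      # 5 components
R = rng.random((5, 30))

errors = []
for _ in range(50):
    D, R = multiplicative_update(X, D, R)
    errors.append(np.linalg.norm(X - D @ R))  # reconstruction error per iteration
```

A convergence flag for the template could then be derived from how little `errors` changes between consecutive iterations.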
Our framework offers well-established pipelines, accommodating both standard and customized NMF tests. To utilize our existing NMF implementations, simply integrate them into our pipeline as demonstrated below:
from algorithm.pipeline import Pipeline
pipeline = Pipeline(nmf='L1NormRegularizedNMF',
                    dataset='ORL',
                    reduce=1,
                    noise_type='uniform',
                    noise_level=0.02,
                    random_state=3407,
                    scaler='MinMax')
# Run the pipeline
pipeline.execute()
pipeline.evaluate()
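To make the scaler and noise settings more concrete, here is a rough sketch of what min-max scaling followed by uniform noise could look like. How the pipeline actually defines `noise_level` is not stated here, so the interpretation below (noise drawn from U(-noise_level, noise_level) and clipped back to [0, 1] to keep the data non-negative) is an assumption, and both helper functions are hypothetical.

```python
import numpy as np

def minmax_scale(X):
    """Rescale X to [0, 1] using its global minimum and maximum."""
    lo, hi = X.min(), X.max()
    return (X - lo) / (hi - lo)

def add_uniform_noise(X, noise_level, rng):
    """Add U(-noise_level, noise_level) noise and clip to [0, 1] (assumed convention)."""
    noise = rng.uniform(-noise_level, noise_level, size=X.shape)
    return np.clip(X + noise, 0.0, 1.0)

rng = np.random.default_rng(3407)
X = rng.integers(0, 256, size=(10, 12)).astype(float)  # stand-in for image pixels
X_scaled = minmax_scale(X)
X_noisy = add_uniform_noise(X_scaled, noise_level=0.02, rng=rng)
```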
For personalized NMF models, the nmf parameter accepts a BasicNMF object. You can seamlessly insert your own NMF model into our pipeline to evaluate its performance:
pipeline = Pipeline(nmf=ExampleNMF(),
# Insert remaining configurations
)
🚀 Latest Release: We now use multiprocessing for extensive experiments, which significantly improves efficiency. The overall experiment duration drops to roughly 30% to 50% of the time it would take to run each experiment sequentially.
For a comprehensive analysis of your algorithm, our platform enables conducting multiple experiments across various datasets:
from algorithm.pipeline import Experiment
exp = Experiment()
# Once you build the data container
# You can choose an NMF algorithm and execute the experiment
exp.choose('L1NormRegularizedNMF')
# This step is very time-consuming, please be patient.
# If you achieve a better performance, congratulations!
# You can share your results with us.
# Similarly, you can replace 'L1NormRegularizedNMF' with your own customized NMF algorithm
exp.execute()
Note that the initial parameter in these experiments can also be a BasicNMF object, allowing the direct integration of your custom NMF model for thorough evaluation and testing.
DON'T HESITATE TO DEVELOP YOUR OWN ALGORITHM!!!
Photo by Alex Knight on Unsplash
This project aims to reproduce various convolutional neural networks and modify them to our specific requirements.
We trained our models on a subset of the original datasets, which accounted for approximately 12% of the data. Surprisingly, our pre-trained models demonstrated remarkable generalization capabilities. During testing on the entire dataset, they exhibited excellent performance, showcasing their ability to achieve similar results when trained on just 10% of the data as compared to training on 100%.
Moreover, our models have proven to be transferable to downstream tasks, such as the MNIST datasets. By employing the architectures we implemented, we were able to attain an accuracy of over 99% on both the PyTorch and Kaggle MNIST datasets. Furthermore, by partially utilizing the trained parameters and unfreezing the parameters of the fully connected layers, we achieved an impressive accuracy of over 95%.
Metric | AlexNet | VGGNet | SpinalNet | ResNet |
---|---|---|---|---|
Accuracy | 87.95% | 89.80% | 87.15% | 89.28% |
Precision | 87.62% | 90.01% | 86.18% | 89.24% |
Recall | 87.95% | 89.80% | 87.15% | 89.28% |
F1 score | 86.59% | 88.42% | 85.28% | 88.30% |

Metric | AlexNet | VGGNet | SpinalNet | ResNet |
---|---|---|---|---|
Accuracy | 86.96% | 87.24% | 85.92% | 86.88% |
Precision | 85.55% | 86.43% | 85.92% | 86.88% |
Recall | 86.96% | 87.24% | 85.92% | 86.88% |
F1 score | 85.58% | 85.66% | 84.07% | 85.68% |
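The Recall rows above match the Accuracy rows exactly, which is what support-weighted averaging over classes produces. Whether the tables were computed this way is an assumption; the sketch below only shows how such weighted metrics can be calculated, with a hypothetical helper `weighted_metrics` and a toy set of labels.

```python
def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        support = sum(t == c for t in y_true)      # true samples of class c
        predicted = sum(p == c for p in y_pred)    # samples predicted as class c
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support if support else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = support / n                            # class weight = share of samples
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    return accuracy, precision, recall, f1

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
```

With this weighting, recall reduces to the total number of correct predictions divided by the number of samples, so it always equals accuracy.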
This project tackles a multi-label, multi-class classification problem. We deployed four pre-trained image models and two pre-trained text models. To enhance performance, we developed 12 multi-modal models using self-attention and cross-attention mechanisms. The project poster showcases some valuable techniques and intriguing discoveries.
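For readers unfamiliar with cross-attention, the following NumPy sketch shows scaled dot-product attention in which the queries come from one modality (text) and the keys and values from another (image). It omits the learned projection matrices and multiple heads that a real model would use, and the token counts and feature size are hypothetical; it is not the project's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention of `queries` over `keys` / `values`."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)   # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ values, weights

rng = np.random.default_rng(0)
text_tokens = rng.normal(size=(6, 16))     # 6 text tokens, 16-dim features
image_patches = rng.normal(size=(9, 16))   # 9 image patches, 16-dim features
fused, weights = cross_attention(text_tokens, image_patches, image_patches)
```

Self-attention is the special case where queries, keys, and values all come from the same modality, for example `cross_attention(text_tokens, text_tokens, text_tokens)`.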
This project is an experimental repository focusing on dealing with datasets containing a high level of noisy labels (50% and above). This repository features experiments conducted on the FashionMNIST and CIFAR datasets using ResNet34 as the baseline classifier.
The repository explores various training strategies ('trainers'), including ForwardLossCorrection, CoTeaching, JoCoR, and O2UNet. Specifically, for datasets with unknown transition matrices, DualT is employed as the Transition Matrix Estimator. Given the computational complexity and practical performance considerations, our experiments primarily focus on ForwardLossCorrection and CoTeaching. We conducted multiple experiments with different random seeds to compare these two methods.
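The idea behind forward loss correction can be sketched in a few lines: the classifier's clean-class probabilities are multiplied by the transition matrix T to obtain noisy-class probabilities, and the cross-entropy is taken against the observed noisy labels. The snippet below is a NumPy illustration rather than the repository's trainer; the function name, the example T, and the toy predictions are all hypothetical.

```python
import numpy as np

def forward_corrected_nll(clean_probs, noisy_labels, T, eps=1e-12):
    """Cross-entropy between noisy labels and T-corrected predictions.

    T[i, j] is assumed to be P(noisy label = j | clean label = i).
    """
    noisy_probs = clean_probs @ T  # predicted distribution over noisy labels
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return -np.mean(np.log(picked + eps))

# Example transition matrix with 50% of labels flipped per class.
T = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
clean_probs = np.array([[0.90, 0.05, 0.05],   # softmax outputs of a classifier
                        [0.10, 0.80, 0.10]])
noisy_labels = np.array([0, 2])
loss = forward_corrected_nll(clean_probs, noisy_labels, T)
```

When T is unknown, an estimator such as DualT would supply it before this loss is applied.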
Initial explorations on FashionMNIST0.5 with JoCoR and O2UNet have shown promising results. This repository serves as a resource for those interested in robust machine learning techniques under challenging conditions of high label noise.
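CoTeaching, one of the two trainers the experiments focus on, relies on small-loss selection: each of two networks picks the samples it finds easiest in a batch and hands them to its peer for the update. The sketch below shows only that selection step with made-up per-sample losses; `small_loss_indices` and the `forget_rate` value are hypothetical and not taken from the repository.

```python
import numpy as np

def small_loss_indices(losses, forget_rate):
    """Indices of the (1 - forget_rate) fraction of samples with the smallest loss."""
    n_keep = int((1.0 - forget_rate) * len(losses))
    return np.argsort(losses)[:n_keep]

# Per-sample losses of two peer networks on the same batch of 6 samples.
losses_a = np.array([0.2, 2.5, 0.1, 1.8, 0.4, 3.0])
losses_b = np.array([0.3, 0.2, 2.7, 1.9, 0.1, 2.2])

# Each network selects its small-loss samples for the other network to train on.
keep_for_b = small_loss_indices(losses_a, forget_rate=0.5)
keep_for_a = small_loss_indices(losses_b, forget_rate=0.5)
```

Samples with large loss are treated as likely mislabeled and skipped for that step, which is why the forget rate is usually tied to the estimated noise level.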
- Meaningful Loss Trends
- Persuasive Results
Natural Language Processing Specialization - DeepLearning.AI, Oct 2023 - Badge
Deep Learning Specialization - DeepLearning.AI, Aug 2023 - Badge
Mathematics for Machine Learning and Data Science Specialization - DeepLearning.AI, Aug 2023 - Badge
Applied Data Science with Python Specialization - University of Michigan, Jul 2023 - Badge
Machine Learning Specialization - DeepLearning.AI, Stanford University, Jul 2023 - Badge
Mathematics for Machine Learning Specialization - Imperial College London, Jun 2023 - Badge
Expressway to Data Science: Python Programming Specialization - University of Colorado Boulder, Dec 2022 - Badge
Python 3 Programming Specialization - University of Michigan, Dec 2022 - Badge
Introduction to Scripting in Python Specialization - Rice University, Nov 2022 - Badge
Statistics with Python Specialization - University of Michigan, Nov 2022 - Badge
Excel Skills for Data Analytics and Visualization Specialization - Macquarie University, Oct 2022 - Badge
Python for Everybody Specialization - University of Michigan, Oct 2022 - Badge
Excel Skills for Business Specialization - Macquarie University, Sep 2022 - Badge