EDUX is a developer friendly Java library for machine learning educational tasks

Home Page: https://samyssmile.github.io/edux/

License: Apache License 2.0

Java 99.42% Cuda 0.28% C 0.30%

java machine-learning machinelearning ml open-source self-education hacktoberfest hacktoberfest2023 deep-learning java-machine-learning machine-learning-algorithms multilayer-perceptron-network neural-network neural-networks java-mlp

edux's Introduction

EDUX - Java Machine Learning Library

EDUX is a user-friendly library for solving problems with a machine learning approach.

Features

EDUX supports a variety of machine learning algorithms including:

Multilayer Perceptron (Neural Network): Suitable for regression and classification problems, MLPs can approximate non-linear functions.
K Nearest Neighbors: A simple, instance-based learning algorithm used for classification and regression.
Decision Tree: Offers visual and explicitly laid out decision making based on input features.
Support Vector Machine: Effective for binary classification, and can be adapted for multi-class problems.
RandomForest: An ensemble method providing high accuracy through building multiple decision trees.

Augmentations

Edux supports a variety of image augmentations, which can be used to increase the performance of your model.

A few examples:

Color Equalization

Monochrome + Noise

Code Example

Single Image

    AugmentationSequence augmentationSequence =
        new AugmentationBuilder()
            .addAugmentation(new ResizeAugmentation(250, 250))
            .addAugmentation(new ColorEqualizationAugmentation())
            .build();

    BufferedImage augmentedImage = augmentationSequence.applyTo(image);

Run for all images in a directory

    AugmentationSequence augmentationSequence =
        new AugmentationBuilder()
            .addAugmentation(new ResizeAugmentation(250, 250))
            .addAugmentation(new ColorEqualizationAugmentation())
            .addAugmentation(new BlurAugmentation(25))
            .addAugmentation(new RandomDeleteAugmentation(10, 20, 20))
            .build()
            .run(trainImagesDir, numberOfWorkers, outputDir);

Battle Royale - Which algorithm is the best?

We run all algorithms on the same dataset and compare the results. Benchmark

Goal

The main goal of this project is to create a user-friendly library for solving problems using a machine learning approach. The library is designed to be easy to use, enabling the solution of problems with just a few lines of code.

Features

The library currently supports:

Multilayer Perceptron (Neural Network)
K Nearest Neighbors
Decision Tree
Support Vector Machine
RandomForest

Get started

Include the library as a dependency in your Java project file.

Gradle

    implementation 'io.github.samyssmile:edux:1.0.7'

Maven

    <dependency>
        <groupId>io.github.samyssmile</groupId>
        <artifactId>edux</artifactId>
        <version>1.0.7</version>
    </dependency>

Hardware Acceleration (preview feature)

EDUX supports Nvidia GPU acceleration.

Requirements

Nvidia GPU with CUDA support
CUDA Toolkit 11.8

Getting started tutorial

This section guides you through using EDUX to process your dataset, configure a multilayer perceptron (MLP), and train and evaluate it.

A multilayer perceptron (MLP) is a feedforward artificial neural network that maps a set of input features to a set of outputs. It consists of several layers of nodes connected as a directed graph, with one or more hidden layers between the input and output layers.

Step 0: Get Familiar with the Dataset

In this example we use the famous MNIST dataset. The MNIST database contains 60,000 training images and 10,000 testing images.

Step 1: Data Processing

    // Note: batchSize is declared in Step 2; in a complete program, declare it before creating the loaders.
    String trainImages = "train-images.idx3-ubyte";
    String trainLabels = "train-labels.idx1-ubyte";
    String testImages = "t10k-images.idx3-ubyte";
    String testLabels = "t10k-labels.idx1-ubyte";
    Loader trainLoader = new ImageLoader(trainImages, trainLabels, batchSize);
    Loader testLoader = new ImageLoader(testImages, testLabels, batchSize);

Step 2: Configure the MultilayerPerceptron

    int batchSize = 100;
    int threads = 1;
    int epochs = 10;
    float initialLearningRate = 0.1f;
    float finalLearningRate = 0.001f;

    MetaData trainMetaData = trainLoader.open();
    int inputSize = trainMetaData.getInputSize();
    int outputSize = trainMetaData.getExpectedSize();
    trainLoader.close();

Step 3: Build the Network

We use the NetworkBuilder Class

    new NetworkBuilder()
        .addLayer(new DenseLayer(inputSize, 32))  //32 Neurons as output size
        .addLayer(new ReLuLayer())
        .addLayer(new DenseLayer(32, outputSize)) //32 Neurons as input size
        .addLayer(new SoftmaxLayer())
        .withBatchSize(batchSize)
        .withLearningRates(initialLearningRate, finalLearningRate)
        .withExecutionMode(singleThread)
        .withEpochs(epochs)
        .build()
        .printArchitecture()
        .fit(trainLoader, testLoader)
        .saveModel("model.edux"); // Save the trained model

Step 4: Load the model and continue training

Load 'model.edux' and continue training for 10 epochs.

    NeuralNetwork nn =
        new NetworkBuilder().withEpochs(10).loadModel("model.edux").fit(trainLoader, testLoader);

Results

........................Epoch: 1, Loss: 1,14, Accuracy: 91,04
...
........................Epoch: 10, Loss: 0,13, Accuracy: 96,16

Working examples

You can find more fully working examples for all algorithms in the examples folder.

For examples we use the

Contributions

Contributions are warmly welcomed! If you find a bug, please create an issue with a detailed description of the problem. If you wish to suggest an improvement or fix a bug, please make a pull request. Also check out the Rules and Guidelines page for more information.

edux's Issues

To be implemented

Support Vector Machine
https://en.wikipedia.org/wiki/Decision_tree

[X ] DONE

Decision Tree Classifier -

[X ] DONE

https://en.wikipedia.org/wiki/Decision_tree

Write JUnit Tests

Some of our key classes still need further JUnit tests

Implement Classical Matrix Multiplication

Description

Implement the classical method of matrix multiplication. This method will use the straightforward algorithm of computing the dot product between rows of the first matrix and columns of the second matrix.

Task Details

Objective: Implement a function for the classical matrix multiplication of two 2D arrays of doubles.
Function Signature: double[][] multiplyMatricesClassical(double[][] matrixA, double[][] matrixB)

Acceptance Criteria

Function correctly multiplies two matrices of compatible dimensions.
Appropriate error handling for matrices of incompatible sizes.
Comprehensive unit tests that cover various scenarios including edge cases.
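
The task above could be approached along the following lines. This is only a minimal sketch, not the implementation that was merged into EDUX: the method name follows the function signature in the issue, while the class name `ClassicalMatrixMultiplication` is a hypothetical placeholder, and the code assumes rectangular (non-jagged) input arrays.

```java
import java.util.Objects;

/** Illustrative sketch; the class name is hypothetical, the method name follows the issue. */
final class ClassicalMatrixMultiplication {

  private ClassicalMatrixMultiplication() {}

  /** Multiplies matrixA (m x n) by matrixB (n x p) with the row-by-column dot product. */
  static double[][] multiplyMatricesClassical(double[][] matrixA, double[][] matrixB) {
    Objects.requireNonNull(matrixA, "matrixA");
    Objects.requireNonNull(matrixB, "matrixB");
    if (matrixA.length == 0 || matrixB.length == 0) {
      throw new IllegalArgumentException("Matrices must not be empty");
    }
    int rowsA = matrixA.length;
    int colsA = matrixA[0].length; // assumption: all rows have the same length
    int rowsB = matrixB.length;
    int colsB = matrixB[0].length;
    if (colsA != rowsB) {
      throw new IllegalArgumentException(
          "Incompatible dimensions: " + rowsA + "x" + colsA + " and " + rowsB + "x" + colsB);
    }

    double[][] result = new double[rowsA][colsB];
    for (int i = 0; i < rowsA; i++) {
      for (int j = 0; j < colsB; j++) {
        double sum = 0.0;
        for (int k = 0; k < colsA; k++) {
          sum += matrixA[i][k] * matrixB[k][j]; // dot product of row i and column j
        }
        result[i][j] = sum;
      }
    }
    return result;
  }
}
```

Throwing `IllegalArgumentException` for mismatched sizes is one way to meet the error-handling criterion; a dedicated exception type would work equally well.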

Create GitHub Wiki Page: Import EDUX into your IDE

We need to create a comprehensive GitHub Wiki page that provides clear instructions on how to import Edux into an Integrated Development Environment (IDE). This documentation will help our users and contributors get started with Edux development more efficiently.

Expected Behavior:

A well-structured GitHub Wiki page should be created under our project's Wiki tab with detailed instructions on how to import Edux into popular IDEs.

Add some Images

IDE to cover

Eclipse
VSCode
IntelliJ IDEA

Batch Processing and Random Augmentation for EDUX Image Augmentation

Summary: Create a RandomAugmentation class that randomly applies one of the existing augmentations.

If the chosen augmentation expects constructor parameters, these need to be randomized too.
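
One possible shape for such a class is sketched below. It deliberately does not use the EDUX augmentation types, whose exact interfaces are not shown here: each augmentation is modelled as a `UnaryOperator<BufferedImage>`, and each list entry is a factory that receives the shared `Random` so it can randomize its own constructor parameters. The class name `RandomAugmentationSketch` and this factory design are assumptions made for illustration.

```java
import java.awt.image.BufferedImage;
import java.util.List;
import java.util.Random;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: picks one augmentation at random for every image. */
final class RandomAugmentationSketch implements UnaryOperator<BufferedImage> {

  // Each factory builds an augmentation and may use the Random to pick its parameters.
  private final List<Function<Random, UnaryOperator<BufferedImage>>> factories;
  private final Random random;

  RandomAugmentationSketch(
      List<Function<Random, UnaryOperator<BufferedImage>>> factories, Random random) {
    if (factories.isEmpty()) {
      throw new IllegalArgumentException("At least one augmentation is required");
    }
    this.factories = List.copyOf(factories);
    this.random = random;
  }

  @Override
  public BufferedImage apply(BufferedImage image) {
    // Choose a factory uniformly at random, then let it randomize its own parameters.
    Function<Random, UnaryOperator<BufferedImage>> factory =
        factories.get(random.nextInt(factories.size()));
    return factory.apply(random).apply(image);
  }
}
```

A factory for a blur effect, for example, could draw its radius with `rnd.nextInt(...)` before constructing the effect, which covers the requirement that constructor parameters are randomized too.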

Refactor DataProcessor class

We need to change the parameter in the getColumnDataOf method within DataProcessor. Specifically, I suggest altering the parameter from String columnName to int columnIndex due to identified design shortcomings.
Additionally, to maintain code coherence, it's needed to remove all methods associated with the field columnNames.

class location : lib/src/main/java/de/edux/data/provider/DataProcessor.java

Create a GitHub Social Preview Repo Image

Create a Social Preview Card

We need a social media preview image for our repository.

.png File
Images should be at least 640×320px (1280×640px for best display).

Template

Implement Matrix Multiplication Using Virtual Threads

Summary:

Develop a class to perform multiplication of 2D matrices using virtual threads to improve performance.

Description:

With the increasing need to process large datasets and matrices, leveraging concurrency mechanisms like virtual threads can lead to significant performance improvements. Implementing matrix multiplication using virtual threads can not only enhance performance but also serve as a blueprint for other matrix operations in the future.

Requirements:

Create an interface named ConcurrentMatrixMultiplication - double[][] multiplyMatrices(double[][] a, double[][] b)
Create a class MathMatrix that implements ConcurrentMatrixMultiplication
Utilize virtual threads for matrix multiplication.
Ensure proper thread synchronization to avoid race conditions.
Add proper error handling, especially for matrices that can't be multiplied due to dimension mismatches.

Add New Example for Neural Network: Utilizing the Seaborn Penguin Dataset

Issue Description:

We currently have examples in our library that utilize the IRIS dataset. To diversify and showcase the capabilities of our ML Java Library, it would be great to introduce an example using the Seaborn Penguin dataset.

Additional Context

The Seaborn Penguin dataset is quite popular and provides a good balance of categorical and numerical data. It would be a good addition to our example suite and potentially attract more users familiar with this dataset.

Add the dataset csv to the project
Write a DatasetProvider
Implement an example. Use a multilayer neural network.

Dataset: https://github.com/mwaskom/seaborn-data/blob/master/penguins.csv

Advanced Geometric Transformations for EDUX Image Augmentation

Summary: Develop advanced geometric transformations for EDUX's image augmentation toolset.

Details:

Perspective Transformations: Implement methods for perspective changes.
Elastic Transformations: Add elastic deformation functions.

Java Features: Advanced use of AffineTransform and custom algorithms.

Documentation: Comprehensive Javadoc and practical examples in README.

Tests: Extensive unit and integration tests for transformation accuracy.

[FIX] Make CudaMatrixArithmetic a Singleton

public class CudaMatrixArithmetic implements IMatrixArithmetic

convert to Singleton
edit all tests
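
As an illustration of the conversion, the sketch below uses the initialization-on-demand holder idiom, which is lazy and thread-safe without explicit locking. `SingletonMatrixArithmetic` is a hypothetical stand-in for `CudaMatrixArithmetic`: the real class calls into CUDA, whereas this placeholder multiplies on the CPU so the example stays self-contained. The `IMatrixArithmetic` interface is repeated from the issue tracker so the sketch compiles on its own.

```java
/** Interface as quoted in the EDUX issues; repeated so the sketch is self-contained. */
interface IMatrixArithmetic {
  double[][] multiply(double[][] matrixA, double[][] matrixB);
}

/** Hypothetical stand-in for CudaMatrixArithmetic, shown only to illustrate the pattern. */
final class SingletonMatrixArithmetic implements IMatrixArithmetic {

  // Private constructor: callers cannot create additional instances.
  private SingletonMatrixArithmetic() {}

  // The JVM initializes Holder (and INSTANCE) on first access, exactly once.
  private static final class Holder {
    private static final SingletonMatrixArithmetic INSTANCE = new SingletonMatrixArithmetic();
  }

  static SingletonMatrixArithmetic getInstance() {
    return Holder.INSTANCE;
  }

  @Override
  public double[][] multiply(double[][] matrixA, double[][] matrixB) {
    // Placeholder CPU loop; the real class would dispatch to the GPU here.
    double[][] result = new double[matrixA.length][matrixB[0].length];
    for (int i = 0; i < matrixA.length; i++) {
      for (int j = 0; j < matrixB[0].length; j++) {
        for (int k = 0; k < matrixB.length; k++) {
          result[i][j] += matrixA[i][k] * matrixB[k][j];
        }
      }
    }
    return result;
  }
}
```

Tests that previously called a constructor would then obtain the shared instance through `getInstance()` instead.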

Implementation of MatrixVectorProduct

Implement class MatrixVectorProduct

Add New Example for knn: Utilizing the Seaborn Penguin Dataset

Issue Description:

Additional Context

Add the dataset csv to the project
Write a DatasetProvider
Implement an example. Use k-nearest neighbors algorithm.

Dataset: https://github.com/mwaskom/seaborn-data/blob/master/penguins.csv
See #16

Multithreaded Math Class for Matrix multiplication

We need a classic multithreaded math class for matrix multiplication. We will compare its performance with #12.

We will test it with matrix sizes of >4000x4000.

Unify the classifier

A classifier interface is to be built that is used by all ML algorithms.

Goal: A unified intuitive API.

IntelliJ can't find edux module

java.lang.IllegalStateException: Module entity with name: edux.lib.main should be available

Fix: change rootProject.name from 'Edux' to 'edux'.

Write examples for all Classifiers using new api.

NVIDIA CUDA Support for Matrix Multiplication

Description:

As our library grows and tackles more complex ML tasks, the need for leveraging the power of GPU becomes vital to ensure optimal performance and resource utilization. The goal of this ticket is to discuss and track the implementation of Nvidia CUDA support, enabling the library to conduct parallel computations on Nvidia GPUs, thus drastically enhancing data processing and training times for ML models.

Performance:

Implementing CUDA support should exhibit a noticeable performance improvement in model training and predictions, especially for computationally intensive tasks.

Tasks

Implement Matrix Multiplications Feature that runs on GPU

Testing:

Adequate testing frameworks need to be in place to ensure stability and performance of CUDA implementation across different GPUs and operating systems.

Targeting CUDA/cudnn Version:

CUDA Toolkit 11.8
cuDNN 8.9.6

Write jUnit Test for decision trees

Scaling and Cropping Capabilities for EDUX Image Augmentation

Summary: Develop scaling and cropping capabilities for the EDUX image augmentation library.

Details:

Scaling: Methods to scale images with options for smooth scaling.
Cropping: Functions to crop to a region or to a specific size.
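
A rough sketch of both operations follows. It swaps in `Graphics2D.drawImage` with an interpolation hint for the scaling step instead of `java.awt.Image.getScaledInstance`, because drawing a `BufferedImage` this way completes synchronously. The class and method names are hypothetical and not part of EDUX.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper illustrating scaling and cropping with the standard library. */
final class ScaleAndCropSketch {

  private ScaleAndCropSketch() {}

  /** Scales to width x height; smooth selects bilinear instead of nearest-neighbor sampling. */
  static BufferedImage scale(BufferedImage source, int width, int height, boolean smooth) {
    BufferedImage target = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
    Graphics2D g = target.createGraphics();
    try {
      g.setRenderingHint(
          RenderingHints.KEY_INTERPOLATION,
          smooth
              ? RenderingHints.VALUE_INTERPOLATION_BILINEAR
              : RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
      g.drawImage(source, 0, 0, width, height, null);
    } finally {
      g.dispose();
    }
    return target;
  }

  /** Copies the region starting at (x, y); out-of-bounds regions raise RasterFormatException. */
  static BufferedImage crop(BufferedImage source, int x, int y, int width, int height) {
    BufferedImage view = source.getSubimage(x, y, width, height); // shares pixels with source
    BufferedImage copy = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
    for (int row = 0; row < height; row++) {
      for (int col = 0; col < width; col++) {
        copy.setRGB(col, row, view.getRGB(col, row));
      }
    }
    return copy;
  }
}
```

Copying the cropped region, rather than returning the `getSubimage` view directly, keeps later edits to the crop from writing through to the original image.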

Java Features: Utilize java.awt.Image for quality scaling.

Documentation: Update Javadoc and README with usage samples.

Tests: Create unit tests for boundary conditions and typical use cases.

Implementation of CudaMatrixVectorProduct

do it

Update Project to Gradle 8.3

Implementation of Gradient Boosting Algorithm

Description

Gradient Boosting is a prominent ensemble learning method used widely in regression, classification, and ranking tasks. Given its importance and frequent demand in various ML applications, there's a need to incorporate a Gradient Boosting implementation in our Java ML Library. This ticket aims at the development and optimization of the Gradient Boosting algorithm in pure Java.

Acceptance Criteria:

The algorithm should be implemented in pure Java without relying on external dependencies.
The implementation should support both regression and classification tasks.
Incorporate decision trees as weak learners, ensuring the option to define the depth of the tree.
A provision should be there to define the number of boosting rounds.
Allow tunable parameters such as learning rate, loss function, and regularization.
The solution needs to be performant and scalable, capable of handling larger datasets efficiently.
Boundary and edge cases, such as data inconsistencies or parameter mismatches, should be handled gracefully with appropriate exceptions and error messages.
Unit tests should be written to verify the functionality, performance, and reliability of the implemented algorithm.
Integration examples or demos illustrating how to use the implemented class in real-world scenarios.

Update to JDK 21

We need to go with JDK21.

It looks like we need Virtual Threads with Executor Service, so we need to update the project to 21.

Update JDK Version in Gradle File to 21

Bug: Decision Tree currentLeafNodes Counter Not Updating

Description

We've identified a bug in our Decision Tree implementation where the currentLeafNodes counter, which is supposed to track the number of leaf nodes, does not update during the tree construction. This issue leads to an inability to correctly enforce the maxLeafNodes constraint, potentially allowing the tree to grow beyond the intended limits.

Expected Behavior

The currentLeafNodes variable should increment whenever a leaf node is created, ensuring that the total number of leaf nodes never exceeds maxLeafNodes. The tree construction should halt when the maximum number of leaf nodes is reached.

Actual Behavior

The currentLeafNodes remains at its initial value (0) throughout the execution, causing the maxLeafNodes constraint to be ignored. This behavior can lead to overfitting if the tree grows too complex, as it doesn't stop at the designed constraint.

Steps to Reproduce

Initialize a Decision Tree with a specific maxLeafNodes value.
Train the Decision Tree with a dataset large enough to potentially exceed the maxLeafNodes if unconstrained.
Observe the size of the constructed tree, noting that it can exceed the intended maximum leaf nodes.

Suggested Fix

We need to adjust the tree construction logic to correctly increment currentLeafNodes each time we create a leaf node. Additionally, before creating new branches, we should check whether adding them would exceed maxLeafNodes and, if so, stop branching further. These changes should enforce the correct tree size and complexity as originally intended by the maxLeafNodes parameter.
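
The suggested fix can be illustrated with a deliberately simplified tree that splits a sorted array in half instead of searching for the best feature split. Everything except the names `buildTree`, `currentLeafNodes` and `maxLeafNodes` is an assumption of this sketch, including the extra `reservedLeafSlots` counter that makes the "would this split exceed the limit?" check possible in a depth-first build.

```java
import java.util.Arrays;

/** Simplified, hypothetical tree builder that enforces a leaf budget. */
final class LeafBudgetTree {

  static final class Node {
    final double[] values; // samples that reached this node
    Node left;
    Node right;

    Node(double[] values) {
      this.values = values;
    }
  }

  private final int maxLeafNodes;
  private int currentLeafNodes = 0; // leaves created so far
  private int reservedLeafSlots = 1; // leaves the finished tree will have; the root counts as one

  LeafBudgetTree(int maxLeafNodes) {
    if (maxLeafNodes < 1) {
      throw new IllegalArgumentException("maxLeafNodes must be at least 1");
    }
    this.maxLeafNodes = maxLeafNodes;
  }

  /** Builds the tree; intended to be called once per instance. */
  Node buildTree(double[] sortedValues) {
    Node node = new Node(sortedValues);
    // A split turns one future leaf into two, so it needs one free slot in the budget.
    boolean canSplit = sortedValues.length > 1 && reservedLeafSlots < maxLeafNodes;
    if (!canSplit) {
      currentLeafNodes++; // the counter is updated every time a leaf is created
      return node;
    }
    reservedLeafSlots++;
    int mid = sortedValues.length / 2;
    node.left = buildTree(Arrays.copyOfRange(sortedValues, 0, mid));
    node.right = buildTree(Arrays.copyOfRange(sortedValues, mid, sortedValues.length));
    return node;
  }

  int getCurrentLeafNodes() {
    return currentLeafNodes;
  }
}
```

With `maxLeafNodes = 3` and eight input values, the build stops at three leaves, whereas an effectively unlimited budget lets it grow to eight single-value leaves.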

Additional Information

Problematic behavior identified in the buildTree method of the DecisionTree class.
This issue might be contributing to overfitting in our model predictions due to the increased complexity of the tree.

Please, any assistance or further insights into how we can resolve this issue would be greatly appreciated.

Replace filterIncompleteRecords boolean with Imputation Enum for Enhanced Data Handling

There is an open discussion for this ticket. #62
In our data processing logic, we currently use a boolean named filterIncompleteRecords to decide whether to filter out incomplete records or not. To provide more flexibility and future extensibility in handling missing data, we propose replacing this boolean with an enum named Imputation.

Consider this example tiger dataset:

Tiger            Size     Note
Siberian Tiger   Big      ok
Indian Tiger     Small    ok
Amur Tiger                missing value in dataset

With filterIncompleteRecords = true, our data processing class automatically removes Amur Tiger from the dataset because its size value is missing.

The new Imputation enum would have various strategies such as MEDIAN, MODE, FILTEROUT, and AVERAGE. For the scope of this ticket, we'll focus on implementing FILTEROUT, AVERAGE, and MODE.
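
To make the three in-scope strategies concrete, here is a small sketch that operates on a single column of string values. It is not the EDUX implementation: the enum is named `ImputationSketch` to avoid suggesting otherwise, the `apply` method is hypothetical, and treating `null` or blank strings as missing is an assumption. In a real dataset FILTEROUT would drop whole rows, not just column entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the FILTEROUT, AVERAGE and MODE strategies for one column. */
enum ImputationSketch {
  FILTEROUT,
  AVERAGE,
  MODE;

  /** Returns a new column in which missing entries are removed or filled in. */
  List<String> apply(List<String> column) {
    List<String> present = new ArrayList<>();
    for (String value : column) {
      if (!isMissing(value)) {
        present.add(value);
      }
    }
    // With no observed values there is nothing to compute a fill value from.
    if (this == FILTEROUT || present.isEmpty()) {
      return present;
    }
    String fill = (this == AVERAGE) ? average(present) : mode(present);
    List<String> filled = new ArrayList<>();
    for (String value : column) {
      filled.add(isMissing(value) ? fill : value);
    }
    return filled;
  }

  private static boolean isMissing(String value) {
    return value == null || value.trim().isEmpty();
  }

  // AVERAGE only makes sense for numeric columns; parsing fails for other content.
  private static String average(List<String> values) {
    double sum = 0.0;
    for (String value : values) {
      sum += Double.parseDouble(value);
    }
    return String.valueOf(sum / values.size());
  }

  // Most frequent value; ties go to the value seen first.
  private static String mode(List<String> values) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String value : values) {
      counts.merge(value, 1, Integer::sum);
    }
    String best = null;
    int bestCount = 0;
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      if (entry.getValue() > bestCount) {
        best = entry.getKey();
        bestCount = entry.getValue();
      }
    }
    return best;
  }
}
```

Applied to a size column such as Big, Small, (missing), Big, the MODE strategy fills the gap with Big, while FILTEROUT simply drops the missing entry.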

Advantages of this new Feature

A user can automatically fill in missing values.

Hint

You can see how filterIncompleteRecords works in the Seaborn Penguin k-nearest neighbors (KNN) example.
https://github.com/Samyssmile/edux/blob/main/example/src/main/java/de/example/knn/KnnSeabornExample.java

https://journalofbigdata.springeropen.com/articles/10.1186/s40537-021-00516-9

Write an example - How to use RandomForest

In our example subfolder, we need a class that shows how to use RandomForest.

Write examples for Seaborn dataset

Write examples for all of our ML Algorithms.

Use Seaborn Dataset
Use the new Imputation Feature

Convolutional Neural Network (CNN) Support

This Ticket is blocked by: #109
This Ticket will profit from: #36 and #44

CNNs are the state-of-the-art approach for image and audio recognition.

As machine learning and artificial intelligence evolve, the ability of EDUX to solve complex problems, especially those related to image recognition and processing, is vital. Currently, the library lacks a fundamental functionality in the domain of deep learning, which is the Convolutional Neural Network (CNN). The implementation of CNNs would dramatically broaden the scope and capability of EDUX, making it feasible to address issues in image processing, video analysis, and various computer vision applications.

CNN Development Roadmap

1. Designing the CNN Architecture

Define the CNN architecture.
- Determine the number and types of layers:
  - Convolutional layers
  - Dense layers
  - MaxPooling layers
  - Flatten layers
  - Dropout layers
- Decide on the activation function (e.g., ReLU for hidden layers, Softmax for output).

2. Implementing the Core Components

Develop or integrate the Matrix3D class for matrix operations.
Implement the Convolutional layer operations in Matrix.
Implement MaxPooling, Flatten, and Dropout layers using Matrix.
Implement Dense layers with matrix support.
Integrate Softmax functionality for output layer.

3. Loss Function and Optimization

Implement the CrossEntropyLoss class.
Integrate gradient-based learning mechanisms.
Implement the ADAM optimizer.

4. Building the Training Infrastructure

5. Testing and Evaluation

Develop a testing routine to evaluate the model.
Implement performance metrics (e.g., accuracy, loss).

6. Debugging and Optimization

Perform code reviews and debugging.
Optimize performance (focus on matrix operations in Matrix3D).

7. Documentation and Reporting

Document the code and architecture.
Create a report on model performance and learn

Implement Multi-Threaded Matrix Multiplication

Description

Optimize the classical matrix multiplication method using multi-threading. The implementation should parallelize the computation to utilize system resources more efficiently.

Task Details

Objective: Develop a multi-threaded approach for matrix multiplication.
Function Signature: double[][] multiplyMatricesMultiThreaded(double[][] matrixA, double[][] matrixB)

Acceptance Criteria

Multi-threading is used to parallelize the multiplication process.
The function ensures thread safety when accessing and modifying shared resources.
Unit tests validate the correctness of the multiplication across multiple threads.
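
One way to satisfy these criteria is to give each worker its own output row, so no two threads ever write to the same memory and no locking is needed. The sketch below takes that approach with a fixed thread pool; the class name is hypothetical, the method name follows the issue, and rectangular inputs are assumed. On JDK 21 the pool could be swapped for `Executors.newVirtualThreadPerTaskExecutor()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sketch; the class name is hypothetical, the method name follows the issue. */
final class MultiThreadedMatrixMultiplication {

  private MultiThreadedMatrixMultiplication() {}

  static double[][] multiplyMatricesMultiThreaded(double[][] matrixA, double[][] matrixB) {
    if (matrixA.length == 0 || matrixB.length == 0 || matrixA[0].length != matrixB.length) {
      throw new IllegalArgumentException("Matrices are empty or have incompatible dimensions");
    }
    int rows = matrixA.length;
    int inner = matrixB.length;
    int cols = matrixB[0].length;
    double[][] result = new double[rows][cols];

    int threads = Math.min(rows, Runtime.getRuntime().availableProcessors());
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (int i = 0; i < rows; i++) {
        final int row = i;
        // Each task writes only to result[row], so tasks never share mutable state.
        pending.add(pool.submit(() -> {
          for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            for (int k = 0; k < inner; k++) {
              sum += matrixA[row][k] * matrixB[k][j];
            }
            result[row][j] = sum;
          }
        }));
      }
      for (Future<?> task : pending) {
        task.get(); // waits for completion and makes the worker's writes visible
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while multiplying", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A worker task failed", e.getCause());
    } finally {
      pool.shutdown();
    }
    return result;
  }
}
```

Submitting one task per row is simple but fine-grained; for very large matrices, handing each worker a block of rows would reduce scheduling overhead.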

Implementation of Random Forest

Goal

We currently lack an implementation for the Random Forest algorithm. This algorithm is a fundamental component of modern machine learning techniques and has been in demand by many users. The objective of this ticket is to implement the Random Forest algorithm in Java for our ML Library.

https://en.wikipedia.org/wiki/Random_forest

Acceptance Criteria:

- The Random Forest algorithm must be implemented in pure Java without any external dependencies.
- The algorithm should support both classification and regression functionalities.
- The algorithm must be compatible with our existing data structures.
- Unit tests must ensure the correct operation of the algorithm.
- The implementation needs to be performant and scalable, capable of handling larger datasets effectively.

Reformat Codebase with Google Formatter

Image Pre-processing Tools

This is blocked by: #43

Add Image Pre-processing Tools for Augmentation and Resizing

Description

In the context of developing and optimizing Convolutional Neural Networks (CNNs) #43 it is essential to implement a suite of image pre-processing tools that can facilitate streamlined and effective model training. Specifically, we require utility classes/methods that address common tasks like image augmentation and resizing.

Image Augmentation enriches the dataset, allowing the model to be trained on more diverse data without collecting new images. Consequently, it helps improve the model's robustness and ability to generalize, thus reducing overfitting.

Requirements:

Image Augmentation: Implement tools or utilities that support various image augmentation techniques, such as:
- Rotation
- Flipping
- Shearing
- Zooming
- Color variations (brightness, saturation, etc.)
Image Resizing: Provide functionality for resizing images, which should support:
- Specifying target size
- Maintaining the aspect ratio or optionally ignoring it
- Different interpolation methods (e.g., nearest-neighbor, bilinear, etc.)

Acceptance Criteria:

Implement augmentation tools that provide at least the above-mentioned functionalities.
Implement resizing tools with the specified features.
Ensure that the tools are adaptable and can be easily integrated into the existing workflow with CNNs.
Provide documentation for each tool/utility, detailing the usage and potential application scenarios.
Create unit tests to verify the functionality and reliability of the implemented tools.

Additional Context:

Ensuring robust and versatile image pre-processing tools can significantly enhance the model training phase, enabling the CNN to generalize better through exposure to varied and transformed data. This enhancement will bolster our ongoing work in developing CNN models for our project.

Tasks:

Define the API for image augmentation and resizing tools.
Develop the image augmentation tools.
Develop the image resizing tools.
Create documentation and usage examples.
Implement unit tests.
Conduct a review and testing phase to ensure functionality and integration capability.
https://albumentations.ai/docs/introduction/image_augmentation/#:~:text=Image%20augmentation%20is%20a%20process,slightly%20change%20the%20original%20image.

Move Normalization into separate method.

As @acsolle66 noticed, it's a mistake to apply NORMALIZE at such an early stage:

    var dataset = seabornDataProcessor.loadDataSetFromCSV(csvFile, ',', SHUFFLE, NORMALIZE);

Potential outliers in the data would have too much impact. Instead, we want to use a separate method for normalization.

    var dataset = seabornDataProcessor.loadDataSetFromCSV(csvFile, ',', SHUFFLE);
    dataset = seabornDataProcessor
            .imputation("Subspecies", Imputation.MODE)
            .imputation("1", Imputation.MEAN)
            .imputation("2", Imputation.MEAN)
            .imputation("3", Imputation.AVERAGE)
            .normalize();

Integration of JMH for Benchmarking Performance

Feature Request: Integration of JMH for Benchmarking Performance

Summary

EDUX, as a growing Java Machine Learning Library, has reached a stage where performance optimization can yield significant benefits. To ensure our library remains efficient and scalable, integrating Java Microbenchmark Harness (JMH) for performance and memory benchmarking is proposed.

Expected Outcome

The integration of JMH will allow contributors to:

Identify performance bottlenecks.
Ensure code changes do not degrade performance.
Profile memory usage alongside execution time.
Establish a benchmark suite for ongoing performance evaluation.

Action Items

Set up JMH within the EDUX project. (Library and Examples subprojects)
Document how to run benchmarks.

Additional Context

The ability to measure performance and memory utilization is critical as we optimize our algorithms for both speed and efficiency. This addition will be instrumental in maintaining EDUX's competitive edge as a user-friendly and performant Java ML library.

Rotation and Translation Functions for EDUX Image Augmentation

Summary: Implement rotation and translation functions in EDUX's image augmentation module.

Details:

Rotation: Add a method to rotate images by specific angles using AffineTransform in Java.
Translation: Implement translation along x and y axes leveraging BufferedImage.
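
A minimal sketch of both operations with `java.awt.geom.AffineTransform` and `AffineTransformOp` is shown below. The class and method names are hypothetical, the output keeps the size of the input (so content moved outside the frame is clipped), and nearest-neighbor sampling is chosen only to keep the result easy to reason about; `TYPE_BILINEAR` would give smoother rotations.

```java
import java.awt.geom.AffineTransform;
import java.awt.image.AffineTransformOp;
import java.awt.image.BufferedImage;

/** Hypothetical helper illustrating rotation and translation of a BufferedImage. */
final class GeometricAugmentationSketch {

  private GeometricAugmentationSketch() {}

  /** Rotates around the image center by the given angle in degrees. */
  static BufferedImage rotate(BufferedImage source, double degrees) {
    AffineTransform transform =
        AffineTransform.getRotateInstance(
            Math.toRadians(degrees), source.getWidth() / 2.0, source.getHeight() / 2.0);
    return apply(source, transform);
  }

  /** Shifts the image content by dx pixels along x and dy pixels along y. */
  static BufferedImage translate(BufferedImage source, int dx, int dy) {
    return apply(source, AffineTransform.getTranslateInstance(dx, dy));
  }

  private static BufferedImage apply(BufferedImage source, AffineTransform transform) {
    AffineTransformOp op =
        new AffineTransformOp(transform, AffineTransformOp.TYPE_NEAREST_NEIGHBOR);
    // Same-size target: pixels that leave the frame are dropped, uncovered pixels stay transparent.
    BufferedImage target =
        new BufferedImage(source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_ARGB);
    return op.filter(source, target);
  }
}
```

Whether the canvas should instead grow to fit the rotated image is a design decision the issue leaves open.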

Java Features: Use java.awt.geom for geometric transformations.

Documentation: Include Javadoc and README examples.

Tests: Unit tests for each transformation.

Math Classes Required

Description:

To bolster the foundational components of our Java ML Library, there's a need to introduce classes that cater to mathematical operations commonly used in machine learning. Specifically, this ticket aims at the development of classes for matrices, vectors, and scalar multiplications in Java. These classes should allow users to perform basic and advanced matrix-vector operations seamlessly.

Acceptance Criteria:

Develop a Matrix class that supports basic operations such as addition, subtraction, and matrix multiplication.
Implement a Vector class that supports operations like dot product, vector addition, and vector subtraction.
Both the Matrix and Vector classes should support scalar multiplication.
All classes must be written in pure Java without relying on external dependencies.
The implementation should ensure that operations are efficient, even for larger matrices and vectors.
Boundary cases, such as matrix-vector dimension mismatches, should be handled gracefully with appropriate error messages.
Comprehensive documentation of the code, methods, and the API must be provided.
Unit tests should be written to verify the accuracy and reliability of these operations.

Perspective

Later on, we will refactor existing code to use these math classes for better performance.

Refactoring Needed for Better Code Readability

Description:

In our codebase, there are numerous instances where one-line comments are used. While comments can be helpful, excessive usage can reduce the readability of the code. This ticket proposes the removal of these one-line comments and, where appropriate, extracting code blocks into self-explanatory methods.

Extract Code into Self-Explaining Methods:

For code blocks that are accompanied by comments to explain their logic, consider extracting them into separate methods.
Ensure that the new methods have descriptive names, effectively removing the need for the original comment.
For example, if there's a comment like // Check if user is valid, you can create a method named isValidUser().
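
The refactoring described above might look like this in practice. The `User` type, its fields, and the validity rule are all invented for illustration; only the method name `isValidUser` comes from the ticket.

```java
/** Hypothetical before/after example of replacing a comment with a named method. */
final class RegistrationSketch {

  static final class User {
    final String name;
    final int age;

    User(String name, int age) {
      this.name = name;
      this.age = age;
    }
  }

  private RegistrationSketch() {}

  // Before: the condition needs a one-line comment to be understood.
  static boolean canRegisterBefore(User user) {
    // Check if user is valid
    if (user != null && user.name != null && !user.name.isEmpty() && user.age >= 18) {
      return true;
    }
    return false;
  }

  // After: the extracted method name carries the explanation.
  static boolean canRegister(User user) {
    return isValidUser(user);
  }

  private static boolean isValidUser(User user) {
    return user != null && user.name != null && !user.name.isEmpty() && user.age >= 18;
  }
}
```

Both versions behave identically; the second simply moves the explanation from a comment into the code itself.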

Noise Injection and Filtering Techniques for EDUX Image Augmentation

Summary: Integrate noise injection and filtering techniques into EDUX image augmentation.

Details:

Noise Injection: Add random noise to simulate real-world imperfections.
Filtering: Implement Gaussian blur, sharpen, and edge detection filters.
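
The two ideas can be sketched with the standard library alone. The blur uses `java.awt.image.ConvolveOp` with a 3x3 Gaussian-like kernel, and the noise step adds a uniformly distributed offset to each color channel. Class name, method names, the kernel size, and the uniform noise model are assumptions of this sketch, not EDUX code.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.util.Random;

/** Hypothetical helper illustrating noise injection and a convolution filter. */
final class NoiseAndFilterSketch {

  private NoiseAndFilterSketch() {}

  /** Applies a 3x3 Gaussian-like blur; edge pixels are copied unchanged. */
  static BufferedImage gaussianBlur3x3(BufferedImage source) {
    float[] weights = {
      1 / 16f, 2 / 16f, 1 / 16f,
      2 / 16f, 4 / 16f, 2 / 16f,
      1 / 16f, 2 / 16f, 1 / 16f
    };
    ConvolveOp op = new ConvolveOp(new Kernel(3, 3, weights), ConvolveOp.EDGE_NO_OP, null);
    return op.filter(source, null);
  }

  /** Adds a random offset in [-maxDelta, maxDelta] to every color channel. */
  static BufferedImage addNoise(BufferedImage source, int maxDelta, long seed) {
    Random random = new Random(seed); // seeded so augmentations are reproducible
    int width = source.getWidth();
    int height = source.getHeight();
    BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
    for (int y = 0; y < height; y++) {
      for (int x = 0; x < width; x++) {
        int rgb = source.getRGB(x, y);
        int r = jitter((rgb >> 16) & 0xFF, maxDelta, random);
        int g = jitter((rgb >> 8) & 0xFF, maxDelta, random);
        int b = jitter(rgb & 0xFF, maxDelta, random);
        out.setRGB(x, y, (r << 16) | (g << 8) | b);
      }
    }
    return out;
  }

  private static int jitter(int channel, int maxDelta, Random random) {
    int delta = random.nextInt(2 * maxDelta + 1) - maxDelta;
    return Math.max(0, Math.min(255, channel + delta)); // clamp to the valid byte range
  }
}
```

Sharpening and edge detection follow the same pattern as the blur and differ only in the kernel weights passed to `Kernel`.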

Java Features: Consider java.awt.image.ConvolveOp for filtering.

Documentation: Document methods and include usage in the README.

Tests: Write tests for expected output of noise and filter applications.

Antora Documentation Site for EDUX Java Machine Learning Library

Description

This ticket outlines the creation of an Antora documentation site for EDUX. The goal is to provide users with a comprehensive guide on integrating the library into their Java projects and utilizing its machine learning algorithms.

Objectives

Set up a basic Antora site structure.
Document the process of including the EDUX library in a Java project.
Provide code examples for each supported ML model.

Additional Notes

Include a section for hardware acceleration requirements.
Provide a benchmark section linking to the discussion on algorithm comparison.

The code examples can be found in the example directory of the EDUX repository.

Release 1.0.6 Preparation

- Update all documentation
- Update GitHub Page
- Add Junit tests - Ticket
- Reformat CodeBase with Google Formatter
- All IRIS examples are there and using new API
- Seaborn Dataset examples using the new Imputation Feature Solve this

Readme is not up to date

I trained several models using your examples, and they all perform well. However, your Readme was not updated, which caused some initial confusion for me.

Implement Matrix Multiplication Using OpenCL

Feature Request: OpenCL-based Matrix Multiplication

Summary

We are looking to expand the computational backend options of EDUX by integrating OpenCL for matrix multiplication. This will provide an alternative to CUDA, allowing users with non-Nvidia GPUs to leverage the power of their hardware for matrix operations.

Motivation

While CUDA offers robust solutions for Nvidia GPUs, the user base with different GPU vendors is substantial. By incorporating OpenCL, we can make EDUX more accessible and versatile, catering to a wider audience.

Description

The goal is to implement the existing IMatrixArithmetic interface using OpenCL to perform matrix multiplication. The interface is as follows:

    public interface IMatrixArithmetic {
      double[][] multiply(double[][] matrixA, double[][] matrixB);
    }

Performance Optimization for EDUX Image Augmentation Library

Summary: Create comprehensive documentation and example use cases for the EDUX image augmentation library.

Details:

Javadoc for all public classes and methods.
README with clear, executable examples of all features.

Java Features: N/A (Documentation-related).

Tests: Verify that all examples in documentation are working as intended.

Create a Landingpage (GitHub Pages)

Background/Objective

Our 'Edux' library currently lacks an official landing page. As we aim to make our Java ML library accessible and easy to understand for new visitors and potential contributors, having a well-designed landing page is critical. This landing page will serve as the first point of contact where users can learn about the features, installation processes, and how to get involved with 'Edux.'

Describe what is edux
How to install part
How to use part
Clean & Simple design

Write jUnit Tests for Multilayer NeuralNetwork

We need a reliable test that shows that the MultiLayerPerceptron's training and predictions work correctly.

use seaborn dataset
train and check the accuracy value of the model.
Test should be repeatable 3x.

Flipping and Color Augmentation for EDUX Image Augmentation

Summary: Add flipping and color augmentation features to EDUX's image augmentation suite.

Details:

Flipping: Horizontal and vertical flipping functions.
Color Augmentation: Adjustments for brightness, contrast, saturation, and hue.
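
As a starting point, flipping can be done by copying pixels to mirrored coordinates, and a brightness adjustment can be expressed with `java.awt.image.RescaleOp`, which computes `channel * scale + offset` for every color channel. The class and method names below are hypothetical; contrast, saturation and hue adjustments are not covered by this sketch.

```java
import java.awt.image.BufferedImage;
import java.awt.image.RescaleOp;

/** Hypothetical helper illustrating flips and a RescaleOp-based brightness change. */
final class FlipAndColorSketch {

  private FlipAndColorSketch() {}

  /** Mirrors the image left to right. */
  static BufferedImage flipHorizontal(BufferedImage source) {
    int width = source.getWidth();
    int height = source.getHeight();
    BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
    for (int y = 0; y < height; y++) {
      for (int x = 0; x < width; x++) {
        out.setRGB(width - 1 - x, y, source.getRGB(x, y));
      }
    }
    return out;
  }

  /** Mirrors the image top to bottom. */
  static BufferedImage flipVertical(BufferedImage source) {
    int width = source.getWidth();
    int height = source.getHeight();
    BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
    for (int y = 0; y < height; y++) {
      for (int x = 0; x < width; x++) {
        out.setRGB(x, height - 1 - y, source.getRGB(x, y));
      }
    }
    return out;
  }

  /** Scales and offsets every color channel; results are clamped to the valid range. */
  static BufferedImage adjustBrightness(BufferedImage source, float scale, float offset) {
    RescaleOp op = new RescaleOp(scale, offset, null);
    return op.filter(source, null);
  }
}
```

A scale above 1 brightens the image and a scale below 1 darkens it, while the offset shifts all channels by a constant amount.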

Java Features: Explore java.awt.image.RescaleOp for color operations.

Documentation: Extend Javadoc and provide README examples.

Tests: Include unit tests for color integrity and flip correctness.

Drop Custom Data Preparation

Delete

DataProcessor
Records

We don't want to support Record-based data preparation.

Add github pages to the project

Add github pages action to the build pipeline

Implement Matrix Multiplication Using Strassen Algorithm

Description

Implement the Strassen algorithm for matrix multiplication. This algorithm is an advanced divide and conquer method that is more efficient for large matrices.

Task Details

Objective: Implement the Strassen algorithm to multiply two matrices.
Function Signature: double[][] multiplyMatricesStrassen(double[][] matrixA, double[][] matrixB)

Acceptance Criteria

The implementation must adhere to the Strassen algorithm's divide and conquer approach.
Proper handling of base cases and recursive subdivision of matrices.
Unit tests to ensure the algorithm's correctness and to compare its efficiency with classical multiplication.
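
For orientation, a compact sketch of the algorithm follows. It is restricted to square matrices whose size is a power of two, which is an assumption made to keep the example short; a complete implementation would pad other sizes. The class name and the tiny base-case size are likewise illustrative, and a practical version would switch to classical multiplication at a much larger cutoff.

```java
/** Illustrative sketch; the class name is hypothetical, the method name follows the issue. */
final class StrassenMatrixMultiplication {

  private static final int BASE_CASE_SIZE = 2; // kept tiny so small inputs exercise the recursion

  private StrassenMatrixMultiplication() {}

  static double[][] multiplyMatricesStrassen(double[][] matrixA, double[][] matrixB) {
    int n = matrixA.length;
    if (n == 0 || Integer.bitCount(n) != 1 || matrixB.length != n
        || matrixA[0].length != n || matrixB[0].length != n) {
      throw new IllegalArgumentException("Sketch supports square power-of-two matrices only");
    }
    return strassen(matrixA, matrixB);
  }

  private static double[][] strassen(double[][] a, double[][] b) {
    int n = a.length;
    if (n <= BASE_CASE_SIZE) {
      return classical(a, b);
    }
    int h = n / 2;
    double[][] a11 = block(a, 0, 0, h), a12 = block(a, 0, h, h);
    double[][] a21 = block(a, h, 0, h), a22 = block(a, h, h, h);
    double[][] b11 = block(b, 0, 0, h), b12 = block(b, 0, h, h);
    double[][] b21 = block(b, h, 0, h), b22 = block(b, h, h, h);

    // Seven recursive products instead of the eight needed by naive block multiplication.
    double[][] m1 = strassen(combine(a11, a22, 1), combine(b11, b22, 1));
    double[][] m2 = strassen(combine(a21, a22, 1), b11);
    double[][] m3 = strassen(a11, combine(b12, b22, -1));
    double[][] m4 = strassen(a22, combine(b21, b11, -1));
    double[][] m5 = strassen(combine(a11, a12, 1), b22);
    double[][] m6 = strassen(combine(a21, a11, -1), combine(b11, b12, 1));
    double[][] m7 = strassen(combine(a12, a22, -1), combine(b21, b22, 1));

    double[][] c = new double[n][n];
    for (int i = 0; i < h; i++) {
      for (int j = 0; j < h; j++) {
        c[i][j] = m1[i][j] + m4[i][j] - m5[i][j] + m7[i][j];
        c[i][j + h] = m3[i][j] + m5[i][j];
        c[i + h][j] = m2[i][j] + m4[i][j];
        c[i + h][j + h] = m1[i][j] - m2[i][j] + m3[i][j] + m6[i][j];
      }
    }
    return c;
  }

  // Copies the size x size block whose top-left corner is (row, col).
  private static double[][] block(double[][] m, int row, int col, int size) {
    double[][] out = new double[size][size];
    for (int i = 0; i < size; i++) {
      System.arraycopy(m[row + i], col, out[i], 0, size);
    }
    return out;
  }

  // Returns x + sign * y, so sign = 1 adds and sign = -1 subtracts.
  private static double[][] combine(double[][] x, double[][] y, double sign) {
    int n = x.length;
    double[][] out = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        out[i][j] = x[i][j] + sign * y[i][j];
      }
    }
    return out;
  }

  private static double[][] classical(double[][] a, double[][] b) {
    int n = a.length;
    double[][] out = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        double sum = 0.0;
        for (int k = 0; k < n; k++) {
          sum += a[i][k] * b[k][j];
        }
        out[i][j] = sum;
      }
    }
    return out;
  }
}
```

Because Strassen reorders additions and subtractions, results on real-valued data can differ from classical multiplication by small rounding errors, so comparison tests should use a tolerance.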