
my_open_source_software


This repo contains two machine learning algorithms.

  1. Polynomial regression (a minimal sketch follows the list below)
  2. Finding the best model and hyper-parameters for face recognition.
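The README does not go into detail on the first algorithm, so here is a minimal, hypothetical sketch of how polynomial regression is commonly done with sklearn's PolynomialFeatures and LinearRegression; the toy data and degree=2 are illustrative assumptions, not values taken from this repo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data (not from the repo): y = 2x^2 - 3x + 1 plus a little noise
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2 * X.ravel() ** 2 - 3 * X.ravel() + 1 + rng.normal(scale=0.5, size=50)

# PolynomialFeatures expands x into [1, x, x^2]; LinearRegression fits the weights
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))  # should be near 2*1.5^2 - 3*1.5 + 1 = 1.0
```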

Explanation of the second algorithm (finalterm_project):

Let me explain my "finalterm_project". It walks through building a classification model for the Olivetti faces using Python's sklearn. The higher the classification score (the "similarity"), the better the model, right? So what I did was select the classification model with the highest score, then adjust its hyper-parameters to raise the score further.

For training data I used the Olivetti faces dataset, a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge. Each image is quantized to 256 gray levels and stored as unsigned 8-bit integers; the loader converts these to floating-point values on the interval [0, 1]. The original images were 92 x 112 pixels, while the version available here consists of 64 x 64 images.

To classify the Olivetti faces, I chose a model based on the Support Vector Machine (SVM) technique. An SVM is a model that decides which group a given data point belongs to, and its defining feature is that it maximizes generalization ability by exploiting the 'margin': the distance between the decision boundary used to classify the data and the closest data points. Those closest points are called 'support vectors'. The best position for the decision boundary is the one that maximizes the margin, i.e. the distance to the support vectors, and the boundary is gradually updated as the model is trained.

However, if the margin is made too large, the error grows quickly and an underfitting problem appears; allowing a wide margin like this is called a 'soft margin'. Conversely, if the margin is minimized, an overfitting problem occurs; this is called a 'hard margin'. The biggest advantage of the SVM is that, since the decision boundary is determined only by the support vectors, all other data points can be ignored, so classification is very fast.

The SVM has several hyper-parameters; I will focus on the important ones and the ones I adjusted. First, the parameter 'C' is closely related to the margin: the larger C is, the harder the margin, and the smaller C is, the softer the margin. Since the optimal C depends on the data, you have to find it by trying values one by one. I set C to 1000 instead of the default value of 1, which means errors are penalized heavily even at the risk of overfitting. (In practice there was no significant change in the score...)
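"Trying values one by one" can be automated with a grid search. Below is a minimal sketch, assuming sklearn's GridSearchCV; the candidate C grid, the train/test split, and the 5-fold cross-validation are my assumptions, not settings confirmed by the project.

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Olivetti faces: 400 images (40 people x 10 each), 64x64 pixels
# flattened to 4096 floats in [0, 1]
faces = fetch_olivetti_faces()
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, test_size=0.25, random_state=0, stratify=faces.target
)

# Candidate C values are an assumption for illustration
param_grid = {"C": [1, 10, 100, 1000]}
search = GridSearchCV(SVC(kernel="poly", degree=3, random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```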
The parameter 'kernel' is, simply put, a dimension transformer; the options include 'linear', 'poly', 'rbf', and 'sigmoid'. The linear kernel is the most basic linear classification kernel: in 2D it separates the data with a straight line. But not all data are linearly separable; when data must be separated non-linearly, you can change the kernel. The poly kernel expresses the original features through higher-degree polynomial combinations, and the parameter 'degree' controls the degree of that polynomial: the default degree=3 uses third-degree combinations, and degree=4 uses fourth-degree ones. With a poly kernel, the decision boundary takes the form of a hyperplane in the transformed space rather than a straight line in the original one. For the data in this task, classification was best with a third-degree polynomial mapping, so I set kernel='poly' and left degree at its default value of 3.

Finally, the rbf kernel is the default for SVM. It implicitly maps the data into an infinite-dimensional space. When using the rbf kernel, it is more effective to shape the margin with the parameter 'gamma' than to adjust it with C. gamma determines how flexibly the decision boundary is drawn: increasing gamma reduces the training error by letting the boundary wind around the data, but may lead to overfitting; lowering gamma draws the boundary close to a straight line, which is good for generalization but may cause underfitting.

The parameter 'random_state' controls the algorithm's randomness. Because of this randomness, the score can differ between runs even when no hyper-parameter has changed, so I fixed random_state to 0 to make results reproducible.
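Putting the pieces together, here is a minimal sketch of the configuration described above: SVC with kernel='poly', degree=3, C=1000, and random_state=0 on the Olivetti faces. The train/test split and the use of .score() as the "similarity" measure are assumptions about how the evaluation was done.

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

faces = fetch_olivetti_faces()

# Fixed random_state so repeated runs are comparable, as described above
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, test_size=0.25, random_state=0
)

# kernel='poly' with the default degree=3; C=1000 hardens the margin
clf = SVC(kernel="poly", degree=3, C=1000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

Fixing random_state in both the split and the classifier is what makes runs comparable when only one hyper-parameter is changed at a time.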

                                                                                                                                 20210278 전용현
