
my_open_source_software


This repo contains two machine learning algorithms.

  1. Polynomial regression (a minimal sketch follows the list below)
  2. Finding the best model and hyper-parameters for face recognition.
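The README does not go into detail on the first algorithm, so here is a minimal, hypothetical sketch of how polynomial regression is commonly done with sklearn's PolynomialFeatures and LinearRegression; the toy data and degree=2 are illustrative assumptions, not values taken from this repo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data (not from the repo): y = 2x^2 - 3x + 1 plus a little noise
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2 * X.ravel() ** 2 - 3 * X.ravel() + 1 + rng.normal(scale=0.5, size=50)

# PolynomialFeatures expands x into [1, x, x^2]; LinearRegression fits the weights
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))  # should be near 2*1.5^2 - 3*1.5 + 1 = 1.0
```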

Explanation of the second algorithm (finalterm_project):

Let me explain my "finalterm_project". It walks through building a classification model for the Olivetti faces using Python's sklearn. The higher the classification score (the "similarity"), the better the model, right? So what I did was select the classification model with the highest score, then adjust its hyper-parameters to raise the score further.

For training data I used the Olivetti faces dataset, a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge. Each image is quantized to 256 gray levels and stored as unsigned 8-bit integers; the loader converts these to floating-point values on the interval [0, 1]. The original images were 92 x 112 pixels, while the version available here consists of 64 x 64 images.

To classify the Olivetti faces, I chose a model based on the Support Vector Machine (SVM) technique. An SVM is a model that decides which group a given data point belongs to, and its defining feature is that it maximizes generalization ability by exploiting the 'margin': the distance between the decision boundary used to classify the data and the closest data points. Those closest points are called 'support vectors'. The best position for the decision boundary is the one that maximizes the margin, i.e. the distance to the support vectors, and the boundary is gradually updated as the model is trained.

However, if the margin is made too large, the error grows quickly and an underfitting problem appears; allowing a wide margin like this is called a 'soft margin'. Conversely, if the margin is minimized, an overfitting problem occurs; this is called a 'hard margin'. The biggest advantage of the SVM is that, since the decision boundary is determined only by the support vectors, all other data points can be ignored, so classification is very fast.

The SVM has several hyper-parameters; I will focus on the important ones and the ones I adjusted. First, the parameter 'C' is closely related to the margin: the larger C is, the harder the margin, and the smaller C is, the softer the margin. Since the optimal C depends on the data, you have to find it by trying values one by one. I set C to 1000 instead of the default value of 1, which means errors are penalized heavily even at the risk of overfitting. (In practice there was no significant change in the score...)
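"Trying values one by one" can be automated with a grid search. Below is a minimal sketch, assuming sklearn's GridSearchCV; the candidate C grid, the train/test split, and the 5-fold cross-validation are my assumptions, not settings confirmed by the project.

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Olivetti faces: 400 images (40 people x 10 each), 64x64 pixels
# flattened to 4096 floats in [0, 1]
faces = fetch_olivetti_faces()
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, test_size=0.25, random_state=0, stratify=faces.target
)

# Candidate C values are an assumption for illustration
param_grid = {"C": [1, 10, 100, 1000]}
search = GridSearchCV(SVC(kernel="poly", degree=3, random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```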
The parameter 'kernel' is, simply put, a dimension transformer; the options include 'linear', 'poly', 'rbf', and 'sigmoid'. The linear kernel is the most basic linear classification kernel: in 2D it separates the data with a straight line. But not all data are linearly separable; when data must be separated non-linearly, you can change the kernel. The poly kernel expresses the original features through higher-degree polynomial combinations, and the parameter 'degree' controls the degree of that polynomial: the default degree=3 uses third-degree combinations, and degree=4 uses fourth-degree ones. With a poly kernel, the decision boundary takes the form of a hyperplane in the transformed space rather than a straight line in the original one. For the data in this task, classification was best with a third-degree polynomial mapping, so I set kernel='poly' and left degree at its default value of 3.

Finally, the rbf kernel is the default for SVM. It implicitly maps the data into an infinite-dimensional space. When using the rbf kernel, it is more effective to shape the margin with the parameter 'gamma' than to adjust it with C. gamma determines how flexibly the decision boundary is drawn: increasing gamma reduces the training error by letting the boundary wind around the data, but may lead to overfitting; lowering gamma draws the boundary close to a straight line, which is good for generalization but may cause underfitting.

The parameter 'random_state' controls the algorithm's randomness. Because of this randomness, the score can differ between runs even when no hyper-parameter has changed, so I fixed random_state to 0 to make results reproducible.
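Putting the pieces together, here is a minimal sketch of the configuration described above: SVC with kernel='poly', degree=3, C=1000, and random_state=0 on the Olivetti faces. The train/test split and the use of .score() as the "similarity" measure are assumptions about how the evaluation was done.

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

faces = fetch_olivetti_faces()

# Fixed random_state so repeated runs are comparable, as described above
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, test_size=0.25, random_state=0
)

# kernel='poly' with the default degree=3; C=1000 hardens the margin
clf = SVC(kernel="poly", degree=3, C=1000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

Fixing random_state in both the split and the classifier is what makes runs comparable when only one hyper-parameter is changed at a time.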

                                                                                                                                 20210278 전용현
