I can't find a K-means model, so I think I could code one. Thanks!

Can I write a K-means model and then submit a pull request? (numpy-ml, closed)

ddbourgin commented on May 16, 2024

Can I write a K-means model and then submit a pull request?

from numpy-ml.

Comments (8)

daidai21 commented on May 16, 2024

OK, I've decided to try coding this algorithm. I'll need some time, though, because I have other work. I will finish it as soon as possible.

Nice to meet you, @ddbourgin.

ddbourgin commented on May 16, 2024

Hi @daidai21 - thanks for your interest!

There actually is a k-means model as part of the KNN module, though I haven't explicitly called it that in the READMEs. Specifically, the KNN object takes a classifier argument, which switches between k-nearest neighbors regression (classifier=False) and k-means classification/clustering (classifier=True).
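To make the classifier switch concrete, here is a minimal NumPy sketch of a k-nearest-neighbours model whose classifier flag toggles between averaging the neighbours' targets and voting on their labels. The class KNNSketch and its arguments are hypothetical and are not numpy-ml's actual KNN API. Note that majority voting over the k nearest neighbours is usually called k-NN classification; k-means clustering (Lloyd's algorithm) is a separate, unsupervised method.

```python
import numpy as np


class KNNSketch:
    """Hypothetical k-NN model with a `classifier` switch (illustration only).

    classifier=True  -> predict the majority label among the k nearest points
    classifier=False -> predict the mean target of the k nearest points
    """

    def __init__(self, k=3, classifier=True):
        self.k = k
        self.classifier = classifier

    def fit(self, X, y):
        # k-NN has no training step: it simply memorises the training data
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distance from every query point to every training point
        dists = np.linalg.norm(X[:, None, :] - self.X_train[None, :, :], axis=-1)
        # indices of the k nearest training points for each query point
        nn_idx = np.argsort(dists, axis=1)[:, : self.k]
        nn_targets = self.y_train[nn_idx]
        if not self.classifier:
            # regression: average the neighbours' targets
            return nn_targets.mean(axis=1)
        # classification: majority vote (ties go to the smallest label)
        preds = []
        for row in nn_targets:
            labels, counts = np.unique(row, return_counts=True)
            preds.append(labels[np.argmax(counts)])
        return np.array(preds)


# Toy 1-D data: two well-separated groups
X_demo = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

clf = KNNSketch(k=3, classifier=True).fit(X_demo, y_demo)
print(clf.predict([[1.5], [10.5]]))  # [0 1]

reg = KNNSketch(k=3, classifier=False).fit(X_demo, X_demo.ravel())
print(reg.predict([[1.5]]))  # [1.]
```

The same neighbour search serves both modes; only the final aggregation step (mean versus vote) depends on the flag.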

Feel free to propose other models you'd be interested in working on, though!

daidai21 commented on May 16, 2024

Sorry to bother you, but is there no SVM yet? @ddbourgin

ddbourgin commented on May 16, 2024

@daidai21 - No need to apologize! An SVM implementation would be awesome -- it's been on my TODO list for ages :)

The crux, I suspect, will be implementing the SMO algorithm properly. If you decide to do it, I wouldn't worry too much about efficiency: for this repo, the focus is on making everything as clean and clear as possible rather than on being clever.
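For reference, a minimal sketch of the simplified SMO variant often used for teaching, in which the second multiplier is chosen at random rather than with Platt's full selection heuristics. It assumes a linear kernel and labels in {-1, +1} (make_blobs produces {0, 1}, so labels would need mapping first); the function names simplified_smo and smo_predict are hypothetical, not part of numpy-ml.

```python
import numpy as np


def linear_kernel(A, B):
    # Gram matrix of dot products between rows of A and rows of B
    return A @ B.T


def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, seed=0):
    """Fit a soft-margin linear SVM with a simplified SMO loop.

    Assumes labels y in {-1, +1}. The second multiplier j is picked at random
    instead of with Platt's full selection heuristics. Returns (alphas, b).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = linear_kernel(X, X)
    alphas = np.zeros(n)
    b = 0.0
    passes = 0
    while passes < max_passes:
        num_changed = 0
        for i in range(n):
            # error on sample i: f(x_i) - y_i, with f(x) = sum_k alpha_k y_k K(x_k, x) + b
            E_i = (alphas * y) @ K[:, i] + b - y[i]
            # only optimise alpha_i if it violates the KKT conditions beyond tol
            if not ((y[i] * E_i < -tol and alphas[i] < C) or (y[i] * E_i > tol and alphas[i] > 0)):
                continue
            j = rng.integers(n - 1)  # random j != i
            if j >= i:
                j += 1
            E_j = (alphas * y) @ K[:, j] + b - y[j]
            a_i_old, a_j_old = alphas[i], alphas[j]
            # bounds that keep both multipliers in [0, C] and sum(alpha * y) fixed
            if y[i] != y[j]:
                L, H = max(0.0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
            else:
                L, H = max(0.0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
            if L == H:
                continue
            eta = 2.0 * K[i, j] - K[i, i] - K[j, j]
            if eta >= 0:
                continue
            # optimum for alpha_j along the constraint line, clipped to [L, H]
            a_j_new = np.clip(a_j_old - y[j] * (E_i - E_j) / eta, L, H)
            if abs(a_j_new - a_j_old) < 1e-5:
                continue
            alphas[j] = a_j_new
            alphas[i] = a_i_old + y[i] * y[j] * (a_j_old - a_j_new)
            # recompute the threshold so a non-bound updated multiplier sits on the margin
            d_i, d_j = alphas[i] - a_i_old, a_j_new - a_j_old
            b1 = b - E_i - y[i] * d_i * K[i, i] - y[j] * d_j * K[i, j]
            b2 = b - E_j - y[i] * d_i * K[i, j] - y[j] * d_j * K[j, j]
            if 0 < alphas[i] < C:
                b = b1
            elif 0 < alphas[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2.0
            num_changed += 1
        passes = passes + 1 if num_changed == 0 else 0
    return alphas, b


def smo_predict(X_train, y_train, alphas, b, X_new):
    # sign of the decision function f(x) = sum_k alpha_k y_k <x_k, x> + b
    weights = alphas * np.asarray(y_train, dtype=float)
    scores = linear_kernel(np.asarray(X_new, dtype=float), np.asarray(X_train, dtype=float)) @ weights + b
    return np.where(scores >= 0, 1, -1)


X_demo = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [-2.0, -2.0], [-3.0, -3.0], [-2.5, -3.5]])
y_demo = np.array([1, 1, 1, -1, -1, -1])
alphas, b = simplified_smo(X_demo, y_demo, C=1.0)
print(smo_predict(X_demo, y_demo, alphas, b, X_demo))
```

Swapping in an RBF kernel would only change the kernel function used to build K and inside smo_predict; the multiplier updates stay the same.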

Also, if you end up referencing other implementations when writing your code, please make sure to cite them in the docstrings and PR. It's important that any code you submit is your own work.

Finally - thanks! Let me know if you have any questions as you go along :)

ddbourgin commented on May 16, 2024

Sure, take your time, and let me know if you have any questions!

daidai21 commented on May 16, 2024

Hi, David

It took some time, but I finished it. However, not all of the tests pass: in roughly 78% of the trials, my model and sklearn's model reach the same accuracy on the test set. I'm not sure what to do now.

Sometimes my model does better, and sometimes sklearn's does.

I think this is related to the distribution of the randomly generated data, and that my code is OK. What do you think?

Here is the test code:

import warnings
warnings.filterwarnings('ignore')  # hides all warnings, e.g. sklearn's ConvergenceWarning when max_iter is reached

import numpy as np

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator no longer exists in recent releases
from sklearn.model_selection import train_test_split

# load my own model (SVM.py)
from SVM import SVM


def test_SVM(n_trials=200, seed=12345):
    rng = np.random.RandomState(seed)  # one seeded RNG, so every trial is reproducible
    for i in range(n_trials):  # bounded loop; i also varies the dataset between trials
        X, Y = make_blobs(  # generate a two-class dataset
            n_samples=rng.randint(2, 100),
            n_features=rng.randint(2, 100),
            centers=2, random_state=i,
        )
        X, X_test, Y, Y_test = train_test_split(X, Y, test_size=0.3, random_state=i)
        if 0 not in Y or 1 not in Y:  # skip splits whose training set contains only one class
            continue
        # generate hyperparameters
        C = rng.uniform(0.1, 0.9)
        max_iter = int(rng.randint(50, 500))  # SVC expects an integer max_iter
        kernel = str(rng.choice(["linear", "rbf"]))
        tol = rng.uniform(0.000001, 0.1)
        # fit and predict with identical data and hyperparameters
        clf1 = SVC(C=C, max_iter=max_iter, kernel=kernel, tol=tol)
        clf1.fit(X, Y)
        pred1 = clf1.predict(X_test)
        clf2 = SVM(C=C, max_iter=max_iter, kernel=kernel, tol=tol)
        clf2.fit(X, Y)
        pred2 = clf2.predict(X_test)
        # judge
        acc1, acc2 = accuracy_score(Y_test, pred1), accuracy_score(Y_test, pred2)
        # assert acc1 == acc2, "ERROR {0} {1}".format(acc1, acc2)
        if acc1 == acc2:
            print("PASSED")
        else:
            print("ERROR", acc1, acc2)


if __name__ == "__main__":
    test_SVM()

Output from running the test code:

PASSED
PASSED
PASSED
PASSED
ERROR 0.3333333333333333 1.0
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.0
PASSED
ERROR 0.3125 1.0
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.7692307692307693
PASSED
PASSED
ERROR 1.0 0.5
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 0.9655172413793104 1.0
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 0.5 1.0
PASSED
ERROR 1.0 0.5384615384615384
PASSED
ERROR 1.0 0.9090909090909091
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.3333333333333333
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.75
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.9
ERROR 1.0 0.6666666666666666
PASSED
ERROR 0.3333333333333333 1.0
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 0.4444444444444444 1.0
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 0.14285714285714285 1.0
ERROR 1.0 0.8
PASSED
PASSED
ERROR 1.0 0.9583333333333334
PASSED
ERROR 1.0 0.3333333333333333
PASSED
ERROR 1.0 0.9047619047619048
ERROR 0.0 1.0
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 0.2 1.0
PASSED
PASSED
ERROR 1.0 0.6
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.9090909090909091
ERROR 0.0 1.0
ERROR 0.3333333333333333 1.0
ERROR 1.0 0.6
PASSED
PASSED
PASSED
PASSED
ERROR 0.5 1.0
ERROR 1.0 0.8
ERROR 1.0 0.9523809523809523
PASSED
ERROR 0.32 1.0
PASSED
PASSED
ERROR 1.0 0.8333333333333334
ERROR 1.0 0.9259259259259259
ERROR 1.0 0.96
PASSED
PASSED
PASSED
ERROR 1.0 0.9259259259259259
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.5
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.4
PASSED
ERROR 0.4 1.0
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.6666666666666666
PASSED
PASSED
ERROR 0.4166666666666667 1.0
ERROR 1.0 0.9166666666666666
PASSED
ERROR 1.0 0.6666666666666666
PASSED
PASSED
ERROR 1.0 0.6
PASSED
PASSED
ERROR 0.3333333333333333 1.0
PASSED
ERROR 0.4 1.0
ERROR 0.8235294117647058 1.0
PASSED
PASSED
PASSED
PASSED
ERROR 0.5555555555555556 1.0
PASSED
PASSED
ERROR 0.0 1.0
PASSED
PASSED
PASSED
PASSED
ERROR 1.0 0.5
PASSED
PASSED
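Many accuracies in this log are coarse fractions such as 1/3, 0.5 or 0.2. One plausible contributor is test-set size: n_samples is drawn between 2 and 99 and test_size is 0.3, so small datasets leave only a handful of test points, and a single disagreement moves the accuracy by a large step. A short sketch, assuming the test set holds ceil(test_size * n_samples) points (our reading of how train_test_split rounds a float test_size); the helper names n_test_points and possible_accuracies are hypothetical.

```python
import math


def n_test_points(n_samples, test_size=0.3):
    # Assumption: a float test_size becomes ceil(test_size * n_samples) test points
    return math.ceil(test_size * n_samples)


def possible_accuracies(n_test):
    # With n_test points, accuracy can only be k / n_test for k = 0..n_test
    return [k / n_test for k in range(n_test + 1)]


for n_samples in (2, 5, 10, 30):
    n_test = n_test_points(n_samples)
    print(n_samples, n_test, [round(a, 3) for a in possible_accuracies(n_test)])
```

With three test points, for example, the only possible accuracies are 0, 1/3, 2/3 and 1, so one differently classified point is enough to turn PASSED into ERROR.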

ddbourgin commented on May 16, 2024

Hi @daidai21 - thank you for working on this! It's not clear to me why random data generation would result in failed tests, since both models receive the same input data and targets. Perhaps I'm missing something?

Anyway, feel free to submit a PR and we can try to work through the code together to identify what's going on. It's difficult to know right now why certain tests aren't passing, since I don't know what the model code looks like.

Finally, to help track down the cause of the failed tests, I'd recommend directly comparing pred1 and pred2 to ensure that individual data points are being categorized in the same way between the two models. This will help you to better identify why some of the tests are failing :)
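A minimal sketch of that per-point comparison; compare_predictions is a hypothetical helper, not part of numpy-ml or scikit-learn. Comparing accuracies alone can hide problems: two models can reach the same accuracy while misclassifying different points, and an accuracy pair like 1.0 vs 0.0 on a binary task means every prediction differs, which is often worth checking for a label-encoding mismatch (for example, mapping {0, 1} to {-1, +1} inconsistently).

```python
import numpy as np


def compare_predictions(y_true, pred_ref, pred_new):
    """Hypothetical helper: report where two models disagree on one test set.

    pred_ref would be sklearn's predictions (pred1), pred_new the custom
    model's (pred2).
    """
    y_true, pred_ref, pred_new = (np.asarray(a) for a in (y_true, pred_ref, pred_new))
    if y_true.size == 0:
        raise ValueError("empty test set")
    disagree_idx = np.flatnonzero(pred_ref != pred_new)
    return {
        "n_test": int(y_true.size),
        "agreement": 1.0 - disagree_idx.size / y_true.size,
        "disagree_idx": disagree_idx,
        # on a binary task, disagreeing on every point suggests swapped labels
        "all_disagree": disagree_idx.size == y_true.size,
        "acc_ref": float(np.mean(pred_ref == y_true)),
        "acc_new": float(np.mean(pred_new == y_true)),
    }


# Same accuracy, different mistakes: an accuracy-only check would report PASSED
report = compare_predictions([0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0])
print(report["acc_ref"], report["acc_new"], report["disagree_idx"])  # 0.75 0.75 [1 2]

# Fully inverted predictions, like an "ERROR 1.0 0.0" line in the log
report = compare_predictions([0, 1, 1], [0, 1, 1], [1, 0, 0])
print(report["agreement"], report["all_disagree"])  # 0.0 True
```

Inside the test loop, calling this on Y_test, pred1 and pred2 and printing disagree_idx would show exactly which test points the two models classify differently.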

Thanks again!

ddbourgin commented on May 16, 2024

Closing this, as the code you are talking about is not your own work.

See #37
