
Comments (4)

JacobMaciejewski commented on September 3, 2024

Hello, and I am sorry for the late reply.
The fully capitalized similarity function names refer to Whoosh similarity functions and can only be selected for Progressive Entity Matching with the Whoosh algorithm. These functions will be renamed in the next official release, or removed entirely, since the Whoosh util is deprecated. They are grouped with the conventional similarity functions only because of an argument-naming convention tied to the Progressive Workflow util. Some of that util's development dependencies were included in the latest release, even though the util itself is not fully ready yet.

from pyjedai.

reversingentropy commented on September 3, 2024

Hi, thank you for replying and for the clarification!

I am not sure whether this would be a helpful replacement (TF-IDF) in matching.py:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def calculate_similarity(entity1, entity2):
    # Convert entities to lowercase
    entity1 = entity1.lower()
    entity2 = entity2.lower()

    # Initialize and fit the TfidfVectorizer
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform([entity1, entity2])

    # Calculate cosine similarity between the two documents
    similarity_score = cosine_similarity(tfidf_matrix[0], tfidf_matrix[1])[0][0]

    return similarity_score

ent1 = "Linksys EtherFast 8-Port 10/100 Switch - EZXS88W Linksys EtherFast 8-Port 10/100 Switch - EZXS88W/ 10/100 Dual-Speed Per-Port/ Perfect For Optimizing 10BaseT And 100BaseTX Hardware On The Same Network/ Speeds Of Up To 200Mbps In Full Duplex Operation/ Eliminate Bandwidth Constraints And Clear Up Bottlenecks $44.00"

ent2 = "Linksys EtherFast EZXS88W Ethernet Switch - EZXS88W Linksys EtherFast 8-Port 10/100 Switch (New/Workgroup) LINKSYS"

similarity = calculate_similarity(ent1, ent2)

print("TF-IDF similarity between the entities:", similarity)

The result is: TF-IDF & cosine similarity between the entities: 0.46203393546758753


JacobMaciejewski commented on September 3, 2024

We are always open to new ideas and corrections on the already deployed code.
We will definitely have a look at it. Don't hesitate to set up your own branch and send us pull requests.
We will review it and may include some of your own solutions in the framework.


Nikoletos-K commented on September 3, 2024

Hi, if I understand correctly, you are suggesting that we change the similarity method. We use pairwise_distances from sklearn, which supports jaccard, dice, and cosine, and when I tested it, it gave the same result as your implementation. Please clarify if you are proposing something different.

Thank you in any case!
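
For readers who want to check the equivalence themselves, here is a minimal sketch (not pyJedAI's actual code). It assumes TF-IDF vectors for the cosine case and binary token vectors for the Jaccard case; the two short example strings and all variable names are illustrative only. It relies on sklearn's cosine distance being defined as 1 minus the cosine similarity.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import cosine_similarity

# Two shortened, illustrative entity descriptions (assumed inputs).
docs = [
    "linksys etherfast 8-port 10/100 switch ezxs88w",
    "linksys etherfast ezxs88w ethernet switch new workgroup",
]

# Cosine: compare cosine_similarity with 1 - pairwise_distances(metric="cosine").
tfidf = TfidfVectorizer().fit_transform(docs)
sim_direct = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
sim_from_distance = 1.0 - pairwise_distances(tfidf, metric="cosine")[0, 1]

# Jaccard: sklearn expects dense boolean vectors for this metric,
# so build a binary token-presence matrix first.
presence = CountVectorizer(binary=True).fit_transform(docs).toarray().astype(bool)
jaccard_sim = 1.0 - pairwise_distances(presence, metric="jaccard")[0, 1]

print("cosine (direct):       ", sim_direct)
print("cosine (from distance):", sim_from_distance)
print("jaccard similarity:    ", jaccard_sim)
```

The two cosine values should agree up to floating-point error, which is consistent with the observation above that both approaches give the same result.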
