
eulersearch / embedding_studio


Embedding Studio is a framework that lets you transform your vector database into a feature-rich search engine.

Home Page: https://embeddingstud.io/

License: Apache License 2.0

Languages: Python 98.87%, Dockerfile 0.91%, Shell 0.22%

Topics: embeddings, embeddings-similarity, fine-tuning, llm-inference, query-parser, search-algorithm, search-engine, semantic-similarity, unstructured-data, unstructured-search

embedding_studio's People

Contributors

andrey-kostin, chillymagician, oyaso


embedding_studio's Issues

How to run inference with a fine-tuned embedding model?

Guys, I’ve tried the code and looked into it, and I can’t see anything related to inference. It looks interesting, but unless I can easily redeploy it in my own infrastructure, it will be hard to adopt.
Are you going to implement something like this? Or can you suggest a workaround?

Where is the "Zero-Shot Query Parser"?

I've read your README, and you mention some query parsing tools, but I can't find anything in your repo. Where can I read more about it? What is the release date of the feature? Thanks.

Data loading is quite slow

Hey, I was trying to run the fine-tuning and found that image downloading is quite slow. I checked it out, and it seems it isn’t even multithreaded or async. Are you going to speed it up somehow?
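
For illustration only, here is a minimal sketch of what a multithreaded download could look like, using nothing but the Python standard library. This is not Embedding Studio's loader: `fetch_url` and `download_all` are hypothetical names, and the sketch assumes the images are reachable over plain HTTP(S) URLs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download one object over HTTP(S) and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Fetch every URL on a thread pool and return {url: payload}.

    Hypothetical helper for this sketch, not part of Embedding Studio.
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its payload.
        payloads = list(pool.map(fetch, url_list))
    return dict(zip(url_list, payloads))
```

Downloads are I/O-bound, so threads can overlap the network waits even under the GIL. The `fetch` parameter is there so a different transport (for example an S3 client call) could be swapped in without touching the pooling logic.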

ClickHouse instead of Mongo

Hey!
I’m looking into the code. What was behind the decision to use MongoDB as the clickstream storage backend?
We use ClickHouse as part of our technical stack, and it’s more convenient for this purpose. Will you add ClickHouse support?
Best regards.

Encountering ClientError trying to use my own dataset

I'm trying to use my dataset for model training, but I'm encountering the following error:

ClientError                               Traceback (most recent call last)
/tmp/ipykernel_199154/1286885123.py in <module>
----> 1 response = s3_client.get_object(Bucket='embedding-studio-experiments', Key='remote-lanscapes/clickstream/f6816566-cac3-46ac-b5e4-0d5b76757c93/sessions.json')
~/anaconda3/lib/python3.9/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    528                 )
    529             # The "self" in this scope is referring to the BaseClient.
--> 530             return self._make_api_call(operation_name, kwargs)
    531 
    532         _api_call.__name__ = str(py_operation_name)
~/anaconda3/lib/python3.9/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    958             error_code = parsed_response.get("Error", {}).get("Code")
    959             error_class = self.exceptions.from_code(error_code)
--> 960             raise error_class(parsed_response, operation_name)
    961         else:
    962             return parsed_response
ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

I attempted to set my AWS Access Key ID in the .env file under the variables MINIO_ACCESS_KEY and MINIO_SECRET_KEY. However, as I understand it, these variables are used for artifact storage, not for my datasets. Can you advise on how I can resolve this error?
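
Not a maintainer answer, just a sketch of one thing worth checking: `InvalidAccessKeyId` means the service that received the request does not recognize the key ID, which can happen when credentials meant for one store (for example MinIO) are sent to another (for example AWS S3). With boto3, credentials and the endpoint can be passed explicitly to the client. The environment variable names below (`DATASET_ACCESS_KEY_ID`, `DATASET_SECRET_ACCESS_KEY`, `DATASET_ENDPOINT_URL`) and both helper functions are hypothetical and are not Embedding Studio settings.

```python
import os
from typing import Dict, Mapping, Optional


def s3_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for boto3.client("s3", ...).

    The DATASET_* variable names are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {
        "aws_access_key_id": env["DATASET_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["DATASET_SECRET_ACCESS_KEY"],
    }
    endpoint = env.get("DATASET_ENDPOINT_URL")
    if endpoint:
        # Only needed for S3-compatible stores; omit it to talk to AWS S3 itself.
        kwargs["endpoint_url"] = endpoint
    return kwargs


def make_s3_client(env: Optional[Mapping[str, str]] = None):
    """Create an S3 client whose credentials match the store that holds the dataset."""
    import boto3  # imported lazily so s3_client_kwargs works without boto3 installed

    return boto3.client("s3", **s3_client_kwargs(env))
```

Calling `make_s3_client().get_object(Bucket=..., Key=...)` would then use the dataset credentials rather than whatever default credentials boto3 finds in the environment.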
