howto-weaviate-retrieval-plugin's Issues

Container no longer builds due to flyctl feature issue

superfly/flyctl changed the version in their git tags, which broke the flyctl dev container feature (dhoeric/features#25), so the container build now fails:

[2023-09-14T05:31:49.081Z] Dev Containers 0.309.0 in VS Code 1.82.1 (6509174151d557a81c9d0b5f8a5a1e9274db5585).
[2023-09-14T05:31:49.081Z] Start: Resolving Remote
...

=> [internal] load metadata for mcr.microsoft.com/devcontainers/python:0 0.0s
[2023-09-14T05:32:27.758Z] => [context dev_containers_feature_content_source] load .dockerignore 0.0s
=> => transferring dev_containers_feature_content_source: 2B 0.0s
=> [context dev_containers_feature_content_source] load from client 0.0s
=> => transferring dev_containers_feature_content_source: 69.77kB 0.0s
=> [dev_containers_feature_content_normalize 1/3] FROM mcr.microsoft.com 0.0s
=> [dev_containers_feature_content_normalize 2/3] COPY --from=dev_contai 0.0s
=> [dev_containers_target_stage 2/7] RUN mkdir -p /tmp/dev-container-fea 0.1s
=> [dev_containers_feature_content_normalize 3/3] RUN chmod -R 0755 /tmp 0.1s
=> [dev_containers_target_stage 3/7] COPY --from=dev_containers_feature_ 0.0s
=> [dev_containers_target_stage 4/7] RUN echo "_CONTAINER_USER_HOME=$( ( 0.2s
=> [dev_containers_target_stage 5/7] RUN --mount=type=bind,from=dev_con 14.6s
=> [dev_containers_target_stage 6/7] RUN --mount=type=bind,from=dev_con 14.9s
=> ERROR [dev_containers_target_stage 7/7] RUN --mount=type=bind,from=de 1.0s
...
[dev_containers_target_stage 7/7] RUN --mount=type=bind,from=dev_containers_feature_content_source,source=flyctl_2,target=/tmp/build-features-src/flyctl_2 cp -ar /tmp/build-features-src/flyctl_2 /tmp/dev-container-features && chmod -R 0755 /tmp/dev-container-features/flyctl_2 && cd /tmp/dev-container-features/flyctl_2 && chmod +x ./devcontainer-features-install.sh && ./devcontainer-features-install.sh && rm -rf /tmp/dev-container-features/flyctl_2:
0.257 ===========================================================================
0.257 Feature : Fly.io CLI
0.257 Description : Install flyctl - Command line tools for fly.io services
0.257 Id : ghcr.io/dhoeric/features/flyctl
0.257 Version : 1.0.0
0.257 Documentation : https://github.com/dhoeric/features/tree/main/src/flyctl
0.257 Options :
0.257 VERSION="latest"
0.257 ===========================================================================
0.269 (*) Installing flyctl...
0.744 FLYCTL_VERSION=2023.9.8
1.004
1.004 gzip: stdin: not in gzip format
1.005 tar: ERROR: Feature "Fly.io CLI" (ghcr.io/dhoeric/features/flyctl) failed to install! Look at the documentation at https://github.com/dhoeric/features/tree/main/src/flyctl for help troubleshooting this error.
1.007 Child returned status 1
1.007 tar: Error is not recoverable: exiting now
[2023-09-14T05:32:27.759Z] ------
[2023-09-14T05:32:27.759Z] Dockerfile.extended:42
...

41 |
42 | >>> RUN --mount=type=bind,from=dev_containers_feature_content_source,source=flyctl_2,target=/tmp/build-features-src/flyctl_2
43 | >>> cp -ar /tmp/build-features-src/flyctl_2 /tmp/dev-container-features
44 | >>> && chmod -R 0755 /tmp/dev-container-features/flyctl_2
45 | >>> && cd /tmp/dev-container-features/flyctl_2
46 | >>> && chmod +x ./devcontainer-features-install.sh
47 | >>> && ./devcontainer-features-install.sh
48 | >>> && rm -rf /tmp/dev-container-features/flyctl_2
49 |
...
ERROR: failed to solve: process "/bin/sh -c cp -ar /tmp/build-features-src/flyctl_2 /tmp/dev-container-features && chmod -R 0755 /tmp/dev-container-features/flyctl_2 && cd /tmp/dev-container-features/flyctl_2 && chmod +x ./devcontainer-features-install.sh && ./devcontainer-features-install.sh && rm -rf /tmp/dev-container-features/flyctl_2" did not complete successfully: exit code: 2
...
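
The `gzip: stdin: not in gzip format` line suggests that whatever the install script downloaded for `FLYCTL_VERSION=2023.9.8` was not a gzip archive, for example an error page returned for a release asset that does not exist under that tag. This is an inference from the log, not something the log confirms. A minimal Python sketch of the check, useful when inspecting a downloaded file by hand (the helper name `looks_like_gzip` is hypothetical):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# A real gzip payload passes the check...
archive = gzip.compress(b"placeholder for a release tarball")
# ...while a plain-text body such as an HTTP error page does not.
error_page = b"Not Found"

print(looks_like_gzip(archive))
print(looks_like_gzip(error_page))
```

If the downloaded file fails this check, the download URL built from the version tag is the first thing to look at.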

Is there a way to delete a document by `document_id` AND `document_type`?

How should I extend the `where` filter in the following delete call so that it requires a match on both document_id and document_type, not just document_id?

    """
    Delete a document from weaviate
    """
    result = client.batch.delete_objects(
        class_name=INDEX_NAME,
        where={
            "operator": "Equal",
            "path": ["document_id"],
            "valueString": delete_request.document_id,
        },
    )

Note that I was already able to add document_type, and it works fine for upsert and query.

main.py with document_type

from typing import List
from fastapi import Depends, FastAPI, HTTPException
from contextlib import asynccontextmanager

from .database import get_client, init_db, INDEX_NAME
from .embedding import get_embedding
from pydantic import BaseModel

from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware
import os

from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

bearer_scheme = HTTPBearer()
BEARER_TOKEN = os.environ.get("BEARER_TOKEN")


def validate_token(credentials: HTTPAuthorizationCredentials = Depends(bearer_scheme)):
    if credentials.scheme != "Bearer" or credentials.credentials != BEARER_TOKEN:
        raise HTTPException(status_code=401, detail="Invalid or missing token")
    return credentials


class Document(BaseModel):
    text: str
    document_id: str
    document_type: str


class Query(BaseModel):
    text: str
    document_type: str
    limit: int = 5


class QueryResult(BaseModel):
    document: Document
    score: float


class DeleteRequest(BaseModel):
    document_id: str
    document_type: str


@asynccontextmanager
async def lifespan(app: FastAPI):
    init_db()
    yield


app = FastAPI(lifespan=lifespan)

app.mount("/.well-known", StaticFiles(directory=".well-known"), name="static")

# for localhost deployment
if os.getenv("ENV", "dev") == "dev":
    origins = [
        "http://localhost:8000",
        "https://chat.openai.com",
    ]

    app.add_middleware(
        CORSMiddleware,
        allow_origins=origins,
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )


def get_weaviate_client():
    """
    Get a client to the Weaviate server
    """
    yield get_client()


@app.get("/")
def read_root():
    """
    Say hello to the world
    """
    return {"Hello": "World"}


@app.post("/upsert")
def upsert(
    doc: Document,
    client=Depends(get_weaviate_client),
    token: HTTPAuthorizationCredentials = Depends(validate_token),
):
    """
    Insert a document into weaviate
    """
    with client.batch as batch:
        batch.add_data_object(
            data_object=doc.dict(),
            class_name=INDEX_NAME,
            vector=get_embedding(doc.text),
        )

    return {"status": "ok"}


@app.post("/query", response_model=List[QueryResult])
def query(
    query: Query,
    client=Depends(get_weaviate_client),
    token: HTTPAuthorizationCredentials = Depends(validate_token),
) -> List[QueryResult]:
    """
    Query weaviate for documents
    """
    query_vector = get_embedding(query.text)

    results = (
        client.query.get(INDEX_NAME, ["document_type", "document_id", "text"])
        .with_near_vector({"vector": query_vector})
        .with_limit(query.limit)
        .with_additional("certainty")
        .do()
    )

    docs = results["data"]["Get"][INDEX_NAME]

    return [
        QueryResult(
            document={
                "text": doc["text"],
                "document_id": doc["document_id"],
                "document_type": doc["document_type"],
            },
            score=doc["_additional"]["certainty"],
        )
        for doc in docs
    ]


@app.post("/delete")
def delete(
    delete_request: DeleteRequest,
    client=Depends(get_weaviate_client),
    token: HTTPAuthorizationCredentials = Depends(validate_token),
):
    """
    Delete a document from weaviate
    """
    result = client.batch.delete_objects(
        class_name=INDEX_NAME,
        where={
            "operator": "Equal",
            "path": ["document_id"],
            "valueString": delete_request.document_id,
        },
    )

    if result["results"]["successful"] == 1:
        return {"status": "ok"}
    else:
        return {"status": "not found"}
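
The delete handler above filters on `document_id` only. Weaviate `where` filters can be nested with an `And` operator and a list of `operands`, so one possible way to require both fields is sketched below. The helper names `equal_filter` and `delete_filter` are hypothetical, and `valueString` is kept only because the original snippet uses it; newer Weaviate versions may expect `valueText` instead.

```python
from typing import Any, Dict


def equal_filter(prop: str, value: str) -> Dict[str, Any]:
    """Build a single `Equal` condition on a string property."""
    return {"path": [prop], "operator": "Equal", "valueString": value}


def delete_filter(document_id: str, document_type: str) -> Dict[str, Any]:
    """Combine both conditions so an object must match id AND type."""
    return {
        "operator": "And",
        "operands": [
            equal_filter("document_id", document_id),
            equal_filter("document_type", document_type),
        ],
    }


if __name__ == "__main__":
    where = delete_filter("doc-1", "invoice")
    print(where)
    # In the handler, this dict would replace the single-condition filter:
    # client.batch.delete_objects(class_name=INDEX_NAME, where=where)
```

The same dictionary shape should also be usable with `.with_where(...)` on the query builder, which may be worth adding since the `/query` handler above accepts `document_type` but does not filter on it.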

openapi.yaml with document_type

openapi: 3.0.1
info:
  title: Weaviate Retrieval Plugin
  description: A plugin for querying, inserting and deleting documents using natural language queries
  version: 'v1'
servers:
  - url: https://demo-retrieval-app.fly.dev
paths:
  /upsert:
    post:
      summary: Upsert
      description: Insert a document into weaviate
      operationId: upsert_upsert_post
      requestBody:
        content:
          application/json:
            schema:
              "$ref": "#/components/schemas/Document"
        required: true
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                "$ref": "#/components/schemas/HTTPValidationError"
      security:
      - HTTPBearer: []
  /query:
    post:
      summary: Query
      description: Query weaviate for documents
      operationId: query_query_post
      requestBody:
        content:
          application/json:
            schema:
              "$ref": "#/components/schemas/Query"
        required: true
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                title: Response Query Query Post
                type: array
                items:
                  "$ref": "#/components/schemas/QueryResult"
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                "$ref": "#/components/schemas/HTTPValidationError"
      security:
      - HTTPBearer: []
  /delete:
    post:
      summary: Delete
      description: Delete a document from weaviate
      operationId: delete_delete_post
      requestBody:
        content:
          application/json:
            schema:
              "$ref": "#/components/schemas/DeleteRequest"
        required: true
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                "$ref": "#/components/schemas/HTTPValidationError"
      security:
      - HTTPBearer: []
components:
  schemas:
    DeleteRequest:
      title: DeleteRequest
      required:
      - document_id
      - document_type
      type: object
      properties:
        document_id:
          title: Document Id
          type: string
        document_type:
          title: Document Type
          type: string
    Document:
      title: Document
      required:
      - text
      - document_id
      - document_type
      type: object
      properties:
        text:
          title: Text
          type: string
        document_id:
          title: Document Id
          type: string
        document_type:
          title: Document Type
          type: string
    HTTPValidationError:
      title: HTTPValidationError
      type: object
      properties:
        detail:
          title: Detail
          type: array
          items:
            "$ref": "#/components/schemas/ValidationError"
    Query:
      title: Query
      required:
      - text
      - document_type
      type: object
      properties:
        text:
          title: Text
          type: string
        document_type:
          title: Document Type
          type: string
        limit:
          title: Limit
          type: integer
          default: 5
    QueryResult:
      title: QueryResult
      required:
      - document
      - score
      type: object
      properties:
        document:
          "$ref": "#/components/schemas/Document"
        score:
          title: Score
          type: number
    ValidationError:
      title: ValidationError
      required:
      - loc
      - msg
      - type
      type: object
      properties:
        loc:
          title: Location
          type: array
          items:
            anyOf:
            - type: string
            - type: integer
        msg:
          title: Message
          type: string
        type:
          title: Error Type
          type: string
  securitySchemes:
    HTTPBearer:
      type: http
      scheme: bearer

Seems like there is no weaviate module for Python 3.11 (ModuleNotFoundError: No module named 'weaviate')

I am trying to run this plugin in Docker, but I am encountering some issues. I am not entirely clear on how to install Weaviate on my localhost to experiment with LangChain and Weaviate.

How to reproduce the error:

docker login

poetry env use python3.11
poetry shell

poetry install

docker build -t langchain .
docker run -p 8080:8080 -e PORT=8080 -e WEBSITES_PORT=8080 langchain   

Running `docker run -p 8080:8080 -e PORT=8080 -e WEBSITES_PORT=8080 langchain` fails with the following error:

INFO:     Started server process [7]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.11/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.11/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.11/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fe18688a250>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /v1/.well-known/ready (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe18688a250>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/weaviate/connect/connection.py", line 617, in wait_for_weaviate
    requests.get(ready_url).raise_for_status()
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /v1/.well-known/ready (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe18688a250>: Failed to establish a new connection: [Errno 111] Connection refused'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 677, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.11/contextlib.py", line 204, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/code/server/main.py", line 46, in lifespan
    init_db()
  File "/code/server/database.py", line 28, in init_db
    client = get_client()
             ^^^^^^^^^^^^
  File "/code/server/database.py", line 21, in get_client
    return weaviate.Client(host)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/weaviate/client.py", line 150, in __init__
    self._connection = Connection(
                       ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/weaviate/connect/connection.py", line 639, in __init__
    super().__init__(
  File "/usr/local/lib/python3.11/site-packages/weaviate/connect/connection.py", line 151, in __init__
    self.wait_for_weaviate(startup_period)
  File "/usr/local/lib/python3.11/site-packages/weaviate/connect/connection.py", line 620, in wait_for_weaviate
    raise WeaviateStartUpError(
weaviate.exceptions.WeaviateStartUpError: Weaviate did not start up in 5 seconds. Either the Weaviate URL http://localhost:8080 is wrong or Weaviate did not start up in the interval given in 'startup_period'.

ERROR:    Application startup failed. Exiting.

Thank you!
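
The traceback above loads `weaviate` from `site-packages`, so the module is installed in the image. What fails is the connection: the client polls `http://localhost:8080/v1/.well-known/ready` and is refused. Inside a container, `localhost` refers to the container itself, so a Weaviate instance has to be reachable at some other address. A minimal sketch of reading that address from the environment and probing the readiness endpoint before creating the client follows. The variable name `WEAVIATE_HOST` and the helper names are assumptions, not something the traceback confirms.

```python
import os
import urllib.request

DEFAULT_URL = "http://localhost:8080"


def weaviate_url(env=None) -> str:
    """Read the Weaviate base URL from the environment (variable name assumed)."""
    env = os.environ if env is None else env
    return env.get("WEAVIATE_HOST", DEFAULT_URL).rstrip("/")


def ready_url(base_url: str) -> str:
    """Build the readiness endpoint that the traceback shows being polled."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False


if __name__ == "__main__":
    url = weaviate_url()
    print(url, "ready" if is_ready(url) else "not reachable")
```

With something like this in place, the container could be started with `-e WEAVIATE_HOST=http://weaviate:8080`, where `weaviate` is a hypothetical name of a Weaviate container on the same Docker network.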

VS Code as a requirement?

Maybe I'm old-fashioned, but since when is the choice of IDE/editor something a codebase itself should be involved in?
