
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

Home Page: https://www.marqo.ai/

License: Apache License 2.0


Marqo's Introduction

Website | Documentation | Demos | Discourse | Slack Community | Marqo Cloud

Marqo

Marqo is more than a vector database, it's an end-to-end vector search engine for both text and images. Vector generation, storage and retrieval are handled out of the box through a single API. No need to bring your own embeddings.

Why Marqo?

Vector similarity alone is not enough for vector search. Vector search requires more than a vector database - it also requires machine learning (ML) deployment and management, preprocessing and transformations of inputs as well as the ability to modify search behavior without retraining a model. Marqo contains all these pieces, enabling developers to build vector search into their application with minimal effort. A full list of features can be found below.

Why bundle embedding generation with vector search?

Vector databases are specialized components for vector similarity and only service one component of a vector search system. They are “vectors in - vectors out”. They still require the production of vectors, management of the ML models, associated orchestration and processing of the inputs. Marqo makes this easy by being “documents in, documents out”. Preprocessing of text and images, embedding the content, storing meta-data and deployment of inference and storage is all taken care of by Marqo.

Quick start

Here is a code snippet for a minimal example of vector search with Marqo (see Getting Started):

  1. Use Docker to run Marqo:
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest

Note: If your Marqo container keeps getting killed, this is most likely due to insufficient memory being allocated to Docker. Increasing Docker's memory limit to at least 6GB (8GB recommended) in your Docker settings may fix the problem.

  2. Install the Marqo client:
pip install marqo
  3. Start indexing and searching! Let's look at a simple example below:
import marqo

mq = marqo.Client(url='http://localhost:8882')

mq.create_index("my-first-index")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    }, 
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }],
    tensor_fields=["Description"]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)

✨ Core Features

🤖 State of the art embeddings

  • Use the latest machine learning models from PyTorch, Huggingface, OpenAI and more.
  • Start with a pre-configured model or bring your own.
  • Built-in ONNX support and conversion for faster inference and higher throughput.
  • CPU and GPU support.

⚡ Performance

  • Embeddings stored in in-memory HNSW indexes, achieving cutting-edge search speeds.
  • Scale to hundred-million document indexes with horizontal index sharding.
  • Async and non-blocking data upload and search.

🌌 Documents-in-documents-out

  • Vector generation, storage, and retrieval are provided out of the box.
  • Build search, entity resolution, and data exploration applications using your text and images.
  • Build complex semantic queries by combining weighted search terms.
  • Filter search results using Marqo’s query DSL.
  • Store unstructured data and semi-structured metadata together in documents, using a range of supported datatypes like bools, ints and keywords.
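To illustrate what a metadata filter does conceptually, here is a simplified, purely client-side sketch. Marqo's actual query DSL (e.g. field:value expressions) is evaluated inside the engine; the helper below is hypothetical and only shows the idea of narrowing ranked hits by a metadata field.

```python
def filter_hits(hits, field, value):
    """Keep only hits whose metadata field equals the given value."""
    return [h for h in hits if h.get(field) == value]

hits = [
    {"Title": "A", "genre": "fiction", "_score": 0.9},
    {"Title": "B", "genre": "history", "_score": 0.8},
]
print(filter_hits(hits, "genre", "fiction"))  # only document "A" remains
```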

🍱 Managed cloud

  • Low latency optimised deployment of Marqo.
  • Scale inference at the click of a button.
  • High availability.
  • 24/7 support.
  • Access control.
  • Learn more here.

Integrations

Marqo is integrated into popular AI and data processing frameworks, with more on the way.

💙 Haystack

Haystack is an open-source framework for building applications that make use of NLP technology such as LLMs, embedding models and more. This integration allows you to use Marqo as your Document Store for Haystack pipelines such as retrieval-augmentation, question answering, document search and more.

🛹 Griptape

Griptape enables safe and reliable deployment of LLM-based agents for enterprise applications, the MarqoVectorStoreDriver gives these agents access to scalable search with your own data. This integration lets you leverage open source or custom fine-tuned models through Marqo to deliver relevant results to your LLMs.

🦜🔗 Langchain

This integration lets you leverage open source or custom fine tuned models through Marqo for LangChain applications with a vector search component. The Marqo vector store implementation can plug into existing chains such as the Retrieval QA and Conversational Retrieval QA.

Hamilton

This integration lets you leverage open source or custom fine tuned models through Marqo for Hamilton LLM applications.

Learn more about Marqo

📗 Quick start Build your first application with Marqo in under 5 minutes.
🖼 Marqo for image data Building advanced image search with Marqo.
📚 Marqo for text Building a multilingual database in Marqo.
🔮 Integrating Marqo with GPT Making GPT a subject matter expert by using Marqo as a knowledge base.
🎨 Marqo for Creative AI Combining stable diffusion with semantic search to generate and categorise 100k images of hotdogs.
🔊 Marqo and Speech Data Add diarisation and transcription to preprocess audio for Q&A with Marqo and ChatGPT.
🚫 Marqo for content moderation Building advanced image search with Marqo to find and remove content.
☁️ Getting started with Marqo Cloud Go through how to get set up and running with Marqo Cloud, starting from your first-time login through to building your first application with Marqo.
👗 Marqo for e-commerce This project is a web application with frontend and backend using Python, Flask, ReactJS, and TypeScript. The frontend is a ReactJS application that makes requests to the backend, which is a Flask application. The backend makes requests to your Marqo cloud API.
🤖 Marqo chatbot In this guide we will build a chat bot application using Marqo and OpenAI's ChatGPT API. We will start with an existing code base and then walk through how to customise the behaviour.
🦾 Features Marqo's core features.

Getting started

  1. Marqo requires Docker. To install Docker, go to the official Docker website. Ensure that Docker has at least 8GB memory and 50GB storage.

  2. Use Docker to run Marqo:

docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -p 8882:8882 marqoai/marqo:latest

Note: If your Marqo container keeps getting killed, this is most likely due to insufficient memory being allocated to Docker. Increasing Docker's memory limit to at least 6GB (8GB recommended) in your Docker settings may fix the problem.

  3. Install the Marqo client:
pip install marqo
  4. Start indexing and searching! Let's look at a simple example below:
import marqo

mq = marqo.Client(url='http://localhost:8882')

mq.create_index("my-first-index")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    }, 
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }],
    tensor_fields=["Description"]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)
  • mq is the client that wraps the marqo API.
  • create_index() creates a new index with default settings. You have the option to specify what model to use. For example, mq.create_index("my-first-index", model="hf/all_datasets_v4_MiniLM-L6") will create an index with the default text model hf/all_datasets_v4_MiniLM-L6. Experimentation with different models is often required to achieve the best retrieval for your specific use case. Different models also offer a tradeoff between inference speed and relevancy. See here for the full list of models.
  • add_documents() takes a list of documents, represented as python dicts for indexing. tensor_fields refers to the fields that will be indexed as vector collections and made searchable.
  • You can optionally set a document's ID with the special _id field. Otherwise, Marqo will generate one.

Let's have a look at the results:

# let's print out the results:
import pprint
pprint.pprint(results)

{
    'hits': [
        {
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and'
                           'communications for astronauts',
            '_highlights': [{
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            }],
            '_id': 'article_591',
            '_score': 0.61938936
        },
        {
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': [{'Title': 'The Travels of Marco Polo'}],
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 0.60237324
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}
  • Each hit corresponds to a document that matched the search query.
  • They are ordered from most to least matching.
  • limit is the maximum number of hits to be returned. This can be set as a parameter during search.
  • Each hit has a _highlights field. This was the part of the document that matched the query the best.
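Conceptually, limit caps the ranked result list: hits are ordered by score and truncated. A minimal sketch of that ranking-and-truncation step (illustrative only, not Marqo internals):

```python
def top_hits(hits, limit=10):
    """Order hits from most to least matching and truncate to `limit`."""
    return sorted(hits, key=lambda h: h["_score"], reverse=True)[:limit]

hits = [{"_id": "a", "_score": 0.60}, {"_id": "b", "_score": 0.62}]
print([h["_id"] for h in top_hits(hits, limit=1)])  # ['b'] — highest score wins
```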

Other basic operations

Get document

Retrieve a document by ID.

result = mq.index("my-first-index").get_document(document_id="article_591")

Note that adding a document with add_documents again, using the same _id, will update the existing document.
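This upsert behaviour can be pictured as a keyed store. The sketch below is illustrative only, not Marqo's implementation:

```python
# Illustrative: documents keyed by _id, so re-adding the same _id replaces.
store = {}

def add_documents(docs):
    for doc in docs:
        store[doc["_id"]] = doc  # same _id overwrites the previous version

add_documents([{"_id": "article_591", "Title": "EMU"}])
add_documents([{"_id": "article_591", "Title": "EMU (updated)"}])
print(store["article_591"]["Title"])  # "EMU (updated)" — one document, updated in place
```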

Get index stats

Get information about an index.

results = mq.index("my-first-index").get_stats()

Lexical search

Perform a keyword search.

result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)

Multi modal and cross modal search

To power image and text search, Marqo allows users to plug and play with CLIP models from HuggingFace. Note that if you do not configure multimodal search, image URLs will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:

settings = {
    "treat_urls_and_pointers_as_images":True,   # allows us to find an image file and index it 
    "model":"ViT-L/14"
}
response = mq.create_index("my-multimodal-index", **settings)

Images can then be added within documents as follows. You can use urls from the internet (for example S3) or from the disk of the machine:

response = mq.index("my-multimodal-index").add_documents([{
    "My_Image": "https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_realistic.png",
    "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
    "_id": "hippo-facts"
}], tensor_fields=["My_Image"])

You can then search the image field using text.

results = mq.index("my-multimodal-index").search('animal')

Searching using an image

Searching using an image can be achieved by providing the image link.

results = mq.index("my-multimodal-index").search('https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_statue.png')

Searching using weights in queries

Queries can also be provided as dictionaries where each key is a query and its corresponding value is a weight. This allows for more advanced queries consisting of multiple components with weightings towards or against them; queries can have negations via negative weighting.

The example below shows the application of this to a scenario where a user may want to ask a question but also negate results that match a certain semantic criterion.

import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

mq.create_index("my-weighted-query-index")

mq.index("my-weighted-query-index").add_documents(
    [
        {
            "Title": "Smartphone",
            "Description": "A smartphone is a portable computer device that combines mobile telephone "
            "functions and computing functions into one unit.",
        },
        {
            "Title": "Telephone",
            "Description": "A telephone is a telecommunications device that permits two or more users to"
            "conduct a conversation when they are too far apart to be easily heard directly.",
        },
        {
            "Title": "Thylacine",
            "Description": "The thylacine, also commonly known as the Tasmanian tiger or Tasmanian wolf, "
            "is an extinct carnivorous marsupial."
            "The last known of its species died in 1936.",
        }
    ],
    tensor_fields=["Description"]
)

# initially we ask for a type of communications device which is popular in the 21st century
query = {
    # a weighting of 1.1 gives this query slightly more importance
    "I need to buy a communications device, what should I get?": 1.1,
    # a weighting of 1 gives this query a neutral importance
    # this will lead to 'Smartphone' being the top result
    "The device should work like an intelligent computer.": 1.0,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("Query 1:")
pprint.pprint(results)

# now we ask for a type of communications which predates the 21st century
query = {
    # a weighting of 1 gives this query a neutral importance
    "I need to buy a communications device, what should I get?": 1.0,
    # a weighting of -1 gives this query a negation effect
    # this will lead to 'Telephone' being the top result
    "The device should work like an intelligent computer.": -0.3,
}

results = mq.index("my-weighted-query-index").search(q=query)

print("\nQuery 2:")
pprint.pprint(results)
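Conceptually, a weighted query behaves like a weighted sum of the individual query embeddings, with negative weights pushing results away from that component. A toy sketch with 3-dimensional vectors (illustrative only; real embeddings come from the index's model):

```python
def combine_query(embeddings, weights):
    """Weighted sum of query embeddings; negative weights act as negations."""
    dims = len(next(iter(embeddings.values())))
    combined = [0.0] * dims
    for text, vec in embeddings.items():
        w = weights[text]
        combined = [c + w * v for c, v in zip(combined, vec)]
    return combined

# Toy unit vectors standing in for model embeddings
embeddings = {"communications device": [1.0, 0.0, 0.0],
              "intelligent computer": [0.0, 1.0, 0.0]}
weights = {"communications device": 1.0, "intelligent computer": -0.3}
print(combine_query(embeddings, weights))  # [1.0, -0.3, 0.0]
```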

Creating and searching indexes with multimodal combination fields

Marqo lets you have indexes with multimodal combination fields. Multimodal combination fields can combine text and images into one field. This allows scoring of documents across the combined text and image fields together. It also allows for a single vector representation instead of needing many, which saves on storage. The relative weighting of each component can be set per document.

The example below demonstrates this with retrieval of caption and image pairs using multiple types of queries.

import marqo
import pprint

mq = marqo.Client(url="http://localhost:8882")

settings = {"treat_urls_and_pointers_as_images": True, "model": "ViT-L/14"}

mq.create_index("my-first-multimodal-index", **settings)

mq.index("my-first-multimodal-index").add_documents(
    [
        {
            "Title": "Flying Plane",
            "caption": "An image of a passenger plane flying in front of the moon.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
        },
        {
            "Title": "Red Bus",
            "caption": "A red double decker London bus traveling to Aldwych",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg",
        },
        {
            "Title": "Horse Jumping",
            "caption": "A person riding a horse over a jump in a competition.",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image1.jpg",
        },
    ],
    # Create the mappings, here we define our captioned_image mapping 
    # which weights the image more heavily than the caption - these pairs 
    # will be represented by a single vector in the index
    mappings={
        "captioned_image": {
            "type": "multimodal_combination",
            "weights": {
                "caption": 0.3,
                "image": 0.7
            }
        }
    },
    # We specify which fields to create vectors for. 
    # Note that captioned_image is treated as a single field.
    tensor_fields=["captioned_image"]
)

# Search this index with a simple text query
results = mq.index("my-first-multimodal-index").search(
    q="Give me some images of vehicles and modes of transport. I am especially interested in air travel and commercial aeroplanes."
)

print("Query 1:")
pprint.pprint(results)

# search the index with a query that uses weighted components
results = mq.index("my-first-multimodal-index").search(
    q={
        "What are some vehicles and modes of transport?": 1.0,
        "Aeroplanes and other things that fly": -1.0
    },
)
print("\nQuery 2:")
pprint.pprint(results)

results = mq.index("my-first-multimodal-index").search(
    q={"Animals of the Perissodactyla order": -1.0}
)
print("\nQuery 3:")
pprint.pprint(results)

Delete documents

Delete documents.

results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])

Delete index

Delete an index.

results = mq.index("my-first-index").delete()

Running Marqo open source in production

We support Kubernetes templates for Marqo which you can deploy on a cloud provider of your choice. Marqo's Kubernetes implementation allows you to deploy clusters with replicas, multiple storage shards and multiple inference nodes. The repo can be found here: https://github.com/marqo-ai/marqo-on-kubernetes

If you're looking for a fully managed cloud service, you can sign up for Marqo Cloud here: https://cloud.marqo.ai.

Documentation

The full documentation for Marqo can be found here https://docs.marqo.ai/.

Warning

Note that you should not run other applications on Marqo's Vespa cluster as Marqo automatically changes and adapts the settings on the cluster.

Contributors

Marqo is a community project with the goal of making tensor search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.

Dev setup

  1. Create a virtual env python -m venv ./venv.

  2. Activate the virtual environment source ./venv/bin/activate.

  3. Install requirements from the requirements file: pip install -r requirements.txt.

  4. Run tests with tox: cd into this directory and then run tox.

  5. If you update dependencies, make sure to delete the .tox directory and rerun tox.

Merge instructions:

  1. Run the full test suite (by using the command tox in this dir).

  2. Create a pull request with an attached github issue.

Support

  • Ask questions and share your creations with the community on our Discourse forum.
  • Join our Slack community and chat with other community members about ideas.

Marqo's People

Contributors

alanzty, ali60, amegag, aryanagarwal9, danyilq, dmyzlata, edmuthiah, farshidz, georgewritescode, iain-mackie, jeadie, jn2clark, jonanoribe, michalluck, namit343, ommirzaei, owenpendrighelliott, pandu-k, rym-oualha, skeltsyboiii, theonlywayup, tomhamer, tuanacelik, vicilliar, vitusacabado, vladdoster, wanlialex


Marqo's Issues

[ENHANCEMENT] better summary of devices and models running at Marqo startup

Is your feature request related to a problem? Please describe.
When Marqo starts up, it checks the available devices and outputs a summary. It also loads some models and tests them on the devices. It is a bit hard to tell from the logs whether something was successful. An improvement would be a better summary of the devices and of the success of running models on them.

Describe the solution you'd like
On start up, the results from the devices and the models are displayed in a single table.

Describe alternatives you've considered
Nothing really

Additional context
Here is the place https://github.com/marqo-ai/marqo/blob/mainline/src/marqo/tensor_search/on_start_script.py#L88-L112

remote cluster in marqo config

Due to macOS security, localhost doesn't work for me, so I'm using 127.0.0.1, but Marqo considers it a remote cluster.

code from config.py

lowered_url = url.lower()
if "localhost" in lowered_url or "0.0.0.0" in lowered_url:
    urllib3.disable_warnings()
    self.cluster_is_remote = False

We should consider including 127.0.0.1

local_url_list = ["localhost", "0.0.0.0", "127.0.0.1"]
if [local_url for local_url in local_url_list if local_url in lowered_url]:
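A slightly more idiomatic version of the suggested check, using any() instead of building a throwaway list (a sketch of the proposed fix, not the merged code):

```python
LOCAL_HOSTS = ["localhost", "0.0.0.0", "127.0.0.1"]

def is_local_cluster(url: str) -> bool:
    """Treat the cluster as local if the URL contains a known local host."""
    lowered_url = url.lower()
    return any(host in lowered_url for host in LOCAL_HOSTS)

print(is_local_cluster("http://127.0.0.1:8882"))          # True
print(is_local_cluster("https://my-cluster.example.com"))  # False
```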

Tensor prefiltering not working for fields with spaces [BUG]

Describe the bug
Tensor prefiltering not working for fields with spaces. When a filter is applied to a field with a space, no documents are retrieved.

To Reproduce
Steps to reproduce the behavior:

  1. Have Marqo running
  2. Index some documents. One should have a field with a space in it:
curl -XPOST  'http://localhost:8882/indexes/my-irst-ix/documents?refresh=true&device=cpu' -H 'Content-type:application/json' -d '
[ 
    {
        "Title": "Honey is a delectable food stuff", 
        "Desc" : "some boring description",
        "_id": "honey_facts_119",
        "gapped field": "wololo"
    }, {
        "Title": "Space exploration",
        "Desc": "mooooon! Space!!!!",
        "_id": "moon_fact_145"
    }
]'

Doing a filtered lexical search works (document id: "honey_facts_119" is retrieved):

curl -XPOST  'http://localhost:8882/indexes/my-irst-ix/search?device=cpu' -H 'Content-type:application/json' -d '{
    "q": "what do bears eat?",
    "searchMethod": "LEXICAL",
    "filter": "gapped\\ field:wololo"
}'

But doing a filtered tensor search doesn't (document id: "honey_facts_119" isn't retrieved):

curl -XPOST  'http://localhost:8882/indexes/my-irst-ix/search?device=cpu' -H 'Content-type:application/json' -d '{
    "q": "what do bears eat?",
    "searchMethod": "TENSOR",
    "filter": "gapped\\ field:wololo"
}'

Expected behavior
When a filter is applied to a field with a space, the documents should be retrieved, in the same way they are for lexical search.

Desktop (please complete the following information):

  • OS: Ubuntu amd64
  • Version 0.0.3

Additional context
Add any other context about the problem here.

[ENHANCEMENT] read .svg files

Is your feature request related to a problem? Please describe.
Yes - only images (png, bmp, jpg etc) that are natively supported by PIL can be read.

Describe the solution you'd like
svg files can be read directly for indexing or searching

[BUG] not receiving the desired results

Describe the bug
I took the code from the README and changed the first dict inside the list:

    {
        "Title": "Indian culture",
        "Description": '''What, one wonders, is the lowest common denominator of Indian culture today? The attractive Hema Malini? The songs of Vinidh Barati? The attractive Hema Malini? The sons of Vinidh Barati?
 Or the mouth-watering Masala Dosa? Delectable as these may be, each yield pride of place to that false (?) symbol of a new era-the synthetic fibre. In less than twenty years the nylon sari and the terylene shirt have swept the countryside, penetrated to the farthest corners of the land and persuaded every common man, woman and child that the key to success in the present-day world lie in artificial fibers: glass nylon, crepe nylon, tery mixes, polyesters and what have you. More than the bicycles, the wristwatch or the transistor radio, synthetic clothes have come to represent the first step away form the village square. The village lass treasures the flashy nylon sari in her trousseau most delay; the village youth gets a great kick out of his cheap terrycot shirt and trousers, the nearest he can approximate to the expensive synthetic sported by his wealthy citybred contemporaries. And the Neo-rich craze for ‘phoren’ is nowhere more apparent than in the price that people will pay for smuggled, stolen, begged borrowed second hand or thrown away synthetics. Alas, even the uniformity of nylon. 

'''
    }

and the question I ask is: q="The latest symbol of modernity for the rural people is?"

Now when I revert to the original question, so that I have the same questions as in the README, and run it using the python command, I still get the output for the previous question.

The expected output is the same as in the README!

The system is a Mac M1 and I am running Marqo with Docker.

[BUG] Request Entity Too Large

Describe the bug
I am using the image-to-image search mode. During the add_documents process, Marqo throws an error message which says "HTTPError: 413 Client Error: Request Entity Too Large for url: http://localhost:8882/indexes/duplicate-detection-index/documents?refresh=true&device=cpu". I use 10,000 images to build the index. Is there a limit on the number of images? Or is it something else causing the problem? What can I do to fix this issue? Thanks.
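One common client-side workaround for oversized requests is to send the documents in smaller batches. A minimal chunking sketch (a hypothetical helper, not part of the Marqo client):

```python
def chunked(docs, batch_size):
    """Yield successive batches of at most batch_size documents."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

docs = [{"_id": str(n)} for n in range(10)]
batches = list(chunked(docs, batch_size=3))
print(len(batches))      # 4 batches: 3 + 3 + 3 + 1
print(len(batches[-1]))  # 1
```

Each batch could then be passed to add_documents in its own request, keeping every request under the payload limit.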

Expected behavior
Use the provided 10,000 images to build the index, and find similar images for the query images.


[ENHANCEMENT] support timm models for images

Is your feature request related to a problem? Please describe.
At the moment, only CLIP models are supported. These are good models and work across language and text. However, there are lots of other models in timm that are SOTA in classification and can still provide good embedding. They also span a large range of sizes and architectures so offer good accuracy/latency trade-offs.

Describe the solution you'd like
A new class of timm models can be specified for the "model" field at index creation time.

Describe alternatives you've considered
None

Additional context
https://github.com/rwightman/pytorch-image-models


[BUG] RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Describe the bug
Sometimes during indexing with a GPU the following error can arise

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

To Reproduce
Steps to reproduce the behavior:

client.index(index_name).add_documents(documents, batch_size=50, device='cuda', processes=4)

Expected behavior
It should index all the documents

Desktop (please complete the following information):

  • OS: ubuntu 20.04
    RTX 3090

[BUG] `PIL.UnidentifiedImageError: cannot identify image file` during `index.add_documents(...)`

Describe the bug
When adding image files from a URL I received the above error. To mitigate the issue, which I believe could have occurred due to rate-limiting on my end, I pre-downloaded all images and served them similarly to the apparel demo, i.e. via python3 -m http.server 8222. This also still fails with the above error. As a workaround, I am manually inserting each document one at a time as shown below:

for data_doc in tqdm(data):
    try:
        responses = mq.index(index_name).add_documents([data_doc], device=device)
        # print(f"<SUCCESS>\nAdded prompt:\n{data_doc['prompt']}\nURI: {data_doc['raw_discord_data_image_uri']}\n")
    except Exception:
        print(f"<FAILURE>\nSkipping prompt:\n{data_doc['prompt']}\nURI: {data_doc['raw_discord_data_image_uri']}\n")

Doing it one at a time does not fail for either the direct URL or the pre-downloaded one. I will continue my current flow by pre-downloading. For testing the above is fine, but for my real workload of 10M data points this is a blocker.

To Reproduce
Steps to reproduce the behavior:

marqo_settings = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": True,
        # "image_preprocessing": {
        #     "patch_method": "frcnn"
        # },
        "model":"ViT-B/16",
        "normalize_embeddings":True,
    },
}

Using settings where I include a model for dense retrieval and a URI which may look like this url: https://cdn.discordapp.com/attachments/1005627160410722305/1006718276879003738/rick_and_morty_as_the_thing_fused_with_lovecraft_high_details_intricate_details_renaissance_style_painting_by_vincent_di_fate_artgerm_julie_bel_-H_768_-n_9_-i_-S_687487568_ts-1660090738_idx-3.png
or this internal path: http://host.docker.internal:8222/./artifacts/sample_prompts:v1/sample_prompts/rick_and_morty_as_the_thing_fused_with_lovecraft_high_details_intricate_details_renaissance_style_painting_by_vincent_di_fate_artgerm_julie_bel_-H_768_-n_9_-i_-S_687487568_ts-1660090738_idx-3.png
I get the above behavior consistently.

This behavior comes from /app/src/marqo/s2_inference/clip_utils.py which errs at Image.open(requests.get(image, stream=True).raw)

Expected behavior
Instead of erring the whole insert, I would like an option to ignore errors and be told via logs, stdout, and a response that problematic data points were not inserted at the very least.

Desktop (please complete the following information):
Using Google's Vertex Workbench

  • OS: Debian 10
  • Environment: Python 3 configured for CUDA 11.0 and Intel MKL
  • Machine Type: n1-standard-4 (4 vCPUs, 15GB RAM)

Show number of chunks and documents an index has in the stats endpoint [ENHANCEMENT]

Is your feature request related to a problem? Please describe.
I would like to know the sum of the chunks and documents found in an index.

Describe the solution you'd like
When calling the stats endpoint for an index, I'd like to be shown the total number of chunks and documents in that index.

Describe alternatives you've considered
There is no real way to do this currently.
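As a sketch of what the stats endpoint could aggregate (the field names here are assumptions, not the actual response schema):

```python
def index_stats(chunks_per_document):
    """chunks_per_document: one chunk count per indexed document."""
    return {
        "numberOfDocuments": len(chunks_per_document),
        "numberOfChunks": sum(chunks_per_document),
    }
```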


Unsupported Docker Images for M1 (arm64)

I have an M1 MacBook Pro running macOS Monterey v12.5. Crashing behaviour is being caused when I try to install Marqo through the following Docker commands:

  • docker rm -f marqo
  • DOCKER_BUILDKIT=1 docker build . -t marqo_docker_0
  • docker run --name marqo --privileged -p 8000:8000 --add-host host.docker.internal:host-gateway marqo_docker_0

Error messages:

  • Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested Starting supervisor starting dockerd command dockerd command complete Waiting for processes to be running Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? Process dockerd is not running yet. Retrying in 1 seconds

  • failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.4 (legacy): can't initialize iptables table 'nat': iptables who? (do you need to insmod?) Perhaps iptables or your kernel needs to be upgraded.

The first error message originates from the run_marqo.sh file, and the second one is a docker error message.

Pagination [ENHANCEMENT]

Is your feature request related to a problem? Please describe.
It's inconvenient to only have a single page of results for search queries. For certain use cases, it would be great for end users to scroll through pages of results.

Describe the solution you'd like
Implement an offset parameter for search functions. Together, the limit and offset parameters enable scrolling through result pages; offset and limit can be mapped to the backend OpenSearch pagination parameters from and size.
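The proposed mapping can be sketched as follows (a hypothetical helper, not existing Marqo code):

```python
def to_opensearch_pagination(offset, limit):
    """Map the proposed search parameters onto OpenSearch's from/size."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    return {"from": offset, "size": limit}
```

For example, page 3 with 10 results per page would be to_opensearch_pagination(offset=20, limit=10).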

Describe alternatives you've considered
Returning a large number of documents and then manually implementing a client-side scroll feature. This is a lot of work, and it means a larger memory overhead for clients.


[ENHANCEMENT] Guide for running Marqo on GCP

Is your feature request related to a problem? Please describe.
We should have a guide for users to run Marqo on GCP, similar to the AWS guide that we are currently adding.

Describe the solution you'd like
Any user should be able to read the guide and follow best practices to set up Marqo on GCP

Describe alternatives you've considered
none

Additional context
none

[ENHANCEMENT] Remove the tokenisers parallelism

Is your feature request related to a problem? Please describe.
Batching with multiple processes for a Hugging Face based model causes the tokenizer to fall back to non-multiprocessing to avoid deadlocks. Given the nature of the inference being done, it would be fine to either turn off the parallelism in the tokenizer or switch to the Python-based one. There is no degradation in performance, just a constant warning message.

Describe the solution you'd like
Set an environment variable to turn off the parallelism in the HF tokenizer, or default to the Python-based one.

Additional context
Setting the environment variable

TOKENIZERS_PARALLELISM=false

somewhere in the startup script would probably work: https://github.com/marqo-ai/marqo/blob/mainline/run_marqo.sh
Alternatively, the Python-based tokenizer could be used for all HF-based models.
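If the fix is done in Python rather than the startup script, a sketch would be (note the variable must be set before the tokenizers library is first imported):

```python
import os

# Silence the fork-after-parallelism warning from Hugging Face tokenizers.
# setdefault keeps any value an operator has already exported.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
```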

Docker not installing

I am having trouble installing Docker on my machine; it tells me that I need Windows 10 Pro or later.

image

[ENHANCEMENT] Run Marqo on Google colab

Is your feature request related to a problem? Please describe.
A demonstration of Marqo running on Google Colab.

Describe the solution you'd like
Marqo running in Google Colab.

Additional context
A good start might be to reproduce the README examples in Google Colab.

[BUG] Setup Marqo with Docker Compose

Describe the bug
When testing a basic example with Marqo on a project set up with Docker Compose, the requests module fails with the following error:
requests.exceptions.InvalidSchema: No connection adapters were found for '"http://admin:admin@opensearch-node1:9200"/marqo-simplewiki-demo-all'

To Reproduce

  1. Use the following Docker Compose file to set up the infra.
version: "3.7"
services:

  # Official Marqo Build
  marqo-rt:
    image: marqoai/marqo:0.0.4
    container_name: marqo-rt
    environment:
      - OPENSEARCH_URL="http://opensearch-node1:9200"
    ports:
      - 8882:8882
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - opensearch-net
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

  opensearch-node1:
    image: opensearchproject/opensearch:2.3.0
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
      - "DISABLE_INSTALL_DEMO_CONFIG=true" # disables execution of install_demo_configuration.sh bundled with security plugin, which installs demo certificates and security configurations to OpenSearch
      - "DISABLE_SECURITY_PLUGIN=true" # disables security plugin entirely in OpenSearch by setting plugins.security.disabled: true in opensearch.yml
      - "discovery.type=single-node" # disables bootstrap checks that are enabled when network.host is set to a non-loopback address
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.3.0
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      - 'OPENSEARCH_HOSTS=["http://opensearch-node1:9200"]'
      - "DISABLE_SECURITY_DASHBOARDS_PLUGIN=true" # disables security dashboards plugin in OpenSearch Dashboards
    networks:
      - opensearch-net

volumes:
  opensearch-data1:

networks:
  opensearch-net:
  1. Try to run the simplewiki example against that instance.

Expected behavior
Example should work as advertised.

Desktop (please complete the following information):

  • OS: Ubuntu 22.04 with NVidia GeForce 2070 Super (properly recognized by nvidia-smi)

Additional context
I'm aware the OpenSearch setup is different (I'm using the default from their website) but it's not significantly different from the one provided by Marqo. The failure I see is within the Marqo container, before any API call makes its way to opensearch.
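The error string shows the quote characters from the compose file surviving inside the URL value, which would explain why requests finds no connection adapter. A small demonstration of this likely cause (an assumption based on the error message):

```python
from urllib.parse import urlparse

# The value as it arrives when the compose file quotes it inside a list-style env entry.
raw = '"http://admin:admin@opensearch-node1:9200"'

# The leading quote prevents scheme detection, so requests sees no http:// adapter.
assert urlparse(raw).scheme == ""

# Stripping the quotes (or removing them from the compose file) restores the scheme.
assert urlparse(raw.strip('"')).scheme == "http"
```

If this is the cause, writing the env entry without quotes (OPENSEARCH_URL=http://opensearch-node1:9200) should avoid it.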

onnx requires cmake to be installed

I've tried to install and test marqo locally, but it wasn't able to install the onnx package because the build requires cmake, as in the linked issue.

This is easily fixed by just running pip install cmake, but it might be a bit confusing.

Should we include it in the README, or try to add cmake as a required package?

[ENHANCEMENT] read pdfs, txt, csv files from pointers

Is your feature request related to a problem? Please describe.
Yes - it would be good to support reading csv, txt, and pdf files from pointers (not scanned documents, though). The text would be read directly (no OCR).

Describe the solution you'd like
Have a reader in the same way we do for images, so that a pointer to a file means it can be read.

Describe alternatives you've considered
The alternative is for the user to do this processing before Marqo. That will always be an option, but for less complex use cases built-in support would be very convenient.


incorrect results when using image from local disk

I am currently creating a demo application for marqo showcasing the multi-modal index feature.

The dataset I used is a bunch of clothing apparel (shirts, shorts, shoes, hats, etc), which can be found here:
Clothing Dataset

I was able to load the images from local disk by using the docker command:
docker run --name marqo --mount type=bind,source=/user/someone/images/,target=/user/someone/images/ --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:0.0.1

where I replaced the source directory /user/someone/images/ to the directory where the images are located,
and the target directory /user/someone/images/ to the directory where I want to save uploaded files.

When searching images posted via web URL, marqo works as intended.
Image link used: https://d1mcl5z4l1p8tu.cloudfront.net/media/catalog/product/cache/73f803a782a839317b5e9918c11efa7e/c/o/corneliani85g571-0125050-007-4.jpg

image

However when uploading the same picture from local disk, it returns incorrect results:
image

Query used for local directory:
'C:\Users\Vitus\Documents\Work\marqo\demo\corneliani85g571-0125050-007-4.jpg\'
(am using Windows)

PUT Documents - delete fields [ENHANCEMENT]

Is your feature request related to a problem? Please describe.
There is no way to delete a field in an existing doc using the PUT /documents call.

Describe the solution you'd like
A way to delete a field in an existing doc using the PUT /documents endpoint.

Describe alternatives you've considered
The best alternative is to re-add the whole document using the POST /documents endpoint, but this can be expensive.
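A sketch of the desired semantics (the delete_fields parameter name is hypothetical):

```python
def put_document(existing, update, delete_fields=()):
    """Merge `update` into the stored doc, then drop any fields named in delete_fields."""
    doc = {**existing, **update}
    for field in delete_fields:
        doc.pop(field, None)  # absent fields are ignored
    return doc
```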

[ENHANCEMENT] End-to-end demos reproduced in Jupyter notebooks

Is your feature request related to a problem? Please describe.
The end-to-end examples are Python scripts; it would be good to also have notebook versions. This would allow easier display of images, in particular for the examples.

Describe the solution you'd like
Jupyter notebook versions of the demos

[ENHANCEMENT] Digital ocean one click deploy

Is your feature request related to a problem? Please describe.
We should have a DigitalOcean "one click deploy" for users to run Marqo on DigitalOcean, which automatically sets up a GPU instance with adequate storage and memory, along with a guide that details to users the cost and tradeoffs of the different options.

Describe the solution you'd like
Any user should be able to read the guide and easily set up Marqo on DigitalOcean using the "one click" setup.

Describe alternatives you've considered
none

Additional context
none

[BUG] highlights return types are different for different search methods

Describe the bug
The results _highlights field has a different return type for the different search methods: LEXICAL returns an empty list while TENSOR returns a dict.
Accessed via results['hits'][0]['_highlights'].
To Reproduce
Steps to reproduce the behavior:

  1. Install marqo per readme
import marqo as mq
client = mq.Client()
client.index("my-first-index").add_documents([{'text':'something'}])
client.index("my-first-index").search('something') # _highlights type is a dict
client.index("my-first-index").search('something', search_method='LEXICAL') # _highlights type is a list

Expected behavior
The return types are the same.

Screenshots
image

Desktop (please complete the following information):

  • OS: Ubuntu 20.04

first search is always 5 seconds longer than the following

I'm not sure if it is fixable, but processingTimeMs for the first search after initialisation of the client is always 5.2+ seconds, while all of the following searches, whether against the same index or any other, are around 100 ms.

[ENHANCEMENT] Patch items

Is your feature request related to a problem? Please describe.
We should be able to update a single field (for example some metadata) without having to reindex all the document data.

Describe the solution you'd like
We should have a patch operation for documents where if the user provides just some of the fields, those fields are added/updated as per the provided ID.

Describe alternatives you've considered
We could just rely on updates, but for some use cases users need to update something like 10k documents when they only need to change a metadata field, so redoing all the transformations is a big waste of compute.
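A sketch of the desired patch semantics: only the provided fields change, and only changes to embedded (text/image) fields would force re-embedding. Names here are hypothetical, not Marqo's API:

```python
def patch_document(stored, patch, embedding_fields):
    """Merge `patch` into `stored`; report which changed fields need re-embedding."""
    needs_reembedding = sorted(f for f in patch if f in embedding_fields)
    return {**stored, **patch}, needs_reembedding
```

A metadata-only patch comes back with an empty needs_reembedding list, so no inference is wasted.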


[BUG] UnicodeDecodeError while reading simplewiki.json

Describe the bug
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 34260: character maps to <undefined>
Python's json is unable to read simplewiki dataset.
Python Version: 3.9.13

Console Output

(venv) D:\Codes\marqo-wiki\src>python simple_wiki_demo.py
Traceback (most recent call last):
 File "D:\Codes\marqo-wiki\src\simple_wiki_demo.py", line 34, in <module>
  data = read_json(dataset_file)
 File "D:\Codes\marqo-wiki\src\simple_wiki_demo.py", line 15, in read_json
  data = json.load(f)
 File "C:\Users\anubh\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 293, in load
  return loads(fp.read(),
 File "C:\Users\anubh\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
  return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 34260: character maps to <undefined>

To Reproduce
Steps to reproduce the behavior:

  1. Go to simplewiki example
  2. Run simple_wiki_demo.py using python simple_wiki_demo.py
  3. See error

Expected behavior
json.load is supposed to load the data without any errors.

Desktop (please complete the following information):

  • OS: Windows 10 Home

Additional context
Working fix: change the encoding when reading the file in the read_json function. Replace line 13 of the demo script with: with open(filename, 'r', encoding='utf-8') as f:
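Applied to the demo's read_json, the reporter's fix looks like this (a sketch of the one-line change):

```python
import json

def read_json(filename):
    # Force UTF-8 rather than the platform default (cp1252 on Windows),
    # which cannot decode byte 0x81.
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)
```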

[ENHANCEMENT] Allow returning tensors results based on average rather than maximum

Is your feature request related to a problem? Please describe.
Allow users to use average rather than maximum when searching vectors

Describe the solution you'd like
When a user creates an index, they specify whether they would like it structured as an average or as a maximum (or both). If the user chooses average, we compute all the vectors, average them and store it.
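The 'average' option could be sketched like this (a plain-Python illustration of the idea, not Marqo internals):

```python
def mean_vector(chunk_vectors):
    """Average per-chunk embeddings into a single document-level vector."""
    n = len(chunk_vectors)
    return [sum(dims) / n for dims in zip(*chunk_vectors)]
```

At index time the averaged vector would be stored alongside (or instead of) the per-chunk vectors, letting search score against the document as a whole rather than its best-matching chunk.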

Issue Installing marqo dependencies on m1 mac

I have been trying to install the marqo dependencies on my machine but I keep getting an error; the error says:

ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu (from marqo-engine) (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu

I have also attached a screenshot of the error I am getting
Screenshot 2022-09-18 at 18 17 59

Run OpenSearch for Marqo on a port other than 9200

I tried to run docker run -p 9000:9000 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:2.1.0. Notice this has a different port binding from the 9200:9200 found in the README.md.
I'm running an Elasticsearch service in the background for an unrelated task. It is also running on port 9200, so I wanted to run OpenSearch for Marqo on a different port.
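Note that the container always listens internally on 9200, so a host-side remap has to target container port 9200, and Marqo then needs to be pointed at the new URL. A sketch (assuming the OPENSEARCH_URL variable and flags from the README; the exact image tags are illustrative):

```shell
# Expose OpenSearch on host port 9201; the container still listens on 9200,
# so the mapping is hostport:9200 (a bare 9000:9000 would not reach OpenSearch).
docker run -p 9201:9200 -p 9600:9600 -e "discovery.type=single-node" \
  opensearchproject/opensearch:2.1.0

# Point Marqo at the remapped port via OPENSEARCH_URL.
docker run --name marqo -p 8882:8882 --add-host host.docker.internal:host-gateway \
  -e "OPENSEARCH_URL=https://host.docker.internal:9201" marqoai/marqo:0.0.5
```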

[BUG] "POST /indexes/marqo-simplewiki-demo-all HTTP/1.1" 500 Internal Server Error

Describe the bug
Server error 500 trying to run the SimpleWiki example. Tried with 0.0.5 and 0.0.3 as the demo shows.

gllermaly@ubuntu-s-2vcpu-4gb-amd-nyc1-01:~/marqo/SimpleWiki$ python3 simple_wiki_demo.py
loaded data with 188557 entries
Traceback (most recent call last):
File "/home/gllermaly/.local/lib/python3.10/site-packages/marqo/_httprequests.py", line 131, in __validate
request.raise_for_status()
File "/usr/lib/python3/dist-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:8882/indexes/marqo-simplewiki-demo-all

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/gllermaly/marqo/SimpleWiki/simple_wiki_demo.py", line 64, in <module>
client.create_index(index_name, model='onnx/all_datasets_v4_MiniLM-L6')
File "/home/gllermaly/.local/lib/python3.10/site-packages/marqo/client.py", line 62, in create_index
return Index.create(
File "/home/gllermaly/.local/lib/python3.10/site-packages/marqo/index.py", line 78, in create
return req.post(f"indexes/{index_name}", body={
File "/home/gllermaly/.local/lib/python3.10/site-packages/marqo/_httprequests.py", line 99, in post
return self.send_request(requests.post, path, body, content_type)
File "/home/gllermaly/.local/lib/python3.10/site-packages/marqo/_httprequests.py", line 77, in send_request
return self.__validate(response)
File "/home/gllermaly/.local/lib/python3.10/site-packages/marqo/_httprequests.py", line 134, in __validate
convert_to_marqo_error_and_raise(response=request, err=err)
File "/home/gllermaly/.local/lib/python3.10/site-packages/marqo/_httprequests.py", line 148, in convert_to_marqo_error_and_raise
raise MarqoWebError(message=response_msg, code=code, error_type=error_type,
marqo.errors.MarqoWebError: MarqoWebError: MarqoWebError Error message: {'message': "HTTPSConnectionPool(host='localhost', port=9200): Max retries exceeded with url: /marqo-simplewiki-demo-all (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3bcd18cc10>: Failed to establish a new connection: [Errno 111] Connection refused'))", 'code': 'backend_communication_error', 'type': 'internal', 'link': ''}
status_code: 500, type: internal, code: backend_communication_error, link:

To Reproduce
Steps to reproduce the behavior:

  1. Install marqo
  2. Run examples/SimpleWiki

Expected behavior
Demo should work

Desktop (please complete the following information):

  • OS: Ubuntu 22 server DO fresh droplet

Incorrect default argument in "search" function

Describe the bug
Calling search without specifying search_method fails: the client sends the default value NEURAL, which the server rejects as an invalid SearchMethod.

To Reproduce
Steps to reproduce the behavior:
call the search method and don't specify the "search_method"

Expected behavior
Should have displayed the search results

Screenshots
marqo.errors.MarqoWebError: MarqoWebError: MarqoWebError Error message: {"detail":[{"loc":["body","searchMethod"],"msg":"NEURAL is not a valid SearchMethod","type":"value_error"}]} status_code: 422, type: unhandled_error_type, code: unhandled_error, link:

Desktop (please complete the following information):

  • OS: macOS Monterey v12.5


Tensor is not a valid Search Method

When using the example in the README to help test, I get an error stating that TENSOR is not a valid search method.

To Reproduce

  1. After setting up the Docker requirements, I created a file
  2. Copied the code from the README into the file
  3. Ran it in my terminal
  4. See error in terminal

Expected behavior
I expected it to show a result similar to the one in the README.

Screenshots
bugReport

Desktop

  • OS: Windows 11
  • Chrome

[BUG] 'unavailable_shards_exception'

Describe the bug

When I was trying to index data, it was taking a long time (minutes) for only a couple of documents. Previously it had taken seconds. When a response was returned it had the following errors:

{'errors': True,
 'items': [{'_id': '0734ac47-5337-434f-8a20-e0e157e78e2b',
   'status': 503,
   'error': {'type': 'unavailable_shards_exception',
    'reason': '[my-multimodal-index][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[my-multimodal-index][0]] containing [3] requests]'}},

To Reproduce
Steps to reproduce the behavior:

  1. Follow the instructions for running the simple http server to allow reading images (https://marqo.pages.dev/advanced_usage/)
  2. Index some data with an image, e.g.
{'jpg_http': 'http://host.docker.internal:8222/dataset/iconic-images-and-descriptions/Fruit/Apple/Golden-Delicious/Golden-Delicious_Iconic.jpg',
  'txt_all': 'Golden Delicious has a white juicy pulp and a greenish yellow '
             'shell. The taste is mellow and sweet, making Golden Delicious '
             'suitable for desserts.\n'},

Expected behavior
The image would be indexed in < 1 second without an error

Desktop (please complete the following information):

  • OS: Ubuntu 20.04


[ENHANCEMENT] - Get Indices

Is your feature request related to a problem? Please describe.
I have no way of querying which indices have been loaded to OpenSearch to perform operations on them.

Describe the solution you'd like
A function on the Client object to return a list of indices.
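A sketch of the desired client surface (the endpoint path, method name, and response shape are all assumptions):

```python
def get_indexes(http_get):
    """List index names; `http_get` fetches the parsed JSON body of a
    hypothetical GET /indexes call."""
    return [entry["index_name"] for entry in http_get("indexes")["results"]]
```

On the Client object this would be exposed as something like client.get_indexes(), with the HTTP layer supplied internally.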

[BUG]

docker rm -f marqo;docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:0.0.3

Not working. Error: unknown flag: --name

[ENHANCEMENT] Search Operation Should Return Multiple Highlights.

Is your feature request related to a problem? Please describe.
Currently, a search operation only returns one highlight for each indexed document.

Describe the solution you'd like
Get the option to specify the number of highlights to be returned for each indexed document.

Describe alternatives you've considered
None

Additional context
I am creating a podcast demo, wherein I index two documents; each document has the name of the podcast, a short description, and the full transcript.
Whenever I perform a search operation, it returns just one highlight over the whole transcript; it would be good if there were an option to return multiple highlights.
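The requested option could amount to returning the k best-scoring chunks instead of only the top one. A hypothetical helper (not Marqo internals):

```python
def top_k_highlights(chunk_scores, k):
    """chunk_scores: list of {'chunk': str, 'score': float}; return the k best."""
    return sorted(chunk_scores, key=lambda c: c["score"], reverse=True)[:k]
```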

[BUG] Getting started curl documentation is outdated/wrong

The curl getting-started docs say to use the /indices API, while the package itself uses /indexes.
Also, executing the getting-started commands returns the wrong response: {"detail":"Not Found"}{"detail":"Not Found"}.
It's especially confusing when switching from the package to curl and vice versa.

image

Slow Inference on arm64 machines

Describe the bug
marqo is taking a lot of time to index and search

To Reproduce
On an arm64 machine, use Docker to run marqo, and then try to index and search

Expected behavior
Operations should be fast

Desktop (please complete the following information):
OS: macOS Monterey v12.5
Machine: M1 MacBookPro, 2020

[BUG] Can't use any tools with marqo

After installing marqo and the marqo client libraries, I can't run any code using marqo, even simple examples from the README.

Error:
ConnectionRefusedError: [Errno 61] Connection refused
<HTTPConnection object at 0x7f7860bce220>: Failed to establish a new connection: [Errno 61] Connection refused
in send_request
raise BackendCommunicationError(str(err))

Desktop

  • OS: macOS 12.0.1

[ENHANCEMENT] Guide for running Marqo on Azure

Is your feature request related to a problem? Please describe.
We should have a guide for users to run Marqo on Azure, similar to the AWS guide that we are currently adding.

Describe the solution you'd like
Any user should be able to read the guide and follow best practices to set up Marqo on azure

Describe alternatives you've considered
none

Additional context
none

[BUG] tox failing to find PYTHONPATH

Describe the bug
Trying to run tox from the marqo directory results in a KeyError: PYTHONPATH cannot be found.

py38 create: /home/jesse/code/s2search/marqo/.tox/py38
___________________________________________________________________________ summary ____________________________________________________________________________
  py38: commands succeeded
  congratulations :)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/tox/config/__init__.py", line 354, in get
    return self.resolved[name]
KeyError: 'PYTHONPATH'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/tox", line 11, in <module>
    load_entry_point('tox==3.13.2', 'console_scripts', 'tox')()
  File "/usr/lib/python3/dist-packages/tox/session/__init__.py", line 44, in cmdline
    main(args)
  File "/usr/lib/python3/dist-packages/tox/session/__init__.py", line 68, in main
    exit_code = session.runcommand()
  File "/usr/lib/python3/dist-packages/tox/session/__init__.py", line 192, in runcommand
    return self.subcommand_test()
  File "/usr/lib/python3/dist-packages/tox/session/__init__.py", line 220, in subcommand_test
    run_sequential(self.config, self.venv_dict)
  File "/usr/lib/python3/dist-packages/tox/session/commands/run/sequential.py", line 9, in run_sequential
    if venv.setupenv():
  File "/usr/lib/python3/dist-packages/tox/venv.py", line 594, in setupenv
    status = self.update(action=action)
  File "/usr/lib/python3/dist-packages/tox/venv.py", line 252, in update
    self.hook.tox_testenv_create(action=action, venv=self)
  File "/usr/lib/python3/dist-packages/pluggy/hooks.py", line 286, in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 92, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/usr/lib/python3/dist-packages/pluggy/manager.py", line 83, in <lambda>
    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
  File "/usr/lib/python3/dist-packages/pluggy/callers.py", line 208, in _multicall
    return outcome.get_result()
  File "/usr/lib/python3/dist-packages/pluggy/callers.py", line 80, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/usr/lib/python3/dist-packages/pluggy/callers.py", line 187, in _multicall
    res = hook_impl.function(*args)
  File "/usr/lib/python3/dist-packages/tox/venv.py", line 682, in tox_testenv_create
    venv._pcall(
  File "/usr/lib/python3/dist-packages/tox/venv.py", line 553, in _pcall
    env = self._get_os_environ(is_test_command=is_test_command)
  File "/usr/lib/python3/dist-packages/tox/venv.py", line 472, in _get_os_environ
    env.update(self.envconfig.setenv)
  File "/usr/lib/python3/dist-packages/tox/config/__init__.py", line 370, in __getitem__
    x = self.get(name, self._DUMMY)
  File "/usr/lib/python3/dist-packages/tox/config/__init__.py", line 364, in get
    self.resolved[name] = res = self.reader._replace(val)
  File "/usr/lib/python3/dist-packages/tox/config/__init__.py", line 1516, in _replace
    replaced = Replacer(self, crossonly=crossonly).do_replace(value)
  File "/usr/lib/python3/dist-packages/tox/config/__init__.py", line 1552, in do_replace
    expanded = substitute_once(value)
  File "/usr/lib/python3/dist-packages/tox/config/__init__.py", line 1550, in substitute_once
    return self.RE_ITEM_REF.sub(self._replace_match, x)
  File "/usr/lib/python3/dist-packages/tox/config/__init__.py", line 1597, in _replace_match
    return self._replace_substitution(match)
  File "/usr/lib/python3/dist-packages/tox/config/__init__.py", line 1632, in _replace_substitution
    val = self._substitute_from_other_section(sub_key)
  File "/usr/lib/python3/dist-packages/tox/config/__init__.py", line 1626, in _substitute_from_other_section
    raise tox.exception.ConfigError("substitution key {!r} not found".format(key))
tox.exception.ConfigError: ConfigError: substitution key '/' not found

To Reproduce

  1. cd into marqo
  2. run tox

Expected behavior
tox runs

Desktop (please complete the following information):

  • OS: Ubuntu 20.04
