This project harnesses OpenAI's language models, such as GPT-3.5 and GPT-4, for multimodal retrieval-augmented generation. It analyzes, summarizes, and indexes diverse data types: text, tables, and images. Summaries are stored in a Chroma vectorstore and an InMemoryStore, using OpenAIEmbeddings for indexing. The system is designed for information synthesis across formats and is positioned for future integration with multimodal LLMs, including GPT-4V and CLIP, for AI-driven content processing and creation.
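The indexing pattern described above, summaries in a vectorstore paired with raw documents in a separate store, can be sketched in plain Python. This is a minimal toy illustration, not the project's actual code: it assumes the common multi-vector layout in which summary embeddings live in the vector index (Chroma's role) while the raw elements live in a keyed docstore (InMemoryStore's role), and it substitutes a tiny bag-of-words embedding for OpenAIEmbeddings so the sketch is self-contained. The names `embed`, `vector_index`, and `docstore` are hypothetical.

```python
import math

# Toy stand-in for OpenAIEmbeddings: bag-of-words counts over a tiny vocabulary.
# (The real project would call an embedding model; this keeps the sketch runnable offline.)
def embed(text):
    vocab = ["table", "image", "revenue", "chart", "text"]
    return [text.lower().count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Summaries of heterogeneous elements (text, table, image), keyed by doc id.
summaries = {
    "doc-1": "A table of quarterly revenue figures.",
    "doc-2": "An image showing a bar chart of sales.",
    "doc-3": "Plain text describing the methodology.",
}

# Vector index over the *summaries* (Chroma's role in the project) ...
vector_index = {doc_id: embed(s) for doc_id, s in summaries.items()}

# ... and a docstore mapping the same ids to the *raw* elements (InMemoryStore's role).
docstore = {
    "doc-1": "<raw table element>",
    "doc-2": "<raw image bytes>",
    "doc-3": "<raw text chunk>",
}

def retrieve(query, k=1):
    """Rank summaries by similarity to the query, then resolve ids to raw documents."""
    q = embed(query)
    ranked = sorted(vector_index, key=lambda d: cosine(q, vector_index[d]), reverse=True)
    return [docstore[d] for d in ranked[:k]]

print(retrieve("revenue table"))  # the summary match resolves to the raw table element
```

The key design point this illustrates: retrieval is performed against compact summaries (cheap to embed, even for images described in text), but generation receives the original, information-rich element looked up by id.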
wahyudesu / multimodal-rag-with-openai (forked from coding-crashkurse/multimodal-rag-with-openai)