
khoj-ai / khoj

4.8K 32.0 240.0 80.73 MB

Your AI second brain. A copilot to get answers to your questions, whether they be from your own notes or from the internet. Use powerful, online (e.g. GPT-4) or private, local (e.g. Mistral) LLMs. Self-host locally or use our web app. Access from Obsidian, Emacs, Desktop app, Web or WhatsApp.

Home Page: https://khoj.dev

License: GNU Affero General Public License v3.0

Python 56.81% Emacs Lisp 4.45% HTML 29.15% JavaScript 3.33% CSS 1.58% Dockerfile 0.17% TypeScript 4.19% Shell 0.31%
semantic-search org-mode emacs markdown obsidian-md chat chatgpt ai llm productivity

khoj's Introduction


The open-source, personal AI for your digital brain

🤖 Read Docs • 🏮 Khoj Cloud • 💬 Get Involved • 📚 Read Blog


Khoj is an application that creates always-available, personal AI agents for you to extend your capabilities.

  • You can share your notes and documents to extend your digital brain.
  • Your AI agents have access to the internet, allowing you to incorporate real-time information.
  • Khoj is accessible on Desktop, Emacs, Obsidian, Web and WhatsApp.
  • You can share PDF, Markdown, org-mode and Notion files, as well as GitHub repositories.
  • You'll get fast, accurate semantic search on top of your docs.
  • Your agents can create deeply personal images and understand your speech.
  • Khoj is open-source and self-hostable. Always.

See it in action


Go to https://app.khoj.dev to see Khoj live.

Full feature list

You can see the full feature list here.

Self-Host

To get started with self-hosting Khoj, read the docs.

Contributors

Cheers to our awesome contributors! 🎉

Made with contrib.rocks.

Interested in Contributing?

We are always looking for contributors to help us build new features, improve the project documentation, or fix bugs. If you're interested, please see our Contributing Guidelines and check out our Contributors Project Board.

Shout out to our brilliant sponsors! 🌈

khoj's People

Contributors

ajaysdwivedi1, albd, asim-shrestha, axelson, bholagabbar, comprehensive-jason, debanjum, dtkav, ducksblock, ellen7ions, eltociear, felixonmars, hyunggyujang, jonny-gm, jtbg, liamswayne, muftawo, olatoyan, sabaimran, sjbutler, spott, suliveevil, telotortium, tjsousa, tuan3w


khoj's Issues

Fix failing cloud tests

The GitHub Actions pipeline is failing because it is not able to find the sbert model_name for CLIP


khoj-assistant errors on Arch Linux

On Arch Linux, if I install khoj-assistant via pip and then run khoj, I get the following error:

RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
Traceback (most recent call last):
  File "/home/slade/.local/bin/khoj", line 5, in <module>
    from src.main import run
  File "/home/slade/.local/lib/python3.10/site-packages/src/main.py", line 16, in <module>
    from src.configure import configure_server
  File "/home/slade/.local/lib/python3.10/site-packages/src/configure.py", line 12, in <module>
    from src.search_type import image_search, text_search
  File "/home/slade/.local/lib/python3.10/site-packages/src/search_type/image_search.py", line 11, in <module>
    from sentence_transformers import SentenceTransformer, util
  File "/home/slade/.local/lib/python3.10/site-packages/sentence_transformers/__init__.py", line 3, in <module>
    from .datasets import SentencesDataset, ParallelSentencesDataset
  File "/home/slade/.local/lib/python3.10/site-packages/sentence_transformers/datasets/__init__.py", line 3, in <module>
    from .ParallelSentencesDataset import ParallelSentencesDataset
  File "/home/slade/.local/lib/python3.10/site-packages/sentence_transformers/datasets/ParallelSentencesDataset.py", line 4, in <module>
    from .. import SentenceTransformer
  File "/home/slade/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 25, in <module>
    from .evaluation import SentenceEvaluator
  File "/home/slade/.local/lib/python3.10/site-packages/sentence_transformers/evaluation/__init__.py", line 3, in <module>
    from .BinaryClassificationEvaluator import BinaryClassificationEvaluator
  File "/home/slade/.local/lib/python3.10/site-packages/sentence_transformers/evaluation/BinaryClassificationEvaluator.py", line 5, in <module>
    from sklearn.metrics.pairwise import paired_cosine_distances, paired_euclidean_distances, paired_manhattan_distances
  File "/home/slade/.local/lib/python3.10/site-packages/sklearn/__init__.py", line 82, in <module>
    from .base import clone
  File "/home/slade/.local/lib/python3.10/site-packages/sklearn/base.py", line 17, in <module>
    from .utils import _IS_32BIT
  File "/home/slade/.local/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 22, in <module>
    from scipy.sparse import issparse
  File "/usr/lib/python3.10/site-packages/scipy/sparse/__init__.py", line 267, in <module>
    from ._csr import *
  File "/usr/lib/python3.10/site-packages/scipy/sparse/_csr.py", line 10, in <module>
    from ._sparsetools import (csr_tocsc, csr_tobsr, csr_count_blocks,
ImportError: numpy.core.multiarray failed to import

If I then force-upgrade numpy (via pip install numpy --upgrade) and create ~/.khoj and touch ~/.khoj/khoj.log, I get this error:

/home/slade/.local/lib/python3.10/site-packages/huggingface_hub/snapshot_download.py:6: FutureWarning: snapshot_download.py has been made private and will no longer be available from version 0.11. Please use `from huggingface_hub import snapshot_download` to import the only public function in this module. Other members of the file may be changed without a deprecation notice.
  warnings.warn(
Traceback (most recent call last):
  File "/home/slade/.local/bin/khoj", line 8, in <module>
    sys.exit(run())
  File "/home/slade/.local/lib/python3.10/site-packages/src/main.py", line 93, in run
    main_window = MainWindow(args.config_file)
  File "/home/slade/.local/lib/python3.10/site-packages/src/interface/desktop/main_window.py", line 62, in __init__
    self.search_settings_panels += [self.add_settings_panel(current_content_config, search_type)]
  File "/home/slade/.local/lib/python3.10/site-packages/src/interface/desktop/main_window.py", line 95, in add_settings_panel
    input_files = FileBrowser(file_input_text, search_type, current_content_files)
  File "/home/slade/.local/lib/python3.10/site-packages/src/interface/desktop/file_browser.py", line 28, in __init__
    self.setFiles(default_files)
  File "/home/slade/.local/lib/python3.10/site-packages/src/interface/desktop/file_browser.py", line 62, in setFiles
    self.filepaths = [path for path in paths if not is_none_or_empty(path)]
TypeError: 'NoneType' object is not iterable

Edit: If I try installing via conda, I end up with the same second error (the one after the huggingface warning). Given similar issues on other platforms (e.g. #78), I suspect this is not an Arch-specific issue (and probably not a Python 3.10-specific one either).
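The second traceback ends in FileBrowser.setFiles iterating over a file list that is None, presumably because a content type's file list is unset in the config. A minimal sketch of a defensive guard, assuming `paths` may be None; `is_none_or_empty` below is a stand-in with assumed semantics, not Khoj's actual helper, and `clean_file_paths` is hypothetical:

```python
from typing import Iterable, List, Optional


def is_none_or_empty(item) -> bool:
    # Stand-in for the helper named in the traceback (assumed semantics):
    # true for None or for an empty string/sequence.
    return item is None or (hasattr(item, "__len__") and len(item) == 0)


def clean_file_paths(paths: Optional[Iterable[str]]) -> List[str]:
    # Treat a missing file list as "no files" instead of iterating over None.
    if paths is None:
        return []
    return [path for path in paths if not is_none_or_empty(path)]


print(clean_file_paths(None))                          # []
print(clean_file_paths(["~/notes/a.org", "", None]))   # ['~/notes/a.org']
```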

Add a demo for semantic search using Harry Potter data

Idea, riffing and open to modification!

We could host a standalone demo for Semantic Search using Harry Potter as the input data to generate embeddings.
We could use a platform like Streamlit to quickly deploy a simple, nice UI that simply calls into a service hosting a model trained on that data in the backend.

Users could make detailed queries like "Where did Harry first catch a snitch?" or "What was Aunt Petunia's favorite dessert?", or equivalent.
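At its core, such a demo ranks passages by how similar their embeddings are to the query's embedding. A minimal sketch of that ranking step using cosine similarity, assuming the embeddings were already produced by some encoder; the three-dimensional vectors and passage names below are made-up placeholders, not real model output:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    # Score every passage against the query and return the best matches first.
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Placeholder embeddings for illustration only.
passages = {
    "quidditch": [0.9, 0.1, 0.0],
    "dursleys": [0.1, 0.8, 0.3],
    "potions": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for "Where did Harry first catch a snitch?"
print(rank_passages(query, passages, top_k=1))
```

In a real demo the vectors would come from a sentence-embedding model, and the ranking would typically be done over a precomputed matrix rather than a Python loop.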

khoj-assistant errors on Manjaro

Traceback (most recent call last):
  File "/home/ea/.local/bin/khoj", line 8, in <module>
    sys.exit(run())
  File "/home/ea/.local/lib/python3.10/site-packages/src/main.py", line 67, in run
    args = cli(state.cli_args)
  File "/home/ea/.local/lib/python3.10/site-packages/src/utils/cli.py", line 36, in cli
    args.config = parse_config_from_file(args.config_file)
  File "/home/ea/.local/lib/python3.10/site-packages/src/utils/yaml.py", line 38, in parse_config_from_file
    return parse_config_from_string(load_config_from_file(yaml_config_file))
  File "/home/ea/.local/lib/python3.10/site-packages/src/utils/yaml.py", line 33, in parse_config_from_string
    return FullConfig.parse_obj(yaml_config)
  File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 5 validation errors for FullConfig
content-type -> org -> input-filter
  Either input_filter or input_files required in all content-type.<text_search> section of Khoj config file (type=value_error)
content-type -> ledger -> input-filter
  Either input_filter or input_files required in all content-type.<text_search> section of Khoj config file (type=value_error)
content-type -> image -> input-filter
  Either input_filter or input_directories required in all content-type.image section of Khoj config file (type=value_error)
content-type -> music -> input-filter
  Either input_filter or input_files required in all content-type.<text_search> section of Khoj config file (type=value_error)
processor -> conversation -> openai-api-key
  none is not an allowed value (type=type_error.none.not_allowed)
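The first four errors come from a rule that each content-type section must set either input-filter or input-files (input-directories for images); the fifth is separate and means processor.conversation.openai-api-key is null. A plain-Python sketch of the first check, run against a dict parsed from khoj.yml. It mirrors the error messages above but is not Khoj's actual pydantic validator; `find_missing_inputs` is hypothetical, and the hyphenated `input-directories` key is assumed by analogy with `input-files`:

```python
from typing import Dict, List, Optional


def find_missing_inputs(content_type: Dict[str, Optional[dict]]) -> List[str]:
    # Return the content types that set neither an input filter nor explicit inputs.
    # Images are assumed to use `input-directories`; text types use `input-files`.
    problems = []
    for name, section in content_type.items():
        section = section or {}
        inputs_key = "input-directories" if name == "image" else "input-files"
        if not section.get("input-filter") and not section.get(inputs_key):
            problems.append(name)
    return problems


config = {
    "org": {"input-filter": "~/notes/*.org"},
    "ledger": {"input-files": None, "input-filter": None},
    "image": {},
}
print(find_missing_inputs(config))  # ['ledger', 'image']
```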

FAQ as a service

Productionize deployment of semantic search services on top of notes.

Searching words in arbitrary order (aka helm)

I have the following headline:

******** DONE Zheng, Mara [Nature Communications] (2013) High-strength and thermally stable bulk nanolayered composites due to twin-induced interfaces :BOOKMARK:FLAGGED:@work:article:ATTACH:
CLOSED: [2022-07-20 Wed 09:58]
:PROPERTIES:
:CREATED: [2020-05-21 Thu 12:12]
:Source: https://www.nature.com/articles/ncomms2651
:ID:       Zheng_2013
:PUBLISHER: Nature Publishing Group
:NOTE:     Online; accessed 14 August 2021
:HOWPUBLISHED: Nature
:URL:      https://doi.org/10.1038/ncomms2651
:DOI:      10.1038/ncomms2651
:YEAR:     2013
:JOURNAL:  Nature Communications
:AUTHOR:   Zheng, Shijian and Beyerlein, Irene J. and Carpenter, John S. and Kang, Keonwook and Wang, Jian and Han, Weizhong and Mara, Nathan A.
:BTYPE:    article
:TITLE:    High-strength and thermally stable bulk nanolayered composites due to twin-induced interfaces
:Effort:   0:20
:SHOWFROMDATE: 2021-09-04
:END:
:LOGBOOK:
- State "DONE"       from "REVIEW"          [2022-07-20 Wed 09:58]
- State "DONE"       from "DONE"          [2021-09-06 Mon 13:57]
CLOCK: [2021-09-06 Mon 13:48]--[2021-09-06 Mon 13:57] =>  0:09
CLOCK: [2021-09-03 Fri 14:20]--[2021-09-03 Fri 14:43] =>  0:23
CLOCK: [2021-09-02 Thu 13:59]--[2021-09-02 Thu 14:06] =>  0:07
CLOCK: [2021-08-31 Tue 15:14]--[2021-08-31 Tue 15:17] =>  0:03
CLOCK: [2021-08-30 Mon 16:04]--[2021-08-30 Mon 16:09] =>  0:05
CLOCK: [2021-08-27 Fri 17:22]--[2021-08-27 Fri 17:29] =>  0:07
- Refiled on [2021-08-14 Sat 11:52]
- Refiled on [2021-03-19 Fri 16:01]
- Refiled on [2020-08-18 Tue 17:39]
- Refiled on [2020-05-22 Fri 10:24]
:END:
 #ARB #EBSD #Cu-Nb #texture

- Cu/Nb ARB texture evolves like the following:
  - Nb={112}<110> all the time (<100nm layer thickness); Cu={4 4 11}<11 11 8> (200-100nm) -> {112}<111> (100-50nm) -> {112}<111> + twin {552}<115> (50-10nm) -> {552}<115> -> {551}<1 1 10> (10nm)
- Layers are single crystalline (SX) below 200nm layer thickness

I am trying to match it using the search term "thermally stable high strength". However, no match is returned, which is unexpected.
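The requested helm-style behaviour, where every query word must appear somewhere in the entry regardless of order, can be sketched as below. Splitting on non-alphanumeric characters matters for this example, since it lets the query words "high strength" match the hyphenated "High-strength" in the heading. This illustrates the feature request only; it is not how Khoj's semantic search works:

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    # Lowercase and split on anything that is not a letter or digit,
    # so "High-strength" yields {"high", "strength"}.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_all_words(query: str, entry: str) -> bool:
    # True when every query word occurs in the entry, in any order.
    return tokenize(query) <= tokenize(entry)


heading = (
    "Zheng, Mara [Nature Communications] (2013) High-strength and thermally "
    "stable bulk nanolayered composites due to twin-induced interfaces"
)
print(matches_all_words("thermally stable high strength", heading))  # True
print(matches_all_words("thermally stable copper", heading))         # False
```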

forbidden path outside the build context

docker-compose up -d / docker-compose build --no-cache:

Setting up libunicode-linebreak-perl (0.0.20190101-1+b3) ...
Processing triggers for libc-bin (2.31-13+deb11u2) ...
Removing intermediate container 8b84aaae34f7
 ---> dec05fd1d293
Step 3/8 : ADD .. /app
1 error occurred:
        * Status: ADD failed: forbidden path outside the build context: .. (), Code: 1

Khoj install failed

Looking to try khoj but installation / setup not successful. This is on Fedora 36. Log tail below:

INFO: Started server process [136730]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
/home/d/miniconda3/lib/python3.9/site-packages/huggingface_hub/file_download.py:560: FutureWarning: cached_download is the legacy way to download files from the HF hub, please consider upgrading to hf_hub_download
  warnings.warn(
Downloading: 100%|██████████| 737/737 [00:00<00:00, 413kB/s]
Downloading: 100%|██████████| 190/190 [00:00<00:00, 98.1kB/s]
Downloading: 100%|██████████| 11.5k/11.5k [00:00<00:00, 6.34MB/s]
Downloading: 100%|██████████| 612/612 [00:00<00:00, 323kB/s]
Downloading: 100%|██████████| 116/116 [00:00<00:00, 68.4kB/s]
Downloading: 100%|██████████| 25.5k/25.5k [00:00<00:00, 339kB/s]
Downloading: 100%|██████████| 349/349 [00:00<00:00, 177kB/s]
Downloading: 100%|██████████| 90.9M/90.9M [00:17<00:00, 5.20MB/s]
Downloading: 100%|██████████| 53.0/53.0 [00:00<00:00, 31.4kB/s]
Downloading: 100%|██████████| 112/112 [00:00<00:00, 81.4kB/s]
Downloading: 100%|██████████| 466k/466k [00:00<00:00, 1.46MB/s]
Downloading: 100%|██████████| 383/383 [00:00<00:00, 234kB/s]
Downloading: 100%|██████████| 13.8k/13.8k [00:00<00:00, 183kB/s]
Downloading: 100%|██████████| 232k/232k [00:00<00:00, 763kB/s]
Downloading config.json: 100%|██████████| 794/794 [00:00<00:00, 445kB/s]
Downloading pytorch_model.bin: 100%|██████████| 86.7M/86.7M [00:17<00:00, 5.13MB/s]
Downloading tokenizer_config.json: 100%|██████████| 316/316 [00:00<00:00, 189kB/s]
Downloading vocab.txt: 100%|██████████| 226k/226k [00:00<00:00, 753kB/s]
Downloading special_tokens_map.json: 100%|██████████| 112/112 [00:00<00:00, 56.2kB/s]
Traceback (most recent call last):
  File "/home/d/miniconda3/lib/python3.9/site-packages/src/interface/desktop/main_window.py", line 296, in run
    self.load_settings_func()
  File "/home/d/miniconda3/lib/python3.9/site-packages/src/interface/desktop/main_window.py", line 237, in load_updated_settings
    configure_server(args, required=True)
  File "/home/d/miniconda3/lib/python3.9/site-packages/src/configure.py", line 36, in configure_server
    state.model = configure_search(state.model, state.config, args.regenerate)
  File "/home/d/miniconda3/lib/python3.9/site-packages/src/configure.py", line 46, in configure_search
    model.orgmode_search = text_search.setup(
  File "/home/d/miniconda3/lib/python3.9/site-packages/src/search_type/text_search.py", line 173, in setup
    text_to_jsonl(config.input_files, config.input_filter, config.compressed_jsonl)
  File "/home/d/miniconda3/lib/python3.9/site-packages/src/processor/org_mode/org_to_jsonl.py", line 32, in org_to_jsonl
    entries, file_to_entries = extract_org_entries(org_files)
  File "/home/d/miniconda3/lib/python3.9/site-packages/src/processor/org_mode/org_to_jsonl.py", line 72, in extract_org_entries
    org_file_entries = orgnode.makelist(str(org_file))
  File "/home/d/miniconda3/lib/python3.9/site-packages/src/processor/org_mode/orgnode.py", line 170, in makelist
    thisNode = Orgnode(level, heading, bodytext, tags)
  File "/home/d/miniconda3/lib/python3.9/site-packages/src/processor/org_mode/orgnode.py", line 217, in __init__
    self.level = len(level)
TypeError: object of type 'int' has no len()
Aborted (core dumped)

Only Search on Enabled Content Types from Emacs Interface

  • Currently Emacs Interface allows searching all supported content types, even if they've not actually been enabled by the user
  • This creates unnecessary options, confusion and visual clutter
  • As more content types are added this issue will become more annoying
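The Emacs interface is written in Emacs Lisp, so the following is only a language-neutral sketch, in Python, of the selection logic: offer a content type only if its config section is actually populated. The dict shape is a hypothetical mirror of khoj.yml's content-type section, and `enabled_content_types` is a hypothetical name:

```python
from typing import Dict, List, Optional


def enabled_content_types(content_type: Dict[str, Optional[dict]]) -> List[str]:
    # A content type counts as enabled when its config section exists and is non-empty.
    return [name for name, section in content_type.items() if section]


config = {
    "org": {"input-filter": "~/notes/*.org"},
    "markdown": None,
    "ledger": {},
    "music": {"input-files": ["~/music.org"]},
}
print(enabled_content_types(config))  # ['org', 'music']
```

Computing this list once (on the server, or in the client after reading the config) would let the search prompt offer only these choices.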

Error on startup

I was interested in trying this out after the Reddit post. However, after running pip install khoj-assistant && khoj, I get this error and khoj dies:

โฏ khoj /usr/local/lib/python3.9/site-packages/huggingface_hub/snapshot_download.py:6: FutureWarning: snapshot_download.py has been made private and will no longer be available from version 0.11. Please use from huggingface_hub import snapshot_download to import the only public function in this module. Other members of the file may be changed without a deprecation notice. warnings.warn( Traceback (most recent call last): File "/usr/local/bin/khoj", line 8, in <module> sys.exit(run()) File "/usr/local/lib/python3.9/site-packages/src/main.py", line 93, in run main_window = MainWindow(args.config_file) File "/usr/local/lib/python3.9/site-packages/src/interface/desktop/main_window.py", line 62, in __init__ self.search_settings_panels += [self.add_settings_panel(current_content_config, search_type)] File "/usr/local/lib/python3.9/site-packages/src/interface/desktop/main_window.py", line 95, in add_settings_panel input_files = FileBrowser(file_input_text, search_type, current_content_files) File "/usr/local/lib/python3.9/site-packages/src/interface/desktop/file_browser.py", line 28, in __init__ self.setFiles(default_files) File "/usr/local/lib/python3.9/site-packages/src/interface/desktop/file_browser.py", line 62, in setFiles self.filepaths = [path for path in paths if not is_none_or_empty(path)] TypeError: 'NoneType' object is not iterable

It could be my local Python setup. This is on macOS Monterey.

Support Multiple Input Filters

The input-filter field in khoj.yml currently accepts only a single glob string. Allow passing a list of glob strings to input-filter for filtering.

Current:

content-type:
  org: 
    input-filter: "~/notes/*.org"

Expected:

content-type:
  org: 
    input-filter: 
      - "~/notes/*.org"
      - "~/documents/*.org"
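One way to support both shapes is to normalize input-filter to a list before globbing. A minimal sketch, assuming the value arrives from the YAML parser as either a string or a list of strings; `resolve_input_filter` is a hypothetical name, not Khoj's code:

```python
import glob
import os
import tempfile
from typing import Iterable, List, Optional, Union


def resolve_input_filter(input_filter: Optional[Union[str, Iterable[str]]]) -> List[str]:
    # Accept a single glob string (current behaviour) or a list of globs (requested).
    if input_filter is None:
        return []
    patterns = [input_filter] if isinstance(input_filter, str) else list(input_filter)
    files = set()
    for pattern in patterns:
        # glob does not expand "~", so expand it first; the set drops duplicate matches.
        files.update(glob.glob(os.path.expanduser(pattern)))
    return sorted(files)


# Demo against a throwaway directory.
with tempfile.TemporaryDirectory() as notes_dir:
    for name in ("a.org", "b.org", "c.md"):
        open(os.path.join(notes_dir, name), "w").close()
    org_only = resolve_input_filter(os.path.join(notes_dir, "*.org"))
    org_and_md = resolve_input_filter(
        [os.path.join(notes_dir, "*.org"), os.path.join(notes_dir, "*.md")]
    )
    org_only_names = [os.path.basename(p) for p in org_only]
    org_and_md_names = [os.path.basename(p) for p in org_and_md]
print(org_only_names)    # ['a.org', 'b.org']
print(org_and_md_names)  # ['a.org', 'b.org', 'c.md']
```

Keeping the single-string form working means existing khoj.yml files would not need to change.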

errors with org files containing certain types of structures

With a khoj.yml file containing:

content-type:
  org:
    compressed-jsonl: ~/.khoj/content/org/org.jsonl.gz
    embeddings-file: ~/.khoj/content/org/org_embeddings.pt
    input-files: null
    input-filter: "/home/slade/Documents/Org/*.org"
processor: {}
search-type:
  asymmetric:
    cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
    encoder: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
    model_directory: ~/.khoj/search/asymmetric/
  image:
    encoder: sentence-transformers/clip-ViT-B-32
    model_directory: ~/.khoj/search/image/
  symmetric:
    cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
    encoder: sentence-transformers/all-MiniLM-L6-v2
    model_directory: ~/.khoj/search/symmetric/

Running khoj --regenerate produces an error:

Traceback (most recent call last):
  File "/home/slade/.local/bin/khoj", line 8, in <module>
    sys.exit(run())
  File "/home/slade/.local/lib/python3.10/site-packages/src/main.py", line 108, in run
    configure_server(args, required=False)
  File "/home/slade/.local/lib/python3.10/site-packages/src/configure.py", line 36, in configure_server
    state.model = configure_search(state.model, state.config, args.regenerate)
  File "/home/slade/.local/lib/python3.10/site-packages/src/configure.py", line 46, in configure_search
    model.orgmode_search = text_search.setup(
  File "/home/slade/.local/lib/python3.10/site-packages/src/search_type/text_search.py", line 173, in setup
    text_to_jsonl(config.input_files, config.input_filter, config.compressed_jsonl)
  File "/home/slade/.local/lib/python3.10/site-packages/src/processor/org_mode/org_to_jsonl.py", line 32, in org_to_jsonl
    entries, file_to_entries = extract_org_entries(org_files)
  File "/home/slade/.local/lib/python3.10/site-packages/src/processor/org_mode/org_to_jsonl.py", line 72, in extract_org_entries
    org_file_entries = orgnode.makelist(str(org_file))
  File "/home/slade/.local/lib/python3.10/site-packages/src/processor/org_mode/orgnode.py", line 170, in makelist
    thisNode = Orgnode(level, heading, bodytext, tags)
  File "/home/slade/.local/lib/python3.10/site-packages/src/processor/org_mode/orgnode.py", line 217, in __init__
    self.level = len(level)
TypeError: object of type 'int' has no len()
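Both reports of this traceback fail at `self.level = len(level)`, which assumes `level` is the string of leading asterisks (e.g. "***"), while `makelist` evidently passed an int. A defensive sketch that accepts either form; `normalize_level` is a hypothetical helper, not part of orgnode.py:

```python
from typing import Union


def normalize_level(level: Union[int, str]) -> int:
    # Accept either a heading depth (3) or the raw stars prefix ("***").
    if isinstance(level, int):
        return level
    return len(level.strip())


print(normalize_level("***"))  # 3
print(normalize_level(3))      # 3
```

Fixing the caller so it always passes one type would be cleaner; the sketch only shows where the mismatch lies.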

Use better logging

Rather than using print statements, use a logging framework to log warnings, errors and information.
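Python's standard `logging` module already covers this. A minimal sketch of the usual pattern: a module-level logger, lazy `%s` formatting, and a single `basicConfig` call at startup. `index_file` is a hypothetical function used only to show different log levels:

```python
import logging

# Module-level logger named after the module, instead of bare print() calls.
logger = logging.getLogger(__name__)


def index_file(path: str) -> int:
    # Hypothetical indexing step, used only to show log levels.
    logger.info("Indexing %s", path)
    if not path.endswith(".org"):
        logger.warning("Skipping unsupported file %s", path)
        return 0
    logger.debug("Parsed entries from %s", path)
    return 1


if __name__ == "__main__":
    # Configure output format and level once, at application startup.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    index_file("notes.org")
    index_file("notes.txt")
```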

Conflicting dependencies when installing

How can I fix this problem?

ERROR: Cannot install khoj-assistant==0.1.0, khoj-assistant==0.1.2, khoj-assistant==0.1.3 and khoj-assistant==0.1.4 because these package versions have conflicting dependencies.

The conflict is caused by:
    khoj-assistant 0.1.4 depends on numpy==1.22.4
    khoj-assistant 0.1.3 depends on numpy==1.22.4
    khoj-assistant 0.1.2 depends on numpy==1.22.4
    khoj-assistant 0.1.0 depends on numpy==1.22.4

Investigate simplenote support

  • Simplenote is a note-taking app that supports generic .txt files as well as Markdown .md files. See if there's anything in particular required to support it.

Longer term:

  • See if there's a way to give deeper search support for Simplenote, perhaps in application? The code is open source. Could we add a Khoj extension?

Add File Search Filter

Allow filtering search results from specified files only

  • This will remove the need to have a separate type for org-music altogether
  • All org-mode files to query go under content-type.org. To limit results to music.org, just add an (Emacs or browser) bookmark with file:music.org added to the query parameter
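A minimal sketch of the proposed filter: pull `file:...` terms out of the query, then keep only results whose source file name matches. The result shape (dicts with a "file" key) and both function names are assumptions for illustration:

```python
import os
import re
from typing import Dict, List, Tuple

# A "file:<name>" term at the start of the query or after whitespace.
FILE_FILTER = re.compile(r"(?:^|\s)file:(\S+)")


def split_file_filters(query: str) -> Tuple[str, List[str]]:
    # Separate "file:<name>" terms from the rest of the query.
    files = FILE_FILTER.findall(query)
    remaining = FILE_FILTER.sub(" ", query)
    return " ".join(remaining.split()), files


def apply_file_filter(results: List[Dict[str, str]], files: List[str]) -> List[Dict[str, str]]:
    # No filter keeps everything; otherwise match on the file's base name.
    if not files:
        return results
    return [r for r in results if os.path.basename(r["file"]) in files]


query, files = split_file_filters("upbeat jazz file:music.org")
results = [
    {"file": "/home/me/notes/music.org", "entry": "* Kind of Blue"},
    {"file": "/home/me/notes/journal.org", "entry": "* Monday"},
]
print(query, files)  # upbeat jazz ['music.org']
print(apply_file_filter(results, files))
```

Stripping the filter term before searching keeps `file:music.org` from influencing the semantic match itself.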

Create Desktop App for MacOS

Purpose

Simplify Installing and Configuring Khoj

Background

  • Khoj can already be installed via pip. That made it easier for developers to install the application
  • Wrapping Khoj into a desktop app should make it easy for non-developers as well to use the app

Details

  • Create First Run Experience to Configure App
  • Allow Configuring App via GUI Settings Page
  • Put App on OS System Tray
    • Puts application in background, while still keeping it easy to pull it up when required

Related

Error installing on Manjaro

After using pip to install khoj, I get the following error when running khoj:

/home/alan/.local/lib/python3.10/site-packages/huggingface_hub/snapshot_download.py:6: FutureWarning: snapshot_download.py has been made private and will no longer be available from version 0.11. Please use `from huggingface_hub import snapshot_download` to import the only public function in this module. Other members of the file may be changed without a deprecation notice.
  warnings.warn(
Traceback (most recent call last):
  File "/home/alan/.local/bin/khoj", line 8, in <module>
    sys.exit(run())
  File "/home/alan/.local/lib/python3.10/site-packages/src/main.py", line 80, in run
    fh = logging.FileHandler(state.config_file.parent / 'khoj.log')
  File "/usr/lib/python3.10/logging/__init__.py", line 1169, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python3.10/logging/__init__.py", line 1201, in _open
    return open_func(self.baseFilename, self.mode,
FileNotFoundError: [Errno 2] No such file or directory: '/home/alan/.khoj/khoj.log'

If I manually create the .khoj folder and an empty khoj.log file, I get this error:

/home/alan/.local/lib/python3.10/site-packages/huggingface_hub/snapshot_download.py:6: FutureWarning: snapshot_download.py has been made private and will no longer be available from version 0.11. Please use `from huggingface_hub import snapshot_download` to import the only public function in this module. Other members of the file may be changed without a deprecation notice.
  warnings.warn(
Traceback (most recent call last):
  File "/home/alan/.local/bin/khoj", line 8, in <module>
    sys.exit(run())
  File "/home/alan/.local/lib/python3.10/site-packages/src/main.py", line 93, in run
    main_window = MainWindow(args.config_file)
  File "/home/alan/.local/lib/python3.10/site-packages/src/interface/desktop/main_window.py", line 62, in __init__
    self.search_settings_panels += [self.add_settings_panel(current_content_config, search_type)]
  File "/home/alan/.local/lib/python3.10/site-packages/src/interface/desktop/main_window.py", line 95, in add_settings_panel
    input_files = FileBrowser(file_input_text, search_type, current_content_files)
  File "/home/alan/.local/lib/python3.10/site-packages/src/interface/desktop/file_browser.py", line 28, in __init__
    self.setFiles(default_files)
  File "/home/alan/.local/lib/python3.10/site-packages/src/interface/desktop/file_browser.py", line 62, in setFiles
    self.filepaths = [path for path in paths if not is_none_or_empty(path)]
TypeError: 'NoneType' object is not iterable
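The first error in this report comes from logging.FileHandler opening ~/.khoj/khoj.log before the ~/.khoj directory exists; FileHandler does not create missing directories. A minimal sketch of the usual remedy, creating the parent directory first. `add_file_logging` is a hypothetical helper; in Khoj the path would presumably be the config file's parent directory, as in the traceback:

```python
import logging
import tempfile
from pathlib import Path


def add_file_logging(log_file: Path, logger: logging.Logger) -> logging.FileHandler:
    # FileHandler does not create missing directories, so do that first.
    log_file.parent.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(log_file)
    logger.addHandler(handler)
    return handler


# Demo: a nested directory that does not exist yet.
with tempfile.TemporaryDirectory() as tmp:
    demo_logger = logging.getLogger("khoj-demo")
    handler = add_file_logging(Path(tmp) / ".khoj" / "khoj.log", demo_logger)
    demo_logger.warning("log file created")
    created = (Path(tmp) / ".khoj" / "khoj.log").exists()
    # Close the handler before the temporary directory is removed.
    demo_logger.removeHandler(handler)
    handler.close()
print(created)  # True
```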

Docker run environment can't deep link to original file

The way we are mounting volumes in the Docker container means that it doesn't have access to the original file location in the user's filesystem. Due to this, deep linking (in org notes) will fail, as it will try to generate the link based on the mounted volume's name.

To reproduce the behavior in the Docker environment:

  1. Run a query against your org notes
  2. Investigate the PROPERTIES bag
  3. Look at the link associated with the LINE and SOURCE fields; they should be referencing a filepath based on the mounted volume, rather than the local directory

There are some different ways we could work around this, ranging from more to less messy. We could expose a mapping somewhere outside of the docker-compose.yml file that specifies the local directory for the given mounted volume. We could have the mounted volumes follow the same naming convention as the local directory, and provide a standardized mapping in the config.yml.
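The mapping idea from the last paragraph can be sketched as a prefix rewrite: translate a path inside the container back to the host path by replacing the mounted volume's prefix. The mapping and paths below are hypothetical, and where such a mapping should live (docker-compose.yml or config.yml) is the open question above:

```python
from pathlib import PurePosixPath
from typing import Dict


def to_host_path(container_path: str, volume_map: Dict[str, str]) -> str:
    # Rewrite the longest matching container mount prefix to its host directory.
    path = PurePosixPath(container_path)
    for container_dir in sorted(volume_map, key=len, reverse=True):
        mount = PurePosixPath(container_dir)
        if path == mount or mount in path.parents:
            return str(PurePosixPath(volume_map[container_dir]) / path.relative_to(mount))
    return container_path  # not under any mapped volume; leave unchanged


volume_map = {"/data/org": "/home/me/notes"}
print(to_host_path("/data/org/work/todo.org", volume_map))  # /home/me/notes/work/todo.org
```

Checking the longest prefix first avoids a shorter mount such as /data shadowing a more specific one.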

Support adding Directory of org files

I use a hierarchical org setup like the one below. Khoj configuration currently requires selecting individual org files, rather than a directory whose contained org files are added recursively. Please consider adding this.

OrgDocuments/
  Personal/
    many org files
    roam/
      many org files
  Work/
    many org files
    roam/
      many org files
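As an illustration of the requested behaviour (not Khoj's implementation), Python's glob module can already expand a directory recursively with a `**` pattern, so a directory setting could plausibly be translated into such a pattern. A minimal sketch using a throwaway tree that mirrors the layout above; `org_files_in` is a hypothetical name:

```python
import glob
import os
import tempfile
from typing import List


def org_files_in(directory: str) -> List[str]:
    # "**" with recursive=True descends into every subdirectory (Personal/roam, Work/roam, ...).
    pattern = os.path.join(os.path.expanduser(directory), "**", "*.org")
    return sorted(glob.glob(pattern, recursive=True))


with tempfile.TemporaryDirectory() as root:
    for rel in ("Personal/a.org", "Personal/roam/b.org", "Work/roam/c.org", "Work/readme.md"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    found = [os.path.relpath(p, root) for p in org_files_in(root)]
print(found)
```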

Add Configuration Flag to Index Entries with Empty Body

  • Currently entries with only headings and no body are not indexed
  • Once we do start to index entries with no body, a filter to exclude such entries would be preferable
  • This will increase the flexibility of khoj to work for more use-cases

See the discussion on #83 for more details
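A minimal sketch of what such a flag could look like at indexing time. `Entry` is a simplified stand-in for a parsed org entry, and both `entries_to_index` and the `index_heading_only` flag name are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    heading: str
    body: str


def entries_to_index(entries: List[Entry], index_heading_only: bool = False) -> List[Entry]:
    # index_heading_only=False matches the current behaviour: skip entries with a blank body.
    if index_heading_only:
        return list(entries)
    return [e for e in entries if e.body.strip()]


entries = [Entry("* Buy milk", ""), Entry("* Meeting notes", "Discussed roadmap")]
print(len(entries_to_index(entries)))                           # 1
print(len(entries_to_index(entries, index_heading_only=True)))  # 2
```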

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2: invalid start byte for some org files

khoj fails to index a directory of org files, producing error:

Traceback (most recent call last):
  File "/home/slade/.local/bin/khoj", line 8, in <module>
    sys.exit(run())
  File "/home/slade/.local/lib/python3.10/site-packages/src/main.py", line 112, in run
    configure_server(args, required=False)
  File "/home/slade/.local/lib/python3.10/site-packages/src/configure.py", line 36, in configure_server
    state.model = configure_search(state.model, state.config, args.regenerate)
  File "/home/slade/.local/lib/python3.10/site-packages/src/configure.py", line 46, in configure_search
    model.orgmode_search = text_search.setup(
  File "/home/slade/.local/lib/python3.10/site-packages/src/search_type/text_search.py", line 173, in setup
    text_to_jsonl(config.input_files, config.input_filter, config.compressed_jsonl)
  File "/home/slade/.local/lib/python3.10/site-packages/src/processor/org_mode/org_to_jsonl.py", line 32, in org_to_jsonl
    entries, file_to_entries = extract_org_entries(org_files)
  File "/home/slade/.local/lib/python3.10/site-packages/src/processor/org_mode/org_to_jsonl.py", line 72, in extract_org_entries
    org_file_entries = orgnode.makelist(str(org_file))
  File "/home/slade/.local/lib/python3.10/site-packages/src/processor/org_mode/orgnode.py", line 83, in makelist
    for line in f:
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 676: invalid start byte

I'm not sure which files are triggering this error, but will experiment.
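To find which files trigger the error, each candidate file can be decoded as UTF-8 on its own. A minimal sketch; `find_non_utf8_files` is a hypothetical helper and the demo files are made up. Once found, such files can be converted to UTF-8, or read with `errors="replace"`; whether Khoj should do the latter is a design choice:

```python
import glob
import os
import tempfile
from typing import List, Tuple


def find_non_utf8_files(pattern: str) -> List[Tuple[str, int]]:
    # Report each file that fails to decode as UTF-8, with the offending byte offset.
    failures = []
    for path in sorted(glob.glob(os.path.expanduser(pattern))):
        with open(path, "rb") as f:
            data = f.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as err:
            failures.append((path, err.start))
    return failures


with tempfile.TemporaryDirectory() as notes:
    with open(os.path.join(notes, "good.org"), "w", encoding="utf-8") as f:
        f.write("* Café notes\n")
    with open(os.path.join(notes, "bad.org"), "wb") as f:
        # 0xa2 is the cent sign in Latin-1 but not a valid UTF-8 start byte.
        f.write(b"* Price \xa2 5\n")
    bad = [(os.path.basename(p), pos) for p, pos in find_non_utf8_files(os.path.join(notes, "*.org"))]
print(bad)  # [('bad.org', 8)]
```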

Failed to deploy

I ran the install command pip3 install khoj-assistant in the shell

and then got this error message:

 × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Error in sitecustomize; set PYTHONVERBOSE for traceback:
      AssertionError:
      Traceback (most recent call last):
        File "/opt/homebrew/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/opt/homebrew/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/opt/homebrew/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/setuptools/build_meta.py", line 177, in get_requires_for_build_wheel
          return self._get_build_requires(
        File "/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/setuptools/build_meta.py", line 159, in _get_build_requires
          self.run_setup()
        File "/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/setuptools/build_meta.py", line 174, in run_setup
          exec(code, locals())
        File "<string>", line 2, in <module>
      ModuleNotFoundError: No module named 'setuptools_rust'
      [end of output]

"ImportError: libGL.so.1" when launched with docker-compose

My environment is Ubuntu 22.04.1 LTS. I ran docker-compose up -d, and docker-compose logs shows:

Attaching to khoj_server_1
server_1  | Traceback (most recent call last):
server_1  |   File "/usr/local/bin/khoj", line 5, in <module>
server_1  |     from src.main import run
server_1  |   File "/usr/local/lib/python3.10/site-packages/src/main.py", line 12, in <module>
server_1  |     from PyQt6 import QtWidgets
server_1  | ImportError: libGL.so.1: cannot open shared object file: No such file or directory

http://0.0.0.0:8000/ is not accessible.
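The traceback shows src/main.py importing PyQt6 at module import time, so even a headless container needs libGL.so.1 just to start. One possible change is to defer the Qt import until a GUI is actually requested. The sketch below uses a hypothetical run entry point, not khoj's actual main.py; the --no-gui flag name is borrowed from the khoj invocation shown in a later issue on this page.

```python
import argparse
import sys
from typing import Optional


def run(argv: Optional[list[str]] = None) -> str:
    """Hypothetical entry point: import the Qt GUI only when it is actually needed."""
    parser = argparse.ArgumentParser(prog="khoj")
    parser.add_argument("--no-gui", action="store_true", help="run the server without the desktop GUI")
    args = parser.parse_args(argv)

    if args.no_gui:
        # Headless path (e.g. a Docker container): PyQt6, and with it libGL.so.1, is never loaded
        # ... start the server here (omitted) ...
        return "server"

    # GUI path: deferred import, so a missing libGL only affects users who want the GUI
    from PyQt6 import QtWidgets

    app = QtWidgets.QApplication(sys.argv[:1])
    # ... create the window, start the server, then call app.exec() (omitted) ...
    return "gui"
```

Another route, which leaves the code unchanged, is to install the system OpenGL library in the image; on Debian/Ubuntu-based images it is typically provided by the libgl1 package.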

Consider outline hierarchy when searching?

I am playing around with the search and comparing the results with my https://github.com/yantar92/org-ql/ setup.

One obvious gap is that khoj does not match on the outline path of a node.
Consider the following outline structure:

  * Topics
  ** Emacs
  *** debanjum [Github] debanjum/khoj: Natural Language Search Engine for your Org-Mode and Markdown notes, Beancount transactions and Photos

A natural query for the deepest node is "emacs org search engine", but khoj does not return it as a match.
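One way to address this is to index each heading together with its ancestor headings. A minimal sketch follows, using hypothetical helpers that are not khoj's API. khoj ranks results with semantic search rather than keyword matching, so the keyword check here only illustrates why including the outline path helps the example above.

```python
import re

# An org heading is one or more leading stars, whitespace, then a title
HEADING = re.compile(r"^(\*+)\s+(.*\S)\s*$")


def outline_paths(org_text: str) -> list[str]:
    """Return, for each heading, its ancestor headings and itself joined by ' / '."""
    stack: list[str] = []
    paths = []
    for line in org_text.splitlines():
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        # Drop headings at this level or deeper, then push the current heading
        del stack[level - 1:]
        stack.append(match.group(2))
        paths.append(" / ".join(stack))
    return paths


def matches_all_terms(query: str, text: str) -> bool:
    """Naive keyword check: every query term appears somewhere in text (case-insensitive)."""
    text = text.lower()
    return all(term in text for term in query.lower().split())
```

With the path "Topics / Emacs / debanjum [Github] ..." indexed, the term "emacs" now comes from the parent heading, so the query matches; the leaf heading on its own does not contain "emacs".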

Khoj Fails on Desktop App Startup with RuntimeError: Failed to import transformers.models.clip.processing_clip

Reproduction

  1. Install Khoj from debian file
    dpkg -i khoj_master_amd64.deb
  2. Create a khoj.yml with image search enabled and populated, starting from khoj_sample.yml
    cp config/khoj_sample.yml ~/.khoj.yml
    # now set `input-files` in `content-type > image` section of file. Delete the rest of the `content-type` sections
  3. Run Khoj
    /opt/Khoj -c=~/.khoj.yml --no-gui -vv
  4. It fails with the error below
    RuntimeError: Failed to import transformers.models.clip.processing_clip because of the following error (look up to see its traceback):
    [Errno 2] No such file or directory: '/tmp/_MEIDcQvKV/transformers/__init__.py'

Mitigation: Disable image search
That is, remove the image_search subsection from ~/.khoj.yml. The other subsections can be added back.
The app should then start and work normally.
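The app could also apply this mitigation automatically instead of failing at startup. A minimal sketch, assuming hypothetical helpers image_search_available and enabled_content_types (not khoj's actual startup code): it tries to import the CLIP processing module named in the error and, if that fails, logs a warning and skips image search while the other content types load normally.

```python
import importlib
import logging

logger = logging.getLogger("khoj.sketch")

# Module named in the RuntimeError above
CLIP_MODULE = "transformers.models.clip.processing_clip"


def image_search_available(module_name: str = CLIP_MODULE) -> bool:
    """Return True if the module image search depends on can be imported."""
    try:
        importlib.import_module(module_name)
    except (ImportError, RuntimeError, OSError) as err:
        # Log and degrade instead of crashing the whole app at startup
        logger.warning("Image search disabled: could not import %s (%s)", module_name, err)
        return False
    return True


def enabled_content_types(configured: list[str]) -> list[str]:
    """Drop 'image' from the configured content types when its dependencies are broken."""
    if "image" in configured and not image_search_available():
        return [ctype for ctype in configured if ctype != "image"]
    return configured
```

This only hides the symptom; the packaged app would still need the missing transformers files bundled for image search to work.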

Set Results Count from Web Interface

  • The backend API already supports setting the number of results to return via the n query param.
    E.g. http://<khoj-url>/search?&n=<results_count>
  • The user interfaces need to let users configure this value when searching
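A minimal sketch of how a client could build such a request, assuming a hypothetical build_search_url helper. The n parameter comes from the issue above; the q parameter name for the query text is an assumption, so check it against the running server's API.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, results_count: int) -> str:
    """Build a search URL that asks for results_count results via the n query param.

    Assumption: the query text is passed as the q param.
    """
    if results_count < 1:
        raise ValueError("results_count must be a positive integer")
    params = urlencode({"q": query, "n": results_count})
    return f"{base_url.rstrip('/')}/search?{params}"
```

A UI would read results_count from a number input and pass it through; urlencode takes care of escaping spaces and special characters in the query.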
