UPDATE: achoz is being rewritten in Rust. The rewrite is a work in progress; follow it in PR #42.

achoz

like a web search, but for your personal files. demo here

It normalizes all your documents so that they are easy to search later.

Story

cregox has a lot of data: files, emails, messages, web links, web content, etc. it also comes in different kinds: text, video, audio, apps, etc. when they try to find something they remember being there, it sometimes proves impossible! the goal of achoz is not only to make cregox's self-data-searching life easier, but to enable a new world of possibilities, in which they don’t have to worry anymore about how to store data for themselves (as long as it’s stored with open and free standards).

more details at http://ahoxus.org/achoz

Installation.

Linux (x86_64,aarch64)

Requirements.

Python 3.8+ and meilisearch

Make sure you use the same meilisearch version as achoz, because meilisearch databases are not compatible across versions. For that reason, achoz has an option to install meilisearch for you.

The following packages must be installed on your system. The instructions below are for Debian and Ubuntu; on other distributions, use your own package manager.

apt-get install python3-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext file

Termux

The Termux requirements are a bit different.

pkg install python3-dev libxml2 libxslt antiword poppler tesseract file

After that, use pip to install achoz.

pip install achoz

Meilisearch

Once you are done with the above, the achoz executable should be in your PATH. Now let's install meilisearch.

sudo achoz --install-meili

This downloads the meilisearch binary and installs it in /usr/local/bin/ (on Termux, $PREFIX/bin). Meilisearch can also be installed at a path of your choice; just make sure that path is covered by the $PATH environment variable.

achoz --install-meili path/to/dir
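
If you install meilisearch into a custom directory, it can be worth confirming that the binary is actually reachable through PATH. The following is a minimal sketch, not part of achoz: the function name find_meilisearch and its extra_dir parameter are hypothetical, and it only assumes that the installed binary is named meilisearch.

```python
import os
import shutil


def find_meilisearch(extra_dir=None):
    """Return the full path of the meilisearch binary, or None if not found."""
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        # Hypothetical helper argument: also look in the directory that was
        # passed to `achoz --install-meili`, ahead of the regular PATH.
        search_path = os.pathsep.join([os.path.expanduser(extra_dir), search_path])
    # shutil.which only returns files that exist and are executable.
    return shutil.which("meilisearch", path=search_path)
```

Calling find_meilisearch() with no argument checks the current PATH only; a result of None suggests the install directory still needs to be added to PATH.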

Usage

Quick start

achoz start -a ~/Documents

To add more directories, provide a comma-separated list of directories, like ~/Documents,~/music

The command above starts crawling all documents and files in the Documents directory and starts a web server on the default port, 8990. It also creates a config.json in ~/.achoz; you can add more options in the config file or on the command line.

Using the configuration file is the recommended way to run achoz.
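
After achoz start, one way to confirm that the web server came up is to test whether anything accepts connections on the web port. This is a small sketch under the assumption that the server listens on localhost at the default port 8990; is_port_open is a hypothetical helper, not an achoz function.

```python
import socket


def is_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_port_open(8990) should return True once the web server is running, and the same check can be pointed at the meilisearch API port.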

Configuration.

The config file at ~/.achoz/config.json is created automatically the first time you run achoz, with or without options.

Sample config file

{
    "dir_to_index": ["/home/kcubeterm/Documents","/home/kcubeterm/books"],
    "dir_to_ignore": ["/home/kcubeterm/secrets"],
    "extenstion_to_ignore": ["db","git","mp3","webm"],
    "file_to_ignore": [],
    "web_port": 8990,
    "meili_api_port": 8989,
    "data_dir": "/home/kcubeterm/.achoz",
    "priority": "low"
}
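
A config like the sample above can be sanity-checked before starting achoz. The sketch below is illustrative only: check_index_dirs is a hypothetical helper, and it assumes that dir_to_index entries should be absolute paths (optionally starting with ~) that exist on disk.

```python
import json
from pathlib import Path


def check_index_dirs(config_text):
    """Return a list of problems found in the dir_to_index entries."""
    config = json.loads(config_text)
    problems = []
    for entry in config.get("dir_to_index", []):
        # Expand a leading ~ the way a shell would before checking the path.
        path = Path(entry).expanduser()
        if not path.is_absolute():
            problems.append(f"{entry}: not an absolute path")
        elif not path.is_dir():
            problems.append(f"{entry}: directory does not exist")
    return problems
```

Passing the contents of ~/.achoz/config.json to check_index_dirs returns an empty list when every indexed directory looks usable.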

Config explained

dir_to_index: The list of directories you want to normalize (crawl, index, and make searchable). The command-line option -a dir1,dir2,dir3 does the same. Don't use any kind of pattern here (except '~'); use absolute paths.

dir_to_ignore: Show your regex skills here. Patterns can be used to ignore directories, or you can just give absolute paths if you don't need advanced patterns. Hidden directories are ignored by default. Any pattern you provide is matched against directories, not files; to ignore files, use file_to_ignore instead. Note: under the hood it uses re.match(), so make sure your patterns are compatible with Python's re.match().

extenstion_to_ignore: The extensions to ignore. No patterns, just extensions.

file_to_ignore: Any patterns compatible with Python's re.match(). These apply specifically to files.

web_port: The port the web server listens on. Default: 8990

meili_api_port: The port the backend Meilisearch API server listens on. Default: 8989

data_dir: The directory where the program keeps its metadata and database. Default: ~/.achoz

priority: (High or Low) Decides the priority of CPU time given to the achoz program. Default: low
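
Because dir_to_ignore and file_to_ignore rely on re.match(), patterns are anchored at the start of the string but not at the end. The sketch below illustrates that behaviour; is_ignored is a hypothetical helper, and it assumes patterns are tested against the full path, which may differ from what achoz does internally.

```python
import re


def is_ignored(path, patterns):
    """Return True if any pattern matches from the start of path."""
    # re.match only succeeds when the pattern matches at position 0,
    # but it does not require the pattern to consume the whole string.
    return any(re.match(pattern, path) for pattern in patterns)


# A bare name does not match, because the path starts with "/home/...".
print(is_ignored("/home/user/secrets", ["secrets"]))                   # False
# A leading ".*/" lets the pattern match anywhere in the path.
print(is_ignored("/home/user/secrets", [".*/secrets"]))                # True
# An absolute path also matches longer paths that share the prefix.
print(is_ignored("/home/user/secrets-old", ["/home/user/secrets"]))    # True
# Adding "$" restricts the match to exactly that path.
print(is_ignored("/home/user/secrets-old", ["/home/user/secrets$"]))   # False
```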

Command-line options

achoz -h lists all command-line options.

Technical issues and info

  • Meilisearch consumes a lot of RAM while indexing. If the system doesn't have enough RAM, Meilisearch may not function. Ensure you have at least 500 MB of available RAM.
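
On Linux, the available RAM can be checked before indexing by reading /proc/meminfo. This is a rough sketch, not something achoz does itself: available_ram_mb and enough_ram are hypothetical helpers, and the 500 MB threshold is simply the figure mentioned above.

```python
def available_ram_mb(meminfo_text):
    """Parse MemAvailable (reported in kB) from /proc/meminfo text.

    Returns the value in whole MB, or None if the field is missing.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    1048576 kB"
            return int(line.split()[1]) // 1024
    return None


def enough_ram(meminfo_text, minimum_mb=500):
    """Return True if the reported available RAM is at least minimum_mb."""
    available = available_ram_mb(meminfo_text)
    return available is not None and available >= minimum_mb
```

A typical call reads the file first, for example enough_ram(open("/proc/meminfo").read()); on systems without /proc/meminfo this approach does not apply.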

achoz's Issues

achoz engine fails

When I run achoz engine I get this...

I20220101 16:41:18.443842 2033983 typesense_server_utils.cpp:306] Starting Typesense 0.22.1
I20220101 16:41:18.443964 2033983 typesense_server_utils.cpp:309] Typesense is using jemalloc.
I20220101 16:41:18.444278 2033983 typesense_server_utils.cpp:358] Thread pool size: 32
I20220101 16:41:18.448292 2033983 store.h:61] Initializing DB by opening state dir: /home/jcm/.achoz/searchdb/db
I20220101 16:41:18.696844 2033983 store.h:61] Initializing DB by opening state dir: /home/jcm/.achoz/searchdb/meta
I20220101 16:41:18.911377 2033983 typesense_server_utils.cpp:437] Starting API service...
I20220101 16:41:18.911537 2034104 typesense_server_utils.cpp:209] Since no --nodes argument is provided, starting a single node Typesense cluster.
I20220101 16:41:18.911878 2033983 http_server.cpp:172] Typesense has started listening on port 8909
I20220101 16:41:18.911928 2034105 batched_indexer.cpp:120] Starting batch indexer with 32 threads.
I20220101 16:41:18.913426 2034105 batched_indexer.cpp:126] BatchedIndexer skip_index: -9999
E20220101 16:41:18.917851 2034104 server.cpp:955] Fail to listen 10.0.0.32:8107
E20220101 16:41:18.917901 2034104 typesense_server_utils.cpp:245] Failed to start peering service
report error

I'm not sure why it's listening on 10.0.0.32:8107. I believe it should be 192.168.0.32:8107. That could be the issue, but I'm not clear if Typesense or Achoz is issuing this.

Provide docker image.

Once I finish with meilisearch, a Docker image will be available for cross-platform use. Dockerised achoz will be easy to set up, with no hassle of installing any requirements.

Various language support

Hi, I have been using your achoz project for some time and I really enjoy it! However, I noticed that achoz does not provide any support for other languages' scripts, which I need for Arabic and Turkish. Do you have any recommendations on this issue? Is it related to file encoding when opening the documents, or to something else?

Thank you very much in advance

private versus public repository

why did you want to go private?

i'm all in favour of making it public.

we probably won't get much help, but since the intention is to be public, it's a good exercise.

such as remembering to always keep sensitive data out... and failing to do so, and learning how to go around issues that may arise from it. 🤣
