eladve / omnitool

2.0 2.0 0.0 41 KB

a tool for taking screenshots automatically in the background and then searching for words in them

License: MIT License

Python 100.00%

omnitool's Issues

Request for Improvement: Improve screenshot compression

Screenshot compression right now just uses the PIL.Image options that I found give a screenshot size of around 10 kilobytes. But I'm sure this can be much improved, to decrease size and/or improve the readability of the text.

One wild idea would be to recognize the text parts of the screen (using the bounding boxes produced by Tesseract), compress those parts with one algorithm (one that compresses on-screen text well), compress the other parts with another algorithm, and then paste the two compressions together (either when saving to the database, or by storing them separately and pasting them together only when recalling from the database). This is cool, but might be overengineering.

Ideas for Additional Features

Many features could be added to this. Some low-hanging fruit:

time tracker: recognize which application is running and track time usage, together with some metadata, and allow jumping to that point in time
use NLP to do named entity recognition and part-of-speech tagging, to add some metadata to the screenshots and maybe connect them to each other and to known entities. E.g. one could build an "automatic CRM" with this, which needs no integrations (!!!!).
integrate with information from the hard drive and from network traffic, to get more context on activities. (If we're upping the "spying", we could even use a keylogger, but that doesn't seem useful for anything.)
can do special treatment of various applications, e.g. the browser, various API integrations, etc. This goes against the "spirit" of the tool (which is that since we as users interact with the screen, the screen should have all the information we need, if we just process it well enough). But it's interesting.
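The time tracker idea can be sketched roughly as below, assuming each screenshot already comes with the name of the foreground application (how to obtain that name is OS-specific and not shown). The function name `time_per_application`, the sample format, and the `max_gap` cutoff are all made up for illustration.

```python
from collections import defaultdict


def time_per_application(samples, max_gap=60.0):
    """Sum the seconds attributed to each application.

    `samples` is a list of (timestamp_seconds, app_name) pairs, one per
    screenshot, sorted by time. Each interval between two consecutive
    samples is credited to the application seen at the start of it.
    Intervals longer than `max_gap` (e.g. the machine was asleep) are
    skipped. Both the format and the cutoff are assumptions.
    """
    totals = defaultdict(float)
    for (t0, app), (t1, _next_app) in zip(samples, samples[1:]):
        gap = t1 - t0
        if 0 < gap <= max_gap:
            totals[app] += gap
    return dict(totals)
```

Because the samples keep their timestamps, jumping to "that point in time" is just a lookup of the screenshot with the matching timestamp.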

Request for Improvement: Productize this

productize for normal user consumption:

make an installer
get rid of dependencies, bundle with Tesseract
make the capture script run at startup
make a decent taskbar app for the capture script, with a GUI; make the directory structure configurable
use a decent database rather than what I do now (using Python shelves and putting all screenshots in a directory)

Basically make it as easy to use as Screenotate https://screenotate.com/
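As one possible reading of "a decent database", here is a minimal sketch that uses SQLite (from the Python standard library) to hold both the screenshots and the word index. The table names, column names, and helper functions are invented for this example and are not taken from the omnitool code.

```python
import sqlite3

# Hypothetical schema: one row per screenshot, one row per (word, screenshot).
SCHEMA = """
CREATE TABLE IF NOT EXISTS screenshots (
    id        INTEGER PRIMARY KEY,
    taken_at  REAL NOT NULL,
    image     BLOB NOT NULL
);
CREATE TABLE IF NOT EXISTS postings (
    word           TEXT NOT NULL,
    screenshot_id  INTEGER NOT NULL REFERENCES screenshots(id)
);
CREATE INDEX IF NOT EXISTS postings_word ON postings(word);
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_screenshot(conn, taken_at, image_bytes, words):
    """Store one screenshot and index the words found in it."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO screenshots (taken_at, image) VALUES (?, ?)",
            (taken_at, image_bytes),
        )
        shot_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO postings (word, screenshot_id) VALUES (?, ?)",
            [(w, shot_id) for w in set(words)],
        )
    return shot_id


def search_word(conn, word):
    """Return (id, taken_at) for every screenshot containing `word`."""
    rows = conn.execute(
        "SELECT s.id, s.taken_at FROM postings p "
        "JOIN screenshots s ON s.id = p.screenshot_id "
        "WHERE p.word = ? ORDER BY s.taken_at",
        (word,),
    )
    return rows.fetchall()
```

A single database file would also make the storage location easy to configure, since there is only one path to move around.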

Request for Improvements: improve search functionality!

Some inspiration for the below can come from web search (e.g. Google), as well as from Memex (https://getmemex.com/).

Track 1: Improve search logic:
Right now I have only implemented single-word search. To improve this, add:

multiword search, exact-phrase search, etc. (a la Google, Memex, ...). This can be done by deploying an existing solution (either a Python library or a database), or it can be implemented from scratch. Shouldn't be too hard. E.g. to look for two words, just take the hit list of each of them and intersect the lists.
better prioritization/ranking of results. (Remember that screenshots overlap, so there might be many hits that are very close in time. Need to prioritize so that only one of those is chosen for the "first page", maybe by clustering the timestamps with something like k-means.) Search result ranking is a common problem, so there might be some open-source library that would be a good place to start.
maybe add metadata (e.g. which application is used, etc.) and allow searching on this as well.
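The hit-list intersection and the "one hit per burst" idea from the list above could look roughly like this. It assumes the inverted index is a plain mapping from word to screenshot timestamps. The clustering here is a simple gap-based grouping rather than k-means, chosen only because it needs no cluster count up front. All names and the 30-second gap are illustrative.

```python
def search_all_words(index, words):
    """Return sorted timestamps of screenshots that contain every word.

    `index` maps a word to an iterable of screenshot timestamps
    (an assumed layout, not necessarily what omnitool stores).
    """
    hit_sets = [set(index.get(w, ())) for w in words]
    if not hit_sets:
        return []
    return sorted(set.intersection(*hit_sets))


def collapse_bursts(timestamps, max_gap=30.0):
    """Keep one representative per run of hits that are close in time.

    A new group starts whenever two consecutive hits are more than
    `max_gap` seconds apart; the first timestamp of each group is kept.
    This is a gap-based stand-in for the k-means idea, not k-means itself.
    """
    representatives = []
    previous = None
    for t in sorted(timestamps):
        if previous is None or t - previous > max_gap:
            representatives.append(t)
        previous = t
    return representatives
```

Usage would be something like `collapse_bursts(search_all_words(index, ["foo", "bar"]))`, which gives the candidates for the "first page". Exact-phrase search needs word positions as well, which this sketch does not track.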

Track 2: Better NLP/tokenization/etc:

Right now the text is not tokenized properly, and that hurts the quality of the inverted index. For example, words with dashes are not split, special characters are not removed properly, and in general tokens are not recognized in the way that would best support a future search. Should fix that. Should be easy to get a 10x improvement by using NLP tools like NLTK (https://www.nltk.org/).
can do even more NLP to enrich with metadata: named entity recognition, etc. (Each named entity that is recognized, each email that is recognized, etc. can just be added to the inverted index, and that will allow future search.)
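Before reaching for NLTK, the dash and special-character problems described above can be illustrated with a small regex tokenizer from the standard library. This is only a sketch of the idea, not the tokenizer omnitool uses. The function names and the `ocr_texts` layout (screenshot id mapped to OCR text) are assumptions.

```python
import re
from collections import defaultdict

# Runs of letters and digits; dashes, underscores and punctuation act
# as separators, so "state-of-the-art" yields four tokens.
TOKEN_RE = re.compile(r"[^\W_]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(ocr_texts):
    """Build an inverted index: token -> set of screenshot ids.

    `ocr_texts` maps a screenshot id to the OCR text of that screenshot.
    """
    index = defaultdict(set)
    for shot_id, text in ocr_texts.items():
        for token in tokenize(text):
            index[token].add(shot_id)
    return dict(index)
```

The query string should go through the same `tokenize` function, so that what is searched for matches what was indexed.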

Track 3: GUI and interactivity:

Once the query is executed and properly ranked search results are returned, the user still needs to get value by scrolling through the hits, examining them, maybe refining the search, looking at the OCR results, and in general looking for the hit that they wanted. Need to improve this via GUI and interactivity: allow scrolling through the results, picking some, magnifying them, scrolling through the timeline, etc.

Annoying little problems

OCR on the screen can only take you so far. For example, if words are broken between lines, I think OCR won't recognize this. So if the query term is split between two lines, it will not be found by omnitool (but it will be found if you e.g. index the website content). The way to solve this is either to add a specific integration with each app (e.g. with the browser) in order to get the original text, or alternatively to use an OCR that can recognize that a word is broken between two lines and put the parts back together. (I dunno, maybe Tesseract has a setting for this, or already does this by default.) A similar issue exists for exact-phrase search.
need to support taking screenshots from all monitors
need to support Linux as well as Windows
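For the broken-word problem in the first item above, a cheap workaround that needs no app integration is to post-process the OCR text and glue together words that were hyphenated at a line break. The sketch below assumes the OCR output keeps the hyphen and the newline. The function name is made up, and whether Tesseract already handles this case is not something this sketch checks.

```python
import re

# A hyphen that directly follows a word character, ends the line, and
# is followed by a word character on the next line (leading spaces ok).
LINE_BREAK_HYPHEN = re.compile(r"(?<=\w)-[ \t]*\n[ \t]*(?=\w)")


def rejoin_hyphenated(ocr_text):
    """Join words that were split with a hyphen across two lines."""
    return LINE_BREAK_HYPHEN.sub("", ocr_text)
```

The trade-off is that a genuinely hyphenated compound that happens to break at the end of a line also loses its hyphen, so indexing both the joined and the unjoined forms may be the safer choice.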

Request for Improvement: add unicode support and support of other languages

unicode support: right now the code relies heavily on Python ASCII strings. This is hard to change because of the shelf object, but it would be good to port this to Unicode.
I think that right now English is preferred, both due to the use of ASCII and the Latin character set, and due to Tesseract's default language option. It would be nice to support other languages more strongly.
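A minimal sketch of what Unicode-safe index keys might look like, assuming the code runs on Python 3, where `shelve` accepts `str` keys and encodes them as UTF-8. The normalization step (NFKC plus case folding) is a suggestion rather than something the current code does, and both function names are hypothetical.

```python
import shelve
import tempfile
import unicodedata
from pathlib import Path


def normalize_token(token):
    """Normalize a token so that equivalent spellings share one key.

    NFKC folds compatibility forms (e.g. full-width letters, combining
    accents) and casefold() is a Unicode-aware lowercase.
    """
    return unicodedata.normalize("NFKC", token).casefold()


def roundtrip(tokens):
    """Store tokens under normalized keys in a shelf and read them back.

    Uses a temporary directory so the demo leaves no files behind.
    Returns a dict: normalized token -> list of positions in `tokens`.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "index")
        with shelve.open(path) as shelf:
            for position, token in enumerate(tokens):
                key = normalize_token(token)
                shelf[key] = shelf.get(key, []) + [position]
        with shelve.open(path) as shelf:
            return dict(shelf)
```

OCR quality for other languages is a separate matter and depends on which language data Tesseract is given.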
