
omnitool's People

Contributors

eladve

omnitool's Issues

Request for Improvement: Improve screenshot compression

Screenshot compression currently just uses the PIL.Image options that I found produce a screenshot of around 10 kilobytes. I'm sure this can be improved a lot, to decrease the size and/or improve the readability of the text.

One wild idea would be to recognize the text regions of the screen (using the bounding boxes produced by Tesseract), compress those regions with one algorithm (one that compresses on-screen text well) and the remaining regions with another algorithm, and then paste the two results together, either when saving to the database or, if they are stored separately, only when recalling them from the database. This is cool, but it might be overengineering.
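A minimal sketch of the split-compression idea. It assumes a grayscale screenshot given as raw 8-bit pixels, plus word boxes in a hypothetical (left, top, width, height) form like the ones Tesseract reports. Two stand-ins are used. zlib (lossless) plays the "text" algorithm. Gray-level quantization followed by zlib plays the lossy "background" algorithm. A real version would more likely use PIL to write PNG for the text crops and JPEG or WebP for the rest. All helper names here are illustrative only.

```python
import zlib
from typing import Dict, List, Tuple

# Hypothetical box format (left, top, width, height), mirroring the fields
# Tesseract reports for each recognized word.
Box = Tuple[int, int, int, int]


def quantize(value: int, levels: int = 16) -> int:
    """Lossy step: snap an 8-bit gray value to the centre of one of `levels` bins."""
    step = 256 // levels
    return min(255, (value // step) * step + step // 2)


def encode(pixels: bytes, width: int, height: int, boxes: List[Box], levels: int = 16) -> Dict:
    """Compress text boxes losslessly and the rest of the screen lossily."""
    assert len(pixels) == width * height, "expected one byte per pixel"
    background = bytearray(quantize(p, levels) for p in pixels)
    text_parts = []
    for left, top, w, h in boxes:
        rows = []
        for r in range(h):
            start = (top + r) * width + left
            rows.append(pixels[start:start + w])
            # Blank the text area in the background so that stream compresses better.
            background[start:start + w] = bytes(w)
        text_parts.append(((left, top, w, h), zlib.compress(b"".join(rows), 9)))
    return {
        "size": (width, height),
        "text": text_parts,
        "background": zlib.compress(bytes(background), 9),
    }


def decode(blob: Dict) -> bytes:
    """Paste the lossless text crops back over the lossy background."""
    width, _height = blob["size"]
    out = bytearray(zlib.decompress(blob["background"]))
    for (left, top, w, h), data in blob["text"]:
        crop = zlib.decompress(data)
        for r in range(h):
            start = (top + r) * width + left
            out[start:start + w] = crop[r * w:(r + 1) * w]
    return bytes(out)
```

The two parts are stored separately, and the paste happens only in `decode()`. That corresponds to the "paste together only while recalling from the database" option.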

Ideas for Additional Features

Many features can be added. Some low-hanging fruit:

  • time tracker: recognize which application is running, track time usage together with some metadata, and allow jumping to that point in time
  • use NLP to do named entity recognition and part-of-speech tagging, to add metadata to the screenshots and possibly link them to each other and to known entities. For example, this could power an "automatic CRM" that needs no integrations (!!!!).
  • integrate with information from the hard drive and from network traffic, to get more context on activities. (If we're upping the "spying", we could even use a keylogger, but that doesn't seem useful for anything.)
  • give special treatment to particular applications, e.g. the browser, various API integrations, etc. This goes against the "spirit" of the tool (which is that since we as users interact with the screen, the screen should contain all the information we need, if we just process it well enough), but it's interesting.
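For the time tracker, here is a sketch of the aggregation step. It assumes the capture script already records a hypothetical (timestamp, application name) pair with each screenshot. How the foreground application is detected is platform-specific and not shown. The time between consecutive captures is credited to the application seen at the earlier capture. Gaps longer than an assumed threshold are capped, so idle time is not counted in full.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def time_per_app(samples: List[Tuple[float, str]], max_gap: float = 60.0) -> Dict[str, float]:
    """Sum seconds of usage per application from periodic capture samples.

    `samples` holds (unix_timestamp, app_name) pairs, one per screenshot.
    Each interval between consecutive samples is credited to the earlier
    sample's app, capped at `max_gap` seconds to treat long gaps as idle.
    """
    totals: Dict[str, float] = defaultdict(float)
    ordered = sorted(samples)
    for (t0, app), (t1, _next_app) in zip(ordered, ordered[1:]):
        totals[app] += min(t1 - t0, max_gap)
    return dict(totals)
```

The same (timestamp, app) pairs can double as the "jump to that point in time" index, since each one points at a specific screenshot.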

Request for Improvement: Productize this

Productize for normal user consumption:

  • make an installer
  • get rid of dependencies; bundle with Tesseract
  • make the capture script run at startup
  • make a decent taskbar app with a GUI for the capture script, and make the directory structure configurable
  • use a proper database instead of the current approach (Python shelves, with all screenshots in one directory)

Basically make it as easy to use as Screenotate https://screenotate.com/
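As one possible replacement for shelves, here is a sketch using SQLite through Python's built-in sqlite3 module. It has one table of screenshots and one table of (term, screenshot) postings, which together form the inverted index. The schema, table names and file paths are assumptions for illustration. SQLite's FTS5 extension could also provide multiword and phrase search, if the SQLite build in use includes it.

```python
import sqlite3
from typing import Iterable, List

SCHEMA = """
CREATE TABLE IF NOT EXISTS screenshots (
    id INTEGER PRIMARY KEY,
    taken_at REAL NOT NULL,   -- unix timestamp of the capture
    path TEXT NOT NULL        -- where the compressed image file lives
);
CREATE TABLE IF NOT EXISTS postings (
    term TEXT NOT NULL,
    screenshot_id INTEGER NOT NULL REFERENCES screenshots(id),
    PRIMARY KEY (term, screenshot_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_screenshot(conn: sqlite3.Connection, taken_at: float, path: str,
                   terms: Iterable[str]) -> int:
    """Insert one screenshot and its index terms in a single transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO screenshots (taken_at, path) VALUES (?, ?)", (taken_at, path))
        shot_id = cur.lastrowid
        conn.executemany(
            "INSERT OR IGNORE INTO postings (term, screenshot_id) VALUES (?, ?)",
            [(t, shot_id) for t in set(terms)],
        )
    return shot_id


def lookup(conn: sqlite3.Connection, term: str) -> List[int]:
    """Screenshot ids containing `term`, oldest first."""
    rows = conn.execute(
        "SELECT p.screenshot_id FROM postings p "
        "JOIN screenshots s ON s.id = p.screenshot_id "
        "WHERE p.term = ? ORDER BY s.taken_at",
        (term,),
    )
    return [row[0] for row in rows]
```

The images themselves can stay on disk. Only their paths and index terms move into the database, which keeps the database file small.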

Request for Improvements: improve search functionality!

Some inspiration for the ideas below can come from web search (e.g. Google), as well as from Memex (https://getmemex.com/).

Track 1: Improve search logic:
Right now only single-word search is implemented. To improve this, add:

  • multiword search, exact-phrase search, etc. (à la Google, Memex, ...). This can be done by deploying an existing solution (either a Python library or a database), or implemented from scratch, which shouldn't be too hard: e.g. to look for two words, take the hit list of each and intersect the lists.
  • better prioritization/ranking of results. Screenshots overlap, so there may be many hits that are very close in time. These need to be prioritized so that only one of them is chosen for the "first page", perhaps by clustering the timestamps (something like k-means). Search result ranking is a common problem, so an open-source library might be a good place to start.
  • maybe add metadata (e.g. which application is in use) and allow searching on it as well.
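Here is a sketch of two Track 1 ideas over an in-memory inverted index, assumed here to be a dict mapping each term to a set of screenshot ids. The first is an AND query that intersects the hit lists. The second is a de-duplication step for near-simultaneous hits. Instead of k-means, it swaps in a simpler gap-based grouping of timestamps, which needs no preset number of clusters. The 30-second gap is an arbitrary assumption.

```python
from typing import Dict, List, Optional, Set


def search_all(index: Dict[str, Set[int]], terms: List[str]) -> Set[int]:
    """AND query: screenshots containing every term (intersection of hit lists)."""
    if not terms:
        return set()
    hits = [index.get(t, set()) for t in terms]
    hits.sort(key=len)  # start from the rarest term to keep intermediates small
    result = set(hits[0])
    for h in hits[1:]:
        result &= h
    return result


def collapse_by_time(hits: Set[int], timestamps: Dict[int, float],
                     gap: float = 30.0) -> List[int]:
    """Keep one representative per burst of near-simultaneous hits.

    A hit less than `gap` seconds after the previous hit is assumed to be an
    overlapping screenshot of the same screen; the newest hit in each burst
    is kept. Results come back in chronological order.
    """
    ordered = sorted(hits, key=lambda h: timestamps[h])
    representatives: List[int] = []
    prev_time: Optional[float] = None
    for h in ordered:
        t = timestamps[h]
        if prev_time is not None and t - prev_time < gap:
            representatives[-1] = h  # same burst: replace with the newer screenshot
        else:
            representatives.append(h)
        prev_time = t
    return representatives
```

For example, with hits at 0 s, 10 s and 500 s, the first two collapse into one result, so the "first page" shows two distinct moments instead of three.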

Track 2: Better NLP/tokenization/etc.:

  • right now the text is not tokenized properly, which hurts the quality of the inverted index. For example, words with dashes are not split, special characters are not removed properly, and in general tokens are not extracted in a way that makes them useful for future searches. This should be fixed; it should be easy to get a 10x improvement by using NLP tools like NLTK https://www.nltk.org/
  • more NLP can enrich the screenshots with metadata: named entity recognition, etc. Each recognized named entity, email address, etc. can simply be added to the inverted index, which makes it searchable later.
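Before reaching for NLTK, a regex stand-in shows what better tokenization could look like. It splits on any non-alphanumeric character, so dashes split words and special characters drop out. It also case-folds every token and keeps email addresses whole as extra metadata terms. This is a sketch, not NLTK's tokenizer, and the email pattern is deliberately loose.

```python
import re
from typing import List

# Loose email pattern (an assumption, not RFC-complete).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Runs of letters/digits; punctuation, dashes and underscores act as separators.
WORD_RE = re.compile(r"[^\W_]+")


def tokenize(text: str) -> List[str]:
    """Turn OCR text into index terms.

    Emails are emitted whole (as metadata terms) before the ordinary word
    tokens; every token is case-folded so searches are case-insensitive.
    """
    terms = [m.group(0).casefold() for m in EMAIL_RE.finditer(text)]
    terms += [w.casefold() for w in WORD_RE.findall(text)]
    return terms
```

An email then shows up both as a whole term and as its pieces. A search for the full address or for just the name will both hit.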

Track 3: GUI and interactivity:

  • once the query is executed and properly ranked search results are returned, the user still needs to get value from them: scrolling through the hits, examining them, perhaps refining the search, looking at the OCR results, and in general hunting for the hit they wanted. This should be improved via GUI and interactivity: scrolling through the results, picking some, magnifying them, scrolling through the timeline, etc.

Annoying little problems

  • OCR on the screen can only take you so far. For example, if a word is broken across two lines, I think OCR won't recognize it, so a query term split between two lines will not be found by omnitool (whereas it would be found if, e.g., the website content were indexed). The fix is either to add a dedicated integration with each app (e.g. the browser) to get the original text, or to use an OCR engine that recognizes words broken across lines and joins them back together (maybe Tesseract has a setting for this, or already does it by default). A similar issue exists for exact-phrase search.
  • need to support taking screenshots from all monitors
  • need to support Linux as well as Windows
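If the OCR engine doesn't rejoin broken words itself, a post-processing pass over the OCR text could do it. This sketch assumes the OCR output keeps line breaks as "\n". It joins "word-\npart" pairs, then turns the remaining line breaks into spaces, which also lets exact-phrase search match across lines. One caveat: a genuinely hyphenated word broken at its hyphen ("follow-\nup") is joined as "followup".

```python
import re

# A word split as "imple-" at the end of one OCR line and "mentation" at the
# start of the next: word char, hyphen, optional spaces, line break, word char.
BROKEN_WORD_RE = re.compile(r"(\w)-[ \t]*\n[ \t]*(\w)")
# Any line break together with surrounding whitespace.
LINE_BREAK_RE = re.compile(r"\s*\n\s*")


def rejoin_lines(ocr_text: str) -> str:
    """Undo line-break hyphenation, then flatten remaining line breaks to spaces."""
    text = BROKEN_WORD_RE.sub(r"\1\2", ocr_text)
    return LINE_BREAK_RE.sub(" ", text)
```

Running this before tokenization means the inverted index sees "implementation" rather than "imple" and "mentation".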

Request for Improvement: add Unicode support and support for other languages

  • Unicode support: right now the code relies heavily on Python ASCII strings. This is hard to change because of the shelf object, but it would be good to port it to Unicode.

  • I think English is currently favored, both because of ASCII and the Latin character set and because of Tesseract's default language option. It would be nice to support other languages better.
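Assuming a port to Python 3, shelve keys are ordinary str values, encoded as UTF-8 by default, so non-ASCII terms can be stored directly. Below is a sketch of a normalization step for index keys. NFKC folds compatibility characters such as the "ﬁ" ligature that OCR sometimes emits. casefold() handles cases like "ß" becoming "ss". The function names are hypothetical.

```python
import shelve
import unicodedata
from typing import Iterable


def normalize_term(term: str) -> str:
    """Canonical index key: NFKC compatibility folding, then case folding."""
    return unicodedata.normalize("NFKC", term).casefold()


def add_terms(db_path: str, words: Iterable[str], screenshot_id: int) -> None:
    """Append `screenshot_id` to the posting list of each normalized word."""
    with shelve.open(db_path) as db:
        for word in words:
            key = normalize_term(word)
            # Reassign the whole list: without writeback=True, mutating
            # db[key] in place would not be saved.
            db[key] = db.get(key, []) + [screenshot_id]
```

Normalizing at both index time and query time is what makes "STRASSE", "Straße" and "strasse" land on the same key.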
