
hpi's Introduction

TLDR: I'm using the HPI (Human Programming Interface) package as a means of unifying, accessing, and interacting with all of my personal data.

It's a Python library (named my), a collection of modules for:

  • social networks: posts, comments, favorites, searches
  • shell/program histories (zsh, bash, python, mpv, firefox)
  • programming (github/commits)
  • instant messaging
  • media histories (movies, TV shows, music, video game achievements/history); see https://sean.fish/feed/

Why?

This is built on top of karlicoss/HPI. It started out as a fork, but has since been converted to my own set of modules. This is installed alongside the upstream repository (meaning you can use modules from both upstream and here simultaneously); see #install

My Modules

  • my.zsh and my.bash, access to my shell history w/ timestamps
  • my.mail.imap and my.mail.mbox to parse local IMAP syncs of my mail/mbox files -- see doc/MAIL_SETUP.md
  • my.mpv.history_daemon, accesses movies/music that have played on my machine, w/ activity/metadata, facilitated by a mpv history daemon
  • my.discord.data_export, parses ~1,000,000 messages/events from the discord data export, parser here
  • my.todotxt.active to parse my current todo.txt file; my.todotxt.git_history tracks my history using backups of those files in git_doc_history
  • my.rss.newsboat, keeps track of when I added/removed RSS feeds (for newsboat)
  • my.ipython, for timestamped python REPL history
  • my.ttt, to parse shell/system history tracked by ttt
  • my.activitywatch.active_window, to parse active window events (what application I'm using/what the window title is) using window_watcher and activitywatch on android
  • my.chess.export, to track my chess.com/lichess.org games, using chess_export
  • my.trakt.export, providing a history of (and my ratings for) movies/TV show episodes using traktexport
  • my.listenbrainz.export, exporting my music listening history from ListenBrainz (open-source Last.fm) using listenbrainz_export
  • my.offline.listens, for offline music listen history, using offline_listens
  • my.mal.export, for anime/manga history using malexport
  • my.grouvee.export, for my video game history/backlog using grouvee_export
  • my.runelite.screenshots, parses data from the automatic runelite screenshots
  • my.minecraft.advancements, parses advancements (local achievement data) from the ~/.minecraft directory
  • my.project_euler, when I solved Project Euler problems
  • my.linkedin.privacy_export, to parse the privacy export from linkedin
  • my.scramble.history for merged (timed) rubiks cube solves from multiple sources, using scramble_history

'Historical' Modules

These are modules to parse GDPR exports/data from services I used to use, but don't anymore. They're here to provide more context into the past.

  • my.apple.privacy_export, parses Game Center and location data from the apple privacy export
  • my.facebook.gdpr, to parse the GDPR export from Facebook
  • my.league.export, gives League of Legends game history using lolexport
  • my.steam.scraper, for steam achievement data and game playtime using steamscraper
  • my.piazza.scraper, parsing piazza (university forum) posts using piazza-scraper
  • my.blizzard.gdpr, for general battle.net event data parsed from a GDPR export
  • my.skype.gdpr to parse a couple datetimes from the Skype GDPR export (seems all my data from years ago is long gone)
  • my.spotify.gdpr, to parse the GDPR export from Spotify, mostly to access songs from my playlists from years ago
  • my.twitch, merging the data export and my messages parsed from the overrustle logs dump

See here for my HPI config

Promnesia Sources for these HPI modules

I also have some more personal scripts/modules in a separate repo: HPI-personal

In-use from karlicoss/HPI

  • my.browser, to parse browser history using browserexport
  • my.google.takeout.parser, parses lots of (~500,000) events (youtube, searches, phone usage, comments, location history) from google takeouts, using google_takeout_parser
  • my.coding.commits to track git commits across the system
  • my.github to track github events/commits and parse the GDPR export, using ghexport
  • my.reddit, get saved posts, comments. Uses rexport to create backups of recent activity periodically, and pushshift to get old comments.
  • my.smscalls, exports call/sms history using SMS Backup & Restore
  • my.stackexchange.stexport, for stackexchange data using stexport

Partially in-use/with overrides:

  • my.location, though since I also have some locations from apple.privacy_export, I have a my.location.apple which I then merge into my.location.all in an overridden all.py file in my personal repo
  • similarly, I do use my.ip and my.location.via_ip from upstream, but I have overridden all.py and module files here

'Overriding' an all.py file means replacing the all.py from the upstream repo (this means it can use my sources here to grab more locations/IPs, since those don't exist upstream). For more info see reorder_editable and the module design docs for HPI, but you might be able to get the gist by comparing the upstream all.py with my overridden version.

Since I've mangled my PYTHONPATH (see reorder_editable), it imports from my repo instead of karlicoss/HPI. all.py files tend to be pretty small -- so overriding/changing a line to add a source is the whole point.
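
As a rough sketch, an overridden all.py might look something like this (using the import_source helper from upstream my.core; the specific function names here are illustrative):

# my/location/all.py -- shadows the file of the same name upstream
from typing import Iterator

from my.core.source import import_source
from my.location.common import Location


# import_source swallows the error if the underlying module
# isn't installed/configured, so each source stays optional
@import_source(module_name="my.location.apple")
def _apple_locations() -> Iterator[Location]:
    from my.location.apple import locations
    yield from locations()


def locations() -> Iterator[Location]:
    # the upstream sources would be chained here as well
    yield from _apple_locations()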

Companion Tools/Libraries

Disregarding tools which actively collect data (like ttt/window_watcher) or repositories which have their own exporter/parsers which are used here, there are a couple other tools/libraries I've created for this project:

  • ipgeocache - for any IPs gathered from data exports, provides geolocation info, so I have partial location info going back to 2013
  • sqlite_backup - to safely copy/backup application sqlite databases that may currently be in use (see the sketch after this list)
  • git_doc_history - a bash script to copy/backup files into git history, with a python library to help traverse and create a history/parse diffs between commits
  • HPI_API - automatically creates a JSON API/server for HPI modules
  • url_metadata - caches youtube subtitles, url metadata (title, description, image links), and a html/plaintext summary for any URL
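
For example, a minimal sketch of using sqlite_backup from python (I believe the top-level function is sqlite_backup(source, destination), but check that repo for the exact signature):

from os.path import expanduser

from sqlite_backup import sqlite_backup

# copy a (possibly in-use) firefox database somewhere safe; this uses
# sqlite's backup machinery rather than a raw file copy, so a mid-write
# snapshot can't corrupt the result
src = expanduser("~/.mozilla/firefox/profile/places.sqlite")  # hypothetical path
sqlite_backup(src, "/tmp/places_backup.sqlite")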

I also use this in my_feed, which creates a feed of media/data using HPI, live at https://sean.fish/feed/

Ad-hoc and interactive

Some basic examples.

When was I most using reddit?

>>> import collections, my.reddit.all, pprint
>>> pprint.pprint(collections.Counter([c.created.year for c in my.reddit.all.comments()]))
Counter({2016: 3288,
         2017: 801,
         2015: 523,
         2018: 209,
         2019: 65,
         2014: 4,
         2020: 3})

Most common shell commands?

>>> import collections, pprint, my.zsh
# lots of these are git-related aliases
>>> pprint.pprint(collections.Counter([c.command for c in my.zsh.history()]).most_common(10))
[('ls', 51059),
 ('gst', 11361),
 ('ranger', 6530),
 ('yst', 4630),
 ('gds', 3919),
 ('ec', 3808),
 ('clear', 3651),
 ('cd', 2111),
 ('yds', 1647),
 ('ga -A', 1333)]

What websites do I visit most?

>>> import collections, pprint, my.browser.export, urllib.parse
>>> pprint.pprint(collections.Counter([urllib.parse.urlparse(h.url).netloc for h in my.browser.export.history()]).most_common(5))
[('github.com', 20953),
 ('duckduckgo.com', 10146),
 ('www.youtube.com', 10126),
 ('discord.com', 8425),
 ('stackoverflow.com', 2906)]

Song I've listened to most?

>>> import collections, my.mpv.history_daemon
>>> collections.Counter([m.path for m in my.mpv.history_daemon.history()]).most_common(1)[0][0]
'/home/sean/Music/JPEFMAFIA/JPEGMAFIA - LP! - 2021 - V0/JPEGMAFIA - LP! - 05 HAZARD DUTY PAY!.mp3'

Movie I've watched most?

>>> import my.trakt
>>> from collections import Counter
>>> Counter(e.media_data.title for e in my.trakt.history()).most_common(1)
[('Up', 92)]  # (the pixar movie)

hpi also has a JSON query interface, so I can do quick computations using shell tools like:

# how many calories have I eaten today (from https://github.com/seanbreckenridge/ttally)
$ hpi query ttally.__main__.food --recent 1d -s | jq -r '(.quantity)*(.calories)' | datamash sum 1
2258.5

Install

For the basic setup, I recommend you clone and install both directories as editable installs:

# clone and install upstream as an editable package
git clone https://github.com/karlicoss/HPI ./HPI-karlicoss
python3 -m pip install --user -e ./HPI-karlicoss

# clone and install my repository as an editable package
git clone https://github.com/seanbreckenridge/HPI ./HPI-seanb
python3 -m pip install --user -e ./HPI-seanb

Editable install means any changes to python files are reflected immediately, which is very convenient for debugging and developing new modules. To update, you can just git pull in those directories.

If you care about overriding modules, make sure your easy-install.pth is ordered correctly:

python3 -m pip install --user reorder_editable
python3 -m reorder_editable reorder ./HPI-seanb ./HPI-karlicoss

Then, you likely need to run hpi module install for any modules you plan on using -- this can be done incrementally as you set up new modules. E.g.:

  • hpi module install my.trakt.export to install dependencies
  • Check the stub config or my config and set up the config block in your HPI configuration file (see the example sketch below)
  • Run hpi doctor my.trakt.export to check for any possible config issues/if your data is being loaded properly

(The install script does that for all my modules, but you likely don't want to do that)
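
As an illustration of the config step, a block for my.trakt.export might look something like this (the attribute name is an assumption on my part -- the stub config linked above is the authoritative reference):

# ~/.config/my/my/config/__init__.py (the HPI config file)
class trakt:
    class export:
        # assumption: a glob matching wherever you saved the traktexport data
        export_path = "~/data/trakt/*.json"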

It's possible to install both packages because HPI is a namespace package. For more information on that, and some of the complications one can run into, see reorder_editable, and the module design docs for HPI.

If you're having issues installing/re-installing, check the TROUBLESHOOTING_INSTALLS.md file.

If you recently updated and it seems like something broke, check the CHANGELOG for any possible breaking changes.

hpi's People

Contributors

alaq, jhermann, karlicoss, mreishus, seanbreckenridge, thetomcraig


hpi's Issues

modularize my.location and remove tz.time.via_location

Can put the individual sources for location behind import_source blocks, so that people can use one of the location sources here without having to mess with the files to prevent certain imports

Then, use that location.all in tz.time.via_location -- can remove the file from here (since it's currently overridden due to editable rules), and PR the location.all to the main repo

add my.mail.mbox module: to parse mbox files

See discussion in #15

low priority but would be nice to add

The addon mentioned for Thunderbird (ImportExportTools) now supports periodic backups, but only in MBOX format.

I've seen this solution to parse MBOX files: https://github.com/chronicle-app/chronicle-email/blob/master/lib/chronicle/email/mbox_extractor.rb

@krillin666 feel free to re-post/describe the process here -- I'll probably have to do it once myself to test the format, and will probably want to create a doc of some kind for handling email files -- it's not obvious how to set it up right now

add module for flight data/history

copied from location issue:

Another thing I've been thinking about for fallbacks (other than just location.home) is tracking flights?

Haven't looked into how to do that/if it's possible, but it is something I wanted to look into (maybe hooking into some API which has flight data?)

My locations only go back to about 2015, but if I can use old passports to figure out flights or something, it could go back to something like 2000

notes from: karlicoss/HPI#154 (comment)

Some other things to consider as well:

  • Flights can have both departure/arrival (with flight numbers, gates, terminals, etc), as well as GPS timestamps in-air (one of my favourite uses of early GPS loggers was someone plotting the entire flight as a 3D line).
  • Estimated location data can typically have a defined radius or zone that covers all possible locations in the estimate.

improve window_watcher event structure

while parsing, maintain a 'recent' cache, removing items when they haven't had an event for 2 hours

each item would look like:

from datetime import datetime, timedelta
from typing import NamedTuple, Tuple


# represents one history entry
class Entry(NamedTuple):
    times: Tuple[datetime, timedelta]
    application: str
    window_title: str

add some @property wrappers to serialize the start/end times
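
Something like this, maybe (added inside Entry; assumes times is a (start, duration) pair, which matches the annotation above):

    @property
    def start(self) -> datetime:
        # when the window became active
        return self.times[0]

    @property
    def end(self) -> datetime:
        # start of the event plus how long the window stayed active
        return self.times[0] + self.times[1]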

split body_log into a separate project/repo

Feels strange for that to be here, as it acts as both the input mechanism and the way to read the data.

Ideally the multiple interfaces that I've created by throwing together scripts should be created by some metaprogramming on top of autotui, and then installed as system-wide scripts.

Files should be loaded by using a helper method from the corresponding package, or just by loading them manually by globbing in HPI

restructure job dirs so that it's easier to dispatch on OS type

similar to how I structure them in my dotfiles:

jobs
├── linux
│   ├── dl_mnu.priv.job
│   ├── guestbook_comments.job
│   ├── mint.job
│   ├── raspi.job
│   └── update_rss.job
├── mac
└── shared
    └── mystars.job

Update run_jobs so that it runs those specific directories on each OS

merge into HPI?

I see many are custom coded; why are they part of their own repo and not submitted to the base HPI?

unhandled google takeout files

[RuntimeError('Unhandled file: /home/sean/data/google_takeout/Takeout-1616747344/My Activity/Google Translate/MyActivity.html'),
 RuntimeError('Unhandled file: /home/sean/data/google_takeout/Takeout-1616747344/My Activity/Podcasts/MyActivity.html'),
 RuntimeError('Unhandled file: /home/sean/data/google_takeout/Takeout-1616747344/Google Play Store/Promotion History.json')]

Also, probably worth it to figure this out finally

https://github.com/seanbreckenridge/HPI/blob/master/my/google/__init__.py#L3-L25

would be fine to just keep all takeouts, merging stuff using a set in memory and then wrapping raw_events in a mcachew
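
Rough sketch of that (assuming mcachew can be applied bare like cachew, and with a guessed dedupe key):

from my.core.common import mcachew


@mcachew  # cache the merged stream so it isn't recomputed across all takeouts
def events():
    emitted = set()
    for e in raw_events():  # raw_events: the existing function yielding from every takeout
        key = (e.dt, getattr(e, "url", None))  # hypothetical dedupe key
        if key in emitted:
            continue
        emitted.add(key)
        yield e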

How to update HPI and HPI-sean

Hi!

I've had some headaches sometimes when updating both yours and @karlicoss' HPI. Usually I delete the previously cloned HPI repo folders and then do the installation process outlined in your README. But since I've made modifications to some modules, like the signal.py which wasn't working, I then have to manually change the python module files again.

Is there any way I can update both HPI and HPI-sean and leave user modifications intact? Sorry, I'm a noob at git

Thanks!

Ignore folders in mail source

Hey @seanbreckenridge, how are you doing?
I've noticed that my promnesia indexing is taking longer than usual, and found out that it might be because it is indexing all the Trash folders from my different emails. Is it possible to exclude certain folders from indexing using the Thunderbird method?

stream all HPI events

create functionality/a library which gathers all events from all of HPI, dynamically finding datetime/date fields and returning a stream of all of them, ordered by date
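
Rough sketch of the dynamic date discovery (all names here are hypothetical):

import itertools
from datetime import datetime, date
from typing import Any, Iterator, Optional, Union


def _event_date(e: Any) -> Optional[Union[datetime, date]]:
    # scan for the first datetime/date-valued attribute,
    # whatever each module happened to name it
    for attr in ("dt", "date", "created", "when", "at"):
        val = getattr(e, attr, None)
        if isinstance(val, (datetime, date)):
            return val
    return None


def all_events(streams: Iterator[Iterator[Any]]) -> Iterator[Any]:
    # flatten every module's stream, then order by the discovered date;
    # str() keeps datetime/date/None mutually comparable
    merged = itertools.chain.from_iterable(streams)
    yield from sorted(merged, key=lambda e: str(_event_date(e)))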

fix my.google namedtuple structure

parsing multiple links and having to call e.parse_json() on every element isn't great

https://github.com/seanbreckenridge/HPI/blob/master/my/google/models.py#L7-L32

probably could be done better by converting some HtmlEvents to some other type of model which has additional fields for the other links, or just reducing the number of links one captures to 1 or 2

To debug:

import json

for y in events():
    if hasattr(y, 'links'):
        if hasattr(y, 'Service') and y.Service == "Maps":
            continue
        j = json.loads(y.links)
        if len(j) > 1:
            print(y)
            print(j)

How to use IMAP function 📧

Hello,

Thanks for this amazing extension of HPI that I just discovered. I was trying to set this up with Promnesia for my emails, but I'm getting zero indexing:

[INFO    2021-10-18 23:17:26 promnesia extract.py:49] extracting via promnesia_sean.sources.imap:index () {} ... ...
[INFO    2021-10-18 23:17:26 promnesia extract.py:82] extracting via promnesia_sean.sources.imap:index () {} ...: got 0 visits

I am using this Thunderbird Addon and I've tried to export using:

  • Export whole folder
  • Export whole folder with structure
  • Export all emails in EML format
  • Export all email in TXT format

All these exports are in my .local/share/mail path; I even wrote the path literally in the __init__ file instead of using path.join, but nothing works.

Thank you so much!
