diffix / dashboards Goto Github PK

View Code? Open in Web Editor NEW

2.0 2.0 0.0 2.51 MB

Desktop application for producing anonymized dashboards using `Open Diffix`.

Home Page: https://www.open-diffix.org

License: Other

JavaScript 3.75% CSS 2.84% TypeScript 92.55% HTML 0.08% Batchfile 0.14% Shell 0.12% C 0.53%

dashboards's People

Contributors

Stargazers

Watchers

dashboards's Issues

Automate zipping and saving log files

Might be nice to have a function to automatically zip the log files and save them. This could be an function in the "help" pulldown menu.

Rename the `Notebook` component to `AdminPanel`.

ERR_CHILD_PROCESS_STDIO_MAXBUFFER

Cristian
[11 minutes ago]
node stdout buffers have a maximum length to them, after which the process crashes

This happened on Paul's instance running through all night

[2022-10-27 07:04:00.263] [error] Metabase exited with code null.
[2022-10-27 07:04:00.265] [error] (node:15404) UnhandledPromiseRejectionWarning: RangeError [ERR_CHILD_PROCESS_STDIO_MAXBUFFER]: stdout maxBuffer length exceeded

After some homework, I think the gist is in changing execFile to spawn https://www.ericluwj.com/2015/11/25/nodejs-spawn-vs-execfile.html

Consider dropping sample data from metabase instance

Example queries

If we don't have example queries that the user can use as a template for his/her own queries, then users won't know how to use the system.

I'd like to have a set of example queries created whenever a user adds a table. There should be enough queries that a user can most likely find a query that more-or-less does what the user wants to do, but not so many that the user can't find an appropriate query even when it exists. The example queries should do the following:

Cover most of the functions that we provide.
Provide enough templates so that usually a user only needs to find a template, change the column / aggregation values, and move forward.

It would be super useful if this process resulted in the initial page of Metabase showing a few of these queries under the heading "Pick up where you left off" (rather than the normal original page to explore data, which fails if the data is anonymized). Not sure if that is possible, since the page is populated with examples of previous queries that the user wrote.

One source of example queries are the queries at training.open-diffix.org.

I don't think we need to necessarily provide queries that are directly useful to the user. This would be hard to do in any event. I think more important is that we provide examples, and that the user can look through a reasonable list of query titles and pick out the most appropriate template, and then just make small changes to it.

Then the changed query can be saved as a new query:

In Metabase, we could have one "Collection" per table. Each collection then has a number of sample queries and associated visualizations, and maybe one dashboard containing the examples. Each per-table collection can be a sub-folder in an "Examples" folder, like this:

I think we could have for instance the following examples:

Count distinct column
Count rows (histogram, 2 columns)
Count protected entities (histogram, 1 column)
Sum column and average column in same query
Numeric generalization (use a column with many distinct numeric values)
Datetime generalization
Substring (perhaps together in one query with numeric generalization)
WHERE clause
A query with substantial suppression (if we can find a column with man distinct numeric values)

Add initialization script.

We need a script for creating the initial configuration of PostgreSQL and Metabase after setup.
Alternatively, the GUI app can do the initial configuration on the first run, if it's simpler.

Notes from Paul

Here is a set of thoughts I had while giving Dashboard a trial run. (Though bottom line, this is very cool!)

~~Imported tables says "AID columns", while later it is called the "protected entity identifier column".~~

"[Auto-generated row index column]" might be better names as "None", or "None (one row per protected entity)". In addition, I see that the scroll-down says "Select a column or 'None'", but then when you actually scroll down, 'None' isn't one of the choices.

I think there is a good chance a user might not get the distinction between "Load" and "Import". Especially given that the earlier load is called "Import table" and the latter "true import" is called "Import".

I suggest that we use "Import" to refer to pulling the table into the dashboard app, and then "Load imported table into PostgreSQL" for the final import.

~~When listed in the "Imported tables" part, the AID column is blank if none was chosen. Might be better to literally put "None (one row per protected entity)"?~~

Once the table is fully imported, it isn't real clear what to do next. I kinda feel that there ought to be a button to "open a metabase tab" or something. Indeed, such a button could include a helpful hint, like "Open Metabase with pre-loaded example queries" or something.

Or more like, it would seem natural to automatically add the first metabase tab once the metabase service is running? not sure how best to do that. (Actually, maybe better to have the button with the hint?)

I find the "Personal" button under "Select the protected entity..." to be confusing. My first thought is that, since the button seems to be in the "off" position, that you need to push it to make the table person. Maybe better to have a toggle with "Personal" and "Public" choices.

It seems that we can only select one AID? Was this our plan for the MVP?

When dashboard asks for the name of the table, it won't be clear to the user that this is the name given in PostgreSQL, and as a result things like giving a name with capital letters or spaces just leads to pain when the user wants to later write sql queries and has to put the table name in quotes. I'm inclined to force the user to make a table name that is friendly to Postgres.

If I try to import a table with the same name as an existing table, I get a warning "table_name already exists, will be overwritten". Maybe a better approach would be to make the textbox blank, and put a message that says "table_name already exists, please select a different name"?

~~Also, I feel it would be nice to have a "Cancel" button in addition to the "Import" button.~~

Finally, I do feel that a "control panel" style of interface for the admin would be better than the current scroll-down. But maybe we should let Damian and Stephan try and see if they have the same suggestion...

------- Metabase -------

In metabase, home page, it says "How's it going, local_francis?", local_francisbeing the name of my machine. Maybe better to just have something like "Diffix Dashboards" or "Diffix Dashboards User"? (A bit awkward no matter what we do, but ...)

~~Having used metabase in the past, I'm accustomed to using the browser's "previous page" arrow to navigate. This is now gone. Probably not an issue, but not sure yet...~~ Ok, having used the GUI a bit, I don't think we need a "back" button. In fact, the thing is rather easier to use without it!

In the metabase home page, it starts with "Here are some explorations of Anonymized Data", and "A look at 'Database'". The "A look at" button of course doesn't work. What if instead with start with "Here are some explorations of Direct Data". Then (presumably) the "A look at" button would work? (The danger here is that the user never transitions to using anonymized data and so exports non-anonymized data.)

But perhaps better yet: Once the user has created and saved some queries, then when metabase starts up again, rather than the "Here are some explorations..." page, it has "Pick up where you left off", and then buttons for saved dashboards and SQL queries. We should pre-load some sample queries and dashboards, and (hopefully) when metabase is started, it starts with "Pick up where you left off". (This might be a bit confusing for first-time user, but if we clearly name the pre-loaded queries as pre-loaded, then maybe not so bad.)

~~I think "Direct Data" is not immediately understandable as a term. I would prefer "Personal Data" or "Non-anonymized Data".~~

Single instance of app

We need a global named mutex to ensure a single instance for servers.

Proper handling of completed imports and notifications

IMO best done after we tackle the component presenting the imported tables. For now we have a placeholder invalidation (the Import button gets disabled as soon as import succeeds).

c.f. #7 (comment)

As part of this include proper notifications, c.f. #7 (comment)

Change "Direct Data" to ????

I think "Direct Data" is not immediately understandable as a term. I would prefer "Personal Data" or "Non-anonymized Data" (neither of which I love, but can't think of a better term).

Refresh webview when data changes

Even though we trigger a db sync when adding/removing tables, the frontend app caches data and requires a hard refresh when the schema changes.

Relocate "Sample Data"

Currently Metabase include a few "Sample Database" tables.

The basic idea is good (give the user something to play with even if they don't have any tables), but it doesn't match our structure, where everything is either in "Anonymized Data" or "Personal Data" (i.e. Direct Data).

So I think we should have some sample data, maybe even the same four tables that Metabase uses (which is nice in a way because they a JOIN-able). (On the other hand, maybe four tables is too much....maybe two at most.)

Then when Dashboards starts, the sample tables are already shown in the Dashboards "Imported Tables" table, with the appropriate Protected Entity configuration.

Likewise they appear in Metabase under the "Anonymized Data" and "Personal Data" databases.

Show existing tables in the admin panel.

The app needs to show the list of currently available tables in the initial admin panel tab.
The list should be displayed above the "Import table" component.
It should also provide a way to remove tables from the database.

Add packaging script (without Metabase).

We need a script that creates an executable setup package containing PostgreSQL, pg_diffix, Metabase, the GUI app, and the initialization script (which will be executed at the end of the setup process).

Stop Metabase from generating a firewall alert on first startup.

I think it listens globally instead of only listening on loopback.

Improve handling of mis-formatted CSVs

If you put an invalid entry in a row beyond the scope of our type-detection mechanism (>10000 IIRC), it will not behave nicely.

The example comes from taxi-one-day.csv, which has a "\N" cell for a Real typed column. It gives me:

12:56:12.732 › postgres: 2022-10-26 12:56:12.732 CEST [74911] FEHLER:  ungültige Eingabesyntax für Typ real: »\N«
12:56:12.733 › postgres: 2022-10-26 12:56:12.732 CEST [74911] ZUSAMMENHANG:  COPY taxi-one-day, Zeile 15631, Spalte dropoff_longitude: »\N«
12:56:12.733 › postgres: 2022-10-26 12:56:12.732 CEST [74911] ANWEISUNG:  COPY "taxi-one-day" FROM STDIN (DELIMITER ',', FORMAT CSV, HEADER true)

in the console, but nothing useful in the GUI (Import failed and an import stuck halfway).

Rename repo to `dashboards`.

Deduplicate translation.json using succinct keys

See #7 (comment)

(when done, open a low-priority issue in desktop to do the same, for consistency)

Fix (or prevent, if hard) Metabase automatically analyzing personal tables

It always fails and makes it harder to find any other error in the logs. Worse yet, it clogs the logs and causes unneeded rotation, so useful logs might get lost.

First pass in-app docs

Control-panel type of interface

I'm thinking that a control-panel type of interface might make more sense.

Basically, the "admin" interface would be this and nothing else:

The "new metabase tab" button is self-explanatory

If you click the + at the bottom, then this opens a new tab which manages the import process.

This new tab behaves exactly as our current interface does from "Import Table" on downwards. (i.e. has a drag-and-drop window, gives preview, and lets you configure the PE).

The user can abort the process by deleting the tab, or finish the process with the final import step (which could be labeled Load table into PostgreSQL. The success or error code pop-up is shown, and if the user deletes the pop-up, then the tab is removed and the "admin" tab comes back.

For documentation, we could do one or more the following:

Have "documentation" in the pull-down in the "help" menu. This brings up the documentation in a new tab.
Have a little i mark near various items in the control panel or import tab (though this kind of thing is generally ugly, so I don't know).

In any event, there are then four types of tabs, the admin tab, a Metabase tab, an Import tab, and a documentation tab. Also I'd consider renaming the "Admin Panel" tab to "Tables" or "Home".)

Enable recognition and import of datetime columns

Add procedure for launching PostgreSQL on startup.

We need a procedure in the app that launches PostgreSQL, waits for it to start running, checks to see if it was previously configured, and does the initial configuration if not.

Prevent importing tables which later require double-quotes

The suggested table name is currently the CSV filename without extension. This causes certain characters (space, upper case letters) to make PostgreSQL require "" when referencing the table, which is inconvenient for the user (reported by Paul).

My suggestion for a low-hanging-fruit-fix for this is the following:

When we're suggesting the table name to the user, we should replace all the offending characters (downcase, replace space with _, maybe replace <upper case letter><lower case letter> with _<lower case letters>, i.e. camel-case to snake-case conversion)
When the user edits the table name in a way that it contains offending letters, we should show a warning in a box, like we currently do for name conflicts.

Data discovery + preview + import to PG

For now I bundle these 3 large topics together, to see what options we have. Once separate subtasks crystalize I'll divide up.

~~I'm treating CSV as priority and Excel at the back of my head.~~

EDIT: CSV as priority but not bothering with Excel

Detail note: I won't be reusing the CSV import from desktop, which was done via reference, so this part needs to be done in pure JS (or PostgreSQL, if there's a way 🤔 )

Revisit SQL injection done by column name

Either make sure it's impossible or harmless or make it so.

c.f. #7 (comment)

Move GUI app to top folder.

It makes sense to do so since it will be the central component of the project and we plan to use npm commands for everything.

Stop Metabase gracefully.

Currently, a forceful shutdown happens.

Add GUI skeleton :skull:

based on desktop

Package JRE with Metabase jar (Windows)

Follow up to #37

Manage failure during initialization of pg_diffix and metabase - logs

Cache dependencies during automatic builds.

We can make automatic builds a lot faster if we cache the folders with built services and npm libraries.

Data preview table should display the autodetected column type.

This would make the schema of the created table obvious to the user.

Cancel button (during data import)

A couple times now I wanted to back out of an import dialog, and there was no way to do it short of restarting the app or importing and then deleting.

A "Cancel" button in the vicinity of the "Import" button would be handy.

Add components for monitoring the status of child processes.

These should appear at the top of the admin panel and display the current state (starting, running, stopped) of all depending processes launched during startup.

Allow importing public tables

c.f. #7 (comment)

Adapt postgres startup to the GUI changes

After #26 has been merged, I need to adapt the PostgreSQL startup to match the GUI.

Toggle switch for personal/public choice

Default would be "Personal".

I'm thinking something like this:

File-oriented paradigm

I wonder if we should adopt a file-oriented paradigm for Dashboard rather than the current import-oriented paradigm.

The reason is this. Currently Moers organizes all their data by file. This is among other things how they manage access control, and I would guess that, to the extent that they automate, they do so by managing files.

The current paradigm for Dashboard is import-oriented. That is to say, a file is imported, and it gets duplicated and is it's own separate entity. The user could delete the original file, and it would still live in Dashboard. The user could overwrite the original file with something new, and the old version would live on in Dashboard. So there is a kindof disconnect between what is in dashboard and the files that the users know and love.

An alternative paradigm would be file-oriented, whereby the user points Dashboard to a file. Conceptually for the user, whatever is in the file is also in Dashboard. For all practical purposes the user can imagine that he or she is querying the file itself, and not a copy in Dashboard. If the file is removed, then it is removed from Dashboard. If the file changes, then the new data is imported to Dashboard.

Practically-speaking, this might have the following impact on the GUI:

In the Imported Tables table, there is a link to the file itself ("Show in folder")
In addition to drag-and-drop (or maybe even instead of), we have a "Select file" thing that brings up a file manager window.
We use language like "Make file visible in Dashboard" rather than "Import file".
The file name appears everywhere, especially including the table name in Metabase.
If the file is deleted, the corresponding entry in Dashboards is deleted.
If the file is changed, Dashboards points this out to the user and asks if the modified file should be visible in Dashboards (thus triggering an upload).
Of course we have the problem of duplicate file names. One approach for this might be to prepend some of the file name with some of the path ... enough to distinguish the two files. (Alternatively we just require the user to change the file name.)

And so on. What do you think? (Note this also makes bulk file upload a very natural extension.)

Manage failure during initialization of pg_diffix and metabase - cleaning of broken setup

Split from #53, see discussion there.

Some thoughts from there:

managing setup failure - should we wipe app state ($HOME/.dashboards) clean on setup failure or expose a way for users to do it from the GUI?

I have a feeling that it will be better if we do this from the start as well (besides the logging stuff).
If we don't want to reset the state automatically after an initialization failure, we should at least mark it somehow as being incomplete and ask the user if they want a clean state at the next startup.

Add automatic builds.

We need an automated way to create setup packages periodically or on demand on GitHub, instead of relying on manual local builds.

Sync metabase language with app's

It's decided once during startup. Hope keeping it updated won't be tricky...

Add Metabase to packaging script

Split out of #5 , which only handles PostgreSQL

Handle systems with non-standard locales.

Users will most likely run the app with a German locale set. We need a strategy for handling data import for such setups. We can either:

force a standard locale throughout the pipeline. This will be simple, consistent, and less error prone, but possibly inconvenient for users.
use the system default locale (with the option of overriding the setting per file). More complex, but we will be able to handle localized data files without reconversion.

Add refresh button to metabase tabs on data changes

This way we let the user decide when they want to reload.

Add data query tab.

The "New tab" button should open a tab with a Metabase web-view for querying imported data.

Clean up AID column terminology

The AID column terminology is a bit confusing right now.

Currently we have the following:

Object	Current setting
The default text in the config box	Select a column or 'None'
The entry in the pull-down menu	[Auto generated row index column]
The entry in the AID columns of the the Imported Tables table	(blank line) `Is this a bug?`
The actual column name in the postgres table	DiffixRowIndex
How Metabase displays the column name	DIFFIXROWINDEX

Concrete ideas:

Use "Protected Entity" everywhere instead of "AID". (Currently there is a column labeled AID columns in the Imported Tables table.

Use "None" everywhere in Dashboards to refer to Personal tables that have no AID column.

To distinguish this from Public tables, we have separate tables for listing Personal and Public tables. Naturally the Public table listing does not have the "Protected Entity Columns" column.

Already the naming is inconsistent, because the default text doesn't match the pull-down.

How about this: We use 'None' for the pull-down menu and the AID columns of the Imported Tables table for personal tables that have no AID column, and we use 'Not Applicable' for in the Imported Tables table for public tables.

Or maybe better yet, we divide the table into two tables, one for Personal tables and one for Public tables. Then it is very clear...

Possible change the name of the index-based AID column in postgres to Protected_Entity_Index or something that includes the term "protected entity". (Note that Metabase lists column names in ALL-CAPS, so camel-case doesn't translate well into what Metabase shows.)

diffix / dashboards Goto Github PK

dashboards's People

Contributors

Stargazers

Watchers

dashboards's Issues

Recommend Projects

Recommend Topics

Recommend Org