Git Product home page Git Product logo

dashboards's People

Contributors

cristianberneanu avatar edongashi avatar pdobacz avatar yoid2000 avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

dashboards's Issues

ERR_CHILD_PROCESS_STDIO_MAXBUFFER

Cristian
[11 minutes ago]
node stdout buffers have a maximum length to them, after which the process crashes

This happened on Paul's instance running through all night

[2022-10-27 07:04:00.263] [error] Metabase exited with code null.
[2022-10-27 07:04:00.265] [error] (node:15404) UnhandledPromiseRejectionWarning: RangeError [ERR_CHILD_PROCESS_STDIO_MAXBUFFER]: stdout maxBuffer length exceeded

After some homework, I think the gist is in changing execFile to spawn https://www.ericluwj.com/2015/11/25/nodejs-spawn-vs-execfile.html

Example queries

If we don't have example queries that the user can use as a template for his/her own queries, then users won't know how to use the system.

I'd like to have a set of example queries created whenever a user adds a table. There should be enough queries that a user can most likely find a query that more-or-less does what the user wants to do, but not so many that the user can't find an appropriate query even when it exists. The example queries should do the following:

  1. Cover most of the functions that we provide.
  2. Provide enough templates so that usually a user only needs to find a template, change the column / aggregation values, and move forward.

It would be super useful if this process resulted in the initial page of Metabase showing a few of these queries under the heading "Pick up where you left off" (rather than the normal original page to explore data, which fails if the data is anonymized). Not sure if that is possible, since the page is populated with examples of previous queries that the user wrote.

One source of example queries are the queries at training.open-diffix.org.

I don't think we need to necessarily provide queries that are directly useful to the user. This would be hard to do in any event. I think more important is that we provide examples, and that the user can look through a reasonable list of query titles and pick out the most appropriate template, and then just make small changes to it.

Then the changed query can be saved as a new query:

image

In Metabase, we could have one "Collection" per table. Each collection then has a number of sample queries and associated visualizations, and maybe one dashboard containing the examples. Each per-table collection can be a sub-folder in an "Examples" folder, like this:

image

I think we could have for instance the following examples:

  • Count distinct column
  • Count rows (histogram, 2 columns)
  • Count protected entities (histogram, 1 column)
  • Sum column and average column in same query
  • Numeric generalization (use a column with many distinct numeric values)
  • Datetime generalization
  • Substring (perhaps together in one query with numeric generalization)
  • WHERE clause
  • A query with substantial suppression (if we can find a column with man distinct numeric values)

Add initialization script.

We need a script for creating the initial configuration of PostgreSQL and Metabase after setup.
Alternatively, the GUI app can do the initial configuration on the first run, if it's simpler.

Notes from Paul

Here is a set of thoughts I had while giving Dashboard a trial run. (Though bottom line, this is very cool!)

Imported tables says "AID columns", while later it is called the "protected entity identifier column".

"[Auto-generated row index column]" might be better names as "None", or "None (one row per protected entity)". In addition, I see that the scroll-down says "Select a column or 'None'", but then when you actually scroll down, 'None' isn't one of the choices.

I think there is a good chance a user might not get the distinction between "Load" and "Import". Especially given that the earlier load is called "Import table" and the latter "true import" is called "Import".

I suggest that we use "Import" to refer to pulling the table into the dashboard app, and then "Load imported table into PostgreSQL" for the final import.

When listed in the "Imported tables" part, the AID column is blank if none was chosen. Might be better to literally put "None (one row per protected entity)"?

Once the table is fully imported, it isn't real clear what to do next. I kinda feel that there ought to be a button to "open a metabase tab" or something. Indeed, such a button could include a helpful hint, like "Open Metabase with pre-loaded example queries" or something.

Or more like, it would seem natural to automatically add the first metabase tab once the metabase service is running? not sure how best to do that. (Actually, maybe better to have the button with the hint?)

I find the "Personal" button under "Select the protected entity..." to be confusing. My first thought is that, since the button seems to be in the "off" position, that you need to push it to make the table person. Maybe better to have a toggle with "Personal" and "Public" choices.

It seems that we can only select one AID? Was this our plan for the MVP?

When dashboard asks for the name of the table, it won't be clear to the user that this is the name given in PostgreSQL, and as a result things like giving a name with capital letters or spaces just leads to pain when the user wants to later write sql queries and has to put the table name in quotes. I'm inclined to force the user to make a table name that is friendly to Postgres.

If I try to import a table with the same name as an existing table, I get a warning "table_name already exists, will be overwritten". Maybe a better approach would be to make the textbox blank, and put a message that says "table_name already exists, please select a different name"?

Also, I feel it would be nice to have a "Cancel" button in addition to the "Import" button.

Finally, I do feel that a "control panel" style of interface for the admin would be better than the current scroll-down. But maybe we should let Damian and Stephan try and see if they have the same suggestion...

------- Metabase -------

In metabase, home page, it says "How's it going, local_francis?", local_francisbeing the name of my machine. Maybe better to just have something like "Diffix Dashboards" or "Diffix Dashboards User"? (A bit awkward no matter what we do, but ...)

Having used metabase in the past, I'm accustomed to using the browser's "previous page" arrow to navigate. This is now gone. Probably not an issue, but not sure yet... Ok, having used the GUI a bit, I don't think we need a "back" button. In fact, the thing is rather easier to use without it!

In the metabase home page, it starts with "Here are some explorations of Anonymized Data", and "A look at 'Database'". The "A look at" button of course doesn't work. What if instead with start with "Here are some explorations of Direct Data". Then (presumably) the "A look at" button would work? (The danger here is that the user never transitions to using anonymized data and so exports non-anonymized data.)

But perhaps better yet: Once the user has created and saved some queries, then when metabase starts up again, rather than the "Here are some explorations..." page, it has "Pick up where you left off", and then buttons for saved dashboards and SQL queries. We should pre-load some sample queries and dashboards, and (hopefully) when metabase is started, it starts with "Pick up where you left off". (This might be a bit confusing for first-time user, but if we clearly name the pre-loaded queries as pre-loaded, then maybe not so bad.)

I think "Direct Data" is not immediately understandable as a term. I would prefer "Personal Data" or "Non-anonymized Data".

Change "Direct Data" to ????

I think "Direct Data" is not immediately understandable as a term. I would prefer "Personal Data" or "Non-anonymized Data" (neither of which I love, but can't think of a better term).

Refresh webview when data changes

Even though we trigger a db sync when adding/removing tables, the frontend app caches data and requires a hard refresh when the schema changes.

Relocate "Sample Data"

Currently Metabase include a few "Sample Database" tables.

The basic idea is good (give the user something to play with even if they don't have any tables), but it doesn't match our structure, where everything is either in "Anonymized Data" or "Personal Data" (i.e. Direct Data).

So I think we should have some sample data, maybe even the same four tables that Metabase uses (which is nice in a way because they a JOIN-able). (On the other hand, maybe four tables is too much....maybe two at most.)

Then when Dashboards starts, the sample tables are already shown in the Dashboards "Imported Tables" table, with the appropriate Protected Entity configuration.

Likewise they appear in Metabase under the "Anonymized Data" and "Personal Data" databases.

Show existing tables in the admin panel.

The app needs to show the list of currently available tables in the initial admin panel tab.
The list should be displayed above the "Import table" component.
It should also provide a way to remove tables from the database.

Add packaging script (without Metabase).

We need a script that creates an executable setup package containing PostgreSQL, pg_diffix, Metabase, the GUI app, and the initialization script (which will be executed at the end of the setup process).

Improve handling of mis-formatted CSVs

If you put an invalid entry in a row beyond the scope of our type-detection mechanism (>10000 IIRC), it will not behave nicely.

The example comes from taxi-one-day.csv, which has a "\N" cell for a Real typed column. It gives me:

12:56:12.732 › postgres: 2022-10-26 12:56:12.732 CEST [74911] FEHLER:  ungültige Eingabesyntax für Typ real: »\N«
12:56:12.733 › postgres: 2022-10-26 12:56:12.732 CEST [74911] ZUSAMMENHANG:  COPY taxi-one-day, Zeile 15631, Spalte dropoff_longitude: »\N«
12:56:12.733 › postgres: 2022-10-26 12:56:12.732 CEST [74911] ANWEISUNG:  COPY "taxi-one-day" FROM STDIN (DELIMITER ',', FORMAT CSV, HEADER true)

in the console, but nothing useful in the GUI (Import failed and an import stuck halfway).

Control-panel type of interface

I'm thinking that a control-panel type of interface might make more sense.

Basically, the "admin" interface would be this and nothing else:

image

The "new metabase tab" button is self-explanatory

If you click the + at the bottom, then this opens a new tab which manages the import process.

This new tab behaves exactly as our current interface does from "Import Table" on downwards. (i.e. has a drag-and-drop window, gives preview, and lets you configure the PE).

The user can abort the process by deleting the tab, or finish the process with the final import step (which could be labeled Load table into PostgreSQL. The success or error code pop-up is shown, and if the user deletes the pop-up, then the tab is removed and the "admin" tab comes back.

For documentation, we could do one or more the following:

  1. Have "documentation" in the pull-down in the "help" menu. This brings up the documentation in a new tab.
  2. Have a little i mark near various items in the control panel or import tab (though this kind of thing is generally ugly, so I don't know).

In any event, there are then four types of tabs, the admin tab, a Metabase tab, an Import tab, and a documentation tab. Also I'd consider renaming the "Admin Panel" tab to "Tables" or "Home".)

Prevent importing tables which later require double-quotes

The suggested table name is currently the CSV filename without extension. This causes certain characters (space, upper case letters) to make PostgreSQL require "" when referencing the table, which is inconvenient for the user (reported by Paul).

My suggestion for a low-hanging-fruit-fix for this is the following:

  1. When we're suggesting the table name to the user, we should replace all the offending characters (downcase, replace space with _, maybe replace <upper case letter><lower case letter> with _<lower case letters>, i.e. camel-case to snake-case conversion)
  2. When the user edits the table name in a way that it contains offending letters, we should show a warning in a box, like we currently do for name conflicts.

Data discovery + preview + import to PG

For now I bundle these 3 large topics together, to see what options we have. Once separate subtasks crystalize I'll divide up.

I'm treating CSV as priority and Excel at the back of my head.

EDIT: CSV as priority but not bothering with Excel

Detail note: I won't be reusing the CSV import from desktop, which was done via reference, so this part needs to be done in pure JS (or PostgreSQL, if there's a way 🤔 )

Move GUI app to top folder.

It makes sense to do so since it will be the central component of the project and we plan to use npm commands for everything.

Cancel button (during data import)

A couple times now I wanted to back out of an import dialog, and there was no way to do it short of restarting the app or importing and then deleting.

A "Cancel" button in the vicinity of the "Import" button would be handy.

Toggle switch for personal/public choice

I find the "Personal" button under "Select the protected entity..." to be confusing. My first thought is that, since the button seems to be in the "off" position, that you need to push it to make the table person. Maybe better to have a toggle with "Personal" and "Public" choices.

Default would be "Personal".

I'm thinking something like this:

image

File-oriented paradigm

I wonder if we should adopt a file-oriented paradigm for Dashboard rather than the current import-oriented paradigm.

The reason is this. Currently Moers organizes all their data by file. This is among other things how they manage access control, and I would guess that, to the extent that they automate, they do so by managing files.

The current paradigm for Dashboard is import-oriented. That is to say, a file is imported, and it gets duplicated and is it's own separate entity. The user could delete the original file, and it would still live in Dashboard. The user could overwrite the original file with something new, and the old version would live on in Dashboard. So there is a kindof disconnect between what is in dashboard and the files that the users know and love.

An alternative paradigm would be file-oriented, whereby the user points Dashboard to a file. Conceptually for the user, whatever is in the file is also in Dashboard. For all practical purposes the user can imagine that he or she is querying the file itself, and not a copy in Dashboard. If the file is removed, then it is removed from Dashboard. If the file changes, then the new data is imported to Dashboard.

Practically-speaking, this might have the following impact on the GUI:

  1. In the Imported Tables table, there is a link to the file itself ("Show in folder")
  2. In addition to drag-and-drop (or maybe even instead of), we have a "Select file" thing that brings up a file manager window.
  3. We use language like "Make file visible in Dashboard" rather than "Import file".
  4. The file name appears everywhere, especially including the table name in Metabase.
  5. If the file is deleted, the corresponding entry in Dashboards is deleted.
  6. If the file is changed, Dashboards points this out to the user and asks if the modified file should be visible in Dashboards (thus triggering an upload).
  7. Of course we have the problem of duplicate file names. One approach for this might be to prepend some of the file name with some of the path ... enough to distinguish the two files. (Alternatively we just require the user to change the file name.)

And so on. What do you think? (Note this also makes bulk file upload a very natural extension.)

Manage failure during initialization of pg_diffix and metabase - cleaning of broken setup

Split from #53, see discussion there.

Some thoughts from there:

managing setup failure - should we wipe app state ($HOME/.dashboards) clean on setup failure or expose a way for users to do it from the GUI?

I have a feeling that it will be better if we do this from the start as well (besides the logging stuff).
If we don't want to reset the state automatically after an initialization failure, we should at least mark it somehow as being incomplete and ask the user if they want a clean state at the next startup.

Add automatic builds.

We need an automated way to create setup packages periodically or on demand on GitHub, instead of relying on manual local builds.

Handle systems with non-standard locales.

Users will most likely run the app with a German locale set. We need a strategy for handling data import for such setups. We can either:

  • force a standard locale throughout the pipeline. This will be simple, consistent, and less error prone, but possibly inconvenient for users.
  • use the system default locale (with the option of overriding the setting per file). More complex, but we will be able to handle localized data files without reconversion.

Add data query tab.

The "New tab" button should open a tab with a Metabase web-view for querying imported data.

Clean up AID column terminology

The AID column terminology is a bit confusing right now.

Currently we have the following:

Object Current setting
The default text in the config box Select a column or 'None'
The entry in the pull-down menu [Auto generated row index column]
The entry in the AID columns of the the Imported Tables table (blank line) Is this a bug?
The actual column name in the postgres table DiffixRowIndex
How Metabase displays the column name DIFFIXROWINDEX

Concrete ideas:

Use "Protected Entity" everywhere instead of "AID". (Currently there is a column labeled AID columns in the Imported Tables table.

Use "None" everywhere in Dashboards to refer to Personal tables that have no AID column.

To distinguish this from Public tables, we have separate tables for listing Personal and Public tables. Naturally the Public table listing does not have the "Protected Entity Columns" column.

Already the naming is inconsistent, because the default text doesn't match the pull-down.

How about this: We use 'None' for the pull-down menu and the AID columns of the Imported Tables table for personal tables that have no AID column, and we use 'Not Applicable' for in the Imported Tables table for public tables.

Or maybe better yet, we divide the table into two tables, one for Personal tables and one for Public tables. Then it is very clear...

Possible change the name of the index-based AID column in postgres to Protected_Entity_Index or something that includes the term "protected entity". (Note that Metabase lists column names in ALL-CAPS, so camel-case doesn't translate well into what Metabase shows.)

Allow to customize imported table name and handle name conflicts

Currently, the name is fixed to the name of the CSV file and when the user imports, an existing table with the same name gets dropped.

Once we have the ability to list and delete existing tables in the admin panel, we should improve this behavior accordingly

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.