toucantoco / weaverbird
A visual data pipeline builder with various backends
Home Page: https://weaverbird.toucantoco.dev
License: BSD 3-Clause "New" or "Revised" License
We wanted to make it possible to override some style variables (e.g. active-color), so we had to remove the scoped attribute on style blocks, which prevented classes from being overwritten even on purpose.
So we now have styles in a separate file, which prevents them from being computed before we can override them in another project.
A better solution probably exists that follows the Vue 'Single File Component' convention, but I failed to find it.
An important pain point for query editors in Toucan Toco today is expressing basic "if...then...else" conditions to populate a new column based on conditions on existing ones.
This need was not obvious in our step usage analysis, as it was not identified as an aggregation operator in the MongoDB aggregation pipeline ($cond), and it is not implemented in the Toucan Toco postprocess Python utils.
Nevertheless, it is widely used and we should give it some priority.
A possible approach would be to use a dedicated formula in our formula editor accessed when creating a new column. A basic formula such as the excel one should do the trick:
IF( [COLUMN_1] < [COLUMN_2] AND [COLUMN_3] = "TCTC", "OK", [COLUMN_4])
Another, more graphical approach (and more explanatory for the user) would be to have a dropdown appear when clicking on "New column", to select between "Formula" (for very basic arithmetic operations between columns) and "Based on condition" (opening a modal to graphically express conditions and outputs).
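To make the discussion concrete, here is a minimal TypeScript sketch of how such a condition-based column could compile to Mongo's `$cond`. The `IfThenElseStep` shape and the function name are hypothetical, not the actual VQB schema:

```typescript
// Hypothetical step shape: an illustration only, not the real VQB schema.
interface IfThenElseStep {
  name: "ifthenelse";
  new_column: string;
  if: { column: string; operator: "eq" | "lt" | "gt"; value: unknown };
  then: unknown;
  else: unknown;
}

function ifThenElseToMongo(step: IfThenElseStep): Record<string, unknown> {
  // Build e.g. { $lt: ["$COLUMN_1", 10] } from the condition
  const test = { ["$" + step.if.operator]: ["$" + step.if.column, step.if.value] };
  return { $addFields: { [step.new_column]: { $cond: [test, step.then, step.else] } } };
}
```

A `$project` keeping all other columns would work as well; `$addFields` is just more compact for adding a single column.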
The left pane will host a graphical rendering of the pipeline. The right pane (720px min?) will host the dataset viewer.
Each step is created (or edited) via a specific form.
We have yet to decide if these forms are built manually or with something like https://github.com/fagbokforlaget/vue-jsonschema-form
As a query designer, when I hover over an operation, the candidate columns for this operation are highlighted.
This 'filter' step config in our VQB "language"...
{
name: 'filter',
and: [
{
column: 'column_1'
value: ['foo', 'bar']
operator: 'in'
}
{
column: 'column_2'
value: ['toucan', 'toco']
operator: 'nin'
}
{
or: [
{
column: 'column_3'
value: 'toto'
operator: 'eq'
}
{
and: [
{
column: 'column_4'
value: 42
operator: 'lte'
}
{
column: 'column_4'
value: 0
operator: 'gt'
}
]
}
]
}
]
}
... should yield the following mongo aggregation pipeline step:
{
$match:
column_1: $in: ["foo", "bar"]
column_2: $nin: ["toucan", "toco"] # an 'and' between conditions on different columns does not need an explicit '$and' operator
$or: [
{
column_3: $eq: "toto" # '$eq' is optional as it is implicit in Mongo
}
{
$and: [ # an 'and' between conditions on the same column does need an explicit '$and' operator
{
column_4: $lte: 42
}
{
column_4: $gt: 0
}
]
}
]
}
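A minimal TypeScript sketch of this translation (names like `conditionToMongo` are illustrative, not the project's actual API). It merges sibling 'and' conditions into a single `$match` document unless their columns clash, in which case it falls back to an explicit `$and`, and keeps `$eq` implicit:

```typescript
type Condition =
  | { column: string; value: unknown; operator: string }
  | { and: Condition[] }
  | { or: Condition[] };

// Translate a single leaf condition; '$eq' stays implicit, as in Mongo.
function leafToMongo(c: { column: string; value: unknown; operator: string }): Record<string, unknown> {
  return { [c.column]: c.operator === "eq" ? c.value : { ["$" + c.operator]: c.value } };
}

function conditionToMongo(c: Condition): Record<string, unknown> {
  if ("and" in c) {
    // Merge sibling conditions into one document when their keys don't clash;
    // an explicit '$and' is only needed for conditions on the same column.
    const parts = c.and.map(conditionToMongo);
    const merged: Record<string, unknown> = {};
    let clash = false;
    for (const p of parts) {
      for (const k of Object.keys(p)) {
        if (k in merged) clash = true;
        merged[k] = p[k];
      }
    }
    return clash ? { $and: parts } : merged;
  }
  if ("or" in c) return { $or: c.or.map(conditionToMongo) };
  return leafToMongo(c);
}

function filterStepToMatch(step: { name: "filter" } & Condition): Record<string, unknown> {
  const { name: _name, ...condition } = step as Record<string, unknown>;
  return { $match: conditionToMongo(condition as unknown as Condition) };
}
```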
As a developer, it would be nice to test the editor in real conditions with real data (e.g. CSV files or a toy mongo database).
One could imagine something like:
git clone https://github.com/ToucanToco/vue-query-builder
python examples/toy-backend.py my-config-file.yml
yarn start-demo
This 'unpivot' step in our VQB "language"...
{
name: 'unpivot'
keep: ["MARCHE", "CANAL"]
unpivot: ["NB_CLIENTS_TOTAL", "NB_ROWS"]
unpivot_column_name: "KPI"
value_column_name: "VALUE"
dropna: true # default
}
... should yield the following mongo aggregation steps:
{
$project:
MARCHE: "$MARCHE"
CANAL: "$CANAL"
dimensions:
$objectToArray: {
NB_CLIENTS_TOTAL: "$NB_CLIENTS_TOTAL"
NB_ROWS: "$NB_ROWS"
}
}
{
$unwind: "$dimensions"
}
{
$project:
MARCHE: 1
CANAL: 1
KPI: "$dimensions.k"
VALUE: "$dimensions.v"
}
{
$match:
VALUE: $ne: null # dropna here
}
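A TypeScript sketch of the unpivot translation above (function and interface names are illustrative, not the actual project code); the optional trailing `$match` implements `dropna`:

```typescript
interface UnpivotStep {
  name: "unpivot";
  keep: string[];
  unpivot: string[];
  unpivot_column_name: string;
  value_column_name: string;
  dropna: boolean;
}

function unpivotToMongo(step: UnpivotStep): Record<string, unknown>[] {
  // Columns to keep as-is, and the object fed to $objectToArray
  const kept = Object.fromEntries(step.keep.map((c) => [c, "$" + c]));
  const toObject = Object.fromEntries(step.unpivot.map((c) => [c, "$" + c]));
  const pipeline: Record<string, unknown>[] = [
    { $project: { ...kept, dimensions: { $objectToArray: toObject } } },
    { $unwind: "$dimensions" },
    {
      $project: {
        ...Object.fromEntries(step.keep.map((c) => [c, 1])),
        [step.unpivot_column_name]: "$dimensions.k",
        [step.value_column_name]: "$dimensions.v",
      },
    },
  ];
  // dropna: filter out null values after unpivoting
  if (step.dropna) pipeline.push({ $match: { [step.value_column_name]: { $ne: null } } });
  return pipeline;
}
```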
This 'unpivot' step in our VQB "language"...
{
name: 'unpivot'
keep: ["MARCHE", "CANAL"]
unpivot: ["NB_CLIENTS_TOTAL", "NB_ROWS"]
unpivot_column_name: "KPI"
value_column_name: "VALUE"
dropna: false
}
... should yield the following mongo aggregation steps:
{
$project:
MARCHE: "$MARCHE"
CANAL: "$CANAL"
dimensions:
$objectToArray: {
NB_CLIENTS_TOTAL: "$NB_CLIENTS_TOTAL"
NB_ROWS: "$NB_ROWS"
}
}
{
$unwind: "$dimensions"
}
{
$project:
MARCHE: 1
CANAL: 1
KPI: "$dimensions.k"
VALUE: "$dimensions.v"
}
This pipeline of steps in our VQB "language"...
[
{
name: 'domain',
domain: 'focus_media_campaigns_levers_objectives'
}
{
name: 'filter',
column: 'period',
value: "Monthly View"
operator: 'eq'
}
{
name: 'filter',
column: 'year',
value: ["2017", "2018"]
operator: 'in'
}
{
name: 'group',
dimensions: ['day', "Campaign name"]
measures: [
{
column: "Clicks"
agg_function: 'avg'
}
{
column: "Budget"
agg_function: 'sum'
}
{
column: "Impressions"
agg_function: 'sum'
}
]
}
{
name: 'new_column_formula',
new_column_name: 'CPM'
formula: "[Budget] / [Impressions] * 1000"
}
{
name: 'unpivot'
id: ["day", "campaign"]
value: ["Budget", "Impressions", "CPM"]
dropna: true
}
]
... should yield the following query + postprocess pipelines:
query: [
$match:
domain: "focus_media_campaigns_levers_objectives"
period: 'Monthly View'
year: $in: ["2017", "2018"]
,
$group:
_id:
day: "$day"
campaign: "$Campaign name"
Clicks:
$avg: '$Clicks'
Budget:
$sum: '$Budget'
Impressions:
$sum: "$Impressions"
,
$project:
day: "$_id.day"
campaign: "$_id.campaign"
Clicks: 1
Budget: 1
Impressions: 1
CPM:
$multiply: [ $divide: [ "$Budget", "$Impressions"] , 1000 ]
]
postprocess: [
melt:
id: ["day", "campaign"]
value: ["Budget", "Impressions", "CPM"]
dropna: true
]
query: [
...
,
# Derived from our group step
$project:
day: "$_id.day"
campaign: "$_id.campaign"
Clicks: 1
Budget: 1
Impressions: 1
,
# Derived from our formula step
$project:
CPM:
$multiply: [ $divide: [ "$Budget", "$Impressions"] , 1000 ]
]
...
In the right pane, we need a table that displays the dataset. The table should be scrollable horizontally.
These 'replace' step configs in our VQB "language"...
{
name: 'replace',
search_column: "column_1"
new_column: "column_1" # inplace by default
oldvalue: 'foo'
newvalue: 'bar'
}
{
name: 'replace',
search_column: "column_1"
new_column: "column_1" # inplace by default
oldvalue: 'old',
newvalue: 'new'
}
... should yield the following mongo aggregation pipeline step:
{
$project:
<all other columns>: 1
column_1: $cond: [{$eq: ["$column_1", "foo"]}, "bar", "$column_1"] # No, there is no more elegant way of doing this in the mongo aggregation pipeline :) But it does have the advantage of working for any data type
}
{
# We need a distinct stage for the second replace (if we decide that it is more readable than nesting $cond in a single stage...). We cannot merge the two $project stages: as the replacement is performed in place, they would contain duplicate keys, and Mongo would retain only one of the replacements.
$project:
<all other columns>: 1
column_1: $cond: [{$eq: ["$column_1", "old"]}, "new", "$column_1"]
}
These 'replace' step configs in our VQB "language"...
{
name: 'replace',
search_column: "column_1"
new_column: "new_column"
oldvalue: 'foo'
newvalue: 'bar'
}
{
name: 'replace',
search_column: "new_column"
oldvalue: 'old',
newvalue: 'new'
}
... should yield the following mongo aggregation pipeline step:
{
$project:
<all other columns>: 1
column_1: 1 # We do want to keep the source column
new_column: $cond: [{$eq: ["$column_1", "foo"]}, "bar", "$column_1"]
}
{
# We need a distinct stage for the second replace (if we decide that it is more readable than nesting $cond in a single stage...). We cannot merge the two $project stages: they would contain duplicate keys, so Mongo would retain only one of the replacements.
$project:
<all other columns>: 1
column_1: 1 # We do want to keep the source column
new_column: $cond: [{$eq: ["$new_column", "old"]}, "new", "$new_column"] # Here we base our replacement on the 'new_column' created in step 1
}
These 'replace' step configs in our VQB "language"...
{
name: 'replace',
search_column: "column_1"
new_column: "column_1" # inplace by default
oldvalue: 'foo'
newvalue: 'bar'
}
{
name: 'replace',
search_column: "column_2"
new_column: "new_column"
oldvalue: 'old',
newvalue: 'new'
}
... should yield the following mongo aggregation pipeline step:
{
$project:
<all other columns>: 1
column_2: 1
column_1: $cond: [{$eq: ["$column_1", "foo"]}, "bar", "$column_1"]
new_column: $cond: [{$eq: ["$column_2", "old"]}, "new", "$column_2"]
}
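A TypeScript sketch of the replace translation (illustrative names, not the project's actual API). Since `$project` cannot express `<all other columns>` on its own, the sketch takes the full column list as a parameter and projects everything except the target:

```typescript
interface ReplaceStep {
  name: "replace";
  search_column: string;
  new_column?: string; // defaults to search_column, i.e. in-place replacement
  oldvalue: unknown;
  newvalue: unknown;
}

function replaceToProject(step: ReplaceStep, allColumns: string[]): Record<string, unknown> {
  const target = step.new_column ?? step.search_column;
  const project: Record<string, unknown> = {};
  // Keep every column except the one being (re)written
  for (const col of allColumns) {
    if (col !== target) project[col] = 1;
  }
  // $cond works for any data type, unlike string-specific operators
  project[target] = {
    $cond: [{ $eq: ["$" + step.search_column, step.oldvalue] }, step.newvalue, "$" + step.search_column],
  };
  return { $project: project };
}
```

Successive replace steps would each yield one such `$project` stage, since merging them would produce duplicate keys.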
The number of rows in my dataset is visible in a badge on the bottom left hand corner of the dataset viewer.
We'll need a spinner component for any long-running operation.
Some of the step editing forms can be updated by simply clicking on columns or cells.
E.g. if a step applies to a specific column, I can edit the step. If I want to change the processed column, I simply have to click on the one I want. Ideally, if this kind of step doesn't apply to a given column, that column is disabled and I can't click on it.
As a query designer, I can click on the "type" label in a column header and change it.
The corresponding cast step is generated in the pipeline.
This 'formula' step config in our VQB "language"...
{
name: 'formula'
new_column: 'new_col' # new_column can be an existing column to be updated with the formula result
formula: '(column_1 + column_2) / column_3 - column_4 * 100'
}
... should yield the following mongo aggregation pipeline step:
{
$project:
<all other columns>: 1
new_col:
$subtract: [
{
$divide: [
{
$add: [
"$column_1"
"$column_2"
]
}
"$column_3"
]
}
{
$multiply: [
"$column_4"
100
]
}
]
}
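One way to implement this translation is a tiny recursive-descent parser with the usual operator precedence. This is an illustrative sketch, not the project's actual formula engine:

```typescript
type MongoExpr = number | string | { [op: string]: MongoExpr[] };

const OPS: Record<string, string> = { "+": "$add", "-": "$subtract", "*": "$multiply", "/": "$divide" };

// Tokenize into numbers, column names, operators and parentheses.
// (Unary minus and quoted column names are not handled in this sketch.)
function tokenize(formula: string): string[] {
  return formula.match(/\d+(?:\.\d+)?|[A-Za-z_][A-Za-z0-9_]*|[()+\-*\/]/g) ?? [];
}

function parseFormula(formula: string): MongoExpr {
  const tokens = tokenize(formula);
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // factor := number | column | '(' expr ')'
  function factor(): MongoExpr {
    const tok = next();
    if (tok === "(") {
      const inner = expr();
      next(); // consume ')'
      return inner;
    }
    return /^\d/.test(tok) ? Number(tok) : "$" + tok; // columns become "$column" references
  }
  // term := factor (('*' | '/') factor)*   — higher precedence
  function term(): MongoExpr {
    let left = factor();
    while (peek() === "*" || peek() === "/") left = { [OPS[next()]]: [left, factor()] };
    return left;
  }
  // expr := term (('+' | '-') term)*      — lower precedence, left-associative
  function expr(): MongoExpr {
    let left = term();
    while (peek() === "+" || peek() === "-") left = { [OPS[next()]]: [left, term()] };
    return left;
  }
  return expr();
}
```

The resulting expression would then be assigned to `new_col` inside the `$project` stage shown above.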
This 'fillna' step config in our VQB "language"...
{
name: 'fillna',
column: "foo"
value: "bar"
}
... should yield the following mongo aggregation pipeline step:
{
$project:
<all other columns>: 1
foo: $cond: [{$eq: ["$foo", null]}, "bar", "$foo"]
}
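The fillna translation is a one-liner; a TypeScript sketch (illustrative names), producing the entry to place alongside `<all other columns>: 1` in the enclosing `$project`:

```typescript
interface FillnaStep {
  name: "fillna";
  column: string;
  value: unknown;
}

function fillnaToCond(step: FillnaStep): Record<string, unknown> {
  // Replace null by the given value, keep everything else as-is
  return {
    [step.column]: { $cond: [{ $eq: ["$" + step.column, null] }, step.value, "$" + step.column] },
  };
}
```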
As an expert designer, I can use keyboard shortcuts to navigate in my dataset viewer, search for operations, edit the pipeline, update the preview, etc.
As a query designer, I can configure an if-then-else step easily.
This 'top' step config in our VQB "language"...
{
name: 'top',
groups: ['foo']
value: 'bar',
sort: 'desc' # or 'asc'
limit: 10
}
... should yield the following mongo aggregation pipeline step:
{
$sort:
bar: -1 # or 1 when 'asc'
}
{
$group:
_id:
foo: "$foo"
array:
$push: "$$ROOT"
}
{
$project:
top_elems: $slice: ["$array", 10]
}
{
$unwind: "$top_elems"
}
{
$project:
<all_columns>: "$top_elems.<all_columns>" # We want to keep all columns
}
This 'top' step config in our VQB "language"...
{
name: 'top',
value: 'bar',
sort: 'desc' # or 'asc'
limit: 10
}
... should yield the following mongo aggregation pipeline step:
{
$sort:
bar: -1 # or 1 when 'asc'
}
{
$group:
_id: null
array:
$push: "$$ROOT"
}
{
$project:
top_elems: $slice: ["$array", 10]
}
{
$unwind: "$top_elems"
}
{
$project:
<all_columns>: "$top_elems.<all_columns>" # We want to keep all columns
}
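A TypeScript sketch covering both variants (with and without `groups`); names are illustrative. Instead of expanding `<all_columns>` column by column, this sketch restores the document shape with `$replaceRoot`, which is equivalent when all columns should be kept:

```typescript
interface TopStep {
  name: "top";
  groups?: string[]; // no groups means one single global group
  value: string;
  sort: "asc" | "desc";
  limit: number;
}

function topToMongo(step: TopStep): Record<string, unknown>[] {
  const id = step.groups && step.groups.length
    ? Object.fromEntries(step.groups.map((g) => [g, "$" + g]))
    : null; // _id: null groups all documents together
  return [
    { $sort: { [step.value]: step.sort === "asc" ? 1 : -1 } },
    { $group: { _id: id, array: { $push: "$$ROOT" } } },
    { $project: { top_elems: { $slice: ["$array", step.limit] } } },
    { $unwind: "$top_elems" },
    // Restore the original document shape instead of projecting <all_columns>
    { $replaceRoot: { newRoot: "$top_elems" } },
  ];
}
```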
We use rollup to build and ship an ES6 module. We should therefore get rid of webpack, since we don't want to maintain two distinct package bundlers.
As a query designer, I am able to see the query that corresponds to the current transformation pipeline.
At this point, the goal is mainly to ease the debugging process, there's no need to design it thoroughly.
As a query designer, the UI allows me to configure combinations (or / and) of filters.
This 'percentage' step config in our VQB "language"...
{
name: 'percentage'
new_column: 'new_col'
column: 'bar'
group: ['foo']
}
... should yield the following mongo aggregation pipeline step:
{
$group:
_id:
foo: "$foo"
array:
$push: "$$ROOT"
total_denum: $sum: "$bar"
}
{
$unwind: "$array"
}
{
$project:
new_col: $cond: [$eq: ["$total_denum", 0], null, $divide: ["$array.bar", "$total_denum"]] # we need to explicitly handle the case where '$total_denum' is 0, otherwise the query would just fail
<all_other_columns>: "$array.<all_other_columns>"
}
This 'percentage' step config in our VQB "language"...
{
name: 'percentage'
column: 'bar'
}
... should yield the following mongo aggregation pipeline step:
{
$group:
_id: null
array:
$push: "$$ROOT"
total_denum: $sum: "$bar"
}
{
$unwind: "$array"
}
{
$project:
bar: $cond: [$eq: ["$total_denum", 0], null, $divide: ["$array.bar", "$total_denum"]] # we need to explicitly handle the case where '$total_denum' is 0, otherwise the query would just fail
<all_other_columns>: "$array.<all_other_columns>"
}
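A TypeScript sketch covering both percentage variants (illustrative names). As in the spec, `<all_other_columns>` would have to be expanded from the known column list; the sketch notes that in a comment:

```typescript
interface PercentageStep {
  name: "percentage";
  new_column?: string; // defaults to 'column', i.e. replaced in place
  column: string;
  group?: string[];
}

function percentageToMongo(step: PercentageStep): Record<string, unknown>[] {
  const target = step.new_column ?? step.column;
  const id = step.group && step.group.length
    ? Object.fromEntries(step.group.map((g) => [g, "$" + g]))
    : null;
  return [
    { $group: { _id: id, array: { $push: "$$ROOT" }, total_denum: { $sum: "$" + step.column } } },
    { $unwind: "$array" },
    {
      $project: {
        [target]: {
          // Guard against division by zero, which would make the query fail
          $cond: [
            { $eq: ["$total_denum", 0] },
            null,
            { $divide: ["$array." + step.column, "$total_denum"] },
          ],
        },
        // '<all_other_columns>: "$array.<all_other_columns>"' must be expanded
        // here from the known column list in the real translation.
      },
    },
  ];
}
```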
Define a few responsive behaviours (to be refined).
This 'sort' step config in our VQB "language"...
{
name: 'sort',
columns: ["foo"]
order: ["asc"] # 'asc' by default
}
... should yield the following mongo aggregation pipeline step:
{
$sort:
foo: 1
}
This 'sort' step config in our VQB "language"...
{
name: 'sort',
columns: ["foo", "bar"]
order: ["asc", "desc"]
}
... should yield the following mongo aggregation pipeline step:
{
$sort:
foo: 1
bar: -1
}
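The sort translation is straightforward; a TypeScript sketch (illustrative names), with 'asc' as the per-column default:

```typescript
interface SortStep {
  name: "sort";
  columns: string[];
  order?: ("asc" | "desc")[]; // 'asc' by default
}

function sortToMongo(step: SortStep): Record<string, unknown> {
  const sort: Record<string, 1 | -1> = {};
  step.columns.forEach((col, i) => {
    // 'asc' is the default when no order is given for a column
    sort[col] = (step.order?.[i] ?? "asc") === "asc" ? 1 : -1;
  });
  return { $sort: sort };
}
```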
These 'rename' step configs in our VQB "language"...
{
name: 'rename',
oldname: 'foo',
newname: 'bar'
}
{
name: 'rename',
oldname: 'old',
newname: 'new'
}
... should yield the following mongo aggregation pipeline step:
{
$project:
<all other columns>: 1
bar: "$foo"
new: "$old"
}
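A TypeScript sketch showing how several consecutive rename steps can be merged into a single `$project` (illustrative names); as for replace, `<all other columns>` is expanded from the known column list:

```typescript
interface RenameStep {
  name: "rename";
  oldname: string;
  newname: string;
}

// Consecutive rename steps have no key clashes, so they merge into one $project.
function renamesToProject(steps: RenameStep[], allColumns: string[]): Record<string, unknown> {
  const renamed = new Set(steps.map((s) => s.oldname));
  const project: Record<string, unknown> = {};
  for (const col of allColumns) {
    if (!renamed.has(col)) project[col] = 1; // keep untouched columns
  }
  for (const s of steps) project[s.newname] = "$" + s.oldname;
  return { $project: project };
}
```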
This 'pivot' step config in our VQB "language"...
{
name: 'pivot',
index: ['column_1', 'column_2'],
column_to_pivot: 'column_3'
value_column: 'column_4'
agg_function: 'sum' # 'mean' by default
}
... should yield the following mongo aggregation pipeline step:
{
$group:
_id:
column_1: "$column_1"
column_2: "$column_2"
column_3: "$column_3"
column_4: $sum: "$column_4" # first we perform the aggregation ('$avg' for the default 'mean'), with the _id including the column to pivot
}
{
$group:
_id:
column_1: "$_id.column_1"
column_2: "$_id.column_2"
array:
$addToSet:
column_3: "$_id.column_3"
column_4: "$column_4" # then we group with the index columns as _id and we push documents as an array of sets, including a key for the column to pivot and a key for the corresponding value
}
{
$project:
tmp_object:
$arrayToObject:
$zip:
inputs: ["$array.column_3", "$array.column_4"] # then we project a tmp key to get an object from the array of couples [column_to_pivot, corresponding_value]
}
{
$addFields:
"tmp_object.column_1": "$_id.column_1"
"tmp_object.column_2": "$_id.column_2" # Then we put the index columns back into every document created in the previous step (they are still accessible in the _id object, as we did not explicitly specify '_id: 0' in the previous step)
}
{
$replaceRoot:
newRoot: "$tmp_object" # then we replace the root of the document tree to get our columns ready in the needed table-like format!
}
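The five stages above can be generated mechanically; a TypeScript sketch (illustrative names), where `agg_function` is given directly as the Mongo accumulator name (the VQB default 'mean' would map to '$avg'):

```typescript
interface PivotStep {
  name: "pivot";
  index: string[];
  column_to_pivot: string;
  value_column: string;
  agg_function: "sum" | "avg" | "min" | "max";
}

function pivotToMongo(step: PivotStep): Record<string, unknown>[] {
  const indexOnSource = Object.fromEntries(step.index.map((c) => [c, "$" + c]));
  const indexOnId = Object.fromEntries(step.index.map((c) => [c, "$_id." + c]));
  return [
    // 1. aggregate values, with the pivoted column still part of the _id
    { $group: {
        _id: { ...indexOnSource, [step.column_to_pivot]: "$" + step.column_to_pivot },
        [step.value_column]: { ["$" + step.agg_function]: "$" + step.value_column },
    } },
    // 2. regroup on the index only, collecting [pivoted value, aggregated value] pairs
    { $group: {
        _id: indexOnId,
        array: { $addToSet: {
          [step.column_to_pivot]: "$_id." + step.column_to_pivot,
          [step.value_column]: "$" + step.value_column,
        } },
    } },
    // 3. zip the pairs into an object: one key per pivoted value
    { $project: {
        tmp_object: { $arrayToObject: { $zip: {
          inputs: ["$array." + step.column_to_pivot, "$array." + step.value_column],
        } } },
    } },
    // 4. put the index columns back (still available under _id)
    { $addFields: Object.fromEntries(step.index.map((c) => ["tmp_object." + c, "$_id." + c])) },
    // 5. promote tmp_object as the new document root
    { $replaceRoot: { newRoot: "$tmp_object" } },
  ];
}
```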
As a query designer, I can click on a column and choose to delete it.
This will generate a DeleteStep
in the pipeline.
As a query designer, in the pipeline viewer, on a step, I can click on a "trash" button that will display a popover with a confirmation message: "Are you sure you want to delete this step?". If I say "yes", the step is removed from the pipeline and the dataset is recomputed.
As a query designer, I can browse my paginated dataset. By default the viewport is limited to N rows (e.g. 50), but I can configure this N visually and choose to jump to the next page.
http://allenfang.github.io/react-bootstrap-table/index.html might be a source of inspiration.
Implement the aggregation step translation described in #30, with the following restriction: count_distinct is left out for now.
We'll need at least the following information:
{
length: 12345,
offset: 100,
limit: 50,
columns: [
{
name: 'my-col',
type: 'integer',
// possibly later some extra information such as `unique_values`, `min`, `max`, ...
}
]
}
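A possible TypeScript shape for this metadata; the interface names are illustrative, not the project's actual types:

```typescript
type ColumnType = "integer" | "float" | "string" | "date" | "boolean";

interface ColumnMetadata {
  name: string;
  type: ColumnType;
  // possibly later some extra information such as unique_values, min, max, ...
}

interface DatasetMetadata {
  length: number; // total number of rows in the full dataset
  offset: number; // index of the first row of the current page
  limit: number;  // page size
  columns: ColumnMetadata[];
}

// Example value matching the payload above:
const exampleMetadata: DatasetMetadata = {
  length: 12345,
  offset: 100,
  limit: 50,
  columns: [{ name: "my-col", type: "integer" }],
};
```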
Column types will be useful for a number of use cases (e.g. to detect which operations apply and which don't).
As a query designer, I can choose to generate queries on different backends. We could imagine the following ones:
mongo 3.6 + python postprocess
mongo 4 + python postprocess
pandas only
sql + python postprocess
We don't want to implement all those backends here, just to prepare the code (and the UI) architecturally to support multiple backends.
As a query designer, when I create / edit a transformation step, I have access to a "preview" button that will ask the host to recompute the dataset so I can preview the result.
The preview is also accessible when I click on a specific step in the pipeline pane.
As a query designer, when I click on a column header, I see the list of unique values to be able to filter on it (or just to get some insights).
If the dataset is paginated, when I first click, I see only the unique values that are currently displayed. I can click on a label "see all unique values" that will display all the unique values in this column, whether visible or not.
It must be possible to select (and highlight) a column in the table. It should also be possible to select (and highlight) more than one column at once with ctrl+click
Each transformation step has an associated category. For now, we envision the following categories:
filter
compute
text
date
aggregate
reshape
For each of these categories, we should have a dropdown button opening a popover with the list of related transformation steps.
As a query designer, I can click on a column, click on the "filter" step and pick the value to filter on either by clicking on a cell or manually entering the value.
This will generate a FilterStep
in the pipeline
In a column header, there is a small "call-to-action dropdown" icon to incite the user to click.
To keep the UI as light as possible, this dropdown icon should only be visible when the user hovers over the column header.
As a query designer, in the pipeline viewer, I can click on the small circle next to each step. This will disable the steps below the one I've just clicked on. Visually, the disabled steps will be slightly faded and the dataset viewer will update to the newly generated query.
As a query designer, I can enter some text in the search field to find operations matching this text.
I can see a list of suggested operations as I type. Ideally, the search is fuzzy, and if there's no match in a given category, the corresponding buttons fade out.
As a query designer, I'm able to generate multiple queries on multiple datasets and combine the results.
In the dataset viewer, it should be possible to select (and highlight) a specific cell. It should also be possible to select (and highlight) multiple cells using ctrl+click.
We need a generic popover component that will be used both on category buttons and column headers.
If the component is initialized with a large dataset, we won't be able to display all the lines. For the first version, we'll fix a viewport limit of N (50?) rows.
A future version will do better.
This 'group' step config in our VQB "language"...
{
name: 'group',
on: ['dimension1', "dimension2"]
aggregations: [
{
name: "sum_value1" # default name: '<function>_<column>'
column: "value1"
agg_function: 'sum'
}
{
name: "avg_value2"
column: "value2"
agg_function: 'avg'
}
{
name: "MIN"
column: "value3"
agg_function: 'min'
}
{
name: "MAX"
column: "value4"
agg_function: 'max'
}
{
name: "count_value5"
column: "value5"
agg_function: 'count'
}
{
name: "value6"
column: "value6"
agg_function: 'first'
}
{
name: "value7"
column: "value7"
agg_function: 'last'
}
]
}
... should yield the following mongo aggregation pipeline:
query: [
$group:
_id:
dimension1: "$dimension1"
dimension2: "$dimension2"
sum_value1:
$sum: '$value1'
avg_value2:
$avg: '$value2'
MIN:
$min: "$value3"
MAX:
$max: "$value4"
count_value5:
$sum: 1 # Careful here: there is no '$count' operator, we '$sum' a series of 1s to count
value6:
$first: "$value6"
value7:
$last: "$value7"
,
$project:
dimension1: "$_id.dimension1"
dimension2: "$_id.dimension2"
sum_value1: 1
avg_value2: 1
MIN: 1
MAX: 1
count_value5: 1
value6: 1
value7: 1
]
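The generic group translation (without count_distinct) can be sketched in TypeScript as follows; names are illustrative, not the project's actual code:

```typescript
interface Aggregation {
  name: string; // default name would be '<function>_<column>'
  column: string;
  agg_function: "sum" | "avg" | "min" | "max" | "count" | "first" | "last";
}

interface GroupStep {
  name: "group";
  on: string[];
  aggregations: Aggregation[];
}

function groupToMongo(step: GroupStep): Record<string, unknown>[] {
  const group: Record<string, unknown> = {
    // "levelled" _id so it generalises to grouping on several dimensions
    _id: Object.fromEntries(step.on.map((c) => [c, "$" + c])),
  };
  for (const agg of step.aggregations) {
    // there is no '$count' accumulator: counting is a '$sum' of 1s
    group[agg.name] =
      agg.agg_function === "count" ? { $sum: 1 } : { ["$" + agg.agg_function]: "$" + agg.column };
  }
  // final $project to flatten the _id back into plain columns
  const project: Record<string, unknown> = Object.fromEntries(step.on.map((c) => [c, "$_id." + c]));
  for (const agg of step.aggregations) project[agg.name] = 1;
  return [{ $group: group }, { $project: project }];
}
```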
This 'group' step config in our VQB "language"...
{
name: 'group',
on: ['dimension1']
aggregations: [
{
name: "sum_value1"
column: "value1"
agg_function: 'sum'
}
]
}
... should yield the following mongo aggregation pipeline:
query: [
$group:
_id:
dimension1: "$dimension1" # not "_id: '$dimension1'", which would be possible, but we want to keep the approach as generic as possible, so we use a "levelled" definition of "_id" that generalises to grouping on several dimensions
sum_value1:
$sum: '$value1'
,
$project:
dimension1: "$_id.dimension1"
sum_value1: 1
]
This 'group' step config in our VQB "language"...
{
name: 'group',
on: ['dimension1', 'dimension2']
aggregations: [
{
name: "count_distinct_client"
column: "client"
agg_function: 'count_distinct'
}
]
}
... should yield the following mongo aggregation pipeline:
query: [
$group:
_id:
dimension1: "$dimension1"
dimension2: "$dimension2"
client: "$client" # This first '$group' builds unique groups and therefore makes every client unique within every combination of dimension1 x dimension2 x client
,
$group:
_id:
dimension1: "$_id.dimension1"
dimension2: "$_id.dimension2"
count_distinct_client:
$sum: 1 # Count the unique groups, and therefore the clients, for every combination of dimension1 x dimension2
,
$project:
dimension1: "$_id.dimension1"
dimension2: "$_id.dimension2"
count_distinct_client: 1
]
Important notice: we show here how such aggregations can be performed in the mongo aggregation pipeline, but the query gets quite complicated, and we do not even cover the challenge of performing several count_distinct at the same time... In such a case, the approach detailed below would need to be nested as many times as there are count_distinct aggregations! It quickly becomes unacceptable, both in terms of readability and performance.
=> So we believe that we should switch to the postprocess option as soon as the aggregation contains a 'count_distinct' combined with at least one other aggregation.
This 'group' step config in our VQB "language"...
{
name: 'group',
on: ['dimension1', 'dimension2']
aggregations: [
{
name: "sum_value"
column: "value"
agg_function: 'sum'
}
{
name: "count_distinct_client"
column: "client"
agg_function: 'count_distinct'
}
]
}
... should yield the following mongo aggregation pipeline:
query: [
$group:
_id:
dimension1: "$dimension1"
dimension2: "$dimension2"
client: "$client"
array_detail:
$push: '$$ROOT' # To keep track of the details and be able to perform every aggregation, we push every document ('$$ROOT' is a mongo variable similar to a cursor representing the current document being read) in arrays, with one array per group of [dimensions + the column for which we need to get the number of unique values]
,
$group:
_id:
dimension1: "$_id.dimension1"
dimension2: "$_id.dimension2"
client:
$sum: 1 # here we count the unique clients, as they are unique as per the group _id definition of the previous step
array_group:
$push: '$$ROOT' # Same thing here as in the step above: we need to keep track of every document to be able to perform the other aggregations afterwards (see steps below)
,
$unwind: '$array_group' # The unwind step "destructures" the array to get a root document per document found in the array => in tabular terms, we get 1 row per document found in the array
,
$project:
dimension1: "$_id.dimension1"
dimension2: "$_id.dimension2"
client: 1
array: "$array_group.array_detail" # We get the array with the most granular level of documents here to unwind it in the following step to get a row per document
,
$unwind: "$array"
,
$group:
_id:
dimension1: "$dimension1"
dimension2: "$dimension2"
sum_value:
$sum: "$array.value" # Now we can perform the other aggregations based on the most granular level
count_distinct_client:
$first: "$client" # And we need to keep the first value of the count distinct found per group, as this value was duplicated by the '$unwind' operations
,
$project:
dimension1: "$_id.dimension1"
dimension2: "$_id.dimension2"
sum_value: 1
count_distinct_client: 1
]
This 'group' step config in our VQB "language"...
{
name: 'group',
on: ['dimension1', "dimension2"]
aggregations: [
{
name: "sum_value1" # default name: '<function>_<column>'
column: "value1"
agg_function: 'sum'
}
{
name: "my_name"
column: "value1"
agg_function: 'count'
}
]
}
... should yield the following mongo aggregation pipeline:
query: [
$group:
_id:
dimension1: "$dimension1"
dimension2: "$dimension2"
sum_value1:
$sum: '$value1'
my_name:
$sum: 1 # careful: there is no '$count' operator, counting is done with '$sum: 1'
,
$project:
dimension1: "$_id.dimension1"
dimension2: "$_id.dimension2"
sum_value1: 1
my_name: 1
]
As a query designer, I can click on a column and choose the "replace value" transformation step. The step creation form is displayed and I'm asked to select values (either by manually entering values or by clicking on a cell) and choose the corresponding replacement value.
This will generate a ReplaceStep
in the pipeline.
As a query designer, I can resize column widths to be able to read long labels / values.
This 'argmax' step config in our VQB "language"...
{
name: 'argmax',
groups: ['foo'],
column: 'bar' # column in which to search for the max value
}
... should yield the following mongo aggregation pipeline step:
{
$group:
_id:
foo: "$foo"
array:
$push: "$$ROOT"
value_to_compare: $max: "$bar" # $min for argmin
}
{
$unwind: "$array"
}
{
$project:
value_to_compare: 1
<all_other_columns>: "$array.<all_other_columns>" # We want to keep all columns
}
{
$redact: # shortcut operator that avoids first creating a boolean column via $project and then filtering on 'true' rows via $match
$cond: [
{
$eq: ["$bar", "$value_to_compare"]
}
"$$KEEP"
"$$PRUNE" # Magic happens here with the mongo variables "$$KEEP" (for... keeping rows matching the condition) and "$$PRUNE" (for excluding rows that don't)
]
}
This 'argmin' step config in our VQB "language"...
{
name: 'argmin',
column: 'bar'
}
... should yield the following mongo aggregation pipeline step:
{
$group:
_id: null
array:
$push: "$$ROOT"
value_to_compare: $min: "$bar"
}
{
$unwind: "$array"
}
{
$project:
value_to_compare: 1
<all_other_columns>: "$array.<all_other_columns>" # We want to keep all columns
}
{
$redact: # shortcut operator that avoids first creating a boolean column via $project and then filtering on 'true' rows via $match
$cond: [
{
$eq: ["$bar", "$value_to_compare"]
}
"$$KEEP"
"$$PRUNE" # Magic happens here with the mongo variables "$$KEEP" (for... keeping rows matching the condition) and "$$PRUNE" (for excluding rows that don't)
]
}
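Both argmax and argmin variants follow the same shape; a TypeScript sketch (illustrative names). As in the spec, `<all_other_columns>` must be expanded from the known column list so the compared column still exists when `$redact` runs:

```typescript
interface ArgStep {
  name: "argmax" | "argmin";
  groups?: string[]; // no groups means one single global group
  column: string;    // column in which to search for the extremum
}

function argToMongo(step: ArgStep): Record<string, unknown>[] {
  const id = step.groups && step.groups.length
    ? Object.fromEntries(step.groups.map((g) => [g, "$" + g]))
    : null;
  const aggOp = step.name === "argmax" ? "$max" : "$min";
  return [
    { $group: { _id: id, array: { $push: "$$ROOT" }, value_to_compare: { [aggOp]: "$" + step.column } } },
    { $unwind: "$array" },
    // '<all_other_columns>: "$array.<all_other_columns>"' must be expanded
    // here from the known column list in the real translation.
    { $project: { value_to_compare: 1 } },
    // $redact keeps only rows where the searched column equals the extremum
    { $redact: { $cond: [{ $eq: ["$" + step.column, "$value_to_compare"] }, "$$KEEP", "$$PRUNE"] } },
  ];
}
```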