
pg-aggregates's Introduction

@graphile/pg-aggregates

Adds a powerful suite of aggregate functionality to a PostGraphile schema: calculating aggregates, grouped aggregates, applying conditions to grouped aggregates, ordering by relational aggregates, filtering by the results of aggregates on related connections, etc.

IMPORTANT: aggregates are added to connections; they do not work with "simple collections".

Crowd-funded open-source software

To help us develop this software sustainably, we ask all individuals and businesses that use it to help support its ongoing maintenance and development via sponsorship.

And please give some love to our featured sponsors 🤩:

The Guild *
Dovetail *
Netflix *
Stellate *
Steelhead *

* Sponsors the entire Graphile suite

Status

This module is currently "experimental" status; we may change any part of it in a semver minor release.

Usage

Requires PostGraphile v4.12.0-alpha.0 or higher.

Install with:

yarn add postgraphile @graphile/pg-aggregates

CLI usage via --append-plugins:

postgraphile --append-plugins @graphile/pg-aggregates -c postgres://localhost/my_db ...

Library usage via appendPlugins:

import PgAggregatesPlugin from "@graphile/pg-aggregates";
// or: const PgAggregatesPlugin = require("@graphile/pg-aggregates").default;

const middleware = postgraphile(DATABASE_URL, SCHEMAS, {
  appendPlugins: [PgAggregatesPlugin],
});

If you like, you can install our example schema and then issue a GraphQL query such as:

query GameAggregates {
  allMatchStats {
    aggregates {
      max {
        points
        goals
        saves
      }
      min {
        points
      }
    }
  }
  allPlayers(orderBy: [MATCH_STATS_BY_PLAYER_ID_SUM_GOALS_ASC]) {
    nodes {
      name
      matchStatsByPlayerId {
        totalCount
        aggregates {
          sum {
            points
            goals
            saves
          }
          average {
            points
            goals
            saves
            teamPosition
          }
        }
      }
    }
  }
}

or:

query GroupedAggregatesByDerivative {
  allMatchStats {
    byDay: groupedAggregates(groupBy: [CREATED_AT_TRUNCATED_TO_DAY]) {
      keys # The timestamp truncated to the beginning of the day
      average {
        points
      }
    }
    byHour: groupedAggregates(groupBy: [CREATED_AT_TRUNCATED_TO_HOUR]) {
      keys # The timestamp truncated to the beginning of the hour
      average {
        points
      }
    }
  }
}

To filter by aggregates on related tables, you will also need postgraphile-plugin-connection-filter, and you will need to enable graphileBuildOptions.connectionFilterRelations as shown below:

app.use(
  postgraphile(DATABASE_URL, SCHEMA_NAME, {
    graphileBuildOptions: {
      connectionFilterRelations: true,
    },
  })
);

Interaction with connection parameters

Aggregates respect the conditions/filters of the connection but are unaffected by the pagination of the connection (they ignore the first/last/after/before/orderBy parameters). You may retrieve (optionally paginated) node data from a connection at the same time as retrieving aggregates from it. Aggregates are supported on connections at any level of the GraphQL query.

Aggregates

Connection-wide aggregates are available via the aggregates field directly on a GraphQL connection; for example:

query LoadsOfAggregates {
  allFilms {
    aggregates {
      average {
        durationInMinutes
      }
    }
  }
}

We support the following aggregates out of the box:

  • sum (applies to number-like fields) - the result of adding all the values together
  • distinctCount (applies to all fields) - the count of the number of distinct values
  • min (applies to number-like fields) - the smallest value
  • max (applies to number-like fields) - the greatest value
  • average (applies to number-like fields) - the average (arithmetic mean) value
  • stddevSample (applies to number-like fields) - the sample standard deviation of the values
  • stddevPopulation (applies to number-like fields) - the population standard deviation of the values
  • varianceSample (applies to number-like fields) - the sample variance of the values
  • variancePopulation (applies to number-like fields) - the population variance of the values

See Defining your own aggregates below for details on how to add your own aggregates.

Different aggregates apply to different data types; in general we attempt to add aggregate entries for each column and computed column function that appears to be compatible with the aggregate.

Ordering by aggregates

This plugin automatically adds some additional orderBy criteria to your graph, allowing you to order by aggregates over relations; e.g. you could find the top 5 players ordered by their average points scored in each match, and grab some more aggregate information about them too:

query FocussedOrderedAggregate {
  allPlayers(
    first: 5
    orderBy: [MATCH_STATS_BY_PLAYER_ID_AVERAGE_POINTS_DESC]
  ) {
    nodes {
      name
      matchStatsByPlayerId {
        totalCount
        aggregates {
          sum {
            goals
          }
          average {
            points
          }
        }
      }
    }
  }
}

Filtering by aggregates

(You will need postgraphile-plugin-connection-filter for this; see above.)

query PlayersWith9OrMoreSavesInMatchesTheyScoredIn {
  allPlayers(
    filter: {
      matchStatsByPlayerId: {
        aggregates: {
          sum: { saves: { greaterThan: "9" }, rating: { lessThan: 143 } }
          filter: { goals: { greaterThan: 0 } }
        }
      }
    }
  ) {
    nodes {
      name
      matchStatsByPlayerId(filter: { goals: { greaterThan: 0 } }) {
        aggregates {
          sum {
            saves
            rating
            goals
          }
        }
      }
    }
  }
}

Grouped aggregates

We also support grouping your data via the value of one of your columns or a derivative thereof, and calculating aggregates over each of the matching groups. Out of the box we support two derivatives:

  • truncated-to-hour (applies to timestamp-like values) - truncates to the beginning of the (UTC) hour
  • truncated-to-day (applies to timestamp-like values) - truncates to the beginning of the (UTC) day

See Defining your own grouping derivatives below for details on how to add your own grouping derivatives.

The aggregates supported over groups are the same as over the connection as a whole (see Aggregates above), but in addition you may also determine the keys that were used for the aggregate. There will be one key for each of the groupBy values; for example in this query:

query AverageDurationByYearOfRelease {
  allFilms {
    groupedAggregates(groupBy: [YEAR_OF_RELEASE]) {
      keys
      average {
        durationInMinutes
      }
    }
  }
}

each entry in the groupedAggregates result will have a keys entry: a list containing one value, which will be the year of release (as a string). The values in the keys list are always stringified; this is a known limitation due to interactions with GraphQL.
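Since the keys arrive stringified, client code typically parses them back into native types. A minimal sketch (the response shape and field names follow the query above; the sample values are invented):

```javascript
// Grouped-aggregate responses return `keys` as strings; parse them back
// client-side. This response shape mirrors the AverageDurationByYearOfRelease
// query above (the data itself is made up for illustration).
const response = {
  allFilms: {
    groupedAggregates: [
      { keys: ["1994"], average: { durationInMinutes: "121.5" } },
      { keys: ["1995"], average: { durationInMinutes: "98" } },
    ],
  },
};

const byYear = response.allFilms.groupedAggregates.map((group) => ({
  yearOfRelease: parseInt(group.keys[0], 10),
  averageDuration: parseFloat(group.average.durationInMinutes),
}));

console.log(byYear[0]); // { yearOfRelease: 1994, averageDuration: 121.5 }
```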

Having

If these grouped aggregates are returning too much data, you can filter the groups down by applying a having clause against them; for example you could see the average number of goals on days where the average points score was over 200:

query AverageGoalsOnDaysWithAveragePointsOver200 {
  allMatchStats {
    byDay: groupedAggregates(
      groupBy: [CREATED_AT_TRUNCATED_TO_DAY]
      having: { average: { points: { greaterThan: 200 } } }
    ) {
      keys
      average {
        goals
      }
    }
  }
}

Defining your own aggregates

You can add your own aggregates by using a plugin to add your own aggregate specs. Aggregate specs aren't too complicated; for example, here is a spec that could define the "min" aggregate:

const isNumberLike = (pgType) => pgType.category === "N";

const minSpec = {
  id: "min",
  humanLabel: "minimum",
  HumanLabel: "Minimum",
  isSuitableType: isNumberLike,
  sqlAggregateWrap: (sqlFrag) => sql.fragment`min(${sqlFrag})`,
};

See src/AggregateSpecsPlugin.ts for more details/examples.
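The registration step isn't shown above. By symmetry with the build.pgAggregateGroupBySpecs array used for grouping derivatives (see below), a plugin might append such a spec roughly like this; note that the build.pgAggregateSpecs name is an assumption here, so consult src/AggregateSpecsPlugin.ts for the authoritative shape:

```javascript
// Hypothetical plugin adding a bool_and-based "every" aggregate for boolean
// columns. ASSUMPTION: aggregate specs are collected on
// `build.pgAggregateSpecs` (mirroring the documented
// `build.pgAggregateGroupBySpecs`); verify against src/AggregateSpecsPlugin.ts.
const BOOL_OID = "16"; // PostgreSQL's oid for the `bool` type

const EveryAggregateSpecPlugin = (builder) => {
  builder.hook("build", (build) => {
    const { pgSql: sql } = build;
    build.pgAggregateSpecs = [
      ...(build.pgAggregateSpecs || []),
      {
        id: "every",
        humanLabel: "every",
        HumanLabel: "Every",
        // Only offer this aggregate for boolean columns:
        isSuitableType: (pgType) => pgType.id === BOOL_OID,
        // bool_and(...) is true only if every input row is true:
        sqlAggregateWrap: (sqlFrag) => sql.fragment`bool_and(${sqlFrag})`,
      },
    ];
    return build;
  });
};

module.exports = EveryAggregateSpecPlugin;
```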

Defining your own grouping derivatives

You may add your own derivatives by adding a group by spec to build.pgAggregateGroupBySpecs via a plugin. Derivative specs are fairly straightforward; for example, here's the spec for "truncated-to-hour":

const TIMESTAMP_OID = "1114";
const TIMESTAMPTZ_OID = "1184";

const truncatedToHourSpec = {
  // A unique identifier for this spec, will be used to generate its name:
  id: "truncated-to-hour",

  // A filter to determine which column/function return types this derivative
  // is valid against:
  isSuitableType: (pgType) =>
    pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID,

  // The actual derivative - given the SQL fragment `sqlFrag` which represents
  // the column/function call, return a new SQL fragment that represents the
  // derived value, in this case a truncated timestamp:
  sqlWrap: (sqlFrag) => sql.fragment`date_trunc('hour', ${sqlFrag})`,
};

Building that up with a few more intervals into a full PostGraphile plugin, you might write something like:

// Constants from PostgreSQL
const TIMESTAMP_OID = "1114";
const TIMESTAMPTZ_OID = "1184";

// Determine if a given type is a timestamp/timestamptz
const isTimestamp = (pgType) =>
  pgType.id === TIMESTAMP_OID || pgType.id === TIMESTAMPTZ_OID;

// Build a spec that truncates to the given interval
const tsTruncateSpec = (sql, interval) => ({
  // `id` has to be unique, derive it from the `interval`:
  id: `truncated-to-${interval}`,

  // Only apply to timestamp fields:
  isSuitableType: isTimestamp,

  // Given the column value represented by the SQL fragment `sqlFrag`, wrap it
  // with a `date_trunc()` call, passing the relevant interval.
  sqlWrap: (sqlFrag) =>
    sql.fragment`date_trunc(${sql.literal(interval)}, ${sqlFrag})`,
});

// This is the PostGraphile plugin; see:
// https://www.graphile.org/postgraphile/extending/
const DateTruncAggregateGroupSpecsPlugin = (builder) => {
  builder.hook("build", (build) => {
    const { pgSql: sql } = build;

    build.pgAggregateGroupBySpecs = [
      // Copy all existing specs, except the ones we're replacing
      ...build.pgAggregateGroupBySpecs.filter(
        (spec) => !["truncated-to-day", "truncated-to-hour"].includes(spec.id)
      ),

      // Add our timestamp specs
      tsTruncateSpec(sql, "year"),
      tsTruncateSpec(sql, "month"),
      tsTruncateSpec(sql, "week"),
      tsTruncateSpec(sql, "day"),
      tsTruncateSpec(sql, "hour"),
      // Other values: microseconds, milliseconds, second, minute, quarter,
      // decade, century, millennium.
      // See https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
    ];

    return build;
  });
};

module.exports = DateTruncAggregateGroupSpecsPlugin;

Finally, pass this plugin into PostGraphile via --append-plugins or appendPlugins: [...] - see https://www.graphile.org/postgraphile/extending/

See src/AggregateSpecsPlugin.ts for examples and more information.

Disable aggregates

By default, aggregates are created for all tables. This significantly increases the size of your GraphQL schema, and could also be a security (DoS) concern as aggregates can be expensive. We recommend that you use the disableAggregatesByDefault: true option to disable aggregates by default, and then enable them only for the tables you need:

const middleware = postgraphile(DATABASE_URL, SCHEMAS, {
  // ...
  appendPlugins: [
    // ...
    PgAggregatesPlugin,
  ],

  graphileBuildOptions: {
    // Disable aggregates by default; opt each table in via the `@aggregates` smart tag
    disableAggregatesByDefault: true,
  },
});

Enable aggregates for a specific table via your smart tags file (e.g. postgraphile.tags.json5):

"class": {
  "my_schema.my_table": {
    "tags": {
      "aggregates": "on"
    }
  }
}

or:

COMMENT ON TABLE my_schema.my_table IS E'@aggregates on';

You can also keep aggregates enabled by default, but disable aggregates for specific tables:

"class": {
  "my_schema.my_table": {
    "tags": {
      "aggregates": "off"
    }
  }
}

or:

COMMENT ON TABLE my_schema.my_table IS E'@aggregates off';

Thanks

This plugin was started as a proof of concept in 2019 thanks to sponsorship from OneGraph, and was made into a fully-featured released module thanks to sponsorship from Surge in 2021. It is maintained thanks to the support of Graphile's sponsors - thank you sponsors!

pg-aggregates's People

Contributors

benjie, dependabot[bot], dwwoelfel, eugeneduda, jemgillam, rattrayalex


pg-aggregates's Issues

TIMESTAMP_TRUNCATED_TO_MONTH & TIMESTAMP_TRUNCATED_TO_YEAR

Feature description

Add more truncated timestamp group by options:
TIMESTAMP_TRUNCATED_TO_WEEK
TIMESTAMP_TRUNCATED_TO_MONTH
TIMESTAMP_TRUNCATED_TO_YEAR

Motivating example

The built-in list currently only contains:
TIMESTAMP_TRUNCATED_TO_HOUR
TIMESTAMP_TRUNCATED_TO_DAY
so the other intervals are clearly missing.

This is required for my reporting project. Until this is in, I'll have to manually build queries for these.

This is an incredible tool to use for reporting services on an isolated reporting replica... it only makes sense to complete this list so that we can group our data in all the meaningful ways, using a terse built-in method that already exists.

Breaking changes

I don't see any reason this would cause breaking changes, simply extending current features.

Supporting development

I [tick all that apply]:

  • am interested in building this feature myself
  • am interested in collaborating on building this feature
  • am willing to help testing this feature before it's released
  • am willing to write a test-driven test suite for this feature (before it exists)
  • am a Graphile sponsor ❤️
  • have an active support or consultancy contract with Graphile

getComputedColumnDetails is not a function

Summary

When adding @graphile/pg-aggregates via --append-plugins, I get this warning:

graphile-build:warn TypeError: getComputedColumnDetails is not a function
  graphile-build:warn     at /home/aviallon/Nextcloud/Pro/Prestas/Sprint/Avocat/plateforme_juridique/angularelectronapp/node_modules/@graphile/pg-aggregates/dist/AddHavingAggregateTypesPlugin.js:209:55

I am using PG 13.2. I ran yarn upgrade and still had the issue; I retried with npx yarn-deduplicate && yarn and kept having the issue.
I have one computed column, returning a custom type, which might be the issue?

Steps to reproduce

Do the above.

Expected results

I expect to not have this warning.

Actual results

As summary says.

Additional context

postgraphile@^4.11.0
@graphile/pg-pubsub@^4.11.0
@graphile/subscriptions-lds@^4.11.0
@graphile/[email protected]
@graphile/pg-aggregates@^0.0.2
@graphile/[email protected]

Sponsoring

  • I am a Graphile backer

Cannot get the given example to work

First of all, thanks a lot for this plugin, it looks awesome! I got some things working, but not everything.

Summary

I tried to follow your example, step by step, but cannot get 2 things to work:

  • MATCH_STATS_BY_PLAYER_ID_SUM_GOALS_ASC and matchStatsByPlayerId do not exist

Steps to reproduce

Just follow the examples, as described in the README

Expected results

No missing values / fields.

Actual results

{
  "errors": [
    {
      "message": "Value \"MATCH_STATS_SUM_GOALS_ASC\" does not exist in \"PlayersOrderBy\" enum. Did you mean the enum value \"VIEW_MATCH_STATS_SUM_GOALS_ASC\", \"VIEW_MATCH_STATS_SUM_GOALS_DESC\", \"VIEW_MATCH_STATS_MAX_GOALS_ASC\", \"VIEW_MATCH_STATS_MIN_GOALS_ASC\", or \"VIEW_MATCH_STATS_SUM_POINTS_ASC\"?",
      "locations": [
        {
          "line": 279,
          "column": 21
        }
      ]
    },
    {
      "message": "Cannot query field \"viewMatchStatsByPlayerId\" on type \"Player\". Did you mean \"viewMatchStatsList\" or \"viewMatchStats\"?",
      "locations": [
        {
          "line": 282,
          "column": 7
        }
      ]
    }
  ]
}

Additional context

These are the package versions I am using:

    "@graphile-contrib/pg-order-by-related": "^1.0.0-beta.6",
    "@graphile-contrib/pg-simplify-inflector": "^6.1.0",
    "@graphile/pg-aggregates": "^0.1.0",
    "postgraphile": "^4.12.3",
    "postgraphile-core": "^4.12.1",
    "postgraphile-plugin-connection-filter": "^2.2.2",
    "postgraphile-plugin-nested-mutations": "^1.1.0",

I am using it as a library, with these options

export const options: PostGraphileCoreOptions = {
  subscriptions: true,
  appendPlugins: [
    PgSimplifyInflectorPlugin,
    PostGraphileNestedMutations,
    PgOrderByRelatedPlugin,
    ConnectionFilterPlugin,
    PgAggregatesPlugin,
  ],
  simpleCollections: "both",
  graphileBuildOptions: {
    nestedMutationsSimpleFieldNames: true,
    connectionFilterRelations: true,
  },
  ignoreIndexes: false,
};

Possible Solution

The way I got it to work was to replace:

  • MATCH_STATS_SUM_GOALS_ASC with VIEW_MATCH_STATS_SUM_GOALS_ASC
  • matchStatsByPlayerId with viewMatchStats

But I am wondering why the ones you have described did not get generated for me... and I wonder what else might not be working?

graphile-build:warn TypeError: getComputedColumnDetails is not a function


Dependencies

"postgraphile": "^4.5.5",
"postgraphile-plugin-connection-filter": "^1.1.3",
"@graphile/pg-aggregates": "^0.1.0"

Error

When running node app.js, I get the following error:

graphile | at GraphQLInputObjectType.getFields (/app/node_modules/graphql/type/definition.js:1017:27)
graphile | at Object.newWithHooks (/app/node_modules/postgraphile-core/node_modules/graphile-build/node8plus/makeNewBuild.js:702:36)
graphile | at /app/node_modules/@graphile/pg-aggregates/dist/AddHavingAggregateTypesPlugin.js:173:31
graphile | at Array.reduce ()
graphile | Recoverable error occurred:
graphile | 2021-04-23T18:59:15.472Z graphile-build:warn TypeError: getComputedColumnDetails is not a function
graphile | at /app/node_modules/@graphile/pg-aggregates/dist/AddHavingAggregateTypesPlugin.js:209:55
graphile | at Array.reduce ()
graphile | at fields (/app/node_modules/@graphile/pg-aggregates/dist/AddHavingAggregateTypesPlugin.js:199:90)
graphile | at fields (/app/node_modules/postgraphile-core/node_modules/graphile-build/node8plus/makeNewBuild.js:563:27)
graphile | at resolveThunk (/app/node_modules/graphql/type/definition.js:438:40)
graphile | at defineInputFieldMap (/app/node_modules/graphql/type/definition.js:1056:18)

All attributes appear in aggregates, even if disabled by behaviors

Summary

Using this plugin, I wanted to disable property.created_at but it shows up in the aggregate (see screenshot)

Using graphile behavior debug pgCodecAttribute property.created_at my final string for this property.created_at attribute is

Final string:
  -list -insert -update -delete -filterBy -orderBy -* -select -query -connection -order -base -filter -aggregates

Shouldn't it be disabled in the distinctCount aggregate?

Steps to reproduce

Use PgAggregates and set an attribute's behavior to -*.

Expected results

I would not expect that attribute to appear in the schema.

Actual results

That attribute appears in the schema.

Additional context

Discord thread

GroupBy computed columns not generated?

create or replace function doe_indice(doe public.doe) returns text
    stable
    strict
    language sql
as
$$
    select ...;
$$;
comment on function doe_indice(doe public.doe) is E'indice de validation';

In schema.graphql :

type Doe implements Node {
  #...
"""indice de validation"""
  indice: String
  #...
}

enum DoeGroupBy {
  #...
  # INDICE not found :(
  #...
}

The README says it should be available:

We also support grouping your data via the value of one of your columns, no-additional-arguments computed columns

If I try returning an integer, I see it generated only in the DoeHavingXXXInput inputs but not in the GroupBy enum. Am I missing something?

Filtering by totalCount

Feature description

I would like to be able to filter a query using the count of a related object

Motivating example

I would like for instance to filter by number of warehouses of a company:

query MyQuery {
  companies {
    nodes {
      warehouses(
        filter: {totalCount: {greaterThan: 2}}){
        totalCount
      }
    }
  }
}

I saw this kind of use case in this issue in postgraphile-plugin-connection-filter : graphile-contrib/postgraphile-plugin-connection-filter#152 (comment)

However, I couldn't get the proposed solution to work for all possible values; it only worked for greater than 0. I also think the syntax in the example above would be easier to use, and it is what the linked issue's author tried initially.

Supporting development

I [tick all that apply]:

  • am interested in building this feature myself
  • am interested in collaborating on building this feature
  • am willing to help testing this feature before it's released
  • am willing to write a test-driven test suite for this feature (before it exists)
  • am a Graphile sponsor ❤️
  • have an active support or consultancy contract with Graphile

What's the best way to aggregate on duration?

Hey @benjie!

Our team is trying to add support for aggregating on durations with an interval data type but noticed that's not supported by the library yet. Is it straightforward to add that support? If so, we can take a stab at implementing it and submit PR. Alternatively, we could switch to representing durations with float instead if you recommend that.

Aggregates can return unsafe integers

Summary

We have some sums of numeric columns that exceed the range of safe integers in JS. The sum aggregator and others build the JSON object with the number directly, without checking whether it's within bounds.

Obviously postgraphile and this plugin aren't limited to just the JS ecosystem, but browsers are surely a large consumer of postgraphile.

Steps to reproduce

Query an aggregate such that it returns a number larger than 2^53 - 1.

Expected results

A result such as:

{ "@sum": { "@amount": "29531249370647025750" } }

Actual results

{ "@sum": { "@amount": 29531249370647025750 } }

Which when parsed in JS yields (note the lost precision):

> JSON.parse('{"@sum":{"@amount":29531249370647025750}}')
{ '@sum': { '@amount': 29531249370647028000 } }

Additional context

Possible Solution

I'm willing to submit a patch to fix this issue, but need some direction on how to best implement this functionality.

To me, it seems that numerics, due to their arbitrary length both before and after the decimal place, should always be cast to text before getting returned, but this isn't backwards compatible.

Alternatively, additional aggregates could be introduced that can be used to cast to text when the aggregates are likely to exceed safe bounds and it's up to the consumer to parse those into numbers.
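The precision loss, and why a text cast avoids it, can be demonstrated directly in Node (a sketch using the number from the example above):

```javascript
// A JSON number beyond Number.MAX_SAFE_INTEGER (2^53 - 1) silently loses
// precision when parsed into a JS number...
const unsafe = JSON.parse('{"@sum":{"@amount":29531249370647025750}}');
console.log(Number.isSafeInteger(unsafe["@sum"]["@amount"])); // false

// ...whereas the stringified form round-trips exactly, e.g. via BigInt:
const safe = JSON.parse('{"@sum":{"@amount":"29531249370647025750"}}');
const exact = BigInt(safe["@sum"]["@amount"]);
console.log(exact.toString()); // "29531249370647025750"
```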

Order by totalCount with per-query custom filtering applied

Summary

Allow order by with totalCount

Additional context

I think this will expand on this issue #25

Currently we can filter with aggregate result, but not allow output to order by it.

For example, we have schema as follow:

type Author @entity{
  id:ID!
}

type Book @entity{
  id:ID!
  author: Author! //Fk author_id
  year: Int!
}

And we want to sort authors by the amount of books written after a certain year.

With pg-aggregates and connection-filter I could query list of authors and books written after year 1995,

However, the BOOKS_COUNT_DESC does not order by filtered result, it counts all books belong to author and order desc.

query {
  authors(
    orderBy: BOOKS_COUNT_DESC
    filter: {
      books: { 
        aggregates: { 
          distinctCount: { id: { greaterThan:"0" } } } 
      }
    }
  ) {
    nodes {
      id
      books(filter:{year:{greaterThanOrEqualTo:1995}}){
        totalCount
        nodes{
          id,
          year
        }
      }
    }
  }
}

Is there a generic solution for this, rather than something like creating an additional plugin and making a computed column sortable? Thank you!

Aggregates ignore ordering

Summary

orderBy is ignored for aggregates - data is unordered and ORDER BY is missing from resulting SQL query

Steps to reproduce

Consider this example:

CREATE TEMPORARY TABLE example(id int, order_value int, group_value text);

INSERT INTO example(id, order_value, group_value)
VALUES (1, 2, 'gr1'), (2, 1, 'gr1'), (3, 3, 'gr2'), (4, 1, 'gr2'), (5, 2, 'gr2'), (6, 1, 'gr3');

Running a query like this (the implementation of arrayAgg is out of scope for this issue; a naive version actually worked for me):

query {
  examples(orderBy: [ORDER_VALUE_ASC]) {
    aggregates {
      arrayAgg {
        id
      }
    }
  }
}

Should return the same as this query:

SELECT group_value, array_agg(id)
FROM (SELECT * FROM example ORDER BY order_value ) AS ordered
GROUP BY group_value;

Which is:

group_value | array_agg
gr1         | {2,1}
gr2         | {4,5,3}
gr3         | {6}

Expected results

Ordering is included by at least some means. It should be possible in general:

The aggregate functions array_agg, json_agg, jsonb_agg, json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work. For example:

SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

Also see this thread.

Actual results

Ordering is absent from SQL query (according to graphiql).

Additional context

Ordering might be very important for some cases, e.g. when you want to return grouped rows themselves (and not just group keys or some aggregates).

More context why exactly I want this

I've been desperately trying to get grouped rows with pagination for each groups with postgraphile. The task is to return first N rows starting at M for every group. What I've tried:

  • Subquery for each group. For given example it looks like this:
    query {
      group_1: examplesConnection(orderBy: [ORDER_VALUE_ASC], condition: {groupValue: 'gr1'}, limit: N, offset: M) {
        nodes: { id }
      }
    
      group_2: examplesConnection(orderBy: [ORDER_VALUE_ASC], condition: {groupValue: 'gr2'}, limit: N, offset: M) {
        nodes: { id }
      }
    
      group_3: examplesConnection(orderBy: [ORDER_VALUE_ASC], condition: {groupValue: 'gr3'}, limit: N, offset: M) {
        nodes: { id }
      }
    }
    This is awful in practice as every group is a separate statement; for 20-30-40 groups it takes >10s, and a query with 50 groups is rejected by graphql. The normally acceptable 100-500ms per heavy query is 10 times slower than what's needed here, and we just cannot optimize it any further (the plans are as good as we can get).
  • Wrap postgraphile's resulting SQL query in LATERAL+LIMIT, or add row_number() or anything else for paginated groups. I didn't find any way to do this in the docs with makeExtendSchemaPlugin or anything else.
  • Add an array_agg aggregate and paginate manually with JavaScript. Even a two-step process would do: first query ids and paginate them, then query data for those ids. I'm trying to go this way, but ordering not being preserved for aggregates is the blocker here.
  • Just load everything for every group and paginate manually in JS. This is painfully stupid and I'd like to avoid it: having a SQL DB and a GraphQL framework and still doing that.

Possible Solution

Maybe wrap the table expression here in a subquery with ORDER BY if orderBy is present?

Add relation aggregates to aggregates

Feature description

It would be really useful to be able to get summary aggregates of relations, so one could answer questions about related data.

Motivating example

Given this query:

{
  allUsers {
    nodes {
      ordersByUserId {
        aggregates {
          sum {
            price
          }
        }
      }
    }
  }
}

It would be useful to also get the aggregated prices for all of the orders, something like this:

{
  allUsers {
    aggregates {
      ordersByUserId {
        sum {
          price
        }
      }
    }
    nodes {
      ordersByUserId {
        aggregates {
          sum {
            price
          }
        }
      }
    }
  }
}

Or to avoid creating children of aggregates that are not aggregates:

{
  allUsers {
    aggregates {
      sum {
        ordersByUserId {
          price
        }
      }
    }
    nodes {
      ordersByUserId {
        aggregates {
          sum {
            price
          }
        }
      }
    }
  }
}

Or even this…?

{
  allUsers {
    aggregates {
      sum {
        ordersByUserId(attribute: ORDERS_BY_USER_ID_PRICE)
      }
    }
    nodes {
      ordersByUserId {
        aggregates {
          sum {
            price
          }
        }
      }
    }
  }
}

Anyhow, the details in the resulting schema could be worked out with some iteration :) These examples are kind of trivial because it would be possible to get the sum of the order prices by querying allOrders directly. However, if you went one level deeper (imagine a multi-tenant system here, where an account is a company or some higher-order entity):

{
  allAccounts {
    aggregates {
      sum {
        price
        usersByAccountId {
          price
          ordersByUserId {
            price
          }
        }
      }
    }
  }
}

Then you end up with some really powerful capabilities, where you can get aggregates at many levels, which could be used to create useful dashboards, analytics features, etc.

Breaking changes

None—this would just expand the capabilities of the existing aggregates.

Supporting development

I [tick all that apply]:

  • am interested in building this feature myself
  • am interested in collaborating on building this feature
  • am willing to help testing this feature before it's released
  • am willing to write a test-driven test suite for this feature (before it exists)
  • am a Graphile sponsor ❤️
  • have an active support or consultancy contract with Graphile

Supporting user-defined aggregate functions

Summary

Supporting user-defined aggregate functions

Additional context

I have a custom type and a user-defined aggregate function that works on that type.

CREATE TYPE public.amount AS (
	number numeric,
	currency text
);

CREATE FUNCTION public.sum(state public.amount[], current public.amount) RETURNS public.amount[]
    LANGUAGE plpgsql
    AS $$
DECLARE
    found boolean = false;
    i int = 0;
BEGIN
    IF array_length(state, 1) IS NULL THEN
        RETURN array[current];
    END IF;

    FOR i IN 1..array_length(state, 1)
    LOOP
        IF state[i].currency = current.currency THEN
            state[i].number := state[i].number + current.number;
            found := true;
            EXIT;
        END IF;
    END LOOP;
    
    IF NOT found THEN
        state := array_append(state, current);
    END IF;

    RETURN state;
END;
$$;

CREATE AGGREGATE public.sum(public.amount) (
    SFUNC = public.sum,
    STYPE = public.amount[],
    INITCOND = '{}'
);

Unsurprisingly, columns using the custom amount type are not recognized as aggregatable by the plugin. I couldn't find any mention of user-defined aggregate functions in the documentation, so I assume they are not supported. Would it be easy to add support? Are there any simple workarounds?

Aggregate keys should be nullable

Reported via chat:

A table with a nullable column, using the pg-aggregates plugin, has the potential to error when retrieving the keys resource.

For the pg-aggregates plugin, when one does a groupedAggregates query, the keys may not be null:

keys: {
  type: new GraphQLList(new GraphQLNonNull(GraphQLString)),
}

However, the query is able to select nullable fields to group by. This can result in unexpected errors. Is there a reason why keys needs to be non-null? If possible, could keys be made nullable?

Example table, query, data, and output is here: https://gist.github.com/ProbablyBrianBurgess/10f5138192936994ba8c6574e77452d0

The last item results in:

{
    "keys": null,
    "distinctCount": {
        "a": "1"
    }
}

This is better than a crash, but could keys be [null, "2", "6"] (based on the example data)? Although certainly not an exhaustive test, I did remove the new GraphQLNonNull portion and it worked as desired.

Aggregates provide null values

I dockerized a PostGraphile app by following steps in https://www.graphile.org/postgraphile/running-postgraphile-in-docker/
Next, I used pg-aggregates to get sum, average, max, min, etc.

Here is the info -

init.sql

\connect mypostgraphile;
create schema mygraphile;

/*Create store table in mygraphile schema*/
CREATE TABLE mygraphile.store (
    id SERIAL PRIMARY KEY,
    storename VARCHAR (255) NOT NULL,
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

COMMENT ON TABLE mygraphile.store IS
'List of stores.';

/*Create product table in mygraphile schema*/
CREATE TABLE mygraphile.product (
    id SERIAL PRIMARY KEY,
    productname VARCHAR (255) NOT NULL,
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

COMMENT ON TABLE mygraphile.product IS
'List of products.';

/*Create brand table in mygraphile schema*/
CREATE TABLE mygraphile.brand (
    id SERIAL PRIMARY KEY,
    brandname VARCHAR (255) NOT NULL,
    price NUMERIC(5,2),
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    product_id INTEGER NOT NULL REFERENCES mygraphile.product(id)
);

COMMENT ON TABLE mygraphile.brand IS
'List of brands.';

/*Create brand table in mygraphile schema*/
CREATE TABLE mygraphile.sale (
    id SERIAL PRIMARY KEY,
    quantity SMALLINT NOT NULL CHECK (quantity > 0),
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    store_id INTEGER NOT NULL REFERENCES mygraphile.store(id),
    brand_id INTEGER NOT NULL REFERENCES mygraphile.brand(id)
);

COMMENT ON TABLE mygraphile.sale IS
'Record of sales.';


st.sql (dummy data)

\connect mypostgraphile;

/*Create some dummy users*/
INSERT INTO mygraphile.user (username) VALUES
('Benjie'),
('Singingwolfboy'),
('Lexius');

/*Create some dummy posts*/
INSERT INTO mygraphile.post (title, body, author_id) VALUES
('First post example', 'First post body', 1),
('Second post example', 'Second post body', 2),
('Third post example', 'Third post body', 3);

/*Create some dummy stores*/
INSERT INTO mygraphile.store (storename) VALUES
('Store A'),
('Store B'),
('Store C');

/*Create some dummy products*/
INSERT INTO mygraphile.product (productname) VALUES
('Product A'),
('Product B'),
('Product C');

/*Create some dummy brands*/
INSERT INTO mygraphile.brand (brandname, price, product_id) VALUES
('Brand A', 100.50, 1),
('Brand B', 200.50, 2),
('Brand C', 300.50, 1),
('Brand D', 400.50, 2),
('Brand E', 500.50, 3),
('Brand F', 600.00, 3),
('Brand G', 700.50, 1);

/*Create some dummy sales*/
INSERT INTO mygraphile.sale (quantity, store_id, brand_id) VALUES
(5, 1, 1),
(4, 2, 2),
(3, 3, 3),
(2, 2, 4),
(1, 3, 7),
(2, 1, 6),
(3, 2, 5),
(4, 3, 1),
(9, 2, 3),
(7, 1, 4);

Dockerfile for GraphQL -

FROM node:alpine
LABEL description="Instant high-performance GraphQL API for your PostgreSQL database https://github.com/graphile/postgraphile"

RUN npm install -g postgraphile
RUN npm install -g @graphile/pg-aggregates

EXPOSE 5000
ENTRYPOINT ["postgraphile", "-n", "0.0.0.0"]


docker-compose.yaml

version: "3.3"
services:
  db:
    container_name: apu-postgraphile-db
    restart: always
    image: apu-postgraphile-db
    build:
      context: ./db
    volumes:
      - db:/var/lib/postgresql/data
    env_file:
      - ./.env
    networks:
      - network
    ports:
      - 5432
  graphql:
    container_name: apu-postgraphile-graphql
    restart: always
    image: apu-postgraphile-graphql
    build:
      context: ./graphql
    env_file:
      - ./.env
    depends_on:
      - db
    networks:
      - network
    ports:
      - 5433:5433
    command:
      [
        "--connection",
        "${DATABASE_URL}",
        "--port",
        "5433",
        "--schema",
        "mygraphile",
        "--append-plugins",
        "@graphile/pg-aggregates",
      ]

networks:
  network:

volumes:
  db:


package.json

{
  "name": "graphile-apps",
  "version": "0.0.1",
  "description": "api",
  "scripts": {
    "docker-build": "docker-compose build",
    "docker-run": "docker-compose up",
    "start": "npm run docker-build && npm run docker-run"
  },
  "engines": {
    "node": "15.x",
    "npm": "7.x"
  },
  "author": "apu",
  "license": "ISC",
  "dependencies": {
    "@graphile-contrib/pg-simplify-inflector": "^6.1.0",
    "db-migrate": "^0.11.12",
    "db-migrate-pg": "^1.2.2",
    "postgraphile": "^4.12.0-alpha.0",
    "@graphile/pg-aggregates": "^0.1.0"
  }
}


Here is the output that I get in GraphiQL:

[Screenshot from 2021-04-23: GraphiQL response showing null values for the aggregate fields]

Please let me know if I missed anything that caused the aggregates to return null values.

Too few types are exported in V5 plugin

Types here:

https://github.com/graphile/pg-aggregates/blob/v5/src/interfaces.ts

Not exported here:

https://github.com/graphile/pg-aggregates/blob/v5/src/index.ts

Definitely need AggregateSpec.

declare global {
  // eslint-disable-next-line @typescript-eslint/no-namespace
  namespace DataplanPg {
    interface PgCodecExtensions {
      isNumberLikeOrTimestamp: boolean;
    }
  }
  // FIXME: use GraphileBuild.Build extended by @graphile/pg-aggregates
  // eslint-disable-next-line @typescript-eslint/no-namespace
  namespace GraphileBuild {
    interface Build {
      pgAggregateSpecs: AggregateSpec[];
    }
  }
}

Not able to use pg-aggregates

I'm trying to use pg-aggregates but unfortunately I was not able to make it work.

The first problems occurred during installation; I'm not sure whether I did it right.
My approach to installation:

git clone https://github.com/graphile/pg-aggregates.git \
    && cd pg-aggregates \
    && yarn install \
    && yarn build

Execution:

postgraphile --connection postgres://<user>:<password>@<host>:<port>/<db> --schema public --watch --enhance-graphiql --append-plugins <path>/pg-aggregates/dist/index.js

postgraphile runs, but I'm not able to execute any aggregation query.

What I tried:

{
  allPayments {
    nodes {
      aggregates {
        sum {
          amount
        }
      }
    }
  }
}
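For comparison: the plugin adds aggregates to the connection itself, not to nodes, so (assuming an allPayments connection with a summable amount column, mirroring the README's own examples) the query would presumably need to look more like:

```graphql
{
  allPayments {
    aggregates {
      sum {
        amount
      }
    }
  }
}
```

Note also that aggregates only work with connections, not with "simple collections".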

Maybe @benjie you can provide an example of how to set up and use pg-aggregates.

Getting - Error: Option 'appendPlugins' should be an array of functions, found 'object' at index 0

This happens when using the "PgAggregatesPlugin" plugin.

Error: Option 'appendPlugins' should be an array of functions, found 'object' at index 0
    at ensureValidPlugins (/data/intrepid/postpraphile-int/node_modules/postgraphile-core/node8plus/index.js:18:19)
    at getPostGraphileBuilder (/data/intrepid/postpraphile-int/node_modules/postgraphile-core/node8plus/index.js:169:5)
    at Object.watchPostGraphileSchema (/data/intrepid/postpraphile-int/node_modules/postgraphile-core/node8plus/index.js:245:62)
    at createGqlSchema (/data/intrepid/postpraphile-int/node_modules/postgraphile/build/postgraphile/postgraphile.js:108:65)
    at getPostgraphileSchemaBuilder (/data/intrepid/postpraphile-int/node_modules/postgraphile/build/postgraphile/postgraphile.js:58:30)
    at postgraphile (/data/intrepid/postpraphile-int/node_modules/postgraphile/build/postgraphile/postgraphile.js:261:53)
    at file:///data/intrepid/postpraphile-int/src/index.js:10:5
    at ModuleJob.run (node:internal/modules/esm/module_job:154:23)
    at async Loader.import (node:internal/modules/esm/loader:166:24)
    at async Object.loadESM (node:internal/process/esm_loader:68:5)

The issue does not happen when using other plugins.
It might be related to the issue reported for the "pg-many-to-many" plugin here: https://github.com/graphile-contrib/pg-many-to-many/issues/64

The code for which I'm getting the exception.

import express from "express";
import { postgraphile } from 'postgraphile'
import ConnectionFilterPlugin from "postgraphile-plugin-connection-filter";
import PgAggregatesPlugin from "@graphile/pg-aggregates";
import PgManyToManyPlugin from '@graphile-contrib/pg-many-to-many'

const app = express();

app.use(
    postgraphile(
        process.env.DATABASE_URL || "postgres://XXX:YYYY*ZZZZZ@localhost:5432/TTTTT",
        "public",
        {
            watchPg: true,
            graphiql: true,
            enhanceGraphiql: true,
            extendedErrors: ['hint', 'detail', 'errcode'],
            appendPlugins: [PgAggregatesPlugin]
        }
    )
);

app.listen(process.env.PORT || 5000);

And library versions:

  "dependencies": {
    "@graphile-contrib/pg-many-to-many": "^1.0.1",
    "@graphile-contrib/pg-simplify-inflector": "^6.1.0",
    "@graphile/pg-aggregates": "^0.1.0",
    "@types/node": "^15.12.5",
    "express": "^4.17.1",
    "postgraphile": "^4.12.5",
    "postgraphile-plugin-connection-filter": "^2.2.2",
    "prettier": "^2.3.2",
    "typescript": "^4.3.5"
  }
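The error above usually means the plugin arrived as a module namespace object rather than a function; note that the README's CommonJS example uses require("@graphile/pg-aggregates").default for exactly this reason. A minimal, defensive unwrap could look like this (a sketch; `unwrapPlugin` is a hypothetical helper name, and `maybeModule` stands in for whatever the import yielded):

```javascript
// Unwrap a plugin that may arrive as { default: fn } due to ESM/CJS interop.
function unwrapPlugin(maybeModule) {
  return typeof maybeModule === "function" ? maybeModule : maybeModule.default;
}

// A stand-in for the imported plugin, to show both shapes the import can take.
const fakePlugin = () => {};
console.log(unwrapPlugin(fakePlugin) === fakePlugin); // true
console.log(unwrapPlugin({ default: fakePlugin }) === fakePlugin); // true
```

With that in place, `appendPlugins: [unwrapPlugin(PgAggregatesPlugin)]` would pass a function either way.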

Removing limitation of stringified `keys`

Summary

The values in the keys list are always stringified; this is a known limitation due to interactions with GraphQL.

How about returning keys as JSON? It would make consuming non-trivial keys (such as aggregations that return arrays) trivial on any frontend, compared to parsing Postgres-formatted strings.

I see that getting the right type for each aggregation + field variant is not easy, but JSON keys would be a change of just a couple of lines.
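To illustrate the pain point: consuming a stringified key that holds a Postgres array literal means every frontend has to write a parser. A deliberately minimal sketch (`parsePgArray` is a hypothetical helper; it handles only unquoted scalar elements, while real Postgres output can include quoting and escaping this ignores):

```javascript
// Parse a simple Postgres array literal such as '{1,2,3}' from a stringified key.
function parsePgArray(text) {
  if (text === null) return null;
  if (!text.startsWith("{") || !text.endsWith("}")) {
    throw new Error("not an array literal: " + text);
  }
  const inner = text.slice(1, -1); // strip '{' and '}'
  if (inner === "") return [];
  // Split on commas; Postgres renders SQL NULL elements as the bare token NULL.
  return inner.split(",").map((s) => (s === "NULL" ? null : s));
}

console.log(parsePgArray("{1,2,3}")); // [ '1', '2', '3' ]
console.log(parsePgArray("{}"));      // []
```

If keys were JSON instead, all of this would collapse to JSON.parse.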
