Data Access Generator

Purpose and Features

Use this package in your software build process to generate SQL/JSON SELECT queries and matching result types. SQL INSERT/UPDATE/DELETE ("mod") statements can also be generated.

Queries

The SQL generated by Dagen employs the SQL/JSON features of supporting databases so that data from a graph of related tables can be fetched in a single query operation, resulting in one or more rows of json values. Matching Java types can be generated with the SQL queries to which query results can be directly deserialized using common libraries such as Jackson for Java. This tool has no runtime component. The generated SQL and Java types can be easily used directly with JDBC and a JSON deserialization library.

Queries are described in a yaml file which should conform to the json schema at example/editor-config/query-specs-schema.json. It's recommended to configure your editor if possible to associate the query specification file with this json schema, to enable suggestions/auto-complete and highlighting of format errors.

Query specifications file

The query specifications yaml file describes one or more queries which are to be generated.

Schema qualification fields

defaultSchema: <schema name> The default schema specified here will be the assumed schema for and unqualified table names that appear elsewhere in the file.

generateUnqualifiedNamesForSchemas: [<schema names, comma-separated>] Any schema in this list will have its objects represented in the generated SQL as unqualified names. Make sure that such unqualified names will be resolved correctly when the SQL statements are executed on your database connections, such as by connecting directly to the schema listed here (if just one), or by having these schemas on your schema search path if supported by your database.

Query specifications

Under the querySpecs key are listed any number of query specifications for which SQL and optionally types will be generated.

General query description

  - queryName: drugs query
    resultRepresentations: [JSON_OBJECT_ROWS] # (and/or JSON_ARRAY_ROW, MULTI_COLUMN_ROWS)
    generateResultTypes: true

queryName is the name of the query, from which the name of the SQL resource file is derived, as well as the Java class name containing result types and query parameter information. A name of 'my favorite query' for example will generate a SQL resource file named 'my favorite query().sql' (results representations are described below). If types generation is enabled then a Java class named MyFavoriteQuery is also produced. The Java class also contains the name of the generated resource file in a static member, so in code it's best to obtain the SQL resource file name from this class member.

resultRepresentation is a list of results representations to be generated, with one SQL file produced for each representation. The choices are as follows:

JSON_OBJECT_ROWS (the default)

In this representation, each result record of the top-most table in the query is represented by a result row having one json-valued column. Within each json value will be any fields specified in the top table and any nested data from rows of other tables which are related to the given row as specified in the query specification.
JSON_ARRAY_ROW

With this representation, the SQL query will yield a single json array value in result set of just one row and column. Within the json array are json objects representing all result rows of the top level table, together with any nested data specified from related tables as specified in the query specification.
MULTI_COLUMN_ROWS

In this representation, the generated SQL can yield multiple rows and multiple columns. The columns of the result rows are those selected from the top level table itself plus any columns representing related parent and child records.

generateResultTypes This field controls whether to generate source code for result types for this query. Defaults to true.

Table JSON specification

Next we describe the top-level table for the query. TODO

Motivation

Fetch nested data spanning any number of related tables via a single query

The generated queries employ the SQL/JSON capabilities of supported databases, currently PostgreSQL and Oracle, to allow fetching data from any number of related tables in a single query operation. Queries can be configured to either return data as a single aggregate json array containing all results, or streamed via multiple result rows.

Result type generation

Types can also be generated for each query which will match the query results structure precisely, including only the fields actually selected from source tables in each specific part of the query. Field nullability is also represented in generated types, as derived from database metadata. The query results can be directly serialized to the top result type using common libraries, such as Jackson in Java, or via direct cast in TypeScript.

Find errors in data access code at build time

The database metadata which is generated by the tool allows for validation of any table and field names and table relationships referenced in a query or modification statement specification. This enusures that the queries and types generated are consistent with the state of the database structure, with failures indicated at build time. An additional level of safety is afforded by accessing query results only via the auto-generated result types which are guaranteed to match the actual query results.

Parameter safety

Both queries and modification statements can be parameterized. For the common case that parameters found in these statements are bound directly to table fields, for either comparison in the case of queries, or to set/insert data for mod statements, such parameters are declared in generated source code as constants with names derived from the field name. This makes it easy to keep parameters valid vs the fields they target as changes to either the database or the query/mod statements are made, with most errors identified at statement generation time.

Why not just use joins to fetch related table data?

Joins alone are not the right tool for fetching data for the important case that data from independent child tables of a parent table are needed in a query. A join of a parent with independent child tables joined to the parent yields results representing all combinations of the rows of the independent child tables, whereas ideally we only need to fetch a number of rows equal to the sum of the numbers of child rows related to the parent. Likewise the join approach would produce a large amount of duplicated data (which occures to a lesser degree even with a simple parent/child join), which would have to be "de-convolved" in the receiving client from the combinations introduced by the cartesion product operation. Using joins for such a case would be inefficient, not to mention tedious and error-prone - definitely not the right tool for this job!

Instead, we can fetch the child collections as sub-selects of the parent query, using aggregating SQL/JSON functions such as json_arrayagg() to collect related values, and utilizing json_object() to make the json objects themselves. This method works wonderfully to fetch nested data, but the queries can be difficult to write and to verify to be correct manually. It's one of the primary purposes of this library to generate these kinds of queries easily and matching types, using a simple specification format which is checked against database metadata for correctness.

Workflow

Generate database metadata, stored in a json or yaml file. This should be done whenever the database has changes that should be incorporated.
Create a queries specification file, describing for each query a starting or "top" table, its fields to be included, any parent and child tables of the top table, fields and parent and child tables of those parent and child tables, and so on to any depth.
Run the tool specifying the above two files and the output directories for the generated SQL and source code files. This step can be run as part of the application build process.
At application run time, load the SQL resource file for a query, submit to the database, specifying any embedded parameter values, and deserialize results to the generated top level type for the query.

Example

TODO: diagram showing independent child tables N1 + N2 + N3 results for SQL/JSON N1 * N2 * N3 from a join.

dd482it / dagen-wpi-2 Goto Github PK

dagen-wpi-2's Introduction