Table of Contents generated with DocToc
- Purpose
- Value Producers and the Cache
- API
- Complete Demo
- Updates
- To Do
## Purpose

InterShop Lazy provides a way to (almost) transparently store the results of costly computations in a table for later retrieval. See the demos below.
## Value Producers and the Cache

- Lazy value producer: a function generated by calling `LAZY.create_lazy_producer()`. It will try to match its arguments against cached entries in table `LAZY.cache`; if it can't find an entry, it will call its associated eager value producer with the same arguments.
- Eager value producer: a function that will either return a value to be inserted into the cache, or else insert zero or more values itself, depending on which solution is better suited to your use case.
- Value producers are assumed to be 'immutable' functions in the PostgreSQL sense, i.e. 'pure functions' in the functional sense. In other words, it is assumed that the return value of a value producer depends on nothing but its arguments. Therefore, functions that return random values or values that depend on implicit variables cannot be cached. But observe that caching file contents may be OK if file contents can be treated as constant for the purposes of your application.
- It is not allowed to update `LAZY.cache`; only insertions and deletions are legal.
- Because the lazy value producer must be able to insert records into the cache, it has to be marked `volatile`, and this entails that PostgreSQL will not cache its results as it would try to do with functions marked `immutable`. That is, using InterShop Lazy forgoes one cache mechanism in favor of another, and one that is probably slower. The advantage of InterShop Lazy, though, is that data can be cached across sessions, which `immutable` functions won't do. Be it said that none of the demos constitutes a valid use case as far as value production cost is concerned.
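For orientation, `LAZY.cache` acts as a bucket/key/value store. The following re-creation is a hypothetical sketch for illustration only; the actual DDL ships with InterShop Lazy and may differ (note also that per the v1.0.0 update, newer versions store values as `text` rather than `jsonb`):

```sql
-- Hypothetical sketch of the cache table's shape as used in this README;
-- the actual DDL ships with InterShop Lazy and may differ.
create table LAZY.cache (
  bucket  text   not null,  -- namespace; defaults to the lazy producer's name
  key     jsonb  not null,  -- by default a JSON array of the call arguments
  value   jsonb,            -- cached result; may be JSONB null
  primary key ( bucket, key ) );
```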
## API

`LAZY.create_lazy_producer()` (returns `void`) will create a function that uses table `LAZY.cache` to produce values in a lazy fashion. Its arguments are:

- `function_name` (`text`)—name of the function to be created.
- `parameter_names` (`text[]`)—names of the arguments to the getter.
- `parameter_types` (`text[]`)—types of the arguments to the getter.
- `return_type` (`text`)—applied to the `jsonb` value.
- `bucket` (`text default null`)—name of the bucket; defaults to `function_name`.
- `get_key` (`text default null`)—optional; the default is a JSON list / object of the argument values.
- `get_update` (`text default null`)—optional; exactly one of this and `perform_update` must be given.
- `perform_update` (`text default null`)—optional; exactly one of this and `get_update` must be given.
Points to keep in mind:

- All names used in calls to `create_lazy_producer()` will be used as-is, without any kind of sanity check or quoting. The same goes for the other arguments.
- Usage of `create_lazy_producer()` is therefore inherently unsafe; no untrusted data (such as data coming from a web form) should be used to call this function (although the function that `create_lazy_producer()` creates is itself deemed safe).
`LAZY.nullify( jsonb ) returns jsonb`—Given `null` or any `jsonb` value, returns `null` when the input is `null` or `'null'::jsonb` (i.e. the `null` value of JSONB, which is distinct from SQL `null`), or the value itself otherwise. This method helps to prevent errors like `cannot cast jsonb null to type x`: instead of `( key->0 )::integer`, write `( LAZY.nullify( key->0 ) )::integer` to obtain SQL `null` in cases where a (subvalue of a) cache column might contain `null` values.
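Conceptually, `LAZY.nullify()` can be sketched as follows; this is a hypothetical re-implementation for illustration (under a made-up name), not the actual definition that ships with InterShop Lazy:

```sql
-- Sketch of a nullify helper: map JSONB null to SQL null, pass everything
-- else through unchanged. Illustration only; the real LAZY.nullify() may differ.
create function MYSCHEMA.nullify_sketch( ¶x jsonb ) returns jsonb
  immutable called on null input language sql as $$
    select case when ¶x = 'null'::jsonb then null else ¶x end; $$;

-- MYSCHEMA.nullify_sketch( 'null'::jsonb )::integer  -- SQL null, no cast error
-- MYSCHEMA.nullify_sketch( '42'::jsonb )::integer    -- 42, passed through
```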
## Complete Demo

The code below can be seen in action by running `psql -f lazy.demo-1.sql` and `psql -f lazy.demo-2.sql`.
Again, none of the demos below constitutes a valid use case as far as value production cost is concerned. To amortize the use of a Lazy cache, the computation will typically involve things like reading from the file system or the network, or some kind of lengthy data aggregation.
In this demo, we want to write two lazy value producers that compute multiples of floats and sums of integers. For the first, we will use an eager value getter that produces only one value per call; for the sums, we will have a look at an eager value inserter that guesses values that might be used in the future and inserts them into the cache table for later use. Additionally, we will set up custom cache views that make read-only access to cached values easier than looking at the cache table directly. So let's get started.
The first thing to do when one wants to use lazy evaluation with InterShop Lazy is to provide a function that accepts arguments as required for the task at hand and that returns a value `v` of type `T` that will be serialized as `v::text` and deserialized as `v::T`.
Of course, using lazy evaluation only makes sense when one is dealing with costly computations, so typically an eager value producer would involve things like network access, sifting through huge tables, or maybe reading in data files: that kind of IO- or CPU-heavy work. To keep things simple, let's just multiply numbers and throw in the quirk that a first argument of `13` will produce a `null` value for no obvious reason. For good measure, we also want to report any calls to the console, which is what the `raise notice` statement is for:
```sql
create function MYSCHEMA.compute_product( ¶n float, ¶factor float )
  returns float immutable called on null input language plpgsql as $$ begin
    raise notice 'MYSCHEMA.compute_product( %, % )', ¶n, ¶factor;
    if ( ¶n is null ) or ( ¶factor is null ) then return 0; end if;
    if ¶n != 13 then return ¶n * ¶factor; end if;
    return null; end; $$;
```
This function is called an eager value producer because it is expected to actually compute a result each time it gets called. Observe that we have defined it as `immutable called on null input`, meaning that it will be called even if one of its arguments is `null`; we exploit this to return `0` as the product whenever one of the factors is SQL `null`. Had we used `strict` instead, PostgreSQL would have skipped calling the function altogether and filled in a `null`, which may or may not be what you want.
Now that we have an eager value producer, let's define a lazy value producer that uses results from the `LAZY.cache` table where possible and manages updating it where values are missing. To do this, we call `LAZY.create_lazy_producer()`:
```sql
select LAZY.create_lazy_producer(
  function_name   => 'MYSCHEMA.get_product',
  parameter_names => '{¶n,¶factor}',
  parameter_types => '{float,float}',
  return_type     => 'float',
  get_update      => 'MYSCHEMA.compute_product' );
```
Observe that the `select` statement will both create `MYSCHEMA.get_product()` and return the source text of that new function, which may end up in the console or wherever your SQL output goes; you might therefore want to use `do $$ begin perform LAZY.create_lazy_producer( ... ); end; $$;` instead.
The first argument here is the name of the new function to be created; the next three arguments basically repeat the declarative part of that function (a future version of InterShop Lazy might auto-generate `parameter_names`, `parameter_types` and `return_type`).
There's just one more required argument, either `get_update` or `perform_update`; exactly one of these two must be set to the name of a function that:

- in the case of `get_update()`, will return exactly one result for each set of inputs, or
- in the case of `perform_update()`, may insert as many rows into `LAZY.cache` as seen fit when called. In case `perform_update()` happens not to produce a result row that matches the input arguments, a row with result `null` will be auto-generated.
We're now ready to put our caching, lazy multiplication device to use. For this, we set up a table of factors and update it with the computation results:
```sql
create table MYSCHEMA.fancy_products (
  n       float,
  factor  float,
  result  float );

insert into MYSCHEMA.fancy_products ( n, factor ) values
  (  123,  456 ),
  (    4,   12 ),
  (    5,   12 ),
  (    6,   12 ),
  (    6,   12 ),
  (    6,   12 ),
  (    6,   12 ),
  (    6,   12 ),
  (  6.3,   12 ),
  (   60,    3 ),
  (   13,   13 ),
  (    1, null ),
  ( null, null ),
  ( null,  100 );

update MYSCHEMA.fancy_products set result = MYSCHEMA.get_product( n, factor );

select * from MYSCHEMA.fancy_products order by n, factor;
select * from LAZY.cache order by bucket, key;
```
This will produce the following output:
```
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( 123, 456 )
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( 4, 12 )
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( 5, 12 )
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( 6, 12 )
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( 6.3, 12 )
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( 60, 3 )
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( 13, 13 )
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( 1, <NULL> )
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( <NULL>, <NULL> )
psql:lazy.demo-1.sql:76: NOTICE:  MYSCHEMA.compute_product( <NULL>, 100 )
╔═════╤════════╤════════╗
║  n  │ factor │ result ║
╠═════╪════════╪════════╣
║   1 │      ∎ │      0 ║
║   4 │     12 │     48 ║
║   5 │     12 │     60 ║
║   6 │     12 │      ∎ ║
║   6 │     12 │      ∎ ║
║   6 │     12 │      ∎ ║
║   6 │     12 │     72 ║
║   6 │     12 │      ∎ ║
║ 6.3 │     12 │   75.6 ║
║  13 │     13 │      ∎ ║
║  60 │      3 │    180 ║
║ 123 │    456 │  56088 ║
║   ∎ │    100 │      0 ║
║   ∎ │      ∎ │      0 ║
╚═════╧════════╧════════╝
╔══════════════════════╤══════════════╤═══════╗
║        bucket        │     key      │ value ║
╠══════════════════════╪══════════════╪═══════╣
║ MYSCHEMA.get_product │ [null, null] │ 0     ║
║ MYSCHEMA.get_product │ [null, 100]  │ 0     ║
║ MYSCHEMA.get_product │ [1, null]    │ 0     ║
║ MYSCHEMA.get_product │ [4, 12]      │ 48    ║
║ MYSCHEMA.get_product │ [5, 12]      │ 60    ║
║ MYSCHEMA.get_product │ [6, 12]      │ 72    ║
║ MYSCHEMA.get_product │ [6.3, 12]    │ 75.6  ║
║ MYSCHEMA.get_product │ [13, 13]     │ ∎     ║
║ MYSCHEMA.get_product │ [60, 3]      │ 180   ║
║ MYSCHEMA.get_product │ [123, 456]   │ 56088 ║
╚══════════════════════╧══════════════╧═══════╝
```
The above shows that although some inputs were repeated in the `fancy_products` table, none of the repetitions led to additional calls to the eager producer or to additional entries in the cache.
In this demo, let me demonstrate how to write a speculative value producer, that is, a producer that does more work when called than is strictly necessary in order to avoid getting called more often. Such behavior might shave off some computation time when the act of computing a single value carries a high overhead cost that remains more or less constant no matter how many values are produced, and when we either have a way to somehow predict what other values might get requested in the future given a set of inputs, or can somehow make sure a given subdomain of values is exhaustively cached. For example, there might be a configuration file that must be loaded and parsed in order to obtain a single configuration setting from the file; in such a case, it might be advantageous to cache all the settings from the file whenever any setting is requested, so that future setting requests can be answered by a table lookup.
The eager value producer of the first example was an immutable function that returned a single value. In order to do speculative caching, we'll need a volatile function that updates the cache directly. In its most basic form, such a cache-updating function might look like this:
```sql
create function MYSCHEMA.insert_sums_single_row( ¶a integer, ¶b integer )
  returns void volatile called on null input language plpgsql as $$ begin
    raise notice 'MYSCHEMA.insert_sums( %, % )', ¶a, ¶b;
    insert into LAZY.cache ( bucket, key, value ) values
      ( 'yeah! sums!', jsonb_build_array( ¶a, ¶b ), ¶a + ¶b );
    --  ^^^^ bucket    ^^^^ key                     ^^^^ value
    end; $$;
```
This is only the basic shape, however, and will only insert a single row, which, as such, is not very speculative. Here is an improved version that guesses a few sums that might pop up in the future: given `[ a, b ]`, it will compute `a + b - 1`, `a + b`, and `a + b + 1`, minus those sums that already have a cache entry:
```sql
create function MYSCHEMA.insert_sums( ¶a integer, ¶b integer )
  returns void volatile called on null input language plpgsql as $$ begin
    raise notice 'MYSCHEMA.insert_sums( %, % )', ¶a, ¶b;
    insert into LAZY.cache ( bucket, key, value ) select
        'yeah! sums!'  as bucket,
        r2.key         as key,
        ¶a + r1.bb     as value
      from generate_series( ¶b - 1, ¶b + 1 ) as r1 ( bb ),
      lateral jsonb_build_array( ¶a, r1.bb ) as r2 ( key )
      where not exists ( select 1 from LAZY.cache as r4
        where ( r4.bucket = 'yeah! sums!' ) and ( r4.key = r2.key ) );
    end; $$;
```
Given `insert_sums()`, we can now create its lazy version, `get_sum()`:
```sql
select LAZY.create_lazy_producer(
  function_name   => 'MYSCHEMA.get_sum',
  parameter_names => '{¶a,¶b}',
  parameter_types => '{integer,integer}',
  return_type     => 'integer',
  bucket          => 'yeah! sums!',
  perform_update  => 'MYSCHEMA.insert_sums' );
```
Just as in the first demo, let's use a table to store results. This time round, we generate the data:
```sql
create table MYSCHEMA.fancy_sums (
  a       integer,
  b       integer,
  result  integer );

insert into MYSCHEMA.fancy_sums ( a, b )
  select 7, b from generate_series( 1, 10 ) as x ( b );

update MYSCHEMA.fancy_sums set result = MYSCHEMA.get_sum( a, b );

select * from LAZY.cache order by bucket, key;
```
And this is the output; notice that while the cache has gained 11 entries, only 5 calls to the eager producer were necessary:
```
psql:lazy.demo-2.sql:72: NOTICE:  MYSCHEMA.insert_sums( 7, 1 )
psql:lazy.demo-2.sql:72: NOTICE:  MYSCHEMA.insert_sums( 7, 3 )
psql:lazy.demo-2.sql:72: NOTICE:  MYSCHEMA.insert_sums( 7, 5 )
psql:lazy.demo-2.sql:72: NOTICE:  MYSCHEMA.insert_sums( 7, 7 )
psql:lazy.demo-2.sql:72: NOTICE:  MYSCHEMA.insert_sums( 7, 9 )
╔═════════════╤═════════╤═══════╗
║   bucket    │   key   │ value ║
╠═════════════╪═════════╪═══════╣
║ yeah! sums! │ [7, 0]  │ 7     ║
║ yeah! sums! │ [7, 1]  │ 8     ║
║ yeah! sums! │ [7, 2]  │ 9     ║
║ yeah! sums! │ [7, 3]  │ 10    ║
║ yeah! sums! │ [7, 4]  │ 11    ║
║ yeah! sums! │ [7, 5]  │ 12    ║
║ yeah! sums! │ [7, 6]  │ 13    ║
║ yeah! sums! │ [7, 7]  │ 14    ║
║ yeah! sums! │ [7, 8]  │ 15    ║
║ yeah! sums! │ [7, 9]  │ 16    ║
║ yeah! sums! │ [7, 10] │ 17    ║
╚═════════════╧═════════╧═══════╝
```
Note that in case an eager, inserting value producer does not insert a value for a given requested key, the lazy value producer will auto-generate a row with a `null` value. This behavior may become configurable in a future version.
Since it may sometimes be useful to have a view on the data already cached by a given lazy producer, here's how to do it. It is basically straightforward: the key is by default formed by building a `jsonb` array of the arguments (although one can define one's own keying method); likewise, values are stored as `jsonb` values, so these have to be converted back to the intended data type. This is not difficult to do, except that `jsonb` has its own `null` value which, unlike SQL `null`, cannot be cast; this is what `LAZY.nullify()` is for:
```sql
create view MYSCHEMA.products as ( select
    ( LAZY.nullify( key->0 ) )::float  as n,
    ( LAZY.nullify( key->1 ) )::float  as factor,
    ( value )::float                   as product
  from LAZY.cache
  where bucket = 'MYSCHEMA.get_product'
  order by n desc, factor desc );
```
```
╔═════╤════════╤═════════╗
║  n  │ factor │ product ║
╠═════╪════════╪═════════╣
║   ∎ │      ∎ │       0 ║
║   ∎ │    100 │       0 ║
║ 123 │    456 │   56088 ║
║  60 │      3 │     180 ║
║  13 │     13 │       ∎ ║
║ 6.3 │     12 │    75.6 ║
║   6 │     12 │      72 ║
║   5 │     12 │      60 ║
║   4 │     12 │      48 ║
║   1 │      ∎ │       0 ║
╚═════╧════════╧═════════╝
```
- Composite Values—Sometimes one will want to cache composite data types. This is fine with InterShop Lazy because composite user-defined types (`create type xxx as ( field1 type1, field2 type2 )`) are serializable to text just like atomic values. However, when deserializing these values, an extra pair of parentheses has to be used (as in `( value::xxx ).field1`) because of restrictions in PostgreSQL's grammar. See `lazy.demo-3.sql` for an example.
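To make the parentheses rule concrete, here is a tiny, self-contained sketch; the type `MYSCHEMA.minmax` is made up for this illustration and is not part of the demo files:

```sql
-- Hypothetical composite type, made up for this illustration.
create type MYSCHEMA.minmax as ( min_value float, max_value float );

-- Serializing to text is a plain cast:
select row( 1, 2 )::MYSCHEMA.minmax::text;      -- '(1,2)'

-- Deserializing a single field needs the extra pair of parentheses;
-- '(1,2)'::MYSCHEMA.minmax.max_value would be a syntax error.
select ( '(1,2)'::MYSCHEMA.minmax ).max_value;  -- 2
```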
## Updates

- v1.0.0—Values are now stored as `text` rather than as `jsonb`, the reasons being that:
  - all datatypes have a `text` serialization (this is mandatory, as DB dumps could otherwise not be produced), but not all datatypes can be expressed in JSON in a straightforward way, meaning customized conversion functions `mytype_from_jsonb()` / `mytype_to_jsonb()` are needed where otherwise simple type casts `myvalue::text` / `textvalue::mytype` would be sufficient.
  - while querying `jsonb` values is much more flexible, one will typically search over keys, not results; where querying against result values is needed, one can still cast the `text` values (e.g. in a (materialized) view).
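Assuming values are stored as `text` (per v1.0.0), a query against result values might cast them back, along these lines (a sketch; the bucket name is taken from the first demo):

```sql
-- Sketch: filter cached results by casting the text-serialized value back
-- to its original type; for frequent use, wrap this in a (materialized) view.
select key, value::float as product
  from LAZY.cache
  where bucket = 'MYSCHEMA.get_product'
    and value is not null
    and value::float > 100;
```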
## To Do

- Documentation
- Tests
- `feat:txtcache`—consider to optionally use a text-based cache (dropped in preference for `feat:remgetupdate`, `feat:dedicache`)
- `feat:hashidx`—implement Hash indexes on `bucket`, `key` (only interested in equality, otherwise might use BTree)
- `feat:remgetupdate`—remove parameter `get_update`, always use `perform_update` (rename to `insert_rows` or `producer`, as the function will not perform SQL `update`s but `insert`s)
- `feat:remgetkey`—remove parameter `get_key`? This would naturally fall out of `feat:dedicache`
- `feat:dedicache`—consider to always generate a dedicated cache table with the appropriate columns as per the value producer's arguments, including an index; in that case, may also move from using a serialized-value column to using any number of result columns (which could be indexed if need be, but that should be done manually by the user)
  - `feat:dedicache/return_type`—abolish; should use `input_names`, `input_types`, `output_names`, `output_types` (or else `key_names`, ..., `value_names`, ...). Input and output names become the names of the columns of the dedicated cache table. Insofar, it may be advantageous to adopt `feat:dedicache/optcreatecache:never`, i.e. require the user to set up the dedicated cache table, then just pass in the name of that table; we can then derive the input parameters from the primary key of that table
  - `feat:dedicache/optcreatecache`—should the cache table never / optionally / always be created automatically? Could require the user to manually create the cache and only provide a generated function that simplifies using that cache. Start by implementing only the case `create_cache = false`, allow `create_cache = true` later
  - `feat:dedicache/docdeforder`—document that the producer must be defined only after `create_lazy_producer()` has been called and the cache table established; otherwise the producer would refer to a non-existent DB object