tvondra / sequential-uuids Goto Github PK

generator of sequential UUIDs

Home Page: https://blog.2ndquadrant.com/sequential-uuid-generators/

License: MIT License

Makefile 6.89% C 93.11%

sequential-uuids's Introduction

Sequential UUID generators

This PostgreSQL extension implements two UUID generators with sequential patterns, which helps to reduce random I/O patterns associated with regular entirely-random UUID.

Regular random UUIDs are distributed uniformly over the whole range of possible values. This results in poor locality when inserting data into indexes - all index leaf pages are equally likely to be hit, forcing the whole index into memory. With small indexes that's fine, but once the index size exceeds shared buffers (or RAM), the cache hit ratio quickly deteriorates.

Compare this to sequences and timestamps, which have a more sequential pattern and the new data almost always end up in the right-most part of the index (new sequence value is larger than all preceding values, same for timestamp). This results in a nicer and cache-friendlier behavior, but the values are predictable and may easily collide cross machines.

The main goal of the two generators implemented by this extension, is generating UUIDS in a more sequential pattern, but without reducing the randomness too much (which could increase the probability of collision and predictability of the generated UUIDs). This idea is not new, and it's pretty much what the UUID wikipedia article [1] calls COMB (combined-time GUID) and is more more thoroughly explained in [2].

The benefits (performance, reduced amount of WAL, ...) are demonstrated in a blog post on 2ndQuadrant site [3].

Generators

The extension provides two functions generating sequential UUIDs using either a sequence or timestamp.

uuid_sequence_nextval(sequence regclass, block_size int default 65536, block_count int default 65536)
uuid_time_nextval(interval_length int default 60, interval_count int default 65536) RETURNS uuid

The default values for parameters are selected to work well for a range of workloads. See the next section explaining the design for additional information about the meaning of those parameters.

Design

The easiest way to make UUIDs more sequential is to use some sequential value as a prefix. For example, we might take a sequence or a timestamp and add random data until we have 16B in total. The resulting values would be almost perfectly sequential, but there are two issues with it:

reduction of randomness - E.g. with a sequence producing bigint values this would reduce the randomness from 16B to 8B. Timestamps do reduce the randomness in a similar way, depending on the accuracy. This increases both the collision probability and predictability (e.g. it allows determining which UUIDs were generated close to each other, and perhaps the exact timestamp).
bloat - If the values only grow, this may result in bloat in indexes after deleting historical data. This is a well-known issue e.g. with indexes on timestamps in log tables.

To address both of these issues, the implemented generators are designed to wrap-around regularly, either after generating a certain number of UUIDs or some amount of time. In both cases, the UUIDs are generates in blocks and have the form of

(block ID; random data)

The size of the block ID depends on the number of blocks and is fixed (depends on generator parameters). For example with the default 64k blocks we need 2 bytes to store it. The block ID increments regularly, and eventually wraps around.

For sequence-based generators the block size is determined by the number of UUIDs generated. For example we may use blocks of 256 values, in which case the two-byte block ID may be computed like this:

(nextval('s') / 256) % 65536

So the generator wraps-around every ~16M UUIDs (because 256 * 65536).

For timestamp-based generators, the block size is defined as interval length, with the default value 60 seconds. As the default number of blocks is 64k (same as for sequence-based generators), the block may be computed like this

(timestamp / 60) % 65536

Which means the generator wraps around every ~45 days.

Supported Releases

Currently, this extension works only on releases since PostgreSQL 10. It can be made working on older releases with some minor code tweaks if someone wants to spend a bit of time on that.

[1] https://en.wikipedia.org/wiki/Universally_unique_identifier

[2] http://www.informit.com/articles/article.aspx?p=25862

[3] https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/

sequential-uuids's People

Contributors

Stargazers

Watchers

Forkers

maurojr 7u4 thinkwelltwd keke8273 dmoose zhangchye m-creations sehrope yourkeychen diegofsousa emersongaudencio weexp icodein tsingson m8e quantumghost

sequential-uuids's Issues

Consider using releases

Using GitHub releases makes it easier to download specific "blessed" versions of the source than it is checking out commits.

Random portion should be using cryptographic randomness

The random part of the sequential UUID is being generated with non-cryptographic random(). It should be using something like pg_strong_random(...) instead:

diff --git a/sequential_uuids.c b/sequential_uuids.c
index ae5dbac..275d0f4 100644
--- a/sequential_uuids.c
+++ b/sequential_uuids.c
@@ -149,12 +149,10 @@ uuid_time_nextval(PG_FUNCTION_ARGS)
        for (i = 0; i < prefix_bytes; i++)
                uuid->data[i] = p[prefix_bytes - 1 - i];
 
-       /*
-        * TODO optimize this by using larger chunks of the random value
-        * (it should be 4B in most cases)
-        */
-       for (i = prefix_bytes; i < UUID_LEN; i++)
-               uuid->data[i] = (random() % 256);
+       if(!pg_strong_random(uuid.data + prefix_bytes, UUID_LEN - prefix_bytes))
+               ereport(ERROR,
+                               (errcode(ERRCODE_INTERNAL_ERROR),
+                                errmsg("could not generate random values")));
 
        PG_RETURN_UUID_P(uuid);
 }

May want to consider setting the v4 UUID version flags as well:

https://github.com/postgres/postgres/blob/517bf2d9107f0d45c5fea2e3904e8d3b10ce6bb2/src/backend/utils/adt/uuid.c#L430-L435

I've never ran into anything that cares about the UUID version flags and it makes the values slightly less random since you need to replace those bytes, but worth considering to make it more spec friendly.

Pure pgSQL implementation

Hello, I created a pure pgSQL implementation of the time-based generation method.

Hopefully this is useful for those of us using a service like AWS RDS where we can't install C-based extensions.

It doesn't support customizing interval_count, but that should be trivial to add.

CREATE OR REPLACE FUNCTION generate_sequential_uuid(p_interval_length int DEFAULT 60)
  RETURNS uuid
  LANGUAGE plpgsql
AS $$
DECLARE
  v_i int;
  v_time bigint;
  v_bytes int[16] = '{}';
  v_hex text[16] = '{}';
BEGIN
  v_time := floor(extract(epoch FROM clock_timestamp()) / p_interval_length);
  v_bytes[1] := v_time >> 8 & 255;
  v_bytes[2] := v_time & 255;

  FOR v_i IN 3..16 LOOP
    v_bytes[v_i] := floor(random() * 256);
  END LOOP;

  FOR v_i IN 1..16 LOOP
    v_hex[v_i] := lpad(to_hex(v_bytes[v_i]), 2, '0');
  END LOOP;

  RETURN array_to_string(v_hex, '');
END $$;

simple docker image with sequential-uuids

For personal use, I made a quite simple docker image, just adding this extension to the official postgres docker image (12, 12-alpine, 13, 13-alpine).

For anyone who might be interested, pls check https://hub.docker.com/r/yuuuxt/postgres_seq_uuid

Add to core ?

Hello,

Thanks for your work on this subject !

Do you plan to propose to postgres hackers as a core functionality or contrib ?

I am afraid, if it is not in the core or contrib, user will be reluctant to use this. We want to be sure this feature will be available in the future, and we are not tied to an extension.

Thanks!

Documentation about Install and usage

The documentation left me puzzled about how to use it in practice:
[documentation]
Prerequires:
postgresql-server-dev-11
Install:
git clone https://github.com/tvondra/sequential-uuids.git
cd sequential-uuids/
make
sudo make install
inside PostgreSQL:
CREATE EXTENSION sequential-uuids;
[/documentation]

The really puzzling part from my side is that I have no clue what to do next in order to get a sequenced UUID into my table!

db=# CREATE TABLE names(id uuid, name varchar(40), PRIMARY KEY (id) );
db=# INSERT INTO names VALUES (uuid_generate_v4(),'Abraham Lincon');
INSERT 0 1
db=# select * from names;
id | name
--------------------------------------+----------------
1a9d00a2-670f-4a8a-8c8b-eea45952411a | Abraham Lincon
(1 row)

db=# INSERT INTO names VALUES (uuid_sequence_nextval('1a9d00a2-670f-4a8a-8c8b-eea45952411a'),'Some Otherdude');
ERROR: relation "1a9d00a2-670f-4a8a-8c8b-eea45952411a" does not exist
LINE 1: INSERT INTO names VALUES (uuid_sequence_nextval('1a9d00a2-67...

Or it the first uuid inserted in a wrong way? 00000000-0000-0000-0000-000000000000 doesn't work either.