Git Product home page Git Product logo

cuid2's Introduction

Cuid2

Secure, collision-resistant ids optimized for horizontal scaling and performance. Next generation UUIDs.

Need unique ids in your app? Forget UUIDs and GUIDs which often collide in large apps. Use Cuid2, instead.

Cuid2 is:

  • Secure: It's not feasible to guess the next id, existing valid ids, or learn anything about the referenced data from the id. Cuid2 uses multiple, independent entropy sources and hashes them with a security-audited, NIST-standard cryptographically secure hashing algorithm (Sha3).
  • Collision resistant: It's extremely unlikely to generate the same id twice (by default, you'd need to generate roughly 4,000,000,000,000,000,000 ids (sqrt(36^(24-1) * 26) = 4.0268498e+18) to reach 50% chance of collision.)
  • Horizontally scalable: Generate ids on multiple machines without coordination.
  • Offline-compatible: Generate ids without a network connection.
  • URL and name-friendly: No special characters.
  • Fast and convenient: No async operations. Won't introduce user-noticeable delays. Less than 5k, gzipped.
  • But not too fast: If you can hash too quickly you can launch parallel attacks to find duplicates or break entropy-hiding. For unique ids, the fastest runner loses the security race.

Cuid2 is not good for:

  • Sequential ids (see the note on K-sortable ids, below)
  • High performance tight loops, such as render loops (if you don't need cross-host unique ids or security, consider a simple counter for this use-case, or try Ulid or NanoId).

Getting Started

npm install --save @paralleldrive/cuid2

Or

yarn add @paralleldrive/cuid2
import { createId } from '@paralleldrive/cuid2';

const ids = [
  createId(), // 'tz4a98xxat96iws9zmbrgj3a'
  createId(), // 'pfh0haxfpzowht3oi213cqos'
  createId(), // 'nc6bzmkmd014706rfda898to'
];

Using Jest? Jump to Using with Jest.

Configuration

import { init } from '@paralleldrive/cuid2';

// The init function returns a custom createId function with the specified
// configuration. All configuration properties are optional.
const createId = init({
  // A custom random function with the same API as Math.random.
  // You can use this to pass a cryptographically secure random function.
  random: Math.random,
  // the length of the id
  length: 10,
  // A custom fingerprint for the host environment. This is used to help
  // prevent collisions when generating ids in a distributed system.
  fingerprint: 'a-custom-host-fingerprint',
});

console.log(
  createId(), // wjfazn7qnd
  createId(), // cerhuy9499
  createId(), // itp2u4ozr4
);

Validation

import { createId, isCuid } from '@paralleldrive/cuid2';


console.log(
  isCuid(createId()), // true
  isCuid('not a cuid'), // false
);

Trusted By

Why?

Ids should be secure by default for the same reason that browser sessions should be secure by default. There are too many things that can go wrong when they're not, and insecure ids can cause problems in unexpected ways, including unauthorized user account access, unauthorized access to user data, and accidental leaks of user's personal data which can lead to catastrophic effects, even in innocent-sounding applications like fitness run trackers (see the 2018 Strava Pentagon breach and PleaseRobMe).

Not all security measures should be considered equal. For example, it's not a good idea to trust your browser's "Cryptographically Secure" Psuedo Random Number Generator (CSPRNG) (used in tools like uuid and nanoid). For example, there may be bugs in browser CSPRNGs. For many years, Chromium's Math.random() wasn't very random at all. Cuid was created to solve the issue of untrustworthy entropy in id generators that led to frequent id collisions and related problems in production applications. Instead of trusting a single source of entropy, Cuid2 combines several sources of entropy to provide stronger security and collision-resistance guarantees than are available in other solutions.

Modern web applications have different requirements than applications written in the early days of GUID (globally unique identifiers) and UUIDs (universally unique identifiers). In particular, Cuid2 aims to provide stronger uniqueness guarantees than any existing GUID or UUID implementation and protect against leaking any information about the data being referenced, or the system that generated the id.

Cuid2 is the next generation of Cuid, which has been used in thousands of applications for over a decade with no confirmed collision reports. The changes in Cuid2 are significant and could potentially disrupt the many projects that rely on Cuid, so we decided to create a replacement library and id standard, instead. Cuid is now deprecated in favor of Cuid2.

Entropy is a measure of the total information in a system. In the context of unique ids, a higher entropy will lead to fewer collisions, and can also make it more difficult for an attacker to guess a valid id.

Cuid2 is made up of the following entropy sources:

  • An initial letter to make the id a usable identifier in JavaScript and HTML/CSS
  • The current system time
  • Pseudorandom values
  • A session counter
  • A host fingerprint

The string is Base36 encoded, which means it contains only lowercase letters and the numbers: 0 - 9, with no special symbols.

Horizontal scalability

Today's applications don't run on any single machine.

Applications might need to support online / offline capability, which means we need a way for clients on different hosts to generate ids that won't collide with ids generated by other hosts -- even if they're not connected to the network.

Most pseudo-random algorithms use time in ms as a random seed. Random IDs lack sufficient entropy when running in separate processes (such as cloned virtual machines or client browsers) to guarantee against collisions. Application developers report v4 UUID collisions causing problems in their applications when the ID generation is distributed between lots of machines such that lots of IDs are generated in the same millisecond.

Each new client exponentially increases the chance of collision in the same way that each new character in a random string exponentially reduces the chance of collision. Successful apps scale at hundreds or thousands of new clients per day, so fighting the lack of entropy by adding random characters is a recipe for ridiculously long identifiers.

Because of the nature of this problem, it's possible to build an app from the ground up and scale it to a million users before this problem is detected. By the time you notice the problem (when your peak hour use requires dozens of ids to be created per ms), if your db doesn't have unique constraints on the id because you thought your guids were safe, you're in a world of hurt. Your users start to see data that doesn't belong to them because the db just returns the first ID match it finds.

Alternatively, you've played it safe and you only let your database create ids. Writes only happen on a master database, and load is spread out over read replicas. But with this kind of strain, you have to start scaling your database writes horizontally, too, and suddenly your application starts to crawl (if the db is smart enough to guarantee unique ids between write hosts), or you start getting id collisions between different db hosts, so your write hosts don't agree about which ids represent which data.

Performance

Id generation should be fast enough that humans won't notice a delay, but too slow to feasibly brute force (even in parallel). That means no waiting around for asynchronous entropy pool requests, or cross-process/cross-network communication. Performance slows to impracticality in the browser. All sources of entropy need to be fast enough for synchronous access.

Even worse, when the database is the only guarantee that ids are unique, that means that clients are forced to send incomplete records to the database, and wait for a network round-trip before they can use the ids in any algorithm. Forget about fast client performance. It simply isn't possible.

That situation has caused some clients to create ids that are only usable in a single client session (such as an in-memory counter). When the database returns the real id, the client has to do some juggling logic to swap out the id being used, adding complexity to the client implementation code.

If client side ID generation were stronger, the chances of collision would be much smaller, and the client could send complete records to the db for insertion without waiting for a full round-trip request to finish before using the ID.

Tiny

Page loads need to be FAST, and that means we can't waste a lot of JavaScript on a complex algorithm. Cuid2 is tiny. This is especially important for thick-client JavaScript applications.

Secure

Client-visible ids often need to have sufficient random data and entropy to make it practically impossible to try to guess valid IDs based on an existing, known id. That makes simple sequential ids unusable in the context of client-side generated database keys. Additionally, using V4 UUIDs is also not safe, because there are known attacks on several id generating algorithms that a sophisticated attacker can use to predict next ids. Cuid2 has been audited by security experts and artificial intelligence, and is considered safe to use for use-cases like secret sharing links.

Portable

Most stronger forms of the UUID / GUID algorithms require access to OS services that are not available in browsers, meaning that they are impossible to implement as specified. Further, our id standard needs to be portable to many languages (the original cuid has 22 different language implementations).

Ports

Improvements Over Cuid

The original Cuid served us well for more than a decade. We used it across 2 different social networks, and to generate ids for Adobe Creative Cloud. We never had a problem with collisions in production systems using it. But there was room for improvement.

Better Collision Resistance

Available entropy is the maximum number of unique ids that can be generated. Generally more entropy leads to lower probability of collision. For simplicity, we will assume a perfectly random distribution in the following discussion.

The original Cuid ran for more than 10 years in across thousands of software implementations with zero confirmed collision reports, in some cases with more than 100 million users generating ids.

The original Cuid had a maximum available entropy of about 3.71319E+29 (assuming 1 id per session). That's already a really big number, but the maximum recommended entropy in Cuid2 is 4.57458E+49. For reference, that's about the same entropy difference as the size of a mosquito compared to the distance from earth to the nearest star. Cuid2 has a default entropy of 1.62155E+37, which is a significant increase from the original Cuid and is comparable to the difference between the size of a baseball and the size of the moon.

The hashing function mixes all the sources of entropy together into a single value, so it's important we use a high quality hashing algorithm. We have tested billions of ids with Cuid2 with zero collisions detected to-date.

More Portable

The original Cuid used different methods to generate fingerprints across different kinds of hosts, including browsers, Node, and React Native. Unfortunately, this caused several compatability problems across the cuid user ecosystem.

In Node, each production host was slightly different, and we could reliable grab process ids and such to differentiate between hosts. Our early assumptions about different hosts generating different PIDs in Node proved to be false when we started deploying on cloud virtual hosts using identical containers and micro-container architectures. The result was that host fingerprint entropy was low in Node, limiting their ability to provide good collision resistance for horizontal server scaling in environments like cloud workers and micro-containers.

It was also not possible to customize your fingerprint function if you had different fingerprinting needs using Cuid, e.g., if both global and window are undefined.

Cuid2 uses a list of all global names in the JavaScript environment. Hashing it produces a very good host fingerprint, but we intentionally did not include a hash function in the original Cuid because all the secure ones we could find would bloat the bundle, so the original Cuid was unable to take full advantage of all of that unique host entropy.

In Cuid2, we use a tiny, fast, security-audited, NIST-standardized hash function and we seed it with random entropy, so on production environments where the globals are all identical, we lose the unique fingerprint, but still get random entropy to replace it, strengthening collision resistance.

Deterministic Length

Length was non-deterministic in Cuid. This worked fine in almost all cases, but proved to be problematic for some data structure uses, forcing some users to create wrapper code to pad the output. We recommend sticking to the defaults for most cases, but if you don't need strong uniqueness guarantees, (e.g., your use-case is something like username or URL disambiguation), it can be fine to use a shorter version.

More Efficient Session Counter Entropy

The original Cuid wasted entropy on session counters that were not always used, rarely filled, and sometimes rolled over, meaning they could collide with each other if you generate enough ids in a tight loop, reducing their effectiveness. Cuid2 initializes the counter with a random number so the entropy is never wasted. It also uses the full precision of the native JS number type. If you only generate a single id, the counter just extends the random entropy, rather than wasting digits, providing even stronger anti-collision protection.

Parameterized Length

Different use-cases have different needs for entropy resistance. Sometimes, a short disambiguating series of random digits is enough: for example, it's common to use short slugs to disambiguate similar names, e.g. usernames or URL slugs. Since the original cuid did not hash its output, we had to make some seriously limiting entropy decisions to produce a short slug. In the new version, all sources of entropy are mixed with the hash function, and you can safely grab a substring of any length shorter than 32 digits. You can roughly estimate how many ids you can generate before reaching 50% chance of collision with: sqrt(36^(n-1)*26), so if you use 4 digits, you'll reach 50% chance of collision after generating only ~1101 ids. That might be fine for username disambiguation. Are there more than 1k people who want to share the same username?

By default, you'd need to generate ~4.0268498e+18 ids to reach a 50% chance of collision, and at maximum length, you'd need to generate ~6.7635614e+24 ids to reach 50% odds of collision. To use a custom length, import the init function, which takes configuration options:

import { init } from '@paralleldrive/cuid2';
const length = 10; // 50% odds of collision after ~51,386,368 ids
const cuid = init({ length });
console.log(cuid()); // nw8zzfaa4v

Enhanced Security

The original Cuid leaked details about the id, including very limited data from the host environment (via the host fingerprint), and the exact time that the id was created. The new Cuid2 hashes all sources of entropy into a random-looking string.

Due to the hashing algorithm, it should not be practically possible to recover any of the entropy sources from the generated ids. Cuid used roughly monotonically increasing ids for database performance reasons. Some people abused them to select data by creation date. If you want to be able to sort items by creation date, we recommend making a separate, indexed createdAt field in your database instead of using monotonic ids because:

  • It's easy to trick a client system to generate ids in the past or future.
  • Order is not guaranteed across multiple hosts generating ids at nearly the same time.
  • Deterministically monotic resolution was never guaranteed.

In Cuid2, the hashing algorithm uses a salt. The salt is a random string which is added to the input entropy sources before the hashing function is applied. This makes it much more difficult for an attacker to guess valid ids, as the salt changes with each id, meaning the attacker is unable to use any existing ids as a basis for guessing others.

Comparisons

Security was the primary motivation for creating Cuid2. Our ids should be default-secure for the same reason we use https instead of http. The problem is, all our current id specifications are based on decades-old standards that were never designed with security in mind, optimizing for database performance charactaristics which are no longer relevant in modern, distributed applications. Almost all of the popular ids today optimize for being k-sortable, which was important 10 years ago. Here's what k-sortable means, and why it's no longer as important is it was when we created the Cuid specification which helped inspire current standards like UUID v6 - v8:

Note on K-Sortable/Sequential/Monotonically Increasing Ids

TL;DR: Stop worrying about K-Sortable ids. They're not a big deal anymore. Use createdAt fields instead.

The performance impact of using sequential keys in modern systems is often exaggerated. If your database is too small to use cloud-native solutions, it's also too small to worry about the performance impact of sequential vs random ids unless you're living in the distant past (i.e. you're using hardware from 2010). If it's large enough to worry, random ids may still be faster.

In the past, sequential keys could potentially have a significant impact on performance, but this is no longer the case in modern systems.

One reason for using sequential keys is to avoid id fragmentation, which can require a large amount of disk space for databases with billions of records. However, at such a large scale, modern systems often use cloud-native databases that are designed to handle terabytes of data efficiently and at a low cost. Additionally, the entire database may be stored in memory, providing fast random-access lookup performance. Therefore, the impact of fragmented keys on performance is minimal.

Worse, K-Sortable ids are not always a good thing for performance anyway, because they can cause hotspots in the database. If you have a system that generates a large number of ids in a short period of time, the ids will be generated in a sequential order, causing the tree to become unbalanced, which will lead to frequent rebalancing. This can cause a significant performance impact.

So what kinds of operations suffer from a non-sequential id? Paged, sorted operations. Stuff like "fetch me 100000 records, sorted by id". That would be noticeably impacted, but how often do you need to sort by id if your id is opaque? I have never needed to. Modern cloud databases allow you to create indexes on createdAt fields which perform extremely well.

The worst part of K-Sortable ids is their impact on security. K-Sortable = insecure.

The Contenders

We're unaware of any standard or library in the space that adequately meets all of our requirements. We'll use the following criteria to filter a list of commonly used alternatives. Let's start with the contenders:

Database increment (Int, BigInt, AutoIncrement), UUID v1 - v8, NanoId, Ulid, Sony Snowflake (inspired by Twitter Snowflake), ShardingID (Instagram), KSUID, PushId (Google), XID, ObjectId (MongoDB).

Here are the disqualifiers we care about:

  • Leaks information: Database auto-increment, all UUIDs (except V4 and including V6 - V8), Ulid, Snowflake, ShardingId, pushId, ObjectId, KSUID
  • Collision Prone: Database auto-increment, v4 UUID
  • Not cryptographically secure random output: Database auto-increment, UUID v1, UUID v4
  • Requires distributed coordination: Snowflake, ShardingID, database increment
  • Not URL or name friendly: UUID (too long, dashes), Ulid (too long), UUID v7 (too long) - anything else that supports special characters like dashes, spaces, underscores, #$%^&, etc.
  • Too fast: UUID v1, UUID v4, NanoId, Ulid, Xid

Here are the qualifiers we care about:

  • Secure - No leaked info, attack-resistant: Cuid2, NanoId (Medium - trusts web crypto API entropy).
  • Collision resistant: Cuid2, Cuid v1, NanoId, Snowflake, KSUID, XID, Ulid, ShardingId, ObjectId, UUID v6 - v8.
  • Horizontally scalable: Cuid2, Cuid v1, NanoId, ObjectId, Ulid, KSUID, Xid, ShardingId, ObjectId, UUID v6 - v8.
  • Offline-compatible: Cuid2, Cuid v1, NanoId, Ulid, UUID v6 - v8.
  • URL and name-friendly: Cuid2, Cuid v1, NanoId (with custom alphabet).
  • Fast and convenient: Cuid2, Cuid v1, NanoId, Ulid, KSUID, Xid, UUID v4, UUID v7.
  • But not too fast: Cuid2, Cuid v1, UUID v7, Snowflake, ShardingId, ObjectId.

Cuid2 is the only solution that passed all of our tests.

NanoId and Ulid

Overall, NanoId and Ulid seem to hit most of our requirements, Ulid leaks timestamps, and they both trust the random entropy from the Web Crypto API too much. The Web Crypto API trusts 2 untrustworthy things: The random entropy source, and the hashing algorithm used to stretch the entropy into random-looking data. Some implementations have had serious bugs that made them vulnerable to attack.

Along with using cryptographically secure methods, Cuid2 supplies its own known entropy from a diverse pool and uses a security audited, NIST-standard cryptographically secure hashing algorithm.

Too fast: NanoId and Ulid are also very fast. But that's not a good thing. The faster you can generate ids, the faster you can run collision attacks. Bad guys looking for statistical anomalies in the distribution of ids can use the speed of NanoId to their advantage. Cuid2 is fast enough to be convenient, but not so fast that it's a security risk.

Entropy Security Comparison

  • NanoId Entropy: Web Crypto.
  • Ulid Entropy: Web Crypto + time stamp (leaked).
  • Cuid2 Entropy: Web Crypto + time stamp + counter + host fingerprint + hashing algorithm.

Testing

Before each commit, we test over 10 million ids generated in parallel across 7 different CPU cores. With each batch of tests, we run a histogram analysis to ensure an even, random distribution across the entire entropy range. Any bias would make it more likely for ids to collide, so our tests will automatically fail if it finds any.

Screen Shot 2022-12-30 at 6 19 15 PM

We also generate randograms and do spot visual inspections.

randogram

Troubleshooting

Some React Native environments may be missing TextEncoding features, which will need to be polyfilled. The following have both worked for users who have encountered this issue:

npm install --save fast-text-encoding

Then, before importing Cuid2:

import "fast-text-encoding";

Alternatively, if that doesn't work:

npm install --save text-encoding-polyfill

Then, before importing Cuid2:

import "text-encoding-polyfill";

Using in Jest

Jest uses jsdom, which builds a global object which doesn't comply with current standards. There is a known issue in Jest when jsdom environment is used. The results of new TextEncoder().encode() and new Uint8Array() are different, refer to jestjs/jest#9983.

To work around this limitation on jsdom (and by extension, Jest), you'll need to use custom environment which overwrites Uint8Array provided by jsdom:

Install jest-environment-jsdom. Make sure to use the same version as your jest. See this answer on Stackoverflow for reference.

❯ npm i jest-environment-jsdom@27

Create jsdom-env.js file in the root:

const JSDOMEnvironmentBase = require('jest-environment-jsdom');

Object.defineProperty(exports, '__esModule', {
    value: true
});

class JSDOMEnvironment extends JSDOMEnvironmentBase {
    constructor(...args) {
        const { global } = super(...args);

        global.Uint8Array = Uint8Array;
    }
}

exports.default = JSDOMEnvironment;
exports.TestEnvironment = JSDOMEnvironment;

Update scripts to use the custom environment:

{
    // ...
    "scripts": {
        // ...
        "test": "react-scripts test --env=./jsdom-env.js",
        // ...
    },
}

JSDOM is Missing Features

JSDOM doesn't support TextEncoder and TextDecoder, refer jsdom/jsdom#2524.

In Jest, features like Uint8Array/TextEncoder/TextDecoder may be available in the jsdom environment but may produce results different from the platform standards. These are known bugs which may be resolved by jsdom at some point, but there is no clear ETA.

Note that this issue may impact any package that relies on the TextEncoder or TextDecorder standards. If you would like to use a simple test runner that just works, try Riteway.

Sponsors

This project is made possible by:

  • DevAnywhere - Expert mentorship for software builders, from junior developers to software leaders like VPE, CTO, and CEO.
  • EricElliottJS.com - Learn JavaScript on demand with videos and interactive lessons.

cuid2's People

Contributors

almeidx avatar audionerd avatar cadienvan avatar canrau avatar ericelliott avatar geommr avatar github-merge-queue[bot] avatar gnought avatar immjs avatar mrloh avatar relative avatar wgordon17 avatar xaevik avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cuid2's Issues

Use `cuid2` as the package name

Is there any plan to use cuid2 without the namespace as the NPM package name?

Being able to use this short command

npm install cuid2

would be awesome.

Allow removing ambiguous symbols as an initialization option

I'd like to use cuid2 to generate a row id and a shorter public id the user can easily say or write down without having to think about similar characters (0=>Oo, 1=>L i). We could implement this ourselves client side, but it adds a step, and then we need to think about not forgetting the conversion every time we read or write to the database.

I'm no big brain, I'm not sure which characters we can live with and which can be removed while maintaining a good level of entropy.

Are you open to add this as an option when initializing?

Help with a port to ColdFusion

@ericelliott I'm working on a port of cuid2 to ColdFusion and I wanted to ask about the length of the generated value. Given a shared instance of a Cuid class, should all of the cuid tokens generated from said class be of the same length? In my implementation, they vary by 1; and I'm not sure if that's the way it's supposed to be; or, if I have a bug in my implementation.

I saw that the length can be parameterized; but, I wasn't sure of the expectation for a shared instance of a Cuid class.

I know I have bugs because my values won't fit inside varchar(24). Just not sure how many bugs I have at this time 😆

Requires NodeJS 16

It's unfortunate that this library requires NodeJS 16 (because of its single dependency, I think?). Had to revert back to cuid again because of this.

Yeah I know it's EOL but real life you know...

TypeError: Expected input type is Uint8Array (got object)

Been trying to upgrade to cuid2 from cuid on-off for a while now but never been able to solve this issue. Hoping maintainers or community can help.

I installed with the following:

yarn remove cuid
yarn add @paralleldrive/cuid2

And my diffs are essentially

- import cuid from "cuid"
+ import cuid2 from "@paralleldrive/cuid2"
- cuid()
+ cuid2.createId()

But I'm always seeing the error:

TypeError: Expected input type is Uint8Array (got object)

> 1 | import cuid2 from "@paralleldrive/cuid2"

Same with:

TypeError: Expected input type is Uint8Array (got object)

> 1 | import { createId } from "@paralleldrive/cuid2"

Only seems to be a problem on my web package, my api is not complaining.

Typo

Psuedo is pseudo

Some questions for a port to Go

Hello,

I discovered cuid2 (and cuid) today, and I gave a try at porting it to Go: https://github.com/nlepage/go-cuid2

It seems to be working fine, here are some ids generated with default length:

mhsx3pf2p8ocd2nlaioilwah
jt06wcr8l51t7wj9bxsdb0al
fvqdusrnkcohfi68gor7pooa
sdgqj4r3j0om523ykm0nch85
qw60k14c0kq14gnqjis8nh9f

I have some questions:

  • In createEntropy(), I'm calling a random prime number generator from the Go std library instead of picking in a predefined list of prime numbers
    • is ot OK?
    • isn't it too expensive? an alternative could be to generate a random prime numbers list once, then pick in this list when createEntropy() is called
  • In createFingerprint(), I don't have an equivalent of Object.keys(globalObj).toString()) to give to the hash function, is it a problem?

Thanks for your help!

Fingerprints for non-JS ports

Working on adding cuid2 to the Rust cuid port, and trying to figure out how to do the fingerprint.

The JS version is a hash of:

  • random number from 2063-~4126
  • stringified object keys from the global object, which is either global (in node) or window (in browser)

In Rust, we don't have anything like the global object in node or the window object in the browser. So far, I've got:

  • same random number
  • process ID
  • thread ID

That gives different fingerprints for different processes & threads generating CUIDs on the same system, but doesn't guarantee anything across systems.

It looks like the Python port uses the system hostname, but that would reduce portability and prevents compiling the Rust to target-independent WASM.

One option that springs to mind is environment variables: the specific env var keys and values available to the process are likely to vary a fair bit across systems. On docker, this will include the HOSTNAME env var, which is generally set to the container ID. This is what I'm defaulting to for the moment, but would be curious to hear your thoughts.

We could also just rely on the random number, process ID, thread ID, and the hash entropy.

Cuid validation failing

There's a bug with the validation function. It's returning true regardless of the value.

Expected

import { createId, isCuid } from '@paralleldrive/cuid2';

console.log(
  isCuid(createId()), // true
  isCuid('not a cuid'), // false
)

Actual Results

import c from '@paralleldrive/cuid2'

let id = c.createId()

console.log(id) // ehx2jphxcar7ihzxquylc8vr

c.isCuid(id) // true

c.isCuid('user') // true

c.isCuid('') // false

c.isCuid(2) // false

I'm running node v18.17.0.

Conversion process from bytes to string could cause collisions

@ericelliott Hello, fellow Eric

Thanks for your work on this. I'm working on a golang port and am hoping you can take the time to respond to this.

Can you potentially speak to the motivation behind the following aspect of the hash function at

return BigInt(typedArrayToString(sha3(text)))

First, to walk through the code

  1. sha3 returns the underlying hash output as a uint8 array, e.g. [4, 178, 15, 53, ...] (length 64)
  2. typedArrayToString encodes each uint8 value in the array to a string in base-10 and concatenates them, e.g. "41781553..." (length 64-192)
  3. BigInt parses the string as a base-10 big integer, e.g. 41781553...
  4. toString(36) encodes the associated integer in base-36, e.g. oviup...

Specifically, I'm interested in the motivation of performing step 2 -- why not base-36-encode the raw bytes output of the underlying hash function instead of converting each byte to a base-10 string, concatenating them, and then encoding the result?

As an aside, It's worth noting that other ports appear to deal with this slightly differently, although similarly. I took a look at the python port at https://github.com/overflowdigital/cuid2.py/blob/21111e6db254e325bb08ecca4b883e3581b7d6bf/src/cuid2/generator.py#L21 and found the following:

  1. sha3_512.hexdigest() returns the underlying hash output as a hex-encoded string, e.g. "04b20f35..." (length 128)
  2. 0x prefix is removed from the head of 1, as any slice starting with L and terminating with the end of the string (not sure how hex could have L in it, but... (length 128)
  3. string_to_int takes the hex string input and encodes it to ascii bytes, e.g. [48, 52, 98, 50, 48, 102, 51, 53, ...] (length 128), then hexlifys it, which encodes the bytes as a hex-encoded string, e.g. "3034623230663335..." (length 256), then parses this string as a base-16 encoded integer
  4. base36_encode encodes the associated integer in base-36

Performance Concerns with .NET Port

Hi @ericelliott

I hope you don't mind me reaching you out on here. I had debated creating this as a discussion on my own repository but that felt like it would be a weird callout.

I am trying to address a serious performance (memory and cpu time) concern with the .NET port of cuid2 when dealing with high volume iterations.

How the current implementation currently works, is that it uses BouncyCastle to generate the SHA3-512 hash as a 64-byte array, due to its size, the use of BigInteger is required and one must do the Base36 conversion manually (.NET only supports Base 2, 8, 10, and 16 conversions natively), as shown below:

private static string Encode(ReadOnlySpan<byte> input)
{
    if ( input.IsEmpty )
    {
        return string. Empty;
    }

    int length = (int) Math.Ceiling(input.Length * Context.ByteBitCount / Context.BitsPerDigit);
    int i = length;

    Span<char> buffer = stackalloc char[length];

    BigInteger d = new(input, true);
    while ( !d.IsZero )
    {
        d = BigInteger.DivRem(d, Context.Radix, out BigInteger r);
        int c = Math.Abs((int) r);
        buffer[--i] = (char) ( c is >= 0 and <= 9 ? c + 48 : c + 'a' - 10 );
    }

    return new string(buffer.Slice(i, length - i));
}

There are a couple of problems with this however:

  1. BigInteger is an immutable struct, which means a new copy is created for each operation (in this case, BigInteger.DivRem)
  2. BigInteger.DivRem is very inefficient and is called over one hundred times

To put this in another perspective, if I use Convert.ToHexString instead to generate a Base16 value, it takes 1.8 seconds per one million Cuid2 values while this implementation results in twelve seconds per one million values as well as the increased memory usage from instantiating one hundred BigInteger values per Cuid2.

What I am thinking of doing instance is taking that 64-byte array and splitting it into either 8 8-byte values (int64) or 4 16-byte values (int128 - for .NET 7 and above) and converting those to Base36 then merging the strings. Is that something you feel would be sufficient?

Error in react native

Hello, I saw that the last verison of this had react native support, just wondering if it will be added to this version as well.
Trying to run the following code in Expo

import { createId } from '@paralleldrive/cuid2'

console.log(createId())

Throws the following errors

 ERROR  ReferenceError: Property 'TextEncoder' doesn't exist, js engine: hermes
 ERROR  Invariant Violation: Failed to call into JavaScript module method AppRegistry.runApplication(). Module has not been registered as callable. Registered callable JavaScript modules (n = 11): Systrace, JSTimers, HeapCapture, SamplingProfiler, RCTLog, RCTDeviceEventEmitter, RCTNativeAppEventEmitter, GlobalPerformanceLogger, JSDevSupportModule, HMRClient, RCTEventEmitter.
        A frequent cause of the error is that the application entry file path is incorrect. This can also happen when the JS bundle is corrupt or there is an early initialization error when loading React Native., js engine: hermes

Go Implemenation

I took a shot at creating a Go implementation.

It seems to be creating the correct output, but I would still love get fresh sets of eyes on it.

Maybe you could add it to your README.md so that I can get some eyes on it and get it reviewed for correctness.

https://github.com/jarrodhroberson/cuid2

Cuid2 security audit info

Hi,

Since the security was the primary motivation for creating the Cuid2, I would love to see more information about the audits you mentioned in the README:

Cuid2 has been audited by security experts and artificial intelligence, and is considered safe to use for use-cases like secret sharing links.

Cheers,
Antoni

`createId` doesn't work on Cloudflare worker / pages

Impossible to import { createId } from "@paralleldrive/cuid2"; in a Cloudflare worker or page. No idea why, Cloudflare doesn't provide any logs, just a crash page when createId is used. This never happen using wrangler pages dev, this is very weird and inexplicable.

I'll create a PoC later

Issues with comparisons

Cuid2 appears to be a good format and library overall, but the readme has some issues around its comparisons with other formats.

The comparisons to other id formats seem to conflate some things when making comparisons:

  • The spec and output format
  • The implementation and PRNG used

Taking UUIDv4 as an example - going by the spec, UUIDv4 should roughly match in every aspect other than "Not URL or name friendly".

If we break it down:

  • Cuid2 uses: Cuid2 CSPRNG -> ~124 random bits -> Cuid2 text
  • UUIDv4 with a CSPRNG uses: CSPRNG -> 122 random bits -> UUID

You could use the Cuid2 CSPRNG in an UUIDv4 implementation, and get exactly the same security (ignoring the 124 vs 122 bits). So what you're really comparing is the security of the CSPRNG, rather than the format itself.

There is something to be said for specifying the CSPRNG used in the format, versus UUIDv4 that leaves it up to the implementation. JWTs had similar issues where the spec wasn't specific enough about some details, leading to major security issues in many implementations.

When comparing the PRNGs

As long as the PRNG satisfies the required properties for a CSPRNG, there is little meaningful difference between different implementations.

Id generation should be fast enough that humans won't notice a delay, but too slow to feasibly brute force (even in parallel)

The faster you can generate ids, the faster you can run collision attacks

If you start at the output of the PRNG, you can generate IDs equally fast - the speed of the PRNG doesn't matter. You have 122+ bits of entropy either way, and there's no way you're going to find a collision.

Using the PRNG itself to generate collisions only makes sense if the PRNG isn't secure to begin with.

A slow PRNG isn't a requirement for security.

Cuid2 uses multiple, independent entropy sources and hashes them with a security-audited, NIST-standard cryptographically secure hashing algorithm (Sha3).

That the hashing algorithm is NIST-standard doesn't prove on its own that the Cuid2 PRNG is secure - it's just one component of the PRNG. NIST does actually go further and provides recommendations for creating a CSPRNG using hash functions or other crypto primitives - why are one of those recommendations not used?

Using a completely custom PRNG here instead of using a known secure has the possibility of introducing new security issues.

For example, the Wikipedia page on CSPRNGs mentions this requirement:

Every CSPRNG should withstand "state compromise extension attacks". In the event that part or all of its state has been revealed (or guessed correctly), it should be impossible to reconstruct the stream of random numbers prior to the revelation.

Does the Cuid2 implementation meet that requirement? Has that been audited?

there may be bugs in browser CSPRNGs

The linked Chromium issue was from 2015/2016. Are there any currently known issues / weaknesses? Otherwise there is no more reason to believe there is a bug in the browser CSPRNGs than there being a bug in Cuid2.

Issue when using in Vercel edge functions

I'm developing a library ( https://github.com/lgrammel/ai-utils.js ) and was using cuid2. However, it throws the following error when using it in Vercel edge functions:

- error node:crypto
Module build failed: UnhandledSchemeError: Reading from "node:crypto" is not handled by plugins (Unhandled scheme).
Webpack supports "data:" and "file:" URIs by default.
You may need an additional plugin to handle "node:" URIs.
Import trace for requested module:
node:crypto
../../node_modules/@noble/hashes/esm/cryptoNode.js
../../node_modules/@noble/hashes/esm/utils.js
../../node_modules/@noble/hashes/esm/sha3.js
../../node_modules/@paralleldrive/cuid2/src/index.js
../../node_modules/@paralleldrive/cuid2/index.js
../../dist/vector-db/VectorDB.js
../../dist/vector-db/index.js
../../dist/index.js

It's most likely caused by the limited API availability in edge functions ( see https://nextjs.org/docs/messages/node-module-in-edge-runtime ) and could be solved by configuring webpack. However, since I'm developing a library I don't want to expose my users to this.

I've switched to nanoid v3 for the time being, which seems to work w/o issues in that environment, but I'd like to switch back to cuid2 in case this issue is fixable.

Use CSPRNG if available.

Note: Cuid2 already provides some cryptographically strong guarantees even using Math.random because:

  1. We don't trust the entropy from Math.random so we don't rely on it for cryptographic security. Instead, we mix it with several other independent sources of entropy. Those sources have been proven for more than a decade in apps with hundreds of millions of users via Cuid V1, and helped inspire UUID v6 - v8.
  2. We use a security audited implementation of the cryptographically secure SHA-3 to hash all that entropy so the output is opaque.

As far as we're aware, Cuid2 is safe to use now, and certainly safer than most other id options available.

Because we believe in security in layers, it's still a good idea to source our random bits with a CSPRNG. That is the purpose of this issue.

Idea:
Apparently all modern browsers + node support the cryptographically-secure crypto.randomUUID(), which may make a better source of pseudorandom data than Math.random().

Documentation: Polyfill for "TextEncoder" required in react-native

So the only dependency of this package is https://github.com/paulmillr/noble-hashes which itselfs has no own dependencies.

In a react-native (v0.64) project running on iOS 14, I had to add this line

import * as fastTextEncoding from 'fast-text-encoding';

to avoid an error of TextEncoder not being defined. Should it maybe mentioned in this project's readme that you may run into this issue when using a JS runtime that doesn't support TextEncoder?

I couldn't find TextEncoder at all in this repo's sources though, so I could be way off. But https://github.com/paulmillr/noble-hashes/search?q=TextEncoder seems to use it.

Use randomCUID2 with a length > 50, make an index out of bound error

Hi,

CUID.randomCUID2(51)
make this error

 Caused by: java.lang.StringIndexOutOfBoundsException: begin 1, end 51, length 50
	at java.base/java.lang.String.checkBoundsBeginEnd(String.java:4606)
	at java.base/java.lang.String.substring(String.java:2709)
	at io.github.thibaultmeyer.cuid.CUID.randomCUID2(CUID.java:74)

Probably caused by String hash = CUID.CUIDv2.computeHash(time + CUID.CUIDv2.createEntropy(length) + CUID.CUIDv2.nextCounterValue() + CUID.Common.MACHINE_FINGERPRINT, length); in

    public static CUID randomCUID2(int length) {
        if (length <= 0) {
            throw new CUIDGenerationException("the length must be at least 1");
        } else {
            String time = Long.toString(System.currentTimeMillis(), 36);
            char firstLetter = CUID.CUIDv2.ALPHABET_ARRAY[Math.abs(CUID.Common.nextIntValue()) % CUID.CUIDv2.ALPHABET_ARRAY.length];
            String hash = CUID.CUIDv2.computeHash(time + CUID.CUIDv2.createEntropy(length) + CUID.CUIDv2.nextCounterValue() + CUID.Common.MACHINE_FINGERPRINT, length);
            return new CUID("" + firstLetter + hash.substring(1, length));
        }
    }

That call return (new BigInteger(MessageDigest.getInstance("SHA3-256").digest((content + salt).getBytes(StandardCharsets.UTF_8)))).toString(36);

in

private static String computeHash(String content, int saltLength) {
           String salt = createEntropy(saltLength);

           try {
               return (new BigInteger(MessageDigest.getInstance("SHA3-256").digest((content + salt).getBytes(StandardCharsets.UTF_8)))).toString(36);
           } catch (NoSuchAlgorithmException var4) {
               throw new CUIDGenerationException(var4);
           }
       }

The use of "SHA3-256" algorithm will generate a fixed length output (32 Bytes) your conversion to as string generate a 50 characters length. So when a user demands more of 50 length CUID2, that crash.

"text" not used in hash function

It appears that the variable "text" is not used. The problem is probably in the call to sha3 function.

const text = input + salt;

const hash = (input = "", length = bigLength) => {
  const salt = createEntropy(length);
  const text = input + salt;
  return BigInt(
    sha3(input)
      .map((x) => x.toString(16))
      .join("")
  )
    .toString(36)
    .substring(2);
};

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.