UUIDs Explained: Universally Unique Identifiers and When to Use Them

Deep dive into UUID versions (v1, v4, v5, v7), collision probability, database performance implications, and modern alternatives like ULID and Snowflake IDs.

Loopaloo Team · February 6, 2026 · 13 min read

Every software system eventually bumps into the same fundamental question: how do you give something a name that nobody else will ever pick? In a small application backed by a single database, the answer is trivial — let the database hand out sequential integers. But the moment you add a second database, a second server, or a second continent, that tidy auto-increment column turns into a coordination nightmare. Universally Unique Identifiers, better known as UUIDs (and sometimes called GUIDs in the Microsoft ecosystem), were invented to solve exactly this problem. They give any machine, anywhere, the ability to mint an identifier that is, for all practical purposes, guaranteed never to collide with an identifier minted by any other machine. No central authority required, no network call needed, no locking, no waiting.

A UUID is a 128-bit value, typically rendered as a 36-character string: 32 hexadecimal digits in five groups separated by four hyphens, something like 550e8400-e29b-41d4-a716-446655440000. That string is really just a human-friendly encoding of 16 raw bytes. You can store a UUID as the hyphenated string, as a 32-character hex string without hyphens, or as raw binary in a 16-byte column. The hyphens carry no semantic meaning; they exist purely for readability. Under the hood, the 128 bits are divided into fields whose meaning depends on the UUID version, which is encoded in four bits of the identifier itself. A small variant field marks the UUID as conforming to the RFC 4122 standard (as opposed to earlier, now-obsolete layouts). Everything else is version-specific payload.
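To make the layout concrete, here is a small sketch using Python's standard uuid module to take the example identifier apart. It only inspects the value; nothing here is specific to any particular database or framework.

```python
import uuid

# The example identifier from the paragraph above.
u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

raw = u.bytes          # the 16 raw bytes behind the string
compact = u.hex        # 32 hex characters, no hyphens
canonical = str(u)     # 36 characters, with hyphens

print(len(raw), len(compact), len(canonical))
print(u.version)                   # read from the first hex digit of the third group
print(u.variant == uuid.RFC_4122)  # variant field, read from the fourth group

# The hyphens are optional on input: both forms parse to the same value.
assert uuid.UUID(compact) == u
```

The version lives in the first digit of the third group (the "4" in 41d4), which is why you can usually tell a UUID's version at a glance.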

UUID Version 1: Timestamps and MAC Addresses

The original workhorse, UUID v1, constructs its value from the current timestamp and the MAC address of the generating machine. The timestamp is a 60-bit count of 100-nanosecond intervals since October 15, 1582 — the date of the Gregorian calendar reform, chosen by the original designers of the DCE standard. Combined with a 14-bit clock sequence that increments when the clock is set back or the node ID changes, v1 guarantees uniqueness across time and space without any coordination between machines. If two UUIDs share the same timestamp, the clock sequence differentiates them; if two machines generate at the same instant, their different MAC addresses differentiate them.

The strength of v1 is also its weakness. Because the MAC address is embedded in the identifier, anyone who reads a v1 UUID can identify the hardware that generated it. In the late 1990s this became a privacy concern, and it was one of the motivations for developing later versions. Additionally, the timestamp component is arranged in a way that defeats naïve sorting — the low-order bits of the timestamp come first in the byte layout, so v1 UUIDs do not sort chronologically when compared as simple byte strings. This seemingly minor design decision has had outsized consequences for database performance, as we will see later.
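As a rough illustration of how much a v1 UUID gives away, the sketch below recovers the embedded creation time and node field with Python's standard uuid module. The helper name v1_datetime is made up for this example, and note that Python may substitute a random node value when it cannot find a MAC address.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by v1 timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def v1_datetime(u: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time counts 100 ns intervals; ten of them make one microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

u = uuid.uuid1()
print(u)
print(v1_datetime(u))        # close to the current UTC time
print(f"{u.node:012x}")      # 48-bit node field (usually the MAC address)
print(f"{u.time_low:08x}")   # low 32 bits of the timestamp lead the string
```

The last line shows the sorting problem directly: the first group of the string is the fastest-changing part of the timestamp, so string order does not follow creation order.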

UUID Version 4: Pure Randomness

UUID v4 sidesteps the privacy and complexity issues of v1 by filling nearly all 128 bits with random data. Six bits are reserved for the version and variant fields, leaving 122 bits of entropy. That means there are roughly 5.3 × 10³⁶ possible v4 UUIDs. The collision probability follows the mathematics of the birthday problem: you are not asking "will this specific UUID collide with an existing one?" but rather "among all the UUIDs I have generated, will any two be the same?" Even under the birthday-problem framework, the numbers are staggering. You would need to generate approximately 2.71 × 10¹⁸ UUIDs (2.71 quintillion) before the probability of a single collision reaches 50 percent. To put that in human terms, if you generated one billion UUIDs every second, you would need to keep that pace up for about 86 years before a collision became a coin flip. For virtually every real-world system, UUID v4 is unique enough.
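The figures above can be checked with a few lines of arithmetic. This sketch uses the standard birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the number of possible values; the function name is just for illustration.

```python
import math

ENTROPY_BITS = 122            # random bits in a v4 UUID
space = 2 ** ENTROPY_BITS     # number of distinct v4 UUIDs

def count_for_probability(p: float) -> float:
    """Approximate number of UUIDs needed for collision probability p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

n_half = count_for_probability(0.5)
years = n_half / 1e9 / (365.25 * 24 * 3600)

print(f"{space:.2e}")    # size of the v4 space
print(f"{n_half:.2e}")   # UUIDs needed for a 50% collision chance
print(f"{years:.0f}")    # years at one billion UUIDs per second
```

Plugging in a smaller probability, such as one in a billion, shows how far a realistic workload sits from any meaningful collision risk.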

The overwhelming majority of UUIDs you encounter in the wild are v4. They are the default in most libraries, frameworks, and cloud platforms. Their randomness makes them trivially simple to generate — just draw 122 bits from a cryptographically secure random number generator, set the version and variant bits, and you are done. No clock to read, no MAC address to look up, no state to maintain. You can try generating your own with Loopaloo's UUID Generator, which lets you produce v4 identifiers instantly and inspect their structure.
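Here is roughly what "draw the random bits, then set the version and variant" looks like by hand. This is a sketch for illustration, with a made-up function name; in real code the standard library's uuid.uuid4() already does the job.

```python
import secrets
import uuid

def make_uuid4() -> uuid.UUID:
    """Build a version 4 UUID by hand from 16 random bytes."""
    b = bytearray(secrets.token_bytes(16))   # output of the OS CSPRNG
    b[6] = (b[6] & 0x0F) | 0x40              # version nibble = 0100 (4)
    b[8] = (b[8] & 0x3F) | 0x80              # variant bits = 10
    return uuid.UUID(bytes=bytes(b))

u = make_uuid4()
print(u, u.version)

# The standard library equivalent, in one call.
print(uuid.uuid4())
```

Overwriting six of the 128 random bits is the whole algorithm, which is why 122 bits of entropy remain.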

UUID Version 5: Deterministic and Namespace-Based

Sometimes you do not want randomness. You want the same input to always produce the same UUID, so that two independent systems processing the same data will independently arrive at the same identifier without communicating. UUID v5 achieves this by hashing a namespace UUID together with a name string using SHA-1. Given the same namespace and name, you always get the same output. The RFC defines several well-known namespace UUIDs for DNS names, URLs, OIDs, and X.500 distinguished names, but you can define your own.

UUID v5 is invaluable for deduplication and idempotency. If you need to assign a stable identifier to an email address, a URL, or a product SKU, v5 lets you derive it deterministically. The downside is that SHA-1 is no longer considered collision-resistant against deliberate attacks, though for identifier generation — where adversarial collision resistance is rarely a concern — this matters less than it does in a cryptographic context. UUID v3, the older sibling, uses MD5 instead of SHA-1 and is now generally avoided in favor of v5.
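A short sketch of that determinism, using Python's standard uuid module. The domain skus.example.com and the SKU string are placeholders invented for this example.

```python
import uuid

# NAMESPACE_DNS and NAMESPACE_URL are two of the predefined namespaces.
a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
c = uuid.uuid5(uuid.NAMESPACE_URL, "example.com")

print(a == b)   # same namespace and name: identical UUID
print(a == c)   # different namespace: different UUID

# A custom namespace is just another UUID. Deriving it from a name you
# control (hypothetical here) keeps it reproducible across systems.
SKU_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "skus.example.com")
print(uuid.uuid5(SKU_NAMESPACE, "SKU-12345"))
```

Two services that agree on the namespace can run this independently and arrive at the same identifier for the same SKU, with no lookup table in between.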

UUID Version 7: The Modern Contender

The newest version to gain widespread attention is UUID v7, specified in RFC 9562 (published in 2024, obsoleting RFC 4122). Version 7 addresses the two biggest practical complaints about v4: UUIDs are not sortable by creation time, and their randomness wreaks havoc on database indexes. A v7 UUID begins with a 48-bit Unix timestamp in milliseconds, followed by 4 bits for the version field, then 12 bits that may hold randomness, a counter, or sub-millisecond timestamp precision, the 2-bit variant field, and finally 62 bits of additional randomness.
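The field layout translates almost directly into bit shifts. The sketch below assembles a v7 UUID by hand and fills both variable fields with random bits; the function names are made up for this example, and it makes no attempt at monotonic ordering within a single millisecond.

```python
import secrets
import time
import uuid
from typing import Optional

def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Assemble a version 7 UUID from its five fields."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80   # 48-bit millisecond timestamp
    value |= 0x7 << 76                          # 4-bit version
    value |= secrets.randbits(12) << 64         # 12 bits, random in this sketch
    value |= 0b10 << 62                         # 2-bit variant
    value |= secrets.randbits(62)               # 62 bits of randomness
    return uuid.UUID(int=value)

def uuid7_unix_ms(u: uuid.UUID) -> int:
    """Read the millisecond timestamp back out of a v7 UUID."""
    return u.int >> 80

u = make_uuid7()
print(u, u.version, uuid7_unix_ms(u))
```

Because the timestamp occupies the top 48 bits, the first 12 hex digits of the string are the creation time, and UUIDs from later milliseconds compare greater as strings or bytes.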

Because the most significant bits are a timestamp, v7 UUIDs sort in chronological order, to millisecond precision, when compared as raw bytes or as strings. Ordering within the same millisecond depends on whether the generator uses a counter. This seemingly small change has profound implications for database systems. B-tree indexes, the workhorse data structure behind nearly every relational database and many non-relational ones, perform best when new entries are appended near the end of the tree rather than inserted at random positions throughout it. Random v4 UUIDs cause page splits, fragmentation, and increased I/O as the index grows. Time-ordered v7 UUIDs behave much like auto-incrementing integers from the B-tree's perspective, preserving insert performance while retaining the distributed-generation benefits of UUIDs. If you are starting a new project today and need UUIDs as primary keys, v7 is almost certainly the right choice.

Database Implications

Using UUIDs as primary keys is one of the most debated topics in database design, and much of the controversy stems from conflating all UUID versions. The real performance story depends on which version you use and how you store it.

Random v4 UUIDs stored as 36-character strings are the worst case: they consume 36 bytes per key (versus 4 or 8 for integers), they defeat index locality, and they make joins more expensive due to wider keys. Storing them as 16-byte binary values (the BINARY(16) type in MySQL, the native uuid type in PostgreSQL) cuts the storage cost by more than half, from 36 bytes to 16, and improves comparison speed, since binary comparisons are faster than string comparisons. PostgreSQL's native uuid type is especially efficient and should always be preferred over a VARCHAR column.
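The size difference is easy to see from the application side. This sketch just measures the three representations in Python; actual on-disk cost also depends on the database's own row and column overhead.

```python
import uuid

u = uuid.uuid4()

as_text = str(u).encode("ascii")   # what a 36-character string column holds
as_hex = u.hex.encode("ascii")     # hyphens stripped: 32 characters
as_binary = u.bytes                # what BINARY(16) or a native uuid type holds

print(len(as_text), len(as_hex), len(as_binary))

# The binary form round-trips without loss.
assert uuid.UUID(bytes=as_binary) == u
```

Every secondary index and foreign key repeats the primary key, so the saving applies many times over, not just once per row.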

Even with binary storage, v4 UUIDs still cause B-tree fragmentation. Because new values are uniformly distributed across the key space, every insert potentially touches a different leaf page. Over time this leads to low page fill factors and increased disk I/O. Version 7 UUIDs largely eliminate this problem because new values are almost always lexicographically larger than old ones; the exceptions are values generated within the same millisecond or on machines whose clocks disagree. The B-tree adds new entries at or near the rightmost leaf, much as it would with an auto-incrementing integer. The write amplification savings can be substantial in high-throughput systems.
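A toy model can give a feel for the locality difference. The sketch below is not a B-tree: it keeps keys in a sorted Python list and counts how many inserts land within the last 100 slots, which stands in for "the rightmost leaf page". The page size, key count, and synthetic timestamps are all arbitrary choices for illustration.

```python
import bisect
import secrets

def tail_insert_fraction(keys, page_size=100):
    """Fraction of inserts that land within the last page_size sorted slots."""
    index = []
    tail_hits = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if len(index) - pos < page_size:
            tail_hits += 1
        index.insert(pos, key)
    return tail_hits / len(keys)

N = 5_000
# v4-like keys: 128 uniformly random bits.
random_keys = [secrets.randbits(128) for _ in range(N)]
# v7-like keys: a rising millisecond counter on top, random bits below.
time_ordered_keys = [(ms << 80) | secrets.randbits(80) for ms in range(N)]

print(tail_insert_fraction(random_keys))
print(tail_insert_fraction(time_ordered_keys))
```

With time-ordered keys every insert hits the tail, while random keys scatter across the whole list and the tail share shrinks as the index grows. A real database adds caching and page splits on top, but the access pattern is the same.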

Alternatives to UUIDs

UUIDs are far from the only game in town. Several alternative identifier schemes have emerged, each making different trade-offs.

ULID, the Universally Unique Lexicographically Sortable Identifier, predates UUID v7 and solves the same sortability problem. A ULID is 128 bits, encoded as a 26-character Crockford Base32 string. The first 48 bits are a Unix timestamp in milliseconds, and the remaining 80 bits are random. ULIDs are lexicographically sortable by creation time, compact as strings, and case-insensitive. However, now that UUID v7 exists as a formal standard, ULIDs are gradually losing their unique selling point.
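The encoding is simple enough to sketch in a few lines. This is an illustrative implementation rather than a reference one: the function name is made up, and it skips the optional monotonic behaviour some ULID libraries offer within a single millisecond.

```python
import secrets
import time

# Crockford Base32: digits plus uppercase letters, without I, L, O, and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def make_ulid(unix_ms=None) -> str:
    """Encode a 48-bit timestamp plus 80 random bits as 26 Base32 characters."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | secrets.randbits(80)
    chars = []
    for _ in range(26):                 # 26 * 5 = 130 bits; the top 2 are zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

print(make_ulid())
```

The alphabet is in ascending ASCII order, so comparing two ULIDs as plain strings gives the same result as comparing the underlying 128-bit numbers. The first 10 characters carry the timestamp and the remaining 16 carry the randomness.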

Snowflake IDs, pioneered by Twitter, pack a timestamp, a machine identifier, and a per-machine sequence number into a 64-bit integer. The smaller size is a significant advantage — 8 bytes instead of 16, and they fit natively in a BIGINT column. The downside is that they require a central or coordinated assignment of machine IDs, reintroducing some of the coordination that UUIDs were designed to avoid. Discord, Instagram, and many other high-scale systems use Snowflake-derived schemes.
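Here is a minimal sketch of a Snowflake-style generator. The bit widths (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence) follow the commonly described Twitter layout, and the custom epoch of 2020-01-01 is an arbitrary choice for this example; real deployments pick their own and vary the field sizes.

```python
import threading
import time

class SnowflakeGenerator:
    """Toy generator: 41-bit ms timestamp, 10-bit machine ID, 12-bit sequence."""

    def __init__(self, machine_id: int, epoch_ms: int = 1_577_836_800_000):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms          # assumed epoch: 2020-01-01T00:00:00Z
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = time.time_ns() // 1_000_000
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:            # 4096 IDs used this millisecond
                    while now <= self.last_ms:    # spin until the next one
                        now = time.time_ns() // 1_000_000
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(machine_id=7)
print(gen.next_id(), gen.next_id())
```

The machine_id argument is where the coordination cost shows up: something outside this class has to guarantee that no two running generators share the same value.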

NanoID is a string-based identifier generator that lets you choose your alphabet and length. It defaults to 21 characters drawn from a URL-safe alphabet, yielding roughly 126 bits of entropy. NanoID is popular in frontend applications where short, URL-friendly identifiers matter more than standardized binary formats. CUID2, the successor to CUID, takes a similar approach with an emphasis on security — its output is designed to be resistant to fingerprinting and enumeration attacks.
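The entropy figure falls out of the alphabet size and length. The sketch below is a NanoID-style generator written from scratch for illustration, not the NanoID library itself; the function names are made up, and the alphabet contains the same 64 URL-safe symbols in an arbitrary order.

```python
import math
import secrets
import string

# 64 URL-safe symbols: letters, digits, underscore, and hyphen.
ALPHABET = string.ascii_letters + string.digits + "_-"

def make_id(size: int = 21, alphabet: str = ALPHABET) -> str:
    """Draw each character independently from a secure random source."""
    return "".join(secrets.choice(alphabet) for _ in range(size))

def entropy_bits(size: int, alphabet_len: int) -> float:
    """Each character contributes log2(alphabet_len) bits."""
    return size * math.log2(alphabet_len)

print(make_id())
print(entropy_bits(21, len(ALPHABET)))   # 21 characters * 6 bits each
```

Shortening the ID or shrinking the alphabet trades entropy for readability, and the same formula tells you how much you are giving up.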

When Not to Use UUIDs

It is worth stepping back and asking whether you need UUIDs at all. For a single-server application with a single database, auto-incrementing integers are simpler, smaller, faster to index, and easier to communicate to users. Nobody wants to read a UUID over the phone. Sequential integers also make debugging easier — "look at order 48,291" is more practical than "look at order f47ac10b-58cc-4372-a567-0e02b2c3d479."

UUIDs earn their keep when you need decentralized ID generation: microservices creating records independently, client-side ID generation before a server round-trip, offline-capable applications that sync later, or any scenario where two systems must agree on an identifier without talking to each other first. They are also valuable when exposing IDs externally, since sequential integers leak information about your data volume and creation rate. If a competitor can see that your latest order ID is 48,291, they can estimate how many orders you have processed. A random v4 UUID reveals nothing; a v1 or v7 UUID does expose its creation time, which is worth weighing before you put one in a public URL.

Distributed Systems and Decentralized Generation

The deepest motivation for UUIDs lies in distributed systems theory. In a distributed environment, achieving consensus is expensive — it requires network round-trips, introduces latency, and creates single points of failure. Every time two nodes need to agree on "who gets the next ID," they are performing a miniature consensus protocol. UUIDs eliminate this entirely. Each node generates identifiers in isolation, confident that collisions are so improbable as to be negligible. This property makes UUIDs a natural fit for event sourcing, CRDTs (Conflict-free Replicated Data Types), and eventually-consistent systems where coordination is the enemy of availability.

In microservice architectures, UUID generation at the edge — in the API gateway, in the client, or in the originating service — means that a record's primary key is known before the database write occurs. This enables patterns like optimistic UI updates, idempotent retries (send the same UUID again and the server knows it is a duplicate), and parallel writes to multiple datastores without cross-store coordination.
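The idempotent-retry pattern can be sketched with an in-memory stand-in for the database. OrderService and its method are hypothetical names for this example; in a real system the duplicate check would typically be a unique constraint on the primary key rather than a dictionary lookup.

```python
import uuid

class OrderService:
    def __init__(self):
        self._orders = {}   # stand-in for a table keyed by UUID primary key

    def create_order(self, order_id: uuid.UUID, payload: dict) -> bool:
        """Insert the order; return False if this ID was already processed."""
        if order_id in self._orders:
            return False
        self._orders[order_id] = payload
        return True

service = OrderService()

# The client mints the ID before any network call...
order_id = uuid.uuid4()
first = service.create_order(order_id, {"item": "widget"})

# ...so a retry after a timeout carries the same ID and is recognized.
retry = service.create_order(order_id, {"item": "widget"})

print(first, retry, len(service._orders))
```

Because the client already knows the key, it can also update its UI or write to a second datastore without waiting for the first write to return.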

The next time you need a unique identifier, pause and consider the trade-offs. If you are building a single-database CRUD application, an integer primary key is probably all you need. If you are building a distributed system, a multi-tenant platform, or anything that generates IDs outside of a single database transaction, UUIDs — and specifically UUID v7 — deserve serious consideration. Generate a few with the Loopaloo UUID Generator, inspect their structure, and you will quickly develop an intuition for how they work and when they shine.
