For developers, databases are an area of application development that shouldn’t be taken lightly. In this article, I’ll address the question of what constitutes suitable primary keys for relational databases like MySQL or PostgreSQL. But before I delve into the technical details, I’d like to briefly describe a scenario I recently encountered.
I was tasked with migrating an online shop system for a project. Since this system had been in productive use for over 10 years, the goal was to upgrade to a new major version. As we were already three major releases behind the current version, we decided to take this opportunity to also get rid of some legacy issues. Essentially, a new shop with a new design and updated functionality was to be set up from scratch, allowing the existing products, orders, and, of course, customer data to be imported into the new shop. So far, so routine.
The complication arose from the fact that the old system had to remain operational until it could be seamlessly replaced by the new version. As is often the case, software evolves. The new version also included significant changes that complicated direct data mapping. Specifically, the issue concerned how product attributes are stored internally. For example, if we sell T-shirts, there might be a white cotton V-neck model available in different sizes. Now, when selecting items from the catalog in the shop view, each individual shirt won’t be displayed in its size. Instead, the product will have a selection box with the different sizes. These product attributes can become extremely complex, depending on the shop.
Nowadays, there are very powerful tools available—not as expensive as a mid-range car—for defining mappings between database schema versions and automatically transferring the data to the new version. This process becomes a real ordeal when primary keys are generated using generic auto-increment. The old system remains active and continuously generates new primary keys, which may already be in use in the new system. This effect is minimized through so-called freezes. This means that until the migration is complete, the shop owner cannot add new products to the shop and can only modify existing product attributes under very limited conditions.
To make data migration easier and less prone to errors, using auto-increment for primary key generation is generally frowned upon in commercial environments. Professional database management systems (DBMS) like Oracle and PostgreSQL cannot even create auto-increments without significant effort. If you still want to use this feature, it’s usually implemented via the persistence framework and not, as with MySQL or MariaDB, as a function within SQL.
Where does the idea of using such a generic primary key even come from? It’s certainly a very simple mechanism that has proven itself and works well in practice. At least as long as you don’t intend to migrate. Another aspect is, of course, historically rooted, back when hard drive storage was expensive and not as readily available as it is today. Back then, every single bit that could be saved counted. This argument is refuted by the availability of inexpensive storage. On the contrary, the disadvantages you incur in terms of maintenance for a few saved megabytes actually outweigh them.
What are therefore suitable primary keys for records in relational databases? Here we distinguish between two categories: natural and generated keys. Since the primary key must be unique and therefore cannot occur twice, there are few natural candidates. The classic example of a user account as a primary key is the email address. The phone number also has this desired property.
Automatically generated primary keys include the auto-increment key already described, which we should avoid. Instead, it’s better to use the Universal Unique Identifier, or UUID for short. All programming languages have an implementation for this. However, there are now several versions of the UUID. Version 7 of the UUID was released not too long ago. Therefore, let’s take a closer look at the properties of the respective versions. The AI Grok presents the versions as follows.
- Version 1 (Time-based, RFC 4122/9562):
- Combines a highly precise timestamp (60 bits, 100-nanosecond intervals since October 15, 1582), a 14-bit clock sequence (to prevent clock reversal), and a 48-bit node ID (usually the computer’s MAC address).
- Disadvantage: The MAC address can reveal the hardware (data protection). The sorting is not optimal because the timestamp bits are not in chronological order.
- Reference: RFC 9562, Section 5.1.
- Version 2 (DCE Security):
- Similar to version 1, but with additional fields for POSIX UID/GID (Local Domain). The timestamp is less precise.
- Status: Rarely used, not implemented in most libraries, and intended only for very specific legacy DCE applications.
- Evidence: RFC 9562 mentions it as reserved with reference to old DCE specifications.
- Versions 3 & 5 (Name-based):
- Deterministic: A UUID is generated from a namespace and a name using a hash (v3: MD5, v5: SHA-1). Same input → same UUID.
- Difference: Only the hash algorithm. MD5 is broken, SHA-1 is considered weak → v5 is somewhat better, but neither is suitable for cryptographic security.
- Evidence: RFC 9562, Sections 5.3 and 5.5.
- Version 4 (Random):
- The classic “random” UUID. 122 bits are truly random (or cryptographically secure pseudo-random). No time information, no node ID.
- Advantage: Maximum unpredictability and privacy.
- Disadvantage: Not sortable → worse performance than primary keys in databases (index fragmentation).
- Evidence: RFC 9562, Section 5.4 – considered a secure standard for many applications.
- Version 6 (Reordered Time-based, RFC 9562):
- Technically almost identical to v1 (same timestamp, same clock sequence, same node), but the timestamp bits are rearranged (most significant first).
- This allows v6 UUIDs to be sorted byte-wise by creation time – ideal for databases.
- Recommendation in the RFC: Use only as a drop-in replacement for existing v1 systems; otherwise, v7 is preferable.
- Evidence: RFC 9562, Section 5.6 – “field-compatible version of UUIDv1”.
- Version 7 (Unix Time-based, RFC 9562):
- Modern variant: 48-bit Unix timestamp in milliseconds (since 1970), followed by 12 bits “rand_a” (can be used for sub-milliseconds or counters) and 62 bits “rand_b” (random).
- Advantages:
- Very easy to sort (time is at the beginning).
- High entropy (74-bit random).
- No MAC address → better data privacy.
- Good for distributed systems and database indexes.
- The RFC explicitly recommends: “Implementations SHOULD utilize UUIDv7 instead of UUIDv1 and UUIDv6 if possible.”
- Document: RFC 9562, Section 5.7.
The most widespread version so far is version 4, which I also use myself. An important criterion is already familiar from hash algorithms. With hashes, we speak of collisions, meaning when a hash refers to two different texts. We have a similar problem with the generation of UUIDs. In production environments, even with large datasets, these should not be generated multiple times. This would trigger an error in the database system because the uniqueness requirement is violated. The subsequent error handling is more problematic. In order for the data record to still be saved, a new UIID must be generated. I have not yet encountered this situation in my many years of using UUID version 4.
Why should one revert to UUID version 7? It’s about sortability. UUID 7 promises that newer entries will have an ascending date in the first positions. This allows you to identify older entries in descending order.
To use UUID 4 in Java, for example, simply call UUID.randomUUID(). The ORM mapper Hibernate also provides other versions of the UUID via the @GeneratedValue annotation. Of course, you can also use additional libraries like the uuid-creator under the MIT license.
<dependency><groupId>com.github.f4b6a3</groupId><artifactId>uuid-creator</artifactId><version>${version}</version></dependency>
In Java, there’s also a way to determine which UUID a string used without an additional library.
UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
That’s all for today. I hope this article has drawn some attention to the topic and I would be very happy if it gains wider recognition.



Leave a Reply
You must be logged in to post a comment.