A 64-Bit Distributed ID Generator

Lodhi Garden, Delhi clicked by siddharth sabron

Lodhi Garden, Delhi

Image Link

Introduction

Unique ID generation is a crucial part of any application, because it is used to represents a single unique row in the table. Furthermore, this ID is often used as the primary-key of the table and heavily used in searching a specific row uniquely.

Example Query:

select  * from  short_urls where internal_id = 286435872092454913;

You might wonder why we are emphasizing the ID so much…..?. Couldn’t we just generate a unique value using something like Math.random() + epoch..? or some random_string_of_some_size + epoch While that could produce unique values, but it would be a poor choice for several reasons, especially at scale:

  1. Meaningful Data: A simple timestamp (or a random number) doesn’t convey any useful information other than the time of creation. Our Snowflake-like approach embeds a timestamp and a shard ID, which can be useful for operational purposes (e.g., identifying which shard a URL was created on).

  2. Database Performance (Crucially): The data type of the primary key has a major impact on database performance, particularly for search operations. Purely numeric IDs (integers or long integers) are vastly superior to strings for primary keys and indexed columns or both as well(A Indexed Primary).

Why Not Random Strings?

Using random strings (e.g., generated by Math.random() + epoch or random string of some size) as primary keys is generally a bad idea for performance reasons:

Why Numeric, Sequential IDs are Better:

Numeric IDs, especially when generated in a roughly sequential manner (like our Snowflake-like approach), offer substantial advantages:

Database Internals: B-Trees (Simplified)

Most relational databases (MySQL, PostgreSQL, etc.) use B-trees (or variants like B+trees) to implement indexes. Here’s a simplified explanation:

Method: generateInternalId(int shardId)

The generateInternalId method creates unique IDs without relying on a central coordinating service (like a database auto-increment). It achieves this by combining a timestamp, a shard ID, and a sequence number into a single 64-bit integer.

ID Structure

The generated ID is a 64-bit long integer, structured as follows:

+-------------------------------------------------------------------------------------+
|    Timestamp (41 bits)   |   ShardID(10) |   Sequence(12)   |      extra bit        |
+--------------------------+---------------+------------------+-----------------------+
|            63            |      22       |        12        |  0  (Bit positions)   |
+-------------------------------------------------------------------------------------+

Bitwise Operations

The code uses bitwise operations to combine the timestamp, shard ID, and sequence number:

+------------------------------------------------------------------------------------+
|       (timestamp << 22)               |       (shardId << 12)    | sequenceNumber  |
+---------------------------------------+--------------------------+-----------------+
|  1001...0010000000000000000000000000  |  0000000101000000000000  | 000001111011    |
+------------------------------------------------------------------------------------+ 

Resulting 64-bit ID: 1001…001000000000101000001111011

Visual Representation

+--------------------------------------------------------------------------------------+
|                                         ID (64 bits)                                 |
+--------------------------------------------------------------------------------------+
|      Timestamp (41 bits)        |      Shard ID (10 bits)     |    Seq (12 bits)     |
+---------------------------------+-----------------------------+----------------------+
| 10010011...00100000000000       |          0000000101         |      000001111011    |
+---------------------------------+-----------------------------+----------------------+

Example

Let’s say:

  1. Timestamp: 39,486,615,500 milliseconds (difference between current time and epoch).

  2. Binary Representations (simplified):

    • Timestamp: 10010011111000000110000110101100000100100
    • Shard ID: 0000000101
    • Sequence Number: 000001111011
  3. Bitwise Operations: The code performs the shifts and OR as described above, resulting in the final 64-bit ID.

Code

Here is the java implementation of the above explanation:

    private final long epoch = 1672531200000L;
    private final AtomicInteger sequence = new AtomicInteger(0);

    private synchronized long generateInternalId(int shardId) {
        long timestamp = System.currentTimeMillis() - epoch;
        int sequenceNumber = sequence.getAndIncrement() & 0xFFF;
        return (timestamp << 22) | (shardId << 12) | sequenceNumber;
    }

Also i have used this in one of my personal project Url Shortener

Parameters
Return Value

Important Considerations (Not Handled in this Simplified Version)

This detailed explanation should cover all aspects of the generateInternalId method and the Snowflake-like ID generation strategy. Remember to use the more robust version of this method (with clock drift and sequence overflow handling) in a production environment.


If you like the post please connect me on LinkedIn And Twitter


Read Previous

Linux Kernel Memory Barriers: A Deep Dive

Go to top