feat: Support Monotonic UUIDv7 Batch Generation #191

Open: wants to merge 2 commits into base: master
21 changes: 20 additions & 1 deletion codec.go
@@ -21,7 +21,10 @@

package uuid

import "fmt"
import (
"fmt"
"math/rand"
)

// FromBytes returns a UUID generated from the raw byte slice input.
// It will return an error if the slice isn't 16 bytes long.
@@ -227,3 +230,19 @@ func (u *UUID) UnmarshalBinary(data []byte) error {

return nil
}

// WithCustomPRNG provides a deterministic random number generator for testing.
//
// Allows users to specify a PRNG with a fixed seed, enabling
// reproducible UUID generation. Useful for unit testing and debugging.
//
// Arguments:
// - seed: The seed value for the PRNG.
//
// Returns:
// - GenOption: A function to configure the generator.
func WithCustomPRNG(seed int64) GenOption {
Review comment (Member):

Assuming we move forward I think this function should be called WithExplicitRandSeed(), or something similar, to better reflect what it's actually doing.

return func(gen *Gen) {
gen.rand = rand.New(rand.NewSource(seed))
}
}
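
As a note on what this option relies on: a seeded `*rand.Rand` from `math/rand` satisfies `io.Reader` through its `Read` method, so two readers built from the same seed yield the same byte stream. The sketch below shows only that property; `seededReader` and `readN` are hypothetical helper names, not part of this PR or the library. A seeded `math/rand` source is predictable, so this only makes sense in tests.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"math/rand"
)

// seededReader (hypothetical helper) returns an io.Reader backed by
// math/rand with a fixed seed. *rand.Rand implements io.Reader via Read.
func seededReader(seed int64) io.Reader {
	return rand.New(rand.NewSource(seed))
}

// readN (hypothetical helper) reads exactly n bytes from r.
func readN(r io.Reader, n int) ([]byte, error) {
	buf := make([]byte, n)
	_, err := io.ReadFull(r, buf)
	return buf, err
}

func main() {
	a, err := readN(seededReader(42), 8)
	if err != nil {
		panic(err)
	}
	b, err := readN(seededReader(42), 8)
	if err != nil {
		panic(err)
	}
	// Two readers with the same seed produce the same bytes.
	fmt.Println(bytes.Equal(a, b))
}
```

This is also why the reviewer's suggested name fits: the option fixes a seed rather than accepting an arbitrary PRNG.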
151 changes: 138 additions & 13 deletions generator.go
@@ -26,6 +26,7 @@ import (
"crypto/rand"
"crypto/sha1"
"encoding/binary"
"errors"
"hash"
"io"
"net"
@@ -193,6 +194,32 @@ func NewGenWithOptions(opts ...GenOption) *Gen {
return gen
}

// MonotonicGen extends the Gen struct with a counter for batch generation.
//
// MonotonicGen ensures the generation of strictly monotonic UUIDs within a
// batch by utilizing a counter in conjunction with timestamps. This is
// particularly useful for applications requiring ordered identifiers, such
Review comment (Member):

for applications requiring ordered identifiers

v6 and v7 UUIDs are already ordered so none of this is strictly necessary except to enable batch generation, and I'm not sold on the utility there. I have used this library to generate UUIDs on the order of tens of millions per second sustained over 1000s of nodes in a distributed system without ever running into a scenario where I wanted/needed to pre-allocate a block of values.

Review comment (Contributor):

I agree with @dylan-bourque - I'm really not sure how useful this is relative to the increased complexity. I haven't generated millions per second, but I can't see any reason that the existing implementation doesn't do what you want. Moreover, I'm concerned that having two separate implementations of generating a new UUIDv7 (the existing public method for generating one at a time and your new private method that is called in the loop) will lead to maintenance headaches.

Is there a specific situation you have encountered in which the existing implementation fails to provide monotonically increasing values? If there is, please explain in more detail to help us understand the need.

// as database indices or log sequencing.
type MonotonicGen struct {
Gen
monotonicCounter uint16
monotonicMutex sync.Mutex
}

// NewMonotonicGen creates a MonotonicGen instance with configurable options.
//
// Arguments:
// - opts: Configuration options for the generator.
//
// Returns:
// - *MonotonicGen: The configured generator.
func NewMonotonicGen(opts ...GenOption) *MonotonicGen {
gen := &MonotonicGen{
Gen: *NewGenWithOptions(opts...),
}
return gen
}
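
For readers unfamiliar with the option style that `NewMonotonicGen` forwards to `NewGenWithOptions`, here is a minimal, self-contained sketch of the functional options pattern. The types `generator`, `option`, and `constReader`, and the function `withRandomReader`, are hypothetical stand-ins written for illustration; they are not the library's `Gen` or `GenOption`.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"io"
)

// generator and option are hypothetical stand-ins for Gen and GenOption.
type generator struct {
	rand io.Reader
}

type option func(*generator)

// constReader is a test reader that fills every buffer with one byte value.
type constReader byte

func (c constReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = byte(c)
	}
	return len(p), nil
}

// withRandomReader overrides the default entropy source.
func withRandomReader(r io.Reader) option {
	return func(g *generator) { g.rand = r }
}

// newGenerator sets defaults first, then applies each option in order.
func newGenerator(opts ...option) *generator {
	g := &generator{rand: rand.Reader}
	for _, opt := range opts {
		opt(g)
	}
	return g
}

func main() {
	g := newGenerator(withRandomReader(constReader(7)))
	buf := make([]byte, 4)
	if _, err := io.ReadFull(g.rand, buf); err != nil {
		panic(err)
	}
	fmt.Println(buf)
}
```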

// WithHWAddrFunc is a GenOption that allows you to provide your own HWAddrFunc
// function.
// When this option is nil, the defaultHWAddrFunc is used.
@@ -327,15 +354,15 @@ func (g *Gen) NewV6AtTime(atTime time.Time) (UUID, error) {
binary.BigEndian.PutUint16(u[6:], uint16(timeNow&0xfff)) // set time_low (minus four version bits)

// Based on the RFC 9562 recommendation that this data be fully random and not a monotonic counter,
//we do NOT support batching version 6 UUIDs.
//set clock_seq (14 bits) and node (48 bits) pseudo-random bits (first 2 bits will be overridden)
// we do NOT support batching version 6 UUIDs.
// set clock_seq (14 bits) and node (48 bits) pseudo-random bits (first 2 bits will be overridden)
if _, err = io.ReadFull(g.rand, u[8:]); err != nil {
return Nil, err
}

u.SetVersion(V6)

//overwrite first 2 bits of byte[8] for the variant
// overwrite first 2 bits of byte[8] for the variant
u.SetVariant(VariantRFC9562)

return u, nil
@@ -368,29 +395,93 @@ func (g *Gen) NewV7AtTime(atTime time.Time) (UUID, error) {
if err != nil {
return Nil, err
}
//UUIDv7 features a 48 bit timestamp. First 32bit (4bytes) represents seconds since 1970, followed by 2 bytes for the ms granularity.
u[0] = byte(ms >> 40) //1-6 bytes: big-endian unsigned number of Unix epoch timestamp
// UUIDv7 features a 48 bit timestamp: a big-endian unsigned count of milliseconds since the Unix epoch, stored in bytes 0-5.
u[0] = byte(ms >> 40) // 1-6 bytes: big-endian unsigned number of Unix epoch timestamp
u[1] = byte(ms >> 32)
u[2] = byte(ms >> 24)
u[3] = byte(ms >> 16)
u[4] = byte(ms >> 8)
u[5] = byte(ms)

//Support batching by using a monotonic pseudo-random sequence,
//as described in RFC 9562 section 6.2, Method 1.
//The 6th byte contains the version and partially rand_a data.
//We will lose the most significant bites from the clockSeq (with SetVersion), but it is ok,
//we need the least significant that contains the counter to ensure the monotonic property
// Support batching by using a monotonic pseudo-random sequence,
// as described in RFC 9562 section 6.2, Method 1.
// The 6th byte contains the version and partially rand_a data.
// We will lose the most significant bits from the clockSeq (with SetVersion), but it is ok,
// we need the least significant that contains the counter to ensure the monotonic property
binary.BigEndian.PutUint16(u[6:8], clockSeq) // set rand_a with clock seq which is random and monotonic

//override first 4bits of u[6].
// override first 4bits of u[6].
u.SetVersion(V7)

//set rand_b 64bits of pseudo-random bits (first 2 will be overridden)
// set rand_b 64bits of pseudo-random bits (first 2 will be overridden)
if _, err = io.ReadFull(g.rand, u[8:16]); err != nil {
return Nil, err
}
//override first 2 bits of byte[8] for the variant
// override first 2 bits of byte[8] for the variant
u.SetVariant(VariantRFC9562)

return u, nil
}
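
The byte layout that `NewV7AtTime` builds can be sketched on its own, without the generator. The helpers `buildV7` and `canonical` below are hypothetical and written for illustration. The `rand_a` and `rand_b` values are the ones that appear in `makeTestNewV7TestVector`, and the timestamp is the 1645557742000 ms value used in the clock sequence test.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// buildV7 (hypothetical helper) assembles a version 7 UUID from its fields.
// Only the low 48 bits of ms, 12 bits of randA and 62 bits of randB survive.
func buildV7(ms uint64, randA uint16, randB uint64) [16]byte {
	var u [16]byte
	// unix_ts_ms: 48-bit big-endian milliseconds since the Unix epoch.
	u[0] = byte(ms >> 40)
	u[1] = byte(ms >> 32)
	u[2] = byte(ms >> 24)
	u[3] = byte(ms >> 16)
	u[4] = byte(ms >> 8)
	u[5] = byte(ms)
	// ver (4 bits, value 7) followed by rand_a (12 bits).
	binary.BigEndian.PutUint16(u[6:8], randA)
	u[6] = (u[6] & 0x0f) | 0x70
	// var (2 bits, binary 10) followed by rand_b (62 bits).
	binary.BigEndian.PutUint64(u[8:16], randB)
	u[8] = (u[8] & 0x3f) | 0x80
	return u
}

// canonical renders the 8-4-4-4-12 lowercase hex form.
func canonical(u [16]byte) string {
	h := hex.EncodeToString(u[:])
	return h[0:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:32]
}

func main() {
	u := buildV7(1645557742000, 0xCC3, 0x18C4DC0C0C07398F)
	fmt.Println(canonical(u)) // 017f22e2-79b0-7cc3-98c4-dc0c0c07398f
}
```

Because the version nibble overwrites the top four bits of byte 6, any counter stored in `rand_a` has 12 usable bits, not 16.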

// GenerateBatchV7 creates a batch of k-sortable Version 7 UUIDs.
//
// Ensures strict monotonic ordering within the batch.
//
// Arguments:
// - batchSize: Number of UUIDs to generate.
//
// Returns:
// - []UUID: The generated UUIDs.
// - error: If batch generation fails.
func (g *MonotonicGen) GenerateBatchV7(batchSize int) ([]UUID, error) {
if batchSize <= 0 {
return nil, errors.New("batch size must be greater than zero")
}

uuids := make([]UUID, batchSize)

for i := 0; i < batchSize; i++ {
uuid, err := g.newMonotonicV7()
if err != nil {
return nil, err
}
uuids[i] = uuid
}
return uuids, nil
}

// newMonotonicV7 generates a Version 7 UUID with a monotonic counter for ordering.
//
// Returns:
// - UUID: The generated UUID.
// - error: If UUID generation fails.
func (g *MonotonicGen) newMonotonicV7() (UUID, error) {
var u UUID

ms, clockSeq, err := g.getMonotonicClockSequence(true, g.epochFunc())
if err != nil {
return Nil, err
}

// set the timestamp (48 bits)
u[0] = byte(ms >> 40)
u[1] = byte(ms >> 32)
u[2] = byte(ms >> 24)
u[3] = byte(ms >> 16)
u[4] = byte(ms >> 8)
u[5] = byte(ms)

// set rand_a (clockSeq ensures monotonicity)
binary.BigEndian.PutUint16(u[6:8], clockSeq)

// override the version bits (top 4 bits of u[6]); the variant is set after rand_b below
u.SetVersion(V7)

// set rand_b (64 random bits)
if _, err := io.ReadFull(g.rand, u[8:16]); err != nil {
return Nil, err
}
u.SetVariant(VariantRFC9562)

return u, nil
@@ -434,6 +525,40 @@ func (g *Gen) getClockSequence(useUnixTSMs bool, atTime time.Time) (uint64, uint
return timeNow, g.clockSequence, nil
}

// getMonotonicClockSequence returns a timestamp and clock sequence to ensure
// monotonic UUID generation, even when timestamps are identical.
//
// Arguments:
// - useUnixTSMs: Whether to use millisecond precision for the timestamp.
// - atTime: The reference time.
//
// Returns:
// - uint64: The timestamp.
// - uint16: The clock sequence.
// - error: If the sequence generation fails.
func (g *MonotonicGen) getMonotonicClockSequence(useUnixTSMs bool, atTime time.Time) (uint64, uint16, error) {
g.monotonicMutex.Lock()
defer g.monotonicMutex.Unlock()

var timeNow uint64
if useUnixTSMs {
timeNow = uint64(atTime.UnixMilli())
} else {
timeNow = g.getEpoch(atTime)
}

// If timeNow <= lastTime, increment the counter to ensure monotonicity.
if timeNow <= g.lastTime {
g.monotonicCounter++
} else {
g.monotonicCounter = 0
}

g.lastTime = timeNow

return timeNow, g.monotonicCounter, nil
}
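
Two details of the function above are worth illustrating. It stores `timeNow` in `lastTime` even when the clock has moved backwards, and it lets the `uint16` counter wrap. The sketch below is one possible alternative, not the PR's code: it keeps the last timestamp when the clock does not advance, and on counter exhaustion it moves the timestamp one millisecond ahead, which is one of the rollover strategies RFC 9562 section 6.2 describes. `monoState`, `next`, and `maxCounter` are hypothetical names, and the 12-bit limit assumes the counter lives in `rand_a`.

```go
package main

import (
	"fmt"
	"sync"
)

// maxCounter assumes the counter occupies the 12 bits of rand_a that remain
// after the version nibble is written.
const maxCounter = 0x0FFF

// monoState (hypothetical) tracks the last emitted timestamp and counter.
type monoState struct {
	mu      sync.Mutex
	lastMs  uint64
	counter uint16
}

// next returns a (timestamp, counter) pair that is strictly greater than the
// previous pair, even if nowMs repeats or goes backwards.
func (s *monoState) next(nowMs uint64) (uint64, uint16) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if nowMs > s.lastMs {
		// Clock advanced: adopt the new timestamp and restart the counter.
		s.lastMs = nowMs
		s.counter = 0
		return s.lastMs, s.counter
	}
	// Same millisecond or clock regression: keep lastMs, bump the counter.
	if s.counter == maxCounter {
		// Counter exhausted: run the timestamp one millisecond ahead.
		s.lastMs++
		s.counter = 0
	} else {
		s.counter++
	}
	return s.lastMs, s.counter
}

func main() {
	var s monoState
	for _, now := range []uint64{1000, 1000, 999, 1001} {
		ms, ctr := s.next(now)
		fmt.Println(ms, ctr)
	}
}
```

Waiting for the wall clock to advance instead of incrementing the timestamp is the other common choice; it trades latency for timestamps that never run ahead of real time.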

// Returns the hardware address.
func (g *Gen) getHardwareAddr() ([]byte, error) {
var err error
81 changes: 77 additions & 4 deletions generator_test.go
@@ -730,9 +730,9 @@ func makeTestNewV7Basic() func(t *testing.T) {
func makeTestNewV7TestVector() func(t *testing.T) {
return func(t *testing.T) {
pRand := make([]byte, 10)
//first 2 bytes will be read by clockSeq. First 4 bits will be overridden by Version. The next bits should be 0xCC3(3267)
// first 2 bytes will be read by clockSeq. First 4 bits will be overridden by Version. The next bits should be 0xCC3(3267)
binary.LittleEndian.PutUint16(pRand[:2], uint16(0xCC3))
//8bytes will be read for rand_b. First 2 bits will be overridden by Variant
// 8bytes will be read for rand_b. First 2 bits will be overridden by Variant
binary.LittleEndian.PutUint64(pRand[2:], uint64(0x18C4DC0C0C07398F))

g := &Gen{
@@ -934,11 +934,11 @@ func makeTestNewV7ClockSequence() func(t *testing.T) {
}

g := NewGen()
//always return the same TS
// always return the same TS
g.epochFunc = func() time.Time {
return time.UnixMilli(1645557742000)
}
//by being KSortable with the same timestamp, it means the sequence is Not empty, and it is monotonic
// by being KSortable with the same timestamp, it means the sequence is Not empty, and it is monotonic
uuids := make([]UUID, 10)
for i := range uuids {
u, err := g.NewV7()
@@ -1003,6 +1003,79 @@ func makeTestNewV7AtTime() func(t *testing.T) {
}
}

func TestGenerateBatchV7(t *testing.T) {
gen := NewMonotonicGen()
batchSize := 100

t.Run("Strict Monotonicity", func(t *testing.T) {
uuids, err := gen.GenerateBatchV7(batchSize)
if err != nil {
t.Fatalf("Error generating batch: %v", err)
}

for i := 1; i < len(uuids); i++ {
if uuids[i-1].String() >= uuids[i].String() {
t.Errorf("UUID %d (%s) is not less than UUID %d (%s)", i-1, uuids[i-1], i, uuids[i])
}
}
})

t.Run("Batch Size Validation", func(t *testing.T) {
uuids, err := gen.GenerateBatchV7(0)
if err == nil {
t.Errorf("expected error for zero batch size, got none")
}
if uuids != nil {
t.Errorf("expected nil UUID slice for zero batch size, got: %v", uuids)
}
})
}
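
The monotonicity check above compares canonical strings. That gives the same order as comparing raw bytes, because the string form is fixed-width lowercase hex with hyphens at fixed positions. A byte-level check avoids the string conversions. The sketch below uses a plain `[16]byte` in place of the library's `UUID` type, and `strictlyIncreasing` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"fmt"
)

// strictlyIncreasing (hypothetical helper) reports whether each 16-byte ID
// sorts strictly after its predecessor in big-endian byte order.
func strictlyIncreasing(ids [][16]byte) bool {
	for i := 1; i < len(ids); i++ {
		if bytes.Compare(ids[i-1][:], ids[i][:]) >= 0 {
			return false
		}
	}
	return true
}

func main() {
	a := [16]byte{0: 1}
	b := [16]byte{0: 1, 15: 1}
	c := [16]byte{0: 2}
	fmt.Println(strictlyIncreasing([][16]byte{a, b, c}))
	fmt.Println(strictlyIncreasing([][16]byte{a, a}))
}
```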

func TestWithCustomPRNG(t *testing.T) {
seed := int64(42)
gen := NewMonotonicGen(WithCustomPRNG(seed))

t.Run("Deterministic UUID Generation", func(t *testing.T) {
uuid1, err := gen.newMonotonicV7()
if err != nil {
t.Fatalf("error generating UUID: %v", err)
}

uuid2, err := gen.newMonotonicV7()
if err != nil {
t.Fatalf("error generating UUID: %v", err)
}

if uuid1.String() == uuid2.String() {
t.Errorf("consecutive UUIDs from the same seeded generator are identical: %s", uuid1)
}
})
}

func TestMonotonicGenEdgeCases(t *testing.T) {
gen := NewMonotonicGen()
t.Run("Epoch Boundary Handling", func(t *testing.T) {
uuid, err := gen.newMonotonicV7()
if err != nil {
t.Fatalf("error generating UUID at epoch boundary: %v", err)
}
if uuid.IsNil() {
t.Errorf("Generated UUID at epoch boundary is nil: %v", uuid)
}
})

t.Run("Counter Rollover", func(t *testing.T) {
gen.monotonicCounter = 0xFFFF
uuid, err := gen.newMonotonicV7()
if err != nil {
t.Fatalf("error generating UUID during counter rollover: %v", err)
}
if uuid.IsNil() {
t.Errorf("Generated UUID during counter rollover is nil: %v", uuid)
}
})
}
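
The "Counter Rollover" subtest above only asserts that a non-nil UUID comes back; it does not check ordering across the rollover. The sketch below illustrates what happens to the counter bits under two assumptions drawn from the diff: the counter is a `uint16`, and the version nibble overwrites the top four bits of byte 6. `randAField` is a hypothetical helper that mimics that overwrite.

```go
package main

import "fmt"

// randAField (hypothetical helper) mimics writing a 16-bit counter into
// bytes 6-7 and then replacing the top four bits with version 7.
func randAField(counter uint16) uint16 {
	return (counter & 0x0FFF) | 0x7000
}

func main() {
	var c uint16 = 0xFFFF
	c++ // unsigned overflow wraps to 0 in Go
	fmt.Println(c)

	// Counters 0x0FFF and 0x1000 differ only above the 12 visible bits, so
	// the later counter yields the smaller field value.
	fmt.Printf("%04x %04x\n", randAField(0x0FFF), randAField(0x1000))
}
```

Under these assumptions, ordering within a single millisecond holds only for the first 4096 values, so a rollover test would need to compare the UUIDs on either side of that boundary.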

func TestDefaultHWAddrFunc(t *testing.T) {
tests := []struct {
n string