Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add uniqueIdentifier to storage table #6249

Open
wants to merge 12 commits into
base: production
Choose a base branch
from
Open

Conversation

CarolineDenis
Copy link
Contributor

@CarolineDenis CarolineDenis commented Feb 17, 2025

Fixes #6247

Warning

This PR affects database migrations. See migration testing instructions.

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add relevant documentation (Tester - Dev)
  • Add a reverse migration if a migration is present in the PR

Testing instructions

  • Open schema config
  • Select storage table
  • Verify the new field GUID is there
  • Create or edit your storage view def
  • Add guid field as not readonly field
  • Save
  • Open new storage data entry
  • Enter required info but not guid
  • verify you can save without a GUID
  • Enter a GUID (sequence of numbers or letters or mix )
  • Copy the GUID (you will need it later)
  • Save
  • verify you can save with guid
  • Create a new storage entry
  • Enter the GUID you copied
  • Save
  • Verify you cannot save because the GUID is not unique

@grantfitzsimmons grantfitzsimmons added this to the 7.10.2 milestone Feb 17, 2025
@CarolineDenis CarolineDenis requested review from sharadsw, acwhite211 and a team February 17, 2025 21:53
@acbentley
Copy link

Did we determine for this whether the users were asking for a user-defined GUID or an auto-filled "real" GUID? like in other tables?

Copy link
Member

@grantfitzsimmons grantfitzsimmons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTES:
To force uniqueness on the guid, add uniqueness rule in the schema config

GUID by its very definition must be globally unique, therefore we need to have a non-configurable system uniqueness rule for this

@acbentley
Copy link

NOTES:
To force uniqueness on the guid, add uniqueness rule in the schema config

GUID by its very definition must be globally unique, therefore we need to have a non-configurable system uniqueness rule for this

By that same definition, a GUID field should not be self-created as that too breaks (potentially) the GLOBALLY unique characteristic surely.

@melton-jason
Copy link
Contributor

Currently, only CollectionObject -> guid has a defined uniqueness rule and that rule is removable/modifiable by the user.

{
"rule": [["guid"], []],
"isDatabaseConstraint": false
}

If it's a requirement, we can modify the existing guid_rules business rule (or add a new business rule) to enforce something like an 'implicit' uniqueness rule for guid (which would be non-configurable and not present in the schema config, unless we define the uniqueness rules for the other tables with the guid field).

@orm_signal_handler('pre_save')
def set_guids(sender, obj):
if not hasattr(obj, 'guid'):
return
# See https://github.com/specify/specify7/issues/4243#issuecomment-1836990623
if sender in (Taxon, Geography):
return
if obj.guid is None or obj.guid.strip() == "":
obj.guid = str(uuid4())

It's not too difficult to search for duplicates within a single table:

# Assuming we're in some 'global' signal handler, like set_guids(sender, obj)
duplicates = sender.objects.only('id').filter(guid=obj.guid).exclude(id=obj.id)
if len(duplicates) > 0: 
    # raise a business rule exception following the expected uniqueness rule signature

If GUIDs should always be unique across all tables (e.g., no record should have the same GUID from any other record regardless of table) then things get a little more complicated, but conceptually I think we'd probably want to perform a join across all tables with the guid field (rather than applying the single case across applicable tables, to prevent hitting the database multiple times quickly).

@acbentley
Copy link

Currently, only CollectionObject -> guid has a defined uniqueness rule and that rule is removable/modifiable by the user.

{
"rule": [["guid"], []],
"isDatabaseConstraint": false
}

If it's a requirement, we can modify the existing guid_rules business rule (or add a new business rule) to enforce something like an 'implicit' uniqueness rule for guid (which would be non-configurable and not present in the schema config, unless we define the uniqueness rules for the other tables with the guid field).

@orm_signal_handler('pre_save')
def set_guids(sender, obj):
if not hasattr(obj, 'guid'):
return
# See https://github.com/specify/specify7/issues/4243#issuecomment-1836990623
if sender in (Taxon, Geography):
return
if obj.guid is None or obj.guid.strip() == "":
obj.guid = str(uuid4())

It's not too difficult to search for duplicates within a single table:

# Assuming we're in some 'global' signal handler, like set_guids(sender, obj)
duplicates = sender.objects.only('id').filter(guid=obj.guid).exclude(id=obj.id)
if len(duplicates) > 0: 
    # raise a business rule exception following the expected uniqueness rule signature

If GUIDs should always be unique across all tables (e.g., no record should have the same GUID from any other record regardless of table) then things get a little more complicated, but conceptually I think we'd probably want to perform a join across all tables with the guid field (rather than applying the single case across applicable tables, to prevent hitting the database multiple times quickly).

Is there no uniqueness rule inherently built into the way these are generated?

@CarolineDenis CarolineDenis requested a review from a team February 18, 2025 18:07
Copy link
Collaborator

@lexiclevenger lexiclevenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing instructions

  • Verify the new field GUID is there
  • verify you can save without a GUID
  • verify you can save with guid
  • Verify you cannot save because the GUID is not unique

Looks great! GUID must be unique, and the uniqueness rule is non-configurable.
Screenshot 2025-02-19 123535

@grantfitzsimmons
Copy link
Member

grantfitzsimmons commented Feb 20, 2025

Edit: #6259


@melton-jason @acbentley

If you feel strongly about this, we could rename this something like "Unique Identifier" instead. That way, it’s clear that we’re aiming for uniqueness within just the storage tree, but we don’t expect these to be unique globally.

These identifiers need to be unique within the institution, and really only unique for the storage tree. I'm not sure that these GUIDs are beneficial in any context other than managing the location of physical specimens and for object tracking.

Not sure we necessarily need to add logic to ensure that GUIDs are unique outside of specific contexts, such as Collection Object GUID, Taxon GUID, or Storage GUID. The chances of a repeat value appearing among these are very low, especially if they are auto-generated. If duplicates are manually entered between tables for any reason, it’s likely not a major issue, but I understand the concern. From an academic perspective, we would want to maintain the "globally unique" aspect.

For now, the bottom line is that we need to ensure that a field intended to be unique is enforced as such. I believe @CarolineDenis's solution is sufficient at this stage, though I welcome a name change if you both feel it is necessary.

Copy link
Member

@grantfitzsimmons grantfitzsimmons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing instructions

  • Verify the new field GUID is there
  • verify you can save without a GUID
  • verify you can save with guid
  • Verify you cannot save because the GUID is not unique

Migrations all good:

~/GitHub/specify7 on issue-6247 = !8 ❯ docker exec -it specify7-specify7-1 /bin/bash                                                        at 08:48:11 PM
specify@2cf30bc205b3:/opt/specify7$ ve/bin/python manage.py migrate businessrules 0005
[20/Feb/2025 02:48:54] [ERROR] [specifyweb.attachment_gw.views:237] Bad attachment key.
Operations to perform:
  Target specific migration: 0005_cojo, from businessrules
Running migrations:
  Rendering model states... DONE
  Unapplying businessrules.0006_storage_guid... OK
specify@2cf30bc205b3:/opt/specify7$  ve/bin/python manage.py migrate specify 0023
[20/Feb/2025 02:49:26] [ERROR] [specifyweb.attachment_gw.views:237] Bad attachment key.
Operations to perform:
  Target specific migration: 0023_update_schema_config_text, from specify
Running migrations:
  Rendering model states... DONE
  Unapplying specify.0024_add_guid_storage... OK
specify@2cf30bc205b3:/opt/specify7$ 

👍

Copy link
Member

@grantfitzsimmons grantfitzsimmons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requirement were updated during the usability meeting, will re-review once the field name changes.

Copy link
Collaborator

@lexiclevenger lexiclevenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Verify the new field GUID is there
  • verify you can save without a GUID
  • verify you can save with guid
  • Verify you cannot save because the GUID is not unique

Looks good! Everything is functional, and GUID is now named uniqueIdentifier.

@CarolineDenis CarolineDenis changed the title Add GUID to storage table Add uniqueIdentifier to storage table Feb 21, 2025
@CarolineDenis CarolineDenis requested a review from a team February 21, 2025 16:13
Copy link
Contributor

@Areyes42 Areyes42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Verify the new field GUID is there
  • verify you can save without a GUID
  • verify you can save with guid
  • Verify you cannot save because the GUID is not unique

uniqueIdentifier looks good!
Screenshot 2025-02-21 at 12 20 26 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Dev Attention Needed
Development

Successfully merging this pull request may close these issues.

Add uniqueIdentifier to Storage
8 participants