DOC-2398-Schema Change operation #222

Open
wants to merge 2 commits into
base: 4.1

Conversation

Tushar-TG-14
Contributor

No description provided.

== Schema Change Operations Best Practices

Schema changes are essential for maintaining and improving your graph's structure. However, they must be done carefully to avoid impacting ongoing operations.
Following best practices minimizes the risk of failures and ensures smooth execution.
Collaborator

check your grammar

Comment on lines +40 to +41
* *Do ensure no active queries or loading jobs.* Run schema changes only after aborting ongoing jobs that might reference schema components being modified.
* *Do perform schema changes during low-traffic periods.* This reduces the chances of failure and conflicts during schema modification.
Collaborator

victorleeTG, Feb 3, 2025

These seem to be contradictory rules.
Rule 1 says no queries or loading jobs.
Rule 2 says "low traffic". What type of traffic is allowed? It also doesn't answer the question of how much is "low".


*Verifying Schema Consistency:*

* Use the `gsql` commands to check the schema version after the schema change. This will confirm that the schema updates have been applied correctly, and that no unintended changes have occurred.
Collaborator

What GSQL commands?
@Tushar-TG-14 If you don't know the answer to this question, then the reader won't know either.

Comment on lines +80 to +81
Schema changes will invalidate queries and loading jobs dependent on modified schema components.
This may cause errors or failures in queries trying to reference altered or removed schema elements.
Collaborator

@deepkhajanchi What does "invalidate" mean in this context?
Does it mean the query and/or loading job has been "uninstalled"? (A query can be uninstalled, but I don't know about loading jobs.)
Or does it mean the user can still run it, but it might be broken and not work correctly?

Collaborator

I think we can/should be more specific about the rules for when a query/job will be invalid.

Comment on lines +17 to +18
* A loading job becomes invalid if it refers to a vertex or an edge that has been *dropped* (deleted) or *altered*.
* A query becomes invalid if it refers to a vertex, an edge, or an attribute that has been *dropped*.
Collaborator

We should always be careful to distinguish between individual vertices/edges vs. vertex/edge types.
Loading jobs refer to vertex/edge types; a job will be invalid if it refers to a type or an attribute of a type that no longer exists.

A query usually refers to types but in theory could refer to individual vertices and edges. I think this rule is meant to also refer to types.
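
To make the type-level rule concrete, here is a minimal Python sketch. It is not TigerGraph's actual validation logic. It assumes a schema can be modeled as a mapping from each vertex/edge type name to its attribute names, and that each query or loading job lists the types and attributes it references. All names in it (`CatalogItem`, `missing_references`, `load_people`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical model: each vertex/edge TYPE name maps to its attribute names.
Schema = dict[str, set[str]]


@dataclass
class CatalogItem:
    """A query or loading job and the types/attributes it references (illustrative)."""
    name: str
    # Referenced type name -> attribute names used from that type.
    references: dict[str, set[str]] = field(default_factory=dict)


def missing_references(item: CatalogItem, schema: Schema) -> list[str]:
    """List every referenced type or attribute that no longer exists in the schema."""
    missing = []
    for type_name, attrs in sorted(item.references.items()):
        if type_name not in schema:
            # The whole type was dropped; its attributes are gone with it.
            missing.append(type_name)
            continue
        for attr in sorted(attrs - schema[type_name]):
            missing.append(f"{type_name}.{attr}")
    return missing


def is_invalid(item: CatalogItem, schema: Schema) -> bool:
    """An item is invalid if any type or attribute it references is missing."""
    return bool(missing_references(item, schema))


# Schema after a change that dropped the City type and the Person.age attribute.
schema_after = {"Person": {"name"}, "Knows": {"since"}}

load_people = CatalogItem("load_people", {"Person": {"name", "age"}})
load_cities = CatalogItem("load_cities", {"City": {"name"}})
count_friends = CatalogItem("count_friends", {"Person": set(), "Knows": {"since"}})

for item in (load_people, load_cities, count_friends):
    print(item.name, "invalid" if is_invalid(item, schema_after) else "valid",
          missing_references(item, schema_after))
```

Note that the check is entirely about types and their attributes; individual vertices and edges never enter into it.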

Collaborator

2nd comment:
Use "DROP" and "DELETE" properly.
In SQL and GSQL, these two keywords have distinct meanings. Therefore, you should never say one is a synonym of the other.
DROP is to remove a catalog item. Catalog items include vertex TYPES, edge TYPES, graphs, queries, and loading jobs. [In a relational database, you could DROP a table.]

DELETE is to remove a data object from the database - a vertex or edge. [In a relational database you could DELETE a row of a table.]
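
The relational analogy can be run directly. The sketch below uses Python's built-in `sqlite3` module, so it shows SQL rather than GSQL; the `person` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben")],
)


def table_exists(connection, table):
    """Check the SQLite catalog (sqlite_master) for a table with this name."""
    row = connection.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


# DELETE removes a data object (one row); the table stays in the catalog.
conn.execute("DELETE FROM person WHERE id = ?", (1,))
rows_after_delete = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
exists_after_delete = table_exists(conn, "person")

# DROP removes the catalog item itself (the table), and all its rows with it.
conn.execute("DROP TABLE person")
exists_after_drop = table_exists(conn, "person")

print(rows_after_delete, exists_after_delete, exists_after_drop)
```

After the DELETE the table still exists with one row left; after the DROP the table itself is gone from the catalog.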


*Verifying Data Consistency:*

* Run the `gstatusgraph` command to check that the data size remains consistent with what was recorded before the schema change. This will ensure that no data loss or corruption has occurred during the schema update.
Collaborator

Data loss and data corruption are among the worst errors a database can commit. Think about it.

  1. Shouldn't we recommend that users perform a backup before performing a schema change?
  2. Check other database products to see if/when they mention "data loss", "data corruption", or "data inconsistency". We want to see how they mention this.
     Compare these two:
     * "It is essential to make sure that there are no loading jobs in progress when you start the schema change. Failure to follow this rule may result in data corruption." (It's the user's fault.)
     * "Schema changes sometimes result in data corruption." (It's TigerGraph's fault.)
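
Whatever wording we settle on, the before/after size check itself could be sketched as follows. This is only an illustration: it assumes the user has recorded per-type counts before and after the change by some means (it does not assume or parse any particular `gstatusgraph` output format), and that data belonging to a dropped type is expected to go away with the type. The function name and all numbers are hypothetical.

```python
def compare_counts(before, after, dropped_types=frozenset()):
    """Return {type_name: (count_before, count_after)} for unexpected changes.

    Types listed in dropped_types are skipped, on the assumption that their
    data is expected to disappear along with the type.
    """
    unexpected = {}
    for type_name in sorted(set(before) | set(after)):
        if type_name in dropped_types:
            continue
        count_before = before.get(type_name, 0)
        count_after = after.get(type_name, 0)
        if count_before != count_after:
            unexpected[type_name] = (count_before, count_after)
    return unexpected


# Illustrative numbers only, recorded by hand before and after a schema change.
before = {"Person": 1000, "City": 50, "Knows": 4200}
after = {"Person": 1000, "Knows": 4100}

report = compare_counts(before, after, dropped_types={"City"})
print(report or "counts are consistent")
```

A non-empty report is a signal to investigate, ideally with a backup on hand to restore from. It does not by itself tell you whether the schema change or a concurrent job caused the difference.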

2 participants