retire schema.normalizeForHash() & reuse schema.toInlineSchemaString() #7122

shans · 2021-04-26T00:26:22Z

(from b/168267507)

The schema.normalizeForHash() method produces a string representation of a schema, used for generating a hash. It has two requirements:
(1) different schemas produce different strings
(2) different representations of the same schema produce the same string

The second requirement means that the fields of a schema (and of any child schemas) must be output in a defined order; we use lexicographic order.
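To illustrate requirement (2), here is a minimal sketch of order-independent output. The `SimpleSchema` shape and `canonicalString()` are hypothetical stand-ins made up for this example, not the actual Schema class or normalizeForHash() implementation.

```typescript
// Hypothetical, simplified schema shape (an assumption for illustration;
// not the real Schema class).
interface SimpleSchema {
  names: string[];
  fields: {[name: string]: string | SimpleSchema};
}

// Emits fields in lexicographic key order, recursing into child schemas,
// so the order in which fields were declared does not affect the output.
function canonicalString(schema: SimpleSchema): string {
  const fields = Object.keys(schema.fields).sort().map(key => {
    const type = schema.fields[key];
    const typeStr = typeof type === 'string' ? type : canonicalString(type);
    return `${key}: ${typeStr}`;
  });
  return `${schema.names.join(' ')} {${fields.join(', ')}}`;
}

// Two representations of the same schema, differing only in field order.
const personA: SimpleSchema = {names: ['Person'], fields: {name: 'Text', age: 'Number'}};
const personB: SimpleSchema = {names: ['Person'], fields: {age: 'Number', name: 'Text'}};

console.log(canonicalString(personA) === canonicalString(personB));  // true
```

Sorting only addresses requirement (2). Whether requirement (1) holds depends on the string retaining everything that distinguishes one schema from another.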

However, the first requirement in practice has not been met (e.g. see #6104).

The manifest representation of a schema must already meet the first requirement: the manifest is the source of schemas, so two different schemas cannot share the same input string representation. It therefore seems like we should switch to using this representation as the basis for hash generation instead.

To do so, we'd need to add an option that sorts the fields before outputting the string, but this doesn't seem particularly difficult.
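A rough sketch of what such an option could look like, under stated assumptions: the `InlineSchema` type, the `toInlineString()` function and the `sortFields` flag below are all hypothetical names invented for this example, and the output format is only an approximation of manifest syntax. The real toInlineSchemaString() may differ in signature and output.

```typescript
// Hypothetical types for illustration only (not the real Schema API).
type FieldType = string | InlineSchema;
interface InlineSchema {
  names: string[];
  fields: Array<[string, FieldType]>;  // declaration order is preserved
}
interface ToStringOptions {
  sortFields?: boolean;  // assumed option name
}

// Serializes a schema inline. With sortFields set, fields are emitted in
// lexicographic order at every nesting level; otherwise in declaration order.
function toInlineString(schema: InlineSchema, options: ToStringOptions = {}): string {
  const entries = [...schema.fields];  // copy so the input is not mutated
  if (options.sortFields) {
    entries.sort(([x], [y]) => (x < y ? -1 : x > y ? 1 : 0));
  }
  const body = entries
    .map(([name, type]) =>
      `${name}: ${typeof type === 'string' ? type : toInlineString(type, options)}`)
    .join(', ');
  return `${schema.names.join(' ')} {${body}}`;
}

const address: InlineSchema = {names: ['Address'], fields: [['zip', 'Text'], ['city', 'Text']]};
const person: InlineSchema = {names: ['Person'], fields: [['name', 'Text'], ['home', address]]};

console.log(toInlineString(person));
// Person {name: Text, home: Address {zip: Text, city: Text}}
console.log(toInlineString(person, {sortFields: true}));
// Person {home: Address {city: Text, zip: Text}, name: Text}
```

Keeping the sort behind an option means existing callers that rely on declaration order are unaffected, while the hash path can request the sorted form.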
