Feature/json schema article #6787

Open
wants to merge 13 commits into
base: master

pubkey
Owner

@pubkey pubkey commented Jan 28, 2025

DO NOT MERGE

docs-src/docs/article-json-schema.md Outdated Show resolved Hide resolved

While RxDB reuses the core JSON Schema specification, it also extends that schema to introduce RxDB-specific features. Like in other NoSQL databases, you manually define which fields to encrypt, which ones to index, and how to interpret specific fields for queries. RxDB puts these configurations directly into the JSON Schema with additional properties:

- `primaryKey`: Specifies which field in the document serves as the primary key.

Interesting stuff btw. One line of work we did in the past was helping Oracle define a standard "database" extension to JSON Schema: https://github.com/json-schema-org/vocab-database/blob/main/database.md (which they implement at Oracle). I wonder if that same database vocabulary could be a good fit for RxDB and if it could be extended for the things you do that they don't!

Also, we should separately discuss how to register your extension keywords as a proper "vocabulary" as the next version of JSON Schema will disallow unrecognised keywords! I would love to help you out with this.

Owner Author

> Also, we should separately discuss how to register your extension keywords as a proper "vocabulary" as the next version of JSON Schema will disallow unrecognised keywords! I would love to help you out with this.

Oh this is new to me. RxDB could always fall back to having a separate config-object, outside of the schema.


- `indexes`: Defines which fields (or combination of fields) RxDB indexes. You can have single-field indexes or compound indexes.
- `version`: Indicates the version of the schema. Whenever you change your schema, you must increment this version so RxDB can handle migrations or other adjustments. This is important because data migration on a client-side database can be tricky when you have many clients out there that update your app at different points in time.
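
For illustration, a schema using these three properties might look like the sketch below. The fields (`id`, `name`, `age`) are invented for the example, and the `HeroSchema` interface and the `undeclaredIndexFields` helper are simplified, hypothetical stand-ins rather than RxDB's own typings or API.

```typescript
// Simplified stand-in for RxDB's schema typings (assumption: only the
// properties discussed above are modelled here).
interface HeroSchema {
  version: number;
  primaryKey: string;
  type: 'object';
  properties: Record<string, Record<string, unknown>>;
  required: string[];
  indexes: Array<string | string[]>;
}

const heroSchema: HeroSchema = {
  version: 0, // increment whenever the schema changes
  primaryKey: 'id',
  type: 'object',
  properties: {
    id: { type: 'string', maxLength: 100 },
    name: { type: 'string', maxLength: 100 },
    age: { type: 'integer', minimum: 0, maximum: 150, multipleOf: 1 },
  },
  required: ['id', 'name', 'age'],
  // one single-field index and one compound index
  indexes: ['name', ['name', 'age']],
};

// Hypothetical sanity check: list indexed fields that are not declared
// under `properties`.
function undeclaredIndexFields(schema: HeroSchema): string[] {
  const missing: string[] = [];
  for (const index of schema.indexes) {
    const fields = Array.isArray(index) ? index : [index];
    for (const field of fields) {
      if (!(field in schema.properties)) missing.push(field);
    }
  }
  return missing;
}
```

Calling `undeclaredIndexFields(heroSchema)` returns an empty array here because every indexed field is declared.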

I think it's OK for the blog post. For your later consideration, the current "standard" way of doing versioning is with $id, and making the version part of that. For cases like RxDB, URNs or Tag URIs can be interesting!
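
To make the `$id` idea concrete, here is a minimal sketch of how a version could be carried in the identifier. The tag URI (`tag:rxdb.example,2025:schemas/hero/v2`) and the helper `versionFromId` are hypothetical and chosen only for illustration; neither is part of RxDB or of JSON Schema itself.

```typescript
// A schema whose identity, including its version, lives in `$id`.
// The tag URI below is made up for this example.
const heroSchemaV2 = {
  $schema: 'https://json-schema.org/draft/2020-12/schema',
  $id: 'tag:rxdb.example,2025:schemas/hero/v2',
  type: 'object',
  properties: { id: { type: 'string' } },
  required: ['id'],
};

// Hypothetical helper: read a trailing "/v<number>" segment from an `$id`.
// Returns undefined when the identifier carries no such segment.
function versionFromId(id: string): number | undefined {
  const match = /\/v(\d+)$/.exec(id);
  return match ? Number(match[1]) : undefined;
}
```

With this convention the numeric version stays recoverable, so a migration layer could still compare versions while the schema remains identifiable by a single URI.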

Owner Author

I will think about these. RxDB is already so complex, and requiring users to learn about URIs instead of a version number might be more confusing than what it solves.


Performance is a critical factor in deciding whether to validate documents at runtime, especially in production environments. The following tables illustrate a basic comparison of initialization time (time-to-first-insert) and bulk insertion speed for different validators on two RxDB storages.

The RxDB team ran performance benchmarks using two storage options:

Can you describe the exact browser version and machine characteristics you used to show the numbers below?

Owner Author

I will add this.


Over time, RxDB has evolved its usage of JSON Schema, learning from real production experiences and feedback from the community. Here are some key takeaways:

- Avoid inlined `required` fields: Some validators let you write `"required": true` inside the property definition, but the official JSON Schema specification requires you to declare an array of required fields at the parent object level. Following the standard from the start avoids confusion and ensures broader validator compatibility.

This is historical. JSON Schema Draft 3 used the boolean form of `required`. The issue here really touches on what I mentioned before: most schema authors omit `$schema` and thus declare no version, so everybody gets confused about which version of JSON Schema they are actually using. Can you revise this paragraph, maybe rephrasing it as some validators being obsolete? (`required: true` was indeed valid ages ago!)
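
As a rough sketch of the difference, the two forms are shown side by side below, each with an explicit `$schema` so the dialect is unambiguous. The helper `liftBooleanRequired` is hypothetical and only illustrates how the old boolean form maps onto the array form.

```typescript
// Draft 3: `required` is a boolean on the property itself.
const draft3Schema = {
  $schema: 'http://json-schema.org/draft-03/schema#',
  type: 'object',
  properties: {
    name: { type: 'string', required: true },
  },
};

// Draft 4 and later: `required` is an array on the parent object.
const modernSchema = {
  $schema: 'https://json-schema.org/draft/2020-12/schema',
  type: 'object',
  properties: {
    name: { type: 'string' },
  },
  required: ['name'],
};

type PropertyDef = Record<string, unknown>;

// Hypothetical helper: move boolean `required` flags into a parent-level array.
function liftBooleanRequired(properties: Record<string, PropertyDef>): {
  properties: Record<string, PropertyDef>;
  required: string[];
} {
  const required: string[] = [];
  const cleaned: Record<string, PropertyDef> = {};
  for (const name of Object.keys(properties)) {
    const { required: flag, ...rest } = properties[name] ?? {};
    if (flag === true) required.push(name);
    cleaned[name] = rest;
  }
  return { properties: cleaned, required };
}
```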


- Keep Custom Fields at the Top Level: Originally, RxDB allowed custom definitions (`index`, `encrypted`, etc.) to appear deeply nested. This caused performance hits because the library had to traverse large schema objects to find them. By placing these fields at the top level, RxDB can parse and apply them much faster, improving startup times.

I now wonder if these custom fields even need to be part of the schema. Maybe you could have a JSON object with these properties and a `schema` property? Then you don't have to deal with the complexity of creating a custom JSON Schema vocabulary, etc.
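
A minimal sketch of that shape, assuming the database settings simply move one level up. `CollectionConfig` and all of its field names are hypothetical; none of this is an existing RxDB API.

```typescript
// Hypothetical wrapper: database settings sit next to a plain JSON Schema
// instead of inside it.
interface CollectionConfig {
  version: number;
  primaryKey: string;
  indexes: Array<string | string[]>;
  encrypted: string[];
  schema: Record<string, unknown>; // standard JSON Schema, no custom keywords
}

const heroConfig: CollectionConfig = {
  version: 0,
  primaryKey: 'id',
  indexes: ['name'],
  encrypted: ['secret'],
  schema: {
    $schema: 'https://json-schema.org/draft/2020-12/schema',
    type: 'object',
    properties: {
      id: { type: 'string', maxLength: 100 },
      name: { type: 'string', maxLength: 100 },
      secret: { type: 'string' },
    },
    required: ['id', 'name'],
  },
};
```

Because the inner `schema` contains only standard keywords, it could be handed to any compliant validator unchanged.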

Owner Author

I will think about this. Maybe RxDB should move them to somewhere else.


- Error messages are not standardized: Each validator produces a different structure for error messages. If your app logic inspects these errors, you risk partial or complete rewrites if you ever switch validators. Decide early on which validator meets your needs and plan on sticking with it long-term.

This is a huge one we are constantly battling against. Can you mention the Standard Output Formats I touched on before? If you can express your desire for validators to comply, that would be wonderful!
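
As a hedged illustration of what compliance could buy, the sketch below maps a validator-specific error list onto the "basic" output format from the JSON Schema 2020-12 specification. The `ValidatorError` shape is an assumption loosely modelled on Ajv-style error objects, and `toBasicOutput` is a hypothetical helper, not an API of any validator.

```typescript
// One entry of the "basic" output format (only the fields used here
// are modelled).
interface BasicOutputUnit {
  keywordLocation: string;
  instanceLocation: string;
  error: string;
}

interface BasicOutput {
  valid: boolean;
  errors?: BasicOutputUnit[];
}

// Assumed validator-specific error shape; other validators will differ.
interface ValidatorError {
  instancePath: string; // e.g. "/age"
  schemaPath: string; // e.g. "#/properties/age/type"
  message?: string;
}

// Hypothetical adapter from validator-specific errors to the basic format.
function toBasicOutput(errors: ValidatorError[]): BasicOutput {
  if (errors.length === 0) return { valid: true };
  return {
    valid: false,
    errors: errors.map((e) => ({
      // drop the leading "#" so the location is a plain JSON Pointer
      keywordLocation: e.schemaPath.replace(/^#/, ''),
      instanceLocation: e.instancePath,
      error: e.message ?? 'validation failed',
    })),
  };
}
```

Application code that only inspects `BasicOutput` would not need to change if the underlying validator were swapped, provided an adapter like this exists for each one.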

Owner Author

Yes, I will add this.

@jviotti
jviotti commented Jan 29, 2025

Awesome stuff! I left a few comments just to polish things a bit. I also left some suggestions for improving RxDB's JSON Schema integration. Of course, those don't have to be solved for the case study, but would love to separately collaborate on them :)

Once we feel the draft is ready, we can send a PR to the official JSON Schema website to get reviews from the rest of the TSC and get it published.

Thanks a lot for putting this together! It already looks very interesting.

@pubkey
Owner Author

pubkey commented Jan 29, 2025

@jviotti Thank you for the feedback. I updated all parts, please check.
