Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Home Database Cache : Stripped Back Solution #1235

Open
wants to merge 68 commits into
base: 5.0
Choose a base branch
from
Open

Conversation

MaxAake
Copy link
Contributor

@MaxAake MaxAake commented Oct 16, 2024

Spikes implementation of a home database cache in the driver object, used to reduce round trips associated with identifying the home database of a user when a database is not set manually.

This behavior is turned on by a flag introduced in bolt 5.8, so the PR also includes support for Bolt 5.8

This work was migrated from PR #1230

@MaxAake MaxAake self-assigned this Oct 16, 2024
@MaxAake
Copy link
Contributor Author

MaxAake commented Oct 16, 2024

@MaxAake MaxAake changed the title Home Database Cache : Stripped Back Spike Home Database Cache : Stripped Back Solution Spike Oct 17, 2024
@MaxAake MaxAake marked this pull request as ready for review December 11, 2024 13:59
@robsdedude robsdedude self-requested a review December 19, 2024 14:08
Copy link
Member

@robsdedude robsdedude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving the comments of today so I don't loose them in draft mode. I'll continue the review tomorrow.

@@ -331,6 +335,16 @@ export default class ChannelConnection extends Connection {
if (telemetryEnabledHint === true) {
this._telemetryDisabledConnection = false
}

const SSREnabledHint = metadata.hints['ssr.enabled']
if (SSREnabledHint) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (SSREnabledHint) {
if (SSREnabledHint === true) {

Realistically, we should never see any other truthy value from the server, but I'm in favor of practicing defensive input validation :)

this.serversideRouting = false
}
if (this._ssrCallback !== undefined) {
this._ssrCallback(this.serversideRouting, true)
Copy link
Member

@robsdedude robsdedude Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: hard to read code. I can't see what this boolean flag does without looking at some/all implementations of the callback. I suggest to either use an enum-ish construct, different callbacks for opening and closing, or a single callback that gets passed the connection as only argument and uses con.isOpen() and con.serversideRouting to determine what to do.

@@ -43,6 +43,7 @@ export function createChannelConnection (
log,
clientCertificate,
serversideRouting = null,
ssrCallback = (_) => {},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (because internal function): new parameter not in doc comment

} else {
this.serversideRouting = false
}
if (this._ssrCallback !== undefined) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check necessary? Can one pass in undefined with the argument having a default value? (same check also in close())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not. The check came before the default value and weren't removed when it was added.

return BOLT_PROTOCOL_V5_8
}

get transformer () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS noob question: Why is this override that's the exact same as its parent class necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really isn't. Every 5.x release has been overriding with it's own transformer, despite no changes having been made, so effectively they're all using the same 5.0 transformer. Then at the jump to 5.6 the import wasn't bumped.

Looking at it I'm leaning towards not creating new transformer versions with every release and cleaning up empty ones, but in this PR I will just fix the last 3 bolt versions to create transformers of their own version, and leave the larger task to another PR.

if (!this._databaseNameResolved) {
if (this._homeDatabaseCallback != null) {
// eslint-disable-next-line
this._homeDatabaseCallback((this._impersonatedUser ? 'basic:' + this._impersonatedUser : undefined) ?? this._auth?.cacheKey ?? 'DEFAULT', database)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better maintainability, this logic should live somewhere central (close to all the other logic that's concerned with home db caching and cache keying).

(There's a copy of this line further down - won't comment again)

@robsdedude robsdedude self-requested a review December 19, 2024 16:39
Copy link
Member

@robsdedude robsdedude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next batch of comments. Sorry the PR is quite extensive 😅. So I'll continue my review after the holidays 🎄. Have a lovely time 🏂!

@@ -108,6 +117,7 @@ interface DriverConfig {
logging?: LoggingConfig
notificationFilter?: NotificationFilter
connectionLivenessCheckTimeout?: number
user?: string | undefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the public driver config type? What is this new config option used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good catch! This is a remnant of older spike work that should absolutely not still be there

Comment on lines 59 to 60
* @property {function (databaseName:string?)} param.onDatabaseNameResolved - Callback called when the database name get resolved
* @property {function (databaseName:string?)} param.removeFailureFromCache - Callback for deleting lost db from cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented arg type string? does not match the declared type below string. Which on is the intended way?

auth?: AuthToken
homeDb?: any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a narrower type be used? Maybe string? or whatever is appropriate?

@@ -55,6 +56,12 @@ const DEFAULT_MAX_CONNECTION_LIFETIME: number = 60 * 60 * 1000 // 1 hour
*/
const DEFAULT_FETCH_SIZE: number = 1000

/**
* The maximum number of entries allowed in the home database cache before pruning
* @type {number}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is a copy pasta 📄🍝. Why declare the type of the const and put the type in the comment? Is there any benefit or is this just introducing the potential for out-of-sync information duplication?

@@ -101,7 +104,9 @@ class Session {
bookmarkManager,
notificationFilter,
auth,
log
log,
homeDatabaseCallback,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comment missing new (and some old 🫠) parameters.

Comment on lines 150 to 166
const databaseSpecificErrorHandler = new ConnectionErrorHandler(
SESSION_EXPIRED,
(error, address) => this._handleUnavailability(error, address, context.database),
(error, address) => this._handleWriteFailure(error, address, context.database),
(error, address) => {
if (removeFailureFromCache !== undefined) {
removeFailureFromCache(homeDb ?? context.database)
}
return this._handleUnavailability(error, address, context.database)
},
(error, address) => {
if (removeFailureFromCache !== undefined) {
removeFailureFromCache(homeDb ?? context.database)
}
return this._handleWriteFailure(error, address, homeDb ?? context.database)
},
(error, address, conn) =>
this._handleSecurityError(error, address, conn, context.database)
)
Copy link
Member

@robsdedude robsdedude Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see any mention of this in the ADR. Do we want to remove a cache entry on connection failure? Pros/cons? Either way, uniform behavior is desirable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: should we decide to go with this approach, double check the argument passed to removeFailureFromCache and convince yourself that this is the right cache entry to invalidate.

Copy link
Member

@robsdedude robsdedude Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this approach for 2 reasons:

  • it affects the public API for auth tokens (in multiple ways)
    • if users where using a cacheKey property, this will clash. I don't see a good use-case for this, so I think it's unlikely, but still possible
    • users that build their own custom auth objects (not using one of these factory methods) would have to change their code and understand what a correct cache value looks like for their custom auth tokens to work properly with the cache
  • the chosen cache keys are not collision free. Please find an injective mapping TOKENS → CACHE_KEYS. Where any custom token ∈ TOKENS.

packages/core/src/auth.ts Outdated Show resolved Hide resolved
@robsdedude robsdedude self-requested a review December 20, 2024 16:46
@@ -161,9 +166,9 @@ class ConnectionHolder implements ConnectionHolderInterface {
return this._referenceCount
}

initializeConnection (): boolean {
initializeConnection (homeDatabaseTable?: any): boolean {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
initializeConnection (homeDatabaseTable?: any): boolean {
initializeConnection (homeDatabaseTable?: string): boolean {

At least that's what _createConnectionPromise expects and I don't see any narrowing type checks or such here.

return await connectionProvider.acquireConnection({
accessMode: this._mode,
database: this._database,
database: this._database ?? '',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I've not missed something: is this change necessary?

I see that ConnectionHolder.setDatabase would accept and set _database to null (which , but I don't think was was happening before or is happening now. I think _database should maybe be typed string, not ?: string.

Comment on lines 150 to 166
const databaseSpecificErrorHandler = new ConnectionErrorHandler(
SESSION_EXPIRED,
(error, address) => this._handleUnavailability(error, address, context.database),
(error, address) => this._handleWriteFailure(error, address, context.database),
(error, address) => {
if (removeFailureFromCache !== undefined) {
removeFailureFromCache(homeDb ?? context.database)
}
return this._handleUnavailability(error, address, context.database)
},
(error, address) => {
if (removeFailureFromCache !== undefined) {
removeFailureFromCache(homeDb ?? context.database)
}
return this._handleWriteFailure(error, address, homeDb ?? context.database)
},
(error, address, conn) =>
this._handleSecurityError(error, address, conn, context.database)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: should we decide to go with this approach, double check the argument passed to removeFailureFromCache and convince yourself that this is the right cache entry to invalidate.

}

/**
* Updates or add an entry to the cache, and prunes the cache if above the maximum allowed size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Updates or add an entry to the cache, and prunes the cache if above the maximum allowed size
* Updates or adds an entry to the cache, and prunes the cache if above the maximum allowed size

private _pruneCache (): void {
if (this.map.size > this.maxSize) {
const sortedArray = Array.from(this.map.entries()).sort((a, b) => a[1].lastUsed - b[1].lastUsed)
for (let i = 0; i < 70; i++) { // c * max_size * logn(max_size), c is set to 0.01
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's quite likely that this will be forgotten should the maxSize ever be adjusted.

I suggest to either compute this dynamically in the constructor or to make sure the definition of the truncation size lives right next to the definition of the maxSize along with this comment.

packages/core/src/session.ts Show resolved Hide resolved
@@ -517,6 +533,24 @@ class Session {
}
}

_beginDbCallback (database: string): void {
// eslint-disable-next-line
if (this._connectionHolderWithMode(this._mode).connectionProvider()?.SSREnabled()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we should probably discuss with the team to make sure the drivers are aligned: if SSR is not enabled (or we don't know because of a too old protocol version), do we still want to maintain the home db cache in the hopes that a DBMS upgrade occurs later and we're having a cache already warm and ready?

@@ -517,6 +533,24 @@ class Session {
}
}

_beginDbCallback (database: string): void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing something here: why is there a difference between _onDatabaseNameResolved and _beginDbCallback? Can't we always safely pin the resolved home db to the session regardless of SSR or not (cache updates discussion above pending)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The differing behavior was required when the COMMIT SUCCESS message was used, and the improvement was not made when we switched to the BEGIN SUCCESS. Good catch

@@ -632,6 +669,7 @@ class Session {
*/
function _createTransactionExecutor (config?: {
maxTransactionRetryTime: number | null
commitCallback: any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused parameter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential for optimization (dismiss if too much hassle to implement): session.run uses afterComplete to process the resolved home db. This means the cache only gets updated after all records have been streamed even though the information is already available earlier. This makes a difference if either the result stream dies/errors out or if multiple transactions are being run concurrently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code again, I'm unsure: does afterComplete only get to process the metadata of the last SUCCESS (i.e., the one without has_more)? In that case, this seems to be a left-over from an earlier design. The server echoes back the resolved home db on RUN SUCCESS. I assume I'm misreading the code... TestKit should've caught such thing 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants