Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing Finalize Backfill Hook for All Backfila Clients #358

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

mminkoffs
Copy link

No description provided.

@mminkoffs mminkoffs marked this pull request as ready for review December 1, 2023 15:59
// If multiple partitions finish at the same time they will retry due to the hibernate
// version mismatch on the DbBackfillRun.
val partitions = dbRunPartition.backfill_run.partitions(
session,
backfillRunner.factory.queryFactory,
)
if (partitions.all { it.run_state == BackfillState.COMPLETE }) {
partitions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should do these server side changes in a separate PR, and get them deployed before making the client side changes. Otherwise, a client developer might implement the finalize method when it doesn't do anything yet.

Also, there are problems with this code I mentioned in the previous PR. I will see if I can take a look at writing this correctly in the next few days

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - let me know a commit when you're done and we can either integrate it with this PR or externally!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want a COMPLETE_&_FINALIZED state to make that clearer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shellderp just curious if any progress was made here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know how to code it up, read the version and retry both transactions if the 2nd read a different version. I can do this in a followup if you get the code merged as is (just add tests in this PR)

Copy link
Collaborator

@mpawliszyn mpawliszyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need much more testing as well.

Also the embedded backfila has to be updated as well.

Overall though it seems to be heading in a good direction.

when (e) {
is HttpException ->
when (e.code()) {
404 -> logger.info { "No finalization endpoint found for ${backfillRunner.backfillName}, skipping" }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should show an event in the backfill.

I wonder if we want something to indicate versions for a backfill. Like what Backfills have what capabilities.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want an event even in the success case?

@@ -183,14 +184,44 @@ class BatchAwaiter(
),
)

// If all states are COMPLETE the whole backfill will be completed.
// If all states are COMPLETE, for every other partition, the whole backfill will be completed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a state that is Finalizing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the extra complexity is needed

@@ -13,3 +13,9 @@ data class PrepareBackfillConfig<Param : Any>(
val parameters: Param,
val dryRun: Boolean,
)

data class FinalizeBackfillConfig<Param : Any>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need better names and/or structure for these.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartitionlessBackfillConfig? Or WholeBackfillConfig?

This is config that is meant to represent the whole run. WholeRunBackfillConfig? Also add the accessor above. And probably an accessor to Prepare here too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with it as is

// If multiple partitions finish at the same time they will retry due to the hibernate
// version mismatch on the DbBackfillRun.
val partitions = dbRunPartition.backfill_run.partitions(
session,
backfillRunner.factory.queryFactory,
)
if (partitions.all { it.run_state == BackfillState.COMPLETE }) {
partitions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want a COMPLETE_&_FINALIZED state to make that clearer?

@@ -10,4 +11,9 @@ abstract class FixedSetBackfill<Param : Any> : Backfill {
}

abstract fun runOne(row: FixedSetRow, backfillConfig: BackfillConfig<Param>)

/**
* Override this to do any work after the backfill completes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI This is a test only implementation.

val config = parametersOperator.constructBackfillConfig(request)
backfill.finalize(config)

return FinalizeBackfillResponse.Builder()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the // Return empty 200 to indicate it was successful similar comment above. Does it make sense to make that more apparant?

@@ -48,4 +49,9 @@ abstract class S3DatasourceBackfill<R : Any, P : Any> : Backfill {
* Produces records from the S3 file.
*/
abstract val recordStrategy: RecordStrategy<R>

/**
* Override this to do any work after the backfill completes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to update the comment a bit:
Override this to do any final work after the backfill completes. Only one successful call is expected in your distributed system and this call must be idempotent.

Does that make sense?

@@ -21,9 +21,9 @@ internal class BackfillRegisteredParameters @Inject constructor(
private val queryFactory: Query.Factory,
) : HibernateBackfill<DbRegisteredBackfill, Id<DbRegisteredBackfill>, NoParameters>() {

override fun runOne(id: Id<DbRegisteredBackfill>, config: BackfillConfig<NoParameters>) {
override fun runOne(pkey: Id<DbRegisteredBackfill>, config: BackfillConfig<NoParameters>) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo the changes to this class

@@ -379,6 +393,18 @@ class BackfillRunner private constructor(
}
}

suspend fun finalize(state: FinalizeState) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shellderp is this right wrt coroutines?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea

@@ -168,10 +170,9 @@ class BatchAwaiter(
return backfillRunner.runBatchAsync(this, batch)
}

private fun completePartition() {
private suspend fun completePartition() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this, is it right wrt coroutines? @shellderp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants