Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DataWatcher #1990

Open
wants to merge 34 commits into
base: main
Choose a base branch
from
Open

Add DataWatcher #1990

wants to merge 34 commits into from

Conversation

stevenwarejones
Copy link
Collaborator

@stevenwarejones stevenwarejones commented Jan 3, 2025

No description provided.

@wfa-reviewable
Copy link

This change is Reviewable

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @kungfucraig and @stevenwarejones)


a discussion (no related file):
Base this PR branch on the dependent PR branch and set the base of the PR in GitHub. See #1991 (review)

@stevenwarejones stevenwarejones changed the base branch from main to marcopremier/controlplane-api January 6, 2025 18:33
@stevenwarejones stevenwarejones changed the base branch from marcopremier/controlplane-api to main January 7, 2025 15:32
@stevenwarejones stevenwarejones changed the base branch from main to marcopremier/controlplane-api January 7, 2025 15:56
@Marco-Premier Marco-Premier force-pushed the marcopremier/controlplane-api branch from a269371 to 64ee53d Compare January 7, 2025 17:13
Copy link
Collaborator Author

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion (waiting on @kungfucraig, @Marco-Premier, and @SanjayVas)


a discussion (no related file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Base this PR branch on the dependent PR branch and set the base of the PR in GitHub. See #1991 (review)

Done.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 13 files at r1, 2 of 2 files at r2, 6 of 6 files at r3, all commit messages.
Reviewable status: 12 of 13 files reviewed, 13 unresolved discussions (waiting on @kungfucraig, @Marco-Premier, and @stevenwarejones)


src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcher.kt line 17 at r3 (raw file):

 */

package org.wfanet.measurement.securecomputation.datawatcher

Cloud-specific implementations go under a deploy tree. Since this is gcloud-specific, it would go under deploy/gcloud.


src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcher.kt line 34 at r3 (raw file):

class DataWatcher(
  private val workItemsService: GooglePubSubWorkItemsService,

This should take in a gRPC stub for the service, not an instance of the service implementation.


src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcher.kt line 39 at r3 (raw file):

  override fun accept(event: CloudEvent) {
    val cloudEventData = String(event.getData()!!.toBytes(), StandardCharsets.UTF_8)

The non-null assertion is for when we know from compile-time constraints that the value cannot be null, but we can't prove it to the compiler. For cases like this, use requireNotNull instead as you're expressing a precondition on the argument.

Suggestion:

val cloudEventData = requireNotNull(event.getData()) { "event must have data" }.toBytes().decodeToString()

src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcher.kt line 42 at r3 (raw file):

    val builder = StorageObjectData.newBuilder()
    JsonFormat.parser().merge(cloudEventData, builder)
    val data = builder.build()

Avoid exposing the mutable builder variable to the outer context. This is the same reason we use functions like buildList/buildMap.

Suggestion:

    val data = StorageObjectData.newBuilder().apply {
       JsonFormat.parser().merge(cloudEventData, this)
    }.build()

src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcher.kt line 43 at r3 (raw file):

    JsonFormat.parser().merge(cloudEventData, builder)
    val data = builder.build()
    val bucket = data.getBucket()

Specify types where not obvious (i.e. the type is not named on the right side of the assignment)


src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcher.kt line 45 at r3 (raw file):

    val bucket = data.getBucket()
    val blobKey = data.getName()
    dataWatcherConfigs.forEach { config ->

nit: prefer regular for-each loop unless chaining collection functions


src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcher.kt line 48 at r3 (raw file):

      val regex = config.sourcePathRegex.toRegex()
      if (regex.containsMatchIn(blobKey)) {
        @Suppress("WHEN_ENUM_CAN_BE_NULL_IN_JAVA")

Document the reason for any warning suppressions.


src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/BUILD.bazel line 6 at r3 (raw file):

kt_jvm_library(
    name = "datawatcher",

Suggestion:

data_watcher

src/test/kotlin/org/wfanet/measurement/securecomputation/datawatcher/BUILD.bazel line 2 at r3 (raw file):

load("@wfa_rules_kotlin_jvm//kotlin:defs.bzl", "kt_jvm_test")
load("@wfa_rules_kotlin_jvm//kotlin:defs.bzl", "kt_jvm_library")

nit: unused


src/test/kotlin/org/wfanet/measurement/securecomputation/datawatcher/BUILD.bazel line 6 at r3 (raw file):

package(
    default_testonly = True,
    default_visibility = ["//visibility:public"],

This should be private, as nothing else should depend on tests. Private is also the default, so there's no need to specify it. Similarly, there's usually no need to have a package declaration at all in the src/test tree since nothing else can depend on any declared libraries, so there's inherently no need to specify testonly (_test targets are inherently testonly).


src/main/proto/wfa/measurement/securecomputation/datawatcher/v1alpha/data_watcher_config.proto line 27 at r3 (raw file):

option java_outer_classname = "DataWatcherConfigProto";

message DataWatcherConfig {

Is the config part of the API? If not, it can go in a separate config package rather than the versioned API package.


src/test/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcherTest.kt line 43 at r3 (raw file):

  @Test
  fun matchedPath() {

Test methods have the general pattern of "<SUT method> does X [when Y]". You can omit the method when there's only one.

Suggestion:

`creates WorkItem when path matches`

src/main/proto/wfa/measurement/securecomputation/datawatcher/v1alpha/BUILD.bazel line 4 at r3 (raw file):

load(
    "@wfa_rules_kotlin_jvm//kotlin:defs.bzl",
    "kt_jvm_grpc_proto_library",

nit: unused

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 13 files at r1, 2 of 2 files at r2, 6 of 6 files at r3, all commit messages.
Reviewable status: all files reviewed, 18 unresolved discussions (waiting on @Marco-Premier and @stevenwarejones)


src/main/proto/wfa/measurement/securecomputation/datawatcher/v1alpha/BUILD.bazel line 17 at r3 (raw file):

    strip_import_prefix = IMPORT_PREFIX,
    deps = [
        "@com_google_googleapis//google/api:field_behavior_proto",

Even though you import them, you are not using anything from these first two deps in your proto.


src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcher.kt line 33 at r3 (raw file):

import org.wfanet.measurement.securecomputation.datawatcher.v1alpha.DataWatcherConfigKt.triggeredApp

class DataWatcher(

Provide a description of what the purpose of this class is and how it accomplishes its purpose.


src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/DataWatcher.kt line 47 at r3 (raw file):

    dataWatcherConfigs.forEach { config ->
      val regex = config.sourcePathRegex.toRegex()
      if (regex.containsMatchIn(blobKey)) {

This seems overly broad. Why not use "matches" and make sure the regex can match the entire blobKey?

https://github.com/JetBrains/kotlin/blob/rrr/2.1.0/core-docs/libraries/stdlib/common/src/kotlin/TextH.kt#L38C4-L39C59


src/main/proto/wfa/measurement/securecomputation/datawatcher/v1alpha/data_watcher_config.proto line 27 at r3 (raw file):

option java_outer_classname = "DataWatcherConfigProto";

message DataWatcherConfig {

You're missing documentation for many messages and fields.


src/main/proto/wfa/measurement/securecomputation/datawatcher/v1alpha/data_watcher_config.proto line 29 at r3 (raw file):

message DataWatcherConfig {

  message TriggeredApp {

Receiver and TriggeredApp are the same thing?

Also as far as I can tell, all downstream workers will have to take a dependency on this in order to parse it out of the workItem. The thing on the other side of the control plane should not know it was invoked by DataWatcher.

I see a couple options:

  1. Put this proto in the control plane module. It's reasonable for things downstream of the control plane to take a dependency on it. It'll need a better name than "TriggeredApp" though.
  2. Set this up so the the "app_config" must have a field named "path" that is of type string, which you can put the path in. Then that object can just be passed directly.

Copy link
Collaborator Author

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 18 unresolved discussions (waiting on @kungfucraig and @Marco-Premier)


src/main/proto/wfa/measurement/securecomputation/datawatcher/v1alpha/data_watcher_config.proto line 29 at r3 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Receiver and TriggeredApp are the same thing?

Also as far as I can tell, all downstream workers will have to take a dependency on this in order to parse it out of the workItem. The thing on the other side of the control plane should not know it was invoked by DataWatcher.

I see a couple options:

  1. Put this proto in the control plane module. It's reasonable for things downstream of the control plane to take a dependency on it. It'll need a better name than "TriggeredApp" though.
  2. Set this up so the the "app_config" must have a field named "path" that is of type string, which you can put the path in. Then that object can just be passed directly.

The Control Plane only passes through Any messages. If you look further down the stack, you'll see that change. That does mean there are nested anyies...
any {
triggeredApp {
config = any {
cmmWork{}
}
}
}

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @Marco-Premier and @stevenwarejones)


src/main/proto/wfa/measurement/securecomputation/datawatcher/v1alpha/data_watcher_config.proto line 29 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

The Control Plane only passes through Any messages. If you look further down the stack, you'll see that change. That does mean there are nested anyies...
any {
triggeredApp {
config = any {
cmmWork{}
}
}
}

I understand that the control plane only passes through Any messages. The nested Any's are not fantastic, but it's probably fine.

What I'm objecting to is the workers taking a dependency on the data watcher. The "TriggerApp" (it needs a better name) should live in the control plane or maybe even better, the TEE SDK package.


src/main/kotlin/org/wfanet/measurement/securecomputation/datawatcher/BUILD.bazel line 3 at r3 (raw file):

load("@wfa_rules_kotlin_jvm//kotlin:defs.bzl", "kt_jvm_library")

package(default_visibility = ["//visibility:public"])

This should be default private. We don't want other modules taking explicit dependencies on this one.

Base automatically changed from marcopremier/controlplane-api to main January 30, 2025 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants