Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Simple TEE VID Labeling App #1991

Open
wants to merge 9 commits into
base: stevenwarejones_data_watcher
Choose a base branch
from

Conversation

stevenwarejones
Copy link
Collaborator

@stevenwarejones stevenwarejones commented Jan 3, 2025

No description provided.

@wfa-reviewable
Copy link

This change is Reviewable

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 19 files reviewed, 1 unresolved discussion (waiting on @kungfucraig and @stevenwarejones)


a discussion (no related file):
For dependent PRs in the same repo, you need to base the PR branch on the parent PR branch and set the base branch of the PR in GitHub. This way the PR doesn't include diffs from the parent.

Additionally, the "depends on" should not be in the main PR description as that is copied to the commit message. The statement will no longer be relevant after the base PR is merged. Therefore, it should be in a separate PR comment.

@SanjayVas SanjayVas mentioned this pull request Jan 3, 2025
@stevenwarejones stevenwarejones changed the base branch from main to stevenwarejones_data_watcher January 7, 2025 18:48
Copy link
Collaborator Author

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 19 files reviewed, 1 unresolved discussion (waiting on @kungfucraig, @Marco-Premier, and @SanjayVas)


a discussion (no related file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

For dependent PRs in the same repo, you need to base the PR branch on the parent PR branch and set the base branch of the PR in GitHub. This way the PR doesn't include diffs from the parent.

Additionally, the "depends on" should not be in the main PR description as that is copied to the commit message. The statement will no longer be relevant after the base PR is merged. Therefore, it should be in a separate PR comment.

Done.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 19 files at r1, 9 of 13 files at r3, 4 of 5 files at r4, all commit messages.
Reviewable status: 16 of 19 files reviewed, 16 unresolved discussions (waiting on @kungfucraig, @Marco-Premier, and @stevenwarejones)


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/BUILD.bazel line 4 at r4 (raw file):

package(
    default_testonly = True,

I assume this isn't intended to only be used in tests. Also public visibility is for generally usable Bazel library code. Just leave out the package declaration for now and add it in when you know what Bazel packages would use this.


src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/BUILD.bazel line 3 at r4 (raw file):

load("@wfa_rules_kotlin_jvm//kotlin:defs.bzl", "kt_jvm_test")

package(

Same note re: package declaration in the test tree.


src/main/proto/wfa/measurement/securecomputation/teeapps/v1alpha/tee_app_config.proto line 17 at r4 (raw file):

syntax = "proto3";

package wfa.measurement.securecomputation.teeapps.v1alpha;

Is this part of the versioned gRPC API? If not, you can put it in a separate config package instead of a versioned package.


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 58 at r4 (raw file):

    storageClient: MesosRecordIoStorageClient
  ) {
    val inputBlob =

Specify types where not obvious, i.e. where the type isn't named on the right side of the assignment via a constructor or well-known factory.


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 79 at r4 (raw file):

  override suspend fun runWork(message: TriggeredApp) {
    val teeAppConfig = message.config.unpack(TeeAppConfig::class.java)
    assert(teeAppConfig.workTypeCase == TeeAppConfig.WorkTypeCase.VID_LABELING_CONFIG)

We don't use assertions outside of tests. Use require or check instead to enforce a precondition/invariant.


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 82 at r4 (raw file):

    val vidLabelingConfig = teeAppConfig.vidLabelingConfig
    val compiledNode: CompiledNode =
      @Suppress("WHEN_ENUM_CAN_BE_NULL_IN_JAVA")

Document suppression reason.


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 88 at r4 (raw file):

          val modelData =
            vidModelBlob.read().reduce { acc, byteString -> acc.concat(byteString) }.toStringUtf8()
          CompiledNode.getDefaultInstance()

Use parseTextProto from common-jvm


src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt line 61 at r4 (raw file):

class VidLabelerAppTest() {

  @Rule @JvmField val pubSubEmulatorProvider = GooglePubSubEmulatorProvider()

This is meant to be used as a @ClassRule so you share the emulator instance across test methods.


src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt line 71 at r4 (raw file):

  @Before
  fun setup() {

Use descriptive names for @Before and @After methods

Code quote:

setup

src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt line 93 at r4 (raw file):

  @Test
  fun vidLabelWithTextProtoModel() {

Same note re: test method naming.


src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt line 94 at r4 (raw file):

  @Test
  fun vidLabelWithTextProtoModel() {
    runBlocking {

If the body of the whole function is contained inside runBlocking, the convention is to use a function expression. IntelliJ has a context action to perform this refactor for you.

fun `SUT does foo when bar`() = runBlocking {
  // ...
}

src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt line 120 at r4 (raw file):

          val indexString = "%02d".format(index)
          val inputFileName = "labeler_input_$indexString.textproto"
          parseTextProto(

Use the overload of parseTextProto that takes a File, passing the default instance.


src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt line 121 at r4 (raw file):

          val inputFileName = "labeler_input_$indexString.textproto"
          parseTextProto(
            Paths.get("$TEXTPROTO_PATH/$inputFileName").toFile().bufferedReader(),

If TEXTPROTO_PATH is a Path instance, you can use resolve()

Suggestion:

TEXTPROTO_PATH.resolve(inputFileName)

src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt line 146 at r4 (raw file):

      publisher.publishMessage(topicId, work)

      withTimeout(10000) {

This is the wrong way to handle test timeouts. The final timeout is configured by the Bazel runner via the size and/or timeout attributes on the Bazel target. More specific timeouts can be specified using JUnit timeout facilities (pass timeout to the @Test annotation or use the Timeout rule)


src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt line 147 at r4 (raw file):

      withTimeout(10000) {
        while (true) {

Is there some mechanism you can introduce to avoid having to poll here? e.g. having an in-memory QueueSubscriber impl so you can track when messages are acked?

We avoid polling in tests unless it's absolutely necessary, such as when dealing with isolated processes.


src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt line 174 at r4 (raw file):

                  labelerOutput {},
                )
              ProtoTruth.assertThat(output).isEqualTo(expectedOutput)

nit: import this symbol

Code quote:

ProtoTruth.assertThat

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 5 files at r4.
Reviewable status: 16 of 19 files reviewed, 24 unresolved discussions (waiting on @Marco-Premier and @stevenwarejones)


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 34 at r4 (raw file):

/*
 * TEE VID Labeling App.

Can you describe what it does?


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 36 at r4 (raw file):

 * TEE VID Labeling App.
 */
class VidLabelerApp(

We've already spelled out Application in other places. We should be consistent wrt App vs. Application.


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 65 at r4 (raw file):

    val outputFlow =
      inputRecords.map { byteString ->
        val labelerInput =

Do you really expect the input blob to have LabelerInput protos? If these are the files that EDPs supply I'd rather there be a more official interface format and keep LabelerInput as an implementation detail.


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 71 at r4 (raw file):

            .build() as LabelerInput
        val labelerOutput: LabelerOutput = labeler.label(input = labelerInput)
        labelerOutput.toString().toByteStringUtf8()

If you're going to build a labeled impression table down stream you'll want to write both the labelerInput and the labelerOutput to the next file. I'd actually recommend creating a new proto specific to the app that holds both of these.


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 74 at r4 (raw file):

      }

    storageClient.writeBlob(outputBlobKey, outputFlow)

You should use the opportunity of having read the records to write them into multiple files so that they can be processed more efficiently downstream. For example, write a file per day per campaign. A second step can then read those files sort by VID and write down the files you have for your partitioning scheme.


src/main/proto/wfa/measurement/securecomputation/teeapps/v1alpha/tee_app_config.proto line 27 at r4 (raw file):

message TeeAppConfig {
  message VidLabelingConfig {

We should assume that it's necessary to create labels for more than one model line.

Also, why not just provide a model_line name and allow the application to get the rest of what it needs from the Model Repo?


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/BUILD.bazel line 1 at r4 (raw file):

load("@wfa_rules_kotlin_jvm//kotlin:defs.bzl", "kt_jvm_library")

Can you write just "teeapp" instead of "teeapps" in the directory name?


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/BUILD.bazel line 9 at r4 (raw file):

kt_jvm_library(
    name = "vid_labeler_app",

I'd expect something with an "app" suffix to include a main if not be an actual image.

Copy link
Collaborator Author

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 19 files reviewed, 24 unresolved discussions (waiting on @kungfucraig and @Marco-Premier)


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 65 at r4 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Do you really expect the input blob to have LabelerInput protos? If these are the files that EDPs supply I'd rather there be a more official interface format and keep LabelerInput as an implementation detail.

This is just an initial version. Updates in the future will be based on the forthcoming storage schema design doc.


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 71 at r4 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

If you're going to build a labeled impression table down stream you'll want to write both the labelerInput and the labelerOutput to the next file. I'd actually recommend creating a new proto specific to the app that holds both of these.

yeah - I'd prefer that be a todo until we land on an agreed storage schema design doc.


src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt line 74 at r4 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

You should use the opportunity of having read the records to write them into multiple files so that they can be processed more efficiently downstream. For example, write a file per day per campaign. A second step can then read those files sort by VID and write down the files you have for your partitioning scheme.

I was thinking the underlying storage client would be a sharded storage client but that will come out in the design doc.

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to wait on the design document before reviewing further.

Reviewable status: 16 of 19 files reviewed, 24 unresolved discussions (waiting on @Marco-Premier and @stevenwarejones)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants