-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Simple TEE VID Labeling App #1991
base: stevenwarejones_data_watcher
Are you sure you want to change the base?
Add Simple TEE VID Labeling App #1991
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 19 files reviewed, 1 unresolved discussion (waiting on @kungfucraig and @stevenwarejones)
a discussion (no related file):
For dependent PRs in the same repo, you need to base the PR branch on the parent PR branch and set the base branch of the PR in GitHub. This way the PR doesn't include diffs from the parent.
Additionally, the "depends on" should not be in the main PR description as that is copied to the commit message. The statement will no longer be relevant after the base PR is merged. Therefore, it should be in a separate PR comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 19 files reviewed, 1 unresolved discussion (waiting on @kungfucraig, @Marco-Premier, and @SanjayVas)
a discussion (no related file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
For dependent PRs in the same repo, you need to base the PR branch on the parent PR branch and set the base branch of the PR in GitHub. This way the PR doesn't include diffs from the parent.
Additionally, the "depends on" should not be in the main PR description as that is copied to the commit message. The statement will no longer be relevant after the base PR is merged. Therefore, it should be in a separate PR comment.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 19 files at r1, 9 of 13 files at r3, 4 of 5 files at r4, all commit messages.
Reviewable status: 16 of 19 files reviewed, 16 unresolved discussions (waiting on @kungfucraig, @Marco-Premier, and @stevenwarejones)
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/BUILD.bazel
line 4 at r4 (raw file):
package( default_testonly = True,
I assume this isn't intended to only be used in tests. Also public visibility is for generally usable Bazel library code. Just leave out the package declaration for now and add it in when you know what Bazel packages would use this.
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/BUILD.bazel
line 3 at r4 (raw file):
load("@wfa_rules_kotlin_jvm//kotlin:defs.bzl", "kt_jvm_test") package(
Same note re: package declaration in the test tree.
src/main/proto/wfa/measurement/securecomputation/teeapps/v1alpha/tee_app_config.proto
line 17 at r4 (raw file):
syntax = "proto3"; package wfa.measurement.securecomputation.teeapps.v1alpha;
Is this part of the versioned gRPC API? If not, you can put it in a separate config
package instead of a versioned package.
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 58 at r4 (raw file):
storageClient: MesosRecordIoStorageClient ) { val inputBlob =
Specify types where not obvious, i.e. where the type isn't named on the right side of the assignment via a constructor or well-known factory.
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 79 at r4 (raw file):
override suspend fun runWork(message: TriggeredApp) { val teeAppConfig = message.config.unpack(TeeAppConfig::class.java) assert(teeAppConfig.workTypeCase == TeeAppConfig.WorkTypeCase.VID_LABELING_CONFIG)
We don't use assertions outside of tests. Use require
or check
instead to enforce a precondition/invariant.
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 82 at r4 (raw file):
val vidLabelingConfig = teeAppConfig.vidLabelingConfig val compiledNode: CompiledNode = @Suppress("WHEN_ENUM_CAN_BE_NULL_IN_JAVA")
Document suppression reason.
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 88 at r4 (raw file):
val modelData = vidModelBlob.read().reduce { acc, byteString -> acc.concat(byteString) }.toStringUtf8() CompiledNode.getDefaultInstance()
Use parseTextProto
from common-jvm
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt
line 61 at r4 (raw file):
class VidLabelerAppTest() { @Rule @JvmField val pubSubEmulatorProvider = GooglePubSubEmulatorProvider()
This is meant to be used as a @ClassRule
so you share the emulator instance across test methods.
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt
line 71 at r4 (raw file):
@Before fun setup() {
Use descriptive names for @Before
and @After
methods
Code quote:
setup
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt
line 93 at r4 (raw file):
@Test fun vidLabelWithTextProtoModel() {
Same note re: test method naming.
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt
line 94 at r4 (raw file):
@Test fun vidLabelWithTextProtoModel() { runBlocking {
If the body of the whole function is contained inside runBlocking
, the convention is to use a function expression. IntelliJ has a context action to perform this refactor for you.
fun `SUT does foo when bar`() = runBlocking {
// ...
}
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt
line 120 at r4 (raw file):
val indexString = "%02d".format(index) val inputFileName = "labeler_input_$indexString.textproto" parseTextProto(
Use the overload of parseTextProto that takes a File, passing the default instance.
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt
line 121 at r4 (raw file):
val inputFileName = "labeler_input_$indexString.textproto" parseTextProto( Paths.get("$TEXTPROTO_PATH/$inputFileName").toFile().bufferedReader(),
If TEXTPROTO_PATH
is a Path instance, you can use resolve()
Suggestion:
TEXTPROTO_PATH.resolve(inputFileName)
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt
line 146 at r4 (raw file):
publisher.publishMessage(topicId, work) withTimeout(10000) {
This is the wrong way to handle test timeouts. The final timeout is configured by the Bazel runner via the size
and/or timeout
attributes on the Bazel target. More specific timeouts can be specified using JUnit timeout facilities (pass timeout
to the @Test
annotation or use the Timeout
rule)
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt
line 147 at r4 (raw file):
withTimeout(10000) { while (true) {
Is there some mechanism you can introduce to avoid having to poll here? e.g. having an in-memory QueueSubscriber impl so you can track when messages are acked?
We avoid polling in tests unless it's absolutely necessary, such as when dealing with isolated processes.
src/test/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerAppTest.kt
line 174 at r4 (raw file):
labelerOutput {}, ) ProtoTruth.assertThat(output).isEqualTo(expectedOutput)
nit: import this symbol
Code quote:
ProtoTruth.assertThat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r4.
Reviewable status: 16 of 19 files reviewed, 24 unresolved discussions (waiting on @Marco-Premier and @stevenwarejones)
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 34 at r4 (raw file):
/* * TEE VID Labeling App.
Can you describe what it does?
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 36 at r4 (raw file):
* TEE VID Labeling App. */ class VidLabelerApp(
We've already spelled out Application in other places. We should be consistent wrt App vs. Application.
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 65 at r4 (raw file):
val outputFlow = inputRecords.map { byteString -> val labelerInput =
Do you really expect the input blob to have LabelerInput protos? If these are the files that EDPs supply I'd rather there be a more official interface format and keep LabelerInput as an implementation detail.
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 71 at r4 (raw file):
.build() as LabelerInput val labelerOutput: LabelerOutput = labeler.label(input = labelerInput) labelerOutput.toString().toByteStringUtf8()
If you're going to build a labeled impression table down stream you'll want to write both the labelerInput and the labelerOutput to the next file. I'd actually recommend creating a new proto specific to the app that holds both of these.
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 74 at r4 (raw file):
} storageClient.writeBlob(outputBlobKey, outputFlow)
You should use the opportunity of having read the records to write them into multiple files so that they can be processed more efficiently downstream. For example, write a file per day per campaign. A second step can then read those files sort by VID and write down the files you have for your partitioning scheme.
src/main/proto/wfa/measurement/securecomputation/teeapps/v1alpha/tee_app_config.proto
line 27 at r4 (raw file):
message TeeAppConfig { message VidLabelingConfig {
We should assume that it's necessary to create labels for more than one model line.
Also, why not just provide a model_line name and allow the application to get the rest of what it needs from the Model Repo?
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/BUILD.bazel
line 1 at r4 (raw file):
load("@wfa_rules_kotlin_jvm//kotlin:defs.bzl", "kt_jvm_library")
Can you write just "teeapp" instead of "teeapps" in the directory name?
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/BUILD.bazel
line 9 at r4 (raw file):
kt_jvm_library( name = "vid_labeler_app",
I'd expect something with an "app" suffix to include a main if not be an actual image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 19 files reviewed, 24 unresolved discussions (waiting on @kungfucraig and @Marco-Premier)
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 65 at r4 (raw file):
Previously, kungfucraig (Craig Wright) wrote…
Do you really expect the input blob to have LabelerInput protos? If these are the files that EDPs supply I'd rather there be a more official interface format and keep LabelerInput as an implementation detail.
This is just an initial version. Updates in the future will be based on the forthcoming storage schema design doc.
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 71 at r4 (raw file):
Previously, kungfucraig (Craig Wright) wrote…
If you're going to build a labeled impression table down stream you'll want to write both the labelerInput and the labelerOutput to the next file. I'd actually recommend creating a new proto specific to the app that holds both of these.
yeah - I'd prefer that be a todo until we land on an agreed storage schema design doc.
src/main/kotlin/org/wfanet/measurement/securecomputation/teeapps/vidlabeling/VidLabelerApp.kt
line 74 at r4 (raw file):
Previously, kungfucraig (Craig Wright) wrote…
You should use the opportunity of having read the records to write them into multiple files so that they can be processed more efficiently downstream. For example, write a file per day per campaign. A second step can then read those files sort by VID and write down the files you have for your partitioning scheme.
I was thinking the underlying storage client would be a sharded storage client but that will come out in the design doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to wait on the design document before reviewing further.
Reviewable status: 16 of 19 files reviewed, 24 unresolved discussions (waiting on @Marco-Premier and @stevenwarejones)
No description provided.