
[2.x] Remove the Protocol Buffer usage #1388

Merged: 3 commits merged into sbt:develop from wip/protobuf_removal on Sep 18, 2024

Conversation

eed3si9n (Member) commented Sep 5, 2024

Problem/Solution

We currently have three (3) implementations of Analysis persistence. Given that the "consistent" binary (#1326) is the best, we should remove the protobuf one, so we don't need to update three code bases as we refactor for 2.x.

eed3si9n changed the title from "Remove the Protocol Buffer usage" to "[2.x] Remove the Protocol Buffer usage" on Sep 5, 2024
eed3si9n (Member Author) commented Sep 5, 2024

This PR switches the default format to the new consistent one, and the following tests are failing. I haven't dug into what's happening yet:

[info] MultiProjectIncrementalSpec:
[info] incremental compiler
[info] - should detect shadowing
[info] it
[info] - should not compile Java for no-op *** FAILED ***
[info]   List(1725556365497) did not equal List() (MultiProjectIncrementalSpec.scala:124)
[info]   Analysis:
[info]   List(0: 1725556365497 -> )
[info] - should not compile anything if source has not changed *** FAILED ***
[info]   result2.hasModified() was true (OutputSpec.scala:37)

eed3si9n (Member Author) commented Sep 5, 2024

I wonder if we need to pretend that the compilation took place 1s after 2010-01-01T00:00:42Z for each round of incremental compilation entry /cc @szeiger
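
For what it's worth, a minimal sketch of that "one second per round after a fixed epoch" idea (hypothetical names, not zinc's actual API):

import java.time.Instant

// Hypothetical helper, not zinc's actual API: derive a deterministic
// per-round timestamp from the fixed epoch mentioned above, so that each
// incremental compilation round still gets a distinct compilation time.
object RoundTimestamps {
  private val fixedEpochMillis: Long =
    Instant.parse("2010-01-01T00:00:42Z").toEpochMilli

  // Round 0 -> epoch + 1s, round 1 -> epoch + 2s, ...
  def forRound(round: Int): Long = fixedEpochMillis + (round + 1) * 1000L
}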

eed3si9n force-pushed the wip/protobuf_removal branch 2 times, most recently from 3b0e7ee to c094e64, on September 6, 2024 00:11
Comment on lines +298 to +304
val singleOutput = setup.output().getSingleOutputAsPath()
val outputPath = singleOutput match {
case o if o.isPresent() && o.get().getFileName().toString().endsWith(".jar") =>
Analysis.dummyOutputJarPath
case _ => Analysis.dummyOutputPath
}
out.string(outputPath.toString())
eed3si9n (Member Author)

This is a bug fix, which we should backport to 1.x.

@@ -11,8 +11,7 @@

package sbt.internal.inc

import sbt.internal.prof.Zprof

// import sbt.internal.prof.Zprof


Did you mean to leave all of this in as comments?

eed3si9n (Member Author)

Yea, I'm just going to comment out the protobuf usage, and later figure out what to do with the InvalidationProfiler instead of outright deleting it, if that's ok.

Friendseeker (Member)

> I wonder if we need to pretend that the compilation took place 1s after 2010-01-01T00:00:42Z for each round of incremental compilation entry /cc @szeiger

I am wondering whether we should instead avoid relying on timestamp information in places such as detectAPIChanges and IncHandler.checkRecompilations.

In detectAPIChanges we fundamentally do the timestamp check to determine whether a class was recompiled, which is already a weird thing, since

def detectAPIChanges(
    recompiledClasses: collection.Set[String],
    oldAPI: String => AnalyzedClass,
    newAPI: String => AnalyzedClass
)

already takes recompiledClasses as input. So either recompiledClasses is somehow a superset of the actually recompiled classes (in which case we can try to narrow it down to exactly that set), or it already is the set of actually recompiled classes, in which case the timestamp check trivially evaluates to false.
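
As a simplified sketch of that second possibility (hypothetical types, not zinc's real AnalyzedClass or detectAPIChanges): if recompiledClasses is exactly the set of recompiled classes, an API change can be detected from the API hash alone, with no timestamp involved.

// Hypothetical simplified model, just to illustrate the idea above; these
// are not zinc's real types or signatures.
final case class ClassApi(name: String, apiHash: Int)

// If `recompiledClasses` is exactly the set of recompiled classes, the
// comparison never needs a compilation timestamp: a class changed iff its
// API hash changed.
def detectChangedClasses(
    recompiledClasses: Set[String],
    oldApi: String => ClassApi,
    newApi: String => ClassApi
): Set[String] =
  recompiledClasses.filter(cls => oldApi(cls).apiHash != newApi(cls).apiHash)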

eed3si9n (Member Author)

@Friendseeker For sbt 2.x I've implemented testQuick invalidation based on the file contents, and documented it as a blog post here - https://eed3si9n.com/sudori-part5/

Friendseeker (Member)

> @Friendseeker For sbt 2.x I've implemented testQuick invalidation based on the file contents, and documented it as a blog post here - https://eed3si9n.com/sudori-part5/

Wow! So, if my understanding is correct, testQuick now takes the transitive closure of all dependencies and computes hashes over that closure.

I definitely think the same idea can be used here! We don't even need to compute the hash transitively; a single class file hash is probably sufficient.

I shall do some further investigation. Ideally we can avoid timestamp/hash comparison in the first place, but if that is unavoidable, I guess we should do class-file hash comparison.
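
For illustration, a minimal sketch of that content-hash idea using a plain SHA-256 digest (this is not the actual sbt 2.x testQuick code, and the helper names are made up):

import java.nio.file.{ Files, Path }
import java.security.MessageDigest

// Hypothetical sketch: digest the contents of a test's transitive dependency
// closure and rerun the test only when the digest changes.
def closureDigest(transitiveDeps: Seq[Path]): String = {
  val md = MessageDigest.getInstance("SHA-256")
  transitiveDeps.sortBy(_.toString).foreach { p =>
    md.update(p.toString.getBytes("UTF-8")) // mix in the path itself
    md.update(Files.readAllBytes(p))        // and its current content
  }
  md.digest().map(b => f"$b%02x").mkString
}

// Rerun when there is no stored digest or the stored digest differs.
def shouldRerun(stored: Option[String], current: String): Boolean =
  !stored.contains(current)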

Friendseeker (Member)

@eed3si9n

I tried the following patch on the 1.10.x branch and ran all the scripted tests:

Index: internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCommon.scala
===================================================================
diff --git a/internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCommon.scala b/internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCommon.scala
--- a/internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCommon.scala	(revision cbf85da6a065e37ac1b52359d33408af32c571fb)
+++ b/internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCommon.scala	(date 1726642366119)
@@ -309,8 +309,9 @@
     // log.debug(s"[zinc] detectAPIChanges(recompiledClasses = $recompiledClasses)")
     def classDiff(className: String, a: AnalyzedClass, b: AnalyzedClass): Option[APIChange] = {
       // log.debug(s"[zinc] classDiff($className, ${a.name}, ${b.name})")
-      if (a.compilationTimestamp() == b.compilationTimestamp() && (a.apiHash == b.apiHash)) None
-      else {
+      if (a.compilationTimestamp() == b.compilationTimestamp() && (a.apiHash == b.apiHash)) {
+        throw new IllegalStateException("Timestamp should be always different")
+      } else {
         val hasMacro = a.hasMacro || b.hasMacro
         if (hasMacro && IncOptions.getRecompileOnMacroDef(options)) {
           Some(APIChangeDueToMacroDefinition(className))

The following 6 scripted tests failed.

[error] stack trace is suppressed; run last scripted for the full output
[error] (scripted) sbt.internal.inc.ScriptedRunnerImpl$ScriptedFailure$1$: 6 (out of 171) scripted tests failed:
[error] 
[error] 	pipelining/subproject-pipelining
[error] 	pipelining/subproject-pipelining-optout
[error] 	macros/macro-type-change
[error] 	macros/macro-type-change-2
[error] 	macros/macro-type-change-3
[error] 	macros/macro-type-change-4
[error] Total time: 69 s (01:09), completed Sep 17, 2024, 11:56:09 p.m.

I will try to figure out why the scripted tests failed, but considering that if (a.compilationTimestamp() == b.compilationTimestamp() && (a.apiHash == b.apiHash)) only gets hit in 6 out of 171 scripted tests, it is very likely that this condition rarely gets hit in the wild and we can remove it altogether.

adpi2 (Member) left a comment

That's great! If we can deprecate and remove the compilation timestamp, that would be even better.

eed3si9n merged commit 1211926 into sbt:develop on Sep 18, 2024 (8 checks passed)
eed3si9n deleted the wip/protobuf_removal branch on September 18, 2024 16:47
Friendseeker (Member) commented Sep 18, 2024

@eed3si9n Are you currently drafting a 2nd PR to get rid of the timestamp usage in detectAPIChanges and clean up the special timestamp handling such as TIMESTAMP_2020, timeStampIsSame?

If you are not working on it currently, I can work on it right now.

Also should we backport the change to 1.10.x?

Friendseeker (Member) commented Sep 19, 2024

> I will try to figure out why the scripted tests failed

All of the failures come from here:

val subprojectApiChanges: APIChanges = {
  val incrementalExternalChanges = {
    val previousAPIs = previousAnalysis.apis
    val externalFinder = lookupAnalyzedClass(_: String, None).getOrElse(APIs.emptyAnalyzedClass)
    detectAPIChanges(previousAPIs.allExternals, previousAPIs.externalAPI, externalFinder)
  }
  val changedExternalClassNames = incrementalExternalChanges.allModified.toSet

So the reason is that Zinc does not know whether an upstream source file was recompiled, and it relies on the timestamp to make that determination.

A noteworthy fact is that we have at least 13 scripted tests featuring subprojects, and only 6 of these failed. I shall do more digging.


I did more thinking, and I believe we have to hash the class files. For downstream compilation, Zinc must know the exact set of changed upstream sources in order to perform the initial invalidation. We cannot simply get rid of the if/else, because during upstream incremental compilation it is pretty common for only a small subset of upstream sources to be recompiled, and Zinc needs to know which ones.

It looks like Stamper is already doing some content hashing. I shall investigate whether the hashes it already computes suffice for this purpose.
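
For illustration, a rough sketch of what hash-based (rather than timestamp-based) detection of changed upstream classes could look like, using a plain SHA-256 digest; whether the hashes Stamper already computes can serve this role is exactly the open question (the helper names are made up, and this is not zinc's Stamper API):

import java.nio.file.{ Files, Path }
import java.security.MessageDigest

// Hypothetical sketch, not zinc's Stamper API: recover the exact set of
// changed upstream classes by comparing stored content hashes against
// freshly computed ones, with no timestamps involved.
def hashFile(p: Path): String =
  MessageDigest
    .getInstance("SHA-256")
    .digest(Files.readAllBytes(p))
    .map(b => f"$b%02x")
    .mkString

def changedUpstreamClasses(
    previousHashes: Map[String, String], // class name -> stored content hash
    classFiles: Map[String, Path]        // class name -> current class file
): Set[String] =
  classFiles.collect {
    case (cls, file) if !previousHashes.get(cls).contains(hashFile(file)) => cls
  }.toSet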
