Improve performance of Analysis deserialization. #984

retronym · 2021-06-10T02:17:51Z

We have been looking at ways to improve the performance of analysis deserialization. In large builds this can be a major factor in the responsiveness of the build when Zinc determines that little or no actual compilation is needed.

A quick way to profile this is to checkout and compiler scala/scala, the run the following:

sbt> consoleProject
...
scala> def timed[T](f: => T) = { val start = System.nanoTime; try f finally { println( ((System.nanoTime - start) / 1000 / 1000) + " ms") } }
timed: [T](f: => T)T

scala> for (i <- 1 to 1000) { println(timed(sbt.internal.inc.FileAnalysisStore.binary(file("/Users/jz/code/scala/target/compiler/zinc/inc_compile.zip")).get)) }
400 ms
Optional[CompileResult(analysis: Analysis: 1607 Scala sources, 7 Java sources, 2414 classes, 1073 external source dependencies, 1 binary dependency, setup: MiniSetup(output: SingleOutput(/tmp/dummy), options: MiniOptions(classpathHash: [Lxsbti.compile.FileHash;@7d93d001, scalacOptions: [Ljava.lang.String;@1417548d, javacOptions: [Ljava.lang.String;@797cdb4a), compilerVersion: 2.13.6, order: JavaThenScala, storeApis: true, extra: [Lxsbti.T2;@2db4ba0), hasModified: false)]
468 ms
Optional[CompileResult(analysis: Analysis: 1607 Scala sources, 7 Java sources, 2414 classes, 1073 external source dependencies, 1 binary dependency, setup: MiniSetup(output: SingleOutput(/tmp/dummy), options: MiniOptions(classpathHash: [Lxsbti.compile.FileHash;@612b9022, scalacOptions: [Ljava.lang.String;@4fb613b6, javacOptions: [Ljava.lang.String;@500f3781), compilerVersion: 2.13.6, order: JavaThenScala, storeApis: true, extra: [Lxsbti.T2;@8d1f06b), hasModified: false)]
467 ms
Optional[CompileResult(analysis: Analysis: 1607 Scala sources, 7 Java sources, 2414 classes, 1073 external source dependencies, 1 binary dependency, setup: MiniSetup(output: SingleOutput(/tmp/dummy), options: MiniOptions(classpathHash: [Lxsbti.compile.FileHash;@3546dbd3, scalacOptions: [Ljava.lang.String;@4a92556e, javacOptions: [Ljava.lang.String;@6308c082), compilerVersion: 2.13.6, order: JavaThenScala, storeApis: true, extra: [Lxsbti.T2;@64dd2620), hasModified: false)]
468 ms
...

Attach async-profiler:

$ git clone jvm-profiling-tools/async-profiler && cd async-profiler
$ make clean all
./profiler.sh -d 20 --reverse -f out.html --minwidth 0.5 $PID

Opportunities

Inefficient creation of `Array`

javaList.asScala.iterator.map(f).toArray forgoes the chance to allocate the resulting array with the correct size and needs an intermediate buffer. Try something like:

   implicit class EfficientTraverse[T](seq: JList[T]) {
-    def toZincArray[R: scala.reflect.ClassTag](f: T => R): Array[R] =
-      seq.asScala.iterator.map(f).toArray
+    def toZincArray[R: scala.reflect.ClassTag](f: T => R): Array[R] = {
+      seq.stream().map(x => f(x)).toArray(new Array[R](_))
+    }
   }

Dead code creates wasted `Vector`

@@ -695,9 +699,11 @@ final class ProtobufReaders(mapper: ReadMapper, currentVersion: Schema.Version)
       val name = usedName.getName.intern()
       val useScopes = util.EnumSet.noneOf(classOf[UseScope])
       val len = usedName.getScopesCount
-      val scopes = for {
+      for {
         i <- 0 to len - 1
-      } yield useScopes.add(fromUseScope(usedName.getScopes(i), usedName.getScopesValue(i)))
+      } {
+        useScopes.add(fromUseScope(usedName.getScopes(i), usedName.getScopesValue(i)))
+      }
       UsedName.make(name, useScopes)

String interning isn't free

Interning likely-duplicated strings as we read them from the protobuf trades off higher CPU to achieve lower footprint. Maybe it is worth thinking of introducing a name table in the start of the protobuf and replacing fields of type string with indices into this table?

Eager Relation Building is costly

All maps that are deserialized are converted to a Relation[A,B], a pair of maps, one with the forward relations another with the inverse. The inverse map is of type is a Map[B, Set[A]].

The usedNames map appears to be the largest. Is the inverse relation actually used? If not, refactor Zinc to avoid building this up!

The other cost here is building immutable Map/Sets, rather than just upcasting mutable one to collection.{Map,Set}. This may well be worth it if clients of MRelationsNameHashing use its operations to construct new versions of it; these would structurally share with the base version. But its worth analysing how these are used.

It looks like the value in the forward relation in UsedName is not actually accessed as a Set[UsedName], a sequential collection would suffice.

    private def filteredDependencies(dependent: Set[String]): Set[String] = {
      dependent.filter {
        case from if isScalaClass(from) =>
          val affectedNames = usedNames.forward(from).filter(modifiedNames.isModified)

Furthermore, the list of used names for a file is only consulted if there is an memberRef relation between the file with a changed API to the candidate file for invalidation. This is an opportunity for laziness.

Lazy parsing?

If the used names data for many compilation units is actually not needed, can we avoid reading it from disk and parsing it in the first place? This would require an on-disk format with random access into the map of used names for a given unit.

Protobuf's documentation notes:

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.

Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.

I could imagine an incremental change to the current format included an entry in inc_compile.zip for each compilation unit.

Moar Hashing?

Another #brainstorm: what if we compressed the used-names section by hashes of the names? i.e. rather than recording that A.scala -> ["println", "Predef", ...], we instead record that is references ["println".hashCode, "Predef".hascode, ...]? Saving ints rather than Strings would reduce the size of the persisted form, avoid UTF-8 decoding and String interning on deserialization.

In case of hash collisions we would get over-compilation (although it would not propagate far.)

The text was updated successfully, but these errors were encountered:

retronym · 2021-07-23T04:57:53Z

This test evaluates how much of the analysis files on-disk size is consumed by the used names, and how much of the loading time is attributable to that data.

package sbt.inc.binary

import sbt.internal.inc.FileAnalysisStore
import xsbti.compile.AnalysisContents

import java.io.File
import java.nio.file.{ Files, Path }
import java.util.concurrent.TimeUnit

object UsedNameTest {
  def main(args: Array[String]): Unit = {
    val inFile = new File("/Users/jz/code/scala/target/library/zinc/inc_compile.zip")
    val store = FileAnalysisStore.binary(
      inFile
    )
    val contents = store.get().get()
    val analysis = contents.getAnalysis.asInstanceOf[sbt.internal.inc.Analysis]
    val outFile = new File("/tmp/inc_compile.zip")
    val outStore = FileAnalysisStore.binary(outFile)
    outStore.set(
      AnalysisContents.create(
        analysis.copy(relations = analysis.relations.withoutUsedNames),
        contents.getMiniSetup
      )
    )
    printSize(inFile.toPath)
    printSize(outFile.toPath)
    def timedLoad(path: Path): Unit = {
      val now = System.nanoTime()
      FileAnalysisStore.binary(path.toFile).get.get().getAnalysis
      val end = System.nanoTime()
      println(path + " loaded in " + TimeUnit.NANOSECONDS.toMillis((end - now)) + "ms")
    }
    (1 to 32).foreach(_ => timedLoad(inFile.toPath))
    (1 to 32).foreach(_ => timedLoad(outFile.toPath))
  }

  private def printSize(path: Path): Unit = {
    println(path + " size = " + +Files.size(path))
  }
}

This shows about 15-20% of the space (a bit less that I guesstimated) and 30-35% of the time.

Friendseeker · 2024-09-19T17:49:35Z

With #1326, #1388 merged and serialization & deserialization performance quadrupled we can probably close the ticket for now.

retronym assigned dwijnand Jun 10, 2021

retronym mentioned this issue Jun 10, 2021

WIP faster analysis deserialization retronym/zinc#3

Open

retronym mentioned this issue Jun 22, 2021

Reduce overhead in analysis protobuf deserialization #989

Merged

dwijnand mentioned this issue Jul 14, 2021

Improve performance of loading used names from persisted Analysis file #995

Merged

dwijnand removed their assignment Nov 11, 2021

Friendseeker closed this as completed Sep 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of Analysis deserialization. #984

Improve performance of Analysis deserialization. #984

retronym commented Jun 10, 2021 •

edited

Loading

retronym commented Jul 23, 2021 •

edited

Loading

Friendseeker commented Sep 19, 2024 •

edited

Loading

Improve performance of Analysis deserialization. #984

Improve performance of Analysis deserialization. #984

Comments

retronym commented Jun 10, 2021 • edited Loading

Opportunities

Inefficient creation of Array

Dead code creates wasted Vector

String interning isn't free

Eager Relation Building is costly

Lazy parsing?

Moar Hashing?

retronym commented Jul 23, 2021 • edited Loading

Friendseeker commented Sep 19, 2024 • edited Loading

retronym commented Jun 10, 2021 •

edited

Loading

Inefficient creation of `Array`

Dead code creates wasted `Vector`

retronym commented Jul 23, 2021 •

edited

Loading

Friendseeker commented Sep 19, 2024 •

edited

Loading