
Investigate moving additional performance-critical pieces of the TurboVNC Viewer into the TurboVNC Helper JNI library #144

Closed
dcommander opened this issue Oct 15, 2018 · 6 comments

Comments

@dcommander
Member

Now that there is a TurboVNC Helper library on all platforms we support, the potential exists to improve the performance of the standalone Java TurboVNC Viewer (which will be the only TurboVNC Viewer in the next major release of TurboVNC) by moving more functionality into the TurboVNC Helper:

  • Potentially the entire Tight decoder could be accessed through JNI, thus allowing the use of a SIMD-accelerated zlib library.
  • Potentially Java 2D could be replaced with FBX, thus eliminating the need for Java 2D performance workarounds on X11 and Windows platforms (including the need to disable Swing double buffering, which has been known to cause rendering artifacts in some of our dialogs.)

In keeping with the tradition of the TurboVNC Helper, these features would be optional. It would still be possible to use VncViewer.jar without the helper (but with reduced performance.)
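This optional-loading pattern can be sketched as follows. This is an illustrative sketch, not TurboVNC's actual code; the library name and class are hypothetical. The viewer attempts to load the native helper at startup and silently falls back to pure Java if it is absent.

```java
// Hypothetical sketch of the "optional helper" pattern: try to load the
// native helper library, and fall back to the (slower) pure-Java code
// paths if it isn't available.
public class HelperLoader {
  private static final boolean HELPER_AVAILABLE;

  static {
    boolean available;
    try {
      // "hypothetical_helper" stands in for the real helper library name.
      System.loadLibrary("hypothetical_helper");
      available = true;
    } catch (UnsatisfiedLinkError e) {
      // Helper not found: continue without native acceleration.
      available = false;
    }
    HELPER_AVAILABLE = available;
  }

  public static boolean isHelperAvailable() {
    return HELPER_AVAILABLE;
  }

  public static void main(String[] args) {
    // Prints "false" on any system where the hypothetical library is absent.
    System.out.println(isHelperAvailable());
  }
}
```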

@dcommander
Member Author

Moving the entire Tight decoder into JNI is problematic, because the decoder interleaves decoding and reading from the network. It will be necessary to work around that in the context of implementing multi-threaded decoding. Meanwhile, however, the most straightforward path seems to be to move only the JPEG and zlib decoding operations into the helper. The former would be done primarily to simplify the build, since it wouldn't be necessary for the build system to re-package the libjpeg-turbo JAR and JNI libraries anymore. The latter would be done primarily to take advantage of the SIMD-accelerated zlib implementation.
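A narrow JNI surface of that kind might look roughly like the following. The method names and signatures here are illustrative assumptions, not the actual TurboVNC Helper API; the native implementations would live in the helper library.

```java
// Hypothetical JNI surface for moving only the JPEG and zlib decompression
// steps (not the whole Tight decoder) into the helper library.
public class NativeDecoder {
  // Inflate a zlib stream segment using a (possibly SIMD-accelerated)
  // native zlib; returns the number of bytes written to dst.
  public static native int zlibInflate(long streamHandle,
                                       byte[] src, int srcLen,
                                       byte[] dst, int dstLen);

  // Decompress a JPEG image into a pixel buffer via the TurboJPEG C API.
  public static native void jpegDecompress(byte[] jpegBuf, int jpegLen,
                                           int[] dstBuf, int width,
                                           int stride, int height);
}
```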

@dcommander
Member Author

Removing from the "Windows TurboVNC Viewer Migration" milestone, since this doesn't technically represent a feature regression.

dcommander added a commit that referenced this issue Mar 9, 2021
The ability to use TurboJPEG through JNI was introduced with the
overhaul of the Java TurboVNC Viewer in TurboVNC 1.2, but the TurboVNC
Helper wasn't introduced until TurboVNC 2.0 and wasn't available on all
supported platforms until TurboVNC 2.2.  The TurboVNC build system also
had the ability to sign the TurboJPEG JNI JARs using the same
certificate that was used to sign the TurboVNC Viewer JAR, thus allowing
those JARs to be packaged with the TurboVNC Server and deployed
automatically to Linux, Mac, and Windows clients via Java Web Start
(whereas doing likewise with the TurboVNC Helper would have created a
chicken-and-egg dilemma.)

Thus, it made sense historically to use a separate JNI library for
TurboJPEG.  However, since Java Web Start is no longer a thing and the
TurboVNC Helper is built by default on all supported platforms, there is
no longer any need to use a separate JNI library for TurboJPEG.  To that
end, this commit creates a dedicated JNI wrapper for the TurboJPEG C API
and includes it in the TurboVNC Helper library.

The main purpose of this commit at the moment is to streamline the build
and allow distribution-specific builds of the TurboVNC Viewer on Linux
distributions that provide the TurboJPEG C API but not the TurboJPEG
Java API (including RHEL/CentOS 7 via EPEL, RHEL/CentOS 8 via
CodeReady/PowerTools, recent Ubuntu releases, and Fedora.)  However, it
also lays the groundwork for moving the entire Tight decoder into the
Helper, which is a long-term project goal (refer to #144.)
dcommander added a commit that referenced this issue Mar 10, 2021
@dcommander
Member Author

dcommander commented Mar 13, 2021

I performed a new round of viewer benchmarks on three test machines (each with a different GPU make) and four operating systems:

  • Dell Precision T3500, quad-core 2.8 GHz Intel Xeon W3530, 4 GB, nVidia Quadro K5000/450.102.04, CentOS 7.9
  • Dell Precision T3500, quad-core 2.8 GHz Intel Xeon W3530, 4 GB, nVidia Quadro K5000/441.66, Windows 7 Ultimate
  • Dell Precision 5820, quad-core 3.6 GHz Intel Xeon W2123, 16 GB, AMD Radeon Pro WX 2100/20.45, CentOS 8.3
  • Mac Mini, dual-core 3.0 GHz Intel Core i7, 16 GB, Intel Iris, macOS 10.14.6

Results:

  • Dell Precision T3500 (Linux)
    • The TurboVNC 3.0 Viewer has ~28-36% better overall blitting performance (~4% better 2D, ~70-92% better 3D) than the TigerVNC 1.11 Viewer.
    • The TurboVNC 3.0 Viewer has almost identical decoding performance (within 2%) to the single-threaded TigerVNC 1.11 Viewer.
    • Using 4 threads in the TigerVNC 1.11 Viewer improves its decoding performance by ~99% overall (~32% worse 2D, ~151% better 3D.)
  • Dell Precision T3500 (Windows)
    • The TurboVNC 3.0 Viewer has ~93-98% better overall blitting performance (~63-69% better 2D, ~126-130% better 3D) than the TigerVNC 1.11 Viewer.
    • The TurboVNC 3.0 Viewer has ~89% better overall blitting performance (~69% better 2D, ~110% better 3D) than the Windows TurboVNC 2.26 Viewer.
    • The TurboVNC 3.0 Viewer has ~9.6% better overall decoding performance (~7.4% better 2D, ~9.8% better 3D) than the single-threaded TigerVNC 1.11 Viewer.
    • Using 4 threads in the TigerVNC 1.11 Viewer improves its decoding performance by ~129% overall (~26% worse 2D, ~193% better 3D.)
    • The TurboVNC 3.0 Viewer has similar overall decoding performance (~6.3% worse 2D, ~1.4% worse 3D, ~1.9% worse overall) to the Windows TurboVNC 2.26 Viewer.
      • Most of this disparity is likely due to the use of the Intel zlib library in the Windows TurboVNC 2.26 Viewer.
  • Dell Precision 5820
    • The TurboVNC 3.0 Viewer has ~8-9% better overall blitting performance (~19-22% better 2D, ~10-16% worse 3D) than the TigerVNC 1.11 Viewer.
    • The TurboVNC 3.0 Viewer has almost identical decoding performance (within 2%) to the single-threaded TigerVNC 1.11 Viewer.
    • Using 4 threads in the TigerVNC 1.11 Viewer improves its decoding performance by ~86% (~41% worse 2D, ~138% better 3D.)
  • Mac Mini
    • The TurboVNC 3.0 Viewer has ~10-21x better overall blitting performance (13-28x better 2D, 5.4-5.5x better 3D) than the TigerVNC 1.11 Viewer.
      • I'm given to understand that this is due to a 60 Hz limitation in the FLTK drawing paths, so it's unclear how the low-level performance disparity would impact end-user performance. However, I would be very surprised if it had no impact at all.
    • The TurboVNC 3.0 Viewer has ~22% worse overall decoding performance (36% worse 2D, 19% worse 3D) than the single-threaded TigerVNC 1.11 Viewer.
    • Using 4 threads in the TigerVNC 1.11 Viewer improves its decoding performance by ~22% (~62% worse 2D, ~84% better 3D.)
      • The TigerVNC Viewer chooses 4 threads, even though the CPU is dual-core.

Conclusions:

  1. There is no burning need to use FBX or other native code for blitting. Java 2D is generally as fast or faster. The glaring exception was the 10-16% worse blitting performance relative to FLTK with the 3D datasets on the Radeon-equipped machine. However, due to Amdahl's Law (the fact that, with the 3D datasets, blitting accounts for much less execution time than decoding), that only translated to 7% worse total performance with those datasets, which isn't enough to justify much effort. I also need to test whether disabling the X Render pipeline in Java 2D improves that situation. This is the only area in which the overall performance of TurboVNC was worse than that of the single-threaded TigerVNC Viewer.
  2. The primary action item for Java 2D is to figure out how to work around the minor Swing GUI bugs that sometimes occur in the dialogs when double buffering is disabled (which it is by default, because the TurboVNC Viewer has its own double buffering mechanism.)
  3. Using the Intel zlib library is only marginally beneficial with the 2D datasets, which represent primarily legacy workflows (a lot of raw X11 primitive drawing with low color depths, as opposed to modern applications/GUI frameworks that are much richer and more image-based.) Thus, there is very little benefit to using a native Tight decoder on Windows (unless doing so would facilitate multithreading.) Also, given that Java 2D is much faster than FBX, the ~6% regression in decoding performance with the 2D datasets is effectively hidden by the ~69% better blitting performance on those datasets.
  4. There is zero benefit to using a native Tight decoder on Linux (unless doing so would facilitate multithreading.)
  5. It would be nice to understand why our decoder performs worse than TigerVNC's on macOS, but since our overall performance is still much better than TigerVNC's on that platform, this is another case in which not much effort is justified at the moment.
  6. There is some benefit to multithreaded Tight decoding with the 3D datasets, but the results are still mixed.
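As an aside on conclusion 2: Swing's double buffering can be disabled globally through RepaintManager, which is presumably how a viewer with its own back-buffering mechanism would do it. This sketch is illustrative, not TurboVNC's actual code; once disabled, dialogs paint directly to the screen, which is what exposes the cosmetic glitches mentioned above.

```java
import javax.swing.RepaintManager;

public class DoubleBufferDemo {
  public static void main(String[] args) {
    // Fetch the global RepaintManager (null = the default app context)
    // and turn off Swing's built-in double buffering.
    RepaintManager rm = RepaintManager.currentManager(null);
    rm.setDoubleBufferingEnabled(false);
    System.out.println(rm.isDoubleBufferingEnabled()); // prints "false"
  }
}
```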

@dcommander
Member Author

In general, my strategy is going to be:

  • Keep Java 2D but
    • attempt to work around the minor GUI bugs caused by disabling double buffering
    • attempt to work around the performance issue on the Radeon-equipped machine
  • Focus on multithreaded Tight decoding (Viewer: Re-investigate multi-threaded decoding #60) as the primary strategy for accelerating the viewer
    • move the Tight decoder into JNI only if necessary to make multithreading work properly (the abysmal multithreaded performance in the Java TigerVNC Viewer suggests that it may be necessary to use JNI, but I won't know until I dig into it)
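The multithreaded-decoding strategy above can be sketched roughly as follows: rectangles in a framebuffer update are read serially off the wire (the protocol stream is sequential), then handed to a thread pool for decoding. The types and the `decode` stand-in are hypothetical, not the viewer's actual decoder interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDecode {
  // Stand-in for real Tight rectangle decoding; here it just reports
  // how many compressed bytes it "consumed".
  static int decode(byte[] compressedRect) {
    return compressedRect.length;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Rectangles must be read from the socket sequentially...
    List<byte[]> rects = List.of(new byte[16], new byte[32]);

    // ...but can then be decoded concurrently.
    List<Future<Integer>> results = new ArrayList<>();
    for (byte[] r : rects)
      results.add(pool.submit(() -> decode(r)));

    int total = 0;
    for (Future<Integer> f : results)
      total += f.get();
    System.out.println(total); // prints "48"
    pool.shutdown();
  }
}
```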

For now, I'm closing this issue, since it has been proven to be much less relevant than I initially thought.

@dcommander
Member Author

The Linux blitting performance with -Dsun.java2d.xrender=false was universally worse, so that's a dead end. The results stand.
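For reference, that experiment amounts to passing the system property on the viewer's command line (the jar name here is illustrative):

```shell
# Disable Java 2D's X Render pipeline, forcing the X11 software pipeline.
java -Dsun.java2d.xrender=false -jar VncViewer.jar
```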

@dcommander
Member Author

Most of the cosmetic issues caused by disabling Swing double buffering have now been fixed in d2ffd4a. (The only known issue that remains is #224, which is more minor.) That took almost all of what little wind remained in the sails for this feature. Since #60 is the only remaining impetus, any further discussion of moving the Tight decoder into JNI will be conducted there.
