
Investigate moving additional performance-critical pieces of the TurboVNC Viewer into the TurboVNC Helper JNI library #144

Closed
dcommander opened this issue Oct 15, 2018 · 6 comments

Comments

@dcommander
Member

Now that there is a TurboVNC Helper library on all platforms we support, the potential exists to improve the performance of the standalone Java TurboVNC Viewer (which will be the only TurboVNC Viewer in the next major release of TurboVNC) by moving more functionality into the TurboVNC Helper:

  • Potentially the entire Tight decoder could be accessed through JNI, thus allowing the use of a SIMD-accelerated zlib library.
  • Potentially Java 2D could be replaced with FBX, thus eliminating the need for Java 2D performance workarounds on X11 and Windows platforms (including the need to disable Swing double buffering, which has been known to cause rendering artifacts in some of our dialogs.)

In keeping with the tradition of the TurboVNC Helper, these features would be optional. It would still be possible to use VncViewer.jar without the helper (but with reduced performance.)
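This optional-loading pattern can be sketched as follows. This is an illustrative sketch, not TurboVNC's actual code; the library name and class are hypothetical. The viewer attempts to load the native helper at startup and silently falls back to pure Java if it is absent.

```java
// Hypothetical sketch of the "optional helper" pattern: try to load the
// native helper library, and fall back to the (slower) pure-Java code
// paths if it isn't available.
public class HelperLoader {
  private static final boolean HELPER_AVAILABLE;

  static {
    boolean available;
    try {
      // "hypothetical_helper" stands in for the real helper library name.
      System.loadLibrary("hypothetical_helper");
      available = true;
    } catch (UnsatisfiedLinkError e) {
      // Helper not found: continue without native acceleration.
      available = false;
    }
    HELPER_AVAILABLE = available;
  }

  public static boolean isHelperAvailable() {
    return HELPER_AVAILABLE;
  }

  public static void main(String[] args) {
    // Prints "false" on any system where the hypothetical library is absent.
    System.out.println(isHelperAvailable());
  }
}
```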

@dcommander
Member Author

Moving the entire Tight decoder into JNI is problematic, because the decoder interleaves decoding and reading from the network. It will be necessary to work around that in the context of implementing multi-threaded decoding. Meanwhile, however, the most straightforward path seems to be to move only the JPEG and zlib decoding operations into the helper. The former would be done primarily to simplify the build, since it wouldn't be necessary for the build system to re-package the libjpeg-turbo JAR and JNI libraries anymore. The latter would be done primarily to take advantage of the SIMD-accelerated zlib implementation.
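A narrow JNI surface of that kind might look roughly like the following. The method names and signatures here are illustrative assumptions, not the actual TurboVNC Helper API; the native implementations would live in the helper library.

```java
// Hypothetical JNI surface for moving only the JPEG and zlib decompression
// steps (not the whole Tight decoder) into the helper library.
public class NativeDecoder {
  // Inflate a zlib stream segment using a (possibly SIMD-accelerated)
  // native zlib; returns the number of bytes written to dst.
  public static native int zlibInflate(long streamHandle,
                                       byte[] src, int srcLen,
                                       byte[] dst, int dstLen);

  // Decompress a JPEG image into a pixel buffer via the TurboJPEG C API.
  public static native void jpegDecompress(byte[] jpegBuf, int jpegLen,
                                           int[] dstBuf, int width,
                                           int stride, int height);
}
```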

@dcommander
Member Author

Removing from the "Windows TurboVNC Viewer Migration" milestone, since this doesn't technically represent a feature regression.

dcommander added a commit that referenced this issue Mar 9, 2021
The ability to use TurboJPEG through JNI was introduced with the
overhaul of the Java TurboVNC Viewer in TurboVNC 1.2, but the TurboVNC
Helper wasn't introduced until TurboVNC 2.0 and wasn't available on all
supported platforms until TurboVNC 2.2.  The TurboVNC build system also
had the ability to sign the TurboJPEG JNI JARs using the same
certificate that was used to sign the TurboVNC Viewer JAR, thus allowing
those JARs to be packaged with the TurboVNC Server and deployed
automatically to Linux, Mac, and Windows clients via Java Web Start
(whereas doing likewise with the TurboVNC Helper would have created a
chicken-and-egg dilemma.)

Thus, it made sense historically to use a separate JNI library for
TurboJPEG.  However, since Java Web Start is no longer a thing and the
TurboVNC Helper is built by default on all supported platforms, there is
no longer any need to use a separate JNI library for TurboJPEG.  To that
end, this commit creates a dedicated JNI wrapper for the TurboJPEG C API
and includes it in the TurboVNC Helper library.

The main purpose of this commit at the moment is to streamline the build
and allow distribution-specific builds of the TurboVNC Viewer on Linux
distributions that provide the TurboJPEG C API but not the TurboJPEG
Java API (including RHEL/CentOS 7 via EPEL, RHEL/CentOS 8 via
CodeReady/PowerTools, recent Ubuntu releases, and Fedora.)  However, it
also lays the groundwork for moving the entire Tight decoder into the
Helper, which is a long-term project goal (refer to #144.)
dcommander added a commit that referenced this issue Mar 10, 2021
@dcommander
Member Author

dcommander commented Mar 13, 2021

I performed a new round of viewer benchmarks on three test machines (each with a different GPU make) and four operating systems:

  • Dell Precision T3500, quad-core 2.8 GHz Intel Xeon W3530, 4 GB, nVidia Quadro K5000/450.102.04, CentOS 7.9
  • Dell Precision T3500, quad-core 2.8 GHz Intel Xeon W3530, 4 GB, nVidia Quadro K5000/441.66, Windows 7 Ultimate
  • Dell Precision 5820, quad-core 3.6 GHz Intel Xeon W2123, 16 GB, AMD Radeon Pro WX 2100/20.45, CentOS 8.3
  • Mac Mini, dual-core 3.0 GHz Intel Core i7, 16 GB, Intel Iris, macOS 10.14.6

Results:

  • Dell Precision T3500 (Linux)
    • The TurboVNC 3.0 Viewer has ~28-36% better overall blitting performance (~4% better 2D, ~70-92% better 3D) than the TigerVNC 1.11 Viewer.
    • The TurboVNC 3.0 Viewer has almost identical decoding performance (within 2%) to the single-threaded TigerVNC 1.11 Viewer.
    • Using 4 threads in the TigerVNC 1.11 Viewer improves its decoding performance by ~99% overall (~32% worse 2D, ~151% better 3D.)
  • Dell Precision T3500 (Windows)
    • The TurboVNC 3.0 Viewer has ~93-98% better overall blitting performance (~63-69% better 2D, ~126-130% better 3D) than the TigerVNC 1.11 Viewer.
    • The TurboVNC 3.0 Viewer has ~89% better overall blitting performance (~69% better 2D, ~110% better 3D) than the Windows TurboVNC 2.26 Viewer.
    • The TurboVNC 3.0 Viewer has ~9.6% better overall decoding performance (~7.4% better 2D, ~9.8% better 3D) than the single-threaded TigerVNC 1.11 Viewer.
    • Using 4 threads in the TigerVNC 1.11 Viewer improves its decoding performance by ~129% overall (~26% worse 2D, ~193% better 3D.)
    • The TurboVNC 3.0 Viewer has similar overall decoding performance (~6.3% worse 2D, ~1.4% worse 3D, ~1.9% worse overall) to the Windows TurboVNC 2.26 Viewer.
      • Most of this disparity is likely due to the use of the Intel zlib library in the Windows TurboVNC 2.26 Viewer.
  • Dell Precision 5820
    • The TurboVNC 3.0 Viewer has ~8-9% better overall blitting performance (~19-22% better 2D, ~10-16% worse 3D) than the TigerVNC 1.11 Viewer.
    • The TurboVNC 3.0 Viewer has almost identical decoding performance (within 2%) to the single-threaded TigerVNC 1.11 Viewer.
    • Using 4 threads in the TigerVNC 1.11 Viewer improves its decoding performance by ~86% (~41% worse 2D, ~138% better 3D.)
  • Mac Mini
    • The TurboVNC 3.0 Viewer has ~10-21x better overall blitting performance (13-28x better 2D, 5.4-5.5x better 3D) than the TigerVNC 1.11 Viewer.
      • I'm given to understand that this is due to a 60 Hz limitation in the FLTK drawing paths, so it's unclear how the low-level performance disparity would impact end-user performance. However, I would be very surprised if it had no impact at all.
    • The TurboVNC 3.0 Viewer has ~22% worse overall decoding performance (36% worse 2D, 19% worse 3D) than the single-threaded TigerVNC 1.11 Viewer.
    • Using 4 threads in the TigerVNC 1.11 Viewer improves its decoding performance by ~22% (~62% worse 2D, ~84% better 3D.)
      • The TigerVNC Viewer chooses 4 threads, even though the CPU is dual-core.

Conclusions:

  1. There is no burning need to use FBX or other native code for blitting. Java 2D is generally as fast or faster. The glaring exception was the 10-16% worse blitting performance relative to FLTK with the 3D datasets on the Radeon-equipped machine. However, due to Amdahl's Law (the fact that, with the 3D datasets, blitting accounts for much less execution time than decoding), that only translated to 7% worse total performance with those datasets, which isn't enough to justify much effort. I also need to test whether disabling the X Render pipeline in Java 2D improves that situation. This is the only area in which the overall performance of TurboVNC was worse than that of the single-threaded TigerVNC Viewer.
  2. The primary action item for Java 2D is to figure out how to work around the minor Swing GUI bugs that sometimes occur in the dialogs when double buffering is disabled (which it is by default, because the TurboVNC Viewer has its own double buffering mechanism.)
  3. Using the Intel zlib library is only marginally beneficial with the 2D datasets, which represent primarily legacy workflows (a lot of raw X11 primitive drawing with low color depths, as opposed to modern applications/GUI frameworks that are much richer and more image-based.) Thus, there is very little benefit to using a native Tight decoder on Windows (unless doing so would facilitate multithreading.) Also, given that Java 2D is much faster than FBX, the ~6% regression in decoding performance with the 2D datasets is effectively hidden by the ~69% better blitting performance on those datasets.
  4. There is zero benefit to using a native Tight decoder on Linux (unless doing so would facilitate multithreading.)
  5. It would be nice to understand why our decoder performs worse than TigerVNC's on macOS, but since our overall performance is still much better than TigerVNC's on that platform, this is another case in which not much effort is justified at the moment.
  6. There is some benefit to multithreaded Tight decoding with the 3D datasets, but the results are still mixed.
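As an aside on conclusion 2: Swing's double buffering can be disabled globally through RepaintManager, which is presumably how a viewer with its own back-buffering mechanism would do it. This sketch is illustrative, not TurboVNC's actual code; once disabled, dialogs paint directly to the screen, which is what exposes the cosmetic glitches mentioned above.

```java
import javax.swing.RepaintManager;

public class DoubleBufferDemo {
  public static void main(String[] args) {
    // Fetch the global RepaintManager (null = the default app context)
    // and turn off Swing's built-in double buffering.
    RepaintManager rm = RepaintManager.currentManager(null);
    rm.setDoubleBufferingEnabled(false);
    System.out.println(rm.isDoubleBufferingEnabled()); // prints "false"
  }
}
```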

@dcommander
Member Author

In general, my strategy is going to be:

  • Keep Java 2D but
    • attempt to work around the minor GUI bugs caused by disabling double buffering
    • attempt to work around the performance issue on the Radeon-equipped machine
  • Focus on multithreaded Tight decoding (Viewer: Re-investigate multi-threaded decoding #60) as the primary strategy for accelerating the viewer
    • move the Tight decoder into JNI only if necessary to make multithreading work properly (the abysmal multithreaded performance in the Java TigerVNC Viewer suggests that it may be necessary to use JNI, but I won't know until I dig into it)
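The multithreaded-decoding strategy above can be sketched roughly as follows: rectangles in a framebuffer update are read serially off the wire (the protocol stream is sequential), then handed to a thread pool for decoding. The types and the `decode` stand-in are hypothetical, not the viewer's actual decoder interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDecode {
  // Stand-in for real Tight rectangle decoding; here it just reports
  // how many compressed bytes it "consumed".
  static int decode(byte[] compressedRect) {
    return compressedRect.length;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Rectangles must be read from the socket sequentially...
    List<byte[]> rects = List.of(new byte[16], new byte[32]);

    // ...but can then be decoded concurrently.
    List<Future<Integer>> results = new ArrayList<>();
    for (byte[] r : rects)
      results.add(pool.submit(() -> decode(r)));

    int total = 0;
    for (Future<Integer> f : results)
      total += f.get();
    System.out.println(total); // prints "48"
    pool.shutdown();
  }
}
```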

For now, I'm closing this issue, since it has been proven to be much less relevant than I initially thought.

@dcommander
Member Author

The Linux blitting performance with -Dsun.java2d.xrender=false was universally worse, so that's a dead end. The results stand.
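For reference, that experiment amounts to passing the system property on the viewer's command line (the jar name here is illustrative):

```shell
# Disable Java 2D's X Render pipeline, forcing the X11 software pipeline.
java -Dsun.java2d.xrender=false -jar VncViewer.jar
```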

@dcommander
Member Author

Most of the cosmetic issues caused by disabling Swing double buffering have now been fixed in d2ffd4a. (The only known issue that remains is #224, which is more minor.) That took almost all of what little wind remained in the sails for this feature. Since #60 is the only remaining impetus, any further discussion of moving the Tight decoder into JNI will be conducted there.
