Viewer: Re-investigate multi-threaded decoding #60

Open
dcommander opened this issue Sep 2, 2016 · 4 comments

Comments

@dcommander
Member

I spent many hours researching this topic in 2010, around the same time that I was developing multi-threaded encoding in the TurboVNC Server (under contract with RSV). You can see my commits related to this:

c557fd7
5df1680
778be33
4758068

At the time, I found that a tile-based round-robin approach, such as the one used by the latest TigerVNC Viewer, did not significantly increase the viewer's performance, but it's worth re-opening that topic. It's unclear whether their claimed performance improvements were measured at the low level or across the whole viewer. I personally found it more efficient to do what we're currently doing, which is to perform all of the decoding in one thread and all of the blitting in another. My benchmark extensions to the TigerVNC Viewer should provide a more thorough picture, though. If there is some advantage to their approach, then it should be straightforward to adopt it in our Java viewer, at least.
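For readers unfamiliar with the "decode in one thread, blit in another" arrangement, here is a minimal sketch of that kind of two-stage pipeline. It is not TurboVNC code: `run_pipeline`, `decode`, `blit`, and the queue depth of 8 are all hypothetical names and values chosen for illustration.

```python
import queue
import threading


def run_pipeline(updates, decode, blit, depth=8):
    """Decode every update in one thread and blit the results in another.

    `decode` and `blit` are caller-supplied callables (hypothetical here).
    A bounded queue connects the two stages, so the decoder blocks if the
    blitter falls more than `depth` items behind.
    """
    decoded = queue.Queue(maxsize=depth)
    done = object()  # sentinel that tells the blitter to stop

    def decoder():
        for update in updates:
            decoded.put(decode(update))
        decoded.put(done)

    def blitter():
        while True:
            item = decoded.get()
            if item is done:
                break
            blit(item)

    threads = [threading.Thread(target=decoder),
               threading.Thread(target=blitter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    frames = []
    # Stand-in "decode" (double the value) and "blit" (collect the result).
    run_pipeline(range(5), lambda u: u * 2, frames.append)
    print(frames)
```

Because there is exactly one producer and one consumer, results reach the blitter in the order they were decoded, and the only synchronization cost is one queue operation per update on each side.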

@dcommander dcommander added this to the TurboVNC 2.2 milestone Sep 2, 2016
@dcommander
Member Author

dcommander commented Apr 1, 2017

I ported the TurboVNC benchmark feature into both the TigerVNC 1.6.0 viewer code and the evolving TigerVNC 1.8 pre-beta viewer code (refer to https://github.com/dcommander/tigervnc/tree/benchmark and https://github.com/dcommander/tigervnc/tree/1.6-benchmark). Testing with that code produced mixed results.

On Linux (2011 Dell Precision T3500, quad-core 2.8 GHz Xeon W3530, nVidia Quadro K5000, CentOS 6.8), the breakdown in decoding time is as follows. Total time (decoding + blitting) is in parentheses.

| | TigerVNC 1.6.0 | TigerVNC 1.8 pre-beta, 1 thread | TigerVNC 1.8 pre-beta, 4 threads |
|---|---|---|---|
| Total | 10.9 (22.4) | 11.1 (23.2) | 5.77 (17.4) |
| 2D datasets | 1.05 (5.77) | 1.10 (6.36) | 1.30 (6.65) |
| 3D datasets | 9.88 (16.6) | 10.0 (16.8) | 4.47 (10.8) |

On Windows (2011 Dell Precision T3500, quad-core 2.8 GHz Xeon W3530, nVidia Quadro K5000, CentOS 6.8), the breakdown in decoding time is as follows. Total time (decoding + blitting) is in parentheses.

| | TigerVNC 1.6.0 | TigerVNC 1.8 pre-beta, 1 thread | TigerVNC 1.8 pre-beta, 4 threads |
|---|---|---|---|
| Total | 11.1 (14.0) | 11.6 (18.7) | 5.06 (12.0) |
| 2D datasets | 1.06 (2.63) | 1.05 (4.12) | 1.45 (4.62) |
| 3D datasets | 10.1 (11.4) | 10.6 (14.6) | 3.62 (7.37) |

On Mac (2015 Mini, dual-core 3 GHz Core i7, Intel Iris, OS X 10.10.5), the breakdown in decoding time is as follows. Total time (decoding + blitting) is in parentheses.

| | TigerVNC 1.6.0 | TigerVNC 1.8 pre-beta, 1 thread | TigerVNC 1.8 pre-beta, 2 threads |
|---|---|---|---|
| Total | 11.7 (356) | 8.58 (348) | 6.85 (348) |
| 2D datasets | 1.90 (308) | 1.04 (307) | 2.35 (307) |
| 3D datasets | 9.81 (48.2) | 7.54 (41.0) | 4.50 (40.8) |

Significant, albeit sublinear, speedup was achieved on all of the 3D datasets. The 2D datasets were a mixed bag:

  • On Linux, most of the 2D datasets realized a modest speedup (in the range of 20-60%), but the most fine-grained of them (bugzilla-16, freshmeat-8, and slashdot-24) slowed down significantly (in the range of 25-55%) with multithreading enabled.
  • Those same datasets slowed down even more on Windows (in the range of 50-60%), while the rest of the 2D datasets sped up by 3-70%.
  • On macOS, all of the 2D datasets slowed down significantly with multithreading enabled. Furthermore, the TigerVNC Viewer blits so slowly on macOS that any improvement in decoding performance was lost in the noise. On that platform, the viewer behaves as if it is always pegging the CPU cores. Using more threads for decoding made blitting slower, hence the nearly identical total benchmark times between the single-threaded and multithreaded cases.
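As a rough check on the "significant, albeit sublinear" characterization, the aggregate 3D decode times from the tables above can be converted into speedup and parallel-efficiency figures. The numbers below are copied from the TigerVNC 1.8 pre-beta columns; the helper names `speedup` and `efficiency` are mine.

```python
# Aggregate 3D decode times copied from the tables above:
# (single-threaded time, multithreaded time, thread count).
DECODE_3D = {
    "Linux": (10.0, 4.47, 4),
    "Windows": (10.6, 3.62, 4),
    "macOS": (7.54, 4.50, 2),
}


def speedup(single, multi):
    """Ratio of single-threaded to multithreaded decode time."""
    return single / multi


def efficiency(single, multi, threads):
    """Speedup divided by thread count (1.0 would be linear scaling)."""
    return speedup(single, multi) / threads


if __name__ == "__main__":
    for platform, (single, multi, threads) in DECODE_3D.items():
        print(f"{platform}: {speedup(single, multi):.2f}x "
              f"on {threads} threads, "
              f"efficiency {efficiency(single, multi, threads):.0%}")
```

This works out to roughly 2.24x on Linux, 2.93x on Windows, and 1.68x on macOS, all below the thread count. Applying the same ratio to the aggregate 2D rows gives values below 1 on all three platforms (for example, 1.10 / 1.30 on Linux), which matches the slowdowns described for the fine-grained datasets.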

I also added the aforementioned benchmark feature to the Java TigerVNC Viewer, since that viewer has a similar multithreaded decoding feature that we could potentially borrow (with some integration effort, since the Java TurboVNC Viewer forked from the Java TigerVNC Viewer five years ago). Unfortunately, in the case of the Java TigerVNC Viewer, the overall decoding performance has regressed by more than 3x relative to TigerVNC 1.6.0, even with a single thread, and enabling multithreading slows things down even further.

It appears that, at least algorithmically, there is some promise to TigerVNC's multithreaded decoding approach, but it currently seems to be too sensitive to the overhead of the underlying thread/locking implementation.
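One way to picture that sensitivity is a toy cost model in which the decoding work divides evenly across threads but every rectangle handed to a worker pays a fixed synchronization cost. This is only an illustrative assumption, not a measurement of TigerVNC: the function name, the rectangle counts, and the 10-microsecond overhead are all made up.

```python
def modeled_decode_time(work, n_rects, threads, overhead_per_rect):
    """Toy model: perfectly divisible work plus a fixed cost per rectangle.

    With one thread there is no hand-off, so no overhead is charged.
    All inputs are hypothetical; times are in seconds.
    """
    if threads == 1:
        return work
    return work / threads + n_rects * overhead_per_rect


if __name__ == "__main__":
    overhead = 10e-6  # assumed 10 microseconds of locking per rectangle
    for label, n_rects in (("coarse", 1_000), ("fine-grained", 100_000)):
        single = modeled_decode_time(1.0, n_rects, 1, overhead)
        multi = modeled_decode_time(1.0, n_rects, 4, overhead)
        print(f"{label}: 1 thread {single:.2f} s, 4 threads {multi:.2f} s")
```

Under these assumed numbers, the coarse workload gets faster with four threads while the fine-grained one gets slower, which is the same qualitative pattern as the 3D versus fine-grained 2D datasets above.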

@dcommander
Member Author

Pushed to TurboVNC 2.3

@dcommander
Member Author

This won't make it into TurboVNC 3.0, unfortunately, due to lack of time and funding.

@dcommander
Member Author

New results from TigerVNC 1.11 are pretty similar to the results from TigerVNC 1.8 (multithreading is still a mixed bag):
#144 (comment)

@dcommander dcommander changed the title Re-investigate multi-threaded decoding Viewer: Re-investigate multi-threaded decoding Dec 21, 2021