Fortress (Ogre 2.2?) broke GUI with VirtualGL #526
Why it broke in Fortress/Ogre 2.2 I do not know. What I can tell you is that when "currentGLContext" = "1" (which is what Gazebo does in both Edifice and Fortress), both Ogre 2.1 and 2.2 will perform the following:

if( "currentGLContext" == "1" )
{
    if( !glXGetCurrentContext() )
    {
        OGRE_EXCEPT( Exception::ERR_RENDERINGAPI_ERROR,
                     "currentGLContext was specified with no current GL context",
                     "GLXWindow::create" );
    }
}
The algorithm is quite simple: if "currentGLContext" is set, then we expect a GL context to be bound. Given that the context is created by Qt when using the GUI, I suspect the cause is somewhere in the Qt code, or in some subtle change in Ogre2RenderEngine::CreateRenderWindow (or its parent caller, or code executed up to that point) that no longer sets the context; or alternatively, some routine in between unbound the context.

TL;DR: there is a missing call to glXMakeCurrent. Perhaps Qt no longer uses GLX?
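A minimal sketch of the contract described above (not Gazebo's actual code; it assumes Qt 5 with the xcb platform plugin): whoever owns the GL context must make it current on the calling thread before Ogre is asked to reuse it with "currentGLContext" = "1". If Qt binds an EGL context instead of a GLX one, glXGetCurrentContext() stays null and Ogre throws the exception quoted above.

#include <QGuiApplication>
#include <QOffscreenSurface>
#include <QOpenGLContext>
#include <GL/glx.h>
#include <iostream>

int main(int argc, char **argv)
{
  QGuiApplication app(argc, argv);

  // Qt owns the context, just like the Gazebo GUI.
  QOffscreenSurface surface;
  surface.create();
  QOpenGLContext context;
  context.create();
  context.makeCurrent(&surface);  // The step that must happen before Ogre runs its check.

  // This mirrors the check from GLXWindow::create() quoted above: Ogre does not
  // create a context of its own when "currentGLContext" = "1", it only verifies
  // that one is already bound on this thread.
  if (!glXGetCurrentContext())
    std::cerr << "No GLX context is current - Ogre would throw here" << std::endl;
  else
    std::cerr << "GLX context is current: " << glXGetCurrentContext() << std::endl;
  return 0;
}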
Here is a gdb trace of just the GUI, printing every call related to the GL context:
And the same kind of trace with VGL:
If I found the right Qt source, it should be this one handling the context: https://code.woboq.org/qt5/qtbase/src/plugins/platforms/xcb/gl_integrations/xcb_glx/qglxintegration.cpp.html. It seems it is still full of glX* calls.
I looked into this a little and was able to reproduce this issue using the container suggested in gazebosim/gz-sim#1746 (comment). For me, I noticed that the GL context was lost (glXGetCurrentContext() returns null) once the ogre2 render engine is loaded. Similarly, if I build ogre-next from source without EGL support, then Gazebo works fine.
Mmm, loading the GL3+ plugin probes the available EGL devices, including calls to eglMakeCurrent. Perhaps if Gazebo manually saves the GL context before loading the GL3+ plugin, and restores the context after loading it (and before creating a window), this problem can be fixed? I don't know why the problem is specific to VirtualGL though.
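Roughly, that suggestion would look like the sketch below (plain GLX calls only, not the real Ogre2RenderEngine code; loadGl3PlusPlugin() is a hypothetical stand-in for wherever the plugin load and EGL device probing actually happen):

#include <GL/glx.h>

// Hypothetical stand-in for the code path that loads the GL3+ plugin and
// probes the EGL devices (which may call eglMakeCurrent internally).
void loadGl3PlusPlugin()
{
  // plugin load + EGL probing would happen here
}

void loadGl3PlusPluginWithContextGuard()
{
  // Remember whatever GLX state is current on this thread (set up by Qt).
  Display    *display  = glXGetCurrentDisplay();
  GLXDrawable drawable = glXGetCurrentDrawable();
  GLXContext  context  = glXGetCurrentContext();

  loadGl3PlusPlugin();

  // Rebind it, so that the later window creation with "currentGLContext" = "1"
  // still finds a current GL context.
  if (context)
    glXMakeCurrent(display, drawable, context);
}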
From https://registry.khronos.org/EGL/sdk/docs/man/html/eglMakeCurrent.xhtml: if the calling thread already has a current rendering context of the same client API type, that context is flushed and marked as no longer current when a new one is bound.

So it makes sense that the (emulated) GLX context is no longer current after the EGL contexts are bound during probing.
I think it now makes sense to me. When running without VirtualGL, the GLX context is created, and then several EGL contexts are probed, each with its own display. When running with VirtualGL, the GLX context is emulated with EGL, so probing the EGL devices also clobbers the (emulated) GLX context.

I don't think there's anything that could be fixed on the VirtualGL side. It just can't know whether you actually want to change the current EGL context, or whether it should remain as it was for the GLX emulation. Alternatively, it would have to report the EGL display used for emulation as unavailable for direct EGL operation. I also asked about this on the VirtualGL repo.

I've written an MWE where you can easily test this behavior. The two lines with the // FIX comment save the current GLX state and restore it after the EGL context is made current; with them the GLX context survives, without them it is lost under VirtualGL. Compile with g++, linking against X11, GL and EGL.

#include <GL/glx.h>
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <iostream>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <vector>
int main(int argc, char** argv)
{
  auto dummyDisplay = XOpenDisplay(0);
  Display *x11Display = static_cast<Display*>(dummyDisplay);
  int screenId = DefaultScreen(x11Display);
  int attributeList[] = {
    GLX_RENDER_TYPE, GLX_RGBA_BIT,
    GLX_DOUBLEBUFFER, True,
    GLX_DEPTH_SIZE, 16,
    GLX_STENCIL_SIZE, 8,
    None
  };
  int nelements = 0;
  auto dummyFBConfigs = glXChooseFBConfig(x11Display, screenId, attributeList, &nelements);
  auto dummyWindowId = XCreateSimpleWindow(x11Display, RootWindow(dummyDisplay, screenId), 0, 0, 1, 1, 0, 0, 0);
  PFNGLXCREATECONTEXTATTRIBSARBPROC glXCreateContextAttribsARB = 0;
  glXCreateContextAttribsARB = (PFNGLXCREATECONTEXTATTRIBSARBPROC)glXGetProcAddress((const GLubyte *)"glXCreateContextAttribsARB");
  int contextAttribs[] = {
    GLX_CONTEXT_MAJOR_VERSION_ARB, 3,
    GLX_CONTEXT_MINOR_VERSION_ARB, 3,
    None
  };
  auto dummyContext = glXCreateContextAttribsARB(x11Display, dummyFBConfigs[0], nullptr, 1, contextAttribs);

  // Create the GLX context and set it as current (this is what Qt normally does for the GUI)
  GLXContext x11Context = static_cast<GLXContext>(dummyContext);
  glXMakeCurrent(x11Display, dummyWindowId, x11Context);
  std::cerr << glXGetCurrentContext() << " " << glXGetCurrentDrawable() << " " << glXGetCurrentDisplay() << std::endl;

  // Probe the available EGL devices and create an EGL context on each,
  // mimicking what happens when the GL3+ plugin is loaded.
  typedef EGLBoolean ( *EGLQueryDevicesType )( EGLint, EGLDeviceEXT *, EGLint * );
  auto eglQueryDevices = (EGLQueryDevicesType)eglGetProcAddress( "eglQueryDevicesEXT" );
  auto eglQueryDeviceStringEXT = (PFNEGLQUERYDEVICESTRINGEXTPROC)eglGetProcAddress( "eglQueryDeviceStringEXT" );
  EGLint numDevices = 0;
  eglQueryDevices( 0, 0, &numDevices );
  std::vector<EGLDeviceEXT> mDevices;
  if( numDevices > 0 )
  {
    mDevices.resize( static_cast<size_t>( numDevices ) );
    eglQueryDevices( numDevices, mDevices.data(), &numDevices );
  }
  for( int i = 0; i < numDevices; ++i )
  {
    EGLDeviceEXT device = mDevices[size_t( i )];
    auto name = std::string(eglQueryDeviceStringEXT( device, EGL_EXTENSIONS ));
    const char *gpuCard = eglQueryDeviceStringEXT( device, EGL_DRM_DEVICE_FILE_EXT );
    if( gpuCard ) name += std::string(" ") + gpuCard;
    std::cerr << i << " " << name << std::endl;

    EGLAttrib attribs[] = { EGL_NONE };
    auto eglDisplay = eglGetPlatformDisplay( EGL_PLATFORM_DEVICE_EXT, mDevices[i], attribs );
    EGLint major = 0, minor = 0;
    eglInitialize( eglDisplay, &major, &minor );
    const EGLint configAttribs[] = {
      EGL_SURFACE_TYPE, EGL_PBUFFER_BIT, EGL_BLUE_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_RED_SIZE, 8,
      EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT, EGL_NONE
    };
    EGLint numConfigs;
    EGLConfig eglCfg;
    eglChooseConfig( eglDisplay, configAttribs, &eglCfg, 1, &numConfigs );
    const EGLint pbufferAttribs[] = {
      EGL_WIDTH, 1, EGL_HEIGHT, 1, EGL_NONE,
    };
    auto eglSurf = eglCreatePbufferSurface( eglDisplay, eglCfg, pbufferAttribs );
    eglBindAPI( EGL_OPENGL_API );
    EGLint contextAttrs[] = {
      EGL_CONTEXT_MAJOR_VERSION, 4,
      EGL_CONTEXT_MINOR_VERSION, 5,
      EGL_CONTEXT_OPENGL_PROFILE_MASK, EGL_CONTEXT_OPENGL_CORE_PROFILE_BIT_KHR,
      EGL_NONE
    };

    // Create the EGL context and make it current
    auto eglCtx = eglCreateContext( eglDisplay, eglCfg, 0, contextAttrs );
    std::cerr << glXGetCurrentContext() << " " << glXGetCurrentDrawable() << " " << glXGetCurrentDisplay() << std::endl;
    auto ctx = glXGetCurrentContext(); auto dpy = glXGetCurrentDisplay(); auto drw = glXGetCurrentDrawable(); // FIX: save the current GLX state
    eglMakeCurrent( eglDisplay, eglSurf, eglSurf, eglCtx );
    glXMakeCurrent(dpy, drw, ctx); // FIX: restore the GLX state clobbered by eglMakeCurrent
    std::cerr << glXGetCurrentContext() << " " << glXGetCurrentDrawable() << " " << glXGetCurrentDisplay() << std::endl;
  }
}
Cool, thanks for digging into this! Do you mind submitting a PR with your fix?
I found this issue while trying to run Gazebo Garden under VirtualGL inside a container. I don't mean to derail the conversation here, but I have an observation: Gazebo always crashes as described when VirtualGL is run with the EGL backend, but I have gotten it to run to a limited extent with the GLX backend. Some environments work OK, while others do not.

(I don't know enough about VirtualGL or the graphics stack to know the difference between the backends or which one should be preferred. I'd be happy if either of them worked fully.)
I don't know why the different worlds from the lrauv project result in different behavior. Just looking at the ogre log in gazebosim/gz-sim#1746, it has the same errors about the missing GL context as the one reported here, so it could be due to the same issue.
Here is a DEB I've built from the main branch of VirtualGL with the patch from VirtualGL/virtualgl#220 (comment): virtualgl_3.0.91_amd64-fixed.zip. With this patched version of VirtualGL, the crashes no longer occur for me.

However, following the discussion in the VirtualGL issue, I still think it'd make sense to also implement the fix on the Gazebo side - it seems that the fact that the current procedure worked at all is more likely an accident than design. Mixing GLX and EGL calls in a single program is apparently something nobody does, and we haven't found any references saying how the interaction between the two should work. I'll prepare the Gazebo PR.
Fix in #794.
I cherry-picked this change to the version of gz-rendering I'm using. Indeed, I no longer get the error mentioning currentGLContext, but Gazebo still crashes for me.

Crash log
~/.gz/rendering/ogre2.log
Actually, the PR to gz-rendering itself should be sufficient to fix the rendering issues. The patched VirtualGL binary was another approach to fix the issue "on the other end". Could you try with the non-patched VirtualGL? You can also try specifying the VirtualGL device explicitly (the -d option used in the reproduction steps above).

I'll close this issue as the original one has been resolved. Let's continue this discussion in gazebosim/gz-sim#1746 or a new issue.
The VirtualGL-side fix has also been implemented: VirtualGL/virtualgl#220. So now this bug should be prevented from both sides :) I did a thorough compatibility test and all combinations I tried worked.
Environment
~/.ignition/rendering
Description
Steps to reproduce
sudo chmod g+rw /dev/dri/card0
vglrun +v -d /dev/dri/card0 ign gazebo -v4
Output
VirtualGL is a nice way to run Gazebo in many constrained environments, e.g. where a fully-fledged X server cannot be run, or where the X server runs on the wrong GPU, etc. We've been using it all the time in the SubT Challenge with Dome without problems. The above example works correctly in Dome and Edifice, and fails in Fortress. I know Fortress came with headless rendering support, but VirtualGL is more powerful - it can also redirect the GUI to use EGL (when combined with Xvfb).
The EGL backend of VirtualGL works by intercepting GLX calls from the application and substituting them with the relevant EGL calls. This translation/faking layer is not 100% feature-complete. However, with the error being so uninformative, I have no idea what could be wrong (whether it's on the Gazebo side or the VirtualGL side).
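For illustration, a conceptual sketch of what such an interposition layer does (this is not VirtualGL's actual code; EmulatedContext and the bookkeeping map are made up). The point is that the application's "GLX context" is really an EGL context underneath, which is why a later direct eglMakeCurrent() call from the application can silently unbind it:

#include <GL/glx.h>
#include <EGL/egl.h>
#include <unordered_map>

// Bookkeeping for GLX contexts that are actually backed by EGL objects.
struct EmulatedContext
{
  EGLDisplay display;
  EGLSurface surface;
  EGLContext context;
};

// Filled by the (not shown) interposed context-creation functions.
static std::unordered_map<GLXContext, EmulatedContext> g_contexts;

// Built as a shared library and injected via LD_PRELOAD, this definition
// shadows libGL's glXMakeCurrent for the application.
extern "C" Bool glXMakeCurrent(Display *, GLXDrawable, GLXContext ctx)
{
  auto it = g_contexts.find(ctx);
  if (it == g_contexts.end())
    return False;
  const EmulatedContext &e = it->second;
  // Bind the EGL context that backs the application's "GLX" context.
  return eglMakeCurrent(e.display, e.surface, e.surface, e.context) == EGL_TRUE
           ? True : False;
}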
Here is a trace of the VirtualGL function interposer, but I can't make anything useful out of it: https://pastebin.com/nsCJ8SSE