Intel gpu-tool tests #118
Comments
So to answer my own question, reverting 6a3c029 did bite me. Without it gem_mmap_gtt panics with "Assertion fs->first_pindex <= pager_last failed at /storage/freebsd-graphics/freebsd-base-graphics/sys/vm/vm_fault.c:387". This is going to be fun ... :)
Oh, by all means please report any issues you find with the GPU tests. I certainly haven't run them all.

6a3c029 is certainly needed; without it it's quite easy to panic the system. But it's not completely correct. Consider that we may mmap a GEM object twice. Without 6a3c029 we'd acquire two references on the cdev handle, but internally, both mappings are of the same VM object. So when both mappings are destroyed with munmap(), the VM object will only be destroyed upon the second unmap, and that would cause us to release only one reference on the cdev handle. Later, if we mmap the same GEM object again, we may end up using a stale cdev handle.

So, the code still isn't right. Consider that the cdev handle contains info about the mapping, such as its size. But obviously this might differ between multiple mappings which share a handle, so a comparison by the handle pointer (the GEM object address) isn't sufficient. However, 6a3c029 fixes panics seen when using Xorg, so it provides some stability at least.

Your first two patches seem ok to me. Do you have them in a branch somewhere so that I can just push them tonight? I'll take a closer look at the third one.
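To make the double-mmap scenario from the comment above concrete, here is a toy user-space model of the mismatch: each mmap takes a handle reference, but the shared VM object is destroyed only once and so drops only one. All names (`toy_handle`, `toy_vm_object`, and so on) are hypothetical and are not the LinuxKPI or FreeBSD VM API; this is a sketch of the accounting only, under the assumption that references are taken per mmap call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model; none of these names are the real LinuxKPI API. */
struct toy_handle {
    int refs;                   /* references held on the cdev handle */
};

struct toy_vm_object {
    struct toy_handle *handle;  /* handle shared by all mappings */
    int mappings;               /* live userland mappings of this object */
};

/* mmap(): every call takes a handle reference, but all calls share
 * the same VM object. */
static void toy_mmap(struct toy_vm_object *obj)
{
    obj->handle->refs++;
    obj->mappings++;
}

/* munmap(): the object destructor runs only when the last mapping
 * goes away, and it drops a single handle reference. */
static void toy_munmap(struct toy_vm_object *obj)
{
    assert(obj->mappings > 0);
    if (--obj->mappings == 0)
        obj->handle->refs--;
}

/* Returns the handle references left over after n mmap/munmap pairs. */
static int toy_leaked_refs(int n)
{
    struct toy_handle h = { 0 };
    struct toy_vm_object obj = { &h, 0 };

    for (int i = 0; i < n; i++)
        toy_mmap(&obj);
    for (int i = 0; i < n; i++)
        toy_munmap(&obj);
    return h.refs;
}
```

In this model a single mapping balances out, while every additional mapping of the same object leaves one reference behind, which is the stale-handle situation described above.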
I should point out that the issue I described above isn't particularly easy to solve IMHO. The problem is that FreeBSD's VM hides the details of userland mappings from the driver layer, while Linux does not try to be abstract and passes everything right through. One might consider this a defect in FreeBSD, but the lack of abstractions leads to ugly complexity in the GPU drivers.

For instance, i915 allows one to create GEM objects backed by malloc'ed memory; in the kernel, the userland address of that memory is needed in order to actually look up the backing pages. For this reason, i915 has to register so-called MMU notifiers to handle the possibility that the user process has munmap()ed the memory backing a GEM object. In FreeBSD, this wouldn't be necessary - the GEM object could be defined using a reference to the anonymous VM object backing the malloc'ed memory, and addressed using pindexes. Then userland can do whatever it wants with its mapping without disrupting the kernel.

Anyway. These divergences are quite difficult and in some cases impossible to solve in the LinuxKPI layer. Surgical changes to the drivers themselves are needed in some cases.
Hi, I believe I have v1 of a working solution (b7f5b8e). Both tests pass and I am currently using it on my desktop. So basically I have restored the ref-counting of the linux_cdev_handles, but the refcount is now held by the pager. This solves the multiple mmaps case and the deferred pager destruction case that I hit initially. I have also made two other changes (should probably split them into different commits)
TODO: Redo the refcount with atomic ops, to eliminate some of the exclusive locks I have added. Looking forward to any comments when you have the time.
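As a rough illustration of the TODO above, a reference count can be kept with atomic operations so that acquire and release need no exclusive lock. This is a minimal user-space sketch using C11 `<stdatomic.h>`; the `toy_cdev_handle` type and its functions are hypothetical and are not taken from the patch. In the kernel one would more likely reach for FreeBSD's refcount(9) primitives, and looking a handle up in a shared list would still need its own protection.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a handle whose references are owned by pagers. */
struct toy_cdev_handle {
    atomic_uint refs;
};

/* The creating pager owns the first reference. */
static void toy_handle_init(struct toy_cdev_handle *h)
{
    atomic_init(&h->refs, 1);
}

/* Take an extra reference; the caller must already hold one. */
static void toy_handle_acquire(struct toy_cdev_handle *h)
{
    unsigned int old =
        atomic_fetch_add_explicit(&h->refs, 1, memory_order_relaxed);

    assert(old > 0);
    (void)old;  /* silence unused warning when NDEBUG is defined */
}

/* Drop a reference. Returns true when the caller dropped the last one
 * and is therefore responsible for freeing the handle. */
static bool toy_handle_release(struct toy_cdev_handle *h)
{
    unsigned int old =
        atomic_fetch_sub_explicit(&h->refs, 1, memory_order_acq_rel);

    assert(old > 0);
    return old == 1;
}
```

The release path uses acquire-release ordering so that whoever frees the handle observes all earlier writes made by other holders; the acquire path can be relaxed because the caller already owns a reference.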
And by 'both tests pass' I mean gem_mmap_gtt fails the same way as before the patch :)
ACK. I haven't looked at this yet, but will soon.
More spam :) I have rebased the patches on the current master.

a846ccf don't call vm_ops->open

The next set enables most of the kms* tests, which would hang before them - it's a debugfs issue: The last one is really ugly, and I am a bit ashamed to actually propose it, but I can't come up with a better solution for the moment. The problem is that some of the debugfs functions do not use seq_file but access the user pointer directly. Any ideas are welcome.

The last set for the moment was needed for kms_mmap_write_crc. With this the test passes, but I suspect the dmabuf mmap code may need some massaging (cache mode, for example).

35ec5e1 add dummy file struct to shmem_file

Best,
Updated the commit ids in the previous comment. I introduced a bug with the first implementation of the debugfs ugliness. The private_data of the debugfs file can be a seq_file or a simple_attr. I will need to document this somewhere, as it is not obvious. Now fixed, and the ugliness of the patch is slightly decreased :).
The debugfs patch is still wrong, and there is at least one debugfs file in the i915 driver (i915_error_state) that does not like us overwriting filp->private_data with a pointer to an sbuf. I have implemented a test rework of the debugfs/seqfile/simple_attr code to use the user pointer directly without going through an sbuf at all. Everything seems to work, but I don't know if there is some requirement I am missing that warranted the use of sbufs in the first place. The one thing I could come up with is that we lose the ability to call debugfs fileops from the kernel, but this doesn't look like a needed feature.
Hi all, @markjdb
I have been playing with the intel-gpu tests, trying to figure out some of the simpler issues. @markjdb I really hope you don't mind me piggybacking on your work, I am just trying to help and liked the challenge. If you have already stumbled on and fixed the things I will post, all the better, and please ignore my ramblings :).
So 3 proposed patches for the moment.
First one is trivial - fix the shmem_file_setup prototype to avoid going into negative values and crashing the kernel - triggered by gem_mmap if I am not mistaken.
Second - unfortunately I don't remember the test that triggered this, and we might need a more general solution, but the good news is that a fix as simple as this actually works :)
The third one I have mentioned in #117, just adding it here for completeness.
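The first patch itself is not shown in this thread, so the following is only a guess at the class of bug it addresses: a byte size that is passed through a parameter too narrow to hold it, which on typical platforms comes out negative. The sketch below assumes a 4096-byte page and uses hypothetical helper names; it is not the actual shmem_file_setup change.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096  /* assumed page size for the illustration */

/* Would this byte count survive being passed through an `int` parameter?
 * Anything above INT_MAX would not, and typically shows up as negative. */
static bool toy_fits_in_int(uint64_t size)
{
    return size <= (uint64_t)INT_MAX;
}

/* Page count computed with a 64-bit size, rounding up to whole pages. */
static int64_t toy_pages_for_size(int64_t size)
{
    assert(size >= 0);
    return (size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}
```

For example, a 3 GiB object does not fit in an `int`, while the 64-bit computation yields the expected 786432 pages.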
Currently I am tracing a panic (handle not found in linux_cdev_handle_find) reliably triggered by gem_fence_upload.
With 6a3c029 reverted the panic does not happen. It is really weird, but I believe I have some insight into what is causing it - the test calls multiple mmaps/munmaps in parallel. Each thread has its own gem object on which it operates. However, a set of simple debug printfs shows that linux_cdev_handle_remove gets called from a thread different than the one that created the handle, while the original thread is trying to use it, leading to a race in the creation/deletion. I am not 100% sure, but I am inclined to attribute this behavior to the deferred deallocation of vm_maps in vm_map_entry_delete.
So with the patch reverted, the test passes. I am just wondering if this is not hiding an unpleasant surprise further down the road?
Any comments and suggestions are appreciated,
Best,
Yanko