Skip to content

This library integrate multiple culling algorithm into One system.

License

Notifications You must be signed in to change notification settings

Ralf12358/EveryCulling

Repository files navigation

EveryCulling

This library integrate multiple culling system into One library.

This System contain Multithread ViewFrustumCulling, Masked SW Occlusion Culling, Distance Culling
Most of Systems in this library is actually used in the commercial game engines.

This project tries to integrate them into one system and make them easy to use.

Core Feature

This library is targeting Maximizing SIMD, Cache hit, Multi Threading.

  1. SIMD : Data is stored for using SIMD Intrinsics ( Object's Datas has SoA layout, check EntityBlock.h ) ( Require AVX2 ( _mm256 ) )
  2. Cache Hit : SoA!! ( Structure of Arrays, check EntityBlock.h )
  3. Multi Threading : Data of entities is separately stored in entity block, Then Threads works on a entity block. These structure prevent data race and cache coherency ( false sharing, check EntityBlock.h, VertexData.h ), Locking is not required.

Fully implemented features

View Frustum Culling from Frostbite Engine of EA Dice ( 100% )

Code directory : https://github.com/SungJJinKang/EveryCulling/tree/main/CullingModule/ViewFrustumCulling

Video
Slide Resource
GDC Talk Video
한국어 블로그 글

MultiThreaded View Frustum Culling is 8ms faster than SingleThreaded View Frustum Culling ( in Stress Test )

Feature 1 : Transform Data of Entities is stored linearlly to maximize utilizing SIMD. ( Data oriented Design )

For Maximizing Cache Hitting, Data is allocated adjacently.

Ordinary Way

float objectData[] = { FrustumPlane1-NormalX, FrustumPlane1-NormalY, FrustumPlane1-NormalZ, FrustumPlane1-Distance, FrustumPlane2-NormalX, FrustumPlane2-NormalY, FrustumPlane2-NormalZ, FrustumPlane2-Distance, FrustumPlane3-NormalX, FrustumPlane3-NormalY, FrustumPlane3-NormalZ, FrustumPlane3-Distance, FrustumPlane4-NormalX, FrustumPlane4-NormalY, FrustumPlane4-NormalZ, FrustumPlane4-Distance }

Data oriented Design

float objectData[] = { FrustumPlane1-NormalX, FrustumPlane2-NormalX, FrustumPlane3-NormalX, FrustumPlane4-NormalX, FrustumPlane1-NormalY, FrustumPlane2-NormalY, FrustumPlane3-NormalY, FrustumPlane4-NormalY, FrustumPlane1-NormalZ, FrustumPlane2-NormalZ, FrustumPlane3-NormalZ, FrustumPlane4-NormalZ, FrustumPlane1-Distance, FrustumPlane2-Distance, FrustumPlane3-Distance, FrustumPlane4-Distance }

And Frustun checking use SIMD instruction.

Combine upper two feature!.

Plane1 NormalX    Plane2 NormalX    Plane3 NormalX    Plane4 NormalX     
Plane1 NormalY    Plane2 NormalY    Plane3 NormalY    Plane4 NormalY     
Plane1 NormalZ    Plane2 NormalZ    Plane3 NormalZ    Plane4 NormalZ     
Plane1 Distance   Plane2 Distance   Plane3 Distance   Plane4 Distance     

Ojbect PointX     Ojbect PointX     Ojbect PointX      Ojbect PointX
Ojbect PointY     Ojbect PointY     Ojbect PointY      Ojbect PointY
Ojbect PointZ     Ojbect PointZ     Ojbect PointZ      Ojbect PointZ
      1                 1                 1                  1
      
      
Plane5 NormalX    Plane6 NormalX    Plane5 NormalX    Plane6 NormalX     
Plane5 NormalY    Plane6 NormalY    Plane5 NormalY    Plane6 NormalY     
Plane5 NormalZ    Plane6 NormalZ    Plane5 NormalZ    Plane6 NormalZ     
Plane5 Distance   Plane6 Distance   Plane5 Distance   Plane6 Distance     

Ojbect PointX     Ojbect PointX     Ojbect PointX      Ojbect PointX
Ojbect PointY     Ojbect PointY     Ojbect PointY      Ojbect PointY
Ojbect PointZ     Ojbect PointZ     Ojbect PointZ      Ojbect PointZ
      1                 1                 1                  1

                            |
                            | Multiply each row
                            V
                                    
Plane1 NormalX * Ojbect PointX    Plane2 NormalX * Ojbect PointX    Plane3 NormalX * Ojbect PointX    Plane4 NormalX * Ojbect PointX     
Plane1 NormalY * Ojbect PointY    Plane2 NormalY * Ojbect PointY    Plane3 NormalY * Ojbect PointY    Plane4 NormalY * Ojbect PointY
Plane1 NormalZ * Ojbect PointZ    Plane2 NormalZ * Ojbect PointZ    Plane3 NormalZ * Ojbect PointZ    Plane4 NormalZ * Ojbect PointZ
     Plane1 Distance * 1               Plane2 Distance * 1               Plane3 Distance * 1               Plane4 Distance * 1   
     
Plane5 NormalX * Ojbect PointX    Plane6 NormalX * Ojbect PointX    Plane5 NormalX * Ojbect PointX    Plane6 NormalX * Ojbect PointX     
Plane5 NormalY * Ojbect PointY    Plane6 NormalY * Ojbect PointY    Plane5 NormalY * Ojbect PointY    Plane6 NormalY * Ojbect PointY
Plane5 NormalZ * Ojbect PointZ    Plane6 NormalZ * Ojbect PointZ    Plane5 NormalZ * Ojbect PointZ    Plane6 NormalZ * Ojbect PointZ
     Plane1 Distance * 1               Plane2 Distance * 1               Plane3 Distance * 1               Plane4 Distance * 1   

                            |
                            | Sum all row
                            V
                        
Plane1 Dot Object      Plane2 Dot Object      Plane3 Dot Object      Plane4 Dot Object
Plane5 Dot Object      Plane6 Dot Object      Plane5 Dot Object      Plane6 Dot Object

                            |
                            |   If Dot is larget than 0, Set 1
                            V
   1      0      1      1                                                1      1      1      1                                                                  
   1      1      1      1                                                1      1      1      1
                                                                         
             |                                                 or                  |
             |   To be rendered, All value should be 1                             |   To be rendered, All value should be 1
             V                                                                     V
                                                                         
         Culled!!!!!                                                            Rendered

Feature 2 : Entities is divided to Entity Blocks.

Entity Block 1 : Entity 1 ~ Entity 50 
Entity Block 2 : Entity 51 ~ Entity 100 
Entity Block 3 : Entity 101 ~ Entity 150
      
               |
               |
               V
             
Thread 1 : Check Frustum of Entity Block 1, 4, 7
Thread 2 : Check Frustum of Entity Block 2, 5, 8

Because Each Entity Blocks is seperated, Threads can check whether object is culled without data race.

To minimize waiting time(wait calculating cull finish) , Passing cull job to thread should be placed at foremost of rendering loop.
In My experiment, Waiting time is near to zero.

Masked SW ( CPU ) Occlusion Culling From Intel ( 100% )

Code directory : https://github.com/SungJJinKang/EveryCulling/tree/main/CullingModule/MaskedSWOcclusionCulling

Stage 1 : Solve Mesh Role Stage ( Decide occluder based on object's screen space bouding sphere's size )
Stage 2 : Bin Occluder Triangle Stage ( Dispatch(Bin) triangles to screen tiles based on triangle's screen space vertex data for following rasterizer stage )
Stage 3 : Multithread Rasterize Occluder Triangles ( Threads do job rasterizing each tile's binned triangles, calculate max depth value of tile )
Stage 4 : Multithread Query depth buffer ( Compare aabb of occludee's min depth value with tile depth buffer. check 52p https://www.ea.com/frostbite/news/culling-the-battlefield-data-oriented-design-in-practice )

Reference paper : https://software.intel.com/content/dam/develop/external/us/en/documents/masked-software-occlusion-culling.pdf
개발 일지 : https://sungjjinkang.github.io/computerscience/computergraphics/2021/12/31/masked_sw_occlusion_culling.html
Video : https://youtu.be/tMgokVljvAY, https://youtu.be/1IKTXsSLJ5g
동작 원리 한국어 설명 : "Masked Software Occlusion Culling"는 어떻게 작동하는가?
references : https://software.intel.com/content/dam/develop/external/us/en/documents/masked-software-occlusion-culling.pdf, https://www.slideshare.net/IntelSoftware/masked-software-occlusion-culling, https://www.slideshare.net/IntelSoftware/masked-occlusion-culling

Please read this :
SW Occlusion Culling doesn't make always good performance. It can makes rendering slower than not using it. It depends on scene.
If there is many occludees, it's good to use it. If not, It's better not to use it because Cost of rasterizing occluder can be more expensive than cost of rendering occludees. As you know, CPU isn't good at this computation. So it should be used carefully
I'm trying make rasterizing occluder faster.

Distance Culling From Unreal Engine ( 100% )

Code directory : https://github.com/SungJJinKang/EveryCulling/tree/main/CullingModule/DistanceCulling

This feature is referenced from Unreal Engine.
You can see How this feature works from here
Objects's visibility is decided based on distance between object and camera. ( Distance is computed using SIMD for performance )
With this feature, You can make tiny object not to be rendered when it is far from camera.
If Object's DesiredMaxDrawDistance is larger than distance between object and camera, The object is culled.

References

About

This library integrate multiple culling algorithm into One system.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published