[QUESTION]: Need help with race condition #1317

elekta-cp-rnca · 2025-01-28T02:09:37Z

Question

Soliciting help.

I'm trying to sort a 1D array with an odd-even (brick) sort. A race condition shows up for arrays of roughly 100 elements or more, and it gets worse as the array grows (500, 1000, etc.). I assumed the problem would only appear once the array is larger than the threads per block (768 on this GPU), but I'm seeing it for smaller sizes too. I also noticed that adding Interop.WriteLine calls can improve or worsen the results. In some cases array values are lost and others are duplicated.

Is there a way to prevent this? Something like __syncthreads(), atomic flags (bools), etc.?

I'm trying to keep a minimal memory footprint on the device: only one array.

Program.txt

Environment

ILGPU version: 1.5.1
.NET 4.8
Windows 10 Enterprise LTSC (v. 10.0.17763 Build 17763)
NVIDIA RTX 6000 Ada

Additional context

namespace GPU_1D_Sort
{
internal class Program
{
    static void Main(string[] args)
    {

        //ILGPU.Context ctx = ILGPU.Context.Create(builder => builder.Cuda().Debug());

        //Console.WriteLine(Debugger.IsAttached.ToString());

        //Device gpuDevice = ctx.GetCudaDevice(0);
        //Accelerator gpu_acc = ctx.GetPreferredDevice(preferCPU: false).CreateAccelerator(ctx);

        //gpuDevice.PrintInformation();

        var sw = Stopwatch.StartNew();

        var ndim = 100;

        var e_matrix = new double[ndim];
        var f_matrix = new double[ndim];

        for (int i = 0; i < e_matrix.Length; i++)
            e_matrix[i] = Convert.ToDouble(i);                  

        var rand = new Random();
        for (int i=0; i<e_matrix.Length; i++)
        {
            int j = rand.Next(e_matrix.Length);
            double temp = e_matrix[i];
            e_matrix[i] = e_matrix[j];
            e_matrix[j] = temp;
        }
        for (int i = 0; i < f_matrix.Length; i++)
        {
            f_matrix[i] = e_matrix[i];
        }

        if (e_matrix.GroupBy(x => x).Any(g => g.Count() >1))
        {
            Console.WriteLine("Contains Duplicates!");
        }
        else
        {
            Console.WriteLine("Does not contain Duplicates!");
        }

        sw.Restart();

        var ctx_cu_1d = Context.Create(builder => builder.Cuda().EnableAlgorithms().Debug());

        var device_1d = ctx_cu_1d.GetCudaDevices()[0];

        var gpu_cu_1d_acc = device_1d.CreateCudaAccelerator(ctx_cu_1d);

        MemoryBuffer1D<double, Stride1D.Dense> gpu_e_matrix = gpu_cu_1d_acc.Allocate1D<double>(e_matrix.GetExtent());

        gpu_e_matrix.CopyFromCPU(e_matrix);

        Action<Index1D, ArrayView1D<double, Stride1D.Dense>> loadsort1dkernel =
            gpu_cu_1d_acc.LoadAutoGroupedStreamKernel<Index1D, ArrayView1D<double, Stride1D.Dense>>(Sort_1D_Kernel);

        loadsort1dkernel(gpu_e_matrix.Extent.ToIntIndex(), gpu_e_matrix);
        //loadsort1dkernel(1, gpu_e_matrix, gpu_isSort);

        //Console.WriteLine("I am here!");
        //Console.ReadLine();
        //Environment.Exit(0);

        gpu_cu_1d_acc.Synchronize();

        gpu_e_matrix.CopyToCPU(e_matrix);


        for (int i = 0; i < e_matrix.Length; i++)
        //for (int i = 0; i < 50; i++)
        {
            Console.WriteLine(String.Format("{0},{1},{2}", e_matrix[i], i, f_matrix[i]));
        }
        Console.WriteLine("End 1D Sort on GPU");

        gpu_e_matrix.Dispose();

        gpu_cu_1d_acc.Dispose();
        ctx_cu_1d.Dispose();

        sw.Stop();
        Console.WriteLine($"- Sort-1D Array GPU Atomic.Exchange Calculation Time: {sw.Elapsed}");



        Console.ReadKey();

        Environment.Exit(0);

    }

    public static void Sort_1D_Kernel(Index1D index, ArrayView1D<double, Stride1D.Dense> gpu_e_matrix)
    {
        var x = index.X;

        int col_num = gpu_e_matrix.IntExtent.X;

        bool sw_OE = false;

        //Interop.WriteLine("ThreadID: {0}", Grid.GlobalIndex.X);
        //Interop.WriteLine("Threads/block: {0}", ILGPU.Group.Dimension.X);
        //Interop.WriteLine("BlockID: {0}", Grid.Index.X);

        //while (gpu_e_matrix[x] > gpu_e_matrix[x + 1])
        for (int i = 0; i < col_num*2; i++)                 // array[N], N iterations guarantees solution worst case (supposedly!)
        {

            if (!sw_OE)             //Odd-phase
            {
                if ((x < col_num - 1) && (x % 2 != 0))
                {
                    //if(x == 1)
                    //Interop.WriteLine("Index OP is: {0}, {1}, {2}, {3}", x, x+1, gpu_e_matrix[x], gpu_e_matrix[x+1]);

                    if (gpu_e_matrix[x] > gpu_e_matrix[x + 1])
                    {
                        //if (x == 1)
                        //{
                            //Interop.WriteLine("Index OP Before is: {0}, {1}, {2}", 1, gpu_e_matrix[1], gpu_e_matrix[2]);
                        //}
                        
                        gpu_e_matrix[x + 1] = Atomic.Exchange(ref gpu_e_matrix[x], gpu_e_matrix[x + 1]);

                        //if (x == 1)
                        //{
                            //Interop.WriteLine("Index OP After is: {0}, {1}, {2}", 1, gpu_e_matrix[1], gpu_e_matrix[2]);
                        //}

                    }
                }
                sw_OE = true;

            }
            //Interop.WriteLine("gpu_isSort post odd is: {0}", gpu_isSort[0]);


            if (sw_OE)              //Even-phase
            {
                
                if ((x < col_num - 1) && (x % 2 == 0))
                {

                    //Interop.WriteLine("Index EP is: {0}", x);
                    if (gpu_e_matrix[x] > gpu_e_matrix[x + 1])
                    {
                        //if (x == 0)
                        //{
                            //Interop.WriteLine("Index EP Before is: {0}, {1}, {2}", 0, gpu_e_matrix[0], gpu_e_matrix[1]);
                        //}

                        gpu_e_matrix[x + 1] = Atomic.Exchange(ref gpu_e_matrix[x], gpu_e_matrix[x + 1]);

                        //if (x == 0)
                        //{
                            //Interop.WriteLine("Index EP After is: {0}, {1}, {2}", 0, gpu_e_matrix[0], gpu_e_matrix[1]);
                        //}

                    }
                }
                sw_OE = false;
            }

        }
    }


}

}

elekta-cp-rnca · 2025-01-29T16:44:56Z

Using Group.Barrier for thread synchronization solved the problem for a single block (768 threads, so arrays up to 768 elements long). Above 768 elements we are into two or more blocks.

Is there a way to do Cooperative_Groups in ILGPU?
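
For reference, the single-block fix described above could look roughly like the sketch below. The race in the original kernel comes from threads moving on to the next phase while their neighbours are still in the previous one; a barrier after every phase removes that. This is a minimal sketch, not the code from this thread: the class and method names are hypothetical, the group size cap of 256 is an arbitrary assumption, and the ILGPU calls (LoadStreamKernel, KernelConfig, Group.IdxX, Group.DimX, Group.Barrier) are written from the ILGPU 1.x API as I understand it and should be checked against 1.5.1. It only works when launched with exactly one group.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;

public static class SingleGroupBrickSort
{
    // Explicitly grouped kernel. It must be launched with ONE group only,
    // because Group.Barrier() synchronizes threads of the same group.
    static void Kernel(ArrayView<double> data, int length)
    {
        int tid = Group.IdxX;     // thread index inside the group
        int stride = Group.DimX;  // number of threads in the group

        // n phases are enough for odd-even transposition sort on n elements.
        for (int phase = 0; phase < length; phase++)
        {
            // Even phases compare (0,1),(2,3),...; odd phases compare (1,2),(3,4),...
            // Each thread strides over several pairs, so the array may be
            // longer than the group size.
            for (int left = 2 * tid + (phase & 1); left + 1 < length; left += 2 * stride)
            {
                double a = data[left];
                double b = data[left + 1];
                if (a > b)
                {
                    // Pairs inside one phase are disjoint, so a plain swap is enough.
                    data[left] = b;
                    data[left + 1] = a;
                }
            }

            // The barrier sits outside the divergent inner loop, so every
            // thread reaches it exactly once per phase.
            Group.Barrier();
        }
    }

    public static double[] Sort(Accelerator accelerator, double[] input)
    {
        if (input.Length < 2)
            return (double[])input.Clone();

        var kernel = accelerator.LoadStreamKernel<ArrayView<double>, int>(Kernel);
        using (var buffer = accelerator.Allocate1D(input))
        {
            // Assumption: cap the group size at 256; any size up to the device limit works.
            int groupSize = Math.Min(accelerator.MaxNumThreadsPerGroup, 256);
            kernel(new KernelConfig(1, groupSize), buffer.View, input.Length);
            accelerator.Synchronize();
            return buffer.GetAsArray1D();
        }
    }
}
```

Because each thread loops over its pairs with a stride, the array can be longer than the group, and only one device array is needed. The cost is that a single group uses only a small part of the GPU.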

elekta-cp-rnca · 2025-01-30T16:47:58Z

It seems explicitly grouped kernels and strided loops are my friends ...

elekta-cp-rnca · 2025-02-05T16:59:18Z

Does anyone have ideas on syncing across blocks? The code works until you use more than one block. Grid.Sync?
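
One common way around the missing cross-block barrier is to stop looping over phases inside the kernel and instead launch the kernel once per phase from the host. Launches on the same stream run in order, so the end of each launch acts as a grid-wide synchronization point. The following is a minimal sketch under that assumption, not a confirmed ILGPU recommendation: the class and method names are hypothetical, and the ILGPU calls should be checked against version 1.5.1.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;

public static class MultiBlockBrickSort
{
    // One launch handles exactly one phase. Thread i owns the pair starting
    // at 2*i + phase, so no two threads touch the same element in a launch.
    static void PhaseKernel(Index1D index, ArrayView<double> data, int length, int phase)
    {
        int left = 2 * index.X + phase;
        if (left + 1 >= length)
            return;

        double a = data[left];
        double b = data[left + 1];
        if (a > b)
        {
            data[left] = b;
            data[left + 1] = a;
        }
    }

    public static double[] Sort(Accelerator accelerator, double[] input)
    {
        int n = input.Length;
        if (n < 2)
            return (double[])input.Clone();

        var kernel = accelerator
            .LoadAutoGroupedStreamKernel<Index1D, ArrayView<double>, int, int>(PhaseKernel);

        using (var buffer = accelerator.Allocate1D(input))
        {
            int pairs = n / 2; // upper bound on pairs in either phase

            // n alternating phases are enough for n elements. Each launch
            // finishes before the next one starts on the same stream.
            for (int phase = 0; phase < n; phase++)
                kernel(pairs, buffer.View, n, phase & 1);

            accelerator.Synchronize();
            return buffer.GetAsArray1D();
        }
    }
}
```

This keeps a single array on the device and works for any number of blocks. The trade-off is n kernel launches for n elements, so the launch overhead grows with the array size.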

elekta-cp-rnca added the "help wanted" and "question" labels on Jan 28, 2025

elekta-cp-rnca closed this as completed Feb 6, 2025
