ChangeLog

Version 0.91-dev
    various bug fixes
    input1d now accepts int1 and uint1 types as image index
    input2d now accepts int2 and uint2 types as image index
    ps::get_global_id for pixel shaders
    il function support ( func example )
    doesn't need 'code << "end\n"' at the end of create_kernel
    improvements to cal::Kernel class
    uav support for forcing cached reads
    removing use of flat thread index ( much slower on 5xxx )
    compilation fix for SDK 2.4
    improved ::cal::Kernel class ( using pinned memory for const transfer, removing unnecessary calls )
    abs, min, max math functions
    support for program binary image
    class for file caching binary images ( cal/cal_program_cache.hpp )
    redistributing CAL related header files from AMD SDK ( removing dependency on AMD SDK - only drivers required )
    bitselect function
    bitextract function ( amd_bfe )
    thread safety ( enabled by flag __CAL_THREADSAFE )
    automatic use of fma instead of mad ( with flag __CAL_USE_AUTOFMA )

Version 0.90
    support for offset in sample load
    new matrix multiplication example
    uav support
    new example - uav write
    cast_bits ( and as_typeN opencl like functions )
    changes to double swizzle handling
    fixes in lds handling
    emmit_comment function for adding comments to generated IL
    type traits
    nbody example using double variables for computations
    frexp
    more accurate log(double)
    changes to relational operators for double types ( return type matches input vectors length )
    changes to select ( handling of double types )
    new reciprocal and native_reciprocal functions ( calculating 1/x )
    sqrt and native_sqrt functions
    native_exp and native_log functions
    native_rsqrt and improved precision for rsqrt(double)
    mixed IL and C types operators (+,-,...)
    refactoring of math library ( to improve compilation times only required files can be included )
    changes in files layout ( cal/il/cal_il.hpp moved to cal/cal_il.hpp, cal/il/cal_il_math.hpp moved to cal/cal_il_math.hpp )
    improved mul, div, mod operators with C types ( automatic use of shl, shr, ldexp for power of 2 values ) - to enable compile with defined __CAL_USE_IMPROVED_MULDIV - use with caution ( double ldexp may triger CAL driver bug causing invalid results )
    regression directory with not working kernels ( due to bugs in ATI SDK )
    integer division
    device target name
    uav atomic functions
    lds atomic functions
    uavatomics example

Version 0.87.1
    compilation fix for SDK 2.3

Version 0.87
    IL kernel link bug fix
    support for pinned memory
    new example - optimal n-body brute-force algorithm implementation
    various bug fixes

Version 0.86.3
    rotate bug fix
    Non blocking wait disabled by default. To enable __CAL_USE_NON_BLOCKING_WAIT must be defined

Version 0.86.2
    small bug fix to rotate

Version 0.86.1
    small improvement to rotate

Version 0.86
    Improvements to memory handling. Memory object allocates CAL resource for each device in Context.
    The proper resource for device is later selected by CommandQueue object.

Version 0.85
    Various bug fixes
    ability to use target device information ( wavefront size, device type, etc ) for code generation
    rotate function
    non-blocking waiting for event completion

Version 0.8b
    Compilation fixes for Visual C++
    Improved cmake files

Version 0.8a
    Compilation fixes for gcc 4.4

Version 0.8
    Initial distribution

ChangeLog starts here.