Bugfix and speed up
Pre-release
Bugfix:
- Fix a memory leak when creating a session for some models.
- Fix several bugs in the Metal backend.
- Small bugfixes for the iOS demo.
- Fix a bug where the 3D BatchNorm module did not work in MNNTrain.
- Fix a memory leak in CPUInterp.
- Fix a stack error in MNNPackedMatMulRemain.S.
- Fix the SSE code path using AVX instructions.
Optimize:
- Reduce buffer creation during Metal execution.
- Reduce memory copies in CPUBatchMatMul.
- Reduce memory copies in Module.
- Use NEON to optimize CPUTopKV2.
- Reduce memory usage of the CUDA backend by splitting and merging.
Feature:
- Support multiple instances of Module.
- Add several CI scripts.
- More ops for the ARM82 backend.
- More ops for the CUDA backend (ArgMax, BatchMatMul, GatherV2, LayerNorm, ...).