CA3DMM: Communication-Avoiding 3D Matrix Multiplication

CA3DMM is a parallel library for dense general matrix multiplication (GEMM), implemented in C with MPI and OpenMP/CUDA. Its communication costs are optimal or near-optimal.
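For readers new to the term, the operation being parallelized is the dense product C = A * B. The following is a minimal serial reference sketch, assuming row-major storage and double precision. The function name `gemm_ref` and its signature are hypothetical and are not part of the CA3DMM API; the sketch only illustrates the computation whose work and data a parallel GEMM library distributes across processes.

```c
#include <stddef.h>

/* Serial reference GEMM: C = A * B, all matrices stored row-major.
 * A is m x k, B is k x n, C is m x n.
 * Hypothetical helper for illustration only; NOT a CA3DMM function. */
void gemm_ref(size_t m, size_t n, size_t k,
              const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            /* Dot product of row i of A with column j of B */
            for (size_t l = 0; l < k; l++)
                sum += A[i * k + l] * B[l * n + j];
            C[i * n + j] = sum;
        }
    }
}
```

A result computed this way on small inputs can serve as a correctness check against a distributed run.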

If you use CA3DMM in your work, please cite the following papers:

Hua Huang and Edmond Chow, CA3DMM: A New Algorithm Based on a Unified View of Parallel Matrix Multiplication, International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), Dallas, TX, Nov. 13-18, 2022.