CV
Education
- Ph.D. in Computational Science and Engineering, Georgia Institute of Technology, 2019.5 - 2024.8 (expected), advisor: Prof. Edmond Chow.
- M.S. in Computer Science, Georgia Institute of Technology, 2017.8 - 2019.5, advisor: Prof. Edmond Chow.
- B.S. in Information and Computing Science, Sun Yat-sen University, 2013.8 - 2017.6, advisor: Dr. Weicai Ye (叶纬材博士).
Skills
- Programming languages: C, C++, MATLAB, Fortran, Python
- Frameworks and technologies: OpenMP, MPI, CUDA, OpenCL, SYCL, x86 intrinsics, ARM (NEON and SVE) intrinsics
Work Experience
- Developer Technology Engineer, NVIDIA (2024.8 - present)
Intern Experience
- Software Engineer Intern, Meta (Facebook) Inc (2021.5 - 2021.8, supervisor: Dr. Xing Liu)
  - Investigated different parallelization schemes of the Mixture of Experts (MoE) model.
  - Implemented a PyTorch-based hybrid parallel MoE model for a recommendation system.
  - Delivered a 6x speedup on 64 NVIDIA V100 GPUs compared to the existing MoE model deployed in production.
- Research Intern, Intel Corporation (2020.8 - 2020.12, supervisor: Dr. Jeff Hammond)
  - Analyzed the code of Kokkos Remote Workspace and NVSHMEM.
  - Designed and implemented CL-PGAS using SYCL + MPI-3, the first library in the industry enabling inter-node data transfer controlled by kernels running on Intel / NVIDIA / AMD GPUs.
  - Achieved 95% of the performance of NVIDIA’s proprietary library NVSHMEM.
Research Experience
- Graduate Research Assistant, Georgia Institute of Technology (2018.1 - 2024.8, supervisor: Prof. Edmond Chow)
  - Accelerating quantum chemistry calculations: up to 6x overall speedup of Hartree-Fock calculations for a 1208-atom molecular system with multiple optimizations.
  - Developing the H2Pack library: up to 6x speedup over a state-of-the-art FMM-based method in high-accuracy calculations, and a 90% memory usage reduction and 7x speedup over contemporary methods for quantum chemistry and hydrodynamic simulation calculations.
  - Designing parallel linear algebra algorithms: up to 25% and 4x speedups for dense and sparse matrix multiplication, respectively, compared to state-of-the-art methods.
Publications
Peer-reviewed Conference Publications
[C.4] [SC 24] Many-Body Electronic Correlation Energy using Krylov Subspace Linear Solvers.
Shikhar Shah, Boqin Zhang, Hua Huang, John E. Pask, Phanish Suryanarayana, and Edmond Chow.
International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2024.
pdf IEEE
[C.3] [SC 22] CA3DMM: A New Algorithm Based on a Unified View of Parallel Matrix Multiplication.
Hua Huang and Edmond Chow.
International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2022.
pdf ACM (includes presentation video)
[C.2] [IPDPS 19] Overlapping Communications with Other Communications and its Application to Distributed Dense Matrix Computations.
Hua Huang and Edmond Chow.
International Parallel & Distributed Processing Symposium (IPDPS), 2019.
pdf IEEE
[C.1] [SC 18] Accelerating Quantum Chemistry with Vectorized and Batched Integrals.
Hua Huang and Edmond Chow.
International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2018.
pdf ACM
Peer-reviewed Journal Publications
[J.9] [IEEE Trans. Parallel Distrib. Syst.] Exploring the Design Space of Distributed Parallel Sparse Matrix-Multiple Vector Multiplication.
Hua Huang and Edmond Chow.
IEEE Transactions on Parallel and Distributed Systems, 35(11), 1977-1988 (2024).
pdf IEEE
[J.8] [SIAM J. Sci. Comput.] An Adaptive Factorized Nystrom Preconditioner for Kernel Matrices.
Shifan Zhao, Tianshi Xu, Hua Huang, Edmond Chow, and Yuanzhe Xi.
SIAM Journal on Scientific Computing, 46(4), A2351-A2376 (2024).
pdf SIAM
[J.7] [SIAM J. Sci. Comput.] Data-Driven Construction of Hierarchical Matrices with Nested Bases.
Difeng Cai, Hua Huang, Edmond Chow, and Yuanzhe Xi.
SIAM Journal on Scientific Computing, 46(2), S24-S50 (2024).
pdf SIAM
[J.6] [J. Comput. Phys.] A Hierarchical Matrix Approach for Computing Hydrodynamic Interactions.
Xin Xing, Hua Huang, and Edmond Chow.
Journal of Computational Physics, 448, 110761 (2022).
pdf ScienceDirect
[J.5] [SoftwareX] SPARC: Simulation Package for Ab-initio Real-space Calculations.
Qimen Xu, Abhiraj Sharma, Benjamin Comer, Hua Huang, Edmond Chow, Andrew J Medford, John E Pask, and Phanish Suryanarayana.
SoftwareX, 15, 100709 (2021).
pdf ScienceDirect
[J.4] [SIAM J. Matrix Anal. Appl.] Efficient Construction of an HSS Preconditioner for Symmetric Positive Definite H2 Matrices.
Xin Xing, Hua Huang, and Edmond Chow.
SIAM Journal on Matrix Analysis and Applications, 42(2), 683–707 (2021).
pdf SIAM
[J.3] [ACM Trans. Math. Softw] H2Pack: High-Performance H2 Matrix Package for Kernel Matrices Using the Proxy Point Method.
Hua Huang, Xin Xing, and Edmond Chow.
ACM Transactions on Mathematical Software, 47(1), 1-29 (2020).
pdf ACM
[J.2] [J. Chem. Phys.] A Linear Scaling Hierarchical Block Low-rank Representation of the Electron Repulsion Integral Tensor.
Xin Xing, Hua Huang, and Edmond Chow.
Journal of Chemical Physics, 153, 084119 (2020).
pdf AIP
[J.1] [J. Chem. Phys.] Techniques for High-Performance Construction of Fock Matrices.
Hua Huang, David Sherrill, and Edmond Chow.
Journal of Chemical Physics, 152, 024122 (2020).
pdf AIP