Performance Portability

ShyLU-node: On-node scalable solvers and preconditioners: Recent progress and current performance
Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures
Performance-portable sparse matrix-matrix multiplication for many-core architectures