Performance Portability

An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs
Batched Linear Solvers in Kokkos Kernels
Performance Portable Batched Sparse Linear Solvers
Kokkos Kernels: Then and Now
Experimental evaluation of multiprecision strategies for GMRES on GPUs
A Study of Mixed Precision Strategies for GMRES on GPUs
Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs
Performance-portable graph coarsening for efficient multilevel graph analysis
Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems
Towards simulations on the Exascale hardware and beyond
A Performance-Portable Nonhydrostatic Atmospheric Dycore for the Energy Exascale Earth System Model Running at Cloud-Resolving Resolutions
Performance portable supernode-based sparse triangular solver for manycore architectures
SPHYNX: Spectral Partitioning for HYbrid aNd aXelerator-enabled systems
Scalable triangle counting on distributed-memory systems
Fast linear algebra-based triangle counting with KokkosKernels
Parallel graph coloring for manycore architectures
Towards extreme-scale simulations for low Mach fluids with second-generation Trilinos
Towards extreme-scale simulations with next-generation Trilinos: a low Mach fluid application case study