CSC

Jet: Multilevel graph partitioning on graphics processing units
TenSQL: An SQL Database Built on GraphBLAS
Parallel graph coloring algorithms for distributed GPU environments
FROSch Preconditioners for Land Ice Simulations of Greenland and Antarctica
Kokkos Kernels

See the Kokkos Kernels website for more details.

Mixed Precision in Trilinos
Mixed-Precision Schemes for Linear Algebra Kernels on GPUs
A Block-Based Triangle Counting Algorithm on Heterogeneous Environments
A survey of numerical methods utilizing mixed precision arithmetic
EXAGRAPH: Graph and combinatorial methods for enabling exascale applications
FROSch Preconditioners for Land Ice Simulations of Greenland and Antarctica
Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels
Performance-portable graph coarsening for efficient multilevel graph analysis
Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems
An algebraic sparsified nested dissection algorithm using low-rank approximations
Distributed Memory Graph Coloring Algorithms for Multiple GPUs
Performance portable supernode-based sparse triangular solver for manycore architectures
Preparing sparse solvers for exascale computing
Scalable asynchronous domain decomposition solvers
Scalable, multi-constraint, complex-objective graph partitioning
SPHYNX: Spectral Partitioning for HYbrid aNd aXelerator-enabled systems
A Parallel Graph Algorithm for Detecting Mesh Singularities in Distributed Memory Ice Sheet Simulations
A robust hierarchical solver for ill-conditioned systems with applications to ice sheet modeling
Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks
Linear algebra-based triangle counting via fine-grained tasking on heterogeneous environments: (Update on Static Graph Challenge)
Scalable generation of graphs for benchmarking HPC community-detection algorithms
Scalable triangle counting on distributed-memory systems
A distributed-memory hierarchical solver for general sparse linear systems
Asynchronous one-level and two-level domain decomposition solvers
Fast triangle counting using Cilk
FROSch: a fast and robust overlapping Schwarz domain decomposition preconditioner based on Xpetra in Trilinos
Geometric partitioning and ordering strategies for task mapping on parallel computers
Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures
Tacho: memory-scalable task parallel sparse Cholesky factorization
Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts
Distributed graph layout for scalable small-world network analysis
Fast linear algebra-based triangle counting with KokkosKernels
Partitioning trillion-edge graphs in minutes
Performance-portable sparse matrix-matrix multiplication for many-core architectures
A survey of direct methods for sparse linear systems