Publications

(2024). Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System. arXiv preprint arXiv:2405.07898.

(2023). TenSQL: An SQL Database Built on GraphBLAS. 2023 IEEE High Performance extreme Computing Conference (HPEC).

(2023). Predicting electronic structures at any length scale with machine learning. npj Computational Materials.

(2023). An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

(2023). A Comparison of Spectral and Spatial Graph Convolutional Neural Network Kernels Using GraphSAGE-Sparse. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

(2023). Performance Portable Batched Sparse Linear Solvers. IEEE Transactions on Parallel and Distributed Systems.

(2022). High-Performance GMRES Multi-Precision Benchmark: Design, Performance, and Challenges. 2022 IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).

(2022). Training-free hyperparameter optimization of neural networks for electronic structures in matter. Machine Learning: Science and Technology.

(2022). Understanding the design-space of sparse/dense multiphase GNN dataflows on spatial accelerators. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

(2022). Parallel, Portable Algorithms for Distance-2 Maximal Independent Set and Graph Coarsening. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

(2022). Concentric Spherical Neural Network for 3D Representation Learning. 2022 International Joint Conference on Neural Networks (IJCNN).

(2022). Parallel graph coloring algorithms for distributed GPU environments. Parallel Computing.

(2022). FROSch Preconditioners for Land Ice Simulations of Greenland and Antarctica. SIAM Journal on Scientific Computing.

(2022). Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity. arXiv preprint arXiv:2201.08916.

(2022). Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication. IEEE Transactions on Parallel and Distributed Systems.

(2021). Experimental evaluation of multiprecision strategies for GMRES on GPUs. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

(2021). Extending Sparse Tensor Accelerators to Support Multiple Compression Formats. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

(2021). Union: A unified HW-SW Co-Design ecosystem in MLIR for evaluating tensor operations on spatial accelerators. 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT).

(2021). The Kokkos EcoSystem: Comprehensive Performance Portability For High Performance Computing. Computing in Science and Engineering.

(2021). Sphynx: A parallel multi-GPU graph partitioner for distributed-memory systems. Parallel Computing.

(2021). Performance-portable graph coarsening for efficient multilevel graph analysis. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

(2021). Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels. arXiv preprint arXiv:2103.11991.

(2021). Kokkos 3: Programming model extensions for the exascale era. IEEE Transactions on Parallel and Distributed Systems.

(2021). FROSch Preconditioners for Land Ice Simulations of Greenland and Antarctica. Universität zu Köln.

(2021). Extending Sparse Tensor Accelerators to Support Multiple Compression Formats. arXiv preprint arXiv:2103.10452.

(2021). Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs. arXiv preprint arXiv:2105.07544.

(2021). EXAGRAPH: Graph and combinatorial methods for enabling exascale applications. The International Journal of High Performance Computing Applications.

(2021). Concentric Spherical GNN for 3D Representation Learning. arXiv preprint arXiv:2103.10484.

(2021). Co-design center for exascale machine learning technologies (ExaLearn). The International Journal of High Performance Computing Applications.

(2021). Accelerating finite-temperature Kohn-Sham density functional theory with deep neural networks. Physical Review B.

(2021). A survey of numerical methods utilizing mixed precision arithmetic. The International Journal of High Performance Computing Applications.

(2021). A Study of Mixed Precision Strategies for GMRES on GPUs. arXiv preprint arXiv:2109.01232.

(2021). A Block-Based Triangle Counting Algorithm on Heterogeneous Environments. IEEE Transactions on Parallel and Distributed Systems.

(2020). SPHYNX: Spectral Partitioning for HYbrid aNd aXelerator-enabled systems. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

(2020). Scalable, multi-constraint, complex-objective graph partitioning. IEEE Transactions on Parallel and Distributed Systems.

(2020). Scalable asynchronous domain decomposition solvers. SIAM Journal on Scientific Computing.

(2020). Preparing sparse solvers for exascale computing. Philosophical Transactions of the Royal Society A.

(2020). Performance portable supernode-based sparse triangular solver for manycore architectures. 49th International Conference on Parallel Processing (ICPP).

(2020). Distributed Memory Graph Coloring Algorithms for Multiple GPUs. 2020 IEEE/ACM 10th Workshop on Irregular Applications: Architectures and Algorithms (IA3).

(2020). An algebraic sparsified nested dissection algorithm using low-rank approximations. SIAM Journal on Matrix Analysis and Applications.

(2020). ADELUS: A Performance-Portable Dense LU Solver for Distributed-Memory Hardware-Accelerated Systems. WACCPD@SC.

(2020). A survey of numerical methods utilizing mixed precision arithmetic. arXiv preprint arXiv:2007.06674.

(2020). A Performance-Portable Nonhydrostatic Atmospheric Dycore for the Energy Exascale Earth System Model Running at Cloud-Resolving Resolutions. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.

(2019). Scalable triangle counting on distributed-memory systems. 2019 IEEE High Performance Extreme Computing Conference (HPEC).

(2019). Scalable inference for sparse deep neural networks using Kokkos Kernels. 2019 IEEE High Performance Extreme Computing Conference (HPEC).

(2019). Scalable generation of graphs for benchmarking HPC community-detection algorithms. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis.

(2019). Linear algebra-based triangle counting via fine-grained tasking on heterogeneous environments: (Update on static graph challenge). 2019 IEEE High Performance Extreme Computing Conference (HPEC).

(2019). Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks. IEEE Transactions on Parallel and Distributed Systems.

(2019). A robust hierarchical solver for ill-conditioned systems with applications to ice sheet modeling. Journal of Computational Physics.

(2019). A Portable SIMD Primitive Using Kokkos for Heterogeneous Architectures. International Workshop on Accelerator Programming Using Directives.

(2019). A Parallel Graph Algorithm for Detecting Mesh Singularities in Distributed Memory Ice Sheet Simulations. Proceedings of the 48th International Conference on Parallel Processing.

(2018). Tacho: memory-scalable task parallel sparse Cholesky factorization. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

(2018). Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures. Parallel Computing.

(2018). Geometric partitioning and ordering strategies for task mapping on parallel computers. arXiv preprint arXiv:1804.09798.

(2018). FROSch: a fast and robust overlapping Schwarz domain decomposition preconditioner based on Xpetra in Trilinos. International Conference on Domain Decomposition Methods.

(2018). Fast triangle counting using Cilk. 2018 IEEE High Performance extreme Computing Conference (HPEC).

(2018). Experimental design of work chunking for graph algorithms on high bandwidth memory architectures. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

(2018). Ensemble grouping strategies for embedded stochastic collocation methods applied to anisotropic diffusion problems. SIAM/ASA Journal on Uncertainty Quantification.

(2018). Asynchronous one-level and two-level domain decomposition solvers. International Conference on Domain Decomposition Methods.

(2018). A distributed-memory hierarchical solver for general sparse linear systems. Parallel Computing.

(2017). Performance-portable sparse matrix-matrix multiplication for many-core architectures. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

(2017). Partitioning trillion-edge graphs in minutes. 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

(2017). Order or shuffle: Empirically evaluating vertex order impact on parallel graph computations. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

(2017). Fast linear algebra-based triangle counting with KokkosKernels. 2017 IEEE High Performance Extreme Computing Conference (HPEC).

(2017). Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures. SIAM Journal on Scientific Computing.

(2017). Distributed graph layout for scalable small-world network analysis. arXiv preprint arXiv:1701.00503.

(2017). Designing vector-friendly compact BLAS and LAPACK kernels. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis.

(2017). Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts. Parallel Computing.

(2016). Parallel graph coloring for manycore architectures. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

(2016). Complex network partitioning using label propagation. SIAM Journal on Scientific Computing.

(2016). Basker: a threaded sparse LU factorization utilizing hierarchical parallelism and data layouts. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

(2016). A survey of direct methods for sparse linear systems. Acta Numerica.

(2016). A comparison of high-level programming choices for incomplete sparse factorization across different architectures. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

(2016). A case study of complex graph analysis in distributed memory: Implementation and optimization. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

(2015). Multi-jagged: A scalable parallel spatial partitioning algorithm. IEEE Transactions on Parallel and Distributed Systems.

(2015). High-performance graph analytics on manycore processors. 2015 IEEE International Parallel and Distributed Processing Symposium.

(2014). Building blocks for graph based network analysis. 2014 IEEE High Performance Extreme Computing Conference (HPEC).

(2014). Towards extreme-scale simulations with next-generation Trilinos: a low Mach fluid application case study. 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.

(2014). Towards extreme-scale simulations for low Mach fluids with second-generation Trilinos. Parallel Processing Letters.

(2014). PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks. 2014 IEEE International Conference on Big Data (Big Data).

(2014). Exploiting geometric partitioning in task mapping for parallel computers. 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

(2014). Domain decomposition preconditioners for communication-avoiding Krylov methods on a hybrid CPU/GPU cluster. SC'14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis.

(2014). BFS and coloring-based parallel algorithms for strongly connected components and related problems. 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

(2014). A hybrid approach for parallel transistor-level full-chip circuit simulation. International Conference on High Performance Computing for Computational Science.

(2013). Scalable matrix computations on large scale-free graphs using 2D graph partitioning. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.

(2013). Electrical modeling and simulation for stockpile stewardship. XRDS: Crossroads, The ACM Magazine for Students.

(2012). ShyLU: A Hybrid-Hybrid Solver for Multicore Platforms. 2012 IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS).

(2012). Parallel partitioning with Zoltan: Is hypergraph partitioning worth it? Graph Partitioning and Graph Clustering.

(2012). Multithreaded Algorithms for Maximum Matching in Bipartite Graphs. 2012 IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS).

(2012). Amesos2 and Belos: Direct and iterative solvers for large sparse linear systems. Scientific Programming.

(2011). Enabling next-generation parallel circuit simulation with Trilinos. European Conference on Parallel Processing.

(2011). An Evaluation of the Zoltan Parallel Graph and Hypergraph Partitioners. 10th DIMACS Implementation Challenge, Graph Partitioning and Graph Clustering.

(2011). A study of combinatorial issues in a sparse hybrid solver. Proceedings of SciDAC conference.

(2008). Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Transactions on Mathematical Software (TOMS).

(2007). System and method for dynamically disabling partially streamed content. US Patent 7,308,504.

(2007). System and method for cluster-sensitive sticky load balancing. US Patent 7,185,096.
