Volume 21, Issue 2, June 2024
research-article
Open Access
Highly Efficient Self-checking Matrix Multiplication on Tiled AMX Accelerators
Article No.: 21, Pages 1–22, https://doi.org/10.1145/3633332

General Matrix Multiplication (GEMM) is a computationally expensive operation that is used in many applications such as machine learning. Hardware accelerators are increasingly popular for speeding up GEMM computation, with Tiled Matrix Multiplication (...
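
To make the operation named in this abstract concrete, the Python sketch below computes C = A·B tile by tile and then verifies the result with a column-checksum test. The checksum is a generic algorithm-based fault-tolerance style check chosen for illustration; it is not claimed to be the scheme the article uses, and the sketch does not model AMX tile registers. The names `tiled_matmul` and `checksum_ok` and the default tile size of 2 are hypothetical, and integer inputs are assumed so the check can compare exactly.

```python
from typing import List

Matrix = List[List[int]]

def tiled_matmul(A: Matrix, B: Matrix, tile: int = 2) -> Matrix:
    """Compute C = A @ B by iterating over square tiles (tile size is an arbitrary assumption)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Multiply one tile of A by one tile of B and accumulate into the matching tile of C.
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a = A[i][p]
                        for j in range(j0, min(j0 + tile, m)):
                            C[i][j] += a * B[p][j]
    return C

def checksum_ok(A: Matrix, B: Matrix, C: Matrix) -> bool:
    """Generic ABFT-style check (an assumption, not the article's scheme):
    the column sums of C must equal (column sums of A) times B."""
    k, m = len(B), len(B[0])
    col_a = [sum(row[p] for row in A) for p in range(k)]
    expected = [sum(col_a[p] * B[p][j] for p in range(k)) for j in range(m)]
    actual = [sum(row[j] for row in C) for j in range(m)]
    return expected == actual  # exact comparison is safe only for integer inputs
```

With A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], `tiled_matmul` returns [[19, 22], [43, 50]]; changing any single entry of C changes one column sum, so `checksum_ok` then returns False.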

research-article
Open Access
WIPE: A Write-Optimized Learned Index for Persistent Memory
Article No.: 22, Pages 1–25, https://doi.org/10.1145/3634915

Learned Index, which utilizes effective machine learning models to accelerate locating sorted data positions, has gained increasing attention in many big data scenarios. Using efficient learned models, the learned indexes build large nodes and flat ...
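
The core idea of a learned index, a model that predicts where a key sits in sorted data, can be sketched with a single linear model plus an error-bounded search. This is a generic, static, in-memory illustration that assumes unique numeric keys; it is not WIPE's design and ignores persistent memory and writes entirely. The class name `LinearLearnedIndex` is hypothetical.

```python
import bisect
from typing import List, Optional

class LinearLearnedIndex:
    """Hypothetical static learned index: one linear model plus a bounded last-mile search."""

    def __init__(self, keys: List[float]) -> None:
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)  # assumes unique numeric keys
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys bounds the search window.
        self.err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: float) -> int:
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key: float) -> Optional[int]:
        """Return the position of key in sorted order, or None if it is absent."""
        n = len(self.keys)
        pred = self._predict(key)
        # Clamp the window [lo, hi) to valid indices with lo <= hi.
        lo = min(n, max(0, pred - self.err))
        hi = max(lo, min(n, pred + self.err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None
```

The stored worst-case error `err` is what keeps lookups correct: every stored key lies within `err` positions of its prediction, so the binary search only has to cover that window rather than the whole array.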

research-article
Open Access
Coherence Attacks and Countermeasures in Interposer-based Chiplet Systems
Article No.: 23, Pages 1–25, https://doi.org/10.1145/3633461

Industry is moving towards large-scale hardware systems that bundle processor cores, memories, accelerators, and so on via 2.5D integration. These components are fabricated separately as chiplets and then integrated using an interposer as an interconnect ...

research-article
Open Access
A Concise Concurrent B+-Tree for Persistent Memory
Article No.: 24, Pages 1–25, https://doi.org/10.1145/3638717

Persistent memory (PM) presents a unique opportunity for designing data management systems that offer improved performance, scalability, and instant restart capability. As a widely used data structure for managing data in such systems, B+-Tree must ...

research-article
Open Access
An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs
Article No.: 25, Pages 1–26, https://doi.org/10.1145/3639823

Resource-efficient Convolutional Neural Networks (CNNs) are gaining more attention. These CNNs have relatively low computational and memory requirements. A common denominator among such CNNs is that they are more heterogeneous than traditional CNNs. This ...

research-article
Open Access
Assessing the Impact of Compiler Optimizations on GPUs Reliability
Article No.: 26, Pages 1–22, https://doi.org/10.1145/3638249

Graphics Processing Unit (GPU) compilers have evolved to support general-purpose programming languages for multiple architectures. The NVIDIA CUDA Compiler (NVCC) has many compilation levels before generating the machine code and applies complex ...

research-article
Open Access
Dedicated Hardware Accelerators for Processing of Sparse Matrices and Vectors: A Survey
Article No.: 27, Pages 1–26, https://doi.org/10.1145/3640542

Performance in scientific and engineering applications such as computational physics, algebraic graph problems, or Convolutional Neural Networks (CNNs) is dominated by the manipulation of large sparse matrices, i.e., matrices with a large number of zero elements. ...

research-article
Open Access
An Instruction Inflation Analyzing Framework for Dynamic Binary Translators
Article No.: 28, Pages 1–25, https://doi.org/10.1145/3640813

Dynamic binary translators (DBTs) are widely used to migrate applications between different instruction set architectures (ISAs). Despite extensive research to improve DBT performance, noticeable overhead remains, preventing near-native performance, ...

research-article
Open Access
Cost-aware Service Placement and Scheduling in the Edge-Cloud Continuum
Article No.: 29, Pages 1–24, https://doi.org/10.1145/3640823

The edge-to-data-center computing continuum is the aggregation of computing resources located anywhere between the network edge (e.g., close to 5G antennas) and servers in traditional data centers. Kubernetes is the de facto standard for the ...

research-article
Open Access
Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses
Article No.: 30, Pages 1–26, https://doi.org/10.1145/3641853

Indirect memory accesses (IMAs, i.e., A[f(B[i])]) are typical memory access patterns in applications such as graph analysis, machine learning, and databases. IMAs are composed of producer-consumer pairs, where the consumers’ memory addresses are derived ...
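
The access pattern A[f(B[i])] can be illustrated by computing the values and the address stream it produces. In this minimal sketch, loading `B[i]` is the producer and the load from `A` is the consumer, whose address cannot be known until the producer's value arrives; a lookahead prefetcher would therefore have to evaluate the producer-consumer chain `distance` iterations ahead. The function names, the flat-array layout of `A`, the base address, element size, and lookahead distance are all hypothetical, and this is not Tyche's mechanism.

```python
from typing import Callable, List, Optional

def indirect_gather(A: List[int], B: List[int], f: Callable[[int], int]) -> List[int]:
    """Values read by the loop `for i in range(len(B)): use A[f(B[i])]`."""
    return [A[f(b)] for b in B]

def ima_addresses(B: List[int], f: Callable[[int], int],
                  base_a: int, elem_size: int) -> List[int]:
    """Byte addresses of the consumer loads A[f(B[i])], assuming A is a flat array at base_a."""
    # Producer: load B[i]. Consumer: its address depends on the value the producer returned.
    return [base_a + f(b) * elem_size for b in B]

def prefetch_schedule(B: List[int], f: Callable[[int], int], base_a: int,
                      elem_size: int, distance: int) -> List[Optional[int]]:
    """Address a simple lookahead prefetcher would request at iteration i: the one needed at i + distance."""
    addrs = ima_addresses(B, f, base_a, elem_size)
    return [addrs[i + distance] if i + distance < len(addrs) else None
            for i in range(len(addrs))]
```

For B = [3, 0, 2], f(x) = 2x, a base address of 1000, and 8-byte elements, the consumer touches addresses 1048, 1000, and 1032: neither sequential nor strided, which is why such streams defeat simple stride prefetchers.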

research-article
Open Access
Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs
Article No.: 31, Pages 1–24, https://doi.org/10.1145/3643682

Convolutional Neural Networks (CNNs) can benefit from the computational reductions provided by the Winograd minimal filtering algorithm and weight pruning. However, harnessing the potential of both methods simultaneously introduces complexity in designing ...
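
As a concrete instance of the Winograd minimal filtering algorithm, the smallest 1-D case F(2,3) produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by direct computation. The sketch below is that textbook case, not the large tiles Winols targets, and it ignores weight pruning; the function names are hypothetical.

```python
from typing import List, Sequence

def winograd_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Two outputs of a 1-D, 3-tap correlation using 4 multiplications instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # The four element-wise products.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Reference: y[i] = d[i]*g[0] + d[i+1]*g[1] + d[i+2]*g[2] for i = 0, 1 (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

For d = [1, 2, 3, 4] and g = [1, 1, 1], both functions return outputs equal to [6, 9]. The saving grows with larger tiles, at the cost of more complex transforms and reduced numerical accuracy.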

research-article
Open Access
SLAP: Segmented Reuse-Time-Label Based Admission Policy for Content Delivery Network Caching
Article No.: 32, Pages 1–24, https://doi.org/10.1145/3646550

“Learned” admission policies have shown promise in improving Content Delivery Network (CDN) cache performance and lowering operational costs. Unfortunately, existing learned policies are optimized for a few fixed cache sizes, while in reality, cache ...

research-article
Open Access
Architectural Support for Sharing, Isolating and Virtualizing FPGA Resources
Article No.: 33, Pages 1–26, https://doi.org/10.1145/3648475

FPGAs are increasingly popular in cloud environments for their ability to offer on-demand acceleration and improved compute efficiency. Providers would like to increase utilization by multiplexing customers on a single device, similar to how processing ...

research-article
Open Access
FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration
Article No.: 34, Pages 1–27, https://doi.org/10.1145/3649455

DRAM memory is a performance bottleneck for many applications, due to its high access latency. Previous work has mainly focused on data locality, introducing small but fast regions to cache frequently accessed data, thereby reducing the average latency. ...

research-article
Open Access
The Droplet Search Algorithm for Kernel Scheduling
Article No.: 35, Pages 1–28, https://doi.org/10.1145/3650109

Kernel scheduling is the problem of finding the most efficient implementation for a computational kernel. Identifying this implementation involves experimenting with the parameters of compiler optimizations, such as the size of tiling windows and ...

research-article
Open Access
Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces
Article No.: 36, Pages 1–23, https://doi.org/10.1145/3650110

Trace-based simulation is a widely used methodology for system design exploration. It relies on realistic traces that represent a range of behaviors necessary to be evaluated, containing a lot of information about the application, its inputs and the ...

research-article
Open Access
TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture
Article No.: 37, Pages 1–26, https://doi.org/10.1145/3652604

Many real-world networks are characterized by being temporal and dynamic, wherein the temporal information signifies the changes in connections, such as the addition or removal of links between nodes. Employing random walks on these temporal networks is a ...

research-article
Open Access
Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication
Article No.: 38, Pages 1–24, https://doi.org/10.1145/3653020

The multiplication of sparse matrix and vector (SpMV) is one of the most widely used kernels in high-performance computing as well as machine learning acceleration for sparse neural networks. The design space of SpMV accelerators has two axes: algorithm ...
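
A minimal SpMV sketch in the widely used compressed sparse row (CSR) format shows the kernel this abstract refers to. CSR is chosen only for illustration and is not presented as one of Cerberus's modes; `csr_from_dense` and `spmv_csr` are hypothetical names.

```python
from typing import List, Tuple

def csr_from_dense(M: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense matrix to CSR: non-zero values, their column indices, and row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in M:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    return values, col_idx, row_ptr

def spmv_csr(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """y = M @ x, touching only the stored non-zeros."""
    y = [0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]  # indirect read of x via col_idx
    return y
```

Only the non-zero entries are stored and multiplied, and the read `x[col_idx[k]]` is an indirect access whose address depends on a previously loaded index; irregular accesses of this kind are a large part of what makes SpMV hard to accelerate.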

research-article
Open Access
NEM-GNN: DAC/ADC-less, Scalable, Reconfigurable, Graph and Sparsity-Aware Near-Memory Accelerator for Graph Neural Networks
Article No.: 39, Pages 1–26, https://doi.org/10.1145/3652607

Graph neural networks (GNNs) are of great interest in real-life applications such as citation networks and drug discovery owing to GNNs’ ability to apply machine learning techniques on graphs. GNNs utilize a two-step approach to classify the nodes in a ...

research-article
Open Access
xMeta: SSD-HDD-hybrid Optimization for Metadata Maintenance of Cloud-scale Object Storage
Article No.: 40, Pages 1–20, https://doi.org/10.1145/3652606

Object storage has been widely used in the cloud. Traditionally, the size of object metadata is much smaller than that of object data, and thus existing object storage systems (such as Ceph and Oasis) can place object data and metadata, respectively, on ...

research-article
Open Access
Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals
Article No.: 41, Pages 1–25, https://doi.org/10.1145/3652605

Many applications are designed to perform traversals on tree-like data structures. Fusing and parallelizing these traversals enhance the performance of applications. Fusing multiple traversals improves the locality of the application. The runtime of an ...
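
The locality benefit of fusing traversals can be seen in a toy example: two separate recursive passes, one computing a sum and one a maximum, versus a single fused pass that visits each node once and computes both. This is a hand-fused illustration with hypothetical names; it does not reflect Orchard's compiler techniques or its heterogeneous parallelism.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    value: int
    children: List["Node"] = field(default_factory=list)

def tree_sum(n: Node) -> int:
    """First traversal: visits every node once."""
    return n.value + sum(tree_sum(c) for c in n.children)

def tree_max(n: Node) -> int:
    """Second traversal: visits every node again."""
    return max([n.value] + [tree_max(c) for c in n.children])

def fused_sum_max(n: Node) -> Tuple[int, int]:
    """Fused traversal: one visit per node computes both results."""
    s, m = n.value, n.value
    for c in n.children:
        cs, cm = fused_sum_max(c)
        s += cs
        m = max(m, cm)
    return s, m
```

Running `tree_sum` and `tree_max` separately walks the tree twice, while `fused_sum_max` walks it once, so each node is brought into the cache only once; fusing real traversals is harder because their dependences must be checked before they can be merged.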
