TACO: Vol 21, No 2

Volume 21, Issue 2, June 2024

Editor: David Kaeli, Northeastern University, USA

Publisher: Association for Computing Machinery, New York, NY, United States

ISSN: 1544-3566

EISSN: 1544-3973


research-article

Open Access

Highly Efficient Self-checking Matrix Multiplication on Tiled AMX Accelerators

Article No.: 21, Pages 1–22, https://doi.org/10.1145/3633332

General Matrix Multiplication (GEMM) is a computationally expensive operation that is used in many applications such as machine learning. Hardware accelerators are increasingly popular for speeding up GEMM computation, with Tiled Matrix Multiplication (...
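The abstract is truncated before it describes the checking scheme. As a rough illustration of what "self-checking" GEMM can mean, here is a minimal Python sketch of the classic algorithm-based fault tolerance (ABFT) checksum idea: the column sums of C = AB must equal (column sums of A) times B, and the row sums of C must equal A times (row sums of B). This is not necessarily the article's method and ignores AMX tiling entirely; the function names `matmul` and `checksum_verify` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop GEMM: returns C = A @ B."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)]
            for row in a]


def checksum_verify(a: Matrix, b: Matrix, c: Matrix, tol: float = 1e-9) -> bool:
    """ABFT-style check of a claimed product C against checksums of A and B."""
    k, m = len(b), len(b[0])
    # Column-checksum row of A (e^T A), pushed through B: e^T A B.
    a_col = [sum(row[p] for row in a) for p in range(k)]
    expected_col = [sum(a_col[p] * b[p][j] for p in range(k)) for j in range(m)]
    actual_col = [sum(row[j] for row in c) for j in range(m)]
    # Row-checksum column of B (B e), multiplied by A: A B e.
    b_row = [sum(b[p]) for p in range(k)]
    expected_row = [sum(row[p] * b_row[p] for p in range(k)) for row in a]
    actual_row = [sum(row) for row in c]
    ok_cols = all(abs(x - y) <= tol for x, y in zip(expected_col, actual_col))
    ok_rows = all(abs(x - y) <= tol for x, y in zip(expected_row, actual_row))
    return ok_cols and ok_rows
```

A single corrupted element of C breaks both one row checksum and one column checksum, which is what lets ABFT detect (and, with both, locate) the error. With floating-point inputs the tolerance would have to scale with the magnitude of the data; the fixed `tol` here only suits small exact inputs.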

research-article

Open Access

WIPE: A Write-Optimized Learned Index for Persistent Memory

Article No.: 22, Pages 1–25, https://doi.org/10.1145/3634915

A learned index, which uses effective machine learning models to speed up locating positions in sorted data, has gained increasing attention in many big data scenarios. Using efficient learned models, learned indexes build large nodes and flat ...
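To make the basic learned-index idea concrete, here is a minimal Python sketch, assuming unique sorted keys and a single linear model (real learned indexes, including write-optimized designs for persistent memory, use hierarchies of models and handle inserts; none of that is shown). A least-squares line predicts a key's position, and the model's worst-case error bounds the final binary search. The class name `LinearLearnedIndex` is hypothetical.

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Toy learned index over a sorted list of unique keys."""

    def __init__(self, keys: List[int]) -> None:
        self.keys = keys  # assumed sorted, unique, non-empty
        n = len(keys)
        mean_x = sum(keys) / n
        mean_y = (n - 1) / 2
        var_x = sum((k - mean_x) ** 2 for k in keys)
        cov = sum((k - mean_x) * (i - mean_y) for i, k in enumerate(keys))
        # Least-squares fit of position ~ slope * key + intercept.
        self.slope = cov / var_x if var_x else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Largest absolute gap between predicted and true position.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(keys))

    def _predict(self, key: int) -> int:
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key: int) -> Optional[int]:
        """Return the position of key, or None if it is absent."""
        n = len(self.keys)
        pos = self._predict(key)
        # Clamp the search window to [0, n] and keep lo <= hi.
        lo = min(max(0, pos - self.max_err), n)
        hi = min(n, max(lo, pos + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

Because every stored key lies within `max_err` of its prediction, the binary search only has to scan a window of about `2 * max_err + 1` slots instead of the whole array.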

research-article

Open Access

Coherence Attacks and Countermeasures in Interposer-based Chiplet Systems

Article No.: 23, Pages 1–25, https://doi.org/10.1145/3633461

Industry is moving towards large-scale hardware systems that bundle processor cores, memories, accelerators, and other components via 2.5D integration. These components are fabricated separately as chiplets and then integrated using an interposer as an interconnect ...

research-article

Open Access

A Concise Concurrent B⁺-Tree for Persistent Memory

Article No.: 24, Pages 1–25, https://doi.org/10.1145/3638717

Persistent memory (PM) presents a unique opportunity for designing data management systems that offer improved performance, scalability, and instant restart capability. As a widely used data structure for managing data in such systems, B⁺-Tree must ...

research-article

Open Access

An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs

Article No.: 25, Pages 1–26, https://doi.org/10.1145/3639823

Resource-efficient Convolutional Neural Networks (CNNs) are gaining more attention. These CNNs have relatively low computational and memory requirements. A common denominator among such CNNs is having more heterogeneity than traditional CNNs. This ...

research-article

Open Access

Assessing the Impact of Compiler Optimizations on GPUs Reliability

Article No.: 26, Pages 1–22, https://doi.org/10.1145/3638249

Graphics Processing Unit (GPU) compilers have evolved to support general-purpose programming languages for multiple architectures. The NVIDIA CUDA Compiler (NVCC) has many compilation levels before generating the machine code and applies complex ...

research-article

Open Access

Dedicated Hardware Accelerators for Processing of Sparse Matrices and Vectors: A Survey

Article No.: 27, Pages 1–26, https://doi.org/10.1145/3640542

Performance in scientific and engineering applications, such as computational physics, algebraic graph problems, or Convolutional Neural Networks (CNNs), is dominated by the manipulation of large sparse matrices, that is, matrices with a large number of zero elements. ...

research-article

Open Access

An Instruction Inflation Analyzing Framework for Dynamic Binary Translators

Article No.: 28, Pages 1–25, https://doi.org/10.1145/3640813

Dynamic binary translators (DBTs) are widely used to migrate applications between different instruction set architectures (ISAs). Despite extensive research to improve DBT performance, noticeable overhead remains, preventing near-native performance, ...

research-article

Open Access

Cost-aware Service Placement and Scheduling in the Edge-Cloud Continuum

Article No.: 29, Pages 1–24, https://doi.org/10.1145/3640823

The edge-to-data-center computing continuum is the aggregation of computing resources located anywhere between the network edge (e.g., close to 5G antennas) and servers in traditional data centers. Kubernetes is the de facto standard for the ...

research-article

Open Access

Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses

Article No.: 30, Pages 1–26, https://doi.org/10.1145/3641853

Indirect memory accesses (IMAs, i.e., A[f(B[i])]) are typical memory access patterns in applications such as graph analysis, machine learning, and databases. IMAs are composed of producer-consumer pairs, where the consumers’ memory addresses are derived ...
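The pattern A[f(B[i])] can be spelled out in a few lines. The Python sketch below shows the producer load `b[i]` feeding the consumer index `f(b[i])`, plus a deliberately simple, hypothetical model of why such accesses are prefetchable: because the producer stream `b` is sequential, a prefetcher can read `b[i + distance]` early and compute the consumer index it will need `distance` iterations later. This is an assumption-laden illustration, not Tyche's mechanism; `indirect_sum`, `prefetch_schedule`, and `distance` are hypothetical names.

```python
from typing import Callable, List, Tuple


def indirect_sum(a: List[int], b: List[int], f: Callable[[int], int]) -> int:
    """The IMA itself: producer load b[i], consumer load a[f(b[i])]."""
    total = 0
    for i in range(len(b)):
        total += a[f(b[i])]  # consumer address depends on the producer's value
    return total


def prefetch_schedule(b: List[int], f: Callable[[int], int],
                      distance: int) -> List[Tuple[int, int]]:
    """Toy prefetcher model: at iteration i, read ahead in the sequential
    producer stream and prefetch the consumer index for iteration i + distance.
    Returns (issue_iteration, prefetched_index) pairs."""
    return [(i, f(b[i + distance])) for i in range(len(b) - distance)]
```

For example, with `a = [10, 20, 30, 40, 50]`, `b = [4, 0, 2]`, and the identity for `f`, the consumer touches `a[4]`, `a[0]`, `a[2]`, while `prefetch_schedule(b, f, 1)` issues prefetches for indices 0 and 2 one iteration early.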

research-article

Open Access

Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs

Article No.: 31, Pages 1–24, https://doi.org/10.1145/3643682

Convolutional Neural Networks (CNNs) can benefit from the computational reductions provided by the Winograd minimal filtering algorithm and weight pruning. However, harnessing the potential of both methods simultaneously introduces complexity in designing ...
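The computational reduction from Winograd minimal filtering is easiest to see in its smallest 1D case, F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6, once the filter transform is precomputed. The Python sketch below shows only this textbook case; it says nothing about the large tiles or sparsity handling in the article, and the names `direct_f23` and `winograd_f23` are hypothetical.

```python
from typing import List, Sequence


def direct_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Reference: 2 outputs of a 3-tap correlation (6 multiplications)."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Winograd F(2,3): the same 2 outputs with 4 element-wise multiplications."""
    # Filter transform: depends only on the weights, so it can be precomputed.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by the 4 multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]
```

Expanding the products confirms that `m1 + m2 + m3 = d0*g0 + d1*g1 + d2*g2` and `m2 - m3 - m4 = d1*g0 + d2*g1 + d3*g2`. Note that the transformed weights `u0..u3` are generally nonzero even when some `g` entries are pruned to zero, which hints at why combining Winograd with weight pruning complicates accelerator design.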

research-article

Open Access

SLAP: Segmented Reuse-Time-Label Based Admission Policy for Content Delivery Network Caching

Article No.: 32, Pages 1–24, https://doi.org/10.1145/3646550

“Learned” admission policies have shown promise in improving Content Delivery Network (CDN) cache performance and lowering operational costs. Unfortunately, existing learned policies are optimized with a few fixed cache sizes while in reality, cache ...

research-article

Open Access

Architectural Support for Sharing, Isolating and Virtualizing FPGA Resources

Article No.: 33, Pages 1–26, https://doi.org/10.1145/3648475

FPGAs are increasingly popular in cloud environments for their ability to offer on-demand acceleration and improved compute efficiency. Providers would like to increase utilization by multiplexing customers on a single device, similar to how processing ...

research-article

Open Access

FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration

Article No.: 34, Pages 1–27, https://doi.org/10.1145/3649455

DRAM memory is a performance bottleneck for many applications, due to its high access latency. Previous work has mainly focused on data locality, introducing small but fast regions to cache frequently accessed data, thereby reducing the average latency. ...

research-article

Open Access

The Droplet Search Algorithm for Kernel Scheduling

Article No.: 35, Pages 1–28, https://doi.org/10.1145/3650109

Kernel scheduling is the problem of finding the most efficient implementation for a computational kernel. Identifying this implementation involves experimenting with the parameters of compiler optimizations, such as the size of tiling windows and ...

research-article

Open Access

Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces

Article No.: 36, Pages 1–23, https://doi.org/10.1145/3650110

Trace-based simulation is a widely used methodology for system design exploration. It relies on realistic traces that represent a range of behaviors necessary to be evaluated, containing a lot of information about the application, its inputs and the ...

research-article

Open Access

TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture

Article No.: 37, Pages 1–26, https://doi.org/10.1145/3652604

Many real-world networks are characterized by being temporal and dynamic, wherein the temporal information signifies the changes in connections, such as the addition or removal of links between nodes. Employing random walks on these temporal networks is a ...

research-article

Open Access

Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication

Article No.: 38, Pages 1–24, https://doi.org/10.1145/3653020

The multiplication of sparse matrix and vector (SpMV) is one of the most widely used kernels in high-performance computing as well as machine learning acceleration for sparse neural networks. The design space of SpMV accelerators has two axes: algorithm ...
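For readers unfamiliar with the kernel, the Python sketch below shows SpMV over Compressed Sparse Row (CSR), one widely used sparse format in which only nonzeros are stored, together with their column indices and per-row offsets. CSR is chosen here only as a familiar example; it is an assumption that says nothing about which formats or modes Cerberus supports. The function names `dense_to_csr` and `spmv_csr` are hypothetical.

```python
from typing import List, Tuple


def dense_to_csr(m: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense matrix to CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in m:
        for j, v in enumerate(row):
            if v != 0:  # store nonzeros only
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end offset of this row
    return values, col_idx, row_ptr


def spmv_csr(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """Compute y = A @ x where A is given in CSR form."""
    y: List[float] = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        # Nonzeros of row r live in values[row_ptr[r]:row_ptr[r + 1]].
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]  # irregular, indirect read of x
        y.append(acc)
    return y
```

The inner loop's read `x[col_idx[k]]` is itself an indirect access, which is one reason SpMV is memory-bound and a common target for dedicated accelerators.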

research-article

Open Access

NEM-GNN: DAC/ADC-less, Scalable, Reconfigurable, Graph and Sparsity-Aware Near-Memory Accelerator for Graph Neural Networks

Article No.: 39, Pages 1–26, https://doi.org/10.1145/3652607

Graph neural networks (GNNs) are of great interest in real-life applications such as citation networks and drug discovery owing to GNNs’ ability to apply machine learning techniques on graphs. GNNs utilize a two-step approach to classify the nodes in a ...

research-article

Open Access

xMeta: SSD-HDD-hybrid Optimization for Metadata Maintenance of Cloud-scale Object Storage

Article No.: 40, Pages 1–20, https://doi.org/10.1145/3652606

Object storage has been widely used in the cloud. Traditionally, the size of object metadata is much smaller than that of object data, and thus existing object storage systems (such as Ceph and Oasis) can place object data and metadata, respectively, on ...

research-article

Open Access

Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals

Article No.: 41, Pages 1–25, https://doi.org/10.1145/3652605

Many applications are designed to perform traversals on tree-like data structures. Fusing and parallelizing these traversals enhance the performance of applications. Fusing multiple traversals improves the locality of the application. The runtime of an ...
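The locality benefit of traversal fusion can be shown with a toy tree. In the Python sketch below, two independent traversals (a subtree sum and a depth computation) each walk the whole tree, while the fused version computes both in a single pass, so each node is visited once. This is a hand-fused illustration under simple assumptions, not Orchard's compiler technique; `Node`, `total`, `max_depth`, and `fused` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    value: int
    children: List["Node"] = field(default_factory=list)


def total(node: Node) -> int:
    """Traversal 1: sum of all values in the subtree."""
    return node.value + sum(total(c) for c in node.children)


def max_depth(node: Node) -> int:
    """Traversal 2: depth of the subtree (a leaf has depth 1)."""
    return 1 + max((max_depth(c) for c in node.children), default=0)


def fused(node: Node) -> Tuple[int, int]:
    """Both traversals fused: returns (total, max_depth) in one pass."""
    s, d = node.value, 0
    for c in node.children:
        cs, cd = fused(c)  # each child is visited exactly once
        s += cs
        d = max(d, cd)
    return s, d + 1
```

The unfused pair touches every node twice, so on a tree larger than the cache the second pass re-fetches nodes the first pass already loaded; the fused pass does all the work on a node while it is still resident.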

ACM Transactions on Architecture and Code Optimization
