XVDPU: A High-Performance CNN Accelerator on the Versal Platform Powered by the AI Engine
- Xijie Jia,
- Yu Zhang,
- Guangdong Liu,
- Xinlin Yang,
- Tianyu Zhang,
- Jia Zheng,
- Dongdong Xu,
- Zhuohuan Liu,
- Mengke Liu,
- Xiaoyang Yan,
- Hong Wang,
- Rongzhang Zheng,
- Li Wang,
- Dong Li,
- Satyaprakash Pareek,
- Jian Weng,
- Lu Tian,
- Dongliang Xie,
- Hong Luo,
- Yi Shan
Today, convolutional neural networks (CNNs) are widely used in computer vision applications. However, the trend toward higher accuracy and higher resolution produces larger networks. Computation and I/O requirements are the key bottlenecks. In this ...
ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation
Partial Reconfiguration (PR) is a key technique in application design on modern FPGAs. However, current PR tools rely heavily on the developer to manually perform PR module definition, floorplanning, and flow control at a low level. The existing PR ...
GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learning and data ...
The Open-source DeLiBA2 Hardware/Software Framework for Distributed Storage Accelerators
With the trend towards ever larger “big data” applications, many of the gains achievable by using specialized compute accelerators are diminished by growing I/O overheads. While there have been several research efforts into computational ...
Design, Calibration, and Evaluation of Real-time Waveform Matching on an FPGA-based Digitizer at 10 GS/s
Digitizing side-channel signals at high sampling rates produces huge amounts of data, while side-channel analysis techniques only need those specific trace segments containing Cryptographic Operations (COs). For detecting these segments, waveform-matching ...
HyBNN: Quantifying and Optimizing Hardware Efficiency of Binary Neural Networks
Binary neural networks (BNNs), in which both the weights and the activations are represented with one bit, provide an attractive alternative for deploying highly efficient deep learning inference on resource-constrained edge devices. However, our ...
On the Malicious Potential of Xilinx’s Internal Configuration Access Port (ICAP)
Field Programmable Gate Arrays (FPGAs) have become increasingly popular in computing platforms. With recent advances in bitstream format reverse engineering, the scientific community has widely explored static FPGA security threats. For example, it is now ...
Covert-channels in FPGA-enabled SmartSSDs
Cloud computing providers today offer access to a variety of devices, which users can rent and access remotely in a shared setting. Among these devices are SmartSSDs, which are solid-state disks (SSDs) augmented with an FPGA, enabling users to instantiate ...
Across Time and Space: Senju’s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs
Stencil-based applications play an essential role in high-performance systems as they occur in numerous computational areas, such as partial differential equation solving. In this context, Iterative Stencil Loops (ISLs) represent a prominent and well-...
AEKA: FPGA Implementation of Area-Efficient Karatsuba Accelerator for Ring-Binary-LWE-Based Lightweight PQC
Research and development on lightweight PQC have recently gained attention from the research community. Ring-Binary-Learning-with-Errors (RBLWE)-based encryption scheme (RBLWE-ENC), a promising lightweight PQC based on small parameter sets ...
High-efficiency Compressor Trees for Latest AMD FPGAs
High-fan-in dot product computations are ubiquitous in highly relevant application domains, such as signal processing and machine learning. Particularly, the diverse set of data formats used in machine learning poses a challenge for flexible efficient ...
AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming
With the increasing application of machine learning (ML) algorithms in embedded systems, there is a growing need to design low-cost computer arithmetic for these resource-constrained systems. As a result, emerging models of computation, such as ...
ScalaBFS2: A High-performance BFS Accelerator on an HBM-enhanced FPGA Chip
The introduction of High Bandwidth Memory (HBM) to the FPGA chip makes it possible for an FPGA-based accelerator to leverage the huge memory bandwidth of HBM to improve its performance when implementing a specific algorithm, which is especially true for ...
Designing an IEEE-Compliant FPU that Supports Configurable Precision for Soft Processors
Field Programmable Gate Arrays (FPGAs) are commonly used to accelerate floating-point (FP) applications. Although researchers have extensively studied FPGA FP implementations, existing work has largely focused on standalone operators and frequency-...
R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA
Emerging data-driven applications in the embedded, e-Health, and Internet of Things (IoT) domains require complex on-device signal analysis and data reduction to maximize energy efficiency on these energy-constrained devices. Coarse-grained reconfigurable ...
HierCGRA: A Novel Framework for Large-scale CGRA with Hierarchical Modeling and Automated Design Space Exploration
- Sichao Chen,
- Chang Cai,
- Su Zheng,
- Jiangnan Li,
- Guowei Zhu,
- Jingyuan Li,
- Yazhou Yan,
- Yuan Dai,
- Wenbo Yin,
- Lingli Wang
Coarse-grained reconfigurable arrays (CGRAs) are promising design choices in computation-intensive domains, since they can strike a balance between energy efficiency and flexibility. A typical CGRA comprises processing elements (PEs) that can execute ...