Understanding Communication Characteristics of Distributed Training
Communication is pivotal in distributed training and a thorough understanding of its characteristics is essential for future optimizations. However, prior works are limited, either focusing on customized optimizations or conducting incomplete ...
HF^2T: Host-Based Flowlet Fine-Tuning for RDMA Load Balancing
In modern data center networks, RDMA is widely applied in scenarios such as high-performance computing, distributed storage and machine learning. In recent studies, it has been observed that flowlet switching load balancers cannot fully unleash their ...
LubeRDMA: A Fail-safe Mechanism of RDMA
Recent years have witnessed a wide adoption of Remote Direct Memory Access (RDMA) to accelerate distributed systems. As the scale of distributed applications keeps increasing, network failures become more prominent. Although some link/switch failures ...
Rethinking DNS Configuration Verification with a Distributed Architecture
- Yao Wang,
- Kexin Yu,
- Ziyi Wang,
- Kaiqiang Hu,
- Haizhou Du,
- Qiao Xiang,
- Xing Fang,
- Geng Li,
- Ruiting Zhou,
- Linghe Kong,
- Jiwu Shu
DNS misconfiguration can result in severe social and financial consequences. Existing DNS configuration verification tools employ a centralized architecture, where all zone files are collected for verification. This architecture faces significant ...
Rethinking Intra-host Congestion Control in RDMA Networks
RDMA has been widely deployed in production datacenters. Conventional wisdom holds that the intra-host network delivers stable and high performance. However, intra-host resources witness a relative stagnation in technology trends compared to the ...
A Little Certainty is All We Need: Discovery and Synchronization Acceleration in Battery-Free IoT
The vision of sustainable IoT constructed from battery-free devices has attracted ample interest in the research community. Yet, efficient device discovery and synchronization—a fundamental problem in IoT systems—remains a critical challenge mainly due ...
QuarkTable: Building Compact Forwarding Tables for Programmable Switches on Public Clouds
- Jianyuan Lu,
- Huaiyi Zhao,
- Yu Qi,
- Yehao Feng,
- Xinyu Guo,
- Shengru Li,
- Enge Song,
- Xionglie Wei,
- Biao Lyu,
- Rong Wen,
- Shunmin Zhu
Programmable switches have recently been proposed as dataplane solutions for public clouds. However, the conflict between limited on-chip memory and massive forwarding rules in cloud networks hinders large-scale deployment. We argue that building ...
Software-based Live Migration for Containerized RDMA
Container live migration is critical to ensure services are not interrupted during host maintenance in data centers. Meanwhile, RDMA containerization has attracted interest from both academia and industry for years. However, live migration for containerized ...
CMDRL: A Markovian Distributed Rate Limiting Algorithm in Cloud Networks
As cloud networks continue to evolve, network traffic has experienced an exponential increase. The network architecture is progressively adopting a distributed structure to address this challenge. This architecture extensively utilizes technologies like ...
LEFT: LightwEight and FasT packet Reordering for RDMA
RDMA, as a cutting-edge networking technology, has gained extensive adoption in large-scale data centers due to its exceptional characteristics, such as low and stable latency, high throughput and low CPU utilization. However, due to the limited on-chip ...
Cross-Platform Transpilation of Packet-Processing Programs using Program Synthesis
The proliferation of programmable network devices offers a wide range of device options for developers of packet processing programs. However, there are several differences in programming language usage, hardware resource constraints, and hardware ...
Scaling Data Plane Verification via Parallelization
The data plane verification of networks in hyperscale environments is challenging due to the complexity and size of modern networks. In this paper, we introduce Medusa, a novel verifier that efficiently analyzes large data plane models using parallel ...
Revisiting Congestion Control for WiFi Networks
WiFi networks, widely utilized by wireless devices, have become increasingly complex and congested environments, leading to noticeable delays, jitter, and throughput degradation for end-to-end network flows in today’s Internet. Through detailed ...
vSwitchLB: Stratified Load Balancing for vSwitch Efficiency in Data Centers
- Xin Yin,
- Enge Song,
- Ye Yang,
- Yi Wang,
- Bowen Yang,
- Jianyuan Lu,
- Xing Li,
- Biao Lyu,
- Rong Wen,
- Shibo He,
- Yuanchao Shu,
- Shunmin Zhu
The virtual switch (vSwitch) serves as a fundamental element in cloud networks, critical for high-performance and strongly isolated inter-VM forwarding in local and external networks. Similar to other multicore systems, a vSwitch with multiple cores also ...
FlexMem: Proactive Memory Deduplication for Qcow2-Based VMs with Virtual Persistent Memory
Virtualization-based cloud computing has gained popularity owing to its system-isolation security and near-native performance. However, a critical challenge for cloud computing arises from the presence of numerous redundant memory pages across ...
ShieldGPT: An LLM-based Framework for DDoS Mitigation
The constantly evolving Distributed Denial of Service (DDoS) attacks pose a significant threat to the cyber realm, which underscores the importance of DDoS mitigation as a pivotal area of research. While existing AI-driven approaches, including deep ...
An Integrated Solution for High-efficiency In-band Network Telemetry
The advent of in-band network telemetry (INT) facilitates the dynamic and fine-grained monitoring of network conditions. Nevertheless, the current INT specification incurs a substantial measurement overhead, diminishing bandwidth utilization, and ...
Hostmesh: Monitor and Diagnose Networks in Rail-optimized RoCE Clusters
- Kefei Liu,
- Jiao Zhang,
- Zhuo Jiang,
- Xuan Zhang,
- Shixian Guo,
- Yangyang Bai,
- Yongbin Dong,
- Zhang Zhang,
- Xiang Shi,
- Lei Wang,
- Haoran Wei,
- Zicheng Wang,
- Yongchen Pan,
- Tian Pan,
- Tao Huang
RoCE services are sensitive to failures and bottlenecks, which become more common as the RoCE network scales. To effectively detect and locate these problems independent of service traffic, RoCE networks require a monitoring and diagnostic system based ...
Fast Learning Enabled by In-Network Drift Detection
The widespread adoption of Machine Learning (ML) is leading to an increase in processing demands. Dealing with the growing volume of data poses a significant challenge in providing accurate classification services using ML models. Offloading ML tasks to ...
TAR: Traffic Adaptive IPv6 Routing Lookup Scheme
IP lookup aims at identifying the longest prefix match within routing tables to determine the forwarding path for packets. The increase in IPv6 address length makes IP lookup particularly difficult, requiring more efficient lookup mechanisms to handle ...
SAROS: A Self-Adaptive Routing Oblivious Sampling Method for Network-wide Heavy Hitter Detection
Network-wide heavy hitter detection is usually performed by sampling on several network measurement points (NMPs) and merging the measurement results in the centralized controller to get a network-wide view. However, a packet may pass several NMPs and ...
OpenSN: An Open Source Library for Emulating LEO Satellite Networks
Low-earth-orbit (LEO) satellite constellations (e.g., Starlink) are becoming a necessary component of the future Internet. Studies on LEO satellite networking have been increasing. A crucial problem is how to evaluate these studies in a ...
Pyramis: Domain Specific Language for Developing Multi-tier Systems
- Ashwin Kumar,
- Ajinkya Tanksale,
- Armaan Chowfin,
- Mohan Rajasekhar Ajjampudi,
- Arnav Mishra,
- Abuhujair Khan,
- Vishal Saha,
- Priyanka Naik,
- Mythili Vutukuru
Text-based specifications are the de-facto standard for specifying complex multi-tier systems. For example, 3GPP specifications define various interfaces, messages, and message processing at the multiple inter-connected nodes of a 5G system. These ...
UniFL: Enabling Loss-tolerant Transmission in Federated Learning
As Distributed Deep Learning (DDL) gains prominence, network constraints have emerged as a critical bottleneck impacting DDL performance. While state-of-the-art loss-tolerant (LT) transmission protocols enhance DDL efficiency, their application in ...
Acceptance Rates
| Year | Submitted | Accepted | Rate |
|---|---|---|---|
| APNet '24 | 118 | 50 | 42% |
| Overall | 118 | 50 | 42% |