research-article
High Fidelity Makeup via 2D and 3D Identity Preservation Net
Article No.: 230, Pages 1–24 https://doi.org/10.1145/3656475

In this article, we address the challenging makeup transfer task, aiming to transfer makeup from a reference image to a source image while preserving facial geometry and background consistency. Existing deep neural network-based methods have shown ...

research-article
Real-Time Attentive Dilated U-Net for Extremely Dark Image Enhancement
Article No.: 231, Pages 1–19 https://doi.org/10.1145/3654668

Images taken under low-light conditions suffer from poor visibility, color distortion, and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the ...

research-article
Inter-camera Identity Discrimination for Unsupervised Person Re-identification
Article No.: 232, Pages 1–18 https://doi.org/10.1145/3652858

Unsupervised person re-identification (Re-ID) has garnered significant attention because of its data-friendly nature, as it does not require labeled data. Existing approaches primarily address this challenge by employing feature-clustering techniques to ...

research-article
Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos
Article No.: 233, Pages 1–23 https://doi.org/10.1145/3657295

Social interaction is a common phenomenon in human societies. Different from discovering groups based on the similarity of individuals’ actions, social interaction focuses more on the mutual influence between people. Although people can easily judge ...

research-article
SigFormer: Sparse Signal-guided Transformer for Multi-modal Action Segmentation
Article No.: 234, Pages 1–22 https://doi.org/10.1145/3657296

Multi-modal human action segmentation is a critical and challenging task with a wide range of applications. Nowadays, the majority of approaches concentrate on the fusion of dense signals (i.e., RGB, optical flow, and depth maps). However, the potential ...

research-article
DBGAN: Dual Branch Generative Adversarial Network for Multi-Modal MRI Translation
Article No.: 235, Pages 1–22 https://doi.org/10.1145/3657298

Existing magnetic resonance imaging translation models rely on generative adversarial networks, primarily employing simple convolutional neural networks. Unfortunately, these networks struggle to capture global representations and contextual relationships ...

research-article
Bridging the Domain Gap in Scene Flow Estimation via Hierarchical Smoothness Refinement
Article No.: 236, Pages 1–21 https://doi.org/10.1145/3661823

This article introduces SmoothFlowNet3D, an innovative encoder-decoder architecture specifically designed for bridging the domain gap in scene flow estimation. To achieve this goal, SmoothFlowNet3D divides the scene flow estimation task into two stages: ...

research-article
Integrated Sensing, Communication, and Computing for Cost-effective Multimodal Federated Perception
Article No.: 237, Pages 1–28 https://doi.org/10.1145/3661313

Federated learning (FL) is a prominent paradigm of 6G edge intelligence (EI), which mitigates privacy breaches and high communication pressure caused by conventional centralized model training in the artificial intelligence of things (AIoT). The execution ...

research-article
Open Access
Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement
Article No.: 238, Pages 1–21 https://doi.org/10.1145/3661824

Learned video compression has drawn great attention and shown promising compression performance recently. In this article, we focus on the two components in the learned video compression framework, the conditional entropy model and quality enhancement ...

research-article
Recurrent Appearance Flow for Occlusion-Free Virtual Try-On
Article No.: 239, Pages 1–17 https://doi.org/10.1145/3659581

Image-based virtual try-on aims at transferring a target in-shop garment onto a reference person, and has garnered significant attention from the research communities recently. However, previous methods have faced severe challenges in handling occlusion ...

research-article
InteractNet: Social Interaction Recognition for Semantic-rich Videos
Article No.: 240, Pages 1–21 https://doi.org/10.1145/3663668

The overwhelming surge of online video platforms has raised an urgent need for social interaction recognition techniques. Compared with simple short-term actions, long-term social interactions in semantic-rich videos could reflect more complicated ...

research-article
Exploration of Speech and Music Information for Movie Genre Classification
Article No.: 241, Pages 1–19 https://doi.org/10.1145/3664197

Movie genre prediction from trailers is mostly attempted in a multi-modal manner. However, the characteristics of movie trailer audio indicate that this modality alone might be highly effective in genre prediction. Movie trailer audio predominantly ...

research-article
Open Access
Towards Retrieval-Augmented Architectures for Image Captioning
Article No.: 242, Pages 1–22 https://doi.org/10.1145/3663667

The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images. In recent years, researchers have leveraged deep ...

research-article
Progressive Adapting and Pruning: Domain-Incremental Learning for Saliency Prediction
Article No.: 243, Pages 1–243 https://doi.org/10.1145/3661312

Saliency prediction (SAP) plays a crucial role in simulating the visual perception function of human beings. In practical situations, humans can quickly grasp saliency extraction in new image domains. However, current SAP methods mainly concentrate on ...

research-article
High Efficiency Deep-learning Based Video Compression
Article No.: 244, Pages 1–23 https://doi.org/10.1145/3661311

Although deep learning techniques have achieved significant improvements in image compression, their advantages have not been fully explored in video compression, so the performance of deep-learning-based video compression (DLVC) is obviously ...

research-article
Open Access
AGAR - Attention Graph-RNN for Adaptative Motion Prediction of Point Clouds of Deformable Objects
Article No.: 245, Pages 1–25 https://doi.org/10.1145/3662183

This article focuses on motion prediction for point cloud sequences in the challenging case of deformable 3D objects, such as human body motion. First, we investigate the challenges caused by deformable shapes and complex motions present in this type of ...

research-article
UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet
Article No.: 246, Pages 1–28 https://doi.org/10.1145/3660638

Referring expression comprehension aims to align natural language queries with visual scenes, which requires establishing fine-grained correspondence between vision and language. This has important applications in multi-modal reasoning systems. Existing ...

research-article
Blind Quality Assessment of Dense 3D Point Clouds with Structure Guided Resampling
Article No.: 247, Pages 1–21 https://doi.org/10.1145/3664199

Objective quality assessment of three-dimensional (3D) point clouds is essential for the development of immersive multimedia systems in real-world applications. Despite the success of perceptual quality evaluation for 2D images and videos, blind/no-...

research-article
Expanding-Window Zigzag Decodable Fountain Codes for Scalable Multimedia Transmission
Article No.: 248, Pages 1–24 https://doi.org/10.1145/3664610

In this article, we present a coding method called expanding-window zigzag decodable fountain code with unequal error protection property (EWF-ZD UEP code) to achieve scalable multimedia transmission. The key idea of the EWF-ZD UEP code is to utilize bit-...

research-article
Unbiased Semantic Representation Learning Based on Causal Disentanglement for Domain Generalization
Article No.: 249, Pages 1–20 https://doi.org/10.1145/3659953

Domain generalization primarily mitigates domain shift among multiple source domains, generalizing the trained model to an unseen target domain. However, the spurious correlation usually caused by context prior (e.g., background) makes it challenging to ...

research-article
Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning
Article No.: 250, Pages 1–19 https://doi.org/10.1145/3663570

Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. Self-supervised methods, ...

research-article
Detail-preserving Joint Image Upsampling
Article No.: 251, Pages 1–23 https://doi.org/10.1145/3665246

Image operators can be instrumental to computational imaging and photography. However, many of them are computationally intensive. In this article, we propose an effective yet efficient joint upsampling method to accelerate various image operators. We ...

research-article
Online Cross-modal Hashing With Dynamic Prototype
Article No.: 252, Pages 1–18 https://doi.org/10.1145/3665249

Online cross-modal hashing has received increasing attention due to its efficiency and effectiveness in handling cross-modal streaming data retrieval. Despite the promising performance, these methods mainly focus on the supervised learning paradigm, ...

research-article
SNIPPET: A Framework for Subjective Evaluation of Visual Explanations Applied to DeepFake Detection
Article No.: 253, Pages 1–29 https://doi.org/10.1145/3665248

Explainable Artificial Intelligence (XAI) attempts to help humans understand machine learning decisions better and has been identified as a critical component toward increasing the trustworthiness of complex black-box systems, such as deep neural ...

research-article
Illumination-Aware Low-Light Image Enhancement with Transformer and Auto-Knee Curve
Article No.: 254, Pages 1–23 https://doi.org/10.1145/3664653

Images captured under low-light conditions suffer from several combined degradation factors, including low brightness, low contrast, noise, and color bias. Many learning-based techniques attempt to learn the low-to-clear mapping between low-light and ...

research-article
Open Access
Multiple Image Distortion DNN Modeling Individual Subject Quality Assessment
Article No.: 255, Pages 1–27 https://doi.org/10.1145/3664198

A recent research direction is focused on training Deep Neural Networks (DNNs) to replicate individual subject assessments of media quality. These DNNs are referred to as Artificial Intelligence-based Observers (AIOs). An AIO is designed to simulate, in ...

research-article
HKA: A Hierarchical Knowledge Alignment Framework for Multimodal Knowledge Graph Completion
Article No.: 256, Pages 1–19 https://doi.org/10.1145/3664288

Recent years have witnessed the successful application of knowledge graph techniques in structured data processing, while how to incorporate knowledge from visual and textual modalities into knowledge graphs has been given less attention. To better ...

research-article
Multi Fine-Grained Fusion Network for Depression Detection
Article No.: 257, Pages 1–23 https://doi.org/10.1145/3665247

Depression is an illness that involves emotional and mental health. Currently, interviews are the most popular means of detecting depression. With the advancement of natural language processing and sentiment analysis, automated interview-based ...

survey
Color Transfer for Images: A Survey
Article No.: 258, Pages 1–29 https://doi.org/10.1145/3635152

High-quality image generation is an important topic in digital visualization. As a sub-topic of this research, color transfer aims to produce a high-quality image with an ideal color scheme learned from a reference image. In this article, we investigate the ...

survey
Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective
Article No.: 259, Pages 1–27 https://doi.org/10.1145/3651308

Visual tracking is a fundamental task in computer vision with significant practical applications in various domains, including surveillance, security, robotics, and human-computer interaction. However, it may face limitations in visible light data, such ...
