TOMM: Vol 20, No 8

Volume 20, Issue 8

August 2024

Publisher: Association for Computing Machinery, New York, NY, United States

ISSN: 1551-6857

EISSN: 1551-6865

research-article

High Fidelity Makeup via 2D and 3D Identity Preservation Net

Article No.: 230, Pages 1–24, https://doi.org/10.1145/3656475

In this article, we address the challenging makeup transfer task, aiming to transfer makeup from a reference image to a source image while preserving facial geometry and background consistency. Existing deep neural network-based methods have shown ...

research-article

Real-Time Attentive Dilated U-Net for Extremely Dark Image Enhancement

Article No.: 231, Pages 1–19, https://doi.org/10.1145/3654668

Images taken under low-light conditions suffer from poor visibility, color distortion, and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the ...

research-article

Inter-camera Identity Discrimination for Unsupervised Person Re-identification

Article No.: 232, Pages 1–18, https://doi.org/10.1145/3652858

Unsupervised person re-identification (Re-ID) has garnered significant attention because of its data-friendly nature, as it does not require labeled data. Existing approaches primarily address this challenge by employing feature-clustering techniques to ...

research-article

Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos

Article No.: 233, Pages 1–23, https://doi.org/10.1145/3657295

Social interaction is a common phenomenon in human societies. Different from discovering groups based on the similarity of individuals’ actions, social interaction focuses more on the mutual influence between people. Although people can easily judge ...

research-article

SigFormer: Sparse Signal-guided Transformer for Multi-modal Action Segmentation

Article No.: 234, Pages 1–22, https://doi.org/10.1145/3657296

Multi-modal human action segmentation is a critical and challenging task with a wide range of applications. Nowadays, the majority of approaches concentrate on the fusion of dense signals (i.e., RGB, optical flow, and depth maps). However, the potential ...

research-article

DBGAN: Dual Branch Generative Adversarial Network for Multi-Modal MRI Translation

Article No.: 235, Pages 1–22, https://doi.org/10.1145/3657298

Existing magnetic resonance imaging translation models rely on generative adversarial networks, primarily employing simple convolutional neural networks. Unfortunately, these networks struggle to capture global representations and contextual relationships ...

research-article

Bridging the Domain Gap in Scene Flow Estimation via Hierarchical Smoothness Refinement

Article No.: 236, Pages 1–21, https://doi.org/10.1145/3661823

This article introduces SmoothFlowNet3D, an innovative encoder-decoder architecture specifically designed for bridging the domain gap in scene flow estimation. To achieve this goal, SmoothFlowNet3D divides the scene flow estimation task into two stages: ...

research-article

Integrated Sensing, Communication, and Computing for Cost-effective Multimodal Federated Perception

Article No.: 237, Pages 1–28, https://doi.org/10.1145/3661313

Federated learning (FL) is a prominent paradigm of 6G edge intelligence (EI), which mitigates privacy breaches and high communication pressure caused by conventional centralized model training in the artificial intelligence of things (AIoT). The execution ...

research-article

Open Access

Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement

Article No.: 238, Pages 1–21, https://doi.org/10.1145/3661824

Learned video compression has drawn great attention and shown promising compression performance recently. In this article, we focus on the two components in the learned video compression framework, the conditional entropy model and quality enhancement ...

research-article

Recurrent Appearance Flow for Occlusion-Free Virtual Try-On

Article No.: 239, Pages 1–17, https://doi.org/10.1145/3659581

Image-based virtual try-on aims at transferring a target in-shop garment onto a reference person, and has garnered significant attention from the research communities recently. However, previous methods have faced severe challenges in handling occlusion ...

research-article

InteractNet: Social Interaction Recognition for Semantic-rich Videos

Article No.: 240, Pages 1–21, https://doi.org/10.1145/3663668

The overwhelming surge of online video platforms has raised an urgent need for social interaction recognition techniques. Compared with simple short-term actions, long-term social interactions in semantic-rich videos could reflect more complicated ...

research-article

Exploration of Speech and Music Information for Movie Genre Classification

Article No.: 241, Pages 1–19, https://doi.org/10.1145/3664197

Movie genre prediction from trailers is mostly attempted in a multi-modal manner. However, the characteristics of movie trailer audio indicate that this modality alone might be highly effective in genre prediction. Movie trailer audio predominantly ...

research-article

Open Access

Towards Retrieval-Augmented Architectures for Image Captioning

Article No.: 242, Pages 1–22, https://doi.org/10.1145/3663667

The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images. In recent years, researchers have leveraged deep ...

research-article

Progressive Adapting and Pruning: Domain-Incremental Learning for Saliency Prediction

Article No.: 243, Pages 1–243, https://doi.org/10.1145/3661312

Saliency prediction (SAP) plays a crucial role in simulating the visual perception function of human beings. In practical situations, humans can quickly grasp saliency extraction in new image domains. However, current SAP methods mainly concentrate on ...

research-article

High Efficiency Deep-learning Based Video Compression

Article No.: 244, Pages 1–23, https://doi.org/10.1145/3661311

Although deep learning techniques have achieved significant improvements in image compression, their advantages are not fully explored in video compression, which leads to the performance of deep-learning-based video compression (DLVC) is obviously ...

research-article

Open Access

AGAR - Attention Graph-RNN for Adaptative Motion Prediction of Point Clouds of Deformable Objects

Article No.: 245, Pages 1–25, https://doi.org/10.1145/3662183

This article focuses on motion prediction for point cloud sequences in the challenging case of deformable 3D objects, such as human body motion. First, we investigate the challenges caused by deformable shapes and complex motions present in this type of ...

research-article

UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet

Article No.: 246, Pages 1–28, https://doi.org/10.1145/3660638

Referring expression comprehension aims to align natural language queries with visual scenes, which requires establishing fine-grained correspondence between vision and language. This has important applications in multi-modal reasoning systems. Existing ...

research-article

Blind Quality Assessment of Dense 3D Point Clouds with Structure Guided Resampling

Article No.: 247, Pages 1–21, https://doi.org/10.1145/3664199

Objective quality assessment of three-dimensional (3D) point clouds is essential for the development of immersive multimedia systems in real-world applications. Despite the success of perceptual quality evaluation for 2D images and videos, blind/no-...

research-article

Expanding-Window Zigzag Decodable Fountain Codes for Scalable Multimedia Transmission

Article No.: 248, Pages 1–24, https://doi.org/10.1145/3664610

In this article, we present a coding method called expanding-window zigzag decodable fountain code with unequal error protection property (EWF-ZD UEP code) to achieve scalable multimedia transmission. The key idea of the EWF-ZD UEP code is to utilize bit-...

research-article

Unbiased Semantic Representation Learning Based on Causal Disentanglement for Domain Generalization

Article No.: 249, Pages 1–20, https://doi.org/10.1145/3659953

Domain generalization primarily mitigates domain shift among multiple source domains, generalizing the trained model to an unseen target domain. However, the spurious correlation usually caused by context prior (e.g., background) makes it challenging to ...

research-article

Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning

Article No.: 250, Pages 1–19, https://doi.org/10.1145/3663570

Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. Self-supervised methods, ...

research-article

Detail-preserving Joint Image Upsampling

Article No.: 251, Pages 1–23, https://doi.org/10.1145/3665246

Image operators can be instrumental to computational imaging and photography. However, many of them are computationally intensive. In this article, we propose an effective yet efficient joint upsampling method to accelerate various image operators. We ...

research-article

Online Cross-modal Hashing With Dynamic Prototype

Article No.: 252, Pages 1–18, https://doi.org/10.1145/3665249

Online cross-modal hashing has received increasing attention due to its efficiency and effectiveness in handling cross-modal streaming data retrieval. Despite the promising performance, these methods mainly focus on the supervised learning paradigm, ...

research-article

SNIPPET: A Framework for Subjective Evaluation of Visual Explanations Applied to DeepFake Detection

Article No.: 253, Pages 1–29, https://doi.org/10.1145/3665248

Explainable Artificial Intelligence (XAI) attempts to help humans understand machine learning decisions better and has been identified as a critical component toward increasing the trustworthiness of complex black-box systems, such as deep neural ...

research-article

Illumination-Aware Low-Light Image Enhancement with Transformer and Auto-Knee Curve

Article No.: 254, Pages 1–23, https://doi.org/10.1145/3664653

Images captured under low-light conditions suffer from several combined degradation factors, including low brightness, low contrast, noise, and color bias. Many learning-based techniques attempt to learn the low-to-clear mapping between low-light and ...

research-article

Open Access

Multiple Image Distortion DNN Modeling Individual Subject Quality Assessment

Article No.: 255, Pages 1–27, https://doi.org/10.1145/3664198

A recent research direction is focused on training Deep Neural Networks (DNNs) to replicate individual subject assessments of media quality. These DNNs are referred to as Artificial Intelligence-based Observers (AIOs). An AIO is designed to simulate, in ...

research-article

HKA: A Hierarchical Knowledge Alignment Framework for Multimodal Knowledge Graph Completion

Article No.: 256, Pages 1–19, https://doi.org/10.1145/3664288

Recent years have witnessed the successful application of knowledge graph techniques in structured data processing, while how to incorporate knowledge from visual and textual modalities into knowledge graphs has been given less attention. To better ...

research-article

Multi Fine-Grained Fusion Network for Depression Detection

Article No.: 257, Pages 1–23, https://doi.org/10.1145/3665247

Depression is an illness that involves emotional and mental health. Currently, depression detection through interviews is the most popular way. With the advancement of natural language processing and sentiment analysis, automated interview-based ...

survey

Color Transfer for Images: A Survey

Article No.: 258, Pages 1–29, https://doi.org/10.1145/3635152

High-quality image generation is an important topic in digital visualization. As a sub-topic of the research, color transfer is to produce a high-quality image with ideal color scheme learned from the reference one. In this article, we investigate the ...

survey

Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective

Article No.: 259, Pages 1–27, https://doi.org/10.1145/3651308

Visual tracking is a fundamental task in computer vision with significant practical applications in various domains, including surveillance, security, robotics, and human-computer interaction. However, it may face limitations in visible light data, such ...

ACM Transactions on Multimedia Computing, Communications, and Applications
