High Fidelity Makeup via 2D and 3D Identity Preservation Net
In this article, we address the challenging makeup transfer task, aiming to transfer makeup from a reference image to a source image while preserving facial geometry and background consistency. Existing deep neural network-based methods have shown ...
Real-Time Attentive Dilated U-Net for Extremely Dark Image Enhancement
Images taken under low-light conditions suffer from poor visibility, color distortion, and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the ...
Inter-camera Identity Discrimination for Unsupervised Person Re-identification
Unsupervised person re-identification (Re-ID) has garnered significant attention because of its data-friendly nature, as it does not require labeled data. Existing approaches primarily address this challenge by employing feature-clustering techniques to ...
Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos
Social interaction is a common phenomenon in human societies. Different from discovering groups based on the similarity of individuals’ actions, social interaction focuses more on the mutual influence between people. Although people can easily judge ...
SigFormer: Sparse Signal-guided Transformer for Multi-modal Action Segmentation
Multi-modal human action segmentation is a critical and challenging task with a wide range of applications. Nowadays, the majority of approaches concentrate on the fusion of dense signals (i.e., RGB, optical flow, and depth maps). However, the potential ...
DBGAN: Dual Branch Generative Adversarial Network for Multi-Modal MRI Translation
Existing magnetic resonance imaging translation models rely on generative adversarial networks, primarily employing simple convolutional neural networks. Unfortunately, these networks struggle to capture global representations and contextual relationships ...
Bridging the Domain Gap in Scene Flow Estimation via Hierarchical Smoothness Refinement
This article introduces SmoothFlowNet3D, an innovative encoder-decoder architecture specifically designed for bridging the domain gap in scene flow estimation. To achieve this goal, SmoothFlowNet3D divides the scene flow estimation task into two stages: ...
Integrated Sensing, Communication, and Computing for Cost-effective Multimodal Federated Perception
Ning Chen, Zhipeng Cheng, Xuwei Fan, Zhang Liu, Bangzhen Huang, Yifeng Zhao, Lianfen Huang, Xiaojiang Du, Mohsen Guizani
Federated learning (FL) is a prominent paradigm of 6G edge intelligence (EI), which mitigates privacy breaches and high communication pressure caused by conventional centralized model training in the artificial intelligence of things (AIoT). The execution ...
Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement
Learned video compression has drawn great attention and shown promising compression performance recently. In this article, we focus on the two components in the learned video compression framework, the conditional entropy model and quality enhancement ...
Recurrent Appearance Flow for Occlusion-Free Virtual Try-On
Image-based virtual try-on aims at transferring a target in-shop garment onto a reference person, and has garnered significant attention from the research communities recently. However, previous methods have faced severe challenges in handling occlusion ...
InteractNet: Social Interaction Recognition for Semantic-rich Videos
The overwhelming surge of online video platforms has raised an urgent need for social interaction recognition techniques. Compared with simple short-term actions, long-term social interactions in semantic-rich videos could reflect more complicated ...
Exploration of Speech and Music Information for Movie Genre Classification
Movie genre prediction from trailers is mostly attempted in a multi-modal manner. However, the characteristics of movie trailer audio indicate that this modality alone might be highly effective in genre prediction. Movie trailer audio predominantly ...
Towards Retrieval-Augmented Architectures for Image Captioning
The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images. In recent years, researchers have leveraged deep ...
Progressive Adapting and Pruning: Domain-Incremental Learning for Saliency Prediction
Saliency prediction (SAP) plays a crucial role in simulating the visual perception function of human beings. In practical situations, humans can quickly grasp saliency extraction in new image domains. However, current SAP methods mainly concentrate on ...
High Efficiency Deep-learning Based Video Compression
Although deep learning techniques have achieved significant improvements in image compression, their advantages are not fully explored in video compression, so the performance of deep-learning-based video compression (DLVC) is obviously ...
UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet
Jiabo Ye, Junfeng Tian, Ming Yan, Haiyang Xu, Qinghao Ye, Yaya Shi, Xiaoshan Yang, Xuwu Wang, Ji Zhang, Liang He, Xin Lin
Referring expression comprehension aims to align natural language queries with visual scenes, which requires establishing fine-grained correspondence between vision and language. This has important applications in multi-modal reasoning systems. Existing ...
Blind Quality Assessment of Dense 3D Point Clouds with Structure Guided Resampling
Objective quality assessment of three-dimensional (3D) point clouds is essential for the development of immersive multimedia systems in real-world applications. Despite the success of perceptual quality evaluation for 2D images and videos, blind/no-...
Expanding-Window Zigzag Decodable Fountain Codes for Scalable Multimedia Transmission
In this article, we present a coding method called expanding-window zigzag decodable fountain code with unequal error protection property (EWF-ZD UEP code) to achieve scalable multimedia transmission. The key idea of the EWF-ZD UEP code is to utilize bit-...
Unbiased Semantic Representation Learning Based on Causal Disentanglement for Domain Generalization
Domain generalization primarily mitigates domain shift among multiple source domains, generalizing the trained model to an unseen target domain. However, the spurious correlation usually caused by context prior (e.g., background) makes it challenging to ...
Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning
Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. Self-supervised methods, ...
Detail-preserving Joint Image Upsampling
Image operators can be instrumental to computational imaging and photography. However, many of them are computationally intensive. In this article, we propose an effective yet efficient joint upsampling method to accelerate various image operators. We ...
Online Cross-modal Hashing With Dynamic Prototype
Online cross-modal hashing has received increasing attention due to its efficiency and effectiveness in handling cross-modal streaming data retrieval. Despite the promising performance, these methods mainly focus on the supervised learning paradigm, ...
SNIPPET: A Framework for Subjective Evaluation of Visual Explanations Applied to DeepFake Detection
Explainable Artificial Intelligence (XAI) attempts to help humans understand machine learning decisions better and has been identified as a critical component toward increasing the trustworthiness of complex black-box systems, such as deep neural ...
Illumination-Aware Low-Light Image Enhancement with Transformer and Auto-Knee Curve
Images captured under low-light conditions suffer from several combined degradation factors, including low brightness, low contrast, noise, and color bias. Many learning-based techniques attempt to learn the low-to-clear mapping between low-light and ...
Multiple Image Distortion DNN Modeling Individual Subject Quality Assessment
Lohic Fotio Tiotsop, Antonio Servetti, Peter Pocta, Glenn Van Wallendael, Marcus Barkowsky, Enrico Masala
A recent research direction is focused on training Deep Neural Networks (DNNs) to replicate individual subject assessments of media quality. These DNNs are referred to as Artificial Intelligence-based Observers (AIOs). An AIO is designed to simulate, in ...
HKA: A Hierarchical Knowledge Alignment Framework for Multimodal Knowledge Graph Completion
Recent years have witnessed the successful application of knowledge graph techniques in structured data processing, whereas incorporating knowledge from visual and textual modalities into knowledge graphs has received less attention. To better ...
Multi Fine-Grained Fusion Network for Depression Detection
Depression is an illness that involves emotional and mental health. Currently, interviews are the most popular means of depression detection. With the advancement of natural language processing and sentiment analysis, automated interview-based ...
Color Transfer for Images: A Survey
High-quality image generation is an important topic in digital visualization. As a sub-topic of this research, color transfer aims to produce a high-quality image with an ideal color scheme learned from a reference image. In this article, we investigate the ...
Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective
Visual tracking is a fundamental task in computer vision with significant practical applications in various domains, including surveillance, security, robotics, and human-computer interaction. However, it may face limitations in visible light data, such ...