22-24 October 2025, Singapore

Day 1: Wednesday, 22 Oct 2025 - Overview |
08:00-10:00 |
D1-0800_IB Exhibition Setup Location: Island Ballroom |
08:00-11:30 |
D1-0800_L1 Tutorial 1: Recent Advances in End-to-End Learned Image and Video Coding Location: Lotus I |
08:00-11:30 |
D1-0800_L2 Tutorial 2: Deep Speaker Modeling: Theories, Applications and Practice Location: Lotus II |
08:00-11:30 |
D1-0800_H3 Tutorial 3: From Detection to Direction: An Overview of Sound Event Localization and Detection Location: Hibiscus III |
08:00-11:30 |
D1-0800_P1 Tutorial 4: Adaptive Sensor Networks in Digital Health Location: Peony I |
10:00-11:30 |
D1-1000_IB Registration Location: Island Ballroom |
11:30-13:30 |
D1-1130_IB Welcome Reception & Opening Ceremony Location: Island Ballroom |
13:30-14:30 |
D1-1330_IB Keynote 1 by Emanuël Habets Location: Island Ballroom |
14:30-16:00 |
D1-1430_IB Perspective 1: Voice Privacy & Security Location: Island Ballroom |
14:30-16:00 |
D1-1430_L1 Advanced Signal Processing and Machine Learning for Acoustic Scene Analysis and Signal Enhancement Location: Lotus I |
14:30-16:00 |
D1-1430_L2 Interactive, Natural, Expressive and Robust Conversation System Location: Lotus II |
14:30-16:00 |
D1-1430_H3 Generative AI Models for Vision-Based Applications Location: Hibiscus III |
14:30-16:00 |
D1-1430_P1 Multimedia Security & Forensics Location: Peony I |
14:30-16:00 |
D1-1430_P2 Wireless Communications & Networking Location: Peony II |
16:00-16:30 |
Break |
16:30-18:00 |
D1-1630_IB Perspective 2: Utilization of the Foundation Models and the Future Location: Island Ballroom |
16:30-18:00 |
D1-1630_L1 Tracing the Fake: Deepfake Detection, Attribution & Spoof-aware Speaker Verification Across Languages & Accents Location: Lotus I |
16:30-18:00 |
D1-1630_L2 Speaker Modeling Beyond Speaker Recognition Location: Lotus II |
16:30-18:00 |
D1-1630_H3 Three-Minute Thesis (3MT) Competition Location: Hibiscus III |
16:30-18:00 |
D1-1630_P2 Advanced Multimedia Applications Location: Peony II |
Day 1: Wednesday, 22 Oct 2025 - With Papers |
08:00-10:00 |
D1-0800_IB Exhibition Setup Location: Island Ballroom |
08:00-11:30 |
D1-0800_L1 Tutorial 1: Recent Advances in End-to-End Learned Image and Video Coding Location: Lotus I |
08:00-11:30 |
D1-0800_L2 Tutorial 2: Deep Speaker Modeling: Theories, Applications and Practice Location: Lotus II |
08:00-11:30 |
D1-0800_H3 Tutorial 3: From Detection to Direction: An Overview of Sound Event Localization and Detection Location: Hibiscus III |
08:00-11:30 |
D1-0800_P1 Tutorial 4: Adaptive Sensor Networks in Digital Health Location: Peony I |
10:00-11:30 |
D1-1000_IB Registration Location: Island Ballroom |
11:30-13:30 |
D1-1130_IB Welcome Reception & Opening Ceremony Location: Island Ballroom |
13:30-14:30 |
D1-1330_IB Keynote 1 by Emanuël Habets Location: Island Ballroom |
14:30-16:00 |
D1-1430_IB Perspective 1: Voice Privacy & Security Location: Island Ballroom
D1-1430_IB.1 645 Speaker Privacy and Security in the Big Data Era: Protection and Defense against Deepfake Liping Chen, Kong Aik Lee, Zhen-Hua Ling, Xin Wang, Rohan Kumar Das, Tomoki Toda, Haizhou Li |
14:30-16:00 |
D1-1430_L1 Advanced Signal Processing and Machine Learning for Acoustic Scene Analysis and Signal Enhancement Location: Lotus I
D1-1430_L1.1 51 MVDR beamforming for underdetermined sound source separation using iterative PSD estimation in beamspace Jin Xuan Teh, Yusuke Hioka
D1-1430_L1.2 184 Exploring Dual-Mode Training for Real-time Target Speaker Extraction Li Li, Shogo Seki
D1-1430_L1.3 186 Switching Constant Separating Vector for Moving Source Extraction with Geometric Constraints Changda Chen, Yichen Yang, Yuehao Zhao, Shoji Makino, Jingdong Chen
D1-1430_L1.4 201 Neural Network-Assisted Joint DOA Estimation and Beamforming with First-Order Reflection Modeling Yichen Yang, Chao Pan, Qiang Gao, Jacob Benesty, Shoji Makino, Jingdong Chen
D1-1430_L1.5 234 Speaker Localization in Classroom Environments Using GCC-PHAT Features and Mamba State Space Models with Ad-hoc Microphone Arrays Rashed Iqbal, Christian Ritz, Jack Yang, Sarah Howard
D1-1430_L1.6 293 Joint Separation and Tracking of Moving Sources With Distributed Microphone Arrays Based on Time-Varying Inertial Spatial Models Ryunosuke Nihei, Yoshiaki Bando, Aditya Arie Nugraha, Diego Di Carlo, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii
D1-1430_L1.7 418 Visually-Informed Multichannel Sound Source Separation Based on 3D Gaussian Primitives Haruki Asano, Ryunosuke Nihei, Yoshiaki Bando, Aditya Arie Nugraha, Diego Di Carlo, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii
D1-1430_L1.8 439 Joint Optimization of Sampling Rate Offsets and Demixing Filters Using Auxiliary Function Method Hayato Takeuchi, Takao Kawamura, Nobutaka Ono, Shoko Araki
D1-1430_L1.9 442 First Demonstration of Acoustic Scene Classification Based on Trained Sound-to-Light Conversion Shun Kotsugi, Takao Kawamura, Nobutaka Ono
D1-1430_L1.10 474 Auxiliary-Function-Based Decentralized Independent Vector Analysis for Distributed Microphone Arrays Kouei Yamaoka, Katsuhiro Morita, Norihiro Takamune, Hiroshi Saruwatari
D1-1430_L1.11 501 Interactive Spatial Audio Rendering on Mobile Devices: A Two-Stage User Interface with Adaptive HRTF Selection and Real-Time Room Acoustics Simulation Shravan Raghunath, Kanishk AL, Sailesh S, Rishabh Gupta, Saurav Gupta, Ramesh R
D1-1430_L1.12 507 Are Identical Sounds Present in Distributed Recordings to Serve as Spatio-Temporal Anchors? A Case Study Using the SINS Database Takao Kawamura, Nobutaka Ono |
14:30-16:00 |
D1-1430_L2 Interactive, Natural, Expressive and Robust Conversation System Location: Lotus II
D1-1430_L2.1 42 Language Adaptation Wake Word Spotting via Latent Space from Pre-trained Speech Models Shifu Xiong, Hengshun Zhou, Kai Shen, Shi Cheng, Hang Chen, Genshun Wan, Kewei Li, Jun Du, Lirong Dai
D1-1430_L2.2 60 Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers Tzu-Quan Lin, Hsi-Chun Cheng, Hung-yi Lee, Hao Tang
D1-1430_L2.3 73 Multi-task Pretraining for Enhancing Interpretable L2 Pronunciation Assessment Jiun-Ting Li, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen
D1-1430_L2.4 78 End-to-End Integration of Speech Emotion Recognition and Voice Activity Detection with a Self-Supervised Model for Noise Robustness Natsuo Yamashita, Masaaki Yamamoto, Yohei Kawaguchi
D1-1430_L2.5 102 SCSMT: A Multilingual Children's Speech Corpus for Singapore's Mother Tongues Bowen Zhang, Nur Afiqah Abdul Latiff, Rong Tong, Donny Soh, Ian McLoughlin
D1-1430_L2.6 112 Reducing Orthographic Dependency on Paired Data by Probabilistic Integration via Syllabogram for Japanese Dialogue Speech Recognition Ryu Takeda, Kazunori Komatani
D1-1430_L2.7 152 Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS Haoyu Wang, Chunyu Qiang, Tianrui Wang, Cheng Gong, Yu Jiang, Yuheng Lu, Chen Zhang, Longbiao Wang, Jianwu Dang
D1-1430_L2.8 199 Constructing an In-the-Wild Spoken Dialogue Dataset Based on YouTube Dialogue Videos Yuki Sato, Sanae Yamashita, Shinnosuke Takamichi, Ryuichiro Higashinaka
D1-1430_L2.9 217 Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari
D1-1430_L2.10 321 Conversation Context-aware Direct Preference Optimization for Style-Controlled Speech Synthesis Atsushi Kojima, Yusuke Fujita, Hao Shi, Tomoya Mizumoto, Mengjie Zhao, Yui Sudo
D1-1430_L2.11 483 A Hybrid Attention Mechanism to Improve Tacotron 2 Performance for Indonesian Text-to-Speech Synthesis Angela Catherina, Bima Prihasto, Boby Mugi Pratama, Li-Wei Kang, Jia-Ching Wang |
14:30-16:00 |
D1-1430_H3 Generative AI Models for Vision-Based Applications Location: Hibiscus III
D1-1430_H3.1 91 TRUST: Token‑dRiven Ultrasound Style Transfer for Cross‑Device Adaptation Nhat-Tuong Do-Tran, Ngoc-Hoang-Lam Le, Ian Chiu, Po-Tsun Paul Kuo, Ching-Chun Huang
D1-1430_H3.2 92 Two-Stage Transformer-based Deep Hyperspectral and Multispectral Image Fusion Network for Hyperspectral Image Super-Resolution Wo-Yen Li, Chia-Ming Lee, Chih-Chung Hsu, Volodymyr Khylenko, Li-Wei Kang
D1-1430_H3.3 147 Pedestrian Detection based on Visible Guided Occlusion Handling Lien-Chieh Huang, Ching-Te Chiu, Yung-Cheng Su
D1-1430_H3.4 242 Spatial-Frequency Guided Moiré Removal with Multi-Stage Feature Fusion Chen Lo, Chia-Hung Yeh
D1-1430_H3.5 251 Registration of Infrared and Visible Images Using Style Transfer-Based Semantic Segmentation Si-Ting Lin, Chih-Hung Han, Chieh-Ling Lee, Po-Chyi Su, Feng-Tsun Chien, Min-Kuan Chang
D1-1430_H3.6 533 Peransformer: Improving Low-informed Expressive Performance Rendering with Score-aware Discriminator Xian He, Wei Zeng, Ye Wang
D1-1430_H3.7 536 Prompt-Based Vertebral Segmentation Using a Generative AI Approach in OVCF Spinal Radiographs Po-Kai Su, Pei-Rong Jiang, Kai-Xuan Xu, Meng-Lei Su, Jiann-Her Lin, Hsin-Han Chiang, Hsiao-Chi Li
D1-1430_H3.8 561 A Dual-Stream Diffusion Model with Physically-Based Rendering for Single Image Reflection Removal Cheng-Wei Hsu, Ming-Sui Lee
D1-1430_H3.9 615 Dynamic Facial Expression Recognition in the Wild Using Mamba-Style Selective SSM and Facial Attention Mechanism Yudhistira Arditya Pratama, Theophilus Ezra Nugroho Pandin, Yi-Zeng Hsieh |
14:30-16:00 |
D1-1430_P1 Multimedia Security & Forensics Location: Peony I
D1-1430_P1.1 104 The Potential of LLMs for Generating Malicious Domain Names Lim Kit Michael Ye, Kaijian Zheng, Ngai Fong Law, Jianping Li
D1-1430_P1.2 135 Reducing Implicit Class Imbalance in Unlabeled Datasets Using Text-Specified Sensitive Attributes Kosei Suyama, Kazuaki Nakamura
D1-1430_P1.3 215 DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction Cheng-Yeh Yang, Kuan-Tang Huang, Chien-Chun Wang, Hung-Shin Lee, Hsin-Min Wang, Berlin Chen
D1-1430_P1.4 348 Multimodal Large Language Model for Deepfake Video Detection and Description Haoran Sun, Chen Cai, Kong Aik Lee, Lap-Pui Chau, Yi Wang
D1-1430_P1.5 379 Biometric Identification Using Default Mode Network Features Extracted from Eyes-Open Resting-State EEG Data Parvathy Remesh, Jijomon Chettuthara Moncy, Vinod Achutavarrier Prasad
D1-1430_P1.6 410 Backdoor Poisoning Attack Against Face Spoofing Attack Detection Methods Shota Iwamatsu, Koichi Ito, Takafumi Aoki
D1-1430_P1.7 464 Access Control for Diffusion Models by Random Masking the Covariance of Initial Noise Distribution Temma Tanaka, Kazuaki Nakamura |
14:30-16:00 |
D1-1430_P2 Wireless Communications & Networking Location: Peony II
D1-1430_P2.1 39 Low-Complexity Sparse Channel Estimation for Reconfigurable Intelligent Surface-Aided MIMO Wei-Lin Chiang, Shu-Yu Lin, Jung-Chun Chi, Yuan-Hao Huang
D1-1430_P2.2 50 BFIS: Efficient Unknown Protocol Feature Extraction Method For Satellite Communication Systems Xianwen Ling, Kun Zhang, Rong Tong, Dianying Chen
D1-1430_P2.3 74 Outdoor Experiment of Deep Joint Source-Channel Coding Using FFT-Enabled Convolutional Neural Network for Image Transmission Tomoka Mori, Hiroshi Tatsukawa, Yuji Kawai, Yoshinori Shinohara, Hiroki Ikeda, Daisuke Hisano
D1-1430_P2.4 128 On LSTM-Based Behavioral Modeling of Radio-Frequency Power Amplifiers with a Small Training Dataset Ryoki Yamaguchi, Satoshi Miyata, Suehiro Shimauchi, Eiji Mochida, Seiji Fujiwara
D1-1430_P2.5 209 DL-based Optical Fibre Fault Detection for Healthcare Telesurgery Communication System Khushi Shah, Lakshit Pathak, Akshita Abrol, Kanak Jain, Rajesh Gupta, Parishi Shah, Sudeep Tanwar, Umesh Bodkhe, Tong Rong
D1-1430_P2.6 270 Overcoming Imperfect Detection Limitations: Deep Learning-Based Calibration Strategy for Rotating Interferometric Arrays Zhaohang Zhang, Chunzhe Wang, Zhen Huang, Yafeng Zhan
D1-1430_P2.7 287 A Regional Clustering Method Based on Propagation Similarity for Modeling Cumulative Interference from Large Numbers of Terminals Tatsuro Hidaka, Osamu Takyu, Kei Inage, Takeo Fujii, Kohei Yoshida, Masayuki Ariyoshi
D1-1430_P2.8 290 Radio Frequency Fingerprinting-Based Device Identification Using Deep Metric Learning Dinh Tuan Anh, Bui Tung Lam, Pham An Duy, Pham Minh Tuan, Tran Vinh Co, Nguyen Huu Tinh, Huynh Cong Bang
D1-1430_P2.9 351 GNSS Spoofing Detection Based on LSTM-TNN-CVAE Network Chaowen Tang, Tian Qin
D1-1430_P2.10 456 Enhancing Speech Quality in Scintillating Satellite Communications: A Rician Fading Modeling Approach Teh Kah Kuan, Sun Hanwu, Tran Huy Dat |
16:00-16:30 |
Break |
16:30-18:00 |
D1-1630_IB Perspective 2: Utilization of the Foundation Models and the Future Location: Island Ballroom
D1-1630_IB.1 643 Foundation Models as Guardrails: LLM- and VLM-Based Approaches to Safety and Alignment Huy Nguyen, Pride Kavumba, Tomoya Kurosawa, Koki Wataoka |
16:30-18:00 |
D1-1630_L1 Tracing the Fake: Deepfake Detection, Attribution & Spoof-aware Speaker Verification Across Languages & Accents Location: Lotus I
D1-1630_L1.1 126 NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation Huhong Xian, Rui Liu, Berrak Sisman, Haizhou Li
D1-1630_L1.2 187 Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation Hieu-Thi Luong, Inbal Rimon, Haim Permuter, Kong Aik Lee, Eng Siong Chng
D1-1630_L1.3 188 Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection Janne Laakkonen, Ivan Kukanov, Ville Hautamäki
D1-1630_L1.4 203 Continual Audio Deepfake Detection via Universal Adversarial Perturbation Wangjie Li, Lin Li, Qingyang Hong
D1-1630_L1.5 212 Exploring Source Features with Deep Residual Neural Networks for Replay Attack Detection Suresh Veesa, Badugu Vamsi Krishna, Madhusudan Singh
D1-1630_L1.6 254 A Preliminary Study on Sectional Voice Anonymization and Detection Shaoqi Tang, Zeyan Liu, Liping Chen, Kong Aik Lee, Tomoki Toda, Zhenhua Ling
D1-1630_L1.7 276 ArcticEcho: A Novel Speaker-Controlled Voice Cloning Dataset for Modern Deepfake Detection Benchmarking Soham Gangopadhyay, Inderpreet Singh, Prateek Pandya, Ashish Mani, Sumit Goswami
D1-1630_L1.8 327 Variational Regularization for End-to-end Speech Deepfake Detection Siqing Qin, Kong Aik Lee, Man-Wai Mak, Pasquale Lisena, Massimiliano Todisco
D1-1630_L1.9 508 A Wavelet tour of Audio Deepfake Detection Arth Shah, Aniket Pandey, Manav Giakwad, Hemant Patil
D1-1630_L1.10 509 Fusion of Modulation Spectrogram and SSL with Multi-head Attention for Fake Speech Detection Rishith Sadashiv T N, Abhishek Bedge, Saisha Suresh Bore, Jagabandhu Mishra, Mrinmoy Bhattacharjee, S R Mahadeva Prasanna |
16:30-18:00 |
D1-1630_L2 Speaker Modeling Beyond Speaker Recognition Location: Lotus II
D1-1630_L2.1 41 SpkAugTSE: A Simple and Efficient Approach to Address Target Confusion in End-to-End Speaker Extraction Zhenghai You, Zhenyu Zhou, Lantian Li, Dong Wang
D1-1630_L2.2 81 Interpolating Speaker Identities in Embedding Space for Data Expansion Tianchi Liu, Ruijie Tao, Qiongqiong Wang, Yidi Jiang, Hardik B. Sailor, Ke Zhang, Jingru Lin, Haizhou Li
D1-1630_L2.3 101 MDD: a Mask Diffusion Detector to Protect Speaker Verification Systems from Adversarial Perturbations Yibo Bai, Sizhou Chen, Michele Panariello, Xiao-Lei Zhang, Massimiliano Todisco, Nicholas Evans
D1-1630_L2.4 131 Fusing Multi-layer Features of the Pre-trained Model With Grouped Cross Attention for Spoofing Speech Detection Yu Guan, Wu Guo, Jie Zhang, Zhijun Zhang
D1-1630_L2.5 132 Fusing Blocked Deep Features of Pre-Trained Models for Short-Duration Speaker Verification Zhijun Zhang, Wu Guo, Jie Zhang, Yu Guan
D1-1630_L2.6 182 Multi-level Adversarial Training with Data Augmentation for Robust Speaker Verification Xiaolei Zhang, Zhihua Fang, Liang He
D1-1630_L2.7 189 Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission Nirmalya Mallick Thakur, Jia Qi Yip, Eng Siong Chng
D1-1630_L2.8 197 Estimating Speaker’s Seating Position from Monaural Speech in a Simulated Vehicle Interior Sound Field Masataka Kaneko, Wen-Chin Huang, Tomoki Toda
D1-1630_L2.9 333 TS-VAD+: Modularized Target-Speaker Voice Activity Detection for Robust Speaker Diarization Tran The Anh, Azmat Adnan, Wu Yihao, Chng Eng Siong
D1-1630_L2.10 590 Are Multimodal Foundation Models All That Is Needed for Emofake Detection? Mohd Mujtaba Akhtar, Girish, Orchid Chetia Phukan, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Ananda Chandra Nayak, Sanjib Kumar Nayak, Arun Balaji Buduru |
16:30-18:00 |
D1-1630_H3 Three-Minute Thesis (3MT) Competition Location: Hibiscus III |
16:30-18:00 |
D1-1630_P2 Advanced Multimedia Applications Location: Peony II
D1-1630_P2.1 113 Ensemble Methods for Estimating the Localization of Coronary Stenosis from CT Images Using 3D CNN Models Minori Kondo, Masaki Aono, Kazuki Shimizu, Masashi Hashimoto, Takeshi Miyaji, Kei Nomura
D1-1630_P2.2 158 Tiered Assessment for DSP Education: Exploring Students’ Motivation and Performance Eliathamby Ambikairajah, Tharmakulasingam Sirojan, Vidhyasaharan Sethu
D1-1630_P2.3 297 An Investigation of Parameter Scheduling for Image Restoration in Optical Analog Circuits Taisei Kato, Ryo Hayakawa, Soma Furusawa, Kazunori Hayashi, Youji Iiguni
D1-1630_P2.4 364 Robust Cloud Removal from Optical Satellite Images Using Synthetic Aperture Radar and Multimodal Embedding Prior Taishin Miura, Shunsuke Ono, Ryo Matsuoka
D1-1630_P2.5 367 Reflection and Noise Separation from Polarized Images via Joint Nonnegative Matrix Factorization and Plug-and-Play Denoising Maharu Oda, Ryo Matsuoka
D1-1630_P2.6 377 Gated probabilistic diffusion for temporal action segmentation Yun LI, Hanmin Li, Kin-Man Lam
D1-1630_P2.7 382 Theory of Spherical VR model for Landscape Representation Hiroyuki Nishimoto, Toru Takahashi, Masakazu Yoshida
D1-1630_P2.8 423 HyTver: A Novel Loss Function for Longitudinal Multiple Sclerosis Lesion Segmentation Dayan Perera, Ting Fung Fung, Vishnu Monn Baskaran
D1-1630_P2.9 473 KH-FUNSD: A Hierarchical and Fine-Grained Layout Analysis Dataset for Low-Resource Khmer Business Document Nimol Thuon, Jun Du
D1-1630_P2.10 516 Effective Speckle Noise Reduction Using Transformed Bayesian Likelihood with Wiener-Based and Sketch-Based Geometric Priors Ming-Hsun Mo, Pin-Wen Huang, Jian-Jiun Ding |