2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)


22-24 October 2025, Singapore



Technical Program



Day 3: Friday, 24 Oct 2025 - Overview

08:00-08:30

D3-0800_IB Registration

Location: Island Ballroom

08:30-10:00

D3-0830_IB Sadaoki Furui Prize Papers and Conference Award Session I

Location: Island Ballroom

08:30-10:00

D3-0830_L1 Music and Singing

Location: Lotus I

08:30-10:00

D3-0830_L2 Speech Communication and Media Generation

Location: Lotus II

08:30-10:00

D3-0830_H3 Deep Learning: Algorithm, Implementations, and Applications

Location: Hibiscus III

08:30-10:00

D3-0830_P1 Advanced Signal Processing and Resource Management for Sensing and Communication

Location: Peony I

08:30-10:00

D3-0830_P2 Machine Learning: Algorithms and Application II

Location: Peony II

10:00-10:30

Break

10:30-12:00

D3-1030_IB Conference Award Session II

Location: Island Ballroom

10:30-12:00

D3-1030_L1 Grand Challenge & Audio Processing

Location: Lotus I

10:30-12:00

D3-1030_L2 Speech Recognition and Understanding I

Location: Lotus II

10:30-12:00

D3-1030_H3 Recent Advances in Multimodal Learning for Computer Vision

Location: Hibiscus III

10:30-12:00

D3-1030_P1 Signal & Information Processing II

Location: Peony I

10:30-12:00

D3-1030_P2 Machine Learning: Algorithms and Application III

Location: Peony II

12:00-13:30

Lunch

13:30-14:30

D3-1330_IB Keynote 3 by Xiao-Ping (Steven) Zhang

Location: Island Ballroom

14:30-16:00

D3-1430_IB Perspective 4: Leveraging LLMs in Interdisciplinary Study of Speech and Language

Location: Island Ballroom

14:30-16:00

D3-1430_L1 Speech Synthesis, Enhancement, and Recognition

Location: Lotus I

14:30-16:00

D3-1430_L2 Speech Recognition and Understanding II

Location: Lotus II

14:30-16:00

D3-1430_H3 Privacy and Security in Speech AI

Location: Hibiscus III

14:30-16:00

D3-1430_P1 Late Breaking Reports

Location: Peony I

14:30-16:00

D3-1430_P2 Scalable and Efficient Signal Processing for Multimodal AI Systems (ESPRESSO)

Location: Peony II

16:00-16:30

Break

16:30-18:00

D3-1630_IB APSIPA General Assembly, Founders' Forum & Closing Ceremony

Location: Island Ballroom



Day 3: Friday, 24 Oct 2025 - With Papers

08:00-08:30

D3-0800_IB Registration

Location: Island Ballroom

08:30-10:00

D3-0830_IB Sadaoki Furui Prize Papers and Conference Award Session I

Location: Island Ballroom

Sadaoki Furui Prize Papers

Research paper award: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, and Yoichi Yamashita, for their paper “Onoma-to-wave: environmental sound synthesis from onomatopoeic words”, APSIPA Transactions on Signal and Information Processing, Vol. 11: No. 1, e13. 2022

Overview paper award: Xiaofeng Liu, Chaehwa Yoo, Fangxu Xing, Hyejin Oh, Georges El Fakhri, Je-Won Kang, and Jonghye Woo, for their paper “Deep unsupervised domain adaptation: a review of recent advances and perspectives”, APSIPA Transactions on Signal and Information Processing, Vol. 11: No. 1, e25. 2022

Conference Award Session I

D3-0830_IB.1 105 Integrating Semantic Knowledge for Enhanced Weakly-Supervised Group Activity Recognition

Muhammad Adi Nugroho, Jinyoung Park, Yeeun Seong, Changick Kim

D3-0830_IB.2 402 Directed Graph Dynamic Mode Decomposition for Nonlinear State-Space Modeling

Hiromu Kanauchi, Ryuto Ito, Hiroyasu Yasuda, Masaaki Nagahara, Yuichi Tanaka, Shogo Muramatsu

08:30-10:00

D3-0830_L1 Music and Singing

Location: Lotus I

D3-0830_L1.1 71 Unified Timbre Transfer: A Compact Model for Real-Time Multi-Instrument Sound Morphing

Anders R. Bargum, Naotake Masuda, Bogdan Teleaga, Andrew Fyfe, Cumhur Erkut

D3-0830_L1.2 94 Real-World Music Plagiarism Detection with Music Segment Transcription System

Seonghyeon Go

D3-0830_L1.3 141 Attention-based Adaptive Structured Patchout Spectrogram Transformer for Music Classification

Yuan Liu, Shoji Makino, Lingqing Liu, Yichen Yang

D3-0830_L1.4 307 Accuracy Improvement of Automatic Chord Recognition with Source Separation Preprocessing

Ayumu Mitoma, Ken'ichi Furuya

D3-0830_L1.5 362 Effects of Music Training Experience on the Production of English Rhythm by Chinese Learners

Ying Chen, Chenyu Li, Ruizhe Wang, Yujia Zhang

D3-0830_L1.6 391 Hierarchical Symbolic Music Generation with Variational Autoencoder-based Bar-wise Feature Sequences

Keito Sawada, Wen-Chin Huang, Tomoki Toda

D3-0830_L1.7 421 Singing MIDI Transcription with Music Language Models: Formulation and Comparison

Yu Sugimoto, Jun-You Wang, Li Su, Eita Nakamura

D3-0830_L1.8 451 Data-Efficient Music Captioning via Contrastive and Semantic Alignment

Leekyung Kim, Jonghun Park

D3-0830_L1.9 482 GAN-Enhanced InpaintNet for Music Inpainting on Limited Data

Koumei Naemura, Boyu Cao, Ryotaro Nagase, Ryoichi Takashima, Yoichi Yamashita

D3-0830_L1.10 498 An Analysis of Singing Accuracy towards Quantifying the Melodic Singability

Minami Kawahara, Tetsuro Kitahara

D3-0830_L1.11 569 Guitar Tone Morphing by Diffusion-based Model

Kuan-Yu Chen, Kuan-Lin Chen, Yu-Chieh Yu, Jian-Jiun Ding

08:30-10:00

D3-0830_L2 Speech Communication and Media Generation

Location: Lotus II

D3-0830_L2.1 256 Active Learning for Text-to-Speech Synthesis with Informative Sample Collection

Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari

D3-0830_L2.2 272 Semi-Supervised End-to-End Speech-to-Text Translation with Joint Text-to-Text and Speech-to-Text Decoding

Tomohiro Tanaka, Ryo Masumura, Naoki Makishima, Mana Ihori, Shota Orihashi, Satoshi Suzuki, Taiga Yamane

D3-0830_L2.3 344 UTRo-NAST: Non-Autoregressive Speech Translation via Understanding, Translation, and Reordering

Yu-Chen Kuan, Kuan-Yu Chen

D3-0830_L2.4 384 Laughing Across Borders: A Culturally-Aware Joke Generator for Asian Regions

Ashley Fang Cai Xian, Chen Ting Ng, Ashley Kok Siu Cheng, Wah Yang Tan, Mohan Raj Chanthran, Lay-Ki Soon, Meisin Lee

D3-0830_L2.5 427 Synthesizing Vowel-Like Tones with Pitch Circularity

Kaori Hashimoto, Takao Kawamura, Nobutaka Ono

D3-0830_L2.6 428 Error Correction Using LLMs for Sentence Estimation from Ambiguous Inputs via Wearable Keyboards

Matsuri Iwasaki, Masanobu Abe, Sunao Hara

D3-0830_L2.7 450 A Robust End to End Spoken Grammar Assessment System

Sunil Kumar Kopparapu, Chitralekha Bhat, Ashish Panda

D3-0830_L2.8 452 LAPS-Diff: A Diffusion-Based Framework for Hindi Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning

Sandipan Dhar, Mayank Gupta, Preeti Rao

D3-0830_L2.9 472 End-to-end multi-channel speaker extraction and binaural speech synthesis

Cheng Chi, Xiaoyu Li, Yuxuan Ke, Qunping Ni, Ge Yao, Xiaodong Li, Chengshi Zheng

D3-0830_L2.10 487 Improving Listening Head Generation Performance Using Speech Representations from Self-Supervised Learning

Tamon Mikawa, Yasuhisa Fujii, Yukoh Wakabayashi, Kengo Ohta, Ryota Nishimura, Norihide Kitaoka

D3-0830_L2.11 616 ULF-TTS: An Uncluttered Hybrid TTS System using Language and Flow Matching Models

Jae Hyun Park, Young Sik Eom, SeungJae Choi, Allison Shindell, Min-Gwan Seo, Gyeong-hoon Lee

08:30-10:00

D3-0830_H3 Deep Learning: Algorithm, Implementations, and Applications

Location: Hibiscus III

D3-0830_H3.1 49 Honey Adulteration Detection via Robust Diffusion Classifier and Hyperspectral Imaging

Weihao Tang, Guyang Zhang, Waleed Abdulla

D3-0830_H3.2 55 Semantic-Fast-SAM: Efficient Semantic Segmenter

Byunghyun Kim

D3-0830_H3.3 84 Micro-expression Recognition Using VideoMamba and Regional Selective Mixup

Yu-Chen Lin, Yi-Jing Chen, Chih-Chang Yu, Hsu-Yung Cheng

D3-0830_H3.4 108 100x Monolingual Data Augmentation Using LLMs to Build a Parallel Corpus for Machine Translation

Hitoshi Ito, Naoto Shirai, Kazutaka Kinugawa, Hideya Mino, Yoshihiko Kawai

D3-0830_H3.5 185 Neural Analog Filter for Sampling-Frequency-Independent Convolutional Layer

Kanami Imamura, Tomohiko Nakamura, Kohei Yatabe, Hiroshi Saruwatari

D3-0830_H3.6 372 Enhancing Technical Documents Retrieval for RAG

Songjiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Kaiwen Xue, Kwan-Ho Lin, Yan-Ming Choi, Vincent Ng, Kin-Man Lam

D3-0830_H3.7 385 Lightweight Zero-Shot Keyword Spotting via Multi-Granular Knowledge Distillation

Yun-Ting Sun, Lo-Ya Li, Tien-Hong Lo, Jeih-Weih Hung, Shih-Chieh Huang, Berlin Chen

D3-0830_H3.8 386 Monomial Matrix Relocation on the Loss Function Level-Set of Feedforward Neural Networks

Ozgur Soysal, Arda Ozdemir, Yiğit Yıldırım, Orhan Arikan

D3-0830_H3.9 387 Low-Rank Compression of Neural Network Weights by Null-Space Encouragement

Arda Ozdemir, Ozgur Soysal, Ege Doğanay, Yiğit Yıldırım, Orhan Arikan

D3-0830_H3.10 422 Sign-MExD: An Expert-Infused Diffusion Model for Sign Language Production

Jiayu Shen, Kalin Stefanov, Vee Yee Chong, Lay-Ki Soon, KokSheik Wong

D3-0830_H3.11 470 FEKF: Flow-based Extended Kalman Filter

Pham Hai Anh, Tran Trong Duy, Do Hai Son, Karim Abed-Meraim, Nguyen Linh Trung

08:30-10:00

D3-0830_P1 Advanced Signal Processing and Resource Management for Sensing and Communication

Location: Peony I

D3-0830_P1.1 144 A reinforcement learning-based approach to cooperative multi-UAV task allocation

Naohiro Kubota, Hideyoshi Miura, Tomotaka Kimura, Kouji Hirata

D3-0830_P1.2 167 Trajectory Design of UAVs-Assisted Edge Computing Systems for Efficient Data Collection from Animal Herds

Nao Maeda, Tomotaka Kimura, Kouji Hirata

D3-0830_P1.3 169 Priority-based RCSA method considering required frequency slot width in multi-core fiber networks

Funa Fukui, Yutaka Fukuchi, Kouji Hirata

D3-0830_P1.4 219 Retraining-Free Blockage Prediction for Millimeter-Wave Communications Based on Minor Components of Angular Power Profiles

Xiaoqing Tong, Kohei Mitani, Kazunori Hayashi, Koji Yamamoto, Takuto Arai, Shuki Wai, Tatsuhiko Iwakuni, Daisei Uchida

D3-0830_P1.5 220 Modified Resource Allocation Algorithm based on Co-channel Interference Prediction in Local 5G Environments

Takeru Nanjo, Osamu Takyu

D3-0830_P1.6 341 Implicit Interference Status Notification Through Time & Frequency Resource Selection in LoRaWAN

Yuto Hayasaka, Koichi Adachi

D3-0830_P1.7 400 Wireless Environment Estimation with Directional Antennas using Radio Environment Database for Wireless Information and Power Transfer in Smart Factories

Kohei Yuzawa, Zhengdong Lin, Yu Kagaya, Yoshiaki Narusue, Takeo Fujii

D3-0830_P1.8 416 Data-Driven Tuning of Neural Network Aided Least Squares for UWB-TDoA Indoor Positioning

Ryoichi Kawaguchi, Shinsuke Ibi, Hisato Iwai

08:30-10:00

D3-0830_P2 Machine Learning: Algorithms and Application II

Location: Peony II

D3-0830_P2.1 83 Equivalence of Graph Signal Processing Using a Hermitian Graph Laplacian and Its Corresponding Graph Laplacian with Duplicated Nodes

Akira Tanaka

D3-0830_P2.2 207 Skeleton-sequence-based Early Action Recognition by Using Graph Convolutional Neural Networks and Knowledge Distillation Techniques

Wen-Nung Lie, Kien Truc Le, Veasna Vann, Jui-Chiu Chiang, Ngoc Dung Bui

D3-0830_P2.3 232 A State-Dependent Model for Identification of Time-varying Directed Graphs

Yuzhe Li, Hangjing Zhang, H. Vicky Zhao

D3-0830_P2.4 309 Unrolled Multimodal Signal Restoration with Signed Twofold Graph Learning

Haruki Yokota, Yuichi Tanaka, Higashi Hiroshi

D3-0830_P2.5 343 Efficient Sparse Matrix Acceleration for Deep Learning via Two-Step Bitmap Tensor Architecture

Jia-Hong Weng, Tu Wei-Chen

D3-0830_P2.6 352 Distance-based Laplacian Algebra for Effective Subgraph Filter Learning

Purui Zhang, Feng Ji, Yanan Zhao, Wee Peng Tay, Bihan Wen

D3-0830_P2.7 370 Evaluation of Low-Resource and High-Efficiency Deep Learning Accelerator for Clinical Dental Diagnosis

Lin Yuan-Jin, Chang Yu-Jen, Liang Chin-Hao, Wei Sung-Tsun, Weng Jia-Hong, Chen Shih-Lun, Tu Wei-Chen

D3-0830_P2.8 412 Quantization Index Modulation-based Reversible Data Hiding in Compressed Neural Network

Jun Hirano, Jonethe Tan Yang, Fathin Acyuta Makarim, Daham Jayasinghe, KokSheik Wong

D3-0830_P2.9 460 Dense Vector Retrieval in Data Federation

Amorntip Prayoonwong, Yang-Chun Hsu, Xin-Jie Ye, Po-Kai Lu, Chih-Hang Wang, Chih-Yi Chiu

D3-0830_P2.10 557 Organ Detection Based on Vision-Language Model For Abdominal CT Images

Jun-Hong Ou, Bo-Xian Wang, Yu-Hong Zheng, Sufal K. Chhabra, Guo-Shiang Lin, Shen-Lei Yan, Chen-Kuo Chiang

D3-0830_P2.11 644 Locally-Structured Unitary Network

Shogo Muramatsu

D3-0830_P2.12 625 Algorithm-Architecture Co-Exploration of Systolic Arrays Using High-Level Synthesis

Chu-Chun Yang, Gwo Giun Lee, Tsung-Ying Tsai, Jie-Ren Zheng, Yue-Cong Kuo, Wei-Chieh Lee, Yu-Kai Chou, Ryan Pary

10:00-10:30

Break

10:30-12:00

D3-1030_IB Conference Award Session II

Location: Island Ballroom

D3-1030_IB.1 69 Digital-Optical Hybrid Computation for Deep Unfolding-Aided MIMO Signal Detection

Takumi Nishiyama, Lantian Wei, Tadashi Wadayama

D3-1030_IB.2 284 Uncolorable Examples: Preventing Unauthorized AI Colorization via Perception-Aware Chroma-Restrictive Perturbation

Yuki Nii, Futa Waseda, Ching-Chun Chang, Isao Echizen

D3-1030_IB.3 36 Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li

D3-1030_IB.4 355 Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR

Shreyas Gopal, Ashutosh Anshul, Haoyang Li, Yue Heng Yeo, Hexin Liu, Eng Siong Chng

D3-1030_IB.5 530 Rethinking Robust ASR Strategies: Can Textual In-Context Learning Improve Acoustic Robustness?

Benita Angela Titalim, Faisal Mehmood, Sakriani Sakti

D3-1030_IB.6 586 Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning

Haorui He, Yucheng Song, Yuancheng Wang, Haoyang Li, Xueyao Zhang, Li Wang, Gongping Huang, Eng Siong Chng, Zhizheng Wu

10:30-12:00

D3-1030_L1 Grand Challenge & Audio Processing

Location: Lotus I

D3-1030_L1.1 110 DG-SED: Domain Generalization for Sound Event Detection with Heterogeneous Training Data

Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

D3-1030_L1.2 138 Accelerated Convolutive Transfer Function-Based Multichannel NMF Using Iterative Source Steering

Xuemai Xie, Xianrui Wang, Liyuan Zhang, Yichen Yang, Shoji Makino

D3-1030_L1.3 139 DySiME: Dynamic Single-Source Multichannel Enhancement Using Time-Varying Directional Cues

Hao Liang, Yichen Yang, Xiao Zhang, Shoji Makino, Jingdong Chen

D3-1030_L1.4 228 Demixing Filter Estimation for Bleeding-Sound Reduction of a Vocal Microphone

Soushi Taninomiya, Daichi Kitamura, Norihiro Takamune, Kouei Yamaoka, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Hayato Yamakawa

D3-1030_L1.5 401 Prior-Guided Source Separation with Direct Update of Back-projected Demixing Vectors

Kukuru Koiso, Taishi Nakashima, Nobutaka Ono

D3-1030_L1.6 588 Meta-Learning with Pretrained Audio Representations Enables One-Shot Acoustic Signal Classification

Haoxiang Wu, Zhengqiao Zhao, Jingdong Chen, Jacob Benesty

D3-1030_L1.7 634 Self-Rotation-Robust Online-Independent Vector Analysis with Sound Field Interpolation on Circular Microphone Array

Taishi Nakashima, Yukoh Wakabayashi, Nobutaka Ono

D3-1030_L1.8 642 A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication

Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Zhen-Hua Ling

D3-1030_L1.9 648 A Semi-Supervised Acoustic Scene Classification Network Based on Multi-Modal Information Fusion

Junkang Yang, Hongqing Liu, Liming Shi, Lu Gan, Hiromitsu Nishizaki, Chee Siang Leow

D3-1030_L1.10 651 ASCMamba: Multimodal Time-Frequency Mamba for Acoustic Scene Classification

Bochao Sun, Dong Wang, Zhanlong Yang, Jun Yang, Han Yin

D3-1030_L1.11 652 Evaluation of Low-Frequency Restriction, Pitch-Shift Augmentation, and Average Pooling for Acoustic Scene Classification under Unseen-City Conditions

Takao Kawamura, Masayuki Sera, Nobutaka Ono

D3-1030_L1.12 653 The APSIPA ASC 2025 Grand Challenge on City and Time-Aware Semi-supervised Acoustic Scene Classification: Summary and Results

Jisheng Bai, Mou Wang, Haohe Liu, Bin Xiang, Yin Liu, Jianfeng Chen, Dongyuan Shi, Mark Plumbley, Susanto Rahardja, Woon-Seng Gan

10:30-12:00

D3-1030_L2 Speech Recognition and Understanding I

Location: Lotus II

D3-1030_L2.1 195 Phoneme-grapheme Dictionary-based Prompting for Robust Proper Noun Recognition in Japanese ASR

Ryuga Sugano, Hiroaki Sato, Asahi Sakuma, Tadashi Kumano, Yoshihiko Kawai, Shinji Watanabe

D3-1030_L2.2 282 LLM-Driven Hypothesis Set Refinement for Enhanced ASR Post-Processing

Chen-Han Wu, Kuan-Yu Chen

D3-1030_L2.3 390 Real-time VAD-less speech recognition by fine-tuning SSL model with data containing tagged non-speech segments

Jotaro Emoto, Ryota Nishimura, Kengo Ohta, Norihide Kitaoka

D3-1030_L2.4 406 Improving Automatic Speech Recognition Model for Super-Elderly Voice Using Speech Synthesis Model

Ryota Uematsu, Chee Siang Leow, Norihide Kitaoka, Hiromitsu Nishizaki

D3-1030_L2.5 426 Improving Code-Switching Speech Recognition with TTS Data Augmentation

Yue Heng Yeo, Yu Chen Hu, Shreyas Gopal, Yi Zhou Peng, He Xin Liu, Eng Siong Chng

D3-1030_L2.6 443 PQSR: A Speech Corpus of Polar Questions and Spontaneous Responses in Standard Chinese with Complex Intentions Annotated

Yingyi Luo, Yue Huang, Qingke Sun, Shuwen Chen

D3-1030_L2.7 475 Toward Natural System Repair: An Analysis of Human Other-Initiated Self-Repair Patterns in Japanese Casual Conversations

Kazuya Tsubokura, Yurie Iribe, Norihide Kitaoka

D3-1030_L2.8 524 Self-Supervised Learning for Classification of Normal vs. Dysarthric Speech

Hiya Chaudhari, Kavya Kumar, Hemant Patil

D3-1030_L2.9 592 Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds

Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Panchal Nayak, Priyabrata Mallick, Swarup Ranjan Behera, Parabattina Bhagath, Pailla Balakrishna Reddy, Arun Balaji Buduru

D3-1030_L2.10 593 Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?

Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Parabattina Bhagath, Pailla Balakrishna Reddy, Arun Balaji Buduru

10:30-12:00

D3-1030_H3 Recent Advances in Multimodal Learning for Computer Vision

Location: Hibiscus III

D3-1030_H3.1 52 Allegory of the Cave: Breakdown of Illusions in Multimodal Perception with Neural Radiance Fields

Axel Paivansalo, Ching-Chun Chang, Hanrui Wang, Futa Waseda, Isao Echizen

D3-1030_H3.2 53 Overlapped Coffee Beans Detection and Localization Using a Low-Cost 3D Monocular Point Cloud Clustering Method

Isack Farady, Alifya Febriana, Chih-Yang Lin

D3-1030_H3.3 160 Interpretable Video-Text Alignment (VTA) for Cross-Modal Retrieval

Tsung-Shan Yang, Yun-Cheng Wang, Chengwei Wei, Suya You, C.-C. Jay Kuo

D3-1030_H3.4 205 Sequence Modeling and Generative Model Driven Non-Rigid 3D Reconstruction

Yuxin He, Hui Deng, Mingyi He, Yuchao Dai

D3-1030_H3.5 302 Robust Audio-Visual Speech Recognition in Noisy Clinical Environments

Akshita Abrol, Ridwan Arefeen, Haotong Yu, Alexi George, Kelvin Li, Zhengkui Wang, Rong Tong

D3-1030_H3.6 334 Integrating Visual XAI and LLMs for Interpretable Medical Image Analysis

Chern Hong Lim, Xin Hui Lor

D3-1030_H3.7 363 InternVL-VPR: Hierarchical Zero-Shot Visual Place Recognition with VLM-driven Re-Ranking

Zhi Hu, Liang Liao, Weisi Lin

D3-1030_H3.8 375 Token Compression Meets Compact Vision Transformers: A Survey and Comparative Evaluation for Edge AI

Phat Nguyen, Ngai-Man Cheung

D3-1030_H3.9 388 Adapting Vision-Language Models for Information Extraction from Bilingual Medical Invoices

Ha Do, Anh Dung Do, Thanh-Ha Do

D3-1030_H3.10 399 Zero-shot Artistic Text Recognition with Multimodal Language Models

Tien Do, Thuyen Tran, Duy-Dinh Le, Thanh Duc Ngo

D3-1030_H3.11 604 Attention Based Deep Reference Frame Enhancement for VVC Inter Prediction

Linchen Xu, Zhikai Liu, Fan Liang

10:30-12:00

D3-1030_P1 Signal & Information Processing II

Location: Peony I

D3-1030_P1.1 137 Low-Complexity Total Variation-based Signal Reconstruction with Adaptive Gradient Descent for Compressive sensing

Pei-Cheng Yeh, Chieh-Li Wang, Yuan-Hao Huang

D3-1030_P1.2 322 Robust Initialization Strategies for Hankel Structured Low-Rank Approximation via Variable Projection

Natsuki Yoshino, Akira Tanaka

D3-1030_P1.3 347 High-Resolution ISAR Imaging for High-Speed Targets via Joint Intra-Pulse and Inter-Pulse Translational Motion Compensation

Jiabao Wang, Shuai Shao, Jiaqi Wei

D3-1030_P1.4 381 Sparse Echo Reconstruction of Micro-motion Targets Under the Joint Constraints of Low-rank and Periodic Consistency

Mingming Jin, Jun Wang, Shaoming Wei, Peng Lei

D3-1030_P1.5 393 Distributed Extended Object Tracking with Adaptive Networks

Kaidi Yang, Wei Xia

D3-1030_P1.6 394 Extended Object Tracking: A DNN-aided approach

Runhe Gan, Wei Xia

D3-1030_P1.7 455 Non-negative Learned ISTA with Reflected-ReLU-Augmented L1 Regularization

Haruki Esaki, Towa Yasui, Seisuke Kyochi

D3-1030_P1.8 480 Phoneme-Specific Challenges to Intelligibility in Hearing Impairment Under Noisy Condition

Denawati Junia, Candy Olivia Mawalim

10:30-12:00

D3-1030_P2 Machine Learning: Algorithms and Application III

Location: Peony II

D3-1030_P2.1 35 Audio-Visual Fusion Framework for Low-Resource Language Speech Recognition Based on Progressive Down-sampling and Grouped Multi-Heads Attention Mechanism

ChongChong Yu, Xiaolong Xu, Zhaopeng Qian, Kejing Xiao, Yuchen Tan

D3-1030_P2.2 239 A Data-Driven Control Framework Using Deep Reinforcement Learning for Autonomous Driving

Mei-Lin Huang, Ching-Hung Lee, Cheng-Ting Huang, Hsin-Han Chiang

D3-1030_P2.3 291 Recipe Diffusion: Cross-Frame Attention and Region-Aware Diffusion for Coherent Visual Recipe Instruction Generation

Weiyi Xia, Satoru Fujita

D3-1030_P2.4 311 Improving Few-Shot Classification via Feature-Aligned AI-Generated Images

Yu-Wen Tung, Mei-Chen Yeh

D3-1030_P2.5 312 Rotation Invariant Automatic Rigging for 3D Human Scan Data

Yiqing Li, Satoru Fujita

D3-1030_P2.6 314 SinDiffPhase: High-Quality Phase Estimation with Ultra-Fast Single-Step Diffusion

Yifei Ni, Andong Li, Lingling Dai, Erwei Yin, Qunping Ni, Chengshi Zheng

D3-1030_P2.7 411 MapCVAE: Probabilistic Prediction of Diverse Pedestrian Behaviors on General Roads

Konosuke Kobayashi, Satoru Fujita

D3-1030_P2.8 457 Herald: Democratizing Compositional Reasoning for Visual Tasks without Any Training

Guan Yuan Tan, Arghya Pal, Sailaja Rajanala, Raphael C.-W. Phan, Chee-Ming Ting

D3-1030_P2.9 493 Canopy to Canopy: Evaluating Model Generalization In 3D Tropical Forest Semantic Segmentation

Sue Han Lee, Brenda Ru Yi Sim, Chung Siung Choo, Yuen Peng Loh

D3-1030_P2.10 520 LSTM-Transformer Hybrid Network for UAV-Bird Classification Using Radar Track Information

Anning Jiang, Dianfeng Qiao, Shun Liu, Yan Liang

D3-1030_P2.11 559 A Unified Framework for Interpretable and Uncertainty-Aware Battery State of Health Estimation Using Deep Neural Networks

Elias Isaac Huai-En Lim, Nicholas Heng Loong Wong

12:00-13:30

Lunch

13:30-14:30

D3-1330_IB Keynote 3 by Xiao-Ping (Steven) Zhang

Location: Island Ballroom

14:30-16:00

D3-1430_IB Perspective 4: Leveraging LLMs in Interdisciplinary Study of Speech and Language

Location: Island Ballroom

D3-1430_IB.1 649 Normalization through Fine-tuning: Understanding Wav2vec2.0 Embeddings for Phonetic Analysis

Yiming Wang, Jiahong Yuan

D3-1430_IB.2 668 Enabling Internationalization of Affective Speech Technology using LLMs

Bo-Hao Su, Shinji Watanabe, Chi-Chun Lee

14:30-16:00

D3-1430_L1 Speech Synthesis, Enhancement, and Recognition

Location: Lotus I

D3-1430_L1.1 17 End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation

Minghui Wu, Haitao Tang, Jiahuan Fan, Ruizhi Liao, Yanyong Zhang

D3-1430_L1.2 48 Disfluency Disentanglement Enhancement in Spoken-Text-Style Transfer for Spontaneous Speech Synthesis

Yuuto Nakata, Daiki Yoshioka, Wen-Chin Huang, Tomoki Toda

D3-1430_L1.3 98 DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement

Minghui Wu, Xueling Liu, Jiahuan Fan, Haitao Tang, Yanyong Zhang, Yue Zhang

D3-1430_L1.4 149 Emotion-Rich Cross-Speaker TTS via Contrastive Prosody Enhancement

Jen-Tzung Chien, Bryan Gautama Ngo

D3-1430_L1.5 243 Face-conditioned Large-scale Text-to-Speech via Speaker Embedding Prediction from Facial Images

Umi Okamoto, Sei Ueno, Akinobu Lee

D3-1430_L1.6 304 Few-shot Speaker Adaptation for Text-to-Speech Synthesis Using Non-Target Speaker Corpora for Glossectomy Patients

Masayori Okamura, Masanobu Abe, Sunao Hara

D3-1430_L1.7 491 Personalized Bone-Conduction Bandwidth Extension with Speaker Characteristics

Pan Xu, Zhongyu Zhang, Zhonghua Fu

D3-1430_L1.8 500 Time-Aligned Laughter Sound Event Recognition for Conversational Laughter Analysis and Synthesis

Hiroki Mori

D3-1430_L1.9 554 PALGAN: A Joint Optimization-Based Preprocessing Method for Speech Restoration in Parametric Array Loudspeakers

Wenyao Ma, Jun Yang

D3-1430_L1.10 600 Beyond One-Shot Dubbing: Leveraging N-Best Translation and Prompted Paraphrasing with Synchrony-Aware Re-Ranking

Jan Saragih, Faisal Mehmood, Sakriani Sakti

14:30-16:00

D3-1430_L2 Speech Recognition and Understanding II

Location: Lotus II

D3-1430_L2.1 151 Probabilistic Language-Aware Speech Recognition

Jen-Tzung Chien, Willianto Sulaiman, Chung-Hsuan Wang

D3-1430_L2.2 226 LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models

Ryutaro Oshima, Yuya Hosoda, Yoji Iiguni

D3-1430_L2.3 238 Multi-stage Speech Enhancement with Cascaded SNR Domain Shifts

Xiaoran Li, Zilu Guo, Jun Du

D3-1430_L2.4 280 Autoencoder-Driven Latent Representation Learning for Language-Agnostic Disordered Speech Classification using a Universal Feature set

Puneet Bawa, Virender Kadyan, Shareef Babu Kalluri

D3-1430_L2.5 296 FH-RestoreASR: Frequency-Hopping Robust Air Traffic Control Speech Restoration and Recognition

Youngeun Kwon, Yeri Byun, Hyunsung Cho, Jongwon Choi

D3-1430_L2.6 342 Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM

Ryandhimas Zezario, Dyah Wisnu, Hsin-Min Wang, Yu Tsao

D3-1430_L2.7 392 GCI detection and glottal wave estimation based on TV-CAR speech analysis

Keiichi Funaki

D3-1430_L2.8 405 HIPA-MoE: A Parameter-Efficient Fine-Tuning Architecture with Hierarchical Adapter-based Mixture-of-Experts for Multilingual ASR

Xun Lu, Xuyang Wang, Gaofeng Cheng, Lin Zheng, Pengyuan Zhang

D3-1430_L2.9 542 Mild Cognitive Impairment Detection via Linear Discriminant Analysis of Picture Description Speech Features: A Cross Corpus Comparison

Yan-Lin Lai, Erh-Yun Chang, Yi-Wen Liu, Jung Lung Hsu, Hui-Chuan Hsu

D3-1430_L2.10 556 Parameter-Efficient Fine-Tuning of Foundation Models for CLP Speech Classification

Susmita Bhattacharjee, Jagabandhu Mishra, H.S. Shekhawat, S R Mahadeva Prasanna

D3-1430_L2.11 575 Language Awareness in Code-Switching Speech Recognition

Jen-Tzung Chien, Bobbi Aditya

14:30-16:00

D3-1430_H3 Privacy and Security in Speech AI

Location: Hibiscus III

D3-1430_H3.1 86 Voice Privacy Protection with Adversarial Examples Using Anchor Speaker Embedding

Shunya Ishikawa, Yuki Katsumata, Toru Nakashika

D3-1430_H3.2 133 Investigation of perception inconsistency in speaker embedding for asynchronous voice anonymization

Rui Wang, Liping Chen, Kong Aik Lee, Zhengpeng Zha, Zhenhua Ling

D3-1430_H3.3 194 SegReConcat: A Data Augmentation Method for Voice Anonymization Attack

Ridwan Arefeen, Xiaoxiao Miao, Rong Tong, Aik Beng Ng, Simon See

D3-1430_H3.4 200 An Enhanced Probabilistic Approach for Singfake Generation

Arth Shah, Aniket Pandey, Satyam R. Tiwari, Hemant Patil

D3-1430_H3.5 417 Neural Semi-fragile Watermarking for Proactive Deepfake Speech Detection

Dohyun Yoon, Tomoki Toda

D3-1430_H3.6 495 Investigating Self-supervised Learning-Based Front-End for Multi-Channel Replay Attack Detection

Takuo Yamaguchi, Sayaka Shiota, Naohiro Tawara

D3-1430_H3.7 515 Transferability of Adversarial Examples across Speaker Embedding Models for Voice Privacy Protection

Kotaro Nakamura, Takuya Takahashi, Toru Nakashika

D3-1430_H3.8 538 Voice Privacy Preservation with Multiple Random Orthogonal Secret Keys: Attack Resistance Analysis

Kohei Tanaka, Hitoshi Kiya, Sayaka Shiota

D3-1430_H3.9 547 CycleSiFiNF-VC: Controllable Non-Parallel Voice Conversion by Neural Formant Manipulation with Improved Cycle-Consistency Loss

Sumiharu Kobayashi, Takashi Nose, Akinori Ito

D3-1430_H3.10 608 Recoverable Audio Adversarial Examples for Voice Protection in One-shot Voice Conversion

Chenshuai Shu, Tianpeng Zheng, Yanxiang Chen

D3-1430_H3.11 623 Reference-free Adversarial Sex Obfuscation in Speech

Yangyang Qu, Michele Panariello, Massimiliano Todisco, Nicholas Evans

14:30-16:00

D3-1430_P1 Late Breaking Reports

Location: Peony I

D3-1430_P1.1 638 DCMix: Distance Based ClassMix for Unsupervised Domain Adaptation

Chen-Kuo Chiang, Yue-Lin Yang, Huai En Lee

D3-1430_P1.2 654 BiGaitNet: Deep CNN-Based Classification of Parkinson’s Disease Gait Abnormalities Using a Smart Insole Robust to Fewer Plantar Sensors

Eun-Seo Park, Xianghong Liu, Chang-Hee Han

D3-1430_P1.3 655 EEG Modulation Associated with Music-Induced Emotional Responses

Mayu Goto, Motoko Iwashita, Ingon Chanpornpakdi, Makiko Ishikawa, Kenji Ishida, Toshihisa Tanaka

D3-1430_P1.4 656 Nonlinear System Identification Approach under Noisy Input Signals and Impulse Observed Noise by Kernel Adaptive Filtering Algorithm

Ying-Ren Chien, En-Ting Lin

D3-1430_P1.5 657 Low-Resource Atayal Speech Recognition into Atayal and Chinese Texts

Po-Cheng Chan, Ching-Ting Hsin, Di Tam Luu, Chi-Tao Chen, Jia Ching Wang

D3-1430_P1.6 658 Retinal Artery–Vein Segmentation via Attention-Guided W-Net and GAN-Based Boundary Refinement

Jing-Ming Guo, De-Yu Guu, Yih-Ping Luh, Yi-Chong Zeng

D3-1430_P1.7 659 Study on Optimal Online Modeling of the Feedback and Secondary Paths in ANC Systems

Fuma Tsujiwaki, Shota Toyooka, Kenta Iwai, Yoshinobu Kajikawa

D3-1430_P1.8 660 Local Contrast Enhancement in LDR Images via Adaptive Distribution of Clipped-histogram Excess

Seong-hyun Jin, Dong-Min Son, Young-Ho Go, Sung-Hak Lee

D3-1430_P1.9 661 A Lightweight and Reversible Audio Watermarking Scheme Based on Integer Wavelet Transform

Xuping Huang, Akinori Ito

D3-1430_P1.10 662 A Novel Hybrid Active Noise Control System with Remote Microphone based Virtual Sensing Algorithm

Shota Toyooka, Yoshinobu Kajikawa

D3-1430_P1.11 663 Enhancing Speech Quality in Scintillating Satellite Communications: A Rician Fading Modeling Approach

Teh Kah Kuan

D3-1430_P1.12 664 Variance-driven U-Net Training and Chroma-scale-based Multi-exposure Image Fusion

Changwoo Son, Youngho Go, Seunghwan Lee, Sunghak Lee

D3-1430_P1.13 665 A New Generation with Variational Autoencoder and Music Transformer for 8-bit MIDI Music

Qian-Yi Zhuang, Shih-Ming Wang, Mau-Yi Tian, Te-Jen Su, Mong-Fong Horng

D3-1430_P1.14 666 Toward Physics-Guided LLM Libraries for SINDy: Framework and Rheology Example

Shota Kato, Takeshi Sato, Souta Miyamoto

D3-1430_P1.15 667 Joint Design of Low Sidelobe Radar Waveform and Filter with Hardware Platform Verification

Haoqian Rong, Shaojie Wang, Zining Zhao, Jiawei Zhang

14:30-16:00

D3-1430_P2 Scalable and Efficient Signal Processing for Multimodal AI Systems (ESPRESSO)

Location: Peony II

D3-1430_P2.1 115 Exploring Audio-Visual Fusion Methods in Foundation Model-Based Deception Detection

Jiaxiang Meng, Hardik B. Sailor, Qiongqiong Wang, Tianchi Liu, Kong Aik Lee, Xingmei Wang

D3-1430_P2.2 153 Emot-CM-BERT: Adaptive Attention and Class-Aware Cross-Modal Learning for Emotion Recognition from Audio and Text

Shintami Chusnul Hidayati, James Rafferty Lee, Kevin Davi Samuel

D3-1430_P2.3 206 DP-GS: Depth-prior & Perception-guided Gaussian Splatting for Sparse-view Novel View Synthesis

Bowen Gao, Zhicheng Lu, Mingyi He, Yuchao Dai

D3-1430_P2.4 267 Efficient Video to Audio Mapper with Visual Scene Detection

Mingjing Yi, Yuxi Wang, Ming Li

D3-1430_P2.5 481 Adversarial Learning for Duration Prediction in Indonesian Text-to-Speech: Modification to Stochastic and Deterministic Predictors

Yoga Tiara Wiguna, Bima Prihasto, Boby Mugi Pratama, Chia-Hung Yeh, Jia-Ching Wang

D3-1430_P2.6 505 Narrativity-Aware Video Summarization Based on Vision and Language Foundation Models

Shumpei Saito, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii

D3-1430_P2.7 523 RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing

Yang Xiao, Ting Dang, Rohan Kumar Das

D3-1430_P2.8 534 Dynamic Fusion Multimodal Network for SpeechWellness Detection

Wenqiang Sun, Han Yin, Jisheng Bai, Jianfeng Chen

D3-1430_P2.9 563 AIGuard: Anomaly Detection in Surveillance Videos with YOLOv8

Rungpilin Anantathanavit, Supakorn Suthirat, Po-Chyi Su

D3-1430_P2.10 599 Ensemble Confidence Calibration for Sound Event Detection in Open-environment

Yuanjian Chen, Han Yin

D3-1430_P2.11 602 Enhancing Stereo Sound Event Detection with BiMamba and Pretrained PSELDnet

Wenmiao Gao, Han Yin

16:00-16:30

Break

16:30-18:00

D3-1630_IB APSIPA General Assembly, Founders' Forum & Closing Ceremony

Location: Island Ballroom