22-24 October 2025, Singapore

Day 3: Friday, 24 Oct 2025 - Overview |
08:00-08:30 |
D3-0800_IB Registration Location: Island Ballroom |
08:30-10:00 |
D3-0830_IB Sadaoki Furui Prize Papers and Conference Award Session I Location: Island Ballroom |
08:30-10:00 |
D3-0830_L1 Music and Singing Location: Lotus I |
08:30-10:00 |
D3-0830_L2 Speech Communication and Media Generation Location: Lotus II |
08:30-10:00 |
D3-0830_H3 Deep Learning: Algorithm, Implementations, and Applications Location: Hibiscus III |
08:30-10:00 |
D3-0830_P1 Advanced Signal Processing and Resource Management for Sensing and Communication Location: Peony I |
08:30-10:00 |
D3-0830_P2 Machine Learning: Algorithms and Application II Location: Peony II |
10:00-10:30 |
Break |
10:30-12:00 |
D3-1030_IB Conference Award Session II Location: Island Ballroom |
10:30-12:00 |
D3-1030_L1 Grand Challenge & Audio Processing Location: Lotus I |
10:30-12:00 |
D3-1030_L2 Speech Recognition and Understanding I Location: Lotus II |
10:30-12:00 |
D3-1030_H3 Recent Advances in Multimodal Learning for Computer Vision Location: Hibiscus III |
10:30-12:00 |
D3-1030_P1 Signal & Information Processing II Location: Peony I |
10:30-12:00 |
D3-1030_P2 Machine Learning: Algorithms and Application III Location: Peony II |
12:00-13:30 |
Lunch |
13:30-14:30 |
D3-1330_IB Keynote 3 by Xiao-Ping (Steven) Zhang Location: Island Ballroom |
14:30-16:00 |
D3-1430_IB Perspective 4: Leveraging LLMs in Interdisciplinary Study of Speech and Language Location: Island Ballroom |
14:30-16:00 |
D3-1430_L1 Speech Synthesis, Enhancement, and Recognition Location: Lotus I |
14:30-16:00 |
D3-1430_L2 Speech Recognition and Understanding II Location: Lotus II |
14:30-16:00 |
D3-1430_H3 Privacy and Security in Speech AI Location: Hibiscus III |
14:30-16:00 |
D3-1430_P1 Late Breaking Reports Location: Peony I |
14:30-16:00 |
D3-1430_P2 Scalable and Efficient Signal Processing for Multimodal AI Systems (ESPRESSO) Location: Peony II |
16:00-16:30 |
Break |
16:30-18:00 |
D3-1630_IB APSIPA General Assembly, Founders' Forum & Closing Ceremony Location: Island Ballroom |
Day 3: Friday, 24 Oct 2025 - With Papers |
08:00-08:30 |
D3-0800_IB Registration Location: Island Ballroom |
08:30-10:00 |
D3-0830_IB Sadaoki Furui Prize Papers and Conference Award Session I Location: Island Ballroom Sadaoki Furui Prize Papers Research paper award: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, and Yoichi Yamashita, for their paper “Onoma-to-wave: environmental sound synthesis from onomatopoeic words”, APSIPA Transactions on Signal and Information Processing, Vol. 11: No. 1, e13, 2022. Overview paper award: Xiaofeng Liu, Chaehwa Yoo, Fangxu Xing, Hyejin Oh, Georges El Fakhri, Je-Won Kang, and Jonghye Woo, for their paper “Deep unsupervised domain adaptation: a review of recent advances and perspectives”, APSIPA Transactions on Signal and Information Processing, Vol. 11: No. 1, e25, 2022. Conference Award Session I D3-0830_IB.1 105 Integrating Semantic Knowledge for Enhanced Weakly-Supervised Group Activity Recognition Muhammad Adi Nugroho, Jinyoung Park, Yeeun Seong, Changick Kim D3-0830_IB.2 402 Directed Graph Dynamic Mode Decomposition for Nonlinear State-Space Modeling Hiromu Kanauchi, Ryuto Ito, Hiroyasu Yasuda, Masaaki Nagahara, Yuichi Tanaka, Shogo Muramatsu |
08:30-10:00 |
D3-0830_L1 Music and Singing Location: Lotus I D3-0830_L1.1 71 Unified Timbre Transfer: A Compact Model for Real-Time Multi-Instrument Sound Morphing Anders R. Bargum, Naotake Masuda, Bogdan Teleaga, Andrew Fyfe, Cumhur Erkut D3-0830_L1.2 94 Real-World Music Plagiarism Detection with Music Segment Transcription System Seonghyeon Go D3-0830_L1.3 141 Attention-based Adaptive Structured Patchout Spectrogram Transformer for Music Classification Yuan Liu, Shoji Makino, Lingqing Liu, Yichen Yang D3-0830_L1.4 307 Accuracy Improvement of Automatic Chord Recognition with Source Separation Preprocessing Ayumu Mitoma, Ken'ichi Furuya D3-0830_L1.5 362 Effects of Music Training Experience on the Production of English Rhythm by Chinese Learners Ying Chen, Chenyu Li, Ruizhe Wang, Yujia Zhang D3-0830_L1.6 391 Hierarchical Symbolic Music Generation with Variational Autoencoder-based Bar-wise Feature Sequences Keito Sawada, Wen-Chin Huang, Tomoki Toda D3-0830_L1.7 421 Singing MIDI Transcription with Music Language Models: Formulation and Comparison Yu Sugimoto, Jun-You Wang, Li Su, Eita Nakamura D3-0830_L1.8 451 Data-Efficient Music Captioning via Contrastive and Semantic Alignment Leekyung Kim, Jonghun Park D3-0830_L1.9 482 GAN-Enhanced InpaintNet for Music Inpainting on Limited Data Koumei Naemura, Boyu Cao, Ryotaro Nagase, Ryoichi Takashima, Yoichi Yamashita D3-0830_L1.10 498 An Analysis of Singing Accuracy towards Quantifying the Melodic Singability Minami Kawahara, Tetsuro Kitahara D3-0830_L1.11 569 Guitar Tone Morphing by Diffusion-based Model Kuan-Yu Chen, Kuan-Lin Chen, Yu-Chieh Yu, Jian-Jiun Ding |
08:30-10:00 |
D3-0830_L2 Speech Communication and Media Generation Location: Lotus II D3-0830_L2.1 256 Active Learning for Text-to-Speech Synthesis with Informative Sample Collection Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari D3-0830_L2.2 272 Semi-Supervised End-to-End Speech-to-Text Translation with Joint Text-to-Text and Speech-to-Text Decoding Tomohiro Tanaka, Ryo Masumura, Naoki Makishima, Mana Ihori, Shota Orihashi, Satoshi Suzuki, Taiga Yamane D3-0830_L2.3 344 UTRo-NAST: Non-Autoregressive Speech Translation via Understanding, Translation, and Reordering Yu-Chen Kuan, Kuan-Yu Chen D3-0830_L2.4 384 Laughing Across Borders: A Culturally-Aware Joke Generator for Asian Regions Ashley Fang Cai Xian, Chen Ting Ng, Ashley Kok Siu Cheng, Wah Yang Tan, Mohan Raj Chanthran, Lay-Ki Soon, Meisin Lee D3-0830_L2.5 427 Synthesizing Vowel-Like Tones with Pitch Circularity Kaori Hashimoto, Takao Kawamura, Nobutaka Ono D3-0830_L2.6 428 Error Correction Using LLMs for Sentence Estimation from Ambiguous Inputs via Wearable Keyboards Matsuri Iwasaki, Masanobu Abe, Sunao Hara D3-0830_L2.7 450 A Robust End to End Spoken Grammar Assessment System Sunil Kumar Kopparapu, Chitralekha Bhat, Ashish Panda D3-0830_L2.8 452 LAPS-Diff: A Diffusion-Based Framework for Hindi Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning Sandipan Dhar, Mayank Gupta, Preeti Rao D3-0830_L2.9 472 End-to-end multi-channel speaker extraction and binaural speech synthesis Cheng Chi, Xiaoyu Li, Yuxuan Ke, Qunping Ni, Ge Yao, Xiaodong Li, Chengshi Zheng D3-0830_L2.10 487 Improving Listening Head Generation Performance Using Speech Representations from Self-Supervised Learning Tamon Mikawa, Yasuhisa Fujii, Yukoh Wakabayashi, Kengo Ohta, Ryota Nishimura, Norihide Kitaoka D3-0830_L2.11 616 ULF-TTS: An Uncluttered Hybrid TTS System using Language and Flow Matching Models Jae Hyun Park, Young Sik Eom, SeungJae Choi, Allison Shindell, Min-Gwan Seo, Gyeong-hoon Lee |
08:30-10:00 |
D3-0830_H3 Deep Learning: Algorithm, Implementations, and Applications Location: Hibiscus III D3-0830_H3.1 49 Honey Adulteration Detection via Robust Diffusion Classifier and Hyperspectral Imaging Weihao Tang, Guyang Zhang, Waleed Abdulla D3-0830_H3.2 55 Semantic-Fast-SAM: Efficient Semantic Segmenter Byunghyun Kim D3-0830_H3.3 84 Micro-expression Recognition Using VideoMamba and Regional Selective Mixup Yu-Chen Lin, Yi-Jing Chen, Chih-Chang Yu, Hsu-Yung Cheng D3-0830_H3.4 108 100x Monolingual Data Augmentation Using LLMs to Build a Parallel Corpus for Machine Translation Hitoshi Ito, Naoto Shirai, Kazutaka Kinugawa, Hideya Mino, Yoshihiko Kawai D3-0830_H3.5 185 Neural Analog Filter for Sampling-Frequency-Independent Convolutional Layer Kanami Imamura, Tomohiko Nakamura, Kohei Yatabe, Hiroshi Saruwatari D3-0830_H3.6 372 Enhancing Technical Documents Retrieval for RAG Songjiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Kaiwen Xue, Kwan-Ho Lin, Yan-Ming Choi, Vincent Ng, Kin-Man Lam D3-0830_H3.7 385 Lightweight Zero-Shot Keyword Spotting via Multi-Granular Knowledge Distillation Yun-Ting Sun, Lo-Ya Li, Tien-Hong Lo, Jeih-Weih Hung, Shih-Chieh Huang, Berlin Chen D3-0830_H3.8 386 Monomial Matrix Relocation on the Loss Function Level-Set of Feedforward Neural Networks Ozgur Soysal, Arda Ozdemir, Yiğit Yıldırım, Orhan Arikan D3-0830_H3.9 387 Low-Rank Compression of Neural Network Weights by Null-Space Encouragement Arda Ozdemir, Ozgur Soysal, Ege Doğanay, Yiğit Yıldırım, Orhan Arikan D3-0830_H3.10 422 Sign-MExD: An Expert-Infused Diffusion Model for Sign Language Production Jiayu Shen, Kalin Stefanov, Vee Yee Chong, Lay-Ki Soon, KokSheik Wong D3-0830_H3.11 470 FEKF: Flow-based Extended Kalman Filter Pham Hai Anh, Tran Trong Duy, Do Hai Son, Karim Abed-Meraim, Nguyen Linh Trung |
08:30-10:00 |
D3-0830_P1 Advanced Signal Processing and Resource Management for Sensing and Communication Location: Peony I D3-0830_P1.1 144 A reinforcement learning-based approach to cooperative multi-UAV task allocation Naohiro Kubota, Hideyoshi Miura, Tomotaka Kimura, Kouji Hirata D3-0830_P1.2 167 Trajectory Design of UAVs-Assisted Edge Computing Systems for Efficient Data Collection from Animal Herds Nao Maeda, Tomotaka Kimura, Kouji Hirata D3-0830_P1.3 169 Priority-based RCSA method considering required frequency slot width in multi-core fiber networks Funa Fukui, Yutaka Fukuchi, Kouji Hirata D3-0830_P1.4 219 Retraining-Free Blockage Prediction for Millimeter-Wave Communications Based on Minor Components of Angular Power Profiles Xiaoqing Tong, Kohei Mitani, Kazunori Hayashi, Koji Yamamoto, Takuto Arai, Shuki Wai, Tatsuhiko Iwakuni, Daisei Uchida D3-0830_P1.5 220 Modified Resource Allocation Algorithm based on Co-channel Interference Prediction in Local 5G Environments Takeru Nanjo, Osamu Takyu D3-0830_P1.6 341 Implicit Interference Status Notification Through Time & Frequency Resource Selection in LoRaWAN Yuto Hayasaka, Koichi Adachi D3-0830_P1.7 400 Wireless Environment Estimation with Directional Antennas using Radio Environment Database for Wireless Information and Power Transfer in Smart Factories Kohei Yuzawa, Zhengdong Lin, Yu Kagaya, Yoshiaki Narusue, Takeo Fujii D3-0830_P1.8 416 Data-Driven Tuning of Neural Network Aided Least Squares for UWB-TDoA Indoor Positioning Ryoichi Kawaguchi, Shinsuke Ibi, Hisato Iwai |
08:30-10:00 |
D3-0830_P2 Machine Learning: Algorithms and Application II Location: Peony II D3-0830_P2.1 83 Equivalence of Graph Signal Processing Using a Hermitian Graph Laplacian and Its Corresponding Graph Laplacian with Duplicated Nodes Akira Tanaka D3-0830_P2.2 207 Skeleton-sequence-based Early Action Recognition by Using Graph Convolutional Neural Networks and Knowledge Distillation Techniques Wen-Nung Lie, Kien Truc Le, Veasna Vann, Jui-Chiu Chiang, Ngoc Dung Bui D3-0830_P2.3 232 A State-Dependent Model for Identification of Time-varying Directed Graphs Yuzhe Li, Hangjing Zhang, H. Vicky Zhao D3-0830_P2.4 309 Unrolled Multimodal Signal Restoration with Signed Twofold Graph Learning Haruki Yokota, Yuichi Tanaka, Higashi Hiroshi D3-0830_P2.5 343 Efficient Sparse Matrix Acceleration for Deep Learning via Two-Step Bitmap Tensor Architecture Jia-Hong Weng, Tu Wei-Chen D3-0830_P2.6 352 Distance-based Laplacian Algebra for Effective Subgraph Filter Learning Purui Zhang, Feng Ji, Yanan Zhao, Wee Peng Tay, Bihan Wen D3-0830_P2.7 370 Evaluation of Low-Resource and High-Efficiency Deep Learning Accelerator for Clinical Dental Diagnosis Lin Yuan-Jin, Chang Yu-Jen, Liang Chin-Hao, Wei Sung-Tsun, Weng Jia-Hong, Chen Shih-Lun, Tu Wei-Chen D3-0830_P2.8 412 Quantization Index Modulation-based Reversible Data Hiding in Compressed Neural Network Jun Hirano, Jonethe Tan Yang, Fathin Acyuta Makarim, Daham Jayasinghe, KokSheik Wong D3-0830_P2.9 460 Dense Vector Retrieval in Data Federation Amorntip Prayoonwong, Yang-Chun Hsu, Xin-Jie Ye, Po-Kai Lu, Chih-Hang Wang, Chih-Yi Chiu D3-0830_P2.10 557 Organ Detection Based on Vision-Language Model For Abdominal CT Images Jun-Hong Ou, Bo-Xian Wang, Yu-Hong Zheng, Sufal K. Chhabra, Guo-Shiang Lin, Shen-Lei Yan, Chen-Kuo Chiang D3-0830_P2.11 644 Locally-Structured Unitary Network Shogo Muramatsu D3-0830_P2.12 625 Algorithm-Architecture Co-Exploration of Systolic Arrays Using High-Level Synthesis Chu-Chun Yang, Gwo Giun Lee, Tsung-Ying Tsai, Jie-Ren Zheng, Yue-Cong Kuo, Wei-Chieh Lee, Yu-Kai Chou, Ryan Pary |
10:00-10:30 |
Break |
10:30-12:00 |
D3-1030_IB Conference Award Session II Location: Island Ballroom D3-1030_IB.1 69 Digital-Optical Hybrid Computation for Deep Unfolding-Aided MIMO Signal Detection Takumi Nishiyama, Lantian Wei, Tadashi Wadayama D3-1030_IB.2 284 Uncolorable Examples: Preventing Unauthorized AI Colorization via Perception-Aware Chroma-Restrictive Perturbation Yuki Nii, Futa Waseda, Ching-Chun Chang, Isao Echizen D3-1030_IB.3 36 Voice Conversion Augmentation for Speaker Recognition on Defective Datasets Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li D3-1030_IB.4 355 Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR Shreyas Gopal, Ashutosh Anshul, Haoyang Li, Yue Heng Yeo, Hexin Liu, Eng Siong Chng D3-1030_IB.5 530 Rethinking Robust ASR Strategies: Can Textual In-Context Learning Improve Acoustic Robustness? Benita Angela Titalim, Faisal Mehmood, Sakriani Sakti D3-1030_IB.6 586 Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning Haorui He, Yucheng Song, Yuancheng Wang, Haoyang Li, Xueyao Zhang, Li Wang, Gongping Huang, Eng Siong Chng, Zhizheng Wu |
10:30-12:00 |
D3-1030_L1 Grand Challenge & Audio Processing Location: Lotus I D3-1030_L1.1 110 DG-SED: Domain Generalization for Sound Event Detection with Heterogeneous Training Data Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das D3-1030_L1.2 138 Accelerated Convolutive Transfer Function-Based Multichannel NMF Using Iterative Source Steering Xuemai Xie, Xianrui Wang, Liyuan Zhang, Yichen Yang, Shoji Makino D3-1030_L1.3 139 DySiME: Dynamic Single-Source Multichannel Enhancement Using Time-Varying Directional Cues Hao Liang, Yichen Yang, Xiao Zhang, Shoji Makino, Jingdong Chen D3-1030_L1.4 228 Demixing Filter Estimation for Bleeding-Sound Reduction of a Vocal Microphone Soushi Taninomiya, Daichi Kitamura, Norihiro Takamune, Kouei Yamaoka, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Hayato Yamakawa D3-1030_L1.5 401 Prior-Guided Source Separation with Direct Update of Back-projected Demixing Vectors Kukuru Koiso, Taishi Nakashima, Nobutaka Ono D3-1030_L1.6 588 Meta-Learning with Pretrained Audio Representations Enables One-Shot Acoustic Signal Classification Haoxiang Wu, Zhengqiao Zhao, Jingdong Chen, Jacob Benesty D3-1030_L1.7 634 Self-Rotation-Robust Online-Independent Vector Analysis with Sound Field Interpolation on Circular Microphone Array Taishi Nakashima, Yukoh Wakabayashi, Nobutaka Ono D3-1030_L1.8 642 A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Zhen-Hua Ling D3-1030_L1.9 648 A Semi-Supervised Acoustic Scene Classification Network Based on Multi-Modal Information Fusion Junkang Yang, Hongqing Liu, Liming Shi, Lu Gan, Hiromitsu Nishizaki, Chee Siang Leow D3-1030_L1.10 651 ASCMamba: Multimodal Time-Frequency Mamba for Acoustic Scene Classification Bochao Sun, Dong Wang, Zhanlong Yang, Jun Yang, Han Yin D3-1030_L1.11 652 Evaluation of Low-Frequency Restriction, Pitch-Shift Augmentation, and Average Pooling for Acoustic Scene Classification under Unseen-City Conditions Takao Kawamura, Masayuki Sera, Nobutaka Ono D3-1030_L1.12 653 The APSIPA ASC 2025 Grand Challenge on City and Time-Aware Semi-supervised Acoustic Scene Classification: Summary and Results Jisheng Bai, Mou Wang, Haohe Liu, Bin Xiang, Yin Liu, Jianfeng Chen, Dongyuan Shi, Mark Plumbley, Susanto Rahardja, Woon-Seng Gan |
10:30-12:00 |
D3-1030_L2 Speech Recognition and Understanding I Location: Lotus II D3-1030_L2.1 195 Phoneme-grapheme Dictionary-based Prompting for Robust Proper Noun Recognition in Japanese ASR Ryuga Sugano, Hiroaki Sato, Asahi Sakuma, Tadashi Kumano, Yoshihiko Kawai, Shinji Watanabe D3-1030_L2.2 282 LLM-Driven Hypothesis Set Refinement for Enhanced ASR Post-Processing Chen-Han Wu, Kuan-Yu Chen D3-1030_L2.3 390 Real-time VAD-less speech recognition by fine-tuning SSL model with data containing tagged non-speech segments Jotaro Emoto, Ryota Nishimura, Kengo Ohta, Norihide Kitaoka D3-1030_L2.4 406 Improving Automatic Speech Recognition Model for Super-Elderly Voice Using Speech Synthesis Model Ryota Uematsu, Chee Siang Leow, Norihide Kitaoka, Hiromitsu Nishizaki D3-1030_L2.5 426 Improving Code-Switching Speech Recognition with TTS Data Augmentation Yue Heng Yeo, Yu Chen Hu, Shreyas Gopal, Yi Zhou Peng, He Xin Liu, Eng Siong Chng D3-1030_L2.6 443 PQSR: A Speech Corpus of Polar Questions and Spontaneous Responses in Standard Chinese with Complex Intentions Annotated Yingyi Luo, Yue Huang, Qingke Sun, Shuwen Chen D3-1030_L2.7 475 Toward Natural System Repair: An Analysis of Human Other-Initiated Self-Repair Patterns in Japanese Casual Conversations Kazuya Tsubokura, Yurie Iribe, Norihide Kitaoka D3-1030_L2.8 524 Self-Supervised Learning for Classification of Normal vs. Dysarthric Speech Hiya Chaudhari, Kavya Kumar, Hemant Patil D3-1030_L2.9 592 Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Panchal Nayak, Priyabrata Mallick, Swarup Ranjan Behera, Parabattina Bhagath, Pailla Balakrishna Reddy, Arun Balaji Buduru D3-1030_L2.10 593 Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient? Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Parabattina Bhagath, Pailla Balakrishna Reddy, Arun Balaji Buduru |
10:30-12:00 |
D3-1030_H3 Recent Advances in Multimodal Learning for Computer Vision Location: Hibiscus III D3-1030_H3.1 52 Allegory of the Cave: Breakdown of Illusions in Multimodal Perception with Neural Radiance Fields Axel Paivansalo, Ching-Chun Chang, Hanrui Wang, Futa Waseda, Isao Echizen D3-1030_H3.2 53 Overlapped Coffee Beans Detection and Localization Using a Low-Cost 3D Monocular Point Cloud Clustering Method Isack Farady, Alifya Febriana, Chih-Yang Lin D3-1030_H3.3 160 Interpretable Video-Text Alignment (VTA) for Cross-Modal Retrieval Tsung-Shan Yang, Yun-Cheng Wang, Chengwei Wei, Suya You, C.-C. Jay Kuo D3-1030_H3.4 205 Sequence Modeling and Generative Model Driven Non-Rigid 3D Reconstruction Yuxin He, Hui Deng, Mingyi He, Yuchao Dai D3-1030_H3.5 302 Robust Audio-Visual Speech Recognition in Noisy Clinical Environments Akshita Abrol, Ridwan Arefeen, Haotong Yu, Alexi George, Kelvin Li, Zhengkui Wang, Rong Tong D3-1030_H3.6 334 Integrating Visual XAI and LLMs for Interpretable Medical Image Analysis Chern Hong Lim, Xin Hui Lor D3-1030_H3.7 363 InternVL-VPR: Hierarchical Zero-Shot Visual Place Recognition with VLM-driven Re-Ranking Zhi Hu, Liang Liao, Weisi Lin D3-1030_H3.8 375 Token Compression Meets Compact Vision Transformers: A Survey and Comparative Evaluation for Edge AI Phat Nguyen, Ngai-Man Cheung D3-1030_H3.9 388 Adapting Vision-Language Models for Information Extraction from Bilingual Medical Invoices Ha Do, Anh Dung DO, Thanh-Ha DO D3-1030_H3.10 399 Zero-shot Artistic Text Recognition with Multimodal Language Models Tien Do, Thuyen Tran, Duy-Dinh Le, Thanh Duc Ngo D3-1030_H3.11 604 Attention Based Deep Reference Frame Enhancement for VVC Inter Prediction Linchen Xu, Zhikai Liu, Fan Liang |
10:30-12:00 |
D3-1030_P1 Signal & Information Processing II Location: Peony I D3-1030_P1.1 137 Low-Complexity Total Variation-based Signal Reconstruction with Adaptive Gradient Descent for Compressive sensing Pei-Cheng Yeh, Chieh-Li Wang, Yuan-Hao Huang D3-1030_P1.2 322 Robust Initialization Strategies for Hankel Structured Low-Rank Approximation via Variable Projection Natsuki Yoshino, Akira Tanaka D3-1030_P1.3 347 High-Resolution ISAR Imaging for High-Speed Targets via Joint Intra-Pulse and Inter-Pulse Translational Motion Compensation Jiabao Wang, Shuai Shao, Jiaqi Wei D3-1030_P1.4 381 Sparse Echo Reconstruction of Micro-motion Targets Under the Joint Constraints of Low-rank and Periodic Consistency Mingming Jin, Jun Wang, Shaoming Wei, Peng Lei D3-1030_P1.5 393 Distributed Extended Object Tracking with Adaptive Networks Kaidi Yang, Wei Xia D3-1030_P1.6 394 Extended Object Tracking: A DNN-aided approach Runhe Gan, Wei Xia D3-1030_P1.7 455 Non-negative Learned ISTA with Reflected-ReLU-Augmented L1 Regularization Haruki Esaki, Towa Yasui, Seisuke Kyochi D3-1030_P1.8 480 Phoneme-Specific Challenges to Intelligibility in Hearing Impairment Under Noisy Condition Denawati Junia, Candy Olivia Mawalim |
10:30-12:00 |
D3-1030_P2 Machine Learning: Algorithms and Application III Location: Peony II ChongChong Yu, Xiaolong Xu, Zhaopeng Qian, Kejing Xiao, Yuchen Tan D3-1030_P2.2 239 A Data-Driven Control Framework Using Deep Reinforcement Learning for Autonomous Driving Mei-Lin Huang, Ching-Hung Lee, Cheng-Ting Huang, Hsin-Han Chiang D3-1030_P2.3 291 Recipe Diffusion: Cross-Frame Attention and Region-Aware Diffusion for Coherent Visual Recipe Instruction Generation Weiyi Xia, Satoru Fujita D3-1030_P2.4 311 Improving Few-Shot Classification via Feature-Aligned AI-Generated Images Yu-Wen Tung, Mei-Chen Yeh D3-1030_P2.5 312 Rotation Invariant Automatic Rigging for 3D Human Scan Data Yiqing Li, Satoru Fujita D3-1030_P2.6 314 SinDiffPhase: High-Quality Phase Estimation with Ultra-Fast Single-Step Diffusion Yifei Ni, Andong Li, Lingling Dai, Erwei Yin, Qunping Ni, Chengshi Zheng D3-1030_P2.7 411 MapCVAE: Probabilistic Prediction of Diverse Pedestrian Behaviors on General Roads Konosuke Kobayashi, Satoru Fujita D3-1030_P2.8 457 Herald: Democratizing Compositional Reasoning for Visual Tasks without Any Training Guan Yuan Tan, Arghya Pal, Sailaja Rajanala, Raphael C.-W. Phan, Chee-Ming Ting D3-1030_P2.9 493 Canopy to Canopy: Evaluating Model Generalization In 3D Tropical Forest Semantic Segmentation Sue Han Lee, Brenda Ru Yi Sim, Chung Siung Choo, Yuen Peng Loh D3-1030_P2.10 520 LSTM-Transformer Hybrid Network for UAV-Bird Classification Using Radar Track Information Anning Jiang, Dianfeng Qiao, Shun Liu, Yan Liang D3-1030_P2.11 559 A Unified Framework for Interpretable and Uncertainty-Aware Battery State of Health Estimation Using Deep Neural Networks Elias Isaac Huai-En Lim, Nicholas Heng Loong Wong |
12:00-13:30 |
Lunch |
13:30-14:30 |
D3-1330_IB Keynote 3 by Xiao-Ping (Steven) Zhang Location: Island Ballroom |
14:30-16:00 |
D3-1430_IB Perspective 4: Leveraging LLMs in Interdisciplinary Study of Speech and Language Location: Island Ballroom D3-1430_IB.1 649 Normalization through Fine-tuning: Understanding Wav2vec2.0 Embeddings for Phonetic Analysis Yiming Wang, Jiahong Yuan D3-1430_IB.2 668 Enabling Internationalization of Affective Speech Technology using LLMs Bo-Hao Su, Shinji Watanabe, Chi-Chun Lee |
14:30-16:00 |
D3-1430_L1 Speech Synthesis, Enhancement, and Recognition Location: Lotus I D3-1430_L1.1 17 End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation Minghui Wu, Haitao Tang, Jiahuan Fan, Ruizhi Liao, Yanyong Zhang D3-1430_L1.2 48 Disfluency Disentanglement Enhancement in Spoken-Text-Style Transfer for Spontaneous Speech Synthesis Yuuto Nakata, Daiki Yoshioka, Wen-Chin Huang, Tomoki Toda D3-1430_L1.3 98 DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement Minghui Wu, Xueling Liu, Jiahuan Fan, Haitao Tang, Yanyong Zhang, Yue Zhang D3-1430_L1.4 149 Emotion-Rich Cross-Speaker TTS via Contrastive Prosody Enhancement Jen-Tzung Chien, Bryan Gautama Ngo D3-1430_L1.5 243 Face-conditioned Large-scale Text-to-Speech via Speaker Embedding Prediction from Facial Images Umi Okamoto, Sei Ueno, Akinobu Lee D3-1430_L1.6 304 Few-shot Speaker Adaptation for Text-to-Speech Synthesis Using Non-Target Speaker Corpora for Glossectomy Patients Masayori Okamura, Masanobu Abe, Sunao Hara D3-1430_L1.7 491 Personalized Bone-Conduction Bandwidth Extension with Speaker Characteristics Pan Xu, Zhongyu Zhang, Zhonghua Fu D3-1430_L1.8 500 Time-Aligned Laughter Sound Event Recognition for Conversational Laughter Analysis and Synthesis Hiroki Mori D3-1430_L1.9 554 PALGAN: A Joint Optimization-Based Preprocessing Method for Speech Restoration in Parametric Array Loudspeakers Wenyao Ma, Jun Yang D3-1430_L1.10 600 Beyond One-Shot Dubbing: Leveraging N-Best Translation and Prompted Paraphrasing with Synchrony-Aware Re-Ranking Jan Saragih, Faisal Mehmood, Sakriani Sakti |
14:30-16:00 |
D3-1430_L2 Speech Recognition and Understanding II Location: Lotus II D3-1430_L2.1 151 Probabilistic Language-Aware Speech Recognition Jen-Tzung Chien, Willianto Sulaiman, Chung-Hsuan Wang D3-1430_L2.2 226 LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models Ryutaro Oshima, Yuya Hosoda, Yoji Iiguni D3-1430_L2.3 238 Multi-stage Speech Enhancement with Cascaded SNR Domain Shifts Xiaoran Li, Zilu Guo, Jun Du D3-1430_L2.4 280 Autoencoder-Driven Latent Representation Learning for Language-Agnostic Disordered Speech Classification using a Universal Feature set Puneet Bawa, Virender Kadyan, Shareef Babu Kalluri D3-1430_L2.5 296 FH-RestoreASR: Frequency-Hopping Robust Air Traffic Control Speech Restoration and Recognition Youngeun Kwon, Yeri Byun, Hyunsung Cho, Jongwon Choi D3-1430_L2.6 342 Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM Ryandhimas Zezario, Dyah Wisnu, Hsin-Min Wang, Yu Tsao D3-1430_L2.7 392 GCI detection and glottal wave estimation based on TV-CAR speech analysis Keiichi Funaki D3-1430_L2.8 405 HIPA-MoE: A Parameter-Efficient Fine-Tuning Architecture with Hierarchical Adapter-based Mixture-of-Experts for Multilingual ASR Xun Lu, Xuyang Wang, Gaofeng Cheng, Lin Zheng, Pengyuan Zhang D3-1430_L2.9 542 Mild Cognitive Impairment Detection via Linear Discriminant Analysis of Picture Description Speech Features: A Cross Corpus Comparison Yan-Lin Lai, Erh-Yun Chang, Yi-Wen Liu, Jung Lung Hsu, Hui-Chuan Hsu D3-1430_L2.10 556 Parameter-Efficient Fine-Tuning of Foundation Models for CLP Speech Classification Susmita Bhattacharjee, Jagabandhu Mishra, H.S. Shekhawat, S R Mahadeva Prasanna D3-1430_L2.11 575 Language Awareness in Code-Switching Speech Recognition Jen-Tzung Chien, Bobbi Aditya |
14:30-16:00 |
D3-1430_H3 Privacy and Security in Speech AI Location: Hibiscus III D3-1430_H3.1 86 Voice Privacy Protection with Adversarial Examples Using Anchor Speaker Embedding Shunya Ishikawa, Yuki Katsumata, Toru Nakashika D3-1430_H3.2 133 Investigation of perception inconsistency in speaker embedding for asynchronous voice anonymization Rui Wang, Liping Chen, Kong Aik Lee, Zhengpeng Zha, Zhenhua Ling D3-1430_H3.3 194 SegReConcat: A Data Augmentation Method for Voice Anonymization Attack Ridwan Arefeen, Xiaoxiao Miao, Rong Tong, Aik Beng Ng, Simon See D3-1430_H3.4 200 An Enhanced Probabilistic Approach for Singfake Generation Arth Shah, Aniket Pandey, Satyam R. Tiwari, Hemant Patil D3-1430_H3.5 417 Neural Semi-fragile Watermarking for Proactive Deepfake Speech Detection Dohyun Yoon, Tomoki Toda D3-1430_H3.6 495 Investigating Self-supervised Learning-Based Front-End for Multi-Channel Replay Attack Detection Takuo Yamaguchi, Sayaka Shiota, Naohiro Tawara D3-1430_H3.7 515 Transferability of Adversarial Examples across Speaker Embedding Models for Voice Privacy Protection Kotaro Nakamura, Takuya Takahashi, Toru Nakashika D3-1430_H3.8 538 Voice Privacy Preservation with Multiple Random Orthogonal Secret Keys: Attack Resistance Analysis Kohei Tanaka, Hitoshi Kiya, Sayaka Shiota D3-1430_H3.9 547 CycleSiFiNF-VC: Controllable Non-Parallel Voice Conversion by Neural Formant Manipulation with Improved Cycle-Consistency Loss Sumiharu Kobayashi, Takashi Nose, Akinori Ito D3-1430_H3.10 608 Recoverable Audio Adversarial Examples for Voice Protection in One-shot Voice Conversion Chenshuai Shu, Tianpeng Zheng, Yanxiang Chen D3-1430_H3.11 623 Reference-free Adversarial Sex Obfuscation in Speech Yangyang Qu, Michele Panariello, Massimiliano Todisco, Nicholas Evans |
14:30-16:00 |
D3-1430_P1 Late Breaking Reports Location: Peony I D3-1430_P1.1 638 DCMix: Distance Based ClassMix for Unsupervised Domain Adaptation Chen-Kuo Chiang, Yue-Lin Yang, Huai En Lee D3-1430_P1.2 654 BiGaitNet: Deep CNN-Based Classification of Parkinson’s Disease Gait Abnormalities Using a Smart Insole Robust to Fewer Plantar Sensors Eun-Seo Park, Xianghong Liu, Chang-Hee Han D3-1430_P1.3 655 EEG Modulation Associated with Music-Induced Emotional Responses Mayu Goto, Motoko Iwashita, Ingon Chanpornpakdi, Makiko Ishikawa, Kenji Ishida, Toshihisa Tanaka D3-1430_P1.4 656 Nonlinear System Identification Approach under Noisy Input Signals and Impulse Observed Noise by Kernel Adaptive Filtering Algorithm Ying-Ren Chien, En-Ting Lin D3-1430_P1.5 657 Low-Resource Atayal Speech Recognition into Atayal and Chinese Texts Po-Cheng Chan, Ching-Ting Hsin, Di Tam Luu, Chi-Tao Chen, Jia Ching Wang D3-1430_P1.6 658 Retinal Artery–Vein Segmentation via Attention-Guided W-Net and GAN-Based Boundary Refinement Jing-Ming Guo, De-Yu Guu, Yih-Ping Luh, Yi-Chong Zeng D3-1430_P1.7 659 Study on Optimal Online Modeling of the Feedback and Secondary Paths in ANC Systems Fuma Tsujiwaki, Shota Toyooka, Kenta Iwai, Yoshinobu Kajikawa D3-1430_P1.8 660 Local Contrast Enhancement in LDR Images via Adaptive Distribution of Clipped-histogram Excess Seong-hyun Jin, Dong-Min Son, Young-Ho Go, Sung-Hak Lee D3-1430_P1.9 661 A Lightweight and Reversible Audio Watermarking Scheme Based on Integer Wavelet Transform Xuping HUANG, Akinori Ito D3-1430_P1.10 662 A Novel Hybrid Active Noise Control System with Remote Microphone based Virtual Sensing Algorithm Shota Toyooka, Yoshinobu Kajikawa D3-1430_P1.11 663 Enhancing Speech Quality in Scintillating Satellite Communications: A Rician Fading Modeling Approach Teh Kah Kuan D3-1430_P1.12 664 Variance-driven U-Net Training and Chroma-scale-based Multi-exposure Image Fusion Changwoo Son, Youngho Go, Seunghwan Lee, Sunghak Lee D3-1430_P1.13 665 A New Generation with Variational Autoencoder and Music Transformer for 8-bit MIDI Music Qian-Yi Zhuang, Shih-Ming Wang, Mau-Yi Tian, Te-Jen Su, Mong-Fong Horng D3-1430_P1.14 666 Toward Physics-Guided LLM Libraries for SINDy: Framework and Rheology Example Shota Kato, Takeshi Sato, Souta Miyamoto D3-1430_P1.15 667 Joint Design of Low Sidelobe Radar Waveform and Filter with Hardware Platform Verification Haoqian Rong, Shaojie Wang, Zining Zhao, Jiawei Zhang |
14:30-16:00 |
D3-1430_P2 Scalable and Efficient Signal Processing for Multimodal AI Systems (ESPRESSO) Location: Peony II D3-1430_P2.1 115 Exploring Audio-Visual Fusion Methods in Foundation Model-Based Deception Detection Jiaxiang Meng, Hardik B. Sailor, Qiongqiong Wang, Tianchi Liu, Kong Aik Lee, Xingmei Wang D3-1430_P2.2 153 Emot-CM-BERT: Adaptive Attention and Class-Aware Cross-Modal Learning for Emotion Recognition from Audio and Text Shintami Chusnul Hidayati, James Rafferty Lee, Kevin Davi Samuel D3-1430_P2.3 206 DP-GS: Depth-prior & Perception-guided Gaussian Splatting for Sparse-view Novel View Synthesis Bowen Gao, Zhicheng Lu, Mingyi He, Yuchao Dai D3-1430_P2.4 267 Efficient Video to Audio Mapper with Visual Scene Detection Mingjing Yi, Yuxi Wang, Ming Li D3-1430_P2.5 481 Adversarial Learning for Duration Prediction in Indonesian Text-to-Speech: Modification to Stochastic and Deterministic Predictors Yoga Tiara Wiguna, Bima Prihasto, Boby Mugi Pratama, Chia-Hung Yeh, Jia-Ching Wang D3-1430_P2.6 505 Narrativity-Aware Video Summarization Based on Vision and Language Foundation Models Shumpei Saito, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii D3-1430_P2.7 523 RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing Yang Xiao, Ting Dang, Rohan Kumar Das D3-1430_P2.8 534 Dynamic Fusion Multimodal Network for SpeechWellness Detection Wenqiang Sun, Han Yin, Jisheng Bai, Jianfeng Chen D3-1430_P2.9 563 AIGuard: Anomaly Detection in Surveillance Videos with YOLOv8 Rungpilin Anantathanavit, Supakorn Suthirat, Po-Chyi Su D3-1430_P2.10 599 Ensemble Confidence Calibration for Sound Event Detection in Open-environment Yuanjian Chen, Han Yin D3-1430_P2.11 602 Enhancing Stereo Sound Event Detection with BiMamba and Pretrained PSELDnet Wenmiao Gao, Han Yin |
16:00-16:30 |
Break |
16:30-18:00 |
D3-1630_IB APSIPA General Assembly, Founders' Forum & Closing Ceremony Location: Island Ballroom |