Yiyang Jiang

P504, Department of Computing

The Hong Kong Polytechnic University

Kowloon, Hong Kong

I’m a PhD student in the Department of Computing at The Hong Kong Polytechnic University (PolyU), supervised by Prof. LI Qing and Prof. WEI Xiaoyong. I also earned my B.Sc. in Computer Science from PolyU.

My research interests lie at the intersection of Computer Vision and Natural Language Processing, with a particular focus on vision-language understanding and Large Language Models. I am especially interested in exploring their applications in real-world scenarios.

News

May 24, 2025	Our work Removal of Hallucination on Hallucination: Debate-Augmented RAG has been accepted to the ACL 2025 main conference.
Apr 24, 2025	I’ve received the HKPFS 2025/26 award, which will strongly support and accelerate my PhD research at PolyU.
Oct 14, 2024	I earned Gold Awards at the HKEIA Innovation & Technology Project Competition Award 2024.
Aug 24, 2024	I was named Champion of the 21st IEEE (HK) Computational Intelligence Chapter FYP & PG Competition.
Jul 21, 2024	Our work Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval was accepted to ACM Multimedia 2024 as an oral presentation (top 3.97%).
Jun 20, 2024	I became a Research Assistant at the Department of Computing.
Jun 05, 2024	I won the Best Project Award (Champion) at the COMP 2024 Capstone Competition.
Jun 15, 2023	I joined the Research Centre for Data Sciences & Artificial Intelligence as a Student Assistant.

Selected publications

ACL 2025
Removal of Hallucination on Hallucination: Debate-Augmented RAG

Wentao Hu, Wengyu Zhang, Yiyang Jiang, Chen Jason Zhang, Xiaoyong Wei, and Qing Li

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025

Retrieval-Augmented Generation (RAG) enhances factual accuracy by integrating external knowledge, yet it introduces a critical issue: erroneous or biased retrieval can mislead generation, compounding hallucinations, a phenomenon we term Hallucination on Hallucination. To address this, we propose Debate-Augmented RAG (DRAG), a training-free framework that integrates Multi-Agent Debate (MAD) mechanisms into both retrieval and generation stages. In retrieval, DRAG employs structured debates among proponents, opponents, and judges to refine retrieval quality and ensure factual reliability. In generation, DRAG introduces asymmetric information roles and adversarial debates, enhancing reasoning robustness and mitigating factual inconsistencies. Evaluations across multiple tasks demonstrate that DRAG improves retrieval reliability, reduces RAG-induced hallucinations, and significantly enhances overall factual accuracy. Our code is available at https://github.com/Huenao/Debate-Augmented-RAG.
@inproceedings{hu2025removal,
  title = {Removal of Hallucination on Hallucination: Debate-Augmented RAG},
  year = {2025},
  author = {Hu, Wentao and Zhang, Wengyu and Jiang, Yiyang and Zhang, Chen Jason and Wei, Xiaoyong and Li, Qing},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
  url = {https://arxiv.org/pdf/2505.18581},
}

MM 2024
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

Yiyang Jiang, Wengyu Zhang, Xulu Zhang, Xiaoyong Wei, Chang Wen Chen, and Qing Li

In Proceedings of the 32nd ACM International Conference on Multimedia, 2024

In this paper, we investigate the feasibility of leveraging large language models (LLMs) for integrating general knowledge and incorporating pseudo-events as priors for temporal content distribution in video moment retrieval (VMR) models. The motivation behind this study arises from the limitations of using LLMs as decoders for generating discrete textual descriptions, which hinders their direct application to continuous outputs like salience scores and inter-frame embeddings that capture inter-frame relations. To overcome these limitations, we propose utilizing LLM encoders instead of decoders. Through a feasibility study, we demonstrate that LLM encoders effectively refine inter-concept relations in multimodal embeddings, even without being trained on textual embeddings. We also show that the refinement capability of LLM encoders can be transferred to other embeddings, such as BLIP and T5, as long as these embeddings exhibit similar inter-concept similarity patterns to CLIP embeddings. We present a general framework for integrating LLM encoders into existing VMR architectures, specifically within the fusion module. The LLM encoder’s ability to refine concept relations can help the model achieve a balanced understanding of the foreground concepts (e.g., persons, faces) and background concepts (e.g., street, mountains) rather than focusing only on the visually dominant foreground concepts. Additionally, we introduce the concept of pseudo-events, obtained through event detection techniques, to guide the prediction of moments within event boundaries instead of crossing them, which can effectively avoid distractions from adjacent moments. The integration of semantic refinement using LLM encoders and pseudo-event regulation is designed as plug-in components that can be incorporated into existing VMR methods within the general framework. Through experimental validation, we demonstrate the effectiveness of our proposed methods by achieving state-of-the-art performance in VMR.
@inproceedings{jiang2024prior,
  title = {Prior Knowledge Integration via {LLM} Encoding and Pseudo Event Regulation for Video Moment Retrieval},
  year = {2024},
  author = {Jiang, Yiyang and Zhang, Wengyu and Zhang, Xulu and Wei, Xiaoyong and Chen, Chang Wen and Li, Qing},
  booktitle = {Proceedings of the 32nd ACM International Conference on Multimedia},
  pages = {7249--7258},
  numpages = {10},
  isbn = {9798400706868},
  publisher = {Association for Computing Machinery},
  url = {https://doi.org/10.1145/3664647.3681115},
  doi = {10.1145/3664647.3681115},
}