🎓 About Me

I am a Master’s student at the School of Artificial Intelligence, University of Chinese Academy of Sciences. My research interests lie in large language models, intelligent agents, and artificial intelligence-generated content (AIGC), with a focus on natural language processing and automated content creation.

🔥 News

  • 2025.03:   One paper accepted by ICLR 2025 Workshop SCI-FM.
  • 2025.01:   One paper accepted by ICLR 2025.
  • 2024.12:   One paper accepted by AAAI 2025.
  • 2024.10:   One paper accepted by NeurIPS 2024 Datasets and Benchmarks Track.
  • 2024.05:   One paper accepted by ACL 2024 Findings.

📝 Selected Publications

ICLR 2025 Workshop SCI-FM

A Comparative Study on Reasoning Patterns of OpenAI’s o1 Model [Paper] [Github]

Siwei Wu*, Zhongyuan Peng*, Xinrun Du*, Tuney Zheng*, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang†, Chenghua Lin†, J.H. Liu†

The International Conference on Learning Representations (ICLR) 2025 Workshop on Open Science for Foundation Models (SCI-FM)

ICLR 2025

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models [Paper] [Github] [Huggingface]

Pei Wang*, Yanan Wu*, Zekun Wang*, Jiaheng Liu†, Xiaoshuai Song, Zhongyuan Peng, Ken Deng, Chenchen Zhang, Jiakai Wang, Junran Peng, Ge Zhang, Hangyu Guo, Zhaoxiang Zhang, Wenbo Su, Bo Zheng

The International Conference on Learning Representations (ICLR) 2025

AAAI 2025

Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow [Paper] [Code]

Jiaqi Bai, Hongcheng Guo, Z.Y. Peng, Jian Yang, Zhoujun Li, Mohan Li, Zhihong Tian

The AAAI Conference on Artificial Intelligence (AAAI) 2025

NeurIPS 2024 Datasets and Benchmarks

RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Scripts [Paper]

Jiaheng Liu*, Zehao Ni*, Haoran Que*, Tao Sun, Zekun Wang, Jian Yang, Jiakai Wang, Hongcheng Guo, Zhongyuan Peng, Ge Zhang, Jiayi Tian, Xingyuan Bu, Ke Xu, Wenge Rong, Junran Peng†, Zhaoxiang Zhang

The Conference on Neural Information Processing Systems (NeurIPS) 2024, Datasets and Benchmarks Track

ACL 2024 Findings

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models [Paper] [Code]

Zekun Moore Wang*, Zhongyuan Peng*, Haoran Que*, Jiaheng Liu†, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Jian Yang, Man Zhang, Zhaoxiang Zhang†, Wanli Ouyang, Ke Xu, Stephen W. Huang, Jie Fu, Junran Peng

Findings of the Association for Computational Linguistics (Findings of ACL) 2024

📄 Pre-Prints

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? [Paper] [Github]

Yancheng He*, Shilong Li*, Jiaheng Liu*†, Weixun Wang*, Xingyuan Bu, Ge Zhang, Zhongyuan Peng, Zhaoxiang Zhang, Wenbo Su, Bo Zheng

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models [Paper] [Github] [Huggingface]

Alexander Zhang*, Marcus Dong*, Jiaheng Liu*†, Wei Zhang, Yejie Wang, Jian Yang, Ge Zhang, Tianyu Liu, Zhongyuan Peng, Yingshui Tan, Yuanxing Zhang, Zhexu Wang, Weixun Wang, Yancheng He, Ken Deng, Wangchunshu Zhou, Wenhao Huang, Zhaoxiang Zhang

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines [Paper] [Github] [Huggingface]

M-A-P Team

FullStack Bench: Evaluating LLMs as Full Stack Coders [Paper] [Github] [Huggingface]

Siyao Liu*, He Zhu*, Jerry Liu*, Shulin Xin*, Aoyan Li*, Rui Long, Li Chen, Jack Yang, Jinxiang Xia, Z.Y. Peng, Shukai Liu, Zhaoxiang Zhang, Jing Mai, Ge Zhang, Wenhao Huang, Kai Shen†, Liang Xiang†

📖 Experiences