RLHF/RLVR

   BroRL: Scaling Reinforcement Learning via Broadened Exploration.
   Jian Hu, Mingjie Liu, Ximing Lu, Fang Wu, Zaid Harchaoui, Shizhe Diao, Yejin Choi, Pavlo Molchanov, Jun Yang, Jan Kautz, Yi Dong
   Under Review
   [Paper]    [Model]

   DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search.
   Fang Wu*, Weihao Xuan*, Heli Qi*, Ximing Lu, Aaron Tu, Li Erran Li, Yejin Choi
   ICLR 2026    🥇 #1 of the Day
   [Paper]    [Code]    [Model]

   Multiplayer Nash Preference Optimization.
   Fang Wu*, Xu Huang*, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, Guancheng Wan, Xiaomin Li, Bing Hu, Peng Xia, Jure Leskovec, Yejin Choi
   ICLR 2026 (oral)    🥉 #3 of the Day
   [Paper]    [Code]

   The Invisible Leash: Why RLVR May Not Escape Its Origin.
   Fang Wu*, Weihao Xuan*, Ximing Lu, Zaid Harchaoui, Yejin Choi
   Under Review    🥉 #3 of the Day
   [Paper]

   Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning.
   Yijia Xiao, Edward Sun, Tong Chen, Fang Wu, Di Luo, Wei Wang
   Under Review
   [Paper]    [Code]