
Yi Wu (吴翼)

Principal Investigator (PI), Shanghai Qi Zhi Institute (July 2020 to present)
Assistant Professor, Tsinghua University

Biography

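Yi Wu is a PI at the Shanghai Qi Zhi Institute and an assistant professor at the Institute for Interdisciplinary Information Sciences, Tsinghua University.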

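He received his Ph.D. from the University of California, Berkeley in 2019, advised by Professor Stuart Russell, and then worked as a researcher at OpenAI in the United States. His research aims to improve the generalization of AI systems and to build general-purpose agents that can cooperate and interact with humans. His work spans several areas of AI, including deep reinforcement learning, multi-agent learning, natural language understanding and execution, and large-scale learning systems. His paper Value Iteration Networks received the Best Paper Award at NIPS 2016, a top machine learning conference.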

Research Directions

Human-AI interaction: technologies for human-AI collaboration and the next generation of human-computer interaction paradigms

Multi-agent reinforcement learning: fundamental algorithms and applications of multi-agent reinforcement learning

Reinforcement learning: fundamental algorithms and applications of reinforcement learning

Robot control: robot control based on reinforcement learning algorithms

Highlights

Achievement 3: Human-AI Collaboration That Accounts for Human Preferences

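       A long-standing challenge in artificial intelligence is to build agents that can interact with, cooperate with, and assist humans. Traditional approaches are typically model-based: they use human data to build effective behavior models and then plan against those models. Although model-based methods have been very successful, they require expensive and time-consuming data collection and may raise privacy concerns. An alternative is to use self-play. First, a diverse policy pool is built from multiple policies trained by self-play; then an adaptive policy is trained against this pool. A diverse pool helps prevent the adaptive policy from overfitting, but each self-play policy in the pool is merely one solution to the problem, optimal or sub-optimal depending on the task reward function. This implicitly assumes that, under any test condition, the partner agent optimizes exactly the task reward. That assumption does not hold when cooperating with humans. Studies have shown that human utility functions can be substantially biased even when a clear objective is given, which suggests that human behavior may be governed by an unknown reward function that differs considerably from the task reward. As a result, the existing two-stage self-play framework cannot effectively handle diverse human preferences.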

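       Yi Wu's group proposed Hidden-utility Self-Play (HSP), which extends the two-stage self-play framework to the case where humans have different preferences. HSP models human preferences by introducing an additional hidden reward function into self-play. In the first stage, reward randomization is used to build a policy pool with diverse preferences. A pool built on hidden utilities can capture a wide range of possible human preferences, including unconventional ones, ones that reflect different skill levels, and ones related to the task reward. In the second stage, an adaptive policy trained against this pool can adapt to unseen humans with different preferences. HSP was evaluated comprehensively in the human-AI cooperation testbed Overcooked along three dimensions: cooperation with learned human models, with scripted policies of different preferences, and with real humans. It achieved the best performance in all three. Experiments with real humans show that HSP generalizes to humans with different preferences and is rated as more helpful. The work was accepted at ICLR 2023, a top machine learning conference.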
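The first stage described above (a randomized hidden reward per pool member) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the event names, the uniform weight range, the helper names, and the choice to add the hidden term to the task reward are all assumptions made for the example.

```python
import random


def sample_hidden_utility(event_names, rng, scale=1.0):
    """Draw one random weight per event (hypothetical stand-in for a hidden reward)."""
    return {name: rng.uniform(-scale, scale) for name in event_names}


def biased_reward(task_reward, event_counts, hidden_weights):
    """Reward of a biased partner: task reward plus a weighted sum of event counts.

    Adding the two terms is an assumption of this sketch.
    """
    hidden = sum(w * event_counts.get(name, 0) for name, w in hidden_weights.items())
    return task_reward + hidden


def build_preference_pool(event_names, pool_size, seed=0):
    """Stage 1 (sketch): one randomized hidden utility per pool member.

    In a full system each entry would be used to train one self-play policy;
    here only the sampled reward weights are returned.
    """
    rng = random.Random(seed)
    return [sample_hidden_utility(event_names, rng) for _ in range(pool_size)]


if __name__ == "__main__":
    # Hypothetical Overcooked-style events, chosen only for illustration.
    events = ["onion_pickup", "dish_delivery"]
    pool = build_preference_pool(events, pool_size=3)
    step_events = {"onion_pickup": 2}
    for weights in pool:
        print(biased_reward(1.0, step_events, weights))
```

Each entry in the pool defines a differently biased partner. The second stage would train a single adaptive policy on the task reward against all of them, which is omitted here.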




       Research area: human-AI collaboration

       Project website: https://sites.google.com/view/hsp-iclr

       Paper: Chao Yu*, Jiaxuan Gao*, Weilin Liu, Botian Xu, Hao Tang, Jiaqi Yang, Yu Wang, Yi Wu, Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased. In International Conference on Learning Representations (ICLR), 2023.


------------------------------------------------------------------------------------------------------------------------------


Achievement 2: Symmetry-Aware Bimanual Cooperative Learning

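       Compared with a single arm, a dual-arm robot can unlock a richer set of skills. However, the decision space for controlling two arms simultaneously is significantly larger than in single-arm settings, so common reinforcement learning algorithms struggle to find precisely coordinated policies in this complex control space. Yi Wu's group proposed a symmetry-aware reinforcement learning framework that controls two arms simultaneously and coordinates them to efficiently hand over multiple objects.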

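       The core of the framework is to exploit the symmetry in bimanual tasks to shrink the decision space of reinforcement learning, so that complex bimanual coordination can be learned quickly. In bimanual tasks the roles of the two arms are usually interchangeable: the policy for arm A handing an object to arm B is symmetric to the policy for arm B handing an object to arm A, so the two symmetric tasks can be merged and learned as one. Exploiting this symmetry, we modified the Actor-Critic network and proposed a symmetry-aware architecture that effectively reduces the search space of reinforcement learning, and the two arms successfully discovered a policy for handing over objects in mid-air. To let the two arms cooperate on rearrangement tasks with more objects, we proposed object-centric relabeling, a data augmentation technique that generates more diverse partially successful data. Combining these techniques, the two arms efficiently cooperate to complete a rearrangement task with 8 objects.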
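The idea of merging two mirrored tasks into one can be illustrated with a small wrapper. This is only a sketch of role swapping under stated assumptions, not the symmetry-aware network from the paper: the observation layout (`arm_a`, `arm_b`, `objects`), the `needs_swap` rule, and all function names are hypothetical, and each arm's state is assumed to be expressed in its own frame so that exchanging the two entries is enough to mirror the scene.

```python
def swap_arms(obs):
    """Mirror an observation by exchanging the two arms' states."""
    return {"arm_a": obs["arm_b"], "arm_b": obs["arm_a"], "objects": obs["objects"]}


def make_symmetric_policy(base_policy, needs_swap):
    """Wrap a policy that only handles the canonical case (e.g. arm A gives).

    If `needs_swap(obs)` is true, the observation is mirrored into the
    canonical case, the base policy is queried, and the two arm actions are
    exchanged back. The base policy therefore covers both symmetric tasks.
    """
    def policy(obs):
        if needs_swap(obs):
            act_a, act_b = base_policy(swap_arms(obs))
            return act_b, act_a
        return base_policy(obs)
    return policy


if __name__ == "__main__":
    # Toy base policy: assumes arm A holds the object and hands it to arm B.
    def base_policy(obs):
        return ("give", "receive")

    policy = make_symmetric_policy(base_policy, lambda o: o["arm_b"]["holding"])
    obs = {"arm_a": {"holding": False}, "arm_b": {"holding": True}, "objects": []}
    print(policy(obs))
```

With this wrapper the learner only has to search over policies for one of the two mirrored situations, which is one simple way to see why symmetry reduces the search space.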

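       We deployed the trained policy on two Franka Panda arms fixed in separate workspaces. The learned policy can have each arm pick up objects and place them within its own workspace, and it can also coordinate the two arms to hand an object from one side to the other. In addition, one arm can be replaced by a human at test time, which applies the policy to human-robot collaboration. The work was accepted at ICRA 2023, a top robotics conference.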





       Research area: robot control

       Project website: https://sites.google.com/view/bimanual

       Paper: Yunfei Li*, Chaoyi Pan*, Huazhe Xu, Xiaolong Wang, Yi Wu, Efficient Bimanual Handover and Rearrangement via Symmetry-Aware Actor-Critic Learning. In 2023 IEEE International Conference on Robotics and Automation (ICRA) (pp. 3867-3874). IEEE.


------------------------------------------------------------------------------------------------------------------------------


Achievement 1: Diversity-Based Adaptive Intelligent Decision-Making

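       Traditional decision-making algorithms all rest on an optimality assumption: solve for the optimal policy under a given objective and execute the optimal action. This assumption does not hold in cooperative settings that involve interaction with humans, because human behavior is almost never optimal. An AI system must therefore recognize the diversity of human behavior and adapt its decisions to that behavior in order to help humans achieve their goals. Yi Wu's team was the first in the field to propose a diversity learning framework. It goes one step beyond the classical optimal-decision assumption and requires agents not only to solve the problem but also to explore and innovate on their own, solving it with as many different, reasonable, human-like behaviors as possible; in short, "not only win, but win with style". Building on this framework, the team also proposed several diversity-driven reinforcement learning algorithms and open-sourced the multi-agent decision-making codebase MAPPO. The team's framework is currently the first in the field that can automatically discover diverse policy behaviors across several complex task domains, including robot control, StarCraft, and multi-player football games. Self-play training on top of these diverse policies achieves state-of-the-art performance in several complex human-AI cooperation settings such as miniRTS and Overcooked, and in tests with real people it substantially outperforms the best existing reinforcement learning algorithms for generalization. This is the first demonstration of intelligent cooperation with real humans in complex games, and a solid step toward the ultimate goal of bringing AI into every household. The team's results were published at top machine learning conferences including ICLR 2022, ICML 2022, and NeurIPS 2022. The open-source algorithm library MAPPO, published at NeurIPS 2022, has received more than 250 citations to date and has attracted wide attention in the field.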



Team Members

Publications

28. Weihua Du*, Jinglun Zhao*, Chao Yu, Xingcheng Yao, Zimeng Song, Siyang Wu, Ruifeng Luo, Zhiyuan Liu, Xianzhong Zhao, Yi Wu, Automatic Truss Design with Reinforcement Learning, International Joint Conference on Artificial Intelligence (IJCAI), 2023


27. Zhiyu Mei, Wei Fu, Guangju Wang, Huanchen Zhang, Yi Wu, SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores, ICML Workshop, 2023


26. Zelai Xu, Yancheng Liang, Chao Yu, Yu Wang and Yi Wu, Fictitious Cross-Play: Learning Nash Equilibrium in Mixed Cooperative-Competitive Games, International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2023


25. Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu, SpeedyZero: Mastering Atari with Limited Data and Time, International Conference on Learning Representations (ICLR), 2023


24. Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu, Iteratively Learn Diverse Strategies with State Distance Information, Conference on Neural Information Processing Systems (NeurIPS), 2023


23. Yunfei Li*, Chaoyi Pan*, Huazhe Xu, Xiaolong Wang, Yi Wu, Efficient Bimanual Handover and Rearrangement via Symmetry-Aware Actor-Critic Learning, IEEE International Conference on Robotics and Automation (ICRA), 2023


22. Chao Yu*, Xinyi Yang*, Jiaxuan Gao*, Jiayu Chen, Yunfei Li, Jijia Liu, Yunfei Xiang, Ruixin Huang, Huazhong Yang, Yi Wu and Yu Wang, Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration, International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2023


21. Jing Wang*, Meichen Song*, Feng Gao*, Boyi Liu, Zhaoran Wang and Yi Wu, Differentiable Arbitrating in Zero-sum Markov Games, International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2023


20. Chao Yu*, Jiaxuan Gao*, Weilin Liu, Botian Xu, Hao Tang, Jiaqi Yang, Yu Wang, Yi Wu, Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased, International Conference on Learning Representations (ICLR), 2023


19. Shusheng Xu, Yancheng Liang, Yunfei Li, Simon Shaolei Du, Yi Wu, Beyond Information Gain: An Empirical Benchmark for Low-Switching-Cost Reinforcement Learning, Transactions on Machine Learning Research (TMLR), 2023


18. Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei, Sequence Level Contrastive Learning for Text Summarization, AAAI Conference on Artificial Intelligence (AAAI), 2022


17. Yunfei Li, Tao Kong, Lei Li, Yi Wu, Learning Design and Construction with Varying-Sized Materials via Prioritized Memory Resets, IEEE International Conference on Robotics and Automation (ICRA), 2022


16. Zihan Zhou*, Wei Fu*, Bingliang Zhang, Yi Wu, Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization, International Conference on Learning Representations (ICLR), 2022


15. Yunfei Li*, Tian Gao*, Jiaqi Yang, Huazhe Xu, Yi Wu, Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning, International Conference on Machine Learning (ICML), 2022


14. Chao Yu*, Xinyi Yang*, Jiaxuan Gao*, Huazhong Yang, Yu Wang, Yi Wu, Learning Efficient Multi-Agent Cooperative Visual Exploration, European Conference on Computer Vision (ECCV), 2022


13. Zhecheng Yuan*, Zhengrong Xue*, Bo Yuan, Xueqian Wang, Yi Wu, Yang Gao, Huazhe Xu, Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning, Conference on Neural Information Processing Systems (NeurIPS), 2022


12. Shusheng Xu, Huaijie Wang, Yi Wu, Grounded Reinforcement Learning: Learning to Win the Game under Human Commands, Conference on Neural Information Processing Systems (NeurIPS), 2022


11. Zhenggang Tang*, Chao Yu*, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Shaolei Du, Yu Wang, Yi Wu, Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization, International Conference on Learning Representations (ICLR), 2022


10. Yunfei Li, Yilin Wu, Huazhe Xu, Xiaolong Wang, Yi Wu, Solving Compositional Reinforcement Learning Problems via Task Reduction, International Conference on Learning Representations (ICLR), 2022


9. Weizhe Chen*, Zihan Zhou*, Yi Wu, Fei Fang, Temporal Induced Self-Play for Stochastic Bayesian Games, International Joint Conference on Artificial Intelligence (IJCAI), 2022


8. Yunfei Li, Tao Kong, Lei Li, Yifeng Li, Yi Wu, Learning to Design and Construct Bridge without Blueprint, International Conference on Intelligent Robots and Systems (IROS), 2022


7. Shusheng Xu*, Yichen Liu*, Xiaoyu Yi, Siyuan Zhou, Huizi Li, Yi Wu, Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension, Conference on Neural Information Processing Systems (NeurIPS), 2022


6. Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian, NovelD: A Simple yet Effective Exploration Criterion, Conference on Neural Information Processing Systems (NeurIPS), 2022


5. Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei, Ming Zhou, Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022


4. Ruihan Yang, Huazhe Xu, Yi Wu, Xiaolong Wang, Multi-Task Reinforcement Learning with Soft Modularization, Conference on Neural Information Processing Systems (NeurIPS), 2022


3. Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu, Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning, International Conference on Machine Learning (ICML), 2022


2. Chao Yu*, Akash Velu*, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu, The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games, Conference on Neural Information Processing Systems (NeurIPS), 2022


1. Jiayu Chen, Yuanxin Zhang, Yuanfan Xu, Huimin Ma, Huazhong Yang, Jiaming Song, Yu Wang, Yi Wu, Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems, Conference on Neural Information Processing Systems (NeurIPS), 2021