Academic Research
Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Title
Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Authors
Ren, Tao; Zhang, Zishi; Li, Zehao; Jiang, Jingyang; Qin, Shentao; Li, Guanghao; Li, Yan; Zheng, Yi; Li, Xinping; Zhan,
Author affiliations
Guanghua School of Management, Peking University, China; Tsinghua University, China; The Hong Kong University of Science and Technology, Hong Kong; School of Economics, Peking University, China; Hunan University of Technology and Business, China
Keywords:
Reinforcement learning; Image enhancement
Date:
February 1, 2025
Publisher:
arXiv
Abstract
The probabilistic diffusion model (DM), generating content by inferencing through a recursive chain structure, has emerged as a powerful framework for visual generation. After pre-training on enormous unlabeled data, the model needs to be properly aligned to meet requirements for downstream applications. How to efficiently align the foundation DM is a crucial task. Contemporary methods are either based on Reinforcement Learning (RL) or truncated Backpropagation (BP). However, RL and truncated BP suffer from low sample efficiency and biased gradient estimation, respectively, resulting in limited improvement or, even worse, complete training failure. To overcome these challenges, we propose the Recursive Likelihood Ratio (RLR) optimizer, a zeroth-order informed fine-tuning paradigm for DM. The zeroth-order gradient estimator enables computation graph rearrangement within the recursive diffusive chain, making the RLR's gradient estimator unbiased, with lower variance than other methods. We provide theoretical guarantees for the performance of the RLR. Extensive experiments are conducted on image and video generation tasks to validate the superiority of the RLR. Furthermore, we propose a novel prompt technique that is natural for the RLR to achieve a synergistic effect. See our implementation at https://github.com/RTkenny/RLR-Opimtizer. Copyright © 2025, The Authors. All rights reserved.
URL
http://hdl.handle.net/20.500.11897/740914
DOI
10.48550/arXiv.2502.00639
Indexed in
EI