
Arash Ahmadian on Rethinking RLHF
Arash Ahmadian is a researcher at Cohere and Cohere For AI, focused on preference training of large language models. He is also a researcher at the Vector Institute for Artificial Intelligence.
Featured Reference
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker
Additional References
- Self-Rewarding Language Models, Yuan et al. 2024
- Reinforcement Learning: An Introduction, Sutton and Barto 1998
- Learning from Delayed Rewards, Watkins 1989
- Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992