RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiongxiao Wang | Junlin Wu | Muhao Chen | Yevgeniy Vorobeychik | Chaowei Xiao
Paper Details:
Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL
Citations: No Citations Yet
URLs:
https://github.com/tatsu-lab/alpaca_eval
https://gi
https://github.com/PKU-Alignment/safe-r
https://github.com/P
https://github.com/PKU-A