Xgen RL: When Reinforcement Learning Goes Wrong
We explore our RL experiments on Xgen 9B using an LLM-as-judge reinforcement learning environment, and how the model unexpectedly became more censored despite de-alignment attempts.
We explore finetuning AllenAI's Tulu-3 405B on a single B200 node to be uncensored and de-aligned, resulting in Dolphin X1 405B, and share tips for saving VRAM when training such a large model.