Dolphin X1 Trinity Nano: De-alignment with an Online RL environment
We explore the online RL environment behind Dolphin X1 Trinity Nano, including rollout gates, reward multipliers, judge checks, style buckets, and lessons for de-alignment.
We explore our RL experiments on Xgen 9B using an LLM-as-judge reinforcement learning environment, and how the model unexpectedly became more censored despite de-alignment attempts.
We explore finetuning AllenAI's Tulu-3 405B on a single B200 node to be uncensored and de-aligned, resulting in Dolphin X1 405B, and share tips for saving VRAM when training such a large model.