
RLHF

Difficulty: intermediate

Reinforcement Learning from Human Feedback (RLHF) is a training technique in which a model learns from human preference judgments to become more helpful and safe. In the standard pipeline, humans compare pairs of model outputs, a reward model is trained to predict which output they preferred, and the language model is then fine-tuned with reinforcement learning (typically PPO) to maximize that learned reward. RLHF is key to modern LLM alignment.

Category: safety
Tags: training, alignment
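
To make the reward-modeling step concrete, below is a minimal PyTorch sketch of the pairwise (Bradley-Terry) preference loss commonly used to train reward models. The `RewardModel` class, its small MLP architecture, and the random response embeddings are hypothetical stand-ins so the example is self-contained: in a real RLHF pipeline the scores come from a scalar head on a pretrained transformer, and the inputs are tokenized prompt-response pairs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: maps a response embedding to a scalar score.

    A real reward model is a pretrained transformer with a scalar head;
    a small MLP stands in here so the example runs on its own.
    """
    def __init__(self, embed_dim: int = 16):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One scalar reward per response in the batch.
        return self.score(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected).

    Minimizing this pushes the human-preferred response to score
    higher than the rejected one.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# One toy training step on random "embeddings" of chosen/rejected responses.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

chosen = torch.randn(8, 16)    # embeddings of responses humans preferred
rejected = torch.randn(8, 16)  # embeddings of responses humans rejected

optimizer.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
print(f"preference loss: {loss.item():.4f}")
```

A later stage (not shown) would optimize the language model itself against this learned reward with an RL algorithm such as PPO, usually with a KL penalty toward the original model so the policy does not drift too far from its pretrained behavior.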

Extended tutorial content coming soon.

Check back for more examples, tips, and in-depth explanations.