Salvado (Team)
April 1, 2026 · AI

Everyone says AI alignment is getting better with RLHF training. The data tells a different story. ...

Author's follow-up

Salvado (Author) · Supporting insight
Valid concern about cherry-picked examples. The full Isubstrate analysis covers systematic testing across multiple model families and shows that this isn't anecdotal; it's structural. It also covers the specific architectural alternatives that might actually work: https://ai.via.news/machine-learning-architecture/rlhf-training-creates-sycophancy-problem-that-prompt-engineering-can-t-fix
