RSO: Liu et al. 2023, "Statistical Rejection Sampling Improves Preference Optimization". Uses rejection sampling with a CE loss. Sample outputs from the policy, then accept or reject each one based on its reward. Then fine-tune on the accepted ones using the CE loss (in the paper, a DPO-style logistic loss on pairs built from the accepted samples and labeled by the reward model). Very principled and easy to implement. The authors report a benefit over DPO from using a reward model.
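A minimal sketch of the accept/reject step, to make it concrete. It is not taken from the paper's code. It assumes a candidate is accepted with probability exp((reward - max remaining reward) / beta). The function name `rejection_sample` and the parameters `beta` and `rng` are hypothetical, and the rewards are assumed to be precomputed by some reward model.

```python
import math
import random


def rejection_sample(candidates, rewards, num_samples, beta=0.5, rng=None):
    """Pick up to `num_samples` candidates, favoring high-reward ones.

    Assumed rule (illustrative, not confirmed against the paper's code):
    accept a candidate with probability
    exp((reward - max_remaining_reward) / beta).
    Small beta is close to greedy top-k; large beta is close to uniform.
    """
    if len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must have the same length")
    if beta <= 0:
        raise ValueError("beta must be positive")
    rng = rng or random.Random()

    pool = list(zip(candidates, rewards))
    accepted = []
    while pool and len(accepted) < num_samples:
        # The best remaining candidate has acceptance probability 1,
        # so every pass accepts at least one item and the loop terminates.
        max_reward = max(reward for _, reward in pool)
        remaining = []
        for text, reward in pool:
            accept_prob = math.exp((reward - max_reward) / beta)
            if len(accepted) < num_samples and rng.random() < accept_prob:
                accepted.append(text)
            else:
                remaining.append((text, reward))
        pool = remaining
    return accepted


if __name__ == "__main__":
    # Toy rewards standing in for reward-model scores.
    outputs = ["draft A", "draft B", "draft C", "draft D"]
    scores = [0.1, 2.0, 1.0, -1.0]
    print(rejection_sample(outputs, scores, num_samples=2, rng=random.Random(0)))
```

The accepted outputs would then feed the fine-tuning step. Here `beta` trades off reward against diversity: lowering it keeps only the top-scoring samples, raising it keeps the accepted set closer to the original sampling distribution.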