"policy"

Self-critical sequence training of multimodal systems