Creator: Petar Veličković (original)
machine learning, reinforcement learning, decision making, cetz, tikz
A grid-world policy that greedily selects the locally best action under estimated state values, steering toward the goal while avoiding a penalty state.
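
The greedy selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the diagram. The 3x4 layout, the value numbers, the goal and penalty positions, and the names `V`, `step`, `greedy_action`, and `rollout` are all assumptions made for the example. Transitions are assumed deterministic, and a move off the grid leaves the agent in place.

```python
# Hypothetical estimated state values for a 3x4 grid (illustrative numbers only).
V = [
    [0.5, 0.6, 0.8, 1.0],    # assumed goal at (0, 3)
    [0.4, 0.0, 0.5, -1.0],   # assumed penalty state at (1, 3)
    [0.3, 0.2, 0.3, 0.1],
]
GOAL, PENALTY = (0, 3), (1, 3)

# Each action is a (row, column) offset.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic transition: move one cell, or stay put at a wall."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(V) and 0 <= nc < len(V[0]):
        return (nr, nc)
    return state


def value_after(state, action):
    """Estimated value of the state reached by taking `action` in `state`."""
    nr, nc = step(state, action)
    return V[nr][nc]


def greedy_action(state):
    """Pick the action whose successor has the highest estimated value.

    Ties go to the action listed first in ACTIONS.
    """
    return max(ACTIONS, key=lambda a: value_after(state, a))


def rollout(start, max_steps=20):
    """Follow the greedy policy until a terminal state or the step limit."""
    path = [start]
    state = start
    while state not in (GOAL, PENALTY) and len(path) <= max_steps:
        state = step(state, greedy_action(state))
        path.append(state)
    return path


if __name__ == "__main__":
    print(rollout((2, 0)))
    print(rollout((2, 3)))
```

With these assumed values, a rollout from the cell directly below the penalty state moves left first and then climbs toward the goal, because the penalty cell has the lowest estimated value among its neighbors. A greedy policy is only as good as the value estimates it reads, so different numbers in `V` would give different paths.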