Dean P Foster

Contextual Bandits for Evaluating and Improving Inventory Control Policies
Learning an Inventory Control Policy with General Inventory Arrival Dynamics
Scaling Laws for Imitation Learning in NetHack
Linear Reinforcement Learning with Ball Structure Action Space
A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation

All thoughts expressed on this site are solely my own and do not express the views or opinions of my employer.