Dhruv Madeka
Dean P Foster
Latest
Contextual Bandits for Evaluating and Improving Inventory Control Policies
Learning an Inventory Control Policy with General Inventory Arrival Dynamics
Scaling Laws for Imitation Learning in NetHack
Linear Reinforcement Learning with Ball Structure Action Space
A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation