"Sequential Decision Making in Linear Bandit Setting" by Marta Soare
Thursday, October 6, 2016
at 2:00 PM
room F021b,
Building F,
Laboratoire Hubert Curien,
18 Rue Professeur Benoît Lauras,
42000 Saint-Etienne
Seminar by Marta Soare, post-doc at Aalto University, Finland
When making a decision in an unknown environment, a learning agent
decides at every step whether to gather more information on the
environment (explore), or to choose what seems to be the best action
given the current information (exploit). The multi-armed bandit
setting is a simple framework that captures this
exploration-exploitation trade-off and offers efficient solutions for
sequential decision making. In this talk, I will review a particular
multi-armed bandit setting, where there is a global linear structure
in the environment. I will then show how this structure can be
exploited to identify the best action in a minimal number of steps,
and to decide when to transfer samples in order to improve
performance in other, similar environments.
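The exploration-exploitation trade-off under a linear reward structure can be sketched with a simple LinUCB-style loop. This is an illustration only, not the algorithm presented in the talk: the arm features, the unknown parameter `theta`, the noise level, and the exploration width `alpha` are all invented for the demo. The key point it shows is that because every arm's mean reward is a linear function of a shared parameter, each pull informs the estimates for all arms at once.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000                                  # number of pulls
arms = np.array([[1.0, 0.0],              # fixed action feature vectors
                 [0.0, 1.0],
                 [0.7, 0.7]])
theta = np.array([1.0, 0.2])              # unknown parameter (assumed for the demo)
best = int(np.argmax(arms @ theta))       # arm with the highest mean reward

d = arms.shape[1]
A = np.eye(d)                             # regularized design matrix
b = np.zeros(d)                           # accumulated reward-weighted features
alpha = 1.0                               # exploration width (assumed)

for t in range(T):
    theta_hat = np.linalg.solve(A, b)     # ridge estimate of theta
    A_inv = np.linalg.inv(A)
    # Optimistic score per arm: estimated reward plus a confidence width
    # that shrinks in directions the data has already covered.
    ucb = arms @ theta_hat + alpha * np.sqrt(
        np.einsum("kd,de,ke->k", arms, A_inv, arms))
    k = int(np.argmax(ucb))               # play the optimistic arm
    x = arms[k]
    r = x @ theta + rng.normal(scale=0.1)  # noisy linear reward
    A += np.outer(x, x)                    # update sufficient statistics
    b += r * x

theta_hat = np.linalg.solve(A, b)
print("estimated best arm:", int(np.argmax(arms @ theta_hat)))
```

Note the contrast with the classical multi-armed bandit, where each arm would have to be estimated from its own pulls alone; here a pull of the arm `[0.7, 0.7]` also refines the estimates for the two axis-aligned arms.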