Risk-Aware Multi-Armed Bandit Problem with Application to Portfolio Selection. (arXiv:1709.04415v1 [q-fin.PM])

Wed, 13 Sep 2017 19:42:39 GMT

Sequential portfolio selection has attracted increasing interest in the
machine learning and quantitative finance communities in recent years. As a
mathematical framework for reinforcement learning policies, the stochastic
multi-armed bandit problem addresses the primary difficulty in sequential
decision making under uncertainty, namely the exploration versus exploitation
dilemma, and therefore provides a natural connection to portfolio selection. In
this paper, we incorporate risk-awareness into the classic multi-armed bandit
setting and introduce an algorithm to construct portfolios. By filtering
assets based on the topological structure of the financial market and combining the
optimal multi-armed bandit policy with the minimization of a coherent risk
measure, we achieve a balance between risk and return.
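
As a rough illustration of the two ingredients named above, the sketch below
pairs a standard UCB1 index policy with an empirical conditional value-at-risk
(CVaR) estimate. Both choices are assumptions made for illustration: the
abstract does not state which bandit policy or which coherent risk measure is
used, and the function names `ucb1_select` and `empirical_cvar` are
hypothetical. The topology-based asset filtering step is not shown.

```python
import math


def ucb1_select(counts, means, t):
    """Pick an arm (asset) by the UCB1 rule.

    counts[i] is how often arm i was played, means[i] its average reward,
    and t the current round (t >= 1). Unplayed arms are tried first.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Average reward plus an exploration bonus that shrinks with more plays.
    scores = [m + math.sqrt(2.0 * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def empirical_cvar(losses, alpha=0.95):
    """Estimate CVaR as the mean of the worst (1 - alpha) share of losses."""
    ordered = sorted(losses, reverse=True)  # largest losses first
    k = max(1, math.ceil(len(ordered) * (1.0 - alpha)))
    return sum(ordered[:k]) / k
```

In a setup like this, `ucb1_select` would address the exploration versus
exploitation trade-off on the return side, while `empirical_cvar` would score
the tail risk of a candidate allocation. How the two are combined into a
single portfolio is left open here, since the abstract does not specify it.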