<![CDATA[MoneyScience: Quantitative Finance at arXiv's blog: Towards Inverse Reinforcement Learning for Limit Order Book Dynamics. (arXiv:1906.04813v1 [cs.LG])]]>
http://www.moneyscience.com/pg/blog/arXiv/read/857673/towards-inverse-reinforcement-learning-for-limit-order-book-dynamics-arxiv190604813v1-cslg?view=rss
http://www.moneyscience.com/pg/blog/arXiv/read/857673/towards-inverse-reinforcement-learning-for-limit-order-book-dynamics-arxiv190604813v1-cslg
Wed, 12 Jun 2019 23:02:24 -0500
<![CDATA[Towards Inverse Reinforcement Learning for Limit Order Book Dynamics. (arXiv:1906.04813v1 [cs.LG])]]>
Multi-agent learning is a promising method to simulate aggregate competitive
behaviour in finance. Learning expert agents' reward functions through their
external demonstrations is hence particularly relevant for subsequent design of
realistic agent-based simulations. Inverse Reinforcement Learning (IRL) aims at
acquiring such reward functions through inference, allowing the resulting
policy to generalize to states not observed in the past. This paper investigates
whether IRL can infer such rewards from agents within real financial stochastic
environments: limit order books (LOB). We introduce a simple one-level LOB,
where the interactions of a number of stochastic agents and an expert trading
agent are modelled as a Markov decision process. We consider two cases for the
expert's reward: either a simple linear function of state features, or a
complex, more realistic non-linear function. Given the expert agent's
demonstrations, we attempt to discover their strategy by modelling their latent
reward function using linear and Gaussian process (GP) regressors from previous
literature, and our own approach through Bayesian neural networks (BNN). While
the three methods can learn the linear case, only the GP-based and our proposed
BNN methods are able to discover the non-linear reward case. Our BNN IRL
algorithm outperforms the other two approaches as the number of samples
increases. These results illustrate that complex behaviours, induced by
non-linear reward functions amid agent-based stochastic scenarios, can be
deduced through inference, encouraging the use of inverse reinforcement
learning for opponent-modelling in multi-agent systems.
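
To make the two reward cases concrete, the following is a minimal sketch of a
linear reward over state features next to a non-linear one. The feature choice
(best bid volume, best ask volume, inventory), the weights, and the tanh form
are illustrative assumptions and are not taken from the paper, which does not
specify them in this abstract.

```python
import math


def linear_reward(features, weights):
    """Linear reward: a weighted sum of the state features."""
    return sum(w * f for w, f in zip(weights, features))


def nonlinear_reward(features, weights):
    """Hypothetical non-linear reward: tanh applied to the linear score.

    The paper's actual non-linear reward is not given in the abstract;
    tanh is used here only as a stand-in.
    """
    return math.tanh(linear_reward(features, weights))


# Hypothetical one-level LOB state:
# (best bid volume, best ask volume, expert inventory)
state = (10.0, 4.0, -2.0)
# Hypothetical reward weights that an IRL method would try to recover
weights = (0.1, -0.1, 0.05)

r_lin = linear_reward(state, weights)
r_non = nonlinear_reward(state, weights)
```

In an IRL setting the weights (or the whole non-linear function) are unknown
and must be inferred from the expert's demonstrations; a linear regressor can
only represent the first form, which is consistent with the abstract's finding
that only the GP-based and BNN methods recover the non-linear case.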
]]>857673