TR2020-101
Control of traffic light timing using decentralized deep reinforcement learning
- "Control of traffic light timing using decentralized deep reinforcement learning", World Congress of the International Federation of Automatic Control (IFAC), Rolf Findeisen and Sandra Hirche and Klaus Janschek and Martin Mönnigmann, Eds., DOI: 10.1016/j.ifacol.2020.12.1980, July 2020, pp. 14936-14941.BibTeX TR2020-101 PDF
@inproceedings{Maske2020jul,
  author = {Maske, Harshal and Chu, Tianshu and Kalabic, Uros},
  title = {Control of traffic light timing using decentralized deep reinforcement learning},
  booktitle = {World Congress of the International Federation of Automatic Control (IFAC)},
  year = 2020,
  editor = {Rolf Findeisen and Sandra Hirche and Klaus Janschek and Martin Mönnigmann},
  pages = {14936--14941},
  month = jul,
  publisher = {Elsevier},
  doi = {10.1016/j.ifacol.2020.12.1980},
  url = {https://www.merl.com/publications/TR2020-101}
}
Abstract:
In this work, we introduce a scalable, decentralized deep reinforcement learning (DRL) scheme for controlling traffic signalization. The work builds on previous results using multi-agent DRL, introducing a new state representation and new reward definitions. The state representation is a coarse image of traffic, and the reward definitions are evaluated on the simulated Monaco SUMO Traffic (MoST) scenario. Based on extensive numerical experimentation, we find that the most appropriate choice of reward function is one related to minimizing the average amount of time vehicles spend in the network, with various modifications that improve the learning process. The resulting algorithm performs better than the previous algorithm on which it is based and markedly better than a non-learning-based, greedy policy.
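
As a rough illustration of the kind of reward the abstract describes (penalizing the average time vehicles have spent in the network), the sketch below computes such a per-step reward. The abstract does not give the exact formulation or its modifications, and the names current_vehicle_ids, entry_times, and sim_time are assumptions for this example, not quantities from the paper.

# Illustrative sketch only, not the authors' implementation: a per-step
# reward equal to the negative mean time-in-network of vehicles currently
# present. All input names are hypothetical.
from typing import Dict, Iterable


def time_in_network_reward(
    current_vehicle_ids: Iterable[str],
    entry_times: Dict[str, float],
    sim_time: float,
) -> float:
    """Return the negative mean time spent in the network by current vehicles.

    entry_times maps each vehicle id to the simulation time at which it
    entered the network; sim_time is the current simulation time.
    """
    times = [
        sim_time - entry_times[v]
        for v in current_vehicle_ids
        if v in entry_times
    ]
    if not times:
        return 0.0  # no vehicles present: neutral reward
    return -sum(times) / len(times)


# Example with made-up values: two vehicles entered at t=10 and t=25;
# at t=40 their times in network are 30 and 15, so the reward is -22.5.
# r = time_in_network_reward(["veh0", "veh1"], {"veh0": 10.0, "veh1": 25.0}, 40.0)

In practice, such a reward would be evaluated locally at each intersection by each agent in a decentralized scheme; the paper's modifications to the basic definition are not reproduced here.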