Adaptive optics control with reinforcement learning: first steps
Document type: Conference report
Publisher: Barcelona Supercomputing Center
Rights access: Open Access
When planar wavefronts from distant stars traverse the atmosphere, they become distorted by the atmosphere's inhomogeneous temperature distribution. Adaptive Optics (AO) is the field concerned with correcting those distortions, allowing high-quality observations of distant targets. An AO system is composed of three main components: a deformable mirror (DM) that corrects the deformation of the wavefront, a wavefront sensor (WFS) that characterises the current turbulence in the wavefront, and a real-time controller (RTC) that issues the commands that deform the DM to correct the wavefront. The system usually operates in closed loop with stringent real-time requirements (on the order of 10^3–10^4 actions per second). At each iteration, the WFS observes the wavefront after it has been corrected by the DM, and the RTC issues commands to correct for the evolution of the turbulence and for previously uncorrected errors (Figure 1, left).

One of the primary sources of error for an AO control algorithm is the temporal error. Because of the delay between characterising the turbulence with the WFS and applying the desired commands to the DM, any successful control approach must take into account past commands and the probable evolution of the atmosphere over that gap of time. The most common approaches in AO are variants of the Linear Quadratic Gaussian (LQG) regulator with Kalman filters, one of whose initial iterations was presented in . These methods usually rely on a linear model of the system's evolution, with parameters fitted from observations or from theoretical assumptions, which limits the system's capability to correct the turbulence.

In this paper, we present a novel solution based on Reinforcement Learning (RL), driven by a reward signal to be optimised, which needs no previously built model (unlike LQG) and is non-linear. RL has already been applied in the domain of AO; however, it has been limited to WFS-less systems (e.g. ) or, more recently, to controlling a very limited number of actuators . This work's main practical objective is application to the 8.2 m Subaru telescope (located in Hawaii), which includes thousands of actuators.

B. AO Control: Integrator with gain
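A minimal sketch of the classic integrator-with-gain controller referenced by this section, assuming the standard closed-loop update c_{t+1} = c_t - g * R * m_t, where m_t are the WFS slope measurements, R is a calibrated command (reconstruction) matrix, and g is the loop gain. All dimensions and matrices below are illustrative stand-ins, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_slopes, n_act = 8, 4  # illustrative sizes; real systems have thousands of actuators

# Command matrix mapping WFS slopes to DM commands.
# In practice R is obtained by calibration (e.g. pseudo-inverse of the
# interaction matrix); here it is random purely for illustration.
R = rng.standard_normal((n_act, n_slopes)) * 0.1

def integrator_step(c, slopes, g=0.5):
    """One closed-loop iteration: c_{t+1} = c_t - g * R @ m_t.

    The integrator accumulates corrections over time, so residual errors
    left by previous commands are gradually driven to zero.
    """
    return c - g * (R @ slopes)

# Simulated closed loop: each iteration the WFS measures the residual
# wavefront (random here) and the RTC updates the DM commands.
c = np.zeros(n_act)
for _ in range(10):
    slopes = rng.standard_normal(n_slopes)  # stand-in for WFS residual slopes
    c = integrator_step(c, slopes)
```

The gain g trades off rejection bandwidth against noise amplification and stability, which is precisely the temporal-error limitation that motivates predictive controllers such as LQG or the RL approach proposed here.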
Citation: Pou Mulet, B.; Quiñones, E.; Martín Muñoz, M. "Adaptive optics control with reinforcement learning: first steps." In: . Barcelona Supercomputing Center, 2021, p. 87-89.