3. Wave-Front Sensors

3.1. Requirements to wave-front sensing

The problem of measuring wave-front distortions is common in optics (e.g. in the fabrication and control of telescope mirrors) and is typically solved with the help of interferometers. Why not use standard laser interferometers in Adaptive Optics Wave-Front Sensors (WFSs)?

First, an AO system must measure wave-fronts using the light of stars that has passed through the turbulent atmosphere, hence it must use incoherent (and sometimes non-point) sources. Even laser guide stars are not coherent enough to work with typical interferometers. A WFS must work on white-light, incoherent sources.

Second, interference fringes are chromatic. We cannot afford to filter the stellar light, because we want to use faint stars. A WFS must use the photons very efficiently.

Third, interferometers have an intrinsic phase ambiguity of $2\pi$, whereas atmospheric phase distortions typically exceed $2\pi$. The WFS must be linear over the full range of atmospheric distortions. There are algorithms to "un-wrap" the phase and remove this ambiguity, but they are slow, while atmospheric turbulence evolves fast, on a millisecond time scale: the WFS must be fast.

These requirements are fulfilled in several existing WFS concepts. Each WFS consists of the following main components:

Generic WFS

Aliasing illustration

Needless to say, any real WFS has a finite spatial resolution, which must match the size of the correcting elements (e.g. the inter-actuator spacing of the DM). Wave-front distortions of smaller size are not sensed. However, they do influence the WFS signal, causing the so-called aliasing error (like an aliasing error in temporal signals with finite sampling, see the Figure). The turbulence spectrum decreases at high spatial frequencies, hence the aliasing error is often of little importance compared to other AO errors, e.g. the fitting error.

3.2. Shack-Hartmann WFS

Shack-Hartmann WFS

The well-known Hartmann test, devised initially for telescope optics control, was adapted for AO and is the most frequently used type of WFS. An image of the exit pupil is projected onto a lenslet array, a collection of small identical lenses. Each lens takes a small part of the aperture, called a sub-pupil, and forms an image of the source. All images are formed on the same detector, typically a CCD.

When the incoming wave-front is plane, all images are located on a regular grid defined by the lenslet array geometry. As soon as the wave-front is distorted, the images are displaced from their nominal positions. Displacements of the image centroids in two orthogonal directions $x,y$ are proportional to the average wave-front slopes in $x,y$ over the sub-apertures. Thus, a Shack-Hartmann (S-H) WFS measures wave-front slopes. The wave-front itself is reconstructed from the arrays of measured slopes, up to a constant which is of no importance for imaging. The resolution of a S-H WFS is equal to the sub-aperture size.

Question: What is the maximum angular size of the source when images from adjacent sub-apertures begin to overlap? Take lenslet size of 0.5 mm and its focal distance 50 mm. Will this lenslet array be adequate for an AO system with sub-aperture size $d$=1 m?

Question: Estimate the r.m.s. slopes of wave-fronts on the sub-apertures as a function of sub-aperture size $d$ and $r_0$ (use the coefficients of atmospheric tip and tilt from Sect. 1.10). Compute for $d$=1 m and 1 arcsecond seeing.

A good feature of the S-H WFS is that it is completely achromatic: the slopes do not depend on the wavelength. It can also work on non-point (extended) sources. If $\phi(\vec{r})$ is the wave-front phase, the $x$-slope measured by a S-H WFS is computed as

\begin{displaymath}
x = \frac{\lambda}{2 \pi S} \int_{\rm sub-aperture} \frac{\partial \phi(\vec{r})}{\partial r_x} \; {\rm d}\vec{r},
\end{displaymath} (1)

where $S$ is the area of the sub-aperture. The slopes $x,y$ are estimated from the displacements of the image centroid (or center of gravity) as

\begin{displaymath}
x = \frac{\sum_{i,j} x_{i,j}I_{i,j}}{\sum_{i,j} I_{i,j}} \; , \;\;\;
y = \frac{\sum_{i,j} y_{i,j}I_{i,j}}{\sum_{i,j} I_{i,j}},
\end{displaymath} (2)

where $I_{i,j}$ are the intensities of light on the detector pixels. It is assumed that x,y coordinates are expressed in radians (this can be done knowing the pixel scale of the detector).
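
Equation (2) is simply an intensity-weighted mean of the pixel coordinates. A minimal sketch in Python (the array shapes and the test image are, of course, only illustrative):

```python
import numpy as np

def centroid(I, x, y):
    """Center of gravity of image I over pixel coordinate grids x, y (Eq. 2)."""
    total = I.sum()
    return (x * I).sum() / total, (y * I).sum() / total

# Pixel coordinate grids (in radians once the pixel scale is applied).
x, y = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="xy")

# A point-like spot centered on pixel x=3, y=1 (row index = y, column = x).
I = np.zeros((5, 5))
I[1, 3] = 10.0
cx, cy = centroid(I, x, y)   # -> (3.0, 1.0)
```

In a real WFS this is evaluated separately over the pixel window of each sub-aperture.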

Photon noise of centroid

Now we estimate the error of slope measurement which arises from photon noise. Let $\beta$ radians be the radius of the image formed by each sub-aperture. For extended sources, $\beta$ is equal to the source size (more precisely, to the dispersion of the intensity distribution around its center). For point sources, $\beta=\lambda/d$ if the sub-apertures are smaller than $r_0$ (diffraction-limited images), or $\beta = \lambda/r_0$ for large sub-apertures (image size determined by the atmospheric blur). The image intensity distribution can be regarded as a probability density of the arriving photons. Hence, each arriving photon determines the image position with an error of $\beta$. When $n$ photons are detected during the exposure time, the photon error of the centroid position (i.e. slope) becomes $\beta/\sqrt{n}$, as after repeating the same measurement $n$ times.

In the photometric band R (wavelength around 600 nm), where modern detectors are most sensitive, a star of magnitude 0 gives a flux of 8000 photons per second per square centimeter per nanometer of bandpass (the effective bandpass may reach 300 nm for a good CCD). For a star of magnitude $m$ the flux diminishes by a factor of $10^{-0.4m}$. In calculating the flux available for the WFS detector, the optical transmission must be taken into account.

Question: Compute the number of photons detected in 1 ms exposure time per sub-aperture of 1 m available from a star of 15-th magnitude. Assume total transmission of 0.3 and quantum efficiency of 0.6.
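
Photon budgets of this kind are easy to check numerically. The sketch below works the question's numbers (it assumes a square sub-aperture of area $d^2$; the band, transmission, and quantum efficiency are taken from the text above):

```python
def photons_detected(mag, d_cm=100.0, t=1e-3, band_nm=300.0,
                     flux0=8000.0, transmission=0.3, qe=0.6):
    """Photons detected per sub-aperture per exposure.

    flux0 is photons/s/cm^2/nm for a 0-magnitude star in R; the
    sub-aperture is assumed square, of area d_cm**2 (an assumption).
    """
    return flux0 * 10**(-0.4 * mag) * d_cm**2 * band_nm * t * transmission * qe

n15 = photons_detected(15.0)   # only a few photons: photon noise matters
```

A 15-th magnitude star thus yields of order 4 detected photons per millisecond on a 1 m sub-aperture.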

It is generally agreed to express all wave-front errors in radians. We multiply the slope error by $\frac{2\pi}{\lambda} d$ and square the result to obtain the variance of the phase difference between the edges of a sub-aperture, in square radians:

\begin{displaymath}
\langle \epsilon_{\rm phot}^2 \rangle =
\frac{4 \pi^2}{n} \left( \frac{ \beta d}{\lambda} \right) ^2.
\end{displaymath} (3)

Be careful when manipulating this formula: here $\lambda$ is the wavelength of imaging with AO, while image size $\beta$ must be computed for the wavelength of wave-front sensing, which may be different.
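
Equation (3) is straightforward to evaluate; a small sketch (the numerical example uses $d = 2 r_0$ with sensing and imaging at the same wavelength, purely for illustration):

```python
import math

def eps_phot_sq(n, beta, d, wavelength):
    """Photon-noise phase variance per sub-aperture, Eq. (3), in rad^2.

    beta is the image radius at the *sensing* wavelength; wavelength
    is the *imaging* wavelength, which may differ.
    """
    return (4.0 * math.pi**2 / n) * (beta * d / wavelength)**2

# Example: point source, d = 2*r0, same wavelength -> beta*d/lam = 2,
# so a 1 rad^2 error needs n = 4*pi^2*2^2 photons (about 158).
n_1rad = 4.0 * math.pi**2 * 2**2
assert abs(eps_phot_sq(n_1rad, 1.0, 2.0, 1.0) - 1.0) < 1e-12
```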

Question: How many photons per exposure are needed to achieve a 1 radian photon error in a S-H WFS with $d = 3 r_0$? Assume that imaging and sensing is done at the same wavelength.

The error of the reconstructed wave-fronts is proportional to $\langle \epsilon_{\rm phot}^2 \rangle$ with a coefficient called noise propagation. It is known that for a S-H WFS the noise propagation is of the order of 1 and increases only slowly with the number of elements (the slopes are integrated in the reconstructor, so noise is not amplified).

The photon flux is proportional to the square of the sub-aperture size $d$. It means that, for a given $\beta$, the photon error of a S-H WFS is independent of the size of its sub-apertures. This conclusion applies only to an ideal detector; in real systems with CCDs (e.g. NAOS at the VLT) larger sub-apertures are selected for fainter guide stars.

Quad-cell

How many detector pixels must be allocated to each sub-aperture? In order to compute the centroids accurately, the individual images must be well sampled, with more than 4x4 pixels per sub-aperture. However, each pixel of a CCD detector contributes readout noise, which dominates over the photon noise for the faintest guide stars. Thus, in some designs (e.g. Altair for Gemini-North) there are only 2x2 pixels per sub-aperture. In this case each element works as a quad cell, and the $x,y$ slopes are deduced from the intensity ratios:

\begin{displaymath}
x \approx \frac{\beta}{2} \; \frac{I_1 +I_2 - I_3 - I_4}{I_1 +I_2 + I_3 + I_4} \; , \;\;\;
y \approx \frac{\beta}{2} \; \frac{I_2 +I_3 - I_1 - I_4}{I_1 +I_2 + I_3 + I_4}.
\end{displaymath} (4)

The response of a quad-cell slope detector is linear only for slopes less than $\pm \beta/2$, and the response coefficient is proportional to $\beta$ (hence it may be variable, depending on the seeing or the object size). This is the price to pay for the increased sensitivity, which is of major importance to astronomers.
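
Equation (4) maps the four quadrant intensities directly to slopes. A minimal sketch (the quadrant numbering simply follows Eq. (4); the physical layout of quadrants 1-4 on the detector is an assumption):

```python
def quad_cell_slopes(I1, I2, I3, I4, beta):
    """Quad-cell slope estimates, Eq. (4); valid only for |slope| < beta/2."""
    total = I1 + I2 + I3 + I4
    x = 0.5 * beta * (I1 + I2 - I3 - I4) / total
    y = 0.5 * beta * (I2 + I3 - I1 - I4) / total
    return x, y

# Equal illumination of the four quadrants -> zero slopes.
assert quad_cell_slopes(1.0, 1.0, 1.0, 1.0, beta=1.0) == (0.0, 0.0)
```

Note that the estimate saturates at $\pm\beta/2$ when all the light falls on one side, which is the linearity limit quoted above.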

Question: What shape of the guide star image is needed to achieve the exactly linear response curve of a quad cell?

S-H WFSs are very common because they rely on proven technology and solid experience, and are compact and stable. These WFSs require a calibration of the nominal spot positions, which is achieved by imaging an artificial point source.

3.3. Curvature sensors

Curvature wave-front sensing has been developed by F. Roddier since 1988. His idea was to couple a curvature sensor (CS) and a bimorph DM directly, without the need for intermediate calculations (although nobody actually does this).

Curvature sensing

Let $I_1(\vec{r})$ be the light intensity distribution in the intra-focal stellar image, defocused by some distance $l$, and $I_2(\vec{r})$ the corresponding intensity distribution in the extra-focal image. Here $\vec{r}$ is the coordinate in the image plane and $F$ is the focal distance of the telescope. These two images are like pupil images reduced by a factor of $\frac{l}{F-l}$. In the geometrical-optics approximation, a local wave-front curvature makes one image brighter and the other dimmer; the normalized intensity difference is written as

\begin{displaymath}
\frac{I_1(\vec{r}) -I_2(\vec{r})}{I_1(\vec{r}) +I_2(\vec{r})} =
\frac{\lambda F (F-l)}{2 \pi l}
\left[ \frac{\partial \phi}{\partial n} \delta_c -
\bigtriangledown^2 \phi \left( \frac{F \vec{r}}{l}\right) \right].
\end{displaymath} (5)

The operator $\bigtriangledown^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is called the Laplacian and computes the curvature of the phase distribution $\phi(\vec{x})$. The first term in the above equation is the phase gradient at the edge of the aperture (written symbolically as a partial derivative along the direction perpendicular to the edge, multiplied by an "edge function" $\delta_c$). The CS is achromatic (recall that $\phi(\vec{x})$ is inversely proportional to $\lambda$). Although the formula looks complicated, it is intuitively clear. The important point is that the sensitivity of the CS is inversely proportional to the de-focusing $l$.

Question: Draw the pairs of intra- and extra-focal images for Zernike aberrations from 2 to 6. Hint: defocused images from astigmatism to number 12.

For a source of finite angular size $\beta$ the intra- and extra-focal images are blurred by the amount $\beta(F-l)$. The blur must be less than the projected size of the sub-aperture $d$:

\begin{displaymath}
\beta (F-l) < \frac{l}{F} d .
\end{displaymath} (6)

The de-focusing is always much less than the focal length $F$, hence the condition of minimum de-focusing is:

\begin{displaymath}
l > \beta \frac{F^2}{d}.
\end{displaymath} (7)

Larger de-focusing is needed to measure the wave-front with higher resolution, and the sensitivity of the CS is reduced accordingly. This means that a CS may have problems sensing high-order aberrations.

For point sources and large sub-apertures (the case of practical interest) the blur $\beta$ is set by the atmospheric aberrations, $\beta = \lambda/r_0$, as in the S-H WFS. If the AO system works in closed loop and the residual aberrations (at the sensing wavelength) become small, the blur is reduced to $\beta=\lambda/d$, permitting a smaller de-focus and a gain in sensitivity. This feature is actually used, to a limited extent, in real AO systems: the de-focus is reduced once the loop is closed.
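
The minimum de-focus of Eq. (7) (valid for $l \ll F$) can be evaluated for the open- and closed-loop blurs; the numbers below ($\lambda$, $r_0$, $F$, $d$) are purely illustrative:

```python
def min_defocus(beta, F, d):
    """Minimum de-focus distance from Eq. (7): l > beta * F**2 / d."""
    return beta * F**2 / d

lam, r0 = 0.6e-6, 0.1     # sensing wavelength and Fried parameter [m]
d, F = 1.0, 100.0         # sub-aperture size and focal distance [m]

l_open = min_defocus(lam / r0, F, d)     # open loop:  beta = lam/r0
l_closed = min_defocus(lam / d, F, d)    # closed loop: beta = lam/d
# The minimum de-focus (hence 1/sensitivity) shrinks by d/r0 when the
# loop closes, which is exactly the gain described in the text.
```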

High-frequency wave-front distortions (smaller than the sub-aperture size) have Fourier amplitudes proportional to $f^{-11/6}$, but their curvature amplitudes are proportional to $f^{1/6}$ and may cause a large aliasing error. To prevent this, the signal must be smoothed before being sub-divided into sub-apertures (sampled). Smoothing is achieved by decreasing the de-focus $l$, which also increases the sensitivity. In short, the choice of $l$ in a CS is critical and must be adjusted to varying seeing conditions. The signal of a CS is only a more or less crude approximation of the true wave-front curvature...

We give without derivation the formula for a phase variance due to photon noise in a CS when the defocusing is adjusted to its optimum value:

\begin{displaymath}
\langle \epsilon_{\rm phot}^2 \rangle =
\frac{\pi^2}{n} \left( \frac{\beta d}{\lambda} \right) ^2 .
\end{displaymath} (8)

As for a S-H WFS, this is the variance for one sub-aperture. We see that the expressions for the S-H and CS are very similar. To obtain the overall wave-front error, $\langle \epsilon_{\rm phot}^2 \rangle$ must be multiplied by the noise propagation coefficient, which for a CS is proportional to the number of sub-apertures $N$ (it is proportional to $\log N$ for S-H). In the reconstruction of the wave-front, the low frequencies are amplified, so the noise propagates mostly into low-order modes. This indicates a potential problem in using a CS in high-order AO systems. Detailed computer simulations of the Gemini AO system (~200 actuators) have shown that the performance of S-H and CS sensors is almost identical (Applied Optics, V. 36, P. 2856, 1997).

Gradient sensing

The scale of the intra- and extra-focal images depends on the de-focus $l$, which must be changed during operation. This is not convenient; in fact the curvature signal is detected in a pupil image with fixed scale, while the amount of de-focus is adjusted by a special optical element (see below). The outer sub-apertures project onto the pupil boundary; their signal provides information on the radial phase gradients, including global tip and tilt (see the Figure).

The CSs that actually work in astronomical AO systems (e.g. in PUEO and Hokupa'a) use Avalanche Photo-Diodes (APDs) as light detectors. These are single-pixel devices, like photo-multipliers. Individual photons are detected and converted to electrical pulses with no readout noise and a small dark count; the maximum quantum efficiency is around 60%. Individual segments of the pupil are isolated by a lenslet array (which typically matches the radial geometry of the bimorph DM), then the light from each segment is focused and transmitted to the corresponding APD via an optical fiber. The number of APDs is equal to the number of segments. Outer segments sample the edge of the aperture, and their signals are proportional to the wave-front gradients along the normal.

Optical scheme of CS

APDs are bulky and expensive, hence this design is suitable only for low-order systems. In order to use only one detector per sub-aperture, the intra- and extra-focal images are switched in time and directed to the same APD; the signal is then de-modulated in the wave-front computer. The focus modulation is done by placing an oscillating membrane mirror in the focal plane (typical frequency 2 kHz). The de-focus $l$ is inversely proportional to the amplitude of the membrane oscillation, which is adjusted to varying seeing conditions and can be reduced once the AO loop is closed, increasing the sensitivity of the CS. Some useful turbulence compensation was achieved even with signals as low as 1 photon per sub-aperture per loop cycle!

An alternative solution would be to use CCDs as light detectors in the CS. This has been discussed for a long time, but not yet implemented in real systems. The drawback of CCDs is their readout noise, which becomes the dominant noise source at low light levels. Special CCDs were developed at ESO that permit multiple modulation cycles per single readout.

Question: Suppose that a CCD with 5 electrons readout noise is used in the WFS. How large a number of detected photons $n$ must be to make the readout noise smaller than the photon noise?
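
One simple way to frame this comparison (an assumption about how the noises add: photon noise $\sqrt{n}$ versus readout noise summed in quadrature over the pixels of one sub-aperture):

```python
def min_photons(ron, npix):
    """Photons per sub-aperture needed so that photon noise sqrt(n)
    exceeds the total readout noise ron*sqrt(npix) over npix pixels:
    n > ron**2 * npix."""
    return ron**2 * npix

# Quad-cell sub-aperture (2x2 pixels), 5 electrons readout noise:
n_min = min_photons(5.0, 4)
```

With these assumptions, of order a hundred photons per sub-aperture are needed before the photon noise dominates, which is why APDs (no readout noise) win for faint guide stars.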

3.4. Other wave-front sensors

Shearing interferometer

The problems of interferometric wave-front measurement can be overcome when the interfering beams represent wave-fronts with a small lateral shift $\vec{\rho}$ (this is called a shearing interferometer). If the shear is less than $r_0$, the phase differences are less than one wavelength, and there is no $2\pi$ ambiguity. The light intensity in the interferogram is

\begin{displaymath}
I(\vec{r}) = \left\vert e^{i \phi(\vec{r})} + e^{i \phi(\vec{r}+\vec{\rho})} \right\vert ^2
= 2 + 2 \cos \left[ \phi(\vec{r}+\vec{\rho}) - \phi(\vec{r}) \right]
\approx 2 + 2 \cos \left[ \rho \frac{\partial \phi(\vec{r})}{\partial \rho} \right].
\end{displaymath} (9)

For small shifts the phase difference is proportional to the first derivative (slope), hence the signal of a shearing interferometer is similar to that of a S-H WFS. Two shears in orthogonal directions are needed to measure the $x,y$ slopes. The first successful AO system (RTAC) used a WFS based on a shearing interferometer, but this approach is now completely abandoned in favor of the S-H WFS.

Question: Estimate the maximum shear $\vec{\rho}$ to preserve a linear response of the shearing interferometer under given seeing conditions (given $r_0$).

Other types of interferometers have been suggested for wave-front sensing. Some of them can provide signals directly proportional to the phase (thus not needing a reconstructor), although in a limited dynamic range. Such solutions can be interesting for correcting high-order residual aberrations (e.g. in AO systems with a very high degree of compensation, as needed for detecting extra-solar planets).

Pyramid WFS

The pyramid WFS (P-WFS) is being developed by Italian astronomers. A transparent pyramid is placed in the focal plane and dissects the stellar image into four parts. Each beam is deflected, and the four beams form four images of the telescope pupil on the same CCD detector. Thus, each sub-aperture is sensed by 4 CCD pixels. This optical setup is similar to the Foucault knife-edge test.

Let us suppose that the light source is extended and use geometric optics. A wave-front slope at some sub-aperture changes the source position on the pyramid, hence changes the light fluxes detected by the 4 pixels, which would otherwise be equal. By computing the normalized intensity differences we get two signals proportional to the wave-front slopes in two directions. The sensitivity of a P-WFS depends on the source size $\beta$. A P-WFS can be viewed as an array of quad cells and is similar to a S-H WFS.

What happens when a point source (star) is used and diffraction effects are taken into account? The intensity distributions in the four pupil images become complicated, non-linear functions of the wave-front shape, and the P-WFS no longer measures slopes. In the case of weak aberrations (amplitude much less than $\lambda$) the wave-front shape can still be reconstructed, although in a more complex way. In order to retrieve the linearity, the star is rapidly moved over the pyramid edge (e.g. in a circular pattern), creating a ring-shaped source. This is not modulation (as in the CS), but simply a smearing of the point source, because the signal is integrated over one or more wobble cycles.

Question: Draw the four pupil images in a P-WFS for the case of defocusing (Zernike mode number 6).

What are the advantages of a P-WFS? First, there is no lenslet array; the sub-apertures are defined by the detector pixels. It means that for faint stars the number of sub-apertures can be reduced simply by binning the CCD. Second, the amplitude of the star wobble can be adjusted as a trade-off between sensitivity (smaller wobble) and linearity (larger wobble). At small amplitudes the sensitivity of a P-WFS can be higher than that of a S-H WFS (see Astron. Astrophys. V. 369, P. L9, 2001). Finally, it is possible (at least in principle) to place several pyramids in the focal plane, in order to combine the light from several faint guide stars on a single detector. Despite the general interest in the P-WFS, there are as yet no working AO systems with this kind of WFS.

The phase can also be retrieved from the analysis of two simultaneous images of a star, one in focus and the other defocused (or, generally, with some known aberration). This approach is called phase diversity. The algorithm is non-linear (hence slow?), and the advantages of its application to AO are not yet clear.

The "ideal" WFS has not yet been invented. There is no general theorem stating the absolute sensitivity limit of any WFS due to photon noise. Instead, we have several empirical solutions; we optimize their parameters and choose the best among the available options.

3.5. Wave-front reconstruction

In this section the problem of computing the wave-front shape from the data provided by a WFS is addressed in a general way.

The measurements (WFS data) can be represented by a vector $S$ (its length is twice the number of sub-apertures $N$ for a S-H WFS, because slopes in two directions are measured, and equal to $N$ for a CS). The unknowns (the wave-front) form a vector $\phi$, which can be specified as phase values on a grid or, more frequently, as Zernike coefficients. It is supposed that the relation between the measurements and the unknowns is linear, at least to first approximation. The most general form of a linear relation is a matrix multiplication,

\begin{displaymath}
S = A \phi ,
\end{displaymath} (10)

where the matrix $A$ is called interaction matrix. In real AO systems the interaction matrix is determined experimentally: all possible signals (e.g. Zernike modes) are applied to a DM, and the WFS reaction to these signals is recorded.

A reconstructor matrix $B$ performs the inverse operation, retrieving the wave-front vector from the measurements:

\begin{displaymath}
\phi = B S.
\end{displaymath} (11)

Question: For a given number of sub-apertures N, estimate the number of arithmetic operations needed to reconstruct phase. How does it depend on the imaging wavelength (for given Strehl ratio)?

The number of measurements is typically larger than the number of unknowns, so a least-squares solution is appropriate. In the least-squares approach we look for the phase vector $\phi$ that best matches the data. The resulting reconstructor is

\begin{displaymath}
B = ( A^T A)^{-1} A^T .
\end{displaymath} (12)

Here the superscript $T$ means matrix transpose, and the superscript $-1$ means matrix inverse. Matrix operations are encountered very frequently in AO.

In almost all cases the matrix inversion presents problems because the matrix $A^T A$ is singular. It means that some parameters (or combinations of parameters) are not constrained by the data. For example, we cannot determine the first Zernike mode (piston) from slope measurements. In practice the matrix inversion is done by removing the undetermined (or poorly determined) parameters with the help of the Singular Value Decomposition algorithm. In S-H systems with square geometry, poorly determined modes typically include "waffle" (a quasi-periodic deformation at the actuator-grid frequency).
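
A toy numerical check of Eqs. (10)-(12) can be made with numpy, whose `pinv` computes the pseudo-inverse via SVD and drops near-zero singular values, which is exactly the remedy for poorly determined modes described above (the matrix sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy interaction matrix: 8 measurements (slopes), 4 unknown modes.
A = rng.standard_normal((8, 4))

# Least-squares reconstructor of Eq. (12), computed through the SVD.
B = np.linalg.pinv(A)

phi_true = rng.standard_normal(4)
S = A @ phi_true                 # noiseless measurements, Eq. (10)
phi_rec = B @ S                  # reconstruction, Eq. (11)
assert np.allclose(phi_rec, phi_true)
```

For a full-rank $A$, `pinv` agrees with the explicit $(A^T A)^{-1} A^T$; when singular combinations such as piston or waffle are present, it simply leaves them at zero.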

How many Zernike modes can be reconstructed with a S-H WFS having $N$ sub-apertures? At first sight, up to 2$N$. In fact, only about $N$, because the $x,y$ slopes are not completely independent: they are redundant. For a CS, the maximum number of modes is also $N$.

The least-squares reconstructor is not the best one. It is known from statistical textbooks that a better reconstruction can be achieved by using a priori information on the signal properties. In the case of AO, this information is the statistics of the wave-front perturbations (e.g. the covariance of Zernike modes) and the statistics of the WFS noise. Looking for the solution that gives the minimum expected residual phase variance (hence maximum Strehl ratio), we obtain a reconstructor matrix which is similar to a Wiener filter.

In case of one-dimensional signals, the Wiener filter in frequency space is written as

\begin{displaymath}
W(f) = \frac{ \tilde{A}^* \vert\tilde{S}\vert^2 }
{ \vert\tilde{A}\vert^2 \vert\tilde{S}\vert^2 + \vert\tilde{N}\vert^2},
\end{displaymath} (13)

where $\vert\tilde{S}\vert^2$ and $\vert\tilde{N}\vert^2$ stand for the power spectra of the signal and the noise, respectively. If the noise can be neglected, the Wiener filter tends to the inverse filter $\tilde{A}^{-1}$, but it cuts off the frequencies where the noise dominates over the signal. In the AO context this means that both the compensation order and the servo bandwidth are reduced when there are not enough photons.
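
Both limiting behaviors of Eq. (13) are easy to verify per frequency bin (the two-element transfer function below is purely illustrative):

```python
import numpy as np

def wiener_filter(A_ft, S_pow, N_pow):
    """Wiener filter of Eq. (13), evaluated element-wise in frequency space."""
    return np.conj(A_ft) * S_pow / (np.abs(A_ft)**2 * S_pow + N_pow)

A_ft = np.array([2.0 + 0j, 0.5 + 0j])   # toy transfer function at two frequencies

# Zero noise: the Wiener filter reduces to the inverse filter 1/A.
assert np.allclose(wiener_filter(A_ft, 1.0, 0.0), 1.0 / A_ft)

# Strong noise: those frequencies are cut off (the filter tends to zero).
assert np.abs(wiener_filter(A_ft, 1.0, 1e6)).max() < 1e-5
```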

Question: The spatial power spectrum of slope errors is white (independent of frequency f) and the power spectrum of atmospheric tilts is proportional to $f^{-8/3}$. How does the maximum frequency of the compensated aberrations depend on the noise level $\vert\tilde{N}\vert^2$?

In AO systems the expressions for minimal variance reconstructor involve the interaction matrix and the covariance matrices of noise and atmospheric perturbations. Similar results are obtained using other statistical approaches (maximum likelihood or maximum a posteriori probability).

For any reconstructor B, the noise of the reconstructed phase $\langle \epsilon^2 \rangle$ is

\begin{displaymath}
\langle \epsilon^2 \rangle = \frac{1}{N} \;\; {\rm trace} (B C_S B^T),
\end{displaymath} (14)

where $C_S$ is the covariance matrix of the measurements (a diagonal matrix with elements $\langle \epsilon_{\rm phot}^2 \rangle$ in the case of uncorrelated noise), and trace means the sum of the diagonal matrix elements. This expression allows us to compute the noise propagation coefficient, relating the error of the WFS measurements to the error of the reconstructed phases.

Summary. The wave-front sensor is the most critical part of astronomical AO systems because guide stars are often faint, limiting the achievable degree of turbulence compensation. The two most common WFS concepts, Shack-Hartmann and curvature, were studied. For both of them we can compute the photon error and estimate the error of the reconstructed wave-fronts as a function of guide star magnitude and system parameters. The basic ideas of wave-front reconstruction were introduced without going into much detail.
