Cracking the Galaxy Spectra code

Title: Autoencoding Galaxy Spectra I: Architecture

Authors: P. Melchior, Y. Liang, C. Hahn, et al.

First Author’s Institution: Department of Astrophysical Sciences, Princeton University, NJ, USA and Center for Statistics and Machine Learning, Princeton University, NJ, USA

Status: Submitted to AJ

Imagine for a moment that you have never seen a car before but really want to make a toy car as a gift for your car-loving friend. Understandably, in the face of this conundrum, you may feel the outlook is bleak. But now suppose another friend comes along with a great idea: they send you to the nearest busy street corner, explaining that everything driving past can be considered a car. Then they say goodbye and leave. Armed with this new information, you are revitalized. Watching cars pass by for a few minutes, you quickly form an image in your mind of what a car might be. After about an hour, you head back inside, ready to take on your challenge: building your toy car based only on what you’ve seen.

This example is obviously contrived, but it provides a useful starting point for understanding how an autoencoder, or more generally any unsupervised machine learning algorithm, might work (see these astrobites for examples of machine learning used in astronomy). If you think about how to approach the challenge above, the basic principle might be obvious: with only observations of passing cars, you can latch onto the patterns you’ve noticed and use them to “reconstruct” the definition of a car in your mind. You’ll probably notice that most cars had four wheels, many had headlights, and they all had roughly the same overall shape. With that in mind, you might be able to build a decent-looking toy car that your friend will be proud of. This is the basic idea behind an unsupervised learning task: the algorithm is presented with data and tries to identify relevant features of that data to help achieve some goal it has been given. In the specific case of an autoencoder, that goal is to reconstruct the original data from a compressed version of it, much like you were trying to do by building a toy car from memory. In particular, you (or the computer) boil down the observations of cars (the data) into common properties (the so-called “latent features”) and rebuild the car from those properties (the data reconstruction). This process is depicted schematically in Figure 1.

Figure 1: Schematic of the autoencoder architecture – it takes the input data (x; input layer on the left), processes it into a lower-dimensional representation (h; the ‘code’ layer in the center), and then attempts to reconstruct the original data from the compressed representation (x’; the output layer on the right). (Source: Wikipedia)
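The compress-then-reconstruct loop in Figure 1 can be made concrete with a minimal sketch (not from the paper): a purely linear autoencoder built from a singular value decomposition. All data, names, and dimensions here are invented for illustration.

```python
import numpy as np

# Toy illustration: 500 data points that secretly live on a 2-dimensional
# "latent" plane inside a 10-dimensional space, like cars that vary in
# only a few underlying properties.
rng = np.random.default_rng(0)
latent_true = rng.normal(size=(500, 2))   # hidden properties
mixing = rng.normal(size=(2, 10))         # how properties map to observables
data = latent_true @ mixing               # the observed 10-d samples

# A linear autoencoder: the encoder projects onto the top two principal
# directions, the decoder projects back. (A neural autoencoder generalizes
# this with nonlinear layers.)
_, _, vt = np.linalg.svd(data, full_matrices=False)

def encode(x):
    return x @ vt[:2].T                   # 10-d data -> 2-d "code"

def decode(h):
    return h @ vt[:2]                     # 2-d code -> 10-d reconstruction

error = np.abs(decode(encode(data)) - data).max()
print(error < 1e-8)                       # near-perfect reconstruction
```

Because this toy data genuinely has only two latent degrees of freedom, two numbers per sample suffice to rebuild it almost perfectly – the same bet the autoencoder makes about cars (and, later, spectra).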

Applying this to astronomy

To understand how this machine learning method is applied in today’s paper, let’s take the car reconstruction example one step further. Instead of letting you observe a random sample of cars driving by, suppose this time your friend was a little less subtle and just pulled up pictures of a few different cars on their phone. Based on those, you might do a decent job of producing something that looks like a car, but your car probably won’t run very well (e.g., maybe the wheels are fixed rigidly to the chassis and your car won’t roll). Alternatively, your friend may have wanted to challenge you further and simply described how a car works. In that case, you might build a toy car that functions efficiently, but it probably won’t look very accurate. These scenarios loosely resemble some of the challenges found in current approaches to modeling galaxy spectra, the subject of today’s research.

Approaches to modeling galaxy spectra can be divided into empirical, data-driven models and theoretical models. The first is analogous to the pictures of cars your friend showed you – astronomers use “typical” spectra from observations of local galaxies to build model spectra that can be fit to observations of systems at higher redshifts. While useful, these are based on observations of local galaxies, and thus may be restricted to a limited rest-frame wavelength range once corrections for cosmological redshift are incorporated. Theoretical models, on the other hand, reflect your friend’s second proposition: they produce typical spectra based on a physical understanding of emission and absorption in stars, nebulae, and the interstellar medium. These are interpretable and physically motivated, so they can be applied at higher redshifts, for example, but they usually rely on approximations and are therefore unable to accurately capture the full complexity of real spectra.

Despite these challenges, today’s authors note that the historical usefulness of template spectra for describing new observations suggests these data may not be as intrinsically complex as they appear – perhaps the differences between spectra can be reduced to a few relevant parameters. This goes back to the discussion of autoencoders and inspires the approach of today’s paper: perhaps one can find a low-dimensional embedding (read: simpler representation) of spectra that makes reconstruction an easy task.

How to build a galaxy spectrum

Most conventional galaxy spectrum analysis pipelines work by converting the observed (redshifted) spectrum to the emitted spectrum in the galaxy’s rest frame and then fitting a model to the observation. This means the spectra are usually restricted to a wavelength range common to all the different spectra in the survey sample. In today’s paper, the authors instead chose to keep the spectra as observed, meaning they do no redshift processing before analysis; this lets them present the algorithm with the full observed wavelength range and thus preserve more data. Today’s paper introduces this algorithm, called SPENDER, which is represented schematically in Figure 2.

Figure 2: Schematic representation of the different layers in the SPENDER algorithm. For the encoding process, the input spectrum (y) is passed through several convolution layers (gray squares), the relevant parts of the spectrum are selected by the attention layer (red square), and an MLP layer shifts the data to the rest frame and compresses it, leaving a simple latent representation of the prominent features (the latent vector s; green layer). For the decoding process, the latent vector s is passed through three ‘activation layers’, which decode the low-dimensional representation to produce a reconstruction of the rest-frame spectrum (labelled x’’ in a green layer). This spectrum is then redshifted to produce the reconstructed observed spectrum, y’. (Adapted from Figure 1 in the paper)
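The wavelength bookkeeping underlying all of this is just the standard redshift relation (a quick sketch, not code from the paper): observed and rest-frame wavelengths satisfy lambda_obs = lambda_rest × (1 + z).

```python
# De-redshifting is the conventional preprocessing step that SPENDER
# instead learns to handle internally.
def to_rest_frame(lam_obs, z):
    return lam_obs / (1.0 + z)

def to_observed_frame(lam_rest, z):
    return lam_rest * (1.0 + z)

# Example: an [OII] line emitted at 3727 Angstroms by a galaxy at z = 0.1
# lands near 4100 Angstroms in the observed spectrum.
lam_obs = to_observed_frame(3727.0, 0.1)
print(round(lam_obs, 1))   # 4099.7
```

Because every galaxy in a survey sits at a different z, de-redshifting first means each rest-frame spectrum covers a different wavelength range – which is why conventional pipelines end up trimming to a common band.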

The algorithm takes an input spectrum and first passes it through three convolution layers to reduce its dimensionality. Then, since the spectra have not been de-redshifted, the processed data are passed through an attention layer. This is very similar to what you did while watching cars go by on the street – although many cars were passing, all in different locations and moving at different speeds, you focused your attention on specific cars and on certain features of those cars to train your neural network (read: brain) to figure out what a car is. This layer does the same thing by determining which parts of the spectrum to focus on; for example, where the relevant emission and absorption lines may lie. Then, to complete the encoding, the data are passed through a Multilayer Perceptron (MLP), which transforms the data into the galaxy’s rest frame and compresses it into s dimensions (the chosen dimensionality of the latent space).
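A toy numerical sketch of this encoding pass (convolution, attention weighting, then an MLP) is below. The weights are random stand-ins for what training would learn, and all sizes (a 1000-pixel spectrum, one convolution instead of three, a 6-dimensional latent space) are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def encode(spectrum, s=6):
    # 1) convolution layer: smooth and downsample the spectrum by 4x
    kernel = np.ones(4) / 4.0
    conv = np.convolve(spectrum, kernel, mode="valid")[::4]
    # 2) attention layer: scores (learned in the real model) weight the
    #    informative regions, e.g. around emission/absorption lines
    scores = rng.normal(size=conv.size)        # random stand-in for learning
    attended = softmax(scores) * conv
    # 3) MLP layer: compress down to the s latent dimensions
    w1 = rng.normal(size=(conv.size, 32))
    w2 = rng.normal(size=(32, s))
    hidden = np.tanh(attended @ w1)
    return hidden @ w2

spectrum = rng.normal(size=1000)               # fake input spectrum y
latent = encode(spectrum)
print(latent.shape)                            # (6,)
```

However wide the input spectrum, the encoder funnels it down to just s numbers – the “common properties” from the car analogy.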

Now the model has to decode the embedded data and try to reproduce the original spectrum. It does this by passing the data through three “activation layers” that process the data through some predefined functions. These layers convert the simple, low-dimensional (latent) representation of the data into a spectrum in the galaxy’s rest frame. Finally, this representation is redshifted back into the observed frame, and the reconstruction is complete.
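A toy sketch of this decoding pass, again with random stand-in weights and invented sizes (none taken from the paper): three dense layers expand the latent vector into a rest-frame spectrum, which is then resampled onto the observed, redshifted wavelength grid.

```python
import numpy as np

rng = np.random.default_rng(2)

def decode(latent, z, n_pix=1000):
    # three "activation layers" expanding the latent vector (random weights
    # here; in the trained model these are learned)
    w1 = rng.normal(size=(latent.size, 32))
    w2 = rng.normal(size=(32, 64))
    w3 = rng.normal(size=(64, n_pix))
    h = np.tanh(latent @ w1)
    h = np.tanh(h @ w2)
    rest_spectrum = h @ w3                       # rest-frame model, x''
    # redshift back: evaluate the rest-frame model at the rest-frame
    # wavelengths corresponding to each observed pixel
    lam_rest = np.linspace(3000.0, 7000.0, n_pix)
    lam_obs = np.linspace(3000.0, 7000.0, n_pix)
    return np.interp(lam_obs / (1.0 + z), lam_rest, rest_spectrum)

y_prime = decode(rng.normal(size=6), z=0.1)      # reconstructed spectrum y'
print(y_prime.shape)                             # (1000,)
```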

In practice, the contributions of different pieces of data to the final result depend on initially unknown weights. To learn these weights, the model is trained – the reconstructed and original data are compared, and the weights are adjusted (roughly by trial and error) until an optimal set of weights is reached.
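This weight-adjustment loop can be sketched with a deliberately tiny example – a one-layer linear autoencoder trained with plain gradient descent on made-up data, not the paper’s actual architecture or optimizer:

```python
import numpy as np

rng = np.random.default_rng(3)

# Fake 5-d data that is secretly one-dimensional (rank 1)
data = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 5))
w = rng.normal(size=(5, 1)) * 0.1     # encoder weights; decoder uses w.T

def loss(w):
    recon = (data @ w) @ w.T          # encode then decode
    return ((recon - data) ** 2).mean()

initial = loss(w)
for _ in range(1000):
    recon = (data @ w) @ w.T
    err = recon - data                # how wrong the reconstruction is
    # gradient of the mean squared error with respect to w
    grad = 2 * (data.T @ err @ w + err.T @ data @ w) / data.size
    w -= 0.02 * grad                  # nudge the weights downhill
final = loss(w)
print(final < initial)                # loss shrinks as the model learns
```

The real training does the same thing at scale: compare reconstruction to input, compute how the error changes with each weight, and step the weights in the direction that reduces it.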

So how does it perform?

The results of running the SPENDER model on an example spectrum from a galaxy in the Sloan Digital Sky Survey are shown in Figure 3.

Figure 3: Spectrum sample fit with the SPENDER model. Input data is presented in black and the reconstructed spectrum is given by the red curve. The super-resolution reconstruction is shown in blue, which indicates that the model can identify nearby unresolved features in the original data. The location of known spectral lines is marked with gray vertical bars and indicated along the top of the figure. (Adapted from Figure 3 in the paper)

Visually, the model appears to do well in reproducing this spectrum. Figure 3 also illustrates one of the advantages of a model like this: not only can the model reproduce the various intricacies of the spectrum, but by varying the resolution of the reconstructed spectrum, it can discern overlapping (or blended) features in the input data (see the two close [OII] lines in Figure 3, for example). Finally, the nature of the SPENDER architecture means that data can be passed to the model exactly as received from the instrument – because the model is trained without altering or cleaning up the input, it learns to incorporate this processing into its analysis. Such a structure can also be used to generate mock spectra, and it provides a novel approach for modeling galaxy spectra in detail that mitigates some of the problems found in current empirical galaxy modeling approaches.

Astrobite Edited by Katie Proctor

Featured image credit: Adapted from the paper and bizior (via FreeImages)

On the plain of Hajla

I’m a first-year astrophysics PhD student at UCLA. I am currently using semi-analytical models to study the formation of the first stars and galaxies in the universe. I completed my undergraduate degree at Columbia University, and am originally from the San Francisco Bay Area. Outside of astronomy, you’ll find me playing tennis, surfing (read: wiping out), and playing board games/TTRPGs!
