simeonedef / galaxy-image-gen

This project proposes a solution to score how 'realistic' sparse galaxy images are as well as a 2-stage GAN to generate those sparse galaxy images.

License: GNU General Public License v3.0

Python 0.67% Jupyter Notebook 99.33%

gan deep-learning image-processing image-generation convolutional-neural-networks eth-zurich analysis regressors

galaxy-image-gen's Introduction

2-Stage GAN for Sparse Galaxy Image Generation

Siméone de Fremond, Carlos Gomes, Andra-Maria Ilies, Andrej Ivanov
Please refer to the project report for details.

The rapid growth of astronomical data in recent years has prompted interest in building generative models of cosmological images. This paper focuses on 2 related tasks in this area: 1) generating realistic images and 2) scoring how realistic an image is. We begin by formalizing the structure of the images in terms of galaxies and their properties, and show that simplistic baselines are not sufficient for either task. We then present a novel generative approach consisting of a two-stage sampling generative adversarial network (GAN), which captures the position of cosmological objects and their representation separately and produces diverse, realistic-looking images. Furthermore, we develop an accurate scoring method based on power spectrum features. Using this scoring method, we show empirically that our two-stage generative model significantly outperforms both the baselines and a standard GAN approach. Finally, we provide experiments aimed at understanding the most important factors in our model.

Dataset

At a high level, the images in our dataset are high resolution (1000x1000 pixels), but most of each image is a constant dark background populated with a number of very distant, and hence tiny, celestial objects.

Below is a sample of what they look like:

We consider any non-black cluster of pixels a galaxy. These objects in our images represent both stars and galaxies, but because of the low resolution we model them together, and for the rest of the paper we will refer to them as simply galaxies. Visually, a galaxy is defined by 4 main properties: location in the image, size, intensity and shape.

Scoring Model

To score the images despite their sparsity, we use an XGBoost model trained on a histogram of the image's pixel intensities together with features from the 1D power spectrum of the image (more details in the report).
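The two feature groups can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: it assumes grayscale images with intensities in 0..255, and it assumes that "1D power spectrum" means the radially averaged 2D power spectrum. The function names, bin counts and the log scaling are all hypothetical choices.

```python
import numpy as np

def intensity_histogram(image, bins=16):
    """Normalised histogram of pixel intensities (assumes the range 0..255)."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / image.size

def radial_power_spectrum(image, n_bins=32):
    """Radially averaged power spectrum (our assumed reading of '1D')."""
    # 2D power spectrum with the zero frequency shifted to the centre.
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = image.shape
    y, x = np.indices((h, w))
    radius = np.hypot(y - h // 2, x - w // 2)
    # Assign every frequency to a radial bin, then average the power per bin.
    edges = np.linspace(0, radius.max(), n_bins + 1)
    which = np.clip(np.digitize(radius.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(which, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(which, minlength=n_bins)
    return sums / np.maximum(counts, 1)

def score_features(image, bins=16, n_bins=32):
    """Concatenate both feature groups into one vector for a regressor."""
    spectrum = np.log1p(radial_power_spectrum(image, n_bins))  # assumed scaling
    return np.concatenate([intensity_histogram(image, bins), spectrum])
```

A feature matrix built by stacking `score_features` over the scored images could then be passed to a gradient-boosted regressor such as XGBoost; see the report for the features actually used.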

Generation Model

We use a 2-stage GAN that works in the following way:

Position sampling: We divide each image in our dataset into a 32x32 grid. Cells in this grid that contain galaxies are marked with a 1, and those that do not are marked with a 0. The first GAN, which we will refer to as the position GAN, learns these grids as 32x32 images and is able to output such a grid representing the locations of galaxies in the image.
Galaxy sampling: We use a second GAN, which we will call the galaxy GAN, to generate images of individual galaxies conditioned on the size of the galaxy. The training data is made up of 32x32 galaxy patches extracted from our original dataset.

Essentially, the position GAN generates a grid, and the galaxy GAN produces the necessary galaxies. A generated galaxy is placed wherever a cell is 1, and cells marked 0 are left black.
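The grid construction and the final assembly step can be sketched as below. This is only an illustration under stated assumptions: images are taken to be grayscale and already padded to 1024x1024 so that each of the 32x32 cells is exactly 32 pixels wide, `sample_galaxy` is a hypothetical stand-in for the galaxy GAN, and the conditioning on galaxy size is left out.

```python
import numpy as np

CELL = 32  # assumed cell size in pixels: 32 cells x 32 px = 1024 px

def occupancy_grid(image, grid=32, threshold=0):
    """Mark a cell with 1 if it contains any pixel above the threshold."""
    h, w = image.shape
    cell_h, cell_w = h // grid, w // grid
    # Drop any remainder rows/columns so the image splits evenly into cells.
    cropped = image[:cell_h * grid, :cell_w * grid]
    cells = cropped.reshape(grid, cell_h, grid, cell_w)
    return (cells.max(axis=(1, 3)) > threshold).astype(np.uint8)

def compose(grid, sample_galaxy):
    """Paste one generated patch into every cell marked 1; other cells stay black."""
    n = grid.shape[0]
    canvas = np.zeros((n * CELL, n * CELL), dtype=np.float32)
    for row, col in zip(*np.nonzero(grid)):
        canvas[row * CELL:(row + 1) * CELL,
               col * CELL:(col + 1) * CELL] = sample_galaxy()
    return canvas
```

In the real pipeline the grid would come from the position GAN rather than from `occupancy_grid`, which only shows how the training targets for that GAN could be derived from an image.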

Setup

Place the 3 image folders (labeled, query and scored) in the data directory.

Run python setup.py develop.

Run conda env create -f environment.yml, followed by conda activate galaxy_gen.

To train the two-stage models, run python galaxy_patches.py from within the scripts directory.

Project directories

generators: contains the generative model classes for all our generative models
regressors: contains the regressor model classes for all our regression models
training: contains the notebooks and scripts used to train our generative/regressive models
common: common utils used throughout the project
analysis: experiments on our models and data analysis
scripts: self-explanatory

Each directory has its own README explaining how to run its scripts/notebooks.

galaxy-image-gen's Issues

[report] write about regressor with FFT

[Regressor] histogram + andrej's magic features

[Regressor] Try ensemble

Improve generator scorer

Improve evaluate script to show mean, std dev of more than 16 images

Try random forest stuff

add main README

Have a README file that (at least) describes what your software does, and which commands to run to obtain results. Also mention anything special that needs to be set up, such as toolboxes.

Data augmentation - patching dataset

images in the initial dataset are 1000x1000
pad them to 1024x1024 (we <3 powers of 2)
divide each image into 64x64 sections ⇒ 256 smaller images per original image
get rid of patches that are all black → they do not provide any relevant info, and we assume that the network will learn from the other patches that the background should be black
train a convolutional GAN on these patches and hope for the best
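The padding and patching steps above can be sketched as follows. This is a rough illustration, not the project's script: it assumes grayscale images and pads with black on the bottom and right edges (the issue does not say where the padding goes), and `extract_patches` is a hypothetical name.

```python
import numpy as np

def extract_patches(image, padded=1024, patch=64):
    """Pad to padded x padded, cut into patch x patch tiles, drop all-black tiles."""
    h, w = image.shape
    canvas = np.zeros((padded, padded), dtype=image.dtype)
    canvas[:h, :w] = image  # assumption: pad on the bottom and right with black
    n = padded // patch     # 1024 // 64 = 16 tiles per side, 256 tiles in total
    patches = (canvas.reshape(n, patch, n, patch)
                     .swapaxes(1, 2)
                     .reshape(n * n, patch, patch))
    # Keep only tiles that contain at least one non-black pixel.
    keep = patches.reshape(n * n, -1).any(axis=1)
    return patches[keep]
```

Because the images are sparse, most of the 256 tiles per image are expected to be discarded by the last step.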

Follow up → how to create big cosmological images

this gan will create 64x64 patches of images
could use these images to create a bigger image
need info about galaxy distribution in images (how many galaxies in the image, how are they distributed, are they clustered, how many of each color in images, are things just random)

Extract new features

The paper linked below mentions the following features:

• Elongation
• Form Factor
• Convexity
• Bounding Rectangle to Fill Factor
• Galaxy Area
• Galaxy Bounding Rectangle Width
• Galaxy Bounding Rectangle Height
• Brightness

https://people.cs.uct.ac.za/~stpjul004/downloads/hnhroy002-feature_extraction_and_selection.pdf
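A few of these features can be computed directly from a single galaxy patch, as in the sketch below. It covers only the area, the bounding rectangle width and height, a bounding-rectangle fill ratio and the brightness. The fill ratio (area divided by bounding rectangle area) and brightness (mean intensity of the galaxy pixels) are our own assumed definitions and may differ from the paper's. Elongation, form factor and convexity are left out because their definitions are not given here.

```python
import numpy as np

def patch_features(patch, threshold=0):
    """Simple shape features for a patch assumed to contain one galaxy."""
    mask = patch > threshold
    if not mask.any():
        return None  # all-background patch: nothing to measure
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    height = int(rows[-1] - rows[0] + 1)
    width = int(cols[-1] - cols[0] + 1)
    area = int(mask.sum())
    return {
        "area": area,
        "bbox_width": width,
        "bbox_height": height,
        "bbox_fill": area / (width * height),      # assumed definition
        "brightness": float(patch[mask].mean()),   # assumed definition
    }
```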

[Regressor] Features: histogram of oriented gradients

Implement baseline generative model (my humps my humps my lovely little lumps)

We sample the number of galaxies k from a normal distribution, say N(3,2) so we end up mostly with 1 <= k <= 5 galaxies.
For each galaxy, we sample the centre of the galaxy uniformly over the set of all H * W pixels.
For each galaxy, we sample its radius from a normal distribution
The pixel intensity of each galaxy is proportional to the size of the galaxy
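A minimal sketch of this baseline is given below. Several details are assumptions, since the issue leaves them open: the 2 in N(3,2) is read as a standard deviation, k is rounded and clamped to at least 1, galaxies are drawn as filled discs, and the radius distribution and the intensity scale (`mean_r`, `std_r`, `intensity_per_px`) are placeholder values.

```python
import numpy as np

def baseline_image(rng, height=1000, width=1000, mean_k=3.0, std_k=2.0,
                   mean_r=3.0, std_r=1.0, intensity_per_px=40.0):
    """Draw k discs with random centres, radii and size-proportional intensity."""
    # Number of galaxies: rounded normal sample, clamped to at least one.
    k = max(1, int(round(rng.normal(mean_k, std_k))))
    image = np.zeros((height, width), dtype=np.float32)
    yy, xx = np.ogrid[:height, :width]
    for _ in range(k):
        # Centre sampled uniformly over all H * W pixels.
        cy = rng.integers(0, height)
        cx = rng.integers(0, width)
        radius = max(1.0, rng.normal(mean_r, std_r))
        # Intensity proportional to size, capped at the 8-bit maximum.
        intensity = min(255.0, intensity_per_px * radius)
        disc = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
        image[disc] = np.maximum(image[disc], intensity)
    return image
```

Calling `baseline_image(np.random.default_rng(0))` returns one 1000x1000 image; the hard-coded parameters are the natural candidates for tuning.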

Full text galaxy features + histogram features

Analysis of background threshold in real images

Distribution of # of small galaxies in images
What do images with background_threshold > 0 look like?
Extend the information file script to include the number of white pixels
Add paragraph in draft about decisions we made regarding this.

Read GENERATIVE NETWORKS FOR EMULATING SYNTHETIC SKY IMAGES

https://kspa.soe.ucsc.edu/sites/default/files/Guilloteau.pdf

Using basic features like number and size of clusters for regression

Detailed text Fourier transform + wavelets

Extracting useful statistics from data (my milkshake brings all the boys to the yard)

If we call the black pixels (everything except for the galaxy blobs) ‘background’, what is the background-to-galaxy proportion on average? What is the distribution of this proportion over all images?
Does the background consist only of purely black pixels? Is there any variation in the ‘blackness’ of the background?
What is the distribution of the number of galaxies per image?
What is the distribution of the size of the galaxies?
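These questions can be answered per image with a sketch like the one below. It assumes grayscale images, treats every pixel at or below `threshold` as background, and counts a galaxy as an 8-connected cluster of above-threshold pixels; both the connectivity choice and the function names are our own assumptions.

```python
import numpy as np

def background_fraction(image, threshold=0):
    """Fraction of pixels that count as background."""
    return float((image <= threshold).mean())

def galaxy_sizes(image, threshold=0):
    """Pixel counts of 8-connected clusters of above-threshold pixels."""
    mask = image > threshold
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    sizes = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        # Flood-fill one cluster with an explicit stack.
        seen[y, x] = True
        stack, size = [(y, x)], 0
        while stack:
            cy, cx = stack.pop()
            size += 1
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
        sizes.append(size)
    return sizes
```

`len(galaxy_sizes(image))` gives the number of galaxies in an image, and collecting these values over the dataset gives the distributions asked for above.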

Change GAN

must have a generate method
input: number of images to generate
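One way to enforce this requirement is a small abstract base class, sketched below. The class names and the returned array layout (one image per entry along the first axis) are hypothetical and not taken from the repository.

```python
from abc import ABC, abstractmethod

import numpy as np

class Generator(ABC):
    """Hypothetical common interface for all generative models."""

    @abstractmethod
    def generate(self, n_images: int) -> np.ndarray:
        """Return an array of shape (n_images, height, width)."""

class BlackImageGenerator(Generator):
    """Trivial example implementation that outputs all-black images."""

    def __init__(self, height=1000, width=1000):
        self.height = height
        self.width = width

    def generate(self, n_images: int) -> np.ndarray:
        return np.zeros((n_images, self.height, self.width), dtype=np.float32)
```

With such an interface, an evaluation script can call `generate` on any model without knowing which GAN is behind it.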

make BEGAN a conditional GAN

why not

Data augmentation on BEGAN

translate
rotation
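Both augmentations can be sketched with plain NumPy, as below. This is an illustration under assumptions: rotations are restricted to multiples of 90 degrees so that no interpolation is needed, and translation wraps around the image border, which seems acceptable only because the background is almost entirely black. The function name and `max_shift` are hypothetical.

```python
import numpy as np

def augment(image, rng, max_shift=16):
    """Random 90-degree rotation followed by a random wrap-around translation."""
    k = int(rng.integers(0, 4))  # number of quarter turns
    rotated = np.rot90(image, k)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border instead of padding.
    return np.roll(rotated, shift=(int(dy), int(dx)), axis=(0, 1))
```

For square images, every augmented image keeps the shape and the exact pixel values of the original, so the intensity histogram is unchanged.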

Tune the hard-coded parts of baseline 1

Cleaning up the code

Cleaning up notebooks:

Add instructions in an appropriate README about what needs to be run before the notebook can be run. Explain that an alternative to running the prerequisites is downloading the pre-computed data from Polybox (include a link and instructions on where to place the download).
Make sure the paths used are correct (this will be relevant if we restructure some of directories).
Add markdown at the start of the notebook about what the notebook does.
For the more important parts add some markdown descriptions of what the cells do.
Remove redundant cells

Cleaning up scripts:

Same as notebooks
Same as notebooks
Add some documentation where necessary