simeonedef / galaxy-image-gen

This project proposes a solution to score how 'realistic' sparse galaxy images are as well as a 2-stage GAN to generate those sparse galaxy images.

License: GNU General Public License v3.0

Python 0.67% Jupyter Notebook 99.33%
gan deep-learning image-processing image-generation convolutional-neural-networks eth-zurich analysis regressors

galaxy-image-gen's Introduction

2-Stage GAN for Sparse Galaxy Image Generation

Siméone de Fremond, Carlos Gomes, Andra-Maria Ilies, Andrej Ivanov
Please refer to the project report for details.

The rapid growth of astronomical data in recent years has prompted interest in building generative models of cosmological images. This paper focuses on 2 related tasks in this area: 1) generating realistic images and 2) scoring how realistic an image is. We begin by formalizing the structure of the images in terms of galaxies and their properties, and show that simplistic baselines are not sufficient for either task. We then present a novel generative approach consisting of a two-stage sampling generative adversarial network (GAN), which captures the position of cosmological objects and their representation separately and produces diverse, realistic-looking images. Furthermore, we develop an accurate scoring method based on power spectrum features. Using this scoring method, we show empirically that our two-stage generative model significantly outperforms both the baselines and a standard GAN approach. Finally, we provide experiments toward understanding the most important factors in our model.

Python 3.6 License GPL

Dataset

On a high level, the images in our dataset are of a high resolution (1000x1000 pixels), but most of the image represents a constant dark background populated with a number of very distant, and hence tiny, celestial objects.

Below is a sample of what they look like:

We consider any non-black cluster of pixels a galaxy. These objects in our images represent both stars and galaxies, but because of the low resolution we model them together, and for the rest of the paper we will refer to them as simply galaxies. Visually, a galaxy is defined by 4 main properties: location in the image, size, intensity and shape.

Scoring Model

To score the images and overcome the sparsity element we use an XGBoost model on a histogram of intensities of the picture as well as features from the 1D power spectrum of the image. (more details in the report)
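As a rough illustration of the two feature families mentioned above, the following sketch computes an intensity histogram and a radially averaged (1D) power spectrum with NumPy. The function names, bin counts, the 0 to 255 intensity range and the log scaling are assumptions made for this example, not the settings used in the project; the resulting vector is the kind of input one could pass to an XGBoost regressor.

```python
import numpy as np


def intensity_histogram(image, bins=16):
    """Normalised histogram of pixel intensities, assuming values in [0, 255]."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 255))
    return hist / hist.sum()


def radial_power_spectrum(image, n_bins=32):
    """Average the 2D power spectrum over rings of equal width around the zero frequency."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = image.shape
    y, x = np.indices((h, w))
    radius = np.hypot(y - h // 2, x - w // 2)
    edges = np.linspace(0.0, radius.max() + 1e-9, n_bins + 1)
    ring = np.clip(np.digitize(radius.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(ring, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(ring, minlength=n_bins)
    return sums / np.maximum(counts, 1)


def score_features(image):
    """Concatenate both feature families into one vector (hypothetical layout)."""
    spectrum = np.log1p(radial_power_spectrum(image))  # log scale tames the dynamic range
    return np.concatenate([intensity_histogram(image), spectrum])
```

Because the histogram is dominated by the black background, the power spectrum features carry most of the information about how the bright pixels are arranged.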

Generation Model

We use a 2-stage GAN that works in the following way:

  • Position sampling: For each image in our dataset, we divide it into a 32x32 grid. Grid cells that contain galaxies are marked with a 1, and those without are marked with a 0. The first GAN, which we will refer to as the position GAN, learns these as 32x32 images, and is able to output such a grid representing the locations of galaxies in the image.
  • Galaxy sampling: we use a second GAN, which we will call the galaxy GAN, to generate images of individual galaxies conditioned on the size of the galaxy. The training data is made up of extracted 32x32 galaxy patches from our original dataset.
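The position grid described in the first step could be derived from an image roughly as follows. This is a minimal sketch: the function name, the threshold parameter and the way a 1000x1000 image is split into 32 near-equal cells are assumptions, since the text does not specify them.

```python
import numpy as np


def position_grid(image, grid_size=32, threshold=0):
    """Return a grid_size x grid_size array with 1 where a cell contains a non-black pixel."""
    h, w = image.shape
    # Cell boundaries; cells differ by at most one pixel when h or w is not divisible.
    rows = np.linspace(0, h, grid_size + 1).astype(int)
    cols = np.linspace(0, w, grid_size + 1).astype(int)
    grid = np.zeros((grid_size, grid_size), dtype=np.uint8)
    for i in range(grid_size):
        for j in range(grid_size):
            cell = image[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
            grid[i, j] = 1 if (cell > threshold).any() else 0
    return grid
```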

Essentially, the position GAN generates a grid, and the galaxy GAN produces the necessary galaxies. A generated galaxy is placed wherever a cell is 1, and the cell is left black (all 0s) wherever it is 0.
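The composition step can be sketched as below. Here `sample_patch` stands in for the galaxy GAN and `dummy_galaxy_patch` is a hypothetical placeholder that draws a Gaussian blob; neither name comes from the project. Note that a 32x32 grid of 32x32 patches yields a 1024x1024 canvas, so getting back to 1000x1000 would need a crop or resize that the text does not describe.

```python
import numpy as np


def dummy_galaxy_patch(patch_size=32, sigma=2.0):
    """Placeholder for the galaxy GAN: a centred Gaussian blob on a black patch."""
    y, x = np.indices((patch_size, patch_size))
    c = (patch_size - 1) / 2
    blob = np.exp(-((y - c) ** 2 + (x - c) ** 2) / (2 * sigma ** 2))
    return (255 * blob).astype(np.uint8)


def compose_image(grid, sample_patch, patch_size=32):
    """Place one sampled patch in every grid cell marked 1; other cells stay black."""
    rows, cols = grid.shape
    canvas = np.zeros((rows * patch_size, cols * patch_size), dtype=np.uint8)
    for i, j in zip(*np.nonzero(grid)):
        top, left = i * patch_size, j * patch_size
        canvas[top:top + patch_size, left:left + patch_size] = sample_patch()
    return canvas
```

In the real pipeline the grid would come from the position GAN and each patch from the galaxy GAN, conditioned on galaxy size.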

Setup

Place the 3 image folders (labeled, query and scored) in the data directory.

Run python setup.py develop.

Run conda env create -f environment.yml, followed by conda activate galaxy_gen.

To train the two-stage models, run python galaxy_patches.py from the scripts directory.

Project directories

  1. generators: contains the generative model classes for all our generative models
  2. regressors: contains the regressor model classes for all our regression models
  3. training: contains the notebooks and scripts used to train our generative/regressive models
  4. common: common utils used throughout the project
  5. analysis: experiments on our models and data analysis
  6. scripts: self-explanatory

Each directory has its own README explaining how to run its scripts/notebooks.

galaxy-image-gen's People

Contributors

andramariailies, andrejivanov1, carlosgomes98, simeonedef

Forkers

carlosgomes98

galaxy-image-gen's Issues

add main README

Have a README file that (at least) describes what your
software does, and which commands to run to obtain
results. Also mention anything special that needs to be
set up, such as toolboxes.

Data augmentation - patching dataset

  • images in initial dataset are 1000x1000
  • pad them to 1024x1024 (we <3 powers of 2)
  • divide each image into 64x64 sections ⇒ 256 smaller images per original image
  • get rid of patches that are all black → do not provide any relevant info and we assume that the network will learn from the other patches that the background should be black
  • train convolutional gan with this and hope for the best
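A minimal sketch of the padding and patching steps listed above, using NumPy. The function name is hypothetical, and padding on the bottom and right edges is an assumption; the issue does not say where the padding goes.

```python
import numpy as np


def extract_patches(image, padded_size=1024, patch_size=64):
    """Pad to padded_size, cut into patch_size tiles, and drop all-black tiles."""
    h, w = image.shape
    # Zero-pad on the bottom and right edges (assumed placement).
    padded = np.pad(image, ((0, padded_size - h), (0, padded_size - w)), mode="constant")
    n = padded_size // patch_size  # 16 tiles per side, 256 tiles in total
    tiles = (
        padded.reshape(n, patch_size, n, patch_size)
        .swapaxes(1, 2)
        .reshape(-1, patch_size, patch_size)
    )
    keep = tiles.reshape(len(tiles), -1).any(axis=1)  # True for tiles with a non-black pixel
    return tiles[keep]
```

Since most of each image is background, only a small fraction of the 256 tiles is expected to survive the filter.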

Follow up → how to create big cosmologic images

  • this gan will create 64x64 patches of images
  • could use these images to create a bigger image
  • need info about galaxy distribution in images (how many galaxies in the image, how are they distributed, are they clustered, how many of each color in images, are things just random)

Implement baseline generative model (my humps my humps my lovely little lumps)

  • We sample the number of galaxies k from a normal distribution, say N(3,2) so we end up mostly with 1 <= k <= 5 galaxies.
  • For each galaxy, we sample the centre of the galaxy uniformly over the set of all H * W pixels.
  • For each galaxy, we sample its radius from a normal distribution
  • The pixel intensity of each galaxy is proportional to the size of the galaxy
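The baseline above could look roughly like this. All names and default values are assumptions for illustration: N(3,2) is read as mean 3 and standard deviation 2, the radius distribution and the intensity scale are invented, "size" is taken to mean the radius, and galaxies are drawn as filled discs.

```python
import numpy as np


def baseline_image(rng, height=1000, width=1000, k_mean=3.0, k_std=2.0,
                   r_mean=3.0, r_std=1.0, intensity_per_radius=40.0):
    """Sample one baseline image; returns the image and the number of galaxies drawn."""
    image = np.zeros((height, width), dtype=np.uint8)
    # Number of galaxies: rounded normal sample, clipped at zero.
    k = max(0, int(round(rng.normal(k_mean, k_std))))
    y, x = np.indices((height, width))
    for _ in range(k):
        cy = rng.integers(0, height)  # centre sampled uniformly over all pixels
        cx = rng.integers(0, width)
        radius = max(1.0, rng.normal(r_mean, r_std))
        # Intensity grows linearly with the radius, capped at the 8-bit maximum.
        intensity = np.uint8(min(255, round(intensity_per_radius * radius)))
        mask = (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
        image[mask] = np.maximum(image[mask], intensity)
    return image, k
```

A call such as `baseline_image(np.random.default_rng(0))` returns one sampled image together with the number of galaxies drawn.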

Analysis of background threshold in real images

  1. Distribution of # of small galaxies in images
  2. What do images with background_threshold > 0 look like?
  3. Extend the information file script to include the number of white pixels
  4. Add paragraph in draft about decisions we made regarding this.

Extracting useful statistics from data (my milkshake brings all the boys to the yard)

  • If we call the black pixels (everything except for the galaxy blobs) ‘background’, what is the background-to-galaxy proportion on average? What is the distribution of this proportion over all images ?
  • Does the background consist only of purely black pixels? Is there any variation in the ‘blackness’ of the background ?
  • What is the distribution of the number of galaxies per image ?
  • What is the distribution of the size of the galaxies ?
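For a single image, three of these statistics (background fraction, number of galaxies, galaxy sizes) could be computed as sketched below. The function name and the use of 8-connectivity to decide which pixels belong to the same galaxy are assumptions; a library routine for connected-component labelling would work equally well.

```python
from collections import deque

import numpy as np


def galaxy_stats(image, threshold=0):
    """Count connected clusters of non-background pixels and measure their sizes."""
    mask = image > threshold
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    sizes = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        # Breadth-first flood fill over the 8 neighbours of each pixel.
        seen[sy, sx] = True
        queue = deque([(sy, sx)])
        size = 0
        while queue:
            y, x = queue.popleft()
            size += 1
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        sizes.append(size)
    return {
        "background_fraction": 1.0 - float(mask.mean()),
        "num_galaxies": len(sizes),
        "sizes": sizes,
    }
```

Collecting these dictionaries over all images gives the distributions asked about in the list.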

Change GAN

must have a generate method
input: no of images to generate
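One way to express this requirement is a small abstract base class that every generative model implements. The class names below are hypothetical and do not come from the generators directory; the trivial subclass only shows the expected call shape.

```python
from abc import ABC, abstractmethod

import numpy as np


class ImageGenerator(ABC):
    """Hypothetical common interface: every generator exposes generate(n_images)."""

    @abstractmethod
    def generate(self, n_images):
        """Return an array of shape (n_images, height, width)."""


class BlackImageGenerator(ImageGenerator):
    """Trivial example implementation that returns all-black images."""

    def __init__(self, height=1000, width=1000):
        self.height = height
        self.width = width

    def generate(self, n_images):
        return np.zeros((n_images, self.height, self.width), dtype=np.uint8)
```

With a shared interface like this, scoring and analysis code can call generate on any model without knowing which one it is.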

Cleaning up the code

Cleaning up notebooks:

  • Add instructions in an appropriate README about what needs to be run before the notebook can be run. Explain that an alternative to running the prerequisites is downloading the pre-computed data from Polybox (include a link and an instruction for where to download it).
  • Make sure the paths used are correct (this will be relevant if we restructure some of directories).
  • Add markdown at the start of the notebook about what the notebook does.
  • For the more important parts add some markdown descriptions of what the cells do.
  • Remove redundant cells

Cleaning up scripts:

  • Same as notebooks
  • Same as notebooks
  • Add some documentation where necessary

Setup.py

Make a setup.py using setuptools so we don't need to add stuff to the path.

Fix logical flaw in baseline 2

Currently we divide the clusters into small and large, yet both small and large clusters can be of size 1, just with different probabilities. This doesn't make sense.
