Astronomical-Images-Classification

Project overview

Astronomy has recently seen major advances in detectors, instruments, telescopes, and even the probes sent into outer space and to distant planets to collect data for the sky surveys that map our universe. The collected data are organized into very large datasets, giving rise to what is called data-oriented astronomy. It is impossible to work through these massive datasets manually and get effective results, so astronomers are looking for ways to automate the error-prone process of manual scanning and to extract astronomical knowledge from the raw data using state-of-the-art data mining techniques, statistical methods, and data science tools (a new discipline called astroinformatics). I think, as many others do, that the strongest candidate here is machine learning applied to massive astronomical datasets to classify stars, quasars, and galaxies, whether from photometric images, redshifts, or radio astronomy signals. Historically, early efforts started in 2010 as initiatives of the National Research Council (United States). Later research built on that step and enriched the field using large, globally distributed collections of digital astronomical databases such as the Large Synoptic Survey Telescope (LSST), the Square Kilometer Array observatory, and the Sloan Digital Sky Survey (SDSS). The field is still young, however, and needs more research to find new, more precise techniques, especially in machine learning and specifically in deep learning, now that massive astronomical datasets are open to the public.
These classification problems need to be solved in order to map our universe correctly, understand it better, and support existing and new theories in cosmology. Deep learning is a natural candidate technique, since it has proved successful on massive image datasets like the ones in this domain. With that in mind, I selected a concrete astronomical classification problem and applied a deep learning algorithm to it to test this possibility, using the well-known public dataset of the Sloan Digital Sky Survey (SDSS) supernova surveys, specifically the SDSS legacy survey named SDSS-II SN Survey.

Problem statement

Sky surveys provide massive astronomical datasets in many different forms, such as optical spectra, infrared spectra, photometric redshifts, light curves, and imaging data, which represent a variety of astronomical information and objects. With imaging data, astronomers start the classification process manually: they scan the images at hand to decide what is real and what is not, and then sort the real detections into groups of possible stars, quasars, and galaxies. Imaging data often contains objects that appear only for a short time, called transients, such as gamma-ray bursts, asteroids, and supernovae. A supernova (SN) occurs at the end of a star's lifetime: the star is destroyed in an explosion and shines extremely brightly within its galaxy. The resulting supernovae (SNe) are classified by their spectral features at maximum light into two main types, Type I and Type II, with Type I subdivided into Type Ia, Type Ib, and Type Ic; the classes differ in the presence of hydrogen, helium, and other elements in their spectra. These explosions are either thermonuclear explosions of white dwarfs (at around 1.4 solar masses, in binary systems) or core collapses of massive stars (around 9 solar masses), which release gravitational energy and leave behind a neutron star (pulsar) or a black hole. The problem is that classifying the objects and artefacts associated with these phenomena is done by hand. This is very time consuming, it is subject to human bias that differs from person to person, and manual scanning is infeasible for a huge number of images. As a solution, there are major efforts to explore machine learning algorithms that automate knowledge discovery and the extraction of astronomical information from these large raw datasets, replacing the infeasible manual handling.
In this project I investigated applying a deep learning algorithm to test this possibility. To allow a comparison with the benchmark model, I chose the astronomers' first step, the search for interesting objects, framed as a binary classification problem: deciding what is real and what is not in the underlying imaging data.

Algorithms and techniques

To reduce the human bias that comes from the mood and tiredness of whoever performs the manual scanning, the SDSS dataset is randomly injected with a number of fake object images among the original images. The remaining part of the problem is that it is infeasible for humans to classify hundreds or even thousands of images per night, so machines should take over this work, and that is what I am trying to validate in this project. I will construct a convolutional neural network (CNN) to automate the manual process of classifying transient imaging data from the SDSS Supernova Survey into real objects and artefacts, since this type of algorithm gives impressive image recognition results. Difference images are created by subtracting a reference image from the most recent image of a given part of the sky; this leaves a pure noise image unless a transient is present. To simplify the problem, I will treat it as a binary classification task: the goal is to separate real transients from non-real ones.
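The two ideas in this section, difference imaging and a convolutional classifier, can be sketched together on synthetic data. This is only an illustrative sketch under stated assumptions, not the network used in this project: the 51x51 cutout size, the noise levels, the injected brightness, the hand-set kernel, and the `weight` and `bias` values are all hypothetical, and a real CNN would learn its filters from labelled images. Real difference imaging also needs image alignment and PSF matching, which are omitted here.

```python
import numpy as np


def difference_image(recent, reference):
    """Subtract the reference exposure from the most recent exposure."""
    recent = np.asarray(recent, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if recent.shape != reference.shape:
        raise ValueError("images must have the same shape")
    return recent - reference


def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D image with a 2-D kernel (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def real_probability(diff, kernel, weight, bias):
    """One conv filter -> ReLU -> global max pool -> sigmoid output."""
    feature_map = np.maximum(conv2d_valid(diff, kernel), 0.0)  # ReLU
    pooled = feature_map.max()                                 # global max pooling
    logit = weight * pooled + bias
    return 1.0 / (1.0 + np.exp(-logit))                        # sigmoid


# Synthetic exposures of the same patch of sky (all values are assumptions).
rng = np.random.default_rng(0)
sky = rng.normal(100.0, 5.0, size=(51, 51))              # static background
reference = sky + rng.normal(0.0, 1.0, size=sky.shape)
quiet_night = sky + rng.normal(0.0, 1.0, size=sky.shape)
transient_night = sky + rng.normal(0.0, 1.0, size=sky.shape)
transient_night[25, 25] += 50.0                          # injected point source

quiet_diff = difference_image(quiet_night, reference)
transient_diff = difference_image(transient_night, reference)

# Hand-set blob detector standing in for a learned convolutional filter.
kernel = np.array([[0.0, 1.0, 0.0],
                   [1.0, 4.0, 1.0],
                   [0.0, 1.0, 0.0]]) / 8.0

p_quiet = real_probability(quiet_diff, kernel, weight=1.0, bias=-10.0)
p_transient = real_probability(transient_diff, kernel, weight=1.0, bias=-10.0)
print(f"P(real | no transient) = {p_quiet:.4f}")
print(f"P(real | transient)    = {p_transient:.4f}")
```

In the quiet case the static sky cancels and only noise reaches the filter, so the output stays near zero; the injected source survives the subtraction and pushes the output towards one. A trained CNN stacks many such filters and learns them, along with the output weights, from the labelled real and non-real images.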

Benchmark

Most work similar to this project has built on prior SN Factory work (Romano, Aragon & Ding in 2006 and Bailey in 2007), which used classical supervised learning algorithms for automatic classification. As the benchmark for my work, I use the publication titled "Machine learning classification of SDSS transient survey images". That research works on the same dataset but uses different learning algorithms, such as random forest, k-nearest neighbors, naïve Bayes, and support vector machine (SVM), and compares their performance using the same metrics I use. In other words, I reworked that research using deep learning convolutional networks and then compared my results with the prior work. In contrast to my approach, the benchmark relies on Principal Component Analysis (PCA) to extract features, using attributes such as shape, position, Full-Width Half Maximum (FWHM), and distance to the nearest object in the difference image, as well as the SDSS camera filters (g, r, i, z, u). In my solution there is no need for a separate PCA feature extraction step, because the CNN learns the features itself. I compare my results against those reported for the benchmark model.
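For contrast with a CNN that learns its own features, a PCA feature extraction step of the kind the benchmark depends on can be sketched as follows. This is a generic PCA via singular value decomposition and not the benchmark's actual pipeline: the function name `pca_features`, the random 8x8 images, and the choice of 5 components are assumptions made for illustration.

```python
import numpy as np


def pca_features(images, n_components):
    """Project flattened images onto their top principal components.

    Returns (features, components, mean), where features has shape
    (n_images, n_components).
    """
    flat = np.asarray(images, dtype=float).reshape(len(images), -1)
    mean = flat.mean(axis=0)
    centred = flat - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    features = centred @ components.T
    return features, components, mean


# Hypothetical stand-in for a stack of difference-image cutouts.
rng = np.random.default_rng(1)
images = rng.normal(size=(20, 8, 8))

features, components, mean = pca_features(images, n_components=5)
print(features.shape)  # one 5-number feature vector per image
```

A classical classifier such as a random forest would then be trained on `features` rather than on raw pixels. The CNN removes this separate step by learning its filters and its classifier jointly.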

Conclusion

The benchmark I compare against examines a variety of machine learning algorithms for transient classification, simplified to the primary task of binary classification of the objects in the astronomical imaging data into Real and Non-Real. In the benchmark results, the best classifier was Random Forest (RF), with accuracy and recall both at 91%. The benchmark also used a neural network called SkyNet, which gave an accuracy of 88% and a recall of 89%. My final result using a deep learning convolutional neural network (CNN) gives an accuracy of 92.36% and a recall of 96.51%. I set out to find a good solution to the problem and to show that machine learning algorithms, especially deep learning, can surpass the human manual scanning process, and I think I have reached this goal to some degree.
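The two metrics quoted above have standard definitions: accuracy is the fraction of all images classified correctly, and recall is the fraction of real objects that the classifier recovers. A minimal sketch of how they are computed is given below. The label lists are made-up toy values, not results from this project, and the convention 1 = Real, 0 = Non-Real is an assumption.

```python
def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and recall for binary labels (1 = Real, 0 = Non-Real)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # real, found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-real, rejected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # real, missed
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```

Recall matters here because a missed real transient is lost to follow-up, whereas a false positive only costs some extra inspection time.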
