Black Hole or Not Black Hole?

Welcome to the BlackholeNotblackhole project!

This is the backend for our web app, so we'll be writing server-side code to communicate with the frontend, store information in our database, and build the neural net to find galaxies with 'fossil' black holes.

For a quick intro to the project, check out the MVP in notes.md. For a more detailed description, see ML_Info/Project_Information.pdf.

WR Classifier Procedure

  1. From the .fits file, read the wavelengths, fluxes, and redshift:
       wavelength_values = 10**hdulist[1].data['loglam']
       flux_values = hdulist[1].data['flux']
       z = hdulist[2].data['z']
  2. Scale the wavelengths by 1/(1+z), correcting for redshift
  3. Subtract the continuum slope (flatten the curve)
    1. Find the endpoints of the line by averaging the flux values within 4517±50 Å and 4785±50 Å
    2. Construct a line through those two points
    3. At each wavelength, subtract the line's value from the flux
  4. Apply a Gaussian Kernel to smooth the data and remove noise
  5. Crop the range of samples to 4686±150 Å
  6. Standardize the wavelength grid by interpolating between samples, resampling each spectrum to exactly 300 samples
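
A minimal NumPy sketch of steps 2-6, assuming the arrays from step 1 are already loaded, the wavelengths are increasing, and the rest-frame spectrum extends beyond 4467-4836 Å on both sides. The function name `preprocess` and the kernel width `sigma_px` are hypothetical: the README does not state the Gaussian width, and the project's own code may do this differently (for example with SciPy's filters).

```python
import numpy as np


def preprocess(wavelength, flux, z, sigma_px=2.0, n_samples=300):
    """Sketch of steps 2-6: returns (grid, processed_flux), each of length n_samples.

    sigma_px (Gaussian kernel width in pixels) is a hypothetical default.
    """
    wavelength = np.asarray(wavelength, dtype=float)
    flux = np.asarray(flux, dtype=float)

    # Step 2: correct for redshift by moving to rest-frame wavelengths.
    rest = wavelength / (1.0 + z)

    # Step 3: line through the mean flux at 4517±50 Å and 4785±50 Å, then subtract it.
    y1 = flux[np.abs(rest - 4517.0) <= 50.0].mean()
    y2 = flux[np.abs(rest - 4785.0) <= 50.0].mean()
    slope = (y2 - y1) / (4785.0 - 4517.0)
    flat = flux - (y1 + slope * (rest - 4517.0))

    # Step 4: convolve with a normalized Gaussian kernel to suppress noise.
    # mode="same" zero-pads the array ends; those pixels fall outside the crop
    # below as long as the spectrum extends well beyond it.
    half = int(np.ceil(4 * sigma_px))
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma_px) ** 2)
    smooth = np.convolve(flat, kernel / kernel.sum(), mode="same")

    # Step 5: crop to 4686±150 Å.
    keep = np.abs(rest - 4686.0) <= 150.0
    rest, smooth = rest[keep], smooth[keep]

    # Step 6: linearly interpolate onto a fixed grid of n_samples wavelengths.
    grid = np.linspace(4686.0 - 150.0, 4686.0 + 150.0, n_samples)
    return grid, np.interp(grid, rest, smooth)
```

Assuming `z` from step 1 is a one-element column, a call like `grid, x = preprocess(wavelength_values, flux_values, float(z[0]))` would give one 300-element feature vector per spectrum.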

To classify He2, we perform the above operations, then take the sum of the values within 4686±5 Å. If the sum is less than 0, we reject the spectrum as not containing He2.
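
A sketch of the He2 sum test, assuming the wavelengths and fluxes have already gone through the steps above. `has_he2`, `half_width`, and `min_sum` are hypothetical names: `half_width` is there for trying different widths, `min_sum` is one possible reading of the threshold Todo (applied to the sum), and `n_used` reports how many samples actually fall in the window. With the defaults it reproduces the rule above: a sum below 0 is rejected.

```python
import numpy as np


def has_he2(wavelength, flux, center=4686.0, half_width=5.0, min_sum=0.0):
    """Return (accepted, total, n_used) for the He2 sum test on preprocessed data."""
    wavelength = np.asarray(wavelength, dtype=float)
    flux = np.asarray(flux, dtype=float)
    window = np.abs(wavelength - center) <= half_width
    total = float(flux[window].sum())
    n_used = int(window.sum())  # how many samples the decision is based on
    # Reject the spectrum when the summed flux falls below min_sum (0 by default).
    return total >= min_sum, total, n_used
```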

Todo

  • Todo - He2 classifier

    • using plots, check that the data is centered around y=0
    • add in a small threshold value, exclude values below the threshold
    • try different widths
    • check how many samples we're actually using
  • Todo - WR classifier

    • Try multiple flux values
    • Look into other data in the fits file that could help train the classifier. What's contained in the other rows?
    • Find other features we can extract from the data. We could fit the overall curve to a blackbody curve, or fit the curve to a Gaussian, then use the fitted parameters as inputs to the classifier.
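
One cheap way to get Gaussian parameters as features, sketched under assumptions: fit a parabola to the logarithm of the positive flux values, which is exact for a noise-free Gaussian (sometimes called Caruana's method). This swaps in a log-parabola fit instead of a full nonlinear least-squares fit; `gaussian_features` and `min_frac` are hypothetical names, and it only handles a single positive peak.

```python
import numpy as np


def gaussian_features(x, y, min_frac=0.1):
    """Estimate (amplitude, center, sigma) of a single positive peak via a log-parabola fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # ln(y) is only defined for y > 0; keep points above min_frac of the peak.
    m = y > min_frac * y.max()
    x0 = x[m].mean()  # center x so the quadratic fit is well conditioned
    c, b, a = np.polyfit(x[m] - x0, np.log(y[m]), 2)  # ln y = a + b*u + c*u**2
    if c >= 0:
        raise ValueError("data is not peaked; a Gaussian does not fit")
    sigma = np.sqrt(-1.0 / (2.0 * c))
    center = x0 - b / (2.0 * c)
    amplitude = np.exp(a - b * b / (4.0 * c))
    return amplitude, center, sigma
```

On noisy spectra, a nonlinear fit (for example SciPy's `curve_fit`) seeded with these estimates would likely be more robust.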

https://github.com/codeforgoodconf/black_holes_backend/blob/master/he2_classifier.py

Contributors

amjcosta, frankamp

black_holes_backend's Issues

Put data somewhere else

Currently, all the data is stored directly in the git repo. As we get more data to train on, this will become too big. The data should be stored somewhere that can be retrieved via a wget command.

  • Consolidate raw data in a single zip
  • Store the data somewhere (George Mason servers? AWS free-tier?)
  • Develop a setup script to pull data from the server and run basic preprocessing.
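
A minimal Python sketch of the download step of such a setup script, standing in for `wget` plus `unzip`. `DATA_URL`, `fetch_data`, and the archive name are placeholders, since no storage location has been chosen yet; preprocessing would run after this step.

```python
import pathlib
import urllib.request
import zipfile

# Placeholder: no storage location has been chosen yet.
DATA_URL = "https://example.com/black_holes_data.zip"


def fetch_data(url: str = DATA_URL, dest_dir: str = "data") -> list[str]:
    """Download the consolidated data zip and unpack it into dest_dir."""
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "raw_data.zip"
    urllib.request.urlretrieve(url, archive)  # the wget step
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        names = zf.namelist()
    archive.unlink()  # keep only the extracted files
    return names
```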

Comments on the procedure

  1. You can get the inverse variance from the same header data unit (HDU) that the flux column comes from: flux_ivar = hdu[1].data['ivar']. The standard deviation of the flux is then flux_err = np.sqrt(1/flux_ivar). Some pixels may contain bad data, in which case ivar may be zero. I usually do:

import numpy as np
good = flux_ivar != 0  # drop pixels with zero inverse variance (bad data)
wav, flux, flux_ivar = wav[good], flux[good], flux_ivar[good]
flux_err = np.sqrt(1 / flux_ivar)

  2. Instead of averaging the endpoint regions, it might be worthwhile to try the median, which is less sensitive to outliers (like emission lines), or a weighted average using the inverse variance above.
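
A sketch of both alternatives for estimating an endpoint's flux level, assuming `wav`, `flux`, and `flux_ivar` arrays like those in the comment above. `endpoint_flux` and its `method` values are hypothetical names; `"mean"` corresponds to the current procedure.

```python
import numpy as np


def endpoint_flux(wavelength, flux, flux_ivar, center, half_width=50.0, method="median"):
    """Estimate the continuum level within center ± half_width for the slope-removal step."""
    wavelength = np.asarray(wavelength, dtype=float)
    flux = np.asarray(flux, dtype=float)
    flux_ivar = np.asarray(flux_ivar, dtype=float)
    # Use only pixels in the window that have a valid (nonzero) inverse variance.
    m = (np.abs(wavelength - center) <= half_width) & (flux_ivar > 0)
    if method == "median":
        return float(np.median(flux[m]))  # robust to a stray emission line
    if method == "weighted":
        # Inverse-variance weighted mean: noisy pixels count for less.
        return float(np.average(flux[m], weights=flux_ivar[m]))
    if method == "mean":
        return float(flux[m].mean())  # plain average, as in the current procedure
    raise ValueError(f"unknown method: {method}")
```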

For the Todos:

I'm not sure anything else in the FITS file, besides the redshift and the variance, will help train the classifier. Most of the other information is instrumental stuff.

Other features we can extract from the data are the strong line fluxes (H-beta, [OIII]5007, H-alpha, [NII]6584, and the [SII] doublet). These lines may be correlated with the presence or absence of WR activity. Maybe an ANN can pick out such correlations?
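
A sketch of how the strong-line fluxes could be turned into features, assuming rest-frame wavelengths and a flux with the continuum already subtracted. The window centers are the nominal wavelengths from the line names and the half-widths are hypothetical. SDSS spectra (which the `loglam`/`ivar` columns suggest) use vacuum wavelengths, roughly 1-2 Å longer than these nominal values, so the windows are assumed wide enough to absorb the offset.

```python
import numpy as np

# Nominal rest wavelengths (Å) and hypothetical window half-widths (Å).
STRONG_LINES = {
    "H-beta": (4861.0, 10.0),
    "[OIII]5007": (5007.0, 10.0),
    "H-alpha": (6563.0, 10.0),
    "[NII]6584": (6584.0, 10.0),
    "[SII] doublet": (6724.0, 20.0),  # one window spanning both [SII] lines
}


def line_fluxes(rest_wavelength, flux):
    """Integrate continuum-subtracted flux over each line window (rectangle rule)."""
    rest_wavelength = np.asarray(rest_wavelength, dtype=float)
    flux = np.asarray(flux, dtype=float)
    dlam = np.gradient(rest_wavelength)  # width of each pixel in Å
    features = {}
    for name, (center, half_width) in STRONG_LINES.items():
        window = np.abs(rest_wavelength - center) <= half_width
        features[name] = float(np.sum(flux[window] * dlam[window]))
    return features
```

H-alpha and [NII]6584 sit only about 21 Å apart, so broad H-alpha wings can leak into the [NII] window; fitting the lines jointly would avoid that.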
