earthPT

astroPT: a Large Observation Model for astronomy 🔭

Welcome to our simple repository for training astronomical large observation models. This repository began life as Andrej Karpathy's nanoGPT and has been adapted to work with imagery data. In train.py you will find a ~300-line boilerplate training loop, and in model.py you will find a ~300-line GPT model definition with an MLP tokeniser and a regression loss.
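As a rough illustration of what a regression-style loss means for a GPT-like model on patch sequences, the sketch below pairs each patch with the one that follows it and scores a prediction against that target. The function names are hypothetical and are not taken from model.py, and mean squared error is assumed here purely for illustration; the repository may use a different regression loss.

```python
def next_patch_pairs(sequence):
    """Pair every patch with the patch that follows it: (inputs, targets)."""
    return sequence[:-1], sequence[1:]


def mse_loss(predictions, targets):
    """Mean squared error over all pixel values (assumed loss, see lead-in)."""
    total, count = 0.0, 0
    for pred, target in zip(predictions, targets):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy sequence of three flattened "patches" with two values each (made up).
toy_sequence = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
inputs, targets = next_patch_pairs(toy_sequence)

# Stand-in "model": predict that the next patch equals the current patch.
copy_baseline_loss = mse_loss(inputs, targets)
```

A trained model would replace the copy baseline with its own predictions; the pairing of inputs and shifted targets stays the same.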

Check out the UniverseTBD Discord for updates: https://discord.gg/MNEVegvfJq

install

Dependencies:

  • pip install -r requirements.txt

results

astroPT has been trained on 8.6M galaxy grz band *.png postage stamps downloaded from DESI-LS DR8 to see if neural scaling laws apply to galaxian data (in other words, to see if more galaxy data == more better model).
We tried to make the astroPT model as simple as possible so that other modalities can be easily folded in. We also chose a causally trained autoregressive transformer as our backbone so that our work can more easily integrate with the wider deep learning FOSS community.

Our pretraining task is to feed in galaxy images patch by patch and predict the next patch in the sequence. We follow ViT and define a patch as a 16 by 16 pixel square, and feed the galaxy patches in spiral order:

[Figure: galaxy image patches fed in spiral order]
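The patching and spiral ordering described above can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical, the image is a single-channel list of rows, and the spiral is assumed to start at the central patch and wind outwards clockwise, which may not match the repository's exact start point or direction.

```python
def spiral_order(rows, cols):
    """Return (row, col) grid coordinates in an outward spiral.

    Assumption: the walk starts at the central patch, heads right first,
    and turns clockwise in image coordinates.
    """
    r, c = (rows - 1) // 2, (cols - 1) // 2
    dr, dc = 0, 1
    step = 1
    order = []
    while len(order) < rows * cols:
        for _ in range(2):  # each step length is used for two legs
            for _ in range(step):
                if 0 <= r < rows and 0 <= c < cols:
                    order.append((r, c))
                r, c = r + dr, c + dc
            dr, dc = dc, -dr  # right -> down -> left -> up
        step += 1
    return order


def patchify(image, patch=16):
    """Split a 2D image (list of rows) into a grid of flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    grid = []
    for top in range(0, height, patch):
        grid.append([
            [image[y][x]
             for y in range(top, top + patch)
             for x in range(left, left + patch)]
            for left in range(0, width, patch)
        ])
    return grid


def spiral_patch_sequence(image, patch=16):
    """Flattened patches of `image`, listed in spiral order."""
    grid = patchify(image, patch)
    return [grid[r][c] for r, c in spiral_order(len(grid), len(grid[0]))]
```

For a 48 by 48 pixel image this yields nine patches of 256 values each, beginning with the central patch.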

The trained model results are promising -- below we show our full training run validation losses across a parameter sweep of {1,5,12,21,89,309,830,2100}M trainable parameters:

[Figure: validation losses across the parameter sweep]
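One common way to summarise a sweep like this is to fit a power law, loss ≈ a · N^(-alpha), to validation loss against parameter count N. The sketch below assumes that functional form and uses SYNTHETIC loss values generated from a made-up power law, purely to show the fitting step; only the model sizes come from the sweep above, and none of the numbers are astroPT results.

```python
import math


def fit_power_law(param_counts, losses):
    """Least-squares fit of log(loss) = log(a) - alpha * log(N).

    Returns (a, alpha) for the assumed form loss = a * N ** (-alpha).
    """
    xs = [math.log(n) for n in param_counts]
    ys = [math.log(loss) for loss in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Model sizes from the sweep above, in trainable parameters.
param_counts = [1e6, 5e6, 12e6, 21e6, 89e6, 309e6, 830e6, 2100e6]

# SYNTHETIC placeholder losses from an invented power law (not real results).
synthetic_losses = [2.0 * n ** -0.05 for n in param_counts]

a, alpha = fit_power_law(param_counts, synthetic_losses)
```

With real validation losses in place of the synthetic ones, a good straight-line fit in log-log space would be one sign that a scaling law holds over the swept range.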

We also test our astroPT models on some scientifically useful downstream tasks by taking the models' penultimate layer outputs and finetuning linear probes to predict emergent physical properties of the galaxies:

[Figure: linear probe performance on downstream tasks]

In the above figure, $M_g$ and $M_z$ are the absolute magnitudes (or brightness at a fixed distance) of the galaxies, $g - r$ and $r - z$ are colours (the differences between magnitudes observed in different telescope filter bands), redshift is a proxy for the distance to the galaxies, sSFR is the mass of new stars born each year in the galaxies per unit stellar mass, and $M_{*}$ is the total mass of stars within the galaxies. "smooth?", "disc?", "artefact?", "edge on?" and "tight spiral?" are morphological properties of the galaxies as described by citizen scientists.
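For reference, the standard textbook forms of these quantities are given below. Here $m$ is the apparent magnitude in a band, $d$ is the luminosity distance, and SFR is the star formation rate; these symbols are introduced only for this illustration. K-corrections and dust extinction are ignored, and the exact corrections applied to the catalogue values are not stated here.

$$
M = m - 5 \log_{10}\!\left(\frac{d}{10\,\mathrm{pc}}\right), \qquad
g - r = m_g - m_r, \qquad
\mathrm{sSFR} = \frac{\mathrm{SFR}}{M_{*}}
$$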

The cool thing to take away from these plots is that the surrogate task loss (predicting the next patch in a sequence of ViT-like galaxy image patches) is correlated with astronomically "useful" downstream tasks 🤯🚀.
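A linear probe of the kind mentioned above is just a linear map fitted on top of frozen embeddings. The sketch below fits one by gradient descent on mean squared error in plain Python. The function name, the two-dimensional toy "embeddings" and the target values are all hypothetical; in practice the inputs would be the model's penultimate layer outputs and the targets a galaxy property such as redshift.

```python
def fit_linear_probe(embeddings, targets, lr=0.1, steps=2000):
    """Fit weights and a bias on frozen embeddings by gradient descent (MSE)."""
    dim = len(embeddings[0])
    n = len(embeddings)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for emb, target in zip(embeddings, targets):
            error = sum(w * e for w, e in zip(weights, emb)) + bias - target
            for i, e in enumerate(emb):
                grad_w[i] += 2.0 * error * e / n
            grad_b += 2.0 * error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def probe_predict(weights, bias, embedding):
    """Apply the fitted probe to a single embedding."""
    return sum(w * e for w, e in zip(weights, embedding)) + bias


# Hypothetical toy data: targets follow 2*e0 - 1*e1 + 0.5 exactly.
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_targets = [0.5, 2.5, -0.5, 1.5]

probe_weights, probe_bias = fit_linear_probe(toy_embeddings, toy_targets)
```

Because only the probe's weights and bias are trained, how well it predicts a property reflects what is already linearly readable from the embeddings.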

Finally, check out our UMAP projection of astroPT-87M's penultimate layer outputs of our validation set. We colour each point with an emergent physical galaxy property described above. The structure suggests that the model has learnt some knowledge about physics simply from our next-token prediction pretraining task!

[Figure: hexbin UMAP projection coloured by galaxy properties]

pretrained weights, and full galaxy dataset

Check out the paper here: https://arxiv.org/abs/2405.14930.

We of course release all our model weights checkpointed across our full training runs on HuggingFace 🤗 here.

We also release our full dataset and galaxy metadata on HuggingFace 🔥.

Contributors

dependabot[bot], rj-roberts, smith42
