ds4cg2024-inaturalist's Introduction

Data Science for the Common Good 2024

iNaturalist GeoModel Annotator Project

Code repository for 2024 Data Science for the Common Good project with iNaturalist.

Collaborators: Angela Zhu, Paula Navarrete, Sergei Pogorelov, Ozel Yilmazel

Spatial Implicit Neural Representations for Global-Scale Species Mapping - ICML 2023

This code enables the recreation of the results from our ICML 2023 paper Spatial Implicit Neural Representations for Global-Scale Species Mapping.

Overview

Estimating the geographical range of a species from sparse observations is a challenging and important geospatial prediction problem. Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location. In this work, we use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of thousands of species simultaneously. SINRs scale gracefully, making better predictions as we increase the number of training species and the amount of training data per species. We introduce four new range estimation and spatial representation learning benchmarks, and we use them to demonstrate that noisy and biased crowdsourced data can be combined with implicit neural representations to approximate expert-developed range maps for many species.

Getting Started

๐Ÿฅ Installation for local development

  1. Clone the repository: git clone git@github.com:UMassCDS/ds4cg2024-inaturalist.git

  2. For local development and testing, you can choose between two database engines, PostgreSQL or SQLite. The most important thing is to ensure that DATABASE_URL is configured appropriately for your database, following the SQLAlchemy Database Engine docs. To help with database setup, we've provided example environment files, .env.copy and .docker.env.copy, which list the environment variables. Copy them to .env or .docker.env and fill in the values.

    a. PostgreSQL: Run Postgres in a local Postgres server or Docker container (see provided docker-compose.yml). You should configure the following environment variables in your environment file (.env or .docker.env).

    • POSTGRES_DB: Name of your database
    • POSTGRES_USER: Username the server will use to connect to the database
    • POSTGRES_PASSWORD: Password the server will use to connect to the database
    • DATABASE_URL: SQLAlchemy database connection URL. This should be something like postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:PORT_NUMBER/${POSTGRES_DB}, but

    b. SQLite: Storing the data in a single SQLite database file is useful for development and testing, but shouldn't be used in production. You only need to configure DATABASE_URL=sqlite:///<path to desired database file> (SQLAlchemy expects three slashes before a relative path).

  3. Navigate to the cloned project's root, run git submodule init, and then run git submodule update --remote --merge

  4. You are all set for setting up the code.
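The DATABASE_URL from step 2 can be assembled from the other PostgreSQL variables. The following is a minimal sketch using only the standard library, not code from this repository. The helper name build_database_url and the example values are hypothetical. It percent-encodes the username and password, since special characters in them would otherwise break URL parsing.

```python
from urllib.parse import quote


def build_database_url(env, host, port):
    """Return a SQLAlchemy-style PostgreSQL URL built from POSTGRES_* variables.

    `env` is any mapping (for example os.environ). `host` and `port` are
    passed in by the caller because they differ between local and Docker runs.
    This helper is illustrative and is not part of the repository.
    """
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    database = env["POSTGRES_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Hypothetical values, for illustration only.
example_env = {
    "POSTGRES_DB": "inatator",
    "POSTGRES_USER": "annotator",
    "POSTGRES_PASSWORD": "p@ss word",
}
print(build_database_url(example_env, host="localhost", port=5432))
```

For .docker.env, the same call with host="db" matches the host shown in the URL template above; 5432 is PostgreSQL's default port and may differ in your setup.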

Note: We need .env for local development and .docker.env for Docker containers. The difference between them is the database URL, which is caused by how Docker manages networks.

Note: to update submodules with latest changes, run git submodule update --remote --merge

Note: src/backend/sinr is a submodule from UMassCDS/inaturalist-sinr. If you need to work on the SINR code, follow the development practices for that repository, including making a dedicated development branch and opening PRs.

Note: If you need to switch to a particular UMassCDS/inaturalist-sinr branch and run the prototype, navigate to src/backend/sinr and use git checkout <branch-name> to switch to that branch. The submodule will then be on a different branch.

Downloading the pretrained models

If you want to run the app locally for development purposes, download the pretrained models from here, unzip them, and place them in a folder located at /src/backend/sinr/pretrained_models.
If you only run the app in Docker, there's no need to download the models; Docker will handle this for you inside the image.

Installing the app

  1. We recommend using an isolated Python environment to avoid dependency issues. Install the Anaconda Python 3.9 distribution for your operating system from here.

  2. Create a new environment and activate it:

 conda create -y --name inatator python==3.9
 conda activate inatator
  3. After activating the environment, install the required packages:
pip install -r src/backend/requirements.txt && pip install -r src/backend/requirements-dev.txt

Note: If you get errors for psycopg2-binary, installing PostgreSQL can solve it. See PostgreSQL documentation for installation instructions and repeat step 3 after installing.

  4. Install the JavaScript libraries needed for React:
npm i --prefix src/frontend/

๐Ÿง Running the iNatAtor Application

Run database in first terminal:

  1. Navigate to project root
  2. Launch postgres container:
docker compose up --build db

Run backend in second terminal:

  1. Navigate to the main ds4cg2024-inaturalist directory if you are not already there.
  2. Activate environment:
  conda activate inatator
  3. Launch the backend:
  uvicorn src.backend.app.main:app --reload --env-file .env

Run frontend in third terminal:

  1. Navigate to the main ds4cg2024-inaturalist directory if you are not already there.
  2. Launch the frontend:
  npm start --prefix src/frontend/

In your web browser, open the link http://localhost:3000/

Running applications with Docker

  1. Install Docker if you haven't already
  2. Open Docker Desktop. You cannot run containers or build images if the Docker engine is not running.
  3. In a terminal, navigate to the project root.
  4. Run docker compose up --build. The first build may take a while. After the build, the application starts and you can access it at localhost:3000.
  5. You can stop containers with ctrl+c or using the Docker app

Common Docker commands:

  • If you want to start the application again, run docker compose up
  • If you want to just build images, run docker compose build
  • If you want to build and run containers, run docker compose up --build
  • If you want to build only one service, run docker compose build <service-name>, for example docker compose build backend

Note: You don't have to initialize the submodule to run Docker; the Dockerfile sets up the submodules for you while building the image.

Working with database

Sometimes you want to run the application without containers, allowing you to develop things quickly. The Running the iNatAtor Application section explains how to run the application locally.

You can verify connections are working by going to localhost:8000/health.
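The health check can also be scripted. The sketch below uses only the standard library. It assumes that the backend answers /health with HTTP 200 when everything is working; the source does not specify the response, and the function name is_healthy is hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(url="http://localhost:8000/health", timeout=5.0):
    """Return True if the URL answers with HTTP 200, False otherwise.

    Assumption: a healthy backend replies with status 200 on this route.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("backend reachable:", is_healthy())
```

Run it after starting the database and backend. A False result usually means the backend is not running or is listening on a different port.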

Note: As the codebase grows, add database-related code in src/backend/app/db, then add the corresponding API routes in main. If main gets too big, we can switch to FastAPI's API routers.

Note: There are two environment files (.env and .docker.env) because the database URLs for local development and Docker environments differ.

Make sure you always update your local branch to the latest.

Code Standards

  1. Use docstrings. A one-liner is fine for simple functions, but more complicated functions need multi-line documentation that explains the function simply and describes its arguments and output.
  2. Write module docstrings that include a short description of the module and the functions inside it.
  3. Use a formatter if possible. We have included ruff for linting and formatting in the developer dependencies in requirements-dev.txt.
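As an illustration of these standards, here is a small hypothetical module, not code from this repository. It shows a module docstring, a multi-line function docstring, and a one-liner. The function names and the default threshold are chosen for the example only.

```python
"""Utilities for turning model scores into presence/absence values.

Functions:
    threshold_predictions: Mark each score in a sequence as present or absent.
    is_present: Check a single score against the threshold.
"""


def threshold_predictions(scores, threshold=0.1):
    """Convert prediction scores to booleans.

    Args:
        scores: Iterable of floats, one per map cell.
        threshold: Scores at or above this value count as "present".

    Returns:
        A list of booleans in the same order as `scores`.
    """
    return [score >= threshold for score in scores]


def is_present(score, threshold=0.1):
    """Return True if a single score meets the threshold."""
    return score >= threshold
```

The argument and return sections follow one common docstring layout; any consistent layout that covers arguments and output satisfies the standard above.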

Acknowledgements

This project was enabled by data from the Cornell Lab of Ornithology, The International Union for the Conservation of Nature, iNaturalist, NASA, USGS, JAXA, CIESIN, and UC Merced. We are especially indebted to the iNaturalist and eBird communities for their data collection efforts. We also thank Matt Stimas-Mackey and Sam Heinrich for their help with data curation. This project was funded by the Climate Change AI Innovation Grants program, hosted by Climate Change AI with the support of the Quadrature Climate Foundation, Schmidt Futures, and the Canada Hub of Future Earth. This work was also supported by the Caltech Resnick Sustainability Institute and an NSF Graduate Research Fellowship (grant number DGE1745301).

If you find our work useful in your research please consider citing our paper.

@inproceedings{SINR_icml23,
  title     = {{Spatial Implicit Neural Representations for Global-Scale Species Mapping}},
  author    = {Cole, Elijah and Van Horn, Grant and Lange, Christian and Shepard, Alexander and Leary, Patrick and Perona, Pietro and Loarie, Scott and Mac Aodha, Oisin},
  booktitle = {ICML},
  year = {2023}
}

Disclaimer

Extreme care should be taken before making any decisions based on the outputs of models presented here. Our goal in this work is to demonstrate the promise of large-scale representation learning for species range estimation, not to provide definitive range maps. Our models are trained on biased data and have not been calibrated or validated beyond the experiments illustrated in the paper.

ds4cg2024-inaturalist's Issues

Remove copied SINR code from the repository

As a developer, I need our application to be easy to maintain, so that I have time to work on new features and not just repository maintenance. Having copied code from a different repository makes a project much more difficult to maintain for these reasons:

  • It makes reviewing code and comparing code difficult, because it is difficult to tell which changes are from which repository.
  • It creates additional maintenance overhead. Every time the code in the copied repo is changed on git, I need to first find out about it, then recopy the code to my repository.

There are two options here:
Option 1) Add the SINR code as a git submodule (see https://www.freecodecamp.org/news/how-to-use-git-submodules/)
Option 2) Convert the SINR code to a proper python package that can be installed via the requirements.txt file (note that it doesn't need to be pushed to pypi, see https://pip.pypa.io/en/stable/topics/vcs-support/)

Easy UI fixes

If there are UI updates we want that take less than an hour (e.g., changing button names or descriptions), list them here so we have a record.

Create separate branch and PR for each fix so that it's easy to review.

Create shareable Docker image of iNatAtor

UPDATED:

  • There are images of React and FastAPI (database?)
  • There is a Docker Compose file, which connects all images and runs all containers - docker-compose.yml
  • The images are published on Docker Hub - https://hub.docker.com/repository/docker/oyilmazel/ds4cg-inaturalist/general
  • There are instructions: - in the readme
    • How to build images - docker-compose build
    • How to update images - docker-compose build
    • How to run the Docker Compose file docker-compose up, docker-compose up --build

  • There's a Dockerfile in the repository that builds an image
  • Instructions in README on how to create an image that we can share

Stretch: Host image on Azure or AWS

Add loading status in application

App has long loading times, and there are no indicators whether it crashed or what is going on. To show users that everything is working well, or something has crashed, we should add status bars in the application that show if the action they just did is successful or not.

I really started to feel the need for this as we began integrating the database into the application. There should be UI changes that signal what is going on in the background. Users who have no access to the console or server logs have no idea what is happening, which makes the app a little stressful to use.

User selects one species to display the species' range map for

As an ecologist, I would like to visualize the predicted range map for a species I choose.

Definition of Done:

  • Pick one version of prototype code to use as a starting point on the main branch -> reassigned to Sergei
  • Copy code from SINR repo that's needed (few functions that are needed) -> reassigned to Sergei
  • Add prototype code -> reassigned to Sergei
  • API loads prediction and sends them to front end
  • Front end renders the species' range on the map
  • User searches for species name in drop down
  • Prediction threshold is hard coded at 0.1
  • Hex resolution is set to H5 for visualization
  • Update Readme

Deploy iNatator on Microsoft Azure on an externally accessible endpoint

As an annotator, I want to be able to try out the tool, so that I can give feedback to the development team.
Anyone with a link to the web tool can try out the annotation tool.

Definition of Done:

  • Sam and Grant can access a VM in the cloud where our annotation tool is running
  • No domain name needed, an IP address is fine, so long as it is publicly accessible
  • This is just to try out and play with, so taking shortcuts (e.g. security) is ok
  • Virginia suggests trying Azure Container Instances (we used these for Red Cross last summer), but there will be some research and trial & error involved.
  • Works for max 5 people at a time.

Github workflow to push Docker containers to CDS DockerHub for tagged releases

As a developer, I would like Docker containers to be built for official releases using CICD workflows on Github so that I can more easily share and deploy the application.

This will involve both gathering team input to decide when and how releases should be created, as well as setting up Github workflows and documentation.

Definition of Done:

Annotators can individually change predictions in hexes at resolution level 5 and save them

As an ecologist, I need to be able to select a single hex and change its value for whether the species is predicted there or not.

  • Cells where the model predicts the species to be are initially colored in
  • Clicking a hex flips whether the species range is there or not
  • There is a clear visual distinction between hexes that are predicted within the species range and those that are not.
  • The user will click a "Save" button that exports JSON (later on, this button will populate the changes in the database). No changes are saved until they click that button.

Definition of done:

  • Hexagon grid at resolution 5 is displayed in the map as a base layer
  • OPTIONAL: After certain zoom, grid is no longer displayed
  • When getting prediction, the annotation layer is created, showing in a different color (green) the corresponding res-5-hexagon. This will be the annotation starting point (if the user submits annotation without changes, then their annotation coincides with the model prediction)
  • User can click on hexagons and this action changes colors of the hexagons (select and unselect)
  • We are keeping track somehow of selected hexagons so they can be stored in a json file by the end of the annotation.
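The select/unselect behavior and the JSON export described above can be modeled with a set of hexagon identifiers. This is a sketch of the data logic only, written in Python for brevity. The function names, the placeholder hexagon IDs, and the JSON payload shape are all assumptions; the actual frontend may track state differently.

```python
import json


def toggle_hex(selected, hex_id):
    """Flip one hexagon: drop it if already selected, add it otherwise."""
    updated = set(selected)
    if hex_id in updated:
        updated.remove(hex_id)
    else:
        updated.add(hex_id)
    return updated


def export_annotation(species_id, selected):
    """Serialize the current selection as a JSON string (hypothetical shape)."""
    return json.dumps({"species_id": species_id, "hexes": sorted(selected)})


# The model prediction is the annotation starting point.
predicted = {"hex_a", "hex_b"}
annotation = toggle_hex(predicted, "hex_b")   # user unselects a predicted cell
annotation = toggle_hex(annotation, "hex_c")  # user selects a new cell
print(export_annotation(17, annotation))
```

If the user saves without clicking anything, the exported set equals the prediction, which matches the starting-point behavior described in the list.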

Add a step to the backend Dockerfile to download pre-trained models

As a developer, I would like the automated deployment process to include the step for downloading and unzipping pre-trained models, in order to reduce the number of setup steps where things can go wrong.

Definition of done:

  • Currently in order to deploy the application, the developer has to download the pretrained models locally and then run the docker commands. After this ticket is done, there is no need for someone to manually download the models if they just want to run the service in docker containers.
  • There is a step in the docker container that will download and unzip the appropriate pre-trained models (using a command like wget or curl)

Push code to repository with instructions on how to set up the development environment

Goal: Create a repository for annotation tool that other developers can use to add features or work on

Definition of Done:

  • All code related to the app/annotation tool should be in its own repo (ds4cg2024-inaturalist repo)
  • README includes installation instructions
  • React packaging requirements
  • Set code standards for library and include them in documentation
  • Have meeting with team to do a code review
  • Pick one version of prototype code to use as a starting point on the main branch -> reassigned to Sergei
  • Copy code from SINR repo that's needed (few functions that are needed) -> reassigned to Sergei
  • Add prototype code -> reassigned to Sergei

Users can select a large region at once to annotate

As an ecologist, I want to be able to change the predicted species range in multiple adjacent cells/hexes at once in order to save time and make annotating less tedious.

Definition of done:

  • There's a bulk select "mode" that the user can enter in order to select multiple cells at once
  • The cells are all changed to the same value (even if they had different values at the start)
  • The user will choose whether the species should be predicted there (filled in) or not (all cells left empty)
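Bulk selection differs from a per-cell toggle: every cell in the region ends up with the same value regardless of where it started. A minimal sketch of that rule follows, using a set of selected cell IDs. The function name bulk_set and the cell IDs are hypothetical.

```python
def bulk_set(selected, region, present):
    """Set every cell in `region` to the same value.

    `selected` holds the cells currently marked as within the range.
    If `present` is True the whole region is filled in; otherwise the
    whole region is cleared. Cells outside the region are untouched.
    """
    current = set(selected)
    region = set(region)
    if present:
        return current | region
    return current - region


selected = {"a", "b"}
print(sorted(bulk_set(selected, ["b", "c"], present=True)))
print(sorted(bulk_set(selected, ["b", "c"], present=False)))
```

Using set union and difference, rather than toggling each cell, is what guarantees a uniform result for mixed regions.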
