speciesdistributiontoolkit.jl's Introduction

SpeciesDistributionToolkit

πŸ—ΊοΈ SpeciesDistributionToolkit.jl is a collection of Julia packages forming a toolkit meant to deal with (surprise!) species distribution data. Specifically, the goal of these packages put together is to provide a consistent way to handle occurrence data, put them on a map, and make it interact with environmental information.

GitHub Release DOC Static Badge

Important

This package is not intended to perform any actual modeling, but can serve as a robust basis for such models.

Current component packages

The packages do work independently, but they are designed to work together. In particular, when installing SpeciesDistributionToolkit, you get access to all the functions and types exported by the component packages. This is the recommended way to interact with the packages.

Note

The badges will not pick up old releases of the component packages, and so they will show "no matching release found" until a new release is done. The packages still work.

Getting occurrence data: GBIF.jl

A wrapper around the GBIF API, to retrieve taxa and occurrence datasets, and perform filtering on these occurrence data based on flags.

GitHub Release DOC

Getting environmental data: SimpleSDMDatasets.jl

An efficient way to download and store environmental raster data for consumption by other packages.

GitHub Release DOC

Using environmental data: SimpleSDMLayers.jl

A series of types and common operations on raster data.

GitHub Release DOC

Simulating occurrence data: Fauxcurrences.jl

A package to simulate realistic species occurrence data from a known series of occurrences, with additional statistical constraints.

GitHub Release DOC

Getting organism silhouettes: Phylopic.jl

A wrapper around the Phylopic API.

GitHub Release DOC

Want to help?

πŸ§‘β€πŸ’» To get a sense of the next steps and help with the development, see the issues/bugs tracker.

🤓 From a technical point of view, this repository is a monorepo consisting of several related packages to work with species distribution data. These packages were formerly independent and tied together with moxie and Require, which was less than ideal. All the packages forming the toolkit share a version number (which was set based on the version number of the eldest package, SimpleSDMLayers), and the toolkit itself has its own version number.

speciesdistributiontoolkit.jl's People

Contributors

danielskatz, gabrieldansereau, github-actions[bot], gottacatchenall, michielstock, mkborregaard, rafaqz, spaette, tpoisot

speciesdistributiontoolkit.jl's Issues

Avoid using stride when clipping

clip currently uses the stride, which might return approximation errors as in #144.

@tpoisot had an idea to avoid using stride completely. Possibly using indices instead of coordinates?
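One possible reading of the index-based idea is sketched below. It is not the package's implementation: the helper names (`edge_index`, `edge_coordinate`, `snap`) are hypothetical, and the grid is assumed to be regular and fully described by its lower edge, its total extent, and its number of cells. The requested coordinate is first converted to an integer edge index, and the edge coordinate is then recomputed from the grid definition, so the stride is never accumulated.

```julia
# Hypothetical helpers (not part of SimpleSDMLayers): a regular grid is
# described by its lower edge `origin`, its total extent `span`, and its
# number of cells `n`.

# Index (0-based) of the cell edge closest to the coordinate `x`.
edge_index(x, origin, span, n) = round(Int, (x - origin) * n / span)

# Coordinate of the edge with index `i`, recomputed from the grid definition.
edge_coordinate(i, origin, span, n) = origin + (i * span) / n

# Snap a requested coordinate to the nearest cell edge.
snap(x, origin, span, n) =
    edge_coordinate(edge_index(x, origin, span, n), origin, span, n)

# Example grid: latitudes from 40 to 89 degrees, in cells of 1/6 of a degree.
origin, span, n = 40.0, 49.0, 294
snapped = [snap(lat, origin, span, n) for lat in 41.0:1.0:88.0]
```

With this arithmetic, whole-degree coordinates that fall on a cell edge are returned unchanged, because the multiplication and division involved are exact for these values.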

CHELSA2 has no mask for land masses

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

IO not exported?

Am I missing a new change to the API or are the files in ./src/io not exported? Is there another way to read/write from disk?

Add layer description to get human-readable description of layer names

Add a layerdescriptions function to SimpleSDMDatasets, which by default would return some sort of mapping between a layer index, its code, and a human-readable description. This function can be exported, and the fallback would be that the layer name is its own description (this is true for everything except bioclim, for the moment).
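A minimal sketch of how such a fallback could be written with multiple dispatch follows. The types `DemoDataset`, `DemoBioClim`, and `DemoElevation`, as well as the `layernames` helper, are hypothetical stand-ins and not the types exported by SimpleSDMDatasets. A plain `Dict` from layer code to description is assumed here for brevity.

```julia
# Hypothetical stand-ins for dataset types.
abstract type DemoDataset end
struct DemoBioClim <: DemoDataset end
struct DemoElevation <: DemoDataset end

# Layer codes offered by each (hypothetical) dataset.
layernames(::Type{DemoBioClim}) = ["BIO1", "BIO2"]
layernames(::Type{DemoElevation}) = ["elevation"]

# Fallback: every layer name is its own description.
layerdescriptions(::Type{T}) where {T <: DemoDataset} =
    Dict(name => name for name in layernames(T))

# Specialised method for the one dataset whose codes are not self-explanatory.
layerdescriptions(::Type{DemoBioClim}) = Dict(
    "BIO1" => "Annual Mean Temperature",
    "BIO2" => "Mean Diurnal Range",
)
```

Calling `layerdescriptions(DemoElevation)` falls through to the generic method, while `layerdescriptions(DemoBioClim)` picks the more specific one.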

Logo

The revamped meta-package will need a new logo. Presumably something we can make a sticker out of.

Update documentation

Now that the packages have reached their new equilibrium, the "docstrings" section of the documentation needs to be put back in.

Fix SimpleSDMLayers tests

A lot of the tests actually rely on SimpleSDMDatasets, and this dependency needs to be removed. These tests can move to the main repo, and be replaced by mock examples.

Update the model cards

The model cards in the model catalogue are a bit of a mess. It would be a good idea to list the layer names and the years or time intervals, and to present the information in a less overwhelming way.

The WorldClim tests are going to fail a lot

The WorldClim server is less and less reliable, so it would be a good idea to allow this entire test suite to fail, and make a mention of this in the documentation. For the moment, this is not a big issue as none of the examples use the WorldClim data, but this might become one later.
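One way to let a network-dependent test set fail without failing the whole suite is sketched below, using only the `Test` standard library. The `unreliable_download` function is a hypothetical stand-in for a request to the remote server, and `attempt` is a helper introduced for this illustration. This is not how the package's tests are currently organised.

```julia
using Test

# Hypothetical stand-in for a request to an unreliable server.
unreliable_download() = error("server did not respond")

# Run `f` and return its value, or `nothing` if it throws.
function attempt(f)
    try
        return f()
    catch
        return nothing
    end
end

@testset "Remote provider (allowed to fail)" begin
    layer = attempt(unreliable_download)
    if isnothing(layer)
        # Recorded as "Broken" in the summary, which does not fail the suite.
        @test_skip !isnothing(layer)
    else
        @test !isnothing(layer)
    end
end
```

The skipped test still shows up in the test summary, so an unreachable server remains visible without blocking the rest of the suite.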

Stitch returns the wrong bounding box in the end

See the doc example, which gives out

SDM response → 450×1088 grid with 489600 Int64-valued cells
  Latitudes	43.18333333333334 ⇒ 43.65208333333334
  Longitudes	-80.00000000000001 ⇒ -70.93333333333335

where it should be

SDM response → 450×1088 grid with 489600 UInt8-valued cells
  Latitudes	43.18333333333334 ⇒ 46.93333333333335
  Longitudes	-80.00000000000001 ⇒ -70.93333333333335
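The invariant expected from a stitching operation can be written down independently of the package: the bounding box of the result should be the union of the bounding boxes of the tiles. The sketch below only illustrates that invariant. The tile extents are made-up values stored as named tuples, and `merged_extent` is a hypothetical helper rather than the function used internally.

```julia
# Hypothetical tile extents (made-up values), one named tuple per tile.
tiles = [
    (left = -80.0, right = -75.0, bottom = 43.0, top = 45.0),
    (left = -75.0, right = -70.0, bottom = 43.0, top = 45.0),
    (left = -80.0, right = -75.0, bottom = 45.0, top = 47.0),
    (left = -75.0, right = -70.0, bottom = 45.0, top = 47.0),
]

# The extent of the stitched layer is the union of the tile extents.
function merged_extent(tiles)
    return (
        left = minimum(t.left for t in tiles),
        right = maximum(t.right for t in tiles),
        bottom = minimum(t.bottom for t in tiles),
        top = maximum(t.top for t in tiles),
    )
end

extent = merged_extent(tiles)
```

A regression test could compare the extent of the stitched layer to a value computed this way.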

Wrong coordinates returned when clipping with exact coordinates

@tpoisot I ran into the following issue with clip while trying to subset a layer for Canada into smaller subregions. The boundary returned for some exact coordinates (e.g. 56.0 as a top boundary) sometimes leaves out one pixel row/column. I'm running into this issue with SpeciesDistributionToolkit v0.0.2 in a temporary environment, but I also had it with SimpleSDMLayers v0.8.3.

Here is an example:

using SpeciesDistributionToolkit

spatialrange = (left = -145.0, right = -50.0, bottom = 40.0, top = 89.0);
layer = SimpleSDMPredictor(RasterData(WorldClim2, BioClim); spatialrange...); # also happens with 2.5 arcmin

# Wrong coordinates while clipping
clip(layer; top=56.0).top # 55.833333333333336
clip(layer; top=56.0 - 0.0000001).top # 56.0
clip(layer; top=56.0 + 0.0000001).top # 56.16666666666667

I would expect the first two clip calls to return 56.0 exactly, otherwise it leaves out one cell row. I tried with other exact coordinates and the problem seems to happen with no specific pattern:

julia> exact_lats = (spatialrange.bottom+1.0):1.0:(spatialrange.top - 1.0)
41.0:1.0:88.0

julia> show([clip(layer; top=l).top for l in exact_lats]) # pattern seems random...
[40.833333333333336, 41.833333333333336, 43.0, 43.833333333333336, 45.0, 46.0, 47.0, 47.833333333333336, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 55.833333333333336, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 66.83333333333333, 68.0, 69.0, 70.0, 71.0, 71.83333333333333, 73.0, 74.0, 75.0, 76.0, 76.83333333333333, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0]

The problem also happens with the bottom bound and with longitudes. On the other hand, it does not happen with geotiff reading calls.

julia> exact_lats = (spatialrange.bottom+1.0):1.0:(spatialrange.top - 1.0)
41.0:1.0:88.0

julia> exact_lons = (spatialrange.left+1.0):1.0:(spatialrange.right - 1.0)
-144.0:1.0:-51.0

julia> all(isinteger.([clip(layer; top=l).top for l in exact_lats])) # pattern seems random...
false

julia> all(isinteger.([clip(layer; bottom=l).bottom for l in exact_lats])) # only 1 case
false

julia> all(isinteger.([clip(layer; right=l).right for l in exact_lons])) # happening with longitudes too
false

julia> all(isinteger.([clip(layer; left=l).left for l in exact_lons]))
false

julia> all(isinteger.([SimpleSDMPredictor(RasterData(WorldClim2, BioClim); left = -145.0, right = -50.0, bottom = 40.0, top = t).top for t in exact_lats]))
true

I tracked the problem down to the _match_latitude function. It seems to be a rounding issue. I have a fix I'll make in a PR and let you review.

function _match_latitude(layer::T, lat::K; lower::Bool=true) where {T <: SimpleSDMLayer, K <: AbstractFloat}

Link to documentation in the original repositories

@tpoisot The links to the documentation in the original repositories are broken, but I can't fix them since they have been archived. Raising this here since it would be a good thing to fix them until the monorepo is ready.

The links that need to be updated (in the README and the About section):

Reading certain tiffs with `geotiff` is wrong if the CRS is not WGS84

While working on this, GEO-BON/bon-in-a-box-pipelines#10, we realized that some types of missing values throw an error when trying to load

What happened?

Reading layers in with geotiff, we run into several issues with missing/NaN values in geotiffs.

As an example

layerpath = "chelsa_clim_bio11981.tif"
geotiff(SimpleSDMPredictor, layerpath)

fails with the below error where the chelsa tiff is available here

Stacktrace

ERROR: MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(::Union{LinRange, StepRangeLen}) at range.jl:872
  iterate(::Union{LinRange, StepRangeLen}, ::Integer) at range.jl:872
  iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at dict.jl:712
  ...
Stacktrace:
 [1] indexed_iterate(I::Nothing, i::Int64)
   @ Base ./tuple.jl:91
 [2] (::SimpleSDMLayers.var"#53#54"{SimpleSDMPredictor, Int64})(dataset::ArchGDAL.Dataset)
   @ SimpleSDMLayers ~/.julia/packages/SimpleSDMLayers/lYDIT/src/datasets/geotiff.jl:81
 [3] read(f::SimpleSDMLayers.var"#53#54"{SimpleSDMPredictor, Int64}, args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ ArchGDAL ~/.julia/packages/ArchGDAL/zkx2f/src/context.jl:267
 [4] read
   @ ~/.julia/packages/ArchGDAL/zkx2f/src/context.jl:264 [inlined]
 [5] #geotiff#52
   @ ~/.julia/packages/SimpleSDMLayers/lYDIT/src/datasets/geotiff.jl:46 [inlined]
 [6] geotiff (repeats 2 times)
   @ ~/.julia/packages/SimpleSDMLayers/lYDIT/src/datasets/geotiff.jl:32 [inlined]
 [7] top-level scope
   @ ~/Code/Tmp/foo.jl:9

Allow options to ArchGDAL driver when saving Geotiff

Description of the to-do item

support for different drivers when writing geotiff

More information about the item

The current geotiff function for writing files uses the GTiff driver in ArchGDAL by default.

https://github.com/PoisotLab/SimpleSDMLayers.jl/blob/dcc49145c2c844bfe63fff99b96cf86720076512/src/datasets/geotiff.jl#L191

It would be nice to have this either be a keyword argument or something to enable selection of different drivers, e.g. I need to use the "COG" driver to write a cloud-optimized geotiff

MLJ integration

Building the MLJ integration should not be too hard since there is already an interface to tables. What would help is a small wrapper around multiple layers with the same coordinates to build the tables quickly. The materializer will take care of re-projecting everything.
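As a rough illustration of what such a wrapper could look like, the sketch below turns several grids of identical size into a named tuple of vectors, a shape that Tables.jl accepts as a column table. Plain matrices stand in for layers, and `layers_to_columns` is a hypothetical name. Coordinates and missing cells are deliberately left out.

```julia
# Hypothetical helper: each keyword argument is one layer, given as a matrix.
function layers_to_columns(; layers...)
    grids = values(layers)  # NamedTuple of matrices
    isempty(grids) && throw(ArgumentError("at least one layer is required"))
    reference = size(first(grids))
    all(g -> size(g) == reference, grids) ||
        throw(DimensionMismatch("all layers must share the same grid"))
    return map(vec, grids)  # NamedTuple of vectors, one column per layer
end

columns = layers_to_columns(
    temperature = [1.0 2.0; 3.0 4.0],
    precipitation = [5.0 6.0; 7.0 8.0],
)
```

Each column lists the cells in column-major order, so row `i` of every column refers to the same cell.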

Add expand options to clip

Add options like leftexpand = true to clip to specify if the function should include outside pixels when the clipping coordinate is exactly between two pixels
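A sketch of the intended behaviour along one dimension is given below. It is not the package's `clip`: the function name `clip_rows`, the keywords `bottomexpand` and `topexpand` (named by analogy with `leftexpand`), and the grid description are all assumptions. When a bound falls exactly on a cell edge, the matching keyword decides whether the cell just outside that edge is kept.

```julia
# Hypothetical sketch. Rows are numbered 1..n from `origin` upwards, and
# row i covers [origin + (i - 1) * span / n, origin + i * span / n].
function clip_rows(bottom, top; origin, span, n, bottomexpand = false, topexpand = false)
    q = (bottom - origin) * n / span  # position of the bottom bound, in cells
    p = (top - origin) * n / span     # position of the top bound, in cells
    # On an edge, the keyword decides; otherwise keep the cell holding the bound.
    first_row = isinteger(q) ? Int(q) + (bottomexpand ? 0 : 1) : floor(Int, q) + 1
    last_row = isinteger(p) ? Int(p) + (topexpand ? 1 : 0) : ceil(Int, p)
    return max(first_row, 1):min(last_row, n)
end

grid = (origin = 40.0, span = 49.0, n = 294)
inner = clip_rows(50.0, 56.0; grid...)                      # cells strictly inside
expanded = clip_rows(50.0, 56.0; grid..., topexpand = true) # also the row above 56.0
```

Here both bounds sit on cell edges, so `expanded` contains exactly one more row than `inner`.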

downloaders break when imported as part of another module

When building download links, this bit of code

stem = "CHELSA_$(lowercase(model_code))_r1i1p1f1_w5e5_$(scenario_code)_$(var_code)_$(month_code)_$(replace(year_sep, "-"=>"_"))_norm.tif"
can (in some cases) build a misshapen URL because it tries to inject $scenario_code where scenario_code = replace(lowercase(string(S)), "_" => "-"), and S is an SSP struct.

When I import the SDT in a module, e.g. MyModule.jl:

module MyModule
    using SpeciesDistributionToolkit

    function foo()
        SimpleSDMPredictor(
            RasterData(CHELSA2, BioClim),
            Projection(SSP370, GFDL_ESM4);
            layer="BIO1",
        )
    end

    export foo
end

and run

include("MyModule.jl")
MyModule.foo()

this fails because it tries to download a URL with slug 2011-2040/SIMPLESDMDATASETS.GFDL-ESM4/simplesdmdatasets.ssp370/bio/CHELSA_bio1_2011-2040_simplesdmdatasets.gfdl-esm4_simplesdmdatasets.ssp370_V.2.1.tif.
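The slug above suggests that `string(S)` picked up the module name as a prefix. Assuming `S` is a type, one possible way to avoid this is to build the code from `nameof`, which returns the bare name of a type without any module qualification. The sketch below uses a hypothetical `DemoProvider` module and is not the fix adopted in the package.

```julia
module DemoProvider
    # Hypothetical stand-in for a scenario type.
    struct SSP370 end

    # `nameof` gives the unqualified type name, whatever module calls this.
    scenario_code(::Type{T}) where {T} =
        replace(lowercase(string(nameof(T))), "_" => "-")
end

code = DemoProvider.scenario_code(DemoProvider.SSP370)
```

Because the module path never enters the string, the resulting code does not depend on where the function is called from.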

Ring Ouzel example

The example from Zurell (https://damariszurell.github.io/SDM-Intro/) is really well written, and something that could be a good vignette. The spatial extent is also relatively small, and could be a good opportunity to show how polygon clipping works.

This will likely require MLJ integration, so this would work as a "capstone" vignette.

GBIF queries get converted (for some reason)

Using "hasCoordinate" => true as a query will result in this being interpreted as "hasCoordinate" => 1. The fix will probably involve making sure that everything is a string before being passed to HTTP for the request.
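A minimal sketch of the suggested fix is shown below, assuming the query is a vector of pairs. The helper name `stringify_query` is hypothetical, and the second query parameter is only there for illustration. No HTTP call is made.

```julia
# Convert every key and value to a string before the request is built,
# so that `true` is sent as "true" rather than being converted to a number.
stringify_query(query) = [string(k) => string(v) for (k, v) in query]

query = ["hasCoordinate" => true, "limit" => 20]
clean_query = stringify_query(query)
```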

Integrate with Phylopic

Phylopic has a public RESTful API @ http://phylopic.org/api

It would definitely be a good idea to establish a way to get pictures from a GBIF taxonomic name. This should probably turn into a new component package, and one additional issue would be to see how we can turn the phylopic into something that Makie can use.

Vignette on scaling and quantiles

This can be done by essentially replicating the former vignette on the BIOCLIM model - this would mostly show the various ways to do stats, including quantiles, on layers.
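As a starting point for such a vignette, the sketch below computes quantiles and a min-max rescaling on a small grid with a missing cell, using only the `Statistics` standard library. A plain matrix stands in for a layer, and the values are made up. The functions the toolkit itself provides for layers may differ.

```julia
using Statistics

# Made-up grid standing in for a layer, with one cell without data.
grid = Union{Missing, Float64}[1.0 2.0 3.0; 4.0 missing 6.0; 7.0 8.0 9.0]

# Statistics are computed on the cells that have a value.
valid = collect(skipmissing(grid))
quartiles = quantile(valid, [0.25, 0.5, 0.75])

# Rescale every valued cell to [0, 1], leaving missing cells untouched.
lo, hi = extrema(valid)
scaled = map(x -> ismissing(x) ? missing : (x - lo) / (hi - lo), grid)
```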
