ARLAS workshop Argonautica

Argonautica and weather data
- Argonautica data
- Marine weather data
Exploring multiple collections at once
- Indexing and loading data into ARLAS
- Creating a dashboard to explore Argonautica data with ARLAS
The final words

Introduction

The aim of this workshop is to present ARLAS's capabilities for exploring large volumes of geospatial data with an analytic and geo-analytic approach, through a cross-study of loggerhead turtles' movements and climate conditions. It takes a deeper dive into the data presented during the talk at GeoDataDays 2022.

In the first part of the workshop, we present the datasets we will use, as well as the processing that produced the final data explored in this workshop.

In the second part, we explore data on:

Loggerhead turtles' movements from the CNES educational program Argonautica
Marine weather data from the Copernicus Marine Service platform

This example walks you through the steps needed to start exploring data with ARLAS:

Studying the data and making it analytics-ready
Starting the ARLAS stack and referencing the data in ARLAS
Building dashboards with map layers and widgets
Exploring the data

What will you learn?

With this tutorial, you'll be able to:

Start an ARLAS-Exploration stack
Index some Argos and weather data in Elasticsearch
Reference the indexed data in ARLAS
Create a view of ARLAS-wui (a dashboard) to explore the impact of climate conditions on loggerhead turtles' movements using ARLAS-wui-hub and ARLAS-wui-builder

Prepare the workshop

The workshop is designed to run on Linux or macOS.
You will need Docker and docker-compose installed.
You will need curl to send HTTP requests and download the workshop files, and unzip to extract the ARLAS stack archive.

What will you get?

With this workshop, you will discover how to add data to ARLAS, as well as create an ARLAS dashboard to explore this data.

figure 0: Example view of the dashboard created in this workshop

Argonautica and weather data

Argonautica data

Let's explore some animal tracking data.

We built a dataset of 3 loggerhead turtles' movements between 2018 and 2021 from the CNES educational program Argonautica. From the transmitted positions, we derived information on their movements (speed, trail, ...) as well as the climate conditions they encountered, by interpolating those values from the un-merged global weather dataset built as explained in the next section.

This subset is stored in argonautica_data.csv. It contains 8 755 positions described with the following columns:

class: if 'G', the measure is a GPS one; otherwise, it indicates the precision range of an Argos measure
dateHour; lat; lon: temporal and geospatial coordinates of the record
name: name of the loggerhead turtle
type: the animal's species (e.g. Loggerhead turtle)
point_geom: the position in WKT format
timestamp: the UNIX timestamp of the measure. It indicates the temporality of the measure to ARLAS, as set in argonautica_collection.json
trail_id: the unique ID of this record
trail_geom: the WKT line linking this point and the next one
gps_speed: the computed GPS speed (m/s)
gps_bearing: the computed GPS orientation (°)
current_speed; current_angle; SLA; SST; CHL: climate conditions interpolated from the weather dataset
dayOfYear; weekOfYear; month; year: helpers for the representation of periodic temporal data
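The gps_bearing and gps_speed columns are computed from consecutive positions, but the exact formulas used to build the file are not given here. As a sketch only, assuming the standard great-circle initial bearing and the haversine distance on a spherical Earth of radius 6 371 km, the two hypothetical helpers below show how such values can be derived from lon/lat pairs. On the two vertices of the trail_geom in the example row below, the bearing comes out at about 194.8°, close to the file's gps_bearing. A speed would then be the distance divided by the time between fixes, an interval the source does not specify.

```shell
# Hypothetical helpers, NOT the authors' method: spherical-Earth formulas.
# initial_bearing lon1 lat1 lon2 lat2 -> bearing in degrees clockwise from North, in [0, 360)
initial_bearing() {
  awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" 'BEGIN {
    r = atan2(0, -1) / 180                      # degrees -> radians
    p1 = lat1 * r; p2 = lat2 * r; dl = (lon2 - lon1) * r
    y = sin(dl) * cos(p2)
    x = cos(p1) * sin(p2) - sin(p1) * cos(p2) * cos(dl)
    b = atan2(y, x) / r
    if (b < 0) b += 360                         # map (-180, 180] to [0, 360)
    printf "%.2f\n", b
  }'
}

# haversine_m lon1 lat1 lon2 lat2 -> great-circle distance in meters
haversine_m() {
  awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" 'BEGIN {
    r = atan2(0, -1) / 180; R = 6371000         # mean Earth radius (m)
    dp = (lat2 - lat1) * r; dl = (lon2 - lon1) * r
    a = sin(dp / 2) ^ 2 + cos(lat1 * r) * cos(lat2 * r) * sin(dl / 2) ^ 2
    printf "%.1f\n", 2 * R * atan2(sqrt(a), sqrt(1 - a))
  }'
}
```

For instance, `initial_bearing -0.0585 37.7254 -0.0592 37.7233` points south-southwest, as expected for a trail heading down the coast.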

A line of the CSV file, with its header, looks like this:

class	lat	lon	name	type	dateHour	point_geom	timestamp	trail_id	trail_geom	gps_speed	gps_bearing	current_speed	current_angle	SLA	SST	CHL	dayOfYear	weekOfYear	month	year
B	37.7233	-0.0592	Gloria	Loggerhead turtle	2021/05/05 10:14	POINT(-0.0592 37.7233)	1620201690	Gloria#1620200940_1620202440	LINESTRING (-0.0585 37.7254, -0.0592 37.7233)	0.5786710530504017	194.82992333596488	0.0890163012627082	55.2911825117637	0.0357612146212001	18.053640347593728	0.1463965331146449	125	18	5	2021

Marine weather data

The weather dataset combines Copernicus datasets containing variables that may help explain loggerhead turtles' movements. To obtain the weather dataset used in this workshop, we applied the following process:

Retrieve the relevant datasets from the Copernicus platform
Because of the download size limit, download each dataset in pieces and stitch them back together
Complete them with other datasets containing the same variables to reach the maximum possible coverage
Interpolate them onto a common temporal and spatial grid
Merge them together
From the selected Argonautica data, extract the daily bounding boxes that encompass the turtles' movements, with a margin
Add to the dataset the geometry of the grid cell that each weather record represents
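The bounding-box step can be sketched with awk. This is not the authors' pipeline: the helper name daily_bboxes, the whitespace-separated "date lon lat" input format and the default 1° margin are all assumptions made for illustration.

```shell
# Hypothetical helper, NOT the workshop's actual pipeline.
# Reads "date lon lat" lines on stdin and prints, per date, the bounding box
# "date min_lon min_lat max_lon max_lat" enlarged by a margin in degrees (default 1).
daily_bboxes() {
  local margin="${1:-1}"
  awk -v m="$margin" '
    {
      d = $1; lon = $2 + 0; lat = $3 + 0
      if (!(d in minx)) { minx[d] = maxx[d] = lon; miny[d] = maxy[d] = lat }
      if (lon < minx[d]) minx[d] = lon
      if (lon > maxx[d]) maxx[d] = lon
      if (lat < miny[d]) miny[d] = lat
      if (lat > maxy[d]) maxy[d] = lat
    }
    END {
      for (d in minx)
        printf "%s %.3f %.3f %.3f %.3f\n", d, minx[d] - m, miny[d] - m, maxx[d] + m, maxy[d] + m
    }' | sort
}
```

For example, `printf '2021-05-05 -1.5 37.5\n2021-05-05 -0.5 38.25\n' | daily_bboxes 1` prints `2021-05-05 -2.500 36.500 0.500 39.250`.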

This subset is stored in weather_data.csv. It contains 57 199 processed weather records described with the following columns:

date; latitude; longitude: temporal and geospatial coordinates of the record
SLA: Sea Level Anomaly, the difference between the measured sea height and the average global sea height (m)
SST: Sea Surface Temperature (°C)
chl: the measured chlorophyll concentration (mg/m^3)
current_angle: the angle of the average surface currents (°)
current_speed: the speed of the average surface currents (m/s)
timestamp: the UNIX timestamp of the measure. It indicates the temporality of the measure to ARLAS, as set in weather_collection.json
unique_id: the unique ID of this record
dayOfYear; weekOfYear; month_name; year: helpers for the representation of periodic temporal data
point_geom: the position in WKT format
box_geom: the WKT polygon representing the cell of the grid on which the data has been interpolated
location_id: an ID of the form longitude_latitude, shared by all records of the same grid cell, used to cluster measures at the same position in Network analytics layers
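The two identifiers follow simple string patterns: location_id is longitude_latitude and, judging from the example row below, unique_id is latitude_longitude_timestamp. The hypothetical helpers below rebuild them with printf, which can be handy to check a record by hand.

```shell
# Hypothetical helpers; formats taken from the column descriptions and the example row.
# location_id longitude latitude -> "longitude_latitude" (shared by all records of a cell)
location_id() { printf '%s_%s\n' "$1" "$2"; }
# unique_id latitude longitude timestamp -> "latitude_longitude_timestamp" (one per record)
unique_id() { printf '%s_%s_%s\n' "$1" "$2" "$3"; }
```

For the example row, `location_id -2.833 46.167` gives `-2.833_46.167` and `unique_id 46.167 -2.833 1530568800` gives `46.167_-2.833_1530568800`.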

A line of the CSV file, with its header, looks like this:

date	latitude	longitude	SLA	SST	chl	current_angle	current_speed	timestamp	unique_id	dayOfYear	weekOfYear	month_name	year	point_geom	box_geom	location_id
2018-07-03	46.167	-2.833	0.023		0.084	-9.319	0.028	1530568800	46.167_-2.833_1530568800	184	27	July	2018	POINT(-2.833 46.167)	POLYGON((-2.875 46.125,-2.875 46.209,-2.791 46.209,-2.791 46.125,-2.875 46.125))	-2.833_46.167
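The box_geom of this row is a square centred on the record's position: its corners lie 0.042° from the center in both directions, a value we infer from this single row rather than from a documented grid resolution. The hypothetical cell_polygon helper below rebuilds such a closed WKT ring (first vertex repeated at the end) from a center and a half-width, printing coordinates with 3 decimals like the data.

```shell
# Hypothetical helper. cell_polygon lon lat half_width -> closed WKT square ring,
# vertices in the same order as the example box_geom (first vertex repeated at the end).
cell_polygon() {
  awk -v lon="$1" -v lat="$2" -v h="$3" 'BEGIN {
    x0 = lon - h; x1 = lon + h; y0 = lat - h; y1 = lat + h
    printf "POLYGON((%.3f %.3f,%.3f %.3f,%.3f %.3f,%.3f %.3f,%.3f %.3f))\n",
      x0, y0, x0, y1, x1, y1, x1, y0, x0, y0
  }'
}
```

`cell_polygon -2.833 46.167 0.042` reproduces the box_geom shown above.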

Exploring multiple collections at once

Indexing and loading data into ARLAS

Now that we know what the data represents and looks like, we will explore it with ARLAS. First, we need to load the data into ARLAS.

Setup

Before diving into the exploration of the data, we need to retrieve the data, as well as the ARLAS stack itself, by following the steps below.

Create a directory dedicated to this tutorial

mkdir ARLAS-argonautica-workshop
cd ARLAS-argonautica-workshop

Download the turtles' movements and weather data

curl -o weather_data.csv -L "https://raw.githubusercontent.com/gisaia/ARLAS-argonautica-workshop/master/data/weather_data.csv"

curl -o argonautica_data.csv -L "https://raw.githubusercontent.com/gisaia/ARLAS-argonautica-workshop/master/data/argonautica_data.csv"

Check that both data files are downloaded

ls -l weather_data.csv
ls -l argonautica_data.csv

Download the ARLAS-Exploration-stack project and unzip it

curl -L -o develop.zip "https://github.com/gisaia/ARLAS-Exploration-stack/archive/develop.zip" && unzip develop.zip && rm develop.zip

Check that the ARLAS-Exploration-stack-develop stack is downloaded

ls -l ARLAS-Exploration-stack-develop

Now our tutorial environment is set up.

Starting ARLAS Exploration Stack

The ARLAS stack is made of several services that work together to let users explore geospatial data efficiently. To start all of them, run the following script:

./ARLAS-Exploration-stack-develop/start.sh

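Troubleshooting: if the ARLAS stack does not start properly and you are on Linux, try the following command. It raises vm.max_map_count, the maximum number of memory-map areas a process may use, to the minimum value Elasticsearch requires; it needs root privileges.

sudo sysctl -w vm.max_map_count=262144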

Indexing data in Elasticsearch

Once the stack is launched, we use Logstash to index the data in Elasticsearch, on which ARLAS relies.

Argonautica data

Create the argonautica_index index in Elasticsearch with the argonautica.es_mapping.json mapping file:

curl "https://raw.githubusercontent.com/gisaia/ARLAS-argonautica-workshop/master/configs/argonautica.es_mapping.json" |
curl -XPUT "http://localhost:9200/argonautica_index/?pretty" \
    -d @- \
    -H 'Content-Type: application/json'

You can check that the index was successfully created by running the following command:

curl -XGET "http://localhost:9200/argonautica_index/_mapping?pretty"

Index the data from argonautica_data.csv in Elasticsearch.

For that, we use Logstash, a data processing pipeline that ingests data into Elasticsearch. Logstash needs a configuration file (argonautica2es.logstash.conf) describing how to transform the data from the CSV file and index it in Elasticsearch.

curl "https://raw.githubusercontent.com/gisaia/ARLAS-argonautica-workshop/master/configs/argonautica2es.logstash.conf" \
    -o argonautica2es.logstash.conf

We now run Logstash from the docker.elastic.co/logstash/logstash Docker image to apply the data model transformation described in argonautica2es.logstash.conf and index the data in Elasticsearch:

network=$(docker network ls --format '{{.Name}}' | grep 'arlas[-_]exploration')

cat argonautica_data.csv | docker run -e XPACK_MONITORING_ENABLED=false \
    --net "${network}" \
    --env ELASTICSEARCH=elasticsearch:9200  \
    --env INDEXNAME=argonautica_index --rm -i \
    -v "${PWD}/argonautica2es.logstash.conf:/usr/share/logstash/pipeline/logstash.conf" docker.elastic.co/logstash/logstash:7.17.4

Check that 8 755 turtle positions are indexed:

curl -XGET "http://localhost:9200/argonautica_index/_count?pretty"

Weather data

Create the weather_index index in Elasticsearch with the weather.es_mapping.json mapping file:

curl "https://raw.githubusercontent.com/gisaia/ARLAS-argonautica-workshop/master/configs/weather.es_mapping.json" |
curl -XPUT "http://localhost:9200/weather_index/?pretty" \
    -d @- \
    -H 'Content-Type: application/json'

You can check that the index was successfully created by running the following command:

curl -XGET "http://localhost:9200/weather_index/_mapping?pretty"

Index the data from weather_data.csv in Elasticsearch. As before, Logstash needs a configuration file (weather2es.logstash.conf) describing how to transform the data from the CSV file and index it in Elasticsearch.

curl "https://raw.githubusercontent.com/gisaia/ARLAS-argonautica-workshop/master/configs/weather2es.logstash.conf" \
    -o weather2es.logstash.conf

We run Logstash again from the docker.elastic.co/logstash/logstash Docker image, this time with the weather2es.logstash.conf configuration file, to transform the weather data and index it in Elasticsearch:

network=$(docker network ls --format '{{.Name}}' | grep 'arlas[-_]exploration')

cat weather_data.csv | docker run -e XPACK_MONITORING_ENABLED=false \
    --net "${network}" \
    --env ELASTICSEARCH=elasticsearch:9200  \
    --env INDEXNAME=weather_index --rm -i \
    -v "${PWD}/weather2es.logstash.conf:/usr/share/logstash/pipeline/logstash.conf" docker.elastic.co/logstash/logstash:7.17.4

Check that 57 199 weather measures are indexed:

curl -XGET "http://localhost:9200/weather_index/_count?pretty"
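Instead of reading the two _count responses by eye, a small hypothetical helper can compare them with the expected figures. The sketch below assumes Elasticsearch is reachable on localhost:9200, as in the commands above, and extracts the count with sed rather than a JSON parser, which is enough for this one-field check. Elasticsearch refreshes indices periodically (every second by default), so a count taken immediately after indexing can briefly lag behind.

```shell
# Hypothetical helpers, assuming Elasticsearch on localhost:9200 as above.
# es_count: extracts the "count" value from a _count response (pretty or compact JSON).
es_count() { sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p' | head -n 1; }

# check_index index expected_count: compares the live count with the expected one.
check_index() {
  local actual
  actual=$(curl -s "http://localhost:9200/$1/_count" | es_count)
  if [ "$actual" = "$2" ]; then
    echo "$1: OK ($actual documents)"
  else
    echo "$1: expected $2 documents, found ${actual:-no count}" >&2
    return 1
  fi
}

# Usage, once both Logstash runs are finished:
#   check_index argonautica_index 8755 && check_index weather_index 57199
```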

Declaring `argonautica_index` and `weather_index` in ARLAS

ARLAS-server interfaces with data indexed in Elasticsearch via a collection reference.

The collection references an identifier field, a timestamp field and geographical fields, which allow ARLAS-server to perform spatio-temporal data analysis.
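To make the idea of a collection reference concrete, the sketch below prints a minimal collection body. The key names (index_name, id_path, geometry_path, centroid_path, timestamp_path) reflect our understanding of the ARLAS-server collection model and are an assumption; the downloaded argonautica_collection.json and weather_collection.json are the authoritative versions, and the field paths in the usage comment are placeholders.

```shell
# Hypothetical helper. The key names below are OUR ASSUMPTION about the ARLAS-server
# collection model; the downloaded *_collection.json files are the reference.
# collection_json index_name id_path geometry_path centroid_path timestamp_path
collection_json() {
  printf '{\n  "index_name": "%s",\n  "id_path": "%s",\n  "geometry_path": "%s",\n  "centroid_path": "%s",\n  "timestamp_path": "%s"\n}\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Illustrative call only, the field paths are placeholders:
#   collection_json argonautica_index my_id_field my_geometry_field my_centroid_field my_time_field
```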

Create an Argonautica collection in ARLAS

curl "https://raw.githubusercontent.com/gisaia/ARLAS-argonautica-workshop/master/configs/argonautica_collection.json" | \
curl -X PUT \
    --header 'Content-Type: application/json;charset=utf-8' \
    --header 'Accept: application/json' \
    "http://localhost:81/server/collections/argonautica_collection?pretty=true" \
    --data @-

Check that the collection is created, using the ARLAS-server collections/{collection} endpoint:

curl -X GET "http://localhost:81/server/collections/argonautica_collection?pretty=true"

Now we repeat the same process for the weather data.

Create a weather collection in ARLAS

curl "https://raw.githubusercontent.com/gisaia/ARLAS-argonautica-workshop/master/configs/weather_collection.json" | \
curl -X PUT \
    --header 'Content-Type: application/json;charset=utf-8' \
    --header 'Accept: application/json' \
    "http://localhost:81/server/collections/weather_collection?pretty=true" \
    --data @-

Check that the collection is created, using the ARLAS-server collections/{collection} endpoint:

curl -X GET "http://localhost:81/server/collections/weather_collection?pretty=true"

Creating a dashboard to explore `Argonautica data` with ARLAS

The ARLAS stack is up and running, and both the Argonautica and weather data are available for exploration. We can now create our first dashboard, composed of:

a map to observe the turtles' locations, the positions' geographical distribution and the weather data
a timeline presenting the number of positions and weather measures over time
a search bar to look for turtles, for instance by name
widgets to analyse the data along other axes, such as the turtles' speed or the temperature they experience

To do so, let's go to ARLAS-wui-hub at http://localhost:81/hub and create a new dashboard named Argonautica dashboard.

figure 1: Creation of a dashboard in ARLAS-wui-hub

After clicking on Create, you are automatically redirected to ARLAS-wui-builder to start configuring your dashboard.

Choosing the collection

The first thing we need to do is to tell ARLAS which collection of data we want to use to create our dashboard.

figure 2: Choose collection

In our case, we choose argonautica_collection as the main collection; we will add weather_collection later on.

Map configuration

As a first step, we set the map at zoom level 3, with its center at Latitude=15° and Longitude=10°. This way, when the dashboard loads in ARLAS-wui, the map covers the extent of the turtles' positions: from the Atlantic to the Indian Ocean.

figure 3: Map initialisation
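To see why zoom 3 is enough, here is a back-of-the-envelope computation. In Web Mercator maps the world is tile_size × 2^zoom pixels wide, so the visible longitude span is 360° × map_width / (tile_size × 2^zoom). We assume 512-pixel tiles, the convention of Mapbox GL-style libraries; the helper name and the 1920-pixel map width are illustrative.

```shell
# Hypothetical helper. visible_lon_span map_width_px zoom [tile_size_px]
# Longitude span (degrees) of a Web Mercator view: 360 * width / (tile_size * 2^zoom).
# 512 px tiles are ASSUMED by default (Mapbox GL-style libraries); use 256 for raster tile maps.
visible_lon_span() {
  awk -v w="$1" -v z="$2" -v t="${3:-512}" 'BEGIN { printf "%.2f\n", 360 * w / (t * 2 ^ z) }'
}
```

With a 1920-pixel-wide map, `visible_lon_span 1920 3` gives 168.75°, i.e. roughly 74°W to 94°E around the 10° center, which spans the Atlantic to the Indian Ocean.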

For now, the map is empty. The first thing we want to find out is where the turtles are passing by.

figure 4: Layers view

To do so, let's add a Geometric Features layer named Turtle location to visualise the trajectory associated with each turtle on the map.

In the Geometry section (1st section), choose arlas_track_trail as the features geo-field.

figure 5: Adding a Geometric Features layer named 'Turtle location'

Now, let's define the layer's style. First, we choose the representation that best fits our geometries: a turtle's track is a line. We choose a solid line whose color is generated from each turtle's name. The width of the track is interpolated from the time each turtle has spent moving since being tagged, which gives a sense of the direction in which it is moving.

figure 6: Customizing 'Turtle location' style: colored line based on the turtle's name
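ARLAS-wui-builder performs this interpolation for you. As an illustration only, assuming a linear mapping with values clamped to the field's range, a style property such as the line width is computed from a field value as below; the helper name and argument order are ours.

```shell
# Hypothetical helper: linear interpolation with clamping (ASSUMED behaviour).
# interpolate value field_min field_max out_min out_max
interpolate() {
  awk -v v="$1" -v a="$2" -v b="$3" -v lo="$4" -v hi="$5" 'BEGIN {
    t = (v - a) / (b - a)          # position of the value in the field range
    if (t < 0) t = 0
    if (t > 1) t = 1               # clamp values outside the range
    printf "%g\n", lo + t * (hi - lo)
  }'
}
```

`interpolate 5 0 10 1 3` returns 2: a value halfway through the field's range gets a width halfway between 1 and 3.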

To be able to visualise all of the turtles' positions, we increase the maximum number of features this layer can display to 10 000.

figure 7: Customizing 'Turtle location' visibility: increase the maximum of features

After clicking on Validate, our first layer is created.

figure 8: New layer 'Turtle location' is created

We can also choose the basemaps that will be available in the dashboard.

figure 9: Choice of Basemaps

We can preview the layer in the Preview tab.

figure 10: Preview of 'Turtle location' layer

We can now see where the turtles are thanks to this layer, as well as save a preview to be displayed in the hub.

Since we aim to analyse the impact of climate conditions on the turtles' movements, we need to visualise the weather data. To make this easier, each weather record has been tagged with a location_id following the pattern longitude_latitude. This allows us to create a Network analytics layer in which weather records are clustered by this tag. We will create three similar layers representing the variations of chlorophyll, Sea Surface Temperature and Sea Level Anomaly around the turtles' tracks, as well as a slightly different layer for the current data.
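Clustering by location_id amounts to grouping the records that share a grid cell and aggregating a metric over each group. The sketch below computes an average with awk, assuming a hypothetical whitespace-separated input of location_id and value; it only illustrates the principle, while ARLAS performs the aggregation in Elasticsearch. Records with an empty value, like the empty SST in the weather example row, are skipped.

```shell
# Hypothetical helper. Reads "location_id value" lines on stdin and prints
# "location_id count mean" per grid cell; lines with an empty value are skipped.
mean_by_location() {
  awk '$2 != "" { n[$1]++; s[$1] += $2 }
       END { for (k in n) printf "%s %d %.3f\n", k, n[k], s[k] / n[k] }' | sort
}
```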

figure 11: Visualisation set tab

In order to be able to switch between the different layers of weather data, we create a Visualisation set for each of them.

figure 12: Create a 'Visualisation set'

figure 13: All the created 'Visualisation set'

For these layers, we start by creating a Network analytics layer for the chlorophyll field, based on the location_id identifier. To do so, we switch to weather_collection by selecting it under the visualisation type.

figure 14: Create a 'Network analytics' layer

We interpolate the color of the displayed polygons from the chl field.

figure 15: Edit the style of the 'Chlorophyll' layer

We choose the third palette and keep its green part, which represents the water's chlorophyll well.

figure 16: Default color palette choices

To remove an unused color, click on the minus sign on its line. We then reorder the colors by dragging and dropping them to make the layer more informative. Finally, we reset the midpoints of the colors with the button at the top.

figure 17: Edition process of the 'Chlorophyll' color palette
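We assume that resetting the midpoints spreads the palette's stops evenly between the field's minimum and maximum. Under that assumption, the stop values for n colors are computed as below (hypothetical helper; n must be at least 2).

```shell
# Hypothetical helper. palette_stops min max n -> n evenly spaced values from min to max (n >= 2).
palette_stops() {
  awk -v a="$1" -v b="$2" -v n="$3" 'BEGIN {
    for (i = 0; i < n; i++) printf "%s%g", (i ? " " : ""), a + i * (b - a) / (n - 1)
    print ""
  }'
}
```

For a hypothetical chl range of 0 to 1 mg/m^3 and 5 colors, `palette_stops 0 1 5` gives `0 0.25 0.5 0.75 1`.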

To best visualise the data, we increase the maximum number of features to 10 000.

figure 18: Edit the visibility parameters of the layer

We now have one layer for the turtles' positions, and one for the average chlorophyll measured around the turtles.

figure 19: 'Turtles' positions' and 'Chlorophyll' layer

The next Network analytics layers let you explore the impact that different climate conditions can have on loggerhead turtles' movements. You can skip them if you want to dive into the exploration as soon as possible.

To create the layers for the Sea Level Anomaly and the Sea Surface Temperature, we repeat the same process, with different color palettes to preserve as much information as possible in these representations. For clarity, we suggest associating each layer with one of the visualisation sets created earlier.

For the Sea Level Anomaly layer, we suggest keeping the second palette as is.

figure 20: Geometry of the 'Sea Level Anomaly' layer

figure 21: Style of the 'Sea Level Anomaly' layer

For the color palette of the Sea Surface Temperature layer, we suggest shifting all the colors of the first palette to red to get a monochromatic fade.

figure 22: Style of the 'Sea Surface Temperature' layer

Once you are done with those layers, you should have something similar to the figure below.

figure 23: 'Chlorophyll', 'Sea Surface Temperature' and 'Sea Level Anomaly' layers

Remember to change your visibility settings to 10 000 maximum features!

For the final layer, we will represent the currents. The difference resides in how they will be displayed on the map: instead of using polygons and colors to convey the direction and intensity of those currents, we use labels.

figure 24: Create the 'Currents' layer

For clarity, we prevent the black arrows that will be displayed from overlapping.

figure 25: Edit the style of the 'Currents' layer

Once the color is chosen, the arrows are all horizontal and point East. To change that, we interpolate their orientation from the direction of the currents and their size from their intensity.

figure 26: Orientate and expand the arrows depending on the data
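The data files already provide current_speed and current_angle. For readers rebuilding them, here is a sketch assuming eastward (u) and northward (v) current components as input and an angle measured counter-clockwise from East; this convention is not documented in the source. Since the arrows point East by default, they must be rotated by the opposite of that angle if, as we also assume, the label rotation is expressed clockwise.

```shell
# Hypothetical helper. current_arrow u v, with u (eastward) and v (northward) in m/s.
# ASSUMED convention: angle counter-clockwise from East, label rotation clockwise.
current_arrow() {
  awk -v u="$1" -v v="$2" 'BEGIN {
    pi = atan2(0, -1)
    speed = sqrt(u * u + v * v)                 # intensity: drives the arrow size
    angle = atan2(v, u) * 180 / pi              # direction in (-180, 180]
    rot = (angle == 0) ? 0 : -angle             # rotation to apply to an East-pointing arrow
    printf "%.3f %.1f %.1f\n", speed, angle, rot
  }'
}
```

`current_arrow 3 4` prints `5.000 53.1 -53.1`: a 5 m/s current heading north-east, drawn with an arrow rotated 53.1° anticlockwise from East.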

You can play with the other parameters of the label representation to get a fancier display.

In the end, we obtain the following 5 layers. It is already possible to explore the data, but adding analytics and additional information helps bring out the best of the data.

Timeline configuration

First, let's find out the time period when these positions and weather records were emitted.

For that, let's define a timeline: a histogram representing the number of records over time.

For the x-axis, we choose the arlas_timestamp field. The y-axis represents the number of positions in each bucket. We set 50 buckets in this example. The detailed timeline, which appears when the temporal selection is very precise, is also set to 50 buckets.

In order to also render the temporal distribution of the weather records, we select weather_collection as an additional collection.

figure 27: Customise the timeline
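To choose a sensible number of buckets, it helps to know how wide each bucket is. The hypothetical helper below divides a time range given as UNIX timestamps by the number of buckets; ARLAS may round the interval differently.

```shell
# Hypothetical helper. bucket_width start end n (UNIX timestamps in seconds)
bucket_width() {
  awk -v s="$1" -v e="$2" -v n="$3" 'BEGIN {
    w = (e - s) / n                             # seconds per bucket
    printf "%d s (%.1f days)\n", w, w / 86400
  }'
}
```

From 2018-01-01 (1514764800) to 2022-01-01 (1640995200), `bucket_width 1514764800 1640995200 50` gives 2524608 s, about 29 days per bucket.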

In the Render tab, we can select a curve histogram, which makes it easier to see both collections at once.

Search bar configuration

To explore huge amounts of data without scrolling, the search bar can be a powerful tool. To define the search bar we can set:

the placeholder string
the field used to search keywords
the field used to autocomplete the searched words

figure 28: Customise the 'Search' bar
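The autocomplete field suggests values that start with what the user has typed. As a rough illustration, independent of how ARLAS actually queries Elasticsearch for suggestions, case-insensitive prefix matching over a list of names looks like this (the helper name is ours; the prefix must be non-empty).

```shell
# Hypothetical helper. suggest prefix: prints the distinct stdin lines that start
# with prefix, ignoring case, in alphabetical order.
suggest() {
  awk -v p="$1" 'index(tolower($0), tolower(p)) == 1' | sort -u
}
```

`printf 'Gloria\nIcare\nSamson\n' | suggest sa` returns `Samson`.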

Save the dashboard and start exploring in ARLAS-wui

We have now defined:

the 'Turtle location' layer and weather layers in the map
the timeline
the search bar

Let's save this dashboard by clicking on the 'Disk' icon at the bottom-left of the page.

If we go back to ARLAS-wui-hub at http://localhost:81/hub, we'll find the Argonautica dashboard created.

We can now View it in ARLAS-wui.

figure 29: Marine currents and Samson's movements

To obtain this figure with as much contrast, select the area containing the turtle tracks in the Indian Ocean with the Geo-filter tools on the right of the map (the icon with the magnifying glass). Then, in the Map settings above, change the operation type to doesn't intersect.
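The intersects and doesn't intersect operations decide whether each feature's geometry overlaps the drawn area. For axis-aligned boxes, the test reduces to interval overlap on both axes. The hypothetical helper below shows that simplified case only: ARLAS applies the operation to the actual geometries, and boxes crossing the antimeridian are not handled here.

```shell
# Hypothetical helper. bbox_intersects a_minlon a_minlat a_maxlon a_maxlat b_minlon b_minlat b_maxlon b_maxlat
# Succeeds (status 0) when the two boxes overlap on both axes; no antimeridian handling.
bbox_intersects() {
  awk -v ax0="$1" -v ay0="$2" -v ax1="$3" -v ay1="$4" \
      -v bx0="$5" -v by0="$6" -v bx1="$7" -v by1="$8" 'BEGIN {
    hit = (ax0 + 0 <= bx1 + 0 && bx0 + 0 <= ax1 + 0 && ay0 + 0 <= by1 + 0 && by0 + 0 <= ay1 + 0)
    if (hit) exit 0
    exit 1
  }'
}
# "doesn't intersect" keeps the features for which this test fails:
#   bbox_intersects 40 -30 100 10 -20 30 -10 40 || echo "kept by doesn't intersect"
```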

figure 30: 'Turtle location' in parallel with 'Chlorophyll'

Analytics board

Now that we can visualise the turtles' tracks and a summary of the climate conditions they encountered, what's next? To explore the data further and gain other insights, we can define different types of analytics back in the builder. We can organise them in tabs (for example, depending on whether they relate to turtles or to climate conditions) and, within each tab, in groups.

figure 31: Empty analytics board

We can create a Turtles' tracks tab with two groups: Turtles and Turtles' speed.

By clicking on the plus button in the group, we create a powerbar widget that will allow us to see the proportion of each individual in the dataset.

figure 32: Create a group and add a 'powerbar' widget

In this widget, we want to represent which turtles are in the dataset, so we select the corresponding field for the Powerbar field.

figure 33: Create a turtles' name powerbar
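A powerbar shows, for each value of a field, its number of records and its share of the total. The awk sketch below computes the same figures from one value per line (hypothetical helper; values containing spaces would break the final sort keys).

```shell
# Hypothetical helper. Reads one value per line and prints "value count percent",
# most frequent first (values containing spaces would break the final sort keys).
value_shares() {
  awk '{ n[$0]++; total++ }
       END { for (k in n) printf "%s %d %.1f\n", k, n[k], 100 * n[k] / total }' |
    sort -k2,2nr -k1,1
}
```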

In the second group, we can add a Metric that will display the average speed of the turtles.

figure 34: Create a metric

We select the average of the arlas_track_dynamics_gps_speed field, which represents the computed GPS speed of the turtle. Clicking on Add adds the metric.

figure 35: Edit the metric to display the turtles' average speed

We add an explanatory text as well as a unit to make the metric more understandable for others.

figure 36: Explain the metric

To explore the actual distribution of the turtles' speed, we create a histogram of the Hits count (number of documents falling in a specific bucket) of arlas_track_dynamics_gps_speed.

figure 37: Create a speed histogram
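A Hits count histogram assigns each value to a bucket of fixed width and counts the documents per bucket. The hypothetical helper below does this with awk, taking the bucket width as argument and flooring values, negative ones included, to the start of their bucket.

```shell
# Hypothetical helper. histogram width: reads one value per line and prints
# "bucket_start count" per non-empty bucket, in increasing order.
histogram() {
  awk -v w="$1" '{
      b = int($1 / w) * w
      if (b > $1) b -= w          # int() truncates toward 0: floor negative values
      c[b]++
    }
    END { for (k in c) print k, c[k] }' | sort -n
}
```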

Since Argos and GPS measures have an estimated error that can reach several kilometers, the computed GPS speed can be much higher than the real speed of loggerhead turtles, which is usually at most 10 km/h.

figure 38: The finished analytics

Thanks to this histogram, we can filter out absurdly high values, both in ARLAS-wui and in the preview on the right of the builder. By selecting speeds up to 15 m/s, we get a better insight into the true average speed of loggerhead turtles.

figure 39: A filtered view of Icare's trajectory and the surrounding currents
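Unit conversion shows how loose this filter is: 10 km/h is about 2.78 m/s, while the 15 m/s threshold corresponds to 54 km/h, so only clearly impossible speeds are removed. The hypothetical helpers below convert units and mimic a range filter followed by an average metric on a list of speeds in m/s.

```shell
# Hypothetical helpers.
# kmh_to_ms speed_kmh -> speed in m/s (1 m/s = 3.6 km/h)
kmh_to_ms() { awk -v k="$1" 'BEGIN { printf "%.2f\n", k / 3.6 }'; }

# mean_below max: reads speeds in m/s (one per line), keeps those <= max,
# then prints "kept_count mean", like a range filter followed by an average metric.
mean_below() {
  awk -v max="$1" '$1 <= max + 0 { n++; s += $1 }
       END { if (n) printf "%d %.3f\n", n, s / n; else print "0 n/a" }'
}
```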

Take a look at the loop in the middle of Icare's trail, as well as at the count of records matching all the filters, displayed at the top. We can see gaps in the trail, and the number of records is lower than the initial 8 755: turtle positions that no longer match the set of filters defined by the user are not taken into account by the other metrics, analytics or layers.

There are many ways to filter and explore data in ARLAS. They help you see clearly through large masses of data and spare you the tedious task of inspecting the entries of your file one by one.

Geo Big Data

For this tutorial, we only follow a sample of 3 turtles, out of the more than 20 000 active Argos beacons.

What to do in case we had millions of positions and weather records to display?

Displaying them all at once would be very difficult: requesting all this data would be very heavy, and the browser would not be able to render that many features. We would end up losing the fluidity of the user experience.

Most importantly, loading tens of thousands or even millions of positions on the map would not necessarily be understandable: we could not derive clear, synthesised information from it.

That's why ARLAS proposes a geo-analytic view: we can aggregate the locations to a geographical grid and obtain a geographical distribution!

Let's create a new dedicated visualisation set and a layer named Turtles' positions distribution to display the density of the turtles' movements.

We choose to aggregate the point_geom geo-field to a geographical grid and we choose a fine granularity for this grid. We will display on the map the grid's cells.

figure 40: Creating a geographical distribution 'Cluster' layer
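Aggregating positions to a grid means flooring each coordinate to its cell and counting the points per cell. The sketch below uses a plain lon/lat grid with a cell size in degrees as a stand-in for ARLAS's own grid aggregations; only the principle is illustrated, and the helper name and "lon lat" input format are ours.

```shell
# Hypothetical helper. grid_counts cell_size: reads "lon lat" lines and prints
# "cell_min_lon cell_min_lat count", one line per non-empty cell.
grid_counts() {
  awk -v c="$1" '
    function fl(x,   f) { f = int(x / c) * c; return (f > x) ? f - c : f }   # floor to the grid
    { k = fl($1) " " fl($2); n[k]++ }
    END { for (k in n) print k, n[k] }' | sort -k1,1n -k2,2n
}
```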

Let's define the style of these cells in the Style section.

We interpolate the cells' color based on the number of positions in each cell. To do so, we choose a normalised Hits count as our interpolation field and we then select a color palette.

figure 41: Styling the geographical distribution layer
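We assume that the normalised Hits count divides each cell's count by the largest count among the cells, giving values between 0 and 1 on which the palette is interpolated. Under that assumption, the computation looks like this (hypothetical helper reading "key count" lines, where the key may contain spaces).

```shell
# Hypothetical helper. Reads "key count" lines (the key may contain spaces) and
# prints "key normalised_count", where normalised_count = count / max(count).
normalise_counts() {
  awk '{ c[NR] = $NF + 0; if (c[NR] > m) m = c[NR]
         sub(/[ \t]+[^ \t]+$/, ""); key[NR] = $0 }
       END { for (i = 1; i <= NR; i++) printf "%s %.2f\n", key[i], (m > 0 ? c[i] / m : 0) }'
}
```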

We set the cell opacity at 0.7 and the stroke opacity at 0 (no need to display stroke here).

figure 43: All the cartographic layers created thus far

After saving this layer, we can visualise where the turtles are geographically distributed.

figure 44: Turtles' positions distribution

We now have the distribution of the turtles' entire sample. However, the distribution layer doesn't show the exact location of the turtles displayed on the right.

The final words

As you can see, we created a simple dashboard to start exploring environmental data!

Check out a more sophisticated dashboard about the Argonautica data, exploring ~70 000 turtle positions as well as ~400 000 weather records, in our demo space!

You can get inspired from our different demos to build other map layers and other widgets.
