With this tutorial, you'll be able to:
- Start an ARLAS-Exploration stack
- Index some Flickr photo metadata in Elasticsearch
- Reference the indexed Flickr photo data in ARLAS
- Create a view of ARLAS-wui (a dashboard) to explore the Flickr photo data using ARLAS-wui-hub and ARLAS-wui-builder
You will need:
- docker & docker-compose
- curl
Exploration app created in this tutorial
Let's explore some photo data, available on Flickr.
We built a dataset of Flickr metadata for photos from January 2020, using the Flickr API.
This subset is stored in flickr_data.json. It contains around 50 000 geolocated Flickr photos, described by at least 9 main columns:
- photo_id: Unique identifier of the photo
- datetaken: Date of the photo
- ownername: Name of the Flickr user who published the photo
- owner: Identifier of the Flickr user who published the photo
- timestamp: Unix timestamp corresponding to the date of the photo
- point_geom: WKT geometry of the location associated with the photo
- tags: Tags associated with the photo by the owner
- server: Flickr server hosting the image
- secret: Secret used to access the image on the Flickr server
A line of the JSON file looks like this:
photo_id | datetaken | ownername | owner | timestamp | point_geom | tags | server | secret |
---|---|---|---|---|---|---|---|---|
51684435389 | 2020-01-10 12:21:15 | jeffergray | 27685985@N03 | 1578655275 | "POINT(-89.810623 18.105227)" | "2020","winter" | 65535 | 1167660620 |
The data contains some other fields that are not explored in this tutorial; feel free to explore them.
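Converting the sample row's timestamp back to a date is a quick way to see how the two time fields relate. The helper below, ts_to_utc, is a hypothetical name; it assumes GNU date (coreutils), while on macOS/BSD `date -u -r SECONDS` plays the same role. For the sample row it gives 2020-01-10 11:21:15, one hour earlier than datetaken, which suggests that datetaken is not expressed in UTC (an observation on this single row, not a documented property of the dataset).

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Unix timestamp (seconds) to a UTC date string.
# Assumes GNU date; on macOS/BSD use: date -u -r "$1" '+%Y-%m-%d %H:%M:%S'
ts_to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Sample row from the table above
ts_to_utc 1578655275   # prints 2020-01-10 11:21:15
```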
We will explore this data using ARLAS.
0. Setup
- Create a directory dedicated to this tutorial
mkdir ARLAS-stack-flickr-tutorial
cd ARLAS-stack-flickr-tutorial
- Download the Flickr data
curl -o flickr_data.json -L "https://raw.githubusercontent.com/gisaia/ARLAS-stack-photo-flickr-tutorial/master/data/flickr_data.json"
Check that the flickr_data.json file is downloaded:
ls -l flickr_data.json
- Download the ARLAS-Exploration-stack project and unzip it
(curl -L -O "https://github.com/gisaia/ARLAS-Exploration-stack/archive/develop.zip"; unzip develop.zip)
Check that the ARLAS-Exploration-stack-develop stack is downloaded:
ls -l ARLAS-Exploration-stack-develop
Now our tutorial environment is set up.
1. Starting ARLAS Exploration Stack
./ARLAS-Exploration-stack-develop/start.sh
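Depending on the machine, the services may take some time to accept requests (we have not checked whether start.sh already waits for them). A small polling loop using only curl, such as the hypothetical wait_for_url below, can be run before the next steps; the default number of attempts and delay are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers with a non-error status (< 400).
# $1 = URL, $2 = number of attempts (default 30), $3 = seconds between attempts (default 2)
wait_for_url() {
  local url="$1" tries="${2:-30}" delay="${3:-2}" i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: silent, -f: fail on HTTP errors, --max-time: do not hang on a stuck connection
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (Elasticsearch is exposed on localhost:9200 in this tutorial):
# wait_for_url http://localhost:9200/_cluster/health 60 2 && echo "Elasticsearch is up"
```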
2. Indexing flickr data in Elasticsearch
- Create the flickr_index index in Elasticsearch with the flickr.es_mapping.json mapping file:
curl "https://raw.githubusercontent.com/gisaia/ARLAS-stack-photo-flickr-tutorial/master/configs/flickr.es_mapping.json" |
curl -XPUT http://localhost:9200/flickr_index/?pretty \
-d @- \
-H 'Content-Type: application/json'
You can check that the index is successfully created by running the following command:
curl -XGET http://localhost:9200/flickr_index/_mapping?pretty
- Index the data of flickr_data.json in Elasticsearch. For that, we use Logstash, a data processing pipeline that ingests data into Elasticsearch. Logstash needs a configuration file (flickr2es.logstash.conf) that describes how to transform the data from the JSON file and index it in Elasticsearch.
curl "https://raw.githubusercontent.com/gisaia/ARLAS-stack-photo-flickr-tutorial/master/configs/flickr2es.logstash.conf" \
-o flickr2es.logstash.conf
- Now we use Logstash, with the docker image docker.elastic.co/logstash/logstash, to apply the data model transformation and index the data in Elasticsearch according to the flickr2es.logstash.conf configuration file:
network=$(docker network ls --format "table {{.Name}}" | grep arlas)
cat flickr_data.json | docker run -e XPACK_MONITORING_ENABLED=false \
--net ${network} \
--env ELASTICSEARCH=elasticsearch:9200 \
--env INDEXNAME=flickr_index --rm -i \
-v ${PWD}/flickr2es.logstash.conf:/usr/share/logstash/pipeline/logstash.conf docker.elastic.co/logstash/logstash:7.11.2
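The network=$(...) line above relies on exactly one Docker network name containing arlas (with `--format "table {{.Name}}"`, docker also prints a NAME header, which grep filters out). If several networks match, ${network} holds several names and the docker run command line breaks. A slightly more defensive variant, with a hypothetical pick_arlas_network helper that keeps the first match and fails when there is none, could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read network names (one per line) on stdin and print
# the first one containing "arlas"; the exit status is non-zero if none matches.
pick_arlas_network() {
  grep -m 1 'arlas'
}

# Usage:
# network=$(docker network ls --format '{{.Name}}' | pick_arlas_network) \
#   || { echo "No ARLAS docker network found, is the stack running?" >&2; exit 1; }
```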
- Check that around 50 000 photo locations are indexed:
curl -XGET http://localhost:9200/flickr_index/_count?pretty
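To use the count in a script (for instance one that re-runs the indexing and compares the result), the value can be extracted from the _count response with sed alone. es_count is a hypothetical helper; it assumes the response holds a single "count" key, which is the case for this endpoint, and works with or without ?pretty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an Elasticsearch _count response on stdin and
# print the number found after the "count" key.
es_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage:
# indexed=$(curl -s http://localhost:9200/flickr_index/_count | es_count)
# echo "Indexed photos: ${indexed}"
```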
3. Declaring flickr_index in ARLAS
ARLAS-server interfaces with data indexed in Elasticsearch via a collection reference.
The collection references an identifier field, a timestamp field and geographical fields, which allow ARLAS-server to perform spatio-temporal data analysis.
- Create a Flickr collection in ARLAS
curl "https://raw.githubusercontent.com/gisaia/ARLAS-stack-photo-flickr-tutorial/master/configs/flickr_collection.json" | \
curl -X PUT \
--header 'Content-Type: application/json;charset=utf-8' \
--header 'Accept: application/json' \
"http://localhost:81/server/collections/flickr_collection?pretty=true" \
--data @-
- Check that the collection is created using the ARLAS-server collections/{collection} endpoint:
curl -X GET "http://localhost:81/server/collections/flickr_collection?pretty=true"
4. Create a dashboard to explore Flickr data with ARLAS
The ARLAS stack is up and running and the Flickr data is available for exploration. We can now create our first dashboard, composed of:
- a map to observe the photo locations and their geographical distribution
- a timeline presenting the number of photos taken over time
- a search bar to look for photos, by their names for instance
- some widgets to analyse the data along other axes, such as the tags or the owner.
To do so, let's go to ARLAS-wui-hub at http://localhost:81/hub and create a new dashboard named Flickr dashboard.
figure 0: Creation of a dashboard in ARLAS-wui-hub
After clicking on Create, you are automatically redirected to ARLAS-wui-builder to start configuring your dashboard.
The first thing we need to do is to tell ARLAS which collection of data we want to use to create our dashboard.
figure 1: Choose collection
In our case we choose the flickr_collection.
As a first step, we set the map at zoom level 4 and the map's center coordinates at Latitude=44° and Longitude=4°. This way, when loading the dashboard in ARLAS-wui, the map will be located over Western Europe.
figure 2: Map initialisation
For now, the map is empty. The first thing we want to find out is where the photos are taken.
figure 3: Layers view
To do so, let's add a layer named Photo location to visualise the location associated with each photo on the map.
In the Geometry
section (
figure 4: Adding a Geometric Features layer named 'Photo location'
Now, let's define the layer's style. To start, we choose the representation that best fits our geometries: a photo's location is a point, so we draw a circle with a fixed color (white, for instance) and a fixed width of 4 pixels. We add a black stroke with a 1-pixel width and set the transparency to 0.5 to better observe overlapping locations.
figure 5: Customizing 'Photo location' style: white circle
figure 6: Customizing 'Photo location' style: black stroke
After clicking on Validate, our first layer is created.
figure 7: New layer 'Photo location' is created
We can also choose the basemaps that will be available in the dashboard.
figure 8: Choice of Basemaps
We can preview the layer in the Preview tab.
figure 9: Preview of 'Photo location' layer
We can now see where the photos are taken thanks to this layer.
Let's find out the time period when these photos were taken.
For that, let's define a timeline: a histogram that will represent the number of pictures taken over time.
For the x-axis we choose the timestamp field. The y-axis represents the number of photos in each bucket. We set 50 buckets in this example. A detailed timeline also appears when the time selection is very narrow; we set it to 50 buckets as well.
figure 10: Defining the timeline
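To get a feeling for what 50 buckets mean here, assume the timeline spans the whole of January 2020 in UTC, from 1577836800 to 1580515200 (an assumption: the actual extent depends on the data). Each bucket then covers (1580515200 - 1577836800) / 50 = 53568 seconds, about 14.9 hours. ARLAS may round the interval to a different unit; the hypothetical bucket_width helper below only reproduces the arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: width in seconds of each bucket when the interval
# [start, end) is split into n equal buckets (integer division).
bucket_width() {
  local start="$1" end="$2" n="$3"
  echo $(( (end - start) / n ))
}

# January 2020 in UTC, 50 buckets
bucket_width 1577836800 1580515200 50   # prints 53568 (about 14 h 53 min)
```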
In the Render tab we can set the element name for the timelines ('Photos' here), the date format and the histogram type. Let's choose a bars histogram.
figure 11: Timeline render
To define the search bar, we can set:
- the placeholder string
- the field used to search keywords
- the field used to autocomplete the searched words
figure 12: Defining the search bar
We have now defined:
- the 'Photo location' layer in the map
- the timeline
- the search bar
Let's save this dashboard by clicking on the 'Disk' icon at the bottom-left of the page.
If we go back to ARLAS-wui-hub at http://localhost:81/hub, we'll find the newly created Flickr dashboard.
figure 13: List of created dashboards
We can now View it in ARLAS-wui.
figure 14: View of the dashboard in ARLAS-wui
We can now explore the photos' locations, but we also want to see the actual images.
The first step is to create a data table to represent the elements of our dashboard.
figure 15: Creation of the data table
We quickly configure the basic information about each photo that will be displayed in the data table. We can add two columns:
- Date: When the photo was taken, based on the field datetaken
- Title: Name of the published picture, based on the field title
Each row of the data table represents a photo.
figure 16: Configuration of the data table's columns
We can also configure the Details section to display more information about the photo.
Let's add two more fields:
- Owner: The name of the user who published the photo, based on the field ownername
- Tags: The list of tags its owner associated with the photo, based on the field tags
figure 17: Configuration of the data table's details
It results in a data table that can be displayed on the right side of the dashboard.
figure 18: Data table to the right of the dashboard
Hovering the mouse over a row of the data table highlights the picture's location on the map.
We now have quick access to the name, the date, the location, the owner and the tags associated with the photos. But the photo itself is still missing!
In the Render section of the data table, we can configure a Grid View to display images instead of text and numbers, with the following parameters:
- The title of each image: the field title
- The URL of each image as a thumbnail (_t) on the Flickr server. It follows a pattern based on fields of the data: https://live.staticflickr.com/{server}/{photo_id}_{secret}_t.jpg
- The URL of each image in a bigger format (_b) on the Flickr server, stored in image_url. It follows a pattern based on fields of the data: https://live.staticflickr.com/{server}/{photo_id}_{secret}_b.jpg
figure 19: Configuration of the image grid
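The two URL patterns of the grid configuration can be reproduced outside ARLAS, for instance to open a given photo directly. The hypothetical flickr_image_url helper below only fills in the pattern with the server, photo_id and secret fields and a size suffix (t or b, as in the patterns); whether a URL actually resolves depends on the photo still being public on Flickr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Flickr static image URL from the data fields.
# $1 = server, $2 = photo_id, $3 = secret, $4 = size suffix (t = thumbnail, b = bigger format)
flickr_image_url() {
  printf 'https://live.staticflickr.com/%s/%s_%s_%s.jpg\n' "$1" "$2" "$3" "$4"
}

# Sample row from the data description
flickr_image_url 65535 51684435389 1167660620 t
# prints https://live.staticflickr.com/65535/51684435389_1167660620_t.jpg
```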
The dashboard is now able to display a preview of the images.
figure 20: Thumbnails on the right of the dashboard
We can expand the grid to see more thumbnails at once. Clicking on an image displays a bigger version of it.
figure 20: Image grid on the right of the dashboard
We now have a much better display of the available images.
In this tutorial we only explore a sample of 50 000 Flickr images, and we already notice overlapping photo locations in some areas.
What if we had millions of images to display?
Displaying them all at once would be very difficult: requesting all this data would be very heavy, and the browser would not be able to render that many features. The user experience would lose its fluidity.
Most importantly, millions of points on the map would not necessarily be understandable: we could not derive clear, synthesized information from them.
That's why ARLAS offers a geo-analytic view: we can aggregate the photo locations into a geographical grid and obtain a geographical distribution!
Let's create a new layer, Photo distribution, dedicated to the geographical distribution of photo locations.
figure 21: Creating a geographical distribution layer
We choose to aggregate the point_geom geo-field into a geographical grid with a fine granularity, and to display the grid's cells on the map.
Let's define the style of these cells in the Style section.
We interpolate the cells' color based on the number of photos in each cell. To do so, we choose a normalised Hits count as our interpolation field and we then select a color palette.
figure 22: Styling the geographical distribution layer
We set the cell opacity to 0.3 and the stroke opacity to 0 (there is no need to display the stroke here).
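ARLAS-server delegates this aggregation to Elasticsearch, so nothing needs to be scripted. To illustrate the idea behind a normalised Hits count, the hypothetical grid_counts helper below buckets WKT points into square cells of a given size in degrees and divides each cell's count by the largest count, so the densest cell gets 1. Dividing by the maximum is our assumption about what "normalised" means, and the plain longitude/latitude grid is only an offline analogy, not the grid ARLAS actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count WKT points per square cell of "size" degrees.
# Input: one point per line, e.g. POINT(-89.810623 18.105227) (quotes allowed).
# Output: "lonCell,latCell count normalised", where normalised = count / max count.
grid_counts() {
  awk -v size="$1" '
    # floor() for negative coordinates (int() truncates toward zero)
    function floor(x) { return (x == int(x) || x > 0) ? int(x) : int(x) - 1 }
    {
      gsub(/POINT|[()"]/, "")            # keep only "lon lat"
      if (NF < 2) next
      key = floor($1 / size) "," floor($2 / size)
      if (++count[key] > max) max = count[key]
    }
    END {
      for (k in count) printf "%s %d %.2f\n", k, count[k], count[k] / max
    }' | LC_ALL=C sort
}

# Usage, assuming point_geom appears as WKT text in flickr_data.json:
# grep -o 'POINT([^)]*)' flickr_data.json | grid_counts 1 | sort -k2,2nr | head
```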
After saving this layer, we can visualise it and explore where the photos are geographically distributed.
figure 23: Photo location distribution
We now have the distribution of the entire Flickr photo sample. However, the distribution layer doesn't show the exact locations of the pictures displayed on the right.
To always see these locations, we can create a new layer that only shows the scrollable features. Since we still represent photo locations as circles, we can duplicate the Photo location layer and edit its style.
figure 24: Duplicate 'Photo location' layer
Let's rename it Scrollable photos and change the circle color to red. We can also slightly increase the radius, opacity and stroke.
figure 25: Set the 'Scrollable photos' layer style
Most importantly, we set the layer's data fetching strategy to Scrollable features in the Visibility section.
figure 26: Set the 'Scrollable photos' layer visibility
Once the layer is created, we can also rename the Visualisation set that contains our layers and reorder them so that the new 'Scrollable photos' layer is always on top.
figure 27: Update the visualisation set
Now, if we go back to our dashboard, the 'Scrollable photos' layer is available and always displays in red the locations of the scrollable photos (those loaded in the grid). The pictures that are directly visible in the grid have a black stroke.
figure 28: Update the visualisation set
Now we can keep an eye on the locations of the displayed photos.
We focused on the geographical and temporal analysis. We can also explore other dimensions of the data.
Let's see which tags are the most used on the posted photos.
To do so, we need to create a powerbar diagram. ARLAS organises all the powerbars, histograms and other widgets in an analytics board that appears on the left of the dashboard.
We can split the analytics board into tabs. Let's create a tab called 'Photos' where we will add our tag distribution.
figure 29: Creating a tab in the Analytics board
Once the tab is created, we can add a group of widgets to it. Let's name it 'Tags'.
figure 30: Creating a group in the Analytics board tab
Let's now create our powerbar diagram.
figure 31: Choosing a powerbar for tags distribution
We can give a title to the tags distribution powerbar.
We choose the tags field and associate it with the Hits count metric, which is the number of occurrences of each tag value. In this example, we display the 15 most frequent values.
figure 32: Configure powerbar for photos' tags distribution
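ARLAS-server computes the powerbar values with an aggregation on the tags field. As an offline analogy, the hypothetical top_tags helper below counts tag occurrences and keeps the N most frequent ones, breaking ties alphabetically. It assumes tags are written as quoted, comma-separated values without spaces, as in the sample row of the data description; the actual field format in flickr_data.json may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read tag lists such as "2020","winter" (one photo per line)
# and print the N most frequent tags as "tag count", ties sorted alphabetically.
top_tags() {
  local n="$1"
  tr ',' '\n' | tr -d '"' | sed '/^[[:space:]]*$/d' \
    | LC_ALL=C sort | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | head -n "$n" \
    | awk '{ print $2, $1 }'
}

# Usage (15 values, as in the powerbar):
# printf '%s\n' '"2020","winter"' '"winter","snow"' | top_tags 15
```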
In the Render section, we can allow searching for values and downloading the distribution.
figure 32: Allow search and download of tags powerbar
When we save the powerbar, we automatically get a preview of it in the analytics board!
figure 33: Preview of the tags distribution histogram
We can now save the dashboard again using the 'Disk' icon at the bottom-left of the page and view it in ARLAS-wui.
figure 34: Exploring the Flickr photo dashboard in ARLAS-wui
As you can see, we created a simple dashboard to start exploring Flickr photo data!
Check out a more sophisticated dashboard about the Flickr photo data that explores ~450 000 photos in our demo space!
You can draw inspiration from our different demos to build other map layers and other widgets.