
openkfw / mapme.protectedareas


Reproducible workflows in R for processing open geodata to create knowledge about KfW supported protected areas and conservation effectiveness.

License: GNU General Public License v3.0

Languages: R 90.50%, PostScript 8.70%, CSS 0.52%, HTML 0.02%, Roff 0.26%
Topics: r, geodata, open-data, conservation, maps, rmarkdown, workflowr, ecosystems, ecology, reproducible-research

mapme.protectedareas's People

Contributors

jo-schie, jzakotnik, karpfen, melvinhlwong, ohm-np, svenb78, yotaae


mapme.protectedareas's Issues

Improve organization of datalake and naming conventions

Hi @Ohm-Np .

I thought that we could still improve a bit how the data is stored and named. My suggestion is to organize the data based on projects and processing steps in different folders; I think that will facilitate the organization in the future. I would therefore suggest ordering the datalake a bit better, and I wrote all suggestions down in the contributions file. @Ohm-Np, could you please have a look at the file and reorganize our data stored so far accordingly? Also, the naming conventions for the files are simpler now, since the folders already show the origin of the data.

Please also make sure not to store any data inside the repository; everything should go to the datalake.

Process burnt areas and fire hotspot

Possible sources

  1. 300m resolution global burnt areas from Copernicus

    • Monthly composite, July 2020 - present [Sentinel]
    • April 2014 - present [PROBA-V]
  2. 500m resolution global burnt areas from MODIS (USGS)

  3. VIIRS Active fire

Progress overview

  • supported PAs
  • all PAs
  • supported buffer polygons
  • all buffer polygons
  • sample points

Create Routine to process mangrove gain and loss data

  • First, please give a short overview again of which datasets are available for mangrove gain and loss, so that we can decide upon the data source
  • Try to create a routine based on the recommendations from the contrib file and the existing raster routine from the carbon balance.

Improve TEOW ecoregions processing

Since our last chat on Friday, I have been trying to use the sf library exclusively.

The routine steps for TEOW ecoregions processing are as follows:

  1. fetch country level WDPA polygon from wdpar
  2. select desired wdpa polygon from wdpar and clean the data ✓
  3. load archived global TEOW polygon ✓
  4. simplify the TEOW polygon ✓
  5. generate projstring using area_proj function & transform the projection system of polygons ✕
  6. intersect TEOW and polygon layer ✓
  7. extract areas of the intersection ✓
  8. for each wdpaid, get name of the PA, ecoregion IDs, ecoregion names and area of intersected polygons ✓

I would like @Jo-Schie to check step no. 5 (in the rmd, lines 93-105), since I am getting error messages when applying st_transform from the sf library.
What I did in this chunk is:

  • call the function area_proj, passing the sf object as argument
  • change the CRS to the projstring returned by the area_proj function (see the sketch below)
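
A minimal sketch of how step no. 5 is presumably meant to work, assuming area_proj() takes an sf object and returns a PROJ string for an equal-area projection (the exact behavior of area_proj in this repo may differ); wrapping the string in st_crs() sometimes avoids st_transform parsing errors:

library(sf)
source("code/area_proj.R")

# wdpa_polygon: the cleaned WDPA polygon from step 2; teow_polygon: the simplified TEOW layer
projstring <- area_proj(wdpa_polygon)

# transform both layers to the equal-area projection before intersecting
wdpa_projected <- st_transform(wdpa_polygon, crs = st_crs(projstring))
teow_projected <- st_transform(teow_polygon, crs = st_crs(projstring))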

GRACE-based drought indicator datasets

Dataset:

GRACE-based shallow groundwater drought indicator

Spatial coverage: Global
Spatial resolution: 0.25 degrees
Temporal coverage: 2003-Feb to present
Temporal resolution: weekly updates
Unit: percentile
Inference: the percentile describes current wet or dry conditions as a probability of occurrence, with lower values meaning drier than normal and higher values meaning wetter than normal

I already created a function (connecting to the NASA GRACE API) to download these rasters.
For which time period do you need the data, @Jo-Schie? Should I download everything, or maybe yearly by choosing specific dates of the year?

Note: Size of one raster file is ~900 kb
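
A minimal sketch of how such a download helper could look; the base URL and file-name pattern below are placeholders, not the actual NASA GRACE endpoint:

library(raster)

# hypothetical helper: download one GRACE drought-indicator raster for a given date
get_grace_raster <- function(date, base_url = "https://example.org/nasagrace") {
  file_name <- paste0("grace_drought_", date, ".tif")   # placeholder naming scheme
  dest <- file.path(tempdir(), file_name)
  download.file(paste0(base_url, "/", file_name), destfile = dest, mode = "wb")
  raster(dest)
}

# e.g. weekly rasters for one year (~900 kb each)
# dates <- seq(as.Date("2003-02-03"), as.Date("2003-12-29"), by = "week")
# grace_rasters <- lapply(dates, get_grace_raster)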

Error in dopa-rest.rmd: Only data frames can be used to create a paged_table.

When trying to build the website with wflow_build() I get the following error message:

Error: callr subprocess failed: Only data frames can be used to create a paged_table.

I am troubleshooting this problem right now. Can someone confirm this issue? Otherwise I would expect it to be a problem with my setup.
Thanks!
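
For reference, a likely fix, assuming the failing chunk passes something other than a data frame (e.g. a parsed JSON list from the DOPA REST API) to rmarkdown::paged_table(): coerce the object to a data frame first.

# paged_table() only accepts data frames, so coerce other objects first;
# `result` stands for whatever the failing chunk in dopa-rest.rmd currently prints
rmarkdown::paged_table(as.data.frame(result))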

Change Author Details

Currently we have the following author details in all markdown files:

"Johannes Schielein, Om Prakash Bhandari"

Please change that according to the following logic: for every markdown file and routine that you wrote as first author, please put

"Om Prakash Bhandari (Author), Johannes Schielein (Review)"

Link to the source code in the markdown files

Some of the markdown files use source code from the repository, like this one:

source("code/area_proj.R")

Since we cannot assume that all of the people who read and use the tutorials have also downloaded the GitHub repo (probably only a minority does), people need to be able to find the code in order to use it. You may therefore want to write something like:

_We will now use a function from our repository. If you have downloaded our repository to your local machine, you can call the function as given below with the source command. If you do not have a local copy, you can download the source code for the function here._

Assignment: Please check on that in all markdown files that describe the routines, and copy and paste the last sentence wherever you source external code. And keep that in mind for the future ;) (a sketch follows below)
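
As a sketch, the corresponding code chunk could then offer both options (the raw-file URL is a placeholder for the actual link to code/area_proj.R in this repository):

# if the repository is available locally, source the helper directly
if (file.exists("code/area_proj.R")) {
  source("code/area_proj.R")
} else {
  # otherwise source it straight from the repository
  # (placeholder URL; replace with the actual raw GitHub link)
  source("https://raw.githubusercontent.com/openkfw/mapme.protectedareas/main/code/area_proj.R")
}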

Test TEOW routines for bigger data

  • Test with KfW supported areas (about 430) -> record processing time
  • Test with all areas of countries with KfW support (about 5000) -> record processing time
  • if the code does not run efficiently enough for either of the two, try to create new routines to process larger datasets. Suggestions: simplify polygons (TEOW), subset the data and use parallel processing, e.g. with multidplyr, see here (a sketch follows after this list)
  • Save data in datalake output
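
A minimal sketch of the parallel-processing suggestion, here with the base parallel package (multidplyr, as suggested above, would work similarly); process_teow() is a hypothetical wrapper around the existing TEOW routine, and the input path is a placeholder:

library(sf)
library(parallel)

wdpa <- read_sf("wdpa_kfw_countries.gpkg")               # placeholder path (~5000 areas)
chunks <- split(wdpa, cut(seq_len(nrow(wdpa)), 8))        # 8 roughly equal chunks

t0 <- Sys.time()
results <- mclapply(chunks, process_teow, mc.cores = 8)   # hypothetical per-chunk routine
Sys.time() - t0                                           # record processing time

teow_all <- do.call(rbind, results)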

Portfolio Analysis Ideas

Financial records

  • tabulate information on how many projects started + completed
  • Tabulate information on disbursement per project + disbursement average per PA + disbursement average per sqkm
  • Horizontal Density Plots with disbursement of project funds over time

Improve DOPA API script

This is a short list of improvements that still need to be implemented in the DOPA script @Ohm-Np.

  • At some points we need a bit more detailed description of what the function actually returns. I marked them with TODO, but there may still be more to come once we start to analyze the data.
  • Explain why there are two functions that are more powerful than the others. One is get_multiple_indicators, which returns several (all?) available variables, and the other one is get_dopa, where one can choose the topic/variable, kind of being the Swiss army knife, right? So basically with these two I can fulfill all of my needs, right? Why do I need the rest of the queries then? At least besides the country and the ecosystem query I do not see a clear need for them. So first we would need to check whether we can actually get all of the other variables with these two functions. If so, then I would suggest discussing this in the beginning and restructuring the script so that these two functions come at the end. So in the beginning you could just say something like "here come some queries for different data, but in the end we have two functions which are the most powerful ones because they can get basically everything from the others" or something similar.
  • From the section Get absolute landcover ESA onward, the script still needs some reformatting to fit how I restructured the sections before.
  • I guess it would be good practice for functions to order the parameters in this way: 1) the object that is processed, 2, 3, 4, ... additional parameters. This is not the case e.g. for the get_dopa function, where the two parameters are given first and then the object.
  • The function get_dopa needs a bit more explanation in general. Some of the topics do not work with some of the GET query strings, so it was a bit confusing for me to know which one to use first.

create a contrib file

Contents of the contrib file

  • how to contribute: minimum requirements for rmd files and scripts
    • Overview processing routine
    • metadata
  • data storage
  • output data structure.

It is desirable to have a template that defines the minimum information requirements that should be included in individual processing routines, to allow others to quickly understand which data has been utilized and which routines have been applied. This information could go into a contrib file that shows contributors how to prepare their scripts and data.

It is good to do this soon, so that @Ohm-Np can already gather all necessary information while working on new datasets in the first run.

Improve Net Forest Carbon Flux processing

Improvements could be made by:

  • Just work with the downloaded data instead of querying new data.
  • use the terra package instead of raster (see the sketch after this list)
  • use the sf package and all of its spatial functionality instead of sp
  • improve the text description
  • put metadata information for the data sources as described in the contributions.md file

All of these issues are relevant in order to process larger datasets in a pipeline.

  • Currently, the tiles need to be inserted manually by the user. This means that there is currently no way to process all PAs from KfW at once. A possible solution could be to use the REST API from ArcGIS. I also created an issue for this in the mapme forest repository, see here.
  • The raster package will most probably not be able to create zonal statistics for very large areas.
  • usage of deprecated packages
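
A minimal sketch of what the terra/sf based zonal statistics could look like (file paths and the column name are placeholders):

library(sf)
library(terra)

pas  <- read_sf("wdpa_kfw_supported.gpkg")     # placeholder path, read with sf instead of sp
flux <- rast("net_forest_carbon_flux.tif")     # placeholder path, read with terra instead of raster

# terra::extract computes zonal statistics directly for the vector layer
flux_sum <- terra::extract(flux, vect(pas), fun = sum, na.rm = TRUE)

# attach the zonal sums back to the polygons
pas$carbon_flux_sum <- flux_sum[, 2]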

Create buffer zone

In addition to the WDPA polygon processing, it is also important to see the changes going on outside of the protected areas. For this, we need to create a 10 km buffer zone around the supported and all polygons.

  • create a 10 km buffer around supported PAs
  • create a 10 km buffer around all PAs
  • write a reproducible routine to create the buffer zone (see the sketch after this list)
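
A minimal sketch for the buffer routine (the input and output paths are placeholders); with sf >= 1.0 and s2 enabled, st_buffer on unprojected WDPA polygons interprets dist in meters:

library(sf)

pas_supported <- read_sf("wdpa_kfw_supported.gpkg")            # placeholder path

# 10 km buffer around each polygon
buffer_10km <- st_buffer(pas_supported, dist = 10000)

# optionally keep only the ring outside the PA itself
buffer_ring <- st_difference(buffer_10km, st_union(pas_supported))

write_sf(buffer_10km, "wdpa_kfw_supported_buffer10km.gpkg")    # placeholder path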

Create first descriptive analysis for the variables from DOPA and others

Goal: Create first appealing analysis for the ministry meeting.

  • check DOPA variables and their meanings (+ eventually update the DOPA rmd at the same time)
  • download data for all KfW PAs
  • Plan an interesting analysis
  • Create an interactive report for the portfolio with appealing maps and interactive statistics
  • possibly compare KfW areas to other areas in the country (only if time is left)
  • Write EMAIL to DOPA people about missing data

problem with treatment cells being used as control

Hi @melvinhlwong 👍
I just plotted treatment cells for 2007 and 2015 together, and it seems like we have cells that are both treatment and control. I would have guessed that this should not happen, but currently it does. I am not sure what the problem might be, but I am attaching a map so that you can see it.

[attached map screenshot]

Create routine carbon balance

  • create a routine to merge all input data with gdalUtils (see the sketch after this list)
  • run the code for the supported PAs dataset and the complete PA dataset in a new R script saved in the code folder
  • update the Rmarkdown example at the end with the new routine
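
A minimal sketch of the merging step with gdalUtils (the input folder and output path are placeholders; terra::merge would be an alternative):

library(gdalUtils)

# all input tiles for the carbon balance (placeholder location)
input_tiles <- list.files("path/to/carbon_flux_tiles",
                          pattern = "\\.tif$", full.names = TRUE)

# mosaic all tiles into one raster that the routine can then work on
mosaic_rasters(gdalfile    = input_tiles,
               dst_dataset = "path/to/carbon_flux_merged.tif")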

Treated polygons without corresponding WDPA-ID or BMZ-Nr in matched panels

I have noticed that our matched panels contain 259 treated polygons (treat_ever == 1) without a corresponding WDPA-ID or a BMZ-No. Treated polygons with disbursements should at least have a corresponding BMZ-No.

An example: In the matched panel for year 2006, the polygon with poly_id "id_1301" is assigned treat_ever==1, disbursements start in year 2006. However, there is no corresponding BMZ-No or WDPA-ID.

I came across this issue when trying to merge the matched panels with project data and wdpa variables. We should look into this in one of our sessions @Jo-Schie @melvinhlwong
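
A quick check to reproduce the problem, assuming the matched panel has columns treat_ever, wdpa_id and bmz_nr (the actual column names and the file path may differ):

library(readr)
library(dplyr)

panel_2006 <- read_csv("matched_panel_2006.csv")     # placeholder path

# treated polygons without any identifier linking them to projects or PAs
panel_2006 %>%
  filter(treat_ever == 1, is.na(wdpa_id), is.na(bmz_nr)) %>%
  distinct(poly_id) %>%
  nrow()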

Empty geometries and NAs in WDPAID - supported PAs

The supported PAs polygons have a few issues with the geometry and WDPAID. The major issues are:

  • empty geometries for WDPAID: 107428, 107427, 107424, 36053, 30625, 30622, 55544090 & 220263
  • NAs in WDPAID for row numbers: 426 - Sierra del Abra Tanchipa, 429 - Shell Beach & 433 - Juatinga
  • In the result CSV, there are repetitive rows for a single WDPAID, mainly for WDPAID 255
  • In the mapme.forest package, if there are duplicate rows, the saved raster file gets overwritten when polygons are passed individually; when passed as one big polygon, it throws an error (no unique values). A short check sketch follows below.
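
A short sketch to flag all of these problems before processing (the input path is a placeholder; the column name WDPAID is assumed as in the WDPA exports):

library(sf)

pas_supported <- read_sf("wdpa_kfw_supported.gpkg")   # placeholder path

# WDPAIDs with empty geometries
st_drop_geometry(pas_supported)[st_is_empty(pas_supported), "WDPAID"]

# row numbers with missing WDPAID
which(is.na(pas_supported$WDPAID))

# WDPAIDs that occur more than once (these overwrite saved rasters downstream)
ids <- st_drop_geometry(pas_supported)$WDPAID
ids[duplicated(ids)]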

Please check results from TEOW for circle polygons AOI

Hi @Ohm-Np. I checked the results from the polygon processing.

There are three issues with the TEOW data in "../../datalake/mapme.protectedareas/processing/sampling_results/teow_eco_biome_long.csv":

  1. There is one column just called biome_intersect_sqkm_, so it does not contain the actual name of the biome.
  2. Zero values are coded as NA, which is bad practice. NAs should only be NAs if there is missing data; if the area is equal to zero, the corresponding value should be zero.
  3. There are 4869 rows that do not contain any area from any of the biomes. My first guess was that this is because TEOW maybe only covers part of the area, but that is not the case. I plotted the missing values, and they are all over the original research area. So I am not sure where the problem lies. You can recreate my short consistency analysis with the code below if needed.
# required libraries
library(sf)
library(dplyr)
library(tidyr)
library(purrr)

# ---- create matching db ----
# load the sampling ids again
sampling.ids.original <-
  read_sf("../../Om/test/sampling_October-20-2021.gpkg")
# function to load matching data and transform to wide
f.load <- function(my_data_dir)
{
  pivot_wider(read.csv(my_data_dir),
              id_cols = "UID")
}

# apply on all relevant datasets which are structured in long
matching_data <-
  lapply(
    c(
      "../../datalake/mapme.protectedareas/processing/sampling_results/accessibility_5k_10k_long.csv",
      "../../datalake/mapme.protectedareas/processing/sampling_results/accessibility_50k_100k_long.csv",
      "../../datalake/mapme.protectedareas/processing/sampling_results/clay_content_0_10_30_long.csv",
      "../../datalake/mapme.protectedareas/processing/sampling_results/teow_eco_biome_long.csv",
      "../../datalake/mapme.protectedareas/processing/sampling_results/terrain_ruggedness_index_long.csv",
      "../../datalake/mapme.protectedareas/processing/sampling_results/worldpop_population_count_long.csv",
      "../../datalake/mapme.protectedareas/processing/sampling_results/gfw_area_loss_long.csv"
    ),
    f.load
  )
# left join all of the data
matching_data_combined <-
  matching_data %>%
  reduce(left_join, by = "UID")
# add treatment variable
matching_data_combined <-
  left_join(matching_data_combined,
            sampling.ids.original)

# remove irrelevant columns
matching_data_combined <-
  matching_data_combined %>%
  select(-c(
    geom,
    clay_content_0_cm,
    clay_content_10_cm,
    starts_with("teow_intersect"),
    # biome_intersect_sqkm_,
    terrain_ruggedness_index_median,
    terrain_ruggedness_index_standard_deviation
  ))

# rename columns
colnames(matching_data_combined)
matching_data_combined<-
  matching_data_combined%>%
  rename(travel_time_to_nearby_cities_min_5k_10k=travel_time_to_nearby_cities_min.x,
         travel_time_to_nearby_cities_min_50k_100k=travel_time_to_nearby_cities_min.y)

## replace NAs in biomes data with zero
# first check whether there are any true NAs (e.g. because there is no data, i.e. the AOIs are not covered by TEOW)
test <-
  matching_data_combined %>%
  select(starts_with("biome")) %>%
  rowSums(., na.rm = T)
# 

length(sampling.ids.original[which(test==0),"strata"])
plot(sampling.ids.original[which(test!=0),"strata"])
# there is clearly no geographical pattern regarding those values that are zero. Hence it is most probably some programming issue.

# check if a lot of those values come from the supported PAs strata
table(st_drop_geometry(sampling.ids.original[which(test!=0),"strata"]))

Open empirical issues (Regression)

  • Clustered standard errors (see the sketch after this list)
  • Heteroskedasticity-robust standard errors in R can't be computed for weighted regressions
  • Regressions with the new dataset for the entire population of cells
  • Econometric justification for why to use matching despite a DiD setup with control variables. We have no parallel trends of treatment and control units, hence we need to make the groups comparable. Needed step: create a plot of forest cover loss for treatment and control (matched and unmatched) to check the parallel trend. @yotaae @Jo-Schie let's talk about this on Tuesday, April 07 2021
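
A minimal sketch for the standard-error part, using sandwich/lmtest; the model, variables and cluster column are placeholders for the actual regression:

library(sandwich)
library(lmtest)

# placeholder model: forest cover loss on treatment plus controls
m <- lm(fc_loss ~ treatment + elevation, data = panel)

# heteroskedasticity-robust (HC1) standard errors
coeftest(m, vcov = vcovHC(m, type = "HC1"))

# standard errors clustered at the protected-area level
coeftest(m, vcov = vcovCL(m, cluster = ~ wdpa_id))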

Collect ideas on master thesis

Hi @Ohm-Np.

I think there can be a lot of different ways in which you could take advantage of what we are currently creating to do a Master thesis (if this is of interest to you). I just wanted to open this issue to collect ideas in case you are interested. My first idea would be something along the very broad topic of how to create an enterprise-ready open geo-data-science infrastructure with R, and you could

  • use the PAs example as a blueprint to further elaborate on that.
  • Gather more information on our use cases and needs
  • Talk about our technical setup with pros and cons and how to improve that.
  • Show ways on how to integrate with other platforms such as OpenEO and Python.
  • Make recommendations on best packages etc.
  • And possibly write the whole thesis as rmarkdown/bookdown in its own dedicated repository.

Another, more focused idea could be to further improve the PAs repository, e.g. automate workflows, process more variables etc., depending on your interests. The first idea is more general from a geoinformatics perspective, the second more thematically aligned to conservation.

Process Accessibility and GFW for new AOIs

Hi Om,

I am updating our AOIs for the Impact Evaluation analysis. Could you please process these two datasets for me ASAP:

aois_supported<-read_sf("../../datalake/mapme.protectedareas/output/polygon/sampling/fishnets/fishnet_within_10km.gpkg")

aois_aux<-read_sf("../../datalake/mapme.protectedareas/output/polygon/sampling/fishnets/fishnet_within_countries_100km.gpkg")

For the first dataset (aois_supported) I need all the variables as last time with the AOIs.
For the second (aois_aux) I only need GFW and Accessibility. This data will only be used in an intermediary step. I would like to ask you to process this dataset first, because I need it quickly in order to then sample non-supported areas.

Update contribution file

  • with a description of how output data should be structured, including the longtable example
  • with a description of how big-data processing scripts should be stored and called in a master script, with a little example code.

Progress track - dataset processing

Issue dedicated to keeping track of the processing of the variables and to discussing any issues related to particular variables.

All Protected Areas

  • DOPA
  • Carbon balance
  • TEOW
  • Mangrove gain and loss
  • WorldPop Population Count
  • Global forest watch
  • Copernicus global land cover
  • Terrain Ruggedness Index (TRI)
  • Accessibility to Cities
  • Clay content

Supported Protected Areas

  • DOPA
  • Carbon balance
  • TEOW
  • Mangrove gain and loss
  • WorldPop Population Count
  • Global forest watch
  • Copernicus global land cover
  • Terrain Ruggedness Index (TRI)
  • Accessibility to Cities
  • Clay content

Feedback on readme and contributions file

I just put an extensive update to the readme and the contributions file. The idea is:

  • the readme defines very generally what the purpose of the project is and what outputs are expected, and
  • the contribution file gives very detailed information on how the repo works, how it is intended to make things reproducible, what the minimum quality standards are and how a good contribution should be made (step by step).

This is now already relevant for the work of @Ohm-Np, so it would be great if you had a look at both files and give it a good read. Also, you are invited to leave feedback here or to just edit the files. It is supposed to be a living document.

Nevertheless, since this could also be a blueprint for a MAPME standard for reproducible research, I'd like to also invite @goergen95 to have a look at it and edit or leave feedback here (if you have time, of course). I basically put into this all your suggestions from the issues in the mapme repo, but I also extended it a bit from an organizational perspective.

Test code in mass processing

Hi @Ohm-Np. I created a sample dataset that has about 11,600 AOIs. Could you please try to run a few data-processing routines on a test set of, say, 5000 AOIs:

  • Terrain Ruggedness
  • Accessibility
  • Worldpop and
  • Global Forest Watch

You might already try to implement parallel routines if you feel that serialized processing is too slow. Just keep in mind that we will have to process all 11,600 in the end. The Global Forest Watch functions already support parallel processing, but I guess the other three functions from you do not have that built in yet.

You can find the file in: "../../datalake/mapme.protectedareas/output/polygon/sampling/sampling_AOIs_2021-09-15.gpkg"

Please note that Global Forest Watch might have a problem. Darius said it is probably an authentication issue and not caused by the package. If you cannot make it work after a couple of days, please visit this issue and also leave your feedback, so Darius may hopefully fix it.

Modeling: Missing BMZ numbers for projects

The project database has a lot of projects without a BMZ number.
projectdata_database <-
  read_csv("../../datalake/mapme.protectedareas/output/matching/model_frames/projectdata_supported.csv")

This happens particularly for 2008. @Jo-Schie do you know what happened here? We are losing a lot of data, because I won't be able to attribute the disbursements to the polygons.

Test Matching Strategies

Hi @melvinhlwong ,

I just created all necessary files for the matching procedures. The following two scripts have been updated in code/development and pushed to main:

  • spatial_machting_db_creation.R -> contains a lot of steps to create the matching "frames" for each year.
  • spatial_matching.R -> removed all the code for the frames creation and just left some sample matching code. There you can also find where the frame for each year is stored, because it is loaded via read_csv.

Next steps would be:

  • create matching procedures for individual years
  • figure out how to add categorical variables (country name and ecosystem name) to the procedure (currently dropped in the process)
  • adapt the breaking points (they are quite bad at the moment)
  • Add or delete single variables based on empirical knowledge or correlation (e.g. elevation).
  • Edit: interpretation of matching results (Multivariate Imbalance Measure, Percentage of local common support: LCS)
  • create a single database for the model (details below)

You can just pull the latest script changes to your project if you like (remember that you need to switch the branch to main if you do so). You should be able to load all data as well. I will now also give you write access to your user folder. That did not work before. I also have to look into read and write permissions for the database.

If anything does not work as expected you can also create a new issue in this repo.

Details for output database:
Database could look like this:

  • Columns: UID | Treatment | Matchyear | WDPA-ID | BMZ-Number | Project Start | Project Stop | Country | Biome | Forest Cover Loss | Finance per area
  • Columns (still to be processed): Precipitation | Temperature | other time-variant variables
  • long format: Forest Cover Loss should be only for startyear, treatmentyear 1, treatmentyear 2, ... stopyear
  • eventually long format 1: Forest Cover Loss for stopyear, stopyear + 1, stopyear + 2, ... stopyear + n (maybe effects materialize later).

what can we derive from this database:

  • comparison of total forest cover loss for matched and controls during

Process climatic variables

The datasets are available in two different spatial and temporal resolution combinations for precipitation and temperature:

  • Precipitation: temporal coverage 1970-2000, aggregated over years, ~1 sqkm; monthly aggregate (12 rasters in total for the 12 months)
  • Precipitation: temporal coverage 2000-2009 & 2010-2018, annual, ~21 sqkm; 12 rasters per year
  • Average temperature: temporal coverage 1970-2000, aggregated over years, ~1 sqkm; monthly aggregate (12 rasters in total for the 12 months)
  • Min. & Max. temperature: temporal coverage 2000-2009 & 2010-2018, annual, ~21 sqkm; 12 rasters per year

What do you think @Jo-Schie, which raster datasets should we process first?

Create routine to process TEOW Ecoregions data

For Teow Ecoregions we need a processing routine that intersects Ecoregions layer (polygon) with the PA polygons from WDPA. We need to calculate:

  • Type of ecoregions that intersects with the PA
  • Total area of each ecoregion in sqkm that intersects with the PA
  • Share of each ecoregion area as part of total PA area.

The data can be accessed via API, see here.

For this issue we most probably need to think a bit about how to automatically find an adequate equal-area projection system for the processed PAs. Any suggestions? Maybe @goergen95 also has an idea on this, since this variable is also part of the old PA database, right @goergen95? A sketch of one possible approach follows below.
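
One possible approach, as a sketch: derive a Lambert azimuthal equal-area projection centered on the centroid of the processed PA (this may or may not be what area_proj() already does):

library(sf)

# build a Lambert azimuthal equal-area PROJ string centered on a polygon layer
area_proj_sketch <- function(x) {
  centroid <- st_coordinates(st_centroid(st_union(st_geometry(x))))
  sprintf("+proj=laea +lon_0=%f +lat_0=%f +units=m", centroid[1], centroid[2])
}

# usage: transform PA and TEOW layers before intersecting and measuring areas
# pa_eq   <- st_transform(pa,   crs = area_proj_sketch(pa))
# teow_eq <- st_transform(teow, crs = area_proj_sketch(pa))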

Update Variables List

  • include variables that are important for impact evaluation
  • create a column that indicates whether we need the output data as raster or polygons
  • order the variables list based on the priority of processing

Adapt coarsened exact matching algorithm

  • Adapt bins to improve the ratio of preserved matching cells (Melvin); see the sketch after this list
  • Aggregate Central American countries (Melvin)
  • Leave out different variables (Melvin)
  • Exclude marine and partial areas (Melvin)
  • Test k2k matching (Melvin)
  • NA analysis (Yota)
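
A minimal sketch of how these adaptations could be tested with MatchIt (variable names, data frame and cutpoints are placeholders):

library(MatchIt)

# coarsened exact matching with custom bins; k2k = TRUE forces 1:1 matches
m_cem <- matchit(
  treat_ever ~ elevation + travel_time + clay_content + country,
  data      = matching_frame,                        # placeholder data frame
  method    = "cem",
  cutpoints = list(elevation = 5, travel_time = 5),  # adapt bins here
  k2k       = TRUE                                   # test k2k matching
)

summary(m_cem)                 # imbalance measures and number of preserved cells
matched <- match.data(m_cem)   # matched sample with weights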

Rserver session Error

The issue that was temporarily resolved has reappeared. While running the copernicus-land-cover script, it was working fine up to approx. 600 polygons, but then this error appeared:

# [1] "Done processing line 583"
# [1] "Done processing line 584"
# [1] "Done processing line 585"
# 04 May 2021 17:27:28 [rsession-ombhandari] ERROR system error 2 (No such file or directory); OCCURRED AT rstudio::core::Error rstudio::session::source_database::get(const string&, bool, rstudio_boost::shared_ptr<rstudio::session::source_database::SourceDocument>) src/cpp/session/SessionSourceDatabase.cpp:693; LOGGED FROM: SEXPREC* rstudio::session::modules::r_packages::{anonymous}::rs_getKnitParamsForDocument(SEXP) src/cpp/session/modules/SessionRCompletions.cpp:463

Then I tried to run only up to 500 polygons at a time; it was able to process up to 500 lines without any issue, however the error appeared again when I tried to save the resultant data frame from the workspace.

Update
It appeared even while testing the markdown file copernicus-land-cover.rmd.

# 04 May 2021 18:36:18 [rsession-ombhandari] ERROR session hadabend; LOGGED FROM: rstudio::core::Error {anonymous}::rInit(const rstudio::r::session::RInitInfo&) src/cpp/session/SessionMain.cpp:655

RStudio Server connection error

On starting RStudio Server, I am getting this error:
Terminal reconnection failure: Error occurred during transmission

I have also lost my progress since the GitHub connection is lost.

I think it might be because of the large files that were created after merging the huge raster files of Latin America, which resulted in a single raster file of ~500 GB that was later loaded into the workspace for analysis.
I tried clearing the workspace, but that didn't work either.

Empirical Strategy

  • Spatial Matching Buffer Area approach.
    • Standard errors could be from the buffer zones.
    • Outside buffer + inside buffer to have a realistic estimation, since the core is not threatened anyway.
    • Matching constrained to the same ecosystem and same administrative area, see e.g. the Indonesia study
  • Synthetic control. See e.g. Thales Study on REDD
  • Regression Discontinuity

Prepare WDPA data for analysis

  • Find out which countries are contained in the KfW database
  • Download all countries with KfW support from WDPA
  • apply cleaning routines from wdpar (probably serialized in the cloud); see the sketch after this list
  • merge all data and create a new column indicating KfW-supported vs. non-supported
  • Eventually pre-process financial data to create start and stop year of financial disbursements
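
A minimal sketch of the download and cleaning steps with wdpar (the country codes and the vector of supported WDPA IDs are placeholders):

library(wdpar)

# countries with KfW support (placeholder selection of ISO3 codes)
kfw_countries <- c("BRA", "PER", "COL")

# download and clean the WDPA data country by country
wdpa_list <- lapply(kfw_countries, function(iso) wdpa_clean(wdpa_fetch(iso, wait = TRUE)))

# merge all countries into one layer
wdpa_all <- do.call(rbind, wdpa_list)

# flag KfW-supported PAs (kfw_wdpa_ids: placeholder vector of supported WDPA IDs)
wdpa_all$kfw_supported <- wdpa_all$WDPAID %in% kfw_wdpa_ids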

Create Database for dashboard prototyping

Dear @IngmarM & @Ohm-Np.

For the purpose of creating a dashboard I put you two here in contact on Github. I think it makes sense to work on Github from day 1 instead of writing emails (if that is okay with you, of course).

@IngmarM: Feel free to also create a private repo if you'd like for the prototyping and continue discussion there.

As for the database: I marked all relevant variables for the prototype in green in our Google sheet which tracks the variables. See here.

There are two columns especially relevant for this purpose.

  • Column Prototype: This could indicate to you @IngmarM where the variable could be exhibited (e.g. in a popup of the map viewer or in a dashboard section). I tried to keep this very small for now.
  • Column Prototype Variable Name: Here @Ohm-Np should put the name of the variable in the dataset that he hands over to you.

As for the data that should be handed over:

  • Polygon information should be provided in GeoJSON format, containing the WDPA-ID which serves as our primary key to identify individual polygons (protected areas).
  • Tabular information could be provided as a longtable CSV containing the columns "WDPA-ID", "Variable" and "Value", i.e. basically all variables and information in just three columns. @IngmarM: Is it possible that you just match both pieces of information via the primary key (WDPA-ID) and convert them to wide format? If that is a problem, @Ohm-Np can surely also do that preprocessing step in R and just hand you over a clean GeoJSON in wide format (a sketch follows below).
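
A sketch of that preprocessing step in R, in case the wide conversion ends up on our side (file paths are placeholders; column names follow the longtable convention described above):

library(sf)
library(readr)
library(dplyr)
library(tidyr)

# long table: one row per WDPA-ID / Variable / Value combination
long_tab <- read_csv("dashboard_variables_long.csv")            # placeholder path
wide_tab <- pivot_wider(long_tab,
                        id_cols     = "WDPA-ID",
                        names_from  = "Variable",
                        values_from = "Value")

# join onto the polygons via the primary key and export one wide GeoJSON
pas      <- read_sf("wdpa_kfw_supported.gpkg")                  # placeholder path
pas_wide <- left_join(pas, wide_tab, by = c("WDPAID" = "WDPA-ID"))
write_sf(pas_wide, "dashboard_prototype.geojson")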

Please let me know if anything remains unclear.

Publish Repo on Github pages

@jzakotnik

Is it possible to make GitHub Pages available for this private repository? To my understanding that would be possible with a PRO account, see here. Not sure if you have that.

We would like to publish a report of our current portfolio analysis on the protected areas. The underlying technology renders our Rmarkdown scripts into HTML files and creates a whole website. For more information see here.

We would like to publish the site to make our progress available to the ministry and the supervisor of @Ohm-Np at the University of Münster. We do not really care whether it is available to the general public or somehow password protected or something like that. We just want to be able to share the progress.

In addition there is some information here:

Warning: GitHub Pages sites are publicly available on the internet by default, even if the repository for the site is private or internal. If your project site is published from a private or internal repository owned by an organization using GitHub Enterprise Cloud, you can manage access control for the site. Otherwise, if you have sensitive data in your site's repository, you may want to remove the data before publishing

As said, no preferences from our side, but you are the policy maker :)

Edit 2: This is what I currently get in the settings:

[screenshot of the GitHub Pages settings, 2021-06-28]

Cannot merge pull requests.

Hi @Ohm-Np. I cannot resolve the pull requests because there are merge conflicts, especially in the big data processing file. Could you have a look at this? (Under "Pull requests" you can view and directly resolve the conflicts here on the GitHub page.) Let me know if you can manage.

Best

Process Global Fishing Watch Data

  • Bathymetry
    Brief description: contains the bathymetry layer used to assign a depth to every AIS position message in the database; sourced from the General Bathymetric Chart of the Oceans (GEBCO).
    Contents: bathymetry.zip
    Resolution & units: 450 m
    File type: zipped tiff

  • Distance from port in meters
    Brief description: provides the distance from port of every point in the ocean (11/23/2020).
    Contents: distance-from-port-v20201104.tiff
    Resolution & units: 1 km
    File type: tiff

  • Distance from shore in meters
    Brief description: provides the distance from shore (in meters) of every point in the ocean (03/20/2020).
    Contents: distance-from-shore-v1.zip
    Resolution & units: 1 km
    File type: zipped tiff

  • Transshipment behavior
    Brief description: identifies vessels capable of receiving transshipments and analyzes their movements (07/17/2017).
    Contents: (1) list of vessels involved in transshipment, (2) loitering events, (3) potential transshipment events (encounters)
    File type: CSV
    Table schema: (1) mmsi shipname callsign flag imo first_timestamp last_timestamp; (2) transshipment_mmsi starting_latitude starting_longitude ending_latitude ending_longitude starting_timestamp ending_timestamp median_speed_knots total_event_duration; (3) fishing_vessel_mmsi transshipment_vessel_mmsi start_time end_time mean_latitude mean_longitude duration_hr median_distance_km median_speed_knots

  • Anonymized AIS training data
    Brief description: anonymized AIS data with labeled fishing positions suitable for training machine learning solutions.
    Contents: (1) drifting longlines, (2) fixed gear, (3) pole and line, (4) purse seines, (5) trawlers, (6) trollers, (7) unknown
    File type: CSV
    Table schema: mmsi timestamp distance_from_shore distance_from_port speed course lat lon is_fishing source

  • Anchorages
    Brief description: global database of anchorage locations where vessels congregate (2020-11-04).
    Contents: over 160,000 individual anchorage locations, which are further grouped into nearly 32,000 ports when applicable
    File type: CSV
    Table schema: s2id lat lon label sublabel label_source iso3 distance_from_shore_m drift_radius at_dock

  • Carriers
    Brief description: the Carrier Vessel Portal utilizes AIS data to show the historical activity of carriers, including port visits, loitering and encounter events (2020-12-01).
    Contents: (1) encounter: encounters between a carrier and a fishing vessel, (2) loitering: carrier vessels exhibiting behavior indicative of a potential encounter event, (3) port: vessels potentially docking or waiting in an anchorage
    File type: CSV
    Table schema: id type start end lat lon vessel.id vessel.type vessel.name vessel.mmsi vessel.flag vessel.origin_port.country vessel.origin_port.name vessel.destination_port.country vessel.destination_port.name median_speed_knots elevation_m distance_from_shore_m distance_from_port_m regions.rfmo port.country port.name

  • Fishing effort
    Brief description: the Global Fishing Watch AIS-based fishing effort and vessel presence datasets (V2.0: 2012-2020).
    Contents: (1) fleet, (2) mmsi
    Resolution & units: (1) 0.01 degrees, (2) 0.001 degrees; unit: hours
    File types: CSV & zipped CSV

For more information: click here

Create flat layer for aggregated portfolio stats

Currently we have overlapping polygons in our PA database. That creates a problem with aggregated portfolio stats (double counting of areas, loss areas etc.). Therefore we will have to create a routine at some point to create a flat layer (union the polygons) to correct for that.

To do so we need to define ex ante the aggregation levels (e.g. based on project number, country or complete portfolio) and probably even create different flattened layers, since the overlaps might differ. A sketch follows below.
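
A minimal sketch of the flattening routine at two aggregation levels (the input path and the ISO3 country column are assumptions taken from the usual WDPA exports):

library(sf)
library(dplyr)

pas <- read_sf("wdpa_kfw_supported.gpkg")     # placeholder path

# complete-portfolio flat layer: dissolve all overlaps into one geometry
flat_portfolio <- st_union(st_make_valid(pas))

# per-country flat layer: dissolve overlaps within each country
flat_by_country <- pas %>%
  st_make_valid() %>%
  group_by(ISO3) %>%     # country column assumed from WDPA
  summarise()            # sf's summarise() unions the geometries per group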
