Models of deprivation sub-domains for the IDEAMAPS data ecosystem project. This repo contains the source code used to run the models and the model outputs, together with the logic to upload model outputs to the IDEAMAPS platform.
@Gtregon to coordinate the reference data team (@Adenikemie + Alex) in creating the training dataset for the new morphological informality model, based on the reference data created in #9
The task involves using 3-point reference data to generate training data from the following datasets:
Satellite imagery (Sentinel)
Google Buildings (Building Density)
Irregular Layout
Road connectivity (TBC) **
Population density (TBC) **
The datasets marked with ** are new to the modelling process and will require some time to familiarise ourselves with.
This task can be completed when we have a training dataset for the morphological informality model, and this dataset is referenced from within GitHub and stored in an accessible place such as CRIB.
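As a rough illustration of what assembling the training dataset involves, the sketch below joins 3-point reference labels to per-cell feature values. All layer names, cell IDs and values are invented placeholders, not the project's actual data:

```python
# Minimal sketch: assemble a training table by joining 3-point reference
# labels (high / medium / low) with per-cell feature values.
# Cell IDs, feature names and values are illustrative only.

reference = {          # cell_id -> 3-point reference label
    "cell_001": "high",
    "cell_002": "low",
    "cell_003": "medium",
}

features = {           # cell_id -> feature values from the input datasets
    "cell_001": {"building_density": 0.82, "irregular_layout": 0.91},
    "cell_002": {"building_density": 0.12, "irregular_layout": 0.05},
    "cell_003": {"building_density": 0.47, "irregular_layout": 0.55},
}

def build_training_rows(reference, features):
    """Pair each labelled cell with its feature vector; skip cells
    missing either a label or features."""
    rows = []
    for cell_id, label in reference.items():
        if cell_id in features:
            row = dict(features[cell_id])
            row["cell_id"] = cell_id
            row["label"] = label
            rows.append(row)
    return rows

training = build_training_rows(reference, features)
print(len(training))  # 3
```

In practice the feature values would come from the Sentinel, Google Buildings and other layers listed above, aggregated to the same grid as the reference data.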
We need to create a model-ready dataset for road connectivity for Nairobi.
The starting point of this issue is that two datasets are ready to go within CRIB:
OSM Road Data
Million Neighbourhood Block Data
This process will involve:
Calculate the length of internal streets within blocks that are connected to the external street network (m)
Calculate the length of internal streets within blocks that are NOT connected to the external street network (m)
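A minimal sketch of the two length calculations, assuming the OSM/block network analysis has already flagged each internal segment as connected or not (the segment records below are invented):

```python
# Sketch: split a block's internal streets into "connected to the external
# street network" vs "not connected", and total their lengths in metres.
# The connectivity flag would come from the OSM / block network analysis;
# the data here is illustrative only.

internal_streets = [
    {"block": "B1", "length_m": 120.0, "connected_external": True},
    {"block": "B1", "length_m": 45.5,  "connected_external": False},
    {"block": "B2", "length_m": 300.0, "connected_external": True},
]

def street_length_totals(segments):
    connected = sum(s["length_m"] for s in segments if s["connected_external"])
    disconnected = sum(s["length_m"] for s in segments if not s["connected_external"])
    return connected, disconnected

connected_m, disconnected_m = street_length_totals(internal_streets)
print(connected_m, disconnected_m)  # 420.0 45.5
```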
This issue can be closed when there is an output dataset that is stored on CRIB and is model ready for NAIROBI.
In this task, @Gtregon will review existing literature relating to WP2/3 activities to ground himself in novel methods. This will enable @Gtregon to justify his actions/decisions during sprint meetings.
What is ongoing at the moment?
What is the gap?
How are we going to fill that gap?
This task can be closed when @Gtregon has completed a literature review of the top 10 papers and their relevance to the project. Link to literature review here.
The full sub-domain modelling network has met (previously w/c 15th April). Going forward, teams will meet within their individual sub-domain groups every two weeks. In the intervening weeks, the team will also regroup at the WP2/3 meetings (also held every two weeks).
The first meeting will help to define a roadmap for each sub-domain.
Envisaged tasks include:
define model spec
define roles and responsibilities within the team
define model parameters and thresholds
define datasets associated with model (or activities for data collection/acquisition)
We need to render documentation pages so that the markdown documents produced in the Docs repository can be rendered automatically on the interface.
This code should reflect the existing UBDC approach to rendering documentation in either the CCTV or UBDC Web Starter Kit repos, so long as this approach is able to render in React Native applications.
See feedback from #44. We need to name files in a standardised way so that future modellers (not us) are able to locate files and understand how they relate to the methods outlined in #43. A meeting has been scheduled so that this work can be discussed with @Gtregon, @AlexandraMiddleton and @Adenikemie, and instructions can be given.
✅ Definition of Done
1. Define acceptance criteria.
2. Assess the need for a review process. If a review process is required, the issue states:
Who is involved in the review?
When will the review take place?
Who is responsible for taking on the feedback?
What additional tasks are involved and are they visible on the backlog?
We need to update the DPIA in the ethics application to reflect the use of Nhost as a new cloud provider.
We will update the following sections:
Once this is updated, we pass to JPA for review and submission to the ethics board. This issue can be closed when the updates are submitted to the ethics board.
We need to discuss with the WP2/3 team about what datasets to use and the approach to modelling building density and irregular layout.
There has been a discussion about using the outputs from phase 1 (small dense structures & irregular settlement layout) as inputs to the new model. However, this depends on the quality of these outputs, given the feedback from the validation activity.
@Gtregon will need to source the datasets that will be used for this purpose and process them so that they are ready for use.
@Gtregon will need to provide a link to a document that describes the method. This should not be a long document that takes a long time to prepare, just something that captures our working method.
The outcome of this task is a dataset that is prepared for the training process.
We need to create a notebook to analyse the validation data from phase 1.
This notebook will ...
load validation data
provide summary statistics of validation data at a grid level.
These summary statistics will directly inform the reference data team and provide them with a new layer containing the summary statistics at a grid level.
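A minimal sketch of the grid-level summary step, with invented field names and records standing in for the phase 1 validation data:

```python
# Sketch: grid-level summary statistics for validation records.
# Field names and values are assumptions; the real notebook would
# load the phase 1 validation data instead of this inline sample.
from statistics import mean

validation = [
    {"grid_id": "g1", "agreement": 1.0},
    {"grid_id": "g1", "agreement": 0.0},
    {"grid_id": "g2", "agreement": 1.0},
]

def summarise_by_grid(records):
    """Group validation records by grid cell and compute simple stats."""
    by_grid = {}
    for rec in records:
        by_grid.setdefault(rec["grid_id"], []).append(rec["agreement"])
    return {
        gid: {"n": len(vals), "mean_agreement": mean(vals)}
        for gid, vals in by_grid.items()
    }

stats = summarise_by_grid(validation)
print(stats["g1"])  # {'n': 2, 'mean_agreement': 0.5}
```

The resulting per-cell statistics could then be exported as the new layer for the reference data team.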
This task can be closed when one notebook has been created, is readable, and has been uploaded to GitHub. @Gtregon to provide further details on the location later.
Also, the reference data team should have the layer in order to close this issue.
We need to create a simple Word document for internal review of the onboarding questions that we will ask users as they initially gain access to the interface.
This issue can be closed after this document has been reviewed by WP4/5.
Refine the scripts used for ISL, SDS and OD, i.e. pull out the metric scripts used for each model.
This issue can be closed when each sub-domain model has a list of scripts that are explicitly used to model the sub-domain i.e. SDS contains scripts for the 11 metrics used to model SDS etc.
We need to follow the traceability process outlined in the image below for creating model-ready datasets for buildings (small dense settlements + irregular layout) - FOR LAGOS ONLY.
Data required is already on CRIB.
This process will involve:
Reperform block analysis
Perform metric clustering
Analyse metric importance
Define metrics for irregularity
Define metrics for SDS
Aggregate metrics to the grid level
K-Means clustering
Create city-level data product.
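As a rough illustration of the K-Means step in the list above, the snippet below clusters invented grid-cell metric vectors with a small pure-stdlib K-Means; the real pipeline would more likely use a library implementation such as scikit-learn's KMeans, and the metric values here are placeholders:

```python
# Sketch: K-Means clustering of grid cells on their aggregated metrics.
# Pure-stdlib implementation for illustration only.
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: assign points to nearest centroid, then
    recompute centroids as cluster means, for a fixed iteration count."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(vals) / len(vals) for vals in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two obvious groups of cells: low-metric and high-metric.
cells = [(0.1, 0.2), (0.15, 0.1), (0.9, 0.8), (0.95, 0.85)]
centroids, clusters = kmeans(cells, k=2)
print(sorted(len(c) for c in clusters))  # [2, 2]
```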
The issue can be closed when LAGOS has a city level data product ready for the modelling process and this dataset is stored on CRIB. Access information for the dataset should be included in this issue.
We need to move the raw data to CRIB. The reason for doing this is to have a single source of truth for all training and reference data (and other data) used for the modelling. This is a secure location that removes the need for separate data management considerations.
This issue can be closed when all spatial data relating to the project has been migrated to CRIB.
We need to discuss with the WP2/3 team about what datasets to use and the approach to modelling population density.
There have already been discussions about using building density as a proxy for population density.
@Gtregon will need to source the datasets that will be used for this purpose and process them so that they are ready for use.
@Gtregon will need to provide a link to a document that describes the method. This should not be a long document that takes a long time to prepare, just something that captures our working method.
The outcome of this task is a dataset that is prepared for the training process.
Use the summary statistics generated in #8 to cross-reference the original reference datasets and determine what can be reused.
Reference data team to create a document that defines the characteristics of 'reusable data' (1st draft).
Use the validation data to determine a range of HIGH - MEDIUM - LOW values that can be used for NEW reference data.
The outcome of this task is a GeoPackage file that contains a range of values matched to grid cells, representing a HIGH - MEDIUM - LOW likelihood that each grid cell is a deprived area.
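One way the HIGH - MEDIUM - LOW assignment could look, sketched with placeholder cut-points that are not agreed thresholds:

```python
# Sketch: map a continuous per-cell score onto the HIGH / MEDIUM / LOW
# classes used for new reference data. The cut-points (0.33, 0.66) and
# cell scores are placeholders, not agreed project thresholds.

def classify(score, low_cut=0.33, high_cut=0.66):
    """Bucket a score in [0, 1] into three likelihood classes."""
    if score >= high_cut:
        return "HIGH"
    if score >= low_cut:
        return "MEDIUM"
    return "LOW"

cells = {"g1": 0.8, "g2": 0.5, "g3": 0.1}
labels = {gid: classify(score) for gid, score in cells.items()}
print(labels)  # {'g1': 'HIGH', 'g2': 'MEDIUM', 'g3': 'LOW'}
```

The labelled cells would then be written to the GeoPackage alongside their grid geometries.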
We need to follow the traceability process outlined in the image below for creating model-ready datasets for buildings (small dense settlements + irregular layout) - FOR KANO ONLY.
Data required is already on CRIB.
This process will involve:
Reperform block analysis
Perform metric clustering
Analyse metric importance
Define metrics for irregularity
Define metrics for SDS
Aggregate metrics to the grid level
K-Means clustering
Create city-level data product.
The issue can be closed when KANO has a city level data product ready for the modelling process and this dataset is stored on CRIB. Access information for the dataset should be included in this issue.
We need to speak to the Data Protection and Freedom of Information Office at UofG about the requirements for switching to Vercel/Nhost.
We may need to liaise further with the Procurement Office.
For now, we are planning to pay for Vercel/Nhost via credit card and we will NOT store any data there until we have approvals in place from the university.
In order to ensure our model parameters for buildings, roads and population align with real world thresholds, @Gtregon will investigate existing examples of representative thresholds within case studies e.g. what thresholds exist within Lagos, Kano and Nairobi (and other small, medium and large cities)? Do these align within our model parameters?
This information will then be used to set thresholds for our model parameters and data acquisition/data preprocessing can begin.
This issue can be closed once thresholds for model parameters (high, med, low) have been established. A draft of model parameters will be sent to the WP2/3 team on Friday 19th and will be discussed during the next WP2/3 meeting on Tuesday 23rd April.
We need to discuss with the WP2/3 team about what datasets to use and the approach to modelling road connectivity.
There is no clear plan yet for how to move forward with this.
@Gtregon will need to source the datasets that will be used for this purpose and process them so that they are ready for use.
@Gtregon will need to provide a link to a document that describes the method. This should not be a long document that takes a long time to prepare, just something that captures our working method.
The outcome of this task is a dataset that is prepared for the training process.
The source code developed in #14 will be used to run a random forest model, using reference data developed during #15, #16 and #17 together with Sentinel-2 imagery, to generate three classes of morphological informality (high, medium and low). Supplementary data will also be used to conduct an accuracy assessment of the model outputs, e.g. 80% of the reference/satellite data will be ingested into the RF model, whilst 20% will be used to validate/assess the accuracy of model outputs.
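A minimal sketch of the 80/20 split and accuracy check described above; the split helper and sample data are illustrative, not the project's actual code:

```python
# Sketch: shuffled 80/20 train/test split plus a simple accuracy score.
# The real pipeline would feed the 80% into the random forest and score
# its predictions on the held-out 20%.
import random

def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle items deterministically and split off a test set."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predicted, actual):
    """Fraction of predictions that match the reference labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

samples = list(range(100))      # stand-in for labelled reference cells
train, test = train_test_split(samples)
print(len(train), len(test))    # 80 20
print(accuracy(["high", "low"], ["high", "med"]))  # 0.5
```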
This issue can be closed when the model has been run for all three pilot cities and the outputs have been uploaded to the GitHub repo.
We need to speak to Yevdokia about paying for the new cloud services via credit card. We need to ask her if there is an option to pay by direct debit. Yevdokia returns from annual leave on May 20th or 21st.