opendrr / model-factory
OpenQuake compilation and data manipulation scripts
Home Page: https://opendrr.github.io/model-factory/
License: MIT License
Add additional site models per province, not just vs30_bc_site_model.
Suggestion from Teigan: try vs30_CAN_site_model.csv for all of Canada and, if possible, run preprocessing and load the result into the stack as a table directly from GitHub instead of processing on the stack side.
Similar to DSRA, future runs of PSRA will be based off b0/r1 instead of b0/r2. The psra SQL scripts will need to be revised to reflect the change, and the PSRA*.py files may also need updating.
Dynamically load raw model datasets, which may have changes to the types of fields, by reading and loading whatever fields are present. Implement tests on inputs with some reasonable constraints on what the stack will load (see the sketch below).
See also Issue #48
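A minimal sketch of this kind of dynamic loading with input tests, assuming pandas; the column names and constraint ranges are illustrative assumptions, not taken from the issue:

```python
import pandas as pd

# Illustrative constraints (assumed, not from the issue); real values
# would come from project configuration.
CONSTRAINTS = {
    "lon": lambda s: s.between(-141.0, -52.0).all(),  # roughly within Canada
    "lat": lambda s: s.between(41.0, 84.0).all(),
}

def load_model_dataset(path: str) -> pd.DataFrame:
    # Read whatever fields the file actually has instead of a fixed list.
    df = pd.read_csv(path)
    # Apply sanity checks only to the fields that are present.
    for col, is_ok in CONSTRAINTS.items():
        if col in df.columns and not is_ok(df[col]):
            raise ValueError(f"{path}: column '{col}' violates input constraints")
    return df
```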
PSRA:
Use sed in add_data.sh; fetch just a table's header with e.g. `psql opendrr -c 'COPY (SELECT * FROM psra_BC.psra_BC_hcurves_pga WHERE FALSE) TO STDOUT WITH CSV HEADER;'` (see the sketch below); read settings from config.ini, etc.; and add error checking if the file is not found (POSTGRES_* variables not defined).
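The WHERE FALSE trick above matches no rows, so COPY emits only the CSV header line. A minimal Python equivalent, assuming psycopg2; the connection parameters are placeholders for whatever config.ini / the POSTGRES_* variables provide:

```python
import psycopg2

# Placeholder connection settings; real values would come from config.ini
# or the POSTGRES_* environment variables.
conn = psycopg2.connect(dbname="opendrr", user="postgres", host="localhost")
with conn.cursor() as cur:
    # WHERE FALSE returns zero rows, so only column metadata comes back.
    cur.execute("SELECT * FROM psra_BC.psra_BC_hcurves_pga WHERE FALSE")
    header = [col.name for col in cur.description]
conn.close()
print(",".join(header))
```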
DSRA:
Exposure:
@tieganh @jvanulde discovered that some of the values from the earthquake-scenarios md files are different from our indicators.
Our original indicators/shakemap are filtered with SA(0.3) >= 0.02 from previous discussions, whereas the md values are not filtered. Current scenarios (for BC) are run across all assets of BC regardless of magnitude.
To match the 'true' value of the scenario and be consistent with the scenario md, we will need to remove the filter and find another way to implement this.
Physical exposure and PSRA will not be complete: the physical exposure source is awaiting a PR, and PSRA is still underway.
NOTE: not sure this issue is in the right repo.
Need to remove all stdv-related indicators as newest OQ (version 3.10) doesn't contain these.
There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.
Error type: undefined. Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.
Met with @tieganh and @plesueur and agreed to make the following changes to physical exposure (see the sketch after this list):
Remove genocc and replace with occtype (E_BldgOccG)
update all indicators using genocc with occtype/occtype1 as needed
E_BldgOccG, re-sourced to occtype
Et_BldgAreaRes
Et_BldgAreaComm
Et_BldgAreaCivic
Et_BldgAreaAgr
Et_ResLD, Et_ResMD, Et_ResHD collapsed to just Et_Res
Et_Comm
Et_Civic
Et_Agr
update displaced household calculation in DSRA
update risk profile gsheet
export new geopackage to distribute (FGP, nhsl data etc)
Murray to provide table with indicators.
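A minimal sketch of the occupancy re-sourcing and the residential collapse above, assuming pandas and an exposure dataframe whose columns match the names in this list; summing the three residential columns is an assumption about how the collapse is meant to work:

```python
import pandas as pd

def apply_exposure_changes(exposure: pd.DataFrame) -> pd.DataFrame:
    # Remove genocc; E_BldgOccG is now sourced from occtype.
    exposure = exposure.drop(columns=["genocc"], errors="ignore")
    exposure = exposure.rename(columns={"occtype": "E_BldgOccG"})
    # Collapse Et_ResLD / Et_ResMD / Et_ResHD into a single Et_Res.
    # (Summing the three is an assumption, not stated in the issue.)
    res_cols = ["Et_ResLD", "Et_ResMD", "Et_ResHD"]
    exposure["Et_Res"] = exposure[res_cols].sum(axis=1)
    return exposure.drop(columns=res_cols)
```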
Originally posted by @jvanulde in #96 (comment)
@jvanulde: We should avoid system paths.
@drotheram: Pass the path as an argument?
@jvanulde: Probably best.
See also OpenDRR/opendrr-api#114
One of the new indicators for PSRA Expected Loss (previously called PML) is 'e_Aggregation'. The plan is for that column to hold the region the row was created from, based on the exposure (i.e. NS (whole province), BC5920, ON3515-20, etc. for the time being).
Our current workflow combines all the data into one file, which is imported into the PostGIS DB from Python. Currently the psra_{prov}_agg_curves_stats table does not hold any of the region information needed to correctly populate 'e_Aggregation'.
Had a conversation with @drotheram about this, and it looks like he can parse the region info from the ebR_{region}_agg_curves-stats_b0/r1.csv files and add a column containing the {region} info to the combined agg_curves_stats table brought into the PostGIS DB, so I can relate that column to 'e_Aggregation'.
@tieganh, if there is any additional info you'd like to add, we can discuss here.
For the time being I will code in {prov} AS 'e_Aggregation' to populate the column with the province (see the sketch below).
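A minimal sketch of the parsing plan described above, assuming pandas and the ebR_{region}_agg_curves-stats_{b0,r1}.csv naming; the directory layout is an assumption:

```python
import re
from pathlib import Path
import pandas as pd

# Matches e.g. ebR_BC_V_Capital_agg_curves-stats_b0.csv -> "BC_V_Capital"
PATTERN = re.compile(r"^ebR_(.+)_agg_curves-stats_(?:b0|r1)\.csv$")

def combine_agg_curves_stats(csv_dir: str) -> pd.DataFrame:
    frames = []
    for path in sorted(Path(csv_dir).glob("ebR_*_agg_curves-stats_*.csv")):
        match = PATTERN.match(path.name)
        if not match:
            continue
        df = pd.read_csv(path)
        # Carry the region into the combined table so it can be related
        # to e_Aggregation once loaded into PostGIS.
        df["region"] = match.group(1)
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```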
Keep consistency between all hex layers. Change any hexbin names to hexgrid, e.g. nhsl_physical_exposure_indicators_hexbin_1km to physical_exposure_indicators_hexgrid_1km, etc.
Create a shakemap grid view (with geom) from the shakemap tables (see the sketch below).
Investigate the new changes in the psra (canada-srm2) repo in preparation for the future stack build.
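A minimal sketch of that shakemap grid view, in the repo's Python-runs-SQL style; the shakemap table and lon/lat column names are illustrative assumptions:

```python
import psycopg2

# Table and column names are assumptions for illustration.
CREATE_VIEW = """
CREATE OR REPLACE VIEW shakemap_grid AS
SELECT s.*,
       ST_SetSRID(ST_MakePoint(s.lon, s.lat), 4326) AS geom
FROM shakemap s;
"""

with psycopg2.connect(dbname="opendrr") as conn, conn.cursor() as cur:
    cur.execute(CREATE_VIEW)
```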
The use of 'all_indicators' has become redundant now that we have eliminated the thematic indicators.
Let's change the use of 'all_indicators' in postgis to simply 'indicators' and propagate those changes to the ES indices as well
Follow up in issues below
Updates to be made in update_sovi_hazthreat_feb2021 branch
https://github.com/OpenDRR/model-factory/tree/update_sovi_hazthreat_feb2021/scripts
copyAncillaryTables.py
UPDATE/ADD copy statements for:
Create_table_mh_intensity_canada_v2.sql
Create_table_mh_thresholds.sql
Create_table_sovi_census_canada.sql
Create_table_sovi_index_canada_v2.sql
Create_table_sovi_thresholds.sql
Create_psra_merge_into_national_indicators.sql
Create_scenario_risk_master_tables.sql
PSRA_copyTables.py
Investigate, in the newest build of the DB on the stack, why the psra_qc_avg_losses_stats table has no records, which has a cascading effect on qc_all_indicators_b / _s.
All other provinces check out fine.
One of the columns of {P/T}_expected_loss_fsa is e_Aggregation. Currently the field is populated using {prov} and is updated as our scripts go through each {P/T}, i.e. for BC, e_Aggregation = 'BC' for every row of BC_expected_loss_fsa.
After checking with Tiegan/Phil, the e_Aggregation field should instead be populated only with the 'region' it came from, i.e. ebR_BC_V_Capital_agg_losses-q50_r1.csv -> V_Capital, ebR_BC_V_CentralIsland_agg_curves-q05_b0.csv -> V_CentralIsland, etc.
PSRA_combineAggCurvesStats.py seems to have columns.append('region'), but I don't think it actually appends the region to the final output file? (See the sketch below.)
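A hedged illustration of the suspected bug: appending 'region' to the expected-column list does not put any region values into the data; the value has to be assigned to the dataframe before writing (toy data, illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"loss": [1.0, 2.0]})  # toy stand-in for agg curves data
columns = ["loss"]

columns.append("region")  # only grows the expected-column list; no data yet
# Writing now would fail, because df has no 'region' column. The region
# value (parsed from the source filename) must be assigned explicitly:
df["region"] = "V_Capital"

df.to_csv("combined.csv", columns=columns, index=False)
```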
Just noticed that the "gmpe_Model" indicator in all our DSRA views defaults to "NBCC2020_TEST_PLACEHOLDER", set as a default value in DSRA_outputs2postgres_lfs.py:
https://github.com/OpenDRR/model-factory/blob/master/scripts/DSRA_outputs2postgres_lfs.py
Need to check and verify actual source of gmpe values.
Bug in model-factory processing when pulling indicators from PostGIS to Elasticsearch.
Taken from exposure to help build charts
@drotheram @tieganh
Update attribute xls after
OpenDRR/earthquake-scenarios#44
Add both b0 and r1
ebR_${PT}_agg_curves
ebR_${PT}_agg_losses
Once we have the criteria finalized for the compilation trigger, we need to implement it in AWS.
Migrate the DSRA and PSRA compilation scripts to Lambda.
Move scripts/*.{py,sql} that are no longer used into a scripts/attic directory, add a scripts/attic/README.md documenting why (e.g. superseded by refactored scripts, etc.), and ideally do the above after one or more of:
Need to make changes to the DSRA compilation script (not yet committed to the repo):
check the new sovi data (Nov 2021) and update the source or indicators as needed
2022-08-09 Update: @wkhchow encountered this error too, but could not reproduce it the second time. Suspected to be an occasional network error that caused an incomplete download. Moved this issue from OpenDRR/opendrr-api to OpenDRR/model-factory. Perhaps adding a fail-safe mechanism in DSRA_outputs2postgres_lfs.py (verify checksum, retry download, etc.), as sketched below, would mitigate the issue?
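A minimal sketch of such a fail-safe, assuming requests and a known SHA-256 per file; where that checksum comes from is an assumption (Git LFS pointer files do carry a sha256 that could serve this role):

```python
import hashlib
import time
import requests

def fetch_with_retries(url: str, expected_sha256: str, attempts: int = 3) -> bytes:
    for attempt in range(1, attempts + 1):
        try:
            resp = requests.get(url, timeout=60)
            resp.raise_for_status()
            if hashlib.sha256(resp.content).hexdigest() == expected_sha256:
                return resp.content
            # Checksum mismatch: likely a truncated or corrupted download.
        except requests.RequestException:
            pass  # transient network error; retry after a backoff
        time.sleep(2 ** attempt)
    raise RuntimeError(f"failed to fetch {url} intact after {attempts} attempts")
```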
2021-06-07 Update: I wasn't able to reproduce this in my June 4 to 5 local run (using the pipeline-optimization branch at commit 15c2d1889c8b28a04671fbc10a5a0436ba071289). To be investigated.
On 2021-05-28, Joost encountered `ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id']` during:
[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW
[2021-06-07 Update] Joost was using commit 6ae277be4189bfee6af52c82afde89cfdc2baabf, which is the tip of the branch that I renamed to pipeline-optimization_experimental_with_bubble-merged_gen-pygeoapi-config.
It does look like something that #53 intends to fix, i.e. flexible CSV header data loading (see the sketch below).
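A minimal sketch of that flexible loading, assuming pandas: read the header row first, then intersect the configured usecols with the columns actually present, instead of letting read_csv raise:

```python
from io import StringIO
import pandas as pd

# Columns the DSRA loader expects (from the error message above).
WANTED = ["asset_id", "structural", "nonstructural", "contents"]

def read_losses(csv_text: str) -> pd.DataFrame:
    # Peek at the real header instead of assuming the configured columns exist.
    header = pd.read_csv(StringIO(csv_text), nrows=0).columns
    present = [c for c in WANTED if c in header]
    missing = sorted(set(WANTED) - set(present))
    if missing:
        print(f"warning: columns missing from input: {missing}")
    return pd.read_csv(StringIO(csv_text), usecols=present)
```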
The same routine that Anthony ran on 2021-05-22 was successful though, and there did not seem to be any recent change to the upstream repos.
In Joost's .env file, only loadPsraModels is set to true; all the other load* variables are set to false.
... but on closer look, the routines for loadPsraModels are run first, and DSRA_outputs2postgres_lfs.py comes before all of the above, so that's probably not it.
Nope, Joost nuked the volumes.
Not at first glance, but maybe I missed something.
Refer to the Slack DM log between Joost and me on 2021-05-21.
[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW
python-opendrr_1 | Traceback (most recent call last):
python-opendrr_1 | File "DSRA_outputs2postgres_lfs.py", line 205, in <module>
python-opendrr_1 | main()
python-opendrr_1 | File "DSRA_outputs2postgres_lfs.py", line 65, in main
python-opendrr_1 | dfsr[retrofit] = GetDataframeForScenario(url, repo_list, retrofitPrefix, eqscenario, columnConfigParser, auth)
python-opendrr_1 | File "DSRA_outputs2postgres_lfs.py", line 139, in GetDataframeForScenario
python-opendrr_1 | dfLosses = pd.read_csv(StringIO(response.content.decode(response.encoding)),
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 676, in parser_f
python-opendrr_1 | return _read(filepath_or_buffer, kwds)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 448, in _read
python-opendrr_1 | parser = TextFileReader(fp_or_buf, **kwds)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 880, in __init__
python-opendrr_1 | self._make_engine(self.engine)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1114, in _make_engine
python-opendrr_1 | self._engine = CParserWrapper(self.f, **self.options)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1937, in __init__
python-opendrr_1 | _validate_usecols_names(usecols, self.orig_names)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1232, in _validate_usecols_names
python-opendrr_1 | raise ValueError(
python-opendrr_1 | ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id']
python-opendrr_1 | Command exited with non-zero status 1
python-opendrr_1 | 24.71user 10.78system 1:49.17elapsed 32%CPU (0avgtext+0avgdata 1758380maxresident)k
python-opendrr_1 | 248inputs+0outputs (4major+1899875minor)pagefaults 0swaps
python-opendrr_1 |
python-opendrr_1 | real 1m49.174s
python-opendrr_1 | user 0m24.716s
python-opendrr_1 | sys 0m10.788s
python-opendrr_1 exited with code 1
The compilation script was written and tested with CSV input files on the local file system, writing output into a database on localhost. Some modifications will be needed to read the OpenQuake outputs from a GitHub/GitLab or similar data repository, and the lines specifying the database location will also need to be modified.
To implement these changes, the location and format of the OpenQuake outputs will need to be defined, as well as the remote database service (see the sketch below).
Related to issue: OpenDRR/openquake-inputs#11
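A minimal sketch of those modifications, taking the input location and database host as arguments rather than hard-coding them (flag names are illustrative; pandas reads local paths and raw GitHub/GitLab URLs alike):

```python
import argparse
import pandas as pd

def main():
    parser = argparse.ArgumentParser(description="Load OpenQuake outputs")
    parser.add_argument("--input", required=True,
                        help="CSV path or raw URL of an OpenQuake output")
    parser.add_argument("--db-host", default="localhost",
                        help="remote database service host")
    parser.add_argument("--db-name", default="opendrr")
    args = parser.parse_args()

    df = pd.read_csv(args.input)  # works for file paths and URLs
    print(f"loaded {len(df)} rows; would load into {args.db_name}@{args.db_host}")

if __name__ == "__main__":
    main()
```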
Some time-extensive operations occur when building the stack, in processing the site exposure tables for the one scenario.
Consider preprocessing the site exposure tables and uploading the processed tables to the repo, to be copied straight into PostGIS instead.
Stems from a previous issue (OpenDRR/opendrr-data-store#40) on developing DSRA scenario tables at various aggregation levels: use the scenario shakemap grid points, converted to a polygon, to show the physical extent of each scenario. Current DSRA scenarios run on assets for entire P/T(s) and can be filtered down to match the shakemap scenario extents.
Currently the filter is set to sa(0.3) >= 0.03 for all DSRA scenario indicators and shakemap extents. Tiegan will investigate and make edits to the filter if needed (see the sketch below).
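A minimal sketch of deriving the scenario extent polygon from the shakemap grid points, assuming a shakemap_grid view with a geom column and an sa_0_3 column for the SA(0.3) values; using a convex hull (rather than, say, a concave hull) is an assumption:

```python
import psycopg2

# View and column names are assumptions for illustration.
EXTENT_SQL = """
SELECT ST_AsText(ST_ConvexHull(ST_Collect(geom)))
FROM shakemap_grid
WHERE sa_0_3 >= 0.03;
"""

with psycopg2.connect(dbname="opendrr") as conn, conn.cursor() as cur:
    cur.execute(EXTENT_SQL)
    print(cur.fetchone()[0])  # WKT polygon of the scenario's physical extent
```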
From riskprofiler dataset prioritization meeting:
Sometimes small changes in raw files such as the addition of an index column break the workflow and create maintenance overhead. The stack would be more flexible and robust if it dynamically reads file headers rather than assuming the inputs conform to a rigid standard.
HAZUS methodology, Section 14.3, describes shelter needs calculation. Murray had wanted to include some more nuanced indicators (Imm_LT5, Live_Alone, No_EngFr, LonePar3Kids, Indigenous - see attached below), but we don't have data on how those indicators affect demand for shelters. As of Sep 2021 we're omitting these indicators because we don't have that data.
displacement-shelter-models-with-damage-factors.pdf
We're also changing Murray's 'demographic' section back to 'ethnicity' as it was in HAZUS, using the 'VisMin' indicator, because groups of indicators must be correlated. In other words, the percentage of the population must sum to ~1 over the category. In future this could be adjusted to include more groupings than just 'VisMin'.
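A small check of that sum-to-~1 constraint could guard the indicator tables, assuming pandas; the grouping column names here are hypothetical:

```python
import pandas as pd

def check_group_shares(df: pd.DataFrame, group_cols, tol: float = 0.05):
    # Shares within one category grouping should sum to ~1 per row.
    totals = df[group_cols].sum(axis=1)
    bad_rows = df.index[(totals - 1.0).abs() > tol]
    if len(bad_rows):
        raise ValueError(
            f"shares of {group_cols} do not sum to ~1 for rows {list(bad_rows[:10])}"
        )

# Hypothetical usage with per-area population shares of the ethnicity grouping:
# check_group_shares(sauid_df, ["VisMin", "NonVisMin"])
```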
Assigning @yipjackie but if someone else has capacity and wants to work on this then they're welcome to. Some cool work has been coming out of UBC that could be relevant for Canadian context of displacement. Also no pressure for Jackie to implement, as even HAZUS folks decided not to implement anything besides ethnicity and income.
Relevant files are the building-level DSRA and SAUID files. Contact @wkhchow for more info on these.
Tiegan's new revised scenarios are now running off b0/r1 instead of the previous iterations of b0/r2.
Changes must be made to @drotheram's DSRA-to-postgres Python scripts and to my DSRA indicator scripts, as they will no longer work.
Need to confirm whether the previous b0/r2 scenarios should also be run, or whether they will be recalculated to run off b0/r1.