opendrr / model-factory
OpenQuake compilation and data manipulation scripts
Home Page: https://opendrr.github.io/model-factory/
License: MIT License
Add additional site models per province, not just vs30_bc_site_model.
Suggestion from Teigan: try vs30_CAN_site_model.csv for all of Canada and, if possible, run preprocessing and load the result into the stack as a table directly from GitHub instead of processing on the stack side.
Similar to DSRA, future runs of PSRA will be based off b0/r1 instead of b0/r2. The psra SQL scripts will need to be revised to reflect the change, and the PSRA*.py files may also need updating.
Dynamically load raw model datasets, which may have changes to the types of fields, by reading and loading whatever fields are present. Implement tests on inputs with some reasonable constraints on what the stack will load (see the sketch below).
See also Issue #48
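A minimal sketch of this kind of dynamic loading with input tests, assuming pandas; the column names and constraint ranges are illustrative assumptions, not taken from the issue:

```python
import pandas as pd

# Illustrative constraints (assumed, not from the issue); real values
# would come from project configuration.
CONSTRAINTS = {
    "lon": lambda s: s.between(-141.0, -52.0).all(),  # roughly within Canada
    "lat": lambda s: s.between(41.0, 84.0).all(),
}

def load_model_dataset(path: str) -> pd.DataFrame:
    # Read whatever fields the file actually has instead of a fixed list.
    df = pd.read_csv(path)
    # Apply sanity checks only to the fields that are present.
    for col, is_ok in CONSTRAINTS.items():
        if col in df.columns and not is_ok(df[col]):
            raise ValueError(f"{path}: column '{col}' violates input constraints")
    return df
```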
PSRA:
Use sed in add_data.sh; fetch just a table's header with e.g. `psql opendrr -c 'COPY (SELECT * FROM psra_BC.psra_BC_hcurves_pga WHERE FALSE) TO STDOUT WITH CSV HEADER;'` (see the sketch below); read settings from config.ini, etc.; and add error checking if the file is not found (POSTGRES_* variables not defined).
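The WHERE FALSE trick above matches no rows, so COPY emits only the CSV header line. A minimal Python equivalent, assuming psycopg2; the connection parameters are placeholders for whatever config.ini / the POSTGRES_* variables provide:

```python
import psycopg2

# Placeholder connection settings; real values would come from config.ini
# or the POSTGRES_* environment variables.
conn = psycopg2.connect(dbname="opendrr", user="postgres", host="localhost")
with conn.cursor() as cur:
    # WHERE FALSE returns zero rows, so only column metadata comes back.
    cur.execute("SELECT * FROM psra_BC.psra_BC_hcurves_pga WHERE FALSE")
    header = [col.name for col in cur.description]
conn.close()
print(",".join(header))
```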
DSRA:
Exposure:
@tieganh @jvanulde discovered that some of the values from the earthquake-scenarios md files are different from our indicators.
Our original indicators/shakemap are filtered with SA(0.3) >= 0.02 from previous discussions, whereas the md values are not filtered. Current scenarios (for BC) are run across all assets of BC regardless of magnitude.
To match the 'true' value of the scenario and be consistent with the scenario md, we will need to remove the filter and find another way to implement this.
Physical exposure and PSRA will not be complete: the physical exposure source is awaiting a PR, and PSRA is still underway.
NOTE: not sure this issue is in the right repo.
Need to remove all stdv-related indicators as newest OQ (version 3.10) doesn't contain these.
There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.
Error type: undefined. Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.
Met with @tieganh and @plesueur and agreed to make the following changes to physical exposure (see the sketch after this list):
Remove genocc and replace with occtype (E_BldgOccG)
update all indicators using genocc with occtype/occtype1 as needed
E_BldgOccG, re-sourced to occtype
Et_BldgAreaRes
Et_BldgAreaComm
Et_BldgAreaCivic
Et_BldgAreaAgr
Et_ResLD, Et_ResMD, Et_ResHD collapsed to just Et_Res
Et_Comm
Et_Civic
Et_Agr
update displaced household calculation in DSRA
update risk profile gsheet
export new geopackage to distribute (FGP, nhsl data etc)
Murray to provide table with indicators.
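A minimal sketch of the occupancy re-sourcing and the residential collapse above, assuming pandas and an exposure dataframe whose columns match the names in this list; summing the three residential columns is an assumption about how the collapse is meant to work:

```python
import pandas as pd

def apply_exposure_changes(exposure: pd.DataFrame) -> pd.DataFrame:
    # Remove genocc; E_BldgOccG is now sourced from occtype.
    exposure = exposure.drop(columns=["genocc"], errors="ignore")
    exposure = exposure.rename(columns={"occtype": "E_BldgOccG"})
    # Collapse Et_ResLD / Et_ResMD / Et_ResHD into a single Et_Res.
    # (Summing the three is an assumption, not stated in the issue.)
    res_cols = ["Et_ResLD", "Et_ResMD", "Et_ResHD"]
    exposure["Et_Res"] = exposure[res_cols].sum(axis=1)
    return exposure.drop(columns=res_cols)
```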
Originally posted by @jvanulde in #96 (comment)
@jvanulde: We should avoid system paths.
@drotheram: Pass the path as an argument?
@jvanulde: Probably best.
See also OpenDRR/opendrr-api#114
One of the new indicators for PSRA Expected Loss (previously called PML) is 'e_Aggregation'. The plan is for that column to hold the region the row was created from, based on the exposure (i.e. NS (whole province), BC5920, ON3515-20, etc. for the time being).
Our current workflow combines all the data into one file, which is imported into the PostGIS DB from Python. Currently the psra_{prov}_agg_curves_stats table does not hold any of the region information needed to correctly populate 'e_Aggregation'.
Had a conversation with @drotheram about this, and it looks like he can parse the region info from the ebR_{region}_agg_curves-stats_b0/r1.csv files and add a column containing the {region} info to the combined agg_curves_stats table brought into the PostGIS DB, so I can relate that column to 'e_Aggregation'.
@tieganh, if there is any additional info you'd like to add, we can discuss here.
For the time being I will code in {prov} AS 'e_Aggregation' to populate the column with the province (see the sketch below).
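A minimal sketch of the parsing plan described above, assuming pandas and the ebR_{region}_agg_curves-stats_{b0,r1}.csv naming; the directory layout is an assumption:

```python
import re
from pathlib import Path
import pandas as pd

# Matches e.g. ebR_BC_V_Capital_agg_curves-stats_b0.csv -> "BC_V_Capital"
PATTERN = re.compile(r"^ebR_(.+)_agg_curves-stats_(?:b0|r1)\.csv$")

def combine_agg_curves_stats(csv_dir: str) -> pd.DataFrame:
    frames = []
    for path in sorted(Path(csv_dir).glob("ebR_*_agg_curves-stats_*.csv")):
        match = PATTERN.match(path.name)
        if not match:
            continue
        df = pd.read_csv(path)
        # Carry the region into the combined table so it can be related
        # to e_Aggregation once loaded into PostGIS.
        df["region"] = match.group(1)
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```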
Keep consistency between all hex layers. Change any hexbin names to hexgrid, e.g. nhsl_physical_exposure_indicators_hexbin_1km to physical_exposure_indicators_hexgrid_1km, etc.
Create a shakemap grid view (with geom) from the shakemap tables (see the sketch below).
Investigate the new changes in the psra (canada-srm2) repo in preparation for the future stack build.
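A minimal sketch of that shakemap grid view, in the repo's Python-runs-SQL style; the shakemap table and lon/lat column names are illustrative assumptions:

```python
import psycopg2

# Table and column names are assumptions for illustration.
CREATE_VIEW = """
CREATE OR REPLACE VIEW shakemap_grid AS
SELECT s.*,
       ST_SetSRID(ST_MakePoint(s.lon, s.lat), 4326) AS geom
FROM shakemap s;
"""

with psycopg2.connect(dbname="opendrr") as conn, conn.cursor() as cur:
    cur.execute(CREATE_VIEW)
```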
The use of 'all_indicators' has become redundant now that we have eliminated the thematic indicators.
Let's change the use of 'all_indicators' in postgis to simply 'indicators' and propagate those changes to the ES indices as well
Follow up in issues below
Updates to be made in update_sovi_hazthreat_feb2021 branch
https://github.com/OpenDRR/model-factory/tree/update_sovi_hazthreat_feb2021/scripts
copyAncillaryTables.py
UPDATE/ADD copy statements for:
Create_table_mh_intensity_canada_v2.sql
Create_table_mh_thresholds.sql
Create_table_sovi_census_canada.sql
Create_table_sovi_index_canada_v2.sql
Create_table_sovi_thresholds.sql
Create_psra_merge_into_national_indicators.sql
Create_scenario_risk_master_tables.sql
PSRA_copyTables.py
Investigate, in the newest build of the DB on the stack, why the psra_qc_avg_losses_stats table has no records, which has a cascading effect on qc_all_indicators_b / _s.
All other provinces check out fine.
One of the columns of {P/T}_expected_loss_fsa is e_Aggregation. Currently the field is populated using {prov} and is updated as our scripts go through each {P/T}, i.e. for BC, e_Aggregation = 'BC' for every row of BC_expected_loss_fsa.
After checking with Tiegan/Phil, the e_Aggregation field should instead be populated only with the 'region' it came from, i.e. ebR_BC_V_Capital_agg_losses-q50_r1.csv -> V_Capital, ebR_BC_V_CentralIsland_agg_curves-q05_b0.csv -> V_CentralIsland, etc.
PSRA_combineAggCurvesStats.py seems to have columns.append('region'), but I don't think it actually appends the region to the final output file? (See the sketch below.)
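A hedged illustration of the suspected bug: appending 'region' to the expected-column list does not put any region values into the data; the value has to be assigned to the dataframe before writing (toy data, illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"loss": [1.0, 2.0]})  # toy stand-in for agg curves data
columns = ["loss"]

columns.append("region")  # only grows the expected-column list; no data yet
# Writing now would fail, because df has no 'region' column. The region
# value (parsed from the source filename) must be assigned explicitly:
df["region"] = "V_Capital"

df.to_csv("combined.csv", columns=columns, index=False)
```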
Just noticed that the "gmpe_Model" indicator in all our DSRA views defaults to "NBCC2020_TEST_PLACEHOLDER", set as a default value in DSRA_outputs2postgres_lfs.py:
https://github.com/OpenDRR/model-factory/blob/master/scripts/DSRA_outputs2postgres_lfs.py
Need to check and verify actual source of gmpe values.
Bug in model-factory processing when pulling indicators from PostGIS to Elasticsearch.
Taken from exposure to help build charts
@drotheram @tieganh
Update attribute xls after
OpenDRR/earthquake-scenarios#44
Add both b0 and r1
ebR_${PT}_agg_curves
ebR_${PT}_agg_losses
Once we have the criteria finalized for the compilation trigger, we need to implement it in AWS.
Migrate the DSRA and PSRA compilation scripts to Lambda.
Move scripts/*.{py,sql} that are no longer used into a scripts/attic directory, add a scripts/attic/README.md documenting why (e.g. superseded by refactored scripts, etc.), and ideally do the above after one or more of:
Need to make changes to the DSRA compilation script (not yet committed to the repo):
check the new sovi data (Nov 2021) and update the source or indicators as needed
2022-08-09 Update: @wkhchow encountered this error too, but could not reproduce it the second time. Suspected to be an occasional network error that caused an incomplete download. Moved this issue from OpenDRR/opendrr-api to OpenDRR/model-factory. Perhaps adding a fail-safe mechanism in DSRA_outputs2postgres_lfs.py (verify checksum, retry download, etc.), as sketched below, would mitigate the issue?
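A minimal sketch of such a fail-safe, assuming requests and a known SHA-256 per file; where that checksum comes from is an assumption (Git LFS pointer files do carry a sha256 that could serve this role):

```python
import hashlib
import time
import requests

def fetch_with_retries(url: str, expected_sha256: str, attempts: int = 3) -> bytes:
    for attempt in range(1, attempts + 1):
        try:
            resp = requests.get(url, timeout=60)
            resp.raise_for_status()
            if hashlib.sha256(resp.content).hexdigest() == expected_sha256:
                return resp.content
            # Checksum mismatch: likely a truncated or corrupted download.
        except requests.RequestException:
            pass  # transient network error; retry after a backoff
        time.sleep(2 ** attempt)
    raise RuntimeError(f"failed to fetch {url} intact after {attempts} attempts")
```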
2021-06-07 Update: I wasn't able to reproduce this in my June 4 to 5 local run (using the pipeline-optimization branch at commit 15c2d1889c8b28a04671fbc10a5a0436ba071289). To be investigated.
On 2021-05-28, Joost encountered `ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id']` during:
[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW
[2021-06-07 Update] Joost was using commit 6ae277be4189bfee6af52c82afde89cfdc2baabf, which is the tip of the branch that I renamed to pipeline-optimization_experimental_with_bubble-merged_gen-pygeoapi-config.
It does look like something that #53 intends to fix, i.e. flexible CSV header data loading (see the sketch below).
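A minimal sketch of that flexible loading, assuming pandas: read the header row first, then intersect the configured usecols with the columns actually present, instead of letting read_csv raise:

```python
from io import StringIO
import pandas as pd

# Columns the DSRA loader expects (from the error message above).
WANTED = ["asset_id", "structural", "nonstructural", "contents"]

def read_losses(csv_text: str) -> pd.DataFrame:
    # Peek at the real header instead of assuming the configured columns exist.
    header = pd.read_csv(StringIO(csv_text), nrows=0).columns
    present = [c for c in WANTED if c in header]
    missing = sorted(set(WANTED) - set(present))
    if missing:
        print(f"warning: columns missing from input: {missing}")
    return pd.read_csv(StringIO(csv_text), usecols=present)
```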
The same routine that Anthony ran on 2021-05-22 was successful though, and there did not seem to be any recent change to the upstream repos.
In Joost's .env file, only loadPsraModels is set to true; all the other load* variables are set to false.
... but on closer look, the routines for loadPsraModels are run first, and DSRA_outputs2postgres_lfs.py comes before all of the above, so that's probably not it.
Nope, Joost nuked the volumes.
Not at first glance, but maybe I missed something.
Refer to the Slack DM log between Joost and me on 2021-05-21.
[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW
python-opendrr_1 | Traceback (most recent call last):
python-opendrr_1 | File "DSRA_outputs2postgres_lfs.py", line 205, in <module>
python-opendrr_1 | main()
python-opendrr_1 | File "DSRA_outputs2postgres_lfs.py", line 65, in main
python-opendrr_1 | dfsr[retrofit] = GetDataframeForScenario(url, repo_list, retrofitPrefix, eqscenario, columnConfigParser, auth)
python-opendrr_1 | File "DSRA_outputs2postgres_lfs.py", line 139, in GetDataframeForScenario
python-opendrr_1 | dfLosses = pd.read_csv(StringIO(response.content.decode(response.encoding)),
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 676, in parser_f
python-opendrr_1 | return _read(filepath_or_buffer, kwds)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 448, in _read
python-opendrr_1 | parser = TextFileReader(fp_or_buf, **kwds)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 880, in __init__
python-opendrr_1 | self._make_engine(self.engine)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1114, in _make_engine
python-opendrr_1 | self._engine = CParserWrapper(self.f, **self.options)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1937, in __init__
python-opendrr_1 | _validate_usecols_names(usecols, self.orig_names)
python-opendrr_1 | File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1232, in _validate_usecols_names
python-opendrr_1 | raise ValueError(
python-opendrr_1 | ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id']
python-opendrr_1 | Command exited with non-zero status 1
python-opendrr_1 | 24.71user 10.78system 1:49.17elapsed 32%CPU (0avgtext+0avgdata 1758380maxresident)k
python-opendrr_1 | 248inputs+0outputs (4major+1899875minor)pagefaults 0swaps
python-opendrr_1 |
python-opendrr_1 | real 1m49.174s
python-opendrr_1 | user 0m24.716s
python-opendrr_1 | sys 0m10.788s
python-opendrr_1 exited with code 1
The compilation script was written and tested with CSV input files on the local file system, writing output into a database on localhost. Some modifications will be needed to read the OpenQuake outputs from a GitHub/GitLab or similar data repository, and the lines specifying the database location will also need to be modified.
To implement these changes, the location and format of the OpenQuake outputs will need to be defined, as well as the remote database service (see the sketch below).
Related to issue: OpenDRR/openquake-inputs#11
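A minimal sketch of those modifications, taking the input location and database host as arguments rather than hard-coding them (flag names are illustrative; pandas reads local paths and raw GitHub/GitLab URLs alike):

```python
import argparse
import pandas as pd

def main():
    parser = argparse.ArgumentParser(description="Load OpenQuake outputs")
    parser.add_argument("--input", required=True,
                        help="CSV path or raw URL of an OpenQuake output")
    parser.add_argument("--db-host", default="localhost",
                        help="remote database service host")
    parser.add_argument("--db-name", default="opendrr")
    args = parser.parse_args()

    df = pd.read_csv(args.input)  # works for file paths and URLs
    print(f"loaded {len(df)} rows; would load into {args.db_name}@{args.db_host}")

if __name__ == "__main__":
    main()
```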
Some time-extensive operations occur when building the stack, in processing the site exposure tables for the one scenario.
Consider preprocessing the site exposure tables and uploading the processed tables to the repo, to be copied straight into PostGIS instead.
Stems from a previous issue (OpenDRR/opendrr-data-store#40) on developing DSRA scenario tables at various aggregation levels: use the scenario shakemap grid points, converted to a polygon, to show the physical extent of each scenario. Current DSRA scenarios run on assets for entire P/T(s) and can be filtered down to match the shakemap scenario extents.
Currently the filter is set to sa(0.3) >= 0.03 for all DSRA scenario indicators and shakemap extents. Tiegan will investigate and make edits to the filter if needed (see the sketch below).
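A minimal sketch of deriving the scenario extent polygon from the shakemap grid points, assuming a shakemap_grid view with a geom column and an sa_0_3 column for the SA(0.3) values; using a convex hull (rather than, say, a concave hull) is an assumption:

```python
import psycopg2

# View and column names are assumptions for illustration.
EXTENT_SQL = """
SELECT ST_AsText(ST_ConvexHull(ST_Collect(geom)))
FROM shakemap_grid
WHERE sa_0_3 >= 0.03;
"""

with psycopg2.connect(dbname="opendrr") as conn, conn.cursor() as cur:
    cur.execute(EXTENT_SQL)
    print(cur.fetchone()[0])  # WKT polygon of the scenario's physical extent
```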
From riskprofiler dataset prioritization meeting:
Sometimes small changes in raw files such as the addition of an index column break the workflow and create maintenance overhead. The stack would be more flexible and robust if it dynamically reads file headers rather than assuming the inputs conform to a rigid standard.
HAZUS methodology, Section 14.3, describes shelter needs calculation. Murray had wanted to include some more nuanced indicators (Imm_LT5, Live_Alone, No_EngFr, LonePar3Kids, Indigenous - see attached below), but we don't have data on how those indicators affect demand for shelters. As of Sep 2021 we're omitting these indicators because we don't have that data.
displacement-shelter-models-with-damage-factors.pdf
We're also changing Murray's 'demographic' section back to 'ethnicity' as it was in HAZUS, using the 'VisMin' indicator, because groups of indicators must be correlated. In other words, the percentage of the population must sum to ~1 over the category. In future this could be adjusted to include more groupings than just 'VisMin'.
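A small check of that sum-to-~1 constraint could guard the indicator tables, assuming pandas; the grouping column names here are hypothetical:

```python
import pandas as pd

def check_group_shares(df: pd.DataFrame, group_cols, tol: float = 0.05):
    # Shares within one category grouping should sum to ~1 per row.
    totals = df[group_cols].sum(axis=1)
    bad_rows = df.index[(totals - 1.0).abs() > tol]
    if len(bad_rows):
        raise ValueError(
            f"shares of {group_cols} do not sum to ~1 for rows {list(bad_rows[:10])}"
        )

# Hypothetical usage with per-area population shares of the ethnicity grouping:
# check_group_shares(sauid_df, ["VisMin", "NonVisMin"])
```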
Assigning @yipjackie but if someone else has capacity and wants to work on this then they're welcome to. Some cool work has been coming out of UBC that could be relevant for Canadian context of displacement. Also no pressure for Jackie to implement, as even HAZUS folks decided not to implement anything besides ethnicity and income.
Relevant files are the building-level DSRA and SAUID files. Contact @wkhchow for more info on these.
Tiegan's new revised scenarios are now running off b0/r1 instead of the previous iterations of b0/r2.
Changes must be made to @drotheram's DSRA-to-postgres Python scripts and to my DSRA indicator scripts, as they will no longer work.
Need to confirm whether the previous b0/r2 scenarios should also be run, or whether they will be recalculated to run off b0/r1.