
model-factory's Issues

need additional/updated vs30 site models

Add site models for each province, not just vs30_bc_site_model.

Suggestion from Teigan: try vs30_CAN_site_model.csv for all of Canada and, if possible, run preprocessing and load it into the stack as a table directly from GitHub instead of processing on the stack side.
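A minimal sketch of what loading a preprocessed site-model table straight from GitHub into PostGIS could look like, assuming requests and psycopg2; the raw URL, target table, and connection settings are placeholders, not actual locations:

```python
# Sketch: stream a preprocessed site-model CSV from GitHub into PostGIS.
# The raw URL, target table, and connection settings are placeholders.
import io

import psycopg2
import requests

RAW_URL = (
    "https://raw.githubusercontent.com/OpenDRR/model-inputs/master/"
    "vs30_CAN_site_model.csv"  # hypothetical path
)

response = requests.get(RAW_URL, timeout=60)
response.raise_for_status()

conn = psycopg2.connect(dbname="opendrr", user="postgres", host="db")
with conn, conn.cursor() as cur:
    cur.copy_expert(
        'COPY "vs30_CAN_site_model" FROM STDIN WITH CSV HEADER',
        io.StringIO(response.text),
    )
```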

PSRA indicators (b0/r2 to b0/r1)

Similar to DSRA, future runs of PSRA will move from b0/r2 to b0/r1. The PSRA SQL scripts will need to be revised to reflect this change, and the PSRA*.py files may also need updating.

add_data.sh related Python scripts - flexible data loading

Dynamically load raw model datasets whose fields may change in name or type, by reading the fields from the input itself and loading them as found. Implement tests on inputs, with reasonable constraints on what the stack will load.

See also Issue #48
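A minimal sketch of the idea, assuming pandas; the constraint values and the required-column set are illustrative assumptions, not agreed limits:

```python
# Sketch of flexible loading: read whatever fields the raw dataset ships
# with, then sanity-check them before the stack ingests the table.
import pandas as pd

MAX_COLUMNS = 200          # assumed upper bound on field count
REQUIRED_COLUMNS = {"id"}  # assumed minimal set every dataset must carry

def load_raw_dataset(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)  # columns and dtypes come from the file itself
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"{path}: missing required columns {sorted(missing)}")
    if len(df.columns) > MAX_COLUMNS:
        raise ValueError(f"{path}: {len(df.columns)} columns exceeds limit")
    if df.empty:
        raise ValueError(f"{path}: no rows to load")
    return df
```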

Major Priorities

PSRA:

  • PSRA_copyTables.py (Preview WIP flexible-csv-headers branch)
    • Shrink the list of expected_headers (though these should probably be removed altogether eventually)
    • pylint, flake8
    • Skip the OpenQuake comment header (and deal with the U+FEFF BOM). (Originally taken care of by sed in add_data.sh.)
    • Read the CSV header from PostgreSQL dynamically, something like psql opendrr -c 'COPY (SELECT * FROM psra_BC.psra_BC_hcurves_pga WHERE FALSE) TO STDOUT WITH CSV HEADER;' (see the sketch after this list)
    • Quote SQL identifiers when necessary for column names with upper-case letters, ., (), etc.
    • Restore reading config.ini and add error checking if file not found (POSTGRES_* variables not defined)
    • Write some kind of unit test?
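A rough sketch of three of the items above (comment/BOM skipping, dynamic header lookup, identifier quoting), assuming psycopg2; the helper names are invented, not the branch's actual code:

```python
# Sketch only: helper names are invented, not PSRA_copyTables.py's own code.
import re

import psycopg2

def csv_lines_past_preamble(path):
    """Yield CSV lines, dropping a U+FEFF BOM and leading OpenQuake comments."""
    with open(path, encoding="utf-8-sig") as f:  # utf-8-sig strips the BOM
        for line in f:
            if line.startswith("#"):
                continue  # OpenQuake metadata comment (e.g. generated_by=...)
            yield line

def table_columns(conn, table):
    """Read a table's column names from PostgreSQL instead of a fixed list."""
    with conn.cursor() as cur:
        cur.execute(f"SELECT * FROM {table} WHERE FALSE")
        return [desc[0] for desc in cur.description]  # first element = name

def quote_identifier(name):
    """Double-quote identifiers that are not plain lower-case SQL names."""
    if re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        return name
    return '"%s"' % name.replace('"', '""')
```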

DSRA:

Minor Priorities

Exposure:

DSRA - revise SA(0.3) >= 0.02 filter

@tieganh @jvanulde discovered that some of the values from the earthquake scenarios md are different from our indicators.
Our original indicators / shakemap are filtered with SA(0.3) >= 0.02 (from previous discussions), whereas the md values are not filtered. Current scenarios (for BC) are run across all assets of BC regardless of magnitude.

In order to match the 'true' value of the scenario and be consistent with the scenario md, we will need to remove the filter and find another way to implement it.
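A small illustration of the change; the actual filter lives in the DSRA SQL/shakemap scripts, and the column name here is an assumption:

```python
# Illustration only: the kind of WHERE clause to be removed so indicators
# match the unfiltered scenario md values. "sa0p3" is an assumed column name.
FILTERED_SQL = "SELECT * FROM dsra_shakemap WHERE sa0p3 >= 0.02"  # current
UNFILTERED_SQL = "SELECT * FROM dsra_shakemap"  # matches the scenario md
```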

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: undefined. Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.

update physical exposure indicators

Met with @tieganh and @plesueur and agreed to make the following changes to physical exposure:

  • Remove genocc and replace with occtype (E_BldgOccG)

  • update all indicators using genocc with occtype/occtype1 as needed

  • E_BldgOccG: re-source to occtype

  • Et_BldgAreaRes

  • Et_BldgAreaComm

  • Et_BldgAreaCivic

  • Et_BldgAreaAgr

  • Collapse Et_ResLD, Et_ResMD, Et_ResHD into just Et_Res

  • Et_Comm

  • Et_Civic

  • Et_Agr

  • displaced household calculation in DSRA

  • update risk profile gsheet

  • export new geopackage to distribute (FGP, NHSL data, etc.)

psra - create hexgrid aggregations

  • create aggregation hexgrid (full complement) for PSRA national
  • create a few sample aggregations and include eqri to see if it makes sense to have them. Will follow up and check with @plesueur
  • may need to create hexgrids for individual provinces?

PSRA - add aggregation column for agg_curves_stats, agg_losses_stats, src_loss_table from filename

One of the new indicators for PSRA in Expected Loss (previously called PML) is 'e_Aggregation'. The plan is for that column to hold the region the result was created from, based on the exposure (i.e. NS (whole province), BC5920, ON3515-20, etc. for the time being).

Our current workflow combines all the data into one file, which is imported into the PostGIS db from Python. Currently the psra_{prov}_agg_curves_stats table does not hold any of the region information needed to correctly populate 'e_Aggregation'.

Had a conversation with @drotheram on this, and it looks like he can parse the region info from the ebR_{region}_agg_curves-stats_b0/r1.csv file and add a column containing the {region} info to the combined agg_curves_stats table that will be brought into the PostGIS db, so I can relate that column to 'E_Aggregation'.

@tieganh if there is any additional info you would like to add, we can discuss here.

For the time being I will hard-code {prov} AS 'e_Aggregation' so the column holds the province.
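A minimal sketch of that interim step, assuming pandas; the file-name pattern, the P/T list, and the output path are placeholders:

```python
# Sketch: combine per-region CSVs and carry the province as an interim
# 'region' column that later maps to e_Aggregation. Paths are placeholders.
import glob

import pandas as pd

frames = []
for prov in ("BC", "ON", "QC", "NS"):  # illustrative subset of P/Ts
    for path in glob.glob(f"ebR_{prov}_*agg_curves*stats*.csv"):
        df = pd.read_csv(path)
        df["region"] = prov  # interim: province only, refined later
        frames.append(df)

pd.concat(frames).to_csv("psra_agg_curves_stats.csv", index=False)
```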

revise hexbin to hexgrid naming

Keep naming consistent across all hex layers: change any hexbin names to hexgrid, e.g. nhsl_physical_exposure_indicators_hexbin_1km to physical_exposure_indicators_hexgrid_1km, etc.
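A sketch of generating the rename statements from the catalog rather than hand-listing tables; connection settings are placeholders, and the statements are printed for review rather than executed:

```python
# Sketch: emit ALTER TABLE ... RENAME statements for every table whose
# name contains 'hexbin'. Connection settings are placeholders.
import psycopg2

conn = psycopg2.connect(dbname="opendrr", user="postgres", host="db")
with conn, conn.cursor() as cur:
    cur.execute(
        """SELECT table_schema, table_name FROM information_schema.tables
           WHERE table_name LIKE '%hexbin%'"""
    )
    for schema, name in cur.fetchall():
        new_name = name.replace("hexbin", "hexgrid")
        print(f'ALTER TABLE {schema}."{name}" RENAME TO "{new_name}";')
```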

Eliminate the use of 'all_indicators'

The use of 'all_indicators' has become redundant now that we have eliminated the thematic indicators.

Let's change the use of 'all_indicators' in PostGIS to simply 'indicators' and propagate those changes to the ES indices as well.

psra revisions (Apr 2022)

Follow-ups on the issues below:

  • Attach shapes to agg_loss, expected_loss indicators? (Canada-wide, FSA?)
    • Any indicators beyond CSD that are not 1-1 are too big to have geometries attached, and should remain as tables only.
  • Remove ss_region from Canada-wide indicators.
  • Canada-wide PML (missing the values for retrofits; they appear to be zeros), likely missing an asset value.
    • Removing ss_region solved the issue.
  • Canada-wide average annual loss (building retrofit values are 0 in that table).
    • Removing ss_region solved the issue.
  • FSA (expected loss FSA; they are all 0 values).
    • Don't see any issues on my end, as all values are populated; some are 0 but not all. Needs clarification.
  • Remove micromort for deaths.
    • Kept the default calculation for the fatality indicators and removed micromort (/0.000001).
  • PML fatality values need attention(?)
    • Don't see any fatality-related values for PML (expected loss / agg loss). Needs clarification.
  • Update the PSRA data dictionary as required.
    • Removed the micromort mention in the fatality indicators (en/fr) for now.
  • e_Aggregation should change to the part of the filename between {P/T} and _agg, i.e. ebR_BC_V_Capital_agg_curves-q05_b0.csv -> BC_V_Capital.

update \copy statements for new changes in update_sovi_hazthreat_feb2021 branch

Updates to be made in update_sovi_hazthreat_feb2021 branch
https://github.com/OpenDRR/model-factory/tree/update_sovi_hazthreat_feb2021/scripts

copyAncillaryTables.py

  • UPDATE/ADD copy statements for:

  • Create_table_mh_intensity_canada_v2.sql

  • Create_table_mh_thresholds.sql

  • Create_table_sovi_census_canada.sql

  • Create_table_sovi_index_canada_v2.sql

  • Create_table_sovi_thresholds.sql

  • Create_psra_merge_into_national_indicators.sql

  • Create_scenario_risk_master_tables.sql

PSRA_copyTables.py

  • ADD copy statement for:
  • psra_1.Create_tables.sql (xref create table lines 498 - 544; processing xref code is commented out in psra_2)

psra_qc_avg_losses_stats has no records

Investigate, on the newest build of the db on the stack, why the psra_qc_avg_losses_stats table has no records, which has a cascading effect on qc_all_indicators_b / _s.
All other provinces check out fine.
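A quick diagnostic sketch that compares row counts across the per-province tables; the schema/table naming pattern and connection settings are assumptions:

```python
# Sketch: confirm QC is the only province whose avg_losses_stats is empty.
# Naming pattern (psra_{prov}.psra_{prov}_avg_losses_stats) is an assumption.
import psycopg2

PROVS = ["AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU",
         "ON", "PE", "QC", "SK", "YT"]

conn = psycopg2.connect(dbname="opendrr", user="postgres", host="db")
with conn, conn.cursor() as cur:
    for prov in PROVS:
        cur.execute(
            f"SELECT count(*) FROM psra_{prov}.psra_{prov}_avg_losses_stats"
        )
        print(prov, cur.fetchone()[0])
```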

update ebR_{P/T}_agg_losses-(q05,q95,stats) to include region

One of the columns of {P/T}_expected_loss_fsa is e_Aggregation. Currently the field is populated with {prov}, updated as our scripts go through each {P/T}; i.e. for BC, e_Aggregation = 'BC' for every row in BC_expected_loss_fsa.

After checking with Tiegan/Phil, the e_Aggregation field should instead be populated with only the 'region' it came from, i.e. ebR_BC_V_Capital_agg_losses-q50_r1.csv -> V_Capital, ebR_BC_V_CentralIsland_agg_curves-q05_b0.csv -> V_CentralIsland, etc.

PSRA_combineAggCurvesStats.py seems to have columns.append('region'), but I don't think it actually appends the region to the final output file?
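A sketch of the filename parsing, matching the examples above (whole-province files with no region segment would need a fallback):

```python
# Sketch: extract the region between the {P/T} code and '_agg' from the
# OpenQuake output file name, per the examples above.
import re

PATTERN = re.compile(r"^ebR_[A-Z]{2}_(?P<region>.+?)_agg_")

def region_from_filename(filename: str) -> str:
    m = PATTERN.match(filename)
    return m.group("region") if m else ""  # fallback needed for ebR_NS_agg_...

assert region_from_filename("ebR_BC_V_Capital_agg_losses-q50_r1.csv") == "V_Capital"
assert region_from_filename("ebR_BC_V_CentralIsland_agg_curves-q05_b0.csv") == "V_CentralIsland"
```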

Dependency Dashboard

This issue provides visibility into Renovate updates and their statuses.

Move unused scripts to the attic

Move scripts/*.{py,sql} that are no longer used into a scripts/attic directory, and add a scripts/attic/README.md documenting why (e.g. superseded by refactored scripts, etc.).

Ideally do the above after one or more of:

  • listing manually, or detecting automatically, the interdependencies of the Python and SQL scripts (see the sketch below)
  • listing all nested dependencies (down to the file level) of add_data.sh
  • eliminating the Git LFS download, so that a test stack build can be done (without cost) with GitHub CI
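A rough sketch of the automatic-detection idea in the first bullet: flag scripts whose file names appear in other scripts' text, as a starting point for deciding what is safe to move to the attic.

```python
# Sketch: crude interdependency scan of the Python and SQL scripts.
from pathlib import Path

scripts = list(Path("scripts").glob("*.py")) + list(Path("scripts").glob("*.sql"))
names = {p.name for p in scripts}

for script in scripts:
    text = script.read_text(errors="replace")
    used = sorted(n for n in names if n != script.name and n in text)
    if used:
        print(f"{script.name} references: {', '.join(used)}")
```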

DSRA Compilation Script

Need to make changes to the DSRA compilation script (not yet committed to the repo); a sketch follows the list below:

  • read tables in from the private GitHub repo
  • output tables to PostgreSQL
  • flexible inputs/outputs via a .ini file
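A sketch of what the .ini-driven flexible I/O could look like, assuming pandas and SQLAlchemy; the file name and the section and option names are invented for illustration:

```python
# Sketch: .ini-driven input (private GitHub repo) and output (PostgreSQL).
# Section/option names below are assumptions, not the final format.
import configparser

import pandas as pd
import sqlalchemy

config = configparser.ConfigParser()
config.read("DSRA_compile.ini")  # hypothetical file name

url = config["input"]["table_url"]          # raw URL in the private repo
token = config["input"]["github_token"]     # token for private access
engine = sqlalchemy.create_engine(config["output"]["postgres_dsn"])

# pandas forwards storage_options as HTTP headers for http(s) URLs
df = pd.read_csv(url, storage_options={"Authorization": f"token {token}"})
df.to_sql(config["output"]["table_name"], engine, if_exists="replace", index=False)
```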

columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id']

2022-08-09 Update: @wkhchow encountered this error too, but could not reproduce it a second time. Suspected to be an occasional network error that caused an incomplete download. Moved this issue from OpenDRR/opendrr-api to OpenDRR/model-factory. Perhaps adding a fail-safe mechanism to DSRA_outputs2postgres_lfs.py (verify checksum, retry download, etc.) would mitigate the issue? A sketch follows below.
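One possible shape for such a fail-safe, sketched with requests and pandas; the function and constant names are illustrative, not the script's own:

```python
# Sketch: retry the download and verify the expected columns before
# handing the CSV to pandas. Names here are illustrative.
import time
from io import StringIO

import pandas as pd
import requests

EXPECTED = {"asset_id", "structural", "nonstructural", "contents"}

def fetch_csv(url, auth=None, retries=3):
    for attempt in range(1, retries + 1):
        response = requests.get(url, auth=auth, timeout=120)
        if response.ok and response.text:
            header = set(response.text.splitlines()[0].split(","))
            if EXPECTED <= header:  # all expected columns present
                return pd.read_csv(StringIO(response.text))
        time.sleep(2 ** attempt)  # back off before retrying
    raise RuntimeError(f"could not fetch a complete CSV from {url}")
```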


2021-06-07 Update: I wasn't able to reproduce this in my June 4 to 5 local run (using the pipeline-optimization branch at commit 15c2d1889c8b28a04671fbc10a5a0436ba071289). To be investigated.


On 2021-05-28, Joost encountered ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id'] during

[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW

[2021-06-07 Update] Joost was using commit 6ae277be4189bfee6af52c82afde89cfdc2baabf, the tip of the branch that I renamed to pipeline-optimization_experimental_with_bubble-merged_gen-pygeoapi-config.

It does look like something that #53 intends to fix, i.e. flexible CSV header data loading.

The same routine that Anthony ran on 2021-05-22 was successful though, and there did not seem to be any recent change to the upstream repos.

Hypothesis 1: only have PSRA data enabled in the ENV?

In Joost's .env file, only loadPsraModels is set to true; all the other load* variables are set to false.

... but on closer look, the routines for loadPsraModels run first, and DSRA_outputs2postgres_lfs.py comes before all of the above, so that's probably not it.

Hypothesis 2: Result of keeping volumes from previous run?

Nope, Joost nuked the volumes.

Hypothesis 3: Upstream repos changed?

Not at first glance, but maybe I missed something.

More info

Refer to the Slack DM log between Joost and me on 2021-05-21.

Failure log

[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW
python-opendrr_1     | Traceback (most recent call last):
python-opendrr_1     |  File "DSRA_outputs2postgres_lfs.py", line 205, in <module>
python-opendrr_1     |   main() 
python-opendrr_1     |  File "DSRA_outputs2postgres_lfs.py", line 65, in main
python-opendrr_1     |   dfsr[retrofit] = GetDataframeForScenario(url, repo_list, retrofitPrefix, eqscenario, columnConfigParser, auth)
python-opendrr_1     |  File "DSRA_outputs2postgres_lfs.py", line 139, in GetDataframeForScenario
python-opendrr_1     |   dfLosses = pd.read_csv(StringIO(response.content.decode(response.encoding)),
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 676, in parser_f
python-opendrr_1     |   return _read(filepath_or_buffer, kwds)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 448, in _read
python-opendrr_1     |   parser = TextFileReader(fp_or_buf, **kwds)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 880, in __init__
python-opendrr_1     |   self._make_engine(self.engine)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1114, in _make_engine
python-opendrr_1     |   self._engine = CParserWrapper(self.f, **self.options)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1937, in __init__
python-opendrr_1     |   _validate_usecols_names(usecols, self.orig_names)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1232, in _validate_usecols_names
python-opendrr_1     |   raise ValueError(
python-opendrr_1     | ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id']
python-opendrr_1     | Command exited with non-zero status 1
python-opendrr_1     | 24.71user 10.78system 1:49.17elapsed 32%CPU (0avgtext+0avgdata 1758380maxresident)k
python-opendrr_1     | 248inputs+0outputs (4major+1899875minor)pagefaults 0swaps
python-opendrr_1     | 
python-opendrr_1     | real	1m49.174s
python-opendrr_1     | user	0m24.716s
python-opendrr_1     | sys	0m10.788s
python-opendrr_1 exited with code 1

PSRA compilation - implement data ingest from repositories and non-local database output

The compilation script was written and tested with CSV input files on the local file system and output into a database on localhost. Some modifications will be needed to read the OpenQuake outputs from a GitHub/GitLab or similar data repository. Some of the lines specifying the database location will also need to be modified.

To implement these changes, the location and format of the OpenQuake outputs will need to be defined, as well as the remote database service.
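A minimal sketch of parameterizing the database location with environment variables instead of hard-coding localhost; the variable names echo the POSTGRES_* convention mentioned elsewhere but are still assumptions:

```python
# Sketch: build the output engine URL from the environment rather than
# hard-coding localhost. Variable names are assumptions.
import os

import sqlalchemy

engine = sqlalchemy.create_engine(
    "postgresql://{user}:{password}@{host}:{port}/{db}".format(
        user=os.environ["POSTGRES_USER"],
        password=os.environ["POSTGRES_PASS"],
        host=os.environ.get("POSTGRES_HOST", "localhost"),
        port=os.environ.get("POSTGRES_PORT", "5432"),
        db=os.environ.get("POSTGRES_DB", "opendrr"),
    )
)
```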

Create script to parse collapse_probability.csv from

preprocess site exposure tables (pathways)

Building the stack involves some time-intensive operations when processing the site exposure tables for the one scenario.
Consider preprocessing the site exposure tables and uploading the processed table to the repo, so it can be copied straight into PostGIS instead.

DSRA scenarios - Investigate to find best extent

Stems from a previous issue (OpenDRR/opendrr-data-store#40) on developing DSRA scenario tables at various aggregation levels, using the scenario shakemap grid points, converted to a polygon, to show the physical extent of each scenario. Current DSRA scenarios run on assets for entire P/T(s) and can be filtered down to match the shakemap scenario extents.

Currently the filter is set to sa(0.3) >= 0.03 for all DSRA scenario indicators and shakemap extents. Tiegan will investigate and make edits to the filter if needed.
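A sketch of deriving an extent polygon from the shakemap grid points with shapely (an in-database ST_ConvexHull/ST_ConcaveHull would work equally well); the column names and input file are assumptions:

```python
# Sketch: scenario extent polygon from shakemap grid points.
# Column names and the CSV path are assumptions.
import pandas as pd
from shapely.geometry import MultiPoint

grid = pd.read_csv("shakemap_grid.csv")  # hypothetical export of grid points
strong = grid[grid["sa0p3"] >= 0.03]     # same threshold as the current filter
extent = MultiPoint(list(zip(strong["lon"], strong["lat"]))).convex_hull
print(extent.wkt)
```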

Integrate nuanced demographic info into shelter calc

HAZUS methodology, Section 14.3, describes the shelter needs calculation. Murray had wanted to include some more nuanced indicators (Imm_LT5, Live_Alone, No_EngFr, LonePar3Kids, Indigenous; see attached below), but we don't have data on how those indicators affect demand for shelters, so as of Sep 2021 we are omitting them.

displacement-shelter-models-with-damage-factors.pdf

We're also changing Murray's 'demographic' section back to 'ethnicity' as it was in HAZUS, using the 'VisMin' indicator, because groups of indicators must be correlated: in other words, the percentages of the population must sum to ~1 over the category. In future this could be adjusted to include more groupings than just 'VisMin'.
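A tiny sketch of checking that sum-to-one constraint with pandas; the input file, column naming, and tolerance are all assumptions:

```python
# Sketch: indicator percentages within one category (here assumed to be
# columns prefixed 'VisMin') should sum to ~1 for each row.
import pandas as pd

df = pd.read_csv("sauid_demographics.csv")                    # hypothetical
category = [c for c in df.columns if c.startswith("VisMin")]  # assumed naming
totals = df[category].sum(axis=1)
bad = df[(totals - 1.0).abs() > 0.05]                         # tolerance is illustrative
print(f"{len(bad)} rows violate the sum-to-one constraint")
```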

Assigning @yipjackie but if someone else has capacity and wants to work on this then they're welcome to. Some cool work has been coming out of UBC that could be relevant for Canadian context of displacement. Also no pressure for Jackie to implement, as even HAZUS folks decided not to implement anything besides ethnicity and income.

Relevant files are building DSRA and SAUID. Can contact @wkhchow for more info on these.

DSRA updates for new scenarios (b0/r1)

Tiegan's new revised scenarios now run off b0/r1 instead of the previous b0/r2 iterations.
Changes must be made to @drotheram's DSRA-to-postgres Python scripts and my DSRA indicator scripts, as they will no longer work.
Need to confirm whether the previous b0/r2 scenarios will run as well, or whether they will be recalculated to run off b0/r1.
