bcfishobs's Issues

just create tables, not views

pg_featureserv doesn't yet support views and pg_tileserv is a bit fussy about them. Instead of creating the view whse_fish.fiss_fish_obsrvtn_events_vw, just create a table; maybe call it whse_fish.fiss_fish_obsrvtn_events_sp.

output tables - distinct locations or all locations

Currently a point is created for each observation record in whse_fish.fiss_fish_obsrvtn_events_sp, while whse_fish.fiss_falls_events_sp includes only distinct locations. This is fine for current use cases, but standardizing would be valuable. It is probably best to go with what I've done for observations: use the events table (no geoms) for actual processing, and use events_sp for visualizing. It is handy to have every input record easily queryable by its source id.

matching stream observations > 100m from stream

fish_observation_point_id = 466322 on Wardropper Creek (north side of Lake Cowichan) does not get matched to a stream.

The site is <100m from a stream but is ~125m from the non-waterbody portion of the stream. Presumably there is no match via the xref and linear_feature_id.

As we already have a ~98% match rate this may not be a priority, but to fix it we could try:

  • attempt to match sites 100-500m from a stream that do not have exact linear_feature_id matches via blue_line_key, watershed_code etc, and attributes like gazetted_name
  • allow matching stream records to flow lines in waterbodies
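The current and proposed tiers could be sketched roughly as follows; the field names, function, and the final tier are hypothetical (the real matching is done in SQL against FWA):

```python
# Rough sketch of the matching tiers, with the proposed 100-500m
# attribute-based fallback added; inputs are hypothetical.
def match_tier(dist_m: float, lookup_match: bool, names_agree: bool) -> str:
    if dist_m <= 100 and lookup_match:
        return "matched - within 100m; lookup"
    if dist_m <= 100:
        return "matched - within 100m; closest stream"
    if dist_m <= 500 and lookup_match:
        return "matched - 100-500m; lookup"
    if dist_m <= 500 and names_agree:
        return "matched - 100-500m; attributes (proposed)"
    return "unmatched"

# a site like the Wardropper Creek one (~125m, no lookup match) would now
# match if attributes such as gazetted_name agree with the stream's
print(match_tier(125, False, True))  # matched - 100-500m; attributes (proposed)
```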

de-aggregated view is primary product of interest

The aggregated locations are nice and efficient, but most workflows benefit from having one record per observation.
Modify the schema and outputs so that the de-aggregated data is what gets dumped and is readily available to other tools.

improve qa/summary

If records are accidentally dropped by a bad query in 07_create-output-tables.sql, they will not be caught and reported. Modify the QA query to double-check that all records that are matched to a stream make it through the analysis.
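A minimal sketch of the intended check, assuming the counts are pulled from the matched-events table and the final output table (the function name and inputs are illustrative):

```python
def check_counts(n_matched: int, n_output: int) -> None:
    """Raise if matched records were dropped between the matching
    step and the final output tables."""
    if n_output != n_matched:
        raise ValueError(
            f"QA failure: {n_matched} matched records in, "
            f"{n_output} records out"
        )

# in practice both counts would come from SELECT count(*) queries
check_counts(345687, 345687)  # passes silently
```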

write outputs to non-whse schema, if writing to bcfishpass consider moving code to bcfishpass repo

Not a priority, but while I am tidying up and hardening the various dependencies of bcfishpass, I am thinking it would be nice to keep WHSE_FISH and similar DataBC schemas clean (holding only the source data) and generally write derived data to separate schemas.

This repo is generally for bcfishpass-related work; it could be a sub-repository within the bcfishpass repo, and outputs could be written to the bcfishpass schema.

CI/CD

The job could be scheduled and run with GHA, and the result posted to a file somewhere (rather than requiring every user to run the job).

cannot add postgres view to arcgis desktop

Seen on ArcGIS 10.1. Perhaps this isn't an issue for current versions, but it could be valuable to write whse_fish.fiss_fish_obsrvtn_events_vw as a table or maybe a materialized view.

suggest a readme hint that rebuilding from older versions of repo should perhaps remove all targets

It makes sense that a build of fiss_fish_obsrvtn_events_vw via sql/schema.sql does not happen with calls to

$ rm .make/fiss_fish_obsrvtn_pnt_sp
$ make

but it took me a while to clue into what was going on. I think the view wasn't there before because the tables in the db were dumped in from an older db, before the upgrade to the view.

Does it make sense to add "for a full rebuild, run:

rm -d -f .make

" to the Makefile for the make white belts?

increase distance threshold for points with no match via fwa-50k lookup

We are perhaps just a little too conservative with the 100m cutoff for matching observations to streams where there is no lookup-table confirmation of the match. A few valid-seeming observations do not make it into the output.

Perhaps go up to ~150m or 200m in cases where there is no lookup match?

reconfigure for easier refresh

To avoid having to CASCADE if any views etc depend on the data:

  • use views instead of temp tables where practical
  • create tables once, truncate/reload/refresh when re-processing
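A sketch of the create-once / truncate-reload pattern, using sqlite3 as a stand-in for PostgreSQL; because the table is never dropped, a dependent view keeps working across reloads and no CASCADE is needed:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, species TEXT)")
# a downstream view that depends on the table
con.execute(
    "CREATE VIEW events_vw AS "
    "SELECT species, count(*) AS n FROM events GROUP BY species"
)

def reload(rows):
    # truncate/reload inside one transaction; the table (and therefore
    # the dependent view) is never dropped
    with con:
        con.execute("DELETE FROM events")
        con.executemany("INSERT INTO events VALUES (?, ?)", rows)

reload([(1, "CO"), (2, "CO"), (3, "ST")])
reload([(1, "CO")])  # re-process: no DROP ... CASCADE required
print(con.execute("SELECT species, n FROM events_vw").fetchall())  # [('CO', 1)]
```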

installation for windows

Incredible tool. I hope to get it running, and am working on installing it on Windows. So far I have done the following. I am not experienced with most of these tools, so please pardon my ignorance and newbieisms.

I did not:

  • get started with a fresh Windows Subsystem for Linux (WSL) (Ubuntu 20.04.1 LTS) - maybe this was my first mistake?!

I did:

  • Install virt env and bcdata as per the instructions at bcfishpass
  • Download wget (actually just found it elsewhere on my system) and install into the same bcfp virt env directory
  • git clone the bcfishobs repo locally https://github.com/smnorris/bcfishobs.git
  • rename 01_load.sh to 01_load.bat and 02_process.sh to 02_process.bat
  • change the line continuation characters in both bat files from \ to ^

When trying to run 01_load.bat from the Anaconda prompt (after cd bcfishobs and within the bcfp virt env), an issue arose at:

  1. (bcfp) C:\scripts\bcfishobs>ogr2ogr -f PostgreSQL "PG:host=$PGHOST user=$PGUSER dbname=$PGDATABASE port=$PGPORT" -lco OVERWRITE=YES -lco SCHEMA=whse_fish -nln wdic_waterbodies_load -nlt NONE whse_fish.wdic_waterbodies.csv

fails with

ERROR 1: PQconnectdb failed.
invalid integer value "$PGPORT" for connection option "port"

tried

export PGOGR='host=localhost user=postgres dbname=postgres password=postgres port=5432'

and

export PGHOST=localhost etc.

as per README.md on bcfishpass, to no avail, with the message

'export' is not recognized as an internal or external command

Below is a screenshot of my env variables. Should they have dollar signs in front of them, perhaps?

[screenshot of environment variables]

  2. Line 45 of 01_load.bat fails at psql -c "CREATE TABLE whse_fish.fiss_obstacles_unpublished (id integer .......... due to unexpected line endings. I could not find a solution other than re-writing the command with no spaces and copying it into the terminal directly. This worked, but is there another way that you know of?

  3. bcdata bc2pg WHSE_FISH.FISS_OBSTACLES_PNT_SP --fid FISH_OBSTACLE_POINT_ID fails with the following error message. Any idea what might be going wrong? Thank you!

(bcfp) C:\scripts\bcfishobs>bcdata bc2pg WHSE_FISH.FISS_OBSTACLES_PNT_SP --fid FISH_OBSTACLE_POINT_ID
Traceback (most recent call last):
  File "c:\users\allan\.conda\envs\bcfp\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\allan\.conda\envs\bcfp\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\allan\.conda\envs\bcfp\Scripts\bcdata.exe\__main__.py", line 7, in <module>
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\bcdata\cli.py", line 350, in bc2pg
    conn = pgdata.connect(db_url)
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\pgdata\__init__.py", line 20, in connect
    return Database(url, schema, sql_path=sql_path, multiprocessing=multiprocessing)
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\pgdata\database.py", line 41, in __init__
    self.engine = create_engine(url)
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\sqlalchemy\engine\__init__.py", line 500, in create_engine
    return strategy.create(*args, **kwargs)
  File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\sqlalchemy\engine\strategies.py", line 56, in create
    plugins = u._instantiate_plugins(kwargs)
AttributeError: 'NoneType' object has no attribute '_instantiate_plugins'

CLI options

Options that could be valuable to add to the CLI:

  • output table name
  • initial matching threshold (currently 1500m)
  • modify the current 100m / 500m thresholds
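Sketched with argparse below; the option names and defaults are illustrative, not existing flags:

```python
import argparse

# hypothetical CLI exposing the thresholds the processing currently hard-codes
parser = argparse.ArgumentParser(prog="bcfishobs")
parser.add_argument("--out-table", default="whse_fish.fiss_fish_obsrvtn_events")
parser.add_argument("--max-distance", type=int, default=1500,
                    help="initial matching threshold (m)")
parser.add_argument("--match-distance", type=int, default=100,
                    help="match distance without lookup confirmation (m)")
parser.add_argument("--match-distance-lookup", type=int, default=500,
                    help="match distance with lookup confirmation (m)")

args = parser.parse_args(["--max-distance", "2000"])
print(args.max_distance, args.match_distance)  # 2000 100
```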

handle observations at the same location

In the output event table, position is the primary key and we insert with ON CONFLICT DO NOTHING; this means records are lost if they are at the same location as another. Points with slightly different source locations can be snapped to exactly the same point on the network.

Fixing #33 would help ensure problems like this are found immediately
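One way to keep all source records, sketched in Python with illustrative column names: aggregate observation ids per snapped position instead of discarding conflicts.

```python
from collections import defaultdict

# two source points snapped to the same position on the network
observations = [
    {"fish_observation_point_id": 101, "blue_line_key": 354090000, "measure": 309.0},
    {"fish_observation_point_id": 102, "blue_line_key": 354090000, "measure": 309.0},
    {"fish_observation_point_id": 103, "blue_line_key": 354090000, "measure": 500.0},
]

# aggregate ids per position rather than ON CONFLICT DO NOTHING
events = defaultdict(list)
for o in observations:
    events[(o["blue_line_key"], o["measure"])].append(o["fish_observation_point_id"])

print(events[(354090000, 309.0)])  # [101, 102] - neither record is lost
```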

minor reorganization for easy updates

Load data that doesn't change in script 1; download BCGW data and process in script 2.
This way, updates can easily be processed via a single script called from a cron job.

slight inconsistencies

Are the (very) minor changes to qa_summary.csv outputs due to imprecise queries, or to changes in the source data?

Increase number of observation data points by incorporating point_type_code = 'Summary'

I have noticed that 01_clean-fishobs.sql filters on o.point_type_code = 'Observation'. This excludes o.point_type_code = 'Summary'.

I am not sure exactly when the submission requirements to fulfill permit conditions changed, but I do know that since 2008 a permit holder need only submit observations as a summary. For this reason, there will be many thousands of observations that are not included in the current bcfishobs outputs.

The requirement to only summarize fish captured by species and life stage is very unfortunate, as so much good information is lost that way, in my opinion. Classifying fish by life stage is likely extremely error-ridden and is very easily done afterwards when each individual fish is input. Nevertheless, the summary does contain an accurate account of how many fish of each species are captured, and is often the only data submitted (example: fish_observation_point_id = 471343).

I am obviously not expecting that you are going to jump on this or change anything at this point but it is something that I will explore in the next couple of months as we look towards densities of fish captured in electrofishing sites and fish observations vs. stream/watershed characteristics.

Btw, this package is a thing of beauty. ⭐

reduce measure precision

Observations at measures such as 309.442m and 309.761m are in essentially the same spot.
Try rounding the values, and perhaps storing downstream_route_measure as an integer.
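For illustration, rounding to the nearest metre as suggested; note the two example measures still land on distinct integers, so only sub-metre noise is dropped:

```python
# round downstream_route_measure to the nearest metre before storing
measures = [309.442, 309.761]
rounded = [round(m) for m in measures]
print(rounded)  # [309, 310]
```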

remove unneeded dependencies

pgdata - unnecessary
psqlcsv - use \copy or, even better, create a table to hold the summary and write the results with a timestamp each time the job is run

WHSE_FISH schema does not exist

mkdir -p .make
mkdir -p data
# wdic waterbodies table for relating 50k waterbodies to fwa
wget -qNP data https://hillcrestgeo.ca/outgoing/public/whse_fish/whse_fish.wdic_waterbodies.csv.zip
  HTTP/1.1 200 OK
  Server: nginx/1.18.0 (Ubuntu)
  Date: Wed, 08 Feb 2023 19:34:14 GMT
  Content-Type: application/zip
  Content-Length: 16294451
  Last-Modified: Sat, 30 Oct 2021 00:26:53 GMT
  Connection: keep-alive
  ETag: "617c914d-f8a233"
  Accept-Ranges: bytes
unzip -qjun -d data data/whse_fish.wdic_waterbodies.csv.zip
# load via ogr because it makes cleaning the input file easy.
ogr2ogr \
	-f PostgreSQL \
	"PG:postgresql://postgres@localhost:5432/bcfp_test" \
	-lco OVERWRITE=YES \
	-lco SCHEMA=whse_fish \
	-nln wdic_waterbodies_load \
	-nlt NONE \
	data/whse_fish.wdic_waterbodies.csv
ERROR 1: ERROR:  schema "whse_fish" does not exist
LINE 1: CREATE TABLE "whse_fish"."wdic_waterbodies_load" ( "ogc_fid"...

load and process data in a temp schema then flush to `bcfishobs` when complete

Recreating the schema and tables on every load also means updating grants for application roles; therefore, errors on refresh currently break any downstream applications.
To fix:

  • create all empty bcfishobs relations on setup (only run once)
  • download and run processing in temp schema
  • flush data to bcfishobs only on successful completion of entire load
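A sketch of the staging-then-flush idea, using sqlite3 as a stand-in (PostgreSQL schemas are simplified to tables here): the long-lived table is created once, and data only moves into it in a single transaction on success.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# created once on setup; in PostgreSQL, grants on this table persist
con.execute("CREATE TABLE bcfishobs_events (id INTEGER PRIMARY KEY)")
# per-run staging area (stand-in for a temp schema)
con.execute("CREATE TABLE staging_events (id INTEGER PRIMARY KEY)")

# download + processing writes to staging only
con.executemany("INSERT INTO staging_events VALUES (?)", [(1,), (2,)])

# flush on successful completion: atomic, so downstream consumers
# never see a half-loaded or dropped table
with con:
    con.execute("DELETE FROM bcfishobs_events")
    con.execute("INSERT INTO bcfishobs_events SELECT * FROM staging_events")

print(con.execute("SELECT count(*) FROM bcfishobs_events").fetchone()[0])  # 2
```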

cascade when dropping view

psycopg2.errors.DependentObjectsStillExist: cannot drop table whse_fish.fiss_fish_obsrvtn_pnt_sp because other objects depend on it
DETAIL:  materialized view bcfishpass.observations_vw depends on table whse_fish.fiss_fish_obsrvtn_pnt_sp
materialized view bcfishpass.freshwater_fish_habitat_accessibility_model_salmon_vw depends on materialized view bcfishpass.observations_vw
materialized view bcfishpass.freshwater_fish_habitat_accessibility_model_steelhead_vw depends on materialized view bcfishpass.observations_vw
materialized view bcfishpass.freshwater_fish_habitat_accessibility_model_observations_vw depends on materialized view bcfishpass.observations_vw
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

generate primary key for falls

fish_obstacle_point_id is only guaranteed to be a unique identifier within a given download; it is not stable between downloads (when new records are loaded).

Generate a stable primary key based on geom and some other combination of columns
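One possible approach, sketched with illustrative columns: hash rounded coordinates plus an attribute, so the id is deterministic across downloads and insensitive to tiny coordinate noise.

```python
import hashlib

def stable_falls_id(x: float, y: float, height_m: float) -> str:
    # round coords so tiny positional jitter between downloads
    # does not change the id
    key = f"{round(x, 1)}:{round(y, 1)}:{height_m}"
    return hashlib.sha1(key.encode()).hexdigest()[:16]

a = stable_falls_id(1190000.04, 450000.04, 5.0)
b = stable_falls_id(1190000.01, 450000.02, 5.0)  # same point within rounding
print(a == b)  # True
```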

output from PG11 and PG12 are slightly different

$ diff qa_match_report_1.csv diffs_pg12/qa_match_report_1.csv
diff --git a/qa_match_report_1.csv b/diffs_pg12/qa_match_report_1.csv
index 38c8cbd..05e45b8 100644
--- a/qa_match_report_1.csv
+++ b/diffs_pg12/qa_match_report_1.csv
@@ -1,11 +1,11 @@
 match_type,n_distinct_events,n_observations
-A. matched - stream; within 100m; lookup,54010,166828
-B. matched - stream; within 100m; closest stream,6668,19294
-C. matched - stream; 100-500m; lookup,4542,30501
-D. matched - waterbody; construction line within 1500m; lookup,11775,113250
-E. matched - waterbody; construction line within 1500m; closest,1360,15814
-TOTAL MATCHED,78355,345687
-F. unmatched - less than 1500m to stream,1740,5348
+A. matched - stream; within 100m; lookup,53916,166813
+B. matched - stream; within 100m; closest stream,6694,19289
+C. matched - stream; 100-500m; lookup,4549,30510
+D. matched - waterbody; construction line within 1500m; lookup,11828,113250
+E. matched - waterbody; construction line within 1500m; closest,1367,15814
+TOTAL MATCHED,78354,345676
+F. unmatched - less than 1500m to stream,1742,5359
 G. unmatched - more than 1500m to stream,103,722
-TOTAL UNMATCHED,1843,6070
-GRAND TOTAL,80198,351757
+TOTAL UNMATCHED,1845,6081
+GRAND TOTAL,80199,351757

bcdata 0.8

the download date is now logged to bcdata.log, not public.bcdata

minor cleanup

Reference a DATABASE_URL variable for all psql commands so that the various $PG* vars don't need to be set

consistent fish_obsrvtn_pnt_distinct_id

instead of using a serial PK, ensure the PK is consistent between runs by deriving it from the stream network reference, like this:

id bigint
     GENERATED ALWAYS AS ((((blue_line_key::bigint + 1) - 354087611) * 10000000) + round(downstream_route_measure::bigint)) STORED PRIMARY KEY
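For illustration, the same key computed in Python (assuming the constant 354087611 is an offset chosen so ids start small): two events on the same blue_line_key differ only in the measure component.

```python
def event_id(blue_line_key: int, downstream_route_measure: float) -> int:
    # mirrors the generated column: offset blue_line_key, shift left by
    # 10^7, add the measure rounded to the nearest metre
    return ((blue_line_key + 1) - 354087611) * 10_000_000 + round(downstream_route_measure)

print(event_id(354087611, 0))      # 10000000
print(event_id(354087611, 100.4))  # 10000100
```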

add FISS obstacles

Use existing logic to also match WHSE_FISH.FISS_OBSTACLES_PNT_SP to FWA features. The obstacles data is already better matched to FWA streams than observations, but there are still records that would benefit from matching via the xref rather than just finding the closest stream.

improve matching of observations to streams based on watershed code

On submission and load to the BCGW, fish observations are snapped to 1:20k streams. However, observations may occur on unmapped streams, and older observations may not have been snapped to 1:50k streams (or not snapped to a stream at all).

Modify the logic for matching observations to streams (not waterbodies) to use watershed codes when the distance from the stream is greater than a small threshold. Review the logic for matching streams to waterbodies to see if using the watershed code would improve the match.
