smnorris / bcfishobs
Reference BC Known Fish Observations to the Freshwater Atlas stream network
License: Apache License 2.0
pg_featureserv doesn't yet support views and pg_tileserv is a bit fussy about them. Instead of creating view whse_fish.fiss_fish_obsrvtn_events_vw, just create a table. Maybe call it whse_fish.fiss_fish_obsrvtn_events_sp.
Currently a point is created for each observation record in whse_fish.fiss_fish_obsrvtn_events_sp, while whse_fish.fiss_falls_events_sp only includes distinct locations. This is fine for current use cases but standardizing would be valuable. It is probably best to go with what I've done for observations: use the events table for actual processing (no geoms), and use events_sp for visualizing, as sketched below. It is handy to have every input record easily queryable by the source id.
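A minimal sketch of that pattern for falls, assuming the FWA_LocateAlong function from fwapg is available (column names are illustrative):

-- processing table has no geometry; derive the point layer for visualization
CREATE TABLE whse_fish.fiss_falls_events_sp AS
SELECT
  e.*,
  FWA_LocateAlong(e.blue_line_key, e.downstream_route_measure) AS geom
FROM whse_fish.fiss_falls_events e;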
No longer used
fish_observation_point_id = 466322 on Wardropper Creek (north side of Lake Cowichan) does not get matched to a stream. The site is <100m from a stream but is ~125m from the non-waterbody portion of the stream. Presumably there is no match via the xref and linear_feature_id.
As we already have a ~98% match rate this may not be a priority, but to fix we could try linear_feature_id matches via blue_line_key, watershed_code etc, and attributes like gazetted_name.
The aggregated locations are nice and efficient but most workflows benefit from having one record per observation.
Modify schema and outputs so that the de-aggregated data is what gets dumped / is readily available for other tools
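A sketch of the de-aggregation, assuming the aggregated table stores source ids in an array column (obs_ids is a stand-in for the actual column name):

-- one row per source observation rather than one per distinct location
CREATE TABLE whse_fish.fiss_fish_obsrvtn_events_deagg AS
SELECT
  unnest(e.obs_ids) AS fish_observation_point_id,  -- array column name assumed
  e.blue_line_key,
  e.downstream_route_measure
FROM whse_fish.fiss_fish_obsrvtn_events e;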
Species codes are easier to work with than ids, but by storing the ids in the array we can use https://www.postgresql.org/docs/12/intarray.html for indexed (instant) searches. I think this is worth the tradeoff.
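For reference, a minimal sketch of the intarray-backed search (species_ids is a stand-in for the actual array column name):

CREATE EXTENSION IF NOT EXISTS intarray;

-- GIN index with the intarray operator class makes array searches near-instant
CREATE INDEX ON whse_fish.fiss_fish_obsrvtn_events
USING gin (species_ids gin__int_ops);

-- find events that include any of the given species ids (&& is array overlap)
SELECT *
FROM whse_fish.fiss_fish_obsrvtn_events
WHERE species_ids && ARRAY[3, 17];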
If records are accidentally dropped with a bad query in 07_create-output-tables.sql, they will not be caught and reported on. Modify the QA query to double check that all records (that are matched to a stream) make it through the analysis.
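A sketch of the intended check; table names here are placeholders for whatever holds the matched records and the final output:

-- the two counts should be equal; any difference means records were dropped
SELECT
  (SELECT count(*) FROM whse_fish.obs_matched) AS n_matched,       -- placeholder
  (SELECT count(*) FROM whse_fish.fiss_fish_obsrvtn_events_sp) AS n_output;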
Not a priority, but while I am tidying up / hardening the various dependencies to bcfishpass, I am thinking it would be nice to keep WHSE_FISH and similar DataBC schemas clean, holding only the source data, and generally writing derived data to separate schemas.
This repo is generally for bcfishpass related work; it could be a sub-repository within the bcfishpass repo and outputs could be written to the bcfishpass schema.
The job could be scheduled and run with GHA and the result posted to file somewhere (rather than requiring every user to run the job).
On ArcGIS 10.1 (perhaps this isn't an issue for current versions) it could be valuable to write whse_fish.fiss_fish_obsrvtn_events_vw as a table or maybe a materialized view.
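A minimal sketch, assuming a materialized view works for the clients involved (the _mvw name is made up):

CREATE MATERIALIZED VIEW whse_fish.fiss_fish_obsrvtn_events_mvw AS
SELECT * FROM whse_fish.fiss_fish_obsrvtn_events_vw;

-- refresh whenever the underlying tables are reloaded
REFRESH MATERIALIZED VIEW whse_fish.fiss_fish_obsrvtn_events_mvw;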
Makes sense that a build of fiss_fish_obsrvtn_events_vw via sql/schema.sql does not happen with calls to
$ rm .make/fiss_fish_obsrvtn_pnt_sp
$ make
but it took me a while to clue into what was going on. Thinking that it wasn't there before because the tables in the db were dumped in from an older db before the upgrade to the view.
Does it make sense to add "for a full rebuild, run rm -d -f .make" to the Makefile for the make white belts?
We are perhaps just a little bit too conservative with the 100m cutoff for matching observations to streams where there is no lookup table confirmation of the match. A few valid-seeming observations do not make it into the output.
Perhaps go up to ~150 or 200m in cases where there is no lookup match?
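The conditional cutoff might look something like this (dist_to_stream and has_lookup_match are placeholder names):

-- relax the no-lookup cutoff from 100m to 150m; lookup-confirmed matches
-- already go out to 500m
WHERE (has_lookup_match AND dist_to_stream <= 500)
   OR (NOT has_lookup_match AND dist_to_stream <= 150)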
To avoid having to CASCADE if any views etc depend on the data:
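One approach (a sketch): truncate and reload in place rather than dropping and recreating the table:

-- no DROP means dependent views (and grants) are untouched
BEGIN;
TRUNCATE whse_fish.fiss_fish_obsrvtn_pnt_sp;
INSERT INTO whse_fish.fiss_fish_obsrvtn_pnt_sp
SELECT * FROM whse_fish.fiss_fish_obsrvtn_pnt_load;  -- load table name assumed
COMMIT;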
Incredible tool. Hope to get it running. Working on installing this on Windows. So far I have done the following. I am not experienced with most of these tools, so please pardon my ignorance and newbieisms.
I did not:
I did:
https://github.com/smnorris/bcfishobs.git
When trying to run 01_load.bat from the Anaconda prompt (at cd bcfishobs and within the bcfp virtual env), an issue arose at:
(bcfp) C:\scripts\bcfishobs>ogr2ogr -f PostgreSQL "PG:host=$PGHOST user=$PGUSER dbname=$PGDATABASE port=$PGPORT" -lco OVERWRITE=YES -lco SCHEMA=whse_fish -nln wdic_waterbodies_load -nlt NONE whse_fish.wdic_waterbodies.csv
fails with
ERROR 1: PQconnectdb failed.
invalid integer value "$PGPORT" for connection option "port"
tried
export PGOGR='host=localhost user=postgres dbname=postgres password=postgres port=5432'
and
export PGHOST=localhost
etc.
as per README.md on bcfishpass, to no avail, with message
'export' is not recognized as an internal or external command
Below is a screenshot of my env variables. Should they have dollar signs in front of them perhaps?
line 45 of 01_load.bat fails at psql -c "CREATE TABLE whse_fish.fiss_obstacles_unpublished (id integer ..........
due to unexpected line endings. Could not seem to find a solution other than re-writing the command with no spaces and copying it into the terminal directly. This worked, but is there another way that you know of?
bcdata bc2pg WHSE_FISH.FISS_OBSTACLES_PNT_SP --fid FISH_OBSTACLE_POINT_ID
fails with the following error message:
Any idea what might be going wrong? Thank you!
(bcfp) C:\scripts\bcfishobs>bcdata bc2pg WHSE_FISH.FISS_OBSTACLES_PNT_SP --fid FISH_OBSTACLE_POINT_ID
Traceback (most recent call last):
File "c:\users\allan\.conda\envs\bcfp\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\allan\.conda\envs\bcfp\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\allan\.conda\envs\bcfp\Scripts\bcdata.exe\__main__.py", line 7, in <module>
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\bcdata\cli.py", line 350, in bc2pg
conn = pgdata.connect(db_url)
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\pgdata\__init__.py", line 20, in connect
return Database(url, schema, sql_path=sql_path, multiprocessing=multiprocessing)
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\pgdata\database.py", line 41, in __init__
self.engine = create_engine(url)
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\sqlalchemy\engine\__init__.py", line 500, in create_engine
return strategy.create(*args, **kwargs)
File "c:\users\allan\.conda\envs\bcfp\lib\site-packages\sqlalchemy\engine\strategies.py", line 56, in create
plugins = u._instantiate_plugins(kwargs)
AttributeError: 'NoneType' object has no attribute '_instantiate_plugins'
document the tables so descriptions/comments show up in https://features.hillcrestgeo.ca/bcfishobs/collections.html
Options that could be valuable to add to the CLI:
In the output event table, position is the primary key and we insert with ON CONFLICT DO NOTHING. This means records are lost if they are in the same location as another: points with slightly different locations can be snapped to exactly the same point on the network.
Fixing #33 would help ensure problems like this are found immediately
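A sketch of merging rather than dropping co-located records, assuming the aggregated obs_ids array column:

-- append ids for an already-occupied position instead of discarding the record
INSERT INTO whse_fish.fiss_fish_obsrvtn_events AS e
  (blue_line_key, downstream_route_measure, obs_ids)
VALUES (354091234, 1500, ARRAY[466999])              -- illustrative values
ON CONFLICT (blue_line_key, downstream_route_measure)
DO UPDATE SET obs_ids = e.obs_ids || excluded.obs_ids;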
As per @NewGraphEnvironment comments, these columns are very useful:
life_stage
activity
acat_report_url
Add them (and potentially other columns) to the output for review, rather than requiring the user to join or refer back to the source.
Need to point to $DATABASE_URL here:
https://github.com/smnorris/bcfishobs/blob/main/Makefile#L66
Load data that doesn't change in script 1; download from BCGW and process in script 2. This way, updates can easily be processed via a single script called via a cron job.
Are the (very) minor changes to qa_summary.csv outputs due to imprecise queries, or is it changes to the source data?
I have noticed in 01_clean-fishobs.sql that o.point_type_code = 'Observation'. This excludes o.point_type_code = 'Summary'.
I am not sure when exactly the submission requirements to fulfill permit conditions changed, but I do know that since 2008 a permit holder need only submit observations as a summary. For this reason, there will be many, many thousands of observations that are not included in the current bcfishobs outputs.
The requirement to only summarize fish captured by species and life stage is very unfortunate, as there is so much good information lost that way, in my opinion. Classifying fish by life stage is likely extremely error-ridden and is very easily done afterwards when each individual fish is input. Nevertheless, the summary does contain an accurate account of how many fish of each species were captured and is often the only data submitted (example fish_observation_point_id = 471343).
I am obviously not expecting that you are going to jump on this or change anything at this point, but it is something that I will explore in the next couple of months as we look towards densities of fish captured in electrofishing sites and fish observations vs. stream/watershed characteristics.
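If Summary records were ever to be included, the filter change in 01_clean-fishobs.sql might be as small as this sketch (downstream logic would also have to handle summary rows):

-- include both individual observations and post-2008 summary submissions
WHERE o.point_type_code IN ('Observation', 'Summary')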
Btw, this package is a thing of beauty. ⭐
Observations at measures such as 309.442m and 309.761m are in essentially the same spot.
Try rounding values and perhaps storing downstream_route_measure as an integer.
This will make it easier to join the two outputs if/when necessary
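A sketch of the rounding, applied wherever the event measures are written:

-- whole-metre measures are easier to join across outputs
SELECT
  blue_line_key,
  round(downstream_route_measure)::integer AS downstream_route_measure
FROM whse_fish.fiss_fish_obsrvtn_events;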
pgdata - unnecessary
psqlcsv - use \copy
OR, even better - create a table to hold the summary and write the results with a timestamp each time job is run
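A sketch of the timestamped summary table (names assumed):

CREATE TABLE IF NOT EXISTS whse_fish.qa_summary (
  run_date          timestamptz DEFAULT now(),
  match_type        text,
  n_distinct_events integer,
  n_observations    integer
);

-- append each run's results rather than overwriting a csv
INSERT INTO whse_fish.qa_summary (match_type, n_distinct_events, n_observations)
SELECT match_type, n_distinct_events, n_observations
FROM whse_fish.qa_match_report;  -- source table/query name assumed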
mkdir -p .make
mkdir -p data
# wdic waterbodies table for relating 50k waterbodies to fwa
wget -qNP data https://hillcrestgeo.ca/outgoing/public/whse_fish/whse_fish.wdic_waterbodies.csv.zip
HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Wed, 08 Feb 2023 19:34:14 GMT
Content-Type: application/zip
Content-Length: 16294451
Last-Modified: Sat, 30 Oct 2021 00:26:53 GMT
Connection: keep-alive
ETag: "617c914d-f8a233"
Accept-Ranges: bytes
unzip -qjun -d data data/whse_fish.wdic_waterbodies.csv.zip
# load via ogr because it makes cleaning the input file easy.
ogr2ogr \
-f PostgreSQL \
"PG:postgresql://postgres@localhost:5432/bcfp_test" \
-lco OVERWRITE=YES \
-lco SCHEMA=whse_fish \
-nln wdic_waterbodies_load \
-nlt NONE \
data/whse_fish.wdic_waterbodies.csv
ERROR 1: ERROR: schema "whse_fish" does not exist
LINE 1: CREATE TABLE "whse_fish"."wdic_waterbodies_load" ( "ogc_fid"...
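The load appears to assume the schema already exists; on a fresh database, a guard like this ahead of the ogr2ogr call would cover it:

CREATE SCHEMA IF NOT EXISTS whse_fish;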
Joining to streams: if a point is at zero measure, current logic is not quite right; points at zero measure will be lost.
https://github.com/smnorris/bcfishobs/blob/main/sql/07_create-output-tables.sql#L103
update to match like this, or using +/- .01 or similar
ON r.blue_line_key = s.blue_line_key
and r.downstream_route_measure < s.upstream_route_measure
and r.downstream_route_measure >= s.downstream_route_measure
Recreating the schema and tables on every load means also updating grants for application roles. Therefore, errors on refresh currently break any downstream applications.
To fix:
- create bcfishobs relations on setup (only run once)
- write to bcfishobs only on successful completion of the entire load
psycopg2.errors.DependentObjectsStillExist: cannot drop table whse_fish.fiss_fish_obsrvtn_pnt_sp because other objects depend on it
DETAIL: materialized view bcfishpass.observations_vw depends on table whse_fish.fiss_fish_obsrvtn_pnt_sp
materialized view bcfishpass.freshwater_fish_habitat_accessibility_model_salmon_vw depends on materialized view bcfishpass.observations_vw
materialized view bcfishpass.freshwater_fish_habitat_accessibility_model_steelhead_vw depends on materialized view bcfishpass.observations_vw
materialized view bcfishpass.freshwater_fish_habitat_accessibility_model_observations_vw depends on materialized view bcfishpass.observations_vw
HINT: Use DROP ... CASCADE to drop the dependent objects too.
fish_obstacle_point_id is only guaranteed to be a unique identifier when the data is downloaded; it is not stable between downloads (when new records are loaded).
Generate a stable primary key based on geom and some other combination of columns
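A sketch of one way to derive it, hashing the geometry together with a couple of (assumed) attribute columns:

-- stable across downloads as long as location and attributes do not change
SELECT
  md5(
    ST_AsEWKT(geom)
    || coalesce(obstacle_code, '')   -- attribute columns assumed
    || coalesce(source, '')
  ) AS stable_id,
  *
FROM whse_fish.fiss_obstacles_pnt_sp;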
The load table is required when running .wdic_waterbodies but is deleted in the final step. Create and delete the temp file in the wdic_waterbodies recipe.
To make short traces visible, we can set downstream_route_measure to 20 where it is <= 20 before generating a downstream line.
However, this won't always be valid - there are bound to be some that do not continue to 20m - the tweak would have to be conditional.
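The conditional version might look like this, checking that the stream actually extends past 20m (FWA table and column names as in whse_basemapping):

-- bump very low measures to 20 only where the stream continues at least that far
-- note: this checks only the event's own segment, so it is conservative
SELECT
  CASE
    WHEN e.downstream_route_measure <= 20
     AND s.upstream_route_measure >= 20
    THEN 20
    ELSE e.downstream_route_measure
  END AS measure_for_trace
FROM whse_fish.fiss_fish_obsrvtn_events e
INNER JOIN whse_basemapping.fwa_stream_networks_sp s
ON e.linear_feature_id = s.linear_feature_id;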
$ diff qa_match_report_1.csv diffs_pg12/qa_match_report_1.csv
diff --git a/qa_match_report_1.csv b/diffs_pg12/qa_match_report_1.csv
index 38c8cbd..05e45b8 100644
--- a/qa_match_report_1.csv
+++ b/diffs_pg12/qa_match_report_1.csv
@@ -1,11 +1,11 @@
 match_type,n_distinct_events,n_observations
-A. matched - stream; within 100m; lookup,54010,166828
-B. matched - stream; within 100m; closest stream,6668,19294
-C. matched - stream; 100-500m; lookup,4542,30501
-D. matched - waterbody; construction line within 1500m; lookup,11775,113250
-E. matched - waterbody; construction line within 1500m; closest,1360,15814
-TOTAL MATCHED,78355,345687
-F. unmatched - less than 1500m to stream,1740,5348
+A. matched - stream; within 100m; lookup,53916,166813
+B. matched - stream; within 100m; closest stream,6694,19289
+C. matched - stream; 100-500m; lookup,4549,30510
+D. matched - waterbody; construction line within 1500m; lookup,11828,113250
+E. matched - waterbody; construction line within 1500m; closest,1367,15814
+TOTAL MATCHED,78354,345676
+F. unmatched - less than 1500m to stream,1742,5359
 G. unmatched - more than 1500m to stream,103,722
-TOTAL UNMATCHED,1843,6070
-GRAND TOTAL,80198,351757
+TOTAL UNMATCHED,1845,6081
+GRAND TOTAL,80199,351757
Date of download is now logged to bcdata.log, not public.bcdata.
For quicker queries/lookups and easier visualization, add a field flagging maximal observations, by species.
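A simplified sketch using a window over each stream; a real implementation would compare watershed codes so that events on upstream tributaries are handled correctly (column names assumed):

-- flag the most upstream event per species on each blue_line_key
SELECT
  e.*,
  e.downstream_route_measure =
    max(e.downstream_route_measure) OVER (
      PARTITION BY e.species_code, e.blue_line_key
    ) AS maximal
FROM whse_fish.fiss_fish_obsrvtn_events e;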
Reference the DATABASE_URL variable for all psql commands, so various $PG_ vars don't need to be set.
Instead of using a serial PK, ensure the PK is consistent between runs by using a reference to the stream network like this:
id bigint GENERATED ALWAYS AS ((((blue_line_key::bigint + 1) - 354087611) * 10000000) + round(downstream_route_measure)::bigint) STORED PRIMARY KEY
It seems the IDs can change over time (as with FISS Obstacles).
There is not much we can do about this but it is worth noting in the README.
Use existing logic to also match WHSE_FISH.FISS_OBSTACLES_PNT_SP to FWA features. The obstacles data is already better matched to FWA streams than observations, but there are still records that would benefit from matching via the xref rather than just finding the closest stream.
On submission and load to the BCGW, fish observations are snapped to 1:20K streams. However, observations may occur on unmapped streams, and older observations may have been snapped to 1:50K streams (or not snapped to a stream at all).
Modify the logic for matching observations (to streams, not waterbodies) to use watershed codes when distance from stream is greater than a small threshold. Review logic of matching streams to waterbodies to see if using the watershed code will improve the match.