FSADT1-316 Investigate FME for querying large data
For FME api that could allow setup cron job in openshift, some reading link:
https://apps.nrs.gov.bc.ca/int/jira/browse/INFRA-13455
https://apps.nrs.gov.bc.ca/int/confluence/display/AR/2021-02-12+NRPTI+data+flow+options
https://apps.nrs.gov.bc.ca/int/confluence/pages/viewpage.action?pageId=91193899
https://apps.nrs.gov.bc.ca/int/confluence/pages/viewpage.action?pageId=91199846
Advantage:
- Copy data really fast
- Support complicate data transform (renaming, combine columns)
Restriction:
- The process of getting FME Server and API is unknown, not sure how to handle data merge and table updates
- Take time to learn and hard finding examples
Alternative Solution:
- Run db script to create the table with relationships
- Use FME Desktop for one-time data transfer with option "Create if needed" and "insert"
- Write a function in nestjs application to check and do db updates every night
Steps to copy the Forest_Client table from oracle db to postgres db in openshift:
- Get FME Desktop
- Download citrix workspace app https://www.citrix.com/downloads/workspace-app/
- Go to dts.gov.bc.ca -> login with idir -> find the desktop “Kamloops Desktop - ArcGIS 10-8 Test” -> click on it to start Citrix desktop
- Click okay for the pop up window for security and permission
- From the start menu, search for FME Desktop -> FME work bench
- Inside FME, could create readers and writers which read from original database and write into new database
- Add reader: select readers from top menu bar -> add reader -> format select oracle non-spatial -> and add database connection, enter credentials -> add parameter, select the table name -> click "Ok"
4. Once the reader is added, could click the front triangle to expand and see the table columns
5. Add writer: select writers from top menu bar -> add writer -> format select postgressql -> and add database connection, enter credentials -> table definition select "automatic" -> click "Ok". The following screenshot is an example connect to a database in openshift that got port forward to 4200 on localhost.
6. After click ok to add the writer, it will pop up a feature type window, where we could define the table name in the postgres db, and the operation and table handling, change the table name to whatever you like and click ok. This also allows you to select if you want the writer to insert or update or delete data, and how to handle the case when has an existing table, like "Drop and Create" or "Create if needed"
7. If just want a direct data copy without any restructure, could just go to "View" in the top menu bar -> select "Windows" -> select "Feature Type Connection". This will open a session at the bottom, where we could click on to select the table name under "source" and select the table name under "destination", and then click "connect". This will link the table attributes from the source db table to the new db table
8. If we want to rename the column in the new table in postgres db, or add more column, click on the setting icon next to the writer
This will open the feature type settings. Let's select "User Attributes" tab and "Manual" for attribute definition. This will allow us to rename the column, re-describe the column, and add or delete columns, and order the columns. There is a list of icon at the bottom for adding/deleting/reordering columns
9. If want to combine two columns in the oracle table and put into a new column in the postgres table, could add a transformer "String Concatenator", press enter
10. A transformer is added for combine column values. Update the feature type connection, to let oracle source table connect to the input of the string concatenator, and let output of the string concatenator connect to the postgres destination table
11. Click on the setting icon for the string concatenator, give a name for the new attribute, and select the string type as "Attribute Value" and select the column name from dropdown for "String Value". So we add a new column called "incorporation_number" here, and its value is the combine of two columns, "REGISTRY_COMPANY_TYPE_CODE + CORP_REGN_NMBR "
12. Click on the setting icon for the writer, add an extra column for "incorporation_number" as well, and then click on the triangle from "incorporation_number" under the string concatenator to link to the "incorporation_number" under the writer, so it's mapped
13. To add a primary key, click on the setting icon for the writer, "user attribute" tab, could add a column with type serial, and then select "primary key" under "index"
14. The final flow chart looks like this
15. To connect to the remote openshift postgres db, download the openshift cli tools from namespace, and use that to do a port forwarding, so our above Postgres config works. From the command line `oc project [namespace]` and `oc port-forward [postgres-db-pod] 4200:5432 `
16. Click the Run button in the FME to run the workspace
Reference reading:
Importing Database Tables, Raster Data, Vector Data
Managing Feature Type Connections
To update and manage database connections