cartodb / analytics-toolbox-core

A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities

License: Other

Shell 0.35% JavaScript 61.87% Makefile 3.01% PLpgSQL 6.97% Python 17.34% Scala 7.48% Java 2.80% Cython 0.08% MDX 0.11%
bigquery snowflake carto gis redshift geospatial databricks postgres sql analytics-toolbox

analytics-toolbox-core's Introduction

CARTO Analytics Toolbox Core

The CARTO Analytics Toolbox is a set of UDFs and Stored Procedures to unlock Spatial Analytics. It is organized into modules based on the functionality they offer. The toolbox is cloud-native, which means it is available for different data warehouses: BigQuery, Snowflake, Redshift, Postgres, and Databricks. It is built on top of each data warehouse's GIS features, extending and complementing this functionality.


This repo contains the core open-source modules of the toolbox. CARTO offers a set of premium modules that are available for CARTO users.

Getting started

How you use the functions in this project depends on the data warehouse you are using. In BigQuery and Snowflake you can access them directly as shared resources without having to install them; for the rest, you will have to install them locally on your database.

BigQuery

You can use the functions directly, as they are globally shared in the US region.

SELECT `carto-os.carto.H3_CENTER`('84390cbffffffff')

If you need to use them from the Europe region use:

SELECT `carto-os-eu.carto.H3_CENTER`('84390cbffffffff')

If you need to install them on your own VPC or in a different region, follow the instructions later on.

Snowflake

The easiest way to start using these functions is to add them to your data warehouse through the Snowflake Marketplace. Go there and install the listing using the regular method. After that you should be able to use the functions in the location where you installed them. For example, try:

SELECT carto_os.carto.H3_FROMGEOGPOINT(ST_POINT(-3.7038, 40.4168), 4)

If you need to install them directly, not through the data share process, follow the instructions later on.

Redshift

Right now the only way to access the Analytics Toolbox is to install it directly on your database. Follow the instructions later on.

Documentation

BigQuery: https://docs.carto.com/analytics-toolbox-bigquery
Snowflake: https://docs.carto.com/analytics-toolbox-snowflake
Redshift: https://docs.carto.com/analytics-toolbox-redshift
Postgres: https://docs.carto.com/analytics-toolbox-postgres
Databricks: https://docs.carto.com/analytics-toolbox-databricks

Development

BigQuery: README.md
Snowflake: README.md
Redshift: README.md
Postgres: README.md
Databricks: README.md

Useful make commands

To run tests, switch to a specific cloud directory. For example, Snowflake: cd clouds/snowflake.

# All tests
make test

# Specific module(s)
make test modules=h3
make test modules=h3,transformations

# Specific function(s)
make test functions=H3_POLYFILL
make test functions=H3_POLYFILL,ST_BUFFER

Contribute

This project is public. We are more than happy to receive feedback and contributions. Feel free to open an issue with a bug report, a question, or a discussion topic, or open a pull request with a fix or a new feature.

analytics-toolbox-core's People

Contributors

aarroyosal, aitorch, alberhander, alex-graciano, algunenano, alvarorm22, andy-esch, arredond, deansherwin, dependabot[bot], francois-baptiste, github-actions[bot], giuliacarella, jatorre, jesus89, jgoizueta, jtmiclat, jvillarf, luipir, manuellr, miguelangelmorenochacon, smazu, tedezed, vdelacruzb, vehrka, volaya


analytics-toolbox-core's Issues

Snowflake ST_DELAUNAYPOLYGONS error with aggregation

Describe the bug
Hello! I am getting the following error when using ST_DELAUNAYPOLYGONS with aggregation, but not while using ST_VORONOIPOLYGONS with aggregation. Reproducible example below using CARTO Snowflake documentation.

Invalid aggregate function in ON clause [ARRAY_AGG(CORRELATION(CORRELATION(DATA.POINT)))]

To Reproduce

;-- RUNS: ST_VORONOIPOLYGONS

SELECT CARTO.CARTO.ST_VORONOIPOLYGONS(
  ARRAY_CONSTRUCT(
    ST_ASGEOJSON(ST_POINT(-75.833, 39.284))::STRING,
    ST_ASGEOJSON(ST_POINT(-75.6, 39.984))::STRING,
    ST_ASGEOJSON(ST_POINT(-75.221, 39.125))::STRING,
    ST_ASGEOJSON(ST_POINT(-75.521, 39.325))::STRING
  )
)

;-- RUNS: ST_VORONOIPOLYGONS

WITH 

DATA AS (
    SELECT ST_ASGEOJSON(ST_POINT(-75.833, 39.284))::STRING POINT UNION ALL
    SELECT ST_ASGEOJSON(ST_POINT(-75.6, 39.984))::STRING  POINT UNION ALL
    SELECT ST_ASGEOJSON(ST_POINT(-75.221, 39.125))::STRING  POINT UNION ALL
    SELECT ST_ASGEOJSON(ST_POINT(-75.521, 39.325))::STRING  POINT 
)

SELECT CARTO.CARTO.ST_VORONOIPOLYGONS(ARRAY_AGG(POINT))
FROM DATA
GROUP BY ALL

;-- RUNS: ST_DELAUNAYPOLYGONS

SELECT CARTO.CARTO.ST_DELAUNAYPOLYGONS(
  ARRAY_CONSTRUCT(
    ST_ASGEOJSON(ST_POINT(-75.833, 39.284))::STRING,
    ST_ASGEOJSON(ST_POINT(-75.6, 39.984))::STRING,
    ST_ASGEOJSON(ST_POINT(-75.221, 39.125))::STRING,
    ST_ASGEOJSON(ST_POINT(-75.521, 39.325))::STRING
  )
)

;-- DOES NOT RUN: ST_DELAUNAYPOLYGONS Invalid aggregate function in ON clause [ARRAY_AGG(CORRELATION(CORRELATION(DATA.POINT)))]


WITH 

DATA AS (
    SELECT ST_ASGEOJSON(ST_POINT(-75.833, 39.284))::STRING POINT UNION ALL
    SELECT ST_ASGEOJSON(ST_POINT(-75.6, 39.984))::STRING  POINT UNION ALL
    SELECT ST_ASGEOJSON(ST_POINT(-75.221, 39.125))::STRING  POINT UNION ALL
    SELECT ST_ASGEOJSON(ST_POINT(-75.521, 39.325))::STRING  POINT 
)

SELECT CARTO.CARTO.ST_DELAUNAYPOLYGONS(ARRAY_AGG(POINT))
FROM DATA
GROUP BY ALL

Quadint prefix search

With quadkeys, it's easy to do a string prefix match to perform range queries (i.e. startswith(qk15, qk10); a sketch of this join follows below the use case). Is there a quick equivalent with quadints (i.e. SQL only)? Converting to quadkey or using TOPARENT (a bit mask to extract the level from the parent id) would work, but I'd rather avoid the JS UDF in the join.

Use case:

  • store a polygon as a collection of low zoom QK cells (perhaps varying zooms, but still non-overlapping) and join to points represented as a high zoom QK cell.

Would also be interested in this for s2 and h3 (though h3 non-nesting isn't ideal).
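For reference, the quadkey prefix join described above, written as plain SQL (table and column names are hypothetical; STARTS_WITH is BigQuery's native prefix test):

-- a zoom-15 quadkey falls inside a zoom-10 cell iff it begins with it
SELECT pt.*
FROM points AS pt
JOIN polygon_cells AS poly
  ON STARTS_WITH(pt.qk15, poly.qk10);

The open question is whether the same test can be done on raw quadints without converting to strings or calling a JS UDF.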

option for makevalid => TRUE when converting geojson back to geography in BQ

Hi, very excited to use these tools -- thanks for making them publicly available.

I'm running into frequent circumstances where a buffer operation returns an invalid polygon, which causes the entire query to fail with an error like ST_GeogFromGeoJSON failed: Invalid polygon loop: Edge 1011 crosses edge 4140; in loop 1. These kinds of invalidities are not consequential for the applications I work on, but they do break queries routinely (and often unfixably).

I would love it if functions could allow passing through the make_valid => TRUE option wherever there's a call to ST_GEOGFROMGEOJSON in the SQL. For the specific case of buffer, that happens here:

ST_GEOGFROMGEOJSON(@@BQ_PREFIX@@transformations.__BUFFER(ST_ASGEOJSON(geog),radius, units, steps))

This feels like a design question beyond the specific buffer application, so thought I'd file an issue instead of a PR at first.
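For what it's worth, BigQuery's ST_GEOGFROMGEOJSON does accept a make_valid named argument, so the requested pass-through could look like the sketch below (the option plumbing is hypothetical; only the inner call is from the repo):

-- invalid loops are repaired instead of failing the whole query
ST_GEOGFROMGEOJSON(
  @@BQ_PREFIX@@transformations.__BUFFER(ST_ASGEOJSON(geog), radius, units, steps),
  make_valid => TRUE
)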

Core functions no longer available on Snowflake

On Snowflake, all UDFs have disappeared from the carto schema and are no longer available in the data share. The functions were working as recently as yesterday, so this seems to be a new issue.

ST_PLACEKEYWHERE.sql file is empty

Many thanks for making this functionality available.

I've deployed the functions to our own bucket so that we can use them against our data in the eu-west2 region.

However, I noticed that the file sql/placekey/ST_PLACEKEYWHERE.sql is empty, which breaks the deployment script: it fails and the remaining scripts aren't loaded (with subsequent 'function not found' errors when attempting to run queries).

As a work-around I copied the function script from your bucket and updated the file. I can create a PR for this, but it's probably quicker/easier for you to do?

Function not found: jslibs.turf.ST_BUFFER

I am executing a query in BigQuery, but it is not working; I get the following error:

Function not found: jslibs.turf.ST_BUFFER at [2: 3]

But this line works perfectly:
SELECT jslibs.turf.ST_BUFFER(ST_GEOGPOINT(-33.2187979, -70.6843369), 1000, 'meters', 10) AS geo;


jslibs not available in regions other than US

Hi Javier,

I just came across your article and I really like it! I was particularly impressed by the fact that it instantly runs in BigQuery.

But I ran into one issue: when I run the functions on our internal datasets, they don't work and I get the error: "function not found".

Not 100% sure, but I think this is because the function is only available in the US region (if I run it on a US dataset, it does work).

Are you planning to deploy these functions in other regions apart from the US, or should we just figure this out ourselves?

Thanks either way for building this great addition to BQ!

ST_H3_POLYFILLFROMGEOG problem

First of all, an amazing idea to package these functions as a public dataset.

I'm having a problem using ST_H3_POLYFILLFROMGEOG. My original problem was trying to polyfill Germany, but I think it's also apparent here.

Compare what you show in the article

[screenshot]

to what I get in BigQuery Geo Viz using the same SQL:

[screenshot]

The same (?) thing happens for me in Germany; these are the hexagons I'm getting:

WITH countries AS (
  SELECT country, jslibs.h3.ST_H3_POLYFILLFROMGEOG(geometry, 8) AS cells, geometry
  FROM `conrad-analytics-staging.public.administrative_boundaries`)
SELECT 'red' AS stroke, geometry, jslibs.h3.ST_H3_BOUNDARY(cell) AS geom
FROM countries, UNNEST(countries.cells) AS cell
WHERE country = 'Germany'

[screenshot]

This is the boundary it polyfills:

[screenshot]

adding support for middle east

Looking at the list of supported regions, the Middle East is not there.
Is it planned? Is there a workaround to use the existing functions in the Middle East?

thank you!

ST_BUFFER is giving the wrong radius at different Latitudes

We noticed that when we run the ST_BUFFER function on a point, the radius of the circle differs at different latitudes on the globe, with the largest discrepancy at latitude 50.

Any idea what's causing this?


SELECT  SQRT(ST_AREA(jslibs.turf.ST_BUFFER(ST_GEOGPOINT(0.00,0.00),100,'kilometers',100))/3.14159265359) AS radius_from_area_equator,
       SQRT(ST_AREA(jslibs.turf.ST_BUFFER(ST_GEOGPOINT(0.00,25.00),100,'kilometers',100))/3.14159265359) AS radius_from_area_lat25,
       SQRT(ST_AREA(jslibs.turf.ST_BUFFER(ST_GEOGPOINT(0.00,50.00),100,'kilometers',100))/3.14159265359) AS radius_from_area_lat50,
       SQRT(ST_AREA(jslibs.turf.ST_BUFFER(ST_GEOGPOINT(0.00,75.00),100,'kilometers',100))/3.14159265359) AS radius_from_area_lat75,
       SQRT(ST_AREA(jslibs.turf.ST_BUFFER(ST_GEOGPOINT(0.00,89.00),100,'kilometers',100))/3.14159265359) AS radius_from_area_lat89

Functions not found: `jslibs.eu_h3.ST_H3_CENTROID` and so on

Functions not found:
jslibs.eu_h3.ST_H3_CENTROID
jslibs.eu_h3.ST_H3_BOUNDARY
jslibs.eu_h3.ST_GEOGPOINTFROMH3

maybe other functions are missing...

I think it's a bug in deploy_functions.sh: functions should be created in dependency order. When a function is created before its dependencies, the creation fails. When the deploy_functions.sh script is run a second time, the dependencies already exist and the creation succeeds.

s2.latLngToKey is not publicly accessible

I can run SELECT jslibs.h3.ST_H3(st_geogpoint(1,2),7),
but SELECT jslibs.s2.latLngToKey(1,2,3) gives me

Access Denied: Routine jslibs:s2.latLngToKey: The user ...@... does not have permission to invoke routine jslibs.s2.latLngToKey.

QUADINT_TOZXY and QUADINT_FROMZXY do not detect invalid QUADINT nor ZXY

SELECT `carto-un.carto.QUADINT_TOZXY`(32)

SELECT `carto-un.carto.QUADINT_FROMZXY`(0, 0, 1)

return a result even though these QUADINT and ZXY values are invalid.

It would be great to protect these functions against this (it might have an overall performance impact) or to create IS_VALID_QUADINT and IS_VALID_ZXY functions.
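For illustration, a minimal sketch of the suggested ZXY guard (the function name and the zoom upper bound are assumptions, not part of the toolbox):

CREATE TEMP FUNCTION IS_VALID_ZXY(z INT64, x INT64, y INT64) AS (
  -- a tile (z, x, y) only exists when x and y fit inside the 2^z grid
  z BETWEEN 0 AND 29
  AND x BETWEEN 0 AND (1 << z) - 1
  AND y BETWEEN 0 AND (1 << z) - 1
);

SELECT IS_VALID_ZXY(0, 0, 1);  -- false: zoom 0 has a single tile (0, 0)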

Where can I see the old code for jslibs?

Hi there.

Is there another branch or repo where I can see the code for the H3 functions available under jslibs?

Also, are the new functions available publicly in BigQuery? Where?

Thanks!

Support hexadecimal format for S2

Kepler.gl, among other tools, uses the hexadecimal format for s2_token.
See:
keplergl/kepler.gl#800
https://raw.githubusercontent.com/uber-common/deck.gl-data/master/website/sf.s2cells.json

It would be nice if the carto-spatial-extension S2 functions generated hexadecimal-formatted s2_tokens directly.

For now I use:

CREATE TEMP FUNCTION my_to_hex(x INT64) AS
(
  (
    SELECT
      -- format all 8 bytes of the INT64, most significant byte first
      STRING_AGG(FORMAT('%02x', x >> (byte * 8) & 0xff), '' ORDER BY byte DESC)
    FROM
      UNNEST(GENERATE_ARRAY(0, 7)) AS byte
  )
);

CREATE TEMP FUNCTION latLngToId(latitude FLOAT64, longitude FLOAT64, level NUMERIC) RETURNS INT64 DETERMINISTIC LANGUAGE js
OPTIONS (library=["gs://bigquery-geolib/s2geometry.js"]) AS """
return S2.keyToId(S2.S2Cell.latLngToKey(latitude, longitude, level));
""";

SELECT my_to_hex(latLngToId(40.7128, -74.006, 12))

S2 Contains/prefix match (bigquery)

This is an S2 follow-up to #63: I want to perform a join like st_contains(geography, point) by storing a geography's covering cells at varying levels (e.g. using the new BQ s2_coveringcellids) and then joining to a point dataset tagged with a high-level S2 cell.

My naive implementation would be to convert the cell int64s to tokens and then do a prefix match, something like:

-- imitate storing the int64s directly
declare parent_cell int64 default `carto-os`.s2.ID_FROMTOKEN('89c25a3000000000');
declare child_cell  int64 default `carto-os`.s2.ID_FROMTOKEN('89c25a3000000001');

SELECT starts_with(
    `carto-os`.s2.TOKEN_FROMID(child_cell),
    rtrim(`carto-os`.s2.TOKEN_FROMID(parent_cell), '0')
);

Is there a more direct/optimized way to compare the int64s directly? Perhaps something similar to TOKEN_FROMID but just checking that all parent bytes <= the matching child byte (I know <= is probably wrong, but you get the idea)?

Thanks!
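A more direct alternative, for what it's worth: s2geometry derives range_min and range_max for every cell id from its lowest set bit, so containment can be tested with plain integer arithmetic. A sketch continuing from the declarations above (standard S2 math, not a function shipped by this repo):

-- parent contains child iff the child id falls within the parent's
-- [range_min, range_max], where lsb = parent_cell & -parent_cell
SELECT child_cell BETWEEN parent_cell - ((parent_cell & -parent_cell) - 1)
                      AND parent_cell + ((parent_cell & -parent_cell) - 1);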

Crosses?

Any way to get Turf Crosses function in?

Sporadic failure of Snowflake user defined functions in the carto h3 module

Describe the bug
We use the carto h3 module in Snowflake. We are using dbt, and often when updating a model that uses the carto h3 module the user-defined functions are not available, but upon rerunning they suddenly are.

As far as I can tell there is no pattern to when this happens. See my screenshot for two consecutive runs where one failed and one succeeded; nothing changed between the runs and both should have been successful.

To Reproduce
Hard to reproduce, as this happens sporadically. I have also been running a query in the Snowflake UI to try to reproduce it. When running the following command it first failed and then suddenly worked; nothing changed in between:

select sfcarto.carto.h3_fromlonglat(1, 2, 9);

Expected behavior
I expect this to always be successful.

Screenshots
[screenshot]

Additional context
N/A

ST_MAKEELLIPSE Documentation

Describe the bug
@vdelacruzb in relation to pull #477

I think the angle argument in ST_MAKEELLIPSE might be positive clockwise, relative to y-axis north. I decomposed a "4"-point ellipse (really 5 points) in the example below to check how the points were constructed by turf.

Example: when the degree is 0, point 1 (indexed from 0) is oriented north (i.e. y-axis-relative north, or x-axis-relative east), and when the degree is 90, point 1 is oriented east. Not quite sure how to "define" this in the docs though...

But let me know if I misunderstand the behavior

To Reproduce

WITH

DATA AS (
SELECT '5 DEG' AS TYPE, TO_ARRAY(ST_ASGEOJSON(CARTO.CARTO.ST_MAKEELLIPSE(ST_POINT(-70,40), 2, 10, 5, 'kilometers', 4)):coordinates[0]) AS POINT_ARR
UNION ALL
SELECT '45 DEG' AS TYPE, TO_ARRAY(ST_ASGEOJSON(CARTO.CARTO.ST_MAKEELLIPSE(ST_POINT(-70,40), 2, 10, 45, 'kilometers', 4)):coordinates[0]) AS POINT_ARR
UNION ALL
SELECT '90 DEG' AS TYPE, TO_ARRAY(ST_ASGEOJSON(CARTO.CARTO.ST_MAKEELLIPSE(ST_POINT(-70,40), 2, 10, 90, 'kilometers', 4)):coordinates[0]) AS POINT_ARR
UNION ALL
SELECT '135 DEG' AS TYPE, TO_ARRAY(ST_ASGEOJSON(CARTO.CARTO.ST_MAKEELLIPSE(ST_POINT(-70,40), 2, 10, 135, 'kilometers', 4)):coordinates[0]) AS POINT_ARR
UNION ALL
SELECT '180 DEG' AS TYPE, TO_ARRAY(ST_ASGEOJSON(CARTO.CARTO.ST_MAKEELLIPSE(ST_POINT(-70,40), 2, 10, 180, 'kilometers', 4)):coordinates[0]) AS POINT_ARR
UNION ALL
SELECT '-135 DEG' AS TYPE, TO_ARRAY(ST_ASGEOJSON(CARTO.CARTO.ST_MAKEELLIPSE(ST_POINT(-70,40), 2, 10, -135, 'kilometers', 4)):coordinates[0]) AS POINT_ARR
UNION ALL
SELECT '-90 DEG' AS TYPE, TO_ARRAY(ST_ASGEOJSON(CARTO.CARTO.ST_MAKEELLIPSE(ST_POINT(-70,40), 2, 10, -90, 'kilometers', 4)):coordinates[0]) AS POINT_ARR
UNION ALL
SELECT '-45 DEG' AS TYPE, TO_ARRAY(ST_ASGEOJSON(CARTO.CARTO.ST_MAKEELLIPSE(ST_POINT(-70,40), 2, 10, -45, 'kilometers', 4)):coordinates[0]) AS POINT_ARR
UNION ALL
SELECT '-5 DEG' AS TYPE, TO_ARRAY(ST_ASGEOJSON(CARTO.CARTO.ST_MAKEELLIPSE(ST_POINT(-70,40), 2, 10, -5, 'kilometers', 4)):coordinates[0]) AS POINT_ARR
)

SELECT D.TYPE, (F.INDEX)::VARCHAR AS POINT_INDEX, TO_GEOGRAPHY(OBJECT_CONSTRUCT('type', 'Point', 'coordinates', F.VALUE)) AS GEOG
FROM DATA D,
LATERAL FLATTEN(INPUT=>D.POINT_ARR) F

Screenshots
[screenshot]

Inconsistent results between s2 libraries for S2CellId

According to the s2geometry documentation the S2CellId is a UINT64 type, whereas the CARTO documentation specifies an INT64 type. I've seen in this issue that BQ doesn't support UINT64, and that's why you decided to use INT64 with the same binary representation as a UINT64.

However, I'm getting inconsistent results when using s2 libraries, e.g. in Python:

[screenshots]

This was also tested in Go.

Error when executing H3_CENTER

When running a query using `carto-os-eu.carto.H3_CENTER`(h3_key), BigQuery returns this error: `TypeError: h3Lib.h3ToGeo is not a function at UDF$1(STRING) line 10, columns 25-26`

System Error Using ST_H3_POLYFILLFROMGEOG()

The following example code (and all my internal queries) results in the error:
SyntaxError: Unexpected token if at UDF$2(STRING, NUMERIC) line 1, columns 45-47

WITH data AS (
  SELECT jslibs.h3.compact(
    jslibs.h3.ST_H3_POLYFILLFROMGEOG(tract_geom, 11)) AS geo
  FROM `bigquery-public-data.geo_census_tracts.census_tracts_new_york`
  WHERE geo_id = '36081000100')

SELECT jslibs.h3.ST_H3_BOUNDARY(h3) AS h3geo
FROM data, UNNEST(geo) AS h3

ST_DESTINATION Documentation

Describe the bug
Is the bearing in ST_DESTINATION counter-clockwise relative to east? Screenshot included with the SQL I ran. From the documentation:

bearing: DOUBLE counter-clockwise angle from East, ranging from -180 to 180 (e.g. 0 is East, 90 is North, 180 is West, -90 is South)

To Reproduce

SELECT 
      'ORIGIN' AS TYPE
    , ST_POINT(-3.70325,40.4167) AS GEOG
UNION ALL
SELECT 
      '0 DEG' AS TYPE
    , CARTO.CARTO.ST_DESTINATION(ST_POINT(-3.70325,40.4167), 1, 0)
UNION ALL
SELECT 
      '90 DEG' AS TYPE
    , CARTO.CARTO.ST_DESTINATION(ST_POINT(-3.70325,40.4167), 1, 90)
UNION ALL
SELECT 
      '180 DEG' AS TYPE
    , CARTO.CARTO.ST_DESTINATION(ST_POINT(-3.70325,40.4167), 1, 180)
UNION ALL
SELECT 
      '-90 DEG' AS TYPE
    , CARTO.CARTO.ST_DESTINATION(ST_POINT(-3.70325,40.4167), 1, -90)

Screenshots
[screenshot]

No active warehouse selected in the current session. Select an active warehouse with the 'use warehouse' command.

Describe the bug
When I run make deploy against the Snowflake environment, I get the following error:
[_ENVELOPE_ARR] ERROR: No active warehouse selected in the current session. Select an active warehouse with the 'use warehouse' command.

To Reproduce
Steps to reproduce the behavior:

  1. Use the cloud 'snowflake'
  2. Run the query 'make deploy'
  3. See error
    [_ENVELOPE_ARR] ERROR: No active warehouse selected in the current session. Select an active warehouse with the 'use warehouse' command.

Expected behavior
I would expect an additional SF_WAREHOUSE environment variable in the .env file.


Additional context
I'm happy to provide a pull request with the required changes in the run-query.js and run-script.js files.

S2 'level' 0 should be allowed

Describe the bug
S2 'level' 0 should be allowed

To Reproduce
Steps to reproduce the behavior:

SELECT `carto-os.carto.S2_FROMLONGLAT`(0, 0, 0);
-- Error: 'level' is not a number between 1 and 30 (but it should be)

Expected behavior

SELECT `carto-os.carto.S2_FROMLONGLAT`(0, 0, 0);
-- Returns 1

Additional context
S2 Cell Statistics

Snowflake Performance

On Snowflake, the performance of certain functions from this toolbox is very poor.

I believe the fix is simple: cache libraries across function calls, as described in

https://docs.snowflake.com/en/developer-guide/udf/javascript/udf-javascript-introduction.html#global-state

For instance, for LONGLAT_AS_H3, you can modify the existing code with a setup function like so:

CREATE OR REPLACE FUNCTION @@SF_PREFIX@@h3._LONGLAT_ASH3
(longitude DOUBLE, latitude DOUBLE, resolution DOUBLE)
RETURNS STRING
LANGUAGE JAVASCRIPT
IMMUTABLE
AS $$
    function setup() {
        @@SF_LIBRARY_LONGLAT_ASH3@@
        h3LibGlobal = h3Lib;
    }
    if (typeof(setup_done) === "undefined") {
        setup();
        setup_done = true;
    }
    if (LONGITUDE == null || LATITUDE == null || RESOLUTION == null) {
        return null;
    }
    const index = h3LibGlobal.geoToH3(Number(LATITUDE), Number(LONGITUDE), Number(RESOLUTION));
    return index;
$$;

In my informal benchmarking, adding this caching allows the function to be called 1 million times in 3-4 seconds, while without caching it never completes (> 5 minutes and counting).

Have you come across similar performance problems? Do you see any reason not to adopt this pattern across all Snowflake calls?

carto.H3_BOUNDARY returns null for some valid cell IDs

Hi! Thank you so much for the toolbox, it really makes my life easier! :)

Now, on to the bug:

The carto.H3_BOUNDARY function returns null for some valid cell IDs.

Steps to reproduce the behavior:

  1. Use Snowflake
  2. Run these three queries:
    a. SELECT carto.carto.H3_BOUNDARY('870813269ffffff');
    b. SELECT carto.carto.H3_BOUNDARY('8708132c9ffffff');
    c. SELECT carto.carto.H3_BOUNDARY('870811a69ffffff');
  3. All of these queries return NULL.

I was expecting all the queries above to return a GEOGRAPHY representing the H3 cell with the corresponding ID. I have inspected the cells with these IDs here, and they seem perfectly fine.

Thanks!

carto.PLACEKEY_TOH3 occasionally produces invalid H3s on Snowflake

On Snowflake, carto.PLACEKEY_TOH3 occasionally returns incorrect and invalid H3s for valid placekeys.

As an example, the placekey "@8f3-tw4-evz" maps to the H3 index "8a44516b382ffff" (this can be confirmed with the placekey API), which is a valid index; however, carto.PLACEKEY_TOH3 returns the H3 "8a44516b2fc3fff", which is an invalid index.

This issue appears to be quite rare (roughly on the order of 1 in every 100k-1M placekeys).

Add lowercase version of all the functions

BigQuery accepts all its native functions in either upper or lower case. It would be nice to have the same support here too.

Is this something that others would value too?

This is probably valid for Snowflake too, but I don't use it, so I am not sure.

ST_BOUNDARY generates error in BigQuery

Hi. I am using the bqcartoeu UDFs and have a table with H3 cell values generated from point data. I want to visualize different statistics on these cells, for example the number of observations per cell. I am trying to generate the geometry for the cells using:

select bqcartoeu.h3.ST_BOUNDARY(bqcartoeu.h3.H3_FROMHEX(col_with_h3_hex_string))

This gives an error:

ST_GeogFromText failed: Invalid polygon loop: Edge 5 crosses edge 7

Unfortunately, I cannot localize which H3 values are causing the error, but I want to ask if the SAFE prefix could be added to the UDF where you are calling ST_GeogFromText.

I guess I am thinking it would make sense here?

https://github.com/CartoDB/carto-spatial-extension/blob/afb66d77dde3f1773a9bb877bd793cfcac30ae46/h3/bq/sql/ST_BOUNDARY.sql#L32

Thanks!
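For illustration, a minimal sketch of the requested behavior: BigQuery's SAFE. prefix makes a scalar function return NULL instead of raising an error (the bowtie polygon below is just a made-up invalid input):

-- returns NULL instead of 'Invalid polygon loop: ...'
SELECT SAFE.ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 0 10, 10 10, 0 0))');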

ST_S2 returning crazy values

WITH data AS (SELECT jslibs.s2.ST_S2(ST_GEOGPOINT(-74.006, 40.7128), 12) AS id)
SELECT * FROM data

Returns

Failed to coerce output value "9926595690882924544" to type INT64: Bad int64 value: 9926595690882924544

This seems to be the case for geopoints with longitudes between -60 and -120 and latitudes below about (but not exactly) 40.
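For context, a hedged note on the likely cause: S2 cell ids use the full unsigned 64-bit range, and ids with the top bit set (faces whose cells cover roughly those longitudes and latitudes) exceed INT64's maximum, so the raw JS output cannot be coerced. A sketch of the two's-complement reinterpretation such an id would need, using NUMERIC to hold the full value:

-- 9926595690882924544 > 2^63 - 1, so as a signed INT64 it would have to be
-- stored as (value - 2^64), i.e. a negative number
SELECT CAST('9926595690882924544' AS NUMERIC)
     - CAST('18446744073709551616' AS NUMERIC) AS as_signed_int64;  -- -8520148382826627072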

How to build the project

Running into build issues that I am unsure how to debug:

carto-spatial-extension on master
❯ make clean && make build
> Module accessors/bigquery
> Module quadkey/bigquery
> Module constructors/bigquery
> Module geohash/bigquery
> Module h3/bigquery
> Module measurements/bigquery
> Module placekey/bigquery
> Module processing/bigquery
> Module s2/bigquery
> Module transformations/bigquery
> Module accessors/bigquery
/bin/sh: Fusion.app/Contents/Public:/Library/TeX/texbin:/usr/local/MacGPG2/bin:/opt/X11/bin:/Users/a.narayan/.cargo/bin: No such file or directory
make[1]: *** [build] Error 127
make: *** [build] Error 1

What deps am I missing?

is there any way to use `jslibs` in other regions?

Hello.

I am not sure whether this should be considered an "issue"; it is rather a question.

I have a BigQuery dataset in the asia-northeast1 region. It is a customer's requirement to store their data in this region and I CANNOT move it to the US. However, I would like to use jslibs for operations on the data in this dataset.

Therefore, here is my question:

Is there any simple way to use jslibs in regions other than the US? Or do I have to copy all the function code to my own bucket, etc.?
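As an illustration of the copy-it-yourself route, here is a hypothetical sketch of recreating one jslibs-style function in your own region, assuming an h3-js v3 bundle has been copied to a bucket you control (the project, dataset, bucket, file name, and the global h3 object are placeholders/assumptions, not the actual jslibs source):

CREATE OR REPLACE FUNCTION `your-project.your_asia_dataset.H3_FROMLONGLAT`
(longitude FLOAT64, latitude FLOAT64, resolution FLOAT64)
RETURNS STRING
LANGUAGE js
-- the library must live in a bucket readable from your dataset's region
OPTIONS (library=["gs://your-asia-bucket/h3.js"])
AS """
  // h3-js v3 API: geoToH3(lat, lng, res)
  return h3.geoToH3(latitude, longitude, resolution);
""";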

Support for `geohash.ST_AS_GEOHASH_POLYFILL`

Is your feature request related to a problem? Please describe.
No function is available to convert a polygon to a list of geohashes.

Describe the solution you'd like
We have support for converting a polygon to H3 cells in BigQuery via polyfill. However, our use case needs the same function for geohash, i.e. geohash.ST_AS_GEOHASH_POLYFILL.

