cartodb / camshaft
Analysis library to create data views from queries
License: BSD 3-Clause "New" or "Revised" License
so we can use infowindows and other stuff
It should re-trigger the analysis of nodes depending on them, however that's not happening.
P.S.: It happens when you add a column or, in general, modify the table schema.
We should create default styles for all the analysis that are on the builder.
Grab column or columns from the nearest neighbor in table 2 and assign to table 1
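A minimal sketch of the idea in plain Python, assuming each row is a dict with planar `x` and `y` coordinates (the row layout and the `assign_nearest` name are hypothetical; a real analysis would more likely do this in SQL with a spatial index):

```python
from math import hypot


def assign_nearest(table1, table2, columns):
    """For each row of table1, copy `columns` from the closest row of table2.

    Brute force: every row of table1 is compared against every row of table2.
    """
    result = []
    for row in table1:
        nearest = min(
            table2,
            key=lambda other: hypot(row["x"] - other["x"], row["y"] - other["y"]),
        )
        enriched = dict(row)  # do not mutate the input row
        for col in columns:
            enriched[col] = nearest[col]
        result.append(enriched)
    return result
```

This is quadratic in the number of rows, so it only illustrates the expected output, not a viable implementation for large tables.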
Hey there! I'm taking a look at the current geocoder analyses from the Data Services team point of view. I hope this helps!
Analysis here.
For this category, DS API has a function per each administrative level. Right now we have 2 of them: admin0 (countries) and admin1 (states, provinces)
cdb_geocode_admin0_polygon(country_name text)
The result is always a polygon.
cdb_geocode_admin1_polygon(admin1_name text)
null as country will not bring back the desired result
cdb_geocode_admin1_polygon(admin1_name text, country_name text)
The result of these functions is always a polygon.
cdb_geocode_ipaddress_point(ip_address text)
This function always returns points.
None
cdb_geocode_namedplace_point(city_name text)
cdb_geocode_namedplace_point(city_name text, country_name text)
cdb_geocode_namedplace_point(city_name text, admin1_name text, country_name text)
cdb_geocode_street_point(search_text text, [city text], [state text], [country text])
These functions always return points.
None
But a Note:
The Mapzen geocoder works in a special way: it performs much better if we send the country code in ISO3. We do this internally through the API, but for this we need the country parameter to be filled, since we convert it to ISO3. The current implementation of this analysis respects this, but it's something to take into account if something is to be changed. :)
cdb_geocode_postalcode_polygon(postal_code text, country_name text)
cdb_geocode_postalcode_point(postal_code text, country_name text)
Only the polygon function is being used at this moment in the analysis. There should be a new parameter with the type of geometry (point or polygon) to determine which of the two functions above is called. Parameters are the same for both, but the result varies in terms of the geometry.
The invalidation trigger should be similar to the one found at cdb_cartodbfytable.
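Going back to the postal code analysis: the geometry-type parameter could be wired roughly like this. It is only a sketch; the parameter name `output_geometry_type` and the helper are hypothetical, while the two function names are the ones listed above.

```python
# Maps the (hypothetical) geometry-type parameter to the DS API function name.
POSTALCODE_FUNCTIONS = {
    "point": "cdb_geocode_postalcode_point",
    "polygon": "cdb_geocode_postalcode_polygon",
}


def postalcode_function(output_geometry_type="polygon"):
    """Return the geocoder function to call for the requested geometry type."""
    try:
        return POSTALCODE_FUNCTIONS[output_geometry_type]
    except KeyError:
        raise ValueError(
            "output_geometry_type must be 'point' or 'polygon', got %r"
            % (output_geometry_type,)
        )
```

Defaulting to `polygon` would keep the current behaviour for existing analyses.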
Reference: CartoDB/cartodb#8571.
Currently adding a filter to a cached node creates an additional table because filters affect node id.
Filters have to affect node.id but they must not change cached table name. Cached table name should only change on params change.
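One way to picture that split, as a sketch only: the hashing scheme, the `analysis_` prefix and both function names are assumptions for illustration, not what camshaft actually does.

```python
import hashlib
import json


def _digest(payload):
    """Stable hash of a JSON-serialisable payload."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()


def node_id(node_type, params, filters=None):
    """Filters are part of the node id, so dependent nodes notice a change."""
    return _digest({"type": node_type, "params": params, "filters": filters or {}})


def cached_table_name(node_type, params):
    """Filters are deliberately left out: only params decide the cached table."""
    return "analysis_" + _digest({"type": node_type, "params": params})
```

With this shape, adding a filter yields a new node id but reuses the same cached table, and the filter can be applied on top of that table at query time.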
So we set up node status with onerror/onsuccess callback queries instead of using an array of queries that might fail.
Download this dataset.
https://team.cartodb.com/u/saleiva/tables/chicago_crimes_2014
I've created two fake columns for testing (crimes_value and crimes_value_2)
The centroids you get when using the crimes_value column (all rows equal to 1) as weight are the same as the ones you get when using crimes_value_2 (lots of rows equal to 2, the rest equal to 1).
cc/ @javisantana @rochoa @ohasselblad @stuartlynn
Instead of always using ST_Centroid, add a new parameter to allow using ST_PointOnSurface instead.
My use case is that I'm creating a centroid analysis node that I want later to intersect again with the original polygon layer (countries) and some of them have the centroid outside their geometry (like Japan).
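As a rough sketch of what such a parameter could look like when building the SQL expression (the `strategy` parameter and its values are hypothetical; ST_Centroid and ST_PointOnSurface are the PostGIS functions mentioned above):

```python
# Hypothetical parameter values mapped to the PostGIS function to call.
CENTROID_FUNCTIONS = {
    "centroid": "ST_Centroid",
    "point_on_surface": "ST_PointOnSurface",
}


def centroid_expression(strategy="centroid", geom_column="the_geom"):
    """Build the SQL expression that computes the representative point."""
    if strategy not in CENTROID_FUNCTIONS:
        raise ValueError("unknown strategy: %r" % (strategy,))
    return "%s(%s)" % (CENTROID_FUNCTIONS[strategy], geom_column)
```

ST_PointOnSurface guarantees a point that lies on the geometry, which is what the later intersection with the original polygons needs.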
It should be used to change internal node.id() so dependent nodes get recalculated as well.
cc @javisantana
Right now we need to specify a string (cartodb_id, for example) in order to aggregate using count.
Need to generate a list of fake customers
Certain georeference analyses, like georeference-admin-region and georeference-city, require all the params to be sent. Would it be possible to make them optional, like we do in georeference-street-address?
cc: @saleiva
I'd like to see a basic Data Observatory workflow defined such as,
This is essentially the first half of this blog post. @talos this is going to be a point of entry for the UI to get the DO, so it would be good if you actually helped define some of the core bits to start it off.
@rochoa are you happy if we start trying to test some of the integration points across DO and crankshaft functions?
Create a Camshaft analysis to add a column to a table based on a user specified formula using other existing fields (arbitrary sql?)
batch-client is defined with host-header: {username}.localhost.lan. It should be configurable by environment.
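A small sketch of the idea, in Python for brevity: read the template from the environment and fall back to the current value. The variable name `BATCH_HOST_HEADER_TEMPLATE` is made up for illustration.

```python
import os

# Current hard-coded value, kept as the fallback.
DEFAULT_HOST_HEADER_TEMPLATE = "{username}.localhost.lan"


def batch_host_header(username, environ=os.environ):
    """Build the Host header, letting the environment override the template."""
    template = environ.get("BATCH_HOST_HEADER_TEMPLATE", DEFAULT_HOST_HEADER_TEMPLATE)
    return template.format(username=username)
```

Passing `environ` explicitly makes the function easy to exercise in tests without touching the real environment.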
Multiple origin destination routing / distance calculation
Open/close here just to notify
Adding the feature so people can easily track open and closed analyses in the pipeline. Good for docs and tutorials that are going to be chasing to catch up.
Right now it is returning an empty list.
So the smaller ones keep visible on top.
Right now the big ones are the last ones being rendered so the smaller ones are not visible.
And the viz is not accessible from embed, editor or builder
Working on this issue CartoDB/cartodb#8759 I've realized that certain analyses (like georeference-city) always expect column names as input params and return an error if that's not the case:
{
  "errors": [
    "column \"Island\" does not exist"
  ],
  "errors_with_context": [
    {
      "type": "analysis",
      "message": "column \"Island\" does not exist",
      "analysis": {
        "id": "b1",
        "type": "georeference-city"
      }
    }
  ]
}
To maintain the same functionality of the old editor, we should allow sending arbitrary texts and not only columns.
Also, note that currently, if the user sends a couple of words separated by a space, the API returns an error (in this example I sent 'Trinidad and Tobago' as the country):
{
  "errors": [
    "column \"trinidad\" does not exist"
  ],
  "errors_with_context": [
    {
      "type": "analysis",
      "message": "column \"trinidad\" does not exist",
      "analysis": {
        "id": "b1",
        "type": "georeference-city"
      }
    }
  ]
}
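Both errors above suggest the value is being pasted into the SQL as an identifier. A hedged sketch of how column names and free text could be told apart and quoted (all three helpers are hypothetical, and parameterised queries would be preferable wherever the driver allows them):

```python
def sql_literal(text):
    """Quote free text as a SQL string literal, doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def sql_identifier(name):
    """Quote a column name as a SQL identifier, doubling double quotes."""
    return '"' + name.replace('"', '""') + '"'


def geocode_argument(value, known_columns):
    """Treat `value` as a column if the table has it, else as free text."""
    if value in known_columns:
        return sql_identifier(value)
    return sql_literal(value)
```

Guessing from the column list is ambiguous when a text value happens to match a column name, so an explicit flag per param would be the safer design.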
That's the error I get when trying to join a table with country names and a few more columns with the world countries dataset from the Data Library.
{"errors":["Postgis Plugin: ERROR: transform: couldn't project point (180 -90 0): tolerance condition error (-20)\n\nin executeQuery Full sql was: 'SELECT ST_AsTWKB(ST_Simplify(ST_RemoveRepeatedPoints("the_geom_webmercator",1e-05),1e-05,true),5) AS geom FROM (SELECT ST_Transform(the_geom, 3857) the_geom_webmercator, titulo_seminario, sector, mercado, country, accesos, right_iso_a3\nFROM (SELECT _cdb_analysis_right_source.the_geom as the_geom, _cdb_analysis_left_source.titulo_seminario, _cdb_analysis_left_source.sector, _cdb_analysis_left_source.mercado, _cdb_analysis_left_source.country, _cdb_analysis_left_source.accesos, _cdb_analysis_right_source.iso_a3 as right_iso_a3\nFROM\n (SELECT * FROM table_sheet1) AS _cdb_analysis_left_source\n INNER JOIN\n (SELECT * FROM world_countries) AS _cdb_analysis_right_source\nON _cdb_analysis_left_source.country = _cdb_analysis_right_source.name) _cdb_analysis_query) as cdbq WHERE "the_geom_webmercator" && ST_MakeEnvelope(-20037508.3,20037508.25881302,-20037508.25881302,20037508.3,3857)'\n"],"errors_with_context":[{"type":"unknown","message":"Postgis Plugin: ERROR: transform: couldn't project point (180 -90 0): tolerance condition error (-20)\n\nin executeQuery Full sql was: 'SELECT ST_AsTWKB(ST_Simplify(ST_RemoveRepeatedPoints("the_geom_webmercator",1e-05),1e-05,true),5) AS geom FROM (SELECT ST_Transform(the_geom, 3857) the_geom_webmercator, titulo_seminario, sector, mercado, country, accesos, right_iso_a3\nFROM (SELECT _cdb_analysis_right_source.the_geom as the_geom, _cdb_analysis_left_source.titulo_seminario, _cdb_analysis_left_source.sector, _cdb_analysis_left_source.mercado, _cdb_analysis_left_source.country, _cdb_analysis_left_source.accesos, _cdb_analysis_right_source.iso_a3 as right_iso_a3\nFROM\n (SELECT * FROM table_sheet1) AS _cdb_analysis_left_source\n INNER JOIN\n (SELECT * FROM world_countries) AS _cdb_analysis_right_source\nON _cdb_analysis_left_source.country = _cdb_analysis_right_source.name) 
_cdb_analysis_query) as cdbq WHERE "the_geom_webmercator" && ST_MakeEnvelope(-20037508.3,20037508.25881302,-20037508.25881302,20037508.3,3857)'\n"}]}
Create a camshaft analysis that will take all points/regions and augment them with a measure from the data observatory.
That requires sorting as well.
Using the correct schema from the extension they belong to.
Having this scenario:
[points table][A0] -> [buffer][A1] -> [DO measure][A2]
adding a histogram widget to A1 and filtering makes A2 recalculate all the rows again (which is slow). In this particular case we could just use the same filter as the one applied to [A1] (in other words, applying the filter to the A1 output is the same as applying it to the A2 output).
Obviously this is not always true, since not all analyses work that way, but in general augmentation ones could.
(Not sure if this ticket belongs here or to windshaft.)
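The reasoning can be checked on a toy example: when the downstream node only adds a column row by row, filtering before or after it gives the same rows. The helpers below are illustrative only and assume the filter uses columns that already exist before the augmentation.

```python
def augment(rows, measure):
    """Row-wise augmentation: adds a `measure` column, never drops rows."""
    return [dict(row, measure=measure(row)) for row in rows]


def apply_filter(rows, predicate):
    """Keep only the rows matching the predicate."""
    return [row for row in rows if predicate(row)]


def filter_commutes(rows, measure, predicate):
    """True when filter-then-augment equals augment-then-filter."""
    before = augment(apply_filter(rows, predicate), measure)
    after = apply_filter(augment(rows, measure), predicate)
    return before == after
```

When this holds, the expensive augmentation can be computed once on the unfiltered input and the A1 filter applied to the cached A2 result.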
Add new analysis to support filtering by a ranked column to pick top or bottom N elements.
Ref #116
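To make the expected behaviour concrete, here is a sketch over in-memory rows (the `rank_filter` name and the `position` parameter are hypothetical; in SQL this would boil down to an ORDER BY plus LIMIT):

```python
def rank_filter(rows, column, n, position="top"):
    """Return the n rows with the highest ("top") or lowest ("bottom") values."""
    if position not in ("top", "bottom"):
        raise ValueError("position must be 'top' or 'bottom'")
    # Sort descending for "top", ascending for "bottom", then keep the first n.
    ordered = sorted(rows, key=lambda row: row[column], reverse=(position == "top"))
    return ordered[:n]
```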
Right now centroid finds the centroid of the collected dataset (ST_Centroid(ST_Collect(the_geom))), but users typically want to find the centroid of each of the geometries as well (ST_Centroid(the_geom)). There should be an option to return one or the other of these, and it should probably default to the second one.
cc @rochoa
In order to give more context if something goes wrong during analysis creation, we should add node_id of failing analysis node to error.
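A possible shape for this, sketched in Python: the exception class, the `node_id` key and the helper are assumptions that mirror the `errors_with_context` entries seen in other tickets, not the current API.

```python
class AnalysisNodeError(Exception):
    """Error raised while creating a node, carrying the node it failed on."""

    def __init__(self, message, analysis_id, node_id, node_type):
        super().__init__(message)
        self.analysis_id = analysis_id
        self.node_id = node_id
        self.node_type = node_type


def error_with_context(err):
    """Serialise the error, including the (hypothetical) node_id field."""
    return {
        "type": "analysis",
        "message": str(err),
        "analysis": {
            "id": err.analysis_id,
            "node_id": err.node_id,
            "type": err.node_type,
        },
    }
```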
For this ticket CartoDB/cartodb#8742 we'd need to add two new params:
CC: @saleiva
That's its strict name
Create a Camshaft/Crankshaft analysis that takes a list of points and weights and calculates n weighted cluster centers for those points.
Use sk-learn with modified distance formula so that we can find the weighted centroid rather than the regular one, something like:
Add an option for forcing duplication
Currently the weighted mean function uses lat lng in its calculation. This is fine for small scales but really we want to be doing this with actual distances. @ohasselblad
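One common alternative to averaging raw lat/lng, sketched here under a spherical-Earth assumption, is to average the points as 3D unit vectors and convert the result back. This is not necessarily what the analysis should end up using, and `weighted_center` is a hypothetical name.

```python
from math import atan2, cos, degrees, radians, sin, sqrt


def weighted_center(points):
    """Weighted geographic midpoint of (lat, lon, weight) tuples, in degrees."""
    x = y = z = total = 0.0
    for lat, lon, weight in points:
        la, lo = radians(lat), radians(lon)
        # Accumulate the weighted unit vector of each point.
        x += weight * cos(la) * cos(lo)
        y += weight * cos(la) * sin(lo)
        z += weight * sin(la)
        total += weight
    if total == 0:
        raise ValueError("total weight must be non-zero")
    x, y, z = x / total, y / total, z / total
    # Convert the mean vector back to latitude and longitude.
    return degrees(atan2(z, sqrt(x * x + y * y))), degrees(atan2(y, x))
```

Unlike a plain lat/lng mean, this handles points on both sides of the antimeridian: the midpoint of longitudes 179 and -179 comes out near 180 rather than 0.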