Comments (32)
what about using this?
https://pypi.python.org/pypi/sqlalchemy-redshift
from superset.
@ChiragKParmar and others (cc @mistercrunch): I use schemas extensively in the cluster I manage and Caravel + psycopg2 works perfectly fine out of the box and I can access all tables in the schemas on the search path. I suspect you guys might be using the default parameter group for Redshift, which only has $user, public
as the search path.
You can set a custom search_path
for Redshift from the admin console in AWS. Go to your Redshift dashboard > Parameter Groups and create a new cluster parameter group (or modify an existing one). In it, change the search_path
property to include the schemas you want.
Please also note: setting a search path in parameter groups sets it for the entire cluster (for all users). In case of duplicate names, an unqualified table name in a query (i.e. tables called simply as <table>
instead of <schema>.<table>
) will default to the one that appears first in the search path (which may or may not be what you want, depending on how you use schemas).
from superset.
I tried that! As you can see their example, there is no way to pass schema name during engine creation, session defaults to public.
sa.create_engine('redshift+psycopg2://[email protected]:5439/database')
.
from superset.
Did you try using the url specified in your AWS Redshift instance. I believe you don't need to specify schema name in the url, it will automatically connect to the schema user has access to.
from superset.
That user has access to multiple schema. I was able to connect to the public schema. So connection works fine. There is no way to explicitly specify if I want to access schema other than public (or changing the order of search_path parameter from AWS console, but the point is there is no way to explicitly specifying schema name after connection is established in Caravel.
from superset.
I'm guessing that these lines of code makes sqlalchemy scans only to public schema (on models.py, Database class).
def get_table(self, table_name): meta = MetaData()
I think it could be improved a bit by adding schema_name parameter?
meta = Metadata(schema=schema_name)
from superset.
I don't have any postgres database laying around to test, but doesnt it have the notion of owner on top of the notion of schema? Have you tried the owner.tablename notation?
from superset.
Here it is the traceback I am getting while trying to add a table in a non public schema :
2016-04-01 11:19:00,778:INFO:werkzeug:127.0.0.1 - - [01/Apr/2016 11:19:00] "GET /tablemodelview/add HTTP/1.1" 200 -
2016-04-01 11:19:21,096:ERROR:root:clean.pages
Traceback (most recent call last):
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/views.py", line 171, in post_add
table.fetch_metadata()
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py", line 608, in fetch_metadata
table = self.database.get_table(self.table_name)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py", line 340, in get_table
autoload_with=self.get_sqla_engine())
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 416, in __new__
metadata._remove_table(name, schema)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 411, in __new__
table._init(name, metadata, *args, **kw)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 484, in _init
self._autoload(metadata, autoload_with, include_columns)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 496, in _autoload
self, include_columns, exclude_columns
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1972, in run_callable
return conn.run_callable(callable_, *args, **kwargs)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1477, in run_callable
return callable_(self, *args, **kwargs)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 364, in reflecttable
return insp.reflecttable(table, include_columns, exclude_columns)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 563, in reflecttable
table_name, schema, **table.dialect_kwargs):
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 369, in get_columns
**kw)
File "<string>", line 2, in get_columns
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 54, in cache
ret = fn(self, con, *args, **kw)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py", line 2376, in get_columns
info_cache=kw.get('info_cache'))
File "<string>", line 2, in get_table_oid
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 54, in cache
ret = fn(self, con, *args, **kw)
File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py", line 2274, in get_table_oid
raise exc.NoSuchTableError(table_name)
NoSuchTableError: clean.pages
This is using sqlachemy-redshift
.
from superset.
Usual disclaimer -- I have no idea what I am doing but this kind of fix the immediate issue:
yml@garfield$ (git: master)git diff /srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py /srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py.ori
diff --git a/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py b/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py.ori
index 1856e48..e8973c4 100644
--- a/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py
+++ b/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py.ori
@@ -333,12 +333,7 @@ class Database(Model, AuditMixinNullable):
return {grain.name: grain for grain in self.grains()}
def get_table(self, table_name):
- if '.' in table_name:
- schema = table_name.split('.')[0]
- table_name = table_name.split('.')[1]
- meta = MetaData(schema=schema)
- else:
- meta = MetaData()
+ meta = MetaData()
return Table(
table_name, meta,
autoload=True,
However then I run into the next dragon:
2016-04-01 13:14:26,015:INFO:werkzeug:127.0.0.1 - - [01/Apr/2016 13:14:26] "GET /caravel/explore/table/4/?viz_type=table&row_limit=50000&include_search=false&metrics=count&wher
e=&having=&flt_col_0=anonymous_id&flt_op_0=in&flt_eq_0=&slice_id=&slice_name=&collapsed_fieldsets=&action=&datasource_name=clean.pages&datasource_id=4&datasource_type=table&pre
vious_viz_type=table&json=true&force=false HTTP/1.1" 500 -
Traceback (most recent call last):
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
reraise(exc_type, exc_value, tb)
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask_appbuilder/security/decorators.py", line 26, in wraps
return f(self, *args, **kwargs)
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/caravel/models.py", line 1115, in wrapper
return f(*args, **kwargs)
File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/caravel/views.py", line 449, in explore
raise e
ProgrammingError: (psycopg2.ProgrammingError) relation "clean.pages" does not exist
[SQL: 'SELECT COUNT(*) AS count \nFROM "clean.pages" \n LIMIT 50000']
The root of this issue might be that we are missing a schema
field here or at least we should follow the convention of schema
.tablename
. We could easily separate the 2 elements on .
.
from superset.
If someone knows for sure that there is no way to specify the schema in the URL, or specifying a searchpath server side, and that @ariepratama 's solution works, I'm ok with adding a schema field in the connection model and passing it to the Metadata() call.
Can someone tests that in iPython or however else against Redshift and confirms that it allows to fetch the metadata?
from superset.
@mistercrunch There is no way to specify the schema in the URI. It appears @ariepratama's solution should work. This SO answer mentions it as well.
from superset.
In the docs bellow it seems that if you use a username that matches the schema name then it will substitute the searchpath.
http://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html
Can you try that?
I wonder if there are other ways to alter and persist the search_path on the server...
We should probably add a catchall params field to the Database model, and allow for schema, which we'd squeeze in the Metadata
call if it exist. I guess I'd have to recreate the problem on a local postgres if I wanted to work on it.
from superset.
@mistercrunch I have a Redshift instance and am willing to help and make the changes to my Caravel setip. As I'm new to Python I'll be a little slow.
One problem - I can't seem to find the Metadata() call or the models.py in my setup.
from superset.
Turns creating a user with a name the same as a schema doesn't help either. However when querying the user's search path:
show search_path;
It returns this:
$user, public
It's not clear this helps Redshift locate the table.
from superset.
Seems like should have workd, so Jjust double checking, did you use this user in your sqlachelmy URI?
from superset.
Yep - I even tried forcing the search path setting. It didn't work in SQLAlchemy but it worked fine in Postico.
Testing the connection would continue to throw this error:
ERROR: Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/caravel/views.py", line 592, in testconn
engine.connect()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2018, in connect
return self._connection_cls(self, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 72, in __init__
if connection is not None else engine.raw_connection()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2104, in raw_connection
self.pool.unique_connection, _connection)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect
e, dialect, self)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/s...y/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/pool.py", line 607, in __connect
connection = self.__pool._invoke_creator(self)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 97, in connect…
Furthermore, adding any tables to Caravel from that schema with the saved Redshift DB resulted in "Table not found".
from superset.
from superset.
@mistercrunch comma should be deleted at models.py line 299 "engine_params": {},
should be "engine_params": {}
(without comma at the end) otherwise it will be buggy, got Expecting property name enclosed in double quotes error when trying to add the json params (that comma will keep generated though I've delete it on database configuration). Also need to add the schema parameter into every FromClause object (returned by calling table()
function on sqlalchemy.sql). if I add these lines:
metadata_params = self.database.get_extra().get('metadata_params', None)
setattr(from_clause,'schema', metadata_params.get('schema', None))
at line 574 and change line 606 into something like:
t = table(self.table_name)
setattr(t, 'schema', metadata_params.get('schema', None))
subq = select(inner_select_exprs)
subq = subq.select_from(t)
it will work for this issue on redshift and postgres
from superset.
Sorry guys, my bad. @mistercrunch I think you're right. Setting the username to the name of the schema seems to work.
Unfortunately I made many other changes recently on top of that which also may have fixed it. The core issue - I was using an incorrect port in the SQLAlchemy URI. Embarrassing.
I'm now querying successfully with redshift+psycopg2.
from superset.
Please confirm that the current approach works for postgres/redshift!
from superset.
Hi @mistercrunch - I managed to connect using a username matching the schema name, however after trying the latest commits on the master branch, I can't seem to set the schema name still. It just returns the tables underneath schema $user
I gave the user permissions to access this schema and view all tables inside the schema. Next I added the following JSON into the Extras field (not sure if this is correct):
{
"metadata_params": {
"schema":"atomic"
}
}
As an example:
Username: bob
Schema 1: bob
Schema 2: atomic
Access bob has: All tables under bob and atomic
bob can access all tables under bob and atomic in Postico. However in Caravel, bob can only see tables under bob. Even setting the metadata params object has no bearing on the tables returned by Caravel.
from superset.
What u know about scheme...can u tell me..
On 6 Apr 2016 10:01 am, "Robert Kingston" [email protected] wrote:
Hi @mistercrunch https://github.com/mistercrunch - I managed to connect
using a username matching the schema name, however after trying the latest
commits on the master branch, I can't seem to set the schema name still. It
just returns the tables underneath schema $userI gave the user permissions to access this schema and view all tables
inside the schema. Next I added the following JSON into the Extras field
(not sure if this is correct):{
"metadata_params": {
schema:"atomic"
}
}—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#217 (comment)
from superset.
Just edited my answer above with details about the schema.
Both schemas have tables underneath which bob has access to. Caravel returns tables for bob under the bob schema but not the atomic schema. Even when setting the extras object.
from superset.
Thanks
On 6 Apr 2016 10:20 am, "Robert Kingston" [email protected] wrote:
Just edited my answer above with details about the schema.
Both schemas have tables underneath which bob has access to. Caravel
returns tables for bob under the bob schema but not the atomic schema. Even
when setting the extras object.—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#217 (comment)
from superset.
Can someone document that in docs/installation.rst
maybe with a nice screenshot of the AWS console?
Thanks @rsmenon !
from superset.
@rsmenon - The solution does not seem to work.
These are the steps that I followed:-
- I created a new parameter group for redshift. $user, public, schema1
- Update my cluster to map new parameter group and rebooted it.
- I went to my console and queried a table which is only in "schema1" without specifying schema name
Eg -select * from <tablename>
instead ofselect * from <schema1>.<tablename>
It worked fine and returned me data. - I went to Caravel and created a new Database connection using
redshift+psycopg2://username:password@host:5439/dbname - Clicked on Test Connection. Pop up shows "Connection seems ok!". But it only retrieved public schema tables and did not load tables from
schema1
. I am not able to see tables fromschema1
from superset.
Altering a the search_path
is a nice temporary work around but it is not a bullet proof solution because not everyone can do it and it does not support table with the same name in multiple format.
It is possible with the version on master as of yesterday to add a table with the full name '.
' but then this table can't be used to create a slice. The generated sql does not quote it properlyipdb> n
> /srv/virtualenvs/noodleenv/src/caravel/caravel/models.py(647)query()
645 sql = "{}".format(
646 qry.compile(engine, compile_kwargs={"literal_binds": True}))
--> 647 df = pd.read_sql_query(
648 sql=sql,
649 con=engine
ipdb> sql
u'SELECT COUNT(*) AS count \nFROM "clean.sessions" \n LIMIT 50000'
"
around clean.sessions
is not valid
ipdb> engine.execute('SELECT COUNT(*) AS count \nFROM "clean.sessions" \n LIMIT 1')
*** ProgrammingError: (psycopg2.ProgrammingError) relation "clean.sessions" does not exist
[SQL: 'SELECT COUNT(*) AS count \nFROM "clean.sessions" \n LIMIT 1']
The following statement would be valid:
ipdb> engine.execute('SELECT COUNT(*) AS count \nFROM clean.sessions \n LIMIT 1')
<sqlalchemy.engine.result.ResultProxy object at 0x7fe9edb60a90>
ipdb> engine.execute('SELECT COUNT(*) AS count \nFROM clean.sessions \n LIMIT 1').fetchall()
[(10709562L,)]
from superset.
@shahneil88 Not quite sure what you mean by "only retrieved public schema tables". Can you add a slice from a table that exists only in schema1
? I'm able to do that with my cluster. I didn't see a screen that listed all the tables, but maybe I missed that (or it doesn't show up because of how my schemas are setup).
@yml Just to be sure I understood you correctly: altering the search path does work for you, right? I agree with you that altering the global search path can only be done by admins. Users can set their own search path in a session with set search_path to schema1, schema2, public;
. I don't know if we can supply some pre-load statements to Caravel that are applied once as soon as a connection is established. If so, that might be a good place to do this (and maybe set other things like timeout).
I hadn't tested mine with schema.table
(I used the unqualified names for the tables in different schemas) since we don't have duplicated names, so that might be something I missed. The issue appears to be incorrect formatting by SQLAlchemy, and not Caravel, is that correct?
from superset.
@rsmenon I would suspect this chunk of code to not do the right thing with the fully qualified name schema.table
.
I confirm that altering the search_path
is not an option in our case.
from superset.
I ended up installing postgres locally and getting it to work. The approach is simply to provide the schema name as an attribute of the table. It's out in the latest pypi release that I will announce this morning.
from superset.
@mistercrunch Looks like their email got pwned. They responded to this issue earlier by email, so the virus is just sending spam to all emails in their address book (including the Github issue-reply email). Since you're an admin on this repo, you can probably just delete that comment.
from superset.
deleted 3 spam related comments...
from superset.
Related Issues (20)
- [SQL Lab] Cannot perform any query when change language
- Top Legend of the charts always placed on right HOT 2
- Cant take screenshot: 'chromedriver' executable may have wrong permissions. HOT 1
- Cannot send Cross-Origin-Resource-Policy from embed code HOT 2
- Latest release (3.1.1) missing arm64 docker images HOT 4
- Add Romania to countries available in choropleth HOT 2
- Import dashboard API
- Timerange filter empty result
- Using jinja filters, when i drilldown, the filter is not passed through.
- Unable to start non-dev locally on main HOT 3
- Drill to detail gives error on some charts when using Druid. HOT 3
- helm 0.12.6 fails on init db with HOT 4
- Drill to detail fails with "Error: 400 Bad Request: The CSRF tokens do not match." HOT 1
- Embedded dashboard error is always in English HOT 1
- Superset SQL parser does not recognize FETCH FIRST clause causing syntax error when appending LIMIT clause
- [SIP-119] Table Visualizations with Time Comparison Data HOT 5
- The mixed chart histogram indicates the order adjustment of the items HOT 1
- After successfully logging in, you can view the user's plaintext password via F12 (Developer Tools). HOT 1
- [Bug] [v3.0.4] Dataset serialization can produce bad export. ("An error occurred while importing dataset")
- Y Axis boundaries on Line/Area/Bar charts remove values rather than truncating. HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from superset.