Git Product home page Git Product logo

Comments (32)

mistercrunch avatar mistercrunch commented on May 4, 2024 1

what about using this?
https://pypi.python.org/pypi/sqlalchemy-redshift

from superset.

rsmenon avatar rsmenon commented on May 4, 2024 1

@ChiragKParmar and others (cc @mistercrunch): I use schemas extensively in the cluster I manage and Caravel + psycopg2 works perfectly fine out of the box and I can access all tables in the schemas on the search path. I suspect you guys might be using the default parameter group for Redshift, which only has $user, public as the search path.

You can set a custom search_path for Redshift from the admin console in AWS. Go to your Redshift dashboard > Parameter Groups and create a new cluster parameter group (or modify an existing one). In it, change the search_path property to include the schemas you want.

Please also note: setting a search path in parameter groups sets it for the entire cluster (for all users). In case of duplicate names, an unqualified table name in a query (i.e. tables called simply as <table> instead of <schema>.<table>) will default to the one that appears first in the search path (which may or may not be what you want, depending on how you use schemas).

from superset.

ChiragKParmar avatar ChiragKParmar commented on May 4, 2024

I tried that! As you can see their example, there is no way to pass schema name during engine creation, session defaults to public.
sa.create_engine('redshift+psycopg2://[email protected]:5439/database').

from superset.

machira avatar machira commented on May 4, 2024

Did you try using the url specified in your AWS Redshift instance. I believe you don't need to specify schema name in the url, it will automatically connect to the schema user has access to.

from superset.

ChiragKParmar avatar ChiragKParmar commented on May 4, 2024

That user has access to multiple schema. I was able to connect to the public schema. So connection works fine. There is no way to explicitly specify if I want to access schema other than public (or changing the order of search_path parameter from AWS console, but the point is there is no way to explicitly specifying schema name after connection is established in Caravel.

from superset.

ariepratama avatar ariepratama commented on May 4, 2024

I'm guessing that these lines of code makes sqlalchemy scans only to public schema (on models.py, Database class).
def get_table(self, table_name): meta = MetaData()

I think it could be improved a bit by adding schema_name parameter?

meta = Metadata(schema=schema_name)

from superset.

mistercrunch avatar mistercrunch commented on May 4, 2024

I don't have any postgres database laying around to test, but doesnt it have the notion of owner on top of the notion of schema? Have you tried the owner.tablename notation?

from superset.

yml avatar yml commented on May 4, 2024

Here it is the traceback I am getting while trying to add a table in a non public schema :

2016-04-01 11:19:00,778:INFO:werkzeug:127.0.0.1 - - [01/Apr/2016 11:19:00] "GET /tablemodelview/add HTTP/1.1" 200 -
2016-04-01 11:19:21,096:ERROR:root:clean.pages
Traceback (most recent call last):
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/views.py", line 171, in post_add
    table.fetch_metadata()
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py", line 608, in fetch_metadata
    table = self.database.get_table(self.table_name)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py", line 340, in get_table
    autoload_with=self.get_sqla_engine())
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 416, in __new__
    metadata._remove_table(name, schema)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 411, in __new__
    table._init(name, metadata, *args, **kw)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 484, in _init
    self._autoload(metadata, autoload_with, include_columns)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 496, in _autoload
    self, include_columns, exclude_columns
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1972, in run_callable
    return conn.run_callable(callable_, *args, **kwargs)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1477, in run_callable
    return callable_(self, *args, **kwargs)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 364, in reflecttable
    return insp.reflecttable(table, include_columns, exclude_columns)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 563, in reflecttable
    table_name, schema, **table.dialect_kwargs):
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 369, in get_columns
    **kw)
  File "<string>", line 2, in get_columns
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 54, in cache
    ret = fn(self, con, *args, **kw)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py", line 2376, in get_columns
    info_cache=kw.get('info_cache'))
  File "<string>", line 2, in get_table_oid
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/engine/reflection.py", line 54, in cache
    ret = fn(self, con, *args, **kw)
  File "/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py", line 2274, in get_table_oid
    raise exc.NoSuchTableError(table_name)
NoSuchTableError: clean.pages

This is using sqlachemy-redshift.

from superset.

yml avatar yml commented on May 4, 2024

Usual disclaimer -- I have no idea what I am doing but this kind of fix the immediate issue:

yml@garfield$ (git: master)git diff /srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py /srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py.ori
diff --git a/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py b/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py.ori
index 1856e48..e8973c4 100644
--- a/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py
+++ b/srv/virtualenvs/noodleenv/local/lib/python2.7/site-packages/caravel/models.py.ori
@@ -333,12 +333,7 @@ class Database(Model, AuditMixinNullable):
         return {grain.name: grain for grain in self.grains()}

     def get_table(self, table_name):
-        if '.' in table_name:
-            schema = table_name.split('.')[0]
-            table_name = table_name.split('.')[1]
-            meta = MetaData(schema=schema)
-        else:
-            meta = MetaData()
+        meta = MetaData()
         return Table(
             table_name, meta,
             autoload=True,

However then I run into the next dragon:

2016-04-01 13:14:26,015:INFO:werkzeug:127.0.0.1 - - [01/Apr/2016 13:14:26] "GET /caravel/explore/table/4/?viz_type=table&row_limit=50000&include_search=false&metrics=count&wher
e=&having=&flt_col_0=anonymous_id&flt_op_0=in&flt_eq_0=&slice_id=&slice_name=&collapsed_fieldsets=&action=&datasource_name=clean.pages&datasource_id=4&datasource_type=table&pre
vious_viz_type=table&json=true&force=false HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/flask_appbuilder/security/decorators.py", line 26, in wraps
    return f(self, *args, **kwargs)
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/caravel/models.py", line 1115, in wrapper
    return f(*args, **kwargs)
  File "/srv/virtualenvs/noodleenv/lib/python2.7/site-packages/caravel/views.py", line 449, in explore
    raise e
ProgrammingError: (psycopg2.ProgrammingError) relation "clean.pages" does not exist
 [SQL: 'SELECT COUNT(*) AS count \nFROM "clean.pages" \n LIMIT 50000']

The root of this issue might be that we are missing a schema field here or at least we should follow the convention of schema.tablename. We could easily separate the 2 elements on ..

from superset.

mistercrunch avatar mistercrunch commented on May 4, 2024

If someone knows for sure that there is no way to specify the schema in the URL, or specifying a searchpath server side, and that @ariepratama 's solution works, I'm ok with adding a schema field in the connection model and passing it to the Metadata() call.

Can someone tests that in iPython or however else against Redshift and confirms that it allows to fetch the metadata?

from superset.

bruth avatar bruth commented on May 4, 2024

@mistercrunch There is no way to specify the schema in the URI. It appears @ariepratama's solution should work. This SO answer mentions it as well.

from superset.

mistercrunch avatar mistercrunch commented on May 4, 2024

In the docs bellow it seems that if you use a username that matches the schema name then it will substitute the searchpath.
http://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html

Can you try that?

I wonder if there are other ways to alter and persist the search_path on the server...

We should probably add a catchall params field to the Database model, and allow for schema, which we'd squeeze in the Metadata call if it exist. I guess I'd have to recreate the problem on a local postgres if I wanted to work on it.

from superset.

kingo55 avatar kingo55 commented on May 4, 2024

@mistercrunch I have a Redshift instance and am willing to help and make the changes to my Caravel setip. As I'm new to Python I'll be a little slow.

One problem - I can't seem to find the Metadata() call or the models.py in my setup.

from superset.

kingo55 avatar kingo55 commented on May 4, 2024

Turns creating a user with a name the same as a schema doesn't help either. However when querying the user's search path:

show search_path;

It returns this:

$user, public

It's not clear this helps Redshift locate the table.

from superset.

mistercrunch avatar mistercrunch commented on May 4, 2024

Seems like should have workd, so Jjust double checking, did you use this user in your sqlachelmy URI?

from superset.

kingo55 avatar kingo55 commented on May 4, 2024

Yep - I even tried forcing the search path setting. It didn't work in SQLAlchemy but it worked fine in Postico.

Testing the connection would continue to throw this error:

ERROR: Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/caravel/views.py", line 592, in testconn
    engine.connect()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2018, in connect
    return self._connection_cls(self, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 72, in __init__
    if connection is not None else engine.raw_connection()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2104, in raw_connection
    self.pool.unique_connection, _connection)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect
    e, dialect, self)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/s...y/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/pool.py", line 607, in __connect
    connection = self.__pool._invoke_creator(self)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 97, in connect…

Furthermore, adding any tables to Caravel from that schema with the saved Redshift DB resulted in "Table not found".

from superset.

mistercrunch avatar mistercrunch commented on May 4, 2024

#249

from superset.

ariepratama avatar ariepratama commented on May 4, 2024

@mistercrunch comma should be deleted at models.py line 299 "engine_params": {}, should be "engine_params": {} (without comma at the end) otherwise it will be buggy, got Expecting property name enclosed in double quotes error when trying to add the json params (that comma will keep generated though I've delete it on database configuration). Also need to add the schema parameter into every FromClause object (returned by calling table() function on sqlalchemy.sql). if I add these lines:

metadata_params = self.database.get_extra().get('metadata_params', None)
setattr(from_clause,'schema', metadata_params.get('schema', None))

at line 574 and change line 606 into something like:

t = table(self.table_name)
setattr(t, 'schema', metadata_params.get('schema', None))
subq = select(inner_select_exprs)
subq = subq.select_from(t)

it will work for this issue on redshift and postgres

from superset.

kingo55 avatar kingo55 commented on May 4, 2024

Sorry guys, my bad. @mistercrunch I think you're right. Setting the username to the name of the schema seems to work.

Unfortunately I made many other changes recently on top of that which also may have fixed it. The core issue - I was using an incorrect port in the SQLAlchemy URI. Embarrassing.

I'm now querying successfully with redshift+psycopg2.

from superset.

mistercrunch avatar mistercrunch commented on May 4, 2024

Please confirm that the current approach works for postgres/redshift!

from superset.

kingo55 avatar kingo55 commented on May 4, 2024

Hi @mistercrunch - I managed to connect using a username matching the schema name, however after trying the latest commits on the master branch, I can't seem to set the schema name still. It just returns the tables underneath schema $user

I gave the user permissions to access this schema and view all tables inside the schema. Next I added the following JSON into the Extras field (not sure if this is correct):

{
    "metadata_params": {
         "schema":"atomic"
    }
}

As an example:

Username: bob
Schema 1: bob
Schema 2: atomic
Access bob has: All tables under bob and atomic

bob can access all tables under bob and atomic in Postico. However in Caravel, bob can only see tables under bob. Even setting the metadata params object has no bearing on the tables returned by Caravel.

from superset.

broughmilan avatar broughmilan commented on May 4, 2024

What u know about scheme...can u tell me..
On 6 Apr 2016 10:01 am, "Robert Kingston" [email protected] wrote:

Hi @mistercrunch https://github.com/mistercrunch - I managed to connect
using a username matching the schema name, however after trying the latest
commits on the master branch, I can't seem to set the schema name still. It
just returns the tables underneath schema $user

I gave the user permissions to access this schema and view all tables
inside the schema. Next I added the following JSON into the Extras field
(not sure if this is correct):

{
"metadata_params": {
schema:"atomic"
}
}


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#217 (comment)

from superset.

kingo55 avatar kingo55 commented on May 4, 2024

Just edited my answer above with details about the schema.

Both schemas have tables underneath which bob has access to. Caravel returns tables for bob under the bob schema but not the atomic schema. Even when setting the extras object.

from superset.

broughmilan avatar broughmilan commented on May 4, 2024

Thanks
On 6 Apr 2016 10:20 am, "Robert Kingston" [email protected] wrote:

Just edited my answer above with details about the schema.

Both schemas have tables underneath which bob has access to. Caravel
returns tables for bob under the bob schema but not the atomic schema. Even
when setting the extras object.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#217 (comment)

from superset.

mistercrunch avatar mistercrunch commented on May 4, 2024

Can someone document that in docs/installation.rst maybe with a nice screenshot of the AWS console?

Thanks @rsmenon !

from superset.

shahneil88 avatar shahneil88 commented on May 4, 2024

@rsmenon - The solution does not seem to work.

These are the steps that I followed:-

  1. I created a new parameter group for redshift. $user, public, schema1
  2. Update my cluster to map new parameter group and rebooted it.
  3. I went to my console and queried a table which is only in "schema1" without specifying schema name
    Eg - select * from <tablename> instead of select * from <schema1>.<tablename>
    It worked fine and returned me data.
  4. I went to Caravel and created a new Database connection using
    redshift+psycopg2://username:password@host:5439/dbname
  5. Clicked on Test Connection. Pop up shows "Connection seems ok!". But it only retrieved public schema tables and did not load tables from schema1. I am not able to see tables from schema1

from superset.

yml avatar yml commented on May 4, 2024

Altering a the search_path is a nice temporary work around but it is not a bullet proof solution because not everyone can do it and it does not support table with the same name in multiple format.

It is possible with the version on master as of yesterday to add a table with the full name '.

' but then this table can't be used to create a slice. The generated sql does not quote it properly

ipdb> n
> /srv/virtualenvs/noodleenv/src/caravel/caravel/models.py(647)query()
    645         sql = "{}".format(
    646             qry.compile(engine, compile_kwargs={"literal_binds": True}))
--> 647         df = pd.read_sql_query(
    648             sql=sql,
    649             con=engine

ipdb> sql
u'SELECT COUNT(*) AS count \nFROM "clean.sessions" \n LIMIT 50000'

" around clean.sessions is not valid

ipdb> engine.execute('SELECT COUNT(*) AS count \nFROM "clean.sessions" \n LIMIT 1')
*** ProgrammingError: (psycopg2.ProgrammingError) relation "clean.sessions" does not exist
 [SQL: 'SELECT COUNT(*) AS count \nFROM "clean.sessions" \n LIMIT 1']

The following statement would be valid:

ipdb> engine.execute('SELECT COUNT(*) AS count \nFROM clean.sessions \n LIMIT 1')                                                                                               
<sqlalchemy.engine.result.ResultProxy object at 0x7fe9edb60a90>
ipdb> engine.execute('SELECT COUNT(*) AS count \nFROM clean.sessions \n LIMIT 1').fetchall()
[(10709562L,)]

from superset.

rsmenon avatar rsmenon commented on May 4, 2024

@shahneil88 Not quite sure what you mean by "only retrieved public schema tables". Can you add a slice from a table that exists only in schema1? I'm able to do that with my cluster. I didn't see a screen that listed all the tables, but maybe I missed that (or it doesn't show up because of how my schemas are setup).

@yml Just to be sure I understood you correctly: altering the search path does work for you, right? I agree with you that altering the global search path can only be done by admins. Users can set their own search path in a session with set search_path to schema1, schema2, public;. I don't know if we can supply some pre-load statements to Caravel that are applied once as soon as a connection is established. If so, that might be a good place to do this (and maybe set other things like timeout).

I hadn't tested mine with schema.table (I used the unqualified names for the tables in different schemas) since we don't have duplicated names, so that might be something I missed. The issue appears to be incorrect formatting by SQLAlchemy, and not Caravel, is that correct?

from superset.

yml avatar yml commented on May 4, 2024

@rsmenon I would suspect this chunk of code to not do the right thing with the fully qualified name schema.table.
I confirm that altering the search_path is not an option in our case.

from superset.

mistercrunch avatar mistercrunch commented on May 4, 2024

I ended up installing postgres locally and getting it to work. The approach is simply to provide the schema name as an attribute of the table. It's out in the latest pypi release that I will announce this morning.

from superset.

rsmenon avatar rsmenon commented on May 4, 2024

@mistercrunch Looks like their email got pwned. They responded to this issue earlier by email, so the virus is just sending spam to all emails in their address book (including the Github issue-reply email). Since you're an admin on this repo, you can probably just delete that comment.

from superset.

mistercrunch avatar mistercrunch commented on May 4, 2024

deleted 3 spam related comments...

from superset.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.