
hyper-api-samples's Introduction

Hyper API Samples

Tableau Supported Community Supported

This repo is the home of the Hyper API samples. It contains both Tableau-Supported Samples and Community-Supported Samples:

  • The official samples are packaged and shipped within the Hyper API itself. All of the official samples are available for each language supported by the Hyper API (Python, Java, C++, and C#/.NET targeting .NET Standard 2.0) and are fully supported and maintained by Tableau.
  • The community samples focus on individual use cases and are Python-only. They were written by members of the Tableau development team but are 'Community Supported', meaning they may not be maintained in newer releases. Each sample has been manually tested and reviewed before publishing and remains open to pull requests and issues.
  • See our support page for more information.

If you are looking to learn more about the Hyper API, please check out the official documentation.


What is the Hyper API?

For the unfamiliar, the Hyper API contains a set of functions you can use to automate your interactions with Tableau extract (.hyper) files. You can use the API to create new extract files, or to open existing files, and then insert, delete, update, or read data from those files. Using the Hyper API, developers and administrators can do the following (a short example follows the list):

  • Create extract files for data sources not currently supported by Tableau.
  • Automate custom extract, transform and load (ETL) processes (for example, implement rolling window updates or custom incremental updates).
  • Retrieve data from an extract file.
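To make that concrete, here is a minimal Python sketch of the basic workflow: it creates a new .hyper file, defines a single table, and inserts a couple of rows. The table, columns, and file name are placeholders chosen for illustration.

from tableauhyperapi import HyperProcess, Telemetry, Connection, CreateMode, \
    TableDefinition, SqlType, NOT_NULLABLE, Inserter

# A tiny example table; a real extract would mirror your source schema.
customer_table = TableDefinition(
    table_name="Customer",
    columns=[
        TableDefinition.Column("Customer ID", SqlType.text(), NOT_NULLABLE),
        TableDefinition.Column("Loyalty Reward Points", SqlType.big_int(), NOT_NULLABLE),
    ],
)

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    # Creates (or replaces) customer.hyper and opens a connection to it.
    with Connection(endpoint=hyper.endpoint,
                    database="customer.hyper",
                    create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        connection.catalog.create_table(customer_table)
        # Insert two sample rows.
        with Inserter(connection, customer_table) as inserter:
            inserter.add_rows([["DK-13375", 518], ["EB-13705", 815]])
            inserter.execute()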

How do I install the Hyper API?

To work with these code samples, you must first have the Hyper API installed for your language of choice. Head to our official Hyper API documentation to get it up and running.
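For example, the Python flavor is distributed on PyPI and is typically installed with pip (the package name matches the one used throughout the samples and issues below):

pip install tableauhyperapi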

How do I get help or give feedback?

If you have questions, want to submit ideas, or give feedback on the code, please do so by submitting an issue on this project.

Contributions

Code contributions and improvements by the community are welcomed and accepted on a case-by-case basis. See the LICENSE file for current open-source licensing and use information.

Before we can accept pull requests from contributors, we require a signed Contributor License Agreement (CLA).

Hyper API License

Note that the Hyper API comes with a proprietary license; see the LICENSE file in the Hyper API package.

hyper-api-samples's People

Contributors

alendit, axeluhlig, benlower, d45, dependabot[bot], djfrancesco, francoiszim, jkammerer, jonas-eckhardt, marcelkost, mattirish, mosenberg, mschreier-tableau, tableaukyle, tsjoblad, turner-anderson, vogelsgesang, wolfroediger


hyper-api-samples's Issues

What is the maximum hyper file size supported for upload?

Hi, a question regarding https://github.com/tableau/hyper-api-samples/tree/main/Community-Supported/publish-hyper, which shows how to create a single hyper file and upload it to Tableau Server. I am trying to do the same: I have 30 GB of CSV data on S3, and I am planning to download it locally and convert it to hyper files.

Question:

  1. Can I break my table's data into several hyper files and upload them to Tableau Server separately?
  2. What is the maximum size of a hyper file that can be uploaded from local to Tableau Server, assuming I am using the Python client?

Thanks!
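Not an answer to the size limits, but for reference, a minimal sketch of the CSV-to-hyper conversion step described above. The table schema and file names are placeholders, and the COPY options mirror the ones used elsewhere in this repo's samples.

from tableauhyperapi import HyperProcess, Telemetry, Connection, CreateMode, \
    TableDefinition, TableName, SqlType, escape_string_literal

# Placeholder schema; real columns would mirror the CSV layout.
orders_table = TableDefinition(
    table_name=TableName("Extract", "Orders"),
    columns=[
        TableDefinition.Column("order_id", SqlType.text()),
        TableDefinition.Column("amount", SqlType.double()),
    ],
)

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint, database="orders.hyper",
                    create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        connection.catalog.create_schema("Extract")
        connection.catalog.create_table(orders_table)
        # Bulk-load one downloaded CSV chunk; repeat per chunk/file.
        count = connection.execute_command(
            f"COPY {orders_table.table_name} FROM {escape_string_literal('chunk_000.csv')} "
            "WITH (format csv, header true, delimiter ',')")
        print(f"Loaded {count} rows")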

Hyperfile within Azure databricks

Hello,

I am creating a hyper file inside Azure Databricks. I want the hyper file that is created to be saved in Azure Blob Storage. How and where can I configure where the .hyper file is saved?

Query 18 spills to disk and takes a lot of space

I am trying to run TPC-H SF100 on my laptop. The good news: I can build the hyper file easily and the queries run very well, except Query 18, which times out because I don't have enough free space on my laptop. I saw some people sort lineitem first, but I can't do that on my laptop either, as it also times out.

Defining FK and PK constraints did not help.

SELECT
    --Query18
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    SUM(l_quantity)
FROM
    customer,
    orders,
    lineitem
WHERE
    o_orderkey IN (
        SELECT
            l_orderkey
        FROM
            lineitem
        GROUP BY
            l_orderkey
        HAVING
            SUM(l_quantity) > 300
    )
    AND c_custkey = o_custkey
    AND o_orderkey = l_orderkey
GROUP BY
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
ORDER BY
    o_totalprice DESC,
    o_orderdate
LIMIT
    100;

Read from s3 private buckets

Hi Folks,

Is it possible to read from private buckets? I've tried different ways to read from my S3 bucket (from inside EC2) but with no success. The IAM role has access to the bucket; it's possible to perform head-bucket, cp operations, etc. The only issue is when I try to use the Hyper API to read directly from S3.

Add additional command before building for .NET

For the .NET build commands (build.bat / build.sh) I recommend adding the following line
dotnet add package Tableau.HyperAPI.NET
before running
dotnet build Example.csproj

Otherwise, people who have just git cloned this repo to their machine will not have the required references.

Error converting parquet to hyper

Failed to start a new Hyper instance.
Context: 0x86a93465

Caused by:
The Hyper server process exited during startup with exit code: 1
Command-line: ""C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\tableauhyperapi\bin\hyper\hyperd.exe" "run" "--date-style=MDY" "--date-style-lenient=false" "--experimental-external-format-parquet=1" "--init-user=tableau_internal_user" "--language=en_US" "--log-config=file,json,all,hyperd,0" "--log-dir=C:\XXX\XXX" "--no-password=true" "--skip-license=true" "--telemetry-opt-in=" "--listen-connection" "tab.pipe://./pipe/auto" "--callback-connection" "tab.pipe://./pipe/{9F41B5E8-19D8-4E88-A0BE-6852C66DCF87}" "
Child process' stdout:
LogFile: \?\C:\XXX\XXX\hyperd.log

    Child process' stderr:
    Error(s) while applying config file or command line parameters.
    Error. Could not interpret 'experimental_external_format_parquet' as global setting: No internal setting named 'experimental_external_format_parquet' exists.
    Critical setting errors encountered, shutting down.

Request: Column Names Added to Results Object

Request:
I'd like to request that the connection.execute_query function automatically output the column names to the Results object it creates. These should be the column names from the submitted query (all columns if SELECT * is used).

Purpose:
Automatically having column names on the Results object (dataframe) makes it easier to reference them later if needed.

Alternative (in case of technical limitation):

  1. Create an empty list.
  2. Iterate over the list of columns for the desired table using <table_obj_name>.columns:
    2.a) extract the column name using the name attribute of the column object;
    2.b) convert the returned column name to a string (using __str__()), because it is originally returned as a TableDefinition object;
    2.c) append it to the list.
  3. Use the connection.execute_query function for the desired query that creates the Results object, then pass the completed list of columns from 2.c above as the columns argument when the dataframe is created.

Screenshot attached with example
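For what it's worth, a rough Python sketch of that alternative, assuming pandas for the dataframe; get_table_definition, Column.name, and execute_list_query are existing Hyper API calls, while the table and file names are illustrative placeholders.

import pandas as pd
from tableauhyperapi import HyperProcess, Telemetry, Connection, TableName

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint, database="customer.hyper") as connection:
        table_name = TableName("Extract", "Extract")  # placeholder table
        # Steps 1-2: collect the column names from the table definition.
        table_def = connection.catalog.get_table_definition(table_name)
        column_names = [col.name.unescaped for col in table_def.columns]
        # Step 3: run the query and pass the collected names to the dataframe.
        rows = connection.execute_list_query(f"SELECT * FROM {table_name}")
        df = pd.DataFrame(rows, columns=column_names)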

Connect multiple tables only possible with joins?

Is it possible to connect tables via relationships when creating a hyper file?
I couldn't find anything on that topic in the documentation (or maybe I'm searching with the wrong keywords).
I'd expect it to be possible, as the extracts contain the relationships.

Hope it's not a trivial thing.
Thank you!

REQUEST: add HASH32 and HASH64 functions

In order to improve join performance, would it be possible to have scalar functions that return an int (HASH32) or a bigint (HASH64) computed by a hash algorithm from a string input?

The idea is to allow creating columns in a table and computing them using these functions.

Like
UPDATE PRODUCT
SET PROD_JoinKey = XXHASH3(CODSOC || '+' || CODPRODUCT)
WHERE PROD_JoinKey <> XXHASH3(CODSOC || '+' || CODPRODUCT)

Or

UPDATE PRODUCT
SET PROD_JoinKey = HASH64(CODSOC || '+' || CODPRODUCT, 'XXHASH3')
WHERE PROD_JoinKey <> HASH64(CODSOC || '+' || CODPRODUCT, 'XXHASH3')

The objective is to hash fast (not for cryptographic use) while of course limiting collisions, using a standard and open algorithm (because the data may come from outside, already hashed or not).

I take XXHASH as an example, but there are many other algorithms: a real zoo :-)

Build a hyper table definition from source_schema in Java

private static TableDefinition CUSTOMER_TABLE = new TableDefinition(
// Since the table name is not prefixed with an explicit schema name, the table will reside in the default "public" namespace.
new TableName("Customer"),
asList(
new TableDefinition.Column("Customer ID", text(), NOT_NULLABLE),
new TableDefinition.Column("Customer Name", text(), NOT_NULLABLE),
new TableDefinition.Column("Loyalty Reward Points", bigInt(), NOT_NULLABLE),
new TableDefinition.Column("Segment", text(), NOT_NULLABLE)

)
);

How can I handle hyper extract table creation by reading the schema from a file and creating the table definitions dynamically?
I was able to create the table dynamically with Extract API 2.0, but I am having trouble with the Hyper API. I am a newbie; any tips/help are appreciated.
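The question above is about Java, but as a rough illustration of the idea, here is a Python sketch that builds a TableDefinition dynamically from a simple JSON schema file; the file format and the type mapping here are made up for this example.

import json
from tableauhyperapi import TableDefinition, TableName, SqlType

# Hypothetical mapping from schema-file type strings to Hyper SQL types.
TYPE_MAP = {
    "text": SqlType.text(),
    "bigint": SqlType.big_int(),
    "double": SqlType.double(),
    "date": SqlType.date(),
}

def table_definition_from_schema(schema_path, table="Customer"):
    # Expected file shape: [{"name": "Customer ID", "type": "text"}, ...]
    with open(schema_path) as f:
        schema = json.load(f)
    columns = [TableDefinition.Column(col["name"], TYPE_MAP[col["type"]])
               for col in schema]
    return TableDefinition(table_name=TableName(table), columns=columns)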

Unable to leverage multi-table hyper file as a dataset for Tableau Server

Was on the Hyper API workshop call on 6/23 and was suggested to post this here.

I don't know if this is really a Hyper API issue, but here is how we are attempting to leverage it.
I have a data warehouse data mart (a fact table with 140 million rows and 12 dimensions). If we attempt to create an extract with Tableau Desktop on a current-gen MacBook Pro with 32 GB of RAM, it takes about 14 hours. I created a Python script that queries Snowflake and creates CSVs for the tables in parallel, then parses the schema of the tables and creates a multi-table hyper file from them.
The goal is to get that hyper file/extract uploaded so our Creators and Explorers can look at the data on Tableau Server.
As a test, I then opened Tableau Desktop, pointed it at the hyper file, defined the relationships, and attempted to upload the tdsx to the server. My computer then sat for hours, using in the end 100 GB of swap and all memory, and eventually ran out of disk space after a couple of hours.
That is a bit of detail that may not be necessary, but I want to know: am I approaching the problem with the right workflow?

Here are my simple scripts right now. They need more work, but they got me to a multi-table hyper file much quicker than using Desktop to create an extract with a Snowflake live connection.

Tableau Extract Automation.zip

tableauhyperapi (v0.0.16123) HyperException: This database does not support 128-bit numerics.

test@local: pip3 list | grep hyper   
tableauhyperapi        0.0.16377 
What’s New in the Hyper API (v0.0.16123)
December 7, 2022

Added support for 128-bit numerics. This allows a precision of up to 38 for the NUMERIC SQL type.

Run:

from pathlib import Path

from tableauhyperapi import HyperProcess, Telemetry, \
    Connection, CreateMode, \
    NOT_NULLABLE, NULLABLE, SqlType, TableDefinition, \
    Inserter, \
    escape_name, escape_string_literal, \
    HyperException

customer_table = TableDefinition(
    table_name="Customer",
    columns=[
        TableDefinition.Column("Customer ID", SqlType.text(), NOT_NULLABLE),
        TableDefinition.Column("Customer Amount", SqlType.numeric(38,3), NOT_NULLABLE),
        TableDefinition.Column("Customer Score", SqlType.big_int(), NOT_NULLABLE)
    ]
)

path_to_database = Path("customer.hyper")

with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    
    connection_parameters = {"lc_time": "en_US"}
    
    with Connection(endpoint=hyper.endpoint,
                        database=path_to_database,
                        create_mode=CreateMode.CREATE_AND_REPLACE,
                        parameters=connection_parameters) as connection:
        
        connection.catalog.create_table(table_definition=customer_table)

Output:

---------------------------------------------------------------------------
HyperException                            Traceback (most recent call last)
Input In [32], in <module>
     23 connection_parameters = {"lc_time": "en_US"}
     25 with Connection(endpoint=hyper.endpoint,
     26                     database=path_to_database,
     27                     create_mode=CreateMode.CREATE_AND_REPLACE,
     28                     parameters=connection_parameters) as connection:
---> 30     connection.catalog.create_table(table_definition=customer_table)

File ~/.local/lib/python3.8/site-packages/tableauhyperapi/catalog.py:65, in Catalog.create_table(self, table_definition)
     59 def create_table(self, table_definition: TableDefinition):
     60     """
     61     Creates a table. Raise an exception if the table already exists.
     62 
     63     :param table_definition: the table definition.
     64     """
---> 65     self.__create_table(table_definition, True)

File ~/.local/lib/python3.8/site-packages/tableauhyperapi/catalog.py:57, in Catalog.__create_table(self, table_definition, fail_if_exists)
     55 def __create_table(self, table_definition: TableDefinition, fail_if_exists: bool):
     56     native_table_def = SchemaConverter.table_definition_to_native(table_definition)
---> 57     Error.check(hapi.hyper_create_table(self.__cdata, native_table_def.cdata, fail_if_exists))

File ~/.local/lib/python3.8/site-packages/tableauhyperapi/impl/dllutil.py:100, in Error.check(p)
     97 if p != ffi.NULL:
     98     # this will free the error when it goes out of scope
     99     errp = Error(p)
--> 100     raise errp.to_exception()

HyperException: This database does not support 128-bit numerics.
Context: 0xfa6b0e2f
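One thing worth checking here (an assumption rather than a confirmed diagnosis): the database file format version that Hyper uses by default is an older one kept for compatibility, and newer capabilities such as 128-bit numerics are tied to newer file format versions, controlled through the default_database_version process setting. A sketch of passing that setting when starting the Hyper process; the exact version number needed should be confirmed against the settings documentation for your release.

from tableauhyperapi import HyperProcess, Telemetry

# Assumption: NUMERIC precision above 18 requires a newer database file
# format; "3" is used here as a guess, check the Hyper API settings docs.
process_parameters = {"default_database_version": "3"}

with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU,
                  parameters=process_parameters) as hyper:
    ...  # open the connection and create the table as in the snippet above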

Wrong result when multiplying by a calculated value

I have a problem with query 11 of TPC-H (SF10).
It seems Hyper does not return the correct result set.
Digging into the issue, I found something strange:
this query, which is part of the HAVING filter of TPC-H Q11, returns 0:

SELECT   
   SUM("PS_SUPPLYCOST" * "PS_AVAILQTY") * (0.0001/10)
from
    "PARTSUPP",
    "SUPPLIER",
    "NATION"
where
    "PS_SUPPKEY" = "S_SUPPKEY"
and "S_NATIONKEY" = "N_NATIONKEY"
and "N_NAME" = 'GERMANY';

Whereas this one returns the correct result: 8102913.76524679

SELECT   
   SUM("PS_SUPPLYCOST" * "PS_AVAILQTY") /100000       
from
    "PARTSUPP",
    "SUPPLIER",
    "NATION"
where
    "PS_SUPPKEY" = "S_SUPPKEY"
	and "S_NATIONKEY" = "N_NATIONKEY"
	and "N_NAME" = 'GERMANY';

how to access `tableau-libs/hyper/hyperd` packaged in jar

Hi there,

I'm wondering, is there some standard/recommended way to work with the Tableau Hyper library in the Java world?
Basically, I need to package the Tableau Hyper lib in a jar file, publish it to Maven (as a normal Java lib/app), and then use it within some other app.

Imagine I have a library a with the Hyper API (and the hyperd binary) published to my private Maven repository. I'm developing project b that has a as a dependency. The problem is that I don't know how to access the binary tableau-libs/hyper/hyperd from package b.

error:

Cause: com.tableau.hyperapi.HyperException: The Hyper executable "/home/me/.m2/repository/a/0.3.1-pk009-SNAPSHOT/hyper/hyperd" does not exist.

Any idea how to proceed?

Publish hyper file does not work

https://github.com/tableau/hyper-api-samples/blob/main/Community-Supported/publish-hyper/publish-hyper-file.py

This example does not seem to publish the data source.

I don't get any errors and I get the full print statements:

Creating single table for publishing.
Tables available in customer.hyper are: [TableName('Extract', 'Extract')]
The number of rows in table "Extract"."Extract" is 2.
The connection to the Hyper file has been closed.
The Hyper process has been shut down.
Signing into ... at https://10ax.online.tableau.com/
Publishing customer.hyper to Group A Reports...
Datasource published. Datasource ID:...

However, when I go into Tableau Server, there is no data source. I have even tried the following code after publishing to see if I could find it anywhere, with no luck:

all_datasources, pagination_item = server.datasources.get()
print("\nThere are {} datasources on site: ".format(pagination_item.total_available))
print([datasource.name for datasource in all_datasources])

Is there an updated version or way to publish data sources (specifically hyper files)?

Alteryx Workflow - Python tool - Read Hyper file

I have come across a requirement to read hyper files from our Tableau Online environment as a source in an Alteryx workflow.
I read through the documentation and examples but could not figure out the following:

Can the Python tool in Alteryx read a hyper file from Tableau Online?
How do I set the location of a hyper file that is in Tableau Online?

Thank you.

Hyper API without Tableau installed

Hi,

is it possible to read a Tableau Hyper file with the Hyper API without Tableau installed on the current computer?

I know this is not exactly a question about the examples, but I think the answer would be interesting for others too, and I couldn't find it in the documentation.

Regards,
Jan

"'type' must be a SqlType instance"

Hello,
I am trying to ingest my parquet file.
I tried using https://github.com/tableau/hyper-api-samples/blob/9a3ea9c110a87ca27ddf99226837977c69c7dcb5/Community-Supported/parquet-to-hyper/create_hyper_file_from_parquet.py
However when I first tried to use the SqlType.double() I got the following error:
"'type' must be a SqlType instance"
Then I converted the type back to int() to see if it would work.
I get the following error now even if I set the type back to double() (which is what I want)
unable to read from external source.: Source: "~/Desktop/airSensors.parquet" Context: 0xfa6b0e2f
The part of the example that I changed is:

if __name__ == '__main__':
    try:
        # The `airSensors` table to read from the Parquet file.
        table_definition = TableDefinition(
            table_name="airSensors",
            columns=[
                TableDefinition.Column("co", SqlType.double(), NOT_NULLABLE),
                TableDefinition.Column("humidity", SqlType.double(), NOT_NULLABLE),
                TableDefinition.Column("sensor_ids", SqlType.double(), NOT_NULLABLE),
                TableDefinition.Column("temperature", SqlType.double(), NOT_NULLABLE),
                TableDefinition.Column("time", SqlType.date(), NOT_NULLABLE),
            ]
        )

        run_create_hyper_file_from_parquet(
            "airSensors.parquet",
            table_definition,
            "airSensors.hyper")        

Thank you
Is there a better way to load parquet to Tableau?
I'm having trouble finding examples.
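On the last question, one pattern that shows up in other issues in this repo is to let Hyper read the Parquet file directly and infer the column types, instead of declaring every column by hand. A minimal sketch; the file names are placeholders, and depending on your Hyper API release the Parquet reader may still sit behind the experimental setting mentioned in another issue above.

from tableauhyperapi import HyperProcess, Telemetry, Connection, CreateMode, escape_string_literal

parquet_file = escape_string_literal("airSensors.parquet")

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint, database="airSensors.hyper",
                    create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        # Let Hyper infer the schema from the Parquet metadata.
        connection.execute_command(
            f'CREATE TABLE "airSensors" AS (SELECT * FROM EXTERNAL({parquet_file}))')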

Hyper API COPY from CSV to hyper failed with Context: 0x5fdfad59 without any details

My Python script used the Hyper API to COPY from CSV to hyper and it failed with Context: 0x5fdfad59 without telling me more details. Here is the stack trace:

  File "/usr/local/lib/python3.7/dist-packages/tableauhyperapi/connection.py", line 230, in execute_command
    row_count_cdata))
  File "/usr/local/lib/python3.7/dist-packages/tableauhyperapi/impl/dllutil.py", line 110, in check
    raise errp.to_exception()
tableauhyperapi.hyperexception.HyperException: 
Context: 0x5fdfad59

Versions that I am using:

tableauhyperapi==0.0.11889
tableauserverclient==0.12

I do see some discussions in the Tableau community about the 0x5fdfad59 error, but I think I have a different case:

  1. The error is happening in my company's production environment and I can't reproduce it locally (it runs fine locally). I have checked the package versions and they should be correct.
  2. This error only tells me Context: 0x5fdfad59, while the error messages in other posts include the exact lines and columns that have the issue.

Do you have any idea how to debug this? Thanks!

Unable to install tableauhyperapi on my macbook pro

Getting the following error when running the pip install tableauhyperapi command
ERROR: Could not find a version that satisfies the requirement tableauhyperapi (from versions: none)
ERROR: No matching distribution found for tableauhyperapi

Any insight would be greatly appreciated, please let me know if you need any additional information

Doesn't work with more than 2 tables?

The example works at first, but then I tried to add a third table and I get the error below. Does anybody know if it is possible to upload a hyper file with more than 2 tables without a TDS?

`tableauserverclient.server.endpoint.exceptions.ServerResponseError:

400011: Bad Request
	There was a problem publishing the file 'Test.hyper'.`

java/scala: InsertDataIntoSingleTable - write hyper file to an arbitrary dir

Hi, I have a problem with writing a hyper file to a temp (/tmp) directory.

...
    val EXTRACT_TABLE = new TableDefinition(
      new TableName("Extract", "Extract"))
      .addColumn("Customer ID", SqlType.text(), NOT_NULLABLE)
      .addColumn("Customer Name", SqlType.text(), NOT_NULLABLE)
      .addColumn("Loyalty Reward Points", SqlType.bigInt(), NOT_NULLABLE)
      .addColumn("Segment", SqlType.text(), NOT_NULLABLE);

    //here is a problem - it cannot create a file/database in tmp dir (it creates empty file)
    val tmpHyperFile = Files.createTempFile("stataggregator-",".hyper")

    val process = new HyperProcess(Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU)
    val connection = new Connection(process.getEndpoint(),
      tmpHyperFile.toString(),
      CreateMode.CREATE_AND_REPLACE)


    val catalog = connection.getCatalog();

    catalog.createSchema(new SchemaName("Extract"));
    catalog.createTable(EXTRACT_TABLE);

    val inserter = new Inserter(connection, EXTRACT_TABLE)

    logger.info("TRY will start")

    try {

      // Insert data into "Extract"."Extract" table
      inserter.add("DK-13375").add("Dennis Kane").add(518).add("Consumer").endRow();
      inserter.add("EB-13705").add("Ed Braxton").add(815).add("Corporate").endRow();
      inserter.execute()

      logger.info("DataFrame copied to: " + tmpHyperFile)
...

If I run the code above, I get an empty tmpHyperFile; if the file is defined as val customerDatabasePath = Paths.get("localfile"); it works well.

Any idea why defining a directory in the hyper file name produces empty files?

Non-OSS tableauhyperapi dependency

This project depends on tableauhyperapi, which is not OSS.
see tableau/server-client-python#594

This should be clearly explained in the README, and ideally also noted at the top of the LICENSE file: while the samples are MIT licensed, they are tightly coupled with a non-OSS Python library.

Failed to import tableauhyperapi in virtual environment

When I tried to run my app in a virtual environment with the Tableau Hyper API, I hit this error. Please take a look at the error message below:

/Users/sfsdf/developer/Tabloader/tabloader-env/lib/python3.7/site-packages/tableauhyperapi/impl/dll.py:24: UserWarning: Failed to import cdef_compiled module, importing tableauhyperapi will be slow. Please report this to Tableau. Error message was: dlopen(/Users/dsfsd/developer/Tabloader/tabloader-env/lib/python3.7/site-packages/_cffi_backend.cpython-37m-darwin.so, 2): Library not loaded: /usr/local/opt/libffi/lib/libffi.7.dylib
  Referenced from: /Users/asdasd/developer/Tabloader/tabloader-env/lib/python3.7/site-packages/_cffi_backend.cpython-37m-darwin.so
  Reason: image not found
  warnings.warn(f'Failed to import cdef_compiled module, importing tableauhyperapi will be slow. Please report this to '
Traceback (most recent call last):
  File "/Users/dasd/developer/Tabloader/tabloader-env/lib/python3.7/site-packages/tableauhyperapi/impl/dll.py", line 19, in <module>
    from .cdef_compiled import ffi
  File "/Users/sadd/developer/Tabloader/tabloader-env/lib/python3.7/site-packages/tableauhyperapi/impl/cdef_compiled.py", line 13, in <module>
    import _cffi_backend
ImportError: dlopen(/Users/dasd/developer/Tabloader/tabloader-env/lib/python3.7/site-packages/_cffi_backend.cpython-37m-darwin.so, 2): Library not loaded: /usr/local/opt/libffi/lib/libffi.7.dylib
  Referenced from: /Users/dasd/developer/Tabloader/tabloader-env/lib/python3.7/site-packages/_cffi_backend.cpython-37m-darwin.so
  Reason: image not found

tableauserverclient.server.endpoint.exceptions.InternalServerError: Error status code: 500

Hi, I'm facing an error when I try to publish the hyper file to Tableau Server.

Below is the code:

def publish_hyper_file_to_tableau():

    server_config = load_server_config()

    hyper_name = server_config['hyper_name']
    server_address = server_config['server_address']
    project_name = server_config['project_name']
    token_name = server_config['token_name']
    token_value = server_config['token_value']
    username = server_config['username']
    password = server_config['password']

    path_to_database = Path(hyper_name)

    # Sign in to server
    tableau_auth = TSC.PersonalAccessTokenAuth(token_name=token_name, personal_access_token=token_value, site_id="")
    server = TSC.Server(server_address, use_server_version=True)
    credentials = TSC.ConnectionCredentials(name=username, password=password)

    print(f"Signing into tableau at {server_address}")

    with server.auth.sign_in(tableau_auth):

        publish_mode = TSC.Server.PublishMode.Overwrite

        # Get project_id from project_name
        all_projects, pagination_item = server.projects.get()
        for project in TSC.Pager(server.projects):
            if project.name == project_name:
                project_id = project.id

        # Create the datasource object with the project_id
        datasourceitem = TSC.DatasourceItem(project_id)

        print(f"Publishing {hyper_name} to {project_name}...")

        # Publish datasource
        datasource = server.datasources.publish(datasourceitem, path_to_database, publish_mode, credentials)
        print("Datasource published. Datasource ID: {0}".format(datasource.id))

Output:

Server config file loaded.
Signing into tableau at ********************* [server name changed for privacy]
Publishing ACE_Dashboard.hyper to ACE_Dashboard...
Traceback (most recent call last):
File "/home/ramanan/Documents/ACE_Dashboard/ACE_Dashboard/ACE_Dashboard_CSV_To_HYPER .py", line 207, in
publish_hyper_file_to_tableau()
File "/home/ramanan/Documents/ACE_Dashboard/ACE_Dashboard/ACE_Dashboard_CSV_To_HYPER .py", line 199, in publish_hyper_file_to_tableau
datasource = server.datasources.publish(datasourceitem, path_to_database, publish_mode, credentials)
File "/home/ramanan/miniconda3/lib/python3.7/site-packages/tableauserverclient/server/endpoint/endpoint.py", line 177, in wrapper
return func(self, *args, **kwargs)
File "/home/ramanan/miniconda3/lib/python3.7/site-packages/tableauserverclient/server/endpoint/endpoint.py", line 219, in wrapper
return func(self, *args, **kwargs)
File "/home/ramanan/miniconda3/lib/python3.7/site-packages/tableauserverclient/server/endpoint/endpoint.py", line 219, in wrapper
return func(self, *args, **kwargs)
File "/home/ramanan/miniconda3/lib/python3.7/site-packages/tableauserverclient/server/endpoint/datasources_endpoint.py", line 275, in publish
raise err
File "/home/ramanan/miniconda3/lib/python3.7/site-packages/tableauserverclient/server/endpoint/datasources_endpoint.py", line 271, in publish
server_response = self.post_request(url, xml_request, content_type)
File "/home/ramanan/miniconda3/lib/python3.7/site-packages/tableauserverclient/server/endpoint/endpoint.py", line 140, in post_request
parameters=parameters,
File "/home/ramanan/miniconda3/lib/python3.7/site-packages/tableauserverclient/server/endpoint/endpoint.py", line 71, in _make_request
self._check_status(server_response)
File "/home/ramanan/miniconda3/lib/python3.7/site-packages/tableauserverclient/server/endpoint/endpoint.py", line 85, in _check_status
raise InternalServerError(server_response)
tableauserverclient.server.endpoint.exceptions.InternalServerError:

Error status code: 500
b'

Internal Server ErrorThe server encountered an error and cannot complete your request. Contact your server administrator.'

The max recursion level of WITH RECURSIVE is about 150000? How to increase it?

import time
from tableauhyperapi import HyperProcess, Telemetry, Connection

with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint) as connection:
        a=connection.execute_command("CREATE TEMPORARY EXTERNAL TABLE tripdata FOR 'd:/yellow_tripdata_2021-06.parquet'")

        t=time.time()
        a=connection.execute_scalar_query("""WITH RECURSIVE
   cnt(x) AS (
      SELECT 1
      UNION ALL
      SELECT x+1 FROM cnt
       where x < 150000
) select count(*) from cnt;""")
        print(time.time()-t, ": ", a)
        exit()

This runs OK:

D:\>python temp.py
1.0250585079193115 :  150000

If I change the 150000 to 160000, it raises an error:

D:\>python temp.py
Traceback (most recent call last):
  File "temp.py", line 9, in <module>
    a=connection.execute_scalar_query("""WITH RECURSIVE
  File "D:\Python38\lib\site-packages\tableauhyperapi\connection.py", line 238, in execute_scalar_query
    with self.execute_query(query, text_as_bytes) as result:
  File "D:\Python38\lib\site-packages\tableauhyperapi\connection.py", line 191, in execute_query
    Error.check(hapi.hyper_execute_query(self._cdata,
  File "D:\Python38\lib\site-packages\tableauhyperapi\impl\dllutil.py", line 100, in check
    raise errp.to_exception()
tableauhyperapi.hyperexception.HyperException: A segment overflowed.
Context: 0xfa6b0e2f

Request: Data-Modifying CTEs

Request:
I'd like to request that the connection.execute_query function permit CTEs (Common Table Expressions) with DML operations, as described in the examples on PostgreSQL's documentation page (see section 7.8.2, Data-Modifying Statements in WITH):
https://www.postgresql.org/docs/13/queries-with.html#QUERIES-WITH-MODIFYING

Purpose:
We'd like to have the ability to persist the original records before they're updated or deleted, within the same query, using the RETURNING clause. This also seems to be allowed per the existing Tableau Hyper API documentation (https://help.tableau.com/current/api/hyper_api/en-us/reference/sql/sql-delete.html); trying this today (tableauhyperapi 0.0.14265) throws an exception:

tableauhyperapi.hyperexception.HyperException: syntax error: got "delete", expected "explain", "select", "table", "values", "with", "with ", '(:'

Alternative (in case of technical limitation):
First call the connection.execute_query function with a query that returns the rows that will be updated or deleted and persist them in a Results object (dataframe), then call the connection.execute_command function to perform the update or delete operation (using the same filter as in step one above).
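A rough sketch of that two-step alternative, assuming pandas for the persisted Results object; the table name and filter predicate are placeholders for illustration.

import pandas as pd
from tableauhyperapi import HyperProcess, Telemetry, Connection, TableName

table = TableName("Extract", "Orders")        # placeholder table
predicate = "\"order_date\" < '2020-01-01'"   # placeholder filter

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint, database="orders.hyper") as connection:
        # Step 1: persist the rows that are about to be deleted.
        archived = pd.DataFrame(
            connection.execute_list_query(f"SELECT * FROM {table} WHERE {predicate}"))
        # Step 2: delete them using the same filter.
        deleted = connection.execute_command(f"DELETE FROM {table} WHERE {predicate}")
        print(f"Archived {len(archived)} rows, deleted {deleted} rows")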

File is locked

Hi,
Thanks for sharing these examples, they are great. Most of the time all works perfectly but from time to time I'm getting the following error when trying to open the connection to .hyper file:
Tableau.HyperAPI.HyperException: error opening database 'xxxxx': There was an error during loading database: The database file is locked by another process: DatabaseId: 'xxxxxx'

This concerns the situation where a user browses a Tableau Server workbook that refers to the same data source I'm trying to access via code; even after the user closes the browser, the lock on the .hyper file remains for about 30 more minutes.
Does anyone know how to overcome the locking issue? Thanks.

Issue with Timestamp type when using Hyper extract

I created a hyper extract using the Hyper API in Python. This hyper extract contains several Timestamp fields.

I set the columns to type=SqlType.timestamp() and the data is a Timestamp Object like this: Timestamp(year, month, day, hour, min, sec).

The issue is when I open the generated hyper extract in Tableau desktop, any back-end date calculations fail, giving the following error:

An error occurred while communicating with data source 'SN PPM Extract (local copy)'.

Bad Connection: Tableau could not connect to the data source.
ERROR: unsupported data types in call to 'tableau.normalize_datetime'

SELECT TABLEAU.TO_DATETIME(DATE_TRUNC('YEAR', TABLEAU.NORMALIZE_DATETIME("pm_project"."start_date")), "pm_project"."start_date") AS "tyr:start_date (pm_project):ok"
FROM "public"."pm_portfolio" "pm_portfolio"
LEFT JOIN "public"."pm_project" "pm_project" ON ("pm_portfolio"."name" = "pm_project"."primary_portfolio")
GROUP BY 1

The above case is using the field 'start_date (pm_project)', and is trying to get the Year part of the date time.

Attached is the .tdsx containing the hyper file as a .zip file. Any help you could provide would be greatly appreciated.

SN PPM Extract.zip

Request for PARALLEL option for external table with ARRAY

Currently, CTAS using an external table with an ARRAY as source is sequential: the source files in the array are read one after another.

It would be great if the select (and maybe the insert) could be done in parallel.
A PARALLEL => degree option could be used to parameterize/limit that.

errorCode=410012 occurred and 6 million rows of BigQuery data could not be published by export_load method

Problem Description

The following error occurred when executing BigQueryExtractor's export_load.

Specified table does not exist: updated_rows. (errorCode=410012)']

The source table had about 6 million rows, and it seems that part of the split processing in update_datasource_from_hyper_file failed.
Checking the destination Tableau Online data source, only 1.9 million rows of data were created, not all of them.

However, when I set sample_rows=10000000 (10M) for load_sample, it worked fine (but it took about an hour of local processing...).

Is there something wrong with the settings in export_load?
Or would it be better to use load_sample for large tables (if there is no performance difference with export_load)?

I also tried with BigQuery's Public dataset below, but got the same error.
bigquery-public-data.google_trends.top_rising_terms

*Due to a data type error, the following part of bigquery_extractor.py has been temporarily corrected.

type_lookup = {
    "BOOL": SqlType.bool(),
    "BYTES": SqlType.bytes(),
    "DATE": SqlType.date(),
    "DATETIME": SqlType.timestamp(),
    "INT64": SqlType.big_int(),
    "INTEGER": SqlType.text(),
    "NUMERIC": SqlType.numeric(18, 9),
    "FLOAT64": SqlType.double(),
    "FLOAT": SqlType.text(),
    "GEOGRAPHY": SqlType.text(),
    "STRING": SqlType.text(),
    "TIME": SqlType.time(),
    "TIMESTAMP": SqlType.timestamp_tz(),
}

version:

  • tableauhyperapi:0.0.14401
  • tableauserverclient:0.23

main.py

config = yaml.safe_load(open("config.yml"))
tableau_env = config.get("tableau_env")
db_env = config.get("bigquery")    
extractor = BigQueryExtractor(
            source_database_config=db_env,
            tableau_hostname=tableau_env.get("server_address"),
            tableau_project=tableau_env.get("project"),
            tableau_site_id=tableau_env.get("site_id"),
            tableau_token_name="xxxx",
            tableau_token_secret="xxxx",
        )

extractor.export_load(
            source_table="bigquery-public-data.google_trends.top_rising_terms",
            tab_ds_name="export_load_test"
        )

Error

==================
Traceback (most recent call last):
File "/Users/yuji.yamamoto/test/load_tableau_hyper_files_2/hyper-api-samples/Community-Supported/clouddb-extractor/main.py", line 26, in
extractor.export_load(
File "/Users/yuji.yamamoto/test/load_tableau_hyper_files_2/hyper-api-samples/Community-Supported/clouddb-extractor/base_extractor.py", line 165, in execution_timer
result = func(*args, **kw)
File "/Users/yuji.yamamoto/test/load_tableau_hyper_files_2/hyper-api-samples/Community-Supported/clouddb-extractor/base_extractor.py", line 690, in export_load
self.update_datasource_from_hyper_file(
File "/Users/yuji.yamamoto/test/load_tableau_hyper_files_2/hyper-api-samples/Community-Supported/clouddb-extractor/base_extractor.py", line 584, in update_datasource_from_hyper_file
self.tableau_server.jobs.wait_for_job(async_job)
File "/Users/yuji.yamamoto/.anyenv/envs/pyenv/versions/3.9.6/lib/python3.9/site-packages/tableauserverclient/server/endpoint/jobs_endpoint.py", line 72, in wait_for_job
raise JobFailedException(job)
tableauserverclient.server.endpoint.exceptions.JobFailedException: Job 23a09876-ec23-4d88-b1fa-1314f23f2aef failed with notes ['com.tableausoftware.domain.server.exception.TableauServerException: There was a problem retrieving table information for the source table. Specified table does not exist: updated_rows. (errorCode=410012)']

Inserting data from a CSV file into an existing hyper file

Hi, I'm creating a data frame and a CSV file by pulling data from a SQL database and then writing the same into an existing hyper file. But whenever I do it, it just replaces the data in the hyper file rather than inserting the new data alongside the existing data.

def create_hyper_file_from_csv(dataframe):

    print("Load data from CSV into table in new Hyper file for ACE Dashboard")

    process_parameters = {
        # Limits the number of Hyper event log files to two.
        "log_file_max_count": "2",
        # Limits the size of Hyper event log files to 100 megabytes.
        "log_file_size_limit": "100M"
    }

    with HyperProcess(Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU, 'Tabloader', parameters=process_parameters) as hyper:

        connection_parameters = {"lc_time": "en_US"}

        with Connection(endpoint=hyper.endpoint, database=PATH_TO_HYPER,
                        create_mode=CreateMode.CREATE_AND_REPLACE, parameters=connection_parameters) as connection:

            connection.catalog.create_table(ace_dashboard_table)

            with Inserter(connection, ace_dashboard_table) as inserter:
                for index, row in dataframe.iterrows():
                    try:
                        inserter.add_row(row)
                    except Exception as e:
                        print(f"An error occurred at row {index}: {e}")

                inserter.execute()

            #insert_csv_data = connection.execute_command(
            #    command=f"COPY {ace_dashboard_table.table_name} FROM {escape_string_literal(PATH_TO_CSV)} with "
            #    "(format csv, NULL 'NULL', delimiter ';', header true)")

        print("The connection to the Hyper file is closed.")
    #print(f"Loaded {insert_csv_data} rows.")
    os.remove(PATH_TO_CSV)
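For anyone hitting the same behaviour: CreateMode.CREATE_AND_REPLACE recreates the .hyper file on every run, so previous contents are gone by design. A sketch of an append-style variant of the block above, reusing the same names (ace_dashboard_table, PATH_TO_HYPER, and the parameter dicts) and assuming the table definition does not change between runs:

from tableauhyperapi import HyperProcess, Telemetry, Connection, CreateMode, Inserter

with HyperProcess(Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU, 'Tabloader',
                  parameters=process_parameters) as hyper:
    # CREATE_IF_NOT_EXISTS keeps an existing file instead of replacing it.
    with Connection(endpoint=hyper.endpoint, database=PATH_TO_HYPER,
                    create_mode=CreateMode.CREATE_IF_NOT_EXISTS,
                    parameters=connection_parameters) as connection:
        # Only creates the table on the first run; later runs append to it.
        connection.catalog.create_table_if_not_exists(ace_dashboard_table)
        with Inserter(connection, ace_dashboard_table) as inserter:
            for index, row in dataframe.iterrows():
                inserter.add_row(row)
            inserter.execute()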

Cannot COPY from a FIFO

I tried feeding the COPY command a bash process substitution (e.g. ./import.py <(zstd -d < file.zst)) or a manually created FIFO (mkfifo import && zstd -d < file.zst > import &; ./import.py import).

The data I'm trying to read in is 90G compressed ... I'm not very excited about decompressing it just to put it back into the .hyper file. Could the Hyper API not try to seek and just stream the data instead?

Here's the error in the hyper log:

{"ts":"2022-04-30T14:29:55.586433","pid":28907,"tid":"7ffa337fe700","sev":"error","req":"2","sess":"bg12tJyuSG-bx1p5CskBCw","k":"query-end-system-error","v":{"error-code":"58000","error-message":"unable to read from external source.","error-detail-internal":"source: \"import\"\nsystem error: lseek(): Illegal seek","elapsed":0.0041398,"parsing-time":3.0753e-05,"initial-compilation-time":0.00201559,"execution-time":0.00198987,"exec-threads":{"thread-time":0.00198987,"cpu-time":0.00198974,"wait-time":0,"storage":{}},"peak-transaction-memory-mb":0.25,"time-to-schedule":2.5986e-05,"lock-acquisition-time":6.97e-07,"peak-result-buffer-memory-mb":0,"peak-result-buffer-disk-mb":0,"result-size-mb":2.28882e-05,"statement":"COPY","spooling":false,"query-settings-active":false,"plan-cache-status":"cache miss","plan-cache-hit-count":0,"cols":0,"rows":0,"query-hash":"293c.08f52f363204.623a1f","query-trunc":"COPY \"meth\" from 'import' with (format csv, NULL 'NULL', delimiter '\t')"}}

java.lang.RuntimeException: Failed to start a new Hyper instance. The Hyper executable "<path>/hyper/hyperd" does not exist.

Following https://github.com/tableau/hyper-api-samples/tree/main/Tableau-Supported/Java/read-and-print-data-from-existing-hyper-file

getting this error:
java.lang.RuntimeException: Failed to start a new Hyper instance. The Hyper executable "<path>/hyper/hyperd" does not exist.

Is there some hyperd binary I need to install/run? I don't have tableau installed but I want to read a hyper file in java

Cannot create Hyper Files on mounted webdav volumes despite full write permission.

Problem Description

When trying to create a hyper file using tableauhyperapi, on a webdav volume mounted via davfs2 (this might apply to other file systems as well), there seems to be some issue writing the file.

The permissions per se are not a problem, as you can see in the sample code:

Sample Code

from pathlib import Path

from tableauhyperapi import Connection, CreateMode, HyperProcess, Telemetry

test_file = Path("/mnt/data/test.hyper")

# write and read file to test write permissions
test_file.unlink(missing_ok=True)  # delete existing file
with test_file.open("w") as f:
    f.write("Write/read success!")
with test_file.open("r") as f:
    print(f.readline())
test_file.unlink(missing_ok=True)  # delete test file before testing hyper

with HyperProcess(
    telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU,
) as hyper:
    # Open a connection to the Hyper process. This will also create the new Hyper file.
    # `CREATE_AND_REPLACE` mode causes the file to be replaced if it already exists.
    with Connection(
        endpoint=hyper.endpoint,
        database=test_file,
        create_mode=CreateMode.CREATE_AND_REPLACE,
    ) as connection:
        print("Hyper write success!")
---------------------------------------------------------------------------
HyperException                            Traceback (most recent call last)
Input In [2], in <cell line: 15>()
     13 test_file.unlink(missing_ok=True)  # delete test file before testing hyper
     15 with HyperProcess(
     16     telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU,
     17 ) as hyper:
     18     # Open a connection to the Hyper process. This will also create the new Hyper file.
     19     # `CREATE_AND_REPLACE` mode causes the file to be replaced if it already exists.
---> 20     with Connection(
     21         endpoint=hyper.endpoint,
     22         database=test_file,
     23         create_mode=CreateMode.CREATE_AND_REPLACE,
     24     ) as connection:
     25         print("Hyper write success!")
     26         pass

File ~/.local/lib/python3.9/site-packages/tableauhyperapi/connection.py:99, in Connection.__init__(self, endpoint, database, create_mode, parameters)
     96     database = parameters['dbname']
     97     del parameters['dbname']
---> 99 self.__cdata = self.__create_connection(endpoint, database, create_mode, parameters)
    100 self.__endpoint = endpoint
    102 # Lock to serialize cancel() and close() calls.

File ~/.local/lib/python3.9/site-packages/tableauhyperapi/connection.py:126, in Connection.__create_connection(endpoint, database, create_mode, parameters)
    123         native_params.set_value(key, value)
    125 pp = ffi.new('hyper_connection_t**')
--> 126 Error.check(hapi.hyper_connect(native_params.cdata, pp, create_mode.value))
    127 return ffi.gc(pp[0], hapi.hyper_disconnect)

File ~/.local/lib/python3.9/site-packages/tableauhyperapi/impl/dllutil.py:100, in Error.check(p)
     97 if p != ffi.NULL:
     98     # this will free the error when it goes out of scope
     99     errp = Error(p)
--> 100     raise errp.to_exception()

HyperException: The database "hyper.file:/mnt/data/test.hyper" could not be created: Growing the database file failed
Context: 0xfa6b0e2f

Versions:

System: Ubuntu 22.04.1 LTS
Python: 3.9.7
tableauhyperapi: 0.0.15530

Can the Hyper Python API use multiple cores?

from tableauhyperapi import HyperProcess, Telemetry, Connection

with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint) as connection:
        import time
        t=time.time()
        a=connection.execute_scalar_query("select count(1) from 'd:/yellow_tripdata_2021-06.parquet'")
        print(a,time.time()-t)
        t=time.time()
        a=connection.execute_list_query("select passenger_count,count(1) from 'd:/yellow_tripdata_2021-06.parquet'group by passenger_count order by 1")
        print(a,time.time()-t)
        t=time.time()
        a=connection.execute_list_query("select passenger_count,sum(trip_distance) from 'd:/yellow_tripdata_2021-06.parquet'group by passenger_count order by 1")
        print(a,time.time()-t)

returns

2834264 0.18601059913635254
[[0, 66636], [1, 1968440], [2, 412798], [3, 108634], [4, 40950], [5, 67686], [6, 45562], [7, 11], [8, 5], [9, 4], [None, 123538]] 0.20101165771484375
[[0, 172554.11], [1, 5797179.629999995], [2, 1341309.7100000011], [3, 343928.14999999997], [4, 134748.31000000006], [5, 204493.66000000003], [6, 139893.91], [7, 33.44], [8, 9.17], [9, 0.0], [None, 11517949.330000013]] 0.2130122184753418

while the DuckDB CLI on the same machine, querying the same file, gives:

D select passenger_count,count(1) from 'd:/yellow_tripdata_2021-06.parquet'group by passenger_count order by 1;
┌─────────────────┬──────────┐
│ passenger_count │ count(1) │
│      int32      │  int64   │
├─────────────────┼──────────┤
│               0 │    66636 │
│               1 │  1968440 │
│               2 │   412798 │
│               3 │   108634 │
│               4 │    40950 │
│               5 │    67686 │
│               6 │    45562 │
│               7 │       11 │
│               8 │        5 │
│               9 │        4 │
│                 │   123538 │
├─────────────────┴──────────┤
│ 11 rows          2 columns │
└────────────────────────────┘
Run Time (s): real 0.197 user 0.171601 sys 0.000000
D select passenger_count,count(1) from 'd:/yellow_tripdata_2021-06.parquet'group by passenger_count order by 1;
┌─────────────────┬──────────┐
│ passenger_count │ count(1) │
│      int32      │  int64   │
├─────────────────┼──────────┤
│               0 │    66636 │
│               1 │  1968440 │
│               2 │   412798 │
│               3 │   108634 │
│               4 │    40950 │
│               5 │    67686 │
│               6 │    45562 │
│               7 │       11 │
│               8 │        5 │
│               9 │        4 │
│                 │   123538 │
├─────────────────┴──────────┤
│ 11 rows          2 columns │
└────────────────────────────┘
Run Time (s): real 0.074 user 0.156001 sys 0.046800
D select passenger_count,sum(trip_distance) from 'd:/yellow_tripdata_2021-06.parquet'group by passenger_count order by 1
> ;
┌─────────────────┬────────────────────┐
│ passenger_count │ sum(trip_distance) │
│      int32      │       double       │
├─────────────────┼────────────────────┤
│               0 │  172554.1099999999 │
│               1 │  5797179.629999994 │
│               2 │ 1341309.7100000044 │
│               3 │ 343928.15000000084 │
│               4 │ 134748.30999999997 │
│               5 │ 204493.66000000027 │
│               6 │ 139893.91000000006 │
│               7 │              33.44 │
│               8 │               9.17 │
│               9 │                0.0 │
│                 │ 11517949.330000013 │
├─────────────────┴────────────────────┤
│ 11 rows                    2 columns │
└──────────────────────────────────────┘
Run Time (s): real 0.079 user 0.296402 sys 0.140401

Support for R

Is there any support for R (officially or unofficially)? I do most of my data analysis projects in R and would like to update the .hyper file feeding my Tableau dashboard directly from R.

If not, is there any documentation I should reference should I want to make my own R package to help with this?

Thank you!

'Inserter is already closed' error when multiprocessing is used upstream

When I used the function below in my Tableau data pipeline, which writes a pandas data frame to a hyper extract, I received "Inserter is already closed" when writing to the hyper extract. Please note, I didn't try to use multiprocessing to insert data into the hyper extract; I was only using it in pure pandas operation functions upstream, but the insertion failed.

def parallelize_dataframe(df, func, n_cores=4):
    df_split = np.array_split(df, n_cores)
    pool = Pool(n_cores)
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    return df 
def convert_df_to_hyper_file(filename, df, path_to_hyper, columns):
    with HyperProcess(
            Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU,
            'Tabloader') as hyper:
        with Connection(endpoint=hyper.endpoint,
                        create_mode=CreateMode.CREATE_AND_REPLACE,
                        database=path_to_hyper) as connection:
            hyper_table = TableDefinition(
                filename,
                columns=columns
            )
            connection.catalog.create_table(hyper_table)

            with Inserter(connection, hyper_table) as inserter:
                for index, row in df.iterrows():
                    try:
                        inserter.add_row(row)
                    except Exception as e:
                        logging.error(f'an error occured at row {index}:'
                                      f'\n {e}')
                inserter.execute()

Copy From Parquet Feature

Hi,

Really love the copy from parquet feature for hyper API. Any idea when that would become an official feature (vs an experimental feature)?

Thanks

Could not interpret 'experimental_external_s3' as global setting: No internal setting named 'experimental_external_s3' exists

Hello, I'm trying to create .hyper file from S3 natively by following an example here.

and I'm getting following error:

Caused by:
The Hyper server process exited during startup with exit code: 1
	Command-line: "/opt/python/tableauhyperapi/bin/hyper/hyperd run --date-style=MDY --date-style-lenient=false --experimental-external-s3=true --init-user=tableau_internal_user --language=en_US --log-config= --log-dir=/tmp/ --no-password=true --skip-license=true --telemetry-opt-in= --listen-connection tab.domain:///tmp/domain/auto --callback-connection tab.domain:///tmp/domain/3b5f57d9d4bf42b38beca85975adcd08"
	Child process' stderr:
	Error(s) while applying config file or command line parameters.
	Error. Could not interpret 'experimental_external_s3' as global setting: No internal setting named 'experimental_external_s3' exists.
	Critical setting errors encountered, shutting down.

I'm running this in AWS Lambda, Runtime 3.9, x86_64.
I have built a custom layer with this command:
mkdir -pv /build/layer/ && pip install -t /build/layer/python tableauhyperapi>=0.0.14946 tableauserverclient>=0.19.0

Below is the code I'm running:

from tableauhyperapi import HyperProcess, Connection, Telemetry, CreateMode, SqlType, TableDefinition, TableName, Nullability, Inserter, escape_string_literal
import os

database_path = '/tmp/ls1_kpi_15min.hyper'
CURRENT_DATASET = escape_string_literal("s3://mybucket/ymd=20230201/581b0d96eb7a46508ca22dca0702sdf9530.snappy.parquet")

def lambda_handler(event, context):
    list_files = os.listdir('/opt/python/tableauhyperapi/bin/hyper')
    print(list_files)
    
    # Start up a local Hyper process. 
    with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU, parameters={'log_config': '', 'log_dir':'/tmp/', "experimental_external_s3": "true"}) as hyper:

        with Connection(endpoint=hyper.endpoint, database=database_path, create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
            table_name=TableName("ls1", "kpi_15min")
            
            cmd = f"CREATE TABLE {table_name}" \
                  f" AS ( SELECT * FROM EXTERNAL(S3_LOCATION({CURRENT_DATASET} FORMAT => 'parquet')))"
            connection.execute_command(cmd)
            
            row_count = connection.execute_scalar_query(f"SELECT COUNT(*) FROM {table_name}")
            print (f"Loaded {row_count} rows")

Any idea why this job is failing to interpret 'experimental_external_s3' as global setting?
