greenplum-db / pxf

Platform Extension Framework: Federated Query Engine

Home Page: https://greenplum.docs.pivotal.io/pxf/latest/using/overview_pxf.html

License: Apache License 2.0

pxf hadoop hdfs s3 query-federation greenplum hbase google-cloud-storage azure-blob-storage azure-data-lake



Introduction

PXF is an extensible framework that allows a distributed database like Greenplum to query external data sources whose metadata is not managed by the database. PXF includes built-in connectors for accessing data in HDFS files, Hive tables, HBase tables, JDBC-accessible databases, and more. Users can also create their own connectors to other data storage or processing engines.
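
For illustration only, here is a minimal shell sketch of what querying external data through PXF can look like, assuming a running Greenplum cluster with the pxf extension already created and a hypothetical comma-delimited HDFS file /tmp/example.txt that the default PXF server can read (the table name, file path, and database are assumptions, not taken from this repository):

# hypothetical example: expose a comma-delimited HDFS file to Greenplum through PXF
psql -d postgres -c "CREATE EXTERNAL TABLE pxf_example (id int, name text)
                     LOCATION ('pxf://tmp/example.txt?PROFILE=hdfs:text')
                     FORMAT 'TEXT' (delimiter ',');"

# query the external data as if it were a regular table
psql -d postgres -c "SELECT * FROM pxf_example;"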

Repository Contents

external-table/

Contains the Greenplum extension implementing an External Table protocol handler

fdw/

Contains the Greenplum extension implementing a Foreign Data Wrapper (FDW) for PXF

server/

Contains the server-side code of PXF, along with the PXF Service and all the Plugins

cli/

Contains command line interface code for PXF

automation/

Contains the automation and integration tests for PXF against the various data sources

singlecluster/

Hadoop testing environment to exercise the pxf automation tests

concourse/

Resources for PXF's Continuous Integration pipelines

regression/

Contains the end-to-end (integration) tests for PXF against the various data sources, utilizing the PostgreSQL testing framework pg_regress

downloads/

An empty directory that serves as a staging location for Greenplum RPMs for the development Docker image

PXF Development

Below are the steps to build and install PXF along with its dependencies, including Greenplum and Hadoop.

To start, ensure you have a ~/workspace directory and have cloned the pxf repository and its prerequisites (shown below) under it. (The name workspace is not strictly required but will be used throughout this guide.)

mkdir -p ~/workspace
cd ~/workspace

git clone https://github.com/greenplum-db/pxf.git

Alternatively, you may create a symlink to your existing repo folder.

ln -s ~/<git_repos_root> ~/workspace

Install Dependencies

To build PXF, you must have:

  1. GCC compiler, the make build system, the unzip package, and Maven for running integration tests (a combined install sketch for CentOS 7 follows this list)

  2. Installed Greenplum DB

    Either download and install the Greenplum RPM or build Greenplum from source by following the instructions in the GPDB README.

    Assuming you have installed Greenplum into the /usr/local/greenplum-db directory, run its environment script:

    source /usr/local/greenplum-db/greenplum_path.sh
    
  3. JDK 1.8 or JDK 11 to compile/run

    Export your JAVA_HOME:

    export JAVA_HOME=<PATH_TO_YOUR_JAVA_HOME>
    
  4. Go (1.9 or later)

    To install Go on CentOS, sudo yum install go. For other platforms, see the Go downloads page.

    Make sure to export your GOPATH and add go to your PATH. For example:

    export GOPATH=$HOME/go
    export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin

    For the new Apple M1 Macs, add the following to your PATH instead:

    export PATH=$PATH:/opt/homebrew/bin/go/bin:$GOPATH/bin
  5. cURL (7.29 or later):

    To install cURL devel package on CentOS 7, sudo yum install libcurl-devel.

    Note that CentOS 6 provides an older, unsupported version of cURL (7.19). You should install a newer version from source if you are on CentOS 6.
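
As a convenience, here is a combined sketch of installing the build dependencies listed above, assuming CentOS 7 (the package names are platform-specific assumptions, and maven may come from the EPEL repository):

# compiler, build tools, Maven, Go, and cURL development headers on CentOS 7
sudo yum install -y gcc make unzip maven go libcurl-devel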

How to Build PXF

PXF uses Makefiles to build its components. The PXF server component uses Gradle, which is wrapped by the Makefile for convenience.

cd ~/workspace/pxf

# Compile & Test PXF
make

# Only run unit tests
make test

How to Install PXF

To install PXF, first make sure that the user has sufficient permissions in the $GPHOME and $PXF_HOME directories to perform the installation. It's recommended to change ownership to match the installing user. For example, when installing PXF as user gpadmin under /usr/local/greenplum-db:

export GPHOME=/usr/local/greenplum-db
export PXF_HOME=/usr/local/pxf
export PXF_BASE=${HOME}/pxf-base
chown -R gpadmin:gpadmin "${GPHOME}" "${PXF_HOME}"
make -C ~/workspace/pxf install

NOTE: if PXF_BASE is not set, it will default to PXF_HOME, and server configurations, libraries, or other configuration files might get deleted after a PXF re-install.

How to Run PXF

Ensure that PXF is in your PATH. The following command can be added to your .bashrc:

export PATH=/usr/local/pxf/bin:$PATH
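
To persist this setting across shell sessions, a minimal sketch assuming bash and the default ~/.bashrc location:

echo 'export PATH=/usr/local/pxf/bin:$PATH' >> ~/.bashrc
source ~/.bashrc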

Then you can prepare and start up PXF by doing the following:

pxf prepare
pxf start

If ${HOME}/pxf-base does not exist, pxf prepare will create the directory for you. This command should only need to be run once.

Re-installing PXF after making changes

Note: Local development with PXF requires a running Greenplum cluster.

Once the desired changes have been made, there are 2 options to re-install PXF:

  1. Run make -sj4 install to re-install and run tests
  2. Run make -sj4 install-server to only re-install the PXF server without running unit tests.

After PXF has been re-installed, you can restart the PXF instance using:

pxf restart

How to demonstrate Hadoop Integration

In order to demonstrate end-to-end functionality you will need Hadoop installed. All the related Hadoop components (HDFS, Hive, HBase, ZooKeeper, etc.) are packaged into a simple artifact named singlecluster. You can download it from here and untar the singlecluster-HDP.tar.gz file, which contains everything needed to run Hadoop.

mv singlecluster-HDP.tar.gz ~/workspace/
cd ~/workspace
tar xzf singlecluster-HDP.tar.gz

Create a symlink using ln -s ~/workspace/singlecluster-HDP ~/workspace/singlecluster and then follow the steps in Setup Hadoop.

While PXF can run on either Java 8 or Java 11, please ensure that you are running Java 8 for HDFS, Hadoop, etc. Set your Java version by setting JAVA_HOME to the appropriate location.

On a Mac, you can set your java version using JAVA_HOME like so:

export JAVA_HOME=`/usr/libexec/java_home -v 1.8`
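
On Linux, a comparable sketch, assuming a JDK 8 installed under /usr/lib/jvm (the exact path is an assumption and varies by distribution and JDK build):

# path is an assumption; list /usr/lib/jvm to find your JDK 8 installation
export JAVA_HOME=/usr/lib/jvm/java-1.8.0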

Initialize the default server configurations:

cp ${PXF_HOME}/templates/*-site.xml ${PXF_BASE}/servers/default

Development With Docker

NOTE: Since the Docker container will house the singlecluster Hadoop environment, Greenplum, and PXF, we recommend that you have at least 4 CPUs and 6GB of memory allocated to Docker. These settings are available under Docker preferences.

The quick and easy way is to download the GPDB 6.6 RPM from GitHub and move it into the downloads/ folder. Then run ./dev/start.bash to get a Docker container with a running GPDB6, a Hadoop cluster, and an installed PXF.
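
For example (the RPM filename below is hypothetical; substitute the actual name of the file you downloaded):

# stage the downloaded Greenplum RPM in the repository's downloads/ directory
mv ~/Downloads/greenplum-db-6.6.0-rhel7-x86_64.rpm ~/workspace/pxf/downloads/

# build and start the development container with GPDB6, Hadoop, and PXF
cd ~/workspace/pxf
./dev/start.bash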

If you would like more control over the GPDB installation, you can use the steps below.

# Get the latest centos7 image for GPDB6
docker pull gcr.io/$PROJECT_ID/gpdb-pxf-dev/gpdb6-centos7-test-pxf:latest

# If you want to use gdb to debug gpdb you need the --privileged flag in the command below
docker run --rm -it \
  -p 5432:5432 \
  -p 5888:5888 \
  -p 8000:8000 \
  -p 5005:5005 \
  -p 8020:8020 \
  -p 9000:9000 \
  -p 9090:9090 \
  -p 50070:50070 \
  -w /home/gpadmin/workspace \
  -v ~/workspace/gpdb:/home/gpadmin/workspace/gpdb \
  -v ~/workspace/pxf:/home/gpadmin/workspace/pxf \
  -v ~/workspace/singlecluster-HDP:/home/gpadmin/workspace/singlecluster \
  gcr.io/$PROJECT_ID/gpdb-pxf-dev/gpdb6-centos7-test-pxf:latest /bin/bash -c \
  "/home/gpadmin/workspace/pxf/dev/set_up_gpadmin_user.bash && /usr/sbin/sshd && su - gpadmin"

Setup GPDB in the Docker image

Configure, build, and install GPDB. This is needed only the first time you use the container with the GPDB source.

~/workspace/pxf/dev/build_gpdb.bash
sudo mkdir /usr/local/greenplum-db-devel
sudo chown gpadmin:gpadmin /usr/local/greenplum-db-devel
~/workspace/pxf/dev/install_gpdb.bash

For subsequent minor changes to GPDB source you can simply do the following:

~/workspace/pxf/dev/install_gpdb.bash

Alternatively, to run all of the instructions below and the GROUP=smoke tests in one script:

~/workspace/pxf/dev/smoke_shortcut.sh

Create Greenplum Cluster

source /usr/local/greenplum-db-devel/greenplum_path.sh
make -C ~/workspace/gpdb create-demo-cluster
source ~/workspace/gpdb/gpAux/gpdemo/gpdemo-env.sh

Setup Hadoop

HDFS is needed to demonstrate functionality. You can choose to start additional Hadoop components (Hive/HBase) if you need them.

Set up user impersonation prior to starting the Hadoop components (this allows the gpadmin user to access Hadoop data):

~/workspace/pxf/dev/configure_singlecluster.bash

Setup and start HDFS

pushd ~/workspace/singlecluster/bin
echo y | ./init-gphd.sh
./start-hdfs.sh
popd

Start other optional components based on your needs:

pushd ~/workspace/singlecluster/bin
# Start Hive
./start-yarn.sh
./start-hive.sh

# Start HBase
./start-zookeeper.sh
./start-hbase.sh
popd

Setup Minio (optional)

Minio is an S3-API compatible local storage solution. The development docker image comes with Minio software pre-installed. To start the Minio server, run the following script:

source ~/workspace/pxf/dev/start_minio.bash

After the server starts, you can access Minio UI at http://localhost:9000 from the host OS. Use admin for the access key and password for the secret key when connecting to your local Minio instance.

The script also sets PROTOCOL=minio so that the automation framework will use the local Minio server when running S3 automation tests. If you would later like to run Hadoop HDFS tests, unset this variable with the unset PROTOCOL command.

Setup PXF

Install PXF Server

# Install PXF
make -C ~/workspace/pxf install

# Start PXF
export PXF_JVM_OPTS="-Xmx512m -Xms256m"
$PXF_HOME/bin/pxf start

Install the PXF client extension (skip if this is already done):

psql -d template1 -c "create extension pxf"

Run PXF Tests

All tests use a database named pxfautomation.
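
If the pxfautomation database does not already exist in your environment, you can create it manually with the standard createdb utility (a hedged sketch; the automation framework may also create it for you):

createdb pxfautomation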

pushd ~/workspace/pxf/automation

# Initialize default server configs using template
cp ${PXF_HOME}/templates/{hdfs,mapred,yarn,core,hbase,hive}-site.xml ${PXF_BASE}/servers/default

# Run specific tests. Example: Hdfs Smoke Test
make TEST=HdfsSmokeTest

# Run all tests. This will be very time consuming.
make GROUP=gpdb

# If you wish to run test(s) against a different storage protocol, set the following variable (e.g., s3)
export PROTOCOL=s3
popd

If you see any HBase failures, try copying pxf-hbase-*.jar to the HBase classpath, and restart HBase:

cp ${PXF_HOME}/lib/pxf-hbase-*.jar ~/workspace/singlecluster/hbase/lib/pxf-hbase.jar
~/workspace/singlecluster/bin/stop-hbase.sh
~/workspace/singlecluster/bin/start-hbase.sh

Make Changes to PXF

To deploy your changes to PXF in the development environment:

# $PXF_HOME folder is replaced each time you make install.
# So, if you have any config changes, you may want to back those up.
$PXF_HOME/bin/pxf stop
make -C ~/workspace/pxf install
# Make any config changes you had backed up previously
rm -rf $PXF_HOME/pxf-service
yes | $PXF_HOME/bin/pxf init
$PXF_HOME/bin/pxf start

IDE Setup (IntelliJ)

  • Start IntelliJ. Click "Open" and select the directory to which you cloned the pxf repo.
  • Select File > Project Structure.
  • Make sure you have a JDK (version 1.8) selected.
  • In the Project Settings > Modules section, select Import Module, pick the pxf/server directory and import as a Gradle module. You may see an error saying that there's no JDK set for Gradle. Just cancel and retry. It goes away the second time.
  • Import a second module, this time giving the pxf/automation directory; select "Import module from external model", pick Maven, then click Finish.
  • Restart IntelliJ
  • Check that it worked by running a unit test (cannot currently run automation tests from IntelliJ) and making sure that imports, variables, and auto-completion function in the two modules.
  • Optionally you can replace ${PXF_TMP_DIR} with ${GPHOME}/pxf/tmp in automation/pom.xml
  • Select Tools > Create Command-line Launcher... to enable starting Intellij with the idea command, e.g. cd ~/workspace/pxf && idea ..

Debugging the locally running instance of PXF server using IntelliJ

  • In IntelliJ, click Edit Configuration and add a new one of type Remote
  • Change the name to PXF Service Boot
  • Change the port number to 2020
  • Save the configuration
  • Restart PXF in DEBUG mode: PXF_DEBUG=true pxf restart
  • Debug the new configuration in IntelliJ
  • Run a query in GPDB that uses PXF to debug with IntelliJ

To run a Kerberized Hadoop Cluster

  • See instructions in the dev folder for spinning up a kerberized Dataproc cluster in GCP.

pxf's People

Contributors

avocader axue-broadcom benchristel bradfordb-vmware danielgustafsson denalex dependabot[bot] divyabhargov djianwen-vmware edespino frankgh hlinnaka hornn jiadexin kavinderd lchx1010 lucasbonner m7onov mkiyama mperezfuster oliverralbertini outofmem0ry radarwave raymondyin romaze rvs shivzone tatb-vmware tumuguskun yimingli-vmware


pxf's Issues

ESCAPE 'OFF' is not processed correctly on PXF side

GP 6.4, PXF 5.10.1.
We can create a valid external table with ESCAPE 'OFF' clause:

CREATE EXTERNAL TABLE test (test text)
LOCATION ('pxf://folder?PROFILE=hdfs:text')
FORMAT 'TEXT' ( ESCAPE 'OFF');

The table is created successfully, but when selecting from it we get the following error:

ERROR:  remote component error (500) from '127.0.0.1:5888':  Type  Exception Report   Message  invalid ESCAPE character 'OFF'. Only single character is allowed for ESCAPE.   Description  The server encountered an unexpected condition that prevented it from fulfilling the request.   Exception   java.lang.IllegalArgumentException: invalid ESCAPE character 'OFF'. Only single character is allowed for ESCAPE. (libchurl.c:920)  (seg0 slice1 10.92.8.11:10000 pid=23082) (libchurl.c:920)
CONTEXT:  External table test, line 1 of file pxf://folder?PROFILE=hdfs:text

It seems that PXF should process ESCAPE 'OFF' correctly and ignore all escape symbols.

ERROR: invalid byte sequence for encoding "UTF8"

Version: 5.1.6

We got an error when accessing a text file on S3.

ERROR: invalid byte sequence for encoding "UTF8": 0x81 (seg1 slice1 1.2.3.4:6001 pid=3023)
DETAIL: External table my_table, line 1 of pxf://my_bucket/my_file?PROFILE=s3:text&SERVER=s3ssp&COMPRESSION_TYPE=BLOCK: "2019-04-02 20:43:29 0000 0000 0000 https://url domain.com https://url https://url?param1=z:\MS_Att..."
SQL state: 22021

We tracked it down to the following data row (tab-separated):

2019-04-02 20:43:29 0000 0000 0000 https://url domain.com https://url https://url?param1=z:\MS_AttachZ2\Picture\R2583000\2583419\2\2019-04-02_11.45.42.jpg&param2=https://url 594 0 Edge 42.17134.1.0 674904952 2019-04-02

It seems that those backslashes caused the error.

External table

CREATE EXTERNAL TABLE public.my_table(
    col1 text,
    col2 text,
    col3 text,
    col4 text,
    col5 text,
    col6 text,
    col7 text,
    col8 text,
    col9 text,
    col10 text,
    col11 integer,
    col12 text,
    col13 text,
    col14 text,
    col15 text,
    col16 text,
    col17 text,
    col18 text,
    col19 text,
    col20 text,
    col21 text,
    col22 text,
    col23 text,
    col24 text)
LOCATION (
    'pxf://my_bucket/my_file?PROFILE=s3:text&SERVER=s3ssp&COMPRESSION_TYPE=BLOCK')
ON ALL
FORMAT 'text' (delimiter E'\t')
ENCODING 'UTF8'
;

Thanks

insert into pxf_table select * from db2_table

I use PXF to pull data from DB2 into Greenplum. The Greenplum version is 5.14, the DB2 version is v11.1, and the PXF version is 4.0.3.

When the DB2 column type is int or bigint and the column is null in the DB2 table, then after executing insert into pxf_table select * from db2_table, the null data is automatically converted to 0.

I would appreciate your help, thanks!

Querying an external database through JDBC without a pre-saved query

Hi! Thank you for PXF! It helps!

According to the Greenplum JDBC PXF manual, there is the ability to create pre-saved queries and call them via pxf://query:<query_name>.

But I need to use the external database with different types of queries. In short, I need to change the query to the external database every time. And I can't use pushdown SQL syntax for it (like select columns from external_tbl where ...) because the database does not have SQL syntax. All I can do is run select * from ext_table and set the external query in the definition of ext_table.

@frankgh @denalex Could you please advise on the best way to do this? Perhaps there is a way to pass the query to PXF through the external table connection string?

Thanks!

Error when PXF connects to Hive

When Greenplum reads Hive data through PXF, the following occurs:
hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
The other configuration is correct, the Hive version is 1.2.1, and pxf-hive is compiled against hive-1.2.1. However, it still reports a version error.

Add isolation level to JDBC Server configuration file

Databases like DB2 and SQL Server use locking to provide read consistency, so long-running queries can block other sessions from updating a table. To prevent this, many users of these databases will use dirty reads to prevent blocking locks. This is achieved in JDBC by setting the transaction isolation level to read uncommitted.

Example:

int transactionIsolationLevel = 0;

if (readCommitted) 
  transactionIsolationLevel = Connection.TRANSACTION_READ_COMMITTED;
else
  transactionIsolationLevel = Connection.TRANSACTION_READ_UNCOMMITTED;

conn.setTransactionIsolation(transactionIsolationLevel);

So, add a property to the JDBC server configuration file so that the read isolation level can be set. The default should be read committed.

insert char into inner table from pxf

I link to MySQL through PXF and write data into a GP internal table. There is a field of char type. For example:
`
inner table:
CREATE TABLE inner_schema.tb(
id bigint ,
name char(5));

EXTERNAL table:
CREATE EXTERNAL TABLE ext_schema.tb(
id bigint ,
name char(5))
LOCATION ('pxf://tb?PROFILE=Jdbc&SERVER=jdbc_mysql&QUOTE_COLUMNS=true')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

INSERT INTO inner_schema.tb select * from ext_schema.tb;
`
The name's value in MySQL is '1', and I expect the same in GP, but it is '1    ' in GP; there are 4 spaces after the 1.

pxf cluster init Segmentation fault

[gpadmin@sdw3 ~]$ pxf cluster init
Segmentation fault
[gpadmin@sdw3 ~]$ pxf version
PXF version 5.15.1

How do I fix this problem? Let me know if there is anything else I need to supply. Thanks.

Can't read data from hdfs when the erasure coding policy is specified

Greenplum version or build

I installed gpdb from the 6.2.1 RPM.

postgres=# select version();
                                                                                               version                                                                                                
------------------------------
 PostgreSQL 9.4.24 (Greenplum Database 6.2.1 build commit:d90ac1a1b983b913b3950430d4d9e47ee8827fd4) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 12 2019 18:35:48
(1 row)

pxf version

$ pxf --version
PXF version 5.10.0

OS version and uname -a

Linux 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Installation information ( pg_config )

$ pg_config
BINDIR = /usr/local/greenplum-db-6.2.1/bin
DOCDIR = /usr/local/greenplum-db-6.2.1/share/doc/postgresql
HTMLDIR = /usr/local/greenplum-db-6.2.1/share/doc/postgresql
INCLUDEDIR = /usr/local/greenplum-db-6.2.1/include
PKGINCLUDEDIR = /usr/local/greenplum-db-6.2.1/include/postgresql
INCLUDEDIR-SERVER = /usr/local/greenplum-db-6.2.1/include/postgresql/server
LIBDIR = /usr/local/greenplum-db-6.2.1/lib
PKGLIBDIR = /usr/local/greenplum-db-6.2.1/lib/postgresql
LOCALEDIR = /usr/local/greenplum-db-6.2.1/share/locale
MANDIR = /usr/local/greenplum-db-6.2.1/man
SHAREDIR = /usr/local/greenplum-db-6.2.1/share/postgresql
SYSCONFDIR = /usr/local/greenplum-db-6.2.1/etc/postgresql
PGXS = /usr/local/greenplum-db-6.2.1/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-quicklz' '--enable-gpperfmon' '--with-gssapi' '--enable-mapreduce' '--enable-orafce' '--enable-orca' '--with-libxml' '--with-pgport=5432' '--disable-debug-extensions' '--disable-tap-tests' '--with-perl' '--with-python' '--with-includes=/tmp/build/f8c7ee08/gpdb_src/gpAux/ext/rhel7_x86_64/include /tmp/build/f8c7ee08/gpdb_src/gpAux/ext/rhel7_x86_64/include/libxml2' '--with-libraries=/tmp/build/f8c7ee08/gpdb_src/gpAux/ext/rhel7_x86_64/lib' '--with-openssl' '--with-pam' '--with-ldap' '--prefix=/usr/local/greenplum-db-devel' '--mandir=/usr/local/greenplum-db-devel/man' 'CC=gcc -m64' 'CFLAGS=-m64 -O3 -fargument-noalias-global -fno-omit-frame-pointer -g'
CC = gcc -m64
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2 -I/tmp/build/f8c7ee08/gpdb_src/gpAux/ext/rhel7_x86_64/include -I/usr/local/greenplum-db-6.2.1/include
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -fno-aggressive-loop-optimizations -Wno-unused-but-set-variable -Wno-address -m64 -O3 -fargument-noalias-global -fno-omit-frame-pointer -g -std=gnu99 -Werror=uninitialized -Werror=implicit-function-declaration -I/usr/local/greenplum-db-6.2.1/include
CFLAGS_SL = -fPIC
LDFLAGS = -L/tmp/build/f8c7ee08/gpdb_src/gpAux/ext/rhel7_x86_64/lib -Wl,--as-needed -Wl,-rpath,'/usr/local/greenplum-db-devel/lib',--enable-new-dtags -L/usr/local/greenplum-db-6.2.1/lib
LDFLAGS_EX = 
LDFLAGS_SL = 
LIBS = -lpgcommon -lpgport -lgpopt -lnaucrates -lgpdbcost -lgpos -lxerces-c -lxml2 -lpam -lrt -lyaml -lgssapi_krb5 -lquicklz -lzstd -lrt -lcrypt -ldl -lm -L/usr/local/greenplum-db-6.2.1/lib
VERSION = PostgreSQL 9.4.24

hadoop version

hadoop-3.1.0

Expected behavior

Reading HDFS text data by pxf external table.

Actual behavior

I can read HDFS data via a PXF external table when the erasure coding policy is unspecified, but cannot read it when the erasure coding policy is specified.

reading from hdfs is ok when erasure coding policy is unspecified

hdfs file.

$hdfs ec -getPolicy -path hdfs://tmp/part-05998
The erasure coding policy of hdfs://tmp/part-05998 is unspecified

pxf reading test.

postgres=# CREATE EXTERNAL TABLE public.pxf_example (
    offsetid bigint,
    tdid text,
    monthid integer,
    app_install bigint[]
) LOCATION (
    'pxf://tmp/part-05998?PROFILE=hdfs:text'
) ON ALL 
FORMAT 'text' (delimiter E'\t' null E'' escape E'\\')
ENCODING 'UTF8'
SEGMENT REJECT LIMIT 1 PERCENT;
CREATE EXTERNAL TABLE

postgres=# select offsetid,tdid,monthid from  pxf_example limit 100;
  offsetid   |               tdid                | monthid 
-------------+-----------------------------------+---------
  9830096987 | 3ab70a0bebbf2f599aa1fc5c66baf6705 |  201910
  4668257082 | 31fe2d65a078ba0e496c474d704da2603 |  201910
  4702428099 | 31c74ce394e56f12560e769d9a2e95594 |  201910
  7521396273 | 30fa188c2155852adddcfd51ed1be30c9 |  201910
  7478403144 | 3e6f27f147cbe2ce18c022343adeb1225 |  201910
  5942421014 | 3cbbccdc5503f0e6bee0cfb18f5d39e63 |  201910
  9052621959 | 366a3a33123900d9aabd863982cdbfbac |  201910
  9806218394 | 3054d1ef923047a6b3a695497e2f9ad18 |  201910
  9447557062 | 339a43c09826de999e4a1759c0550e32a |  201910
 10078571309 | 385f1116c88900d490ce9b148ee250fa5 |  201910

Reading from HDFS fails when the erasure coding policy is specified

hdfs file (specified erasure coding policy to RS-10-4-1024k).

$hdfs ec -getPolicy -path /spark_logs/part-05998
RS-10-4-1024k

Reading the file with the HDFS client works:

$hdfs dfs -cat /spark_logs/part-05998 | more
2020-01-02 11:30:41,876 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
9063519718      3c6cc505cdb0c947bfe9bda367d648640       201910  {-5061992991064318030,-6370420019429456499,3317077434980644433,
0602584}
10097449320     3c16b484c845f230b007ec1d8b57f3b4b       201910  {4345025949894747099,-6753995990058415489}

pxf reading failed.

postgres=# CREATE EXTERNAL TABLE public.pxf_example_ec (
    offsetid bigint,
    tdid text,
    monthid integer,
    app_install bigint[]
) LOCATION (
    'pxf://spark_logs/part-05998?PROFILE=hdfs:text'
) ON ALL 
FORMAT 'text' (delimiter E'\t' null E'' escape E'\\')
ENCODING 'UTF8'
SEGMENT REJECT LIMIT 1 PERCENT;
CREATE EXTERNAL TABLE

postgres=# select offsetid,tdid,monthid from  pxf_example_ec limit 100;
ERROR:  remote component error (500) from '127.0.0.1:5888':  Type  Exception Report   Message  Could not obtain block: BP-2067671923-172.8.9.1-1530169621728:blk_-9223372016158418064_1669579651 file=/spark_logs/part-05998   Description  The server encountered an unexpected condition that prevented it from fulfilling the request.   Exception   java.io.IOException: Could not obtain block: BP-2067671923-172.8.9.1-1530169621728:blk_-9223372016158418064_1669579651 file=/spark_logs/part-05998 (libchurl.c:920)  (seg41 slice1 172.xx.xx.xx:24001 pid=149842) (libchurl.c:920)
CONTEXT:  External table pxf_example_ec, line 1 of file pxf://spark_logs/part-05998?PROFILE=hdfs:text

pxf instance log

Jan 02, 2020 11:34:03 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [PXF REST Service] in context with path [/pxf] threw exception
java.io.IOException: Could not obtain block: BP-2067671923-172.x.x.x-1530169621728:blk_-9223372016158418064_1669579651 file=/spark_logs/part-05998
        at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:147)
        at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
        at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
        at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:146)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
        at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:444)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:445)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1137)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-2067671923-172.x.x.x-1530169621728:blk_-9223372016158418064_1669579651 file=/spark_logs/part-05998
        at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1084)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1068)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1047)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:655)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:949)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1004)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.greenplum.pxf.plugins.hdfs.ChunkReader.readChunk(ChunkReader.java:107)
        at org.greenplum.pxf.plugins.hdfs.ChunkRecordReader.next(ChunkRecordReader.java:210)
        at org.greenplum.pxf.plugins.hdfs.ChunkRecordReader.next(ChunkRecordReader.java:56)
        at org.greenplum.pxf.plugins.hdfs.HdfsSplittableDataAccessor.readNextObject(HdfsSplittableDataAccessor.java:132)
        at org.greenplum.pxf.service.bridge.ReadBridge.getNext(ReadBridge.java:94)
        at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:138)
        ... 37 more

Jan 02, 2020 11:34:10 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [PXF REST Service] in context with path [/pxf] threw exception
java.io.IOException: Could not obtain block: BP-2067671923-172.x.x.x-1530169621728:blk_-9223372016158418064_1669579651 file=/spark_logs/part-05998
        at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:147)
        at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
        at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
        at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:146)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
        at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:444)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:445)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1137)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-2067671923-172.x.x.x-1530169621728:blk_-9223372016158418064_1669579651 file=/spark_logs/part-05998
        at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1084)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1068)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1047)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:655)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:949)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1004)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.greenplum.pxf.plugins.hdfs.ChunkReader.readLine(ChunkReader.java:155)
        at org.greenplum.pxf.plugins.hdfs.ChunkRecordReader.<init>(ChunkRecordReader.java:146)
        at org.greenplum.pxf.plugins.hdfs.LineBreakAccessor.getReader(LineBreakAccessor.java:71)
        at org.greenplum.pxf.plugins.hdfs.HdfsSplittableDataAccessor.getNextSplit(HdfsSplittableDataAccessor.java:119)
        at org.greenplum.pxf.plugins.hdfs.HdfsSplittableDataAccessor.openForRead(HdfsSplittableDataAccessor.java:88)
        at org.greenplum.pxf.service.bridge.ReadBridge.beginIteration(ReadBridge.java:72)
        at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:131)
        ... 37 more

The kernel.sem parameter should be set when running singlecluster inside a Docker container

I got the following error when initializing gpdb:

2019-11-07 06:46:11.864957 UTC,,,p15195,th-603387840,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create semaphores: No space left on device (pg_sema.c:126)","Failed system call was semget(118, 17, 03600).","This error does *not* mean that you have run out of disk space.  It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded.  You need to raise the respective kernel parameter.  Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
The PostgreSQL documentation contains more information about configuring your system for PostgreSQL.",,,,,,"InternalIpcSemaphoreCreate","pg_sema.c",126,1    0x9c8b90 postgres errstart (elog.c:555)
2    0x7d0a98 postgres PGSemaphoreCreate (pg_sema.c:113)
3    0x847859 postgres InitProcGlobal (proc.c:262)
4    0x8343f5 postgres CreateSharedMemoryAndSemaphores (ipci.c:299)
5    0x9e04e9 postgres BaseInit (postinit.c:564)
6    0x56b0e9 postgres AuxiliaryProcessMain (bootstrap.c:363)
7    0x49debc postgres main (main.c:236)
8    0x7f9fda309495 libc.so.6 __libc_start_main + 0xf5
9    0x49df6d postgres <symbol not found> + 0x49df6d

So I passed the kernel parameter kernel.sem using --sysctl. The value was taken from
https://gpdb.docs.pivotal.io/6-1/install_guide/prep_os.html

docker run -it \
  -p 5432:5432 \
  -p 5888:5888 \
  -p 8000:8000 \
  -p 5005:5005 \
  -p 8020:8020 \
  -p 9000:9000 \
  -p 9090:9090 \
  -p 50070:50070 \
  -w /home/gpadmin/workspace \
  -v ~/workspace/gpdb:/home/gpadmin/workspace/gpdb \
  -v ~/workspace/pxf:/home/gpadmin/workspace/pxf \
  -v ~/workspace/singlecluster-HDP:/home/gpadmin/workspace/singlecluster \
  --sysctl kernel.sem="500 2048000 200 40960" \
  pivotaldata/gpdb-pxf-dev:centos7 /bin/bash -c \
  "/home/gpadmin/workspace/pxf/dev/set_up_gpadmin_user.bash && /usr/sbin/sshd && su - gpadmin"

PXF support for reading multiple files

Does PXF support wildcard patterns to load data from S3 (or S3-like) storage?
When I use the methods below, all of them encounter errors.

  1. LOCATION ('pxf://BUCKET_NAME/dirname/?PROFILE=s3:text&COMPRESSION_CODEC=gzip')
    FORMAT 'TEXT' (delimiter=E',');

  2. LOCATION ('pxf://BUCKET_NAME/dirname/*?PROFILE=s3:text&COMPRESSION_CODEC=gzip')
    FORMAT 'TEXT' (delimiter=E',');

  3. LOCATION ('pxf://BUCKET_NAME/dirname/*.gz?PROFILE=s3:text&COMPRESSION_CODEC=gzip')
    FORMAT 'TEXT' (delimiter=E',');

As far as I know, gphdfs supports wildcard patterns to load multiple files. Am I using the wrong method, or is this really not supported?

PXF fails to write Avro data in case of NULL values in SMALLINT or BYTEA columns.

SQL Error [XX000]: ERROR: remote component error (500) from '127.0.0.1:5888': Type Exception Report Description The server encountered an unexpected condition that prevented it from fulfilling the request. Exception java.lang.NullPointerException (libchurl.c:935)

Steps to reproduce:

For SMALLINT:

CREATE WRITABLE EXTERNAL TABLE test_smallint (
    col smallint
    )
    LOCATION('pxf://path/to/external/storage?PROFILE=(s3|hdfs):avro&SERVER=default')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');

INSERT INTO test_smallint VALUES(null);

For BYTEA:

CREATE WRITABLE EXTERNAL TABLE test_bytea (
    col bytea
    )
    LOCATION('pxf://path/to/external/storage?PROFILE=(s3|hdfs):avro&SERVER=default')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');

INSERT INTO test_bytea VALUES(null);

Cause:

NullPointerException when trying to convert null to primitives:

if (field.type == DataType.BYTEA.getOID()) {
    // Avro does not seem to understand regular byte arrays
    field.val = ByteBuffer.wrap((byte[]) field.val);
} else if (field.type == DataType.SMALLINT.getOID()) {
    // Avro doesn't have a short, just an int type
    field.val = (int) (short) field.val;
}

Greenplum PXF EXTERNAL READABLE TABLE using pre-defined query

As per the example listed for the aggregation query, I did exactly the same.
MySQL:

use demodb;
create table dept(
id int(4) not null primary key,
name varchar(20) not null
);

create table emp(
dept_id int(4) not null,
name varchar(20) not null,
salary int(8)
);

Some data is inserted into MySQL tables:

insert into dept values(1, 'sales');
insert into dept values(2, 'finance');
insert into dept values(3, 'it');

insert into emp values(1, 'alice', 11000);
insert into emp values(2, 'bob', 10000);
insert into emp values(3, 'charlie', 10500);

Then a complex aggregation query is created and placed in a file, say report.sql. The file needs to be placed in the server configuration directory under $PXF_CONF/servers/. So, let's assume we have created a mydb server configuration directory; then this file will be $PXF_CONF/servers/mydb/report.sql. The JDBC driver name and connection parameters should be configured in $PXF_CONF/servers/mydb/jdbc-site.xml for this server.

SELECT dept.name AS name, count(*) AS count, max(emp.salary) AS max
FROM demodb.dept JOIN demodb.emp
ON dept.id = emp.dept_id
GROUP BY dept.name;

A table in GPDB is created with the schema corresponding to the results returned by the aggregation query.

CREATE EXTERNAL TABLE dept_report (
name text,
count int,
max int
)
LOCATION (
'pxf://query:report?PROFILE=JDBC&SERVER=mydb'
)
FORMAT 'CUSTOM' (
FORMATTER='pxfwritable_import'
);

A query to a GPDB external table is made:

I'm getting this error message:

ERROR: remote component error (500) from '127.0.0.1:5888': type Exception report message

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ':report' at line 1 description The server encountered an internal error that prevented it from fulfilling this request. exception java.io.IOException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ':report' at line 1 (libchurl.c:944) (seg1 slice1 172.17.0.2:50001 pid=7495) (libchurl.c:944)

MYSQL Version = 5.1
MYSQL JDBC Connector = 5.1

SnappyCompressor has not been loaded!

PXF version 5.9.2

This is my external table:
postgres=# \d dmp_mi_read1
External table "test_table"
Column | Type | Modifiers
-----------+--------+-----------
zid | text |
mark | text |
timestamp | bigint |
ip | bigint |
Type: readable
Encoding: SQL_ASCII
Format type: custom
Format options: formatter 'pxfwritable_import'
External options: {}
External location: pxf://test/2019/11/14/00/00/part-r-00000?PROFILE=hdfs:SequenceFile&DATA-SCHEMA=xray.greenplum.DmpMiWritable
Execute on: all segments

When I execute:
select * from test_table;

an error occurs:

ERROR: remote component error (500) from '127.0.0.1:5888': Type Exception Report Message native snappy library not available: SnappyCompressor has not been loaded. Description The server encountered an unexpected condition that prevented it from fulfilling the request. Exception java.io.IOException: native snappy library not available: SnappyCompressor has not been loaded. (libchurl.c:920) (seg2 slice1 172.22.16.37:6004 pid=48492) (libchurl.c:920)
CONTEXT: External table test_table

This is the localhost log:

Nov 19, 2019 10:47:59 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [PXF REST Service] in context with path [/pxf] threw exception
java.io.IOException: native snappy library not available: SnappyCompressor has not been loaded.
at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:149)
at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:146)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:444)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:445)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1137)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:72)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:195)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:2007)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1893)
at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1842)
at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1856)
at org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49)
at org.greenplum.pxf.plugins.hdfs.SequenceFileAccessor.getReader(SequenceFileAccessor.java:76)
at org.greenplum.pxf.plugins.hdfs.HdfsSplittableDataAccessor.getNextSplit(HdfsSplittableDataAccessor.java:119)
at org.greenplum.pxf.plugins.hdfs.HdfsSplittableDataAccessor.openForRead(HdfsSplittableDataAccessor.java:88)
at org.greenplum.pxf.service.bridge.ReadBridge.beginIteration(ReadBridge.java:72)
at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:133)
... 37 more

Nov 19, 2019 10:55:26 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [PXF REST Service] in context with path [/pxf] threw exception
java.io.IOException: native snappy library not available: SnappyCompressor has not been loaded.
at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:149)
at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:146)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:444)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:445)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1137)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:72)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:195)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:2007)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1893)
at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1842)
at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1856)
at org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49)
at org.greenplum.pxf.plugins.hdfs.SequenceFileAccessor.getReader(SequenceFileAccessor.java:76)
at org.greenplum.pxf.plugins.hdfs.HdfsSplittableDataAccessor.getNextSplit(HdfsSplittableDataAccessor.java:119)
at org.greenplum.pxf.plugins.hdfs.HdfsSplittableDataAccessor.openForRead(HdfsSplittableDataAccessor.java:88)
at org.greenplum.pxf.service.bridge.ReadBridge.beginIteration(ReadBridge.java:72)
at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:133)
... 37 more


What's wrong with it?
Can anyone help me?

PXF when loading unescaped regex

I am having an issue in PXF 5.16.0 on Greenplum version 6.14.0 where, if I try to load a column's data containing an un-escaped regex, the column breaks into a new line at the characters \.. Minimal reproducible example:

parse.txt
account_id	banner	other
11999	www.butter.com	big words
11999	/^https?:\/\/([a-z0-9_\-\.]+\.)?abc\.net(:\d{1,5})?\//i.test(src)	jaffa kree
11999	www.google.com	text stuff
create external table staging.parsing_issue_ext
(
account_id INTEGER,
banner TEXT,
other TEXT
)
LOCATION ('pxf://<external_file_path>/parse.txt?PROFILE=gs:text&SERVER=gcs')
FORMAT 'TEXT' (NULL AS 'staq_no_value_found' ESCAPE 'OFF')
LOG ERRORS
SEGMENT REJECT LIMIT 2500;

The first and third rows load correctly, while the second breaks into a new column at each occurrence of "."

Ideally I'd like to keep the regex un-escaped because we process the data further down in our pipeline using those regexes. Is this intended behavior or is loading this type of data intentionally not supported?

Thanks.

Invalid DELIMITER character

Following instructions here to install PXF in docker: https://github.com/greenplum-db/pxf/blob/master/README.md
Docker image base: pivotaldata/gpdb-pxf-dev:centos6
PXF version: 5.10.0

I use psql to run the following:

template1=# \l
List of databases
Name | Owner | Encoding | Access privileges
--------------------+---------+----------+---------------------
contrib_regression | gpadmin | UTF8 |
gpadmin | gpadmin | UTF8 |
postgres | gpadmin | UTF8 |
template0 | gpadmin | UTF8 | =c/gpadmin
: gpadmin=CTc/gpadmin
template1 | gpadmin | UTF8 | =c/gpadmin
: gpadmin=CTc/gpadmin
(5 rows)
template1=# CREATE EXTERNAL TABLE my_table(a TEXT) LOCATION ('pxf://myfile?PROFILE=Hive') FORMAT 'TEXT';
CREATE EXTERNAL TABLE
template1=#
template1=# select * from my_table;
ERROR: remote component error (500) from '127.0.0.1:5888': Type Exception Report Message invalid DELIMITER character '%09'. Only single character is allowed for DELIMITER. Description The server encountered an unexpected condition that prevented it from fulfilling the request. Exception java.lang.IllegalArgumentException: invalid DELIMITER character '%09'. Only single character is allowed for DELIMITER. (libchurl.c:963) (seg0 slice1 172.26.0.3:25432 pid=11040) (cdbdisp.c:254)
DETAIL: External table my_table, file pxf://myfile?PROFILE=Hive
template1=#
template1=# CREATE EXTERNAL TABLE my_table_s3(a TEXT) LOCATION ('pxf://myfile?PROFILE=s3:text') FORMAT 'TEXT';
CREATE EXTERNAL TABLE
template1=#
template1=# select * from my_table_s3;
ERROR: remote component error (500) from '127.0.0.1:5888': Type Exception Report Message s3%3Atext is not defined in pxf-profiles.xml Description The server encountered an unexpected condition that prevented it from fulfilling the request. Exception org.greenplum.pxf.service.profile.ProfileConfException: s3%3Atext is not defined in pxf-profiles.xml (libchurl.c:963) (seg0 slice1 172.26.0.3:25432 pid=11052) (cdbdisp.c:254)
DETAIL: External table my_table_s3, file pxf://myfile?PROFILE=s3:text
template1=#

I tried setting a specific delimiter, but got the same behaviour.
In the two different errors, the cause seems to be the same: a problem with character encoding. For example, PXF has a profile named s3:text, but it is instead trying to find a profile named s3%3Atext, which it does not find.
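
For reference, a minimal sketch of what explicitly setting the delimiter looks like in the FORMAT clause (this is only an assumption about what was tried; it is not presented as a confirmed workaround for the %09 URL-encoding problem):

CREATE EXTERNAL TABLE my_table_tab(a TEXT)
LOCATION ('pxf://myfile?PROFILE=Hive')
FORMAT 'TEXT' (DELIMITER E'\t');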

In case it is useful, here are some logs from the PXF server:

Apr 27, 2020 11:44:00 PM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
org.greenplum.pxf.service.profile.ProfileConfException: s3%3Atext is not defined in pxf-profiles.xml
at org.greenplum.pxf.service.profile.ProfilesConf.getProfile(ProfilesConf.java:118)
at org.greenplum.pxf.service.profile.ProfilesConf.getPlugins(ProfilesConf.java:97)
at org.greenplum.pxf.service.HttpRequestParser.addProfilePlugins(HttpRequestParser.java:266)
at org.greenplum.pxf.service.HttpRequestParser.parseRequest(HttpRequestParser.java:84)
at org.greenplum.pxf.service.HttpRequestParser.parseRequest(HttpRequestParser.java:30)
at org.greenplum.pxf.service.rest.BaseResource.parseRequest(BaseResource.java:40)
at org.greenplum.pxf.service.rest.FragmenterResource.getFragments(FragmenterResource.java:110)

PXF MySQL schema wrapped by backquote "`" ERROR: remote component error (400) from '127.0.0.1:5888': HTTP status code is 400 but HTTP response string is empty (libchurl.c:935)

Greenplum version or build

6.7.1

OS version and uname -a

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

$ uname -a
Linux master 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

autoconf options used ( config.status --config )

Installation information ( pg_config )

Expected behavior

Return the correct query result, as for other MySQL tables whose names are not wrapped in backquotes.

Actual behavior

ERROR:  remote component error (400) from '127.0.0.1:5888': HTTP status code is 400 but HTTP response string is empty (libchurl.c:935)

Step to reproduce the behavior

mysql:

CREATE DATABASE `test-database`;
use `test-database`;
CREATE TABLE `test-table` (
    `test-id` INT PRIMARY KEY AUTO_INCREMENT,
    `test-name` VARCHAR(64)
);
INSERT INTO `test-table`(`test-name`) VALUES('Bob');

greenplum:

CREATE EXTERNAL TABLE "pxf_schema"."test-database.test-table" (
    "test-id" BIGINT,
    "test-name" TEXT
)
LOCATION ('pxf://`test-database`.`test-table`?PROFILE=Jdbc&SERVER=myqsl_server_name&QUOTE_COLUMNS=true')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import')
;
SELECT * FROM "pxf_schema"."test-database.test-table" LIMIT 1;
ERROR:  remote component error (400) from '127.0.0.1:5888': HTTP status code is 400 but HTTP response string is empty (libchurl.c:935)  (seg0 slice1 192.168.237.147:6000 pid=1805) (libchurl.c:935)
CONTEXT:  External table ...

PXF segment DEBUG2 log:

"DEBUG2","00000","churl http header: cell #29: X-GP-URI: pxf%3A%2F%2F%60test-database%60.%60test-table%60%3FPROFILE%3DJdbc%26SERVER%3Dmyqsl_server_name%26QUOTE_COLUMNS%3Dtrue

"libchurl.c",935,"Stack trace:
1    0xbe7c3c postgres errstart (elog.c:557)
2    0xbeac38 postgres elog_finish (elog.c:1729)
3    0x7f4e691d8cec pxf.so churl_read_check_connectivity + 0x2bc
4    0x7f4e691da7f6 pxf.so <symbol not found> + 0x691da7f6
5    0x7f4e691dae2e pxf.so get_fragments + 0xfe
6    0x7f4e691d6793 pxf.so pxfprotocol_import + 0x123
7    0x75bdbd postgres <symbol not found> (discriminator 4)
8    0x75bffc postgres url_custom_fread (url_custom.c:135)
9    0x751a35 postgres <symbol not found> (fileam.c:1412)
10   0x754093 postgres external_getnext (fileam.c:1119)
11   0x8f72a1 postgres <symbol not found> (nodeExternalscan.c:148)
12   0x8ba128 postgres ExecProcNode (execProcnode.c:1009)
13   0x8ddef8 postgres ExecLimit (tuptable.h:159)
14   0x8b9f98 postgres ExecProcNode (execProcnode.c:1117)
15   0x8f8d60 postgres ExecMotion (tuptable.h:159)
16   0x8b9f88 postgres ExecProcNode (execProcnode.c:1121)
17   0x8b14a9 postgres <symbol not found> (tuptable.h:159)
18   0x8b21ac postgres standard_ExecutorRun (execMain.c:2940)
19   0xa83a57 postgres <symbol not found> (pquery.c:1152)
20   0xa85a41 postgres PortalRun (pquery.c:999)
21   0xa7de1e postgres <symbol not found> (postgres.c:1377)
22   0xa82a58 postgres PostgresMain (postgres.c:5403)
23   0x6ad8b8 postgres <symbol not found> (postmaster.c:4462)
24   0xa09b62 postgres PostmasterMain (postmaster.c:1513)
25   0x6b1141 postgres main (main.c:205)
26   0x7f4e84760505 libc.so.6 __libc_start_main + 0xf5
27   0x6bcd4c postgres <symbol not found> + 0x6bcd4c
"

PXF Hive: filtering does not work if partition column is string data type

Hello,
it seems that filtering on a partition column in the PXF Hive profile is not applied correctly when the partition column is of string type. I tested several cases in the same environment: partition filtering always works for partition columns of integer types, but it does not work if the partition column is a string.

PXF version: 5.12.0
GPDB version: 6.7.1

How I tested - I created two identical tables:
a) the partition column that filtering is applied on is of integer type
Partition columns:
year_month int
platform string

Query: SELECT * FROM adhoc.table WHERE year_month = 202004;

PXF log:

2020-05-20 08:56:05.0671 DEBUG tomcat-http--15 org.greenplum.pxf.plugins.hive.HiveMetaStoreClientCompatibility1xx - Attempting to fallback
2020-05-20 08:56:05.0955 DEBUG tomcat-http--15 org.greenplum.pxf.plugins.hive.HiveClientWrapper - Item: dw.table, type: EXTERNAL_TABLE
2020-05-20 08:56:05.0956 DEBUG tomcat-http--15 org.greenplum.pxf.plugins.hive.HiveClientWrapper - Hive table: 12 fields. 2 partitions.
2020-05-20 08:56:05.0956 DEBUG tomcat-http--15 org.greenplum.pxf.plugins.hive.HiveDataFragmenter - setPartitions: [platform, year_month]
2020-05-20 08:56:05.0961 DEBUG tomcat-http--15 org.greenplum.pxf.plugins.hive.HiveDataFragmenter - Filter String for Hive partition retrieval : year_month = "202004"
2020-05-20 08:56:05.0986 DEBUG tomcat-http--3 org.greenplum.pxf.service.rest.BridgeResource - Starting streaming fragment 0 of resource hdfs://nameservice1/projects/dw/table/year_month=201707/platform=windows/part-05750-42764e67-c785-4ee3-8984-484a964af736.c000.gz.parquet
2020-05-20 08:56:06.0023 DEBUG tomcat-http--15 org.greenplum.pxf.plugins.hive.HiveDataFragmenter - Table - dw.table matched partitions list size: 8
2020-05-20 08:56:06.0155 INFO tomcat-http--15 org.greenplum.pxf.service.rest.FragmenterResource - org.greenplum.pxf.plugins.hive.HiveDataFragmenter returns 28 fragments for path dw.table in 635 ms for Session = gpadmin:1588764755-0000007762:2:hadoop-hive [profile Hive filter is available]

b) the partition column that filtering is applied on is of string type

Partition columns:
year_month string
platform string

Query: SELECT * FROM adhoc.table WHERE year_month = '202004';

PXF log:

2020-05-20 06:37:20.0100 DEBUG tomcat-http--6 org.greenplum.pxf.plugins.hive.HiveMetaStoreClientCompatibility1xx - Attempting to fallback
2020-05-20 06:37:20.0259 DEBUG tomcat-http--6 org.greenplum.pxf.plugins.hive.HiveClientWrapper - Item:dw.table, type: EXTERNAL_TABLE
2020-05-20 06:37:20.0259 DEBUG tomcat-http--6 org.greenplum.pxf.plugins.hive.HiveClientWrapper - Hive table: 12 fields. 2 partitions.
2020-05-20 06:37:24.0734 INFO tomcat-http--6 org.greenplum.pxf.service.rest.FragmenterResource - org.greenplum.pxf.plugins.hive.HiveDataFragmenter returns 25119 fragments for path dw.table in 5131 ms for Session = gpadmin:1588764755-0000007756:0:hadoop-hive [profile Hive filter is not available]

It seems to me that it starts on this line: https://github.com/greenplum-db/pxf/blob/master/server/pxf-hive/src/main/java/org/greenplum/pxf/plugins/hive/HiveDataFragmenter.java#L198
with the context.hasFilter() condition, since tbl.getPartitionKeysSize() returns the value 2 according to the log (Hive table: 12 fields. 2 partitions.), which comes from https://github.com/greenplum-db/pxf/blob/master/server/pxf-hive/src/main/java/org/greenplum/pxf/plugins/hive/HiveClientWrapper.java#L125 and uses the same function, tbl.getPartitionKeysSize(), to get the number of partition columns.

Best,

Ales

Error while parsing headers of CSV files stored on an NFS from an external table

Hi guys,

I stumbled upon a very strange behaviour while trying to query compressed CSV data stored on a Network File System from an external table using profile file:text or file:csv.

The thing is, if I have more files in the directory than the number of segments in my database, it stops skipping the header of the CSV files and eventually tries to parse it as if it were a data row.

Ex.: I have 3 segment hosts, with 6 segments on each. The issue happens when I have more than 18 CSV files in my directory.

It looks like each segment only processes the first CSV file correctly. If I change all the columns of the external table to text or varchar, I'm able to see the name of the columns as data in the table.

I've tested with different CSV files, with different sizes, in different Greenplum installations (Ubuntu and RedHat), with different numbers of segments.

I'm using PXF 6.1 with GP6. This behaviour should be relatively easy to reproduce.
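
For context, a minimal sketch of the kind of definition being described (the directory path, server name, and column list are assumptions, not taken from the report):

CREATE EXTERNAL TABLE nfs_csv_ext (id INTEGER, name TEXT)
LOCATION ('pxf://path/to/nfs/dir?PROFILE=file:csv&SERVER=nfssrv')
FORMAT 'CSV' (HEADER);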

PXF JDBC SELECT ERROR when a column name happens to be a keyword

I want to load some data from MySQL to Greenplum using PXF, and I did the following:

in mysql:

CREATE TABLE `test` (
	`id` BIGINT(20) UNSIGNED NOT NULL,
	`name` VARCHAR(54) NOT NULL,
	`group` INT(10) UNSIGNED NOT NULL,
	`user` VARCHAR(254) NOT NULL
);
INSERT INTO test VALUES (1, "whirly", 11, "hello");

in greenplum:

CREATE READABLE EXTERNAL TABLE pxf_test (
	id BIGINT,
	name VARCHAR(54),
	"group" INTEGER,
	"user" VARCHAR(254) 
)LOCATION ('pxf://test?PROFILE=Jdbc&SERVER=aclog')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

and query data in psql:

select * from pxf_test;

then get ERROR:

ERROR:  remote component error (500) from '127.0.0.1:5888':  type  Exception report   message   
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server 
version for the right syntax to use near 'group, user FROM test' at line 1    description   The 
server encountered an internal error that prevented it from fulfilling this request.    exception   
java.io.IOException: You have an error in your SQL syntax; check the manual that corresponds to 
your MySQL server version for the right syntax to use near 'group, user FROM test' at line 1 
(libchurl.c:944)  (seg0 slice1 28.28.15.202:7000 pid=3017) (cdbdisp.c:254)
DETAIL:  External table pxf_test

I guess it was caused by the columns "group" and "user"; they happen to be MySQL keywords.

I tried '', "", and "``", but it had no effect (see the sketch after the environment details below).

env:

  • PXF version: 5.6.0
  • Greenplum version: 5.18
  • MySQL connector: mysql-connector-java-5.1.47-bin.jar
  • MySQL version: mysql-5.7.26-winx64
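
A hedged sketch of the same table asking the PXF JDBC connector to quote column names on the remote database (QUOTE_COLUMNS is a documented JDBC profile option; whether the PXF version used here supports it, and whether MySQL accepts the resulting quoting, may depend on the server's SQL mode and is not confirmed):

CREATE READABLE EXTERNAL TABLE pxf_test_quoted (
    id BIGINT,
    name VARCHAR(54),
    "group" INTEGER,
    "user" VARCHAR(254)
)
LOCATION ('pxf://test?PROFILE=Jdbc&SERVER=aclog&QUOTE_COLUMNS=true')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');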

Poor performance querying large HDFS data with PXF

Problem summary:
The performance of Greenplum PXF is very poor whether it reads the data on HDFS directly or accesses the data on HDFS through Hive. The format of the stored data is Parquet; its actual data size is 1.2 TB, and the data size stored in HDFS is about 350 GB. Based on the TPC-H test scenario, the execution time of Q1 is around 1680 seconds, and the other SQL statements take over 1000 seconds. During the test, we found that the Java process occupied a lot of CPU and HDFS also generated a lot of I/O. Judging from the timing statistics, the reason for this appears to be that all the data on HDFS was read and the required data was not filtered. We want to understand the detailed process PXF follows during a query and which step causes the low performance. This includes the following questions:

Question 1: What metadata does PXF's agent get from the NameNode in HDFS? What is the difference between accessing HDFS directly and reading metadata through Hive table access? On the surface, why is the performance of accessing HDFS directly similar to that of accessing a Hive table?
Question 2: Does PXF use Tomcat to read and forward data? Does Tomcat read the whole data set on each host?
Question 3: Does PXF's WHERE-condition pushdown do its filtering in Tomcat, or while reading the HDFS data?
Question 4: Does Tomcat on each host read the data from HDFS once, or do multiple host nodes together read the data only once?
Question 5: When the segments on each Greenplum host read data, do they read only a part of the data or all of the data from Tomcat?
Question 6: For this scenario, HDFS has a large amount of data (more than 300 GB for a single table). Are there any good optimization methods?

Testing process:

  1. Test version and environment:
    Greenplum version: GP 6.4
    Operating system: RHEL 7
    pxf version: 5.10.1

  2. Network environment:
    (network environment diagram omitted)

Scene No.    Test result (s)  (GP+PXF+Parquet)
1 1681.623
3 1466.411
5 1683.317
6 1175.192
8 1286.907
9 1514.672
14 1151.166
17 2346.653

Looking forward to your replies!
@yydzero

Help Wanted ERR HikariPool-1 - Connection is not available, request timed out after 30000ms.

Good afternoon
I am using PXF 5.14
There was a problem: with a large number of rows (more than a million), the query execution time exceeds 30 seconds, as a result of which the application crashes with an error

SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.

I audited all the configuration files and doubled the connection timeout wherever I found it set to 30 seconds, then restarted the service, but it had no effect.
When executing a request that takes more than 30 seconds, the error still occurred at 31 seconds.

Sep 16, 2021 12:04:38 PM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:697)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:196)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:161)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
at org.greenplum.pxf.plugins.jdbc.utils.ConnectionManager.getConnection(ConnectionManager.java:127)
at org.greenplum.pxf.plugins.jdbc.JdbcBasePlugin.getConnectionInternal(JdbcBasePlugin.java:465)
at org.greenplum.pxf.plugins.jdbc.JdbcBasePlugin.getConnection(JdbcBasePlugin.java:367)
at org.greenplum.pxf.plugins.jdbc.JdbcAccessor.openForWrite(JdbcAccessor.java:173)
at org.greenplum.pxf.service.bridge.WriteBridge.beginIteration(WriteBridge.java:59)
at org.greenplum.pxf.service.rest.WritableResource.writeResponse(WritableResource.java:154)
at org.greenplum.pxf.service.rest.WritableResource.stream(WritableResource.java:136)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:146)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:452)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1195)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)

Sep 16, 2021 12:05:00 PM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
java.io.IOException: Invalid chunk header
at org.apache.coyote.http11.filters.ChunkedInputFilter.throwIOException(ChunkedInputFilter.java:615)
at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:192)
at org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:298)
at org.apache.coyote.Request.doRead(Request.java:442)
at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
at org.apache.tomcat.util.buf.ByteChunk.checkEof(ByteChunk.java:431)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:369)
at org.apache.catalina.connector.InputBuffer.readByte(InputBuffer.java:304)
at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:106)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.greenplum.pxf.api.io.GPDBWritable.readPktLen(GPDBWritable.java:158)
at org.greenplum.pxf.api.io.GPDBWritable.readFields(GPDBWritable.java:180)
at org.greenplum.pxf.service.BridgeInputBuilder.makeInput(BridgeInputBuilder.java:54)
at org.greenplum.pxf.service.bridge.WriteBridge.setNext(WriteBridge.java:69)
at org.greenplum.pxf.service.rest.WritableResource.writeResponse(WritableResource.java:161)
at org.greenplum.pxf.service.rest.WritableResource.stream(WritableResource.java:136)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:146)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:452)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1195)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)

make test fails due to a java.lang.IllegalAccessError

$ java --version
openjdk 13.0.2 2020-01-14
OpenJDK Runtime Environment (build 13.0.2+8)
OpenJDK 64-Bit Server VM (build 13.0.2+8, mixed mode, sharing)
$ make test --debug

./gradlew test

> Task :pxf-api:test FAILED

org.greenplum.pxf.api.security.SecureLoginTest > initializationError FAILED
    java.lang.IllegalAccessError

org.apache.hadoop.security.PxfUserGroupInformationTest > initializationError FAILED
    java.lang.IllegalAccessError
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.mockito.cglib.core.ReflectUtils$2 (file:/Users/yydzero/.gradle/caches/modules-2/files-2.1/org.mockito/mockito-core/1.9.5/c3264abeea62c4d2f367e21484fbb40c7e256393/mockito-core-1.9.5.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of org.mockito.cglib.core.ReflectUtils$2
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

221 tests completed, 2 failed

FAILURE: Build failed with an exception.

PXF connection SQL Server error

I get this error:

SQL Error [XX000]: ERROR: remote component error (500) from '127.0.0.1:5888': type Exception report message com.microsoft.sqlserver.jdbc.SQLServerDriver description The server encountered an internal error that prevented it from fulfilling this request. exception java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerDriver (libchurl.c:944) (seg0 slice1 127.0.1.1:6000 pid=13489) (cdbdisp.c:254)

when I tried to connect to SQL Server from my external table.

The DDL of my external table is:

-- Drop table
-- DROP EXTERNAL TABLE public.pxf_test_from_mssql

CREATE EXTERNAL TABLE demodb.public.pxf_test_from_mssql (
name varchar
)
LOCATION (
'pxf://dbo.test_pxf?PROFILE=Jdbc&JDBC_DRIVER=com.microsoft.sqlserver.jdbc.SQLServerDriver&DB_URL=jdbc:sqlserver://192.168.183.144:1433;database=TestDB;user=sa;password=12qwasZX'
) ON ALL
FORMAT 'CUSTOM' ( FORMATTER='pxfwritable_import' )
ENCODING 'UTF8';

-- Permissions

ALTER EXTERNAL TABLE public.pxf_test_from_mssql OWNER TO gpadmin;
GRANT ALL ON TABLE public.pxf_test_from_mssql TO gpadmin;

I can work with PostgreSQL from an external table without such errors, and I cannot identify the cause of the error when working with SQL Server.
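
For reference, a hedged sketch of the same table using a named PXF server configuration instead of embedding the driver class and connection URL in the LOCATION string (the server name is an assumption; the SQL Server JDBC driver JAR still has to be present in PXF's library directory for the driver class to load):

CREATE EXTERNAL TABLE public.pxf_test_from_mssql_srv (
    name VARCHAR
)
LOCATION ('pxf://dbo.test_pxf?PROFILE=Jdbc&SERVER=mssql')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import')
ENCODING 'UTF8';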

Can't write data to HDFS via PXF external table: ERROR: failed sending to remote component - org.greenplum.pxf.service.rest.WritableResource - wrote 0 bulks to

Greenplum version or build

I built GPDB from 6.1.0 source code.

postgres=# select version();
                                                                                          version
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.24 (Greenplum Database 6.0.0-beta.1 build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170216 (Red Hat 6.3.1-3), 64-bit compiled on Nov 15 2019 02:56:05

pxf version

[postgres@udw-m0 ~]$ pxf --version
PXF version 5.9.1

OS version and uname -a

CentOS release 6.10 (Final)

Installation information ( pg_config )

[postgres@udw-m0 ~]$ pg_config
BINDIR = /usr/local/gpdb/bin
DOCDIR = /usr/local/gpdb/share/doc/postgresql
HTMLDIR = /usr/local/gpdb/share/doc/postgresql
INCLUDEDIR = /usr/local/gpdb/include
PKGINCLUDEDIR = /usr/local/gpdb/include/postgresql
INCLUDEDIR-SERVER = /usr/local/gpdb/include/postgresql/server
LIBDIR = /usr/local/gpdb/lib
PKGLIBDIR = /usr/local/gpdb/lib/postgresql
LOCALEDIR = /usr/local/gpdb/share/locale
MANDIR = /usr/local/gpdb/share/man
SHAREDIR = /usr/local/gpdb/share/postgresql
SYSCONFDIR = /usr/local/gpdb/etc/postgresql
PGXS = /usr/local/gpdb/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-perl' '--with-python' '--with-libxml' '--with-gssapi' '--prefix=/usr/local/gpdb'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/local/gpdb/include
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -fno-aggressive-loop-optimizations -Wno-unused-but-set-variable -Wno-address -O3 -std=gnu99 -Werror=uninitialized -Werror=implicit-function-declaration -I/usr/local/gpdb/include
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/usr/local/gpdb/lib',--enable-new-dtags -L/usr/local/gpdb/lib
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lgpopt -lnaucrates -lgpdbcost -lgpos -lxerces-c -lxml2 -lrt -lgssapi_krb5 -lzstd -lrt -lcrypt -ldl -lm -L/usr/local/gpdb/lib
VERSION = PostgreSQL 9.4.24

Expected behavior

I followed this document: https://gpdb.docs.pivotal.io/6-1/pxf/hdfs_text.html, hoping to read and write HDFS text data through a PXF external table.

Actual behavior

I can read HDFS data via the PXF external table, but I can't write.

Reading from HDFS is OK

postgres=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=hdfs:text')
FORMAT 'TEXT' (delimiter=E',');
CREATE EXTERNAL TABLE
postgres=# select * from pxf_hdfs_textsimple;
 location  | month | num_orders | total_sales
-----------+-------+------------+-------------
 Prague    | Jan   |        101 |     4875.33
 Rome      | Mar   |         87 |     1557.39
 Bangalore | May   |        317 |     8936.99
 Beijing   | Jul   |        411 |    11600.67
(4 rows)

Writing to HDFS failed

When writing data to HDFS, it throws an error:

postgres=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=hdfs:text')
FORMAT 'TEXT' (delimiter=',');
CREATE EXTERNAL TABLE
postgres=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
ERROR:  failed sending to remote component '127.0.0.1:5888' (libchurl.c:599)  (seg2 10.13.99.105:40000 pid=6945) (libchurl.c:599)
postgres=# \q

and the pxf agent is running

[postgres@udw-m0 ~]$ pxf cluster status
Checking status of PXF servers on 2 hosts...
PXF is running on 2 out of 2 hosts

Here is the HDFS file listing:

[root@hadoop-master1 ~]# hdfs dfs -ls /data/pxf_examples
Found 3 items
-rw-r--r--   3 postgres supergroup         94 2019-12-26 20:52 /data/pxf_examples/pxf_hdfs_simple.txt
drwxr-xr-x   - postgres supergroup          0 2019-12-26 20:43 /data/pxf_examples/pxfwritable_hdfs_textsimple1
drwxr-xr-x   - postgres supergroup          0 2019-12-26 20:55 /data/pxf_examples/pxfwritable_hdfs_textsimple2

In HDFS, the directory /data/pxf_examples/pxfwritable_hdfs_textsimple2 has been created by the external table, but its contents are empty.

[root@uhadoop-master1 ~]# hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
[root@uhadoop-master1 ~]#

And here is pxf-service.log:

2019-12-27 02:59:00.0055 INFO udw-c02-startStop-1 org.greenplum.pxf.service.utilities.SecureLogin - Kerberos Security is not enabled
2019-12-27 02:59:01.0386 DEBUG tomcat-http--18 org.greenplum.pxf.service.rest.InvalidPathResource - REST request: http://localhost:5888/pxf/v0. Version v0, supported version is v15
2019-12-27 02:59:07.0545 DEBUG tomcat-http--4 org.greenplum.pxf.service.servlet.SecurityServletFilter - Retrieving proxy user for session: Session = postgres:1577352484-0000029633:0
2019-12-27 02:59:07.0545 DEBUG tomcat-http--4 org.greenplum.pxf.service.UGICache - Session = postgres:1577352484-0000029633:0 Creating remote user = postgres
2019-12-27 02:59:07.0547 DEBUG tomcat-http--4 org.greenplum.pxf.service.servlet.SecurityServletFilter - Performing request chain call for proxy user = postgres
2019-12-27 02:59:07.0617 INFO tomcat-http--4 org.greenplum.pxf.service.profile.ProfilesConf - Processed 55 profiles from file pxf-profiles-default.xml
2019-12-27 02:59:07.0622 WARN tomcat-http--4 org.greenplum.pxf.service.profile.ProfilesConf - Profile file 'pxf-profiles.xml' is empty
2019-12-27 02:59:07.0623 INFO tomcat-http--4 org.greenplum.pxf.service.profile.ProfilesConf - PXF profiles loaded: [adl:avro, adl:AvroSequenceFile, adl:csv, adl:json, adl:parquet, adl:SequenceFile, adl:text, adl:text:multi, Avro, GemFireXD, gs:avro, gs:AvroSequenceFile, gs:csv, gs:json, gs:parquet, gs:SequenceFile, gs:text, gs:text:multi, HBase, hdfs:avro, hdfs:AvroSequenceFile, hdfs:csv, hdfs:json, hdfs:parquet, hdfs:SequenceFile, hdfs:text, hdfs:text:multi, HdfsTextMulti, HdfsTextSimple, Hive, HiveORC, HiveRC, HiveText, HiveVectorizedORC, Jdbc, Json, Parquet, s3:avro, s3:AvroSequenceFile, s3:csv, s3:json, s3:parquet, s3:SequenceFile, s3:text, s3:text:multi, SequenceText, SequenceWritable, wasbs:avro, wasbs:AvroSequenceFile, wasbs:csv, wasbs:json, wasbs:parquet, wasbs:SequenceFile, wasbs:text, wasbs:text:multi]
2019-12-27 02:59:07.0625 DEBUG tomcat-http--4 org.greenplum.pxf.service.HttpRequestParser - Parsing request parameters: [accept, content-type, expect, host, transfer-encoding, x-gp-alignment, x-gp-attr-name0, x-gp-attr-name1, x-gp-attr-name2, x-gp-attr-name3, x-gp-attr-typecode0, x-gp-attr-typecode1, x-gp-attr-typecode2, x-gp-attr-typecode3, x-gp-attr-typename0, x-gp-attr-typename1, x-gp-attr-typename2, x-gp-attr-typename3, x-gp-attrs, x-gp-data-dir, x-gp-format, x-gp-has-filter, x-gp-options-delimiter, x-gp-options-encoding, x-gp-options-escape, x-gp-options-null, x-gp-options-profile, x-gp-segment-count, x-gp-segment-id, x-gp-uri, x-gp-url-host, x-gp-url-port, x-gp-user, x-gp-xid]
2019-12-27 02:59:07.0626 DEBUG tomcat-http--4 org.greenplum.pxf.service.HttpRequestParser - Adding plugins for profile hdfs:text
2019-12-27 02:59:07.0641 DEBUG tomcat-http--4 org.greenplum.pxf.service.HttpRequestParser - Added option encoding to request context
2019-12-27 02:59:07.0641 DEBUG tomcat-http--4 org.greenplum.pxf.service.HttpRequestParser - Unused property x-gp-uri
2019-12-27 02:59:07.0649 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Initializing configuration for server default
2019-12-27 02:59:07.0649 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Using directory /home/postgres/pxf/usercfg/servers/default for server default configuration
2019-12-27 02:59:07.0650 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding configuration resource for server default from file:/home/postgres/pxf/usercfg/servers/default/yarn-site.xml
2019-12-27 02:59:07.0675 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding configuration resource for server default from file:/home/postgres/pxf/usercfg/servers/default/mapred-site.xml
2019-12-27 02:59:07.0717 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding configuration resource for server default from file:/home/postgres/pxf/usercfg/servers/default/core-site.xml
2019-12-27 02:59:07.0742 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding configuration resource for server default from file:/home/postgres/pxf/usercfg/servers/default/hdfs-site.xml
2019-12-27 02:59:07.0764 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding 0 additional properties to configuration for server default
2019-12-27 02:59:07.0794 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Initializing configuration for server default
2019-12-27 02:59:07.0794 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Using directory /home/postgres/pxf/usercfg/servers/default for server default configuration
2019-12-27 02:59:07.0794 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding configuration resource for server default from file:/home/postgres/pxf/usercfg/servers/default/yarn-site.xml
2019-12-27 02:59:07.0807 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding configuration resource for server default from file:/home/postgres/pxf/usercfg/servers/default/mapred-site.xml
2019-12-27 02:59:07.0817 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding configuration resource for server default from file:/home/postgres/pxf/usercfg/servers/default/core-site.xml
2019-12-27 02:59:07.0829 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding configuration resource for server default from file:/home/postgres/pxf/usercfg/servers/default/hdfs-site.xml
2019-12-27 02:59:07.0840 DEBUG tomcat-http--4 org.greenplum.pxf.api.model.BaseConfigurationFactory - Adding 0 additional properties to configuration for server default
2019-12-27 02:59:07.0844 DEBUG tomcat-http--4 org.greenplum.pxf.plugins.hdfs.CodecFactory - No codec was found for file data/pxf_examples/pxfwritable_hdfs_textsimple2
2019-12-27 02:59:07.0844 DEBUG tomcat-http--4 org.greenplum.pxf.service.bridge.WriteBridge - Bridge is thread safe
2019-12-27 02:59:07.0844 DEBUG tomcat-http--4 org.greenplum.pxf.service.rest.WritableResource - Request for data/pxf_examples/pxfwritable_hdfs_textsimple2 will be handled without synchronization
2019-12-27 02:59:07.0844 DEBUG tomcat-http--4 org.greenplum.pxf.plugins.hdfs.HcfsType$3 - File name for write: hdfs://Ucluster/data/pxf_examples/pxfwritable_hdfs_textsimple2/1577352484-0000029633_0
2019-12-27 02:59:08.0755 DEBUG tomcat-http--4 org.greenplum.pxf.plugins.hdfs.LineBreakAccessor - Closing writing stream for path hdfs://Ucluster/data/pxf_examples/pxfwritable_hdfs_textsimple2/1577352484-0000029633_0
2019-12-27 02:59:08.0764 DEBUG tomcat-http--4 org.greenplum.pxf.service.rest.WritableResource - wrote 0 bulks to
2019-12-27 02:59:08.0765 DEBUG tomcat-http--4 org.greenplum.pxf.service.servlet.SecurityServletFilter - Releasing proxy user for session: Session = postgres:1577352484-0000029633:0.

I noticed that "wrote 0 bulks to" appears in the log file, but there is no error.

com.google.protobuf.ServiceException: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to node1/192.168.18.1:46049 failed on local

GP 6.2.1 importing HBase data through PXF.
Environment:
hadoop-2.6.5
hbase-1.2.7
Error message:
Jan 14, 2020 4:05:56 PM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to node1/192.168.18.1:46049 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to node1/192.168.18.1:46049 is closing. Call id=0, waitTime=52
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1573)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1593)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1750)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isMasterRunning(ConnectionManager.java:955)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:3181)
at org.greenplum.pxf.plugins.hbase.HBaseDataFragmenter.getFragments(HBaseDataFragmenter.java:89)
at org.greenplum.pxf.service.rest.FragmenterResource.getFragments(FragmenterResource.java:186)
at org.greenplum.pxf.service.rest.FragmenterResource.access$100(FragmenterResource.java:62)
at org.greenplum.pxf.service.rest.FragmenterResource$1.call(FragmenterResource.java:128)
at org.greenplum.pxf.service.rest.FragmenterResource$1.call(FragmenterResource.java:122)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at org.greenplum.pxf.service.rest.FragmenterResource.getFragments(FragmenterResource.java:122)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
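
The report does not include the external table definition; for context, a hedged sketch of the kind of table that drives HBaseDataFragmenter (the HBase table, column family, and qualifier names below are assumptions):

CREATE EXTERNAL TABLE hbase_data_ext (
    "recordkey" TEXT,
    "cf1:col1" TEXT,
    "cf1:col2" TEXT
)
LOCATION ('pxf://hbase_table_name?PROFILE=HBase')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');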

Need help to compile

Hi!

I tried to compile PXF on my Debian system, but couldn't figure it out. I installed the required dependencies - I think - with:

sudo apt install golang go-dep golang-ginkgo-dev

I already had OpenJDK installed.

I cloned the repository with:

git clone https://github.com/greenplum-db/pxf.git
cd pxf

Then I tried to build it with make, but it errored out:

~/git-sandbox-gpdb/pxf (master)$ make
make -C cli/go/src/pxf-cli
make[1]: Entering directory '/home/heikki/git-sandbox-gpdb/pxf/cli/go/src/pxf-cli'
/home/heikki/git-sandbox-gpdb/pxf/cli/go/src/pxf-cli is not within a known GOPATH/src
make[1]: *** [Makefile:29: depend] Error 1
make[1]: Leaving directory '/home/heikki/git-sandbox-gpdb/pxf/cli/go/src/pxf-cli'
make: *** [Makefile:18: cli] Error 2

What am I missing?

pxf+hive+kerberos error

When PXF connects to Hive, this occurs:
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
2019-11-01 20:09:36.0129 WARN tomcat-http--6 hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:380)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:230)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:3762)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:3748)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:502)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:281)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:187)
at org.greenplum.pxf.plugins.hive.utilities.HiveUtilities.initHiveClient(HiveUtilities.java:94)
at org.greenplum.pxf.plugins.hive.HiveDataFragmenter.initialize(HiveDataFragmenter.java:113)
at org.greenplum.pxf.api.utilities.BasePluginFactory.getPlugin(BasePluginFactory.java:54)
at org.greenplum.pxf.service.rest.FragmenterResource.getFragments(FragmenterResource.java:90)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:105)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)

but Kerberos authorization works:

2019-11-01 20:09:24.0953 INFO localhost-startStop-1 org.greenplum.pxf.service.utilities.SecureLogin - Kerberos Security is enabled
2019-11-01 20:09:24.0954 DEBUG localhost-startStop-1 org.greenplum.pxf.service.utilities.SecureLogin - Kerberos principal: [email protected]
2019-11-01 20:09:24.0954 DEBUG localhost-startStop-1 org.greenplum.pxf.service.utilities.SecureLogin - Kerberos keytab: /home/ansible/dsarch_gp.keytab
2019-11-01 20:09:25.0047 DEBUG localhost-startStop-1 org.apache.hadoop.security.UserGroupInformation - hadoop login
2019-11-01 20:09:25.0049 DEBUG localhost-startStop-1 org.apache.hadoop.security.UserGroupInformation - hadoop login commit
2019-11-01 20:09:25.0050 DEBUG localhost-startStop-1 org.apache.hadoop.security.UserGroupInformation - using kerberos user:[email protected]
2019-11-01 20:09:25.0050 DEBUG localhost-startStop-1 org.apache.hadoop.security.UserGroupInformation - Using user: "[email protected]" with name [email protected]
2019-11-01 20:09:25.0051 DEBUG localhost-startStop-1 org.apache.hadoop.security.UserGroupInformation - User entry: "[email protected]"
2019-11-01 20:09:25.0051 INFO localhost-startStop-1 org.apache.hadoop.security.UserGroupInformation - Login successful for user [email protected] using keytab file /home/ansible/dsarch_gp.keytab

PXF cannot read a composite Avro file from AWS S3 and returns the error org.apache.avro.generic.GenericData$EnumSymbol cannot be cast to java.lang.String

avsc metadata

vim avro_schema.avsc 
{
"type" : "record",
  "name" : "example_schema",
  "namespace" : "com.example",
  "fields" : [ {
    "name" : "id",
    "type" : "long",
    "doc" : "Id of the user account"
  }, {
    "name" : "username",
    "type" : "string",
    "doc" : "Name of the user account"
  }, {
    "name" : "followers",
    "type" : {"type": "array", "items": "string"},
    "doc" : "Users followers"
  }, {
    "name": "fmap",
    "type": {"type": "map", "values": "long"}
  }, {
    "name": "relationship",
    "type": {
        "type": "enum",
        "name": "relationshipEnum",
        "symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
    }
  }, {
    "name": "address",
    "type": {
        "type": "record",
        "name": "addressRecord",
        "fields": [
            {"name":"number", "type":"int"},
            {"name":"street", "type":"string"},
            {"name":"city", "type":"string"}]
    }
  } ],
  "doc:" : "A basic schema for storing messages"
}

avro data

vim pxf_avro.txt 
{"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
{"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}

java -jar ./avro-tools-1.9.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_avro.txt > /tmp/pxf_avro.avro

convert

java -jar ./avro-tools.jar fromjson --schema-file ./avro_schema.avsc ./pxf_avro.txt > ./pxf_avro.avro

create table

DROP FOREIGN TABLE s3_pxf_cn_with_avro;
CREATE FOREIGN TABLE s3_pxf_cn_with_avro(
id bigint, username text, followers text, fmap text, relationship text, address text
)
SERVER s3_server_cn
OPTIONS ( resource '/testpxf/avrowithenum/pxf_avro.avro', format 'avro');
select * from s3_pxf_cn_with_avro;

query

postgres=# select * from s3_pxf_cn_with_avro;
psql: ERROR:  remote component error (500) from '127.0.0.1:5888':  Type  Exception Report   Message  org.apache.avro.generic.GenericData$EnumSymbol cannot be cast to java.lang.String   Description  The server encountered an unexpected condition that prevented it from fulfilling the request.   Exception   java.io.IOException: org.apache.avro.generic.GenericData$EnumSymbol cannot be cast to java.lang.String (libchurl.c:963)  (seg0 slice1 192.168.100.14:50000 pid=452878) (libchurl.c:963)

PXF read Hive transactional table Error

When I set a Hive ORC table to TBLPROPERTIES ('transactional'='true'), it reports an exception.

environment

  • greenplum: 5.21
  • Hive: HDP 3.1

step

Note: the sample data is the data from the official documentation, http://docs.greenplum.org/5190/pxf/hive_pxf.html#hive_orc

  • step1: create HIVE ORC table (transactional)
CREATE TABLE sales_info_ORC 
	(location string, month string, number_of_orders int, total_sales double) 
clustered by (number_of_orders) into 8 buckets 
STORED AS ORC 
TBLPROPERTIES ('transactional'='true');
  • step2: load data from sales_info to sales_info_ORC
INSERT INTO TABLE sales_info_ORC SELECT * FROM sales_info;
  • step3: create PXF EXTERNAL TABLE
CREATE EXTERNAL TABLE salesinfo_hiveORCprofile
	(location text, month text, number_of_orders int, total_sales float8)
LOCATION ('pxf://zh.sales_info_ORC?PROFILE=HiveORC&SERVER=hivetest')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
  • step 4: read the PXF table; it reports an exception
default=# select * from salesinfo_hiveORCprofile;
ERROR:  remote component error (500) from '127.0.0.1:5888':  type  Exception report   
message   serious problem    description   The server encountered an internal error 
that prevented it from fulfilling this request.    exception   
java.lang.RuntimeException: serious problem (libchurl.c:946)  (seg0 slice1 192.168.192.1:6000 
pid=10889) (cdbdisp.c:254)
DETAIL:  External table salesinfo_hiveorcprofile

Unit test error when running make install

$ java -version
openjdk version "1.8.0_272"
OpenJDK Runtime Environment (build 1.8.0_272-b10)
OpenJDK 64-Bit Server VM (build 25.272-b10, mixed mode)

$ make -C ~/pxf install
> Task :pxf-api:test

org.greenplum.pxf.api.security.SecureLoginTest > testPrincipalGetsResolvedForServer() FAILED
org.opentest4j.AssertionFailedError at SecureLoginTest.java:389

org.greenplum.pxf.api.security.SecureLoginTest > testLoginKerberosReuseExistingLoginSessionWithResolvedHostnameInPrincipal() FAILED
java.lang.RuntimeException at SecureLoginTest.java:346
Caused by: java.lang.NullPointerException at SecureLoginTest.java:346

339 tests completed, 2 failed

> Task :pxf-api:test FAILED

FAILURE: Build failed with an exception.

  • What went wrong:
    Execution failed for task ':pxf-api:test'.
    > There were failing tests. See the report at: file:///home/gpadmin/pxf/server/pxf-api/build/reports/tests/test/index.html

  • Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

Does anyone know how to solve it?

PXF - Partitioned Hive table with different number of columns in underlying parquet files

https://groups.google.com/a/greenplum.org/forum/?utm_medium=email&utm_source=footer#!msg/gpdb-users/VXov5JU0xG4/sjc8j4ufBgAJ

From the open source channel:

Hello,

I'm experiencing a PXF issue while reading a Hive table with multiple partitions whose underlying Parquet files have different numbers of columns: there was a change over time, and the latest Parquet files contain more columns than the previous ones (some columns were added).
Hive handles this situation smoothly and returns NULL for the missing columns in older partitions/Parquet files; however, PXF fails with the following error:
[22000] ERROR: Record has 33 fields but the schema size is 34 (seg0 slice1 10.0.0.1:6000 pid=5440)

Is it a known limitation of PXF for Hive? Is there a workaround please?

Thanks,

Ales

PXF External Table Hive 3.1 failed with java.io.IOException: Not a file

5.17.0

CentOS 7 3.10.0-862.11.6.el7.x86_64

HDP 3.1 / Hive 3.1

Hive:
CREATE TABLE sales_info (location string, month string, number_of_orders int, total_sales double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS textfile;

select * from sales_info;
+----------------------+-------------------+------------------------------+-------------------------+
| sales_info.location | sales_info.month | sales_info.number_of_orders | sales_info.total_sales |
+----------------------+-------------------+------------------------------+-------------------------+
| Prague | Jan | 101 | 4875.33 |
| Rome | Mar | 87 | 1557.39 |
| Bangalore | May | 317 | 8936.99 |
| Beijing | Jul | 411 | 11600.67 |
| San Francisco | Sept | 156 | 6846.34 |
| Paris | Nov | 159 | 7134.56 |
| San Francisco | Jan | 113 | 5397.89 |
| Prague | Dec | 333 | 9894.77 |
| Bangalore | Jul | 271 | 8320.55 |
| Beijing | Dec | 100 | 4248.41 |
+----------------------+-------------------+------------------------------+-------------------------+
Greenplum:
CREATE EXTERNAL TABLE salesinfo_hivetextprofile(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://default.sales_info?PROFILE=HiveText&DELIMITER=\x2c')
FORMAT 'TEXT' (delimiter=E',');

select * from salesinfo_hivetextprofile; ERROR: remote component error (500) from '127.0.0.1:5888': type Exception report message javax.servlet.ServletException: java.io.IOException: Not a file: hdfs://hdm1-t.aq3hcc40qdkurot1542u4kjxfd.ax.internal.cloudapp.net:8020/warehouse/tablespace/managed/hive/sales_info/delta_0000001_0000001_0000 description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: javax.servlet.ServletException: java.io.IOException: Not a file: hdfs://hdm1-t.aq3hcc40qdkurot1542u4kjxfd.ax.internal.cloudapp.net:8020/warehouse/tablespace/managed/hive/sales_info/delta_0000001_0000001_0000 (libchurl.c:944) (seg0 slice1 172.16.0.7:6000 pid=52329) (cdbdisp.c:254)
DETAIL: External table salesinfo_hivetextprofile, file pxf://default.sales_info?PROFILE=HiveText&DELIMITER=\x2c

PXF log:
Feb 20, 2019 2:46:54 PM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
java.io.IOException: Not a file: hdfs://hdm1-t.aq3hcc40qdkurot1542u4kjxfd.ax.internal.cloudapp.net:8020/warehouse/tablespace/managed/hive/sales_info/delta_0000001_0000001_0000
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329)
at org.greenplum.pxf.plugins.hive.HiveDataFragmenter.fetchMetaData(HiveDataFragmenter.java:302)
at org.greenplum.pxf.plugins.hive.HiveDataFragmenter.fetchMetaDataForSimpleTable(HiveDataFragmenter.java:263)
at org.greenplum.pxf.plugins.hive.HiveDataFragmenter.fetchMetaDataForSimpleTable(HiveDataFragmenter.java:257)
at org.greenplum.pxf.plugins.hive.HiveDataFragmenter.fetchTableMetaData(HiveDataFragmenter.java:229)
at org.greenplum.pxf.plugins.hive.HiveDataFragmenter.getFragments(HiveDataFragmenter.java:124)
at org.greenplum.pxf.service.rest.FragmenterResource.getFragments(FragmenterResource.java:92)
at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:105)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:120)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:957)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:620)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
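For what it's worth, the path in the error (.../warehouse/tablespace/managed/hive/sales_info/delta_0000001_0000001_0000) shows the rows sitting in an ACID delta directory: on HDP 3 / Hive 3, managed tables are transactional by default, while this PXF version expects plain data files under the table location. A hedged workaround sketch (the HDFS location is hypothetical; the column definitions follow the report above) is to keep the data in a non-transactional external Hive table and point PXF at that:

-- Hive: non-transactional external copy of the data
CREATE EXTERNAL TABLE sales_info_ext
    (location string, month string, number_of_orders int, total_sales double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS textfile
LOCATION '/user/hive/external/sales_info_ext';

INSERT INTO TABLE sales_info_ext SELECT * FROM sales_info;

-- Greenplum: reference the copy in the PXF external table definition, e.g.
-- LOCATION ('pxf://default.sales_info_ext?PROFILE=HiveText&DELIMITER=\x2c')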

gpdb + pxf + pool + jdbc + mongo = not authorized for query

Hi guys!
I'm trying to use Greenplum 6 with PXF 5.8.2 to query collections in MongoDB 3.4.2 over a JDBC driver (for example, the one from the UnityJDBC package). In a nutshell: when I set jdbc.pool.enabled to false in the properties, everything works fine, but when I set it to true I get an intermittent "not authorized for query" error from MongoDB when I try to fetch the data.

More details:

  1. I've set up the PXF server for Greenplum as described in the manual, without any issues.
  2. I've based the connection configuration on the pxf/templates/user/templates/jdbc-site.xml template:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>jdbc.driver</name>
        <value>mongodb.jdbc.MongoDriver</value>
    </property>
    <property>
        <name>jdbc.url</name>
        <value>jdbc:mongo://10.0.0.4:27017/ud</value>
    </property>
    <property>
        <name>jdbc.user</name>
        <value>user</value>
    </property>
    <property>
        <name>jdbc.password</name>
        <value>testpassword</value>
    </property>

    <!-- Connection Pool properties -->
    <!-- You can use properties described here: https://github.com/brettwooldridge/HikariCP
         except for the following ones: dataSourceClassName, jdbcUrl, username, password,
         dataSource.user, dataSource.password
    -->
    <property>
        <name>jdbc.pool.enabled</name>
        <value>false</value>
    </property>
</configuration>
  3. I've created an external table as described in the Greenplum PXF manual:
CREATE EXTERNAL TABLE test ( _id text )
LOCATION ('pxf://Users?PROFILE=JDBC&SERVER=mongo_ud')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

SELECT * FROM test;
  4. Result:
    💚 If I set jdbc.pool.enabled to false, everything is fine; I get the data every time I need it.
    ❤️ If I set jdbc.pool.enabled to true, I get an intermittent error: not authorized for query on ud.Users.
  5. I've tried two JDBC drivers and the behavior of the error is the same.
  6. I've checked the MongoDB logs at the maximum log level and found nothing interesting (ud.Users is the collection I read data from):
2019-09-12T21:14:15.992+0000 I QUERY    [conn36070] assertion 13 not authorized for query on ud.Users ns:ud.Users query:{}
2019-09-12T21:14:25.792+0000 D -        [conn36070] User Assertion: 13:not authorized for query on ud.Users src/mongo/db/instance.cpp 365

The error is intermittent (it appears and disappears) because it depends on previous or cancelled queries. The first query may work but the second fails; or the first and second work, but if the third is cancelled, the fourth fails with the error.

My hypothesis is that the problem lies with closed/dead connections in the pool. It looks like a connection stays in the pool after it has been closed, and reusing that connection raises the error.

Could you please help me locate the problem?

Thanks!

Does Greenplum with PXF support Avro data with schema evolution?

We have user data (Avro files) validated and ingested into HDFS using Schema Registry (the data keeps evolving), and we use Greenplum with PXF to access the HDFS data. We created one external table and tried to query the HDFS data, but we get the following error:

warehouse=# select * from user;
ERROR:  Record has 151 fields but the schema size is 152  (seg1 slice1 192.168.1.17:6001 pid=6582)
CONTEXT:  External table user
warehouse=#

The user HDFS files were ingested using different schema versions, and the Greenplum external table was created with fields from all the schema versions.

Named query and MongoDB

Hello!
We are successfully using PXF with MongoDB. Thank you!

Now we are facing a new type of query: we need to execute the query in MongoDB itself and get the result. Ordinary SELECT queries against the external table do not work for us because PXF does not push "ORDER BY" and "LIMIT" down to the external database. We can't do:

SELECT * FROM ext_table_some_collection WHERE field > 10 ORDER BY field LIMIT 10

and get the result of:

db.some_collection.find({},{"field": {$gt: 10}}).sort({"field":1}).limit(10)

We've found "Named Queries" in the documentation.

Could you please give an example of how we could use this in our case described above?

My current attempt failed:

  1. I put example_query.sql in pxf/server/mongo_test/. The content was:
db.some_collection.find({},{"field": {$gt: 10}}).sort({"field":1}).limit(10)
  2. After restarting PXF, I created the external table:
CREATE EXTERNAL TABLE test ( _id text, field int)
LOCATION ('pxf://query:example_query?PROFILE=Jdbc&SERVER=mongo_test')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
  3. And the query returns an error:
select * from test limit 1;
ERROR:  remote component error (500) from '127.0.0.1:5888':  type  Exception report   message   ERROR: No schema defined.  A schema is needed when performing joins or complex SQL.  Put the parameter rebuildschema=true in your JDBC URL at least for the first time connecting to rebuild the schema.  Example: jdbc:mongo:localhost:27017?rebuildschema=true.  The default schema location is in the file mongo_&lt;dbname&gt;.xml.  Use the schema parameter to set a file location (e.g. schema=mongo.xml) to store the schema.  See connection parameters at http://www.unityjdbc.com/mongojdbc/ for more details.    description   The server encountered an internal error that prevented it from fulfilling this request.    exception   java.io.IOException: ERROR: No schema defined.  A schema is needed when performing joins or complex SQL.  Put the parameter rebuildschema=true in your JDBC URL at least for the first time connecting to rebuild the schema.  Example: jdbc:mongo:localhost:27017?rebuildschema=true.  The default schema location is in the file mongo_&lt;dbname&gt;.xml.  Use the schema parameter to set a file location (e.g. schema=mongo.xml) to store the schema.  See connection parameters at http://www.unityjdbc.com/mongojdbc/ for more details. (libchurl.c:920)  (seg0 slice1 172.18.0.2:40000 pid=7258) (libchurl.c:920)
  4. I can see in the PXF logs that the query was read by PXF.

Please help:

  1. What kind of query should be in the .sql file? Should it be a native query for the external database (for Mongo: db.col.find()), or a SQL query written against the collection and field names (for Mongo: select * from col) that is converted into a Mongo query (db.col.find()) by PXF or the driver?

  2. Do you have any thoughts on what "ERROR: No schema defined" means in the error above, in the PXF context?

  3. Could you help us make PXF Named Queries work with MongoDB?

Thanks!
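On question 1, a hedged note: a PXF named query file contains SQL text, not a native MongoDB command; PXF sends that SQL through the JDBC driver, and it is the driver (here UnityJDBC) that translates it into a Mongo find(). PXF also typically wraps a named query in a subquery when it adds projections or filters, so whether an ORDER BY/LIMIT inside the file behaves as hoped depends on the driver. A minimal sketch using the names from this report (the exact server directory layout is assumed):

-- Contents of example_query.sql in the mongo_test server configuration directory:
SELECT _id, field FROM Users WHERE field > 10;

-- Greenplum side, unchanged from the report:
-- CREATE EXTERNAL TABLE test ( _id text, field int )
-- LOCATION ('pxf://query:example_query?PROFILE=Jdbc&SERVER=mongo_test')
-- FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

The "No schema defined" error in the output comes from the UnityJDBC driver itself, and its message already suggests the driver-side fix of adding rebuildschema=true to the jdbc.url.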

Cannot read s3:SequenceFile

Version: 5.1.6

We are trying to create an external table to read a file on S3.
Here is the data schema class:

package co.jp.issp.pxf.ssp;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.lang.reflect.Field;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class SspUrlDomainAdult implements Writable {

    public String st1 = "";
    public String st2 = "";
    public String st3 = "";

    public SspUrlDomainAdult() {
    }

    public SspUrlDomainAdult(String st1, String st2, String st3) {
        this.st1 = st1;
        this.st2 = st2;
        this.st3 = st3;
    }

    String GetSt1() {
        return st1;
    }

    String GetSt2() {
        return st2;
    }

    String GetSt3() {
        return st3;
    }

    @Override
    public void write(DataOutput out) throws IOException {

        Text txt = new Text();
        txt.set(st1);
        txt.write(out);
        txt.set(st2);
        txt.write(out);
        txt.set(st3);
        txt.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        Text txt = new Text();
        txt.readFields(in);
        st1 = txt.toString();
        txt.readFields(in);
        st2 = txt.toString();
        txt.readFields(in);
        st3 = txt.toString();
    }

    public void printFieldTypes() {
        Class myClass = this.getClass();
        Field[] fields = myClass.getDeclaredFields();

        for (int i = 0; i < fields.length; i++) {
            System.out.println(fields[i].getType().getName());
        }
    }
}

and the file on S3 is

  • sequence format
  • compression type is BLOCK
  • compression codec is gzip
  • can be accessed by Athena

and the SQL for the table is:

create external table public.pxf_s3_ssp_url_domain_adult2(
  domain text,
  exclusion_from text,
  exclusion_to text
)
location (
  'pxf://our_bucket/our_table?PROFILE=s3:SequenceFile&DATA-SCHEMA=co.jp.issp.pxf.ssp.SspUrlDomainAdult&SERVER=s3ssp&COMPRESSION_TYPE=BLOCK'
)
format 'CUSTOM' (FORMATTER='pxfwritable_import')

This SQL creates the table, but querying it does not work:
select * from pxf_s3_ssp_url_domain_adult2 limit 10;

The logs are:

2019-04-03 18:36:53.0702 DEBUG tomcat-http--13 org.greenplum.pxf.api.model.BaseConfigurationFactory - Using directory /usr/local/greenplum-db/pxf/conf/serverss3ssp for server s3ssp configuration
2019-04-03 18:36:53.0703 DEBUG tomcat-http--13 org.greenplum.pxf.api.model.BaseConfigurationFactory - adding configuration resource from file:/usr/local/greenplum-db/pxf/conf/servers/s3ssp/s3-site.xml
2019-04-03 18:36:53.0703 DEBUG tomcat-http--13 org.greenplum.pxf.plugins.hdfs.WritableResolver - Field #0, name: st1 type: java.lang.String, Primitive, accessible field
2019-04-03 18:36:53.0705 DEBUG tomcat-http--13 org.greenplum.pxf.plugins.hdfs.WritableResolver - Field #1, name: st2 type: java.lang.String, Primitive, accessible field
2019-04-03 18:36:53.0705 DEBUG tomcat-http--13 org.greenplum.pxf.plugins.hdfs.WritableResolver - Field #2, name: st3 type: java.lang.String, Primitive, accessible field
2019-04-03 18:36:53.0705 DEBUG tomcat-http--13 org.greenplum.pxf.plugins.hdfs.utilities.HdfsUtilities - No codec was found for file s3a://our_bucket/our_table/000000_0
2019-04-03 18:36:53.0709 DEBUG tomcat-http--13 org.greenplum.pxf.service.bridge.ReadBridge - Bridge is thread safe
2019-04-03 18:36:53.0710 DEBUG tomcat-http--13 org.greenplum.pxf.service.rest.BridgeResource - Request for s3a://our_bucket/our_table/000000_0 will be handled without synchronization
2019-04-03 18:36:53.0711 DEBUG tomcat-http--13 org.greenplum.pxf.api.utilities.Utilities - parsed file split: path s3a://our_bucket/our_table/000000_0, start 0, end 25525, hosts {localhost}
2019-04-03 18:36:53.0816 INFO tomcat-http--13 org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.gz]
2019-04-03 18:36:53.0817 DEBUG tomcat-http--13 org.greenplum.pxf.service.rest.BridgeResource - Starting streaming fragment 0 of resource s3a://our_bucket/our_table/000000_0
2019-04-03 18:36:53.0817 ERROR tomcat-http--13 org.greenplum.pxf.service.rest.BridgeResource - Exception thrown when streaming
java.lang.IllegalArgumentException: Can not set java.lang.String field co.jp.issp.pxf.ssp.SspUrlDomainAdult.st1 to org.apache.hadoop.io.Text
	at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
	at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
	at sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:58)
	at sun.reflect.UnsafeObjectFieldAccessorImpl.get(UnsafeObjectFieldAccessorImpl.java:36)
	at java.lang.reflect.Field.get(Field.java:393)
	at org.greenplum.pxf.plugins.hdfs.WritableResolver.populateRecord(WritableResolver.java:179)
	at org.greenplum.pxf.plugins.hdfs.WritableResolver.getFields(WritableResolver.java:125)
	at org.greenplum.pxf.service.bridge.ReadBridge.makeOutput(ReadBridge.java:76)
	at org.greenplum.pxf.service.bridge.ReadBridge.getNext(ReadBridge.java:106)
	at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:139)
	at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
	at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
	at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
	at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
	at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
	at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
	at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:105)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
	at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:120)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:957)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:620)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)
2019-04-03 18:36:53.0819 DEBUG tomcat-http--13 org.greenplum.pxf.service.rest.BridgeResource - Stopped streaming fragment 0 of resource s3a://our_bucket/our_table/000000_0, 0 records.
2019-04-03 18:36:53.0819 WARN tomcat-http--13 com.amazonaws.services.s3.internal.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2019-04-03 18:36:53.0826 DEBUG tomcat-http--13 org.greenplum.pxf.service.servlet.SecurityServletFilter - Releasing proxy user for session: Session = gpadmin:1554252863-0000048959:1. 

Cheers

Does PXF support Hadoop 3 erasure coding policies?

My Hadoop cluster version is 3.1.0. When erasure coding policies are not enabled on a path, the external table reads normally; when erasure coding policies are enabled, reads fail. Does PXF not support this, or does it need additional configuration?

Add properties to JDBC Server configuration file

Add DriverManager.getConnection properties to the SERVER configuration file. This is required to get good performance out of an Oracle connection, where the property "defaultRowPrefetch" needs to be set higher than the default of 10; usually 2000 is used for the kind of large data extracts PXF performs. Other properties may need to be set as well, so either make this a comma-delimited list of properties or allow multiple properties to be set in the configuration file.

example:
Connection conn = DriverManager.getConnection(connectionUrl, props);

PXF can't run on GPDB 6.0 beta7

When I use the command "pxf cluster start" to start the PXF service, Tomcat comes up but PXF is down. The error looks like this:

  1. java.lang.NoSuchMethodException: org.apache.catalina.deploy.WebXml setSessionConfig
  2. java.lang.NoSuchMethodException: org.apache.catalina.deploy.WebXml addServlet
    More detailed errors are in the attached file.

    Environment:
    OS: CentOS 6.10
    Kernel: 3.18.140-11.el6.x86_64
    DB server: GPDB 6.0.beta7
    PXF version: PXF version 5.7.0
    Tomcat: Apache Tomcat/7.0.62
    JVM Version: 1.8.0_91-b14

ERROR: failed sending to remote component - data movement between Vertica and Greenplum

Hi,

When we try to push data, we sometimes get an error: sometimes the data is inserted successfully and sometimes we hit this issue.
We are trying to push 500 million records; sometimes the insert succeeds and sometimes it fails. Please find the query and a sample error below.

Query: INSERT INTO external_table1 select * from Table2 limit 500000000

Error: The same error appears in the PXF logs on all segment hosts (/usr/local/pxf-user/logs/localhost.2020-11-24.log):
Nov 24, 2020 7:55:03 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [PXF REST Service] in context with path [/pxf] threw exception [javax.servlet.ServletException: java.sql.BatchUpdateException: [Vertica]VJDBC One or more rows were rejected by the server.] with root cause
java.sql.BatchUpdateException: [Vertica]VJDBC One or more rows were rejected by the server.
at com.vertica.jdbc.common.SStatement.processBatchResults(Unknown Source)
at com.vertica.jdbc.common.SPreparedStatement.executeBatch(Unknown Source)
at com.vertica.jdbc.VerticaJdbc4PreparedStatementImpl.executeBatch(Unknown Source)
at com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:128)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
at org.greenplum.pxf.plugins.jdbc.writercallable.BatchWriterCallable.call(BatchWriterCallable.java:73)
at org.greenplum.pxf.plugins.jdbc.JdbcAccessor.writeNextObject(JdbcAccessor.java:247)
at org.greenplum.pxf.service.bridge.WriteBridge.setNext(WriteBridge.java:78)
at org.greenplum.pxf.service.rest.WritableResource.stream(WritableResource.java:138)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:145)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:452)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1201)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base/java.lang.Thread.run(Thread.java:834)

Nov 24, 2020 7:55:29 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [PXF REST Service] in context with path [/pxf] threw exception [javax.servlet.ServletException: java.io.IOException: Invalid chunk header] with root cause
java.io.IOException: Invalid chunk header
at org.apache.coyote.http11.filters.ChunkedInputFilter.throwIOException(ChunkedInputFilter.java:615)
at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:192)
at org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:316)
at org.apache.coyote.Request.doRead(Request.java:442)
at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
at org.apache.tomcat.util.buf.ByteChunk.checkEof(ByteChunk.java:431)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:369)
at org.apache.catalina.connector.InputBuffer.readByte(InputBuffer.java:304)
at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:106)
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.greenplum.pxf.api.io.GPDBWritable.readPktLen(GPDBWritable.java:158)
at org.greenplum.pxf.api.io.GPDBWritable.readFields(GPDBWritable.java:180)
at org.greenplum.pxf.service.BridgeInputBuilder.makeInput(BridgeInputBuilder.java:60)
at org.greenplum.pxf.service.bridge.WriteBridge.setNext(WriteBridge.java:69)
at org.greenplum.pxf.service.rest.WritableResource.stream(WritableResource.java:138)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:145)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:452)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1201)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base/java.lang.Thread.run(Thread.java:834)

Nov 24, 2020 7:55:29 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [PXF REST Service] in context with path [/pxf] threw exception [javax.servlet.ServletException: java.io.IOException: Invalid chunk header] with root cause
java.io.IOException: Invalid chunk header
at org.apache.coyote.http11.filters.ChunkedInputFilter.throwIOException(ChunkedInputFilter.java:615)
at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:192)
at org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:316)
at org.apache.coyote.Request.doRead(Request.java:442)
at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
at org.apache.tomcat.util.buf.ByteChunk.checkEof(ByteChunk.java:431)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:369)
at org.apache.catalina.connector.InputBuffer.readByte(InputBuffer.java:304)
at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:106)
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.greenplum.pxf.api.io.GPDBWritable.readPktLen(GPDBWritable.java:158)
at org.greenplum.pxf.api.io.GPDBWritable.readFields(GPDBWritable.java:180)
at org.greenplum.pxf.service.BridgeInputBuilder.makeInput(BridgeInputBuilder.java:60)
at org.greenplum.pxf.service.bridge.WriteBridge.setNext(WriteBridge.java:69)
at org.greenplum.pxf.service.rest.WritableResource.stream(WritableResource.java:138)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:145)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:452)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1201)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base/java.lang.Thread.run(Thread.java:834)

Nov 24, 2020 7:55:34 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [PXF REST Service] in context with path [/pxf] threw exception [javax.servlet.ServletException: java.io.IOException: Invalid chunk header] with root cause
java.io.IOException: Invalid chunk header
at org.apache.coyote.http11.filters.ChunkedInputFilter.throwIOException(ChunkedInputFilter.java:615)
at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:192)
at org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:316)
at org.apache.coyote.Request.doRead(Request.java:442)
at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
at org.apache.tomcat.util.buf.ByteChunk.checkEof(ByteChunk.java:431)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:369)
at org.apache.catalina.connector.InputBuffer.readByte(InputBuffer.java:304)
at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:106)
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.greenplum.pxf.api.io.GPDBWritable.readPktLen(GPDBWritable.java:158)
at org.greenplum.pxf.api.io.GPDBWritable.readFields(GPDBWritable.java:180)
at org.greenplum.pxf.service.BridgeInputBuilder.makeInput(BridgeInputBuilder.java:60)
at org.greenplum.pxf.service.bridge.WriteBridge.setNext(WriteBridge.java:69)
at org.greenplum.pxf.service.rest.WritableResource.stream(WritableResource.java:138)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:145)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:452)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1201)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base/java.lang.Thread.run(Thread.java:834)
Thanks
Joe

Greenplum 5.11 PXF Issue

https://gpdb.docs.pivotal.io/5110/pxf/client_instcfg.html

I followed the instructions on this page, but some of them are very unclear.

I used the Cloudera distribution of Hadoop and pretty much followed the instructions documented on this page.

But when I initialize PXF, it throws the error below:
ERROR: Can not determine Hadoop distribution, please install Hadoop clients.

Not sure what is wrong here.

PXF: Error while reading timestamp column from Hive parquet file

https://groups.google.com/a/greenplum.org/forum/?utm_medium=email&utm_source=footer#!msg/gpdb-users/iYNLT7_FSeM/c3XdFKesBAAJ

From the open source channel:

Hello,

I'm getting an error while trying to read a Hive table (Parquet format) containing a timestamp column (created via Spark) - PXF v5.11.2.
It seems like the joda-time dependency is missing.

Best,

Ales

Apr 01, 2020 4:19:46 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [PXF REST Service] in context with path [/pxf] threw exception [javax.servlet.ServletException: Servlet execution threw an exception] with root cause
java.lang.NoClassDefFoundError: jodd/datetime/JDateTime
        at org.apache.hadoop.hive.ql.io.parquet.timestamp.NanoTimeUtils.getTimestamp(NanoTimeUtils.java:93)
        at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$9$1.convert(ETypeConverter.java:205)
        at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$9$1.convert(ETypeConverter.java:196)
        at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$BinaryConverter.setDictionary(ETypeConverter.java:283)
        at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
        at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
        at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
        at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
        at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
        at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
        at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
        at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
        at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
        at org.greenplum.pxf.plugins.hive.HiveAccessor.getReader(HiveAccessor.java:201)
        at org.greenplum.pxf.plugins.hdfs.HdfsSplittableDataAccessor.getNextSplit(HdfsSplittableDataAccessor.java:119)
        at org.greenplum.pxf.plugins.hdfs.HdfsSplittableDataAccessor.openForRead(HdfsSplittableDataAccessor.java:88)
        at org.greenplum.pxf.plugins.hive.HiveAccessor.openForRead(HiveAccessor.java:145)
        at org.greenplum.pxf.service.bridge.ReadBridge.beginIteration(ReadBridge.java:72)
        at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:131)
        at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
        at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
        at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.greenplum.pxf.service.servlet.SecurityServletFilter.lambda$doFilter$0(SecurityServletFilter.java:146)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
        at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:158)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:452)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1195)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

Require image for ARM64 architecture

Hi Team,

I am trying to use the pivotaldata/gpdb6-ubuntu18.04-test, pivotaldata/ubuntu-gpdb-debian-dev, pivotaldata/gpdb6-ubuntu18.04-build, pivotaldata/gpdb6-centos6-build and pivotaldata/gpdb6-centos7-build images on the ARM64 architecture, but it seems they are not available for arm64.

Do you have any plans to release arm64 images on Docker Hub?

It would be very helpful if these images were released for the ARM64 platform. If required, I am happy to contribute. As a start, could you please point me to the source code for these images?

Using a PXF JDBC writable table to insert into DB2 loses data

Referenced from GPDB issues: greenplum-db/gpdb#7510

When using PXF and the DB2 JDBC driver to load millions of rows into DB2, most of the time data is lost silently; occasionally the following error is reported:
2019-04-12 10:17:31.0603 ERROR tomcat-http--17 org.greenplum.pxf.service.rest.WritableResource - Exception: totalWritten so far 59728 to /DB2TMDB.TP_ENTERPRISE/1553735636-0000097227_0
java.io.IOException: Invalid chunk header
        at org.apache.coyote.http11.filters.ChunkedInputFilter.throwIOException(ChunkedInputFilter.java:619)
        at org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:192)
        at org.apache.coyote.http11.AbstractInputBuffer.doRead(AbstractInputBuffer.java:341)
        at org.apache.coyote.Request.doRead(Request.java:431)
        at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)
        at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:390)
        at org.apache.catalina.connector.InputBuffer.readByte(InputBuffer.java:304)
        at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:106)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.greenplum.pxf.service.io.GPDBWritable.readPktLen(GPDBWritable.java:155)
        at org.greenplum.pxf.service.io.GPDBWritable.readFields(GPDBWritable.java:177)
        at org.greenplum.pxf.service.BridgeInputBuilder.makeInput(BridgeInputBuilder.java:54)
        at org.greenplum.pxf.service.WriteBridge.setNext(WriteBridge.java:74)
        at org.greenplum.pxf.service.rest.WritableResource.writeResponse(WritableResource.java:152)
        at org.greenplum.pxf.service.rest.WritableResource.stream(WritableResource.java:129)
        at sun.reflect.GeneratedMethodAccessor48.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.greenplum.pxf.service.servlet.SecurityServletFilter$1.run(SecurityServletFilter.java:99)
        at org.greenplum.pxf.service.servlet.SecurityServletFilter$1.run(SecurityServletFilter.java:93)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at org.greenplum.pxf.service.servlet.SecurityServletFilter.doFilter(SecurityServletFilter.java:118)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:957)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:620)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
2019-04-12 10:38:29.0952 ERROR tomcat-http--7 org.greenplum.pxf.service.rest.BridgeResource - Remote connection closed by GPDB
org.apache.catalina.connector.ClientAbortException: java.net.SocketException: Connection reset
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:407)
        at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
        at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:310)
        at org.apache.catalina.connector.OutputBuffer.writeByte(OutputBuffer.java:451)
        at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:77)
        at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:292)
        at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:139)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
        at org.greenplum.pxf.service.io.GPDBWritable.write(GPDBWritable.java:466)
        at org.greenplum.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:149)
        at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
        at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
        at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
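
For reference, the kind of writable external table described above is defined through the PXF JDBC profile. The following is a minimal sketch of such a setup; the DB2 URL, driver class, user, column list, and BATCH_SIZE value are hypothetical placeholders, and the batch size is shown only because the JDBC connector batches inserts, which is one thing worth checking when rows disappear without an error.

    # define a writable external table that pushes rows to DB2 via the PXF JDBC profile
    # (connection details and columns below are placeholders, not the reporter's real ones)
    psql -d mydb -c "
      CREATE WRITABLE EXTERNAL TABLE pxf_write_db2 (id int, name text)
      LOCATION ('pxf://DB2TMDB.TP_ENTERPRISE?PROFILE=JDBC&JDBC_DRIVER=com.ibm.db2.jcc.DB2Driver&DB_URL=jdbc:db2://db2host:50000/SAMPLE&USER=db2user&BATCH_SIZE=1000')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');"

    # load a large batch, then compare row counts on the Greenplum and DB2 sides
    psql -d mydb -c "INSERT INTO pxf_write_db2 SELECT g, 'row_' || g FROM generate_series(1, 1000000) g;"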

pxfwritable_export - ERROR: failed sending to remote component

Here are the steps I followed.

  1. Create a new database pgtest
    template1=# CREATE DATABASE pgtest;
    CREATE DATABASE
    template1=# \c pgtest

  2. Connect to the pgtest database, create a new table, and insert one row.
    pgtest=# CREATE TABLE public.test (id integer);
    CREATE TABLE
    pgtest=# INSERT INTO public.test values (1);
    INSERT 0 1

  3. Connect back to the template1 database, create a writable external table, and insert into it.
    pgtest=# \c template1
    template1=# CREATE WRITABLE EXTERNAL TABLE pxf_writeto_postgres(id int)
    template1-# LOCATION ( 'pxf://public.test?PROFILE=JDBC&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://localhost:15432/pgtest&USER=gpadmin') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');

    CREATE EXTERNAL TABLE
    template1=# INSERT INTO pxf_writeto_postgres VALUES (111);

    ERROR: failed sending to remote component '127.0.0.1:5888' (libchurl.c:620) (seg1 172.17.0.2:50001 pid=9738) (libchurl.c:620)

I'm on
PostgreSQL 9.4.20
Greenplum Database 6.0.0-beta.3
PXF version 5.3.2
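
The error indicates that the Greenplum segment could not reach the PXF agent at 127.0.0.1:5888, so a useful first check is whether the PXF service is up and listening on that port on every segment host. A minimal sketch follows; the exact subcommands available depend on the PXF version installed.

    # check the PXF service on the local host, or across the cluster if supported
    pxf status
    pxf cluster status

    # verify that something is listening on the default PXF port 5888
    ss -ltn | grep 5888 || netstat -ltn | grep 5888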
