
Waggle Dance

Bee waggle-dancing on a hive.

Start using

You can obtain Waggle Dance from Maven Central, where both TGZ and RPM packages are published.

Overview

Waggle Dance is a request routing Hive metastore proxy that allows tables to be concurrently accessed across multiple Hive deployments. It was created to tackle the appearance of dataset silos that arose as our large organization gradually migrated from monolithic on-premises clusters to cloud-based platforms.

In short, Waggle Dance provides a unified end point with which you can describe, query, and join tables that may exist in multiple distinct Hive deployments. Such deployments may exist in disparate regions, accounts, or clouds (security and network permitting). Dataset access is not limited to the Hive query engine, and should work with any Hive metastore enabled platform. We've been successfully using it with Spark for example.

We also use Waggle Dance to apply a simple security layer to cloud-based platforms such as Qubole, Databricks, and EMR, which currently provide no means to construct cross-platform authentication and authorization strategies. We therefore use a combination of Waggle Dance and network configuration to restrict writes and destructive Hive operations to specific user groups and applications.

We maintain a mapping of virtual database names to federated metastore instances. These virtual names are resolved by Waggle Dance during execution and requests are forwarded to the mapped metastore instance.

Virtual database name Mapped database name Mapped metastore URIs
mydb mydb thrift://host:port/

So when we do the following in a Hive CLI client connected to a Waggle Dance instance:

select *
from mydb.table;

We are actually performing the query against the thrift://host:port/ metastore. All metastore calls will be forwarded and data will be fetched and processed locally. This makes it possible to read and join data from different Hive clusters via a single Hive CLI.

System architecture

Waggle Dance system diagram.

Install

Waggle Dance is intended to run as a constantly available service. It should be installed on a machine that is reachable from wherever you want to query it and that also has access to the Hive metastore service(s) it federates. Waggle Dance is available as an RPM or a TGZ package; installation steps for both are covered below.

TGZ version

The TGZ package provides a "vanilla" version of Waggle Dance that is easy to get started with but will require some additional scaffolding in order to turn it into a fully-fledged service.

Download the TGZ from Maven Central and then uncompress the file by executing:

tar -xzf waggle-dance-<version>-bin.tgz

Although it's not necessary, we recommend exporting the environment variable WAGGLE_DANCE_HOME, setting its value to the directory where you extracted the package:

export WAGGLE_DANCE_HOME=/<foo>/<bar>/waggle-dance-<version>

Refer to the configuration section below on what is needed to customise the configuration files before continuing.

Running on the command line

To run Waggle Dance execute:

$WAGGLE_DANCE_HOME/bin/waggle-dance.sh --server-config=$WAGGLE_DANCE_HOME/conf/waggle-dance-server.yml --federation-config=$WAGGLE_DANCE_HOME/conf/waggle-dance-federation.yml

Log messages will be output to the standard output by default.

RPM version

The RPM package provides a fully-fledged service version of Waggle Dance.

Download the RPM from Maven Central and install it using your distribution's packaging tool, e.g. yum:

sudo yum install <waggle-dance-rpm-file>

This will install Waggle Dance into /opt/waggle-dance (this location is referred to as $WAGGLE_DANCE_HOME in this documentation). It will also create a log file output folder in /var/log/waggle-dance and register Waggle Dance as an init.d service.

Refer to the configuration section below on what is needed to customise the configuration files before continuing.

Running as a service

Once configured, the service needs to be started:

sudo service waggle-dance start

Currently, any changes to the configuration files require restarting the service in order to take effect (the exception being the log4j2.xml logging config file, changes to which are picked up while running):

sudo service waggle-dance restart

Log messages can be found in /var/log/waggle-dance/waggle-dance.log.

Configuration

In order to start using Waggle Dance it must first be configured for your environment. The simplest way to do this is to copy and then modify the template configuration files that are provided by the Waggle Dance package, i.e.:

cp $WAGGLE_DANCE_HOME/conf/waggle-dance-server.yml.template $WAGGLE_DANCE_HOME/conf/waggle-dance-server.yml
cp $WAGGLE_DANCE_HOME/conf/waggle-dance-federation.yml.template $WAGGLE_DANCE_HOME/conf/waggle-dance-federation.yml

This sets up the default YAML configuration files which need to be customised for your use case. Now edit the property remote-meta-store-uris in $WAGGLE_DANCE_HOME/conf/waggle-dance-federation.yml so that it contains the URI(s) of the metastore(s) you want to federate. The sections below describe the available configuration settings in further detail; use them to customise the rest of the values in these files accordingly.

Server

Server config is by default located in $WAGGLE_DANCE_HOME/conf/waggle-dance-server.yml.

The table below describes all the available configuration values for Waggle Dance server:

Property Required Description
port No Port on which Waggle Dance listens. Default is 0xBEE5 (48869).
verbose No Log detailed trace. Default is false
disconnect-connection-delay No Idle metastore connection timeout. Default is 5
disconnect-time-unit No Idle metastore connection timeout units. Default is MINUTES
database-resolution No Controls what type of database resolution to use. See the Database Resolution section. Default is MANUAL.
status-polling-delay No Delay between checks of metastore availability; any status change is propagated to long-running connections. Default is 5 (every 5 minutes).
status-polling-delay-time-unit No Time unit of status-polling-delay. Default is MINUTES.
configuration-properties No Map of Hive properties that will be added to the HiveConf used when creating the Thrift clients (they will be shared among all the clients).
queryFunctionsAcrossAllMetastores No Controls if the Thrift getAllFunctions should be fired to all configured metastores or only the primary metastore. The advice is to set this to false. Executing getAllFunctions can have an unwanted performance impact when a metastore is slow to respond. The function call is typically only called when a client is initialized and is largely irrelevant. Default is true (to be backward compatible)
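Putting some of these properties together, a minimal waggle-dance-server.yml might look like the following sketch (all values are illustrative, not recommendations; the configuration-properties entry shows an arbitrary Hive property):

```yaml
# Illustrative server configuration; values are examples only.
port: 48869                              # the default, 0xBEE5
verbose: false
database-resolution: PREFIXED
status-polling-delay: 5
status-polling-delay-time-unit: MINUTES
queryFunctionsAcrossAllMetastores: false # advised; avoids a slow getAllFunctions fan-out
configuration-properties:
  hive.metastore.client.socket.timeout: 300
```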

Extensions (for instance Rate Limiting) are described here: waggle-dance-extensions/README.md

Federation

Federation config is by default located in: $WAGGLE_DANCE_HOME/conf/waggle-dance-federation.yml.

Example:

primary-meta-store:                                     # Primary metastore
  access-control-type: READ_AND_WRITE_AND_CREATE_ON_DATABASE_WHITELIST
  name: primary                                         # unique name to identify this metastore
  remote-meta-store-uris: thrift://127.0.0.1:9083
  writable-database-white-list:
  - my_writable_db1
  - my_writable_db2
  - user_db_.*
  - ...
federated-meta-stores:                                  # List of read only metastores to federate
- remote-meta-store-uris: thrift://10.0.0.1:9083
  name: secondary
  metastore-tunnel:
    route: ec2-user@bastion-host -> hadoop@emr-master
    private-keys: /home/user/.ssh/bastion-key-pair.pem,/home/user/.ssh/emr-key-pair.pem
    known-hosts: /home/user/.ssh/known_hosts
  hive-metastore-filter-hook: filter.hook.class
  mapped-databases:
  - prod_db1
  - prod_db2
  - dev_group_1.*
  mapped-tables:
  - database: prod_db1
    mapped-tables:
    - tbl1
    - tbl_.*
  - database: prod_db2
    mapped-tables:
    - tbl2
- ...

The table below describes all the available configuration values for Waggle Dance federations:

Property Required Description
primary-meta-store No Primary MetaStore config. Can be empty but it is advised to configure it.
primary-meta-store.remote-meta-store-uris Yes Thrift URIs of the primary metastore.
primary-meta-store.name Yes Name that uniquely identifies this metastore. Used internally. Cannot be empty.
primary-meta-store.database-prefix No Prefix used to access the primary metastore and differentiate databases in it from databases in another metastore. The default prefix (i.e. if this value isn't explicitly set) is empty string.
primary-meta-store.access-control-type No Sets how the client access controls should be handled. Default is READ_ONLY Other options READ_AND_WRITE_AND_CREATE, READ_AND_WRITE_ON_DATABASE_WHITELIST and READ_AND_WRITE_AND_CREATE_ON_DATABASE_WHITELIST see Access Control section below.
primary-meta-store.writable-database-white-list No White-list of databases used to verify write access used in conjunction with primary-meta-store.access-control-type. The list of databases should be listed without any primary-meta-store.database-prefix. This property supports both full database names and (case-insensitive) Java RegEx patterns.
primary-meta-store.metastore-tunnel No See metastore tunnel configuration values below.
primary-meta-store.latency No Indicates the acceptable slowness of the metastore in milliseconds for increasing the default connection timeout. Default latency is 0 and should be changed if the metastore is particularly slow. If you get an error saying that results were omitted because the metastore was slow, consider changing the latency to a higher number.
primary-meta-store.mapped-databases No List of databases to federate from the primary metastore; all other databases will be ignored. This property supports both full database names and Java RegEx patterns (both being case-insensitive). By default, all databases from the metastore are federated.
primary-meta-store.mapped-tables No List of mappings from databases to tables to federate from the primary metastore, similar to mapped-databases. By default, all tables are available. See mapped-tables configuration below.
primary-meta-store.hive-metastore-filter-hook No Name of the class which implements the MetaStoreFilterHook interface from Hive. This allows a metastore filter hook to be applied to the corresponding Hive metastore calls. Can be configured with the configuration-properties specified in the waggle-dance-server.yml configuration. They will be added in the HiveConf object that is given to the constructor of the MetaStoreFilterHook implementation you provide.
primary-meta-store.database-name-mapping No BiDirectional Map of database names and mapped name, where key=<database name as known in the primary metastore> and value=<name that should be shown to a client>. See the Database Name Mapping section.
primary-meta-store.glue-config No Can be used instead of remote-meta-store-uris to federate to an AWS Glue Catalog (AWS Glue). See the Federate to AWS Glue Catalog section.
primary-meta-store.read-only-remote-meta-store-uris No Can be used to configure an extra read-only endpoint for the primary Metastore. This is an optimization if your environment runs separate Metastore endpoints and traffic needs to be diverted efficiently. Waggle Dance will direct traffic to the read-write or read-only endpoints based on the call being done. For instance get_table will be a read-only call but alter_table will be forwarded to the read-write Metastore.
federated-meta-stores No Possibly empty list of read-only federated metastores.
federated-meta-stores[n].remote-meta-store-uris Yes Thrift URIs of the federated read-only metastore.
federated-meta-stores[n].name Yes Name that uniquely identifies this metastore. Used internally. Cannot be empty.
federated-meta-stores[n].database-prefix No Prefix used to access this particular metastore and differentiate databases in it from databases in another metastore. Typically used if databases have the same name across metastores but federated access to them is still needed. The default prefix (i.e. if this value isn't explicitly set) is {federated-meta-stores[n].name} lowercased and postfixed with an underscore. For example if the metastore name was configured as "waggle" and no database prefix was provided but PREFIXED database resolution was used then the value of database-prefix would be "waggle_".
federated-meta-stores[n].metastore-tunnel No See metastore tunnel configuration values below.
federated-meta-stores[n].latency No Indicates the acceptable slowness of the metastore in milliseconds for increasing the default connection timeout. Default latency is 0 and should be changed if the metastore is particularly slow. If you get an error saying that results were omitted because the metastore was slow, consider changing the latency to a higher number.
federated-meta-stores[n].mapped-databases No List of databases to federate from this federated metastore; all other databases will be ignored. This property supports both full database names and Java RegEx patterns (both being case-insensitive). By default, all databases from the metastore are federated.
federated-meta-stores[n].mapped-tables No List of mappings from databases to tables to federate from this federated metastore, similar to mapped-databases. By default, all tables are available. See mapped-tables configuration below.
federated-meta-stores[n].hive-metastore-filter-hook No Name of the class which implements the MetaStoreFilterHook interface from Hive. This allows a metastore filter hook to be applied to the corresponding Hive metastore calls. Can be configured with the configuration-properties specified in the waggle-dance-server.yml configuration. They will be added in the HiveConf object that is given to the constructor of the MetaStoreFilterHook implementation you provide.
federated-meta-stores[n].database-name-mapping No BiDirectional Map of database names and mapped names where key=<database name as known in the federated metastore> and value=<name that should be shown to a client>. See the Database Name Mapping section.
federated-meta-stores[n].writable-database-white-list No White-list of databases used to verify write access used in conjunction with federated-meta-stores[n].access-control-type. The list of databases should be listed without a federated-meta-stores[n].database-prefix. This property supports both full database names and (case-insensitive) Java RegEx patterns.
federated-meta-stores[n].glue-config No Can be used instead of remote-meta-store-uris to federate to an AWS Glue Catalog (AWS Glue). See the Federate to AWS Glue Catalog section.

Metastore tunnel

The table below describes the metastore tunnel configuration values:

Property Required Description
*.metastore-tunnel.localhost No The address on which to bind the local end of the tunnel. Default is 'localhost'.
*.metastore-tunnel.port No The port on which SSH runs on the remote node. Default is 22.
*.metastore-tunnel.route No An SSH tunnel can be used to connect to federated metastores. The tunnel may consist of one or more hops which must be declared in this property. See Configuring an SSH tunnel for details.
*.metastore-tunnel.known-hosts No Path to a known hosts file.
*.metastore-tunnel.private-keys No A comma-separated list of paths to any SSH keys required in order to set up the SSH tunnel.
*.metastore-tunnel.timeout No The SSH session timeout in milliseconds, 0 means no timeout. Default is 60000 milliseconds, i.e. 1 minute.
*.metastore-tunnel.strict-host-key-checking No Whether the SSH tunnel should be created with strict host key checking. Can be set to yes or no. The default is yes.

Mapped tables

The table below describes the mapped-tables configuration. For each entry in the list, a database name and the corresponding list of table names/patterns must be mentioned.

Property Required Description
*.mapped-tables[n].database Yes Name of the database which contains the tables to be mapped.
*.mapped-tables[n].mapped-tables Yes List of tables allowed for the database specified in the field above. This property supports both full table names and Java RegEx patterns (both being case-insensitive).

Access Control

A metastore's access control configuration is controlled by the access-control-type property.

The available values of this property are described below.

Property Description
READ_ONLY Read-only access; creation of databases and update/alter or other data-manipulation requests to the metastore are not allowed.
READ_AND_WRITE_AND_CREATE Reads are allowed, writes are allowed on all databases, creating new databases is allowed.
READ_AND_WRITE_AND_CREATE_ON_DATABASE_WHITELIST Reads are allowed, writes are allowed on database names listed in the primary-meta-store.writable-database-white-list property, creating new databases is allowed and they are added to the white-list automatically.
READ_AND_WRITE_ON_DATABASE_WHITELIST Reads are allowed, writes are allowed on database names listed in the primary-meta-store.writable-database-white-list and federated-meta-stores[n].writable-database-white-list properties, creating new databases is not allowed.

Primary metastores can be configured with any of the described access-control-types, whereas federated metastores may only be configured with READ_ONLY or READ_AND_WRITE_ON_DATABASE_WHITELIST.
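As a sketch, a federated metastore entry granting whitelist-based write access might look like this (the name, URI, and database names are placeholders):

```yaml
federated-meta-stores:
- name: secondary
  remote-meta-store-uris: thrift://10.0.0.1:9083
  access-control-type: READ_AND_WRITE_ON_DATABASE_WHITELIST
  writable-database-white-list:   # write access only to these databases
  - scratch_db
  - user_db_.*                    # Java RegEx patterns are supported
```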

There are a number of write operations in the metastore whose requests do not contain database/table name context, and which therefore cannot be routed to federated metastore instances configured with a writable access-control level.

These include:

  • Create database
  • Function handling: create/delete/get functions
  • Type handling: create/delete/get types
  • Keys foreign/primary
  • Locks
  • Transactions / compact
  • Security management: roles, principals, grant/revoke
  • Delegation tokens
  • Notifications
  • Config management
  • File metadata / cache

This is not an issue for general operation, but may be a problem if you want to use certain specific Hive features. At this time these features cannot be supported in a writable federation model.

Federation configuration storage

Waggle Dance reads and writes federation configuration from and to a YAML file - refer to the section federation for details.

The following properties are configured in the server configuration file (waggle-dance-server.yml) and control the behaviour of the YAML federation storage:

yaml-storage:
  overwrite-config-on-shutdown: true
Property Required Description
overwrite-config-on-shutdown No Controls whether the federations configuration must be overwritten when the server is stopped. Setting this to false will cause any federations dynamically added at runtime to be lost when the server is stopped. The same applies to databases created at runtime when database-resolution is set to MANUAL. Default is true.

Federate to AWS Glue Catalog

Waggle Dance supports federation to the AWS Glue Catalog. The federation is effectively read-only: write (Create/Alter/Drop) operations are not well supported because the Glue APIs do not expose all Hive metastore functions (for instance locks and transactions), so clients might get exceptions when using certain operations (this can depend on the client, e.g. Hive, Spark, etc.). Some research has been done into allowing write operations; it is not impossible with a bit more work, but it is out of scope at the moment. The glue-config configuration should be used if federation to Glue is needed.

Property Required Description
glue-account-id Yes (if glue-config is used) The AWS account number.
glue-endpoint Yes (if glue-config is used) The AWS Glue endpoint, for example: glue.us-east-1.amazonaws.com. The value is the same for all AWS accounts per region.

Example:

glue-config:
  glue-account-id: 123456789012
  glue-endpoint: glue.us-east-1.amazonaws.com

As with Hive federation, IAM permissions need to be set up to read the underlying data. IAM permissions are not set up by this code; they are usually set up by the Terraform code that deploys Waggle Dance, such as [apiary-federation](https://github.com/ExpediaGroup/apiary-federation).

If federating across AWS accounts, the correct [cross-account federation permissions](https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html) need to be set up as well.
The policy giving access to the role running Waggle Dance will need at least these IAM Glue actions:

actions = [
  "glue:GetDatabase",
  "glue:GetDatabases",
  "glue:GetTable",
  "glue:GetTables",
  "glue:GetTableVersions",
  "glue:GetPartition",
  "glue:GetPartitions",
  "glue:BatchGetPartition",
  "glue:GetUserDefinedFunction",
  "glue:GetUserDefinedFunctions"
]

Configuring an SSH tunnel

Each federation in Waggle Dance can be configured to use an SSH tunnel to access a remote Hive metastore in cases where network restrictions prevent a direct connection from the machine running Waggle Dance to the machine running the Thrift Hive metastore service. An SSH tunnel consists of one or more hops or jump-boxes. The connection between each pair of nodes requires a user (which, if not specified, defaults to the current user) and a private key to establish the SSH connection.

As outlined above, the metastore-tunnel property is used to configure Waggle Dance to use a tunnel. The tunnel route expression is described by the following EBNF:

path = path part, {"->", path part}
path part = {user, "@"}, hostname
user = ? user name ?
hostname = ? hostname ?

For example, if the Hive metastore runs on the host hive-server-box which can only be reached first via bastion-host and then jump-box then the SSH tunnel route expression will be bastion-host -> jump-box -> hive-server-box. If bastion-host is only accessible by user ec2-user, jump-box by user user-a and hive-server-box by user hadoop then the expression above becomes ec2-user@bastion-host -> user-a@jump-box -> hadoop@hive-server-box.

Once the tunnel is established, Waggle Dance will set up port forwarding from the local machine specified in metastore-tunnel.localhost to the remote machine specified in remote-meta-store-uris. The last node in the tunnel expression doesn't need to be the Thrift server; the only requirement is that this last node must be able to communicate with the Thrift service. Sometimes this is not possible due to firewall restrictions, in which case the last node in the tunnel must be the machine running the Thrift service itself.

All the machines in the tunnel expression can be included in the known_hosts file, in which case the keys required to access each box should be set in metastore-tunnel.private-keys. For example, if bastion-host is authenticated with bastion.pem and both jump-box and hive-server-box are authenticated with emr.pem then the property must be set as metastore-tunnel.private-keys=<path-to-ssh-keys>/bastion.pem,<path-to-ssh-keys>/emr.pem.

If some machines in the tunnel expression are not included in the known_hosts file then metastore-tunnel.strict-host-key-checking should be set to no.

To add the fingerprint of remote-box to the known_hosts file the following command can be used:

ssh-keyscan -t rsa remote-box >> .ssh/known_hosts

The following configuration snippets show a few examples of valid tunnel expressions.

Simple tunnel to metastore server
    remote-meta-store-uris: thrift://metastore.domain:9083
    metastore-tunnel:
      route: user@metastore.domain
      private-keys: /home/user/.ssh/user-key-pair.pem
      known-hosts: /home/user/.ssh/known_hosts
Simple tunnel to cluster node with current user
    remote-meta-store-uris: thrift://metastore.domain:9083
    metastore-tunnel:
      route: cluster-node.domain
      private-keys: /home/run-as-user/.ssh/key-pair.pem
      known-hosts: /home/run-as-user/.ssh/known_hosts
Bastion host to cluster node with different users and key-pairs
    remote-meta-store-uris: thrift://metastore.domain:9083
    metastore-tunnel:
      route: bastionuser@bastion-host.domain -> user@cluster-node.domain
      private-keys: /home/run-as-user/.ssh/bastionuser-key-pair.pem, /home/run-as-user/.ssh/user-key-pair.pem
      known-hosts: /home/run-as-user/.ssh/known_hosts
Bastion host to cluster node with same user
    remote-meta-store-uris: thrift://metastore.domain:9083
    metastore-tunnel:
      route: user@bastion-host.domain -> user@cluster-node.domain
      private-keys: /home/user/.ssh/user-key-pair.pem
      known-hosts: /home/user/.ssh/known_hosts
Bastion host to cluster node with current user
    remote-meta-store-uris: thrift://metastore.domain:9083
    metastore-tunnel:
      route: bastion-host.domain -> cluster-node.domain
      private-keys: /home/run-as-user/.ssh/run-as-user-key-pair.pem
      known-hosts: /home/run-as-user/.ssh/known_hosts
Bastion host to metastore via jump-box with different users and key-pairs
    remote-meta-store-uris: thrift://metastore.domain:9083
    metastore-tunnel:
      route: bastionuser@bastion-host.domain -> user@jump-box.domain -> hive@metastore.domain
      private-keys: /home/run-as-user/.ssh/bastionuser-key-pair.pem, /home/run-as-user/.ssh/user-key-pair.pem, /home/run-as-user/.ssh/hive-key-pair.pem
      known-hosts: /home/run-as-user/.ssh/known_hosts

Metrics

Waggle Dance exposes a set of metrics that can be accessed on the /metrics end-point. These include a few standard JVM and Spring metrics as well as per-federation metrics such as the per-metastore number of calls and invocation duration. If a Graphite server is provided in the server configuration then all metrics are exposed both on the endpoint and in Graphite.

The following snippet shows a typical Graphite configuration:

graphite:
  port: 2003
  host: graphite.domain
  prefix: aws.myservice.myapplication
  poll-interval: 1000
  poll-interval-time-unit: MILLISECONDS
Property Required Description
graphite.port No Port where Graphite listens for metrics. Defaults to 2003.
graphite.host No Hostname of the Graphite server. If not specified then no metrics will be sent to Graphite.
graphite.prefix No Graphite path prefix.
graphite.poll-interval No Amount of time between Graphite polls. Defaults to 5000.
graphite.poll-interval-time-unit No Time unit of graphite.poll-interval (a java.util.concurrent.TimeUnit value). Defaults to MILLISECONDS.

Prometheus can also be used to gather metrics. This can be done by enabling the Prometheus endpoint in the configuration:

management.endpoints.web.exposure.include: health,info,prometheus

If this config is added, all endpoints that need to be exposed must be specified explicitly. The Prometheus endpoint can then be accessed at /actuator/prometheus.

Database Resolution

Waggle Dance presents a view over multiple (federated) Hive metastores and may therefore encounter the same database name in different metastores. Waggle Dance has two ways of resolving this situation; the choice can be configured in waggle-dance-server.yml via the property database-resolution. This property has two possible values, MANUAL and PREFIXED, which are explained in more detail below.

Database resolution: MANUAL

Waggle Dance can be configured to use a static list of databases via the waggle-dance-federation.yml properties federated-meta-stores[n].mapped-databases and primary-meta-store.mapped-databases. It is up to the user to make sure there are no conflicting database names across the primary and federated metastores. If Waggle Dance encounters a duplicate database it will throw an error and won't start. Example configuration:

waggle-dance-server.yml:

database-resolution: MANUAL

waggle-dance-federation.yml:

primary-meta-store:
  name: primary
  remote-meta-store-uris: thrift://primaryLocalMetastore:9083
federated-meta-stores:
  - name: waggle_prod
    remote-meta-store-uris: thrift://federatedProdMetastore:9083
    mapped-databases:
    - etldata
    - mydata

Using this example, Waggle Dance can be used to access all databases in the primary metastore and etldata/mydata from the federated metastore. The databases listed must not be present in the primary metastore, otherwise Waggle Dance will throw an error on start-up. If you have multiple federated metastores listed, a database can only be mapped to one metastore. Following the example configuration, a query over a table in the etldata database will be resolved to the federated metastore. Any database that is not mapped in the config is assumed to be in the primary metastore.

All non-mapped databases of a federated metastore are ignored and are not accessible.

Adding a mapped database in the configuration requires a restart of the Waggle Dance service in order to detect the new database name and to ensure that there are no clashes.

Database resolution: PREFIXED

Waggle Dance can be configured to use a prefix when resolving the names of databases in its primary or federated metastores. In this mode all queries that are issued to Waggle Dance need to be written to use fully qualified database names that start with the prefixes configured here. In the example below Waggle Dance is configured with a federated metastore with the prefix waggle_prod_. Because of this it will inspect the database names in all requests, and if any start with waggle_prod_ it will route the request to the configured matching metastore. The prefix will be removed for those requests as the underlying metastore knows nothing of the prefixes. So, the query: select * from waggle_prod_etldata.my_table will effectively be translated into this query: select * from etldata.my_table on the federated metastore. If a database is encountered that is not prefixed then the primary metastore is used to resolve the database name.

waggle-dance-server.yml:

database-resolution: PREFIXED

waggle-dance-federation.yml:

primary-meta-store:
  name: primary
  remote-meta-store-uris: thrift://primaryLocalMetastore:9083
federated-meta-stores:
  - name: federated
    database-prefix: waggle_prod_
    remote-meta-store-uris: thrift://federatedProdMetastore:9083

Note: When choosing a prefix ensure that it does not match the start of any existing database names in any of the configured metastores. To illustrate the problem this would cause, imagine you have a database in the primary metastore named "my_database" and you configure the federated metastore with the prefix my_. Waggle Dance will register the prefix and any requests for a database starting with my_ will be routed to the federated metastore even if they were intended to go to the primary metastore.

In PREFIXED mode any databases that are created while Waggle Dance is running are automatically visible and need to adhere to the naming rules described above (e.g. not clash with a prefix). Alternatively, Waggle Dance can be configured to use a static list of unprefixed databases via the waggle-dance-federation.yml properties federated-meta-stores[n].mapped-databases and primary-meta-store.mapped-databases. Example configuration:

waggle-dance-server.yml:

database-resolution: PREFIXED

waggle-dance-federation.yml:

primary-meta-store:
  name: primary
  remote-meta-store-uris: thrift://primaryLocalMetastore:9083
federated-meta-stores:
  - name: federated
    database-prefix: waggle_prod_
    remote-meta-store-uris: thrift://federatedProdMetastore:9083
    mapped-databases:
    - etldata

In this scenario, like in the previous example, the query: select * from waggle_prod_etldata.my_table will effectively be this query: select * from etldata.my_table on the federated metastore. Any other databases which exist in the metastore named federated won't be visible to clients.

Sample run through

Assumes database resolution is done by adding prefixes. If database resolution is done manually via a list of configured databases, the prefixes in this example can be omitted.

Connect to Waggle Dance:
hive --hiveconf hive.metastore.uris=thrift://localhost:48869
Show databases in all your metastores:
hive> show databases;
OK
default
somedata
waggle_aws_dw_default
waggle_aws_dw_mydata
waggle_aws_dw_moredata
waggle_aws_dw_extredata
Time taken: 0.827 seconds, Fetched: 6 row(s)
Join two tables in different metastores:
select h.data_id, h.entity_id, p.entity_id, p.hotel_brand_name
  from waggle_aws_dw_mydata.some_data h
  join somedata.other_table p
    on h.entity_id = p.entity_id
 where h.date = '2016-05-13'
   and h.hour = 1
;

Database Name Mapping

NOTE: mapping names adds an extra layer of abstraction and we advise using this only as a temporary migration solution. It becomes harder to debug where a virtual (remapped) table actually comes from.

Waggle Dance allows one to refer to databases by names that differ from those defined in the Hive metastore, via a database-name-mapping configuration. This feature can be useful when migrating data from existing databases into different environments. To clarify how this feature could be used, below is an example use case: as part of a data migration we have decided that we want to store all Hive tables related to hotel bookings in a database called booking. However, we have legacy data lakes that contain booking-related tables stored in databases with other names, e.g. 'datawarehouse'. To ease migration we want to expose the 'datawarehouse' database via both the original name and the booking name. This way consumers can start using the new name while producers migrate their scripts from the old name, or vice versa. When the migration is done, the mapping can be removed and the database renamed.

So in this example we have a Data lake X which federates tables from another Data lake Y. The desired end result is to show a 'datawarehouse' database and a 'booking' database in X which is proxied to the same 'datawarehouse' database in Y.

X: show databases;

datawarehouse (proxies to y - datawarehouse)
booking (proxies to y - datawarehouse)

Y: show databases;

datawarehouse

To achieve a unified view of all the booking tables in the different databases (without actually renaming them in Hive) we can configure Waggle Dance in X to map from the old to the new names like so:

federated-meta-stores:
  - remote-meta-store-uris: thrift://10.0.0.1:9083
    name: y
    mapped-databases:
    - datawarehouse
    database-name-mapping:
      datawarehouse: booking

Note: Both the 'datawarehouse' name and the mapped name 'booking' are shown in X; the mapping adds an additional virtual database pointing at the same remote database. You can only map one extra name, and you cannot map different databases to the same name. This is not allowed and will fail to load (invalid YAML):

database-name-mapping:
  datawarehouse: booking
  datawarehouse: booking2

This is also not allowed and will fail to load (invalid mapping):

database-name-mapping:
  datawarehouse: booking
  datawarehouse2: booking

If the optional mapped-databases setting is used, that filter is applied first and the renaming afterwards.
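Because the filter runs first, a database-name-mapping entry for a database that is not in mapped-databases has no effect. A sketch, where 'sales' is a hypothetical database name:

```yaml
federated-meta-stores:
  - remote-meta-store-uris: thrift://10.0.0.1:9083
    name: y
    mapped-databases:          # applied first: only 'datawarehouse' is exposed
    - datawarehouse
    database-name-mapping:
      datawarehouse: booking   # effective
      sales: sales_new         # no effect: 'sales' was filtered out above
```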

Endpoints

Being a Spring Boot Application, all standard actuator endpoints are supported.

e.g. Healthcheck Endpoint: http://localhost:18000/actuator/health

In addition to these Spring endpoints Waggle Dance exposes some custom endpoints which provide more detailed information. The URLs of these are logged when Waggle Dance starts up. The most notable is: http://host:18000/api/admin/federations, which returns information about the availability of the configured metastores (it can be used for troubleshooting, but it is not recommended for use as a health check).

Logging

Waggle Dance uses Log4j 2 for logging. In order to use a custom Log4j 2 XML file, the path to the logging configuration file has to be added to the server configuration YAML file:

logging:
    config: file:/home/foo/waggle-dance/conf/log4j2.xml

This only works when Waggle Dance is obtained from the compressed archive (.tar.gz) file. If the RPM version is being used, the default log file path is hardcoded. Refer to the RPM version section for more details.

Notes

  • Only the metadata communications are rerouted.
  • Access to underlying table data is still directly to the locations encoded in the metadata.
  • Users of Waggle Dance must still have the relevant authority to access the underlying table data.
  • All data processing occurs in the client cluster, not the external clusters. Data is simply pulled into the client cluster that connects to Waggle Dance.
  • Metadata operations are routed. Write and destructive operations can be performed on the local metastore. Federated metastores are read-only by default but can be configured to allow write operations via the Access Controls configuration.
  • When using Spark to read tables with a large number of partitions it may be necessary to set spark.sql.hive.metastorePartitionPruning=true to enable partition pruning. If this property is false, Spark will try to fetch all the partitions of the tables in the query, which may result in an OutOfMemoryError in Waggle Dance.
  • If a configuration file is updated and the update disappears after server shutdown, yaml-storage.overwrite-config-on-shutdown should be set to false in the federation configuration file (refer to the federation configuration storage section).
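For example, the Spark property mentioned in the notes above can be set in spark-defaults.conf (or passed with --conf on the spark-submit / spark-shell command line):

```
# spark-defaults.conf
spark.sql.hive.metastorePartitionPruning  true
```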

Limitations

Hive Views and prefixes

Support for views utilises Hive's query parsing utilities where in some cases the conversion of the original view query may fail. A Hive Table contains two properties that contain the view query: viewExpandedText and viewOriginalText. You can see the values of these by running the desc extended <table> statement. In order for a prefixed federated view to behave correctly Waggle Dance needs to manipulate the queries in those properties and prefix any databases it finds. It does so by calling the org.apache.hadoop.hive.ql.parse.ParseUtils class to parse the query where we discovered (and raised) a Hive issue (HIVE-19896). To work around this issue we store the parseable viewExpandedText in the viewOriginalText property if the viewOriginalText is not parseable. If the viewExpandedText is also not parseable we keep the untransformed original values. This might result in a slight discrepancy when using a view in a federated manner. If you run into this limitation please raise an issue on the Waggle Dance Mailing List.

Hive UDFs and prefixes

Hive UDFs are registered with a database. There are currently two limitations in how Waggle Dance deals with them:

  • show functions only returns UDFs that are registered in the primary metastore.
  • UDFs used in a view are not prefixed with their corresponding metastore. A workaround is to register the UDF from the federated metastore in your own (primary) metastore.

Due to the distributed nature of Waggle Dance, using UDFs is not straightforward. If you would like a UDF to be used from a federated metastore we recommend registering the code implementing it in a distributed file or object store that is accessible from any client (for example, you could store the UDF's jar file on S3). See creating permanent functions in the Hive documentation.

Hive metastore filter hook

You can configure a Hive filter hook via hive-metastore-filter-hook: filter.hook.class. This class needs to be on the classpath and can be provided in an external jar; if so, the command used to run Waggle Dance needs to be updated to ensure correct class loading, which can be done by adding -Dloader.path=<path_to_jar>. Note: the database calls getDatabases and getAllDatabases, as well as getTableMeta, do not currently support having the provided filter applied, so their results will not be modified by the filter.
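As a sketch, the configuration might look like the following; com.company.MyMetaStoreFilterHook is a placeholder class name, not a real implementation:

```yaml
# waggle-dance-server.yml (hypothetical filter hook class)
hive-metastore-filter-hook: com.company.MyMetaStoreFilterHook
```

The jar containing the class would then be added to the classpath by appending e.g. -Dloader.path=/opt/waggle-dance/lib/my-filter-hook.jar (a placeholder path) to the command used to start Waggle Dance.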

Building Waggle Dance

Prerequisites

In order to build Waggle Dance, AWS Glue libraries will need to be installed locally. Please follow this installation guide to install those libraries.

Building

Waggle Dance can be built from source using Maven:

mvn clean package

This will produce a .tgz in the waggle-dance module (under waggle-dance/waggle-dance/target/) and an RPM in the waggle-dance-rpm module (under waggle-dance/waggle-dance-rpm/target/rpm/waggle-dance-rpm/RPMS/noarch/). The RPM is built using the Maven RPM plugin, which requires the 'rpm' program to be available on the command line. On OSX this can be accomplished by using the Brew package manager like so: brew install rpm.

Contact

Mailing List

If you would like to ask any questions about or discuss Waggle Dance please join our mailing list at

https://groups.google.com/forum/#!forum/waggle-dance-user

Credits

Created by Elliot West, Patrick Duin & Daniel del Castillo with thanks to: Adrian Woodhead, Dave Maughan and James Grant.

The Waggle Dance logo uses the Beetype Filled font by Adrian Candela under the Creative Commons Attribution License (CC BY).

Legal

This project is available under the Apache 2.0 License.

Copyright 2016-2024 Expedia, Inc.

waggle-dance's People

Contributors

abhimanyugupta07, ananamj, andreeapad, barnharts4, chergey, cmathiesen, ddcprg, dependabot[bot], dh20, dhrubajyotisadhu, eg-oss-ci, elvis0607, flaming-archer, javsanbel2, jaygreeeen, massdosage, nvitucci, patduin, pradeepbhadani, rickart, s-hcom-ci, teabot, vedantchokshi, zzzzming95


waggle-dance's Issues

Improve YAML config validation error messaging

As a user of Waggle Dance
I'd like to get some meaningful error when the yml config has errors
So that I can debug it myself

The yml config is sort of fiddly at times; when, for instance, the federation-server.yml is not indented correctly, WD fails to start and, because logging isn't initialized, the user gets no feedback.
To reproduce, try to load this config (notice the extra indentation on the last two lines):

  database-prefix: ''
  name: local
  remote-meta-store-uris: thrift://localhost:9083
federated-meta-stores:
- remote-meta-store-uris: thrift://localhost:9083
    database-prefix: remote_
    name: remote

Acceptance Criteria:

  • A bad yml config file is rejected by WD with an error message (the more specific the better, but an indication of which file is wrong is already much better than the current situation)

Refactor duplicated SSH code

As a developer
I want code that is duplicated by code present in hcommon-ssh removed from WaggleDance
So that I have less code to maintain, can fix bugs in one place etc.

Acceptance Criteria
The following are candidates for refactoring, further investigation might reveal that it's not feasible so for all of the below we should either make the change or document why we decided not to:

  • Replace com.hotels.bdp.waggledance.api.model.MetastoreTunnel with com.hotels.hcommon.ssh.SshSettings
  • If the above is successful create a similar ticket to this for CircusTrain to make the same change for com.hotels.bdp.circustrain.api.conf.MetastoreTunnel
  • Replace com.hotels.bdp.waggledance.api.validation.constraint.TunnelRoute with com.hotels.hcommon.ssh.validation.constraint.TunnelRoute and
    com.hotels.bdp.waggledance.api.validation.validator.TunnelRouteValidator with com.hotels.hcommon.ssh.validation.validator.TunnelRouteValidator
  • If the above is successful create a similar ticket to this for CircusTrain to make the same change for com.hotels.bdp.circustrain.api.validation.constraints.TunnelRoute and com.hotels.bdp.circustrain.api.validation.constraintvalidators.TunnelRouteValidator)
  • Look at TunnelableFactorySupplier and how we're using WaggleDanceHiveConfVars, could we replace some of this with com.hotels.hcommon.ssh.SshSettings or com.hotels.hcommon.hive.metastore.client.tunnelling.TunnellingMetaStoreClientSupplierBuilder? See ExpediaGroup/circus-train@0272e36 for a similar change to CircusTrain that could possibly be applied here.
  • Check whether is any scope for refactoring/removing any code in com.hotels.bdp.waggledance.client.TunnelingMetaStoreClientFactory
  • Above code tested out in our development environment in AWS via a tunnel to ensure tunnels still work correctly.

Documentation for installation is confusing

Issues

  • Sets expectation that I can directly install from a TGZ download at https://github.com/HotelsDotCom/waggle-dance/releases but this is not true
  • Does not describe how to build from TGZ, moves straight on to configuration making the assumption that the *-bin.tgz has been summoned from somewhere.
  • RPM installation is not in an 'RPM' section, or even the Installation section, but is instead hidden away in the 'Running as a Service' section.
  • Describes building an RPM, but does not describe where the RPM is generated, so makes it quite hard to locate: waggle-dance-rpm/target/rpm/waggle-dance-rpm/RPMS/noarch/
  • Does not explicitly describe in which directory the service is installed to, forcing the user to context switch immediately after install: /opt/waggle-dance/
  • Paths in README italicised but should be monospaced.

I'd suggest documenting RPM end to end, then having a section on the differences for TGZ. You can signpost this at the start of the installation section.

FAILED: ParseException line 1:7 'CREATE' 'REMOTE' 'TABLE' in ddl statement

USING Waggle Dance 2.3.7 rpm

Connected to waggle dance

hive --hiveconf hive.metastore.uris=thrift://localhost:48869
It doesn't like the CREATE REMOTE TABLE statement itself. Has anyone faced a similar issue?

hive> CREATE REMOTE TABLE test
> CONNECTED TO third_db_hive.t3
> VIA 'org.apache.hadoop.hive.metastore.ThriftHiveMetastoreClientFactory'
> WITH TBLPROPERTIES (
> 'hive.metastore.uris' = 'thrift://xxxx.compute.amazonaws.com:9083'
> );
NoViableAltException(24@[846:1: ddlStatement : ( createDatabaseStatement | switchDatabaseStatement | dropDatabaseStatement | createTableStatement | dropTableStatement | truncateTableStatement | alterStatement | descStatement | showStatement | metastoreCheck | createViewStatement | createMaterializedViewStatement | dropViewStatement | dropMaterializedViewStatement | createFunctionStatement | createMacroStatement | createIndexStatement | dropIndexStatement | dropFunctionStatement | reloadFunctionStatement | dropMacroStatement | analyzeStatement | lockStatement | unlockStatement | lockDatabase | unlockDatabase | createRoleStatement | dropRoleStatement | ( grantPrivileges )=> grantPrivileges | ( revokePrivileges )=> revokePrivileges | showGrants | showRoleGrants | showRolePrincipals | showRoles | grantRole | revokeRole | setRole | showCurrentRole | abortTransactionStatement );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:3757)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2382)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 1:7 cannot recognize input near 'CREATE' 'REMOTE' 'TABLE' in ddl statement

What I need to do to fix it.

Thanks

enable READ_WRITE support for federated metastores

As a user of WD
I like to run DDL on federated metastores
So that I can run ETL against the primary metastore and update the federated central datalake.

Acceptance Criteria:
Ability to configure the access-control-type option for federated metastores with at least the following possible values: READ_ONLY, READ_WRITE.

Enable custom ThriftHiveMetastore.IFace implementations

Enable configurable pluggable implementations of the ThriftHiveMetastore.IFace interface on a per (primary or federated) metastore basis.

Example of how the configuration could look:

primary-meta-store:
  access-control-type: READ_ONLY
  database-prefix: ''
  name: primary
  closeable-iface: com.company.foo.bar.MyImpl
  #com.company.foo.bar.MyImpl is a custom implementation of the
  #CloseableThriftHiveMetastoreIface that can forward requests to some
  #service in the background and adapt them to the Hive interface,
  #e.g. federate Glue onto Hive or BigQuery onto Hive, by using this
  #interface as an adapter from your data store to the
  #CloseableThriftHiveMetastoreIface interface

federated-meta-stores:
- access-control-type: READ_ONLY
  database-prefix: waggle_test_
  mapped-databases: []
  name: secondary
  remote-meta-store-uris: thrift://waggle-dance:48869
  #No closeable-iface implementation is specified so it defaults to Thrift 

  #Thrift Hive will be federated onto your custom implementation

Clarify what is meant by 'write' in the documentation

We make a distinction between 'write' and 'create', but understandably this may cast doubt on what behaviours 'write' encompasses. We should call out in a list something like:

  • Read: anything that does not mutate metadata.
  • Write: anything that might create or mutate metadata with the exception of the creation of databases.
  • Create: Creation of databases.

Alternatively we could create a matrix of all metadata actions by support in each access control type.

Remove io.spring.platform.platform-bom

As a developer of Waggle Dance
I want to ensure that it doesn't use any end of life libraries
So that the code is easier to maintain

Acceptance Criteria

  • io.spring.platform.platform-bom removed as a dependency from Waggle Dance and replaced with something more future proof.

Notes
Waggle Dance currently uses Spring's io.spring.platform.platform-bom to manage a number of Spring-related dependencies. This is currently planned to be "end of life" by April 2019 (see https://spring.io/blog/2018/04/09/spring-io-platform-end-of-life-announcement). We should replace this with something else; the above blog post suggests using spring-boot-dependencies as a possible solution. We should also keep an eye on the project's official home page (https://platform.spring.io/platform/) to see if any other migrations are suggested. Hopefully a number of other people will have done this and we can learn from their experiences.

Increase code coverage (2)

As a developer
I'd like to be confident my tests can detect refactoring issues
So I can easily make code changes

Acceptance criteria
Code coverage should be at least 80%

Below is a list of classes which could benefit from some extra testing and should be the focus of this ticket. They are arranged (more or less) in increasing test complexity:

ValidationError
FederatedMetaStore
PrimaryMetaStore
MetastoreUnavailableException
TunnelableFactorySupplier
TunnelingMetaStoreClientFactory
NotifyingFederationService
AdvancedPropertyUtils

PrefixBasedDatabaseMappingService
WaggleDanceConfiguration
YamlStorageConfiguration (trivial)
MonitoringConfiguration

FederatedHMSHandlerFactory
FederatedHMSHandler

Persist config file during process restart

As a SysOps
I want WD to avoid overwriting local config during process restart
So that I do not lose local configs

Acceptance Criteria:

  • WD to not update/re-write config file during restart

Primary metastore should default to read/write

I didn't provide an access-control-type for my primary-meta-store. However, when I restarted the service it wrote out the following configuration:

primary-meta-store:
  access-control-type: READ_ONLY

I think this should have defaulted to some variant of READ_AND_WRITE_* for the primary, possibly READ_AND_WRITE_AND_CREATE.

mapped-databases set to [] on server restart if no initial value provided

Reported by an internal user, I'm guessing this is the sequence of events that causes a bug:

  1. Create a federation yml file and for the "federated-meta-stores" completely omit the "mapped-databases" and "overwrite-config-on-shutdown" configuration so these get their default values.
  2. Start and then stop Waggle Dance.
  3. Check the configuration file, if the issue is reproduced then "mapped-databases" will now be in the config file and set to [] which now means no databases are being federated (instead of the default of all of them).

To fix this we need to ensure that if no mapped databases have been configured that the update of the config file on shutdown doesn't put an empty list into the file.

Use configuration-properties in clients HiveConf

I'd like to use the configuration-properties that I can set in the waggle-dance-server.yml in the HiveConf that is used to create Thrift clients.

Currently we use the properties only in the Handler (the server); it would be good if we could set properties and override settings used in the clients.
For instance, see ThriftMetastoreClient; it contains this code:
retries = HiveConf.getIntVar(conf, HiveConf.ConfVars.METASTORETHRIFTCONNECTIONRETRIES);
But there is currently no way to override this because we create the HiveConf ourselves in CloseableThriftHiveMetastoreIfaceClientFactory. Thus that value is hardcoded to the Hive default of 3.

Acceptance Criteria:

  • Use the configuration-properties (see the WaggleDanceConfiguration class) when creating the HiveConf for the clients.
  • Document its use in the README (it seems to be a hidden feature now).
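A sketch of the desired behaviour; the YAML below is a proposal, not current syntax, and hive.metastore.connect.retries is the HiveConf property name behind METASTORETHRIFTCONNECTIONRETRIES:

```yaml
# waggle-dance-server.yml -- proposed, not currently applied to clients
configuration-properties:
  hive.metastore.connect.retries: 10   # would override the hardcoded default of 3
```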

Replace usage of Cobertura with Jacoco

As a developer
I want to use Jacoco for code coverage rather than Cobertura since the latter is pretty much a dead project (and doesn't handle Java 8 code)
So that I have a future proof code coverage solution.

Acceptance Criteria

  • Usage of Cobertura in Travis replaced with Jacoco
  • Coveralls reports checked to still work

Cleanup/Refactor FederatedHMSHandler and MappingEventListener

As a developer
I want to ensure that code is logical and clean
so that I spend less time on maintenance

Acceptance Criteria
TBD based on extracting exactly what we want to from notes below

Notes
Below is a brain dump of discussions we've had on the issue that hopefully we can extract proper acceptance criteria out of:

Notes from handover discussion with @ddcprg:

In FederatedHMSHandler we have a variable called databaseMappingService but the type is MappingEventListener - this interface extends 2 others (service and handler) and is thus both, this is confusing that its handling two different responsibilities. Service and Listener should probably be different classes, a listener and a service both pointing to same thing or should we refactor the code out into two separate classes. Perhaps we should have a listener which knows about the service and tells it do things?

Comments from @patduin in Slack:

Probably a good idea if the MappingServices are split in a MappingService and a corresponding MappingServiceListener. I find classes that implement their own listeners usually confusing as well but it is often convenient. Not sure if it is easy to do though, the listen methods need access to the fields, dunno maybe worth a shot. For example PrefixBasedDatabaseMappingService will get split? The onX methods have synchronisation locks and use fields of the service you somehow have to replace that.

tunneling: Make strict host checking optional

As a user of Waggle Dance
I'd like to make strict host checking optional
So that I can bypass any host checking issues

The JSCH library we use to set up the ssh tunnel can be configured to skip strict host checking. We ran into an issue (see #32) where the encryption key wasn't supported and it would be great to have a quick workaround for this, or at least leave it up to the user to enable or disable strict host checking.

Acceptance Criteria

  • configuration flag in the yaml to disable/enable strict host checking per tunnel.

Update parent version

The newest version of hotels-oss-parent includes the sonatype-oss-release profile with the maven-gpg-plugin, so it can be removed from this project.

Acceptance Criteria;

  • Update to version 2.0.5 of hotels-oss-parent.
  • Remove the sonatype-oss-release profile in pom.xml.

Add generic whitelisting feature for every metastore mappings

At the moment the whitelisting logic is spread across the different services and it feels like these checks could be done just before the request hits the handler.

We would like to centralize the whitelisting logic in one place and make sure all requests go through the checks before they are processed. This should also simplify the mapping services.

Status health checks don't support tunnels

As a user of Waggle Dance
I'd like to see the status of a metastore that is connected to via a tunnel

Acceptance Criteria:

  • Fix the STATUS check returned by the rest api to support a tunnelled connection.

From looking at the code I think it would be great (if possible) if we can reuse the connections already setup in the MetastoreMappingImpl. That way the status check is a more accurate representation of available connections.

Support Wild Cards in DB name

As a SysOps
I want WD to support regex in DB names
So that I can use regexp to whitelist multiple DBs

Acceptance Criteria:

  • Interpret the db name in the whitelist as a regular expression and match accordingly
  • Able to whitelist user_abc,user_def like user_.*
  • Able to whitelist ALL databases with wildcard .*
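If implemented, a regex whitelist might be configured like this sketch (the metastore details are illustrative and the regex semantics are the proposal, not current behaviour):

```yaml
federated-meta-stores:
  - name: remote
    database-prefix: remote_
    remote-meta-store-uris: thrift://remote:9083
    mapped-databases:
    - user_.*   # would whitelist user_abc, user_def, etc.
    - .*        # or: whitelist ALL databases
```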

Upgrade dependency and plugin versions

As a developer
I want WaggleDance using the latest versions of various dependencies and plugins
So that I benefit from new features, bug fixes etc.

Acceptance Criteria

  • Build and all tests passing with any required changes for below version upgrades.

  • The following upgraded to their latest (or indicated) versions in waggle-dance-parent:

    • cobertura.version
    • cobertura.maven.plugin.version
    • maven.release.plugin.version
    • hive.version -> 2.3.3
    • dropwizard.version
    • aspectj-maven-plugin.version -> check whether this is actually still in use? if not remove it
    • aspectj.version -> same for this
    • beeju.version
    • hotels-oss-parent
  • The following upgraded to their latest (or indicated) versions in waggle-dance module:

    • maven-assembly-plugin.version
    • maven-install-plugin.version
    • maven-deploy-plugin.version
    • Ensure the TGZ generated is as close to identical as before this change
  • The following upgraded to their latest (or indicated) versions in waggle-dance-core module:

    • mockito.version
    • powermock.version
    • jcabi-aspects.version -> check whether this is actually still in use? if not remove it
    • hcommon-ssh.version
  • The following upgraded to their latest (or indicated) versions in waggle-dance-rpm module:

    • rpm.maven.plugin.version
    • Ensure the RPM generated is as close to identical as before this change

Refactor FederationAdminController and its use of AbstractMetastore

As a developer
I want to ensure that code is logical and clean
so that I spend less time on maintenance

Acceptance Criteria
TBD based on extracting exactly what we want to from notes below

Notes
Below is a brain dump of discussions we've had on the issue that hopefully we can extract proper acceptance criteria out of:

Notes from handover discussion with @ddcprg:

FederationAdminController -> add() takes an abstract class as a param; this should really be a concrete class or an interface. There is no way it would know what the actual type is when it came to create one. We think currently only the federations() method is being called by one of our end users.

Comments from @massdosage in Slack:

seeing as nobody is really using the REST interface I'm not so bothered about the above, it could probably do with an entire refactor as all the methods take the abstract meta store class which isn't ideal
but probably only worth doing if/when someone actually wants to do something more than a health check with the rest calls.

Comments from @patduin in Slack:

Yes, this is a general thing that we don't have an interface over the AbstractMetastoreHierarchy; probably makes sense to add that. What do you say of introducing a rest model metastore class, that takes a API model metastore class in the constructor? It would decouple the API model class from being serialized in the yaml config and in the presentation layer (JSON). We've had issues with that before as we try to fix something for nice yml and that breaking the JSON, for instance: private transient @JsonProperty @NotNull MetaStoreStatus status = MetaStoreStatus.UNKNOWN; is a field in the AbstractMetastore, but it is only serialized for json not for the yml and hence transient which is a bit weird.

Increase code coverage (1)

As a developer
I'd like to be confident my tests can detect refactoring issues
So I can easily make code changes

Acceptance criteria
Code coverage should be at least 80%

Below is a list of classes which could benefit from some extra testing and should be the focus of this ticket. They are arranged (more or less) in increasing test complexity:

WaggleDanceException
AbstractMetaStore
Federations
CloseableThriftHiveMetastoreIfaceClientFactory
GrammarUtils
MonitoredDatabaseMappingService
AccessControlHandlerFactory

GraphiteConfiguration

StaticDatabaseMappingService
CommonVFSResource

TunnelingMetaStoreClientFactory
WaggleDanceRunner -> could at least test the builder part of this
DefaultMetaStoreClientFactory
ThriftMetastoreClient

Select * from <VIEW> on federated metastore does not work.

If I have a view View1 over tables T1 and T2 under Hive database db1 in federated metastore FM1, the following command fails with a Table Not Found exception:

Select * from fm1_db1.view1 ;
This will fail with Table T1 & T2 Not Found.

Additional Info:

  • WD is running in prefixed mode.

Move minimum supported Java version from 7 to 8

As a developer
I want to move Waggle Dance to use Java 8
So that I can benefit from the updated language features and not spend time maintaining backwards compatibility and also to allow certain libraries that no longer support Java 7 to be updated.

Acceptance Criteria

  • Java 8 used as jdk.version when building WaggleDance.

DESCRIBE FORMATTED fails on Federated metastore tables

Hive command DESCRIBE FORMATTED failing:

2018-04-03T10:26:49,798 INFO  com.hotels.bdp.waggledance.client.ThriftMetastoreClient:158 - Opened a connection to metastore 'thrift://cloud9-lab-main.us-west-2.hcom-data-lab.aws.hcom:9083', total current connections to all metastores: 3
2018-04-03T10:26:49,798 INFO  com.hotels.bdp.waggledance.client.ThriftMetastoreClient:193 - Connected to metastore.
2018-04-03T10:26:49,802 DEBUG com.hotels.bdp.waggledance.mapping.service.impl.PrefixBasedDatabaseMappingService:173 - Database Name `bix_mobile` maps to metastore with EMPTY_PREFIX
2018-04-03T10:26:49,936 DEBUG com.hotels.bdp.waggledance.mapping.service.impl.PrefixBasedDatabaseMappingService:173 - Database Name `bix_mobile` maps to metastore with EMPTY_PREFIX
2018-04-03T10:26:50,014 DEBUG com.hotels.bdp.waggledance.server.FederatedHMSHandler:168 - #get_foreign_keys('ForeignKeysRequest(parent_db_name:null, parent_tb..54..l_apt_performancemanagement_endpoints_sort)'): thrown javax.validation.ConstraintViolationException(may not be null) out of com.jcabi.aspects.aj.MethodValidator#checkForViolations[189] in 45ms
2018-04-03T10:26:50,018 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler:204 - javax.validation.ConstraintViolationException: may not be null
	at com.jcabi.aspects.aj.MethodValidator.checkForViolations(MethodValidator.java:189)
	at com.jcabi.aspects.aj.MethodValidator.validateMethod(MethodValidator.java:154)
	at com.jcabi.aspects.aj.MethodValidator.beforeMethod(MethodValidator.java:87)
	at com.hotels.bdp.waggledance.mapping.service.impl.MonitoredDatabaseMappingService.databaseMapping(MonitoredDatabaseMappingService.java:48)
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_foreign_keys_aroundBody566(FederatedHMSHandler.java:1384)
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler$AjcClosure567.run(FederatedHMSHandler.java:1)
	at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
	at com.hotels.bdp.waggledance.metrics.MonitoredAspect.monitor(MonitoredAspect.java:57)
	at com.hotels.bdp.waggledance.metrics.MonitoredAspect.monitor(MonitoredAspect.java:47)
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_foreign_keys_aroundBody568(FederatedHMSHandler.java:1382)
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler$AjcClosure569.run(FederatedHMSHandler.java:1)
	at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
	at com.jcabi.aspects.aj.MethodLogger.wrap(MethodLogger.java:213)
	at com.jcabi.aspects.aj.MethodLogger.ajc$inlineAccessMethod$com_jcabi_aspects_aj_MethodLogger$com_jcabi_aspects_aj_MethodLogger$wrap(MethodLogger.java:1)
	at com.jcabi.aspects.aj.MethodLogger.wrapMethod(MethodLogger.java:169)
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_foreign_keys(FederatedHMSHandler.java:1382)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.hotels.bdp.waggledance.server.ExceptionWrappingHMSHandler.invoke(ExceptionWrappingHMSHandler.java:50)
	at com.sun.proxy.$Proxy132.get_foreign_keys(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy132.get_foreign_keys(Unknown Source)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_foreign_keys.getResult(ThriftHiveMetastore.java:12933)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_foreign_keys.getResult(ThriftHiveMetastore.java:12917)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
	at java.lang.Thread.run(Thread.java:748)

2018-04-03T10:26:50,018 ERROR org.apache.thrift.server.TThreadPoolServer:297 - Error occurred during processing of message.
javax.validation.ConstraintViolationException: may not be null
	at com.jcabi.aspects.aj.MethodValidator.checkForViolations(MethodValidator.java:189) ~[jcabi-aspects-0.22.6.jar!/:?]
	at com.jcabi.aspects.aj.MethodValidator.validateMethod(MethodValidator.java:154) ~[jcabi-aspects-0.22.6.jar!/:?]
	at com.jcabi.aspects.aj.MethodValidator.beforeMethod(MethodValidator.java:87) ~[jcabi-aspects-0.22.6.jar!/:?]
	at com.hotels.bdp.waggledance.mapping.service.impl.MonitoredDatabaseMappingService.databaseMapping(MonitoredDatabaseMappingService.java:48) ~[classes!/:?]
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_foreign_keys_aroundBody566(FederatedHMSHandler.java:1384) ~[classes!/:?]
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler$AjcClosure567.run(FederatedHMSHandler.java:1) ~[classes!/:?]
	at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[aspectjweaver-1.8.10.jar!/:1.8.10]
	at com.hotels.bdp.waggledance.metrics.MonitoredAspect.monitor(MonitoredAspect.java:57) ~[classes!/:?]
	at com.hotels.bdp.waggledance.metrics.MonitoredAspect.monitor(MonitoredAspect.java:47) ~[classes!/:?]
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_foreign_keys_aroundBody568(FederatedHMSHandler.java:1382) ~[classes!/:?]
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler$AjcClosure569.run(FederatedHMSHandler.java:1) ~[classes!/:?]
	at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[aspectjweaver-1.8.10.jar!/:1.8.10]
	at com.jcabi.aspects.aj.MethodLogger.wrap(MethodLogger.java:213) ~[jcabi-aspects-0.22.6.jar!/:?]
	at com.jcabi.aspects.aj.MethodLogger.ajc$inlineAccessMethod$com_jcabi_aspects_aj_MethodLogger$com_jcabi_aspects_aj_MethodLogger$wrap(MethodLogger.java:1) ~[jcabi-aspects-0.22.6.jar!/:?]
	at com.jcabi.aspects.aj.MethodLogger.wrapMethod(MethodLogger.java:169) ~[jcabi-aspects-0.22.6.jar!/:?]
	at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_foreign_keys(FederatedHMSHandler.java:1382) ~[classes!/:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_171]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_171]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_171]
	at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_171]
	at com.hotels.bdp.waggledance.server.ExceptionWrappingHMSHandler.invoke(ExceptionWrappingHMSHandler.java:50) ~[classes!/:?]
	at com.sun.proxy.$Proxy132.get_foreign_keys(Unknown Source) ~[?:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_171]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_171]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_171]
	at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_171]
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) ~[hive-metastore-2.3.0.jar!/:2.3.0]
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) ~[hive-metastore-2.3.0.jar!/:2.3.0]
	at com.sun.proxy.$Proxy132.get_foreign_keys(Unknown Source) ~[?:?]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_foreign_keys.getResult(ThriftHiveMetastore.java:12933) ~[hive-metastore-2.3.0.jar!/:2.3.0]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_foreign_keys.getResult(ThriftHiveMetastore.java:12917) ~[hive-metastore-2.3.0.jar!/:2.3.0]
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.3.jar!/:0.9.3]
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.3.jar!/:0.9.3]
	at org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48) ~[hive-metastore-2.3.0.jar!/:2.3.0]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [libthrift-0.9.3.jar!/:0.9.3]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152) [?:1.7.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) [?:1.7.0_171]
	at java.lang.Thread.run(Thread.java:748) [?:1.7.0_171]
2018-04-03T10:28:44,323 DEBUG com.hotels.bdp.waggledance.server.TTransportMonitor:66 - Releasing disconnected sessions
2018-04-03T10:28:44,326 INFO  com.hotels.bdp.waggledance.client.ThriftMetastoreClient:224 - Closed a connection to metastore, current connections: 2
2018-04-03T10:28:44,326 INFO  com.hotels.bdp.waggledance.client.ThriftMetastoreClient:224 - Closed a connection to metastore, current connections: 1
2018-04-03T10:28:44,327 INFO  com.hotels.bdp.waggledance.client.ThriftMetastoreClient:224 - Closed a connection to metastore, current connections: 0

2018-04-03T10:33:44,323 DEBUG com.hotels.bdp.waggledance.server.TTransportMonitor:66 - Releasing disconnected sessions

The DESCRIBE FORMATTED failure occurs on both the primary and the federated metastore tables:
Config

primary-meta-store:
  access-control-type: READ_ONLY
  database-prefix: ''
  name: primary
  remote-meta-store-uris: thrift://localhost:9083 
federated-meta-stores:
- access-control-type: READ_ONLY
  remote-meta-store-uris: thrift://localhost:9083   
  mapped-databases: []
  database-prefix: waggle_test_
  name: secondary

Output

hive> show databases;
OK
bdp
default
waggle_test_bdp
waggle_test_default
Time taken: 1.307 seconds, Fetched: 4 row(s)
hive> describe formatted bdp.test;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.thrift.transport.TTransportException
hive> describe formatted waggle_test_bdp.test;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.thrift.transport.TTransportException
hive> 

Restrict Read access on Federated Metastore in PREFIXED mode

As a SysOps
I want to restrict Read access to Databases in Federated Metastore when using PREFIXED mode
So that I can control visibility of databases

Acceptance Criteria:

  • If Federated Metastore has three databases - db1, db2 and db3, I can only make db1 readable for users.
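A sketch of how this could look in the YAML config, following the config format used elsewhere in these issues (mapped-databases already exists for limiting visibility, so the criterion might amount to honouring it in PREFIXED mode):

```yaml
federated-meta-stores:
- name: secondary
  database-prefix: waggle_test_
  remote-meta-store-uris: thrift://remote-host:9083
  access-control-type: READ_ONLY
  # Only db1 would be visible/readable through this federation;
  # db2 and db3 would be hidden.
  mapped-databases:
  - db1
```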

Waggle Dance fails to serve when destination bastion is down/unreachable

Issue:
Waggle Dance federates to primary metastore A and federated metastore B (over an SSH tunnel through bastion host BH).
If, for any reason, BH is down, unreachable, or refuses SSH connections, the whole of WD stops serving requests, even for metastore A.

Expected Behaviour:
WD handles the above situation gracefully and continues to serve the rest of the metastores.

Current Behaviour:
WD fails to serve all requests.

Avoid reboot to refresh config

As a SysOps
I want WD to update its config from the local FS without needing a restart
So that it can be managed easily in a production environment

Acceptance Criteria:

  • Able to add/remove/update whitelisted databases without having to restart the Waggle Dance service.
  • Waggle Dance refreshes its in-memory config from the local FS config file every 5 or 10 seconds (possibly configurable via a property).
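The polling part of the criteria can be sketched as follows; a hypothetical checker keeps the last-seen modification time of the config file and signals a reload whenever it changes (class and method names are illustrative, not Waggle Dance APIs):

```java
// Hedged sketch: track the config file's last modification time and report
// when the on-disk config is newer than what is in memory.
class ConfigReloadChecker {
    private long lastSeenModified;

    ConfigReloadChecker(long initialModified) {
        this.lastSeenModified = initialModified;
    }

    // Called on a schedule (e.g. every 5-10s, interval driven by a property).
    // Returns true exactly when the file has been modified since the last check.
    boolean needsReload(long currentModified) {
        if (currentModified > lastSeenModified) {
            lastSeenModified = currentModified;
            return true;
        }
        return false;
    }
}

public class ConfigPollSketch {
    public static void main(String[] args) {
        ConfigReloadChecker checker = new ConfigReloadChecker(1000L);
        System.out.println(checker.needsReload(1000L)); // false: unchanged
        System.out.println(checker.needsReload(2000L)); // true: file was touched
    }
}
```

In practice the scheduled task would feed `File.lastModified()` into `needsReload` and re-parse the YAML on a true result.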

Improve memory usage in federated metastore request/response forwards

Spike

Overview
At the moment the database mappings create copies of the Thrift objects they process. Some of these objects can be huge, and creating a copy demands extra memory for each request. As we only need the original object to determine which metastore will process the request, we could mutate the current object instead of making a copy.

Acceptance criteria
Evaluate memory usage before and after the change.
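The copy-versus-mutate trade-off can be sketched with a tiny stand-in for a Thrift Table object (the real transformation applies/strips the database prefix on much larger structures; all names here are illustrative):

```java
// Minimal stand-in for a (potentially huge) Thrift object.
class FakeTable {
    String dbName;
    FakeTable(String dbName) { this.dbName = dbName; }
}

class PrefixMapping {
    private final String prefix;
    PrefixMapping(String prefix) { this.prefix = prefix; }

    // Copy-based approach: allocates a second object per request,
    // which is what the mappings do today.
    FakeTable transformByCopy(FakeTable table) {
        return new FakeTable(table.dbName.substring(prefix.length()));
    }

    // Mutating approach: once routing has been decided, the original object is
    // no longer needed, so rewrite it in place and avoid the allocation.
    FakeTable transformInPlace(FakeTable table) {
        table.dbName = table.dbName.substring(prefix.length());
        return table;
    }
}

public class MutateVsCopySketch {
    public static void main(String[] args) {
        PrefixMapping mapping = new PrefixMapping("prod_");
        FakeTable table = new FakeTable("prod_test");
        FakeTable result = mapping.transformInPlace(table);
        System.out.println(result == table); // true: same instance, no copy made
        System.out.println(table.dbName);    // prefix stripped in place
    }
}
```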

Add integration test for federations over SSL

As a developer
I'd like to test SSL connectivity
So I can make sure that federations over SSL work the same as federations over plain connections

Acceptance Criteria
New integration test that federates metastores over SSL

Note: this will depend on BeeJU ticket 8

Configure and use the HADOOP_USER_NAME in WD for all requests

As a user of WD
I'd like to issue all requests to a federated metastore as one user
So that I can implement authorization for that one user and not for all clients

Acceptance Criteria:

  • Waggle Dance configuration allows configuration of HADOOP_USER_NAME (decide on a property name, and decide whether it is a global or per-metastore setting)
  • If the HADOOP_USER_NAME is set in the WD configuration then it is used and overrides any client set username.

Example: my Hive CLI client issues all requests as user 'hadoop'. If the WD HADOOP_USER_NAME is set, then the user 'hadoop' is ignored and whatever value WD is configured with is used instead.
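A possible shape for the configuration, following the YAML format used elsewhere in these issues (the property name hadoop-user-name is hypothetical, and whether it belongs globally or per metastore is exactly the open question in the acceptance criteria):

```yaml
# Hypothetical global override applied to all forwarded requests:
hadoop-user-name: svc_waggledance

federated-meta-stores:
- name: secondary
  remote-meta-store-uris: thrift://remote-host:9083
  # ...or a per-metastore override, if that is the chosen design:
  hadoop-user-name: svc_secondary
```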

Notes

Might help with what is discussed in https://groups.google.com/d/msg/waggle-dance-user/rA3JPimh94A/0q0ZXasjAwAJ

tunneling: Support multiple host key encryptions

As a user of Waggle Dance
I'd like to have a working tunnel regardless of the host key type
So that I can seamlessly create tunnels on newer SSH installations

We've seen the JSch library we use for tunneling fail with Caused by: com.jcraft.jsch.JSchException: UnknownHostKey on host checks when the .ssh/known_hosts file has keys of type ecdsa-sha2-nistp256.

Acceptance Criteria:

  • Investigate whether we can make JSch work with all host key types

Work around

A command-line workaround is to get the RSA key using ssh-keyscan and append it to the known_hosts file:

ssh-keyscan <host-ip> >> .ssh/known_hosts

Move RPM GPG settings to a profile

The rpm-maven-plugin GPG passphrase is configured directly in the plugin. This value is only required for releases, but with the current settings snapshot and local builds also require it, otherwise the build fails.

This configuration needs to be moved out into the Sonatype release profile.
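A sketch of the intended pom.xml change: the passphrase moves behind a release profile so snapshot and local builds no longer resolve it (the profile id and property name are illustrative; the keyPassphrase element names should be checked against the rpm-maven-plugin documentation):

```xml
<profiles>
  <profile>
    <id>release</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>rpm-maven-plugin</artifactId>
          <configuration>
            <!-- only resolved when -Prelease is active -->
            <keyPassphrase>
              <passphrase>${gpg.passphrase}</passphrase>
            </keyPassphrase>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```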

Internal error processing get_table_req

Here is the environment.

  • waggle-dance version : 2.3.7 (also tested on 2.3.6)
  • waggle-dance configuration
    • database-resolution: PREFIXED
    • primary-meta-store :
      • hive & metastore version : 2.3.0
      • thrift://localhost:9083
    • remote-meta-store
      • hive & metastore version : 2.1.0
      • thrift://remote-ip:9083

I can see the remote db and tables with the 'prod_' prefix which I set (I executed use prod_test;).
But when I execute a select query on a table (select * from test_1;), it produces an error.

Here are the logs (and stack trace):

2018-07-25 23:25:29.500  INFO 28387 --- [           main] o.h.v.i.x.ValidationBootstrapParameters  : HV000006: Using org.hibernate.validator.HibernateValidator as validation provider.
2018-07-25 23:25:29.921  INFO 28387 --- [           main] s.w.s.m.m.a.RequestMappingHandlerAdapter : Looking for @ControllerAdvice: org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@4f4a7090: startup date [Wed Jul 25 23:25:24 UTC 2018]; root of context hierarchy
2018-07-25 23:25:30.031  INFO 28387 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/api/admin/federations],methods=[GET]}" onto public java.util.List<com.hotels.bdp.waggledance.api.model.AbstractMetaStore> com.hotels.bdp.waggledance.rest.endpoint.FederationsAdminController.federations()
2018-07-25 23:25:30.033  INFO 28387 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/api/admin/federations],methods=[POST]}" onto public void com.hotels.bdp.waggledance.rest.endpoint.FederationsAdminController.add(com.hotels.bdp.waggledance.api.model.AbstractMetaStore)
2018-07-25 23:25:30.033  INFO 28387 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/api/admin/federations/{name}],methods=[DELETE]}" onto public void com.hotels.bdp.waggledance.rest.endpoint.FederationsAdminController.remove(java.lang.String)
2018-07-25 23:25:30.033  INFO 28387 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/api/admin/federations/{name}],methods=[GET]}" onto public com.hotels.bdp.waggledance.api.model.AbstractMetaStore com.hotels.bdp.waggledance.rest.endpoint.FederationsAdminController.read(java.lang.String)
2018-07-25 23:25:30.035  INFO 28387 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/error],produces=[text/html]}" onto public org.springframework.web.servlet.ModelAndView org.springframework.boot.autoconfigure.web.BasicErrorController.errorHtml(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)
2018-07-25 23:25:30.036  INFO 28387 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/error]}" onto public org.springframework.http.ResponseEntity<java.util.Map<java.lang.String, java.lang.Object>> org.springframework.boot.autoconfigure.web.BasicErrorController.error(javax.servlet.http.HttpServletRequest)
2018-07-25 23:25:30.111  INFO 28387 --- [           main] o.s.w.s.h.SimpleUrlHandlerMapping        : Mapped URL path [/webjars/**] onto handler of type [class org.springframework.web.servlet.resource.ResourceHttpRequestHandler]
2018-07-25 23:25:30.111  INFO 28387 --- [           main] o.s.w.s.h.SimpleUrlHandlerMapping        : Mapped URL path [/**] onto handler of type [class org.springframework.web.servlet.resource.ResourceHttpRequestHandler]
2018-07-25 23:25:30.200  INFO 28387 --- [           main] o.s.w.s.h.SimpleUrlHandlerMapping        : Mapped URL path [/**/favicon.ico] onto handler of type [class org.springframework.web.servlet.resource.ResourceHttpRequestHandler]
2018-07-25 23:25:31.236  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/beans || /beans.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:31.238  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/env/{name:.*}],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EnvironmentMvcEndpoint.value(java.lang.String)
2018-07-25 23:25:31.239  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/env || /env.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:31.240  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/configprops || /configprops.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:31.241  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/trace || /trace.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:31.244  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/dump || /dump.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:31.246  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/autoconfig || /autoconfig.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:31.247  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/auditevents || /auditevents.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public org.springframework.http.ResponseEntity<?> org.springframework.boot.actuate.endpoint.mvc.AuditEventsMvcEndpoint.findByPrincipalAndAfterAndType(java.lang.String,java.util.Date,java.lang.String)
2018-07-25 23:25:31.248  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/health || /health.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.HealthMvcEndpoint.invoke(javax.servlet.http.HttpServletRequest,java.security.Principal)
2018-07-25 23:25:31.250  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/info || /info.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:31.251  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/metrics/{name:.*}],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.MetricsMvcEndpoint.value(java.lang.String)
2018-07-25 23:25:31.251  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/metrics || /metrics.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:31.252  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/mappings || /mappings.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:31.254  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/heapdump || /heapdump.json],methods=[GET],produces=[application/octet-stream]}" onto public void org.springframework.boot.actuate.endpoint.mvc.HeapdumpMvcEndpoint.invoke(boolean,javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse) throws java.io.IOException,javax.servlet.ServletException
2018-07-25 23:25:31.259  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/loggers/{name:.*}],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.LoggersMvcEndpoint.get(java.lang.String)
2018-07-25 23:25:31.259  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/loggers/{name:.*}],methods=[POST],consumes=[application/vnd.spring-boot.actuator.v1+json || application/json],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.LoggersMvcEndpoint.set(java.lang.String,java.util.Map<java.lang.String, java.lang.String>)
2018-07-25 23:25:31.260  INFO 28387 --- [           main] o.s.b.a.e.m.EndpointHandlerMapping       : Mapped "{[/loggers || /loggers.json],methods=[GET],produces=[application/vnd.spring-boot.actuator.v1+json || application/json]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.mvc.EndpointMvcAdapter.invoke()
2018-07-25 23:25:32.433  INFO 28387 --- [           main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
2018-07-25 23:25:32.454  INFO 28387 --- [           main] o.s.c.s.DefaultLifecycleProcessor        : Starting beans in phase 0
2018-07-25 23:25:32.808  INFO 28387 --- [           main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 18000 (http)
2018-07-25 23:25:32.814  INFO 28387 --- [           main] c.h.b.w.s.MetaStoreProxyServer           : Starting WaggleDance on port 48869
Starting WaggleDance on port 48869
2018-07-25 23:25:32.839  INFO 28387 --- [           main] c.h.b.w.s.MetaStoreProxyServer           : Starting WaggleDance Server
2018-07-25 23:25:32.846  INFO 28387 --- [           main] c.h.b.w.s.MetaStoreProxyServer           : Started the new WaggleDance on port [48869]...
2018-07-25 23:25:32.847  INFO 28387 --- [           main] c.h.b.w.s.MetaStoreProxyServer           : Options.minWorkerThreads = 200
2018-07-25 23:25:32.847  INFO 28387 --- [           main] c.h.b.w.s.MetaStoreProxyServer           : Options.maxWorkerThreads = 1000
2018-07-25 23:25:32.847  INFO 28387 --- [           main] c.h.b.w.s.MetaStoreProxyServer           : TCP keepalive = true
2018-07-25 23:25:57.209  INFO 28387 --- [pool-4-thread-1] c.h.b.w.m.m.MetaStoreMappingFactoryImpl  : Mapping databases with name 'primary' to metastore: thrift://localhost:9083
2018-07-25 23:25:57.451  INFO 28387 --- [pool-4-thread-1] c.h.b.w.m.m.MetaStoreMappingFactoryImpl  : Mapping databases with name 'prod' to metastore: thrift://10.10.127.175:9083
2018-07-25 23:25:57.732  INFO 28387 --- [pool-4-thread-1] c.j.a.a.NamedThreads                     : jcabi-aspects 0.22.6/3f0a1f7 started new daemon thread jcabi-loggable for watching of @Loggable annotated methods
2018-07-25 23:25:57.736  INFO 28387 --- [pool-4-thread-1] c.h.b.w.c.ThriftMetastoreClient          : Trying to connect to metastore with URI thrift://localhost:9083
2018-07-25 23:25:57.766  INFO 28387 --- [pool-4-thread-1] c.h.b.w.c.ThriftMetastoreClient          : Opened a connection to metastore 'thrift://localhost:9083', total current connections to all metastores: 1
2018-07-25 23:25:57.766  INFO 28387 --- [pool-4-thread-1] c.h.b.w.c.ThriftMetastoreClient          : Connected to metastore.
2018-07-25 23:25:57.766  INFO 28387 --- [pool-4-thread-1] c.h.b.w.c.ThriftMetastoreClient          : Trying to connect to metastore with URI thrift://10.10.127.175:9083
2018-07-25 23:25:57.769  INFO 28387 --- [pool-4-thread-1] c.h.b.w.c.ThriftMetastoreClient          : Opened a connection to metastore 'thrift://10.10.127.175:9083', total current connections to all metastores: 2
2018-07-25 23:25:57.769  INFO 28387 --- [pool-4-thread-1] c.h.b.w.c.ThriftMetastoreClient          : Connected to metastore.
2018-07-25 23:25:58.077  INFO 28387 --- [pool-4-thread-1] c.h.b.w.s.FederatedHMSHandler            : Fetching database prod_test
2018-07-25 23:25:58.085  INFO 28387 --- [pool-4-thread-1] o.h.v.i.x.ValidationXmlParser            : HV000007: META-INF/validation.xml found. Parsing XML based configuration.
2018-07-25 23:25:58.096  INFO 28387 --- [pool-4-thread-1] o.h.v.i.x.ValidationBootstrapParameters  : HV000006: Using org.hibernate.validator.HibernateValidator as validation provider.
2018-07-25 23:25:58.114  INFO 28387 --- [pool-4-thread-1] c.h.b.w.s.FederatedHMSHandler            : Mapping is 'prod_'
2018-07-25 23:25:58.185  INFO 28387 --- [pool-4-thread-2] c.h.b.w.m.m.MetaStoreMappingFactoryImpl  : Mapping databases with name 'primary' to metastore: thrift://localhost:9083
2018-07-25 23:25:58.188  INFO 28387 --- [pool-4-thread-1] c.h.b.w.c.ThriftMetastoreClient          : Closed a connection to metastore, current connections: 1
2018-07-25 23:25:58.188  INFO 28387 --- [pool-4-thread-1] c.h.b.w.c.ThriftMetastoreClient          : Closed a connection to metastore, current connections: 0
2018-07-25 23:25:58.245  INFO 28387 --- [pool-4-thread-2] c.h.b.w.m.m.MetaStoreMappingFactoryImpl  : Mapping databases with name 'prod' to metastore: thrift://10.10.127.175:9083
2018-07-25 23:25:58.281  INFO 28387 --- [pool-4-thread-2] c.h.b.w.c.ThriftMetastoreClient          : Trying to connect to metastore with URI thrift://localhost:9083
2018-07-25 23:25:58.282  INFO 28387 --- [pool-4-thread-2] c.h.b.w.c.ThriftMetastoreClient          : Opened a connection to metastore 'thrift://localhost:9083', total current connections to all metastores: 1
2018-07-25 23:25:58.282  INFO 28387 --- [pool-4-thread-2] c.h.b.w.c.ThriftMetastoreClient          : Connected to metastore.
2018-07-25 23:25:58.282  INFO 28387 --- [pool-4-thread-2] c.h.b.w.c.ThriftMetastoreClient          : Trying to connect to metastore with URI thrift://10.10.127.175:9083
2018-07-25 23:25:58.285  INFO 28387 --- [pool-4-thread-2] c.h.b.w.c.ThriftMetastoreClient          : Opened a connection to metastore 'thrift://10.10.127.175:9083', total current connections to all metastores: 2
2018-07-25 23:25:58.285  INFO 28387 --- [pool-4-thread-2] c.h.b.w.c.ThriftMetastoreClient          : Connected to metastore.
2018-07-25 23:25:58.305  INFO 28387 --- [pool-4-thread-2] c.h.b.w.s.FederatedHMSHandler            : Fetching database prod_test
2018-07-25 23:25:58.308  INFO 28387 --- [pool-4-thread-2] c.h.b.w.s.FederatedHMSHandler            : Mapping is 'prod_'
2018-07-25 23:25:58.322  INFO 28387 --- [pool-4-thread-2] c.h.b.w.s.FederatedHMSHandler            : Fetching database prod_test
2018-07-25 23:25:58.325  INFO 28387 --- [pool-4-thread-2] c.h.b.w.s.FederatedHMSHandler            : Mapping is 'prod_'
2018-07-25 23:26:13.054 ERROR 28387 --- [pool-4-thread-2] o.a.h.h.m.RetryingHMSHandler             : org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.hotels.bdp.waggledance.client.DefaultMetaStoreClientFactory$ReconnectingMetastoreClientInvocationHandler.invoke(DefaultMetaStoreClientFactory.java:67)
        at com.sun.proxy.$Proxy134.get_table_req(Unknown Source)
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_table_req_aroundBody618(FederatedHMSHandler.java:1481)
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler$AjcClosure619.run(FederatedHMSHandler.java:1)
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
        at com.hotels.bdp.waggledance.metrics.MonitoredAspect.monitor(MonitoredAspect.java:57)
        at com.hotels.bdp.waggledance.metrics.MonitoredAspect.monitor(MonitoredAspect.java:47)
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_table_req_aroundBody620(FederatedHMSHandler.java:1479)
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler$AjcClosure621.run(FederatedHMSHandler.java:1)
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
        at com.jcabi.aspects.aj.MethodLogger.wrap(MethodLogger.java:213)
        at com.jcabi.aspects.aj.MethodLogger.ajc$inlineAccessMethod$com_jcabi_aspects_aj_MethodLogger$com_jcabi_aspects_aj_MethodLogger$wrap(MethodLogger.java:1)
        at com.jcabi.aspects.aj.MethodLogger.wrapMethod(MethodLogger.java:169)
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_table_req(FederatedHMSHandler.java:1479)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.hotels.bdp.waggledance.server.ExceptionWrappingHMSHandler.invoke(ExceptionWrappingHMSHandler.java:50)
        at com.sun.proxy.$Proxy137.get_table_req(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
        at com.sun.proxy.$Proxy137.get_table_req(Unknown Source)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:11457)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:11441)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

2018-07-25 23:26:13.055 ERROR 28387 --- [pool-4-thread-2] o.a.t.ProcessFunction                    : Internal error processing get_table_req

org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[libthrift-0.9.3.jar!/:0.9.3]
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[libthrift-0.9.3.jar!/:0.9.3]
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563) ~[hive-metastore-2.3.0.jar!/:2.3.0]
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550) ~[hive-metastore-2.3.0.jar!/:2.3.0]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_171]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_171]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_171]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_171]
        at com.hotels.bdp.waggledance.client.DefaultMetaStoreClientFactory$ReconnectingMetastoreClientInvocationHandler.invoke(DefaultMetaStoreClientFactory.java:67) ~[classes!/:?]
        at com.sun.proxy.$Proxy134.get_table_req(Unknown Source) ~[?:?]
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_table_req_aroundBody618(FederatedHMSHandler.java:1481) ~[classes!/:?]
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler$AjcClosure619.run(FederatedHMSHandler.java:1) ~[classes!/:?]
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[aspectjweaver-1.8.10.jar!/:1.8.10]
        at com.hotels.bdp.waggledance.metrics.MonitoredAspect.monitor(MonitoredAspect.java:57) ~[classes!/:?]
        at com.hotels.bdp.waggledance.metrics.MonitoredAspect.monitor(MonitoredAspect.java:47) ~[classes!/:?]
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_table_req_aroundBody620(FederatedHMSHandler.java:1479) ~[classes!/:?]
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler$AjcClosure621.run(FederatedHMSHandler.java:1) ~[classes!/:?]
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[aspectjweaver-1.8.10.jar!/:1.8.10]
        at com.jcabi.aspects.aj.MethodLogger.wrap(MethodLogger.java:213) ~[jcabi-aspects-0.22.6.jar!/:?]
        at com.jcabi.aspects.aj.MethodLogger.ajc$inlineAccessMethod$com_jcabi_aspects_aj_MethodLogger$com_jcabi_aspects_aj_MethodLogger$wrap(MethodLogger.java:1) ~[jcabi-aspects-0.22.6.jar!/:?]
        at com.jcabi.aspects.aj.MethodLogger.wrapMethod(MethodLogger.java:169) ~[jcabi-aspects-0.22.6.jar!/:?]
        at com.hotels.bdp.waggledance.server.FederatedHMSHandler.get_table_req(FederatedHMSHandler.java:1479) ~[classes!/:?]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_171]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_171]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_171]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_171]
        at com.hotels.bdp.waggledance.server.ExceptionWrappingHMSHandler.invoke(ExceptionWrappingHMSHandler.java:50) ~[classes!/:?]
        at com.sun.proxy.$Proxy137.get_table_req(Unknown Source) ~[?:?]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_171]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_171]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_171]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_171]
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) ~[hive-metastore-2.3.0.jar!/:2.3.0]
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) ~[hive-metastore-2.3.0.jar!/:2.3.0]
        at com.sun.proxy.$Proxy137.get_table_req(Unknown Source) ~[?:?]
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:11457) ~[hive-metastore-2.3.0.jar!/:2.3.0]
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:11441) ~[hive-metastore-2.3.0.jar!/:2.3.0]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [libthrift-0.9.3.jar!/:0.9.3]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [libthrift-0.9.3.jar!/:0.9.3]
        at org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48) [hive-metastore-2.3.0.jar!/:2.3.0]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [libthrift-0.9.3.jar!/:0.9.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

2018-07-25 23:26:14.066  INFO 28387 --- [pool-4-thread-2] c.h.b.w.c.ThriftMetastoreClient          : Closed a connection to metastore, current connections: 1
2018-07-25 23:26:14.066  INFO 28387 --- [pool-4-thread-2] c.h.b.w.c.ThriftMetastoreClient          : Closed a connection to metastore, current connections: 0
2018-07-25 23:26:14.067  INFO 28387 --- [pool-4-thread-3] c.h.b.w.m.m.MetaStoreMappingFactoryImpl  : Mapping databases with name 'primary' to metastore: thrift://localhost:9083
2018-07-25 23:26:14.105  INFO 28387 --- [pool-4-thread-3] c.h.b.w.m.m.MetaStoreMappingFactoryImpl  : Mapping databases with name 'prod' to metastore: thrift://10.10.127.175:9083
2018-07-25 23:26:14.136  INFO 28387 --- [pool-4-thread-3] c.h.b.w.c.ThriftMetastoreClient          : Trying to connect to metastore with URI thrift://localhost:9083
2018-07-25 23:26:14.138  INFO 28387 --- [pool-4-thread-3] c.h.b.w.c.ThriftMetastoreClient          : Opened a connection to metastore 'thrift://localhost:9083', total current connections to all metastores: 1
2018-07-25 23:26:14.138  INFO 28387 --- [pool-4-thread-3] c.h.b.w.c.ThriftMetastoreClient          : Connected to metastore.
2018-07-25 23:26:14.138  INFO 28387 --- [pool-4-thread-3] c.h.b.w.c.ThriftMetastoreClient          : Trying to connect to metastore with URI thrift://10.10.127.175:9083
2018-07-25 23:26:14.141  INFO 28387 --- [pool-4-thread-3] c.h.b.w.c.ThriftMetastoreClient          : Opened a connection to metastore 'thrift://10.10.127.175:9083', total current connections to all metastores: 2
2018-07-25 23:26:14.141  INFO 28387 --- [pool-4-thread-3] c.h.b.w.c.ThriftMetastoreClient          : Connected to metastore.
2018-07-25 23:26:14.150 ERROR 28387 --- [pool-4-thread-3] o.a.h.h.m.RetryingHMSHandler             : org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
        (stack trace identical to the 23:26:13.054 occurrence above)

2018-07-25 23:26:14.150 ERROR 28387 --- [pool-4-thread-3] o.a.t.ProcessFunction                    : Internal error processing get_table_req

org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
        (stack trace identical to the 23:26:13.055 occurrence above)

Fix Manifest metadata logging

From the logs:
2017-08-10T15:52:41,866 DEBUG com.hotels.bdp.waggledance.manifest.ManifestAttributes:86 - Manifest location in JARs is null
2017-08-10T15:52:41,867 DEBUG com.hotels.bdp.waggledance.manifest.ManifestAttributes:135 - Could not find manifest in location file:file:/opt/waggle-dance/service/waggle-dance-core-latest-exec.jar!/BOOT-INF/classes!//META-INF/MANIFEST.MF
2017-08-10T15:52:41,867 DEBUG com.hotels.bdp.waggledance.manifest.ManifestAttributes:92 - Manifest location on disk is null
2017-08-10T15:52:41,867 DEBUG com.hotels.bdp.waggledance.manifest.ManifestAttributes:98 - Manifest location via getResource() is jar:file:/opt/waggle-dance/service/waggle-dance-core-latest-exec.jar!/META-INF/MANIFEST.MF

This is from an RPM install. It's not clear what changed, but it would be nice if the Manifest attributes were logged again.
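The log above suggests the manifest is still reachable via `getResource()` even though the JAR and on-disk lookup strategies return null for the nested Spring Boot jar. As a minimal sketch of how attributes can be read once a manifest stream is obtained (the class name and literal manifest text below are illustrative, not Waggle Dance's actual `ManifestAttributes` code), `java.util.jar.Manifest` does the parsing:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestDemo {

    // Parse a manifest and return its main attributes. In a real application
    // the stream would typically come from a URL such as
    // getClass().getClassLoader().getResource("META-INF/MANIFEST.MF");
    // an in-memory example is used here so the sketch is self-contained.
    static Attributes mainAttributes(String manifestText) throws IOException {
        Manifest manifest = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        return manifest.getMainAttributes();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical manifest content; real values come from the built jar.
        String example = "Manifest-Version: 1.0\r\n"
                + "Implementation-Title: waggle-dance-core\r\n"
                + "Implementation-Version: 3.0.0\r\n\r\n";
        Attributes attrs = mainAttributes(example);
        System.out.println(attrs.getValue("Implementation-Title")
                + " " + attrs.getValue("Implementation-Version"));
    }
}
```

With a nested Spring Boot jar, falling back to the `getResource()` URL (as the last log line shows) and opening that stream would be one way to recover the attributes when the file-based lookups fail.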
