
presto-gateway's Introduction

NOTE: This is a legacy version of Trino Gateway. Please refer to https://github.com/trinodb/trino-gateway for active development and updates moving forward.

presto-gateway (outdated)

A load balancer / proxy / gateway for presto compute engine.

How to setup a dev environment

Step 1: Set up MySQL. Install Docker and run the command below the first time you set up the environment:

docker run -d -p 3306:3306 --name mysqldb -e MYSQL_ROOT_PASSWORD=root123 -e MYSQL_DATABASE=prestogateway mysql:5.7

On subsequent runs, start the existing mysqldb container instead:

docker start mysqldb

Now open the MySQL console and install the presto-gateway tables:

mysql -uroot -proot123 -h127.0.0.1 -Dprestogateway

Once logged in to the MySQL console, run gateway-ha-persistence.sql to populate the tables.

Build and run

Please note these steps have been verified with JDK 8 and 11. Higher versions of Java might run into unexpected issues.

Run mvn clean install to build presto-gateway.

Edit the config file and update the MySQL database information.

cd gateway-ha/target/
java -jar gateway-ha-{{VERSION}}-jar-with-dependencies.jar server ../gateway-ha-config.yml

If you encounter a Failed to connect to JDBC URL error, the cause may be that newer versions of Java disable certain SSL/TLS protocol versions, in particular TLSv1 and TLSv1.1. This causes Bad handshake errors when connecting to the MySQL server. To enable TLSv1 and TLSv1.1, open the following file in any editor (sudo access needed; the path shown is for AdoptOpenJDK 8 on macOS):

/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/lib/security/java.security

Search for jdk.tls.disabledAlgorithms. It should look something like this:

jdk.tls.disabledAlgorithms=SSLv3, TLSv1, TLSv1.1, RC4, DES, MD5withRSA, \
    DH keySize < 1024, EC keySize < 224, 3DES_EDE_CBC, anon, NULL, \
    include jdk.disabled.namedCurves

Remove TLSv1 and TLSv1.1 from the list, then repeat the steps above to build and run presto-gateway.
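The edit to jdk.tls.disabledAlgorithms can also be scripted. The following is a minimal sketch, not part of presto-gateway: the helper name enable_protocols is hypothetical, and it assumes the property value is a comma-separated list that may span lines ending in a backslash, as in the sample above.

```python
def enable_protocols(line: str, protocols=("TLSv1", "TLSv1.1")) -> str:
    """Return the property line with the given protocols removed from the list."""
    key, _, value = line.partition("=")
    # Join continuation lines (a trailing backslash) before splitting on commas.
    flat = value.replace("\\\n", " ")
    items = [item.strip() for item in flat.split(",")]
    kept = [item for item in items if item and item not in protocols]
    return f"{key}=" + ", ".join(kept)
```

The function returns the property on a single line; write the result back to java.security in place of the original entry.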

You can now access load-balanced Presto at localhost:8080. The rest of this document refers to this address as prestogateway.lyft.com.

If you see test failures while building presto-gateway or running it in an IDE, run mvn process-classes to instrument the javalite models used by the tests. See javalite-examples for more details.

Gateway API

Add or update a backend

curl -X POST 'http://localhost:8080/entity?entityType=GATEWAY_BACKEND' \
 -d '{  "name": "presto1",
        "proxyTo": "http://presto1.lyft.com",
        "active": true,
        "routingGroup": "adhoc"
    }'

curl -X POST 'http://localhost:8080/entity?entityType=GATEWAY_BACKEND' \
 -d '{  "name": "presto2",
        "proxyTo": "http://presto2.lyft.com",
        "active": true,
        "routingGroup": "adhoc"
    }'

If the backend URL is different from the proxyTo URL (for example, if they are internal vs. external hostnames), you can use the optional externalUrl field to override the link in the Active Backends page.

curl -X POST 'http://localhost:8080/entity?entityType=GATEWAY_BACKEND' \
 -d '{  "name": "presto1",
        "proxyTo": "http://presto1.lyft.com",
        "active": true,
        "routingGroup": "adhoc",
        "externalUrl": "http://presto1-external.lyft.com"
    }'

curl -X POST 'http://localhost:8080/entity?entityType=GATEWAY_BACKEND' \
 -d '{  "name": "presto2",
        "proxyTo": "http://presto2.lyft.com",
        "active": true,
        "routingGroup": "adhoc",
        "externalUrl": "http://presto2-external.lyft.com"
    }'
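The same request can be sent from a script. This is a minimal sketch using only the Python standard library; the helper names backend_payload and upsert_backend are hypothetical, and the endpoint and field names are taken from the curl examples above.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8080"  # gateway address used in the curl examples


def backend_payload(name, proxy_to, routing_group="adhoc", active=True, external_url=None):
    """Build the JSON body for POST /entity?entityType=GATEWAY_BACKEND."""
    body = {
        "name": name,
        "proxyTo": proxy_to,
        "active": active,
        "routingGroup": routing_group,
    }
    if external_url is not None:
        # externalUrl is optional; it only overrides the link shown in the UI.
        body["externalUrl"] = external_url
    return json.dumps(body)


def upsert_backend(payload: str) -> int:
    """POST the payload to the gateway and return the HTTP status code."""
    request = urllib.request.Request(
        f"{GATEWAY}/entity?entityType=GATEWAY_BACKEND",
        data=payload.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, upsert_backend(backend_payload("presto1", "http://presto1.lyft.com")) would register the first backend when a gateway is running locally.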

Get all backends behind the gateway

curl -X GET http://localhost:8080/entity/GATEWAY_BACKEND
[
    {
        "active": true,
        "name": "presto1",
        "proxyTo": "http://presto1.lyft.com",
        "routingGroup": "adhoc"
    },
    {
        "active": true,
        "name": "presto2",
        "proxyTo": "http://presto2.lyft.com",
        "routingGroup": "adhoc"
    }
]

Delete a backend from the gateway

curl -X POST -d "presto3" http://localhost:8080/gateway/backend/modify/delete

Deactivate a backend

curl -X POST http://localhost:8080/gateway/backend/deactivate/presto2

Get all active backends behind the gateway

curl -X GET http://localhost:8080/gateway/backend/active | python -m json.tool

    [{
        "active": true,
        "name": "presto1",
        "proxyTo": "http://presto1.lyft.com",
        "routingGroup": "adhoc"
    }]

Activate a backend

curl -X POST http://localhost:8080/gateway/backend/activate/presto2

Query History UI - check query plans etc.

PrestoGateway records the history of recent queries and displays links to the query details page in the respective Presto cluster. It is available at prestogateway.lyft.com.

Gateway Admin UI - add and modify backend information

The Gateway admin page is used to configure the gateway with multiple backends. Existing backend information can also be modified on the same page. It is available at prestogateway.lyft.com/entity.

Resource Groups API

The resource group and selector APIs accept an optional useSchema query parameter, which lets the gateway work with multiple Presto databases for different Presto backends. This allows a user to configure a database for every Presto backend, each with its own resource groups and selector tables. To use it, add the query parameter ?useSchema= to the request. For example, to list resource groups from the database newdatabasename:

curl -X GET http://localhost:8080/presto/resourcegroup/read/{INSERT_ID_HERE}?useSchema=newdatabasename
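When these URLs are built in code, it helps to keep the optional ID and the optional useSchema parameter separate. A minimal sketch, assuming the read endpoint also accepts the path without an ID (as the "Get existing resource group(s)" section suggests); the function name resource_group_read_url is hypothetical.

```python
from urllib.parse import urlencode


def resource_group_read_url(base, group_id=None, use_schema=None):
    """Build the read URL, adding the ID and useSchema only when given."""
    url = f"{base}/presto/resourcegroup/read"
    if group_id is not None:
        url += f"/{group_id}"
    if use_schema:
        # urlencode escapes the database name if it contains special characters.
        url += "?" + urlencode({"useSchema": use_schema})
    return url
```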

Add a resource group

To add a single resource group, specify all relevant fields in the body. Resource group id should not be specified since the database should autoincrement it.

curl -X POST http://localhost:8080/presto/resourcegroup/create \
 -d '{
        "name": "resourcegroup1",
        "softMemoryLimit": "100%",
        "maxQueued": 100,
        "softConcurrencyLimit": 100,
        "hardConcurrencyLimit": 100,
        "schedulingPolicy": null,
        "schedulingWeight": null,
        "jmxExport": null,
        "softCpuLimit": null,
        "hardCpuLimit": null,
        "parent": null,
        "environment": "test"
    }'

Get existing resource group(s)

If no resourceGroupId (type long) is specified, then all existing resource groups are fetched.

curl -X GET http://localhost:8080/presto/resourcegroup/read/{INSERT_ID_HERE}

Update a resource group

Specify all columns in the body, which will overwrite properties for the resource group with that specific resourceGroupId.

curl -X POST http://localhost:8080/presto/resourcegroup/update \
 -d '{  "resourceGroupId": 1,
        "name": "resourcegroup_updated",
        "softMemoryLimit": "80%",
        "maxQueued": 50,
        "softConcurrencyLimit": 40,
        "hardConcurrencyLimit": 60,
        "schedulingPolicy": null,
        "schedulingWeight": null,
        "jmxExport": null,
        "softCpuLimit": null,
        "hardCpuLimit": null,
        "parent": null,
        "environment": "test"
    }'

Delete a resource group

To delete a resource group, specify the corresponding resourceGroupId (type long).

curl -X POST http://localhost:8080/presto/resourcegroup/delete/{INSERT_ID_HERE}

Add a selector

To add a single selector, specify all relevant fields in the body. Resource group id should not be specified since the database should autoincrement it.

curl -X POST http://localhost:8080/presto/selector/create \
 -d '{
        "priority": 1,
        "userRegex": "selector1",
        "sourceRegex": "resourcegroup1",
        "queryType": "insert"
     }'

Get existing selector(s)

If no resourceGroupId (type long) is specified, then all existing selectors are fetched.

curl -X GET http://localhost:8080/presto/selector/read/{INSERT_ID_HERE}

Update a selector

To update a selector, the existing selector must be specified with all relevant fields under "current". The updated version of that selector is specified under "update", with all relevant fields included. If the selector under "current" does not exist, a new selector will be created with the details under "update". Both "current" and "update" must be included to update a selector.

curl -X POST http://localhost:8080/presto/selector/update \
 -d '{  "current": {
            "resourceGroupId": 1,
            "priority": 1,
            "userRegex": "selector1",
            "sourceRegex": "resourcegroup1",
            "queryType": "insert"
        },
        "update":  {
            "resourceGroupId": 1,
            "priority": 2,
            "userRegex": "selector1_updated",
            "sourceRegex": "resourcegroup1",
            "queryType": null
        }
}'
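Because both "current" and "update" must be present, it can be useful to validate the body before sending it. A minimal sketch: the helper name selector_update_payload is hypothetical, and treating all five fields from the example as required is an assumption, not something the gateway is known to enforce.

```python
import json

# Field names taken from the selector examples in this section.
SELECTOR_FIELDS = ("resourceGroupId", "priority", "userRegex", "sourceRegex", "queryType")


def selector_update_payload(current: dict, update: dict) -> str:
    """Build the body for POST /presto/selector/update, checking both halves."""
    for label, selector in (("current", current), ("update", update)):
        missing = [field for field in SELECTOR_FIELDS if field not in selector]
        if missing:
            raise ValueError(f"{label} selector is missing fields: {missing}")
    return json.dumps({"current": current, "update": update})
```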

Delete a selector

To delete a selector, specify all relevant fields in the body.

curl -X POST http://localhost:8080/presto/selector/delete \
 -d '{  "resourceGroupId": 1,
        "priority": 2,
        "userRegex": "selector1_updated",
        "sourceRegex": "resourcegroup1",
        "queryType": null
     }'

Add a global property

To add a single global property, specify all relevant fields in the body.

curl -X POST http://localhost:8080/presto/globalproperty/create \
 -d '{
        "name": "cpu_quota_period",
        "value": "1h"
     }'

Get existing global properties

If no name (type String) is specified, then all existing global properties are fetched.

curl -X GET http://localhost:8080/presto/globalproperty/read/{INSERT_NAME_HERE}

Update a global property

Specify all columns in the body, which will overwrite properties for the global property with that specific name.

curl -X POST http://localhost:8080/presto/globalproperty/update \
 -d '{
        "name": "cpu_quota_period",
        "value": "2h"
     }'

Delete a global property

To delete a global property, specify the corresponding name (type String).

curl -X POST http://localhost:8080/presto/globalproperty/delete/{INSERT_NAME_HERE}

Graceful shutdown

Presto gateway supports graceful shutdown of Presto clusters. Even when a cluster is deactivated, the state of any submitted query can still be retrieved using its query ID.

To gracefully shut down a Presto cluster without losing queries:

  1. Deactivate the backend. This prevents any new incoming queries from being assigned to it.
  2. Poll the Presto backend coordinator URL until the queued query count and the running query count both reach 0.
  3. Terminate the Presto coordinator and worker Java processes.

To gracefully shut down a single worker process, see this for the operations.
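The polling in step 2 of the cluster shutdown can be sketched as a small loop. This is an illustration, not gateway code: wait_until_drained and the fetch_stats callable are hypothetical, and the key names queuedQueries and runningQueries are an assumption about what the coordinator's stats response contains, so adjust them to the payload your coordinator actually returns.

```python
import time
from typing import Callable, Mapping


def wait_until_drained(
    fetch_stats: Callable[[], Mapping[str, int]],
    poll_seconds: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no queries are queued or running; False if max_polls is reached."""
    for _ in range(max_polls):
        stats = fetch_stats()
        if stats["queuedQueries"] == 0 and stats["runningQueries"] == 0:
            return True
        sleep(poll_seconds)
    return False
```

Deactivate the backend first (step 1) so the counts can only go down, and terminate the processes (step 3) only after the function returns True.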

Routing Rules Engine

By default, presto-gateway reads the X-Trino-Routing-Group request header to route requests. If this header is not specified, requests are sent to the default routing group (adhoc).

The routing rules engine lets you write custom logic that routes requests based on request information, such as any of the request headers. Routing rules are kept in a configuration file, separate from the presto-gateway application code, which allows rules to be changed dynamically.

Defining your routing rules

To express and fire routing rules, we use the easy-rules engine. These rules should be stored in a YAML file. Rules consist of a name, description, condition, and list of actions. If the condition of a particular rule evaluates to true, its actions are fired.

---
name: "airflow"
description: "if query from airflow, route to etl group"
condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\""
actions:
  - "result.put(\"routingGroup\", \"etl\")"
---
name: "airflow special"
description: "if query from airflow with special label, route to etl-special group"
condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\" && request.getHeader(\"X-Trino-Client-Tags\") contains \"label=special\""
actions:
  - "result.put(\"routingGroup\", \"etl-special\")"

In the condition, you can access the methods of an HttpServletRequest object called request. There should be at least one action of the form result.put(\"routingGroup\", \"foo\"), which says that if a request satisfies the condition, it should be routed to foo.

The condition and actions are written in MVEL, an expression language with Java-like syntax. In most cases, users can write their conditions/actions in Java syntax and expect it to work. There are some MVEL-specific operators that could be useful though. For example, instead of doing a null-check before accessing the String.contains method like this:

condition: "request.getHeader(\"X-Trino-Client-Tags\") != null && request.getHeader(\"X-Trino-Client-Tags\").contains(\"label=foo\")"

You can use the contains operator:

condition: "request.getHeader(\"X-Trino-Client-Tags\") contains \"label=foo\""

If no rules match, the request is routed to adhoc.

Execution of Rules

All rules whose conditions are satisfied will fire. For example, with the "airflow" and "airflow special" rules given above, a query with source airflow and label special satisfies both rules. The routingGroup is set to etl and then to etl-special because of the order in which the rules are defined. If we swapped the order of the rules, we could end up with etl instead, which is undesirable.

One could solve this by writing the rules such that they're atomic (any query will match exactly one rule). For example, we can change the first rule to:

---
name: "airflow"
description: "if query from airflow, route to etl group"
condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\" && request.getHeader(\"X-Trino-Client-Tags\") == null"
actions:
  - "result.put(\"routingGroup\", \"etl\")"
---

This could be hard to maintain as we add more rules. For better control over the execution of rules, we can use rule priorities and composite rules. With priorities, composite rules, and the constructs that MVEL supports, you should be able to express most routing logic.

Rule Priority

We can assign an integer priority to a rule. The lower the integer, the earlier the rule fires. If no priority is specified, it defaults to INT_MAX. We can add priorities to our airflow and airflow special rules like so:

---
name: "airflow"
description: "if query from airflow, route to etl group"
priority: 0
condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\""
actions:
  - "result.put(\"routingGroup\", \"etl\")"
---
name: "airflow special"
description: "if query from airflow with special label, route to etl-special group"
priority: 1
condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\" && request.getHeader(\"X-Trino-Client-Tags\") contains \"label=special\""
actions:
  - "result.put(\"routingGroup\", \"etl-special\")"

Note that both rules will still fire. The difference is that we've guaranteed that the first rule (priority 0) is fired before the second rule (priority 1). Thus routingGroup is set to etl and then to etl-special, so the routingGroup will always be etl-special in the end.
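The "every matching rule fires, in priority order" behavior can be modeled in a few lines. This is a plain-Python illustration of the semantics described here, not the easy-rules API; fire_all, tags, and the dictionary-based rule format are made up for the example.

```python
def fire_all(rules, headers):
    """Fire every rule whose condition holds, lowest priority value first."""
    result = {}
    for rule in sorted(rules, key=lambda rule: rule["priority"]):
        if rule["condition"](headers):
            rule["action"](result)
    return result


def tags(headers):
    # Treat a missing header as empty so the membership test never fails on None.
    return headers.get("X-Trino-Client-Tags") or ""


# Deliberately listed in the "wrong" order: priority decides, not position.
RULES = [
    {
        "name": "airflow special",
        "priority": 1,
        "condition": lambda h: h.get("X-Trino-Source") == "airflow" and "label=special" in tags(h),
        "action": lambda result: result.update(routingGroup="etl-special"),
    },
    {
        "name": "airflow",
        "priority": 0,
        "condition": lambda h: h.get("X-Trino-Source") == "airflow",
        "action": lambda result: result.update(routingGroup="etl"),
    },
]

special = {"X-Trino-Source": "airflow", "X-Trino-Client-Tags": "label=special"}
print(fire_all(RULES, special))  # {'routingGroup': 'etl-special'}
```

Both rules fire for the special request; the rule with priority 1 runs last, so its value is the one that remains.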

Above, the more specific rules have a lower priority (a larger priority value) because we want them to be the last to set routingGroup. This is a little counterintuitive. To further control the execution of rules, for example to have only one rule fire, we can use composite rules.

Composite Rules

First, please refer to easy-rule composite rules docs: https://github.com/j-easy/easy-rules/wiki/defining-rules#composite-rules

Above, we saw how to control the order of rule execution using priorities. We can go further and fire only the first matching rule (the one with the highest priority) while ignoring the rest. We can use ActivationRuleGroup to achieve this.

---
name: "airflow rule group"
description: "routing rules for query from airflow"
compositeRuleType: "ActivationRuleGroup"
composingRules:
  - name: "airflow special"
    description: "if query from airflow with special label, route to etl-special group"
    priority: 0
    condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\" && request.getHeader(\"X-Trino-Client-Tags\") contains \"label=special\""
    actions:
      - "result.put(\"routingGroup\", \"etl-special\")"
  - name: "airflow"
    description: "if query from airflow, route to etl group"
    priority: 1
    condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\""
    actions:
      - "result.put(\"routingGroup\", \"etl\")"

Note that the priorities have switched. The more specific rule has a higher priority, since we want it to be fired first. A query coming from airflow with special label is matched to the "airflow special" rule first, since it's higher priority, and the second rule is ignored. A query coming from airflow with no labels does not match the first rule, and is then tested and matched to the second rule.
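The first-match behavior of the rule group can be modeled the same way as an ordinary loop with an early exit. Again, this is a plain-Python illustration of the semantics described above rather than the easy-rules API; fire_first_match and the rule dictionaries are hypothetical.

```python
def fire_first_match(rules, headers):
    """Fire only the first matching rule, checking lowest priority value first."""
    result = {}
    for rule in sorted(rules, key=lambda rule: rule["priority"]):
        if rule["condition"](headers):
            rule["action"](result)
            break  # the remaining rules in the group are ignored
    return result


def client_tags(headers):
    # Treat a missing header as empty so the membership test never fails on None.
    return headers.get("X-Trino-Client-Tags") or ""


GROUP = [
    {
        "name": "airflow special",
        "priority": 0,
        "condition": lambda h: h.get("X-Trino-Source") == "airflow" and "label=special" in client_tags(h),
        "action": lambda result: result.update(routingGroup="etl-special"),
    },
    {
        "name": "airflow",
        "priority": 1,
        "condition": lambda h: h.get("X-Trino-Source") == "airflow",
        "action": lambda result: result.update(routingGroup="etl"),
    },
]
```

With this group, a request from airflow carrying label=special stops at the first rule, while a plain airflow request falls through to the second.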

We can also use ConditionalRuleGroup and ActivationRuleGroup to implement an if/else workflow. The following logic in pseudocode:

if source == "airflow":
  if clientTags["label"] == "foo":
    return "etl-foo"
  else if clientTags["label"] == "bar":
    return "etl-bar"
  else:
    return "etl"

can be implemented with these rules:

name: "airflow rule group"
description: "routing rules for query from airflow"
compositeRuleType: "ConditionalRuleGroup"
composingRules:
  - name: "main condition"
    description: "source is airflow"
    priority: 0 # rule with the highest priority acts as main condition
    condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\""
    actions:
      - ""
  - name: "airflow subrules"
    compositeRuleType: "ActivationRuleGroup" # use ActivationRuleGroup to simulate if/else
    composingRules:
      - name: "label foo"
        description: "label client tag is foo"
        priority: 0
        condition: "request.getHeader(\"X-Trino-Client-Tags\") contains \"label=foo\""
        actions:
          - "result.put(\"routingGroup\", \"etl-foo\")"
      - name: "label bar"
        description: "label client tag is bar"
        priority: 0
        condition: "request.getHeader(\"X-Trino-Client-Tags\") contains \"label=bar\""
        actions:
          - "result.put(\"routingGroup\", \"etl-bar\")"
      - name: "airflow default"
        description: "airflow queries default to etl"
        condition: "true"
        actions:
          - "result.put(\"routingGroup\", \"etl\")"

If statements (MVEL Flow Control)

Above, we saw how we can use ConditionalRuleGroup and ActivationRuleGroup to implement an if/else workflow. We could also take advantage of the fact that MVEL supports if statements and other flow control (loops, etc.). The following logic in pseudocode:

if source == "airflow":
  if clientTags["label"] == "foo":
    return "etl-foo"
  else if clientTags["label"] == "bar":
    return "etl-bar"
  else:
    return "etl"

can be implemented with this rule:

---
name: "airflow rules"
description: "if query from airflow"
condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\""
actions:
  - "if (request.getHeader(\"X-Trino-Client-Tags\") contains \"label=foo\") {
      result.put(\"routingGroup\", \"etl-foo\")
    }
    else if (request.getHeader(\"X-Trino-Client-Tags\") contains \"label=bar\") {
      result.put(\"routingGroup\", \"etl-bar\")
    }
    else {
      result.put(\"routingGroup\", \"etl\")
    }"

Enabling routing rules engine

To enable the routing rules engine, find the following lines in gateway-ha-config.yml. Set rulesEngineEnabled to true and rulesConfigPath to the path of your rules config file.

routingRules:
  rulesEngineEnabled: true
  rulesConfigPath: "src/test/resources/rules/routing_rules.yml" # replace with path to your rules config file

Contributing

Want to help build Presto Gateway? Check out our contributing documentation

References ✨

Lyft

Pinterest

Zomato

Shopify

Electronic Arts

{{Your org here}}

presto-gateway's People

Contributors

akhurana001, alexey-fin, amitds1997, axelsteingrimsson, bhuwanchopra, butterflysky, chets25, derekheldtwerle, electrum, endoplasmicr, gustavoatt, hamlet-lee, hecris, hereisharish, hitejinder, jchoi614, keith, nishantrayan, ogrockimatthew, pluies, puneetjaiswal, reverson, riteshvaryani, rohit-menon, rongfengliang, ssanthanam185


presto-gateway's Issues

implement TLS/SSL support in the gw-server

We should add the following config under GwConfig to enable SSL/TLS:

  ssl: true
  keystorePath: /path/to/keystore.jks
  keystorePass: keystorePass
  truststorePath: /path/to/truststore.jks
  truststorePass: truststorePass

Make presto-gateway HA

Presto gateway currently supports a multi-node setup, but adding the following would make it truly HA:

  1. Persisting cluster state - an HAGatewayManager could persist active backend state to a global cache/DB.

  2. HA History Manager - currently the last 2k queries are kept in a ring buffer on the gateway node. We should offload this history to a global cache/DB.

  3. Query ID to backend mapping - along with keeping this mapping in a local cache, we should back it up in a global cache/DB so that other gateway nodes don't have to query all Presto backends/clusters on a cache miss.

Missing null query check in query response

For an incompatible client, Trino/Presto throws an exception:

Basic authentication or X-Trino-User must be sent // Trino
Basic authentication or X-Presto-User must be sent // Presto

and sends no query ID information.

We currently submit the query for addition to Query History without checking whether the query ID is null, which leads to an exception:

java.sql.SQLException: Field 'query_id' doesn't have a default value
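The missing check amounts to a guard in front of the history write. The gateway itself is written in Java; the following Python sketch only illustrates the logic, and record_query, the history list, and the queryId key are hypothetical stand-ins for the gateway's own types.

```python
def record_query(history: list, detail: dict) -> bool:
    """Append the query to history only when the backend returned a query ID."""
    if not detail.get("queryId"):
        # Error responses (for example, a missing user header) carry no query ID.
        return False
    history.append(detail)
    return True
```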

Random "Error opening rules configuration file" when using Routing Rule Engine

I'm getting random "Error opening rules configuration file" when using Routing Rule Engine. It only happens when I have 2 or more clients sending requests to presto-gateway.

ERROR [2022-04-04 21:03:57,026] com.lyft.data.gateway.ha.router.RoutingGroupSelector$Logger: Error opening rules configuration file, using routing group header as default.
! org.yaml.snakeyaml.parser.ParserException: while parsing a block node
!  in 'reader', line 13, column 1:
!     
!     ^
! expected the node content, but found '<block end>'
!  in 'reader', line 13, column 1:
!     
!     ^
! 
! at org.yaml.snakeyaml.parser.ParserImpl.parseNode(ParserImpl.java:482)
! at org.yaml.snakeyaml.parser.ParserImpl.access$1300(ParserImpl.java:117)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockNode.produce(ParserImpl.java:359)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockSequenceEntry.produce(ParserImpl.java:506)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockSequenceFirstEntry.produce(ParserImpl.java:496)
! at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:158)
! at org.yaml.snakeyaml.parser.ParserImpl.checkEvent(ParserImpl.java:148)
! at org.yaml.snakeyaml.composer.Composer.checkNode(Composer.java:78)
! at org.yaml.snakeyaml.constructor.BaseConstructor.checkData(BaseConstructor.java:123)
! at org.yaml.snakeyaml.Yaml$1.hasNext(Yaml.java:489)
! at org.jeasy.rules.support.reader.YamlRuleDefinitionReader.loadRules(YamlRuleDefinitionReader.java:71)
! at org.jeasy.rules.support.reader.AbstractRuleDefinitionReader.read(AbstractRuleDefinitionReader.java:43)
! at org.jeasy.rules.mvel.MVELRuleFactory.createRules(MVELRuleFactory.java:100)
! at com.lyft.data.gateway.ha.router.RoutingGroupSelector.lambda$byRoutingRulesEngine$1(RoutingGroupSelector.java:42)
! at com.lyft.data.gateway.ha.handler.QueryIdCachingProxyHandler.rewriteTarget(QueryIdCachingProxyHandler.java:123)
! at com.lyft.data.proxyserver.ProxyServletImpl.rewriteTarget(ProxyServletImpl.java:65)
...

Using:
java version "1.8.0_202"
Python 3.7.13
trino-python-client 0.3.11.0

rules.yaml:

---
name: 'airflow'
description: 'if query from airflow, route to etl group'
condition: 'request.getHeader("X-Trino-Source") == "trino-python-client"'
actions:
  - 'result.put("routingGroup", "adhoc")'

python client script:

#!/usr/bin/python

from trino.dbapi import connect
import time

conn = connect(
    host='localhost',
    port=8090,
    user='user1',
    catalog='system',
    schema='runtime',
)
cur = conn.cursor()

while True:
    cur.execute('SELECT * from nodes limit 1')
    rows = cur.fetchall()
    print(rows)

Other Observations:

  • When the error is raised, presto-gateway routes request to default RoutingGroup "adhoc". If RoutingGroup "adhoc" does not exist, client request fails.
  • CPU usage keeps increasing over time on the JVM. I stopped my python clients after 30mins, but CPU usage was still at 25%. If I leave it running for days, it will eventually eat up all my CPU cores.

After further investigation, I narrowed it down to the MVELRuleFactory class. I'm not a developer, so I can't really explain what is happening, but the following change fixes the issue.

--- a/gateway-ha/src/main/java/com/lyft/data/gateway/ha/router/RoutingGroupSelector.java
+++ b/gateway-ha/src/main/java/com/lyft/data/gateway/ha/router/RoutingGroupSelector.java
@@ -34,11 +34,11 @@ public interface RoutingGroupSelector {
    * to determine the right routing group.
    */
   static RoutingGroupSelector byRoutingRulesEngine(String rulesConfigPath) {
-    RulesEngine rulesEngine = new DefaultRulesEngine();
-    MVELRuleFactory ruleFactory = new MVELRuleFactory(new YamlRuleDefinitionReader());
 
     return request -> {
       try {
+        RulesEngine rulesEngine = new DefaultRulesEngine();
+        MVELRuleFactory ruleFactory = new MVELRuleFactory(new YamlRuleDefinitionReader());
         Rules rules = ruleFactory.createRules(
             new FileReader(rulesConfigPath));
         Facts facts = new Facts();

With this change, it ran for hours without any errors and CPU usage was stable.
If anyone can provide more input, it would be appreciated.

Getting '401 Unauthorized' in ActiveClusterMonitor

After registering a trino backend, I'm able to make queries via presto-gateway but I'm getting these messages in the logs:
WARN [2022-04-25 06:00:24,403] com.lyft.data.gateway.ha.clustermonitor.ActiveClusterMonitor: Received non 200 response, response code: 401

I'm running trino version 371 with no UI authentication. The trino admin UI asks for a username (any username works) and generates a JWT auth token. This token is missing when we are making the API call to cluster from presto-gateway to get the cluster health information and hence we're getting 401.

Am I missing something here?

com.h2database:h2 XML External Entity (XXE) Injection

We put the project through Snyk, and it reported a vulnerability. Can we take a look at it?

Introduced through
com.h2database:[email protected] and org.javalite:[email protected]
Exploit maturity: PROOF OF CONCEPT

Detailed paths
Introduced through: com.lyft.data:[email protected] › com.h2database:[email protected]
Fix: No remediation path available.
Introduced through: com.lyft.data:[email protected] › org.javalite:[email protected] › com.h2database:[email protected]
Fix: No remediation path available.
Overview
com.h2database:h2 is a database engine

Affected versions of this package are vulnerable to XML External Entity (XXE) Injection via the org.h2.jdbc.JdbcSQLXML class object, when it receives parsed string data from org.h2.jdbc.JdbcResultSet.getSQLXML() method. If it executes the getSource() method when the parameter is DOMSource.class it will trigger the vulnerability.

How to debug / start this project in IntelliJ IDEA

I want to use the start() method of com.lyft.data.gateway.ha.ActiveClusterMonitor. I configured this in the config file.
I added activeClusterMonitor = new ActiveClusterMonitor();
to com.lyft.data.gateway.ha.module.HaGatewayProviderModule
and exposed it like this:

  @Provides
  @Singleton
  public ActiveClusterMonitor getActiveClusterMonitor() {
    return activeClusterMonitor;
  }

In com.lyft.data.gateway.ha.router.RoutingManager, I inject activeClusterMonitor like this:

@Inject ActiveClusterMonitor activeClusterMonitor

In the com.lyft.data.gateway.ha.router.RoutingManager.provideAdhocBackend method, I use the activeClusterMonitor's Map<String, Boolean>, which contains the health status of each backend.
I then get a NullPointerException in com.lyft.data.gateway.ha.ActiveClusterMonitor at:

List<ProxyBackendConfiguration> activeClusters =
                  gatewayBackendManager.getAllActiveBackends();

This gatewayBackendManager is null.
I want to find out why, so I plan to debug this project in IntelliJ IDEA, but I don't know how to do that. Please help, thanks.

QueryIdCachingProxyHandler.java extractQueryIdIfPresent logic error

The extractQueryIdIfPresent method in QueryIdCachingProxyHandler.java can extract the wrong ID when the statement is in a different state, which causes queries to be routed incorrectly.
For example:

/v1/statement/queued/20190823_150041_00020_bm7k5/xfe619af6efe0400296a6f29f80ce67ab/1

The method should do more checks on /v1/statement/... paths to handle the different states.
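One way to make the extraction independent of the state segment is to look for the path segment that is shaped like a query ID rather than relying on its position. The gateway is written in Java; this Python sketch only illustrates the idea. The function name extract_query_id is hypothetical, and the ID pattern is an assumption inferred from the example ID in this issue.

```python
import re
from typing import Optional

# Assumed shape, inferred from 20190823_150041_00020_bm7k5:
# date, time, sequence number, and a five-character suffix.
QUERY_ID = re.compile(r"\d{8}_\d{6}_\d{5}_[a-z0-9]{5}")


def extract_query_id(path: str) -> Optional[str]:
    """Return the first path segment that looks like a query ID, if any."""
    for segment in path.split("/"):
        if QUERY_ID.fullmatch(segment):
            return segment
    return None
```

With this approach, the queued path shown above and a path without the state segment both yield the same ID, and the slug that follows the ID is never mistaken for it.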

Add support for Trino

PrestoSQL has been rebranded as Trino. Starting with version 351, client protocol headers have been renamed to start with X-Trino instead of X-Presto (source).

Impact

This does not make Presto Gateway incompatible with the newer versions of Trino, since Presto Gateway simply forwards the client headers to the backend server. So one will be fine as long as one uses a client library that sends the X-Trino-User header in its requests.

However, X-Presto-Source and X-Presto-User are currently used to gather the user and source, which are used for load balancing and query history maintenance. Because these headers are not present in requests from newer clients, no information about the user and the source of the query is recorded.

So, to support newer versions of Trino, we have to

  1. Check if X-Presto-User and X-Presto-Source headers are present in the request.
  2. If not, check if X-Trino-User or X-Trino-Source headers are present.
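The two-step lookup can be sketched as follows. The gateway is written in Java, so this Python version only illustrates the fallback order; user_and_source is a hypothetical name, and headers is assumed to be a plain dictionary keyed by the exact header names (real HTTP header lookups are case-insensitive).

```python
def user_and_source(headers: dict):
    """Return (user, source), preferring X-Presto-* and falling back to X-Trino-*."""

    def first(*names):
        for name in names:
            value = headers.get(name)
            if value:
                return value
        return None

    user = first("X-Presto-User", "X-Trino-User")
    source = first("X-Presto-Source", "X-Trino-Source")
    return user, source
```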

Add transaction support in presto-gateway

At present, the gateway performs routing based on query ID. This issue proposes extending the routing state to include the transaction ID as well.

How transactions are set:

  1. Client sends start transaction query
  2. Backend sends response as
{
queryId: "xxx",
nextUri: "http://localhost:8080/xxx/abc/1"
.
.
}
  3. Now client sends read request on nextUri http://localhost:8080/xxx/abc/1
  4. Server sets X-Presto-Started-Transaction-Id header in response.
  5. Client sets X-Presto-Transaction-Id in subsequent queries till rollback or commit command is sent.
  6. When rollback or commit is sent by client, server adds X-Presto-Clear-Transaction-Id=true in response header and client clears the transaction from its session.
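
A minimal sketch of how the gateway could track this flow, assuming it keeps an in-memory map from transaction ID to backend. The class TransactionRoutingTable and its methods are hypothetical; only the three header names come from the steps above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class TransactionRoutingTable {
    private final Map<String, String> backendByTxnId = new ConcurrentHashMap<>();

    // Call for every backend response. requestTxnId is the value of
    // X-Presto-Transaction-Id on the request that produced it (may be null).
    void onResponse(String backend, String requestTxnId,
                    Function<String, String> responseHeader) {
        String started = responseHeader.apply("X-Presto-Started-Transaction-Id");
        if (started != null) {
            backendByTxnId.put(started, backend);
        }
        String clear = responseHeader.apply("X-Presto-Clear-Transaction-Id");
        if (requestTxnId != null && "true".equalsIgnoreCase(clear)) {
            backendByTxnId.remove(requestTxnId);
        }
    }

    // Returns the backend that owns the request's transaction, if known.
    Optional<String> backendFor(Function<String, String> requestHeader) {
        String txnId = requestHeader.apply("X-Presto-Transaction-Id");
        return txnId == null
                ? Optional.empty()
                : Optional.ofNullable(backendByTxnId.get(txnId));
    }
}
```

When backendFor returns empty, the request would fall through to the existing query-ID-based routing.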

DBeaver support or guide to connect using datagrip

Hi all, thanks for this wonderful project.
We would like to know if the gateway supports the DBeaver client (via the presto-jdbc driver).
Also, is there a guide on how to connect to the gateway using clients (for example DataGrip or Superset), or a Slack/Google channel to chat?
Thanks!

Calling the "add selector" API throws an exception

request body

{
    "priority": 1,
    "userRegex": "selector1",
    "sourceRegex": "rs1",
    "queryType": "select"
}

backend exception

 Causing: org.javalite.activejdbc.DBException: java.sql.SQLIntegrityConstraintViolationException: Cannot add or update a child row: a foreign key constraint fails (`prestogateway`.`selectors`, CONSTRAINT `selectors_ibfk_1` FOREIGN KEY (`resource_group_id`) REFERENCES `resource_groups` (`resource_group_id`)), query: INSERT INTO selectors (priority, query_type, resource_group_id, source_regex, user_regex) VALUES (?, ?, ?, ?, ?), params: 1, select, 0, rs1, selector1
! at org.javalite.activejdbc.DB.exec(DB.java:633)
! at org.javalite.activejdbc.Model.insert(Model.java:2816)
! at com.lyft.data.gateway.ha.persistence.dao.Selectors.create(Selectors.java:66)
! at com.lyft.data.gateway.ha.router.HaResourceGroupsManager.createSelector(HaResourceGroupsManager.java:133)
! at com.lyft.data.gateway.ha.resource.PrestoResource.createSelector(PrestoResource.java:109)
! at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
! at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
! at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
! at java.lang.reflect.Method.invoke(Method.java:498)
! at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
! at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
! at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
! at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
! at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
! at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
! at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
! at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
! at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
! at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
! at io.dropwizard.jetty.NonblockingServletHolder.handle(NonblockingServletHolder.java:49)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
! at io.dropwizard.servlets.ThreadNameFilter.doFilter(ThreadNameFilter.java:35)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
! at io.dropwizard.jersey.filter.AllowedMethodsFilter.handle(AllowedMethodsFilter.java:45)
! at io.dropwizard.jersey.filter.AllowedMethodsFilter.doFilter(AllowedMethodsFilter.java:39)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
! at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
! at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
! at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
! at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
! at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
! at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
! at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
! at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
! at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
! at com.codahale.metrics.jetty9.InstrumentedHandler.handle(InstrumentedHandler.java:239)
! at io.dropwizard.jetty.RoutingHandler.handle(RoutingHandler.java:52)
! at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:666)
! at io.dropwizard.jetty.BiDiGzipHandler.handle(BiDiGzipHandler.java:67)
! at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:56)
! at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)
! at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
! at org.eclipse.jetty.server.Server.handle(Server.java:531)
! at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
! at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
! at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
! at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
! at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
! at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:754)
! at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:672)
! at java.lang.Thread.run(Thread.java:748)

My guess at the reason

The selector's resource_group_id property is 0, but no row with resource_group_id=0 exists in the database. In the resource_groups table, resource_group_id starts at 1.

Cannot extract queryId correctly in code, is that a bug?

Hi, thanks for your repo, it gave me a lot of inspiration.
But I found a bug. This is the code segment:

protected String extractQueryIdIfPresent(HttpServletRequest request) {

It uses an overloaded method, extractQueryIdIfPresent(path, queryParams), and the test case TestQueryIdCachingProxyHandler tests extractQueryIdIfPresent(path, queryParams).

But if the request URL is "/ui/query.html?20200416_160256_03078_6b4yt", path will be "/ui/query.html" and queryParams will be "20200416_160256_03078_6b4yt". In this case the queryId cannot be extracted correctly and we get null.

The logic is not the same as in the test case:

String queryId = QueryIdCachingProxyHandler.extractQueryIdIfPresent(path, null);

I think it's a bug, isn't it?
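
As an illustration of the behavior being asked for, here is a sketch that looks for a query ID in the path first and then falls back to the query string. It is not the project's implementation: the class UiQueryIdExtractor and the query ID pattern are assumptions based on the example URL above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UiQueryIdExtractor {
    // Assumed query ID shape, e.g. 20200416_160256_03078_6b4yt
    private static final Pattern QUERY_ID =
            Pattern.compile("\\d{8}_\\d{6}_\\d{5}_[a-z0-9]{5}");

    // Returns the first query ID found in the path, otherwise in the
    // query string, otherwise null.
    static String extract(String path, String queryParams) {
        String fromPath = find(path);
        return fromPath != null ? fromPath : find(queryParams);
    }

    private static String find(String text) {
        if (text == null) {
            return null;
        }
        Matcher matcher = QUERY_ID.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }
}
```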

What is the use of "jmxExport" in ResourceGroup

Can we have an example for the field jmxExport in the ResourceGroup table?

I see that it is a boolean flag denoting whether JMX is enabled for the resource group. However, it is not referenced elsewhere in the package, and the JMX endpoint for obtaining metrics also seems to be missing.

Build docker image

It would be useful to configure a Docker build for this repo and publish the image to hub.docker.com or some other public registry.

I've got a Dockerfile I'm using myself to build this, I'll be happy to raise a PR with this and the build settings if you think this can be useful.

Include prometheus metrics

Hi there

Is there an intention to include Prometheus support in presto-gateway? It would make it easier to monitor things like the number of active backends and their status, requests per second/minute, uptime, and so on.

Thank you

Presto cluster stats broken when redirecting from the query UI

When a user navigates in this flow:

  1. Go to the presto-gateway UI - http://presto-gateway
  2. Click on a query, which redirects to the presto query UI - http://presto-gateway/ui/query.html
  3. Click on the presto logo on the top left, which redirects to the presto cluster UI - http://presto-gateway/ui,

the cluster view is broken with the v1/cluster API returning 404. (Image attached below)

This is because the v1/cluster endpoint is currently not whitelisted.
image

Add pluggable routing rule engine

We should supply a dummy routing rule engine that can be plugged into Routing ProxyHandler.
A user should be able to override this with a custom rule engine.
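
A rough sketch of what such a plug-in point could look like. Everything here is hypothetical (the interface RoutingRuleEngine, the two implementations, and the routing group names "adhoc" and "scheduled"); only the X-Presto-Source header name appears elsewhere on this page.

```java
import java.util.Map;

// Hypothetical extension point: maps request headers to a routing group.
interface RoutingRuleEngine {
    String route(Map<String, String> requestHeaders);
}

// Dummy engine: sends every request to one default group.
final class DummyRoutingRuleEngine implements RoutingRuleEngine {
    private final String defaultGroup;

    DummyRoutingRuleEngine(String defaultGroup) {
        this.defaultGroup = defaultGroup;
    }

    @Override
    public String route(Map<String, String> requestHeaders) {
        return defaultGroup;
    }
}

// Example of a custom override: route by the X-Presto-Source header.
final class SourceBasedRuleEngine implements RoutingRuleEngine {
    @Override
    public String route(Map<String, String> requestHeaders) {
        String source = requestHeaders.get("X-Presto-Source");
        return "airflow".equals(source) ? "scheduled" : "adhoc";
    }
}
```

The proxy handler would hold a RoutingRuleEngine reference, defaulting to the dummy engine unless a custom one is configured.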

Please help me understand how resourcegroup works here

Hi all,

I understand resource groups in Presto. But after reviewing the code of presto-gateway, I can only see the CRUD API for resource groups and selectors. I didn't find any clue about how they work in presto-gateway, so I am confused. Can anyone give me a hint? Thanks.

Add quartz scheduler app in Presto-gateway to schedule activating/deactivating backend presto clusters

Add a quartz scheduler app in Presto-gateway to make it easy to schedule activating/deactivating backends.

Currently we schedule presto clusters to go up/down through AWS's ScheduledActions feature. A few minutes before a presto cluster is scheduled to go down, we deactivate it on presto-gateway through an external cron.

With quartz scheduler app we can remove the dependency on external cron.
And we can add activate/deactivate schedule as part of BackendProxyServer config.

SSL support

It's not obvious whether presto-gateway currently supports SSL-enabled Presto or not. Because presto-gateway needs to parse the query, it needs to terminate the client's SSL connection itself before forwarding the request to a Presto coordinator.

I suppose one cannot simply add an SSL backend and expect it to work.

Ehcache Memory Leak

Problem:
The memory usage of presto-gateway continuously increases when activeJDBC is used. This is causing the need for regular restarts of our presto-gateway instances to free the heap.

Solution:
After doing some research we found that this is likely due to the older ehcache version 2 being used by activeJDBC. (https://groups.google.com/forum/#!topic/ehcache-users/SelySlrLp18)

This is supported by a heap dump which shows the majority of the heap being used by 63,000 unreleased instances of a statistics object from ehcache.

I have a pull request ready to merge in to fix this issue.


presto_cluster_state.json file not found

I am trying this repo on my laptop. When I ran it for the first time, I got this error:

ERROR [2019-07-10 23:06:30,233] com.lyft.data.gateway.router.impl.GatewayBackendManagerImpl: Could not read previous backend cluster state
! java.io.FileNotFoundException: /var/log/prestoproxy/cache/presto_cluster_state.json (No such file or directory)
!

Backend external URL is not persisted

When a new backend is added, externalUrl is not persisted in the database. When the backend is used, the proxyTo field is used instead of externalUrl. Persisting externalUrl in the database should fix the problem.

presto-gateway support for low-latency HA

We use presto-gateway as a proxy in front of the Presto cluster. Presto-gateway stores the mapping between query_id and next_url in memory.

If presto-gateway goes down, the mapping between query_id and next_url is lost. Although the mapping can be recovered from MySQL, that is too slow.

We deploy presto-gateway on k8s, where it is common for presto-gateway instances to restart. So we store the mapping in Redis, which makes presto-gateway a stateless service.

There are some advantages:

  • Horizontal expansion
  • Upgrades of presto-gateway are transparent to users
  • Tolerates N-1 instance failures

I think this feature can make presto-gateway highly available. I would like to contribute this feature to presto-gateway.
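
To illustrate the idea, the mapping could sit behind a small store interface so the gateway does not care where the state lives. This is a sketch under assumptions: the names QueryRoutingStore and InMemoryQueryRoutingStore are hypothetical, and the in-memory version below only stands in for a Redis-backed implementation of the same interface, which is not shown.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over the query_id -> next_url mapping.
interface QueryRoutingStore {
    void put(String queryId, String nextUrl);

    Optional<String> get(String queryId);
}

// In-memory stand-in; state is lost when the process restarts.
// A Redis-backed class implementing the same interface would keep
// the gateway instances themselves stateless.
final class InMemoryQueryRoutingStore implements QueryRoutingStore {
    private final Map<String, String> nextUrlByQueryId = new ConcurrentHashMap<>();

    @Override
    public void put(String queryId, String nextUrl) {
        nextUrlByQueryId.put(queryId, nextUrl);
    }

    @Override
    public Optional<String> get(String queryId) {
        return Optional.ofNullable(nextUrlByQueryId.get(queryId));
    }
}
```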
