Routing rules¶
Trino Gateway includes a routing rules engine.
By default, Trino Gateway reads the X-Trino-Routing-Group
request header to
route requests. If this header is not specified, requests are sent to the
default routing group called adhoc
.
The routing rules engine feature enables you to either write custom logic to route requests based on the request info such as any of the request headers, or set a URL address to make an HTTP POST request and route based on the returned result.
Routing rules are defined in a configuration file or implemented in separate, custom service application. The connection to the separate service is configured as a URL. It can implement any dynamic rule changes or other behavior.
Enabling the routing rules engine¶
To enable the routing rules engine, find the following lines in
gateway-ha-config.yml
:
- Set
rulesEngineEnabled
totrue
, thenrulesType
asFILE
orEXTERNAL
. - If you set
rulesType: FILE
, then setrulesConfigPath
to the path to your rules config file. - If you set
rulesType: EXTERNAL
, setrulesExternalConfiguration
to the URL of an external service for routing rules processing. rulesType
is by defaultFILE
unless specified.
routingRules:
rulesEngineEnabled: true
rulesType: FILE
rulesConfigPath: "app/config/routing_rules.yml" # replace with actual path to your rules config file
rulesExternalConfiguration:
urlPath: https://router.example.com/gateway-rules # replace with your own API path
excludeHeaders:
- 'Authorization'
- 'Accept-Encoding'
- Redirect URLs are not supported.
- Optionally, add headers to the
excludeHeaders
list to exclude requests with corresponding header values from being sent in the POST request. - Check headers to exclude when making API requests, specifics depend on the network configuration.
If there is error parsing the routing rules configuration file, an error is
logged, and requests are routed using the routing group header
X-Trino-Routing-Group
as default.
Use an external service for routing rules¶
You can use an external service for processing your routing by setting the
rulesType
to EXTERNAL
and configuring the rulesExternalConfiguration
.
Trino Gateway then sends all headers, other than those specified in
excludeHeaders
, as a map in the body of a POST request to the external
service. If requestAnalyzerConfig.analyzeRequest
is set to true
,
TrinoRequestUser
and TrinoQueryProperties
are also included.
Additionally, the following HTTP information is included:
remoteUser
method
requestURI
queryString
session
remoteAddr
remoteHost
parameterMap
The external service can process the information in any way desired and must return a result with the following criteria:
- Response status code of OK (200)
- Message in JSON format
- Only one group can be returned
- If errors is not null, then query would route to default routing group adhoc
Configure routing rules with a file¶
To express and fire routing rules, we use the
easy-rules engine. These rules must be
stored in a YAML file. Rules consist of a name, description, condition, and list
of actions. If the condition of a particular rule evaluates to true
, its
actions are fired.
---
name: "airflow"
description: "if query from airflow, route to etl group"
condition: 'request.getHeader("X-Trino-Source") == "airflow"'
actions:
- 'result.put("routingGroup", "etl")'
---
name: "airflow special"
description: "if query from airflow with special label, route to etl-special group"
condition: 'request.getHeader("X-Trino-Source") == "airflow" && request.getHeader("X-Trino-Client-Tags") contains "label=special"'
actions:
- 'result.put("routingGroup", "etl-special")'
In the condition, you can access the methods of a
HttpServletRequest
object called request
. Rules may also utilize
trinoRequestUser and
trinoQueryProperties
objects, which provide information about the user and query respectively.
You must include an action of the form result.put(\"routingGroup\", \"foo\")
to trigger routing of a request that satisfies the condition to the specific
routing group. Without this action, the default adhoc group is used and the
whole routing rule is redundant.
The condition and actions are written in MVEL,
an expression language with Java-like syntax. In most cases, you can write
conditions and actions in Java syntax and expect it to work. There are some
MVEL-specific operators. For example, instead of doing a null-check before
accessing the String.contains
method like this:
condition: 'request.getHeader("X-Trino-Client-Tags") != null && request.getHeader("X-Trino-Client-Tags").contains("label=foo")'
You can use the contains
operator
If no rules match, then the request is routed to the default adhoc
routing
group.
TrinoStatus¶
The TrinoStatus
class attempts to track the current state of the configured
Trino clusters. The three possible states of these cluster are updated with
every healthcheck:
PENDING
: A Trino cluster shows this state when it is still starting up. It is treated as unhealthy byRoutingManager
, and therefore requests are not be routed to these clusters.HEALTHY
: A Trino cluster shows this state when healthchecks report the cluster as healthy and ready.RoutingManager
only routes requests to healthy clusters.UNHEALTHY
: A Trino cluster shows this state when healthchecks report the cluster as unhealthy.RoutingManager
does not route requests to unhealthy clusters.
TrinoRequestUser¶
The TrinoRequestUser
class attempts to extract user information from a
request, in the following order:
X-Trino-User
header.Authorization: Basic
header.Authorization: Bearer
header. Requires configuring an OAuth2 User Info URL.Trino-UI-Token
or__Secure-Trino-ID-Token
cookie.
Kerberos and Certificate authentication are not currently supported. If the
request contains the Authorization: Bearer
header, an attempt is made to treat
the token as a JWT and deserialize it. If this is successful, the value of the
claim named in requestAnalyzerConfig.tokenUserField
is used as the username.
By default, this is the email
claim. If the token is not a valid JWT, and
requestAnalyzerConfig.oauthTokenInfoUrl
is configured, then the token is
exchanged with the Info URL. Responses are cached for 10 minutes to avoid
triggering rate limits.
You may call trinoRequestUser.getUser()
and trinoRequestUser.getUserInfo()
in your routing rules. If user information was not successfully extracted,
trinoRequestUser.getUser()
returns an empty Optional
.
trinoRequestUser.getUserInfo()
returns an Optional<UserInfo>
, with an
OpenID Connect UserInfo
if a token is successfully exchanged with the oauthTokenInfoUrl
, and an empty
Optional
otherwise.
trinoRequestUser.userExistsAndEquals("usernameToTest")
can be used to check a
username against the extracted user. It returns false
if a user has not been
extracted.
User extraction is only available if enabled by configuring
requestAnalyzerConfig.analyzeRequest = True
TrinoQueryProperties¶
The TrinoQueryProperties
class attempts to parse the body of a request to
determine the SQL statement and other information. Note that only a
syntactic analysis is performed.
If a query references a view, then that view is not expanded, and tables
referenced by the view are not recognized. Views and materialized views are
treated as tables and added to the list of tables in all contexts, including
statements such as CREATE VIEW
.
A routing rule can call the following methods on the trinoQueryProperties
object:
String errorMessage()
: the error message, only if there was any error while creating thetrinoQueryProperties
object.boolean isNewQuerySubmission()
: boolean flag to indicate if the request is a POST to thev1/statement
query endpoint.String getQueryType()
: the class name of theStatement
, e.g.ShowCreate
. Note that these are not mapped to theResourceGroup
query types. For a full list of potential query types, see the classes in STATEMENT_QUERY_TYPESString getResourceGroupQueryType()
: the Resource Group query type, for exampleSELECT
,DATA_DEFINITION
. For a full list see queryType in the Trino documentationString getDefaultCatalog()
: the default catalog, if set. It may or may not be referenced in the actual SQLString getDefaultSchema()
: the default schema, if set. It may or may not be referenced in the actual SQLSet<String> getCatalogs()
: the set of catalogs used in the query. Includes the default catalog if used by a non-fully qualified table referenceSet<String> getSchemas()
: the set of schemas used in the query. Includes the default schema if used by a non-fully qualified table referenceSet<String> getCatalogSchemas()
the set of qualified schemas used in the query, in the formcatalog.schema
boolean tablesContains(String testName)
returns true if the query contains a reference to the tabletestName
.testName
should be fully qualified, for exampletestcat.testschema.testtable
Set<QualifiedName> getTables()
: the set of tables used in the query. These are fully qualified, any partially qualified table reference in the SQL will be qualified by the default catalog and schema.String getBody()
: the raw request body
Configuration¶
The trinoQueryProperties
are configured under the requestAnalyzerConfig
configuration node.
analyzeRequest
:
Set to True
to make trinoQueryProperties
and trinoRequestUser
available.
maxBodySize
:
By default, the max body size is 1,000,000 characters. This can be modified by
configuring maxBodySize
. If the request body is greater or equal to this
limit, Trino Gateway does not process the query. A buffer of length
maxBodySize
is allocated per query. Reduce this value if you observe
excessive garbage collection at runtime. maxBodySize
cannot be set to values
larger than 2**31-1, the maximum size of a Java String.
isClientsUseV2Format
:
Some commercial extensions to Trino use the V2 Request Structure
V2 style request structure.
Support for V2-style requests can be enabled by setting this property to true
.
If you use a commercial version of Trino, ask your vendor how to set this
configuration.
tokenUserField
:
When extracting the user from a JWT token, this field is used as the username.
By default, the email
claim is used.
oauthTokenInfoUrl
:
If configured, Trino Gateway attempts to retrieve the user info by exchanging potential authorization tokens with this URL. Responses are cached for 10 minutes to avoid triggering rate limits.
Execution of rules¶
All rules whose conditions are satisfied fire. For example, in the "airflow"
and "airflow special" example rules from the following rule priority section, a
query with source airflow
and label special
satisfies both rules. The
routingGroup
is set to etl
and then to etl-special
because of the order in
which the rules of defined. If you swap the order of the rules, then you get
etl
instead.
You can avoid this ordering issue by writing atomic rules, so any query matches exactly one rule. For example you can change the first rule to the following:
---
name: "airflow"
description: "if query from airflow, route to etl group"
condition: 'request.getHeader("X-Trino-Source") == "airflow" && request.getHeader("X-Trino-Client-Tags") == null'
actions:
- 'result.put("routingGroup", "etl")'
---
This can difficult to maintain with more rules. To have better control over the execution of rules, we can use rule priorities and composite rules. Overall, priorities, composite rules, and other constructs that MVEL support allows you to express your routing logic.
Rule priority¶
You can assign an integer value priority
to a rule. The lower this integer is,
the earlier it fires. If the priority is not specified, the priority defaults to
INT_MAX
. Following is an example with priorities:
---
name: "airflow"
description: "if query from airflow, route to etl group"
priority: 0
condition: 'request.getHeader("X-Trino-Source") == "airflow"'
actions:
- 'result.put("routingGroup", "etl")'
---
name: "airflow special"
description: "if query from airflow with special label, route to etl-special group"
priority: 1
condition: 'request.getHeader("X-Trino-Source") == "airflow" && request.getHeader("X-Trino-Client-Tags") contains "label=special"'
actions:
- 'result.put("routingGroup", "etl-special")'
Note that both rules still fire. The difference is that you are guaranteed
that the first rule (priority 0) is fired before the second rule (priority 1).
Thus routingGroup
is set to etl
and then to etl-special
, so the
routingGroup
is always etl-special
in the end.
More specific rules must be set to a lesser priority so they are evaluated last
to set a routingGroup
. To further control the execution of rules, for example
to have only one rule fire, you can use composite rules.
Composite rules¶
First, please refer to the easy-rule composite rules documentation.
The preceding section covers how to control the order of rule execution using
priorities. In addition, you can configure evaluation so that only the first
rule matched fires (the highest priority one) and the rest is ignored. You can
use ActivationRuleGroup
to achieve this:
---
name: "airflow rule group"
description: "routing rules for query from airflow"
compositeRuleType: "ActivationRuleGroup"
composingRules:
- name: "airflow special"
description: "if query from airflow with special label, route to etl-special group"
priority: 0
condition: 'request.getHeader("X-Trino-Source") == "airflow" && request.getHeader("X-Trino-Client-Tags") contains "label=special"'
actions:
- 'result.put("routingGroup", "etl-special")'
- name: "airflow"
description: "if query from airflow, route to etl group"
priority: 1
condition: 'request.getHeader("X-Trino-Source") == "airflow"'
actions:
- 'result.put("routingGroup", "etl")'
Note that the priorities have switched. The more specific rule has a higher priority, since it should fire first. A query coming from airflow with special label is matched to the "airflow special" rule first, since it's higher priority, and the second rule is ignored. A query coming from airflow with no labels does not match the first rule, and is then tested and matched to the second rule.
You can also use ConditionalRuleGroup
and ActivationRuleGroup
to implement
an if/else workflow. The following logic in pseudocode:
if source == "airflow":
if clientTags["label"] == "foo":
return "etl-foo"
else if clientTags["label"] = "bar":
return "etl-bar"
else
return "etl"
This logic can be implemented with the following rules:
name: "airflow rule group"
description: "routing rules for query from airflow"
compositeRuleType: "ConditionalRuleGroup"
composingRules:
- name: "main condition"
description: "source is airflow"
priority: 0 # rule with the highest priority acts as main condition
condition: 'request.getHeader("X-Trino-Source") == "airflow"'
actions:
- ""
- name: "airflow subrules"
compositeRuleType: "ActivationRuleGroup" # use ActivationRuleGroup to simulate if/else
composingRules:
- name: "label foo"
description: "label client tag is foo"
priority: 0
condition: 'request.getHeader("X-Trino-Client-Tags") contains "label=foo"'
actions:
- 'result.put("routingGroup", "etl-foo")'
- name: "label bar"
description: "label client tag is bar"
priority: 0
condition: 'request.getHeader("X-Trino-Client-Tags") contains "label=bar"'
actions:
- 'result.put("routingGroup", "etl-bar")'
- name: "airflow default"
description: "airflow queries default to etl"
condition: "true"
actions:
- 'result.put("routingGroup", "etl")'
If statements (MVEL Flow Control)¶
In the preceding section you see how ConditionalRuleGroup
and
ActivationRuleGroup
are used to implement an if/else
workflow. You can
use MVEL support for if
statements and other flow control. The following logic
in pseudocode:
if source == "airflow":
if clientTags["label"] == "foo":
return "etl-foo"
else if clientTags["label"] = "bar":
return "etl-bar"
else
return "etl"
This logic Can be implemented with the following rules:
---
name: "airflow rules"
description: "if query from airflow"
condition: "request.getHeader(\"X-Trino-Source\") == \"airflow\""
actions:
- "if (request.getHeader(\"X-Trino-Client-Tags\") contains \"label=foo\") {
result.put(\"routingGroup\", \"etl-foo\")
}
else if (request.getHeader(\"X-Trino-Client-Tags\") contains \"label=bar\") {
result.put(\"routingGroup\", \"etl-bar\")
}
else {
result.put(\"routingGroup\", \"etl\")
}"