10.1. Migrating From Hive#
Presto uses ANSI SQL syntax and semantics, whereas Hive uses a SQL-like language called HiveQL which is loosely modeled after MySQL (which itself has many differences from ANSI SQL).
Use subscript for accessing a dynamic index of an array instead of a udf#
The subscript operator in SQL supports full expressions, unlike Hive (which only supports constants). Therefore you can write queries like:
SELECT my_array[CARDINALITY(my_array)] as last_element
FROM ...
Avoid out of bounds access of arrays#
Accessing out of bounds elements of an array will result in an exception. You can avoid this with an if
as follows:
SELECT IF(CARDINALITY(my_array) >= 3, my_array[3], NULL)
FROM ...
Use ANSI SQL syntax for arrays#
Arrays are indexed starting from 1, not from 0:
SELECT my_array[1] AS first_element
FROM ...
Construct arrays with ANSI syntax:
SELECT ARRAY[1, 2, 3] AS my_array
Use ANSI SQL syntax for identifiers and strings#
Strings are delimited with single quotes and identifiers are quoted with double quotes, not backquotes:
SELECT name AS "User Name"
FROM "7day_active"
WHERE name = 'foo'
Quote identifiers that start with numbers#
Identifiers that start with numbers are not legal in ANSI SQL and must be quoted using double quotes:
SELECT *
FROM "7day_active"
Use the standard string concatenation operator#
Use the ANSI SQL string concatenation operator:
SELECT a || b || c
FROM ...
Use standard types for CAST targets#
The following standard types are supported for CAST
targets:
SELECT
CAST(x AS varchar)
, CAST(x AS bigint)
, CAST(x AS double)
, CAST(x AS boolean)
FROM ...
In particular, use VARCHAR
instead of STRING
.
Use CAST when dividing integers#
Presto follows the standard behavior of performing integer division when dividing two integers. For example, dividing 7
by 2
will result in 3
, not 3.5
.
To perform floating point division on two integers, cast one of them to a double:
SELECT CAST(5 AS DOUBLE) / 2
Use WITH for complex expressions or queries#
When you want to re-use a complex output expression as a filter, use either an inline subquery or factor it out using the WITH
clause:
WITH a AS (
SELECT substr(name, 1, 3) x
FROM ...
)
SELECT *
FROM a
WHERE x = 'foo'
Use UNNEST to expand arrays and maps#
Presto supports UNNEST for expanding arrays and maps.
Use UNNEST
instead of LATERAL VIEW explode()
.
Hive query:
SELECT student, score
FROM tests
LATERAL VIEW explode(scores) t AS score;
Presto query:
SELECT student, score
FROM tests
CROSS JOIN UNNEST(scores) AS t (score);
Caution with datediff#
The Hive datediff
function returns the difference between the two dates in
days and is declared as:
datediff(string enddate, string startdate) -> integer
The equivalent Presto function date_diff uses a reverse order for the two date parameters and requires a unit. This has to be taken into account when migrating:
Hive query:
datediff(enddate, startdate)
Presto query:
date_diff('day', startdate, enddate)
Overwriting data on insert#
By default, INSERT
queries are not allowed to overwrite existing data. You
can use the catalog session property insert_existing_partitions_behavior
to
allow overwrites. Prepend the name of the catalog using the Hive connector, for
example hdfs
, and set the property in the session before you run the insert
query:
SET SESSION hdfs.insert_existing_partitions_behavior = 'OVERWRITE';
INSERT INTO hdfs.schema.table ...
The resulting behavior is equivalent to using INSERT OVERWRITE in Hive.