Categories
数据库

Presto兼容Hive语法语义的一些改动

Presto是一款优秀的交互式查询解决方案,并且已经被诸多公司证实过,因为其数倍于Hive的查询速度的优势,团队决定引入Presto用于交互式查询场景。

但是因为迁移成本和服务器成本,Presto和Hive需要有一段长期共存的时间。我们的解决方案是,将Presto的语法和语义向Hive靠拢,整个产品对外提供Hive的语法和语义。本文介绍Presto的语法和语义向Hive兼容的一些改造工作。

语法部分

语法部分的兼容思路是修改SQL,我们定义了一个SQL Rewriter将用户发过来的SQL进行改写,因为整个产品对外提供Hive的语法标准,所以SQL Rewriter需要将Hive的语法改写成Presto的语法。

一开始的解决方案是通过Apache Calcite,因为它提供了Hive的语法的改写功能,但是我们发现它的改写器BUG太多,我们修了几个BUG后,就放弃了该方案。以下是我们修的BUG:CALCITE-3220CALCITE-3247CALCITE-3282

新的方案是自定义一个SQL语法解析器,然后修改语法。我们的方案选型是ANTLR4,因为ANTLR4支持SQL语法的修改。Spark的语法跟Hive是高度一致的,并且Spark的语法文件也是基于ANTLR4的,所以我们直接使用了Spark的语法文件。

以下是语法部分的改动:

1. Presto不支持’%’运算符

改写前:
select 1 % 2

改写后:
select mod(1, 2)

2.Presto不支持regexp、rlike语法

改写前:
select map['field'] rlike '.*'
select map['field'] regexp '.*'

改写后:
select regexp_like(map['field'], '.*')

3.数组定义方式

改写前:
select array (1, 2)

改写后:
select array [1, 2]

4.变量名的quoting方案

改写前:
select `col`

改写后:
select "col"

5.变量名是否能以数字开头

改写前:
select 1col

改写后:
select "1col"

6.Presto不支持LATERAL VIEW语法

改写前:
SELECT student, score FROM tests LATERAL VIEW explode(scores) t AS score

改写后:
SELECT student, score FROM tests cross join unnest(scores) as t (score)

7.Presto不支持cluster by、distribute by语法

改写前:
SELECT student, score FROM tests cluster by student
SELECT student, score FROM tests distribute by student

改写后:
SELECT student, score FROM tests

8.Presto不支持sort by语法

改写前:
SELECT student, score FROM tests sort by student

改写后:
SELECT student, score FROM tests order by student

9.Presto如果函数名包含’.’则需要Quoting

改写前:
select a.b(1)

改写后:
select "a.b"(1)

10.Presto的字符串常量不能用双引号Quoting

改写前:
select "stringLiteral"

改写后:
select 'stringLiteral'

11正则表达式转义方案不同

改写前:
select RLIKE( '1 2', '\\S')

改写后:
select regexp_like( '1 2', '\S')

12.Presto不支持String数据类型

改写前:
select cast (1 as string)

改写后:
select cast(1 as varchar)

语义部分

对于语义部分的处理Hive和Presto有着截然相反的味道,Hive对于语义的处理显得更“包容”,相反Presto显得更“严格”。比如数组越界,Hive会当做null处理,而Presto会中断整个查询;再比如某一个数据文件损坏或者缺失,Hive会忽略该文件,而Presto依然会中断查询。

语义部分的改写需要修改Presto内核,由于Hive和Presto截然相反的理念,所以需要改写的细节是繁多的,我们只处理了查询频率较多的一些场景:

支持查询Hive View

因为与Hive的语法语义处理不同,所以Presto是不支持Hive View查询的,但是我们改造的目标是将Presto的语法语义向Hive靠拢,所以自然改造后的Presto是可以查询Hive View的。

改造思路是将Hive view按照Presto view处理,我们的处理方式参考了社区的一个方案#9031。并且我们将上边的SQL Rewriter内嵌到了Hive view处理的流程中,以修改Hive语法。

隐式类型转换

对于隐式类型转换社区也有一个提案#116,可以看到社区坚持ANSI SQL标准,对于隐式类型转换的支持是有所顾虑的,但是支持隐式类型转换无疑是公司内部的迫切需求。

我们知道通过显示添加cast是可以实现字符串和整醒家族的类型转换的,那么思路就比较明确了,在解析SQL的时候添加cast。

改动的类是ExpressionAnalyzer,该类的作用是给AST做类型检查,目前改动点主要是:

1.算术二元运算符

select 1 + '2'
3

2.比较运算符

select 2 > '1'
true

3.between

select 2 between '1' and 3
true

4.in

// TODO check
select '2' in array(1,2)
true
空值处理

1.数组越界

select array(1,2)[-1]
NULL

2.Map中字段不存在

select map(ARRAY('1','3'), ARRAY('2','4'))['6']
NULL

3.一些数组函数的下标越界

select slice(ARRAY(1,2), 3, 1)
NULL

select element_at(ARRAY(1,2), 3)
NULL
其它语义上的改造

1.整形相除返回结果为Double类型

select 1/2
0.5

2.数组下标从零开始

select ARRAY(1,2)[0]
1

3.Map查询不准确

实践中有用户反馈查询Map类型的字段有不准确的现象,通过调查发现其实是Presto的一个bug,我们将其进行了修复,可以参考#1321

小结

通过大量线上查询统计,经过改造后的Presto的语法和语义与Hive的兼容率为99%。目前不兼容的场景主要集中在函数上,一是Hive中有大量的内置函数还没有迁移到Presto中;二是已经迁移过来的函数在隐式类型转换上还需要做进一步兼容。

版权声明:文章为作者辛勤劳动的成果,转载请注明作者与出处。

7 replies on “Presto兼容Hive语法语义的一些改动”

I precisely had to say thanks all over again. I’m not certain the things I could possibly have created in the absence of the actual hints discussed by you concerning such concern. It actually was a real difficult condition for me personally, but discovering a new professional fashion you handled it forced me to cry for delight. I’m grateful for the assistance and as well , have high hopes you comprehend what a powerful job you’re carrying out educating men and women by way of your blog. I am certain you’ve never come across all of us.

Thank you for every one of your labor on this web site. My daughter really loves making time for investigations and it’s really obvious why. Most people learn all regarding the lively mode you present reliable tactics through this web site and even inspire participation from some others on the point then our favorite princess is now starting to learn a whole lot. Take advantage of the rest of the year. You are always doing a remarkable job.

A lot of thanks for all your valuable efforts on this site. Kate loves engaging in investigation and it’s really easy to understand why. I know all concerning the dynamic medium you give important secrets on this web blog and as well as foster participation from others on that content plus our daughter is always discovering a lot. Have fun with the remaining portion of the new year. You are conducting a great job.

Needed to send you this very little word to finally thank you so much the moment again relating to the striking things you have shared on this site. This has been simply tremendously open-handed with you to present unreservedly what most people would have sold for an electronic book to help with making some profit on their own, primarily seeing that you might well have tried it in the event you wanted. These solutions in addition served to be the good way to know that some people have the identical interest really like my personal own to learn much more when considering this matter. I believe there are several more pleasant sessions in the future for many who read through your site.

A lot of thanks for all your valuable hard work on this web page. Kate loves getting into investigation and it’s really easy to understand why. I know all of the dynamic medium you make insightful solutions on this web blog and as well as foster participation from people on that idea so our girl is certainly discovering a lot of things. Have fun with the remaining portion of the new year. You are conducting a powerful job.

I wanted to post you that tiny remark to say thanks a lot over again just for the pleasant solutions you have featured in this case. It was really shockingly open-handed with you to grant publicly just what many of us might have offered for an ebook to get some dough for themselves, mostly now that you might have tried it in case you desired. These secrets as well served to be the good way to fully grasp that other people online have the identical interest much like my personal own to learn more regarding this matter. I am sure there are numerous more pleasant periods in the future for many who read your site.

Leave a Reply

Your email address will not be published. Required fields are marked *