Categories
数据库

Presto兼容Hive语法语义的一些改动

Presto是一款优秀的交互式查询解决方案,并且已经被诸多公司证实过,因为其数倍于Hive的查询速度的优势,团队决定引入Presto用于交互式查询场景。

但是因为迁移成本和服务器成本,Presto和Hive需要有一段长期共存的时间。我们的解决方案是,将Presto的语法和语义向Hive靠拢,整个产品对外提供Hive的语法和语义。本文介绍Presto的语法和语义向Hive兼容的一些改造工作。

语法部分

语法部分的兼容思路是修改SQL,我们定义了一个SQL Rewriter将用户发过来的SQL进行改写,因为整个产品对外提供Hive的语法标准,所以SQL Rewriter需要将Hive的语法改写成Presto的语法。

一开始的解决方案是通过Apache Calcite,因为它提供了Hive的语法的改写功能,但是我们发现它的改写器BUG太多,我们修了几个BUG后,就放弃了该方案。以下是我们修的BUG:CALCITE-3220CALCITE-3247CALCITE-3282

新的方案是自定义一个SQL语法解析器,然后修改语法。我们的方案选型是ANTLR4,因为ANTLR4支持SQL语法的修改。Spark的语法跟Hive是高度一致的,并且Spark的语法文件也是基于ANTLR4的,所以我们直接使用了Spark的语法文件。

以下是语法部分的改动:

1. Presto不支持’%’运算符

改写前:
select 1 % 2

改写后:
select mod(1, 2)

2.Presto不支持regexp、rlike语法

改写前:
select map['field'] rlike '.*'
select map['field'] regexp '.*'

改写后:
select regexp_like(map['field'], '.*')

3.数组定义方式

改写前:
select array (1, 2)

改写后:
select array [1, 2]

4.变量名的quoting方案

改写前:
select `col`

改写后:
select "col"

5.变量名是否能以数字开头

改写前:
select 1col

改写后:
select "1col"

6.Presto不支持LATERAL VIEW语法

改写前:
SELECT student, score FROM tests LATERAL VIEW explode(scores) t AS score

改写后:
SELECT student, score FROM tests cross join unnest(scores) as t (score)

7.Presto不支持cluster by、distribute by语法

改写前:
SELECT student, score FROM tests cluster by student
SELECT student, score FROM tests distribute by student

改写后:
SELECT student, score FROM tests

8.Presto不支持sort by语法

改写前:
SELECT student, score FROM tests sort by student

改写后:
SELECT student, score FROM tests order by student

9.Presto如果函数名包含’.’则需要Quoting

改写前:
select a.b(1)

改写后:
select "a.b"(1)

10.Presto的字符串常量不能用双引号Quoting

改写前:
select "stringLiteral"

改写后:
select 'stringLiteral'

11正则表达式转义方案不同

改写前:
select RLIKE( '1 2', '\\S')

改写后:
select regexp_like( '1 2', '\S')

12.Presto不支持String数据类型

改写前:
select cast (1 as string)

改写后:
select cast(1 as varchar)

语义部分

对于语义部分的处理Hive和Presto有着截然相反的味道,Hive对于语义的处理显得更“包容”,相反Presto显得更“严格”。比如数组越界,Hive会当做null处理,而Presto会中断整个查询;再比如某一个数据文件损坏或者缺失,Hive会忽略该文件,而Presto依然会中断查询。

语义部分的改写需要修改Presto内核,由于Hive和Presto截然相反的理念,所以需要改写的细节是繁多的,我们只处理了查询频率较多的一些场景:

支持查询Hive View

因为与Hive的语法语义处理不同,所以Presto是不支持Hive View查询的,但是我们改造的目标是将Presto的语法语义向Hive靠拢,所以自然改造后的Presto是可以查询Hive View的。

改造思路是将Hive view按照Presto view处理,我们的处理方式参考了社区的一个方案#9031。并且我们将上边的SQL Rewriter内嵌到了Hive view处理的流程中,以修改Hive语法。

隐式类型转换

对于隐式类型转换社区也有一个提案#116,可以看到社区坚持ANSI SQL标准,对于隐式类型转换的支持是有所顾虑的,但是支持隐式类型转换无疑是公司内部的迫切需求。

我们知道通过显示添加cast是可以实现字符串和整醒家族的类型转换的,那么思路就比较明确了,在解析SQL的时候添加cast。

改动的类是ExpressionAnalyzer,该类的作用是给AST做类型检查,目前改动点主要是:

1.算术二元运算符

select 1 + '2'
3

2.比较运算符

select 2 > '1'
true

3.between

select 2 between '1' and 3
true

4.in

// TODO check
select '2' in array(1,2)
true
空值处理

1.数组越界

select array(1,2)[-1]
NULL

2.Map中字段不存在

select map(ARRAY('1','3'), ARRAY('2','4'))['6']
NULL

3.一些数组函数的下标越界

select slice(ARRAY(1,2), 3, 1)
NULL

select element_at(ARRAY(1,2), 3)
NULL
其它语义上的改造

1.整形相除返回结果为Double类型

select 1/2
0.5

2.数组下标从零开始

select ARRAY(1,2)[0]
1

3.Map查询不准确

实践中有用户反馈查询Map类型的字段有不准确的现象,通过调查发现其实是Presto的一个bug,我们将其进行了修复,可以参考#1321

小结

通过大量线上查询统计,经过改造后的Presto的语法和语义与Hive的兼容率为99%。目前不兼容的场景主要集中在函数上,一是Hive中有大量的内置函数还没有迁移到Presto中;二是已经迁移过来的函数在隐式类型转换上还需要做进一步兼容。

版权声明:文章为作者辛勤劳动的成果,转载请注明作者与出处。

1,378 replies on “Presto兼容Hive语法语义的一些改动”

Exceptional post but I was wondering if you could write a litte more on this subject? I’d be very grateful if you could elaborate a little bit more. Appreciate it!

The meaning of Vikasa is evolution, or spiritual growth, as translated from Sanskrit. The Vikasa Yoga Method is rooted in the ancient traditions of Hatha Yoga practice. Yoga is a tool for personal growth and evolution – this is the foundation of the Vikasa Yoga philosophy.At Vikasa Yoga Retreat in Thailand, we focus on the study of yoga with an emphasis on holistic health, organic food and an inspirational environment that is conducive to personal transformation. Vikasa is more than a place or even a practice – it is a lifestyle. https://vikasayoga.com/ Vikasa Yoga Retreat in Koh Samui, Thailand is home to the Vikasa Yoga Academy that offers Yoga Teacher Training courses. If you are not ready for such a serious commitment, you can choose to just do a yoga vacation, which is our signature yoga retreat program – on your own time, available 365 days a year. If you have been living in a city and would really like to kick-start a new healthier lifestyle, our detox program may be the perfect solution perfect for that.

Definitely believe that which you stated. Your favorite reason seemed to be on the web the simplest thing to be aware of. I say to you, I definitely get irked while people consider worries that they plainly don’t know about. You managed to hit the nail upon the top and also defined out the whole thing without having side-effects , people could take a signal. Will likely be back to get more. Thanks

whoah this weblog is magnificent i love reading your posts. Keep up the great work! You recognize, a lot of individuals are hunting round for this info, you can aid them greatly.

Hello there, You have done a great job. I’ll certainly digg it and personally suggest
to my friends. I’m confident they’ll be benefited from this web site.

Hi there! This post could not be written any better!
Reading through this post reminds me of my good old room
mate! He always kept talking about this. I will forward
this article to him. Pretty sure he will have a good read.
Thanks for sharing!

First off I want to say superb blog! I had a quick question which I’d
like to ask if you do not mind. I was interested to
know how you center yourself and clear your head prior to writing.
I’ve had a difficult time clearing my mind in getting my ideas out.
I do enjoy writing but it just seems like the first 10 to 15 minutes
tend to be wasted simply just trying to figure out how to begin. Any ideas or tips?
Thanks! https://hhydroxychloroquine.com/