hadoop - Presto failing to query Hive table


On EMR I created a dataset in Parquet using Spark and stored it on S3. I am able to create an external table and query it using Hive, but when I try to perform the same query using Presto I obtain the following error (the part file referred to changes at every run).

2016-11-13T13:11:15.165Z        ERROR   remote-task-callback-36 com.facebook.presto.execution.StageStateMachine Stage 20161113_131114_00004_yp8y5.1 failed
com.facebook.presto.spi.PrestoException: Error opening Hive split s3://my_bucket/my_table/part-r-00013-b17b4495-f407-49e0-9d15-41bb0b68c605.snappy.parquet (offset=1100508800, length=68781800): null
    at com.facebook.presto.hive.parquet.ParquetHiveRecordCursor.createParquetRecordReader(ParquetHiveRecordCursor.java:475)
    at com.facebook.presto.hive.parquet.ParquetHiveRecordCursor.<init>(ParquetHiveRecordCursor.java:247)
    at com.facebook.presto.hive.parquet.ParquetRecordCursorProvider.createHiveRecordCursor(ParquetRecordCursorProvider.java:96)
    at com.facebook.presto.hive.HivePageSourceProvider.getHiveRecordCursor(HivePageSourceProvider.java:129)
    at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:107)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)
    at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:48)
    at com.facebook.presto.operator.TableScanOperator.createSourceIfNecessary(TableScanOperator.java:268)
    at com.facebook.presto.operator.TableScanOperator.isFinished(TableScanOperator.java:210)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:375)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:301)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
    at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:529)
    at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:665)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:420)
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:385)
    at com.facebook.presto.hive.parquet.ParquetHiveRecordCursor.lambda$createParquetRecordReader$0(ParquetHiveRecordCursor.java:416)
    at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
    at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:76)
    at com.facebook.presto.hive.parquet.ParquetHiveRecordCursor.createParquetRecordReader(ParquetHiveRecordCursor.java:416)
    ... 16 more

The Parquet location consists of 128 parts. The data is stored on S3 and encrypted using client-side encryption with KMS. Presto uses a custom encryption-materials provider (specified using presto.s3.encryption-materials-provider) that returns a KMSEncryptionMaterials object initialized with my master key. I am using EMR 5.1.0 (Hive 2.1.0, Spark 2.0.1, Presto 0.152.3).

Does this surface when encryption is turned off?

There is a bug report that surfaced against the ASF S3A client (not the EMR one), with things breaking when the length the filesystem lists for a file != the actual file length. That is: because of the encryption, the file length in the listing > the length available on read.

We couldn't repro it in our tests, and our conclusion was anyway "filesystems must not do that" (indeed, it's a fundamental requirement of the Hadoop FS spec: the listed length must equal the actual length). If the EMR code is getting this wrong, it's something in their driver that downstream code cannot be expected to handle.
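
To make the failure mode concrete, here is a minimal Python sketch of how a footer read can hit end-of-file when the listed length is larger than the readable length. It is not Presto's or Parquet's code: it only mimics the Parquet tail layout (a 4-byte little-endian metadata length followed by the magic bytes PAR1), and the helper names, the toy file and the 16 extra bytes standing in for encryption overhead are all assumptions made for illustration.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"
FOOTER_TAIL_LEN = 8  # 4-byte little-endian metadata length + 4-byte magic


def read_fully(stream, n):
    """Read exactly n bytes or raise EOFError (similar in spirit to readFully)."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"wanted {n} bytes, got {len(data)}")
    return data


def read_footer_length(stream, listed_length):
    """Locate the footer tail using the length reported by the file listing."""
    stream.seek(listed_length - FOOTER_TAIL_LEN)
    tail = read_fully(stream, FOOTER_TAIL_LEN)
    metadata_len, magic = struct.unpack("<I4s", tail)
    if magic != PARQUET_MAGIC:
        raise ValueError("tail does not end with the Parquet magic bytes")
    return metadata_len


# Toy "file": magic, 100 data bytes, 20 metadata bytes, then the 8-byte tail.
metadata = b"m" * 20
payload = (
    PARQUET_MAGIC
    + b"d" * 100
    + metadata
    + struct.pack("<I4s", len(metadata), PARQUET_MAGIC)
)
actual_length = len(payload)

# Hypothetical overhead: the listing reports more bytes than can be read.
ASSUMED_OVERHEAD = 16
listed_length = actual_length + ASSUMED_OVERHEAD


def footer_read_fails(length):
    """Return True if the footer read hits end-of-file for the given length."""
    try:
        read_footer_length(io.BytesIO(payload), length)
    except EOFError:
        return True
    return False
```

With the true length the footer is found; with the inflated length the seek lands past the readable data and the read comes up short, which is the same shape of failure as the EOFException under readFooter in the trace above.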

