Symptom
- A job run through the Hive CLI fails with the following error on the Hive console:
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:243)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:198)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:184)
at org.apache.hadoop.hive.ql.exec.MapOperator.toErrorMessage(MapOperator.java:572)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:541)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:458)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
]
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:458)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:243)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:354)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:198)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:184)
at org.apache.hadoop.hive.ql.exec.MapOperator.toErrorMessage(MapOperator.java:572)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:541)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:458)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:165)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:536)
... 9 more
Caused by: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveWritableObject(ParquetStringInspector.java:52)
at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveWritableObject(ParquetStringInspector.java:28)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:337)
at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$TextKeyWrapper.copyKey(KeyWrapperFactory.java:220)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:779)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:693)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
... 15 more
- A job run through Beeline fails with the following error on the console:
INFO : 2017-10-16 12:20:02,421 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 29.08 sec
INFO : MapReduce Total cumulative CPU time: 29 seconds 80 msec
ERROR : Ended Job = job_1508179782656_0005 with errors
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
When the container logs are checked by running yarn logs -applicationId <application id>, the failed attempts show the stack trace mentioned above.
Diagnostics
- This issue occurs when Parquet files are created by a different query engine, such as Pig or Spark, and Hive is used to query those files through an external table.
- Create a Hive external table on Parquet data that was created by another engine such as Spark or Pig.
- Create a partition on the table. Note that the issue can also be reproduced on non-partitioned tables.
ex:- CREATE EXTERNAL TABLE `trade_eod_test02201709021`(
`userid` bigint,
`movieid` bigint,
`rating` string,
`timestap` bigint)
PARTITIONED BY (`businessdate` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'maprfs:/user/mapr/trade_eod_test02201709021';
- Since the table is partitioned, you can load the new data directly into the partition as shown below:
LOAD DATA INPATH 'maprfs:///user/mapr/trade_eod_test02201709021/businessdate=2017-08-20/' OVERWRITE INTO TABLE trade_eod_test02201709021 PARTITION (businessdate='2017-08-20');
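Alternatively (an option not shown in the original steps), if the files already reside under the partition directory, the partition can be registered in the metastore without moving data. A hedged sketch using the same table and path as above:

```sql
-- Sketch: register an existing directory as a partition instead of
-- using LOAD DATA. Assumes the data already sits at this location.
ALTER TABLE trade_eod_test02201709021
  ADD IF NOT EXISTS PARTITION (businessdate='2017-08-20')
  LOCATION 'maprfs:/user/mapr/trade_eod_test02201709021/businessdate=2017-08-20';
```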
You can find the table data under the following location:
hadoop fs -ls maprfs:///user/mapr/trade_eod_test02201709021/businessdate=2017-08-20
-rwxrwxrwx 3 mapr mapr 105376038 2017-09-17 14:43 maprfs:///user/mapr/trade_eod_test02201709021/businessdate=2017-08-20/000000_0
-rwxrwxrwx 3 mapr mapr 101323580 2017-09-17 14:43 maprfs:///user/mapr/trade_eod_test02201709021/businessdate=2017-08-20/000001_0
-rwxrwx--- 3 root root 105378438 2017-09-17 15:43 maprfs:///user/mapr/trade_eod_test02201709021/businessdate=2017-08-20/000002_0
- Query the Hive table with a select distinct on the column. The query fails with the error mentioned above:
ex:- select distinct rating from trade_eod_test02201709021;
error:- java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
- Query the Hive table with a select count(*). The query succeeds:
ex:- select count(*) from trade_eod_test02201709021;
Total MapReduce CPU Time Spent: 56 seconds 820 msec
OK
50199464
- Since select count(*) succeeds while select distinct on the column fails, we suspect a type issue: the actual type of the rating column in the data files does not match the column type declared in Hive.
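As a complementary check (not part of the original diagnosis), Hive's virtual column INPUT__FILE__NAME can enumerate the data files feeding the query without deserializing the suspect rating column. A sketch:

```sql
-- Sketch: list the files scanned for this partition without touching
-- the rating column. INPUT__FILE__NAME is a Hive virtual column, so
-- this query succeeds even when the column data is malformed.
SELECT INPUT__FILE__NAME, count(*)
FROM trade_eod_test02201709021
GROUP BY INPUT__FILE__NAME;
```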
Root Cause
Since the query failed for a particular column, we first need to identify the column type; in this case the rating column is of type string based on the Hive table definition. Next we need to identify the file that leads to this error. This can be done by looking at the container logs:
2017-10-16 13:11:17,182 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file maprfs:///user/mapr/trade_eod_test02201709021/businessdate=2017-08-20/000002_0
2017-10-16 13:11:17,182 INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: PLAN PATH = maprfs:/user/mapr/tmp/hive/mapr/05db8316-f375-4822-a920-58112c50fd2c/hive_2017-10-16_12-18-26_807_7065711258481051372-2/-mr-10004/b58555d3-b5d3-48f3-bf52-56480a6d8d69/map.xml
2017-10-16 13:11:17,198 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)
From the above logs, we see that the file 000002_0 under the location /user/mapr/trade_eod_test02201709021/businessdate=2017-08-20/ leads to the error.
Now we need to find the actual type of the rating column in the file 000002_0. To find the actual type, we can use the parquet-tools utility to extract the metadata written at file creation time:
[root@vm3 ~]# hadoop jar parquet-tools-1.6.0.jar schema maprfs:///user/mapr/trade_eod_test02201709021/businessdate=2017-08-20/000002_0
message hive_schema {
optional int64 userid;
optional int64 movieid;
optional int64 rating;
optional int64 timestap;
}
From the above output, we see that the rating column was written as int64 (bigint) instead of string in the file 000002_0.
Whereas in a well-formed file such as 000001_0, the rating column is stored as string, matching the table definition in Hive:
[root@vm3 ~]# hadoop jar parquet-tools-1.6.0.jar schema maprfs:///user/mapr/trade_eod_test02201709021/businessdate=2017-08-20/000001_0
message hive_schema {
optional int64 userid;
optional int64 movieid;
optional binary rating (UTF8);
optional int64 timestap;
}
Solutions
The file maprfs:///user/mapr/trade_eod_test02201709021/businessdate=2017-08-20/000002_0, which received int/bigint data from the source, needs to be corrected at the source itself. That is, the file needs to be regenerated at the source with the rating column typed the same as the Hive column type, which is string. The corrected file then needs to be reloaded into the Hive table.
OR
Alternatively, the table definition in Hive can be changed to match the actual data type.
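For the second option, the column type could be altered in place. A hypothetical sketch, with an important caveat: this only helps if every Parquet file stores rating as int64, and in this case 000001_0 stores it as string, so the source-side fix above is the safer route:

```sql
-- Sketch: align the Hive column type with the file schema.
-- Only valid if ALL the partition's Parquet files store rating as int64.
ALTER TABLE trade_eod_test02201709021 CHANGE rating rating bigint;
```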