Job fails with NoSuchElementException error

A NoSuchElementException error can occur when using Apache Arrow.

Written by 烟灰

March 3, 2023

Problem

Jobs fail intermittently with a NoSuchElementException error.

Example stack trace:

Py4JJavaError: An error occurred while calling o2843:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 in stage 868.0 failed 4 times, most recent failure: Lost task 17.3 in stage 868.0 (TID 3065) (10.249.38.86 executor 6): java.util.NoSuchElementException
  at org.apache.spark.sql.vectorized.ColumnarBatch$1.next(ColumnarBatch.java:69)
  at org.apache.spark.sql.vectorized.ColumnarBatch$1.next(ColumnarBatch.java:58)
  at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
  at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$4.next(ArrowConverters.scala:401)
  at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$4.next(ArrowConverters.scala:382)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
  ...

Cause

The NoSuchElementException error is the result of an Apache Arrow optimization issue. Apache Arrow is an in-memory columnar data format that Spark uses to efficiently transfer data between the JVM and Python processes.

When Arrow optimization is enabled, the Py4J interface can call iterator.next() without first checking iterator.hasNext(), which can produce a NoSuchElementException error.
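The failure mode can be sketched in plain Python. The class below is an illustrative stand-in for a row iterator over an Arrow batch (it is not Spark's actual ColumnarBatch class): an empty batch has zero rows, so an unguarded next() call raises immediately, while checking hasNext() first avoids the error.

```python
class ColumnarBatchRows:
    """Illustrative stand-in for a row iterator over an Arrow batch.
    Not Spark's real ColumnarBatch; it only mimics the Java iterator contract."""

    def __init__(self, num_rows):
        self.num_rows = num_rows  # an empty Arrow batch has num_rows == 0
        self.row_id = 0

    def has_next(self):
        return self.row_id < self.num_rows

    def next(self):
        # Mirrors java.util.Iterator.next(): raises when the iterator is
        # exhausted, analogous to Java's NoSuchElementException.
        if self.row_id >= self.num_rows:
            raise LookupError("NoSuchElementException")
        self.row_id += 1
        return self.row_id - 1


empty_batch = ColumnarBatchRows(num_rows=0)

# Unguarded call -- analogous to what the Arrow path can do -- fails on
# an empty batch:
try:
    empty_batch.next()
    failed = False
except LookupError:
    failed = True

# Guarded call -- checking has_next() first -- is safe:
safe_result = empty_batch.next() if empty_batch.has_next() else None
```

Spark's fix for this class of problem is the same guard: detect the empty batch before calling next(), which is what the emptyArrowBatchCheck setting described below enables.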

Solution

Set spark.databricks.pyspark.emptyArrowBatchCheck to true in your cluster's Spark config (AWS | Azure | GCP).

spark.databricks.pyspark.emptyArrowBatchCheck=true

Enabling spark.databricks.pyspark.emptyArrowBatchCheck prevents NoSuchElementException errors when the Arrow batch size is 0.
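The property is normally set in the cluster's Spark config UI, but for illustration the same setting can be supplied when building a session in PySpark. This is a minimal configuration sketch, assuming a Databricks runtime where the property is recognized:

```python
from pyspark.sql import SparkSession

# Configuration sketch: pass the property at session-build time.
# On Databricks, prefer setting it in the cluster's Spark config instead.
spark = (
    SparkSession.builder
    .config("spark.databricks.pyspark.emptyArrowBatchCheck", "true")
    .getOrCreate()
)
```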

Alternatively, you can disable Arrow optimization by setting the following properties in your cluster's Spark config:

spark.sql.execution.arrow.pyspark.enabled=false
spark.sql.execution.arrow.enabled=false

Disabling Arrow optimization may have performance implications.
