Problem
A job fails intermittently with a NoSuchElementException error.
Example stack trace
Py4JJavaError: An error occurred while calling o2843.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 in stage 868.0 failed 4 times, most recent failure: Lost task 17.3 in stage 868.0 (TID 3065) (10.249.38.86 executor 6): java.util.NoSuchElementException
    at org.apache.spark.sql.vectorized.ColumnarBatch$1.next(ColumnarBatch.java:69)
    at org.apache.spark.sql.vectorized.ColumnarBatch$1.next(ColumnarBatch.java:58)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
    at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$4.next(ArrowConverters.scala:401)
    at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$4.next(ArrowConverters.scala:382)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
    ...
Cause
The NoSuchElementException error is the result of an issue with the Apache Arrow optimization. Apache Arrow is an in-memory columnar data format that Spark uses to efficiently transfer data between JVM and Python processes.
When the Arrow optimization is enabled, the Py4J interface can call iterator.next() without first checking iterator.hasNext(). If the underlying iterator is empty, this raises the NoSuchElementException error.
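For context, the ArrowConverters frames in the stack trace belong to the Arrow transfer path exercised by operations such as DataFrame.toPandas(). The PySpark sketch below (hypothetical sizing and filter) only illustrates the kind of workload that goes through this path; it is not a guaranteed reproduction, since the failure depends on an empty Arrow batch being produced at runtime.

```python
# Minimal sketch of a workload that exercises the Arrow transfer path
# (ArrowConverters in the stack trace above). Illustrative only; the
# intermittent error depends on an empty Arrow batch at runtime.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("arrow-path-sketch").getOrCreate()

# Arrow-based transfer is on by default in recent runtimes.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

df = spark.range(0, 1_000_000).repartition(200)  # many partitions
filtered = df.filter("id % 999983 = 0")          # sparse filter leaves most partitions empty

# toPandas() moves data JVM -> Python through Arrow batches; an empty
# batch on this path is where the NoSuchElementException has been observed.
pdf = filtered.toPandas()
print(len(pdf))
```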
Solution
Set spark.databricks.pyspark.emptyArrowBatchCheck to true in your cluster's Spark config (AWS | Azure | GCP).
spark.databricks.pyspark.emptyArrowBatchCheck=true
Enabling spark.databricks.pyspark.emptyArrowBatchCheck prevents the NoSuchElementException error from occurring when the Arrow batch size is 0.
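After the cluster restarts with this setting, you can confirm that your session picked it up. This is a minimal sketch using the standard spark.conf.get API and the spark session object predefined in Databricks notebooks; the "unset" fallback is just an assumed sentinel for clusters where the flag was never configured.

```python
# Verify the cluster-level flag is visible to the current session.
# spark.conf.get raises for unset keys, so pass a default value.
value = spark.conf.get("spark.databricks.pyspark.emptyArrowBatchCheck", "unset")
print(value)  # expected "true" after the cluster config change
```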
Alternatively, you can disable the Arrow optimization by setting the following properties in your cluster's Spark config:
spark.sql.execution.arrow.pyspark.enabled=false
spark.sql.execution.arrow.enabled=false
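Both properties are ordinary Spark SQL configurations, so if a cluster restart is not convenient they can also be applied to the current session from a notebook; a minimal sketch:

```python
# Disable Arrow-based columnar transfer for this session only.
# spark.sql.execution.arrow.pyspark.enabled is the Spark 3.x property;
# spark.sql.execution.arrow.enabled is the legacy Spark 2.x name, set
# as well to mirror the cluster-level config above.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
spark.conf.set("spark.sql.execution.arrow.enabled", "false")

# Subsequent toPandas() / createDataFrame(pandas_df) calls fall back to
# the slower, non-Arrow serialization path.
```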
Disabling the Arrow optimization may have a performance impact.
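If you want to quantify that impact on your own workload before committing to this workaround, a rough timing comparison can help. This sketch uses hypothetical data sizes and the notebook's predefined spark session; adjust it to resemble your actual job.

```python
import time

def time_to_pandas(arrow_enabled: bool) -> float:
    """Time one JVM -> Python transfer with Arrow toggled on or off."""
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled",
                   str(arrow_enabled).lower())
    df = spark.range(0, 5_000_000).selectExpr("id", "id * 2 AS doubled")
    start = time.perf_counter()
    df.toPandas()
    return time.perf_counter() - start

# The first call includes JVM/plan warmup, so run each mode twice and
# compare the second measurements for a fairer picture.
print(f"Arrow on:  {time_to_pandas(True):.2f}s")
print(f"Arrow off: {time_to_pandas(False):.2f}s")
```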