Problem
Your Apache Spark job fails with a java.lang.AssertionError: assertion failed error.
Example stack trace
org.apache.spark.sql.streaming.StreamingQueryException: There was an error when trying to infer the partition schema of the current batch of files. ac549-c4b-4e4e-9403-4793f48240 unitId=4e743dda909f-4932893d04d811 com/km/gold/cfy_gold/clfy_xlfy_evt: ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt]
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:385)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:268)
Caused by: java.lang.RuntimeException: There was an error when trying to infer the partition schema of the current batch of files. Please provide your partition columns explicitly by using: .option('cloudFiles.partitionColumns', 'comma-separated-list')
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesErrors$.partitionInferenceError(CloudFilesErrors.scala:115)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceFileIndex.liftedTree1$1(CloudFilesSourceFileIndex.scala:65)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceFileIndex.partitionSpec(CloudFilesSourceFileIndex.scala:63)
    at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSource.getBatch(CloudFilesSource.scala:361)
    ... 1 more
Caused by: java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths: ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt com/km/gold/cfy_gold/clfy_xclfy_evt/clfy_xclfy_evt If there are multiple root directories, please load them separately and then union them.
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:204)
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parseP
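Cause
The storage location contains conflicting directory paths.
The example stack trace shows two conflicting directory paths:
- ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt
- ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt
Because these directories sit in the same hierarchy, with one nested inside the other, updates at the root level and at the branch level can conflict with each other.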
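The nesting relationship between the two paths can be checked mechanically. The following is a minimal sketch in plain Python, not part of Spark or Databricks; the helper names `is_ancestor` and `find_nested_paths` are hypothetical and exist only for this illustration. It flags any pair of directories where one lies below the other.

```python
from itertools import combinations


def is_ancestor(parent: str, child: str) -> bool:
    """Return True if `child` is located somewhere below `parent`."""
    parent = parent.rstrip("/")
    child = child.rstrip("/")
    # Appending "/" avoids treating "a/bc" as a child of "a/b".
    return child != parent and child.startswith(parent + "/")


def find_nested_paths(paths):
    """Return (ancestor, descendant) pairs found among `paths`."""
    conflicts = []
    for a, b in combinations(paths, 2):
        if is_ancestor(a, b):
            conflicts.append((a, b))
        elif is_ancestor(b, a):
            conflicts.append((b, a))
    return conflicts


# The two paths reported in the example stack trace.
paths = [
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt",
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt",
]

for ancestor, descendant in find_nested_paths(paths):
    print(f"{descendant} is nested under {ancestor}")
```

Running a check like this over the directories a job writes to, before the job starts, may help spot a nested layout early.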
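Solution
Avoid multiple concurrent updates at different levels of a hierarchical directory structure, and avoid concurrent updates within the same partition.
Once a conflict has been detected, write the updates to separate paths or add more partitions.
Example of directories that do not conflict:
- ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt1
- ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt2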
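To illustrate why the partitioned layout does not conflict, here is a simplified sketch in plain Python. It is an approximation for illustration only, not Spark's actual partition discovery code, and the helper names `base_path` and `share_single_root` are hypothetical. It strips trailing `column=value` segments from each directory and checks that every directory resolves to the same root.

```python
def base_path(path: str) -> str:
    """Strip trailing `column=value` segments to find the table root."""
    segments = path.rstrip("/").split("/")
    while segments and "=" in segments[-1]:
        segments.pop()
    return "/".join(segments)


def share_single_root(paths) -> bool:
    """Return True if all directories resolve to one common root."""
    return len({base_path(p) for p in paths}) == 1


# Partition directories under one root: no conflict expected.
partitioned = [
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt1",
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt2",
]

# The nested layout from the stack trace: two different roots.
nested = [
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt",
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt",
]

print(share_single_root(partitioned))
print(share_single_root(nested))
```

Under this simplified model, the two `evt=` directories collapse to the same root, while the nested pair yields two distinct roots, which is the situation the assertion message describes as multiple root directories.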