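Conflicting directory structures error

Use distinct paths for storage locations; otherwise, conflicting directory structures can cause an error.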

Written by 烟灰

May 19, 2022

Problem

An Apache Spark job fails with java.lang.AssertionError: assertion failed: Conflicting directory structures detected.

Example stack trace

org.apache.spark.sql.streaming.StreamingQueryException: There was an error when trying to infer the partition schema of the current batch of files.
ac549-c4b-4e4e-9403-4793f48240unitId=4e743dda909f-4932893d04d811
com/km/gold/cfy_gold/clfy_xlfy_evt:
://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt]
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:385)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:268)
Caused by: java.lang.RuntimeException: There was an error when trying to infer the partition schema of the current batch of files. Please provide your partition columns explicitly by using: .option('cloudFiles.partitionColumns', 'comma-separated-list')
  at com.databricks.sql.fileNotification.autoIngest.CloudFilesErrors$.partitionInferenceError(CloudFilesErrors.scala:115)
  at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceFileIndex.liftedTree1$1(CloudFilesSourceFileIndex.scala:65)
  at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceFileIndex.partitionSpec(CloudFilesSourceFileIndex.scala:63)
  at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50)
  at com.databricks.sql.fileNotification.autoIngest.CloudFilesSource.getBatch(CloudFilesSource.scala:361)
  ... 1 more
Caused by: java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
  ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt
  com/km/gold/cfy_gold/clfy_xclfy_evt/clfy_xclfy_evt
If there are multiple root directories, please load them separately and then union them.
  at scala.Predef$.assert(Predef.scala:223)
  at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:204)
  at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parseP

Cause

The storage location contains conflicting directory paths.

The example stack trace shows two conflicting directory paths:

  • ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt
  • ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt

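Because these directories appear in the same hierarchy (the second is nested inside the first), updates at the root or branch level can cause conflicts.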
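As a rough illustration of why these two paths clash, the sketch below strips trailing `key=value` partition segments from each directory and counts the distinct roots that remain. The assertion message suggests Spark expects a single root; this is a simplified approximation rather than Spark's actual code, and the helper names `base_path` and `distinct_roots` are hypothetical.

```python
def base_path(directory: str) -> str:
    """Return the directory with trailing key=value partition segments removed."""
    segments = directory.rstrip("/").split("/")
    while segments and "=" in segments[-1]:
        segments.pop()
    return "/".join(segments)


def distinct_roots(directories):
    """Return the sorted distinct base paths; more than one suggests a conflict."""
    return sorted({base_path(d) for d in directories})


# The two paths reported in the stack trace: one nested inside the other.
nested = [
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt",
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt",
]

# Partition-style directories that share a single root.
partitioned = [
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt1",
    "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt2",
]

print(len(distinct_roots(nested)))       # 2 roots: conflicting
print(len(distinct_roots(partitioned)))  # 1 root: consistent
```

The nested subdirectory carries no `key=value` name, so it counts as a second root instead of a partition of the first.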

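Solution

Avoid making multiple concurrent updates in a hierarchical directory structure, and avoid concurrent updates to the same partition.

Once a conflict is detected, split the work into separate updates or add more partitions.

Example directories without conflicts: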

  • ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt1
  • ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt2
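
One way to keep writers on a layout like the one above is to derive every output directory from a single root plus a `column=value` segment. The following is a minimal sketch under that assumption; `partition_dir` is a hypothetical helper, and the column name `evt` is taken from the example paths.

```python
def partition_dir(root: str, column: str, value: str) -> str:
    """Build a partition directory of the form <root>/<column>=<value>."""
    return f"{root.rstrip('/')}/{column}={value}"


ROOT = "://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt"

dirs = [
    partition_dir(ROOT, "evt", value)
    for value in ("clfy_x_clfy_evt1", "clfy_x_clfy_evt2")
]

for d in dirs:
    print(d)
```

Each generated directory sits exactly one level below the same root, so no writer ends up targeting a path nested inside another writer's location.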