Load data with COPY INTO
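The COPY INTO SQL command lets you load data from a file location into a Delta table. This is a retriable and idempotent operation: files in the source location that have already been loaded are skipped.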
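The retry behavior can be pictured with a toy, in-memory model. This is a sketch under an explicit assumption: the hypothetical `ToyCopyInto` class only mimics the observable effect (a rerun does not duplicate rows). It is not how Databricks records which files it has loaded. The class remembers the source paths it has already ingested and skips them on later calls.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCopyInto:
    """In-memory toy model of COPY INTO skipping files it has already loaded.

    Assumption: this only illustrates the observable behavior (retries do not
    duplicate rows); it is not how Databricks records loaded files.
    """

    loaded_files: set = field(default_factory=set)  # source paths already ingested
    rows: list = field(default_factory=list)  # stands in for the target table

    def copy_into(self, source: dict) -> int:
        """Load each file in `source` (path -> rows) that was not loaded before.

        Returns how many files this call loaded.
        """
        newly_loaded = 0
        for path in sorted(source):
            if path in self.loaded_files:
                continue  # loaded by an earlier run: skip it
            self.rows.extend(source[path])
            self.loaded_files.add(path)
            newly_loaded += 1
        return newly_loaded


source = {"/files/a.parquet": [1, 2], "/files/b.parquet": [3]}
table = ToyCopyInto()
first = table.copy_into(source)  # loads both files
second = table.copy_into(source)  # retry: both files are skipped
print(first, second, table.rows)  # 2 0 [1, 2, 3]
```

Adding a new file to `source` and calling `copy_into` again would load only that file. This matches the incremental, rerunnable use the command is designed for.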
Note

For a more scalable and robust file ingestion experience, Databricks recommends that SQL users leverage streaming tables.
Example: Load data into a schemaless Delta Lake table

Note

This feature is available in Databricks Runtime 11.0 and above.
You can create empty placeholder Delta tables so that the schema is inferred later, during a COPY INTO command:
CREATE TABLE IF NOT EXISTS my_table
[COMMENT <table-description>]
[TBLPROPERTIES (<table-properties>)];

COPY INTO my_table
FROM '/path/to/files'
FILEFORMAT = <format>
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
The preceding SQL statement is idempotent and can be scheduled to run so that data is ingested exactly once into the Delta table.
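When this pattern is driven from a Python notebook, a small helper can assemble the two statements. This is a sketch under assumptions. `schemaless_copy_into_sql` and `sql_path_literal` are hypothetical names, and table names and formats are inserted verbatim without validation. The statements are returned separately because `spark.sql` runs one statement per call.

```python
from __future__ import annotations


def sql_path_literal(path: str) -> str:
    """Return `path` as a single-quoted SQL string literal.

    Paths that contain a single quote are rejected instead of escaped,
    because this sketch does not assume any particular escaping rule.
    """
    if "'" in path:
        raise ValueError(f"path contains a single quote: {path!r}")
    return f"'{path}'"


def schemaless_copy_into_sql(table: str, path: str, file_format: str) -> list[str]:
    """Return the CREATE TABLE and COPY INTO statements for a schemaless table.

    Hypothetical helper: `table` and `file_format` are inserted verbatim and
    are not validated.
    """
    create = f"CREATE TABLE IF NOT EXISTS {table}"
    copy = (
        f"COPY INTO {table}\n"
        f"FROM {sql_path_literal(path)}\n"
        f"FILEFORMAT = {file_format}\n"
        # Same mergeSchema options as the SQL pattern: infer and merge the source
        # schema (FORMAT_OPTIONS) and let the table schema evolve (COPY_OPTIONS).
        "FORMAT_OPTIONS ('mergeSchema' = 'true')\n"
        "COPY_OPTIONS ('mergeSchema' = 'true')"
    )
    return [create, copy]


statements = schemaless_copy_into_sql("my_table", "/path/to/files", "PARQUET")
for statement in statements:
    print(statement, end=";\n\n")

# In a Databricks notebook (assumption: `spark` is the active SparkSession):
# for statement in statements:
#     spark.sql(statement)
```

Because the load is idempotent, a scheduled job could call the same statements on every run. Files that were already ingested are skipped.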
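Note

The empty Delta table is not usable outside of COPY INTO. INSERT INTO and MERGE INTO are not supported for writing data into schemaless Delta tables. After COPY INTO inserts data into the table, the table becomes queryable.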
See Create target tables for COPY INTO.
Example: Set schema and load data into a Delta Lake table
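The following example shows how to create a Delta table and then use the COPY INTO SQL command to load sample data from Databricks datasets into the table. You can run the example Python, R, Scala, or SQL code from a notebook attached to a Databricks cluster. You can also run the SQL code from a query associated with a SQL warehouse in Databricks SQL.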
Python

table_name = 'default.loan_risks_upload'
source_data = '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
source_format = 'PARQUET'

# Start from a clean slate so the example can be rerun.
spark.sql("DROP TABLE IF EXISTS " + table_name)

# Create the target table with an explicit schema.
spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

# Load the Parquet file into the table.
spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

loan_risks_upload_data = spark.sql("SELECT * FROM " + table_name)

# display() is provided by Databricks notebooks.
display(loan_risks_upload_data)

'''
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    |            |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    |            |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
'''
R

library(SparkR)
sparkR.session()

table_name <- "default.loan_risks_upload"
source_data <- "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
source_format <- "PARQUET"

sql(paste("DROP TABLE IF EXISTS ", table_name, sep = ""))

sql(paste("CREATE TABLE ", table_name, " (",
  "loan_id BIGINT, ",
  "funded_amnt INT, ",
  "paid_amnt DOUBLE, ",
  "addr_state STRING)",
  sep = ""
))

sql(paste("COPY INTO ", table_name,
  " FROM '", source_data, "'",
  " FILEFORMAT = ", source_format,
  sep = ""
))

loan_risks_upload_data <- tableToDF(table_name)

display(loan_risks_upload_data)

# Result:
# +---------+-------------+-----------+------------+
# | loan_id | funded_amnt | paid_amnt | addr_state |
# +=========+=============+===========+============+
# | 0       | 1000        | 182.22    |            |
# +---------+-------------+-----------+------------+
# | 1       | 1000        | 361.19    |            |
# +---------+-------------+-----------+------------+
# | 2       | 1000        | 176.26    | TX         |
# +---------+-------------+-----------+------------+
# ...
Scala

val table_name = "default.loan_risks_upload"
val source_data = "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet"
val source_format = "PARQUET"

spark.sql("DROP TABLE IF EXISTS " + table_name)

spark.sql("CREATE TABLE " + table_name + " (" +
  "loan_id BIGINT, " +
  "funded_amnt INT, " +
  "paid_amnt DOUBLE, " +
  "addr_state STRING)"
)

spark.sql("COPY INTO " + table_name +
  " FROM '" + source_data + "'" +
  " FILEFORMAT = " + source_format
)

val loan_risks_upload_data = spark.table(table_name)

display(loan_risks_upload_data)

/*
Result:
+---------+-------------+-----------+------------+
| loan_id | funded_amnt | paid_amnt | addr_state |
+=========+=============+===========+============+
| 0       | 1000        | 182.22    |            |
+---------+-------------+-----------+------------+
| 1       | 1000        | 361.19    |            |
+---------+-------------+-----------+------------+
| 2       | 1000        | 176.26    | TX         |
+---------+-------------+-----------+------------+
...
*/
SQL

DROP TABLE IF EXISTS default.loan_risks_upload;

CREATE TABLE default.loan_risks_upload (
  loan_id BIGINT,
  funded_amnt INT,
  paid_amnt DOUBLE,
  addr_state STRING
);

COPY INTO default.loan_risks_upload
FROM '/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet'
FILEFORMAT = PARQUET;

SELECT * FROM default.loan_risks_upload;

-- Result:
-- +---------+-------------+-----------+------------+
-- | loan_id | funded_amnt | paid_amnt | addr_state |
-- +=========+=============+===========+============+
-- | 0       | 1000        | 182.22    |            |
-- +---------+-------------+-----------+------------+
-- | 1       | 1000        | 361.19    |            |
-- +---------+-------------+-----------+------------+
-- | 2       | 1000        | 176.26    | TX         |
-- +---------+-------------+-----------+------------+
-- ...
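The Python, R, and Scala versions above build each statement by concatenating string fragments, so a missing space or quote is easy to overlook. As an alternative sketch, the statements can be generated from an ordered column-to-type mapping and checked as plain strings before they are sent to `spark.sql`. The helper name `create_and_copy_sql` is hypothetical, and the notebook usage at the end assumes the usual `spark` session and `display` function.

```python
from __future__ import annotations


def create_and_copy_sql(
    table: str, columns: dict[str, str], path: str, file_format: str
) -> list[str]:
    """Build the statements the example runs, in order.

    Hypothetical helper: `columns` maps column name -> SQL type and keeps its
    insertion order; identifiers and types are inserted verbatim.
    """
    if "'" in path:
        # Rejected rather than escaped: this sketch assumes no escaping rule.
        raise ValueError(f"path contains a single quote: {path!r}")
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE {table} ({column_defs})",
        f"COPY INTO {table} FROM '{path}' FILEFORMAT = {file_format}",
        f"SELECT * FROM {table}",
    ]


statements = create_and_copy_sql(
    "default.loan_risks_upload",
    {"loan_id": "BIGINT", "funded_amnt": "INT", "paid_amnt": "DOUBLE", "addr_state": "STRING"},
    "/databricks-datasets/learning-spark-v2/loans/loan-risks.snappy.parquet",
    "PARQUET",
)
for statement in statements:
    print(statement)

# In a Databricks notebook (assumption: `spark` is the active SparkSession):
# for statement in statements[:-1]:
#     spark.sql(statement)
# display(spark.sql(statements[-1]))
```

Keeping the schema in one mapping also means the column list is declared once rather than being spread across several concatenated string literals.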
To clean up, run the following code, which deletes the table:
Python

spark.sql("DROP TABLE " + table_name)

R

sql(paste("DROP TABLE ", table_name, sep = ""))

Scala

spark.sql("DROP TABLE " + table_name)

SQL

DROP TABLE default.loan_risks_upload
Additional resources
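For common use patterns, including examples of multiple COPY INTO operations against the same Delta table, see Common data loading patterns using COPY INTO.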