Databricks Runtime 6.2 for ML (Unsupported)

Databricks released this image in December 2019.

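Databricks Runtime 6.2 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 6.2 (Unsupported). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.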

New features

Databricks Runtime 6.2 ML is built on top of Databricks Runtime 6.2. For information on what's new in Databricks Runtime 6.2, see the Databricks Runtime 6.2 (Unsupported) release notes.

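Improvements

Upgraded machine learning libraries

  • TensorFlow and TensorBoard: 1.14.0 to 1.15.0. There are two known issues:

    • You might need to explicitly import the tensorflow module inside functions to avoid pickling issues in PySpark, HorovodRunner, Hyperopt, and other machine learning libraries.

    • The Projector tab in TensorBoard is blank. As a workaround, visit the projector page directly by replacing #projector in the URL with data/plugin/projector/projector_binary.html.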

  • Keras: 2.2.4 to 2.2.5.

    Note

    If you use Keras with the TensorFlow backend, Databricks recommends using tf.keras instead.

  • PyTorch: 1.2.0 to 1.3.0.

  • tensorboardX: 1.8 to 1.9.

    Note

    Because PyTorch now officially supports TensorBoard, we will remove tensorboardX in the next major release.

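  • MLflow: 1.3.0 to 1.4.0.

    • Keras and TensorFlow autologging and the Keras model persistence APIs are now compatible with TensorFlow 2.0.

    • Added the get_run, get_experiment, and get_experiment_by_name functions.

  • Hyperopt: 0.2-db1 with Databricks MLflow integration.

  • mleap-databricks-runtime to 0.15.0, which includes mleap-xgboost-runtime.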
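The first TensorFlow known issue in this list (pickling problems when the tensorflow module is picked up from the enclosing scope) can be avoided with a function-local import. The following is a minimal sketch, not a definitive recipe: train_fn and its learning_rate parameter are hypothetical names, and the HorovodRunner call appears only as a comment because it needs a Databricks Runtime ML cluster.

```python
def train_fn(learning_rate=0.01):
    """Hypothetical training function that is shipped to workers.

    The import lives inside the function, so serializing train_fn does not
    have to pickle a module object captured from the notebook's global scope.
    """
    import tensorflow as tf  # resolved on the worker, at call time

    # Placeholder for real model code: report what the worker sees.
    return {"tf_version": tf.__version__, "learning_rate": learning_rate}


# On a Databricks Runtime ML cluster, the function could then be handed to a
# distributed runner, for example (not executed here):
#   from sparkdl import HorovodRunner
#   HorovodRunner(np=2).run(train_fn, learning_rate=0.01)
```

Because tf is a local name, nothing about TensorFlow is attached to the function until it actually runs.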

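SparkTrials adds support for broadcast variables

Previously, Hyperopt with SparkTrials could not be used with PySpark broadcast variables. Now, broadcast variables can be included in the function fn passed to fmin().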
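As an illustration, the sketch below builds an objective that reads its data from a broadcast variable and passes it to fmin() with SparkTrials. The helper names make_objective and run_search, the toy data, and the parallelism value are assumptions made for this example. run_search needs a cluster with Spark and Hyperopt, so only the objective is exercised locally, using a stand-in object that exposes a value attribute.

```python
from types import SimpleNamespace


def make_objective(bc_data):
    """Build the function `fn` that would be passed to fmin().

    `bc_data` is anything with a `.value` attribute, such as the object
    returned by SparkContext.broadcast().
    """
    def objective(slope):
        xs, ys = bc_data.value  # read the broadcast payload on the worker
        # Sum of squared errors of the line y = slope * x.
        return sum((slope * x - y) ** 2 for x, y in zip(xs, ys))
    return objective


def run_search(sc, max_evals=16):
    """Hypothetical driver-side helper; requires Spark and Hyperopt."""
    from hyperopt import SparkTrials, fmin, hp, tpe

    bc_data = sc.broadcast(([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    return fmin(
        fn=make_objective(bc_data),
        space=hp.uniform("slope", 0.0, 4.0),
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=SparkTrials(parallelism=4),
    )


# Local check of the objective with a stand-in for the broadcast variable.
local_stub = SimpleNamespace(value=([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(make_objective(local_stub)(2.0))
```

Keeping the objective separate from the search call makes it easy to test the loss logic without a cluster.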

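Deprecations

In addition to the packages deprecated in Databricks Runtime 6.2, the following packages are deprecated and will be removed in the next major release: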

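  • TensorFrames. Use pandas UDFs instead.

  • Some modules and classes in the Python package sparkdl, mainly:

    • sparkdl.HorovodEstimator. Use sparkdl.HorovodRunner instead.

    • sparkdl.graph. Use a pandas UDF instead.

    • sparkdl.udf. Use a pandas UDF instead.

    • The Transformers and Estimators used in Spark ML pipelines are deprecated. Use the following alternatives:

      • Use a pandas UDF as a replacement for the following transformers: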

        • TFImageTransformer

        • TFTransformer

        • DeepImagePredictor

        • DeepImageFeaturizer

        • KerasImageFileTransformer

        • KerasTransformer

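      • KerasImageFileEstimator: for tuning deep learning models, use Hyperopt instead.

For more details and recommended alternatives, see the deprecation messages shown when you use these packages in a notebook.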
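Several of the deprecations above point to pandas UDFs as the replacement. The sketch below shows one way batch scoring could be wrapped as a scalar pandas UDF. It uses the Spark 2.4 style pandas_udf call; build_scoring_udf and predict_batch are hypothetical names, and a real replacement for a given transformer would load and apply its own model inside the wrapped function.

```python
def build_scoring_udf(predict_batch, return_type="double"):
    """Wrap a batch prediction function as a scalar pandas UDF.

    `predict_batch` is a hypothetical callable that takes a pandas.Series
    of inputs and returns a pandas.Series of the same length.
    """
    # Imported lazily so this code can be loaded without PySpark installed.
    from pyspark.sql.functions import PandasUDFType, pandas_udf

    @pandas_udf(return_type, PandasUDFType.SCALAR)
    def score(batch):
        return predict_batch(batch)

    return score


# Example use on a cluster (not executed here), assuming a DataFrame `df`
# with a numeric column "x":
#   double_it = build_scoring_udf(lambda s: s * 2.0)
#   df.withColumn("prediction", double_it("x"))
```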

Bug fixes

On Databricks Community Edition, PySpark workers can now find pre-installed Spark packages.

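System environment

The system environment in Databricks Runtime 6.2 ML differs from that of Databricks Runtime 6.2.

The following sections list the libraries included in Databricks Runtime 6.2 ML that differ from those included in Databricks Runtime 6.2.

Python libraries

Databricks Runtime 6.2 ML uses Conda for Python package management and includes many popular ML packages. The following sections describe the Conda environment for Databricks Runtime 6.2 ML.

Python on CPU clusters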

name: databricks-ml
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _py-xgboost-mutex=2.0=cpu_0
  - _tflow_select=2.3.0=mkl
  - absl-py=0.8.1=py37_0
  - asn1crypto=0.24.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_0
  - blas=1.0=mkl
  - boto=2.49.0=py37_0
  - boto3=1.9.162=py_0
  - botocore=1.12.163=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2019.1.23=0
  - certifi=2019.3.9=py37_0
  - cffi=1.12.2=py37h2e261b9_1
  - chardet=3.0.4=py37_1003
  - click=7.0=py_0
  - cloudpickle=0.8.0=py37_0
  - colorama=0.4.1=py_0
  - configparser=3.7.4=py37_0
  - cpuonly=1.0=0
  - cryptography=2.6.1=py37h1ba5d50_0
  - cycler=0.10.0=py37_0
  - cython=0.29.6=py37he6710b0_0
  - decorator=4.4.0=py37_1
  - docutils=0.14=py37_0
  - entrypoints=0.3=py37_0
  - et_xmlfile=1.0.1=py37_0
  - flask=1.0.2=py37_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.17.1=py37_0
  - gast=0.2.2=py37_0
  - gitdb2=2.0.6=py_0
  - gitpython=2.1.11=py37_0
  - google-pasta=0.1.8=py_0
  - grpcio=1.16.1=py37hf8bcb03_1
  - gunicorn=19.9.0=py37_0
  - h5py=2.9.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - html5lib=1.0.1=py_0
  - icu=58.2=h9c2bf20_1
  - idna=2.8=py37_0
  - intel-openmp=2019.3=199
  - ipython=7.4.0=py37h39e3cac_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py_0
  - jdcal=1.4=py37_0
  - jedi=0.13.3=py37_0
  - jinja2=2.10=py37_0
  - jmespath=0.9.4=py_0
  - jpeg=9b=h024ee3a_2
  - keras-applications=1.0.8=py_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.0.1=py37hf484d3e_0
  - krb5=1.16.1=h173b8e3_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=8.2.0=hdf63c60_1
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.36=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.9.2=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - libtiff=4.0.10=h2733197_2
  - libxgboost=0.90=he6710b0_1
  - libxml2=2.9.9=hea5a465_1
  - libxslt=1.1.33=h7d1a2b0_0
  - llvmlite=0.28.0=py37hd408876_0
  - lxml=4.3.2=py37hefd8a0e_0
  - mako=1.0.10=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - mkl=2019.3=199
  - mkl_fft=1.0.10=py37ha843d7b_0
  - mkl_random=1.0.2=py37hd81dba3_0
  - ncurses=6.1=he6710b0_1
  - networkx=2.2=py37_1
  - ninja=1.9.0=py37hfd86e86_0
  - nose=1.3.7=py37_2
  - numba=0.43.1=py37h962f231_0
  - numpy=1.16.2=py37h7e9f1db_0
  - numpy-base=1.16.2=py37hde5b4d6_0
  - olefile=0.46=py_0
  - openpyxl=2.6.1=py37_1
  - openssl=1.1.1b=h7b6447c_1
  - opt_einsum=3.1.0=py_0
  - pandas=0.24.2=py37he6710b0_0
  - paramiko=2.4.2=py37_0
  - parso=0.3.4=py37_0
  - pathlib2=2.3.3=py37_0
  - patsy=0.5.1=py37_0
  - pexpect=4.6.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=5.4.1=py37h34e0f95_0
  - pip=19.0.3=py37_0
  - ply=3.11=py37_0
  - prompt_toolkit=2.0.9=py37_0
  - protobuf=3.9.2=py37he6710b0_0
  - psutil=5.6.1=py37h7b6447c_0
  - psycopg2=2.7.6.1=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - py-xgboost=0.90=py37he6710b0_1
  - py-xgboost-cpu=0.90=py37_1
  - pyasn1=0.4.8=py_0
  - pycparser=2.19=py_0
  - pygments=2.3.1=py37_0
  - pymongo=3.8.0=py37he6710b0_1
  - pynacl=1.3.0=py37h7b6447c_0
  - pyopenssl=19.0.0=py37_0
  - pyparsing=2.3.1=py37_0
  - pysocks=1.6.8=py37_0
  - python=3.7.3=h0371630_0
  - python-dateutil=2.8.0=py37_0
  - python-editor=1.0.4=py_0
  - pytorch=1.3.0=py3.7_cpu_0
  - pytz=2018.9=py37_0
  - pyyaml=5.1=py37h7b6447c_0
  - readline=7.0=h7b6447c_5
  - requests=2.21.0=py37_0
  - s3transfer=0.2.1=py37_0
  - scikit-learn=0.20.3=py37hd81dba3_0
  - scipy=1.2.1=py37h7c811a0_0
  - setuptools=40.8.0=py37_0
  - simplejson=3.16.0=py37h14c3975_0
  - singledispatch=3.4.0.3=py37_0
  - six=1.12.0=py37_0
  - smmap2=2.0.5=py_0
  - sqlite=3.27.2=h7b6447c_0
  - sqlparse=0.3.0=py_0
  - statsmodels=0.9.0=py37h035aef0_0
  - tabulate=0.8.3=py37_0
  - tensorboard=1.15.0+db2=pyhb230dea_0
  - tensorflow=1.15.0+db2=mkl_py37hc5fbf04_0
  - tensorflow-base=1.15.0+db2=mkl_py37h2ae1e84_0
  - tensorflow-estimator=1.15.1+db2=pyh2649769_0
  - tensorflow-mkl=1.15.0+db2=h4fcabd2_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - torchvision=0.4.1=py37_cpu
  - tqdm=4.31.1=py37_1
  - traitlets=4.3.2=py37_0
  - urllib3=1.24.1=py37_0
  - virtualenv=16.0.0=py37_0
  - wcwidth=0.1.7=py37_0
  - webencodings=0.5.1=py37_1
  - websocket-client=0.56.0=py37_0
  - werkzeug=0.14.1=py37_0
  - wheel=0.33.1=py37_0
  - wrapt=1.11.1=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - argparse==1.4.0
    - databricks-cli==0.9.1
    - deprecated==1.2.7
    - docker==4.1.0
    - fusepy==2.0.4
    - gorilla==0.3.0
    - horovod==0.18.2
    - hyperopt==0.2.1.db1
    - keras==2.2.5
    - matplotlib==3.0.3
    - mleap==0.8.1
    - mlflow==1.4.0
    - nose-exclude==0.5.0
    - pyarrow==0.13.0
    - querystring-parser==1.2.4
    - seaborn==0.9.0
    - tensorboardx==1.9
prefix: /databricks/conda/envs/databricks-ml

Python on GPU clusters

name: databricks-ml-gpu
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _py-xgboost-mutex=1.0=gpu_0
  - _tflow_select=2.1.0=gpu
  - absl-py=0.8.1=py37_0
  - asn1crypto=0.24.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_0
  - blas=1.0=mkl
  - boto=2.49.0=py37_0
  - boto3=1.9.162=py_0
  - botocore=1.12.163=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2019.1.23=0
  - certifi=2019.3.9=py37_0
  - cffi=1.12.2=py37h2e261b9_1
  - chardet=3.0.4=py37_1003
  - click=7.0=py_0
  - cloudpickle=0.8.0=py37_0
  - colorama=0.4.1=py_0
  - configparser=3.7.4=py37_0
  - cryptography=2.6.1=py37h1ba5d50_0
  - cudatoolkit=10.0.130=0
  - cudnn=7.6.4=cuda10.0_0
  - cupti=10.0.130=0
  - cycler=0.10.0=py37_0
  - cython=0.29.6=py37he6710b0_0
  - decorator=4.4.0=py37_1
  - docutils=0.14=py37_0
  - entrypoints=0.3=py37_0
  - et_xmlfile=1.0.1=py37_0
  - flask=1.0.2=py37_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.17.1=py37_0
  - gast=0.2.2=py37_0
  - gitdb2=2.0.6=py_0
  - gitpython=2.1.11=py37_0
  - google-pasta=0.1.8=py_0
  - grpcio=1.16.1=py37hf8bcb03_1
  - gunicorn=19.9.0=py37_0
  - h5py=2.9.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - html5lib=1.0.1=py_0
  - icu=58.2=h9c2bf20_1
  - idna=2.8=py37_0
  - intel-openmp=2019.3=199
  - ipython=7.4.0=py37h39e3cac_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py_0
  - jdcal=1.4=py37_0
  - jedi=0.13.3=py37_0
  - jinja2=2.10=py37_0
  - jmespath=0.9.4=py_0
  - jpeg=9b=h024ee3a_2
  - keras-applications=1.0.8=py_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.0.1=py37hf484d3e_0
  - krb5=1.16.1=h173b8e3_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=8.2.0=hdf63c60_1
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.36=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.9.2=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - libtiff=4.0.10=h2733197_2
  - libxgboost=0.90=h688424c_0
  - libxml2=2.9.9=hea5a465_1
  - libxslt=1.1.33=h7d1a2b0_0
  - llvmlite=0.28.0=py37hd408876_0
  - lxml=4.3.2=py37hefd8a0e_0
  - mako=1.0.10=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - mkl=2019.3=199
  - mkl_fft=1.0.10=py37ha843d7b_0
  - mkl_random=1.0.2=py37hd81dba3_0
  - ncurses=6.1=he6710b0_1
  - networkx=2.2=py37_1
  - ninja=1.9.0=py37hfd86e86_0
  - nose=1.3.7=py37_2
  - numba=0.43.1=py37h962f231_0
  - numpy=1.16.2=py37h7e9f1db_0
  - numpy-base=1.16.2=py37hde5b4d6_0
  - olefile=0.46=py_0
  - openpyxl=2.6.1=py37_1
  - openssl=1.1.1b=h7b6447c_1
  - opt_einsum=3.1.0=py_0
  - pandas=0.24.2=py37he6710b0_0
  - paramiko=2.4.2=py37_0
  - parso=0.3.4=py37_0
  - pathlib2=2.3.3=py37_0
  - patsy=0.5.1=py37_0
  - pexpect=4.6.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=5.4.1=py37h34e0f95_0
  - pip=19.0.3=py37_0
  - ply=3.11=py37_0
  - prompt_toolkit=2.0.9=py37_0
  - protobuf=3.9.2=py37he6710b0_0
  - psutil=5.6.1=py37h7b6447c_0
  - psycopg2=2.7.6.1=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - py-xgboost=0.90=py37h688424c_0
  - py-xgboost-gpu=0.90=py37h28bbb66_0
  - pyasn1=0.4.8=py_0
  - pycparser=2.19=py_0
  - pygments=2.3.1=py37_0
  - pymongo=3.8.0=py37he6710b0_1
  - pynacl=1.3.0=py37h7b6447c_0
  - pyopenssl=19.0.0=py37_0
  - pyparsing=2.3.1=py37_0
  - pysocks=1.6.8=py37_0
  - python=3.7.3=h0371630_0
  - python-dateutil=2.8.0=py37_0
  - python-editor=1.0.4=py_0
  - pytorch=1.3.0=py3.7_cuda10.0.130_cudnn7.6.3_0
  - pytz=2018.9=py37_0
  - pyyaml=5.1=py37h7b6447c_0
  - readline=7.0=h7b6447c_5
  - requests=2.21.0=py37_0
  - s3transfer=0.2.1=py37_0
  - scikit-learn=0.20.3=py37hd81dba3_0
  - scipy=1.2.1=py37h7c811a0_0
  - setuptools=40.8.0=py37_0
  - simplejson=3.16.0=py37h14c3975_0
  - singledispatch=3.4.0.3=py37_0
  - six=1.12.0=py37_0
  - smmap2=2.0.5=py_0
  - sqlite=3.27.2=h7b6447c_0
  - sqlparse=0.3.0=py_0
  - statsmodels=0.9.0=py37h035aef0_0
  - tabulate=0.8.3=py37_0
  - tensorboard=1.15.0+db2=pyhb230dea_0
  - tensorflow=1.15.0+db2=gpu_py37h9fd0ff8_0
  - tensorflow-base=1.15.0+db2=gpu_py37hd56f5dd_0
  - tensorflow-estimator=1.15.1+db2=pyh2649769_0
  - tensorflow-gpu=1.15.0+db2=h0d30ee6_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - torchvision=0.4.1=py37_cu100
  - tqdm=4.31.1=py37_1
  - traitlets=4.3.2=py37_0
  - urllib3=1.24.1=py37_0
  - virtualenv=16.0.0=py37_0
  - wcwidth=0.1.7=py37_0
  - webencodings=0.5.1=py37_1
  - websocket-client=0.56.0=py37_0
  - werkzeug=0.14.1=py37_0
  - wheel=0.33.1=py37_0
  - wrapt=1.11.1=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - argparse==1.4.0
    - databricks-cli==0.9.1
    - deprecated==1.2.7
    - docker==4.1.0
    - fusepy==2.0.4
    - gorilla==0.3.0
    - horovod==0.18.2
    - hyperopt==0.2.1.db1
    - keras==2.2.5
    - matplotlib==3.0.3
    - mleap==0.8.1
    - mlflow==1.4.0
    - nose-exclude==0.5.0
    - pyarrow==0.13.0
    - querystring-parser==1.2.4
    - seaborn==0.9.0
    - tensorboardx==1.9
prefix: /databricks/conda/envs/databricks-ml-gpu
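Each Conda dependency in the environment listings above follows the pattern name=version=build, while entries under pip use name==version. As a rough sketch for working with these listings programmatically, the helper below splits an entry into its parts. CondaSpec and parse_spec are illustrative names and not part of any Databricks or Conda API.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    name: str
    version: str
    build: Optional[str]  # None for pip-style entries


def parse_spec(line):
    """Parse a 'name=version=build' (Conda) or 'name==version' (pip) entry."""
    # Drop indentation and the leading YAML list dash.
    text = line.strip().lstrip("-").strip()
    if "==" in text:  # pip-style pin
        name, version = text.split("==", 1)
        return CondaSpec(name, version, None)
    parts = text.split("=")
    build = parts[2] if len(parts) > 2 else None
    return CondaSpec(parts[0], parts[1], build)


print(parse_spec("  - tensorflow=1.15.0+db2=mkl_py37hc5fbf04_0"))
print(parse_spec("    - mlflow==1.4.0"))
```

The parsed tuples can then be compared between the CPU and GPU environments, for example to list the packages whose build strings differ.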

Spark packages containing Python modules

Spark package          Python module    Version
graphframes            graphframes      0.7.0-db1-spark2.4
spark-deep-learning    sparkdl          1.5.0-db12-spark2.4
tensorframes           tensorframes     0.8.2-s_2.11

Java and Scala libraries (Scala 2.11 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 6.2, Databricks Runtime 6.2 ML contains the following JARs:

Group ID             Artifact ID                        Version
com.databricks       spark-deep-learning                1.5.0-db12-spark2.4
com.typesafe.akka    akka-actor_2.11                    2.3.11
ml.combust.mleap     mleap-databricks-runtime_2.11      0.15.0
ml.dmlc              xgboost4j                          0.90
ml.dmlc              xgboost4j-spark                    0.90
org.graphframes      graphframes_2.11                   0.7.0-db1-spark2.4
org.mlflow           mlflow-client                      1.4.0
org.tensorflow       libtensorflow                      1.15.0
org.tensorflow       libtensorflow_jni                  1.15.0
org.tensorflow       spark-tensorflow-connector_2.11    1.15.0
org.tensorflow       tensorflow                         1.15.0
org.tensorframes     tensorframes                       0.8.2-s_2.11