Databricks Runtime 6.0 for ML (Unsupported)

Databricks released this image in October 2019.

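Databricks Runtime 6.0 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 6.0 (Unsupported). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed deep learning training using Horovod.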

For more information, including instructions for creating a Databricks Runtime ML cluster, see Introduction to Databricks Runtime for Machine Learning.

New features

Databricks Runtime 6.0 ML is built on top of Databricks Runtime 6.0. For information on what's new in Databricks Runtime 6.0, see the Databricks Runtime 6.0 (Unsupported) release notes.

Query MLflow experiment data at scale with the new MLflow Spark data source

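The Spark data source for MLflow experiments now provides a standard API for loading MLflow experiment run data. This makes it possible to query and analyze MLflow experiment data at scale with the DataFrame API. For a given experiment, the DataFrame contains run_ids, metrics, params, tags, start_time, end_time, status, and artifact_uri. See MLflow experiment.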
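The data source can be queried from a notebook roughly as follows. This is a minimal sketch, assuming `spark` is the notebook's active SparkSession and that the data source is registered under the format name "mlflow-experiment" (treat the name as an assumption if your runtime differs); the helper name `finished_runs_by_metric`, the experiment ID, and the metric key are hypothetical.

```python
def finished_runs_by_metric(spark, experiment_id, metric_key):
    """Return the FINISHED runs of one MLflow experiment, lowest metric value first.

    Assumptions: `spark` is an active SparkSession on a Databricks Runtime ML
    cluster, the format name "mlflow-experiment" is available there, and every
    run logged a metric named `metric_key` (hypothetical, for example "loss").
    """
    # One row per run, with columns such as run_id, metrics, params, tags,
    # start_time, end_time, status and artifact_uri.
    runs = spark.read.format("mlflow-experiment").load(experiment_id)
    # metrics is a map keyed by metric name, so "metrics.<key>" extracts one value.
    metric_col = "metrics." + metric_key
    return (
        runs.where("status = 'FINISHED'")  # keep only runs that completed
        .orderBy(metric_col)               # ascending, so the lowest value comes first
        .select("run_id", "start_time", metric_col)
    )
```

Filtering with a SQL expression string keeps the sketch free of pyspark imports; in a notebook, the returned DataFrame can be passed to `display` as usual.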

Improvements

  • Hyperopt GA

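    Hyperopt on Databricks is now generally available. Notable improvements since the Public Preview include support for MLflow logging on Spark workers, correct handling of PySpark broadcast variables, and new guidance on model selection with Hyperopt. We also fixed minor bugs in log messages, error handling, and the UI, and made the documentation easier to read. For details, see the Hyperopt documentation.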

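    We updated how Databricks logs Hyperopt experiments so that you can now log custom metrics during a Hyperopt run by passing them to the mlflow.log_metric function (see log_metric). This is useful if you want to log custom metrics in addition to the loss when the hyperopt.fmin function is called.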

  • MLflow

    • Added the MLflow Java client 1.2.0

    • MLflow is now promoted to a top-tier library

  • Upgraded machine learning libraries

    • Horovod upgraded from 0.16.4 to 0.18.1

    • MLflow upgraded from 1.0.0 to 1.2.0

  • Anaconda distribution upgraded from 5.2.0 to 2019.03
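The custom-metric logging described under Hyperopt GA comes down to calling mlflow.log_metric inside the objective function. A minimal sketch, assuming Hyperopt and MLflow as bundled with Databricks Runtime 6.0 ML; the quadratic loss, the metric name `abs_error`, and the injectable `log_metric` parameter are hypothetical, the last one only so the function can be tried without an MLflow tracking server.

```python
def objective(x, log_metric=None):
    """Hyperopt objective that returns the loss and logs one extra metric.

    `log_metric` defaults to mlflow.log_metric; passing another callable
    (a hypothetical testing hook) lets the function run without MLflow.
    """
    if log_metric is None:
        import mlflow  # imported lazily; preinstalled on Databricks Runtime ML
        log_metric = mlflow.log_metric
    loss = (x - 3.0) ** 2  # hypothetical loss, minimized at x = 3
    # Custom metric recorded alongside the loss; the name is illustrative.
    log_metric("abs_error", abs(x - 3.0))
    # "ok" is the value of hyperopt.STATUS_OK.
    return {"loss": loss, "status": "ok"}
```

On a cluster, the function would be passed to hyperopt.fmin as usual, for example `fmin(objective, space=hp.uniform("x", -10, 10), algo=tpe.suggest, max_evals=20, trials=SparkTrials())`. fmin minimizes the returned loss, while `abs_error` is only recorded in MLflow.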

Removals

  • Databricks ML Model Export is removed. Use MLeap to import and export models.

  • The following attributes of hyperopt.SparkTrials are removed:

    • SparkTrials.successful_trials_count

    • SparkTrials.failed_trials_count

    • SparkTrials.cancelled_trials_count

    • SparkTrials.total_trials_count

    They are replaced by the following methods:

    • SparkTrials.count_successful_trials()

    • SparkTrials.count_failed_trials()

    • SparkTrials.count_cancelled_trials()

    • SparkTrials.count_total_trials()
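Notebook code that read the removed attributes needs to switch to the method calls. A minimal sketch of a hypothetical helper, `trial_counts`, that prefers the new methods and falls back to the old attribute names when the methods are absent; it relies only on the eight names listed above.

```python
_KINDS = ("successful", "failed", "cancelled", "total")


def trial_counts(trials):
    """Return the four trial counts of a hyperopt SparkTrials object as a dict.

    Prefers the count_<kind>_trials() methods that replace the removed
    attributes, and falls back to <kind>_trials_count where only the old
    attributes exist.
    """
    counts = {}
    for kind in _KINDS:
        method = getattr(trials, "count_%s_trials" % kind, None)
        if callable(method):
            counts[kind] = method()  # new method-based API
        else:
            counts[kind] = getattr(trials, "%s_trials_count" % kind)  # removed attribute
    return counts
```

For example, after `fmin(..., trials=spark_trials)`, `trial_counts(spark_trials)["failed"]` gives the number of failed trials regardless of which form the installed Hyperopt exposes.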

System environment

The system environment in Databricks Runtime 6.0 ML differs from Databricks Runtime 6.0 as follows:

The following sections list the libraries included in Databricks Runtime 6.0 ML that differ from those included in Databricks Runtime 6.0.

Top-tier libraries

Databricks Runtime 6.0 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 6.0 ML uses Conda for Python package management and includes many popular ML packages. The following describes the Conda environment for Databricks Runtime 6.0 ML.

Python 3 on CPU clusters

name: databricks-ml
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _py-xgboost-mutex=2.0=cpu_0
  - _tflow_select=2.3.0=mkl
  - absl-py=0.7.1=py37_0
  - asn1crypto=0.24.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.6=py37h7b6447c_0
  - blas=1.0=mkl
  - boto=2.49.0=py37_0
  - boto3=1.9.162=py_0
  - botocore=1.12.163=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2019.1.23=0
  - certifi=2019.3.9=py37_0
  - cffi=1.12.2=py37h2e261b9_1
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=0.8.0=py37_0
  - colorama=0.4.1=py37_0
  - configparser=3.7.4=py37_0
  - cryptography=2.6.1=py37h1ba5d50_0
  - cycler=0.10.0=py37_0
  - cython=0.29.6=py37he6710b0_0
  - decorator=4.4.0=py37_1
  - docutils=0.14=py37_0
  - entrypoints=0.3=py37_0
  - et_xmlfile=1.0.1=py37_0
  - flask=1.0.2=py37_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.17.1=py37_0
  - gast=0.2.2=py37_0
  - gitdb2=2.0.5=py37_0
  - gitpython=2.1.11=py37_0
  - grpcio=1.16.1=py37hf8bcb03_1
  - gunicorn=19.9.0=py37_0
  - h5py=2.9.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - html5lib=1.0.1=py_0
  - icu=58.2=h9c2bf20_1
  - idna=2.8=py37_0
  - intel-openmp=2019.3=199
  - ipython=7.4.0=py37h39e3cac_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py37_0
  - jdcal=1.4=py37_0
  - jedi=0.13.3=py37_0
  - jinja2=2.10=py37_0
  - jmespath=0.9.4=py_0
  - jpeg=9b=h024ee3a_2
  - keras=2.2.4=0
  - keras-applications=1.0.8=py_0
  - keras-base=2.2.4=py37_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.0.1=py37hf484d3e_0
  - krb5=1.16.1=h173b8e3_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=8.2.0=hdf63c60_1
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.36=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.8.0=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=8.2.0=hdf63c60_1
  - libtiff=4.0.10=h2733197_2
  - libxgboost=0.90=he6710b0_0
  - libxml2=2.9.9=hea5a465_1
  - libxslt=1.1.33=h7d1a2b0_0
  - llvmlite=0.28.0=py37hd408876_0
  - lxml=4.3.2=py37hefd8a0e_0
  - mako=1.0.10=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - mkl=2019.3=199
  - mkl_fft=1.0.10=py37ha843d7b_0
  - mkl_random=1.0.2=py37hd81dba3_0
  - mock=3.0.5=py37_0
  - ncurses=6.1=he6710b0_1
  - networkx=2.2=py37_1
  - ninja=1.9.0=py37hfd86e86_0
  - nose=1.3.7=py37_2
  - numba=0.43.1=py37h962f231_0
  - numpy=1.16.2=py37h7e9f1db_0
  - numpy-base=1.16.2=py37hde5b4d6_0
  - olefile=0.46=py37_0
  - openpyxl=2.6.1=py37_1
  - openssl=1.1.1b=h7b6447c_1
  - pandas=0.24.2=py37he6710b0_0
  - paramiko=2.4.2=py37_0
  - parso=0.3.4=py37_0
  - pathlib2=2.3.3=py37_0
  - patsy=0.5.1=py37_0
  - pexpect=4.6.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=5.4.1=py37h34e0f95_0
  - pip=19.0.3=py37_0
  - ply=3.11=py37_0
  - prompt_toolkit=2.0.9=py37_0
  - protobuf=3.8.0=py37he6710b0_0
  - psutil=5.6.1=py37h7b6447c_0
  - psycopg2=2.7.6.1=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - py-xgboost=0.90=py37he6710b0_0
  - py-xgboost-cpu=0.90=py37_0
  - pyasn1=0.4.6=py_0
  - pycparser=2.19=py37_0
  - pygments=2.3.1=py37_0
  - pymongo=3.8.0=py37he6710b0_1
  - pynacl=1.3.0=py37h7b6447c_0
  - pyopenssl=19.0.0=py37_0
  - pyparsing=2.3.1=py37_0
  - pysocks=1.6.8=py37_0
  - python=3.7.3=h0371630_0
  - python-dateutil=2.8.0=py37_0
  - python-editor=1.0.4=py_0
  - pytorch-cpu=1.1.0=py3.7_cpu_0
  - pytz=2018.9=py37_0
  - pyyaml=5.1=py37h7b6447c_0
  - readline=7.0=h7b6447c_5
  - requests=2.21.0=py37_0
  - s3transfer=0.2.1=py37_0
  - scikit-learn=0.20.3=py37hd81dba3_0
  - scipy=1.2.1=py37h7c811a0_0
  - setuptools=40.8.0=py37_0
  - simplejson=3.16.0=py37h14c3975_0
  - singledispatch=3.4.0.3=py37_0
  - six=1.12.0=py37_0
  - smmap2=2.0.5=py37_0
  - sqlite=3.27.2=h7b6447c_0
  - sqlparse=0.3.0=py_0
  - statsmodels=0.9.0=py37h035aef0_0
  - tabulate=0.8.3=py37_0
  - tensorboard=1.13.1=py37hf484d3e_0
  - tensorflow=1.13.1=mkl_py37h54b294f_0
  - tensorflow-base=1.13.1=mkl_py37h7ce6ba3_0
  - tensorflow-estimator=1.13.0=py_0
  - tensorflow-mkl=1.13.1=h4fcabd2_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - torchvision-cpu=0.3.0=py37_cuNone_1
  - tqdm=4.31.1=py37_1
  - traitlets=4.3.2=py37_0
  - urllib3=1.24.1=py37_0
  - virtualenv=16.0.0=py37_0
  - wcwidth=0.1.7=py37_0
  - webencodings=0.5.1=py37_1
  - websocket-client=0.56.0=py37_0
  - werkzeug=0.14.1=py37_0
  - wheel=0.33.1=py37_0
  - wrapt=1.11.1=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - argparse==1.4.0
    - databricks-cli==0.9.0
    - docker==4.0.2
    - fusepy==2.0.4
    - gorilla==0.3.0
    - horovod==0.18.1
    - hyperopt==0.1.2.db8
    - matplotlib==3.0.3
    - mleap==0.8.1
    - mlflow==1.2.0
    - nose-exclude==0.5.0
    - pyarrow==0.13.0
    - querystring-parser==4
    - seaborn==0.9.0
    - tensorboardx==1.8
prefix: /databricks/conda/envs/databricks-ml
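Entries under dependencies in the environment listing above are conda pins of the form name=version=build, while entries under pip use name==version. A hypothetical helper, `parse_pin`, that splits either form; it may help when comparing this listing with the output of `conda env export` from another environment. It is a sketch for exact pins only and does not handle version ranges such as `>=`.

```python
def parse_pin(spec):
    """Split one pinned entry of the listing into (name, version, build).

    Conda entries use "name=version=build" (e.g. "numpy=1.16.2=py37h7e9f1db_0");
    pip entries use "name==version" and have no build string, so build is None.
    """
    entry = spec.strip()
    if entry.startswith("- "):
        entry = entry[2:].strip()  # drop the YAML list marker
    if "==" in entry:
        name, version = entry.split("==", 1)  # pip-style pin
        return name, version, None
    parts = entry.split("=")
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]  # conda pin with build string
    if len(parts) == 2:
        return parts[0], parts[1], None  # conda pin without build string
    raise ValueError("not a pinned spec: %r" % spec)
```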

Spark packages containing Python modules

Spark package          Python module   Version
graphframes            graphframes     0.7.0-db1-spark2.4
spark-deep-learning    sparkdl         1.5.0-db5-spark2.4
tensorframes           tensorframes    0.7.0-s_2.11

Java and Scala libraries (Scala 2.11 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 6.0, Databricks Runtime 6.0 ML contains the following JARs:

Group ID            Artifact ID                        Version
com.databricks      spark-deep-learning                1.5.0-db5-spark2.4
com.typesafe.akka   akka-actor_2.11                    2.3.11
ml.combust.mleap    mleap-databricks-runtime_2.11      0.14.0
ml.dmlc             xgboost4j                          0.90
ml.dmlc             xgboost4j-spark                    0.90
org.graphframes     graphframes_2.11                   0.7.0-db1-spark2.4
org.mlflow          mlflow-client                      1.2.0
org.tensorflow      libtensorflow                      1.13.1
org.tensorflow      libtensorflow_jni                  1.13.1
org.tensorflow      spark-tensorflow-connector_2.11    1.13.1
org.tensorflow      tensorflow                         1.13.1
org.tensorframes    tensorframes                       0.7.0-s_2.11