pyspark.pandas.Series.apply

Series.apply(func: Callable, args: Sequence[Any] = (), **kwds: Any) → pyspark.pandas.series.Series

Invoke function on values of Series.

Can be a Python function that only works on single values.

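Note

This API executes the function once to infer the return type, which can be very expensive, for instance when the dataset is created after aggregations or sorting.

To avoid this, specify the return type in the function, for instance as below: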

    >>> def square(x) -> np.int32:
    ...     return x ** 2
           

pandas-on-Spark uses the return type hint and does not try to infer the type.
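
The difference between reading a declared return type and inferring one can be sketched in plain Python. This is only an illustration of the idea, not pandas-on-Spark's implementation: the helpers `declared_return_type` and `resolve_return_type` are hypothetical names, and the built-in `int` stands in for NumPy types such as `np.int32` so the sketch needs nothing beyond the standard library.

```python
from typing import Any, Callable, Optional, Tuple, get_type_hints


def declared_return_type(func: Callable) -> Optional[type]:
    """Return the annotated return type of func, or None if there is none."""
    try:
        hints = get_type_hints(func)
    except TypeError:
        # Some callables (for example certain built-ins) cannot be inspected.
        return None
    return hints.get("return")


def resolve_return_type(func: Callable, sample: Any) -> Tuple[type, bool]:
    """Return (type, had_to_execute).

    With a return annotation, the type is read without running func.
    Without one, func is executed once on a sample value to infer the type.
    """
    declared = declared_return_type(func)
    if declared is not None:
        return declared, False
    return type(func(sample)), True


def square(x) -> int:
    return x ** 2


def square_untyped(x):
    return x ** 2


print(resolve_return_type(square, 3))            # annotation is used
print(resolve_return_type(square_untyped, 3.0))  # function is executed once
```

In this sketch the extra execution is cheap because the sample is a single value. The note above concerns the case where producing the data to run against is itself costly.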

Parameters

Returns

See Also

Examples

Create a series with typical summer temperatures for each city.

    >>> s = ps.Series([20, 21, 12],
    ...               index=['London', 'New York', 'Helsinki'])
    >>> s
    London      20
    New York    21
    Helsinki    12
    dtype: int64
          

Square the values by defining a function and passing it as an argument to apply().

    >>> def square(x) -> np.int64:
    ...     return x ** 2
    >>> s.apply(square)
    London      400
    New York    441
    Helsinki    144
    dtype: int64
          

Define a custom function that needs additional positional arguments and pass these additional arguments using the args keyword.

    >>> def subtract_custom_value(x, custom_value) -> np.int64:
    ...     return x - custom_value
          

    >>> s.apply(subtract_custom_value, args=(5,))
    London      15
    New York    16
    Helsinki     7
    dtype: int64
          

Define a custom function that takes keyword arguments and pass these arguments to apply.

    >>> def add_custom_values(x, **kwargs) -> np.int64:
    ...     for month in kwargs:
    ...         x += kwargs[month]
    ...     return x
          

    >>> s.apply(add_custom_values, june=30, july=20, august=25)
    London      95
    New York    96
    Helsinki    87
    dtype: int64
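
How the extra positional and keyword arguments reach the function can be sketched without Spark. The helper `apply_values` below is hypothetical and works on a plain dict rather than a Series. It assumes, as the examples above suggest, that each value is passed as the first positional argument, followed by the contents of `args` and then the keyword arguments.

```python
from typing import Any, Callable, Dict, Sequence


def apply_values(data: Dict[str, Any], func: Callable,
                 args: Sequence[Any] = (), **kwds: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for an element-wise apply over labeled values."""
    # Each value goes first; the extra positional and keyword arguments follow.
    return {label: func(value, *args, **kwds) for label, value in data.items()}


def subtract_custom_value(x, custom_value) -> int:
    return x - custom_value


def add_custom_values(x, **kwargs) -> int:
    for month in kwargs:
        x += kwargs[month]
    return x


temps = {"London": 20, "New York": 21, "Helsinki": 12}

subtracted = apply_values(temps, subtract_custom_value, args=(5,))
summed = apply_values(temps, add_custom_values, june=30, july=20, august=25)

print(subtracted)  # {'London': 15, 'New York': 16, 'Helsinki': 7}
print(summed)      # {'London': 95, 'New York': 96, 'Helsinki': 87}
```

Note that `args` must be a sequence, so a single extra argument is written as the one-element tuple `(5,)`.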
          

Use a function from the NumPy library.

    >>> def numpy_log(col) -> np.float64:
    ...     return np.log(col)
    >>> s.apply(numpy_log)
    London      2.995732
    New York    3.044522
    Helsinki    2.484907
    dtype: float64
          

You can omit the type hint and let pandas-on-Spark infer the type.

    >>> s.apply(np.log)
    London      2.995732
    New York    3.044522
    Helsinki    2.484907
    dtype: float64
          
