BlockMatrix

class pyspark.mllib.linalg.distributed.BlockMatrix(blocks: pyspark.rdd.RDD[Tuple[Tuple[int, int], pyspark.mllib.linalg.Matrix]], rowsPerBlock: int, colsPerBlock: int, numRows: int = 0, numCols: int = 0)

Represents a distributed matrix in blocks of local matrices.

Parameters

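blocks pyspark.RDD: An RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix. If multiple blocks with the same index exist, the results of operations such as add and multiply will be unpredictable.
rowsPerBlock int: Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
colsPerBlock int: Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.
numRows int, optional: Number of rows of this matrix. If the supplied value is less than or equal to zero, the number of rows will be calculated when numRows is invoked.
numCols int, optional: Number of columns of this matrix. If the supplied value is less than or equal to zero, the number of columns will be calculated when numCols is invoked.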
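
The block layout implied by rowsPerBlock and colsPerBlock can be illustrated with a small pure-Python sketch. It does not use PySpark; the helpers locate and block_shape are hypothetical and only model the arithmetic described above: global entry (i, j) falls in block (i // rowsPerBlock, j // colsPerBlock), and only blocks in the last block row or block column may be smaller than the nominal block size.

```python
# Hypothetical pure-Python model of the block grid (not PySpark code).

def locate(i, j, rows_per_block, cols_per_block):
    """Return ((blockRowIndex, blockColIndex), (localRow, localCol)) for entry (i, j)."""
    block_row, local_row = divmod(i, rows_per_block)
    block_col, local_col = divmod(j, cols_per_block)
    return (block_row, block_col), (local_row, local_col)


def block_shape(block_row, block_col, num_rows, num_cols,
                rows_per_block, cols_per_block):
    """Shape of block (block_row, block_col); edge blocks are truncated."""
    rows = min(rows_per_block, num_rows - block_row * rows_per_block)
    cols = min(cols_per_block, num_cols - block_col * cols_per_block)
    return rows, cols


# A 7 x 6 matrix split into 3 x 2 blocks: the last block row holds only 1 row.
print(locate(4, 3, 3, 2))               # ((1, 1), (1, 1))
print(block_shape(2, 0, 7, 6, 3, 2))    # (1, 2)
```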

Methods

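`add`(other)	Adds two block matrices together.
`cache`()	Caches the underlying RDD.
`multiply`(other)	Left multiplies this BlockMatrix by other, another BlockMatrix, computing this * other.
`numCols`()	Get or compute the number of columns.
`numRows`()	Get or compute the number of rows.
`persist`(storageLevel)	Persists the underlying RDD with the specified storage level.
`subtract`(other)	Subtracts the given block matrix other from this block matrix: this - other.
`toCoordinateMatrix`()	Convert this matrix to a CoordinateMatrix.
`toIndexedRowMatrix`()	Convert this matrix to an IndexedRowMatrix.
`toLocalMatrix`()	Collect the distributed matrix on the driver as a DenseMatrix.
`transpose`()	Transpose this BlockMatrix.
`validate`()	Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.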

Attributes

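`blocks`	The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix.
`colsPerBlock`	Number of columns that make up each block.
`numColBlocks`	Number of columns of blocks in the BlockMatrix.
`numRowBlocks`	Number of rows of blocks in the BlockMatrix.
`rowsPerBlock`	Number of rows that make up each block.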

Methods Documentation

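add(other: pyspark.mllib.linalg.distributed.BlockMatrix) → pyspark.mllib.linalg.distributed.BlockMatrix

Adds two block matrices together. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If one of the sub-matrix blocks being added is a SparseMatrix, the resulting sub-matrix block will also be a SparseMatrix, even if it is being added to a DenseMatrix. If two dense sub-matrix blocks are added, the output block will also be a DenseMatrix.

Examples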

    >>> # sc is an active SparkContext.
    >>> from pyspark.mllib.linalg import Matrices
    >>> from pyspark.mllib.linalg.distributed import BlockMatrix
    >>> dm1 = Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])
    >>> dm2 = Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12])
    >>> sm = Matrices.sparse(3, 2, [0, 1, 3], [0, 1, 2], [7, 11, 12])
    >>> blocks1 = sc.parallelize([((0, 0), dm1), ((1, 0), dm2)])
    >>> blocks2 = sc.parallelize([((0, 0), dm1), ((1, 0), dm2)])
    >>> blocks3 = sc.parallelize([((0, 0), sm), ((1, 0), dm2)])
    >>> mat1 = BlockMatrix(blocks1, 3, 2)
    >>> mat2 = BlockMatrix(blocks2, 3, 2)
    >>> mat3 = BlockMatrix(blocks3, 3, 2)

    >>> mat1.add(mat2).toLocalMatrix()
    DenseMatrix(6, 2, [2.0, 4.0, 6.0, 14.0, 16.0, 18.0, 8.0, 10.0, 12.0, 20.0, 22.0, 24.0], 0)

    >>> mat1.add(mat3).toLocalMatrix()
    DenseMatrix(6, 2, [8.0, 2.0, 3.0, 14.0, 16.0, 18.0, 4.0, 16.0, 18.0, 20.0, 22.0, 24.0], 0)
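
DenseMatrix values in these outputs are listed in column-major order, and Matrices.sparse takes compressed sparse column (CSC) arrays: column pointers, row indices, and values. The pure-Python sketch below (hypothetical helpers, not PySpark internals) expands sm into its dense column-major form and adds it to dm1, reproducing the first block of mat1.add(mat3); that block supplies the first three and the seventh through ninth values of the printed DenseMatrix.

```python
# Hypothetical pure-Python model of one block-wise addition (not PySpark code).

def csc_to_dense(num_rows, num_cols, col_ptrs, row_indices, values):
    """Expand CSC arrays into a column-major list of length num_rows * num_cols."""
    dense = [0.0] * (num_rows * num_cols)
    for j in range(num_cols):
        # Entries of column j occupy positions col_ptrs[j] .. col_ptrs[j + 1] - 1.
        for k in range(col_ptrs[j], col_ptrs[j + 1]):
            dense[j * num_rows + row_indices[k]] = float(values[k])
    return dense


def add_blocks(a, b):
    """Element-wise sum of two column-major blocks.

    Assumes both blocks have the same shape; only their lengths are checked here.
    """
    if len(a) != len(b):
        raise ValueError("blocks must have the same shape")
    return [float(x + y) for x, y in zip(a, b)]


dm1 = [1, 2, 3, 4, 5, 6]                                    # Matrices.dense(3, 2, ...)
sm = csc_to_dense(3, 2, [0, 1, 3], [0, 1, 2], [7, 11, 12])  # [[7, 0], [0, 11], [0, 12]]
print(sm)                   # [7.0, 0.0, 0.0, 0.0, 11.0, 12.0]
print(add_blocks(dm1, sm))  # [8.0, 2.0, 3.0, 4.0, 16.0, 18.0]
```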
            

cache() → pyspark.mllib.linalg.distributed.BlockMatrix: Caches the underlying RDD.

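multiply(other: pyspark.mllib.linalg.distributed.BlockMatrix) → pyspark.mllib.linalg.distributed.BlockMatrix

Left multiplies this BlockMatrix by other, another BlockMatrix, computing this * other. The colsPerBlock of this matrix must equal the rowsPerBlock of other. If other contains any SparseMatrix blocks, they will have to be converted to DenseMatrix blocks. The output BlockMatrix will consist only of DenseMatrix blocks. This may cause some performance issues until support for multiplying two sparse matrices is added.

Examples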

    >>> dm1 = Matrices.dense(2, 3, [1, 2, 3, 4, 5, 6])
    >>> dm2 = Matrices.dense(2, 3, [7, 8, 9, 10, 11, 12])
    >>> dm3 = Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])
    >>> dm4 = Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12])
    >>> sm = Matrices.sparse(3, 2, [0, 1, 3], [0, 1, 2], [7, 11, 12])
    >>> blocks1 = sc.parallelize([((0, 0), dm1), ((0, 1), dm2)])
    >>> blocks2 = sc.parallelize([((0, 0), dm3), ((1, 0), dm4)])
    >>> blocks3 = sc.parallelize([((0, 0), sm), ((1, 0), dm4)])
    >>> mat1 = BlockMatrix(blocks1, 2, 3)
    >>> mat2 = BlockMatrix(blocks2, 3, 2)
    >>> mat3 = BlockMatrix(blocks3, 3, 2)

    >>> mat1.multiply(mat2).toLocalMatrix()
    DenseMatrix(2, 2, [242.0, 272.0, 350.0, 398.0], 0)

    >>> mat1.multiply(mat3).toLocalMatrix()
    DenseMatrix(2, 2, [227.0, 258.0, 394.0, 450.0], 0)
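
Block multiplication follows the ordinary rule C(i, j) = sum over k of A(i, k) · B(k, j), applied to whole blocks, which is why the colsPerBlock of this matrix must match the rowsPerBlock of other. The pure-Python sketch below stores each block as a row-major nested list and reproduces the dense result of mat1.multiply(mat2) above. It is a hypothetical model of the arithmetic only and says nothing about how Spark partitions or shuffles the blocks.

```python
# Hypothetical pure-Python model of block matrix multiplication (not PySpark code).

def dense(num_rows, num_cols, values):
    """Build a row-major nested list from column-major values, like Matrices.dense."""
    return [[float(values[j * num_rows + i]) for j in range(num_cols)]
            for i in range(num_rows)]


def matmul(a, b):
    """Local product of two nested lists; inner dimensions must agree."""
    inner = len(b)
    if len(a[0]) != inner:
        raise ValueError("colsPerBlock of the left matrix must equal rowsPerBlock of the right")
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def add_nested(a, b):
    """Element-wise sum of two equally shaped nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_multiply(left, right):
    """C(i, j) = sum over k of A(i, k) * B(k, j), with blocks keyed by (row, col)."""
    out = {}
    for (i, k), a_block in left.items():
        for (k2, j), b_block in right.items():
            if k != k2:
                continue  # only blocks sharing the inner index k contribute
            product = matmul(a_block, b_block)
            out[(i, j)] = add_nested(out[(i, j)], product) if (i, j) in out else product
    return out


# The dense blocks from the multiply example above.
mat1 = {(0, 0): dense(2, 3, [1, 2, 3, 4, 5, 6]), (0, 1): dense(2, 3, [7, 8, 9, 10, 11, 12])}
mat2 = {(0, 0): dense(3, 2, [1, 2, 3, 4, 5, 6]), (1, 0): dense(3, 2, [7, 8, 9, 10, 11, 12])}
print(block_multiply(mat1, mat2))  # {(0, 0): [[242.0, 350.0], [272.0, 398.0]]}
```

Read column by column, [[242.0, 350.0], [272.0, 398.0]] is the column-major list [242.0, 272.0, 350.0, 398.0] printed by toLocalMatrix.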
            

numCols() → int

Get or compute the number of columns.

Examples

    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                          ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])

    >>> mat = BlockMatrix(blocks, 3, 2)
    >>> print(mat.numCols())
    2

    >>> mat = BlockMatrix(blocks, 3, 2, 7, 6)
    >>> print(mat.numCols())
    6
            

numRows() → int

Get or compute the number of rows.

Examples

    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                          ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])

    >>> mat = BlockMatrix(blocks, 3, 2)
    >>> print(mat.numRows())
    6

    >>> mat = BlockMatrix(blocks, 3, 2, 7, 6)
    >>> print(mat.numRows())
    7
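
When numRows or numCols is not supplied (or is at most zero), the values printed above are consistent with taking the farthest extent reached by any block: blockRowIndex * rowsPerBlock plus the block's own row count, and likewise for columns. The pure-Python sketch below assumes exactly that rule; effective_dims is a hypothetical helper, and the sketch does not model any check that a supplied size is large enough.

```python
# Hypothetical pure-Python model of dimension inference (not PySpark code).

def effective_dims(blocks, rows_per_block, cols_per_block, num_rows=0, num_cols=0):
    """Return (numRows, numCols); values <= 0 are inferred from the blocks.

    blocks maps (blockRowIndex, blockColIndex) -> (blockNumRows, blockNumCols).
    """
    # Assumed rule: each block reaches down to blockRowIndex * rowsPerBlock plus
    # its own row count, and across to blockColIndex * colsPerBlock plus its own
    # column count; the matrix size is the farthest such extent.
    rows = max(r * rows_per_block + m for (r, _), (m, _) in blocks.items())
    cols = max(c * cols_per_block + n for (_, c), (_, n) in blocks.items())
    return (num_rows if num_rows > 0 else rows,
            num_cols if num_cols > 0 else cols)


blocks = {(0, 0): (3, 2), (1, 0): (3, 2)}   # two stacked 3 x 2 blocks
print(effective_dims(blocks, 3, 2))          # (6, 2)
print(effective_dims(blocks, 3, 2, 7, 6))    # (7, 6)
```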
            

persist(storageLevel: pyspark.storagelevel.StorageLevel) → pyspark.mllib.linalg.distributed.BlockMatrix: Persists the underlying RDD with the specified storage level.

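subtract(other: pyspark.mllib.linalg.distributed.BlockMatrix) → pyspark.mllib.linalg.distributed.BlockMatrix

Subtracts the given block matrix other from this block matrix: this - other. The matrices must have the same size and matching rowsPerBlock and colsPerBlock values. If one of the sub-matrix blocks being subtracted is a SparseMatrix, the resulting sub-matrix block will also be a SparseMatrix, even if it is being subtracted from a DenseMatrix. If two dense sub-matrix blocks are subtracted, the output block will also be a DenseMatrix.

Examples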

    >>> dm1 = Matrices.dense(3, 2, [3, 1, 5, 4, 6, 2])
    >>> dm2 = Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12])
    >>> sm = Matrices.sparse(3, 2, [0, 1, 3], [0, 1, 2], [1, 2, 3])
    >>> blocks1 = sc.parallelize([((0, 0), dm1), ((1, 0), dm2)])
    >>> blocks2 = sc.parallelize([((0, 0), dm2), ((1, 0), dm1)])
    >>> blocks3 = sc.parallelize([((0, 0), sm), ((1, 0), dm2)])
    >>> mat1 = BlockMatrix(blocks1, 3, 2)
    >>> mat2 = BlockMatrix(blocks2, 3, 2)
    >>> mat3 = BlockMatrix(blocks3, 3, 2)

    >>> mat1.subtract(mat2).toLocalMatrix()
    DenseMatrix(6, 2, [-4.0, -7.0, -4.0, 4.0, 7.0, 4.0, -6.0, -5.0, -10.0, 6.0, 5.0, 10.0], 0)

    >>> mat2.subtract(mat3).toLocalMatrix()
    DenseMatrix(6, 2, [6.0, 8.0, 9.0, -4.0, -7.0, -4.0, 10.0, 9.0, 9.0, -6.0, -5.0, -10.0], 0)
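
subtract combines blocks that share the same (blockRowIndex, blockColIndex) key, after checking that both matrices have the same size and block sizes. The pure-Python sketch below models that with dictionaries of column-major value lists and reproduces both blocks of mat1.subtract(mat2). The layout tuple and the rule that a block missing on one side counts as all zeros are assumptions of this sketch, not documented PySpark behavior.

```python
# Hypothetical pure-Python model of block-wise subtraction (not PySpark code).

def subtract(left, right, left_layout, right_layout):
    """Return left - right for dicts of column-major blocks keyed by block index.

    *_layout is a hypothetical (numRows, numCols, rowsPerBlock, colsPerBlock)
    tuple; the two layouts must match exactly.
    """
    if left_layout != right_layout:
        raise ValueError("matrices must have the same size and block sizes")
    out = {}
    for key in set(left) | set(right):
        # Assumption of this sketch: a block missing on one side is all zeros.
        size = len(left.get(key) or right[key])
        a = left.get(key, [0.0] * size)
        b = right.get(key, [0.0] * size)
        out[key] = [float(x - y) for x, y in zip(a, b)]
    return out


dm1 = [3, 1, 5, 4, 6, 2]            # Matrices.dense(3, 2, [3, 1, 5, 4, 6, 2])
dm2 = [7, 8, 9, 10, 11, 12]
layout = (6, 2, 3, 2)
result = subtract({(0, 0): dm1, (1, 0): dm2}, {(0, 0): dm2, (1, 0): dm1}, layout, layout)
print(result[(0, 0)])   # [-4.0, -7.0, -4.0, -6.0, -5.0, -10.0]
print(result[(1, 0)])   # [4.0, 7.0, 4.0, 6.0, 5.0, 10.0]
```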
            

toCoordinateMatrix() → pyspark.mllib.linalg.distributed.CoordinateMatrix

Convert this matrix to a CoordinateMatrix.

Examples

    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(1, 2, [1, 2])),
    ...                          ((1, 0), Matrices.dense(1, 2, [7, 8]))])
    >>> mat = BlockMatrix(blocks, 1, 2).toCoordinateMatrix()
    >>> mat.entries.take(3)
    [MatrixEntry(0, 0, 1.0), MatrixEntry(0, 1, 2.0), MatrixEntry(1, 0, 7.0)]
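
Each entry of a block maps to a global MatrixEntry by offsetting its local position with blockRowIndex * rowsPerBlock and blockColIndex * colsPerBlock. The pure-Python sketch below (hypothetical to_entries helper, plain tuples instead of MatrixEntry objects) reproduces the three entries shown above. Skipping explicit zeros is an assumption of the sketch, and the entry order of a real RDD depends on its partitioning.

```python
# Hypothetical pure-Python model of block -> coordinate conversion (not PySpark code).

def to_entries(blocks, rows_per_block, cols_per_block):
    """Return a list of (i, j, value) triples with global indices, block by block.

    blocks is a list of ((blockRowIndex, blockColIndex), (blockNumRows, columnMajorValues)).
    """
    entries = []
    for (block_row, block_col), (num_rows, values) in blocks:
        row_start = block_row * rows_per_block
        col_start = block_col * cols_per_block
        num_cols = len(values) // num_rows
        # Walk the block in column-major order, the storage order of DenseMatrix.
        for j in range(num_cols):
            for i in range(num_rows):
                v = float(values[j * num_rows + i])
                if v != 0.0:   # assumption of this sketch: explicit zeros are skipped
                    entries.append((row_start + i, col_start + j, v))
    return entries


blocks = [((0, 0), (1, [1, 2])), ((1, 0), (1, [7, 8]))]
print(to_entries(blocks, 1, 2)[:3])   # [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 7.0)]
```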
            

toIndexedRowMatrix() → pyspark.mllib.linalg.distributed.IndexedRowMatrix

Convert this matrix to an IndexedRowMatrix.

Examples

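    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                          ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
    >>> mat = BlockMatrix(blocks, 3, 2).toIndexedRowMatrix()

    >>> # This BlockMatrix will have 6 effective rows, due to
    >>> # having two sub-matrix blocks stacked, each with 3 rows.
    >>> # The ensuing IndexedRowMatrix will also have 6 rows.
    >>> print(mat.numRows())
    6

    >>> # This BlockMatrix will have 2 effective columns, due to
    >>> # having two sub-matrix blocks stacked, each with 2 columns.
    >>> # The ensuing IndexedRowMatrix will also have 2 columns.
    >>> print(mat.numCols())
    2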
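
Conceptually, each IndexedRow gets the global index blockRowIndex * rowsPerBlock + localRow, and its vector concatenates that row across all blocks in the same block row. The pure-Python sketch below (hypothetical to_indexed_rows helper, a dictionary of lists instead of IndexedRow objects) rebuilds the 6 rows of this example.

```python
# Hypothetical pure-Python model of block -> indexed-row conversion (not PySpark code).

def to_indexed_rows(blocks, rows_per_block, cols_per_block, num_cols):
    """Return {rowIndex: row values} assembled from column-major blocks.

    blocks maps (blockRowIndex, blockColIndex) -> (blockNumRows, columnMajorValues).
    """
    rows = {}
    for (block_row, block_col), (m, values) in blocks.items():
        n = len(values) // m
        for i in range(m):
            # Global row index; the row vector is shared across block columns.
            row = rows.setdefault(block_row * rows_per_block + i, [0.0] * num_cols)
            for j in range(n):
                row[block_col * cols_per_block + j] = float(values[j * m + i])
    return rows


blocks = {(0, 0): (3, [1, 2, 3, 4, 5, 6]), (1, 0): (3, [7, 8, 9, 10, 11, 12])}
rows = to_indexed_rows(blocks, 3, 2, num_cols=2)
print(len(rows), rows[0], rows[5])   # 6 [1.0, 4.0] [9.0, 12.0]
```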
            

toLocalMatrix() → pyspark.mllib.linalg.Matrix

Collect the distributed matrix on the driver as a DenseMatrix.

Examples

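    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                          ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
    >>> mat = BlockMatrix(blocks, 3, 2).toLocalMatrix()

    >>> # This BlockMatrix will have 6 effective rows, due to
    >>> # having two sub-matrix blocks stacked, each with 3 rows.
    >>> # The ensuing DenseMatrix will also have 6 rows.
    >>> print(mat.numRows)
    6

    >>> # This BlockMatrix will have 2 effective columns, due to
    >>> # having two sub-matrix blocks stacked, each with 2
    >>> # columns. The ensuing DenseMatrix will also have 2 columns.
    >>> print(mat.numCols)
    2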
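
Assembling the blocks on the driver amounts to copying each block's column-major values into one column-major array of size numRows * numCols. The pure-Python sketch below (hypothetical to_local helper, not the PySpark implementation) yields the values of the 6 x 2 matrix in this example; doubling them gives exactly the mat1.add(mat2) output shown under add.

```python
# Hypothetical pure-Python model of collecting blocks into one local matrix.

def to_local(blocks, num_rows, num_cols, rows_per_block, cols_per_block):
    """Assemble blocks into one column-major value list (DenseMatrix layout).

    blocks maps (blockRowIndex, blockColIndex) -> (blockNumRows, columnMajorValues).
    """
    out = [0.0] * (num_rows * num_cols)
    for (block_row, block_col), (m, values) in blocks.items():
        n = len(values) // m
        for j in range(n):
            for i in range(m):
                gi = block_row * rows_per_block + i   # global row
                gj = block_col * cols_per_block + j   # global column
                out[gj * num_rows + gi] = float(values[j * m + i])
    return out


blocks = {(0, 0): (3, [1, 2, 3, 4, 5, 6]), (1, 0): (3, [7, 8, 9, 10, 11, 12])}
print(to_local(blocks, 6, 2, 3, 2))
# [1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 4.0, 5.0, 6.0, 10.0, 11.0, 12.0]
```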
            

transpose() → pyspark.mllib.linalg.distributed.BlockMatrix

Transpose this BlockMatrix. Returns a new BlockMatrix instance sharing the same underlying data. This is a lazy operation.

Examples

    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                          ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
    >>> mat = BlockMatrix(blocks, 3, 2)

    >>> mat_transposed = mat.transpose()
    >>> mat_transposed.toLocalMatrix()
    DenseMatrix(2, 6, [1.0, 4.0, 2.0, 5.0, 3.0, 6.0, 7.0, 10.0, 8.0, 11.0, 9.0, 12.0], 0)
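
Transposing a block matrix swaps every block's (blockRowIndex, blockColIndex) key and transposes each block. The pure-Python sketch below does this eagerly on column-major lists, unlike the lazy PySpark operation; transpose_block and transpose are hypothetical helpers. Placing the two resulting 2 x 3 blocks side by side gives the values of mat_transposed shown above.

```python
# Hypothetical pure-Python model of block matrix transposition (not PySpark code).

def transpose_block(m, n, values):
    """Transpose an m x n column-major block; the result is n x m, column-major."""
    # Column i of the result is row i of the original block.
    return [values[j * m + i] for i in range(m) for j in range(n)]


def transpose(blocks):
    """Swap each block's (row, col) key and transpose the block itself.

    blocks maps (blockRowIndex, blockColIndex) -> (blockNumRows, columnMajorValues).
    """
    out = {}
    for (r, c), (m, vals) in blocks.items():
        n = len(vals) // m
        out[(c, r)] = (n, transpose_block(m, n, vals))
    return out


blocks = {(0, 0): (3, [1, 2, 3, 4, 5, 6]), (1, 0): (3, [7, 8, 9, 10, 11, 12])}
t = transpose(blocks)
print(t)
# {(0, 0): (2, [1, 4, 2, 5, 3, 6]), (0, 1): (2, [7, 10, 8, 11, 9, 12])}
```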
            

validate() → None: Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.
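
The exact checks performed by validate are not listed here. The pure-Python sketch below assumes three plausible ones, each tied to a constraint stated elsewhere on this page: no duplicate block indices (see the blocks parameter), block indices inside the block grid, and interior blocks of exactly rowsPerBlock x colsPerBlock, with smaller blocks allowed only in the last block row or column. The validate function here is hypothetical and raises ValueError; it is not the PySpark method.

```python
# Hypothetical pure-Python model of block-layout validation (not PySpark code).
from collections import Counter


def validate(layout, num_rows, num_cols, rows_per_block, cols_per_block):
    """Raise ValueError if the block layout is inconsistent (assumed checks).

    layout is a list of ((blockRowIndex, blockColIndex), (blockNumRows, blockNumCols)).
    """
    n_row_blocks = (num_rows + rows_per_block - 1) // rows_per_block
    n_col_blocks = (num_cols + cols_per_block - 1) // cols_per_block
    # Assumed check 1: every block index appears at most once.
    for key, count in Counter(key for key, _ in layout).items():
        if count > 1:
            raise ValueError(f"duplicate block index {key}")
    for (r, c), (m, n) in layout:
        # Assumed check 2: the block index lies inside the block grid.
        if not (0 <= r < n_row_blocks and 0 <= c < n_col_blocks):
            raise ValueError(f"block index {(r, c)} is out of range")
        # Assumed check 3: interior blocks are full; last-row/column blocks may
        # be smaller, but never empty or larger than the nominal block size.
        if (r < n_row_blocks - 1 and m != rows_per_block) or not 0 < m <= rows_per_block:
            raise ValueError(f"block {(r, c)} has an invalid row count {m}")
        if (c < n_col_blocks - 1 and n != cols_per_block) or not 0 < n <= cols_per_block:
            raise ValueError(f"block {(r, c)} has an invalid column count {n}")


# The two stacked 3 x 2 blocks used throughout this page pass all checks.
validate([((0, 0), (3, 2)), ((1, 0), (3, 2))], 6, 2, 3, 2)
```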

Attributes Documentation

blocks

The RDD of sub-matrix blocks ((blockRowIndex, blockColIndex), sub-matrix) that form this distributed matrix.

Examples

    >>> mat = BlockMatrix(
    ...     sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                     ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))]), 3, 2)
    >>> blocks = mat.blocks
    >>> blocks.first()
    ((0, 0), DenseMatrix(3, 2, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 0))
            

colsPerBlock

Number of columns that make up each block.

Examples

    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                          ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
    >>> mat = BlockMatrix(blocks, 3, 2)
    >>> mat.colsPerBlock
    2
            

numColBlocks

Number of columns of blocks in the BlockMatrix.

Examples

    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                          ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
    >>> mat = BlockMatrix(blocks, 3, 2)
    >>> mat.numColBlocks
    1
            

numRowBlocks

Number of rows of blocks in the BlockMatrix.

Examples

    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                          ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
    >>> mat = BlockMatrix(blocks, 3, 2)
    >>> mat.numRowBlocks
    2
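
The numRowBlocks and numColBlocks values in these examples are consistent with ceiling division of the matrix size by the block size: 6 rows in blocks of 3 give 2 block rows, and 2 columns in blocks of 2 give 1 block column. The sketch below states that assumption as a hypothetical pure-Python helper that uses integer arithmetic, so it stays exact for large sizes.

```python
# Hypothetical helper (not PySpark code): blocks needed to cover a dimension.

def num_blocks(size, per_block):
    """Ceiling of size / per_block, computed with integers only."""
    return (size + per_block - 1) // per_block


# The 6 x 2 example matrix with rowsPerBlock=3 and colsPerBlock=2.
print(num_blocks(6, 3), num_blocks(2, 2))   # 2 1
# A 7-row matrix needs a third, partial block row.
print(num_blocks(7, 3))                     # 3
```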
            

rowsPerBlock

Number of rows that make up each block.

Examples

    >>> blocks = sc.parallelize([((0, 0), Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])),
    ...                          ((1, 0), Matrices.dense(3, 2, [7, 8, 9, 10, 11, 12]))])
    >>> mat = BlockMatrix(blocks, 3, 2)
    >>> mat.rowsPerBlock
    3
            
