[ML-177][Native Bayes] Fix error when converting Vector to CSRNumericTable #176

Merged on Feb 23, 2022 (9 commits)
25 changes: 13 additions & 12 deletions mllib-dal/src/main/scala/com/intel/oap/mllib/OneDAL.scala
@@ -236,8 +236,8 @@ object OneDAL {
     matrixLabel
   }

-  private def vectorsToSparseNumericTable(vectors: Array[Vector],
-                                          nFeatures: Long): CSRNumericTable = {
+  def vectorsToSparseNumericTable(vectors: Array[Vector],
+                                  nFeatures: Long): CSRNumericTable = {
     require(vectors(0).isInstanceOf[SparseVector], "vectors should be sparse")

     println(s"Features row x column: ${vectors.length} x ${vectors(0).size}")
@@ -250,10 +250,9 @@ object OneDAL {
     val columnIndices = Array.fill(ratingsNum) {
       0L
     }
-    val rowOffsets = ArrayBuffer[Long](1L)
+    val rowOffsets = ArrayBuffer[Long]()
xwu99 marked this conversation as resolved.

     var indexValues = 0
-    var curRow = 0L

     // Converted to one CSRNumericTable
     for (row <- 0 until vectors.length) {
@@ -263,20 +262,22 @@ object OneDAL {
         // one-based indexValues
         columnIndices(indexValues) = column + 1

-        if (row > curRow) {
-          curRow = row
-          // one-based indexValues
-          rowOffsets += indexValues + 1
-        }
-
         indexValues = indexValues + 1
       }
+      // one-based row indexValues
+      rowOffsets += indexValues + 1
     }
-    // one-based row indexValues
-    rowOffsets += indexValues + 1

     val contextLocal = new DaalContext()

+    // check CSR encoding
+    assert(values.length == ratingsNum,
+      "the length of values should be equal to the number of non-zero elements")
+    assert(columnIndices.length == ratingsNum,
+      "the length of columnIndices should be equal to the number of non-zero elements")
+    assert(rowOffsets.size == (csrRowNum + 1),
+      "the size of rowOffsets should be equal to the number of rows + 1")
+
     val cTable = OneDAL.cNewCSRNumericTableDouble(values, columnIndices, rowOffsets.toArray,
       nFeatures, csrRowNum)
     val table = new CSRNumericTable(contextLocal, cTable)
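
The hunks above change the point at which a row offset is appended. As a rough illustration of why that matters for empty rows, here is a minimal sketch of one-based CSR encoding in plain Scala. `Csr` and `CsrSketch.encode` are hypothetical names, not part of OneDAL.scala, and the leading offset of 1 is an assumption based on the usual one-based CSR convention rather than something visible in the lines shown here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical container for the three arrays of a one-based CSR table.
final case class Csr(values: Array[Double],
                     columnIndices: Array[Long],
                     rowOffsets: Array[Long])

object CsrSketch {
  // Each row is a list of (zeroBasedColumn, value) pairs.
  // An empty list stands for an all-zero row.
  def encode(rows: Seq[Seq[(Int, Double)]]): Csr = {
    val values = ArrayBuffer[Double]()
    val columnIndices = ArrayBuffer[Long]()
    // Assumption: one-based offsets, so the first row starts at position 1.
    val rowOffsets = ArrayBuffer[Long](1L)

    for (row <- rows) {
      for ((column, value) <- row) {
        values += value
        columnIndices += column + 1L // one-based column index
      }
      // Close the row unconditionally, so an empty row repeats the previous offset
      // instead of being skipped.
      rowOffsets += values.length + 1L
    }

    Csr(values.toArray, columnIndices.toArray, rowOffsets.toArray)
  }
}
```

For the six vectors used in the test added by this PR (two full rows, two empty rows, two partial rows), this sketch yields row offsets 1, 4, 7, 7, 7, 9, 11: seven entries for six rows, which is the shape the added `rowOffsets.size == (csrRowNum + 1)` assertion checks for.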
40 changes: 40 additions & 0 deletions mllib-dal/src/test/scala/org/apache/spark/ml/oneDALSuite.scala
@@ -0,0 +1,40 @@
+package org.apache.spark.ml
+
+import com.intel.oap.mllib.OneDAL
+import org.apache.spark.internal.Logging
+import org.apache.spark.ml.linalg.{Matrices, Vector, Vectors}
+import org.apache.spark.sql.Row
+
+class oneDALSuite extends FunctionsSuite with Logging {
+
+  import testImplicits._
+
+  test("test sparse vector to CSRNumericTable") {
+    val data = Seq(
+      Vectors.sparse(3, Seq((0, 1.0), (1, 2.0), (2, 3.0))),
+      Vectors.sparse(3, Seq((0, 10.0), (1, 20.0), (2, 30.0))),
+      Vectors.sparse(3, Seq.empty),
+      Vectors.sparse(3, Seq.empty),
+      Vectors.sparse(3, Seq((0, 1.0), (1, 2.0))),
+      Vectors.sparse(3, Seq((0, 10.0), (2, 20.0))),
+    )
+    val df = data.map(Tuple1.apply).toDF("features")
+    df.show()
+    val rowsRDD = df.rdd.map {
+      case Row(features: Vector) => features
+    }
+    val results = rowsRDD.coalesce(1).mapPartitions { it: Iterator[Vector] =>
+      val vectors: Array[Vector] = it.toArray
+      val numColumns = vectors(0).size
+      val CSRNumericTable = {
+        OneDAL.vectorsToSparseNumericTable(vectors, numColumns)
+      }
+      Iterator(CSRNumericTable.getCNumericTable)
+    }.collect()
+    val csr = OneDAL.makeNumericTable(results(0))
+    val resultMatrix = OneDAL.numericTableToMatrix(csr)
+    val matrix = Matrices.fromVectors(data)
+
+    assert((resultMatrix.toArray sameElements matrix.toArray) === true)
+  }
+}
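
The test compares a dense matrix built from the input vectors with one read back from the CSR table. A plain-Scala sketch of that read-back step, without Spark or oneDAL, might look like the following. `CsrDecodeSketch.toDense` is a hypothetical helper, and the arrays are assumed to use one-based column indices and row offsets.

```scala
object CsrDecodeSketch {
  // Expand one-based CSR arrays into dense rows of width numCols.
  def toDense(values: Array[Double],
              columnIndices: Array[Long],
              rowOffsets: Array[Long],
              numCols: Int): Array[Array[Double]] = {
    // rowOffsets holds one entry per row plus a closing entry.
    val numRows = rowOffsets.length - 1
    Array.tabulate(numRows) { row =>
      val dense = Array.fill(numCols)(0.0)
      // Convert the one-based range [start, end) to zero-based positions.
      val start = (rowOffsets(row) - 1).toInt
      val end = (rowOffsets(row + 1) - 1).toInt
      for (k <- start until end) {
        dense((columnIndices(k) - 1).toInt) = values(k)
      }
      dense
    }
  }
}
```

An empty row shows up as two equal consecutive offsets, so its range is empty and the row stays all zeros.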
3 changes: 2 additions & 1 deletion mllib-dal/test.sh
@@ -53,7 +53,8 @@ suiteArray=(
   "classification.MLlibNaiveBayesSuite" \
   "regression.MLlibLinearRegressionSuite" \
   "stat.MLlibCorrelationSuite" \
-  "stat.MultivariateOnlineSummarizerSuite"
+  "stat.MultivariateOnlineSummarizerSuite" \
+  "oneDALSuite"
Collaborator:

Could you put oneDALSuite in the com.intel.oap.mllib namespace to align with OneDAL.scala?

Collaborator Author:

If we put oneDALSuite in com.intel.oap.mllib, we can't use Matrices.fromVectors to convert the Vectors to a Matrix, because fromVectors is private[ml]. In my opinion, we could split the tests more finely into ML and MLlib, as in the following image.

[image]

Collaborator:

I agree, thanks!
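
The visibility constraint discussed in this thread can be sketched with Scala's qualified `private`. The objects below are hypothetical stand-ins (an object named `ml` plays the role of a package so the example fits in one file); only the `private[ml]` modifier itself mirrors what the thread describes.

```scala
// Hypothetical stand-in for a package named `ml`; all names are illustrative.
object ml {
  object Matrices {
    // Visible anywhere inside `ml`, hidden from code outside it.
    private[ml] def fromRows(rows: Seq[Seq[Double]]): Array[Double] =
      rows.flatten.toArray
  }

  object SuiteInsideMl {
    // Compiles because this object is nested inside `ml`.
    def flatten(rows: Seq[Seq[Double]]): Array[Double] = Matrices.fromRows(rows)
  }
}

object SuiteOutsideMl {
  // ml.Matrices.fromRows(Seq(Seq(1.0))) // would not compile: fromRows is private[ml]
  def flatten(rows: Seq[Seq[Double]]): Array[Double] = ml.SuiteInsideMl.flatten(rows)
}
```

This is why a suite that needs a `private[ml]` helper has to live under the matching package, which is the trade-off the author describes.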

 )

 MVN_NO_TRANSFER_PROGRESS=