Integrating the SuanShu Math Library
Overview
SuanShu is an open-source, object-oriented mathematics library for numerical analysis and statistics. The updated Deephaven extension of SuanShu is available at https://github.com/illumon-public/SuanShu.
Integrating SuanShu with Deephaven allows computations related to:
Numerical Analysis
- Numerical differentiation and integration
- Polynomials and root finding
- Unconstrained and constrained optimization for univariate and multivariate functions
- Linear algebra: matrix operations and factorizations
- Random samplings from various distributions
Statistics
- Descriptive statistics
- Ordinary Least Squares (OLS) linear regression
- Generalized Linear Model (GLM)
- Residual Analysis
- Stochastic Differential Equation (SDE) simulation
- Various hypothesis tests
For more information, please review the Javadocs for the Deephaven-Suanshu Integration and the Javadocs for the SuanShu Library.
Integration
Much of the SuanShu API can be called directly from a Deephaven query. For example:
import com.numericalmethod.suanshu.analysis.function.special.gaussian.Gaussian
db.importClass(com.numericalmethod.suanshu.analysis.function.special.gaussian.Gaussian)
t1 = db.t(...)
g = new Gaussian()
t2 = t1.update("Output = g.evaluate(X)")
However, vector and matrix operations require ssVec(...) and ssMat(...) to be called on Deephaven data to create SuanShu Vector and Matrix objects, which can then be passed as arguments to SuanShu methods.
These methods are required because the SuanShu library utilizes its own data types for processing matrices and vectors. The ssVec() and ssMat() methods wrap Deephaven arrays into types that are compatible with the SuanShu library.
The following example demonstrates the process noted above to multiply a matrix and a vector:
values = newTable(
    intCol("M1", 1, 4, 7),
    intCol("M2", 2, 5, 8),
    intCol("M3", 3, 6, 9),
    intCol("V", 2, 1, 3)
)
// Collapse each column into a single array-valued cell
arrays = values.by()
// Wrap the Deephaven arrays as SuanShu Matrix and Vector objects
structures2D = arrays.update(
    "Matrix = ssMat(M1, M2, M3)",
    "Vector = ssVec(V)"
)
product = structures2D.update("Product = Matrix.multiply(Vector)")
show(product)
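As a cross-check on the example above, the product can be reproduced in plain Java without SuanShu or Deephaven. This is only a minimal sketch: it assumes that ssMat(M1, M2, M3) treats each argument as one column of the matrix, so that the rows are (1, 2, 3), (4, 5, 6), and (7, 8, 9). The names MatVecSketch, fromColumns, and multiply are hypothetical and are not part of either library.

```java
import java.util.Arrays;

// Plain-Java cross-check of the matrix-vector example.
// MatVecSketch, fromColumns, and multiply are hypothetical names,
// not part of SuanShu or Deephaven.
final class MatVecSketch {

    // Builds a row-major matrix from column arrays.
    // Assumption: this mirrors how ssMat(M1, M2, M3) orients its arguments.
    static double[][] fromColumns(double[]... columns) {
        int rows = columns[0].length;
        double[][] matrix = new double[rows][columns.length];
        for (int col = 0; col < columns.length; col++) {
            for (int row = 0; row < rows; row++) {
                matrix[row][col] = columns[col][row];
            }
        }
        return matrix;
    }

    // Multiplies a row-major matrix by a vector.
    static double[] multiply(double[][] matrix, double[] vector) {
        double[] result = new double[matrix.length];
        for (int row = 0; row < matrix.length; row++) {
            if (matrix[row].length != vector.length) {
                throw new IllegalArgumentException("Column count must match vector length");
            }
            double sum = 0.0;
            for (int col = 0; col < vector.length; col++) {
                sum += matrix[row][col] * vector[col];
            }
            result[row] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] matrix = fromColumns(
                new double[] {1, 4, 7},   // M1
                new double[] {2, 5, 8},   // M2
                new double[] {3, 6, 9});  // M3
        double[] vector = {2, 1, 3};      // V
        System.out.println(Arrays.toString(multiply(matrix, vector)));
    }
}
```

Under the column-orientation assumption, the sketch prints [13.0, 31.0, 49.0]. If ssMat treated each argument as a row instead, the product would differ, so comparing this output with the Product column is a quick way to confirm the orientation.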
To keep output compact, the toString() method of Vector and Matrix instances created by the integration methods above (ssVec() and ssMat()) shows only the first 10 elements of a Vector and the first 3 rows and first 3 columns of a Matrix. To convert a Vector or Matrix instance into its complete String representation, use the show() method as shown below.
((SuanShuIntegration.AbstractVector)vector).show()
((SuanShuIntegration.AbstractMatrix)matrix).show()
Note: These show() methods are valid only for instances created by ssVec(), which are of type SuanShuIntegration.AbstractVector, and by ssMat(), which are of type SuanShuIntegration.AbstractMatrix. Calling them on other Vector or Matrix instances results in a compile-time error.
Concurrency
By default, SuanShu utilizes all available cores as needed. However, you can set a limit.
For example, the following limits SuanShu to at most five cores:
import com.numericalmethod.suanshu.parallel.ParallelExecutor
ParallelExecutor.setConcurrencyLevel(5)
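When the same script runs on machines with different core counts, a fixed limit may exceed what is actually available. The plain-Java sketch below shows one possible way to clamp a requested level to the number of processors the JVM reports. ConcurrencyLevelSketch and boundedLevel are hypothetical names, and the sketch does not call SuanShu itself; the resulting value would be passed to ParallelExecutor.setConcurrencyLevel(...) as in the example above.

```java
// Plain-Java sketch for choosing a concurrency limit.
// ConcurrencyLevelSketch and boundedLevel are hypothetical names,
// not part of SuanShu or Deephaven.
final class ConcurrencyLevelSketch {

    // Clamps the requested level to the range [1, available].
    static int boundedLevel(int requested, int available) {
        if (available < 1) {
            throw new IllegalArgumentException("available must be at least 1");
        }
        return Math.max(1, Math.min(requested, available));
    }

    public static void main(String[] args) {
        // Number of processors visible to this JVM
        int available = Runtime.getRuntime().availableProcessors();
        int level = boundedLevel(5, available);
        System.out.println("Concurrency level to request: " + level
                + " (of " + available + " available)");
        // In a Deephaven session, the value would then be applied with
        // ParallelExecutor.setConcurrencyLevel(level), which is not run here.
    }
}
```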
Examples
Evaluate a Gaussian Function
import com.numericalmethod.suanshu.analysis.function.special.gaussian.Gaussian
db.importClass(com.numericalmethod.suanshu.analysis.function.special.gaussian.Gaussian)
g = new Gaussian()
t = emptyTable(100).update("X = Math.random()", "Gaussian_X = g.evaluate(X)")
Beta Distribution
import com.numericalmethod.suanshu.stats.distribution.univariate.BetaDistribution
dist = new BetaDistribution(10.5, 99.5)
println dist.mean()
println dist.variance()
println dist.skew()
println dist.kurtosis()
println dist.cdf(0.4)
println dist.density(0.05)
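The mean and variance printed above can be sanity-checked against the closed-form moments of a Beta distribution: mean = α / (α + β) and variance = αβ / ((α + β)² (α + β + 1)). The sketch below is plain Java and does not call SuanShu. It assumes the two constructor arguments in the example are the shape parameters (alpha, beta) in that order, and BetaMomentsSketch is a hypothetical name.

```java
// Closed-form moments of a Beta(alpha, beta) distribution, as a cross-check.
// BetaMomentsSketch is a hypothetical name, not part of SuanShu or Deephaven.
final class BetaMomentsSketch {

    // mean = alpha / (alpha + beta)
    static double mean(double alpha, double beta) {
        requirePositive(alpha, beta);
        return alpha / (alpha + beta);
    }

    // variance = alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))
    static double variance(double alpha, double beta) {
        requirePositive(alpha, beta);
        double sum = alpha + beta;
        return (alpha * beta) / (sum * sum * (sum + 1.0));
    }

    private static void requirePositive(double alpha, double beta) {
        if (alpha <= 0.0 || beta <= 0.0) {
            throw new IllegalArgumentException("Shape parameters must be positive");
        }
    }

    public static void main(String[] args) {
        // Assumption: same parameters as new BetaDistribution(10.5, 99.5)
        double alpha = 10.5;
        double beta = 99.5;
        System.out.println("mean     = " + mean(alpha, beta));
        System.out.println("variance = " + variance(alpha, beta));
    }
}
```

If the values from dist.mean() and dist.variance() differ noticeably from these, the most likely cause is that the parameter order or parameterization differs from the assumption stated above.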
Shapiro-Wilk Test
db.importClass(com.numericalmethod.suanshu.stats.test.distribution.normality.ShapiroWilk)
t = newTable(doubleCol("Data", [-1.7, -1, -1, -0.73, -0.61, -0.5, -0.24, 0.45, 0.62, 0.81, 1, 5] as double[]))
tShapiro = t.by().update("ShapiroWilk = new ShapiroWilk(Data)", "Statistics = ShapiroWilk.statistics()", "pValue = ShapiroWilk.pValue()")
Linear Regression via OLS
import com.illumon.iris.db.tables.utils.TableTools
import com.numericalmethod.suanshu.stats.regression.linear.LMProblem
import com.numericalmethod.suanshu.stats.regression.linear.ols.OLSRegression
db.importClass(com.numericalmethod.suanshu.stats.regression.linear.ols.OLSRegression.class)
db.importClass(com.numericalmethod.suanshu.stats.regression.linear.LMProblem.class)
t = newTable(
    doubleCol("Y", [2.32, 0.452, 4.53, 12.34, 32.2] as double[]),
    doubleCol("X1", [1.52, 3.22, 4.32, 10.1034, 12.1] as double[]),
    doubleCol("X2", [2.23, 6.34, 12.2, 43.2, 2.12] as double[]),
    doubleCol("X3", [4.31, 3.46, 23.1, 22.3, 3.27] as double[]),
    doubleCol("W", [0.2, 0.4, 0.1, 0.3, 0.1] as double[]))
t = t.by().update(
    "XMat = ssMat(X1, X2, X3)",
    "YVec = ssVec(Y)",
    "WeightVec = ssVec(W)",
    "OLSRegression = new OLSRegression(new LMProblem(YVec, XMat, true, WeightVec))",
    "CoEff = OLSRegression.beta.betaHat",
    "Error = OLSRegression.beta.stderr")
Last Updated: 16 February 2021 18:07 -04:00 UTC Deephaven v.1.20200928
Deephaven Documentation Copyright 2016-2020 Deephaven Data Labs, LLC All Rights Reserved