deephaven.java_to_python
Utilities for converting Deephaven java objects to appropriate python objects.
-
columnToNumpyArray(table, columnName, convertNulls=0, forPandas=False)
Produce a copy of the specified column as a numpy.ndarray.
- Parameters
table – the Table object
columnName – the name of the desired column
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)
forPandas – boolean indicating whether the output will be fed into a pandas.Series (i.e. must be one-dimensional)
- Returns
numpy.ndarray object which reproduces the given column of table (as faithfully as possible)
Note that the entire column is cloned into memory, so consider the total number of entries in the column before doing this. For large tables (millions of entries or more), consider down-selecting rows using the Deephaven query language before converting.
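As a rough illustration of why the row count matters, the numpy copy of a fixed-width column needs the number of rows times the item size of the target dtype. The estimated_bytes helper below is hypothetical; it ignores the Deephaven-side data and the per-object overhead of numpy.object columns, so treat the result as a lower bound.

```python
import numpy as np


def estimated_bytes(num_rows, dtype):
    """Hypothetical helper: lower-bound size of the numpy copy of one fixed-width column."""
    return num_rows * np.dtype(dtype).itemsize


# A long column maps to numpy.int64 (8 bytes per entry).
print(estimated_bytes(10_000_000, np.int64))  # 80000000, i.e. about 80 MB
# A String column whose longest entry has 32 characters maps to 'U32' (4 bytes per character).
print(estimated_bytes(10_000_000, "U32"))  # 1280000000, i.e. about 1.28 GB
```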
Warning
The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
The value for convertNulls applies only to java integer types (byte, short, int, long) and java.lang.Boolean array types:
NULL_CONVERSION.ERROR (=0)
[default] inspect for the presence of null values, and raise an exception if one is encountered.
NULL_CONVERSION.PASS (=1)
do not inspect for the presence of null values, and pass values straight through without interpretation (Boolean null -> False). This is intended for the fastest possible conversion. No warning is generated if null values are present, since no inspection is performed.
NULL_CONVERSION.CONVERT (=2)
inspect for the presence of null values, and return the closest analogous numpy alternative (motivated by pandas behavior):
integer type columns with null values: the numpy.ndarray will have a floating-point type, and null values will be replaced with NaN
Boolean type columns with null values: the numpy.ndarray will have numpy.object type, and null values will be None.
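The three modes can be sketched in plain numpy. This is not the module's implementation: the NullConversion enum and the convert_int_nulls / convert_boolean_nulls helpers are hypothetical, and the sketch assumes that Deephaven marks a null java integer with the minimum value of its type (for example NULL_INT == Integer.MIN_VALUE) and that a java Boolean array arrives as a list of True, False and None.

```python
from enum import IntEnum

import numpy as np


class NullConversion(IntEnum):
    """Hypothetical stand-in mirroring the documented NULL_CONVERSION values."""
    ERROR = 0
    PASS = 1
    CONVERT = 2


# Floating-point type used by CONVERT for each java integer type (per the type mapping).
CONVERT_FLOAT = {
    np.dtype(np.int8): np.float32,   # byte
    np.dtype(np.int16): np.float32,  # short
    np.dtype(np.int32): np.float64,  # int
    np.dtype(np.int64): np.float64,  # long
}


def normalize_convert_nulls(value):
    """Accept a string name ('ERROR'), an enum member, or an integer (2)."""
    if isinstance(value, str):
        return NullConversion[value.upper()]
    return NullConversion(value)


def convert_int_nulls(values, convert_nulls=0):
    """Apply ERROR / PASS / CONVERT to an integer array (assumed null = type minimum)."""
    mode = normalize_convert_nulls(convert_nulls)
    if mode == NullConversion.PASS:
        return values  # no inspection: null sentinels pass straight through
    is_null = values == np.iinfo(values.dtype).min
    if not is_null.any():
        return values
    if mode == NullConversion.ERROR:
        raise ValueError("null value(s) encountered")
    out = values.astype(CONVERT_FLOAT[values.dtype])
    out[is_null] = np.nan
    return out


def convert_boolean_nulls(values, convert_nulls=0):
    """Apply ERROR / PASS / CONVERT to a list of True / False / None."""
    mode = normalize_convert_nulls(convert_nulls)
    if mode == NullConversion.PASS:
        return np.array([bool(v) for v in values], dtype=np.bool_)  # None -> False
    if not any(v is None for v in values):
        return np.array(values, dtype=np.bool_)
    if mode == NullConversion.ERROR:
        raise ValueError("null value(s) encountered")
    return np.array(values, dtype=object)  # nulls stay as None


ints = np.array([7, np.iinfo(np.int32).min, 9], dtype=np.int32)
print(convert_int_nulls(ints, "CONVERT").tolist())  # [7.0, nan, 9.0]
print(convert_boolean_nulls([True, None], NullConversion.PASS).tolist())  # [True, False]
```

Note that only ERROR and CONVERT pay for the null scan; PASS returns the input untouched, which is what makes it the fastest option.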
Type mapping will be performed as indicated here:
byte -> numpy.int8, or numpy.float32 if necessary for null conversion
short -> numpy.int16, or numpy.float32 if necessary for null conversion
int -> numpy.int32, or numpy.float64 if necessary for null conversion
long -> numpy.int64, or numpy.float64 if necessary for null conversion
Boolean -> numpy.bool, or numpy.object if necessary for null conversion
float -> numpy.float32 and NULL_FLOAT -> numpy.nan
double -> numpy.float64 and NULL_DOUBLE -> numpy.nan
DBDateTime -> numpy.dtype('datetime64[ns]') and null -> numpy.datetime64('NaT')
String -> numpy.unicode_ (of appropriate length) and null -> ''
char -> numpy.dtype('U1') (one-character string) and NULL_CHAR -> ''
array/DbArray:
if forPandas=False and all entries are of compatible shape, returns a rectangular numpy.ndarray with dtype in keeping with the above
if forPandas=True or the entries are not all of compatible shape, returns a one-dimensional numpy.ndarray with dtype numpy.object, with each entry a numpy.ndarray with type mapping in keeping with the above
Anything else should present as a one-dimensional array of type numpy.object with entries uninterpreted except by the jpy JNI layer.
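The array/DbArray rule can be sketched with nested Python lists standing in for a column of java arrays. The convert_nested helper is hypothetical, and its shape test (identical numpy shapes) is an assumption about what "compatible shape" means.

```python
import numpy as np


def convert_nested(rows, for_pandas=False):
    """Hypothetical helper sketching the array/DbArray rule.

    rows: a list of per-row sequences, standing in for a column of java arrays.
    """
    entries = [np.asarray(row) for row in rows]
    shapes = {entry.shape for entry in entries}
    if not for_pandas and len(shapes) == 1:
        # Every entry has the same shape: stack them into one rectangular array.
        return np.stack(entries)
    # forPandas=True, or ragged entries: a 1-D object array holding one ndarray per row.
    out = np.empty(len(entries), dtype=object)
    for i, entry in enumerate(entries):
        out[i] = entry
    return out


print(convert_nested([[1, 2], [3, 4]]).shape)  # (2, 2)
print(convert_nested([[1, 2], [3]]).shape)  # (2,)
print(convert_nested([[1, 2], [3, 4]], for_pandas=True).shape)  # (2,)
```

The object array is filled element by element on purpose: numpy.array(entries, dtype=object) would itself build a two-dimensional array when all entries have the same length, which is exactly what the one-dimensional pandas case must avoid.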
Note
The numpy unicode type uses 32-bit characters (there is no 16-bit option), and is implemented as a character array of fixed-length entries, padded as necessary by the null character (i.e. the character of integer value 0). Every entry in the array uses as many characters as the longest entry, and numpy automatically trims trailing null characters when an entry is fetched. This requires much more memory in python than the corresponding java String array, because it doubles the bit depth and pads all strings to the length of the longest. If an original java String has any trailing null (zero-value) characters, they are dropped in python usage. For char arrays, we cannot differentiate between entries whose original value (in java) was 0 or NULL_CHAR.
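The padding and trimming described in this note can be checked directly in numpy. In this sketch the character '\x00' stands in for a java char or trailing String character with value 0; it uses only standard numpy behavior, not the conversion module itself.

```python
import numpy as np

names = np.array(["a", "abcde", "xy"])
# The dtype is sized to the longest entry (5 characters) ...
assert names.dtype == np.dtype("U5")
# ... and each character takes 4 bytes, so every entry uses 20 bytes, even "a".
assert names.itemsize == 5 * 4
assert names.nbytes == 3 * 20

# Trailing null characters are dropped when an entry is fetched.
padded = np.array(["ab\x00"])
assert padded[0] == "ab"

# In a one-character ('U1') array, a zero character and an empty string look the same.
chars = np.array(["\x00", ""], dtype="U1")
assert chars[0] == chars[1] == ""
```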
-
columnToSeries(table, columnName, convertNulls=0)
Produce a copy of the specified column as a pandas.Series object.
- Parameters
table – the Table object
columnName – the name of the desired column
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)
- Returns
pandas.Series object which reproduces the given column of table as faithfully as possible
Performance for numpy.ndarray objects is generally much better than for pandas.Series objects. Consider using the columnToNumpyArray() method unless you really need a pandas.Series object.
Note that the entire column is cloned into memory, so consider the total number of entries in the column before doing this. For large tables (millions of entries or more), consider down-selecting rows using the Deephaven query language before converting.
Warning
The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
-
convertJavaArray(javaArray, convertNulls='ERROR', forPandas=False)
Converts a java array to its closest numpy.ndarray alternative.
- Parameters
javaArray – input java array or DbArray object
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)
forPandas – boolean indicating whether the output will be fed into a pandas.Series, which requires that the underlying data is one-dimensional
- Returns
numpy.ndarray representing as faithful a copy of the java array as possible
The value for convertNulls applies only to java integer types (byte, short, int, long) and java.lang.Boolean array types:
NULL_CONVERSION.ERROR (=0)
[default] inspect for the presence of null values, and raise an exception if one is encountered.
NULL_CONVERSION.PASS (=1)
do not inspect for the presence of null values, and pass values straight through without interpretation (Boolean null -> False). This is intended for the fastest possible conversion. No warning is generated if null values are present, since no inspection is performed.
NULL_CONVERSION.CONVERT (=2)
inspect for the presence of null values, and return the closest analogous numpy alternative (motivated by pandas behavior):
integer type columns with null values: the numpy.ndarray will have a floating-point type, and null values will be replaced with NaN
Boolean type columns with null values: the numpy.ndarray will have numpy.object type, and null values will be None.
Type mapping will be performed as indicated here:
byte -> numpy.int8, or numpy.float32 if necessary for null conversion
short -> numpy.int16, or numpy.float32 if necessary for null conversion
int -> numpy.int32, or numpy.float64 if necessary for null conversion
long -> numpy.int64, or numpy.float64 if necessary for null conversion
Boolean -> numpy.bool, or numpy.object if necessary for null conversion
float -> numpy.float32 and NULL_FLOAT -> numpy.nan
double -> numpy.float64 and NULL_DOUBLE -> numpy.nan
DBDateTime -> numpy.dtype('datetime64[ns]') and null -> numpy.datetime64('NaT')
String -> numpy.unicode_ (of appropriate length) and null -> ''
char -> numpy.dtype('U1') (one-character string) and NULL_CHAR -> ''
array/DbArray:
if forPandas=False and all entries are of compatible shape, returns a rectangular numpy.ndarray with dtype in keeping with the above
if forPandas=True or the entries are not all of compatible shape, returns a one-dimensional numpy.ndarray with dtype numpy.object, with each entry a numpy.ndarray with type mapping in keeping with the above
Anything else should present as a one-dimensional array of type numpy.object with entries uninterpreted except by the jpy JNI layer.
Note
The numpy unicode type uses 32-bit characters (there is no 16-bit option), and is implemented as a character array of fixed-length entries, padded as necessary by the null character (i.e. the character of integer value 0). Every entry in the array uses as many characters as the longest entry, and numpy automatically trims trailing null characters when an entry is fetched. This requires much more memory in python than the corresponding java String array, because it doubles the bit depth and pads all strings to the length of the longest. If an original java String has any trailing null (zero-value) characters, they are dropped in python usage. For char arrays, we cannot differentiate between entries whose original value (in java) was 0 or NULL_CHAR.
-
createCategoricalSeries(table, columnName, convertNulls=0)
Produce a copy of the specified column as a pandas.Series object containing categorical data.
- Parameters
table – the Table object
columnName – the name of the desired column
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)
- Returns
pandas.Series object which reproduces the given column of table (as faithfully as possible)
Warning
The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
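For context, a pandas categorical series stores each distinct value once (the categories) plus a small integer code per row. The same decomposition can be sketched with numpy.unique; this only illustrates the encoding and is not how createCategoricalSeries is implemented.

```python
import numpy as np

values = np.array(["AAPL", "MSFT", "AAPL", "GOOG", "AAPL"])
# Sorted distinct values, plus the index of each row's value within them.
categories, codes = np.unique(values, return_inverse=True)
print(categories.tolist())  # ['AAPL', 'GOOG', 'MSFT']
print(codes.tolist())  # [0, 2, 0, 1, 0]
# Indexing the categories with the codes reproduces the original column.
assert (categories[codes] == values).all()
```

With pandas installed, pandas.Categorical.from_codes(codes, categories) builds the corresponding categorical from these two arrays.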
-
freezeTable(table)
Helper method for freezing a table.
- Parameters
table – the deephaven table
- Returns
the frozen table
-
tableToDataFrame(table, convertNulls=0, categoricals=None)
Produces a copy of a table object as a pandas.DataFrame.
- Parameters
table – the Table object
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values.
categoricals – None, a column name, or a list of column names to convert to categorical data series
- Returns
pandas.DataFrame object which reproduces table as faithfully as possible
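The categoricals argument accepts three shapes. A minimal sketch of how such an argument can be normalized, using a hypothetical split_columns helper; it does not reflect how tableToDataFrame validates names (for example, whether an unknown column name raises).

```python
def split_columns(column_names, categoricals=None):
    """Hypothetical helper: pair each column name with whether it should become categorical."""
    if categoricals is None:
        wanted = set()
    elif isinstance(categoricals, str):
        wanted = {categoricals}  # a single column name
    else:
        wanted = set(categoricals)  # a list (or other iterable) of column names
    return [(name, name in wanted) for name in column_names]


print(split_columns(["Sym", "Price"], "Sym"))  # [('Sym', True), ('Price', False)]
```

The explicit string check matters: set("Sym") would produce the set of characters {'S', 'y', 'm'} rather than a one-element set of column names.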
Note that the entire table is cloned into memory, so consider the total number of entries in the table before doing this. For large tables (millions of entries or more), consider dropping unnecessary columns and/or down-selecting rows using the Deephaven query language before converting.
Warning
The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
The value for convertNulls applies only to java integer types (byte, short, int, long) and java.lang.Boolean array types:
NULL_CONVERSION.ERROR (=0)
[default] inspect for the presence of null values, and raise an exception if one is encountered.
NULL_CONVERSION.PASS (=1)
do not inspect for the presence of null values, and pass values straight through without interpretation (Boolean null -> False). This is intended for the fastest possible conversion. No warning is generated if null values are present, since no inspection is performed.
NULL_CONVERSION.CONVERT (=2)
inspect for the presence of null values, and return the closest analogous numpy alternative (motivated by pandas behavior):
integer type columns with null values: the numpy.ndarray will have a floating-point type, and null values will be replaced with NaN
Boolean type columns with null values: the numpy.ndarray will have numpy.object type, and null values will be None.
Conversion for different data types will be performed as indicated here:
byte -> numpy.int8, or numpy.float32 if necessary for null conversion
short -> numpy.int16, or numpy.float32 if necessary for null conversion
int -> numpy.int32, or numpy.float64 if necessary for null conversion
long -> numpy.int64, or numpy.float64 if necessary for null conversion
Boolean -> numpy.bool, or numpy.object if necessary for null conversion
float -> numpy.float32 and NULL_FLOAT -> numpy.nan
double -> numpy.float64 and NULL_DOUBLE -> numpy.nan
DBDateTime -> numpy.dtype('datetime64[ns]') and null -> numpy.datetime64('NaT')
String -> numpy.unicode_ (of appropriate length) and null -> ''
char -> numpy.dtype('U1') (one-character string) and NULL_CHAR -> ''
array/DbArray:
if forPandas=False and all entries are of compatible shape, returns a rectangular numpy.ndarray with dtype in keeping with the above
if forPandas=True or the entries are not all of compatible shape, returns a one-dimensional numpy.ndarray with dtype numpy.object, with each entry a numpy.ndarray with type mapping in keeping with the above
Anything else should present as a one-dimensional array of type numpy.object with entries uninterpreted except by the jpy JNI layer.
Note
The numpy unicode type uses 32-bit characters (there is no 16-bit option), and is implemented as a character array of fixed-length entries, padded as necessary by the null character (i.e. the character of integer value 0). Every entry in the array uses as many characters as the longest entry, and numpy automatically trims trailing null characters when an entry is fetched. This requires much more memory in python than the corresponding java String array, because it doubles the bit depth and pads all strings to the length of the longest. If an original java String has any trailing null (zero-value) characters, they are dropped in python usage. For char arrays, we cannot differentiate between entries whose original value (in java) was 0 or NULL_CHAR.