deephaven.java_to_python
Utilities for converting Deephaven java objects to appropriate python objects.
-
columnToNumpyArray(table, columnName, convertNulls=0, forPandas=False)
Produce a copy of the specified column as a numpy.ndarray.
- Parameters
table – the Table object
columnName – the name of the desired column
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)
forPandas – boolean for whether the output will be fed into a pandas.Series (i.e. must be 1-dimensional)
- Returns
numpy.ndarray object which reproduces the given column of table (as faithfully as possible)
Note that the entire column is copied into memory, so consider the total number of entries in the column before converting. For large tables (for example, millions of entries or more), consider measures such as down-selecting rows using the Deephaven query language before converting.
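As a rough sizing aid, the sketch below estimates how many bytes the copied column will occupy, using the fixed-width numpy types from the type mapping for this function. The helper `estimate_column_bytes` and its width table are hypothetical, not part of deephaven.java_to_python; Boolean, String, char, and array columns are left out because their size depends on the data or on object overhead.

```python
# Hypothetical sizing helper (not part of deephaven.java_to_python).
# Widths follow the numpy dtypes in the documented type mapping.
ITEM_BYTES = {
    # java type: (bytes without null conversion, bytes with null conversion)
    "byte": (1, 4),        # numpy.int8 or numpy.float32
    "short": (2, 4),       # numpy.int16 or numpy.float32
    "int": (4, 8),         # numpy.int32 or numpy.float64
    "long": (8, 8),        # numpy.int64 or numpy.float64
    "float": (4, 4),       # numpy.float32
    "double": (8, 8),      # numpy.float64
    "DBDateTime": (8, 8),  # datetime64[ns]
}


def estimate_column_bytes(java_type, n_rows, null_converted=False):
    """Estimate the memory used by the numpy copy of a fixed-width column."""
    plain, converted = ITEM_BYTES[java_type]
    return n_rows * (converted if null_converted else plain)


# A 10-million-row int column: about 40 MB as int32, about 80 MB once nulls force float64.
print(estimate_column_bytes("int", 10_000_000))
print(estimate_column_bytes("int", 10_000_000, null_converted=True))
```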
Warning
The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
The value for convertNulls only applies to java integer types (byte, short, int, long) and java.lang.Boolean array types:
- NULL_CONVERSION.ERROR (=0) [default]: inspect for the presence of null values, and raise an exception if one is encountered.
- NULL_CONVERSION.PASS (=1): do not inspect for the presence of null values, and pass values straight through without interpretation (Boolean null -> False). This is intended for conversion that is as fast as possible. No warning is generated if null values are present, since no inspection is performed.
- NULL_CONVERSION.CONVERT (=2): inspect for the presence of null values, and return the closest analogous numpy alternative (motivated by pandas behavior):
  - for integer type columns with null values, the numpy.ndarray will have a floating-point type and null values will be replaced with NaN
  - for Boolean type columns with null values, the numpy.ndarray will have numpy.object type and null values will be None.
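The three modes can be illustrated with a small pure-Python sketch on a list of java int values, with python lists standing in for numpy arrays. It assumes Deephaven marks a null int with the sentinel Integer.MIN_VALUE (-2147483648); the `NullConversion` enum and `convert_int_column` helper are hypothetical stand-ins, not the module's own implementation.

```python
import math
from enum import IntEnum

# Assumption: Deephaven's null sentinel for java int is Integer.MIN_VALUE.
NULL_INT = -2**31


class NullConversion(IntEnum):
    """Hypothetical mirror of the NULL_CONVERSION enum members."""
    ERROR = 0
    PASS = 1
    CONVERT = 2


def convert_int_column(values, mode=NullConversion.ERROR):
    """Apply the ERROR / PASS / CONVERT rules to a list of java int values."""
    if mode == NullConversion.PASS:
        # No inspection: the sentinel passes through as an ordinary integer.
        return list(values)
    has_nulls = any(v == NULL_INT for v in values)
    if mode == NullConversion.ERROR:
        if has_nulls:
            raise ValueError("column contains null values")
        return list(values)
    # CONVERT: promote to floating point only when a null is actually present.
    if not has_nulls:
        return list(values)
    return [math.nan if v == NULL_INT else float(v) for v in values]


column = [1, NULL_INT, 3]
print(convert_int_column(column, NullConversion.PASS))     # [1, -2147483648, 3]
print(convert_int_column(column, NullConversion.CONVERT))  # [1.0, nan, 3.0]
```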
Type mapping will be performed as indicated here:
- byte -> numpy.int8, or numpy.float32 if necessary for null conversion
- short -> numpy.int16, or numpy.float32 if necessary for null conversion
- int -> numpy.int32, or numpy.float64 if necessary for null conversion
- long -> numpy.int64, or numpy.float64 if necessary for null conversion
- Boolean -> numpy.bool, or numpy.object if necessary for null conversion
- float -> numpy.float32, and NULL_FLOAT -> numpy.nan
- double -> numpy.float64, and NULL_DOUBLE -> numpy.nan
- DBDateTime -> numpy.dtype('datetime64[ns]'), and null -> numpy.datetime64('NaT')
- String -> numpy.unicode_ (of appropriate length), and null -> ''
- char -> numpy.dtype('U1') (one-character string), and NULL_CHAR -> ''
- array/DbArray:
  - if forPandas=False and all entries are of compatible shape, returns a rectangular numpy.ndarray with dtype in keeping with the above
  - if forPandas=True or the entries are not all of compatible shape, returns a one-dimensional numpy.ndarray with dtype numpy.object, with each entry a numpy.ndarray whose type mapping is in keeping with the above
Anything else should present as a one-dimensional array of type numpy.object, with entries uninterpreted except by the jpy JNI layer.
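The mapping above can be read as a lookup table. The sketch below transcribes it with dtype names as strings (so numpy is not needed); `TYPE_MAPPING` and `target_dtype` are hypothetical names, the result assumes the conversion succeeds (under ERROR a null would raise instead), and array/DbArray columns are not modelled.

```python
# Hypothetical transcription of the documented type mapping.
TYPE_MAPPING = {
    # java type: (dtype without nulls, dtype when null conversion is needed)
    "byte": ("int8", "float32"),
    "short": ("int16", "float32"),
    "int": ("int32", "float64"),
    "long": ("int64", "float64"),
    "Boolean": ("bool", "object"),
    "float": ("float32", "float32"),          # NULL_FLOAT -> nan, dtype unchanged
    "double": ("float64", "float64"),         # NULL_DOUBLE -> nan, dtype unchanged
    "DBDateTime": ("datetime64[ns]", "datetime64[ns]"),  # null -> NaT
    "char": ("U1", "U1"),                     # NULL_CHAR -> ''
}


def target_dtype(java_type, has_nulls=False, convert_nulls=2, max_len=1):
    """Return the numpy dtype name a column of java_type is expected to map to."""
    if java_type == "String":
        # Fixed-width unicode, as wide as the longest string in the column.
        return "U%d" % max_len
    if java_type not in TYPE_MAPPING:
        return "object"  # anything else is left to the jpy layer
    plain, null_converted = TYPE_MAPPING[java_type]
    # Only NULL_CONVERSION.CONVERT (=2) changes the dtype, and only if nulls exist.
    if has_nulls and convert_nulls == 2:
        return null_converted
    return plain


print(target_dtype("int"), target_dtype("int", has_nulls=True))  # int32 float64
```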
Note
The numpy unicode type uses 32-bit characters (there is no 16-bit option), and is implemented as a character array of fixed-length entries, padded as necessary by the null character (i.e. the character of integer value 0). Every entry in the array uses as many characters as the longest entry, and the numpy fetch of an entry automatically trims the trailing null characters. This requires much more memory in python than a corresponding java String array (the bit depth is doubled, and all strings are padded to the length of the longest). If the original java String has any trailing null (zero-value) characters, these will be ignored in python usage. For char arrays, we cannot differentiate between entries whose original value (in java) was 0 or NULL_CHAR.
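The cost described in this note can be worked out by hand: 4 bytes per character, times the length of the longest entry, times the number of entries. The pure-Python sketch below (hypothetical helper names, no numpy required) compares that with a rough 2-bytes-per-character estimate for java's UTF-16 strings, ignoring java object overhead and compact-string optimizations, and mimics the trailing-null trimming on fetch.

```python
def numpy_unicode_bytes(strings):
    """Bytes used by a numpy 'U<n>' array: 4 bytes x longest length x entries."""
    width = max((len(s) for s in strings), default=0)
    return 4 * width * len(strings)


def java_char_bytes(strings):
    """Rough estimate for java: 2 bytes (UTF-16) per character actually stored."""
    return 2 * sum(len(s) for s in strings)


def numpy_fetch(stored):
    """numpy drops trailing null characters when an entry is read back."""
    return stored.rstrip("\x00")


names = ["a", "bb", "a much longer string"]
print(numpy_unicode_bytes(names), java_char_bytes(names))  # 240 46
print(repr(numpy_fetch("ab\x00\x00")))                     # 'ab'
```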
-
columnToSeries(table, columnName, convertNulls=0)
Produce a copy of the specified column as a pandas.Series object.
- Parameters
table – the Table object
columnName – the name of the desired column
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)
- Returns
pandas.Series object which reproduces the given column of table as faithfully as possible
Performance for numpy.ndarray objects is generally much better than for pandas.Series objects. Consider using the columnToNumpyArray() method unless you really need a pandas.Series object.
Note that the entire column is copied into memory, so consider the total number of entries in the column before converting. For large tables (for example, millions of entries or more), consider measures such as down-selecting rows using the Deephaven query language before converting.
Warning
The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
-
convertJavaArray(javaArray, convertNulls='ERROR', forPandas=False)
Converts a java array to its closest numpy.ndarray alternative.
- Parameters
javaArray – input java array or dbarray object
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)
forPandas – boolean indicating whether the output will be fed into a pandas.Series, which requires that the underlying data be one-dimensional
- Returns
numpy.ndarray representing as faithful a copy of the java array as possible
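The convertNulls parameter accepts three spellings of the same choice. A minimal sketch of how such an argument could be normalized is shown below; `NullConversion` and `resolve_null_conversion` are hypothetical stand-ins that assume an IntEnum with the values listed for NULL_CONVERSION, not the module's actual code.

```python
from enum import IntEnum


class NullConversion(IntEnum):
    """Hypothetical mirror of NULL_CONVERSION: ERROR=0, PASS=1, CONVERT=2."""
    ERROR = 0
    PASS = 1
    CONVERT = 2


def resolve_null_conversion(value):
    """Accept a string ('ERROR'), an enum member, or an integer (2)."""
    if isinstance(value, NullConversion):
        return value
    if isinstance(value, str):
        return NullConversion[value]       # KeyError for unknown names
    return NullConversion(int(value))      # ValueError for unknown integers


for spelling in ("ERROR", NullConversion.PASS, 2):
    print(resolve_null_conversion(spelling).name)  # ERROR, PASS, CONVERT
```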
The value for convertNulls only applies to java integer types (byte, short, int, long) and java.lang.Boolean array types:
- NULL_CONVERSION.ERROR (=0) [default]: inspect for the presence of null values, and raise an exception if one is encountered.
- NULL_CONVERSION.PASS (=1): do not inspect for the presence of null values, and pass values straight through without interpretation (Boolean null -> False). This is intended for conversion that is as fast as possible. No warning is generated if null values are present, since no inspection is performed.
- NULL_CONVERSION.CONVERT (=2): inspect for the presence of null values, and return the closest analogous numpy alternative (motivated by pandas behavior):
  - for integer type columns with null values, the numpy.ndarray will have a floating-point type and null values will be replaced with NaN
  - for Boolean type columns with null values, the numpy.ndarray will have numpy.object type and null values will be None.
Type mapping will be performed as indicated here:
- byte -> numpy.int8, or numpy.float32 if necessary for null conversion
- short -> numpy.int16, or numpy.float32 if necessary for null conversion
- int -> numpy.int32, or numpy.float64 if necessary for null conversion
- long -> numpy.int64, or numpy.float64 if necessary for null conversion
- Boolean -> numpy.bool, or numpy.object if necessary for null conversion
- float -> numpy.float32, and NULL_FLOAT -> numpy.nan
- double -> numpy.float64, and NULL_DOUBLE -> numpy.nan
- DBDateTime -> numpy.dtype('datetime64[ns]'), and null -> numpy.datetime64('NaT')
- String -> numpy.unicode_ (of appropriate length), and null -> ''
- char -> numpy.dtype('U1') (one-character string), and NULL_CHAR -> ''
- array/DbArray:
  - if forPandas=False and all entries are of compatible shape, returns a rectangular numpy.ndarray with dtype in keeping with the above
  - if forPandas=True or the entries are not all of compatible shape, returns a one-dimensional numpy.ndarray with dtype numpy.object, with each entry a numpy.ndarray whose type mapping is in keeping with the above
Anything else should present as a one-dimensional array of type numpy.object, with entries uninterpreted except by the jpy JNI layer.
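The array/DbArray rule decides between a rectangular result and a one-dimensional array of per-row arrays. The sketch below models it with python lists, under the simplifying assumption that "compatible shape" means every row has the same length; `choose_layout` is a hypothetical helper that reports the layout and shape rather than building numpy arrays.

```python
def choose_layout(entries, for_pandas=False):
    """Decide between a rectangular array and a 1-D array of per-row arrays.

    entries is a list of python lists standing in for the java array/DbArray
    value in each row. Assumption: rows are compatible when lengths match.
    """
    lengths = {len(e) for e in entries}
    if not for_pandas and len(lengths) <= 1:
        width = lengths.pop() if lengths else 0
        return ("rectangular", (len(entries), width))
    # forPandas=True, or ragged rows: one object (a per-row array) per entry.
    return ("object", (len(entries),))


print(choose_layout([[1, 2], [3, 4], [5, 6]]))                   # ('rectangular', (3, 2))
print(choose_layout([[1, 2], [3, 4], [5, 6]], for_pandas=True))  # ('object', (3,))
print(choose_layout([[1], [2, 3]]))                              # ('object', (2,))
```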
Note
The numpy unicode type uses 32-bit characters (there is no 16-bit option), and is implemented as a character array of fixed-length entries, padded as necessary by the null character (i.e. the character of integer value 0). Every entry in the array uses as many characters as the longest entry, and the numpy fetch of an entry automatically trims the trailing null characters. This requires much more memory in python than a corresponding java String array (the bit depth is doubled, and all strings are padded to the length of the longest). If the original java String has any trailing null (zero-value) characters, these will be ignored in python usage. For char arrays, we cannot differentiate between entries whose original value (in java) was 0 or NULL_CHAR.
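The char limitation follows directly from the trimming behavior. In this sketch NULL_CHAR is assumed to be Character.MAX_VALUE (0xFFFF), the sentinel Deephaven uses for a null char, and `to_u1` is a hypothetical helper: a genuine 0 and NULL_CHAR both read back as the empty string.

```python
NULL_CHAR = 0xFFFF  # assumption: Deephaven's null char sentinel (Character.MAX_VALUE)


def to_u1(code):
    """Map a java char code to the string a numpy 'U1' entry reads back as."""
    stored = "\x00" if code == NULL_CHAR else chr(code)  # NULL_CHAR -> padding
    return stored.rstrip("\x00")  # numpy trims trailing null characters on fetch


print([to_u1(c) for c in (ord("A"), 0, NULL_CHAR)])  # ['A', '', '']
```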
-
createCategoricalSeries(table, columnName, convertNulls=0)
Produce a copy of the specified column as a pandas.Series object containing categorical data.
- Parameters
table – the Table object
columnName – the name of the desired column
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)
- Returns
pandas.Series object which reproduces the given column of table (as faithfully as possible)
Warning
The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
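Categorical data stores each distinct value once, plus a small integer code per row, which is why it suits low-cardinality columns. The sketch below builds pandas-style codes and categories from a plain python list without pandas; `encode_categorical` is hypothetical, and giving null (None) the code -1 follows pandas' convention for missing values rather than anything stated for createCategoricalSeries.

```python
def encode_categorical(values):
    """Return (codes, categories) in the style of pandas.Categorical.

    Categories are the sorted distinct non-null values; each row stores the
    index of its category, and null (None) rows get the code -1.
    """
    categories = sorted({v for v in values if v is not None})
    index = {c: i for i, c in enumerate(categories)}
    codes = [-1 if v is None else index[v] for v in values]
    return codes, categories


codes, categories = encode_categorical(["AAPL", "MSFT", "AAPL", None, "GOOG"])
print(codes)       # [0, 2, 0, -1, 1]
print(categories)  # ['AAPL', 'GOOG', 'MSFT']
```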
-
freezeTable(table)
Helper method for freezing a table.
- Parameters
table – the deephaven table
- Returns
the frozen table
-
tableToDataFrame(table, convertNulls=0, categoricals=None)
Produces a copy of a table object as a pandas.DataFrame.
- Parameters
table – the Table object
convertNulls – member of the NULL_CONVERSION enum, specifying how to treat null values.
categoricals – None, a column name, or a list of column names to convert to categorical data series
- Returns
pandas.DataFrame object which reproduces table as faithfully as possible
Note that the entire table is copied into memory, so consider the total number of entries in the table before converting. For large tables (for example, millions of entries or more), consider measures such as dropping unnecessary columns and/or down-selecting rows using the Deephaven query language before converting.
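The categoricals parameter accepts None, a single column name, or a list of names. A minimal sketch of normalizing it to a list follows; `normalize_categoricals` is hypothetical, and the check against the table's column names is an added assumption, not documented behavior of tableToDataFrame.

```python
def normalize_categoricals(categoricals, table_columns):
    """Turn None, a single column name, or a list of names into a list of names."""
    if categoricals is None:
        names = []
    elif isinstance(categoricals, str):
        names = [categoricals]
    else:
        names = list(categoricals)
    # Assumption for illustration: reject names that are not table columns.
    missing = [n for n in names if n not in table_columns]
    if missing:
        raise ValueError("unknown column(s): %s" % ", ".join(missing))
    return names


columns = ["Sym", "Price", "Size"]
print(normalize_categoricals(None, columns))             # []
print(normalize_categoricals("Sym", columns))            # ['Sym']
print(normalize_categoricals(["Sym", "Size"], columns))  # ['Sym', 'Size']
```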
Warning
The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
The value for convertNulls only applies to java integer types (byte, short, int, long) and java.lang.Boolean array types:
- NULL_CONVERSION.ERROR (=0) [default]: inspect for the presence of null values, and raise an exception if one is encountered.
- NULL_CONVERSION.PASS (=1): do not inspect for the presence of null values, and pass values straight through without interpretation (Boolean null -> False). This is intended for conversion that is as fast as possible. No warning is generated if null values are present, since no inspection is performed.
- NULL_CONVERSION.CONVERT (=2): inspect for the presence of null values, and return the closest analogous numpy alternative (motivated by pandas behavior):
  - for integer type columns with null values, the numpy.ndarray will have a floating-point type and null values will be replaced with NaN
  - for Boolean type columns with null values, the numpy.ndarray will have numpy.object type and null values will be None.
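For Boolean columns the modes differ in what happens to a null. The sketch below uses python None for a java null Boolean and plain lists in place of numpy arrays; `convert_boolean_column` and its integer mode codes (0 = ERROR, 1 = PASS, 2 = CONVERT, as listed above) are a hypothetical illustration, not the module's implementation.

```python
def convert_boolean_column(values, mode=0):
    """Apply the NULL_CONVERSION rules to java Boolean values (None = null).

    mode: 0 = ERROR, 1 = PASS, 2 = CONVERT.
    """
    if mode == 1:
        # PASS: no inspection; a null Boolean comes through as False.
        return [bool(v) for v in values]
    if None in values:
        if mode == 0:
            raise ValueError("column contains null values")
        # CONVERT: object-style result in which null stays None.
        return [None if v is None else bool(v) for v in values]
    return [bool(v) for v in values]


print(convert_boolean_column([True, None, False], mode=1))  # [True, False, False]
print(convert_boolean_column([True, None, False], mode=2))  # [True, None, False]
```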
Conversion for different data types will be performed as indicated here:
- byte -> numpy.int8, or numpy.float32 if necessary for null conversion
- short -> numpy.int16, or numpy.float32 if necessary for null conversion
- int -> numpy.int32, or numpy.float64 if necessary for null conversion
- long -> numpy.int64, or numpy.float64 if necessary for null conversion
- Boolean -> numpy.bool, or numpy.object if necessary for null conversion
- float -> numpy.float32, and NULL_FLOAT -> numpy.nan
- double -> numpy.float64, and NULL_DOUBLE -> numpy.nan
- DBDateTime -> numpy.dtype('datetime64[ns]'), and null -> numpy.datetime64('NaT')
- String -> numpy.unicode_ (of appropriate length), and null -> ''
- char -> numpy.dtype('U1') (one-character string), and NULL_CHAR -> ''
- array/DbArray:
  - if forPandas=False and all entries are of compatible shape, returns a rectangular numpy.ndarray with dtype in keeping with the above
  - if forPandas=True or the entries are not all of compatible shape, returns a one-dimensional numpy.ndarray with dtype numpy.object, with each entry a numpy.ndarray whose type mapping is in keeping with the above
Anything else should present as a one-dimensional array of type numpy.object, with entries uninterpreted except by the jpy JNI layer.
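Unlike the integer types, float and double nulls map to NaN without regard to convertNulls. A sketch for double columns, assuming Deephaven's NULL_DOUBLE sentinel is -Double.MAX_VALUE (the most negative finite double, which python exposes as -sys.float_info.max); `convert_double_column` is hypothetical.

```python
import math
import sys

# Assumption: Deephaven's null sentinel for java double is -Double.MAX_VALUE.
NULL_DOUBLE = -sys.float_info.max


def convert_double_column(values):
    """Replace the NULL_DOUBLE sentinel with NaN; other values are unchanged."""
    return [math.nan if v == NULL_DOUBLE else v for v in values]


print(convert_double_column([1.5, NULL_DOUBLE, -2.0]))  # [1.5, nan, -2.0]
```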
Note
The numpy unicode type uses 32-bit characters (there is no 16-bit option), and is implemented as a character array of fixed-length entries, padded as necessary by the null character (i.e. the character of integer value 0). Every entry in the array uses as many characters as the longest entry, and the numpy fetch of an entry automatically trims the trailing null characters. This requires much more memory in python than a corresponding java String array (the bit depth is doubled, and all strings are padded to the length of the longest). If the original java String has any trailing null (zero-value) characters, these will be ignored in python usage. For char arrays, we cannot differentiate between entries whose original value (in java) was 0 or NULL_CHAR.