deephaven.java_to_python

Utilities for converting Deephaven java objects to appropriate python objects.

columnToNumpyArray(table, columnName, convertNulls=0, forPandas=False)

Produce a copy of the specified column as a numpy.ndarray.

Parameters
  • table – the Table object

  • columnName – the name of the desired column

  • convertNulls – member of NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)

  • forPandas – boolean for whether the output will be fed into a pandas.Series (i.e. must be 1-dimensional)

Returns

numpy.ndarray object which reproduces the given column of table (as faithfully as possible)

Note that the entire column is going to be cloned into memory, so the total number of entries in the column should be considered before blindly doing this. For large tables (millions of entries or more), consider measures such as down-selecting rows using the Deephaven query language before converting.

Warning

The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.

The value for convertNulls applies only to java integer-type (byte, short, int, long) or java.lang.Boolean array types:

  • NULL_CONVERSION.ERROR (=0) [default] inspect for the presence of null values, and raise an exception if one is encountered.

  • NULL_CONVERSION.PASS (=1) do not inspect for the presence of null values, and pass values straight through without interpretation (Boolean null -> False). This is intended for the fastest possible conversion. No warning is generated if null values are present, since no inspection is performed.

  • NULL_CONVERSION.CONVERT (=2) inspect for the presence of null values, and take steps to return the closest analogous numpy alternative (motivated by pandas behavior):

    • for integer-type columns with null value(s), the numpy.ndarray will have a floating-point dtype and null values will be replaced with NaN

    • for Boolean-type columns with null value(s), the numpy.ndarray will have numpy.object dtype and null values will be None.

Type mapping will be performed as indicated here:

  • byte -> numpy.int8, or numpy.float32 if necessary for null conversion

  • short -> numpy.int16, or numpy.float32 if necessary for null conversion

  • int -> numpy.int32, or numpy.float64 if necessary for null conversion

  • long -> numpy.int64, or numpy.float64 if necessary for null conversion

  • Boolean -> numpy.bool, or numpy.object if necessary for null conversion

  • float -> numpy.float32 and NULL_FLOAT -> numpy.nan

  • double -> numpy.float64 and NULL_DOUBLE -> numpy.nan

  • DBDateTime -> numpy.dtype(datetime64[ns]) and null -> NaT

  • String -> numpy.unicode_ (of appropriate length) and null -> ''

  • char -> numpy.dtype('U1') (one character string) and NULL_CHAR -> ''

  • array/DbArray
    • if forPandas=False and all entries are of compatible shape, then returns a rectangular numpy.ndarray with dtype in keeping with the above

    • if forPandas=True or the entries are not all of compatible shape, then returns a one-dimensional numpy.ndarray with dtype numpy.object, with each entry a numpy.ndarray whose type mapping is in keeping with the above

  • Anything else should present as a one-dimensional array of type numpy.object with entries uninterpreted except by the jpy JNI layer.

Note

The numpy unicode type uses 32-bit characters (there is no 16-bit option), and is implemented as a character array of fixed-length entries, padded as necessary by the null character (i.e. character of integer value 0). Every entry in the array will actually use as many characters as the longest entry, and the numpy fetch of an entry automatically trims the trailing null characters.

This will require much more memory (doubles bit-depth and pads all strings to the length of the longest) in python versus a corresponding java String array. If the original java String has any trailing null (zero-value) characters, these will be ignored in python usage. For char arrays, we cannot differentiate between entries whose original value (in java) was 0 or NULL_CHAR.
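
As an illustration, the sketch below calls columnToNumpyArray with the different convertNulls options. The table t and its column names are assumptions for this example, and NULL_CONVERSION is assumed to be importable from this module alongside the conversion functions:

    from deephaven.java_to_python import columnToNumpyArray, NULL_CONVERSION

    # Default: raise if a null is encountered in an integer or Boolean column.
    prices = columnToNumpyArray(t, "Price")

    # Convert nulls: an int column containing nulls comes back as numpy.float64
    # with NaN in place of the null entries.
    sizes = columnToNumpyArray(t, "Size", convertNulls=NULL_CONVERSION.CONVERT)

    # The same option may be given as a string or integer value.
    sizes = columnToNumpyArray(t, "Size", convertNulls="CONVERT")
    sizes = columnToNumpyArray(t, "Size", convertNulls=2)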

columnToSeries(table, columnName, convertNulls=0)

Produce a copy of the specified column as a pandas.Series object.

Parameters
  • table – the Table object

  • columnName – the name of the desired column

  • convertNulls – member of NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)

Returns

pandas.Series object which reproduces the given column of table as faithfully as possible

Performance for numpy.ndarray objects is generally much better than for pandas.Series objects. Consider using the columnToNumpyArray() method unless you really need a pandas.Series object.

Note that the entire column is going to be cloned into memory, so the total number of entries in the column should be considered before blindly doing this. For large tables (millions of entries or more), consider measures such as down-selecting rows using the Deephaven query language before converting.

Warning

The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
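
A minimal usage sketch, assuming a table t with a String column named "Sym" exists in the session:

    from deephaven.java_to_python import columnToSeries

    sym = columnToSeries(t, "Sym")    # pandas.Series built from the column copy
    print(sym.head())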

convertJavaArray(javaArray, convertNulls='ERROR', forPandas=False)

Converts a java array to its closest numpy.ndarray alternative.

Parameters
  • javaArray – input java array or DbArray object

  • convertNulls – member of NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)

  • forPandas – boolean indicating whether output will be fed into a pandas.Series, which requires that the underlying data is one-dimensional

Returns

numpy.ndarray representing as faithful a copy of the java array as possible

The value for convertNulls applies only to java integer-type (byte, short, int, long) or java.lang.Boolean array types:

  • NULL_CONVERSION.ERROR (=0) [default] inspect for the presence of null values, and raise an exception if one is encountered.

  • NULL_CONVERSION.PASS (=1) do not inspect for the presence of null values, and pass values straight through without interpretation (Boolean null -> False). This is intended for the fastest possible conversion. No warning is generated if null values are present, since no inspection is performed.

  • NULL_CONVERSION.CONVERT (=2) inspect for the presence of null values, and take steps to return the closest analogous numpy alternative (motivated by pandas behavior):

    • for integer-type columns with null value(s), the numpy.ndarray will have a floating-point dtype and null values will be replaced with NaN

    • for Boolean-type columns with null value(s), the numpy.ndarray will have numpy.object dtype and null values will be None.

Type mapping will be performed as indicated here:

  • byte -> numpy.int8, or numpy.float32 if necessary for null conversion

  • short -> numpy.int16, or numpy.float32 if necessary for null conversion

  • int -> numpy.int32, or numpy.float64 if necessary for null conversion

  • long -> numpy.int64, or numpy.float64 if necessary for null conversion

  • Boolean -> numpy.bool, or numpy.object if necessary for null conversion

  • float -> numpy.float32 and NULL_FLOAT -> numpy.nan

  • double -> numpy.float64 and NULL_DOUBLE -> numpy.nan

  • DBDateTime -> numpy.dtype(datetime64[ns]) and null -> NaT

  • String -> numpy.unicode_ (of appropriate length) and null -> ''

  • char -> numpy.dtype('U1') (one character string) and NULL_CHAR -> ''

  • array/DbArray
    • if forPandas=False and all entries are of compatible shape, then returns a rectangular numpy.ndarray with dtype in keeping with the above

    • if forPandas=True or the entries are not all of compatible shape, then returns a one-dimensional numpy.ndarray with dtype numpy.object, with each entry a numpy.ndarray whose type mapping is in keeping with the above

  • Anything else should present as a one-dimensional array of type numpy.object with entries uninterpreted except by the jpy JNI layer.

Note

The numpy unicode type uses 32-bit characters (there is no 16-bit option), and is implemented as a character array of fixed-length entries, padded as necessary by the null character (i.e. character of integer value 0). Every entry in the array will actually use as many characters as the longest entry, and the numpy fetch of an entry automatically trims the trailing null characters.

This will require much more memory (doubles bit-depth and pads all strings to the length of the longest) in python versus a corresponding java String array. If the original java String has any trailing null (zero-value) characters, these will be ignored in python usage. For char arrays, we cannot differentiate between entries whose original value (in java) was 0 or NULL_CHAR.
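
The sketch below builds small java arrays directly through jpy and converts them; the arrays and their contents are illustrative only:

    import jpy
    from deephaven.java_to_python import convertJavaArray

    j_ints = jpy.array('int', [1, 2, 3])              # java int[]
    np_ints = convertJavaArray(j_ints)                # numpy.ndarray, dtype int32

    j_strings = jpy.array('java.lang.String', ['a', 'bc', None])
    np_strings = convertJavaArray(j_strings)          # null -> '' per the mapping above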

createCategoricalSeries(table, columnName, convertNulls=0)

Produce a copy of the specified column as a pandas.Series object containing categorical data.

Parameters
  • table – the Table object

  • columnName – the name of the desired column

  • convertNulls – member of NULL_CONVERSION enum, specifying how to treat null values. Can be specified by string value (e.g. 'ERROR'), enum member (e.g. NULL_CONVERSION.PASS), or integer value (e.g. 2)

Returns

pandas.Series object which reproduces the given column of table (as faithfully as possible)

Warning

The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.
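
A low-cardinality column (for example, an exchange or symbol column) is the typical candidate for categorical storage. In the sketch below, the table t and the column name "Exchange" are assumptions:

    from deephaven.java_to_python import createCategoricalSeries

    exch = createCategoricalSeries(t, "Exchange")
    print(exch.dtype)    # category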

freezeTable(table)

Helper method for freezing a table

Parameters

table – the Deephaven table

Returns

the frozen table
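
One plausible use, sketched under the assumption that t is a ticking table with "Price" and "Size" columns, is to freeze once and then perform several conversions against the same snapshot:

    from deephaven.java_to_python import freezeTable, columnToNumpyArray

    frozen = freezeTable(t)
    prices = columnToNumpyArray(frozen, "Price")
    sizes = columnToNumpyArray(frozen, "Size")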

tableToDataFrame(table, convertNulls=0, categoricals=None)

Produces a copy of a table object as a pandas.DataFrame.

Parameters
  • table – the Table object

  • convertNulls – member of NULL_CONVERSION enum, specifying how to treat null values.

  • categoricals – None, column name, or list of column names to convert to 'categorical' data series

Returns

pandas.DataFrame object which reproduces table as faithfully as possible

Note that the entire table is going to be cloned into memory, so the total number of entries in the table should be considered before blindly doing this. For large tables (millions of entries or more), consider measures such as dropping unnecessary columns and/or down-selecting rows using the Deephaven query language before converting.

Warning

The table will be frozen prior to conversion. A table which updates mid-conversion would lead to errors or other undesirable behavior.

The value for convertNulls applies only to java integer-type (byte, short, int, long) or java.lang.Boolean array types:

  • NULL_CONVERSION.ERROR (=0) [default] inspect for the presence of null values, and raise an exception if one is encountered.

  • NULL_CONVERSION.PASS (=1) do not inspect for the presence of null values, and pass values straight through without interpretation (Boolean null -> False). This is intended for the fastest possible conversion. No warning is generated if null values are present, since no inspection is performed.

  • NULL_CONVERSION.CONVERT (=2) inspect for the presence of null values, and take steps to return the closest analogous numpy alternative (motivated by pandas behavior):

    • for integer-type columns with null value(s), the numpy.ndarray will have a floating-point dtype and null values will be replaced with NaN

    • for Boolean-type columns with null value(s), the numpy.ndarray will have numpy.object dtype and null values will be None.

Conversion for different data types will be performed as indicated here:

  • byte -> numpy.int8, or numpy.float32 if necessary for null conversion

  • short -> numpy.int16, or numpy.float32 if necessary for null conversion

  • int -> numpy.int32, or numpy.float64 if necessary for null conversion

  • long -> numpy.int64, or numpy.float64 if necessary for null conversion

  • Boolean -> numpy.bool, or numpy.object if necessary for null conversion

  • float -> numpy.float32 and NULL_FLOAT -> numpy.nan

  • double -> numpy.float64 and NULL_DOUBLE -> numpy.nan

  • DBDateTime -> numpy.dtype(datetime64[ns]) and null -> NaT

  • String -> numpy.unicode_ (of appropriate length) and null -> ''

  • char -> numpy.dtype('U1') (one character string) and NULL_CHAR -> ''

  • array/DbArray
    • if forPandas=False and all entries are of compatible shape, then returns a rectangular numpy.ndarray with dtype in keeping with the above

    • if forPandas=True or the entries are not all of compatible shape, then returns a one-dimensional numpy.ndarray with dtype numpy.object, with each entry a numpy.ndarray whose type mapping is in keeping with the above

  • Anything else should present as a one-dimensional array of type numpy.object with entries uninterpreted except by the jpy JNI layer.

Note

The numpy unicode type uses 32-bit characters (there is no 16-bit option), and is implemented as a character array of fixed-length entries, padded as necessary by the null character (i.e. character of integer value 0). Every entry in the array will actually use as many characters as the longest entry, and the numpy fetch of an entry automatically trims the trailing null characters.

This will require much more memory (doubles bit-depth and pads all strings to the length of the longest) in python versus a corresponding java String array. If the original java String has any trailing null (zero-value) characters, these will be ignored in python usage. For char arrays, we cannot differentiate between entries whose original value (in java) was 0 or NULL_CHAR.
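
A closing sketch: down-select the table first, then convert with null conversion enabled and a couple of columns treated as categoricals. The column names and the where() filter are assumptions for this example:

    from deephaven.java_to_python import tableToDataFrame, NULL_CONVERSION

    small = t.where("Date = `2020-01-02`")    # reduce rows before cloning into memory
    df = tableToDataFrame(small,
                          convertNulls=NULL_CONVERSION.CONVERT,
                          categoricals=["Sym", "Exchange"])
    print(df.dtypes)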