Importing Data without Schemas

If you want to import datasets into memory for temporary use, you can use the readCsv command, or you can select Upload CSV from the More Actions drop-down menu in the Deephaven console to upload a CSV file or other delimited file. Importing data into a memory table using these methods is recommended for datasets containing no more than 50,000 rows of data.

Once loaded into memory, these datasets can also be saved to a Deephaven User Table. These are distinguished from Deephaven System tables in that they have no XML schema, but are otherwise accessible from Deephaven APIs in the same way.

Importing Into Memory

readCsv

readCsv is a command available within Deephaven scripts that reads a file on the query server directly into a table object in the console. The syntax follows:

  myTable = readCsv("path_to_file")

For use in Python, this method is in the deephaven.TableTools module. It can be imported directly into the namespace with the statement:

from deephaven.TableTools import readCsv

An optional second parameter to readCsv can be used to specify an alternate format or a field delimiter. Formatting options for the second parameter include the following:

  • TRIM - Ignores leading or trailing spaces around a value that are not inside double quotes. When a second parameter to readCsv is not specified, the TRIM format is used.
  • DEFAULT - Apache Commons CSV default format.
  • EXCEL - Microsoft Excel CSV format. (Note: Excel CSV files can usually be imported with the TRIM or DEFAULT format.)
  • MYSQL - MySQL CSV format.
  • RFC4180 - IETF RFC 4180 MIME text/csv format.
  • TDF - Tab-delimited format.
For example, the following can be used to import comma-delimited, tab-delimited, or otherwise-delimited text files:

Python:

from deephaven import *

# default case - assumes comma delimited and trims spaces not contained in quotes
myTable1 = ttools.readCsv("path_to_file")  # equivalent to readCsv("path_to_file", "TRIM")

# tab-delimited file
myTable2 = ttools.readCsv("path_to_file", "TDF")

# user specified delimiter - for example ";"
myTable3 = ttools.readCsv("path_to_file", ";")

Groovy:

// default case - assumes comma delimited and trims spaces not contained in quotes
myTable1 = readCsv("path_to_file")  // equivalent to readCsv("path_to_file", "TRIM")

// tab-delimited file
myTable2 = readCsv("path_to_file", "TDF")

// user specified delimiter - for example ";"
myTable3 = readCsv("path_to_file", ";")

Possibilities for a different field delimiter include semicolons (;), colons (:), pipes (|), and spaces. Any single character can be specified.

readHeaderlessCsv

The readHeaderlessCsv method can read a CSV file that does not have column headers. It will either generate column names (Col1, Col2, etc.) or use a provided list of column names.

The syntax follows:

readHeaderlessCsv(filePath, header)

The filePath argument must include the full path to the file, and, optionally, the user may provide column names in the header parameter as strings. Column data types are inferred from the data.
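
For example, a minimal Groovy sketch, assuming the column names can be passed as trailing strings in the header parameter (the file path and column names here are hypothetical):

// no header provided - column names are generated (Col1, Col2, etc.)
myTable1 = readHeaderlessCsv("path_to_file")

// user-supplied column names
myTable2 = readHeaderlessCsv("path_to_file", "Date", "Symbol", "Price")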

See the Javadocs for full details and other options.

Upload CSV

Internally, the Upload CSV option uses readCsv to import data, but rather than expecting the source file to be accessible from the query server, it allows you to upload and import a file from your local filesystem. This option is only available when running a console in Deephaven.

To access Upload CSV, first open a console window in Deephaven. Click More Actions and then select Upload CSV from the drop-down menu. You will then be prompted to locate the file you want to upload.

Upload CSV does not allow additional arguments to be passed to readCsv; therefore, it only supports imports using the TRIM format and a comma delimiter.

Example

Let's look at an example where we have a CSV file that contains data we need for an analysis. The file is named tableXYZ.csv. (Note: This file does not exist in Deephaven; it is used for demonstration only.)

Open Deephaven, select Create Console, and then click Connect.

Click More Actions, and then select Upload CSV from the drop-down menu. A dialog window will open, which you can then use to navigate to the location on your system where the file is stored. Once the file is selected, click OK.

You will be asked to provide a name for the variable that will hold the data in the CSV file being imported. For this example, we will enter tableXYZ into the field, and then click OK.

A new tab titled tableXYZ will open in the lower part of the console window. The table presented in the tabbed panel will contain the data from the CSV file.

This table has been saved to a variable in Deephaven, and can be used immediately for your analyses. However, if you quit Deephaven, the data in that table is no longer available. To save the table for further use, you need to create a User Table (see below), which is saved to a namespace in Deephaven.

readJdbc

readJdbc reads directly from a JDBC query result into an in-memory table. The metadata provided by the JDBC ResultSet is used to generate Deephaven columns of the most appropriate type. There are two variants of this function:

To execute a query against a given JDBC data source and read the result:

Python:

from deephaven import *

myTable = JdbcHelpers.readJdbc(<JDBC driver name>, <JDBC URL>, [<user>, <password>], <query>, [<options>])

Groovy:

myTable = readJdbc(<JDBC driver name>, <JDBC URL>, [<user>, <password>], <query>, [<options>])

The user and password parameters are optional (some drivers take user and password in the URL), as is the options argument.
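
For example, a minimal Groovy sketch of the form without explicit credentials, assuming a driver that accepts the user and password in the URL (the connection details here are hypothetical):

// credentials are carried in the JDBC URL, so the user/password arguments are omitted
myTable = readJdbc("com.mysql.jdbc.Driver",
                   "jdbc:mysql://dbhost/dbname?user=myuser&password=mypassword",
                   "SELECT * FROM table1")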

To read a ResultSet already generated by an executed query (note that this will consume the ResultSet):

Python:

from deephaven import *

myTable = JdbcHelpers.readJdbc(<Result Set>, [<options>])

Groovy:

myTable = readJdbc(<Result Set>, [<options>])
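
For example, a minimal Groovy sketch that obtains a ResultSet through plain JDBC and hands it to readJdbc (the connection details here are hypothetical):

import java.sql.DriverManager

// obtain a ResultSet directly through JDBC
connection = DriverManager.getConnection("jdbc:sqlserver://dbserverhost;database=dbname", "myuser", "mypassword")
resultSet = connection.createStatement().executeQuery("SELECT * FROM table1")

// readJdbc consumes the ResultSet and produces an in-memory table
myTable = readJdbc(resultSet)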

The options may be used to specify the following:

  • columnNameFormat(<src format>, <target format>) - Expected source and target column name format. This may be used to systematically convert column names (e.g., "my_col_name" to "MyColName"). Default: none.
  • maxRows(<maxRows>) - Maximum number of rows to read from the result set (typically a LIMIT or TOP clause in the SQL query is a better way to achieve this). Default: no limit.
  • strict(<strict>) - Whether to apply strict mode when mapping JDBC to Deephaven (throw an exception instead of truncating when out-of-range values are encountered). Default: true.
  • sourceTimeZone(<time zone>) - The time zone to use when interpreting JDBC dates and timestamps. Default: the data source time zone if possible, otherwise the local time zone.
  • arrayDelimiter(<array delimiter>) - The delimiter to use when interpreting JDBC strings as array values. Default: "," (comma).
  • columnTargetType(<column name>, <target type>) - The Deephaven type to use for the given column. Some SQL types can be mapped to multiple Deephaven types, in which case this option can be used to specify an alternative to the default.

For example, the following can be used to import from a SQL Server data source:

Python:

from deephaven import *

newTable = JdbcHelpers.readJdbc("com.microsoft.sqlserver.jdbc.SQLServerDriver",
                                "jdbc:sqlserver://dbserverhost;database=dbname",
                                "myuser",
                                "mypassword",
                                "SELECT * FROM table1")

Groovy:

newTable = readJdbc("com.microsoft.sqlserver.jdbc.SQLServerDriver",
                    "jdbc:sqlserver://dbserverhost;database=dbname",
                    "myuser",
                    "mypassword",
                    "SELECT * FROM table1")

The following would import from the same source, but map the column names from a "lower underscore" format to "upper camel" format, and interpret SQL dates and timestamps as UTC. Note the use of the CasingStyle type in the Groovy example.

Python:

from deephaven import *

newTable = JdbcHelpers.readJdbc("com.microsoft.sqlserver.jdbc.SQLServerDriver",
                                "jdbc:sqlserver://dbserverhost;database=dbname",
                                "myuser",
                                "mypassword",
                                "SELECT * FROM table1",
                                JdbcHelpers.readJdbcOptions()
                                    .sourceTimeZone("UTC")
                                    .columnNameFormat("UpperCamel", "_")
                                    .readJdbcOptions  # get the Java builder object from the Python object
                                )

Groovy:

import com.illumon.iris.utils.CasingStyle

newTable = readJdbc("com.microsoft.sqlserver.jdbc.SQLServerDriver",
	                "jdbc:sqlserver://dbserverhost;database=dbname",
	                "myuser",
	                "mypassword",
	                "SELECT * FROM table1",
	                readJdbcOptions()
	                    .sourceTimeZone("UTC")
	                    .columnNameFormat(CasingStyle.UpperCamel, "_"))
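
As another illustration, a minimal Groovy sketch combining two of the other options listed above, capping the number of rows read and disabling strict mapping (the connection details here are hypothetical):

newTable = readJdbc("com.microsoft.sqlserver.jdbc.SQLServerDriver",
                    "jdbc:sqlserver://dbserverhost;database=dbname",
                    "myuser",
                    "mypassword",
                    "SELECT * FROM table1",
                    readJdbcOptions()
                        .maxRows(1000)    // read at most 1,000 rows
                        .strict(false))   // truncate out-of-range values instead of throwing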

Saving Datasets as User Tables

Creating a User Table

Creating and saving a User Table is accomplished using the addTable method in a query. Here's the syntax:

db.addTable("<nameSpace>", "<newTableName>", <source>)

  • addTable is the method
  • <nameSpace> is the target namespace in which to store the User Table. The value entered can be the name of an existing namespace, or you can use this argument to create a new namespace on Deephaven. Note: You can only create a new namespace with proper authorization, and you can only import tables into namespaces for which you have permission.
  • <newTableName> is the name to be used for the User Table after it is saved. 
  • <source> is the source of the data to be stored.

We already have our CSV file imported into Deephaven, and saved to the variable tableXYZ. Now we want to save that table to our namespace. Here's the query:

db.addTable("training", "myTable", tableXYZ)

  • addTable is the method
  • training is the target namespace in which to store the User Table. 
  • myTable is the name to be used for the User Table after it is saved. 
  • tableXYZ is the source of the data to be stored, which is the table that was imported earlier.

When you run this query in Deephaven, the table will be imported into Deephaven and saved in the namespace specified. The console will provide feedback only if the import failed.  If there is no feedback, the import was successful.

To use the table you just imported, you first need to restart the Deephaven console.  To do this, click More Actions and then select Reconnect from the drop-down menu. Once the console has reconnected, you can confirm the table was imported by running the following query:

t = db.t("training", "myTable")

Deleting a User Table

Just as you can add a table to your namespace, you can also remove a table from Deephaven.  This is accomplished using the removeTable method.  The syntax follows:

db.removeTable("<nameSpace>", "<tableName>")

  • removeTable is the method
  • <nameSpace> is the namespace in which the table you want to delete is stored. 
  • <tableName> is the name of the table to be deleted. Note: You can only delete a table from a namespace if you have permission to do so.
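
For example, to delete the table created earlier in this section:

db.removeTable("training", "myTable")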

