Importing Data without Schemas

If you want to import datasets into memory for temporary use, you can use the readCsv command or you can select Upload CSV from the More Actions drop-down menu in the Deephaven console. Importing data into a memory table using these methods is recommended for datasets containing no more than 50,000 rows of data.
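
The 50,000-row guideline can be checked before importing. The following is a minimal Python sketch, intended to be run outside Deephaven, that counts the data rows in a CSV file with the standard csv module. The function names and the assumption that the file has exactly one header row are illustrative and are not part of the Deephaven API.

```python
import csv

# Recommended upper bound for in-memory imports, taken from the guideline above.
ROW_GUIDELINE = 50_000


def count_data_rows(path, delimiter=","):
    """Count the records in a delimited file, excluding one header row (assumed)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # skip the assumed header row
        return sum(1 for row in reader if row)  # blank lines parse as [] and are ignored


def within_memory_guideline(path, delimiter=",", limit=ROW_GUIDELINE):
    """Return True when the file has no more than `limit` data rows."""
    return count_data_rows(path, delimiter) <= limit
```

A file that fails this check may still load, since the limit is a recommendation rather than a hard cap.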

Once loaded into memory, these datasets can also be saved to a Deephaven User Table. These are distinguished from Deephaven System tables in that they have no XML schema, but are otherwise accessible from Deephaven APIs in the same way.

Importing Into Memory

readCsv

readCsv is a command available within Deephaven scripts that directly reads a file on the query server to a table object in the console. The syntax follows:

  myTable=readCsv("path_to_file")

An optional second parameter to readCsv can be used to specify an alternate format or a field delimiter. Formatting options for the second parameter include the following:

Parameter Value    Description
TRIM               Ignores leading or trailing spaces around a value that are not inside double quotes. This format is used when no second parameter is passed to readCsv.
DEFAULT            Apache Commons CSV default format.
EXCEL              Microsoft Excel CSV format. (Note: Excel CSV files can usually be imported with the TRIM or DEFAULT format.)
MYSQL              MySQL CSV format.
RFC4180            IETF RFC 4180 MIME text/csv format.
TDF                Tab-delimited format.

For example, the following can be used to import a tab-delimited text file:

newTable=readCsv("/home/me/somefilename.txt","TDF")

To use a different field delimiter, pass it as the second parameter in place of a format name. Any single character can be specified; common choices include semicolons (;), colons (:), pipes (|), and spaces. The following example specifies a semicolon as the field delimiter:

myTable=readCsv("/home/me/filename.csv",";")
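
When the delimiter of a file is not known in advance, it can be guessed from a sample of the text before choosing the second parameter. The sketch below uses Python's standard csv.Sniffer, which is a heuristic and is unrelated to the parser readCsv uses. The function name, the candidate delimiter list, and the mapping of a tab to "TDF" are assumptions made for illustration.

```python
import csv

# Delimiters mentioned in the text above, plus comma and tab.
CANDIDATE_DELIMITERS = ",;:|\t "


def suggest_second_argument(sample_text):
    """Suggest readCsv's optional second argument for a sample of file text.

    Returns None for comma-delimited data (no second argument needed),
    "TDF" for tab-delimited data, or the detected delimiter character.
    Raises csv.Error if the sniffer cannot determine a delimiter.
    """
    dialect = csv.Sniffer().sniff(sample_text, delimiters=CANDIDATE_DELIMITERS)
    if dialect.delimiter == ",":
        return None
    if dialect.delimiter == "\t":
        return "TDF"
    return dialect.delimiter
```

Because the detection is heuristic, it is worth inspecting the first few lines of the file by hand when the suggestion looks wrong.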

Upload CSV

Internally, the Upload CSV option uses readCsv to import data, but rather than expecting the source file to be accessible from the query server, it allows you to upload and import a file from your local file system. This option is only available when running a console in Deephaven.

To access Upload CSV, first open a console window in Deephaven. Click More Actions and then select Upload CSV from the drop-down menu. You will then be prompted to locate the file you want to upload.

Upload CSV does not allow additional arguments to be passed to readCsv, so it supports only imports that use the TRIM format and a comma delimiter.
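
One way to work within this restriction is to convert a file that uses another delimiter to comma-delimited CSV on your local system before uploading it. The following Python sketch does this with the standard csv module. The function name and its parameters are hypothetical, and the conversion happens entirely outside Deephaven.

```python
import csv


def convert_to_comma_csv(source_path, target_path, source_delimiter):
    """Rewrite a delimited text file as comma-delimited CSV.

    Fields that contain commas are quoted automatically by csv.writer.
    Returns the number of rows written, including any header row.
    """
    rows_written = 0
    with open(source_path, newline="") as source, \
            open(target_path, "w", newline="") as target:
        reader = csv.reader(source, delimiter=source_delimiter)
        writer = csv.writer(target)  # default dialect: comma delimiter, minimal quoting
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

The converted file can then be selected in the Upload CSV dialog like any other comma-delimited file.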

Example

Let's look at an example where we have a CSV file that contains data we need for an analysis. The file is named tableXYZ.csv. (Note: This file does not exist in Deephaven; it is used for demonstration only.)

Open Deephaven, select Create Console, and then click Connect.

Click More Actions and then select Upload CSV from the drop-down menu. A dialog window will open, which you can use to navigate to the location on your system where the file is stored. Once the file is selected, click OK.

You will be asked to provide a name for the variable that will hold the data from the CSV file being imported. For this example, we will enter tableXYZ into the field, and then click OK.

A new tab titled tableXYZ will open in the lower part of the console window. The table presented in the tabbed panel will contain the data from the CSV file.

This table has been saved to a variable in Deephaven, and can be used immediately for your analyses. However, if you quit Deephaven, the data in that table is no longer available. To save the table for further use, you need to create a User Table (see below), which is saved to a namespace in Deephaven.

Saving Datasets as User Tables

Creating a User Table

Creating and saving a User Table is accomplished using the addTable method in a query. Here's the syntax:

db.addTable("<nameSpace>", "<newTableName>", <source>)

  • addTable is the method
  • <nameSpace> is the target namespace in which to store the User Table. The value can be the name of an existing namespace, or a new name, in which case the namespace is created in Deephaven. Note: You can only create a new namespace with proper authorization, and you can only import tables into namespaces for which you have permission.
  • <newTableName> is the name to be used for the User Table after it is saved. 
  • <source> is the source of the data to be stored.

We already have our CSV file imported into Deephaven, and saved to the variable tableXYZ. Now we want to save that table to our namespace. Here's the query:

db.addTable("training", "myTable", tableXYZ)

  • addTable is the method
  • training is the target namespace in which to store the User Table. 
  • myTable is the name to be used for the User Table after it is saved. 
  • tableXYZ is the source of the data to be stored, which is the table that was imported earlier.

When you run this query in Deephaven, the table is saved as a User Table in the specified namespace. The console provides feedback only if the import fails. If there is no feedback, the import was successful.

To use the table you just imported, you first need to restart the Deephaven console.  To do this, click More Actions and then select Reconnect from the drop-down menu. Once the console has reconnected, you can confirm the table was imported by running the following query:

t = db.t("training", "myTable")

Deleting a User Table

Just as you can add a table to your namespace, you can also remove a table from Deephaven.  This is accomplished using the removeTable method.  The syntax follows:

db.removeTable("<nameSpace>", "<tableName>")

  • removeTable is the method
  • <nameSpace> is the namespace in which the table you want to delete is stored. 
  • <tableName> is the name of the table to be deleted. Note: You can only delete a table from a namespace if you have permission to do so.


Last Updated: 23 September 2019 12:17 -04:00 UTC    Deephaven v.1.20181212

Deephaven Documentation      Copyright 2016-2019  Deephaven Data Labs, LLC      All Rights Reserved