The Deephaven Server Installation Guide

This document details the steps for installing Deephaven v1.20200331.

Overview

Installation from a base system starts with designating server roles (described in Planning the Installation), installing prerequisites, installing the Deephaven binaries, and configuring Deephaven services on each node based on the decided roles.

Deephaven uses etcd to persist and distribute configuration data across servers. In test installations, a single node can run all Deephaven services and the sole etcd instance; this provides no resilience or high availability, but it makes for a simple installation for test/dev/evaluation use. Production installations should have a quorum of at least three nodes running etcd. This provides fault detection and fault tolerance in case one node should fail.

System Requirements

Deephaven requires a Linux variant OS for the server nodes. Clients and some remote services, such as data loggers, can run on a variety of operating systems, including OSX and Windows, but server installations must run on Linux, such as:

  • CentOS or RHEL 7 or later
  • OpenSUSE Tumbleweed

At least one server should have high speed storage locally attached for data import and streaming data. All query servers should have shared storage, such as NFS, for shared access to historical data.

Deephaven can be installed on a Linux server or virtual machine.

Minimum Hardware Requirements

Processor:

x86_64 (64-bit)

RAM:

128 GB minimum, 512 GB preferred

Intraday Data

Mount: /db/Intraday

Size: Sized to hold several days of the data set.

Type: Low-latency direct attached; SSD is usually ideal.

Needed on server(s) running the Data Import Server (DIS) process or batch import processes.

Historical Data

Mount: /db/Systems

Size: Depends on the size of the historical data set.

Type: Network attached storage shareable between query servers (i.e., NFS).

Binary Log Data

Mount: /var/log/deephaven/binlogs

Size: Sufficient to store at least a day's streamed data.

Type: Low-latency direct attached; SSD is usually ideal.

Needed on server(s) running logger and tailer processes. Deephaven services also generate binary logs for internal monitoring.

Process Log Files

Mount: /var/log/deephaven (excluding binlogs)

Size: Sufficient for a minimum of a week's log files.

Type: Locally attached disk.

Needed on all Deephaven servers.

Operating System:

CentOS/RHEL version 7.x or greater

OpenSUSE Tumbleweed

Regarding System RAM

Deephaven connections — either from users/applications interactively executing queries, or from persistent queries managed by the Deephaven Controller process — allocate heap space to worker processes spawned on the server(s). The amount of heap allocated to a worker is configurable, and more complex queries that manipulate more and larger columns of data will need more heap.

A reasonable starting point in determining the RAM needed for a query server is to allow 8GB per concurrent user and per persistent query. For example, 160GB of RAM should be able to comfortably serve 10 users running 10 persistent queries.
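
As a hedged illustration of this starting point, the arithmetic can be sketched in the shell; the user and query counts below are hypothetical values for a planned deployment:

# A minimal sizing sketch using the 8 GB-per-worker starting point described above.
USERS=10       # expected concurrent interactive users (hypothetical)
PQS=10         # expected persistent queries (hypothetical)
HEAP_GB=8      # starting-point heap per worker
echo "Suggested query server RAM: $(( (USERS + PQS) * HEAP_GB )) GB"   # prints 160 GB for this example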

Operating System Settings

Turn off swap in the kernel.

echo "vm.swappiness = 0"| sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Note: Many Deephaven processes use large amounts of RAM. If swapping is permitted, the kernel may swap a significant portion of RAM allocated to Deephaven processes. In some environments, this can drastically increase the time needed to run garbage collection (GC). Extremely long GCs can cause network timeouts that adversely impact system stability. For this reason, in addition to general performance concerns, it is strongly recommended that servers running Deephaven are provisioned with ample RAM and configured to avoid swapping.
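
After applying the setting, it can be verified with standard tools; a minimal check might look like this:

sysctl vm.swappiness     # should report: vm.swappiness = 0
free -h                  # shows total and used swap space
cat /proc/swaps          # lists active swap devices, if any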

Set minimum process limits.

Add or adjust the following settings in the /etc/security/limits.conf file:

*          soft    nproc     204800
*          hard    nproc     204800
root       soft    nproc     unlimited

Set minimum open files limits.

Add or adjust the following settings in the /etc/security/limits.conf file (on some systems these limits are instead set in a drop-in file, which can be viewed with cat /etc/security/limits.d/*-nofile.conf):

*          soft    nofile     65535
*          hard    nofile     65535
root       soft    nofile     unlimited
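
These limits apply to new login sessions; after logging back in, they can be checked with ulimit, for example:

ulimit -u    # max user processes (nproc); should report 204800 for non-root users
ulimit -n    # max open files (nofile); should report 65535 for non-root users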

Kernel Settings

The use of Transparent Huge Page memory should be disabled.

To check for use of Transparent Huge Pages:

cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

To disable Transparent Huge Pages in the kernel for CentOS 7:

sudo vi /etc/default/grub

Locate the line starting with: GRUB_CMDLINE_LINUX=

Add the following option to the current list: 'transparent_hugepage=never'

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
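
The grub change takes effect at the next boot. As a hedged sketch, Transparent Huge Pages can also be disabled immediately for the running kernel (this does not persist across reboots):

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
cat /sys/kernel/mm/transparent_hugepage/enabled    # should now show: always madvise [never]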

The IO scheduler should be set to use the 'deadline' scheduler.

CentOS 7:

sudo vi /etc/default/grub
Locate the line starting with: GRUB_CMDLINE_LINUX=
Add the following option to the current list: 'elevator=deadline'
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Reboot the server for the settings to take effect.
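
After the reboot, the active settings can be verified; sda below is a hypothetical device name:

cat /proc/cmdline                     # should include transparent_hugepage=never and elevator=deadline
cat /sys/block/sda/queue/scheduler    # the active scheduler is shown in brackets, e.g. noop [deadline] cfq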

Prerequisites

Each Deephaven system (server or client) must have a Java JDK installed. For this release, Java 8 is the only supported version. Oracle's JDK can be used, as can OpenJDK and third-party OpenJDK distributions such as Azul Zulu. Java versions should be consistent across servers and clients to ensure that cryptographic ciphers and certificate signing authorities can be correctly negotiated and validated when making secure connections between processes.
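
A quick way to confirm that a suitable and consistent JDK is installed on each server and client is to check the reported version:

java -version     # should report a 1.8.x (Java 8) runtime
javac -version    # confirms a JDK (not just a JRE) is installed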

Each server node that will be part of the etcd cluster must have etcd installed prior to initial configuration of the Deephaven services.

Planning the Installation

Although Deephaven can run with all its component services on a single server, this type of "single-node" deployment is normally only used for test or development purposes.

Production installs will typically have three or more servers, as shown below. The primary reasons for this number of servers are to allow specialized configuration and to provide fault tolerance of system configuration management. The system will have designated server roles, such as query servers or merge servers. (See Scaling to Multiple Servers to learn more.)

Within a Deephaven deployment, the following services are used:

  • Query Server - one or more instances per deployment. This is the main service that processes user queries, and query server instances will often run on dedicated query servers in larger deployments.
    • global-rqd-server is the Query Server used by remote processes which do not have a specific service.name stanza with a RemoteQueryDispatcherParameters.host entry in the iris-endpoints.prop file; i.e., if a custom Java application is running on a standalone server or workstation and attempts to launch a Deephaven worker without specifying a Query Server, this is the Query Server it will connect to.

    • console-rqd-server is the default Query Server used by Console instances.

  • Merge Server - usually one instance per deployment. This service is responsible for writing historical data tables.
  • Data Import Server (DIS) - one or more instances per deployment. This service writes data to intraday tables based on streams sent to it by the tailers.
  • Controller - usually one instance per deployment. The controller manages scheduled persistent queries.
  • Client Update Service (CUS) - usually one instance per deployment. The client update service facilitates installation and updating of client components on Windows, OSX, and Linux clients.
  • Local Table Data Service (LTDS) - usually one instance per DIS. This service provides query servers with access to previous days' intraday data. Current day intraday data is sourced from the DIS process that wrote the data.
  • Table Data Cache Proxy (TDCP) - optional, but, when used, usually one instance per query server. This process caches and consolidates client data requests.
  • Log Aggregation Service (LAS) - one per Deephaven server. This process allows other Deephaven processes to write log files.
  • Configuration Service - At least one instance per deployment; additional instances can be deployed for high availability. This process allows other Deephaven processes to access common configuration data. Because the Configuration Service is a dependency for most other Deephaven services, it is strongly recommended to deploy multiple instances in production installations.
  • Authentication Server - usually one or two per Deephaven installation. This process authenticates login requests from Deephaven client processes. The Authentication Server can be configured to use a variety of back-end account stores for its credential authorities.
  • DB ACL Write Server - usually one per Deephaven installation. Provides write access to the database used for storage of Deephaven accounts and permissions data.
  • Web API Service - usually one per Deephaven installation. Provides HTTP(S) presence and port handling for Deephaven Web API and Web IDE clients.
  • Tailer - at least one per Deephaven server. Additional tailers may run on external servers. This process watches for new binary log files - usually used to transmit streaming data - and feeds new data from them to the Data Import Server.

The Remote Table Appender unifies the user and system data streams by putting user data into the Log Aggregation Service-Tailer-Data Import Server pipeline. This increases stability, persistence, and repeatability.

Besides these core Deephaven services, Deephaven also makes use of third-party services:

  • M/Monit - one instance per Deephaven server. A process monitoring service normally used to start and manage Deephaven services.
  • etcd - a distributed property store used as the back-end data store for the Configuration Service. etcd servers can be dedicated to running only etcd, or can also run other Deephaven services; however, a Deephaven installation must have its own etcd cluster and cannot share one, or the servers of one, used for other purposes. Single-node installations are possible, but an odd number of three or more nodes is recommended for fault tolerance and high availability.
  • MySQL or MariaDB - usually one instance per Deephaven deployment; can be external to the Deephaven deployment. A relational database used as the back-end for the Authentication Service to store Deephaven user, group, and permissions information.
  • Envoy - optional; usually one instance per Deephaven installation, and may be external to it. Provides reverse proxy handling between external Web API clients and the Web API Service, allowing a single external port to map to multiple internal Web API ports.

When planning a new installation, it is important to plan the number of servers and which servers should run which of the Deephaven services. Scale of load (the number of concurrent users and the sizes of their queries), the volume of incoming data, and fault tolerance needs are all considerations for this planning.

A typical small production deployment might look like this:

"Infra" node:

  • Etcd
  • DIS
  • LTDS
  • LAS
  • Merge Server
  • CUS
  • Web API Service
  • MariaDB
  • Authentication Server
  • DB ACL Write Server
  • Controller
  • Tailer
  • M/Monit

"Query" server 1

  • Etcd
  • Query Server
  • LAS
  • Tailer
  • M/Monit

"Query" server 2

  • Etcd
  • Query Server
  • LAS
  • Tailer
  • M/Monit

This provides fault tolerance for etcd-maintained configuration data, and scale-out of query processing across two query servers, but no fault tolerance for other Deephaven processes. The "Infra" server should have fast locally attached storage for intraday data being written by the DIS, and all servers should have shared (e.g. NFS) storage for access to historical data (written by the merge server or merge processes and read by the query servers).

Certificate Requirements

For production environments, Deephaven requires a certificate containing an Extended Key Usage with TLS Web Server Authentication and TLS Web Client Authentication, and a Subject Alternative Name containing the Fully Qualified Domain Name of the host of the Web API Service. It is common to use a wildcard certificate for ease of reuse, but note that a wildcard certificate is only valid for one level of subdomain; *.my.company suffices for test.my.company, but NOT for my.test.my.company. This certificate should be obtained either from an internal corporate certificate authority that is already trusted within the organization, or from an external provider, such as DigiCert or VeriSign. The certificate and key files must be named (or renamed to) tls.crt and tls.key and must be in PEM format without passwords.
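
Before installing a certificate, its key usage, Subject Alternative Name, and key format can be inspected with openssl. The commands below are a hedged sketch and assume an RSA key:

openssl x509 -in tls.crt -noout -text | grep -A1 "Extended Key Usage"        # expect TLS Web Server Authentication, TLS Web Client Authentication
openssl x509 -in tls.crt -noout -text | grep -A1 "Subject Alternative Name"  # expect the Web API Service FQDN or a one-level wildcard
openssl rsa -in tls.key -check -noout                                        # assumes an RSA key; should succeed without prompting for a password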

For test, demo, and lab installations, the installation process can create self-signed certificates, and configure some services to work in an insecure fashion. This, however, will also require approval of exceptions when accessing Deephaven Web services, and may require configuration of the Envoy reverse proxy for any significant amount of Web console use (details below). In general, properly issued and signed certificates are strongly recommended for most installations.

Installation Process

1. Ensure that the default non-login shell umask is set appropriately. This is more likely to need changing on non-CentOS systems.

  1. Use:

    sudo vi /etc/bashrc

  2. Verify, and, if necessary, update and save, this section:

    # By default, we want umask to get set. This sets it for non-login shell.
    # Current threshold for system reserved uid/gids is 200
    # You could check uidgid reservation validity in
    # /usr/share/doc/setup-*/uidgid file
    if [ $UID -gt 199 ] && [ "`/usr/bin/id -gn`" = "`/usr/bin/id -un`" ]; then
      umask 002
    else
      umask 022
    fi

2. Install a Java 8 JDK on all nodes:

  1. e.g., on CentOS, to install the latest Java 8 OpenJDK:

    sudo yum install java-1.8.0-openjdk-devel

3. Install the MySQL connector on the auth server node(s):

sudo yum install mysql-connector-java -y

4. Install other prerequisites:

  1. MySQL or MariaDB (auth server node):

    sudo yum install mariadb-server -y

  2. Lighttpd (CUS node):

    sudo yum install lighttpd -y

  3. M/Monit (all nodes)
    • sudo yum install monit -y

    • sudo systemctl enable monit

    • sudo systemctl start monit

5. On each node that will form part of the etcd cluster, install etcd. Copy the Deephaven etcd RPM to each etcd node and run:

sudo yum install ./dh-etcd-3.3.18-1.el7.x86_64.rpm -y

6. Copy the Deephaven RPM to each node and install the binaries. This will create directory structures under /db, /usr/illumon, /etc/sysconfig/illumon.confs, /etc/sysconfig/illumon.d, and /etc/sysconfig/deephaven:

sudo yum localinstall ./illumon-db-1.20200121.xxx-1-1.rpm

Note: during installation of the RPM, yum will update sudoers to allow root to run commands as specific users and groups, as shown below, depending on your version.

If upgrading from v1.20200928, use:

root ALL=(irisadmin:irisadmin) ALL
root ALL=(dbmerge:dbmerge) ALL
root ALL=(dbquery:dbquery) ALL
root ALL=(etcd:irisadmin) ALL

If upgrading from v1.20201022 or later use both:

root ALL=(irisadmin:irisadmin) ALL
root ALL=(dbmerge:dbmerge) ALL
root ALL=(dbquery:dbquery) ALL
root ALL=(etcd:irisadmin) ALL

and

(irisadmin : irisadmin) NOPASSWD: ALL
(irisadmin : dbmergegrp) NOPASSWD: ALL
(dbquery : dbquery) NOPASSWD: ALL
(dbmerge : dbmerge) NOPASSWD: ALL

Alternatively, for limited-root (tar) installations, unpack the files manually and then run pre_install.sh and post_install.sh:

DH_VERSION=1.20200331.123 # Change this
sudo -u irisadmin mkdir -p /tmp/dh_upgrade
sudo -u irisadmin -g irisadmin mkdir -p /usr/illumon
sudo -u irisadmin -g irisadmin mkdir -p /etc/sysconfig/deephaven
sudo -u irisadmin -g irisadmin tar -C /tmp/dh_upgrade -xzf /path/to/illumon-db-${DH_VERSION}.tar.gz
sudo -u irisadmin mv /tmp/dh_upgrade/usr/illumon/illumon-db-${DH_VERSION} /usr/illumon
sudo -u irisadmin mv /tmp/dh_upgrade/etc/sysconfig/deephaven/* /etc/sysconfig/deephaven/
sudo /usr/illumon/illumon-db-${DH_VERSION}/install/with_root/pre_install.sh
sudo /usr/illumon/illumon-db-${DH_VERSION}/install/with_root/post_install.sh

7. If using an externally issued certificate, copy the tls.crt and tls.key files to /etc/deephaven/cus-tls/. tls.crt should contain the complete certificate chain and include Extended Key Usage per the Certificate Requirements section above. Permissions for these files should be 400, with owner and group of irisadmin.
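
For example, assuming the certificate and key have been staged in /tmp (a hypothetical location), the copy and permission settings might look like this:

sudo mkdir -p /etc/deephaven/cus-tls
sudo cp /tmp/tls.crt /tmp/tls.key /etc/deephaven/cus-tls/
sudo chown irisadmin:irisadmin /etc/deephaven/cus-tls/tls.crt /etc/deephaven/cus-tls/tls.key
sudo chmod 400 /etc/deephaven/cus-tls/tls.crt /etc/deephaven/cus-tls/tls.key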

8. On any node, perform the initial configuration of etcd and its keys.

sudo /usr/illumon/latest/install/etcd/config_generator.sh --servers <IP address of first etcd server> <IP address of second etcd server> <IP address of third etcd server> --max-procs 1 --self-signed

9. Copy the dh_etcd_config.tgz to the other servers that will be running etcd and to any Deephaven servers that will be running the Configuration Server.

10. On the other etcd servers, unpackage the etcd server configuration:

sudo /usr/illumon/latest/install/config_packager.sh etcd unpackage-server <ordinal of the etcd server>

The ordinal is the position number of the etcd server's IP address as it was passed earlier, with the first server being ordinal 1.

Note that this script must be executed in a directory containing the dh_etcd_config.tgz file; the file is generated in /etc/sysconfig/deephaven/etcd.
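
As a hypothetical example, if config_generator.sh was run with --servers 10.0.1.11 10.0.1.12 10.0.1.13 and dh_etcd_config.tgz was copied to /tmp on the second etcd server, the command would be:

cd /tmp    # directory containing the copied dh_etcd_config.tgz
sudo /usr/illumon/latest/install/config_packager.sh etcd unpackage-server 2    # ordinal 2 = second IP passed to config_generator.sh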

11. On any Deephaven servers that are running the Configuration Server or etcd, unpackage the client configuration:

sudo /usr/illumon/latest/install/config_packager.sh etcd unpackage-client

Again, note that this script must be executed in a directory containing the dh_etcd_config.tgz file, which is generated in /etc/sysconfig/deephaven/etcd.

12. Enable the etcd service on all etcd nodes:

sudo /usr/illumon/latest/install/etcd/enable_dh_etcd_systemd.sh

Note that this command must be run on all servers within ten minutes. If more time is needed, pass the --sequential flag. This will cause the scripts to return immediately, rather than wait for all peer machines to run the script.

After execution, cluster status should be checked with:

sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh endpoint status -w table

The installation should not continue until all nodes are shown present and the cluster healthy.

A healthy installation should show your nodes in a table with the following header (Note that there may be some warnings about authentication not being enabled; these can be safely ignored.):

+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+

13. On any one of the etcd nodes, initialize the etcd cluster:

sudo /usr/illumon/latest/install/etcd/admin_init.sh

14. Run iris_keygen.sh. This will create keys, interprocess certificates, keystores, and truststores, and configure Deephaven services to use them. This step is run on just one of the configuration server nodes. The keys, certificates, and passwords generated here will be copied to the other nodes in later steps.

  1. If a web certificate was provided:

    sudo /usr/illumon/latest/install/iris_keygen.sh --config-server <FQDN of Configuration Server host>

  2. Otherwise, for self-signed certificate installations:

    sudo /usr/illumon/latest/install/iris_keygen.sh --config-server <FQDN of Configuration Server host> --web-server <FQDN of Web API Server host>

15. Use sudo cp to copy the mysql connector jar (mysql-connector-java-5.1.44.jar or mysql-connector-java.jar) to /etc/sysconfig/illumon.d/java_lib on all nodes. Installing the connector would have placed the jar initially in /usr/share/java/mysql-connector-java.jar on the server where yum was run in step 3.
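
On the node where the connector was installed via yum, the copy might look like the following; on the other nodes, first transfer the jar (e.g., with scp) and then copy it into place:

sudo cp /usr/share/java/mysql-connector-java.jar /etc/sysconfig/illumon.d/java_lib/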

16. Configure the Deephaven Configuration Server settings on the same node where iris_keygen was run in step 14:

/usr/illumon/latest/install/dh_config_server_init.sh --server <CONFIG_SERVER_FQDN>

17. Configure installation properties: sudo /usr/illumon/latest/install/set_iris_endpoints_prop.sh with arguments to indicate which services are hosted on which servers. This is run on the Configuration Service node.

18. Configure data routing on the DIS/Merge Server:

/usr/illumon/latest/install/set_routing_service_hosts.sh <FQDN of Data Import Server host> <FQDN of Remote Table Appender host>

In many cases, these two entries will be the same server, which is hosting intraday data processing and user table data.
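
For example, in a deployment where a single hypothetical host named infra.example.com runs both the DIS and the Remote Table Appender:

/usr/illumon/latest/install/set_routing_service_hosts.sh infra.example.com infra.example.com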

19. If using a local mariadb ACL database, run the dbacl_init.sh script as root:

sudo /usr/illumon/latest/install/with_root/dbacl_init.sh

This script will serve two functions:

  • Automatically updates ACL to recommended system defaults (see /etc/sysconfig/illumon.d/sql/init-dbacl-and-tables.sql and /etc/sysconfig/illumon.d/sql/iris-users.sql).
  • Replaces default, insecure mysql user passwords with randomly generated secure passwords.

If running MySQL, or using an external MySQL or MariaDB installation for the ACL database, manually execute /etc/sysconfig/illumon.d/sql/init-dbacl-and-tables.sql and /etc/sysconfig/illumon.d/sql/iris-users.sql against that installation, and place the passwords for the Deephaven read-only and read/write users into the respective files (see the sketch after this list):

  • /etc/sysconfig/deephaven/auth/db_acl_ro_passphrase.txt
  • /etc/sysconfig/deephaven/auth/db_acl_write_server_passphrase.txt
  • Use permissions irisadmin:irisadmin 400 for the write server passphrase, and irisadmin:irisadmin 444 for the read only (ro) passphrase file.
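
A minimal sketch for populating these files, assuming the chosen passwords are held in the hypothetical shell variables RO_PASSWORD and WRITE_PASSWORD:

echo "$RO_PASSWORD"    | sudo tee /etc/sysconfig/deephaven/auth/db_acl_ro_passphrase.txt > /dev/null
echo "$WRITE_PASSWORD" | sudo tee /etc/sysconfig/deephaven/auth/db_acl_write_server_passphrase.txt > /dev/null
sudo chown irisadmin:irisadmin /etc/sysconfig/deephaven/auth/db_acl_ro_passphrase.txt /etc/sysconfig/deephaven/auth/db_acl_write_server_passphrase.txt
sudo chmod 444 /etc/sysconfig/deephaven/auth/db_acl_ro_passphrase.txt
sudo chmod 400 /etc/sysconfig/deephaven/auth/db_acl_write_server_passphrase.txt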

20. On the Configuration Server, load configuration into etcd:

/usr/illumon/latest/install/etcd/config_import.sh

21. Create and distribute authentication information to any other nodes that will be running the configuration or authentication server services.

On the iris_keygen node:

/usr/illumon/latest/install/config_packager.sh auth package

On the other nodes, after having copied the resultant iris_auth.tgz file:

/usr/illumon/latest/install/config_packager.sh auth unpackage

Note that the configuration package must be in the current working directory when running unpackage.

22. Create and distribute trust information to any other nodes that will be running Deephaven services but not running configuration server or authentication server services.

On the iris_keygen node:

/usr/illumon/latest/install/config_packager.sh query package

On the other nodes, after having copied the resultant dh_query.tgz file:

/usr/illumon/latest/install/config_packager.sh query unpackage

Note that the configuration package must be in the current working directory when running unpackage.

23. (Optional, but recommended) Configure the Client Update Service (use --environment centos for CentOS or RHEL).

a.) If a certificate was provided:

sudo /usr/illumon/latest/install/with_root/cus_init.sh --environment centos --cus-address <FQDN of Client Update Service> --secure

b.) Otherwise, for self-signed certificate installations:

sudo /usr/illumon/latest/install/with_root/cus_init.sh --environment centos --cus-address <FQDN of CUS> --insecure

24. Configure monit services for the nodes:

For a “classic three node setup”:

  • infra:
    • sudo /usr/illumon/latest/install/with_root/monit_init.sh --role classic-infra
    • sudo /usr/illumon/latest/install/with_root/monit_init.sh --role classic-dis
    • sudo /usr/illumon/latest/install/with_root/monit_init.sh --role classic-batch
  • query servers:
    • sudo /usr/illumon/latest/install/with_root/monit_init.sh --role classic-query

On each node, reload monit:

sudo monit reload

Using Envoy for Self-Signed Certificate Installations

If an installation is run using certificates that are not trusted by Web clients, connections to the Deephaven Web IDE will fail. The initial failure will be on accessing the Web login page, but, after accepting a security exception for the certificate of this page, the login process itself will hang. Inspection of the Web log details or the browser's developer console will show that the browser failed to make a secure connection to the wss:// web socket. Since a different web socket is used for nearly every connection, many exceptions would have to be accepted when logging in to the Web console this way.

An alternative is to install and configure the Envoy reverse proxy (see: Using Envoy as a Front Proxy). This allows browser users to accept the exception for the connection to Envoy, and, since Envoy is handling the back-end mapping of user sessions to web sockets, no additional exceptions are needed.

Verify the Installation

  • On each node, run sudo monit summary. This should show a list of services, with OK status. If monit was run soon after installation or a restart, some services may show Initializing, but after about two minutes they should all show OK.
  • From a client system attempt to create a new instance connection to the new installation. (See Installation Guide.)
    • Launch a Query Config panel and verify the three initial persistent queries (ImportHelper, etc.) are running.

    • Launch a Console against each of the query servers and execute a simple query, like x=db.i("DbInternal","ProcessEventLog").where("Date=currentDateNy()").head(100)
  • From a browser, connect to https://<web server FQDN>:8123/iriside and attempt to log in and create a console.

Install Example Data

Many of the example queries in the Deephaven documentation are based on the LearnDeephaven data set. This data set consists of three tables of historical data covering a few dates in the range from 2017-08-21 to 2017-08-25, plus 2017-11-01. The LearnDeephaven dataset, and its corresponding namespace, takes about 350MB on disk.

Historical data is accessed from disk "locally" by query server processes. Normally, historical data storage, which is under the path /db/Systems, is shared across Deephaven servers using NFS or some other type of sharable file system. Configuration of shared storage for historical data is covered in the Scaling to Multiple Servers section.

If the installation is a single-node test/dev system, no shared storage is needed, so LearnDeephaven can be installed immediately after installing the system. For other installations, with multiple servers, LearnDeephaven should be installed only after shared storage for historical data has been configured.

The LearnDeephaven data set is installed by running this command once on a server from which writes to the shared /db/Systems storage are allowed (or the only server, in the case of a single-node installation):

sudo /usr/illumon/latest/install/learn_install.sh

After installing the LearnDeephaven namespace, query workers will need to be restarted before they can query from it. LearnDeephaven is, by default, made readable for all Deephaven users.

Troubleshooting

Configuration scripts log their activities to /var/log/deephaven/install_configuration. Any logs containing sensitive information will only be visible to the irisadmin user; on a secure production system, these logs should be purged or moved to a more secure archive after system functionality has been fully verified.

Each step will also return status information toward the end of its output. Ensure that no step has reported "failed" before it exits. Do not proceed to subsequent steps if a step has failed.

After install, component services will log their details to: /var/log/deephaven/<service/process name>/

The health of the etcd cluster can be checked with:

sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh endpoint status -w table

If enabling etcd (step 12) fails with an Access Denied error, it may be necessary to reinitialize the systemd daemon:

sudo systemctl daemon-reexec

It is normal to see Authentication Server failures in the Configuration Server log while the Authentication Server is starting up. The Configuration Server must start first, as it is a dependency for the Authentication Server. The Configuration Server will then attempt to connect to the Authentication Server, and will retry while waiting for the Authentication Server to begin accepting connections.

If the Configuration Server client fails because it cannot find a matching Subject Alternative Name (SAN), check that /etc/sysconfig/illumon.d/dh-config/clients/single/host contains the correct name or IP address of the Configuration Server, and that /etc/sysconfig/deephaven/auth/truststore-iris.pem has a SAN for its configuration_server entry that matches the contents of the host file. Note that this will require reading the contents of the .pem file with something like OpenSSL that can decode the X509 formatting.

For example, run cat - | openssl x509 -text -noout and paste the configuration_server certificate section of the .pem file to decode it.
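
Alternatively, as a hedged sketch that assumes the configuration_server certificate is the first entry in the file, the SAN can be read directly:

openssl x509 -in /etc/sysconfig/deephaven/auth/truststore-iris.pem -noout -text | grep -A1 "Subject Alternative Name"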

If an installation has failed and is re-run, especially if the new run uses a later build number, some installation steps may fail as they try to back up previous settings under /etc/sysconfig/deephaven. To get past these steps, it may be necessary to create the missing path (sudo mkdir -p <path that is missing>) and then use sudo touch <path that is missing>/somefile.txt to create some content in the path.

If nothing starts, check whether there is content under /etc/sysconfig/deephaven/dh-config/clients/single/. There should be at least host, port, and cacert files there. If these are missing, it could be because step 16 (set up the configuration server) was not run on the same server as step 22a (package the configuration to copy to other servers). If that is the only piece missing, the installation may be repaired by running step 16 again on the server used for iris_keygen, running step 22a again on that same server, and then re-copying the .tgz file and re-running step 22b on the other servers to correct the configuration.

Installing Patch Updates

While running a particular release, there may be a need to install a patch update RPM to upgrade to a newer build version and apply fixes and enhancements to the Deephaven system. Such "build-to-build" updates normally do not require any manual configuration changes to the system and mainly replace product binaries and internal resource files with newer versions. An example would be updating from version 20200121.005 (build 5) to 20200121.008 (build 8). After installing the newer RPM on all servers, the client update service will pick up the new versions of client files, and these will be downloaded to clients the next time they connect with the Deephaven launcher.

1. On each server, stop Deephaven services: sudo monit stop all

2. On each server, install the updated RPM: sudo yum localinstall <updated_RPM>

3. On the node(s) running configuration server:

  1. sudo monit start configuration_server; watch sudo monit summary

4. On the node(s) running authentication server:

  1. sudo monit start authentication_server; watch sudo monit summary
  2. Wait until the process is in Status 'OK'.

5. On every query node:

  1. sudo monit start all; watch sudo monit summary
  2. Wait until the process is in Status 'OK'.

6. On the main infra node:

  1. sudo monit start all; watch sudo monit summary
  2. Wait until the process is in Status 'OK'.

7. On any stand-alone servers, use the IllumonConfigurationUpdater to update client copies of binaries and resources, or manually copy files from a server's /usr/illumon/latest/java_lib path.

Adding Servers to the Installation

Additional servers (nodes) can be added to the Deephaven cluster after the initial installation. This can be done to improve fault tolerance of the system or to increase available compute resources. The most typical examples of adding nodes are:

  • Adding etcd nodes. There should always be an odd number of etcd nodes in the cluster.
  • Adding authentication server nodes. Having one or more backup authentication servers allows for continued use of the system in case the primary authentication server is inaccessible.
  • Adding configuration server nodes. Having one or more backup configuration servers allows for continued use of the system in case the primary configuration server is inaccessible. etcd provides high availability and fault tolerance for the storage of configuration data, but the configuration server is the interface through which these data are provided to other Deephaven services.
  • Adding dedicated servers for query processing or data import processing. This allows scale-out of the system to handle more users and/or larger volumes of data.

1. Install the Java JDK on the new server:

  1. e.g., on CentOS, to install the latest Java 8 OpenJDK: sudo yum install java-1.8.0-openjdk-devel

2. Install M/Monit:

  1. sudo yum install monit
  2. sudo systemctl enable monit
  3. sudo systemctl start monit

3. Install the Deephaven RPM. This will create directory structures under /db, /usr/illumon, /etc/sysconfig/illumon.confs, /etc/sysconfig/illumon.d, and /etc/sysconfig/deephaven.

  1. sudo yum localinstall ./illumon-db-1.20200121.012-1-1.rpm

4. Use sudo cp to copy the mysql connector jar (mysql-connector-java-5.1.44.jar or mysql-connector-java.jar) to /etc/sysconfig/illumon.d/java_lib on the new server. Installing the connector would have placed the jar initially in /usr/share/java/mysql-connector-java.jar on the server where yum was initially run during the install process.

5. Configure installation properties: sudo /usr/illumon/latest/install/set_iris_endpoints_prop.sh with arguments to indicate which services are hosted on which servers. This is run on the Configuration Service node.

6. set_iris_endpoints_prop.sh will report on which file it created/updated (e.g. prop file: /etc/sysconfig/deephaven/illumon.d.1.20200121.008/resources/iris-endpoints.prop). This file then needs to be imported into etcd: sudo -u irisadmin /usr/illumon/latest/bin/etcd_prop_file --import /etc/sysconfig/deephaven/illumon.d.1.20200121.008/resources/iris-endpoints.prop

7. If adding a new data import server (DIS), update the data routing YAML to reflect the new server and related namespaces/tables. (See: Data Routing Configuration via YAML.)

8. On the node where iris_keygen was run (step 14 of the Installation Process), package the configuration data that will be needed by the new node(s), copy it to the new node(s), and install it there.

  1. On the iris_keygen node: sudo /usr/illumon/latest/install/config_packager.sh auth package
  2. On the other nodes, after having copied the resultant iris_auth.tgz file: sudo /usr/illumon/latest/install/config_packager.sh auth unpackage

9. Configure monit services for the new node(s):

  a. For a “classic three node setup”:

  • infra:
    • sudo -E /usr/illumon/latest/install/with_root/monit_init.sh --role classic-infra
    • sudo -E /usr/illumon/latest/install/with_root/monit_init.sh --role classic-dis
    • sudo -E /usr/illumon/latest/install/with_root/monit_init.sh --role classic-batch
  • query servers:
    • sudo -E /usr/illumon/latest/install/with_root/monit_init.sh --role classic-query

  b. Reload monit with sudo monit reload

10. On the server running the client update service, restart it to pick up the new endpoint information, so that clients will see the new node(s) the next time they run the launcher or configuration updater:
sudo monit restart client_update_service

11. If the node being added is a query server, run the controller reload on the server running the Deephaven controller service to rebuild the list of query servers that will be available within the system:
sudo service iris controller_tool --reload

