Persistent Query Controller

Persistent Query Controller Configuration

Persistent queries are one of the core functions in Deephaven. These queries are defined by a user through the Deephaven console and then stored for future use. The Persistent Query Controller process stores all persistent queries and is responsible for starting and stopping them at the appropriate times.

(Note: All further references to "controller" in this document refer to the Persistent Query Controller.)

Controller configuration exists in both property files and XML files. The base property file for any Deephaven process is specified by the Configuration.rootFile property; see the Persistent Query Controller runbook for further details. XML files have their locations specified in a different way, which is explained later.

Some aspects of the controller's configuration can be dynamically reloaded without restarting the controller. These include the list of query and merge servers, the list of temporary queues, new JVM profiles, and the configuration types. For instructions on issuing a reload command, see the Controller Configuration Reload section of the Persistent Query Controller Tool documentation. To issue a reload command, a user must belong to a group defined by the property:

configuration.reload.userGroups

By default, only superusers can issue reload commands.

Query and Merge Servers

All persistent queries must run on a database server, and the controller must have details on these servers. Each server is a Remote Query Dispatcher process which listens on a specific port for connections from the controller.

These servers (dispatchers) are defined by a set of properties that the controller reads on startup. This list of available servers may need to be updated dynamically - for example, to add a new server or to change the address of a failed server - and is one of the configuration pieces that can be dynamically reloaded with the controller tool's reload option.

The list of query servers always begins with a property that defines the number of available servers:

iris.db.nservers=<N>

This is followed by a list of server properties in the format

iris.db.<server number>.<property>=<value>

where the server numbers start at 1 and increment to the value N defined by iris.db.nservers. Following are the properties that can be defined for each server:

  • iris.db.<server number>.host - This required value defines the host name or IP for the server.
  • iris.db.<server number>.port - The port on which the dispatcher is listening. If not defined, the default port from the property RemoteQueryDispatcherParameters.queryPort is used; this usually points to 22013.
  • iris.db.<server number>.class - The server class, usually Query or Merge. If not defined, it uses the value from the iris.db.defaultServerClass property, which defaults to Query.
  • iris.db.<server number>.name - A name for the server, displayed in the console and stored with the queries. If it is not provided, it will be automatically generated by using <server class>_<number>, where the number starts with 1 and increments by 1 for each server of a given class.

For example, a basic configuration with one query server and one merge server, each running on the same local host, might look like the following. Since the first server is a query server listening on the default dispatcher port, only the host needs to be specified:

iris.db.nservers=2

iris.db.1.host=localhost
# default server name will be Query_1

iris.db.2.host=localhost
iris.db.2.port=30002
iris.db.2.class=Merge
# default server name will be Merge_1
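
The defaulting rules above can be illustrated with a short Python sketch. It is not controller code: the function name resolve_servers and the dictionary-based input are hypothetical, the fallback port 22013 is the usual value mentioned above rather than a guaranteed one, and the sketch assumes that every server of a class advances that class's name counter, even when the server has an explicit name.

```python
def resolve_servers(props, fallback_port=22013):
    """Apply the documented iris.db.* defaults to a dict of property values.

    Hypothetical helper for illustration; the controller's own parsing may differ.
    """
    default_class = props.get("iris.db.defaultServerClass", "Query")
    count = int(props["iris.db.nservers"])
    seen_per_class = {}  # server class -> number of servers of that class so far
    servers = []
    for number in range(1, count + 1):
        prefix = f"iris.db.{number}."
        host = props[prefix + "host"]  # required: raises KeyError when missing
        port = int(props.get(prefix + "port", fallback_port))
        server_class = props.get(prefix + "class", default_class)
        # Assumption: every server of a class advances that class's counter.
        seen_per_class[server_class] = seen_per_class.get(server_class, 0) + 1
        generated = f"{server_class}_{seen_per_class[server_class]}"
        servers.append({
            "name": props.get(prefix + "name", generated),
            "host": host,
            "port": port,
            "class": server_class,
        })
    return servers


# The two-server example configuration shown above.
example = {
    "iris.db.nservers": "2",
    "iris.db.1.host": "localhost",
    "iris.db.2.host": "localhost",
    "iris.db.2.port": "30002",
    "iris.db.2.class": "Merge",
}
for server in resolve_servers(example):
    print(server["name"], server["host"], server["port"], server["class"])
```

For the example configuration, the sketch yields the names Query_1 and Merge_1, matching the comments in the property listing.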

Console Server Classes

The server class referred to by the iris.db.<server number>.class property can also refer to a console-specific server class. These classes are for servers that should be available to appropriately privileged users in their interactive consoles, but not available for persistent queries.

Console server classes are defined by adding properties that must be available to the persistent query controller, as shown below:

ConsoleServerClass.<server class>.allowedGroups=<ACL groups allowed>

For example, the following property defines a Historical server class that is available to users in the HistoricalQuery group:

ConsoleServerClass.Historical.allowedGroups=HistoricalQuery

Users can be added to the defined console server group by using the User/Groups tab in the ACL Editor. Adding users to the group ensures they have access to these servers in the console's Query Server box. To make the server available for everybody, use the "allusers" group as the ACL group in the server class property.

Temporary Query Queues

Temporary query queues are used for queries that run once, such as batch imports of historical data. They are defined through controller properties, which can be dynamically reloaded by using the controller tool's reload command.

Each temporary query queue is defined by setting two properties in the controller's property file. As many temporary query queues can be defined as needed, and each one will have its own properties, based on the query queue's name. These properties define the resources that the temporary queue is allowed to consume. Both properties are required for each temporary query queue:

  • PersistentQueryController.temporaryQueryQueue.<queue_name>.maxConcurrentQueries - This defines the maximum number of concurrent queries allowed to run on the named temporary query queue.
  • PersistentQueryController.temporaryQueryQueue.<queue_name>.maxHeapMB - This defines the maximum heap in MB that the temporary queries are allowed to use for the temporary query queue.

These resource restrictions are both applied when determining whether or not the next temporary query can run on a queue - the next query must not cause either the maximum concurrent queries or the maximum heap to be exceeded. If either is exceeded, the query will not be run until sufficient resources are available on its queue. Queries are run in the order in which they were submitted to the queue.
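
The admission rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the controller's implementation: the class TemporaryQueue and its methods are hypothetical, and the sketch interprets "run in the order in which they were submitted" as strict first-in, first-out, so a waiting query that does not fit also holds back the queries behind it.

```python
from collections import deque


class TemporaryQueue:
    """Hypothetical model of one temporary query queue and its two limits."""

    def __init__(self, max_concurrent_queries, max_heap_mb):
        self.max_concurrent_queries = max_concurrent_queries
        self.max_heap_mb = max_heap_mb
        self.waiting = deque()  # (name, heap_mb) in submission order
        self.running = {}       # name -> heap_mb

    def submit(self, name, heap_mb):
        self.waiting.append((name, heap_mb))

    def start_ready(self):
        """Start waiting queries, in order, while both limits still hold."""
        started = []
        while self.waiting:
            name, heap_mb = self.waiting[0]
            if len(self.running) + 1 > self.max_concurrent_queries:
                break  # would exceed maxConcurrentQueries
            if sum(self.running.values()) + heap_mb > self.max_heap_mb:
                break  # would exceed maxHeapMB
            self.waiting.popleft()
            self.running[name] = heap_mb
            started.append(name)
        return started

    def finish(self, name):
        del self.running[name]


queue = TemporaryQueue(max_concurrent_queries=2, max_heap_mb=20000)
queue.submit("a", 12000)
queue.submit("b", 10000)
queue.submit("c", 4000)
print(queue.start_ready())  # only "a" starts; "b" would push the heap to 22000 MB
queue.finish("a")
print(queue.start_ready())  # "b" and "c" now fit within both limits
```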

The property PersistentQueryController.defaultTemporaryQueryQueue defines the default temporary query queue presented to the user when temporary scheduling is chosen.

Following is a simple default configuration, defining a single queue which allows one query to run at a time with a maximum heap of 20000 MB.

PersistentQueryController.temporaryQueryQueue.DefaultTemporaryQueue.maxConcurrentQueries=1
PersistentQueryController.temporaryQueryQueue.DefaultTemporaryQueue.maxHeapMB=20000
PersistentQueryController.defaultTemporaryQueryQueue=DefaultTemporaryQueue

Persistent Query Startup

The persistent query controller may be required to start a large number of queries at the same time at the start of a business day. It maintains a thread pool for this, and while extra threads will be added as needed, it may be helpful to increase these values on systems where the controller is expected to start and stop large numbers of queries at the same time. The following properties control this thread pool.

  • PersistentQueryController.queryStartThreadPoolCoreSize - this defines the minimum number of threads maintained for persistent query startup. The number of available threads will never drop below this value.
  • PersistentQueryController.queryStartThreadPoolKeepAliveMinutes - extra threads added for query startup will be removed if they are idle for this number of minutes. For example, on a system with a lot of queries that run every hour, this value can be updated to ensure that threads remain available for an hour or more.

Query Types

Query configuration types (such as Live Query (Script), Batch Query (RunAndDone), and Data Merge) are defined in an XML configuration file, which is used by the controller and console to understand how to handle each type of query. The behavior of these query types is configurable, and new query types can be added by customers.

Note: Modification of the existing Deephaven query types is not recommended.

The property iris.controller.configurationTypesXml defines a comma-delimited list of XML files that contain the query configuration type definitions. By default it uses PersistentQueryConfigurationTypes.xml.

Query Type Attributes

Each query type is defined in an XML ConfigurationType element, defining the following attributes:

  • allowedGroups - if defined, this is a comma-delimited list which restricts users of the query to the specified user groups. If a user is not a member of one of the specified groups (or a superuser), the user will not be able to create a query of this type. If it is not defined, then there are no group restrictions on this query type.
  • displayable - Whether or not the query type should be displayed in a console. If it is defined and false, this query type is only displayed to superusers. This is useful for internal query types such as the Deephaven helper queries.
  • enabled - If defined and false, this query type is disabled. A disabled query type is not available to users.
  • hasScript - Defines whether or not the query has a script. If a query does not have a script (hasScript="false"), then the console will not display a script panel when editing a query of this type. The default value is true.
  • name - The name of the query type [for example "Live Query (Script)"]. This is required.
  • serverTypes - An optional comma-delimited list which restricts the server types on which a query can run. If it is not defined, then the default server types from the property iris.db.defaultServerClass will be used; unless changed, the default is server type Query.
  • stopTimeRequired - An optional attribute which defines whether scheduling of the query requires a stop time. Query types such as Live Query (Script) that run continuously require stop times, while query types such as Batch Query (RunAndDone), Data Merge, and Import do not require stop times as they terminate automatically when complete. The default value is true.
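
As a rough illustration of how these attributes and their defaults combine, the Python sketch below parses a hypothetical ConfigurationType element and checks whether a user may create a query of that type. The element content, the group names, and the helper functions are assumptions made for this example; only the attribute names and the stated defaults come from the list above. The sketch also assumes that displayable and enabled default to true when absent, and that a disabled type is unavailable to everyone, including superusers. Sub-elements are omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical query type; a real definition also needs its sub-elements.
EXAMPLE_XML = """
<ConfigurationType name="Example Batch Type"
                   allowedGroups="batch-users, ops"
                   hasScript="false"
                   stopTimeRequired="false"
                   serverTypes="Query,Merge">
</ConfigurationType>
"""


def _flag(element, attribute, default):
    """Read a true/false attribute, falling back to the given default."""
    value = element.get(attribute)
    return default if value is None else value.strip().lower() == "true"


def _csv(element, attribute):
    """Read a comma-delimited attribute; None means the attribute is absent."""
    value = element.get(attribute)
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def describe(element, default_server_class="Query"):
    return {
        "name": element.attrib["name"],  # required
        "allowedGroups": _csv(element, "allowedGroups"),
        "displayable": _flag(element, "displayable", True),  # assumed default
        "enabled": _flag(element, "enabled", True),          # assumed default
        "hasScript": _flag(element, "hasScript", True),
        "serverTypes": _csv(element, "serverTypes") or [default_server_class],
        "stopTimeRequired": _flag(element, "stopTimeRequired", True),
    }


def may_create(query_type, user_groups, is_superuser=False):
    if not query_type["enabled"]:
        return False  # assumption: disabled types are unavailable to everyone
    if is_superuser or query_type["allowedGroups"] is None:
        return True
    return bool(set(user_groups) & set(query_type["allowedGroups"]))


query_type = describe(ET.fromstring(EXAMPLE_XML))
print(query_type["serverTypes"], may_create(query_type, ["ops"]))
```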

Query Sub Elements

Each ConfigurationType element can define the following sub-elements to further define behavior. The classes defined within these elements are dynamically created during the creation of queries by the console, controller, and dispatcher.

  • <SetupQuery name="Java setup class"> - This required element defines a Java class that will be used to create an instance of the query type. This class must extend the com.illumon.iris.db.tables.remotequery.ContextAwareRemoteQuery<com.illumon.iris.controller.PersistentQueryState> class. A query type is not valid without this setup class.
  • <ConfigChecker class="Java configuration checker class" /> - An optional Java configuration checker class that will be run to validate data before a query of this type can be saved. This class must implement the com.illumon.iris.controller.ConfigChecker interface. If it is not provided, no extra validation is performed on a query of this type before it is saved.
  • <ConfigPanelFactoryClass class="Java configuration panel factory class" /> - An optional Java class to provide a type-specific configuration panel to the console. This type-specific panel contains configuration-specific details. For example, a merge query requires parameters such as the table's namespace and table name. If this is not provided, then no type-specific panel will be created. The factory class must implement the com.illumon.iris.controller.TypeSpecificConfigPanelFactory interface, and the panels it creates must implement the com.illumon.iris.controller.TypeSpecificConfigPanel interface.
  • <PopupProvider class="Java pop-up provider class" /> - An optional Java class to provide additional context-sensitive (right-click) menu options in the console's query configuration view. For example, an import query in the query configuration panel that is right-clicked provides the option to create the corresponding merge query. The pop-up class must implement the com.illumon.iris.controller.PersistentQueryPanelPopupProvider interface. If it is not provided, no additional pop-up menu options are created.
  • <ExtraColumnGetter class="Java extra column getter class" /> - An optional Java class to provide extra columns to be displayed in the console's configuration panel for this configuration type. For example, a merge query displays the namespace and table name in the configuration panel. This Java class must implement the com.illumon.iris.controller.ExtraColumnGetter interface. If it is not provided, no extra columns are displayed.
  • <ExtraPanelColumn name="<column name>" /> - If ExtraColumnGetter is provided, one or more ExtraPanelColumn elements should be provided. These are the extra columns to be provided to the ExtraColumnGetter's getExtraColumnValue method.
  • <CustomActionProvider class="Java custom action provider class" /> - An optional Java class to produce pop-up menu items for the Custom Actions attached to a table in a script. This Java class must implement the com.illumon.iris.controller.CustomActionProvider interface.

Controller Cache

All persistent queries created by users are stored by the controller in etcd. See the Cache Backup and Restore Process runbook and Persistent Query Controller Tool documentation for details on how to back up and restore this cache.

Git Configuration

By default, the source code for a persistent query in the "Script Editor" tab is stored by the Persistent Query Controller as part of the query's configuration. However, with Git integration, a persistent query's source code isn't stored in Deephaven. Rather, it is loaded directly from its associated Git repository.

For more information on configuring persistent queries, see Persistent Query Configuration Viewer/Editor.

To enable Git integration, several properties must be set in the Deephaven controller's configuration. One global property must be set, followed by several properties for each repository.

The global property is:

  • iris.scripts.repos — a comma-separated list of git repositories the controller should use

The additional properties for each repository are listed below:

  • iris.scripts.repo.<repo_name>.groups — a comma-separated list of the Deephaven groups who may access the repository.
  • iris.scripts.repo.<repo_name>.updateEnabled — Set to true to automatically update the repository (i.e., run a git pull) once per minute. This helps ensure that when a query runs, it uses the most recent version of the script available in the repository's remote origin.
  • iris.scripts.repo.<repo_name>.branch — the Git branch to check out; if this is not set, the controller's PersistentQueryController.defaultBranch property value is used
  • iris.scripts.repo.<repo_name>.prefixDisplayPathsWithRepoName — If true, the "Choose Script" dialog of the Persistent Query Configuration Editor will include the repository's name next to each script path. This helps disambiguate scripts for users who have access to multiple repositories.
  • iris.scripts.repo.<repo_name>.root — the directory on the filesystem into which Deephaven will clone the git repository. Each repository must have a distinct root directory. If a relative path is used, the path will be relative to the workspace directory of the Controller process. On Deephaven servers, this will normally be /db/TempFiles/irisadmin/iris_controller.
  • iris.scripts.repo.<repo_name>.paths - a comma-separated list of paths, relative to the repository's root directory, to include (for example, IrisQueries/groovy). Files in all other paths will not be available to Deephaven queries.
  • iris.scripts.repo.<repo_name>.uri - the SSH URI used to access the git repository, in the form <user>@<host>:illumon/iris.git

For repository updates (i.e., Git pull) to be enabled, the Persistent Query Controller must not be configured to use a local git repository.

  • PersistentQueryController.useLocalGit - Optional - Set to true to use a local repository as the script source. This disables repository updates globally, regardless of each repository's updateEnabled setting. The default value is true, causing Deephaven to use a local repository, not checking out a configured branch from the remote.
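
The interaction between these properties can be sketched in Python. The helper below is hypothetical and only models the rules stated above: the branch falls back to PersistentQueryController.defaultBranch, each repository needs a distinct root, and useLocalGit (true by default) switches off updates for every repository. It additionally assumes that updateEnabled is treated as false when it is not set, and the branch name "release" is invented for the example.

```python
def _is_true(value):
    return str(value).strip().lower() == "true"


def effective_repo_settings(props):
    """Summarize per-repository settings from a dict of controller properties.

    Hypothetical helper for illustration only.
    """
    names = [n.strip() for n in props.get("iris.scripts.repos", "").split(",") if n.strip()]
    # Documented default: useLocalGit is true unless explicitly set otherwise.
    use_local_git = _is_true(props.get("PersistentQueryController.useLocalGit", "true"))
    default_branch = props.get("PersistentQueryController.defaultBranch")
    roots = {}  # root directory -> repository that claimed it
    settings = {}
    for name in names:
        prefix = f"iris.scripts.repo.{name}."
        root = props.get(prefix + "root")
        if root is not None:
            if root in roots:
                raise ValueError(f"{name} and {roots[root]} share root {root}")
            roots[root] = name
        update_enabled = _is_true(props.get(prefix + "updateEnabled", "false"))
        settings[name] = {
            "branch": props.get(prefix + "branch", default_branch),
            "root": root,
            # Updates need updateEnabled=true AND useLocalGit=false.
            "updates": (not use_local_git) and update_enabled,
        }
    return settings


props = {
    "iris.scripts.repos": "shared,team1",
    "PersistentQueryController.defaultBranch": "master",
    "iris.scripts.repo.shared.updateEnabled": "true",
    "iris.scripts.repo.shared.root": "../git/shared",
    "iris.scripts.repo.team1.updateEnabled": "true",
    "iris.scripts.repo.team1.branch": "release",
    "iris.scripts.repo.team1.root": "../git/team1",
}
print(effective_repo_settings(props)["shared"])
```

Because useLocalGit is not set in this example, the sketch reports updates as disabled for both repositories even though updateEnabled is true; adding PersistentQueryController.useLocalGit=false changes that.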

Git garbage collection is controlled by the following property.

  • PersistentQueryController.gitGcEnabled - if defined and set to false, this disables git garbage collection.

Git Authentication

If a keypair is being used to authenticate with the Git server, the keypair should be created in the .ssh directory under the irisadmin home directory, usually /db/TempFiles/irisadmin. If an id_rsa keypair doesn't already exist under /db/TempFiles/irisadmin/.ssh, it can be created using:

sudo -u irisadmin ssh-keygen -t rsa

The utility will prompt for the path/name to use when creating the keypair; accept the default value (/db/TempFiles/irisadmin/.ssh/id_rsa). Press Enter twice when prompted for a passphrase to create the keypair with a blank passphrase. The ssh key contained in /db/TempFiles/irisadmin/.ssh/id_rsa.pub can then be used to grant the irisadmin account access to the Git repository by adding the key using the Git installation's administration interface.

Note: The fingerprint of the RSA key for the server specified in the repository URI must be in the ssh known hosts file for the user the controller runs as. In a typical installation, the controller runs as the `irisadmin` user, and the appropriate known hosts file is `~irisadmin/.ssh/known_hosts`. For example, if the git URI is "<user>@gitlab.mycompany.net:mycompany/iris.git", the Git server can be added to the known hosts file with the following command:

sudo -u irisadmin /bin/sh -c 'ssh-keyscan -t rsa gitlab.mycompany.net >> ~irisadmin/.ssh/known_hosts'

It is prudent to print the received key and verify that it matches the server. The key that was added can be viewed with this command:

sudo -u irisadmin tail ~irisadmin/.ssh/known_hosts

Example Configuration

The example configuration below configures Deephaven to read scripts from three repositories: team1, team2, and shared. All users can access scripts in the shared repository, but the team1 and team2 repositories are restricted to specific users. All three repositories will use a branch called master.

iris.scripts.repos=shared,team1,team2

iris.scripts.repo.shared.groups=*
iris.scripts.repo.shared.updateEnabled=true
iris.scripts.repo.shared.branch=master
iris.scripts.repo.shared.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.shared.root=../git/shared
iris.scripts.repo.shared.paths=IrisQueries/groovy,IrisUtils/groovy
iris.scripts.repo.shared.uri=<user>@<host>:common-libs/shared.git

iris.scripts.repo.team1.groups=user1,user2,user3
iris.scripts.repo.team1.updateEnabled=true
iris.scripts.repo.team1.branch=master
iris.scripts.repo.team1.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.team1.root=../git/team1
iris.scripts.repo.team1.paths=IrisQueries/groovy
iris.scripts.repo.team1.uri=<user>@<host>:team1/team1.git

iris.scripts.repo.team2.groups=user1,user4,user5
iris.scripts.repo.team2.updateEnabled=true
iris.scripts.repo.team2.branch=master
iris.scripts.repo.team2.prefixDisplayPathsWithRepoName=false
iris.scripts.repo.team2.root=../git/team2
iris.scripts.repo.team2.paths=IrisQueries/groovy
iris.scripts.repo.team2.uri=<user>@<host>:team2/team2.git

Other Properties

The following list presents other properties related to the controller's operation and their meanings. These parameters are not reloadable.

  • critEmail - The email distribution list to which critical alerts will be sent. Currently the only critical alert is for a hung script update job, which is used to refresh scripts from Git.
  • iris.authentication.keyfile - The keyfile used to authenticate the controller to the dispatchers.
  • PersistentQueryController.binaryLogTimeZone - If specified, the time zone that determines column partition values for the controller's data (PersistentQueryStateLog and PersistentQueryConfigurationLogV2). If not specified, then the server's default value is used.
  • PersistentQueryController.commitCheckpointImmediate - If specified as true, then the controller will force a cache checkpoint immediately following startup.
  • PersistentQueryController.defaultMaxHeapSizeGB - This is the maximum heap the controller will allow for a query before a dispatcher is contacted. By default it is 1024GB (1TB).
  • PersistentQueryController.host - The hostname on which the controller is running. This is not used by the controller, but is read by other processes that need to connect to the controller.
  • PersistentQueryController.keyPairFile - The keypair file used to encrypt sensitive information for the controller's use. This file should not be visible to users of the system.
  • PersistentQueryController.port - The port on which the controller listens for client connections, from user consoles or the controller tool.

In addition, the controller's logging behavior can be changed with the standard logging parameters. See the Deephaven Operations Guide Log-Related Properties section for further details.

Initial Startup Configuration

When the controller starts for the first time on any Deephaven installation it must create helper queries to assist with Deephaven operations. Properties are used to assist with the creation of these queries, and in a non-standard configuration it may be useful to override these properties.

The Revert Helper query assists Iris when a query is reverted to a previous version. Following are the available parameters that the controller uses when creating the initial revert helper query (the first time the controller is run), and their default values.

  • revertHelper.queryOwner=iris - the owner of the revert helper query; this must be a superuser
  • revertHelper.queryName=RevertHelperQuery - the name of the query
  • revertHelper.dbServer=Query_1 - the server on which the revert helper will run; it should be a server with a type of Query; if custom-named servers are used, this will need to reflect a named query server
  • revertHelper.heapSize=1 - the heap size in GB of the helper query

The following parameter defines how far back the revert helper looks when a user requests to revert a query to a previous version:

  • revertHelper.lookbackDays=180

The following parameter defines the number of seconds for which a console request to revert a query will wait for a response from the helper before displaying an error:

  • revertHelper.waitQuerySeconds=30

The Import Helper query assists with import, merge, and validation queries. The initial query-creation parameters have the same meanings as for the revert helper.

  • importHelper.queryOwner=iris
  • importHelper.queryName=ImportHelperQuery
  • importHelper.dbServer=Merge_1 - the import helper should run on a server with a type of Merge
  • importHelper.heapSize=1


Last Updated: 23 February 2021 14:01 -04:00 UTC    Deephaven v.1.20200928

Deephaven Documentation     Copyright 2016-2020  Deephaven Data Labs, LLC     All Rights Reserved