Data Routing Service Configuration via YAML
Data is central to Deephaven. There are many ways to configure the storage, ingestion, and retrieval of the data. The Data Routing Service is a central API for managing the configuration of a Deephaven system. The YAML configuration file format centralizes the information governing the locations, servers, and services that determine how data is handled. Because the information is stored in one place, the entire configuration can be viewed at a glance. This also makes it easier to make changes and understand the implications.
Like most Deephaven configuration, this routing information is stored in etcd and accessed via the configuration service. See our Configuration guide for information about exporting and editing the data routing configuration.
YAML File Format
Related Link
The full YAML specification can be found at:
http://yaml.org/spec/1.2/spec.html
The YAML data format is designed to be human readable. However, there are some aspects that are not obvious to the unfamiliar reader. The following pointers will make the rest of this document easier to understand.
- YAML leans heavily on maps and lists.
- A string followed by a colon (e.g., filters:) indicates the name of a name-value pair (the value can be a complex type). This is used in "map" sections.
- A dash (e.g., - name) indicates an item in a list or array. In this structure, the item is often a map. However, the item can be any type, simple or complex. Maps are populated with key:value sets, and the values can be arbitrarily complex types.
- Anchors are defined by &identifier, and they are global to the file. Anchors might refer to complex data.
- Aliases refer to existing anchors (e.g., *identifier).
- Aliased maps can be spliced in (e.g., <<: *DIS-default). In the context below, all the items defined by the map with anchor DIS-default are duplicated in the map containing the <<: directive.
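For example, the following fragment (purely illustrative; these names do not come from a real routing file) defines an anchor, references it with an alias, and splices a map with the merge key:

example_anchors:
  - &examplePort 22021      # anchor on a scalar value
  - &example-defaults       # anchor on a map
    port: *examplePort      # alias referring to the scalar anchor
    enabled: true
serverA:
  <<: *example-defaults     # splice: serverA receives port and enabled
  enabled: false            # keys given locally override spliced keys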
The Data Routing Service needs certain data to create a system configuration. It looks for that data under defined sections in a single YAML document.
Data Types
Only a few of the possible data types are used by Deephaven and mentioned in this document:
Related Link
More information about data types can be found here:
http://yaml.org/spec/1.2/spec.html#id2759963.
- List (or sequence) - consecutive items beginning with "- "
- Map - set of "name: value" pairs
- Scalar - a single value: mainly integer, floating point, boolean, or string
The YAML parser will guess at the data type for a scalar, and it cannot always be correct. The main opportunity for confusion is with strings that can be interpreted as other data types. Any value can be clarified as a string by enclosing it in quotation marks (e.g., "8" puts the number in string format).
Note: we will simply refer to scalars as values or their value types (e.g., string) from here on out.
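For example, the following values (illustrative only) show how quoting changes the parsed type:

tailerPort: 22021        # parsed as an integer
portLabel: "22021"       # parsed as a string, because it is quoted
sslRequired: false       # parsed as a boolean
answer: "false"          # parsed as a string, because it is quoted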
Placeholder Values
Any value of the form <env:TOKEN> or <prop:property_name> will be replaced by the value of the named environment variable or property. Note that this is not standard YAML, but rather a Deephaven extension to allow for customization in certain environments.
For example:
anchors:
  - &lasHostProp <prop:deephaven.admin.host>
...
logAggregatorServers: !!omap
  - theLas:
      host: *lasHostProp
      port: <env:DEEPHAVEN_ADMIN_HOST>
Sections
The Data Routing Configuration file contains a single YAML document with sections, which we define as important maps and lists directly under the root "config" map.
The document must contain this "config" map, which must have the following sections:
- storage
- dataImportServers
- logAggregatorServers
- tableDataServices
Optional sections may be included in the YAML file that define anchors and default values (e.g., "anchors"), and those that combine default data into one location (e.g., "default"). Each section is discussed below.
Anchors
Anchors can be defined and then referenced throughout the data routing configuration file. These can represent strings, numbers, lists, maps, and more. In this case, we use them to define names for machine roles and for port values. In a later section, they define the default set of properties in a DIS.
The syntax &identifier defines an anchor, which allows "identifier" to be referenced elsewhere in the document. This is often useful to concentrate default data in one place to avoid duplication. Note: because anchors are global, the names must be unique.
See the example "anchors" section below. This collection of anchors defines the layout of the cluster, defines default values for ports that were previously defined in properties files, and defines default values for DIS configurations that will be referenced later in the document (see Data Import Servers below for a detailed explanation of these values):
anchors:
  # This is a list of the cluster and defines anchors for hosts
  - &localhost "localhost"
  - &ddl_infra "192.168.0.140"
  - &ddl_query "192.168.0.140"
  - &ddl_query1 "192.168.0.141"
  - &ddl_query2 "192.168.0.142"
  - &ddl_dis "192.168.0.140"
  - &ddl_rta "192.168.0.140"
  - &ddl_merge "192.168.0.140"
  # Define aliases for the default port values
  - &default_tailerPort 22021
  - &default_lasPort 22020
  - &default_tableDataPort 22015
  - &default_tableDataCacheProxyPort 22016
  - &default_localTableDataPort 22014
  # Defines the set of key:value pairs for the DIS configuration that contains all the default values.
  # Sections below can include this with <<: and override only what they need to change
  - &DIS-default
    tailerPort: *default_tailerPort
    throttleKbps: -1
    storage: default
    definitionsStorage: default
    tableDataPort: *default_tableDataPort
Note: Anchors can be defined anywhere in the YAML file. However, the anchors must be defined earlier in the file than where they are used. For example, the hosts and default ports could also be defined in their own sections (e.g., "hosts" and "defaultPorts") rather than consolidated into one section.
Storage
This required section defines locations where Deephaven data will be stored. This will include the default database root and any alternate locations used by additional data import servers.
Note: Many components still get the root location from properties (e.g., OnDiskDatabase.rootDirectory).
For example:
storage:
  - name: default
    dbRoot: /db
The value of this section must be a list. Each list item is a map with these keys:
- name: [String] - Other parts of the configuration will refer to a storage instance by this name.
- dbRoot: [String] - This refers to an existing directory.
See also Import-driven lastBy Queries.
Data Import Servers
This section supports Data Import Server (DIS) and Tailer processes.
The DIS process will load the configuration matching its process.name property (e.g., db_dis). The configuration names can be set to match the process names in use. This can be overridden with the property DataImportServer.routingConfigurationName.
The Tailer process uses this configuration section to determine where to send the data it tails. Note that a given table might have multiple DIS destinations.
The two consumers have slightly different needs, such as the host name, which is not needed by the DIS process.
The following example defines two data import servers. Note that the defaults defined in the previous section are imported into each map entry:
dataImportServers:
  db_dis:
    <<: *DIS-default # import all defaults from the DIS-default anchor
    host: *ddl_dis # reference the address defined above for "ddl_dis"
    userIntradayDirectoryName: "IntradayUser"
    filters: {namespaceSet: System}
    webServerParameters:
      enabled: true
      port: 8084
      authenticationRequired: false
      sslRequired: false
  db_rta:
    <<: *DIS-default
    host: *ddl_dis
    userIntradayDirectoryName: "IntradayUser"
    filters: {namespaceSet: User}
The value of this section must be a map. The keys of the map will be used as Data Import Server names. The value for each key is also a map:
- host: [String] - The tailer will connect to this data import server on the given host and port.
- tailerPort: [int] - The Data Import Server will receive Tailer connections on this port.
- throttleKbps: [int] - (Optional) If omitted or set to -1, then there is no throttling.
- storage: [String] - This must be the name of a storage instance defined in the storage section. This data import server will write data in the location specified by this storage instance.
- definitionsStorage: [String] - (Optional) If a Data Import Server is configured with storage other than the default, table definitions generally must still be read from the default storage. This must be the name of a storage instance defined in the storage section.
- userIntradayDirectoryName: [String] - (Optional) Intraday user data will be stored in this folder under the defined storage. If not specified, the default is "Users".
- filters: [filter definition] - (Optional) This filter determines the tables for which tailers will send data to this Data Import Server.
- tableDataPort: [int] - (Optional) This port will be used to publish table data. If not set, or set to -1, this import server will not publish table data.
- webServerParameters: [map] - (Optional) This defines an optional web server for internal status.
  - enabled: [boolean] - (Optional) If set, and true, then a webserver will be created.
  - port: [int] - (Required if enabled.)
  - authenticationRequired: [boolean] - (Optional) Defaults to true.
  - sslRequired: [boolean] - (Optional) Defaults to true. If authenticationRequired is true, then sslRequired must also be true.
- tags: [list] - (Optional) Strings that are used to categorize this Table Data Service.
- description: [String] - (Optional) Text description of this Table Data Service. This text is displayed in user interface contexts.
- properties: [map] - (Optional) Map of properties to be applied to this DIS instance. See full details below.
Properties
The properties value is a map of properties to be applied to this data import server instance.
Valid properties include:
requestTimeoutMillis: int
StringCacheHint.<various>
DataImportServers have string caches, which are configured with properties starting with DataImportServer.StringCacheHint:
DataImportServer.StringCacheHint.tableNameAndColumnNameEquals_
DataImportServer.StringCacheHint.columnNameEquals_
DataImportServer.StringCacheHint.columnNameContains_
DataImportServer.StringCacheHint.tableNameStartsWith_
DataImportServer.StringCacheHint.default
These properties define the global default values, but they are augmented and overridden in any given data import server configuration by properties starting with StringCacheHint:
StringCacheHint.tableNameAndColumnNameEquals_
StringCacheHint.columnNameEquals_
StringCacheHint.columnNameContains_
StringCacheHint.tableNameStartsWith_
StringCacheHint.default
Example:
db_dis:
  # lines omitted
  properties:
    requestTimeoutMillis: 60000
    # DIS Cache Hints
    # Note that failover import servers handling the same data must have identical settings per table
    # Table name and column name exact match:
    StringCacheHint.tableNameAndColumnNameEquals_QuoteArcaStock/Exchange: ConcurrentUnboundedStringCache,String,1
    #
    # Column name exact match:
    # NB: USym and Sym are still using Strings - in case of issues, this should keep ImportState instances compatible across bounces.
    StringCacheHint.columnNameEquals_USym: ConcurrentUnboundedStringCache,String,20000
    StringCacheHint.columnNameEquals_Sym: ConcurrentUnboundedStringCache,String,800000
    StringCacheHint.columnNameEquals_Exchange: ConcurrentUnboundedStringCache,CompressedString,50
    StringCacheHint.columnNameEquals_SecurityType: ConcurrentUnboundedStringCache,CompressedString,9
    StringCacheHint.columnNameEquals_Parity: ConcurrentUnboundedStringCache,CompressedString,2
    #
    # Everything else:
    StringCacheHint.default: ConcurrentBoundedStringCache,MappedCompressedString,1000000,2
Log Aggregator Servers
This section defines Log Aggregator Servers. Unlike the way a Tailer uses DIS entries, only one LAS will be selected for a given table location. This allows an entry with a specific filter to take precedence over a later entry with more general filters. The LAS process will load the configuration matching its process.name property. That can be overridden with the property LogAggregatorService.routingConfigurationName.
The following example shows one service for user data (RTA) and a local service for everything else:
logAggregatorServers: !!omap
  - rta:
      port: *default_lasPort
      host: *ddl_rta
      filters:
        - namespaceSet: User
  - log_aggregator_service: # default service
      port: *default_lasPort
      host: *localhost
The value for this section must be a list, and should be an ordered list. The "!!omap" directive ensures that the data type is an ordered map. Consumers of the Log Aggregator Service will send data to the first server in the list for which the filter matches.
Each item in the list is a map, where the key is taken to be the name of a Log Aggregator Server.
The value for each name is also a map:
- host: [String] - Clients of this Log Aggregator Server will connect to the given host and port. This is often localhost.
- port: [int] - The Log Aggregator Service of this name will listen on this port.
- filters: [filter definition] - (Optional) This filter defines whether data for a given table should be sent to this server.
- properties: [map] - (Optional) Map of properties to be applied to this Log Aggregator instance. See below.
Properties
The properties value is a map of properties to be applied to this log aggregator instance.
Valid properties include:
binaryLogTimeZone
messagePool.minBufferSize
messagePool.maxBufferSize
messagePool.bucketSize
BinaryLogQueueSink.pollTimeoutMs
BinaryLogQueueSink.idleShutdownTimeoutMs
BinaryLogQueueSink.queueSize
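For example, a Log Aggregator Server entry might set some of these properties as follows (the values shown are illustrative, not recommendations):

- log_aggregator_service:
    port: *default_lasPort
    host: *localhost
    properties:
      binaryLogTimeZone: America/New_York
      messagePool.minBufferSize: 1024
      BinaryLogQueueSink.pollTimeoutMs: 1000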
Table Data Services
This is the most important section and requires careful set-up.
This section supports both providers and consumers of the TableDataService protocol. Providers include Data Import Servers, the Table Data Cache Proxy, and the Local Table Data Service. All Data Import Servers are implicitly Table Data Service (TDS) providers. The TDCP and LTDS load their TDS configurations from the tableDataServices section by matching the process.name property, or one of the optional overriding properties:
TableDataCacheProxy.routingConfigurationName
LocalTableDataServer.routingConfigurationName
Any TDS that will be used by a consumer process (query server, merge process, etc.) must have filters configured so that any table location will be provided by exactly one source, either because of filter exclusions or because of local availability of data.
A table location is composed of a namespace, tableName, internalPartition, and columnPartition. It is common for one TDS to provide certain column partition values (e.g., currentDate, currentDate+N) and another to provide locations for the other column partition values.
For example:
tableDataServices:
  # db_dis - data import server implies table data source for system data
  # db_rta - data import server implies table data source for user data
  # local, with a storage named above
  local:
    storage: default
  # Configuration for the LocalTableDataServer named db_ltds, and define
  # the data published by that service
  db_ltds:
    host: *ddl_dis
    port: *default_localTableDataPort
    storage: default
  # Proxies combine other services and create new services
  # Configuration for the TableDataCacheProxy named db_tdcp, and define
  # the data published by that service.
  # There is typically one such service on each worker machine.
  db_tdcp:
    host: localhost
    port: *default_tableDataCacheProxyPort
    sources:
      # SYSTEM_INTRADAY tables for "current date" (and future), minus LASTBY_NAMESPACE tables handled by Example_LastBy
      - name: [db_dis, db_dis_backup] # the array defines a failover group of equivalent sources
        filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `LASTBY_NAMESPACE`", whereLocationKey: "ColumnPartition >= com.illumon.iris.db.v2.locations.CompositeTableDataServiceConsistencyMonitor.consistentDateNy()"}
      # LTDS for SYSTEM_INTRADAY past dates, minus LASTBY_NAMESPACE tables handled by Example_LastBy
      - name: db_ltds
        filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `LASTBY_NAMESPACE`", whereLocationKey: "ColumnPartition < com.illumon.iris.db.v2.locations.CompositeTableDataServiceConsistencyMonitor.consistentDateNy()"}
      # all user data
      - name: db_rta
        filters: {namespaceSet: User}
      # only LASTBY_NAMESPACE data
      - name: Example_LastBy
        filters: {whereTableKey: "NamespaceSet = `System` && Namespace == `LASTBY_NAMESPACE`"}
  # TDS failover groups. These are treated as equivalent sources:
  # e.g., in the case of data recovery.
  system_dis_tds:
    sources:
      # any source that is an array defines a failover group;
      # all entries in the group must have host and port, and should have identical filters.
      - name: [db_dis, db_dis2]
The value for this section must be a map. The key of each entry will be used as the name of the table data service.
The value of each item is also a map:
- host: [String] - (Optional) Host and port define the address of a remote table data service. Both must be set, or neither.
- port: [int] - (Optional) Host and port define the address of a remote table data service. Both must be set, or neither.
- storage: [String] - (Optional) Either storage or sources must be specified. If present, this must be the name of a defined storage instance. This is valid for a local table data service, or a local table data proxy.
- sources: [list] - (Optional) Either storage or sources must be specified. Sources defines a list of other table data services that will be presented as a new table data service:
  - name: [String or List] - A string value refers to a configured table data service. A list indicates multiple configured table data services that are deemed to be equivalent. This can be used for redundancy or failover.
  - filters: [filter definition] - (Optional) In a composed table data service, it is essential that data is segmented to be non-overlapping via filters and data layout. A given table location should not be served by multiple table data services in a group.
- tags: [list] - (Optional) Strings that are used to categorize this Table Data Service.
- description: [String] - (Optional) Text description of this Table Data Service. This text is displayed in user interface contexts.
- properties: [map] - (Optional) Map of properties to be applied to this Table Data Service instance. See below.
Properties
The properties value specifies properties to be applied to this table data service instance.
Valid properties for LocalTableDataService instances include:
tableLocationsRefreshMillis
tableSizeRefreshMillis
refreshThreadPoolSize
allowLiveUserTables
Note: The following properties apply only to standalone uses of the LocalTableDataService:
LocalTableDataService.tableLocationsRefreshMillis
LocalTableDataService.tableSizeRefreshMillis
LocalTableDataService.refreshThreadPoolSize
Valid properties for RemoteTableDataService instances include:
requestTimeoutMillis
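For example, a table data service entry that is accessed remotely (such as the db_ltds configuration shown earlier) might set this timeout as follows (the value is illustrative):

tableDataServices:
  db_ltds:
    host: *ddl_dis
    port: *default_localTableDataPort
    storage: default
    properties:
      requestTimeoutMillis: 60000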
Tags and Descriptions
All Table Data Service configuration sections now support tags and descriptions. Data Import Servers are implicitly Table Data Services, and tags and descriptions apply there as well.
This information is used to influence the user interface in several places.
In general:
- The description item provides explanatory text that can be displayed along with the name.
- A tag provides one or more labels that can be used to categorize the tagged item.
- The default tag identifies an item in a set to be used as the default selection.
Persistent query type "In-Worker Service"
When "In-Worker Service" is selected as the Configuration Type in the Persistent Query Configuration Editor, routing services may be available. These routing services are taken from this routing file, based on the relevant subsection; for example, a Data Import Server query sees only the dataImportServers entries. Options here impact the in-worker service configuration panel.
If a configuration has a "description", then that description will be displayed alongside the name in the routing service drop-down menu.
Note that:
- If a configuration has the tag "default", that item will be selected by default for a new query of the appropriate type.
- Only for "Data Import Server" - any DIS configurations with the tag "dis_script" will be excluded from the panel's routing services list.
- Only for "Data Import Server with Script" - if any DIS configurations have the tag "dis_script", then only DIS instances with that tag will be included in the drop-down list.
Persistent query type "Data Merge"
When "Data Merge" is selected as the Configuration Type in the Persistent Query Configuration Editor, the Merge Settings tab becomes available. This tab includes a drop-down list for Table Data Service Configuration.
Items in this list will show any configured descriptions. Note that:
- If any Table Data Service configurations have the tag "merge", then only instances with that tag will be included in the drop-down list.
- If any Table Data Service configuration has the tag "default", that item will be selected by default for a new query of this type.
For example:
dataImportServers:
  test_LastBy:
    tags:
      - dis_script
      - default
    description: Our Worker DIS
. . .
tableDataServices:
  local:
    tags:
      - merge
      - default
    description: Read data to be merged locally
. . .
  query:
    tags:
      - merge
    description: Read data from all system sources
. . .
Because local and query include the tag "merge", they appear in the drop-down list below. Because local also has the tag "default", it is the default selection:
Data Filters
Filters specify which services apply to a given data location, or which locations apply to a given service. In the "filters" section, which may be used within many of the primary sections discussed above, a single filter may be defined, or an array of filters. A location will be accepted if any defined filter accepts it.
There are two ways to specify these filters: using Query Language or Attribute Values. The filter attributes for the two modes are mutually exclusive.
Query Language Filters
Filter attributes whereTableKey and whereLocationKey contain boolean clauses in the Deephaven query language. These clauses can use the following attributes:
- NamespaceSet (String) - NamespaceSet is User or System, and divides tables and namespaces between System and User.
- Namespace (String)
- TableName (String)
- Online (Boolean) - Online tables are those that are expected to change, or tick. This includes system intraday tables and all user tables.
- Offline (Boolean) - Offline tables are those that are expected to be historical and unchanging. This is system data for past dates, and all user tables. The Online and Offline categories both include user data, so Offline is not the same as !Online.
- InternalPartition (String)
- ColumnPartition (String)
whereLocationKey queries apply to the Location Key (InternalPartition and ColumnPartition) associated with a given Table Key.
If your sequence of filters relies on externally changing values such as the date, then you must provide a consistent view of those values for each routing decision. For the common case of currentDateNy(), you can use the alternative com.illumon.iris.db.v2.locations.CompositeTableDataServiceConsistencyMonitor.consistentDateNy() function. For arbitrary functions, you can guard instances with the FunctionConsistencyMonitor defined in com.illumon.iris.db.v2.locations.CompositeTableDataServiceConsistencyMonitor.INSTANCE.
Examples
The following example filters to System namespaces except LASTBY_NAMESPACE, for the current date and the future:
filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `LASTBY_NAMESPACE`", whereLocationKey: "ColumnPartition >= com.illumon.iris.db.v2.locations.CompositeTableDataServiceConsistencyMonitor.consistentDateNy()"}
The next example filters to the same tables, but for all dates before the current date:
filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `LASTBY_NAMESPACE`", whereLocationKey: "ColumnPartition < com.illumon.iris.db.v2.locations.CompositeTableDataServiceConsistencyMonitor.consistentDateNy()"}
Unlike the first two filters, the following example includes all locations for these tables:
filters: {whereTableKey: "NamespaceSet = `System` && Namespace == `LASTBY_NAMESPACE`"}
Attribute Values Filter
This type of filter allows you to stipulate specific values for the named attributes. Since multiple filters can be specified disjunctively, you can build an inclusive filter by specifying the parts you want included in separate filters.
The attributes for a filter are:
- namespaceSet - "User" or "System".
- namespace - The table namespace must match this value.
- tableName - The table name must match this value.
- online - true or false. online==false means historical system data, or any user data. online==true means intraday system data, or any user data.
- class - This specifies a fully qualified class name which can be used to evaluate locations. This class must implement DataRoutingService.LocationFilter or DataRoutingService.TableFilter, or both (which is defined as DataRoutingService.Filter).
A filter may define zero or more of these fields. A value of "*" is the same as not specifying the attribute. A filter will accept a location if all specified fields match.
Examples
There are several ways to specify filters in the YAML format, as illustrated in the examples below.
Example 1: Inline Map Format
sources:
  - name: example1a
    filters: {namespaceSet: System, online: false} # (system and offline) or user
Example 2: Map Elements on Separate Lines
- name: example1b
  filters:
    namespaceSet: System
    namespace: "*"
    tableName: "*"
    online: false
Example 3: An Array of Filters. Each filter can be inline or on multiple lines.
- name: example2
  # any of these filters
  filters:
    - {namespaceSet: System, online: true, class: com.illumon.iris.db.v2.locations.FilteredTableDataService$SystemIntradayActiveLocationFilterClass}
    - {namespaceSet: User}
    - {namespace: Example2Namespace, tableName: "*"}
Example 4: An Empty Filter
- name: everything2
  filters: {}
Example 5: No Filter (same as an empty filter)
- name: everything1
YAML File Validation Tool
Deephaven includes a tool to validate data routing service configuration files, which can be used before putting them on a system. The tool tests for various common errors in the YAML file by invoking a Java class, VerifyDataRoutingConfiguration. This can be accomplished in one of two ways: via the validate_routing_yml script, or directly via java.
Each command returns 0 for a successful parse, and non-zero otherwise. The parsing error or IOException will be printed on failure.
validate_routing_yml script
A script named validate_routing_yml is provided in /usr/illumon/latest/bin. This script takes the YAML filename to be validated as a parameter.
Example Command:
/usr/illumon/latest/bin/validate_routing_yml /etc/sysconfig/illumon.d/resources/routing_service.yml
Example Output:
Data Routing Configuration file "/etc/sysconfig/illumon.d/resources/routing_service.yml" parsed successfully
Java
Note that the location specified for the workspace must be a directory, or creatable as a directory, in which the user has write permission.
Command:
java -Dworkspace=/tmp/foo -Ddevroot=/tmp/foo \
  -DConfiguration.rootFile=iris-defaults.prop -Dconfiguration.quiet=true \
  -cp "/usr/illumon/latest/java_lib/*" \
  com.illumon.iris.db.v2.configuration.VerifyDataRoutingConfiguration /etc/sysconfig/illumon.d/resources/routing_service.yml
Example Output:
Loading iris-defaults.prop
Configuration: workspace is /tmp/foo/
Configuration: devroot is /tmp/foo/
Configuration: Configuration.rootFile is iris-defaults.prop
Data Routing Configuration file "/etc/sysconfig/illumon.d/resources/routing_service.yml" parsed successfully
Example Deephaven Data Routing Configuration File
The following shows a sample template of a YAML configuration file for a three-node cluster, with a lastBy DIS and shared System (historical) data.
config:
anchors:
# &name defines an anchor which allows "name" to be referenced elsewhere in the document
# This is effectively a map of the cluster
- &default-host "localhost"
- &localhost "localhost" # use this to be able to change how you refer to localhost
- &ddl-import "localhost"
- &ddl-rta "localhost"
# Define aliases for the default port values
- &default-tailerPort 22021
- &default-lasPort 22020
- &default-tableDataPort 22015
- &default-tableDataCacheProxyPort 22016
- &default-localTableDataPort 22014
storage:
- name: default
dbRoot: /db
# Define additional storage locations here
# - name: Example_LastBy
# dbRoot: /db/dataImportServers/Example_LastBy
# This section (name is arbitrary) defines an anchor with useful default values
# for dataImportServers, logAggregatorServers, and others.
defaults:
# DIS-default defines an anchor that has default values for dataImportServers entries.
# Instances below can override only what they need to change.
- &DIS-default
tailerPort: *default-tailerPort
storage: default
definitionsStorage: default
tableDataPort: *default-tableDataPort
# If a DIS instance adds a properties key, it will REPLACE this one. Add another anchor so they can be added back in.
properties: &DIS-defaultProperties
## The following properties are available for import servers
## TableDataService properties
# requestTimeoutMillis: 60000
#
# Start DIS Cache Hints
# Note that failover import servers handling the same data must have identical settings per table
# Table name and column name exact match:
StringCacheHint.tableNameAndColumnNameEquals_QuoteArcaStock/Exchange: ConcurrentUnboundedStringCache,String,1
#
# Column name exact match:
# NB: USym and Sym are still using Strings - in case of issues, this should keep ImportState instances compatible across bounces.
StringCacheHint.columnNameEquals_USym: ConcurrentUnboundedStringCache,String,20000
StringCacheHint.columnNameEquals_Sym: ConcurrentUnboundedStringCache,String,800000
StringCacheHint.columnNameEquals_Exchange: ConcurrentUnboundedStringCache,CompressedString,50
StringCacheHint.columnNameEquals_SecurityType: ConcurrentUnboundedStringCache,CompressedString,9
StringCacheHint.columnNameEquals_Parity: ConcurrentUnboundedStringCache,CompressedString,2
#
# Everything else:
StringCacheHint.default: ConcurrentBoundedStringCache,MappedCompressedString,1000000,2
#
########## End DIS Cache Hints ##########
# LAS-default defines an anchor that has default values for logAggregatorServers
- &LAS-default
port: *default-lasPort
host: *localhost
# if a LAS instance adds a properties key, it will REPLACE this one. Add another anchor so they can be added back in.
properties: &LAS-defaultProperties
## the following properties are available for log aggregators
# binaryLogTimeZone: America/New_York
# messagePool.minBufferSize: 1024
# messagePool.maxBufferSize: 1048576
# messagePool.bucketSize: 1000
# BinaryLogQueueSink.pollTimeoutMs: 1000
# BinaryLogQueueSink.idleShutdownTimeoutMs: 300000 (5 minutes)
# BinaryLogQueueSink.queueSize: 1000
- &LocalTableDataService-default
# if a TDS instance adds a properties key, it will REPLACE this one. Add another anchor so they can be added back in.
properties: &LocalTableDataService-defaultProperties
## the following properties are available for LocalTableDataService instances -
## that is, for tableDataServices which are LocalTableDataServices
# tableLocationsRefreshMillis: 10000
# tableSizeRefreshMillis: 30000
# requestTimeoutMillis: 60000
# allowLiveUserTables: false
dataImportServers:
# The primary data import server
db_dis:
# import the default values
<<: *DIS-default
host: *ddl-import # reference the address defined above for "ddl-import"
userIntradayDirectoryName: "IntradayUser"
webServerParameters:
enabled: true
port: 8084
authenticationRequired: false
sslRequired: false
filters:
- {namespaceSet: System, online: true}
properties:
# re-add the default properties (omit the <<: if you do not want the defaults)
<<: *DIS-defaultProperties
## any additional properties for this DIS
# Define a data import server for user data
# NOTE: as configured here, this import server is actually the same process as db_dis; it exists only to provide control over routing.
# Because it is used only for routing, userIntradayDirectoryName is not used, but it is set to the same value as db_dis to avoid confusion.
db_rta:
<<: *DIS-default
host: *ddl-import
userIntradayDirectoryName: "IntradayUser"
tailerPort: *default-tailerPort
filters: {namespaceSet: User}
# This defines an in-worker lastBy import server
# Example_LastBy:
# host: *ddl-import
# tailerPort: 22222
# filters:
# - {namespace: LASTBY_NAMESPACE}
# webServerParameters:
# enabled: false
# storage: Example_LastBy
# tableDataPort: 22223
# description: an in-worker dis
# tags: [default, dis_script]
# Filter matches are "or" matches; the first matching entry is returned.
logAggregatorServers: !!omap # need an ordered map for precedence, to allow the filters to overlap
# User data will go to the rta instance
- rta:
<<: *LAS-default
host: *ddl-rta
filters:
- namespaceSet: User
# All other data goes to the log aggregator on localhost
- log_aggregator_service: # default LAS
<<: *LAS-default
# properties:
# optionally re-add the default properties if you add properties here
# <<: *LAS-defaultProperties
# The tableDataServices section defines single and composite table data services.
# LocalTableDataService, TableDataCacheProxy, and other republishing processes will
# typically select the name matching the process.name property.
tableDataServices:
# Note: dataImportServices entries are implicitly available for use in
# tableDataService composition.
# Define the default local (from disk) LocalTableDataService
local:
storage: default
tags:
- merge
- default
# Configure the LocalTableDataServer named db_ltds, and define
# the data published by that service.
db_ltds:
host: *ddl-import
port: *default-localTableDataPort
storage: default
<<: *LocalTableDataService-default
#properties:
# optionally re-add the default properties if you add properties here
# <<: *LocalTableDataService-defaultProperties
# tableLocationsRefreshMillis: 10000
# tableSizeRefreshMillis: 30000
# requestTimeoutMillis: 60000
# Configure the TableDataCacheProxy named db_tdcp, and define
# the data published by that service.
# There is typically one such service on each worker machine.
db_tdcp:
host: localhost
port: *default-tableDataCacheProxyPort
sources:
# SYSTEM_INTRADAY tables for "current date"
- name: db_dis
filters: {whereTableKey: "NamespaceSet = `System` && Online", whereLocationKey: "ColumnPartition >= currentDateNy()"}
# LTDS for SYSTEM_INTRADAY non-current date
- name: db_ltds
filters: {whereTableKey: "NamespaceSet = `System` && Online", whereLocationKey: "ColumnPartition < currentDateNy()"}
# A db_rta source is useful in case it is different from db_dis
- name: db_rta
filters: {namespaceSet: User}
# Example of tdcp which includes a lastBy import server
#db_tdcp_with_lastby:
# host: localhost
# port: *default-tableDataCacheProxyPort
# sources:
# # SYSTEM_INTRADAY tables for "current date", minus LASTBY_NAMESPACE tables handled by Example_LastBy
# - name: db_dis
# filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `LASTBY_NAMESPACE`", whereLocationKey: "ColumnPartition >= currentDateNy()"}
# # LTDS for SYSTEM_INTRADAY non-current date, minus LASTBY_NAMESPACE tables handled by Example_LastBy
# - name: db_ltds
# filters: {whereTableKey: "NamespaceSet = `System` && Namespace != `LASTBY_NAMESPACE`", whereLocationKey: "ColumnPartition < currentDateNy()"}
# # all user data
# - name: db_rta
# filters: {namespaceSet: User}
# # only LASTBY_NAMESPACE data
# - name: Example_LastBy
# filters: {whereTableKey: "NamespaceSet = `System` && Namespace == `LASTBY_NAMESPACE`"}
# Define the TableDataService for query servers with local filesystem access
# to historical system data.
query:
description: read from all table data sources
tags: [merge]
sources:
- name: local
filters: {online: false} # (system and offline) or user
- name: db_tdcp
filters: {online: true}
# Define the TableDataService for query servers without local filesystem access
# to historical system data - use ltds instead.
query_ltds:
sources:
- name: db_ltds
filters: {whereTableKey: "Offline"}
- name: db_tdcp
filters: {whereTableKey: "Online"}
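The anchor (`&`), alias (`*`), merge-key (`<<:`), and `!!omap` mechanics used throughout this template are resolved by any standard YAML parser, so their behavior can be checked outside of Deephaven. The following Python sketch uses PyYAML to load a miniature, hypothetical fragment modeled on the logAggregatorServers section above; the key names mirror the template, but the values and the `extraProp` key are made up for illustration:

```python
import yaml  # PyYAML; not part of Deephaven, install via pip if needed

# Hypothetical miniature of the routing file: key names mirror the
# template, values are made up.
doc = """
anchors:
  - &LAS-default
    host: localhost
    properties: &LAS-defaultProperties
      messagePool.bucketSize: 1000

logAggregatorServers: !!omap   # ordered map: the first matching entry wins
  - rta:
      <<: *LAS-default         # splice in the defaults...
      host: rta-host           # ...then override individual keys
  - log_aggregator_service:
      <<: *LAS-default
      properties:              # a local properties key REPLACES the default one...
        <<: *LAS-defaultProperties   # ...so merge the defaults back in explicitly
        extraProp: true
"""

cfg = yaml.safe_load(doc)
# !!omap loads as an ordered list of (name, value) pairs
las = dict(cfg["logAggregatorServers"])
print(las["rta"]["host"])                     # rta-host (explicit key wins)
print(las["log_aggregator_service"]["host"])  # localhost (inherited from the anchor)
print(las["log_aggregator_service"]["properties"])
```

Two points this demonstrates: explicit keys always override keys spliced in via `<<:`, regardless of where the merge line appears in the map, and `!!omap` preserves entry order, which is what gives overlapping filters their precedence.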
Download the YAML Configuration Template
Last Updated: 16 February 2021 18:07 -04:00 UTC Deephaven v.1.20200928
Deephaven Documentation Copyright 2016-2020 Deephaven Data Labs, LLC All Rights Reserved