Installing Python
Installing Deephaven Python Packages on the Server
To integrate Python with Deephaven, the following server installations are required. Provided the suggestions below are followed, this only needs to be done once for all users. These instructions are appropriate for a default CentOS 7 deployment and assume that the Deephaven RPM has already been installed successfully.
The deephaven and deephaven_jpy packages are tested for compatibility with Python 2.7 (>=.9), 3.6, and 3.7, and can be installed for use with any or all of these Python versions. The CentOS repository currently provides Python 2.7 and Python 3.6, so out-of-the-box integration is easiest with these versions.
Install Required Python items
Required Python components can be installed by running the following:
sudo yum install python36 python36-devel python36-virtualenv python-virtualenv python-devel
Note that CentOS 7 comes with Python 2.7 already installed as a system dependency.
Auto-Provisioning of Python Virtual Environments
Deephaven users who would like to use a Python worker or a Jupyter notebook will need to set up and configure virtual environments for their Deephaven installation.
Currently, Deephaven has three virtual environments:
/db/VEnvs/python27
/db/VEnvs/python36
/db/VEnvs/jupyter
These environments are not installed by default.
To configure automatic provisioning of the Deephaven virtual environments, an administrator must run /usr/illumon/latest/install/python/setup_dh_auto_provision.sh on every worker machine. This writes configuration to /etc/sysconfig/deephaven/python/ that drives the automation of the default Deephaven VEnvs.
The virtual environments themselves are provisioned by /usr/illumon/latest/install/python/auto_provision.sh, which looks in /etc/sysconfig/deephaven/python/ for its configuration. This step runs automatically during the post-install process of the RPM. (Note: it has no effect if nothing has been configured.)
The environments under /db/VEnvs are owned by irisadmin. This cannot be changed, because these permissions are enforced on every upgrade. However, users who do not want irisadmin to own their VEnvs can configure their own environments (see below).
Customer-Configured Virtual Environments
Customers may choose to configure their own auto-provisioned virtual environments. This is a more advanced use case, and we recommend contacting customer support directly for detailed instructions.
As irisadmin, invoke:
/usr/illumon/latest/install/python/auto_provision.sh <name> <owner> <env-dir> <requirements-file>
- <name> must be a unique identifier for the VEnv.
- <owner> is the user account that will own the VEnv.
- <env-dir> is the directory where the VEnv will be created.
- <requirements-file> is the requirements.txt used to create the VEnv.
Deephaven ships the following requirements.txt files:
/usr/illumon/latest/python/envs/py27/specs/worker-standard/requirements.txt
/usr/illumon/latest/python/envs/py36/specs/worker-standard/requirements.txt
/usr/illumon/latest/python/envs/py36/specs/worker-jupyter/requirements.txt
Note that all such Virtual Environments will be automatically updated whenever Deephaven is upgraded.
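The invocation described above can be sketched from Python as follows. This is a minimal illustration, not part of the Deephaven tooling: the script path and argument order come from the text, while the function names and the example values (research36, /db/VEnvs/research36) are hypothetical.

```python
import subprocess

# Script path as given in the text above.
AUTO_PROVISION = "/usr/illumon/latest/install/python/auto_provision.sh"


def auto_provision_command(name, owner, env_dir, requirements_file):
    """Build the argument list in the documented order:
    <name> <owner> <env-dir> <requirements-file>."""
    return [AUTO_PROVISION, name, owner, env_dir, requirements_file]


def run_auto_provision(name, owner, env_dir, requirements_file):
    """Run the script. Intended to be called as irisadmin on a Deephaven server."""
    subprocess.check_call(
        auto_provision_command(name, owner, env_dir, requirements_file)
    )


# Hypothetical example values; only the requirements file path is from the text.
EXAMPLE_COMMAND = auto_provision_command(
    "research36",
    "irisadmin",
    "/db/VEnvs/research36",
    "/usr/illumon/latest/python/envs/py36/specs/worker-standard/requirements.txt",
)
```

Building the argument list separately from running it makes it easy to print and review the command before anything is executed on the server.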
Properties (JPY JVM Flags)
These properties are meant to be set on a query worker to select a particular Python virtual environment. The paths in these properties match the paths that are set up when /usr/illumon/latest/install/python/setup_dh_auto_provision.sh is run. Once set up, these virtual environments are updated automatically whenever the Deephaven product is updated, so they need no manual maintenance and can be selected by setting a single property:
jpy.env
The JVM flags used in the examples below are as follows:
- jpy.programName references the path to the Python executable for your Python environment.
- jpy.jpyLib references the jpy dynamic library from your Python environment, which comes from the deephaven_jpy package in your environment.
- jpy.jdlLib references the jdl dynamic library from your Python environment, which also comes from the deephaven_jpy package in your environment.
- jpy.pythonLib references the dynamic library associated with your Python executable.
Examples
This allows selecting a VEnv by passing -Djpy.env=python36:
[jpy.env=python36] {
jpy.programName=/db/VEnvs/python36/bin/python3.6
jpy.pythonLib=/usr/lib64/libpython3.6m.so.1.0
jpy.jpyLib=/db/VEnvs/python36/lib/python3.6/site-packages/jpy.cpython-36m-x86_64-linux-gnu.so
jpy.jdlLib=/db/VEnvs/python36/lib/python3.6/site-packages/jdl.cpython-36m-x86_64-linux-gnu.so
}
This allows selecting a VEnv by passing -Djpy.env=python27:
[jpy.env=python27] {
jpy.programName=/db/VEnvs/python27/bin/python2.7
jpy.pythonLib=/usr/lib64/libpython2.7.so.1.0
jpy.jpyLib=/db/VEnvs/python27/lib/python2.7/site-packages/jpy.so
jpy.jdlLib=/db/VEnvs/python27/lib/python2.7/site-packages/jdl.so
}
This allows selecting a VEnv by passing -Djpy.env=jupyter:
[jpy.env=jupyter] {
jpy.programName=/db/VEnvs/jupyter/bin/python3.6
jpy.pythonLib=/usr/lib64/libpython3.6m.so.1.0
jpy.jpyLib=/db/VEnvs/jupyter/lib/python3.6/site-packages/jpy.cpython-36m-x86_64-linux-gnu.so
jpy.jdlLib=/db/VEnvs/jupyter/lib/python3.6/site-packages/jdl.cpython-36m-x86_64-linux-gnu.so
}
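Because each of these properties is a file path, a quick sanity check on a worker machine is to confirm that every path exists. The following is a minimal sketch under that assumption; the helper name missing_jpy_paths is hypothetical, and the property values are copied from the python36 example above.

```python
import os


def missing_jpy_paths(props):
    """Return the names of the jpy properties whose file path does not exist."""
    return sorted(key for key, path in props.items() if not os.path.exists(path))


# Values copied from the [jpy.env=python36] example above.
PYTHON36_PROPS = {
    "jpy.programName": "/db/VEnvs/python36/bin/python3.6",
    "jpy.pythonLib": "/usr/lib64/libpython3.6m.so.1.0",
    "jpy.jpyLib": "/db/VEnvs/python36/lib/python3.6/site-packages/"
                  "jpy.cpython-36m-x86_64-linux-gnu.so",
    "jpy.jdlLib": "/db/VEnvs/python36/lib/python3.6/site-packages/"
                  "jdl.cpython-36m-x86_64-linux-gnu.so",
}

if __name__ == "__main__":
    # On a correctly provisioned worker this should print an empty list.
    print(missing_jpy_paths(PYTHON36_PROPS))
```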
Note that the Python version used on the client must match the Python version used on the worker. The dill module requires this for serialization to work.
Testing the Python Server Installation
To test the Python installation, open a Deephaven console. In the Session Type drop-down list, select Python. Then, under Advanced Options, supply the appropriate JVM flags for the desired Python virtual environment and click Connect.
Once the console has connected to the server, execute the following statement in the console:
from deephaven import *
If this does not result in an error, then your Deephaven environment should be properly configured. To verify you are using the intended Python version, execute the following:
import sys; print(sys.version)
For a more functional test, try the following:
foo = lambda x: x*x
z = 3.1415
tt = db.timeTable("00:00:01").update("I=i", "Y=foo.call(i)", "Z=z")
If the resulting table, tt, appears in the lower portion of the console window, Python is ready to use in the Deephaven console.
Python Client Workstations
The following client installations are only required if you plan to execute Python queries outside of the Deephaven console. In the "local client - remote worker" scenario, Python objects are serialized and deserialized with the dill module, which relies on the pickle protocol, and that protocol is not compatible between versions of Python. The local client and the remote worker must therefore use the same version of Python.
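The version requirement above can be checked with a small helper before submitting work to a remote worker. This is a sketch only: comparing the major and minor numbers is an assumption about what "same version" means here, and the function name is hypothetical. The worker's version would have to be obtained separately, for example by running import sys; print(sys.version) in a Deephaven console.

```python
import sys


def same_python_version(client, worker):
    """Compare the (major, minor) parts of two version tuples.

    Assumption: matching major.minor is what the serialization requirement
    needs; the text itself only says "the same version".
    """
    return tuple(client[:2]) == tuple(worker[:2])


if __name__ == "__main__":
    # Hypothetical worker version, e.g. read from sys.version on the worker.
    worker_version = (3, 6, 8)
    print(same_python_version(sys.version_info, worker_version))
```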
Java Environment Variable Configuration
For Python integration to function properly, the jpy module requires that the environment variable JDK_HOME is set to your JDK installation and that PATH contains the appropriate Java library directories.
Windows:
set JDK_HOME=C:\Program Files\Java\jdk<version>
set PATH=%JDK_HOME%\bin;%JDK_HOME%\jre\bin\server;%PATH%
Mac:
export JDK_HOME=$(/usr/libexec/java_home)
export PATH=$PATH:$JDK_HOME/bin
Linux:
export JDK_HOME=/usr/java/latest
export PATH=$PATH:$JDK_HOME/bin
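Whether these variables are visible to Python can be checked with a short script. The sketch below only inspects JDK_HOME and the JDK bin directory on PATH, as in the Mac and Linux settings above; it does not check the Windows-specific jre\bin\server entry. The function name java_env_problems is hypothetical.

```python
import os


def java_env_problems(environ):
    """Return a list of human-readable problems with the Java-related variables."""
    problems = []
    jdk_home = environ.get("JDK_HOME")
    if not jdk_home:
        problems.append("JDK_HOME is not set")
        return problems
    # PATH entries are separated by ':' on Mac/Linux and ';' on Windows.
    path_entries = environ.get("PATH", "").split(os.pathsep)
    jdk_bin = os.path.join(jdk_home, "bin")
    if jdk_bin not in path_entries:
        problems.append("PATH does not contain " + jdk_bin)
    return problems


if __name__ == "__main__":
    for problem in java_env_problems(os.environ):
        print(problem)
```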
Python Packages
The deephaven_jpy package translates between Java and Python, and must be installed as a bridge between the Python interpreter and Deephaven's Java infrastructure. It has no Python package dependencies, but it requires that the Java setup noted above is correct. It can be installed from PyPI for 64-bit Windows and OS X, and a .whl is available for direct download for Linux from https://github.com/illumon-public/illumon-jpy/releases. The deephaven package depends on deephaven_jpy, dill, wrapt, numpy, and pandas.
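Once the packages are installed by one of the methods described in this document, a quick way to see whether anything is missing is to ask Python which modules it can locate. This sketch assumes Python 3 (importlib.util.find_spec is not available in Python 2.7) and assumes that deephaven_jpy is imported under the module name jpy, as the import jpy statement elsewhere in this document suggests. The helper name is hypothetical.

```python
import importlib.util

# Import names for the packages listed above (jpy is provided by deephaven_jpy).
REQUIRED_MODULES = ["jpy", "dill", "wrapt", "numpy", "pandas", "deephaven"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

Locating a module does not import it, so this check succeeds even when jpy would later fail to load libjvm.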
Package Installation
There are many ways to set up and maintain a Python environment, far too many to cover here. However, the following provides simple guidance for a few different options:
Windows Guidance
Anaconda is probably the simplest way to set up and maintain a Python environment on Windows, and the necessary packages are in the conda repository. After installing the desired Anaconda version (which also supplies pip), run the following command:
conda install dill wrapt numpy scipy pandas
On 64-bit Windows, the Deephaven packages can then be installed by executing the following:
pip install deephaven deephaven_jpy
Mac OS X Guidance
A stripped-down version of Python comes installed with OS X, but far fewer complications arise when this system Python is avoided. We recommend using Anaconda (same as the Windows instructions), Homebrew, or MacPorts. Anaconda and MacPorts segregate all packages from the native Mac OS X libraries, which makes it easier to avoid problems and conflicts with system packages. Homebrew is based on integrating with the system packages, which seems more problematic specifically for Python usage.
For MacPorts, after basic MacPorts configuration, execute the following:
sudo port install python<version>
where <version> indicates the desired Python version (Deephaven currently supports 27, 36, and 37). Note that MacPorts supports installing and using multiple versions. After the install completes, install the dependencies. Assuming <version> is 37, execute the following:
sudo port install py37-pip py37-wrapt py37-dill py37-numpy py37-scipy py37-pandas
Then perform the suggested port select --set pip pip<version> to make the desired version of pip the default. Otherwise, invoke the pip command for the desired version directly, e.g., pip-2.7 or pip-3.7. Then run the following:
sudo pip install deephaven deephaven_jpy
Linux Guidance
Use the package manager for your Linux distribution to install the appropriate version of Python and the associated pip. The package manager likely also provides numpy and pandas packages, and possibly wrapt and dill. Otherwise, these packages will be installed by pip as dependencies of the deephaven package.
The deephaven_jpy wheel is available for download from https://github.com/illumon-public/illumon-jpy/releases, where installation instructions are also provided.
The deephaven package can then be installed from PyPI by using pip:
sudo pip install deephaven
jpy Configuration Repair
If your deephaven_jpy wheel was built on a machine with a different version of Java, the loader will not find the appropriate libjvm when you try to import jpy from the Python prompt. First, try to update the JDK path used by deephaven_jpy by running the following:
<python> -m jpyutil
where <python> indicates the appropriate Python binary for your environment. Note: this requires that the JAVA_HOME or JDK_HOME environment variable is set (as directed above) and that you have write permission in the site-packages directory where jpy.so is located. If this requires root access (e.g., for system site-packages), remember to forward the environment variables:
sudo -E <python> -m jpyutil
Examples of the import error caused by a libjvm that cannot be found follow:
ImportError: libjvm.so: cannot open shared object file: No such file or directory
or
ImportError: jvm.dll: cannot open shared object file: No such file or directory
If libjvm is NOT discovered by the above, you will likely also have to set the library path (the LD_LIBRARY_PATH environment variable) so that the loader can find the libjvm dynamic library (jvm.dll on Windows). On all systems, this should be the server directory beneath your JDK location.
For OS X, this can be accomplished via
export LD_LIBRARY_PATH=$JDK_HOME/jre/lib/server:$LD_LIBRARY_PATH
This is only necessary because the jpy dynamic library was built against a different version of Java, so the RPATH directive in the library is hard-coded to the incorrect libjvm location. On OS X, setting LD_LIBRARY_PATH will still not work if the system Python is used (e.g., installed via Homebrew), because Apple System Integrity Protection (SIP) precludes redirection of the loader for system resources. This should not be an issue if the deephaven_jpy wheel is built with the Java version present on the user's machine.
For Linux, this is accomplished (for most distributions) by defining the following:
export LD_LIBRARY_PATH=$JDK_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH
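Since the exact location of the server directory varies between JDK versions and platforms, it can help to search for it rather than guess. The sketch below walks a JDK directory looking for the JVM library under its usual file names (libjvm.so on Linux, libjvm.dylib on OS X, jvm.dll on Windows). The function names are hypothetical.

```python
import os

# File names used for the JVM dynamic library on Linux, OS X, and Windows.
LIBJVM_NAMES = ("libjvm.so", "libjvm.dylib", "jvm.dll")


def find_libjvm_dirs(jdk_home):
    """Return the sorted list of directories under jdk_home that contain libjvm."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(jdk_home):
        if any(name in filenames for name in LIBJVM_NAMES):
            found.append(dirpath)
    return sorted(found)


def prepend_path(directory, current):
    """Prepend directory to an existing path-list value such as LD_LIBRARY_PATH."""
    return directory + os.pathsep + current if current else directory


if __name__ == "__main__":
    jdk_home = os.environ.get("JDK_HOME", "")
    for directory in find_libjvm_dirs(jdk_home):
        print(prepend_path(directory, os.environ.get("LD_LIBRARY_PATH", "")))
```

A JDK may contain more than one match (for example, client and server variants); the text above indicates that the server directory is the one to use.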
Setting Up The Deephaven Environment
Bootstrapping The Deephaven Environment
To use Deephaven from a client workstation, the Python packages discussed above must be installed. However, virtually all of the functionality of the deephaven Python package requires that the JVM is initialized through jpy, with the Deephaven Java infrastructure appropriately initialized inside the JVM. In addition, the Deephaven data capabilities are most sensibly used by connecting to a remote server that is running the appropriate Deephaven processes.
The local Deephaven configuration assumes a particular file structure for a collection of jar and configuration files. Setting up this structure is most easily accomplished through a bootstrap process directly using the remote server. This bootstrap process can be accomplished with the Python script in bootstrap.zip.
Deephaven workspace
The Deephaven workspace, as the name suggests, is the root of the assumed Deephaven file structure. Before executing the bootstrap process, set the environment variable DEEPHAVEN_WORKSPACE to the desired location (something like <user home>/deephaven/workspaces/remote). The directory need not exist yet. That is, execute the following statement:
export DEEPHAVEN_WORKSPACE=<location>
To make this value persistent (and automatic), put this statement in your .profile or .bashrc file. This is the root of the overall Deephaven workspace, and it can be used for more than one remote server, if desired.
Deephaven devroot
The Deephaven devroot contains all jar and configuration files in an expected file structure. It should generally be located at <DEEPHAVEN_WORKSPACE>/.iris/<instance>, where <instance> is a user-chosen instance name. The contents are tied directly to the specific remote server used to bootstrap and connect. Before executing the bootstrap process, set the environment variable DEEPHAVEN_DEVROOT to the desired location. The directory need not exist yet.
If more than one remote server may be used, you can set up more than one devroot, as in <DEEPHAVEN_WORKSPACE>/.iris/<instance_1> and <DEEPHAVEN_WORKSPACE>/.iris/<instance_2>, where <instance_1> and <instance_2> are named appropriately. Switching between the two amounts to setting the environment variable DEEPHAVEN_DEVROOT as appropriate.
Important Note: the entire contents of DEEPHAVEN_DEVROOT will be affected (deleted or modified) by executing the bootstrap process.
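The relationship between the workspace, the instance name, and the devroot can be made concrete with a small helper that derives both environment variables. This is only an illustration of the <DEEPHAVEN_WORKSPACE>/.iris/<instance> layout described above; the function names and the example values (the workspace path and the instance name prod) are hypothetical.

```python
import os


def devroot_for(workspace, instance):
    """Return <workspace>/.iris/<instance>, the layout described in the text."""
    return os.path.join(workspace, ".iris", instance)


def deephaven_env(workspace, instance):
    """Return the two environment variables for one remote server instance."""
    return {
        "DEEPHAVEN_WORKSPACE": workspace,
        "DEEPHAVEN_DEVROOT": devroot_for(workspace, instance),
    }


if __name__ == "__main__":
    # Hypothetical workspace location and instance name.
    home = os.path.expanduser("~")
    workspace = os.path.join(home, "deephaven", "workspaces", "remote")
    for key, value in sorted(deephaven_env(workspace, "prod").items()):
        print("export {}={}".format(key, value))
```

Switching between servers then amounts to calling deephaven_env with a different instance name while keeping the same workspace.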
Executing Bootstrap process
Extract the bootstrap archive to the desired working location. Ensure that you have set the JDK_HOME, DEEPHAVEN_WORKSPACE, and DEEPHAVEN_DEVROOT environment variables as directed above. With the extracted bootstrap directory as the working directory, execute:
python update_workspace.py --host <host address>
where <host address> is a viable Deephaven host of the form:
http://<address>[:<port>]/iris or
https://<address>[:<port>]/iris
This creates any missing directory structure as necessary and syncs all appropriate files beneath DEEPHAVEN_DEVROOT from the server indicated by the --host argument.
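Because the bootstrap process modifies or deletes the contents of DEEPHAVEN_DEVROOT, it can be worth checking the prerequisites before running it. The sketch below, which assumes Python 3 for urllib.parse, verifies that the three environment variables are set and that the host matches the http(s)://<address>[:<port>]/iris form. The function names and the example host dh.example.com are hypothetical, and the strict match on the /iris path is an assumption based on the forms shown above.

```python
from urllib.parse import urlparse

# Variables the text says must be set before bootstrapping.
REQUIRED_VARS = ("JDK_HOME", "DEEPHAVEN_WORKSPACE", "DEEPHAVEN_DEVROOT")


def unset_variables(environ):
    """Return the required variable names that are missing or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def is_valid_host(host):
    """Check that host looks like http(s)://<address>[:<port>]/iris."""
    parsed = urlparse(host)
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.hostname)
        and parsed.path == "/iris"
    )


def bootstrap_command(host):
    """Build the documented command line for the bootstrap script."""
    if not is_valid_host(host):
        raise ValueError("unexpected host form: " + host)
    return ["python", "update_workspace.py", "--host", host]
```

For example, bootstrap_command("https://dh.example.com:8123/iris") returns the argument list for the command shown above, while a host without a scheme raises ValueError.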
Deephaven propfile
The Deephaven propfile is an important way of passing properties for initialization of the remote worker (i.e., on the server). It is assumed to be located in the directory <DEEPHAVEN_DEVROOT>/resources/, and the default is iris-console.prop. This file will not exist until after the bootstrap process has run, but it plays an integral role in the local client/remote worker execution model.
Before trying to perform any local client/remote worker processing, set the environment variable DEEPHAVEN_PROPFILE=iris-console.prop. If desired, make a copy of iris-console.prop and set DEEPHAVEN_PROPFILE to refer to the copy instead.
Important Note: the DEEPHAVEN_PROPFILE file is required to be in <DEEPHAVEN_DEVROOT>/resources/. The entire contents of DEEPHAVEN_DEVROOT will be affected (deleted or modified) by executing the bootstrap process. Specifically, any changes that you make to the DEEPHAVEN_PROPFILE file will be overwritten.
For this discussion, the most important role of DEEPHAVEN_PROPFILE is that it is where you provide the appropriate jpy directives to tell the remote worker which Python version to use. Inside the DEEPHAVEN_PROPFILE file, whose full path is <DEEPHAVEN_DEVROOT>/resources/<DEEPHAVEN_PROPFILE>, insert the line
RemoteQueryClient.extraJvmArgs=-Djpy.programName=<value> -Djpy.pythonLib=<value> -Djpy.jpyLib=<value> -Djpy.jdlLib=<value>
where these values are determined by the Python environment on the server. See the Properties (JPY JVM Flags) section above for the specific values.
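Writing this line by hand is error-prone because of its length, so it can be generated from the four values. The following sketch formats the line exactly as shown above. The function name is hypothetical, and the example values are the python36 paths from the server-side examples earlier in this document, which may differ on your server.

```python
# Flag order as it appears in the line above.
JPY_KEYS = ("jpy.programName", "jpy.pythonLib", "jpy.jpyLib", "jpy.jdlLib")


def extra_jvm_args_line(props):
    """Format the RemoteQueryClient.extraJvmArgs line from the four jpy values."""
    flags = " ".join("-D{}={}".format(key, props[key]) for key in JPY_KEYS)
    return "RemoteQueryClient.extraJvmArgs=" + flags


# Example values taken from the python36 server environment shown earlier.
SITE_PACKAGES = "/db/VEnvs/python36/lib/python3.6/site-packages/"
PYTHON36_PROPS = {
    "jpy.programName": "/db/VEnvs/python36/bin/python3.6",
    "jpy.pythonLib": "/usr/lib64/libpython3.6m.so.1.0",
    "jpy.jpyLib": SITE_PACKAGES + "jpy.cpython-36m-x86_64-linux-gnu.so",
    "jpy.jdlLib": SITE_PACKAGES + "jdl.cpython-36m-x86_64-linux-gnu.so",
}

if __name__ == "__main__":
    print(extra_jvm_args_line(PYTHON36_PROPS))
```

A missing key raises KeyError, which is deliberate: all four values are needed for the worker to start the intended Python.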
It is worth reemphasizing that the Python version used on the client must match the Python version used on the worker, which is a requirement imposed by the dill module for serialization to work.
Testing the Python Client Installation
After following the bootstrap instructions, see the testDeephaven.py.txt script for an example of creating a remote query client and a remote database, and of executing remote queries.
Note: On a Mac, you may need to install a JDK 6 to launch the integration even though you must actually be running JDK 8. See: https://github.com/s-u/rJava/issues/37.
Last Updated: 26 January 2021 10:16 -05:00 UTC Deephaven v.1.20200331
Deephaven Documentation Copyright 2016-2020 Deephaven Data Labs, LLC All Rights Reserved