Class RspArray<T extends RspArray>

java.lang.Object
com.illumon.iris.db.v2.utils.RefCountedCow<T>
com.illumon.iris.db.v2.utils.rsp.RspArray<T>
Direct Known Subclasses:
RspBitmap

public abstract class RspArray<T extends RspArray>
extends RefCountedCow<T>

A set representation for long values using Regular Space Partitioning (RSP) of the long space in "blocks" of (2^16) elements.

Modeled heavily after roaringbitmap.RoaringArray (keeping API method names and semantics as much as possible), with modifications for:

  1. Full "unsigned long" 64 bit range (as opposed to 32 bit in RoaringArray).
  2. Spans of all bits set ("AllSet") that can be arbitrarily big (ie, not constrained to 2^16 = RB Container size).

The handling of unsigned values follows RB; ie, key values are compared/sorted as unsigned longs.

Definitions:

  • A "block" is a particular interval [n*2^16, (n+1)*2^16 - 1] of the long domain.
  • A "span" is a partition of the domain consisting of one or more consecutive blocks; a span is a subset of the domain represented by an interval [n*2^16, (n+m)*2^16 - 1], m >= 1.
  • Full blocks are blocks whose domain are fully contained in the set, ie, the set contains every possible value in the block's interval (as a bitmap, it would be "all ones").
  • Spans of full blocks are represented by a single "full blocks span" object (just a Long) which knows how many 2^16 ranges it has (it's "full blocks span len" ("flen") is the number of full blocks in the span).
  • Individual blocks that are not completely full are stored in an RB Container; their "full blocks span len" is zero.

Our internal representation uses two parallel arrays:

  • a long[] spanInfos array that contains the information for the offset to the values in the span, which we call the span's "key". For instance, a full block span that represents all the long values in [65536, 196607] has as its key the value 65536.
  • an Object[] spans array that contains the actual spans. At the most basic level, a span can be either a full block span or a container of values (but there is nuance in exactly how to represent them, see below).

We use several optimizations to reduce memory utilization for sparse sets. Details follow.

The long[] spanInfos and Object[] spans data members of this class are used, combined, to represent the offset (key) and span values in the set, against that offset. The two arrays are used, together, as parallel arrays and the information for a given conceptual span is contained in both of them for the same corresponding index i.

There are two basic cases for a span: it is either a full blocks span, containing a >=1 number of full blocks, or it is a container, containing individual values in the particular 2^16 block corresponding to the span's key.

There are four ways total that these two cases can be represented between the long in the `spanInfos` array and the Object in the `spans` array. Given a span at position `i`:

  1. If the corresponding Object spans[i] is of type Long, then the long spanInfos[i] value is the key for the span (with its lower 16 bits as zero), and the Long value represents how many full blocks are present. Example, the set [ 0, 2^50 - 1 ] is represented as spanInfo==0 and span==Long(2^34).
  2. As an optimization to conserve memory, if the Object spans[i] is the Object reference with value FULL_BLOCK_SPAN_MARKER (a singleton and final marker Object defined statically in this file). then the upper 48 bits of the long spanInfo[i] value represent the key for the span, and the lower 16 bits of the long spanInfo[i] value represent the full block span length. Example, the set [ 65536, 196607 ] is represented by spanInfo==65538 and span==FULL_BLOCK_SPAN_MARKER (note 196607 == 65536*3 - 1, so the set is 2 full blocks, and 65538 == 65536 | 2.
  3. If the corresponding Object spans[i] is null, then the long spanInfos[i] represents the single value present in the span (note in this case, its upper 16 bits still corresponds to its key). Example, the set { 65537 } is represented by spanInfo==65537 and span==null.
  4. If the corresponding Object spans[i] is of type short[] or of type Container, then it represents a container of multiple values in a single block (but not all of the possible values in the block, since in that case it would be a full block span as above). In this case the higher 48 bits of its corresponding spanInfo represent the key for the span. Depending on the actual type of span there are two subcases:
    1. If spans[i] is of type Container, then the values in the roaringbitmaps container object are part of the set, considered against its key offset. The key is represented in the higher 48 bits of its corresponding spaninfo. The lower 16 bits of spanInfo are zero in this case. Example, the set [ 100,000-100,010, 100,020-100,030 ] is represented by spaInfo==65536, span==RunContainer({34464-34474, 34484-34494})
    2. If spans[i] is of type short[], then an ArrayContainer with the short[] contents needs to be reconstructed. The lower 16 bits of the spanInfo value are used to represent the other data members of ArrayContainer. This case exists as an optimization to reduce memory utilization for sparse blocks. For details of this reconstruction please see the code for the definition of the SpanView class below.

Notes:

  • Our version of RB Container supports a "shared" boolean flag that is used to implement copy-on-write (COW) semantics and allow operation results to share containers in COW fashion.
  • We extended the Container class hierarchy to include specializations for empty, single value, single range, and two values containers. These are immutable; empty is used only as a way to return empty results, and are never actual stored in the spans array. For details, please see the Container class definition and derived class hierarchy.