OPALS Datamanager

The OPALS Datamanager (ODM) is a key component of OPALS. It has been developed for efficient access to huge spatial data sets and for handling arbitrary attributes attached to geometry objects.

ODM as a spatial index

Efficient data handling is a precondition for today's LiDAR processing software. Even large ALS projects have to be processable in a reasonable amount of time. The ODM therefore addresses two requirements: handling huge data sets (on the order of 10^9 points can be administered) and supporting multi-threaded processing tools.

The spatial indexing concept of the ODM takes the geometric properties of ALS campaign data into account. Points are the primary data source; however, line and polygon geometries (e.g. structure lines, vegetation layers, building boundaries) have to be handled as well. To achieve maximum performance, the ODM splits point data and more complex geometry objects into two separate spatial indices.

Spatial index of point data

The point data are organized in a hierarchical structure with two levels. The level 0 index can be seen as a matrix which partitions space into 2-dimensional tiles. An appropriate tile size is automatically determined during import (see also Module Import). By default, the first 0.2 million points are used to estimate the point density. The tile size is then computed such that 0.2 million points fall into one tile on average. For such a tile structure, insert operations and spatial queries are straightforward and very fast to execute.
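The tile size estimation and the level 0 lookup described above can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions: the density is estimated naively from the bounding box of the sample points (the actual ODM estimator is not reproduced here), and the function names estimate_density, tile_size_for and tile_index are hypothetical and not part of OPALS.

```python
import math

def estimate_density(sample_points):
    """Naive point density (points per unit area) from the sample's bounding box."""
    xs = [p[0] for p in sample_points]
    ys = [p[1] for p in sample_points]
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    if area <= 0:
        raise ValueError("sample points must span a non-degenerate area")
    return len(sample_points) / area

def tile_size_for(density, points_per_tile=200_000):
    """Edge length of a square tile holding points_per_tile points on average."""
    return math.sqrt(points_per_tile / density)

def tile_index(x, y, origin_x, origin_y, tile_size):
    """Column and row of the level 0 tile that contains the point (x, y)."""
    return (math.floor((x - origin_x) / tile_size),
            math.floor((y - origin_y) / tile_size))

# Example: a density of 8 points/m^2 yields tiles of roughly 158 m edge length
size = tile_size_for(8.0)
```

Because the tile of a point follows directly from two divisions, inserting a point or finding the tiles hit by a query window needs no tree traversal, which is why level 0 operations are cheap.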

point_index_level0.png
Fig. 1: Level 0 Point Index: Only a subset of the tiles is kept in memory

In general, the amount of point data clearly exceeds the physically available memory of today's computers. The ODM therefore provides a spatially oriented swapping strategy based on the tiling structure. Tiles are always loaded completely, up to a certain limit of points in memory (5 million points by default). If new tiles are requested, the manager object unloads disposable tiles from the end of a recently-used list.
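The swapping strategy can be pictured as a least-recently-used cache whose capacity is counted in points rather than in tiles. The following Python sketch only illustrates that idea; the class TileCache, its methods and the loader callable are hypothetical names for this example and do not reflect the internal ODM implementation.

```python
from collections import OrderedDict

class TileCache:
    """Keeps whole tiles in memory up to a total point limit (LRU eviction)."""

    def __init__(self, max_points, loader):
        self.max_points = max_points
        self.loader = loader          # hypothetical callable: tile id -> list of points
        self._tiles = OrderedDict()   # least recently used first, most recent last
        self._points_in_memory = 0

    def get(self, tile_id):
        if tile_id in self._tiles:
            # Tile already loaded: mark it as most recently used
            self._tiles.move_to_end(tile_id)
            return self._tiles[tile_id]
        points = self.loader(tile_id)
        # Unload the least recently used tiles until the new tile fits
        while self._tiles and self._points_in_memory + len(points) > self.max_points:
            _, evicted = self._tiles.popitem(last=False)
            self._points_in_memory -= len(evicted)
        self._tiles[tile_id] = points
        self._points_in_memory += len(points)
        return points

    def loaded_tiles(self):
        return list(self._tiles)

# Example with three tiles of 4 points each and a limit of 10 points
storage = {"A": [0] * 4, "B": [0] * 4, "C": [0] * 4}
cache = TileCache(max_points=10, loader=storage.__getitem__)
for tile_id in ("A", "B", "A", "C"):
    cache.get(tile_id)
```

In this example tile B is unloaded when C is requested, because A was accessed more recently than B.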

The level 1 index is responsible for indexing points within one tile. Since tiles are always fully loaded into memory, level 1 indices do not require swapping functionality. The ODM does not read or write the actual level 1 index structure from/to file. Tests have shown that building the index on the fly is as fast as reading a full index structure, provided the number of points stays below a certain limit. This has two advantages:

  • Less information has to be written to file
  • Different level 1 indexing methods can be used

Currently the ODM only uses kD-Trees (Bentley, 1975) for level 1 indexing. The kD-Tree, a generalisation of the binary search tree, is an extremely fast spatial indexing method. This static indexing structure only supports point data, but its speed for nearest neighbor searches and range queries is outstanding. The implementation used is based on the kD-Tree class of CGAL (see Third party software).

kd-tree.png
Fig. 2: kD-Tree: k-Dimensional binary search tree
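To illustrate the principle of a kD-Tree (this is not the CGAL implementation used by the ODM), a minimal pure-Python version with a nearest neighbor query might look as follows. The names build_kdtree, nearest and sq_dist are chosen for this sketch only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def build_kdtree(points, depth=0):
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or sq_dist(point, query) < sq_dist(best, query):
        best = point
    diff = query[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best hit
    if diff * diff < sq_dist(best, query):
        best = nearest(far, query, best)
    return best

sample = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(sample)
print(nearest(tree, (9, 2)))  # -> (8, 1)
```

The tree is built once from a complete point set and is not modified afterwards, which matches the static character of the structure and fits the approach of rebuilding the level 1 index whenever a tile is loaded.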

Spatial index of complex geometries

For line and polygon data the ODM uses an R*-tree (Beckmann et al., 1990) for spatial indexing. Each object is represented by its bounding box within the tree. The R*-tree splits space into hierarchically nested, and possibly overlapping, k-dimensional boxes. A special heuristic tries to minimize both the coverage and the overlap of the boxes. The implementation used is based on the R*-Tree class of SpatialIndex (see Third party software).

r-tree.png
Fig. 3: R*-tree: hierarchically nested, and possibly overlapping, k-dimensional boxes
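The role of the bounding boxes can be shown without a full R*-tree. The Python sketch below computes the bounding box of each object and uses a box intersection test to select candidates for a query window. Note that it scans all objects linearly instead of descending a tree, so it demonstrates only the bounding box filter and not the R*-tree itself. All names and the sample geometries are hypothetical.

```python
def bounding_box(vertices):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a vertex list."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_intersect(a, b):
    """True if two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def candidates(objects, window):
    """Names of all objects whose bounding box intersects the query window."""
    return [name for name, vertices in objects.items()
            if boxes_intersect(bounding_box(vertices), window)]

# Hypothetical sample data: one structure line and one building outline
objects = {
    "structure line": [(0, 0), (10, 5)],
    "building": [(20, 20), (30, 20), (30, 30), (20, 30)],
}
hits = candidates(objects, window=(5, 0, 15, 10))
```

A bounding box hit is only a candidate: the exact geometry still has to be tested afterwards, since a box can intersect the window while the line or polygon inside it does not.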

Analysing the index statistics of an ODM

After importing a data set into the ODM, opalsImport reports index statistics as shown below (the same statistics can be retrieved at any time using opalsInfo). Although the automatic tile size estimation is robust against outliers, it assumes that the point density of the first 0.2 million points is representative of the overall data set. Since the point density of LiDAR data is usually quite homogeneous, the tile size estimation is usually uncritical. If the homogeneity assumption does not hold (e.g. for terrestrial laser scanning data), or for unusual point arrangements, the tile size estimation algorithm may fail to select an appropriate value, which can cause problems in the subsequent processing (see FAQ). Hence, it is important to understand and interpret the reported index statistics correctly.

odm_index_stats_strip18.png
Fig. 4: Index statistics of demo data set strip18.las

Since the ODM has different indices for point and polygon data, two table rows are listed if geometries of both types are stored within the ODM. The spatial index statistics report the index type, the maximum depth of the data structure (always one in case of tiling), the number of nodes (node count) and leaves (leafs count), the tile size in case of a tiling structure, and some statistical parameters (min #owl, max #owl, mean #owl and stddev #owl) on the number of objects within leaves (#owl). The distinction between nodes and leaves comes from the terminology of data structures. A node (also known as an internal node, inner node or branch node) has children, whereas a leaf (also known as an external node, outer node, leaf node or terminal node) does not have any children. Within most data structures (as in the ODM) the data itself is managed in leaves only. In case of the tiling structure of the ODM, the node count represents the number of elements of the tile matrix and the leafs count represents the number of tiles that contain point data. If the node count is zero (cf. Figure 4), the ODM is not in tiling mode yet.
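For the tiling case, the reported quantities can be reproduced on a toy data set. The following Python sketch is an illustration under stated assumptions: the tile matrix is anchored at the minimum corner of the data and the standard deviation is computed as a population standard deviation. Whether the ODM makes the same choices is not stated here, and the function name tile_statistics is hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def tile_statistics(points, tile_size):
    """Node count, leaf count and #owl statistics of a simple tile matrix."""
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    # Number of points per occupied tile, keyed by (column, row)
    counts = Counter(
        (math.floor((x - min_x) / tile_size), math.floor((y - min_y) / tile_size))
        for x, y in points
    )
    n_cols = max(col for col, _ in counts) + 1
    n_rows = max(row for _, row in counts) + 1
    owl = list(counts.values())
    return {
        "node count": n_cols * n_rows,   # all elements of the tile matrix
        "leafs count": len(counts),      # only tiles that contain points
        "min #owl": min(owl),
        "max #owl": max(owl),
        "mean #owl": mean(owl),
        "stddev #owl": pstdev(owl),
    }

stats = tile_statistics([(0.5, 0.5), (0.6, 0.2), (2.5, 0.5)], tile_size=1.0)
```

In this example the tile matrix has three elements in a single row, but only two of them contain points, so the node count is 3 and the leafs count is 2.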

odm_index_stats_other.png
Fig. 5: Index statistics of a data set with about 100000 points in each of overall 136 leaf tiles (tile size = 135m)

The second index statistics example (see Figure 5) comes from a flight strip containing approx. 15 million points. For this data set, approx. 100000 points are stored within one tile on average and the maximum number of points per tile is less than 300000. These are optimal values regarding processing speed and small index overhead. Since various OPALS modules perform parallel processing based on the tiling structure, a high number of points per tile can lead to problems during processing, because tiles are always completely loaded into memory. The mean #owl should be below 0.3 million points and the maximum #owl should not exceed 0.6 million points. Otherwise, 'Unable to load new data tiles' exceptions may occur in environments with many cores (>= 8). The problem can be partly compensated by using the global parameter -points_in_memory to allow more memory to be allocated during processing, or by reducing the number of processing threads (see common parameter -nbThreads). However, in case of doubt it is recommended to re-import the data set with a smaller tile size (see Module Import parameter -tileSize).
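The recommended limits can be turned into a small check. The Python sketch below compares the reported mean #owl and max #owl against the limits given above and derives a smaller tile size for a re-import. The rescaling assumes a homogeneous point density, so that the number of points per tile grows with the tile area. The function names and the default target of 100000 points per tile (taken from the example in Figure 5) are assumptions of this sketch, not OPALS functionality.

```python
import math

MEAN_OWL_LIMIT = 300_000   # mean #owl should stay below this value
MAX_OWL_LIMIT = 600_000    # max #owl should not exceed this value

def check_tile_statistics(mean_owl, max_owl):
    """Return a list of warnings; an empty list means the values look fine."""
    warnings = []
    if mean_owl >= MEAN_OWL_LIMIT:
        warnings.append(f"mean #owl {mean_owl} is not below {MEAN_OWL_LIMIT}")
    if max_owl > MAX_OWL_LIMIT:
        warnings.append(f"max #owl {max_owl} exceeds {MAX_OWL_LIMIT}")
    return warnings

def suggested_tile_size(current_tile_size, mean_owl, target_mean_owl=100_000):
    """Scale the tile edge so that the mean #owl approaches the target.

    Assumes homogeneous density: points per tile are proportional to tile area.
    """
    return current_tile_size * math.sqrt(target_mean_owl / mean_owl)

# Values as in Figure 5: no warnings expected
ok = check_tile_statistics(mean_owl=100_000, max_owl=290_000)

# A problematic import: quartering the points per tile halves the tile edge
new_size = suggested_tile_size(135.0, mean_owl=400_000)
```

The resulting value could then be passed to the -tileSize parameter of Module Import when re-importing the data set.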

Preserve natural order of imported data

For optimal spatial performance the ODM organises and stores data in spatial order as described above. Hence, the natural order of the data gets lost. Although this is irrelevant for most applications, situations can arise where it is necessary to preserve the natural order of the data. The ODM structure provides an additional index for storing the data order, which allows the data to be exported in the original import order. ATTENTION: This feature is not available yet.

ODM as a database table

Handling arbitrary attributes for each geometry object is the second key feature of the ODM. This makes each ODM file comparable to a database table. Therefore, nearly any file format can be represented without loss within the ODM. In general, attributes can be divided into two different groups:

  • predefined attributes (attributes with semantics)
  • user-defined attributes (attributes without semantics)

Predefined attributes have fixed names and data types. From a module point of view, those attributes have well-defined semantics (and units). Some OPALS modules (e.g. Module Normals, Module EchoRatio, ...) use predefined attributes as input and/or for storing computational results.

User-defined attributes, on the other hand, are intended to store information that is irrelevant to OPALS or that doesn't match any predefined attribute. Such attributes can be identified by a leading '_' (underscore) character. The ODM supports a wide range of data types, which can be freely chosen for user-defined attributes:

Supported data types
Type                  Size
64bit integer         64 Bit
integer               32 Bit
unsigned integer      32 Bit
short                 16 Bit
unsigned short        16 Bit
byte                  8 Bit
unsigned byte         8 Bit
boolean               1 Bit
float                 32 Bit
double                64 Bit
fixed length string   n Bytes
unlimited string      variable
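The naming rule and the type sizes from the table can be expressed compactly in Python. The sketch below is purely illustrative: the helper names and the attribute _quality are hypothetical, and the computed total is only the sum of the nominal sizes from the table, not a statement about how the ODM actually packs attributes on disk.

```python
# Nominal sizes in bits of the fixed-size types, copied from the table above
TYPE_SIZE_BITS = {
    "64bit integer": 64,
    "integer": 32,
    "unsigned integer": 32,
    "short": 16,
    "unsigned short": 16,
    "byte": 8,
    "unsigned byte": 8,
    "boolean": 1,
    "float": 32,
    "double": 64,
}

def is_user_defined(name):
    """User-defined attributes are recognised by their leading underscore."""
    return name.startswith("_")

def split_attributes(names):
    """Separate attribute names into (predefined, user-defined)."""
    predefined = [n for n in names if not is_user_defined(n)]
    user_defined = [n for n in names if is_user_defined(n)]
    return predefined, user_defined

def nominal_size_bits(attribute_types):
    """Sum of the nominal sizes of a set of fixed-size attributes."""
    return sum(TYPE_SIZE_BITS[t] for t in attribute_types.values())

# "_quality" is a made-up user-defined attribute for this example
attributes = {"Amplitude": "float", "EchoNumber": "unsigned byte", "_quality": "unsigned short"}
predefined, user_defined = split_attributes(attributes)
total_bits = nominal_size_bits(attributes)
```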

For a description of how to define user-defined attributes during import, please see Module Import, OPALS Generic Format, and LAS Format Definition. A full list of all predefined attributes can be found in the section ODM predefined attributes.

For completeness, it should be mentioned that both predefined and user-defined attributes can be used to filter the full data set during processing (for details see generic filter).

ODM file and layer structure

Each ODM file contains a small file table storing meta information about each imported file. Additionally, the ODM attaches the corresponding file id (attribute FileId) to each geometry object. Hence, the original (multiple) import files can be reproduced on export using Module Export.

The ODM imports the full data information, but also tries to represent the data structure using layers. The layer concept, as known from CAD software, is useful to group and organise the data in a specific way. Therefore, the ODM provides a layer table for each file. For example, the layer structure of a LAS file is based on the classification field. By retrieving the file id and layer id (attribute LayerId) of an object, the corresponding layer table entry can be accessed.

File and layer tables are automatically created during import. Up to now, OPALS does not provide tools to manipulate this information. However, the FileId and LayerId attributes of each geometry object can be used for filtering and processing the data.
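The relation between objects, the file table and the per-file layer tables can be mimicked with plain Python dictionaries. This is a conceptual sketch only: the table contents, the file names and the helper functions are invented for illustration and do not correspond to the ODM storage format or to any OPALS API.

```python
from collections import defaultdict

# Hypothetical stand-ins for the ODM file table and the layer table of each file
file_table = {0: "first.las", 1: "second.las"}
layer_tables = {
    0: {0: "classification 2", 1: "classification 5"},
    1: {0: "classification 2"},
}

# Each geometry object carries the attributes FileId and LayerId
objects = [
    {"x": 1.0, "y": 2.0, "FileId": 0, "LayerId": 1},
    {"x": 3.0, "y": 4.0, "FileId": 0, "LayerId": 0},
    {"x": 5.0, "y": 6.0, "FileId": 1, "LayerId": 0},
]

def layer_entry(obj):
    """Look up the layer table entry via the file id and the layer id."""
    return layer_tables[obj["FileId"]][obj["LayerId"]]

def group_by_file(objs):
    """Group objects by their original import file, as needed for an export."""
    groups = defaultdict(list)
    for obj in objs:
        groups[file_table[obj["FileId"]]].append(obj)
    return dict(groups)

def filter_by_layer(objs, file_id, layer_id):
    """Select all objects belonging to one layer of one file."""
    return [o for o in objs if o["FileId"] == file_id and o["LayerId"] == layer_id]

per_file = group_by_file(objects)
```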

ODM predefined attributes

ODM predefined attributes (attributes with semantics) and their corresponding data types
Name Type Comment
Amplitude float Linear scale value proportional to the receiving power
Attribute unsigned byte
BeamVectorX float X-component of beam vector (from scanner to point)
BeamVectorY float Y-component of beam vector (from scanner to point)
BeamVectorZ float Z-component of beam vector (from scanner to point)
Blue unsigned short Blue colour channel
ChannelDesc unsigned byte
Classification unsigned byte See LAS spec
ClassificationFlags unsigned byte See LAS spec
Confidence unsigned byte
ConstrainedFlag bool
CrossSection float
CumulativeDistance float See Riegl POF Format Spec.
CurveParam float
CustomClassId unsigned byte
EchoNumber unsigned byte This is the k-th return/echo for a certain pulse, where for the first return: k==1 (see LAS spec.)
EchoRatio float
EchoWidth float Full width at half maximum [ns]
EchoWidthNormalised float
EdgeOfFlightLine bool
Estimator unsigned byte
FaceId unsigned integer triangle faces id
FileId unsigned short
GPSTime double
Green unsigned short Green colour channel
Id 64bit integer
InfraRed unsigned short Infrared (IR) colour channel
LASExtensions string
LayerId unsigned short
MarkedFlag bool
MaxCurvature float
MaxCurvatureDirection float
MinCurvature float
NormalEigenvalue1 float Highest eigenvalue of normal estimation covariance matrix
NormalEigenvalue2 float Second highest eigenvalue of normal estimation covariance matrix
NormalEigenvalue3 float Third highest eigenvalue of normal estimation covariance matrix
NormalEstimationMethod unsigned byte Normal estimator id (0...simple, 1...robust, 2...FMCD)
NormalLeftX float x value of normal vector left to a structure line
NormalLeftY float y value of normal vector left to a structure line
NormalLeftZ float z value of normal vector left to a structure line
NormalPlaneOffset float Offset from the current point to the estimated local plane
NormalPtsGiven unsigned byte Number of points given to the normal estimator
NormalPtsUsed unsigned byte Number of points used in the normal estimator
NormalRightX float x value of normal vector right to a structure line
NormalRightY float y value of normal vector right to a structure line
NormalRightZ float z value of normal vector right to a structure line
NormalSigma0 float Sigma0 of normal estimation
NormalX float x value of normal (unit) vector
NormalY float y value of normal (unit) vector
NormalZ float z value of normal (unit) vector
NormalizedZ float point height above DTM
NrOfEchos unsigned byte The pulse which this point is based on generated this number of returns/echoes (see LAS spec.)
PitchAngle float [radians] as defined by the aviation norm ARINC 705
PointCode string general alphanumeric point code
PointLabel string
PointSourceId unsigned short
RGIndex unsigned short
Range float
Red unsigned short Red colour channel
Reflectance float
Residual float
RollAngle float [radians] as defined by the aviation norm ARINC 705
ScanAngle float [radians] also: pan angle. Rotation of scanner head around primary axis of scanner (the only axis of 1-D scanners, for 2-D scanners: the pan angle)
ScanDirection bool
ScopSemantic integer
SegmentID unsigned integer unique segment id
SegmentPtsUsed integer Number of points contributing to a single segment
SigmaNormalFit float
SigmaX float
SigmaY float
SigmaZ float
SpreadAngle float
StructNr unsigned short
TangentSigmaX float
TangentSigmaY float
TangentSigmaZ float
TangentX float
TangentY float
TangentZ float
TiltAngle float [radians] Rotation of scanner head around secondary axis of scanner (undefined for 1-D scanners)
UltraViolet unsigned short Ultraviolet (UV) colour channel
UserData unsigned byte
VertexId unsigned integer
WaterDepth float Water depth of a submerged (bathymetric lidar) point
WinputCode unsigned short
YawAngle float [radians] as defined by the aviation norm ARINC 705

Datamanager Library

The OPALS distribution comes with an ODM API (Application Programming Interface) for C++ and Python. The library provides low-level access to ODMs, as it is used by the OPALS modules. For details please refer to the C++ Datamanager library (DM) or the Python bindings of the Datamanager library (pyDM).

References

Otepka, J., Briese, C., Nothegger, C., 2006, Symposium of ISPRS Commission IV - Geo Spatial Databases for Sustainable Development, 6 pages

Beckmann, N., Kriegel, H.-P., Schneider, R. and Seeger, B., 1990. The R*-tree: An efficient and robust access method for points and rectangles. In: H. Garcia-Molina and H. V. Jagadish (eds), Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, May 23-25, 1990, ACM Press, pp. 322-331.

Bentley, J. L., 1975. Multidimensional binary search trees used for associative searching. Communications of the ACM 18(9), pp. 509-517.