OPALS Software Concept

This document describes the overall OPALS software design and is divided into the following sections:

Overview

Since the mid-1990s, the Institute of Photogrammetry and Remote Sensing (I.P.F.) has been engaged in research and development in Airborne Laser Scanning (ALS). Scientific contributions have been made in a wide range of related topics such as full waveform signal analysis, georeferencing and filtering of ALS point clouds, automatic breakline modelling, DTM generation, and quality control. In addition, converting research ideas into software solutions is a long-standing tradition at the I.P.F., of which the DTM program SCOP++ is an example. Partial solutions to ALS-related problems have been implemented in SCOP++, but a complete processing chain is missing, as the development cycles for this highly interactive program are long.

Thus, the objectives of the new OPALS program system are to provide a complete processing chain for large ALS projects and to shorten development cycles significantly. OPALS is designed as a collection of small, well-defined modules which can be accessed in three different ways: (i) from DOS/Unix shells as executables, (ii) from Python shells as full-featured, platform-independent Python modules, or (iii) from custom C++ programs by dynamic linkage (DLL) for fastest module calls. Sophisticated custom processing chains can be established by freely combining the OPALS modules using shell or Python scripts. To reduce development times, a lightweight framework is introduced. It allows non-expert programmers to implement their own modules and to concentrate on the implementation of their latest research outcomes, whereas the framework deals with general issues like validation of user inputs, error handling, logging, etc. In this way, new research outcomes become available to the scientific community more rapidly. OPALS targets not only researchers, but also ALS service providers dealing with large ALS projects. Efficient data handling is a precondition for this purpose. Thus, the OPALS data manager (ODM) is one of the core units, allowing administration of data volumes on the order of 10^9 points. The ODM acts as a spatial cache and provides high-performance spatial queries.

In the following sections, the background concept of the OPALS program system is described.

ALS data processing

Fig. 1: Proposed ALS flowchart representing the basis for the OPALS software

Fig. 1 shows the proposed processing chain based on full waveform ALS data. The first section deals with the derivation of the 3D point cloud, starting with the raw observations. On the one hand, the flight path is determined by combining the observations of GNSS and IMU; on the other hand, analysis and radiometric calibration of the echo waveform is performed, resulting in additional echo attributes like echo width, amplitude, and backscatter cross-section. For each strip, a 3D point cloud is derived by means of direct georeferencing. At this stage of the workflow, high-level data administration becomes relevant for the first time, as subsequent quality control steps require efficient point data access, mainly on a per-strip basis. Quality checks include the verification of full data coverage and of compliance with minimum point densities based on density maps, as well as the examination of the strip registration precision using, e.g., colour coded strip difference maps. For the latter, digital surface models (DSM) of the ALS strips and the list of all overlapping strips are required, which again calls for a strip-based data management as mentioned above. For checking the absolute planar and vertical accuracy, external reference data is necessary. As a result of quality control, calibration of the measurement system and/or strip adjustment may become necessary, resulting in new 3D point clouds. Based on them, the quality control cycle is repeated until the desired quality criteria are met.
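The density-map check mentioned above can be sketched in a few lines of Python. This is a minimal illustration and not OPALS code: the function names density_map and cells_below, the cell size, the threshold, and the sample coordinates are all assumptions chosen for the example.

```python
from collections import Counter
from math import floor

def density_map(points, cell_size):
    """Count points per square cell; keys are (column, row) cell indices."""
    counts = Counter()
    for x, y, _z in points:
        counts[(floor(x / cell_size), floor(y / cell_size))] += 1
    # Convert the raw counts to points per square metre.
    area = cell_size * cell_size
    return {cell: n / area for cell, n in counts.items()}

def cells_below(density, min_density):
    """Return the cells whose point density violates the required minimum."""
    return sorted(cell for cell, d in density.items() if d < min_density)

# Hypothetical strip fragment: four points in one 1 m cell, one in its neighbour.
strip = [(0.1, 0.2, 10.0), (0.4, 0.9, 10.1), (0.7, 0.3, 10.2),
         (0.9, 0.8, 10.1), (1.5, 0.5, 10.3)]
density = density_map(strip, cell_size=1.0)
sparse = cells_below(density, min_density=2.0)
print(density)
print(sparse)
```

Cells without any points do not appear in the dictionary at all, so a full-coverage check would additionally need the expected project extent to detect data gaps.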

Once the final 3D point cloud is processed, a tile based or seamless data management supersedes the strip-wise administration, combining the point data (coordinates and echo attributes) of multiple ALS strips. The next major steps in the processing chain are the classification of each echo of the point cloud into either terrain or (different classes of) off-terrain points, often referred to as filtering, and the delineation of natural and artificial lineaments (breaklines and other structure lines). These processes mainly rely on geometric criteria (point height distribution of neighbouring points, angles between planar patches, etc.), but the echo attributes derived from the full waveform backscatter signal additionally help to improve the precision and reliability of the results. Thus, the data administration must provide both efficient spatial access (nearest neighbour and range queries) and flexible administration of arbitrary attributes. Finally, the DSM and the DTM, both among the most important products of ALS, are interpolated either as regular grids, hybrid grids considering breaklines, or triangular irregular networks (TIN), serving as the basis for many subsequent fields of application like building modelling, vegetation mapping and other forestry applications, hydraulic modelling for simulation of flood boundaries, and many more. A second quality control cycle (not shown in Fig. 1) for model compliance marks the last step of an ALS project.

Modern ALS sensors provide high point densities in the range of dozens of points per m² and, hence, the ALS processing software has to deal with billions of points even in a single ALS project. Efficient data management is, therefore, a precondition for successful project handling. File based data management, even using standard binary file formats like LAS or ESRI shapefile, only partially allows administration of additional attributes and does not provide the geometry indices essential for fast spatial data access. On the other hand, geo-relational database systems like PostgreSQL/PostGIS or Oracle Spatial offer full administration capabilities concerning both geometry and attributes, but the strength of these systems lies in long-term archiving and persistence rather than in the near-realtime data access necessary for ALS projects.
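A rough calculation illustrates how quickly such point counts arise. The density, project area, and record size below are illustrative assumptions, not figures taken from an actual OPALS project.

```python
# All figures below are illustrative assumptions, not OPALS measurements.
density_per_m2 = 20          # points per square metre
area_km2 = 100               # project area in square kilometres
bytes_per_point = 28         # assumed size of one stored point record

points = density_per_m2 * area_km2 * 1_000_000   # 1 km^2 = 10^6 m^2
size_gib = points * bytes_per_point / 2**30

print(f"{points:.1e} points, about {size_gib:.0f} GiB")
```

Under these assumptions, a project of moderate size already reaches 2·10^9 points and roughly 50 GiB of raw point records, before any additional echo attributes are stored.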

OPALS Software Concept

The main objectives of the OPALS software can be formulated as follows:

  • complete processing chain from raw data to various products, e.g. DTM
  • automatic work flow for huge data volumes
  • rapid availability of recent research outcomes as software modules
  • platform for sustainable scientific development beyond the duration of a PhD

To achieve these objectives, the basic concepts for the OPALS software are:

  • modular design based on small components
  • accessibility of modules as command line executables, Python modules, and via C++ API
  • individual process control via scripts
  • data administration based on the OPALS data manager
  • interfaces for efficient data exchange with DTM, GIS and visualisation software
  • avoidance of interactivity as far as possible

The OPALS program system is mainly designed for automatic processing. Thus, a sophisticated graphical user interface and interactive editing steps are omitted deliberately. This may seem disadvantageous at first glance. However, OPALS is split into small, well-defined modules that may be combined freely, resulting in flexible, custom processing chains. For instance, the derivation of a hypsometric map of a single ALS flight strip is achieved using three different modules: import of strip point data, DTM grid interpolation and derivation of the colour coded rastermap, facilitating the re-usability of the respective components in different application environments. To further ease this combination, OPALS modules can be accessed in three different ways: (i) from command prompts / Unix or Linux shells as executables, (ii) from Python shells using platform-independent Python code, and (iii) from custom C++ programs by dynamic linkage (DLL). The latter allows experienced users direct embedding of OPALS components in their own C++ programming environment. By contrast, the former two options (stand-alone executable and Python API) allow combining OPALS modules in either Unix/Batch or Python scripts. Scripting is a powerful instrument, as it enables the construction of complex, custom processing chains by freely combining OPALS modules.

As pointed out before, efficient data management is regarded as crucial. Thus, the OPALS Data Manager (ODM) was developed, featuring high-performance spatial queries of point and line data even for large project areas. The ODM acts as a spatial cache combining the simplicity and efficiency of file based processing with the flexibility and expandability of database systems. An independent import module is provided to read data in arbitrary data formats, or even to extract data from spatial databases, and to build up the ODM data structure. Subsequent application modules take an ODM file as input and have access to the coordinate and echo attribute information via an internal ODM library. Analogously, an export module is available for converting data stored in the ODM back to a series of supported file formats and databases. Because interactivity is deliberately omitted, high-performance interfaces to external program systems like DTM, GIS, editing or visualisation programs are provided. In this regard, OPALS makes intensive use of open source solutions like the geodata abstraction library GDAL for accessing grid data and the OGR simple feature library (a subset of GDAL) for interchange of point and line related data. On the one hand, this provides conformance of OPALS to the specifications of the Open Geospatial Consortium (OGC); on the other hand, it facilitates data exchange with software systems like GRASS, which use the same technology.

Apart from a complete processing chain and the ability to handle huge data volumes, another main goal of OPALS is to shorten the time span for transforming research results into software modules. This is mainly achieved by using a light-weight framework that allows non-expert programmers to concentrate on the actual research problem, whereas the framework deals with general programming issues like validation of user inputs, error handling, logging, etc.

In the following sections the OPALS framework and the basic concepts of the OPALS data manager are explained in more detail.

OPALS framework

Every OPALS module is designed to take some input data, apply a set of algorithms considering certain parameters, and finally produce some output. This involves numerous recurring, common tasks like interface definition, user input validation, error handling, logging, progress control, licensing, and the like. In a machine-oriented programming language like C++, these parts become voluminous, while the actual algorithm to be implemented may represent only a small fraction of the entire code. To relieve module programmers of all these matters, the light-weight OPALS software framework was set up.

On the one hand, the implementation of research outcomes by the researchers themselves facilitates rapid public applicability. On the other hand, such compact implementations are prone to reflect individual programming styles concerning naming conventions, log file layout, etc. in their interface. However, a uniform behaviour and look-and-feel of the individual modules is essential to ensure module interoperability and short training periods for users, and is therefore encouraged by the framework.

As mentioned above, the definition of interfaces and the validation of user inputs are core features of the OPALS framework. For each parameter, module programmers only need to specify a parameter descriptor, the respective value type, the optionality, and a help text. Parameters belong to one of four classes of optionality: (i) mandatory, (ii) estimable, (iii) ignorable, and (iv) aborting (e.g. 'help'). A parameter for a general input file may serve as a representative example:

( inputfile, opals::String, 0, "input file name" )

This parameter is specified by an intuitive name (inputfile), it accepts string-type values, it is mandatory, as indicated by the optionality value 0, and its meaning is further explained by a help text. Based on this generic parameter description, the framework performs a series of uniform tasks. First, separate get-, set-, and isSet- functions are created for each parameter, considering its name and data type. This is achieved using preprocessor macros, which result in the following, automatically generated code:

opals::String get_inputfile() const;
void set_inputfile ( const opals::String &inFile );
bool isSet_inputfile() const;

Furthermore, the implementations as executable, Python module and shared library with C++ API are defined by the framework, again based on macros. Finally, the framework provides uniform functions for writing log files in a clear XML structure and an error handling system based on exceptions. An example module is described in more detail in section Module example.
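The idea of generating accessor functions from a generic parameter description can be mimicked in Python. The following is a minimal sketch of the principle only: OPALS itself uses C++ preprocessor macros, and the names PARAMETERS, add_accessors and ExampleModule as well as the type check in the setter are assumptions made for this illustration.

```python
# Hypothetical Python analogue of the macro-based accessor generation.
# The optionality code 0 (mandatory) is taken from the text; everything
# else (class and helper names, the type check) is an assumption.

PARAMETERS = [
    # (name, value type, optionality, help text)
    ("inputfile", str, 0, "input file name"),
]

def add_accessors(cls, name, value_type):
    """Attach get_<name>, set_<name> and isSet_<name> methods to cls."""
    def getter(self):
        return self._values[name]

    def setter(self, value):
        # Uniform input validation: reject values of the wrong type.
        if not isinstance(value, value_type):
            raise TypeError(f"{name} expects {value_type.__name__}")
        self._values[name] = value

    def is_set(self):
        return name in self._values

    setattr(cls, f"get_{name}", getter)
    setattr(cls, f"set_{name}", setter)
    setattr(cls, f"isSet_{name}", is_set)

class ExampleModule:
    def __init__(self):
        self._values = {}

for param_name, param_type, _optionality, _help in PARAMETERS:
    add_accessors(ExampleModule, param_name, param_type)

module = ExampleModule()
before = module.isSet_inputfile()
module.set_inputfile("strip1.odm")
print(before, module.isSet_inputfile(), module.get_inputfile())
```

As in the macro-based approach, the module author only maintains the parameter list; adding a second entry to PARAMETERS would yield three further accessors without any hand-written code.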

OPALS data manager

Compared to the flexibility and the generality of spatial databases, applications covered by OPALS require only a limited set of spatial operations onto the original data. For example, window and nearest neighbour queries are key operations for DTM interpolation, normal estimation, segmentation and similar tasks. Considering the properties of the source data and queries which have to be supported, the OPALS data manager (ODM) was designed and implemented to achieve maximum performance. The ODM operates as a spatial cache on top of either a low-level file based data administration or a spatial geodatabase running in the background.

The ODM stores point data in a K-d tree, a generalisation of a binary search tree (Bentley, 1975), and more complex geometries in an R*-tree (Beckmann et al., 1990). The K-d tree is an extremely fast spatial indexing method. This static indexing structure only supports point data, but its speed is outstanding, which is why the disadvantage of two separate spatial indices and the limited support for insertion/deletion was accepted. Both indices are wrapped in the ODM as if all geometry data were managed in a single spatial index structure. Both indexing methods have to be thread safe, as multiple processing threads may access and modify the data manager simultaneously. Huge ALS projects can easily exceed the memory of today's computers. Hence, it was necessary to develop an extended K-d tree which swaps unneeded data to disk in an efficient manner. To this end, the overall data area is split into tiles. The point data of each tile are then indexed by one K-d tree. An intelligent stacking system guarantees a low degree of data swapping, which is crucial for the overall system performance. More details about point data administration using multiple K-d trees can be found in (Otepka et al., 2006).
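For readers unfamiliar with K-d trees, the following Python sketch shows the basic principle of a static 2D K-d tree with a nearest neighbour query. It is a textbook-style illustration held entirely in memory, not the ODM implementation: tiling, swapping to disk, and thread safety are not covered, and all function names are assumptions.

```python
import random

def build_kdtree(points, depth=0):
    """Recursively build a static 2D k-d tree as nested tuples."""
    if not points:
        return None
    axis = depth % 2                      # alternate between x and y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median point becomes the node
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    """Squared 2D Euclidean distance."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest(node, query, depth=0, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, query) < sq_dist(best, query):
        best = point
    axis = depth % 2
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting line is closer than the best hit.
    if diff * diff < sq_dist(best, query):
        best = nearest(far, query, depth + 1, best)
    return best

random.seed(0)
cloud = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(1000)]
tree = build_kdtree(cloud)
query = (50.0, 50.0)
hit = nearest(tree, query)
print(hit)
```

Because the tree is built once from the complete point set, later insertions or deletions would require a rebuild, which corresponds to the static character of the index noted above.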

Apart from fast spatial queries, the ODM also provides an administration scheme for storing attributes of arbitrary number and data type on a per-point basis. The additional point attributes may either stem from the initial analysis of the full waveform signal (e.g. echo width, amplitude, etc.) or be calculated by one of the OPALS modules. The three components (nx, ny, nz) of the surface normal vector, for instance, may be calculated for each ALS point in a separate module and stored as additional information to be used by a different module dealing with segmentation of surface elements. Thus, the additional information system is highly dynamic and can, therefore, be used to communicate information between different modules without the need for external storage of attributes.
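The exchange of attributes between modules can be pictured with a small Python stand-in. The class PointStore and its methods are hypothetical and do not reflect the ODM interface; the coordinates and attribute values are made up for the example.

```python
class PointStore:
    """Hypothetical stand-in for per-point attribute storage (not the ODM API)."""

    def __init__(self, coordinates):
        self.coordinates = list(coordinates)
        self.attributes = {}              # attribute name -> list of values

    def add_attribute(self, name, values):
        values = list(values)
        if len(values) != len(self.coordinates):
            raise ValueError("one value per point is required")
        self.attributes[name] = values

    def get_attribute(self, name):
        return self.attributes[name]

store = PointStore([(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.5)])

# Attribute delivered with the data, e.g. from full waveform analysis.
store.add_attribute("amplitude", [120.0, 95.5, 101.2])

# A first "module" computes normals and stores them as new attributes ...
store.add_attribute("nx", [0.0, 0.0, 0.0])
store.add_attribute("ny", [0.0, 0.0, -0.447])
store.add_attribute("nz", [1.0, 1.0, 0.894])

# ... and a second "module" reads them, here to pick near-horizontal points.
horizontal = [i for i, nz in enumerate(store.get_attribute("nz")) if nz > 0.99]
print(horizontal)
```

The point of the sketch is that attribute names are not fixed in advance: any module may add a new column, and later modules look it up by name.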

Module example

Using Module Grid as a concrete example, this section gives more detailed background information about OPALS internals. It is primarily developer-oriented documentation, but may also be of interest to OPALS users, as it gives a brief overview of how to embed OPALS modules in a scripting environment.

The scope of Module Grid is to derive a regular grid in GDAL-supported format, based on an ODM file. Simple interpolation techniques like moving-planes are applied on the basis of n nearest neighbours. For the sake of clarity, only the five most important parameters are shown in this example, whereas the real Module Grid features some more.
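The principle of moving-planes interpolation can be sketched as follows: for each grid point, a plane is fitted by least squares to the n nearest points, and the plane height at the grid point is taken as the grid value. The Python code below is a simplified illustration under stated assumptions (unweighted fit, brute-force neighbour search, hypothetical function names); the actual Module Grid implementation may differ.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def moving_plane_height(points, x0, y0, neighbours=8):
    """Fit z = a + b*dx + c*dy to the nearest points; return a, the height at (x0, y0)."""
    nearest = sorted(points, key=lambda p: (p[0] - x0) ** 2 + (p[1] - y0) ** 2)
    nearest = nearest[:neighbours]
    # Accumulate the normal equations N * (a, b, c) = r of the least squares fit.
    n = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for x, y, z in nearest:
        row = (1.0, x - x0, y - y0)
        for i in range(3):
            r[i] += row[i] * z
            for j in range(3):
                n[i][j] += row[i] * row[j]
    d = det3(n)
    if abs(d) < 1e-12:
        raise ValueError("neighbours do not define a plane")
    # Cramer's rule for the first unknown: replace column 0 of N by r.
    n_a = [[r[i], n[i][1], n[i][2]] for i in range(3)]
    return det3(n_a) / d

# Synthetic points sampled from the plane z = 2 + 0.5*x - 0.25*y.
samples = [(float(x), float(y), 2 + 0.5 * x - 0.25 * y)
           for x in range(5) for y in range(5)]
height = moving_plane_height(samples, 1.3, 2.6)
print(round(height, 6))
```

Since the coordinates are reduced to the grid point, the constant term of the fitted plane is directly the interpolated height. In a real processing chain, the sorted() call would be replaced by a nearest neighbour query against a spatial index.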

As pointed out in section OPALS framework, the module programmer basically has to provide the generic parameter description and the implementation of the runModule_() function. The following parameter list is used (namespace opals was omitted):

(( inFile, Path, 0, "input ODM file name" )) \
(( outFile, Vector<Path>, 1, "output gridfile name" )) \
(( gridSize, double, 2, "model grid width" )) \
(( interpolation, GridInterpolator::Type, 2, "interpolation method" )) \
(( neighbours, int, 2, "nr of nearest neighbours")) \

The OPALS framework automatically creates the C++ code for the command line executable as well as the C++/Python API for Module Grid. The following usage screen appears when entering opalsGrid within the command prompt without any further parameters:

Usage opalsGrid:
--inFile Path input ODM file name
--outFile Vector<Path> (=estim) output gridfile name
--gridSize double (=1) model grid width
--interpolation GridInterpolator (=movingPlanes) interpolation method
--neighbours int (=8) nr of nearest neighbours

Below the header, one line is printed for each parameter. Each line starts with the parameter descriptor and the parameter type, followed by a value, if assigned. Estimable parameter values are indicated by '(=estim)', while constant default values are shown in round brackets. Mandatory parameters lack these indications. At the end of each parameter line, an explanatory text is output. OPALS makes extensive use of the open source boost C++ libraries. For the executables, boost::program_options is applied to parse and store the command line parameters. In the following code snippet, the C++ class declaration of class ModuleGrid is shown. The class is derived from ModuleBase (base functionality for all modules) and from opals::IGrid, the virtual class (interface) that is visible to the C++ dynamic linkage libraries. The C++ interfaces are fully self-contained and therefore do not make use of boost or standard template library classes.

class ModuleGrid : public ModuleBase, public opals::IGrid
{
public:
// Constructors and Destructor
ModuleGrid();
ModuleGrid(const ModuleGrid &ref);
virtual ~ModuleGrid();
// set parameters
virtual void set_inFile ( const Path &inFile );
virtual void set_outFile ( const Vector<Path> &outFile );
virtual void set_gridSize ( const double &gridSize );
virtual void set_interpolation ( const GridInterpolator::Type &interpol );
virtual void set_neighbours ( const int &neighbours );
// query if parameters are set
virtual bool isSet_inFile() const;
virtual bool isSet_outFile() const;
...
// get parameter value functions
virtual Path get_inFile() const;
...
virtual int get_neighbours() const;
protected:
virtual void finalizeModuleInput_();
virtual void runModule_();
void estimate_outFile();
};

This class declaration is also the basis for the C++ API DLL and the Python interface, where the latter is exported using boost::python. Please note that the declaration and the definition (not shown here) of all access and query functions (set parameter, get parameter, isSet) were created automatically by the OPALS framework. The module programmer is only responsible for implementing the finalizeModuleInput_() function, which verifies the integrity of the parameter settings (cross dependencies of parameters), and the actual runModule_() function. An estimate_outFile() function also appears in the declaration due to the optionality value 1 of parameter outFile, which indicates a parameter whose value may be estimated (in this case based on the input file name). For all remaining parameters with optionality value 2, an appropriate constant default value exists (cf. usage screen).
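How the three optionality values of the Module Grid parameter list could be resolved into a complete parameter set is sketched below in Python. The function and constant names are hypothetical, and the rule used to estimate the output file name (input name with a .tif extension) is an assumption for illustration; the text only states that the estimate is based on the input file name.

```python
import os

MANDATORY, ESTIMABLE, DEFAULTED = 0, 1, 2   # optionality codes used in the list above

# (name, optionality, constant default); defaults copied from the usage screen.
GRID_PARAMETERS = [
    ("inFile", MANDATORY, None),
    ("outFile", ESTIMABLE, None),
    ("gridSize", DEFAULTED, 1.0),
    ("interpolation", DEFAULTED, "movingPlanes"),
    ("neighbours", DEFAULTED, 8),
]

def estimate_out_file(values):
    # Assumed rule for illustration only: reuse the input name with a .tif extension.
    return os.path.splitext(values["inFile"])[0] + ".tif"

ESTIMATORS = {"outFile": estimate_out_file}

def resolve(user_values):
    """Complete the user input with estimated and default values."""
    values = dict(user_values)
    for name, optionality, default in GRID_PARAMETERS:
        if name in values:
            continue
        if optionality == MANDATORY:
            raise ValueError(f"mandatory parameter '{name}' is missing")
        if optionality == ESTIMABLE:
            values[name] = ESTIMATORS[name](values)
        else:
            values[name] = default
    return values

resolved = resolve({"inFile": "strip1.odm", "gridSize": 0.5})
print(resolved)
```

Values supplied by the user always take precedence; estimation and defaults only fill the gaps, and a missing mandatory parameter stops processing before the module runs.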

The following snippets demonstrate how OPALS modules can be applied in scripts, beginning with a simple Batch-file:

@echo off
rem +++ Simple opalsGrid Batch Example
echo Running opalsGrid...
call opalsGrid -inf=strip1.odm -out=strip1-dtm.tif -grid=0.5
echo Done!

In this example, a regular 0.5 m grid is derived and stored in GeoTIFF file format (due to the output file extension .tif). The example shows that command line parameters can be abbreviated (e.g. grid=0.5), and that parameters may be omitted if appropriate default values are available, as for parameters interpolation (default: moving planes) and neighbours (default: 8). The final code snippet shows the same example embedded in a Python script:

#Simple Module Grid Python Example
#+++++++++++++++++++++++++++++++
import opals
from opals import Grid
mygrid = Grid.Grid()
print("Running Module Grid...")
mygrid.set_inFile('strip1.odm')
mygrid.set_outFile('strip1-dtm.tif')
mygrid.set_gridSize(0.5)   # 0.5 m grid, as in the batch example
mygrid.run()
print("Done!")
#+++++++++++++++++++++++++++++++

More information about embedding OPALS modules in scripting environments can be found in section Using OPALS in a Scripting Environment.

References

If you use the program system OPALS in a scientific project, please cite one of the listed I.P.F. references in your paper.

Additional Web references

ASPRS LAS file format

Boost C++ Libraries

ESRI Shapefile Technical Description

Geospatial Data Abstraction Library

Grass GIS

Open Geospatial Consortium

Oracle Spatial

PostGIS