Fundamental Data Serving Concepts

Producer Field Guide

Rasters and vectors are the two basic data structures for storing and manipulating geographic information. The raster model stores data as cell values in a georeferenced grid. The vector model represents the information using geometric and mathematical associations.

Features and Feature Types

According to the Open Geospatial Consortium (OGC): "A feature is an abstraction of a real world phenomenon; it is a geographic feature if it is associated with a location relative to the Earth."

For more information see OpenGeospatial.org.

Features are the standard representation of vector datasets, which handle geographic information as a set of vector information. Features are also grouped into classes called feature types. Each feature type represents a class of real-world objects. For example, all road features are held in a feature type called RoadType.

Figure: Features and Feature Types

The feature type presents the syntactical description of the features as a set of property definitions, each of which can be simple, complex, or geometric. This representation allows the definition of arbitrarily complex data models and provides enough constraints to allow interoperability.

Feature types also carry part of the semantic definition of the features. By sharing feature type and property definitions, communities can develop networks of geographic information.
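The relationship between a feature type and its features can be sketched in Python. This is a minimal illustration only: the `PropertyDef` and `FeatureType` classes and the property names `name`, `lanes`, and `centerline` are hypothetical and are not part of any OGC schema or product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyDef:
    """One property definition of a feature type (hypothetical structure)."""
    name: str
    kind: str  # "simple", "complex", or "geometric"


@dataclass
class FeatureType:
    """A class of real-world objects described by a set of property definitions."""
    name: str
    properties: list

    def validate(self, feature: dict) -> bool:
        """Return True if the feature carries exactly the defined properties."""
        return set(feature) == {p.name for p in self.properties}


# Assumed example: a RoadType with two simple properties and one geometric property.
road_type = FeatureType("RoadType", [
    PropertyDef("name", "simple"),
    PropertyDef("lanes", "simple"),
    PropertyDef("centerline", "geometric"),
])

road = {"name": "Main Street", "lanes": 2, "centerline": [(0.0, 0.0), (1.0, 1.5)]}
```

Here `road_type.validate(road)` succeeds because the feature supplies every property the type defines and nothing else, which is the kind of constraint that makes shared feature types interoperable.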

Rasters

The raster model stores data by representing geographic information as cell values in a grid. All of the cells on the grid are identified by a unique coordinate location and a value. The raster model is particularly useful for working with continuous forms of features such as soil types or vegetation coverage.

Traditional images are examples of raster images, but the raster concept can be extended to include more complex information such as satellite imagery and radar measurements. In this case, each raster cell is filled with a set of values where each one represents a physical property measured at the location of the cell.
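As a rough sketch of the raster model described above, the following Python class stores a georeferenced grid in which each cell holds a set of values. The class name, the upper-left origin convention, the square cells, and the sample numbers are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Raster:
    origin_x: float   # x coordinate of the upper-left corner (assumed convention)
    origin_y: float   # y coordinate of the upper-left corner (assumed convention)
    cell_size: float  # width and height of one square cell
    cells: list       # rows of cells; each cell is a tuple of measured values

    def cell_at(self, x: float, y: float):
        """Return the values of the cell that contains the coordinate (x, y)."""
        col = math.floor((x - self.origin_x) / self.cell_size)
        row = math.floor((self.origin_y - y) / self.cell_size)
        if not (0 <= row < len(self.cells) and 0 <= col < len(self.cells[row])):
            raise ValueError("coordinate lies outside the raster")
        return self.cells[row][col]


# Two rows by two columns; each cell holds two hypothetical measurements.
raster = Raster(origin_x=100.0, origin_y=200.0, cell_size=10.0, cells=[
    [(12, 0.4), (15, 0.5)],
    [(11, 0.3), (18, 0.7)],
])
```

For example, `raster.cell_at(115.0, 185.0)` falls in the second row and second column and returns that cell's pair of values.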

Another common type of raster is the result of the portrayal process, which transforms vector information into a raster representation. See Portrayal and Layers for details.

Coverages

Coverages are digital geospatial information comprised of regularly spaced locations along the first, second, or third axes of a spatial coordinate reference system representing space-varying phenomena. A coverage with a homogeneous range set defines at each location in the domain either a single (scalar) value, for example, elevation; or a series (array / tensor) of values all defined in the same way, such as brightness values in different parts of the electromagnetic spectrum.

Fundamentally, coverages (and images) provide an n-dimensional (where n is usually 2, and occasionally 3 or higher) "view" of some (usually more complex) space of geographic features. ERDAS APOLLO sets the "view" to be geospatially registered to the Earth.

A coverage is a function from a spatiotemporal domain set to a range of values (observations). In the figure below, if x1, ..., xj are the j spatiotemporal coordinates (for example x,y,z,t or x,y,t), the coverage value attached to each spatiotemporal position is an i-dimensional vector y1, y2, ..., yi, where y1, y2, ..., yi are functions of the spatiotemporal coordinates.

Figure: Coverage Domain

The coverage value may be a scalar (numeric or text) value, such as population density, or a compound (vector or tensor) value, such as incomes by race or radiances by wavelength. The range axis descriptions are used for compound observations; they describe additional parameters, such as an independent variable besides space and time, and the valid values of this parameter. These values are used to select subsets of coverage data, similar to spatiotemporal subsetting.

In the case of a coverage describing the ground temperature in Europe during the year 2003, the coverage value is a 1-dimensional vector (i = 1, the temperature) function of the 3-dimensional spatiotemporal coordinates (j = 3, latitude, longitude, time). No range axis is needed to describe the coverage value because it is a scalar value.
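The temperature example can be written as a small Python sketch in which the coverage is literally a function from a position to a value. The sample positions, the use of day of year as the time coordinate, and the temperature values are invented for illustration.

```python
from typing import Callable, Dict, Tuple

# A position in the domain: (latitude, longitude, day of year), so j = 3.
Position = Tuple[float, float, int]


def make_coverage(samples: Dict[Position, float]) -> Callable[[Position], float]:
    """Build a coverage function that maps each known position to its value."""
    def evaluate(position: Position) -> float:
        # The range is a single scalar (i = 1), so no range axis is involved.
        return samples[position]
    return evaluate


# Hypothetical ground temperatures in degrees Celsius.
temperature = make_coverage({
    (50.0, 4.0, 1): 2.5,
    (50.0, 4.0, 182): 19.0,
    (48.0, 2.0, 1): 4.0,
})
```

Calling `temperature((50.0, 4.0, 182))` evaluates the coverage at one latitude, longitude, and time and yields one scalar.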

Figure: Scalar Observable - No Range Axis Needed

Examples of compound observations include a multispectral radiance, such as brightness by wavelength (typical of satellite imagery), age distribution (counts of people by age brackets in a census table), or climate pattern (mean rainfall by month of the year in a climate database). In these cases, the Range Axis is needed to describe the ordinal values.
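A compound observation and its range axis can be sketched as follows. The wavelengths, the radiance values, and the `subset_by_range` helper are hypothetical; the point is only that the range axis names the valid parameter values and can be used to select a subset, in the same way a spatial extent selects positions.

```python
# Hypothetical range axis: wavelengths in nanometres.
range_axis = [450, 550, 650, 850]

# Coverage value at each grid position: one radiance per range-axis value.
coverage = {
    (0, 0): [0.11, 0.20, 0.18, 0.45],
    (0, 1): [0.10, 0.22, 0.17, 0.50],
}


def subset_by_range(coverage, range_axis, wanted):
    """Keep only the observations whose range-axis value is in `wanted`."""
    indexes = [range_axis.index(value) for value in wanted]
    return {
        position: [values[i] for i in indexes]
        for position, values in coverage.items()
    }


# Select the 650 nm and 850 nm observations at every position.
red_nir = subset_by_range(coverage, range_axis, [650, 850])
```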

Figure: Spectral Response Observable - Range Axis Used

A coverage is a special case or a subtype of a feature. The following figure demonstrates that features with geometry and coverages are two subtypes of the supertype feature. Other feature subtypes may not be directly associated with any geometry at all.

Figure: Coverage as a Feature Subtype

Coverage Subtypes

The coverage type itself has many important subtypes, such as Image, Grid Coverage, Surface Coverage, Discrete Point Coverage, Line String Coverage, TIN Coverage, Polyhedral Surface Coverage, Nearest Neighbor and Lost Area Coverage, Segmented Line Coverage, and Geometry Coverages. This structure is illustrated in the following figure.

Figure: Coverage Subtypes

The OGC WCS specification, version 1.0.0, supports only the grid coverage type.

The grid coverage defines its domain as a regular grid of points or cells in 2, 3, or 4 dimensions. This description is suitable for digital airphotos, satellite imagery, gridded terrain models, or any other raster data.

Grid Coverage Characteristics

Grid coverages have the following characteristics:

  • Variable number of bits (1, 2, 4, 8, 16, 32, or 64 bits) per grid value: unsigned integer, signed integer, and real
  • 1 to N bands
  • 1 to N dimensions
  • For grids with multiple bands, band values can be ordered by dimension. For example, a 2D grid coverage can be ordered by row-column-band (pixel interleaved), by row-band-column (line interleaved), or by band-row-column (band sequential).
  • Support for a variable number of "no data values"
  • Various color models are supported: gray scale, pseudocolor (any bit depth), RGB, CMYK and HSL

A grid coverage has a grid coordinate system that allows for addressing individual grid cells that are centered on the grid points. A grid has an ordering of cell values with the first cell in this ordering having grid coordinates of 0, 0. For example, a two-dimensional grid coverage with 512 rows and 512 columns would have grid coordinates with a range from rows 0 to 511 and columns 0 to 511.
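The band orderings and the zero-based grid coordinates described above can be illustrated by computing where a given cell value sits in a flat sequence of values. The function name and the `order` keywords are assumptions for this sketch; it does not describe how any particular file format or product stores its data.

```python
def cell_offset(row, col, band, rows, cols, bands, order):
    """Return the position of one band value in a flat, zero-based sequence.

    order is "pixel" (row-column-band), "line" (row-band-column),
    or "band" (band-row-column).
    """
    if not (0 <= row < rows and 0 <= col < cols and 0 <= band < bands):
        raise IndexError("grid coordinates out of range")
    if order == "pixel":   # pixel interleaved
        return (row * cols + col) * bands + band
    if order == "line":    # line interleaved
        return (row * bands + band) * cols + col
    if order == "band":    # band sequential
        return (band * rows + row) * cols + col
    raise ValueError("order must be 'pixel', 'line', or 'band'")


# A 512 x 512 grid with three bands: rows and columns both run from 0 to 511.
ROWS, COLS, BANDS = 512, 512, 3
last = {o: cell_offset(511, 511, 2, ROWS, COLS, BANDS, o) for o in ("pixel", "line", "band")}
```

In every ordering the first cell (0, 0) of band 0 is at offset 0 and the last value is at offset 512 × 512 × 3 − 1; only the positions in between differ.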

Portrayal and Layers

The portrayal process transforms geographic information into a form easily understandable by humans. A common example is the transformation of vector-based information into a raster representation of this information. This transformation is specified by a set of rules applied to the input data sets. This process improves the use of geographic information for decision making.
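A deliberately small sketch of vector-to-raster portrayal is shown below. It assumes that features are simple polygons, that the rules map a feature type name to a one-character symbol, and that a cell is drawn when its center lies inside the polygon. Real portrayal engines use far richer rules and rendering; the names `portray`, `rules`, and `LakeType` are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def portray(features, rules, rows, cols, background="."):
    """Apply the rules to each feature and return the raster as text rows."""
    grid = [[background] * cols for _ in range(rows)]
    for feature_type, polygon in features:
        symbol = rules[feature_type]  # the rule selects how this type is drawn
        for row in range(rows):
            for col in range(cols):
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    grid[row][col] = symbol
    return ["".join(cells) for cells in grid]


features = [("LakeType", [(1, 1), (4, 1), (4, 3), (1, 3)])]
rules = {"LakeType": "~"}
image = portray(features, rules, rows=4, cols=5)
```

The same features portrayed with a different rule set would produce a different raster, which is why the rules are kept separate from the data.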

Figure: Portrayal Process

The vector-to-raster transformation is not the only possibility. The portrayal may also present the information as a set of reports, a collection of statistics, or any knowledge that can be extracted from the available geographic information.

The final step is to present the user with an integrated view of multiple sources, giving them the ability to make the right decision. This operation consists of overlaying a set of information layers, combining native rasters with portrayed vector information. This simple operation may become very difficult given the variety of data formats and spatial reference systems.

Figure: Raster and Vector Layers

Spatial Reference Systems (SRS)

In the process of mapmaking, map projections are needed to portray the real surface of the Earth on a flat surface. The surface of the Earth is first approximated by a geoid. The geoid is a surface defined as the locus of all points with equal gravity potential, chosen to coincide with mean sea level. Due to the irregular mass distribution in the Earth's interior, the geoid has an irregular shape, which makes it unsuitable for calculations on spatial data. For this reason, the geoid is approximated by the nearest regular body, a spheroid, which is often also referred to as an ellipsoid. The ellipsoid is much easier to work with mathematically than the geoid. It forms the basis of the best-known type of coordinate reference system: the Geographic SRS. The position of a point relative to the ellipsoid is then expressed by means of geographic coordinates: geodetic latitude and geodetic longitude.

Unfortunately, there is not just one ellipsoid that represents the Earth. The size and shape of the ellipsoid are traditionally chosen so that it matches the surface of the geoid as closely as possible in the region of the Earth from which the data was taken. That choice results in the definition of the origin, orientation, size, and shape of the ellipsoid. This concept is called a geodetic datum.

See Spheroids for more information.

A Geographic SRS is still not suitable for mapmaking, because it describes geometry on a curved surface. It is impossible to represent such geometry in a Euclidean plane without introducing distortions. The control of those distortions is part of the science of map projections. A map projection is a set of formulae that converts the geodetic latitude and longitude to plane map coordinates. The Spatial Reference System, as defined by OGC, is a text parameter that names a horizontal coordinate reference system code.
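To make "a set of formulae" concrete, the sketch below implements the spherical Mercator projection. It is a simplification chosen for brevity: it treats the Earth as a sphere rather than an ellipsoid, and the radius used (the WGS 84 semi-major axis, 6378137 m) is an assumption of this example, not something prescribed by the text above.

```python
import math

# Assumed sphere radius in metres (WGS 84 semi-major axis).
EARTH_RADIUS_M = 6378137.0


def spherical_mercator(lat_deg, lon_deg):
    """Convert latitude and longitude in degrees to plane coordinates in metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


x_edge, y_equator = spherical_mercator(0.0, 180.0)
_, y_60 = spherical_mercator(60.0, 0.0)
```

The distortion mentioned above is visible in the result: at 60 degrees latitude the projected y value is noticeably larger than the true arc length along the meridian, `EARTH_RADIUS_M * math.radians(60.0)`.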

The OGC Web Map Service (WMS) specification mentions two namespaces: EPSG and AUTO. The EPSG namespace makes use of the European Petroleum Survey Group tables [EPSG], which define numeric identifiers for many common map projections and associate projection or coordinate metadata (such as measurement units or central meridian) with each identifier. The AUTO namespace is used for automatic projections; that is, for a class of projections that include an arbitrary center of projection.
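A minimal sketch of splitting an SRS parameter into its namespace and code follows. It assumes the form `namespace:code`, optionally followed by comma-separated numeric parameters for AUTO projections; the exact meaning and order of those parameters should be taken from the WMS specification, and the function name `parse_srs` is hypothetical.

```python
def parse_srs(srs):
    """Split an SRS string into (namespace, numeric code, extra parameters)."""
    namespace, separator, rest = srs.partition(":")
    if not separator or not rest:
        raise ValueError("expected the form 'namespace:code'")
    namespace = namespace.upper()
    if namespace not in ("EPSG", "AUTO"):
        raise ValueError("unsupported namespace: " + namespace)
    parts = rest.split(",")
    code = int(parts[0])
    # Any further values are treated as numeric projection parameters (assumption).
    params = [float(p) for p in parts[1:]]
    return namespace, code, params


geographic = parse_srs("EPSG:4326")
automatic = parse_srs("AUTO:42003,9001,-100,45")  # illustrative values only
```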

For more information see EPSG Coordinate Systems.

For more information see the OGC Abstract Specification at opengeospatial.org.

Bounding Box

The bounding box specifies the extent of the geographic area that you wish to display by using coordinate values of the chosen SRS.

According to the OGC WMS specifications, the bounding box (BBOX) is a set of four comma-separated decimal, scientific notation, or integer values (if integers are provided where floating point is needed, the decimal point is assumed at the end of the number). These values specify the minimum X, minimum Y, maximum X, and maximum Y ranges (in that order), expressed in units of the SRS of the request, such that a rectangular area is defined in those units. A bounding box should not have zero area.
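The BBOX rules above can be sketched as a small parser. The function name `parse_bbox` and the decision to reject a degenerate box with an error are assumptions of this example; a server may choose to report the problem differently.

```python
def parse_bbox(bbox):
    """Parse 'minx,miny,maxx,maxy' into four floats and check the extent."""
    parts = bbox.split(",")
    if len(parts) != 4:
        raise ValueError("BBOX needs exactly four comma-separated values")
    # float() accepts integer, decimal, and scientific notation alike.
    min_x, min_y, max_x, max_y = (float(p) for p in parts)
    if min_x >= max_x or min_y >= max_y:
        raise ValueError("BBOX must have min < max on both axes (non-zero area)")
    return min_x, min_y, max_x, max_y


world = parse_bbox("-180,-90,180,90")
mixed = parse_bbox("1e2,0,2.5e2,50.5")
```

The values are interpreted in the units of the request's SRS, so the same four numbers describe very different areas in a geographic SRS (degrees) and in a projected one (typically metres).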

For more information see the WMS Specification at opengeospatial.org.

Figure: Example of a Bounding Box

Metadata

Metadata is data, usually textual, that describes other data. It often includes elements such as the data creation date, the lineage (history) of the data, the data type, and common usages of the data.

According to ISO, the need for metadata has been created due to... "a revival in the awareness of the importance of geography and how things relate spatially, combined with the advancement of electronic technology that has caused an expansion in the use of digital geographic information and geographic information systems worldwide. Increasingly, individuals from a wide range of disciplines outside of the geographic sciences and information technologies are capable of producing, enhancing, and modifying digital geographic information. As complexity and diversity of geographic datasets grow, a method for providing an understanding of all aspects of this data grows in importance."

"Digital geographic data is an attempt to model and describe the real world for use in computer analysis and graphic display of information. Any description of reality is always an abstraction, always partial, and always just one of many possible 'views'. This 'view', or model of the real world is not an exact duplication; data is often approximated, others are simplified, and other things are ignored. There is seldom perfect, complete, and correct data. To ensure that data is not misused, the assumption and limitation affecting the creation of data must be fully documented. Metadata allows a producer to describe a dataset fully so that users can understand the assumptions and limitations and evaluate the dataset's applicability for their intended use."

"Typically, geographic data is used by many other people other than the provider. It is often produced by one individual or organization and used by another. Proper documentation will provide those unfamiliar with the data with a better understanding, and enable them to use it properly. As geographic data producers and users handle more and more data, proper documentation will provide them with a keener knowledge of their holdings and will allow them to better manage data production, storage, updating, search and discovery as well as reuse."

The ISO 19115 Metadata standard provides a structure for describing digital geographic data. It defines metadata elements, provides a schema and establishes a common set of metadata terminology, definitions and extension procedures.

For more information see iso.org.

Effects of Implementing Metadata

When implemented by a data provider, metadata:

  • Provides data providers with appropriate information to characterize their geographic data properly
  • Facilitates the organization and management of metadata for geographic data
  • Enables users to apply geographic data in the most efficient way by knowing its basic characteristics
  • Facilitates data discovery, retrieval and re-use. Users will be better able to locate, access, evaluate, purchase, and utilize geographic data
  • Enables users to determine whether geographic data will be of use to them