32  Spatiotemporal Analysis

Data observations may entail reference information about spatial or temporal characteristics. Spatial information is stored as coordinates, usually named “x” and “y” or “lat”/“lon”. Treating spatiotemporal data using non-spatial data methods can lead to over-optimistic performance estimates. Hence, methods specifically designed to account for the special nature of spatiotemporal data are needed.

In the mlr3 framework, the following packages relate to this field:

The following (sub-)sections introduce the potential pitfalls of spatiotemporal data in machine learning and how to account for it. Note that not all functionality will be covered, and that some of the used packages are still in early lifecycles. If you want to contribute to one of the packages mentioned above, please contact Patrick Schratz.

32.1 Creating a spatial Task

To make use of spatial resampling methods, a mlr3 task that is aware of its spatial characteristic needs to be created. Two child classes exist in mlr3spatiotempcv for this purpose:

To create one of these, one can either pass a sf object as the “backend” directly:

# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"), crs = 32717)
# create mlr3 task
task = TaskClassifST$new("ecuador_sf",
  backend = data_sf, target = "slides", positive = "TRUE"

or use a plain data.frame. In this case, the constructor of TaskClassifST needs a few more arguments:

data = mlr3::as_data_backend(ecuador)
task = TaskClassifST$new("ecuador",
  backend = data, target = "slides",
  positive = "TRUE", extra_args = list(coordinate_names = c("x", "y"),
  crs = 32717)

Now this Task can be used as a normal mlr3 task in any kind of modeling scenario.

32.2 Autocorrelation

Data which includes spatial or temporal information requires special treatment in machine learning (similar to survival, ordinal and other task types listed in the special tasks chapter). In contrast to non-spatial/non-temporal data, observations inherit a natural grouping, either in space or time or in both space and time (Legendre 1993). This grouping causes observations to be autocorrelated, either in space (spatial autocorrelation (SAC)), time (temporal autocorrelation (TAC)) or both space and time (spatiotemporal autocorrelation (STAC)). For simplicity, the acronym STAC is used as a generic term in the following chapter for all the different characteristics introduced above.

What effects does STAC have in statistical/machine learning?

The overarching problem is that STAC violates the assumption that the observations in the train and test datasets are independent (Hastie, Friedman, and Tibshirani 2001). If this assumption is violated, the reliability of the resulting performance estimates, for example retrieved via cross-validation, is decreased. The magnitude of this decrease is linked to the magnitude of STAC in the dataset, which cannot be determined easily.

One approach to account for the existence of STAC is to use dedicated resampling methods. mlr3spatiotemporal provides access to the most frequently used spatiotemporal resampling methods. The following example showcases how a spatial dataset can be used to retrieve a bias-reduced performance estimate of a learner.

The following examples use the ecuador dataset created by Jannes Muenchow. It contains information on the occurrence of landslides (binary) in the Andes of Southern Ecuador. The landslides were mapped from aerial photos taken in 2000. The dataset is well suited to serve as an example because it it relatively small and of course due to the spatial nature of the observations. Please refer to Muenchow, Brenning, and Richter (2012) for a detailed description of the dataset.

To account for the spatial autocorrelation probably present in the landslide data, we will make use one of the most used spatial partitioning methods, a cluster-based k-means grouping (Brenning 2012), (spcv_coords in mlr3spatiotemporal). This method performs a clustering in 2D space which contrasts with the commonly used random partitioning for non-spatial data. The grouping has the effect that train and test data are more separated in space as they would be by conducting a random partitioning, thereby reducing the effect of STAC.

By contrast, when using the classical random partitioning approach with spatial data, train and test observations would be located side-by-side across the full study area (a visual example is provided further below). This leads to a high similarity between train and test sets, resulting in “better” but biased performance estimates in every fold of a CV compared to the spatial CV approach. However, these low error rates are mainly caused due to the STAC in the observations and the lack of appropriate partitioning methods and not by the power of the fitted model.

32.3 Spatial CV vs. Non-Spatial CV

In the following a spatial and a non-spatial CV will be conducted to showcase the mentioned performance differences.

The performance of a simple classification tree ("classif.rpart") is evaluated on a random partitioning (repeated_cv) with four folds and two repetitions. The chosen evaluation measure is “classification error” ("classif.ce"). The only difference in the spatial setting is that repeated_spcv_coords is chosen instead of repeated_cv.

32.3.1 Non-Spatial CV


# be less verbose

task = tsk("ecuador")

learner = lrn("classif.rpart", maxdepth = 3, predict_type = "prob")
resampling_nsp = rsmp("repeated_cv", folds = 4, repeats = 2)
rr_nsp = resample(
  task = task, learner = learner,
  resampling = resampling_nsp)

rr_nsp$aggregate(measures = msr("classif.ce"))

32.3.2 Spatial CV

task = tsk("ecuador")

learner = lrn("classif.rpart", maxdepth = 3, predict_type = "prob")
resampling_sp = rsmp("repeated_spcv_coords", folds = 4, repeats = 2)
rr_sp = resample(
  task = task, learner = learner,
  resampling = resampling_sp)

rr_sp$aggregate(measures = msr("classif.ce"))

Here, the classification tree learner is around 0.05 percentage points worse when using Spatial Cross-Validation (SpCV) compared to Non-Spatial Cross-Validation (NSpCV). The magnitude of this difference is variable as it depends on the dataset, the magnitude of STAC and the learner itself. For algorithms with a higher tendency of overfitting to the training set, the difference between the two methods will be larger.

32.4 Visualization of Spatiotemporal Partitions

Every partitioning method in mlr3spatiotemporal comes with a generic plot() method to visualize the created groups. In a 2D space this happens via ggplot2 while for spatiotemporal methods 3D visualizations via plotly are created.

autoplot(resampling_sp, task, fold_id = c(1:4), size = 0.7) *
  ggplot2::scale_y_continuous(breaks = seq(-3.97, -4, -0.01)) *
  ggplot2::scale_x_continuous(breaks = seq(-79.06, -79.08, -0.01))

Note that setting the correct CRS for the given data is important which is done during task creation Spatial offsets of up to multiple meters may occur if the wrong CRS is supplied initially.

This example used an already created task via the sugar function tsk(). In practice however, one needs to create a spatiotemporal task via TaskClassifST/TaskRegrST and set the crs argument.

The spatial grouping of the k-means based approach above contrasts visually ver well compared to the NSpCV (random) partitioning:

autoplot(resampling_nsp, task, fold_id = c(1:4), size = 0.7) *
  ggplot2::scale_y_continuous(breaks = seq(-3.97, -4, -0.01)) *
  ggplot2::scale_x_continuous(breaks = seq(-79.06, -79.08, -0.01))

32.5 Spatial Block Visualization

The spcv_block method makes use of rectangular blocks to divide the study area into equally-sized parts. These blocks can be visualized by their spatial location and fold ID to get a better understanding how these influenced the final partitions.

task = tsk("ecuador")
resampling = rsmp("spcv_block", range = 1000L)

# Visualize train/test splits of multiple folds
autoplot(resampling, task, size = 0.7,
  fold_id = c(1, 2), show_blocks = TRUE, show_labels = TRUE) *
  ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))

32.6 Choosing a Resampling Method

While the example used the spcv_coords method, this does not mean that this method is the best or only method suitable for this task. Even though this method is quite popular, it was mainly chosen because of the clear visual grouping differences compared to random partitioning.

In fact, most often multiple spatial partitioning methods can be used for a dataset. It is recommended (required) that users familiarize themselves with each implemented method and decide which method to choose based on the specific characteristics of the dataset. For almost all methods implemented in mlr3spatiotemporal, there is a scientific publication describing the strengths and weaknesses of the respective approach (either linked in the help file of mlr3spatiotemporal or its respective dependency packages).

In the example above, a cross-validation without hyperparameter tuning was shown. If a nested CV is desired, it is recommended to use the same spatial partitioning method for the inner loop (= tuning level). See Schratz et al. (2019) for more details and chapter 11 of Geocomputation with R (Lovelace, Nowosad, and Muenchow 2019)1.

A list of all implemented methods in mlr3spatiotemporal can be found in the Getting Started vignette of the package.

If you want to learn even more about the field of spatial partitioning, STAC and the problems associated with it, the work of Prof. Hanna Meyer is very much recommended for further reference.

32.7 Spatial Prediction

Experimental support for spatial prediction with terra, raster, stars and sf objects is available in mlr3spatial.

Until the package is released on CRAN, please see the package vignettes for usage examples.

  1. The chapter will soon be rewritten using the mlr3 and mlr3spatiotempcv packages.↩︎