Geo-Spatial functional data analysis

Geo-Spatial functional data analysis (FDA) concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties.

Figure (1)

It is used in

  • mapping,
  • assessing spatial data quality,
  • sampling design optimisation,
  • modelling of dependence structures,
  • and drawing valid inferences from a limited set of spatio-temporal data.


This new branch of statistics can be used to better analyse, model and predict spatial data.

Key aspects of FDA include:

  • smoothing,
  • data reduction,
  • functional linear modelling,
  • and forecasting methods.
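As an illustration of the smoothing step, here is a minimal Python sketch, assuming NumPy is available. It uses a discrete roughness penalty (a Whittaker-type smoother, chosen here for brevity and not taken from the text): the estimate minimises the squared error plus $\lambda$ times the sum of squared second differences, a discrete analogue of the integrated squared second derivative used by smoothing splines. The function name `penalized_smooth` and the parameter `lam` are hypothetical.

```python
import numpy as np


def penalized_smooth(y: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical helper: roughness-penalized least-squares smoother.

    Solves  min_f ||y - f||^2 + lam * ||D2 f||^2,
    where D2 is the second-difference operator.
    """
    n = y.size
    # Second-difference matrix of shape (n - 2, n); each row is [1, -2, 1]
    d2 = np.diff(np.eye(n), n=2, axis=0)
    # Normal equations: (I + lam * D2^T D2) f = y
    a = np.eye(n) + lam * (d2.T @ d2)
    return np.linalg.solve(a, y)


# Hypothetical usage: smooth a noisy sine curve
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
f_hat = penalized_smooth(y, lam=100.0)
```

Larger values of `lam` give smoother estimates, and as `lam` approaches 0 the estimate reproduces the data. A straight line is left unchanged for any `lam`, because its second differences are zero.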

Spatial FDA accounts for attributes of the geometry of the physical problem, such as irregularly shaped domains, external and internal boundary features, and strong concavities.

These models can also include a priori information about the spatial structure of the phenomenon, described by a partial differential equation (PDE).
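One common way to write such a model is as a penalized least-squares problem. The form below is a sketch under stated assumptions, not necessarily the exact formulation behind the example that follows, and its notation is introduced here: $z_i$ are observations at locations $\mathbf{p}_i$ in a domain $\Omega$, $\lambda > 0$ is a smoothing parameter, and $Lf = u$ is the PDE that encodes the prior information, with boundary conditions imposed on $\partial\Omega$.

```latex
\hat f = \arg\min_{f} \; \sum_{i=1}^{n} \bigl( z_i - f(\mathbf{p}_i) \bigr)^2
  \;+\; \lambda \int_{\Omega} \bigl( L f(\mathbf{p}) - u(\mathbf{p}) \bigr)^2 \, d\mathbf{p}

% Special case: L = \Delta (the Laplacian) and u = 0 give the roughness penalty
% \lambda \int_{\Omega} ( \Delta f(\mathbf{p}) )^2 \, d\mathbf{p}
```

Because the penalty is integrated over $\Omega$ itself, holes and concavities in the domain change which locations are smoothed together, and boundary conditions on $\partial\Omega$ constrain the estimate directly.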

Island of Montreal.

We consider the problem of estimating population density over the Island of Montreal. Figure (1) shows the census tract locations (493 data points defined by the centroids of census enumeration areas) over the Island of Montreal, Quebec, Canada, excluding an airport (in the south) and an industrial park with an oil refinery tank farm (at the north-east tip of the island). Population density is available for each census tract, measured in units of 1000 inhabitants per $km^2$. A binary variable indicating whether a tract is predominantly residential or industrial/commercial is available as a covariate for estimating the distribution of census quantities.

Here we are interested in population density in particular, so the airport and the industrial park are not part of the domain of interest, since people cannot live in these two areas. Census quantities can differ considerably on different sides of these uninhabited parts of the city. For instance, just south of the industrial park there is a densely populated area with medium-low income, whereas north-east of it there is a wealthy neighbourhood with low population density, and west of it there is a relatively wealthy cluster of condominiums with high population density.

Hence, whilst it seems reasonable to assume that population density varies smoothly over the inhabited parts of the island, there is no reason to assume that it varies smoothly across these uninhabited areas, from one side to the other. Figure (1) also shows the island coasts as boundaries of the domain of interest; the parts of the boundary highlighted in red correspond, respectively, to the harbour on the east shore and to two public parks on the south-west and north-east shores. No one lives along the river banks on these stretches of coast.
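The effect of an internal hole and of a zero boundary condition can be sketched on a toy grid in Python, assuming NumPy. This is a simplified finite-difference analogue of Laplacian-penalized smoothing and makes no claim to reproduce how the Montreal estimate was computed: the discrete Laplacian couples only neighbouring cells that both lie inside the domain, so smoothing has to travel around a hole rather than across it, and cells on an uninhabited stretch of coast are fixed to zero. The grid, the wall-shaped hole, the data values and the helper names `grid_laplacian` and `fit_surface` are all hypothetical.

```python
import numpy as np


def grid_laplacian(mask):
    """Hypothetical helper: graph Laplacian over in-domain cells of a boolean mask.

    Each cell is linked only to 4-neighbours that are also inside the domain,
    so the penalty never couples cells across a hole. Returns the Laplacian
    and an index map (-1 for cells outside the domain).
    """
    m = int(mask.sum())
    idx = -np.ones(mask.shape, dtype=int)
    idx[mask] = np.arange(m)  # row-major numbering of in-domain cells
    lap = np.zeros((m, m))
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c]:
                continue
            i = idx[r, c]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and mask[rr, cc]:
                    lap[i, i] += 1.0
                    lap[i, idx[rr, cc]] -= 1.0
    return lap, idx


def fit_surface(mask, obs_cells, z, zero_cells, lam):
    """Hypothetical helper: minimise sum_k (z_k - f(cell_k))^2 + lam * ||L f||^2,
    subject to f = 0 on zero_cells (a Dirichlet condition)."""
    lap, idx = grid_laplacian(mask)
    m = lap.shape[0]
    # Observation matrix: row k picks the cell holding data point k
    s = np.zeros((len(z), m))
    for k, (r, c) in enumerate(obs_cells):
        s[k, idx[r, c]] = 1.0
    # f = 0 on fixed cells, so those unknowns simply drop out of the system
    fixed = {int(idx[r, c]) for r, c in zero_cells}
    free = np.array([i for i in range(m) if i not in fixed])
    a = s[:, free].T @ s[:, free] + lam * (lap[:, free].T @ lap[:, free])
    b = s[:, free].T @ z
    f = np.zeros(m)
    f[free] = np.linalg.solve(a, b)
    out = np.full(mask.shape, np.nan)  # NaN marks cells outside the domain
    out[mask] = f
    return out


# Hypothetical toy "island": 12 x 12 grid with a wall-shaped hole in column 6
full = np.ones((12, 12), dtype=bool)
walled = full.copy()
walled[0:10, 6] = False
# Dense data in rows 0-5: value 10 left of the wall, 0 right of it
left = [(r, c) for r in range(6) for c in range(6)]
right = [(r, c) for r in range(6) for c in range(7, 12)]
obs = left + right
z = np.array([10.0] * len(left) + [0.0] * len(right))
coast = [(11, c) for c in range(8, 12)]  # uninhabited coast stretch: f = 0

est_walled = fit_surface(walled, obs, z, coast, lam=1.0)
est_full = fit_surface(full, obs, z, coast, lam=1.0)
```

Comparing `est_walled[2, 5]` with `est_walled[2, 7]` shows the contrast between the two sides of the wall largely preserved, whereas in `est_full`, where the same cells are adjacent through column 6, the penalty pulls them towards each other. The system stays solvable as long as every connected part of the domain contains at least one observation or one fixed cell.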


Figure (3)

Figures (2) and (3) show the resulting estimate of population density. Note that the estimate complies with the imposed boundary conditions, dropping to zero along the uninhabited stretches of coast. Moreover, the estimate does not artificially link data points on either side of the uninhabited areas; see, for instance, the densely populated area south of the oil refinery and purification plant compared with the low-density neighbourhood north-east of the industrial park. The $\beta$ coefficient for the binary covariate indicating whether a tract is predominantly residential or commercial/industrial is $1.30$. Since density is measured in units of 1000 inhabitants per $km^2$, this means that predominantly residential census tracts are expected, on average, to have $1300$ more inhabitants per $km^2$ than those classified as mostly commercial/industrial.
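The interpretation of $\beta$ can be written out as a short worked step. This assumes the covariate enters additively alongside the spatial field; the symbols $z_i$, $w_i$, $f$ and $\varepsilon_i$ are introduced here for illustration.

```latex
z_i = \beta \, w_i + f(\mathbf{p}_i) + \varepsilon_i, \qquad
w_i = \begin{cases} 1 & \text{predominantly residential} \\ 0 & \text{mostly commercial/industrial} \end{cases}

\mathbb{E}[\, z \mid w = 1, \mathbf{p} \,] - \mathbb{E}[\, z \mid w = 0, \mathbf{p} \,] = \beta = 1.30
\quad \Rightarrow \quad 1.30 \times 1000 = 1300 \ \text{inhabitants per km}^2
```

Under this additive form the spatial term $f(\mathbf{p})$ cancels in the difference, so $\beta$ is the expected gap between the two tract types at the same location, expressed in the units of the response.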