Spatial functional data analysis

Spatial functional data analysis (FDA) concerns the statistical analysis of spatial and spatiotemporal data.

Spatial FDA is used to analyse, model and predict the complex dependence structure inherent in spatiotemporal processes, which allows it to provide more accurate predictions at high temporal and spatial resolutions.

Spatial FDA also accounts for the geometry of the physical problem, such as irregularly shaped domains, external and internal boundary features, and strong concavities.

Spatial FDA can also incorporate a priori information about the spatial structure of the phenomenon, expressed as a partial differential equation (PDE). This facilitates a causal explanation of the drivers and impediments of the underlying spatiotemporal process.

What is Data2PDE?

Data2PDE (Data to PDE) estimates the parameters and the solutions of linear PDEs from data consisting of partially observed, noisy measurements distributed over complex geometries.

Modelling Temperature in Croatia with Data2PDE

Figure (1) shows the temperature measurements on February 2, 2008, at 127 meteorological stations across Croatia.

Figure 1

We model the dynamics of the temperature in Croatia with the diffusion-advection-reaction (DAR) equation

$ \frac{\partial^2z}{\partial x^2} + \beta_{yy} \frac{\partial^2z}{\partial y^2}+\beta_x\frac{\partial z}{\partial x}+\beta_y\frac{\partial z}{\partial y}+\beta z = 0$

where $\beta_{yy},\beta_{x},\beta_{y},\beta$ are unknown parameters that require estimation from data.
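To make the role of each parameter concrete, the sketch below evaluates the left-hand side of the DAR equation on a regular grid with central finite differences; a field that solves the equation gives a residual close to zero. The regular grid, the finite-difference scheme and the function name `dar_residual` are assumptions for illustration only: Data2PDE works over irregular domains, and this is not its implementation. The test field $z = e^{ax+by}$ (with hypothetical exponents $a$, $b$) solves the equation exactly when $\beta = -(a^2 + \beta_{yy} b^2 + \beta_x a + \beta_y b)$.

```python
import numpy as np

def dar_residual(z, h, beta_yy, beta_x, beta_y, beta):
    """Apply z_xx + beta_yy z_yy + beta_x z_x + beta_y z_y + beta z at interior nodes.

    z is sampled on a uniform grid with spacing h (axis 0 = x, axis 1 = y);
    all derivatives use second-order central differences (an assumed scheme).
    """
    zc = z[1:-1, 1:-1]
    z_xx = (z[2:, 1:-1] - 2.0 * zc + z[:-2, 1:-1]) / h**2
    z_yy = (z[1:-1, 2:] - 2.0 * zc + z[1:-1, :-2]) / h**2
    z_x = (z[2:, 1:-1] - z[:-2, 1:-1]) / (2.0 * h)
    z_y = (z[1:-1, 2:] - z[1:-1, :-2]) / (2.0 * h)
    return z_xx + beta_yy * z_yy + beta_x * z_x + beta_y * z_y + beta * zc

# Usage: an exact exponential solution versus the same field with a wrong beta.
a, b = 0.3, -0.5                              # hypothetical exponents
beta_yy, beta_x, beta_y = 1.72, -0.71, 2.89   # point estimates from the Croatian example
beta = -(a**2 + beta_yy * b**2 + beta_x * a + beta_y * b)
x = np.linspace(0.0, 1.0, 101)
X, Y = np.meshgrid(x, x, indexing="ij")
z = np.exp(a * X + b * Y)
res_exact = dar_residual(z, x[1] - x[0], beta_yy, beta_x, beta_y, beta)
res_wrong = dar_residual(z, x[1] - x[0], beta_yy, beta_x, beta_y, beta + 1.0)
```

With the matching $\beta$ the residual is at the level of the discretization error; shifting $\beta$ by one adds $z$ itself to the residual.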

The DAR PDE describes the flow of heat, particles or other physical quantities in situations where both diffusion and advection occur. Diffusion is the movement of a substance from an area of high concentration to an area of low concentration, which over time results in a uniform distribution of the substance. Advection, or flow, is transport of the substance by linear or gently curvilinear movement. Reaction describes the accumulation or decay of the substance relative to a baseline, over space or time.

The parameter $\beta_{yy}$ represents the rate at which temperature spreads from high to low concentration in the $y$-direction, relative to the $x$-direction (whose coefficient is fixed at one); $\beta_x$ and $\beta_y$ describe the movement of temperature across space; and the reaction multiplier $\beta$ determines the exponential increase or decrease of the temperature.

Let $z$ be the solution of the diffusion-advection-reaction equation, and let $\textsf{z}_{i}$ be the observation of the process $z$ at the location $(\textsf{x}_{i},\textsf{y}_{i})$, for $i=1,\ldots,N$. We assume that

$\textsf{z}_{i}=z(\textsf{x}_{i},\textsf{y}_{i})+\alpha w_{i} +\epsilon_{i}$

where $w_{i}$ is a covariate representing the scaled elevation (elevation in metres divided by 100) at station $i$; $\alpha$ measures the relationship between scaled elevation and $\textsf{z}$; and the $\epsilon_{i}$ are independent and identically distributed measurement errors with zero mean and finite variance $\sigma_{\epsilon}^{2}$. Because weather systems originating in or crossing Croatia are strongly influenced by its diverse topography, we include the scaled elevation at each station as a covariate in the model.
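The observation model is partially linear: a smooth surface plus a linear elevation effect. As a minimal sketch of how $\alpha$ can be estimated jointly with the surface, the code below simulates hypothetical station data and fits a quadratic polynomial in $(x, y)$ together with $\alpha w$ by ordinary least squares. The quadratic surface is a deliberately simple stand-in for the PDE-regularized estimate of $z$, not the Data2PDE estimator, and all coordinates, elevations and noise levels are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 127                                    # number of stations in the Croatian example
x = rng.uniform(0.0, 1.0, n)               # hypothetical station coordinates
y = rng.uniform(0.0, 1.0, n)
w = rng.uniform(0.0, 15.0, n)              # hypothetical scaled elevation (elevation / 100)
alpha_true = -0.67
surface = 2.0 + 1.5 * x - y + 0.8 * x * y  # hypothetical smooth surface z(x, y)
z_obs = surface + alpha_true * w + rng.normal(0.0, 0.3, n)

# Quadratic surface in (x, y) as a simple stand-in for the PDE-regularized z,
# with the elevation covariate as the last column so coef[-1] estimates alpha.
design = np.column_stack([np.ones(n), x, y, x * y, x**2, y**2, w])
coef, *_ = np.linalg.lstsq(design, z_obs, rcond=None)
alpha_hat = coef[-1]
```

Fitting the surface and the covariate effect together, rather than one after the other, keeps spatial trend in temperature from being attributed to elevation.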

The estimated effect of elevation on temperature across Croatia, with its 95% confidence interval, is $\alpha=-0.67 \pm 0.06$, indicating that temperature decreases by about $0.67$ degrees Celsius for every 100-metre rise in altitude.
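Because the covariate is elevation divided by 100, the effect of any elevation change follows by scaling $\alpha$ and its interval; for example, a 1000 m rise corresponds to $10 \times (-0.67 \pm 0.06) = -6.7 \pm 0.6$ degrees Celsius. A small helper (the name `temperature_change` is hypothetical) makes the arithmetic explicit, treating the interval as a fixed half-width around the estimate:

```python
def temperature_change(delta_elevation_m, alpha=-0.67, half_width=0.06):
    """Temperature change (deg C) implied by alpha for an elevation change in metres.

    The model covariate is elevation / 100, so the effect scales with delta / 100.
    Returns the point estimate and the (lower, upper) interval bounds.
    """
    scaled = delta_elevation_m / 100.0
    estimate = alpha * scaled
    bounds = sorted(((alpha - half_width) * scaled, (alpha + half_width) * scaled))
    return estimate, (bounds[0], bounds[1])

est, (lo, hi) = temperature_change(1000.0)  # a 1000 m rise
```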

The estimated $\beta = 0.55\pm 0.43$ indicates an accumulation of temperature.

The estimated advection rates are $\beta_{y}=2.89 \pm 1.7$ and $\beta_{x}=-0.71\pm 2$, indicating that the temperature is moving in a north-easterly direction.

The estimated rate of diffusion in the $y$-direction is $\beta_{yy}=1.72 \pm 0.96$, indicating anisotropic behaviour: the temperature diffuses faster from south to north than from west to east.

The estimated temperature is shown in Figure (2): the warmer air is coming in from the Adriatic Sea. The Bura, a north to north-easterly wind, blows this warm air from the coastline at Rijeka across Croatia to the north-eastern border near Bilogora.

Figure 2

What is Data 2 Linear Dynamics (Data2LD)?

Data 2 Linear Dynamics (Data2LD) estimates the solution and the parameters of linear dynamical systems from incomplete and noisy observations of the underlying processes.

It approximates the implicitly defined solution of the dynamical system with a linear combination of spline basis functions, while simultaneously estimating the system's parameters by requiring that this approximating solution fit the data.
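The two-level idea can be sketched for a simple first-order decay equation $\mathrm{d}f/\mathrm{d}t = -\beta f$. For a fixed $\beta$, an inner step fits cubic B-spline coefficients by penalized least squares, where the penalty $\lambda\int(\mathrm{d}f/\mathrm{d}t + \beta f)^2\,\mathrm{d}t$ pulls the spline towards a solution of the equation; an outer step then chooses the $\beta$ whose inner fit matches the data best. This is a generalized-profiling-style sketch under assumed settings (21 knots, $\lambda = 100$, a grid search over $\beta$, simulated data), not the authors' Data2LD code.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(knots, k, x, deriv=0):
    """Matrix whose column j is the deriv-th derivative of B-spline basis j at x."""
    n = len(knots) - k - 1
    cols = []
    for j in range(n):
        spl = BSpline(knots, np.eye(n)[j], k)
        if deriv:
            spl = spl.derivative(deriv)
        cols.append(spl(x))
    return np.column_stack(cols)

# Simulated data from df/dt = -beta f with beta = 1 (hypothetical example).
T, k = 5.0, 3
knots = np.r_[[0.0] * k, np.linspace(0.0, T, 21), [T] * k]   # clamped cubic knots
rng = np.random.default_rng(1)
t_obs = np.linspace(0.0, T, 60)
beta_true = 1.0
y = 2.0 * np.exp(-beta_true * t_obs) + rng.normal(0.0, 0.02, t_obs.size)

# Basis values at the data and at a quadrature grid for the penalty integral.
t_quad = np.linspace(0.0, T, 501)
Phi = bspline_basis(knots, k, t_obs)
B = bspline_basis(knots, k, t_quad)
D1 = bspline_basis(knots, k, t_quad, deriv=1)
w_quad = np.full(t_quad.size, T / (t_quad.size - 1))          # trapezoid weights
w_quad[[0, -1]] /= 2.0

def inner_fit(beta, lam=100.0):
    """Coefficients minimising ||y - Phi c||^2 + lam * int (Df + beta f)^2 dt."""
    Q = D1 + beta * B                       # (Df + beta f) at the quadrature points
    R = Q.T @ (w_quad[:, None] * Q)         # quadrature approximation of the penalty
    return np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T @ y)

def outer_sse(beta):
    """Data misfit of the inner fit: the criterion for choosing beta."""
    c = inner_fit(beta)
    return np.sum((y - Phi @ c) ** 2)

betas = np.linspace(0.2, 2.0, 181)
beta_hat = betas[np.argmin([outer_sse(b) for b in betas])]
```

The grid search keeps the sketch transparent; a practical implementation would optimize the outer criterion with derivatives instead.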


Carey, M., Gath, E., Hayes, K. (2016) 'A generalized smoother for linear ordinary differential equations'. Journal of Computational and Graphical Statistics.

Carey, M., Gath, E., Hayes, K. (2014) 'Frontiers in financial dynamics'. Research in International Business and Finance.

Databeers Dublin

Databeers Dublin aims to bring together data experts from industry and academia by holding events that comprise short talks from professionals with diverse expertise, pitched at a level accessible to a wide audience.

Thanks to Davide Cellai for the invite to talk at Databeers #6. For more information and upcoming events please see:

Functional Data Analysis and Beyond

at Matrix international research institute, Creswick, Australia

3rd December - 14th December 2018

Had a wonderful time at the workshop on Functional Data Analysis and Beyond. Many thanks to Aurore Delaigle, Frederic Ferraty and Debashis Paul for the invite.

In recent years, functional data analysis has been widely used to answer science and policy questions where the data are typically observed over time, space and other continuous variables. Current methodologies provide sophisticated computational techniques for solving complex problems in application areas ranging from biomedical imaging and climate-environment interaction to unravelling networks that evolve in time and space.

This workshop is intended to bring together the leaders in this field, representatives of application areas, and promising young researchers to chart the path for future development in the field.

Statistics of geometric features and new data types

Isaac Newton Institute for Mathematical Sciences, Cambridge, UK

19th March - 23rd March 2018

Had a great time at the workshop on statistics of geometric features and new data types. Many thanks to John Aston, Richard Davis and Axel Munk for the invite.

Geospatial data are observations of a process that are collected together with their geographical locations. This type of data is abundant in many scientific fields; examples include population census, social and demographic (health, justice, education), economic (business surveys, trade, transport, tourism, agriculture, etc.) and environmental (atmospheric and oceanographic) data. Such data are often distributed over irregularly shaped spatial domains with complex boundaries and interior holes, so modelling approaches must account for the spatial dependence over these irregular domains as well as describe their temporal evolution.

Dynamic systems modelling has huge potential in statistics, as evidenced by the amount of activity in functional data analysis. Many seemingly complex forms of functional variation can be represented more simply as a set of differential equations, either ordinary or partial.

In this talk, I present a class of semiparametric regression models with differential regularization in the form of PDEs. I call this methodology Data2PDE (“Data to Partial Differential Equations”). Data2PDE characterizes spatial processes that evolve over complex geometries by combining uncertain, incomplete and often noisy observations with prior knowledge of the physical principles of the process, expressed as a PDE.

Distributed Data for Dynamics and Manifolds

BIRS-affiliated mathematics research centre, Casa Matemática Oaxaca (CMO)

3rd September - 8th September 2017

A very interesting and insightful workshop. Many thanks to Jiguo Cao, Giles Hooker, James Ramsay, Laura Sangalli and Fang Yao for the invite.

The rise of automated measurement has given us an unprecedented view of the world around us: from chemical processes on cell surfaces to global climate models, new sensors can record complex processes over a huge variety of spatial scales. The challenge is now not to collect data but to analyse it.

This workshop focused on pairing complex models of physical processes with large data sets recorded on complex objects to refine our models and develop a new understanding of these processes. This workshop brought together statisticians, mathematical biologists, geometers, and applied mathematicians to develop new methods to understand how this new wealth of data can inform and improve mathematical models in these fields and how these models, in turn, can affect how the data is collected and measured.

European Study Group with Industry ESGI141

25th-29th June 2018, UCD Dublin.

ESGIs are week-long workshops that provide a forum for industrial scientists to work alongside academics on problems of direct industrial relevance.

The scientific focus of the workshop is the investigation and development of a suite of working solutions to complex, challenging projects submitted by industry that require mathematical, statistical or computational expertise.

A very enjoyable, productive and fruitful week at ESGI141. Thanks to Prolego Scientific, Electricity Supply Board (ESB), Analog Devices and Captured Carbon for their interesting projects.

Using Data2LD to model the acceleration of brain tissue

Consider a study on traumatic brain injury (TBI), which is caused by a blow to the head and contributes to just under a third (30.5%) of all injury-related deaths in the US. Figure (1) shows 133 accelerometer readings taken over 55.2 milliseconds. The dashed line represents the impulse function denoting the blow to the head.

Figure (1)

The laws of motion tell us that the acceleration $f(t)$ can be modelled by a second-order linear ordinary differential equation (ODE) whose input is a unit pulse $u(t)$ representing the blow to the head, shown as the dashed line in Figure (1).

This ODE,

$\frac{\textrm{d}^2f(t)}{\textrm{d}t^2} = \beta_{0} f(t) + \beta_{1} \frac{\textrm{d}f(t)}{\textrm{d}t} + \alpha_{0} u(t),$

contains three parameters, $\beta_{0}$, $\beta_{1}$ and $\alpha_{0}$, which convey, respectively, the rate of the restoring force (as $t \rightarrow \infty$, the acceleration reverts to zero), the rate of the friction force (as $t \rightarrow \infty$, the oscillations in the acceleration die out) and the rate of force from the unit pulse.

While there are several methods for estimating ODE parameters from partially observed data, they are invariably subject to problems such as high computational cost, sensitivity to initial values or large sampling variability.

We propose a method called Data2LD (Data to Linear Dynamics) that overcomes these issues and produces estimates of the ODE parameters with less bias and a smaller sampling variance, together with a tenfold improvement in computational speed.

The final parameter estimates with 95% confidence intervals are $\hat{\beta}_{0} = -0.056 \pm 0.002$, $\hat{\beta}_{1} = -0.150 \pm 0.018$ and $\hat{\alpha}_{0} = 0.395 \pm 0.032$, indicating that the acceleration is an under-damped process: after the blow to the head, the acceleration oscillates with an amplitude that quickly decays to zero.
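Assuming the ODE is written as $\frac{\mathrm{d}^2 f}{\mathrm{d}t^2} = \beta_0 f + \beta_1 \frac{\mathrm{d}f}{\mathrm{d}t} + \alpha_0 u(t)$, the damping regime follows from the roots of $r^2 - \beta_1 r - \beta_0 = 0$. With the reported estimates, $\beta_1^2 + 4\beta_0 \approx -0.20 < 0$, so the roots are complex with negative real part $\beta_1/2 = -0.075$, which is the under-damped case. The sketch below checks this and simulates the response to a single pulse; the pulse timing (1 ms starting at 5 ms), the use of milliseconds as the time unit and the helper names are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Reported estimates; the equation form D^2 f = beta0 f + beta1 Df + alpha0 u is assumed.
beta0, beta1, alpha0 = -0.056, -0.150, 0.395

def damping_type(beta0, beta1):
    """Classify D^2 f = beta0 f + beta1 Df via the roots of r^2 - beta1 r - beta0 = 0."""
    roots = np.roots([1.0, -beta1, -beta0])
    if np.any(roots.real >= 0):
        return "not decaying"
    disc = beta1**2 + 4.0 * beta0
    if disc < 0:
        return "under-damped"          # complex roots with negative real part
    return "critically damped" if disc == 0 else "over-damped"

def pulse(t, start=5.0, width=1.0):
    """Hypothetical unit pulse u(t): 1 on [start, start + width), 0 elsewhere."""
    return 1.0 if start <= t < start + width else 0.0

def rhs(t, state):
    """First-order system for the state (f, df/dt)."""
    f, df = state
    return [df, beta0 * f + beta1 * df + alpha0 * pulse(t)]

t_eval = np.linspace(0.0, 55.2, 553)
# max_step keeps the solver from stepping over the short pulse.
sol = solve_ivp(rhs, (0.0, 55.2), [0.0, 0.0], t_eval=t_eval, max_step=0.05)
f = sol.y[0]
```

Flipping the sign of $\beta_1$ makes the real part of the roots positive, so the same classifier reports a growing rather than a decaying oscillation.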

Figure (2)

Figure (2) shows the accelerometer readings of the brain tissue before and after a series of five blows to the head, indicated by the circles, together with the fitted curve produced by Data2LD (solid line), the 95% confidence interval for this curve (dashed line) and the 95% prediction interval (grey region). The fitted curve approximating the ODE solution provides an adequate description of the acceleration of the brain tissue.
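The difference between the two bands in Figure (2) can be stated pointwise. Under a normal approximation, the confidence interval reflects only the uncertainty in the fitted curve, while the prediction interval also includes the measurement noise variance $\sigma^2$, so it is always wider. A minimal sketch, assuming the pointwise standard errors of the fit and the noise standard deviation are already available (the function name `pointwise_bands` is hypothetical):

```python
import numpy as np

def pointwise_bands(fitted, se_fit, sigma, z=1.96):
    """Return ((ci_lo, ci_hi), (pi_lo, pi_hi)) around a fitted curve.

    The confidence band uses the standard error of the fit only; the prediction
    band adds the noise variance sigma**2 for a new observation.
    """
    half_ci = z * se_fit
    half_pi = z * np.sqrt(se_fit**2 + sigma**2)
    return (fitted - half_ci, fitted + half_ci), (fitted - half_pi, fitted + half_pi)

fitted = np.array([0.0, 1.0])      # hypothetical fitted values
se_fit = np.array([0.1, 0.2])      # hypothetical pointwise standard errors
(ci_lo, ci_hi), (pi_lo, pi_hi) = pointwise_bands(fitted, se_fit, sigma=0.3)
```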

Matlab code to produce the results from the above example