Data to linear dynamics (Data2LD)

Lets consider a study on traumatic brain injury (TBI), which contributes to just under a third (30.5\%) of all injury-related deaths in the US and is caused by a blow to the head. Figure (1) illustrates the acceleration of the brain tissue before and after a series of five blows to the cranium.

Figure (1)

The laws of motion tell us that the acceleration f(t) can be modeled by a second order linear differential equation (LDE) with a point impulse u(t) representing the blow to the cranium and shown in the dashed lines in Figure (1).

This LDE
\frac{\textrm{d}^2f}{\textrm{d}t^2} = \beta_{0} f + \beta_{1} \frac{\textrm{d}f}{\textrm{d}t} + \alpha_{0} u(t)
contains three parameters $\beta_{0},\beta_{1}$ and $\alpha_{0},$ and these convey the rate of the restoring force (as $t \rightarrow \infty,$ the acceleration will tend to revert back zero), the rate of the friction force (as $t \rightarrow \infty,$ the oscillations in the acceleration reduce to zero) and the rate of force from the point impulse.

While there are several methods for estimating LDE parameters with partially observed data, they are invariably subject to several problems including high computational cost, sensitivity to initial values or large sampling variability.

We propose a method called Data2LD for data to linear dynamics that overcomes these issues and produces estimates of the LDE parameters that have less bias, a smaller sampling variance and a ten fold improvement in computation.

The final parameter estimates with 95\% confidence intervals are, $\hat{\beta_{0}} = -0.056 \pm 0.002,$ $\hat{\beta_{1}} = -0.150 \pm 0.018$ and $\hat{\alpha_{0}} = 0.395 \pm 0.032.$ This is an under-damped process; after the blow to the cranium the acceleration will oscillate with a decreasing amplitude that will quickly decay to zero.

Figure (2)

Figure (2) shows the accelerometer readings of the brain tissue, the fitted curve produced by Data2LD (solid line), the numerical approximation to the solution of the LDE with the parameters identified by Data2LD (dashed line) and the impulse function $u$ representing the blow to the cranium (dotted line). We can see the LDE solution with the parameters defined by Data2LD is very close to the fitted curve produced by Data2LD, which indicates that the LDE provides an adequate description of the acceleration of the brain tissue.



Carey, M., Gath, E., Hayes, K. (2014) 'Frontiers in financial dynamics'. Research in International Business and Finance.

Carey, M., Gath, E., Hayes, K. (2016) 'A generalised smoother for linear ordinary differential equations'. Journal of Computational and Graphical Statistics.

Carey, M., Ramsay J. (2018) 'Parameter Estimation and Dynamic Smoothing
with Linear Differential Equations'. Journal of Computational and Graphical Statistics. (in press)

Dynamics 4 Genomic Big Data

The immune response to viral infection is a dynamic process, which is regulated by an intricate network of many genes and their products.

Understanding the dynamics of this network will infer the mechanisms involved in regulating influenza infection and hence aid the development of antiviral treatments and preventive vaccines. There has been an abundance of literature regarding dynamic network construction, e.g., Hecker et al. (2009), Lu et al. (2011) and Wu et al. (2013).

My research involves the development of a new pipeline for dynamic network construction for high-dimensional time course gene expression data. This pipeline allows us to discern the fundamental underlying biological process and their dynamic features at genetic level.

The pipeline includes:

Novel statistical methods and modelling approaches have been developed for the implementation of this new pipeline, which include a new approach for the selection of the smoothing parameter, a new clustering approach and a new method for model selection for high-dimensional ODEs.


Carey, M., Wu, S., Gan, G. and Wu, H. (2016) 'Correlation-based iterative clustering methods for time course data: the identification of temporal gene response modules to influenza infection in humans'. Infectious Disease Modelling.

Song, J., Carey, M., Zhu, H., Miao, H., Ramırez, Juan and Wu, H. (2017) 'Identifying the dynamic gene regulatory network during latent HIV-1reactivation using high-dimensional ordinary differential equations'. International Journal of Computational Biology and Drug Design

Carey, M., Wu, S.,  Wu, H., 'A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection'. Statistical Methods in Medical Research (in press)


Geo-Spatial functional data analysis

Geo-Spatial functional data analysis (FDA) concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties.

Figure (1)

It is used in

  • mapping,
  • assessing spatial data quality,
  • sampling design optimisation,
  • modelling of dependence structures,
  • and drawing of valid inference from a limited set of spatio-temporal data.

Geo-Spatial functional data analysis

This new branch of Statistics can be used to better analyse, model and predict spatial data.

Key aspects of FDA include:

  • smoothing
  • data reduction,
  • functional linear modelling
  • and forecasting methods.

Spatial FDA accounts for attributes of the geometry of the physical problem such as irregular shaped domains, external and internal boundary features and strong concavities.

These models can also include a priori information about the spatial structure of the phenomenon described by a partial differential equations (PDE).

Island of Montreal.

We consider the problem of estimating population density over the Island of Montreal. Figure (1), shows the census tract locations (493 data points defined
by the centroids of census enumeration areas) over the Island of Montreal, Quebec, Canada, excluding an airport (in the south) and an industrial park with an oil refinery tank farm (in the north-east tip of the Island). Population density is available at each census tract, measured as 1000 inhabitants per $km^2$ and a binary variable indicating whether a tract is predominantly residential or industrial/commercial is available as covariate for estimating the distributions of census quantities.

Here in particular we are interested in population density, thus the airport and industrial park are not part of the domain of interest since people cannot live in these two areas. Census quantities can be rather different in different sides of these not-inhabited parts of the city; for instance, just in the south of the industrial park there is a densely populated area with medium-low income, whilst in the north-east of it there on the contrary is a rich neighbourhood characterised by low population density, and finally in the west of it there is a relatively wealthy cluster of condominiums (high population density).

Hence, whilst it seems reasonable to assume that population density features a smooth spatial variation over the inhabited parts of the island, there is instead no reason to assume similar spatial variation on either side of these not-inhabited areas. Figure (1) also shows the island coasts as boundaries of the domain of interest; those parts of the boundary that are highlighted in red correspond respectively to the harbour in the east shore and to two public parks in the south-west and north east shore; no people live by the river banks along these stretches of coast.


Figure (3)

Figure (2) and (3) shows this estimate of the population density. Notice that the estimate complies with the imposed boundary conditions, dropping to zero along uninhabited stretches of coast. Also, the estimate has not artificially linked data points on either side of the uninhabited parts; see for instance the densely populated area in the south of the oil refinery and purification plant with respect to the low population density neighbourhood in the north-east of the industrial park. The $\beta$ coefficient that corresponds to the binary covariate indicating whether a tract is predominantly residential or commercial/industrial is $1.30$; this means that census tracts that are predominantly residential are on average expected to have $1300$ more inhabitants per $km^2$, with respect to those classified as mostly commercial/industrial.