Databeers Dublin

Databeers Dublin aims to bring together data experts from industry and academia, at a level accessible to a wide audience, by holding events that consist of short talks from professionals with diverse expertise.

Thanks to Davide Cellai for the invite to talk at Databeers #6. For more information and upcoming events please see:

2019 Research Demonstrator Competition - Funded PhD Opportunities

Ph.D. project title: Functional Data Analysis with Application to High-Frequency 3D Imaging

Project supervisor: Dr Michelle Carey

Project Description: Many scientific areas are faced with the challenge of extracting information from large, complex, and highly structured data sets. A great deal of modern statistical work focuses on developing tools for handling such data. This project aims to develop a suite of functional data analysis (FDA) techniques to analyze samples of manifolds, as outcomes, alongside large numbers of scalar predictors. This work is motivated by an anthropological application involving 3D facial imaging data. The goal is to uncover the genetic architecture of the human face and to better understand the ancestry of different facial features; that is, to uncover how individual characteristics, such as age and genetic ancestry, influence the shape of the human face.

Funding details: The Research Demonstratorships will be available for up to 4 years of full-time Ph.D. study. They will be awarded competitively on academic merit. The start date for these opportunities is 1st September 2019. The Research Demonstratorship programme is valued at €15,000 per annum (tax free), together with up to €2,000 in travel allowance over the four years. This will include tutorial duties. Successful applicants who do not qualify for any external fee support will, in addition, have 100% of their fees covered.

The closing date for applications is April 5th 2019 with the results anticipated by the end of May.  Applications should be made via e-mail to Marian Woods (  

An application should include the following documents, sent as a single PDF email attachment, in order to be considered:

1. A brief curriculum vitae.
2. A statement of mathematical interest (no more than 500 words).
3. Full copies of academic transcripts (translated to English where applicable).
4. Contact details of two academic references (name, telephone number and email address).

NOTE: After submitting their application, applicants are required to contact the supervisor(s) associated with their listed topic(s) of interest to communicate their interest in the proposed topic(s). However, decisions on acceptance are made by an independent panel.

See for further details.

Functional Data Analysis and Beyond

at the Matrix centre in Creswick, Australia

3 December - 14 December 2018

Had a wonderful time at the workshop on Functional Data Analysis and Beyond. Many thanks to Aurore Delaigle, Frederic Ferraty and Debashis Paul for the invite.

In recent years, functional data analysis has been widely used to answer science and policy questions in which the data are typically observed over time, space and other continuous variables. Current methodologies provide sophisticated computational techniques for solving complex problems in a wide range of application areas, from biomedical imaging and climate-environment interaction to unraveling networks evolving in time and space. After a period of prolific growth in computational techniques and methodological development, primarily motivated by diverse application areas, the time has come to consolidate the recent progress and provide a platform where researchers can exchange ideas, start collaborations on scientific projects and build a robust inferential framework for functional data analysis that takes into account the increasing complexities of the data.

This workshop is intended to bring together the leaders in this field, representatives of application areas, and promising young researchers to chart the path for future development in the field.

Statistics of geometric features and new data types

Isaac Newton Institute for Mathematical Sciences, Cambridge

19th March - 23rd March 2018

Had a great time at the workshop on statistics of geometric features and new data types. Many thanks to John Aston, Richard Davis and Axel Munk for the invite.

Geo-spatial data are observations of a process that are collected together with their geographical location. This type of data is abundant in many scientific fields; examples include population census, social and demographic (health, justice, education), economic (business surveys, trade, transport, tourism, agriculture, etc.) and environmental (atmospheric and oceanographic) data. They are often distributed over irregularly shaped spatial domains with complex boundaries and interior holes. Modelling approaches must account for the spatial dependence over these irregular domains as well as describing their temporal evolution.

Dynamic systems modeling has a huge potential in statistics, as evidenced by the amount of activity in functional data analysis. Many seemingly complex forms of functional variation can be more simply represented as a set of differential equations, either ordinary or partial.

In this talk, I present a class of semiparametric regression models with differential regularization in the form of PDEs. This methodology is called Data2PDE, "Data to Partial Differential Equations". Data2PDE characterizes spatial processes that evolve over complex geometries in the presence of uncertain, incomplete and often noisy observations, together with prior knowledge of the physical principles of the process, characterized by a PDE.

Distributed Data for Dynamics and Manifolds

Oaxaca, Mexico 3rd September - 8th September 2017

A very interesting and insightful workshop. Many thanks to Jiguo Cao, Giles Hooker, James Ramsay, Laura Sangalli, and Fang Yao for the invite.

The ever-increasing rise of automated measurement has allowed us an unprecedented view of the world around us; from chemical processes on cell surfaces to global climate models, new sensors are capable of recording complex processes over a huge variety of spatial scales. The challenge is now not to collect data, but to analyze it.

This workshop focused on pairing complex models of physical processes with large data sets recorded on complex objects to refine our models and develop a new understanding of these processes. This workshop brought together statisticians, mathematical biologists, geometers, and applied mathematicians to develop new methods to understand how this new wealth of data can inform and improve mathematical models in these fields and how these models, in turn, can affect how the data is collected and measured.

European Study Group with Industry ESGI141

25th-29th June 2018, UCD Dublin.

ESGIs are week-long workshops that provide a forum for industrial scientists to work alongside academics on problems of direct industrial relevance.

The scientific focus of the workshop is on investigating and developing a suite of working solutions to complex, challenging projects submitted by industry that require mathematical, statistical or computational expertise.

A very enjoyable, productive and fruitful week at ESGI141. Thanks to Prolego Scientific, Electricity Supply Board (ESB), Analog Devices and Captured Carbon for their interesting projects.

Data to linear dynamics (Data2LD)

Let's consider a study on traumatic brain injury (TBI), which contributes to just under a third (30.5%) of all injury-related deaths in the US and is caused by a blow to the head. Figure (1) illustrates the acceleration of the brain tissue before and after a series of five blows to the cranium.

Figure (1)

The laws of motion tell us that the acceleration $f(t)$ can be modeled by a second order linear differential equation (LDE) with a point impulse $u(t)$ representing the blow to the cranium, shown by the dashed lines in Figure (1).

This LDE,

$$\frac{\textrm{d}^2f}{\textrm{d}t^2} = \beta_{0} f + \beta_{1} \frac{\textrm{d}f}{\textrm{d}t} + \alpha_{0} u(t),$$

contains three parameters, $\beta_{0}$, $\beta_{1}$ and $\alpha_{0}$, which convey the rate of the restoring force (as $t \rightarrow \infty$, the acceleration tends to revert to zero), the rate of the friction force (as $t \rightarrow \infty$, the oscillations in the acceleration reduce to zero) and the rate of force from the point impulse.
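To make the roles of the three parameters concrete, the following is a minimal sketch that integrates an equation of this form numerically. It is not Data2LD itself: the function names `simulate_lde` and `unit_pulse`, the rectangular pulse standing in for the point impulse, and all parameter values are assumptions chosen purely for illustration.

```python
def simulate_lde(beta0, beta1, alpha0, u, t_end, dt=0.01):
    """Integrate f'' = beta0*f + beta1*f' + alpha0*u(t) with classical RK4.

    The second-order equation is rewritten as a first-order system in the
    state (f, v), with v = df/dt, starting from rest (f = v = 0).
    """
    def rhs(t, f, v):
        return v, beta0 * f + beta1 * v + alpha0 * u(t)

    n_steps = int(round(t_end / dt))
    t, f, v = 0.0, 0.0, 0.0
    path = [(t, f)]
    for _ in range(n_steps):
        k1f, k1v = rhs(t, f, v)
        k2f, k2v = rhs(t + dt / 2, f + dt / 2 * k1f, v + dt / 2 * k1v)
        k3f, k3v = rhs(t + dt / 2, f + dt / 2 * k2f, v + dt / 2 * k2v)
        k4f, k4v = rhs(t + dt, f + dt * k3f, v + dt * k3v)
        f += dt / 6 * (k1f + 2 * k2f + 2 * k3f + k4f)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
        path.append((t, f))
    return path


def unit_pulse(t, start=1.0, width=1.0):
    """Short rectangular pulse: a hypothetical stand-in for the impulse u(t)."""
    return 1.0 if start <= t < start + width else 0.0


# Hypothetical parameter values for illustration only (not fitted estimates).
path = simulate_lde(beta0=-1.0, beta1=-0.5, alpha0=1.0, u=unit_pulse, t_end=40.0)
peak = max(abs(f) for _, f in path)
tail = max(abs(f) for t, f in path if t > 35.0)
print(f"peak |f| = {peak:.3f}, late-time |f| = {tail:.5f}")
```

With a negative $\beta_{0}$ and $\beta_{1}$, the simulated response rises during the pulse and then oscillates back towards zero, which is the qualitative behaviour described above.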

While there are several methods for estimating LDE parameters with partially observed data, they are invariably subject to several problems including high computational cost, sensitivity to initial values or large sampling variability.

We propose a method called Data2LD for data to linear dynamics that overcomes these issues and produces estimates of the LDE parameters that have less bias, a smaller sampling variance and a ten-fold improvement in computation.

The final parameter estimates, with 95% confidence intervals, are $\hat{\beta}_{0} = -0.056 \pm 0.002$, $\hat{\beta}_{1} = -0.150 \pm 0.018$ and $\hat{\alpha}_{0} = 0.395 \pm 0.032$, indicating that the acceleration is an under-damped process: after the blow to the cranium, the acceleration oscillates with a decreasing amplitude that quickly decays to zero.
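The under-damped conclusion can be checked directly from the point estimates. The sketch below is a standard characteristic-root calculation for the homogeneous equation, not part of Data2LD; the variable names are chosen for illustration and the time unit is whatever unit the data were recorded in.

```python
import cmath
import math

# Point estimates quoted in the text.
beta0, beta1 = -0.056, -0.150

# The homogeneous equation f'' = beta0*f + beta1*f' has characteristic
# polynomial r**2 - beta1*r - beta0 = 0.
discriminant = beta1 ** 2 + 4 * beta0
root = (beta1 + cmath.sqrt(discriminant)) / 2

# Complex roots with a negative real part mean decaying oscillations.
is_underdamped = discriminant < 0 and root.real < 0
decay_rate = -root.real                # amplitude envelope ~ exp(-decay_rate * t)
period = 2 * math.pi / abs(root.imag)  # time between successive peaks

print(is_underdamped, round(decay_rate, 3), round(period, 1))
```

A negative discriminant gives a complex-conjugate pair of roots, and the negative real part gives the decaying envelope, so the estimates are consistent with oscillations that die away after each blow.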

Figure (2)

Figure (2) shows the accelerometer readings of the brain tissue, the fitted curve produced by Data2LD (solid line), the numerical approximation to the solution of the LDE with the parameters identified by Data2LD (dashed line) and the impulse function $u(t)$ representing the blow to the cranium (dotted line). We can see the LDE solution with the parameters defined by Data2LD is very close to the fitted curve produced by Data2LD, which indicates that the LDE provides an adequate description of the acceleration of the brain tissue.


Carey, M., Gath, E., Hayes, K. (2014) 'Frontiers in financial dynamics'. Research in International Business and Finance.

Carey, M., Gath, E., Hayes, K. (2016) 'A generalized smoother for linear ordinary differential equations'. Journal of Computational and Graphical Statistics.

Carey, M., Ramsay, J. (2018) 'Parameter Estimation and Dynamic Smoothing with Linear Differential Equations'. Journal of Computational and Graphical Statistics. (in press)

Dynamics 4 Genomic Big Data

The immune response to viral infection is a dynamic process, which is regulated by an intricate network of many genes and their products.

Understanding the dynamics of this network will help to reveal the mechanisms involved in regulating influenza infection and hence aid the development of antiviral treatments and preventive vaccines. There is an abundance of literature on dynamic network construction, e.g., Hecker et al. (2009), Lu et al. (2011) and Wu et al. (2013).

My research involves the development of a new pipeline for dynamic network construction from high-dimensional time course gene expression data. This pipeline allows us to discern the fundamental underlying biological processes and their dynamic features at the genetic level.

The pipeline includes:

Novel statistical methods and modelling approaches have been developed for the implementation of this new pipeline, which include a new approach for the selection of the smoothing parameter, a new clustering approach and a new method for model selection for high-dimensional ODEs.
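To give a flavour of what clustering time course data by correlation can look like, here is a deliberately simple sketch. It is not the published algorithm: the functions `pearson` and `correlation_clusters`, the threshold of 0.9 and the toy curves are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_clusters(curves, threshold=0.9, max_iter=20):
    """Iteratively group time courses by correlation with cluster mean curves."""
    centres = []
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for curve in curves:
            scores = [pearson(curve, c) for c in centres]
            best = max(range(len(scores)), key=scores.__getitem__, default=None)
            if best is None or scores[best] < threshold:
                centres.append(list(curve))  # no close centre: open a new cluster
                best = len(centres) - 1
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each centre as the pointwise mean of its members.
        for k in range(len(centres)):
            members = [c for c, lab in zip(curves, labels) if lab == k]
            if members:
                centres[k] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


# Toy expression profiles: two rising, two falling, one oscillating.
curves = [
    [0, 1, 2, 3, 4],
    [1, 2, 3, 4, 5.5],
    [4, 3, 2, 1, 0],
    [5, 4, 2.5, 1, 0],
    [0, 2, 0, 2, 0],
]
labels = correlation_clusters(curves)
print(labels)
```

Grouping by correlation rather than by Euclidean distance puts genes with the same temporal shape together even when their expression levels differ in scale.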


Carey, M., Wu, S., Gan, G. and Wu, H. (2016) 'Correlation-based iterative clustering methods for time course data: the identification of temporal gene response modules to influenza infection in humans'. Infectious Disease Modelling.

Song, J., Carey, M., Zhu, H., Miao, H., Ramírez, J. and Wu, H. (2017) 'Identifying the dynamic gene regulatory network during latent HIV-1 reactivation using high-dimensional ordinary differential equations'. International Journal of Computational Biology and Drug Design.

Carey, M., Wu, S., Wu, H. 'A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection'. Statistical Methods in Medical Research.

Geo-Spatial functional data analysis

Geo-Spatial functional data analysis (FDA) concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties.

Figure (1)

It is used in

  • mapping,
  • assessing spatial data quality,
  • sampling design optimisation,
  • modelling of dependence structures,
  • and drawing of valid inference from a limited set of spatio-temporal data.

This new branch of Statistics can be used to better analyse, model and predict spatial data.

Key aspects of FDA include:

  • smoothing,
  • data reduction,
  • functional linear modelling,
  • and forecasting methods.

Spatial FDA accounts for attributes of the geometry of the physical problem, such as irregularly shaped domains, external and internal boundary features and strong concavities.

These models can also include a priori information about the spatial structure of the phenomenon, described by a partial differential equation (PDE).
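As a hedged illustration of how such prior information can enter a model, spatial regression with PDE regularisation is commonly written in the literature as a penalised least-squares problem. The notation here is assumed for this sketch rather than taken from the text: $z_i$ is the observation at location $\mathbf{p}_i$, $\mathbf{w}_i$ is a vector of covariates, $f$ is the spatial field over the domain $\Omega$, $L$ is a differential operator with forcing term $u$, and $\lambda > 0$ is a smoothing parameter.

$$
\min_{\boldsymbol{\beta},\, f} \; \sum_{i=1}^{n} \left( z_i - \mathbf{w}_i^{\top}\boldsymbol{\beta} - f(\mathbf{p}_i) \right)^2 + \lambda \int_{\Omega} \left( L f(\mathbf{p}) - u(\mathbf{p}) \right)^2 \, \mathrm{d}\mathbf{p}
$$

The first term measures fidelity to the data and the second penalises departures of $f$ from the PDE $Lf = u$ over the domain, with boundary conditions imposed on the edge of $\Omega$. Larger values of $\lambda$ pull the estimate towards a solution of the PDE, while smaller values let it follow the observations more closely.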

Island of Montreal.

We consider the problem of estimating population density over the Island of Montreal. Figure (1) shows the census tract locations (493 data points defined by the centroids of census enumeration areas) over the Island of Montreal, Quebec, Canada, excluding an airport (in the south) and an industrial park with an oil refinery tank farm (in the north-east tip of the Island). Population density is available at each census tract, measured in thousands of inhabitants per $\mathrm{km}^2$, and a binary variable indicating whether a tract is predominantly residential or industrial/commercial is available as a covariate for estimating the distributions of census quantities.

Here we are particularly interested in population density, so the airport and industrial park are not part of the domain of interest, since people cannot live in these two areas. Census quantities can differ considerably on different sides of these uninhabited parts of the city: for instance, just south of the industrial park there is a densely populated area with medium-low income, whereas to the north-east of it there is a rich neighbourhood characterised by low population density, and to the west of it there is a relatively wealthy cluster of condominiums (high population density).

Hence, whilst it seems reasonable to assume that population density varies smoothly over the inhabited parts of the island, there is no reason to assume similar spatial variation on either side of these uninhabited areas. Figure (1) also shows the island coasts as boundaries of the domain of interest; the parts of the boundary highlighted in red correspond to the harbour on the east shore and to two public parks on the south-west and north-east shores. No people live by the river banks along these stretches of coast.


Figure (3)

Figures (2) and (3) show the resulting estimate of the population density. Notice that the estimate complies with the imposed boundary conditions, dropping to zero along uninhabited stretches of coast. Also, the estimate has not artificially linked data points on either side of the uninhabited parts; compare, for instance, the densely populated area south of the oil refinery and purification plant with the low population density neighbourhood north-east of the industrial park. The $\beta$ coefficient that corresponds to the binary covariate indicating whether a tract is predominantly residential or commercial/industrial is $1.30$. Since density is measured in thousands of inhabitants per $\mathrm{km}^2$, this means that census tracts that are predominantly residential are expected to have, on average, $1300$ more inhabitants per $\mathrm{km}^2$ than those classified as mostly commercial/industrial.