Demonstration of Data2LD: Incidence of cancer vs age for American females in 2001 (age-varying parameter)

This page provides a detailed description of the MATLAB calculations necessary to run Data2LD code for estimating the parameters and the solution of a linear differential equation.

Contents

Set-up

clear
clear functions
cd('/Users/michellecarey/Documents/Data2LD_v_121119/')
path = pwd;
cd(path)

Load the data

cd('/Users/michellecarey/Documents/Data2LD_v_121119/Examples/Cancer_Incidences')
load('Cancer_D.mat')

'ko');
set(phdl, 'LineWidth', 2)
xlabel('Age')
ylabel('Cancer cases per 100,000 population')

Figure (1) illustrates $19$ observations of cancer cases measured from age 0.5 (6 months) to age 87 years.

The first order linear differential equation

The first order linear differential equation, as used in this example is

<span class='MathJax_Preview'><img src='https://data2dynamics.ucd.ie/wp-content/plugins/latex/cache/tex_61ab892dad64747c5caecf7273150200.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt=" />

The parameter $beta(t)$ in the linear ODE conveys the age-varying rate of growth (if positive) or decay (if negative) in the cancer cases per 100,000 population as the individuals increase in age.

We have no prior knowledge of the form of $beta(t)$ so we will estimate it with a simple step function (B-spline basis of order 1) $beta(t)=sum_{j=1}^{J}b_jphi_j(t)$.

Our aim is to estimate the cancer cases $x(t)$ and the parameter $theta=[b_1,ldots,b_J]$ from the data in Figure (1).

Set up basis functions for X

rng       = [yCell{1}(1,1),yCell{1}(end,1)];
knots     = linspace(rng(1),rng(end),17);
norder    = 4;
nbasis    = length(knots)+ (norder - 2);
Xbasisobj = create_bspline_basis(rng,nbasis,norder,knots);


For the expansion of $x(t)$ we used order four B-spline functions which by their nature have quadratic 1st derivatives. We used 17 equally spaced knots between the first age (6 months) and last age (87 years).


Create a functional data object for the beta function

For the expansion of $beta(t)$ we used order one B-spline functions over the following knot sequence [0.5,12,40,45,50,55,60,65,70,75,80,85,87]. This placed less flexibility where the estimated curve was relatively constant and more flexibility where the estimated curve indicated that the trajectory was more complicated.

rng       = [yCell{1}(1,1),yCell{1}(end,1)];
knots     = [0.5,12,40,45,50,55,60,65,70,75,80,85,87];
norder    = 1;
nbasis    = length(knots)+ (norder - 2);
basisobjC = create_bspline_basis(rng, nbasis, norder, knots);
fd_beta   = fd(0.01.*ones(nbasis,1),basisobjC);

on;
plot([knots; knots], [0; 1], '--r')

Run Data2LD

...
    estimate);

Plot the fitted curve with its confidence and prediction intervals

This Figure illustrates the cancer cases per 100,000 population indicated by the circles. The fitted curve produced by Data2LD with $hat{rho} = 0.99$ (solid line), the approximated 9

figure(4)
hp = patch([yCell{1}(:,1); yCell{1}(end:-1:1,1);yCell{1}(1)],...
    [Results_cell{4}(:,4); Results_cell{4}(end:-1:1,5); Results_cell{4}(1)],[0.85 0.85 0.85]);
hold on;
plot(yCell{1}(:,1),[Results_cell{4}(:,2),Results_cell{4}(:,3)],'k--')
plot(yCell{1}(:,1),Results_cell{4}(:,1),'k-')
plot(yCell{1}(:,1), yCell{1}(:,2), 'ko');
xlim([min(yCell{1}(:,1))-0.1,max(yCell{1}(:,1))+0.1])
ylim([min(Results_cell{4}(:,4))-0.1,max(Results_cell{4}(:,5))+0.1])
xlabel('Age')
ylabel('Cancer cases per 100,000 population')
legend('Prediction Interval','Confidence Interval','Confidence Interval','Fitted Curve','Data','Location','northwest')

Plot the estimated parameters of the ODE and their approximated CI

This Figure illustrates the estimated age-varying rate of growth of the cancer cases per 100,000 population produced by Data2LD with $hat{rho} = 0.99$ (solid line) and the approximated 9

Wbasismat = eval_basis(yCell{1}(:,1),basisobjC);
StdErrW   = sqrt(diag(Wbasismat*Results_cell{2}(:,3:end)*Wbasismat'));

beta_fd   = fd(Results_cell{2}(1:getnbasis(basisobjC),1),basisobjC);
beta_0    = eval_fd(yCell{1}(:,1), beta_fd);

figure(5)
stairs(yCell{1}(:,1),beta_0,'k-');
hold on;
'Age')
ylabel('$hat{beta}(t)

Check fit to the ODE

The difference between the LHS and RHS of the ODE is small indicating that the ODE is satified.

fit    = Results_cell{4}(:,1);
Dfit   = eval_fd(yCell{1}(:,1), getfd(Results_cell{1}),1);
RHS    = beta_0.*fit;

figure()
plot(yCell{1}(:,1),Dfit,'*k')
hold on;
plot(yCell{1}(:,1),RHS,'k--')
legend('Left-Hand side of the ODE','Right-Hand side of the ODE','Location','best')
xlabel('Age')
ylabel('Velocity of the cancer cases per 100,000 population')

Check the residuals

Check the normality

The qqplot indicates that the residuals are approx normal distribution, as the quantiles of the residuals align with the 45-degree reference line. There is evidence of skewness where three ages have unusally high cancer rates that are not captured by the model. The histogram does have the features of a normal distribution (its symmetric and bell-shaped). However, it also indicates skewness where three ages have unusally high cancer rates that are not captured by the model.

res = yCell{1}(:,2)-Results_cell{4}(:,1);
figure()
subplot(2,1,1)
h = normplot(res);
h(1).LineWidth = 2;
h(2).LineWidth = 2;
h(3).LineWidth = 2;
subplot(2,1,2)
hist(res)

Check the fitted vrs residuals plot

The figure should display random scatter in a constant band around zero that is no patterns, and no funnels. We can see clear pattern here indicating that the model does not adquetly capture the complexity of the data.

figure()
plot(Results_cell{4}(:,1),yCell{1}(:,2)-Results_cell{4}(:,1),'ko')