This page provides a detailed description of the MATLAB calculations necessary to run Data2LD code for estimating the parameters and the solution of a linear differential equation.
Contents
- Set-up
- Load the data
- The first order linear differential equation
- Set up basis functions for X
- Create a functional data object for the beta function
- Run Data2LD
- Plot the fitted curve with its confidence and prediction intervals
- Plot the estimated parameters of the ODE and their approximated CI
- Check fit to the ODE
- Check the residuals
- Check the normality
- Check the fitted vrs residuals plot
Set-up
clear clear functions cd('/Users/michellecarey/Documents/Data2LD_v_121119/') path = pwd; cd(path)
Load the data
cd('/Users/michellecarey/Documents/Data2LD_v_121119/Examples/Cancer_Incidences') load('Cancer_D.mat') 'ko'); set(phdl, 'LineWidth', 2) xlabel('Age') ylabel('Cancer cases per 100,000 population')
Figure (1) illustrates observations of cancer cases measured from age 0.5 (6 months) to age 87 years.
The first order linear differential equation
The first order linear differential equation, as used in this example is
" />
The parameter in the linear ODE conveys the age-varying rate of growth (if positive) or decay (if negative) in the cancer cases per 100,000 population as the individuals increase in age.
We have no prior knowledge of the form of so we will estimate it with a simple step function (B-spline basis of order 1) .
Our aim is to estimate the cancer cases and the parameter from the data in Figure (1).
Set up basis functions for X
rng = [yCell{1}(1,1),yCell{1}(end,1)];
knots = linspace(rng(1),rng(end),17);
norder = 4;
nbasis = length(knots)+ (norder - 2);
Xbasisobj = create_bspline_basis(rng,nbasis,norder,knots);
For the expansion of we used order four B-spline functions which by their nature have quadratic 1st derivatives. We used 17 equally spaced knots between the first age (6 months) and last age (87 years).
Create a functional data object for the beta function
For the expansion of we used order one B-spline functions over the following knot sequence [0.5,12,40,45,50,55,60,65,70,75,80,85,87]. This placed less flexibility where the estimated curve was relatively constant and more flexibility where the estimated curve indicated that the trajectory was more complicated.
rng = [yCell{1}(1,1),yCell{1}(end,1)];
knots = [0.5,12,40,45,50,55,60,65,70,75,80,85,87];
norder = 1;
nbasis = length(knots)+ (norder - 2);
basisobjC = create_bspline_basis(rng, nbasis, norder, knots);
fd_beta = fd(0.01.*ones(nbasis,1),basisobjC);
on;
plot([knots; knots], [0; 1], '--r')
Run Data2LD
...
estimate);
Plot the fitted curve with its confidence and prediction intervals
This Figure illustrates the cancer cases per 100,000 population indicated by the circles. The fitted curve produced by Data2LD with (solid line), the approximated 9
figure(4)
hp = patch([yCell{1}(:,1); yCell{1}(end:-1:1,1);yCell{1}(1)],...
[Results_cell{4}(:,4); Results_cell{4}(end:-1:1,5); Results_cell{4}(1)],[0.85 0.85 0.85]);
hold on;
plot(yCell{1}(:,1),[Results_cell{4}(:,2),Results_cell{4}(:,3)],'k--')
plot(yCell{1}(:,1),Results_cell{4}(:,1),'k-')
plot(yCell{1}(:,1), yCell{1}(:,2), 'ko');
xlim([min(yCell{1}(:,1))-0.1,max(yCell{1}(:,1))+0.1])
ylim([min(Results_cell{4}(:,4))-0.1,max(Results_cell{4}(:,5))+0.1])
xlabel('Age')
ylabel('Cancer cases per 100,000 population')
legend('Prediction Interval','Confidence Interval','Confidence Interval','Fitted Curve','Data','Location','northwest')
Plot the estimated parameters of the ODE and their approximated CI
This Figure illustrates the estimated age-varying rate of growth of the cancer cases per 100,000 population produced by Data2LD with $hat{rho} = 0.99$ (solid line) and the approximated 9
Wbasismat = eval_basis(yCell{1}(:,1),basisobjC);
StdErrW = sqrt(diag(Wbasismat*Results_cell{2}(:,3:end)*Wbasismat'));
beta_fd = fd(Results_cell{2}(1:getnbasis(basisobjC),1),basisobjC);
beta_0 = eval_fd(yCell{1}(:,1), beta_fd);
figure(5)
stairs(yCell{1}(:,1),beta_0,'k-');
hold on;
'Age')
ylabel('$hat{beta}(t)
Check fit to the ODE
The difference between the LHS and RHS of the ODE is small indicating that the ODE is satified.
fit = Results_cell{4}(:,1);
Dfit = eval_fd(yCell{1}(:,1), getfd(Results_cell{1}),1);
RHS = beta_0.*fit;
figure()
plot(yCell{1}(:,1),Dfit,'*k')
hold on;
plot(yCell{1}(:,1),RHS,'k--')
legend('Left-Hand side of the ODE','Right-Hand side of the ODE','Location','best')
xlabel('Age')
ylabel('Velocity of the cancer cases per 100,000 population')
Check the residuals
Check the normality
The qqplot indicates that the residuals are approx normal distribution, as the quantiles of the residuals align with the 45-degree reference line. There is evidence of skewness where three ages have unusally high cancer rates that are not captured by the model. The histogram does have the features of a normal distribution (its symmetric and bell-shaped). However, it also indicates skewness where three ages have unusally high cancer rates that are not captured by the model.
res = yCell{1}(:,2)-Results_cell{4}(:,1);
figure()
subplot(2,1,1)
h = normplot(res);
h(1).LineWidth = 2;
h(2).LineWidth = 2;
h(3).LineWidth = 2;
subplot(2,1,2)
hist(res)
Check the fitted vrs residuals plot
The figure should display random scatter in a constant band around zero that is no patterns, and no funnels. We can see clear pattern here indicating that the model does not adquetly capture the complexity of the data.
figure()
plot(Results_cell{4}(:,1),yCell{1}(:,2)-Results_cell{4}(:,1),'ko')