Introduction

Specification document - a mathematical description of models used by bage.

Note: some features described here have not been implemented yet.

Input data

Outcome variable: events, numbers of people, or a measure of a continuous variable such as income or expenditure.
Exposure, size, or weights.
Disaggregated by one or more variables. These almost always include age, sex/gender, and time, and may include other variables, e.g. region, ethnicity, or education.
Not all combinations of variables need be present, and there may be some missing values.

Models

Poisson likelihood

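Let y_i be a count of events in cell i = 1, ⋯, n and let w_i be the corresponding exposure measure, with the possibility that w_i ≡ 1. The likelihood under the Poisson model is then

$$y_i \sim \text{Poisson}(\gamma_i w_i)$$

$$\gamma_i \sim \text{Gamma}\left(\xi^{-1}, (\mu_i \xi)^{-1}\right)$$

using the shape-rate parameterisation of the Gamma distribution. Parameter ξ governs dispersion, with

$$\text{var}(\gamma_i \mid \mu_i, \xi) = \xi \mu_i^2$$

and

$$\text{var}(y_i \mid \mu_i, \xi, w_i) = (1 + \xi \mu_i w_i) \, \mu_i w_i.$$

We allow ξ to equal 0, in which case the model reduces to

$$y_i \sim \text{Poisson}(\mu_i w_i).$$

For ξ > 0, Equations @ref(eq:lik-pois-1) and @ref(eq:lik-pois-2) are equivalent to

$$y_i \sim \text{NegBinom}\left(\xi^{-1}, (1 + \mu_i w_i \xi)^{-1}\right)$$

(Norton, Christen, and Fox 2018; Simpson 2022). This is the format we use internally for estimation. When values for γ_i are needed, we generate them on the fly, using the fact that

$$\gamma_i \mid y_i, w_i, \mu_i, \xi \sim \text{Gamma}\left(y_i + \xi^{-1}, w_i + (\xi \mu_i)^{-1}\right).$$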
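The relationship between the dispersed Poisson model and the negative binomial can be checked numerically. The sketch below assumes that γ_i has a Gamma distribution with shape 1/ξ and rate 1/(ξ μ_i), so that integrating out γ_i gives a negative binomial with size 1/ξ and probability 1/(1 + μ_i w_i ξ). The numeric values and the helper name `draw_gamma_pois` are hypothetical and are not part of bage.

```r
# Illustrative values only; not defaults taken from bage
mu <- 0.02   # prior mean for the rate
xi <- 0.5    # dispersion
w  <- 1000   # exposure

# Negative binomial implied by the Gamma-Poisson mixture (assumed form)
size <- 1 / xi
prob <- 1 / (1 + mu * w * xi)

# Moments computed directly from the pmf; the grid is wide enough
# that the truncated upper tail is negligible
y_grid <- 0:2000
p <- dnbinom(y_grid, size = size, prob = prob)
mean_y <- sum(y_grid * p)
var_y  <- sum((y_grid - mean_y)^2 * p)

# Hypothetical helper: draw gamma_i given the data, using the
# conjugate Gamma update
draw_gamma_pois <- function(y, w, mu, xi) {
  rgamma(length(y), shape = y + 1 / xi, rate = w + 1 / (xi * mu))
}
```

Under these assumptions `mean_y` should match μ w and `var_y` should match (1 + ξ μ w) μ w, which is one way to see how ξ inflates the variance relative to a plain Poisson.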

Binomial likelihood

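The likelihood under the binomial model is

$$y_i \sim \text{Binomial}(w_i, \gamma_i)$$

$$\gamma_i \sim \text{Beta}\left(\xi^{-1} \mu_i, \xi^{-1} (1 - \mu_i)\right).$$

Parameter ξ again governs dispersion, with

$$\text{var}(\gamma_i \mid \mu_i, \xi) = \frac{\xi}{1 + \xi} \, \mu_i (1 - \mu_i)$$

and

$$\text{var}(y_i \mid w_i, \mu_i, \xi) = \frac{\xi w_i + 1}{\xi + 1} \, w_i \mu_i (1 - \mu_i).$$

We allow ξ to equal 0, in which case the model reduces to

$$y_i \sim \text{Binomial}(w_i, \mu_i).$$

Equations @ref(eq:lik-binom-1) and @ref(eq:lik-binom-2) are equivalent to

$$y_i \sim \text{BetaBinom}\left(w_i, \xi^{-1} \mu_i, \xi^{-1} (1 - \mu_i)\right)$$

which is what we use internally. Values for γ_i can be generated using

$$\gamma_i \mid y_i, w_i, \mu_i, \xi \sim \text{Beta}\left(y_i + \xi^{-1} \mu_i, w_i - y_i + \xi^{-1} (1 - \mu_i)\right).$$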
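The on-the-fly generation of γ_i for the binomial case can be sketched in a few lines. The code assumes a Beta prior for γ_i with shape parameters μ_i/ξ and (1 − μ_i)/ξ, so that the conjugate update adds successes and failures to the two shapes. The function names are hypothetical and are not part of bage.

```r
# Variance of the assumed Beta(mu / xi, (1 - mu) / xi) prior
var_gamma_binom <- function(mu, xi) {
  a <- mu / xi
  b <- (1 - mu) / xi
  a * b / ((a + b)^2 * (a + b + 1))
}

# Hypothetical helper: draw gamma_i given successes y out of w trials
draw_gamma_binom <- function(y, w, mu, xi) {
  rbeta(length(y),
        shape1 = y + mu / xi,
        shape2 = w - y + (1 - mu) / xi)
}

# Example: three cells with 50 trials each
draws <- draw_gamma_binom(y = c(5, 10, 20), w = c(50, 50, 50), mu = 0.2, xi = 0.5)
```

With this parameterisation the prior variance simplifies to ξ/(1 + ξ) × μ(1 − μ), so larger values of ξ spread the cell-level probabilities further from μ.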

Normal likelihood

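The normal model is

$$\tilde{y}_i \sim \text{N}\left(\mu_i, \tilde{w}_i^{-1} \xi^2\right)$$

where ỹ_i is a standardized version of outcome y_i, and w̃_i is a standardized version of weight w_i. The standardization is carried out using

$$\tilde{y}_i = \frac{y_i - \bar{y}}{s}$$

$$\tilde{w}_i = \frac{w_i}{\bar{w}}$$

where

$$\bar{y} = \frac{\sum_{i=1}^n y_i}{n}, \quad s = \sqrt{\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n - 1}}, \quad \bar{w} = \frac{\sum_{i=1}^n w_i}{n}.$$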

Standardizing allows us to apply the same priors as we use for the Poisson and binomial models.
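A minimal sketch of the standardization step, assuming the outcome is centred and scaled by its sample mean and sample standard deviation, and the weights are divided by their mean. The function name `standardize_yw` is hypothetical.

```r
# Hypothetical helper: standardize outcome and weights
# (assumes sample mean / sample sd for y, and mean-scaling for w)
standardize_yw <- function(y, w) {
  list(y = (y - mean(y)) / sd(y),
       w = w / mean(w))
}

income <- c(42, 55, 38, 61, 47)   # made-up outcome values
wt     <- c(10, 20, 15, 5, 50)    # made-up weights
std <- standardize_yw(income, wt)
```

After this step the outcome has mean 0 and standard deviation 1, and the weights average 1, which is what lets priors calibrated for the log and logit scales be reused.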

Model for prior means

Let μ = (μ₁, ⋯, μ_n)^⊤. Our model for μ is

$$\pmb{\mu} = \sum_{m=0}^{M} \pmb{X}^{(m)} \pmb{\beta}^{(m)} + \pmb{Z} \pmb{\zeta}$$

where

β⁽⁰⁾ is an intercept;
β^(m), m = 1, ⋯, M is a vector with J_m elements describing a main effect or interaction formed from the dimensions of data y;
X^(m) is an n × J_m matrix of 1s and 0s, the ith row of which picks out the element of β^(m) that is used with cell i;
Z is an n × P matrix of covariates; and
ζ is a coefficient vector with P elements.
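The role of the X^(m) matrices can be illustrated with a toy classification by age and sex. This is a sketch only: the variable names, the coefficient values, and the helper `make_X` are made up for illustration, and covariates are omitted.

```r
# Toy data: every combination of three age groups and two sexes
data <- expand.grid(
  age = factor(c("0-4", "5-9", "10+"), levels = c("0-4", "5-9", "10+")),
  sex = factor(c("F", "M"), levels = c("F", "M"))
)

# Hypothetical helper: n x J_m matrix of 1s and 0s whose ith row
# picks out the element of beta^(m) used with cell i
make_X <- function(f) {
  outer(as.integer(f), seq_len(nlevels(f)), "==") * 1
}

X_0   <- matrix(1, nrow = nrow(data), ncol = 1)  # intercept
X_age <- make_X(data$age)
X_sex <- make_X(data$sex)

beta_0   <- -3
beta_age <- c(0.5, -0.2, -0.3)
beta_sex <- c(0.1, -0.1)

# Sum of the contributions from the intercept and both main effects
mu <- drop(X_0 %*% beta_0 + X_age %*% beta_age + X_sex %*% beta_sex)
```

Each row of `X_age` and `X_sex` contains a single 1, so the matrix products simply look up the relevant element of each β^(m) for every cell.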

Priors for Intercept, Main Effects, and Interactions

General features

‘Along’ and ‘by’ dimensions

Each β^(m), m > 0, can be a main effect, involving a single dimension, or an interaction, involving two or more dimensions. Some priors, when applied to an interaction, treat one dimension, referred to as the ‘along’ dimension, differently from the remaining dimensions, referred to as ‘by’ dimensions. A random walk prior (Section @ref(sec:pr-rw)), for instance, consists of an independent random walk along the ‘along’ dimension, within each combination of the ‘by’ dimensions.

We use v = 1, ⋯, V_m to denote position within the ‘along’ dimension, and u = 1, ⋯, U_m to denote position within a classification formed by the ‘by’ dimensions. When there are no sum-to-zero constraints (see below), U_m = ∏_k d_k where d_k is the number of elements in the kth ‘by’ variable. When there are sum-to-zero constraints, U_m = ∏_k (d_k − 1).

If a prior involves an ‘along’ dimension but the user does not specify one, the procedure for choosing a dimension is as follows:

if the term involves time, use the time dimension;
otherwise, if the term involves age, use the age dimension;
otherwise, raise an error asking the user to explicitly specify a dimension.
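The selection rule above can be written as a short function. This is a sketch, not bage's implementation: the function name and the way the time and age variables are identified (by name, through the `var_time` and `var_age` arguments) are assumptions.

```r
# Hypothetical helper: choose the 'along' dimension for a term
choose_along <- function(dims, along = NULL,
                         var_time = "time", var_age = "age") {
  if (!is.null(along)) return(along)          # user-specified
  if (var_time %in% dims) return(var_time)    # prefer time
  if (var_age %in% dims) return(var_age)      # then age
  stop("term has no time or age dimension: please specify 'along' explicitly")
}

choose_along(c("age", "time"))     # time takes precedence over age
choose_along(c("age", "region"))   # falls back to age
```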

Constraints

With some combinations of terms and priors, some β^(m) are only weakly identified, and have diffuse prior distributions. Even when this happens, however, the quantity $\mu_i = \sum_{m=0}^M \beta_{j_i^m}^{(m)}$ is still well identified, so the weak identification may not matter to the aims of the analysis.

If, however, stronger identification is required, it can be achieved by imposing constraints on the elements of the β^(m). This is done via the con argument. At present only two choices for con have been implemented. The first is "none", where no constraints are applied. This is the default. The second is "by".

The "by" option can only be used if β^(m) has an ‘along’ dimension. If con is "by", then within each element v of the ‘along’ dimension, the sum of the β_j^(m) across each ‘by’ dimension is zero. For instance, if β^(m) is an interaction between time, region, and sex, with time as the ‘along’ variable, then within each combination of time and region, the values for females and males sum to zero, and within each combination of time and sex, the values for regions sum to zero.

Except in the case of dynamic SVD-based priors (e.g. Section @ref(sec:pr-svd-rw)), "by" constraints are implemented internally by drawing values within an unrestricted lower-dimensional space, and then transforming to the restricted higher-dimensional space. For instance, a random walk prior for a time-region interaction with R regions consists of R − 1 unrestricted random walks along time, which are converted into R random walks that sum to zero across region. Matrices for transforming between the unrestricted and restricted spaces are constructed using the QR decomposition, as described in Section 1.8.1 of Wood (2017). With dynamic SVD-based priors, we draw values for the SVD coefficients with no constraints, convert these to unconstrained values for β^(m), and then subtract means.
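The QR-based construction for the time-region example can be sketched as follows. The constraint is written as C β = 0 with C a row of ones, and the last R − 1 columns of the full Q factor of C^⊤ give a basis for the constrained space. The variable names are illustrative, and the random walks use unit innovation variance purely for demonstration.

```r
n_region <- 4
n_time   <- 10

# Sum-to-zero constraint across regions: C %*% beta = 0
C <- matrix(1, nrow = 1, ncol = n_region)

# Columns 2..R of the full Q factor of t(C) are orthogonal to the
# vector of ones, so they span the sum-to-zero subspace
Z <- qr.Q(qr(t(C)), complete = TRUE)[, -1, drop = FALSE]   # R x (R - 1)

# R - 1 unrestricted random walks along time
free <- apply(matrix(rnorm(n_time * (n_region - 1)), nrow = n_time), 2, cumsum)

# Transform to R random walks that sum to zero across region
beta <- free %*% t(Z)   # n_time x R
```

Working in the (R − 1)-dimensional space means the constraint never has to be enforced during estimation; it holds by construction after the transformation.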

Algorithm for assigning default priors

If β^(m) has one or two elements, assign β^(m) a fixed-normal prior (Section @ref(sec:pr-fnorm));
otherwise, if β^(m) involves time, assign β^(m) a random walk prior (Section @ref(sec:pr-rw)) along the time dimension;
otherwise, if β^(m) involves age, assign β^(m) a random walk prior (Section @ref(sec:pr-rw)) along the age dimension;
otherwise, assign β^(m) a normal prior (Section @ref(sec:pr-norm)).

The intercept term β⁽⁰⁾ can only be given a fixed-normal prior (Section @ref(sec:pr-fnorm)) or a Known prior (Section @ref(sec:pr-known)).
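The default-prior algorithm is a simple cascade of checks. The sketch below returns the name of the prior and, where relevant, the ‘along’ dimension. The function name, the return structure, and the identification of time and age variables by name are assumptions made for illustration.

```r
# Hypothetical helper: default prior for a main effect or interaction
default_prior <- function(n_element, dims,
                          var_time = "time", var_age = "age") {
  if (n_element <= 2L)
    return(list(prior = "NFix", along = NULL))     # fixed-normal
  if (var_time %in% dims)
    return(list(prior = "RW", along = var_time))   # random walk along time
  if (var_age %in% dims)
    return(list(prior = "RW", along = var_age))    # random walk along age
  list(prior = "N", along = NULL)                  # exchangeable normal
}

default_prior(2, "sex")                  # two elements: fixed-normal
default_prior(40, c("age", "time"))      # random walk along time
default_prior(12, "region")              # normal
```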

N()

Model

Exchangeable normal

Contribution to posterior density

Forecasting

Code

N(s = 1)

s is A_τ^(m). Defaults to 1.

NFix()

Model

Exchangeable normal, with fixed standard deviation

Contribution to posterior density

Forecasting

Code

NFix(sd = 1)

sd is A_τ^(m). Defaults to 1.

RW()

Model

Random walk

A₀^(m) can be 0, implying that β_{u,1}^(m) is fixed at 0.

When U_m > 1, constraints (Section @ref(sec:constraints)) can be applied.

Contribution to posterior density

Forecasting

If the prior includes sum-to-zero constraints, means are subtracted from the forecasted values within each combination of ‘along’ and ‘by’ variables.

Code

RW(s = 1,
   sd = 1,
   along = NULL,
   con = c("none", "by"))

s is A_τ^(m). Defaults to 1.
sd is A₀^(m). Defaults to 1.
along is used to identify the ‘along’ and ‘by’ dimensions.
If con is "by", sum-to-zero constraints are applied.
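A draw from a random walk prior can be simulated in base R. The sketch below rests on assumptions that are not stated in the surviving text: the initial value of each series is N(0, sd²), increments are N(0, τ²), and τ is half-normal with scale s. The function name `draw_rw` is hypothetical.

```r
# Hypothetical helper: one draw from a random walk prior, with an
# independent series for each combination of the 'by' variables
draw_rw <- function(n_along, n_by = 1, s = 1, sd = 1) {
  tau   <- abs(rnorm(1, sd = s))     # assumed half-normal prior for tau
  first <- rnorm(n_by, sd = sd)      # sd = 0 fixes the first value at 0
  incr  <- matrix(rnorm((n_along - 1) * n_by, sd = tau),
                  nrow = n_along - 1, ncol = n_by)
  apply(rbind(first, incr), 2, cumsum)   # n_along x n_by
}

beta <- draw_rw(n_along = 10, n_by = 3)
beta_fixed_start <- draw_rw(n_along = 5, n_by = 2, sd = 0)
```

Setting `sd = 0` reproduces the case where the first element of each series is fixed at 0.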

RW2()

Model

Second-order random walk

A₀^(m) can be 0, implying that β_{u,1}^(m) is fixed at 0.

When U_m > 1, constraints (Section @ref(sec:constraints)) can be applied.

Contribution to posterior density

Forecasting

If the prior includes sum-to-zero constraints, means are subtracted from the forecasted values within each combination of ‘along’ and ‘by’ variables.

Code

RW2(s = 1,
    sd = 1,
    sd_slope = 1,
    along = NULL,
    con = c("none", "by"))

s is A_τ^(m)
sd is A₀^(m)
sd_slope is A_η^(m)
along used to identify ‘along’ and ‘by’ dimensions
if con is "by", sum-to-zero constraints are applied

RW2_Infant()

Model

Second-order random walk with infant indicator. Designed for age profiles for mortality rates. Along dimension must be age.

When U_m > 1, constraints (Section @ref(sec:constraints)) can be applied.

Contribution to posterior density

Forecasting

Terms with an RW2_Infant() prior cannot be forecasted.

Code

RW2_Infant(s = 1,
           sd_slope = 1,
           con = c("none", "by"))

s is A_τ^(m)
sd_slope is A_η^(m)
if con is "by", sum-to-zero constraints are applied

RW_Seas()

Model

Random walk with seasonal effect

A₀^(m) can be 0, implying that α_{u,1}^(m) is fixed at 0.

A_ω^(m) can be set to zero, implying that seasonal effects are fixed over time.

When U_m > 1, constraints (Section @ref(sec:constraints)) can be applied.

Contribution to posterior density

Forecasting

Code

RW_Seas(n_seas,
        s = 1,
        sd = 1,
        s_seas = 0,
        sd_seas = 1,
        along = NULL,
        con = c("none", "by"))

n_seas is S_m
s is A_τ^(m)
sd is A₀^(m)
s_seas is A_ω^(m)
sd_seas is A_λ^(m)
along used to identify ‘along’ and ‘by’ dimensions
if con is "by", sum-to-zero constraints are applied

RW2_Seas()

Model

Second-order random walk, with seasonal effect

A₀^(m) can be 0, implying that α_{u,1}^(m) is fixed at 0.

A_ω^(m) can be set to zero, implying that seasonal effects are fixed over time.

When U_m > 1, constraints (Section @ref(sec:constraints)) can be applied.

Contribution to posterior density

Forecasting

Code

RW2_Seas(n_seas,
         s = 1,
         sd = 1,
         sd_slope = 1,
         s_seas = 0,
         sd_seas = 1,
         along = NULL,
         con = c("none", "by"))

n_seas is S_m
s is A_τ^(m)
sd is A₀^(m)
sd_slope is A_η^(m)
s_seas is A_ω^(m)
sd_seas is A_λ^(m)
along used to identify ‘along’ and ‘by’ dimensions
if con is "by", sum-to-zero constraints are applied

AR()

Model

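Internally, TMB derives values for β_{u,v}^(m), v = 1, ⋯, K_m, and for ω_m, that imply a stationary distribution, and that give every term β_{u,v}^(m) the same marginal variance. We denote this marginal variance τ_m², and assign it a prior

$$\tau_m \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right).$$

Each of the ϕ_k^(m) has prior

$$\frac{\phi_k^{(m)} + 1}{2} \sim \text{Beta}\left(S_1^{(m)}, S_2^{(m)}\right).$$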

Contribution to posterior density

where p(β_{u,1}^(m), ⋯, β_{u,V_m}^(m) ∣ ϕ₁^(m), ⋯, ϕ_{K_m}^(m), τ_m) is calculated internally by TMB.

Forecasting

Code

AR(n_coef = 2,
   s = 1,
   shape1 = 5,
   shape2 = 5,
   along = NULL,
   con = c("none", "by"))

n_coef is K_m
s is A_τ^(m)
shape1 is S₁^(m)
shape2 is S₂^(m)
along is used to identify the ‘along’ and ‘by’ dimensions

AR1()

Special case of AR(), with extra options for the autocorrelation coefficient.

Model

This is adapted from the specification used for AR1 densities in TMB. It implies that the marginal variance of all β_u, v^(m) is τ_m². We require that −1 < a_0m < a_1m < 1.

Contribution to posterior density

Forecasting

Code

AR1(s = 1,
    shape1 = 5,
    shape2 = 5,
    min = 0.8,
    max = 0.98,
    along = NULL,
    con = c("none", "by"))

s is A_τ^(m)
shape1 is S₁^(m)
shape2 is S₂^(m)
min is a_0m
max is a_1m
along is used to identify ‘along’ and ‘by’ dimensions

The defaults for min and max are based on the defaults for function ets() in R package forecast (Hyndman and Khandakar 2008).
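The requirement that every element has marginal variance τ_m² determines the innovation variance of a stationary AR1 process: it must equal (1 − ϕ²) τ_m². The sketch below simulates one series on that basis. The function name is hypothetical, and ϕ is supplied directly rather than drawn from its prior.

```r
# Hypothetical helper: simulate a stationary AR1 series whose
# elements all have marginal variance tau^2
draw_ar1 <- function(n_along, phi, tau) {
  stopifnot(abs(phi) < 1)
  sd_innov <- tau * sqrt(1 - phi^2)    # innovation standard deviation
  beta <- numeric(n_along)
  beta[1] <- rnorm(1, sd = tau)        # start in the stationary distribution
  for (v in seq_len(n_along)[-1])
    beta[v] <- phi * beta[v - 1] + rnorm(1, sd = sd_innov)
  beta
}

set.seed(0)
x <- draw_ar1(n_along = 1e5, phi = 0.9, tau = 2)
var(x)   # should be close to tau^2 = 4
```

With coefficients in the default range of 0.8 to 0.98 the series are strongly persistent, so the sample variance of a short series can differ noticeably from τ_m² even though the marginal variance is exact.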

Lin()

Model

Note that $\sum_{v=1}^{V_m} \alpha_{u,v}^{(m)} = 0$.

Contribution to posterior density

Forecasting

Code

Lin(s = 1,
    mean_slope = 0,
    sd_slope = 1,
    along = NULL,
    con = c("none", "by"))

s is A_τ^(m)
mean_slope is B_η^(m)
sd_slope is A_η^(m)
along is used to identify ‘along’ and ‘by’ dimensions
if con is "by", sum-to-zero constraints are applied

Lin_AR()

Model

Note that $\sum_{v=1}^{V_m} \alpha_{u,v}^{(m)} = 0$.

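Internally, TMB derives values for ϵ_{u,v}^(m), v = 1, ⋯, K_m, and for ω_m, that provide the ϵ_{u,v}^(m) with a stationary distribution in which each term has the same marginal variance. We denote this marginal variance τ_m², and assign it a prior

$$\tau_m \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right).$$

Each of the individual ϕ_k^(m) has prior

$$\frac{\phi_k^{(m)} + 1}{2} \sim \text{Beta}\left(S_1^{(m)}, S_2^{(m)}\right).$$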

Contribution to posterior density

where p(ϵ_{u,1}^(m), ⋯, ϵ_{u,V_m}^(m) ∣ ϕ₁^(m), ⋯, ϕ_{K_m}^(m), τ_m) is calculated internally by TMB.

Forecasting

Code

Lin_AR(n_coef = 2,
       s = 1,
       shape1 = 5,
       shape2 = 5,
       mean_slope = 0,
       sd_slope = 1,
       along = NULL,
       con = c("none", "by"))

n_coef is K_m
s is A_τ^(m)
shape1 is S₁^(m)
shape2 is S₂^(m)
mean_slope is B_η^(m)
sd_slope is A_η^(m)
along is used to identify ‘along’ and ‘by’ variables
if con is "by", sum-to-zero constraints are applied

Lin_AR1()

Model

Note that $\sum_{v=1}^{V_m} \alpha_{u,v}^{(m)} = 0$.

Contribution to posterior density

Forecasting

Code

Lin_AR1(s = 1,
        shape1 = 5,
        shape2 = 5,
        min = 0.8,
        max = 0.98,
        mean_slope = 0,
        sd_slope = 1,
        along = NULL,
        con = c("none", "by"))

s is A_τ^(m)
shape1 is S₁^(m)
shape2 is S₂^(m)
min is a_0m
max is a_1m
mean_slope is B_η^(m)
sd_slope is A_η^(m)
along is used to identify ‘along’ and ‘by’ variables
if con is "by", sum-to-zero constraints are applied

Sp()

Model

Penalised spline (P-spline)

where β_u^(m) is the subvector of β^(m) composed of elements from the uth combination of the ‘by’ variables, B^(m) is a V_m × K_m matrix of B-splines, and α_u^(m) has a second-order random walk prior (Section @ref(sec:pr-rw2)).

B^(m) = (b₁^(m)(v), ⋯, b_{K_m}^(m)(v)), with v = (1, ⋯, V_m)^⊤. The B-splines are centered, so that 1^⊤b_k^(m)(v) = 0, k = 1, ⋯, K_m.
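A centred B-spline matrix can be built with the `splines` package that ships with R. This sketch uses `splines::bs()` with cubic splines and equal-quantile knots, which is an assumption: bage may construct its basis differently. The values of V and K are arbitrary.

```r
library(splines)

V <- 20   # length of the 'along' dimension
K <- 6    # number of B-spline basis functions

# V x K matrix of cubic B-splines evaluated at v = 1, ..., V
B <- bs(seq_len(V), df = K, intercept = TRUE)
B <- matrix(B, nrow = V, ncol = K)   # drop the 'bs' class and attributes

# Centre each column so that it sums to zero over v
B_centred <- sweep(B, 2, colMeans(B))

# A P-spline effect is then B_centred %*% alpha for a K-vector alpha
alpha  <- cumsum(rnorm(K))
beta_u <- drop(B_centred %*% alpha)
```

Because every column of the centred matrix sums to zero, any linear combination of the columns also sums to zero over the ‘along’ dimension.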

Contribution to posterior density

Forecasting

Terms with a Sp() prior cannot be forecasted.

Code

Sp(n = NULL,
   s = 1)

n is K_m. Defaults to max(0.7 J_m, 4).
s is the A_τ^(m) from the second-order random walk prior. Defaults to 1.
along is used to identify ‘along’ and ‘by’ variables

SVD()

Model

Age but no sex or gender

Let β_u be the age effect for the uth combination of the ‘by’ variables. With an SVD prior,

$$\pmb{\beta}_u^{(m)} = \pmb{F}^{(m)} \pmb{\alpha}_u^{(m)} + \pmb{g}^{(m)}, \quad u = 1, \cdots, U_m$$

where F^(m) is a V_m × K_m matrix, and g^(m) is a vector with V_m elements, both derived from a singular value decomposition (SVD) of an external dataset of age-specific values for all sexes/genders combined. The construction of F^(m) and g^(m) is described in Appendix @ref(app:svd). The centering and scaling used in the construction allow use of the simple prior

$$\alpha_{uk}^{(m)} \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m, \quad k = 1, \cdots, K_m.$$

Joint model of age and sex/gender

In the joint model, vector β_u represents the interaction between age and sex/gender for the uth combination of the ‘by’ variables. Matrix F^(m) and vector g^(m) are calculated from data that separate sexes/genders. The model is otherwise unchanged.

Independent models for each sex/gender

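In the independent model, vector β_{s,u} represents age effects for sex/gender s and the uth combination of the ‘by’ variables, and we have

$$\pmb{\beta}_{s,u}^{(m)} = \pmb{F}_s^{(m)} \pmb{\alpha}_{s,u}^{(m)} + \pmb{g}_s^{(m)}.$$

Matrix F_s^(m) and vector g_s^(m) are calculated from data that separate sexes/genders. The prior is

$$\alpha_{s,u,k}^{(m)} \sim \text{N}(0, 1).$$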

Contribution to posterior density

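$$\prod_{u=1}^{U_m} \prod_{k=1}^{K_m} \text{N}\left(\alpha_{uk}^{(m)} \mid 0, 1\right)$$

for the age-only and joint models, and

$$\prod_{s} \prod_{u=1}^{U_m} \prod_{k=1}^{K_m} \text{N}\left(\alpha_{s,u,k}^{(m)} \mid 0, 1\right)$$

for the independent model.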

Forecasting

Terms with an SVD prior cannot be forecasted.

Code

SVD(ssvd,
    n_comp = NULL,
    indep = TRUE)

where

ssvd is an object containing F and g
n_comp is the number of components to be used (which defaults to ceiling(n/2), where n is the number of components in ssvd)
indep determines whether an independent or joint model will be used if the term being modelled contains a sex or gender variable.

SVD_RW()

Model

The SVD_RW() prior is identical to the SVD() prior except that the coefficients evolve over time, following independent random walks. For instance, in the combined-sex/gender and joint models with K_m SVD components,

Contribution to posterior density

In the combined-sex/gender and joint models,

and in the independent model,

Forecasting

Code

SVD_RW(ssvd,
       n_comp = NULL,
       indep = TRUE,
       s = 1,
       sd = 1,
       con = c("none", "by"))

where

ssvd is an object containing F and g
n_comp is K_m
indep determines whether an independent or joint model will be used if the term being modelled contains a sex or gender variable.
s is A_τ^(m)
sd is A₀^(m)

SVD_RW2(), SVD_AR(), SVD_AR1()

The SVD_RW2(), SVD_AR() and SVD_AR1() priors have the same structure as the SVD_RW() prior, but with RW2(), AR(), and AR1() priors for the along dimension taking the place of the RW() prior.

Known

Model

Elements of β^(m) are treated as known with certainty.

Contribution to posterior density

Known priors make no contribution to the posterior density.

Forecasting

Main effects with a known prior cannot be forecasted.

Code

Known(values)

values is a vector containing the β_j^(m).

Covariates

Model

Matrix Z is a standardized version of the original covariate data supplied by the user. If a variable in the original data is numeric, then it is standardized to have mean 0 and standard deviation one. If a variable in the original data is categorical with C categories, then it is converted into C − 1 indicator variables, with the first category as the omitted variable. (This is the ‘treatment’ contrast in R.)
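The construction of Z can be sketched with base R. The covariate names and values below are made up. The example relies on R's default ‘treatment’ contrasts, under which `model.matrix()` drops the first level of each factor.

```r
# Made-up covariate data: one numeric and one categorical variable
covs <- data.frame(
  income = c(10, 20, 30, 40),
  region = factor(c("a", "b", "c", "a"))
)

# Standardize the numeric variable to mean 0 and standard deviation 1
covs$income <- (covs$income - mean(covs$income)) / sd(covs$income)

# Expand the factor into C - 1 indicator columns and drop the
# intercept column, leaving the covariate matrix Z
Z <- model.matrix(~ income + region, data = covs)[, -1, drop = FALSE]
colnames(Z)
```

Here `region` has three categories, so it contributes two indicator columns, with category "a" as the omitted reference.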

The elements of ζ have prior

Contribution to posterior density

Forecasting

A model with covariates can be used for forecasting provided that

the coefficients (the ζ_p) are non-time-varying,
future values for the covariates (the columns of Z) can be inferred from the classifying variables (other than time), or are supplied by the user.

Code

set_covariates(mod, formula)

mod Object of class "bage_mod"
formula One-sided R formula describing the covariates to be used

Prior for dispersion terms

Model

We use an exponential distribution, parameterised using the mean,

$$p(\xi) = \frac{1}{\mu_{\xi}} \exp\left(-\frac{\xi}{\mu_{\xi}}\right).$$

Contribution to prior density

Code

set_disp(mean = 1)

mean is μ_ξ

Data models

Data models for outcome

Random Rounding to Base 3

Random rounding to base 3 (RR3) is a confidentialization method used by some statistical agencies. It is applied to count data. Each count x is rounded randomly as follows:

If x mod 3 = 0, then x is left unchanged;
if x mod 3 = 1 then x is changed to x − 1 with probability 2/3, and is changed to x + 2 with probability 1/3; and
if x mod 3 = 2 then x is changed to x − 2 with probability 1/3, and is changed to x + 1 with probability 2/3.
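The rounding rules translate directly into a vectorised function. The name `rr3` is chosen for illustration only.

```r
# Hypothetical helper: random rounding to base 3
rr3 <- function(x) {
  r <- x %% 3
  u <- runif(length(x))
  # Probability of rounding down is 2/3 when the remainder is 1
  # and 1/3 when the remainder is 2
  round_down <- ifelse(r == 1, u < 2 / 3, u < 1 / 3)
  ifelse(r == 0,
         x,                                        # multiples of 3 unchanged
         ifelse(round_down, x - r, x + 3 - r))
}

set.seed(0)
x <- 0:50
y <- rr3(x)
```

The probabilities are chosen so that the rounding is unbiased: for a remainder of 1, the expected value is (x − 1) × 2/3 + (x + 2) × 1/3 = x, and similarly for a remainder of 2.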

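RR3 data models can be used with Poisson or binomial likelihoods. Let y_i denote the observed value for the outcome, and y_i^* the true value. The likelihood with an RR3 data model is then

$$p(y_i \mid \mu_i, \xi, w_i) = \sum_{y_i^*} p(y_i \mid y_i^*) \, p(y_i^* \mid \mu_i, \xi, w_i)$$

$$= \tfrac{1}{3} p(y_i - 2 \mid \mu_i, \xi, w_i) + \tfrac{2}{3} p(y_i - 1 \mid \mu_i, \xi, w_i) + p(y_i \mid \mu_i, \xi, w_i) + \tfrac{2}{3} p(y_i + 1 \mid \mu_i, \xi, w_i) + \tfrac{1}{3} p(y_i + 2 \mid \mu_i, \xi, w_i).$$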

Data models for exposure or size

Estimation

Filtering and Aggregation

The data that we supply to TMB is a filtered and aggregated version of the data that the user provides through the data argument.

In the filtering stage, we remove any rows where (i) the offset is 0 or NA, or (ii) the outcome variable is NA.

In the aggregation stage, we identify any rows in the data that have duplicated combinations of classification variables. For instance, if the classification variables are age and sex, and we have two rows where age is "20-24" and sex is "Female", then these rows would count as duplicated combinations. We aggregate offset and outcome variables across these duplicates. With Poisson and binomial models, the aggregation formula for outcomes is

$$y^{\text{new}} = \sum_{d=1}^{D} y_d^{\text{old}}$$

and the aggregation formula for exposure/size is

$$w^{\text{new}} = \sum_{d=1}^{D} w_d^{\text{old}}$$

where D is the number of times a particular combination is duplicated. With normal models, the aggregation formula for outcomes is

$$y^{\text{new}} = \frac{\sum_{d=1}^{D} w_d^{\text{old}} y_d^{\text{old}}}{\sum_{d=1}^{D} w_d^{\text{old}}}$$

and the aggregation formula for weights is

$$w^{\text{new}} = \sum_{d=1}^{D} w_d^{\text{old}}.$$
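The filtering and aggregation stages can be sketched for a Poisson model using base R. The data and column names (`deaths`, `popn`) are made up, and the example assumes that counts and exposures are summed across duplicated combinations.

```r
# Made-up data with a duplicated combination and a missing outcome
dat <- data.frame(
  age    = c("20-24", "20-24", "25-29", "25-29"),
  sex    = c("Female", "Female", "Female", "Male"),
  deaths = c(3, 5, 2, NA),
  popn   = c(100, 150, 120, 80)
)

# Filtering: drop rows where the offset is 0 or NA, or the outcome is NA
keep <- !is.na(dat$popn) & dat$popn > 0 & !is.na(dat$deaths)
dat_filtered <- dat[keep, ]

# Aggregation: sum outcome and offset within each combination
# of the classification variables
dat_ag <- aggregate(cbind(deaths, popn) ~ age + sex,
                    data = dat_filtered, FUN = sum)
dat_ag
```

The two "20-24"/"Female" rows collapse into a single row with 8 deaths and a population of 250, and the row with the missing outcome is dropped before aggregation.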

Inner-Outer Approximation

Step 0: Select ‘inner’ and ‘outer’ variables

Select variables to be used in inner model. By default, these are the age, sex, and time variables in the model. All remaining variables are ‘outer’ variables.

Step 1: Fit inner model

Aggregate the data using the classification formed by the inner variables. (See Section @ref(sec:filter-ag) on aggregation procedures.) Remove all terms not involving ‘inner’ variables, other than the intercept term, from the model. Set dispersion to 0. Fit the resulting model.

Step 2: Fit outer model

Let μ̂_iⁱⁿ be point estimates for the linear predictor μ_i obtained from the inner model.

Poisson model

Aggregate the data using the classification formed by the outer variables. Remove all terms involving the ‘inner’ variables, plus the intercept, from the model. Set dispersion to 0. Set exposure to w_i^out = μ̂_iⁱⁿ w_i. Fit the model.

Binomial model

Fit the original model, but set dispersion to 0, and for all terms from the ‘inner’ model, use Known priors using point estimates from the inner model.

Normal model

Aggregate the data using the classification formed by the outer variables. Remove all terms involving the ‘inner’ variables, plus the intercept, from the model. Set the outcome variable to $y_i^{\text{out}} = y_i - \hat{\mu}_i^{\text{in}}$. Fit the model.

Step 3: Concatenate estimates

Concatenate posterior distributions for the inner terms from the inner model to posterior distributions for the outer terms from the outer model.

Step 4: Calculate dispersion

If the original model includes a dispersion term, then estimate dispersion. Let μ̂_i^comb be point estimates for the linear predictor obtained from the concatenated estimates.

Poisson model

Use the original disaggregated data, or, if the original data contains more than 10,000 rows, select 10,000 rows at random from the original data. Remove all terms from the original model except for the intercept. Set exposure to w_i^out = μ̂_i^comb w_i.

Binomial model

Fit the original model, but with all terms except the intercept having Known priors, where the values are obtained from point estimates from the concatenated estimates.

Normal model

Deriving outputs

Running TMB yields a set of means m, and a precision matrix Q⁻¹, which together define the approximate joint posterior distribution of

β^(m) for terms with independent normal, fixed normal, multivariate normal, random walk, second-order random walk, AR1, Linear, and Linear-AR1 priors,
α for terms with Spline and SVD priors,
hyper-parameters for β^(m) and α^(m) typically transformed to another scale, such as a log scale,
dispersion term ξ, and
seasonal effects λ, together with associated hyper-parameters τ_λ (on a log scale).

We use $\tilde{\pmb{\theta}}$ to denote a vector containing all these quantities.

We perform a Cholesky decomposition of Q⁻¹, to obtain R such that

$$\pmb{R}^{\top} \pmb{R} = \pmb{Q}^{-1}.$$

We store R as part of the model object.

We generate values for $\tilde{\pmb{\theta}}$ by generating a vector of standard normal variates z, back-solving the equation

$$\pmb{R} \pmb{v} = \pmb{z}$$

and setting

$$\tilde{\pmb{\theta}} = \pmb{v} + \pmb{m}.$$

Next we convert any transformed hyper-parameters back to the original units, and insert values for β^(m) for terms that have Known priors. We denote the resulting vector θ.

Finally we draw from the distribution of γ ∣ y, θ using the methods described in Sections @ref(sec:pois)-@ref(sec:norm).
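Drawing from a multivariate normal distribution via a Cholesky factor of its precision matrix can be sketched as below. The sketch assumes the matrix being factorised is the precision matrix, so that back-solving against the upper-triangular factor yields draws with the required covariance. The function and argument names are hypothetical.

```r
# Hypothetical helper: draw from N(mean, solve(prec)) using the
# Cholesky factor of the precision matrix
draw_mvn_prec <- function(n_draw, mean, prec) {
  R <- chol(prec)                       # upper triangular, t(R) %*% R = prec
  z <- matrix(rnorm(length(mean) * n_draw), nrow = length(mean))
  v <- backsolve(R, z)                  # solves R %*% v = z, column by column
  v + mean                              # add the mean to every column
}

prec  <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)
draws <- draw_mvn_prec(n_draw = 5, mean = c(1, 2), prec = prec)

# Check of the algebra: the inverse of R times its transpose
# equals the covariance matrix
R_inv <- backsolve(chol(prec), diag(2))
cov_implied <- R_inv %*% t(R_inv)
```

Back-solving avoids forming the covariance matrix explicitly, which matters when the precision matrix is large and sparse.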

Simulation

To generate one set of simulated values, we start with values for exposure, trials, or weights, w, and possibly covariates Z, then go through the following steps:

1. Draw values for any parameters in the priors for the β^(m), m = 1, ⋯, M.
2. Conditional on the values drawn in Step 1, draw values for the β^(m), m = 0, ⋯, M.
3. If the model contains seasonal effects, draw the standard deviation κ_m, and then the effects λ^(m).
4. If the model contains covariates, draw φ and ϑ_p where necessary, then draw coefficient vector ζ.
5. Use values from Steps 2–4 to form the linear predictor $\sum_{m=0}^{M} \pmb{X}^{(m)} (\pmb{\beta}^{(m)} + \pmb{\lambda}^{(m)}) + \pmb{Z} \pmb{\zeta}$.
6. Back-transform the linear predictor, to obtain a vector of cell-specific parameters μ.
7. If the model contains a dispersion parameter ξ, draw values from the prior for ξ.
8. In Poisson and binomial models, use μ and, if present, ξ to draw γ.
9. In Poisson and binomial models, use γ and w to draw y; in normal models, use μ, ξ, and w to draw y.
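The simulation steps can be sketched for a very small Poisson model with an intercept and a single age effect. Several choices here are assumptions made for illustration: a log link, a standard normal prior for the intercept, an exchangeable normal prior for the age effect with a half-normal standard deviation, and an exponential prior with mean 1 for dispersion. There are no seasonal effects or covariates, so those steps are skipped.

```r
set.seed(1)
n_age <- 5
w <- rep(1000, n_age)                      # exposure

# Draw hyper-parameters for the priors
tau <- abs(rnorm(1, sd = 1))               # assumed half-normal

# Draw the intercept and main effect
beta_0   <- rnorm(1)                       # assumed N(0, 1) prior
beta_age <- rnorm(n_age, sd = tau)

# Form the linear predictor and back-transform (assumed log link)
lin_pred <- beta_0 + beta_age
mu <- exp(lin_pred)

# Draw dispersion from an exponential prior with mean 1
xi <- rexp(1, rate = 1)

# Draw cell-level rates, then counts
gamma <- rgamma(n_age, shape = 1 / xi, rate = 1 / (xi * mu))
y <- rpois(n_age, lambda = gamma * w)
```

Repeating the whole sequence many times gives draws from the prior predictive distribution, which can be compared with estimates from a fitted model in a simulation study.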

Replicate data

Model

Poisson likelihood

Condition on γ

Condition on (μ, ξ)

which is equivalent to

Binomial likelihood

Condition on γ

Condition on (μ, ξ)

Normal likelihood

Data models for outcomes

If the overall model includes a data model for the outcome, then a further set of draws is made, deriving values for the observed outcomes, given values for the true outcomes.

Code

replicate_data(x, condition_on = c("fitted", "expected"), n = 20)

Appendices

Definitions

Quantity	Definition
i	Index for cell, i = 1, ⋯, n.
y_i	Value for outcome variable.
w_i	Exposure, number of trials, or weight.
γ_i	Super-population rate, probability, or mean.
μ_i	Cell-specific mean.
ξ	Dispersion parameter.
g()	Log, logit, or identity function.
m	Index for intercept, main effect, or interaction. m = 0, ⋯, M.
j	Index for element of a main effect or interaction.
u	Index for combination of ‘by’ variables for an interaction. u = 1, ⋯, U_m. U_m V_m = J_m
v	Index for the ‘along’ dimension of an interaction. v = 1, ⋯, V_m. U_m V_m = J_m
β⁽⁰⁾	Intercept.
β^(m)	Main effect or interaction. m = 1, ⋯, M.
β_j^(m)	jth element of β^(m). j = 1, ⋯, J_m.
X^(m)	Matrix mapping β^(m) to y.
Z	Matrix of covariates.
ζ	Parameter vector for covariates Z^(m).
A₀	Scale parameter in prior for intercept β⁽⁰⁾ or initial value.
τ_m	Standard deviation parameter for main effect or interaction.
A_τ^(m)	Scale parameter in prior for τ_m.
α^(m)	Parameter vector for P-spline and SVD priors.
α_k^(m)	kth element of α^(m). k = 1, ⋯, K_m.
V^(m)	Covariance matrix for multivariate normal prior.
h_j^(m)	Linear covariate
η^(m)	Parameter specific to main effect or interaction β^(m).
η_u^(m)	Parameter specific to uth combination of ‘by’ variables in interaction β^(m).
A_η^(m)	Standard deviation in normal prior for η_m.
ω_m	Standard deviation of parameter η_c in multivariate priors.
ϕ_m	Correlation coefficient in AR1 densities.
a_0m, a_1m	Minimum and maximum values for ϕ_m.
B^(m)	B-spline matrix in P-spline prior.
b_k^(m)	B-spline. k = 1, ⋯, K_m.
F^(m)	Matrix in SVD prior.
g^(m)	Offset in SVD prior.
β_trend	Trend effect.
β_cyc	Cyclical effect.
β_seas	Seasonal effect.
φ	Global shrinkage parameter in shrinkage prior.
A_φ	Scale term in prior for φ.
ϑ_p	Local shrinkage parameter in shrinkage prior.
p₀	Expected number of non-zero coefficients in ζ.
σ̂	Empirical scale estimate in prior for φ.
π	Vector of hyper-parameters

SVD prior for age

Let A be a matrix of age-specific estimates from an international database, transformed to take values in the range (−∞, ∞). Each column of A represents one set of age-specific estimates, such as log mortality rates in Japan in 2010, or logit labour participation rates in Germany in 1980.

Let U, D, V be the matrices from a singular value decomposition of A, where we have retained the first K components. Then

$$\pmb{A} \approx \pmb{U} \pmb{D} \pmb{V}.$$

Let m_k and s_k be the mean and sample standard deviation of the elements of the kth row of V, with m = (m₁, ⋯, m_K)^⊤ and s = (s₁, ⋯, s_K)^⊤. Then

$$\tilde{\pmb{V}} = (\text{diag}(\pmb{s}))^{-1} (\pmb{V} - \pmb{m} \pmb{1}^{\top})$$

is a standardized version of V.

We can rewrite @ref(eq:svd1) as

$$\pmb{A} \approx \pmb{F} \tilde{\pmb{V}} + \pmb{g} \pmb{1}^{\top}$$

where F = UDdiag(s) and g = UDm.

Let $\tilde{\pmb{v}}_l$ be a randomly-selected column from $\tilde{\pmb{V}}$. From the construction of $\tilde{\pmb{V}}$ we have E[ṽ_kl] = 0 and var[ṽ_kl] = 1. If z is a vector of standard normal variables, then

$$\pmb{F} \pmb{z} + \pmb{g}$$

should look approximately like a randomly-selected column from the original data matrix A.
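The construction of F and g can be reproduced with base R's `svd()`. The matrix `A` below is random noise standing in for a real database of transformed age-specific rates, and the object names (`F_mat`, `g_vec`) are chosen for illustration. Note that `svd()` returns the right singular vectors as columns, so they are transposed to match the convention that each row of V corresponds to one component.

```r
set.seed(1)
n_age <- 8    # number of age groups
n_pop <- 30   # number of populations (columns of A)
K     <- 3    # number of components retained

A <- matrix(rnorm(n_age * n_pop), nrow = n_age)   # stand-in data

sv <- svd(A, nu = K, nv = K)
U <- sv$u                        # n_age x K
D <- diag(sv$d[seq_len(K)])      # K x K
V <- t(sv$v)                     # K x n_pop

# Standardize each row of V
m <- rowMeans(V)
s <- apply(V, 1, sd)
V_tilde <- (V - m) / s           # recycling applies m and s row by row

# Matrix and offset used in the SVD prior
F_mat <- U %*% D %*% diag(s)
g_vec <- drop(U %*% D %*% m)

# One synthetic age profile generated from standard normal draws
profile <- drop(F_mat %*% rnorm(K)) + g_vec
```

The identity F Ṽ + g 1^⊤ = U D V holds exactly for the retained components, so the standardization changes the parameterisation without changing the fitted values.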

References

Hyndman, Rob J, and Yeasmin Khandakar. 2008. “Automatic Time Series Forecasting: The Forecast Package for R.” Journal of Statistical Software 27 (3): 1–22. https://doi.org/10.18637/jss.v027.i03.

Norton, Richard A, J Andrés Christen, and Colin Fox. 2018. “Sampling Hyperparameters in Hierarchical Models: Improving on Gibbs for High-Dimensional Latent Fields and Large Datasets.” Communications in Statistics-Simulation and Computation 47 (9): 2639–55.

Simpson, Dan. 2022. “Priors Part 4: Specifying Priors That Appropriately Penalise Complexity.” https://dansblog.netlify.app/posts/2022-08-29-priors4/priors4.html.

Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.
