| Title: | Bayesian Estimation and Forecasting of Age-Specific Rates |
|---|---|
| Description: | Fast Bayesian estimation and forecasting of age-specific rates, probabilities, and means, based on 'Template Model Builder'. |
| Authors: | John Bryant [aut, cre], Junni Zhang [aut], Bayesian Demography Limited [cph] |
| Maintainer: | John Bryant <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.10.9 |
| Built: | 2026-06-01 05:25:44 UTC |
| Source: | https://github.com/bayesiandemography/bage |
Modeling of rates, probabilities, and other values, typically disaggregated by age. Estimation is done using TMB, which makes it fast and scalable.
Specify model using mod_pois()
Fit model using fit()
Extract results using augment()
Check model using replicate_data()
Specify model
mod_pois() Specify a Poisson model
mod_binom() Specify a binomial model
mod_norm() Specify a normal model
set_prior() Specify prior for main effect or interaction
priors Overview of priors for main effects or interactions
set_disp() Specify prior for dispersion/variance
set_covariates() Add covariates to model
datamods Overview of data models (measurement error models)
confidential Overview of confidentialization models
Fit model
fit() Derive posterior distribution
Extract results
augment() Original data, plus observation-level estimates
components() Hyper-parameters
dispersion() Dispersion parameter (a type of hyper-parameter)
tidy() One-line summary
set_n_draw() Specify number of prior or posterior draws
Forecast
forecast() Use model to obtain estimates for future periods
Check model
replicate_data() Compare real and simulated data
report_sim() Simulation study of model
Maintainer: John Bryant [email protected]
Authors:
John Bryant [email protected]
Junni Zhang [email protected]
Other contributors:
Bayesian Demography Limited [copyright holder]
Useful links:
Report bugs at https://github.com/bayesiandemography/bage/issues
Use an autoregressive process to model a main effect, or use multiple autoregressive processes to model an interaction. Typically used with time effects or with interactions that involve time.
AR( n_coef = 2, s = 1, shape1 = 5, shape2 = 5, along = NULL, con = c("none", "by") )AR( n_coef = 2, s = 1, shape1 = 5, shape2 = 5, along = NULL, con = c("none", "by") )
n_coef |
Number of lagged terms in the
model, ie the order of the model. Default is |
s |
Scale for the prior for the innovations.
Default is |
shape1, shape2
|
Parameters for beta-distribution prior
for coefficients. Defaults are |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If AR() is used with an interaction, then
separate AR processes are constructed along
the 'along' variable, within each combination of the
'by' variables.
By default, the autoregressive processes
have order 2. Alternative choices can be
specified through the n_coef argument.
Argument s controls the size of innovations.
Smaller values for s tend to give smoother estimates.
An object of class "bage_prior_ar".
When AR() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
denotes position within the main effect;
denotes position within the 'by' variable(s) of the interaction; and
denotes position within the 'along' variable of the interaction.
Internally, AR() derives a value for that
gives every element of a marginal
variance of . Parameter
has a half-normal prior
The correlation coefficients
each have prior
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
AR() is based on the TMB function
ARk
AR1() Special case of AR(). Can be more
numerically stable than higher-order models.
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
AR(n_coef = 3) AR(n_coef = 3, s = 2.4) AR(along = "cohort")AR(n_coef = 3) AR(n_coef = 3, s = 2.4) AR(along = "cohort")
Use an autoregressive process of order 1 to model a main effect, or use multiple AR1 processes to model an interaction. Typically used with time effects or with interactions that involve time.
AR1( s = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, along = NULL, con = c("none", "by") )AR1( s = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, along = NULL, con = c("none", "by") )
s |
Scale for the prior for the innovations.
Default is |
shape1, shape2
|
Parameters for beta-distribution prior
for coefficients. Defaults are |
min, max
|
Minimum and maximum values
for autocorrelation coefficient.
Defaults are |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If AR() is used with an interaction,
separate AR processes are constructed along
the 'along' variable, within each combination of the
'by' variables.
Arguments min and max can be used to specify
the permissible range for autocorrelation.
Argument s controls the size of innovations. Smaller values
for s tend to give smoother estimates.
An object of class "bage_prior_ar".
When AR1() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
denotes position within the main effect;
denotes position within the 'by' variable(s) of the interaction; and
denotes position within the 'along' variable of the interaction.
Internally, AR1() derives a value for that
gives every element of a marginal
variance of . Parameter
has a half-normal prior
where s is provided by the user.
Coefficient is constrained
to lie between min and max.
Its prior distribution is
where
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
AR1() is based on the TMB function
AR1
The defaults for min and max are based on the
defaults for forecast::ets().
AR() Generalization of AR1()
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
AR1() AR1(min = 0, max = 1, s = 2.4) AR1(along = "cohort")AR1() AR1(min = 0, max = 1, s = 2.4) AR1(along = "cohort")
Extract data and rates, probabilities, or means from a model object. The return value consists of the original data and one or more columns of modeled values.
## S3 method for class 'bage_mod' augment(x, rows = NULL, quiet = FALSE, ...)## S3 method for class 'bage_mod' augment(x, rows = NULL, quiet = FALSE, ...)
x |
Object of class |
rows |
Results are returned only for
these rows of |
quiet |
Whether to suppress messages.
Default is |
... |
Unused. Included for generic consistency only. |
The rows argument can be used to
obtain results for a subset within data.
This is faster, and uses less memory, than
generating results for the whole dataset
and then subsetting.
A tibble, with the original data plus one or more of the following columns:
.<outcome> Corrected or extended version of
the outcome variable, in applications where the
outcome variable has missing values, or a data model
is being used.
.observed 'Direct' estimates of rates or
probabilities, ie counts divided by exposure or size
(in Poisson and binomial models.)
.fitted Draws of rates, probabilities,
or means.
.expected Draws of expected values for
rates or probabilities (in Poisson
that include exposure, or in binomial models.)
Uncertain quantities are represented using rvecs.
augment() is typically called on a fitted
model. In this case, the modeled values are
draws from the joint posterior distribution for rates,
probabilities, or means.
augment() can, however, be called on an
unfitted model. In this case, the modeled values
are draws from the joint prior distribution.
In other words, the modeled values are informed by
model priors, and by values for exposure, size, or weights,
but not by observed outcomes.
augment() automatically imputes any missing
values for the outcome variable. If outcome variable
var has one or more NAs, then augment
creates a variable .var
holding original and imputed values.
If the overall model includes a data model
for the outcome variable var,
then augment() creates a new variable .var containing
estimates of the true value for the outcome.
components() Extract values for hyper-parameters
dispersion() Extract values for dispersion
tidy() Short summary of a model
mod_pois() Specify a Poisson model
mod_binom() Specify a binomial model
mod_norm() Specify a normal model
fit() Fit a model
is_fitted() See if a model has been fitted
unfit() Reset a model
datamods Overview of data models implemented in bage
set.seed(0) ## specify model mod <- mod_pois(divorces ~ age + sex + time, data = nzl_divorces, exposure = population) |> set_n_draw(n_draw = 100) ## smaller sample, so 'augment' faster ## fit model mod <- mod |> fit() ## draw from the posterior distribution mod |> augment() ## results for females only mod |> augment(rows = sex == "Female") ## insert a missing value into outcome variable divorces_missing <- nzl_divorces divorces_missing$divorces[1] <- NA ## fitting model and calling 'augument'A ## creates a new variable called '.divorces' ## holding observed and imputed values mod_pois(divorces ~ age + sex + time, data = divorces_missing, exposure = population) |> fit() |> augment() ## specifying a data model for the ## original data also leads to a new ## variable called '.divorces' mod_pois(divorces ~ age + sex + time, data = nzl_divorces, exposure = population) |> set_datamod_outcome_rr3() |> fit() |> augment()set.seed(0) ## specify model mod <- mod_pois(divorces ~ age + sex + time, data = nzl_divorces, exposure = population) |> set_n_draw(n_draw = 100) ## smaller sample, so 'augment' faster ## fit model mod <- mod |> fit() ## draw from the posterior distribution mod |> augment() ## results for females only mod |> augment(rows = sex == "Female") ## insert a missing value into outcome variable divorces_missing <- nzl_divorces divorces_missing$divorces[1] <- NA ## fitting model and calling 'augument'A ## creates a new variable called '.divorces' ## holding observed and imputed values mod_pois(divorces ~ age + sex + time, data = divorces_missing, exposure = population) |> fit() |> augment() ## specifying a data model for the ## original data also leads to a new ## variable called '.divorces' mod_pois(divorces ~ age + sex + time, data = nzl_divorces, exposure = population) |> set_datamod_outcome_rr3() |> fit() |> augment()
Extract values for hyper-parameters from a model object. Hyper-parameters include
main effects and interactions,
dispersion,
trends, seasonal effects, errors,
SVD, spline, and covariate coefficients,
standard deviations, correlation coefficients.
## S3 method for class 'bage_mod' components(object, quiet = FALSE, original_scale = FALSE, ...)## S3 method for class 'bage_mod' components(object, quiet = FALSE, original_scale = FALSE, ...)
object |
Object of class |
quiet |
Whether to suppress messages.
Default is |
original_scale |
Whether values for
|
... |
Unused. Included for generic consistency only. |
A tibble with four columns columns:
The return value contains the following columns:
term Model term that the hyper-parameter belongs to.
component Component within term.
level Element within component .
.fitted An rvec containing
draws from the posterior distribution.
components() is typically called on a fitted
model. In this case, the values returned are
draws from the joint posterior distribution for the
hyper-parameters in the model.
components() can, however, be called on an
unfitted model. In this case, the values returned
are draws from the joint prior distribution.
In other words, the values incorporate
model priors, and any exposure, size, or weights
argument, but not observed outcomes.
Internally, models created with mod_norm()
are fitted using transformed versions of the
outcome and weights variables. By default, when components()
is used with these models,
it returns values for .fitted
that are based on the transformed versions.
To instead obtain values for "effect", "trend", "season",
"error" and "disp" that are based on the
untransformed versions,
set original_scale to TRUE.
augment() Extract values for rates,
means, or probabilities,
together with original data
dispersion() Extract values for dispersion
tidy() Extract a one-line summary of a model
mod_pois() Specify a Poisson model
mod_binom() Specify a binomial model
mod_norm() Specify a normal model
fit() Fit a model
is_fitted() See if a model has been fitted
unfit() Reset a model
set.seed(0) ## specify model mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) ## extract prior distribution ## of hyper-parameters mod |> components() ## fit model mod <- mod |> fit() ## extract posterior distribution ## of hyper-parameters mod |> components() ## fit normal model mod <- mod_norm(value ~ age * diag + year, data = nld_expenditure, weights = 1) |> fit() ## dispersion (= standard deviation in normal model) ## on the transformed scale mod |> components() |> subset(component == "disp") ## disperson on the original scale mod |> components(original_scale = TRUE) |> subset(component == "disp")set.seed(0) ## specify model mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) ## extract prior distribution ## of hyper-parameters mod |> components() ## fit model mod <- mod |> fit() ## extract posterior distribution ## of hyper-parameters mod |> components() ## fit normal model mod <- mod_norm(value ~ age * diag + year, data = nld_expenditure, weights = 1) |> fit() ## dispersion (= standard deviation in normal model) ## on the transformed scale mod |> components() |> subset(component == "disp") ## disperson on the original scale mod |> components(original_scale = TRUE) |> subset(component == "disp")
Extract the matrix and offset used by a scaled SVD summary of a demographic database.
## S3 method for class 'bage_ssvd' components( object, v = NULL, n_comp = NULL, indep = NULL, age_labels = NULL, ... )## S3 method for class 'bage_ssvd' components( object, v = NULL, n_comp = NULL, indep = NULL, age_labels = NULL, ... )
object |
An object of class |
v |
Version of scaled SVD components to use. If no value is suppled, the most recent version is used. |
n_comp |
The number of components.
The default is half the total number of
components of |
indep |
Whether to use independent or
joint SVDs for each sex/gender, if the
data contains a sex/gender variable.
The default is to use independent SVDs.
To obtain results for the total population
when the data contains a sex/gender variable,
set |
age_labels |
Age labels for the desired age or age-sex profile. If no labels are supplied, the most detailed profile available is used. |
... |
Not currently used. |
A tibble with the offset and components.
HMD Mortality rates from the
Human Mortality Database.
HFD Fertility rates from the
Human Fertility Database.
generate() Randomly generate age-profiles, or age-sex profiles, based on a scaled SVD summary.
SVD() SVD prior for terms involving age.
SVD_AR1(), SVD_AR(), SVD_RW(), SVD_RW2()
Dynamic SVD priors for terms involving age and time.
poputils::age_labels() Generate age labels.
## females and males modeled independently components(LFP, n_comp = 3) ## joint model for females and males components(LFP, indep = FALSE, n_comp = 3) ## females and males combined components(LFP, indep = NA, n_comp = 3) ## specify age groups labels <- poputils::age_labels(type = "five", min = 15, max = 60) components(LFP, age_labels = labels)## females and males modeled independently components(LFP, n_comp = 3) ## joint model for females and males components(LFP, indep = FALSE, n_comp = 3) ## females and males combined components(LFP, indep = NA, n_comp = 3) ## specify age groups labels <- poputils::age_labels(type = "five", min = 15, max = 60) components(LFP, age_labels = labels)
Get information on computations performed by function fit().
The information includes the total time used for fitting, and
the time used for two particular tasks that can be slow:
running the optimizer stats::nlminb(),
and drawing from the multivariate normal returned
by the TMB. It also includes values returned by the optimizer:
the number of iterations needed, and messages about convergence.
computations(object)computations(object)
object |
A fitted object of class |
A tibble with the following variables:
time_total Seconds used for whole fitting process.
time_max Seconds used for optimisiation.
time_draw Seconds used by function TMB::sdreport().
iter Number of iterations required for optimization.
message Message about convergence returned by optimizer.
mod_pois(),mod_binom(),mod_norm() Specify a model
fit() Fit a model
tidy() Summarise a model
set_n_draw() Specify number of posterior draws
mod <- mod_pois(divorces ~ age + sex + time, data = nzl_divorces, exposure = population) |> fit() computations(mod)mod <- mod_pois(divorces ~ age + sex + time, data = nzl_divorces, exposure = population) |> fit() computations(mod)
The models for rates, probabilities, or means
created with functions mod_pois(), mod_binom(),
and mod_norm() can be extended by adding
descriptions of confidentalization procedures applied
to the outcome variable.
Data models for outcome variable
| Function | Confidentialization procedure | pois | binom | norm |
set_confidential_rr3() |
Outcome randomly rounded to multiple of 3 | Yes | Yes | No |
An object of class "bage_ssvd"
holding scaled SVD components derived from
census data on school attendance.
The attendance data is assembed by
the United Nations Statistics Division.
CSA holds 5 components.
CSACSA
Object of class "bage_ssvd".
Versions:
"v2025" (default). Data downloaded on 2025-11-05
Compared other age-sex patterns for other demographic processes such as mortality, age-sex patterns for school attendance show substantial variation across populations. More components may be needed to obtain satisfactory models of age-sex patterns for school attendance than for other processes.
Derived from data in the
"Population 5 to 24 years of age by school
attendance, sex and urban/rural residence" table
from the
Population Censuses' Datasets
database assembled by the United Nations
Statistics Division.
Code to create CSA
is in folder ‘data-raw/ssvd_csa’
in the source code for the bage
package.
Scaled SVDs Overview of scaled SVDs implemented in bage
SVD() A prior based on a scaled SVD
A subset of the data needed to produce
a scaled SVD object, derived from
data from the World Marriage Database.
The data is formatted using function
data_ssvd_wmd() in package bssvd.
data_wmddata_wmd
A tibble with 6 rows and with columns
version, type, labels_age,
labels_sexgender, matrix, and offset.
Derived from data from the World Marriage Data 2019 database available on the United Nations Population Division website, and assembled by the UNPD from national census and survey data.
ssvd() Function to create scaled SVD objects
WMD_C Scaled SVD object based on a full set of World Marriage Database data.
Scaled SVDs Overview of scaled SVDs implemented in bage
The models for rates, probabilities, or means
created with functions mod_pois(), mod_binom(),
and mod_norm() can be extended by adding data models,
also referred to as measurement error models.
| Function | Assumptions about measurement error | Poisson | Binomial | Normal |
set_datamod_miscount() |
Reported outcome has undercount and overcount | Yes | No | No |
set_datamod_undercount() |
Reported outcome has undercount | Yes | Yes | No |
set_datamod_overcount() |
Reported outcome has overcount | Yes | No | No |
set_datamod_noise() |
Reported outcome unbiased, but with positive and negative measurement errors | Yes* | No | Yes |
set_datamod_exposure() |
Reported exposure unbiased, but with positive and negative measurement errors | Yes* | No | No |
*Models with no dispersion term for rates.
Datasets in bage
| dataset | Outcome | Variables | Country |
| isl_deaths | Deaths | age, sex, time, deaths, popn |
Iceland |
| kor_births | Births | age, region, time, popn, gdp_pc_2023, dens2020 |
South Korea |
| nld_expenditure | Health expenditure | diag, age, year, value |
Netherlands |
| nzl_divorces | Divorces | age, sex, time, divorces, population |
New Zealand |
| nzl_households | One-person households | age, region, year, oneperson, total |
New Zealand |
| nzl_injuries | Accidental deaths | age, sex, ethnicity, year, injuries, popn |
New Zealand |
| prt_deaths | Deaths | age, time, deaths, exposure |
Portugal |
| swe_infant | Infant deaths | county, time, births, deaths |
Sweden |
| usa_deaths | Accidental deaths | month, deaths |
United States |
Extract values for the 'dispersion' parameter from a model object.
dispersion(object, quiet = FALSE, original_scale = FALSE)dispersion(object, quiet = FALSE, original_scale = FALSE)
object |
Object of class |
quiet |
Whether to suppress messages.
Default is |
original_scale |
Whether values for
disperson are on the original
scale or the transformed scale.
Default is |
An rvec
(or NULL if the model does not
include a dispersion parameter.)
dispersion() is typically called on a fitted
model. In this case, the values for dispersion are
draws from the posterior distribution.
dispersion() can, however, be called on an
unfitted model. In this case, the values
are drawn from the prior distribution.
Internally, models created with mod_norm()
are fitted using transformed versions of the
outcome and weights variables. By default, when dispersion()
is used with these models,
it returns values on the transformed scale.
To instead obtain values on the untransformed
scale, set original_scale to TRUE.
components() Extract values for hyper-parameters,
including dispersion
set_disp() Specify a prior for dispersion
set.seed(0) ## specify model mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) ## prior distribution mod |> dispersion() ## fit model mod <- mod |> fit() ## posterior distribution mod |> dispersion() ## fit normal model mod <- mod_norm(value ~ age * diag + year, data = nld_expenditure, weights = 1) |> fit() ## values on the transformed scale mod |> dispersion() ## values on the original scale mod |> dispersion(original_scale = TRUE)set.seed(0) ## specify model mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) ## prior distribution mod |> dispersion() ## fit model mod <- mod |> fit() ## posterior distribution mod |> dispersion() ## fit normal model mod <- mod_norm(value ~ age * diag + year, data = nld_expenditure, weights = 1) |> fit() ## values on the transformed scale mod |> dispersion() ## values on the original scale mod |> dispersion(original_scale = TRUE)
Use a damped random walk as a model for a main effect, or use multiple damped random walks as a model for an interaction. Typically used with terms that involve time, particularly in forecasts. Damping often improves forecast accuracy.
DRW( s = 1, sd = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, along = NULL, con = c("none", "by") )DRW( s = 1, sd = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, along = NULL, con = c("none", "by") )
s |
Scale for the prior for the innovations.
Default is |
sd |
Standard deviation
of initial value. Default is |
shape1, shape2
|
Parameters for beta-distribution prior
for damping coefficient. Defaults are |
min, max
|
Minimum and maximum values
for damping coefficient.
Defaults are |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If DRW() is used with an interaction,
a separate damped random walk is constructed
within each combination of the
'by' variables.
Arguments min and max can be used to control
the amount of damping that occurs.
Argument s controls the size of innovations.
Smaller values for s tend to produce smoother series.
Argument sd controls variance in
initial values. Setting sd to 0 fixes initial
values at 0.
An object of class "bage_prior_drwrandom"
or "bage_prior_drwzero".
When DRW() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
is the damping coefficient;
denotes position within the main effect;
denotes position within the 'by' variable(s) of the interaction; and
denotes position within the 'along' variable of the interaction.
Coefficient is constrained
to lie between min and max.
Its prior distribution is
where
Standard deviation
has a half-normal prior
where s is provided by the user.
DRW() has the same basic structure as AR1(). However,
in DRW(), controls the variance of the innovations,
but in AR1() controls the marginal
variance of each or .
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
DRW2() Damped second-order random walk
RW() Random walk, without damping
RW2() Second-order random walk, without damping
RW_Seas() Random walk with seasonal effect
AR() Autoregressive with order k
AR1() Autoregressive with order 1
Sp() Smoothing via splines
SVD() Smoothing over age using singular value decomposition
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
DRW() DRW(min = 0, max = 1) DRW(sd = 0)DRW() DRW(min = 0, max = 1) DRW(sd = 0)
Use a damped second-order random walk as a model for a main effect, or use multiple second-order random walks as a model for an interaction. A damped second-order random walk is a random walk with drift where the drift term varies, with a tendency to converge on zero. It is typically used with terms that involve time, where there are sustained trends upward or downward. Damping often improves forecast accuracy.
DRW2( s = 1, sd = 1, sd_slope = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, along = NULL, con = c("none", "by") )DRW2( s = 1, sd = 1, sd_slope = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, along = NULL, con = c("none", "by") )
s |
Scale for the prior for the innovations.
Default is |
sd |
Standard deviation
of initial value. Default is |
sd_slope |
Standard deviation
of initial slope. Default is |
shape1, shape2
|
Parameters for beta-distribution prior
for damping coefficient. Defaults are |
min, max
|
Minimum and maximum values
for damping coefficient.
Defaults are |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If DRW2() is used with an interaction,
a separate damped random walk is constructed
within each combination of the
'by' variables.
Arguments min and max can be used to control
the amount of damping that occurs.
Argument s controls the size of innovations.
Smaller values for s tend to give smoother series.
Argument sd controls variance in
initial values. Setting sd to 0 fixes
initial values at 0.
Argument sd_slope controls variance in the
initial slope.
An object of class "bage_prior_drw2random"
or "bage_prior_drw2zero".
When DRW2() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
is the damping coefficient;
denotes position within the main effect;
denotes position within the 'by' variable(s) of the interaction; and
denotes position within the 'along' variable of the interaction.
Coefficient is constrained
to lie between min and max.
Its prior distribution is
where
Standard deviation
has a half-normal prior
where s is provided by the user.
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
DRW() Damped first-order random walk
RW2() Second-order random walk, without damping
RW2_Seas() Second order random walk with seasonal effect
AR() Autoregressive with order k
AR1() Autoregressive with order 1
Sp() Smoothing via splines
SVD() Smoothing over age via singular value decomposition
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
DRW2() DRW2(s = 0.5) DRW2(min = 0, max = 1)DRW2() DRW2(s = 0.5) DRW2(min = 0, max = 1)
Derive the posterior distribution for a model.
## S3 method for class 'bage_mod' fit( object, method = c("standard", "inner-outer"), vars_inner = NULL, optimizer = c("multi", "nlminb", "BFGS", "CG"), quiet = TRUE, max_jitter = 1e-04, start_oldpar = FALSE, ... )## S3 method for class 'bage_mod' fit( object, method = c("standard", "inner-outer"), vars_inner = NULL, optimizer = c("multi", "nlminb", "BFGS", "CG"), quiet = TRUE, max_jitter = 1e-04, start_oldpar = FALSE, ... )
object |
A |
method |
Estimation method. Current
choices are |
vars_inner |
Names of variables to use
for inner model when |
optimizer |
Which optimizer to use.
Current choices are |
quiet |
Whether to suppress messages
from optimizer. Default is |
max_jitter |
Maximum quantity to add to diagonal of precision matrix, if Cholesky factorization is failing. Default is 0.0001. |
start_oldpar |
Whether the optimizer should start
at previous estimates. Used only
when |
... |
Not currently used. |
A bage_mod object
When method is "standard" (the default),
all parameters, other than
the lowest-level rates, probabilities, or
means are jointly estimated within TMB.
When method is "inner-outer", estimation is
carried out in multiple steps, which, in large models,
can sometimes reduce computation times.
In Step 1, a model only using the inner variables
is fitted to the data.
In Step 2, a model only using the
outer variables is fitted to the data.
In Step 3, values for dispersion are calculated.
Parameter estimates from steps 1, 2, and 3
are then combined.
The choices for the optimizer argument are:
"multi" Try "nlminb", and if that fails,
restart from the parameter values where "nlminb" stopped,
using "BFGS". The default.
"nlminb" stats::nlminb()
"BFGS" stats::optim() using method "BFGS".
"GC" stats::optim() using method "CG" (conjugate gradient).
max_jitter
Sampling from the posterior distribution requires
performing a Cholesky factorization of the precision
matrix returned by TMB. This factorization sometimes
fails because of numerical problems. Adding a small
quantity to the diagonal of the precision matrix
can alleviate numerical problems, while potentially
reducing accuracy. If the Cholesky factorization
initially fails, bage will try again with progressively
larger quantities added to the diagonal, up to the
maximum set by max_jitter. Increasing the value of
max_jitter can help suppress numerical problems.
A safer strategy, however, is to simplify
the model, or to use more informative priors.
Up to version 0.9.8 of bage, fit() always aggregated
across cells with identical values of the
predictor variables
in formula (ie the variables to the right of ~)
before fitting. For instance,
if a dataset contained deaths and population
disaggregated by age and sex, but the model formula
was deaths ~ age, then fit() would aggregate
deaths and population within each age category
before fitting the model. From
version 0.9.9, fit()
only aggregates across cells with identical
values if no data model is used,
and if the model is Poisson with
dispersion set to 0 or is normal.
Note that this change in behavior has no effect
on most models, since most models include all
variables used to classify outcomes.
mod_pois() Specify a Poisson model
mod_binom() Specify a binomial model
mod_norm() Specify a normal model
augment() Extract values for rates,
probabilities, or means, together
with original data
components() Extract values for hyper-parameters
dispersion() Extract values for dispersion
forecast() Forecast, based on a model
report_sim() Simulation study of a model
unfit() Reset a model
is_fitted() Check if a model has been fitted
Mathematical Details vignette
## specify model mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) ## examine unfitted model mod ## fit model mod <- fit(mod) ## examine fitted model mod ## extract rates aug <- augment(mod) aug ## extract hyper-parameters comp <- components(mod) comp## specify model mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) ## examine unfitted model mod ## fit model mod <- fit(mod) ## examine fitted model mod ## extract rates aug <- augment(mod) aug ## extract hyper-parameters comp <- components(mod) comp
Forecast rates, probabilities, means, and other model parameters.
## S3 method for class 'bage_mod' forecast( object, newdata = NULL, labels = NULL, output = c("augment", "components"), include_estimates = FALSE, rows = NULL, quiet = FALSE, ... )## S3 method for class 'bage_mod' forecast( object, newdata = NULL, labels = NULL, output = c("augment", "components"), include_estimates = FALSE, rows = NULL, quiet = FALSE, ... )
object |
A |
newdata |
Data frame with data for future periods. |
labels |
Labels for future time periods. |
output |
Type of output returned.
|
include_estimates |
Whether to
include historical estimates along
with the forecasts. Default is |
rows |
A logical vector,
an index vector, or a logical expression
specifying which rows to return
from an |
quiet |
Whether to suppress messages.
Default is |
... |
Not currently used. |
A tibble.
Internally, the steps involved in a forecast are:
Forecast time-varying main effects and interactions, e.g. a time main effect, or an age-time interaction.
Combine forecasts for the time-varying main effects and interactions with non-time-varying parameters, e.g. age effects or dispersion.
Use the combined parameters to generate values for rates, probabilities or means.
Optionally, generate values for the outcome variable.
forecast() generates values for the outcome variable when,
output is "augment",
a value has been supplied for newdata,
newdata included a value for the exposure,
size, or weights variable (except if exposure = 1
or weights = 1 in the original call to
mod_pois() or mod_norm()).
Mathematical Details gives more details on the internal calculations in forecasting.
When output is "augment" (the default),
the return value from forecast()
looks like output from function augment(). When output is
"components", the return value looks like output
from components().
When include_estimates is FALSE (the default),
the output of forecast() excludes values for
time-varying parameters for the period covered by the data.
When include_estimates is TRUE, the output
includes these values.
Setting include_estimates to TRUE can be helpful
when creating graphs that combine estimates and forecasts.
Models that contain covariates can be used in forecasts, provided that
all coefficients (the ) are estimated
from historical data via fit(), and
if any covariates (the columns of )
are time-varying, then future values for these
covariates are supplied via the newdata argument.
Models that contain data models can be used in forecasts, provided that
the data models have no time-varying parameters, or
future values for time-varying parameters are supplied when the data model is first specified.
For examples, see the Data Models vignette.
forecast() is typically used with a
fitted model, i.e. a model in which parameter
values have been estimated from the data.
The resulting forecasts reflect data and priors.
forecast() can, however, be used with an
unfitted model. In this case, the forecasts
are based entirely on the priors. See below for
an example. Experimenting with forecasts
based entirely on the priors can be helpful for
choosing an appropriate model.
The interface for forecast() has not been finalised.
mod_pois() Specify a Poisson model
mod_binom() Specify a binomial model
mod_norm() Specify a normal model
fit() Fit a model
augment() Extract values for rates,
probabilities, or means, together
with original data
components() Extract values for hyper-parameters
Mathematical Details vignette
## specify and fit model mod <- mod_pois(injuries ~ age * sex + ethnicity + year, data = nzl_injuries, exposure = popn) |> fit() mod ## forecasts mod |> forecast(labels = 2019:2024) ## combined estimates and forecasts mod |> forecast(labels = 2019:2024, include_estimates = TRUE) ## only selected rows mod |> forecast(labels = 2019:2024, rows = age == "40-44") ## hyper-parameters mod |> forecast(labels = 2019:2024, output = "components") ## hold back some data and forecast library(dplyr, warn.conflicts = FALSE) data_historical <- nzl_injuries |> filter(year <= 2015) data_forecast <- nzl_injuries |> filter(year > 2015) |> mutate(injuries = NA) mod_pois(injuries ~ age * sex + ethnicity + year, data = data_historical, exposure = popn) |> fit() |> forecast(newdata = data_forecast) ## forecast using GDP per capita in 2023 as a covariate mod_births <- mod_pois(births ~ age * region + time, data = kor_births, exposure = popn) |> set_covariates(~ gdp_pc_2023) |> fit() mod_births |> forecast(labels = 2024:2025)## specify and fit model mod <- mod_pois(injuries ~ age * sex + ethnicity + year, data = nzl_injuries, exposure = popn) |> fit() mod ## forecasts mod |> forecast(labels = 2019:2024) ## combined estimates and forecasts mod |> forecast(labels = 2019:2024, include_estimates = TRUE) ## only selected rows mod |> forecast(labels = 2019:2024, rows = age == "40-44") ## hyper-parameters mod |> forecast(labels = 2019:2024, output = "components") ## hold back some data and forecast library(dplyr, warn.conflicts = FALSE) data_historical <- nzl_injuries |> filter(year <= 2015) data_forecast <- nzl_injuries |> filter(year > 2015) |> mutate(injuries = NA) mod_pois(injuries ~ age * sex + ethnicity + year, data = data_historical, exposure = popn) |> fit() |> forecast(newdata = data_forecast) ## forecast using GDP per capita in 2023 as a covariate mod_births <- mod_pois(births ~ age * region + time, data = kor_births, exposure = popn) |> set_covariates(~ gdp_pc_2023) |> fit() mod_births |> forecast(labels = 2024:2025)
Generate draws from priors for model terms.
## S3 method for class 'bage_prior_ar' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_drwrandom' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_drwzero' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_drw2random' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_drw2zero' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_known' generate(x, n_element = 20, n_draw = 25, ...) ## S3 method for class 'bage_prior_lin' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_linar' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_linex' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_norm' generate(x, n_element = 20, n_draw = 25, ...) ## S3 method for class 'bage_prior_normfixed' generate(x, n_element = 20, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwrandom' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwrandomseasfix' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwrandomseasvary' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwzero' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwzeroseasfix' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwzeroseasvary' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2random' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2randomar' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2randomseasfix' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2randomseasvary' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2zero' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2zeroar' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2zeroseasfix' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2zeroseasvary' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_spline' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd' generate(x, n_element = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_ar' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_drwrandom' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_drwzero' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_drw2random' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_drw2zero' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_lin' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_linex' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_rwrandom' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_rwzero' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_rw2random' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_rw2zero' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...)## S3 method for class 'bage_prior_ar' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_drwrandom' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_drwzero' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_drw2random' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_drw2zero' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_known' generate(x, n_element = 20, n_draw = 25, ...) ## S3 method for class 'bage_prior_lin' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_linar' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_linex' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_norm' generate(x, n_element = 20, n_draw = 25, ...) ## S3 method for class 'bage_prior_normfixed' generate(x, n_element = 20, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwrandom' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwrandomseasfix' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwrandomseasvary' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwzero' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwzeroseasfix' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rwzeroseasvary' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2random' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2randomar' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2randomseasfix' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2randomseasvary' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2zero' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2zeroar' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2zeroseasfix' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_rw2zeroseasvary' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_spline' generate(x, n_along = 20, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd' generate(x, n_element = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_ar' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_drwrandom' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_drwzero' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_drw2random' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_drw2zero' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_lin' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_linex' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_rwrandom' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_rwzero' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_rw2random' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...) ## S3 method for class 'bage_prior_svd_rw2zero' generate(x, n_along = 5, n_by = 1, n_draw = 25, ...)
x |
Object of class |
n_along |
Number of elements of
'along' dimension. Default is |
n_by |
Number of combinations of
'by' variables. Default is |
n_draw |
Number of draws. Default
is |
... |
Unused. Included for generic consistency only. |
n_element |
Number of elements in term,
in priors that do not distinguish
'along' and 'by' dimensions.
Default is |
Some priors distinguish between 'along' and 'by'
dimensions, while others do not: see priors
for a complete list. Arguments n_along and n_by
are used with priors that make the distinction,
and argument n_element is used with priors that do not.
A tibble
priors Overview of priors implemented in bage
## prior that distinguishes 'along' and 'by' x <- RW() generate(x, n_along = 10, n_by = 2) ## prior that does not distinguish x <- N() generate(x, n_element = 20) ## SVD_AR(), SVD_RW(), and SVD_RW2() ## distinguish 'along' and 'by' x <- SVD_AR(HFD) generate(x, n_along = 5, n_by = 2) ## SVD() does not x <- SVD(HFD) generate(x, n_element = 10)## prior that distinguishes 'along' and 'by' x <- RW() generate(x, n_along = 10, n_by = 2) ## prior that does not distinguish x <- N() generate(x, n_element = 20) ## SVD_AR(), SVD_RW(), and SVD_RW2() ## distinguish 'along' and 'by' x <- SVD_AR(HFD) generate(x, n_along = 5, n_by = 2) ## SVD() does not x <- SVD(HFD) generate(x, n_element = 10)
Generate random age or age-sex profiles from
an object of class "bage_ssvd". An object
of class "bage_ssvd" holds results from
an SVD decomposition of demographic
data.
## S3 method for class 'bage_ssvd' generate( x, v = NULL, n_draw = 20, n_comp = NULL, indep = NULL, age_labels = NULL, ... )## S3 method for class 'bage_ssvd' generate( x, v = NULL, n_draw = 20, n_comp = NULL, indep = NULL, age_labels = NULL, ... )
x |
An object of class |
v |
Version of data to use. |
n_draw |
Number of random draws to generate. |
n_comp |
The number of components.
The default is half the total number of
components of |
indep |
Whether to use independent or
joint SVDs for each sex/gender, if the
data contains a sex/gender variable.
The default is to use independent SVDs.
To obtain results for the total population
when the data contains a sex/gender variable,
set |
age_labels |
Age labels for the desired age or age-sex profile. If no labels are supplied, the most detailed profile available is used. |
... |
Unused. Included for generic consistency only. |
A tibble
HMD Mortality rates from the
Human Mortality Database.
HFD Fertility rates from the
Human Fertility Database.
components() Components used by SVD prior.
SVD() SVD prior for term involving age.
SVD_AR1(), SVD_AR(), SVD_RW(), SVD_RW2()
Dynamic SVD priors for terms involving age and time.
poputils::age_labels() Generate age labels.
## females and males modeled independently generate(HMD) ## joint model for females and males generate(HMD, indep = FALSE) ## SVD for females and males combined generate(HMD, indep = NA) ## specify age groups labels <- poputils::age_labels(type = "lt", max = 60) generate(HMD, age_labels = labels)## females and males modeled independently generate(HMD) ## joint model for females and males generate(HMD, indep = FALSE) ## SVD for females and males combined generate(HMD, indep = NA) ## specify age groups labels <- poputils::age_labels(type = "lt", max = 60) generate(HMD, age_labels = labels)
An object of class "bage_ssvd"
holding scaled SVD components derived from
data from the Human Fertility Database.
HFD holds 5 components.
HFDHFD
Object of class "bage_ssvd".
Versions:
"v2025" (default) Data published on 2025-07-24
"v2024" Data published on October 2024-10-23
Derived from data from the
Human Fertility Database.
Max Planck Institute
for Demographic Research (Germany) and Vienna
Institute of Demography (Austria).
Code to create HFD
is in folder ‘data-raw/ssvd_hfd’
in the source code for the bage
package.
Scaled SVDs Overview of scaled SVDs implemented in bage
SVD() A prior based on a scaled SVD
Objects of class "bage_ssvd"
holding scaled SVD components derived from
data from the Human Internal Migration Database.
HIMD_P1, HIMD_P5, and HIMD_R each
hold 5 components
HIMD_R HIMD_P1 HIMD_P5HIMD_R HIMD_P1 HIMD_P5
Object of class "bage_ssvd".
Versions:
"v2024" (default) Data published on 2024-10-23
HIMD_P1 is derived from data on
1-year migration probabilities, ie the
probability that a person will migrate
during a time interval of 1 year.
HIMD_P5 is derived from data on
5-year migration probabilities, ie the
probability that a person will migrate
during a time interval of 5 years.
HIMD_R is derived from data on 1-year
migration probabilities, using the
formula .
Dyrting, S. (2024, October 23).
Data from: Estimating Complete Migration Probabilities from Grouped Data.
Retrieved from osf.io/vmrfk
on 1 September 2025.
Code to create HIMD_R,
HIMD_P1 and HIMD_P5
is in folder ‘data-raw/ssvd_himd’
in the source code for the bage
package.
Scaled SVDs Overview of scaled SVDs implemented in bage
SVD() A prior based on a scaled SVD
An object of class "bage_ssvd"
holding scaled SVD components derived from
data from the
Human Mortality Database.
HMD holds 5 components.
HMDHMD
Object of class "bage_ssvd".
Versions:
"v2025" (default) Data published on
2025-09-25, all years
"v2025-50" Data published on
2025-09-25, 1950 and later
"v2024" Data published on
2024-02-26, all years
Derived from data from the
Human Mortality Database.
Max Planck Institute for
Demographic Research (Germany), University of California,
Berkeley (USA), and French Institute for Demographic Studies
(France). Code to create HMD
is in folder ‘data-raw/ssvd_hmd’
in the source code for the bage
package.
Scaled SVDs Overview of scaled SVDs implemented in bage
SVD() A prior based on a scaled SVD
Test whether fit() has been called on a model object.
is_fitted(x)is_fitted(x)
x |
An object of class |
TRUE or FALSE
mod_pois(), mod_binom(), mod_norm() to specify a model
fit() to fit a model
mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) is_fitted(mod) mod <- fit(mod) is_fitted(mod)mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) is_fitted(mod) mod <- fit(mod) is_fitted(mod)
Deaths and mid-year populations in Iceland, by age, sex, and calendar year.
isl_deathsisl_deaths
A tibble with 5,300 rows and the following columns:
age Single year of age, up to "105+"
sex "Female" and "Male"
time Calendar year, 1998-2022
deaths Counts of deaths
popn Mid-year population
Tables "Deaths by municipalities, sex and age 1981-2022", and "Average annual population by municipality, age and sex 1998-2022 - Current municipalities", on the Statistics Iceland website. Data downloaded on 12 July 2023.
datasets Overview of datasets in bage
Treat an intercept, a main effect, or an interaction as fixed and known.
Known(values)Known(values)
values |
A numeric vector |
An object of class "bage_prior_known".
NFix() Prior where level unknown, but variability known.
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
Known(-2.3) Known(c(0.1, 2, -0.11))Known(-2.3) Known(c(0.1, 2, -0.11))
Births and mid-year population by age of mother, region, and calendar year, 2011-2023, plus regional data on GDP per capita and population density.
kor_birthskor_births
A tibble with 1,872 rows and the following columns:
age Five-year age groups from "10-14" to "50-54"'
region Administrative region
time Calendar year, 2011-2023
births Counts of births
popn Mid-year population
gdp_pc_2023 Regional GDP per capita in 2023
dens_2020 Regional population density
(people per km-squared) in 2020
Tables "Live Births by Age Group of Mother, Sex and Birth Order for Provinces", and "Resident Population in Five-Year Age Groups", on the Korean Statistical Information Service website. Data downloaded on 24 September 2024. Data on GDP per capita and population density from Wikipedia https://w.wiki/DMFA, data downloaded on 8 March 2025, and https://w.wiki/DMF9, data downloaded on 8 March 2025.
datasets Overview of datasets in bage
An object of class "bage_ssvd"
holding scaled SVD components
derived from labor force participation data
assembled by the OECD.
LFP holds 5 components.
LFPLFP
Object of class "bage_ssvd".
Versions:
"v2025" Data downloaded on 2025-10-17
Derived from data in the "Labor Force Indicators"
table of the
OECD Data Explorer.
Code to create LFS
is in folder ‘data-raw/ssvd_lfp’ in
the source code for the bage
package.
Scaled SVDs Overview of scaled SVDs implemented in bage
SVD() A prior based on a scaled SVD
Use a line or lines with independent normal errors to model a main effect or interaction. Typically used with time.
Lin(s = 1, mean_slope = 0, sd_slope = 1, along = NULL, con = c("none", "by"))Lin(s = 1, mean_slope = 0, sd_slope = 1, along = NULL, con = c("none", "by"))
s |
Scale for the prior for the errors.
Default is |
mean_slope |
Mean in prior for slope of line. Default is 0. |
sd_slope |
Standard deviation in prior for slope of line. Default is 1. |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If Lin() is used with an interaction,
then separate lines are constructed along
the 'along' variable, within each combination
of the 'by' variables.
Argument s controls the size of the errors.
Smaller values give smoother estimates.
s can be zero, in which case errors are zero,
and all values lie exactly on straight lines.
This is clearly a simplification, but it allows
the prior to be used with very large
interactions.
Argument sd_slope controls the size of the slopes of
the lines. Larger values can give more steeply
sloped lines.
An object of class "bage_prior_lin".
When Lin() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
denotes position within the main effect;
denotes position within the 'along' variable of the interaction; and
denotes position within the 'by' variable(s) of the interaction.
Parameter has a half-normal prior
When , the model reduces to
or
.
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
Lin_AR() Linear with AR errors
Lin_AR1() Linear with AR1 errors
RW2() Second-order random walk
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
Lin() Lin(s = 0.5, sd_slope = 2) Lin(s = 0) Lin(along = "cohort")Lin() Lin(s = 0.5, sd_slope = 2) Lin(s = 0) Lin(along = "cohort")
Use a line or lines with autoregressive errors to model a main effect or interaction. Typically used with time.
Lin_AR( n_coef = 2, s = 1, shape1 = 5, shape2 = 5, mean_slope = 0, sd_slope = 1, along = NULL, con = c("none", "by") )Lin_AR( n_coef = 2, s = 1, shape1 = 5, shape2 = 5, mean_slope = 0, sd_slope = 1, along = NULL, con = c("none", "by") )
n_coef |
Number of lagged terms in the
model, ie the order of the model. Default is |
s |
Scale for the innovations in the
AR process. Default is |
shape1, shape2
|
Parameters for
beta-distribution prior
for coefficients in the AR process.
Defaults are |
mean_slope |
Mean in prior for slope of line. Default is 0. |
sd_slope |
Standard deviation in the prior for the slope of the line. Larger values imply steeper slopes. Default is 1. |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If Lin_AR() is used with an interaction,
separate lines are constructed along
the 'along' variable, within each combination
of the 'by' variables.
The order of the autoregressive errors is
controlled by the n_coef argument. The
default is 2.
Argument s controls the size of the innovations.
Smaller values tend to give smoother estimates.
Argument sd_slope controls the slopes of
the lines. Larger values can give more steeply
sloped lines.
An object of class "bage_prior_linar".
When Lin_AR() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
denotes position within the main effect;
denotes position within the 'by' variable(s) of the interaction; and
denotes position within the 'along' variable of the interaction.
The slopes have priors
and
Internally, Lin_AR() derives a value for that
gives or a marginal
variance of . Parameter
has a half-normal prior
The correlation coefficients
each have prior
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
Lin_AR1() Special case of Lin_AR()
Lin() Line with independent normal errors
AR() AR process with no line
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
Lin_AR() Lin_AR(n_coef = 3, s = 0.5, sd_slope = 2)Lin_AR() Lin_AR(n_coef = 3, s = 0.5, sd_slope = 2)
Use a line or lines with AR1 errors to model a main effect or interaction. Typically used with time.
Lin_AR1( s = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, mean_slope = 0, sd_slope = 1, along = NULL, con = c("none", "by") )Lin_AR1( s = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, mean_slope = 0, sd_slope = 1, along = NULL, con = c("none", "by") )
s |
Scale for the innovations in the
AR process. Default is |
shape1, shape2
|
Parameters for
beta-distribution prior
for coefficients in the AR process.
Defaults are |
min, max
|
Minimum and maximum values
for autocorrelation coefficient in the AR process.
Defaults are |
mean_slope |
Mean in prior for slope of line. Default is 0. |
sd_slope |
Standard deviation in the prior for the slope of the line. Larger values imply steeper slopes. Default is 1. |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If Lin_AR1() is used with an interaction,
separate lines are constructed along
the 'along' variable, within each combination
of the 'by' variables.
Arguments min and max can be used to specify
the permissible range for autocorrelation.
Argument s controls the size of the innovations.
Smaller values tend to give smoother estimates.
Argument sd_slope controls the slopes of
the lines. Larger values can give more steeply
sloped lines.
An object of class "bage_prior_linar".
When Lin_AR1() is being used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
denotes position within the main effect;
denotes position within the 'by' variable(s) of the interaction; and
denotes position within the 'along' variable of the interaction.
The slopes have priors
and
Internally, Lin_AR1() derives a value for that
gives or a marginal
variance of . Parameter
has a half-normal prior
where a value for s is provided by the user.
Coefficient is constrained
to lie between min and max.
Its prior distribution is
where
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
The defaults for min and max are based on the
defaults for forecast::ets().
Lin_AR() Generalization of Lin_AR1()
Lin() Line with independent normal errors
AR1() AR1 process with no line
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
Lin_AR1() Lin_AR1(min = 0, s = 0.5, sd_slope = 2)Lin_AR1() Lin_AR1(min = 0, s = 0.5, sd_slope = 2)
Specify a model where the outcome is drawn from a binomial distribution.
mod_binom(formula, data, size)mod_binom(formula, data, size)
formula |
An R formula, specifying the outcome and predictors. |
data |
A data frame containing the outcome and predictor variables, and the number of trials. |
size |
Name of the variable giving the number of trials (with or without quote marks.) |
The model is hierarchical. The probabilities in the binomial distribution are described by a prior model formed from dimensions such as age, sex, and time. The terms for these dimension themselves have models, as described in priors. These priors all have defaults, which depend on the type of term (eg an intercept, an age main effect, or an age-time interaction.)
An object of class bage_mod.
The likelihood is
where
subscript identifies some combination of the the
classifying variables, such as age, sex, and time;
is a count, such of number of births,
such as age, sex, and region;
is a probability of 'success'; and
is the number of trials.
The probabilities are assumed to be drawn
a beta distribution
where
is the expected value for ; and
governs dispersion (ie variance.)
Expected value equals, on a logit scale,
the sum of terms formed from classifying variables,
where
is an intercept;
, , is a main effect
or interaction; and
is the element of associated with
cell .
The are given priors, as described in priors.
has an exponential prior with mean 1. Non-default
values for the mean can be specified with set_disp().
The model for
can also include covariates,
as described in set_covariates().
mod_pois() Specify Poisson model
mod_norm() Specify normal model
set_prior() Specify non-default prior for term
set_disp() Specify non-default prior for dispersion
fit() Fit a model
augment() Extract values for probabilities,
together with original data
components() Extract values for hyper-parameters
forecast() Forecast parameters and outcomes
report_sim() Check model using simulation study
replicate_data() Check model using replicate data
Mathematical Details Detailed descriptions of models
mod <- mod_binom(oneperson ~ age:region + age:year, data = nzl_households, size = total)mod <- mod_binom(oneperson ~ age:region + age:year, data = nzl_households, size = total)
Specify a model where the outcome is drawn from a normal distribution.
mod_norm(formula, data, weights = NULL)mod_norm(formula, data, weights = NULL)
formula |
An R formula, specifying the outcome and predictors. |
data |
A data frame containing outcome, predictor, and, optionally, weights variables. |
weights |
Name of the weights variable,
a |
The model is hierarchical. The means in the normal distribution are described by a prior model formed from dimensions such as age, sex, and time. The terms for these dimension themselves have models, as described in priors. These priors all have defaults, which depend on the type of term (eg an intercept, an age main effect, or an age-time interaction.)
An object of class bage_mod_norm.
Internally, mod_norm() scales the outcome variable
to have mean 0 and standard deviation 1, and
scales the weights to have mean 1.
This scaling allows mod_norm() to use the
same menu of priors as mod_pois() and mod_binom().
augment() always returns values on the
original scale, rather than the transformed scale.
components() by default returns values on
the transformed scale. But if original_scale is
TRUE, it returns some types of values on the
original scale. See components() for details.
There are three options for creating an unweighted model:
do not supply a value for the weights variable;
set weights equal to NULL; or
set weights equal
to
1, though this option is deprecated, and will
eventually be removed.
To create a weighted model, supply the name of
the weighting variable in data, quoted or unquoted.
The likelihood is
where
subscript identifies some combination of the
classifying variables, such as age, sex, and time,
is the value of the outcome variable,
is a weight.
In some applications, is set to 1
for all .
Internally, bage works with standardized
versions of and :
where
Mean parameter is modelled as
the sum of terms formed
from classifying variables and covariates,
where
is an intercept;
, , is a main effect
or interaction; and
is the element of associated with
cell ,
The are given priors, as described in priors.
has an exponential prior with mean 1. Non-default
values for the mean can be specified with set_disp().
The model for
can also include covariates,
as described in set_covariates().
mod_pois() Specify Poisson model
mod_binom() Specify binomial model
set_prior() Specify non-default prior for term
set_disp() Specify non-default prior for standard deviation
fit() Fit a model
augment() Extract values for means,
together with original data
components() Extract values for hyper-parameters
forecast() Forecast parameters and outcomes
report_sim() Check model using a simulation study
replicate_data() Check model using replicate data
data for a model
Mathematical Details Detailed description of models
## model without weights mod <- mod_norm(value ~ diag:age + year, data = nld_expenditure) ## model with weights nld_expenditure$wt <- sqrt(nld_expenditure$value) mod <- mod_norm(value ~ diag:age + year, data = nld_expenditure, weights = wt)## model without weights mod <- mod_norm(value ~ diag:age + year, data = nld_expenditure) ## model with weights nld_expenditure$wt <- sqrt(nld_expenditure$value) mod <- mod_norm(value ~ diag:age + year, data = nld_expenditure, weights = wt)
Specify a model where the outcome is drawn from a Poisson distribution.
mod_pois(formula, data, exposure)mod_pois(formula, data, exposure)
formula |
An R formula, specifying the outcome and predictors. |
data |
A data frame containing outcome, predictor, and, optionally, exposure variables. |
exposure |
Name of the exposure variable
or |
The model is hierarchical. The rates in the Poisson distribution are described by a prior model formed from dimensions such as age, sex, and time. The terms for these dimension themselves have models, as described in priors. These priors all have defaults, which depend on the type of term (eg an intercept, an age main effect, or an age-time interaction.)
An object of class bage_mod_pois.
The exposure argument can take three forms:
the name of a variable in data, with or without
quote marks, eg "population" or population;
NULL, in which case a pure "counts" model
with no exposure, is produced; or
the number
1, in which case a pure "counts" model
is also produced (though this option is deprecated,
and will eventially be removed).
The likelihood is
where
subscript identifies some combination of the
classifying variables, such as age, sex, and time;
is an outcome, such as deaths;
is rates; and
is exposure.
In some applications, there is no obvious population at risk.
In these cases, exposure can be set to 1
for all .
The rates are assumed to be drawn
a gamma distribution
where
is the expected value for ; and
governs dispersion (i.e. variation), with
lower values implying less dispersion.
Expected value equals, on the log scale,
the sum of terms formed from classifying variables,
where
is an intercept;
, , is a main effect
or interaction; and
is the element of associated with
cell .
The are given priors, as described in priors.
has an exponential prior with mean 1. Non-default
values for the mean can be specified with set_disp().
The model for
can also include covariates,
as described in set_covariates().
mod_binom() Specify binomial model
mod_norm() Specify normal model
set_prior() Specify non-default prior for term
set_disp() Specify non-default prior for dispersion
fit() Fit a model
augment() Extract values for rates,
together with original data
components() Extract values for hyper-parameters
forecast() Forecast parameters and outcomes
report_sim() Check model using a simulation study
replicate_data() Check model using replicate data
Mathematical Details Detailed description of models
## model with exposure mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) ## model without exposure mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = NULL)## model with exposure mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) ## model without exposure mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = NULL)
Use independent draws from a normal distribution to model a main effect or interaction. Typically used with variables other than age or time, such as region or ethnicity, where there is no natural ordering.
N(s = 1)N(s = 1)
s |
Scale for the standard deviation.
Default is |
Argument s controls the size of errors. Smaller values
for s tend to give more tightly clustered estimates.
An object of class "bage_prior_norm".
where is the main effect or interaction.
Parameter
has a half-normal prior
where s is provided by the user.
NFix() Similar to N() but standard deviation
parameter is supplied rather than estimated from data
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
N() N(s = 0.5)N() N(s = 0.5)
Get the value of n_draw for a model object.
n_draw controls the number
of posterior draws that are generated
by functions such as augment()
and components().
## S3 method for class 'bage_mod' n_draw(x)## S3 method for class 'bage_mod' n_draw(x)
x |
An object of class |
An integer
set_n_draw() Modify the value of n_draw
mod_pois(),mod_binom(),mod_norm() Create a model object
mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) n_draw(mod) mod <- mod |> set_n_draw(n_draw = 5000) n_draw(mod)mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) n_draw(mod) mod <- mod |> set_n_draw(n_draw = 5000) n_draw(mod)
Normal prior where, in contrast to N(), the
variance is treated as fixed and known. Typically
used for main effects or interactions where there
are too few elements to reliably estimate variance
from the available data.
NFix(sd = 1)NFix(sd = 1)
sd |
Standard deviation. Default is |
NFix() is the default prior for the intercept.
An object of class "bage_prior_normfixed".
where is the main effect or interaction,
and a value for sd is supplied by the user.
N() Similar to NFix(), but standard deviation
parameter is estimated from the data rather
than being fixed in advance
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
NFix() NFix(sd = 10)NFix() NFix(sd = 10)
Per capita health expenditure, in Euros, by diagnostic group, age group, and year, in the Netherlands.
nld_expenditurenld_expenditure
A tibble with 1,296 rows and the following columns:
diag Diagnostic group
age 5-year age groups, with open age
group of 85+
year 2003, 2005, 2007, and 2011
value Expenditures, in Euros
Calculated from data in table "Expenditure by disease, age and gender under the System of Health Accounts (SHA) Framework : Current health spending by age" from OECD database 'OECD.Stat' (downloaded on 25 May 2016) and in table "Historical population data and projections (1950-2050)" from OECD database 'OECD.Stat' (downloaded 5 June 2016).
datasets Overview of datasets in bage
Counts of divorces and population, by age, sex, and calendar year, in New Zealand, 2011-2021.
nzl_divorcesnzl_divorces
A tibble with 242 rows and the following columns:
age: 5-year age groups, "15-19" to "65+"
sex: "Female" or "Male"
time: Calendar year
divorces: Numbers of divorces during year
population: Person-years lived during year
Divorce counts from data in table "Age at divorces by sex (marriages and civil unions) (Annual-Dec)" in the online database Infoshare on the Statistics New Zealand website. Data downloaded on 22 March 2023. Population estimates derived from data in table "Estimated Resident Population by Age and Sex (1991+) (Annual-Dec)" in the online database Infoshare on the Statistics New Zealand website. Data downloaded on 26 March 2023.
datasets Overview of datasets in bage
Counts of people in one-person households, and counts of people living in any household, by age, region, and year.
nzl_householdsnzl_households
A tibble with 528 rows and the following columns:
age: 5-year age groups, with open age
group of 65+
region: Region within New Zealand
year: Calendar year
oneperson: Count of people living in one-person
households
total: Count of people living in all types of
household
Derived from data in table "Household composition by age group, for people in households in occupied private dwellings, 2006, 2013, and 2018 Censuses (RC, TA, DHB, SA2)" in the online database NZ.Stat, on the Statistics New Zealand website. Data downloaded on 3 January 2023.
datasets Overview of datasets in bage
Counts of fatal injuries in New Zealand, by age, sex, ethnicity, and year, plus estimates of the population at risk.
nzl_injuriesnzl_injuries
A tibble with 912 rows and the following columns:
age: 5-year age groups, up to age 55-59
sex: "Female" or "Male"
ethnicity: "Maori" or "Non Maori"
year: Calendar year
injuries: Count of injuries, randomly rounded to base 3
popn: Population on 30 June
Derived from data in tables "Estimated Resident Population by Age and Sex (1991+) (Annual-Jun)" and "Maori Ethnic Group Estimated Resident Population by Age and Sex (1991+) (Annual-Jun)" in the online database Infoshare, and table "Count of fatal and serious non-fatal injuries by sex, age group, ethnicity, cause, and severity of injury, 2000-2021" in the online database NZ.Stat, on the Statistics New Zealand website. Data downloaded on 1 January 2023.
datasets Overview of datasets in bage
After calling a function such as mod_pois() or
set_prior() it is good practice to print the
model object at the console, to check the model's
structure. The output from print() has
the following components:
A header giving the class of the model and noting whether the model has been fitted.
A formula giving the outcome variable and terms for the model.
A table giving the number of parameters, and
(fitted models only) the standard
deviation across those parameters,
a measure of the term's importance.
See priors() and tidy().
Values for other model settings. See set_disp(),
set_var_age(), set_var_sexgender(), set_var_time(),
set_n_draw()
Details on computations (fitted models only).
See computations().
## S3 method for class 'bage_mod' print(x, ...)## S3 method for class 'bage_mod' print(x, ...)
x |
Object of class |
... |
Unused. Included for generic consistency only. |
x, invisibly.
mod_pois() Specify a Poisson model
mod_binom() Specify a binomial model
mod_norm() Specify a normal model
fit.bage_mod() and is_fitted() Model fitting
augment() Extract values for rates,
probabilities, or means, together
with original data
components() Extract values for hyper-parameters
dispersion() Extract values for dispersion
priors Overview of priors for model terms
tidy.bage_mod() Number of parameters, and standard deviations
set_disp() Dispersion
set_var_age(), set_var_sexgender(), set_var_time()
Age, sex/gender and time variables
set_n_draw() Model draws
mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) ## print unfitted model mod mod <- fit(mod) ## print fitted model modmod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) ## print unfitted model mod mod <- fit(mod) ## print fitted model mod
The models created with mod_pois(), mod_binom(),
and mod_norm() include terms such as age effects
and region-time interactions. Each of these terms
requires a prior distribution. Current options
for these priors are summarised in the table below.
| Prior | Description | Uses | Forecast | Along/By |
N() |
Elements drawn from normal distribution | Term with no natural order | Yes | No |
NFix() |
N() with standard deviation fixed |
Term with few elements | Yes | No |
Known() |
Values treated as known | Simulations, prior knowledge | No | No |
RW() |
Random walk | Smoothing | Yes | Yes |
RW2() |
Second-order random walk | Like RW(), but with trends |
Yes | Yes |
DRW() |
Damped random walk | Smoothing, forecasting | Yes | Yes |
DRW2() |
Damped second-order random walk | Like DRW(), but with trends |
Yes | Yes |
RW2_AR() |
RW2() with AR errors |
Terms involving time, forecasting | Yes | Yes |
RW2_AR1() |
RW2() with AR1 errors |
Terms involving time, forecasting | Yes | Yes |
RW2_Infant() |
RW2() with infant indicator |
Mortality age profiles | No | Yes |
RW_Seas() |
RW(), with seasonal effect |
Terms involving time | Yes | Yes |
RW2_Seas() |
RW2(), with seasonal effect |
Term involving time | Yes | Yes |
AR() |
Auto-regressive prior of order k | Mean reversion, forecasting | Yes | Yes |
AR1() |
Special case of AR() |
Mean reversion, forecasting | Yes | Yes |
Lin() |
Linear trend, with independent errors | Parsimonious model for time | Yes | Yes |
Lin_AR() |
Linear trend, with AR errors | Term involving time, forecasting | Yes | Yes |
Lin_AR1() |
Linear trend, with AR1 errors | Terms involving time, forecasting | Yes | Yes |
Sp() |
P-Spline (penalised spline) | Smoothing, eg over age | No | Yes |
SVD() |
Age-sex profile based on SVD | Age or age-sex | No | No |
SVD_AR() |
SVD(), but coefficients follow AR() |
Age or age-sex and time | Yes | Yes |
SVD_AR1() |
SVD(), but coefficients follow AR1() |
Age or age-sex and time | Yes | Yes |
SVD_Lin() |
SVD(), but coefficients follow Lin() |
Age or age-sex and time | Yes | Yes |
SVD_RW() |
SVD(), but coefficients follow RW() |
Age or age-sex and time | Yes | Yes |
SVD_RW2() |
SVD(), but coefficients follow RW2() |
Age or age-sex and time | Yes | Yes |
SVD_DRW() |
SVD(), but coefficients follow DRW() |
Age or age-sex and time | Yes | Yes |
SVD_DRW2() |
SVD(), but coefficients follow DRW2() |
Age or age-sex and time | Yes | Yes |
Priors for interaction terms often consist of a time-series-style model along one dimension, with a separate series for each combination of the remaining dimensions. For instance, a prior for an age-sex-time interaction might consist of a separate random walk along time for each combination of age-group and sex. In bage the dimension with the time-series-type model is referred to as the 'along' dimension, and the remaining dimensions are referred to as the 'by' dimensions.
If no prior is specified for a term, then bage assigns the term a default prior using the following algorithm:
if the term has one or two elements, use NFix();
otherwise, if the term involves time, use RW(), with time as the 'along' dimension;
otherwise, if the term involves age, use RW(), with age as the 'along' dimension;
otherwise, use N().
A model can only be used for forecasting if
the model includes a time dimension, and
the prior for the time dimension supports forecasting.
If necessary, the time dimension can be identified
using set_var_time(). The table above lists the
priors that support forecasting.
Deaths and exposure in Portugal, by age, sex, and year.
prt_deathsprt_deaths
A tibble with 3,168 rows and the following columns:
age: Age groups "0", "1-4", "5-9", ..., "95-99", "100+"
sex: "Female" or "Male"
time: Calendar year
deaths: Count of deaths
exposure: Person-years lived by population
The data are from the Human Mortality Database. Deaths are rounded to the nearest integer. More recent versions, and a comprehensive description of the data, are available at the HMD website.
Human Mortality Database. University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available at https://www.mortality.org. (data downloaded on 17 July 2018).
datasets Overview of datasets in bage
Use a fitted model to create replicate datasets, typically as a way of checking a model.
replicate_data(x, condition_on = NULL, n = 19)replicate_data(x, condition_on = NULL, n = 19)
x |
A fitted model, typically created by
calling |
condition_on |
Parameters to condition
on. Either |
n |
Number of replicate datasets to create. Default is 19. |
Use n draws from the posterior distribution
for model parameters to generate n simulated datasets.
If the model is working well, these simulated
datasets should look similar to the actual dataset.
A tibble with the following structure:
.replicate |
data |
"Original" |
Original data supplied to mod_pois(), mod_binom(), mod_norm() |
"Replicate 1" |
Simulated data. |
"Replicate 2" |
Simulated data. |
| ... | ... |
"Replicate <n>" |
Simulated data. |
condition_on argumentWith Poisson and binomial models that include dispersion terms (which is the default), there are two options for constructing replicate data.
When condition_on is "fitted",
the replicate data is created by (i) drawing values
from the posterior distribution for rates or probabilities
(the defined in mod_pois()
and mod_binom()), and (ii) conditional on these
rates or probabilities, drawing values for the
outcome variable.
When condition_on is "expected",
the replicate data is created by (i) drawing
values from hyper-parameters governing
the rates or probabilities
(the and defined
in mod_pois() and mod_binom()),
then (ii) conditional on these hyper-parameters,
drawing values for the rates or probabilities,
and finally (iii) conditional on these
rates or probabilities, drawing values for the
outcome variable. The '"expected" option
is only possible in Poisson and binomial models,
and only when dispersion is non-zero.
The default for condition_on is "expected",
in cases where it is feasible.
The "expected" option
provides a more severe test for
a model than the "fitted" option,
since "fitted" values are weighted averages
of the "expected" values and the original
data.
If a data model has been provided for
the outcome variable, then creation of replicate
data will include a step where errors are added
to outcomes. For instance, the a rr3
data model is used, then replicate_data() rounds
the outcomes to base 3.
mod_pois() Specify a Poisson model
mod_binom() Specify a binomial model
mod_norm() Specify a normal model
fit() Fit model.
augment() Extract values for rates,
probabilities, or means, together
with original data
components() Extract values for hyper-parameters
dispersion() Extract values for dispersion
forecast() Forecast, based on a model
report_sim() Simulation study of model.
Mathematical Details vignette
mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = 1) |> fit() rep_data <- mod |> replicate_data() library(dplyr) rep_data |> group_by(.replicate) |> count(wt = injuries) ## when the overall model includes an rr3 data model, ## replicate data are rounded to base 3 mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) |> set_datamod_outcome_rr3() |> fit() |> replicate_data()mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = 1) |> fit() rep_data <- mod |> replicate_data() library(dplyr) rep_data |> group_by(.replicate) |> count(wt = injuries) ## when the overall model includes an rr3 data model, ## replicate data are rounded to base 3 mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) |> set_datamod_outcome_rr3() |> fit() |> replicate_data()
Use simulated data to assess the performance of an estimation model.
report_sim( mod_est, mod_sim = NULL, method = c("standard", "inner-outer"), vars_inner = NULL, n_sim = 100, point_est_fun = c("median", "mean"), widths = c(0.5, 0.95), report_type = c("short", "long", "full"), n_core = 1 )report_sim( mod_est, mod_sim = NULL, method = c("standard", "inner-outer"), vars_inner = NULL, n_sim = 100, point_est_fun = c("median", "mean"), widths = c(0.5, 0.95), report_type = c("short", "long", "full"), n_core = 1 )
mod_est |
The model whose performance is being
assessed. An object of class |
mod_sim |
The model used to generate the simulated
data. If no value is supplied, |
method |
Estimation method used for |
vars_inner |
Variables used in inner model
with |
n_sim |
Number of sets of simulated data to use. Default is 100. |
point_est_fun |
Name of the function to use
to calculate point estimates. The options are |
widths |
Widths of credible intervals.
A vector of values in the interval |
report_type |
Amount of detail in return value.
Options are |
n_core |
Number of cores to use for parallel
processing. If |
A named list with a tibble called "components" and a
tibble called "augment".
The interface for report_sim() is still under development
and may change in future.
mod_pois() Specify binomial model
mod_binom() Specify binomial model
mod_norm() Specify normal model
set_prior() Specify non-default prior for term
set_disp() Specify non-default prior for dispersion
fit() Fit a model
replicate_data() Generate replicate
data for a model
## results random, so set seed set.seed(0) ## make data - outcome variable (deaths here) ## needs to be present, but is not used data <- data.frame(region = c("A", "B", "C", "D", "E"), population = c(100, 200, 300, 400, 500), deaths = NA) ## simulation with estimation model same as ## data-generating model mod_est <- mod_pois(deaths ~ region, data = data, exposure = population) |> set_prior(`(Intercept)` ~ Known(0)) report_sim(mod_est = mod_est, n_sim = 10) ## in practice should use larger value ## simulation with estimation model different ## from data-generating model mod_sim <- mod_est |> set_prior(region ~ N(s = 2)) report_sim(mod_est = mod_est, mod_sim = mod_sim, n_sim = 10)## results random, so set seed set.seed(0) ## make data - outcome variable (deaths here) ## needs to be present, but is not used data <- data.frame(region = c("A", "B", "C", "D", "E"), population = c(100, 200, 300, 400, 500), deaths = NA) ## simulation with estimation model same as ## data-generating model mod_est <- mod_pois(deaths ~ region, data = data, exposure = population) |> set_prior(`(Intercept)` ~ Known(0)) report_sim(mod_est = mod_est, n_sim = 10) ## in practice should use larger value ## simulation with estimation model different ## from data-generating model mod_sim <- mod_est |> set_prior(region ~ N(s = 2)) report_sim(mod_est = mod_est, mod_sim = mod_sim, n_sim = 10)
Use a random walk as a model for a main effect, or use multiple random walks as a model for an interaction. Typically used with terms that involve age or time.
RW(s = 1, sd = 1, along = NULL, con = c("none", "by"))RW(s = 1, sd = 1, along = NULL, con = c("none", "by"))
s |
Scale for the prior for the innovations.
Default is |
sd |
Standard deviation
of initial value. Default is |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If RW() is used with an interaction,
a separate random walk is constructed
within each combination of the
'by' variables.
Argument s controls the size of innovations.
Smaller values for s tend to produce smoother series.
Argument sd controls variance in
initial values. Setting sd to 0 fixes initial
values at 0.
An object of class "bage_prior_rwrandom"
or "bage_prior_rwzero".
When RW() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
denotes position within the main effect;
denotes position within the 'along' variable of the interaction; and
denotes position within the 'by' variable(s) of the interaction.
Parameter
has a half-normal prior
where s is provided by the user.
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
RW_Seas() Random walk with seasonal effect
RW2() Second-order random walk
AR() Autoregressive with order k
AR1() Autoregressive with order 1
Sp() Smoothing via splines
SVD() Smoothing over age using singular value decomposition
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
RW() RW(s = 0.5) RW(sd = 0) RW(along = "cohort")RW() RW(s = 0.5) RW(sd = 0) RW(along = "cohort")
Use a random walk with seasonal effects as a model for a main effect, or use multiple random walks, each with their own seasonal effects, as a model for an interaction. Typically used with terms that involve time.
RW_Seas( n_seas, s_seas, s = 1, sd = 1, sd_seas = 1, along = NULL, con = c("none", "by") )RW_Seas( n_seas, s_seas, s = 1, sd = 1, sd_seas = 1, along = NULL, con = c("none", "by") )
n_seas |
Number of seasons |
s_seas |
Scale for innovations
in seasonal effects. Can be |
s |
Scale for prior for innovations in
random walk. Default is |
sd |
Standard deviation
of initial value. Default is |
sd_seas |
Standard deviation for
initial values of seasonal effects.
Default is |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If RW_Seas() is used with an interaction,
a separate series is constructed
within each combination of the
'by' variables.
Argument s controls the size of innovations in the random walk.
Smaller values for s tend to produce smoother series.
Argument sd controls variance in
initial values of the random walk. sd can be 0.
Argument n_seas controls the number of seasons.
When using quarterly data, for instance,
n_seas should be 4.
Setting s_seas to 0 produces
seasonal effects that are the same
each year. Setting s_seas to a value
greater than 0 produces seasonal effects
that evolve over time.
Object of class
"bage_prior_rwrandomseasvary",
"bage_prior_rwrandomseasfix",
"bage_prior_rwzeroseasvary", or
"bage_prior_rwzeroseasfix".
When RW_Seas() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
or is an element of the random walk;
or is an element of the seasonal effect;
denotes position within the main effect;
denotes position within the 'along' variable of the interaction; and
denotes position within the 'by' variable(s) of the interaction.
Parameter has a half-normal prior
If s_seas is set to 0, then is 0,
and seasonal effects are time-invariant.
Parameter has a half-normal prior
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
RW() Random walk without seasonal effect
RW2_Seas() Second-order random walk with seasonal effect
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
## seasonal effects fixed RW_Seas(n_seas = 4, s_seas = 0) ## seasonal effects evolve RW_Seas(n_seas = 4, s_seas = 1) ## first term in random walk fixed at 0 RW_Seas(n_seas = 4, s_seas = 1, sd = 0)## seasonal effects fixed RW_Seas(n_seas = 4, s_seas = 0) ## seasonal effects evolve RW_Seas(n_seas = 4, s_seas = 1) ## first term in random walk fixed at 0 RW_Seas(n_seas = 4, s_seas = 1, sd = 0)
Use a second-order random walk as a model for a main effect, or use multiple second-order random walks as a model for an interaction. A second-order random walk is a random walk with drift where the drift term varies. It is typically used with terms that involve age or time, where there are sustained trends upward or downward.
RW2(s = 1, sd = 1, sd_slope = 1, along = NULL, con = c("none", "by"))RW2(s = 1, sd = 1, sd_slope = 1, along = NULL, con = c("none", "by"))
s |
Scale for the prior for the innovations.
Default is |
sd |
Standard deviation
of initial value. Default is |
sd_slope |
Standard deviation
of initial slope. Default is |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If RW2() is used with an interaction,
a separate random walk is constructed
within each combination of the
'by' variables.
Argument s controls the size of innovations.
Smaller values for s tend to give smoother series.
Argument sd controls variance in
initial values. Setting sd to 0 fixes
initial values at 0.
Argument sd_slope controls variance in the
initial slope.
An object of class "bage_prior_rw2random"
or "bage_prior_rw2zero".
When RW2() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
denotes position within the main effect;
denotes position within the 'along' variable of the interaction; and
denotes position within the 'by' variable(s) of the interaction.
Parameter
has a half-normal prior
.
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
RW() Random walk
RW2_Seas() Second order random walk with seasonal effect
AR() Autoregressive with order k
AR1() Autoregressive with order 1
Sp() Smoothing via splines
SVD() Smoothing over age via singular value decomposition
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
RW2() RW2(s = 0.5)RW2() RW2(s = 0.5)
Use one or more second-order random walks, combined with an autoregressive error term, to model a main effect or an interaction. Typically used with time.
RW2_AR( s_rw = 1, sd = 1, sd_slope = 1, n_coef = 2, s_ar = 1, shape1 = 5, shape2 = 5, along = NULL, con = c("none", "by") )RW2_AR( s_rw = 1, sd = 1, sd_slope = 1, n_coef = 2, s_ar = 1, shape1 = 5, shape2 = 5, along = NULL, con = c("none", "by") )
s_rw |
Scale for the innovations in the
RW2 process. Default is |
sd |
Standard deviation for initial term
in RW2 process. Default is |
sd_slope |
Standard deviation in the prior for the initial slope of the RW2 process. Larger values imply steeper slopes. Default is 1. |
n_coef |
Number of lagged terms in the
model, ie the order of the model. Default is |
s_ar |
Scale for the innovations in the
AR process. Default is |
shape1, shape2
|
Parameters for beta-distribution prior
for coefficients. Defaults are |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If RW2_AR() is used with an interaction,
separate random walks are constructed along
the 'along' variable, within each combination
of the 'by' variables.
Parameters controlling the RW2 process:
s_rw
sd
sd_slope
Parameters controlling the AR process:
n_coef
s_ar
shape1
shape2
An object of class "bage_prior_rw2randomar"
or "bage_prior_rw2zeroar".
When RW2_AR() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
denotes position within the main effect;
denotes position within the 'by' variable(s) of the interaction; and
denotes position within the 'along' variable of the interaction.
The parameter in the random walk has prior
Internally, RW2_AR() derives a value for that
gives or a marginal
variance of . Parameter
has a half-normal prior
The correlation coefficients
each have prior
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
RW2_AR1() Special case of RW2_AR()
RW2() Second-order random walk
AR() AR process
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
RW2_AR() RW2_AR(sd_slope = 2, n_coef = 3, s_ar = 0.5)RW2_AR() RW2_AR(sd_slope = 2, n_coef = 3, s_ar = 0.5)
Use one or more second-order random walks, combined with an AR1 error term, to model a main effect or an interaction. Typically used with time.
RW2_AR1( s_rw = 1, sd = 1, sd_slope = 1, s_ar = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, along = NULL, con = c("none", "by") )RW2_AR1( s_rw = 1, sd = 1, sd_slope = 1, s_ar = 1, shape1 = 5, shape2 = 5, min = 0.8, max = 0.98, along = NULL, con = c("none", "by") )
s_rw |
Scale for the innovations in the
RW2 process. Default is |
sd |
Standard deviation for initial term
in RW2 process. Default is |
sd_slope |
Standard deviation in the prior for the initial slope of RW2 process. Larger values imply steeper slopes. Default is 1. |
s_ar |
Scale for the innovations in the
AR1 process. Default is |
shape1, shape2
|
Parameters for beta-distribution prior
for coefficients. Defaults are |
min, max
|
Minimum and maximum values
for autocorrelation coefficient in AR1 process.
Defaults are |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If RW2_AR1() is used with an interaction,
separate random walks are constructed along
the 'along' variable, within each combination
of the 'by' variables.
Parameters controlling the RW2 process:
s_rw
sd
sd_slope
Parameters controlling the AR1 process:
s_ar
shape1
shape2
min
max
An object of class "bage_prior_rw2randomar"
or "bage_prior_rw2zeroar".
When RW2_AR1() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
denotes position within the main effect;
denotes position within the 'by' variable(s) of the interaction; and
denotes position within the 'along' variable of the interaction.
The parameter in the random walk has prior
Internally, RW2_AR() derives a value for that
gives or a marginal
variance of . Parameter
has a half-normal prior
Coefficient is constrained
to lie between min and max.
Its prior distribution is
where
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
RW2_AR() Generalization of RW2_AR1()
Lin_AR1() Sepcial case of RW2_AR1()
RW2() Second-order random walk
AR1() AR1 process
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
RW2_AR1() RW2_AR1(sd_slope = 2, s_ar = 0.5)RW2_AR1() RW2_AR1(sd_slope = 2, s_ar = 0.5)
Use a second-order random walk to model variation over age, with an indicator variable for the first age group. Designed for use in models of mortality rates.
RW2_Infant(s = 1, sd_slope = 1, con = c("none", "by"))RW2_Infant(s = 1, sd_slope = 1, con = c("none", "by"))
s |
Scale for the prior for the innovations.
Default is |
sd_slope |
Standard deviation
for initial slope of random walk. Default is |
con |
Constraints on parameters.
Current choices are |
A second-order random walk prior RW2()
works well for smoothing
mortality rates over age, except at age 0, where there
is a sudden jump in rates, reflecting the
special risks of infancy. The RW2_Infant()
extends the RW2() prior by adding an indicator
variable for the first age group.
If RW2_Infant() is used in an interaction,
the 'along' dimension is always age, implying that
there is a separate random walk along age within each
combination of the 'by' variables.
Argument s controls the size of innovations in the random walk.
Smaller values for s tend to give smoother series.
Argument sd controls the sl size of innovations in the random walk.
Smaller values for s tend to give smoother series.
Object of class "bage_prior_rw2infant".
When RW2_Infant() is used with a main effect,
and when it is used with an interaction,
where
is a main effect or interaction;
denotes position within the main effect;
denotes position within the 'along' variable of the interaction; and
denotes position within the 'by' variable(s) of the interaction.
Parameter has a half-normal prior
.
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
RW2() Second-order random walk, without infant indicator
Sp() Smoothing via splines
SVD() Smoothing over age via singular value decomposition
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
RW2_Infant() RW2_Infant(s = 0.1)RW2_Infant() RW2_Infant(s = 0.1)
Use a second-oder random walk with seasonal effects as a model for a main effect, or use multiple second-order random walks, each with their own seasonal effects, as a model for an interaction. Typically used with temrs that involve time.
RW2_Seas( n_seas, s_seas, s = 1, sd = 1, sd_slope = 1, sd_seas = 1, along = NULL, con = c("none", "by") )RW2_Seas( n_seas, s_seas, s = 1, sd = 1, sd_slope = 1, sd_seas = 1, along = NULL, con = c("none", "by") )
n_seas |
Number of seasons |
s_seas |
Scale for innovations
in seasonal effects. Can be |
s |
Scale for prior for innovations in
random walk. Default is |
sd |
Standard deviation
of initial value. Default is |
sd_slope |
Standard deviation
for initial slope of random walk. Default is |
sd_seas |
Standard deviation for
initial values of seasonal effects.
Default is |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If RW2_Seas() is used with an interaction,
a separate series is constructed within each
combination of the 'by' variables.
Argument s controls the size of innovations in the random walk.
Smaller values for s tend to produce smoother series.
Argument n_seas controls the number of seasons.
When using quarterly data, for instance,
n_seas should be 4.
Setting s_seas to 0 produces
seasonal effects that are the same
each year. Setting s_seas to a value
greater than 0 produces seasonal effects
that evolve over time.
Object of class
"bage_prior_rw2randomseasvary",
"bage_prior_rw2randomseasfix",
"bage_prior_rw2zeroseasvary", or
"bage_prior_rw2zeroseasfix".
When RW2_Seas() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction;
or is an element of the random walk;
or is an element of the seasonal effect;
denotes position within the main effect;
denotes position within the 'along' variable of the interaction; and
denotes position within the 'by' variable(s) of the interaction.
Parameter has a half-normal prior
.
If s_seas is set to 0, then is 0,
and the seasonal effects are fixed over time.
Parameter has a half-normal prior
.
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
RW2() Second-order random walk without seasonal effect
RW_Seas() Random walk with seasonal effect
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
Mathematical Details vignette
## seasonal effects fixed RW2_Seas(n_seas = 4, s_seas = 0) ## seasonal effects evolve RW2_Seas(n_seas = 4, s_seas = 1) ## first term in random walk fixed at 0 RW2_Seas(n_seas = 4, s_seas = 1, sd = 0)## seasonal effects fixed RW2_Seas(n_seas = 4, s_seas = 0) ## seasonal effects evolve RW2_Seas(n_seas = 4, s_seas = 1) ## first term in random walk fixed at 0 RW2_Seas(n_seas = 4, s_seas = 1, sd = 0)
Specify a confidentialization procedure where the outcome variable is randomly rounded to a multiple of 3.
set_confidential_rr3(mod)set_confidential_rr3(mod)
mod |
An object of class |
set_confidential_rr3() can only be used with
Poisson and binomial models (created with
mod_pois() and mod_binom().)
Random rounding to base 3 (RR3) is a confidentialization
technique that is sometimes applied by statistical
agencies. The procedure for randomly-rounding
an integer value is as follows:
If is divisible by 3, leave it unchanged
If dividing by 3 leaves a remainder of 1, then
round down (subtract 1) with probability 2/3,
and round up (add 2) with probability 1/3.
If dividing by 3 leaves a remainder of 1,
then round down (subtract 2)
with probability 1/3, and round up (add 1)
with probability 2/3.
If set_confidential_rr3() is applied to
a fitted model, set_confidential_rr3()
unfits
the model, deleting existing estimates.
A revised version of mod.
confidential Overview of confidentialization procedures currently modeled in bage
mod_pois(), mod_binom(), mod_norm() Specify a
model for rates, probabilities, or means
## 'injuries' variable in 'nzl_injuries' dataset ## has been randomly rounded to base 3 mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) |> set_confidential_rr3() |> fit()## 'injuries' variable in 'nzl_injuries' dataset ## has been randomly rounded to base 3 mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) |> set_confidential_rr3() |> fit()
Add covariates to a model.
set_covariates(mod, formula)set_covariates(mod, formula)
mod |
An object of class |
formula |
A one-sided R formula, specifying the covariates. |
If set_covariates() is applied to
a model that already has covariates,
set_covariates() deletes the
existing covariates.
If set_covariates() is applied to
a fitted model, set_covariates() unfits
the model, deleting existing estimates.
A modified version of mod
All variables contained in the formula
argument to set_covariates() should be in the
dataset supplied in the original call to
mod_pois(), mod_binom(), or mod_norm().
set_covariates() processes the covariate data before
adding it to the model:
All numeric variables are standardized, using
x <- scale(x).
Categorical variables are converted to sets of indicator
variables, using treatment contrasts.
For instance, variable x with categories
"high", "medium", and "low",
is converted into two indicator variables, one called xmedium and one
called xlow.
When a model includes covariates, the quantity
is added to the linear predictor, where
is a matrix of standardized covariates, and
is a vector of coefficients. The elements of
have prior
.
mod_pois(), mod_binom(), mod_norm() Specify a
model for rates, probabilities, or means
## create a COVID covariate library(dplyr, warn.conflicts = FALSE) births <- kor_births |> mutate(is_covid = time %in% 2020:2022) mod <- mod_pois(births ~ age * region + time, data = births, exposure = popn) |> set_covariates(~ is_covid) mod## create a COVID covariate library(dplyr, warn.conflicts = FALSE) births <- kor_births |> mutate(is_covid = time %in% 2020:2022) mod <- mod_pois(births ~ age * region + time, data = births, exposure = popn) |> set_covariates(~ is_covid) mod
Specify a data model for the exposure variable in a Poisson model. The data model assumes that, within each cell, observed exposure is drawn from an Inverse-Gamma distribution. In this model,
E[ expected exposure | true exposure ] = true exposure
and
sd[ expected exposure | true exposure ] = cv true exposure
where cv is a coefficient of variation parameter.
set_datamod_exposure(mod, cv)set_datamod_exposure(mod, cv)
mod |
An object of class |
cv |
Coefficient of variation
for measurement errors in exposure.
A single number, or a data frame
with a variable called |
In the exposure data model, cv, the coefficient
of variation, does not depend on
true exposure. This implies that
errors do not fall, in relative terms,
as population rises. Unlike sampling errors,
measurement errors do not get averaged away
in large populations.
The exposure data model assumes that the exposure variable
is unbiased. If there is in fact evidence
of biases, then this evidence should be
used to create a de-biased version of the
variable (eg one where estimated biases
have been subtracted) to supply to
mod_pois().
set_datamod_exposure() can only be used
with a Poisson model for rates in which
the dispersion in the rates has been set to zero.
The dispersion in the rates can be set
explicitly to zero using set_disp(),
though set_datamod_exposure() will also
do so.
A revised version of mod.
cv argumentcv can be a single number, in which
case the same value is used for all cells.
cv can also be a data frame with a
with a variable called "cv" and
one or more columns with 'by' variables.
For instance, a cv of
data.frame(sex = c("Female", "Male"),
cv = c(0.01, 0.012))
implies that the coefficient of variation is 0.01 for females and 0.012 for males.
See below for an example where the coefficient of variation is based on aggregated age groups.
The model for observed exposure is
where
is observed exposure for cell
(the exposure argument to mod_pois());
is true exposure for cell ; and
is the value for dispersion
that is applied to cell .
cv is .
mod_pois() Specify a Poisson model
set_disp() Specify dispersion of rates
augment() Original data plus estimated values,
including estimates of true value for exposure
datamods Data models implemented in bage
confidential Confidentialization
procedures modeled in bage
Mathematical Details vignette
## specify model mod <- mod_pois(injuries ~ age * sex + year, data = nzl_injuries, exposure = popn) |> set_disp(mean = 0) |> set_datamod_exposure(cv = 0.025) ## fit the model mod <- mod |> fit() mod ## examine results - note the new variable ## '.popn' with estimates of the true ## population aug <- mod |> augment() ## allow different cv's for each sex cv_sex <- data.frame(sex = c("Female", "Male"), cv = c(0.03, 0.02)) mod <- mod |> set_datamod_exposure(cv = cv_sex) mod ## our outcome variable is confidentialized, ## so we recognize that in the model too mod <- mod |> set_confidential_rr3() mod ## now a model where everyone aged 0-49 ## receives one value for cv, and ## everyone aged 50+ receives another library(poputils) ## for 'age_upper()' library(dplyr, warn.conflicts = FALSE) nzl_injuries_age <- nzl_injuries |> mutate(age_group = if_else(age_upper(age) < 50, "0-49", "50+")) cv_age <- data.frame(age_group = c("0-49", "50+"), cv = c(0.05, 0.01)) mod <- mod_pois(injuries ~ age * sex + year, data = nzl_injuries_age, exposure = popn) |> set_disp(mean = 0) |> set_datamod_exposure(cv = cv_age)## specify model mod <- mod_pois(injuries ~ age * sex + year, data = nzl_injuries, exposure = popn) |> set_disp(mean = 0) |> set_datamod_exposure(cv = 0.025) ## fit the model mod <- mod |> fit() mod ## examine results - note the new variable ## '.popn' with estimates of the true ## population aug <- mod |> augment() ## allow different cv's for each sex cv_sex <- data.frame(sex = c("Female", "Male"), cv = c(0.03, 0.02)) mod <- mod |> set_datamod_exposure(cv = cv_sex) mod ## our outcome variable is confidentialized, ## so we recognize that in the model too mod <- mod |> set_confidential_rr3() mod ## now a model where everyone aged 0-49 ## receives one value for cv, and ## everyone aged 50+ receives another library(poputils) ## for 'age_upper()' library(dplyr, warn.conflicts = FALSE) nzl_injuries_age <- nzl_injuries |> mutate(age_group = if_else(age_upper(age) < 50, "0-49", "50+")) cv_age <- data.frame(age_group = c("0-49", "50+"), cv = c(0.05, 0.01)) mod <- mod_pois(injuries ~ age * sex + year, data = nzl_injuries_age, exposure = popn) |> set_disp(mean = 0) |> set_datamod_exposure(cv = cv_age)
Specify a data model for the outcome in a Poisson model, where the outcome is subject to undercount and overcount.
set_datamod_miscount(mod, prob, rate)set_datamod_miscount(mod, prob, rate)
mod |
An object of class |
prob |
The prior for the probability
that a person or event in the target
population will correctly enumerated.
A data frame with a variable
called |
rate |
The prior for the overcoverage rate.
A data frame with a variable
called |
The miscount data model is essentially a combination of the undercount and overcount data models. It assumes that reported outcome is the sum of two quantities:
Units from target population, undercounted People or events belonging to the target population, in which each unit's inclusion probability is less than 1.
Overcount People or events that do not belong to target population, or that are counted more than once.
If, for instance, a census enumerates 91 people from a true population of 100, but also mistakenly enumerates a further 6 people, then
the true value for the outcome variable is 100
the value for the undercounted target population is 91,
the value for the overcount is 6, and
the observed value for the outcome variable is 91 + 6 = 97.
A revised version of mod.
prob argumentThe prob argument specifies a prior
distribution for the probability
that a person or event in the target
population is included in the
reported outcome. prob is a
data frame with a variable called "mean",
a variable called "disp", and, optionally,
one or more 'by' variables.
For instance, a prob of
data.frame(sex = c("Female", "Male"),
mean = c(0.95, 0.92),
disp = c(0.02, 0.015))
implies that the expected value for the inclusion probability is 0.95 for females and 0.92 for males, with slightly more uncertainty for females than for males.
rate argumentThe rate argument specifies a prior
distribution for the overcoverage
rate. rate is a
data frame with a variable called "mean",
a variable called "disp", and, optionally,
one or more 'by' variables.
For instance, a rate of
data.frame(mean = 0.03, disp = 0.1)
implies that the expected value for the overcoverage rate is 0.03, with a dispersion of 0.1. Since no 'by' variables are included, the same mean and dispersion values are applied to all cells.
The model for the observed outcome is
where
is the observed outcome for cell ;
is the true outcome for cell ;
is the rate for cell ;
is exposure for cell ;
is the probability that a member of the
target population in cell is correctly enumerated in that cell;
is the overcoverage rate for cell ;
is the expected value for
(specified via prob);
is disperson for (specified via prob);
is the expected value for
(specified via rate); and
is disperson for (specified via rate).
mod_pois() Specify a Poisson model
augment() Original data plus estimated values,
including estimates of true value for
the outcome variable
components() Estimated values for
model parameters, including inclusion
probabilities and overcount rates
set_datamod_undercount() An undercount-only
data model
set_datamod_overcount() An overcount-only
data model
datamods All data models implemented in bage
confidential Confidentialization
procedures modeled in bage
Mathematical Details vignette
## specify 'prob' and 'rate' prob <- data.frame(sex = c("Female", "Male"), mean = c(0.95, 0.97), disp = c(0.05, 0.05)) rate <- data.frame(mean = 0.03, disp = 0.15) ## specify model mod <- mod_pois(divorces ~ age * sex + time, data = nzl_divorces, exposure = population) |> set_datamod_miscount(prob = prob, rate = rate) mod ## fit model mod <- mod |> fit() mod ## original data, plus imputed values for outcome mod |> augment() ## parameter estimates library(dplyr) mod |> components() |> filter(term == "datamod") ## the data have in fact been confidentialized, ## so we account for that, in addition ## to accounting for undercoverage and ## overcoverage mod <- mod |> set_confidential_rr3() |> fit() mod## specify 'prob' and 'rate' prob <- data.frame(sex = c("Female", "Male"), mean = c(0.95, 0.97), disp = c(0.05, 0.05)) rate <- data.frame(mean = 0.03, disp = 0.15) ## specify model mod <- mod_pois(divorces ~ age * sex + time, data = nzl_divorces, exposure = population) |> set_datamod_miscount(prob = prob, rate = rate) mod ## fit model mod <- mod |> fit() mod ## original data, plus imputed values for outcome mod |> augment() ## parameter estimates library(dplyr) mod |> components() |> filter(term == "datamod") ## the data have in fact been confidentialized, ## so we account for that, in addition ## to accounting for undercoverage and ## overcoverage mod <- mod |> set_confidential_rr3() |> fit() mod
Specify a data model in which
observed outcome = true outcome + error,
where the error has a symmetric distribution with mean 0.
If the true outcome has a normal distribution, then the error has a normal distribution. If the true outcome has a Poisson distribution, then the error has a symmetric Skellam distribution.
set_datamod_noise(mod, sd)set_datamod_noise(mod, sd)
mod |
An object of class |
sd |
Standard deviation of measurement errors. A single number, or a data frame with 'by' variables. |
The model assumes that the outcome variable
is unbiased. If there is in fact evidence
of biases, then this evidence should be
used to create a de-biased version of the
outcome variable in data, and this de-biased
version should be used by mod_norm() or
mod_pois().
If set_datamod_noise() is used with a Poisson
model, then the dispersion term for
the Poisson rates must be set to zero.
This can be done using set_disp(),
though set_datamod_noise() will also
do so.
A revised version of mod.
The Skellam distribution is restricted to integers, but can take positive and negative values.
If
then
has a distribution.
If , then the distribution
is symmetric.
sd argumentsd can be a single number, in which
case the same standard deviation
is used for all cells.
sd can also be a data frame with a
with a variable called "sd" and
one or more columns with 'by' variables.
For instance, a sd of
data.frame(sex = c("Female", "Male"),
sd = c(330, 240))
implies that measurement errors have standard deviation 330 for females and 240 for males.
The model for the observed outcome is
with
if has a normal distribution, and
if has a Poisson distribution, where
is the observed outcome for cell ;
is the true outcome for cell ;
is the measurement error for cell ; and
is the standard deviation of
the measurement error for cell .
mod_norm() Specify a normal model
mod_pois() Specify a Poisson model
augment() Original data plus estimated values,
including estimates of true value for outcome
datamods Data models implemented in bage
Mathematical Details vignette
## Normal model ------------------------------ ## prepare outcome variable library(dplyr, warn.conflicts = FALSE) spend <- nld_expenditure |> mutate(log_spend = log(value + 1)) ## specify model mod <- mod_norm(log_spend ~ age * diag + year, data = spend, weights = 1) |> set_datamod_noise(sd = 0.1) ## fit model mod <- mod |> fit() mod ## create new aggregated diagnositic ## group variable library(dplyr, warn.conflicts = FALSE) spend <- spend |> mutate(diag_ag = case_when( diag == "Neoplasms" ~ diag, diag == "Not allocated" ~ diag, TRUE ~ "Other" )) ## assume size of measurement errors ## varies across these aggregated groups sd_diag <- data.frame(diag_ag = c("Neoplasms", "Not allocated", "Other"), sd = c(0.05, 0.2, 0.1)) ## fit model that uses diagnostic-specific ## standard deviations mod <- mod_norm(log_spend ~ age * diag + year, data = spend, weights = 1) |> set_datamod_noise(sd = sd_diag) ## Poisson model ----------------------------- mod <- mod_pois(deaths ~ month, data = usa_deaths, exposure = 1) |> set_datamod_noise(sd = 200)## Normal model ------------------------------ ## prepare outcome variable library(dplyr, warn.conflicts = FALSE) spend <- nld_expenditure |> mutate(log_spend = log(value + 1)) ## specify model mod <- mod_norm(log_spend ~ age * diag + year, data = spend, weights = 1) |> set_datamod_noise(sd = 0.1) ## fit model mod <- mod |> fit() mod ## create new aggregated diagnositic ## group variable library(dplyr, warn.conflicts = FALSE) spend <- spend |> mutate(diag_ag = case_when( diag == "Neoplasms" ~ diag, diag == "Not allocated" ~ diag, TRUE ~ "Other" )) ## assume size of measurement errors ## varies across these aggregated groups sd_diag <- data.frame(diag_ag = c("Neoplasms", "Not allocated", "Other"), sd = c(0.05, 0.2, 0.1)) ## fit model that uses diagnostic-specific ## standard deviations mod <- mod_norm(log_spend ~ age * diag + year, data = spend, weights = 1) |> set_datamod_noise(sd = sd_diag) ## Poisson model ----------------------------- mod <- mod_pois(deaths ~ month, data = usa_deaths, exposure = 1) |> set_datamod_noise(sd = 200)
#' ‘r lifecycle::badge(’deprecated')
set_datamod_outcome_rr3(mod)set_datamod_outcome_rr3(mod)
mod |
An object of class |
This function has been deprecated, and will
be removed from future versions of bage.
Please used function set_confidential_rr3()
instead.
A revised version of mod.
## 'injuries' variable in 'nzl_injuries' dataset ## has been randomly rounded to base 3 mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) |> set_confidential_rr3() |> ## rather than set_datamod_outcome_rr3 fit()## 'injuries' variable in 'nzl_injuries' dataset ## has been randomly rounded to base 3 mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) |> set_confidential_rr3() |> ## rather than set_datamod_outcome_rr3 fit()
Specify a data model for the outcome in a Poisson model, where the outcome is subject to overcount
set_datamod_overcount(mod, rate)set_datamod_overcount(mod, rate)
mod |
An object of class |
rate |
The prior for the overcoverage rate.
A data frame with a variable
called |
The overcount data model assumes that reported values for the outcome overstate the actual values. The reported values might be affected by double-counting, for instance, or might include some people or events that are not in the target population.
A revised version of mod.
rate argumentThe rate argument specifies a prior
distribution for the overcoverage
rate. rate is a
data frame with a variable called "mean",
a variable called "disp", and, optionally,
one or more 'by' variables.
For instance, a rate of
data.frame(sex = c("Female", "Male"),
mean = c(0.05, 0.03),
disp = c(0.1, 0.15))
implies that the reported value for the outcome is expected to overstate the true value by about 5% for females, and about 3% for females, with greater unceratinty for males than females.
The model for the observed outcome is
where
is the observed outcome for cell ;
is the true outcome for cell ;
overcount in cell ;
is the rate for cell ;
is exposure for cell ;
is the overcoverage rate for cell ;
is the expected value for
(specified via rate); and
is disperson for (specified via rate).
mod_pois() Specify a Poisson model
augment() Original data plus estimated values,
including estimates of true value for
the outcome variable
components() Estimated values for
model parameters, including inclusion
probabilities and overcount rates
set_datamod_undercount() An undercount-only
data model
set_datamod_miscount() An undercount-and-overcount
data model
datamods All data models implemented in bage
confidential Confidentialization
procedures modeled in bage
Mathematical Details vignette
## specify 'rate' rate <- data.frame(sex = c("Female", "Male"), mean = c(0.1, 0.13), disp = c(0.2, 0.2)) ## specify model mod <- mod_pois(divorces ~ age * sex + time, data = nzl_divorces, exposure = population) |> set_datamod_overcount(rate) mod ## fit model mod <- mod |> fit() mod ## original data, plus imputed values for outcome mod |> augment() ## parameter estimates library(dplyr) mod |> components() |> filter(term == "datamod") ## the data have in fact been confidentialized, ## so we account for that, in addition ## to accounting for overcoverage mod <- mod |> set_confidential_rr3() |> fit() mod## specify 'rate' rate <- data.frame(sex = c("Female", "Male"), mean = c(0.1, 0.13), disp = c(0.2, 0.2)) ## specify model mod <- mod_pois(divorces ~ age * sex + time, data = nzl_divorces, exposure = population) |> set_datamod_overcount(rate) mod ## fit model mod <- mod |> fit() mod ## original data, plus imputed values for outcome mod |> augment() ## parameter estimates library(dplyr) mod |> components() |> filter(term == "datamod") ## the data have in fact been confidentialized, ## so we account for that, in addition ## to accounting for overcoverage mod <- mod |> set_confidential_rr3() |> fit() mod
Specify a data model for the outcome in a Poisson or binomial model, where the outcome is subject to undercount.
set_datamod_undercount(mod, prob)set_datamod_undercount(mod, prob)
mod |
An object of class |
prob |
The prior for the probability
that a person or event in the target
population will correctly enumerated.
A data frame with a variable
called |
The undercount data model assumes that reported values for the outcome variable understate the true values, because the reported values miss some people or events in the target population. In other words, the probability that any given unit in the target population will be included in the reported outcome is less than 1.
A revised version of mod.
prob argumentThe prob argument specifies a prior
distribution for the probability
that a person or event in the target
population is included in the
reported outcome. prob is a
data frame with a variable called "mean",
a variable called "disp", and, optionally,
one or more 'by' variables.
For instance, a prob of
data.frame(sex = c("Female", "Male"),
mean = c(0.95, 0.92),
disp = c(0.02, 0.015))
implies that the expected value for the inclusion probability is 0.95 for females and 0.92 for males, with slightly more uncertainty for females than for males.
The model for the observed outcome is
where
is the observed outcome for cell ;
is the true outcome for cell ;
is the probability that a member of the
target population in cell is correctly enumerated in that cell;
is the expected value for
(specified via prob); and
is disperson for (specified via prob).
mod_pois() Specify a Poisson model
mod_binom() Specify a binomial model
augment() Original data plus estimated values,
including estimates of true value for
the outcome variable
components() Estimated values for
model parameters, including inclusion
probabilities and overcount rates
set_datamod_overcount() An overcount-only
data model
set_datamod_miscount() An undercount-and-overcount
data model
datamods All data models implemented in bage
confidential Confidentialization
procedures modeled in bage
Mathematical Details vignette
## specify 'prob' prob <- data.frame(sex = c("Female", "Male"), mean = c(0.95, 0.97), disp = c(0.05, 0.05)) ## specify model mod <- mod_pois(divorces ~ age * sex + time, data = nzl_divorces, exposure = population) |> set_datamod_undercount(prob) mod ## fit model mod <- mod |> fit() mod ## original data, plus imputed values for outcome mod |> augment() ## parameter estimates library(dplyr) mod |> components() |> filter(term == "datamod") ## the data have in fact been confidentialized, ## so we account for that, in addition ## to accounting for undercoverage mod <- mod |> set_confidential_rr3() |> fit() mod## specify 'prob' prob <- data.frame(sex = c("Female", "Male"), mean = c(0.95, 0.97), disp = c(0.05, 0.05)) ## specify model mod <- mod_pois(divorces ~ age * sex + time, data = nzl_divorces, exposure = population) |> set_datamod_undercount(prob) mod ## fit model mod <- mod |> fit() mod ## original data, plus imputed values for outcome mod |> augment() ## parameter estimates library(dplyr) mod |> components() |> filter(term == "datamod") ## the data have in fact been confidentialized, ## so we account for that, in addition ## to accounting for undercoverage mod <- mod |> set_confidential_rr3() |> fit() mod
Specify the mean of prior for the dispersion parameter (in Poisson and binomial models) or the standard deviation parameter (in normal models.)
set_disp(mod, mean = 1)set_disp(mod, mean = 1)
mod |
An object of class |
mean |
Mean value for the exponential prior.
In Poisson and binomial models, can be set to 0.
Default is |
The dispersion or mean parameter has an exponential
distribution with mean ,
By default equals 1.
In Poisson and binomial models,
mean can be set to 0, implying
that the dispersion term is also 0.
In normal models, mean must be non-negative.
If set_disp() is applied to
a fitted model, set_disp() unfits
the model, deleting existing estimates.
A bage_mod object
mod_pois(), mod_binom(), mod_norm() Specify a
model for rates, probabilities, or means
set_prior() Specify prior for a term
set_n_draw() Specify the number of draws
is_fitted() Test whether a model is fitted
mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) mod mod |> set_disp(mean = 0.1) mod |> set_disp(mean = 0)mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) mod mod |> set_disp(mean = 0.1) mod |> set_disp(mean = 0)
Specify the number of draws from the posterior
distribution to be used in model output.
A newly-created bage_mod object has an
n_draw value of 1000. Higher values
may be appropriate for characterizing
the tails of distributions, or for
publication-quality graphics and summaries.
set_n_draw(mod, n_draw = 1000L)set_n_draw(mod, n_draw = 1000L)
mod |
An object of class |
n_draw |
Number of draws. |
If the new value for n_draw is greater than
the old value, and the model has already been fitted,
then the model is unfitted, and
function fit() may need to be called again.
A bage_mod object
n_draw.bage_mod() query the value of n_draw
augment(), components() functions for
drawing from prior or posterior distribution - the output
of which is affected by the value of n_draw
mod_pois(), mod_binom(), mod_norm() Specify a
model
set_prior() Specify prior for a term
set_disp() Specify prior for dispersion
fit() Fit a model
unfit() Reset a model
mod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) mod # value for 'n_draw' displayed in object n_draw(mod) # or use 'n_draw()' to query mod <- mod |> set_n_draw(n_draw = 5000) modmod <- mod_pois(injuries ~ age:sex + ethnicity + year, data = nzl_injuries, exposure = popn) mod # value for 'n_draw' displayed in object n_draw(mod) # or use 'n_draw()' to query mod <- mod |> set_n_draw(n_draw = 5000) mod
Specify a prior distribution for an intercept, a main effect, or an interaction.
set_prior(mod, formula)set_prior(mod, formula)
mod |
A |
formula |
A formula giving the term and a function for creating a prior. |
If set_prior() is applied to
a fitted model, set_prior() unfits
the model, deleting existing estimates.
A modified bage_mod object.
priors Current choices for prior distributions
is_fitted() Test whether a model is fitted
set_disp() Specify prior for dispersion
mod <- mod_pois(injuries ~ age + year, data = nzl_injuries, exposure = popn) mod mod |> set_prior(age ~ RW2())mod <- mod_pois(injuries ~ age + year, data = nzl_injuries, exposure = popn) mod mod |> set_prior(age ~ RW2())
Reset random seeds stored in a model object.
When new_seeds is NULL (the default),
the new seeds are generated randomly; otherwise
they are taken from new_seeds.
set_seeds(mod, new_seeds = NULL)set_seeds(mod, new_seeds = NULL)
mod |
An object of class |
new_seeds |
|
When an object of class "bage_mod" is first created,
values are generated four four random seeds:
seed_components
seed_augment
seed_forecast_components
seed_forecast_augment
When fit(), components(), augment(),
and forecast() are called on the model object,
the seeds are used internally to ensure that
he same inputs generate the same outputs, even
when the outputs involve random draws.
End users are unlikely to call set_seeds() in
a data analysis, though it may occasionally by useful
when building a simulation from scratch.
A revised version of mod.
report_sim() Do a simulation study. (report_sim()
calls set_seeds() internally.)
mod_pois(), mod_binom(), mod_norm() Specify a model
fit() Fit a model
unfit() Reset model, deleting estimates
## fit model mod <- mod_pois(injuries ~ age, data = nzl_injuries, exposure = popn) |> fit() ## call 'components()' components(mod) ## call 'components()' again - same results components(mod) ## reset seeds mod <- set_seeds(mod) ## calling 'set_seeds' unfits the model is_fitted(mod) ## so we fit it again mod <- fit(mod) ## when we call components, we get ## different results from earlier components(mod)## fit model mod <- mod_pois(injuries ~ age, data = nzl_injuries, exposure = popn) |> fit() ## call 'components()' components(mod) ## call 'components()' again - same results components(mod) ## reset seeds mod <- set_seeds(mod) ## calling 'set_seeds' unfits the model is_fitted(mod) ## so we fit it again mod <- fit(mod) ## when we call components, we get ## different results from earlier components(mod)
Specify which variable (if any) represents age.
Functions mod_pois(), mod_binom(),
and mod_norm() try to infer the age variable
from variable names, but do not always get it right.
set_var_age(mod, name)set_var_age(mod, name)
mod |
An object of class |
name |
The name of the age variable. |
In an R formula, a 'variable' is different
from a 'term'. For instance,
~ age + region + age:region
contains variables age and region,
and terms age, region, and age:region.
By default, bage gives a term involving age a
(RW()) prior. Changing the age variable
via set_var_age() can change priors:
see below for an example.
If set_var_age() is applied to
a fitted model, set_var_age() unfits
the model, deleting existing estimates.
A bage_mod object
set_var_sexgender() Set sex or gender variable
set_var_time() Set time variable
is_fitted() Test whether a model is fitted
internally, bage uses poputils::find_var_age()
to locate age variables
## rename 'age' variable to something unusual injuries2 <- nzl_injuries injuries2$age_last_birthday <- injuries2$age ## mod_pois does not recognize age variable mod <- mod_pois(injuries ~ age_last_birthday * ethnicity + year, data = injuries2, exposure = popn) mod ## so we set the age variable explicitly ## (which, as a side effect, changes the prior on ## the age main effect) mod |> set_var_age(name = "age_last_birthday")## rename 'age' variable to something unusual injuries2 <- nzl_injuries injuries2$age_last_birthday <- injuries2$age ## mod_pois does not recognize age variable mod <- mod_pois(injuries ~ age_last_birthday * ethnicity + year, data = injuries2, exposure = popn) mod ## so we set the age variable explicitly ## (which, as a side effect, changes the prior on ## the age main effect) mod |> set_var_age(name = "age_last_birthday")
Specify which variable (if any) represents sex or gender.
Functions mod_pois(), mod_binom(),
and mod_norm() try to infer the sex/gender variable
from variable names, but do not always get it right.
set_var_sexgender(mod, name)set_var_sexgender(mod, name)
mod |
An object of class |
name |
The name of the sex or gender variable. |
In an R formula, a 'variable' is different
from a 'term'. For instance,
~ gender + region + gender:region
contains variables gender and region,
and terms gender, region, and gender:region.
If set_var_sexgender() is applied to
a fitted model, set_var_sexgender() unfits
the model, deleting existing estimates.
A "bage_mod" object
set_var_age() Set age variable
set_var_time() Set time variable
is_fitted() Test whether model is fitted
internally, bage uses poputils::find_var_sexgender()
to locate sex or gender variables
internally, bage uses poputils::find_label_female()
to locate female categories within a sex or gender variable
internally, bage uses poputils::find_label_male()
to locate male categories within a sex or gender variable
## rename 'sex' variable to something unexpected injuries2 <- nzl_injuries injuries2$biological_sex <- injuries2$sex ## mod_pois does not recognize sex variable mod <- mod_pois(injuries ~ age * biological_sex + year, data = injuries2, exposure = popn) mod ## so we set the sex variable explicitly mod |> set_var_sexgender(name = "biological_sex")## rename 'sex' variable to something unexpected injuries2 <- nzl_injuries injuries2$biological_sex <- injuries2$sex ## mod_pois does not recognize sex variable mod <- mod_pois(injuries ~ age * biological_sex + year, data = injuries2, exposure = popn) mod ## so we set the sex variable explicitly mod |> set_var_sexgender(name = "biological_sex")
Specify which variable (if any) represents time.
Functions mod_pois(), mod_binom(),
and mod_norm() try to infer the time variable
from variable names, but do not always get it right.
set_var_time(mod, name)set_var_time(mod, name)
mod |
An object of class |
name |
The name of the time variable. |
In an R formula, a 'variable' is different
from a 'term'. For instance,
~ time + region + time:region
contains variables time and region,
and terms time, region, and time:region.
By default, bage gives a term involving time a
(RW()) prior. Changing the time variable
via set_var_time() can change priors:
see below for an example.
If set_var_time() is applied to
a fitted model, set_var_time() unfits
the model, deleting existing estimates.
A bage_mod object
set_var_age() Set age variable
set_var_sexgender() Sex sex or gender variable
is_fitted() Test if model has been fitted
internally, bage uses poputils::find_var_time()
to locate time variables
## rename time variable to something unusual injuries2 <- nzl_injuries injuries2$calendar_year <- injuries2$year ## mod_pois does not recognize time variable mod <- mod_pois(injuries ~ age * ethnicity + calendar_year, data = injuries2, exposure = popn) mod ## so we set the time variable explicitly ## (which, as a side effect, changes the prior on ## the time main effect) mod |> set_var_time(name = "calendar_year")## rename time variable to something unusual injuries2 <- nzl_injuries injuries2$calendar_year <- injuries2$year ## mod_pois does not recognize time variable mod <- mod_pois(injuries ~ age * ethnicity + calendar_year, data = injuries2, exposure = popn) mod ## so we set the time variable explicitly ## (which, as a side effect, changes the prior on ## the time main effect) mod |> set_var_time(name = "calendar_year")
Use a p-spline (penalised spline) to model main effects or interactions. Typically used with age, but can be used with any variable where outcomes are expected to vary smoothly from one element to the next.
Sp( n_comp = NULL, s = 1, sd = 1, sd_slope = 1, along = NULL, con = c("none", "by") )Sp( n_comp = NULL, s = 1, sd = 1, sd_slope = 1, along = NULL, con = c("none", "by") )
n_comp |
Number of spline basis functions (components) to use. |
s |
Scale for the prior for the innovations.
Default is |
sd |
Standard deviation in prior for first element of random walk. |
sd_slope |
Standard deviation in prior
for initial slope of random walk. Default is |
along |
Name of the variable to be used as the 'along' variable. Only used with interactions. |
con |
Constraints on parameters.
Current choices are |
If Sp() is used with an interaction,
separate splines are used for the 'along' variable within
each combination of the
'by' variables.
An object of class "bage_prior_spline".
When Sp() is used with a main effect,
and when it is used with an interaction,
where
is the main effect or interaction, with elements;
is a subvector of holding
values for the th combination of the 'by' variables;
is the number of elements of ;
is the number of elements of ;
is a or matrix of
spline basis functions; and
is n_comp.
The elements of or are assumed
to follow a second-order random walk.
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
Eilers, P.H.C. and Marx B. (1996). "Flexible smoothing with B-splines and penalties". Statistical Science. 11 (2): 89–121.
RW() Smoothing via random walk
RW2() Smoothing via second-order random walk
SVD() Smoothing of age via singular value decomposition
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
splines::bs() Function used by bage to construct
spline basis functions
Mathematical Details vignette
Sp() Sp(n_comp = 10)Sp() Sp(n_comp = 10)
Create an object of class "bage_ssvd" to
hold results from a scaled Singular Value Decomposition
(SVD) with n_comp components.
ssvd(data)ssvd(data)
data |
A data frame. See Details for description. |
data has the following columns:
version Vintage of data
type Type of decomposition. Choices are "total",
"joint", and "indep".
labels_age Age labels for individual rows of
matrices within matrix and individual elements
of vectors within offset.
labels_sexgender Sex/gender labels for individual rows of
matrices within matrix and individual elements
of vectors within offset, or NULL.
NULL when sexgender is "total", since in
this case results average across sexes/genders.
matrix List column of sparse matrices.
Must have rownames.
Must not have NAs. When type is "total" or "joint",
each matrix has n_comp columns. When "type" is "indep",
each matrix has 2 * n_comp columns.
offset List column of vectors.
Must have names, which
are identical to the rownames of the corresponding
element of matrix.
data would normally be constructed using functions
in package bssvd.
An object of class "bage_ssvd".
Scaled SVDs Overview of scaled SVDs implemented in bage
SVD() Prior based on scaled SVD
ssvd(data_wmd)ssvd(data_wmd)
Use components from a Singular Value Decomposition (SVD) to model a main effect or interaction involving age.
SVD(ssvd, v = NULL, n_comp = 3, indep = TRUE)SVD(ssvd, v = NULL, n_comp = 3, indep = TRUE)
ssvd |
Object of class |
v |
Version of scaled SVD components to use. If no value is suppled, the most recent version is used. |
n_comp |
Number of components from scaled SVD
to use in modelling. The default is |
indep |
Whether to use separate or
combined SVDs in terms involving sex or gender.
Default is |
A SVD() prior assumes that the age, age-sex, or age-gender
profiles for the quantity
being modelled looks like they were drawn at random
from an external demographic database. For instance,
the prior obtained via
SVD(HMD)
assumes that age or age-sex profiles look like they were drawn from the Human Mortality Database.
If SVD() is used with an interaction involving
variables other than age and sex/gender,
separate profiles are constructed
within each combination of other variables.
bage chooses the appropriate age-specific
or age-sex-specific SVD values internally.
The choice depends on the model term that the
SVD() prior is applied to, and on the
age labels used in data argument to
mod_pois(), mod_binom() or mod_norm().
bage makes its choice when set_prior()
is called.
An object of class "bage_prior_svd".
Two possible ways of extracting patterns from age-sex-specific data are
carry out separate SVDs on separate datasets for each sex/gender; or
carry out a single SVD on dataset that has separate entries for each sex/gender.
Option 1 is more flexible. Option 2 is
more robust to sampling or measurement errors.
Option 1 is obtained by setting the joint
argument to FALSE. Option 1
is obtained by setting the indep argument to TRUE.
The default is TRUE.
Case 1: Term involving age and no other variables
When SVD() is used with an age main effect,
where
is a main effect or interaction involving age;
is the number of elements of ;
is the number of components from the SVD;
is a known matrix with dimension ; and
is a known vector with elements.
and are constructed from
a large database of age-specific demographic estimates
by performing an SVD and standardizing.
The elements of have prior
Case 2: Term involving age and non-sex, non-gender variable(s)
When SVD() is used with an interaction that involves age but that
does not involve sex or gender,
where
is a subvector of holding
values for the th combination of the non-age variables;
is the number of elements of ;
is the number of components from the SVD;
is a known matrix with dimension ; and
is a known vector with elements.
Case 3: Term involving age, sex/gender, and no other variables
When SVD() is used with an interaction that involves
age and sex or gender, there are two sub-cases, depending on
the value of indep.
When indep is TRUE,
and when indep is FALSE,
where
is an interaction involving age and sex/gender;
is a subvector of ,
holding values for sex/gender ;
is the number of elements in ;
is the number of sexes/genders;
is the number of components from the SVD;
is a known matrix, specific
to sex/gender ;
is a known vector with elements,
specific to sex/gender ;
is a known matrix, with values
for all sexes/genders; and
is a known vector with elements, with values
for all sexes/genders.
The elements of and have prior
Case 4: Term involving age, sex/gender, and other variable(s)
When SVD() is used with an interaction that involves
age, sex or gender, and other variables, there are two sub-cases,
depending on the value of indep.
When indep is TRUE,
and when indep is FALSE,
where
is an interaction involving sex/gender;
is a subvector of ,
holding values for sex/gender for the th
combination of the other variables;
is the number of elements in ;
is the number of sexes/genders;
is the number of components from the SVD;
is a known matrix, specific
to sex/gender ;
is a known vector with elements,
specific to sex/gender ;
is a known matrix, with values
for all sexes/genders; and
is a known vector with elements, with values
for all sexes/genders.
HMD Mortality rates from the
Human Mortality Database.
HFD Fertility rates from the
Human Fertility Database.
For details of the construction of scaled SVDS see the Mathematical Details vignette
SVD_AR(), SVD_AR1(), SVD_RW(), SVD_RW2() SVD priors for
for time-varying age profiles;
RW() Smoothing via random walk
RW2() Smoothing via second-order random walk
Sp() Smoothing via splines
Scaled SVDs Overview of scaled SVDs implemented in bage
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
set_var_sexgender() Identify sex or gender variable in data
SVD(HMD) SVD(HMD, n_comp = 2)SVD(HMD) SVD(HMD, n_comp = 2)
Use components from a Singular Value Decomposition (SVD) to model an interaction involving age and time, or age, sex/gender and time, where the coefficients evolve over time.
SVD_AR( ssvd, v = NULL, n_comp = 3, indep = TRUE, n_coef = 2, s = 1, shape1 = 5, shape2 = 5, con = c("none", "by") ) SVD_AR1( ssvd, v = NULL, n_comp = 3, indep = TRUE, min = 0.8, max = 0.98, s = 1, shape1 = 5, shape2 = 5, con = c("none", "by") ) SVD_DRW( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, sd = 1, min = 0.8, max = 0.98, shape1 = 5, shape2 = 5, con = c("none", "by") ) SVD_DRW2( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, sd = 1, sd_slope = 1, min = 0.8, max = 0.98, shape1 = 5, shape2 = 5, con = c("none", "by") ) SVD_Lin( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, mean_slope = 0, sd_slope = 1, con = c("none", "by") ) SVD_RW( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, sd = 1, con = c("none", "by") ) SVD_RW2( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, sd = 1, sd_slope = 1, con = c("none", "by") )SVD_AR( ssvd, v = NULL, n_comp = 3, indep = TRUE, n_coef = 2, s = 1, shape1 = 5, shape2 = 5, con = c("none", "by") ) SVD_AR1( ssvd, v = NULL, n_comp = 3, indep = TRUE, min = 0.8, max = 0.98, s = 1, shape1 = 5, shape2 = 5, con = c("none", "by") ) SVD_DRW( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, sd = 1, min = 0.8, max = 0.98, shape1 = 5, shape2 = 5, con = c("none", "by") ) SVD_DRW2( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, sd = 1, sd_slope = 1, min = 0.8, max = 0.98, shape1 = 5, shape2 = 5, con = c("none", "by") ) SVD_Lin( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, mean_slope = 0, sd_slope = 1, con = c("none", "by") ) SVD_RW( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, sd = 1, con = c("none", "by") ) SVD_RW2( ssvd, v = NULL, n_comp = 3, indep = TRUE, s = 1, sd = 1, sd_slope = 1, con = c("none", "by") )
ssvd |
Object of class |
v |
Version of scaled SVD components to use. If no value is suppled, the most recent version is used. |
n_comp |
Number of components from scaled SVD
to use in modelling. The default is |
indep |
Whether to use separate or
combined SVDs in terms involving sex or gender.
Default is |
n_coef |
Number of AR coefficients in |
s |
Scale for standard deviations terms.
Default is |
shape1, shape2
|
Parameters for prior
for coefficients in |
con |
Constraints on parameters.
Current choices are |
min, max
|
Minimum and maximum values
for autocorrelation coefficient in |
sd |
Standard deviation
of initial value for random walks. Default is |
sd_slope |
Standard deviation in prior
for initial slope. Default is |
mean_slope |
Mean in prior for initial
slope. Default is |
SVD_AR(), SVD_AR1(),
SVD_Lin(),
SVD_RW(), SVD_RW2(),
SVD_DRW(), and SVD_RW2(),
priors assume that, in any given period,
the age profiles or age-sex profiles for the quantity
being modelled looks like they were drawn at random
from an external demographic database. For instance,
the SVD_AR() prior obtained via
SVD_AR(HMD)
assumes that profiles look like they were obtained from the Human Mortality Database.
An object inheriting from class
"bage_prior".
When the interaction being modelled only involves age and time, or age, sex/gender, and time
and when it involves other variables besides age, sex/gender, and time,
where
is an interaction involving age, time, possibly sex/gender,
and possibly other variables;
is a subvector of holding
values for period ;
is a subvector of holding
values for the th combination of the non-age, non-time,
non-sex/gender variables for period ;
is a known matrix; and
is a known vector.
and are constructed from
a large database of age-specific demographic estimates
by applying the singular value decomposition, and then standardizing.
With SVD_AR(), the prior for the th element
of or is
or
with SVD_AR1(), it is
or
with SVD_Lin(), it is
or
with SVD_RW(), it is
or
and with SVD_RW2(), it is
or
with SVD_DRW(), it is
or
and with SVD_DRW2(), it is
or
SVD_AR1() and SVD_DRW() are almost but not quite identical.
In SVD_AR1(), the variance of
is chosen so that has marginal variance ,
while in SVD_DRW(), has variance .
For details on the time series models, see AR(), AR1(),
Lin(), RW(), RW2(), DRW(), and DRW2().
With some combinations of terms and priors, the values of
the intercept, main effects, and interactions
are only weakly identified.
This weak identifiability is
typically harmless. However, in some applications, such as
when trying to obtain interpretable values
for main effects and interactions, it can be helpful to increase
identifiability through the use of constraints, specified through the
con argument.
Current options for con are:
"none" No constraints. The default.
"by" Only used in interaction terms that include 'along' and
'by' dimensions. Within each value of the 'along'
dimension, terms across each 'by' dimension are constrained
to sum to 0.
HMD Mortality rates from the
Human Mortality Database.
HFD Fertility rates from the
Human Fertility Database.
For details of the construction of scaled SVDS see the Mathematical Details article
SVD() SVD prior for non-time-varying terms
Lin() Linear with independent errors
RW() Smoothing via random walk
RW2() Smoothing via second-order random walk
DRW() Smoothing via damped random walk
DRW2() Smoothing via damped second-order random walk
Sp() Smoothing via splines
Scaled SVDs Overview of scaled SVDs implemented in bage
priors Overview of priors implemented in bage
set_prior() Specify prior for intercept,
main effect, or interaction
set_var_sexgender() Identify sex or gender variable in data
SVD_AR1(HMD) SVD_RW(HMD, n_comp = 2) SVD_RW2(HMD, indep = FALSE) SVD_Lin(HMD, s = 0)SVD_AR1(HMD) SVD_RW(HMD, n_comp = 2) SVD_RW2(HMD, indep = FALSE) SVD_Lin(HMD, s = 0)
Scaled SVDs contain information on typical age-patterns or age-sex patterns for demographic processes, extracted from international databases. The information is extracted using a singular value decomposition (SVD), and then scaled to make it easier to formulate priors.
Scaled SVDs can have multiple versions, based on data released at different dates, or on subsets of the available data.
Some datasets, and hence some scaled SVDs, include information on age but not on sex or gender.
| Scaled SVD | Process | Source | Versions | Sex dimension |
| CSA | School attendance | Census data assembled by UN | "v2025", "v2024" |
Yes |
| HFD | Fertility | Human Fertility Database | "v2025", "v2024" |
No |
| HIMD_R | Internal migration: annual rates | Human Internal Migration Database | "v2024" |
No |
| HIMD_P1 | Internal migration: 1-year probabilities | Human Internal Migration Database | "v2024" |
No |
| HIMD_P5 | Internal migration: 5-year probabilities | Human Internal Migration Database | "v2024" |
No |
| HMD | Mortality | Human Mortality Database | "v2025", "v2025_50", "v2024" |
Yes |
| LFP | Labour Force Participation | OECD | "v2025" |
Yes |
| WMD_C | Currently married | World Migration Data | "v2019" |
Yes |
| WMD_E | Ever married | World Migration Data | "v2019" |
Yes |
Counts of births and infant deaths in Sweden by county and year, 1995-2015
swe_infantswe_infant
A tibble with 441 rows and the following columns:
county: A factor with 21 levels, where the levels are
ordered by number of births, from "Stockholm" down to "Gotland"
'time: Calendar year
births: Count of births
deaths: Count of infant deaths
Dataset used in Chapter 11 of the book Bayesian Demographic Estimation and Forecasting.
Database "Live births by region, mother's age and child's sex. Year 1968 - 2017" and database "Deaths by region, age (during the year) and sex. Year 1968 - 2017" on the Statistics Sweden website. Downloaded on 13 July 2018.
Bryant J and Zhang J. 2018. Bayesian Demographic Estimation and Forecasting. CRC Press.
datasets Overview of datasets in bage
Summarize the intercept, main effects, and interactions from a fitted model.
## S3 method for class 'bage_mod' tidy(x, ...)## S3 method for class 'bage_mod' tidy(x, ...)
x |
Object of class |
... |
Unused. Included for generic consistency only. |
The tibble returned by tidy()
contains the following columns:
term Name of the intercept, main effect, or interaction
prior Specification for prior
n_par Number of parameters
n_par_free Number of free parameters
std_dev Standard deviation for point estimates.
With some priors, the number of free parameters is less than
the number of parameters for that term. For instance, an SVD()
prior might use three vectors to represent 101 age groups
so that the number of parameters is 101, but the number of
free parameters is 3.
std_dev is the standard deviation across elements of a
term, based on point estimates of those elements.
For instance, if the point
estimates for a term with three elements are
0.3, 0.5, and 0.1, then the value for std_dev is
sd(c(0.3, 0.5, 0.1))
std_dev is a measure of the contribution of a term to
variation in the outcome variable.
A tibble
std_dev is modified from Gelman et al. (2014)
Bayesian Data Analysis. Third Edition. pp396–397.
augment() Extract values for rates,
probabilities, or means, together
with original data
components() Extract values for hyper-parameters
dispersion() Extract values for dispersion
mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) mod <- fit(mod) tidy(mod)mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) mod <- fit(mod) tidy(mod)
Reset a model, deleting all estimates.
unfit(mod)unfit(mod)
mod |
A fitted object of class |
An unfitted version of mod.
fit() Fit a model
mod_pois(), mod_binom(), mod_norm() Specify a model
set_seeds() Reset random seeds
Functions such as set_prior(), set_disp() and
set_var_age() unfit models as side effects.
## create a model, which starts out unfitted mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) is_fitted(mod) ## calling 'fit' produces a fitted version mod <- fit(mod) is_fitted(mod) ## calling 'unfit' resets the model mod <- unfit(mod) is_fitted(mod)## create a model, which starts out unfitted mod <- mod_pois(injuries ~ age + sex + year, data = nzl_injuries, exposure = popn) is_fitted(mod) ## calling 'fit' produces a fitted version mod <- fit(mod) is_fitted(mod) ## calling 'unfit' resets the model mod <- unfit(mod) is_fitted(mod)
Counts of accidental deaths in the USA, by month, for 1973-1978.
usa_deathsusa_deaths
A tibble with 72 rows and the following columns:
month: Year and month.
deaths: Count of deaths.
Reformatted version of
datasets::USAccDeaths.
datasets Overview of datasets in bage
Object of class "bage_ssvd"
holding scaled SVD components derived from
data from the census and survey data on marriage
assembled by the United Nations
Population Division.
WMD_C and WMD_E each hold 5 components.
WMD_C WMD_EWMD_C WMD_E
Object of class "bage_ssvd".
Versions:
"v2019" (default) Data published in 2019
WMD_C is based on data on the proportion
of the population that is currently married.
It should be used for modelling the proportion
of people whose marital status is
"Currently Married"
WMD_E is based on data on the proportion
of the population that has ever been married.
It should be used for modelling the proportion
of people whose marital status is
"Ever Married".
In both cases "marriage" includes de facto marriages and consensual unions, in addition to legal marriages.
Derived from data from the
World Marriage Data 2019 database
available on the United Nations Population
Division website, and assembled by the
UNPD from national census and survey data.
Code to create WMD
is in folder ‘data-raw/ssvd_wmd’
in the source code for thet bage
package.
Scaled SVDs Overview of scaled SVDs implemented in bage
SVD() A prior based on a scaled SVD