| Title: | Demographic Analysis and Data Manipulation |
|---|---|
| Description: | Perform tasks commonly encountered when preparing and analysing demographic data. Some functions are intended for end users, and others for developers. Includes functions for working with life tables. |
| Authors: | John Bryant [aut, cre], Bayesian Demography Limited [cph] |
| Maintainer: | John Bryant <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.6.1 |
| Built: | 2026-06-01 05:23:03 UTC |
| Source: | https://github.com/bayesiandemography/poputils |
Calculate the intrinsic growth rate implied by fertility and mortality schedules. The intrinsic growth rate is the rate at which all age groups in a population would eventually grow if the population was subject to the fertility and mortality schedules indefinitely. The fertility and mortality schedules apply to a single sex, typically females.
.intrinsic_growth_rate(mx, Lx, age_mid).intrinsic_growth_rate(mx, Lx, age_mid)
mx |
Age specific fertility rates, for a single sex (typically daughters). An ordinary numeric vector or an rvec(). |
Lx |
Life table mortality measure, for a single sex (typically females). An ordinary numeric vector or an rvec(). |
age_mid |
The midpoint of each age group. Numeric vector. |
The fertility scehdule mx and mortality schedule
Lx refer only to the reproductive ages. The
mortality schedule Lx is the number of years that
a newborn baby would be expected to spend in each
age group under prevailing mortality rates. It is
the life table quantity nLx where the radix
l0 has been set to 1.
A numeric vector of length 1.
mx <- c(0.072, 0.148, 0.156, 0.140, 0.108, 0.054, 0.013) Lx <- c(3.49, 3.37, 3.23, 3.07, 2.91, 2.74, 2.56) age_mid <- c(17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5) .intrinsic_growth_rate(mx = mx, Lx = Lx, age_mid = age_mid)mx <- c(0.072, 0.148, 0.156, 0.140, 0.108, 0.054, 0.013) Lx <- c(3.49, 3.37, 3.23, 3.07, 2.91, 2.74, 2.56) age_mid <- c(17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5) .intrinsic_growth_rate(mx = mx, Lx = Lx, age_mid = age_mid)
Determine whether a set of age labels refer to one-year, five-year, or life-table age groups.
age_group_type(x)age_group_type(x)
x |
A vector of age labels |
The valid types of age labels are:
"single". One-year age groups, eg
"0" or "55", and possibly
an open age group, eg "90+".
"five". Five-year age groups, eg
"0-4" or "55-59", and possibly
an open age group, eg "100+".
"lt". Life table age groups, eg
"0", "1-4", "5-9",
"55-59", or "80+".
If x does not fit any of these
descriptions, then age_group_type() throws
an error.
If x could belong to
more than one type, then age_group_type()
prefers "single" to "five" and "lt",
and prefers "five" to "lt".
"single", "five", or "lt".
age_group_type(c("5-9", "0-4", "100+")) age_group_type(c("2", "5", "1")) age_group_type(c("0", "1-4")) ## could be any "single" or "lt" age_group_type("0") ## could be "five" or "lt" age_group_type("80-84")age_group_type(c("5-9", "0-4", "100+")) age_group_type(c("2", "5", "1")) age_group_type(c("0", "1-4")) ## could be any "single" or "lt" age_group_type("0") ## could be "five" or "lt" age_group_type("80-84")
Create labels for age groups. The labels depend
on the type argument:
"single". One-year age groups, eg
"0" or "55", and possibly
an open age group, eg "90+".
"five". Five-year age groups, eg
"0-4" or "55-59", and possibly
an open age group, eg "100+".
"lt". Life table age groups, eg
"0", "1-4", "5-9",
"55-59", or "80+".
age_labels(type, min = 0, max = 100, open = NULL)age_labels(type, min = 0, max = 100, open = NULL)
type |
Type of age group labels:
|
min |
Minimum age. Defaults to 0. |
max |
Maximum age for closed age groups. Defaults to 100. |
open |
Whether the last age group is "open", ie has no upper limit. |
The first age group starts at the age
specified by min. If open is TRUE,
then the final age group starts at the age
specified by max. Otherwise, the
final age group ends at the age specified
by max.
open defaults to TRUE when
min equals zero, and to FALSE
otherwise.
A character vector.
age_labels(type = "single", min = 15, max = 40) age_labels(type = "five") age_labels(type = "lt", max = 80)age_labels(type = "single", min = 15, max = 40) age_labels(type = "five") age_labels(type = "lt", max = 80)
Given a vector x of age group labels, return
a numeric vector.
age_lower() returns the lower limits of each age group,
age_mid() returns the midpoints, and
age_upper() returns the upper limits.
Vector x must describe 1-year, 5-year or life-table
age groups: see age_labels() for examples. x can
format these age groups in any way understood by
reformat_age().
age_lower(x) age_mid(x) age_upper(x)age_lower(x) age_mid(x) age_upper(x)
x |
A vector of age group labels. |
These functions can make age groups easier to work with. Lower and upper limits can be used for selecting on age. Replacing age group with midpoints can improve graphs.
A numeric vector, the same length as x.
x <- c("15-19", "5-9", "50+") age_lower(x) age_mid(x) age_upper(x) ## non-standard formats are OK age_lower(c("infants", "100 and over")) df <- data.frame(age = c("1-4", "10-14", "5-9", "0"), rate = c(0.023, 0.015, 0.007, 0.068)) df subset(df, age_lower(age) >= 5)x <- c("15-19", "5-9", "50+") age_lower(x) age_mid(x) age_upper(x) ## non-standard formats are OK age_lower(c("infants", "100 and over")) df <- data.frame(age = c("1-4", "10-14", "5-9", "0"), rate = c(0.023, 0.015, 0.007, 0.068)) df subset(df, age_lower(age) >= 5)
A "standard" distribution for age-specific fertility rates proposed in Booth (1984), and widely used for estimating fertility rates in settings with high fertility and limited data.
booth_standardbooth_standard
A tibble with 8 rows and the following columns:
age Five-year age group, from "10-14" to "45-49"
value Proportion of total fertility occurring within
the age group
The key input for booth_standard
is 'Yx' from table 2 of Booth (1984).
Booth H (1984) Transforming Gompertz's function for fertility analysis: The development of a standard for the relational Gompertz function. Population Studies 38(3): 495-506.
Check that age labels can be parsed and, optionally, whether the labels are complete, unique, start at zero, and end with an open age group.
check_age( x, complete = FALSE, unique = FALSE, zero = FALSE, open = FALSE, closed = FALSE )check_age( x, complete = FALSE, unique = FALSE, zero = FALSE, open = FALSE, closed = FALSE )
x |
A vector of age labels. |
complete |
If |
unique |
If |
zero |
If |
open |
If |
closed |
If |
By default, check_age() only tests whether
a set of labels can be parsed as single-year,
five-year, or life table age groups.
(See age_group_type() for more on the three
types of age group.) However, it can
also apply the following tests:
complete. Whether x includes
all intermediate age groups, with no gaps.
For instance, the labels c("10-14", "15-19", "5-9")
are complete, while the labelsc("15-19", "5-9")
are not (because they are missing "10-14".)
unique. Whether x has duplicated labels.
zero. Whether the youngest age group in x starts
at age 0, ie whether it includes "0" or "0-4".
open. Whether the oldest age group in x is "open",
with no upper limit, eg "100+" or "65+".
closed. Whether the oldest age group in x
is "closed", with an upper limit,
eg "100-104+" or "65".
TRUE, invisibly, or raises an
error if a test fails.
reformat_age() to convert age labels to
the format used by poputils.
try( check_age(c("10-14", "0-4", "15+"), complete = TRUE) ) try( check_age(c("10-14", "5-9", "0-4", "5-9", "15+"), unique = TRUE) ) try( check_age(c("10-14", "5-9", "15+"), zero = TRUE) ) try( check_age(c("10-14", "0-4", "5-9"), open = TRUE) ) try( check_age(c("10+", "0-4", "5-9"), closed = TRUE) )try( check_age(c("10-14", "0-4", "15+"), complete = TRUE) ) try( check_age(c("10-14", "5-9", "0-4", "5-9", "15+"), unique = TRUE) ) try( check_age(c("10-14", "5-9", "15+"), zero = TRUE) ) try( check_age(c("10-14", "0-4", "5-9"), open = TRUE) ) try( check_age(c("10+", "0-4", "5-9"), closed = TRUE) )
Check that x and y have the same length.
check_equal_length(x, y, nm_x, nm_y)check_equal_length(x, y, nm_x, nm_y)
x, y
|
Arguments to compare |
nm_x, nm_y
|
Names to use in error message |
'TRUE', invisibly.
x <- 1:3 y <- 3:1 check_equal_length(x = x, y = y, nm_x = "x", nm_y = "y")x <- 1:3 y <- 3:1 check_equal_length(x = x, y = y, nm_x = "x", nm_y = "y")
Check that n is finite, non-NA scalar that
is an integer or integerish (ie is equal to round(n)),
and optionally within a specified range
and divisible by a specified number.
check_n(n, nm_n, min, max, divisible_by)check_n(n, nm_n, min, max, divisible_by)
n |
A whole number |
nm_n |
Name for 'n' to be used in error messages |
min |
Minimum value 'n' can take. Can be NULL. |
max |
Maximum values 'n' can take. Can be NULL. |
divisible_by |
'n' must be divisible by this. Can be NULL. |
If all tests pass, check_n() returns TRUE invisibly.
Otherwise it throws an error.
check_n(10, nm_n = "count", min = 0, max = NULL, divisible_by = 1) check_n(10, nm_n = "count", min = NULL, max = NULL, divisible_by = NULL) check_n(10, nm_n = "n", min = 5, max = 10, divisible_by = 2)check_n(10, nm_n = "count", min = 0, max = NULL, divisible_by = 1) check_n(10, nm_n = "count", min = NULL, max = NULL, divisible_by = NULL) check_n(10, nm_n = "n", min = 5, max = 10, divisible_by = 2)
Given a named list of colnum vectors, like those
produced by tidyselect::eval_select(),
throw an error if there is an overlap.
check_no_overlap_colnums(x)check_no_overlap_colnums(x)
x |
A named list of integer vectors. |
TRUE, invisibly
x <- list(arg1 = c(age = 1L), arg2 = c(gender = 4L, region = 5L)) check_no_overlap_colnums(x)x <- list(arg1 = c(age = 1L), arg2 = c(gender = 4L, region = 5L)) check_no_overlap_colnums(x)
Convert age group labels to a less detailed classification.
The three classifications recognized by combine_age()
are "single", "five", and "lt", as defined on
age_labels(). The following conversions are permitted:
"single" —> "lt"
"single" —> "five"
"lt" —> "five"
combine_age(x, to = c("five", "lt"))combine_age(x, to = c("five", "lt"))
x |
A vector of age labels |
to |
Type of age classification
to convert to: |
If x is a factor, then combine_age()
returns a factor; otherwise it returns a
character vector.
age_labels() to create age group labels
reformat_age() to convert existing age group labels
to a standard format
set_age_open() to set the lower limit
of the open age group
x <- c("0", "5", "3", "12") combine_age(x) combine_age(x, to = "lt")x <- c("0", "5", "3", "12") combine_age(x) combine_age(x, to = "lt")
Turn life expectancies at birth into full life tables, using the Brass logit model. The method is simple and is designed for simulations or for settings with little or no data on age-specific mortality rates. In settings where data on age-specific mortality is available, other methods might be more appropriate.
e0_to_lifetab_logit( target, standard, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", radix = 1e+05, suffix = NULL )e0_to_lifetab_logit( target, standard, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", radix = 1e+05, suffix = NULL )
target |
A data frame containing a variable called
|
standard |
A data frame containing variables
called |
infant, child, closed, open
|
Methods used to
calculate life expectancy. See |
radix |
Initial population for the
|
suffix |
Optional suffix added to life table columns. |
A tibble.
The method implemented by e0_to_lifetab_logit() is
based on the observation that, if populations A and B
are demographically similar, then, in many cases,
where is the life table probability of dying
between birth and age . By definition, ,
where is the standard life table function,
with radix () equal to 1.
Given (i) target life expectancy,
(ii) a set of ),
(referred to as a "standard"), and
(iii) a value for ,
e0_to_lifetab_logit() finds
a value for that yields a set of
) with the required life expectancy.
If populations A and B
are similar, then is likely to close to 1.
target argumenttarget is a data frame specifying
life expectancies for each population being modelled,
and, optionally, inputs to the calculations, and
'by' variables.
target contains the following variables:
A variable called "e0" giving life expectancy at birth.
Optionally, a variable called "beta" with values
for .
Can be an ordinary numeric vector
or an rvec.
If target does not include a "beta" variable,
then e0_to_lifetab_logit() sets to 1.
A variable called "sex". The "sex" variable
must be supplied if the infant
argument to e0_to_lifetab_logit() is "CD" or "AK",
or if the child argument is "CD".
Optionally, 'by' variables. Typical examples are time, region, and model variant.
standard argumentstandard is a data frame specifying
the to be used with each life expectancy
in target, and, optionally, values for the average age
person-years lived by people who die in each group,
. Values in standard are age-specific.
standard contains the following variables:
A variable called "age", with labels that
can be parsed by reformat_age().
A variable called "lx". Cannot be an rvec.
Additional variables used to match rows in standard
to rows in target.
Brass W, Coale AJ. 1968. “Methods of analysis and estimation,” in Brass, W, Coale AJ, Demeny P, Heisel DF, et al. (eds). The Demography of Tropical Africa. Princeton NJ: Princeton University Press, pp. 88–139.
Moultrie TA, Timæus IM. 2013. Introduction to Model Life Tables. In Moultrie T, Dorrington R, Hill A, Hill K, Timæus I, Zaba B. (eds). Tools for Demographic Estimation. Paris: International Union for the Scientific Study of Population. online version.
tfr_to_asfr_scale Fertility equivalent of
e0_to_lifetab_logit()
logit(), invlogit() Logit function
lifeexp() Calculate life expectancy from detailed inputs
## create new life tables based on level-1 ## 'West' model life tables, but with lower ## life expectancy library(dplyr, warn.conflicts = FALSE) target <- data.frame(sex = c("Female", "Male"), e0 = c(17.5, 15.6)) standard <- west_lifetab |> filter(level == 1) |> select(sex, age, lx) e0_to_lifetab_logit(target = target, standard = standard, infant = "CD", child = "CD") ## target is an rvec library(rvec, warn.conflicts = FALSE) target_rvec <- data.frame(sex = c("Female", "Male"), e0 = rnorm_rvec(n = 2, mean = c(17.5, 15.6), n_draw = 1000)) e0_to_lifetab_logit(target = target_rvec, standard = standard)## create new life tables based on level-1 ## 'West' model life tables, but with lower ## life expectancy library(dplyr, warn.conflicts = FALSE) target <- data.frame(sex = c("Female", "Male"), e0 = c(17.5, 15.6)) standard <- west_lifetab |> filter(level == 1) |> select(sex, age, lx) e0_to_lifetab_logit(target = target, standard = standard, infant = "CD", child = "CD") ## target is an rvec library(rvec, warn.conflicts = FALSE) target_rvec <- data.frame(sex = c("Female", "Male"), e0 = rnorm_rvec(n = 2, mean = c(17.5, 15.6), n_draw = 1000)) e0_to_lifetab_logit(target = target_rvec, standard = standard)
This function contained an error, and is deprecated.
Please use function e0_to_lifetab_logit() instead.
ex_to_lifetab_brass( target, standard, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", radix = 1e+05, suffix = NULL )ex_to_lifetab_brass( target, standard, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", radix = 1e+05, suffix = NULL )
target |
A data frame containing a variable called
|
standard |
A data frame containing variables
called |
infant, child, closed, open
|
Methods used to
calculate life expectancy. See |
radix |
Initial population for the
|
suffix |
Optional suffix added to life table columns. |
Function ex_to_lifetab_brass() used formula
,
instead of the conventional
.
A tibble.
Given labels for sex or gender, try to infer
which (if any) refer to females.
If no elements look like a label for females,
or if two or more elements do,
then return NULL.
find_label_female(nms)find_label_female(nms)
nms |
A character vector |
An element of nms or NULL.
find_label_male(), find_var_sexgender()
find_label_female(c("Female", "Male")) ## one valid find_label_female(c("0-4", "5-9")) ## none valid find_label_female(c("F", "Fem")) ## two validfind_label_female(c("Female", "Male")) ## one valid find_label_female(c("0-4", "5-9")) ## none valid find_label_female(c("F", "Fem")) ## two valid
Given labels for sex or gender, try to infer
which (if any) refer to males.
If no elements look like a label for males,
or if two or more elements do,
then return NULL.
find_label_male(nms)find_label_male(nms)
nms |
A character vector |
An element of nms or NULL.
find_label_female(), find_var_sexgender()
find_label_male(c("Female", "Male")) ## one valid find_label_male(c("0-4", "5-9")) ## none valid find_label_male(c("male", "m")) ## two validfind_label_male(c("Female", "Male")) ## one valid find_label_male(c("0-4", "5-9")) ## none valid find_label_male(c("male", "m")) ## two valid
Find the element of nms that looks like an age variable.
If no elements look like an age variable, or if
two or more elements do,
then return NULL.
find_var_age(nms)find_var_age(nms)
nms |
A character vector |
An element of nms, or NULL.
find_var_time(), find_var_sexgender()
find_var_age(c("Sex", "Year", "AgeGroup", NA)) ## one valid find_var_age(c("Sex", "Year")) ## none valid find_var_age(c("age", "age.years")) ## two validfind_var_age(c("Sex", "Year", "AgeGroup", NA)) ## one valid find_var_age(c("Sex", "Year")) ## none valid find_var_age(c("age", "age.years")) ## two valid
Find the element of nms that looks like
a sex or gender variable.
If no elements look like a sex or gender variable,
or if two or more elements do,
then return NULL.
find_var_sexgender(nms)find_var_sexgender(nms)
nms |
A character vector |
An element of nms, or NULL.
find_var_age(), find_var_time(), find_label_female(),
find_label_male()
find_var_sexgender(c("Sex", "Year", "AgeGroup", NA)) ## one valid find_var_sexgender(c("Age", "Region")) ## none valid find_var_sexgender(c("sexgender", "sexes")) ## two validfind_var_sexgender(c("Sex", "Year", "AgeGroup", NA)) ## one valid find_var_sexgender(c("Age", "Region")) ## none valid find_var_sexgender(c("sexgender", "sexes")) ## two valid
Find the element of nms that looks like an time variable.
If no elements look like a time variable, or if
two or more elements do,
then return NULL.
find_var_time(nms)find_var_time(nms)
nms |
A character vector |
An element of nms, or NULL.
find_var_age(), find_var_sexgender()
find_var_time(c("Sex", "Year", "AgeGroup", NA)) ## one valid find_var_time(c("Sex", "Region")) ## none valid find_var_time(c("time", "year")) ## two validfind_var_time(c("Sex", "Year", "AgeGroup", NA)) ## one valid find_var_time(c("Sex", "Region")) ## none valid find_var_time(c("time", "year")) ## two valid
Constructed a named vector of indices equivalent to the vectors produced by tidyselect::eval_select, but for the grouping variables in an object of class "grouped_df".
groups_colnums(data)groups_colnums(data)
data |
A data frame. |
If data is not grouped, then groups_colnums
returns a zero-length vector.
A named integer vector.
library(dplyr) df <- data.frame(x = 1:4, g = c(1, 1, 2, 2)) groups_colnums(df) df <- group_by(df, g) groups_colnums(df)library(dplyr) df <- data.frame(x = 1:4, g = c(1, 1, 2, 2)) groups_colnums(df) df <- group_by(df, g) groups_colnums(df)
Estimates of age-specific fertility rates, (births per 1000 person-years lived) for rural and urban areas, in Iran, 1986-2000. Calculated by Mohammad Jalal Abbasi-Shavazi and Peter McDonald from data from the 2000 Iran Demographic and Health Survey.
irn_fertirn_fert
A tibble with 2010 rows and the following columns:
time Calendar year
age Five-year age group from "15-19" to "45-49"
area "Rural" or "Urban"
rate Age-specific fertility rate
Tables 4.1 and 4.2 of Abbasi-Shavazi, M J, McDonald, P (2005). National and provincial level fertility trends in Iran, 1972–2006. Australian National University. Working Papers in Demography no. 94.
Calculate life table quantities. Function
lifetab() returns an entire life table.
Function lifeexp() returns life expectancy at birth.
The inputs can be mortality rates (mx) or
probabilities of dying (qx), though not both.
lifetab( data, mx = NULL, qx = NULL, age = age, sex = NULL, ax = NULL, by = NULL, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", radix = 1e+05, suffix = NULL, n_core = 1 ) lifeexp( data, mx = NULL, qx = NULL, at = 0, age = age, sex = NULL, ax = NULL, by = NULL, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", suffix = NULL, n_core = 1 )lifetab( data, mx = NULL, qx = NULL, age = age, sex = NULL, ax = NULL, by = NULL, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", radix = 1e+05, suffix = NULL, n_core = 1 ) lifeexp( data, mx = NULL, qx = NULL, at = 0, age = age, sex = NULL, ax = NULL, by = NULL, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", suffix = NULL, n_core = 1 )
data |
Data frame with mortality data. |
mx |
< |
qx |
< |
age |
< |
sex |
< |
ax |
< |
by |
< |
infant |
Method used to calculate
life table values in age group |
child |
Method used to calculate
life table values in age group |
closed |
Method used to calculate
life table values in closed age intervals
other than |
open |
Method used to calculate
life table values in the final, open age group
(eg |
radix |
Initial population for the
|
suffix |
Optional suffix added to new columns in result. |
n_core |
Number of cores to use for parallel
processing. If |
at |
Age at which life expectancy is calculated
( |
A tibble.
mx Deaths per person-year lived.
qx Probability of surviving from the start
of age group 'x' to the end.
lx Number of people alive at
the start of age group x.
dx Number of deaths in age group x
Lx Expected number of person years lived in
age group x.
ex Life expectancy, calculated at the start
of age group x.
Mortality rates mx are sometimes expressed
as deaths per 1000 person-years lived, or per 100,000
person-years lived. lifetab() and lifeexp()
assumed that they are expressed as deaths per
person-year lived.
lifetab() and lifeexp() implement several
methods for calculating life table quantities
from mortality rates. Each method makes
different assumptions about
the way that mortality rates vary within
age intervals:
"constant" Mortality rates are constant
within each interval.
"linear". Life table quantity lx
is a straight line within each interval.
Equivalently, deaths are distributed uniformly
within each interval.
"CD". Used only with age groups "0"
and "1-4". Mortality rates decline
over the age interval,
with the slope depending on the absolute
level of infant mortality. The formulas were
developed by Coale and Demeny (1983),
and used in Preston et al (2001).
"AK". Used only with age group "0".
Mortality rates decline over the age interval,
with the slope depending on the absolute
level of infant mortality. The formulas
were formulas developed by Andreev and Kingkade (2015),
and are used in the Human Mortality Database
methods protocol.
For a detailed description of the methods, see the vignette for poputils.
ax is the average number of years
lived in an age interval by people who
die in that interval. Demographers sometimes
refer to it as the 'separation factor'. If a non-NA
value of ax is supplied for an age group,
then the results for that age group are based
on the formula
,
(where n_x is the width of the age interval),
over-riding any methods specified via the infant, child,
closed and open arguments.
The probability of dying, qx, is always 1 in the
final (open) age group. qx therefore provides
no direct information on mortality conditions
within the final age group. lifetab() and
lifeexp() use conditions in the second-to-final
age group as a proxy for conditions in the final
age group. When open is "constant" (which
is currently the only option), and no value
for ax in the final age group is provided,
lifetab() and lifeexp() assume
that , and set
.
In practice, mortality is likely to be higher
in the final age group than in the second-to-final
age group, so the default procedure is likely to
lead to inaccuracies. When the size of the final
age group is very small, these inaccuracies will
be inconsequential. But in other cases, it may
be necessary to supply an explicit value for
ax for the final age group, or to use mx
rather than qx as inputs.
An rvec is a 'random vector',
holding multiple draws from a distribution.
Using an rvec for the mx argument to
lifetab() or lifeexp() is a way of representing
uncertainty. This uncertainty is propagated
through to the life table values, which will
also be rvecs.
Calculations can be slow when working
with rvecs and many combinations of 'by' variables.
In these cases, setting n_core to a number
greater than 1, which triggers parallel processing,
may help.
Preston SH, Heuveline P, and Guillot M. 2001. Demography: Measuring and Modeling Population Processes Oxford: Blackwell.
Coale AJ, Demeny P, and Vaughn B. 1983. Regional model life tables and stable populations New York: Academic Press.
Andreev, E.M. and Kingkade, W.W., 2015. Average age at death in infancy and infant mortality level: Reconsidering the Coale-Demeny formulas at current levels of low mortality. Demographic Research, 33, pp.363-390.
Human Mortality Database Methods Protocol.
ex_to_lifetab_brass() Calculate life table from minimal inputs
q0_to_m0() Convert between infant mortality measures
tfr() Calculate total fertility rate
library(dplyr) ## life table for females based on 'level 1' ## mortality rates "West" model life table west_lifetab |> filter(sex == "Female", level == 1) |> lifetab(mx = mx) ## change method for infant and children from ## default ("constant") to "CD" west_lifetab |> filter(sex == "Female", level == 1) |> lifetab(mx = mx, sex = sex, infant = "CD", child = "CD") ## calculate life expectancies ## for all levels, using the 'by' ## argument to distinguish levels west_lifetab |> lifeexp(mx = mx, sex = sex, infant = "CD", child = "CD", by = level) ## obtain the same result using ## 'group_by' west_lifetab |> group_by(level) |> lifeexp(mx = mx, sex = sex, infant = "CD", child = "CD") ## calculations based on 'qx' west_lifetab |> lifeexp(qx = qx, sex = sex, by = level) ## life expectancy at age 60 west_lifetab |> filter(level == 10) |> lifeexp(mx = mx, at = 60, sex = sex) ## life expectancy at ages 0 and 60 west_lifetab |> filter(level == 10) |> lifeexp(mx = mx, at = c(0, 60), sex = sex)library(dplyr) ## life table for females based on 'level 1' ## mortality rates "West" model life table west_lifetab |> filter(sex == "Female", level == 1) |> lifetab(mx = mx) ## change method for infant and children from ## default ("constant") to "CD" west_lifetab |> filter(sex == "Female", level == 1) |> lifetab(mx = mx, sex = sex, infant = "CD", child = "CD") ## calculate life expectancies ## for all levels, using the 'by' ## argument to distinguish levels west_lifetab |> lifeexp(mx = mx, sex = sex, infant = "CD", child = "CD", by = level) ## obtain the same result using ## 'group_by' west_lifetab |> group_by(level) |> lifeexp(mx = mx, sex = sex, infant = "CD", child = "CD") ## calculations based on 'qx' west_lifetab |> lifeexp(qx = qx, sex = sex, by = level) ## life expectancy at age 60 west_lifetab |> filter(level == 10) |> lifeexp(mx = mx, at = 60, sex = sex) ## life expectancy at ages 0 and 60 west_lifetab |> filter(level == 10) |> lifeexp(mx = mx, at = c(0, 60), sex = sex)
Transform values to and from the logit scale.
logit() calculates
logit(p) invlogit(x)logit(p) invlogit(x)
p |
Values between 0 and 1. Can be an atomic vector, a matrix, or an rvec. |
x |
Values in the interval |
and invlogit() calculates
To avoid overflow, invlogit()
uses
internally for where .
In some of the demographic literature, the logit function is defined as
logit() and invlogit() follow the conventions
in statistics and machine learning, and omit the
.
A vector of doubles, if p or x is a vector.
A matrix of doubles, if p or x is a matrix.
An object of class rvec_dbl, if p or x is an rvec.
p <- c(0.5, 1, 0.2) logit(p) invlogit(logit(p))p <- c(0.5, 1, 0.2) logit(p) invlogit(logit(p))
Given a matrix, create a list, each element of which contains a column or row from the matrix.
matrix_to_list_of_cols(m) matrix_to_list_of_rows(m)matrix_to_list_of_cols(m) matrix_to_list_of_rows(m)
m |
A matrix |
matrix_to_list_of_cols() and 'matrix_to_list_of_rows() are
internal functions, for use by developers, and would not
normally be called directly by end users.
matrix_to_list_of_cols() A list of vectors,
each of which is a column from x.
matrix_to_list_of_rows(), A list of vectors,
each of which is a row from x.
m <- matrix(1:12, nrow = 3) matrix_to_list_of_cols(m) matrix_to_list_of_rows(m)m <- matrix(1:12, nrow = 3) matrix_to_list_of_cols(m) matrix_to_list_of_rows(m)
Counts of deaths and population, by age, sex, and calendar year, plus mortality rates, for New Zealand, 2021-2022.
nzl_mortnzl_mort
A data frame with 84 rows and the following variables:
year Calendar year.
gender "Female", and "Male".
age Age, in life table age groups, with an open age
group of 95+.
deaths Counts of deaths, randomly rounded to base 3.
popn Estimates of average annual population.
mx Mortality rates (deaths / popn).
Modified from data in tables "Deaths by age and sex (Annual-Dec)" and "Estimated Resident Population by Age and Sex (1991+) (Annual-Dec)" from Stats NZ online database Infoshare, downloaded on 24 September 2023.
A modified version of link{nzl_mort} where mx
columns is an rvec, rather than an ordinary
R vector. The rvec holds the random draws from the posterior
distribution obtained from by a Bayesian statistical model.
nzl_mort_rvecnzl_mort_rvec
An object of class tbl_df (inherits from tbl, data.frame) with 84 rows and 4 columns.
Convert the probability of dying during infancy (q0) to the mortality rate for infancy (m0).
q0_to_m0( q0, sex = NULL, a0 = NULL, infant = c("constant", "linear", "CD", "AK") )q0_to_m0( q0, sex = NULL, a0 = NULL, infant = c("constant", "linear", "CD", "AK") )
q0 |
Probability of dying in first year of life. A numeric vector or an rvec. |
sex |
Biological sex. A vector the same length
as |
a0 |
Average age at death for infants who die.
Optional. See help for |
infant |
Calculation method.
See help for |
A numeric vector or rvec.
The term "infant mortality rate" is ambiguous. Demographers sometimes use it to refer to m0 (which is an actual rate) and sometimes use it to refer to q0 (which is a probability.)
lifetab() Calculate a full life table.
library(dplyr, warn.conflicts = FALSE) west_lifetab |> filter(age == 0, level <= 5) |> select(level, sex, age, mx, qx) |> mutate(m0 = q0_to_m0(q0 = qx, sex = sex, infant = "CD"))library(dplyr, warn.conflicts = FALSE) west_lifetab |> filter(age == 0, level <= 5) |> select(level, sex, age, mx, qx) |> mutate(m0 = q0_to_m0(q0 = qx, sex = sex, infant = "CD"))
Convert age group labels to one of three formats:
Single-year age groups, eg
"0", "1", ..., "99", "100+".
Life table age groups, eg
"0", "1-4", "5-9", ..., "95-99", "100+"'.
Five-year age groups, eg
"0-4", "5-9", ..., "95-99", "100+".
By default reformat_age() returns a factor
that includes all intermediate age groups.
See below for examples.
reformat_age(x, factor = TRUE)reformat_age(x, factor = TRUE)
x |
A vector. |
factor |
Whether the return value should be a factor. |
reformat_age() applies the following algorithm:
Tidy and translate text,
eg convert "20 to 24 years" to
"20-24", convert "infant" to
"0", or convert "100 or more" to
"100+".
Check whether the resulting
labels could have been produced by
age_labels(). If not, throw an error.
If factor is TRUE
(the default), then return a factor. The levels of
this factor include all intermediate age groups.
Otherwise return a character vector.
When x consists entirely of numbers, reformat_age()
also checks for two special cases:
If every element of x is a multiple of 5,
and if max(x) >= 50, then x is assumed to
describe 5-year age groups
If every element of x is 0, 1, or a multiple
of 5, with max(x) >= 50, then x is assumed
to describe life table age groups.
If factor is TRUE,
then reformat_age() returns a factor;
otherwise it returns a character vector.
reformat_age(c("80 to 84", "90 or more", "85 to 89")) ## factor contains intermediate level missing from 'x' reformat_age(c("80 to 84", "90 or more")) ## non-factor reformat_age(c("80 to 84", "90 or more"), factor = FALSE) ## single reformat_age(c("80", "90plus")) ## life table reformat_age(c("0", "30-34", "10--14", "1-4 years"))reformat_age(c("80 to 84", "90 or more", "85 to 89")) ## factor contains intermediate level missing from 'x' reformat_age(c("80 to 84", "90 or more")) ## non-factor reformat_age(c("80 to 84", "90 or more"), factor = FALSE) ## single reformat_age(c("80", "90plus")) ## life table reformat_age(c("0", "30-34", "10--14", "1-4 years"))
Reformat a binary sex variable so
that it consists entirely of
values "Female", "Male",
and possibly NA and any values
included in except.
reformat_sex(x, except = NULL, factor = TRUE)reformat_sex(x, except = NULL, factor = TRUE)
x |
A vector. |
except |
Values to exclude when reformatting. |
factor |
Whether the return value should be a factor. |
When parsing labels, reformat_sex()
ignores case: "FEMALE" and "fEmAlE"
are equivalent.
White space is removed from the beginning and end of labels.
reformat_sex() does not try to interpreting
numeric codes (eg 1, 2).
If factor is TRUE,
then reformat_age() returns a factor;
otherwise it returns a character vector.
reformat_sex(c("F", "female", NA, "MALES")) ## values supplied for 'except' reformat_sex(c("Fem", "Other", "Male", "M"), except = c("Other", "Diverse")) ## return an ordinary character vector reformat_sex(c("F", "female", NA, "MALES"), factor = FALSE)reformat_sex(c("F", "female", NA, "MALES")) ## values supplied for 'except' reformat_sex(c("Fem", "Other", "Male", "M"), except = c("Other", "Diverse")) ## return an ordinary character vector reformat_sex(c("F", "female", NA, "MALES"), factor = FALSE)
Apply the 'Random Round to Base 3' (RR3)
algorithm to a vector of integers
(or doubles where round(x) == x.)
rr3(x)rr3(x)
x |
A vector of integers (in the
sense that |
The RR3 algorithm is used by statistical
agencies to confidentialize data. Under the
RR3 algorithm, an integer
is randomly rounded as follows:
If is divisible by 3, leave it unchanged
If dividing by 3 leaves a remainder of 1, then
round down (subtract 1) with probability 2/3,
and round up (add 2) with probability 1/3.
If dividing by 3 leaves a remainder of 1,
then round down (subtract 2)
with probability 1/3, and round up (add 1)
with probability 2/3.
RR3 has some nice properties:
The randomly-rounded version of
has expected value .
If non-negative, then the randomly
rounded version of is non-negative.
If is non-positive, then the randomly
rounded version of is non-positive.
A randomly-rounded version of x.
x <- c(1, 5, 2, 0, -1, 3, NA) rr3(x)x <- c(1, 5, 2, 0, -1, 3, NA) rr3(x)
Set the lower limit of the open age group.
Given a vector of age group labels,
recode all age groups with a lower limit
greater than or equal to <lower> to <lower>+.
set_age_open(x, lower)set_age_open(x, lower)
x |
A vector of age labels. |
lower |
An integer. The lower limit for the open age group. |
set_age_open() requires that x and
the return value have a
a five-year, single-year, or life table format,
as described in age_labels().
A modified version of x.
set_age_open() uses age_lower() to identify
lower limits
age_labels() for creating age labels from scratch
x <- c("100+", "80-84", "95-99", "20-24") set_age_open(x, 90) set_age_open(x, 25)x <- c("100+", "80-84", "95-99", "20-24") set_age_open(x, 90) set_age_open(x, 25)
Calculate the total fertility rate (TFR) from age-specific fertility rates.
tfr( data, asfr = asfr, age = age, sex = NULL, by = NULL, denominator = 1, suffix = NULL )tfr( data, asfr = asfr, age = age, sex = NULL, by = NULL, denominator = 1, suffix = NULL )
data |
Data frame with age-specific fertility rates and age |
asfr |
< |
age |
< |
sex |
< |
by |
< |
denominator |
The denominator used to
calculate |
suffix |
Optional suffix added to |
The total fertility rate is a summary measures for current fertility levels that removes the effect of age structure. Is obtained by summing up age-specific fertility rates, multiplying each rate by the width of the corresponding age group. For instance, the rate for age group "15-19" is multiplied by 5, and the rate for age group "15" is multiplied by 1.
The total fertility rate can be interpreted as the number of average children that a person would have, under prevailing fertility rates, if the person survived to the maximum age of reproduction. The hypothetical person is normally a woman, since age-specific fertility rates normally use person-years lived by women as the denominator. But it can apply to men, if the age-specific fertility rates are "paternity rates", ie rates that use person-years lived by men as the denominator.
A tibble.
Age-specific fertility rates do not
normally specify the sex of the children
who are born. In cases where they do, however,
rates have to be summed across sexes to
give the total fertility rates. If tfr() is
supplied with a sex argument, it assumes that
sex applies to the births, and sums over the sexes.
Published tables of age-specific fertility rates often express the rates as births per 1000 person-years lived, rather than per person-year lived. (Sometimes this is expressed as "births per 1000 women".) In these cases
An rvec is a 'random vector',
holding multiple draws from a distribution.
Using an rvec for the asfr argument to
tfr() is a way of representing
uncertainty. This uncertainty is propagated
through to the TFR, which will
also be rvecs.
lifeexp() Calculate life expectancy from age-specific
mortality rates.
irn_fert |> tfr(asfr = rate, by = c(area, time), denominator = 1000)irn_fert |> tfr(asfr = rate, by = c(area, time), denominator = 1000)
Turn total fertility rates (TFRs) into sets of age-specific fertility rates, by scaling a set of standard rates upwards or downwards.
tfr_to_asfr_scale(target, standard, suffix = NULL)tfr_to_asfr_scale(target, standard, suffix = NULL)
target |
A data frame containing a variable called
|
standard |
A data frame containing variables
called |
suffix |
Optional suffix added to
|
A tibble.
Let be the age-specific fertility rate for people
aged between and .
Values for are obtained by
scaling the standard rates
so that they agree with the target
total fertility rate . That is,
tfr_to_asfr_scale() sets
where
target argumenttarget is a data frame specifying
total fertility rates for each population being modelled.
target contains the following variables:
A variable called "tfr". An ordinary
numeric vector or an rvec().
Optionally, 'by' variables. Typical examples are time, region, and model variant.
standard argumentstandard is a data frame specifying
standard fertility schedules to be used
with each life expectancy
in target. Values in standard are age-specific.
standard contains the following variables:
A variable called "age", with labels that
can be parsed by reformat_age().
A variable called "value", containing
non-negative values. Cannot be an rvec.
Additional variables used to match rows in standard
to rows in target.
ex_to_lifetab_brass() Life table equivalent
of tfr_to_asfr_scale().
booth_standard The 'Booth standard' fertility schedule
tfr Calculate total fertility rate from age-specific fertility rates
## create age-specific fertility rates ## based on the [Booth standard][booth_standard] library(dplyr, warn.conflicts = FALSE) target <- data.frame(region = c("A", "B"), tfr = c(5.5, 4.7)) asfr <- tfr_to_asfr_scale(target = target, standard = booth_standard) asfr ## check consistency with original TFRs asfr |> tfr(asfr = asfr, by = region) ## target is an rvec library(rvec, warn.conflicts = FALSE) target_rvec <- data.frame(region = c("A", "B"), tfr = rnorm_rvec(n = 2, mean = c(5.5, 4.7), n_draw = 1000)) tfr_to_asfr_scale(target = target_rvec, standard = booth_standard)## create age-specific fertility rates ## based on the [Booth standard][booth_standard] library(dplyr, warn.conflicts = FALSE) target <- data.frame(region = c("A", "B"), tfr = c(5.5, 4.7)) asfr <- tfr_to_asfr_scale(target = target, standard = booth_standard) asfr ## check consistency with original TFRs asfr |> tfr(asfr = asfr, by = region) ## target is an rvec library(rvec, warn.conflicts = FALSE) target_rvec <- data.frame(region = c("A", "B"), tfr = rnorm_rvec(n = 2, mean = c(5.5, 4.7), n_draw = 1000)) tfr_to_asfr_scale(target = target_rvec, standard = booth_standard)
Build a matrix where the elements are values of
a measure variable, and the rows and columns
are formed by observed combinations of ID
variables. The ID variables picked
out by rows and cols must uniquely identify
cells. to_matrix(), unlike stats::xtabs(),
does not sum across multiple combinations of
ID variables.
to_matrix(x, rows, cols, measure)to_matrix(x, rows, cols, measure)
x |
A data frame. |
rows |
The ID variable(s) used to distinguish rows in the matrix. |
cols |
The ID variable(s) used to distinguish columns in the matrix. |
measure |
The measure variable, eg rates or counts. |
A matrix
x <- expand.grid(age = c(0, 1, 2), sex = c("F", "M"), region = c("A", "B"), year = 2000:2001) x$count <- 1:24 to_matrix(x, rows = c(age, sex), cols = c(region, year), measure = count) to_matrix(x, rows = c(age, sex, region), cols = year, measure = count) ## cells not uniquely identified try( to_matrix(x, rows = age, cols = sex, measure = count) )x <- expand.grid(age = c(0, 1, 2), sex = c("F", "M"), region = c("A", "B"), year = 2000:2001) x$count <- 1:24 to_matrix(x, rows = c(age, sex), cols = c(region, year), measure = count) to_matrix(x, rows = c(age, sex, region), cols = year, measure = count) ## cells not uniquely identified try( to_matrix(x, rows = age, cols = sex, measure = count) )
Trim a vector so that all values are greater than 0 and less than 1.
trim_01(x)trim_01(x)
x |
A numeric vector. Can be an rvec. |
If
min is lowest element of x
that is higher than 0, and
max is the highest element of x that is
lower than 1,
then trim_01()
shifts all elements of x that are lower than min
upwards, so that they equal min, and
shifts all elements of x that are higher than max
downwards, so that they equal max.
A trimmed version of x
logit(), invlogit() Logit transformation
x <- c(1, 0.98, -0.001, 0.5, 0.01) trim_01(x)x <- c(1, 0.98, -0.001, 0.5, 0.01) trim_01(x)
Life table quantities from the "West" family of Coale-Demeny model life tables.
west_lifetabwest_lifetab
A data frame with 1,050 rows and the following variables:
level Index for life table. Lower level implies
lower life expectancy.
sex "Female", and "Male".
age Age, in life table age groups, with an open age
group of 95+.
mx Mortality rate.
ax Average years lived in age interval by people
who die in that interval.
qx Probability some alive at start of age interval
dies during interval.
lx Number of people still alive at start of
age interval.
dx Number of people dying during age interval.
Lx Number of person-years lived during age interval.
ex Expectation of life at start of age interval.
Coale A, Demeny P, and Vaughn B. 1983.
Regional model life tables and stable populations.
2nd ed. New York: Academic Press,
accessed via demogR::cdmltw().