Package 'poputils' reference manual

Title:	Demographic Analysis and Data Manipulation
Description:	Perform tasks commonly encountered when preparing and analysing demographic data. Some functions are intended for end users, and others for developers. Includes functions for working with life tables.
Authors:	John Bryant [aut, cre], Bayesian Demography Limited [cph]
Maintainer:	John Bryant <[email protected]>
License:	MIT + file LICENSE
Version:	0.3.4
Built:	2025-02-18 06:13:01 UTC
Source:	https://github.com/bayesiandemography/poputils

Infer Age Label Type

Description

Determine whether a set of age labels refers to one-year, five-year, or life-table age groups.

Usage

age_group_type(x)

Arguments

`x`	A vector of age labels

Details

The valid types of age labels are:

"single". One-year age groups, eg "0" or "55", and possibly an open age group, eg "90+".
"five". Five-year age groups, eg "0-4" or "55-59", and possibly an open age group, eg "100+".
"lt". Life table age groups, eg "0", "1-4", "5-9", "55-59", or "80+".

If x does not fit any of these descriptions, then age_group_type() throws an error.

If x could belong to more than one type, then age_group_type() prefers "single" to "five" and "lt", and prefers "five" to "lt".

Value

"single", "five", or "lt".

Examples

age_group_type(c("5-9", "0-4", "100+"))
age_group_type(c("2", "5", "1"))
age_group_type(c("0", "1-4"))

## could be "single" or "lt"
age_group_type("0")

## could be "five" or "lt"
age_group_type("80-84")

Create Age Labels

Description

Create labels for age groups. The labels depend on the type argument:

"single". One-year age groups, eg "0" or "55", and possibly an open age group, eg "90+".
"five". Five-year age groups, eg "0-4" or "55-59", and possibly an open age group, eg "100+".
"lt". Life table age groups, eg "0", "1-4", "5-9", "55-59", or "80+".

Usage

age_labels(type, min = 0, max = 100, open = NULL)

Arguments

`type`	Type of age group labels: `"single"`, `"five"`, or `"lt"`.
`min`	Minimum age. Defaults to 0.
`max`	Maximum age for closed age groups. Defaults to 100.
`open`	Whether the last age group is "open", ie has no upper limit.

Details

The first age group starts at the age specified by min. If open is TRUE, then the final age group starts at the age specified by max. Otherwise, the final age group ends at the age specified by max.

open defaults to TRUE when min equals zero, and to FALSE otherwise.
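
The rules for min, max, and open can be illustrated with a base R sketch for the "five" case. The helper five_year_labels() is hypothetical and is not part of poputils; it assumes that min and max are multiples of 5 with max at least 5 greater than min.

```r
## Hypothetical helper (not part of poputils): five-year labels
## built from 'min', 'max', and 'open', following the rules above.
five_year_labels <- function(min = 0, max = 100, open = NULL) {
  if (is.null(open))
    open <- (min == 0)                     # default: open only when 'min' is 0
  lower <- seq(from = min, to = max - 5, by = 5)
  labels <- paste(lower, lower + 4, sep = "-")
  if (open)
    labels <- c(labels, paste0(max, "+"))  # final group starts at 'max'
  labels
}

five_year_labels(min = 0, max = 20)   # closed groups up to "15-19", then "20+"
five_year_labels(min = 15, max = 30)  # "15-19" "20-24" "25-29"
```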

Value

A character vector.

Examples

age_labels(type = "single", min = 15, max = 40)
age_labels(type = "five")
age_labels(type = "lt", max = 80)

Lower Limits, Midpoints, and Upper Limits of Age Groups

Description

Given a vector x of age group labels, return a numeric vector.

age_lower() returns the lower limits of each age group,
age_mid() returns the midpoints, and
age_upper() returns the upper limits.

Vector x must describe 1-year, 5-year or life-table age groups: see age_labels() for examples. x can format these age groups in any way understood by reformat_age().

Usage

age_lower(x)

age_mid(x)

age_upper(x)

Arguments

`x`	A vector of age group labels.

Details

These functions can make age groups easier to work with. Lower and upper limits can be used for selecting on age. Replacing age groups with midpoints can improve graphs.
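
The idea of extracting limits from labels can be sketched in base R. The helpers lower_limit() and upper_limit() are hypothetical, not the package's implementation, and they only handle labels already in the formats "15-19", "7", and "50+". The sketch assumes that a closed group ends at the next whole age and that an open group has an upper limit of Inf.

```r
## Hypothetical helpers (not part of poputils)
lower_limit <- function(x) {
  ## leading digits give the lower limit
  as.numeric(sub("^([0-9]+).*$", "\\1", x))
}

upper_limit <- function(x) {
  is_open <- grepl("\\+$", x)
  is_range <- grepl("-", x)
  upper <- lower_limit(x) + 1                       # single year, eg "7"
  upper[is_range] <- as.numeric(sub("^.*-", "", x[is_range])) + 1
  upper[is_open] <- Inf                             # no upper limit
  upper
}

x <- c("15-19", "5-9", "50+")
lower_limit(x)   # 15 5 50
upper_limit(x)   # 20 10 Inf
```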

Value

A numeric vector, the same length as x.

Examples

x <- c("15-19", "5-9", "50+")
age_lower(x)
age_mid(x)
age_upper(x)

## non-standard formats are OK
age_lower(c("infants", "100 and over"))

df <- data.frame(age = c("1-4", "10-14", "5-9", "0"),
                 rate = c(0.023, 0.015, 0.007, 0.068))
df
subset(df, age_lower(age) >= 5)

Validity Checks for Age Labels

Description

Check that age labels can be parsed and, optionally, whether the labels are complete, unique, start at zero, and end with an open age group.

Usage

check_age(x, complete = FALSE, unique = FALSE, zero = FALSE, open = FALSE)

Arguments

`x`	A vector of age labels.
`complete`	If `TRUE`, test whether `x` has gaps.
`unique`	If `TRUE`, test whether `x` has duplicates.
`zero`	If `TRUE`, test whether youngest age group in `x` starts at 0.
`open`	If `TRUE`, test whether oldest age group in `x` is open.

Details

By default, check_age() only tests whether a set of labels can be parsed as single-year, five-year, or life table age groups. (See age_group_type() for more on the three types of age group.) However, it can also apply the following tests:

complete. Whether x includes all intermediate age groups, with no gaps. For instance, the labels c("10-14", "15-19", "5-9") are complete, while the labels c("15-19", "5-9") are not (because they are missing "10-14").
unique. Whether x is free of duplicated labels.
zero. Whether the youngest age group in x starts at age 0, ie whether x includes "0" or "0-4".
open. Whether the oldest age group in x is an "open" age group, such as "100+" or "65+", that has no upper limit.
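
The completeness test can be sketched in base R for five-year labels. The helper is_complete_five() is hypothetical and returns TRUE or FALSE, whereas check_age() throws an error when a test fails.

```r
## Hypothetical helper (not part of poputils): do five-year labels
## cover every group between the youngest and the oldest?
is_complete_five <- function(x) {
  lower <- as.numeric(sub("^([0-9]+).*$", "\\1", x))
  expected <- seq(from = min(lower), to = max(lower), by = 5)
  all(expected %in% lower)
}

is_complete_five(c("10-14", "15-19", "5-9"))  # TRUE
is_complete_five(c("15-19", "5-9"))           # FALSE, "10-14" is missing
```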

Value

If all tests pass, check_age() returns TRUE invisibly. Otherwise it throws an error.

Examples

try(
  check_age(c("10-14", "0-4", "15+"),
            complete = TRUE)  
)

try(
  check_age(c("10-14", "5-9", "0-4", "5-9", "15+"),
            unique = TRUE)
)

try(
  check_age(c("10-14", "5-9", "15+"),
            zero = TRUE)
)

try(
  check_age(c("10-14", "0-4", "5-9"),
            open = TRUE)
)

Check that Arguments have Same Length

Description

Check that x and y have the same length.

Usage

check_equal_length(x, y, nm_x, nm_y)

Arguments

`x`, `y`	Arguments to compare
`nm_x`, `nm_y`	Names to use in error message

Value

TRUE, invisibly.

Examples

x <- 1:3
y <- 3:1
check_equal_length(x = x,
                   y = y,
                   nm_x = "x",
                   nm_y = "y")

Check Whole Number

Description

Check that n is a finite, non-NA scalar that is an integer or integerish (ie is equal to round(n)), and optionally that it is within a specified range and divisible by a specified number.

Usage

check_n(n, nm_n, min, max, divisible_by)

Arguments

`n`	A whole number
`nm_n`	Name for 'n' to be used in error messages
`min`	Minimum value 'n' can take. Can be NULL.
`max`	Maximum value 'n' can take. Can be NULL.
`divisible_by`	'n' must be divisible by this. Can be NULL.

Value

If all tests pass, check_n() returns TRUE invisibly. Otherwise it throws an error.
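
The tests applied to n can be sketched in base R. The helper is_valid_n() is hypothetical; unlike check_n(), it returns TRUE or FALSE rather than throwing an error, and it omits the nm_n argument.

```r
## Hypothetical helper (not part of poputils) mirroring the
## tests described above.
is_valid_n <- function(n, min = NULL, max = NULL, divisible_by = NULL) {
  ok <- is.numeric(n) && length(n) == 1L && is.finite(n) && n == round(n)
  if (ok && !is.null(min)) ok <- n >= min
  if (ok && !is.null(max)) ok <- n <= max
  if (ok && !is.null(divisible_by)) ok <- n %% divisible_by == 0
  ok
}

is_valid_n(10, min = 5, max = 10, divisible_by = 2)  # TRUE
is_valid_n(10.5)                                     # FALSE, not a whole number
is_valid_n(10, divisible_by = 3)                     # FALSE
```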

Examples

check_n(10, nm_n = "count", min = 0, max = NULL, divisible_by = 1)
check_n(10, nm_n = "count", min = NULL, max = NULL, divisible_by = NULL)
check_n(10, nm_n = "n", min = 5, max = 10, divisible_by = 2)

Check that Colnum Vectors do not Overlap

Description

Given a named list of colnum vectors, like those produced by tidyselect::eval_select(), throw an error if there is an overlap.

Usage

check_no_overlap_colnums(x)

Arguments

`x`	A named list of integer vectors.

Value

TRUE, invisibly
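
What counts as an overlap can be sketched in base R: a column number that appears in more than one element of the list. The code below is an illustration only, not the package's implementation; it finds the offending column numbers instead of throwing an error.

```r
## Illustrative sketch: column 1 is claimed by both 'arg1' and 'arg2'
x <- list(arg1 = c(age = 1L),
          arg2 = c(gender = 4L, age = 1L))
all_colnums <- unlist(x, use.names = FALSE)
overlap <- unique(all_colnums[duplicated(all_colnums)])
overlap               # 1
length(overlap) > 0   # TRUE, so an error would be appropriate here
```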

Examples

x <- list(arg1 = c(age = 1L),
          arg2 = c(gender = 4L, region = 5L))
check_no_overlap_colnums(x)

Aggregate Age Group Labels

Description

Convert age group labels to a less detailed classification. The three classifications recognized by combine_age() are "single", "five", and "lt", as defined in the help for age_labels(). The following conversions are permitted:

"single" —> "lt"
"single" —> "five"
"lt" —> "five"

Usage

combine_age(x, to = c("five", "lt"))

Arguments

`x`	A vector of age labels
`to`	Type of age classification to convert to: `"five"` or `"lt"`. Defaults to `"five"`.

Value

If x is a factor, then combine_age() returns a factor; otherwise it returns a character vector.
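
The "single" to "five" conversion can be sketched in base R. The helper single_to_five() is hypothetical and handles closed single-year labels only; it ignores open age groups and factors, which combine_age() does handle.

```r
## Hypothetical helper (not part of poputils): map single-year
## ages to the five-year group that contains them.
single_to_five <- function(x) {
  age <- as.integer(x)
  lower <- 5L * (age %/% 5L)   # round down to a multiple of 5
  paste(lower, lower + 4L, sep = "-")
}

single_to_five(c("0", "5", "3", "12"))
## "0-4" "5-9" "0-4" "10-14"
```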

Examples

x <- c("0", "5", "3", "12")
combine_age(x)
combine_age(x, to = "lt")

Derive Life Tables that Match Life Expectancies, using a Brass Logit Model

Description

Turn life expectancies at birth into full life tables, using the Brass logit model. The method is simple and is designed for simulations or for settings with little or no data on age-specific mortality rates. In settings where data on age-specific mortality is available, other methods might be more appropriate.

Usage

ex_to_lifetab_brass(
  target,
  standard,
  infant = c("constant", "linear", "CD", "AK"),
  child = c("constant", "linear", "CD"),
  closed = c("constant", "linear"),
  open = "constant",
  radix = 1e+05,
  suffix = NULL
)

Arguments

`target`	A data frame containing a variable called `"ex"`, and possibly others. See Details.
`standard`	A data frame containing variables called `age` and `lx`, and possibly others. See Details.
`infant`, `child`, `closed`, `open`	Methods used to calculate life expectancy. See `lifetab()` for details.
`radix`	Initial population for the `lx` column in the derived life table(s). Default is `100000`.
`suffix`	Optional suffix added to life table columns.

Value

A data frame containing one or more life tables.

Method

The method implemented by ex_to_lifetab_brass() is based on the observation that, if populations A and B are demographically similar, then, in many cases,

$\text{logit}(l_x^{\text{B}}) \approx \alpha + \beta \text{logit}(l_x^{\text{A}})$

where $l_x$ is the "survivorship probability" quantity from a life table. When populations are similar, $\beta$ is often close to 1.

Given (i) target life expectancy, (ii) a set of $l_x^{\text{A}}$ (referred to as a "standard"), and (iii) a value for $\beta$, ex_to_lifetab_brass() finds a value for $\alpha$ that yields a set of $l_x^{\text{B}}$ with the required life expectancy.
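
The search for $\alpha$ can be sketched in base R with uniroot(). Everything below is a toy illustration under stated assumptions, not the package's implementation: the standard is invented, life expectancy is approximated with trapezoids over ten-year intervals, survivorship is assumed to reach 0 at age 100, and the helper names are hypothetical.

```r
## Local definitions for the sketch
logit_p <- function(p) log(p / (1 - p))
invlogit_x <- function(x) 1 / (1 + exp(-x))

## Invented standard: survivorship at exact ages 0, 10, ..., 100
age <- seq(0, 100, by = 10)
lx_std <- c(1, 0.85, 0.82, 0.78, 0.72, 0.64, 0.52, 0.36, 0.18, 0.04, 0)

## Life expectancy at birth, approximating person-years with trapezoids
ex0 <- function(lx) sum(diff(age) * (head(lx, -1) + tail(lx, -1)) / 2)

## Brass transformation, applied where 0 < lx < 1
brass_lx <- function(alpha, beta = 1) {
  lx <- lx_std
  inner <- lx_std > 0 & lx_std < 1
  lx[inner] <- invlogit_x(alpha + beta * logit_p(lx_std[inner]))
  lx
}

## Find the alpha that yields the target life expectancy
target_ex <- 40
fit <- uniroot(function(alpha) ex0(brass_lx(alpha)) - target_ex,
               interval = c(-10, 10), tol = 1e-8)
alpha <- fit$root
ex0(brass_lx(alpha))   # matches target_ex up to numerical tolerance
```

Because higher values of $\alpha$ raise every interior $l_x$, life expectancy is increasing in $\alpha$, which is why a one-dimensional root finder is enough in this sketch.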

`target` argument

target is a data frame specifying life expectancies for each population being modelled, and, possibly, inputs to the calculations, and index variables. Values in target are not age-specific.

A variable called "ex", with life expectancy at birth, must be included in target.
A variable called "beta", with values for $\beta$, can be included in target. This variable can be an rvec. If no "beta" variable is included in target, then ex_to_lifetab_brass() assumes that $\beta \equiv 1$.
A variable called "sex". If the infant argument to ex_to_lifetab_brass() is "CD" or "AK", or if the child argument is "CD", target must include a "sex" variable, and the labels for this variable must be interpretable by reformat_sex(). Otherwise, the "sex" variable is optional, and there is no restriction on labels.
Other variables used to distinguish between life expectancies, such as time, region, or model variant.

`standard` argument

standard is a data frame specifying the $l_x$ to be used with each life expectancy in ex, and, optionally, values for the average number of years lived in each age group by people who die in that group, $_na_x$. Values in standard are age-specific.

A variable called "age", with labels that can be parsed by reformat_age().
A variable called "lx". Internally, each set of $l_x$ is standardized so that the value for age 0 equals 1. Within each set, values must be non-increasing. Cannot be an rvec.
Additional variables used to match rows in standard to rows in target.

Internally, standard is merged with target using a left join from target, on any variables that target and standard have in common.

References

Brass W, Coale AJ. 1968. “Methods of analysis and estimation,” in Brass, W, Coale AJ, Demeny P, Heisel DF, et al. (eds). The Demography of Tropical Africa. Princeton NJ: Princeton University Press, pp. 88–139.

Moultrie TA, Timæus IM. 2013. Introduction to Model Life Tables. In Moultrie T, Dorrington R, Hill A, Hill K, Timæus I, Zaba B. (eds). Tools for Demographic Estimation. Paris: International Union for the Scientific Study of Population. online version.

Examples

## create new life tables based on level-1
## 'West' model life tables, but with lower
## life expectancy

library(dplyr, warn.conflicts = FALSE)

target <- data.frame(sex = c("Female", "Male"), 
                     ex = c(17.5, 15.6))

standard <- west_lifetab |>
    filter(level == 1) |>
    select(sex, age, lx)
    
ex_to_lifetab_brass(target = target,
                    standard = standard,
                    infant = "CD",
                    child = "CD")

Identify Sex or Gender Labels Referring to Females

Description

Given labels for sex or gender, try to infer which (if any) refer to females. If no elements look like a label for females, or if two or more elements do, then return NULL.

Usage

find_label_female(nms)

Arguments

`nms`	A character vector

Value

An element of nms or NULL.

Examples

find_label_female(c("Female", "Male")) ## one valid
find_label_female(c("0-4", "5-9"))     ## none valid
find_label_female(c("F", "Fem"))       ## two valid

Identify Sex or Gender Labels Referring to Males

Description

Given labels for sex or gender, try to infer which (if any) refer to males. If no elements look like a label for males, or if two or more elements do, then return NULL.

Usage

find_label_male(nms)

Arguments

`nms`	A character vector

Value

An element of nms or NULL.

Examples

find_label_male(c("Female", "Male")) ## one valid
find_label_male(c("0-4", "5-9"))     ## none valid
find_label_male(c("male", "m"))      ## two valid

Identify an Age Variable

Description

Find the element of nms that looks like an age variable. If no elements look like an age variable, or if two or more elements do, then return NULL.

Usage

find_var_age(nms)

Arguments

`nms`	A character vector

Value

An element of nms, or NULL.

Examples

find_var_age(c("Sex", "Year", "AgeGroup", NA)) ## one valid
find_var_age(c("Sex", "Year"))                 ## none valid
find_var_age(c("age", "age.years"))            ## two valid

Identify a Sex or Gender Variable

Description

Find the element of nms that looks like a sex or gender variable. If no elements look like a sex or gender variable, or if two or more elements do, then return NULL.

Usage

find_var_sexgender(nms)

Arguments

`nms`	A character vector

Value

An element of nms, or NULL.

Examples

find_var_sexgender(c("Sex", "Year", "AgeGroup", NA)) ## one valid
find_var_sexgender(c("Age", "Region"))               ## none valid
find_var_sexgender(c("sexgender", "sexes"))          ## two valid

Identify a Time Variable

Description

Find the element of nms that looks like a time variable. If no elements look like a time variable, or if two or more elements do, then return NULL.

Usage

find_var_time(nms)

Arguments

`nms`	A character vector

Value

An element of nms, or NULL.

Examples

find_var_time(c("Sex", "Year", "AgeGroup", NA)) ## one valid
find_var_time(c("Sex", "Region"))               ## none valid
find_var_time(c("time", "year"))                ## two valid

Get a named vector of column indices for the grouping variables in a grouped data frame

Description

Construct a named vector of indices equivalent to the vectors produced by tidyselect::eval_select(), but for the grouping variables in an object of class "grouped_df".

Usage

groups_colnums(data)

Arguments

`data`	A data frame.

Details

If data is not grouped, then groups_colnums() returns a zero-length vector.

Value

A named integer vector.

Examples

library(dplyr)
df <- data.frame(x = 1:4,
                 g = c(1, 1, 2, 2))
groups_colnums(df)
df <- group_by(df, g)
groups_colnums(df)

Calculate Life Tables or Life Expectancies

Description

Calculate life table quantities. Function lifetab() returns an entire life table. Function lifeexp() returns life expectancy at birth. The inputs can be mortality rates (mx) or probabilities of dying (qx), though not both.

Usage

lifetab(
  data,
  mx = NULL,
  qx = NULL,
  age = age,
  sex = NULL,
  ax = NULL,
  by = NULL,
  infant = c("constant", "linear", "CD", "AK"),
  child = c("constant", "linear", "CD"),
  closed = c("constant", "linear"),
  open = "constant",
  radix = 1e+05,
  suffix = NULL
)

lifeexp(
  data,
  mx = NULL,
  qx = NULL,
  at = 0,
  age = age,
  sex = NULL,
  ax = NULL,
  by = NULL,
  infant = c("constant", "linear", "CD", "AK"),
  child = c("constant", "linear", "CD"),
  closed = c("constant", "linear"),
  open = "constant",
  suffix = NULL
)

Arguments

`data`	Data frame with mortality data.
`mx`	<`tidyselect`> Mortality rates, expressed as deaths per person-year lived. Possibly an rvec.
`qx`	<`tidyselect`> Probability of dying within age interval. An alternative to `mx`. Possibly an rvec.
`age`	<`tidyselect`> Age group labels. The labels must be interpretable by functions such as `reformat_age()` and `age_group_type()`. The first age group must start at age 0, and the last age group must be "open", with no upper limit.
`sex`	<`tidyselect`> Biological sex, with labels that can be interpreted by `reformat_sex()`. Needed only when `infant` is `"CD"` or `"AK"`, or `child` is `"CD"`.
`ax`	<`tidyselect`> Average age at death within age group. Optional. See Details.
`by`	<`tidyselect`> Separate life tables, or life expectancies, calculated for each combination of the `by` variables. If a `sex` variable was specified, then that variable is automatically included among the `by` variables. If `data` is a grouped data frame, then the grouping variables take precedence over `by`.
`infant`	Method used to calculate life table values in age group `"0"`. Ignored if `age` does not include age group `"0"`. Default is `"constant"`.
`child`	Method used to calculate life table values in age group `"1-4"`. Ignored if `age` does not include age group `"0"`. Default is `"constant"`.
`closed`	Method used to calculate life table values in closed age intervals other than `"0"` and `"1-4"` (ie intervals such as "10-14" or "12"). Default is `"constant"`.
`open`	Method used to calculate life table values in the final, open age group (eg `"80+"` or `"110+"`). Currently the only option is `"constant"`.
`radix`	Initial population for the `lx` column. Default is `100000`.
`suffix`	Optional suffix added to new columns in result.
`at`	Age at which life expectancy is calculated (`lifeexp()` only). Default is `0`.

Value

A tibble.

Definitions of life table quantities

mx Deaths per person-year lived.
qx Probability of dying between the start of age group x and the end, for people alive at the start.
lx Number of people alive at the start of age group x.
dx Number of deaths in age group x.
Lx Expected number of person years lived in age group x.
ex Life expectancy, calculated at the start of age group x.

Mortality rates mx are sometimes expressed as deaths per 1000 person-years lived, or per 100,000 person-years lived. lifetab() and lifeexp() assume that they are expressed as deaths per person-year lived.

Calculation methods

lifetab() and lifeexp() implement several methods for calculating life table quantities from mortality rates. Each method makes different assumptions about the way that mortality rates vary within age intervals:

"constant". Mortality rates are constant within each interval.
"linear". Life table quantity lx is a straight line within each interval. Equivalently, deaths are distributed uniformly within each interval.
"CD". Used only with age groups "0" and "1-4". Mortality rates decline over the age interval, with the slope depending on the absolute level of infant mortality. The formulas were developed by Coale and Demeny (1983), and used in Preston et al (2001).
"AK". Used only with age group "0". Mortality rates decline over the age interval, with the slope depending on the absolute level of infant mortality. The formulas were developed by Andreev and Kingkade (2015), and are used in the Human Mortality Database methods protocol.

For a detailed description of the methods, see the vignette for poputils.
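
The difference between the "constant" and "linear" assumptions can be illustrated with the standard textbook conversions from a mortality rate to a probability of dying. This is a sketch with invented inputs; the formulas are the usual life table identities and are not taken from the package source, whose internals may differ in detail.

```r
n <- 5        # width of the age interval, in years
mx <- 0.02    # deaths per person-year lived (invented value)

## "constant": constant mortality rate within the interval
qx_constant <- 1 - exp(-n * mx)

## "linear": deaths spread uniformly over the interval
qx_linear <- n * mx / (1 + 0.5 * n * mx)

c(constant = qx_constant, linear = qx_linear)
```

With rates this low the two assumptions give almost the same answer (both roughly 0.095); the gap widens as n * mx grows.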

ax

ax is the average number of years lived in an age interval by people who die in that interval. Demographers sometimes refer to it as the 'separation factor'. If a non-NA value of ax is supplied for an age group, then the results for that age group are based on the formula

$m_x = d_x / (n_x (l_x - d_x) + a_x d_x)$

(where n_x is the width of the age interval), over-riding any methods specified via the infant, child, closed and open arguments.

Open age group when inputs are qx

The probability of dying, qx, is always 1 in the final (open) age group. qx therefore provides no direct information on mortality conditions within the final age group. lifetab() and lifeexp() use conditions in the second-to-final age group as a proxy for conditions in the final age group. When open is "constant" (which is currently the only option), and no value for ax in the final age group is provided, lifetab() and lifeexp() assume that $m_A = m_{A-1}$ , and set $L_{A} = l_A / m_A$ .

In practice, mortality is likely to be higher in the final age group than in the second-to-final age group, so the default procedure is likely to lead to inaccuracies. When the size of the final age group is very small, these inaccuracies will be inconsequential. But in other cases, it may be necessary to supply an explicit value for ax for the final age group, or to use mx rather than qx as inputs.
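
The default treatment of the open age group when inputs are qx can be sketched with invented numbers. The variable names are hypothetical; the calculation follows the assumption $m_A = m_{A-1}$ and the formula $L_A = l_A / m_A$ given above.

```r
m_closed <- 0.25    # mortality rate in the second-to-final age group
l_open <- 2000      # people alive at the start of the open age group

m_open <- m_closed          # proxy: same rate as the previous age group
L_open <- l_open / m_open   # person-years lived in the open age group
e_open <- L_open / l_open   # life expectancy at the start of the open group

c(L_open = L_open, e_open = e_open)   # 8000 and 4
```

If true mortality in the open group were higher than 0.25, then L_open and e_open would both be overstated, which is the inaccuracy the text warns about.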

Using rvecs to represent uncertainty

An rvec is a 'random vector', holding multiple draws from a distribution. Using an rvec for the mx argument to lifetab() or lifeexp() is a way of representing uncertainty. This uncertainty is propagated through to the life table values, which will also be rvecs.

References

Preston SH, Heuveline P, and Guillot M. 2001. Demography: Measuring and Modeling Population Processes. Oxford: Blackwell.
Coale AJ, Demeny P, and Vaughn B. 1983. Regional model life tables and stable populations. New York: Academic Press.
Andreev, E.M. and Kingkade, W.W., 2015. Average age at death in infancy and infant mortality level: Reconsidering the Coale-Demeny formulas at current levels of low mortality. Demographic Research, 33, pp.363-390.
Human Mortality Database Methods Protocol.
Tools for Demographic Estimation.

Examples

library(dplyr)

## life table for females based on 'level 1'
## mortality rates "West" model life table
west_lifetab |>
    filter(sex == "Female",
           level == 1) |>
    lifetab(mx = mx)

## change method for infant and children from
## default ("constant") to "CD"
west_lifetab |>
    filter(sex == "Female",
           level == 1) |>
    lifetab(mx = mx,
            sex = sex,
            infant = "CD",
            child = "CD")

## calculate life expectancies
## for all levels, using the 'by'
## argument to distinguish levels
west_lifetab |>
    lifeexp(mx = mx,
            sex = sex,
            infant = "CD",
            child = "CD",
            by = level)

## obtain the same result using
## 'group_by'
west_lifetab |>
  group_by(level) |>
  lifeexp(mx = mx,
          sex = sex,
          infant = "CD",
          child = "CD")

## calculations based on 'qx'
west_lifetab |>
  lifeexp(qx = qx,
          sex = sex,
          by = level)

## life expectancy at age 60
west_lifetab |>
  filter(level == 10) |>
  lifeexp(mx = mx,
          at = 60,
          sex = sex)

Logit and Inverse-Logit Functions

Description

Transform values to and from the logit scale.

Usage

logit(p)

invlogit(x)

Arguments

`p`	Values in the interval `[0, 1]`. Can be an atomic vector, a matrix, or an rvec.
`x`	Values in the interval `(-Inf, Inf)`. Can be an atomic vector, a matrix, or an rvec.

Details

logit() calculates

$x = \log \left(\frac{p}{1 - p}\right)$

and invlogit() calculates

$p = \frac{e^x}{1 + e^x}$

To avoid overflow, invlogit() uses $p = \frac{1}{1 + e^{-x}}$ internally for $x$ where $x > 0$ .

In some of the demographic literature, the logit function is defined as

$x = \frac{1}{2} \log \left(\frac{p}{1 - p}\right).$

logit() and invlogit() follow the conventions in statistics and machine learning, and omit the $\frac{1}{2}$ .
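
The overflow issue can be seen in a base R sketch. The functions logit_sketch() and invlogit_sketch() are local re-implementations for illustration only, not the package code, and they handle plain numeric vectors only.

```r
## Local re-implementations (hypothetical names)
logit_sketch <- function(p) log(p / (1 - p))
invlogit_sketch <- function(x) {
  ifelse(x > 0,
         1 / (1 + exp(-x)),        # avoids overflow in exp(x) for large x
         exp(x) / (1 + exp(x)))
}

invlogit_sketch(c(-800, 0, 800))   # 0 0.5 1
exp(800) / (1 + exp(800))          # NaN, because exp(800) overflows to Inf

## the definition used in some demographic literature
half_logit <- function(p) 0.5 * logit_sketch(p)
half_logit(0.2) / logit_sketch(0.2)   # 0.5
```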

Value

A vector of doubles, if p or x is a vector.
A matrix of doubles, if p or x is a matrix.
An object of class rvec_dbl, if p or x is an rvec.

Examples

p <- c(0.5, 1, 0.2)
logit(p)
invlogit(logit(p))

Turn a Matrix Into a List of Columns or Rows

Description

Given a matrix, create a list, each element of which contains a column or row from the matrix.

Usage

matrix_to_list_of_cols(m)

matrix_to_list_of_rows(m)

Arguments

`m`	A matrix

Details

matrix_to_list_of_cols() and matrix_to_list_of_rows() are internal functions, for use by developers, and would not normally be called directly by end users.

Value

matrix_to_list_of_cols(): A list of vectors, each of which is a column from m.
matrix_to_list_of_rows(): A list of vectors, each of which is a row from m.
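
The same kind of result can be produced in base R with lapply(). This sketch is shown only to make the return values concrete; it is not the package's implementation, and details such as names on the result may differ.

```r
m <- matrix(1:12, nrow = 3)

## one list element per column, and one per row
cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])

length(cols)   # 4
cols[[1]]      # 1 2 3
rows[[1]]      # 1 4 7 10
```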

Examples

m <- matrix(1:12, nrow = 3)
matrix_to_list_of_cols(m)
matrix_to_list_of_rows(m)

Mortality Data for New Zealand

Description

Counts of deaths and population, by age, sex, and calendar year, plus mortality rates, for New Zealand, 2021-2022.

Usage

nzmort

Format

A data frame with 84 rows and the following variables:

year: Calendar year.
gender: "Female" and "Male".
age: Age, in life table age groups, with an open age group of 95+.
deaths: Counts of deaths, randomly rounded to base 3.
popn: Estimates of average annual population.
mx: Mortality rates (deaths / popn).

Source

Modified from data in tables "Deaths by age and sex (Annual-Dec)" and "Estimated Resident Population by Age and Sex (1991+) (Annual-Dec)" from Stats NZ online database Infoshare, downloaded on 24 September 2023.

Mortality Data and Probabilistic Rates for New Zealand

Description

A modified version of nzmort where the mx column is an rvec, rather than an ordinary R vector. The rvec holds random draws from the posterior distribution obtained from a Bayesian statistical model.

Usage

nzmort_rvec

Format

An object of class tbl_df (inherits from tbl, data.frame) with 84 rows and 4 columns.

Convert q0 to m0

Description

Convert the probability of dying during infancy (q0) to the mortality rate for infancy (m0).

Usage

q0_to_m0(
  q0,
  sex = NULL,
  a0 = NULL,
  infant = c("constant", "linear", "CD", "AK")
)

Arguments

`q0`	Probability of dying in first year of life. A numeric vector or an rvec.
`sex`	Biological sex. A vector the same length as `q0`, with labels that can be interpreted by `reformat_sex()`. Needed only when `infant` is `"CD"` or `"AK"`.
`a0`	Average age at death for infants who die. Optional. See help for `lifetab()`.
`infant`	Calculation method. See help for `lifetab()`. Default is `"constant"`.

Value

A numeric vector or rvec.

Warning

The term "infant mortality rate" is ambiguous. Demographers sometimes use it to refer to m0 (which is an actual rate) and sometimes use it to refer to q0 (which is a probability).
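
The distinction between q0 and m0 can be made concrete with standard life table identities for a one-year interval. This is a sketch with invented inputs, not the package's implementation; the "CD" and "AK" methods, which derive a0 from the level of infant mortality and sex, are not reproduced here.

```r
q0 <- 0.05   # probability of dying in the first year of life (invented)

## constant mortality rate within the first year
m0_constant <- -log(1 - q0)

## user-supplied a0: average age at death, in years, of infants who die
a0 <- 0.1    # invented value
m0_a0 <- q0 / (1 - (1 - a0) * q0)

c(q0 = q0, m0_constant = m0_constant, m0_a0 = m0_a0)
```

In both cases m0 is larger than q0, because infants who die contribute less than a full person-year to the denominator of the rate.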

Examples

library(dplyr, warn.conflicts = FALSE)
west_lifetab |>
 filter(age == 0, level <= 5) |>
 select(level, sex, age, mx, qx) |>
 mutate(m0 = q0_to_m0(q0 = qx, sex = sex, infant = "CD"))

Reformat Age Group Labels

Description

Convert age group labels to one of three formats:

Single-year age groups, eg "0", "1", ..., "99", "100+".
Life table age groups, eg "0", "1-4", "5-9", ..., "95-99", "100+".
Five-year age groups, eg "0-4", "5-9", ..., "95-99", "100+".

By default reformat_age() returns a factor that includes all intermediate age groups. See below for examples.

Usage

reformat_age(x, factor = TRUE)

Arguments

`x`	A vector.
`factor`	Whether the return value should be a factor.

Details

reformat_age() applies the following algorithm:

Tidy and translate text, eg convert "20 to 24 years" to "20-24", convert "infant" to "0", or convert "100 or more" to "100+".
Check whether the resulting labels could have been produced by age_labels(). If not, throw an error.
If factor is TRUE (the default), then return a factor. The levels of this factor include all intermediate age groups. Otherwise return a character vector.

When x consists entirely of numbers, reformat_age() also checks for two special cases:

If every element of x is a multiple of 5, and if max(x) >= 50, then x is assumed to describe five-year age groups.
If every element of x is 0, 1, or a multiple of 5, and if max(x) >= 50, then x is assumed to describe life table age groups.
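
The two special cases for numeric input can be sketched in base R as follows. The helper guess_numeric_age_type() is hypothetical and only mirrors the rules stated above; treating every other numeric input as single-year ages is an assumption made for illustration, not documented behaviour of reformat_age().

```r
## Hypothetical helper, not part of poputils.
## Applies the two special-case rules for purely numeric age values.
guess_numeric_age_type <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0L && max(x) >= 50) {
    is_mult_5 <- x %% 5 == 0
    ## rule 1: all multiples of 5 -> five-year age groups
    if (all(is_mult_5))
      return("five")
    ## rule 2: 0, 1, or multiples of 5 -> life table age groups
    if (all(is_mult_5 | x %in% c(0, 1)))
      return("lt")
  }
  ## assumption: anything else is treated as single-year ages
  "single"
}

guess_numeric_age_type(seq(0, 80, by = 5))
guess_numeric_age_type(c(0, 1, seq(5, 80, by = 5)))
guess_numeric_age_type(0:100)
```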

Value

If factor is TRUE, then reformat_age() returns a factor; otherwise it returns a character vector.

Examples

reformat_age(c("80 to 84", "90 or more", "85 to 89"))

## factor contains intermediate level missing from 'x'
reformat_age(c("80 to 84", "90 or more"))

## non-factor
reformat_age(c("80 to 84", "90 or more"),
          factor = FALSE)

## single
reformat_age(c("80", "90plus"))

## life table
reformat_age(c("0",
            "30-34",
            "10--14",
            "1-4 years"))

Reformat a Binary Sex Variable

Description

Reformat a binary sex variable so that it consists entirely of values "Female", "Male", and possibly NA and any values included in except.

Usage

reformat_sex(x, except = NULL, factor = TRUE)

Arguments

`x`	A vector.
`except`	Values to exclude when reformatting.
`factor`	Whether the return value should be a factor.

Details

When parsing labels, reformat_sex() ignores case: "FEMALE" and "fEmAlE" are equivalent.

White space is removed from the beginning and end of labels.

reformat_sex() does not try to interpret numeric codes (eg 1, 2).

Value

If factor is TRUE, then reformat_sex() returns a factor; otherwise it returns a character vector.

Examples

reformat_sex(c("F", "female", NA, "MALES"))

## values supplied for 'except'
reformat_sex(c("Fem", "Other", "Male", "M"),
             except = c("Other", "Diverse"))

## return an ordinary character vector
reformat_sex(c("F", "female", NA, "MALES"),
             factor = FALSE)

Randomly Round A Vector of Integers to Base 3

Description

Apply the 'Random Round to Base 3' (RR3) algorithm to a vector of integers (or doubles where round(x) == x).

Usage

rr3(x)

Arguments

`x`	A vector of integers (in the sense that `round(x) == x`). Can be an rvec.

Details

The RR3 algorithm is used by statistical agencies to confidentialize data. Under the RR3 algorithm, an integer $n$ is randomly rounded as follows:

If $n$ is divisible by 3, leave it unchanged.
If dividing $n$ by 3 leaves a remainder of 1, then round down (subtract 1) with probability 2/3, and round up (add 2) with probability 1/3.
If dividing $n$ by 3 leaves a remainder of 2, then round down (subtract 2) with probability 1/3, and round up (add 1) with probability 2/3.

RR3 has some nice properties:

The randomly-rounded version of $n$ has expected value $n$.
If $n$ is non-negative, then the randomly-rounded version of $n$ is non-negative.
If $n$ is non-positive, then the randomly rounded version of $n$ is non-positive.
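
The rounding rules can be sketched in base R as follows, taking the third rule to apply when the remainder is 2. The function rr3_sketch() is a hypothetical illustration, not the package's implementation, and it handles ordinary numeric vectors only (not rvecs). It relies on the fact that R's %% operator returns a non-negative remainder for a positive divisor, so the same rules cover negative integers.

```r
## Hypothetical re-implementation for illustration; not poputils::rr3().
rr3_sketch <- function(x) {
  stopifnot(all(is.na(x) | x == round(x)))
  rem <- x %% 3               ## 0, 1, or 2, also for negative x
  u <- runif(length(x))       ## one uniform draw per element
  is_rem_1 <- !is.na(x) & rem == 1
  is_rem_2 <- !is.na(x) & rem == 2
  ans <- x
  ## remainder 1: subtract 1 w.p. 2/3, add 2 w.p. 1/3
  ans[is_rem_1] <- x[is_rem_1] + ifelse(u[is_rem_1] < 2 / 3, -1, 2)
  ## remainder 2: subtract 2 w.p. 1/3, add 1 w.p. 2/3
  ans[is_rem_2] <- x[is_rem_2] + ifelse(u[is_rem_2] < 1 / 3, -2, 1)
  ans
}

set.seed(0)
rr3_sketch(c(1, 5, 2, 0, -1, 3, NA))

## the mean of many rounded copies of 1 should be close to 1
mean(rr3_sketch(rep(1, 100000)))
```

For a remainder of 1 the expected change is (2/3)(-1) + (1/3)(2) = 0, and for a remainder of 2 it is (1/3)(-2) + (2/3)(1) = 0, which is why the rounded value has expected value $n$.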

Value

A randomly-rounded version of x.

Examples

x <- c(1, 5, 2, 0, -1, 3, NA)
rr3(x)

Specify Open Age Group

Description

Set the lower limit of the open age group. Given a vector of age group labels, recode all age groups with a lower limit greater than or equal to <lower> to <lower>+.

Usage

set_age_open(x, lower)

Arguments

`x`	A vector of age labels.
`lower`	An integer. The lower limit for the open age group.

Details

set_age_open() requires that x and the return value have a five-year, single-year, or life table format, as described in age_labels().
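
The recoding step itself can be sketched in base R. The helper set_age_open_sketch() is hypothetical and makes a simplifying assumption: it reads the lower limit of each label from its leading digits and does not check that the input or the result follows one of the formats described in age_labels(), as the real function is documented to do.

```r
## Hypothetical helper, not part of poputils.
## Recodes every label whose lower limit is >= 'lower' to "<lower>+".
set_age_open_sketch <- function(x, lower) {
  ## lower limit = leading run of digits, eg "80-84" -> 80, "100+" -> 100
  lower_limits <- as.integer(sub("^([0-9]+).*$", "\\1", x))
  is_open <- !is.na(lower_limits) & lower_limits >= lower
  x[is_open] <- paste0(lower, "+")
  x
}

x <- c("100+", "80-84", "95-99", "20-24")
set_age_open_sketch(x, 90)
```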

Value

A modified version of x.

Examples

x <- c("100+", "80-84", "95-99", "20-24")
set_age_open(x, 90)
set_age_open(x, 25)

Build a Matrix from Measure and ID Variables

Description

Build a matrix where the elements are values of a measure variable, and the rows and columns are formed by observed combinations of ID variables. The ID variables picked out by rows and cols must uniquely identify cells. to_matrix(), unlike stats::xtabs(), does not sum across multiple combinations of ID variables.
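
The uniqueness requirement can be illustrated with a minimal base R sketch. The function to_matrix_sketch() is hypothetical: unlike to_matrix(), it takes column names as character vectors, and its row and column ordering and labels are choices made for illustration, not a description of the package's output.

```r
## Hypothetical helper, not part of poputils.
## 'rows', 'cols' and 'measure' are character vectors of column names.
to_matrix_sketch <- function(x, rows, cols, measure) {
  ## one combined ID per observed combination of row / column variables
  row_id <- interaction(x[rows], drop = TRUE, lex.order = TRUE)
  col_id <- interaction(x[cols], drop = TRUE, lex.order = TRUE)
  ## refuse to build the matrix if any cell would receive two values
  if (anyDuplicated(data.frame(row_id, col_id)) > 0L)
    stop("'rows' and 'cols' do not uniquely identify cells")
  ans <- matrix(NA,
                nrow = nlevels(row_id),
                ncol = nlevels(col_id),
                dimnames = list(levels(row_id), levels(col_id)))
  ## fill cells by (row, column) index pairs; no summing takes place
  ans[cbind(as.integer(row_id), as.integer(col_id))] <- x[[measure]]
  ans
}

x <- expand.grid(age = c(0, 1, 2),
                 sex = c("F", "M"),
                 region = c("A", "B"),
                 year = 2000:2001)
x$count <- 1:24
to_matrix_sketch(x,
                 rows = c("age", "sex"),
                 cols = c("region", "year"),
                 measure = "count")
```

Raising an error, rather than silently adding values together as stats::xtabs() would, makes it harder to overlook an ID variable that has been left out of rows or cols.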

Usage

to_matrix(x, rows, cols, measure)

Arguments

`x`	A data frame.
`rows`	The ID variable(s) used to distinguish rows in the matrix.
`cols`	The ID variable(s) used to distinguish columns in the matrix.
`measure`	The measure variable, eg rates or counts.

Value

A matrix.

Examples

x <- expand.grid(age = c(0, 1, 2),
                 sex = c("F", "M"),
                 region = c("A", "B"),
                 year = 2000:2001)
x$count <- 1:24

to_matrix(x,
          rows = c(age, sex),
          cols = c(region, year),
          measure = count)

to_matrix(x,
          rows = c(age, sex, region),
          cols = year,
          measure = count)

## cells not uniquely identified
try(
to_matrix(x,
          rows = age,
          cols = sex,
          measure = count)
)

Trim Values So They Are Between 0 and 1

Description

Trim a vector so that all values are greater than 0 and less than 1.

Usage

trim_01(x)

Arguments

`x`	A numeric vector. Can be an rvec.

Details

If min is the lowest element of x that is higher than 0, and max is the highest element of x that is lower than 1, then trim_01()
shifts all elements of x that are lower than min upwards, so that they equal min, and
shifts all elements of x that are higher than max downwards, so that they equal max.
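
The trimming rule can be sketched in base R. The function trim_01_sketch() is a hypothetical illustration for ordinary numeric vectors, not the package's implementation; throwing an error when no element lies strictly between 0 and 1 is an assumption, since the behaviour in that case is not described here.

```r
## Hypothetical helper, not part of poputils.
trim_01_sketch <- function(x) {
  inside <- x[!is.na(x) & x > 0 & x < 1]
  ## assumption: fail if there is nothing to trim towards
  if (length(inside) == 0L)
    stop("'x' has no values strictly between 0 and 1")
  lo <- min(inside)   ## lowest element of x that is higher than 0
  hi <- max(inside)   ## highest element of x that is lower than 1
  ## raise values below 'lo' to 'lo', lower values above 'hi' to 'hi'
  pmin(pmax(x, lo), hi)
}

x <- c(1, 0.98, -0.001, 0.5, 0.01)
trim_01_sketch(x)
```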

Value

A trimmed version of x.

Examples

x <- c(1, 0.98, -0.001, 0.5, 0.01)
trim_01(x)

Coale-Demeny West Model Life Tables

Description

Life table quantities from the "West" family of Coale-Demeny model life tables.

Usage

west_lifetab

Format

A data frame with 1,050 rows and the following variables:

level: Index for life table. Lower level implies lower life expectancy.
sex: "Female" and "Male".
age: Age, in life table age groups, with an open age group of 95+.
mx: Mortality rate.
ax: Average years lived in age interval by people who die in that interval.
qx: Probability that someone alive at the start of the age interval dies during the interval.
lx: Number of people still alive at start of age interval.
dx: Number of people dying during age interval.
Lx: Number of person-years lived during age interval.
ex: Expectation of life at start of age interval.

Source

Coale A, Demeny P, and Vaughn B. 1983. Regional model life tables and stable populations. 2nd ed. New York: Academic Press, accessed via demogR::cdmltw().
