Title: | Demographic Analysis and Data Manipulation |
---|---|
Description: | Perform tasks commonly encountered when preparing and analysing demographic data. Some functions are intended for end users, and others for developers. Includes functions for working with life tables. |
Authors: | John Bryant [aut, cre], Bayesian Demography Limited [cph] |
Maintainer: | John Bryant |
License: | MIT + file LICENSE |
Version: | 0.3.4 |
Built: | 2024-11-20 05:36:04 UTC |
Source: | https://github.com/bayesiandemography/poputils |
Determine whether a set of age labels refer to one-year, five-year, or life-table age groups.
age_group_type(x)
x |
A vector of age labels |
The valid types of age labels are:
"single". One-year age groups, eg "0" or "55", and possibly an open age group, eg "90+".
"five". Five-year age groups, eg "0-4" or "55-59", and possibly an open age group, eg "100+".
"lt". Life table age groups, eg "0", "1-4", "5-9", "55-59", or "80+".
If x does not fit any of these descriptions, then age_group_type() throws an error.
If x could belong to more than one type, then age_group_type() prefers "single" to "five" and "lt", and prefers "five" to "lt".
Returns "single", "five", or "lt".
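The preference rules can be illustrated with a short base-R sketch. classify_age_labels() is a hypothetical helper written for this illustration, not the code inside poputils, and it only understands labels that are already in the standard forms ("55", "55-59", "90+"), not the free-text forms handled by reformat_age().

```r
# Hypothetical helper (not part of poputils): a simplified version of the
# rules above. Accepts only standard labels such as "55", "55-59", "90+";
# does not handle factors, NAs, or free-text formats.
classify_age_labels <- function(x) {
  x <- as.character(x)
  is_open <- grepl("^[0-9]+\\+$", x)
  is_single <- grepl("^[0-9]+$", x)
  is_range <- grepl("^[0-9]+-[0-9]+$", x)
  if (!all(is_open | is_single | is_range))
    stop("labels are not in a recognised format")
  # lower limit: the leading digits of every label
  lower <- as.integer(sub("^([0-9]+).*$", "\\1", x))
  # upper limit (exclusive); stays NA for open age groups
  upper <- rep(NA_integer_, length(x))
  upper[is_single] <- lower[is_single] + 1L
  upper[is_range] <- as.integer(sub("^[0-9]+-", "", x[is_range])) + 1L
  width <- upper - lower
  closed <- !is_open
  # preference order: "single", then "five", then "lt"
  if (all(width[closed] == 1L))
    return("single")
  if (all(width[closed] == 5L & lower[closed] %% 5L == 0L))
    return("five")
  is_infant <- lower == 0L & width == 1L
  is_child <- lower == 1L & width == 4L
  is_five <- lower >= 5L & width == 5L & lower %% 5L == 0L
  ok_lt <- is_infant | is_child | is_five
  if (all(ok_lt[closed]))
    return("lt")
  stop("labels do not match any type of age group")
}

classify_age_labels(c("0", "1-4", "5-9", "80+"))  # "lt"
```

Because "single" is tested first, an ambiguous input such as "0" is classified as "single", matching the preference order described above.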
age_group_type(c("5-9", "0-4", "100+"))
age_group_type(c("2", "5", "1"))
age_group_type(c("0", "1-4"))
## could be "single" or "lt"
age_group_type("0")
## could be "five" or "lt"
age_group_type("80-84")
Create labels for age groups. The labels depend on the type argument:
"single". One-year age groups, eg "0" or "55", and possibly an open age group, eg "90+".
"five". Five-year age groups, eg "0-4" or "55-59", and possibly an open age group, eg "100+".
"lt". Life table age groups, eg "0", "1-4", "5-9", "55-59", or "80+".
age_labels(type, min = 0, max = 100, open = NULL)
type |
Type of age group labels: "single", "five", or "lt". |
min |
Minimum age. Defaults to 0. |
max |
Maximum age for closed age groups. Defaults to 100. |
open |
Whether the last age group is "open", ie has no upper limit. |
The first age group starts at the age specified by min. If open is TRUE, then the final age group starts at the age specified by max. Otherwise, the final age group ends at the age specified by max.
open defaults to TRUE when min equals zero, and to FALSE otherwise.
A character vector.
age_labels(type = "single", min = 15, max = 40)
age_labels(type = "five")
age_labels(type = "lt", max = 80)
Given a vector x of age group labels, return a numeric vector. age_lower() returns the lower limits of each age group, age_mid() returns the midpoints, and age_upper() returns the upper limits.
Vector x must describe 1-year, 5-year, or life-table age groups: see age_labels() for examples. The labels in x can use any format understood by reformat_age().
age_lower(x)
age_mid(x)
age_upper(x)
x |
A vector of age group labels. |
These functions can make age groups easier to work with. Lower and upper limits can be used for selecting on age. Replacing age groups with their midpoints can improve graphs.
A numeric vector, the same length as x.
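A base-R sketch of how lower limits, upper limits, and midpoints relate for standard labels. age_limits() is a hypothetical helper, not part of poputils. It assumes that a label such as "15-19" covers ages from 15 up to (but not including) 20, and it gives open age groups an upper limit of Inf and a midpoint of NA; poputils may treat open age groups differently.

```r
# Hypothetical helper (not poputils): limits and midpoints for labels
# already in standard form ("0", "5-9", "50+").
# Assumption: open age groups get upper = Inf and mid = NA.
age_limits <- function(x) {
  x <- as.character(x)
  is_single <- grepl("^[0-9]+$", x)
  is_range <- grepl("^[0-9]+-[0-9]+$", x)
  is_open <- grepl("^[0-9]+\\+$", x)
  stopifnot(all(is_single | is_range | is_open))
  lower <- as.numeric(sub("^([0-9]+).*$", "\\1", x))
  upper <- rep(Inf, length(x))
  upper[is_single] <- lower[is_single] + 1
  # "15-19" ends where the next group ("20-24") starts
  upper[is_range] <- as.numeric(sub("^[0-9]+-", "", x[is_range])) + 1
  mid <- ifelse(is.finite(upper), (lower + upper) / 2, NA_real_)
  list(lower = lower, mid = mid, upper = upper)
}

lims <- age_limits(c("15-19", "5-9", "50+"))
lims$lower
lims$mid
```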
x <- c("15-19", "5-9", "50+")
age_lower(x)
age_mid(x)
age_upper(x)
## non-standard formats are OK
age_lower(c("infants", "100 and over"))
df <- data.frame(age = c("1-4", "10-14", "5-9", "0"),
                 rate = c(0.023, 0.015, 0.007, 0.068))
df
subset(df, age_lower(age) >= 5)
Check that age labels can be parsed and, optionally, whether the labels are complete, unique, start at zero, and end with an open age group.
check_age(x, complete = FALSE, unique = FALSE, zero = FALSE, open = FALSE)
x |
A vector of age labels. |
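complete |
If TRUE, check that x has no gaps between age groups. |
unique |
If TRUE, check that x has no duplicated labels. |
zero |
If TRUE, check that the youngest age group starts at age 0. |
open |
If TRUE, check that the oldest age group is open. |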
By default, check_age() only tests whether a set of labels can be parsed as single-year, five-year, or life table age groups. (See age_group_type() for more on the three types of age group.) However, it can also apply the following tests:
complete. Whether x includes all intermediate age groups, with no gaps. For instance, the labels c("10-14", "15-19", "5-9") are complete, while the labels c("15-19", "5-9") are not (because they are missing "10-14").
unique. Whether x is free of duplicated labels.
zero. Whether the youngest age group in x starts at age 0, ie whether it is "0" or "0-4".
open. Whether the oldest age group in x is an "open" age group, such as "100+" or "65+", that has no upper limit.
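The optional tests can be sketched in base R, assuming the labels have already been converted to numeric lower and upper limits, with Inf as the upper limit of an open age group. The helper names are hypothetical and are not poputils functions.

```r
# Hypothetical helpers (not poputils). Each works on numeric limits,
# with Inf as the upper limit of an open age group.
is_complete <- function(lower, upper) {
  ord <- order(lower)
  lower <- lower[ord]
  upper <- upper[ord]
  n <- length(lower)
  # no gaps: each group ends exactly where the next one starts
  n < 2 || all(upper[-n] == lower[-1])
}
is_unique <- function(labels) anyDuplicated(labels) == 0L
starts_at_zero <- function(lower) min(lower) == 0
ends_open <- function(upper) is.infinite(max(upper))

# c("10-14", "15-19", "5-9") is complete
is_complete(lower = c(10, 15, 5), upper = c(15, 20, 10))
# c("15-19", "5-9") is missing "10-14"
is_complete(lower = c(15, 5), upper = c(20, 10))
```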
Returns TRUE, invisibly, or raises an error if a test fails.
reformat_age() to convert age labels to the format used by poputils.
try(check_age(c("10-14", "0-4", "15+"), complete = TRUE))
try(check_age(c("10-14", "5-9", "0-4", "5-9", "15+"), unique = TRUE))
try(check_age(c("10-14", "5-9", "15+"), zero = TRUE))
try(check_age(c("10-14", "0-4", "5-9"), open = TRUE))
Check that x and y have the same length.
check_equal_length(x, y, nm_x, nm_y)
x, y |
Arguments to compare. |
nm_x, nm_y |
Names to use in error message. |
'TRUE', invisibly.
x <- 1:3
y <- 3:1
check_equal_length(x = x, y = y, nm_x = "x", nm_y = "y")
Check that n is a finite, non-NA scalar that is an integer or integerish (ie is equal to round(n)), and, optionally, that it is within a specified range and divisible by a specified number.
check_n(n, nm_n, min, max, divisible_by)
n |
A whole number |
nm_n |
Name for 'n' to be used in error messages |
min |
Minimum value 'n' can take. Can be NULL. |
max |
Maximum value 'n' can take. Can be NULL. |
divisible_by |
'n' must be divisible by this. Can be NULL. |
If all tests pass, check_n() returns TRUE invisibly. Otherwise it throws an error.
check_n(10, nm_n = "count", min = 0, max = NULL, divisible_by = 1)
check_n(10, nm_n = "count", min = NULL, max = NULL, divisible_by = NULL)
check_n(10, nm_n = "n", min = 5, max = 10, divisible_by = 2)
Given a named list of colnum vectors, like those produced by tidyselect::eval_select(), throw an error if there is an overlap.
check_no_overlap_colnums(x)
x |
A named list of integer vectors. |
TRUE, invisibly.
x <- list(arg1 = c(age = 1L), arg2 = c(gender = 4L, region = 5L))
check_no_overlap_colnums(x)
Convert age group labels to a less detailed classification. The three classifications recognized by combine_age() are "single", "five", and "lt", as defined on age_labels(). The following conversions are permitted:
"single" -> "lt"
"single" -> "five"
"lt" -> "five"
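The "single" to "five" and "single" to "lt" conversions can be sketched with integer arithmetic. The helpers below are hypothetical (not poputils::combine_age()) and assume x holds closed single-year labels such as "0" or "12", with no open age group.

```r
# Hypothetical helpers (not poputils::combine_age). Assume x holds closed
# single-year labels such as "0" or "12", with no open age group.
single_to_five <- function(x) {
  age <- as.integer(x)
  lower <- 5L * (age %/% 5L)   # start of the enclosing five-year group
  paste0(lower, "-", lower + 4L)
}
single_to_lt <- function(x) {
  age <- as.integer(x)
  out <- single_to_five(x)
  out[age == 0L] <- "0"                # infants stay separate
  out[age >= 1L & age <= 4L] <- "1-4"  # ages 1 to 4 form one group
  out
}

x <- c("0", "5", "3", "12")
single_to_five(x)
single_to_lt(x)
```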
combine_age(x, to = c("five", "lt"))
x |
A vector of age labels |
to |
Type of age classification to convert to: "five" or "lt". Defaults to "five". |
If x is a factor, then combine_age() returns a factor; otherwise it returns a character vector.
age_labels() to create age group labels
reformat_age() to convert existing age group labels to a standard format
set_age_open() to set the lower limit of the open age group
x <- c("0", "5", "3", "12")
combine_age(x)
combine_age(x, to = "lt")
Turn life expectancies at birth into full life tables, using the Brass logit model. The method is simple and is designed for simulations or for settings with little or no data on age-specific mortality rates. In settings where data on age-specific mortality is available, other methods might be more appropriate.
ex_to_lifetab_brass( target, standard, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", radix = 1e+05, suffix = NULL )
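target |
A data frame containing a variable called "ex", and possibly other variables. See the description of the target argument below. |
standard |
A data frame containing variables called "age" and "lx", and possibly other variables. See the description of the standard argument below. |
infant, child, closed, open |
Methods used to calculate life expectancy. See lifetab() for details. |
radix |
Initial population for the lx column. Default is 100000. |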
suffix |
Optional suffix added to life table columns. |
A data frame containing one or more life tables.
The method implemented by ex_to_lifetab_brass() is based on the observation that, if populations A and B are demographically similar, then, in many cases,
$$\text{logit}(l_x^{\text{A}}) \approx \alpha + \beta \, \text{logit}(l_x^{\text{B}})$$
where $l_x$ is the "survivorship probability" quantity from a life table. When populations are similar, $\beta$ is often close to 1.
Given (i) target life expectancy, (ii) a set of $l_x^{\text{B}}$ (referred to as a "standard"), and (iii) a value for $\beta$, ex_to_lifetab_brass() finds a value for $\alpha$ that yields a set of $l_x^{\text{A}}$ with the required life expectancy.
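The core idea can be sketched in base R under simplifying assumptions: single-year ages 0 to 100, a synthetic Gompertz-shaped standard invented purely for illustration (not real data), lx treated as linear within each year, and no open age group. The sketch uses stats::qlogis() and stats::plogis() as the logit and inverse logit, and stats::uniroot() to solve for alpha. All helper names are hypothetical; this is not how ex_to_lifetab_brass() works internally, since it also applies the infant, child, closed, and open methods.

```r
# Toy standard: single-year ages 0-100, with survivorship generated from a
# Gompertz hazard 1e-4 * exp(0.09 * age). Illustrative only, not real data.
ages <- 0:100
lx_standard <- exp(-(1e-4 / 0.09) * (exp(0.09 * ages) - 1))

# Life expectancy at birth, assuming lx is linear within each one-year
# interval and ignoring anyone still alive at age 100 (a simplification).
e0_from_lx <- function(lx) {
  n <- length(lx)
  sum((lx[-n] + lx[-1]) / 2) / lx[1]
}

# Brass relational model: logit(lx_A) = alpha + beta * logit(lx_B).
# qlogis() is the logit and plogis() the inverse logit.
brass_lx <- function(lx_std, alpha, beta = 1) {
  plogis(alpha + beta * qlogis(lx_std))
}

# Solve for the alpha that reproduces a target life expectancy.
fit_alpha <- function(target_e0, lx_std, beta = 1) {
  f <- function(alpha) e0_from_lx(brass_lx(lx_std, alpha, beta)) - target_e0
  uniroot(f, interval = c(-10, 10), tol = 1e-10)$root
}

alpha <- fit_alpha(target_e0 = 60, lx_std = lx_standard)
lx_target <- brass_lx(lx_standard, alpha)
e0_from_lx(lx_target)
```

With beta fixed at 1, alpha shifts the whole survivorship curve up or down on the logit scale, so life expectancy increases steadily with alpha and a simple root-finder is enough.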
target argument
target is a data frame specifying life expectancies for each population being modelled, and, possibly, inputs to the calculations, and index variables. Values in target are not age-specific.
A variable called "ex", with life expectancy at birth, must be included in target.
A variable called "beta" with values for $\beta$ can be included in target. This variable can be an rvec. If no "beta" variable is included in target, then ex_to_lifetab_brass() assumes that $\beta = 1$.
A variable called "sex". If the infant argument to ex_to_lifetab_brass() is "CD" or "AK", or if the child argument is "CD", target must include a "sex" variable, and the labels for this variable must be interpretable by function reformat_sex(). Otherwise, the "sex" variable is optional, and there is no restriction on labels.
Other variables used to distinguish between life expectancies, such as time, region, or model variant.
standard argument
standard is a data frame specifying the $l_x$ to be used with each life expectancy in ex, and, optionally, values for $a_x$, the average number of person-years lived by people who die in each age group. Values in standard are age-specific.
A variable called "age", with labels that can be parsed by reformat_age().
A variable called "lx". Internally, each set of $l_x$ values is standardized so that the value for age 0 equals 1. Within each set, values must be non-increasing. Cannot be an rvec.
Additional variables used to match rows in standard to rows in target.
Internally, standard is merged with target using a left join from target, on any variables that target and standard have in common.
Brass W, Coale AJ. 1968. “Methods of analysis and estimation,” in Brass, W, Coale AJ, Demeny P, Heisel DF, et al. (eds). The Demography of Tropical Africa. Princeton NJ: Princeton University Press, pp. 88–139.
Moultrie TA, Timæus IM. 2013. Introduction to Model Life Tables. In Moultrie T, Dorrington R, Hill A, Hill K, Timæus I, Zaba B. (eds). Tools for Demographic Estimation. Paris: International Union for the Scientific Study of Population.
logit(), invlogit(): Logit function
lifeexp(): Calculate life expectancy from detailed inputs
## create new life tables based on level-1
## 'West' model life tables, but with lower
## life expectancy
library(dplyr, warn.conflicts = FALSE)
target <- data.frame(sex = c("Female", "Male"),
                     ex = c(17.5, 15.6))
standard <- west_lifetab |>
  filter(level == 1) |>
  select(sex, age, lx)
ex_to_lifetab_brass(target = target,
                    standard = standard,
                    infant = "CD",
                    child = "CD")
Given labels for sex or gender, try to infer which (if any) refer to females. If no elements look like a label for females, or if two or more elements do, then return NULL.
find_label_female(nms)
nms |
A character vector |
An element of nms or NULL.
find_label_male(), find_var_sexgender()
find_label_female(c("Female", "Male")) ## one valid
find_label_female(c("0-4", "5-9")) ## none valid
find_label_female(c("F", "Fem")) ## two valid
Given labels for sex or gender, try to infer which (if any) refer to males. If no elements look like a label for males, or if two or more elements do, then return NULL.
find_label_male(nms)
nms |
A character vector |
An element of nms or NULL.
find_label_female(), find_var_sexgender()
find_label_male(c("Female", "Male")) ## one valid
find_label_male(c("0-4", "5-9")) ## none valid
find_label_male(c("male", "m")) ## two valid
Find the element of nms that looks like an age variable. If no elements look like an age variable, or if two or more elements do, then return NULL.
find_var_age(nms)
nms |
A character vector |
An element of nms, or NULL.
find_var_time(), find_var_sexgender()
find_var_age(c("Sex", "Year", "AgeGroup", NA)) ## one valid
find_var_age(c("Sex", "Year")) ## none valid
find_var_age(c("age", "age.years")) ## two valid
Find the element of nms that looks like a sex or gender variable. If no elements look like a sex or gender variable, or if two or more elements do, then return NULL.
find_var_sexgender(nms)
nms |
A character vector |
An element of nms, or NULL.
find_var_age(), find_var_time(), find_label_female(), find_label_male()
find_var_sexgender(c("Sex", "Year", "AgeGroup", NA)) ## one valid
find_var_sexgender(c("Age", "Region")) ## none valid
find_var_sexgender(c("sexgender", "sexes")) ## two valid
Find the element of nms that looks like a time variable. If no elements look like a time variable, or if two or more elements do, then return NULL.
find_var_time(nms)
nms |
A character vector |
An element of nms, or NULL.
find_var_age(), find_var_sexgender()
find_var_time(c("Sex", "Year", "AgeGroup", NA)) ## one valid
find_var_time(c("Sex", "Region")) ## none valid
find_var_time(c("time", "year")) ## two valid
Construct a named vector of indices equivalent to the vectors produced by tidyselect::eval_select(), but for the grouping variables in an object of class "grouped_df".
groups_colnums(data)
data |
A data frame. |
If data is not grouped, then groups_colnums() returns a zero-length vector.
A named integer vector.
library(dplyr)
df <- data.frame(x = 1:4, g = c(1, 1, 2, 2))
groups_colnums(df)
df <- group_by(df, g)
groups_colnums(df)
Calculate life table quantities. Function lifetab() returns an entire life table. Function lifeexp() returns life expectancy at birth. The inputs can be mortality rates (mx) or probabilities of dying (qx), though not both.
lifetab(data, mx = NULL, qx = NULL, age = age, sex = NULL, ax = NULL, by = NULL, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", radix = 1e+05, suffix = NULL)
lifeexp(data, mx = NULL, qx = NULL, at = 0, age = age, sex = NULL, ax = NULL, by = NULL, infant = c("constant", "linear", "CD", "AK"), child = c("constant", "linear", "CD"), closed = c("constant", "linear"), open = "constant", suffix = NULL)
data |
Data frame with mortality data. |
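mx |
<tidyselect> Mortality rates, expressed as deaths per person-year lived. Possibly an rvec. |
qx |
<tidyselect> Probabilities of dying within each age group. |
age |
<tidyselect> Age group labels, which must be interpretable by reformat_age(). |
sex |
<tidyselect> Biological sex. Needed when infant is "CD" or "AK", or child is "CD". |
ax |
<tidyselect> Average number of years lived in the age group by people who die in that group. Optional. |
by |
<tidyselect> Variables used to distinguish separate life tables or life expectancies. |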
infant |
Method used to calculate life table values in age group "0". Default is "constant". |
child |
Method used to calculate life table values in age group "1-4". Default is "constant". |
closed |
Method used to calculate life table values in closed age intervals other than "0" and "1-4". Default is "constant". |
open |
Method used to calculate life table values in the final, open age group (eg "80+" or "110+"). Currently the only option is "constant". |
radix |
Initial population for the lx column. Default is 100000. |
suffix |
Optional suffix added to new columns in result. |
at |
Age at which life expectancy is calculated (lifeexp() only). Default is 0. |
A tibble.
mx: Deaths per person-year lived.
qx: Probability of dying between the start of age group x and the end.
lx: Number of people alive at the start of age group x.
dx: Number of deaths in age group x.
Lx: Expected number of person-years lived in age group x.
ex: Life expectancy, calculated at the start of age group x.
Mortality rates mx are sometimes expressed as deaths per 1000 person-years lived, or per 100,000 person-years lived. lifetab() and lifeexp() assume that they are expressed as deaths per person-year lived.
lifetab() and lifeexp() implement several methods for calculating life table quantities from mortality rates. Each method makes different assumptions about the way that mortality rates vary within age intervals:
"constant". Mortality rates are constant within each interval.
"linear". Life table quantity lx is a straight line within each interval. Equivalently, deaths are distributed uniformly within each interval.
"CD". Used only with age groups "0" and "1-4". Mortality rates decline over the age interval, with the slope depending on the absolute level of infant mortality. The formulas were developed by Coale and Demeny (1983), and used in Preston et al (2001).
"AK". Used only with age group "0". Mortality rates decline over the age interval, with the slope depending on the absolute level of infant mortality. The formulas were developed by Andreev and Kingkade (2015), and are used in the Human Mortality Database methods protocol.
For a detailed description of the methods, see the vignette for poputils.
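A minimal sketch of the "constant" method only, written in base R for illustration (lifetab_constant() is a hypothetical name; it ignores ax, rvecs, grouping, and the other methods, and is not the poputils implementation). With a constant rate within an interval, survival across the interval is exp(-n * mx), and the identity mx = dx / Lx holds exactly, so Lx = dx / mx.

```r
# Minimal sketch of the "constant" method (not poputils' implementation).
# mx: mortality rates for each age group, the last of which is open.
# width: widths of the closed age groups (one fewer than length(mx)).
lifetab_constant <- function(mx, width, radix = 1e5) {
  k <- length(mx)
  stopifnot(length(width) == k - 1, all(mx >= 0), mx[k] > 0)
  closed <- seq_len(k - 1)
  # constant hazard: probability of surviving an interval is exp(-width * mx)
  qx <- c(1 - exp(-width * mx[closed]), 1)
  lx <- radix * cumprod(c(1, 1 - qx[closed]))
  dx <- lx * qx
  # with a constant rate, mx = dx / Lx exactly, so Lx = dx / mx;
  # if mx is 0, nobody dies and everyone lives the full interval
  Lx_closed <- ifelse(mx[closed] > 0, dx[closed] / mx[closed],
                      width * lx[closed])
  Lx <- c(Lx_closed, lx[k] / mx[k])
  # ex = (person-years lived at this age and above) / lx
  ex <- rev(cumsum(rev(Lx))) / lx
  data.frame(mx = mx, qx = qx, lx = lx, dx = dx, Lx = Lx, ex = ex)
}

# age groups "0", "1-4", "5+"
lifetab_constant(mx = c(0.02, 0.001, 0.05), width = c(1, 4))
```

A useful check: if mx is the same in every age group, life expectancy at every age is 1 / mx, because a constant hazard has no memory.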
ax is the average number of years lived in an age interval by people who die in that interval. Demographers sometimes refer to it as the 'separation factor'. If a non-NA value of ax is supplied for an age group, then the results for that age group are based on the formula
$$m_x = \frac{q_x}{n_x - (n_x - a_x) q_x}$$
(where $n_x$ is the width of the age interval), over-riding any methods specified via the infant, child, closed and open arguments.
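The formula, and its rearrangement solved for qx, can be written as two small functions. The names are hypothetical and not part of poputils; they simply make the algebra concrete and show that the two directions are inverses.

```r
# The ax formula and its inverse (hypothetical helpers, not poputils).
# nx is the width of the age interval and ax the separation factor.
mx_from_qx <- function(qx, ax, nx) qx / (nx - (nx - ax) * qx)
qx_from_mx <- function(mx, ax, nx) nx * mx / (1 + (nx - ax) * mx)

# a five-year interval where people who die live 2.5 years on average
qx <- qx_from_mx(mx = 0.01, ax = 2.5, nx = 5)
mx_from_qx(qx, ax = 2.5, nx = 5)  # recovers 0.01
```

Setting ax = nx / 2 reproduces the "linear" assumption (deaths spread evenly over the interval).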
The probability of dying, qx, is always 1 in the final (open) age group. qx therefore provides no direct information on mortality conditions within the final age group. lifetab() and lifeexp() use conditions in the second-to-final age group as a proxy for conditions in the final age group. When open is "constant" (which is currently the only option), and no value for ax in the final age group is provided, lifetab() and lifeexp() assume that $m_A = m_{A-1}$, and set $a_A = 1 / m_A$, where $A$ denotes the final age group.
In practice, mortality is likely to be higher in the final age group than in the second-to-final age group, so the default procedure is likely to lead to inaccuracies. When the size of the final age group is very small, these inaccuracies will be inconsequential. But in other cases, it may be necessary to supply an explicit value for ax for the final age group, or to use mx rather than qx as inputs.
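The default procedure for the open age group can be sketched as follows, assuming qx and ax for the second-to-final group are known. open_group_from_qx() is a hypothetical helper for illustration only; poputils may derive the second-to-final mortality rate differently (for instance, from the method chosen via closed).

```r
# Hypothetical helper (not poputils): open age group values when the inputs
# are qx. Assumes qx and ax for the second-to-final group are known.
open_group_from_qx <- function(q_prev, a_prev, n_prev, l_final) {
  # mortality rate in the second-to-final group, via the ax formula
  m_prev <- q_prev / (n_prev - (n_prev - a_prev) * q_prev)
  m_final <- m_prev             # proxy: m_A = m_{A-1}
  a_final <- 1 / m_final        # a_A = 1 / m_A under constant mortality
  L_final <- l_final * a_final  # everyone alive at the start dies in group A
  c(m = m_final, a = a_final, L = L_final)
}

open_group_from_qx(q_prev = 0.5, a_prev = 2.5, n_prev = 5, l_final = 1000)
```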
An rvec is a 'random vector', holding multiple draws from a distribution. Using an rvec for the mx argument to lifetab() or lifeexp() is a way of representing uncertainty. This uncertainty is propagated through to the life table values, which will also be rvecs.
Preston SH, Heuveline P, and Guillot M. 2001. Demography: Measuring and Modeling Population Processes. Oxford: Blackwell.
Coale AJ, Demeny P, and Vaughn B. 1983. Regional model life tables and stable populations. New York: Academic Press.
Andreev EM and Kingkade WW. 2015. Average age at death in infancy and infant mortality level: Reconsidering the Coale-Demeny formulas at current levels of low mortality. Demographic Research, 33: 363-390.
Human Mortality Database Methods Protocol.
ex_to_lifetab_brass(): Calculate life table from minimal inputs
q0_to_m0(): Convert between infant mortality measures
library(dplyr)
## life table for females based on 'level 1'
## mortality rates "West" model life table
west_lifetab |>
  filter(sex == "Female", level == 1) |>
  lifetab(mx = mx)
## change method for infant and children from
## default ("constant") to "CD"
west_lifetab |>
  filter(sex == "Female", level == 1) |>
  lifetab(mx = mx, sex = sex, infant = "CD", child = "CD")
## calculate life expectancies
## for all levels, using the 'by'
## argument to distinguish levels
west_lifetab |>
  lifeexp(mx = mx, sex = sex, infant = "CD", child = "CD", by = level)
## obtain the same result using
## 'group_by'
west_lifetab |>
  group_by(level) |>
  lifeexp(mx = mx, sex = sex, infant = "CD", child = "CD")
## calculations based on 'qx'
west_lifetab |>
  lifeexp(qx = qx, sex = sex, by = level)
## life expectancy at age 60
west_lifetab |>
  filter(level == 10) |>
  lifeexp(mx = mx, at = 60, sex = sex)
Transform values to and from the logit scale.
logit(p)
invlogit(x)
p |
Values in the interval [0, 1]. |
x |
Values in the interval [-Inf, Inf]. |
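logit() calculates
$$x = \log\left(\frac{p}{1 - p}\right)$$
and invlogit() calculates
$$p = \frac{1}{1 + e^{-x}}.$$
To avoid overflow, invlogit() uses $p = \frac{e^x}{1 + e^x}$ internally for values of $x$ where $x < 0$.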
In some of the demographic literature, the logit function is defined as
$$\text{logit}(p) = \frac{1}{2} \log\left(\frac{p}{1 - p}\right).$$
logit() and invlogit() follow the conventions in statistics and machine learning, and omit the $\frac{1}{2}$.
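A base-R sketch of the two transformations, showing why the inverse is computed with two different (mathematically equivalent) expressions. The function names are hypothetical and this is not the poputils source; it ignores matrices and rvecs.

```r
# Sketch of logit and inverse logit in base R (not poputils' code).
logit_sketch <- function(p) log(p) - log1p(-p)  # log(p / (1 - p))
# the demographic convention above would be 0.5 * logit_sketch(p)

invlogit_sketch <- function(x) {
  out <- rep(NA_real_, length(x))
  neg <- which(x < 0)
  pos <- which(x >= 0)
  # for x < 0, exp(x) lies in (0, 1) and cannot overflow
  out[neg] <- exp(x[neg]) / (1 + exp(x[neg]))
  # for x >= 0, exp(-x) lies in (0, 1] and cannot overflow
  out[pos] <- 1 / (1 + exp(-x[pos]))
  out
}

p <- c(0.5, 1, 0.2)
logit_sketch(p)
invlogit_sketch(logit_sketch(p))
invlogit_sketch(c(-1000, 1000))  # no NaN from exp() overflow
```

Using exp(x) / (1 + exp(x)) for large positive x would give Inf / Inf = NaN, which is the overflow the branch avoids.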
A vector of doubles, if p or x is a vector.
A matrix of doubles, if p or x is a matrix.
An object of class rvec_dbl, if p or x is an rvec.
p <- c(0.5, 1, 0.2)
logit(p)
invlogit(logit(p))
Given a matrix, create a list, each element of which contains a column or row from the matrix.
matrix_to_list_of_cols(m)
matrix_to_list_of_rows(m)
m |
A matrix |
matrix_to_list_of_cols() and matrix_to_list_of_rows() are internal functions, for use by developers, and would not normally be called directly by end users.
matrix_to_list_of_cols(): A list of vectors, each of which is a column from m.
matrix_to_list_of_rows(): A list of vectors, each of which is a row from m.
m <- matrix(1:12, nrow = 3)
matrix_to_list_of_cols(m)
matrix_to_list_of_rows(m)
Counts of deaths and population, by age, sex, and calendar year, plus mortality rates, for New Zealand, 2021-2022.
nzmort
A data frame with 84 rows and the following variables:
year: Calendar year.
gender: "Female" and "Male".
age: Age, in life table age groups, with an open age group of 95+.
deaths: Counts of deaths, randomly rounded to base 3.
popn: Estimates of average annual population.
mx: Mortality rates (deaths / popn).
Modified from data in tables "Deaths by age and sex (Annual-Dec)" and "Estimated Resident Population by Age and Sex (1991+) (Annual-Dec)" from Stats NZ online database Infoshare, downloaded on 24 September 2023.
A modified version of nzmort where the mx column is an rvec, rather than an ordinary R vector. The rvec holds random draws from the posterior distribution obtained from a Bayesian statistical model.
nzmort_rvec
An object of class tbl_df (inherits from tbl, data.frame) with 84 rows and 4 columns.
Convert the probability of dying during infancy (q0) to the mortality rate for infancy (m0).
q0_to_m0( q0, sex = NULL, a0 = NULL, infant = c("constant", "linear", "CD", "AK") )
q0 |
Probability of dying in first year of life. A numeric vector or an rvec. |
sex |
Biological sex. A vector the same length as q0. |
a0 |
Average age at death for infants who die. Optional. See help for lifetab(). |
infant |
Calculation method. See help for lifetab(). |
A numeric vector or rvec.
The term "infant mortality rate" is ambiguous. Demographers sometimes use it to refer to m0 (which is an actual rate) and sometimes use it to refer to q0 (which is a probability.)
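Two of the simpler conversions can be sketched directly. The helpers are hypothetical (not poputils::q0_to_m0()). Under "constant", q0 = 1 - exp(-m0); with a known a0, the ax formula from lifetab() with a one-year interval gives m0 = q0 / (1 - (1 - a0) q0); "linear" corresponds to a0 = 0.5. The "CD" and "AK" methods use published sex-specific coefficients that are not reproduced here.

```r
# Sketch of two simple conversions (not poputils::q0_to_m0).
# "constant": constant rate over the year, so q0 = 1 - exp(-m0)
m0_constant <- function(q0) -log1p(-q0)
# known a0 (average age at death of infants who die): the ax formula
# with a one-year interval, m0 = q0 / (1 - (1 - a0) * q0)
m0_from_a0 <- function(q0, a0) q0 / (1 - (1 - a0) * q0)
# "linear": deaths spread evenly over the year, ie a0 = 0.5
m0_linear <- function(q0) m0_from_a0(q0, a0 = 0.5)

q0 <- c(0.2, 0.05, 0.005)
cbind(q0, constant = m0_constant(q0), linear = m0_linear(q0))
```

In both cases m0 is larger than q0, which is one reason the two measures should not be used interchangeably.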
lifetab(): Calculate a full life table.
library(dplyr, warn.conflicts = FALSE)
west_lifetab |>
  filter(age == 0, level <= 5) |>
  select(level, sex, age, mx, qx) |>
  mutate(m0 = q0_to_m0(q0 = qx, sex = sex, infant = "CD"))
Convert age group labels to one of three formats:
Single-year age groups, eg "0", "1", ..., "99", "100+".
Life table age groups, eg "0", "1-4", "5-9", ..., "95-99", "100+".
Five-year age groups, eg "0-4", "5-9", ..., "95-99", "100+".
By default reformat_age() returns a factor that includes all intermediate age groups. See below for examples.
reformat_age(x, factor = TRUE)
x |
A vector. |
factor |
Whether the return value should be a factor. |
reformat_age() applies the following algorithm:
Tidy and translate text, eg convert "20 to 24 years" to "20-24", convert "infant" to "0", or convert "100 or more" to "100+".
Check whether the resulting labels could have been produced by age_labels(). If not, throw an error.
If factor is TRUE (the default), then return a factor. The levels of this factor include all intermediate age groups. Otherwise return a character vector.
When x consists entirely of numbers, reformat_age() also checks for two special cases:
If every element of x is a multiple of 5, and if max(x) >= 50, then x is assumed to describe 5-year age groups.
If every element of x is 0, 1, or a multiple of 5, with max(x) >= 50, then x is assumed to describe life table age groups.
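The two special cases can be expressed as a small decision function. guess_numeric_age_type() is hypothetical and not part of poputils; returning "single" when neither special case applies is an assumption made for this sketch.

```r
# Hypothetical helper (not poputils): applies the two special cases above
# to purely numeric labels. Falling back to "single" otherwise is an
# assumption made for this sketch.
guess_numeric_age_type <- function(x) {
  x <- as.numeric(x)
  if (max(x) >= 50 && all(x %% 5 == 0))
    return("five")
  if (max(x) >= 50 && all(x %in% c(0, 1) | x %% 5 == 0))
    return("lt")
  "single"
}

guess_numeric_age_type(c(0, 5, 10, 50))     # "five"
guess_numeric_age_type(c(0, 1, 5, 10, 50))  # "lt"
guess_numeric_age_type(c(0, 1, 5, 10, 45))  # "single" (max below 50)
```

The max(x) >= 50 condition guards against misreading a short run of single years such as 0, 5, 10 as five-year groups.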
If factor is TRUE, then reformat_age() returns a factor; otherwise it returns a character vector.
reformat_age(c("80 to 84", "90 or more", "85 to 89"))
## factor contains intermediate level missing from 'x'
reformat_age(c("80 to 84", "90 or more"))
## non-factor
reformat_age(c("80 to 84", "90 or more"), factor = FALSE)
## single
reformat_age(c("80", "90plus"))
## life table
reformat_age(c("0", "30-34", "10--14", "1-4 years"))
Reformat a binary sex variable so that it consists entirely of values "Female", "Male", and possibly NA and any values included in except.
reformat_sex(x, except = NULL, factor = TRUE)
x |
A vector. |
except |
Values to exclude when reformatting. |
factor |
Whether the return value should be a factor. |
When parsing labels, reformat_sex() ignores case: "FEMALE" and "fEmAlE" are equivalent. White space is removed from the beginning and end of labels. reformat_sex() does not try to interpret numeric codes (eg 1, 2).
If factor is TRUE, then reformat_sex() returns a factor; otherwise it returns a character vector.
reformat_sex(c("F", "female", NA, "MALES"))
## values supplied for 'except'
reformat_sex(c("Fem", "Other", "Male", "M"), except = c("Other", "Diverse"))
## return an ordinary character vector
reformat_sex(c("F", "female", NA, "MALES"), factor = FALSE)
Apply the 'Random Round to Base 3' (RR3) algorithm to a vector of integers (or doubles where round(x) == x).
rr3(x)
x |
A vector of integers (in the sense that round(x) == x). |
The RR3 algorithm is used by statistical agencies to confidentialize data. Under the RR3 algorithm, an integer $n$ is randomly rounded as follows:
If $n$ is divisible by 3, leave it unchanged.
If dividing $n$ by 3 leaves a remainder of 1, then round down (subtract 1) with probability 2/3, and round up (add 2) with probability 1/3.
If dividing $n$ by 3 leaves a remainder of 2, then round down (subtract 2) with probability 1/3, and round up (add 1) with probability 2/3.
RR3 has some nice properties:
The randomly-rounded version of $n$ has expected value $n$.
If $n$ is non-negative, then the randomly rounded version of $n$ is non-negative.
If $n$ is non-positive, then the randomly rounded version of $n$ is non-positive.
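The rounding rules translate directly into base R. rr3_sketch() is a hypothetical helper, not the poputils source. It relies on R's %% operator returning a remainder of 0, 1, or 2 for divisor 3 even when n is negative (eg -1 %% 3 is 2), which is what makes the non-positive property hold.

```r
# Sketch of RR3 in base R (not poputils::rr3()). NAs are passed through.
rr3_sketch <- function(x) {
  stopifnot(is.numeric(x), all(is.na(x) | x == round(x)))
  rem <- x %% 3            # 0, 1, or 2, even for negative x
  u <- runif(length(x))
  out <- x
  r1 <- which(rem == 1)
  r2 <- which(rem == 2)
  # remainder 1: down 1 with probability 2/3, up 2 with probability 1/3
  out[r1] <- ifelse(u[r1] < 2/3, x[r1] - 1, x[r1] + 2)
  # remainder 2: down 2 with probability 1/3, up 1 with probability 2/3
  out[r2] <- ifelse(u[r2] < 1/3, x[r2] - 2, x[r2] + 1)
  out
}

set.seed(0)
rr3_sketch(c(1, 5, 2, 0, -1, 3, NA))
# expected value check: many roundings of 1 should average close to 1
mean(rr3_sketch(rep(1, 1e5)))
```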
A randomly-rounded version of x.
x <- c(1, 5, 2, 0, -1, 3, NA)
rr3(x)
Set the lower limit of the open age group. Given a vector of age group labels, recode all age groups with a lower limit greater than or equal to <lower> to <lower>+.
set_age_open(x, lower)
x |
A vector of age labels. |
lower |
An integer. The lower limit for the open age group. |
set_age_open() requires that x and the return value have a five-year, single-year, or life table format, as described in age_labels().
A modified version of x.
set_age_open() uses age_lower() to identify lower limits.
age_labels() for creating age labels from scratch
x <- c("100+", "80-84", "95-99", "20-24")
set_age_open(x, 90)
set_age_open(x, 25)
Build a matrix where the elements are values of a measure variable, and the rows and columns are formed by observed combinations of ID variables. The ID variables picked out by rows and cols must uniquely identify cells. to_matrix(), unlike stats::xtabs(), does not sum across multiple combinations of ID variables.
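The uniqueness requirement can be made concrete with a base-R sketch. to_matrix_sketch() is hypothetical and differs from to_matrix() in an important way: it takes column names as character vectors rather than tidyselect expressions. It places each measure value in its cell and stops, rather than summing, when two rows map to the same cell.

```r
# Hypothetical base-R sketch (not poputils::to_matrix): rows, cols, and
# measure are given as character vectors of column names.
to_matrix_sketch <- function(df, rows, cols, measure) {
  row_key <- do.call(paste, c(unname(as.list(df[rows])), sep = "."))
  col_key <- do.call(paste, c(unname(as.list(df[cols])), sep = "."))
  # each (row, column) combination must occur at most once
  if (anyDuplicated(data.frame(row_key, col_key)) > 0)
    stop("cells not uniquely identified")
  row_levels <- unique(row_key)
  col_levels <- unique(col_key)
  m <- matrix(NA, nrow = length(row_levels), ncol = length(col_levels),
              dimnames = list(row_levels, col_levels))
  # place each measure value at its (row, column) position; nothing is summed
  m[cbind(match(row_key, row_levels), match(col_key, col_levels))] <- df[[measure]]
  m
}

x <- expand.grid(age = c(0, 1, 2), sex = c("F", "M"),
                 region = c("A", "B"), year = 2000:2001)
x$count <- 1:24
to_matrix_sketch(x, rows = c("age", "sex"), cols = c("region", "year"),
                 measure = "count")
```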
to_matrix(x, rows, cols, measure)
x |
A data frame. |
rows |
The ID variable(s) used to distinguish rows in the matrix. |
cols |
The ID variable(s) used to distinguish columns in the matrix. |
measure |
The measure variable, eg rates or counts. |
A matrix
x <- expand.grid(age = c(0, 1, 2),
                 sex = c("F", "M"),
                 region = c("A", "B"),
                 year = 2000:2001)
x$count <- 1:24
to_matrix(x, rows = c(age, sex), cols = c(region, year), measure = count)
to_matrix(x, rows = c(age, sex, region), cols = year, measure = count)
## cells not uniquely identified
try(to_matrix(x, rows = age, cols = sex, measure = count))
Trim a vector so that all values are greater than 0 and less than 1.
trim_01(x)
x |
A numeric vector. Can be an rvec. |
If min is the lowest element of x that is higher than 0, and max is the highest element of x that is lower than 1, then trim_01() shifts all elements of x that are lower than min upwards, so that they equal min, and shifts all elements of x that are higher than max downwards, so that they equal max.
A trimmed version of x
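The trimming rule takes three lines of base R. trim_01_sketch() is a hypothetical name; unlike trim_01(), it does not handle rvecs, and it assumes x contains at least one value strictly between 0 and 1.

```r
# Sketch of the trimming rule in base R (not poputils::trim_01).
# Assumes at least one element of x lies strictly between 0 and 1.
trim_01_sketch <- function(x) {
  lo <- min(x[x > 0], na.rm = TRUE)  # lowest value above 0
  hi <- max(x[x < 1], na.rm = TRUE)  # highest value below 1
  pmin(pmax(x, lo), hi)
}

x <- c(1, 0.98, -0.001, 0.5, 0.01)
trim_01_sketch(x)  # 0.98 0.98 0.01 0.50 0.01
```

Trimming this way keeps every value inside the open interval (0, 1), so the result can be passed to logit() without producing infinite values.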
logit(), invlogit(): Logit transformation
x <- c(1, 0.98, -0.001, 0.5, 0.01)
trim_01(x)
Life table quantities from the "West" family of Coale-Demeny model life tables.
west_lifetab
A data frame with 1,050 rows and the following variables:
level: Index for life table. Lower level implies lower life expectancy.
sex: "Female" and "Male".
age: Age, in life table age groups, with an open age group of 95+.
mx: Mortality rate.
ax: Average years lived in age interval by people who die in that interval.
qx: Probability that someone alive at the start of the age interval dies during the interval.
lx: Number of people still alive at the start of the age interval.
dx: Number of people dying during the age interval.
Lx: Number of person-years lived during the age interval.
ex: Expectation of life at the start of the age interval.
Coale A, Demeny P, and Vaughn B. 1983. Regional model life tables and stable populations. 2nd ed. New York: Academic Press, accessed via demogR::cdmltw().