-- B --
breaks_to_interval()
-- C --
cut_ages()
-- P --
pop_dat
-- R --
reaggregate_counts()
reaggregate_counts.default()
reaggregate_rates()
reaggregate_rates.default()
breaks_to_interval() takes a specified set of breaks representing the left-hand limits of closed-open intervals, i.e. [x, y), and returns the corresponding intervals together with their explicit lower and upper bounds. The resulting intervals span from the minimum break through to a specified max_upper.
breaks_to_interval(breaks, max_upper = Inf)
breaks |
1 or more non-negative cut points in strictly increasing order. These correspond to the left-hand side of the desired intervals (i.e. the closed side of [x, y)). Double values are coerced to integer prior to categorisation. |
max_upper |
Represents the maximum upper bound splitting the data. Defaults to Inf. |
A tibble with an ordered factor column (interval), as well as columns corresponding to the explicit bounds (lower_bound and upper_bound). Note that even though these bounds are whole numbers they are returned as numeric to allow the maximum upper bound to be given as Inf.
breaks_to_interval(breaks = c(0, 1, 5, 15, 25, 45, 65))
#> # A tibble: 7 × 3
#> interval lower_bound upper_bound
#> <ord> <dbl> <dbl>
#> 1 [0, 1) 0 1
#> 2 [1, 5) 1 5
#> 3 [5, 15) 5 15
#> 4 [15, 25) 15 25
#> 5 [25, 45) 25 45
#> 6 [45, 65) 45 65
#> 7 [65, Inf) 65 Inf
breaks_to_interval(
    breaks = c(0, 1, 5, 15, 25, 45, 65),
    max_upper = 100
)
#> # A tibble: 7 × 3
#> interval lower_bound upper_bound
#> <ord> <dbl> <dbl>
#> 1 [0, 1) 0 1
#> 2 [1, 5) 1 5
#> 3 [5, 15) 5 15
#> 4 [15, 25) 15 25
#> 5 [25, 45) 25 45
#> 6 [45, 65) 45 65
#> 7 [65, 100) 65 100
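The construction shown in the examples above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name sketch_breaks_to_interval() is hypothetical, it returns a plain data frame rather than a tibble, it skips the integer coercion described in the arguments, and the check that max_upper exceeds the largest break is an assumption.

```r
# Hypothetical helper (not part of the package): build left-closed,
# right-open intervals from a vector of lower bounds.
sketch_breaks_to_interval <- function(breaks, max_upper = Inf) {
    stopifnot(
        is.numeric(breaks), length(breaks) >= 1L, !anyNA(breaks),
        all(breaks >= 0), all(diff(breaks) > 0),
        # Assumption: the final interval must be non-empty.
        max_upper > breaks[length(breaks)]
    )
    lower <- as.numeric(breaks)
    # Each upper bound is the next lower bound; the last one is max_upper.
    upper <- c(lower[-1L], max_upper)
    labels <- sprintf("[%s, %s)", lower, upper)
    data.frame(
        interval = factor(labels, levels = labels, ordered = TRUE),
        lower_bound = lower,
        upper_bound = upper
    )
}

res <- sketch_breaks_to_interval(c(0, 1, 5, 15, 25, 45, 65))
print(res)
```

Keeping both bound columns as doubles mirrors the documented return value, since Inf cannot be stored in an integer vector.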
cut_ages() provides categorisation of ages based on specified breaks which represent the left-hand interval limits. The resulting intervals span from the minimum break through to a specified max_upper and will always be closed on the left and open on the right. Ages below the minimum break, or at or above max_upper, will be returned as NA.
cut_ages(ages, breaks, max_upper = Inf)
ages |
Vector of age values. Double values are coerced to integer prior to categorisation / aggregation. Must not be NA. |
breaks |
1 or more non-negative cut points in strictly increasing order. These correspond to the left-hand side of the desired intervals (i.e. the closed side of [x, y)). Double values are coerced to integer prior to categorisation. |
max_upper |
Represents the maximum upper bound for the resulting intervals. Double values are rounded up to the nearest (numeric) integer. Defaults to Inf. |
A data frame with an ordered factor column (interval), as well as columns corresponding to the explicit bounds (lower_bound and upper_bound). Internally both bound columns are stored as double but it can be taken as part of the function API that lower_bound is coercible to integer without any coercion to NA_integer_. Similarly all values of upper_bound apart from those corresponding to max_upper can be assumed coercible to integer (max_upper may or may not be, depending on the given argument).
cut_ages(ages = 0:9, breaks = c(0, 3, 5, 10))
#> # A tibble: 10 × 3
#> interval lower_bound upper_bound
#> <ord> <dbl> <dbl>
#> 1 [0, 3) 0 3
#> 2 [0, 3) 0 3
#> 3 [0, 3) 0 3
#> 4 [3, 5) 3 5
#> 5 [3, 5) 3 5
#> 6 [5, 10) 5 10
#> 7 [5, 10) 5 10
#> 8 [5, 10) 5 10
#> 9 [5, 10) 5 10
#> 10 [5, 10) 5 10
cut_ages(ages = 0:9, breaks = c(0, 5))
#> # A tibble: 10 × 3
#> interval lower_bound upper_bound
#> <ord> <dbl> <dbl>
#> 1 [0, 5) 0 5
#> 2 [0, 5) 0 5
#> 3 [0, 5) 0 5
#> 4 [0, 5) 0 5
#> 5 [0, 5) 0 5
#> 6 [5, Inf) 5 Inf
#> 7 [5, Inf) 5 Inf
#> 8 [5, Inf) 5 Inf
#> 9 [5, Inf) 5 Inf
#> 10 [5, Inf) 5 Inf
# Note the following is comparable to a call to
# cut(ages, right = FALSE, breaks = c(breaks, Inf))
ages <- seq.int(from = 0, by = 10, length.out = 10)
breaks <- c(0, 1, 10, 30)
cut_ages(ages, breaks)
#> # A tibble: 10 × 3
#> interval lower_bound upper_bound
#> <ord> <dbl> <dbl>
#> 1 [0, 1) 0 1
#> 2 [10, 30) 10 30
#> 3 [10, 30) 10 30
#> 4 [30, Inf) 30 Inf
#> 5 [30, Inf) 30 Inf
#> 6 [30, Inf) 30 Inf
#> 7 [30, Inf) 30 Inf
#> 8 [30, Inf) 30 Inf
#> 9 [30, Inf) 30 Inf
#> 10 [30, Inf) 30 Inf
# values at or above max_upper are treated as NA
cut_ages(ages = 0:10, breaks = c(0, 5), max_upper = 7)
#> # A tibble: 11 × 3
#> interval lower_bound upper_bound
#> <ord> <dbl> <dbl>
#> 1 [0, 5) 0 5
#> 2 [0, 5) 0 5
#> 3 [0, 5) 0 5
#> 4 [0, 5) 0 5
#> 5 [0, 5) 0 5
#> 6 [5, 7) 5 7
#> 7 [5, 7) 5 7
#> 8 <NA> NA NA
#> 9 <NA> NA NA
#> 10 <NA> NA NA
#> 11 <NA> NA NA
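The comparison with cut() noted in the examples above also covers a finite max_upper. The following base R sketch uses only cut() and makes no claim about how cut_ages() is implemented. The labels produced by cut() omit the space after the comma, so they are not identical to the interval column.

```r
ages <- 0:10
breaks <- c(0, 5)
max_upper <- 7

# right = FALSE gives left-closed, right-open intervals. Values outside
# [min(breaks), max_upper) fall in no interval and come back as NA.
res <- cut(ages, breaks = c(breaks, max_upper), right = FALSE)
print(table(res, useNA = "ifany"))
```

Here ages 0 to 4 fall in the first interval, ages 5 and 6 in the second, and ages 7 to 10 are NA because the final interval is open on the right.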
A dataset derived from the 2021 UK census containing population for different age categories across England and Wales.
pop_dat
A data frame with 200 rows and 6 variables:
area_code |
Unique area identifier |
area_name |
Unique area name |
age_category |
Left-closed and right-open age interval |
value |
Count of individuals |
https://github.com/TimTaylor/census_pop_2021
reaggregate_counts() converts counts over one interval range to another with optional weighting by a known population.
reaggregate_counts(...)
## Default S3 method:
reaggregate_counts(
    bounds,
    counts,
    new_bounds,
    ...,
    population_bounds = NULL,
    population_weights = NULL
)
... |
Further arguments passed to or from other methods. |
bounds |
The current boundaries in strictly increasing order. These correspond to the left-hand side of the intervals (i.e. the closed side of [x, y)). Double values are coerced to integer prior to categorisation. |
counts |
Vector of counts corresponding to the intervals defined by bounds. |
new_bounds |
The desired boundaries in strictly increasing order. |
population_bounds |
Interval boundaries for a known population weighting given by the population_weights argument. |
population_weights |
Population weightings corresponding to population_bounds. Used to weight the output across the desired intervals. If NULL (the default), counts are allocated in proportion to interval size, as in the example below. |
A data frame with 4 columns: interval, lower_bound, upper_bound and a corresponding count.
# Reaggregating some data obtained from the 2021 UK census
head(pop_dat)
#> area_code area_name age_category value
#> 1 K04000001 England and Wales [0, 5) 3232100
#> 2 K04000001 England and Wales [5, 10) 3524600
#> 3 K04000001 England and Wales [10, 15) 3595900
#> 4 K04000001 England and Wales [15, 20) 3394700
#> 5 K04000001 England and Wales [20, 25) 3602100
#> 6 K04000001 England and Wales [25, 30) 3901800
# Each row of the data is for the same region, so we keep only the
# `age_category` and `value` columns
dat <- subset(pop_dat, select = c(age_category, value))
# Add the lower bounds to the data
dat <- transform(
    dat,
    lower_bound = as.integer(sub("\\[([0-9]+), .+\\)", "\\1", age_category))
)
# Now recategorise to the desired age intervals
with(
    dat,
    reaggregate_counts(
        bounds = lower_bound,
        counts = value,
        new_bounds = c(0, 1, 5, 15, 25, 45, 65)
    )
)
#> # A tibble: 7 × 4
#> interval lower upper count
#> <ord> <dbl> <dbl> <dbl>
#> 1 [0, 1) 0 1 646420
#> 2 [1, 5) 1 5 2585680
#> 3 [5, 15) 5 15 7120500
#> 4 [15, 25) 15 25 6996800
#> 5 [25, 45) 25 45 15787900
#> 6 [45, 65) 45 65 15396800
#> 7 [65, Inf) 65 Inf 11063400
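The first three rows of this output can be checked by hand. The sketch below assumes that, with no population weights supplied, a count is spread evenly over the single years of age in its original interval. That reading is inferred from the printed output, not taken from the package source.

```r
# Counts for the original intervals, taken from head(pop_dat) above.
count_0_5   <- 3232100
count_5_10  <- 3524600
count_10_15 <- 3595900

# Assumption: the [0, 5) count is spread evenly over its five years of age.
per_year <- count_0_5 / 5

new_0_1  <- per_year * 1             # one year of age:   [0, 1)
new_1_5  <- per_year * 4             # four years of age: [1, 5)
new_5_15 <- count_5_10 + count_10_15 # two whole original intervals

print(c(new_0_1, new_1_5, new_5_15))
```

These values agree with the first three counts in the table above (646420, 2585680 and 7120500).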
reaggregate_rates() converts rates over one interval range to another with optional weighting by a known population.
reaggregate_rates(...)
## Default S3 method:
reaggregate_rates(
    bounds,
    rates,
    new_bounds,
    ...,
    population_bounds = NULL,
    population_weights = NULL
)
... |
Further arguments passed to or from other methods. |
bounds |
The current boundaries in strictly increasing order. These correspond to the left-hand side of the intervals (i.e. the closed side of [x, y)). Double values are coerced to integer prior to categorisation. |
rates |
Vector of rates corresponding to the intervals defined by bounds. |
new_bounds |
The desired boundaries in strictly increasing order. |
population_bounds |
Interval boundaries for a known population weighting given by the population_weights argument. |
population_weights |
Population weightings corresponding to population_bounds. Used to weight the output across the desired intervals. If |
A data frame with 4 columns: interval, lower_bound, upper_bound and a corresponding rate.
reaggregate_rates(
    bounds = c(0, 5, 10),
    rates = c(0.1, 0.2, 0.3),
    new_bounds = c(0, 2, 7, 10),
    population_bounds = c(0, 2, 5, 7, 10),
    population_weights = c(100, 200, 50, 150, 100)
)
#> # A tibble: 4 × 4
#> interval lower upper rate
#> <ord> <dbl> <dbl> <dbl>
#> 1 [0, 2) 0 2 0.1
#> 2 [2, 7) 2 7 0.12
#> 3 [7, 10) 7 10 0.2
#> 4 [10, Inf) 10 Inf 0.3
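The rate for [2, 7) in this output is consistent with a population-weighted mean of the original rates. The sketch below reproduces that one value by hand. It is a reading inferred from the example, not a statement of the package's algorithm.

```r
# The new [2, 7) interval is made up of two population pieces:
#   [2, 5) with weight 200, inside the original [0, 5)  interval (rate 0.1)
#   [5, 7) with weight 50,  inside the original [5, 10) interval (rate 0.2)
weights <- c(200, 50)
rates   <- c(0.1, 0.2)

# Population-weighted mean rate for [2, 7).
rate_2_7 <- sum(weights * rates) / sum(weights)
print(rate_2_7)
```

The other three intervals each sit inside a single original interval, so their rates carry over unchanged (0.1, 0.2 and 0.3).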