-- B --
breaks_to_interval()
-- C --
cut_ages()
-- P --
pop_dat
-- R --
reaggregate_counts()
reaggregate_rates()
breaks_to_interval() takes a specified set of breaks representing the left-hand limits of closed-open intervals, i.e. [x, y), and returns the corresponding intervals together with their lower and upper bounds. The resulting intervals span from the minimum break through to a specified max_upper.
breaks_to_interval(breaks, max_upper = Inf)
breaks |
One or more non-negative cut points in strictly increasing order. These correspond to the left-hand side of the desired intervals (i.e. the closed side of [x, y)). Double values are coerced to integer prior to categorisation. |
max_upper |
Represents the maximum upper bound splitting the data. Defaults to Inf. |
A tibble with an ordered factor column (interval), as well as columns corresponding to the explicit bounds (lower and upper). Note that even though these bounds are whole numbers they are returned as numeric to allow the maximum upper bound to be given as Inf.
breaks_to_interval(breaks = c(0, 1, 5, 15, 25, 45, 65))
#> # A tibble: 7 × 3
#> interval lower upper
#> <ord> <dbl> <dbl>
#> 1 [0, 1) 0 1
#> 2 [1, 5) 1 5
#> 3 [5, 15) 5 15
#> 4 [15, 25) 15 25
#> 5 [25, 45) 25 45
#> 6 [45, 65) 45 65
#> 7 [65, Inf) 65 Inf
breaks_to_interval(
  breaks = c(0, 1, 5, 15, 25, 45, 65),
  max_upper = 100
)
#> # A tibble: 7 × 3
#> interval lower upper
#> <ord> <dbl> <dbl>
#> 1 [0, 1) 0 1
#> 2 [1, 5) 1 5
#> 3 [5, 15) 5 15
#> 4 [15, 25) 15 25
#> 5 [25, 45) 25 45
#> 6 [45, 65) 45 65
#> 7 [65, 100) 65 100
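The mapping from breaks to intervals shown above can be sketched in base R. This is only an illustration of the idea, not the package's implementation; the variable names are chosen for this example and the result is a plain data frame rather than a tibble.

```r
# Illustrative sketch only: derive left-closed, right-open intervals from breaks.
breaks <- c(0, 1, 5, 15, 25, 45, 65)
max_upper <- Inf

# Each break is a lower bound; the next break (or max_upper) is the upper bound.
lower <- as.numeric(breaks)
upper <- c(lower[-1], max_upper)

# Build labels of the form "[x, y)" and keep them in break order.
labels <- sprintf("[%s, %s)", lower, upper)
interval <- factor(labels, levels = labels, ordered = TRUE)

out <- data.frame(interval = interval, lower = lower, upper = upper)
out
```

Keeping lower and upper as doubles is what allows Inf to appear as the final upper bound.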
cut_ages() provides categorisation of ages based on specified breaks which represent the left-hand interval limits. The resulting intervals span from the minimum break through to a specified max_upper and will always be closed on the left and open on the right. Ages below the minimum break, or above max_upper, will be returned as NA.
cut_ages(ages, breaks, max_upper = Inf)
ages |
Vector of age values. Double values are coerced to integer prior to categorisation / aggregation. Must not be NA. |
breaks |
One or more non-negative cut points in strictly increasing order. These correspond to the left-hand side of the desired intervals (i.e. the closed side of [x, y)). Double values are coerced to integer prior to categorisation. |
max_upper |
Represents the maximum upper bound for the resulting intervals. Double values are rounded up to the nearest (numeric) integer. Defaults to Inf. |
A data frame with an ordered factor column (interval), as well as columns corresponding to the explicit bounds (lower and upper). Internally both bound columns are stored as double, but it can be taken as part of the function API that lower is coercible to integer without any coercion to NA_integer_. Similarly, all values of upper apart from those corresponding to max_upper can be assumed coercible to integer (max_upper may or may not be, depending on the given argument).
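The coercion guarantee described above matters because base R cannot represent Inf as an integer. A small sketch with hypothetical bound vectors (not output from the package) shows the difference:

```r
# Hypothetical bounds of the kind returned in the lower and upper columns.
lower <- c(0, 3, 5)
upper <- c(3, 5, Inf)

# Whole-number doubles coerce to integer cleanly.
lower_int <- as.integer(lower)

# Finite upper bounds coerce cleanly as well.
upper_int <- as.integer(upper[is.finite(upper)])

# An infinite upper bound becomes NA_integer_ (with a warning), which is
# why the bound columns are stored as double.
inf_as_int <- suppressWarnings(as.integer(Inf))
```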
cut_ages(ages = 0:9, breaks = c(0, 3, 5, 10))
#> # A tibble: 10 × 3
#> interval lower upper
#> <ord> <dbl> <dbl>
#> 1 [0, 3) 0 3
#> 2 [0, 3) 0 3
#> 3 [0, 3) 0 3
#> 4 [3, 5) 3 5
#> 5 [3, 5) 3 5
#> 6 [5, 10) 5 10
#> 7 [5, 10) 5 10
#> 8 [5, 10) 5 10
#> 9 [5, 10) 5 10
#> 10 [5, 10) 5 10
cut_ages(ages = 0:9, breaks = c(0, 5))
#> # A tibble: 10 × 3
#> interval lower upper
#> <ord> <dbl> <dbl>
#> 1 [0, 5) 0 5
#> 2 [0, 5) 0 5
#> 3 [0, 5) 0 5
#> 4 [0, 5) 0 5
#> 5 [0, 5) 0 5
#> 6 [5, Inf) 5 Inf
#> 7 [5, Inf) 5 Inf
#> 8 [5, Inf) 5 Inf
#> 9 [5, Inf) 5 Inf
#> 10 [5, Inf) 5 Inf
# Note the following is comparable to a call to
# cut(ages, right = FALSE, breaks = c(breaks, Inf))
ages <- seq.int(from = 0, by = 10, length.out = 10)
breaks <- c(0, 1, 10, 30)
cut_ages(ages, breaks)
#> # A tibble: 10 × 3
#> interval lower upper
#> <ord> <dbl> <dbl>
#> 1 [0, 1) 0 1
#> 2 [10, 30) 10 30
#> 3 [10, 30) 10 30
#> 4 [30, Inf) 30 Inf
#> 5 [30, Inf) 30 Inf
#> 6 [30, Inf) 30 Inf
#> 7 [30, Inf) 30 Inf
#> 8 [30, Inf) 30 Inf
#> 9 [30, Inf) 30 Inf
#> 10 [30, Inf) 30 Inf
# values above max_upper treated as NA
cut_ages(ages = 0:10, breaks = c(0, 5), max_upper = 7)
#> # A tibble: 11 × 3
#> interval lower upper
#> <ord> <dbl> <dbl>
#> 1 [0, 5) 0 5
#> 2 [0, 5) 0 5
#> 3 [0, 5) 0 5
#> 4 [0, 5) 0 5
#> 5 [0, 5) 0 5
#> 6 [5, 7) 5 7
#> 7 [5, 7) 5 7
#> 8 <NA> NA NA
#> 9 <NA> NA NA
#> 10 <NA> NA NA
#> 11 <NA> NA NA
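For comparison, the NA behaviour in the last example can be reproduced with base R's cut(). This is a sketch of the comparable call mentioned in the comments above, not the package's internals; note that cut() formats its labels without a space after the comma.

```r
ages <- 0:10
breaks <- c(0, 5)
max_upper <- 7

# right = FALSE gives left-closed, right-open intervals; values at or above
# the final break fall outside every interval and are returned as NA.
res <- cut(
  ages,
  breaks = c(breaks, max_upper),
  right = FALSE,
  ordered_result = TRUE
)
res
```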
A dataset derived from the 2021 UK census containing population for different age categories across England and Wales.
pop_dat
A data frame with 200 rows and 6 variables:
area_code |
Unique area identifier |
area_name |
Unique area name |
age_category |
Left-closed and right-open age interval |
value |
Count of individuals |
https://github.com/TimTaylor/census_pop_2021
reaggregate_counts() converts counts over one interval range to another, with optional weighting by a known population.
reaggregate_counts(
  bounds,
  counts,
  new_bounds,
  ...,
  population_bounds = NULL,
  population_weights = NULL
)
bounds |
The current boundaries in strictly increasing order. These correspond to the left-hand side of the intervals (i.e. the closed side of [x, y)). Double values are coerced to integer prior to categorisation. |
counts |
Vector of counts corresponding to the intervals defined by bounds. |
new_bounds |
The desired boundaries in strictly increasing order. |
... |
Further arguments passed to or from other methods. |
population_bounds |
Interval boundaries for a known population weighting given by the population_weights argument. |
population_weights |
Population weightings corresponding to population_bounds. Used to weight the output across the desired intervals. If |
A data frame with four columns: interval, lower, upper and a corresponding count.
# Reaggregating some data obtained from the 2021 UK census
head(pop_dat)
#> area_code area_name age_category value
#> 1 K04000001 England and Wales [0, 5) 3232100
#> 2 K04000001 England and Wales [5, 10) 3524600
#> 3 K04000001 England and Wales [10, 15) 3595900
#> 4 K04000001 England and Wales [15, 20) 3394700
#> 5 K04000001 England and Wales [20, 25) 3602100
#> 6 K04000001 England and Wales [25, 30) 3901800
# Each row of the data is for the same region so we can keep only the
# `age_category` and `value` columns
dat <- subset(pop_dat, select = c(age_category, value))
# Add the lower bounds to the data
dat <- transform(
  dat,
  lower_bound = as.integer(sub("\\[([0-9]+), .+\\)", "\\1", age_category))
)
# Now recategorise to the desired age intervals
with(
  dat,
  reaggregate_counts(
    bounds = lower_bound,
    counts = value,
    new_bounds = c(0, 1, 5, 15, 25, 45, 65)
  )
)
#> # A tibble: 7 × 4
#> interval lower upper count
#> <ord> <dbl> <dbl> <dbl>
#> 1 [0, 1) 0 1 646420
#> 2 [1, 5) 1 5 2585680
#> 3 [5, 15) 5 15 7120500
#> 4 [15, 25) 15 25 6996800
#> 5 [25, 45) 25 45 15787900
#> 6 [45, 65) 45 65 15396800
#> 7 [65, Inf) 65 Inf 11063400
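The first rows of the output above are consistent with counts being spread evenly across the single years of each original interval when no population weights are supplied (for example, 3232100 / 5 = 646420 for [0, 1)). The base R sketch below reproduces the first three rows under that assumption. It is an illustration only, not the package's implementation, and it uses just the first three census intervals so that every interval has a finite width.

```r
# First three intervals from the example data: [0, 5), [5, 10), [10, 15).
lower <- c(0, 5, 10)
upper <- c(5, 10, 15)
counts <- c(3232100, 3524600, 3595900)

# Assumption: counts are uniform within each interval, so expand to single years.
widths <- upper - lower
age <- seq(min(lower), max(upper) - 1)
per_year <- rep(counts / widths, times = widths)

# Sum the single-year counts within the new intervals [0, 1), [1, 5), [5, 15).
new_lower <- c(0, 1, 5)
group <- findInterval(age, new_lower)
new_counts <- as.numeric(tapply(per_year, group, sum))
new_counts
```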
reaggregate_rates() converts rates over one interval range to another, with optional weighting by a known population.
reaggregate_rates(
  bounds,
  rates,
  new_bounds,
  ...,
  population_bounds = NULL,
  population_weights = NULL
)
bounds |
The current boundaries in strictly increasing order. These correspond to the left-hand side of the intervals (i.e. the closed side of [x, y)). Double values are coerced to integer prior to categorisation. |
rates |
Vector of rates corresponding to the intervals defined by bounds. |
new_bounds |
The desired boundaries in strictly increasing order. |
... |
Further arguments passed to or from other methods. |
population_bounds |
Interval boundaries for a known population weighting given by the population_weights argument. |
population_weights |
Population weightings corresponding to population_bounds. Used to weight the output across the desired intervals. If |
A data frame with four columns: interval, lower, upper and a corresponding rate.
reaggregate_rates(
  bounds = c(0, 5, 10),
  rates = c(0.1, 0.2, 0.3),
  new_bounds = c(0, 2, 7, 10),
  population_bounds = c(0, 2, 5, 7, 10),
  population_weights = c(100, 200, 50, 150, 100)
)
#> # A tibble: 4 × 4
#> interval lower upper rate
#> <ord> <dbl> <dbl> <dbl>
#> 1 [0, 2) 0 2 0.1
#> 2 [2, 7) 2 7 0.12
#> 3 [7, 10) 7 10 0.2
#> 4 [10, Inf) 10 Inf 0.3
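The rates above match a population-weighted average of the original rates within each new interval. For [2, 7), the population of 200 in [2, 5) carries the rate 0.1 and the population of 50 in [5, 7) carries the rate 0.2, giving (200 × 0.1 + 50 × 0.2) / 250 = 0.12. A base R sketch of that arithmetic follows. It is not the package's implementation, and it relies on the assumption, true for this example, that every population boundary lines up with the original and new boundaries.

```r
bounds <- c(0, 5, 10)
rates <- c(0.1, 0.2, 0.3)
new_bounds <- c(0, 2, 7, 10)
population_bounds <- c(0, 2, 5, 7, 10)
population_weights <- c(100, 200, 50, 150, 100)

# Rate that applies to each population interval (looked up by its lower bound).
rate_at <- rates[findInterval(population_bounds, bounds)]

# New interval that each population interval falls into.
group <- findInterval(population_bounds, new_bounds)

# Population-weighted mean rate within each new interval.
weighted_sum <- tapply(rate_at * population_weights, group, sum)
weight_total <- tapply(population_weights, group, sum)
new_rates <- as.numeric(weighted_sum / weight_total)
new_rates
```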