Reaggregate age counts — reaggregate

reaggregate_counts() converts counts over one set of intervals to another set of intervals, optionally weighting by a known population.
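The default behaviour (no population weights) can be sketched in base R. This sketch assumes each count is spread uniformly across its interval and hands out each piece by the length of its overlap with the new intervals. `split_proportional()` is a hypothetical helper written for illustration; it is not the package's implementation. It also assumes that an open-ended final interval goes entirely to the new interval containing its lower bound, and it drops anything below `new_bounds[1]`.

```r
# Hypothetical helper for illustration only (not the package implementation).
# Splits counts over [bounds[i], bounds[i + 1]) into the new intervals
# [new_bounds[j], new_bounds[j + 1]), assuming each count is spread uniformly
# across its interval. The last interval in each set is open-ended, [x, Inf).
split_proportional <- function(bounds, counts, new_bounds) {
  old_upper <- c(bounds[-1], Inf)
  new_upper <- c(new_bounds[-1], Inf)
  out <- numeric(length(new_bounds))
  for (i in seq_along(bounds)) {
    if (is.infinite(old_upper[i])) {
      # Open-ended original interval: give all of it to the new interval
      # containing its lower bound (an assumption of this sketch).
      j <- findInterval(bounds[i], new_bounds)
      if (j > 0) out[j] <- out[j] + counts[i]
      next
    }
    width <- old_upper[i] - bounds[i]
    # Overlap length between [bounds[i], old_upper[i]) and each new interval
    overlap <- pmax(0, pmin(old_upper[i], new_upper) - pmax(bounds[i], new_bounds))
    out <- out + counts[i] * overlap / width
  }
  out
}

# First three counts from the census example further down this page
res <- split_proportional(
  bounds = c(0L, 5L, 10L),
  counts = c(3232100, 3524600, 3595900),
  new_bounds = c(0L, 1L, 5L)
)
res
# returns c(646420, 2585680, 7120500)
```

Splitting by overlap length keeps the total unchanged. It also reproduces the `[0, 1)` and `[1, 5)` values in the example output below.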

Usage

reaggregate_counts(...)

# Default S3 method
reaggregate_counts(
  bounds,
  counts,
  new_bounds,
  ...,
  population_bounds = NULL,
  population_weights = NULL
)

Arguments

...

Further arguments passed to or from other methods.

bounds

[numeric]

The current boundaries in (strictly) increasing order.

These correspond to the left-hand side of the intervals (i.e. the closed side of [x, y)).

Double values are coerced to integer prior to categorisation.
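This sketch assumes the coercion behaves like base R's `as.integer()`. In that case fractional bounds are truncated towards zero rather than rounded, so a bound of 4.9 is treated as 4:

```r
# Base R truncates towards zero when coercing doubles to integer
bounds <- c(0, 4.9, 10.2)
as.integer(bounds)
# returns c(0L, 4L, 10L)
```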

counts

[numeric]

Vector of counts corresponding to the intervals defined by bounds.
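Each bound is the closed left end of an interval, so the intervals that `counts` refers to can be read off from `bounds` alone. The last interval runs to `Inf`, as the example output shows. `bounds_to_labels()` is a hypothetical helper for illustration and is not part of the package.

```r
# Hypothetical helper: label the intervals implied by strictly increasing
# lower bounds, closing the final interval at Inf.
bounds_to_labels <- function(bounds) {
  stopifnot(all(diff(bounds) > 0))
  upper <- c(bounds[-1], Inf)
  sprintf("[%s, %s)", bounds, upper)
}

labels <- bounds_to_labels(c(0L, 1L, 5L, 15L))
labels
# returns c("[0, 1)", "[1, 5)", "[5, 15)", "[15, Inf)")
```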

new_bounds

[numeric]

The desired boundaries in (strictly) increasing order.

population_bounds

[numeric]

Interval boundaries for a known population weighting given by the population_weights argument.

population_weights

[numeric]

Population weightings corresponding to population_bounds.

Used to weight the output across the desired intervals.

If NULL (default), counts are divided in proportion to the interval sizes.
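The description above does not spell out exactly how `population_weights` is applied. One plausible reading is sketched below purely as an assumption. Each original count is split across the new intervals in proportion to the population in each piece rather than to the piece's width. The hypothetical helper `split_weighted()` takes population at single-year ages (`pop_ages`, `pop`), which simplifies the `population_bounds`/`population_weights` pair. The numbers are toy values.

```r
# Hypothetical helper illustrating one possible population weighting; it is
# not the package implementation. `pop_ages` are single-year ages and `pop`
# the population at each age (an assumption of this sketch).
split_weighted <- function(bounds, counts, new_bounds, pop_ages, pop) {
  old_upper <- c(bounds[-1], Inf)
  out <- numeric(length(new_bounds))
  # Index of the new interval containing each age (0 if below new_bounds[1])
  idx <- findInterval(pop_ages, new_bounds)
  for (i in seq_along(bounds)) {
    in_old <- pop_ages >= bounds[i] & pop_ages < old_upper[i]
    total <- sum(pop[in_old])  # must be > 0, otherwise the shares are NaN
    for (j in seq_along(new_bounds)) {
      share <- sum(pop[in_old & idx == j]) / total
      out[j] <- out[j] + counts[i] * share
    }
  }
  out
}

# Toy numbers: ages 0-9, with more people aged 0 than aged 1-4
weighted <- split_weighted(
  bounds = c(0L, 5L), counts = c(100, 50), new_bounds = c(0L, 1L),
  pop_ages = 0:9, pop = c(40, 15, 15, 15, 15, 10, 10, 10, 10, 10)
)
weighted
# returns c(40, 110); splitting by width alone would give c(20, 130)
```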

Value

A data frame with four columns: interval, lower_bound, upper_bound and the corresponding count.

Examples


# Reaggregating some data obtained from the 2021 UK census
head(pop_dat)
#>   area_code         area_name age_category   value
#> 1 K04000001 England and Wales       [0, 5) 3232100
#> 2 K04000001 England and Wales      [5, 10) 3524600
#> 3 K04000001 England and Wales     [10, 15) 3595900
#> 4 K04000001 England and Wales     [15, 20) 3394700
#> 5 K04000001 England and Wales     [20, 25) 3602100
#> 6 K04000001 England and Wales     [25, 30) 3901800

# Each row of the data is for the same region, so we only need the
# `age_category` and `value` columns
dat <- subset(pop_dat, select = c(age_category, value))

# Add the lower bounds to the data
dat <- transform(
    dat,
    lower_bound = as.integer(sub("\\[([0-9]+), .+\\)", "\\1", age_category))
)
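The lower-bound extraction can be checked on its own using a few labels copied from the `head(pop_dat)` output above. Here the closing parenthesis is escaped, so the pattern does not depend on how the regex engine treats an unmatched `)`.

```r
# Labels copied from the head(pop_dat) output above
age_category <- c("[0, 5)", "[5, 10)", "[25, 30)")
# Capture the digits between "[" and ","; the trailing ")" is escaped
lower_bound <- as.integer(sub("\\[([0-9]+), .+\\)", "\\1", age_category))
lower_bound
# returns c(0L, 5L, 25L)
```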

# Now recategorise to the desired age intervals
with(
    dat,
    reaggregate_counts(
        bounds = lower_bound,
        counts = value,
        new_bounds = c(0L, 1L, 5L, 15L, 25L, 45L, 65L)
    )
)
#> # A tibble: 7 × 4
#>   interval  lower upper    count
#>   <ord>     <int> <dbl>    <dbl>
#> 1 [0, 1)        0     1   646420
#> 2 [1, 5)        1     5  2585680
#> 3 [5, 15)       5    15  7120500
#> 4 [15, 25)     15    25  6996800
#> 5 [25, 45)     25    45 15787900
#> 6 [45, 65)     45    65 15396800
#> 7 [65, Inf)    65   Inf 11063400
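The first rows of this result can be checked by hand against `head(pop_dat)`. With no population weights, `[0, 1)` receives one fifth of the `[0, 5)` count and `[1, 5)` receives the remaining four fifths. The wider new intervals are plain sums of whole original intervals.

```r
# Counts for [0, 5), [5, 10), [10, 15), [15, 20), [20, 25) from head(pop_dat)
counts <- c(3232100, 3524600, 3595900, 3394700, 3602100)
expected <- c(
  counts[1] * 1 / 5,      # [0, 1): one fifth of [0, 5)
  counts[1] * 4 / 5,      # [1, 5): the other four fifths
  counts[2] + counts[3],  # [5, 15)
  counts[4] + counts[5]   # [15, 25)
)
expected
# returns c(646420, 2585680, 7120500, 6996800), matching rows 1-4 above
```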