Help pages

Convert breaks to an interval
Cut integer age vectors
Aggregated population data
Reaggregate age counts
Reaggregate age rates

Convert breaks to an interval

Description

breaks_to_interval() takes a specified set of breaks representing the left-hand limits of closed-open intervals, i.e. [x, y), and returns the corresponding intervals together with their lower and upper bounds. The resulting intervals span from the minimum break through to a specified max_upper.

Usage

breaks_to_interval(breaks, max_upper = Inf)

Arguments

breaks

[integerish].

One or more non-negative cut points in (strictly) increasing order.

These correspond to the left-hand side of the desired intervals (e.g. the closed side of [x, y)).

Double values are coerced to integer prior to categorisation.

max_upper

[numeric]

Represents the maximum upper bound for the resulting intervals.

Defaults to Inf.

Value

A tibble with an ordered factor column (interval), as well as columns corresponding to the explicit bounds (lower and upper). Note that even though these bounds are whole numbers, they are returned as numeric to allow the maximum upper bound to be given as Inf.

Examples

breaks_to_interval(breaks = c(0, 1, 5, 15, 25, 45, 65))

#> # A tibble: 7 × 3
#>   interval  lower upper
#>   <ord>     <dbl> <dbl>
#> 1 [0, 1)        0     1
#> 2 [1, 5)        1     5
#> 3 [5, 15)       5    15
#> 4 [15, 25)     15    25
#> 5 [25, 45)     25    45
#> 6 [45, 65)     45    65
#> 7 [65, Inf)    65   Inf

breaks_to_interval(
    breaks = c(0, 1, 5, 15, 25, 45, 65),
    max_upper = 100
)

#> # A tibble: 7 × 3
#>   interval  lower upper
#>   <ord>     <dbl> <dbl>
#> 1 [0, 1)        0     1
#> 2 [1, 5)        1     5
#> 3 [5, 15)       5    15
#> 4 [15, 25)     15    25
#> 5 [25, 45)     25    45
#> 6 [45, 65)     45    65
#> 7 [65, 100)    65   100
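
The mapping from breaks to intervals can be illustrated with a minimal base R sketch. The helper name sketch_breaks_to_interval() is hypothetical and is not part of the package; it returns a plain data frame rather than a tibble, and the check that max_upper exceeds the largest break is an assumption made for this sketch only.

```r
# Hypothetical helper (not the package implementation): derive the
# lower/upper bounds and interval labels from a vector of breaks.
sketch_breaks_to_interval <- function(breaks, max_upper = Inf) {
    breaks <- as.integer(breaks)          # doubles are coerced to integer
    stopifnot(
        length(breaks) >= 1L,
        !anyNA(breaks),
        all(breaks >= 0L),                # non-negative
        all(diff(breaks) > 0L),           # strictly increasing
        max_upper > max(breaks)           # assumption made for this sketch
    )
    lower <- as.numeric(breaks)
    upper <- c(lower[-1L], max_upper)     # each upper bound is the next break
    labels <- sprintf("[%s, %s)", lower, upper)
    data.frame(
        interval = factor(labels, levels = labels, ordered = TRUE),
        lower = lower,
        upper = upper
    )
}

out <- sketch_breaks_to_interval(c(0, 1, 5, 15, 25, 45, 65))
out
```

Storing the bounds as numeric rather than integer is what allows the final upper bound to be Inf.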

Cut integer age vectors

Description

cut_ages() provides categorisation of ages based on specified breaks which represent the left-hand interval limits. The resulting intervals span from the minimum break through to a specified max_upper and will always be closed on the left and open on the right. Because the final interval is open on the right, ages greater than or equal to max_upper will be returned as NA.

Usage

cut_ages(ages, breaks, max_upper = Inf)

Arguments

ages

[numeric].

Vector of age values.

Double values are coerced to integer prior to categorisation / aggregation.

Must not be NA.

breaks

[integerish].

One or more non-negative cut points in (strictly) increasing order.

These correspond to the left-hand side of the desired intervals (e.g. the closed side of [x, y)).

Double values are coerced to integer prior to categorisation.

max_upper

[numeric]

Represents the maximum upper bound for the resulting intervals.

Double values are rounded up to the nearest (numeric) integer.

Defaults to Inf.

Value

A tibble with an ordered factor column (interval), as well as columns corresponding to the explicit bounds (lower and upper). Internally both bound columns are stored as double but it can be taken as part of the function API that lower is coercible to integer without any coercion to NA_integer_. Similarly all values of upper apart from those corresponding to max_upper can be assumed coercible to integer (max_upper may or may not depending on the given argument).

Examples

cut_ages(ages = 0:9, breaks = c(0, 3, 5, 10))

#> # A tibble: 10 × 3
#>    interval lower upper
#>    <ord>    <dbl> <dbl>
#>  1 [0, 3)       0     3
#>  2 [0, 3)       0     3
#>  3 [0, 3)       0     3
#>  4 [3, 5)       3     5
#>  5 [3, 5)       3     5
#>  6 [5, 10)      5    10
#>  7 [5, 10)      5    10
#>  8 [5, 10)      5    10
#>  9 [5, 10)      5    10
#> 10 [5, 10)      5    10

cut_ages(ages = 0:9, breaks = c(0, 5))

#> # A tibble: 10 × 3
#>    interval lower upper
#>    <ord>    <dbl> <dbl>
#>  1 [0, 5)       0     5
#>  2 [0, 5)       0     5
#>  3 [0, 5)       0     5
#>  4 [0, 5)       0     5
#>  5 [0, 5)       0     5
#>  6 [5, Inf)     5   Inf
#>  7 [5, Inf)     5   Inf
#>  8 [5, Inf)     5   Inf
#>  9 [5, Inf)     5   Inf
#> 10 [5, Inf)     5   Inf

# Note the following is comparable to a call to
# cut(ages, right = FALSE, breaks = c(breaks, Inf))
ages <- seq.int(from = 0, by = 10, length.out = 10)
breaks <- c(0, 1, 10, 30)
cut_ages(ages, breaks)

#> # A tibble: 10 × 3
#>    interval  lower upper
#>    <ord>     <dbl> <dbl>
#>  1 [0, 1)        0     1
#>  2 [10, 30)     10    30
#>  3 [10, 30)     10    30
#>  4 [30, Inf)    30   Inf
#>  5 [30, Inf)    30   Inf
#>  6 [30, Inf)    30   Inf
#>  7 [30, Inf)    30   Inf
#>  8 [30, Inf)    30   Inf
#>  9 [30, Inf)    30   Inf
#> 10 [30, Inf)    30   Inf

# values at or above max_upper are treated as NA
cut_ages(ages = 0:10, breaks = c(0, 5), max_upper = 7)

#> # A tibble: 11 × 3
#>    interval lower upper
#>    <ord>    <dbl> <dbl>
#>  1 [0, 5)       0     5
#>  2 [0, 5)       0     5
#>  3 [0, 5)       0     5
#>  4 [0, 5)       0     5
#>  5 [0, 5)       0     5
#>  6 [5, 7)       5     7
#>  7 [5, 7)       5     7
#>  8 <NA>        NA    NA
#>  9 <NA>        NA    NA
#> 10 <NA>        NA    NA
#> 11 <NA>        NA    NA
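
For comparison, a similar categorisation can be sketched with base R's cut(). This is only an illustration of the left-closed, right-open behaviour and is not how cut_ages() is implemented; note that cut() formats its labels without a space after the comma and returns a factor rather than a tibble.

```r
ages <- 0:10
breaks <- c(0, 5)
max_upper <- 7

# right = FALSE gives left-closed, right-open intervals; any age outside
# [0, 7) falls in no interval and becomes NA.
interval <- cut(
    ages,
    breaks = c(breaks, max_upper),
    right = FALSE,
    ordered_result = TRUE
)

# Number of ages that could not be categorised (ages 7 to 10)
n_missing <- sum(is.na(interval))
n_missing
```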

Aggregated population data

Description

A dataset derived from the 2021 UK census containing population for different age categories across England and Wales.

Usage

pop_dat

Format

A data frame with 200 rows and 6 variables:

area_code: Unique area identifier
area_name: Unique area name
age_category: Left-closed and right-open age interval
value: count of individuals

Source

https://github.com/TimTaylor/census_pop_2021

Reaggregate age counts

Description

reaggregate_counts() converts counts over one interval range to another with optional weighting by a known population.

Usage

reaggregate_counts(
  bounds,
  counts,
  new_bounds,
  ...,
  population_bounds = NULL,
  population_weights = NULL
)

Arguments

bounds

[numeric]

The current boundaries in (strictly) increasing order.

These correspond to the left-hand side of the intervals (e.g. the closed side of [x, y)).

Double values are coerced to integer prior to categorisation.

counts

[numeric]

Vector of counts corresponding to the intervals defined by bounds.

new_bounds

[numeric]

The desired boundaries in (strictly) increasing order.

...

Further arguments passed to or from other methods.

population_bounds

[numeric]

Interval boundaries for a known population weighting given by the population_weights argument.

population_weights

[numeric]

Population weightings corresponding to population_bounds.

Used to weight the output across the desired intervals.

If NULL (default), counts are divided proportional to the interval sizes.

Value

A tibble with 4 columns: interval, lower, upper and a corresponding count.

Examples

# Reaggregating some data obtained from the 2021 UK census
head(pop_dat)

#>   area_code         area_name age_category   value
#> 1 K04000001 England and Wales       [0, 5) 3232100
#> 2 K04000001 England and Wales      [5, 10) 3524600
#> 3 K04000001 England and Wales     [10, 15) 3595900
#> 4 K04000001 England and Wales     [15, 20) 3394700
#> 5 K04000001 England and Wales     [20, 25) 3602100
#> 6 K04000001 England and Wales     [25, 30) 3901800

# Each row of the data is for the same region so we can drop some columns,
# keeping only the `age_category` and `value` columns
dat <- subset(pop_dat, select = c(age_category, value))

# Add the lower bounds to the data
dat <- transform(
    dat,
    lower_bound = as.integer(sub("\\[([0-9]+), .+\\)", "\\1", age_category))
)

# Now recategorise to the desired age intervals
with(
    dat,
    reaggregate_counts(
        bounds = lower_bound,
        counts = value,
        new_bounds = c(0, 1, 5, 15, 25, 45, 65)
    )
)

#> # A tibble: 7 × 4
#>   interval  lower upper    count
#>   <ord>     <dbl> <dbl>    <dbl>
#> 1 [0, 1)        0     1   646420
#> 2 [1, 5)        1     5  2585680
#> 3 [5, 15)       5    15  7120500
#> 4 [15, 25)     15    25  6996800
#> 5 [25, 45)     25    45 15787900
#> 6 [45, 65)     45    65 15396800
#> 7 [65, Inf)    65   Inf 11063400
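
The first two rows of this output can be checked by hand. The sketch below assumes that, with no population weights supplied, a count is spread evenly across each single year of age in its original interval; this is consistent with the figures above but is not a statement about the package internals.

```r
# Count for [0, 5) taken from the census data shown above
count_0_5 <- 3232100

# With no population weights, each single year of age gets an equal share
per_year <- count_0_5 / (5 - 0)

count_0_1 <- per_year * (1 - 0)   # share falling in [0, 1)
count_1_5 <- per_year * (5 - 1)   # share falling in [1, 5)

# [5, 15) simply combines the original [5, 10) and [10, 15) counts
count_5_15 <- 3524600 + 3595900

c(count_0_1, count_1_5, count_5_15)
```

The three values agree with the 646420, 2585680 and 7120500 reported in the tibble.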

Reaggregate age rates

Description

reaggregate_rates() converts rates over one interval range to another with optional weighting by a known population.

Usage

reaggregate_rates(
  bounds,
  rates,
  new_bounds,
  ...,
  population_bounds = NULL,
  population_weights = NULL
)

Arguments

bounds

[numeric]

The current boundaries in (strictly) increasing order.

These correspond to the left-hand side of the intervals (e.g. the closed side of [x, y)).

Double values are coerced to integer prior to categorisation.

rates

[numeric]

Vector of rates corresponding to the intervals defined by bounds.

new_bounds

[numeric]

The desired boundaries in (strictly) increasing order.

...

Further arguments passed to or from other methods.

population_bounds

[numeric]

Interval boundaries for a known population weighting given by the population_weights argument.

population_weights

[numeric]

Population weightings corresponding to population_bounds.

Used to weight the output across the desired intervals.

If NULL (default), rates are divided proportional to the interval sizes.

Value

A tibble with 4 columns: interval, lower, upper and a corresponding rate.

Examples

reaggregate_rates(
    bounds = c(0, 5, 10),
    rates = c(0.1, 0.2, 0.3),
    new_bounds = c(0, 2, 7, 10),
    population_bounds = c(0, 2, 5, 7, 10),
    population_weights = c(100, 200, 50, 150, 100)
)

#> # A tibble: 4 × 4
#>   interval  lower upper  rate
#>   <ord>     <dbl> <dbl> <dbl>
#> 1 [0, 2)        0     2  0.1 
#> 2 [2, 7)        2     7  0.12
#> 3 [7, 10)       7    10  0.2 
#> 4 [10, Inf)    10   Inf  0.3
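
The rate for [2, 7) in this output can be reproduced as a population-weighted mean of the original rates. The sketch below is a hand check under that assumption, using only base R; it is not taken from the package source.

```r
# [2, 7) overlaps two of the original rate intervals:
#   [2, 5) lies inside [0, 5)  with rate 0.1 and population weight 200
#   [5, 7) lies inside [5, 10) with rate 0.2 and population weight 50
piece_rates <- c(0.1, 0.2)
piece_pop   <- c(200, 50)

# Population-weighted average: (200 * 0.1 + 50 * 0.2) / (200 + 50)
rate_2_7 <- weighted.mean(piece_rates, w = piece_pop)
rate_2_7
```

The remaining rows each sit within a single original interval, so their rates carry over unchanged (0.1, 0.2 and 0.3).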