Help pages

-- B -- breaks_to_interval()

-- C -- cut_ages()

-- P -- pop_dat

-- R -- reaggregate_counts() reaggregate_counts.default() reaggregate_rates() reaggregate_rates.default()

Convert breaks to an interval

Description

breaks_to_interval() takes a set of breaks representing the left-hand limits of left-closed, right-open intervals, i.e. [x, y), and returns the corresponding intervals together with their lower and upper bounds. The resulting intervals span from the minimum break through to a specified max_upper.

Usage

breaks_to_interval(breaks, max_upper = Inf)

Arguments

breaks

[integerish].

1 or more non-negative cut points in strictly increasing order.

These correspond to the left-hand side of the desired intervals (e.g. the closed side of [x, y)).

Double values are coerced to integer prior to categorisation.

max_upper

[numeric]

Represents the maximum upper bound for the resulting intervals.

Defaults to Inf.

Value

A tibble with an ordered factor column (interval), as well as columns corresponding to the explicit bounds (lower_bound and upper_bound). Note that even though these bounds are whole numbers, they are returned as numeric to allow the maximum upper bound to be given as Inf.

Examples

breaks_to_interval(breaks = c(0, 1, 5, 15, 25, 45, 65))
#> # A tibble: 7 × 3
#>   interval  lower_bound upper_bound
#>   <ord>           <dbl>       <dbl>
#> 1 [0, 1)              0           1
#> 2 [1, 5)              1           5
#> 3 [5, 15)             5          15
#> 4 [15, 25)           15          25
#> 5 [25, 45)           25          45
#> 6 [45, 65)           45          65
#> 7 [65, Inf)          65         Inf
breaks_to_interval(
    breaks = c(0, 1, 5, 15, 25, 45, 65),
    max_upper = 100
)
#> # A tibble: 7 × 3
#>   interval  lower_bound upper_bound
#>   <ord>           <dbl>       <dbl>
#> 1 [0, 1)              0           1
#> 2 [1, 5)              1           5
#> 3 [5, 15)             5          15
#> 4 [15, 25)           15          25
#> 5 [25, 45)           25          45
#> 6 [45, 65)           45          65
#> 7 [65, 100)          65         100
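
The mapping from breaks to bounds shown above can be sketched in base R. The helper sketch_breaks_to_interval() below is hypothetical and is not the package's implementation; it assumes the breaks are already whole numbers in strictly increasing order and returns a plain data frame rather than a tibble.

```r
# Hypothetical helper (not part of the package): rebuild the table by hand.
sketch_breaks_to_interval <- function(breaks, max_upper = Inf) {
    # Assumption: breaks are whole numbers in strictly increasing order.
    stopifnot(
        is.numeric(breaks),
        length(breaks) >= 1L,
        !is.unsorted(breaks, strictly = TRUE)
    )
    lower <- as.numeric(breaks)
    # Each interval ends where the next one begins; the last ends at max_upper.
    upper <- c(lower[-1L], max_upper)
    labels <- sprintf("[%s, %s)", lower, upper)
    data.frame(
        interval = factor(labels, levels = labels, ordered = TRUE),
        lower_bound = lower,
        upper_bound = upper
    )
}

sketch <- sketch_breaks_to_interval(c(0, 1, 5, 15, 25, 45, 65))
sketch
```

Keeping both bound columns as double is what lets the final upper bound be Inf, which has no integer representation in R.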

Cut integer age vectors

Description

cut_ages() categorises ages based on specified breaks, which represent the left-hand interval limits. The resulting intervals span from the minimum break through to a specified max_upper and are always closed on the left and open on the right. Ages below the minimum break, or at or above max_upper, are returned as NA.

Usage

cut_ages(ages, breaks, max_upper = Inf)

Arguments

ages

[numeric].

Vector of age values.

Double values are coerced to integer prior to categorisation / aggregation.

Must not be NA.

breaks

[integerish].

1 or more non-negative cut points in strictly increasing order.

These correspond to the left-hand side of the desired intervals (e.g. the closed side of [x, y)).

Double values are coerced to integer prior to categorisation.

max_upper

[numeric]

Represents the maximum upper bound for the resulting intervals.

Double values are rounded up to the nearest (numeric) integer.

Defaults to Inf.

Value

A data frame with an ordered factor column (interval), as well as columns corresponding to the explicit bounds (lower_bound and upper_bound). Internally both bound columns are stored as double but it can be taken as part of the function API that lower_bound is coercible to integer without any coercion to NA_integer_. Similarly all values of upper_bound apart from those corresponding to max_upper can be assumed coercible to integer (max_upper may or may not depending on the given argument).
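
The coercion guarantee can be illustrated with a small base R sketch. The data frame res below is a hand-built stand-in for a returned table, not actual package output, and the variable names are assumptions made for illustration.

```r
# Hand-built stand-in for a returned table (illustrative values only).
res <- data.frame(lower_bound = c(0, 5), upper_bound = c(5, Inf))

# lower_bound always converts cleanly to integer.
lower_int <- as.integer(res$lower_bound)

# upper_bound converts cleanly except where it equals an infinite max_upper:
# as.integer(Inf) yields NA_integer_ with a warning, suppressed here.
upper_int <- suppressWarnings(as.integer(res$upper_bound))

lower_int
upper_int
```

Code that needs integer bounds may therefore want to handle the final upper bound separately when max_upper is left at its default of Inf.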

Examples

cut_ages(ages = 0:9, breaks = c(0, 3, 5, 10))
#> # A tibble: 10 × 3
#>    interval lower_bound upper_bound
#>    <ord>          <dbl>       <dbl>
#>  1 [0, 3)             0           3
#>  2 [0, 3)             0           3
#>  3 [0, 3)             0           3
#>  4 [3, 5)             3           5
#>  5 [3, 5)             3           5
#>  6 [5, 10)            5          10
#>  7 [5, 10)            5          10
#>  8 [5, 10)            5          10
#>  9 [5, 10)            5          10
#> 10 [5, 10)            5          10
cut_ages(ages = 0:9, breaks = c(0, 5))
#> # A tibble: 10 × 3
#>    interval lower_bound upper_bound
#>    <ord>          <dbl>       <dbl>
#>  1 [0, 5)             0           5
#>  2 [0, 5)             0           5
#>  3 [0, 5)             0           5
#>  4 [0, 5)             0           5
#>  5 [0, 5)             0           5
#>  6 [5, Inf)           5         Inf
#>  7 [5, Inf)           5         Inf
#>  8 [5, Inf)           5         Inf
#>  9 [5, Inf)           5         Inf
#> 10 [5, Inf)           5         Inf
# Note the following is comparable to a call to
# cut(ages, right = FALSE, breaks = c(breaks, Inf))
ages <- seq.int(from = 0, by = 10, length.out = 10)
breaks <- c(0, 1, 10, 30)
cut_ages(ages, breaks)
#> # A tibble: 10 × 3
#>    interval  lower_bound upper_bound
#>    <ord>           <dbl>       <dbl>
#>  1 [0, 1)              0           1
#>  2 [10, 30)           10          30
#>  3 [10, 30)           10          30
#>  4 [30, Inf)          30         Inf
#>  5 [30, Inf)          30         Inf
#>  6 [30, Inf)          30         Inf
#>  7 [30, Inf)          30         Inf
#>  8 [30, Inf)          30         Inf
#>  9 [30, Inf)          30         Inf
#> 10 [30, Inf)          30         Inf
# values at or above max_upper are treated as NA
cut_ages(ages = 0:10, breaks = c(0, 5), max_upper = 7)
#> # A tibble: 11 × 3
#>    interval lower_bound upper_bound
#>    <ord>          <dbl>       <dbl>
#>  1 [0, 5)             0           5
#>  2 [0, 5)             0           5
#>  3 [0, 5)             0           5
#>  4 [0, 5)             0           5
#>  5 [0, 5)             0           5
#>  6 [5, 7)             5           7
#>  7 [5, 7)             5           7
#>  8 <NA>              NA          NA
#>  9 <NA>              NA          NA
#> 10 <NA>              NA          NA
#> 11 <NA>              NA          NA
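
For comparison, the base R call mentioned in the comment above can be run directly. This sketch uses only base::cut(); the variable names are chosen for illustration, and the labels cut() produces are formatted slightly differently from those of cut_ages().

```r
ages <- seq.int(from = 0, by = 10, length.out = 10)
breaks <- c(0, 1, 10, 30)

# Left-closed, right-open intervals, with Inf appended as the final upper limit.
base_cut <- cut(ages, breaks = c(breaks, Inf), right = FALSE, ordered_result = TRUE)
base_cut

# A finite final limit plays the role of max_upper: ages at or above it
# fall outside every interval and become NA.
capped_cut <- cut(0:10, breaks = c(0, 5, 7), right = FALSE)
capped_cut
```

Unlike cut_ages(), cut() returns only the factor, so the explicit bound columns would have to be derived separately.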

Aggregated population data

Description

A dataset derived from the 2021 UK census containing population for different age categories across England and Wales.

Usage

pop_dat

Format

A data frame with 200 rows and 6 variables:

area_code

Unique area identifier

area_name

Unique area name

age_category

Left-closed and right-open age interval

value

Count of individuals

Source

https://github.com/TimTaylor/census_pop_2021

Reaggregate age counts

Description

reaggregate_counts() converts counts over one interval range to another with optional weighting by a known population.

Usage

reaggregate_counts(...)

## Default S3 method:
reaggregate_counts(
  bounds,
  counts,
  new_bounds,
  ...,
  population_bounds = NULL,
  population_weights = NULL
)

Arguments

...

Further arguments passed to or from other methods.

bounds

[numeric]

The current boundaries in strictly increasing order.

These correspond to the left-hand side of the intervals (e.g. the closed side of [x, y)).

Double values are coerced to integer prior to categorisation.

counts

[numeric]

Vector of counts corresponding to the intervals defined by bounds.

new_bounds

[numeric]

The desired boundaries in strictly increasing order.

population_bounds

[numeric]

Interval boundaries for a known population weighting given by the population_weights argument.

population_weights

[numeric]

Population weightings corresponding to population_bounds.

Used to weight the output across the desired intervals.

If NULL (default), counts are divided proportional to the interval sizes.

Value

A data frame with 4 columns: interval, lower_bound, upper_bound and a corresponding count.

Examples

# Reaggregating some data obtained from the 2021 UK census
head(pop_dat)
#>   area_code         area_name age_category   value
#> 1 K04000001 England and Wales       [0, 5) 3232100
#> 2 K04000001 England and Wales      [5, 10) 3524600
#> 3 K04000001 England and Wales     [10, 15) 3595900
#> 4 K04000001 England and Wales     [15, 20) 3394700
#> 5 K04000001 England and Wales     [20, 25) 3602100
#> 6 K04000001 England and Wales     [25, 30) 3901800
# Each row of the data is for the same region so we can keep only the
# `age_category` and `value` columns
dat <- subset(pop_dat, select = c(age_category, value))

# Add the lower bounds to the data
dat <- transform(
    dat,
    lower_bound = as.integer(sub("\\[([0-9]+), .+\\)", "\\1", age_category))
)

# Now recategorise to the desired age intervals
with(
    dat,
    reaggregate_counts(
        bounds = lower_bound,
        counts = value,
        new_bounds = c(0, 1, 5, 15, 25, 45, 65)
    )
)
#> # A tibble: 7 × 4
#>   interval  lower upper    count
#>   <ord>     <dbl> <dbl>    <dbl>
#> 1 [0, 1)        0     1   646420
#> 2 [1, 5)        1     5  2585680
#> 3 [5, 15)       5    15  7120500
#> 4 [15, 25)     15    25  6996800
#> 5 [25, 45)     25    45 15787900
#> 6 [45, 65)     45    65 15396800
#> 7 [65, Inf)    65   Inf 11063400
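
The first three rows of this output can be checked by hand. The sketch below uses only base R arithmetic on values copied from head(pop_dat); it illustrates the proportional split used when population_weights is NULL and is not the package's internal code. All variable names are assumptions made for illustration.

```r
# Count for the original [0, 5) interval, copied from head(pop_dat).
count_0_5 <- 3232100

# Widths of the new intervals [0, 1) and [1, 5) that fall inside [0, 5).
widths <- c(1, 4)

# Without population weights the count is shared in proportion to width.
split_counts <- count_0_5 * widths / sum(widths)
split_counts

# New intervals that cover whole original intervals simply sum their counts:
# [5, 15) is [5, 10) plus [10, 15).
count_5_15 <- 3524600 + 3595900
count_5_15
```

The results agree with the counts reported for [0, 1), [1, 5) and [5, 15) in the table above.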

Reaggregate age rates

Description

reaggregate_rates() converts rates over one interval range to another with optional weighting by a known population.

Usage

reaggregate_rates(...)

## Default S3 method:
reaggregate_rates(
  bounds,
  rates,
  new_bounds,
  ...,
  population_bounds = NULL,
  population_weights = NULL
)

Arguments

...

Further arguments passed to or from other methods.

bounds

[numeric]

The current boundaries in strictly increasing order.

These correspond to the left-hand side of the intervals (e.g. the closed side of [x, y)).

Double values are coerced to integer prior to categorisation.

rates

[numeric]

Vector of rates corresponding to the intervals defined by bounds.

new_bounds

[numeric]

The desired boundaries in strictly increasing order.

population_bounds

[numeric]

Interval boundaries for a known population weighting given by the population_weights argument.

population_weights

[numeric]

Population weightings corresponding to population_bounds.

Used to weight the output across the desired intervals.

If NULL (default), rates are divided proportional to the interval sizes.

Value

A data frame with 4 columns: interval, lower_bound, upper_bound and a corresponding rate.

Examples

reaggregate_rates(
    bounds = c(0, 5, 10),
    rates = c(0.1, 0.2, 0.3),
    new_bounds = c(0, 2, 7, 10),
    population_bounds = c(0, 2, 5, 7, 10),
    population_weights = c(100, 200, 50, 150, 100)
)
#> # A tibble: 4 × 4
#>   interval  lower upper  rate
#>   <ord>     <dbl> <dbl> <dbl>
#> 1 [0, 2)        0     2  0.1 
#> 2 [2, 7)        2     7  0.12
#> 3 [7, 10)       7    10  0.2 
#> 4 [10, Inf)    10   Inf  0.3
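
The rate for [2, 7) in this output is consistent with a population-weighted average of the original rates. The sketch below reproduces that figure with base R only; it is an illustration under that assumption rather than a description of the package internals, and the variable names are hypothetical.

```r
# [2, 7) overlaps the original [0, 5) (rate 0.1) and [5, 10) (rate 0.2).
overlap_rates <- c(0.1, 0.2)

# Population weights for the overlapping pieces [2, 5) and [5, 7),
# taken from population_weights in the call above.
overlap_weights <- c(200, 50)

# Weighted mean: (0.1 * 200 + 0.2 * 50) / (200 + 50)
rate_2_7 <- weighted.mean(overlap_rates, overlap_weights)
rate_2_7
```

The other new intervals each sit inside a single original interval, so their rates carry over unchanged.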