cut_ages() categorises ages based on specified breaks, which represent the left-hand interval limits. The resulting intervals span from the minimum break through to a specified max_upper and are always closed on the left and open on the right. Ages below the minimum break, or at or above max_upper, are returned as NA.
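The left-closed, right-open rule can be illustrated with base R alone. The following is a minimal sketch, not the package implementation: it assumes the bounds can be derived by appending max_upper to breaks and looking up each age with findInterval(). The variable names (limits, idx) are illustrative only.

```r
# Base R sketch of the documented semantics (not the package source).
ages <- 0:10
breaks <- c(2L, 5L)
max_upper <- 7

# Treat max_upper as the final right-hand limit.
limits <- c(breaks, max_upper)

# findInterval() returns 0 for values below the first limit and
# length(limits) for values at or above the last one.
idx <- findInterval(ages, limits)
idx[idx == 0L | idx == length(limits)] <- NA

lower_bound <- limits[idx]
upper_bound <- limits[idx + 1L]

data.frame(ages, lower_bound, upper_bound)
```

Here ages 0 and 1 fall below the minimum break, and ages 7 to 10 are at or above max_upper, so both groups receive NA bounds.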
Arguments
- ages [numeric]. Vector of age values. Double values are coerced to integer prior to categorisation / aggregation. Must not be NA.
- breaks [integerish]. One or more non-negative cut points in strictly increasing order. These correspond to the left-hand side of the desired intervals (e.g. the closed side of [x, y)). Double values are coerced to integer prior to categorisation.
- max_upper [numeric]. Represents the maximum upper bound for the resulting intervals. Double values are rounded up to the nearest (numeric) integer. Defaults to Inf.
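The argument rules above can be sketched as a small base R helper. This is a hypothetical function (normalise_inputs is not part of the package), and it assumes that "coerced to integer" means as.integer(), which truncates toward zero; the package may handle this differently.

```r
# Hypothetical helper, not part of the package: mirrors the documented
# argument rules using base R only.
normalise_inputs <- function(ages, breaks, max_upper = Inf) {
  stopifnot(
    is.numeric(ages), !anyNA(ages),
    is.numeric(breaks), length(breaks) >= 1L, !anyNA(breaks),
    all(breaks >= 0),
    !is.unsorted(breaks, strictly = TRUE),  # strictly increasing
    is.numeric(max_upper), length(max_upper) == 1L
  )
  list(
    ages = as.integer(ages),        # assumption: truncation toward zero
    breaks = as.integer(breaks),
    max_upper = ceiling(max_upper)  # stays double, so Inf is preserved
  )
}

normalised <- normalise_inputs(c(4.9, 5, 17.2), breaks = c(0, 5), max_upper = 17.5)
str(normalised)
```

Keeping max_upper as a double rather than an integer is what allows the default of Inf to pass through unchanged.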
Value
A data frame with an ordered factor column (interval), as well as columns corresponding to the explicit bounds (lower_bound and upper_bound).
Internally both bound columns are stored as double, but it can be taken as part of the function API that lower_bound is coercible to integer without any coercion to NA_integer_. Similarly, all values of upper_bound apart from those corresponding to max_upper can be assumed coercible to integer (max_upper may or may not be, depending on the given argument).
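As an illustration of why the bounds are stored as double, the sketch below coerces them to integer by hand. The data frame is a hand-built stand-in for the documented return shape, not real package output, and the names out, finite, lower_int and upper_int are illustrative only.

```r
# Hand-built stand-in for the documented return shape (not real package output).
out <- data.frame(
  lower_bound = c(0, 5, 5),
  upper_bound = c(5, Inf, Inf)
)

# Lower bounds come from the integer breaks, so this coercion is lossless.
lower_int <- as.integer(out$lower_bound)

# Inf has no integer representation, so only coerce the finite upper bounds
# and leave the rest as NA_integer_.
finite <- is.finite(out$upper_bound)
upper_int <- rep(NA_integer_, nrow(out))
upper_int[finite] <- as.integer(out$upper_bound[finite])
```

Calling as.integer() directly on an upper_bound column containing Inf would instead produce NA with a coercion warning, which is the case the final sentence above cautions about.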
Examples
cut_ages(ages = 0:9, breaks = c(0L, 3L, 5L, 10L))
#> # A tibble: 10 × 3
#>    interval lower_bound upper_bound
#>    <ord>          <dbl>       <dbl>
#>  1 [0, 3)             0           3
#>  2 [0, 3)             0           3
#>  3 [0, 3)             0           3
#>  4 [3, 5)             3           5
#>  5 [3, 5)             3           5
#>  6 [5, 10)            5          10
#>  7 [5, 10)            5          10
#>  8 [5, 10)            5          10
#>  9 [5, 10)            5          10
#> 10 [5, 10)            5          10
cut_ages(ages = 0:9, breaks = c(0L, 5L))
#> # A tibble: 10 × 3
#>    interval lower_bound upper_bound
#>    <ord>          <dbl>       <dbl>
#>  1 [0, 5)             0           5
#>  2 [0, 5)             0           5
#>  3 [0, 5)             0           5
#>  4 [0, 5)             0           5
#>  5 [0, 5)             0           5
#>  6 [5, Inf)           5         Inf
#>  7 [5, Inf)           5         Inf
#>  8 [5, Inf)           5         Inf
#>  9 [5, Inf)           5         Inf
#> 10 [5, Inf)           5         Inf
# Note the following is comparable to a call to
# cut(ages, right = FALSE, breaks = c(breaks, Inf))
ages <- seq.int(from = 0, by = 10, length.out = 10)
breaks <- c(0, 1, 10, 30)
cut_ages(ages, breaks)
#> # A tibble: 10 × 3
#>    interval  lower_bound upper_bound
#>    <ord>           <dbl>       <dbl>
#>  1 [0, 1)              0           1
#>  2 [10, 30)           10          30
#>  3 [10, 30)           10          30
#>  4 [30, Inf)          30         Inf
#>  5 [30, Inf)          30         Inf
#>  6 [30, Inf)          30         Inf
#>  7 [30, Inf)          30         Inf
#>  8 [30, Inf)          30         Inf
#>  9 [30, Inf)          30         Inf
#> 10 [30, Inf)          30         Inf
# values at or above max_upper are returned as NA
cut_ages(ages = 0:10, breaks = c(0, 5), max_upper = 7)
#> # A tibble: 11 × 3
#>    interval lower_bound upper_bound
#>    <ord>          <dbl>       <dbl>
#>  1 [0, 5)             0           5
#>  2 [0, 5)             0           5
#>  3 [0, 5)             0           5
#>  4 [0, 5)             0           5
#>  5 [0, 5)             0           5
#>  6 [5, 7)             5           7
#>  7 [5, 7)             5           7
#>  8 NA                NA          NA
#>  9 NA                NA          NA
#> 10 NA                NA          NA
#> 11 NA                NA          NA