ympes provides a collection of lightweight helper functions (imps) both for interactive use and for inclusion within other packages. It’s my attempt to save some functionality that would otherwise get lost in a script somewhere on my computer. To that end it’s a bit of a hodgepodge of things that I’ve found useful at one time or another and, more importantly, remembered to include here!
Visualising palettes
I often want to quickly see what a palette looks like to ensure I can
distinguish the different colours. The imaginatively named
plot_palette()
thus provides a quick overview
plot_palette(c("#5FE756", "red", "black"))
We can make the plot square(ish) by setting the argument
square = TRUE
. A nice side effect of this is the automatic
adjusting of labels to account for the underlying colour
plot_palette(palette.colors(palette = "R4"), square = TRUE)
Finding strings
Sometimes you just want to find rows of a data frame where a
particular string occurs. greprows()
searches for pattern
matches within a data frames columns and returns the related rows or row
indices. It is a thin wrapper around a subset, lapply and reduce
grep()
based approach.
dat <- data.frame(
first = letters,
second = factor(rev(LETTERS)),
third = "Q"
)
greprows(dat, "A|b")
## [1] 2 26
grepvrows() is identical to greprows() except with the default value = TRUE.
grepvrows(dat, "A|b")
## first second third
## 2 b Y Q
## 26 z A Q
greprows(dat, "A|b", value = TRUE)
## first second third
## 2 b Y Q
## 26 z A Q
greplrows() returns a logical vector (match or not for each row of dat).
greplrows(dat, "A|b", ignore.case = TRUE)
## [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] TRUE TRUE
Capturing strings
One of my favourite functions in is strcapture()
. This
function allows you to extract the captured elements of a regular
expression in to a tabular data structure. Being able to parse input
strings from a file to correctly split columns in a data frame in a
single function call feels so elegant.
To illustrate this, we generate some synthetic movement data which we pretend to have loaded in from a file. Each entry has the form “Name-Direction-Value” with the first two entries representing character strings and, the last entry, an integer value.
movements <- function(length) {
x <- lapply(
list(c("Bob", "Mary", "Rose"), c("Up", "Down", "Right", "Left"), 1:10),
sample,
size = length,
replace = TRUE
)
do.call(paste, c(x, sep = "-"))
}
# just a small sample to begin with
(dat <- movements(3))
## [1] "Mary-Up-2" "Bob-Right-7" "Bob-Left-5"
pattern <- "([[:alpha:]]+)-([[:alpha:]]+)-([[:digit:]]+)"
proto <- data.frame(Name = "", Direction = "", Value = 1L)
strcapture(pattern, dat, proto = proto, perl = TRUE)
## Name Direction Value
## 1 Mary Up 2
## 2 Bob Right 7
## 3 Bob Left 5
For small (define as you wish) data sets this works fine.
Unfortunately as the number of entries increases the performance decays
(see https://bugs.r-project.org/show_bug.cgi?id=18728 for a
more detailed analysis). fstrapture()
attempts to improve
upon this by utilising an approach I saw implemented by Toby Hocking in
the nc and the
function nc::capture_first_vec()
.
# Now a larger number of strings
dat <- movements(1e5)
(t <- system.time(r <- strcapture(pattern, dat, proto = proto, perl = TRUE)))
## user system elapsed
## 1.498 0.036 1.534
(t2 <- system.time(r2 <- fstrcapture(dat, pattern, proto = proto)))
## user system elapsed
## 0.034 0.000 0.035
t[["elapsed"]] / t2[["elapsed"]]
## [1] 43.82857
As well as the improved performance you will notice two other
differences between the two function signatures. Firstly, to make things
more pipeable, the data parameter x
appears before the
pattern
parameter. Secondly, fstrcapture()
works only with Perl-compatible regular expressions.
Combining values for lazy people
cc()
is for those of us that get fed up typing quotation
marks. It accepts either comma-separated, unquoted names that you wish
to quote or, a length one character vector that you wish to split by
whitespace. Intended mainly for interactive use only, an example is
likely more enlightening than my description
cc(dale, audrey, laura, hawk)
## [1] "dale" "audrey" "laura" "hawk"
cc("dale audrey laura hawk")
## [1] "dale" "audrey" "laura" "hawk"
Avoid overwriting data frame columns
Sometimes I find myself needing to add a temporary variable to a data
frame without kaboshing a variable already present.
new_name()
provides a simple wrapper around
tempfile()
that generates random column names and checks
for their suitability. Not normally the sort of thing I’d wrap but I
find myself writing the same code a lot so here we are
new_name(mtcars)
## [1] "new17af364eeaa3"
new_name(mtcars, 3L)
## [1] "new17af4f299dba" "new17af6204eb1" "new17af3865bbd"
Assertions (Experimental)
Where better place for yet another implementation of bespoke
assertion functions than a small helper package!. Motivated by
vctrs::vec_assert()
but with lower overhead at a cost of
less flexibility. The assertion functions in ympes are designed to make
it easy to identify the top level calling function whether used within a
user facing function or internally. They are somewhat experimental in
nature and should be treated accordingly!
Currently implemented are:
assert_character()
, assert_chr()
,
assert_character_not_na()
,
assert_chr_not_na()
,
assert_scalar_character()
,
assert_scalar_chr()
,
assert_scalar_character_not_na()
,
assert_scalar_chr_not_na()
, assert_string()
,
assert_string_not_na()
,
assert_double()
, assert_dbl()
,
assert_double_not_na()
, assert_dbl_not_na()
,
assert_scalar_double()
, assert_scalar_dbl()
,
assert_scalar_double_not_na()
,
assert_scalar_dbl_not_na()
,
assert_integer()
, assert_int()
,
assert_integer_not_na()
, assert_int_not_na()
,
assert_scalar_integer()
, assert_scalar_int()
,
assert_scalar_integer_not_na()
,
assert_scalar_int_not_na()
,
assert_integerish()
, assert_whole()
assert_scalar_whole()
,
assert_scalar_integerish()
,
assert_logical()
, assert_lgl()
,
assert_logical_not_na()
, assert_lgl_not_na()
,
assert_scalar_logical()
, assert_scalar_lgl()
,
assert_scalar_logical_not_na()
,
assert_scalar_lgl_not_na()
, assert_bool()
,
assert_boolean()
,
assert_list()
, assert_data_frame()
,
assert_negative()
, assert_negative_or_na()
,
assert_positive()
, assert_positive_or_na()
,
assert_non_negative()
,
assert_non_negative_or_na()
,
assert_non_positive()
,
assert_non_positive_or_na()
,
assert_numeric()
, assert_num()
,
assert_numeric_not_na()
, assert_num_not_na()
,
assert_scalar_numeric()
, assert_scalar_num()
,
assert_scalar_numeric_not_na()
,
assert_scalar_num_not_na()
,
Hopefully most of these are self-explanatory but there is some opinionated (currently undocumented) handling of NA so care should be taken to inspect the underlying source code before using.
Currently these assertions return NULL if the assertion succeeds. Otherwise an error of class “ympes-error” (with optional subclass if supplied when calling the assertion).
# Use in a user facing function
fun <- function(i, d, l, chr, b) {
assert_scalar_int(i)
TRUE
}
fun(i=1L)
## [1] TRUE
try(fun(i="cat"))
## Error in fun() : `i` must be an integer vector of length 1.
# Use in an internal function
internal_fun <- function(a) {
assert_string(
a,
.arg = deparse(substitute(x)),
.call = sys.call(-1L),
.subclass = "example_error"
)
TRUE
}
external_fun <- function(b) {
internal_fun(a=b)
}
external_fun(b="cat")
## [1] TRUE
try(external_fun(b = letters))
## Error in external_fun() : `x` must be a character vector of length 1.
tryCatch(external_fun(b = letters), error = class)
## [1] "example_error" "ympes-error" "error" "condition"