#420 stri_sprintf skeleton
gagolews committed May 18, 2021
1 parent d143d2f commit b44cc59
Showing 14 changed files with 285 additions and 16 deletions.
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -169,6 +169,7 @@ export(stri_pad_left)
export(stri_pad_right)
export(stri_paste)
export(stri_paste_list)
export(stri_printf)
export(stri_rand_lipsum)
export(stri_rand_shuffle)
export(stri_rand_strings)
@@ -206,12 +207,14 @@ export(stri_split_fixed)
export(stri_split_lines)
export(stri_split_lines1)
export(stri_split_regex)
export(stri_sprintf)
export(stri_startswith)
export(stri_startswith_charclass)
export(stri_startswith_coll)
export(stri_startswith_fixed)
export(stri_stats_general)
export(stri_stats_latex)
export(stri_string_format)
export(stri_sub)
export(stri_sub_all)
export(stri_sub_all_replace)
20 changes: 17 additions & 3 deletions NEWS
@@ -1,6 +1,20 @@
# What Is New in *stringi*



## 1.6.3-devel (2021-xx-yy)

* TODO ... [NEW FEATURE] #420: `stri_sprintf` (alias: `stri_string_format`)
is a Unicode-aware replacement for the base `sprintf`:
it adds a customised handling of `NA`s (on demand) and
computing field size based on code point width.
Moreover, `stri_printf` can be used to display formatted strings
conveniently.

* TODO ... [NEW FEATURE] #434: `stri_datetime_format` and `stri_datetime_parse`
are now also vectorised with respect to the `format` argument.


## 1.6.2 (2021-05-14)

* [BACKWARD INCOMPATIBILITY] In `stri_enc_list()`,
@@ -12,13 +26,13 @@
* [NEW FEATURE] #428: In `stri_flatten`, `na_empty=NA` now omits missing values.

* [BUILD TIME] #431: Pre-4.9.0 GCC has `::max_align_t`,
but not `std::max_align_t`, added a (possible) workaround, see the INSTALL
but not `std::max_align_t`, added a (possible) workaround, see the `INSTALL`
file.

* [BUGFIX] #429: `stri_width()` misclassified the width of certain
code points (including grave accent, Eszett, etc.);
General category Sk (Symbol, modifier) is no longer of width 0,
UCHAR_EAST_ASIAN_WIDTH of U_EA_AMBIGUOUS is no longer of width 2.
General category *Sk* (Symbol, modifier) is no longer of width 0,
`UCHAR_EAST_ASIAN_WIDTH` of `U_EA_AMBIGUOUS` is no longer of width 2.

* [BUGFIX] #354: `ALTREP` `CHARSXP`s were not copied, and thus could have been
garbage collected in the so-called meanwhile (with thanks to @jimhester).
2 changes: 1 addition & 1 deletion R/pad.R
@@ -67,7 +67,7 @@
#' points be used instead of the total code point width
#' (see \code{\link{stri_width}})?
#'
#' @return Returns a character vector.
#' @return These functions return a character vector.
#'
#' @rdname stri_pad
#' @examples
94 changes: 90 additions & 4 deletions R/sprintf.R
@@ -31,12 +31,98 @@
## EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


#' @title
#' Format Strings
#'
#' @description
#' A Unicode-aware replacement for the built-in \code{\link[base]{sprintf}}
#' function. Moreover, \code{stri_printf} displays/writes formatted strings.
#'
#' @details
#' Vectorized over \code{format} and all vectors passed via \code{...}.
#'
#' \code{stri_string_format} is a synonym for \code{stri_sprintf}.
#'
#' Note that \code{stri_printf} treats missing values as \code{"NA"} strings
#' by default.
#'
#' Note that Unicode code points may have various widths when
#' printed on the console and that, by default, the function takes that
#' into account. By changing the state of the \code{use_length}
#' argument, this function acts as if each code point was of width 1.
#'
#' @param format character vector of format strings
#' @param ... logical, integer, real, or character vectors (or objects
#' coercible to)
#' @param na_string single string to represent missing values;
#' if \code{NA}, missing values in \code{...}
#' result in the corresponding outputs being missing too;
#' use \code{"NA"} for compatibility with base R
#' @param inf_string single string to represent the (unsigned) infinity
#' @param nan_string single string to represent the not-a-number
#' @param use_length single logical value; should the number of code
#' points be used when applying modifiers such as \code{\%20s}
#' instead of the total code point width (see \code{\link{stri_width}})?
#' @param file see \code{\link[base]{cat}}
#' @param sep see \code{\link[base]{cat}}
#' @param append see \code{\link[base]{cat}}
#'
#' @return
#' \code{stri_printf} is used for its side effect, which is printing
#' of text on the standard output or other connection. Hence, it returns
#' \code{invisible(NULL)}.
#'
#' The other functions return a character vector.
#'
#' @rdname stri_sprintf
#' @examples
#' stri_sprintf("%10s=%.3f", "pi", pi)
#'
#' @export
stri_sprintf <- function(
format, ...,
na_string=NA_character_,
inf_string="Inf",
nan_string="NaN",
use_length=FALSE
) {
# force eval of ... here
.Call(C_stri_sprintf, format, list(...),
na_string, inf_string, nan_string, use_length)
}


#' @rdname stri_sprintf
#' @export
stri_string_format <- stri_sprintf


#' @rdname stri_sprintf
#' @export
stri_printf <- function(
format, ...,
file="",
sep="\n",
append=FALSE,
na_string="NA",
inf_string="Inf",
nan_string="NaN",
use_length=FALSE
) {
# force eval of ... here
str <- .Call(C_stri_sprintf, format, list(...),
na_string, inf_string, nan_string, use_length)
cat(str, file=file, sep=sep, append=append)
}

### TODO: update


#' @title
#' C-Style Formatting with sprintf as a Binary Operator
#' TODO: call stri_sprintf
#'
#' @description
#' Provides access to base R's \code{\link{sprintf}} in form of a binary
#' Provides access to base R's \code{\link[base]{sprintf}} in form of a binary
#' operator in a way similar to Python's \code{\%} overloaded for strings.
#'
#'
@@ -47,12 +133,12 @@
#' \code{e1 \%s$\% atomic_vector} is equivalent to
#' \code{e1 \%s$\% list(atomic_vector)}.
#'
#' Note that \code{\link{sprintf}} takes field width in bytes,
#' Note that \code{\link[base]{sprintf}} takes field width in bytes,
#' not Unicode code points. See Examples for a workaround.
#'
#'
#' @param e1 format strings, see \code{\link{sprintf}} for syntax
#' @param e2 a list of atomic vectors to be passed to \code{\link{sprintf}}
#' @param e1 format strings, see \code{\link[base]{sprintf}} for syntax
#' @param e2 a list of atomic vectors to be passed to \code{\link[base]{sprintf}}
#' or a single atomic vector
#'
#' @return
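
The roxygen block in `R/sprintf.R` describes two behaviours: field size measured by display width rather than by code point count (`use_length`), and configurable rendering of missing values (`na_string`). A minimal base R sketch of both ideas follows. It is not the stringi implementation (the C code behind `C_stri_sprintf` is not shown in this commit view), and `pad_left_by` and `format_with_na` are hypothetical helper names introduced only for illustration.

```r
# Illustrative base R sketch; NOT the stringi implementation.
# pad_left_by() and format_with_na() are hypothetical helper names.

# Left-pad each string with spaces up to `width`, measuring either the
# display width (default) or the number of code points (use_length=TRUE).
pad_left_by <- function(x, width, use_length = FALSE) {
    size <- nchar(x, type = if (use_length) "chars" else "width")
    pad <- strrep(" ", pmax(width - size, 0L))
    paste0(pad, x)
}

# Format values with "%s"; missing inputs either propagate as NA
# (na_string = NA) or are replaced by the given string.
format_with_na <- function(x, na_string = NA_character_) {
    out <- sprintf("%s", x)
    out[is.na(x)] <- na_string
    out
}

s <- "\u4f60\u597d"  # two CJK code points, usually wider than two columns
pad_left_by(s, 6)                     # padding computed from display width
pad_left_by(s, 6, use_length = TRUE)  # padding computed from code point count
format_with_na(c("a", NA))            # the missing value propagates
format_with_na(c("a", NA), "NA")      # base-R-like "NA" string
```

The two defaults in the sketch mirror the signatures in the diff: `stri_sprintf` defaults to `na_string=NA_character_`, while `stri_printf` defaults to `na_string="NA"`.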
11 changes: 6 additions & 5 deletions man/operator_dollar.Rd

2 changes: 1 addition & 1 deletion man/stri_pad.Rd

79 changes: 79 additions & 0 deletions man/stri_sprintf.Rd

4 changes: 4 additions & 0 deletions src/stri_container_base.h
@@ -33,6 +33,10 @@
#ifndef __stri_container_base_h
#define __stri_container_base_h

#include "stri_external.h"
#include "stri_exception.h"



/**
* Base class for all StriContainers
1 change: 1 addition & 0 deletions src/stri_container_utf8.h
@@ -34,6 +34,7 @@
#define __stri_container_utf8_h

#include "stri_container_base.h"
#include "stri_string8.h"


/**
1 change: 1 addition & 0 deletions src/stri_cpp.txt
@@ -65,6 +65,7 @@ stri_search_regex_replace.cpp \
stri_search_regex_split.cpp \
stri_search_regex_subset.cpp \
stri_sort.cpp \
stri_sprintf.cpp \
stri_stats.cpp \
stri_stringi.cpp \
stri_sub.cpp \
11 changes: 9 additions & 2 deletions src/stri_exports.h
@@ -36,7 +36,6 @@
#include <R.h>
#include <Rdefines.h>


// compare.cpp:
SEXP stri_cmp(SEXP e1, SEXP e2, SEXP opts_collator=R_NilValue);
SEXP stri_cmp_le(SEXP e1, SEXP e2, SEXP opts_collator=R_NilValue);
@@ -149,7 +148,15 @@ SEXP stri_enc_isutf32be(SEXP str);

// pad.cpp
SEXP stri_pad(SEXP str, SEXP width, SEXP side=Rf_mkString("left"),
SEXP pad=Rf_mkString(" "), SEXP use_length=Rf_ScalarLogical(FALSE));
SEXP pad=Rf_mkString(" "), SEXP use_length=Rf_ScalarLogical(FALSE));


// sprintf.cpp
SEXP stri_sprintf(SEXP format, SEXP x,
SEXP na_string=Rf_ScalarString(NA_STRING),
SEXP inf_string=Rf_mkString("Inf"),
SEXP nan_string=Rf_mkString("NaN"),
SEXP use_length=Rf_ScalarLogical(FALSE));

// wrap.cpp
SEXP stri_wrap(SEXP str, SEXP width, SEXP cost_exponent=Rf_ScalarInteger(2),
1 change: 1 addition & 0 deletions src/stri_length.cpp
@@ -414,6 +414,7 @@ int stri__width_string(const char* str_cur_s, int str_cur_n)
return cur_width;
}


/**
* Determine the width of strings
*