Add carry/terminate, cleanup validation

Fix #72 SGR Across Elements
brodieG · Jun 18, 2021 · e9a7e55 · e9a7e55
2 parents d662df8 + e8816ff
commit e9a7e55
Show file tree

Hide file tree

Showing 51 changed files with 1,490 additions and 772 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -30,16 +30,16 @@ Encoding: UTF-8
 Collate:
     'constants.R'
     'fansi-package.R'
-    'has.R'
     'internal.R'
     'load.R'
     'misc.R'
     'nchar.R'
-    'strip.R'
     'strwrap.R'
     'strtrim.R'
     'strsplit.R'
     'substr2.R'
     'tohtml.R'
     'unhandled.R'
     'normalize.R'
+    'carry.R'
+    'sgr.R'
diff --git a/DEVNOTES.md b/DEVNOTES.md
@@ -4,24 +4,26 @@ These are internal developer notes.
 
 ## Todo
 
+* This is definitely not parsimonious...  Maybe fix when we move substr to C?
+
+    > substr_ctl("", 2, 4, carry = "\033[33m")
+    [1] "\033[33m\033[0m"
+
+* Move the interrupt to be `_read_next` based with an unsigned counter?  With
+  maybe the SGR reads contributing more to the counter?  What about writes?  Is
+  there a more universal way to check for interrupts?  Main issue is that it's
+  possible (though perhaps unlikely) that there will be some very slow
+  individual loop iterations as we saw with `tabs_as_spaces`.
+* Write a section on performance in the fansi section.
 * Look into hiding global functions / structures, and split off fansi.h into the
   internal and external functions.  Maybe also split off the write functions
   from general utilities.
 * Delete unused code (e.g. FANSI_color_size, digits_in_int, etc.).
 * How to deal with wraps of `"hello \033[33;44m world"`.  Notice extra space.
-* Once we add isolate, make sure that trailing sequences are not omitted if the
-  end is not isolated.
 * Change `unhandled_ctl` to point out specific problem sequence.
 * Check double warnings in all functions doing the two pass reading.
-* How do we currently handle styles across elements?
-    * We don't.  `strwrap` carries the style within one single character
-      vector, it's just that in the output the result might span a few
-      elements.
-    * This needs to be properly documented.  Will also simplify
-      implementation of normalize.
 
-* Write docs about behavior of bleeding.
-* Bunch of docs don't have @return tags, oddly.
+* Bunch of docs don't have @return tags, oddly (fixed some).
 * Make sure we check we're not using `intmax_t` or `uintmax_t` in a tight loop
   anywhere.
 * Cleanup limits structure, is it really needed now we have a better view of
@@ -32,6 +34,36 @@ These are internal developer notes.
 
 ## Done
 
+* How do we currently handle styles across elements?
+    * We don't.  `strwrap` carries the style within one single character
+      vector, it's just that in the output the result might span a few
+      elements.
+    * This needs to be properly documented.  Will also simplify
+      implementation of normalize.
+
+* Rationalize type checking on entry into C code given that state init already
+  checks many of them.
+
+Kinda, still a bit of a mess because functions all have slightly different
+signatures.
+
+* Check whether anything other than `substr_ctl` uses `state_at_pos` and thus
+  the assumptions about carry being handled externally might be incorrect.
+
+* Once we add isolate, make sure that trailing sequences are not omitted if the
+  end is not isolated?
+
+This doesn't make sense now to me.  Maybe if the beginning is not isolated?
+
+* It's possible we messed up and `sgr_to_html` had carry semantics whereas other
+  stuff did not.
+
+Yes, we did.  We're now using a different default value for `carry` for it.
+
+* Are we checking byte encoding on e.g. pre/pad, etc.?
+
+* Write docs about behavior of bleeding.
+
 * Can we manage the stack better with the growing buffer so we don't keep all
   the prior half sized ones around until we exit so they are eligible for gc?
 
@@ -181,7 +213,7 @@ Do we warn about closing tags that don't close an active style?  Maybe we
 do that, and then point to docs about bleed.  But it does mean we should
 include the bleed argument.
 
-## Bleed
+## Bleed / Carry
 
 Add a `bleed` param that is a single string (or TRUE) that causes the
 program to bleed from string to string, with the initial state specified
@@ -204,6 +236,37 @@ Is this the desired outcome:
 
 Yes, if "isolate" is true, but if not we should emit the ending style.
 
+What was the issue with recycling carry?  That you have to pick whether to wrap
+your own carry, or whether take the external one?  Indeed, what's the right
+answer there?  Ah, that's the ambiguity, carry and inherit are really distinct
+but potentially mutually exclusive (where inherit means take a previously known
+state).  Inherit matters also beyond normalized mode as e.g. when we take
+substrings we start the string with all the known states.
+
+Inherit doesn't really make sense for `strwrap`? I guess it does to begin, and
+then carry does the rest (although `strwrap` always auto-carries per element, so
+that's a different type of carry).
+
+So:
+
+* Inherit, a recycled vector of starting styles.
+* Carry, TRUE or FALSE, or a single style string to start with.
+    * Mutually exclusive with inherit.
+* Isolate, "start", "end", "both", "none/neither".
+    * Orthogonal to all the others.
+    * Both Carry and Inherit will be emitted?  Or is inherit just so we know
+      what to close with isolate in normalized mode?
+
+Leaning more and more on it being user responsibility to handle interactions
+with external strings by pre-pasting either the style-at-end of other strings,
+or the required closing tags.
+
+Does `isolate` then just become `terminate`?  What if we change our mind in the
+future about wanting to terminate the beginning?
+
+So we're left with just `carry` and terminate, and instructions on how to do
+things manually.
+
 ## Overflow
 
 Set R_LEN_T_MAX to INT_MAX - N, and check that on expansion we get the correct

diff --git a/NAMESPACE b/NAMESPACE
@@ -1,5 +1,6 @@
 # Generated by roxygen2: do not edit by hand
 
+export(close_sgr)
 export(fansi_lines)
 export(has_ctl)
 export(has_sgr)
@@ -14,6 +15,7 @@ export(nzchar_ctl)
 export(nzchar_sgr)
 export(set_knit_hooks)
 export(sgr_256)
+export(sgr_at_end)
 export(sgr_to_html)
 export(strip_ctl)
 export(strip_sgr)

diff --git a/R/carry.R b/R/carry.R
@@ -0,0 +1,28 @@
+## Copyright (C) 2021  Brodie Gaslam
+##
+## This file is part of "fansi - ANSI Control Sequence Aware String Functions"
+##
+## This program is free software: you can redistribute it and/or modify
+## it under the terms of the GNU General Public License as published by
+## the Free Software Foundation, either version 2 of the License, or
+## (at your option) any later version.
+##
+## This program is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+## GNU General Public License for more details.
+##
+## Go to <https://www.r-project.org/Licenses/GPL-2> for a copy of the license.
+
+sgr_at_end <- function(
+  x, warn=getOption('fansi.warn'),
+  term.cap=getOption('fansi.term.cap'),
+  normalize=getOption('fansi.normalize', FALSE),
+  carry=getOption('fansi.carry', FALSE)
+) {
+  VAL_IN_ENV(
+    x=x, warn=warn, term.cap=term.cap, normalize=normalize, carry=carry,
+    ctl='sgr'
+  )
+  .Call(FANSI_sgr_at_end, x, warn, term.cap.int, ctl.int, normalize, carry)
+}
diff --git a/R/fansi-package.R b/R/fansi-package.R
@@ -116,6 +116,43 @@
 #' the effect is the same as replacement (e.g. if you have a color active and
 #' pick another one).
 #'
+#' @section SGR Interactions:
+#'
+#' The cumulative nature of SGR means that SGR in strings that are spliced will
+#' interact with each other.  Additionally, a substring does not inherently
+#' contain all the information required to recreate its formatting as it
+#' appeared in its source string.
+#'
+#' One form of possible interaction to consider is how a character vector
+#' provided to `fansi` functions interacts with itself.  By default, `fansi`
+#' assumes that each element in an input character vector is independent, but
+#' this is incorrect if the input is a single document with each element a line
+#' in it.  In that situation unterminated SGR from each line should bleed into
+#' subsequent ones.  Setting `carry = TRUE` enables the "single document"
+#' interpretation. [`sgr_to_html`] is the exception as for legacy reasons it
+#' defaults to `carry = TRUE`.
+#'
+#' Another form of interaction is when substrings produced by `fansi` are
+#' spliced with or into other substrings.  By default `fansi` automatically
+#' terminates substrings it produces if they contain active SGR.  This prevents
+#' the SGR therein from affecting display of external strings, which is useful
+#' e.g. when arranging text in columns.  We can allow the SGR to bleed into
+#' appended strings by setting `terminate = FALSE`.  `carry` is unaffected by
+#' `terminate` as `fansi` records the ending SGR state prior to termination
+#' internally.
+#'
+#' Finally, `fansi` strings will be affected by any active SGR in strings they
+#' are appended to.  There are no parameters to control what happens
+#' automatically in this case, but `fansi` provides several functions that can
+#' help the user get their desired outcome.  `sgr_at_end` computes the active
+#' SGR at the end of a string, this can then be prepended onto the _input_ of
+#' `fansi` functions so that they are aware of what the active style at the
+#' beginning of the string.  Alternatively, one could use
+#' `close_sgr(sgr_at_end(...))` and pre-pend that to the _output_ of `fansi`
+#' functions so they are unaffected by preceding SGR.  One could also just
+#' prepend "ESC[0m", but in some cases as described in
+#' [`?normalize_sgr`][normalize_sgr] that is sub-optimal.
+#'
 #' @section Encodings / UTF-8:
 #'
 #' `fansi` will convert any non-ASCII strings to UTF-8 before processing them,
@@ -162,24 +199,12 @@
 #' computations, but for simplicity and also because R and our terminal do not
 #' do it properly either we are deferring the issue for now.
 #'
-#' @section R < 3.2.2 support:
-#'
-#' Nominally you can build and run this package in R versions between 3.1.0 and
-#' 3.2.1.  Things should mostly work, but please be aware we do not run the test
-#' suite under versions of R less than 3.2.2.  One key degraded capability is
-#' width computation of wide-display characters.  Under R < 3.2.2 `fansi` will
-#' assume every character is 1 display width.  Additionally, `fansi` may not
-#' always report malformed UTF-8 sequences as it usually does.  One
-#' exception to this is [`nchar_ctl`] as that is just a thin wrapper around
-#' [`base::nchar`].
-#'
 #' @section Overflow:
 #'
-#' The native code in this package assumes that all strings are NULL terminated
-#' and no longer than (32 bit) INT_MAX (excluding the NULL).  This should be a
-#' safe assumption since the code is designed to work with STRSXPs and CHRSXPs.
-#' Behavior is undefined and probably bad if you somehow manage to provide to
-#' `fansi` strings that do not adhere to these assumptions.
+#' The maximum length of input character vector elements allowed by `fansi` is
+#' the 32 bit INT_MAX, excluding the terminating NULL.  This appears to be the
+#' limit for R character vector elements generally, but is enforced at the C
+#' level nonetheless.
 #'
 #' It is possible that during processing strings that are shorter than INT_MAX
 #' would become longer than that. `fansi` checks for that overflow and will
@@ -189,6 +214,17 @@
 #' your system if `R_len_t`, the R type used to measure string lengths, is less
 #' than the processed length of the string.
 #'
+#' @section R < 3.2.2 support:
+#'
+#' Nominally you can build and run this package in R versions between 3.1.0 and
+#' 3.2.1.  Things should mostly work, but please be aware we do not run the test
+#' suite under versions of R less than 3.2.2.  One key degraded capability is
+#' width computation of wide-display characters.  Under R < 3.2.2 `fansi` will
+#' assume every character is 1 display width.  Additionally, `fansi` may not
+#' always report malformed UTF-8 sequences as it usually does.  One
+#' exception to this is [`nchar_ctl`] as that is just a thin wrapper around
+#' [`base::nchar`].
+#'
 #' @useDynLib fansi, .registration=TRUE, .fixes="FANSI_"
 #' @docType package
 #' @name fansi

diff --git a/R/has.R b/R/has.R