Skip to content

Commit

Permalink
Add carry/terminate, cleanup validation
Browse files Browse the repository at this point in the history
Fix #72 SGR Across Elements
  • Loading branch information
brodieG committed Jun 18, 2021
2 parents d662df8 + e8816ff commit e9a7e55
Show file tree
Hide file tree
Showing 51 changed files with 1,490 additions and 772 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -30,16 +30,16 @@ Encoding: UTF-8
Collate:
'constants.R'
'fansi-package.R'
'has.R'
'internal.R'
'load.R'
'misc.R'
'nchar.R'
'strip.R'
'strwrap.R'
'strtrim.R'
'strsplit.R'
'substr2.R'
'tohtml.R'
'unhandled.R'
'normalize.R'
'carry.R'
'sgr.R'
85 changes: 74 additions & 11 deletions DEVNOTES.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,24 +4,26 @@ These are internal developer notes.

## Todo

* This is definitely not parsimonious... Maybe fix when we move substr to C?

> substr_ctl("", 2, 4, carry = "\033[33m")
[1] "\033[33m\033[0m"

* Move the interrupt to be `_read_next` based with an unsigned counter? With
maybe the SGR reads contributing more to the counter? What about writes? Is
there a more universal way to check for interrupts? Main issue is that it's
possible (though perhaps unlikely) that there will be some very slow
individual loop iterations as we saw with `tabs_as_spaces`.
* Write a section on performance in the fansi section.
* Look into hiding global functions / structures, and split off fansi.h into the
internal and external functions. Maybe also split off the write functions
from general utilities.
* Delete unused code (e.g. FANSI_color_size, digits_in_int, etc.).
* How to deal with wraps of `"hello \033[33;44m world"`. Notice extra space.
* Once we add isolate, make sure that trailing sequences are not omitted if the
end is not isolated.
* Change `unhandled_ctl` to point out specific problem sequence.
* Check double warnings in all functions doing the two pass reading.
* How do we currently handle styles across elements?
* We don't. `strwrap` carries the style within one single character
vector, it's just that in the output the result might span a few
elements.
* This needs to be properly documented. Will also simplify
implementation of normalize.

* Write docs about behavior of bleeding.
* Bunch of docs don't have @return tags, oddly.
* Bunch of docs don't have @return tags, oddly (fixed some).
* Make sure we check we're not using `intmax_t` or `uintmax_t` in a tight loop
anywhere.
* Cleanup limits structure, is it really needed now we have a better view of
Expand All @@ -32,6 +34,36 @@ These are internal developer notes.

## Done

* How do we currently handle styles across elements?
* We don't. `strwrap` carries the style within one single character
vector, it's just that in the output the result might span a few
elements.
* This needs to be properly documented. Will also simplify
implementation of normalize.

* Rationalize type checking on entry into C code given that state init already
checks many of them.

Kinda, still a bit of a mess because functions all have slightly different
signatures.

* Check whether anything other than `substr_ctl` uses `state_at_pos` and thus
the assumptions about carry being handled externally might be incorrect.

* Once we add isolate, make sure that trailing sequences are not omitted if the
end is not isolated?

This doesn't make sense now to me. Maybe if the beginning is not isolated?

* It's possible we messed up and `sgr_to_html` had carry semantics whereas other
stuff did not.

Yes, we did. We're now using a different default value for `carry` for it.

* Are we checking byte encoding on e.g. pre/pad, etc.?

* Write docs about behavior of bleeding.

* Can we manage the stack better with the growing buffer so we don't keep all
the prior half sized ones around until we exit so they are eligible for gc?

Expand Down Expand Up @@ -181,7 +213,7 @@ Do we warn about closing tags that don't close an active style? Maybe we
do that, and then point to docs about bleed. But it does mean we should
include the bleed argument.

## Bleed
## Bleed / Carry

Add a `bleed` param that is a single string (or TRUE) that causes the
program to bleed from string to string, with the initial state specified
Expand All @@ -204,6 +236,37 @@ Is this the desired outcome:

Yes, if "isolate" is true, but if not we should emit the ending style.

What was the issue with recycling carry? That you have to pick whether to wrap
your own carry, or whether take the external one? Indeed, what's the right
answer there? Ah, that's the ambiguity, carry and inherit are really distinct
but potentially mutually exclusive (where inherit means take a previously known
state). Inherit matters also beyond normalized mode as e.g. when we take
substrings we start the string with all the known states.

Inherit doesn't really make sense for `strwrap`? I guess it does to begin, and
then carry does the rest (although `strwrap` always auto-carries per element, so
that's a different type of carry).

So:

* Inherit, a recycled vector of starting styles.
* Carry, TRUE or FALSE, or a single style string to start with.
* Mutually exclusive with inherit.
* Isolate, "start", "end", "both", "none/neither".
* Orthogonal to all the others.
* Both Carry and Inherit will be emitted? Or is inherit just so we know
what to close with isolate in normalized mode?

Leaning more and more on it being user responsibility to handle interactions
with external strings by pre-pasting either the style-at-end of other strings,
or the required closing tags.

Does `isolate` then just become `terminate`? What if we change our mind in the
future about wanting to terminate the beginning?

So we're left with just `carry` and terminate, and instructions on how to do
things manually.

## Overflow

Set R_LEN_T_MAX to INT_MAX - N, and check that on expansion we get the correct
Expand Down
2 changes: 2 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# Generated by roxygen2: do not edit by hand

export(close_sgr)
export(fansi_lines)
export(has_ctl)
export(has_sgr)
Expand All @@ -14,6 +15,7 @@ export(nzchar_ctl)
export(nzchar_sgr)
export(set_knit_hooks)
export(sgr_256)
export(sgr_at_end)
export(sgr_to_html)
export(strip_ctl)
export(strip_sgr)
Expand Down
28 changes: 28 additions & 0 deletions R/carry.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
## Copyright (C) 2021 Brodie Gaslam
##
## This file is part of "fansi - ANSI Control Sequence Aware String Functions"
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## Go to <https://www.r-project.org/Licenses/GPL-2> for a copy of the license.

sgr_at_end <- function(
x, warn=getOption('fansi.warn'),
term.cap=getOption('fansi.term.cap'),
normalize=getOption('fansi.normalize', FALSE),
carry=getOption('fansi.carry', FALSE)
) {
VAL_IN_ENV(
x=x, warn=warn, term.cap=term.cap, normalize=normalize, carry=carry,
ctl='sgr'
)
.Call(FANSI_sgr_at_end, x, warn, term.cap.int, ctl.int, normalize, carry)
}
68 changes: 52 additions & 16 deletions R/fansi-package.R
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,43 @@
#' the effect is the same as replacement (e.g. if you have a color active and
#' pick another one).
#'
#' @section SGR Interactions:
#'
#' The cumulative nature of SGR means that SGR in strings that are spliced will
#' interact with each other. Additionally, a substring does not inherently
#' contain all the information required to recreate its formatting as it
#' appeared in its source string.
#'
#' One form of possible interaction to consider is how a character vector
#' provided to `fansi` functions interacts with itself. By default, `fansi`
#' assumes that each element in an input character vector is independent, but
#' this is incorrect if the input is a single document with each element a line
#' in it. In that situation unterminated SGR from each line should bleed into
#' subsequent ones. Setting `carry = TRUE` enables the "single document"
#' interpretation. [`sgr_to_html`] is the exception as for legacy reasons it
#' defaults to `carry = TRUE`.
#'
#' Another form of interaction is when substrings produced by `fansi` are
#' spliced with or into other substrings. By default `fansi` automatically
#' terminates substrings it produces if they contain active SGR. This prevents
#' the SGR therein from affecting display of external strings, which is useful
#' e.g. when arranging text in columns. We can allow the SGR to bleed into
#' appended strings by setting `terminate = FALSE`. `carry` is unaffected by
#' `terminate` as `fansi` records the ending SGR state prior to termination
#' internally.
#'
#' Finally, `fansi` strings will be affected by any active SGR in strings they
#' are appended to. There are no parameters to control what happens
#' automatically in this case, but `fansi` provides several functions that can
#' help the user get their desired outcome. `sgr_at_end` computes the active
#' SGR at the end of a string, this can then be prepended onto the _input_ of
#' `fansi` functions so that they are aware of what the active style at the
#' beginning of the string. Alternatively, one could use
#' `close_sgr(sgr_at_end(...))` and pre-pend that to the _output_ of `fansi`
#' functions so they are unaffected by preceding SGR. One could also just
#' prepend "ESC[0m", but in some cases as described in
#' [`?normalize_sgr`][normalize_sgr] that is sub-optimal.
#'
#' @section Encodings / UTF-8:
#'
#' `fansi` will convert any non-ASCII strings to UTF-8 before processing them,
Expand Down Expand Up @@ -162,24 +199,12 @@
#' computations, but for simplicity and also because R and our terminal do not
#' do it properly either we are deferring the issue for now.
#'
#' @section R < 3.2.2 support:
#'
#' Nominally you can build and run this package in R versions between 3.1.0 and
#' 3.2.1. Things should mostly work, but please be aware we do not run the test
#' suite under versions of R less than 3.2.2. One key degraded capability is
#' width computation of wide-display characters. Under R < 3.2.2 `fansi` will
#' assume every character is 1 display width. Additionally, `fansi` may not
#' always report malformed UTF-8 sequences as it usually does. One
#' exception to this is [`nchar_ctl`] as that is just a thin wrapper around
#' [`base::nchar`].
#'
#' @section Overflow:
#'
#' The native code in this package assumes that all strings are NULL terminated
#' and no longer than (32 bit) INT_MAX (excluding the NULL). This should be a
#' safe assumption since the code is designed to work with STRSXPs and CHRSXPs.
#' Behavior is undefined and probably bad if you somehow manage to provide to
#' `fansi` strings that do not adhere to these assumptions.
#' The maximum length of input character vector elements allowed by `fansi` is
#' the 32 bit INT_MAX, excluding the terminating NULL. This appears to be the
#' limit for R character vector elements generally, but is enforced at the C
#' level nonetheless.
#'
#' It is possible that during processing strings that are shorter than INT_MAX
#' would become longer than that. `fansi` checks for that overflow and will
Expand All @@ -189,6 +214,17 @@
#' your system if `R_len_t`, the R type used to measure string lengths, is less
#' than the processed length of the string.
#'
#' @section R < 3.2.2 support:
#'
#' Nominally you can build and run this package in R versions between 3.1.0 and
#' 3.2.1. Things should mostly work, but please be aware we do not run the test
#' suite under versions of R less than 3.2.2. One key degraded capability is
#' width computation of wide-display characters. Under R < 3.2.2 `fansi` will
#' assume every character is 1 display width. Additionally, `fansi` may not
#' always report malformed UTF-8 sequences as it usually does. One
#' exception to this is [`nchar_ctl`] as that is just a thin wrapper around
#' [`base::nchar`].
#'
#' @useDynLib fansi, .registration=TRUE, .fixes="FANSI_"
#' @docType package
#' @name fansi
Expand Down
63 changes: 0 additions & 63 deletions R/has.R

This file was deleted.

Loading

0 comments on commit e9a7e55

Please sign in to comment.