Fix mbstowcs_nonfatal() that might convert string shorter than desired #1586

Explorer09 · 2025-01-12T19:48:20Z

The "n" parameter of mbstowcs_nonfatal() refers to number of wide characters. But the logic seemed to be confused with "n" parameter of mbrtowc() (the number of bytes).

Note: I didn't test this patch or check whether the bug really happens with a user string. I just discovered this logic mistake when I try to rework the routines of RichString_writeFromWide() and RichString_appendnWideColumns()

Also I'm not sure if it's worth it to add ATTR_ACCESS3_W(1, 3) to this function.

BenBE · 2025-01-12T20:20:01Z

Looking at the callers I noticed, they both (RichString_writeFromWide and RichString_appendnWideColumns) use int for the len argument and store the size_t by mbstowcs_nonfatal back into that argument. Given n is passed as size_t this may lead to an error situation where a negative len is passed to n as a very large positive integer, overflowing the VLA allocated by the caller, which already triggered UB, as negative sizes for VLA are UB in and of themself.

Thus, could you take another look and see if we can get this sorted out properly (I think, fully going size_t should resolve most of the issues just mentioned.

BenBE · 2025-01-12T20:33:58Z

This still does not resolve the issue I mentioned above regarding these two places:

static inline int RichString_writeFromWide(RichString* this, int attrs, const char* data_c, int from, int len) {
   wchar_t data[len]; // <-- UB here
   len = mbstowcs_nonfatal(data, data_c, len); // int underflow here
…

int RichString_appendnWideColumns(RichString* this, int attrs, const char* data_c, int len, int* columns) {
   wchar_t data[len]; // <-- UB here
   len = mbstowcs_nonfatal(data, data_c, len); // int underflow here
…

In both cases, making len a size_t should fix the issues, forcing len >= 0.

Explorer09 · 2025-01-12T21:01:19Z

Looking at the callers I noticed, they both (RichString_writeFromWide and RichString_appendnWideColumns) use int for the len argument and store the size_t by mbstowcs_nonfatal back into that argument. Given n is passed as size_t this may lead to an error situation where a negative len is passed to n as a very large positive integer, overflowing the VLA allocated by the caller, which already triggered UB, as negative sizes for VLA are UB in and of themself.

Thus, could you take another look and see if we can get this sorted out properly (I think, fully going size_t should resolve most of the issues just mentioned.

Actually I was trying to migrate the RichString.chlen type to size_t and upgrade all RichString methods to use size_t for lengths. Except that things were messier than I expected.

Firstly there were confusions between display width offsets and character offsets. Many of the RichString APIs are not capable for handling varied character widths (e.g. combining characters and East Asian characters) yet. (I was trying to think of a solution and it's not ready yet.)

As for VLAs, my plan was to remove both VLA buffers with a function like this. It would write to CharType array directly without needing a wchar_t[] buffer.

ATTR_NONNULL_N(3, 5) ATTR_ACCESS3_W(1, 4)
static size_t RichString_convertMBString(CharType* chstr, int attrs, const char** src, size_t n, int* width) {
   size_t wcCount = 0;
   wchar_t wc = L'\0';
   mbstate_t ps = {0};
   bool wasReplaced = false;
   int totalWidth = 0;

   while (wcCount < n) {
      bool shouldReplace = false;
      size_t ret = mbrtowc(&wc, *src, SIZE_MAX, &ps);

      if (ret == (size_t)-1 || ret == (size_t)-2) {
         // Invalid or incomplete multibyte sequence
         shouldReplace = true;
         if (ret == (size_t)-2) {
            // There would be no bytes that form a valid sequence from
            // here on. Skip to the end of string. 
            ret = strlen(*src);
         } else {
            // Unfortunately mbrtowc() doesn't tell which next byte may
            // start a potentially valid sequence. Thus we have to skip
            // one byte at a time.
            ret = 1;
         }
      }

      if (!iswprint(wc)) {
         // A control character
         shouldReplace = true;
      }

      if (shouldReplace) {
         wc = L'\xFFFD';
      } else {
         wasReplaced = false;
      }

      if (!wasReplaced) {
         int cWidth = wcwidth(wc);
         assert(cWidth >= 0);

         if (*width >= 0 && cWidth > *width - totalWidth) {
            break;
         }

         // Write character to buffer
         if (chstr) {
            chstr[wcCount] = (CharType) {
               .attr = attrs & 0xffffff,
               .chars = { wch, '\0' }
            };
         }

         wcCount++;
         totalWidth = (totalWidth > INT_MAX - cWidth) ? INT_MAX : totalWidth + cWidth;

         if (shouldReplace) {
            wasReplaced = true;
         }
      }

      if (ret == 0) {
         break;
      }

      *src += ret;
   }

   *width = totalWidth;

   return wcCount;
}

Explorer09 · 2025-01-12T22:10:48Z

This still does not resolve the issue I mentioned above regarding these two places:

static inline int RichString_writeFromWide(RichString* this, int attrs, const char* data_c, int from, int len) {
   wchar_t data[len]; // <-- UB here
   len = mbstowcs_nonfatal(data, data_c, len); // int underflow here
…

I wonder if I can just put assert() on these as an interim solution, before RichString.chlen can be migrated to size_t?

As I mentioned, it's more work than I expected. If I just change the APIs then I have to address the case where any string length (size_t) overflows the type of int.

BenBE · 2025-01-12T23:07:43Z

A quick&dirty upgrade of the RichString header to size_t includes this, AFAICS:

diff --git a/RichString.c b/RichString.c
index 390beb07..455dc80a 100644
--- a/RichString.c
+++ b/RichString.c
@@ -21,7 +21,7 @@ in the source distribution for its full text.
 
 #define charBytes(n) (sizeof(CharType) * (n))
 
-static void RichString_extendLen(RichString* this, int len) {
+static void RichString_extendLen(RichString* this, size_t len) {
    if (this->chptr == this->chstr) {
       // String is in internal buffer
       if (len > RICHSTRING_MAXLEN) {
@@ -49,7 +49,7 @@ static void RichString_extendLen(RichString* this, int len) {
    this->chlen = len;
 }
 
-static void RichString_setLen(RichString* this, int len) {
+static void RichString_setLen(RichString* this, size_t len) {
    if (len < RICHSTRING_MAXLEN && this->chlen < RICHSTRING_MAXLEN) {
       RichString_setChar(this, len, 0);
       this->chlen = len;
@@ -58,8 +58,8 @@ static void RichString_setLen(RichString* this, int len) {
    }
 }
 
-void RichString_rewind(RichString* this, int count) {
-   RichString_setLen(this, this->chlen - count);
+void RichString_rewind(RichString* this, size_t count) {
+   RichString_setLen(this, this->chlen > count ? this->chlen - count : 0);
 }
 
 #ifdef HAVE_LIBNCURSESW
@@ -97,7 +97,7 @@ static size_t mbstowcs_nonfatal(wchar_t* restrict dest, const char* restrict src
    return written;
 }
 
-static inline int RichString_writeFromWide(RichString* this, int attrs, const char* data_c, int from, int len) {
+static inline size_t RichString_writeFromWide(RichString* this, int attrs, const char* data_c, int from, size_t len) {
    wchar_t data[len];
    len = mbstowcs_nonfatal(data, data_c, len);
    if (len <= 0)
@@ -112,18 +112,18 @@ static inline int RichString_writeFromWide(RichString* this, int attrs, const ch
    return len;
 }
 
-int RichString_appendnWideColumns(RichString* this, int attrs, const char* data_c, int len, int* columns) {
+int RichString_appendnWideColumns(RichString* this, int attrs, const char* data_c, size_t len, int* columns) {
    wchar_t data[len];
    len = mbstowcs_nonfatal(data, data_c, len);
    if (len <= 0)
       return 0;
 
-   int from = this->chlen;
-   int newLen = from + len;
+   size_t from = this->chlen;
+   size_t newLen = from + len;
    RichString_setLen(this, newLen);
    int columnsWritten = 0;
-   int pos = from;
-   for (int j = 0; j < len; j++) {
+   size_t pos = from;
+   for (size_t j = 0; j < len; j++) {
       wchar_t c = iswprint(data[j]) ? data[j] : L'\xFFFD';
       int cwidth = wcwidth(c);
       if (cwidth > *columns)
@@ -142,10 +142,10 @@ int RichString_appendnWideColumns(RichString* this, int attrs, const char* data_
    return pos - from;
 }
 
-static inline int RichString_writeFromAscii(RichString* this, int attrs, const char* data, int from, int len) {
-   int newLen = from + len;
+static inline int RichString_writeFromAscii(RichString* this, int attrs, const char* data, size_t from, size_t len) {
+   size_t newLen = from + len;
    RichString_setLen(this, newLen);
-   for (int i = from, j = 0; i < newLen; i++, j++) {
+   for (size_t i = from, j = 0; i < newLen; i++, j++) {
       assert((unsigned char)data[j] <= SCHAR_MAX);
       this->chptr[i] = (CharType) { .attr = attrs & 0xffffff, .chars = { (isprint((unsigned char)data[j]) ? data[j] : L'\xFFFD') } };
    }
@@ -153,26 +153,26 @@ static inline int RichString_writeFromAscii(RichString* this, int attrs, const c
    return len;
 }
 
-inline void RichString_setAttrn(RichString* this, int attrs, int start, int charcount) {
-   int end = CLAMP(start + charcount, 0, this->chlen);
-   for (int i = start; i < end; i++) {
+inline void RichString_setAttrn(RichString* this, int attrs, size_t start, size_t charcount) {
+   size_t end = CLAMP(start + charcount, 0, this->chlen);
+   for (size_t i = start; i < end; i++) {
       this->chptr[i].attr = attrs;
    }
 }
 
-void RichString_appendChr(RichString* this, int attrs, char c, int count) {
-   int from = this->chlen;
-   int newLen = from + count;
+void RichString_appendChr(RichString* this, int attrs, char c, size_t count) {
+   size_t from = this->chlen;
+   size_t newLen = from + count;
    RichString_setLen(this, newLen);
-   for (int i = from; i < newLen; i++) {
+   for (size_t i = from; i < newLen; i++) {
       this->chptr[i] = (CharType) { .attr = attrs, .chars = { c, 0 } };
    }
 }
 
-int RichString_findChar(const RichString* this, char c, int start) {
+int RichString_findChar(const RichString* this, char c, size_t start) {
    const wchar_t wc = btowc(c);
    const cchar_t* ch = this->chptr + start;
-   for (int i = start; i < this->chlen; i++) {
+   for (size_t i = start; i < this->chlen; i++) {
       if (ch->chars[0] == wc)
          return i;
       ch++;
@@ -182,10 +182,10 @@ int RichString_findChar(const RichString* this, char c, int start) {
 
 #else /* HAVE_LIBNCURSESW */
 
-static inline int RichString_writeFromWide(RichString* this, int attrs, const char* data_c, int from, int len) {
-   int newLen = from + len;
+static inline int RichString_writeFromWide(RichString* this, int attrs, const char* data_c, size_t from, size_t len) {
+   size_t newLen = from + len;
    RichString_setLen(this, newLen);
-   for (int i = from, j = 0; i < newLen; i++, j++) {
+   for (size_t i = from, j = 0; i < newLen; i++, j++) {
       this->chptr[i] = (((unsigned char)data_c[j]) >= 32 ? ((unsigned char)data_c[j]) : '?') | attrs;
    }
    this->chptr[newLen] = 0;
@@ -193,35 +193,35 @@ static inline int RichString_writeFromWide(RichString* this, int attrs, const ch
    return len;
 }
 
-int RichString_appendnWideColumns(RichString* this, int attrs, const char* data_c, int len, int* columns) {
+int RichString_appendnWideColumns(RichString* this, int attrs, const char* data_c, size_t len, int* columns) {
    int written = RichString_writeFromWide(this, attrs, data_c, this->chlen, MINIMUM(len, *columns));
    *columns = written;
    return written;
 }
 
-static inline int RichString_writeFromAscii(RichString* this, int attrs, const char* data_c, int from, int len) {
+static inline int RichString_writeFromAscii(RichString* this, int attrs, const char* data_c, size_t from, size_t len) {
    return RichString_writeFromWide(this, attrs, data_c, from, len);
 }
 
-void RichString_setAttrn(RichString* this, int attrs, int start, int charcount) {
-   int end = CLAMP(start + charcount, 0, this->chlen);
-   for (int i = start; i < end; i++) {
+void RichString_setAttrn(RichString* this, int attrs, size_t start, size_t charcount) {
+   size_t end = CLAMP(start + charcount, 0, this->chlen);
+   for (size_t i = start; i < end; i++) {
       this->chptr[i] = (this->chptr[i] & 0xff) | attrs;
    }
 }
 
-void RichString_appendChr(RichString* this, int attrs, char c, int count) {
-   int from = this->chlen;
-   int newLen = from + count;
+void RichString_appendChr(RichString* this, int attrs, char c, size_t count) {
+   size_t from = this->chlen;
+   size_t newLen = from + count;
    RichString_setLen(this, newLen);
-   for (int i = from; i < newLen; i++) {
+   for (size_t i = from; i < newLen; i++) {
       this->chptr[i] = c | attrs;
    }
 }
 
-int RichString_findChar(const RichString* this, char c, int start) {
+int RichString_findChar(const RichString* this, char c, size_t start) {
    const chtype* ch = this->chptr + start;
-   for (int i = start; i < this->chlen; i++) {
+   for (size_t i = start; i < this->chlen; i++) {
       if ((*ch & 0xff) == (chtype) c)
          return i;
       ch++;
@@ -246,7 +246,7 @@ int RichString_appendWide(RichString* this, int attrs, const char* data) {
    return RichString_writeFromWide(this, attrs, data, this->chlen, strlen(data));
 }
 
-int RichString_appendnWide(RichString* this, int attrs, const char* data, int len) {
+int RichString_appendnWide(RichString* this, int attrs, const char* data, size_t len) {
    return RichString_writeFromWide(this, attrs, data, this->chlen, len);
 }
 
@@ -258,7 +258,7 @@ int RichString_appendAscii(RichString* this, int attrs, const char* data) {
    return RichString_writeFromAscii(this, attrs, data, this->chlen, strlen(data));
 }
 
-int RichString_appendnAscii(RichString* this, int attrs, const char* data, int len) {
+int RichString_appendnAscii(RichString* this, int attrs, const char* data, size_t len) {
    return RichString_writeFromAscii(this, attrs, data, this->chlen, len);
 }
 
diff --git a/RichString.h b/RichString.h
index 7783378b..73177b53 100644
--- a/RichString.h
+++ b/RichString.h
@@ -40,7 +40,7 @@ in the source distribution for its full text.
 #define RICHSTRING_MAXLEN 350
 
 typedef struct RichString_ {
-   int chlen;
+   size_t chlen;
    CharType* chptr;
    CharType chstr[RICHSTRING_MAXLEN + 1];
    int highlightAttr;
@@ -48,32 +48,32 @@ typedef struct RichString_ {
 
 void RichString_delete(RichString* this);
 
-void RichString_rewind(RichString* this, int count);
+void RichString_rewind(RichString* this, size_t count);
 
-void RichString_setAttrn(RichString* this, int attrs, int start, int charcount);
+void RichString_setAttrn(RichString* this, int attrs, size_t start, size_t charcount);
 
-int RichString_findChar(const RichString* this, char c, int start);
+int RichString_findChar(const RichString* this, char c, size_t start);
 
 void RichString_setAttr(RichString* this, int attrs);
 
-void RichString_appendChr(RichString* this, int attrs, char c, int count);
+void RichString_appendChr(RichString* this, int attrs, char c, size_t count);
 
 /* All appending and writing functions return the number of written characters (not columns). */
 
 int RichString_appendWide(RichString* this, int attrs, const char* data);
 
 ATTR_ACCESS3_R(3, 4)
-int RichString_appendnWide(RichString* this, int attrs, const char* data, int len);
+int RichString_appendnWide(RichString* this, int attrs, const char* data, size_t len);
 
 /* columns takes the maximum number of columns to write and contains on return the number of columns written. */
-int RichString_appendnWideColumns(RichString* this, int attrs, const char* data, int len, int* columns);
+int RichString_appendnWideColumns(RichString* this, int attrs, const char* data, size_t len, int* columns);
 
 int RichString_writeWide(RichString* this, int attrs, const char* data);
 
 int RichString_appendAscii(RichString* this, int attrs, const char* data);
 
 ATTR_ACCESS3_R(3, 4)
-int RichString_appendnAscii(RichString* this, int attrs, const char* data, int len);
+int RichString_appendnAscii(RichString* this, int attrs, const char* data, size_t len);
 
 int RichString_writeAscii(RichString* this, int attrs, const char* data);

Didn't test it aside from a clean compile on my system.

This does not update return types to be size_t, so this should be done as a second step.

That aside, I'd be okay with some asserts before the VLA (assert(len > 0);) + a todo to remove the VLA going forward.

The conversion function LGTM, though AFAICS this is basically a rewritten version of mbstowcs_nonfatal?

Explorer09 · 2025-01-13T01:17:12Z

A quick&dirty upgrade of the RichString header to size_t includes this, AFAICS:

I would choose to review the functions one by one, because the caller code would also need to reflect the type changes.

RichString_rewind(): Yes, you found the problem where the length could wrap to a negative integer.
RichString_writeFromWide() (ncursesw version): size_t from.
RichString_writeFrom*(): To be safe we need to guard the overflow of these lines size_t newLen = from + len; So this line may be needed: if (from > SIZE_MAX - len) { fail(); }

The conversion function LGTM, though AFAICS this is basically a rewritten version of mbstowcs_nonfatal?

Not just that. Two features: (1.) The replacement with U+FFFD can be centralized in one place, making any future change to the character filter easier. (2.) It allows limiting the string by display width (wcswidth(3)) - basically merging what the loop in RichString_appendnWideColumns() does - a feature that mbstowcs(3) doesn't have.

I plan to add a third feature that detects whether the string contains characters of different width (combining characters or East Asian characters) and let the RichString objects store them (as a bool hasVarCWidth;). This could make printing the string with a limited width (character columns) simpler.

BenBE · 2025-01-13T08:37:14Z

A quick&dirty upgrade of the RichString header to size_t includes this, AFAICS:

I would choose to review the functions one by one, because the caller code would also need to reflect the type changes.

RichString_rewind(): Yes, you found the problem where the length could wrap to a negative integer.

Fixing this by adding the check is independent of the type change and can be done now too. With the type change it's just become necessary to avoid obvious UB.

RichString_writeFromWide() (ncursesw version): size_t from.

???

RichString_writeFrom*(): To be safe we need to guard the overflow of these lines size_t newLen = from + len; So this line may be needed: if (from > SIZE_MAX - len) { fail(); }

This is also something that's missing now already. Just becomes more obvious with the type change.

The conversion function LGTM, though AFAICS this is basically a rewritten version of mbstowcs_nonfatal?

Not just that. Two features: (1.) The replacement with U+FFFD can be centralized in one place, making any future change to the character filter easier. (2.) It allows limiting the string by display width (wcswidth(3)) - basically merging what the loop in RichString_appendnWideColumns() does - a feature that mbstowcs(3) doesn't have.

I plan to add a third feature that detects whether the string contains characters of different width (combining characters or East Asian characters) and let the RichString objects store them (as a bool hasVarCWidth;). This could make printing the string with a limited width (character columns) simpler.

Feature 3 may be used for #462 and #854, I guess?

It's okay to compare "<= RICHSTRING_MAXLEN". RICHSTRING_MAXLEN does not include the terminating L'\0' character and the internal buffer of a RichString instance has (RICHSTRING_MAXLEN + 1) elements. Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

Prevent arithmetic overflows when computing sizes in RichString_extendLen(). Note that we are not using xMallocArray() and xReallocArray() as they would add unnecessary overflow checks in this particular case. Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

The computed length could wrap to a negative integer, causing undefined behavior. (Currently it would cause out-of-bound array access due to a signed length property in RichString. The length property should be migrated to an unsigned size_t type, but that would be done in a later commit.) Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

The "n" parameter of mbstowcs_nonfatal() refers to number of wide characters. But the logic seemed to be confused with "n" parameter of mbrtowc() (the number of bytes). Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

BenBE · 2025-01-13T18:23:50Z

RichString.c

@@ -22,11 +23,17 @@ in the source distribution for its full text.
 #define charBytes(n) (sizeof(CharType) * (n))

 static void RichString_extendLen(RichString* this, int len) {
+   // TODO: Remove the "len" type casts once all the length properties
+   // of RichString have been upgraded to size_t.
+   if ((size_t)len > (SIZE_MAX - 1) / sizeof(CharType)) {


Given INT_MAX <= SIZE_MAX, the only check needs to be len < 0 …

@BenBE
On 32-bit systems INT_MAX can be SIZE_MAX / 2 (rounded down) and it could be larger than (SIZE_MAX - 1) / sizeof(CharType). That's what the check is for (although it seems useless in 64-bit systems).

Either way that's not right to check for overflow as is intended …

I should have changed the conditional to (len < 0 && (size_t)len > (SIZE_MAX - 1) / sizeof(CharType)) ...

BenBE added enhancement Extension or improvement to existing feature code quality ♻️ Code quality enhancement labels Jan 12, 2025

BenBE added this to the 3.4.0 milestone Jan 12, 2025

Explorer09 force-pushed the mbstowcs-bug-fix branch from f109b7d to a547c87 Compare January 12, 2025 20:25

Explorer09 marked this pull request as draft January 13, 2025 11:08

Explorer09 added 4 commits January 14, 2025 01:58

RichString_setLen() minor logic fix

e06fe48

It's okay to compare "<= RICHSTRING_MAXLEN". RICHSTRING_MAXLEN does not include the terminating L'\0' character and the internal buffer of a RichString instance has (RICHSTRING_MAXLEN + 1) elements. Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

Explorer09 force-pushed the mbstowcs-bug-fix branch from a547c87 to ffcc552 Compare January 13, 2025 18:12

BenBE reviewed Jan 13, 2025

View reviewed changes

Explorer09 mentioned this pull request Jan 17, 2025

FlexMeter: Add FlexMeter functionality #1571

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix mbstowcs_nonfatal() that might convert string shorter than desired #1586

Fix mbstowcs_nonfatal() that might convert string shorter than desired #1586

Explorer09 commented Jan 12, 2025

BenBE commented Jan 12, 2025

BenBE commented Jan 12, 2025

Explorer09 commented Jan 12, 2025

Explorer09 commented Jan 12, 2025

BenBE commented Jan 12, 2025

Explorer09 commented Jan 13, 2025

BenBE commented Jan 13, 2025

BenBE Jan 13, 2025

Explorer09 Jan 13, 2025

BenBE Jan 13, 2025

Explorer09 Jan 13, 2025

Fix mbstowcs_nonfatal() that might convert string shorter than desired #1586

Are you sure you want to change the base?

Fix mbstowcs_nonfatal() that might convert string shorter than desired #1586

Conversation

Explorer09 commented Jan 12, 2025

BenBE commented Jan 12, 2025

BenBE commented Jan 12, 2025

Explorer09 commented Jan 12, 2025

Explorer09 commented Jan 12, 2025

BenBE commented Jan 12, 2025

Explorer09 commented Jan 13, 2025

BenBE commented Jan 13, 2025

BenBE Jan 13, 2025

Choose a reason for hiding this comment

Explorer09 Jan 13, 2025

Choose a reason for hiding this comment

BenBE Jan 13, 2025

Choose a reason for hiding this comment

Explorer09 Jan 13, 2025

Choose a reason for hiding this comment