Partial code improvements with respect to Clang 19 warnings #1516

Explorer09 · 2024-08-12T01:28:21Z

This branch contains my proposal of the code quality changes with respect to Clang 19 warnings. (Notably, -Wshorten-64-to-32 is enabled by default in Clang 19.) I would try to break these into small changes so that these commits may be cherry-picked into main early.

(This pull request is related to #1513)

CRT.c

CPUMeter.c

DynamicMeter.c

BenBE · 2024-08-13T07:20:12Z

CategoriesPanel.c

-   size_t columns = HeaderLayout_getColumns(this->scr->header->headerLayout);
+   unsigned int columns = HeaderLayout_getColumns(this->scr->header->headerLayout);
   MetersPanel** meterPanels = xMallocArray(columns, sizeof(MetersPanel*));
   Settings* settings = this->host->settings;

-   for (size_t i = 0; i < columns; i++) {
+   for (unsigned int i = 0; i < columns; i++) {
      char titleBuffer[32];
-      xSnprintf(titleBuffer, sizeof(titleBuffer), "Column %zu", i + 1);
+      xSnprintf(titleBuffer, sizeof(titleBuffer), "Column %u", i + 1);


I think this one is a step backwards. Stick with size_t.

Can't understand what you mean. What's the reason of keeping the size_t?

Two-fold:

It's a number of items (columns)

unsigned int looks kinda unwieldy IMHO.

For the format string %zu is much more portable than uint32_t which would require you to use %"PRIu32" instead of just %u.

@BenBE Well, I have a different view on this:

Object sizes with no likelihood of overflowing an int type would not need size_t (that is, unsigned long) for storing.

In x86-64, incrementing and comparing int types are slightly cheaper than comparing long or size_t (in terms of code size). When int types can suffice, using long would make the code one-byte-per-operation larger.

Header column count is unlikely to exceed termical column count (COLS), and the latter is bound to signed int type already.

And no, we don't need PRIu32 here. It's unsigned int, not uint32_t.

This case is slightly different from string lengths. My view is string lengths in size_t can serve as a defence in depth, where the user is possible to feed a 4 GiB large string in a config file. But if the object size would be bound by any other factor, then size_t might be a waste.

(By the way, I did read this article from EWONTFIX when I write my changes.)

I hope htop developers can reconsider.

Personal opinion following:

@BenBE Well, I have a different view on this:

Object sizes with no likelihood of overflowing an int type would not need size_t (that is, unsigned long) for storing.

Nonetheless size_t is less about the number of entities itself, but that you are counting entities.

Otherwise you could argue that "this hashtable implementation only needs short for its size, because I never store more than 65535 elements in it".

I know, that example is hyperbole, but follows that same logic. When I ask for size_t it's really little about "how many", but the "what" (AKA semantics).

In x86-64, incrementing and comparing int types are slightly cheaper than comparing long or size_t (in terms of code size). When int types can suffice, using long would make the code one-byte-per-operation larger.

That might be true for x86_64, but that's not the only platform we need to support. There's at least ARM, ARM64, MIPS, and probably several others, where htop is runing. Unless performance really IS a bottleneck for these functions (AFAICS it's not), priority should be on correctness, not code size or performance.

Header column count is unlikely to exceed terminal column count (COLS), and the latter is bound to signed int type already.

See arguments from the article you linked.

That limit is implied by the current interface from ncurses, but a future version could just switch to size_t for its column and row counts (potentially breaking BC at that point); there are good reasons like huge screens where terminals COULD have that many columns/rows.

And no, we don't need PRIu32 here. It's unsigned int, not uint32_t.

k, ACK on that. But that's only a minor note regarding the chosen type signature.

This case is slightly different from string lengths. My view is string lengths in size_t can serve as a defence in depth, where the user is possible to feed a 4 GiB large string in a config file. But if the object size would be bound by any other factor, then size_t might be a waste.

Allocations are typically backed by 4 KiB pages at minimum, with the usual memory manager implementation pre-allocating some memory to handle sudden spikes. Thus unless the savings are major or we have massive amounts of objects (where storing UTF-8 instead of UTF-16 saves several hundreds of megabytes (like with some browsers) there's usually little need to save every last byte. Furthermore, by storing smaller data types you sometimes may make the alignment of the data worse, thus actually reducing performance.

From my experience I can tell, that I've seen real-life cases of where uint16_t vs. uint32_t was actually harmful due to alignment and memory access patterns. For 32 bit values on a 64 bit architecture, similar considerations apply.

(By the way, I did read this article from EWONTFIX when I write my changes.)

FWIW, I personally side with the author.

I hope htop developers can reconsider.

@cgzones @fasterit @ginggs @natoscott : Your opinion on this matter of size_t vs unsigned int?

I'm happy to stick with size_t here.

Seconded
/DLange

@BenBE @natoscott @fasterit
This commit was to fix the implicit downcast warnings from HeaderLayout_getColumns() by Clang 19 (the messages was: "implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'unsigned int' [-Wshorten-64-to-32]"). HeaderLayout_getColumns() was an inline function that returns a size_t that, in certain caller codes, has to downcast to int because terminal column variables are in int type to follow curses API.

With this background, I would like some clarifications on which decision to make:
(A) Keep HeaderLayout_getColumns() return value in size_t. Use size_t for local variables that hold the return values of HeaderLayout_getColumns(), and add explicit downcasts when needed (e.g. when used in the context of terminal column widths).
(B) Let HeaderLayout_getColumns() return an unsigned int. Keep using int type for places like terminal columns. But when using in array iterators (like in this example code you guys are commented on), upcast to size_t.
(C) Let HeaderLayout_getColumns() return an unsigned int. Also use unsigned int for array iterators that would never cross the bound of HeaderLayout_getColumns(). This is my original proposal.

EDIT: Just a reminder that the (A) option is the least preferred to me as the downcasts can hide potential errors when the original "getColumns" values are out of bounds. Although assertions may be added to ensure the numbers are in bound, they still look like an overkill solution to me (making the problem more complicated than necessary).

Add explicit casts to fix the "-Wshorten-64-to-32" warning (enabled by default in Clang 19). Also shorten the type width for some local variables (where "uint32_t" is enough and no need for "unsigned long"). Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

Don't assume a meter's caption string length will fit the type of 'int'. Use 'size_t' for the string lengths instead. Even though an overflow on the string length is unlikely, this makes the code more robust. Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

Change return type of HeaderLayout_getColumns() from 'size_t' to 'unsigned int', and use 'unsigned int' type for all Header column iterators. Using 'size_t' is unnecessary (you cannot have 2^32 columns anyway) and would cause "-Wshorten-64-to-32" warnings (enabled by default in Clang 19).

cgzones · 2024-08-24T19:21:54Z

Row.c

-      unsigned int minutes = totalSeconds / 60;
-      unsigned int seconds = totalSeconds % 60;
-      unsigned int milliseconds = microseconds / 1000;
+      unsigned int minutes = (unsigned int)totalSeconds / 60;


Shouldn't the cast here happen after the division?

Not necessary because of the (totalSeconds < 600) conditional. Although I argue this is more of a style issue...
(Compiler should catch that (unsigned int)(totalSeconds / 60) can be optimized to ((unsigned int)totalSeconds) / 60, no?)

Not necessarily, as (unsigned int)((1UL<<32) / 60) != (unsigned int)(1UL<<32) / 60.

Won't happen in this case based on the previous check, but the correctness should take priority here.

Explorer09 · 2024-09-03T11:39:50Z

Meter.c

@@ -318,25 +325,25 @@ static void LEDMeterMode_draw(Meter* this, int x, int y, int w) {
      y + 2;
   attrset(CRT_colors[LED_COLOR]);
   const char* caption = Meter_getCaption(this);
-   mvaddstr(yText, x, caption);
-   int xx = x + strlen(caption);
+   mvaddnstr(yText, x, caption, w);


This one is a little beyond my knowledge. According to man pages, the n in mvaddnstr refers to number of "characters" to print. However, what I really want to limit is the number of terminal columns, not the number of Unicode characters (this distinction is needed if East Asian Wide characters are in the strings).

Perhaps a good way is to test how this function call would output, would it be "１２" or just "１":

/* "１２３４ fullwidth" */ addnstr("\uFF11\uFF12\uFF13\uFF14 fullwidth", 2);

What about wcswidth(3)?

What I wish is a function that does an inverse of wcswidth(3). wcswidth() returns the terminal column width given a string and a number of (Unicode) characters limit. What I wished is retrieving the number of characters (or number of bytes) given a string and a column width limit. RichString_appendnWideColumns() might give the closest I need, but I want a function that operates on a const char* rather than a RichString object. Maybe I would write one from scratch?

Unfortunately that is easier said than actual done, but based on the fact that wcswidth returns >=0 you can probably build something like this by calling wcwidth(3) for each character in turn until you overshoot the limit, in which case you know the last character was too much (this accounts for zero-width-combiners and other special cases). The down-side is, that you will need to transform the const char* into wchar_t* first, thus having this functionality live with the other RichString functions might be reasonable.

NB: There's actually an issue that might benefit from this functionality in #462 IIRC.

@BenBE Yes, I'm aware of PR #462. And perhaps PR #1231 can benefit, too, as I wish this function would work as a word wrapper that is also CJK-aware.

BenBE added enhancement Extension or improvement to existing feature code quality ♻️ Code quality enhancement build system 🔧 Affects the build system rather then the user experience labels Aug 12, 2024