Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vstring: Avoid int -> char truncation warnings #3690

Merged
merged 5 commits into from
May 24, 2023

Conversation

b4n
Copy link
Member

@b4n b4n commented Apr 2, 2023

Either avoid or silence expected casts from int to char.

I'm not entirely sure whether vStringPut() shouldn't actually take a char argument instead of an int, as it will cast it in all cases. It is however slightly more convenient as the APIs to retrieve characters uses int only for edge cases (e.g. EOF), so the common case is a safe cast and it's more convenient not to explicit it everywhere.

@b4n b4n requested a review from masatake April 2, 2023 20:05
main/vstring.h Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Apr 2, 2023

Codecov Report

Patch coverage: 90.00% and no project coverage change.

Comparison is base (ee10ac3) 82.99% compared to head (0c2e334) 83.00%.

❗ Current head 0c2e334 differs from pull request most recent head 1128712. Consider uploading reports for the commit 1128712 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3690   +/-   ##
=======================================
  Coverage   82.99%   83.00%           
=======================================
  Files         226      226           
  Lines       55006    55013    +7     
=======================================
+ Hits        45652    45662   +10     
+ Misses       9354     9351    -3     
Impacted Files Coverage Δ
dsl/optscript.c 88.13% <ø> (ø)
parsers/objc.c 88.33% <ø> (ø)
parsers/ocaml.c 64.97% <ø> (-0.05%) ⬇️
parsers/quarto.c 82.35% <0.00%> (ø)
parsers/asp.c 46.93% <36.36%> (ø)
main/fmt.c 86.74% <75.00%> (ø)
parsers/c-based.c 81.90% <75.00%> (+0.27%) ⬆️
parsers/falcon.c 87.03% <75.00%> (ø)
main/colprint.c 94.73% <100.00%> (+0.12%) ⬆️
main/entry.c 86.75% <100.00%> (ø)
... and 40 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@masatake masatake self-assigned this Apr 5, 2023
@masatake
Copy link
Member

Sorry for the late response. I will look into this one next.

@masatake masatake added this to the 6.1 milestone Apr 29, 2023
@masatake
Copy link
Member

@b4n, sorry for being late.

masatake added a commit to b4n/fishman-ctags that referenced this pull request May 13, 2023
@b4n
Copy link
Member Author

b4n commented May 14, 2023 via email

@masatake
Copy link
Member

@b4n Thank you. Please, delete my commit appended to your branch (if you need).

Either avoid or silence expected casts from int to char.

I'm not entirely sure whether vStringPut() shouldn't actually take a
char argument instead of an int, as it will cast it in all cases. It is
however slightly more convenient as the APIs to retrieve characters
uses int only for edge cases (e.g. EOF), so the common case is a safe
cast and it's more convenient not to explicit it everywhere.
@b4n
Copy link
Member Author

b4n commented May 15, 2023

OK, this blew widely out of proportions, but I verified and fixed all1 calls to vStringPut() to pass values that fir the unsigned char-as-int scheme.

Footnotes

  1. Unfortunately I caught that cppGetc() can return weird values (> 0xff) late (when a Java and a D test failed in a weird way), so I didn't check all calls that stream from this. More checks might be needed, but putting this in a vString is exactly the kind of bugs this could catch 🙂

We expect an `unsigned char` passed as an `int`.  Anything else is
invalid and probably means there is an issue in the calling code.

Co-Authored-By: Masatake YAMATO <yamato@redhat.com>

d [i] = tolower (c);
}
d [i] = (char) tolower (s [i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your change itself looks good to me.

Following the consideration that I made during reviewing is not directly related to this pull request. If I wrote incorrect things, please, let me know.

s [i] can be a negative if char on this platform means signed char.
In tolower(3) man page:

    If c is neither an unsigned char value nor EOF, the behavior of these functions is undefined. 

So tolower (s [i]) can be a trouble. Am I wrong?

If we declare s as unsigned char *, and d as unsigned char *,
we can avoid the trouble.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8fffa20 cares what I feared here.

@b4n
Copy link
Member Author

b4n commented May 15, 2023 via email

@masatake
Copy link
Member

The commit, vstring: Avoid int -> char truncation warnings should be merged.

The rest commits are optional.
I'm looking for a simpler way to call vStringPut; I'm afraid I forget to put (unsigned char).
vStrigPut is used so frequently.

@b4n, have you ever thought about _Generic of C11?

#define vStringPut (string, c) _Generic((c),				\
					char: vStringPutChar,		\
					const char: vStringPutChar,	\					
					unsigned char: vStringPutUChar,	\
					const unsigned char: vStringPutUChar, \
					int: vStringPutInt,		\
					const int: vStringPutInt,	\
					default: vStringPutInt) (string, c)



extern void vStringPutChar (vString *const string, const char c)
{
	...
	/* buffer has char[] type. So assigning directly. */		
	string->buffer [string->length] = c; 
}

extern void vStringPutUChar (vString *const string, const unsigned char c)
{
	...
	string->buffer [string->length] = (char)c; 
		
}

extern void vStringPutInt (vString *const string, const int c)
{
	Assert (0 <= c && c <= 0xff);
	...
}

This looks fit on our demand.

The biggest issue is we have to move to C11.
However, _Generic looks useful.
If you are not familiar with _Generic, see http://www.robertgamble.net/2012/01/c11-generic-selections.html .

I also wrote a short program to understand _Generic:

#include <stdio.h>

#define string(X) _Generic((X),						\
			   signed char: "signed char",			\
			   const signed char: "const signed char",	\
			   unsigned char: "unsigned char",		\
			   const unsigned char: "const unsigned char",	\
			   char: "char",				\
			   const char: "const char",			\
			   int: "int",					\
			   const int: "const int",			\
			   unsigned int: "unsigned int",		\
			   const unsigned int: "const unsigned int",	\
			   default: "unknown")

#define test_val(X) do { X v = 0; printf("val: %s -> %s\n", #X, string(v)); } while(0)
#define test_addr(X) do { X y = 0; X *v = &y; printf("addr: %s -> %s\n", #X, string(*v)); } while(0)
#define test_array0(X) do { X v[1] = {0,}; printf("array0: %s -> %s\n", #X, string(v[0])); } while(0)
#define test_array1(X) do { X v[1] = {0,}; printf("array1: %s -> %s\n", #X, string(*v)); } while(0)

#define test(X) test_val(X); test_addr(X); test_array0(X); test_array1(X); 
int
main(void)
{
  test (char);
  test (signed char);
  test (unsigned char);
  test (const char);
  test (const signed char);
  test (const unsigned char);  
  test (int);
  test (unsigned int);
  test (const int);
  test (const unsigned int);
  return 0;
}



@b4n
Copy link
Member Author

b4n commented May 16, 2023

@b4n, have you ever thought about _Generic of C11?
[…]

No, but admittedly I never took time to get into C11 much, mostly because it was too recent for project I worked on.

But indeed, it looks interesting here if we can depend on it (e.g. not if it's conditional, because then callers either have to be correct anyway, or some builds will be broken and some will not).

In practice it could be as simple as something like this (with vStringPutImpl() being the proposed int-based variant performing the Assert):

#define vStringPut(s, c) _Generic((c), \
                                  char: vStringPutImpl((s), (unsigned char) (c)), \
                                  signed char: vStringPutImpl((s), (unsigned char) (c)), \
                                  default: vStringPutImpl((s), (c)))

const doesn't seem to play, which makes sense as it's not taking part of type conversion, but signed seem to for char, but not for int (probably due to the fact char is not defined as either signed or unsigned).

This said, I didn't test this and I don't know if it's reasonable to depend on _Generic(). It's more than 10 years old, and I'd think most compilers have it for at least as long though, so it probably depends on our target users.

In our case though, I wonder if we couldn't simply do a tiny hack without C11:

#define vStringPut(s, c) (sizeof(c) == sizeof(char) \
                          ? vStringPutImpl((s), (unsigned char) (c)) \
                          : vStringPutImpl((s), (c)))

What do you think?

@b4n
Copy link
Member Author

b4n commented May 16, 2023

@masatake I added a commit implementing my version to avoid C11, and it should work OK. Let me know if there's anything I overlooked though.

If we go that route, I could reorder the commits and remove the now unnecessary casts from 913aefc

@masatake
Copy link
Member

@b4n looks nice!
I wonder why I could not find the condition sizeof(c) == sizeof(char) when considering _Generic.

_Generic is interesting but we don't need now.

vStringPut() and friends take an `unsigned char` as an `int`, similar
to ctype functions, in order to be easy to use with `fgetc()`-style
functions.

However, this API is also used on regular C strings, which require a
cast if the value is larger than 0x7F (127) on systems where `char` is
a signed type.

In order to make the API easier to use, as it's easy to forget the cast
when working with `char`, introduce wrapper macros that add the cast
when called with `char`.  The cast is conditional so the underlying
implementation can still verify the value is in the valid range when
called with an `int` (to catch erroneous calls with EOF or other values
that do not fit a `char`).

Note that this will still not work properly if the caller has an
incorrect cast on its side, like e.g. `vStringPut(s, (int) c)` where
`c` is a `char`, as there's only so many magic tricks up our sleeves.
These calls should be updated to either be `vStringPut(s, c)` with the
added macros, or `vStringPut(s, (unsigned char) c)` if one wants to be
explicit -- yet no need to go all the trouble to make them
`vStringPut(s, (int) (unsigned char) c)`, as the `unsigned char` to
`int` conversion is implicit and safe.

Based off a suggestion from Masatake YAMATO <yamato@redhat.com>
@b4n
Copy link
Member Author

b4n commented May 23, 2023

I updated the patchset to move the commit making vStringPut() more generic earlier in the chain, and removed the now unnecessary casts from the commit fixing the calls. There was still a few calls to fix (that were incorrectly casting int to char explicitly, or char to int, both of which are incorrect and prevent the generic handling in vStringPut() from working), but quite a lot fewer.

@b4n looks nice! I wonder why I could not find the condition sizeof(c) == sizeof(char) when considering _Generic.

Well, to be fair I didn't think about it right away, but playing a little with _Generic() made me realize for the case at hand there was another trick up C's sleeves :)

@b4n
Copy link
Member Author

b4n commented May 23, 2023

Wait, apparently I messed up something in the rebase, hang on.

b4n added 2 commits May 23, 2023 11:39
`vStringPut()` has been made forgiving in the type the character is
pass as in e6e018b.  It however
requires the value to be valid in one of the formats it accepts.

This commits should update all problematic cases, basically removing
all explicit casts:

* removing incorrect `int` to `char` casts;
* removing incorrect `char` to `int` casts;
* removing unnecessary `char` to `unsigned char` casts.
* removing unnecessary `unsigned char` to `int` casts.

Incorrect casts can lead to unexpected results with values > 0x7F.
Unnecessary casts make the code harder to read than necessary, and
risk being incorrectly copied to code where they would be incorrect.
`cppGetc()` can return `CHAR_SYMBOL` and `STRING_SYMBOL` which need to
be properly handled when trying to represent as character.
@b4n
Copy link
Member Author

b4n commented May 23, 2023

Fixed, I removed too many things from the "fix calls" commit, sorry. Should be all good now, and I reviewed them more carefully again.

@masatake
Copy link
Member

LGTM. Thank you very much.

@b4n b4n merged commit 979433d into universal-ctags:master May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants