Rationale behind int and not std::size_t #3351

kunaltyagi · 2019-09-19T04:47:23Z

Is there a reason for using int and not std::size_t everywhere to represent size, including size of stdlib containers?

EDIT:
I've noticed an inconsistency. Several places use std::size_t, others do not.

Expected Behavior

const auto index = container.size();

for(std::size_t i = 0; i < index; ++i)
// OR
for(decltype(index) i = 0; i < index; ++i)

Current Behavior

const int index = static_cast<int>(container.size());

for(int i = 0; i < index; ++i)

Possible Solution

Use std::size_t instead of int for indices

Using std::size_t is recommended for portability
It saves a cast (which is not a no-op on 64-bit machines) which can induce bugs (see last point)
It needs less code
Allows larger indices to work (18446744073709551615 vs 2147483647)
Prevents negative indices (negative indices can be a feature actually, see Python, but in the code I've seen, negative indices aren't considered), since the cast can potentially create negative indices

The text was updated successfully, but these errors were encountered:

taketwo · 2019-09-19T07:35:11Z

Speaking about point indices, I assume that int was chosen because it takes less space, yet supports point clouds over 2 billion points in size (which probably felt sufficient to the original authors). However, I don't see any good reason why they chose int over unsigned int. As you have mentioned, negative indices aren't used in PCL.

Speaking of container size, could you post a link to some examples of what you mean?

SergioRAgostinho · 2019-09-19T07:36:00Z

I'm really convinced it is/was just a byproduct of code review absence.

kunaltyagi · 2019-09-19T08:10:13Z

Speaking of container size, could you post a link to some examples of what you mean?

Take Vertex for example. It has std::vector<uint32_t> vertices and lots of places use int num_vertices = static_cast<int>(vertex.vertices)

taketwo · 2019-09-19T08:13:09Z

Ah, OK. Yes, I fully agree with Sergio's opinion.

jasjuang · 2019-09-20T01:21:56Z

I remember if not careful using size_t will cause an infinite loop when looping backward. int also takes less space as well.

Also if you read the google coding style here

We use int very often, for integers we know are not going to be too big, e.g., loop counters. Use plain old int for such things. You should assume that an int is at least 32 bits, but don't assume that it has more than 32 bits.

kunaltyagi · 2019-09-20T05:05:51Z

if not careful using size_t will cause an infinite loop when looping backward.

Good point. I think this issue comes while looping forwards too (while using unsigned for eg.). If we agree on converting int to something larger, maybe creating special values such as pcl::equality_limits<uint>::min or PCL_UINT_MIN_EQUAL might be sensible.

we know are not going to be too big

Sadly, I hit the limit of int while concatenating point clouds. I had my pipeline wrong (no duplicate removal or voxel filtering) but I believe I'll hit the limits as more data comes in. 60 GB of points isn't an unfathomable data anymore for me. I personally prefer fixed width int (int16_t, int32_t) over int

It's a bit hard to argue what can be "too big" and maybe we could limit use of int to things we know will never be bigger than 2 billion (such as fields in a point cloud?)

aPonza · 2019-10-09T07:23:21Z

Just chiming in with the CppCoreGuidelines link as to the reasons why a signed index should be preferred in normal use cases. It requires having using index = std::ptrdiff_t; somewhere and casts everywhere though. The pros and cons seem clear enough, and I do see taketwo's point with never ever having negative indices. Moreover the STL itself having it "wrong" doesn't improve the situation.

kunaltyagi · 2019-10-09T07:47:10Z

@aPonza Yes. That's the big issue. Signed size everywhere would make so much more sense.

As for negative indices not used anywhere, it turns out, PCL uses one special value as marker, ie: -1 when item is not found else correct positive index. A redesign by using std::optional would overcome the issue, but requires a) C++17 or Boost b) Redesign.

I've been trying to remove use of indices as much as possible and use range for loops or algorithms instead. One thing that's controversial is that I'm converting all internal index counters to std::size_t for uniformity. (That's mostly for my use-case so far only since I'm at 20-30% of the limit) Another option is using gsl instead which also opens the gate for span which I love.

I'm hoping that by the time we finish converting loops to range-for + algorithms, the remainder will be taken care of by using range-adapters among other solutions. There are only a few places where the index is required and ranges allow using adapters or rewriting the code to eliminate a lot of them

mvieth · 2020-02-15T10:38:24Z

Hi, I made this commit where I replaced int with std::size_t for indices in the sample consensus module, then I found this issue and wanted to chime in. (See also @kunaltyagi 's RFC 0002 in the Wiki)

I am for the introduction of a index_t type, just because std::vector<index_t> is more expressive than std::vector<int> or std::vector<std::size_t>. In the index_t version, it is immediately clear that the vector holds indices and not just some arbitrary numbers.
Alternatively, using Indices consistently would also be more expressive (defined here as using Indices = std::vector<int>;).

Regarding the actual type of indices, I would argue for an unsigned type (e.g. unsigned int, std::size_t, or uint32_t) because:

a negative index is always wrong. Since there are barely any checks for negative indices, that opens the door for bugs. Unsigned indices make checks for negative indices unnecessary.
unsigned numbers have twice the range than signed ones at the same type size. So you can have twice as many points without needing a bigger datatype
the point that infinite loops can occur when looping backwards with unsigned is true, but I would say that backward loops are rather uncommon (in pcl at least)

stale · 2020-05-19T08:59:12Z

Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.

kunaltyagi · 2020-05-19T09:21:02Z

index_t is being added slowly but surely.

mvieth mentioned this issue Feb 22, 2020

Add pcl::index_t; move some type declarations from pcl/pcl_macros.h to pcl/types.h #3651

Merged

jingyangcarl mentioned this issue Feb 27, 2020

Follow up better type on pcl::index_t #3682

Closed

stale bot added the status: stale label May 19, 2020

kunaltyagi closed this as completed May 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rationale behind int and not std::size_t #3351

Rationale behind int and not std::size_t #3351

kunaltyagi commented Sep 19, 2019 •

edited

Loading

taketwo commented Sep 19, 2019

SergioRAgostinho commented Sep 19, 2019

kunaltyagi commented Sep 19, 2019

taketwo commented Sep 19, 2019

jasjuang commented Sep 20, 2019

kunaltyagi commented Sep 20, 2019

aPonza commented Oct 9, 2019

kunaltyagi commented Oct 9, 2019

mvieth commented Feb 15, 2020

stale bot commented May 19, 2020

kunaltyagi commented May 19, 2020

Rationale behind int and not std::size_t #3351

Rationale behind int and not std::size_t #3351

Comments

kunaltyagi commented Sep 19, 2019 • edited Loading

Expected Behavior

Current Behavior

Possible Solution

taketwo commented Sep 19, 2019

SergioRAgostinho commented Sep 19, 2019

kunaltyagi commented Sep 19, 2019

taketwo commented Sep 19, 2019

jasjuang commented Sep 20, 2019

kunaltyagi commented Sep 20, 2019

aPonza commented Oct 9, 2019

kunaltyagi commented Oct 9, 2019

mvieth commented Feb 15, 2020

stale bot commented May 19, 2020

kunaltyagi commented May 19, 2020

kunaltyagi commented Sep 19, 2019 •

edited

Loading