Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationale behind int and not std::size_t #3351

Closed
kunaltyagi opened this issue Sep 19, 2019 · 11 comments
Closed

Rationale behind int and not std::size_t #3351

kunaltyagi opened this issue Sep 19, 2019 · 11 comments

Comments

@kunaltyagi
Copy link
Member

kunaltyagi commented Sep 19, 2019

Is there a reason for using int and not std::size_t everywhere to represent size, including size of stdlib containers?

EDIT:
I've noticed an inconsistency. Several places use std::size_t, others do not.

Expected Behavior

const auto index = container.size();

for(std::size_t i = 0; i < index; ++i)
// OR
for(decltype(index) i = 0; i < index; ++i)

Current Behavior

const int index = static_cast<int>(container.size());

for(int i = 0; i < index; ++i)

Possible Solution

Use std::size_t instead of int for indices

  • Using std::size_t is recommended for portability
  • It saves a cast (which is not a no-op on 64-bit machines) which can induce bugs (see last point)
  • It needs less code
  • Allows larger indices to work (18446744073709551615 vs 2147483647)
  • Prevents negative indices (negative indices can be a feature actually, see Python, but in the code I've seen, negative indices aren't considered), since the cast can potentially create negative indices
@taketwo
Copy link
Member

taketwo commented Sep 19, 2019

Speaking about point indices, I assume that int was chosen because it takes less space, yet supports point clouds over 2 billion points in size (which probably felt sufficient to the original authors). However, I don't see any good reason why they chose int over unsigned int. As you have mentioned, negative indices aren't used in PCL.

Speaking of container size, could you post a link to some examples of what you mean?

@SergioRAgostinho
Copy link
Member

I'm really convinced it is/was just a byproduct of code review absence.

@kunaltyagi
Copy link
Member Author

Speaking of container size, could you post a link to some examples of what you mean?

Take Vertex for example. It has std::vector<uint32_t> vertices and lots of places use int num_vertices = static_cast<int>(vertex.vertices)

@taketwo
Copy link
Member

taketwo commented Sep 19, 2019

Ah, OK. Yes, I fully agree with Sergio's opinion.

@jasjuang
Copy link
Contributor

I remember if not careful using size_t will cause an infinite loop when looping backward. int also takes less space as well.

Also if you read the google coding style here

We use int very often, for integers we know are not going to be too big, e.g., loop counters. Use plain old int for such things. You should assume that an int is at least 32 bits, but don't assume that it has more than 32 bits.

@kunaltyagi
Copy link
Member Author

if not careful using size_t will cause an infinite loop when looping backward.

Good point. I think this issue comes while looping forwards too (while using unsigned for eg.). If we agree on converting int to something larger, maybe creating special values such as pcl::equality_limits<uint>::min or PCL_UINT_MIN_EQUAL might be sensible.

we know are not going to be too big

Sadly, I hit the limit of int while concatenating point clouds. I had my pipeline wrong (no duplicate removal or voxel filtering) but I believe I'll hit the limits as more data comes in. 60 GB of points isn't an unfathomable data anymore for me. I personally prefer fixed width int (int16_t, int32_t) over int

It's a bit hard to argue what can be "too big" and maybe we could limit use of int to things we know will never be bigger than 2 billion (such as fields in a point cloud?)

@aPonza
Copy link
Contributor

aPonza commented Oct 9, 2019

Just chiming in with the CppCoreGuidelines link as to the reasons why a signed index should be preferred in normal use cases. It requires having using index = std::ptrdiff_t; somewhere and casts everywhere though. The pros and cons seem clear enough, and I do see taketwo's point with never ever having negative indices. Moreover the STL itself having it "wrong" doesn't improve the situation.

@kunaltyagi
Copy link
Member Author

@aPonza Yes. That's the big issue. Signed size everywhere would make so much more sense.

As for negative indices not used anywhere, it turns out, PCL uses one special value as marker, ie: -1 when item is not found else correct positive index. A redesign by using std::optional would overcome the issue, but requires a) C++17 or Boost b) Redesign.

I've been trying to remove use of indices as much as possible and use range for loops or algorithms instead. One thing that's controversial is that I'm converting all internal index counters to std::size_t for uniformity. (That's mostly for my use-case so far only since I'm at 20-30% of the limit) Another option is using gsl instead which also opens the gate for span which I love.

I'm hoping that by the time we finish converting loops to range-for + algorithms, the remainder will be taken care of by using range-adapters among other solutions. There are only a few places where the index is required and ranges allow using adapters or rewriting the code to eliminate a lot of them

@mvieth
Copy link
Member

mvieth commented Feb 15, 2020

Hi, I made this commit where I replaced int with std::size_t for indices in the sample consensus module, then I found this issue and wanted to chime in. (See also @kunaltyagi 's RFC 0002 in the Wiki)

I am for the introduction of a index_t type, just because std::vector<index_t> is more expressive than std::vector<int> or std::vector<std::size_t>. In the index_t version, it is immediately clear that the vector holds indices and not just some arbitrary numbers.
Alternatively, using Indices consistently would also be more expressive (defined here as using Indices = std::vector<int>;).

Regarding the actual type of indices, I would argue for an unsigned type (e.g. unsigned int, std::size_t, or uint32_t) because:

  • a negative index is always wrong. Since there are barely any checks for negative indices, that opens the door for bugs. Unsigned indices make checks for negative indices unnecessary.
  • unsigned numbers have twice the range than signed ones at the same type size. So you can have twice as many points without needing a bigger datatype
  • the point that infinite loops can occur when looping backwards with unsigned is true, but I would say that backward loops are rather uncommon (in pcl at least)

@stale
Copy link

stale bot commented May 19, 2020

Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.

@stale stale bot added the status: stale label May 19, 2020
@kunaltyagi
Copy link
Member Author

index_t is being added slowly but surely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants