Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Creating a String from a DynamicVector drops the last character in the vector #1753

Open
sstadick opened this issue Feb 14, 2024 · 7 comments
Labels
bug Something isn't working mojo Issues that are related to mojo mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library

Comments

@sstadick
Copy link

sstadick commented Feb 14, 2024

Bug description

Converting a DynamicVector to a String using the String constructor results in a string that is missing the last element of the DynamicVecotor used to create it.

Steps to reproduce

The following code reproduces the issue:

fn test_stringify() raises:
    var example = DynamicVector[Int8]()
    example.append(ord("e"))
    example.append(ord("x"))

    var container = DynamicVector[Int8]()
    for i in range(len(example)):
        container.append(example[i])
    let stringifed = String(container)
    assert_equal("ex", stringifed)
    # Unhandled exception caught during execution: AssertionError: ex is not equal to e

System information

❯ mojo -v
mojo 0.7.0 (af002202)
❯ modular -v
modular 0.4.1 (2d8afe15)

On an M1 Mac.

@sstadick sstadick added bug Something isn't working mojo Issues that are related to mojo labels Feb 14, 2024
@arthurevans arthurevans added the mojo-stdlib Tag for issues related to standard library label Feb 14, 2024
@arthurevans
Copy link
Collaborator

I'm not sure if this is a doc bug or an API design issue--maybe a bit of both.

The first constructor for String() takes a DynamicVector[Int8] argument which it expects to be a null-terminated array of characters. While that is the internal format that String uses, jit yields unexpected results in this case.

The constructors that take pointers also assume that the data is already null-terminated and that the provided len includes the terminator.

There's a private _unsafe_from_bytes() method that does exactly what @sstadick is looking to do here. (And a public asbytes() method that does the reverse).

TL;DR: It seems kind of backwards to me that the default paths in/out of String assume that the buffer coming in is null-terminated. But if that's the case we should just make sure the API docs for any method that takes a DynamicVector or Pointer mentions whether the data in question is null-terminated.

@arthurevans
Copy link
Collaborator

So workaround is--use _unsafe_from_bytes() or use container.append(0) to null-terminate your string first.

@lattner
Copy link
Collaborator

lattner commented Feb 15, 2024

Wow, this looks like we should require a keyword argument here, so String(nul_terminated=container), and it should check/abort if not nul terminated.

@sstadick
Copy link
Author

Thanks @arthurevans my question is answered, the behavior does at a minimum seem unexpected, feel free to close this if needed.

RE asbytes(), I don't see that in the String docs? https://docs.modular.com/mojo/stdlib/builtin/string.html Is there another place for docs that I should be looking, or is that maybe in a yet-to-be-public version?

@soraros
Copy link
Contributor

soraros commented Feb 15, 2024

@arthurevans What I get from your response is that the internal representation of String requires null termination. Is that accurate? Why go with that design? Maybe we can add those information to the doc?

@arthurevans
Copy link
Collaborator

@sstadick private methods (with a leading underscore) don't appear in the docs but do appear in code completion hints. In this case, I don't see it there, either, so it must be new. Sorry for the confusion. At any event, adding the null yourself is probably the better path for now. I've opened a PR to clarify the API docs as a short-term fix until the API can be updated.

@soraros yes, the internal buffer requires null termination. I believe this is not really for Mojo's sake, since Mojo tracks the length of the string. But it simplifies communicating with lower-level facilities that take C-style strings. (For example, if you open a file on Linux, at some point the Mojo std library will call into libc's open(), which takes the pathname as C-style string.)

@soraros
Copy link
Contributor

soraros commented Feb 16, 2024

@arthurevans Thanks for the reply. That explanation is both foreseeable and unfortunate. However, why didn't Mojo go with a separate CString type like Rust and Swift? Can we at least have the rationale documented somewhere?

@ematejska ematejska added the mojo-repo Tag all issues with this label label Apr 29, 2024
@ematejska ematejska removed the mojo-stdlib Tag for issues related to standard library label May 3, 2024
@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label May 3, 2024 — with Linear
@ematejska ematejska removed the mojo-stdlib Tag for issues related to standard library label May 6, 2024
@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label May 6, 2024 — with Linear
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working mojo Issues that are related to mojo mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library
Projects
None yet
Development

No branches or pull requests

5 participants