mojo2austin expecting utf-8, found latin-1 #30

Open
dooferlad opened this issue Sep 24, 2024 · 6 comments

@dooferlad

Description

Running mojo2austin on a file I just generated gives an error:

Traceback (most recent call last):
  File "/home/dooferlad/.venv/bin/mojo2austin", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/dooferlad/.venv/lib/python3.11/site-packages/austin/format/mojo.py", line 541, in main
    for event in MojoFile(mojo).parse():
  File "/home/dooferlad/.venv/lib/python3.11/site-packages/austin/format/mojo.py", line 507, in parse
    for e in self.parse_event():
  File "/home/dooferlad/.venv/lib/python3.11/site-packages/austin/format/mojo.py", line 492, in parse_event
    for event in t.cast(dict, self.__handlers__)[event_id](self):
  File "/home/dooferlad/.venv/lib/python3.11/site-packages/austin/format/mojo.py", line 469, in parse_string
    value = self.read_string()
            ^^^^^^^^^^^^^^^^^^
  File "/home/dooferlad/.venv/lib/python3.11/site-packages/austin/format/mojo.py", line 331, in read_string
    return self.read_until().decode()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1: invalid start byte

Steps to Reproduce

  1. Run a Django project: austin --output /home/dooferlad/supportsite.austin --binary --heap=2048 ./manage.py runserver 8001 --noreload --skip-checks
  2. mojo2austin /home/dooferlad/supportsite.austin /home/dooferlad/supportsite.austin-txt

Versions

  • Python 3.11.9
  • austin 3.6.0
  • austin-python==1.7.1

My environment is set up with:

LANGUAGE=en_GB.UTF-8
LANG=en_GB.UTF-8

Additional Information

To get mojo2austin and austin2speedscope to work, I made these changes. In austin/format/mojo.py at line 331:

    def read_string(self) -> str:
        """Read a string from the MOJO file."""
        return self.read_until().decode(encoding="latin-1")

And also in austin/stats.py at line 419:

    def __enter__(self) -> "AustinFileReader":
        """Open the Austin file and read the metadata."""
        self._stream = open(self.file, encoding="latin-1")

I assume that the string in the Mojo file is from the Python application, but I don't actually know. I am not sure if the above change is actually a fix or just masking the real bug!
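
To illustrate the "fix or masking" concern, here is a minimal sketch of how the two codecs treat the byte from the traceback. The byte string is made up for illustration; only the 0xfc at position 1 comes from the error message above. Latin-1 maps every byte value to a code point, so decoding with it can never raise, even on corrupted data.

```python
# Illustrative bytes: only the 0xfc at index 1 is taken from the traceback.
raw = b"M\xfcller"

# UTF-8 rejects 0xfc because it is never a valid start byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_ok = False
    utf8_error = exc

# Latin-1 assigns a character to every byte 0x00-0xff, so this cannot fail.
latin1_text = raw.decode("latin-1")

# Arbitrary garbage also decodes without error, which is why switching
# to Latin-1 can hide corrupted input instead of reporting it.
garbage_text = bytes(range(256)).decode("latin-1")

print(utf8_ok, latin1_text, len(garbage_text))
```

So the Latin-1 change makes the error go away whether or not the underlying bytes are meaningful text.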

@P403n1x87
Owner

@dooferlad thanks for reporting this. Does this happen with every MOJO file generated by Austin? Sometimes files are corrupted by bad samples, so it's worth trying to collect them again.

@dooferlad
Author

It definitely is happening every time for this project. I get a lot of invalid samples, so I suppose this is something I just have to live with?

⌛ Sampling duration : 14.00 s
⏱️  Frame sampling (min/avg/max) : 24/208/20201 μs
🐢 Long sampling rate : 438/12575 (3.48 %) samples took longer than the sampling interval to collect
💀 Error rate : 2867/12575 (22.80 %) invalid samples

@dooferlad
Author

Of course, I say that, and then I tried the latest GitHub release instead of the latest snap and the conversion worked. I still have a lot of invalid samples though!

@dooferlad
Author

Yes, it seems like invalid samples are the problem. Re-running multiple times gives me a selection of bytes that can't be decoded as UTF-8, arriving at different positions in the profile. I already have a large heap (4 GiB). Is there anything else I can do to reduce errors?

@dooferlad
Author

FWIW, this does the right thing, I think!

# austin/stats.py line 428
    def __iter__(self) -> Iterator:
        """Iterator over the samples in the Austin file."""

        def _() -> Generator[str, None, None]:
            assert self._stream_iter is not None

            while True:
                try:
                    line = self._stream.readline()
                    if not line or line == "\n":  # stop at EOF ("") or at the blank line
                        break
                    yield line
                except UnicodeDecodeError:
                    pass

            self._read_meta()

        return _()

Would you like me to submit a PR?
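
For reference, a self-contained sketch of the same skip-bad-lines idea, assuming the file is read in binary mode and each line is decoded on its own, so that one undecodable line cannot affect its neighbours. `iter_sample_lines` is a hypothetical helper, not part of austin-python, and the sample lines in the demo are invented.

```python
import io
from typing import BinaryIO, Iterator


def iter_sample_lines(stream: BinaryIO) -> Iterator[str]:
    """Yield decoded lines until a blank line or end of file.

    Hypothetical helper for illustration; not austin-python API.
    """
    for raw in stream:
        if raw in (b"\n", b"\r\n"):
            break  # blank line: the samples section is over
        try:
            yield raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # drop only the line that failed to decode


# Invented data: two good lines, one line with a stray 0xfc byte,
# then a blank line followed by a metadata-style line.
data = io.BytesIO(
    b"P1;T1;main 10\n"
    b"P1;T1;\xfcbad 5\n"
    b"P1;T1;other 7\n"
    b"\n"
    b"# duration: 3\n"
)
lines = list(iter_sample_lines(data))
print(lines)
```

Iterating over the binary stream ends at end of file, so a truncated file without the blank separator line does not loop forever.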

@P403n1x87
Owner

@dooferlad yes please, any contribution is very welcome. As for the invalid samples, if you're specifically referring to the stats reported by Austin at the end, there isn't much that can be done about that. That's just the nature of an out-of-process profiler like Austin.
