[BUG] "LLVM ERROR: out of memory" on returning tensor #3465

Open
deanrobertcook opened this issue Sep 10, 2024 · 0 comments
Labels: bug (Something isn't working), mojo-repo (Tag all issues with this label)


Bug description

Following along with this blog post, I'm trying to get basic PNG loading working in Mojo (I previously tried using cv2 directly from Mojo, but couldn't find a way to convert a NumPy array to a tensor in a reasonable time).

The code uses FFI to call to zlib to decompress the PNG data chunk. This step works fine. Then the code walks through each "scanline" of the PNG, undoes the filtering and places the line into a newly initialized tensor of the appropriate size. This also seems to work okay, which I verified with a print statement right before the tensor is returned.
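For context, the unfiltering step is just the standard per-byte reconstruction from the PNG spec. In plain Python (illustrative only, not part of the repro; the function names here are made up) it looks like this:

```python
# Sketch of PNG scanline unfiltering (filter types 0-4), mirroring what the
# Mojo code below does byte by byte. Arithmetic is modulo 256 per the PNG spec.

def paeth_predictor(left, above, above_left):
    # Paeth: pick whichever neighbor is closest to left + above - above_left.
    p = left + above - above_left
    pa, pb, pc = abs(p - left), abs(p - above), abs(p - above_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return above_left

def unfilter_scanline(filter_type, scanline, previous, pixel_size):
    # `previous` is the already-reconstructed scanline above (zeros for row 0).
    result = []
    for i, cur in enumerate(scanline):
        left = result[i - pixel_size] if i >= pixel_size else 0
        above = previous[i]
        above_left = previous[i - pixel_size] if i >= pixel_size else 0
        if filter_type == 0:      # None
            recon = cur
        elif filter_type == 1:    # Sub
            recon = cur + left
        elif filter_type == 2:    # Up
            recon = cur + above
        elif filter_type == 3:    # Average
            recon = cur + ((left + above) >> 1)
        elif filter_type == 4:    # Paeth
            recon = cur + paeth_predictor(left, above, above_left)
        else:
            raise ValueError("unknown filter type")
        result.append(recon & 0xFF)
    return result
```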

However, control never makes it back to the caller. Instead, somewhere between the return and the call site, the following trace is produced:

LLVM ERROR: out of memory
Allocation failed
Please submit a bug report to https://github.com/modularml/mojo/issues and include the crash backtrace along with all the relevant source codes.
Stack dump:
0.	Program arguments: mojo test.mojo
 #0 0x000055db77683478 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x12a1478)
 #1 0x000055db7768129e (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x129f29e)
 #2 0x000055db77683b0d (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x12a1b0d)
 #3 0x00007ff87377e420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #4 0x00007ff87320900b raise /build/glibc-BHL3KM/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #5 0x00007ff8731e8859 abort /build/glibc-BHL3KM/glibc-2.31/stdlib/abort.c:81:7
 #6 0x000055db776206c8 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x123e6c8)
 #7 0x000055db77620702 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x123e702)
 #8 0x000055db78e1fb35 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x2a3db35)
 #9 0x000055db79a72a82 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x3690a82)
#10 0x000055db79a72d58 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x3690d58)
#11 0x000055db79a710ed (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x368f0ed)
#12 0x00007ff8631deead KGEN_CompilerRT_AlignedAlloc (/home/karaqst/.modular/pkg/packages.modular.com_max/lib/libKGENCompilerRTShared.so.19.0git+0x3cead)
#13 0x00007ff81c0154bf
mojo crashed!
Please file a bug report.
Aborted (core dumped)

I've posted a simplified version of the code below. Originally the code crashed on 1024*1024 PNGs, but I also tried generating PNGs as small as 10x10 and I still run out of memory...

This could of course be something I'm doing wrong. I'm still learning about how Mojo works, but this one just seemed a bit strange to me.

Thanks in advance!

Steps to reproduce

The following code should hopefully be complete; see the very bottom for where the crash happens. The only requirements are zlib and a very simple PNG file, which can be generated with the following Python code:

from PIL import Image
import random

width, height = 10, 10 

img = Image.new("RGB", (width, height))

for x in range(width):
    for y in range(height):
        r = random.randint(0, 255)
        g = random.randint(0, 255)
        b = random.randint(0, 255)
        img.putpixel((x, y), (r, g, b))

img.save("random_image.png")
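As a plain-Python cross-check of the chunk layout my parser walks (a 4-byte big-endian length, a 4-byte type, `length` data bytes, then a 4-byte CRC32 over type + data), here is an illustrative sketch; `make_chunk`/`parse_chunk` are throwaway names, not part of the repro:

```python
import struct
import zlib

def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    # Assemble one PNG chunk: length, type, data, CRC32 over type + data.
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def parse_chunk(buf: bytes, read_head: int):
    # Mirror of the Mojo parse_next_chunk: read length/type/data/CRC and
    # return the position just past the CRC as the new read head.
    (length,) = struct.unpack_from(">I", buf, read_head)
    chunk_type = buf[read_head + 4 : read_head + 8]
    data = buf[read_head + 8 : read_head + 8 + length]
    (crc,) = struct.unpack_from(">I", buf, read_head + 8 + length)
    assert zlib.crc32(chunk_type + data) == crc, "CRC32 does not match"
    end = read_head + 12 + length
    return chunk_type, data, end
```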
And the Mojo program (test.mojo) itself:

from sys import ffi
from tensor import Tensor, TensorSpec, TensorShape
from utils.index import Index
from pathlib import Path
from testing import assert_true
from bit import byte_swap, bit_reverse
from algorithm import vectorize
from sys.info import simdwidthof

alias Bytef = Scalar[DType.uint8]
alias uLong = UInt64
alias zlib_type = fn (
    _out: Pointer[Bytef],
    _out_len: Pointer[UInt64],
    _in: Pointer[Bytef],
    _in_len: uLong,
) -> Int

alias simd_width = simdwidthof[DType.uint32]()


struct Chunk(Movable, Copyable):
    """A struct representing a PNG chunk."""

    var length: UInt32
    """The length of the chunk (in bytes)."""
    var type: String
    """The type of the chunk."""
    var data: List[UInt8]
    """The data contained in the chunk."""
    var crc: UInt32
    """The CRC32 checksum of the chunk."""
    var end: Int
    """The position in the data list where the chunk ends."""

    fn __init__(
        inout self,
        length: UInt32,
        chunk_type: String,
        data: List[UInt8],
        crc: UInt32,
        end: Int,
    ):
        """Initializes a new Chunk struct.

        Args:
            length: The length of the chunk.
            chunk_type: The type of the chunk.
            data: The data contained in the chunk.
            crc: The CRC32 checksum of the chunk.
            end: The position in the data list where the chunk ends.
        """
        self.length = length
        self.type = chunk_type
        self.data = data
        self.crc = crc
        self.end = end

    fn __moveinit__(inout self, owned existing: Chunk):
        """Move data of an existing Chunk into a new one.

        Args:
            existing: The existing Chunk.
        """
        self.length = existing.length
        self.type = existing.type
        self.data = existing.data
        self.crc = existing.crc
        self.end = existing.end

    fn __copyinit__(inout self, existing: Chunk):
        """Copy constructor for the Chunk struct.

        Args:
          existing: The existing struct to copy from.
        """
        self.length = existing.length
        self.type = existing.type
        self.data = existing.data
        self.crc = existing.crc
        self.end = existing.end


fn bytes_to_uint32_be(owned list: List[UInt8]) raises -> List[UInt32]:
    """Converts a list of bytes into a list of UInt32s.

    Args:
        list: The List of bytes.

    Returns:
        Input data translated to a List of UInt32.

    Raises:
        ValueError: If the length of the input list is not a multiple of 4.
    """
    assert_true(
        len(list) % 4 == 0,
        "List[UInt8] length must be a multiple of 4 to convert to List[UInt32]",
    )
    var result_length = len(list) // 4

    # get the data pointer with ownership.
    # This avoids copying and makes sure only one List owns a pointer to the underlying address.
    var ptr_to_uint8 = list.steal_data()
    var ptr_to_uint32 = ptr_to_uint8.bitcast[UInt32]()
    var dtype_ptr = DTypePointer[DType.uint32](
        Pointer[UInt32](ptr_to_uint32.address)
    )

    # vectorize byte_swap over DTypePointer
    @parameter
    fn _byte_swap[_width: Int](i: Int):
        # call byte_swap on a batch of UInt32 values
        var bit_swapped = byte_swap(dtype_ptr.load[width=_width](i))
        # We are updating in place and both ptr_to_uint32 and dtype_ptr share the addresses
        dtype_ptr.store[width=_width](i, bit_swapped)

    # swap the bytes in each UInt32 to convert from big-endian to little-endian
    vectorize[_byte_swap, simd_width](result_length)

    return List[UInt32](
        unsafe_pointer=ptr_to_uint32, size=result_length, capacity=result_length
    )


fn bytes_to_string(list: List[UInt8]) -> String:
    """Converts a list of bytes to a string.

    Args:
        list: The List of bytes.

    Returns:
        The String representation of the bytes.
    """
    var word = String("")
    for letter in list:
        word += chr(int(letter[].cast[DType.uint8]()))

    return word


fn CRC32(
    data: List[SIMD[DType.uint8, 1]],
    value: SIMD[DType.uint32, 1] = 0xFFFFFFFF,
) -> SIMD[DType.uint32, 1]:
    """Calculate the CRC32 value for a given list of bytes.

    Args:
        data: The list of bytes for which to calculate the CRC32 value.
        value: The initial value of the CRC32 calculation.

    Returns:
        The CRC32 value for the given list of bytes.
    """
    var crc32 = value
    for byte in data:
        crc32 = (bit_reverse(byte[]).cast[DType.uint32]() << 24) ^ crc32
        for _ in range(8):
            if crc32 & 0x80000000 != 0:
                crc32 = (crc32 << 1) ^ 0x04C11DB7
            else:
                crc32 = crc32 << 1

    return bit_reverse(crc32 ^ 0xFFFFFFFF)


def parse_next_chunk(data: List[UInt8], read_head: Int) -> Chunk:
    """Parses the chunk starting at the read head.

    Args:
        data: A list containing the raw data in the PNG file.
        read_head: The position in the data list to start reading the chunk.

    Returns:
        A Chunk struct containing the information in the chunk starting at the read head.
    """
    chunk_length = bytes_to_uint32_be(data[read_head : read_head + 4])[0]
    chunk_type = bytes_to_string(data[read_head + 4 : read_head + 8])
    start_data = int(read_head + 8)
    end_data = int(start_data + chunk_length)
    chunk_data = data[start_data:end_data]
    start_crc = int(end_data)
    end_crc = int(start_crc + 4)
    chunk_crc = bytes_to_uint32_be(data[start_crc:end_crc])[0]

    # Check CRC
    assert_true(
        CRC32(data[read_head + 4 : end_data]) == chunk_crc,
        "CRC32 does not match",
    )
    return Chunk(
        length=chunk_length,
        chunk_type=chunk_type,
        data=chunk_data,
        crc=chunk_crc,
        end=end_crc,
    )


fn uncompress(
    compressed: List[UInt8], uncompressed_len: Int, quiet: Bool = True
) raises -> List[UInt8]:
    var handle = ffi.DLHandle("")
    var zlib_uncompress = handle.get_function[zlib_type]("uncompress")

    var buffer_len = uncompressed_len + 2000  # TODO(Dean) how much extra room is needed?
    var p_uncompressed = Pointer[Bytef].alloc(buffer_len)
    var p_compressed = Pointer[Bytef].alloc(len(compressed))
    var p_uncompressed_len = Pointer[uLong].alloc(1)

    memset_zero(p_uncompressed, buffer_len)
    memset_zero(p_uncompressed_len, 1)
    p_uncompressed_len[0] = buffer_len

    for i in range(len(compressed)):
        p_compressed.store(i, compressed[i])

    var Z_RES: Int32 = zlib_uncompress(
        p_uncompressed,
        p_uncompressed_len,
        p_compressed,
        len(compressed),
    )

    if not quiet:
        # _log_zlib_result(Z_RES, compressing=False)
        print("Uncompressed length: " + str(p_uncompressed_len[0]))
    # Can probably do something more efficient here with pointers, but eh.
    var res = List[UInt8]()
    for i in range(p_uncompressed_len[0]):
        res.append(p_uncompressed[i])

    p_uncompressed.free()
    p_compressed.free()
    p_uncompressed_len.free()
    return res


fn _undo_trivial(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    return current


fn _undo_sub(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    return current + left


fn _undo_up(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    return current + above


fn _undo_average(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    return current + (
        (above + left) >> 1
    )  # Bitshift is equivalent to division by 2


fn _undo_paeth(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    var paeth: Int16 = left + above - above_left
    var paeth_a: Int16 = abs(paeth - left)
    var paeth_b: Int16 = abs(paeth - above)
    var paeth_c: Int16 = abs(paeth - above_left)
    if (paeth_a <= paeth_b) and (paeth_a <= paeth_c):
        return current + left
    elif paeth_b <= paeth_c:
        return current + above
    else:
        return current + above_left


fn _undo_filter(
    filter_type: UInt8,
    current: UInt8,
    left: UInt8 = 0,
    above: UInt8 = 0,
    above_left: UInt8 = 0,
) raises -> UInt8:
    var current_int = current.cast[DType.int16]()
    var left_int = left.cast[DType.int16]()
    var above_int = above.cast[DType.int16]()
    var above_left_int = above_left.cast[DType.int16]()
    var result_int: Int16 = 0

    if filter_type == 0:
        result_int = _undo_trivial(
            current_int, left_int, above_int, above_left_int
        )
    elif filter_type == 1:
        result_int = _undo_sub(current_int, left_int, above_int, above_left_int)
    elif filter_type == 2:
        result_int = _undo_up(current_int, left_int, above_int, above_left_int)
    elif filter_type == 3:
        result_int = _undo_average(
            current_int, left_int, above_int, above_left_int
        )
    elif filter_type == 4:
        result_int = _undo_paeth(
            current_int, left_int, above_int, above_left_int
        )
    else:
        print("Unknown filter type", filter_type)
        raise Error("Unknown filter type")
    return result_int.cast[DType.uint8]()


fn load_png_as_tensor[
    type: DType = DType.uint8
](fpath: Path) raises -> Tensor[type]:
    var raw_data: List[UInt8]
    with open(fpath, "r") as image_file:
        raw_data = image_file.read_bytes()

    var read_head = 8

    var header_chunk = parse_next_chunk(raw_data, read_head)
    read_head = header_chunk.end

    var width = int(bytes_to_uint32_be(header_chunk.data[0:4])[0])
    var height = int(bytes_to_uint32_be(header_chunk.data[4:8])[0])
    var bit_depth = int(header_chunk.data[8].cast[DType.uint32]())
    var color_type = int(header_chunk.data[9])
    var compression_method = header_chunk.data[10].cast[DType.uint8]()
    var filter_method = header_chunk.data[11].cast[DType.uint8]()
    var interlaced = header_chunk.data[12].cast[DType.uint8]()

    assert_true(color_type == 2, "Only RGB images are supported")
    assert_true(bit_depth == 8, "Only 8-bit images are supported")
    assert_true(interlaced == 0, "Interlaced images are not supported")
    assert_true(compression_method == 0, "Compression method not supported")

    var color_type_dict = Dict[Int, Int]()
    color_type_dict[0] = 1
    color_type_dict[2] = 3
    color_type_dict[3] = 1
    color_type_dict[4] = 2
    color_type_dict[6] = 4

    var channels = color_type_dict[color_type]
    var pixel_size: Int = channels * (bit_depth // 8)

    # Scan over chunks until end found
    var ended = False
    var data_found = False
    var compressed_data = List[UInt8]()
    while read_head < len(raw_data) and not ended:
        var chunk = parse_next_chunk(raw_data, read_head)
        read_head = chunk.end

        if chunk.type == "IDAT":
            compressed_data.extend(chunk.data)
            data_found = True
        elif chunk.type == "IEND":
            ended = True

    assert_true(ended, "IEND chunk not found")
    assert_true(data_found, "IDAT chunk not found")
    var uncompressed_data = uncompress(
        compressed_data, width * height * pixel_size, False
    )

    ### CONVERT DATA TO TENSOR
    var spec = TensorSpec(DType.uint8, height, width, channels)
    var tensor_image = Tensor[DType.uint8](spec)

    # Initialize the previous scanline to 0
    var previous_result = List[UInt8](0 * width)

    for line in range(height):
        var line_len = width * pixel_size
        var offset = 1 + 1 * line + line * line_len
        var left: UInt8 = 0
        var above_left: UInt8 = 0

        var result = List[UInt8](capacity=width * pixel_size)
        # print("pulling out scanline", offset + line_len, len(uncompressed_data))
        var scanline = uncompressed_data[offset : offset + line_len]

        var filter_type = uncompressed_data[offset - 1]

        for i in range(len(scanline)):
            if i >= pixel_size:
                left = result[i - pixel_size]
                above_left = previous_result[i - pixel_size]

            result.append(
                _undo_filter(
                    filter_type,
                    uncompressed_data[i + offset],
                    left,
                    previous_result[i],
                    above_left,
                )
            )

        previous_result = result
        for x in range(width):
            for c in range(channels):
                tensor_image[Index(line, x, c)] = result[x * channels + c]

    print(
        "returning tensor", tensor_image.shape().num_elements()
    )  # <-- THIS PRINTS JUST FINE
    return tensor_image


def main():
    var img_tensor = load_png_as_tensor[](Path("random_image.png"))
    print(
        "image loaded", img_tensor.shape().num_elements()
    )  # <- NEVER PRINTS, CRASH HAPPENS BEFORE
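As an aside: the decompression step my `uncompress` wrapper performs over FFI is zlib's `uncompress()`, which is the same algorithm Python exposes as `zlib.decompress`, so the step can be sanity-checked outside Mojo with a quick round-trip:

```python
import zlib

# Round-trip sketch of the decompression the Mojo `uncompress` wrapper does
# via FFI: compress some stand-in scanline bytes, then decompress them back.
raw = bytes(range(256)) * 4
compressed = zlib.compress(raw)
assert zlib.decompress(compressed) == raw
```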

System information

- What OS did you install Mojo on?
Ubuntu 20.04.6 LTS

- Provide version information for Mojo by pasting the output of `mojo -v`
mojo 24.4.0 (59977802)

- Provide Modular CLI version by pasting the output of `modular -v`
modular 0.9.2 (b3079bd5)