Bug description
Following along with this blog post, I'm trying to get basic PNG loading working in Mojo (I previously tried using cv2 directly from Mojo, but couldn't find a way to convert a numpy array to a tensor in a reasonable time).
The code uses FFI to call to zlib to decompress the PNG data chunk. This step works fine. Then the code walks through each "scanline" of the PNG, undoes the filtering and places the line into a newly initialized tensor of the appropriate size. This also seems to work okay, which I verified with a print statement right before the tensor is returned.
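For reference, the per-scanline reconstruction described above can be sketched in plain Python (this is my own illustrative sketch of the PNG filter types, not the Mojo code from the repro; the helper names are mine):

```python
def paeth(left: int, above: int, above_left: int) -> int:
    # Paeth predictor per the PNG spec: pick the neighbor closest
    # to the linear estimate left + above - above_left.
    p = left + above - above_left
    pa, pb, pc = abs(p - left), abs(p - above), abs(p - above_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return above_left

def unfilter_scanline(filter_type: int, line: bytes, prev: bytes, bpp: int) -> bytes:
    # Reconstruct one scanline given the previous (already reconstructed)
    # scanline and the bytes-per-pixel stride. All math is mod 256.
    out = bytearray()
    for i, cur in enumerate(line):
        left = out[i - bpp] if i >= bpp else 0
        above = prev[i]
        above_left = prev[i - bpp] if i >= bpp else 0
        if filter_type == 0:    recon = cur                              # None
        elif filter_type == 1:  recon = cur + left                       # Sub
        elif filter_type == 2:  recon = cur + above                      # Up
        elif filter_type == 3:  recon = cur + ((left + above) >> 1)      # Average
        elif filter_type == 4:  recon = cur + paeth(left, above, above_left)
        else:
            raise ValueError("unknown filter type")
        out.append(recon & 0xFF)
    return bytes(out)
```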
However, control never makes it back to the caller. Instead, somewhere between the return and the call the following trace is thrown:
LLVM ERROR: out of memory
Allocation failed
Please submit a bug report to https://github.com/modularml/mojo/issues and include the crash backtrace along with all the relevant source codes.
Stack dump:
0. Program arguments: mojo test.mojo
#0 0x000055db77683478 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x12a1478)
#1 0x000055db7768129e (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x129f29e)
#2 0x000055db77683b0d (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x12a1b0d)
#3 0x00007ff87377e420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
#4 0x00007ff87320900b raise /build/glibc-BHL3KM/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
#5 0x00007ff8731e8859 abort /build/glibc-BHL3KM/glibc-2.31/stdlib/abort.c:81:7
#6 0x000055db776206c8 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x123e6c8)
#7 0x000055db77620702 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x123e702)
#8 0x000055db78e1fb35 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x2a3db35)
#9 0x000055db79a72a82 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x3690a82)
#10 0x000055db79a72d58 (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x3690d58)
#11 0x000055db79a710ed (/home/karaqst/.modular/pkg/packages.modular.com_max/bin/mojo+0x368f0ed)
#12 0x00007ff8631deead KGEN_CompilerRT_AlignedAlloc (/home/karaqst/.modular/pkg/packages.modular.com_max/lib/libKGENCompilerRTShared.so.19.0git+0x3cead)
#13 0x00007ff81c0154bf
mojo crashed!
Please file a bug report.
Aborted (core dumped)
I've posted a simplified version of the code below. Originally the code crashed on 1024*1024 PNGs, but I also tried generating PNGs as small as 10x10 and I still run out of memory...
This could of course be something I'm doing wrong. I'm still learning about how Mojo works, but this one just seemed a bit strange to me.
Thanks in advance!
Steps to reproduce
The following code should hopefully be complete—see the very bottom for where the crash happens. The only requirements are zlib and a very simple PNG file, which can be generated with the following python code:
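The original Python snippet was not captured here; a minimal stand-in that writes a tiny RGB test PNG using only the standard library might look like this (the file name `random_image.png` matches the Mojo code below; the pixel values are my own arbitrary choices):

```python
import struct
import zlib

def chunk(tag: bytes, payload: bytes) -> bytes:
    # length + tag + payload + CRC32 over tag and payload, big-endian
    return (struct.pack(">I", len(payload)) + tag + payload
            + struct.pack(">I", zlib.crc32(tag + payload)))

width = height = 10
# IHDR: 8-bit depth, color type 2 (RGB), deflate, filter method 0, no interlace
ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
rows = []
for y in range(height):
    row = bytearray([0])  # filter type 0 (None) for every scanline
    for x in range(width):
        row += bytes(((x * 25) % 256, (y * 25) % 256, 128))
    rows.append(bytes(row))
raw = b"".join(rows)
png = (b"\x89PNG\r\n\x1a\n"
       + chunk(b"IHDR", ihdr)
       + chunk(b"IDAT", zlib.compress(raw))
       + chunk(b"IEND", b""))
with open("random_image.png", "wb") as f:
    f.write(png)
```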
from sys import ffi
from tensor import Tensor, TensorSpec, TensorShape
from utils.index import Index
from pathlib import Path
from testing import assert_true
from bit import byte_swap, bit_reverse
from algorithm import vectorize
from sys.info import simdwidthof
alias Bytef = Scalar[DType.uint8]
alias uLong = UInt64
alias zlib_type = fn (
    _out: Pointer[Bytef],
    _out_len: Pointer[UInt64],
    _in: Pointer[Bytef],
    _in_len: uLong,
) -> Int
alias simd_width = simdwidthof[DType.uint32]()
struct Chunk(Movable, Copyable):
    """A struct representing a PNG chunk."""

    var length: UInt32
    """The length of the chunk (in bytes)."""
    var type: String
    """The type of the chunk."""
    var data: List[UInt8]
    """The data contained in the chunk."""
    var crc: UInt32
    """The CRC32 checksum of the chunk."""
    var end: Int
    """The position in the data list where the chunk ends."""

    fn __init__(
        inout self,
        length: UInt32,
        chunk_type: String,
        data: List[UInt8],
        crc: UInt32,
        end: Int,
    ):
        """Initializes a new Chunk struct.

        Args:
            length: The length of the chunk.
            chunk_type: The type of the chunk.
            data: The data contained in the chunk.
            crc: The CRC32 checksum of the chunk.
            end: The position in the data list where the chunk ends.
        """
        self.length = length
        self.type = chunk_type
        self.data = data
        self.crc = crc
        self.end = end

    fn __moveinit__(inout self, owned existing: Chunk):
        """Move data of an existing Chunk into a new one.

        Args:
            existing: The existing Chunk.
        """
        self.length = existing.length
        self.type = existing.type
        self.data = existing.data
        self.crc = existing.crc
        self.end = existing.end

    fn __copyinit__(inout self, existing: Chunk):
        """Copy constructor for the Chunk struct.

        Args:
            existing: The existing struct to copy from.
        """
        self.length = existing.length
        self.type = existing.type
        self.data = existing.data
        self.crc = existing.crc
        self.end = existing.end
fn bytes_to_uint32_be(owned list: List[UInt8]) raises -> List[UInt32]:
    """Converts a list of bytes into a list of UInt32s.

    Args:
        list: The List of bytes.

    Returns:
        Input data translated to a List of UInt32.

    Raises:
        ValueError: If the length of the input list is not a multiple of 4.
    """
    assert_true(
        len(list) % 4 == 0,
        "List[UInt8] length must be a multiple of 4 to convert to List[UInt32]",
    )
    var result_length = len(list) // 4

    # Get the data pointer with ownership. This avoids copying and makes sure
    # only one List owns a pointer to the underlying address.
    var ptr_to_uint8 = list.steal_data()
    var ptr_to_uint32 = ptr_to_uint8.bitcast[UInt32]()
    var dtype_ptr = DTypePointer[DType.uint32](
        Pointer[UInt32](ptr_to_uint32.address)
    )

    # vectorize byte_swap over DTypePointer
    @parameter
    fn _byte_swap[_width: Int](i: Int):
        # call byte_swap on a batch of UInt32 values
        var bit_swapped = byte_swap(dtype_ptr.load[width=_width](i))
        # We are updating in place; both ptr_to_uint32 and dtype_ptr share
        # the same addresses
        dtype_ptr.store[width=_width](i, bit_swapped)

    # swap the bytes in each UInt32 to convert from big-endian to little-endian
    vectorize[_byte_swap, simd_width](result_length)
    return List[UInt32](
        unsafe_pointer=ptr_to_uint32, size=result_length, capacity=result_length
    )
fn bytes_to_string(list: List[UInt8]) -> String:
    """Converts a list of bytes to a string.

    Args:
        list: The List of bytes.

    Returns:
        The String representation of the bytes.
    """
    var word = String("")
    for letter in list:
        word += chr(int(letter[].cast[DType.uint8]()))
    return word
fn CRC32(
    data: List[SIMD[DType.uint8, 1]],
    value: SIMD[DType.uint32, 1] = 0xFFFFFFFF,
) -> SIMD[DType.uint32, 1]:
    """Calculate the CRC32 value for a given list of bytes.

    Args:
        data: The list of bytes for which to calculate the CRC32 value.
        value: The initial value of the CRC32 calculation.

    Returns:
        The CRC32 value for the given list of bytes.
    """
    var crc32 = value
    for byte in data:
        crc32 = (bit_reverse(byte[]).cast[DType.uint32]() << 24) ^ crc32
        for _ in range(8):
            if crc32 & 0x80000000 != 0:
                crc32 = (crc32 << 1) ^ 0x04C11DB7
            else:
                crc32 = crc32 << 1
    return bit_reverse(crc32 ^ 0xFFFFFFFF)
def parse_next_chunk(data: List[UInt8], read_head: Int) -> Chunk:
    """Parses the chunk starting at read head.

    Args:
        data: A list containing the raw data in the PNG file.
        read_head: The position in the data list to start reading the chunk.

    Returns:
        A Chunk struct containing the information contained in the chunk
        starting at read head.
    """
    chunk_length = bytes_to_uint32_be(data[read_head : read_head + 4])[0]
    chunk_type = bytes_to_string(data[read_head + 4 : read_head + 8])
    start_data = int(read_head + 8)
    end_data = int(start_data + chunk_length)
    chunk_data = data[start_data:end_data]
    start_crc = int(end_data)
    end_crc = int(start_crc + 4)
    chunk_crc = bytes_to_uint32_be(data[start_crc:end_crc])[0]
    # Check CRC
    assert_true(
        CRC32(data[read_head + 4 : end_data]) == chunk_crc,
        "CRC32 does not match",
    )
    return Chunk(
        length=chunk_length,
        chunk_type=chunk_type,
        data=chunk_data,
        crc=chunk_crc,
        end=end_crc,
    )
fn uncompress(
    compressed: List[UInt8], uncompressed_len: Int, quiet: Bool = True
) raises -> List[UInt8]:
    var handle = ffi.DLHandle("")
    var zlib_uncompress = handle.get_function[zlib_type]("uncompress")
    # TODO(Dean) how much extra room is needed?
    var buffer_len = uncompressed_len + 2000
    var p_uncompressed = Pointer[Bytef].alloc(buffer_len)
    var p_compressed = Pointer[Bytef].alloc(len(compressed))
    var p_uncompressed_len = Pointer[uLong].alloc(1)
    memset_zero(p_uncompressed, buffer_len)
    memset_zero(p_uncompressed_len, 1)
    p_uncompressed_len[0] = buffer_len
    for i in range(len(compressed)):
        p_compressed.store(i, compressed[i])
    var Z_RES: Int32 = zlib_uncompress(
        p_uncompressed,
        p_uncompressed_len,
        p_compressed,
        len(compressed),
    )
    if not quiet:
        # _log_zlib_result(Z_RES, compressing=False)
        print("Uncompressed length: " + str(p_uncompressed_len[0]))
    # Can probably do something more efficient here with pointers, but eh.
    var res = List[UInt8]()
    for i in range(p_uncompressed_len[0]):
        res.append(p_uncompressed[i])
    p_uncompressed.free()
    p_compressed.free()
    p_uncompressed_len.free()
    return res
fn _undo_trivial(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    return current

fn _undo_sub(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    return current + left

fn _undo_up(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    return current + above

fn _undo_average(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    # Bitshift is equivalent to division by 2
    return current + ((above + left) >> 1)

fn _undo_paeth(
    current: Int16, left: Int16 = 0, above: Int16 = 0, above_left: Int16 = 0
) -> Int16:
    var paeth: Int16 = left + above - above_left
    var paeth_a: Int16 = abs(paeth - left)
    var paeth_b: Int16 = abs(paeth - above)
    var paeth_c: Int16 = abs(paeth - above_left)
    if (paeth_a <= paeth_b) and (paeth_a <= paeth_c):
        return current + left
    elif paeth_b <= paeth_c:
        return current + above
    else:
        return current + above_left
fn _undo_filter(
    filter_type: UInt8,
    current: UInt8,
    left: UInt8 = 0,
    above: UInt8 = 0,
    above_left: UInt8 = 0,
) raises -> UInt8:
    var current_int = current.cast[DType.int16]()
    var left_int = left.cast[DType.int16]()
    var above_int = above.cast[DType.int16]()
    var above_left_int = above_left.cast[DType.int16]()
    var result_int: Int16 = 0
    if filter_type == 0:
        result_int = _undo_trivial(
            current_int, left_int, above_int, above_left_int
        )
    elif filter_type == 1:
        result_int = _undo_sub(current_int, left_int, above_int, above_left_int)
    elif filter_type == 2:
        result_int = _undo_up(current_int, left_int, above_int, above_left_int)
    elif filter_type == 3:
        result_int = _undo_average(
            current_int, left_int, above_int, above_left_int
        )
    elif filter_type == 4:
        result_int = _undo_paeth(
            current_int, left_int, above_int, above_left_int
        )
    else:
        print("Unknown filter type", filter_type)
        raise Error("Unknown filter type")
    return result_int.cast[DType.uint8]()
fn load_png_as_tensor[
    type: DType = DType.uint8
](fpath: Path) raises -> Tensor[type]:
    var raw_data: List[UInt8]
    with open(fpath, "r") as image_file:
        raw_data = image_file.read_bytes()
    var read_head = 8
    var header_chunk = parse_next_chunk(raw_data, read_head)
    read_head = header_chunk.end
    var width = int(bytes_to_uint32_be(header_chunk.data[0:4])[0])
    var height = int(bytes_to_uint32_be(header_chunk.data[4:8])[0])
    var bit_depth = int(header_chunk.data[8].cast[DType.uint32]())
    var color_type = int(header_chunk.data[9])
    var compression_method = header_chunk.data[10].cast[DType.uint8]()
    var filter_method = header_chunk.data[11].cast[DType.uint8]()
    var interlaced = header_chunk.data[12].cast[DType.uint8]()
    assert_true(color_type == 2, "Only RGB images are supported")
    assert_true(bit_depth == 8, "Only 8-bit images are supported")
    assert_true(interlaced == 0, "Interlaced images are not supported")
    assert_true(compression_method == 0, "Compression method not supported")
    var color_type_dict = Dict[Int, Int]()
    color_type_dict[0] = 1
    color_type_dict[2] = 3
    color_type_dict[3] = 1
    color_type_dict[4] = 2
    color_type_dict[6] = 4
    var channels = color_type_dict[color_type]
    var pixel_size: Int = channels * (bit_depth // 8)

    # Scan over chunks until end found
    var ended = False
    var data_found = False
    var compressed_data = List[UInt8]()
    while read_head < len(raw_data) and not ended:
        var chunk = parse_next_chunk(raw_data, read_head)
        read_head = chunk.end
        if chunk.type == "IDAT":
            compressed_data.extend(chunk.data)
            data_found = True
        elif chunk.type == "IEND":
            ended = True
    assert_true(ended, "IEND chunk not found")
    assert_true(data_found, "IDAT chunk not found")
    var uncompressed_data = uncompress(
        compressed_data, width * height * pixel_size, False
    )

    ### CONVERT DATA TO TENSOR
    var spec = TensorSpec(DType.uint8, height, width, channels)
    var tensor_image = Tensor[DType.uint8](spec)
    # Initialize the previous scanline to 0
    var previous_result = List[UInt8](0 * width)
    for line in range(height):
        var line_len = width * pixel_size
        var offset = 1 + 1 * line + line * line_len
        var left: UInt8 = 0
        var above_left: UInt8 = 0
        var result = List[UInt8](capacity=width * pixel_size)
        # print("pulling out scanline", offset + line_len, len(uncompressed_data))
        var scanline = uncompressed_data[offset : offset + line_len]
        var filter_type = uncompressed_data[offset - 1]
        for i in range(len(scanline)):
            if i >= pixel_size:
                left = result[i - pixel_size]
                above_left = previous_result[i - pixel_size]
            result.append(
                _undo_filter(
                    filter_type,
                    uncompressed_data[i + offset],
                    left,
                    previous_result[i],
                    above_left,
                )
            )
        previous_result = result
        for x in range(width):
            for c in range(channels):
                tensor_image[Index(line, x, c)] = result[x * channels + c]
    print(
        "returning tensor", tensor_image.shape().num_elements()
    )  # <-- THIS PRINTS JUST FINE
    return tensor_image
def main():
    var img_tensor = load_png_as_tensor[](Path("random_image.png"))
    print(
        "image loaded", img_tensor.shape().num_elements()
    )  # <- NEVER PRINTS, CRASH HAPPENS BEFORE
System information
- What OS did you install Mojo on?
Ubuntu 20.04.6 LTS
- Provide version information for Mojo by pasting the output of `mojo -v`
mojo 24.4.0 (59977802)
- Provide Modular CLI version by pasting the output of `modular -v`
modular 0.9.2 (b3079bd5)