buffer not aligned #231

Closed
iacore opened this issue Apr 11, 2023 · 10 comments · Fixed by #235

iacore (Contributor) commented Apr 11, 2023

I converted the PyTorch checkpoint to safetensors, and the resulting buffer is not aligned.

RWKV-4-Pile-430M-20220808-8066.pth is from https://huggingface.co/BlinkDL/rwkv-4-pile-430m
The conversion script is here: https://github.com/iacore/rwkv-np/blob/main/convert.py

> xxd RWKV-4-Pile-430M-20220808-8066.safetensors | head -n 2
00000000: 66a7 0000 0000 0000 7b22 626c 6f63 6b73  f.......{"blocks
00000010: 2e30 2e61 7474 2e6b 6579 2e77 6569 6768  .0.att.key.weigh

The tensor data are all f32.

0xa766 % 4 == 2

Why it's not aligned: the tensor offsets are counted from the end of the metadata header (sized 0xa766 here), and that header size is not a multiple of 4.
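
Concretely (my own arithmetic from the dump above, not something stated in the spec): the tensor data begins at absolute file offset 8 + 0xa766 = 0xa76e, and 0xa76e % 4 == 2, so even a tensor whose relative offset is a multiple of 4 ends up at a misaligned address.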

Narsil (Collaborator) commented Apr 11, 2023

Which version are you using? Alignment was added in 0.3.0.

iacore (Contributor, Author) commented Apr 11, 2023

0.3.0

Narsil (Collaborator) commented Apr 13, 2023

Hey, I just took a look.

For this file: https://huggingface.co/BlinkDL/rwkv-4-pile-430m/blob/main/RWKV-4-Pile-430M-20220808-8066.pth

All tensor data are bfloat16, not f32, and the 2-byte alignment for bf16 is respected there, no?

iacore (Contributor, Author) commented Apr 13, 2023

The conversion script is here: https://github.com/iacore/rwkv-np/blob/main/convert.py

The file I used is the .safetensors one, which contains only float32 data. Please use this script to convert the model first.

iacore (Contributor, Author) commented Apr 19, 2023

The problem: offsets are calculated from the end of the header. If the header size is not a multiple of 4, then even when the offsets themselves are aligned, the actual memory addresses won't be.
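
A minimal check of just the length prefix (my own sketch; it only relies on the 8-byte little-endian header length described above, not on any safetensors API):

import struct

def data_start(path):
    # The first 8 bytes are the little-endian length of the JSON header;
    # tensor data starts immediately after the header.
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
    return 8 + n

print(data_start("RWKV-4-Pile-430M-20220808-8066.safetensors") % 4)  # prints 2 for this file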

Narsil (Collaborator) commented Apr 19, 2023

I'm not sure I understand. The header gets padded with empty spaces until the memory addresses are aligned.

The alignment of the offsets themselves doesn't matter. I could share a script to demonstrate the address alignment if you want.
I've been using this alignment regularly already to get true zero-copy, mostly on f32, so it's possible that other alignments have issues.
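
(A rough sketch of the padding I mean, as an illustration only, not the actual safetensors serialization code:)

import json

def serialize(header_dict, tensor_bytes, align=8):
    # Pad the JSON header with spaces so that the tensor data following
    # the 8-byte length prefix plus header starts on an aligned address.
    header = json.dumps(header_dict).encode("utf-8")
    header += b" " * ((-(8 + len(header))) % align)
    return len(header).to_bytes(8, "little") + header + tensor_bytes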

iacore (Contributor, Author) commented Apr 20, 2023

> I'm not sure I understand. The header gets padded with empty spaces until the memory addresses are aligned.
>
> The alignment of the offsets themselves doesn't matter. I could share a script to demonstrate the address alignment if you want. I've been using this alignment regularly already to get true zero-copy, mostly on f32, so it's possible that other alignments have issues.

I know it's possible to request alignment with mmap on Linux/POSIX. The problem is that a model created with the safetensors Python library doesn't load aligned through the safetensors Rust library.

Narsil (Collaborator) commented Apr 20, 2023

You can check this:

from huggingface_hub import hf_hub_download
import torch
from safetensors.torch import load_file, save_file

filename = hf_hub_download("BlinkDL/rwkv-4-pile-430m", filename="RWKV-4-Pile-430M-20220808-8066.pth")

weights = torch.load(filename, map_location="cpu")
save_file(weights, "out.safetensors")

import mmap
import torch
import json
import os
from huggingface_hub import hf_hub_download


def load_file(filename, device):
    with open(filename, mode="r", encoding="utf8") as file_obj:
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as m:
            header = m.read(8)
            n = int.from_bytes(header, "little")
            metadata_bytes = m.read(n)
            metadata = json.loads(metadata_bytes)

    size = os.stat(filename).st_size
    storage = torch.ByteStorage.from_file(filename, shared=False, size=size).untyped()
    offset = n + 8
    return {name: create_tensor(storage, info, offset) for name, info in metadata.items() if name != "__metadata__"}


DTYPES = {"F32": torch.float32, "BF16": torch.bfloat16}
ALIGNMENT = {torch.float32: 4, torch.bfloat16: 2}

device = "cpu"


def create_tensor(storage, info, offset):
    dtype = DTYPES[info["dtype"]]
    shape = info["shape"]
    start, stop = info["data_offsets"]
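    # Zero means this tensor's absolute byte offset in the file is aligned for its dtype.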
    print((start + offset) % ALIGNMENT[dtype])
    return torch.asarray(storage[start + offset : stop + offset], dtype=torch.uint8).view(dtype=dtype).reshape(shape)


weights = load_file("out.safetensors", device)

The loading is done in pure Python just so that you can mess with pointers easily.

The mmap'd base pointer is always page-aligned, so what's important to check is storage + offset. Does this work correctly?

iacore (Contributor, Author) commented Apr 20, 2023

Your example works correctly only by coincidence. Please try this; I only changed a few lines to convert the weights to f32.

from huggingface_hub import hf_hub_download
import torch
from safetensors.torch import load_file, save_file

filename = hf_hub_download("BlinkDL/rwkv-4-pile-430m", filename="RWKV-4-Pile-430M-20220808-8066.pth")

weights = torch.load(filename, map_location="cpu")

for k in weights.keys():
    weights[k] = weights[k].float() # convert to float32

save_file(weights, "out.safetensors")

import mmap
import torch
import json
import os
from huggingface_hub import hf_hub_download


def load_file(filename, device):
    with open(filename, mode="r", encoding="utf8") as file_obj:
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as m:
            header = m.read(8)
            n = int.from_bytes(header, "little")
            metadata_bytes = m.read(n)
            metadata = json.loads(metadata_bytes)

    size = os.stat(filename).st_size
    storage = torch.ByteStorage.from_file(filename, shared=False, size=size).untyped()
    offset = n + 8
    # print(n)
    return {name: create_tensor(storage, info, offset) for name, info in metadata.items() if name != "__metadata__"}


DTYPES = {"F32": torch.float32, "BF16": torch.bfloat16}
ALIGNMENT = {torch.float32: 4, torch.bfloat16: 2}

device = "cpu"


def create_tensor(storage, info, offset):
    dtype = DTYPES[info["dtype"]]
    shape = info["shape"]
    start, stop = info["data_offsets"]
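    # Zero means this tensor's absolute byte offset in the file is aligned for its dtype.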
    print((start + offset) % ALIGNMENT[dtype])
    return torch.asarray(storage[start + offset : stop + offset], dtype=torch.uint8).view(dtype=dtype).reshape(shape)


weights = load_file("out.safetensors", device)

Narsil (Collaborator) commented Apr 20, 2023

Indeed, that's pretty bad!

I created #235 to fix that.

I did some testing with various models on custom backends, and I guess I was just lucky.

davidar added a commit to davidar/eigenGPT that referenced this issue Jun 1, 2023