-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MMAP for Windows (not working atm) #341
Conversation
Also I think it's time to add commentaries for the code (mmap specifically), as it already becomes stupidly complicated and has a lot of magic constants. It simply becomes harder to contribute and maintain. |
Originally posted by @anzz1 in #278 (comment) I do not think replacing the default memory allocation routines is the correct way forward. If sharing memory between processes were to be implemented, it should be made through a simple C api outside the main program and not deeply baked inside the main program which makes it hard to maintain and remove if the functionality is not wanted. Any such functionality which strays towards complication and less portability should be implemented as an easy to remove module rather than baking it in. |
You mean something like importing https://github.com/alitrack/mman-win32 ? |
Nope, quite the opposite, steering clear of any non-portable code, imported libraries or dependencies inside the main program, and have any functionality like this rather implemented as a module which would live outside the main program. Being easily separated from the main code and used (or not used) by adding a compiler flag and including the module's .cpp file in the project. Explained in more detail in here: #23 (comment) |
cmake script is broken, no longer detects msvc compiler |
@anzz1 This PR is not about multiprocessing or sharing memory but rather accelerating the loading of the model via a memory mapped file (see #91 for more details). Though I do agree with your point that all the fancy stuff should be kept outside of main to make everythin portable/maintainable. |
@nicknitewolf I looked into mman-win32 and tried to use the source code. Unfortunately it doesn't work. It is pretty similar to what @jart had already written but breaks without Justine's tweaks. I tried to adapt it to match the implementations, but no luck. Sadly I'm out of options at this moment, not enough expertise for this kind of stuff. Unless there will be a person knowing mmap on win who can fix this, the only option is to fall back to default load on Windows. |
Yes, but the goal of accelerating the loading via a memory-mapped file is the ability to preload a model to memory, thus sharing it. Let's not argue about semantics :D In any case, a C-style API is now being implemented: #370 |
I have created my own library that implements mmap using mingw32 that makes this project maintainable for windows. It is possible to compile the program using library from https://github.com/CoderRC/libmingw32_extended, make changes like in #564 and the specific make command below: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good evening! I'm back on the job. I'm on a Windows computer. I've just installed MSVC 2022.
Wow! This was so much more than I anticipated, when I suggested you contribute this. I'm really happy with what I'm seeing. I'm going to run your code locally and see how close it is to being in a working state. If it's close, then I would feel comfortable merging this as you've written it, and then pushing a quick change addressing any regressions on Linux / etc. just in case anyone's tracking our mmap
branch.
Then, depending on how quickly I can confirm MSVC is working properly, I'll create a separate pull request, to merge mmap
into master
.
If you're working this evening, then please feel free to push away, addressing the comments below, before I merge.
Also I think it's time to add commentaries for the code (mmap specifically), as it already becomes stupidly complicated and has a lot of magic constants. It simply becomes harder to contribute and maintain.
Magic numbers don't need to be maintained. WIN32 has a strong API stability promise. My goal with mmap.h has been to tuck all that stuff away, hidden within a magical header, that polyfills standard-looking POSIX code, so that WIN32 ideally needn't concern anyone in the future who works on this project. I can see from this change you get that, and you've done a good job improving upon it.
}; | ||
|
||
static void winMSync(magic* addr, size_t len_bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice! Having msync() on WIN32 might not have been strictly necessary. I mostly added it because POSIX has rules about needing msync() which mostly only apply to less common UNIX platforms (for example, OpenBSD lacks read() + memory coherency) but this is really nice to have, especially since it has the error reporting code.
@@ -182,9 +227,9 @@ void *memalign(size_t a, size_t n) { | |||
i = i + sizeof(size_t); | |||
i = ROUNDUP(i, a); | |||
j = ROUNDUP(i + m, MAGIC_GRAN); | |||
if (j > mag->capacity) { | |||
//if (j > mag->capacity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line surprises me and I suppose it relates to why this is still a draft. Since wouldn't this cause us to blow away and recreate mappings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, most likely it is the draft code. I just was trying out different things depending on my intuition of what's happening, not sure how it got into PR
mag->offset = i + m; | ||
spin_unlock(mag->lock); | ||
p = MAGIC_ADDR + i; | ||
((size_t *)p)[-1] = n; | ||
return p; | ||
} | ||
|
||
void *malloc(size_t n) { | ||
void *_malloc(size_t n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah shucks! MSVC won't let us override its own malloc implementation in the linkage process? That's important to have since C++ STL depends on libc malloc(). Without this, we'd need to explicitly override the STL allocators. I'm going to see if I can work some magic here persuading the compiler to restore this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that actually was the first thing I was trying to figure out. Haven't checked it out yet, but maybe we could make MSVC not link against binaries defining malloc? Not sure if it's possible though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a working solution based off this. The trick I used is to override operator new
and then, to avoid changing ggml.c
I supply a custom allocated array to its initializer.
#if defined(malloc) | ||
# undef malloc | ||
#endif | ||
#define malloc(x) _malloc(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With something like this, you could probably get away with moving these to the top of the file if you pulled the trick
#define malloc(...) _malloc(__VA_ARGS__)
That way you could avoid having to rename the symbols above.
static int WinMadvise(char* fd, size_t length, int flags) { | ||
auto p_handle = GetCurrentProcess(); | ||
struct _WIN32_MEMORY_RANGE_ENTRY entry((void*)fd, length); | ||
bool success = PrefetchVirtualMemory(p_handle, 1, &entry, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, nice.
LPVOID lpMsgBuf; | ||
LPVOID lpDisplayBuf; | ||
DWORD error_code = GetLastError(); | ||
FormatMessage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll probably want to create a helper function for this error code.
LocalSize(lpDisplayBuf) / sizeof(TCHAR), | ||
TEXT("failed with error %d: %s"), | ||
error_code, lpMsgBuf); | ||
fprintf(stderr, (char*)lpDisplayBuf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
W.r.t. fprintf, I'd say if you're already going to the trouble of pulling out all of the above WIN32 functions, then just go for the gold and use WriteFile(GetStdHandle(STD_ERROR_HANDLE))
.
[side commentary] In my work on Cosmopolitan Libc, one thing I love doing for instance, is ensuring that Cosmo never depends on the MSVC Libc, since it has a history of bundling things like telemetry, plus there's like 10 different Microsoft Libc's to choose from. But linking just KERNEL32 is awesome when it's possible, which is what Cosmo apps do.
*.msp | ||
|
||
# JetBrains Rider | ||
*.sln.iml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does all of this stuff need to be in the gitignore? (eg I see ignores for F# which this project definitely does not use). Would it be possible to limit it to just the things that are actually likely to appear in people’s worktrees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'm about to merge this and push my fixes right afterwards to our mmap dev branch. I'll do my best to trim this down in the follow-up commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion incorporated into cbddf46. I don't think we needed to change .gitignore at all. At least not with MSVC 2022 using CMake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is a WIP but it's going into a dev branch. I think this is worth merging. I'll push the fixes I found to the problems we encountered right afterwards. Thank you!
mag->offset = i + m; | ||
spin_unlock(mag->lock); | ||
p = MAGIC_ADDR + i; | ||
((size_t *)p)[-1] = n; | ||
return p; | ||
} | ||
|
||
void *malloc(size_t n) { | ||
void *_malloc(size_t n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a working solution based off this. The trick I used is to override operator new
and then, to avoid changing ggml.c
I supply a custom allocated array to its initializer.
*.msp | ||
|
||
# JetBrains Rider | ||
*.sln.iml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'm about to merge this and push my fixes right afterwards to our mmap dev branch. I'll do my best to trim this down in the follow-up commit.
Still not fully working yet. Closes #341
I've cherry-picked this pull request onto the pushed I've just made to the |
The code in PR lets you run llama the first time, but the second time the program crashes. This is due to memory access violation when trying to access any data related to the model or vocab. I don't know why, but current implementation cannot serialize pointers in the
magic
properly. I suspect this is due to allocating them usingnew
operator which generates addresses outside of mapped memory range, guess on the second run those pointers are simply not interpretable and point to nowhere.Major changes:
malloc
renamed to_malloc
. Otherwise linker is complaining when linking against CRT as there are multiple definitions formalloc
. I tried to doundef malloc
to remove references for the original one, but did not verify yet if it makes any difference.access
flag to FILE_MAP_COPY when loading second time. See https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-mapviewoffile.Some of the fixes to make it compile include:
#define NOMINMAX
stat
structure for win.MS_ASYNC
for win.This PR code will likely not compile on Linux right now (and it's also quite crappy as I did rapid changes just to make it compile at least). So if there are people using Visual Studio, I hope this PR will lay some foundation for further development.