Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

EOS-VM Optimized Compiler #7975

Merged
merged 28 commits into from
Sep 30, 2019
Merged

EOS-VM Optimized Compiler #7975

merged 28 commits into from
Sep 30, 2019

Conversation

spoonincode
Copy link
Contributor

@spoonincode spoonincode commented Sep 24, 2019

Change Description

EOS-VM Optimized Compiler is a high performance WASM runtime designed for operation inside of nodeos. It is a tier up runtime that works in conjunction with the configured baseline runtime (such as wabt or EOS-VM JIT). When enabled, actions on a contract are initially dispatched to the baseline runtime while EOS-VM Optimized Compiler does an optimized compilation in the background. Once this compilation is complete, actions on that contract will then be run with the optimized compiled code from then on. This optimized compile can take a handful of seconds but remember it is performed in the background so it doesn’t cause any hiccups. Optimized compilations are also saved to a new data file so that when restarting nodeos there is no need to recompile previously compiled contracts.

Usage

The option --eos-vm-oc-enable (or eos-vm-oc-enable = true in the config file) enables EOS-VM Optimized Compiler. There is no need to replay an existing chain. You can restart nodeos with or without this option at any time in the future without conflicts in the code cache file. The number of simultaneous background compiles is configured with the option eos-vm-oc-compile-threads which defaults to 1. More compile threads means EOS-VM Optimized Compiler can tier up more contracts simultaneously.

There is also an option eos-vm-oc-cache-size-mb that controls the maximum size of the cache. The default of 1GB is more than enough for any public EOSIO chain at the moment. Be wary of shrinking this too small as it can cause cache thrashing.

Consensus Changes

  • Consensus Changes

API Changes

  • API Changes

Documentation Additions

  • Documentation Additions

static constexpr uint64_t wasm_memory_size = eosio::chain::wasm_constraints::maximum_linear_memory;
static constexpr uint64_t intrinsic_count = boost::hana::length(intrinsic_table);
//warning: changing the following 3 params will invalidate existing PIC
static constexpr uint64_t mutable_global_size = 8u * eosio::chain::wasm_constraints::maximum_mutable_globals/4u;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spacing is slightly misleading. Multiplication and division are left-associative. This doesn't affect behavior since maximum_mutable_globals is small and a multiple of 4.

//prolouge + 33MB + 4GB fault buffer + 4096 addtional buffer for safety
static constexpr uint64_t total_memory_per_slice = memory_prologue_size + wasm_memory_size + UINT64_C(0x100000000) + UINT64_C(4096);

static constexpr uint64_t number_slices = wasm_memory_size/(64u*1024u)+1u;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64u*1024u == wasm_page_size

libraries/chain/webassembly/eos-vm-oc.cpp Show resolved Hide resolved
#include "IR/Module.h"

#pragma push_macro("N")
#undef N
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in this file uses N, you can just #undef it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm's headers do
Just commenting out the #undef N gives me errors like

In file included from /home/spoon/eos/src/libraries/chain/webassembly/eos-vm-oc/LLVMJIT.h:8:
In file included from /usr/include/llvm/IR/Module.h:27:
In file included from /usr/include/llvm/IR/Function.h:28:
In file included from /usr/include/llvm/IR/BasicBlock.h:23:
In file included from /usr/include/llvm/IR/Instruction.h:22:
In file included from /usr/include/llvm/IR/DebugLoc.h:18:
In file included from /usr/include/llvm/IR/TrackingMDRef.h:17:
/usr/include/llvm/IR/Metadata.h:1227:48: error: member initializer 'string_to_name' does not name a non-static data member or base class
  MDTupleTypedArrayWrapper(const MDTuple *N) : N(N) {}
                                               ^~~~
/home/spoon/eos/src/libraries/chain/include/eosio/chain/name.hpp:89:28: note: expanded from macro 'N'
#define N(X) eosio::chain::string_to_name(#X)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The #undef is definitely needed. It's the push and pop that are unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those cases where I prefer to leave things how I found them. Maybe some day it'll save someone a few minutes trying to figure out where N() is after LLVMJIT.h somehow gets included in their source file 3 layers deep

libraries/chain/webassembly/eos-vm-oc/executor.cpp Outdated Show resolved Hide resolved
libraries/chain/webassembly/eos-vm-oc/executor.cpp Outdated Show resolved Hide resolved

void executor::execute(const code_descriptor& code, const memory& mem, apply_context& context) {
if(mapping_is_executable == false) {
mprotect(code_mapping, code_mapping_size, PROT_EXEC|PROT_READ);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mprotect can fail for various reasons.


size_t sz = fc::raw::pack_size(message);
EOS_ASSERT(sz < max_message_size, misc_exception, "trying to send too large of message");
char buff[sz];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable length arrays are a C99 feature which is not standard C++.

fstat(fd, &st); //XXX
void* contents = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0); //XXX
memcpy(dst, contents, st.st_size);
munmap(contents, st.st_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just read?

static_assert(sizeof(control_block) <= wcb_allowance, "EOS-VM OC memory doesn't set aside enough memory for control block");

//round up the prologue to multiple of 4K page
static constexpr uint64_t memory_prologue_size = ((memory::wcb_allowance + mutable_global_size + table_size + intrinsic_count*UINT64_C(8))+UINT64_C(4095))/UINT64_C(4096)*UINT64_C(4096);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use sysconf for the page size instead of the constant of 4096 (_SC_PAGESIZE).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately querying _SC_PAGESIZE won't allow me to fold this in to a constexpr. is there anything else similar in a #define you're aware of?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sadly not. We could add a constexpr variable in eos-vm in the next release and in some initializer place an assert that the two values match. It won't give us any compile time protection, but it would at least ensure that the page size is correct with the system.

T* base = array_ptr_impl<T>((U32)ptr, length);
if ( reinterpret_cast<uintptr_t>(base) % alignof(T) != 0 ) {
EOSVMOC_MEMORY_PTR_cb_ptr;
std::vector<uint8_t>& copy = cb_ptr->bounce_buffers->emplace_back(length > 0 ? length*sizeof(T) : 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably fine for most compilers with strict aliasing, but uint8_t is not guaranteed to be a character type. Strict aliasing rule for c++ dictates that the type should be a char or unsigned char.

T* base = array_ptr_impl<T>((U32)ptr, length);
if ( reinterpret_cast<uintptr_t>(base) % alignof(T) != 0 ) {
EOSVMOC_MEMORY_PTR_cb_ptr;
std::vector<uint8_t>& copy = cb_ptr->bounce_buffers->emplace_back(length > 0 ? length*sizeof(T) : 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

char check;
check = p[(U32)ptr];
check = p[(U32)(ptr)+sizeof(T)-1u];
T &base = *(T*)(((char*)(cb_ptr->full_linear_memory_start))+(U32)ptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it breaks strict aliasing. I believe if you change the type of full_linear_memory_start from uintptr_t to char*, that should suffice.

char check;
check = p[(U32)ptr];
check = p[(U32)(ptr)+sizeof(T)-1u];
T &base = *(T*)(((char*)(cb_ptr->full_linear_memory_start))+(U32)ptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it breaks strict aliasing. I believe if you change the type of full_linear_memory_start from uintptr_t to char*, that should suffice.

EOS-VM OC needs to fork off its out of process compilation processes as early as possible. I don't want to do this, say, once a chain controller is initialized because there may already be open TCP sockets, keys loaded in memory, ASIO initialized, etc.

My first take at this was to use a static initializer but LLVM spewed weird errors (presumedly it also has some static initializers). Then as a placeholder I left a direct call as the first thing in main(), but this is annoying copy pasta that has to be placed in nodeos, unit_test, spec_tests, etc

This commit performs a little linker trick to execute the forking code right before main(). This means all static initializers are done when the fork occurs, the fork occurs before main(), and there is no need to put explicit code inside of main() for every user of libchain.
eosvmoc_optional_offset_or_import_t start;
unsigned apply_offset;
int starting_memory_pages;
unsigned initdata_prolouge_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misspelled prologue.

@@ -166,6 +205,11 @@ namespace eosio { namespace chain {
wasm_cache_index wasm_instantiation_cache;

const chainbase::database& db;
const wasm_interface::vm_type wasm_runtime_time;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/_time/_type/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is N/A after merging from develop


//as a double check that the control block pointer is what we expect, look for the magic
if(cb_in_main_segment->magic != signal_sentinel)
goto notus;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this were to trigger, there's no guarantee that cb_in_main_segment points to valid memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise for anything else that accesses cb_in_main_segment in here. Are you suggesting that it's a worthless check and might as well be removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presumed that the intent of this check is to determine whether gs is ours. Thus, the later checks should be safe provided this passes. The check here is not entirely worthless, but it isn't very reliable either.


}

boost::signals2::signal<void()> connection_dead_signal;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A signal seems like overkill. There's always exactly one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, there are some unittests that create two controllers ergo two wasm_runtimes ergo two connections

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure there can be more than one controller, but the connection_dead_signal is only used to erase the compile_monitor_session that it is part of. connect is only called once, right after the compile_monitor_session is emplaced into compile_monitor::_compile_sessions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah okay, I dunno that whole thing is just a pattern I typically use for asio server + asio clients so that I avoid shared_ptrs and client objects are completely decoupled

wrapped_fd memfd_for_vector(const std::vector<uint8_t>& bytes) {
int fd = syscall(SYS_memfd_create, "eosvmoc_code", MFD_CLOEXEC);
ftruncate(fd, bytes.size());
uint8_t* b = (uint8_t*)mmap(nullptr, bytes.size(), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XXX

wrapped_fd memfd_for_blob(const shared_blob& blob) {
int fd = syscall(SYS_memfd_create, "eosvmoc_code", MFD_CLOEXEC);
ftruncate(fd, blob.size());
uint8_t* b = (uint8_t*)mmap(nullptr, blob.size(), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XXX MAP_FAILED


mapsize = total_memory_per_slice*number_slices;
mapbase = (uint8_t*)mmap(nullptr, mapsize, PROT_NONE, MAP_PRIVATE|MAP_ANON, 0, 0);
FC_ASSERT(mapbase != MAP_FAILED, "Failed to mmap memory");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the other FC_ASSERTs do not unmap the memory that was already mapped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, you haven't mapped anything, yet here, but it does leak fd.

}

intrinsic::intrinsic(const char* n, const IR::FunctionType* t, void* f, size_t o) {
the_intrinsic_map().erase(n);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for n to already exist in the map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. First all intrinsics are registered and then the internals of EOS-VM OC come along and overwrite eosio_exit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so that's why you're setting the static initialization order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could use some explanation.


//a 0 GS value is an indicator an executor hasn't been active on this thread recently
uint64_t current_gs;
syscall(SYS_arch_prctl, ARCH_GET_GS, &current_gs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for using syscall over the glibc wrapper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That the docs don't allow arch_prctl & mprotect from a signal handling context. Yeah the docs don't allow syscall either but it's even less likely to do be touching some state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting errno is an issue, so I suppose I can live with this in the signal handler. You're using syscall in other places, too, though.


std::vector<wrapped_fd> fds_to_pass;
fds_to_pass.emplace_back(socks[1]);
fds_to_pass.emplace_back(cache_fd); //put cache_fd in to a wrapped_fd just temporarily, ick
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is especially bad, because it will close cache_fd if write_message_with_fds throws, but not if any other part of the function throws.

wrapped_fd socket_to_monitor_session(socks[0]);

std::vector<wrapped_fd> fds_to_pass;
fds_to_pass.emplace_back(socks[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can leak socks[1].

When originally testing a proof of concept for EOS-VM OC design, I validated that both Clang and GCC could operate on pointers to the GS segment easily. Unfortunately what I didn't realize was that GCC only can do this when compiling C - not C++ - and my proof of concept had used C. This wasn't detected until I was pretty far along with the implementation.

This commit refactors some things around so EOS-VM OC works with GCC.

Non-hot accesses to GS are funneled through a separate C file. Honestly this makes things pretty clunky because the control block has pointers to C++ classes and such. But, I feel like reducing the amount of inline asm bloat is preferred even if I have to #ifdef stuff around in there.

Hot access to GS (like intrinsic pointer validation) are replaced with in-line asm.

Also a number of magic constants are replaced with better constexpr values.
}

void code_cache_base::check_eviction_threshold(size_t free_bytes) {
if(free_bytes < _free_bytes_eviction_threshold) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the cache is fragmented due to many small wasms? Can we get in a situation where nothing is ever evicted because we have enough free bytes in total, but the wasms we want to execute don't fit in any free segment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe on a toofull message from the compile process, cache eviction should be more aggressive and disregard the threshold

local::datagram_protocol::socket response_socket(_ctx);
response_socket.assign(local::datagram_protocol(), socks[0]);
std::vector<wrapped_fd> fds_pass_to_trampoline;
fds_pass_to_trampoline.emplace_back(socks[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May leak socks[1].

#else
void* ctx;
void* eptr;
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better not to have the struct use different types for C vs. C++. You can add a typed (non-member) accessor if needed.

@swatanabe-b1
Copy link
Contributor

I've finished reviewing modulo whatever changes you make after this.

#include <eosio/chain/webassembly/eos-vm-oc/eos-vm-oc.h>

#ifdef __clang__
#define GS_PTR __attribute__((address_space(256)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the gs register stuff we should probably make it apparent that we can't compile with stack protector (gcc and clang both use the gs register for canaries). We should keep a close eye on adding any new compile commands added to eos in the future to ensure that no code is that uses the gs register. We should also keep an eye on Linux kernel versions when we change our supported OSes lists. I believe that currently Linux only uses gs for kernel stack, but they could possibly (although unlikely) make use it for user space needs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure GS segment is completely unused by Linux userland bits: TLS, SSP, all on FS. Arch actually has SSP enabled by default for gcc & clang and I haven't seen any problems; the asm I've seen in functions def uses FS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/* SPDX-License-Identifier: GPL-2.0 */
/*
 * GCC stack protector support.
 *
 * Stack protector works by putting predefined pattern at the start of
 * the stack frame and verifying that it hasn't been overwritten when
 * returning from the function.  The pattern is called stack canary
 * and unfortunately gcc requires it to be at a fixed offset from %gs.
 * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
 * and x86_32 use segment registers differently and thus handles this
 * requirement differently.
 *
 * On x86_64, %gs is shared by percpu area and stack canary.  All
 * percpu symbols are zero based and %gs points to the base of percpu
 * area.  The first occupant of the percpu area is always
 * fixed_percpu_data which contains stack_canary at offset 40.  Userland
 * %gs is always saved and restored on kernel entry and exit using
 * swapgs, so stack protector doesn't add any complexity there.
 *
 * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
 * used for userland TLS.  Unfortunately, some processors are much
 * slower at loading segment registers with different value when
 * entering and leaving the kernel, so the kernel uses %fs for percpu
 * area and manages %gs lazily so that %gs is switched only when
 * necessary, usually during task switch.
 *
 * As gcc requires the stack canary at %gs:20, %gs can't be managed
 * lazily if stack protector is enabled, so the kernel saves and
 * restores userland %gs on kernel entry and exit.  This behavior is
 * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
 * system.h to hide the details.
 */

Also, Clang ShadowCallStack uses the gs register for this same kind of purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your quoted blob is from kernel sources. Kernel has its own GS value; it even says so in that blob

Userland %gs is always saved and restored on kernel entry and exit using swapgs

If you compile with -fstack-protector-strong you'll see code like

00000000000011f0 <fooz>:
fooz():
    11f0:       41 54                   push   %r12
    11f2:       48 89 fe                mov    %rdi,%rsi
    11f5:       48 83 ec 50             sub    $0x50,%rsp
    11f9:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
...
    122c:       48 8b 44 24 48          mov    0x48(%rsp),%rax
    1231:       64 48 33 04 25 28 00    xor    %fs:0x28,%rax
    1238:       00 00 
    123a:       75 07                   jne    1243 <fooz+0x53>
    123c:       48 83 c4 50             add    $0x50,%rsp
    1240:       41 5c                   pop    %r12
    1242:       c3                      retq   
    1243:       e8 08 fe ff ff          callq  1050 <__stack_chk_fail@plt>

it's def using FS.

Having trouble finding applicably of ShadowCallStack, docs say it's ARM8 only
http://releases.llvm.org/9.0.0/tools/clang/docs/ShadowCallStack.html

};

struct code_compilation_result_message {
eosvmoc_optional_offset_or_import_t start;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a future release we should clean up these variable names, probably just ifdef __cplusplus using <name> = eosvmoc_<name>, it's very redundant for the c++ code.


struct code_compilation_result_message {
eosvmoc_optional_offset_or_import_t start;
unsigned apply_offset;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've used explicitly sized types every where else, why are these different?

// Cast the pointer to the appropriate type.
auto bytePointer = CreateInBoundsGEPWAR(irBuilder, moduleContext.defaultMemoryBase, byteIndex);

return irBuilder.CreatePointerCast(bytePointer,memoryType->getPointerTo(256));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isn't in the default address space, make 256 a constexpr named variable somewhere so that other uses don't have to use the literal and so that they have some context to them.

@spoonincode spoonincode marked this pull request as ready for review September 29, 2019 04:02
Copy link
Contributor

@larryk85 larryk85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far looks good. If none of your changes for tomorrow are contentious then I should approve and we can get this into develop.

this is not the intended way of deprecating old generated code -- that would be the codegen_version field. But this is a quick way to zap any code caches that were created by versions of EOS-VM OC before now. They could have had leaks/problems
@spoonincode spoonincode merged commit 3f0f353 into develop Sep 30, 2019
@spoonincode spoonincode deleted the eos-vm-oc branch September 30, 2019 03:22
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants