Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smarter state serialization #225

Closed
pgoodman opened this issue May 2, 2017 · 5 comments
Closed

Smarter state serialization #225

pgoodman opened this issue May 2, 2017 · 5 comments

Comments

@pgoodman
Copy link
Member

pgoodman commented May 2, 2017

I think Manticore should take the FrankenPSE approach to saving states, it's much more disk friendly.

Suppose you get a symbolic EIP, and there's 10 possible concretizations for it. In FrankenPSE, I'd serialize a pre-fork state file, make 10 post-fork hardlinks to it, delete the pre-fork state file, then in memory store the assertion condition to go from a pre-fork to a post-fork state.

It ended up drastically reducing required disk space, and required I/O at each fork and also benefits from more reuse in the kernel's page cache. So you'd have 10 names for identical files sharing the same underlying data in the kernel's page cache / on disk, and in memory, where you already need to maintain the list of states, you have these little closures that can assert a simple condition, e.g. assert that cpu.EIP == 0xf00, storing these in such a way that they don't hold references to the state/cpu/etc objects.

Here's the relevant code, copypasta'd from FrankenPSE. I don't have time to adapt it to Manticore, someone else might ;-)

    def _state_filename(self, co):
        return os.path.join(self.workspace, 'state_{}'.format(co))

    def _do_save_state(self, state, co):
        file_size = 0
        source_name = self._state_filename(state.co)

        # No state file, or stale state file; (re)create it.
        if not os.path.exists(source_name) or int(os.path.getctime(source_name)) < START_TIME:
            if 'sat' != state.solver.check():
                raise SolverException("Can't sat before fork.")

            state.event_visitor.visit_exe_fork()
            with open(source_name, 'wb') as f:
                pickle.dump(state, f, pickle.HIGHEST_PROTOCOL)
                file_size = f.tell()
                self.state_storage_size[state.co] = file_size
            self.total_used_storage[file_size] += 1
        else:
            file_size = self.state_storage_size[state.co]

        dest_name = self._state_filename(co)
        if os.path.exists(dest_name):
            os.remove(dest_name)

        os.link(source_name, dest_name)
        self.total_used_storage[file_size] += 1
        self.state_storage_size[co] = file_size
        return file_size, dest_name

    def save_state(self, state, next_pc=None, transition_func=null_transition):
        if not self._can_store_more():
            logger.info("Maximum state storage budget exceeded. Ignoring state.")
            return False

        cpu = state.current
        is_covered = True
        branch_info = (0, 0, 0)
        if next_pc is None:
            next_pc = cpu.read_register('PC')
            assert not isinstance(next_pc, Symbol)
        else:
            state.num_symbolic_branches += 1
            
            last_branch_block_pc = state.concretize(cpu.branch_info[0], 'ALL')[0]
            block_pc_of_branch = state.concretize(cpu.branch_info[1], 'ALL')[0]
            branch_target_block_pc = next_pc
            cpu.branch_info = (last_branch_block_pc, block_pc_of_branch, cpu.branch_info[2])

            branch_info = (last_branch_block_pc, block_pc_of_branch, branch_target_block_pc)
            logger.info("Saving state for path {:08x} -> {:08x} -> {:08x}".format(*branch_info))
            is_covered = self.coverage.is_covered(*branch_info)

        try:
            co = next_state_id()
            file_size, file_name = self._do_save_state(state, co)
        except SolverException:
            logger.info("Can't sat before fork.")
            return True
        except StopExecutionException:
            return True

        logger.info("Saved state {} (derived from {}) to file {}".format(co, state.co, file_name))

        num_recoveries = 0
        for process in state.processes():
            if process:
                num_recoveries += process.mem.num_recoveries

        new_state = {
            'num_received_bytes' : state.os.num_received_bytes,
            'num_transmitted_bytes': state.os.num_transmitted_bytes,
            'num_symbolic_branches': state.num_symbolic_branches,
            'num_syscalls': state.os.num_syscalls,
            'num_instructions': sum(state.visited.values()),
            'num_unique_instructions': len(state.visited),
            'file_size': file_size,
            'co': co,
            'source_co': state.co,
            'transition_func': transition_func,
            'next_pc': next_pc,
            'num_recoveries': num_recoveries,
            'event_score': state.event_visitor.visit_exe_score(),
        }

        if not state.current.mem.is_executable(next_pc):
            self.error_states.append(new_state)
        elif num_recoveries:
            self.recovered_states.append(new_state)

        elif not is_covered:
            logger.info("!!! Found new coverage state {:08x} -> {:08x} -> {:08x}".format(
                *branch_info))
            self.new_coverage_states.append(new_state)
        else:
            self.states.append(new_state)

        #logger.info("Adding state %s to processing list. State list size: %d", co, len(self.states))
        return True

    def _make_reg_transition(self, reg_name, symbol, val):
        def transition(state):
            logger.debug("Concretizing {} = 0x{:02x} in state {}".format(reg_name, val, state.co))
            state.current.write_register(reg_name, val)
            if not state.event_visitor.visit_exe_assert_reg(reg_name, symbol, val):
                raise SolverException("Cannot concretize {} = {:02x} at PC {:08x}".format(
                    reg_name, val, state.current.read_register('PC')))
            return state
        return transition

    MEMORY_SIZE_NAME = {8: "BYTE", 16: "WORD", 32: "DWORD", 64: "QWORD", 128: "XMMWORD", 256: "YMMWORD"}

    def _make_mem_transition(self, addr, size, symbol, val):
        def transition(state):
            word = Executor.MEMORY_SIZE_NAME[size]
            logger.debug("Concretizing {} PTR [{:08x}] = 0x{:02x} in state {}".format(word, addr, val, state.co))
            state.solver.add(symbol == val)
            state.current.store(addr, val, size)
            if 'sat' != state.solver.check():
                raise SolverException("Cannot concretize {} PTR [{:08x}] = {:02x} at PC {:08x}".format(
                    word, addr, val, state.current.read_register('PC')))
            return state
        return transition

...

    def _concretize_register(self, reg_name, policy):

        set_next_pc = reg_name in ('PC', 'EIP', 'RIP')
        current_state = self.current_state
        self.current_state = None

        symbol = current_state.current.read_register(reg_name)
        get_vals = lambda: current_state.concretize(symbol, policy)
        try:
            vals = current_state.event_visitor.visit_exe_concretize_reg(
                reg_name, symbol, get_vals)
        except StopExecutionException:
            logger.info("Not concretizing register {} with policy {}".format(
                reg_name, policy))
            return

        if 1 == len(vals) and not len(self.states):
            transition_func = self._make_reg_transition(
                reg_name, symbol, vals[0])
            self.current_state = transition_func(current_state)
        else:
            for val in vals:
                next_pc = None
                if set_next_pc:
                    next_pc = val

                self.save_state(
                    current_state,
                    next_pc=next_pc,
                    transition_func=self._make_reg_transition(
                        reg_name, symbol, val))

    def _concretize_memory(self, address, size, policy):
        current_state = self.current_state
        self.current_state = None

        symbol = current_state.current.load(address, size)
        get_vals = lambda: current_state.concretize(symbol, policy)
        try:
            vals = current_state.event_visitor.visit_exe_concretize_mem(
                address, size, symbol, get_vals)
        except StopExecutionException:
            logger.info("Not concretizing memory {:08x} with policy {}".format(
                address, policy))
            return

        if 1 == len(vals) and not len(self.states):
            transition_func = self._make_mem_transition(
                address, size, symbol, vals[0])
            self.current_state = transition_func(current_state)
        else:
            for val in vals:
                self.save_state(
                    current_state,
                    transition_func=self._make_mem_transition(
                        address, size, symbol, val))

...

    def _run(self):
        global MAX_ICOUNT

        logger.info("Starting cgc-symbolic-emulator main loop.")

        while self.current_state or self._num_pending_states():
            try:
                self._try_execute_instructions()

            except ConcretizeRegister as e:
                self._concretize_register(e.reg_name, e.policy)

            except ConcretizeMemory, e:
                logger.info(str(e))
                self._concretize_memory(e.address, e.size, e.policy)

...
@withzombies
Copy link

That's a super clever approach!

@yan
Copy link
Contributor

yan commented Sep 12, 2018

Worth revisiting after #1093.

@yeti-detective
Copy link
Contributor

@pgoodman howdy, on line 31 there's a function signature like save_state(self, state, next_pc=None, transition_func=null_transition) but I can't find a definition for null_transition anywhere. I'm guessing FrankenPSE is an internal project at ToB?
If I wanted to kind of work on adapting this into Manticore would it be appropriate to just define it like

def null_transition():
    pass

???

@pgoodman
Copy link
Member Author

def null_transition(state):
    return state

That file is here: https://gist.github.com/pgoodman/f1eb082936d433d32c65a2596bce7c12

@ehennenfent ehennenfent added this to the Validate Existing issues milestone Jan 23, 2019
@ehennenfent ehennenfent removed this from the Validate Existing issues milestone Feb 26, 2019
@ehennenfent
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants