SHMA: Software-managed Caching for Hybrid DRAM/NVM Memory Architectures, implemented with zsim and nvmain hybrid simulators


       SHMA is implemented on top of zsim and NVMain. The zsim-nvmain hybrid simulator, which integrates NVMain (a cycle-accurate main memory simulator for emerging non-volatile memories) with zsim, can be forked from "". Compared with the zsim-nvmain hybrid simulator, SHMA adds the following features:

  • Memory management simulation (MemoryNode, Zone, BuddyAllocator, etc.): Pin-based zsim only replays virtual addresses into the simulated architecture and doesn't support OS simulation, so SHMA adds memory management simulation to zsim, including the memory node, zone and buddy allocator.

  • TLB simulation: The original zsim-nvmain hybrid simulator does not simulate a TLB. Because SHMA adds memory management modules to zsim, it also implements TLB simulation to accelerate translation from virtual addresses to physical addresses.

  • Implementation of SHMA, a hierarchical hybrid DRAM/NVM memory system that moves DRAM cache management to the software level: In traditional DRAM-NVM hierarchical hybrid systems, the DRAM cache is managed entirely by hardware. SHMA is based on a novel software-managed cache mechanism that organizes NVM and DRAM in a flat physical address space while logically supporting a hierarchical memory architecture. In addition, SHMA caches only hot pages in the DRAM cache, which reduces cache pollution and wasted bandwidth between the DRAM cache and NVM main memory.

  • Support for multiple DRAM-NVM hybrid architectures: SHMA supports both the DRAM-NVM flat-addressable hybrid memory architecture and the DRAM-NVM hierarchical hybrid architecture. As shown in the following picture, in the flat-addressable architecture both DRAM and NVM are used as main memory and managed uniformly by the OS. In the hierarchical architecture, DRAM serves as a cache for NVM, which requires hardware-assisted hit judgement to determine whether data hits in the DRAM cache. To reduce hardware overhead, the DRAM cache is organized as set-associative and uses a demand-based caching policy. [Figure: DRAM-NVM flat-addressable and hierarchical hybrid memory architectures]

  • Multiple DRAM-NVM hybrid system optimization policies: We have implemented the Row Buffer Locality Aware (RBLA) migration policy and the MultiQueue-based (MultiQueue) migration policy for the DRAM-NVM flat-addressable hybrid memory system. The RBLA policy is a simple implementation of the hybrid memory system proposed in the paper "Row Buffer Locality Aware Caching Policies for Hybrid Memories", and the MultiQueue policy is a simple implementation of the paper "Page Placement in Hybrid Memory Systems". RBLA migrates NVM pages with poor row buffer locality to DRAM, because a row buffer miss costs much more in NVM than in DRAM, whereas a row buffer hit costs about the same in both, so pages with good row buffer locality gain little from migration. MultiQueue migrates hot NVM pages into DRAM; the hotness of a page is measured by both temporal locality and access frequency, and the MQ algorithm is used to update page hotness.
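
The MultiQueue hotness tracking mentioned in the last item can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not SHMA's actual code: the class name `MultiQueue`, the parameters `num_queues`, `lifetime` and `hot_level`, and the use of a logical access counter as the clock are all hypothetical. Access frequency picks the queue (roughly log2 of the access count), and temporal locality is handled by demoting pages that have not been touched within `lifetime` accesses.

```python
from collections import OrderedDict


class MultiQueue:
    """Simplified MQ-style hotness tracker (illustrative only)."""

    def __init__(self, num_queues=4, lifetime=8, hot_level=2):
        # One LRU-ordered queue per hotness level; the head is the oldest page.
        self.queues = [OrderedDict() for _ in range(num_queues)]
        self.lifetime = lifetime    # accesses a page may idle before demotion
        self.hot_level = hot_level  # queue index from which a page counts as hot
        self.level = {}             # page -> current queue index
        self.count = {}             # page -> total access count
        self.expire = {}            # page -> logical expiry time
        self.now = 0                # logical clock: one tick per access

    def _place(self, page, level):
        old = self.level.get(page)
        if old is not None:
            self.queues[old].pop(page, None)
        self.queues[level][page] = None  # append at the tail (most recent)
        self.level[page] = level
        self.expire[page] = self.now + self.lifetime

    def _demote_expired(self):
        # Only the head (least recently used page) of each queue is checked.
        for lvl in range(1, len(self.queues)):
            queue = self.queues[lvl]
            if not queue:
                continue
            head = next(iter(queue))
            if self.expire[head] < self.now:
                self._place(head, lvl - 1)

    def access(self, page):
        """Record one access; return True if the page is currently hot."""
        self.now += 1
        self.count[page] = self.count.get(page, 0) + 1
        # Frequency: the queue index grows with log2 of the access count.
        level = min(self.count[page].bit_length() - 1, len(self.queues) - 1)
        self._place(page, level)
        self._demote_expired()
        return self.level[page] >= self.hot_level
```

In this sketch a page becomes a migration candidate once `access` returns True; a real migrator would also have to weigh migration cost and DRAM capacity.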

The modules and architecture of the hybrid simulator are shown in the following picture: [Figure: modules and architecture of the hybrid simulator]

The research leading to these results has received funding from the National High Technology Research and Development Program (863 Program), under the project "In-memory computing system software research and development".

Original License & Copyright of zsim

zsim is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.

zsim was originally written by Daniel Sanchez at Stanford University, and per Stanford University policy, the copyright of this original code remains with Stanford (specifically, the Board of Trustees of Leland Stanford Junior University). Since then, zsim has been substantially modified and enhanced at MIT by Daniel Sanchez, Nathan Beckmann, and Harshad Kasture. zsim also incorporates contributions on main memory performance models from Krishna Malladi, Makoto Takami, and Kenta Yasufuku.

zsim was also modified and enhanced while Daniel Sanchez was an intern at Google. Google graciously agreed to share these modifications under a GPLv2 license. This code is (C) 2011 Google Inc. Files containing code developed at Google have a different license header with the correct copyright attribution.

Additionally, if you use this software in your research, we request that you reference the zsim paper ("ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-Core Systems", Sanchez and Kozyrakis, ISCA-40, June 2013) as the source of the simulator in any publications that use this software, and that you send us a citation of your work.

License & Copyright of SHMA (HUST SCTS & CGCL Lab)

SHMA was developed by Yujie Chen and Dong Liu at the Cluster and Grid Computing Lab & Services Computing Technology and System Lab of Huazhong University of Science and Technology (HUST SCTS & CGCL Lab). The copyright of SHMA remains with the CGCL & SCTS Lab of Huazhong University of Science and Technology.

Setup, Compiling and Configuration

1.External Dependencies
       Before installing the zsim-nvmain hybrid simulator, make sure that the dependencies listed below are already installed.


  • Update environment script according to your machine configuration
PINPATH= path of pin_kit
NVMAINPATH= path of nvmain
ZSIMPATH= path of zsim-nvmain
BOOST= path of boost
LIBCONFIG= path of libconfig
HDF5=path of hdf5
  • Compiling and Installation
[root @node1 SHMA]# cd zsim-nvmain
[root @node1 zsim-nvmain]# source  //initialize environment variables
[root @node1 zsim-nvmain]# scons -j16    //compile; -j16 builds with 16 parallel jobs

If the error "could not exec $PINPATH/intel64(ia32)/bin/pinbin" occurs, you do not have permission to execute pinbin. This can be fixed with the following command:

[root @node1 zsim-nvmain]# chmod a+x $PINPATH/intel64(ia32)/bin/pinbin 
  • Using a virtual machine
    If you use another OS, can't make system-wide configuration changes, or just want to test zsim without modifying your system, you can run zsim on a Linux VM. We have included a vagrant configuration file ( that will provision an Ubuntu 12.04 VM to run zsim. You can also follow this Vagrantfile to figure out how to setup zsim on an Ubuntu system. Note that zsim will be much slower on a VM because it relies on fast context-switching, so we don't recommend this for purposes other than testing and development. Assuming you have vagrant installed (sudo apt-get install vagrant on Ubuntu or Debian), follow these steps: Copy the Vagrant file to the zsim root folder, boot up and provision the base VM with all dependencies, then ssh into the VM.
[root @node1 zsim-nvmain]# cp misc/Vagrantfile .
[root @node1 zsim-nvmain]# vagrant up
[root @node1 zsim-nvmain]# vagrant ssh

Vagrant automatically syncs the zsim root folder of your host machine to /vagrant/ on the guest machine. Now that you're in the VM, navigate to that synced folder, and simply build and use zsim (steps 5 and 6 above)

[root @node1 zsim-nvmain]# cd /vagrant/
[root @node1 zsim-nvmain]# scons -j4

3.zsim Configuration Keys (example zsim configuration files are in the zsim-nvmain/config directory)

  • Enable TLB, Page Table and Memory Management Simulation
    (1) sys.tlbs.tlb_type: type of TLB, default is "CommonTlb"; "HotMonitorTlb" enables the SHMA policy;
    (2) sys.tlbs.itlb(dtlb): prefix for configuring the instruction/data TLB
    entry_num: number of TLB entries, default is 128;
    hit_lantency: latency (cycles) of a TLB hit, default is 1 cycle;
    response_latency: TLB response latency (cycles) to the CPU, default is 1 cycle;
    evict_policy: eviction policy, default is "LRU";
    (3) sys.pgt_walker (page table walker configuration)
    ① mode: paging mode; SHMA supports seven paging modes: Legacy_Normal (4GB address space, 4KB pages), Legacy_Huge (4GB address space, 4MB pages), PAE_Normal (64GB address space, 4KB pages), PAE_Huge (64GB address space, 2MB pages), LongMode_Normal (48-bit addresses, 4KB pages), LongMode_Middle (48-bit addresses, 2MB pages) and LongMode_Huge (48-bit addresses, 1GB pages);
    ② itlb: name of the instruction TLB corresponding to this page table walker;
    ③ dtlb: name of the data TLB corresponding to this page table walker;
    ④ reversed_pgt: true enables the reversed page table, false disables it; the default is false when simulating a single process and true when simulating multiple processes;

    (4) memory management configuration: zone_dma/zone_dma32/zone_normal/zone_highmem set the OS zone sizes (MB);
    (5) sys.enable_shared_memory: true enables shared memory simulation (default is true)

  • Enable Simpoints
    (1) configuration key
    simPoints=directory of simpoints
    (2) how to get simpoints
    create .bb files with valgrind (<cmd> is the command line that runs the target program):
 valgrind --tool=exp-bbv --interval-size=<instructions of a simpoint,example:1000000000> <cmd> 

get simpoints with .bb files with SimPoint(get from

simpoint -k <simpoints num> -loadFVFile <path of .bb files> -saveSimpoints <file store generated simpoints> -saveSimpointWeights <file store weights of generated simpoints> -sampleSize <instructions of a simpoint, eg:1000000000>

(3) format of simpoints
(the first simpoint period) 0
(the second simpoint period) 1
... ...
(the ith simpoint period) i-1
... ...
example (simpoint file of msf with 31 simpoints):

38 0
19 1
64 2
13 3
58 4
43 5
55 6
10 7
14 8
39 9
15 10
30 11
9 12
42 13
24 14
4 15
0 16
12 17
48 18
0 21
1 22
2 23
3 24
4 25
5 26
6 27
7 28
8 29
9 30
  • SHMA(Software-Managed DRAM Cache) Related Configuration(example in zsim-nvmain/config/shma.cfg)
    (1) sys.tlbs.tlb_type: must be set to "HotMonitorTlb";
    (2) sys.init_access_threshold: initial value of fetching_threshold, default is 0;
    (3) sys.adjust_interval: period for automatically adjusting fetching_threshold, default is 10000000 cycles (the basic unit is 1000 cycles);
    (4) sys.mem_access_time: cycles per memory access caused by page table walking;

4.nvmain Configuration Keys (example nvmain configuration files are in the zsim-nvmain/config/nvmain-config directory)

  • Enabling DRAM-NVM hierarchical hybrid architecture( zsim-nvmain/config/nvmain-config/hierarchy)
    (1) EventDriven: true;
    (2) CMemType: physical memory type; set to HierDRAMCache for the hardware-managed DRAM cache hybrid memory system;
    (3) MM_Config: configuration file of NVM main memory;
    (4) DRC_CHANNEL: configuration file of DRAM Cache;

  • Enabling SHMA(software-managed DRAM Cache) policy in DRAM-NVM hierarchical hybrid architecture(zsim-nvmain/config/nvmain-config/shma)
    (1) EventDriven:true;
    (2) ReservedChannels: number of DRAM cache channels;
    (3) CONFIG_DRAM_CHANNEL: configuration file of every DRAM cache channel;
    (4) CONFIG_CHANNEL: configuration files of every NVM main memory channel;
    (5) CMemType: physical memory type, is FineNVMain in SHMA;
    (6) DRAMBufferDecoder: DRAM cache decoder type, is BufferDecoder in SHMA;

  • Enabling RBLA policy in DRAM-NVM hybrid architecture(zsim-nvmain/config/nvmain-config/rbla)
    (1) EventDriven:true;
    (2) Decoder: physical decoder object, is Migrator in RBLA;
    (3) PromotionChannel: channel id of fast memory (DRAM);
    (4) CMemType: physical memory type, is RBLANVMain in RBLA;
    (5) CONFIG_CHANNEL: configuration file path of every main memory channel;

  • Enabling MultiQueue policy in DRAM-NVM hybrid architecture(zsim-nvmain/config/nvmain-config/mq)
    (1) EventDriven: true;
    (2) AddHook: hook type, is "MultiQueueMigrator" in MultiQueue policy based hybrid memory system;
    (3) Decoder: decoder type, is "MQMigrator" in MultiQueue policy based hybrid memory system;
    (4) PromotionChannel: channel id of fast memory(DRAM);
    (5) CONFIG_CHANNEL: configuration file path of every main memory channel;

  • Enabling Flat DRAM-NVM hybrid architectures(zsim-nvmain/config/nvmain-config/rbla)
    (1) FAST_CONFIG: configuration file of fast memory (eg DRAM)
    (2) SLOW_CONFIG: configuration file of slow memory (eg NVM)
    (3) Decoder: *FlatDecoder, decoder of the flat DRAM-NVM hybrid architecture
    (4) CMemType:FlatRBLANVMain, memory type

TLB, Page Table and Memory Management Simulation Modules

       As described above, the original zsim doesn't support OS simulation, so SHMA adds TLB, page table and memory management simulation to zsim. The main modifications are shown in the following picture: the left side marks the major code of the original zsim related to system simulation, and the right side marks SHMA's modifications to zsim for TLB, page table and memory management simulation support. [Figure: original zsim system simulation code (left) and SHMA modifications (right)]
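
To make the role of the TLB concrete, here is a minimal Python sketch of a TLB with LRU eviction that falls back to a page table on a miss. It is an illustration only, not SHMA's C++ implementation: the class `LruTlb`, the dictionary-based page table and the fixed 4KB page size are assumptions, while `entry_num` and the "LRU" eviction policy mirror the configuration keys described earlier.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumes 4KB pages


class LruTlb:
    """Tiny TLB model: maps virtual page numbers to physical page numbers."""

    def __init__(self, entry_num=128, page_table=None):
        self.entry_num = entry_num
        self.entries = OrderedDict()  # vpn -> ppn, least recently used first
        self.page_table = page_table if page_table is not None else {}
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            ppn = self.page_table[vpn]  # stands in for a page table walk
            if len(self.entries) >= self.entry_num:
                self.entries.popitem(last=False)  # evict the LRU entry
            self.entries[vpn] = ppn
        return (self.entries[vpn] << PAGE_SHIFT) | offset
```

A hit avoids the page table lookup entirely, which is the speed-up that the TLB simulation is meant to capture.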

Architecture of SHMA(software-managed DRAM Caching)

        SHMA extends both the page table and the TLB to maintain two mappings: virtual address to physical address, and physical address to DRAM cache address. This moves DRAM cache management to the software level, so that the DRAM cache can be fully exploited. In addition, SHMA adopts a utility-based DRAM caching policy that fetches only hot pages into the DRAM cache when the DRAM cache is under high memory pressure, which reduces DRAM cache pollution. SHMA also supports bypassing the DRAM cache directly. The following picture shows the architecture of SHMA. [Figure: architecture of SHMA]
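
The idea of fetching only hot pages can be illustrated with a small Python sketch. It is a simplification under stated assumptions rather than SHMA's actual mechanism: the class `HotPageMonitor`, the per-page access counter and the free-list of DRAM cache pages are hypothetical, and only the name `fetching_threshold` comes from the configuration keys described earlier. A page is fetched into the DRAM cache once its access count exceeds the threshold; if no DRAM cache page is free, the access simply bypasses the cache and is served from NVM.

```python
class HotPageMonitor:
    """Threshold-based fetching of hot NVM pages into a DRAM cache (sketch)."""

    def __init__(self, dram_pages, fetching_threshold=0):
        self.free_dram = list(range(dram_pages))  # free DRAM cache pages
        self.fetching_threshold = fetching_threshold
        self.access_count = {}  # NVM page -> accesses observed so far
        self.dram_map = {}      # NVM page -> DRAM cache page

    def access(self, nvm_page):
        """Return ("DRAM", page) or ("NVM", page) for where the access is served."""
        if nvm_page in self.dram_map:
            return ("DRAM", self.dram_map[nvm_page])
        self.access_count[nvm_page] = self.access_count.get(nvm_page, 0) + 1
        hot = self.access_count[nvm_page] > self.fetching_threshold
        if hot and self.free_dram:
            # Fetch the page; later accesses are served from the DRAM cache.
            self.dram_map[nvm_page] = self.free_dram.pop(0)
        return ("NVM", nvm_page)
```

Raising the threshold makes fetching more selective, which is one way to picture why the threshold is adjusted periodically.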

Implementations of RBLA and MultiQueue Policies

  • Row Buffer Locality Aware Migrator (RBLA)
           RBLA migrates NVM pages with poor row buffer locality to DRAM and keeps pages with good row buffer locality in NVM, which preserves the benefit of row buffer hits in NVM and reduces the overhead of row buffer misses in NVM. Its implementation is shown in the following picture: [Figure: RBLA implementation]

  • Hot Page Migrator based on the MultiQueue Algorithm (MultiQueue)
           MultiQueue classifies NVM pages into hot and cold pages using the MultiQueue algorithm, according to both page access frequency and temporal locality. Its implementation is shown in the following picture: [Figure: MultiQueue implementation]

  • Architecture of flat memory supporting different channel configurations of DRAM and NVM
           DRAM and NVM with different channel configurations overlap in the low end of the address space. We divide this contiguous overlapped address space into {channel_nums} parts and map them to different address spaces in an interleaved fashion to make full use of channel-level parallelism.
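
One common way to interleave an address range across channels is shown below as a Python sketch. It is an assumption about the general technique, not a description of SHMA's decoder: the function name `interleave`, the 4KB interleaving granularity and the round-robin block-to-channel rule are all hypothetical.

```python
def interleave(addr, channel_nums, granularity=4096):
    """Map an address to (channel, offset within that channel)."""
    block = addr // granularity           # index of the block holding addr
    channel = block % channel_nums        # consecutive blocks rotate channels
    local_block = block // channel_nums   # block index inside the channel
    return channel, local_block * granularity + addr % granularity


# Four consecutive 4KB blocks land on four different channels.
channels = [interleave(i * 4096, 4)[0] for i in range(4)]
```

Because neighbouring blocks go to different channels, sequential accesses can proceed in parallel instead of queuing on one channel.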

Happy hacking! We hope you find SHMA useful for hybrid memory architecture research.

@Support or Contact

SHMA is developed in the HUST SCTS&CGCL Lab by Yujie Chen, Haikun Liu and Xiaofei Liao. If you have any questions, please contact Yujie Chen(, Haikun Liu ( and Xiaofei Liao ( We welcome you to commit your modification to support our project.

