moleculec-c2

Improved C/Rust plugin for the molecule serialization system. The name reads as "molecule c c2": the first "c" stands for compiler and "c2" is the code name. It builds on the existing moleculec, which is used here to produce an intermediate JSON file.

How to use

Install "moleculec"

make install-tools

Compile Rust code to binary

cargo build --release

Generate C/Rust files with moleculec-c2 and moleculec

# generate the intermediate JSON file
moleculec --language - --schema-file mol/blockchain.mol --format json > mol/blockchain.json
# generate C
target/release/moleculec-c2 --input mol/blockchain.json | clang-format -style=Google > tests/blockchain/blockchain-api2.h
# generate Rust
target/release/moleculec-c2 --rust --input mol/blockchain.json | rustfmt > tests/blockchain_rust/src/blockchain.rs

Include the generated file in your source file.

The JSON file is only an intermediate file.
Piping the output through clang-format -style=Google or rustfmt is optional if you don't care about coding style.

Compared to the old C/Rust API, the following have been improved:

Strong type for C

In code that uses the old molecule API, mol_seg_t is everywhere. It behaves like a weakly typed value in dynamic languages (Python, Lua), so the C compiler's type system cannot check whether the API is used correctly. With the new API, the type system helps reduce the possibility of bugs, checks that the code is written consistently, and gives hints while coding. See the example usage of blockchain, and browse the generated API for blockchain.
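The following minimal sketch shows why distinct types help. It is not the generated API. `seg_t`, `ScriptType`, `BytesType` and both accessors are hypothetical names chosen for illustration.

The two wrapper structs have identical layouts but are different C types. Because of that, the compiler rejects a `BytesType` passed where a `ScriptType` is expected. An untyped segment accepts anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Untyped segment, modelled on the old mol_seg_t. */
typedef struct {
  uint8_t *ptr;
  uint32_t size;
} seg_t;

/* Hypothetical typed wrappers: identical layout, distinct C types. */
typedef struct { seg_t seg; } ScriptType;
typedef struct { seg_t seg; } BytesType;

/* Old style: any segment is accepted, so passing the wrong one compiles. */
static uint32_t weak_script_size(seg_t s) { return s.size; }

/* New style: only a ScriptType is accepted. A call such as
 * script_size(some_bytes) with a BytesType argument is rejected by the
 * compiler because the two struct types are incompatible. */
static uint32_t script_size(ScriptType s) {
  assert(s.seg.ptr != NULL);
  return s.seg.size;
}
```

Usage: `script_size(script)` compiles. `script_size(bytes)` does not. `weak_script_size(bytes.seg)` compiles even when it is the wrong segment.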

Extra support for known types

From the Encoding Spec, we know that there is no type system in molecule. For example, we can find the following definitions in molecule:

array Uint32 [byte; 4];
array Uint64 [byte; 8];

Suppose a field "version" has type "Uint32". With the old molecule API, its accessor still returns "uint8_t *" instead of "uint32_t".
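Molecule stores numbers in little-endian byte order. Turning the four bytes of a `Uint32` into a `uint32_t` is therefore a small, mechanical conversion that callers of the old API had to write themselves.

The minimal sketch below shows that conversion. `read_uint32_le` is a hypothetical helper for illustration, not a function from the generated files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a Molecule [byte; 4] (little-endian) into a uint32_t.
 * Hypothetical helper; with the new API, a field declared as Uint32
 * is returned as uint32_t directly. */
static uint32_t read_uint32_le(const uint8_t bytes[4]) {
  assert(bytes != NULL);
  return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
         ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}
```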

The following type names are now reserved:

Uint8, Int8
Uint16, Int16
Uint32, Int32
Uint64, Int64

When they appear in a schema file, they are automatically converted to the corresponding types in the generated files. Here is the mapping:

| Molecule type | Type name | C type | Rust type |
| --- | --- | --- | --- |
| byte | / | uint8_t | u8 |
| `[byte; 1]` | int8 | int8_t | i8 |
| `[byte; 1]` | uint8 | uint8_t | u8 |
| `[byte; 2]` | int16 | int16_t | i16 |
| `[byte; 2]` | uint16 | uint16_t | u16 |
| `[byte; 4]` | int32 | int32_t | i32 |
| `[byte; 4]` | uint32 | uint32_t | u32 |
| `[byte; 8]` | int64 | int64_t | i64 |
| `[byte; 8]` | uint64 | uint64_t | u64 |
| `[byte; N]` | / | mol2_cursor_t | `Cursor` |
| `<byte>` | / | mol2_cursor_t | `Cursor` |
| option | / | / | `Option<_>` |

The type name is case-insensitive. For example, int8, Int8, INT8 are all mapped to int8_t.
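The case-insensitive rule can be illustrated with a small lookup. This is only a sketch of the rule in C: moleculec-c2 itself is written in Rust, and `kKnownTypes`, `equals_ignore_case` and `c_type_for` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical table mirroring the reserved-name mapping above. */
static const struct {
  const char *mol_name;
  const char *c_type;
} kKnownTypes[] = {
    {"uint8", "uint8_t"},   {"int8", "int8_t"},     {"uint16", "uint16_t"},
    {"int16", "int16_t"},   {"uint32", "uint32_t"}, {"int32", "int32_t"},
    {"uint64", "uint64_t"}, {"int64", "int64_t"},
};

/* Compare two strings, ignoring ASCII case. Returns 1 if equal. */
static int equals_ignore_case(const char *a, const char *b) {
  assert(a != NULL && b != NULL);
  while (*a && *b) {
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
    a++;
    b++;
  }
  return *a == *b; /* equal only if both strings ended together */
}

/* Return the C type for a reserved name, or NULL if the name is not reserved. */
static const char *c_type_for(const char *name) {
  for (size_t i = 0; i < sizeof(kKnownTypes) / sizeof(kKnownTypes[0]); i++) {
    if (equals_ignore_case(name, kKnownTypes[i].mol_name)) {
      return kKnownTypes[i].c_type;
    }
  }
  return NULL;
}
```

With this lookup, `c_type_for("int8")`, `c_type_for("Int8")` and `c_type_for("INT8")` all return the same `"int8_t"` entry. A non-reserved name such as `"Bytes"` yields `NULL`.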

Load memory on demand

mol_seg_t is the most important data structure in the old molecule API:

typedef struct {
    uint8_t                     *ptr;               // Pointer
    mol_num_t                   size;               // Full size
} mol_seg_t;

It assumes that the data has already been loaded into memory. This is a poor fit for a system like CKB-VM, which has very limited memory (4 MB).

Looking at the Molecule Spec, if we only need part of the data, we can reach it through a few "hops". We read only the header, compute where to hop, and skip reading the remaining data. Many scenarios only need part of the data, so a load-on-demand mechanism fits them well.
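A Molecule table starts with a header: a little-endian `full_size` followed by one little-endian offset per field. The number of fields is therefore `first_offset / 4 - 1`.

The hop above can be sketched as follows. `table_field_span` is a hypothetical helper, and for simplicity it reads from an in-memory buffer. It only inspects the `4 * (field_count + 1)` header bytes, which is exactly the part a load-on-demand reader would need to fetch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian uint32 (Molecule header numbers are little-endian). */
static uint32_t le32(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
         ((uint32_t)p[3] << 24);
}

/* Locate field `index` of a table laid out as
 * [full_size][offset_0]...[offset_{n-1}][field data...]
 * using only the header. On success, writes the field's offset and size
 * and returns 0. Returns -1 if the header is malformed or `index` is out
 * of range. Hypothetical helper for illustration. */
static int table_field_span(const uint8_t *data, uint32_t len, uint32_t index,
                            uint32_t *offset, uint32_t *size) {
  assert(offset != NULL && size != NULL);
  if (len < 4) return -1;
  uint32_t full_size = le32(data);
  if (full_size > len) return -1;
  if (full_size < 8) return -1; /* 4 means an empty table; less is malformed */
  uint32_t first_offset = le32(data + 4);
  if (first_offset % 4 != 0 || first_offset < 8 || first_offset > full_size) {
    return -1;
  }
  uint32_t field_count = first_offset / 4 - 1;
  if (index >= field_count) return -1;
  uint32_t start = le32(data + 4 + 4 * index);
  uint32_t end = (index + 1 < field_count) ? le32(data + 4 + 4 * (index + 1))
                                           : full_size; /* last field */
  if (start > end || end > full_size) return -1;
  *offset = start;
  *size = end - start;
  return 0;
}
```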

This load-on-demand mechanism is built on the following data structure:

typedef struct mol2_cursor_t {
  uint32_t offset;  // offset of slice
  uint32_t size;    // size of slice
  mol2_data_source_t *data_source;
} mol2_cursor_t;

Here is a very simple implementation of the "read" callback over memory:

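// args[0] holds the start address of the memory block and args[1] its length.
// Copies up to `len` bytes starting at `offset` into `ptr` and returns the
// number of bytes actually copied (fewer than `len` near the end of memory).
uint32_t mol2_source_memory(uintptr_t args[], uint8_t *ptr, uint32_t len,
                            uint32_t offset) {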
  uint32_t mem_len = (uint32_t)args[1];
  ASSERT(offset < mem_len);
  uint32_t remaining_len = mem_len - offset;

  uint32_t min_len = MIN(remaining_len, len);
  uint8_t *start_mem = (uint8_t *)args[0];
  ASSERT((offset + min_len) <= mem_len);
  memcpy(ptr, start_mem + offset, min_len);
  return min_len;
}
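The sketch below shows how a cursor and a memory source could fit together. The README does not show the fields of `mol2_data_source_t`, so `demo_source_t` (a read callback plus its `args`) is a hypothetical stand-in. `demo_read_at` only mirrors the idea of `mol2_read_at` and omits its caching.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical data source: a read callback plus its arguments. */
typedef struct {
  uintptr_t args[2]; /* args[0]: base address, args[1]: length */
  uint32_t (*read)(uintptr_t args[], uint8_t *ptr, uint32_t len, uint32_t offset);
} demo_source_t;

/* Same shape as mol2_cursor_t: a window (offset, size) into a source. */
typedef struct {
  uint32_t offset;
  uint32_t size;
  demo_source_t *source;
} demo_cursor_t;

/* Memory-backed read, following mol2_source_memory above. */
static uint32_t demo_source_memory(uintptr_t args[], uint8_t *ptr, uint32_t len,
                                   uint32_t offset) {
  uint32_t mem_len = (uint32_t)args[1];
  assert(offset < mem_len);
  uint32_t min_len = DEMO_MIN(mem_len - offset, len);
  memcpy(ptr, (const uint8_t *)args[0] + offset, min_len);
  return min_len;
}

/* Copy up to `len` bytes from the start of the cursor's window into `buf`.
 * Memory is touched only here, never when the cursor is created. */
static uint32_t demo_read_at(const demo_cursor_t *cur, uint8_t *buf, uint32_t len) {
  uint32_t want = DEMO_MIN(len, cur->size);
  if (want == 0) return 0;
  return cur->source->read(cur->source->args, buf, want, cur->offset);
}
```

Usage: wrap a byte array in a `demo_source_t` and create `demo_cursor_t cur = {1, 3, &src};`. A later `demo_read_at(&cur, out, sizeof(out))` copies bytes 1 to 3 of the array.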

A data source based on syscalls can be built in the same way.

When a "mol2_cursor_t" is returned from a function, no memory is accessed. As the name "cursor" suggests, it is only a cursor. Memory is read on demand with "mol2_read_at", for example:

    mol2_cursor_t witness_cur = witnesses.tbl->at(&witnesses, 0);
    uint8_t witness[witness_cur.size];
    mol2_read_at(&witness_cur, witness, witness_cur.size);
    assert(witness_cur.size == 3 && witness[0] == 0x12 && witness[1] == 0x34);
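A cursor is just an (offset, size) pair, so narrowing it to a sub-range is pure arithmetic and reads nothing. The minimal sketch below uses a hypothetical `demo_cursor_t`, whose data source is reduced to an opaque pointer, and a hypothetical `demo_cursor_slice` helper.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as mol2_cursor_t; the data source is left opaque here. */
typedef struct {
  uint32_t offset;
  uint32_t size;
  void *data_source;
} demo_cursor_t;

/* Narrow `cur` to [start, start + len) relative to its own window.
 * Returns 0 on success, -1 if the range does not fit. No bytes are read. */
static int demo_cursor_slice(const demo_cursor_t *cur, uint32_t start,
                             uint32_t len, demo_cursor_t *out) {
  assert(cur && out);
  if (start > cur->size || len > cur->size - start) return -1;
  out->offset = cur->offset + start;
  out->size = len;
  out->data_source = cur->data_source; /* same source, smaller window */
  return 0;
}
```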

The Rust version is much simpler:

impl Read for Vec<u8> {
    fn read(&self, buf: &mut [u8], offset: usize) -> Result<usize, Error> {
        let mem_len = self.len();
        if offset >= mem_len {
            return Err(Error::OutOfBound);
        }

        let remaining_len = mem_len - offset;
        let min_len = min(remaining_len, buf.len());

        if (offset + min_len) > mem_len {
            return Err(Error::OutOfBound);
        }
        buf[0..min_len].copy_from_slice(&self.as_slice()[offset..offset + min_len]);
        Ok(min_len)
    }
}

// same as `make_cursor_from_memory` in C
impl From<Vec<u8>> for Cursor {
    fn from(mem: Vec<u8>) -> Self {
        Cursor::new(MAX_CACHE_SIZE, mem.len(), Box::new(mem))
    }
}

Cache support

When data is read from a data source via syscalls, every syscall is expensive. It is better to read more data than requested on each syscall and keep it for later use, so caching is now supported for every read. See mol2_read_at (in C) or read_at (in Rust) for more information.
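The sketch below illustrates the caching idea. It is not the actual `mol2_read_at` implementation. `demo_cached_reader_t`, the 32-byte `DEMO_CACHE_SIZE` and the call counter that stands in for syscall cost are all assumptions for illustration.

On a miss, the reader fills the cache with a larger window starting at the requested offset. Later reads inside that window are served from memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CACHE_SIZE 32u

/* Hypothetical reader over a slow source (simulated by a buffer). */
typedef struct {
  const uint8_t *backing; /* stands in for data behind a syscall */
  uint32_t backing_len;
  uint32_t source_calls;  /* number of simulated "syscalls" made */
  uint8_t cache[DEMO_CACHE_SIZE];
  uint32_t cache_start;   /* source offset of cache[0] */
  uint32_t cache_len;     /* valid bytes in cache */
} demo_cached_reader_t;

/* Simulated expensive read: copies up to `len` bytes at `offset`. */
static uint32_t demo_slow_read(demo_cached_reader_t *r, uint8_t *dst,
                               uint32_t len, uint32_t offset) {
  r->source_calls++;
  if (offset >= r->backing_len) return 0;
  uint32_t n = r->backing_len - offset;
  if (n > len) n = len;
  memcpy(dst, r->backing + offset, n);
  return n;
}

/* Read `len` bytes at `offset`, refilling the cache only on a miss.
 * Requests larger than the cache bypass it. Returns bytes copied. */
static uint32_t demo_read_at(demo_cached_reader_t *r, uint8_t *dst,
                             uint32_t len, uint32_t offset) {
  assert(r && dst);
  if (len > DEMO_CACHE_SIZE) return demo_slow_read(r, dst, len, offset);
  int hit = offset >= r->cache_start &&
            (uint64_t)offset + len <= (uint64_t)r->cache_start + r->cache_len;
  if (!hit) {
    /* Miss: fetch a whole cache-sized window in one call. */
    r->cache_start = offset;
    r->cache_len = demo_slow_read(r, r->cache, DEMO_CACHE_SIZE, offset);
  }
  uint32_t avail = r->cache_start + r->cache_len - offset;
  uint32_t n = len < avail ? len : avail;
  memcpy(dst, r->cache + (offset - r->cache_start), n);
  return n;
}
```

Usage: with a 64-byte source, three 4-byte reads at offsets 0, 4 and 28 cost one simulated syscall in total. A read at offset 30 crosses the cached window and triggers a second one.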

Split declaration and definition for C

As generated, the header file can only be included in a single source file. If your project has multiple source files, split declarations and definitions with the following steps:

Define the macro "MOLECULEC_C2_DECLARATION_ONLY" and include the header file:

#define MOLECULEC_C2_DECLARATION_ONLY
#include "sample-api2.h"

See here. This can be repeated in every source file that needs it.

Include the header file fully (without defining the macro) in exactly one other source file (.c):

#include "sample-api2.h"

See here. It can only be done once.
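The split works because the definitions are guarded by the macro. The following single-file sketch imitates that pattern with a hypothetical function `demo_version`. It is not copied from the generated header, whose exact layout may differ.

Declarations are always emitted. Definitions are compiled only when `MOLECULEC_C2_DECLARATION_ONLY` is not defined, so exactly one translation unit provides them and the linker sees no duplicates.

```c
#include <assert.h>
#include <stdint.h>

/* ---- imitation of a generated header (hypothetical content) ---- */

/* Declaration: emitted in every translation unit that includes the header. */
uint32_t demo_version(const uint8_t bytes[4]);

#ifndef MOLECULEC_C2_DECLARATION_ONLY
/* Definition: compiled only in the one .c file that includes the header
 * without defining MOLECULEC_C2_DECLARATION_ONLY first. */
uint32_t demo_version(const uint8_t bytes[4]) {
  return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
         ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}
#endif

/* ---- end of header imitation ---- */
```

In a file that defines the macro before this code, only the prototype remains, and calls resolve at link time to the single definition.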

For CKB developers

A pre-generated file, blockchain-api2.h, is provided together with molecule2_reader.h. Both can be included in source files directly.

The original mol file is here.

XuJiandong/moleculec-c2

Folders and files

Latest commit

History

Repository files navigation

moleculec-c2

How to use

Strong type for C

Extra support for known types

Load memory on demand

Cache support

Split declaration and definition for C

For CKB developer

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages