
Rework test framework, fix a number of related bugs #372

Merged · 43 commits · Mar 31, 2020

@geky commented Jan 27, 2020

This took a bit longer than I thought it would, partly because of the number of bugs that fell out, and partly because I underestimated how much time it would take to edit all of the tests.

EDIT: I did not realize how long the description of this PR would be, sorry for the wall of text. Also sorry about being out-of-touch for a while.

Test framework rework

This PR reworks the testing framework to better isolate and fix filesystem bugs. This mainly involves generalizing the tests to let them run under different contexts (power-loss, wear, valgrind), while also adding extra scripts to make test results easier to reproduce and visualize.

  • Reworked how test.py works. Now instead of just dispatching tests/test_*.sh scripts, it loads tests stored in tests/test_*.toml files. This lets us store extra attributes describing tests, such as "leaky" or "reentrant", which we can use to decide how to run tests later.

    • ./scripts/test.py - Runs tests normally. Each test case is a single function that should terminate without asserting.
    • ./scripts/test.py --reentrant - Run tests with a simulated power-loss occurring every n cycles, where n increases by 1 after each power-loss. Only test cases marked with reentrant = true are run, since the test itself also needs to be able to handle power-loss.
    • ./scripts/test.py --valgrind - Run tests under valgrind, checking for memory constraint errors. Tests marked with leaky = true are skipped here.

    There's also the addition of "internal" tests, where a test case can specify in = "lfs.c" (or some other file) to be compiled as though it were appended to that file. This gives the test access to static functions and other internals. I'm planning to use these for more unit-style tests in the future.

    It's worth noting that tests can no longer span multiple sequential processes, but this feature wasn't that helpful and created problems for reproducibility and porting to devices.

    The real kicker is how test.py integrates with gdb. By adding --gdb, test.py will drop you into gdb at the exact assert where a test fails. This includes reproducing any power-cycles necessary to get the test into the failing state. This is very helpful for reproducing complex test failures.

    $ ./scripts/test.py --gdb
    ====== building ======
    built 15 test suites, 116 test cases, 740 permutations
    ====== testing ======
    test_alloc ✓✓✓✓✓✓✓✓✓✓✓✓✓
    test_attrs ✓✓✓✓
    test_badblocks ✓✓✓✓✓✓✓✓✓✓✓✓
    test_dirs ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
    ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
    test_entries ✓✓✓✓✓✓✓✓
    test_exhaustion ✓✓✓✓✓✓✓✓✓✓✓✓
    test_format ✓✓✓✓✗
    ====== results ======
    tests/test_format.toml:29:failure: test_format#5#1 (BLOCK_CYCLES=32, N=10) failed with -6
    tests/test_format.toml:39:assert: assert failed with 2, expected eq 1
            assert(info.type == LFS_TYPE_REG);
    
    ======= gdb ======
    GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
    Copyright (C) 2016 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from ./tests/test_format.toml.test...done.
    Starting program: /home/geky/littlefs3/tests/test_format.toml.test 5 1
    tests/test_format.toml:39:assert: assert failed with 2, expected eq 1
    
    Program received signal SIGABRT, Aborted.
    0x00007ffff7a42428 in __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:54
    54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
    #2  0x0000000000401f17 in test_case5 (BLOCK_CYCLES=32, N=10) at tests/test_format.toml:39
    39              assert(info.type == LFS_TYPE_REG);
    (gdb)
  • Restructured the test block devices. Now there is testbd, filebd, and rambd.

    The way that emubd created a separate file for each block wasn't that useful. It was marginally easier to inspect by hand, but didn't match the single-file disk image that's easy to create when dumping flash from an actual device.

    I've replaced emubd with simpler, and hopefully more useful, block devices:

    • filebd - Simulate a block device on a single file.

      As an added plus this block device can be used to mount disk images extracted from a device's flash. You can even use it to mount /dev/sdb devices into a C program.

    • rambd - Simulate a block device in RAM.

      The main reason for adding this is performance. Now that we don't allow separate-process tests, we can test using a block device stored in RAM. This speeds up the tests by ~7x.

      • test_dirs w/ filebd - 0m7.170s
      • test_dirs w/ rambd - 0m0.966s
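      To make the model concrete, here's a minimal Python sketch of a RAM-backed block device with read/prog/erase. This is hypothetical illustration only; the real rambd/filebd are C:

```python
# Hypothetical Python sketch of a rambd-style block device.
# The real implementation is C; this only illustrates the model.
class RamBd:
    def __init__(self, block_size, block_count, erase_value=0xff):
        self.block_size = block_size
        self.erase_value = erase_value
        self.storage = bytearray([erase_value] * (block_size * block_count))

    def read(self, block, off, size):
        start = block*self.block_size + off
        return bytes(self.storage[start:start+size])

    def prog(self, block, off, data):
        # a simulation can simply copy; real flash can only clear bits
        start = block*self.block_size + off
        self.storage[start:start+len(data)] = data

    def erase(self, block):
        # reset the whole block to the simulated erase value
        start = block*self.block_size
        self.storage[start:start+self.block_size] = bytes(
            [self.erase_value] * self.block_size)
```

      A filebd works the same way, except the storage is a file (or /dev/sdb) instead of a bytearray, which is why the same scripts can read real disk images.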

    On top of these is testbd, a wrapper for filebd/rambd with extra hooks for testing. This is where a lot of the test magic happens.

    • Simulated block wear - testbd can be configured to keep track of block wear and force blocks to start reporting LFS_ERR_CORRUPT when exhausted. The current wear can be modified during a test to tightly control how the block device will behave under littlefs.

      Block wear can also be used to explicitly force geometry of the underlying block device, and lets us change the size of the block device on the fly while testing.

      We can also read the state of wear on the block device after a test, which may be useful for future benchmarks.

    • Simulated power loss - testbd can be given a number of "write cycles" to run for. This creates a counter that is decremented every prog/erase cycle. When the counter hits zero the program is forcefully terminated.

      Note this is much faster than using gdb to kill the program externally.
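      Combined with the n = n+1 scheme used by test.py --reentrant, the whole loop can be sketched like this (hypothetical Python names; the real counter lives in testbd's C code):

```python
# Sketch of simulated power-loss testing (hypothetical names).
class PowerLoss(Exception):
    pass

class PowerCycleBd:
    def __init__(self, cycles):
        self.cycles = cycles  # prog/erase operations until "power-loss"

    def prog(self):
        self.cycles -= 1
        if self.cycles == 0:
            raise PowerLoss()  # the real testbd forcefully terminates here

def run_reentrant(test):
    # rerun with one more write cycle each time until the test survives,
    # exercising a power-loss at every prog/erase boundary along the way
    n = 1
    while True:
        try:
            test(PowerCycleBd(n))
            return n
        except PowerLoss:
            n += 1
```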

  • Added a handful of debugging scripts for visualizing the state of littlefs.

    • ./scripts/readblock.py disk block_size block - Prints a hex dump of a given block on disk. It's basically just dd if=disk bs=block_size count=1 skip=block | xxd -g1 - but with less typing.

      $ ./scripts/readblock.py disk 512 0xa
      off       data
      00000000: 71 01 00 00 f0 0f ff f7 6c 69 74 74 6c 65 66 73  q.......littlefs
      00000010: 2f e0 00 10 00 00 02 00 00 02 00 00 00 04 00 00  /...............
      00000020: ff 00 00 00 ff ff ff 7f fe 03 00 00 20 00 04 19  ...............
      00000030: 61 00 00 0c 00 62 20 30 0c 09 a0 01 00 00 64 00  a....b 0......d.
      ...
    • ./scripts/readmdir.py disk block_size block1 block2 - Prints info about the tags in a metadata pair on disk. It can print the currently active tags as well as the raw log of the metadata pair.

      $ ./scripts/readmdir.py disk 512 0xa 0xb
      off       tag       type            id  len  data (truncated)
      0000003b: 0020000a  dir              0   10  63 6f 6c 64 63 6f 66 66 coldcoff
      00000049: 20000008  dirstruct        0    8  02 02 00 00 03 02 00 00 ........
      00000008: 00200409  dir              1    9  68 6f 74 63 6f 66 66 65 hotcoffe
      00000015: 20000408  dirstruct        1    8  fe 01 00 00 ff 01 00 00 ........
    • ./scripts/readtree.py disk block_size - Parses the littlefs tree and prints info about the semantics of what's on disk. This includes the superblock, global-state, and directories/metadata-pairs.

      $ ./scripts/readtree.py disk 512
      superblock "littlefs"
        version v2.0
        block_size 512
        block_count 1024
        name_max 255
        file_max 2147483647
        attr_max 1022
      gstate 0x000000000000000000000000
      dir "/"
      mdir {0x0, 0x1} rev 3
      v id 0 superblock "littlefs" inline size 24
      mdir {0x77, 0x78} rev 1
        id 0 dir "coffee" dir {0x1fc, 0x1fd}
      dir "/coffee"
      mdir {0x1fd, 0x1fc} rev 2
        id 0 dir "coldcoffee" dir {0x202, 0x203}
        id 1 dir "hotcoffee" dir {0x1fe, 0x1ff}
      dir "/coffee/coldcoffee"
      mdir {0x202, 0x203} rev 1
      dir "/coffee/warmcoffee"
      mdir {0x200, 0x201} rev 1
  • Added explode_asserts.py.

    What is explode_asserts.py? It is an extra C preprocessing step that makes asserts aggressively readable.

    explode_asserts.py takes in a C file and parses any asserts it finds, transforming them into an equivalent statement that can also report the original arguments to the logic operation in the assert. This is very useful for debugging and removes the need for assert_int_eq, assert_str_lt, etc that are a common sight in C testing frameworks.

    And because explode_asserts.py works with traditional asserts, it can be applied to any existing project to get nicer assert results.
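    As a rough sketch of the idea (the real explode_asserts.py is a more careful parser, not a one-line regex), a transformation for the == case might look like:

```python
import re

# Illustrative sketch of the assert "explosion"; the real
# explode_asserts.py parses C properly instead of using a regex.
def explode(line):
    m = re.match(r'assert\((\w+)\s*==\s*(\w+)\);', line.strip())
    if not m:
        return line
    lh, rh = m.groups()
    # rewrite into a statement that can report both operands on failure
    return (f'{{ int _lh = {lh}; int _rh = {rh}; '
            f'if (!(_lh == _rh)) {{ '
            f'printf("assert failed with %d, expected eq %d\\n", _lh, _rh); '
            f'abort(); }} }}')
```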

    Here's a test failure showing how explode_asserts.py can extract assert arguments:

    geky@s:littlefs3$ ./scripts/test.py
    ====== building ======
    built 15 test suites, 116 test cases, 740 permutations
    ====== testing ======
    test_alloc ✓✓✓✓✓✓✓✓✓✓✓✓✓
    test_attrs ✓✓✓✓
    test_badblocks ✓✓✓✓✓✓✓✓✓✓✓✓
    test_dirs ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✗
    ====== results ======
    tests/test_dirs.toml:48:failure: test_dirs#3#1 (N=3) failed with -6
    tests/test_dirs.toml:72:assert: assert failed with "removeme000", expected eq "removeme0"
            assert(strcmp(info.name, path) == 0);
    
    tests passed: 64
    tests failed: 1

Bug fixes!

What was the point of all these test changes? It was to make my life easier: finding and fixing bugs! Of which there were quite a few:

  • Fixed bug in dir splitting when there's a large number of open files 9453ebd

    Updates to open files were depending on the mutable dir rather than the immutable copy intended to avoid sync issues.

  • Fixed issues with neighbor updates during moves f4b6a6b

    This was found by @eastmoutain, and an initial fix was proposed in power loss reboot triggered internal assert #343, though there were a couple lingering issues that needed to be fixed.

    These lingering issues turned out to be quite involved. There's a better writeup in f4b6a6b's commit message.

    Basically the order and number of create/delete tags was assumed to behave a certain way when passed to lfs_dir_commit and this was wrong.

    The fix was to restructure how lfs_dir_commit tracks create/delete tags and how implicit moves in global state are handled (by making them no-longer implicit).

    I believe this was the source of most of the "LFS_ASSERT(block != LFS_BLOCK_NULL)" issues. Will be working through open issues to make sure.

  • Fixed parent going out of sync when relocated in lfs_mkdir a5d614f

    This can happen when we're creating a new directory in a multi-pair directory and our predecessor has to be relocated as we thread ourselves into the linked-list.

    Fortunately this can be fixed by adding our temporary mdir to the list of open files/dirs, which are automatically updated on metadata-pair changes. Note we need to take care to unhook this temporary mdir before lfs_mkdir returns.

  • Fixed to-be-removed child going out of sync when relocated in lfs_remove and lfs_rename a5d614f

    Surprisingly, this can happen if a nested directory is removed and it is its own parent's predecessor (which can happen after moving the directory around, since moves don't change the threaded linked-list). If this happens and our parent is relocated, we could go out of sync, causing us to update our predecessor with an incorrect tail pointer.

  • Fixed lfs_rename with same source and destination a5d614f

    An uncommon operation: lfs_rename("deja/vu", "deja/vu"). This was broken and would result in two deletes to the same directory entry. Fixed, and added a test case specifically for this situation.

  • Fixed how we manage gstate deltas when we lose power during nested relocations a5d614f

    Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated.

    The issue happens when we remove a directory, have to relocate its parent, and lose power.

    When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt the littlefs gstate (gstate is an xor of all deltas on the filesystem). This used to go to the parent, but that actually isn't correct, since the threaded linked-list, not the directory tree, is used to read the gstate.

    The fix here is to take special care about where we put the gstate delta: delta depending on removes needs to go to the predecessor, while the orphan-related delta describing the remove itself needs to go to the parent.
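    The underlying invariant can be sketched numerically: since gstate is an xor of deltas, folding a removed pair's delta into any pair that stays on the threaded list preserves the total. A toy model:

```python
from functools import reduce
import operator

# Toy model: gstate is the xor of all per-pair deltas on the threaded list.
def gstate(deltas):
    return reduce(operator.xor, deltas.values(), 0)

# Removing a pair must not change the global xor, so its delta is folded
# into its predecessor, which remains on the threaded linked-list.
def remove_pair(deltas, victim, predecessor):
    deltas[predecessor] ^= deltas[victim]
    del deltas[victim]
```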

  • Fixed lfs_deorphan so it can handle more than one orphan/half-orphan during a power-loss a5d614f

    Two bugs here:

    1. lfs_deorphan couldn't handle multiple orphans in a single power-loss. We can't have multiple orphans, but we can have multiple half-orphans (where only one block address is out-of-date in the metadata pair). This happens when we have nested relocations.

    2. lfs_deorphan needs to relocate the current pair if it is a half-orphan. This is because if both a directory and its immediate successor are relocated, the tail pointer we fetch isn't valid until after we fix the half-orphan.

    Fortunately the fix here is simpler than the bug and we just need to carefully handle how lfs_deorphan scans the threaded linked-list of metadata-pairs.

  • Fixed reproducibility issue when we can't read a directory revision 517d341

    An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count.

    We correctly handle this case by ignoring what's on disk if the read fails, but we end up using uninitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However, it creates a big problem for reproducibility, which is very helpful for debugging.

    I ended up running into a case where the RAM values for the revision count were different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occurred because it expanded the superblock early.

  • Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size 517d341

    This could be caused if the previous tag was a valid commit and we lost power, leaving a partially-written tag at the start of a new commit.

    Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased.

  • Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync 517d341

    Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct.

    This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often.

    If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem.

    It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it.

  • Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan 517d341

    This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning?

    It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass.

    Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.

  • Fixed commit issue if block cannot be written to and contains valid commits 77e3078

    We can miss bad writes if the underlying block device contains a valid commit with the exact same size at the exact same offset.

    The fix here is to check the CRC on disk against the CRC we have in RAM. This avoids the issue without needing to save our entire commit in RAM.

    It could still be an issue if the disk contains a valid commit with the exact same size at the exact same offset and the exact same CRC, but this is unlikely enough to never happen.
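    As a toy model of the check (using Python's zlib.crc32 as a stand-in for littlefs's CRC):

```python
import zlib

# Toy model of the bad-write check; zlib.crc32 stands in for littlefs's CRC.
def commit_succeeded(disk_commit, ram_commit):
    # a write is only trusted if the CRC read back from disk matches the
    # CRC of what we intended to write; a stale commit of the same size
    # at the same offset would almost certainly have a different CRC
    return zlib.crc32(disk_commit) == zlib.crc32(ram_commit)
```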

  • Fixed broken wear-leveling when block_cycles = 2n-1 fe957de

    See Metadata Revision Value Incredibly Large #369

    Blocks in the metadata-pair are relocated every "block_cycles", or, more mathy, when rev % block_cycles == 0, as long as rev += 1 every block write.

    But there's a problem: rev isn't += 1 every block write. There are two blocks in a metadata-pair, so looking at it from each block's perspective, rev += 2 every block write.

    This leads to a sort of aliasing issue where, if block_cycles is divisible by 2, one block in the metadata-pair is always relocated and the other block is never relocated, causing a complete failure of block-level wear-leveling.

    Fortunately, because of a previous workaround to avoid block_cycles = 1 (since this will cause the relocation algorithm to never terminate), the actual math is rev % (block_cycles+1) == 0. This means the bug only shows its head in the much less likely case where block_cycles is a multiple of 2 plus 1, or, in more mathy terms, block_cycles = 2n+1 for some n.

    To work around this we can bitwise-or the divisor (block_cycles+1) with 1 to force it to never be a multiple of 2.
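    The aliasing is easy to demonstrate with a toy model: stepping rev by 2, an even divisor only ever triggers for one of the two parities, while an odd divisor eventually triggers for both:

```python
# Toy model of the wear-leveling aliasing: each block of a metadata-pair
# sees rev advance by 2 per write, so an even divisor only ever fires
# for one block while its sibling is never relocated.
def relocations(rev0, divisor, writes):
    # count how many of this block's revisions trigger a relocation
    return sum(1 for i in range(writes)
               if (rev0 + 2*i) % divisor == 0)

even_divisor = 4              # aliasing: only one parity ever fires
odd_divisor = even_divisor | 1  # forcing the divisor odd fixes it
```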

  • Fixed lfs_fs_size doubling metadata-pairs 6530cb3

    This was caused by the previous fix for allocations during lfs_fs_deorphan in this branch. To catch half-orphans during block allocations we needed to duplicate all metadata-pairs reported to lfs_fs_traverse. Unfortunately this causes lfs_fs_size to report 2x the number of metadata-pairs, which would undoubtedly confuse users.

    The fix here is inelegantly simple: just do a different traversal for allocations and size measurements. It reuses the same code but touches slightly different sets of blocks.

    Unfortunately, this causes the public lfs_fs_traverse and lfs_fs_size functions to diverge in how they report blocks. This is technically allowed, since lfs_fs_traverse may report blocks multiple times due to CoW behavior; however, it's undesirable and I'm sure there will be some confusion.

    But I don't have a better solution, so from this point lfs_fs_traverse will be reporting 2x metadata-blocks and shouldn't be used for finding the number of available blocks on the filesystem.

  • Fixed lfs_alloc lookahead corruption if littlefs hits a bd error during traversal

    This issue was caused by errors returned from the block device during a lookahead scan. littlefs aborts the scan to return the error, but at that point it has already cleared the lookahead buffer, putting the allocator into a state where it thinks blocks are free when they may be in use.

    The fix is to mark the lookahead buffer as empty if the allocator must abort. This will force a new lookahead scan on the next alloc call. This means a filesystem may be unable to make progress, but at least gives the user a chance to recover.

    This was found and fixed by @thrasher8390 in Lookahead corruption fix given an IO Error during traversal #401
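    A toy model of the fix (hypothetical Python names; the real fix is in C): on a scan error, the lookahead window must be marked empty rather than left half-populated:

```python
# Toy model of the lookahead fix (hypothetical names).
class Allocator:
    def __init__(self, size):
        self.size = size
        self.lookahead = [False] * size  # True = known in-use
        self.valid = False               # empty until a scan completes

    def scan(self, traverse):
        self.lookahead = [False] * self.size
        try:
            for block in traverse():     # may raise on a bd error
                self.lookahead[block] = True
        except IOError:
            # the bug: aborting with valid=True would leave a
            # half-populated window claiming in-use blocks are free;
            # marking it empty forces a fresh scan on the next alloc
            self.valid = False
            raise
        self.valid = True
```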

Related issues

WIP: I'm going through open issues to figure out what is impacted. I'm currently behind on this.

TODO

This is the start of reworking littlefs's testing framework based on
lessons learned from the initial testing framework.

1. The testing framework needs to be _flexible_. It was hacky, which by
   itself isn't a downside, but it wasn't _flexible_. This limited what
   could be done with the tests and there ended up being many
   workarounds just to reproduce bugs.

   The idea behind this revamped framework is to separate the
   description of tests (tests/test_dirs.toml) and the running of tests
   (scripts/test.py).

   Now, with the logic moved entirely to python, it's possible to run
   the test under varying environments. In addition to the "just don't
   assert" run, I'm also looking to run the tests in valgrind for memory
   checking, and an environment with simulated power-loss.

   The test description can also contain abstract attributes that help
   control how tests can be run, such as "leaky" to identify tests where
   memory leaks are expected. This keeps test limitations at a minimum
   without limiting how the tests can be run.

2. Multi-stage-process tests didn't really add value and limited what
   the testing environment could do.

   Unmounting + mounting can be done in a single process to test the
   same logic. It would be really difficult to make this fail only
   when memory is zeroed, though that can still be caught by
   power-resilient tests.

   Requiring every test to be a single process adds several options
   for test execution, such as using a RAM-backed block device for
   speed, or even running the tests on a device.

3. Added fancy assert interception. This wasn't really a requirement,
   but something I've been wanting to experiment with for a while.

   During testing, scripts/explode_asserts.py is added to the build
   process. This is a custom C-preprocessor that parses out assert
   statements and replaces them with _very_ verbose asserts that
   wouldn't normally be possible with just C macros.

   It even goes as far as to report the arguments to strcmp, since the
   lack of visibility here was very annoying.

   tests_/test_dirs.toml:186:assert: assert failed with "..", expected eq "..."
       assert(strcmp(info.name, "...") == 0);

   One downside is that simply parsing C in python is slower than the
   entire rest of the compilation, but fortunately this can be
   alleviated by parallelizing the test builds through make.

Other neat bits:
- All generated files are a suffix of the test description, this helps
  cleanup and means it's (theoretically) possible to parallelize the
  tests.
- The generated test.c is shoved base64-encoded into an ad-hoc Makefile;
  this means it doesn't force a rebuild of tests all the time.
- Test parameterizing is now easier.
- Hopefully this framework can be repurposed also for benchmarks in the
  future.
- Reworked how permutations work
  - Now with global defines as well (apply to all code)
  - Also supports lists of different permutation sets
- Added better cleanup in tests and "make clean"
Aside from reworking the internals of test_.py to work well with
inherited TestCase classes, this also provides the two main features
that were the main reason for revamping the test framework:

1. ./scripts/test_.py --reentrant

   Runs reentrant tests (tests with reentrant=true in the .toml
   configuration) under gdb such that the program is killed on every
   call to lfs_emubd_prog or lfs_emubd_erase.

   Currently this just increments a number of prog/erases to skip, which
   means it doesn't necessarily check every possible branch of the test,
   but this should still provide good coverage of power-loss tests.

2. ./scripts/test_.py --gdb

   Run the tests and if a failure is hit, drop into GDB. In theory this
   will be very useful for reproducing and debugging test failures.

   Note this can be combined with --reentrant to drop into GDB on the
   exact cycle of power-loss where the tests fail.
The idea behind emubd (file per block) was neat, but doesn't add much
value over a block device that just operates on a single linear file
(other than adding a significant amount of overhead). Initially it
helped with debugging, but when the metadata format became more complex
in v2, most debugging ends up going through the debug.py script anyways.

Aside from being simpler, moving to filebd means it is also possible to
mount disk images directly.

Also introduced rambd, which keeps the disk contents in RAM. This is
very useful for testing where it increases the speed _significantly_.
- test_dirs w/ filebd - 0m7.170s
- test_dirs w/ rambd  - 0m0.966s

These follow the emubd model of using the lfs_config for geometry. I'm
not convinced this is the best approach, but it gets the job done.

I've also added lfs_rambd_createcfg to add additional config similar to
lfs_file_opencfg. This is useful for specifying erase_value, which tells
the block device to simulate erases similar to flash devices. Note that
the default (-1) meets the minimum block device requirements and is the
most performant.
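A toy model of the erase_value behavior (hypothetical Python stand-in for the C config): erases fill the block with the configured byte, while -1 skips simulation entirely:

```python
# Toy model of erase_value simulation (hypothetical stand-in for the
# real C config). erase_value=-1 means "don't simulate erases", which
# meets the minimum block device requirements and is the fastest.
def erase(storage, block, block_size, erase_value=-1):
    if erase_value == -1:
        return  # leave contents untouched
    start = block * block_size
    storage[start:start+block_size] = bytes([erase_value] * block_size)
```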
Also some tweaks to test_.py to capture Makefile warnings and print
test locations a bit better.
This involved some minor tweaks for the various types of tests, added
predicates to the test framework (necessary for test_entries and
test_alloc), and cleaned up some of the testing semantics such as
reporting how many tests are filtered, showing permutation config on
the result screen, and properly inheriting suite config in cases.
Both test_move and test_orphan needed internal knowledge which comes
with the addition of the "in" attribute. This was in the plan for the
test-revamp from the beginning as it really opens up the ability to
write more unit-style-tests using internal knowledge of how littlefs
works. More unit-style-tests should help _fix_ bugs by limiting the
scope of the test and where the bug could be hiding.

The "in" attribute effectively runs tests _inside_ the .c file
specified, giving the test access to all static members without
needing to change their visibility.
Even with adding better reentrance testing, the bad-block tests are
still very useful at isolating the block eviction logic.

This also required rewriting a bit of the internal testing wirework
to allow custom block devices, which opens up quite a few more strategies
for testing.
Also finished migrating tests with test_relocations and test_exhaustion.

The issue I was running into when migrating these tests was a lack of
flexibility with what you could do with the block devices. It was
possible to hack in some hooks for things like bad blocks and power
loss, but it wasn't clean or easily extendable.

The solution here was to just put all of these test extensions into a
third block device, testbd, that uses the other two example block
devices internally.

testbd has several useful features for testing. Note this makes it a
pretty terrible block device _example_ since these hooks look more
complicated than a block device needs to be.

- testbd can simulate different erase values, supporting 1s, 0s, other byte
  patterns, or no erases at all (which can cause surprising bugs). This
  actually depends on the simulated erase values in rambd and filebd.

  I did try to move this out of rambd/filebd, but it's not possible to
  simulate erases in testbd without buffering entire blocks and creating
  an excessive amount of extra write operations.

- testbd also helps simulate power-loss by containing a "power cycles"
  counter that is decremented every write operation until it calls exit.

  This is notably faster than the previous gdb approach, which is
  valuable since the reentrant tests tend to take a while to resolve.

- testbd also tracks wear, which can be manually set and read. This is
  very useful for testing things like bad block handling, wear leveling,
  or even changing the effective size of the block device at runtime.
Also fixed a bug in dir splitting when there's a large number of open
files, which was the main reason I was trying to make it easier to debug
disk images.

One part of the recent test changes was to move away from the
file-per-block emubd and instead simulate storage with a single
contiguous file. The file-per-block format was marginally useful
at the beginning, but as the remaining bugs get more subtle, it
becomes more useful to inspect littlefs through scripts that
make the underlying metadata more human-readable.

The key benefit of switching to a contiguous file is these same
scripts can be reused for real disk images and can even read through
/dev/sdb or similar.

- ./scripts/readblock.py disk block_size block

  off       data
  00000000: 71 01 00 00 f0 0f ff f7 6c 69 74 74 6c 65 66 73  q.......littlefs
  00000010: 2f e0 00 10 00 00 02 00 00 02 00 00 00 04 00 00  /...............
  00000020: ff 00 00 00 ff ff ff 7f fe 03 00 00 20 00 04 19  ...............
  00000030: 61 00 00 0c 00 62 20 30 0c 09 a0 01 00 00 64 00  a....b 0......d.
  ...

  readblock.py prints a hex dump of a given block on disk. It's basically
  just "dd if=disk bs=block_size count=1 skip=block | xxd -g1 -" but with
  less typing.

- ./scripts/readmdir.py disk block_size block1 block2

  off       tag       type            id  len  data (truncated)
  0000003b: 0020000a  dir              0   10  63 6f 6c 64 63 6f 66 66 coldcoff
  00000049: 20000008  dirstruct        0    8  02 02 00 00 03 02 00 00 ........
  00000008: 00200409  dir              1    9  68 6f 74 63 6f 66 66 65 hotcoffe
  00000015: 20000408  dirstruct        1    8  fe 01 00 00 ff 01 00 00 ........

  readmdir.py prints info about the tags in a metadata pair on disk. It
  can print the currently active tags as well as the raw log of the
  metadata pair.

- ./scripts/readtree.py disk block_size

  superblock "littlefs"
    version v2.0
    block_size 512
    block_count 1024
    name_max 255
    file_max 2147483647
    attr_max 1022
  gstate 0x000000000000000000000000
  dir "/"
  mdir {0x0, 0x1} rev 3
  v id 0 superblock "littlefs" inline size 24
  mdir {0x77, 0x78} rev 1
    id 0 dir "coffee" dir {0x1fc, 0x1fd}
  dir "/coffee"
  mdir {0x1fd, 0x1fc} rev 2
    id 0 dir "coldcoffee" dir {0x202, 0x203}
    id 1 dir "hotcoffee" dir {0x1fe, 0x1ff}
  dir "/coffee/coldcoffee"
  mdir {0x202, 0x203} rev 1
  dir "/coffee/warmcoffee"
  mdir {0x200, 0x201} rev 1

  readtree.py parses the littlefs tree and prints info about the
  semantics of what's on disk. This includes the superblock,
  global-state, and directories/metadata-pairs. It doesn't print
  the filesystem tree though, that could be a different tool.
The root of the problem was some assumptions about what tags could be
sent to lfs_dir_commit.

- The first assumption is that there could be only one splice (create/delete)
  tag at a time, which is trivially broken by the core commit in lfs_rename.

- The second assumption is that there is at most one create and one delete in
  a single commit. This is less obvious but turns out to not be true in
  the case that we rename a file such that it overwrites another file in
  the same directory (1 delete for source file, 1 delete for destination).

- The third assumption was that there was an ordering to the
  delete/creates passed to lfs_dir_commit. It may be possible to force all
  deletes to follow creates by rearranging the tags in lfs_rename, but
  this risks overflowing tag ids.

The way lfs_dir_commit first collected the "deletetag" and "createtag"
broke all three of these assumptions. And because we lose the ordering
information, we can no longer apply the directory changes to open files
correctly. The file ids may be shifted in a way that doesn't reflect the
actual operations on disk.

These problems were made worse by lfs_dir_commit cleaning up moves
implicitly, which also creates deletes implicitly. While cleaning up moves
in lfs_dir_commit may save some code size, it makes the commit logic much more
difficult to implement correctly.

This bug turned into pulling out a dead tree stump, roots and all.

I ended up reworking how lfs_dir_commit updates open files so that it
makes fewer assumptions; now it just traverses the commit tags multiple
times in order to update file ids in the correct order after a
successful commit.

This also got rid of the dir copy by carefully updating split dirs
after all files have an up-to-date copy of the original dir.

I also just removed the implicit move cleanup. It turns out the only
commits that can occur before we have cleaned up the move is in
lfs_fs_relocate, so it was simple enough to explicitly handle this case
when we update our parent and pred during a relocate.

Cases where we may need to fix moves:
- In lfs_rename when we move a file/dir
- In lfs_demove if we lose power
- In lfs_fs_relocate if we have to relocate our parent and we find it
  had a pending move (or else the move will be outdated)
- In lfs_fs_relocate if we have to relocate our predecessor and we find it
  had a pending move (or else the move will be outdated)

Note the two cases in lfs_fs_relocate may be recursive. But
lfs_fs_relocate can only trigger other lfs_fs_relocates, so it's not
possible for pending moves to spill out into other filesystem commits.

And of course, I added several tests to cover these situations. Hopefully
the rename-with-open-files logic should be fairly locked down now.

found with initial fix by eastmoutain
… out

The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
  threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write

Also added a couple other tests to test_orphans and test_relocations.

The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.

1. It's actually possible for our parent to be relocated and go out of
   sync in lfs_mkdir. This can happen if our predecessor's predecessor
   is our parent as we are threading ourselves into the filesystem's
   threaded list. (note this doesn't happen if our predecessor _is_ our
   parent, as we then update our parent in a single commit).

   This is annoying because it only happens if our parent is a long (>1
   pair) directory, otherwise we wouldn't need to catch relocations.
   Fortunately we can reuse the internal open file/dir linked-list to
   catch relocations easily, as long as we're careful to unhook our
   parent whenever lfs_mkdir returns.

2. Even more surprising, it's possible for the child in lfs_remove
   to be relocated while we delete the entry from our parent. This
   can happen if we are our own parent's predecessor, since we need
   to be updated then if our parent relocates.

   Fortunately we can also hook into the open linked-list here.

   Note this same issue was present in lfs_rename.

   Fortunately, this means now all fetched dirs are hooked into the
   open linked-list if they are needed across a commit. This means
   we shouldn't need assumptions about tree movement for correctness.

3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
   was broken and tried to delete the entry twice.

4. Managing gstate deltas when we lose power during relocations was
   broken. And unfortunately complicated.

   The issue happens when we lose power during a relocation while
   removing a directory.

   When we remove a directory, we need to move the contents of its
   gstate delta to another directory or we'll corrupt littlefs gstate.
   (gstate is an xor of all deltas on the filesystem). We used to just
   xor the gstate into our parent's gstate, however this isn't correct.

   The gstate isn't built out of the directory tree, but rather out of
   the threaded linked-list (which exists to make collecting this
   gstate efficient).

   Because we have to remove our dir in two operations, there's a point
   where both the updated parent and child can exist in the threaded
   linked-list and duplicate the child's gstate delta.

     .--------.
   ->| parent |-.
     | gstate | |
   .-|   a    |-'
   | '--------'
   |     X <- child is orphaned
   | .--------.
   '>| child  |->
     | gstate |
     |   a    |
     '--------'

   What we need to do is save our child's gstate and only give it to our
   predecessor, since this finalizes the removal of the child.

   However we still need to make valid updates to the gstate to mark
   that we've created an orphan when we start removing the child.

   This led to a small rework of how the gstate is handled. Now we have
   a separation of the gpending state that should be written out ASAP
   and the gdelta state that is collected from orphans awaiting
   deletion.
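Since the collected gstate is just the xor of every delta on the threaded linked-list, the duplicate-delta problem is easy to model (illustrative sketch, not littlefs code):

```python
# Collect the global gstate as the xor of all deltas on the list.
# If the child's delta transiently appears twice (once in the updated
# parent, once in the still-linked child), it cancels itself out of the
# collected state, since a ^ a == 0, corrupting the gstate.
def collect_gstate(deltas):
    g = 0
    for d in deltas:
        g ^= d
    return g
```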

5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
   more than one orphan after a power-cycle. Having more than one orphan
   is very rare, but of course very possible. Fortunately this was just
   a mistake with using a break in the deorphan loop, perhaps left over
   from v1 where multiple orphans weren't possible?

   Note that we use a continue to force a refetch of the orphaned block.
   This is needed in the case of a half-orphan, since the fetched
   half-orphan may have an outdated tail pointer.
Normally I wouldn't consider optimizing this sort of script, but
explode_asserts.py proved to be terribly inefficient and dominated
the build time for running tests. It was slow enough to be distracting
when attempting to test patches while debugging. Just running
explode_asserts.py was ~10x slower than the rest of the compilation
process.

After implementing a proper tokenizer and switching to a handwritten
recursive descent parser, I was able to speed up explode_asserts.py
by ~5x and make test compilation much more tolerable.

I don't think this was a limitation of parsy, but rather switching
to a recursive descent parser made it much easier to find the hotspots
where parsing was wasting cycles (string slicing for one).

It's interesting to note that while the assert patterns can be parsed
with a LL(1) parser (by dumping seen tokens if a pattern fails),
I didn't bother as it's much easier to write the patterns with LL(k)
and parsing asserts is predicated by the "assert" string.
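A minimal sketch of the tokenize-then-recursive-descent approach (greatly simplified; the real explode_asserts.py handles full C expressions and rewrites them, and these class/function names are illustrative only):

```python
import re

# One regex pass produces a flat token list up front; repeatedly
# re-slicing the input string on every parse attempt was one of the
# hotspots that made the old parser slow.
TOKEN = re.compile(r'\s*(==|!=|<=|>=|[()<>]|[A-Za-z_]\w*|\d+)')

def tokenize(s):
    tokens, pos = [], 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise SyntaxError('bad token at %r' % s[pos:])
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, s):
        self.tokens = tokenize(s)
        self.i = 0

    def accept(self, *expected):
        # consume the next token, optionally checking it
        t = self.tokens[self.i]
        if expected and t not in expected:
            raise SyntaxError('expected %r, got %r' % (expected, t))
        self.i += 1
        return t

    def parse_assert(self):
        # assert ::= "assert" "(" name cmp name ")"
        self.accept('assert')
        self.accept('(')
        lh = self.accept()
        op = self.accept('==', '!=', '<=', '>=', '<', '>')
        rh = self.accept()
        self.accept(')')
        return (op, lh, rh)
```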

A few other tweaks:
- allowed combining different test modes in one run
- added a --no-internal option
- changed test_.py to start counting cases from 1
- added assert(memcmp(a, b) == 0) matching
- added better handling of string escapes in assert messages

time to run tests:
before: 1m31.122s
after:  0m41.447s
The root of the problem was the notorious Python quirk with mutable
default parameters. The default defines for the TestSuite class ended
up being mutated as the class determined the permutations to test,
corrupting other test's defines.

However, the only define that was mutated this way was the CACHE_SIZE
config in test_entries.

The crazy thing was how this small, innocuous bug would cause
"./scripts/test.py -nr test_relocations" and "./scripts/test.py -nr"
to drift out of sync only after a commit spanning the different cache
sizes was written out with a different number of prog calls. This
offset the power-cycle counter enough to cause one case to make it to
an erase, and the other to not.

Normally, the difference between a successful/unsuccessful erase
wouldn't change the result of a test, but in this case it offset the
revision count used for wear-leveling, causing one run to expand the
superblock and the other to not.

This change to the filesystem would then propagate through the rest of
the test, making it difficult to reproduce test failures.

Fortunately the fix was to just make a copy of the default define
dictionary. This should also prevent accidental mutation of dicts
belonging to our caller.
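The pitfall in miniature (BuggyTestSuite/FixedTestSuite are stripped-down stand-ins, not the real class from test.py):

```python
# Python evaluates default parameter values once, at definition time,
# so every instance created without an explicit argument shares ONE dict.
class BuggyTestSuite:
    def __init__(self, defines={}):  # bug: shared mutable default
        self.defines = defines

class FixedTestSuite:
    def __init__(self, defines=None):
        # fix: copy, which also protects dicts belonging to our caller
        self.defines = dict(defines) if defines is not None else {}
```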

Oh, also fixed a buffer overflow in test_files.
- Removed old tests and test scripts
- Reorganize the block devices to live under one directory
- Plugged new test framework into Makefile

renamed:
- scripts/test_.py -> scripts/test.py
- tests_ -> tests
- {file,ram,test}bd/* -> bd/*

It took a surprising amount of effort to make the Makefile behave since
it turns out the "test_%" rule could override "tests/test_%.toml.test"
which is generated as part of test.py.
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
  outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan

Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
  beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode

Explanation of fixes below

1. Fixed reproducibility issue when we can't read a directory revision

   An interesting subtlety of the block-device layer is that the
   block-device is allowed to return LFS_ERR_CORRUPT on reads to
   untouched blocks. This can easily happen if a user is using ECC or
   some sort of CMAC on their blocks. Normally we never run into this,
   except for the optimization around directory revisions where we use
   uninitialized data to start our revision count.

   We correctly handle this case by ignoring what's on disk if the read
   fails, but end up using uninitialized RAM instead. This is not an issue
   for normal use, though it can lead to a small information leak.
   However it creates a big problem for reproducability, which is very
   helpful for debugging.

   I ended up running into a case where the RAM values for the revision
   count were different, causing two identical runs to wear-level at
   different times, leading to one version running out of space before a
   bug occurred because it expanded the superblock early.

2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size

   This could happen if the previous tag was a valid commit and we
   lost power, leaving a partially written tag as the start of a new
   commit.

   Fortunately we already have a separate condition for exceeding the
   block size, so we can force that case to always treat the mdir as
   unerased.

3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
   outline a file in lfs_file_sync

   Most operations involving metadata-pairs treat the mdir struct as
   entirely temporary and throw it out if any error occurs. Except for
   lfs_file_sync since the mdir is also a part of the file struct.

   This is relevant because of a cleanup issue in lfs_dir_compact that
   usually doesn't have side-effects. The issue is that lfs_fs_relocate
   can fail. It needs to allocate new blocks to relocate to, and as the
   disk reaches its end of life, it can fail with ENOSPC quite often.

   If lfs_fs_relocate fails, the containing lfs_dir_compact would return
   immediately without restoring the previous state of the mdir. If a new
   commit comes in on the same mdir, the old state left there could
   corrupt the filesystem.

   It's interesting to note this is forced to happen in lfs_file_sync,
   since it always tries to outline the file if it gets ENOSPC (ENOSPC
   can mean both no blocks to allocate and that the mdir is full). I'm
   not actually sure this bit of code is necessary anymore, we may be
   able to remove it.

4. Fixed cleanup issue if we run out of space while extending a CTZ
   skip-list

   The actual CTZ skip-list logic itself hasn't been touched in more
   than a year at this point, so I was surprised to find a bug here. But
   it turns out the CTZ skip-list could be put in an invalid state if we
   run out of space while trying to extend the skip-list.

   This only becomes a problem if we keep the file open, clean up some
   space elsewhere, and then continue to write to the open file without
   modifying it. Fortunately an easy fix.
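   For reference, the shape of the skip-list as described in littlefs's
   DESIGN doc: a block at index n >= 1 holds ctz(n)+1 back-pointers, to
   blocks n-2^0 through n-2^ctz(n). A small sketch of that arithmetic
   (illustrative only; these helper names are not from lfs.c):

```python
# Count trailing zeros of n (n > 0), then list the back-pointers a
# block at index n in a CTZ skip-list would hold.
def ctz(n):
    return (n & -n).bit_length() - 1

def skiplist_pointers(n):
    if n == 0:
        return []  # the first block has no predecessors to point at
    return [n - (1 << i) for i in range(ctz(n) + 1)]
```

   Extending the list means allocating a new head block and writing all
   of its pointers, which is the window where running out of space can
   leave the structure inconsistent.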

5. Fixed missing half-orphans when allocating blocks during
   lfs_fs_deorphan

   This was a really interesting bug. Normally, we don't have to worry
   about allocations, since we force consistency before we are allowed
   to allocate blocks. But what about the deorphan operation itself?
   Don't we need to allocate blocks if we relocate while deorphaning?

   It turns out the deorphan operation can lead to allocating blocks
   while there's still orphans and half-orphans on the threaded
   linked-list. Orphans aren't an issue, but half-orphans may contain
   references to blocks in the outdated half, which doesn't get scanned
   during the normal allocation pass.

   Fortunately we already fetch directory entries to check CTZ lists, so
   we can also check half-orphans here. However this causes
   lfs_fs_traverse to report all metadata-pairs twice; I'm not sure
   what to do about this yet.
I had a system that was constantly hitting this assert; after making
this change it recovered immediately.
@@ -480,7 +475,7 @@ static int lfs_alloc(lfs_t *lfs, lfs_block_t *block) {

 // find mask of free blocks from tree
 memset(lfs->free.buffer, 0, lfs->cfg->lookahead_size);
-int err = lfs_fs_traverse(lfs, lfs_alloc_lookahead, lfs);
+int err = lfs_fs_traverseraw(lfs, lfs_alloc_lookahead, lfs, true);
 if (err) {
     return err;

Should reset .i, .size, .off in case of err; we want to make sure that we force a traverse next time lfs_alloc is called.
I'm going to make a PR related to #394 but may need help updating the testing suite as I'm not certain the best way to make that change.

…te loops

These two features have been much requested by users, and have even had
several PRs proposed to fix these in several cases. Before this, these
error conditions usually were caught by internal asserts, however
asserts prevented users from implementing their own workarounds.

It's taken me a while to provide/accept a useful recovery mechanism
(returning LFS_ERR_CORRUPT instead of asserting) because my original thinking
was that these error conditions only occur due to bugs in the filesystem, and
these bugs should be fixed properly.

While I still think this is mostly true, the point has been made clear
that being able to recover from these conditions is definitely worth the
code cost. Hopefully this new behaviour helps the longevity of devices
even if the storage code fails.

Another, less important, reason I didn't want to accept fixes for these
situations was the lack of tests that prove the code's value. This has
been fixed with the new testing framework thanks to the addition of
"internal tests" which can call C static functions and really take
advantage of the internal information of the filesystem.
Derek Thrasher and others added 4 commits March 29, 2020 14:12
…rmation so that we can reset it after a failed traversal
… v1 code since it can be risky. Let's improve the future! Also renamed and moved around the lookahead free / reset function
- Removed the declaration of lfs_alloc_ack
- Consistent brackets
…94-lookahead-buffer-corruption

Lookahead corruption fix given an IO Error during traversal
@geky geky marked this pull request as ready for review March 29, 2020 23:01
@geky geky added the next minor label and removed the needs work (nothing broken but not ready yet) label Mar 29, 2020
Block device tracing has a lot of potential uses, of course debugging,
but it can also be used for profiling and externally tracking littlefs's
usage of the block device. However, block device tracing emits a massive
amount of output. So keeping block device tracing on by default limits
the usefulness of the filesystem tracing.

So, instead, I've moved the block device tracing into a separate
LFS_TESTBD_YES_TRACE define which switches on the LFS_TESTBD_TRACE
macro. Note that this means in order to get block device tracing, you
need to define both LFS_YES_TRACE and LFS_TESTBD_YES_TRACE. This is
needed as the LFS_TRACE definition is gated by LFS_YES_TRACE in
lfs_util.h.
- Standardized littlefs debug statements to use hex prefixes and
  brackets for printing pairs.

- Removed the entry behavior for readtree and made -t the default.
  This is because 1. the CTZ skip-list parsing was broken, which is not
  surprising, and 2. the entry parsing was more complicated than useful.
  This functionality may be better implemented as a proper filesystem
  read script, complete with directory tree dumping.

- Changed test.py's --gdb argument to take [init, main, assert],
  this matches the names of the stages in C's startup.

- Added printing of tail to all mdir dumps in readtree/readmdir.

- Added a print for if any mdirs are corrupted in readtree.

- Added debug script side-effects to .gitignore.

geky commented Mar 30, 2020

Ok, I believe it's time to bring this PR in. There's still a number of open issues that would be nice to fix (specifically non-DAG trees and #379), but this PR has been open for long enough already and additional fixes can hopefully come in on later patch releases.

v2.2 will include this PR along with any open PRs tagged v2.2

There's going to be a delay as we wait for CI (I've lost permissions to kill unnecessary Travis jobs and need to sort that out), but hopefully the v2.2 release will be done in the next day or two.

@geky geky merged commit a049f13 into master Mar 31, 2020