Skip to content

Commit

Permalink
WriteRename: Create backup of original file
Browse files Browse the repository at this point in the history
To make sure the updates on important files are performed correctly,
we are taking the following steps:

1. Create temporary file next to the file we want to update and Write
the new context to the temp file and call fsync on the descriptor, and
close the descriptor

2. Execute `rename` syscall to move the tempfile -> original_file

The only step missing here is:

3. Call `fsync` on the directory containing the file.

Theoretically `rename` should guarantee deleting the original file and
renaming the temp file as one atomic operation. But we still observe that
the original file is some time missing after a power cut, or crash. It
can be that we are forcing a weird corner case in ext4 code because of
this missing step 3, and because of `rename` is closely followed by
`unlink`, but we do not have any proof yet.

To address this behavior, this patch changes the workflow to the
following:

1. Create temp file and write new context, call fsync and close

2. Create a copy of the original file as "orig_file.bac"

3. Call fsync on the directory where the original file located

4. Rename temp_file -> orig_file

5. Call fsync on the directory where the original file located

This way, whenever the crash happens, we must have at least the *.bac
file.

This patch introduces the safety net. We would still need to teach the
consumers of these files to reach for the *.bac file in case the
original file is missing.

A positive side effect of this feature - postmortem analysis of a
broken node. Or a leverage for manual recovery after configuration bug.

Signed-off-by: Yuri Volchkov <yuri@zededa.com>
  • Loading branch information
zededa-yuri authored and eriknordmark committed May 31, 2022
1 parent ec68fa7 commit 3b29da6
Showing 1 changed file with 53 additions and 2 deletions.
55 changes: 53 additions & 2 deletions pkg/pillar/utils/file/file.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,49 @@ import (
"time"

"github.com/lf-edge/eve/pkg/pillar/base"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
)

const maxCounterReadSize = 16384 // Max size of counter file

func dirSync(dirName string) error {
f, err := os.OpenFile(dirName, os.O_RDONLY, 0755)
if err != nil {
return err
}

err = f.Sync()
if err != nil {
f.Close()
return err
}

// Not a deferred call, because dirSync is a critical
// path. Better safe then sorry, and we better check all the
// errors including one returned by close()
err = f.Close()
return err
}

func backupFile(fileName string) error {
_, err := os.Stat(fileName)
if err != nil {
// File doesn't exist
return err
}

bakName := fmt.Sprintf("%s.bak", fileName)

err = CopyFile(fileName, bakName)
if err != nil {
return err
}

err = dirSync(filepath.Dir(fileName))
return err
}

// WriteRename write data to a fmpfile and then rename it to a desired name
func WriteRename(fileName string, b []byte) error {
dirName := filepath.Dir(fileName)
Expand All @@ -38,18 +76,31 @@ func WriteRename(fileName string, b []byte) error {
return errors.New(errStr)
}
// Make sure the file is flused from buffers onto the disk
tmpfile.Sync()
if err := tmpfile.Sync(); err != nil {
errStr := fmt.Sprintf("WriteRename(%s) failed to sync temp file: %s",
fileName, err)
return errors.New(errStr)
}

if err := tmpfile.Close(); err != nil {
errStr := fmt.Sprintf("WriteRename(%s): %s",
fileName, err)
return errors.New(errStr)
}

err = backupFile(fileName)
if err != nil {
// Not a fatal error, continuing
logrus.Errorf("Unable to backup file %s: %v", fileName, err)
}

if err := os.Rename(tmpfile.Name(), fileName); err != nil {
errStr := fmt.Sprintf("writeRename(%s): %s",
fileName, err)
return errors.New(errStr)
}
return nil

return dirSync(filepath.Dir(fileName))
}

// Writable checks if the directory is writable
Expand Down

0 comments on commit 3b29da6

Please sign in to comment.