2.4.1 Release Changes (#1587)
* feat: support workload identity token (#1556)

* feat: support workload identity token

* Create block pool only once in child process (#1581)

* create block pool in child only

* Update golang.org/x/crypto to v0.31.0 (#1594)

* Update golang.org/x/crypto to v0.31.0

* sync with main (#1603)

* updated year in copyright message (#1601)

* Use ListBlob for HNS accounts (#1555)

* Optimize HNS listing

* Added statfs for block-cache (#1470)

* Added statfs for block_cache

* Add strong consistency check for data on disk (#1604)

* Add strong consistency check for data on disk

* bug in block cache open call (#1580)

* The current implementation of open truncates the file to zero when it is opened in O_WRONLY mode. This is incorrect behaviour.
We do not see it in the normal scenario because the write-back cache is on by default, so all open calls with O_WRONLY are redirected to O_RDWR.
To reproduce the issue, turn off the write-back cache and then open a file in O_WRONLY.

* Feature: Blob filter (#1595)

* Integrating blob filter in azstorage

* Serve getAttr call for destination file after the Copy finishes from the cache

* Cleanup on start shall be set to cleanup temp cache (#1613)

* Add Tests

* Refactor the code and refresh the cache after copying the attributes

* Automate blobfuse2 setup for new VM (#1575)

Added a script for blobfuse setup and azsecpack setup on a new VM.

* Update the unit tests.
* Refactor the code.

* Update Changelog

* do go fmt on src

* Downgrade go version to 1.22.7 due to memory issues in 1.23 (#1619)

* Enable ETAG based validation on every block download to provide higher consistency (#1608)

* Make etag validation a default option

* BUG#31069208: Fixed prefix filtering from file path (#1618)

* Fixed the logic to filter out folder prefix from path
* Added/Updated/Removed test case

---------

Co-authored-by: weizhi <weizhichen@microsoft.com>
Co-authored-by: Sourav Gupta <98318303+souravgupta-msft@users.noreply.github.com>
Co-authored-by: Jan Jagusch <77677602+JanJaguschQC@users.noreply.github.com>
Co-authored-by: ashruti-msft <137055338+ashruti-msft@users.noreply.github.com>
Co-authored-by: syeleti-msft <syeleti@microsoft.com>
Co-authored-by: jainakanksha-msft <jainakanksha@microsoft.com>
7 people authored Feb 3, 2025
1 parent 32ba9cc commit ea3f10f
Showing 40 changed files with 1,625 additions and 522 deletions.
19 changes: 19 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,21 @@

## 2.4.1 (Unreleased)
**Bug Fixes**
- Create block pool only in the child process.
- Prevent block-cache from truncating the file size to zero when a file is opened in O_WRONLY mode and the writeback cache is disabled.
- Correct statFS results to reflect block-cache in memory cache status.
- Do not wipe out the temp-cache on start after an ungraceful unmount if `cleanup-on-start` is not configured in file-cache.
- When a subdirectory is mounted and a file/folder operation occurs, remove only the subdirectory path from the file paths.

**Other Changes**
- Optimized listing operation on HNS accounts to support symlinks.
- Optimized Rename operation to make fewer REST calls.

**Features**
- Mount a container or directory while restricting which blobs are visible. This feature is available only for read-only mounts.
- To protect against accidental overwrites of data stored by block-cache on the temp path, a CRC64 hash is validated on read. This feature can be enabled using the `--block-cache-strong-consistency` CLI flag.
- To provide a strong consistency check, the ETag of the file is preserved on open. For any subsequent block download with block-cache, the ETag is verified; if the blob has changed in the container, the download is declared a failure, resulting in a read failure.

## 2.4.0 (2024-12-03)
**Features**
- Added 'gen-config' command to auto generate the recommended blobfuse2 config file based on computing resources and memory available on the node. Command details can be found with `blobfuse2 gen-config --help`.
@@ -16,6 +34,7 @@
- `Stream` option automatically replaced with "Stream with Block-cache" internally for optimized performance.
- Login via Managed Identity is supported with Object-ID for all versions of blobfuse except 2.3.0 and 2.3.2. To use Object-ID with these two versions, use AzCLI, or use Application/Client-ID or Resource-ID based authentication.
- Version check is now moved to a static website hosted on a public container.
- 'df' command output will report memory availability for block-cache when disk is not configured.

## 2.3.2 (2024-09-03)
**Bug Fixes**
Expand Down
2 changes: 1 addition & 1 deletion MIGRATION.md
@@ -99,7 +99,7 @@ Note: Blobfuse2 accepts all CLI parameters that Blobfuse does, but may ignore pa
| --log-level=LOG_WARNING | --log-level=LOG_WARNING | logging.level | |
| --use-attr-cache=true | --use-attr-cache=true | attr_cache | Add attr_cache to the components list |
| --use-adls=false | --use-adls=false | azstorage.type | Specify either 'block' or 'adls' |
| --no-symlinks=false | --no-symlinks=true | attr_cache.no-symlinks | |
| --no-symlinks=false | --no-symlinks=false | attr_cache.no-symlinks | |
| --cache-on-list=true | --cache-on-list=true | attr_cache.no-cache-on-list | This parameter has the opposite boolean semantics |
| --upload-modified-only=true | --upload-modified-only=true | | Always on in blobfuse2 |
| --max-concurrency=12 | --max-concurrency=12 | azstorage.max-concurrency | |
33 changes: 33 additions & 0 deletions NOTICE
@@ -4093,4 +4093,37 @@ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.






****************************************************************************

============================================================================
>>> github.com/vibhansa-msft/blobfilter
==============================================================================

MIT License

Copyright (c) 2024 Vikas Bhansali

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


--------------------- END OF THIRD PARTY NOTICE --------------------------------
32 changes: 31 additions & 1 deletion README.md
@@ -66,6 +66,7 @@ One of the biggest BlobFuse2 features is our brand new health monitor. It allows
- Set MD5 sum of a blob while uploading
- Validate MD5 sum on download and fail file open on mismatch
- Large file writing through write Block-Cache
- Blob filter to view only files matching given criteria for read-only mount

## Blobfuse2 performance compared to blobfuse(v1.x.x)
- 'git clone' operation is 25% faster (tested with vscode repo cloning)
@@ -139,9 +140,10 @@ To learn about a specific command, just include the name of the command (For exa
* `--wait-for-mount=<TIMEOUT IN SECONDS>` : Let parent process wait for given timeout before exit to ensure child has started.
* `--block-cache` : To enable block-cache instead of file-cache. This works only when mounted without any config file.
* `--lazy-write` : To enable async close file handle call and schedule the upload in background.
* `--filter=<STRING>`: Enable blob filters on a read-only mount to restrict which blobs the user can see or read.
- Attribute cache options
* `--attr-cache-timeout=<TIMEOUT IN SECONDS>`: The timeout for the attribute cache entries.
* `--no-symlinks=true`: To improve performance disable symlink support.
* `--no-symlinks=false`: By default symlinks are supported; the performance overhead that existed earlier has been resolved.
- Storage options
* `--container-name=<CONTAINER NAME>`: The container to mount.
* `--cancel-list-on-mount-seconds=<TIMEOUT IN SECONDS>`: Time for which list calls will be blocked after mount (prevents billing charges on mounting).
@@ -166,6 +168,8 @@ To learn about a specific command, just include the name of the command (For exa
* `--block-cache-prefetch=<Number of blocks>`: Number of blocks to prefetch at max when sequential reads are in progress. Default - 2 times number of CPU cores.
* `--block-cache-parallelism=<count>`: Number of parallel threads doing upload/download operation. Default - 3 times number of CPU cores.
* `--block-cache-prefetch-on-open=true`: Start prefetching on open system call instead of waiting for first read. Enhances perf if file is read sequentially from offset 0.
* `--block-cache-strong-consistency=true`: Enable strong data consistency checks in block-cache. This will increase load on your CPU and may introduce some latency.
This requires `xattr` support on your system. Install it manually before using this CLI parameter.
- Fuse options
* `--attr-timeout=<TIMEOUT IN SECONDS>`: Time the kernel can cache inode attributes.
* `--entry-timeout=<TIMEOUT IN SECONDS>`: Time the kernel can cache directory listing.
@@ -235,6 +239,32 @@ Below diagrams guide you to choose right configuration for your workloads.
- [Sample Block-Cache Config](./sampleBlockCacheConfig.yaml)
- [All Config options](./setup/baseConfig.yaml)

## Blob Filter
- For a read-only mount, the user can configure a filter to restrict which blobs the mount can see or operate on.
- Blobfuse supports filters based on
- Name
- Size
- Last modified time
- File extension
- Blob Name based filter
- Supported operations are "=" and "!="
- Name shall be a valid regular expression
- e.g. ```filter=name=^mine[0-1]\\d{3}.*```
- Size based filter
- Supported operations are "<=", ">=", "!=", "<", ">" and "="
- Size shall be provided in bytes
- e.g. ```filter=size > 1000```
- Last Modified Date based filter
- Supported operations are "<=", ">=", "<", ">" and "="
- Date shall be provided in RFC1123 Format e.g. "Mon, 24 Jan 1982 13:00:00 UTC"
- e.g. ```filter=modtime>Mon, 24 Jan 1982 13:00:00 UTC```
- File Extension based filter
- Supported operations are "=" and "!="
- Extension can be supplied as string. Do not include "." in the filter
- e.g. ```--filter=format=pdf```
- Multiple filters can be combined using the '&&' and '||' operators; however, precedence grouping with '()' is not supported yet.
- e.g. ```--filter=name=^testfil.* && size>130000000```


## Frequently Asked Questions
- How do I generate a SAS with permissions for rename?
4 changes: 4 additions & 0 deletions azure-pipeline-templates/huge-list-test.yml
@@ -55,6 +55,10 @@ steps:
env:
mount_dir: ${{ parameters.mount_dir }}

- script: grep "OUTGOING REQUEST" blobfuse2-logs.txt | wc -l
displayName: 'HugeList: ${{ parameters.idstring }} Request Count'
continueOnError: true

- script: |
cat blobfuse2-logs.txt
displayName: 'View Logs'
2 changes: 1 addition & 1 deletion cmd/mount.go
@@ -127,7 +127,7 @@ func (opt *mountOptions) validate(skipNonEmptyMount bool) error {
var cleanupOnStart bool
_ = config.UnmarshalKey("file_cache.cleanup-on-start", &cleanupOnStart)

if tempCachePath != "" && !cleanupOnStart {
if tempCachePath != "" && cleanupOnStart {
if err = common.TempCacheCleanup(tempCachePath); err != nil {
return fmt.Errorf("failed to cleanup file cache [%s]", err.Error())
}
2 changes: 1 addition & 1 deletion common/types.go
@@ -47,7 +47,7 @@ import (

// Standard config default values
const (
blobfuse2Version_ = "2.4.0"
blobfuse2Version_ = "2.4.1"

DefaultMaxLogFileSize = 512
DefaultLogFileCount = 10
13 changes: 13 additions & 0 deletions common/util.go
@@ -39,7 +39,9 @@ import (
"crypto/aes"
"crypto/cipher"
"crypto/rand"
"encoding/binary"
"fmt"
"hash/crc64"
"io"
"os"
"os/exec"
@@ -500,3 +502,14 @@ func WriteToFile(filename string, data string, options WriteToFileOptions) error

return nil
}

func GetCRC64(data []byte, len int) []byte {
// Create a CRC64 hash using the ECMA polynomial
crc64Table := crc64.MakeTable(crc64.ECMA)
checksum := crc64.Checksum(data[:len], crc64Table)

checksumBytes := make([]byte, 8)
binary.BigEndian.PutUint64(checksumBytes, checksum)

return checksumBytes
}
10 changes: 10 additions & 0 deletions common/util_test.go
@@ -363,6 +363,16 @@ func (suite *utilTestSuite) TestWriteToFile() {

}

func (suite *utilTestSuite) TestCRC64() {
data := []byte("Hello World")
crc := GetCRC64(data, len(data))

data = []byte("Hello World!")
crc1 := GetCRC64(data, len(data))

suite.assert.NotEqual(crc, crc1)
}

func (suite *utilTestSuite) TestGetFuseMinorVersion() {
i := GetFuseMinorVersion()
suite.assert.GreaterOrEqual(i, 0)
22 changes: 19 additions & 3 deletions component/attr_cache/attr_cache.go
@@ -223,6 +223,21 @@ func (ac *AttrCache) invalidateDirectory(path string) {
ac.invalidatePath(path)
}

// Copies the attr to the given path.
func (ac *AttrCache) updateCacheEntry(path string, attr *internal.ObjAttr) {
cacheEntry, found := ac.cacheMap[path]
if found {
// Copy the attr
cacheEntry.attr = attr
// Update the path inside the attr
cacheEntry.attr.Path = path
// Update the Existence of the entry
cacheEntry.attrFlag.Set(AttrFlagExists)
// Refresh the cache entry
cacheEntry.cachedAt = time.Now()
}
}

// invalidatePath: invalidates a path
func (ac *AttrCache) invalidatePath(path string) {
// Keys in the cache map do not contain trailing /, truncate the path before referencing a key in the map.
@@ -360,14 +375,15 @@ func (ac *AttrCache) DeleteFile(options internal.DeleteFileOptions) error {
// RenameFile : Mark the source file deleted. Copy the source attributes to the destination, then invalidate it.
func (ac *AttrCache) RenameFile(options internal.RenameFileOptions) error {
log.Trace("AttrCache::RenameFile : %s -> %s", options.Src, options.Dst)

srcAttr := options.SrcAttr
err := ac.NextComponent().RenameFile(options)
if err == nil {
// Copy source attribute to destination.
// LMT of Source will be modified by next component if the copy is success.
ac.cacheLock.RLock()
defer ac.cacheLock.RUnlock()

ac.updateCacheEntry(options.Dst, srcAttr)
ac.deletePath(options.Src, time.Now())
ac.invalidatePath(options.Dst)
}

return err
49 changes: 47 additions & 2 deletions component/attr_cache/attr_cache_test.go
@@ -122,6 +122,25 @@ func assertUntouched(suite *attrCacheTestSuite, path string) {
suite.assert.True(suite.attrCache.cacheMap[path].exists())
}

// This method is used when we transfer the attributes from the src to dst, and mark src as invalid
func assertAttributesTransferred(suite *attrCacheTestSuite, srcAttr *internal.ObjAttr, dstAttr *internal.ObjAttr) {
suite.assert.EqualValues(srcAttr.Size, dstAttr.Size)
suite.assert.EqualValues(srcAttr.Path, dstAttr.Path)
suite.assert.EqualValues(srcAttr.Mode, dstAttr.Mode)
suite.assert.EqualValues(srcAttr.Atime, dstAttr.Atime)
suite.assert.EqualValues(srcAttr.Mtime, dstAttr.Mtime)
suite.assert.EqualValues(srcAttr.Ctime, dstAttr.Ctime)
suite.assert.True(suite.attrCache.cacheMap[dstAttr.Path].exists())
suite.assert.True(suite.attrCache.cacheMap[dstAttr.Path].valid())
}

// If next component changes the times of the attribute.
func assertSrcAttributeTimeChanged(suite *attrCacheTestSuite, srcAttr *internal.ObjAttr, srcAttrCopy internal.ObjAttr) {
suite.assert.NotEqualValues(srcAttr.Atime, srcAttrCopy.Atime)
suite.assert.NotEqualValues(srcAttr.Mtime, srcAttrCopy.Mtime)
suite.assert.NotEqualValues(srcAttr.Ctime, srcAttrCopy.Ctime)
}

// Directory structure
// a/
//
@@ -676,15 +695,41 @@ func (suite *attrCacheTestSuite) TestRenameFile() {
suite.assert.NotContains(suite.attrCache.cacheMap, src)
suite.assert.NotContains(suite.attrCache.cacheMap, dst)

// Entry Already Exists
// Src, Dst Entry Already Exists
addPathToCache(suite.assert, suite.attrCache, src, false)
addPathToCache(suite.assert, suite.attrCache, dst, false)
options.SrcAttr = suite.attrCache.cacheMap[src].attr
options.SrcAttr.Size = 1
options.SrcAttr.Mode = 2
options.DstAttr = suite.attrCache.cacheMap[dst].attr
options.DstAttr.Size = 3
options.DstAttr.Mode = 4
srcAttrCopy := *options.SrcAttr

suite.mock.EXPECT().RenameFile(options).Return(nil)
err = suite.attrCache.RenameFile(options)
suite.assert.Nil(err)
assertDeleted(suite, src)
modifiedDstAttr := suite.attrCache.cacheMap[dst].attr
assertSrcAttributeTimeChanged(suite, options.SrcAttr, srcAttrCopy)
// Check the attributes of the dst are same as the src.
assertAttributesTransferred(suite, options.SrcAttr, modifiedDstAttr)

// Src Entry Exist and Dst Entry Don't Exist
addPathToCache(suite.assert, suite.attrCache, src, false)
// Add negative entry to cache for Dst
suite.attrCache.cacheMap[dst] = newAttrCacheItem(&internal.ObjAttr{}, false, time.Now())
options.SrcAttr = suite.attrCache.cacheMap[src].attr
options.DstAttr = suite.attrCache.cacheMap[dst].attr
options.SrcAttr.Size = 1
options.SrcAttr.Mode = 2
suite.mock.EXPECT().RenameFile(options).Return(nil)
err = suite.attrCache.RenameFile(options)
suite.assert.Nil(err)
assertDeleted(suite, src)
assertInvalid(suite, dst)
modifiedDstAttr = suite.attrCache.cacheMap[dst].attr
assertSrcAttributeTimeChanged(suite, options.SrcAttr, srcAttrCopy)
assertAttributesTransferred(suite, options.SrcAttr, modifiedDstAttr)
}

// Tests Write File
1 change: 1 addition & 0 deletions component/azstorage/azauth.go
@@ -63,6 +63,7 @@ type azAuthConfig struct {
ClientID string
ClientSecret string
OAuthTokenFilePath string
WorkloadIdentityToken string
ActiveDirectoryEndpoint string

Endpoint string
16 changes: 16 additions & 0 deletions component/azstorage/azauthspn.go
@@ -34,6 +34,8 @@
package azstorage

import (
"context"

"github.com/Azure/azure-sdk-for-go/sdk/azcore"
"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/service"
@@ -69,6 +71,20 @@ func (azspn *azAuthSPN) getTokenCredential() (azcore.TokenCredential, error) {
log.Err("AzAuthSPN::getTokenCredential : Failed to generate token for SPN [%s]", err.Error())
return nil, err
}
} else if azspn.config.WorkloadIdentityToken != "" {
log.Trace("AzAuthSPN::getTokenCredential : Going for federated token flow")

cred, err = azidentity.NewClientAssertionCredential(
azspn.config.TenantID,
azspn.config.ClientID,
func(ctx context.Context) (string, error) {
return azspn.config.WorkloadIdentityToken, nil
},
&azidentity.ClientAssertionCredentialOptions{})
if err != nil {
log.Err("AzAuthSPN::getTokenCredential : Failed to generate token for SPN [%s]", err.Error())
return nil, err
}
} else {
log.Trace("AzAuthSPN::getTokenCredential : Using client secret for fetching token")

10 changes: 7 additions & 3 deletions component/azstorage/azstorage.go
@@ -423,7 +423,7 @@ func (az *AzStorage) DeleteFile(options internal.DeleteFileOptions) error {
func (az *AzStorage) RenameFile(options internal.RenameFileOptions) error {
log.Trace("AzStorage::RenameFile : %s to %s", options.Src, options.Dst)

err := az.storage.RenameFile(options.Src, options.Dst)
err := az.storage.RenameFile(options.Src, options.Dst, options.SrcAttr)

if err == nil {
azStatsCollector.PushEvents(renameFile, options.Src, map[string]interface{}{src: options.Src, dest: options.Dst})
@@ -453,7 +453,8 @@ func (az *AzStorage) ReadInBuffer(options internal.ReadInBufferOptions) (length
return 0, nil
}

err = az.storage.ReadInBuffer(options.Handle.Path, options.Offset, dataLen, options.Data)
err = az.storage.ReadInBuffer(options.Handle.Path, options.Offset, dataLen, options.Data, options.Etag)

if err != nil {
log.Err("AzStorage::ReadInBuffer : Failed to read %s [%s]", options.Handle.Path, err.Error())
}
@@ -555,7 +556,7 @@ func (az *AzStorage) StageData(opt internal.StageDataOptions) error {
}

func (az *AzStorage) CommitData(opt internal.CommitDataOptions) error {
return az.storage.CommitBlocks(opt.Name, opt.List)
return az.storage.CommitBlocks(opt.Name, opt.List, opt.NewETag)
}

// TODO : Below methods are pending to be implemented
@@ -665,6 +666,9 @@ func init() {
preserveACL := config.AddBoolFlag("preserve-acl", false, "Preserve ACL and Permissions set on file during updates")
config.BindPFlag(compName+".preserve-acl", preserveACL)

blobFilter := config.AddStringFlag("filter", "", "Filter string to match blobs")
config.BindPFlag(compName+".filter", blobFilter)

config.RegisterFlagCompletionFunc("container-name", func(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) {
return nil, cobra.ShellCompDirectiveNoFileComp
})
