8 changes: 8 additions & 0 deletions g3doc/user_guide/BUILD
@@ -49,6 +49,14 @@ doc(
weight = "40",
)

doc(
name = "fuse",
src = "fuse.md",
category = "User Guide",
permalink = "/docs/user_guide/fuse/",
weight = "41",
)

doc(
name = "networking",
src = "networking.md",
136 changes: 136 additions & 0 deletions g3doc/user_guide/fuse.md
@@ -0,0 +1,136 @@
# FUSE

[TOC]

gVisor supports [FUSE] (Filesystem in Userspace), allowing userspace programs to
serve filesystems inside a sandbox. There are two modes of operation:

* **In-sandbox FUSE**: A FUSE daemon runs inside the sandbox and communicates
with the gVisor kernel via `/dev/fuse`. This is the standard FUSE model.
* **External FUSE server**: A FUSE server runs on the host, outside the
sandbox, and communicates with gVisor over a socketpair passed into the
sandbox as a host file descriptor. This is useful when the filesystem
implementation must access resources that are not available inside the
sandbox.

## External FUSE Server

The external FUSE server feature allows a host-side process to serve a FUSE
filesystem into a gVisor sandbox. The host process and the sandbox communicate
over a Unix socketpair using the standard FUSE protocol. This approach avoids
the performance penalty incurred by context switching through the I/O proxy
mechanism that's otherwise used to expose host filesystems.

### How It Works

1. The host creates a Unix socketpair (`SOCK_SEQPACKET`).
2. One end of the socketpair is passed into the sandbox using the `--pass-fd`
flag on `runsc run` or `runsc create`.
3. The other end is given to a FUSE server process running on the host.
4. Inside the sandbox, the application mounts a FUSE filesystem using the
passed file descriptor.
5. All FUSE operations (read, write, lookup, etc.) are forwarded over the
socketpair to the host FUSE server, which performs the actual I/O.
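
The transport in step 1 can be tried in isolation. The following is a minimal sketch, not gVisor code: the helper name `roundTrip` is hypothetical, and it uses the standard library `syscall` package on Linux. It shows the property that makes `SOCK_SEQPACKET` a natural fit for FUSE framing: each write is delivered as one message, so each read returns exactly one request or reply.

```go
package main

import (
	"fmt"
	"syscall"
)

// roundTrip (hypothetical helper) creates an AF_UNIX SOCK_SEQPACKET
// socketpair, writes each message on one end, and reads them back on the
// other end with one read per message.
func roundTrip(msgs [][]byte) ([][]byte, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_SEQPACKET, 0)
	if err != nil {
		return nil, fmt.Errorf("socketpair: %w", err)
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	for _, m := range msgs {
		if _, err := syscall.Write(fds[0], m); err != nil {
			return nil, fmt.Errorf("write: %w", err)
		}
	}

	out := make([][]byte, 0, len(msgs))
	buf := make([]byte, 4096)
	for range msgs {
		// A SOCK_SEQPACKET read returns a single message, never a
		// concatenation of several.
		n, err := syscall.Read(fds[1], buf)
		if err != nil {
			return nil, fmt.Errorf("read: %w", err)
		}
		out = append(out, append([]byte(nil), buf[:n]...))
	}
	return out, nil
}

func main() {
	got, err := roundTrip([][]byte{[]byte("first"), []byte("second request")})
	if err != nil {
		panic(err)
	}
	for i, m := range got {
		fmt.Printf("message %d: %d bytes\n", i, len(m))
	}
}
```

In a real setup the two ends live in different processes: one is handed to `runsc` and the other to the FUSE server.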

### Setup

#### 1. Create the socketpair and start the FUSE server

The host process creates a socketpair and starts the FUSE server with one end:

```bash
# Example: the launching process has already created the socketpair, and the
# FUSE server inherits one end as FD 4.
# The FUSE server reads FUSE requests from its FD and responds with
# the standard FUSE protocol (FUSEHeaderIn/Out framing).
./my_fuse_server --fd=4 --backing-dir=/data/shared
```

The FUSE server must implement the FUSE kernel protocol: it reads
`FUSEHeaderIn`-framed requests and writes `FUSEHeaderOut`-framed responses. At
minimum, it should handle `FUSE_INIT`, `FUSE_GETATTR`, `FUSE_LOOKUP`,
`FUSE_OPEN`, `FUSE_READ`, `FUSE_RELEASE`, and `FUSE_ACCESS`. Additional opcodes
like `FUSE_WRITE`, `FUSE_FLUSH`, `FUSE_STATFS`, and `FUSE_CREATE` can be added
as needed.
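
To make the framing concrete, here is a rough sketch of decoding a request header and building a header-only error reply. It assumes the field layout of Linux's `struct fuse_in_header` (40 bytes) and `struct fuse_out_header` (16 bytes) and a little-endian host. The type and function names (`inHeader`, `parseInHeader`, `errorReply`) are illustrative and are not part of gVisor or any FUSE library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	inHeaderLen  = 40 // size of Linux's struct fuse_in_header
	outHeaderLen = 16 // size of Linux's struct fuse_out_header
	opAccess     = 34 // FUSE_ACCESS
	enosys       = 38 // ENOSYS on Linux
)

// Assumption: both ends run on a little-endian machine (x86-64, arm64).
var le = binary.LittleEndian

// inHeader holds the fields of a FUSE request header (FUSEHeaderIn).
type inHeader struct {
	Len    uint32 // total message length, header included
	Opcode uint32
	Unique uint64 // request ID, echoed back in the reply
	NodeID uint64
	UID    uint32
	GID    uint32
	PID    uint32
}

// encode serializes h into the 40-byte wire layout; the last 4 bytes are padding.
func (h inHeader) encode() []byte {
	b := make([]byte, inHeaderLen)
	le.PutUint32(b[0:], h.Len)
	le.PutUint32(b[4:], h.Opcode)
	le.PutUint64(b[8:], h.Unique)
	le.PutUint64(b[16:], h.NodeID)
	le.PutUint32(b[24:], h.UID)
	le.PutUint32(b[28:], h.GID)
	le.PutUint32(b[32:], h.PID)
	return b
}

// parseInHeader decodes the header at the start of one request message.
func parseInHeader(msg []byte) (inHeader, error) {
	if len(msg) < inHeaderLen {
		return inHeader{}, errors.New("short FUSE request")
	}
	h := inHeader{
		Len:    le.Uint32(msg[0:]),
		Opcode: le.Uint32(msg[4:]),
		Unique: le.Uint64(msg[8:]),
		NodeID: le.Uint64(msg[16:]),
		UID:    le.Uint32(msg[24:]),
		GID:    le.Uint32(msg[28:]),
		PID:    le.Uint32(msg[32:]),
	}
	if int(h.Len) < inHeaderLen || int(h.Len) > len(msg) {
		return inHeader{}, fmt.Errorf("bad length %d for %d-byte message", h.Len, len(msg))
	}
	return h, nil
}

// errorReply builds a header-only reply (FUSEHeaderOut) carrying -errno.
func errorReply(unique uint64, errno int32) []byte {
	b := make([]byte, outHeaderLen)
	le.PutUint32(b[0:], outHeaderLen)
	le.PutUint32(b[4:], uint32(-errno))
	le.PutUint64(b[8:], unique)
	return b
}

func main() {
	// A sample FUSE_ACCESS request: header followed by an 8-byte body.
	req := append(inHeader{Len: inHeaderLen + 8, Opcode: opAccess, Unique: 7, NodeID: 1}.encode(), make([]byte, 8)...)
	h, err := parseInHeader(req)
	if err != nil {
		panic(err)
	}
	fmt.Printf("opcode=%d unique=%d\n", h.Opcode, h.Unique)
	fmt.Printf("reply=% x\n", errorReply(h.Unique, enosys))
}
```

A server loop would read one message, dispatch on `Opcode`, and write back a reply whose `Unique` matches the request. Replying with `-ENOSYS` is one common way to signal an unimplemented opcode.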

#### 2. Pass the FD into the sandbox

Use the `--pass-fd` flag to map the host-side socketpair FD into the sandbox:

```bash
runsc run \
--pass-fd=3:100 \
--bundle=/path/to/bundle \
my-container
```

The format is `--pass-fd=HOST_FD:GUEST_FD`. In this example, host FD 3 becomes
FD 100 inside the sandbox. The `--pass-fd` flag can be specified multiple times
to pass additional file descriptors.
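
When `runsc` is launched from Go, the host FD numbers follow from how `os/exec` hands files to a child: `ExtraFiles[i]` becomes FD `3+i`. The sketch below builds matching `--pass-fd` arguments under that assumption. `passFDArgs` is a hypothetical helper, not part of runsc.

```go
package main

import "fmt"

// passFDArgs (hypothetical helper) returns one --pass-fd flag per guest FD,
// assuming the matching files are placed in exec.Cmd.ExtraFiles in the same
// order. os/exec maps ExtraFiles[i] to FD 3+i in the child process.
func passFDArgs(guestFDs []int) []string {
	args := make([]string, 0, len(guestFDs))
	for i, guest := range guestFDs {
		args = append(args, fmt.Sprintf("--pass-fd=%d:%d", 3+i, guest))
	}
	return args
}

func main() {
	for _, arg := range passFDArgs([]int{100, 101}) {
		fmt.Println(arg)
	}
}
```

Deriving the host side from the slice index keeps the flag values and the `ExtraFiles` order from drifting apart when more descriptors are added.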

#### 3. Mount the FUSE filesystem inside the container

Inside the sandbox, the application mounts a FUSE filesystem referencing the
passed FD:

```c
// Mount using the passed file descriptor.
mount("fuse", "/mnt/shared", "fuse", MS_NODEV | MS_NOSUID,
"fd=100,user_id=0,group_id=0,rootmode=40000");
```

Or equivalently from a shell:

```bash
mount -t fuse fuse /mnt/shared -o fd=100,user_id=0,group_id=0,rootmode=40000
```

The mount options are:

* `fd=N`: The file descriptor number inside the sandbox.
* `user_id=UID`: The UID that owns the mount.
* `group_id=GID`: The GID that owns the mount.
* `rootmode=MODE`: The file mode of the root inode in octal, including the
file type bits. Use `40000` (`S_IFDIR`) for a directory.
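
For applications written in Go, the option string can be assembled programmatically. This is a sketch under the assumption that the application uses the standard library `syscall` package on Linux. `fuseMountOptions` and `mountFUSE` are hypothetical names.

```go
package main

import (
	"fmt"
	"syscall"
)

// fuseMountOptions formats the data string for a FUSE mount. rootMode is
// printed in octal, so syscall.S_IFDIR (0o40000) yields "rootmode=40000".
func fuseMountOptions(fd int, uid, gid, rootMode uint32) string {
	return fmt.Sprintf("fd=%d,user_id=%d,group_id=%d,rootmode=%o", fd, uid, gid, rootMode)
}

// mountFUSE mounts a FUSE filesystem at target using an already-open FD.
// Mounting normally requires privileges inside the container, so main
// below only prints the option string and does not call this function.
func mountFUSE(target string, fd int) error {
	opts := fuseMountOptions(fd, 0, 0, syscall.S_IFDIR)
	return syscall.Mount("fuse", target, "fuse", syscall.MS_NODEV|syscall.MS_NOSUID, opts)
}

func main() {
	fmt.Println(fuseMountOptions(100, 0, 0, syscall.S_IFDIR))
}
```

Formatting `rootmode` with `%o` avoids the easy mistake of passing the decimal value of the mode.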

### Example: End-to-End with a Socketpair

Here is an example in Go that sets up the host side (`myFuseServer`,
`bundleDir`, and `containerID` are placeholders):

```go
// Requires: log, os, os/exec, and golang.org/x/sys/unix.

// Create a socketpair for FUSE communication.
fds, err := unix.Socketpair(unix.AF_UNIX, unix.SOCK_SEQPACKET, 0)
if err != nil {
	log.Fatalf("socketpair: %v", err)
}

// fds[0] goes into the sandbox, fds[1] goes to the FUSE server.
sandboxFile := os.NewFile(uintptr(fds[0]), "fuse-sandbox")
serverFD := fds[1]

// Start the FUSE server on the host with the server-side FD.
go myFuseServer.Serve(serverFD, "/data/backing")

// Launch the sandbox with the FD passed in.
cmd := exec.Command("runsc", "run",
	"--pass-fd=3:100", // host FD 3 → guest FD 100
	"--bundle="+bundleDir,
	containerID,
)
cmd.ExtraFiles = []*os.File{sandboxFile} // FD 3 in the child process
if err := cmd.Run(); err != nil {
	log.Fatalf("runsc run: %v", err)
}
```

### Limitations

* **No /dev/fuse**: The external path does not use `/dev/fuse`. The
application mounts FUSE using the passed socketpair FD directly.
* **FUSE protocol only**: The host server must implement the raw FUSE kernel
protocol. Higher-level FUSE libraries (e.g., libfuse) typically expect
`/dev/fuse` and may not work directly over a socketpair without adaptation.

## In-Sandbox FUSE

gVisor also supports the standard FUSE model where both the FUSE daemon and the
application run inside the sandbox. The daemon opens `/dev/fuse`, and the
application mounts a FUSE filesystem using the resulting file descriptor. This
works the same as FUSE on a regular Linux system, with the gVisor kernel
handling the FUSE protocol internally.

[FUSE]: https://www.kernel.org/doc/html/latest/filesystems/fuse.html
4 changes: 4 additions & 0 deletions pkg/sentry/fsimpl/fuse/BUILD
@@ -63,6 +63,7 @@ go_library(
"directory.go",
"file.go",
"fusefs.go",
"host_connection.go",
"inode.go",
"inode_connection.go",
"inode_refs.go",
@@ -110,6 +111,8 @@ go_test(
srcs = [
"connection_test.go",
"dev_test.go",
"host_connection_integration_test.go",
"host_connection_test.go",
"utils_test.go",
"xattr_test.go",
],
@@ -119,6 +122,7 @@ go_test(
"//pkg/context",
"//pkg/errors/linuxerr",
"//pkg/fspath",
"//pkg/hostarch",
"//pkg/marshal/primitive",
"//pkg/sentry/fsimpl/testutil",
"//pkg/sentry/kernel",
51 changes: 40 additions & 11 deletions pkg/sentry/fsimpl/fuse/connection.go
@@ -43,6 +43,34 @@ const (
fuseDefaultMaxPagesPerReq = 32
)

// fuseConn abstracts the FUSE request/response transport. The connection
// struct delegates call dispatch to its fuseConn implementation.
type fuseConn interface {
call(ctx context.Context, r *Request) (*Response, error)
release(ctx context.Context)
}

// deviceConn implements fuseConn for the in-sandbox /dev/fuse path.
// It uses the queue-based mechanism where the FUSE daemon reads requests
// from and writes responses to the DeviceFD.
type deviceConn struct {
conn *connection
}

func (dc *deviceConn) call(ctx context.Context, r *Request) (*Response, error) {
fut, err := dc.conn.callFuture(ctx, r)
if err != nil {
return nil, linuxError(err)
}
res, err := fut.resolve(ctx)
if err != nil {
return res, linuxError(err)
}
return res, nil
}

func (dc *deviceConn) release(ctx context.Context) {}

// connection is the struct by which the sentry communicates with the FUSE server daemon.
//
// Lock order:
@@ -54,6 +82,10 @@ const (
type connection struct {
connectionRefs

// fuseConn is the transport implementation. For the DeviceFD path this
// is a *deviceConn; for host passthrough this is a *hostConnection.
fuseConn fuseConn `state:"nosave"`

// We target FUSE 7.23.
// The following FUSE_INIT flags are currently unsupported by this implementation:
// - FUSE_EXPORT_SUPPORT
@@ -309,7 +341,12 @@ func newFUSEConnection(_ context.Context, fuseFD *DeviceFD, opts *filesystemOpti
// synchronization and without checking if fuseFD has already been used to
// mount another filesystem.

// Create the writeBuf for the header to be stored in.
return newFUSEConnectionOpts(opts)
}

// newFUSEConnectionOpts creates a FUSE connection with the given options.
// This is used by both the DeviceFD path and the host FD passthrough path.
func newFUSEConnectionOpts(opts *filesystemOptions) (*connection, error) {
conn := &connection{
completions: make(map[linux.FUSEOpID]*futureResponse),
fullQueueCh: make(chan struct{}, opts.maxActiveRequests),
@@ -321,6 +358,7 @@ func newFUSEConnection(_ context.Context, fuseFD *DeviceFD, opts *filesystemOpti
initializedChan: make(chan struct{}),
connected: true,
}
conn.fuseConn = &deviceConn{conn: conn}
conn.InitRefs()
return conn, nil
}
@@ -379,16 +417,7 @@ func (conn *connection) Call(ctx context.Context, r *Request) (*Response, error)
return nil, linuxerr.ECONNREFUSED
}

fut, err := conn.callFuture(ctx, r)
if err != nil {
return nil, linuxError(err)
}

res, err := fut.resolve(ctx)
if err != nil {
return res, linuxError(err)
}
return res, nil
return conn.fuseConn.call(ctx, r)
}

// callFuture makes a request to the server and returns a future response.
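
The refactoring in this file follows a common delegation pattern: `connection.Call` keeps the shared precondition checks and hands the actual transport work to an interface value. The following standalone sketch illustrates that shape with simplified, hypothetical types (`transport`, `queueTransport`, `socketTransport`, `conn`). It is not the gVisor implementation and leaves out contexts, futures, and locking.

```go
package main

import (
	"errors"
	"fmt"
)

type request struct{ opcode uint32 }
type response struct{ payload string }

// transport plays the role that fuseConn plays in the diff.
type transport interface {
	call(r *request) (*response, error)
	release()
}

// queueTransport stands in for the in-sandbox /dev/fuse path.
type queueTransport struct{}

func (queueTransport) call(r *request) (*response, error) {
	return &response{payload: fmt.Sprintf("queued opcode %d", r.opcode)}, nil
}

func (queueTransport) release() {}

// socketTransport stands in for the host FD passthrough path.
type socketTransport struct{ closed bool }

func (s *socketTransport) call(r *request) (*response, error) {
	if s.closed {
		return nil, errors.New("transport released")
	}
	return &response{payload: fmt.Sprintf("sent opcode %d over socket", r.opcode)}, nil
}

func (s *socketTransport) release() { s.closed = true }

// conn keeps the shared checks and delegates dispatch to its transport.
type conn struct {
	connected bool
	t         transport
}

func (c *conn) Call(r *request) (*response, error) {
	if !c.connected {
		return nil, errors.New("connection refused")
	}
	return c.t.call(r)
}

func main() {
	for _, c := range []*conn{
		{connected: true, t: queueTransport{}},
		{connected: true, t: &socketTransport{}},
	} {
		res, err := c.Call(&request{opcode: 3})
		if err != nil {
			panic(err)
		}
		fmt.Println(res.payload)
	}
}
```

Callers such as the `file.go` changes in the next file do not need to know which transport is in use, since they only go through `Call`.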
24 changes: 12 additions & 12 deletions pkg/sentry/fsimpl/fuse/file.go
@@ -69,8 +69,8 @@ func (fd *fileDescription) statusFlags() uint32 {
// Release implements vfs.FileDescriptionImpl.Release.
func (fd *fileDescription) Release(ctx context.Context) {
// no need to release if FUSE server doesn't implement Open.
conn := fd.inode().fs.conn
if conn.noOpen {
fs := fd.inode().fs
if fs.conn.noOpen {
return
}

@@ -89,19 +89,19 @@ func (fd *fileDescription) Release(ctx context.Context) {
opcode = linux.FUSE_RELEASE
}
// Ignoring errors and FUSE server replies is analogous to Linux's behavior.
req := conn.NewRequest(auth.CredentialsFromContext(ctx), pidFromContext(ctx), inode.nodeID, opcode, &in)
req := fs.conn.NewRequest(auth.CredentialsFromContext(ctx), pidFromContext(ctx), inode.nodeID, opcode, &in)
// The reply will be ignored since no callback is defined in asyncCallBack().
conn.Call(ctx, req)
fs.conn.Call(ctx, req)
}

// OnClose implements vfs.FileDescriptionImpl.OnClose.
func (fd *fileDescription) OnClose(ctx context.Context) error {
inode := fd.inode()
conn := inode.fs.conn
fs := inode.fs
inode.attrMu.Lock()
defer inode.attrMu.Unlock()

if conn.noOpen {
if fs.conn.noOpen {
return nil
}
if fd.OpenFlag&linux.FOPEN_NOFLUSH != 0 {
@@ -112,8 +112,8 @@ func (fd *fileDescription) OnClose(ctx context.Context) error {
Fh: fd.Fh,
LockOwner: 0, // TODO(gvisor.dev/issue/3245): file lock
}
req := conn.NewRequest(auth.CredentialsFromContext(ctx), pidFromContext(ctx), inode.nodeID, linux.FUSE_FLUSH, &in)
res, err := conn.Call(ctx, req)
req := fs.conn.NewRequest(auth.CredentialsFromContext(ctx), pidFromContext(ctx), inode.nodeID, linux.FUSE_FLUSH, &in)
res, err := fs.conn.Call(ctx, req)
if err != nil {
return err
}
@@ -170,9 +170,9 @@ func (fd *fileDescription) Sync(ctx context.Context) error {
inode := fd.inode()
inode.attrMu.Lock()
defer inode.attrMu.Unlock()
conn := inode.fs.conn
fs := inode.fs
// no need to proceed if FUSE server doesn't implement Open.
if conn.noOpen {
if fs.conn.noOpen {
return linuxerr.EINVAL
}

@@ -181,9 +181,9 @@ func (fd *fileDescription) Sync(ctx context.Context) error {
FsyncFlags: fd.statusFlags(),
}
// Ignoring errors and FUSE server replies is analogous to Linux's behavior.
req := conn.NewRequest(auth.CredentialsFromContext(ctx), pidFromContext(ctx), inode.nodeID, linux.FUSE_FSYNC, &in)
req := fs.conn.NewRequest(auth.CredentialsFromContext(ctx), pidFromContext(ctx), inode.nodeID, linux.FUSE_FSYNC, &in)
// The reply will be ignored since no callback is defined in asyncCallBack().
conn.CallAsync(ctx, req)
fs.conn.CallAsync(ctx, req)
return nil
}
