Skip to content

Implement containers snapshot API#6337

Open
gpanders wants to merge 7 commits intomainfrom
ganders/CC-6863
Open

Implement containers snapshot API#6337
gpanders wants to merge 7 commits intomainfrom
ganders/CC-6863

Conversation

@gpanders
Copy link
Member

Implement the first draft of the containers snapshot API. It is experimental for now.

Local dev snapshots are implemented using Docker volumes and the Docker archive API.

@gpanders gpanders requested review from a team as code owners March 16, 2026 19:11
@gpanders gpanders marked this pull request as draft March 16, 2026 19:12
@gpanders gpanders force-pushed the ganders/CC-6863 branch 2 times, most recently from a788f4a to be86e28 Compare March 16, 2026 19:16
Copy link
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR adds a containers snapshot API: snapshotDirectory() to create snapshots and snapshots option in start() to restore them, backed by Docker volumes and the Docker archive API for the local dev implementation.

Findings (ranked by severity):

  1. [MEDIUM] Snapshot restore path nesting — archive read from /mnt introduces extra directory level
  2. [LOW] dockerApiBinaryRequest reads entire response into memory unbounded before size check
  3. [LOW] Deserialized snapshot id is not validated before use in Docker volume names

These review comments were generated by an AI assistant and may not be perfect.

@ask-bonk
Copy link
Contributor

ask-bonk bot commented Mar 16, 2026

I'm Bonk, and I've done a quick review of your PR.

This PR adds a containers snapshot API: snapshotDirectory() to create snapshots and snapshots option in start() to restore them, backed by Docker volumes and the Docker archive API for the local dev implementation.

I posted 3 inline review comments on PR #6337, ranked by severity:

  1. [MEDIUM] Snapshot restore path nesting (container-client.c++:1054) — The archive read from /mnt in the temp container includes a mnt/ prefix in the tar, which when extracted to the parent directory could introduce an extra directory level. Asked for confirmation that the round-trip works correctly.

  2. [LOW] Unbounded readAllBytes() (container-client.c++:577) — The binary response is fully buffered in memory before the 4 GiB size check runs.

  3. [LOW] Unvalidated snapshot id during restore (container-client.c++:1039) — Deserialized snapshot IDs are used in Docker volume names without format validation.

The overall design is sound — the serialization, compat flag gating, capnp schema additions, and test coverage all look good. The DirectorySnapshot JSG binding pattern with JSG_SERIALIZABLE, visitForMemoryInfo, and prototype properties follows existing conventions correctly.

github run

@github-actions
Copy link

github-actions bot commented Mar 16, 2026

The generated output of @cloudflare/workers-types has been changed by this PR. If this is intentional, run just generate-types to update the snapshot. Alternatively, you can download the full generated types:

Full Type Diff
diff -r types/generated-snapshot/latest/index.d.ts bazel-bin/types/definitions/latest/index.d.ts
3757a3758,3763
> interface DirectorySnapshot {
>   get id(): string;
>   get size(): number;
>   get dir(): string;
>   get name(): string | undefined;
> }
3773a3780
>   snapshots?: DirectorySnapshot[];
diff -r types/generated-snapshot/latest/index.ts bazel-bin/types/definitions/latest/index.ts
3763a3764,3769
> export interface DirectorySnapshot {
>   get id(): string;
>   get size(): number;
>   get dir(): string;
>   get name(): string | undefined;
> }
3779a3786
>   snapshots?: DirectorySnapshot[];

@codecov-commenter
Copy link

codecov-commenter commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 0% with 286 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.64%. Comparing base (bc0bc2a) to head (509974a).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/workerd/server/container-client.c++ 0.00% 201 Missing ⚠️
src/workerd/api/container.c++ 0.00% 69 Missing ⚠️
src/workerd/api/container.h 0.00% 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6337      +/-   ##
==========================================
- Coverage   70.80%   70.64%   -0.16%     
==========================================
  Files         420      420              
  Lines      112299   112606     +307     
  Branches    18417    18459      +42     
==========================================
+ Hits        79513    79552      +39     
- Misses      21789    22056     +267     
- Partials    10997    10998       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

return size;
}

JSG_RESOURCE_TYPE(DirectorySnapshot) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this type seems to be a simple data structure containing four properties and no methods, would it make sense to define it as a JSG_STRUCT instead?

Structs will generally be a lot more efficient than JSG_RESOURCE_TYPE.

Copy link
Member Author

@gpanders gpanders Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we need JSG_RESOURCE_TYPE so that we can make the value serializable (and one day track references to these snapshots through DO storage) as we discussed in the RFC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gpanders's reply is correct. JSG_SERIALIZABLE requires JSG_RESOURCE_TYPE — it uses the jsgSuper typedef that only JSG_RESOURCE_TYPE defines (via static_assert(static_cast<uint>(jsgSuper::jsgSerializeTag) != static_cast<uint>(TAG))). JSG_STRUCT does not define jsgSuper and cannot use JSG_SERIALIZABLE.

The JSG README (line 163-164) and ser.h (line 19) both document this constraint. There are zero examples of JSG_STRUCT + JSG_SERIALIZABLE across the entire codebase — all 17 uses are on jsg::Object subclasses.

So using JSG_RESOURCE_TYPE here is the correct choice for a type that needs custom serialization for DO storage persistence.

@gpanders
Copy link
Member Author

/bonk review

Copy link
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings from detailed review (ranked by severity):

  1. [MEDIUM] dockerApiBinaryRequest default maxResponseSize = kj::maxValue — PUT response bodies read without explicit limit
  2. [MEDIUM] cleanupSnapshotVolumes constructs JSON filter without escaping prefix
  3. [LOW] Deserialized size not validated in DirectorySnapshot::deserialize — could yield a double that doesn't round-trip exactly for very large values
  4. [LOW] Unnecessary static on free function validateSnapshotDir (minor style)

Re: kentonv's JSG_STRUCT question — gpanders's reply is correct: JSG_SERIALIZABLE requires JSG_RESOURCE_TYPE (it references jsgSuper which only JSG_RESOURCE_TYPE defines). There are zero examples of JSG_STRUCT + JSG_SERIALIZABLE in the codebase, and the JSG README explicitly documents this constraint.

The overall design is sound: the snapshot create/restore round-trip via Docker volumes and the archive API is correct, the compat-flag gating is proper, the serialization format is versioned, and the cleanup logic (KJ_DEFER + volumeCommitted flag) handles partial failures well. The path nesting fix in the latest commit correctly addresses the earlier review finding.


These review comments were generated by an AI assistant and may not be perfect.

auto binds = hostConfig.initBinds(1);
binds.set(0, kj::str(volumeName, ":", mountPath));

auto response = co_await dockerApiRequest(network, kj::str(dockerPath), kj::HttpMethod::POST,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test

kj::HttpMethod method,
kj::String endpoint,
kj::Maybe<kj::Array<kj::byte>> body = kj::none,
uint64_t maxResponseSize = kj::maxValue) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Default maxResponseSize = kj::maxValue makes it easy to accidentally read unbounded data from Docker.

The two PUT call sites (lines 1114 and 1235) rely on this default, meaning the Docker response body is read with no limit. While PUT responses from the Docker archive API are typically small status bodies, it's better to be explicit. Consider either:

  1. Removing the default and requiring all callers to pass an explicit limit, or
  2. Setting a reasonable default like MAX_JSON_RESPONSE_SIZE (16 MiB) which is already defined
Suggested change
uint64_t maxResponseSize = kj::maxValue) {
uint64_t maxResponseSize = MAX_JSON_RESPONSE_SIZE) {

This would make the code defensive by default — callers that need the larger 1 GiB tar limit already pass MAX_SNAPSHOT_TAR_SIZE explicitly.

kj::Promise<void> cleanupSnapshotVolumes(
kj::Network& network, kj::String dockerPath, kj::String containerName) {
auto prefix = kj::str(SNAPSHOT_VOLUME_PREFIX, containerName, "-");
auto filterJson = kj::str("{\"name\":[\"", prefix, "\"]}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] The prefix value is interpolated directly into JSON without escaping. If containerName ever contains characters like ", \, or control characters, this would produce malformed JSON or could be exploited.

Currently containerName is URI-encoded in the constructor (line 321: kj::encodeUriComponent), so characters like " would be encoded as %22 and this is safe in practice. But as a defensive measure, consider using the Cap'n Proto JSON codec to construct the filter (consistent with how other JSON bodies are built in this file), or at minimum add a comment noting the safety relies on the URI-encoding.

@ask-bonk
Copy link
Contributor

ask-bonk bot commented Mar 17, 2026

Review posted on PR #6337. Here's a summary of what was posted:

Review body (top-level comment, review ID 3963586130):

  • 4 ranked findings with overall assessment that the design is sound

3 inline comments:

  1. [MEDIUM] container-client.c++:265dockerApiBinaryRequest has maxResponseSize = kj::maxValue default, making it easy to read unbounded Docker responses. Suggested changing the default to MAX_JSON_RESPONSE_SIZE (16 MiB) since callers needing the 1 GiB tar limit already pass it explicitly.

  2. [MEDIUM] container-client.c++:277cleanupSnapshotVolumes constructs the JSON filter by string interpolation without escaping. Safe in practice due to URI-encoding of containerName, but noted as a defensive coding concern.

  3. [LOW] container.c++:44 — Deserialized size not validated against the max limit or double precision boundary. Minor since legitimate values are bounded.

1 reply to kentonv's comment:

  • Confirmed gpanders's reply is correct: JSG_SERIALIZABLE requires JSG_RESOURCE_TYPE (uses jsgSuper typedef), cannot be used with JSG_STRUCT. Cited the JSG README, ser.h, and the 17 existing examples in the codebase.

github run

@gpanders gpanders marked this pull request as ready for review March 17, 2026 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants