Conversation
a788f4a to
be86e28
Compare
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This PR adds a containers snapshot API: snapshotDirectory() to create snapshots and snapshots option in start() to restore them, backed by Docker volumes and the Docker archive API for the local dev implementation.
Findings (ranked by severity):
- [MEDIUM] Snapshot restore path nesting — archive read from
/mntintroduces extra directory level - [LOW]
dockerApiBinaryRequestreads entire response into memory unbounded before size check - [LOW] Deserialized snapshot
idis not validated before use in Docker volume names
These review comments were generated by an AI assistant and may not be perfect.
|
I'm Bonk, and I've done a quick review of your PR. This PR adds a containers snapshot API: I posted 3 inline review comments on PR #6337, ranked by severity:
The overall design is sound — the serialization, compat flag gating, capnp schema additions, and test coverage all look good. The |
|
The generated output of Full Type Diffdiff -r types/generated-snapshot/latest/index.d.ts bazel-bin/types/definitions/latest/index.d.ts
3757a3758,3763
> interface DirectorySnapshot {
> get id(): string;
> get size(): number;
> get dir(): string;
> get name(): string | undefined;
> }
3773a3780
> snapshots?: DirectorySnapshot[];
diff -r types/generated-snapshot/latest/index.ts bazel-bin/types/definitions/latest/index.ts
3763a3764,3769
> export interface DirectorySnapshot {
> get id(): string;
> get size(): number;
> get dir(): string;
> get name(): string | undefined;
> }
3779a3786
> snapshots?: DirectorySnapshot[]; |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6337 +/- ##
==========================================
- Coverage 70.80% 70.64% -0.16%
==========================================
Files 420 420
Lines 112299 112606 +307
Branches 18417 18459 +42
==========================================
+ Hits 79513 79552 +39
- Misses 21789 22056 +267
- Partials 10997 10998 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| return size; | ||
| } | ||
|
|
||
| JSG_RESOURCE_TYPE(DirectorySnapshot) { |
There was a problem hiding this comment.
Given this type seems to be a simple data structure containing four properties and no methods, would it make sense to define it as a JSG_STRUCT instead?
Structs will generally be a lot more efficient than JSG_RESOURCE_TYPE.
There was a problem hiding this comment.
IIUC we need JSG_RESOURCE_TYPE so that we can make the value serializable (and one day track references to these snapshots through DO storage) as we discussed in the RFC.
There was a problem hiding this comment.
gpanders's reply is correct. JSG_SERIALIZABLE requires JSG_RESOURCE_TYPE — it uses the jsgSuper typedef that only JSG_RESOURCE_TYPE defines (via static_assert(static_cast<uint>(jsgSuper::jsgSerializeTag) != static_cast<uint>(TAG))). JSG_STRUCT does not define jsgSuper and cannot use JSG_SERIALIZABLE.
The JSG README (line 163-164) and ser.h (line 19) both document this constraint. There are zero examples of JSG_STRUCT + JSG_SERIALIZABLE across the entire codebase — all 17 uses are on jsg::Object subclasses.
So using JSG_RESOURCE_TYPE here is the correct choice for a type that needs custom serialization for DO storage persistence.
be86e28 to
509974a
Compare
|
/bonk review |
There was a problem hiding this comment.
Findings from detailed review (ranked by severity):
- [MEDIUM]
dockerApiBinaryRequestdefaultmaxResponseSize = kj::maxValue— PUT response bodies read without explicit limit - [MEDIUM]
cleanupSnapshotVolumesconstructs JSON filter without escapingprefix - [LOW] Deserialized
sizenot validated inDirectorySnapshot::deserialize— could yield adoublethat doesn't round-trip exactly for very large values - [LOW] Unnecessary
staticon free functionvalidateSnapshotDir(minor style)
Re: kentonv's JSG_STRUCT question — gpanders's reply is correct: JSG_SERIALIZABLE requires JSG_RESOURCE_TYPE (it references jsgSuper which only JSG_RESOURCE_TYPE defines). There are zero examples of JSG_STRUCT + JSG_SERIALIZABLE in the codebase, and the JSG README explicitly documents this constraint.
The overall design is sound: the snapshot create/restore round-trip via Docker volumes and the archive API is correct, the compat-flag gating is proper, the serialization format is versioned, and the cleanup logic (KJ_DEFER + volumeCommitted flag) handles partial failures well. The path nesting fix in the latest commit correctly addresses the earlier review finding.
These review comments were generated by an AI assistant and may not be perfect.
| auto binds = hostConfig.initBinds(1); | ||
| binds.set(0, kj::str(volumeName, ":", mountPath)); | ||
|
|
||
| auto response = co_await dockerApiRequest(network, kj::str(dockerPath), kj::HttpMethod::POST, |
| kj::HttpMethod method, | ||
| kj::String endpoint, | ||
| kj::Maybe<kj::Array<kj::byte>> body = kj::none, | ||
| uint64_t maxResponseSize = kj::maxValue) { |
There was a problem hiding this comment.
[MEDIUM] Default maxResponseSize = kj::maxValue makes it easy to accidentally read unbounded data from Docker.
The two PUT call sites (lines 1114 and 1235) rely on this default, meaning the Docker response body is read with no limit. While PUT responses from the Docker archive API are typically small status bodies, it's better to be explicit. Consider either:
- Removing the default and requiring all callers to pass an explicit limit, or
- Setting a reasonable default like
MAX_JSON_RESPONSE_SIZE(16 MiB) which is already defined
| uint64_t maxResponseSize = kj::maxValue) { | |
| uint64_t maxResponseSize = MAX_JSON_RESPONSE_SIZE) { |
This would make the code defensive by default — callers that need the larger 1 GiB tar limit already pass MAX_SNAPSHOT_TAR_SIZE explicitly.
| kj::Promise<void> cleanupSnapshotVolumes( | ||
| kj::Network& network, kj::String dockerPath, kj::String containerName) { | ||
| auto prefix = kj::str(SNAPSHOT_VOLUME_PREFIX, containerName, "-"); | ||
| auto filterJson = kj::str("{\"name\":[\"", prefix, "\"]}"); |
There was a problem hiding this comment.
[MEDIUM] The prefix value is interpolated directly into JSON without escaping. If containerName ever contains characters like ", \, or control characters, this would produce malformed JSON or could be exploited.
Currently containerName is URI-encoded in the constructor (line 321: kj::encodeUriComponent), so characters like " would be encoded as %22 and this is safe in practice. But as a defensive measure, consider using the Cap'n Proto JSON codec to construct the filter (consistent with how other JSON bodies are built in this file), or at minimum add a comment noting the safety relies on the URI-encoding.
|
Review posted on PR #6337. Here's a summary of what was posted: Review body (top-level comment, review ID 3963586130):
3 inline comments:
1 reply to kentonv's comment:
|
Implement the first draft of the containers snapshot API. It is experimental for now.
Local dev snapshots are implemented using Docker volumes and the Docker archive API.