Conversation
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This PR adds hostname glob matching to interceptOutboundHttp and teaches interceptAllOutboundHttp to intercept hostname-based traffic via DNS resolution in the sidecar. Changes span the Cap'n Proto schema, sidecar command-line args, egress mapping logic, and the sidecar API payload.
Findings (highest severity first):
- [HIGH]
status()still callsupdateSidecarEgressPort(old method) instead ofupdateSidecarEgressConfigatcontainer-client.c++:995. On workerd restart recovery, the sidecar won't receive DNS allow-hostnames or internet config. This line is outside the diff so I've posted a separate comment. - [LOW]
EgressMappinguses two separateMaybefields (cidr,hostnameGlob) for a mutually-exclusive value — akj::OneOfwould enforce the invariant at the type level.
This review was generated by an AI assistant and may contain inaccuracies. Please verify suggestions before applying.
|
[HIGH] When workerd restarts and recovers a running container+sidecar, the sidecar will only receive the egress port — it will not get the DNS allow-hostnames list or the internet-enabled config. This means hostname-glob intercepts and internet settings will not survive a workerd restart. This line is outside the diff so I cannot post an inline suggestion, but the fix is: // container-client.c++:995
co_await updateSidecarEgressConfig(sidecar.ingressHostPort, egressListenerPort);(replacing This comment was generated by an AI assistant and may contain inaccuracies. |
|
Review posted on PR #6320. Summary of findings:
The rest of the PR (capnp schema additions, glob matching logic, test coverage, BUILD.bazel dep) looked correct. The capnp schema changes are additive (new fields with new ordinals), the hostname glob matching algorithm is a standard |
Co-authored-by: gabivlj <gabivlj@users.noreply.github.com>
I think I disagree because when we recover, we don't have the data in the class. The hostname data still lives in egress interception and we will process requests normally in workerd. The user will call intercept outboudn again and it will refresh teh outbound mappings in the class |
5be2d48 to
f445db7
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6320 +/- ##
==========================================
- Coverage 70.80% 70.71% -0.10%
==========================================
Files 422 422
Lines 112347 112497 +150
Branches 18411 18445 +34
==========================================
Hits 79547 79547
- Misses 21806 21956 +150
Partials 10994 10994 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // Whether general internet access is enabled for this container | ||
| bool internetEnabled = false; | ||
| // Whether general internet access is enabled for this container, when known. | ||
| kj::Maybe<bool> internetEnabled = kj::none; |
There was a problem hiding this comment.
Can we just use an enum for this?
There was a problem hiding this comment.
not sure how an enum would be better here... but if we don't want bool for whatever reason a WD_STRONG_BOOL would likely be better.
| headers.forEach([&](kj::StringPtr headerName, kj::StringPtr value) { | ||
| if (result == kj::none && workerd::strcaseeq(headerName, name)) { | ||
| result = value; | ||
| } |
There was a problem hiding this comment.
Nit: it's too bad that forEach will unconditionally loop through every header even when we find the match. Maybe something we can update in kj later would be to allow the forEach to return a bool... if false is returned, iteration short circuits (@kentonv ... opinions?) ... obviously non-blocking here tho.
| return {.destination = kj::mv(cidr), .port = hostAndPort.port}; | ||
| } | ||
|
|
||
| return {.destination = workerd::toLower(hostAndPort.host), .port = hostAndPort.port}; |
There was a problem hiding this comment.
If you add the trailing comma after the last item in these and format them like this the clang-format results will be easier to read...
| return {.destination = workerd::toLower(hostAndPort.host), .port = hostAndPort.port}; | |
| return { | |
| .destination = workerd::toLower(hostAndPort.host), | |
| .port = hostAndPort.port, | |
| }; |
|
|
||
| uint32_t cmdSize = | ||
| 6; // --http-egress-port <port> --http-ingress-address 0.0.0.0:<port> --docker-gateway-cidr <cidr> | ||
| 7; // --http-egress-port <port> --http-ingress-address 0.0.0.0:<port> --docker-gateway-cidr <cidr> --dns-enabled |
There was a problem hiding this comment.
I know this isn't new, but this could use a code comment about how the number is determined. non-blocking.
f445db7 to
da9a827
Compare
…bs, and make sure that interceptAllOutboundHttp can intercept all hostname globs How this works is that we will start resolving DNS queries for the user in local dev. When they intercept HTTP, they will be able to specify which hostnames they want to receive traffic from. With interceptAllOutboundHttp, they are specifying they want to receive all traffic, even non existent domains, to their Worker. The user can decide which domains they want to intercept by just using interceptOutboundHttp before it hits their worker.
da9a827 to
b0df1a3
Compare
How this works is that we will start resolving DNS queries for the user in local dev. When they intercept HTTP, they will be able to specify which hostnames they want to receive traffic from. With interceptAllOutboundHttp, they are specifying they want to receive all traffic, even non existent domains, to their Worker.
The user can decide which domains they want to intercept by just using interceptOutboundHttp before it hits their worker.