wip feat: improve middleware performance by 50%#1209
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
✨ No issues found! Your code is sparkling clean! ✨ Need help? Join our Discord for support! |
5fb08e0 to
0472e42
Compare
72ede91 to
cdd6889
Compare
📝 WalkthroughWalkthroughThis PR introduces middleware chain execution functions in Changes
Sequence DiagramsequenceDiagram
actor Client
participant Server as Server
participant BeforeChain as Before Chain
participant Middleware as Middleware(s)
participant AfterChain as After Chain
participant HTTP as HTTP Handler
Client->>Server: Request
Server->>BeforeChain: execute_before_middleware_chain()
rect rgba(100, 150, 200, 0.5)
Note over BeforeChain,Middleware: Detect all_sync?
alt All Synchronous
BeforeChain->>BeforeChain: Single GIL block
BeforeChain->>Middleware: Execute all
else Has Async
loop Each Middleware
BeforeChain->>Middleware: await execute_middleware_function()
end
end
end
alt Early Response from Chain
Middleware-->>BeforeChain: Response
BeforeChain-->>Server: MiddlewareReturn::Response
Server-->>Client: Response
else Request Continues
Middleware-->>BeforeChain: Request
BeforeChain-->>Server: MiddlewareReturn::Request
Server->>HTTP: execute_http_function()
HTTP-->>Server: Response
Server->>AfterChain: execute_after_middleware_chain()
rect rgba(150, 150, 100, 0.5)
AfterChain->>AfterChain: Detect all_sync?
alt All Synchronous
AfterChain->>Middleware: Execute all in GIL
else Has Async
loop Each Middleware
AfterChain->>Middleware: await execution
end
end
end
Middleware-->>AfterChain: Response
AfterChain-->>Server: Updated Response
Server-->>Client: Response
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/executors/mod.rs`:
- Around line 263-298: The after-middleware chain currently only forwards a
Response, breaking the (request, response) contract used by
execute_after_middleware_function; update execute_after_middleware_chain to
accept and pass both &Request and &Response to middleware invocations (use
get_function_output and execute_middleware_function with both references so
after_request(req, resp) callbacks can access request-scoped data), ensure you
handle the returned MiddlewareReturn the same way (only accept
MiddlewareReturn::Response) and update the caller in server.rs to pass &request
into this helper accordingly.
In `@src/server.rs`:
- Around line 517-520: The current early return in index() when
execute_before_middleware_chain returns MiddlewareReturn::Response(r) skips
global response header application and the after-middleware chain; instead,
change the handling so index() continues down the common response path: if
execute_before_middleware_chain yields MiddlewareReturn::Response, store that
Response as the "short-circuit" response while also preserving the most-recent
mutated Request (from MiddlewareReturn::Request or a sidecar), then proceed
through the same global header application and execute_after_middleware_chain on
the response before returning; update the match on
execute_before_middleware_chain (and any variables around before_middlewares) to
propagate both the latest Request and the Response into the standard response
flow rather than returning ResponseType::Standard immediately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a573752f-50fa-4ab3-a2d6-3721ec7f5e1f
📒 Files selected for processing (2)
src/executors/mod.rssrc/server.rs
| pub async fn execute_after_middleware_chain( | ||
| input: &Response, | ||
| middlewares: &[FunctionInfo], | ||
| ) -> Result<MiddlewareReturn> { | ||
| let mut current_response = input.clone(); | ||
|
|
||
| // Check if all middlewares are sync to optimize GIL usage | ||
| let all_sync = middlewares.iter().all(|m| !m.is_async); | ||
|
|
||
| if all_sync { | ||
| // Execute all sync middlewares in a single GIL acquisition | ||
| Python::with_gil(|py| -> Result<MiddlewareReturn> { | ||
| for middleware in middlewares { | ||
| let output = get_function_output(middleware, py, ¤t_response)?; | ||
|
|
||
| // After middleware should return Response | ||
| match output.extract::<Response>() { | ||
| Ok(response) => current_response = response, | ||
| Err(e) => return Err(e.into()), | ||
| } | ||
| } | ||
|
|
||
| Ok(MiddlewareReturn::Response(current_response)) | ||
| }) | ||
| } else { | ||
| // Fall back to individual execution for mixed sync/async middlewares | ||
| for middleware in middlewares { | ||
| current_response = match execute_middleware_function(¤t_response, middleware).await? { | ||
| MiddlewareReturn::Response(r) => r, | ||
| MiddlewareReturn::Request(_) => { | ||
| return Err(anyhow::anyhow!("After middleware returned a request")) | ||
| } | ||
| }; | ||
| } | ||
|
|
||
| Ok(MiddlewareReturn::Response(current_response)) |
There was a problem hiding this comment.
Preserve the (request, response) contract for after middleware.
This chain runner only passes a Response: the sync path calls get_function_output(...), and the async path calls execute_middleware_function(...). That regresses the behavior implemented by execute_after_middleware_function at Lines 181-217, which passes both request and response. Any after_request(req, resp) middleware will now fail at runtime, and those callbacks can no longer inspect request-scoped data such as path params.
🔧 Suggested fix
-pub async fn execute_after_middleware_chain(
- input: &Response,
+pub async fn execute_after_middleware_chain(
+ request: &Request,
+ input: &Response,
middlewares: &[FunctionInfo],
) -> Result<MiddlewareReturn> {
let mut current_response = input.clone();
@@
if all_sync {
Python::with_gil(|py| -> Result<MiddlewareReturn> {
for middleware in middlewares {
- let output = get_function_output(middleware, py, ¤t_response)?;
+ let output = get_function_output_with_two_args(
+ middleware,
+ py,
+ request,
+ ¤t_response,
+ )?;
@@
} else {
for middleware in middlewares {
- current_response = match execute_middleware_function(¤t_response, middleware).await? {
+ current_response = match execute_after_middleware_function(
+ request,
+ ¤t_response,
+ middleware,
+ )
+ .await? {
MiddlewareReturn::Response(r) => r,
MiddlewareReturn::Request(_) => {
return Err(anyhow::anyhow!("After middleware returned a request"))
}
};
}src/server.rs will need to pass &request into this helper as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/executors/mod.rs` around lines 263 - 298, The after-middleware chain
currently only forwards a Response, breaking the (request, response) contract
used by execute_after_middleware_function; update execute_after_middleware_chain
to accept and pass both &Request and &Response to middleware invocations (use
get_function_output and execute_middleware_function with both references so
after_request(req, resp) callbacks can access request-scoped data), ensure you
handle the returned MiddlewareReturn the same way (only accept
MiddlewareReturn::Response) and update the caller in server.rs to pass &request
into this helper accordingly.
There was a problem hiding this comment.
Resolve — noted, WIP PR. Will fix before merge.
There was a problem hiding this comment.
Rate Limit Exceeded
@sansyrox have exceeded the limit for the number of chat messages per hour. Please wait 1 minutes and 37 seconds before sending another message.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
| request = match execute_before_middleware_chain(&request, &before_middlewares).await { | ||
| Ok(MiddlewareReturn::Request(r)) => r, | ||
| Ok(MiddlewareReturn::Response(r)) => { | ||
| return ResponseType::Standard(r); |
There was a problem hiding this comment.
Avoid returning short-circuited before-middleware responses here.
This exits index() before the common response path at Lines 567-609, so responses produced by before middleware no longer receive global response headers or the after-middleware chain. That will show up on auth/preflight-style short circuits as missing CORS/security headers. If you keep these responses on the normal path, the chain result also needs to preserve the latest mutated Request.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/server.rs` around lines 517 - 520, The current early return in index()
when execute_before_middleware_chain returns MiddlewareReturn::Response(r) skips
global response header application and the after-middleware chain; instead,
change the handling so index() continues down the common response path: if
execute_before_middleware_chain yields MiddlewareReturn::Response, store that
Response as the "short-circuit" response while also preserving the most-recent
mutated Request (from MiddlewareReturn::Request or a sidecar), then proceed
through the same global header application and execute_after_middleware_chain on
the response before returning; update the match on
execute_before_middleware_chain (and any variables around before_middlewares) to
propagate both the latest Request and the Response into the standard response
flow rather than returning ResponseType::Standard immediately.
There was a problem hiding this comment.
Resolve — noted, will address before merge.
There was a problem hiding this comment.
Rate Limit Exceeded
@sansyrox have exceeded the limit for the number of chat messages per hour. Please wait 1 minutes and 52 seconds before sending another message.
|
Do find, is the fix for improvement are really complete? Even do run the benchmark tests and find the performance improvements, and from where that performance is being extracted, which thing caused it to speed up? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== New chain executor functions ==="
cat src/executors/mod.rs
echo ""
echo "=== Server middleware usage ==="
cat src/server.rs
echo ""
echo "=== Look for benchmark files ==="
fd -e rs bench
fd -e toml bench
rg -r "criterion\|bench\|hyperfine\|wrk\|vegeta" --type toml -l
rg -r "criterion\|bench\|hyperfine" --type rs -l
echo ""
echo "=== Cargo.toml for bench deps ==="
cat Cargo.toml
echo ""
echo "=== Existing test files ==="
fd -e rs test
rg --type rs "#\[bench\]" -l
echo ""
echo "=== Any benchmark scripts at repo root ==="
fd bench
fd benchmark
ls -la
echo ""
echo "=== Check for profiling or perf scripts ==="
fd -e sh
fd -e py benchLength of output: 42633 🏁 Script executed: #!/bin/bash
echo "=== benchmark.sh ==="
cat benchmark.sh
echo ""
echo "=== unit_tests contents ==="
ls -la unit_tests/
find unit_tests -type f | head -30
cat unit_tests/*.rs 2>/dev/null || true
find unit_tests -name "*.py" -exec cat {} \;
echo ""
echo "=== integration_tests structure ==="
find integration_tests -type f | head -40
echo ""
echo "=== middleware-related integration tests ==="
rg -l "middleware" integration_tests/
echo ""
echo "=== Any before/after middleware test files ==="
find integration_tests -name "*middleware*" | xargs cat 2>/dev/null | head -200Length of output: 13247
What causes the speedupThe core idea is in let all_sync = middlewares.iter().all(|m| !m.is_async);
if all_sync {
// ✅ ONE GIL acquisition for the entire chain
Python::with_gil(|py| {
for middleware in middlewares { ... }
})
} else {
// ❌ N GIL acquisitions — one per middleware (same as before)
for middleware in middlewares {
execute_middleware_function(...).await?;
}
}Before: Even synchronous middlewares went through This is the real source of the claimed speedup. Is the improvement actually complete? Several gaps:1. ❌ No benchmark numbers exist in the PRThe PR description is a placeholder ("This PR does...."). The "50%" claim is completely unsubstantiated. No before/after numbers were provided. 2. ❌
|
| Path | Middleware returns Request |
|---|---|
| Sync | Raw PyO3 extraction error (cryptic) |
| Async | Err(anyhow!("After middleware returned a request")) (clear) |
The sync path only tries extract::<Response>() and returns a PyO3 error on failure — it never produces the descriptive message.
5. ❌ No unit tests for the new chain functions
unit_tests/ and integration_tests/ contain no tests that specifically exercise execute_before_middleware_chain or execute_after_middleware_chain, particularly the all_sync batching path. The existing test_middlewares.py is a functional correctness test, not a performance or path-coverage test.
6. ❌ execute_after_middleware_chain async fallback passes a Response to execute_middleware_function
execute_middleware_function<T> tries both extract::<Response> and extract::<Request>. When T = Response, trying to extract a Request from a Response object is a no-op failure that gets silently swallowed — this is latent behavior inconsistency vs the sync path.
Summary
| Question | Answer |
|---|---|
| What causes the speedup? | Reducing N Python::with_gil lock acquire/release cycles to 1 for all-sync middleware chains |
| Is the 50% claim proven? | ❌ No — no benchmark numbers, and benchmark.sh doesn't even test middleware routes |
| Does it help async middleware users? | ❌ No — async chains still execute one-by-one |
| Is the implementation complete? | |
| Is the PR description complete? | ❌ No — it's a placeholder with no linked issue, no description, no data |
The PR author (@sansyrox) should be asked to:
- Add actual benchmark numbers (run
benchmark.shagainst a middleware-heavy route, before and after) - Add a benchmark route to
benchmark.shor a dedicated middleware benchmark script - Fix the inconsistent
Request-returned-from-after-middleware error handling in the sync path - Add unit/integration tests specifically for the chain batching path
|
Can we add optimization even for async paths, mixed sync & async paths too, how? If not, then why, reason for it? Detailed explanation for that too |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== Full executors/mod.rs ==="
cat src/executors/mod.rs
echo ""
echo "=== execute_middleware_function implementation ==="
rg -n "execute_middleware_function\|execute_before_middleware\|execute_after_middleware\|with_gil\|is_async\|FunctionInfo\|MiddlewareReturn" src/executors/mod.rs
echo ""
echo "=== FunctionInfo struct ==="
rg -n "struct FunctionInfo\|is_async\|pub fn call\|fn call" src/ -A5
echo ""
echo "=== MiddlewareReturn definition ==="
rg -n "MiddlewareReturn\|enum Middleware" src/ -A5Length of output: 13850
1. Why pure async middleware chains cannot be further batchedFor every async Python function, // Step 1: GIL acquire `#1` — convert the Python coroutine into a Rust future
let output: Py<PyAny> = Python::with_gil(|py| {
pyo3_async_runtimes::tokio::into_future(get_function_output(function, py, input)?)
})?
// Step 2: GIL is RELEASED here — Tokio drives the Python coroutine
.await?;
// Step 3: GIL acquire `#2` — extract the Rust value from the Python result
Python::with_gil(|py| { output.extract::<Response>(py) ... })The 2. Why concurrent/parallel execution of async middlewares is also impossibleYou might think: "run No two middlewares in a chain can be parallelized since they are data-dependent on each other. 3. Mixed (sync + async) chains can be partially optimizedThis is where there is real untapped improvement. The idea is run-length batching: group consecutive sync middlewares into one DiagramFor a more extreme case: What the implementation would look likepub async fn execute_before_middleware_chain(
input: &Request,
middlewares: &[FunctionInfo],
) -> Result<MiddlewareReturn> {
let mut current_request = input.clone();
let mut i = 0;
while i < middlewares.len() {
// Find the end of the next consecutive sync run
let sync_run_end = middlewares[i..]
.iter()
.position(|m| m.is_async) // first async index relative to i
.map(|pos| i + pos)
.unwrap_or(middlewares.len()); // no async → rest are all sync
// Batch-execute all consecutive sync middlewares in ONE GIL acquisition
if i < sync_run_end {
let sync_slice = &middlewares[i..sync_run_end];
let result = Python::with_gil(|py| -> Result<MiddlewareReturn> {
let mut req = current_request.clone();
for mw in sync_slice {
let output = get_function_output(mw, py, &req)?;
match output.extract::<Response>() {
Ok(resp) => return Ok(MiddlewareReturn::Response(resp)),
Err(_) => req = output.extract::<Request>()
.map_err(anyhow::Error::from)?,
}
}
Ok(MiddlewareReturn::Request(req))
})?;
match result {
MiddlewareReturn::Response(r) => return Ok(MiddlewareReturn::Response(r)),
MiddlewareReturn::Request(r) => current_request = r,
}
i = sync_run_end;
}
// Execute exactly ONE async middleware (cannot batch; GIL must be released for .await)
if i < middlewares.len() {
// middlewares[i] is guaranteed async here
current_request = match execute_middleware_function(¤t_request, &middlewares[i]).await? {
MiddlewareReturn::Request(r) => r,
MiddlewareReturn::Response(r) => return Ok(MiddlewareReturn::Response(r)),
};
i += 1;
}
}
Ok(MiddlewareReturn::Request(current_request))
}4. Summary table
5. Bottom line
The PR as written handles only the all-sync fast path. Adding run-length batching for mixed chains would make the optimization genuinely general and would justify the "50%" headline more robustly across real-world middleware configurations. |
|
So it means the using one gil for all sync tasks, is meant for python 3.14+? Is it any possibility even for async tasks too? If we try our own rust layer to handle async tasks, rather than pyo3, will it helps us? How, in which terms, areas, improvements? |
|
Tip For best results, initiate chat on the files or code changes.
1. Is the single-GIL-for-sync optimization Python 3.14+ only?No — it works on any Python version PyO3 supports (3.7+). You may be thinking of these Python initiatives, which are related but different:
The optimization in this PR doesn't remove the GIL — it reduces how many times you acquire and release it. Acquiring the GIL is a mutex lock/unlock (a syscall), which is expensive when done N times per request. Doing it once for N sync middlewares instead of N times is pure performance gain on any Python version. 2. Can a custom Rust async layer bypass PyO3 and optimize async tasks?Fundamentally no — and here is the exact reason. The GIL is a CPython implementation constraint, not a PyO3 constraint. PyO3 is just a safe Rust binding; it did not invent the GIL. Any code path that runs Python bytecode inside CPython must hold the GIL — no matter what Rust library you use. What You cannot do it in fewer than 2 GIL acquire/release cycles for any async Python function. A custom Rust layer would arrive at the same floor. 3. Where a custom Rust layer WOULD actually helpEven though it can't eliminate the GIL cost, a custom Rust layer or different architecture could help in specific ways: a) Deferred extraction — reduce
|
| Strategy | GIL operations |
|---|---|
| Current | 2×5 = 10 |
| Deferred extraction | 5 × 1 (launch) + 5 × 1 (drive/await) + 1 (final extract) = 11 — actually worse due to re-passing overhead |
In practice this is only a win if extraction itself is expensive (e.g., deep struct clone).
b) Move hot middleware to Rust entirely (the real win)
If common middleware — e.g., JWT validation, rate limiting, CORS checks — is rewritten as native Rust middleware rather than Python functions, the GIL disappears entirely for those steps:
sequenceDiagram
participant R as Rust Executor
participant P as Python (GIL)
Note over R: auth check (Rust native) — NO GIL
Note over R: rate limit (Rust native) — NO GIL
R->>P: call user Python middleware (GIL acquired)
P-->>R: result
Note over R: response transform (Rust native) — NO GIL
This is architecturally the most impactful path — move infrastructure middleware to Rust, keep business-logic middleware in Python.
c) Python 3.13+ free-threaded mode (no-GIL build)
If Robyn adds support for Python 3.13+ compiled with --disable-gil, then multiple with_gil calls don't actually serialize anymore — they become near-zero-cost. This would make the async case as fast as the current PR's sync case, without any code changes to the executor.
Standard CPython: N async middlewares → 2N serial GIL acquisitions
Free-threaded 3.13+: N async middlewares → 2N concurrent, non-serializing GIL calls
Summary
| Scenario | Can a custom Rust layer help? | Real solution |
|---|---|---|
| Sync-only chains | ✅ Already done in this PR | Already optimal |
| Async-only chains | ❌ GIL is CPython-level, not PyO3-level | Free-threaded Python 3.13+, or rewrite in Rust |
| Mixed chains | Already described in previous comment | |
| Infrastructure middleware (auth, rate-limit) | ✅ Rewrite those in native Rust | Big win, no GIL at all |
The real 10x improvement path for Robyn is: native Rust middleware for common cross-cutting concerns + free-threaded Python 3.13+ support for user async middleware.
(ᓀ‸ᓂ)
|
Tip For best results, initiate chat on the files or code changes.
1. Is the "one GIL for all sync tasks" optimization Python 3.14+ only?No — it works on every Python version (3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.14, …). The optimization is entirely on the Rust side. This is pure Rust-side scheduling logic. Python itself doesn't change — you just hold the lock for longer in one stretch instead of repeatedly asking for it.
2. Can a custom Rust async layer eliminate GIL overhead for async Python functions?Short answer: No — not without fundamentally changing what Robyn allows users to write. Here's why, broken into layers: Why the constraint is fundamentalWhen a user writes an
What a custom Rust layer could and could not do
The one genuine opportunity: Python 3.13 free-threaded modegraph LR
A["Standard CPython (3.x)"] -->|"GIL = 1 global mutex"| B["Sequential Python execution\nin any thread"]
C["Free-threaded Python 3.13t"] -->|"Per-object locks instead"| D["True parallel Python\nacross threads"]
D --> E["N async middlewares could overlap\nif they're I/O-bound"]
If Robyn adds a build-time flag to target
Summary
🐇 🤔 |
Description
This PR fixes #
Summary
This PR does....
PR Checklist
Please ensure that:
Pre-Commit Instructions:
Summary by CodeRabbit