fix(performance): stream remote pdf downloads to reduce memory usage #551
Namraa310806 wants to merge 4 commits into
Summary
This PR improves the /process-from-url document ingestion pipeline by replacing memory-intensive PDF buffering with a streaming-based download and processing workflow.

Previously, remote PDF files were downloaded into memory as an ArrayBuffer and converted into a Buffer before processing. Under concurrent workloads, multiple large PDF uploads could significantly increase memory consumption, resulting in excessive garbage collection, degraded performance, and potential service instability.

This update introduces a streaming pipeline that minimizes memory usage, improves scalability, and strengthens resilience against resource exhaustion attacks.
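The difference between buffering and streaming can be illustrated with a minimal sketch. It is not the code from this PR: the helper names (`byteLimit`, `streamToTempFile`, `downloadPdf`), the temp-file approach, and the 50 MiB cap are all assumptions made for illustration, and it presumes Node.js 18+ with the global `fetch`. The idea is that each chunk is written to disk as it arrives, so memory use stays roughly constant regardless of PDF size, and an oversized download is aborted early.

```typescript
import { createWriteStream } from "node:fs";
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { Readable, Transform } from "node:stream";
import { pipeline } from "node:stream/promises";
import type { ReadableStream as NodeWebStream } from "node:stream/web";

// Hypothetical cap; the PR description does not state the actual limit.
const MAX_PDF_BYTES = 50 * 1024 * 1024;

// Pass-through transform that fails the pipeline once more than
// `limit` bytes have flowed through it.
function byteLimit(limit: number): Transform {
  let seen = 0;
  return new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      seen += chunk.length;
      if (seen > limit) {
        callback(new Error(`PDF exceeds ${limit} bytes`));
      } else {
        callback(null, chunk);
      }
    },
  });
}

// Streams `source` to a temp file chunk by chunk instead of collecting
// it in one Buffer. Returns the file path and the number of bytes written.
export async function streamToTempFile(
  source: Readable,
  limit: number = MAX_PDF_BYTES,
): Promise<{ filePath: string; bytes: number }> {
  const dir = await mkdtemp(join(tmpdir(), "pdf-"));
  const filePath = join(dir, "download.pdf");
  const out = createWriteStream(filePath);
  try {
    // pipeline() propagates errors and destroys every stream on failure.
    await pipeline(source, byteLimit(limit), out);
    return { filePath, bytes: out.bytesWritten };
  } catch (err) {
    // Remove the partial download so failed requests leave nothing behind.
    await rm(dir, { recursive: true, force: true });
    throw err;
  }
}

// Hypothetical caller: converts the fetch body (a web stream) into a
// Node stream and hands it to the helper above.
export async function downloadPdf(url: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok || !res.body) {
    throw new Error(`Download failed with status ${res.status}`);
  }
  const body = Readable.fromWeb(res.body as unknown as NodeWebStream<Uint8Array>);
  const { filePath } = await streamToTempFile(body);
  return filePath;
}
```

By contrast, the buffered approach described above would call `await res.arrayBuffer()` and wrap the result with `Buffer.from(...)`, holding the whole file in memory for every concurrent request. In this sketch the caller is responsible for deleting the temp file once processing finishes.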
Changes Made
Streaming-Based PDF Processing
Large File Handling Improvements
Resource Management
Error Handling Enhancements
Test Coverage
Added tests covering:
Performance Impact
Before
After
Security Impact
This change reduces the risk of:
Files Modified
Verification Checklist
Related Issue
Fixes: #502