docs(alluxio): add S3 high-concurrency read tuning guide #5874

Open
CAICAIIs wants to merge 3 commits into fluid-cloudnative:master from CAICAIIs:docs/issue-5802-alluxio-s3-high-concurrency
Conversation

@CAICAIIs
Contributor

Ⅰ. Describe what this PR does

This PR adds bilingual documentation for a verified AlluxioRuntime + S3 high-concurrency read tuning guide.

It documents the investigation result from issue #5802:

  • Fluid v1.0.8 + Alluxio 2.9.5 can reproduce the fio high-concurrency hang.
  • Fluid master at the time of investigation + Alluxio 2.9.5 still reproduces the same behavior.
  • The validation suggests the issue is mainly related to Alluxio 2.9.5 FUSE/client read-path pressure under high-concurrency S3 reads, rather than Fluid controller logic.

The new docs provide:

  • the fio reproduction command
  • observed failure symptoms
  • recommended AlluxioRuntime properties and FUSE args
  • Dataset and test Pod examples
  • validation results from the reproduced environment
  • risks and scope of the tuning configuration

This is intentionally a docs/example PR first. It does not change controller behavior or AlluxioRuntime defaults.

Ⅱ. Does this pull request fix one issue?

Addresses #5802

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

No code tests are added because this is a documentation-only change.

The tuning configuration documented here was validated in the reproduced environment:

  • fio numjobs=8/16/32/64 passed
  • repeated numjobs=64 passed
  • test Pods deleted normally
  • Alluxio master/worker/fuse stayed Running with restart count 0
  • no DeadlineExceededRuntimeException, Timer expired, or OutOfDirectMemoryError
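The log check in the last bullet could be scripted roughly as follows. This is a minimal sketch, not the procedure used for this PR: the helper name `count_failure_lines` and the step of saving component logs to local files first are assumptions; only the three error strings come from the description above.

```shell
#!/usr/bin/env bash
# Failure signatures named in the PR description.
SIGNATURES='DeadlineExceededRuntimeException|Timer expired|OutOfDirectMemoryError'

# Hypothetical helper: print how many lines in the given log files
# match any of the signatures (prints 0 when none match).
count_failure_lines() {
  # grep exits non-zero when nothing matches, so guard with `|| true`.
  cat -- "$@" | grep -E -c "$SIGNATURES" || true
}
```

For example, after saving logs with `kubectl logs <fuse-pod> > fuse.log` (the pod name is a placeholder), a result of `0` from `count_failure_lines fuse.log` would correspond to the last bullet. Restart counts can be read separately from the `RESTARTS` column of `kubectl get pods`.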

Ⅳ. Describe how to verify it

Review the rendered docs:

  • docs/en/samples/alluxio_s3_high_concurrency.md
  • docs/zh/samples/alluxio_s3_high_concurrency.md

Local checks run:

  • git diff --check upstream/master...HEAD
  • check_dco.sh upstream/master
  • check_pr.sh --base upstream/master
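Of these, `git diff --check` is a standard Git option that reports whitespace errors and leftover conflict markers and exits non-zero when it finds any. A small wrapper, sketched here with a hypothetical function name, makes the pass/fail outcome explicit; the two `check_*.sh` scripts are repository helpers and are not reproduced here.

```shell
# Hypothetical helper: succeed only if the changes on HEAD since its merge base
# with the given ref introduce no whitespace errors or conflict markers.
whitespace_clean() {
  git diff --check "$1...HEAD" >/dev/null
}
```

A possible invocation, assuming a remote named `upstream` as in the list above, is `whitespace_clean upstream/master && echo "no whitespace issues"`.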

Ⅴ. Special notes for reviews

This PR documents a tuning/configuration guide, not an upstream Alluxio internal fix.

The documented settings are intended for S3-compatible high-concurrency read workloads similar to #5802. Different S3 backends, object sizes, network latency, or concurrency levels may still require tuning.

If maintainers prefer productizing this in Fluid after reviewing the docs, I can follow up with a separate opt-in implementation PR.

CAICAIIs added 2 commits May 14, 2026 12:27
Signed-off-by: CAICAIIs <3360776475@qq.com>
Signed-off-by: CAICAIIs <3360776475@qq.com>
@fluid-e2e-bot

fluid-e2e-bot Bot commented May 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cheyang for approval by writing /assign @cheyang in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fluid-e2e-bot

fluid-e2e-bot Bot commented May 14, 2026

Hi @CAICAIIs. Thanks for your PR.

I'm waiting for a fluid-cloudnative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new tuning guide in both English and Chinese for optimizing AlluxioRuntime performance during high-concurrency read operations from S3-compatible backends. The documentation provides a validated configuration profile, including JVM options and FUSE arguments, to address potential hangs and stability issues. Feedback from the review suggests improving the examples by using generic placeholders for local paths and specifying a container image that includes the 'fio' utility to ensure the test scenarios are reproducible.

Comment thread docs/en/samples/alluxio_s3_high_concurrency.md Outdated
Comment thread docs/en/samples/alluxio_s3_high_concurrency.md Outdated
Comment thread docs/zh/samples/alluxio_s3_high_concurrency.md Outdated
Comment thread docs/zh/samples/alluxio_s3_high_concurrency.md Outdated
Signed-off-by: CAICAIIs <3360776475@qq.com>
@codecov

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.13%. Comparing base (9291dee) to head (7828644).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5874   +/-   ##
=======================================
  Coverage   59.13%   59.13%           
=======================================
  Files         480      480           
  Lines       32611    32611           
=======================================
  Hits        19284    19284           
  Misses      11759    11759           
  Partials     1568     1568           


@cheyang
Collaborator

cheyang commented May 14, 2026

/ok-to-test

@cheyang
Collaborator

cheyang commented May 14, 2026

/lgtm /approve

This is a comment-only PR (documentation addition). All required checks pass, DCO verified.
