9 changes: 7 additions & 2 deletions docs/content.zh/docs/ops/state/state_backends.md
@@ -56,10 +56,15 @@ Flink ships with the following state backends out of the box:

Inside the *HashMapStateBackend*, data is stored as Java objects on the heap. Key/value state and window operators hold a hash table that stores the state values, triggers, etc.

Limitations of the HashMapStateBackend:

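- State size is bounded by the JVM heap memory available on the TaskManagers. When restoring from a checkpoint or savepoint, each TaskManager must have enough heap memory to hold the state assigned to it.
- Only full snapshots are supported; incremental checkpoints are not available. Every checkpoint captures the complete state, so checkpoint duration and recovery time grow as the state grows.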
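To make the heap-sizing limitation concrete, the following Java sketch estimates whether each TaskManager's share of the state fits in its heap. It is a back-of-the-envelope model, not a Flink API: the class `HeapStateSizing`, its methods, and the headroom fraction are hypothetical, and it assumes state is spread evenly across TaskManagers, which key skew can break.

```java
import java.util.Locale;

/**
 * Hypothetical back-of-the-envelope check (not a Flink API): does each
 * TaskManager's share of the state fit in its JVM heap during restore?
 * Assumes the state is spread evenly across TaskManagers; real key-group
 * assignment and data skew may not give you an even split.
 */
class HeapStateSizing {

    /** Bytes of state each TaskManager must hold, assuming an even split. */
    static long perTaskManagerStateBytes(long totalStateBytes, int numTaskManagers) {
        if (numTaskManagers <= 0) {
            throw new IllegalArgumentException("numTaskManagers must be positive");
        }
        // Round up: the most loaded TaskManager holds at least the ceiling of the average.
        return (totalStateBytes + numTaskManagers - 1) / numTaskManagers;
    }

    /**
     * True if the per-TaskManager share fits in the task heap while leaving the
     * given fraction of the heap free for user code, buffers, and GC headroom.
     */
    static boolean fitsInHeap(long totalStateBytes, int numTaskManagers,
                              long taskHeapBytes, double headroomFraction) {
        long usableHeap = (long) (taskHeapBytes * (1.0 - headroomFraction));
        return perTaskManagerStateBytes(totalStateBytes, numTaskManagers) <= usableHeap;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // Illustrative inputs: 20 GiB of state, 4 TaskManagers, 8 GiB task heap each,
        // and 30% of the heap reserved as headroom.
        long share = perTaskManagerStateBytes(20 * gib, 4);
        boolean ok = fitsInHeap(20 * gib, 4, 8 * gib, 0.3);
        System.out.printf(Locale.ROOT, "share per TaskManager = %.1f GiB, fits = %b%n",
                (double) share / gib, ok);
    }
}
```

With these illustrative inputs each TaskManager holds about 5 GiB, which fits in the 5.6 GiB left after headroom; halving the TaskManager count doubles the share to 10 GiB, which does not.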

The HashMapStateBackend is suitable for:

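- Jobs with large state, long windows, and large key/value states.
- All high-availability setups.
- Jobs whose state fits entirely in the TaskManager JVM heap and that need fast, in-memory state access.
- Latency-sensitive jobs that want to avoid the serialization/deserialization overhead of every state access.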

It is also recommended to set [managed memory]({{< ref "docs/deployment/memory/mem_setup_tm" >}}#managed-memory) to zero, to ensure that the maximum amount of memory is allocated to user code on the JVM.

9 changes: 7 additions & 2 deletions docs/content/docs/ops/state/state_backends.md
@@ -53,10 +53,15 @@ If nothing else is configured, the system will use the HashMapStateBackend.
The *HashMapStateBackend* holds data internally as objects on the Java heap. Key/value state and window operators hold hash tables
that store the values, triggers, etc.
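A minimal toy model of what "objects on the Java heap" means for keyed state, assuming one hash table per state that maps each key to a live Java object. The class `HeapKeyedStateModel` is hypothetical and is not Flink's internal implementation; it only shows that reads and writes pass object references around without any serialization.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model (not Flink's implementation) of heap-based keyed state:
 * one hash table per state, mapping each key to a live Java object.
 */
class HeapKeyedStateModel<K, V> {
    private final Map<K, V> table = new HashMap<>();

    /** Returns the stored object itself; nothing is deserialized. */
    V value(K key) {
        return table.get(key);
    }

    /** Stores the object reference; nothing is serialized. */
    void update(K key, V value) {
        table.put(key, value);
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        HeapKeyedStateModel<String, StringBuilder> state = new HeapKeyedStateModel<>();
        StringBuilder events = new StringBuilder("a");
        state.update("user-1", events);

        // A read hands back the very same object that was stored.
        System.out.println(state.value("user-1") == events); // true

        // Because the value is a live heap object, mutating it changes the stored state.
        state.value("user-1").append("b");
        System.out.println(state.value("user-1")); // ab
    }
}
```

One consequence in this model is that a value read from the table is the stored object itself, so mutating it changes the state without a separate `update` call.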

Limitations of the HashMapStateBackend:

- State size is bounded by the JVM heap available on the TaskManagers. Restoring a checkpoint or savepoint requires that each TaskManager has enough heap to hold its share of the state.
- Only full snapshots are supported; incremental checkpoints are not available. Every checkpoint captures the complete state, which can lengthen checkpoint duration and recovery time as state grows.
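The cost of full snapshots can be illustrated with a small Java model. `FullSnapshotModel` is hypothetical and only assumes that a checkpoint copies every entry of an in-memory map; it also counts how many entries changed since the previous checkpoint, which is roughly what a delta-based scheme would need to write. It does not model how any real incremental checkpointing is implemented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model (not Flink code) showing why full-snapshot cost tracks
 * total state size rather than the amount of state changed between checkpoints.
 */
class FullSnapshotModel {
    private final Map<String, Long> state = new HashMap<>();
    // Keys written since the last checkpoint, tracked only for comparison.
    private final Set<String> changedSinceLastCheckpoint = new HashSet<>();

    void put(String key, long value) {
        state.put(key, value);
        changedSinceLastCheckpoint.add(key);
    }

    /** Full snapshot: every checkpoint copies the complete state. */
    Map<String, Long> fullSnapshot() {
        Map<String, Long> copy = new HashMap<>(state);
        changedSinceLastCheckpoint.clear();
        return copy;
    }

    /** Entries a delta-based scheme would have had to write for the next checkpoint. */
    int changedEntries() {
        return changedSinceLastCheckpoint.size();
    }

    public static void main(String[] args) {
        FullSnapshotModel model = new FullSnapshotModel();
        for (int i = 0; i < 1_000; i++) {
            model.put("key-" + i, i);
        }
        model.fullSnapshot(); // first checkpoint

        model.put("key-7", 42L); // a single update between checkpoints
        int changed = model.changedEntries();
        int written = model.fullSnapshot().size();
        System.out.println("changed since last checkpoint: " + changed);     // 1
        System.out.println("entries written by full snapshot: " + written);  // 1000
    }
}
```

After one key changes in a 1,000-entry state, the model's full snapshot still copies all 1,000 entries, so checkpoint work grows with total state size rather than with the amount of change.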

The HashMapStateBackend is encouraged for:

- Jobs with large state, long windows, and large key/value states.
- All high-availability setups.
- Jobs whose state fits comfortably in the JVM heap of the TaskManagers, where fast, in-memory state access is the priority.
- Jobs with low-latency requirements that benefit from avoiding de-/serialization on every state access.

It is also recommended to set [managed memory]({{< ref "docs/deployment/memory/mem_setup_tm" >}}#managed-memory) to zero.
This will ensure that the maximum amount of memory is allocated for user code on the JVM.
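Why zero managed memory helps can be seen with a simplified memory-budget calculation. The sketch assumes the total Flink memory of a TaskManager is fixed and the task heap is not configured explicitly, so the task heap receives whatever the other components leave over. `ManagedMemoryBudget` is hypothetical, and the component sizes in `main` are illustrative inputs rather than the defaults of any particular Flink release.

```java
/**
 * Hypothetical, simplified budget: with total Flink memory fixed and no explicit
 * task heap size, the task heap gets what remains after the other components.
 * All sizes are in MiB and are caller-supplied illustrative values.
 */
class ManagedMemoryBudget {

    static long taskHeapMiB(long totalFlinkMiB, long frameworkHeapMiB, long frameworkOffHeapMiB,
                            long taskOffHeapMiB, long networkMiB, double managedFraction) {
        // Managed memory modeled as a fraction of total Flink memory.
        long managedMiB = Math.round(totalFlinkMiB * managedFraction);
        long taskHeap = totalFlinkMiB - frameworkHeapMiB - frameworkOffHeapMiB
                - taskOffHeapMiB - networkMiB - managedMiB;
        if (taskHeap < 0) {
            throw new IllegalArgumentException("components exceed total Flink memory");
        }
        return taskHeap;
    }

    public static void main(String[] args) {
        // Illustrative: 5000 MiB total Flink memory, 128 MiB framework heap and off-heap,
        // no task off-heap memory, 500 MiB network memory.
        long withManaged = taskHeapMiB(5000, 128, 128, 0, 500, 0.4);
        long withoutManaged = taskHeapMiB(5000, 128, 128, 0, 500, 0.0);
        System.out.println("task heap with 40% managed memory: " + withManaged + " MiB");    // 2244
        System.out.println("task heap with zero managed memory: " + withoutManaged + " MiB"); // 4244
    }
}
```

With these inputs, dropping managed memory from 40% to zero grows the task heap from 2244 MiB to 4244 MiB. In a real deployment, managed memory is controlled through options such as `taskmanager.memory.managed.size` or `taskmanager.memory.managed.fraction`; the TaskManager memory setup page describes how the other components are derived.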