diff --git a/docs/content.zh/docs/ops/state/state_backends.md b/docs/content.zh/docs/ops/state/state_backends.md
index 685fe236cd7ab..3bd98b209200a 100644
--- a/docs/content.zh/docs/ops/state/state_backends.md
+++ b/docs/content.zh/docs/ops/state/state_backends.md
@@ -56,10 +56,15 @@ Flink 内置了以下这些开箱即用的 state backends :
 
 在 *HashMapStateBackend* 内部,数据以 Java 对象的形式存储在堆中。 Key/value 形式的状态和窗口算子会持有一个 hash table,其中存储着状态值、触发器。
 
+HashMapStateBackend 的局限:
+
+  - 状态大小受限于 TaskManager 上可用的 JVM 堆内存。从 checkpoint 或 savepoint 恢复时,每个 TaskManager 必须有足够的堆内存来容纳其分配到的状态。
+  - 仅支持全量快照,不支持增量 checkpoint。每次 checkpoint 都会捕获完整的状态,随着状态规模增长,checkpoint 时长和恢复时间也会随之延长。
+
 HashMapStateBackend 的适用场景:
 
-  - 有较大 state,较长 window 和较大 key/value 状态的 Job。
-  - 所有的高可用场景。
+  - 状态可以完全放入 TaskManager JVM 堆内存的 Job,需要快速、基于内存的状态访问。
+  - 对延迟敏感、希望避免每次状态访问都进行序列化/反序列化开销的 Job。
 
 建议同时将 [managed memory]({{< ref "docs/deployment/memory/mem_setup_tm" >}}#managed-memory) 设为0,以保证将最大限度的内存分配给 JVM 上的用户代码。
 
diff --git a/docs/content/docs/ops/state/state_backends.md b/docs/content/docs/ops/state/state_backends.md
index 6f04e495882f4..52b6d2d0a541b 100644
--- a/docs/content/docs/ops/state/state_backends.md
+++ b/docs/content/docs/ops/state/state_backends.md
@@ -53,10 +53,15 @@ If nothing else is configured, the system will use the HashMapStateBackend.
 The *HashMapStateBackend* holds data internally as objects on the Java heap. Key/value state and window operators hold hash tables
 that store the values, triggers, etc.
 
+Limitations of the HashMapStateBackend:
+
+  - State size is bounded by the JVM heap available on the TaskManagers. Restoring a checkpoint or savepoint requires that each TaskManager has enough heap to hold its share of the state.
+  - Only full snapshots are supported as incremental checkpoints are not available. Every checkpoint captures the complete state, which can lengthen checkpoint duration and recovery time as state grows.
+
 The HashMapStateBackend is encouraged for:
 
-  - Jobs with large state, long windows, large key/value states.
-  - All high-availability setups.
+  - Jobs whose state fits comfortably in the JVM heap of the TaskManagers, where fast, in-memory state access is the priority.
+  - Jobs with low-latency requirements that benefit from avoiding de-/serialization on every state access.
 
 It is also recommended to set [managed memory]({{< ref "docs/deployment/memory/mem_setup_tm" >}}#managed-memory) to zero.
 This will ensure that the maximum amount of memory is allocated for user code on the JVM.
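
Reviewer note: the heap-bound limitation in the added text ("each TaskManager has enough heap to hold its share of the state") can be made concrete with a rough sizing check. The sketch below is only an illustration and not part of Flink: the class `HeapFitSketch`, its method names, the even split of state across TaskManagers, the object-overhead factor, and the usable-heap fraction are all assumptions chosen for the example.

```java
// Hypothetical helper, NOT a Flink API: a back-of-the-envelope check of whether
// a job's keyed state is likely to fit on the TaskManager heap.
final class HeapFitSketch {

    private HeapFitSketch() {}

    /** Even share of the total state per TaskManager (rounded up); assumes no key skew. */
    public static long stateSharePerTaskManager(long totalStateBytes, int taskManagers) {
        if (taskManagers <= 0) {
            throw new IllegalArgumentException("taskManagers must be positive");
        }
        return (totalStateBytes + taskManagers - 1) / taskManagers;
    }

    /**
     * Returns true if one TaskManager's share of the state, inflated by an assumed
     * on-heap object overhead factor, fits into the assumed usable fraction of its heap.
     */
    public static boolean fitsInHeap(
            long totalStateBytes,
            int taskManagers,
            long heapBytesPerTaskManager,
            double objectOverheadFactor, // assumption: on-heap objects are larger than serialized bytes
            double usableHeapFraction) { // assumption: the rest of the heap is left for user code and GC headroom
        long share = stateSharePerTaskManager(totalStateBytes, taskManagers);
        double estimatedOnHeapBytes = share * objectOverheadFactor;
        return estimatedOnHeapBytes <= heapBytesPerTaskManager * usableHeapFraction;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        // 24 GiB of state over 8 TaskManagers with 16 GiB heap each.
        System.out.println(fitsInHeap(24 * gib, 8, 16 * gib, 2.0, 0.5));
        // 400 GiB of state on the same cluster.
        System.out.println(fitsInHeap(400 * gib, 8, 16 * gib, 2.0, 0.5));
    }
}
```

The factor and fraction are placeholders, so real deployments would need measured values; the point is only that the check is per TaskManager and has to hold at restore time as well as during normal operation.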