feat: add ENTITY_ATTRIBUTES support for domain-specific entity enrich… by akshay-saraswat · Pull Request #2986 · HKUDS/LightRAG

akshay-saraswat · 2026-04-27T04:35:31Z

Allows operators to configure per-entity attribute extraction via the ENTITY_ATTRIBUTES environment variable (JSON list of attribute names), without any schema migrations or breaking changes to existing deployments.

Domain-specific LightRAG deployments often need entities enriched with context beyond name, type, and description. For example:

- A customer support knowledge graph benefits from sentiment and urgency attributes.
- A research graph benefits from confidence scores on extracted claims.

Previously this required forking the project and hand-editing the extraction prompt; there was no supported mechanism for passing an attribute list through config.

The feature is implemented as a clean opt-in extension to the existing entity-extraction pipeline:

1. The ENTITY_ATTRIBUTES env var (default: empty list) is parsed in config.py and passed through addon_params → extract_entities, the same path used by ENTITY_TYPES.
2. When non-empty, two placeholders are injected into the extraction prompts:
   - {entity_attributes}: the comma-separated attribute list
   - {entity_attributes_instruction}: a human-readable instruction string telling the LLM to append a compact single-line JSON object as a 5th field on each entity line.
3. _handle_single_entity_extraction now accepts 4 or 5 fields. The 5th field is parsed as JSON into a dict; malformed JSON is logged and the attributes are dropped, so extraction still succeeds for the entity itself.
4. Attributes are merged (last non-null value per key wins) across all extraction instances of the same entity, both in _merge_nodes_then_upsert (normal insert path) and _rebuild_single_entity (rebuild path).
5. The merged attributes dict is serialised as a JSON string and stored in the graph node under the key 'attributes'. All existing graph backends (NetworkX, Postgres, Neo4j, etc.) can round-trip it without schema changes. Consumers deserialise with json.loads.
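
As a rough illustration of the parsing step, the sketch below splits an entity line on the delimiter shown in the example and tolerates a missing or malformed 5th field. The function name parse_entity_record, the attributes_enabled flag, and the returned key names are hypothetical and chosen for this example only; they are not the actual code in lightrag/operate.py.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)

# Delimiter as it appears in the example entity line in this PR.
FIELD_DELIMITER = "<|#|>"


def parse_entity_record(line: str, attributes_enabled: bool) -> Optional[Dict[str, Any]]:
    """Hypothetical stand-in for 4-or-5-field entity parsing."""
    fields = line.split(FIELD_DELIMITER)
    if len(fields) not in (4, 5) or fields[0].strip() != "entity":
        return None  # not an entity record

    record: Dict[str, Any] = {
        "entity_name": fields[1].strip(),
        "entity_type": fields[2].strip(),
        "description": fields[3].strip(),
    }

    # The 5th field is only honoured when attributes were requested.
    if len(fields) == 5 and attributes_enabled:
        try:
            parsed = json.loads(fields[4])
        except json.JSONDecodeError:
            logger.warning("Dropping malformed attributes field: %r", fields[4])
            parsed = None
        # Only a JSON object is accepted; anything else is dropped.
        if isinstance(parsed, dict):
            record["attributes"] = parsed

    return record
```

In this sketch a bad 5th field costs only the attributes: the name, type, and description are returned either way, which mirrors the fallback described in step 3.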

Entity line with attributes:
entity<|#|>Pain Point X<|#|>painpoint<|#|>Users report...<|#|>{"sentiment":"negative","urgency":"high","confidence":0.87}

Stored node property:
"attributes": "{\"sentiment\": \"negative\", \"urgency\": \"high\", \"confidence\": 0.87}"

- ENTITY_ATTRIBUTES defaults to [] → extraction prompt is byte-for-byte identical to pre-patch behaviour.
- Parser still accepts 4-field records → existing extracted graphs are unaffected.
- 5-field records received while ENTITY_ATTRIBUTES is empty are treated as malformed: the entity is still extracted, without attributes.
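
One way to get the byte-for-byte guarantee is for the instruction builder to return an empty string when no attributes are configured. The sketch below shows that idea with made-up templates and instruction wording; PRE_PATCH_TEMPLATE, PATCHED_TEMPLATE, build_attributes_instruction, and render_prompt are all hypothetical, and the real prompts in lightrag/prompt.py are different.

```python
from typing import List

# Hypothetical templates for illustration only.
PRE_PATCH_TEMPLATE = "Extract entities of the following types: {entity_types}."
PATCHED_TEMPLATE = PRE_PATCH_TEMPLATE + "{entity_attributes_instruction}"


def build_attributes_instruction(entity_attributes: List[str]) -> str:
    """Return the extra instruction, or an empty string when nothing is configured."""
    if not entity_attributes:
        return ""
    names = ", ".join(entity_attributes)
    return (
        " For each entity, append a compact single-line JSON object "
        f"as a 5th field with these keys: {names}."
    )


def render_prompt(entity_types: List[str], entity_attributes: List[str]) -> str:
    """Fill the patched template; with no attributes the result equals the pre-patch prompt."""
    return PATCHED_TEMPLATE.format(
        entity_types=", ".join(entity_types),
        entity_attributes_instruction=build_attributes_instruction(entity_attributes),
    )
```

Because the placeholder expands to nothing in the default case, existing deployments would see no change in what is sent to the LLM under this approach.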

ENTITY_ATTRIBUTES='["sentiment","urgency","confidence"]'

Combined with ENTITY_TYPES for a customer support graph:

ENTITY_TYPES='["Person","Product","PainPoint"]'
ENTITY_ATTRIBUTES='["sentiment","urgency","confidence"]'
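
For readers wondering how such a value might be read, here is a minimal sketch of parsing the variable as a JSON list of strings with an empty-list default. The helper name read_entity_attributes and its validation behaviour are assumptions for this example; the actual handling in lightrag/api/config.py may differ.

```python
import json
import os
from typing import List, Mapping


def read_entity_attributes(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: parse ENTITY_ATTRIBUTES as a JSON list of strings."""
    raw = environ.get("ENTITY_ATTRIBUTES", "").strip()
    if not raw:
        return []  # unset or blank means the feature is off

    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(item, str) for item in value):
        raise ValueError("ENTITY_ATTRIBUTES must be a JSON list of strings")
    return value


# Example with an explicit mapping instead of the process environment.
attrs = read_entity_attributes({"ENTITY_ATTRIBUTES": '["sentiment","urgency","confidence"]'})
```

Raising on a non-list value is a design choice of this sketch; a deployment could equally log a warning and fall back to the empty default.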

Description

- Adds the ENTITY_ATTRIBUTES env var (JSON list of strings, default []), which requests extra per-entity attributes to be extracted alongside name/type/description.
- When non-empty, injects an instruction into the extraction prompt asking the LLM to append a compact single-line JSON object as a 5th field on each entity line.
- Parses the 5th field and merges it across all extraction instances of the same entity; the result is stored as a JSON string under node["attributes"].
- Zero change to default behaviour: an empty ENTITY_ATTRIBUTES produces byte-identical prompts and 4-field records as before.
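
The merge-and-store behaviour summarised above can be sketched as follows, assuming attribute dicts arrive in extraction order. The names merge_attributes and to_node_property are hypothetical and are not the functions in lightrag/operate.py.

```python
import json
from typing import Any, Dict, Iterable


def merge_attributes(instances: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge attribute dicts so the last non-null value per key wins."""
    merged: Dict[str, Any] = {}
    for attrs in instances:
        for key, value in attrs.items():
            if value is not None:  # a null never overwrites an earlier value
                merged[key] = value
    return merged


def to_node_property(merged: Dict[str, Any]) -> str:
    """Serialise the merged dict to the JSON string stored on the node."""
    return json.dumps(merged)


# Two extraction instances of the same entity.
merged = merge_attributes([
    {"sentiment": "neutral", "urgency": "high"},
    {"sentiment": "negative", "urgency": None, "confidence": 0.87},
])
stored = to_node_property(merged)
restored = json.loads(stored)  # what a consumer would do on read
```

Storing a plain string is what lets any backend that supports string properties hold the value without a schema change; the cost is that consumers must call json.loads before using it.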

Related Issues

N/A

Changes Made

| File | Change |
| --- | --- |
| lightrag/constants.py | DEFAULT_ENTITY_ATTRIBUTES = [] |
| lightrag/api/config.py | Read ENTITY_ATTRIBUTES env var; pass into args.entity_attributes |
| lightrag/api/lightrag_server.py | Pass entity_attributes through addon_params |
| lightrag/prompt.py | Document optional 5th field in system prompt; add {entity_attributes_instruction} to user prompt |
| lightrag/operate.py | Read from addon_params; build instruction string; thread through _process_extraction_result and _handle_single_entity_extraction; merge and store attributes in both insert and rebuild paths |

Checklist

Changes tested locally
Code reviewed
Documentation updated (if necessary)
Unit tests added (if applicable)

Additional Notes

N/A

akshay-saraswat marked this pull request as draft April 29, 2026 03:51

akshay-saraswat wants to merge 1 commit into HKUDS:main from Evols-AI:feat/entity-attributes
