feat: add ENTITY_ATTRIBUTES support for domain-specific entity enrich… #2986
Draft
akshay-saraswat wants to merge 1 commit into HKUDS:main
…ment
Allows operators to configure per-entity attribute extraction via the
ENTITY_ATTRIBUTES environment variable (JSON list of attribute names),
without any schema migrations or breaking changes to existing deployments.
Domain-specific LightRAG deployments often need entities enriched with
context beyond name, type, and description. For example:
- A customer support knowledge graph benefits from sentiment and urgency attributes on its entities.
- A research graph benefits from confidence scores on extracted claims.
Previously this required forking and hand-editing the extraction prompt,
because there was no supported mechanism to pass the attribute list through config.
The feature is implemented as a clean opt-in extension to the existing
entity-extraction pipeline:
1. ENTITY_ATTRIBUTES env var (default: empty list) is parsed in
config.py and passed through addon_params → extract_entities, the
same path used by ENTITY_TYPES.
2. When non-empty, two placeholders are injected into the extraction
prompts:
- {entity_attributes}: the comma-separated attribute list
- {entity_attributes_instruction}: a human-readable instruction
string telling the LLM to append a compact single-line JSON object
as a 5th field on each entity line.
3. _handle_single_entity_extraction now accepts 4 OR 5 fields. The 5th
field is parsed as JSON → dict; malformed JSON is logged and the
attributes are dropped, so extraction still succeeds for the entity itself.
4. Attributes are merged (last non-null value per key wins) across all
extraction instances of the same entity, both in _merge_nodes_then_upsert
(normal insert path) and _rebuild_single_entity (rebuild path).
5. The merged attributes dict is serialised as a JSON string and stored
in the graph node under the key 'attributes'. All existing graph
backends (NetworkX, Postgres, Neo4j, etc.) can round-trip it without
schema changes. Consumers deserialise with json.loads.
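The merge and storage rules in steps 4 and 5 can be sketched as follows. This is a minimal illustration, not the code in this PR: merge_attributes, serialise_attributes and read_attributes are hypothetical helper names, and the node is modelled as a plain dict.

```python
import json
from typing import Any, Dict, Iterable, Optional


def merge_attributes(
    instances: Iterable[Optional[Dict[str, Any]]]
) -> Dict[str, Any]:
    """Merge per-instance attribute dicts; the last non-null value per key wins."""
    merged: Dict[str, Any] = {}
    for attrs in instances:
        if not attrs:
            continue  # instance carried no attributes
        for key, value in attrs.items():
            if value is not None:
                merged[key] = value  # later non-null values overwrite earlier ones
    return merged


def serialise_attributes(attrs: Dict[str, Any]) -> str:
    """Store the merged dict as a JSON string so any backend can hold it."""
    return json.dumps(attrs)


def read_attributes(node: Dict[str, Any]) -> Dict[str, Any]:
    """Consumer side: json.loads the 'attributes' property, tolerating absence."""
    raw = node.get("attributes")
    if not raw:
        return {}
    try:
        value = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return {}
    return value if isinstance(value, dict) else {}
```

Nodes written before this change have no 'attributes' key, so the reader in this sketch returns an empty dict for them rather than raising.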
Entity line with attributes:
entity<|#|>Pain Point X<|#|>painpoint<|#|>Users report...<|#|>{"sentiment":"negative","urgency":"high","confidence":0.87}
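A line like the one above could be parsed roughly as shown here. This is a hedged sketch of the 4-or-5-field behaviour described in step 3, not the actual _handle_single_entity_extraction; the function name, the returned keys and the TUPLE_DELIMITER constant are assumptions made for illustration.

```python
import json
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)

TUPLE_DELIMITER = "<|#|>"  # delimiter as it appears in the example line


def parse_entity_record(
    line: str, entity_attributes: List[str]
) -> Optional[Dict[str, Any]]:
    """Parse one entity line with 4 fields, or 5 when attributes are present."""
    fields = line.split(TUPLE_DELIMITER)
    if len(fields) not in (4, 5) or fields[0].strip() != "entity":
        return None  # not an entity record at all

    record: Dict[str, Any] = {
        "entity_name": fields[1].strip(),
        "entity_type": fields[2].strip(),
        "description": fields[3].strip(),
        "attributes": {},
    }

    # A 5th field only counts when attribute extraction is configured.
    if len(fields) == 5 and entity_attributes:
        try:
            parsed = json.loads(fields[4])
        except json.JSONDecodeError:
            logger.warning("Dropping malformed attributes for %r", fields[1])
            parsed = None
        if isinstance(parsed, dict):
            record["attributes"] = parsed
    return record
```

In this sketch a bad or unexpected 5th field never discards the entity; only the attributes are lost.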
Stored node property:
"attributes": "{\"sentiment\": \"negative\", \"urgency\": \"high\", \"confidence\": 0.87}"
- ENTITY_ATTRIBUTES defaults to [] → extraction prompt is byte-for-byte
identical to pre-patch behaviour.
- Parser still accepts 4-field records → existing extracted graphs are
unaffected.
- When ENTITY_ATTRIBUTES is empty, a 5-field record is treated as
malformed: the entity is still extracted, without attributes.
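One way the first guarantee can hold is for both placeholders to render as empty strings when no attributes are configured. The sketch below illustrates that idea only; the helper name, the instruction wording and the toy template are hypothetical and are not taken from the real extraction prompts.

```python
from typing import Dict, List


def build_attribute_placeholders(entity_attributes: List[str]) -> Dict[str, str]:
    """Return values for {entity_attributes} and {entity_attributes_instruction}."""
    if not entity_attributes:
        # Empty strings leave the rendered prompt unchanged.
        return {"entity_attributes": "", "entity_attributes_instruction": ""}
    names = ", ".join(entity_attributes)  # separator style is an assumption
    instruction = (
        " Append a compact single-line JSON object as a 5th field on each"
        f" entity line, using these keys: {names}."
    )
    return {"entity_attributes": names, "entity_attributes_instruction": instruction}


# Toy template standing in for the real extraction prompt.
TEMPLATE = "Extract entities of types: {entity_types}.{entity_attributes_instruction}"


def render_prompt(entity_types: List[str], entity_attributes: List[str]) -> str:
    """Fill the toy template; unused keyword arguments are ignored by str.format."""
    return TEMPLATE.format(
        entity_types=", ".join(entity_types),
        **build_attribute_placeholders(entity_attributes),
    )
```

With an empty attribute list, render_prompt returns exactly the text the toy template would have produced without the extra placeholder.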
ENTITY_ATTRIBUTES='["sentiment","urgency","confidence"]'
Combined with ENTITY_TYPES for a customer support graph:
ENTITY_TYPES='["Person","Product","PainPoint"]'
ENTITY_ATTRIBUTES='["sentiment","urgency","confidence"]'
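For reference, a value in this format could be read from the environment roughly as below. This is a sketch under assumptions: parse_entity_attributes is a hypothetical helper, the actual parsing in config.py may differ, and raising on invalid input (rather than falling back to an empty list) is a choice made here for illustration.

```python
import json
import os
from typing import List, Optional


def parse_entity_attributes(raw: Optional[str]) -> List[str]:
    """Parse ENTITY_ATTRIBUTES (a JSON list of names); unset or blank means []."""
    if raw is None or not raw.strip():
        return []
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"ENTITY_ATTRIBUTES is not valid JSON: {raw!r}") from exc
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("ENTITY_ATTRIBUTES must be a JSON list of strings")
    return value


def load_entity_attributes() -> List[str]:
    """Read the variable from the process environment."""
    return parse_entity_attributes(os.environ.get("ENTITY_ATTRIBUTES"))
```

Failing loudly on a malformed value makes a typo in the deployment config visible at startup instead of silently disabling attribute extraction.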