Add infosetWalkerMode tunable for streaming and non-streaming modes
#1676
d1c2bdf
06f3427
e371216
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -212,11 +212,6 @@ abstract class SequenceParserBase( | |
| // should not increment the group index. | ||
| pstate.mpstate.moveOverOneGroupIndexOnly() | ||
| } | ||
|
|
||
| // we might have added a new instance to the array. Attempt to project it to an | ||
| // infoset if there are no PoU's or anything blocking it | ||
| pstate.walker.walk() | ||
|
|
||
|
Member
I think we still want these
||
| } // end while for each repeat | ||
| parser.endArray(pstate) | ||
| parser.arrayCompleteChecks(pstate, resultOfTry, priorResultOfTry) | ||
|
|
@@ -312,10 +307,6 @@ abstract class SequenceParserBase( | |
| } // end case scalarParser | ||
| } // end match case parser | ||
|
|
||
| // we finished parsing one whole thing (scalar element, entire array, etc). Attempt to | ||
| // project it to an infoset if there are no PoU's or anything blocking it | ||
| pstate.walker.walk() | ||
|
|
||
| scpIndex += 1 | ||
|
|
||
| } // end while for each sequence child parser | ||
|
|
@@ -330,10 +321,6 @@ abstract class SequenceParserBase( | |
| // that we incremented above. This will allow the infoset walker to walk | ||
| // into the new children that are now in the correct order. | ||
| pstate.infoset.infosetWalkerBlockCount -= 1 | ||
|
|
||
| // we've unblocked the unordered sequence, try walking to output | ||
| // everything we've created | ||
| pstate.walker.walk() | ||
| } | ||
|
|
||
| if (child ne null) child.sequenceCompleteChecks(pstate, resultOfTry, priorResultOfTry) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -239,6 +239,17 @@ | |
| </xs:restriction> | ||
| </xs:simpleType> | ||
| </xs:element> | ||
| <xs:element name="infosetWalkerMode" type="daf:TunableInfosetWalkerMode" default="nonStreaming" minOccurs="0"> | ||
| <xs:annotation> | ||
| <xs:documentation> | ||
| Daffodil can periodically walk the internal infoset to send events to the configured | ||
| InfosetOutputter (streaming) or it can walk the internal infoset once at the end of | ||
| parsing (nonStreaming). The idea being that simple schemas would benefit from the | ||
| nonStreaming infoset walker, while more complex schemas with lots of points of | ||
| uncertainty would benefit from the streaming infoset walker. | |
|
Member
I think schemas with PoUs are actually also likely to benefit from non-streaming mode. PoUs tend to leave the infoset walker with no work to do, because the walker can't walk into something while there's a PoU where we might backtrack and create a different infoset. So streaming with lots of PoUs is likely to just lead to attempts to walk that don't do anything.

I think the main situation where someone would want to use streaming mode is when the infoset is likely to be very large or when memory is constrained. The main benefit of streaming mode is that it allows parts of the internal infoset that we know won't change to be sent to the outputter and garbage collected in the middle of a parse, which frees up memory while parsing. But if there is no real memory pressure, I imagine in most cases non-streaming will be faster or the same.

I don't think we need to mention these details, just giving some background.
||
| </xs:documentation> | ||
| </xs:annotation> | ||
| </xs:element> | ||
| <xs:element name="infosetWalkerSkipMin" default="32" minOccurs="0"> | ||
| <xs:annotation> | ||
| <xs:documentation> | ||
|
|
@@ -780,6 +791,13 @@ | |
| </xs:list> | ||
| </xs:simpleType> | ||
|
|
||
| <xs:simpleType name="TunableInfosetWalkerMode"> | ||
| <xs:restriction base="xs:string"> | ||
| <xs:enumeration value="streaming" /> | ||
| <xs:enumeration value="nonStreaming" /> | ||
| </xs:restriction> | ||
| </xs:simpleType> | ||
|
|
||
| <xs:element name="dfdlConfig"> | ||
| <xs:complexType> | ||
| <xs:sequence> | ||
|
|
||
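The trade-off between the two `infosetWalkerMode` values discussed in the review comment above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyInfoset`, `WalkerMode`, and `WalkerModeDemo` are hypothetical names invented for illustration and are not Daffodil's classes. The only idea borrowed from the diff is a block count that stops the walker while a PoU might still backtrack.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy model; these are NOT Daffodil's real classes.
sealed trait WalkerMode
case object Streaming extends WalkerMode
case object NonStreaming extends WalkerMode

final class ToyInfoset(mode: WalkerMode) {
  private val pending = ArrayBuffer.empty[String] // nodes still held in memory
  val emitted = ArrayBuffer.empty[String]         // events sent to the "outputter"
  var blockCount = 0   // > 0 while a PoU could still backtrack
  var peakPending = 0  // most nodes held in memory at any one time
  var walkAttempts = 0 // how many times walk() was called

  // Add a node; in streaming mode, try to project it right away.
  def add(node: String): Unit = {
    pending += node
    peakPending = math.max(peakPending, pending.size)
    if (mode == Streaming) walk()
  }

  // Emit and release everything, unless something is blocking the walker.
  def walk(): Unit = {
    walkAttempts += 1
    if (blockCount == 0) {
      emitted ++= pending
      pending.clear()
    }
  }

  // End of parse: both modes do one final walk.
  def finish(): Unit = walk()
}

object WalkerModeDemo {
  // Parse five items, optionally with a PoU blocking the walker throughout.
  def run(mode: WalkerMode, blocked: Boolean): ToyInfoset = {
    val infoset = new ToyInfoset(mode)
    if (blocked) infoset.blockCount += 1
    (1 to 5).foreach(i => infoset.add(s"item$i"))
    if (blocked) infoset.blockCount -= 1
    infoset.finish()
    infoset
  }
}
```

In this model, streaming without a block keeps at most one node in memory, while streaming under a block holds all five nodes and still pays for every walk attempt. That matches the reviewer's point that streaming mainly helps when memory is the constraint and PoUs are not blocking the walker.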
Can we add a comment documenting that this is an alternative to using the InfosetWalker and that the two are not compatible? A single parse must call either InfosetWalker.walk or DINode.walk, never both.
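Beyond a code comment, the "one or the other" rule could also be enforced at runtime. The following is a minimal sketch of such a guard, assuming a per-parse object that records which walking style was used first. `WalkStyle`, `ViaInfosetWalker`, `ViaDINodeWalk`, and `WalkGuard` are hypothetical names and are not part of Daffodil or of this PR.

```scala
// Hypothetical guard; none of these names exist in Daffodil.
sealed trait WalkStyle
case object ViaInfosetWalker extends WalkStyle // stands in for InfosetWalker.walk
case object ViaDINodeWalk extends WalkStyle    // stands in for DINode.walk

// One instance per parse. Remembers the first walking style used and
// rejects any later attempt to use the other one.
final class WalkGuard {
  private var used: Option[WalkStyle] = None

  def claim(style: WalkStyle): Unit = used match {
    case Some(other) if other != style =>
      throw new IllegalStateException(
        s"this parse already used $other; it cannot also use $style"
      )
    case _ =>
      used = Some(style)
  }
}
```

Each walk entry point would call `claim` with its own style before doing any work, so repeated calls of the same style pass and a mixed call fails fast instead of producing a corrupted infoset.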