Feat/webinar llm latencies 2#5002
Conversation
|
This pull request is automatically being deployed by Amplify Hosting (learn more). |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 838cc6f657
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| for (const dataSet of message.dataSets) { | ||
| const pending = this.pendingSyncMetrics.get(dataSet.name); | ||
|
|
||
| if (pending && dataSet.version >= pending.dataSetVersion) { |
There was a problem hiding this comment.
I think Locus team wanted us to match the received event with the sync request by using the tracking id, see this page:
https://confluence-eng-gpk2.cisco.com/conf/spaces/CAL/pages/801680811/5K+Webinar+Latencies+Tracking
I think it's important for them in cases if there is a sync storm
There was a problem hiding this comment.
I checked the captured LLM locus.state_message payloads via window.locusEvents across initial join, practice session toggle, breakout session join, and closed caption scenarios. I don’t see the original /sync request tracking id in either event.data or stateElementsMessage.
SDK can store the /sync tracking id from the HTTP response header, but to correlate it with the final LLM broadcast as requested, Locus/LLM needs to include the same tracking id in the broadcast payload, e.g. stateElementsMessage.trackingId or event.data.trackingId.
There was a problem hiding this comment.
the tracking id in LLM messages is outside event.data and outside stateElementsMessage - it is at the top level of the received event. Here's a screenshot from the network dev tools socket received messages:

Also, as far as I understand it, when we make the /sync request we may or may not receive the LLM message with the matching tracking id - it's up to Locus to decide what to do - sometimes they give us the response in the http response, sometimes they respond with 204 and send LLM messages and the LLM message may contain our sync tracking id or if there were other clients requesting sync around the same time, it may contain the tracking id of those other clients' syncs - in such case we shouldn't be sending the "client.locus.sync.complete" event.
| */ | ||
| // @ts-ignore - Fix type | ||
| this.locusInfo = new LocusInfo(this.updateMeetingObject.bind(this), this.webex, this.id); | ||
| this.locusInfo.syncMetricsCallback = (metrics: { |
There was a problem hiding this comment.
I think it would be cleaner to pass the callback in the constructor. We are already passing 1 callback in the constructor, so we could have a callbacks argument object with multiple callbacks. Same for how the callback is passed between LocusInfo and HashTreeParser class.
By requiring the callback in the constructor we can make sure it's always there so we then don't have to check for it when calling it
| breakoutMoveId: eventInfo.breakoutMoveId, | ||
| breakoutSessionId: eventInfo?.currentSession?.sessionId, | ||
| breakoutGroupId: eventInfo?.currentSession?.groupId, | ||
| ...(eventInfo?.llmWebsocketUrl && {llmWebsocketUrl: eventInfo.llmWebsocketUrl}), |
There was a problem hiding this comment.
nitpick: personally I don't like this syntax, I think it's not very readable. Wouldn't it be better like this?
const {llmWebsocketUrl, llmLatency} = eventInfo;
const payload: any = {
llmLatency,
identifiers: {
breakoutMoveId: eventInfo.breakoutMoveId,
breakoutSessionId: eventInfo?.currentSession?.sessionId,
breakoutGroupId: eventInfo?.currentSession?.groupId,
llmWebsocketUrl,
},
};
We know that eventInfo must be defined, otherwise we would have returned in line 37
I see same syntax pattern in many other places in this PR, most of them could be made simpler by having the const with the same name as the field in the created object defined earlier.
|
|
||
| dataSet.timer = setTimeout(() => { | ||
| dataSet.timer = undefined; | ||
| dataSet.lastBackoffTime = delay - dataSet.idleMs; |
There was a problem hiding this comment.
I feel that it would be cleaner to have:
const randomBackOff = this.getWeightedBackoffTime(dataSet.backoff);
const delay = dataSet.idleMs + randomBackOff;
dataSet.lastBackoffTime = randomBackOff;
a few lines above instead of doing the subtraction here. Do you agree? Also, feels a bit cleaner to be setting dataSet.lastBackoffTime before the call to setTimeout instead of inside the setTimeout callback.
| throw error; | ||
| } | ||
|
|
||
| hashtreeResponseTime = shouldCollectMetrics |
There was a problem hiding this comment.
this is not exactly the right place to measure the /hashtree response time - this should be done inside getHashesFromLocus() immediately after getting the response. Right now getHashesFromLocus doesn't do much after getting the response, but that's not guaranteed to be always the case, someone might add some async or time consuming stuff to it and then the calculation here won't give the correct result.
Same for hashtreePrepStart that's used for hashtreePrepTime calculations - it should be done inside getHashesFromLocus() immediately before calling this.webexRequest
| }); | ||
| } | ||
|
|
||
| const syncResponseReceivedAt = shouldCollectMetrics ? performance.now() : 0; |
There was a problem hiding this comment.
having all that code that calculates latencies spread out across the code and mixed in with other logic feels quite messy. It would be better to have a more generic mechanism - a class that maintains the latency values and does the calculations and in the HashTreeParser we would just call that class to tell it about various milestones being reached like "hash tree prep start/end", "hash tree request/response" etc. In fact we already have a class for doing this sort of stuff - CallDiagnosticLatencies - see for example how shareDuration is calculated for client.share.stopped event. You can add more "internal" events to InternalEvent - they are internal to SDK as the name suggests.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b82b71b8e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f15504d61e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3335dc8100
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 35ba5211a3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a4dbd7decf
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc5d4e74da
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f34a260972
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0d236cadf8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
For single-leaf datasets (leafCount === 1), preserve the heartbeat target version from the message instead of using the local version for pending sync metrics. This prevents metrics from being completed prematurely when subsequent non-root messages at the same local version arrive, before the actual remote heartbeat version broadcasts are received. Changes: - Add heartbeatVersion field to sync queue entries - Pass receivedDataSet.version from runSyncAlgorithm to enqueueSyncForDataset - Update performSync to use heartbeat version for single-leaf targetSyncVersion - This ensures pending metrics wait for the correct remote version
| assert.isDefined(result); | ||
| assert.isNumber(result.clientLLMDatachannelResponseTime); | ||
| assert.isNumber(result.clientLLMWebSocketConnectTime); | ||
| assert.isAtLeast(result.clientLLMDatachannelResponseTime, 0); |
There was a problem hiding this comment.
this check is not good enough, we can control time progression with sinon.useFakeTimers() so we can have a precise check that the returned time values are correct and also not mixed up
| } | ||
|
|
||
| const hashtreePrepTime = | ||
| this.getDiffBetweenLocusSyncTimestamps(record, 'syncStart', 'hashTreeRequest') ?? 0; |
There was a problem hiding this comment.
hashtreePrepTime and hashtreeResponseTime have a fallback to 0 while all other values don't have a fallback and we return undefined if any of them are missing - it's because the /hashtree request isn't always done during the sync, right? it would be worth adding a comment explaining that
| * @param meetingId meeting id | ||
| * @returns storage key | ||
| */ | ||
| private getLocusSyncLatencyKey(dataSetName: string, meetingId?: string) { |
There was a problem hiding this comment.
why is meetingId optional? Locus syncs can only be done for meetings and each meeting always has an id, so any code related to sending locus sync metrics should be able to have the meeting id - it should never be optional. I see this needs fixing here, but also in many other places like clearLocusSyncLatency(), getLocusSyncLatency() etc.
| * @returns whether event is Locus sync latency milestone | ||
| */ | ||
| private isLocusSyncLatencyEvent(key: MetricEventNames) { | ||
| return [ |
There was a problem hiding this comment.
instead of having a duplication of the list of locus sync events here and in the type definition for InternalEvent we could define a type for Locus sync events and use it in InternalEvent
| hashtreeResponseTime: 0, | ||
| syncPrepTime: 20, | ||
| syncResponseTime: 20, | ||
| syncMessageReceiveTime: 20, |
There was a problem hiding this comment.
all values here are the same (20) it would be better to have distinct values to make sure they are not mixed up.
| export type SyncLatencyTracker = { | ||
| saveTimestamp: (options: { | ||
| key: | ||
| | 'internal.client.locus.sync.start' |
There was a problem hiding this comment.
I mentioned in the other comment, it would be good to have a type defined for these events
| locusInfoUpdateCallback: LocusInfoUpdateCallback; | ||
| private callbacks: HashTreeParserCallbacks; | ||
| private syncLatencyTracker?: SyncLatencyTracker; | ||
| private syncLatencyMeetingId?: string; |
There was a problem hiding this comment.
like I mentioned in another comment, the meeting id should not be optional
| this.webexRequest = options.webexRequest; | ||
| this.locusInfoUpdateCallback = options.locusInfoUpdateCallback; | ||
| this.callbacks = options.callbacks; | ||
| this.syncLatencyTracker = options.callbacks.syncLatencyTracker; |
There was a problem hiding this comment.
syncLatencyTracker is already saved in this.callbacks so we should use that one
| this.handleRootHashHeartBeatMessage(message); | ||
| this.resetHeartbeatWatchdogs(message.dataSets); | ||
| } else { | ||
| this.tryCompletePendingSyncMetrics(message); |
There was a problem hiding this comment.
I'm not convinced if this is the right place to do this. Yes, HashTreeParser gathers most of the timings for the sync, but it doesn't need to be the one that's calling metrics module to send the CA event. The CA event should be sent only once we receive the LLM event with the right tracking id. HashTreeParser doesn't even have access to that tracking id, so I think the call should be done from the Meeting object, from processLocusLLMEvent() - this is where we get the LLM event and have access to the tracking id.
I think the design should be:
- HashTreeParser gathers timestamps and sets them in metrics plugin as internal event timestamps, it also passes the tracking id to it
- metrics plugin stores the timestamps (keyed by meeting id only, without dataset name, but dataset name needs to be part of the record) and calculates the latencies
- meeting object calls metrics plugin with the tracking id when it receives an LLM event, metrics plugin finds the matching record for that meeting id and tracking id. It doesn't need to check the dataset name, it could just check all records for a given meeting id and search for tracking id in them, there will only be 1 or 2 of them most of the time.
So we wouldn't need to maintain pendingSyncMetrics - the latency data is maintained in metrics plugin and HashTreeParser or Meeting object calls it with meeting id to store it and if the record for a given meeting id doesn't exist, metrics plugin creates it, if it exists it just updates it.
Also this would allow us have the locusSyncLatencies keyed by just meeting id (without data set name) and it means that this Map could be more generic, we could rename it to something like meetingLatencies and could be in the future used for other latencies that nee do be tracked separately for each meeting.
What do you think? do we need a call to discuss this in more detail?
There was a problem hiding this comment.
Thanks, I agree with the direction. I updated the flow so HashTreeParser only records internal sync timestamps into the metrics plugin. It does not submit client.locus.sync.complete.
The CA event is now sent from Meeting. In processLocusLLMEvent(), Meeting extracts the LLM tracking id, passes it to completeLocusSyncLatency(meetingId, trackingId), and sends client.locus.sync.complete only if the metrics plugin can find and complete a matching record.
The sync latency records are now maintained in the metrics plugin under meetingLatencies, keyed by meeting id. Dataset name is stored inside each record, and completion searches records by meeting id + tracking id rather than requiring dataset name. So we no longer need pendingSyncMetrics.
One nuance: there is still a callback from HashTreeParser to Meeting when /sync response arrives, but it only lets Meeting retry completion through the metrics plugin to handle event ordering. HashTreeParser still does not send the CA event.
| clearTimeout(dataSet.timer); | ||
| } | ||
|
|
||
| dataSet.lastBackoffTime = randomBackoffTime; |
There was a problem hiding this comment.
not sure if we actually need to store lastBackoffTime in HashTreeParser, we could just call metrics whenver we calculate it and store it there, we store all other timestamps there, so why treating backoff time differently and storing it here? we call saveTimestamp() for storing timestamps, we could call saveLatency() for storing the backoff
| this.compareAndUpdateFlags = {}; | ||
| this.meetingId = meetingId; | ||
| this.updateMeeting = updateMeeting; | ||
| this.updateMeeting = callbacks.updateMeeting; |
There was a problem hiding this comment.
it is a bit confusing that we're storing updateMeeting now twice: in this.callbacks and here. Probably would be better to update all the places where it's called to use this.callbacks.updateMeeting now
| // @ts-ignore | ||
| const llmWebsocketUrl = this.webex.internal.llm.getWebSocketUrl?.() || undefined; | ||
|
|
||
| breakoutEvent.onBreakoutJoinResponse( |
There was a problem hiding this comment.
shouldn't this call be inside some if to check that we are actually in breakouts? for example by checking if breakoutMoveId and/or this.breakouts?.currentBreakoutSession is defined? same for similar call in the catch block
| }; | ||
|
|
||
| if (llmWebsocketUrl) { | ||
| payload.identifiers = {llmWebsocketUrl}; |
There was a problem hiding this comment.
nitpick: we could just do:
const payload: any = {
identifiers: {
llmWebsocketUrl,
},
syncLatency: metrics.syncLatency,
llmInfo: {
dataSet: metrics.dataSet,
},
};
which would be shorter, wouldn't it?
similar stuff applies to code in sendLLMConnectMetric()
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 97b7bfffa2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| public clearTimestamps() { | ||
| this.latencyTimestamps.clear(); | ||
| this.precomputedLatencies.clear(); | ||
| this.locusSyncLatencies.clear(); |
There was a problem hiding this comment.
Avoid clearing other meetings' pending sync latencies
Because callDiagnosticLatencies is shared across the SDK instance, internal.reset.join.latencies from one meeting (for example Meeting.leave() submits it) now clears the entire locusSyncLatencies map, including records keyed for other active meetings. If meeting A is waiting for an empty-body /sync broadcast while meeting B leaves/resets join latencies, A's pendingSyncMetrics entry remains but its tracker record is gone, so completion later returns undefined and drops client.locus.sync.complete; clear only the join timestamps or the current meeting's sync record instead of the whole map.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4356f3dd5d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
|
|
||
| if (!hasLLMLatency) { |
There was a problem hiding this comment.
Do not re-emit join response before LLM latency
When automatic LLM is enabled, Breakouts.updateBreakout() still emits client.breakout-session.join.response first without llmLatency, and fresh evidence in this revision is that this no-latency branch returns true without recording the breakoutMoveId. The later updateLLMConnection() emission for the same move ID with clientLLMDatachannelResponseTime/clientLLMWebSocketConnectTime is therefore also allowed through, so successful breakout moves report duplicate join-response metrics instead of one latency-enriched response.
Useful? React with 👍 / 👎.
| const connectStart = performance.now(); | ||
|
|
||
| await this.connect(connectUrl, sessionId); | ||
|
|
There was a problem hiding this comment.
Preserve datachannel latency when websocket connect fails
If the datachannel registration succeeds but the websocket connect() rejects, this method rejects before returning the already-measured clientLLMDatachannelResponseTime; the caller’s catch path then reports both LLM connect latencies as 0. In websocket-failure scenarios this corrupts the new client.llm.connect.response failure telemetry by losing the registration response time that was actually observed.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 444d4c562f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| reason: string, | ||
| isInitialization = false | ||
| isInitialization = false, | ||
| heartbeatVersion: number | undefined = undefined |
There was a problem hiding this comment.
Update queued heartbeat sync targets
When a sync for a single-leaf LLM dataset is already queued (for example because another dataset is currently syncing), a later heartbeat for the same dataset can carry a newer heartbeatVersion, but this duplicate-queue path keeps the old entry and drops the new target. Fresh evidence in this revision is the new heartbeatVersion argument is only stored when pushing a new entry, so performSync() can record pending metrics against a stale/local version and an intervening message can complete client.locus.sync.complete before the broadcast for the newer heartbeat arrives; update the existing entry's heartbeatVersion when the duplicate represents a newer target.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fdddb7f0f3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| key: 'internal.client.locus.sync.response', | ||
| options: this.getSyncLatencyTimestampOptions(dataSet.name, undefined, trackingId), | ||
| }); | ||
| this.callbacks.syncResponseCallback?.(trackingId); |
There was a problem hiding this comment.
Defer sync-complete callback for empty sync responses
When /sync returns an empty body, this path explicitly expects the dataset update to arrive later via LLM messages, but the callback is invoked before checking for that empty-body case. Meeting.handleHashTreeSyncResponse() immediately completes and removes the latency record, synthesizing messageReceived at HTTP response time, so the later real broadcast can no longer be matched and client.locus.sync.complete reports the wrong receive latency for empty-body syncs. Only complete here when the response body itself contains the sync message, or wait for the actual message event.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 051343e373
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const syncMetrics = ( | ||
| this as any | ||
| ).webex.internal.newMetrics.callDiagnosticLatencies.completeLocusSyncLatency( | ||
| this.id, | ||
| trackingId |
There was a problem hiding this comment.
Complete sync latency only after a sync message
When a /sync response is empty and leaves a pending latency record, this unconditional completion runs after every LLM locus event, including root-hash heartbeats or other events that did not record internal.client.locus.sync.message.received. completeLocusSyncLatency() then falls back to the sole completable record and fills messageReceived with the current time, so client.locus.sync.complete can be emitted before the actual sync broadcast arrives and the real receive latency is lost. Gate this call on the parser actually matching a sync message/dataset or a tracking id match.
Useful? React with 👍 / 👎.
This pull request addresses
https://jira-eng-gpk2.cisco.com/jira/browse/SPARK-774846
by making the following changes
added latency telemetry for webinar LLM connect and Locus sync flows, including timing collection, websocket/dataset context in payloads, callback wiring from parser to meeting.
Change Type
The following scenarios were tested
< ENUMERATE TESTS PERFORMED, WHETHER MANUAL OR AUTOMATED >
The GAI Coding Policy And Copyright Annotation Best Practices
I certified that
Make sure to have followed the contributing guidelines before submitting.