Add concurrent normalization #3196
197045b to ccf882c
|
@rowanG077 I let Claude have a look at the issue. The reasoning it came up with sounds reasonable to me, but I haven't checked it in detail so please be careful :). The test suite passes locally. I'll test it on
Problem
The commit "Add concurrent normalization" introduced a race condition in the specialization cache. The
With concurrent normalization, multiple threads normalizing different entities (e.g.,
Since steps 1-4 weren't atomic, N threads would each increment the counter for the same logical specialization, causing the counter to reach N× the expected value and exceed the specialization limit of 20.
Fix
Two changes in
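The fix itself is cut off above, so the following is only a minimal sketch of one common way to close this kind of check-then-act race: do the lookup, the placeholder insert, and the counter increment inside a single modifyMVar. All names here (SpecState, claimSpec, specialize) are hypothetical stand-ins, not Clash's actual types or functions.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for illustration; not Clash's real data types.
type SpecKey = String

data SpecState = SpecState
  { specCache :: Map.Map SpecKey (MVar String) -- key -> slot for the eventual result
  , specCount :: Int                           -- number of specializations created
  }

-- Outcome of asking for a specialization.
data Claim
  = Owner (MVar String)  -- we inserted the placeholder and must fill it
  | Waiter (MVar String) -- another thread owns it; wait on its slot

-- | Lookup, placeholder insert, and counter bump happen in one modifyMVar,
-- so only the first caller for a given key increments the counter.
claimSpec :: MVar SpecState -> SpecKey -> IO Claim
claimSpec st key = modifyMVar st $ \s ->
  case Map.lookup key (specCache s) of
    Just slot -> pure (s, Waiter slot)
    Nothing -> do
      slot <- newEmptyMVar
      let s' = s { specCache = Map.insert key slot (specCache s)
                 , specCount = specCount s + 1
                 }
      pure (s', Owner slot)

specialize :: MVar SpecState -> SpecKey -> IO String
specialize st key = do
  claim <- claimSpec st key
  case claim of
    Waiter slot -> readMVar slot
    Owner slot -> do
      let result = key ++ "_spec" -- placeholder for the real specialization work
      putMVar slot result
      pure result

main :: IO ()
main = do
  st <- newMVar (SpecState Map.empty 0)
  dones <- replicateM 8 newEmptyMVar
  -- Eight threads request the same logical specialization concurrently.
  forM_ dones $ \d -> forkIO (specialize st "f" >>= putMVar d)
  results <- mapM takeMVar dones
  n <- specCount <$> readMVar st
  print (n, length results)
```

In this sketch the expensive work runs outside the lock; only the bookkeeping is serialized, which keeps contention on the shared state short.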
|
|
Ah that's close to what my hunch was. Interesting that it can do that. I will check in the code whether it's actually true. If true I'm surprised how good it is. |
|
Modern Claude with VSCode / terminal integration is very close to magic. |
99c384a to 80adea0
|
On bittide this does speed up Clash, but it trades it for high memory usage (5.5 GB -> 21 GB). Reducing the thread count from |
|
And this was not that much of an improvement right and requiring manual |
|
With |
|
It seems to be fundamental to how Clash normalization works: due to its lifting of binders and specialization caches |
|
Oh that is MUCH more significant than I thought |
6ce3ecd to 4ffba9b
|
I fixed a deadlock that occurs when a thread is waiting on a specialization result of another thread that never completed. I also added listening to The actual implementation looks fine to me now. |
|
Cool, I'll have a look soon. In retrospect I should have made TracedMVar for qualified use, instead of trying to be consistent with the existing MVar module. What do you think? |
I don't have really strong feelings either way. I do like that the API matches the standard |
Co-authored-by: Vanessa McHale <vamchale@gmail.com>
Co-authored-by: Alexander McKenna <alex@qbaylogic.com>
Co-authored-by: Rowan Goemans <rowan@qbaylogic.com>
When the thread responsible for creating a specialization failed after installing the cache placeholder, waiting threads could block forever on the empty result MVar.
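A minimal sketch of how such a hang can be avoided, assuming the result slot is an MVar that the owning thread is expected to fill: the owner always writes something into the slot, storing the exception on failure so that waiters wake up and rethrow it. Slot, fillSlot and awaitSlot are hypothetical names for illustration; the actual change in this PR may look different.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception

-- Hypothetical result slot: waiters block on it until the owner fills it.
type Slot a = MVar (Either SomeException a)

-- | Run the owner's work and always fill the slot, even if the work throws.
-- The work runs unmasked; filling the slot happens with async exceptions masked.
fillSlot :: Slot a -> IO a -> IO a
fillSlot slot work = mask $ \restore -> do
  r <- try (restore work)
  putMVar slot r -- the slot is empty here, so this does not block
  either throwIO pure r

-- | Wait for the owner's result and rethrow its failure, if any.
awaitSlot :: Slot a -> IO a
awaitSlot slot = readMVar slot >>= either throwIO pure

main :: IO ()
main = do
  slot <- newEmptyMVar :: IO (Slot Int)
  -- The owner fails after the placeholder slot has been created.
  _ <- forkIO $ do
    _ <- try (fillSlot slot (throwIO (userError "boom"))) :: IO (Either SomeException Int)
    pure ()
  -- The waiter is released with the owner's exception instead of blocking forever.
  r <- try (awaitSlot slot) :: IO (Either SomeException Int)
  putStrLn (either (const "waiter saw the owner's failure") show r)
```

An alternative design would remove the placeholder from the cache on failure so a later thread can retry; the sketch above only covers waking up the current waiters.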
b93c821 to 90321b2
This adds concurrent normalization by wrapping all state (fields) in MVars. Earlier patches used STM, but this didn't yield a performance benefit due to excessive contention/rollbacks. Explicit locks have all the usual drawbacks of potential deadlocks, which is why TracedMVar has been added. This can be enabled by CLASH_DEBUG_MVAR. Earlier tests on the bittide project yielded a 300% performance increase.
The feature is disabled by default, but can be enabled using -fclash-concurrent-normalization. It is enabled in the clash-testsuite.
Still TODO:
master -fclash-concurrent-normalization bittide-hardware
Hit specialization limit 20 on function `Clash.Explicit.Testbench.outputVerifierWith[454934]'. This can be triggered semi-reliably by executing cabal run clash-testsuite -- -j8 -p clash --hide-successes -p XilinxDDR.