Pinning parameters #56
base: master
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,122 @@ | ||
| - Feature Name: pin-parameters | ||
| - Start Date: 2025-05-05 | ||
| - RFC PR: | ||
| - Stan Issue: | ||
|
|
||
| # Summary | ||
| [summary]: #summary | ||
|
|
||
| It is often useful to be able to 'pin' the otherwise free parameters in a statistical model to specific values. First, this is useful when debugging a statistical model to diagnose computational problems or understand the priors. Second, it is useful when one wishes to explore a simpler model that is nested inside an extended one, e.g., the mu = 0 no-effect model that is nested inside the mu != 0 model of a new effect of size mu. This proposal makes pinning straightforward at runtime. | ||
|
|
||
| # Motivation | ||
| [motivation]: #motivation | ||
|
|
||
| At present, to pin a parameter a Stan model must be rewritten. We must either: | ||
|
|
||
| - move a parameter from the parameter block to the data block, where it is pinned to a fixed value | ||
| - add convoluted logic so that a boolean (more precisely an `int<lower=0, upper=1>`) in the data block can control whether a parameter is pinned, e.g., (from [here](https://discourse.mc-stan.org/t/fixing-parameters-in-a-model/39035/4?u=andrewfowlie)) | ||
| ``` | ||
| data { | ||
|   int<lower=0, upper=1> mu_is_data; | ||
|   array[mu_is_data] real mu_val; | ||
|   ... | ||
| } | ||
| parameters { | ||
|   array[1 - mu_is_data] real mu_param; | ||
|   ... | ||
| } | ||
| transformed parameters { | ||
|   real mu = mu_is_data ? mu_val[1] : mu_param[1]; | ||
|   ... | ||
| } | ||
| ``` | ||
|
|
||
| These are inelegant and unsatisfactory, partly as they require rewriting and recompiling a model. | ||
|
Member
The example above does not require recompilation. And it's only rewriting if you wrote it once without anticipating the pinning. Instead, I'd just mention that this is a very clunky pattern to code and obfuscates the inherent generative structure of the model.
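To illustrate the point about recompilation: with the workaround above, switching between the pinned and free versions of `mu` only needs a different data file. The following is a minimal Python sketch, assuming the data block contains just the two variables shown (the model's other data are elided in the original and omitted here). The helper name `make_data` is hypothetical.

```python
import json


def make_data(pin_mu, value=0.0):
    """Build the data dictionary for the workaround model.

    When pin_mu is True, mu_val has one element and mu_param is empty;
    when False, mu_val is empty and mu is sampled as a parameter.
    """
    return {
        "mu_is_data": 1 if pin_mu else 0,
        "mu_val": [value] if pin_mu else [],
    }


# Two data files for the same compiled model: one pins mu at 0, one frees it.
pinned_json = json.dumps(make_data(True, 0.0))
free_json = json.dumps(make_data(False))
print(pinned_json)
print(free_json)
```

The same compiled executable can be run with either file, which is why the clunkiness of the pattern, rather than recompilation, is the stronger motivation.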
||
|
|
||
| # Guide-level explanation | ||
| [guide-level-explanation]: #guide-level-explanation | ||
|
|
||
| The following section is a draft of the docs for the new `pin` keyword-value pair in `cmdstan` command-line options that would appear [here](https://mc-stan.org/docs/cmdstan-guide/command_line_options.html). We don't at this stage make any proposal for how this feature would be propagated to other Stan interfaces, though we do not anticipate any difficulties. | ||
|
|
||
| ## Command-Line Interface Overview | ||
|
|
||
| ... | ||
|
|
||
| - `pin` - specifies values for any parameters that should be pinned, if any | ||
|
Member
CmdStan is very particular about the hierarchical structure of arguments, so this needs to be clarified in terms of where it is possible. Which interfaces will it apply to?
Each of these has its own particular argument structure. For example, here's sampling's default config hierarchy into which I would suggest making it parallel to
||
|
|
||
| ... | ||
|
|
||
| ### Pin model parameters argument | ||
|
Member
This section needs to make it clear that variables are going to be pinned or not pinned---they can't be partially pinned. Suppose I have the following program.
vector[10] a;
I can pin all 10 components of
Member
It's also worth noting that this restriction is primarily due to the lack of a good design, NOT due to mathematical concerns like the restriction on constrained parameters. If we could agree on what the specification for the JSON file that only specifies
Member
It's fine mathematically to pin a constrained parameter. It's just too challenging on the Stan side to evaluate the implied constraints on remaining parameters and whether the result is even consistent or expressible in Stan. In the same way, pinning just two values is more of a problem in terms of rewriting the Stan model than in terms of math. If we pin just two values, it's like this: I don't know an equivalently simple workaround to the one @andrewfowlie listed in the original proposal for fully pinned parameters.
||
|
|
||
| Parameters defined in the parameters block can be 'pinned' to specific values. This is useful when debugging a model or exploring a simpler model that is nested inside an extended one. | ||
|
|
||
| By default, no parameters are pinned. The pinned parameters are read from an input data file in JSON format using the syntax: | ||
| ``` | ||
| pin=<filepath> | ||
| ``` | ||
| The value must be a filepath to a JSON file containing pinned values for some or all of the parameters in the parameters block. | ||
|
Member
We need to mention that this is going to be the same JSON format used for other Stan files and provide a pointer: https://mc-stan.org/docs/cmdstan-guide/json_apdx.html#creating-json-files
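As an illustration of what such a file might contain, here is a short Python sketch that writes a pin file for a hypothetical model with unconstrained parameters `mu` (a scalar) and `beta` (a 3-vector, pinned in full because the proposal does not allow partial pinning). The file name `pin.json` and the parameter names are assumptions. The `pin=<filepath>` argument is the one proposed in this RFC and is not part of released CmdStan.

```python
import json
import os
import tempfile

# Hypothetical pinned values; names must match the parameters block.
pinned = {
    "mu": 0.0,                 # scalar parameter
    "beta": [0.5, -1.0, 2.0],  # vectors are pinned in full, never partially
}

# Write the file in the same JSON layout used for Stan data and inits.
pin_path = os.path.join(tempfile.mkdtemp(), "pin.json")
with open(pin_path, "w") as f:
    json.dump(pinned, f, indent=2)

# Under this proposal the file would then be passed as pin=<filepath>.
# Read it back to confirm that it round-trips.
with open(pin_path) as f:
    loaded = json.load(f)
```

Parameters that are absent from the file would remain free, matching the "some or all" wording in the draft docs.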
||
|
|
||
| At present, there are two restrictions on parameters that can be pinned: | ||
|
|
||
| - you cannot pin a subset of elements of a vector; all elements must be pinned | ||
|
Member
I'd number these given that it's preceded by "there are two". They are also full sentences, so should be capitalized and ended with a period. Then you don't need the This is all just super-nitpicky style stuff that doesn't really affect the spec itself.
||
| - you cannot pin constrained parameters | ||
|
Member
You want to motivate why here so that the reader understands the restrictions. I think we can generalize a bit here so that you can pin constrained parameters that do not depend on any other parameters? For example, consider this example.
parameters {
real a;
real<lower = a> b;
}
We should run this by @WardBrian to see what the constraints are on the compiler side.
Member
I think an even more general thing we could allow is to pin any parameter on the unconstrained scale. Of course, we don't need to allow this explicitly, because a user could always write
parameters {
real a;
real b_raw;
}
transformed parameters {
real b = lower_bound_jacobian(b_raw, a);
}
and pin We could try to compute the dataflow of which things you are allowed to pin or not (so that you could pin
Member
Are you suggesting not allowing any constrained parameters to be pinned? If we can't pin scale parameters, it's going to eliminate the main use case put forward by @andrewgelman, which is pinning scale parameters in hierarchical models. What this would wind up doing is forcing users to define log scale parameters, pin those, and do their own transforms.
Member
My preference would be to disallow them entirely at first, yes. Even if we did allow simple cases like The fact that the built in transforms are available as functions now decreases some of the pain of this disallowing, since you can still get the same result if you don't mind the extra variable and mandatory unconstrained representation for the pin value
Pinned values should be on the constrained scale, just like custom initial values already are. Unconstrained scale is an implementation detail that users don't need to know about. As for the C++ implementation, I think pinning must be handled in the model constructor, not
Member
Yeah, that's a real downside. I think we could minimize the overall code changes if we updated a few of the helpers like
That would come at the cost of some pretty serious branching on every parameter read, right? Because we won't know at compile time which will be which. (which also means we won't be able to store them as typed entries in the model class, but need to do another round of deserialization somehow, unless we use something like std::optional) Obviously the compiler will need to change either way, since sorting it out in the algorithmic layer would require some extra metadata or helper functions for those offsets into the parameter vector, but I think the fact that it leaves
Yes, I'm imagining code that looks like
if (param_pinned__.has_value) {
param = param_pinned__.value;
} else {
param = in__.read();
}
for every parameter (whose constraints don't depend on other parameters.) (and, yes, that's
It leaves the API of log_prob unchanged either way, the question is on which side of BridgeStan do you want to implement pinning?
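The branching read described above can be mimicked in a few lines of Python. This is only a sketch of the idea, not Stan's generated C++: the `Reader` class and `read_params` function are hypothetical stand-ins for the deserializer and for generated model code, and all parameters are assumed to be unconstrained scalars.

```python
class Reader:
    """Sequentially reads values from the free-parameter vector."""

    def __init__(self, values):
        self._values = list(values)
        self._pos = 0

    def read(self):
        value = self._values[self._pos]
        self._pos += 1
        return value


def read_params(names, free_values, pinned):
    """Return {name: value}, taking pinned values where given.

    One branch per parameter: a pinned parameter consumes nothing from
    the free vector, so that vector is shorter than len(names).
    """
    reader = Reader(free_values)
    params = {}
    for name in names:
        if name in pinned:
            params[name] = pinned[name]
        else:
            params[name] = reader.read()
    return params


# Three declared parameters, one pinned, so only two free values are read.
params = read_params(["a", "b", "c"], free_values=[1.5, -2.0], pinned={"b": 0.0})
```

In this version the sampler only ever sees the two free coordinates, which is the sense in which the pinning lives inside the model rather than in the algorithm.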
Member
log_prob's API would be the same, but doing it in the algorithms would mean the entire code gen of log_prob is the same, with no extra branches. Because you'd get one branch per parameter name, I'm also hesitant to assume the branch predictor will do particularly well with them, even though they'd be basically constant for a given model run. I also don't see how you could extend this to partially specifying a vector, for example, but a mask-in-the-algorithm approach ultimately doesn't care about that.
That question is actually what motivated my current thinking, because I realized someone working with BridgeStan to write their own sampler already could do pinning by basically masking the gradient you get back from log prob. If you precompute that mask, you can even avoid any branching (by doing a bunch of multiplies instead). So, my answer would be "outside". That has some direct benefits, like allowing a BridgeStan user who wants to experiment with pinning to instantiate only one copy of their model/data to try both pinned and unpinned options, but it is also the option that requires the fewest changes for us in BridgeStan (adding another option to the constructor would require either a new overload or a major version bump of our C API).
You add a new deserializer method that takes a mask in addition to other constraints.
Member
This is getting way into the weeds of implementation, but I was imagining what would happen is that we would do a bind in the log density function to generate a new log density function, so that the inference systems (sampling, VI, optimization) don't need to know anything about it. This is what @WardBrian is suggesting for BridgeStan, for example. That does leave open how to handle the equivalent of
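The "mask outside the model" idea from this thread, combined with binding pins into a new log density function, could look roughly like the Python sketch below. It uses a toy standard-normal density in place of a real model; `std_normal` and `bind_pins` are hypothetical names, and the pinned values are assumed to be given on the unconstrained scale, since that is the scale the wrapped function works on.

```python
def std_normal(theta):
    """Toy log density and gradient: independent standard normals."""
    lp = -0.5 * sum(t * t for t in theta)
    grad = [-t for t in theta]
    return lp, grad


def bind_pins(log_density_gradient, pin_values, mask):
    """Wrap a log density so that pinned coordinates stay fixed.

    mask[i] is 1.0 for a free coordinate and 0.0 for a pinned one;
    pin_values[i] is used only where mask[i] is 0.0.
    """
    def pinned_log_density_gradient(theta):
        # Branch-free overwrite of the pinned coordinates.
        full = [m * t + (1.0 - m) * p
                for t, p, m in zip(theta, pin_values, mask)]
        lp, grad = log_density_gradient(full)
        # Zero the gradient of pinned coordinates with a multiply.
        return lp, [m * g for g, m in zip(grad, mask)]
    return pinned_log_density_gradient


# Pin the second coordinate at 0.5; the value 2.0 passed in is ignored.
f = bind_pins(std_normal, pin_values=[0.0, 0.5, 0.0], mask=[1.0, 0.0, 1.0])
lp, grad = f([1.0, 2.0, 3.0])
```

Because the mask is applied per coordinate, this form would extend to partially pinned vectors, which is the advantage claimed for it above. The cost is that the algorithm still carries the full-length parameter vector.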
||
|
|
||
| # Reference-level explanation | ||
| [reference-level-explanation]: #reference-level-explanation | ||
|
|
||
| ** to be discussed ** | ||
|
|
||
| # Drawbacks | ||
| [drawbacks]: #drawbacks | ||
|
|
||
| - It's another command-line argument and there are already several | ||
|
Member
Number these and punctuate as sentences.
||
| - The same thing can be achieved by re-programming the model. | ||
|
Member
"re-programming" is not conventionally hyphenated, but then it has a specific meaning which we don't want. How about saying the same thing can be achieved by coding the model in a much more complicated way as indicated above?
||
| - It changes the *interpretation* of a Stan model, though in a very explicit way | ||
| - The two restrictions, particularly the inability to pin constrained parameters, might limit use-cases | ||
|
Member
This definitely will limit use cases. P.S. We're not as picky about design docs as our regular doc. For future reference, noun compounds like "use cases" are only hyphenated when used as adjectives. So that means in "use-cases discussion" it gets a hyphen but "discussion of use cases" it does not.
||
|
|
||
| # Rationale and alternatives | ||
| [rationale-and-alternatives]: #rationale-and-alternatives | ||
|
|
||
|
|
||
| I think pinning parameters at runtime is far more elegant than existing solutions. At first, I had thought about a new keyword in the Stan language itself, e.g., in a parameter constraint | ||
|
Member
Stay away from phrases like "I think" in design documents. There are better ways to hedge, but in this case, I don't think there's going to be much pushback that it's more elegant. Same for "At first, I had thought about"---that can just be "An alternative would be a new keyword in the Stan language itself.". But we wouldn't do that with a keyword, we'd probably do it with an annotation if we want to go that way.
parameters {
@pin(0)
real mu;
...
The only situation where Stan requires you to write
||
| ``` | ||
| parameters { | ||
| real<pin=0.> mu; | ||
| } | ||
| ``` | ||
| It's certainly neater than | ||
| ``` | ||
| data { | ||
|   int<lower=0, upper=1> mu_is_data; | ||
|   array[mu_is_data] real mu_val; | ||
|   ... | ||
| } | ||
| parameters { | ||
|   array[1 - mu_is_data] real mu_param; | ||
|   ... | ||
| } | ||
| transformed parameters { | ||
|   real mu = mu_is_data ? mu_val[1] : mu_param[1]; | ||
|   ... | ||
| } | ||
| ``` | ||
| but even with `<pin=>`, pinning still requires one to change a model and recompile. | ||
|
Member
New sentence. "Even with an annotation, ...". A second drawback is that it doesn't let you change the value at runtime. So it's much less flexible than the explicitly coded one you started with.
||
|
|
||
| # Prior art | ||
| [prior-art]: #prior-art | ||
|
|
||
| `PyMC` has specific functionality for pinning. See [here](https://www.pymc.io/projects/docs/en/stable/api/model/generated/pymc.model.transform.conditioning.do.html). In `PyMC`, pinning (and perhaps other similar things) are called 'interventions'. The example given in the docs is this, | ||
| ``` | ||
|
|
||
| import arviz as az | ||
| import pymc as pm | ||
|
|
||
| with pm.Model() as m: | ||
|     x = pm.Normal("x", 0, 1) | ||
|     y = pm.Normal("y", x, 1) | ||
|     z = pm.Normal("z", y + x, 1) | ||
|
|
||
| # Dummy posterior, same as calling `pm.sample` | ||
| idata_m = az.from_dict({rv.name: [pm.draw(rv, draws=500)] for rv in [x, y, z]}) | ||
|
|
||
| # Replace `y` by a constant `100.0` | ||
| with pm.do(m, {y: 100.0}) as m_do: | ||
|     idata_do = pm.sample_posterior_predictive(idata_m, var_names="z") | ||
| ``` | ||
|
|
||
|
Member
This is really great that you included a PyMC example. I don't know if we want to mention this, but @WardBrian found a bug in their implementation where they don't respect constraints. Probably not relevant for this discussion. Do you know how general this is in PyMC? If not, we can ask their devs. For example, does PyMC let me set just one component of a vector?
Author
I don't know PyMC that well and I've never used this feature. But just playing around with it, it seems like it's all or nothing.
||
| # Unresolved questions | ||
| [unresolved-questions]: #unresolved-questions | ||
|
|
||
| I don't know how this would be implemented technically, but there is a comment [here](https://discourse.mc-stan.org/t/fixing-parameters-in-a-model/39035/7?u=andrewfowlie) | ||