designs/0035-pin-parameters.md
- Feature Name: pin-parameters
- Start Date: 2025-05-05
- RFC PR:
- Stan Issue:

# Summary
[summary]: #summary

It is often useful to be able to 'pin' the otherwise free parameters in a statistical model to specific values. First, this is useful when debugging a statistical model to diagnose computational problems or understand the priors. Second, it is useful when one wishes to explore a simpler model that is nested inside an extended one, e.g., the mu = 0 no-effect model that is nested inside the mu != 0 model of a new effect of size mu. This proposal makes pinning straightforward at runtime.

# Motivation
[motivation]: #motivation

At present, to pin a parameter a Stan model must be rewritten. We must either:

- move a parameter from the parameter block to the data block, where it is pinned to a fixed value
- add convoluted logic so that a boolean (more precisely, an `int<lower=0, upper=1>`) in the data block can control whether a parameter is pinned, e.g. (from [here](https://discourse.mc-stan.org/t/fixing-parameters-in-a-model/39035/4?u=andrewfowlie))
```
data {
  int<lower=0, upper=1> mu_is_data;
  array[mu_is_data] real mu_val;
  ...
parameters {
  array[1 - mu_is_data] real mu_param;
  ...
transformed parameters {
  real mu = mu_is_data ? mu_val[1] : mu_param[1];
  ...
```

These are inelegant and unsatisfactory, partly as they require rewriting and recompiling a model.
**Comment (Member):**

The example above does not require recompilation. And it's only rewriting if you wrote it once without anticipating the pinning. Instead, I'd just mention that this is a very clunky pattern to code and obfuscates the inherent generative structure of the model.


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

The following section is a draft of the docs for the new `pin` keyword-value pair in `cmdstan` command-line options that would appear [here](https://mc-stan.org/docs/cmdstan-guide/command_line_options.html). We do not at this stage make any proposal for how this feature would be propagated to other Stan interfaces, though we do not anticipate any difficulties.

## Command-Line Interface Overview

...

- `pin` - specifies values for any parameters that should be pinned, if any
**Comment (Member):**


CmdStan is very particular about the hierarchical structure of arguments, so this needs to be clarified in terms of where it is possible. Which interfaces will it apply to?

- HMC, NUTS sampling
- ADVI, Pathfinder VI
- Laplace approximation
- plain old optimization

Each of these has its own particular argument structure. For example, here's sampling's default config hierarchy into which pin will need to be slotted:

method = sample (Default)
  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = false (Default)
    thin = 1 (Default)
    adapt
      engaged = true (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
      save_metric = false (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 1 (Default)
id = 0 (Default)
data
  file = bernoulli.data.json
init = 2 (Default)
random
  seed = 3252652196 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)

I would suggest making it parallel to init. This placing should be available for all of the inference options.


...

### Pin model parameters argument
**Comment (Member):**


This section needs to make it clear that variables are going to be pinned or not pinned---they can't be partially pinned. Suppose I have the following program.

vector[10] a;

I can pin all 10 components of a or none of them. I can't just pin a[3] and a[7] and leave the others as parameters, for example.

**Comment (Member):**


It's also worth noting that this restriction is primarily due to the lack of a good design, NOT due to mathematical concerns like the restriction on constrained parameters.

If we could agree on what the specification for the JSON file that only specifies a[3] and a[7] looks like, we could plumb enough information through the compiler and inference to make it work.

**Comment (Member):**


> this restriction is primarily due to the lack of a good design, NOT due to mathematical concerns like the restriction on constrained parameters.

It's fine mathematically to pin a constrained parameter. It's just too challenging on the Stan side to evaluate the implied constraints on remaining parameters and whether the result is even consistent or expressible in Stan.

In the same way, pinning just two values is more of a problem in terms of rewriting the Stan model than in terms of math. If we pin just two values, it's like this:

data {
  vector[2] pinned_a;
  ...
parameters {
  vector[8] unpin_a;
  ...
transformed parameters {
  vector[10] a;
  a[1:2] = unpin_a[1:2];
  a[3] = pinned_a[1];
  a[4:6] = unpin_a[3:5];
  a[7] = pinned_a[2];
  a[8:10] = unpin_a[6:8];
  ...

I don't know an equivalently simple workaround to the one @andrewfowlie listed in the original proposal for fully pinned parameters.
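The index bookkeeping in the workaround above can be sketched generically. The following is a hypothetical Python helper (not part of Stan or this proposal) that rebuilds a full parameter vector from a set of pinned entries and the remaining free values:

```python
def assemble(n, pinned, free):
    """Rebuild a length-n parameter vector from pinned entries and
    free values taken in order. `pinned` maps 1-based indices to
    fixed values, mirroring the a[3]/a[7] example above."""
    free_iter = iter(free)
    return [pinned[i] if i in pinned else next(free_iter)
            for i in range(1, n + 1)]

# Pin a[3] and a[7]; the other eight components remain free
# and are filled in order from the unpinned values.
a = assemble(10, {3: 0.5, 7: -1.0}, [float(i) for i in range(8)])
```

This is exactly the offset arithmetic that the compiler or algorithm layer would need to generate for partial pinning.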


Parameters defined in the parameters block can be 'pinned' to specific values. This is useful when debugging a model or exploring a simpler model that is nested inside an extended one.

By default, no parameters are pinned. The pinned parameters are read from an input data file in JSON format using the syntax:
```
pin=<filepath>
```
The value must be a filepath to a JSON file containing pinned values for some or all of the parameters in the parameters block.
**Comment (Member):**


We need to mention that this is going to be the same JSON format used for other Stan files and provide a pointer:

https://mc-stan.org/docs/cmdstan-guide/json_apdx.html#creating-json-files
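For concreteness, a pin file reusing the standard Stan JSON data format might look like the following. The parameter names `mu` and `theta` are purely illustrative, and the exact schema is not yet fixed by this proposal:

```python
import json

# Hypothetical contents of pin.json for a model with a scalar `mu`
# and a length-3 vector `theta` in its parameters block.
pins = {"mu": 0.0, "theta": [1.5, -0.5, 2.0]}

text = json.dumps(pins, indent=2)

# Values must round-trip exactly through the JSON representation.
assert json.loads(text) == pins
```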


At present, there are two restrictions on parameters that can be pinned:

- you cannot pin a subset of elements of a vector; all elements must be pinned
**Comment (Member):**


I'd number these given that it's preceded by "there are two". They are also full sentences, so should be capitalized and ended with a period. Then you don't need the : before.

This is all just super-nitpicky style stuff that doesn't really affect the spec itself.

- you cannot pin constrained parameters
**Comment (Member):**


You want to motivate why here so that the reader understands the restrictions. I think we can generalize a bit here so that you can pin constrained parameters that do not depend on any other parameters? For example, consider this example.

parameters {
  real a;
  real<lower = a> b;
}
- If you pin b, it imposes an upper-bound constraint on a to keep things in order. In general those are too hard to track.
- If you pin a, I think everything should be fine.

We should run this by @WardBrian to see what the constraints are on the compiler side.

**Comment (Member):**


I think an even more general thing we could allow is to pin any parameter on the unconstrained scale. Of course, we don't need to allow this explicitly, because a user could always write

parameters {
  real a;
  real b_raw;
}
transformed parameters {
  real b = lower_bound_jacobian(b_raw, a);
}

and pin b_raw.

We could try to compute the dataflow of which things you are allowed to pin or not (so that you could pin a, or a AND b, but not b by itself), but that would be very tricky since it ultimately needs to be done at runtime when we actually see the pin.json, not in the compiler. So I think starting conservatively and maybe expanding later is best.

**Comment (Member):**


> starting conservatively

Are you suggesting not allowing any constrained parameters to be pinned? If we can't pin scale parameters, it's going to eliminate the main use case put forward by @andrewgelman, which is pinning scale parameters in hierarchical models. What this would wind up doing is forcing users to define log scale parameters, pin those, and do their own transforms.

**Comment (Member):**


My preference would be to disallow them entirely at first, yes. Even if we did allow simple cases like real<lower=0> scale; where there is no parameter dependence, we'd still need to decide if the user is providing the pinning on the constrained or unconstrained scale, and, if we want it to be on the constrained scale, we'd need to really think about how transform_inits is structured and if it would work for this use case.

The fact that the built-in transforms are available as functions now decreases some of the pain of this disallowing, since you can still get the same result if you don't mind the extra variable and mandatory unconstrained representation for the pin value.

**Comment:**


Pinned values should be on the constrained scale, just like custom initial values already are. Unconstrained scale is an implementation detail that users don't need to know about.

As for the C++ implementation, I think pinning must be handled in the model constructor, not transform_inits, and num_params_r__ counts the number of unpinned parameters.

**Comment (Member):**


> That would require extra work if we want to support pinning in pathfinder etc.

Yeah, that's a real downside. I think we could minimize the overall code changes if we updated a few of the helpers like stan::model::log_prob_grad and stan::model::gradient, as these are used by all the services

> it reads unpinned parameters from the deserializer (as before) but pinned parameters are just copied from the data, no deserializer needed

That would come at the cost of some pretty serious branching on every parameter read, right? Because we won't know at compile time which will be which. (which also means we won't be able to store them as typed entries in the model class, but need to do another round of deserialization somehow, unless we use something like std::optional)

Obviously the compiler will need to change either way, since sorting it out in the algorithmic layer would require some extra metadata or helper functions for those offsets into the parameter vector, but I think the fact that it leaves log_prob more or less untouched is a big win

**Comment:**


> That would come at the cost of some pretty serious branching on every parameter read, right?

Yes, I'm imagining code that looks like

if (param_pinned__.has_value()) {
  param = param_pinned__.value();
} else {
  param = in__.read();
}

for every parameter (whose constraints don't depend on other parameters).
Since pinning is a runtime decision, there must be branching on every parameter regardless of how it is implemented.

(and, yes, that's std::optional)

> I think the fact that it leaves log_prob more or less untouched is a big win

It leaves the API of log_prob unchanged either way; the question is on which side of BridgeStan do you want to implement pinning?

**Comment (Member):**


log_prob's API would be the same, but doing it in the algorithms would mean the entire code gen of log_prob is the same, with no extra branches. Because you'd get one branch per parameter name, I'm also hesitant to assume the branch predictor will do particularly well with them, even though they'd be basically constant for a given model run.

I also don't see how you could extend this to partially specifying a vector, for example, but a mask-in-the-algorithm approach ultimately doesn't care about that.

> on which side of BridgeStan do you want to implement pinning?

That question is actually what motivated my current thinking, because I realized someone working with BridgeStan to write their own sampler already could do pinning by basically masking the gradient you get back from log prob. If you precompute that mask, you can even avoid any branching (by doing a bunch of multiplies instead)

So, my answer would be "outside". That has some direct benefits, like allowing a BridgeStan user who wants to experiment with pinning to instantiate only one copy of their model/data and try both pinned and unpinned options, but it is also the option that requires the least changes for us in BridgeStan (adding another option to the constructor would require either a new overload or a major version bump of our C API).
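To make the "outside" option concrete, here is a hedged sketch of pinning by gradient masking: a generic wrapper over any callable returning (log density, gradient). None of these names are BridgeStan API; they stand in for whatever interface a downstream sampler uses:

```python
def make_pinned_logdensity(logdensity_grad, pins, dim):
    """Wrap a callable returning (log density, gradient) so selected
    coordinates are pinned. `pins` maps 0-based index -> pinned value.
    The 0/1 mask is precomputed once, so per-evaluation masking is a
    multiply per coordinate rather than a branch."""
    mask = [0.0 if i in pins else 1.0 for i in range(dim)]

    def wrapped(theta):
        full = list(theta)
        for i, v in pins.items():
            full[i] = v                        # overwrite pinned coordinates
        lp, grad = logdensity_grad(full)
        return lp, [g * m for g, m in zip(grad, mask)]

    return wrapped

# Toy target: independent standard normals in each coordinate.
def std_normal(theta):
    lp = -0.5 * sum(t * t for t in theta)
    return lp, [-t for t in theta]

pinned = make_pinned_logdensity(std_normal, {1: 3.0}, dim=2)
lp, grad = pinned([0.5, -9.9])                 # the -9.9 entry is ignored
```

A real sampler would additionally hold the pinned coordinates fixed in its state (or rely on the zeroed gradient leaving them unmoved), and this mask-in-the-algorithm framing extends naturally to pinning only part of a vector.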

**Comment:**


> I also don't see how you could extend this to partially specifying a vector

You add a new deserializer method that takes a mask in addition to the other constraints.

**Comment (Member):**


This is getting way into the weeds of implementation, but I was imagining what would happen is that we would do a bind in the log density function to generate a new log density function, so that the inference systems (sampling, VI, optimization) don't need to know anything about it. This is what @WardBrian is suggesting for BridgeStan, for example.

That does leave open how to handle the equivalent of write_array---I think that's going to need to be done as @nhuure is suggesting.


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

**To be discussed.**

# Drawbacks
[drawbacks]: #drawbacks

- It's another command-line argument and there are already several
**Comment (Member):**


Number these and punctuate as sentences.

- The same thing can be achieved by re-programming the model.
**Comment (Member):**


"re-programming" is not conventionally hyphenated, but then it has a specific meaning which we don't want.

How about saying the same thing can be achieved by coding the model in a much more complicated way as indicated above?

- It changes the *interpretation* of a Stan model, though in a very explicit way
- The two restrictions, particularly that one cannot pin constrained parameters, might limit use-cases
**Comment (Member):**


This definitely will limit use cases.

P.S. We're not as picky about design docs as our regular doc. For future reference, noun compounds like "use cases" are only hyphenated when used as adjectives. So that means in "use-cases discussion" it gets a hyphen but "discussion of use cases" it does not.


# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives


I think pinning parameters at runtime is far more elegant than existing solutions. At first, I had thought about a new keyword in the Stan language itself, e.g., in a parameter constraint
**Comment (Member):**


Stay away from phrases like "I think" in design documents. There are better ways to hedge, but in this case, I don't think there's going to be much pushback that it's more elegant. Same for "At first, I had thought about"---that can just be "An alternative would be a new keyword in the Stan language itself.". But we wouldn't do that with a keyword; we'd probably do it with an annotation if we want to go that way.

parameters {
  @pin(0)
  real mu;
  ...

The only situation where Stan requires you to write 0.0 instead of 0 is when you are doing integer arithmetic.

```
parameters {
real<pin=0.> mu;
}
```
It's certainly neater than
```
data {
  int<lower=0, upper=1> mu_is_data;
  array[mu_is_data] real mu_val;
  ...
parameters {
  array[1 - mu_is_data] real mu_param;
  ...
transformed parameters {
  real mu = mu_is_data ? mu_val[1] : mu_param[1];
  ...
```
but even with `<pin=>`, pinning still requires one to change a model and recompile.
**Comment (Member):**


New sentence. "Even with an annotation, ...".

A second drawback is that it doesn't let you change the value at runtime. So it's much less flexible than the explicitly coded one you started with.


# Prior art
[prior-art]: #prior-art

`PyMC` has specific functionality for pinning. See [here](https://www.pymc.io/projects/docs/en/stable/api/model/generated/pymc.model.transform.conditioning.do.html). In `PyMC`, pinning (and perhaps other similar operations) is called an 'intervention'. The example given in the docs is this:
```
import arviz as az
import pymc as pm

with pm.Model() as m:
    x = pm.Normal("x", 0, 1)
    y = pm.Normal("y", x, 1)
    z = pm.Normal("z", y + x, 1)

# Dummy posterior, same as calling `pm.sample`
idata_m = az.from_dict({rv.name: [pm.draw(rv, draws=500)] for rv in [x, y, z]})

# Replace `y` by a constant `100.0`
with pm.do(m, {y: 100.0}) as m_do:
    idata_do = pm.sample_posterior_predictive(idata_m, var_names="z")
```

**Comment (Member):**


This is really great that you included a PyMC example.

I don't know if we want to mention this, but @WardBrian found a bug in their implementation where they don't respect constraints. Probably not relevant for this discussion.

Do you know how general this is in PyMC? If not, we can ask their devs. For example, does PyMC let me set just one component of a vector?

**Comment (Author):**


I don't know PyMC that well and I've never used this feature. But just playing around with it, it seems like it's all or nothing.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

I don't know how this would be implemented technically, but there is a relevant comment [here](https://discourse.mc-stan.org/t/fixing-parameters-in-a-model/39035/7?u=andrewfowlie).