Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions boba/bobarun.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
import subprocess
import os
import ast
import click
from .lang import Lang
from .wrangler import DIR_SCRIPT, DIR_LOG, get_universe_id_from_script, get_universe_log, get_universe_error_log, get_universe_name
from subprocess import PIPE
Expand Down Expand Up @@ -52,3 +54,12 @@ def run_commands_in_folder(folder, file_with_commands):
for line in f.readlines():
os.system(line)
os.chdir(cwd)

class PythonDict(click.Option):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the class is a bit too general. Maybe indicating it's a custom click command class. Also why not move it to cli.py (if it's only used there? I might be wrong)


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add more comments / documentation? In general, we'll need one Docstring per function, ideally describing (1) what the function does, (2) any parameters, (3) the return value. You might only do (1) if the function is simple. Search for "Docstring" to learn more about the format

def type_cast_value(self, ctx, value):
try:
return dict(ast.literal_eval(value))
except:
raise click.BadParameter(value)

9 changes: 7 additions & 2 deletions boba/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,14 +18,19 @@
default='.', show_default=True)
@click.option('--lang', help='Language, can be python/R [default: inferred from file extension]',
default='')
def compile(script, out, lang):
@click.option('--decisions', '-d', cls=PythonDict, default='{}')
def compile(script, out, lang, decisions):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more flexible if we add this "decisions" option to boba run instead of boba compile? So we only need to generate the entire multiverse once, and we could have the flexibility of running a small subset based on decision values. This will be a very useful feature to add to boba run.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the implementation will be much simpler as we'll only need to read and filter the summary.csv to get the correct universe ids. Are there particular reasons that modifying boba compile is better?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main issue with making this part of boba run and not boba compile is continuous variables.
Let's say i have a template that has a continuous variable A with range 0..1, and a static variable B with options "red", "green", "blue".
Now lets look at the two different cases, filter on boba run and filter on boba compile.

filter on boba run:
We already ran boba compile and generated the universes. This means there is some number of universes where A is in the range 0..1 and B has values "red", "green", "blue".
Filtering on B is fine: we can easily do boba run -d "{'B': 'red'}", and that would run all of the generated universes that have B = "red".
Filtering on A, however, is problematic. Suppose I do boba run -d "{'A': 0.5}". Per the spec provided by our user, this is legal. A should be able to take on any value in the range 0..1. But in the implementation, this is not the case, as boba compile creates a set of discrete universes via sampling, not infinite universes. Thus, there may not be a generated universe such that A = 0.5.
How could we fix this? Well, we could pick the universe with the closest A value, but that is technically lying to the user since we're not actually testing with A = 0.5. And what if the number of samples is low and there really isn't a 'close enough' A?

filter on boba compile:
Again, filtering on B is trivial: simply generate the universes where B is a particular value.
Filtering on A is also trivial now, since we can simply generate the universes where A is a particular value.

That was my reasoning behind choosing to use boba compile instead of boba run.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The example was helpful and I think the reasoning makes sense. As we discussed in the meeting, besides the review points we'll also need to combine path_filter with the rest, as placeholder vars and blocks aren't allowed to have the same identifier.

"""Generate multiverse analysis from specifications."""

check_path(script)

click.echo('Creating multiverse from {}'.format(script))
ps = Parser(script, out, lang)
ps.main()

if decisions:
ps.main(decisions)
else:
ps.main()

ex = """To execute the multiverse, run the following commands:
boba run --all
Expand Down
62 changes: 41 additions & 21 deletions boba/decisionparser.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,22 @@ class Decision:
value: list
desc: str = ''

def check_value(self, val):
""" Check if a value can be contained by this decision """
for option in self.value:
if val == option:
return True
elif isinstance(option, dict) and isinstance(val, float):
if 'range' in option:
if val > option['range'][0] and val < option['range'][1]:
return True
if (('exclusive' in option and not option['exclusive']) or 'exclusive' not in option) and\
(val == option['range'][0] or val == option['range'][1]):
return True
else:
return True
return False

class SamplingError(SyntaxError):
pass

Expand Down Expand Up @@ -47,7 +63,10 @@ def _is_id_token(s):

@staticmethod
def check_var_types(args, types, names):
for i in range(0, len(args)):
if len(args) != len(names):
raise ValueError('expected ' + str(len(names)) + ' arguments')

for i in range(0, len(names)):
if not isinstance(args[i], types[i]):
raise ValueError(names[i] + ' must be of type ' + str(types[i]))

Expand All @@ -66,14 +85,14 @@ def get_within_range(function, args, distribution_range, exclusive):
return val

@staticmethod
def random_uniform(minimum, maximum, args):
def random_uniform(distr_range, args):
"""randomly sample a number from a uniform distribution"""
exclusive = args.get('exclusive', False)
DecisionParser.check_var_types([minimum, maximum, exclusive],
[float, float, bool],
['min', 'max', 'exclusive'])
DecisionParser.check_var_types([distr_range, exclusive],
[list, bool],
['range', 'exclusive'])

distr_range = [minimum, maximum]
DecisionParser.check_var_types(distr_range, [float, float], ['range[0]', 'range[1]'])
return DecisionParser.get_within_range(random.uniform, distr_range, distr_range, exclusive)

@staticmethod
Expand All @@ -82,21 +101,17 @@ def rand_x_normal(function, args):
mean = args.get('mean', 0.0)
std_dev = args.get('std_dev', 1.0)
exclusive = args.get('exclusive', False)
distribution_range = args.get('range', [])
DecisionParser.check_var_types([mean, std_dev, exclusive, distribution_range],
distr_range = args.get('range', [])
DecisionParser.check_var_types([mean, std_dev, exclusive, distr_range],
[float, float, bool, list],
['mean', 'std_dev', 'exclusive', 'range'])

if len(distribution_range) == 0:
distribution_range = None

if distribution_range and len(distribution_range) != 2:
raise ValueError('expected two items in range list')
elif distribution_range and len(distribution_range) == 2:
DecisionParser.check_var_types(distribution_range, [float, float], ['range[0]', 'range[1]'])

if distribution_range:
return DecisionParser.get_within_range(function, [mean, std_dev], distribution_range, exclusive)
if len(distr_range) == 0:
distr_range = None

if distr_range:
DecisionParser.check_var_types(distr_range, [float, float], ['range[0]', 'range[1]'])
return DecisionParser.get_within_range(function, [mean, std_dev], distr_range, exclusive)
else:
return function(mean, std_dev)

Expand All @@ -114,7 +129,7 @@ def random_normal(args):
def discretize(obj, discretization_method, count):
"""discretizes a continuous variable into 'count' descrete options."""
discretization_methods = {
'uniform': DiscretizationFn(DecisionParser.random_uniform, ['min', 'max'], ['exclusive']),
'uniform': DiscretizationFn(DecisionParser.random_uniform, ['range'], ['exclusive']),
'lognormal': DiscretizationFn(DecisionParser.random_lognormal, [], ['mean', 'std_dev', 'exclusive', 'range']),
'normal' : DiscretizationFn(DecisionParser.random_normal, [], ['mean', 'std_dev', 'exclusive', 'range'])
}
Expand Down Expand Up @@ -264,15 +279,20 @@ def get_decs(self):
"""Get a list of decision names."""
return [i for i in self.decisions.keys()]

def gen_code(self, template, dec_id, i_alt):
def gen_code(self, template, dec_id=None, i_alt=None, option=None):
"""
Replace the placeholder variable in a template chunk.
:param template: a chunk of code with only one placeholder
:param dec_id: variable ID of the decision
:param i_alt: which alternative
:return: {string, string} replaced code and the value at this parameter
"""
v = self.get_alt_discrete(dec_id, i_alt)
if dec_id is not None and i_alt is not None:
v = self.get_alt_discrete(dec_id, i_alt)
elif option is not None:
v = option
else:
raise ValueError("either dec_id and i_alt must not be None, or option must not be None")

# assuming the placeholder var is always at the end
# which is true given how we chop up the chunks
Expand Down
86 changes: 76 additions & 10 deletions boba/parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -249,14 +249,14 @@ def _code_gen_recur(self, path, i, code, history):

if chunk.variable != '':
# check if we have already encountered the placeholder variable
prev_idx = None
prev_option = None
for d in history.decisions:
if d.parameter == chunk.variable:
prev_idx = d.idx
prev_option = d.option

if prev_idx is not None:
if prev_option is not None:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does prev_option has a fixed type (e.g. string)? We need to make sure a user-defined option will never be None. And if you use == with two options somewhere else, it might not work for types like floating point ... I used index before because it is an integer and we do not have to worry about these.

# use the previous value
snippet, opt = self.dec_parser.gen_code(chunk.code, chunk.variable, prev_idx)
snippet, opt = self.dec_parser.gen_code(chunk.code, option=prev_option)
self._code_gen_recur(path, i+1, code+snippet, history)
else:
# expand the decision
Expand All @@ -272,7 +272,7 @@ def _code_gen_recur(self, path, i, code, history):
continue

# code gen
snippet, opt = self.dec_parser.gen_code(chunk.code, chunk.variable, k)
snippet, opt = self.dec_parser.gen_code(chunk.code, dec_id=chunk.variable, i_alt=k)
decs = [a for a in history.decisions]
decs.append(DecRecord(chunk.variable, opt, k))
self._code_gen_recur(path, i+1, code + snippet,
Expand All @@ -281,7 +281,7 @@ def _code_gen_recur(self, path, i, code, history):
code += chunk.code
self._code_gen_recur(path, i+1, code, history)

def _code_gen(self):
def _code_gen(self, decrecords=None, path_filter=None):
paths = self._get_code_paths()

self.wrangler.counter = 0 # keep track of file name
Expand All @@ -290,8 +290,46 @@ def _code_gen(self):
+ len(self.code_parser.get_decisions())

self.wrangler.create_dir()
for idx, p in enumerate(paths):
self._code_gen_recur(p, 0, '', History(idx))

if decrecords is None:
for idx, p in enumerate(paths):
self._code_gen_recur(p, 0, '', History(idx))
elif path_filter is None:
for idx, p in enumerate(paths):
self._code_gen_recur(p, 0, '', History(idx, '', decrecords, []))
else:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly you might want to encapsulate this chunk of code in a function that returns valid_paths. And here you'll only need a line to get the valid paths. We can probably get rid of the if-else block too (if it's possible to pass an empty decrecord to History and it will still work properly). Smaller functions that do simple things are much easier to read and maintain.

valid_paths = []
for idx, p in enumerate(paths):
run_path = True
filtered_nodes = set()
for node in p:
if ':' in node[0]:
split = node[0].split(':')
name = split[0]
value = split[1]

if name not in path_filter:
continue

if path_filter[name] != value:
run_path = False
break
else:
filtered_nodes.add(name)

for node in path_filter:
if node not in filtered_nodes:
run_path = False
break

if run_path:
valid_paths.append((idx, p))

if not valid_paths:
raise ValueError('bad input to path_filter!')

for path in valid_paths:
self._code_gen_recur(path[1], 0, '', History(path[0], '', decrecords, []))

# write the pre and post execs to a file.
self.wrangler.write_pre_exe()
Expand Down Expand Up @@ -387,9 +425,37 @@ def _warn_size(self):
print('Aborted.')
exit(0)

def main(self, verbose=True):
def main(self, decisions=None, verbose=True):
self._warn_size()
self._code_gen()

if decisions is not None:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps move this chunk of code to a new function in the parser class (or a function in the decision parser class). We want to keep this main function nice and simple. Also, as we've discussed, placeholder variables are not allowed to have the same name as code blocks, so maybe we'll need some revisions related to this.

decrecords = []
code_path = 'path_filter'
if code_path in decisions:
path_filter = decisions[code_path]
else:
path_filter = None

for key, value in decisions.items():
if key == code_path:
continue

dec_exist = False
for decname, dec in self.dec_parser.decisions.items():
if decname == key:
if not dec.check_value(value):
raise ValueError('The decision "' + key + '" cannot have the value "' + str(value) + '"')
dec_exist = True

if not dec_exist:
raise ValueError('The decision "' + key + '" does not exist')

decrecords.append(DecRecord(key, value, 0))

self._code_gen(decrecords, path_filter)
else:
self._code_gen()

self._write_csv()
self._write_server_config()
if verbose:
Expand Down
3 changes: 1 addition & 2 deletions example/simple_cont/template.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,7 @@
"seed" : 0,
"sample" : "uniform",
"count" : 50,
"min" : 1.0,
"max" : 3.0
"range" : [1.0, 3.0]
}
]
}
Expand Down
26 changes: 8 additions & 18 deletions test/specs/continuous-err.json
Original file line number Diff line number Diff line change
Expand Up @@ -21,55 +21,45 @@
},
"4" : {
"decisions": [
{"var": "err", "options": [{"sample" : "uniform", "count" : 5, "min" : 0.0}] , "desc" : "check required variable omission"}
{"var": "err", "options": [{"sample" : "uniform", "count" : 5, "range" : [true, 1.0]}] , "desc" : "check bad type for variables"}
]
},
"5" : {
"decisions": [
{"var": "err", "options": [{"sample" : "uniform", "count" : 5, "max" : 0.0}] , "desc" : "check required variable omission"}
{"var": "err", "options": [{"sample" : "uniform", "count" : 5, "range" : [1.0, "bad"]}] , "desc" : "check bad type for variables"}
]
},
"6" : {
"decisions": [
{"var": "err", "options": [{"sample" : "uniform", "count" : 5, "min" : true, "max" : 5.0}] , "desc" : "check bad type for variables"}
]
},
"7" : {
"decisions": [
{"var": "err", "options": [{"sample" : "uniform", "count" : 5, "min" : 1.0, "max" : true}] , "desc" : "check bad type for variables"}
]
},
"8" : {
"decisions": [
{"var": "err", "options": [{"sample" : "lognormal", "count" : 5, "exclusive" : 1.0}] , "desc" : "check bad type for variables"}
]
},
"9" : {
"7" : {
"decisions": [
{"var": "err", "options": [{"sample" : "lognormal", "count" : 5, "mean" : "mean"}] , "desc" : "check bad type for variables"}
]
},
"10" : {
"8" : {
"decisions": [
{"var": "err", "options": [{"sample" : "normal", "count" : 5, "range" : "range"}] , "desc" : "check bad type for variables"}
]
},
"11" : {
"9" : {
"decisions": [
{"var": "err", "options": [{"sample" : "normal", "count" : 5, "range" : ["range", "range"]}] , "desc" : "check bad type for variables"}
]
},
"12" : {
"10" : {
"decisions": [
{"var": "err", "options": [{"sample" : "normal", "count" : 5, "range" : [0.0, 1.0, 2.0]}] , "desc" : "check bad type for variables"}
]
},
"13" : {
"11" : {
"decisions": [
{"var": "err", "options": [{"sample" : "normal", "count" : 5, "range" : [1.0, 0.0]}] , "desc" : "check bad type for variables"}
]
},
"14" : {
"12" : {
"decisions": [
{"var": "err", "options": [{"sample" : "normal", "count" : 5, "std_dev" : 1.0, "range" : [1.0, 0.0]}] , "desc" : "check bad type for variables"}
]
Expand Down
14 changes: 7 additions & 7 deletions test/specs/continuous.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"decisions": [
{"var": "A", "options": [{"sample" : "uniform", "count" : 10, "seed" : 0, "min" : 0.0, "max" : 5.0}] ,
{"var": "A", "options": [{"sample" : "uniform", "count" : 10, "seed" : 0,"range" : [0.0, 5.0]}] ,
"desc" : "uniform continuous variable expansion"},

{"var": "B", "options": [{"sample" : "lognormal", "count" : 10, "seed" : 0, "mean" : 0.0, "std_dev" : 5.0}] ,
Expand All @@ -9,7 +9,7 @@
{"var": "C", "options": [{"sample" : "normal", "count" : 10, "seed" : 0, "mean" : 0.0, "std_dev" : 5.0}] ,
"desc" : "normal continuous variable expansion"},

{"var": "D", "options": [{"sample" : "uniform", "count" : 10, "seed" : 0, "min" : 0.0, "max" : 5.0}, 17.0] ,
{"var": "D", "options": [{"sample" : "uniform", "count" : 10, "seed" : 0,"range" : [0.0, 5.0]}, 17.0] ,
"desc" : "uniform continuous variable expansion with additional constants"},

{"var": "E", "options": [{"sample" : "lognormal", "count" : 10, "seed" : 0, "mean" : 0.0, "std_dev" : 5.0}, 0.0, 1.0, 2.0] ,
Expand All @@ -18,17 +18,17 @@
{"var": "F", "options": [{"sample" : "normal", "count" : 10, "seed" : 0, "mean" : 0.0, "std_dev" : 5.0}, 0.0, 1.0, 2.0, 3.0, 4.0] ,
"desc" : "normal continuous variable expansion with additional constants"},

{"var": "G", "options": [{"sample" : "uniform", "count" : 3, "seed" : 0, "min" : 0.0, "max" : 5.0},
{"var": "G", "options": [{"sample" : "uniform", "count" : 3, "seed" : 0,"range" : [0.0, 5.0]},
{"sample" : "lognormal", "count" : 3, "seed" : 0, "mean" : 0.0, "std_dev" : 5.0},
{"sample" : "normal", "count" : 3, "seed" : 0, "mean" : 0.0, "std_dev" : 5.0}] ,
"desc" : "multiple continuous variable expansions"},

{"var": "H", "options": [{"sample" : "uniform", "count" : 4, "seed" : 0, "min": 0.0, "max": 5.0},
{"sample" : "uniform", "count" : 4, "seed" : 1, "min": 10.0, "max": 15.0}] ,
{"var": "H", "options": [{"sample" : "uniform", "count" : 4, "seed" : 0, "range" : [0.0, 5.0]},
{"sample" : "uniform", "count" : 4, "seed" : 1, "range" : [10.0, 15.0]}] ,
"desc" : "multiple continuous variable expansions"},

{"var": "I", "options": [{"sample" : "uniform", "count" : 3, "seed" : 0, "min": 0.0, "max": 5.0}, -1.1,
{"sample" : "uniform", "count" : 3, "seed" : 1, "min": 10.0, "max": 15.0},
{"var": "I", "options": [{"sample" : "uniform", "count" : 3, "seed" : 0, "range" : [0.0, 5.0]}, -1.1,
{"sample" : "uniform", "count" : 3, "seed" : 1, "range" : [10.0, 15.0]},
0.0, 1.0, 2.0, 3.1415] ,
"desc" : "multiple continuous variable expansions with additional constants"},

Expand Down
Loading