pymcmcstat package

pymcmcstat.MCMC module

Created on Wed Jan 17, 2018

@author: prmiles

Description: This module is intended to be the main class from which to run these Markov Chain Monte Carlo type simulations. The user will create an MCMC object, initialize data, simulation options, model settings and parameters.

class pymcmcstat.MCMC.MCMC(rngseed=None, seterr={'over': 'ignore', 'under': 'ignore'})[source]

Bases: object

Markov Chain Monte Carlo (MCMC) simulation object.

This class type is the primary feature of this Python package. All simulations are run through this class type, and for the most part the user will interact with an object of this type. The class initialization provides the option for setting the random seed, which can be very useful for testing the functionality of the code. It was found that setting the random seed at object initialization was the simplest interface.

Args:
  • rngseed (float): Seed for numpy’s random number generator.

Attributes:
display_current_mcmc_settings()[source]

Display model settings, simulation options, and current covariance values.

Example display:

model settings:
    sos_function = <function test_ssfun at 0x1c13c5d400>
    model_function = None
    sigma2 = [1.]
    N = [100.]
    N0 = [0.]
    S20 = [1.]
    nsos = 1
    nbatch = 1
simulation options:
    nsimu = 5000
    adaptint = 100
    ntry = 2
    method = dram
    printint = 100
    lastadapt = 5000
    drscale = [5. 4. 3.]
    qcov = None
covariance:
    qcov = [[0.01   0.    ]
    [0.     0.0625]]
    R = [[0.1  0.  ]
    [0.   0.25]]
    RDR = [array([[0.1 , 0.  ],
   [0.  , 0.25]]), array([[0.02, 0.  ],
   [0.  , 0.05]])]
    invR = [array([[10.,  0.],
   [ 0.,  4.]]), array([[50.,  0.],
   [ 0., 20.]])]
    last_index_since_adaptation = 0
    covchain = None
run_simulation(use_previous_results=False)[source]

Run MCMC Simulation

Note

This is the method called by the user to run the simulation. The user must specify a data structure, setup simulation options, and define the model settings and parameters before calling this method.

Args:
  • use_previous_results (bool): Flag to indicate whether simulation is being restarted.

pymcmcstat.MCMC.print_rejection_statistics(rejected, isimu, iiadapt, verbosity)[source]

Print Rejection Statistics.

Threshold for printing is verbosity greater than or equal to 2. If the rejection counters are as follows:

  • total: 144

  • in_adaptation_interval: 92

  • outside_bounds: 0

Then we would expect the following display at the 200th simulation with an adaptation interval of 100.

i:200 (72.0,92.0,0.0)
Args:
  • isimu (int): Simulation counter

  • iiadapt (int): Adaptation counter

  • verbosity (int): Verbosity of display output.

pymcmcstat.ParallelMCMC module

Created on Tue May 1 15:58:22 2018

@author: prmiles

class pymcmcstat.ParallelMCMC.ParallelMCMC[source]

Bases: object

Run Parallel MCMC Simulations.

Attributes:
display_individual_chain_statistics()[source]

Display chain statistics for different chains in parallel simulation.

Example display:

****************************************
Displaying results for chain 0
Files: <parallel_dir>/chain_0

---------------------
name      :       mean        std     MC_err        tau     geweke
m         :     1.9869     0.1295     0.0107   320.0997     0.9259
b         :     3.0076     0.2489     0.0132   138.1260     0.9413
---------------------

****************************************
Displaying results for chain 1
Files: <parallel_dir>/chain_1

---------------------
name      :       mean        std     MC_err        tau     geweke
m         :     1.8945     0.4324     0.0982  2002.6361     0.3116
b         :     3.2240     1.0484     0.2166  1734.0201     0.4161
---------------------
run_parallel_simulation()[source]

Run MCMC simulations in parallel.

The code is run in parallel by using Pool. While running, you can expect a display similar to

Processing: <parallel_dir>/chain_1
Processing: <parallel_dir>/chain_0
Processing: <parallel_dir>/chain_2

The simulation is complete when you see the run time displayed.

Parallel simulation run time: 16.15234899520874 sec
setup_parallel_simulation(mcset, initial_values=None, num_cores=1, num_chain=1)[source]

Setup simulation to run in parallel.

Settings defined in mcset object will be copied into different instances in order to run parallel chains.

Args:
  • mcset (MCMC): Instance of MCMC object with serial simulation setup.

  • initial_values (ndarray): Array of initial parameter values - [num_chain,npar].

  • num_cores (int): Number of cores designated by user.

  • num_chain (int): Number of sampling chains to be generated.

pymcmcstat.ParallelMCMC.assign_number_of_cores(num_cores=1)[source]

Assign number of cores to use in parallel process

Args:
  • num_cores (int): Number of cores designated by user.

Returns:
  • num_cores (int): Number of cores designated by user or maximum number of cores available on machine.

pymcmcstat.ParallelMCMC.check_directory(directory)[source]

Check and make sure directory exists

Args:
  • directory (str): Folder/directory path name.

pymcmcstat.ParallelMCMC.check_for_restart_file(json_restart_file, chain_dir)[source]

Check if restart directory was specified.

Args:
  • json_restart_file (str): String specifying path to results directory.

  • chain_dir (str): String specifying which chain is being restarted.

pymcmcstat.ParallelMCMC.check_initial_values(initial_values, num_chain, npar, low_lim, upp_lim)[source]

Check if initial values satisfy requirements.

Args:
  • initial_values (ndarray): Array of initial parameter values - [num_chain,npar].

  • num_chain (int): Number of sampling chains to be generated.

  • npar (int): Number of model parameters.

  • low_lim (ndarray): Lower limits.

  • upp_lim (ndarray): Upper limits.

Returns:
  • initial_values (ndarray): Array of initial parameter values - [num_chain,npar].

  • num_chain (int): Number of sampling chains to be generated.

pymcmcstat.ParallelMCMC.check_options_output(options)[source]

Check output settings defined in options

Args:
Returns:
  • options (SimulationOptions): MCMC simulation options with at least binary save flag set to True.

pymcmcstat.ParallelMCMC.check_shape_of_users_initial_values(initial_values, num_chain, npar)[source]

Check shape of users initial values

Args:
  • initial_values (ndarray): Array of initial parameter values - expect [num_chain,npar]

  • num_chain (int): Number of sampling chains to be generated.

  • npar (int): Number of model parameters.

Returns:
  • num_chain (int): Number of sampling chains to be generated - equal to number of rows in initial values array.

  • initial_values

pymcmcstat.ParallelMCMC.check_users_initial_values_wrt_limits(initial_values, low_lim, upp_lim)[source]

Check users initial values wrt parameter limits

Args:
  • initial_values (ndarray): Array of initial parameter values - expect [num_chain,npar]

  • low_lim (ndarray): Lower limits.

  • upp_lim (ndarray): Upper limits.

Returns:
  • initial_values (ndarray): Array of initial parameter values - expect [num_chain,npar]

Raises:
  • SystemExit if initial values are outside parameter bounds.

pymcmcstat.ParallelMCMC.generate_initial_values(num_chain, npar, low_lim, upp_lim)[source]

Generate initial values by sampling from uniform distribution between limits

Args:
  • num_chain (int): Number of sampling chains to be generated.

  • npar (int): Number of model parameters.

  • low_lim (ndarray): Lower limits.

  • upp_lim (ndarray): Upper limits.

Returns:
  • initial_values (ndarray): Array of initial parameter values - [num_chain,npar]

pymcmcstat.ParallelMCMC.get_parameter_features(parameters)[source]

Get features of model parameters.

Args:
  • parameters (list): List of MCMC model parameter dictionaries.

Returns:
  • npar (int): Number of model parameters.

  • low_lim (ndarray): Lower limits.

  • upp_lim (ndarray): Upper limits.

pymcmcstat.ParallelMCMC.load_parallel_simulation_results(savedir, extension='h5', chainfile='chainfile', sschainfile='sschainfile', s2chainfile='s2chainfile', covchainfile='covchainfile')[source]

Load results from parallel simulation directory json files.

Lists in json files are converted to numpy arrays.

Args:
  • savedir (str): String indicated path to parallel directory.

Returns:
  • pres (list): Each element of list is an MCMC result dictionary.

pymcmcstat.ParallelMCMC.run_serial_simulation(mcstat)[source]

Run serial MCMC simulation

Args:
  • mcstat (MCMC.MCMC): MCMC object.

Returns:
  • results (dict): Results dictionary for serial simulation.

pymcmcstat.ParallelMCMC.unpack_mcmc_set(mcset)[source]

Unpack attributes of MCMC object.

Args:
  • mcset (MCMC): MCMC object.

Returns:

pymcmcstat.propagation module

Created on Wed Nov 8 12:00:11 2017

@author: prmiles

pymcmcstat.propagation.calculate_intervals(chain, results, data, model, s2chain=None, nsample=500, waitbar=True, sstype=0)[source]

Calculate distribution of model response to form propagation intervals

Samples values from chain, performs forward model evaluation, and tabulates credible and prediction intervals (if obs. error var. included).

Args:
  • chain (ndarray): Parameter chains, expect shape=(nsimu, npar).

  • results (dict): Results dictionary generated by pymcmcstat.

  • data (DataStructure): Data

  • model: User defined function. Note, if your model outputs multiple quantities of interest (QoI) at the same time in a multi-dimensional array, then make sure it is returned as a (N, p) array where N is the number of evaluation points and p is the number of QoI.

Kwargs:
  • s2chain (float, ndarray, or None): Observation error variance chain.

  • nsample (int): No. of samples drawn from posteriors.

  • waitbar (bool): Flag to display progress bar.

  • sstype (int): Sum-of-squares type. Can be 0 (normal), 1 (sqrt), or 2 (log).

Returns:
  • dict with two elements: 1) credible and 2) prediction

pymcmcstat.propagation.check_s2chain(s2chain, nsimu)[source]

Check size of s2chain

Args:
  • s2chain (float, ndarray, or None):

    Observation error variance chain or value

  • nsimu (int): No. of elements in chain

Returns:
pymcmcstat.propagation.define_sample_points(nsample, nsimu)[source]

Define indices to sample from posteriors.

Args:
  • nsample (int): Number of samples to draw from posterior.

  • nsimu (int): Number of MCMC simulations.

Returns:
  • iisample (ndarray): Array of indices in posterior set.

  • nsample (int): Number of samples to draw from posterior.

pymcmcstat.propagation.generate_quantiles(x, p=array([0.25, 0.5, 0.75]))[source]

Calculate empirical quantiles.

Args:
  • x (ndarray): Observations from which to generate quantile.

  • p (ndarray): Quantile limits.

Returns:
  • (ndarray): Interpolated quantiles.

pymcmcstat.propagation.observation_sample(s2, y, sstype)[source]

Calculate model response with observation errors.

Args:
  • s2 (ndarray): Observation error(s).

  • y (ndarray): Model responses.

  • sstype (int): Flag to specify sstype.

Returns:
  • opred (ndarray): Model responses with observation errors.

pymcmcstat.propagation.plot_3d_intervals(intervals, time, ydata=None, xdata=None, limits=[95], adddata=False, addlegend=True, addmodel=True, figsize=None, model_display={}, data_display={}, interval_display={}, addcredible=True, addprediction=True, fig=None, legloc='upper left', ciset=None, piset=None, return_settings=False)[source]

Plot propagation intervals in 3-D

This routine takes the model distributions generated using the calculate_intervals() method and then plots specific quantiles. The user can plot just the intervals, or also include the median model response and/or observations. Specific settings for credible intervals are controlled by defining the ciset dictionary. Likewise, for prediction intervals, settings are defined using piset.

The setting options available for each interval are as follows:
  • limits: This should be a list of numbers between 0 and 100, e.g., limits=[50, 90] will result in 50% and 90% intervals.

  • cmap: The program is designed to “try” to choose colors that are visually distinct. The user can specify the colormap to choose from.

  • colors: The user can specify the color they would like for each interval in a list, e.g., [‘r’, ‘g’, ‘b’]. This list should have the same number of elements as limits or the code will revert back to its default behavior.

Args:
  • intervals (dict): Interval dictionary generated using calculate_intervals() method.

  • time (ndarray): Independent variable, i.e., x- and y-axes of plot. Note, it must be a 2-D array with shape=(N, 2), where N is the number of evaluation points.

Kwargs:
  • ydata (ndarray or None): Observations, expect 1-D array if defined.

  • xdata (ndarray or None): Independent values corresponding to observations. This is required if the observations do not align with your times of generating the model response.

  • limits (list): Quantile limits that correspond to percentage size of desired intervals. Note, this is the default limits, but specific limits can be defined using the ciset and piset dictionaries.

  • adddata (bool): Flag to include data

  • addmodel (bool): Flag to include median model response

  • addlegend (bool): Flag to include legend

  • addcredible (bool): Flag to include credible intervals

  • addprediction (bool): Flag to include prediction intervals

  • model_display (dict): Display settings for median model response

  • data_display (dict): Display settings for data

  • interval_display (dict): General display settings for intervals.

  • fig: Handle of previously created figure object

  • figsize (tuple): (width, height) in inches

  • legloc (str): Legend location - matplotlib help for details.

  • ciset (dict): Settings for credible intervals

  • piset (dict): Settings for prediction intervals

  • return_settings (bool): Flag to return ciset and piset along with fig and ax.

Returns:
  • (tuple) with elements
    1. Figure handle

    2. Axes handle

    3. Dictionary with ciset and piset inside (only outputted if return_settings=True)

pymcmcstat.propagation.plot_intervals(intervals, time, ydata=None, xdata=None, limits=[95], adddata=None, addmodel=True, addlegend=True, addcredible=True, addprediction=True, data_display={}, model_display={}, interval_display={}, fig=None, figsize=None, legloc='upper left', ciset=None, piset=None, return_settings=False)[source]

Plot propagation intervals in 2-D

This routine takes the model distributions generated using the calculate_intervals() method and then plots specific quantiles. The user can plot just the intervals, or also include the median model response and/or observations. Specific settings for credible intervals are controlled by defining the ciset dictionary. Likewise, for prediction intervals, settings are defined using piset.

The setting options available for each interval are as follows:
  • limits: This should be a list of numbers between 0 and 100, e.g., limits=[50, 90] will result in 50% and 90% intervals.

  • cmap: The program is designed to “try” to choose colors that are visually distinct. The user can specify the colormap to choose from.

  • colors: The user can specify the color they would like for each interval in a list, e.g., [‘r’, ‘g’, ‘b’]. This list should have the same number of elements as limits or the code will revert back to its default behavior.

Args:
Kwargs:
  • ydata (ndarray or None): Observations, expect 1-D array if defined.

  • xdata (ndarray or None): Independent values corresponding to observations. This is required if the observations do not align with your times of generating the model response.

  • limits (list): Quantile limits that correspond to percentage size of desired intervals. Note, this is the default limits, but specific limits can be defined using the ciset and piset dictionaries.

  • adddata (bool): Flag to include data

  • addmodel (bool): Flag to include median model response

  • addlegend (bool): Flag to include legend

  • addcredible (bool): Flag to include credible intervals

  • addprediction (bool): Flag to include prediction intervals

  • model_display (dict): Display settings for median model response

  • data_display (dict): Display settings for data

  • interval_display (dict): General display settings for intervals.

  • fig: Handle of previously created figure object

  • figsize (tuple): (width, height) in inches

  • legloc (str): Legend location - matplotlib help for details.

  • ciset (dict): Settings for credible intervals

  • piset (dict): Settings for prediction intervals

  • return_settings (bool): Flag to return ciset and piset along with fig and ax.

Returns:
  • (tuple) with elements
    1. Figure handle

    2. Axes handle

    3. Dictionary with ciset and piset inside (only outputted if return_settings=True)

pymcmcstat.propagation.setup_display_settings(interval_display, model_display, data_display)[source]

Compare user defined display settings with defaults and merge.

Args:
  • interval_display (dict): User defined settings for interval display.

  • model_display (dict): User defined settings for model display.

  • data_display (dict): User defined settings for data display.

Returns:
  • interval_display (dict): Settings for interval display.

  • model_display (dict): Settings for model display.

  • data_display (dict): Settings for data display.

pymcmcstat.propagation.setup_interval_colors(iset, inttype='CI')[source]

Setup colors for empirical intervals

This routine attempts to distribute the color of the UQ intervals based on a normalize color map. Or, it will assign user-defined colors; however, this only happens if the correct number of colors are specified.

Args:
  • iset (dict): This dictionary should contain the following keys - limits, cmap, and colors.

Kwargs:
  • inttype (str): Type of uncertainty interval

Returns:
  • ic (list): List containing color for each interval