pymcmcstat.chain package

pymcmcstat.chain.ChainProcessing module

Created on Tue May 1 09:12:06 2018

@author: prmiles

pymcmcstat.chain.ChainProcessing.check_parallel_directory_contents(parallel_dir, cf_orig)

Check that items in directory are subdirectories with name “chain_#”

Args:

parallel_dir (str): Directory where parallel log files are saved.
cf_orig (list): List of items contained in parallel directory.

Returns:

chainfolders (list): List of items that match criteria in parallel directory.
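
The check described above can be sketched with the standard library. This is a minimal illustration, not the library's implementation: the helper name filter_chain_folders is hypothetical, and reading "chain_#" as "chain_" followed by one or more digits is an assumption.

```python
import os
import re
import tempfile


def filter_chain_folders(parallel_dir, items):
    """Hypothetical helper: keep items that are subdirectories named chain_<digits>."""
    pattern = re.compile(r"^chain_\d+$")  # assumed meaning of "chain_#"
    return [
        item for item in items
        if pattern.match(item)
        and os.path.isdir(os.path.join(parallel_dir, item))
    ]


with tempfile.TemporaryDirectory() as parallel_dir:
    os.mkdir(os.path.join(parallel_dir, "chain_0"))
    os.mkdir(os.path.join(parallel_dir, "chain_1"))
    os.mkdir(os.path.join(parallel_dir, "notes"))  # wrong name
    open(os.path.join(parallel_dir, "chain_2"), "w").close()  # a file, not a folder
    cf_orig = os.listdir(parallel_dir)
    chainfolders = sorted(filter_chain_folders(parallel_dir, cf_orig))
```

Only the two matching subdirectories survive the filter; the stray folder and the plain file are dropped.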

pymcmcstat.chain.ChainProcessing.generate_chain_list(pres, burnin_percentage=50)

Generate list of chains.

Args:

pres (list): Parallel results list.
burnin_percentage (int): Percentage of chain to remove for burnin.

Returns:

(list): Each element of list corresponds to different chain set.

pymcmcstat.chain.ChainProcessing.generate_combined_chain_with_index(pres, burnin_percentage=50)

Generate combined chain with index.

Args:

pres (list): Parallel results list.
burnin_percentage (int): Percentage of chain to remove for burnin.

Returns:

(ndarray, list): Combined chain array, index label
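
The idea behind burn-in removal and a combined chain with an index can be sketched as follows. This is only an illustration under stated assumptions: the chains are plain nested lists rather than the parallel results list, both helper names are hypothetical, and the "Chain 0" label format is invented for the example.

```python
def remove_burnin(chain, burnin_percentage=50):
    """Drop the first burnin_percentage percent of the rows of a chain."""
    start = int(len(chain) * burnin_percentage / 100)
    return chain[start:]


def combine_with_index(chains, burnin_percentage=50):
    """Stack the post-burn-in rows of every chain and label each row by its chain."""
    combined, index = [], []
    for ii, chain in enumerate(chains):
        kept = remove_burnin(chain, burnin_percentage)
        combined.extend(kept)
        index.extend(["Chain {}".format(ii)] * len(kept))  # hypothetical label format
    return combined, index


# Two chains, four samples each, two parameters per sample.
chains = [
    [[0.0, 1.0], [0.1, 1.1], [0.2, 1.2], [0.3, 1.3]],
    [[5.0, 6.0], [5.1, 6.1], [5.2, 6.2], [5.3, 6.3]],
]
combined, index = combine_with_index(chains, burnin_percentage=50)
```

With a 50 percent burn-in, the last two rows of each chain are kept, and the index records which chain each kept row came from.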

pymcmcstat.chain.ChainProcessing.load_json_object(filename)

Load object stored in json file.

Note

Filename should include extension.

Args:

filename (str): Load object from file with this name.

Returns:

results (dict): Object loaded from file.
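
A minimal sketch of loading a dictionary from a json file with the standard json module. The helper name load_json and the dictionary keys are illustrative assumptions; note that the filename includes its extension.

```python
import json
import os
import tempfile


def load_json(filename):
    """Hypothetical stand-in: read a json file and return the stored object."""
    with open(filename, "r") as fobj:
        return json.load(fobj)


with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "results.json")  # extension included
    with open(filename, "w") as fobj:
        json.dump({"nsimu": 5000, "names": ["m", "b"]}, fobj)  # illustrative contents
    results = load_json(filename)
```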

pymcmcstat.chain.ChainProcessing.print_log_files(savedir)

Print log files to screen.

Args:

savedir (str): Directory where log files are saved.

Each line of the output display includes a date/time stamp, followed by the first and last chain indices saved during that export sequence.

Example display:

--------------------------
Display log file: <savedir>/binlogfile.txt
2018-05-03 14:15:54     0       999
2018-05-03 14:15:54     1000    1999
2018-05-03 14:15:55     2000    2999
2018-05-03 14:15:55     3000    3999
2018-05-03 14:15:55     4000    4999
--------------------------
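
Lines in the format shown above can be split back into their parts. The sketch below assumes whitespace-delimited fields in the order date, time, first index, last index; the helper name parse_log_line is hypothetical.

```python
from datetime import datetime


def parse_log_line(line):
    """Split one log line into (timestamp, first index, last index)."""
    date, time, start, end = line.split()
    stamp = datetime.strptime("{} {}".format(date, time), "%Y-%m-%d %H:%M:%S")
    return stamp, int(start), int(end)


lines = [
    "2018-05-03 14:15:54     0       999",
    "2018-05-03 14:15:54     1000    1999",
]
entries = [parse_log_line(line) for line in lines]
# Number of samples written in each export: last index - first index + 1.
counts = [end - start + 1 for _, start, end in entries]
```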

pymcmcstat.chain.ChainProcessing.read_in_bin_file(filename)

Read in information from file containing binary data.

If the file exists, it will read in the array elements. Otherwise, it will return an empty list.

Args:

filename (str): Name of file to read.

Returns:

out (ndarray): Array of chain elements.

pymcmcstat.chain.ChainProcessing.read_in_parallel_json_results_files(parallel_dir)

Read in json results files from directory containing results from parallel MCMC simulation.

Args:

parallel_dir (str): Directory where parallel log files are saved.

pymcmcstat.chain.ChainProcessing.read_in_parallel_savedir_files(parallel_dir, extension='h5', chainfile='chainfile', sschainfile='sschainfile', s2chainfile='s2chainfile', covchainfile='covchainfile')

Read in log files from directory containing results from parallel MCMC simulation.

Args:

parallel_dir (str): Directory where parallel log files are saved.
extension (str): Extension of files being loaded.
chainfile (str): Name of chain log file.
sschainfile (str): Name of sschain log file.
s2chainfile (str): Name of s2chain log file.
covchainfile (str): Name of covchain log file.

pymcmcstat.chain.ChainProcessing.read_in_savedir_files(savedir, extension='h5', chainfile='chainfile', sschainfile='sschainfile', s2chainfile='s2chainfile', covchainfile='covchainfile')

Read in log files from directory.

Args:

savedir (str): Directory where log files are saved.
extension (str): Extension of files being loaded.
chainfile (str): Name of chain log file.
sschainfile (str): Name of sschain log file.
s2chainfile (str): Name of s2chain log file.
covchainfile (str): Name of covchain log file.

pymcmcstat.chain.ChainProcessing.read_in_txt_file(filename)

Read in information from file containing text data.

If the file exists, it will read in the array elements. Otherwise, it will return an empty list.

Args:

filename (str): Name of file to read.

Returns:

out (ndarray): Array of chain elements.
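
The "read if present, otherwise return an empty list" behavior can be sketched as below. This is an assumption-laden illustration: the helper name read_txt_or_empty is hypothetical, the file is assumed to hold whitespace-delimited numbers with one chain sample per row, and a nested list is returned instead of an ndarray so the example needs only the standard library.

```python
import os
import tempfile


def read_txt_or_empty(filename):
    """Return rows of floats from a text file, or [] if the file does not exist."""
    if not os.path.isfile(filename):
        return []
    with open(filename, "r") as fobj:
        # Skip blank lines; each remaining line is one row of the chain.
        return [[float(v) for v in line.split()] for line in fobj if line.strip()]


with tempfile.TemporaryDirectory() as tmpdir:
    present = os.path.join(tmpdir, "chainfile.txt")
    with open(present, "w") as fobj:
        fobj.write("1.0 2.0\n3.0 4.0\n")
    out_present = read_txt_or_empty(present)
    out_missing = read_txt_or_empty(os.path.join(tmpdir, "missing.txt"))
```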

pymcmcstat.chain.ChainStatistics module

Created on Thu Apr 26 10:23:51 2018

@author: prmiles

pymcmcstat.chain.ChainStatistics.batch_mean_standard_deviation(chain, b=None)

Standard deviation calculated from batch means

Args:

chain (ndarray): Sampling chain.
b (int): Step size (number of samples in each batch).

Returns:

s (ndarray): Batch mean standard deviation.
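
One common form of the batch means estimate is sketched below for a single-parameter chain. It is an illustration, not the library's code: the helper name is hypothetical, the batch size b must be given explicitly, and trailing samples that do not fill a whole batch are discarded (an assumption). Dividing the result by the square root of the chain length gives a Monte Carlo standard error for the mean.

```python
import math


def batch_mean_std(chain, b):
    """Batch means standard deviation: sqrt(b * sample variance of the batch means)."""
    nb = len(chain) // b  # number of complete batches
    if nb < 2:
        raise ValueError("need at least two batches")
    used = chain[: nb * b]
    overall = sum(used) / len(used)
    batch_means = [sum(used[i * b:(i + 1) * b]) / b for i in range(nb)]
    var_bm = sum((m - overall) ** 2 for m in batch_means) / (nb - 1)
    return math.sqrt(b * var_bm)


chain = [0.0, 2.0, 4.0, 6.0, 1.0, 3.0, 5.0, 7.0]
s = batch_mean_std(chain, b=2)      # batch means are 1, 5, 2, 6
mc_error = s / math.sqrt(len(chain))  # Monte Carlo standard error of the mean
```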

pymcmcstat.chain.ChainStatistics.calculate_psrf(x, nsimu, nchains)

Calculate Potential Scale Reduction Factor (PSRF)

Performs an analysis of variance for a set of chains corresponding to a single parameter. This code follows the MATLAB implementation found here:

https://users.aalto.fi/~ave/code/mcmcdiag/

Args:

x (ndarray): Expect an [nsimu x nchains] array.
nsimu (int): Number of simulations in each chain.
nchains (int): Number of chains.

Returns:

dict
- R - PSRF
- B - Between Sequence Variances
- W - Within Sequence Variances
- V - Mixture-of-Sequences Variances
- neff - Effective number of samples
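
The quantities listed above can be illustrated with the textbook Gelman-Rubin formulas. This sketch is an assumption, not the library's or the MATLAB code's exact calculation: it takes a list of equal-length chains rather than an [nsimu x nchains] array, omits neff, and leaves out any additional sampling-variability correction that an implementation may apply. The name psrf_sketch is hypothetical.

```python
import math
from statistics import mean, variance


def psrf_sketch(chains):
    """Textbook PSRF for one parameter from a list of equal-length chains."""
    n = len(chains[0])                           # samples per chain
    chain_means = [mean(c) for c in chains]
    W = mean([variance(c) for c in chains])      # within-sequence variance
    B = n * variance(chain_means)                # between-sequence variance
    V = (n - 1) / n * W + B / n                  # mixture-of-sequences variance
    R = math.sqrt(V / W)                         # potential scale reduction factor
    return {"R": R, "B": B, "W": W, "V": V}


# Two short chains whose means differ by one unit.
out = psrf_sketch([[1, 2, 3, 4], [2, 3, 4, 5]])
```

Values of R close to one indicate that the between-chain spread is small relative to the within-chain spread.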

pymcmcstat.chain.ChainStatistics.chainstats(chain=None, results=None, returnstats=False)

Calculate chain statistics.

Args:

chain (ndarray): Sampling chain.
results (dict): Results from MCMC simulation.
returnstats (bool): Flag to return statistics.

Returns:

stats (dict): Statistical measures of chain convergence.

pymcmcstat.chain.ChainStatistics.display_gelman_rubin(psrf)

Display results of Gelman-Rubin diagnostic

Args:

psrf (dict): Results from GR diagnostic

pymcmcstat.chain.ChainStatistics.gelman_rubin(chains, names=None, results=None, display=True)

Gelman-Rubin diagnostic for multiple chains [GR+92], [BG98].

This diagnostic technique compares the variance within a single chain to the variance between multiple chains, which serves as a test of whether the chains have converged. If the chains have converged, we would expect the within-chain variance and the between-chain variance to be approximately equal. This diagnostic tool pairs well with the ParallelMCMC module, which generates a set of distinct chains that have all been initialized at different points within the parameter space.

Args:

chains (list): List of arrays - each array corresponds to different chain set.
names (list): List of strings - corresponds to parameter names.
results (dict): Results from MCMC simulation.
display (bool): Flag to display results of the diagnostic.

Returns:

(dict): Keys of the dictionary correspond to the parameter names. Each key maps to a dictionary returned by calculate_psrf().

pymcmcstat.chain.ChainStatistics.get_parameter_names(nparam, results)

Get parameter names from results dictionary.

If no results are found, default names are generated. If some names are found, the set is extended to reach the required length. Uses the functions generate_default_names() and extend_names_to_match_nparam().

Args:

nparam (int): Number of parameter names needed.
results (dict): Results from MCMC simulation.

Returns:

names (list): List of length nparam with strings.

pymcmcstat.chain.ChainStatistics.geweke(chain, a=0.1, b=0.5)

Geweke’s MCMC convergence diagnostic

Test for equality of the means of the first a% (default 10%) and last b% (default 50%) of a Markov chain - see [BR98].

Args:

chain (ndarray): Sampling chain.
a (float): First a% of chain.
b (float): Last b% of chain.

Returns:

z (ndarray): Convergence diagnostic prior to CDF.
p (ndarray): Geweke’s MCMC convergence diagnostic.

Note

The percentage of the chain should be given as a decimal between zero and one. So, for the first 10% of the chain, define a = 0.1. Likewise, for the last 50% of the chain, define b = 0.5.
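
The comparison of means can be sketched for a single-parameter chain as below. Note the substitution: this illustration uses the plain sample variance of each segment, whereas Geweke's diagnostic is normally built on a spectral estimate of the variance, so the numbers will differ from the library's on correlated chains. The name geweke_sketch is hypothetical, and the two-sided normal p-value is an assumed reading of the returned p.

```python
import math
from statistics import mean, variance


def geweke_sketch(chain, a=0.1, b=0.5):
    """Z-score and two-sided p-value for equal means of the first a and last b fractions."""
    n = len(chain)
    first = chain[: int(a * n)]
    last = chain[n - int(b * n):]
    # Simplification: naive variance of the segment means (no spectral estimate).
    se = math.sqrt(variance(first) / len(first) + variance(last) / len(last))
    z = (mean(first) - mean(last)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Alternating chain: both segments have the same mean, so z is zero.
z_flat, p_flat = geweke_sketch([0, 1] * 50)
# Steadily drifting chain: the segment means differ strongly.
z_drift, p_drift = geweke_sketch(list(range(100)))
```

A p-value near one gives no evidence against equal means, while a p-value near zero suggests the chain is still drifting.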

pymcmcstat.chain.ChainStatistics.integrated_autocorrelation_time(chain)

Estimates the integrated autocorrelation time using Sokal’s adaptive truncated periodogram estimator.

Args:

chain (ndarray): Sampling chain.

Returns:

tau (ndarray): Autocorrelation time.
m (ndarray): Counter.
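
For intuition, the integrated autocorrelation time can be approximated in the time domain as 1 plus twice the sum of the sample autocorrelations, truncated at the first lag m satisfying m >= c * tau. This is a simplified stand-in with a Sokal-style stopping rule, not the periodogram-based estimator named above; the name iact_sketch and the window constant c = 5 are assumptions, and the returned m is simply the lag at which the sum stopped, which may not match the library's counter.

```python
import random


def iact_sketch(x, c=5.0):
    """Windowed time-domain estimate of the integrated autocorrelation time."""
    n = len(x)
    mu = sum(x) / n
    c0 = sum((v - mu) ** 2 for v in x) / n  # lag-0 autocovariance
    tau = 1.0
    for m in range(1, n):
        ck = sum((x[i] - mu) * (x[i + m] - mu) for i in range(n - m)) / n
        tau += 2.0 * ck / c0
        if m >= c * tau:  # stop once the window is wide enough
            return tau, m
    return tau, n - 1


# AR(1) test signal; theory gives tau = (1 + phi) / (1 - phi) = 3 for phi = 0.5.
random.seed(0)
phi = 0.5
x = [0.0]
for _ in range(19999):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))
tau, m = iact_sketch(x)
```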

pymcmcstat.chain.ChainStatistics.power_spectral_density_using_hanning_window(x, nfft=None, nw=None)

Power spectral density using Hanning window.

Args:

x (ndarray): Array of points - portion of chain.
nfft (int): Length of Fourier transform.
nw (int): Size of window.

Returns:

y (ndarray): Power spectral density.

pymcmcstat.chain.ChainStatistics.print_chain_statistics(names, meanii, stdii, mcerr, tau, p)

Print chain statistics to terminal window.

Args:

names (list): List of parameter names.
meanii (list): Parameter mean values.
stdii (list): Parameter standard deviation.
mcerr (ndarray): Normalized batch mean standard deviation.
tau (ndarray): Integrated autocorrelation time.
p (ndarray): Geweke’s convergence diagnostic.

Example display:

---------------------
name      :       mean        std     MC_err        tau     geweke
$p_{0}$   :     1.9680     0.0319     0.0013    36.3279     0.9979
$p_{1}$   :     3.0818     0.0803     0.0035    37.1669     0.9961
---------------------
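
A table in the layout shown above can be produced with fixed-width string formatting. The column widths below (10 characters for the name, 11 for each statistic, four decimal places) are inferred from the example display and are assumptions; the helper name format_chain_statistics is hypothetical.

```python
def format_chain_statistics(names, meanii, stdii, mcerr, tau, p):
    """Return the header and one formatted row per parameter."""
    header = "{:<10s}:".format("name") + "".join(
        "{:>11s}".format(col) for col in ["mean", "std", "MC_err", "tau", "geweke"]
    )
    rows = [header]
    for i, name in enumerate(names):
        values = [meanii[i], stdii[i], mcerr[i], tau[i], p[i]]
        rows.append(
            "{:<10s}:".format(name) + "".join("{:>11.4f}".format(v) for v in values)
        )
    return rows


rows = format_chain_statistics(
    names=["$p_{0}$", "$p_{1}$"],
    meanii=[1.9680, 3.0818],
    stdii=[0.0319, 0.0803],
    mcerr=[0.0013, 0.0035],
    tau=[36.3279, 37.1669],
    p=[0.9979, 0.9961],
)
print("\n".join(rows))
```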

pymcmcstat.chain.ChainStatistics.spectral_estimate_for_variance(x)

Spectral density at frequency zero.

Args:

x (ndarray): Array of points - portion of chain.

Returns:

s (ndarray): Spectral estimate for variance.