amimodels package¶
Submodules¶
amimodels.deterministics module¶
This module provides PyMC Deterministic objects custom built for hidden Markov models.
-
class
amimodels.deterministics.
HMMLinearCombination
(name, X_matrices, betas, states, *args, **kwds)¶ Bases:
pymc.Deterministic
A deterministic that represents
\[\mu_t = x_t^{(S_t)\top} \beta^{(S_t)}\] for a state sequence \(\{S_t\}_{t=1}^T\) with \(S_t \in \{1,\dots,K\}\), rows of design matrices \(x^{(k)}_t\), and coefficient vectors \(\beta^{(k)}\).
This deterministic organizes, separates and tracks the aforementioned \(\mu_t\) in pieces determined by the current state sequence, \(S_t\). Specifically, it tracks a collection containing the following sets for each \(k \in \{1,\dots,K\}\):
\[\left\{ \mu^{(k)} = \tilde{X}^{(k)} \beta^{(k)}, \tilde{X}^{(k)} = X_{\mathcal{T}^{(k)}}, \mathcal{T}^{(k)} = \{t : S_t = k\} \right\}\]
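The partition above can be illustrated with a minimal NumPy sketch. It is not the class's actual implementation; the function name `linear_combination_by_state` is hypothetical, and states are labeled 0 to K-1 here (rather than 1 to K) to match array indexing.

```python
import numpy as np

def linear_combination_by_state(X_matrices, betas, states):
    """Hypothetical helper: compute mu_t = x_t^{(S_t)T} beta^{(S_t)} state by state.

    X_matrices: list of K arrays, each of shape (T, M_k)
    betas:      list of K arrays, each of shape (M_k,)
    states:     integer sequence of length T with values in {0, ..., K-1}
    """
    states = np.asarray(states)
    mu = np.empty(len(states), dtype=float)
    pieces = {}
    for k, (X, beta) in enumerate(zip(X_matrices, betas)):
        which_k = np.flatnonzero(states == k)   # T^{(k)} = {t : S_t = k}
        X_k = np.asarray(X)[which_k]            # rows of X^{(k)} visited in state k
        mu_k = X_k @ np.asarray(beta)           # mu^{(k)} = X~^{(k)} beta^{(k)}
        mu[which_k] = mu_k
        pieces[k] = (mu_k, X_k, which_k)
    return mu, pieces

# Two states: a line with intercept 1 and slope 2, and a constant level of 10.
X0 = np.column_stack([np.ones(5), np.arange(5.0)])
X1 = np.ones((5, 1))
mu, pieces = linear_combination_by_state(
    [X0, X1], [np.array([1.0, 2.0]), np.array([10.0])], [0, 1, 0, 1, 1])
```

Keeping the per-state pieces around is what lets each \(\beta^{(k)}\) be updated against only the observations assigned to state \(k\).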
amimodels.eemeter_tools module¶
-
amimodels.eemeter_tools.
read_meter_data
(trace_filename, project_info_filename, project_id=None, weather=True, merge_series=True)¶ Read meter data from a raw XML file source and obtain matching project information from a separate CSV file. When requested, also fetch the corresponding weather data.
Parameters: trace_filename: str
Filename of XML meter trace.
project_info_filename: str
Filename of CSV file containing project info.
project_id: str
Manually provide the project ID used in project_info_filename. If None, the first part of trace_filename before a _ is used.
weather: bool
True will obtain weather (temperature) data.
merge_series: bool
True will return a pandas.DataFrame with merged consumption and temperature data.
Returns: A DataCollection object with the following fields:
- project_info: pandas.DataFrame
Contains columns for project properties.
- baseline_end: pandas.Datetime
End date of the baseline period.
- consumption_data: eemeter.consumption.ConsumptionData
Consumption data object.
- consumption_data_freq: pandas.DataFrame
Consumption data with normalized frequency.
If weather=True:
- weather_source: eemeter.ISDWeatherSource
Weather source object.
- weather_data: pandas.DataFrame
Temperature observations, in degF, with frequency matching consumption_data_freq. Values are averaged if raw temperature observations are lower frequency.
If merge_series=True:
- cons_weather_data: pandas.DataFrame
Merged consumption and temperature data.
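As an illustration of the project_id default described above (the part of trace_filename before the first underscore), here is a minimal sketch. The helper name `default_project_id` and the example filename are hypothetical, and stripping the directory part first is an assumption rather than documented behavior.

```python
import os

def default_project_id(trace_filename):
    """Hypothetical helper: take the part of the file name before the first '_'."""
    base = os.path.basename(trace_filename)  # assumption: directories are ignored
    return base.split("_")[0]

project_id = default_project_id("data/1234_electricity.xml")  # -> "1234"
```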
amimodels.hmm_utils module¶
-
amimodels.hmm_utils.
compute_steady_state
(trans_mat)¶ Compute the steady state of a transition probability matrix.
Parameters: trans_mat: numpy.array
A transition probability matrix for K states with shape (K, K-1), i.e. the last column omitted.
Returns: A numpy.array representing the steady state.
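A minimal sketch of one way to compute such a steady state, assuming the (K, K-1) input convention above. The function name `steady_state` is hypothetical, and the least-squares formulation is an illustrative choice, not necessarily what compute_steady_state does internally.

```python
import numpy as np

def steady_state(trans_mat):
    """Hypothetical helper: stationary distribution of a (K, K-1) transition matrix."""
    trans_mat = np.asarray(trans_mat, dtype=float)
    K = trans_mat.shape[0]
    # Restore the omitted last column so that each row sums to one.
    P = np.column_stack([trans_mat, 1.0 - trans_mat.sum(axis=1)])
    # Solve pi P = pi together with sum(pi) = 1 as one least-squares system.
    A = np.vstack([P.T - np.eye(K), np.ones((1, K))])
    b = np.zeros(K + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Full matrix is [[0.9, 0.1], [0.3, 0.7]].
pi = steady_state(np.array([[0.9], [0.3]]))
```

For this two-state example the balance condition \(0.1\,\pi_0 = 0.3\,\pi_1\) gives \(\pi = (0.75, 0.25)\).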
-
amimodels.hmm_utils.
compute_trans_freqs
(states, N_states, counts_only=False)¶ Computes empirical state transition frequencies.
Parameters: states: a pymc object or ndarray
Vector sequence of states.
N_states: int
Total number of observable states.
counts_only: boolean
Return only the transition counts for each state.
Returns: Unless counts_only is True, return the empirical state transition
frequencies; otherwise, return the transition counts for each state.
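The idea of empirical transition frequencies can be sketched as follows. The helper names are hypothetical, states are assumed to be labeled 0 to N_states-1, and the exact shape and normalization returned by compute_trans_freqs may differ from this illustration.

```python
import numpy as np

def transition_counts(states, n_states):
    """Hypothetical helper: count observed i -> j transitions in a state sequence."""
    states = np.asarray(states, dtype=int)
    counts = np.zeros((n_states, n_states), dtype=int)
    # Unbuffered add so that repeated (i, j) pairs are all counted.
    np.add.at(counts, (states[:-1], states[1:]), 1)
    return counts

def transition_freqs(states, n_states):
    """Hypothetical helper: row-normalize the transition counts."""
    counts = transition_counts(states, n_states).astype(float)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions are left as zeros.
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)

counts = transition_counts([0, 0, 1, 0, 1, 1], 2)
freqs = transition_freqs([0, 0, 1, 0, 1, 1], 2)
```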
-
amimodels.hmm_utils.
plot_hmm
(mcmc_step, obs_index=None, axes=None, smpl_subset=None, states_true=None, plot_samples=True, range_slice=slice(0, None, None))¶ Plot the observations, estimated observation mean parameter’s statistics and estimated state sequence statistics.
Parameters: mcmc_step: a pymc.MCMC object
The MCMC object after estimation.
obs_index: pandas indices or None
Time series index for observations.
axes: list of matplotlib axes
Axes to use for plotting.
smpl_subset: float, numpy.ndarray of int, or None
If a float, the percent of samples to plot; if a numpy.ndarray, the samples to plot; if None, plot all samples.
states_true: pandas.DataFrame
True states time series.
plot_samples: bool
If True, plot the individual sample values, along with the means.
Returns: A matplotlib plot object.
amimodels.normal_hmm module¶
This module provides classes and functions for producing and simulating hidden Markov models (HMM) with scalar normal/Gaussian-distributed observations in PyMC.
-
class
amimodels.normal_hmm.
NormalHMMInitialParams
(alpha_trans, trans_mat, states, betas, Ws, Vs, p0)¶ Bases:
object
An object that holds initial parameters for a normal-observations HMM.
-
class
amimodels.normal_hmm.
NormalHMMProcess
(trans_mat, N_obs, p0, betas, Vs, exogenous_sim_func=None, formulas=None, start_datetime=None, seed=None)¶ Bases:
object
An object that produces simulations from a normal-observations HMM.
Methods
generate_exogenous(): Generate exogenous terms/covariates and time indices.
simulate(): Simulate a series from this normal-emissions HMM.
-
generate_exogenous
()¶ Generate exogenous terms/covariates and time indices. Override this method if needed, but make sure the dimensions match self.betas.
Returns: index_sim
A pandas.DatetimeIndex.
X_matrices
A sequence of design matrices corresponding to each self.betas.
-
simulate
()¶ Simulate a series from this normal-emissions HMM.
Returns: states: numpy.array of int
Simulated state values.
y: pandas.DataFrame
Time series of simulated usage observations.
X_matrices: list of pandas.DataFrame
List of pandas.DataFrames, one for each element of self.betas, with designs given by self.formulas.
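For intuition, the simulation can be sketched in plain NumPy. This is not the NormalHMMProcess implementation: the function name is hypothetical, the transition matrix is assumed to be a full (K, K) matrix, Vs are assumed to be variances, and no truncation at zero is applied.

```python
import numpy as np

def simulate_normal_hmm(P, p0, X_matrices, betas, Vs, seed=None):
    """Hypothetical sketch: draw states, then normal observations around x_t' beta."""
    rng = np.random.RandomState(seed)
    N_obs = X_matrices[0].shape[0]
    K = P.shape[0]
    states = np.empty(N_obs, dtype=int)
    states[0] = rng.choice(K, p=p0)                     # initial state from p0
    for t in range(1, N_obs):
        states[t] = rng.choice(K, p=P[states[t - 1]])   # Markov transition
    # Mean at time t uses the design row and coefficients of the active state.
    mu = np.array([X_matrices[s][t] @ betas[s] for t, s in enumerate(states)])
    y = rng.normal(mu, np.sqrt(np.asarray(Vs)[states]))  # Vs assumed to be variances
    return states, y

T = 50
P = np.array([[0.9, 0.1], [0.2, 0.8]])
p0 = np.array([0.5, 0.5])
X_matrices = [np.ones((T, 1)),
              np.column_stack([np.ones(T), np.linspace(0.0, 1.0, T)])]
betas = [np.array([1.0]), np.array([5.0, 2.0])]
states, y = simulate_normal_hmm(P, p0, X_matrices, betas, Vs=[0.1, 0.5], seed=1)
```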
-
-
amimodels.normal_hmm.
bic_norm_hmm_init_params
(y, X_matrices)¶ Initialize a normal HMM regression mixture using a Gaussian mixture model (GMM) with a BIC-determined number of states. Starting with an initial set of design matrices, this function searches for the best number of additional constant states to add to the model.
Parameters: y: pandas.DataFrame or pandas.Series
Time-indexed vector of observations.
X_matrices: list of pandas.DataFrame
Collection of design matrices for each initial state.
Returns: init_params:
A NormalHMMInitialParams object.
-
amimodels.normal_hmm.
calc_alpha_prior
(obs_states, N_states, trans_freqs=None)¶ A method of producing informed Dirichlet distribution parameters from observed states.
Parameters: obs_states: ndarray of int
Array of state label observations.
N_states: int
Total number of states (max integer label of observations).
trans_freqs: numpy.array of float
Empirical transition probabilities for obs_states sequence.
Returns: numpy.array of Dirichlet parameters initialized/updated by the observed
sequence.
-
amimodels.normal_hmm.
generate_temperatures
(ind, period=24.0, offset=0.0, base_temp=60.0, flux_amt=10.0)¶ Generate very regular temperature oscillations. This is roughly based on observations starting at 4/1/2015 in CA (in UTC!).
Parameters: ind: pandas.DatetimeIndex
Time index over which the temperatures will be computed.
period: float
Period of the sinusoid.
offset: float
Frequency offset.
base_temp: float
Temperature around which the sinusoid will fluctuate.
flux_amt: float
Scaling intensity of sinusoid.
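A rough sketch of a regular temperature oscillation with these parameters. The exact formula used by generate_temperatures is not given here, so the sinusoid below is an assumption: time is passed as hours rather than a pandas.DatetimeIndex, and offset is treated as a phase shift in radians.

```python
import numpy as np

def sinusoid_temperatures(hours, period=24.0, offset=0.0,
                          base_temp=60.0, flux_amt=10.0):
    """Hypothetical helper: base_temp plus a sinusoid of amplitude flux_amt."""
    hours = np.asarray(hours, dtype=float)
    # Assumed form: one full cycle every `period` hours, shifted by `offset`.
    return base_temp + flux_amt * np.sin(2.0 * np.pi * hours / period + offset)

temps = sinusoid_temperatures(np.arange(48))
```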
-
amimodels.normal_hmm.
get_stochs_excluding
(stoch, excluding)¶ Get the parents of a stochastic excluding the given list of stochastic and/or parent names.
Parameters: stoch: pymc.Stochastic
Root stochastic/node.
excluding: list of str
Stochastic/node and parent names to exclude.
-
amimodels.normal_hmm.
gmm_norm_hmm_init_params
(y, X_matrices)¶ Generates initial parameters for the univariate normal-emissions HMM with normal mean priors.
Parameters: y: pandas.DataFrame or pandas.Series
Time-indexed vector of observations.
X_matrices: list of pandas.DataFrame
Collection of design matrices for each hidden state’s mean.
Returns: init_params:
A NormalHMMInitialParams object.
-
amimodels.normal_hmm.
make_normal_baseline_hmm
(y_data, X_data, baseline_end, initial_params)¶ Construct a PyMC2 scalar normal-emissions HMM with a stochastic reporting period start time parameter and separate baseline and reporting parameters for all other stochastics/estimated terms in the model. The reporting period start time parameter is given a discrete uniform distribution starting from the first observation after the baseline to the end of the series.
Parameters: y_data: pandas.DataFrame
Usage/response observations.
X_data: list of pandas.DataFrame
List of design matrices for each state. Each must span the entire length of observations (i.e. y_data).
baseline_end: pandas.tslib.Timestamp
End of baseline period (inclusive), beginning of reporting period.
initial_params: NormalHMMInitialParams
An object containing the following fields/members:
Returns: A pymc.Model object used for sampling.
-
amimodels.normal_hmm.
make_normal_hmm
(y_data, X_data, initial_params=None, single_obs_var=False, include_ppy=False)¶ Construct a PyMC2 scalar normal-emissions HMM of the form
\[\begin{split}y_t &\sim \operatorname{N}^{+}(x_t^{(S_t)\top} \beta^{(S_t)}, V^{(S_t)}) \\ \beta^{(S_t)}_i &\sim \operatorname{N}(m^{(S_t)}, C^{(S_t)}), \quad i \in \{1,\dots, M\} \\ S_t \mid S_{t-1} &\sim \operatorname{Categorical}(\pi^{(S_{t-1})}) \\ \pi^{(S_{t-1})} &\sim \operatorname{Dirichlet}(\alpha^{(S_{t-1})})\end{split}\] where \(\operatorname{N}^{+}\) is the positive (truncated below zero) normal distribution, \(S_t \in \{1, \ldots, K\}\), \(C^{(S_t)} = \lambda_i^{(S_t) 2} \tau^{(S_t) 2}\) and
\[\begin{split}\lambda^{(k)}_i &\sim \operatorname{Cauchy}^{+}(0, 1) \\ \tau^{(k)} &\sim \operatorname{Cauchy}^{+}(0, 1) \\ V^{(k)} &\sim \operatorname{Gamma}(n_0/2, n_0 S_0/2)\end{split}\] for \(k \in \{1, \ldots, K\}\).
The model is defined for observations \(y_t\) with \(t \in \{0, \dots, T\}\), features \(x_t^{(S_t)} \in \mathbb{R}^M\), regression parameters \(\beta^{(S_t)}\), state sequences \(\{S_t\}^T_{t=1}\) and state transition probabilities \(\pi \in [0, 1]^{K}\). \(\operatorname{Cauchy}^{+}\) is the standard half-Cauchy distribution and \(\operatorname{N}\) is the normal/Gaussian distribution.
The random variables in the set \(\mathcal{S} = \{\{\beta^{(k)}, \lambda^{(k)}, \tau^{(k)}, V^{(k)}, \pi^{(k)}\}_{k=1}^K, \{S_t\}^T_{t=1}\}\) are referred to as “stochastics” throughout the code.
Parameters: y_data: pandas.DataFrame
Usage/response observations \(y_t\).
X_data: list of pandas.DataFrame
List of design matrices for each state, i.e. \(x_t^{(S_t)}\). Each must span the entire length of observations (i.e. y_data).
initial_params: NormalHMMInitialParams
The initial parameters, which include \(\pi_0, m^{(k)}, \alpha^{(k)}, V^{(k)}\).
single_obs_var: bool, optional
Determines whether there are multiple observation variances or not. Only used when initial parameters are not given.
include_ppy: bool, optional
If True, include an unobserved observation Stochastic that can be used to produce posterior predictive samples. The Stochastic will have the name y_pp.
Returns: A pymc.Model object used for sampling.
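The hierarchical prior on the regression coefficients in the model above, with \(C = \lambda_i^2 \tau^2\) and half-Cauchy scales, can be sketched for a single state as follows. The function name is hypothetical, and \(C\) is treated as a variance here, which is an assumption; PyMC2's own normal distribution is parameterized by precision.

```python
import numpy as np

def sample_state_coefficients(m, seed=None):
    """Hypothetical sketch: draw beta_i ~ N(m_i, C_i) with C_i = lambda_i^2 tau^2."""
    rng = np.random.RandomState(seed)
    m = np.asarray(m, dtype=float)
    # The absolute value of a standard Cauchy draw is half-Cauchy(0, 1).
    lam = np.abs(rng.standard_cauchy(m.shape[0]))  # local scales, one per coefficient
    tau = np.abs(rng.standard_cauchy())            # global scale shared within the state
    C = lam ** 2 * tau ** 2                        # assumed to be a variance
    beta = rng.normal(m, np.sqrt(C))
    return beta, C

beta, C = sample_state_coefficients(np.zeros(3), seed=0)
```

The shared \(\tau\) shrinks all coefficients of a state together, while the heavy-tailed \(\lambda_i\) allow individual coefficients to escape that shrinkage.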
-
amimodels.normal_hmm.
make_poisson_hmm
(y_data, X_data, initial_params)¶ Construct a PyMC2 scalar Poisson-emissions HMM.
TODO: Update to match normal model design.
The model takes the following form:
\[\begin{split}y_t &\sim \operatorname{Poisson}(\exp(x_t^{(S_t)\top} \beta^{(S_t)})) \\ \beta^{(S_t)}_i &\sim \operatorname{N}(m^{(S_t)}, C^{(S_t)}), \quad i \in \{1,\dots,M\} \\ S_t \mid S_{t-1} &\sim \operatorname{Categorical}(\pi^{(S_{t-1})}) \\ \pi^{(S_{t-1})} &\sim \operatorname{Dirichlet}(\alpha^{(S_{t-1})})\end{split}\] where \(C^{(S_t)} = \lambda_i^{(S_t) 2} \tau^{(S_t) 2}\) and
\[\begin{split}\lambda^{(S_t)}_i &\sim \operatorname{Cauchy}^{+}(0, 1) \\ \tau^{(S_t)} &\sim \operatorname{Cauchy}^{+}(0, 1)\end{split}\] for observations \(y_t\) with \(t \in \{0, \dots, T\}\), features \(x_t^{(S_t)} \in \mathbb{R}^M\), regression parameters \(\beta^{(S_t)}\), state sequences \(\{S_t\}^T_{t=1}\) and state transition probabilities \(\pi \in [0, 1]^{K}\). \(\operatorname{Cauchy}^{+}\) is the standard half-Cauchy distribution and \(\operatorname{N}\) is the normal/Gaussian distribution.
The random variables in the set \(\mathcal{S} = \{\{\beta^{(k)}, \lambda^{(k)}, \tau^{(k)}, \pi^{(k)}\}_{k=1}^K, \{S_t\}^T_{t=1}\}\) are referred to as “stochastics” throughout the code.
Parameters: y_data: pandas.DataFrame
Usage/response observations \(y_t\).
X_data: list of pandas.DataFrame
List of design matrices for each state, i.e. \(x_t^{(S_t)}\). Each must span the entire length of observations (i.e. y_data).
initial_params: NormalHMMInitialParams
The initial parameters, which include \(\pi_0, m^{(k)}, \alpha^{(k)}, V^{(k)}\). Ignores V parameters. FIXME: using the “Normal” initial params objects is only temporary.
Returns: A pymc.Model object used for sampling.
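The log-link emission step of the Poisson model, \(y_t \sim \operatorname{Poisson}(\exp(x_t^{(S_t)\top} \beta^{(S_t)}))\), can be sketched for a known state sequence as follows. The function name is hypothetical and states are labeled from 0 for array indexing.

```python
import numpy as np

def simulate_poisson_emissions(X_matrices, betas, states, seed=None):
    """Hypothetical sketch: Poisson counts with rate exp(x_t' beta) per active state."""
    rng = np.random.RandomState(seed)
    log_rates = np.array(
        [X_matrices[s][t] @ betas[s] for t, s in enumerate(states)])
    rates = np.exp(log_rates)   # the log link keeps the rates positive
    return rng.poisson(rates), rates

# Two intercept-only states with rates 2 and 5.
X = [np.ones((4, 1)), np.ones((4, 1))]
betas = [np.array([np.log(2.0)]), np.array([np.log(5.0)])]
y, rates = simulate_poisson_emissions(X, betas, [0, 1, 0, 1], seed=0)
```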
-
amimodels.normal_hmm.
trace_sampler
(model, stoch, traces, dbname=None)¶ Creates a PyMC RAM database for a stochastic, given a model and a set of trace values for its parent stochastics.
Parameters: model: pymc.Model object
The model object.
stoch: pymc.Stochastic or str
The stochastic, or name, for which we want values under the given samples in traces.
traces: dict of str, numpy.ndarray
A dictionary mapping the names of stoch's parent stochastics to their trace values.
Returns: A pymc.database.ram.Database.
amimodels.step_methods module¶
amimodels.stochastics module¶
-
class
amimodels.stochastics.
HMMStateSeq
(name, trans_mat, N_obs, p0=None, *args, **kwargs)¶ Bases:
pymc.Stochastic
A stochastic that represents an HMM’s state process \(\{S_t\}_{t=1}^T\). It is essentially the joint distribution of a sequence of Categorical random variables linked by a discrete Markov transition probability matrix.
Use the step methods made specifically for this distribution; otherwise, the default Metropolis samplers will likely perform too poorly.
Parameters: trans_mat: ndarray
A transition probability matrix for K-many states with shape (K, K-1).
N_obs: int
Number of observations.
p0: ndarray
Initial state probabilities. If None, the steady state is computed and used.
value: ndarray of int
Initial value array of discrete states numbers/indices/labels.
size: int
Not used.
See also
pymc.PyMCObjects
- Stochastic
-
class
amimodels.stochastics.
TransProbMatrix
(name, alpha_trans, *args, **kwargs)¶ Bases:
pymc.Stochastic
A stochastic that represents an HMM’s transition probability matrix with rows given by
\[\pi^{(k)} \sim \operatorname{Dir}(\alpha^{(k)}) \;,\]for \(k \in \{1, \dots, K\}\).
This object technically works with the \(K-1\)-many columns of transition probabilities, and each row is represented by a Dirichlet distribution (in the \(K-1\) independent terms).
Parameters: alpha_trans: ndarray
Dirichlet parameters for each row of the transition probability matrix.
value: ndarray of int
Initial value.
See also
pymc.PyMCObjects
- Stochastic
-
amimodels.stochastics.
states_logp
(value, trans_mat, N_obs, p0)¶ Computes log likelihood of states in an HMM.
Parameters: value: ndarray of int
Array of discrete states numbers/indices/labels
trans_mat: ndarray
A transition probability matrix for K states with shape (K, K-1), i.e. the last column omitted.
N_obs: int
Number of observations.
p0: ndarray
Initial state probabilities. If None, the steady state is computed and used.
Returns: float value of the log likelihood
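The quantity being computed is \(\log p_0(S_1) + \sum_{t=2}^{T} \log P_{S_{t-1}, S_t}\). A minimal sketch follows, assuming the (K, K-1) convention above, states labeled from 0, and an explicitly supplied p0; the function name is hypothetical and this is not the library's implementation.

```python
import numpy as np

def state_seq_logp(states, trans_mat, p0):
    """Hypothetical helper: log probability of a state sequence under a Markov chain."""
    states = np.asarray(states, dtype=int)
    trans_mat = np.asarray(trans_mat, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    # Restore the omitted last column so that each row sums to one.
    P = np.column_stack([trans_mat, 1.0 - trans_mat.sum(axis=1)])
    logp = np.log(p0[states[0]])                          # initial state term
    logp += np.log(P[states[:-1], states[1:]]).sum()      # one term per transition
    return float(logp)

logp = state_seq_logp([0, 0, 1], [[0.9], [0.3]], [0.75, 0.25])
```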
-
amimodels.stochastics.
states_random
(trans_mat, N_obs, p0, size=None)¶ Samples states from an HMM.
Parameters: trans_mat: ndarray
A transition probability matrix for K-many states with shape (K, K-1).
N_obs: int
Number of observations.
p0: ndarray
Initial state probabilities. If None, the steady state is computed and used.
size: int
Not used.
Returns: An ndarray of length N_obs containing sampled state numbers/indices/labels.
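Sampling from the chain with a (K, K-1) transition matrix can be sketched as below. The function name is hypothetical, states are labeled from 0, and p0 is assumed to be given explicitly rather than derived from the steady state.

```python
import numpy as np

def sample_states(trans_mat, N_obs, p0, seed=None):
    """Hypothetical helper: draw a state sequence of length N_obs."""
    rng = np.random.RandomState(seed)
    trans_mat = np.asarray(trans_mat, dtype=float)
    # Restore the omitted last column so that each row sums to one.
    P = np.column_stack([trans_mat, 1.0 - trans_mat.sum(axis=1)])
    K = P.shape[0]
    states = np.empty(N_obs, dtype=int)
    states[0] = rng.choice(K, p=p0)
    for t in range(1, N_obs):
        states[t] = rng.choice(K, p=P[states[t - 1]])
    return states

# A chain that always switches state: the full matrix is [[0, 1], [1, 0]].
alternating = sample_states([[0.0], [1.0]], 6, [1.0, 0.0], seed=0)
```

Because this example chain is deterministic, the result alternates between the two states regardless of the seed, which makes the column convention easy to check.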
-
amimodels.stochastics.
trans_mat_logp
(value, alpha_trans)¶ Computes the log probability of a transition probability matrix for the given Dirichlet parameters.
Parameters: value: ndarray
The observations.
alpha_trans: ndarray
The Dirichlet parameters.
Returns: The log probability.
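As a sketch of the computation, the log probability can be taken as the sum of row-wise Dirichlet log densities after restoring the omitted last column. The function names are hypothetical, only the standard library and NumPy are used, and the library's own implementation may differ.

```python
from math import lgamma

import numpy as np

def dirichlet_logpdf(x, alpha):
    """Log density of Dirichlet(alpha) at a probability vector x."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    log_norm = lgamma(alpha.sum()) - sum(lgamma(a) for a in alpha)
    return log_norm + float(((alpha - 1.0) * np.log(x)).sum())

def trans_mat_logp_sketch(value, alpha_trans):
    """Hypothetical helper: sum of row-wise Dirichlet log densities."""
    value = np.asarray(value, dtype=float)
    # Restore the omitted last column so that each row sums to one.
    P = np.column_stack([value, 1.0 - value.sum(axis=1)])
    return sum(dirichlet_logpdf(row, alpha)
               for row, alpha in zip(P, alpha_trans))

# Row 0 has density 2 * 0.9 under Dirichlet(2, 1); row 1 is uniform (density 1).
logp = trans_mat_logp_sketch([[0.9], [0.3]], [[2.0, 1.0], [1.0, 1.0]])
```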
-
amimodels.stochastics.
trans_mat_random
(alpha_trans)¶ Sample a shape (K, K-1) transition probability matrix with K-many states given Dirichlet parameters for each row.
Parameters: alpha_trans: ndarray
Dirichlet parameters for each row.
Returns: An ndarray of the sampled transition probability matrix (without the last column).
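A minimal sketch of this sampling step: draw each row from its Dirichlet distribution and drop the last column. The function name is hypothetical.

```python
import numpy as np

def sample_trans_mat(alpha_trans, seed=None):
    """Hypothetical helper: sample a (K, K-1) transition matrix row by row."""
    rng = np.random.RandomState(seed)
    rows = np.array([rng.dirichlet(alpha) for alpha in alpha_trans])
    return rows[:, :-1]   # the last column is implied by the row sums

trans_mat = sample_trans_mat(np.array([[1.0, 10.0], [10.0, 1.0]]), seed=0)
```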
amimodels.testing module¶
-
amimodels.testing.
assert_hpd
(stochastic, true_value, alpha=0.05, subset=slice(0, None, None), rtol=0)¶ Assert that the given stochastic’s \((1-\alpha)\) HPD interval covers the true value.
Parameters: stochastic: a pymc.Stochastic
The stochastic we want to test. Must have trace values.
true_value: ndarray
The “true” values to check against.
alpha: float, optional
The alpha confidence level.
subset: slice, optional
Slice for the subset of parameters to check.
rtol: array of float, optional
Relative tolerances for matching the edges of the intervals.
Returns: Nothing; it only executes the assert statements.
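The kind of check involved can be sketched for a scalar parameter as follows: compute the shortest interval containing a \((1-\alpha)\) share of the samples, then assert that it covers the true value. The helper names are hypothetical, and the rtol and subset handling of assert_hpd are not modeled.

```python
import numpy as np

def hpd_interval(samples, alpha=0.05):
    """Shortest interval containing about (1 - alpha) of the samples (0 < alpha < 1)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    n_in = int(np.floor((1.0 - alpha) * n))   # index span of each candidate interval
    widths = x[n_in:] - x[:n - n_in]          # widths of all candidate intervals
    i = int(np.argmin(widths))                # narrowest candidate
    return x[i], x[i + n_in]

def check_hpd_covers(samples, true_value, alpha=0.05):
    """Hypothetical helper: assert that the HPD interval covers true_value."""
    lower, upper = hpd_interval(samples, alpha)
    assert lower <= true_value <= upper

rng = np.random.RandomState(0)
samples = rng.normal(0.0, 1.0, size=5000)
lower, upper = hpd_interval(samples)
check_hpd_covers(samples, 0.0)
```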
-
amimodels.testing.
simple_norm_reg_model
(N_obs=100, X_matrices=None, betas=None, trans_mat_obs=array([[ 0.9], [ 0.1]]), tau_y=100, y_obs=None)¶ A simple normal observations/emissions HMM regression model with fixed transition matrix. Uses HMMStateSeq and HMMLinearCombination to model the HMM states and observation mean, respectively.
This is useful for generating simple test/toy data according to a completely known model and parameters.
Parameters: N_obs: int
Number of observations.
X_matrices: list of ndarray or None
List of design matrices for the regression parameters. If None, design matrices are generated for the given betas (which themselves will be generated if None).
betas: list of ndarray or pymc.Stochastic
List of regression parameters matching X_matrices. If None, these are randomly generated according to the shapes of X_matrices.
trans_mat_obs: ndarray or pymc.Stochastic
Transition probability matrix.
tau_y: float or ndarray
Observation variance.
y_obs: ndarray
A vector of observations. If None, a non-observed pymc.Stochastic is produced.
Returns: A dictionary containing all the variables in the model.
-
amimodels.testing.
simple_state_seq_model
(mu_vals=array([-1, 1]), trans_mat_obs=array([[ 0.9], [ 0.1]]), N_obs=100, tau_y=100, y_obs=None)¶ A simple normal observations/emissions HMM model with fixed transition matrix. Uses HMMStateSeq and an array-indexing deterministic for the observations’ means.
This is useful for generating simple test/toy data according to a completely known model and parameters.
Parameters: mu_vals: list of ndarray or pymc.Stochastic
List of mean values for the observations/emissions of each state.
trans_mat_obs: ndarray or pymc.Stochastic
Transition probability matrix.
N_obs: int
Number of observations.
tau_y: float or ndarray
Observation variance.
y_obs: ndarray
A vector of observations. If None, a non-observed pymc.Stochastic is produced.
Returns: A dictionary containing all the variables in the model.
-
amimodels.testing.
simple_state_trans_model
(mu_vals=array([-1, 1]), alpha_trans=array([[ 1., 10.], [ 10., 1.]]), N_obs=100, tau_y=100, y_obs=None)¶ A simple normal observations/emissions HMM model with a stochastic transition matrix. Uses HMMStateSeq, TransProbMatrix and an array-indexing deterministic for the observations’ means.
This is useful for generating simple test/toy data according to a completely known model and parameters.
Parameters: mu_vals: list of ndarray or pymc.Stochastic
List of mean values for the observations/emissions of each state.
alpha_trans: ndarray
Dirichlet hyper-parameters for the transition probability matrix’s prior.
N_obs: int
Number of observations.
tau_y: float or ndarray
Observation variance.
y_obs: ndarray
A vector of observations. If None, a non-observed pymc.Stochastic is produced.
Returns: A dictionary containing all the variables in the model.