amimodels package

Submodules

amimodels.deterministics module

This module provides PyMC Deterministic objects custom built for hidden Markov models.

class amimodels.deterministics.HMMLinearCombination(name, X_matrices, betas, states, *args, **kwds)

Bases: pymc.Deterministic

A deterministic that represents

\[\mu_t = x_t^{(S_t)\top} \beta^{(S_t)}\]

for a state sequence \(\{S_t\}_{t=1}^T\) with \(S_t \in \{1,\dots,K\}\), rows of design matrices \(x^{(k)}_t\), and coefficient vectors \(\beta^{(k)}\).

This deterministic organizes, separates, and tracks \(\mu_t\) in pieces determined by the current state sequence \(\{S_t\}\). Specifically, it tracks the following quantities for each \(k \in \{1,\dots,K\}\):

\[\left\{ \mu^{(k)} = \tilde{X}^{(k)} \beta^{(k)}, \tilde{X}^{(k)} = X_{\mathcal{T}^{(k)}}, \mathcal{T}^{(k)} = \{t : S_t = k\} \right\}\]
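The partitioning above can be illustrated outside of PyMC with plain NumPy. This is only a sketch under stated assumptions: the helper name hmm_linear_combination is hypothetical, states are labelled from 0 rather than 1, and inputs are ordinary arrays rather than PyMC objects.

```python
import numpy as np

def hmm_linear_combination(X_matrices, betas, states):
    """Compute mu_t = x_t^{(S_t)T} beta^{(S_t)} one state at a time.

    Hypothetical helper, not part of amimodels; states use 0-based labels.
    """
    states = np.asarray(states)
    mu = np.empty(len(states), dtype=float)
    for k, (X_k, beta_k) in enumerate(zip(X_matrices, betas)):
        T_k = np.flatnonzero(states == k)          # T^{(k)} = {t : S_t = k}
        X_tilde_k = np.asarray(X_k)[T_k]           # rows of X^{(k)} at those times
        mu[T_k] = X_tilde_k @ np.asarray(beta_k)   # mu^{(k)} = X~^{(k)} beta^{(k)}
    return mu

states = np.array([0, 1, 1, 0])
X_matrices = [np.ones((4, 1)), np.column_stack([np.ones(4), np.arange(4.0)])]
betas = [np.array([2.0]), np.array([1.0, 0.5])]
mu = hmm_linear_combination(X_matrices, betas, states)
```

Each state only touches the rows of its own design matrix, which is the bookkeeping this deterministic is described as tracking.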

amimodels.eemeter_tools module

amimodels.eemeter_tools.read_meter_data(trace_filename, project_info_filename, project_id=None, weather=True, merge_series=True)

Read meter data from a raw XML file and obtain the matching project information from a separate CSV file. When requested, the corresponding weather data is fetched as well.

Parameters:

trace_filename: str

Filename of XML meter trace.

project_info_filename: str

Filename of CSV file containing project info.

project_id: str

Manually provide the project ID used in project_info_filename. If None, the first part of trace_filename before a _ is used.

weather: bool

True will obtain weather (temperature) data.

merge_series: bool

True will return a pandas.DataFrame with merged consumption and temperature data.

Returns:

A DataCollection object with the following fields:

project_info: pandas.DataFrame

Contains columns for project properties.

baseline_end: pandas.Datetime

End date of the baseline period.

consumption_data: eemeter.consumption.ConsumptionData

Consumption data object.

consumption_data_freq: pandas.DataFrame

Consumption data with normalized frequency.

If weather=True:

weather_source: eemeter.ISDWeatherSource

Weather source object.

weather_data: pandas.DataFrame

Temperature observations in degF, with frequency matching consumption_data_freq. Values are averaged if raw temperature observations are lower frequency.

If merge_series=True:

cons_weather_data: pandas.DataFrame

Merged consumption and temperature data.

amimodels.hmm_utils module

amimodels.hmm_utils.compute_steady_state(trans_mat)

Compute the steady state of a transition probability matrix.

Parameters:

trans_mat: numpy.array

A transition probability matrix for K states with shape (K, K-1), i.e. the last column omitted.

Returns:

A numpy.array representing the steady state.
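As an illustration of the (K, K-1) convention, the following sketch restores the omitted last column and solves for the stationary distribution. The function name steady_state_sketch and the least-squares approach are assumptions made for this example; the package's own implementation may differ.

```python
import numpy as np

def steady_state_sketch(trans_mat):
    """Stationary distribution of a transition matrix stored as (K, K-1).

    Hypothetical helper, not the amimodels implementation.
    """
    P_partial = np.asarray(trans_mat, dtype=float)
    # Each row sums to one, so the omitted last column is the remainder.
    last_col = 1.0 - P_partial.sum(axis=1, keepdims=True)
    P = np.hstack([P_partial, last_col])
    K = P.shape[0]
    # Solve pi P = pi together with sum(pi) = 1 as one linear system.
    A = np.vstack([P.T - np.eye(K), np.ones((1, K))])
    b = np.concatenate([np.zeros(K), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

pi = steady_state_sketch(np.array([[0.9], [0.1]]))
```

For the symmetric two-state matrix used here, both states are equally likely in the long run.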

amimodels.hmm_utils.compute_trans_freqs(states, N_states, counts_only=False)

Computes empirical state transition frequencies.

Parameters:

states: a pymc object or ndarray

Vector sequence of states.

N_states: int

Total number of observable states.

counts_only: boolean

Return only the transition counts for each state.

Returns:

Unless counts_only is True, return the empirical state transition frequencies; otherwise, return the transition counts for each state.
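A minimal sketch of how such frequencies can be computed from a state sequence follows. The name trans_freqs_sketch, the 0-based labels, and the (N_states, N_states) output layout are assumptions for illustration; the layout returned by compute_trans_freqs is not specified here.

```python
import numpy as np

def trans_freqs_sketch(states, N_states, counts_only=False):
    """Count transitions S_{t-1} -> S_t and optionally normalize each row.

    Hypothetical helper, not the amimodels implementation.
    """
    states = np.asarray(states, dtype=int)
    counts = np.zeros((N_states, N_states))
    # Accumulate one count per observed consecutive pair of states.
    np.add.at(counts, (states[:-1], states[1:]), 1)
    if counts_only:
        return counts
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states that are never left are kept at zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

counts = trans_freqs_sketch([0, 0, 1, 0, 1, 1], N_states=2, counts_only=True)
freqs = trans_freqs_sketch([0, 0, 1, 0, 1, 1], N_states=2)
```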

amimodels.hmm_utils.plot_hmm(mcmc_step, obs_index=None, axes=None, smpl_subset=None, states_true=None, plot_samples=True, range_slice=slice(0, None, None))

Plot the observations, estimated observation mean parameter’s statistics and estimated state sequence statistics.

Parameters:

mcmc_step: a pymc.MCMC object

The MCMC object after estimation.

obs_index: pandas indices or None

Time series index for observations.

axes: list of matplotlib axes

Axes to use for plotting.

smpl_subset: float, numpy.ndarray of int, or None

If a float, the percent of samples to plot; if a numpy.ndarray, the samples to plot; if None, plot all samples.

states_true: pandas.DataFrame

True states time series.

plot_samples: bool

If True, plot the individual sample values, along with the means.

Returns:

A matplotlib plot object.

amimodels.normal_hmm module

This module provides classes and functions for producing and simulating hidden Markov models (HMM) with scalar normal/Gaussian-distributed observations in PyMC.

class amimodels.normal_hmm.NormalHMMInitialParams(alpha_trans, trans_mat, states, betas, Ws, Vs, p0)

Bases: object

An object that holds initial parameters for a normal-observations HMM.

class amimodels.normal_hmm.NormalHMMProcess(trans_mat, N_obs, p0, betas, Vs, exogenous_sim_func=None, formulas=None, start_datetime=None, seed=None)

Bases: object

An object that produces simulations from a normal-observations HMM.

Methods

generate_exogenous(): Generate exogenous terms/covariates and time indices.

simulate(): Simulate a series from this normal-emissions HMM.

generate_exogenous()

Generate exogenous terms/covariates and time indices. This method may be overridden, provided the dimensions of its output match self.betas.

Returns:

index_sim

A pandas.DatetimeIndex.

X_matrices

A sequence of design matrices corresponding to each self.betas.

simulate()

Simulate a series from this normal-emissions HMM.

Returns:

states: numpy.array of int

Simulated state values.

y: pandas.DataFrame

Time series of simulated usage observations.

X_matrices: list of pandas.DataFrame

List of pandas.DataFrame objects, one for each entry of self.betas, with designs given by self.formulas.
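The simulation described above can be sketched with NumPy alone. Everything here is illustrative: the function simulate_normal_hmm is hypothetical, states are 0-based, Vs is treated as a list of observation variances, and a plain (untruncated) normal is used for the emissions.

```python
import numpy as np

def simulate_normal_hmm(trans_mat, p0, betas, Vs, X_matrices, seed=None):
    """Draw a state path and normal observations from an HMM.

    Hypothetical helper; trans_mat has shape (K, K-1) with the last column omitted.
    """
    rng = np.random.default_rng(seed)
    P_partial = np.asarray(trans_mat, dtype=float)
    P = np.hstack([P_partial, 1.0 - P_partial.sum(axis=1, keepdims=True)])
    K = P.shape[0]
    N_obs = X_matrices[0].shape[0]
    states = np.empty(N_obs, dtype=int)
    y = np.empty(N_obs)
    for t in range(N_obs):
        # The first state comes from p0, later ones from the previous state's row.
        probs = p0 if t == 0 else P[states[t - 1]]
        states[t] = rng.choice(K, p=probs)
        k = states[t]
        mean = X_matrices[k][t] @ betas[k]       # x_t^{(k)T} beta^{(k)}
        y[t] = rng.normal(mean, np.sqrt(Vs[k]))  # variance V^{(k)}
    return states, y

N_obs = 50
X_matrices = [np.ones((N_obs, 1)), np.ones((N_obs, 1))]
betas = [np.array([0.0]), np.array([10.0])]
states, y = simulate_normal_hmm(
    np.array([[0.9], [0.1]]), [0.5, 0.5], betas, [1.0, 1.0], X_matrices, seed=0
)
```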

amimodels.normal_hmm.bic_norm_hmm_init_params(y, X_matrices)

Initialize a normal HMM regression mixture using a Gaussian mixture model (GMM) with a BIC-determined number of states. Starting with an initial set of design matrices, this function searches for the best number of additional constant states to add to the model.

Parameters:

y: pandas.DataFrame or pandas.Series

Time-indexed vector of observations.

X_matrices: list of pandas.DataFrame

Collection of design matrices for each initial state.

Returns:

init_params:

A NormalHMMInitialParams object.

amimodels.normal_hmm.calc_alpha_prior(obs_states, N_states, trans_freqs=None)

A method of producing informed Dirichlet distribution parameters from observed states.

Parameters:

obs_states: ndarray of int

Array of state label observations.

N_states: int

Total number of states (max integer label of observations).

trans_freqs: numpy.array of float

Empirical transition probabilities for obs_states sequence.

Returns:

A numpy.array of Dirichlet parameters initialized/updated by the observed sequence.

amimodels.normal_hmm.generate_temperatures(ind, period=24.0, offset=0.0, base_temp=60.0, flux_amt=10.0)

Generate very regular temperature oscillations. This is roughly based on observations starting at 4/1/2015 in CA (in UTC!).

Parameters:

ind: pandas.DatetimeIndex

Time index over which the temperatures will be computed.

period: float

Period of the sinusoid.

offset: float

Frequency offset.

base_temp: float

Temperature around which the sinusoid will fluctuate.

flux_amt: float

Scaling intensity of sinusoid.
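To make the roles of these parameters concrete, here is one plausible form of such an oscillation. It is an assumption, not the package's formula: temperatures_sketch is a hypothetical name, period is taken to be in hours, offset is applied as a phase shift in radians, and a NumPy datetime64 array stands in for the pandas.DatetimeIndex.

```python
import numpy as np

def temperatures_sketch(ind, period=24.0, offset=0.0, base_temp=60.0, flux_amt=10.0):
    """Sinusoidal temperatures around base_temp with amplitude flux_amt.

    Hypothetical helper; the exact formula used by amimodels is not documented here.
    """
    # Hours elapsed since the first timestamp.
    hours = np.asarray((ind - ind[0]) / np.timedelta64(1, "h"), dtype=float)
    return base_temp + flux_amt * np.sin(2.0 * np.pi * hours / period + offset)

ind = np.arange("2015-04-01T00", "2015-04-03T00", dtype="datetime64[h]")
temps = temperatures_sketch(ind)
```

With the defaults, the series starts at base_temp, peaks a quarter period later, and repeats every 24 hours.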

amimodels.normal_hmm.get_stochs_excluding(stoch, excluding)

Get the parents of a stochastic excluding the given list of stochastic and/or parent names.

Parameters:

stoch: pymc.Stochastic

Root stochastic/node.

excluding: list of str

Stochastic/node and parent names to exclude.

amimodels.normal_hmm.gmm_norm_hmm_init_params(y, X_matrices)

Generates initial parameters for the univariate normal-emissions HMM with normal mean priors.

Parameters:

y: pandas.DataFrame or pandas.Series

Time-indexed vector of observations.

X_matrices: list of pandas.DataFrame

Collection of design matrices for each hidden state’s mean.

Returns:

init_params:

A NormalHMMInitialParams object.

amimodels.normal_hmm.make_normal_baseline_hmm(y_data, X_data, baseline_end, initial_params)

Construct a PyMC2 scalar normal-emissions HMM with a stochastic reporting period start time parameter, and with baseline and reporting parameters for all other stochastics/estimated terms in the model. The reporting period start time parameter is given a discrete uniform distribution starting from the first observation after the baseline to the end of the series.

Parameters:

y_data: pandas.DataFrame

Usage/response observations.

X_data: list of pandas.DataFrame

List of design matrices for each state. Each must span the entire length of observations (i.e. y_data).

baseline_end: pandas.tslib.Timestamp

End of baseline period (inclusive), beginning of reporting period.

initial_params: NormalHMMInitialParams

An object containing the following fields/members:

Returns:

A pymc.Model object used for sampling.

amimodels.normal_hmm.make_normal_hmm(y_data, X_data, initial_params=None, single_obs_var=False, include_ppy=False)

Construct a PyMC2 scalar normal-emissions HMM model of the form

\[\begin{split}y_t &\sim \operatorname{N}^{+}(x_t^{(S_t)\top} \beta^{(S_t)}, V^{(S_t)}) \\ \beta^{(S_t)}_i &\sim \operatorname{N}(m^{(S_t)}, C^{(S_t)}), \quad i \in \{1,\dots, M\} \\ S_t \mid S_{t-1} &\sim \operatorname{Categorical}(\pi^{(S_{t-1})}) \\ \pi^{(S_{t-1})} &\sim \operatorname{Dirichlet}(\alpha^{(S_{t-1})})\end{split}\]

where \(\operatorname{N}^{+}\) is the positive (truncated below zero) normal distribution, \(S_t \in \{1, \ldots, K\}\), \(C^{(S_t)} = \lambda_i^{(S_t) 2} \tau^{(S_t) 2}\) and

\[\begin{split}\lambda^{(k)}_i &\sim \operatorname{Cauchy}^{+}(0, 1) \\ \tau^{(k)} &\sim \operatorname{Cauchy}^{+}(0, 1) \\ V^{(k)} &\sim \operatorname{Gamma}(n_0/2, n_0 S_0/2)\end{split}\]

for \(k \in \{1, \ldots, K\}\).

The observations are \(y_t\) for \(t \in \{0, \dots, T\}\), the features are \(x_t^{(S_t)} \in \mathbb{R}^M\), the regression parameters are \(\beta^{(S_t)}\), the state sequence is \(\{S_t\}^T_{t=1}\), and the state transition probabilities are \(\pi \in [0, 1]^{K}\). \(\operatorname{Cauchy}^{+}\) is the standard half-Cauchy distribution and \(\operatorname{N}\) is the normal/Gaussian distribution.

The random variables in the set \(\mathcal{S} = \{\{\beta^{(k)}, \lambda^{(k)}, \tau^{(k)}, V^{(k)}, \pi^{(k)}\}_{k=1}^K, \{S_t\}^T_{t=1}\}\) are referred to as “stochastics” throughout the code.

Parameters:

y_data: pandas.DataFrame

Usage/response observations \(y_t\).

X_data: list of pandas.DataFrame

List of design matrices for each state, i.e. \(x_t^{(S_t)}\). Each must span the entire length of observations (i.e. y_data).

initial_params: NormalHMMInitialParams

The initial parameters, which include \(\pi_0, m^{(k)}, \alpha^{(k)}, V^{(k)}\).

single_obs_var: bool, optional

Determines whether there are multiple observation variances or not. Only used when not given initial parameters.

include_ppy: bool, optional

If True, then include an unobserved observation Stochastic that can be used to produce posterior predictive samples. The Stochastic will have the name y_pp.

Returns:

A pymc.Model object used for sampling.
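The hierarchical prior on the regression parameters in the model above can be sampled directly with NumPy, which may help when checking how heavy-tailed the implied coefficients are. This is a sketch for a single state k; the values of M and m_k are illustrative, and the second argument of \(\operatorname{N}\) is assumed to be a variance.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 3        # number of regression terms in one state (illustrative value)
m_k = 0.0    # prior mean m^{(k)} (illustrative value)

# The absolute value of a standard Cauchy draw is a standard half-Cauchy draw.
lam = np.abs(rng.standard_cauchy(M))   # lambda_i^{(k)}: one local scale per term
tau = np.abs(rng.standard_cauchy())    # tau^{(k)}: one global scale for the state
C = lam**2 * tau**2                    # prior variances lambda_i^2 * tau^2
beta = rng.normal(m_k, np.sqrt(C))     # beta_i^{(k)} ~ N(m^{(k)}, C), C as variance
```

Repeating the draw many times shows occasional very large coefficients alongside many near m_k, the behaviour expected from half-Cauchy scales.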

amimodels.normal_hmm.make_poisson_hmm(y_data, X_data, initial_params)

Construct a PyMC2 scalar Poisson-emissions HMM model.

TODO: Update to match normal model design.

The model takes the following form:

\[\begin{split}y_t &\sim \operatorname{Poisson}(\exp(x_t^{(S_t)\top} \beta^{(S_t)})) \\ \beta^{(S_t)}_i &\sim \operatorname{N}(m^{(S_t)}, C^{(S_t)}), \quad i \in \{1,\dots,M\} \\ S_t \mid S_{t-1} &\sim \operatorname{Categorical}(\pi^{(S_{t-1})}) \\ \pi^{(S_{t-1})} &\sim \operatorname{Dirichlet}(\alpha^{(S_{t-1})})\end{split}\]

where \(C^{(S_t)} = \lambda_i^{(S_t) 2} \tau^{(S_t) 2}\) and

\[\begin{split}\lambda^{(S_t)}_i &\sim \operatorname{Cauchy}^{+}(0, 1) \\ \tau^{(S_t)} &\sim \operatorname{Cauchy}^{+}(0, 1)\end{split}\]

for observations \(y_t\) in \(t \in \{0, \dots, T\}\), features \(x_t^{(S_t)} \in \mathbb{R}^M\), regression parameters \(\beta^{(S_t)}\), state sequences \(\{S_t\}^T_{t=1}\) and state transition probabilities \(\pi \in [0, 1]^{K}\). \(\operatorname{Cauchy}^{+}\) is the standard half-Cauchy distribution and \(\operatorname{N}\) is the normal/Gaussian distribution.

The random variables in the set \(\mathcal{S} = \{\{\beta^{(k)}, \lambda^{(k)}, \tau^{(k)}, \pi^{(k)}\}_{k=1}^K, \{S_t\}^T_{t=1}\}\) are referred to as “stochastics” throughout the code.

Parameters:

y_data: pandas.DataFrame

Usage/response observations \(y_t\).

X_data: list of pandas.DataFrame

List of design matrices for each state, i.e. \(x_t^{(S_t)}\). Each must span the entire length of observations (i.e. y_data).

initial_params: NormalHMMInitialParams

The initial parameters, which include \(\pi_0, m^{(k)}, \alpha^{(k)}, V^{(k)}\). Ignores V parameters. FIXME: using the “Normal” initial params objects is only temporary.

Returns:

A pymc.Model object used for sampling.

amimodels.normal_hmm.trace_sampler(model, stoch, traces, dbname=None)

Creates a PyMC RAM database for a stochastic, given a model and a set of trace values for its parent stochastics.

Parameters:

model: pymc.Model object

The model object.

stoch: pymc.Stochastic or str

The stochastic, or name, for which we want values under the given samples in traces.

traces: dict of str, numpy.ndarray

A dictionary mapping the names of stoch's parent stochastics to their trace values.

Returns:

A pymc.database.ram.Database.

amimodels.step_methods module

amimodels.stochastics module

class amimodels.stochastics.HMMStateSeq(name, trans_mat, N_obs, p0=None, *args, **kwargs)

Bases: pymc.Stochastic

A stochastic that represents an HMM’s state process \(\{S_t\}_{t=1}^T\). It is the joint distribution of a sequence of categorical variables linked by a discrete Markov transition probability matrix.

Use the step methods made specifically for this distribution; otherwise, the default Metropolis samplers will likely perform too poorly.

Parameters:

trans_mat: ndarray

A transition probability matrix for K-many states with shape (K, K-1).

N_obs: int

Number of observations.

p0: ndarray

Initial state probabilities. If None, the steady state is computed and used.

value: ndarray of int

Initial value array of discrete state numbers/indices/labels.

size: int

Not used.

See also

pymc.PyMCObjects
Stochastic

class amimodels.stochastics.TransProbMatrix(name, alpha_trans, *args, **kwargs)

Bases: pymc.Stochastic

A stochastic that represents an HMM’s transition probability matrix with rows given by

\[\pi^{(k)} \sim \operatorname{Dir}(\alpha^{(k)}) \;,\]

for \(k \in \{1, \dots, K\}\).

This object technically works with the \(K-1\) columns of transition probabilities, and each row is represented by a Dirichlet distribution (in the \(K-1\) independent terms).

Parameters:

alpha_trans: ndarray

Dirichlet parameters for each row of the transition probability matrix.

value: ndarray of int

Initial value.

See also

pymc.PyMCObjects
Stochastic

amimodels.stochastics.states_logp(value, trans_mat, N_obs, p0)

Computes log likelihood of states in an HMM.

Parameters:

value: ndarray of int

Array of discrete state numbers/indices/labels.

trans_mat: ndarray

A transition probability matrix for K states with shape (K, K-1), i.e. the last column omitted.

N_obs: int

Number of observations.

p0: ndarray

Initial state probabilities. If None, the steady state is computed and used.

Returns:

The log likelihood, as a float.
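The quantity being computed is the log of \(p_0(S_1) \prod_{t=2}^{T} \pi^{(S_{t-1})}_{S_t}\). A minimal NumPy sketch follows; states_logp_sketch is a hypothetical name, states are 0-based, p0 must be given explicitly, and the N_obs argument is dropped because the length is read from value.

```python
import numpy as np

def states_logp_sketch(value, trans_mat, p0):
    """Log likelihood of a state path under a first-order Markov chain.

    Hypothetical helper; trans_mat has shape (K, K-1) with the last column omitted.
    """
    value = np.asarray(value, dtype=int)
    P_partial = np.asarray(trans_mat, dtype=float)
    # Restore the omitted last column so that rows sum to one.
    P = np.hstack([P_partial, 1.0 - P_partial.sum(axis=1, keepdims=True)])
    logp = np.log(np.asarray(p0, dtype=float)[value[0]])   # initial state term
    logp += np.log(P[value[:-1], value[1:]]).sum()         # transition terms
    return float(logp)

logp = states_logp_sketch([0, 0, 1], np.array([[0.9], [0.1]]), [0.5, 0.5])
```

For the path 0, 0, 1 this is log(0.5) + log(0.9) + log(0.1).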

amimodels.stochastics.states_random(trans_mat, N_obs, p0, size=None)

Samples states from an HMM.

Parameters:

trans_mat: ndarray

A transition probability matrix for K-many states with shape (K, K-1).

N_obs: int

Number of observations.

p0: ndarray

Initial state probabilities. If None, the steady state is computed and used.

size: int

Not used.

Returns:

An ndarray of length N_obs containing sampled state numbers/indices/labels.

amimodels.stochastics.trans_mat_logp(value, alpha_trans)

Computes the log probability of a transition probability matrix for the given Dirichlet parameters.

Parameters:

value: ndarray

The observations.

alpha_trans: ndarray

The Dirichlet parameters.

Returns:

The log probability.
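One way to evaluate this log probability is to sum a Dirichlet log density over the rows of the matrix. The sketch below assumes value is stored as (K, K-1) with the last column omitted, that alpha_trans has shape (K, K), and that all probabilities are strictly positive; trans_mat_logp_sketch is a hypothetical name.

```python
from math import lgamma

import numpy as np

def trans_mat_logp_sketch(value, alpha_trans):
    """Sum of row-wise Dirichlet log densities for a transition matrix.

    Hypothetical helper, not the amimodels implementation.
    """
    value = np.asarray(value, dtype=float)
    alpha = np.asarray(alpha_trans, dtype=float)
    # Restore the omitted last column of each row.
    full = np.hstack([value, 1.0 - value.sum(axis=1, keepdims=True)])
    logp = 0.0
    for row, a in zip(full, alpha):
        # log of Gamma(sum(a)) / prod(Gamma(a_j)), the normalizing constant.
        log_norm = lgamma(a.sum()) - sum(lgamma(a_j) for a_j in a)
        logp += log_norm + np.sum((a - 1.0) * np.log(row))
    return float(logp)

logp_flat = trans_mat_logp_sketch(np.array([[0.9], [0.1]]), np.ones((2, 2)))
```

With all Dirichlet parameters equal to one, each row has a uniform density on the simplex, so the log probability is zero.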

amimodels.stochastics.trans_mat_random(alpha_trans)

Sample a shape (K, K-1) transition probability matrix with K-many states given Dirichlet parameters for each row.

Parameters:

alpha_trans: ndarray

Dirichlet parameters for each row.

Returns:

An ndarray of the sampled transition probability matrix (without the last column).
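A sketch of this sampling step with NumPy's Dirichlet generator follows. The name trans_mat_random_sketch and the seed argument are additions for illustration only; alpha_trans is assumed to have shape (K, K).

```python
import numpy as np

def trans_mat_random_sketch(alpha_trans, seed=None):
    """Draw each row from a Dirichlet and drop the redundant last column.

    Hypothetical helper, not the amimodels implementation.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha_trans, dtype=float)
    rows = np.vstack([rng.dirichlet(a) for a in alpha])   # shape (K, K)
    return rows[:, :-1]                                   # shape (K, K-1)

sample = trans_mat_random_sketch(np.array([[1.0, 10.0], [10.0, 1.0]]), seed=0)
```

The dropped column can always be recovered as one minus the row sum, which is why the (K, K-1) form loses no information.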

amimodels.testing module

amimodels.testing.assert_hpd(stochastic, true_value, alpha=0.05, subset=slice(0, None, None), rtol=0)

Assert that the given stochastic’s \((1-\alpha)\) HPD interval covers the true value.

Parameters:

stochastic: a pymc.Stochastic

The stochastic we want to test. Must have trace values.

true_value: ndarray

The “true” values to check against.

alpha: float, optional

The level \(\alpha\); the HPD interval checked has probability \(1-\alpha\).

subset: slice, optional

Slice for the subset of parameters to check.

rtol: array of float, optional

Relative tolerances for matching the edges of the intervals.

Returns:

Nothing; the function only executes the assert statements.
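The kind of check this function performs can be sketched for a single scalar parameter using samples in a NumPy array. The helper hpd_interval below is hypothetical and uses the common shortest-window estimate of the HPD interval; it does not use PyMC traces and ignores the subset and rtol options.

```python
import numpy as np

def hpd_interval(samples, alpha=0.05):
    """Shortest interval containing a (1 - alpha) fraction of the samples.

    Hypothetical helper, not the amimodels implementation.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    n_in = int(np.ceil((1.0 - alpha) * n))       # samples inside the interval
    widths = x[n_in - 1:] - x[:n - n_in + 1]     # width of each candidate window
    i = int(np.argmin(widths))                   # index of the narrowest window
    return x[i], x[i + n_in - 1]

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.1, size=2000)   # stand-in for trace values
lower, upper = hpd_interval(samples, alpha=0.05)
assert lower <= 1.0 <= upper                          # the "true" value is covered
```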

amimodels.testing.simple_norm_reg_model(N_obs=100, X_matrices=None, betas=None, trans_mat_obs=array([[ 0.9], [ 0.1]]), tau_y=100, y_obs=None)

A simple normal observations/emissions HMM regression model with fixed transition matrix. Uses HMMStateSeq and HMMLinearCombination to model the HMM states and observation mean, respectively.

This is useful for generating simple test/toy data according to a completely known model and parameters.

Parameters:

N_obs: int

Number of observations.

X_matrices: list of ndarray or None

List of design matrices for the regression parameters. If None, design matrices are generated for the given betas (which themselves will be generated if None).

betas: list of ndarray or pymc.Stochastic

List of regression parameters matching X_matrices. If None, these are randomly generated according to the shapes of X_matrices.

trans_mat_obs: ndarray or pymc.Stochastic

Transition probability matrix.

tau_y: float or ndarray

Observation variance.

y_obs: ndarray

A vector of observations. If None, a non-observed pymc.Stochastic is produced.

Returns:

A dictionary containing all the variables in the model.

amimodels.testing.simple_state_seq_model(mu_vals=array([-1, 1]), trans_mat_obs=array([[ 0.9], [ 0.1]]), N_obs=100, tau_y=100, y_obs=None)

A simple normal observations/emissions HMM model with fixed transition matrix. Uses HMMStateSeq and an array-indexing deterministic for the observations’ means.

This is useful for generating simple test/toy data according to a completely known model and parameters.

Parameters:

mu_vals: list of ndarray or pymc.Stochastic

List of mean values for the observations/emissions of each state.

trans_mat_obs: ndarray or pymc.Stochastic

Transition probability matrix.

N_obs: int

Number of observations.

tau_y: float or ndarray

Observation variance.

y_obs: ndarray

A vector of observations. If None, a non-observed pymc.Stochastic is produced.

Returns:

A dictionary containing all the variables in the model.

amimodels.testing.simple_state_trans_model(mu_vals=array([-1, 1]), alpha_trans=array([[ 1., 10.], [ 10., 1.]]), N_obs=100, tau_y=100, y_obs=None)

A simple normal observations/emissions HMM model with a stochastic transition matrix. Uses HMMStateSeq, TransProbMatrix and an array-indexing deterministic for the observations’ means.

This is useful for generating simple test/toy data according to a completely known model and parameters.

Parameters:

mu_vals: list of ndarray or pymc.Stochastic

List of mean values for the observations/emissions of each state.

alpha_trans: ndarray

Dirichlet hyper-parameters for the transition probability matrix’s prior.

N_obs: int

Number of observations.

tau_y: float or ndarray

Observation variance.

y_obs: ndarray

A vector of observations. If None, a non-observed pymc.Stochastic is produced.

Returns:

A dictionary containing all the variables in the model.

Module contents

About

This package contains models for advanced metering infrastructure (AMI) data.

amimodels.get_version()