Utilities

Utility functions module.

`CrossValidation.fit`	Uses Gaussian process regression to build the response surface as a side effect.
`CrossValidation.predict`	Predict method of cross-validation.
`CrossValidation.run`	Run the k-fold cross validation procedure.
`CrossValidation.scorer`	Score function of cross-validation.
`Normalizer.fit_transform`	Return corresponding points shifted and scaled to [-1, 1]^n_params.
`Normalizer.inverse_transform`	Return corresponding points shifted and scaled to [self.lb, self.ub].
`average_rrmse`	Objective function to be optimized during the tuning process of the method `tune_pr_matrix()`.
`initialize_weights`	Initialize uniform weights for simple Monte Carlo method or linear regression in local linear gradients.
`linear_program_ineq`	Solves an inequality constrained linear program with variable bounds.
`local_linear_gradients`	Estimate a collection of gradients from input/output pairs.
`rrmse`	Evaluates the relative root mean square error.
`sort_eigpairs`	Sort eigenpairs.

class CrossValidation(inputs, outputs, gradients, subspace, folds=5, **kwargs)[source]

Bases: object

Class to perform k-fold cross validation when tuning hyperparameters for the design of a response surface with ActiveSubspaces or KernelActiveSubspaces. It is used in particular during the tuning of the parameters of the spectral distribution of the feature map, inside the objective function average_rrmse. The default score is the relative root mean square error (rrmse).

Parameters

inputs (numpy.ndarray) – n_samples-by-input_dim input matrix.
outputs (numpy.ndarray) – n_samples-by-output_dim output matrix.
gradients (numpy.ndarray) – n_samples-by-output_dim-by-input_dim gradients matrix.
subspace (Subspaces) – ActiveSubspace or KernelActiveSubspace object from which the response surface is evaluated. The dimension of the response surface is specified in the subspace.dim attribute.
folds (int) – number of folds of the cross-validation procedure.
kwargs (dict) – additional parameters, organized in a dictionary, to pass to the subspace.fit method. For example ‘weights’ or ‘metric’.

Variables

gp (sklearn.gaussian_process.GaussianProcessRegressor) – Gaussian process of the response surface built with scikit-learn.

fit(inputs, gradients, outputs)[source]

Uses Gaussian process regression to build the response surface as a side effect. The dimension of the response surface is specified in the attribute self.ss.dim.

Parameters

inputs (numpy.ndarray) – n_samples-by-input_dim input matrix.
outputs (numpy.ndarray) – n_samples-by-output_dim output matrix.
gradients (numpy.ndarray) – n_samples-by-output_dim-by-input_dim gradients matrix.

predict(inputs)[source]

Predict method of cross-validation.

Parameters: inputs (numpy.ndarray) – n_samples-by-input_dim input matrix.
Returns: n_samples-by-dim prediction of the surrogate response surface model at the inputs. The value dim corresponds to self.ss.dim.
Return type: numpy.ndarray

run()[source]

Run the k-fold cross validation procedure. In each fold, a fit and an evaluation of the score are computed.

Returns: mean and standard deviation of the scores.
Return type: list of two numpy.ndarray.
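
Example

The fold loop described above can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: the helper names (k_fold_scores, fit_linear, predict_linear, relative_rmse) are hypothetical, a plain least-squares model stands in for the Gaussian process response surface, and the contiguous, unshuffled fold split is an assumption.

```python
import numpy as np

def fit_linear(x, y):
    """Stand-in model (assumption): ordinary least squares with intercept."""
    design = np.hstack([np.ones((x.shape[0], 1)), x])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict_linear(coeffs, x):
    """Evaluate the stand-in model at the rows of x."""
    return np.hstack([np.ones((x.shape[0], 1)), x]) @ coeffs

def relative_rmse(predictions, targets):
    """RMSE divided by the standard deviation of the targets (assumed definition)."""
    return np.sqrt(np.mean((predictions - targets) ** 2)) / np.std(targets)

def k_fold_scores(inputs, outputs, folds=5):
    """Fit on folds-1 parts, score on the held-out part, repeat for each fold."""
    indices = np.arange(inputs.shape[0])
    scores = []
    for valid_idx in np.array_split(indices, folds):
        train_idx = np.setdiff1d(indices, valid_idx)
        model = fit_linear(inputs[train_idx], outputs[train_idx])
        predictions = predict_linear(model, inputs[valid_idx])
        scores.append(relative_rmse(predictions, outputs[valid_idx]))
    scores = np.array(scores)
    # Mean and standard deviation of the per-fold scores.
    return scores.mean(), scores.std()

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(40, 2))
y = x @ np.array([[2.0], [-1.0]]) + 0.5  # exactly linear synthetic data
mean_score, std_score = k_fold_scores(x, y, folds=5)
```

Because the synthetic data are exactly linear, the stand-in model reproduces them and both returned values are close to zero.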

scorer(inputs, outputs)[source]

Score function of cross-validation.

Parameters

inputs (numpy.ndarray) – n_samples-by-input_dim input matrix.
outputs (numpy.ndarray) – n_samples-by-output_dim output matrix.

Returns

relative root mean square error between the predictions at inputs and the outputs.

Return type

np.float64

class Normalizer(lb, ub)[source]

Bases: object

A class for normalizing and unnormalizing bounded inputs.

Parameters

lb (numpy.ndarray) – array n_params-by-1 that contains lower bounds on the simulation inputs.
ub (numpy.ndarray) – array n_params-by-1 that contains upper bounds on the simulation inputs.

fit_transform(inputs)[source]

Return corresponding points shifted and scaled to [-1, 1]^n_params.

Parameters: inputs (numpy.ndarray) – contains all input points to normalize. The shape is n_samples-by-n_params. The components of each row of inputs should be between self.lb and self.ub.
Returns: the normalized inputs. The components of each row should be between -1 and 1.
Return type: numpy.ndarray

inverse_transform(inputs)[source]

Return corresponding points shifted and scaled to [self.lb, self.ub].

Parameters: inputs (numpy.ndarray) – contains all input points to unnormalize. The shape is n_samples-by-n_params. The components of each row of inputs should be between -1 and 1.
Returns: the unnormalized inputs. The components of each row should be between self.lb and self.ub.
Return type: numpy.ndarray
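
Example

The two transformations are inverse affine maps between [self.lb, self.ub] and [-1, 1]^n_params. The sketch below reproduces that arithmetic with plain NumPy; the function names to_unit_box and from_unit_box and the bound values are hypothetical, and the bounds are stored as 1-D arrays (rather than n_params-by-1) purely to keep the broadcasting simple.

```python
import numpy as np

# Hypothetical bounds for a 2-parameter problem (illustrative values only).
lb = np.array([0.0, 10.0])
ub = np.array([1.0, 30.0])

def to_unit_box(inputs, lb, ub):
    """Shift and scale rows of inputs from [lb, ub] to [-1, 1]^n_params."""
    return 2.0 * (inputs - lb) / (ub - lb) - 1.0

def from_unit_box(inputs, lb, ub):
    """Shift and scale rows of inputs from [-1, 1]^n_params back to [lb, ub]."""
    return (inputs + 1.0) * (ub - lb) / 2.0 + lb

x = np.array([[0.0, 10.0], [0.5, 20.0], [1.0, 30.0]])
x_norm = to_unit_box(x, lb, ub)          # lower bound -> -1, midpoint -> 0, upper bound -> 1
x_back = from_unit_box(x_norm, lb, ub)   # round trip recovers x
```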

average_rrmse(hyperparams, best, csv, verbose=False, resample=5)[source]

Objective function to be optimized during the tuning process of the method tune_pr_matrix(). The optimal hyperparameters of the spectral distribution are searched for in a domain logarithmically scaled in base 10. For each call of average_rrmse() by the optimizer, the same hyperparameter is tested in two nested procedures: in the external procedure the projection matrix is resampled a number of times specified by the resample parameter; in the internal procedure the relative root mean squared error (rrmse()) is evaluated as the k-fold mean of a k-fold cross-validation procedure. The score of a single fold of this cross-validation procedure is the rrmse on the validation set of the predictions of the response surface built with a Subspace object on the training set.

Parameters

hyperparams (list) – logarithm of the parameter of the spectral distribution passed to average_rrmse by the optimizer.
csv (CrossValidation) – CrossValidation object which contains the same Subspace object and the inputs, outputs, gradients datasets.
best (list) – list that records the best score and the best projection matrix. The initial values are 0.8 and an n_features-by-input_dim numpy.ndarray of zeros.
resample (int) – number of times the projection matrix is resampled from the same spectral distribution with the same hyperparameter.
verbose (bool) – True to print the score for each resample.

Returns

minimum of the scores evaluated for the same hyperparameter and a specified number of resamples of the projection matrix.

Return type

numpy.float64

initialize_weights(matrix)[source]

Initialize uniform weights for simple Monte Carlo method or linear regression in local linear gradients.

Parameters: matrix (numpy.ndarray) – matrix whose shape[0] value gives the dimension of the weights to be computed.
Returns: weights
Return type: numpy.ndarray
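
Example

Uniform weights give every sample the same importance. A minimal sketch follows, assuming the weights are returned as a column vector that sums to one; the function name uniform_weights is hypothetical and the exact shape used by the library is not confirmed here.

```python
import numpy as np

def uniform_weights(matrix):
    """One equal weight per row of matrix, as an n_samples-by-1 column (assumed shape)."""
    n_samples = matrix.shape[0]
    return np.ones((n_samples, 1)) / n_samples

samples = np.zeros((4, 3))          # 4 samples, 3 parameters
weights = uniform_weights(samples)  # four weights of 0.25 each
```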

linear_program_ineq(c, A, b)[source]

Solves an inequality constrained linear program with variable bounds. This method returns the minimizer of the following linear program.

minimize c^T x subject to A x >= b

Parameters

c (numpy.ndarray) – coefficients vector of the linear objective function to be minimized.
A (numpy.ndarray) – 2-D array which, when matrix-multiplied by x, gives the values of the lower-bound inequality constraints at x.
b (numpy.ndarray) – 1-D array of values representing the lower-bound of each inequality constraint (row) in A.

Returns

the independent variable vector which minimizes the linear programming problem.

Return type

numpy.ndarray

Raises

RuntimeError
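
Example

A program of the form "minimize c^T x subject to A x >= b" can be handed to scipy.optimize.linprog by negating the constraint, since linprog expects upper-bound inequalities A_ub x <= b_ub. The sketch below is one way to do this and is not necessarily how this function is implemented; the name solve_lp_ineq is hypothetical, and the variables are left unbounded by assumption.

```python
import numpy as np
from scipy.optimize import linprog

def solve_lp_ineq(c, A, b):
    """Minimize c^T x subject to A x >= b, with free (unbounded) variables."""
    # A x >= b is equivalent to (-A) x <= (-b), the form linprog accepts.
    # bounds=(None, None) overrides linprog's default of x >= 0.
    result = linprog(c, A_ub=-A, b_ub=-b, bounds=(None, None))
    if not result.success:
        raise RuntimeError(result.message)
    return result.x

c = np.array([1.0, 1.0])
A = np.eye(2)
b = np.array([1.0, 2.0])
x_opt = solve_lp_ineq(c, A, b)  # minimizes x1 + x2 with x1 >= 1, x2 >= 2
```

In this small problem the objective pushes both variables down onto their lower bounds, so the minimizer sits at the corner where both constraints are active.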

local_linear_gradients(inputs, outputs, weights=None, n_neighbors=None)[source]

Estimate a collection of gradients from input/output pairs.

Given a set of input/output pairs, choose subsets of neighboring points and build a local linear model for each subset. The gradients of these local linear models comprise estimates of sampled gradients.

Parameters

inputs (numpy.ndarray) – M-by-m matrix that contains the m-dimensional inputs
outputs (numpy.ndarray) – M-by-1 matrix that contains scalar outputs
weights (numpy.ndarray) – M-by-1 matrix that contains the weights for each observation (default None)
n_neighbors (int) – number of nearest neighbors to use when constructing each local linear model. The default value is floor(1.7*m).

Returns

M-by-m matrix that contains estimated partial derivatives approximated by the local linear models; the corresponding new inputs

Return type

numpy.ndarray, numpy.ndarray

Raises

ValueError, TypeError
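
Example

The idea of fitting a local linear model on neighboring points and reading off its slope can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's algorithm: the name local_linear_gradients_sketch is hypothetical, one model is built around every input point, all neighbors are weighted equally (the weights argument is omitted), and only the gradients are returned.

```python
import numpy as np

def local_linear_gradients_sketch(inputs, outputs, n_neighbors=None):
    """Estimate one gradient per input point from a least-squares local linear fit."""
    n_samples, n_dims = inputs.shape
    if n_neighbors is None:
        n_neighbors = int(np.floor(1.7 * n_dims))
    # A linear model with intercept needs more than n_dims points.
    if not n_dims < n_neighbors <= n_samples:
        raise ValueError("n_neighbors must be in (n_dims, n_samples].")

    gradients = np.zeros((n_samples, n_dims))
    for i in range(n_samples):
        # Indices of the n_neighbors closest points (the point itself included).
        sq_dist = np.sum((inputs - inputs[i]) ** 2, axis=1)
        idx = np.argsort(sq_dist)[:n_neighbors]
        # Fit outputs ~ intercept + inputs @ slope on the neighborhood.
        design = np.hstack([np.ones((n_neighbors, 1)), inputs[idx]])
        coeffs, *_ = np.linalg.lstsq(design, outputs[idx], rcond=None)
        gradients[i] = coeffs[1:, 0]  # drop the intercept, keep the slope
    return gradients

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(50, 3))
true_gradient = np.array([1.0, -2.0, 0.5])
y = x @ true_gradient.reshape(-1, 1) + 3.0  # M-by-1 outputs of a linear function
estimated = local_linear_gradients_sketch(x, y, n_neighbors=10)
```

For an exactly linear function every local fit recovers the same slope, so each row of estimated matches true_gradient up to round-off.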

rrmse(predictions, targets)[source]

Evaluates the relative root mean square error. It can be vectorized for multidimensional predictions and targets.

Parameters

predictions (numpy.ndarray) – predictions input.
targets (numpy.ndarray) – targets input.

Returns

relative root mean squared error

Return type

np.float64
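
Example

The page does not state the normalization used in the error. The sketch below assumes one common definition, the root mean square error divided by the standard deviation of the targets (summed over output components), so the value may differ from what this function returns. The name rrmse_sketch is hypothetical.

```python
import numpy as np

def rrmse_sketch(predictions, targets):
    """RMSE normalized by the spread of the targets (assumed definition)."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if predictions.shape[0] != targets.shape[0]:
        raise ValueError("predictions and targets need the same number of samples.")
    n_samples = targets.shape[0]
    p = predictions.reshape(n_samples, -1)
    t = targets.reshape(n_samples, -1)
    mean_sq_error = np.mean(np.sum((p - t) ** 2, axis=1))
    total_variance = np.sum(np.var(t, axis=0))
    return np.sqrt(mean_sq_error / total_variance)

targets = np.array([1.0, 2.0, 3.0, 4.0])
perfect = rrmse_sketch(targets, targets)
baseline = rrmse_sketch(np.full(4, targets.mean()), targets)
```

Under this definition a perfect prediction scores 0, and always predicting the mean of the targets scores 1, which gives a natural reference point when comparing response surfaces.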

sort_eigpairs(evals, evects)[source]

Sort eigenpairs.

Parameters

evals (numpy.ndarray) – eigenvalues.
evects (numpy.ndarray) – eigenvectors.

Returns

vector of sorted eigenvalues; orthogonal matrix of corresponding eigenvectors.

Return type

numpy.ndarray, numpy.ndarray

Note

Eigenvectors are unique up to a sign. We make the choice to normalize the eigenvectors so that the first component of each eigenvector is positive. This normalization is very helpful for the bootstrapping.
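
Example

The sign convention in the note can be sketched with NumPy. The descending sort order and the 1-D shape of the eigenvalue array are assumptions made for this illustration, and the name sort_eigpairs_sketch is hypothetical.

```python
import numpy as np

def sort_eigpairs_sketch(evals, evects):
    """Sort eigenpairs by decreasing eigenvalue and fix the sign of each eigenvector."""
    order = np.argsort(evals)[::-1]      # descending order (assumption)
    evals = evals[order]
    evects = evects[:, order]
    # Flip each column so that its first component is positive.
    signs = np.sign(evects[0, :])
    signs[signs == 0] = 1.0              # leave columns with a zero first entry unchanged
    return evals, evects * signs

sym = np.array([[2.0, 1.0], [1.0, 2.0]])
evals, evects = np.linalg.eigh(sym)      # eigh returns eigenvalues in ascending order
evals_sorted, evects_sorted = sort_eigpairs_sketch(evals, evects)
```

Flipping the sign of a column keeps it an eigenvector for the same eigenvalue, so the sorted pairs still satisfy sym @ v = lambda * v. Fixing the sign makes eigenvectors computed from different bootstrap replicates directly comparable.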