Utilities

Utility functions module.

CrossValidation.fit

Uses Gaussian process regression to build the response surface as a side effect.

CrossValidation.predict

Predict method of cross-validation.

CrossValidation.run

Run the k-fold cross validation procedure.

CrossValidation.scorer

Score function of cross-validation.

Normalizer.fit_transform

Return corresponding points shifted and scaled to [-1, 1]^n_params.

Normalizer.inverse_transform

Return corresponding points shifted and scaled to [self.lb, self.ub].

average_rrmse

Objective function to be optimized during the tuning process of the method tune_pr_matrix().

initialize_weights

Initialize uniform weights for the simple Monte Carlo method or for linear regression in local linear gradients.

linear_program_ineq

Solves an inequality constrained linear program with variable bounds.

local_linear_gradients

Estimate a collection of gradients from input/output pairs.

rrmse

Evaluates the relative root mean square error.

sort_eigpairs

Sort eigenpairs.

class CrossValidation(inputs, outputs, gradients, subspace, folds=5, **kwargs)

Bases: object

Class to perform k-fold cross validation when tuning hyperparameters for the design of a response surface with ActiveSubspaces or KernelActiveSubspaces. It is used in particular when tuning the parameters of the spectral distribution of the feature map, inside the objective function average_rrmse. The default score is the relative root mean square error (rrmse).

Parameters
  • inputs (numpy.ndarray) – n_samples-by-input_dim input matrix.

  • outputs (numpy.ndarray) – n_samples-by-output_dim output matrix.

  • gradients (numpy.ndarray) – n_samples-by-output_dim-by-input_dim gradients matrix.

  • subspace (Subspaces) – ActiveSubspace or KernelActiveSubspace object from which the response surface is evaluated. The dimension of the response surface is specified in the subspace.dim attribute.

  • folds (int) – number of folds of the cross-validation procedure.

  • kwargs (dict) – additional parameters, organized in a dictionary, to pass to the subspace.fit method. For example ‘weights’ or ‘metric’.

Variables

gp (sklearn.gaussian_process.GaussianProcessRegressor) – Gaussian process of the response surface built with scikit-learn.

fit(inputs, gradients, outputs)

Uses Gaussian process regression to build the response surface as a side effect. The dimension of the response surface is specified in the attribute self.ss.dim.

Parameters
  • inputs (numpy.ndarray) – n_samples-by-input_dim input matrix.

  • outputs (numpy.ndarray) – n_samples-by-output_dim output matrix.

  • gradients (numpy.ndarray) – n_samples-by-output_dim-by-input_dim gradients matrix.

predict(inputs)

Predict method of cross-validation.

Parameters

inputs (numpy.ndarray) – n_samples-by-input_dim input matrix.

Returns

n_samples-by-dim prediction of the surrogate response surface model at the inputs. The value dim corresponds to self.ss.dim.

Return type

numpy.ndarray

run()

Run the k-fold cross validation procedure. In each fold, a fit and an evaluation of the score are computed.

Returns

mean and standard deviation of the scores.

Return type

list of two numpy.ndarray.
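The fold loop described for run() can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the names k_fold_scores, fit_linear and score_rmse are hypothetical, a least-squares linear model stands in for the Gaussian process response surface, and the folds are taken as contiguous, unshuffled index blocks.

```python
import numpy as np

def k_fold_scores(inputs, outputs, fit, score, folds=5):
    """Fit on each training split, score on the held-out split,
    and return the mean and standard deviation of the fold scores."""
    indices = np.arange(inputs.shape[0])
    scores = []
    for validation in np.array_split(indices, folds):
        training = np.setdiff1d(indices, validation)
        model = fit(inputs[training], outputs[training])
        scores.append(score(model, inputs[validation], outputs[validation]))
    scores = np.array(scores)
    return scores.mean(), scores.std()

def fit_linear(x, y):
    # Stand-in model (assumption): affine least-squares fit.
    design = np.hstack([np.ones((x.shape[0], 1)), x])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def score_rmse(coeffs, x, y):
    # Root mean square error of the stand-in model on a validation split.
    design = np.hstack([np.ones((x.shape[0], 1)), x])
    return float(np.sqrt(np.mean((design @ coeffs - y) ** 2)))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(50, 2))
y = x @ np.array([[1.0], [-2.0]]) + 0.5  # exactly affine data
mean_score, std_score = k_fold_scores(x, y, fit_linear, score_rmse, folds=5)
```

Because the toy data are exactly affine, both the mean and the spread of the fold scores are numerically zero here; with a real response surface the pair summarizes how well the surrogate generalizes across folds.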

scorer(inputs, outputs)

Score function of cross-validation.

Parameters
  • inputs (numpy.ndarray) – n_samples-by-input_dim input matrix.

  • outputs (numpy.ndarray) – n_samples-by-output_dim output matrix.

Returns

relative root mean square error between the predictions at inputs and outputs.

Return type

np.float64

class Normalizer(lb, ub)

Bases: object

A class for normalizing and unnormalizing bounded inputs.

Parameters
  • lb (numpy.ndarray) – array n_params-by-1 that contains lower bounds on the simulation inputs.

  • ub (numpy.ndarray) – array n_params-by-1 that contains upper bounds on the simulation inputs.

fit_transform(inputs)

Return corresponding points shifted and scaled to [-1, 1]^n_params.

Parameters

inputs (numpy.ndarray) – contains all input points to normalize. The shape is n_samples-by-n_params. The components of each row of inputs should be between self.lb and self.ub.

Returns

the normalized inputs. The components of each row should be between -1 and 1.

Return type

numpy.ndarray

inverse_transform(inputs)

Return corresponding points shifted and scaled to [self.lb, self.ub].

Parameters

inputs (numpy.ndarray) – contains all input points to unnormalize. The shape is n_samples-by-n_params. The components of each row of inputs should be between -1 and 1.

Returns

the unnormalized inputs. The components of each row should be between self.lb and self.ub.

Return type

numpy.ndarray
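The two mappings described for Normalizer amount to an affine shift and scale per parameter. The sketch below reproduces that arithmetic with standalone functions; to_unit_box and from_unit_box are hypothetical names used only for illustration, not the class methods themselves.

```python
import numpy as np

def to_unit_box(inputs, lb, ub):
    """Map rows of inputs from [lb, ub] to [-1, 1]^n_params."""
    lb = np.asarray(lb, dtype=float).reshape(1, -1)
    ub = np.asarray(ub, dtype=float).reshape(1, -1)
    return 2.0 * (inputs - lb) / (ub - lb) - 1.0

def from_unit_box(inputs, lb, ub):
    """Map rows of inputs from [-1, 1]^n_params back to [lb, ub]."""
    lb = np.asarray(lb, dtype=float).reshape(1, -1)
    ub = np.asarray(ub, dtype=float).reshape(1, -1)
    return (ub - lb) * (inputs + 1.0) / 2.0 + lb

lb = np.array([0.0, 10.0])
ub = np.array([1.0, 20.0])
points = np.array([[0.0, 10.0], [0.5, 15.0], [1.0, 20.0]])

normalized = to_unit_box(points, lb, ub)       # rows: lower bound, midpoint, upper bound
recovered = from_unit_box(normalized, lb, ub)  # round trip back to the original points
```

The lower bounds map to -1, the midpoints to 0 and the upper bounds to 1, and the round trip recovers the original points.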

average_rrmse(hyperparams, best, csv, verbose=False, resample=5)

Objective function to be optimized during the tuning process of the method tune_pr_matrix(). The optimal hyperparameters of the spectral distribution are searched for in a domain logarithmically scaled in base 10. For each call of average_rrmse() by the optimizer, the same hyperparameter is tested in two nested procedures: in the external procedure the projection matrix is resampled a number of times specified by the resample parameter; in the internal procedure the relative root mean squared error (rrmse()) is evaluated as the k-fold mean of a k-fold cross-validation procedure. The score of a single fold of this cross-validation procedure is the rrmse on the validation set of the predictions of the response surface built with a Subspace object on the training set.

Parameters
  • hyperparams (list) – logarithm of the parameter of the spectral distribution passed to average_rrmse by the optimizer.

  • csv ('CrossValidation') – CrossValidation object which contains the same Subspace object and the inputs, outputs, gradients datasets.

  • best (list) – list that records the best score and the best projection matrix. The initial values are 0.8 and an n_features-by-input_dim numpy.ndarray of zeros.

  • resample (int) – number of times the projection matrix is resampled from the same spectral distribution with the same hyperparameter.

  • verbose (bool) – True to print the score for each resample.

Returns

minimum of the scores evaluated for the same hyperparameter and a specified number of resamples of the projection matrix.

Return type

numpy.float64
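The outer resampling loop described above can be sketched as follows. This is only an outline under stated assumptions: min_score_over_resamples, sample_matrix and cv_mean_score are hypothetical stand-ins for, respectively, the objective, the sampling of a projection matrix from the spectral distribution, and the k-fold mean score returned by the cross-validation object. The toy score used here has no statistical meaning.

```python
import numpy as np

def min_score_over_resamples(hyperparams, sample_matrix, cv_mean_score,
                             best, resample=5):
    """Resample the projection matrix `resample` times for one hyperparameter,
    record the best (score, matrix) pair in `best`, return the minimum score."""
    scale = 10.0 ** hyperparams[0]  # hyperparameter is searched in log10 scale
    scores = []
    for _ in range(resample):
        matrix = sample_matrix(scale)
        score = cv_mean_score(matrix)
        scores.append(score)
        if score < best[0]:
            best[0], best[1] = score, matrix
    return min(scores)

rng = np.random.default_rng(0)

def sample_matrix(scale):
    # Toy stand-in (assumption): Gaussian matrix with the given scale.
    return rng.normal(scale=scale, size=(3, 2))

def cv_mean_score(matrix):
    # Toy stand-in (assumption) for the k-fold mean of the rrmse.
    return float(np.mean(np.abs(matrix)))

best = [0.8, np.zeros((3, 2))]  # initial values as described for `best`
score = min_score_over_resamples([-1.0], sample_matrix, cv_mean_score, best)
```

Passing best as a mutable list lets the optimizer's objective keep a running record of the best projection matrix across calls while still returning a plain scalar.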

initialize_weights(matrix)

Initialize uniform weights for the simple Monte Carlo method or for linear regression in local linear gradients.

Parameters

matrix (numpy.ndarray) – matrix whose shape[0] value gives the dimension of the weights to be computed.

Returns

weights

Return type

numpy.ndarray
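Uniform weights of this kind can be sketched in one line of NumPy. The function name uniform_weights is hypothetical, and the column shape (n_samples-by-1) is an assumption chosen to match the M-by-1 weights described elsewhere in this module.

```python
import numpy as np

def uniform_weights(matrix):
    """Return one equal weight per row of matrix, summing to 1."""
    n_samples = matrix.shape[0]
    # Column vector shape is an assumption, see the note above.
    return np.ones((n_samples, 1)) / n_samples

weights = uniform_weights(np.zeros((4, 3)))  # four samples, each weighted 0.25
```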

linear_program_ineq(c, A, b)

Solves an inequality constrained linear program with variable bounds. This method returns the minimizer of the following linear program.

minimize c^T x subject to A x >= b

Parameters
  • c (numpy.ndarray) – coefficients vector of the linear objective function to be minimized.

  • A (numpy.ndarray) – 2-D array which, when matrix-multiplied by x, gives the values of the lower-bound inequality constraints at x.

  • b (numpy.ndarray) – 1-D array of values representing the lower-bound of each inequality constraint (row) in A.

Returns

the independent variable vector which minimizes the linear programming problem.

Return type

numpy.ndarray

Raises

RuntimeError
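A program of the form "minimize c^T x subject to A x >= b" can be handed to a standard solver by negating both sides of the constraint. The sketch below uses scipy.optimize.linprog for this; whether the library uses the same solver is not stated here, and solve_lp_lower_bounds is a hypothetical name. Leaving the variables unbounded is also an assumption.

```python
import numpy as np
from scipy.optimize import linprog

def solve_lp_lower_bounds(c, A, b):
    """Minimize c^T x subject to A x >= b, with x otherwise unbounded."""
    c = np.asarray(c, dtype=float).ravel()
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).ravel()
    # linprog expects A_ub x <= b_ub, so A x >= b becomes -A x <= -b.
    # bounds=(None, None) removes linprog's default x >= 0 bound.
    result = linprog(c, A_ub=-A, b_ub=-b, bounds=(None, None))
    if not result.success:
        raise RuntimeError(f"linear program failed: {result.message}")
    return result.x

# minimize x0 + x1 subject to x0 >= 1 and x1 >= 2
x = solve_lp_lower_bounds([1.0, 1.0], np.eye(2), [1.0, 2.0])
```

In this small example both constraints are active at the optimum, so the minimizer sits at the corner (1, 2).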

local_linear_gradients(inputs, outputs, weights=None, n_neighbors=None)

Estimate a collection of gradients from input/output pairs.

Given a set of input/output pairs, choose subsets of neighboring points and build a local linear model for each subset. The gradients of these local linear models comprise estimates of sampled gradients.

Parameters
  • inputs (numpy.ndarray) – M-by-m matrix that contains the m-dimensional inputs

  • outputs (numpy.ndarray) – M-by-1 matrix that contains scalar outputs

  • weights (numpy.ndarray) – M-by-1 matrix that contains the weights for each observation (default None)

  • n_neighbors (int) – how many nearest neighbors to use when constructing the local linear model. The default value is floor(1.7*m).

Returns

M-by-m matrix that contains estimated partial derivatives approximated by the local linear models; the corresponding new inputs

Return type

numpy.ndarray, numpy.ndarray

Raises

ValueError, TypeError
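The idea of fitting a local linear model on neighboring points can be sketched with a least-squares fit per sample. This is a simplified illustration with a hypothetical name (local_gradients): it centers one model on every input point, ignores the weights argument, and returns only the gradients, whereas the documented function also returns the corresponding new inputs.

```python
import numpy as np

def local_gradients(inputs, outputs, n_neighbors=None):
    """Estimate one gradient per input from an affine fit on its nearest neighbors."""
    n_samples, dim = inputs.shape
    if n_neighbors is None:
        n_neighbors = int(np.floor(1.7 * dim))
    # An affine model in `dim` variables needs at least dim + 1 points.
    if n_neighbors <= dim or n_neighbors > n_samples:
        raise ValueError("n_neighbors must satisfy dim < n_neighbors <= n_samples")

    gradients = np.zeros((n_samples, dim))
    for i in range(n_samples):
        distances = np.sum((inputs - inputs[i]) ** 2, axis=1)
        nearest = np.argsort(distances)[:n_neighbors]
        design = np.hstack([np.ones((n_neighbors, 1)), inputs[nearest]])
        coeffs, *_ = np.linalg.lstsq(design, outputs[nearest], rcond=None)
        gradients[i] = coeffs[1:].ravel()  # drop the intercept, keep the slopes
    return gradients

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(20, 2))
y = (3.0 * x[:, 0] - 2.0 * x[:, 1] + 1.0).reshape(-1, 1)  # M-by-1 outputs
grads = local_gradients(x, y, n_neighbors=6)
```

For this exactly affine function every local fit recovers the same gradient, (3, -2), up to rounding error.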

rrmse(predictions, targets)

Evaluates the relative root mean square error. It can be vectorized for multidimensional predictions and targets.

Parameters
Returns

relative root mean squared error

Return type

np.float64
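One common way to define a relative root mean square error is to divide the RMSE by the standard deviation of the targets, so that always predicting the target mean scores exactly 1. The sketch below uses that normalization as an assumption; the precise normalization used by rrmse is not spelled out in this section, and relative_rmse is a hypothetical name.

```python
import numpy as np

def relative_rmse(predictions, targets):
    """RMSE divided by the standard deviation of the targets,
    averaged over output components when the arrays are 2-D."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    rmse = np.sqrt(np.mean((predictions - targets) ** 2, axis=0))
    spread = np.std(targets, axis=0)  # normalization choice is an assumption
    return np.mean(rmse / spread)

targets = np.array([1.0, 2.0, 3.0, 4.0])
score_mean = relative_rmse(np.full(4, targets.mean()), targets)  # baseline predictor
score_perfect = relative_rmse(targets, targets)                  # exact predictions
```

Under this definition a score below 1 means the model does better than the constant mean predictor, and 0 means a perfect fit.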

sort_eigpairs(evals, evects)

Sort eigenpairs.

Parameters
Returns

vector of sorted eigenvalues; orthogonal matrix of corresponding eigenvectors.

Return type

numpy.ndarray, numpy.ndarray

Note

Eigenvectors are unique up to a sign. We make the choice to normalize the eigenvectors so that the first component of each eigenvector is positive. This normalization is very helpful for the bootstrapping.
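The sign convention in the note can be sketched as follows. The function name sort_eigenpairs is hypothetical, and sorting in descending order of eigenvalue is an assumption, since the text above only says the eigenpairs are sorted.

```python
import numpy as np

def sort_eigenpairs(evals, evects):
    """Sort eigenpairs by descending eigenvalue and flip each eigenvector
    so that its first component is positive."""
    order = np.argsort(evals)[::-1]  # descending order (assumption)
    evals = evals[order]
    evects = evects[:, order]
    signs = np.sign(evects[0, :])
    signs[signs == 0] = 1.0          # leave vectors with zero first entry unchanged
    return evals, evects * signs     # signs broadcast across columns

matrix = np.array([[2.0, 1.0], [1.0, 2.0]])
evals, evects = sort_eigenpairs(*np.linalg.eigh(matrix))
```

Fixing the sign this way makes eigenvectors computed from different bootstrap replicates directly comparable, since a replicate can no longer differ from another by an arbitrary factor of -1.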