qstack.regression.kernel_utils

Kernel computation utility functions and defaults.

Provides:

  • REGMODULE_PATH: Path to the module.

  • defaults: Default parameters.

qstack.regression.kernel_utils.get_global_kernel(arg, local_kernel)

Create a global kernel function from a local kernel.

Parameters:
  • arg (tuple) – Tuple of (gkernel_name, options_dict).

  • local_kernel (callable) – Local kernel function.

Returns:

Global kernel function that combines local kernels.

Return type:

callable

Raises:

NotImplementedError – If the specified global kernel is not implemented.
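As an illustration of what a global kernel built from a local one can look like, here is a minimal sketch of an averaging global kernel. It assumes each molecule is an (n_atoms, n_features) NumPy array and that the local kernel follows the kernel(X, Y, gamma) signature documented for get_local_kernel. The helpers gaussian_local and make_average_global_kernel are hypothetical, the options dictionary is not modelled, and this is not the module's actual implementation.

```python
import numpy as np


def gaussian_local(X, Y, gamma):
    """Gaussian local kernel exp(-gamma * ||x - y||^2) for every row pair (assumed convention)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def make_average_global_kernel(local_kernel):
    """Hypothetical helper: wrap a local kernel into an 'average' global kernel.

    The global kernel entry for a pair of molecules is the mean of the
    local kernel over all pairs of their atomic environments.
    """
    def global_kernel(mols_A, mols_B, gamma):
        K = np.empty((len(mols_A), len(mols_B)))
        for i, A in enumerate(mols_A):
            for j, B in enumerate(mols_B):
                K[i, j] = local_kernel(A, B, gamma).mean()
        return K
    return global_kernel


# Two toy "molecules" with 2 and 3 atoms, 3 features per atom.
mols = [np.zeros((2, 3)), np.ones((3, 3))]
gk = make_average_global_kernel(gaussian_local)
K = gk(mols, mols, 0.5)
print(K.shape)  # (2, 2)
```

Averaging is only one way to combine local kernels; it keeps the global kernel size-independent at the cost of weighting every atom pair equally.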

qstack.regression.kernel_utils.get_kernel(arg, arg2=None)

Return the appropriate kernel function based on arguments.

Parameters:
  • arg (str) – Local kernel name.

  • arg2 (tuple, optional) – If provided, a tuple of (global_kernel_name, options) specifying a global kernel built on top of the local kernel arg. Defaults to None, in which case the local kernel is returned.

Returns:

Kernel function (local or global).

Return type:

callable
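The dispatch logic described above can be sketched as follows. The registries LOCAL_KERNELS and GLOBAL_KERNELS and the functions dot_local, average_global and get_kernel_sketch are hypothetical stand-ins; the real module may organise its lookup differently.

```python
import numpy as np


def dot_local(X, Y, gamma=None):
    """Linear local kernel; gamma is accepted only to keep a uniform signature."""
    return X @ Y.T


def average_global(local_kernel, options):
    """Hypothetical global kernel: mean of the local kernel over all atom pairs."""
    def kernel(mols_A, mols_B, gamma=None):
        return np.array([[local_kernel(A, B, gamma).mean() for B in mols_B]
                         for A in mols_A])
    return kernel


# Hypothetical lookup tables standing in for the module's own registries.
LOCAL_KERNELS = {"dot": dot_local}
GLOBAL_KERNELS = {"avg": average_global}


def get_kernel_sketch(arg, arg2=None):
    """Return the local kernel `arg`, or wrap it in a global kernel if arg2 is given."""
    if arg not in LOCAL_KERNELS:
        raise NotImplementedError(f"{arg} kernel is not implemented")
    local = LOCAL_KERNELS[arg]
    if arg2 is None:
        return local
    gname, options = arg2
    if gname not in GLOBAL_KERNELS:
        raise NotImplementedError(f"{gname} global kernel is not implemented")
    return GLOBAL_KERNELS[gname](local, options)


X = np.array([[1.0, 0.0], [0.0, 2.0]])
k_local = get_kernel_sketch("dot")
k_global = get_kernel_sketch("dot", ("avg", {}))
print(k_local(X, X))       # local kernel matrix, shape (2, 2)
print(k_global([X], [X]))  # one "molecule" against itself, shape (1, 1)
```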

qstack.regression.kernel_utils.get_local_kernel(arg)

Obtain a local-environment kernel function by name.

Parameters:

arg (str) – Kernel name. Available options include:

  • ‘G’: Gaussian (RBF) kernel.

  • ‘L’: Laplacian kernel.

  • ‘dot’: Linear (dot product) kernel.

  • ‘cosine’: Cosine similarity kernel.

  • Implementation-specific variants: ‘G_sklearn’, ‘G_custom_c’, ‘L_sklearn’, ‘L_custom_c’, ‘L_custom_py’.

Returns:

Kernel function with signature kernel(X, Y, gamma) -> numpy.ndarray.

Return type:

callable

Raises:
  • NotImplementedError – If the specified kernel is not implemented.

  • RuntimeError – If the kernel implementation is not available (e.g., C library missing).
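A minimal NumPy sketch of the four generic kernels, each with the kernel(X, Y, gamma) signature, where the rows of X and Y are feature vectors. The gamma conventions (exp(-gamma * ||x - y||^2) for 'G' and exp(-gamma * ||x - y||_1) for 'L') are an assumption modelled on scikit-learn's rbf_kernel and laplacian_kernel. The implementation-specific variants are not reproduced, and get_local_kernel_sketch is a hypothetical name.

```python
import numpy as np


def gaussian(X, Y, gamma):
    """Gaussian (RBF) kernel; assumed convention exp(-gamma * ||x - y||_2^2)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def laplacian(X, Y, gamma):
    """Laplacian kernel; assumed convention exp(-gamma * ||x - y||_1)."""
    l1 = np.abs(X[:, None, :] - Y[None, :, :]).sum(-1)
    return np.exp(-gamma * l1)


def dot(X, Y, gamma=None):
    """Linear kernel; gamma is ignored (assumption)."""
    return X @ Y.T


def cosine(X, Y, gamma=None):
    """Cosine similarity; gamma is ignored (assumption). Rows must be non-zero."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T


# Hypothetical name -> function table.
KERNELS = {"G": gaussian, "L": laplacian, "dot": dot, "cosine": cosine}


def get_local_kernel_sketch(name):
    """Look up a kernel by name, mirroring the documented NotImplementedError."""
    try:
        return KERNELS[name]
    except KeyError:
        raise NotImplementedError(f"{name} kernel is not implemented") from None


X = np.array([[3.0, 4.0]])
Y = np.array([[4.0, 3.0]])
K_G = get_local_kernel_sketch("G")(X, Y, 0.5)  # squared distance 2, so exp(-1)
```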

qstack.regression.kernel_utils.sparse_regression_kernel(K_train, y_train, sparse_idx, eta)

Compute the sparse regression matrix and vector.

The solution of a sparse regression problem is $$ \vec w = \left( \mathbf{K}_{MN} \mathbf{K}_{NM} + \eta \mathbf{1} \right)^{-1} \mathbf{K}_{MN} \vec y $$ where

  • w: regression weights

  • N: training set

  • M: sparse regression set

  • y: target

  • K: kernel

This function computes K_solve $= \mathbf{K}_{MN} \mathbf{K}_{NM} + \eta \mathbf{1}$ and y_solve $= \mathbf{K}_{MN} \vec y$.
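The formula above is the stationary point of a ridge-regularized least-squares fit of $\vec y$ in the sparse basis $\mathbf{K}_{NM}$. A short derivation, assuming a symmetric kernel so that $\mathbf{K}_{MN} = \mathbf{K}_{NM}^\top$:

```latex
\begin{aligned}
L(\vec w) &= \left\lVert \vec y - \mathbf{K}_{NM}\vec w \right\rVert^2 + \eta \left\lVert \vec w \right\rVert^2, \\
\nabla_{\vec w} L &= -2\,\mathbf{K}_{MN}\left(\vec y - \mathbf{K}_{NM}\vec w\right) + 2\eta\,\vec w = 0 \\
&\Longrightarrow \left(\mathbf{K}_{MN}\mathbf{K}_{NM} + \eta\mathbf{1}\right)\vec w = \mathbf{K}_{MN}\vec y .
\end{aligned}
```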

Parameters:
  • K_train (numpy.2darray(Ntrain1, Ntrain)) – Kernel computed on the training set. Ntrain1 (N in the equation) may differ from the full training set size Ntrain (e.g. a subset).

  • y_train (numpy.1darray(Ntrain)) – Array containing the target property of the full training set.

  • sparse_idx (numpy.1darray of int) – Sparse subset indices (M in the equation) with respect to the order of the full training set.

  • eta (float) – Regularization strength for matrix inversion.

Returns:

  • K_solve (numpy.2darray((len(sparse), len(sparse)), dtype=float)) – matrix to be inverted.

  • y_solve (numpy.1darray(len(sparse), dtype=float)) – vector of the constant terms.

Return type:

tuple
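A minimal NumPy sketch of the computation described above, assuming that the rows of K_train are aligned with y_train and that sparse_idx selects the M columns of K_train (so that K_NM = K_train[:, sparse_idx]). The function name sparse_regression_sketch and the toy data are illustrative only; this is not the library's code.

```python
import numpy as np


def sparse_regression_sketch(K_train, y_train, sparse_idx, eta):
    """Build K_solve = K_MN K_NM + eta*1 and y_solve = K_MN y.

    Assumption: rows of K_train match y_train, and sparse_idx indexes its columns.
    """
    K_NM = K_train[:, sparse_idx]          # (N, M)
    K_MN = K_NM.T                          # (M, N)
    K_solve = K_MN @ K_NM + eta * np.eye(len(sparse_idx))
    y_solve = K_MN @ y_train
    return K_solve, y_solve


# Toy data: a symmetric positive semi-definite kernel on N = 6 samples.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
K_train = A @ A.T
y_train = rng.normal(size=6)
sparse_idx = np.array([0, 2, 5])           # M = 3 sparse points

K_solve, y_solve = sparse_regression_sketch(K_train, y_train, sparse_idx, eta=1e-3)
w = np.linalg.solve(K_solve, y_solve)      # weights for the M sparse points
y_pred = K_train[:, sparse_idx] @ w        # predictions on the training set
```

Solving the M x M system rather than an N x N one is the point of the sparse formulation: the cost of the linear solve scales with the size of the sparse set, not the training set.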

qstack.regression.kernel_utils.train_test_split_idx(y, idx_test=None, idx_train=None, test_size=0.2, random_state=0)

Perform a test/train data split based on random shuffling or on given indices.

If neither idx_test nor idx_train is specified, the split is done randomly using random_state.

If only one of idx_test or idx_train is specified, the remaining indices are used as its counterpart.

If both idx_test and idx_train are specified, they are returned as given.

  • Duplicates within idx_test or within idx_train are not allowed.

  • idx_test and idx_train may overlap, but a warning is raised.

Parameters:
  • y (numpy.1darray(Nsamples)) – Array containing the target property of all Nsamples samples.

  • test_size (float or int) – Test set fraction (or number of samples).

  • idx_test ([int] / numpy.1darray) – List of indices for the test set (based on the sample order in y).

  • idx_train ([int] / numpy.1darray) – List of indices for the training set (based on the sample order in y).

  • random_state (int) – Seed for the random number generator (controls train/test splitting).

Returns:

  • numpy.1darray(Ntest, dtype=int) – test indices.

  • numpy.1darray(Ntrain, dtype=int) – train indices.

  • numpy.1darray(Ntest, dtype=float) – test set target property.

  • numpy.1darray(Ntrain, dtype=float) – train set target property.

Return type:

tuple

Raises:

RuntimeError – If test indices are repeated.
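The splitting rules above can be sketched in plain NumPy. This is a hypothetical re-implementation (train_test_split_idx_sketch), not the library's code: the random branch uses numpy.random.default_rng(random_state).permutation, so it will not reproduce the library's actual random splits, and rounding a fractional test_size up is an assumption.

```python
import warnings

import numpy as np


def train_test_split_idx_sketch(y, idx_test=None, idx_train=None,
                                test_size=0.2, random_state=0):
    """Follow the documented splitting rules; not the library's implementation."""
    n = len(y)
    all_idx = np.arange(n)
    if idx_test is None and idx_train is None:
        # Random split: a float is a fraction of n (rounded up, an assumption),
        # an int is a sample count.
        if isinstance(test_size, float):
            n_test = int(np.ceil(test_size * n))
        else:
            n_test = int(test_size)
        perm = np.random.default_rng(random_state).permutation(n)
        idx_test, idx_train = perm[:n_test], perm[n_test:]
    elif idx_train is None:
        # Only test indices given: every other index goes to training.
        idx_train = np.setdiff1d(all_idx, idx_test)
    elif idx_test is None:
        # Only training indices given: every other index goes to test.
        idx_test = np.setdiff1d(all_idx, idx_train)
    idx_test = np.asarray(idx_test, dtype=int)
    idx_train = np.asarray(idx_train, dtype=int)
    if np.unique(idx_test).size != idx_test.size:
        raise RuntimeError("test indices are repeated")
    if np.unique(idx_train).size != idx_train.size:
        raise RuntimeError("train indices are repeated")
    if np.intersect1d(idx_test, idx_train).size > 0:
        warnings.warn("train and test sets overlap")
    return idx_test, idx_train, y[idx_test], y[idx_train]


y = np.arange(10, dtype=float)
te, tr, y_te, y_tr = train_test_split_idx_sketch(y)
te2, tr2, _, _ = train_test_split_idx_sketch(y, idx_test=[0, 1, 2])
```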