Adapter Configuration

Classes representing the architectures of adapter modules and fusion layers.

Single (bottleneck) adapters

class transformers.AdapterConfigBase

Base class for all adaptation methods. This class does not define specific configuration keys, but only provides some common helper methods.

Parameters

architecture (str, optional) – The type of adaptation method defined by the configuration.

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfigBase instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.
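Example (a minimal sketch of how these helpers fit together, using PfeifferConfig — defined further below — as a stand-in for any concrete subclass, and assuming the adapter-transformers fork, where these classes are exposed under the transformers namespace):

from transformers import PfeifferConfig

config = PfeifferConfig()                       # a concrete AdapterConfigBase subclass
smaller = config.replace(reduction_factor=8)    # derived copy; the original is unchanged

config_dict = smaller.to_dict()                 # plain Python dict, e.g. for JSON serialization
restored = PfeifferConfig.from_dict(config_dict)

# load() instead resolves an identifier such as "pfeiffer" (from ADAPTER_CONFIG_MAP),
# a full config dict, a file path, or an Adapter-Hub identifier, as described above.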

class transformers.AdapterConfig(mh_adapter: bool, output_adapter: bool, reduction_factor: Union[float, collections.abc.Mapping], non_linearity: str, original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)

Base class that models the architecture of an adapter.

Parameters
  • mh_adapter (bool) – If True, add adapter modules after the multi-head attention block of each layer.

  • output_adapter (bool) – If True, add adapter modules after the output FFN of each layer.

  • reduction_factor (float or Mapping) – Either a scalar float (> 0) specifying the reduction factor for all layers or a mapping from layer ID (starting at 0) to values specifying the reduction_factor for individual layers. If not all layers are represented in the mapping, a default value should be given, e.g. {‘1’: 8, ‘6’: 32, ‘default’: 16}. Specifying a reduction factor < 1 will result in an up-projection layer.

  • non_linearity (str) – The activation function to use in the adapter bottleneck.

  • original_ln_before (bool, optional) – If True, apply the pre-trained layer normalization and residual connection before the adapter modules. Defaults to False. Only applicable if is_parallel is False.

  • original_ln_after (bool, optional) – If True, apply pre-trained layer normalization and residual connection after the adapter modules. Defaults to True.

  • ln_before (bool, optional) – If True, add a new layer normalization before the adapter bottleneck. Defaults to False.

  • ln_after (bool, optional) – If True, add a new layer normalization after the adapter bottleneck. Defaults to False.

  • init_weights (str, optional) – Initialization method for the weights of the adapter modules. Currently, this can be either “bert” (default) or “mam_adapter”.

  • is_parallel (bool, optional) – If True, apply adapter transformations in parallel. By default (False), sequential application is used.

  • scaling (float or str, optional) – Scaling factor to use for scaled addition of adapter outputs as done by He et al. (2021). Can be either a constant factor (float) or the string “learned”, in which case the scaling factor is learned. Defaults to 1.0.

  • use_gating (bool, optional) – Place a trainable gating module beside the added parameter module to control module activation. This is used e.g. by UniPELT. Defaults to False.

  • residual_before_ln (bool, optional) – If True, take the residual connection around the adapter bottleneck before the layer normalization. Only applicable if original_ln_before is True.

  • adapter_residual_before_ln (bool, optional) – If True, apply the residual connection around the adapter modules before the new layer normalization within the adapter. Only applicable if ln_after is True and is_parallel is False.

  • inv_adapter (str, optional) – If not None, add invertible adapter modules after the model embedding layer. Currently, this can be either “nice” or “glow”. Defaults to None.

  • inv_adapter_reduction_factor (float, optional) – The reduction to use within the invertible adapter modules. Only applicable if inv_adapter is not None.

  • cross_adapter (bool, optional) – If True, add adapter modules after the cross attention block of each decoder layer in an encoder-decoder model. Defaults to False.

  • leave_out (List[int], optional) – The IDs of the layers (starting at 0) where NO adapter modules should be added.

  • phm_layer (bool, optional) – If True, the down and up projection layers are each a PHMLayer. Defaults to False.

  • phm_dim (int, optional) – The dimension of the phm matrix. Only applicable if phm_layer is set to True. Defaults to 4.

  • shared_phm_rule (bool, optional) – Whether the phm matrix is shared across all layers. Defaults to True.

  • factorized_phm_rule (bool, optional) – Whether the phm matrix is factorized into a left and right matrix. Defaults to False.

  • learn_phm (bool, optional) – Whether the phm matrix should be learned during training. Defaults to True.

  • factorized_phm_W (bool, optional) – Whether the weights matrix is factorized into a left and right matrix. Defaults to True.

  • shared_W_phm (bool, optional) – Whether the weights matrix is shared across all layers. Defaults to False.

  • phm_c_init (str, optional) – The initialization function for the weights of the phm matrix. The possible values are [“normal”, “uniform”]. Defaults to “normal”.

  • phm_init_range (float, optional) – std for initializing phm weights if phm_c_init=”normal”. Defaults to 0.0001.

  • hypercomplex_nonlinearity (str, optional) – Specifies the distribution to draw the weights in the phm layer from. Defaults to “glorot-uniform”.

  • phm_rank (int, optional) – If the weight matrix is factorized, this specifies the rank of the matrix. E.g. the left matrix of the down projection has the shape (phm_dim, _in_feats_per_axis, phm_rank) and the right matrix the shape (phm_dim, phm_rank, _out_feats_per_axis). Defaults to 1.

  • phm_bias (bool, optional) – If True, the down and up projection PHMLayers have a bias term. If phm_layer is False, this is ignored. Defaults to True.
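Example (a hand-rolled bottleneck configuration illustrating the per-layer reduction_factor mapping described above; the chosen values are arbitrary):

from transformers import AdapterConfig

# Adapters after both the attention block and the output FFN, with a
# per-layer reduction factor ("default" is the fallback) and no adapters
# in the first two layers.
config = AdapterConfig(
    mh_adapter=True,
    output_adapter=True,
    reduction_factor={"1": 8, "6": 32, "default": 16},
    non_linearity="relu",
    leave_out=[0, 1],
)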

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfigBase instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

class transformers.PfeifferConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)

The adapter architecture proposed by Pfeiffer et al. (2020). See https://arxiv.org/pdf/2005.00247.pdf.
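Example (a sketch of attaching this configuration to a model; AutoAdapterModel, add_adapter and train_adapter belong to the model mixins of the adapter-transformers fork and are not documented in this section, and the checkpoint and adapter names are placeholders):

from transformers import AutoAdapterModel, PfeifferConfig

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("sst-2", config=PfeifferConfig(reduction_factor=16))
model.train_adapter("sst-2")   # freeze the base model, train only the adapter weights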

class transformers.PfeifferInvConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = 'nice', inv_adapter_reduction_factor: Optional[float] = 2, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)

The adapter architecture proposed by Pfeiffer et al. (2020), extended with an invertible adapter after the model embedding layer. See https://arxiv.org/pdf/2005.00247.pdf.

class transformers.HoulsbyConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)

The adapter architecture proposed by Houlsby et al. (2019). See https://arxiv.org/pdf/1902.00751.pdf.

class transformers.HoulsbyInvConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = 'nice', inv_adapter_reduction_factor: Optional[float] = 2, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)

The adapter architecture proposed by Houlsby et al. (2019), extended with an invertible adapter after the model embedding layer. See https://arxiv.org/pdf/1902.00751.pdf.

class transformers.ParallelConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 2, non_linearity: str = 'relu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'mam_adapter', is_parallel: bool = True, scaling: Union[float, str] = 4.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)

The parallel adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.

class transformers.CompacterConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)

The Compacter architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.

class transformers.CompacterPlusPlusConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)

The Compacter++ architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.
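Example (a sketch contrasting the two Compacter variants; phm_dim=8 is an arbitrary illustrative value):

from transformers import CompacterConfig, CompacterPlusPlusConfig

# Compacter: bottleneck adapters whose projections are PHM layers
# (phm_layer=True), after both the attention block and the output FFN.
compacter = CompacterConfig()

# Compacter++: the same PHM-based bottleneck, but only after the output FFN,
# here with a larger hypercomplex dimension.
compacter_pp = CompacterPlusPlusConfig(phm_dim=8)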

Prefix Tuning

class transformers.PrefixTuningConfig(architecture: Optional[str] = 'prefix_tuning', encoder_prefix: bool = True, cross_prefix: bool = True, leave_out: List[int] = <factory>, flat: bool = False, prefix_length: int = 30, bottleneck_size: int = 512, non_linearity: str = 'tanh', dropout: float = 0.0, use_gating: bool = False, shared_gating: bool = True)

The Prefix Tuning architecture proposed by Li & Liang (2021). See https://arxiv.org/pdf/2101.00190.pdf.

Parameters
  • encoder_prefix (bool) – If True, add prefixes to the encoder of an encoder-decoder model.

  • cross_prefix (bool) – If True, add prefixes to the cross attention of an encoder-decoder model.

  • flat (bool) – If True, train the prefix parameters directly. Otherwise, reparametrize using a bottleneck MLP.

  • prefix_length (int) – The length of the prefix.

  • bottleneck_size (int) – If flat=False, the size of the bottleneck MLP.

  • non_linearity (str) – If flat=False, the non-linearity used in the bottleneck MLP.

  • dropout (float) – The dropout rate used in the prefix tuning layer.

  • leave_out (List[int]) – The IDs of the layers (starting at 0) where NO prefix should be added.

  • use_gating (bool, optional) – Place a trainable gating module beside the added parameter module to control module activation. This is used e.g. by UniPELT. Defaults to False.

  • shared_gating (bool, optional) – Whether to use a shared gate for the prefixes of all attention matrices. Only applicable if use_gating=True. Defaults to True.
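Example (a sketch of the two reparametrization modes described above; the values are illustrative):

from transformers import PrefixTuningConfig

# Default: the prefix vectors are produced by a bottleneck MLP during training.
reparam = PrefixTuningConfig(prefix_length=30, bottleneck_size=512)

# flat=True: the prefix parameters are trained directly, without the MLP.
flat = PrefixTuningConfig(flat=True, prefix_length=10)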

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfigBase instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

LoRAConfig

class transformers.LoRAConfig(architecture: Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = False, output_lora: bool = False, r: int = 8, alpha: int = 8, dropout: float = 0.0, attn_matrices: List[str] = <factory>, composition_mode: str = 'add', init_weights: str = 'lora', use_gating: bool = False)

The Low-Rank Adaptation (LoRA) architecture proposed by Hu et al. (2021). See https://arxiv.org/pdf/2106.09685.pdf. LoRA adapts a model by reparametrizing selected weight matrices with low-rank updates. You can merge the additional weights with the original layer weights using model.merge_adapter("lora_name").

Parameters
  • selfattn_lora (bool, optional) – If True, add LoRA to the self-attention weights of a model. Defaults to True.

  • intermediate_lora (bool, optional) – If True, add LoRA to the intermediate MLP weights of a model. Defaults to False.

  • output_lora (bool, optional) – If True, add LoRA to the output MLP weights of a model. Defaults to False.

  • r (int, optional) – The rank of the LoRA layer. Defaults to 8.

  • alpha (int, optional) – The hyperparameter used for scaling the LoRA reparametrization. Defaults to 8.

  • dropout (float, optional) – The dropout rate used in the LoRA layer. Defaults to 0.0.

  • attn_matrices (List[str], optional) – Determines which matrices of the self-attention module to adapt. A list that may contain the strings “q” (query), “k” (key), “v” (value). Defaults to [“q”, “v”].

  • composition_mode (str, optional) – Defines how the injected weights are composed with the original model weights. Can be either “add” (addition of decomposed matrix, as in LoRA) or “scale” (element-wise multiplication of vector, as in (IA)^3). “scale” can only be used together with r=1. Defaults to “add”.

  • init_weights (str, optional) – Initialization method for the weights of the LoRA modules. Currently, this can be either “lora” (default) or “bert”.

  • use_gating (bool, optional) – Place a trainable gating module beside the added parameter module to control module activation. This is used e.g. by UniPELT. Defaults to False. Note that modules with use_gating=True cannot be merged using merge_adapter().
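Example (a sketch of a typical LoRA workflow ending in a merge; AutoAdapterModel, add_adapter and train_adapter belong to the model mixins of the adapter-transformers fork and are assumptions here, merge_adapter is the method referenced above, and the checkpoint and adapter names are placeholders):

from transformers import AutoAdapterModel, LoRAConfig

model = AutoAdapterModel.from_pretrained("roberta-base")
model.add_adapter("my_lora", config=LoRAConfig(r=8, alpha=16, dropout=0.1))
model.train_adapter("my_lora")

# ... fine-tune ...

# Fold the low-rank update into the base weights; not possible for
# configurations with use_gating=True (see above).
model.merge_adapter("my_lora")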

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfigBase instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

IA3Config

class transformers.IA3Config(architecture: Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = True, output_lora: bool = False, r: int = 1, alpha: int = 1, dropout: float = 0.0, attn_matrices: List[str] = <factory>, composition_mode: str = 'scale', init_weights: str = 'ia3', use_gating: bool = False)

The ‘Infused Adapter by Inhibiting and Amplifying Inner Activations’ ((IA)^3) architecture proposed by Liu et al. (2022). See https://arxiv.org/pdf/2205.05638.pdf. (IA)^3 builds on LoRA, but instead of LoRA’s additive composition it scales the weights of a layer using an injected vector.
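The relationship to LoRAConfig can be read off the defaults shown in the signature above; a quick sketch:

from transformers import IA3Config

config = IA3Config()
# (IA)^3 reuses the LoRA machinery, but composes multiplicatively with rank 1:
assert config.composition_mode == "scale" and config.r == 1 and config.alpha == 1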

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfigBase instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

Combined configurations

class transformers.ConfigUnion(*configs: List[transformers.adapters.configuration.AdapterConfigBase])

Composes multiple adaptation method configurations into one. This class can be used to define complex adaptation method setups.

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfigBase instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

static validate(configs)

Performs simple validations of a list of configurations to check whether they can be combined into a common setup.

Parameters

configs (List[AdapterConfigBase]) – list of configs to check.

Raises
  • TypeError – One of the configurations has a wrong type.

  • ValueError – At least two given configurations conflict.
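Example (a sketch combining prefix tuning with a parallel bottleneck adapter; the MAMConfig class below provides a ready-made shortcut for this kind of combination):

from transformers import ConfigUnion, ParallelConfig, PrefixTuningConfig

parts = [PrefixTuningConfig(), ParallelConfig()]
ConfigUnion.validate(parts)    # raises TypeError/ValueError if they cannot be combined
config = ConfigUnion(*parts)   # combined setup, usable wherever a single config is expected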

class transformers.MAMConfig(prefix_tuning: Optional[transformers.adapters.configuration.PrefixTuningConfig] = None, adapter: Optional[transformers.adapters.configuration.AdapterConfig] = None)

The Mix-And-Match adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.

class transformers.UniPELTConfig(prefix_tuning: Optional[transformers.adapters.configuration.PrefixTuningConfig] = None, adapter: Optional[transformers.adapters.configuration.AdapterConfig] = None, lora: Optional[transformers.adapters.configuration.LoRAConfig] = None)

The UniPELT adapter architecture proposed by Mao et al. (2022). See https://arxiv.org/pdf/2110.07577.pdf.
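Example (a sketch of attaching these combined configurations to a model; AutoAdapterModel and add_adapter belong to the model mixins of the adapter-transformers fork, and the checkpoint and adapter names are placeholders):

from transformers import AutoAdapterModel, MAMConfig, UniPELTConfig

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("mam", config=MAMConfig())          # prefix tuning + (parallel) bottleneck adapter
model.add_adapter("unipelt", config=UniPELTConfig())  # prefix tuning + adapter + LoRA, each gated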

Adapter Fusion

class transformers.AdapterFusionConfig(key: bool, query: bool, value: bool, query_before_ln: bool, regularization: bool, residual_before: bool, temperature: bool, value_before_softmax: bool, value_initialized: str)

Base class that models the architecture of an adapter fusion layer.

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], **kwargs)

Loads a given adapter fusion configuration specifier into a full AdapterFusionConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTERFUSION_CONFIG_MAP

  • the path to a file containing a full adapter fusion configuration

Returns

The resolved adapter fusion configuration dictionary.

Return type

dict

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

class transformers.StaticAdapterFusionConfig(key: bool = True, query: bool = True, value: bool = False, query_before_ln: bool = False, regularization: bool = False, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = False)

Static version of adapter fusion without a value matrix. See https://arxiv.org/pdf/2005.00247.pdf.

class transformers.DynamicAdapterFusionConfig(key: bool = True, query: bool = True, value: bool = True, query_before_ln: bool = False, regularization: bool = True, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = True)

Dynamic version of adapter fusion with a value matrix and regularization. See https://arxiv.org/pdf/2005.00247.pdf.
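Example (a sketch of setting up fusion over two adapters; add_adapter, add_adapter_fusion and set_active_adapters belong to the model mixins, the Fuse block is assumed to live in transformers.adapters.composition, the “dynamic” identifier is assumed to be registered in ADAPTERFUSION_CONFIG_MAP, and the checkpoint and adapter names are placeholders):

from transformers import AutoAdapterModel
from transformers.adapters.composition import Fuse

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("a")
model.add_adapter("b")

# Fuse the two adapters; "dynamic" selects the fusion variant with a value
# matrix and regularization (DynamicAdapterFusionConfig above).
model.add_adapter_fusion(Fuse("a", "b"), config="dynamic")
model.set_active_adapters(Fuse("a", "b"))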

Adapter Setup

class transformers.AdapterSetup(adapter_setup, head_setup=None, ignore_empty: bool = False)

Represents an adapter setup of a model including active adapters and active heads. This class is intended to be used as a context manager using the with statement. The setup defined by the AdapterSetup context will override static adapter setups defined in a model (i.e. setups specified via active_adapters).

Example:

with AdapterSetup(Stack("a", "b")):
    # will use the adapter stack "a" and "b"
    outputs = model(**inputs)

Note that the context manager is thread-local, i.e. it can be used with different setups in a multi-threaded environment.
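A fuller sketch with imports (Stack is assumed to live in transformers.adapters.composition, add_adapter belongs to the model mixins of the adapter-transformers fork, and the checkpoint and adapter names are placeholders):

from transformers import AdapterSetup, AutoAdapterModel, AutoTokenizer
from transformers.adapters.composition import Stack

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("a")
model.add_adapter("b")

inputs = tokenizer("Some input text", return_tensors="pt")
with AdapterSetup(Stack("a", "b")):
    # overrides any statically active adapters for the duration of the block
    outputs = model(**inputs)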