Adapter Configuration¶
Classes representing the architectures of adapter modules and fusion layers.
Single (bottleneck) adapters¶
- class transformers.AdapterConfigBase¶ Base class for all adaptation methods. This class does not define specific configuration keys, but only provides some common helper methods.
- Parameters
architecture (str, optional) – The type of adaptation method defined by the configuration.
- classmethod from_dict(config)¶ Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)¶ Loads a given adapter configuration specifier into a full AdapterConfigBase instance.
- Parameters
config (Union[dict, str]) – The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)¶ Returns a new instance of the config class with the specified changes applied.
- to_dict()¶ Converts the config class to a Python dict.
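For illustration, a minimal sketch of these helpers (assuming the adapter-transformers package, which exports the configuration classes from transformers; PfeifferConfig is defined further below):
from transformers import AdapterConfig, PfeifferConfig

# Resolve a configuration specifier, here an identifier from ADAPTER_CONFIG_MAP.
resolved = AdapterConfig.load("pfeiffer")

# replace() returns a modified copy of a config instance,
# and to_dict() converts it into a plain Python dict.
config = PfeifferConfig()
smaller = config.replace(reduction_factor=8)
config_dict = smaller.to_dict()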
- class transformers.AdapterConfig
(mh_adapter: bool, output_adapter: bool, reduction_factor: Union[float, collections.abc.Mapping], non_linearity: str, original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)¶ Base class that models the architecture of an adapter.
- Parameters
mh_adapter (bool) – If True, add adapter modules after the multi-head attention block of each layer.
output_adapter (bool) – If True, add adapter modules after the output FFN of each layer.
reduction_factor (float or Mapping) – Either a scalar float (> 0) specifying the reduction factor for all layers, or a mapping from layer ID (starting at 0) to values specifying the reduction_factor for individual layers. If not all layers are represented in the mapping, a default value should be given, e.g. {‘1’: 8, ‘6’: 32, ‘default’: 16}. Specifying a reduction factor < 1 will result in an up-projection layer.
non_linearity (str) – The activation function to use in the adapter bottleneck.
original_ln_before (bool, optional) – If True, apply pre-trained layer normalization and residual connection before the adapter modules. Defaults to False. Only applicable if is_parallel is False.
original_ln_after (bool, optional) – If True, apply pre-trained layer normalization and residual connection after the adapter modules. Defaults to True.
ln_before (bool, optional) – If True, add a new layer normalization before the adapter bottleneck. Defaults to False.
ln_after (bool, optional) – If True, add a new layer normalization after the adapter bottleneck. Defaults to False.
init_weights (str, optional) – Initialization method for the weights of the adapter modules. Currently, this can be either “bert” (default) or “mam_adapter”.
is_parallel (bool, optional) – If True, apply adapter transformations in parallel. By default (False), sequential application is used.
scaling (float or str, optional) – Scaling factor to use for scaled addition of adapter outputs as done by He et al. (2021). Can be either a constant factor (float) or the string “learned”, in which case the scaling factor is learned. Defaults to 1.0.
use_gating (bool, optional) – Place a trainable gating module beside the added parameter module to control module activation. This is e.g. used for UniPELT. Defaults to False.
residual_before_ln (bool, optional) – If True, take the residual connection around the adapter bottleneck before the layer normalization. Only applicable if original_ln_before is True.
adapter_residual_before_ln (bool, optional) – If True, apply the residual connection around the adapter modules before the new layer normalization within the adapter. Only applicable if ln_after is True and is_parallel is False.
inv_adapter (str, optional) – If not None (default), add invertible adapter modules after the model embedding layer. Currently, this can be either “nice” or “glow”.
inv_adapter_reduction_factor (float, optional) – The reduction factor to use within the invertible adapter modules. Only applicable if inv_adapter is not None.
cross_adapter (bool, optional) – If True, add adapter modules after the cross-attention block of each decoder layer in an encoder-decoder model. Defaults to False.
leave_out (List[int], optional) – The IDs of the layers (starting at 0) where NO adapter modules should be added.
phm_layer (bool, optional) – If True, the down and up projection layers are a PHMLayer. Defaults to False.
phm_dim (int, optional) – The dimension of the phm matrix. Only applicable if phm_layer is set to True. Defaults to 4.
shared_phm_rule (bool, optional) – Whether the phm matrix is shared across all layers. Defaults to True.
factorized_phm_rule (bool, optional) – Whether the phm matrix is factorized into a left and a right matrix. Defaults to False.
learn_phm (bool, optional) – Whether the phm matrix should be learned during training. Defaults to True.
factorized_phm_W (bool, optional) – Whether the weights matrix is factorized into a left and a right matrix. Defaults to True.
shared_W_phm (bool, optional) – Whether the weights matrix is shared across all layers. Defaults to False.
phm_c_init (str, optional) – The initialization function for the weights of the phm matrix. The possible values are [“normal”, “uniform”]. Defaults to “normal”.
phm_init_range (float, optional) – The standard deviation for initializing the phm weights if phm_c_init=”normal”. Defaults to 0.0001.
hypercomplex_nonlinearity (str, optional) – Specifies the distribution to draw the weights in the phm layer from. Defaults to “glorot-uniform”.
phm_rank (int, optional) – If the weight matrix is factorized, this specifies the rank of the matrix. E.g. the left matrix of the down projection has the shape (phm_dim, _in_feats_per_axis, phm_rank) and the right matrix the shape (phm_dim, phm_rank, _out_feats_per_axis). Defaults to 1.
phm_bias (bool, optional) – If True, the down and up projection PHMLayer has a bias term. If phm_layer is False, this is ignored. Defaults to True.
- classmethod from_dict(config)¶ Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)¶ Loads a given adapter configuration specifier into a full AdapterConfigBase instance.
- Parameters
config (Union[dict, str]) – The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)¶ Returns a new instance of the config class with the specified changes applied.
- to_dict()¶ Converts the config class to a Python dict.
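As a sketch, a custom bottleneck configuration can be built directly from AdapterConfig; the per-layer reduction_factor mapping below is purely illustrative:
from transformers import AdapterConfig

# A bottleneck adapter only after the output FFN of each layer, with a
# per-layer reduction factor: layer 0 uses 8, layer 11 uses 32, all others 16.
config = AdapterConfig(
    mh_adapter=False,
    output_adapter=True,
    reduction_factor={"0": 8, "11": 32, "default": 16},
    non_linearity="relu",
)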
- class transformers.PfeifferConfig
(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)¶ The adapter architecture proposed by Pfeiffer et al. (2020). See https://arxiv.org/pdf/2005.00247.pdf.
- class transformers.PfeifferInvConfig
(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = 'nice', inv_adapter_reduction_factor: Optional[float] = 2, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)¶ The adapter architecture proposed by Pfeiffer et al. (2020). See https://arxiv.org/pdf/2005.00247.pdf.
- class transformers.HoulsbyConfig
(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)¶ The adapter architecture proposed by Houlsby et al. (2019). See https://arxiv.org/pdf/1902.00751.pdf.
- class transformers.HoulsbyInvConfig
(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = 'nice', inv_adapter_reduction_factor: Optional[float] = 2, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)¶ The adapter architecture proposed by Houlsby et al. (2019). See https://arxiv.org/pdf/1902.00751.pdf.
- class transformers.ParallelConfig
(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 2, non_linearity: str = 'relu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'mam_adapter', is_parallel: bool = True, scaling: Union[float, str] = 4.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)¶ The parallel adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.
- class transformers.CompacterConfig
(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)¶ The Compacter architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.
- class transformers.CompacterPlusPlusConfig
(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: Union[float, collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: bool = True, adapter_residual_before_ln: bool = False, inv_adapter: Optional[str] = None, inv_adapter_reduction_factor: Optional[float] = None, cross_adapter: bool = False, leave_out: List[int] = <factory>, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: Optional[bool] = True, shared_W_phm: Optional[bool] = False, shared_phm_rule: Optional[bool] = True, factorized_phm_rule: Optional[bool] = False, phm_c_init: Optional[str] = 'normal', phm_init_range: Optional[float] = 0.0001, learn_phm: Optional[bool] = True, hypercomplex_nonlinearity: Optional[str] = 'glorot-uniform', phm_rank: Optional[int] = 1, phm_bias: Optional[bool] = True)¶ The Compacter++ architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.
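The predefined classes above differ only in their default values. A sketch of using them when adding adapters (assuming a model class with adapter support, e.g. AutoAdapterModel from adapter-transformers, and its add_adapter method):
from transformers import AutoAdapterModel, HoulsbyConfig, CompacterConfig

model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Houlsby-style adapters after both the attention block and the output FFN ...
model.add_adapter("task_houlsby", config=HoulsbyConfig())

# ... or a Compacter, i.e. a bottleneck adapter whose projections are PHM layers.
model.add_adapter("task_compacter", config=CompacterConfig())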
Prefix Tuning¶
- class transformers.PrefixTuningConfig
(architecture: Optional[str] = 'prefix_tuning', encoder_prefix: bool = True, cross_prefix: bool = True, leave_out: List[int] = <factory>, flat: bool = False, prefix_length: int = 30, bottleneck_size: int = 512, non_linearity: str = 'tanh', dropout: float = 0.0, use_gating: bool = False, shared_gating: bool = True)¶ The Prefix Tuning architecture proposed by Li & Liang (2021). See https://arxiv.org/pdf/2101.00190.pdf.
- Parameters
encoder_prefix (bool) – If True, add prefixes to the encoder of an encoder-decoder model.
cross_prefix (bool) – If True, add prefixes to the cross attention of an encoder-decoder model.
flat (bool) – If True, train the prefix parameters directly. Otherwise, reparametrize using a bottleneck MLP.
prefix_length (int) – The length of the prefix.
bottleneck_size (int) – If flat=False, the size of the bottleneck MLP.
non_linearity (str) – If flat=False, the non-linearity used in the bottleneck MLP.
dropout (float) – The dropout rate used in the prefix tuning layer.
leave_out (List[int]) – The IDs of the layers (starting at 0) where NO prefix should be added.
use_gating (bool, optional) – Place a trainable gating module beside the added parameter module to control module activation. This is e.g. used for UniPELT. Defaults to False.
shared_gating (bool, optional) – Whether to use a shared gate for the prefixes of all attention matrices. Only applicable if use_gating=True. Defaults to True.
- classmethod from_dict(config)¶ Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)¶ Loads a given adapter configuration specifier into a full AdapterConfigBase instance.
- Parameters
config (Union[dict, str]) – The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)¶ Returns a new instance of the config class with the specified changes applied.
- to_dict()¶ Converts the config class to a Python dict.
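A minimal sketch of prefix tuning configurations (the values shown are illustrative, not tuned recommendations):
from transformers import PrefixTuningConfig

# Reparametrized prefixes of length 30; by default, prefixes are also added to
# the encoder and to the cross-attention of encoder-decoder models.
config = PrefixTuningConfig(prefix_length=30, bottleneck_size=512)

# A flat variant trains the prefix parameters directly, without the bottleneck MLP.
flat_config = PrefixTuningConfig(flat=True, prefix_length=20)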
LoRAConfig¶
- class transformers.LoRAConfig
(architecture: Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = False, output_lora: bool = False, r: int = 8, alpha: int = 8, dropout: float = 0.0, attn_matrices: List[str] = <factory>, composition_mode: str = 'add', init_weights: str = 'lora', use_gating: bool = False)¶ The Low-Rank Adaptation (LoRA) architecture proposed by Hu et al. (2021). See https://arxiv.org/pdf/2106.09685.pdf. LoRA adapts a model by reparametrizing the weights of a layer matrix. You can merge the additional weights with the original layer weights using
model.merge_adapter("lora_name").
- Parameters
selfattn_lora (bool, optional) – If True, add LoRA to the self-attention weights of a model. Defaults to True.
intermediate_lora (bool, optional) – If True, add LoRA to the intermediate MLP weights of a model. Defaults to False.
output_lora (bool, optional) – If True, add LoRA to the output MLP weights of a model. Defaults to False.
r (int, optional) – The rank of the LoRA layer. Defaults to 8.
alpha (int, optional) – The hyperparameter used for scaling the LoRA reparametrization. Defaults to 8.
dropout (float, optional) – The dropout rate used in the LoRA layer. Defaults to 0.0.
attn_matrices (List[str], optional) – Determines which matrices of the self-attention module to adapt. A list that may contain the strings “q” (query), “k” (key), “v” (value). Defaults to [“q”, “v”].
composition_mode (str, optional) – Defines how the injected weights are composed with the original model weights. Can be either “add” (addition of decomposed matrix, as in LoRA) or “scale” (element-wise multiplication of vector, as in (IA)^3). “scale” can only be used together with r=1. Defaults to “add”.
init_weights (str, optional) – Initialization method for the weights of the LoRA modules. Currently, this can be either “lora” (default) or “bert”.
use_gating (bool, optional) – Place a trainable gating module beside the added parameter module to control module activation. This is e.g. used for UniPELT. Defaults to False. Note that modules with use_gating=True cannot be merged using merge_adapter().
- classmethod from_dict(config)¶ Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)¶ Loads a given adapter configuration specifier into a full AdapterConfigBase instance.
- Parameters
config (Union[dict, str]) – The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)¶ Returns a new instance of the config class with the specified changes applied.
- to_dict()¶ Converts the config class to a Python dict.
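A sketch of adding and later merging a LoRA module (assuming AutoAdapterModel and add_adapter from adapter-transformers; merge_adapter is mentioned above):
from transformers import AutoAdapterModel, LoRAConfig

model = AutoAdapterModel.from_pretrained("roberta-base")

# Rank-8 updates for the query and value projections of each self-attention block.
config = LoRAConfig(r=8, alpha=16, attn_matrices=["q", "v"])
model.add_adapter("lora_adapter", config=config)

# After training, the low-rank weights can be merged into the original layer
# weights; this is not possible for modules configured with use_gating=True.
model.merge_adapter("lora_adapter")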
IA3Config¶
- class transformers.IA3Config
(architecture: Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = True, output_lora: bool = False, r: int = 1, alpha: int = 1, dropout: float = 0.0, attn_matrices: List[str] = <factory>, composition_mode: str = 'scale', init_weights: str = 'ia3', use_gating: bool = False)¶ The ‘Infused Adapter by Inhibiting and Amplifying Inner Activations’ ((IA)^3) architecture proposed by Liu et al. (2022). See https://arxiv.org/pdf/2205.05638.pdf. (IA)^3 builds on top of LoRA, however, unlike the additive composition of LoRA, it scales weights of a layer using an injected vector.
- classmethod from_dict(config)¶ Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)¶ Loads a given adapter configuration specifier into a full AdapterConfigBase instance.
- Parameters
config (Union[dict, str]) – The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)¶ Returns a new instance of the config class with the specified changes applied.
- to_dict()¶ Converts the config class to a Python dict.
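Since (IA)^3 reuses the LoRA implementation, its configuration accepts the same fields; a sketch:
from transformers import IA3Config

# The defaults already select scaling composition ("scale") with r=1, so the
# injected parameters are vectors that rescale the affected weights.
config = IA3Config()

# Individual fields can still be overridden, e.g. restricting (IA)^3 to self-attention.
attn_only = IA3Config(intermediate_lora=False)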
Combined configurations¶
- class transformers.ConfigUnion
(*configs: List[transformers.adapters.configuration.AdapterConfigBase])¶ Composes multiple adaptation method configurations into one. This class can be used to define complex adaptation method setups.
- classmethod from_dict(config)¶ Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)¶ Loads a given adapter configuration specifier into a full AdapterConfigBase instance.
- Parameters
config (Union[dict, str]) – The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)¶ Returns a new instance of the config class with the specified changes applied.
- to_dict()¶ Converts the config class to a Python dict.
- static validate(configs)¶ Performs simple validations of a list of configurations to check whether they can be combined to a common setup.
- Parameters
configs (List[AdapterConfigBase]) – List of configs to check.
- Raises
TypeError – One of the configurations has a wrong type.
ValueError – At least two given configurations conflict.
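A sketch of composing two methods into one setup; validate() can be called first to check that they are combinable (the concrete pairing below is illustrative):
from transformers import ConfigUnion, ParallelConfig, PrefixTuningConfig

configs = [PrefixTuningConfig(bottleneck_size=800), ParallelConfig()]

# Raises TypeError or ValueError if the configurations cannot be combined.
ConfigUnion.validate(configs)

union = ConfigUnion(*configs)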
- class transformers.MAMConfig
(prefix_tuning: Optional[transformers.adapters.configuration.PrefixTuningConfig] = None, adapter: Optional[transformers.adapters.configuration.AdapterConfig] = None)¶ The Mix-And-Match adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.
- class transformers.UniPELTConfig
(prefix_tuning: Optional[transformers.adapters.configuration.PrefixTuningConfig] = None, adapter: Optional[transformers.adapters.configuration.AdapterConfig] = None, lora: Optional[transformers.adapters.configuration.LoRAConfig] = None)¶ The UniPELT adapter architecture proposed by Mao et al. (2022). See https://arxiv.org/pdf/2110.07577.pdf.
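Both classes are convenience unions of configurations defined above; a sketch (the customized values are illustrative):
from transformers import MAMConfig, UniPELTConfig, ParallelConfig, PrefixTuningConfig

# Mix-And-Match: prefix tuning combined with a parallel bottleneck adapter.
mam = MAMConfig()

# The two components can also be supplied explicitly.
mam_custom = MAMConfig(
    prefix_tuning=PrefixTuningConfig(bottleneck_size=800),
    adapter=ParallelConfig(reduction_factor=4),
)

# UniPELT combines prefix tuning, a bottleneck adapter, and LoRA, gated per module.
unipelt = UniPELTConfig()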
Adapter Fusion¶
- class transformers.AdapterFusionConfig
(key: bool, query: bool, value: bool, query_before_ln: bool, regularization: bool, residual_before: bool, temperature: bool, value_before_softmax: bool, value_initialized: str)¶ Base class that models the architecture of an adapter fusion layer.
- classmethod from_dict(config)¶ Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], **kwargs)¶ Loads a given adapter fusion configuration specifier into a full AdapterFusionConfig instance.
- Parameters
config (Union[dict, str]) – The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTERFUSION_CONFIG_MAP
the path to a file containing a full adapter fusion configuration
- Returns
The resolved adapter fusion configuration dictionary.
- Return type
dict
- replace(**changes)¶ Returns a new instance of the config class with the specified changes applied.
- to_dict()¶ Converts the config class to a Python dict.
- class transformers.StaticAdapterFusionConfig
(key: bool = True, query: bool = True, value: bool = False, query_before_ln: bool = False, regularization: bool = False, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = False)¶ Static version of adapter fusion without a value matrix. See https://arxiv.org/pdf/2005.00247.pdf.
- class transformers.DynamicAdapterFusionConfig
(key: bool = True, query: bool = True, value: bool = True, query_before_ln: bool = False, regularization: bool = True, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = True)¶ Dynamic version of adapter fusion with a value matrix and regularization. See https://arxiv.org/pdf/2005.00247.pdf.
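A sketch of resolving a fusion configuration (assuming, as in adapter-transformers, that the identifiers "static" and "dynamic" are registered in ADAPTERFUSION_CONFIG_MAP):
from transformers import AdapterFusionConfig, DynamicAdapterFusionConfig

# Resolve a fusion configuration from an identifier; keyword arguments are
# assumed to override individual keys of the resolved configuration.
fusion_config = AdapterFusionConfig.load("dynamic", temperature=True)

# Alternatively, instantiate a predefined variant and adjust it via replace().
custom = DynamicAdapterFusionConfig().replace(regularization=False)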
Adapter Setup¶
- class transformers.AdapterSetup(adapter_setup, head_setup=None, ignore_empty: bool = False)¶ Represents an adapter setup of a model, including active adapters and active heads. This class is intended to be used as a context manager using the with statement. The setup defined by the AdapterSetup context will override static adapter setups defined in a model (i.e. setups specified via active_adapters).
Example:
with AdapterSetup(Stack("a", "b")):
    # will use the adapter stack "a" and "b"
    outputs = model(**inputs)
Note that the context manager is thread-local, i.e. it can be used with different setups in a multi-threaded environment.
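Because the context is thread-local, different threads can activate different setups on the same model; a sketch (import paths as in adapter-transformers; the adapters "a" and "b" are assumed to have been added to the model beforehand):
import threading

from transformers import AdapterSetup, AutoAdapterModel, AutoTokenizer
from transformers.adapters.composition import Stack

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Some input text", return_tensors="pt")

def run(setup):
    # Each thread activates its own setup without affecting the other threads.
    with AdapterSetup(setup):
        return model(**inputs)

threads = [threading.Thread(target=run, args=(setup,)) for setup in (Stack("a", "b"), "a")]
for t in threads:
    t.start()
for t in threads:
    t.join()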