Adapter Modules

Classes implementing task and language adapters.

class transformers.adapters.modeling.Activation_Function_Class(hidden_act)

Implementation of various activation functions.

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
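The note above describes general PyTorch behaviour and is not specific to this class. As a minimal sketch, using a hypothetical toy module named Scale (not part of the library), the difference between calling the instance and calling forward directly can be seen with a forward hook:

```python
import torch
from torch import nn


class Scale(nn.Module):
    """Hypothetical toy module used only for illustration."""

    def forward(self, x):
        return 2 * x


calls = []
module = Scale()
# The hook fires whenever the module instance itself is called.
handle = module.register_forward_hook(lambda mod, inp, out: calls.append("hook"))

x = torch.ones(3)
y_call = module(x)             # goes through __call__, so the hook runs
hooks_after_call = len(calls)
y_forward = module.forward(x)  # bypasses __call__, so the hook is skipped
hooks_after_forward = len(calls)
handle.remove()
```

Both calls return the same tensor; only the first one triggers the registered hook.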

class transformers.adapters.modeling.Adapter(adapter_name, input_size, down_sample, config: transformers.adapters.configuration.AdapterConfig)

Implementation of a sequential bottleneck adapter block.
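For orientation, a bottleneck block of this kind can be sketched as a down-projection, a non-linearity, an up-projection, and a residual addition. The class below, BottleneckSketch, is a hypothetical simplification and not the library's Adapter: the activation choice, the return value, and the absence of layer norm or gating are assumptions made to keep the example short.

```python
import torch
from torch import nn


class BottleneckSketch(nn.Module):
    """Hypothetical simplified bottleneck; not the library's Adapter class."""

    def __init__(self, input_size, down_sample):
        super().__init__()
        self.down = nn.Linear(input_size, down_sample)  # project to the bottleneck
        self.act = nn.ReLU()                            # assumed activation
        self.up = nn.Linear(down_sample, input_size)    # project back up

    def forward(self, x, residual_input):
        # Down-project, apply the non-linearity, up-project, then add the residual.
        return self.up(self.act(self.down(x))) + residual_input


torch.manual_seed(0)
block = BottleneckSketch(input_size=16, down_sample=4)
hidden = torch.randn(2, 5, 16)  # (batch, sequence, hidden)
out = block(hidden, residual_input=hidden)
```

The output keeps the input's shape, so the block can be dropped between existing Transformer sublayers without changing their interfaces.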

forward(x, residual_input, output_gating=False)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

static init_bert_weights(module)

Initialize the weights.

post_forward(hidden_states, input_hidden_states, input_tensor, layer_norm)

Performs computations after the forward pass of the adapter block(s). This includes, for example, applying the residual connection and layer norm, if so configured.

Parameters
  • hidden_states – The hidden states output by the adapter block(s).

  • input_hidden_states – Residual connection before the adapter block(s).

  • input_tensor – Residual connection before the Transformer FFN/attention layer.

  • layer_norm – Transformer LayerNorm.

Returns

The modified hidden states.
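A rough sketch of the kind of post-processing described here, assuming the simplest configuration (add the residual from before the Transformer sublayer, then apply the layer norm if one is given). The helper post_forward_sketch is hypothetical; the library method takes additional arguments and branches on the adapter configuration.

```python
import torch
from torch import nn


def post_forward_sketch(hidden_states, input_tensor, layer_norm=None):
    """Hypothetical helper: add the residual, then optionally normalise."""
    hidden_states = hidden_states + input_tensor
    if layer_norm is not None:
        hidden_states = layer_norm(hidden_states)
    return hidden_states


torch.manual_seed(0)
adapter_out = torch.randn(2, 5, 8)
residual = torch.randn(2, 5, 8)
plain = post_forward_sketch(adapter_out, residual)
normed = post_forward_sketch(adapter_out, residual, nn.LayerNorm(8))
```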

pre_forward(hidden_states, input_tensor, layer_norm, fusion_config=None)

Retrieves the hidden_states, query (for Fusion), and residual connection according to the set configuration.

Parameters
  • adapter_config – Configuration according to which the parameters are passed.

  • hidden_states – output of previous layer

  • input_tensor – residual connection before FFN

Returns: hidden_states, query, residual

class transformers.adapters.modeling.BertFusion(config: transformers.adapters.configuration.AdapterFusionConfig, dense_size, attention_probs_dropout_prob)

Implementation of an AdapterFusion block.
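As a hedged illustration of the idea, the sketch below attends over the outputs of several adapters: the query comes from the Transformer layer and the keys and values come from the stacked adapter outputs. FusionSketch is a hypothetical simplification, not the library's BertFusion; the tensor shapes, the choice of projections, and the missing dropout and value projection are assumptions.

```python
import torch
from torch import nn


class FusionSketch(nn.Module):
    """Hypothetical simplified fusion layer; not the library's BertFusion."""

    def __init__(self, dense_size):
        super().__init__()
        self.query = nn.Linear(dense_size, dense_size)
        self.key = nn.Linear(dense_size, dense_size)

    def forward(self, query, key, value, residual):
        # query: (batch, seq, hidden); key/value: (batch, seq, n_adapters, hidden)
        q = self.query(query).unsqueeze(2)            # (batch, seq, 1, hidden)
        k = self.key(key)                             # (batch, seq, n_adapters, hidden)
        scores = (q * k).sum(dim=-1)                  # (batch, seq, n_adapters)
        probs = torch.softmax(scores, dim=-1)         # weights over adapters
        context = (probs.unsqueeze(-1) * value).sum(dim=2)
        return context + residual, probs


torch.manual_seed(0)
fusion = FusionSketch(dense_size=8)
layer_out = torch.randn(2, 5, 8)
adapter_outs = torch.randn(2, 5, 3, 8)  # outputs of 3 adapters, stacked
fused, probs = fusion(layer_out, adapter_outs, adapter_outs, layer_out)
```

Each token receives its own set of weights over the adapters, and those weights sum to one.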

forward(query, key, value, residual, output_attentions: bool = False)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class transformers.adapters.modeling.GLOWCouplingBlock(dims_in, dims_c=[], non_linearity='relu', reduction_factor=2, clamp=5.0)

Coupling block following the GLOW design. The only difference from the RealNVP coupling blocks is that it uses a single subnetwork to jointly predict [s_i, t_i] instead of two separate subnetworks. This reduces computational cost and speeds up learning.

Parameters
  • clamp – Soft clamping for the multiplicative component. The amplification or attenuation of each input dimension can be at most exp(±clamp).
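To make the clamping and the joint [s_i, t_i] prediction concrete, here is a scalar sketch of one half of an affine coupling step. The arctan-based clamp with the constant 0.636 (roughly 2/π) is an assumption about how soft clamping is typically done; toy_subnet is a hypothetical stand-in for the learned subnetwork, and a full block would transform both halves of the input.

```python
import math


def soft_clamped_scale(s, clamp=5.0):
    """Scale factor exp(clamp * 0.636 * atan(s / clamp)), kept within exp(±clamp)."""
    return math.exp(clamp * 0.636 * math.atan(s / clamp))


def affine_coupling(x1, x2, subnet, clamp=5.0, rev=False):
    """Transform x2 conditioned on x1; a single subnet returns both s and t."""
    s, t = subnet(x1)
    scale = soft_clamped_scale(s, clamp)
    if rev:
        return x1, (x2 - t) / scale  # exact inverse of the forward direction
    return x1, x2 * scale + t


def toy_subnet(x1):
    # Hypothetical stand-in for the learned subnetwork.
    return 3.0 * x1, x1 - 1.0


y1, y2 = affine_coupling(0.5, 2.0, toy_subnet)
x1_rec, x2_rec = affine_coupling(y1, y2, toy_subnet, rev=True)
```

Because x1 passes through unchanged, the inverse can recompute s and t from it, which is what makes the block invertible regardless of the subnetwork.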

forward(x, c=[], rev=False)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class transformers.adapters.modeling.NICECouplingBlock(dims_in, dims_c=[], non_linearity='relu', reduction_factor=2)

Coupling Block following the NICE design.
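A NICE-style block is purely additive, with no multiplicative component. The scalar sketch below assumes the common two-step form in which each half is shifted by a function of the other; the functions F and G are hypothetical stand-ins for the learned subnetworks, and conditioning inputs (c) are omitted.

```python
def nice_coupling(x1, x2, F, G, rev=False):
    """Additive coupling: forward shifts each half, reverse undoes the shifts."""
    if not rev:
        y1 = x1 + F(x2)
        y2 = x2 + G(y1)
        return y1, y2
    # In reverse mode the inputs are (y1, y2); undo the steps in opposite order.
    x2_rec = x2 - G(x1)
    x1_rec = x1 - F(x2_rec)
    return x1_rec, x2_rec


def F(v):
    # Hypothetical stand-in for a learned subnetwork.
    return v * v


def G(v):
    # Hypothetical stand-in for a learned subnetwork.
    return 0.5 * v + 1.0


y1, y2 = nice_coupling(1.5, -2.0, F, G)
x1_back, x2_back = nice_coupling(y1, y2, F, G, rev=True)
```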

forward(x, c=[], rev=False)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class transformers.adapters.modeling.PHMLayer(adapter_name: str, in_features: int, out_features: int, position: str, config: dict)

This class is adapted from the compacter implementation at https://github.com/rabeehk/compacter

forward(x: torch.Tensor) → torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

set_phm_rule(phm_rule=None, phm_rule_left=None, phm_rule_right=None)

If factorized_phm_rules is set, phm_rule is a tuple containing the left and right PHM rules; otherwise, it is the single PHM rule.

class transformers.adapters.modeling.ParallelAdapter(adapter_name, input_size, down_sample, config: transformers.adapters.configuration.AdapterConfig)

Implementation of a parallel bottleneck adapter block.

forward(x, residual_input, output_gating=False)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

post_forward(hidden_states, input_hidden_states, input_tensor, layer_norm)

Performs computations after the forward pass of the adapter block(s). This includes, for example, applying the residual connection and layer norm, if so configured.

Parameters
  • hidden_states – The hidden states output by the adapter block(s).

  • input_hidden_states – Residual connection before the adapter block(s).

  • input_tensor – Residual connection before the Transformer FFN/attention layer.

  • layer_norm – Transformer LayerNorm.

Returns

The modified hidden states.

pre_forward(hidden_states, input_tensor, layer_norm, fusion_config=None)

Retrieves the hidden_states, query (for Fusion), and residual connection according to the set configuration.

Parameters
  • adapter_config – Configuration according to which the parameters are passed.

  • hidden_states – output of previous layer

  • input_tensor – residual connection before FFN

Returns: hidden_states, query, residual

transformers.adapters.modeling.init_W(config, W_left=None, W_right=None, W=None)

Initialize the weights for the compacter module or the shared parameters

transformers.adapters.modeling.init_shared_parameters(config, in_features, device)

Create and initialize the parameters shared by all compacter modules

transformers.adapters.modeling.kronecker_product(a, b)

Copied from rabeehk/compacter seq2seq/hypercomplex/kronecker.py

Kronecker product of matrices a and b with leading batch dimensions. Batch dimensions are broadcast. The number of them must ...

Parameters
  • a (torch.Tensor)

  • b (torch.Tensor)

Return type

torch.Tensor
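One way such a batched Kronecker product can be written is with broadcasting and a reshape. The function batched_kron_sketch below is a hypothetical reimplementation for illustration and may differ from the library function in details; it is checked against torch.kron on individual matrices.

```python
import torch


def batched_kron_sketch(a, b):
    """Kronecker product over the last two dims; leading dims are broadcast."""
    out_shape = (a.shape[-2] * b.shape[-2], a.shape[-1] * b.shape[-1])
    # (..., m, 1, n, 1) * (..., 1, p, 1, q) -> (..., m, p, n, q)
    res = a.unsqueeze(-1).unsqueeze(-3) * b.unsqueeze(-2).unsqueeze(-4)
    # Merge (m, p) into rows and (n, q) into columns.
    return res.reshape(tuple(res.shape[:-4]) + out_shape)


a = torch.arange(24.0).reshape(4, 2, 3)  # batch of four 2x3 matrices
b = torch.arange(6.0).reshape(1, 3, 2)   # one 3x2 matrix, broadcast over the batch
out = batched_kron_sketch(a, b)
```

Here the batch dimension of size 1 in b is broadcast against the batch of 4 in a, giving four 6x6 results.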