AdaptiveGELU
- class AdaptiveGELU(alpha=None, beta=None, gamma=None, fixed=None)
Bases: AdaptiveActivationFunctionInterface
Adaptive trainable GELU activation function.
Given the function \(\text{GELU}:\mathbb{R}^n\rightarrow\mathbb{R}^n\), the adaptive function \(\text{GELU}_{\text{adaptive}}:\mathbb{R}^n\rightarrow\mathbb{R}^n\) is defined as:
\[\text{GELU}_{\text{adaptive}}(x) = \alpha\,\text{GELU}(\beta x + \gamma),\]
where \(\alpha,\,\beta,\,\gamma\) are trainable parameters, and the GELU function is defined as:
\[\text{GELU}(x) = 0.5\,x\left(1 + \tanh\left(\sqrt{2/\pi}\,\left(x + 0.044715\,x^3\right)\right)\right).\]
See also
Original reference: Godfrey, Luke B., and Michael S. Gashler. "A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks." 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K). Vol. 1. IEEE, 2015. arXiv preprint arXiv:1602.01321.
Jagtap, Ameya D., Kenji Kawaguchi, and George Em Karniadakis. "Adaptive activation functions accelerate convergence in deep and physics-informed neural networks." Journal of Computational Physics 404 (2020): 109136. DOI: 10.1016/j.jcp.2019.109136.
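To make the two equations concrete, here is a minimal PyTorch sketch of the adaptive scaling. This is an illustrative re-implementation of the formula above, not PINA's actual class; the name AdaptiveGELUSketch is hypothetical.

```python
import torch
from torch import nn


class AdaptiveGELUSketch(nn.Module):
    """Minimal sketch of GELU_adaptive(x) = alpha * GELU(beta * x + gamma)."""

    def __init__(self, alpha=None, beta=None, gamma=None):
        super().__init__()
        # When None is passed, each parameter is initialized to 1,
        # mirroring the constructor documentation below.
        self.alpha = nn.Parameter(torch.tensor(1.0 if alpha is None else alpha))
        self.beta = nn.Parameter(torch.tensor(1.0 if beta is None else beta))
        self.gamma = nn.Parameter(torch.tensor(1.0 if gamma is None else gamma))

    def forward(self, x):
        # The tanh approximation of GELU matches the equation above
        # (approximate="tanh" requires PyTorch >= 1.12).
        return self.alpha * nn.functional.gelu(
            self.beta * x + self.gamma, approximate="tanh"
        )
```

Because \(\alpha,\,\beta,\,\gamma\) are registered as parameters, they receive gradients and are updated alongside the network weights during training.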
Initializes the Adaptive Function.
- Parameters:
alpha (float | complex) – Scaling parameter alpha. Defaults to None. When None is passed, the variable is initialized to 1.
beta (float | complex) – Scaling parameter beta. Defaults to None. When None is passed, the variable is initialized to 1.
gamma (float | complex) – Shifting parameter gamma. Defaults to None. When None is passed, the variable is initialized to 1.
fixed (list) – List of parameters to fix during training, i.e. not optimized (requires_grad set to False). Options are alpha, beta, gamma. Defaults to None.
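A minimal usage sketch follows. It assumes the import path pina.adaptive_functions and that fixed accepts parameter names as strings; both are assumptions inferred from the parameter description above, so check the package for the exact interface.

```python
import torch
from pina.adaptive_functions import AdaptiveGELU  # assumed import path

# alpha and beta stay trainable; gamma is fixed (requires_grad=False).
act = AdaptiveGELU(alpha=1.0, beta=2.0, gamma=0.0, fixed=["gamma"])

x = torch.linspace(-3.0, 3.0, steps=7)
y = act(x)  # applies alpha * GELU(beta * x + gamma) elementwise
print(y.shape)  # torch.Size([7])
```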