Citation
```bibtex
@article{Wu2025Learning,
  author  = {Shiqi Wu and Gérard Meunier and Olivier Chadebec and Ludovic Chamoin and Qianxiao Li},
  title   = {Learning Dynamics of Nonlinear Field-Circuit Coupled Problems},
  journal = {International Journal for Numerical Methods in Engineering},
  year    = {2025},
  doi     = {10.1002/nme.70015}
}
```
Summary
Technical Implementation
Implementation Details
In this study, we adopt ResNet as the backbone architecture and adapt its normalization strategy to cope with small batches and highly discrete inputs. The original input data has a dimensionality close to 7,000, which is expensive to process directly and increases the risk of overfitting. To mitigate these issues, we first apply Principal Component Analysis (PCA) for dimensionality reduction, retaining the components that capture most of the variance while shrinking the feature space. The model therefore operates on a compact, informative representation of the input data.
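A minimal sketch of this preprocessing step, assuming a scikit-learn PCA; the 99% retained-variance target, the sample counts, and the random data are illustrative assumptions, not values taken from the paper:

```python
# Minimal sketch of the PCA preprocessing step described above. The input
# dimensionality (~7,000) matches the text; the 99% retained-variance target,
# the sample counts, and the random data are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.standard_normal((512, 7000))   # hypothetical training snapshots
X_test = rng.standard_normal((64, 7000))

# Keep enough principal components to explain 99% of the training variance.
pca = PCA(n_components=0.99, svd_solver="full")
Z_train = pca.fit_transform(X_train)         # compact representation fed to the network
Z_test = pca.transform(X_test)               # reuse the same basis at inference time

print(Z_train.shape, pca.n_components_)
```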
Traditional Batch Normalization (BatchNorm) relies on batch-wise statistics (mean and variance), which become unstable when the batch size is small or when the inputs within a batch take only a few distinct values. In the extreme case where all samples in a batch are identical, the batch variance computed by BatchNorm drops to zero, producing unstable gradient updates and unreliable training.
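A small PyTorch snippet illustrating this failure mode (feature and batch sizes are arbitrary; the final LayerNorm lines preview the fix discussed next):

```python
# Illustration of the failure mode described above: when all samples in a
# batch are identical, the per-feature batch variance is zero, so BatchNorm
# collapses the normalized output. Feature and batch sizes are arbitrary.
import torch
import torch.nn as nn

row = torch.randn(1, 16)
x = row.repeat(8, 1)              # batch of 8 identical samples

bn = nn.BatchNorm1d(16)           # training mode: uses batch statistics
print(bn(x).abs().max())          # ~0: zero batch variance wipes out the signal

ln = nn.LayerNorm(16)             # normalizes across features of each sample
print(ln(x).std())                # ~1: per-sample statistics stay well defined
```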
To address this, we replace BatchNorm with Layer Normalization (LayerNorm), which normalizes activations across the feature dimension rather than the batch dimension. This makes LayerNorm more robust for small-batch training and for inputs that take only a limited number of distinct values. Additionally, we employ tanh as the activation function for the following reasons (a sketch of the resulting residual block follows the list):
- Symmetric Value Distribution: tanh outputs lie in (-1, 1) and are zero-centered, which suits roughly symmetric data distributions better than ReLU's non-negative outputs.
- Gradient Stability: tanh saturates at extreme values and is therefore prone to vanishing gradients, but LayerNorm mitigates this by re-normalizing the feature distribution at every layer, improving training stability.
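A minimal sketch of such a residual block, assuming a fully connected ResNet acting on the PCA coefficients; the hidden width, depth, output size, and the 128 retained components are illustrative assumptions rather than values from the paper:

```python
# Minimal sketch of a fully connected residual block with LayerNorm and tanh,
# operating on the PCA coefficients. The hidden width, depth, output size, and
# the 128 retained components are illustrative assumptions, not paper values.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width: int):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.norm1 = nn.LayerNorm(width)   # LayerNorm in place of BatchNorm
        self.norm2 = nn.LayerNorm(width)
        self.act = nn.Tanh()               # zero-centered activation

    def forward(self, x):
        h = self.act(self.norm1(self.fc1(x)))
        h = self.norm2(self.fc2(h))
        return self.act(x + h)             # skip connection

class ResNetRegressor(nn.Module):
    def __init__(self, in_dim: int, width: int = 256, depth: int = 4, out_dim: int = 1):
        super().__init__()
        self.inp = nn.Linear(in_dim, width)
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(depth)])
        self.out = nn.Linear(width, out_dim)

    def forward(self, z):                  # z: PCA coefficients of the input
        return self.out(self.blocks(self.inp(z)))

model = ResNetRegressor(in_dim=128)
y = model(torch.randn(4, 128))             # remains well behaved even for tiny batches
```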
Overall, these modifications (PCA-based dimensionality reduction, LayerNorm, and the tanh activation) improve computational efficiency, make training robust to small batches and highly discrete inputs, and lead to better generalization.