Alchemical Model¶
Warning
This is an experimental model. You should not use it for anything important.
This is an implementation of the Alchemical Model: a Behler-Parrinello neural network [1] with Smooth Overlap of Atomic Positions (SOAP) features [2] and Alchemical Compression of the composition space [3][4][5]. This model is particularly useful for simulating systems containing a large number of chemical elements.
Installation¶
To install the package, you can run the following command in the root directory of the repository:
pip install .[alchemical-model]
This will install the package with the Alchemical Model dependencies.
Default Hyperparameters¶
The default hyperparameters for the Alchemical Model are:
model:
  soap:
    num_pseudo_species: 4
    cutoff: 5.0
    basis_cutoff_power_spectrum: 400
    radial_basis_type: 'physical'
    basis_scale: 3.0
    trainable_basis: true
    normalize: true
    contract_center_species: true
  bpnn:
    hidden_sizes: [32, 32]
    output_size: 1
training:
  batch_size: 8
  num_epochs: 100
  learning_rate: 0.001
  early_stopping_patience: 50
  scheduler_patience: 10
  scheduler_factor: 0.8
  log_interval: 10
  checkpoint_interval: 25
  per_structure_targets: []
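These defaults can be overridden in the options file used to launch training. As a sketch (assuming an options layout in which the architecture name from the section below sits alongside the model and training sections; the exact top-level key is an assumption, not stated on this page), overriding a few values might look like:

```yaml
# Hypothetical options excerpt; the top-level layout is assumed.
architecture:
  name: experimental.alchemical_model
  model:
    soap:
      num_pseudo_species: 4   # default value
      cutoff: 5.0             # spherical cutoff in Å
  training:
    batch_size: 8
    learning_rate: 0.001
```

Hyperparameters not given explicitly fall back to the defaults above.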
Tuning Hyperparameters¶
The default hyperparameters above will work well in most cases, but they may not be optimal for your specific dataset. In general, the most important hyperparameters to tune are (in decreasing order of importance):
cutoff
: This should be set to a value after which most of the interactions between atoms are expected to be negligible.
num_pseudo_species
: The number of pseudo species to use in the Alchemical Compression of the composition space. This value should be adjusted based on prior knowledge of the size of the original chemical space.
learning_rate
: The learning rate for the neural network. This hyperparameter controls how much the weights of the network are updated at each step of the optimization. A larger learning rate will lead to faster training, but might cause instability and/or divergence.
batch_size
: The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.
hidden_sizes
: This hyperparameter controls the size and depth of the descriptors and the neural network. Increasing it might lead to better accuracy, especially on larger datasets, at the cost of increased training and evaluation time.
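Following the priorities above, a sketch of how one might adapt the defaults for a chemically diverse dataset (all values here are illustrative assumptions, not recommendations from this page):

```yaml
# Illustrative tuning for a dataset with many chemical elements;
# every value below is an assumed example.
model:
  soap:
    cutoff: 6.0             # longer-ranged interactions expected
    num_pseudo_species: 8   # larger original chemical space
training:
  learning_rate: 0.0005     # smaller steps for stability
  batch_size: 16            # faster training, if memory allows
```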
Architecture Hyperparameters¶
- param name:
experimental.alchemical_model
model¶
soap¶
- param num_pseudo_species:
Number of pseudo species to use in the Alchemical Compression of the composition space.
- param cutoff:
Spherical cutoff (Å) to use for atomic environments.
- param basis_cutoff_power_spectrum:
The maximal eigenvalue of the Laplacian Eigenstates (LE) basis functions used as the radial basis [6]. This controls how large the radial-angular basis is.
- param radial_basis_type:
The type of LE basis functions to use as the radial basis. The supported radial basis functions are
LE
: The original Laplacian Eigenstates radial basis. These radial basis functions can be selected in the .yaml file as radial_basis_type: "le".
Physical
: Physically-motivated basis functions. These radial basis functions can be selected as radial_basis_type: "physical".
- param basis_scale:
Scaling parameter of the radial basis functions, representing the characteristic width (in Å) of the basis functions.
- param trainable_basis:
If True, the radial basis functions will be accompanied by a trainable multi-layer perceptron (MLP). If False, the radial basis functions will be fixed.
- param normalize:
Whether to use normalizations such as LayerNorm in the model.
- param contract_center_species:
If True, the Alchemical Compression will be applied to the center species as well. If False, it will be applied only to the neighbor species.
bpnn¶
- param hidden_sizes:
Number of neurons in each hidden layer.
- param output_size:
Number of neurons in the output layer.
training¶
The parameters for the training loop are:
- param batch_size:
Batch size.
- param num_epochs:
Number of training epochs.
- param learning_rate:
Learning rate.
- param log_interval:
Number of epochs that elapse between reporting new training results.
- param checkpoint_interval:
Interval to save a checkpoint to disk.
- param per_structure_targets:
Specifies which targets should be trained on a per-structure (rather than per-atom) loss. For those targets, the logger will also output per-structure metrics. In any case, the final summary will be per-structure.
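A sketch of a complete training section combining the parameters above (the non-default values are illustrative assumptions):

```yaml
# Example training section; non-default values are assumed for illustration.
training:
  batch_size: 16
  num_epochs: 200
  learning_rate: 0.001
  log_interval: 10          # report training results every 10 epochs
  checkpoint_interval: 25   # save a checkpoint every 25 epochs
  per_structure_targets: [] # list of target names; empty by default
```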