Initial commit of the AENET code.
_________________________________
INPUT FILE FORMAT FOR `TRAIN.X'
Nongnuch Artrith
_________________________________
Table of Contents
_________________
1 Alphabetic list of keywords
2 Training methods
.. 2.1 Online gradient descent (`online_gd')
.. 2.2 Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method
.. 2.3 Levenberg-Marquardt method
3 Input file template (train.in)
4 Example input file (train.in)
1 Alphabetic list of keywords
=============================
All keywords are case-insensitive and may appear in any order. Blank
lines and lines starting with `!', `#', or `%' are ignored.
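The two rules above can be illustrated with a minimal Python sketch.
It is not part of `train.x'; the function names `significant_lines'
and `is_keyword' are hypothetical, and stripping leading whitespace
before testing for a comment character is an assumption that the text
above does not confirm.

```python
COMMENT_PREFIXES = ("!", "#", "%")


def significant_lines(text):
    """Return the lines that are neither blank nor comments.

    Assumption: leading whitespace is stripped before the comment
    character is tested; the format description does not say so.
    """
    kept = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(COMMENT_PREFIXES):
            continue
        kept.append(line)
    return kept


def is_keyword(line, keyword):
    """Case-insensitive test for a keyword at the start of a line."""
    parts = line.split()
    return bool(parts) and parts[0].lower() == keyword.lower()
```

With this sketch, `TestPercent 10' and `TESTPERCENT 10' are recognized
as the same keyword.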
`debug' (optional): Activate debugging mode; additional output files
will be created.
`iterations' (optional): Specifies the number of training
iterations/epochs (default: 10).
`maxenergy' (optional): Highest formation energy to include in the
training set.
`method' (optional): Specifies the training method/algorithm to be used
for the weight optimization. The line following
the keyword contains as first item the name of the
method (e.g., `bfgs', `online_gd', `lm') and as
further items the parameters of the method (if
applicable). The default method is `bfgs'.
`networks' (required): Defines the architectures and specifies files for
all ANNs. Each of the `<NT>' (= number of types)
lines following the keyword contains the chemical
symbol `<T_i>' of the /i/-th atomic species in
the training set, the path to the ANN output file
(binary), and the architecture of the hidden
network layers. The latter is defined by the
number of hidden layers followed, for each layer,
by the number of nodes and the activation function
separated by a colon (see the template below for
two hidden layers of 5 nodes each with the
hyperbolic tangent activation).
`save_energies' (optional): Activate output of the final energies of all
training and testing structures. The
resulting output files can be used to
visualize the quality of the ANN fit and to
identify structures that are not well
represented. One file per process will be
generated, containing only the energies of
all structures handled by the process. The
files can simply be concatenated.
`testpercent' (optional): Specifies the percentage of reference
structures to be used as independent testing
set (default: 10%).
`timing' (optional): Activate timing; additional output files will be
created.
`trainingset' (required): Defines the name/path to the binary training
set file (output of generate.x, e.g.,
"refdata.train").
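The layout of a single line in the `networks' block, as described
above, may be easier to see in code. The following Python sketch
splits such a line into species, file path, and a list of
(nodes, activation) pairs. The names `NetworkSpec' and
`parse_network_line' are hypothetical, and the check that the number
of `nodes:activation' items equals the declared number of layers is an
assumption about how strictly `train.x' validates its input.

```python
from typing import List, NamedTuple, Tuple


class NetworkSpec(NamedTuple):
    species: str                    # chemical symbol, e.g. "Ti"
    path: str                       # path to the ANN output file
    layers: List[Tuple[int, str]]   # (nodes, activation) per hidden layer


def parse_network_line(line):
    """Parse '<T_i> <path> <n_layers> <nodes>:<act> ...' into a NetworkSpec."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected: species, path, number of layers")
    species, path, n_layers = fields[0], fields[1], int(fields[2])
    layer_fields = fields[3:]
    if len(layer_fields) != n_layers:
        # Assumption: a mismatch is treated as an error in this sketch.
        raise ValueError("layer count does not match the layer definitions")
    layers = []
    for item in layer_fields:
        nodes, activation = item.split(":")
        layers.append((int(nodes), activation))
    return NetworkSpec(species, path, layers)
```

For the line `Ti Ti.10t-10t.ann 2 10:twist 10:twist' this yields two
hidden layers with 10 nodes each and the `twist' activation.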
2 Training methods
==================
The training method is specified with the `method' keyword followed by
the identifier of the method and its parameters. Currently, `train.x'
offers three different optimization methods: online gradient descent,
the limited-memory BFGS algorithm, and the Levenberg-Marquardt method.
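The line that follows the keyword consists of the method identifier
and optional `key=value' parameters. The Python sketch below splits
such a line. It assumes that values such as `3.0d-2' are Fortran-style
double-precision literals (equivalent to `3.0e-2'), and it returns all
values as floats for simplicity. The helper names `fortran_float' and
`parse_method_line' are hypothetical and not part of `train.x'.

```python
def fortran_float(text):
    """Convert a Fortran-style literal such as '3.0d-2' to a float."""
    return float(text.lower().replace("d", "e"))


def parse_method_line(line):
    """Split 'name key=value ...' into (name, {key: float})."""
    fields = line.split()
    if not fields:
        raise ValueError("empty method line")
    name = fields[0]
    params = {}
    for item in fields[1:]:
        key, value = item.split("=", 1)
        params[key] = fortran_float(value)
    return name, params
```

For example, `online_gd gamma=3.0d-2 alpha=0.05d0' is split into the
identifier `online_gd' and the parameters gamma = 0.03 and
alpha = 0.05.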
2.1 Online gradient descent (`online_gd')
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gradient descent is implemented as an /online/ learning method, which
currently prevents efficient parallelization. The method is selected
with the identifier `online_gd' and has two parameters: the /learning
rate/ (`gamma'), which is a measure of the step size per iteration, and
the /momentum parameter/ (`alpha'), which controls fluctuations.
An example definition with reasonable parameters is:
,----
| METHOD
| online_gd gamma=3.0d-2 alpha=0.05d0
`----
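To illustrate the roles of `gamma' and `alpha', the following Python
sketch shows the textbook gradient descent update with momentum. This
is an assumption made for illustration only: the exact update rule
used by `train.x' is not specified in this document, and the function
name `online_gd_step' is hypothetical.

```python
def online_gd_step(weights, gradient, velocity, gamma, alpha):
    """One textbook momentum update (not necessarily what train.x does).

    velocity <- alpha * velocity - gamma * gradient
    weights  <- weights + velocity
    """
    new_velocity = [alpha * v - gamma * g for v, g in zip(velocity, gradient)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

In this form a larger `gamma' takes larger steps along the negative
gradient, while `alpha' carries over a fraction of the previous step,
which damps changes of direction between consecutive training samples.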
2.2 Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The L-BFGS method is implemented as a /batch/ training method, which
enables efficient parallelization of the error function evaluation.
The method is selected with the identifier `bfgs' and does not
currently offer any adjustable parameters:
,----
| METHOD
| bfgs
`----
2.3 Levenberg-Marquardt method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Levenberg-Marquardt method, which is presently available only in
serial, is selected with the identifier `lm'. The method supports a
number of parameters: `batchsize' sets the number of training points
that are used to evaluate the error function at a time. This /batch
size/ determines the computational requirements of the method, but
should be chosen as large as possible to guarantee convergence. The
`learnrate' is the initial value of the learning rate (see online
gradient descent). The parameter `iter' determines the number of
iterations per optimization step used to adjust the learning rate, and
the factor used for this adjustment is defined with `adjust'.
Finally, a convergence threshold for the error function can be
specified with `conv'.
An example definition with reasonable parameters is:
,----
| METHOD
| lm batchsize=5000 learnrate=0.1d0 iter=3 conv=0.001 adjust=5.0
`----
3 Input file template (train.in)
================================
,----
| TRAININGSET <path/to/data/file>
| TESTPERCENT <percentage>
| ITERATIONS <NI>
| MAXENERGY <emax e.g. -0.05 eV>
| SAVE_ENERGIES
|
| METHOD
| <method name> <parameters>
|
| # Examples
| #
| # (1) online steepest descent
| # METHOD
| # online_gd gamma=5.0d-7 alpha=0.25d0
| # (2) BFGS
| # METHOD
| # bfgs
| # (3) Levenberg-Marquardt
| # METHOD
| # lm batchsize=1000 learnrate=0.1 iter=1 conv=0.001 adjust=10.0
|
| NETWORKS
| # atom network hidden
| # types file-name layers nodes:activation
| <T_1> <path/to/net-1> 2 5:tanh 5:tanh
| <T_2> <path/to/net-2> 2 5:tanh 5:tanh
| ...
| <T_NT> <path/to/net-NT> 2 5:tanh 5:tanh
|
| # Example using different activation functions:
| # For details see Eq. (1) in:
| # N. Artrith and A. Urban, Comput. Mater. Sci. 114 (2016) 135-150.
| #
| # <T_1> <path/to/net-1> 2 5:linear 5:linear
| # <T_2> <path/to/net-2> 2 5:linear 5:linear
|
| # <T_1> <path/to/net-1> 2 5:tanh 5:tanh
| # <T_2> <path/to/net-2> 2 5:tanh 5:tanh
|
| # <T_1> <path/to/net-1> 2 5:sigmoid 5:sigmoid
| # <T_2> <path/to/net-2> 2 5:twist 5:twist
`----
4 Example input file (train.in)
===============================
,----
| TRAININGSET TiO2.train
| TESTPERCENT 10
| ITERATIONS 500
|
| TIMING
|
| METHOD
| lm batchsize=5000 learnrate=0.1d0 iter=3 conv=0.001 adjust=5.0
|
| NETWORKS
| ! atom network hidden
| ! types file-name layers nodes:activation
| O O.10t-10t.ann 2 10:twist 10:twist
| Ti Ti.10t-10t.ann 2 10:twist 10:twist
`----