We compare the ability of a simulated annealing program and an evolutionary algorithm to find molecules with large average molecular hyperpolarizabilities. This property is an important characteristic of nonlinear optical materials. Both optimization programs represent molecules as SMILES strings, a notation that is widely used by chemists to describe molecular structure using short ASCII strings. Our results suggest that the two approaches perform comparably and can be used to solve a variety of more realistic problems of interest to chemists and materials scientists.
Many quantum chemical programs can calculate the properties of a molecule from its structure, but the reverse question, identifying a compound that has a specific set of properties, is much more difficult because the solution space is vast and the variables are discrete rather than continuous. One way to approach this problem is to perform an elaborate trial-and-error search through an enormous collection of potential candidates. Another way is to use an optimization algorithm to search a set of potential molecules. These methods generally arrange a set of basic building blocks (atoms, chemical groups, etc.) using combinatorial methods [1][2][3], simulated annealing [4][5][6] or genetic algorithms [7][8][9].
Any optimization algorithm that searches for molecules with specific properties must represent the structure of a molecule in a form that a program can easily manipulate. SMILES (Simplified Molecular Input Line Entry System) is a simple language that describes the structure of chemical molecules using short ASCII strings. Four simple rules define a valid SMILES string [10]:
(1) Atoms are described by their standard atomic symbol. Each symbol is normally enclosed in square brackets, such as [Au] for gold. However, many of the atoms found in organic molecules (such as B, C, N, O, P, S and F) are written without brackets. Most SMILES strings omit all hydrogen atoms. The implicit number of hydrogen atoms attached to an atom is the difference between the atom’s valence and the number of bonds assigned to it.
(2) Single, double, and triple bonds are represented by the symbols ‘-’, ‘=’ and ‘#’ respectively. The atoms connected by these bonds are indicated by their adjacency. In most versions of SMILES, single bonds are omitted from the string, but for convenience, we explicitly show all bonds.
(3) Branching is specified by placing the symbols for the atoms and bonds of the side chain between parentheses. These parentheses are placed directly after the symbol for the atom in the main sequence to which the side chain is connected.
(4) Rings are represented by breaking a single bond in each ring and then labeling the two atoms connected by this bond with the same digit, placed immediately after each atom's symbol.
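As an illustration of how these four rules interact, the following minimal Python sketch counts the implicit hydrogens on each atom of a SMILES string and rejects strings that break a rule. It is only a sketch under stated assumptions, not the parser used in this work: it handles unbracketed C and O atoms only, treats every ring-closure bond as a single bond written directly after the atom symbol, and the names `VALENCE`, `BOND_ORDER` and `implicit_hydrogens` are hypothetical.

```python
# Assumed valences for the two heavy atoms considered here (hydrogen is implicit).
VALENCE = {"C": 4, "O": 2}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def implicit_hydrogens(smiles):
    """Return the implicit H count of each atom, or None if the string is invalid."""
    symbols = []      # atomic symbol of each atom, in order of appearance
    bonds = []        # total bond order used by each atom
    stack = []        # atoms at which a branch was opened (rule 3)
    ring_open = {}    # ring digit -> index of the atom that opened it (rule 4)
    prev = None       # index of the atom that the next atom attaches to
    pending = None    # explicit bond order waiting for its second atom

    for ch in smiles:
        if ch in VALENCE:                      # rule 1: an atom symbol
            symbols.append(ch)
            bonds.append(0)
            idx = len(symbols) - 1
            if prev is not None:
                order = pending if pending is not None else 1
                bonds[prev] += order
                bonds[idx] += order
            prev, pending = idx, None
        elif ch in BOND_ORDER:                 # rule 2: a bond symbol
            if prev is None or pending is not None:
                return None
            pending = BOND_ORDER[ch]
        elif ch == "(":                        # rule 3: open a branch
            if prev is None or pending is not None:
                return None
            stack.append(prev)
        elif ch == ")":                        # rule 3: close a branch
            if not stack or pending is not None:
                return None
            prev = stack.pop()
        elif ch.isdigit():                     # rule 4: ring-closure label
            if prev is None or pending is not None:
                return None
            if ch in ring_open:
                other = ring_open.pop(ch)
                if other == prev:
                    return None
                bonds[other] += 1
                bonds[prev] += 1
            else:
                ring_open[ch] = prev
        else:
            return None                        # symbol outside this sketch

    if not symbols or stack or ring_open or pending is not None:
        return None
    hydrogens = [VALENCE[s] - b for s, b in zip(symbols, bonds)]
    return hydrogens if min(hydrogens) >= 0 else None


print(implicit_hydrogens("C-C-O"))      # ethanol
print(implicit_hydrogens("C-C(=O)-O"))  # acetic acid
```

A string is accepted only if all branches and rings are closed and no atom exceeds its valence, which is the kind of validity test the optimization programs need before a candidate string is evaluated.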
In this paper, we compare how well simulated annealing and an evolutionary algorithm find molecules with large hyperpolarizabilities, 𝛽. This property determines how a molecule interacts with light; molecules with large values can dramatically modify the frequency, phase and/or polarization of light. Several theoretical and experimental studies have identified numerous organic molecules that could form the basis of nonlinear optical (NLO) materials [11,12]. Although these materials must satisfy several physical requirements (such as high thermal stability and transparency), a large hyperpolarizability is a critical one. For this reason, we want to determine which program increases 𝛽 most quickly. For testing purposes, we consider molecules that contain only carbon, oxygen and hydrogen atoms. In Section 2 we examine a basic evolutionary algorithm and in Section 3 a simulated annealing program. In Section 4 we compare the results of both approaches.
Evolutionary algorithms were introduced in 1975 by John Holland [13]. In his book “Adaptation in Natural and Artificial Systems”, Holland described how simulating biological evolution can become a general problem-solving strategy. Using these ideas, our program consists of the following basic steps:
(1) Choose an initial population of parents. In our calculations, each parent is described as a SMILES string.
(2) Apply mutation and crossover operators to the parents to generate a population of children. The choice between mutation and crossover is determined by a random number. If this number is below some fixed ratio, the seven mutation operators act on the parent, and all valid strings generated by these mutations are included as children. If the random number is above the fixed ratio, a basic “cut and splice” crossover operator [14] is employed. This method chooses two strings at random, A and B, and breaks each at a random point between a bond and an atom. This ensures that the two ends of each string have a good chance of forming a valid string. The pieces, (A1, A2) and (B1, B2), are then recombined into the two new individuals (A1, B2) and (B1, A2). All valid strings are included as children and all invalid strings are discarded.
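The cut-and-splice step can be sketched in a few lines of Python. This is an illustrative sketch rather than the program used in this work: the function names are hypothetical, and `chain_is_valid` is a deliberately crude stand-in validity test that accepts only unbranched chains of C and O atoms with explicit bonds and does not check valences.

```python
import random
import re

BONDS = "-=#"


def cut_points(s):
    """Indices at which s can be split between a bond symbol and an atom symbol."""
    return [
        i for i in range(1, len(s))
        if (s[i - 1] in BONDS and s[i].isalpha())
        or (s[i - 1].isalpha() and s[i] in BONDS)
    ]


def cut_and_splice(a, b, is_valid, rng=random):
    """Cut A into (A1, A2) and B into (B1, B2); return the valid children
    among (A1, B2) and (B1, A2)."""
    points_a, points_b = cut_points(a), cut_points(b)
    if not points_a or not points_b:
        return []  # nothing to cut, e.g. a single-atom string
    i, j = rng.choice(points_a), rng.choice(points_b)
    a1, a2 = a[:i], a[i:]
    b1, b2 = b[:j], b[j:]
    # Invalid recombinations (for example two adjacent bond symbols) are discarded.
    return [child for child in (a1 + b2, b1 + a2) if is_valid(child)]


def chain_is_valid(s):
    """Stand-in validity test (assumption): atoms and bonds must alternate."""
    return re.fullmatch(r"[CO](?:[-=#][CO])*", s) is not None


rng = random.Random(1)
print(cut_and_splice("C-C=C-O", "O=C-C", chain_is_valid, rng))
```

Restricting the cut to a bond-atom boundary means each fragment ends cleanly on either an atom or a bond, so a recombined string fails only when the two joined ends are of the same kind.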
(3) Apply a selection process to the entire population of children. Children with high fitness values survive; those with low fitness values are discarded. The survivors then become the parents of the next generation. In our program, we divide the children into four equal groups ranked by fitness. From the group with the highest fitness values, we randomly select 40% of the children to survive. From the second group, we randomly select 30% of the children; from the third group 20% of the chi
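The group-based selection described in step (3) could be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name and signature are hypothetical, any remainder from an uneven split is placed in the last group, and the survival fraction for the fourth group is not given in the text above, so the 10% used in the example is purely a placeholder.

```python
import random


def select_survivors(children, fitness, fractions, rng=random):
    """Rank children by fitness, split them into four groups and keep a
    random fraction of each group (fractions[0] applies to the fittest group)."""
    ranked = sorted(children, key=fitness, reverse=True)
    size = len(ranked) // 4
    survivors = []
    for g, fraction in enumerate(fractions):
        # The last group absorbs any remainder when the split is uneven.
        group = ranked[g * size:(g + 1) * size] if g < 3 else ranked[3 * size:]
        survivors.extend(rng.sample(group, round(fraction * len(group))))
    return survivors


# Toy example: the "children" are integers and fitness is the value itself.
# The final 0.1 is an assumed placeholder, not a value taken from the text.
survivors = select_survivors(list(range(40)), lambda x: x,
                             (0.4, 0.3, 0.2, 0.1), random.Random(0))
print(len(survivors))
```

Sampling at random within each group, rather than keeping only the top-ranked children, lets some lower-fitness strings survive and so preserves diversity in the next generation.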