Learning Fitness Functions for Machine Programming

Shantanu Mandal 1, Todd Anderson 2, Javier Turek 2, Justin Gottschlich 2, Shengtian Zhou 2, Abdullah Muzahid 1

ABSTRACT

The problem of automatic software generation has been referred to as machine programming. In this work, we propose a framework based on genetic algorithms to help make progress in this domain. Although genetic algorithms (GAs) have been successfully used for many problems, one criticism is that hand-crafting a GA's fitness function, the test that aims to effectively guide its evolution, can be notably challenging. Our framework presents a novel approach to learning the fitness function using neural networks to predict the values of ideal fitness functions. We also augment the evolutionary process with a minimally intrusive search heuristic. This heuristic improves the framework's ability to discover correct programs from ones that are approximately correct, and does so with negligible computational overhead. We compare our approach with several state-of-the-art program synthesis methods and demonstrate that it finds more correct programs with fewer candidate program generations.

1 INTRODUCTION

In recent years, there has been notable progress in the space of automatic software generation, also known as machine programming (MP) (Gottschlich et al., 2018; Ratner et al., 2019). An MP system produces as output a program that satisfies some input specification, often given in the form of input-output examples. Previous approaches to this problem have ranged from formal program synthesis (Alur et al., 2015; Gulwani et al., 2012) to machine learning (ML) (Balog et al., 2017a; Devlin et al., 2017; Reed & de Freitas, 2016; Zohar & Wolf, 2018), as well as their combinations (Feng et al., 2018).
Genetic algorithms (GAs) have also been shown to hold significant promise for MP (Becker & Gottschlich, 2017; Brameier, 2007; Langdon & Poli, 2010; Perkis, 1994). GAs are simple and intuitive and demonstrate competitive performance in many challenging domains (Korns, 2011; Real et al., 2018; Such et al., 2017). Therefore, in this paper we focus on GAs, and more specifically on a fundamental aspect of GAs in the context of MP.

A genetic algorithm (GA) is a machine learning technique that attempts to solve a problem from a pool of candidate solutions. These generated candidates are iteratively evolved, mutated, and selected for survival based on a grading criterion called the fitness function. Fitness functions are usually hand-crafted heuristics that grade the approximate correctness of candidate solutions, such that those closer to being correct are more likely to appear in subsequent generations. In the context of MP, candidate solutions are programs, initially random but evolving over time toward a program satisfying the input specification.

Yet, to guide that evolution, it is particularly difficult to design an effective fitness function for a GA-based MP system. The fitness function is given a candidate program and the input specification (e.g., input-output examples) and, from those, must estimate how close that candidate program is to satisfying the specification. However, a program containing even a single mistake may produce output that in no obvious way resembles the correct output.

1 Department of Computer Science and Engineering, Texas A&M University; 2 Intel Labs. Correspondence to: Abdullah Muzahid.
That is why one of the most frequently used fitness functions in this domain, edit distance between outputs (Becker & Gottschlich, 2017; Brameier, 2007; Langdon & Poli, 2010; Perkis, 1994), will in many cases give wildly wrong estimates of candidate program correctness. Thus, designing effective fitness functions for MP is difficult.

Designing simple and effective fitness functions is a unique challenge for GAs. Despite many successful applications of GAs, automating the generation of such fitness functions remains an open challenge. An impediment to this goal is that fitness function complexity tends to increase proportionally with the complexity of the problem being solved, with MP being particularly complex. In this paper, we explore an approach to automatically generate these fitness functions by representing their structure with a neural network. While we investigate this technique in the context of MP, we believe the technique to be applicable and generalizable to other domains. We make the following technical contributions:

• Fitness Function: Our fundamental contribution is the automation of fitness functions for genetic algorithms. We propose to do so by casting fitness function generation as a big data learning problem. To the best of our knowledge, our work is the first of its kind to use a neural network as a genetic algorithm's fitness function for the purpose of MP.

• Convergence: A secondary contribution is our use of local neighborhood search to improve the convergence of approximately correct candidate solutions. We demonstrate its efficacy empirically.

• Generality: We demonstrate that our approach can support different neural network fitness functions uniformly. We develop a neural network model to predict the fitness score based on the given specification and program trace.
• Metric: We contribute a new metric suitable for the MP domain. The metric, "search space" size (i.e., how many candidate programs have been searched), is an alternative to program generation time, and is designed to emphasize the algorithmic efficiency, as opposed to the implementation efficiency, of an MP approach.

2 RELATED WORK

Machine programming can be achieved in many ways. One way is formal program synthesis, a technique that uses formal methods and rules to generate programs (Manna & Waldinger, 1975). Formal program synthesis usually guarantees some program properties by evaluating a generated program's semantics against a corresponding specification (Alur et al., 2015; Gulwani et al., 2012). Although useful, such formal synthesis techniques can often be limited by exponentially increasing computational overhead that grows with the program's instruction size (Bodík & Jobstmann, 2013; Cheung et al., 2012; Heule et al., 2016; Loncaric et al., 2018; Solar-Lezama et al., 2006).

An alternative to formal methods for MP is machine learning (ML). Machine learning differs from traditional formal program synthesis in that it generally does not provide correctness guarantees. Instead, ML-driven MP approaches are usually only probabilistically correct, i.e., their results are derived from sample data relying on statistical significance (Murphy, 2012). Such ML approaches tend to explore software program generation using an objective function. Objective functions are used to guide an ML system's exploration of a problem space to find a solution.

More recently, there has been a surge of research exploring ML-based MP using neural networks (NNs). For example, in (Balog et al., 2017b), the authors train a neural network with input-output examples to predict the probabilities of the functions that are most likely to be used in a program. Raychev et al. (Raychev et al.
, 2014) take a different approach and use an n-gram model to predict the functions that are most likely to complete a partially constructed program. RobustFill (Devlin et al., 2017) encodes input-output examples using a series of recurrent neural networks (RNNs) and generates the program using another RNN, one token at a time. Bunel et al. (Bunel et al., 2018) explore a unique approach that combines reinforcement learning (RL) with a supervised model to find semantically correct programs. These are only a few of the works in the MP space using neural networks (Cai et al., 2017; Chen et al., 2018; Reed & de Freitas, 2016).

Significant research has been done in the field of genetic programming (Brameier, 2007; Langdon & Poli, 2010; Perkis, 1994), whose goal is to find a solution in the form of a complete or partial program for a given specification. Prior work in this field has tended to focus on either the representation of programs or the operators used during the evolution process. Real et al. (Real et al., 2019) recently demonstrated that genetic algorithms can generate accurate image classifiers. Their approach produced a state-of-the-art classifier for the CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009) datasets. Moreover, genetic algorithms have been exploited to successfully automate the neural architecture optimization process (Labs; Liu et al., 2017; Real et al., 2020; Salimans et al., 2017; Such et al., 2017). Even with this notable progress, genetic algorithms can be challenging to use due to the complexity of hand-crafting the fitness functions that guide the search. We claim that our proposed approach is the first of its kind to automate the generation of fitness functions.

3 BACKGROUND

Let S_t = {(I_j, O_j^t)}_{j=1}^m be a set of m input-output pairs, such that the output O_j^t is obtained by executing the program P_t on the input I_j.
Inherently, the set S_t of input-output examples describes the behavior of the program P_t. One would like to synthesize a program P_t' that recovers the same functionality as P_t. However, P_t is usually unknown, and we are left with the set S_t, which was obtained by running P_t. Based on this assumption, we define equivalency between two programs as follows:

Definition 3.1 (Program Equivalency). Programs P_a and P_b are equivalent under the set S = {(I_j, O_j)}_{j=1}^m of input-output examples if and only if P_a(I_j) = P_b(I_j) = O_j, for 1 ≤ j ≤ m. We denote the equivalency by P_a ≡_S P_b.

Definition 3.1 suggests that to obtain a program equivalent to P_t, we need to synthesize a program that is consistent with the set S_t. Therefore, our goal is to find a program P_t' that is equivalent to the target program P_t (which was used to generate S_t), i.e., P_t' ≡_{S_t} P_t. This task is known as Inductive Program Synthesis (IPS).

Figure 1. Overview of NetSyn. Phase 1 automates the fitness function generation by training a neural network on a corpus of example programs and their inputs and outputs. Phase 2 finds the target program for a given input-output example using the trained neural network as a fitness function in a genetic algorithm.
As suggested by (Balog et al., 2017b), a machine learning based solution to the IPS problem requires the definition of several components. First, we need a programming language that defines the domain of valid programs. Second, we need a method to search over the program domain. The search method sweeps over the program domain to find a P_t' that satisfies the equivalency property. Optionally, we may want to define a ranking function to rank all the solutions found and choose the best ones. Last, as we plan to base our solution on machine learning techniques, we will need data to train models.

4 NETSYN

Here, we describe our solution to IPS in more detail, including the choices and novelties for each of the proposed components. We name our solution NetSyn, as it is based on neural networks for program synthesis.

4.1 Domain Specific Language

As NetSyn's programming language, we choose a domain specific language (DSL) constructed specifically for it. This choice allows us to constrain the program space by restricting the operations used by our solution. NetSyn's DSL follows DeepCoder's DSL (Balog et al., 2017b), which was inspired by SQL and LINQ (Dinesh et al., 2007). The only data types in the language are (i) integers and (ii) lists of integers. The DSL contains 41 functions, each taking one or two arguments and returning one output. Many of these functions are operations for list manipulation. Likewise, some operations also require lambda functions. There is no explicit control flow (conditionals or looping) in the DSL. However, several of the operations are high-level functions implemented using such control flow structures. A full description of the DSL can be found in the supplementary material.

With these data types and operations, we define a program P as a sequence of functions. Table 1 presents an example program of 4 instructions with an input and its respective output.
Arguments to functions are not specified via named variables. Instead, each function uses the output of the most recently executed function that produces the type of output it requires as input. The first function of each program uses the provided input I. If I has a type mismatch, default values are used (i.e., 0 for integers and an empty list for a list of integers). The final output of a program is the output of its last function.

Table 1. An example program of length 4 with an input and corresponding output.

    Program: [int] → FILTER(>0) → MAP(*2) → SORT → REVERSE
    Input:   [-2, 10, 3, -4, 5, 2]
    Output:  [20, 10, 6, 4]

As a whole, NetSyn's DSL is novel and amenable to genetic algorithms. The language is defined such that all possible programs are valid by construction. This makes the whole program space valid, which is important for facilitating the search for programs by any learning method. In particular, this is very useful in the evolutionary process of a genetic algorithm: when genetic crossover occurs between two programs, or mutation occurs within a single program, the resulting program will always be valid. This eliminates the need for pruning to identify valid programs.

4.2 Search Process

NetSyn synthesizes a program by searching the program space with a genetic algorithm-based method (Thomas, 2009). It does this by creating a population of random genes (i.e., candidate programs) of a given length L and uses a learned neural network-based fitness function (NN-FF) to estimate the fitness of each gene. Higher graded genes are preferentially selected for crossover and mutation to produce the next generation of genes. In general, NetSyn uses this process to evolve the genes from one generation to the next until it discovers a correct candidate program, as verified by the input-output examples.
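To make the DSL's implicit argument passing concrete, the following is a minimal sketch, not NetSyn's actual interpreter, of the four functions used in Table 1; the function names and the `run` helper are illustrative:

```python
# Minimal sketch of a list-DSL interpreter for the Table 1 example.
# Each function consumes the previous function's output (implicit arguments).

def filter_gt0(xs):
    return [x for x in xs if x > 0]

def map_x2(xs):
    return [x * 2 for x in xs]

def sort_asc(xs):
    return sorted(xs)

def reverse(xs):
    return list(reversed(xs))

def run(program, value):
    """Execute a straight-line program: each step feeds the next."""
    for fn in program:
        value = fn(value)
    return value

program = [filter_gt0, map_x2, sort_asc, reverse]
print(run(program, [-2, 10, 3, -4, 5, 2]))  # -> [20, 10, 6, 4]
```

Tracing the steps: FILTER(>0) yields [10, 3, 5, 2], MAP(*2) yields [20, 6, 10, 4], SORT yields [4, 6, 10, 20], and REVERSE yields the final output [20, 10, 6, 4], matching Table 1.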
From time to time, NetSyn takes the top N scoring genes from the population, determines their neighborhoods, and looks for the target program using a local proximity search. If a correct program is not found within the neighborhoods, the evolutionary process resumes. Figure 1 summarizes NetSyn's search process.

We use a value encoding approach for each gene. A gene ζ is represented as a sequence of values from Σ_DSL, the set of functions. Formally, a gene ζ = (f_1, ..., f_i, ..., f_L), where f_i ∈ Σ_DSL. Practically, each f_i contains an identifier (or index) corresponding to one of the DSL functions. The encoding scheme is a one-to-one match between programs and genes.

The search process begins with a set Φ_0 of |Φ_0| = T randomly generated programs. If a program equivalent to the target program P_t is found, the search process stops. Otherwise, the genes are ranked using the learned NN-FF. A small percentage (e.g., 20%) of the top graded genes in Φ_j are passed unmodified to the next generation Φ_{j+1} for the next evolutionary phase. This guarantees that some of the top graded genes are identically preserved, aiding forward progress guarantees. The remaining genes of the new generation Φ_{j+1} are created through crossover or mutation with some probability. For crossover, two genes from Φ_j are selected using the Roulette Wheel algorithm, with the crossover point selected randomly (Goldberg, 1989). For mutation, one gene is Roulette Wheel selected, and the mutation point k in that gene is selected based on the same learned NN-FF. The selected value z_k is mutated to some other random value z' such that z' ∈ Σ_DSL and z' ≠ z_k.

Crossovers and mutations can occasionally lead to a new gene with dead code. To address this issue, we eliminate dead code.
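The evolution step described above can be sketched as follows. This is a simplified illustration, not NetSyn's implementation: genes are lists of function identifiers, the fitness scores are assumed to be supplied externally (by the learned NN-FF in NetSyn), and the NN-FF-guided choice of mutation point is replaced here by a uniformly random one.

```python
import random

def roulette_select(pop, scores):
    """Fitness-proportionate (Roulette Wheel) selection."""
    total = sum(scores)
    if total == 0:                       # degenerate case: uniform choice
        return random.choice(pop)
    r = random.uniform(0, total)
    acc = 0.0
    for gene, s in zip(pop, scores):
        acc += s
        if acc >= r:
            return gene
    return pop[-1]

def crossover(a, b):
    """Single-point crossover at a random cut point."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(gene, alphabet):
    """Replace one position with a different random function identifier."""
    k = random.randrange(len(gene))
    z = random.choice([op for op in alphabet if op != gene[k]])
    return gene[:k] + [z] + gene[k + 1:]

def next_generation(pop, scores, alphabet, elite_frac=0.2, p_cross=0.5):
    """Elitism plus crossover/mutation, as in the search process above."""
    ranked = [g for _, g in sorted(zip(scores, pop), key=lambda t: -t[0])]
    n_elite = max(1, int(elite_frac * len(pop)))
    new_pop = ranked[:n_elite]           # top genes pass through unchanged
    while len(new_pop) < len(pop):
        if random.random() < p_cross:
            child = crossover(roulette_select(pop, scores),
                              roulette_select(pop, scores))
        else:
            child = mutate(roulette_select(pop, scores), alphabet)
        new_pop.append(child)
    return new_pop
```

Because every list of identifiers is a valid program by construction, neither crossover nor mutation can produce an invalid candidate, which is the property the DSL design exploits.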
Dead code elimination (DCE) is a classic compiler technique that removes code which has no effect on the program's output (Debray et al., 2000). Dead code is possible in our list DSL if the output of a statement is never used. We implemented DCE in NetSyn by tracking the input/output dependencies between statements and eliminating those statements whose outputs are never used. NetSyn uses DCE during candidate program generation and during crossover/mutation to ensure that the effective length of the program is not less than the target program length due to the presence of dead code. If dead code is present, we repeat crossover and mutation until a gene without dead code is produced.

4.2.1 Learning the Fitness Function

Evolving the population of genes in a genetic algorithm requires a fitness function to rank the fitness (quality) of genes for the problem being solved. Ideally, a fitness function should measure how close a gene is to the solution; namely, it should measure how close a candidate program is to an equivalent of P_t under S_t. A good fitness function is of great importance for reducing the number of steps needed to reach the solution and for directing the algorithm in the right direction, so that the genetic algorithm is more likely to find P_t.

Intuition: A fitness function is often handcrafted to approximate some ideal function that is impossible (due to incomplete knowledge about the solution) or too computationally intensive to implement in practice. For example, if we knew P_t beforehand, we could have designed an ideal fitness function that compares a candidate program with P_t and calculates some metric of closeness (e.g., edit distance, the number of common functions, etc.) as the fitness score. Since we do not know P_t, we cannot implement the ideal fitness function.
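A minimal sketch of the dependency-tracking idea behind DCE follows. It is purely illustrative: NetSyn's DSL determines each statement's consumer by type, whereas here each instruction's data dependency is given explicitly via a hypothetical `reads` array.

```python
def eliminate_dead_code(program, reads):
    """Sketch of DCE for a straight-line program.

    `program` is a list of instruction names; `reads[i]` is the index of the
    earlier instruction whose output instruction i consumes (-1 means the
    program input).  An instruction is live iff the final instruction
    transitively depends on it."""
    live = set()
    stack = [len(program) - 1]           # the last instruction defines the output
    while stack:
        i = stack.pop()
        if i >= 0 and i not in live:
            live.add(i)
            stack.append(reads[i])       # walk the dependency chain backwards
    return [ins for i, ins in enumerate(program) if i in live]

# "SUM" produces a value no later instruction reads, so it is dead.
print(eliminate_dead_code(["FILTER(>0)", "SUM", "MAP(*2)", "REVERSE"],
                          [-1, 0, 0, 2]))  # -> ['FILTER(>0)', 'MAP(*2)', 'REVERSE']
```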
Instead, in this work, we propose to approximate the ideal fitness function by learning it from training data (generated from a number of known programs). For this purpose, we use a neural network model, trained with the goal of predicting the values of an ideal fitness function. We call such an ideal fitness function (one that would always give the correct answer with respect to the actual solution) the oracle fitness function, as it is impossible to achieve in practice merely by examining input-output examples. Our models will not be able to reach the 100% accuracy of the oracle, but they will still have sufficiently high accuracy to allow the genetic algorithm to make forward progress. We also note that the trained model needs to generalize to predict for any unavailable solution, not for a single specific target case.

We follow ideas from works that have explored the automation of fitness functions using neural networks to approximate a known mathematical model. For example, Matos Dias et al. (Matos Dias et al., 2014) automated them for IMRT beam angle optimization, while Khuntia et al. (Khuntia et al., 2005) used them for rectangular microstrip antenna design automation. In contrast, our work is fundamentally different in that we use a large corpus of program metadata to train our models to predict how close a given, incorrect solution may be to an unknown correct solution (one that will generate the correct output). In other words, we propose to automate the generation of fitness functions using big data learning. To the best of our knowledge, NetSyn is the first proposal for the automation of fitness functions in genetic algorithms. In this paper, we demonstrate this idea using MP as the use case.

Given the input-output samples S_t = {(I_j, O_j^t)}_j of the target program P_t and an ideal fitness function fit(·), we would like a model that predicts the fitness value fit(ζ, P_t) for a gene ζ. In practice, our model predicts the values of fit(·) from the input-output samples in S_t and from execution traces of the program P_ζ (corresponding to ζ) run with those inputs. Intuitively, execution traces provide insight into whether the program P_ζ is on the right track.

In NetSyn, we use a neural network to model the fitness function, referred to as NN-FF. This task requires us to generate a training dataset of programs with respective input-output samples. To train the NN-FF, we randomly generate a set of example programs, E = {P_j^e}, along with a set of random inputs I_j = {I_ji^e} per program P_j^e. We then execute each program P_j^e in E with its corresponding input set I_j to calculate the output set O_j. Additionally, for each P_j^e in E, we randomly generate another program P_j^r = (f_j1^r, f_j2^r, ..., f_jn^r), where each f_jk^r is a function from the DSL, i.e., f_jk^r ∈ Σ_DSL.

Figure 2. Neural network fitness function for (a) a single IO example and (b) multiple IO examples. In each figure, layers of LSTM encoders combine multiple inputs into hidden vectors for the next layer. The final fitness score is produced by the fully connected layers.
We apply the previously generated input I_ji^e to P_j^r to obtain an execution trace T_ji^r = (t_ji1^r, t_ji2^r, ..., t_jin^r), where t_jik^r = f_jk^r(t_ji(k-1)^r), with t_ji1^r = f_j1^r(I_ji^e) and t_jin^r = f_jn^r(t_ji(n-1)^r) = P_j^r(I_ji^e). Thus, the input set I_j = {I_ji^e} of the program P_j^e produces a set of traces T_j = {T_ji^r} from the program P_j^r. We then compare the programs P_j^r and P_j^e to calculate the fitness value, and use it as an example to train the neural network.

In NetSyn, the inputs of the NN-FF consist of input-output examples, generated programs, and their execution traces. Let us consider the case of a single input-output example, (I_ji^e, O_ji^e). Let us assume that P_j^e is the target program that NetSyn attempts to generate and, in the process, it generates P_j^r as a potential equivalent. The NN-FF uses (I_ji^e, O_ji^e) and {(f_jk^r, t_jik^r)} as its inputs. Each of (I_ji^e, O_ji^e) and t_jik^r is passed through an embedding layer followed by an LSTM encoder; f_jk^r is passed as a one-hot-encoded vector. Figure 2 (a) shows the NN-FF architecture for a single input-output example. Two layers of LSTM encoders combine the vectors to produce a single vector, H_ji, which is then processed through fully connected layers to predict the fitness value. To handle a set of input-output examples, {(I_ji^e, O_ji^e)}, a set of execution traces, T_j = {T_ji^r}, is collected from the single generated program P_j^r. Each input-output example, along with the corresponding execution trace, produces a single vector H_ji. An LSTM encoder combines these vectors into a single vector, which is then processed by fully connected layers to predict the fitness value (Figure 2 (b)).

Example: To illustrate, suppose the program in Table 1 is in E.
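As a sketch (with illustrative helper names, not NetSyn's code), collecting an execution trace amounts to recording every intermediate value t_k = f_k(t_{k-1}) while running the generated program:

```python
# Illustrative DSL functions for the example that follows in the text.
def filter_gt0(xs):
    return [x for x in xs if x > 0]

def map_x2(xs):
    return [x * 2 for x in xs]

def reverse(xs):
    return list(reversed(xs))

def drop2(xs):
    return xs[2:]            # DROP(2): discard the first two elements

def execution_trace(program, program_input):
    """Run a straight-line program and record every intermediate value:
    t_1 = f_1(I), t_k = f_k(t_{k-1}).  The trace, paired with an
    input-output example, forms one NN-FF training input."""
    trace, value = [], program_input
    for fn in program:
        value = fn(value)
        trace.append(value)
    return trace

trace = execution_trace([filter_gt0, map_x2, reverse, drop2],
                        [-2, 10, 3, -4, 5, 2])
print(trace)  # -> [[10, 3, 5, 2], [20, 6, 10, 4], [4, 10, 6, 20], [6, 20]]
```

The printed trace is exactly the one used in the worked example below.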
Let us assume that P_j^r is another program: {[INT], FILTER(>0), MAP(*2), REVERSE, DROP(2)}. If we use the input in Table 1 (i.e., [-2, 10, 3, -4, 5, 2]) with P_j^r, the execution trace is {[10, 3, 5, 2], [20, 6, 10, 4], [4, 10, 6, 20], [6, 20]}. So, the input of the NN-FF is {[-2, 10, 3, -4, 5, 2], [20, 10, 6, 4], Filter_v, [10, 3, 5, 2], Map_v, [20, 6, 10, 4], Reverse_v, [4, 10, 6, 20], Drop_v, [6, 20]}, where f_v indicates the value corresponding to the function f.

There are different ways to quantify how close two programs are to one another. Each method has an associated metric and ideal fitness value. We investigated three such metrics, common functions, longest common subsequence, and function probability, which we use as the expected predicted output of the NN-FF.

Common Functions: NetSyn can use the number of common functions (CF) between P_ζ and P_t as a fitness value for ζ. In other words, the fitness value of ζ is f_CF(ζ) = |elems(P_ζ) ∩ elems(P_t)|. For the earlier example, f_CF is 3. Since the output of the neural network will be an integer from 0 to len(P_t), the neural network can be designed as a multiclass classifier with a softmax layer as the final layer.

Longest Common Subsequence: As an alternative to CF, we can use the longest common subsequence (LCS) between P_ζ and P_t. The fitness score of ζ is f_LCS(ζ) = len(LCS(P_ζ, P_t)). Similar to CF, training data can be constructed from E and fed into a neural network-based multiclass classifier. For the earlier example, f_LCS is 2.

Function Probability: The work of (Balog et al., 2017b) proposed a probability map for the functions in the DSL. Let us assume that the probability map p is defined as the probability of each DSL operation being in P_t given the input-output samples. Namely, p = (p_1, ..., p_k, ..., p_|Σ_DSL|) such that p_k = Prob(op_k ∈ elems(P_t) | {(I_j, O_j^t)}_{j=1}^m), where op_k is the k-th operation in the DSL. Then, a multiclass, multilabel neural network classifier with sigmoid activation functions at the output of the last layer can be used to predict the probability map. Training data for this neural network can be constructed using E. We can use the probability map to calculate the fitness score of ζ as f_FP(ζ) = Σ_{k: op_k ∈ elems(P_ζ)} p_k. NetSyn also uses the probability map to guide the mutation process. For example, instead of mutating a function z_k with a z' selected at random, NetSyn can select z' using the Roulette Wheel algorithm with the probability map.

4.2.2 Local Neighborhood Search

Neighborhood search (NS) checks candidate genes in the neighborhood of the N top scoring genes from the genetic algorithm. The intuition behind NS is that if the target program P_t is in that neighborhood, NetSyn may be able to find it without relying on the genetic algorithm, which would likely result in a faster synthesis time.

Let us assume that NetSyn has completed l generations. Let μ_{l-w+1,l} denote the average fitness score of the genes over the last w generations (i.e., from generation l-w+1 to l), and let μ_{1,l-w} denote the average fitness score before the last w generations (i.e., from generation 1 to l-w). Here, w is a sliding window. NetSyn invokes NS if μ_{l-w+1,l} ≤ μ_{1,l-w}.

Figure 3. Examples of neighborhoods using the (a) BFS- and (b) DFS-based approach. Each neighborhood constructs a set of close-by genes by systematically changing one function at a time.
The rationale is that under these conditions, the search procedure has not produced improved genes for the last w generations (i.e., it is saturating). Therefore, NetSyn should check whether the neighborhood contains a program equivalent to P_t.

Algorithm 1: Defines and searches a neighborhood based on the BFS principle.
  Input: a set G of the top N scoring genes
  Output: P_t', if found; Not found otherwise
  1: for each ζ ∈ G do
  2:   NH ← ∅
  3:   for i ← 1 to len(ζ) do
  4:     for j ← 1 to |Σ_DSL| do
  5:       ζ_n ← ζ with ζ_i replaced by op_j such that ζ_i ≠ op_j
  6:       NH ← NH ∪ {ζ_n}
  7:   if there is P_t' ∈ NH such that P_t' ≡_{S_t} P_t then
  8:     return P_t'
  9: return Not found

Neighborhood Definition: Algorithm 1 shows how to define and search a neighborhood. The algorithm is inspired by the breadth first search (BFS) method. For each top scoring gene ζ, NetSyn considers one function at a time, starting from the first operation of the gene and moving to the last. Each selected operation is replaced with every other operation from Σ_DSL, and the resultant genes are inserted into the neighborhood set NH. If a program P_t' equivalent to P_t is found in NH, NetSyn stops and returns the solution. Otherwise, it continues the search and, if unsuccessful, returns to the genetic algorithm. The complexity of the search is O(N · len(ζ) · |Σ_DSL|), which is significantly smaller than the exponential search space of a traditional BFS algorithm.

Similar to BFS, NetSyn can define and search the neighborhood using an approach similar to depth first search (DFS). It proceeds as in Algorithm 1, except that i tracks depth: after the loop in line 4 finishes, NetSyn picks the best scoring gene from NH to replace ζ before going to the next level of depth. The algorithmic complexity remains the same. Figure 3 (a) and (b) show examples of neighborhoods using the BFS- and DFS-based approaches.
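Algorithm 1's BFS-style neighborhood can be sketched as below. This is illustrative: the `is_solution` predicate stands in for the equivalence check against the input-output examples.

```python
def bfs_neighborhood(gene, alphabet):
    """All genes that differ from `gene` in exactly one position
    (Algorithm 1's neighborhood set NH, before the equivalence check)."""
    hood = []
    for i in range(len(gene)):
        for op in alphabet:
            if op != gene[i]:
                hood.append(gene[:i] + [op] + gene[i + 1:])
    return hood

def neighborhood_search(top_genes, alphabet, is_solution):
    """Check each top gene's neighborhood; return a solution if one exists."""
    for gene in top_genes:
        for candidate in bfs_neighborhood(gene, alphabet):
            if is_solution(candidate):
                return candidate
    return None

alphabet = list(range(4))
# A gene one mutation away from the "target" [0, 3, 2] is found immediately.
print(neighborhood_search([[0, 1, 2]], alphabet,
                          lambda g: g == [0, 3, 2]))  # -> [0, 3, 2]
```

Each gene of length L yields L · (|Σ_DSL| − 1) neighbors, matching the O(N · len(ζ) · |Σ_DSL|) bound stated above.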
5 EXPERIMENTAL RESULTS

We implemented NetSyn in Python with a TensorFlow backend (Abadi et al., 2015). We developed an interpreter for NetSyn's DSL to evaluate the generated programs. We used 4,200,000 randomly generated unique example programs of length 5 to train the neural networks, with 5 input-output examples per program to generate the training data. To allow our model to predict equally well across all possible CF/LCS values, we generated these programs such that each of the 0-5 possible CF/LCS values for length-5 programs is equally represented in the dataset. To test NetSyn, we randomly generated a total of 100 programs for each program length from 5 to 10. For each program length, 50 of the generated programs produce a singleton integer as the output; the rest produce a list of integers. We therefore refer to the first 50 programs as singleton programs and the rest as list programs. We collected m = 5 input-output examples for each testing program. When synthesizing a program using NetSyn, we execute it K = 10 times and average the results to eliminate noise.

5.1 Demonstration of Synthesis Ability

We ran three variants of NetSyn: NetSyn_CF, NetSyn_LCS, and NetSyn_FP, predicting the f_CF, f_LCS, and f_FP fitness functions, respectively. Each used NS_BFS and the FP-based mutation operation. We ran the publicly available best performing implementations of DeepCoder (Balog et al., 2017b), PCCoder (Zohar & Wolf, 2018), and RobustFill (Devlin et al., 2017). We also implemented a genetic programming-based approach, PushGP (Perkis, 1994). For comparison, we also tested two other fitness functions: 1) edit distance between outputs (f_Edit), and 2) the oracle (f_Oracle). For every approach, we set the maximum search space size to 3,000,000 candidate programs.
If an approach does not find a solution before reaching that threshold, we conclude the experiment and mark it as "solution not found".

Figure 4 (a)-(c) show comparative results using the proposed metric: search space used. For each test program, we count the number of candidate programs searched before the experiment concludes, either by finding a correct program or by exceeding the threshold. The number of candidate programs searched is expressed as a percentage of the maximum search space threshold (i.e., 3,000,000) and shown on the y-axis. We sort the programs by the time taken to synthesize them; a position N on the x-axis corresponds to the program synthesized in the Nth longest percentile time of all the programs. Lines terminate at the point at which the approach fails to synthesize the corresponding program. For all approaches except f_Edit-based NetSyn and PushGP, up to 30% of the programs can be synthesized by searching less than 2% of the maximum search space. Search space use increases as an approach tries to synthesize more programs. In general, DeepCoder, PCCoder, and RobustFill search more candidate programs than f_CF-, f_LCS-, or f_FP-based NetSyn. For example, when synthesizing programs of length 5, DeepCoder, PCCoder, and RobustFill use 37%, 33%, and 47% of the search space to synthesize 40%, 50%, and 60% of the programs, respectively. In comparison, NetSyn can synthesize upwards of 90% of the programs using less than 60% of the search space. NetSyn synthesizes programs at percentages ranging from 65% (for NetSyn_FP on length-10 programs) to as high as 97% (for NetSyn_LCS on length-5 programs). In other words, NetSyn is more efficient at generating and searching likely target programs. Even for length-10 programs, NetSyn can generate 65% of the programs using less than 45% of the maximum search space. In contrast, DeepCoder, PCCoder, and RobustFill cannot synthesize more than 60% of the programs even when they use the maximum search space.
PushGP and the edit distance-based approach always use more search space than f_CF- or f_LCS-based NetSyn.

Figure 4 (d)-(f) show the distribution of synthesis rate (i.e., the percentage of K = 10 runs that synthesize a particular program) as violin plots. A violin plot shows the interquartile range (i.e., the middle 50% of the data) as a vertical black bar with the median as a white dot; a wider section of the plot indicates more data points in that region. For length-5 programs, NetSyn has a high synthesis rate (close to 100%) for almost every program, as indicated by a single wide section. On the other hand, DeepCoder, PCCoder, RobustFill, and PushGP have bimodal distributions, as indicated by two wide sections. At higher lengths, NetSyn synthesizes around 65% to 75% of the programs, and the distribution therefore becomes bimodal with two wide sections. However, the section at the top is wider, indicating that NetSyn maintains a high synthesis rate for the successful cases. DeepCoder, PCCoder, RobustFill, and PushGP have more unsuccessful cases than successful ones; for the successful cases, however, these approaches also have high synthesis rates.

Figure 4 (g)-(i) show comparative results using synthesis time as the metric. In general, DeepCoder, PCCoder, RobustFill, and NetSyn can synthesize up to 20% of the programs within a few seconds for all program lengths we tested. As expected, synthesis time increases as an approach attempts to synthesize more difficult programs. DeepCoder, PCCoder, and RobustFill usually find solutions faster than NetSyn. It should be noted that the goal of NetSyn is to synthesize a program with as few tries as possible. Therefore, the implementation of NetSyn is not streamlined to take advantage of parallelization and performance-enhancement techniques such as GPUs, hardware accelerators, data-parallel models, etc. The synthesis time tends to increase for longer programs.

Figure 4. NetSyn's synthesis ability with respect to different fitness functions and schemes. Panels (a)-(c) show search space used, (d)-(f) show synthesis rate distributions, and (g)-(i) show synthesis time, for program lengths 5, 7, and 10. When limited by a maximum search space, NetSyn synthesizes more programs than DeepCoder, PCCoder, RobustFill, and PushGP. Moreover, for each program, NetSyn synthesizes in a higher percentage of runs than the other approaches.

5.2 Characterization of NetSyn

Next, we characterize the effect of different components of NetSyn. The results in this section are based on programs of length 5; however, we found our general observations to hold for longer programs as well.

Table 2 shows how many unique programs of length 5 (out of a total of 100 programs) the different approaches were able to synthesize. It also shows the average generations and synthesis rate for each program.

Table 2. Programs synthesized for different settings of NetSyn. GA stands for genetic algorithm.

Approach                            Programs      Avg         Avg Syn.
                                    Synthesized   Generation  Rate (%)
GA + f_CF                           92            3273        74
GA + f_CF + NS_BFS                  94            2953        77
GA + f_CF + NS_DFS                  94            3026        76
GA + f_CF + Mutation_FP             93            2726        83
GA + f_CF + NS_BFS + Mutation_FP    94            2275        85

NetSyn synthesized the most programs in the lowest number of generations and at the highest rate of synthesis when both NS and the improved mutation based on function probability (Mutation_FP) were used in addition to the NN-FF. We note that BFS-based NS performs slightly better than DFS-based NS. Moreover, Mutation_FP has some measurable impact on NetSyn. Figure 5 (a)-(c) show the synthesis percentage for different programs and fitness functions. Programs 1 to 50 are singleton programs and have a lower synthesis percentage for all three fitness function choices. In particular, the f_FP-based approach has a low synthesis percentage for singleton programs. Functions 1 to 12 produce a singleton integer and tend to cause a lower synthesis percentage for any program that contains them. This implies that singleton programs are relatively harder to synthesize. To shed more light on this issue, Figure 6 shows the synthesis percentage across different functions. The synthesis percentage for every function is at least 40% for the f_CF-based approach, whereas for the f_FP-based approach, four functions cannot be synthesized at all. Details of the functions are in the appendix.

Figure 5. NetSyn's synthesis ability with respect to fitness functions and DSL function types, for (a) NetSyn_CF, (b) NetSyn_LCS, and (c) NetSyn_FP. Programs producing a single integer output are harder to synthesize in all three variants of NetSyn.

Figure 6. Synthesis percentage across different functions for (a) CF and (b) FP. Functions 1 to 12 tend to have a lower synthesis rate because they produce a single integer output. Moreover, f_CF has a higher synthesis rate.

5.3 Characterization of Neural Networks

Figure 7 (a), (b), and (c) show the prediction ability of our proposed neural network fitness functions on validation data. Figure 7 (a) & (b) show the confusion matrices for the f_CF and f_LCS neural network fitness functions. The confusion matrix is a two-dimensional matrix whose (i, j) entry indicates the probability of predicting the value i when the actual value is j. Thus, each row of the matrix sums to 1.0.
We can see that when a candidate program is close to the solution (i.e., the fitness score is 4 or above), each of the f_CF- and f_LCS-based models predicts a fitness score of 4 or higher with a probability of 0.7 or higher. In other words, the models are very accurate at identifying potentially close-enough solutions. The same holds when the candidate program is mostly mistaken (i.e., the fitness score is 1 or less). Thus, the neural networks are good at identifying both close-enough solutions and mostly wrong solutions. If a candidate program is only somewhat correct (i.e., it has a few correct functions but the rest are incorrect), it is difficult for the proposed models to identify. The f_FP model predicts the probability of different functions given the IO examples. We consider a function probability to be correct if the function is in the target program and the neural network predicts its probability as 0.5 or higher. Figure 7 (c) shows the accuracy of the f_FP model. With enough epochs, it reaches close to 90% accuracy on the validation data set.

Figure 7. Confusion matrices of the (a) f_CF and (b) f_LCS neural network fitness functions; (c) shows the accuracy of f_FP over epochs. All graphs are based on the validation data. Overall, f_CF and f_LCS are capable of identifying close-enough solutions as well as mostly mistaken solutions. f_FP reaches close to 90% accuracy after 40 epochs.

5.3.1 Additional Models and Fitness Functions

We tried several other models for the neural networks and fitness functions. For example, instead of treating fitness prediction as a classification problem, we treated fitness scores as a regression problem. We found that the neural networks produced higher prediction error, as the networks had a tendency to predict values close to the median of the values in the training set.
With the higher prediction errors of the fitness function, the genetic algorithm's performance degraded.

We also experimented with training a network to predict a correctness ordering among a set of genes. We note that the ultimate goal of the fitness score is to provide an order among genes for the Roulette Wheel algorithm. Rather than obtaining this ordering indirectly via a fitness score for each gene, we attempted to have the neural network predict the ordering directly. However, we were not able to train a network to predict this relative ordering with higher accuracy than the one for absolute fitness scores. We believe that there are other potential implementations of this relative ordering and that it may be made to work in the future.

Additionally, we tried a two-tier fitness function. The first tier was a neural network that predicts whether a gene has a fitness score of 0 or not. If the fitness score was predicted to be non-zero, we used a second neural network to predict the actual non-zero value. This idea came from the intuition that since many genes have a fitness score of 0 (at least in the initial generations), we could do a better job of predicting those scores with a separate predictor. Unfortunately, mispredictions in the first tier caused enough good genes to be eliminated that NetSyn's synthesis rate was reduced.

Finally, we explored training a bigram model (i.e., predicting pairs of functions appearing one after the other). This approach is complicated by the fact that over 99% of the 41 × 41 (i.e., number of DSL functions squared) bigram matrix is zeros. We tried a two-tiered neural network and principal component analysis to reduce the dimensionality of this matrix (Li & Wang, 2014). Our results using this bigram model in NetSyn were similar to those of DeepCoder, with up to a 90% reduction in synthesis rate for singleton programs.
6 CONCLUSION

In this paper, we presented a genetic algorithm-based framework for program synthesis called NetSyn. To the best of our knowledge, it is the first work that uses a neural network to automatically generate a genetic algorithm's fitness function in the context of machine programming. We proposed three neural network-based fitness functions. NetSyn is also novel in that it uses neighborhood search to expedite the convergence of the genetic algorithm. We compared our approach against several state-of-the-art program synthesis systems: DeepCoder (Balog et al., 2017b), PCCoder (Zohar & Wolf, 2018), RobustFill (Devlin et al., 2017), and PushGP (Perkis, 1994). NetSyn synthesizes more programs than each of those prior approaches with fewer candidate program generations. We believe that our proposed work could open up a new direction of research by automating fitness function generation for genetic algorithms, mapping the problem to a big-data learning problem. This has the potential to improve any application of genetic algorithms.

REFERENCES

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2015. URL http://download.tensorflow.org/paper/whitepaper2015.pdf.

Alur, R., Bodík, R., Dallal, E., Fisman, D., Garg, P., Juniwal, G., Kress-Gazit, H., Madhusudan, P., Martin, M. M.
K., Raghothaman, M., Saha, S., Seshia, S. A., Singh, R., Solar-Lezama, A., Torlak, E., and Udupa, A. Syntax-Guided Synthesis. In Irlbeck, M., Peled, D. A., and Pretschner, A. (eds.), Dependable Software Systems Engineering, volume 40 of NATO Science for Peace and Security Series, D: Information and Communication Security, pp. 1–25. IOS Press, 2015. ISBN 978-1-61499-494-7. doi: 10.3233/978-1-61499-495-4-1. URL https://doi.org/10.3233/978-1-61499-495-4-1.

Balog, M., Gaunt, A. L., Brockschmidt, M., Nowozin, S., and Tarlow, D. DeepCoder. https://github.com/dkamm/deepcoder, 2017a.

Balog, M., Gaunt, A. L., Brockschmidt, M., Nowozin, S., and Tarlow, D. DeepCoder: Learning to Write Programs. In International Conference on Learning Representations, April 2017b.

Becker, K. and Gottschlich, J. AI Programmer: Autonomously Creating Software Programs Using Genetic Algorithms. CoRR, abs/1709.05703, 2017. URL http://arxiv.org/abs/1709.05703.

Bodík, R. and Jobstmann, B. Algorithmic Program Synthesis: Introduction. International Journal on Software Tools for Technology Transfer, 15:397–411, 2013.

Brameier, M. On Linear Genetic Programming. PhD thesis, Dortmund, Germany, 2007.

Bunel, R., Hausknecht, M. J., Devlin, J., Singh, R., and Kohli, P. Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. URL https://openreview.net/forum?id=H1Xw62kRZ.

Cai, J., Shin, R., and Song, D. Making Neural Programming Architectures Generalize via Recursion. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017. URL https://openreview.net/forum?id=BkbY4psgg.

Chen, X., Liu, C., and Song, D.
Towards Synthesizing Complex Programs from Input-Output Examples. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. URL https://openreview.net/forum?id=Skp1ESxRZ.

Cheung, A., Solar-Lezama, A., and Madden, S. Using Program Synthesis for Social Recommendations. ArXiv, abs/1208.2925, 2012.

Debray, S. K., Evans, W., Muth, R., and De Sutter, B. Compiler Techniques for Code Compaction. ACM Trans. Program. Lang. Syst., 22(2):378–415, March 2000. ISSN 0164-0925. doi: 10.1145/349214.349233. URL http://doi.acm.org/10.1145/349214.349233.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.

Devlin, J., Uesato, J., Bhupatiraju, S., Singh, R., Mohamed, A., and Kohli, P. RobustFill: Neural Program Learning under Noisy I/O. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pp. 990–998, 2017. URL http://proceedings.mlr.press/v70/devlin17a.html.

Dinesh, K., Luca, B., Matt, W., Anders, H., and Kit, G. LINQ to SQL: .NET Language-Integrated Query for Relational Data, 2007. URL https://docs.microsoft.com/en-us/previous-versions/dotnet/articles/bb425822(v=msdn.10).

Feng, Y., Martins, R., Bastani, O., and Dillig, I. Program Synthesis Using Conflict-driven Learning. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pp. 420–435, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5698-5. doi: 10.1145/3192366.3192382. URL http://doi.acm.org/10.1145/3192366.3192382.

Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1989. ISBN 0201157675.
Gottschlich, J., Solar-Lezama, A., Tatbul, N., Carbin, M., Rinard, M., Barzilay, R., Amarasinghe, S., Tenenbaum, J. B., and Mattson, T. The Three Pillars of Machine Programming. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL 2018, pp. 69–80, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5834-7. doi: 10.1145/3211346.3211355. URL http://doi.acm.org/10.1145/3211346.3211355.

Gulwani, S., Harris, W. R., and Singh, R. Spreadsheet Data Manipulation Using Examples. Commun. ACM, 55(8):97–105, August 2012. ISSN 0001-0782. doi: 10.1145/2240236.2240260. URL http://doi.acm.org/10.1145/2240236.2240260.

Heule, S., Schkufza, E., Sharma, R., and Aiken, A. Stratified Synthesis: Automatically Learning the x86-64 Instruction Set. SIGPLAN Not., 51(6):237–250, June 2016. ISSN 0362-1340. doi: 10.1145/2980983.2908121. URL http://doi.acm.org/10.1145/2980983.2908121.

Khuntia, B., Pattnaik, S., Panda, D., Neog, D., Devi, S., and Dutta, M. Genetic algorithm with artificial neural networks as its fitness function to design rectangular microstrip antenna on thick substrate. Microwave and Optical Technology Letters, 44:144–146, 01 2005. doi: 10.1002/mop.20570.

Korns, M. F. Accuracy in Symbolic Regression, pp. 129–151. Springer New York, New York, NY, 2011. ISBN 978-1-4614-1770-5. doi: 10.1007/978-1-4614-1770-5_8. URL https://doi.org/10.1007/978-1-4614-1770-5_8.

Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Technical report, 2009.

Labs, S. Evolv Delivers Autonomous Optimization Across Web & Mobile. https://www.evolv.ai/.

Langdon, W. B. and Poli, R. Foundations of Genetic Programming. Springer Publishing Company, Incorporated, 1st edition, 2010. ISBN 3642076327.

Li, C. and Wang, B. Principal Components Analysis, 2014.
URL http://www.ccs.neu.edu/home/vip/teach/MLcourse/5_features_dimensions/lecture_notes/PCA/PCA.pdf.

Liu, H., Simonyan, K., Vinyals, O., Fernando, C., and Kavukcuoglu, K. Hierarchical Representations for Efficient Architecture Search. CoRR, abs/1711.00436, 2017.

Loncaric, C., Ernst, M. D., and Torlak, E. Generalized Data Structure Synthesis. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, pp. 958–968, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5638-1. doi: 10.1145/3180155.3180211. URL http://doi.acm.org/10.1145/3180155.3180211.

Manna, Z. and Waldinger, R. Knowledge and Reasoning in Program Synthesis. Artificial Intelligence, 6(2):175–208, 1975. ISSN 0004-3702.

Matos Dias, J., Rocha, H., Ferreira, B., and Lopes, M. d. C. A genetic algorithm with neural network fitness function evaluation for IMRT beam angle optimization. Central European Journal of Operations Research, 22, 09 2014. doi: 10.1007/s10100-013-0289-4.

Murphy, K. P. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012. ISBN 0262018020, 9780262018029.

Perkis, T. Stack-based genetic programming. In Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence, pp. 148–153 vol. 1, 1994.

Ratner, A., Alistarh, D., Alonso, G., Andersen, D. G., Bailis, P., Bird, S., Carlini, N., Catanzaro, B., Chung, E., Dally, B., Dean, J., Dhillon, I. S., Dimakis, A. G., Dubey, P., Elkan, C., Fursin, G., Ganger, G. R., Getoor, L., Gibbons, P. B., Gibson, G. A., Gonzalez, J. E., Gottschlich, J., Han, S., Hazelwood, K. M., Huang, F., Jaggi, M., Jamieson, K. G., Jordan, M. I., Joshi, G., Khalaf, R., Knight, J., Konecný, J., Kraska, T., Kumar, A., Kyrillidis, A., Li, J., Madden, S., McMahan, H. B., Meijer, E., Mitliagkas, I., Monga, R., Murray, D. G., Papailiopoulos, D. S., Pekhimenko, G., Rekatsinas, T., Rostamizadeh, A., Ré, C., Sa, C.
D., Sedghi, H., Sen, S., Smith, V., Smola, A., Song, D., Sparks, E. R., Stoica, I., Sze, V., Udell, M., Vanschoren, J., Venkataraman, S., Vinayak, R., Weimer, M., Wilson, A. G., Xing, E. P., Zaharia, M., Zhang, C., and Talwalkar, A. SysML: The New Frontier of Machine Learning Systems. CoRR, abs/1904.03257, 2019. URL http://arxiv.org/abs/1904.03257.

Raychev, V., Vechev, M., and Yahav, E. Code Completion with Statistical Language Models. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pp. 419–428, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. doi: 10.1145/2594291.2594321. URL http://doi.acm.org/10.1145/2594291.2594321.

Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. Regularized Evolution for Image Classifier Architecture Search. CoRR, abs/1802.01548, 2018. URL http://arxiv.org/abs/1802.01548.

Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. Regularized Evolution for Image Classifier Architecture Search. In Thirty-Third AAAI Conference on Artificial Intelligence, February 2019.

Real, E., Liang, C., So, D. R., and Le, Q. V. AutoML-Zero: Evolving Machine Learning Algorithms from Scratch, 2020.

Reed, S. E. and de Freitas, N. Neural Programmer-Interpreters. In Bengio, Y. and LeCun, Y. (eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.

Salimans, T., Ho, J., Chen, X., Sidor, S., and Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. CoRR, abs/1703.03864, 2017. URL https://arxiv.org/abs/1703.03864.

Solar-Lezama, A., Tancau, L., Bodik, R., Seshia, S., and Saraswat, V. Combinatorial Sketching for Finite Programs. SIGOPS Oper. Syst. Rev., 40(5):404–415, October 2006. ISSN 0163-5980. doi: 10.1145/1168917.1168907.
URL http://doi.acm.org/10.1145/1168917.1168907.

Such, F. P., Madhavan, V., Conti, E., Lehman, J., Stanley, K. O., and Clune, J. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning. CoRR, abs/1712.06567, 2017.

Weise, T. Global Optimization Algorithms - Theory and Application. 2009. http://www.it-weise.de/projects/book.pdf.

Zohar, A. and Wolf, L. Automatic Program Synthesis of Long Programs with a Learned Garbage Collector. CoRR, abs/1809.04682, 2018.

A APPENDIX A: NETSYN'S DSL

In this appendix, we provide more details about the list DSL that NetSyn uses to generate programs. Our list DSL has only two implicit data types, integer and list of integer. A program in this DSL is a sequence of statements, each of which is a call to one of the 41 functions defined in the DSL. There are no explicit variables, conditionals, or explicit control flow operations in the DSL, although many of the functions in the DSL are high-level and contain implicit conditionals and control flow within them. Each of the 41 functions in the DSL takes one or two arguments, each of integer or list of integer type, and returns exactly one output, also of integer or list of integer type. Given these rules, there are 10 possible function signatures. However, only 5 of these signatures occur among the functions we chose to include in the DSL. The following sections are organized by function signature, and each describes all the functions in the DSL having that signature.

Instead of named variables, each time a function call requires an argument of a particular type, our DSL's runtime searches backwards and finds the most recently executed function that returns an output of the required type and then uses that output as the current function's input.
Thus, for the first statement in the program, there is no previous function output from which to draw arguments. When there is no previous output of the correct type, our DSL's runtime looks at the arguments to the program itself to provide those values. Moreover, it is possible for the program's inputs not to provide a value of the requested type. In such cases, the runtime provides a default value for missing inputs: 0 in the case of integer and an empty list in the case of list of integer.

For example, say that a program is given a list of integer as input and that the first three functions called in the program each consume and produce a list of integer. Now, assume that the fourth function called takes an integer and a list of integer as input. The list of integer input will use the list of integer output from the previous function call. The DSL runtime will search backwards and find that none of the previous function calls produced integer output and that no integer input is present in the program's inputs either. Thus, the runtime provides the value 0 as the integer input to this fourth function call. The final output of a program is the output of the last function called.

Thus, our language is defined in such a way that, so long as a program consists only of calls to the 41 functions provided by the DSL, the program is valid by construction. Each of the 41 functions is guaranteed to finish in finite time and there are no looping constructs in the DSL; thus, programs in our DSL are guaranteed to finish. This property allows our system to avoid monitoring the programs it executes to detect potentially infinite loops.
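The backward argument-resolution rule above, including the fallback to program inputs and the defaults of 0 and [], can be sketched as follows. The names here are illustrative, not NetSyn's actual implementation: `outputs` holds the values produced by earlier statements (newest last) and `program_inputs` holds the program's own arguments.

```python
from typing import List, Union

Value = Union[int, List[int]]

def resolve_argument(required_type: type,
                     outputs: List[Value],
                     program_inputs: List[Value]) -> Value:
    """Find the most recently produced value of the required type; fall back
    to the program's own inputs, then to the DSL default (0 or [])."""
    for value in reversed(outputs):      # most recent output first
        if isinstance(value, required_type):
            return value
    for value in program_inputs:         # then the program's arguments
        if isinstance(value, required_type):
            return value
    return 0 if required_type is int else []   # DSL defaults
```

The worked example above falls out directly: with a list-typed program input and three list-producing calls, an integer argument for the fourth call resolves to the default 0.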
Moreover, so long as the implementations of those 41 functions are secure and have no potential for memory corruption, programs in our DSL are similarly guaranteed to be secure and not crash, and thus we do not require any sandboxing techniques. When our system performs crossover between two candidate programs, any arbitrary cut points in the parent programs result in a child program that is also valid by construction. Thus, our system need not test that programs created via crossover or mutation are valid.

In the following sections, [] is used to indicate the type list of integer, whereas int indicates the integer type. The type after the arrow indicates the output type of the function.

A.1 Functions with the Signature [] → int

There are 9 functions in our DSL that take a list of integer as input and return an integer as output.

A.1.1 HEAD (Function 6)

This function returns the first item in the input list. If the list is empty, 0 is returned.

A.1.2 LAST (Function 7)

This function returns the last item in the input list. If the list is empty, 0 is returned.

A.1.3 MINIMUM (Function 8)

This function returns the smallest integer in the input list. If the list is empty, 0 is returned.

A.1.4 MAXIMUM (Function 9)

This function returns the largest integer in the input list. If the list is empty, 0 is returned.

A.1.5 SUM (Function 11)

This function returns the sum of all the integers in the input list. If the list is empty, 0 is returned.

A.1.6 COUNT (Functions 2-5)

This function returns the number of items in the list that satisfy the criterion specified by the additional lambda. Each possible lambda is counted as a different function. Thus, there are 4 COUNT functions, having the lambdas: >0, <0, odd, even.

A.2 Functions with the Signature [] → []

There are 21 functions in our DSL that take a list of integer as input and produce a list of integer as output.
A.2.1 REVERSE (Function 29)

This function returns a list containing all the elements of the input list in reverse order.

A.2.2 SORT (Function 35)

This function returns a list containing all the elements of the input list in sorted order.

A.2.3 MAP (Functions 19-28)

This function applies a lambda to each element of the input list and creates the output list from the outputs of those lambdas. Let I_n be the nth element of the input list to MAP and let O_n be the nth element of the output list from MAP. MAP produces an output list such that O_n = lambda(I_n) for all n. There are 10 MAP functions corresponding to the following lambdas: +1, -1, *2, *3, *4, /2, /3, /4, *(-1), ^2.

A.2.4 FILTER (Functions 14-17)

This function returns a list containing only those elements of the input list satisfying the criterion specified by the additional lambda. Ordering is maintained in the output list relative to the input list for those elements satisfying the criterion. There are 4 FILTER functions, having the lambdas: >0, <0, odd, even.

A.2.5 SCANL1 (Functions 30-34)

Let I_n be the nth element of the input list to SCANL1 and let O_n be the nth element of the output list from SCANL1. This function produces an output list as follows:

    O_n = I_n                    if n = 0
    O_n = lambda(I_n, O_{n-1})   if n > 0

There are 5 SCANL1 functions corresponding to the following lambdas: +, -, *, min, max.

A.3 Functions with the Signature int,[] → []

There are 4 functions in our DSL that take an integer and a list of integer as input and produce a list of integer as output.

A.3.1 TAKE (Function 36)

This function returns a list consisting of the first N items of the input list, where N is the smaller of the integer argument to this function and the size of the input list.

A.3.2 DROP (Function 13)

This function returns a list in which the first N items of the input list are omitted, where N is the integer argument to this function.
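The MAP, FILTER, and SCANL1 recurrences above can be sketched directly in Python. This is a minimal illustration, not NetSyn's interpreter: function names are invented, and each family would be instantiated once per lambda listed above.

```python
from typing import Callable, List

def dsl_map(fn: Callable[[int], int], xs: List[int]) -> List[int]:
    """MAP: O_n = fn(I_n) for all n."""
    return [fn(x) for x in xs]

def dsl_filter(pred: Callable[[int], bool], xs: List[int]) -> List[int]:
    """FILTER: keep elements satisfying the predicate, preserving order."""
    return [x for x in xs if pred(x)]

def dsl_scanl1(fn: Callable[[int, int], int], xs: List[int]) -> List[int]:
    """SCANL1: O_0 = I_0, and O_n = fn(I_n, O_{n-1}) for n > 0."""
    out: List[int] = []
    for i, x in enumerate(xs):
        out.append(x if i == 0 else fn(x, out[-1]))
    return out
```

Note that SCANL1 passes the current input element as the lambda's first argument, exactly as in the recurrence above; for the symmetric lambdas (+, *, min, max) the order does not matter, but for - it does.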
A.3.3 DELETE (Function 12)

This function returns a list in which all the elements of the input list having value X are omitted, where X is the integer argument to this function.

A.3.4 INSERT (Function 18)

This function returns a list where the value X is appended to the end of the input list, where X is the integer argument to this function.

A.4 Functions with the Signature [],[] → []

There are 5 functions in our DSL that take two lists of integers and return another list of integers; all are variants of ZIPWITH.

A.4.1 ZIPWITH (Functions 37-41)

This function returns a list whose length is equal to the length of the smaller input list. Let O_n be the nth element of the output list from ZIPWITH. Moreover, let I1_n and I2_n be the nth elements of the first and second input lists, respectively. This function creates the output list such that O_n = lambda(I1_n, I2_n). There are 5 ZIPWITH functions corresponding to the following lambdas: +, -, *, min, max.

A.5 Functions with the Signature int,[] → int

There are two functions in our DSL that take an integer and a list of integer and return an integer.

A.5.1 ACCESS (Function 1)

This function returns the Nth element of the input list, where N is the integer argument to this function. If N is less than 0 or greater than the length of the input list, then 0 is returned.

A.5.2 SEARCH (Function 10)

This function returns the position in the input list where the value X is first found, where X is the integer argument to this function. If no such value is present in the list, then -1 is returned.

B APPENDIX B: SYSTEM DETAILS

B.1 Hyper-parameters for the Models and Genetic Algorithm

• Evolutionary Algorithm:
  – Gene pool size: 100
  – Number of reserve genes in each generation: 5
  – Maximum number of generations: 30,000
  – Crossover rate: 40%
  – Mutation rate: 30%

C ADDITIONAL RESULTS

Table 3 shows detailed numerical results using synthesis time as the metric.
Columns 10% to 100% show the duration of time (in seconds) it takes to synthesize the corresponding percentage of programs. The "% Synth." column gives the overall percentage of programs each method synthesizes within the search budget.

Table 3. Comparison with DeepCoder and PCCoder in synthesizing different length programs. All experiments are done with the maximum search space set to 3,000,000 candidate programs.

Program length 5:

  Method          % Synth.   10%    20%    30%    40%    50%    60%     70%      80%     90%     100%
  PushGP          45%        1s     65s    372s   456s   -      -       -        -       -       -
  Edit            72%        1s     7s     116s   288s   365s   395s    492s     -       -       -
  DeepCoder       40%        <1s    <1s    2s     126s   -      -       -        -       -       -
  PCCoder         51%        1s     1s     6s     66s    357s   -       -        -       -       -
  RobustFill      63%        1s     1s     8s     83s    472s   1321s   -        -       -       -
  NetSyn FP       94%        13s    13s    19s    61s    172s   691s    1671s    6311s   30712s  -
  NetSyn LCS      97%        13s    13s    19s    57s    175s   957s    1880s    4130s   20580s  -
  NetSyn CF       94%        12s    12s    17s    31s    172s   1038s   2825s    7864s   42648s  -
  Oracle LCS|CF   100%       <1s    <1s    <1s    <1s    <1s    <1s     1s       1s      1s      1s

Program length 7:

  Method          % Synth.   10%    20%    30%    40%    50%    60%     70%      80%     90%     100%
  PushGP          38%        1s     1s     694s   -      -      -       -        -       -       -
  Edit            51%        1s     1s     254s   367s   433s   -       -        -       -       -
  DeepCoder       45%        <1s    <1s    <1s    13s    -      -       -        -       -       -
  PCCoder         52%        1s     1s     2s     11s    635s   -       -        -       -       -
  RobustFill      56%        1s     1s     3s     27s    535s   -       -        -       -       -
  NetSyn FP       72%        13s    13s    16s    51s    424s   6506s   109659s  -       -       -
  NetSyn LCS      72%        13s    13s    16s    58s    433s   10363s  100728s  -       -       -
  NetSyn CF       76%        12s    12s    15s    56s    489s   6862s   81037s   -       -       -
  Oracle LCS|CF   100%       <1s    <1s    <1s    <1s    <1s    <1s     1s       1s      1s      1s

Program length 10:

  Method          % Synth.   10%    20%    30%    40%    50%    60%     70%      80%     90%     100%
  PushGP          32%        1s     1s     1454s  -      -      -       -        -       -       -
  Edit            43%        1s     205s   437s   591s   -      -       -        -       -       -
  DeepCoder       42%        <1s    <1s    <1s    67s    -      -       -        -       -       -
  PCCoder         48%        1s     1s     4s     1011s  -      -       -        -       -       -
  RobustFill      45%        1s     2s     14s    856s   -      -       -        -       -       -
  NetSyn FP       64%        13s    13s    13s    74s    763s   29206s  -        -       -       -
  NetSyn CF       66%        13s    13s    13s    63s    701s   9016s   -        -       -       -
  NetSyn LCS      66%        13s    13s    13s    60s    521s   17384s  -        -       -       -
  Oracle LCS|CF   100%       <1s    <1s    <1s    <1s    <1s    <1s     1s       1s      1s      1s

Table 4. Comparison with DeepCoder and PCCoder in terms of search space use. All experiments are done with the maximum search space set to 3,000,000 candidate programs. Columns 10% to 100% show the fraction of the search space used to synthesize the corresponding percentage of programs.

Program length 5:

  Method          10%    20%    30%    40%    50%    60%    70%    80%    90%    100%
  PushGP          <1%    9%     60%    67%    -      -      -      -      -      -
  Edit            <1%    <1%    17%    43%    54%    60%    73%    -      -      -
  DeepCoder       <1%    1%     1%     37%    -      -      -      -      -      -
  PCCoder         <1%    1%     1%     7%     33%    -      -      -      -      -
  RobustFill      <1%    1%     1%     8%     35%    47%    -      -      -      -
  NetSyn FP       <1%    <1%    <1%    <1%    1%     4%     13%    30%    55%    -
  NetSyn LCS      <1%    <1%    <1%    <1%    1%     8%     17%    25%    48%    -
  NetSyn CF       <1%    <1%    <1%    <1%    1%     10%    22%    40%    58%    -
  Oracle LCS|CF   <1%    <1%    <1%    <1%    <1%    <1%    <1%    <1%    <1%    <1%

Program length 7:

  Method          10%    20%    30%    40%    50%    60%    70%    80%    90%    100%
  PushGP          <1%    <1%    82%    -      -      -      -      -      -      -
  Edit            <1%    <1%    34%    48%    69%    -      -      -      -      -
  DeepCoder       <1%    <1%    1%     3%     -      -      -      -      -      -
  PCCoder         <1%    <1%    1%     1%     38%    -      -      -      -      -
  RobustFill      <1%    <1%    1%     2%     35%    -      -      -      -      -
  NetSyn FP       <1%    <1%    <1%    <1%    3%     31%    47%    -      -      -
  NetSyn LCS      <1%    <1%    <1%    <1%    3%     26%    59%    -      -      -
  NetSyn CF       <1%    <1%    <1%    <1%    4%     31%    56%    -      -      -
  Oracle LCS|CF   <1%    <1%    <1%    <1%    <1%    <1%    <1%    <1%    <1%    <1%

Program length 10:

  Method          10%    20%    30%    40%    50%    60%    70%    80%    90%    100%
  PushGP          <1%    <1%    90%    -      -      -      -      -      -      -
  Edit            <1%    20%    43%    56%    -      -      -      -      -      -
  DeepCoder       <1%    <1%    1%     9%     -      -      -      -      -      -
  PCCoder         <1%    <1%    1%     61%    -      -      -      -      -      -
  RobustFill      <1%    1%     4%     58%    -      -      -      -      -      -
  NetSyn FP       <1%    <1%    <1%    <1%    5%     34%    -      -      -      -
  NetSyn CF       <1%    <1%    <1%    <1%    4%     36%    -      -      -      -
  NetSyn LCS      <1%    <1%    <1%    <1%    4%     40%    -      -      -      -
  Oracle LCS|CF   <1%    <1%    <1%    <1%    <1%    <1%    <1%    <1%    <1%    <1%
