Algorithmic complexity for psychology: A user-friendly implementation of the coding theorem method

Nicolas Gauvrit, CHArt (PARIS-reasoning), Ecole Pratique des Hautes Etudes, Paris, France
Henrik Singmann, Institut für Psychologie, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
Fernando Soler-Toscano, Grupo de Lógica, Lenguaje e Información, Universidad de Sevilla, Spain
Hector Zenil, Unit of Computational Medicine, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden

Corresponding author: Nicolas Gauvrit, ngauvrit@me.com

Author note: Henrik Singmann is now at the Department of Psychology, University of Zurich (Switzerland). We would like to thank William Matthews for providing the data from his 2013 manuscript.

Abstract: Kolmogorov-Chaitin complexity has long been believed to be impossible to approximate when it comes to short sequences (e.g., of length 5-50). However, with the newly developed coding theorem method the complexity of strings of length 2-11 can now be numerically estimated. We present the theoretical basis of algorithmic complexity for short strings (ACSS) and describe an R package providing functions based on ACSS that will cover psychologists' needs and improve upon previous methods in three ways: (1) ACSS is now available not only for binary strings, but for strings based on up to 9 different symbols, (2) ACSS no longer requires time-consuming computing, and (3) a new approach based on ACSS gives access to an estimation of the complexity of strings of any length. Finally, three illustrative examples show how these tools can be applied to psychology.

Keywords: algorithmic complexity, randomness, subjective probability, coding theorem method

Randomness and complexity are two concepts which are intimately related and are both central to numerous recent developments in various fields, including finance (Taufemback, Giglio, & Da Silva, 2011; Brandouy, Delahaye, Ma, & Zenil, 2012), linguistics (Gruber, 2010; Naranan, 2011), neuropsychology (Machado, Miranda, Morya, Amaro Jr, & Sameshima, 2010; Fernández et al., 2011, 2012), psychiatry (Yang & Tsai, 2012; Takahashi, 2013), genetics (Yagil, 2009; Ryabko, Reznikova, Druzyaka, & Panteleeva, 2013), sociology (Elzinga, 2010), and the behavioral sciences (Watanabe et al., 2003; Scafetta, Marchi, & West, 2009). In psychology, randomness and complexity have recently attracted interest, following the realization that they could shed light on a diversity of previously undeciphered behaviors and mental processes. It has been found, for instance, that the subjective difficulty of a concept is directly related to its "boolean complexity", defined as the shortest logical description of a concept (Feldman, 2000, 2003, 2006). In the same vein, visual detection of shapes has been shown to be related to contour complexity (Wilder, Feldman, & Singh, 2011).

More generally, perceptual organization itself has been described as based on simplicity or, equivalently, likelihood (Chater, 1996; Chater & Vitányi, 2003), in a model reconciling the complexity approach (perception is organized to minimize complexity) and a probability approach (perception is organized to maximize likelihood), very much in line with our view in this paper. Even the perception of similarity may be viewed through the lens of (conditional) complexity (U. Hahn, Chater, & Richardson, 2003).
Randomness and complexity also play an important role in modern approaches to selecting the "best" among a set of candidate models (i.e., model selection; e.g., Myung, Navarro, & Pitt, 2006; Kellen, Klauer, & Bröder, 2013), as discussed in more detail below in the section called "Relationship to complexity based model selection".

Complexity can also shed light on short term memory storage and recall, more specifically on the process underlying chunking. It is well known that the short term memory span lies between 4 and 7 items/chunks (Miller, 1956; Cowan, 2001). When instructed to memorize longer sequences of, for example, letters or numbers, individuals employ a strategy of subdividing the sequence into chunks (Baddeley, Thomson, & Buchanan, 1975). However, the way chunks are created remains largely unexplained. A plausible explanation might be that chunks are built by minimizing the complexity of each chunk. For instance, one could split the sequence "AAABABABA" into the two substrings "AAA" and "BABABA". Very much in line with this idea, Mathy and Feldman (2012) provided evidence for the hypothesis that chunks are units of "maximally compressed code". In the above situation, both "AAA" and "BABABA" are supposedly of low complexity, and the chunks are tailored to minimize the length of the resulting compressed information.

Outside the psychology of short term memory, the complexity of pseudo-random human-generated sequences is related to the strength of executive functions, specifically to inhibition or sustained attention (Towse & Cheshire, 2007). In random generation tasks, participants are asked to generate random-looking sequences, usually involving numbers from 1 to 6 (dice task) or digits between 1 and 9. These tasks are easy to administer and very informative in terms of higher level cognitive abilities. They have been used in investigations in various areas of psychology, such as visual perception (Cardaci, Di Gesu, Petrou, & Tabacchi, 2009), aesthetics (Boon, Casti, & Taylor, 2011), development (Scibinetti, Tocci, & Pesce, 2011; Pureza, Gonçalves, Branco, Grassi-Oliveira, & Fonseca, 2013), sport (Audiffren, Tomporowski, & Zagrodnik, 2009), creativity (Zabelina, Robinson, Council, & Bresin, 2012), sleep (Heuer, Kohlisch, & Klein, 2005; Bianchi & Mendez, 2013), and obesity (Crova et al., 2013), to mention a few. In neuropsychology, random number generation tasks and other measures of behavioral or brain activity complexity have been used to investigate different disorders, such as schizophrenia (Koike et al., 2011), autism (Lai et al., 2010; Maes, Vissers, Egger, & Eling, 2012; Fournier, Amano, Radonovich, Bleser, & Hass, 2013), depression (Fernández et al., 2009), PTSD (Pearson & Sawyer, 2011; Curci, Lanciano, Soleti, & Rimé, 2013), ADHD (Sokunbi et al., 2013), OCD (Bédard, Joyal, Godbout, & Chantal, 2009), hemispheric neglect (Loetscher & Brugger, 2009), aphasia (Proios, Asaridou, & Brugger, 2008), and neurodegenerative syndromes such as Parkinson's and Alzheimer's disease (Brown & Marsden, 1990; T. Hahn et al., 2012).

Perceived complexity and randomness are also of the utmost importance within the "new paradigm psychology of reasoning" (Over, 2009). As an example, let us consider the representativeness heuristic (Tversky & Kahneman, 1974).
Participants usually believe that the sequence "HTHTHTHT" is less likely to occur than the sequence "HTHHTHTT" when a fair coin is tossed 8 times. This of course is mathematically wrong, since all sequences of 8 heads or tails, including these two, share the same probability of occurrence (1/2^8). The "old paradigm" was concerned with finding such biases and attributing irrationality to individuals (Kahneman, Slovic, & Tversky, 1982). In the "new paradigm", on the other hand, researchers try to discover the ways in which sound probabilistic or Bayesian reasoning can lead to the observed errors (Manktelow & Over, 1993; U. Hahn & Warren, 2009).

We can find some kind of rationality behind the wrong answer by assuming that individuals do not estimate the probability that a fair coin will produce a particular string s, but rather the "inverse" probability that the process underlying this string is mere chance. More formally, if we use s to denote a given string, R to denote the event where a string has been produced by a random process, and D to denote the complementary event where a string has been produced by a non-random (or deterministic) process, then individuals may assess P(R|s) instead of P(s|R). If they do so within the framework of formal probability theory (and the new paradigm postulates that individuals tend to do so), then their estimation of the probability should be such that Bayes' theorem holds:

P(R|s) = P(s|R) P(R) / [ P(s|R) P(R) + P(s|D) P(D) ].    (1)

Alternatively, we could assume that individuals do not estimate the complete inverse P(R|s) but just the posterior odds of a given string being produced by a random rather than a deterministic process (Williams & Griffiths, 2013). Again, these odds are given by Bayes' theorem:

P(R|s) / P(D|s) = [ P(s|R) / P(s|D) ] × [ P(R) / P(D) ].    (2)

The important part of Equation 2 is the first term on the right-hand side, as it is a function of the observed string s and independent of the prior odds P(R)/P(D). This likelihood ratio, also known as the Bayes factor (Kass & Raftery, 1995), quantifies the evidence brought by the string s, based on which the prior odds are changed. In other words, this part corresponds to the "amount of evidence [s] provides in favor of a random generating process" (Hsu, Griffiths, & Schreiber, 2010).

The numerator of the Bayes factor, P(s|R), is easily computed, and amounts to 1/2^8 in the example given above. However, the other likelihood, the probability of s given that it was produced by an (unknown) deterministic process, P(s|D), is more problematic. Although this probability has been informally linked to complexity, to the best of our knowledge no formal account of that link has ever been provided in the psychological literature, although some authors have suggested such a link (e.g., Chater, 1996). As we will see, however, computer scientists have fruitfully addressed this question (Solomonoff, 1964a, 1964b; Levin, 1974). One can think of P(s|D) as the probability that a randomly selected (deterministic) algorithm produces s. In this sense, P(s|D) is none other than the so-called algorithmic probability of s. This probability is formally linked to the algorithmic complexity K(s) of the string s by the following formula (see below for more details):

K(s) ≈ −log2( P(s|D) ).    (3)
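To make Equations 1-3 concrete, here is a minimal R sketch computing the posterior probability of a random process for the coin example. The two P(s|D) values below are placeholders for illustration only; the acss package described later in this article provides principled estimates via likelihood_d().

# Posterior probability that a string was produced by a random process,
# following Equation 1. All inputs are probabilities.
prob_random_process <- function(p_s_R, p_s_D, prior_R = 0.5) {
  (p_s_R * prior_R) / (p_s_R * prior_R + p_s_D * (1 - prior_R))
}

p_s_R  <- 1 / 2^8   # likelihood of any 8-toss sequence under a fair coin
p_s_D1 <- 0.01      # hypothetical P(s|D) for the regular "HTHTHTHT"
p_s_D2 <- 0.003     # hypothetical P(s|D) for the irregular "HTHHTHTT"

prob_random_process(p_s_R, p_s_D1)  # low: the string favors a deterministic process
prob_random_process(p_s_R, p_s_D2)  # higher: more evidence for a random process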
A normative measure of complexity and a way to make sense of P(s|D) are crucial to several areas of research in psychology. In our example concerning the representativeness heuristic, one could see some sort of rationality in the usually observed behavior if in fact the complexity of s1 = "HTHTHTHT" were lower than that of s2 = "HTHHTHTT" (which it is, as shown below). Then, following Equation 3, P(s1|D) > P(s2|D). Consequently, the Bayes factor for a string being produced by a random process would be larger for s2 than for s1. In other words, even when ignoring the question of the priors for a random versus deterministic process (which are inherently subjective and debatable), s2 provides more evidence for a random process than s1.

Researchers have hitherto relied on an intuitive perception of complexity, or in the last decades developed and used several tailored measures of randomness or complexity (Towse, 1998; Barbasz, Stettner, Wierzchoń, Piotrowski, & Barbasz, 2008; Schulter, Mittenecker, & Papousek, 2010; Williams & Griffiths, 2013; U. Hahn & Warren, 2009) in the hope of approaching algorithmic complexity. Because all these measures rely upon choices that are partially subjective and each focuses on a single characteristic of chance, they have come under strong criticism (Gauvrit, Zenil, Delahaye, & Soler-Toscano, 2013). Among these measures, some have a sound mathematical basis, but focus on particular features of randomness. For that reason, contradictory results have been reported (Wagenaar, 1970; Wiegersma, 1984). The mathematical definition of complexity, known as Kolmogorov-Chaitin complexity theory (Kolmogorov, 1965; Chaitin, 1966), or simply algorithmic complexity, has been recognized as the best possible option by mathematicians (Li & Vitányi, 2008) and psychologists (Griffiths & Tenenbaum, 2003, 2004). However, because algorithmic complexity was thought to be impossible to approximate for the short sequences we usually deal with in psychology (sequences of 5-50 symbols, for instance), it has seldom been used.

In this article, we will first briefly describe algorithmic complexity theory and its deep links with algorithmic probability (leading to a formal definition of the probability that an unknown deterministic process results in a particular observation s). We will then describe the practical limits of algorithmic complexity and present a means to overcome them, namely the coding theorem method, the root of algorithmic complexity for short strings (ACSS). A new set of tools, bundled in a package for the statistical programming language R (R Core Team, 2014) and based on ACSS, will then be described. Finally, three short applications will be presented for illustrative purposes.

Algorithmic complexity for short strings

Algorithmic complexity

As defined by Alan Turing, a universal Turing machine is an abstraction of a general-purpose computing device capable of running any computer program. Among universal Turing machines, some are prefix free, meaning that they only accept programs finishing with an "END" symbol.
The algorithmic complexity (Kolmogorov, 1965; Chaitin, 1966) – also called Kolmogorov or Kolmogorov-Chaitin complexity – of a string s is given by the length of the shortest computer program running on a universal prefix-free Turing machine U that produces s and then halts; formally, K_U(s) = min{|p| : U(p) = s}. From the invariance theorem (Calude, 2002; Li & Vitányi, 2008), we know that the impact of the choice of U (that is, of a specific Turing machine) is limited and independent of s. It means that for any other universal Turing machine U', the absolute value of K_U(s) − K_{U'}(s) is bounded by some constant C_{U,U'} which depends on U and U' but not on the specific s. So K(s) is usually written instead of K_U(s).
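For reference, the definition and the invariance bound just stated can be written compactly (this is a plain restatement of the two results above, with nothing new assumed):

\[
K_U(s) = \min\{\, |p| : U(p) = s \,\}, \qquad
\bigl|K_U(s) - K_{U'}(s)\bigr| \le C_{U,U'} \quad \text{for all } s.
\]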
More precisely, the invariance theorem states that K(s) computed on two different Turing machines will differ at most by an additive constant c, which is independent of s but which can be arbitrarily large. One consequence of this theorem is that there are actually infinitely many different complexities, depending on the Turing machine; talking about "the algorithmic complexity" of a string is a shortcut. The theorem also guarantees that asymptotically (i.e., for long strings), the choice of the Turing machine has limited impact. However, for the short strings we are considering here, the impact can be important, and different Turing machines can yield different values, or even different orderings. This limitation is not due to technical reasons, but to the fact that there is no objective default universal Turing machine one could pick to compute "the" complexity of a string. As we will explain below, our approach seeks to overcome this difficulty by defining what could be thought of as a "mean complexity" as computed with random Turing machines. Because we do not choose a particular Turing machine but sample the space of all possible Turing machines (running on a blank tape), the result is an objective estimation of algorithmic complexity, although it will of course differ from most specific K_U.

Algorithmic complexity gave birth to a definition of randomness. In a nutshell, a string is random if it is complex (i.e., exhibits no structure). Among the most striking results of algorithmic complexity theory is the convergence of definitions of randomness. For example, using martingales, Schnorr (1973) proved that Kolmogorov random (complex) sequences are effectively unpredictable and vice versa; Chaitin (2004) proved that Kolmogorov random sequences pass all effective statistical tests for randomness and vice versa, and are therefore equivalent to Martin-Löf randomness (Martin-Löf, 1966), hence the general acceptance of this measure. K has become the accepted ultimate universal definition of complexity and randomness in mathematics and computer science (Downey & Hirschfeldt, 2008; Nies, 2009; Zenil, 2011a).

One generally offered caveat regarding K is that it is uncomputable, meaning there is no Turing machine or algorithm that, given a string s, can output K(s), the length of the shortest computer program p that produces s. In fact, the theory shows that no computable measure can be a universal complexity measure. However, it is often overlooked that K is upper semi-computable, meaning that it can be effectively approximated from above. That is, there are effective algorithms, such as lossless compression algorithms, that can find programs (the decompressor plus the data to reproduce the string s in full) giving upper bounds of Kolmogorov complexity. However, these methods are inapplicable to short strings (of length below 100), which is why they are seldom used in psychology. One reason why compression algorithms are ill-suited to short strings is that compressed files include not only the instructions to decompress the string, but also file headers and other data structures. As a result, for a short string s, a compressed file containing just s is longer than |s|. Another reason is that, as is the case with the complexity K_U associated with a particular Turing machine U, the choice of the compression algorithm is crucial for short strings, because of the invariance theorem.
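The header overhead is easy to see in R with the base function memCompress(); this is a quick illustration of the point above, not part of the acss package:

# A 10-character string "compresses" to more bytes than its own length,
# because the gzip output carries headers and bookkeeping structures.
s <- "0101010101"
nchar(s)                                         # 10 characters
length(memCompress(charToRaw(s), type = "gzip")) # more than 10 bytes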
Algorithmic probability and its relationship to algorithmic complexity

A universal prefix-free Turing machine U can also be used to define a probability measure m on the set of all possible strings by setting m(s) = Σ_{p : U(p) = s} 1/2^{|p|}, where p is a program of length |p| and U(p) is the string produced by the Turing machine U fed with program p. The Kraft inequality (Calude, 2002) guarantees that 0 ≤ Σ_s m(s) < 1. The number m(s) is the probability that a randomly selected deterministic program will produce s and then halt, or simply the algorithmic probability of s, and provides a formal definition of P(s|D), where D stands for a generic deterministic algorithm (for a more detailed description see Gauvrit et al., 2013). Numerical approximations to m(s) using standard Turing machines have shed light on the stability and robustness of m(s) in the face of changes in U, providing examples of applications to various areas leading to semantic measures (Cilibrasi & Vitányi, 2005, 2007), which today are accepted as regular methods in areas of computer science and linguistics, to mention but two disciplines.

Recent work (Delahaye & Zenil, 2012; Zenil, 2011b) has suggested that approximating m(s) could in practice be used to approximate K for short strings. Indeed, the algorithmic coding theorem (Levin, 1974) establishes the connection as K(s) = −log2 m(s) + O(1), where O(1) is bounded independently of s. This relationship shows that strings with low K(s) have the highest probability m(s), while strings with large K(s) have a correspondingly low probability m(s) of being generated by a randomly selected deterministic program. This approach, based upon and motivated by algorithmic probability, allows us to approximate K by means other than lossless compression, and has recently been applied to financial time series (Brandouy et al., 2012; Zenil & Delahaye, 2011) and in psychology (e.g., Gauvrit, Soler-Toscano, & Zenil, 2014). The approach is equivalent to finding the best possible compression algorithm with a particular computer program enumeration.

Here, we extend a previous method addressing the question of binary strings' complexity (Gauvrit et al., 2013) in several ways. First, we provide a method to estimate the complexity of strings based on any number of different symbols, up to 9 symbols. Second, we provide a fast and user-friendly algorithm to compute this estimation. Third, we also provide a method to approximate the complexity (or the local complexity) of strings of medium length (see below).

The invariance theorem and the problem with short strings

The invariance theorem does not provide a reason to expect −log2(m(s)) and K(s) to induce the same ordering over short strings. Here, we have chosen a simple and standard Turing machine model (the Busy Beaver model; Rado, 1962)¹ in order to build an output distribution based on the seminal concept of algorithmic probability. This output distribution then serves as an objective complexity measure producing results in agreement both with intuition and with K(s), to which it will converge for long strings, as guaranteed by the invariance theorem. Furthermore, we have found that estimates of K(s) are strongly correlated with those produced by lossless compression algorithms as they have traditionally been used as estimators of K(s) (compressed data is a sufficient test of non-randomness, hence of low K(s)), where the two techniques (the coding theorem method and lossless compression) overlap in their range of application, that is, for medium-size strings on the order of hundreds to 1K bits.

[Footnote 1: A demonstration is available online at http://demonstrations.wolfram.com/BusyBeaver/]

The lack of a guarantee of obtaining K(s) from −log2(m(s)) is a problem also found in the most traditional method to estimate K(s). Indeed, there is no guarantee that some lossless compression algorithm will be able to compress a string that is compressible by some (or many) other(s). We do have statistical evidence that, at least with the Busy Beaver model, extending or reducing the Turing machine sample space does not impact the ordering (Zenil, Soler-Toscano, Delahaye, & Gauvrit, 2012; Soler-Toscano, Zenil, Delahaye, & Gauvrit, 2014, 2013). We also found a correlation in output distributions using very different computational formalisms, e.g., cellular automata and Post tag systems (Zenil & Delahaye, 2010). We have also shown (Zenil et al., 2012) that −log2(m(s)) produces results compatible with compression-based estimates of K(s), and strongly correlates with direct K(s) calculation (the length of the first shortest Turing machine found producing s; Soler-Toscano et al., 2013).

The coding theorem method in practice

The basic idea at the root of the coding theorem method is to compute an approximation of m(s). Instead of choosing a particular Turing machine, we ran a huge sample of Turing machines and saved the resulting strings. The distribution of resulting strings gives the probability that a randomly selected Turing machine (equivalent to a universal Turing machine with a randomly selected program) will produce a given string. It therefore approximates m(s). From this, we estimate an approximation K of the algorithmic complexity of any string s using the equation K(s) = −log2(m(s)). To put it in a nutshell again: a string is complex, and hence random, if the likelihood of it being produced by a randomly selected algorithm is low, which we estimated as described in the following.
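The frequency-to-complexity conversion at the heart of the method can be sketched in a few lines of R on made-up data; the real computation enumerates Turing machine outputs (as described next), not strings from a hand-written table:

# Hypothetical output counts: how often each string was produced by the
# sampled machines (invented numbers, for illustration only).
counts <- c("0000" = 800, "0101" = 350, "0010" = 120, "0110" = 95)

m_hat <- counts / sum(counts)   # approximates the algorithmic probability m(s)
K_hat <- -log2(m_hat)           # coding theorem: K(s) = -log2(m(s))
round(K_hat, 2)                 # frequent (structured) strings get low complexity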
To actually build the frequency distributions of strings with different numbers of symbols, we used a Turing machine simulator, written in C++, running on a middle-sized supercomputer at CICA (Centro Informático Científico de Andalucía). The simulator runs Turing machines in (n, m) (n is the number of states of the Turing machine, and m the number of symbols it uses) over a blank tape and stores the output of halting computations. For the generation of random machines, we used the implementation of the Mersenne Twister in the Boost C++ library. Table 1 summarizes the size of the computations to build the distributions.

Table 1. Data of the computations to build the frequency distributions.

(n, m)   Steps   Machines            Time
(5, 2)   500     9 658 153 742 336   450 days
(4, 4)   2000    3.34 × 10^11        62 days
(4, 5)   2000    2.14 × 10^11        44 days
(4, 6)   2000    1.8 × 10^11         41 days
(4, 9)   4000    2 × 10^11           75 days

The data corresponding to (5,2) comes from a full exploration of the space of Turing machines with 5 states and 2 symbols, as explained in Soler-Toscano, Zenil, Delahaye and Gauvrit (2014, 2013). All other data, previously unpublished, correspond to samples of machines. The second column is the runtime cut. As the detection of non-halting machines is an undecidable problem, we stopped the computations exceeding that runtime. To determine the runtime bound, we first picked a sample of machines with an apt runtime T. For example, in the case of (4,4), we ran 1.68 × 10^10 machines with a runtime cut of 8000 steps. For the halting machines in that sample, we built the runtime distribution (Figure 1). Then we chose a runtime lower than T with an accumulated halting probability very close to 1. That way we chose 2000 steps for (4,4). In Soler-Toscano et al. (2014) we argued that, following this methodology, we were able to cover the vast majority of halting machines.

[Figure 1. Runtime distribution in (4,4). More than 99.9999% of the Turing machines that stopped in 8000 steps or less actually halted before 2000 steps.]

The third column in Table 1 is the size of the sample, that is, the number of machines that were actually run on the C++ simulator. After these computations, several symmetric completions were applied to the data, so in fact the number of machines represented in the samples is greater. For example, we only considered machines moving to the right at the initial transition, so we complemented the set of output strings with their reversals. More details about the shortcuts to reduce the computations and about the completions can be found elsewhere (Soler-Toscano et al., 2013). The last column in Table 1 is an estimate of the time the computation would have taken on a single processor at the CICA supercomputer. As we used between 10 and 70 processors, the actual computations took less time.

In the following paragraphs we offer some details about the datasets obtained for each number of symbols.

(5,2). This distribution is the only one previously published (Soler-Toscano et al., 2014; Gauvrit et al., 2013). It consists of 99 608 different binary strings. All strings up to length 11 are included, and only 2 strings of length 12 are missing.
(4,4). After applying the symmetric completions to the 3.34 × 10^11 machines in the sample, we obtained a dataset representing the output of 325 433 427 739 halting machines producing 17 768 208 different string patterns². To reduce the file size and make it usable in practice, we selected only those patterns with 5 or more occurrences, resulting in a total of 1 759 364 string patterns. In the final dataset, all strings comprising 4 symbols up to length 11 are represented by these patterns.

(4,5). After applying the symmetric completions, we obtained a dataset corresponding to the output of 220 037 859 595 halting machines producing 39 057 551 different string patterns. Again, we selected only those patterns with 5 or more occurrences, resulting in a total of 3 234 430 string patterns. In the final dataset, all strings comprising 5 symbols up to length 10 are represented by these patterns.

(4,6). After applying the symmetric completions, we obtained a dataset corresponding to the output of 192 776 974 234 halting machines producing 66 421 783 different string patterns. Here we selected only those patterns with 10 or more occurrences, resulting in a total of 2 638 374 string patterns. In the final dataset, all strings with 6 symbols up to length 10 are represented by these patterns.

(4,9). After applying the symmetric completions, we obtained 231 250 483 485 halting Turing machines producing 165 305 964 different string patterns. We selected only those with 10 or more occurrences, resulting in a total of 5 127 061 string patterns. In the final dataset, all strings comprising 9 symbols up to length 10 are represented by these patterns.

[Footnote 2: Two strings correspond to the same pattern if one is obtained from the other by changing the symbols, such as "11233" and "22311". The pattern is the structure of the string, in this example described as "one symbol repeated once, a second one, and a third one repeated once". Given the definition of Turing machines, strings with the same pattern always share the same complexity.]

Summary. We approximated the algorithmic complexity of short strings (ACSS) using the coding theorem method by running huge numbers of randomly selected Turing machines (or all possible Turing machines for strings of only 2 symbols) and recording their halting state. The resulting distribution of strings approximates the complexity of each string: a string that was produced by more Turing machines is less complex (or random) than one produced by fewer Turing machines. The results of these computations, one dataset per number of symbols (2, 4, 5, 6, and 9), were bundled and made freely available. Hence, although the initial computation took weeks, the distributions are now readily available for all researchers interested in a formal and mathematically sound measure of the complexity of short strings.
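To illustrate the notion of a string pattern (footnote 2), here is a small helper that maps each string to a canonical pattern by numbering symbols in order of first appearance; canonical_pattern() is our own illustration and is not part of the acss package:

# "11233" and "22311" both map to the same canonical pattern "11233".
canonical_pattern <- function(string) {
  symbols <- strsplit(string, "")[[1]]
  # number the symbols 1, 2, 3, ... in order of first appearance
  codes <- match(symbols, unique(symbols))
  paste(codes, collapse = "")
}

canonical_pattern("11233")  # "11233"
canonical_pattern("22311")  # "11233" -- same pattern, hence same ACSS value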
The acss packages

To make ACSS available, we have released two packages for the statistical programming language R (R Core Team, 2014) under the GPL license (Free Software Foundation, 2007), which are available at the Central R Archive Network (CRAN; http://cran.r-project.org/). An introduction to R is, however, beyond the scope of this manuscript. We recommend the use of RStudio (http://www.rstudio.com/) and refer the interested reader to more comprehensive literature (e.g., Jones, Maillardet, & Robinson, 2009; Maindonald & Braun, 2010; Matloff, 2011).

The first package, acss.data, contains only the calculated datasets described in the previous section in compressed form (the total size is 13.9 MB) and should not be used directly. The second package, acss, contains (a) functions to access the datasets and obtain ACSS and (b) functions to calculate other measures of complexity, and is intended to be used by researchers in psychology who wish to analyze short or medium-length (pseudo-)random strings. When installing or loading acss, the data-only package is automatically installed or loaded. To install both packages, simply run install.packages("acss") at the R prompt. After installation, the packages can be loaded with library("acss").³

[Footnote 3: More information is available at http://cran.r-project.org/package=acss and the documentation for all functions is available at http://cran.r-project.org/web/packages/acss/acss.pdf.]

The next section describes the most important functions currently included in the package.

Main functions

All functions within acss have some common features. Most importantly, the first argument to all functions is string, corresponding to the string or strings for which one wishes to obtain the complexity measure. This argument necessarily needs to be a character vector (to avoid issues stemming from automatic coercion). In accordance with R's general design, all functions are fully vectorized; hence string can be of length > 1 and will return an object of corresponding size. In addition, all functions return a named object, the names of the returned objects corresponding to the initial strings.

Algorithmic complexity and probability. The main objective of the acss package is to implement a convenient version of algorithmic complexity and algorithmic probability for short strings. The function acss() returns the ACSS approximation of the complexity K(s) of a string s of length between 2 and 12 characters, based on alphabets with either 2, 4, 5, 6, or 9 symbols, which we shall hereafter call K_2, K_4, K_5, K_6, and K_9. The result is thus an approximation of the length of the shortest program running on a Turing machine that would produce the string and then halt.

The function acss() also returns the observed probability D(s) that a string s of length up to 12 was produced by a randomly selected deterministic Turing machine. Just like K, it may be based on alphabets of 2, 4, 5, 6, or 9 symbols, hereafter called D_2, D_4, ..., D_9. As a consequence of returning both K and D, acss() per default returns a matrix.⁴ Note that the first time a function accessing acss.data is called within an R session, such as acss(), the complete data of all strings is loaded into RAM, which takes some time (even on modern computers this can take more than 10 seconds).

[Footnote 4: Remember (e.g., Equation 3) that the measures D and K are linked by simple relations derived from the coding theorem method: K(s) = −log2(D(s)) and D(s) = 2^{−K(s)}.]

> acss(c("aba", "aaa"))
         K.9          D.9
aba 11.90539 0.0002606874
aaa 11.66997 0.0003068947

Per default, acss() returns K_9 and D_9, which can be used for strings from alphabets of up to 9 symbols. To obtain values for a different alphabet one can use the alphabet argument (which is the second argument to all acss functions), which for acss() also accepts a vector of length > 1. If a string has more symbols than are available in the alphabet, it will be ignored:

> acss(c("01011100", "00030101"), alphabet = c(2, 4))
              K.2      K.4          D.2          D.4
01011100 22.00301 24.75269 2.379222e-07 3.537500e-08
00030101       NA 24.92399           NA 3.141466e-08
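The relation in footnote 4 can be checked directly from the output above, using nothing but base R arithmetic on the printed numbers:

# K and D are two views of the same quantity (up to the printed rounding):
2^(-22.00301)        # ~2.3792e-07, the D.2 value for "01011100"
-log2(3.537500e-08)  # ~24.75269, the K.4 value for "00030101"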
Local complexity. When asked to judge the randomness of sequences of medium length (say 10-100), or asked to produce pseudo-random sequences of this length, the limit of human working memory becomes problematic, and individuals likely try to overcome it by subdividing and performing the task on subsequences, for example by maximizing the local complexity of the string or averaging across local complexities (U. Hahn, 2014; U. Hahn & Warren, 2009). This feature can be assessed via the local_complexity() function, which returns the complexity of substrings of a string as computed with a sliding window of substrings of length span, which may range from 2 to 12. The result of the function as applied to a string s = s_1 s_2 ... s_l of length l with a span k is a vector of l − k + 1 values. The i-th value is the complexity (ACSS) of the substring s[i] = s_i s_{i+1} ... s_{i+k−1}. As an illustration, let us consider the 8-character string based on a 4-symbol alphabet, "aaabcbad". The local complexity of this string with span = 6 will return K_4(aaabcb), K_4(aabcba), and K_4(abcbad), which equals (18.6, 19.4, 19.7):

> local_complexity("aaabcbad", alphabet = 4, span = 6)
$aaabcbad
  aaabcb   aabcba   abcbad
18.60230 19.41826 19.71587

Bayesian approach. As discussed in the introduction, complexity may play a crucial role when combined with Bayes' theorem (see also Williams & Griffiths, 2013; Hsu et al., 2010). Instead of the observed probability D, we may actually be interested in the likelihood of a string s of length l given a deterministic process, P(s|D). As discussed before, this likelihood is trivial for a random process and amounts to P(s|R) = 1/m^l, where m, as above, is the size of the alphabet. To facilitate the corresponding usage of ACSS, likelihood_d() returns the likelihood P(s|D), given the actual length of s. This is done by taking D(s) and normalizing it with the sum of all D(s_i) for all s_i with the same length l as s (note that this also entails performing the symmetric completions). As expected in the beginning, the likelihood of "HTHTHTHT" is larger than that of "HTHHTHTT" under a deterministic process:

> likelihood_d(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
   HTHTHTHT    HTHHTHTT
0.010366951 0.003102718

With the likelihood at hand, we can make full use of Bayes' theorem, and acss contains the corresponding functions. One can obtain the likelihood ratio (Bayes factor) for a random rather than deterministic process via the function likelihood_ratio(). Or, if one is willing to make assumptions about the prior probability with which a random rather than a deterministic process is responsible (i.e., P(R) = 1 − P(D)), one can obtain the posterior probability of a random process given s, P(R|s), using prob_random(). The default for the prior is P(R) = 0.5.

> likelihood_ratio(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
 HTHTHTHT  HTHHTHTT
0.3767983 1.2589769

> prob_random(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
 HTHTHTHT  HTHHTHTT
0.2736772 0.5573217
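These outputs hang together arithmetically; the following lines reproduce them from the likelihood_d() values above, using only Equation 2 and base R:

p_s_R <- 1 / 2^8                      # P(s|R) for any binary string of length 8
p_s_D <- c(0.010366951, 0.003102718)  # the likelihood_d() output from above

p_s_R / p_s_D                         # 0.3767983 1.2589769, the Bayes factors
(p_s_R / p_s_D) / (1 + p_s_R / p_s_D) # 0.2736772 0.5573217, prob_random() at prior 0.5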
Entropy and second-order entropy. Entropy (Barbasz et al., 2008; Shannon, 1948) has been used for decades as a measure of complexity. It must be emphasized that (first-order) entropy does not capture the structure of a string, and only depends on the relative frequency of the symbols in the string. For instance, the string "0101010101010101" has a greater entropy than "0100101100100010" because the first string is balanced in terms of 0's and 1's. According to entropy, the first string, although it is highly regular, should be considered more complex or more random than the second one. Second-order entropy has been put forward to overcome this inconvenience, but it only does so partially. Indeed, second-order entropy only captures the narrowly local structure of a string. For instance, the string "01100110011001100110..." maximizes second-order entropy, because the four patterns 00, 01, 10 and 11 share the same frequency in this string. The fact that the sequence is the result of a simplistic rule is not taken into account. Notwithstanding these strong limitations, entropy has been used extensively. For that historical reason, acss includes two functions, entropy() and entropy2() (second-order entropy).

Change complexity. Algorithmic complexity for short strings is an objective and universal normative measure of complexity approximating Kolmogorov-Chaitin complexity. ACSS helps in detecting any computable departures from randomness. This is exactly what researchers seek when they want to assess the formal quality of a pseudo-random production. However, psychologists may also wish to assess complexity as it is perceived by human participants. In that case, algorithmic complexity may be too sensitive. For instance, there exists a (relatively) short program computing the decimals of π. However, faced with that series of digits, humans are unlikely to see any regularity: algorithmic complexity identifies as non-random some series that will look random to most humans. When it comes to assessing perceived complexity, the tool developed by Aksentijevic and Gibson (2012), named "change complexity", is an interesting alternative. It is based on the idea that humans' perception of complexity depends largely on the changes between one symbol and the next. Unfortunately, change complexity is, to date, only available for binary strings. As change complexity is, to our knowledge, not yet included in any other R package, we have implemented it in acss in the function change_complexity().
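The entropy claim above is easy to verify by hand; this sketch computes first-order entropy directly rather than through acss::entropy(), to make the formula explicit:

# First-order entropy: -sum(p * log2(p)) over the symbol frequencies p.
entropy1 <- function(string) {
  p <- table(strsplit(string, "")[[1]]) / nchar(string)
  -sum(p * log2(p))
}

entropy1("0101010101010101")  # 1.000: balanced, maximal entropy, yet highly regular
entropy1("0100101100100010")  # ~0.954: less balanced, hence lower entropy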
A comparison of complexity measures

There are several complexity measures based on the coding theorem method, because the computation depends on the set of possible symbols the Turing machines can manipulate. To date, the package provides ACSS for 2, 4, 5, 6 and 9 symbols, giving 5 different measures of complexity. As we will see, however, these measures are strongly correlated. As a consequence, one may use K_9 to assess the complexity of strings with 7 or 8 different symbols, or K_4 to assess the complexity of a string with 3 symbols. Thus, in the end, any alphabet size between 2 and 9 is available. Also, these measures are mildly correlated with change complexity, and poorly with entropy.

Any binary string can be thought of as an n-symbol string (n ≥ 2) that happens to use only 2 symbols. For instance, "0101" could be produced by a machine that only uses 0s and 1s, but also by a machine that uses digits from 0 to 9. Hence "0101" may be viewed as a word based on the alphabet {0, 1}, but also based on {0, 1, 2, 3, 4}, etc. Therefore, the complexity of a binary string can be rated by K_2, but also by K_4 or K_5. We computed K_n (with n ∈ {2, 4, 5, 6, 9}), entropy, and change complexity for all 2047 binary strings of length up to 11. Table 2 displays the resulting correlations and Figure 2 shows the corresponding scatter plots. The different algorithmic complexity estimates obtained through the coding theorem method are closely related, with correlations above 0.999 between K_4, K_5, K_6 and K_9. K_2 is less correlated with the others, but every correlation stands above 0.97. There is a mild linear relation between ACSS and change complexity. Finally, entropy is only weakly linked to algorithmic and change complexity.

[Figure 2. Scatter plot showing the relation between measures of complexity on every binary string with length from 1 to 11. "Change" stands for change complexity.]

Table 2. Correlation matrix of complexity measures computed on all binary strings of length up to 11. "Ent" stands for entropy, and "Change" for change complexity.

          K2     K4     K5     K6     K9     Ent
K4      0.98
K5      0.97   1.00
K6      0.97   1.00   1.00
K9      0.97   1.00   1.00   1.00
Ent     0.32   0.36   0.36   0.37   0.38
Change  0.69   0.75   0.75   0.75   0.75   0.50

The correlation between the different versions of ACSS (that is, K_n) may be partly explained by the length of the strings. Unlike entropy, ACSS is sensitive to the number of symbols in a string. This is not a weakness. On the contrary, if the complexity of a string is linked to the evidence it brings to the chance hypothesis, it should depend on length. Throwing a coin to determine whether it is fair and getting HTHTTTHH is more convincing than getting just HT. Although both strings are balanced, the first one should be considered more complex because it affords more evidence that the coin is fair (and not, for instance, bound to alternate heads and tails). However, to control for the effect of length and extend our previous result to more complex strings, we picked 844 random 4-symbol strings of length 12. Change complexity is not defined for non-binary sequences, but as Figure 3 and Table 3 illustrate, the different ACSS measures are strongly correlated, and mildly correlated with entropy.

[Figure 3. Scatterplot matrix computed with 844 randomly chosen 12-character long 4-symbol strings.]

Table 3. Pearson's correlation between complexity measures, computed from 844 random 12-character long 4-symbol strings.

          K4     K5     K6     K9
K5      0.94
K6      0.92   0.95
K9      0.88   0.92   0.94
Entropy 0.52   0.58   0.62   0.69
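The substance of Table 2 can be reproduced with a few lines of acss code. This is our sketch of the analysis, not the original script: the exact set of 2047 strings is not fully specified above, so we simply use all binary strings of length 2 to 11 (expect a long first call while the data loads, and small differences from rounding):

library("acss")

# All binary strings of length 2 to 11.
strings <- unlist(lapply(2:11, function(l) {
  grid <- expand.grid(rep(list(c("0", "1")), l), stringsAsFactors = FALSE)
  apply(grid, 1, paste, collapse = "")
}))

K <- sapply(c(2, 4, 5, 6, 9), function(a) acss(strings, alphabet = a)[, 1])
colnames(K) <- paste0("K", c(2, 4, 5, 6, 9))

round(cor(cbind(K, Ent = entropy(strings), Change = change_complexity(strings)),
          use = "pairwise.complete.obs"), 2)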
Applications

In this last section, we provide three illustrative applications of ACSS. The first two are short reports of new and illustrative experiments, and the third is a re-analysis of previously published data (Matthews, 2013). Although these experiments are presented to illustrate the use of ACSS, they also provide new insights into subjective probability and the perception of randomness. Note that all the data and analysis scripts for these applications are also part of acss.

Experiment 1: Humans are "better than chance"

Human pseudo-random sequential binary productions have been reported to be overly complex, in the sense that they are more complex than the average complexity of truly random sequences (i.e., sequences of fixed length produced by repeatedly tossing a coin; Gauvrit et al., 2013). Here, we test the same effect with non-binary sequences based on 4 symbols. To replicate the analysis, type ?exp1 at the R prompt after loading acss and execute the examples.

Participants. A sample of 34 healthy adults participated in this experiment. Ages ranged from 20 to 55 (mean = 37.65, SD = 7.98). Participants were recruited via e-mail and did not receive any compensation for their participation.

Methods. Participants were asked to produce at their own pace a series of 10 symbols using "A", "B", "C", and "D" that would "look as random as possible, so that if someone else saw the sequence, she would believe it to be a truly random one". Participants submitted their responses via e-mail.

Results. A one-sample t-test showed that the mean complexity of the participants' responses is significantly larger than the mean complexity of all possible patterns of length 10 (t(33) = 10.62, p < .0001). The violin plot in Figure 4 shows that human productions are more complex than random patterns because humans avoid low-complexity strings. On the other hand, human productions did not reach the highest possible values of complexity.

[Figure 4. Violin plot showing the distribution of complexity of human strings vs. every possible pattern of strings, with 4-symbol alphabet and length 10.]

Discussion. These results are consistent with the hypothesis that when participants try to behave randomly, they in fact tend to maximize the complexity of their responses, leading to overly complex sequences. However, whereas they succeed in avoiding low-complexity patterns, they cannot build the most complex strings.
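A sketch of this analysis in acss code follows; the authoritative commands ship with the package (see ?exp1). The column name exp1$string is an assumption here, and the chance baseline is approximated by a large random sample of length-10 strings rather than by all possible patterns:

library("acss")
data(exp1)  # 34 human-produced strings, length 10, 4-symbol alphabet

human_K <- acss(exp1$string, alphabet = 4)[, "K.4"]  # column name assumed

# Chance baseline: mean complexity of many randomly generated strings
# (an approximation of the mean over all possible patterns).
chance <- sample(c("A", "B", "C", "D"), 10000 * 10, replace = TRUE)
chance_strings <- apply(matrix(chance, ncol = 10), 1, paste, collapse = "")
mu0 <- mean(acss(chance_strings, alphabet = 4)[, "K.4"])

t.test(human_K, mu = mu0)  # one-sample t-test against the chance mean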
Experiment 2: The threshold of complexity – a case study

Humans are sensitive to regularity and distinguish truly random series from deterministic ones (Yamada, Kawabe, & Miyazaki, 2013). More complex strings should be more likely to be considered random than simple ones. Here, we briefly address this question through a binary forced choice task. We assume that there exists an individual threshold of complexity for which the probability that the individual identifies a string as random is .5. We estimated that threshold for one participant. The participant was a healthy adult male, 42 years old. The data and code are available by calling ?exp2.

Methods. A series of 200 random strings of length 10 from an alphabet of 6 symbols, such as "6154256554", were generated with the R function sample(). For each string, the participant had to decide whether or not the sequence appeared random.

Results. A logistic regression of the actual complexities of the strings (K_6) on the responses is displayed in Figure 5. The results showed that more complex sequences were more likely to be considered random (slope = 1.9, p < .0001, corresponding to an odds ratio of 6.69). Furthermore, a complexity of 36.74 corresponded to a subjective probability of randomness of 0.5 (i.e., the individual threshold was 36.74).

[Figure 5. Graphical display of the logistic regression with the actual complexities (K_6) of 200 strings as independent variable and the observed responses (appears random or not) of one participant as dependent variable. The gray area depicts 95%-confidence bands, the black dots at the bottom the 200 complexities. The dotted lines show the threshold where the perceived probability of randomness is 0.5.]

The span of local complexity

In a study of contextual effects in the perception of randomness, Matthews (2013, Experiment 1) showed participants series of binary strings of length 21. For each string, participants had to rate the sequence on a 6-point scale ranging from "definitely random" to "definitely not random". Results showed that participants were influenced by the context of presentation: sequences with a medial alternation rate (AR) were considered highly random when they were intermixed with low-AR sequences, but as relatively non-random when intermixed with high-AR sequences. In the following, we will analyze the data irrespective of the context or AR.

When individuals judge whether a short string of, for example, 3-6 characters is random, they probably consider the complete sequence. For these cases, ACSS would be the right normative measure. When strings are longer, such as a length of 21, individuals probably cannot consider the complete sequence at once. Matthews (2013) and others (e.g., U. Hahn, 2014) have hypothesized that in these cases individuals rely on the local complexity of the string. If this is true, the question remains as to how local the analysis is. To answer this, we will reanalyze Matthews' data. For each string and each span ranging from 3 to 11, we first computed the mean local complexity of the string. For instance, the string "XXXXXXXOOOOXXXOOOOOOO" with span 11 gives a mean local complexity of 29.53. The same string has a mean local complexity of 11.22 with span 5.

> sapply(local_complexity("XXXXXXXOOOOXXXOOOOOOO", 11, 2), mean)
XXXXXXXOOOOXXXOOOOOOO
             29.52912
> sapply(local_complexity("XXXXXXXOOOOXXXOOOOOOO", 5, 2), mean)
XXXXXXXOOOOXXXOOOOOOO
             11.21859

For each span, we then computed R² (the proportion of variance accounted for) between mean local complexity (a formal measure) and the mean randomness score given by the participants in Matthews' (2013) Experiment 1. Figure 6 shows that a span of 4 or 5 best describes the judgments, with R² of 54% and 50%. Furthermore, R² decreases so fast that it amounts to less than 0.002% when the span is set to 10. These results suggest that when asked to judge whether a string is random, individuals rely on very local structural features of the strings, only considering subsequences of 4-5 symbols. This is very near the suggested limit of short term memory of 4 chunks (Cowan, 2001). Future researchers could build on this preliminary account to investigate the possible "span" of human observation in the face of possibly random serial data. The data and code for this application are available by calling ?matthews2013.

[Figure 6. R² between mean local complexity with span 3 to 11 and the subjective mean evaluation of randomness.]
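The span analysis can be sketched as follows; the matthews2013 dataset ships with acss, but the column names string and mean_rating are assumptions here, and the official code is found under ?matthews2013:

library("acss")
data(matthews2013)

r_squared <- sapply(3:11, function(k) {
  # mean local complexity of each length-21 string at span k
  mlc <- sapply(local_complexity(matthews2013$string, alphabet = 2, span = k),
                mean)
  # proportion of variance in mean randomness ratings accounted for
  cor(mlc, matthews2013$mean_rating)^2
})
names(r_squared) <- 3:11
round(r_squared, 3)  # expected to peak around span 4-5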
Relationship to complexity based model selection

As mentioned in the beginning, complexity also plays an important role in modern approaches to model selection, specifically within the minimum description length framework (MDL; Grünwald, 2007; Myung et al., 2006). Model selection in this context (see, e.g., Myung, Cavagnaro, & Pitt, in press) refers to the process of selecting, among a set of candidate models, the model that strikes the best balance between goodness-of-fit (i.e., how well does the model describe the obtained data) and complexity (i.e., how well does the model describe all possible data, or data in general). MDL provides a principled way of combining model fit with a quantification of model complexity that originated in information theory (Rissanen, 1989) and is, similarly to algorithmic complexity, based on the notion of compressed code. The basic idea is that a model can be viewed as an algorithm that, in combination with a set of parameters, can produce a specific prediction. A model that describes the observed data well (i.e., provides a good fit) is a model that can compress the data well, as it only requires a set of parameters and there are few residuals that need to be described in addition. Within this framework, the complexity of a model is the shortest possible code or algorithm that describes all possible data patterns predicted by the model. The model selection index is the length of the concatenation of the code describing parameters and residuals and the code producing all possible data sets. As usual, the best model in terms of MDL is the model with the lowest model selection index.

The MDL approach differs from the complexity approach discussed in the current manuscript in that it focuses on a specific set of models. To be selected as possible candidates, models usually need to satisfy other criteria, such as providing explanatory value, being a priori plausible, and having parameters that are interpretable in terms of psychological processes (Myung et al., in press). In contrast, algorithmic complexity is concerned with finding the shortest possible description considering all possible models, leaving those considerations aside. However, given that both approaches are couched within the same information-theoretic framework, MDL converges towards algorithmic complexity if the set of candidate models becomes infinite (Wallace & Dowe, 1999). Furthermore, in the current manuscript we are only concerned with short strings, whereas even in compressed form the data and the prediction space of a model are usually comparatively large.

Conclusion

Until the development of the coding theorem method (Soler-Toscano et al., 2014; Delahaye & Zenil, 2012), researchers interested in short random strings were constrained to use measures of complexity that focused on particular features of randomness. This has led to a number of (unsatisfactory) measures (Towse, 1998). Each of these previously used measures can be thought of as a way of performing a particular statistical test of randomness. In contrast, algorithmic complexity affords access to the ultimate measure of randomness, as the algorithmic complexity definition of randomness has been shown to be equivalent to defining a random sequence as one which would pass every computable test of randomness. We have computed an approximation of algorithmic complexity for short strings and made this approximation freely available in the R package acss.

Because human capacities are limited, it is unlikely that humans will be able to recognize every kind of deviation from randomness.
Subjective randomness does not equal algorithmic complexity, as Griffiths and Tenenbaum (2004) remind us. Other measures of complexity will still be useful in describing the ways in which human pseudo-random behaviors, or the subjective perception of randomness or complexity, differ from objective randomness, as defined within the mathematical theory of randomness and advocated in this manuscript. But to achieve this goal of comparing subjective and objective randomness, we also need an objective and universal measure of randomness (or complexity) based on a sound mathematical theory of randomness. Although the uncomputability of Kolmogorov complexity places some limitations on what is knowable about objective randomness, ACSS provides a sensible and practical approximation that researchers can use in real life. We are confident that ACSS will prove very useful as a normative measure of complexity, helping psychologists understand how human subjective randomness differs from "true" randomness as defined by algorithmic complexity.

References

Aksentijevic, A., & Gibson, K. (2012). Complexity equals change. Cognitive Systems Research, 15-16, 1-16.
Audiffren, M., Tomporowski, P. D., & Zagrodnik, J. (2009). Acute aerobic exercise and information processing: modulation of executive control in a random number generation task. Acta Psychologica, 132(1), 85-95.
Baddeley, A. D., Thomson, N., & Buchanan, M. (1975). Word length and the structure of short-term memory. Journal of Verbal Learning and Verbal Behavior, 14(6), 575-589.
Barbasz, J., Stettner, Z., Wierzchoń, M., Piotrowski, K. T., & Barbasz, A. (2008). How to estimate the randomness in random sequence generation tasks? Polish Psychological Bulletin, 39(1), 42-46.
Bédard, M.-J., Joyal, C. C., Godbout, L., & Chantal, S. (2009). Executive functions and the obsessive-compulsive disorder: On the importance of subclinical symptoms and other concomitant factors. Archives of Clinical Neuropsychology, 24(6), 585-598.
Bianchi, A. M., & Mendez, M. O. (2013). Methods for heart rate variability analysis during sleep. In Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE (pp. 6579-6582).
Boon, J. P., Casti, J., & Taylor, R. P. (2011). Artistic forms and complexity. Nonlinear Dynamics, Psychology, and Life Sciences, 15(2), 265.
Brandouy, O., Delahaye, J.-P., Ma, L., & Zenil, H. (2012). Algorithmic complexity of financial motions. Research in International Business and Finance, 30(C), 336-347.
Brown, R., & Marsden, C. (1990). Cognitive function in Parkinson's disease: from description to theory. Trends in Neurosciences, 13(1), 21-29.
Calude, C. (2002). Information and randomness. An algorithmic perspective (2nd, revised and extended ed.). Springer-Verlag.
Cardaci, M., Di Gesu, V., Petrou, M., & Tabacchi, M. E. (2009). Attentional vs computational complexity measures in observing paintings. Spatial Vision, 22(3), 195-209.
Chaitin, G. (1966). On the length of programs for computing finite binary sequences. Journal of the ACM, 13(4), 547-569.
Chaitin, G. (2004). Algorithmic information theory (Vol. 1). Cambridge University Press.
Chater, N. (1996). Reconciling simplicity and likelihood principles in perceptual organization. Psychological Review, 103(3), 566-581.
Chater, N., & Vitányi, P. (2003). Simplicity: A unifying principle in cognitive science? Trends in Cognitive Sciences, 7(1), 19-22.
Cilibrasi, R., & Vitányi, P. (2005). Clustering by compression. IEEE Transactions on Information Theory, 51(4), 1523–1545.
Cilibrasi, R., & Vitányi, P. (2007). The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3), 370–383.
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–114.
Crova, C., Struzzolino, I., Marchetti, R., Masci, I., Vannozzi, G., Forte, R., et al. (2013). Cognitively challenging physical activity benefits executive function in overweight children. Journal of Sports Sciences, ahead-of-print, 1–11.
Curci, A., Lanciano, T., Soleti, E., & Rimé, B. (2013). Negative emotional experiences arouse rumination and affect working memory capacity. Emotion, 13(5), 867–880.
Delahaye, J.-P., & Zenil, H. (2012). Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness. Applied Mathematics and Computation, 219(1), 63–77.
Downey, R. G., & Hirschfeldt, D. R. (2008). Algorithmic randomness and complexity. Springer.
Elzinga, C. H. (2010). Complexity of categorical time series. Sociological Methods & Research, 38(3), 463–481.
Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature, 407(6804), 630–633.
Feldman, J. (2003). A catalog of Boolean concepts. Journal of Mathematical Psychology, 47(1), 75–89.
Feldman, J. (2006). An algebra of human concept learning. Journal of Mathematical Psychology, 50(4), 339–368.
Fernández, A., Quintero, J., Hornero, R., Zuluaga, P., Navas, M., Gómez, C., et al. (2009). Complexity analysis of spontaneous brain activity in attention-deficit/hyperactivity disorder: Diagnostic implications. Biological Psychiatry, 65(7), 571–577.
Fernández, A., Ríos-Lago, M., Abásolo, D., Hornero, R., Álvarez-Linera, J., Paul, N., et al. (2011). The correlation between white-matter microstructure and the complexity of spontaneous brain activity: A diffusion tensor imaging-MEG study. Neuroimage, 57(4), 1300–1307.
Fernández, A., Zuluaga, P., Abásolo, D., Gómez, C., Serra, A., Méndez, M. A., et al. (2012). Brain oscillatory complexity across the life span. Clinical Neurophysiology, 123(11), 2154–2162.
Fournier, K. A., Amano, S., Radonovich, K. J., Bleser, T. M., & Hass, C. J. (2013). Decreased dynamical complexity during quiet stance in children with autism spectrum disorders. Gait & Posture.
Free Software Foundation. (2007). GNU General Public License. Retrieved from http://www.gnu.org/licenses/gpl.html
Gauvrit, N., Soler-Toscano, F., & Zenil, H. (2014). Natural scene statistics mediate the perception of image complexity. Visual Cognition, 22(8), 1084–1091.
Gauvrit, N., Zenil, H., Delahaye, J.-P., & Soler-Toscano, F. (2013). Algorithmic complexity for short binary strings applied to psychology: A primer. Behavior Research Methods, 46(3), 732–744.
Griffiths, T. L., & Tenenbaum, J. B. (2003). Probability, algorithmic complexity, and subjective randomness. In R. Alterman & D. Kirsch (Eds.), Proceedings of the 25th annual conference of the Cognitive Science Society (pp. 480–485). Mahwah, NJ: Erlbaum.
Griffiths, T. L., & Tenenbaum, J. B. (2004). From algorithmic to subjective randomness. In S. Thrun, L. K. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems (Vol. 16, pp. 953–960). Cambridge, MA: MIT Press.
Gruber, H. (2010). On the descriptional and algorithmic complexity of regular languages. Justus Liebig University Giessen.
Grünwald, P. D. (2007). The minimum description length principle. MIT Press.
Hahn, T., Dresler, T., Ehlis, A.-C., Pyka, M., Dieler, A. C., Saathoff, C., et al. (2012). Randomness of resting-state brain oscillations encodes Gray's personality trait. Neuroimage, 59(2), 1842–1845.
Hahn, U. (2014). Experiential limitation in judgment and decision. Topics in Cognitive Science, 6(2), 229–244.
Hahn, U., Chater, N., & Richardson, L. B. (2003). Similarity as transformation. Cognition, 87(1), 1–32.
Hahn, U., & Warren, P. A. (2009). Perceptions of randomness: Why three heads are better than four. Psychological Review, 116(2), 454–461.
Heuer, H., Kohlisch, O., & Klein, W. (2005). The effects of total sleep deprivation on the generation of random sequences of key-presses, numbers and nouns. The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 58A(2), 275–307.
Hsu, A. S., Griffiths, T. L., & Schreiber, E. (2010). Subjective randomness and natural scene statistics. Psychonomic Bulletin & Review, 17(5), 624–629.
Jones, O., Maillardet, R., & Robinson, A. (2009). Introduction to scientific programming and simulation using R. Boca Raton, FL: Chapman & Hall/CRC.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. Cambridge University Press.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795. Retrieved from http://www.tandfonline.com/doi/abs/10.1080/01621459.1995.10476572
Kellen, D., Klauer, K. C., & Bröder, A. (2013). Recognition memory models and binary-response ROCs: A comparison by minimum description length. Psychonomic Bulletin & Review, 20(4), 693–719.
Koike, S., Takizawa, R., Nishimura, Y., Marumo, K., Kinou, M., Kawakubo, Y., et al. (2011). Association between severe dorsolateral prefrontal dysfunction during random number generation and earlier onset in schizophrenia. Clinical Neurophysiology, 122(8), 1533–1540.
Kolmogorov, A. (1965). Three approaches to the quantitative definition of information. Problems of Information and Transmission, 1(1), 1–7.
Lai, M.-C., Lombardo, M. V., Chakrabarti, B., Sadek, S. A., Pasco, G., Wheelwright, S. J., et al. (2010). A shift to randomness of brain oscillations in people with autism. Biological Psychiatry, 68(12), 1092–1099.
Levin, L. A. (1974). Laws of information conservation (nongrowth) and aspects of the foundation of probability theory. Problemy Peredachi Informatsii, 10(3), 30–35.
Li, M., & Vitányi, P. (2008). An introduction to Kolmogorov complexity and its applications. Springer-Verlag.
Loetscher, T., & Brugger, P. (2009). Random number generation in neglect patients reveals enhanced response stereotypy, but no neglect in number space. Neuropsychologia, 47(1), 276–279.
Machado, B., Miranda, T., Morya, E., Amaro Jr, E., & Sameshima, K. (2010). P24-23 Algorithmic complexity measure of EEG for staging brain state. Clinical Neurophysiology, 121, S249–S250.
Maes, J. H., Vissers, C. T., Egger, J. I., & Eling, P. A. (2012). On the relationship between autistic traits and executive functioning in a non-clinical Dutch student population. Autism, 17(4), 379–389.
Maindonald, J., & Braun, W. J. (2010). Data analysis and graphics using R: An example-based approach (3rd ed.). Cambridge: Cambridge University Press.
Manktelow, K. I., & Over, D. E. (1993). Rationality: Psychological and philosophical perspectives. Taylor & Francis/Routledge.
Martin-Löf, P. (1966). The definition of random sequences. Information and Control, 9(6), 602–619.
Mathy, F., & Feldman, J. (2012). What's magic about magic numbers? Chunking and data compression in short-term memory. Cognition, 122(3), 346–362.
Matloff, N. (2011). The art of R programming: A tour of statistical software design (1st ed.). San Francisco: No Starch Press.
Matthews, W. (2013). Relatively random: Context effects on perceived randomness and predicted outcomes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(5), 1642–1648.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.
Myung, J. I., Cavagnaro, D. R., & Pitt, M. A. (in press). Model evaluation and selection. In W. H. Batchelder, H. Colonius, E. Dzhafarov, & J. I. Myung (Eds.), New handbook of mathematical psychology, Vol. 1: Measurement and methodology. Cambridge University Press.
Myung, J. I., Navarro, D. J., & Pitt, M. A. (2006). Model selection by normalized maximum likelihood. Journal of Mathematical Psychology, 50(2), 167–179.
Naranan, S. (2011). Historical linguistics and evolutionary genetics, based on symbol frequencies in Tamil texts and DNA sequences. Journal of Quantitative Linguistics, 18(4), 337–358.
Nies, A. (2009). Computability and randomness (Vol. 51). Oxford University Press.
Over, D. E. (2009). New paradigm psychology of reasoning. Thinking & Reasoning, 15(4), 431–438.
Pearson, D. G., & Sawyer, T. (2011). Effects of dual task interference on memory intrusions for affective images. International Journal of Cognitive Therapy, 4(2), 122–133.
Proios, H., Asaridou, S. S., & Brugger, P. (2008). Random number generation in patients with aphasia: A test of executive functions. Acta Neuropsychologica, 6, 157–168.
Pureza, J. R., Gonçalves, H. A., Branco, L., Grassi-Oliveira, R., & Fonseca, R. P. (2013). Executive functions in late childhood: Age differences among groups. Psychology & Neuroscience, 6(1), 79–88.
R Core Team. (2014). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from http://www.R-project.org/
Rado, T. (1962). On non-computable functions. Bell System Technical Journal, 41, 877–884.
Rissanen, J. (1989). Stochastic complexity in statistical inquiry. World Scientific Publishing Co.
Ryabko, B., Reznikova, Z., Druzyaka, A., & Panteleeva, S. (2013). Using ideas of Kolmogorov complexity for studying biological texts. Theory of Computing Systems, 52(1), 133–147.
Scafetta, N., Marchi, D., & West, B. J. (2009). Understanding the complexity of human gait dynamics. Chaos: An Interdisciplinary Journal of Nonlinear Science, 19(2), 026108.
Schnorr, C.-P. (1973). Process complexity and effective random tests. Journal of Computer and System Sciences, 7(4), 376–388.
Schulter, G., Mittenecker, E., & Papousek, I. (2010). A computer program for testing and analyzing random generation behavior in normal and clinical samples: The Mittenecker pointing test. Behavior Research Methods, 42, 333–341.
Scibinetti, P., Tocci, N., & Pesce, C. (2011). Motor creativity and creative thinking in children: The diverging role of inhibition. Creativity Research Journal, 23(3), 262–272.
Shannon, C. E. (1948). A mathematical theory of communication, part I. Bell System Technical Journal, 27, 379–423.
Sokunbi, M. O., Fung, W., Sawlani, V., Choppin, S., Linden, D. E., & Thome, J. (2013). Resting state fMRI entropy probes complexity of brain activity in adults with ADHD. Psychiatry Research: Neuroimaging, 214(3), 341–348.
Soler-Toscano, F., Zenil, H., Delahaye, J.-P., & Gauvrit, N. (2013). Correspondence and independence of numerical evaluations of algorithmic information measures. Computability, 2(2), 125–140.
Soler-Toscano, F., Zenil, H., Delahaye, J.-P., & Gauvrit, N. (2014). Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines. PLOS ONE, 9(5), e96223.
Solomonoff, R. J. (1964a). A formal theory of inductive inference. Part I. Information and Control, 7(1), 1–22.
Solomonoff, R. J. (1964b). A formal theory of inductive inference. Part II. Information and Control, 7(2), 224–254.
Takahashi, T. (2013). Complexity of spontaneous brain activity in mental disorders. Progress in Neuro-Psychopharmacology and Biological Psychiatry, 45, 258–266.
Taufemback, C., Giglio, R., & Da Silva, S. (2011). Algorithmic complexity theory detects decreases in the relative efficiency of stock markets in the aftermath of the 2008 financial crisis. Economics Bulletin, 31(2), 1631–1647.
Towse, J. N. (1998). Analyzing human random generation behavior: A review of methods used and a computer program for describing performance. Behavior Research Methods, 30(4), 583–591.
Towse, J. N., & Cheshire, A. (2007). Random number generation and working memory. European Journal of Cognitive Psychology, 19(3), 374–394.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Wagenaar, W. A. (1970). Subjective randomness and the capacity to generate information. Acta Psychologica, 33, 233–242.
Wallace, C. S., & Dowe, D. L. (1999). Minimum message length and Kolmogorov complexity. The Computer Journal, 42(4), 270–283.
Watanabe, T., Cellucci, C., Kohegyi, E., Bashore, T., Josiassen, R., Greenbaun, N., et al. (2003). The algorithmic complexity of multichannel EEGs is sensitive to changes in behavior. Psychophysiology, 40(1), 77–97.
Wiegersma, S. (1984). High-speed sequential vocal response production. Perceptual and Motor Skills, 59, 43–50.
Wilder, J., Feldman, J., & Singh, M. (2011). Contour complexity and contour detectability. Journal of Vision, 11(11), 1044.
Williams, J. J., & Griffiths, T. L. (2013). Why are people bad at detecting randomness? A statistical argument. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(5), 1473–1490.
Yagil, G. (2009). The structural complexity of DNA templates: Implications on cellular complexity. Journal of Theoretical Biology, 259(3), 621–627.
Yamada, Y., Kawabe, T., & Miyazaki, M. (2013). Pattern randomness aftereffect. Scientific Reports, 3.
Yang, A. C., & Tsai, S.-J. (2012). Is mental illness complex? From behavior to brain. Progress in Neuro-Psychopharmacology and Biological Psychiatry, 45, 253–257.
Zabelina, D. L., Robinson, M. D., Council, J. R., & Bresin, K. (2012). Patterning and nonpatterning in creative cognition: Insights from performance in a random number generation task. Psychology of Aesthetics, Creativity, and the Arts, 6(2), 137–145.
Zenil, H. (2011a). Randomness through computation: Some answers, more questions. World Scientific.
Zenil, H. (2011b). Une approche expérimentale de la théorie algorithmique de la complexité [An experimental approach to the algorithmic theory of complexity]. Unpublished doctoral dissertation, Universidad de Buenos Aires.
Zenil, H., & Delahaye, J.-P. (2010). On the algorithmic nature of the world. In G. Dodig-Crnkovic & M. Burgin (Eds.), Information and computation (pp. 477–496). World Scientific.
Zenil, H., & Delahaye, J.-P. (2011). An algorithmic information theoretic approach to the behaviour of financial markets. Journal of Economic Surveys, 25(3), 431–463.
Zenil, H., Soler-Toscano, F., Delahaye, J., & Gauvrit, N. (2012). Two-dimensional Kolmogorov complexity and validation of the coding theorem method by compressibility. CoRR, abs/1212.6745. Retrieved from http://arxiv.org/abs/1212.6745
