Inferring physical laws by artificial intelligence based causal models

Jorawar Singh,1,∗ Kishor Bharti,2,† and Arvind1,3,‡

1 Department of Physical Sciences, Indian Institute of Science Education and Research (IISER) Mohali, Sector 81 SAS Nagar, Manauli PO 140306 Punjab, India
2 Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Republic of Singapore
3 Punjabi University, Patiala, 147002, Punjab, India

∗ ph19023@iisermohali.ac.in
† kishor.bharti1@gmail.com
‡ arvind@iisermohali.ac.in

The advances in Artificial Intelligence (AI) and Machine Learning (ML) have opened up many avenues for scientific research, and are adding new dimensions to the process of knowledge creation. However, even the most powerful and versatile ML applications to date are primarily in the domain of analysis of associations and boil down to complex data fitting. Judea Pearl has pointed out that Artificial General Intelligence must involve interventions, the acts of doing and imagining. Any machine-assisted scientific discovery must therefore include causal analysis and interventions. In this context, we propose a causal learning model of physical principles, which not only recognizes correlations but also brings out causal relationships. We use the principles of causal inference and interventions to study cause-and-effect relationships in the context of some well-known physical phenomena. We show that this technique can not only figure out associations among data, but is also able to correctly ascertain the cause-and-effect relations amongst the variables, thereby strengthening (or weakening) our confidence in the proposed model of the underlying physical process.

I. INTRODUCTION

Artificial Intelligence (AI), specifically through its Machine Learning (ML) form, has been successfully applied to a wide range of fields including agriculture, social media, gaming, and robotics [1, 2]. ML plays a significant role in autonomous driving, natural language processing, finance, health care, understanding the human genome, manufacturing, energy harvesting, and much more [2, 3].

ML has also lent a hand to the scientific community and has found quite a few applications in scientific research. In physics specifically, ML has been used to explore many-body physics [4, 5], glassy dynamics [6], learning phases of matter [7-9], designing new experiments [10-13], interpreting nature [14, 15], quantum foundations [16], quantum state tomography [17, 18], phase transitions [19, 20], quantum matter [21], Monte Carlo simulation [22], polymer states [23], topological codes [24], the study of black hole detection [25], quantum circuit optimization and control [26, 27], the anti-de Sitter/conformal field theory (AdS/CFT) correspondence [28], quantum state preparation [29, 30], thermodynamics [31], gravitational lenses [32], characterizing the landscape of string theories [33], and wave analysis [34], to name a few. An important aim for machine-assisted scientific discovery was proposed in the seminal work by Iten et al., where they propose a neural network architecture modeled after the human physical reasoning process [35].

The currently prevalent ML architectures primarily identify correlations and associations in data, and thus the models only uncover direct connections in the data. Based on these associations one must learn the causal model, and general AI systems should be able to uncover the underlying causal structures.
Therefore, to fully realize the potential of artificial general intelligence, one needs to incorporate the essence of cognition within the scope of ML. Judea Pearl [36] divides this cognitive ability into three distinct levels, depicted in Fig. 1 and distinguished by the type of query being answered: Association, Intervention, and Counterfactuals. The first level (association) of the ladder described in Fig. 1 involves predictions based on passive observations of the data, i.e., a data-centric search for correlations, associations, regularities, and patterns. This level answers queries pertaining to observations, as to what can be found in the data. The second level (intervention) involves analysis of the response to a change in variables. Rather than just observing the data, one queries the effect of an induced change, and thus one is looking at cause-and-effect relationships among the variables of the data. The final level utilizes the causal structure to estimate portions of data that do not exist or cannot be observed. It answers queries related to the hypothetical questions that one may imagine, the "what if" questions; this level therefore involves counterfactuals.

FIG. 1: The Ladder of Causation depicting the three levels of cognitive ability. Present-day ML is at 'association', the lowest level. A machine capable of understanding causal structures would be placed at the 'intervention' level, while more sophisticated AI will also operate at the level of counterfactuals.

One thus sees that most applications of ML in science are basically at the first level of the ladder. For example, in the context of the spring-mass vibrating system, ML can find the relationship between the length of the spring and the weight attached to it. However, ML models cannot answer the question, "is the change in spring length caused by the change in weight, or vice-versa?"
Causal inference takes us a step above on the ladder of causation and lets us answer such questions. Once armed with the knowledge of causal relations, one can begin exploring the counterfactuals, leading to a framework which then becomes a motif for formulating the laws of nature. Posed a bit differently: "Had the weight on this spring doubled, its length would have doubled as well" (Hooke's law) - Judea Pearl [36].

We begin by studying the basics of causal discovery and causal inference in Section II. In Section III we analyze the causality relations of some physical phenomena. The examples that we consider include tide height, Ohm's law, light dependent resistor (LDR) characteristics, and quantum measurement correlations. Finally, we close with a discussion of the results and possible paths ahead in Section IV.

II. CAUSAL DISCOVERY AND INFERENCE

Causal inference refers to the process of answering questions based on the underlying causal model of the cause-and-effect relationships between different variables of the data. As seen from the ladder of causation (Fig. 1), causality relates to the response to interventions. We do a certain action and observe a certain response.

The limitations of correlations and the importance of causal relations can be easily understood from a simple experiment of the atmospheric pressure reading of a barometer [36]. While there is a direct correlation between the barometer reading and the pressure, this correlation cannot in itself establish the causal relationship. Is it the barometer reading that causes the atmospheric pressure to change, or is it the atmospheric pressure that causes the barometer reading to change? One requires the knowledge of causal relations to conclude that it is the pressure that causes the change in reading, leading to the observed correlation, and not the other way around.

FIG. 2: A directed acyclic graph with nodes representing variables and arrows showing the cause-and-effect relationships between variables.

Statistical algorithms are used to infer the causal structure from observational data. The model is assumed to be acyclic, so that a Directed Acyclic Graph (DAG) can be used to depict the causal relationships, as shown in Fig. 2. The nodes represent the variables and the arrows depict the cause-and-effect relations. The model is considered to be Markovian, where a given node is conditioned on its immediate parents only. The model is assumed to satisfy the conditions of Sufficiency and Faithfulness, which respectively mean that there exists no external common cause to any pair of nodes in the graph, and that all conditional independences (from the underlying distribution) are completely represented in the graph. Most algorithms for causal discovery work with the assumption that statistical independence implies the absence of a causal relation [37]. Specifically, the Peter Spirtes and Clark Glymour (PC) algorithm uses conditional independence testing to generate a DAG from a fully connected graph [38], while the Greedy Equivalence Search (GES) algorithm applies a greedy search in graph space to fill an empty graph while maximizing a fitness measure [39]. Exploiting asymmetries in models, LiNGAM (Linear Non-Gaussian Acyclic Models) prioritizes the models that better fit a linear non-Gaussian relation among the variables [40]. The final goal of the causal discovery process is to arrive at the DAG from the given data set.

Standard statistics works with correlations, which means working with the probability of Y given X, denoted by P(Y|X). Causal inference, on the other hand, works with the probability of Y given that X is done, denoted by P(Y|do(X)) - the do-calculus [36].
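The distinction between seeing, P(Y|X), and doing, P(Y|do(X)), can be made concrete with the barometer example. The sketch below is our own illustrative simulation (not code from the paper's analysis), in which pressure causes the reading: conditioning on a high reading predicts high pressure, but forcing the dial to a high value leaves the pressure untouched.

```python
import random
import statistics

random.seed(1)

def sample(do_reading=None):
    """One draw from the barometer world; optionally intervene on the reading."""
    pressure = random.gauss(1013.0, 10.0)        # the actual cause
    reading = pressure + random.gauss(0.0, 1.0)  # the barometer tracks pressure
    if do_reading is not None:
        reading = do_reading                     # do(reading): forcibly set the dial
    return pressure, reading

n = 20000

# Seeing: among days where the reading is high, pressure is also high.
obs = [sample() for _ in range(n)]
seen = statistics.mean(p for p, r in obs if r > 1020.0)

# Doing: forcing the dial to a high value does not move the pressure.
done = statistics.mean(p for p, r in (sample(do_reading=1030.0) for _ in range(n)))

print(round(seen, 1))  # well above the 1013 baseline
print(round(done, 1))  # near 1013: intervening on the effect leaves the cause alone
```

The asymmetry between the two averages is exactly what the do-operator captures and what a purely associational model cannot express.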
This 'do', though a small change from statistics, is the representation of an intervention. The difference from standard ML predictions is that here we are approximating the effect of a treatment X on the outcome Y based on data that does not exist in the data set.

The basic idea behind causal inference is to estimate the effect of treatment X on the outcome Y while eliminating the dependence on any variable Z that has a direct influence on both the treatment and the outcome (confounding variables). This is schematically explained in Fig. 3. Many methods exist for the estimation of the effect that is produced by doing X. These include observational studies (conducting and simulating randomized experiments), simple natural experiments, instrumental variables (specific causal effect estimation criteria), and refutations. These methods are explained in detail in Reference [41].

FIG. 3: The basic aim of causal inference is to estimate the effect of a treatment X on the outcome Y while controlling for the confounding variables Z.

We use the Causal Discovery Toolkit (CDT) [42] to obtain causal models directly from the data, and DoWhy, "An end-to-end library for causal inference" [43], to carry out the causal analysis. The basic analysis involves the following steps:

1. Creating a causal model: We create an initial model of the phenomenon that we are studying as a Directed Acyclic Graph. The DAG is input into the DoWhy library as a dot graph (a textual representation of the graph using the DOT language) [44]. This initial model is either extracted from the data using CDT or built from domain knowledge.

2. Causal effect identification: Based on the causal model, we identify the causal effects to be estimated using a suitable criterion among the following:

a: Back-door: Controlling for the set of variables that block all the back-door paths between the treatment and the outcome.
A back-door path is any path connecting the treatment to the outcome via an arrow inward on the treatment. In Fig. 4, X ← Z → Y is a back-door path from treatment X to outcome Y. Adjusting for the variable Z gives the back-door criterion:

    P(Y|do(X)) = Σ_z P(Y|X, z) P(z)

FIG. 4: A sample causal model as a DAG. X is the treatment, Y the outcome, Z a confounder, and W a mediator. X ← Z → Y constitutes the backward path, while X → W → Y is the forward path from treatment to outcome.

b: Front-door: Controlling for variables in the forward path from the treatment to the outcome. In Fig. 4, X → W → Y is the front-door path from treatment to outcome. Adjusting for the variables X and W gives the front-door criterion:

    P(Y|do(X)) = Σ_w P(w|X) Σ_x P(Y|x, w) P(x)

c: Instrumental variables [41]: A special case of the front-door criterion, this method helps in identifying the direct causal estimate from X to Y when the back-door criterion fails (e.g., obtaining data on Z is not possible, and hence Z cannot be controlled for). This method can only be applied if there exists a variable which is independent of the confounders of treatment and outcome, has a direct relation with the treatment, and has no direct effect on the outcome, as depicted in Fig. 5.

FIG. 5: Causal model for an instrumental variable. Since W is independent of the confounder Z, has a direct effect on the treatment X, and has no direct effect on the outcome Y, it can be used to estimate the effect of X on Y (given that controlling for Z is not possible).

d: Mediation: This method is applied when the treatment has multiple causal pathways to the outcome, as shown in Fig. 6. It enables us to separate the total effect on Y into direct (X → Y) and indirect (X → W → Y) causal estimates.

FIG. 6: Mediation causal model.
The treatment X has two causal pathways to the outcome Y: direct (X → Y), and indirect (X → W → Y) via the mediator W.

3. Estimate the target estimand: Many statistical methods exist for estimating the identified causal effect. Depending on the identification criterion, one can use linear regression, distance matching, or propensity score stratification [45] for the back-door criterion; the Wald estimator [46] or regression discontinuity [47] for instrumental variables; two-stage linear regression for the front-door criterion; and so on. The estimate is obtained in units of the Average Treatment Effect (ATE), the Average Treatment effect for the Treated (ATT), or the Average Treatment effect for the Controls (ATC).

4. Refute the obtained estimate using multiple robustness checks: Causal models are not absolute, as they cannot be proven to be correct or incorrect. One can, however, increase faith in a model by checking the validity of the assumptions behind the model against various robustness checks, which include:

a: Random Common Cause: Check the variation of the estimate over the addition of an independent random common cause. The lesser the variation, the higher our faith in the model.

b: Placebo Treatment Refuter: Rerun the analysis with an independent random variable as the treatment variable. If the initial treatment is in fact the cause, the new estimate should go to zero.

c: Data Subset Refuter: How much does the estimate vary when only a subset of the data is used? The variation is small for a strong causal relation.

III. EXAMPLES

This section describes our main work, where we have chosen four different examples to build causal models. For each case we consider different possible causal models and evaluate their relative efficacy by employing the methods described above. The examples are chosen from diverse fields.
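Before turning to the examples, the estimate-then-refute loop described above can be sketched end-to-end on synthetic confounded data. The sketch below is a minimal stdlib stand-in of ours, not the DoWhy pipeline itself: it applies the back-door adjustment by stratifying on a binary confounder Z, and then runs a placebo-treatment check.

```python
import random
import statistics

random.seed(2)

def ate_backdoor(data):
    """Back-door adjustment: average the per-stratum effect of X on Y over P(z)."""
    est = 0.0
    for z in (0, 1):
        stratum = [(x, y) for x, y, zz in data if zz == z]
        y1 = statistics.mean(y for x, y in stratum if x == 1)
        y0 = statistics.mean(y for x, y in stratum if x == 0)
        est += (y1 - y0) * len(stratum) / len(data)
    return est

# Synthetic world: Z confounds both X and Y; the true effect of X on Y is 3.
data = []
for _ in range(20000):
    z = random.randint(0, 1)
    x = 1 if random.random() < (0.8 if z else 0.2) else 0
    y = 3.0 * x + 2.0 * z + random.gauss(0.0, 1.0)
    data.append((x, y, z))

# Naive association overstates the effect because Z is not controlled for.
naive = (statistics.mean(y for x, y, z in data if x == 1)
         - statistics.mean(y for x, y, z in data if x == 0))
adjusted = ate_backdoor(data)

# Placebo-treatment refuter: a random coin in place of X should yield ~0.
placebo = ate_backdoor([(random.randint(0, 1), y, z) for x, y, z in data])

print(round(naive, 2))     # inflated by the confounder (above 3)
print(round(adjusted, 2))  # close to the true effect of 3
print(round(placebo, 2))   # close to 0
```

The same three quantities correspond, respectively, to an association-level estimate, step 3 of the pipeline, and refutation check (b) of step 4.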
The first example, tides and the cause of their varying height over the year, is about a natural phenomenon where the data is taken from documented sources. The second example is about a physics model involving Ohm's law and the direct and indirect dependence of current on various possible parameters. The third example is about an actual experiment where we collect data for a light dependent resistor (LDR) and consider various possible causal models for it, which we evaluate and compare using data and domain knowledge. In the last example we consider quantum correlations in the two-party, two-value setting, and ask the question as to what is the most plausible cause of these non-trivial quantum correlations.

A. Height of Tides

It is a well known fact that tides, the rise and fall of sea levels, are a cumulative effect of the Sun's and Moon's gravitational forces on Earth, among other minor factors. We ask a ML model which of the two, Sun or Moon, plays the bigger role in determining the maximum height of the tide on a given day. To that end, we prepare a data set with the daily Earth-Sun distance (in astronomical units, AU), the Earth-Moon distance (in AU), and the maximum height of the tide at four different locations: Honolulu (Hawaii), Mumbai (India), Liverpool (England), and Halifax (Canada).

We collected the year-round data of Earth-Moon distance, Earth-Sun distance, and tide height from documented sources. The Earth-Sun distance for a given day of the year is obtained from the csv file available on the USGS webpage; a sample of the dataset is shown in Fig. 7. The Earth-Moon distance is extracted from the IMCCE VIRTUAL OBSERVATORY; Fig. 8 shows a sample of the table generated at the webpage. The tide-height data is available as a PDF file on the NOAA website; Fig. 9 shows a sample of the PDF. On any given day, the tide heights were recorded 3-4 times; we used the maximum value of the height (in ft) for a given day.

FIG. 7: Sample Earth-Sun distance data. The data includes the day of the year (DOY) and the Earth-Sun distance d (in AU).

FIG. 8: Sample Earth-Moon distance data, 2019. The website generates the ephemeris data for the Moon.

FIG. 9: Sample tidal data for Liverpool, England, 2019. Each box of a given date records the time (in hours and minutes) and the height of high and low tides (in feet and cm).

We prepared the models as described in Fig. 10 and computed two causal estimates with tide height as the outcome. We used the Earth-Moon distance as the target for the first estimate and the Earth-Sun distance for the second.

FIG. 10: Causal diagrams for tide height: (a) from the LiNGAM algorithm in CDT; (b) from domain knowledge. d_ES and d_EM represent the Earth-Sun and Earth-Moon distances respectively.

The causal diagram predicted from data alone gives us only the d_EM → h causal relation, which is in fact the most significant one. The estimates (in ATE) from the predicted and ground-truth models differ marginally: -2964.45 and -2913.16 respectively (for Halifax). The causal estimates for the Earth-Moon and Earth-Sun distances (in ATE) for the ground-truth model are listed in Table I.

                              Estimate (ATE)
Causal Relation    Halifax     Liverpool    Honolulu    Mumbai
d_EM → h          -2913.16    -10045.83    -1205.91    -7232.59
d_ES → h             -2.20        -8.62       -3.34      -22.15

TABLE I: Causal estimates for d_EM → h and d_ES → h at the four locations. The ATE values are shown for Earth-Moon and Earth-Sun for all four locations.

It is clearly visible from the estimates, and from the causal diagram obtained from data, that the Earth-Moon distance is the primary cause of the tide height.

B. Ohm's law

In this example, we look for the driving forces (cause) of the current I in a wire of length L, cross-sectional area A, and resistivity ρ, at a temperature T, with a potential V applied across its ends. Using causal analysis one can test the validity of a given cause-and-effect relation.
To that end, we consider and check the validity of a model with a direct T → I arm added in addition to the known dependence of I on T via R. The different causal models that we evaluate are depicted in Fig. 12.

Using the known relations (Eq. (1)) between current and voltage, and the temperature dependence of resistance, we generate the required data. We use platinum as the material for our constants (α, ρ_0). Fig. 11 shows a sample of the input data.

    V = I R
    R = ρ_t L / A                      (1)
    ρ_t = ρ_0 (1 + α ΔT)

FIG. 11: Sample of the data set used in the analysis. Current I resulting from potential V applied across a wire of length L and resistance R (resistivity ρ, cross-sectional area A) at temperature T.

The candidate causal models depicted in Fig. 12 are evaluated, and estimates in terms of ATE values are computed and tabulated in Table II. We see that the major driving force is the potential V, with the resistance showing an inverse relation, as expected.

FIG. 12: Causal diagrams for Ohm's law, for a wire with potential V across its length L, resistivity ρ, and cross-sectional area A at temperature T: (a) including a faux common cause (T); (b) false-relation model, rejected while refuting; (c) true-relation model, no refutations.

We observe that the effect of T on I is not only non-zero, but equivalent to that of R. The fact that this effect follows not from the direct T → I path, but from the T → ρ → R → I path, is confirmed by estimating the same effect of T on I using a causal model which does not have the T → I path (Fig. 12c). We get the same ATE value of 0.218.

One can also check the effect by removing the other branch, T → ρ (Fig. 12b). This results in an estimate (ATE) of 1.35, but during the placebo treatment refutation the new estimated effect, which should be 0, comes out to be -10.54 in ATE terms, showing that this model is less trustworthy.
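A data set of this kind can be regenerated directly from Eq. (1). The sketch below is illustrative: ρ_0 and α are textbook values for platinum, while the sampling ranges for V, L, A, and ΔT are our own assumptions, not the paper's.

```python
import random

random.seed(3)

RHO_0 = 1.06e-7   # platinum resistivity near room temperature, ohm*m (textbook value)
ALPHA = 3.92e-3   # platinum temperature coefficient of resistivity, 1/K (textbook value)

rows = []
for _ in range(1000):
    V = random.uniform(0.1, 10.0)       # applied potential, V (assumed range)
    L = random.uniform(0.5, 2.0)        # wire length, m (assumed range)
    A = random.uniform(1e-7, 1e-6)      # cross-sectional area, m^2 (assumed range)
    dT = random.uniform(-20.0, 80.0)    # temperature offset, K (assumed range)
    rho_t = RHO_0 * (1.0 + ALPHA * dT)  # rho_t = rho_0 (1 + alpha dT)
    R = rho_t * L / A                   # R = rho_t L / A
    I = V / R                           # V = I R
    rows.append((V, L, A, dT, rho_t, R, I))
```

By construction, T influences I only through ρ_t and R, which is exactly the structure of the true-relation model (Fig. 12c); the faux T → I arm of model (a) adds nothing to the generating process.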
                      Estimate (ATE)
Causal Relation   Model A   Model B   Model C
V → I              1.735     1.735     1.735
R → I             -0.205    -0.225    -0.205
T → I              0.218     1.35      0.218

TABLE II: Causal estimates for the different causal relations in the three models of Ohm's law.

C. Power and LDR Resistance

Next we perform causal analysis of real data obtained from an experiment. A light emitting diode (LED) light source runs on a battery at voltage V and draws current I. The light emitted by the LED shines on a light dependent resistor (LDR), and this provides power P to the LDR, thereby changing its resistance R. The circuit is described in Fig. 13.

FIG. 13: Circuit diagram for the LDR experiment. The LED and LDR are placed in a closed box at a fixed distance from each other. The LED is supplied with a variable voltage. The voltmeter measures the voltage across the LED, the ammeter measures the current through the LED, and the ohmmeter measures the resistance of the LDR. The experiment is repeated with a flux meter in place of the LDR to obtain power readings.

FIG. 14: Causal diagrams for the LDR experiment: (a) from the LiNGAM algorithm in CDT; (b) from domain knowledge.

Voltage   Current   Power   Resistance
(V)       (mA)      (lux)   (kΩ)
2.67      100.3        5    37.000
2.90      104.7        9    25.400
3.17      109.9       15    17.800
3.68      119.2       36     6.800
3.84      122.2       47     5.790
...        ...       ...      ...
7.06      172.0      847     0.591
7.22      174.1      923     0.558
7.70      179.2     1158     0.459
7.86      181.5     1266     0.435
8.00      183.6     1386     0.413

TABLE III: Sample data from the experiment. At each voltage setting, the current through the LED is measured in mA and the LDR's resistance is measured in kΩ. The experiment is repeated with the LDR replaced by a flux meter to measure power.

The model obtained from data (Fig. 14a) suggests the potential V as the cause of both the power P and the current I, and finds no direct causal relation between the power P and the LDR resistance R. The model, as expected, provides only the most significant cause-effect relations (Table IV).
We know, as depicted in the domain knowledge model (Fig. 14b), that the current I acts as a mediator for V's effect on P. This becomes clear when we compare refutations of models with and without the I → P arm. The refutations suggest that we put more faith in the models with the I → P arm (average p-value 0.912) than in the ones without this arm (average p-value 0.882) (Table V). The analysis also suggests that we put more faith in the model which includes the P → R arm over the one that does not contain this arm.

Model                     Estimates (ATE)
                   V → P     V → I     P → R
Data               251.533   15.42     -
Domain Knowledge   251.533   15.41     -0.008

TABLE IV: Causal estimates of three causal relations for the data and domain knowledge models in the LDR experiment.

                P → R present   P → R absent
I → P present       0.928           0.896
I → P absent        0.897           0.867

TABLE V: Confidence levels of the V → P causal estimate for the different causal models in the LDR experiment.

D. Measurement correlation and quantum entanglement

The last example we choose is from the domain of quantum mechanics. Quantum states of composite systems can show peculiar kinds of correlations. We analyze these correlations from the point of view of constructing a causal model.

A quantum spin-half particle is a two-level quantum system, with its state space consisting of normalized densities over a two-dimensional complex linear vector space [48]. The measurables for each particle are the spin components in any direction; a spin component takes the two values 'up' (1) or 'down' (0) when measured, these being the eigenvalues of the corresponding Hermitian operator. For example, if we are measuring the z component of the spin, the corresponding observable is the Pauli matrix σ_z. The scenario that we consider consists of two spin-half particles which are in a joint quantum state ρ.
Alice and Bob are two observers with the capability of measuring spin components; the first particle is accessible to Alice while the second is accessible to Bob. The scenario is schematically depicted in Fig. 15.

Consider the case where both Alice and Bob measure the spin of their respective particle along the z-axis, which corresponds to measuring the operator σ_z in the appropriate state space. For each, the possible outcomes are 0 or 1. Therefore, the joint measurement outcome for the composite system will be in the set {00, 01, 10, 11}. One can compare this situation to that of tossing two coins in the classical domain, where the outcome set is the same and, if the coins are unbiased, the probability of each outcome is equal.

Quantum states have a property called quantum entanglement [49], which is considered to be responsible for the unusual correlation properties of composite quantum systems. The entanglement can be mathematically computed from the given density operator and can be quantified via a measure called log-negativity [50]. For certain maximally entangled states the outcomes can be such that they always fall either in the set {01, 10} or in the set {00, 11}, i.e., the outcomes are always (anti)correlated. This scenario is schematically described in Fig. 15.

FIG. 15: Alice and Bob with a shared quantum state ρ of two spin-half particles. Each measures the spin of their particle along the z-axis and gets one of two possible outcomes (0 or 1).

The data set that we analyze is generated by simulating the measurement setup between Alice and Bob. A state ρ of the composite quantum system is generated randomly. On this randomly generated state, both Alice and Bob perform a σ_z measurement. They repeat these measurements on the state 100 times, and these 100 measured values are used to compute the correlation.
The entanglement is computed mathematically from the state density matrix by evaluating the log-negativity. This process is repeated for another randomly generated composite state of the two spins. Twenty such random states are chosen, and thus a data set with 2000 rows is generated, with 100 rows corresponding to each random state ρ. The data set is schematically described in Table VI. As can be seen, for each ρ we have 100 rows which are used to calculate the correlations, with the mathematically computed log-negativities documented in the second column.

Where lies the cause of the correlation between the measured values of Alice and Bob? The initial causal discovery attempts failed to reveal any relations between the variables. Upon further investigation into the data, we find that the present scenario is a special case where the variables of interest, though causally linked (as we know from domain knowledge), have zero correlation between them. Entanglement ranges from 0 to 1, while correlation ranges from -1 to +1, creating a case where the average correlation between these two variables is (very close to) zero. Tackling such cases involves looking at the causal relations among functions of the involved variables. We take the absolute value of the correlation as the second variable of interest and continue with the analysis. Similar to the tide-height example, the predicted causal diagram only shows the most significant cause-effect relation.

                         State   Entanglement   M_A   M_B   Correlation
instance 1 (100 rows)    ρ_1     E_1            +1    -1    C_1
                         ..      ...            ..    ..    ...
                         ρ_1     E_1            -1    -1    C_1
instance 2 (100 rows)    ρ_2     E_2            +1    +1    C_2
                         ..      ...            ..    ..    ...
                         ρ_2     E_2            -1    +1    C_2
                         ..      ...            ..    ..    ...
instance n (100 rows)    ρ_n     E_n            -1    +1    C_n
                         ..      ...            ..    ..    ...
                         ρ_n     E_n            +1    +1    C_n

TABLE VI: Structural setup of the data generated in the simulation of Alice and Bob's σ_z measurements on two entangled spin-half particles. A single instance is 100 samples of measurements performed by Alice (M_A) and Bob (M_B) on the shared state ρ. The correlation value C is evaluated from these 100 samples.

FIG. 16: Causal diagrams for the correlation between Alice and Bob's measurement outcomes (M_A and M_B respectively): (a) from the LiNGAM algorithm in CDT; (b) from domain knowledge (with an additional assumption).

With M_A as the treatment and the correlation (C) as the outcome (while accounting for the entanglement (E) as a common cause), we get a causal estimate of -0.0002 ATE. Similarly, we get -0.0024 ATE for M_B (Table VII). The estimate with the entanglement E as the treatment is 0.3733 ATE. This shows that the machine puts more faith in the model that has the entanglement (E) as the underlying variable causing the correlation (C) between M_A and M_B, over the model that assumes either M_A or M_B as the cause of C. The ML analysis therefore confirms the fact that we know from domain knowledge.

This conclusion is further strengthened by the results of refuting the causal model in both of the above scenarios. For example, the placebo treatment refutation gives a confidence of 94% (p-value: 0.94) in the former model, and a p-value of ∼0.8 for the latter.

Causal Relation   Estimate (ATE)   Confidence (p-value)
M_A → C           -0.0002          0.82
M_B → C           -0.0024          0.72
E → C              0.3733          0.94

TABLE VII: Causal estimates for measurement correlation and entanglement. The p-values are averaged over three different refutation methods.

IV. DISCUSSION AND FUTURE WORK

While standard AI and ML based techniques have an outstanding performance in association-level tasks, they are unable to provide answers to basic queries of cause and effect.
One requires the use of causal diagrams and causal inference to equip the machine with this capability. Association-level inference remains possible even for a machine that understands cause-and-effect relations. Section III A shows that, using the causal analysis framework, one can infer that while both the Sun's and the Moon's gravitational pulls affect the tides on Earth, the Earth-Moon distance is the major cause of the height of tides. The advantage of causal models over simple associative models is seen in Section III B. Not only does the machine estimate the potential to be the primary cause of the current, it refutes the incorrect assumption of the temperature directly affecting the current. The LDR experiment analysis shows that, while the conclusions are not 100% accurate, using causal discovery to infer causal relations from experimental data can hint at where the focus in the experiment should be. In the problem related to the cause of correlations between quantum measurements, we observe that the machine is able to figure out that the underlying cause of the correlation between the measurement outcomes of Alice and Bob is the quantum entanglement.

Causal theory is still in its initial stages of development and is therefore in no way foolproof. There exist quite a few different algorithms for causal discovery, and there is no guarantee that the outcomes of one will agree with the outcomes of another. The approach is data-centric and does not always yield relations that make sense. Nonetheless, having an initial estimate of a causal model helps speed up the process. One can always fine-tune the estimates and relations using domain knowledge. Adding the layer of causal analysis can deepen the understanding of the phenomena and processes involved. The authors in [51] present a physics-inspired symbolic regression ML algorithm for discovering expressions/equations from data alone.
One can explore the advantage of incorporating causal inference into such ML applications.

ACKNOWLEDGMENTS

J.S. would like to thank Amitoj Kaur Chandi (@nicknaysayer) for Fig. 1 and Dr. Paramdeep Singh for help with the experimental setup for Section III C. J.S. acknowledges IISER Mohali for financial support.

[1] Konstantinos Liakos, Patrizia Busato, Dimitrios Moshou, Simon Pearson, and Dionysis Bochtis, "Machine learning in agriculture: A review," Sensors 18, 2674 (2018).
[2] Sheena Angra and Sachin Ahuja, "Machine learning and its applications: A review," in 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC) (2017) pp. 57-60.
[3] Derrick Mwiti, "10 real-life applications of reinforcement learning," (2023).
[4] Giuseppe Carleo and Matthias Troyer, "Solving the quantum many-body problem with artificial neural networks," Science 355, 602-606 (2017).
[5] Naeimeh Mohseni, Thomas Fösel, Lingzhen Guo, Carlos Navarrete-Benlloch, and Florian Marquardt, "Deep Learning of Quantum Many-Body Dynamics via Random Driving," Quantum 6, 714 (2022).
[6] S. S. Schoenholz, E. D. Cubuk, D. M. Sussman, E. Kaxiras, and A. J. Liu, "A structural approach to relaxation in glassy liquids," Nature Physics 12, 469-471 (2016).
[7] Juan Carrasquilla and Roger G. Melko, "Machine learning phases of matter," Nature Physics 13, 431-434 (2017).
[8] Kelvin Ch'ng, Juan Carrasquilla, Roger G. Melko, and Ehsan Khatami, "Machine learning phases of strongly correlated fermions," Phys. Rev. X 7, 031038 (2017).
[9] Simone Tibaldi, Giuseppe Magnifico, Davide Vodola, and Elisa Ercolessi, "Unsupervised and supervised learning of interacting topological phases from single-particle correlation functions," SciPost Phys. 14, 005 (2023).
[10] Mario Krenn, Mehul Malik, Robert Fickler, Radek Lapkiewicz, and Anton Zeilinger, "Automated search for new quantum experiments," Phys. Rev. Lett.
116, 090405 (2016).
[11] Xiaoqin Gao, Manuel Erhard, Anton Zeilinger, and Mario Krenn, "Computer-inspired concept for high-dimensional multipartite quantum gates," Phys. Rev. Lett. 125, 050501 (2020).
[12] Amin Babazadeh, Manuel Erhard, Feiran Wang, Mehul Malik, Rahman Nouroozi, Mario Krenn, and Anton Zeilinger, "High-dimensional single-photon quantum gates: Concepts and experiments," Phys. Rev. Lett. 119, 180510 (2017).
[13] Manuel Erhard, Mehul Malik, Mario Krenn, and Anton Zeilinger, "Experimental greenberger-horne-zeilinger entanglement beyond qubits," Nature Photonics 12, 759-764 (2018).
[14] Pascal Friederich, Mario Krenn, Isaac Tamblyn, and Alán Aspuru-Guzik, "Scientific intuition inspired by machine learning-generated hypotheses," Machine Learning: Science and Technology 2, 025027 (2021).
[15] Daniel Flam-Shepherd, Tony C. Wu, Xuemei Gu, Alba Cervera-Lierta, Mario Krenn, and Alán Aspuru-Guzik, "Learning interpretable representations of entanglement in quantum optics experiments using deep generative models," Nature Machine Intelligence 4, 544-554 (2022).
[16] Kishor Bharti, Tobias Haug, Vlatko Vedral, and Leong-Chuan Kwek, "How to teach ai to play bell non-local games: Reinforcement learning," (2019), arXiv:1912.10783 [quant-ph].
[17] Giacomo Torlai, Guglielmo Mazzola, Juan Carrasquilla, Matthias Troyer, Roger Melko, and Giuseppe Carleo, "Neural-network quantum state tomography," Nature Physics 14, 447-450 (2018).
[18] Matthew J. S. Beach, Isaac De Vlugt, Anna Golubeva, Patrick Huembeli, Bohdan Kulchytskyy, Xiuzhe Luo, Roger G. Melko, Ejaaz Merali, and Giacomo Torlai, "QuCumber: wavefunction reconstruction with neural networks," SciPost Phys. 7, 009 (2019).
[19] Lei Wang, "Discovering phase transitions with unsupervised learning," Phys. Rev. B 94, 195105 (2016).
[20] Wenjian Hu, Rajiv R. P. Singh, and Richard T.
Scalettar, "Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination," Phys. Rev. E 95, 062122 (2017).
[21] Juan Carrasquilla, "Machine learning for quantum matter," Advances in Physics: X 5, 1797528 (2020).
[22] Li Huang and Lei Wang, "Accelerated monte carlo simulations with restricted boltzmann machines," Phys. Rev. B 95, 035105 (2017).
[23] Qianshi Wei, Roger G. Melko, and Jeff Z. Y. Chen, "Identifying polymer states by machine learning," Phys. Rev. E 95, 032504 (2017).
[24] Giacomo Torlai and Roger G. Melko, "Neural decoder for topological codes," Phys. Rev. Lett. 119, 030501 (2017).
[25] B. P. Abbott, R. Abbott, et al. (LIGO Scientific Collaboration and Virgo Collaboration), "Observation of gravitational waves from a binary black hole merger," Phys. Rev. Lett. 116, 061102 (2016).
[26] Thomas Fösel, Murphy Yuezhen Niu, Florian Marquardt, and Li Li, "Quantum circuit optimization with deep reinforcement learning," (2021), arXiv:2103.07585 [quant-ph].
[27] Thomas Fösel, Petru Tighineanu, Talitha Weiss, and Florian Marquardt, "Reinforcement learning with neural networks for quantum feedback," Phys. Rev. X 8, 031084 (2018).
[28] Koji Hashimoto, Sotaro Sugishita, Akinori Tanaka, and Akio Tomiya, "Deep learning and the ads-cft correspondence," Phys. Rev. D 98, 046019 (2018).
[29] Marin Bukov, "Reinforcement learning for autonomous preparation of floquet-engineered states: Inverting the quantum kapitza oscillator," Phys. Rev. B 98, 224305 (2018).
[30] Marin Bukov, Alexandre G. R. Day, Dries Sels, Phillip Weinberg, Anatoli Polkovnikov, and Pankaj Mehta, "Reinforcement learning in different phases of quantum control," Phys. Rev. X 8, 031086 (2018).
[31] Giacomo Torlai and Roger G. Melko, "Learning thermodynamics with boltzmann machines," Phys. Rev. B 94, 165134 (2016).
[32] Yashar D.
Hezaveh, Laurence Perreault Levasseur, and Philip J. Marshall, "Fast automated analysis of strong gravitational lenses with convolutional neural networks," Nature 548, 555-557 (2017).
[33] Jonathan Carifio, James Halverson, Dmitri Krioukov, and Brent D. Nelson, "Machine learning in the string landscape," Journal of High Energy Physics 2017 (2017), 10.1007/jhep09(2017)157.
[34] Rahul Biswas, Lindy Blackburn, Junwei Cao, Reed Essick, Kari Alison Hodge, Erotokritos Katsavounidis, Kyungmin Kim, Young-Min Kim, Eric-Olivier Le Bigot, Chang-Hwan Lee, John J. Oh, Sang Hoon Oh, Edwin J. Son, Ye Tao, Ruslan Vaulin, and Xiaoge Wang, "Application of machine learning algorithms to the study of noise artifacts in gravitational-wave data," Phys. Rev. D 88, 062003 (2013).
[35] Raban Iten, Tony Metger, Henrik Wilming, Lídia del Rio, and Renato Renner, "Discovering physical concepts with neural networks," Phys. Rev. Lett. 124, 010508 (2020).
[36] Judea Pearl and Dana Mackenzie, The book of why: The new science of cause and effect (Basic Books, 2020).
[37] Clark Glymour, Kun Zhang, and Peter Spirtes, "Review of causal discovery methods based on graphical models," Frontiers in Genetics 10 (2019), 10.3389/fgene.2019.00524.
[38] Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, and Peter Bühlmann, "Causal inference using graphical models with the r package pcalg," Journal of Statistical Software 47, 1-26 (2012).
[39] David Maxwell Chickering, "Optimal structure identification with greedy search," J. Mach. Learn. Res. 3, 507-554 (2003).
[40] Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen, "A linear non-gaussian acyclic model for causal discovery," J. Mach. Learn. Res. 7, 2003-2030 (2006).
[41] Emre Kiciman and Amit Sharma, "Methods for causal inference," (2018).
[42] Diviyan Kalainathan, Olivier Goudet, and Ritik Dutta, "Causal discovery toolbox: Uncovering causal relationships in python," J. Mach. Learn. Res. 21 (2020).
[43] Amit Sharma and Emre Kiciman, "Dowhy: An end-to-end library for causal inference," arXiv preprint arXiv:2011.04216 (2020), Causal Data Science Meeting (https://causalscience.org/).
[44] Emden Gansner, Eleftherios Koutsofios, and Stephen North, Drawing graphs with dot, Tech. Rep. (AT&T Research, 2006).
[45] Paul R. Rosenbaum and Donald B. Rubin, "The central role of the propensity score in observational studies for causal effects," Biometrika 70, 41-55 (1983).
[46] Abraham Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Transactions of the American Mathematical Society 54, 426-482 (1943).
[47] Donald L. Thistlethwaite and Donald T. Campbell, "Regression-discontinuity analysis: An alternative to the ex post facto experiment," Journal of Educational Psychology 51, 309-317 (1960).
[48] Shinil Cho, "Two-level quantum systems," in Quantum Computation and Quantum Information Simulation using Python, 2053-2563 (IOP Publishing, 2022) pp. 1-1 to 1-14.
[49] Ryszard Horodecki, Paweł Horodecki, Michał Horodecki, and Karol Horodecki, "Quantum entanglement," Rev. Mod. Phys. 81, 865-942 (2009).
[50] Martin B. Plenio and Shashank Virmani, "An introduction to entanglement measures," Quantum Info. Comput. 7, 1-51 (2007).
[51] Silviu-Marian Udrescu and Max Tegmark, "Ai feynman: A physics-inspired method for symbolic regression," Science Advances 6, eaay2631 (2020).

Appendix A: Causal Analysis - DoWhy

DoWhy is "An end-to-end library for causal inference" developed by Microsoft. It abstracts the entire process of causal analysis into a four-step process: model, identify, estimate, and refute.
Where the general trend of causal inference is the estimation of a model parameter such as the coefficient of a linear regression, DoWhy provides a do-sampler that estimates the distribution P(Y | do(X = x)). This enables us to compute statistics other than the difference in average outcomes between test and control. Considering the example of tide heights:

1. Model

One can provide a model as a digraph from knowledge of the subject:

    import dowhy
    import graphviz
    import networkx as nx
    import numpy as np
    from IPython.display import Image, display

    causal_graph = '''digraph {
        EMd [label="Earth-Moon distance"];
        ESd [label="Earth-Sun distance"];
        h [label="height of the tide"];
        ESd -> EMd -> h;
        ESd -> h;
    }'''

    model = dowhy.CausalModel(
        data=dataset_halifax,
        graph=causal_graph.replace("\n", " "),
        treatment="EMd",
        outcome="h",
        common_causes="ESd",
    )
    model.view_model()
    display(Image(filename="causal_model.png"))

Or one can obtain the model from the data itself using causal discovery algorithms (CDT package):

    from cdt.causality.graph import LiNGAM

    labels = list(dataset_halifax.columns)
    predicted_graph = LiNGAM().predict(dataset_halifax)
    # nx.to_numpy_matrix was removed in NetworkX 3; use nx.to_numpy_array there
    adj_matrix = np.asarray(nx.to_numpy_matrix(predicted_graph))
    idx = np.abs(adj_matrix) > 0.01
    dirs = np.where(idx)
    graph = graphviz.Digraph(engine="dot")
    for name in labels:
        graph.node(name)
    for edge_to, edge_from, value in zip(dirs[0], dirs[1], adj_matrix[idx]):
        graph.edge(labels[edge_from], labels[edge_to], label=str(value))
    display(graph)

Result: (a) From LiNGAM algorithm in CDT. (b) From Domain Knowledge.

2. Identify

    import statsmodels

    identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
    print(identified_estimand)

Result:

    Estimand type: nonparametric-ate

    ### Estimand : 1
    Estimand name: backdoor
    Estimand expression:
        d
    ---------(Expectation(h))
     d[EMd]
    Estimand assumption 1, Unconfoundedness: If U -> {EMd} and U -> h
    then P(h | EMd, ESd, U) = P(h | EMd, ESd)

    ### Estimand : 2
    Estimand name: iv
    No such variable found!

    ### Estimand : 3
    Estimand name: frontdoor
    No such variable found!

3. Estimate

    estimate = model.estimate_effect(
        identified_estimand,
        method_name="backdoor.linear_regression",
        control_value=0,
        treatment_value=1,
        confidence_intervals=True,
        test_significance=True,
    )
    print("Estimate:", estimate.value)

Result:

    Estimate: -2913.157882942987

4. Refute

    # Placebo Treatment Refuter: randomly assigns any covariate as a
    # treatment and re-runs the analysis. If our assumptions were
    # correct, this newly found estimate should go to 0.
    refute = model.refute_estimate(
        identified_estimand,
        estimate,
        method_name="placebo_treatment_refuter",
    )
    print(refute)

Result:

    Refute: Use a Placebo Treatment
    Estimated effect: -2913.157882942987
    New effect: 0.0
    p value: 1.0
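As an illustration of the kind of data summarized in Table VI, the entanglement-versus-correlation dataset can be simulated in a few lines. This is a minimal sketch of our own, not the authors' code; it assumes the entangled instances are in the singlet state (perfectly anticorrelated σ_z outcomes), and all variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def instance_correlation(entangled, n_samples=100):
    # One "instance": 100 joint sigma_z measurements by Alice and Bob.
    # Singlet-state outcomes are perfectly anticorrelated; for an
    # unentangled product state the outcomes are independent.
    m_a = rng.choice([-1, 1], size=n_samples)          # Alice's outcomes
    m_b = -m_a if entangled else rng.choice([-1, 1], size=n_samples)
    return np.mean(m_a * m_b)                          # correlation C

# Dataset: entanglement flag E and measured correlation C per instance
E = np.array([0, 1] * 50)
C = np.array([instance_correlation(bool(e)) for e in E])

print(C[E == 1].mean())   # exactly -1.0 for the singlet instances
print(C[E == 0].mean())   # close to 0 for the product-state instances
```

A dataset of this shape (E as treatment, C as outcome) is what the four-step DoWhy pipeline above operates on when estimating the E -> C effect.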