Sabrina: Modeling and Visualization of Economy Data with Incremental Domain Knowledge

Models of financial behavior can vary greatly, since they typically depend on domain knowledge, make use of proprietary data or observations, or are bound to particular analyses. We are not concerned with the economic aspects themselves, but assume that such domain knowledge is available as an input to the system. Sabrina supports any type of model data, but it is built on a methodology in which the financial analyst can infer a model by incrementally introducing domain knowledge or by integrating models obtained from other sources. To leverage the knowledge of domain experts and users, we treat the problem of inferring a monetary outflow model as a constraint satisfaction problem, where the constraints encode domain knowledge. A model that satisfies the constraints is a valid model. This approach fits our general context particularly well: (i) domain knowledge encoded as constraints may be updated or expanded at will, (ii) users may introduce their own knowledge for their specific analysis case, and (iii) because constraints are introduced dynamically, the approach is independent of particular financial modeling techniques; such techniques may be integrated, as may any further domain information. In the following, we describe the model inference process, which consists of three steps: general domain bounds are introduced first, then the available high-level knowledge is captured, and finally experts' information is introduced.
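As a minimal sketch of the constraint-satisfaction view (not Sabrina's actual implementation), the Python below treats a candidate model as a mapping from (payer, payee) firm pairs to integer outflows and keeps a named, growable set of constraints; a model is valid exactly when no constraint is violated. The names `ConstraintSet`, `add`, `violated`, and `is_valid` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: outflow (integer units) from payer firm to payee firm.
Model = Dict[Tuple[str, str], int]
Constraint = Callable[[Model], bool]


@dataclass
class ConstraintSet:
    """Hypothetical container: constraints can be added or updated at any time."""
    constraints: Dict[str, Constraint] = field(default_factory=dict)

    def add(self, name: str, check: Constraint) -> None:
        # Re-using an existing name replaces (updates) that constraint.
        self.constraints[name] = check

    def violated(self, model: Model) -> List[str]:
        # Names of all constraints the candidate model fails.
        return [name for name, check in self.constraints.items() if not check(model)]

    def is_valid(self, model: Model) -> bool:
        # A model is valid iff it satisfies every constraint currently in the set.
        return not self.violated(model)


if __name__ == "__main__":
    cs = ConstraintSet()
    cs.add("non_negative", lambda m: all(v >= 0 for v in m.values()))
    model = {("A", "B"): 5, ("B", "C"): 0}
    print(cs.is_valid(model))  # True
    # A user introduces their own knowledge for the analysis case.
    cs.add("A_pays_B_at_most_3", lambda m: m.get(("A", "B"), 0) <= 3)
    print(cs.violated(model))  # ['A_pays_B_at_most_3']
```

Keying constraints by name is one simple way to support points (i) and (ii): re-adding a name updates a constraint, while a new name expands the set.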

Domain Bounds. Absolute truths that are universally applicable are encoded as constraints on the problem and used in model validation; e.g., a company cannot have negative employee costs.
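A domain bound can be expressed as a check over every firm and every candidate edge. The sketch below is illustrative only: the function name `domain_bound_violations` and the data layout (employee costs per firm, outflows per ordered firm pair) are assumptions, and the non-negativity of outflows is an assumed second bound alongside the employee-cost example from the text.

```python
from typing import Dict, List, Tuple

Flows = Dict[Tuple[str, str], int]  # (payer firm, payee firm) -> outflow amount


def domain_bound_violations(employee_costs: Dict[str, int], flows: Flows) -> List[str]:
    """Check universally applicable bounds; return a description of each violation."""
    violations = []
    for firm, cost in employee_costs.items():
        # Example from the text: employee costs can never be negative.
        if cost < 0:
            violations.append(f"negative employee costs for {firm}: {cost}")
    for (src, dst), amount in flows.items():
        # Assumed bound: an outflow is non-negative (the edge direction encodes who pays).
        if amount < 0:
            violations.append(f"negative outflow {src}->{dst}: {amount}")
    return violations
```

An empty result means the candidate passes all domain bounds; any non-empty result rules the model out regardless of what later constraints say.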

High-Level Behavior. Macroeconomic monetary flows, e.g., between industry sectors, are generally known and publicly available, typically obtained from census or tax data. For example, it may be known that the outflow from companies in the manufacturing sector to those in agriculture amounted to 32 billion in 2014.
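A sector-level figure does not fix any single firm-to-firm transaction; it only constrains a sum over all edges between two sectors. The following is a hypothetical sketch: the sector label per firm is an assumed input, and exact integer equality stands in for however a published total is actually matched.

```python
from typing import Dict, Tuple

Flows = Dict[Tuple[str, str], int]  # (payer firm, payee firm) -> outflow


def sector_outflow(flows: Flows, sector: Dict[str, str],
                   src_sector: str, dst_sector: str) -> int:
    """Sum of all outflows from firms in src_sector to firms in dst_sector."""
    return sum(
        amount
        for (src, dst), amount in flows.items()
        if sector[src] == src_sector and sector[dst] == dst_sector
    )


def satisfies_sector_total(flows: Flows, sector: Dict[str, str],
                           src_sector: str, dst_sector: str, total: int) -> bool:
    # The published figure constrains only the aggregate, not individual edges.
    return sector_outflow(flows, sector, src_sector, dst_sector) == total


if __name__ == "__main__":
    sector = {"M1": "manufacturing", "M2": "manufacturing", "A1": "agriculture"}
    flows = {("M1", "A1"): 20, ("M2", "A1"): 12, ("A1", "M1"): 5}
    print(satisfies_sector_total(flows, sector, "manufacturing", "agriculture", 32))
```

With these scaled-down toy figures, the two manufacturing firms paying 20 and 12 to the agricultural firm satisfy a total of 32 and the script prints `True`; the flow in the opposite direction does not count toward it.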

Expert Knowledge. Domain-specific insights are introduced as further constraints on the problem. This step leverages the fact that the intended users are experts in their domain, may possess further proprietary data or models, or use the system in the context of a what-if analysis. For example, an analyst in the financial department of a regional government may know that firms operating in a given municipality with less than 100k in employee expenditure trade with at most 10 firms in another region. Note that this step can be performed incrementally, on a case-by-case basis, and individually by each user. Similarly, the output of, e.g., a MARA process may be integrated as expert knowledge; explicit values of firm transactions are then encoded as constants.

Each of the steps above contributes its own set of constraints. To build the target firm-to-firm transaction model, the specified constraints are encoded as first-order logic formulae in satisfiability modulo theories (SMT), using quantifiers over finite sets and integer linear arithmetic. SMT solving is computationally expensive in general; however, the constraints (viewed as logical formulae) are generally of low complexity, and model inference does not need to run online. Any valuation that satisfies the resulting formula is a valid model. The output of the process is a weighted directed graph, where nodes are firms and each edge carries the value of a monetary outflow from one firm to another. In general, many different graphs can satisfy the same constraints; the more constraints are specified, the more closely the model approximates reality. This is an inherent limitation of the approach: a minimal set of constraints can admit a broad range of satisfying models. For a comprehensive analysis of the modeling process, the interested reader can refer to .
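To make the point about many satisfying graphs concrete, the toy sketch below enumerates every integer valuation of three candidate edges and counts how many survive as constraints accumulate. Exhaustive enumeration is used only because the instance is tiny; it is a stand-in for, not a description of, the SMT-based inference. The firms, amounts, and constraints are invented for illustration: the degree bound is a scaled-down analogue of the analyst's "at most 10 firms" rule, and the fixed edge value mimics a transaction integrated from another source.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]          # (payer firm, payee firm)
Model = Dict[Edge, int]         # edge -> outflow amount
Constraint = Callable[[Model], bool]


def enumerate_models(edges: List[Edge], max_amount: int,
                     constraints: List[Constraint]) -> List[Model]:
    """Try every integer valuation in [0, max_amount] per edge and keep those
    satisfying all constraints. Only feasible for tiny toy instances."""
    models = []
    # Starting the range at 0 builds the non-negativity domain bound in.
    for values in product(range(max_amount + 1), repeat=len(edges)):
        model = dict(zip(edges, values))
        if all(check(model) for check in constraints):
            models.append(model)
    return models


EDGES = [("A", "B"), ("A", "C"), ("B", "C")]


def high_level(m: Model) -> bool:
    # Toy aggregate: A's total outflow to B and C is 4.
    return m[("A", "B")] + m[("A", "C")] == 4


def expert(m: Model) -> bool:
    # Toy degree bound: A trades with at most one firm.
    return sum(1 for (src, _), v in m.items() if src == "A" and v > 0) <= 1


def known_value(m: Model) -> bool:
    # A transaction value known from another source, fixed as a constant.
    return m[("B", "C")] == 0


if __name__ == "__main__":
    rounds = [[], [high_level], [high_level, expert],
              [high_level, expert, known_value]]
    for constraints in rounds:
        print(len(enumerate_models(EDGES, 4, constraints)))
```

Running it prints 125, 25, 10 and 2: each additional piece of knowledge narrows the set of valid models, yet even the last round leaves two candidate graphs.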

To evaluate the advocated model inference approach, we developed tool support and a proof-of-concept implementation based on the CVC4 SMT solver. The satisfying assignment returned by the solver is used to derive the firm transaction model. For reference, a problem instance involving 500 firms in Austria, taken from an anonymized subset of the Sabina dataset, is solved in 157 minutes on a laptop with an Intel i5 2.3 GHz processor and 15 GB of RAM. Note that the way the constraints are expressed can have a significant impact on running times.
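Since CVC4 reads SMT-LIB 2 input, one plausible way to hand such constraints to the solver is to emit an SMT-LIB script. The sketch below is hypothetical: the helpers `var`, `smt_sum`, and `to_smtlib` are invented names, and it grounds every constraint over an explicit, finite list of edges, producing quantifier-free linear integer arithmetic (QF_LIA) rather than the quantified encoding described above. It is one alternative formulation of the kind whose choice, as noted, can affect running times. Firm identifiers are assumed to be simple alphanumeric strings so that they form valid SMT-LIB symbols.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (payer firm, payee firm)


def var(edge: Edge) -> str:
    # One integer unknown per candidate edge, e.g. flow_A_B.
    return f"flow_{edge[0]}_{edge[1]}"


def smt_sum(terms: List[str]) -> str:
    # SMT-LIB's + expects at least two arguments, so handle degenerate cases.
    if not terms:
        return "0"
    if len(terms) == 1:
        return terms[0]
    return f"(+ {' '.join(terms)})"


def to_smtlib(edges: List[Edge],
              group_totals: List[Tuple[List[Edge], int]],
              max_partners: Dict[str, int]) -> str:
    lines = ["(set-option :produce-models true)", "(set-logic QF_LIA)"]
    for e in edges:
        lines.append(f"(declare-const {var(e)} Int)")
        # Domain bound: outflows are non-negative.
        lines.append(f"(assert (>= {var(e)} 0))")
    for group, total in group_totals:
        # High-level knowledge: the flows in a group sum to a published total.
        lines.append(f"(assert (= {smt_sum([var(e) for e in group])} {total}))")
    for firm, k in max_partners.items():
        # Expert knowledge: at most k outgoing edges of `firm` carry a positive flow.
        indicators = [f"(ite (> {var(e)} 0) 1 0)" for e in edges if e[0] == firm]
        lines.append(f"(assert (<= {smt_sum(indicators)} {k}))")
    lines += ["(check-sat)", "(get-model)"]
    return "\n".join(lines)


if __name__ == "__main__":
    edges = [("A", "B"), ("A", "C"), ("B", "C")]
    print(to_smtlib(edges, [([("A", "B"), ("A", "C")], 4)], {"A": 1}))
```

The printed text can be saved to a file and passed to an SMT-LIB 2 compliant solver; the values reported by `(get-model)` are the edge weights of one valid graph.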