- Title: Sabrina: Modeling and Visualization of Economy Data with Incremental Domain Knowledge
- ArXiv ID: 1908.07479
- Date: 2020-01-09
- Authors: Alessio Arleo, Christos Tsigkanos, Chao Jia, Roger A. Leite, Ilir Murturi, Manfred Klaffenboeck, Schahram Dustdar, Michael Wimmer, Silvia Miksch, and Johannes Sorger
📝 Abstract
Investment planning requires knowledge of the financial landscape on a large scale, both in terms of geo-spatial and industry sector distribution. There is plenty of data available, but it is scattered across heterogeneous sources (newspapers, open data, etc.), which makes it difficult for financial analysts to understand the big picture. In this paper, we present Sabrina, a financial data analysis and visualization approach that incorporates a pipeline for the generation of firm-to-firm financial transaction networks. The pipeline is capable of fusing the ground truth on individual firms in a region with (incremental) domain knowledge on general macroscopic aspects of the economy. Sabrina unites these heterogeneous data sources within a uniform visual interface that enables the visual analysis process. In a user study with three domain experts, we illustrate the usefulness of Sabrina, which eases their analysis process.
💡 Summary & Analysis
This paper introduces Sabrina, a financial data analysis and visualization approach that addresses the challenge of integrating scattered economic data from diverse sources such as newspapers and open data. The core issue is that while large amounts of financial information are available, they are spread across various platforms, making it difficult for analysts to get an overview. Sabrina combines firm-level ground truth with macroeconomic domain knowledge through a pipeline that generates firm-to-firm transaction networks, providing a unified visual interface for analysis. This approach enables analysts to leverage heterogeneous data sources and improve their understanding of the financial landscape. A user study with three domain experts demonstrated the effectiveness of Sabrina in facilitating more efficient and insightful financial analyses.
📄 Full Paper Content (ArXiv Source)
Models of financial behavior can vary greatly, since they typically
depend on domain knowledge, make use of proprietary data or
observations, or are bound to particular analyses. We are not concerned
with economic aspects, but assume such domain knowledge is available as
an input to the system. Sabrina supports any type of model data but is built on
a methodology where the financial analyst can infer a model by
incrementally introducing domain knowledge or through integrating models
obtained from other sources. To leverage knowledge of domain experts and
users, we treat the problem of inferring a monetary outflow model as
constraint satisfaction, where the constraints encode domain knowledge.
A model that satisfies the constraints is a valid model. We note that
this approach fits our general context particularly well: (i) domain
knowledge encoded as constraints may be updated or expanded at will,
(ii) users may introduce their own knowledge for their analysis case,
and (iii) the dynamic introduction of constraints renders our approach
independent of particular financial modeling techniques, which may be
integrated alongside further domain information. In the following, we
describe the model inference process, which consists of three steps:
general bounds are introduced first, available high-level knowledge is
captured next, and experts’ information is introduced last.
Domain Bounds. Absolute truths that are universally applicable are
encoded as constraints to the problem, and used in the model validation,
e.g., a company may not have negative expenses on employee costs.
High-Level Behavior. Macroeconomic monetary flows, e.g., between
industry sectors, are generally known and publicly available, typically
obtained from census or tax data. For example, it may be known that the
outflow of companies from the manufacturing sector to the ones in
agriculture amounted to 32 billion in 2014.
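As a minimal sketch of how such a macroeconomic figure could act as a constraint on a firm-level model, the Python snippet below aggregates firm-to-firm outflows by sector pair and compares the result with a known total. The firm names, sector assignments, amounts, and the helpers `sector_outflow` and `satisfies_macro` are hypothetical and chosen for illustration only; they are not taken from the paper or its dataset.

```python
from collections import defaultdict

# Hypothetical firm -> sector assignment (illustrative only).
SECTOR = {
    "A": "manufacturing",
    "B": "manufacturing",
    "C": "agriculture",
    "D": "agriculture",
}

# Hypothetical firm-to-firm outflows: (payer, payee) -> amount.
FLOWS = {("A", "C"): 20, ("B", "C"): 7, ("B", "D"): 5, ("C", "A"): 3}


def sector_outflow(flows, sector_of):
    """Sum firm-level outflows into sector-to-sector totals."""
    totals = defaultdict(int)
    for (payer, payee), amount in flows.items():
        totals[(sector_of[payer], sector_of[payee])] += amount
    return dict(totals)


def satisfies_macro(flows, sector_of, src, dst, known_total):
    """True if the firm-level flows reproduce the known sector-level total."""
    return sector_outflow(flows, sector_of).get((src, dst), 0) == known_total


totals = sector_outflow(FLOWS, SECTOR)
ok = satisfies_macro(FLOWS, SECTOR, "manufacturing", "agriculture", 32)
```

In this toy instance the three manufacturing-to-agriculture flows add up to the assumed sector total of 32, so the check passes; any firm-level model whose aggregate differs would be rejected by this constraint.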
Expert Knowledge. Domain-specific insights are introduced as further
constraints to the problem. This step leverages the fact that the
intended users are experts in their domain, may possess further
proprietary data or models, or use the system in the context of a
what-if analysis. For example, an analyst within the financial
department of a regional government may know that firms operating in a
given municipality with less than 100k employee expenditure trade with a
maximum of 10 firms in another region. Note that this step can be
performed incrementally, on a case-by-case basis and individually by
each user. In a similar fashion, the output of, e.g., a MARA process may
be integrated as expert knowledge; explicit values of firm transactions
are encoded as constants. The steps above correspond to component sets
of different constraints. To build the target firm-to-firm transaction
model, the specified constraints are encoded as first-order logical
formulae within satisfiability modulo theories (SMT), using quantifiers
(over finite sets) and integer linear arithmetic for constraint
specification. SMT solving is a highly computationally intensive
operation. However, the complexity of constraints (viewed as logical
formulae) is generally low, and the model inference operation is not
required to be online. A valuation that satisfies the resulting formula
is a valid model. The output of the process yields a weighted graph,
where nodes are firms and edges capture the value of a monetary outflow
from one firm to another. In general, there can be many different graph
variants satisfying the same constraints; the more constraints are
specified, the better the model approximates reality. This is an
inherent limitation of the approach: with few constraints, a broad range
of models satisfies them. For a comprehensive and extensive
analysis of the modeling process, the interested reader can refer to .
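The three constraint sets and the effect of adding expert knowledge can be illustrated with a small Python sketch. It is not the SMT encoding used in the paper: it swaps in a brute-force search over a tiny finite integer domain, which is only feasible for toy instances. All firms, edges, totals, and rules below are hypothetical assumptions made for the example.

```python
from itertools import product

# Hypothetical toy instance: two manufacturing firms, two agriculture firms.
FIRMS = {"A": "manufacturing", "B": "manufacturing",
         "C": "agriculture", "D": "agriculture"}
EDGES = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
MAX_FLOW = 4  # assumed upper bound that keeps the search space finite


def domain_bounds(g):
    """Step 1: universally valid bounds, e.g. no negative outflows."""
    return all(value >= 0 for value in g.values())


def high_level(g):
    """Step 2: assumed sector total, manufacturing -> agriculture == 4."""
    total = sum(value for (src, dst), value in g.items()
                if FIRMS[src] == "manufacturing" and FIRMS[dst] == "agriculture")
    return total == 4


def expert_rules(g):
    """Step 3: assumed expert knowledge.

    Firm A pays at most one agriculture firm, and the transaction
    B -> D is explicitly known to be 1 (encoded as a constant).
    """
    partners_of_a = sum(1 for (src, _), value in g.items()
                        if src == "A" and value > 0)
    return partners_of_a <= 1 and g[("B", "D")] == 1


def models(constraints):
    """Enumerate every weighted graph that satisfies all constraints."""
    for values in product(range(MAX_FLOW + 1), repeat=len(EDGES)):
        g = dict(zip(EDGES, values))
        if all(check(g) for check in constraints):
            yield g


loose = list(models([domain_bounds, high_level]))
tight = list(models([domain_bounds, high_level, expert_rules]))
```

With these toy values, the first two constraint sets admit 35 distinct graphs, and the expert rules narrow this to 7. Each surviving dictionary is one valid weighted graph, which shows on a small scale why additional constraints reduce the range of admissible models.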
To evaluate the advocated model inference approach, we developed tool
support and a proof-of-concept implementation based on the CVC4 SMT
solver. The resulting assignment is used to derive the firm transaction
model. For reference, a problem instance involving 500 firms in Austria,
taken from an anonymized subset of the Sabina dataset, is solved in
157 minutes on a laptop computer with an Intel i5 2.3 GHz processor and
15 GB of RAM. It is worth noting that how the constraints are expressed may
have a significant impact on the running times.