Dimensions of Formality: A Case Study for MKM in Software Engineering
We study the formalization of a collection of documents created for a Software Engineering project from an MKM perspective. We analyze how document and collection markup formats can cope with an open-ended, multi-dimensional space of primary and seco…
Authors: Andrea Kohlhase, Michael Kohlhase, Christoph Lange
Andrea Kohlhase (1), Michael Kohlhase (2), and Christoph Lange (2)
(1) German Research Center for Artificial Intelligence (DFKI), Andrea.Kohlhase@dfki.de
(2) Computer Science, Jacobs University Bremen, {m.kohlhase,ch.lange}@jacobs-university.de

Abstract. We study the formalization of a collection of documents created for a Software Engineering project from an MKM perspective. We analyze how document and collection markup formats can cope with an open-ended, multi-dimensional space of primary and secondary classifications and relationships. We show that RDFa-based extensions of MKM formats, employing flexible “metadata” relationships referencing specific vocabularies for distinct dimensions, are well-suited to encode this and to put it into service. This formalized knowledge can be used for enriching interactive document browsing, for enabling multi-dimensional metadata queries over documents and collections, and for exporting Linked Data to the Semantic Web and thus enabling further reuse.

1 Introduction

The field of Mathematical Knowledge Management (MKM) tries to model mathematical objects and their relationships, their creation and publication processes, and their management requirements. In [CF09, 237 ff.] Carette and Farmer analyzed “six major lenses through which researchers view MKM”: the document, library, formal, digital, interactive, and the process lens. Quite obviously, there is a gap between the formal aspects {“library”, “formal”, “digital”}, related to machine use of mathematical knowledge, and the informal ones {“document”, “interactive”, “process”}, related to human use.

In the FormalSafe project [For08] at the German Research Center for Artificial Intelligence (DFKI) Bremen a main goal is the integration of project documents into a computer-supported software development process.
MKM techniques are used to bridge the gap between informally stated user requirements and formal verification.

One of the FormalSafe case studies is based on the documents of the SAMS project (“Sicherungskomponente für Autonome Mobile Systeme [Safety Component for Autonomous Mobile Systems]”, see [FHL+08]) at DFKI. The SAMS objective was to develop a safety component for autonomous mobile service robots and to get it certified as SIL-3 standard compliant in the course of three years. On the one hand, certification required the verification of certain safety properties in the code documents with the proof checker Isabelle [NPW02]. On the other hand, it required the software development process to follow the V-Model (fig. 1). This mandates e.g. that relevant document fragments get justified and linked to corresponding fragments in a successive document refinement process (the arms of the ‘V’ from the upper left over the bottom to the upper right and between arms in fig. 1).

The final publication of this paper is available at www.springerlink.com.

[Fig. 1. Documents in the V-Model]

The collection of SAMS documents (we call it “SAMSDocs” [SAM09]) promised an interesting case study for FormalSafe, as system development under the V-Model regime resulted in a highly interconnected collection of design documents, certification documents, code, formal specifications, and formal proofs. Furthermore, it was supposed that adding semantics to SAMSDocs would be comparatively easy as it was developed under a strong formalization pressure.

In this paper we report on, and draw conclusions from, the SAMSDocs formalization, particularly the formalization of its LaTeX documents. In section 2, we document the process and detect inherent, distinct formality levels and the multi-dimensionality of the formalized structures.
Real information needs (drawn from three use cases in the SAMS context) turn out in section 3 to be multi-dimensional. This motivates our exploration of multi-dimensional markup in section 4. Section 5 showcases the feasibility of multi-dimensional services with MKM technology enabled by multi-dimensional structured representations and section 6 concludes the paper.

2 Dimensions of Formality in SAMSDocs

In this paper, we are especially interested in the question “What should we sensibly formalize in a document collection and can MKM methods cope?”. Note that we understand “to formalize” as “making implicit knowledge explicit” and not as “to make something fully formal”.

The SAMS project was organized as a typical Software Engineering project, its collection of documents SAMSDocs therefore has a prototypical composition of distinct document types like contract, code, or manual. Thus, SAMSDocs presents a good base for a case study with respect to our question. In fig. 2 we can see the concrete distribution over used document formats in SAMSDocs. Requirements analysis, system and module specifications, reviews, and the final manual were mainly written in LaTeX, only roughly a sixth in MS Word. The implementation in Misra-C contains Isabelle theorem prover calls.

    Format          Files     #
    LaTeX           *.tex     251
    MS Word         *.doc      61
    Isabelle        *.thy      33
    Misra-C Code    *.c        40

    Fig. 2. SAMSDocs

The first stinging, but unsurprising observation was that the level of formality of the documents in SAMSDocs varies considerably, because distinct purposes create distinct formality requirements. For instance, the contract document serves as communication medium between the customer and the contractor. Here, underspecification is an important tool, whereas it is regarded harmful in the fine-granular module specifications and a fatal flaw in input logic for a theorem prover.
Since this issue was already present in the set of LaTeX documents, we focused on just these.

For the formalization of this subset in SAMSDocs we used the sTeX system [Koh08], a semantic extension of LaTeX. It allows publishing documents both as high-quality human-readable PDF and as formal machine-processable OMDoc [Koh06] via LaTeXML [SKG+10]. Our formalization process revealed early on that previous sTeX applications (based on OMDoc 1.2) were too rigid for a stepwise semantic markup. But fortunately, sTeX also allows for the OMDoc 1.3 scheme of metadata via RDFa [ABMP08] annotations (see [Koh10]). In particular, we could ‘invent’ our own vocabulary for markup on demand without extending OMDoc. This new vocabulary consists of SAMSDocs-specific metadata properties and relationship types. We call the process of adding this pre-formal markup to SAMSDocs (semantic) preloading. Concretely, we extended sTeX to sTeX-SD (sTeX for SAMSDocs) by adding LaTeXML bindings for all SAMS-specific TeX macros and environments used in SAMSDocs, thus enabling the conservation of the original PDF document layouts at the same time as the generation of meaningful OMDoc.

[Fig. 3. The Formalization Workflow with sTeX-SD (translated by the authors)]

Let us look at an example for such an sTeX extension within our formalization workflow (see fig. 3). We started out with a TeX document (upper left), which compiled to the PDF seen on the upper right. Here, we have a simple, two-dimensional table, which is realized with a LaTeX environment tabular. Semantically, this table contains a list of symbols for document states with their definitions, e.g. “i. B.” for “in Bearbeitung [in progress]”. As such definition tables were used throughout the project, we developed the environment SDTab-def and the macro SDdef as sTeX extensions.
We determined the OMDoc output for these to be a symbol together with its definition element (for each use of SDdef in place of the resp. table row) and moreover, to group all of them into a theory (via using SDTab-def). Preloading the TeX table by employing SDTab-def and SDdef turned it into an sTeX document (middle of fig. 3) while keeping the original PDF table structure. Using LaTeXML on this sTeX document produces the OMDoc output shown in the lower area of fig. 3.

Mathematical, structural relationships have a privileged state in sTeX: their command sequence/environment syntax is analogous to the native XML element and attribute names in OMDoc. Since many objects and relationships induce formal representations for Isabelle, it seemed possible to semantically mark them up with a logic-inspired structure. But in the formalization process it soon became apparent that (important) knowledge implicit in SAMSDocs did not refer to the ‘primary’ structure aimed at with the use of sTeX. Instead, this knowledge was concerned with a space of less formal, ‘secondary’ classifications and relationships. Thus, our second observation pertains to the substance of formalizations. Even though we wanted to find out what we can sensibly formalize, we had assumed this to mean how much we can sensibly formalize. Therefore, we were rather surprised to find distinct formality structures realized in our sTeX extension.

In the following we want to report on these structures. We grouped the macros and environments of sTeX-SD in fig. 5 according to what induced them. Particularly, we distinguished the following triggers:

– “objects” (document fragments viewed as autonomous elements) and
– their net of relationships via the collection,
– documents and
– their organizational handling, and
– the project itself and thus, its own scheme of meaningful relationships.
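The definition-table preloading described above (one symbol with its definition per SDdef row, all grouped into a theory by SDTab-def) can be illustrated with a minimal Python sketch. It is not the LaTeXML binding used in the project: the function name, the attribute names `id`, `name` and `for`, and the theory identifier are assumptions made for illustration, and only the element kinds (theory, symbol, definition) come from the text.

```python
import xml.etree.ElementTree as ET


def definition_table_to_theory(theory_id, rows):
    """Build a theory element with one symbol/definition pair per table row.

    rows: iterable of (symbol_name, definition_text) pairs, one per SDdef use.
    Element and attribute names are simplified stand-ins (assumptions),
    not the actual OMDoc output of the sTeX-SD bindings.
    """
    theory = ET.Element("theory", {"id": theory_id})
    for name, text in rows:
        # each row contributes a symbol ...
        ET.SubElement(theory, "symbol", {"name": name})
        # ... together with the definition that explains it
        definition = ET.SubElement(theory, "definition", {"for": name})
        definition.text = text
    return theory


# Only the "i. B." row is taken from the paper; the theory id is hypothetical.
rows = [("i. B.", "in Bearbeitung [in progress]")]
theory = definition_table_to_theory("document-states", rows)
print(ET.tostring(theory, encoding="unicode"))
```

The point of the sketch is the shape of the mapping: the visual table stays untouched, while each row is additionally available as a named, referenceable object inside one grouping unit.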
For instance, in the system specification we marked a recap of a definition of the braking distance function for straight-ahead driving s_G as an object and referenced it from within the assertion seen in fig. 4. In the module specification s_G was then meticulously specified. This document fragment is connected to the original one via a refinement-relationship from the V-Model, which determined the creation process of the collection. Documents induce layout structures like sections or subsections and they are themselves organized for example under a version management scheme. In the workflow in fig. 3 we already showcased a project-specific element, the definition table, with its meaning. Interestingly, we cannot compare formality in one group with the formality in another. For example, we cannot decide whether a document completely marked up with the object-induced structures is more formal than one fully semantically enhanced by the version management markup. As these grouped structures only interact relatively lightly, we can consider them as independent dimensions of a formality space that is reified in the formalization process of a document collection.

[Fig. 4. s is Braking Distance?]

Concretely, sTeX-SD covers the following dimensions and consists of the listed extension macros/environments (with attributes in [·] where sensible):

[Fig. 5. Formality Dimensions in sTeX-SD]

[Fig. 6. Yet another Braking Distance s?]

Formalizing object structures is not always obvious, since many of the documents contain recaps or previews of material that is introduced in other documents/parts (e.g. to make them self-contained). Compare for example fig. 4 and fig. 6, which are actually clippings from the system specification “KonzeptBremsmodell.pdf”. Note the use of s resp. s_G, both pointing in fig.
4 to the braking distance function for straight-ahead driving (which is obvious from the local context), whereas in fig. 6 s represents the general arc length function of a circle, which is different in principle from the braking distance, but coincides here.

We also realized that sTeX itself had already integrated another formality dimension besides the logic-inspired one, the one concerned with document layout: A typical document layout is structured into established parts like sections or modules. If we want to keep this grouping information in the formal XML document, we might use sTeX’s DCM package for annotating general document structures with Dublin Core (cf. [Dub08]) and similar general-purpose metadata. In the sTeX box in fig. 3 we find for example the command DCMsubsection with attributes containing the title of the subsection and an identifier that can be used in the usual LaTeX referencing scheme.

[Fig. 7. The Document Formality Dimension in sTeX]

Finally, we would like to remark that the sTeX-SD preloading process was executed as “in-place formalization” [SIM99]. It frequently considered several of the above dimensions for the object at hand at the same time. Therefore, the often applied metaphor of “formalization steps” does not mirror the formalization process in our case study. We found that the important aspect of the formalization was not its sequence per se, which we consider particular to the SAMSDocs collection, but the fact that the metaphor of ‘steps’ only worked within each single dimension of formality. In particular, there is no scale for formalization progress as distinct formality levels in distinct formality dimensions existed in a document at one point in time.

3 Multi-Dimensional Information Needs

We have shown that the formalization of knowledge results in an open-ended, multi-dimensional space of primary and secondary classifications and relationships. But are multi-dimensional document formalizations beneficial for services supporting real users? Concretely, we envision potential questions in the SAMS context and services that retrieve and display answers based on the multi-dimensional markup of SAMSDocs.

Let us first take a programmer’s perspective. Her main information source for the programming task will stem from the module specification. But while studying it the following questions might arise:

(i) What is the definition for a certain (mathematical) symbol? (footnote 3)
(ii) How much of this specification has already been implemented?
(iii) In what state is the proof of a specific equation, has it already been formally verified so that it is safe to ground my implementation on it?
(iv) Whom can I ask for further details?

Footnote 3: See fig. 4 and 6 for two symbols having the same appearance but different meanings.

Assuming multi-dimensional markup, an information retrieval system can supply useful responses. For example, it can answer (i) if technical terms in natural language are linked to the respective formal mathematical symbols they represent. For replies to (ii) and (iii) we note that, if all collection links are merged into a graph, their original placement and direction no longer make a difference. So if we have links from the Isabelle formalization to the respective C code and links from this C code to a specification fragment, as realized in the V-Model structure of SAMSDocs, we can follow the graph from the specification through to the state of the corresponding proof.
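The observation that link direction stops mattering once all collection links are merged into one graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node identifiers, the relation names `verifies` and `implements`, and the `proof_state` table are hypothetical and do not come from SAMSDocs.

```python
from collections import defaultdict, deque


def build_undirected_graph(links):
    """Merge (source, relation, target) links into an undirected adjacency map."""
    graph = defaultdict(set)
    for source, _relation, target in links:
        graph[source].add(target)
        graph[target].add(source)  # direction is deliberately discarded
    return graph


def reachable(graph, start):
    """Breadth-first search: all nodes connected to start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def proof_states_for(spec, links, proof_state):
    """Return the state of every proof connected to a specification fragment."""
    graph = build_undirected_graph(links)
    return {node: proof_state[node]
            for node in reachable(graph, spec) if node in proof_state}


# Hypothetical links: the proof points to the code, the code to the specification.
links = [
    ("proof:braking", "verifies", "code:braking.c"),
    ("code:braking.c", "implements", "spec:braking-distance"),
]
proof_state = {"proof:braking": "verified"}

print(proof_states_for("spec:braking-distance", links, proof_state))
```

Although both links point away from the proof, the query starts at the specification and still reaches the proof, which is the effect the text describes for questions (ii) and (iii).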
Drawing on the V-Model links combined with the semantic version management or the review logs, the system can deduce the answer for (iv): The code in question connects to a specification document that has authors and reviewers. This service can be as fine-grained as one is willing to formalize the granularity of the version and review management. If we admit further dimensions of markup into the picture, then the system might even find persons with similar interests (e.g. expressed in terms of the FOAF vocabulary), as has been investigated in detail for expert finder systems [SWJL10].

Now, we take a more global perspective, the one of a project manager. She might be concerned with the following issues:

(v) Software Engineering Process: How much code has been implemented to satisfy a particular requirement from the contract? Has the formal code structure passed a certain static analysis and verification? She does not want to inspect that manually by running Isabelle, thus, she needs high-level figures of, e.g., the number of mathematical statements without a formally verified proof.
(vi) Certification: What parts of the specification, e.g. requirements, have changed since the last certification? What other parts does that affect, and thus, what subset of the whole specification has to be re-certified?
(vii) Human Capital: Who is in charge of a document? How could an author be replaced if necessary, taking into account colleagues working on the same or on related documents, such as previous revisions of the same document, or its predecessor in terms of the V-Model, i.e. the document that is refined by the current one?

Exploiting the multi-dimensionality of formalized knowledge, it becomes obvious how the issues can be tackled.

Finally, we envision a certifier’s information needs.
For inspection, she might first be interested in getting an overview, such as a list of all relevant concepts in the contract document. Then, she would like to follow the links to the detailed specification and further on to the actual implementation. For more information, she would like to contact the project investigator instead of the particular author of a code snippet. The certifier also needs to understand what parts of the whole specification are subject to a requested re-certification. Her rejection of a certain part of a document also affects all elements in the collection that depend on it. Again, a system can easily support a certifier’s efficiency by combining the formalized information of distinct formality dimensions.

These use scenarios in a Software Engineering project clearly show that multi-dimensional markup is useful, since multi-dimensional queries serve natural information needs. To answer such queries, we need to represent multi-dimensional information in MKM formats.

4 Multi-Dimensional Markup

Structured representations are usually realized as files marked up in formats that reflect the primary formalization intent and markup preferences of the formalizer. In the evaluation of document formats it is thus important to realize that every representation language concentrates on only a subset of possible relationships, which it treats with specific language constructs. Note that therefore the formality space of a semantically enhanced document is very often reduced to this primary dimension. On the formal side, for example, a plethora of system-specific logics exist. Furthermore, formal systems increasingly contain custom modularization infrastructures, ranging from simple facilities for inputting external files to elaborate multi-logic theory graphs [MML07].
Collections of informal documents, on the other side, are often structured by application-specific metadata like the Math Subject Classification [Soc09] or the V-Model relations as in our case study.

No given format can natively capture all aspects of the domain via special-purpose markup primitives. It has to relegate some of them to other mechanisms like the sTeX-SD extension for the formalization of SAMSDocs, if more dimensions of the formality space than the primary one are to be covered. In representation formats that support fragment identifiers (e.g. XML-based ones) these relationships can be expressed as stand-off markup in RDF (Resource Description Framework [RDF04]), i.e., as subject-predicate-object triples, where subject and object are URI references to a fragment and the predicate is a reference to a relationship specified in an external vocabulary or ontology (footnote 4). As we have XML-based formats for informal documents (e.g. XHTML+MathML+SVG) and formal specifications (OpenMath or Content MathML), we can in principle already encode multi-dimensional structured representations, if we only supply according metadata vocabularies for their structural relationships. Indeed this is the basic architecture of the “Semantic Web approach” to eScience, and much of the work of MKM can be seen as attempts to come up with good metadata vocabularies for the mathematical/scientific domain.

Since RDF stand-off markup is notoriously difficult to keep up to date, RDFa [ABMP08] has been developed: A set of attributes for embedding RDF annotations into XML-based languages, originally XHTML. On the one hand, RDFa serves as an enabling technology for making XML-based languages extensible by inter- and intra-document relationships. On the other hand, RDFa serves as a vehicle for document format interoperability.
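As a rough illustration of the triple view described above, the following Python sketch stores stand-off annotations as plain (subject, predicate, object) tuples and groups them by the vocabulary prefix of the predicate, so that each group corresponds to one formality dimension. The fragment identifiers and the sample triples are hypothetical; the prefixes `semVM`, `dc` and `omdoc` are borrowed from the vocabularies named in this paper, but no real RDF library or real URIs are used.

```python
from collections import defaultdict

# Stand-off annotations as (subject, predicate, object) triples.
# Subjects and objects stand in for URI references to document fragments;
# all concrete identifiers below are made up for illustration.
triples = [
    ("module-spec#sG", "semVM:refines", "system-spec#s"),
    ("module-spec", "omdoc:hasPart", "module-spec#sG"),
    ("module-spec", "dc:date", "2009-06-01"),
]


def by_dimension(triples):
    """Group triples by the vocabulary prefix of their predicate."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        vocabulary = predicate.split(":", 1)[0]
        grouped[vocabulary].append((subject, predicate, obj))
    return dict(grouped)


for vocabulary, members in by_dimension(triples).items():
    print(vocabulary, len(members))
```

Because every relationship has the same three-part shape regardless of its vocabulary, adding a further dimension only means adding triples with predicates from another vocabulary; the storage and the grouping code stay unchanged.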
Footnote 4: The difference between “vocabulary” and “ontology” is not sharply defined. Vocabularies are often developed in a bottom-up community effort and tend to have a low degree of formality, whereas ontologies are often designed by a central group of experts and have a higher degree of formality. Here, we use “vocabulary” in its general sense of a set of terms from a particular domain of interest. This subsumes the term “ontology”, which we will reserve for cases that require a more formal domain model.

All relationships from a format F that cannot be natively represented in a format F' can be represented as RDFa triples, where the predicate is from an appropriately designed metadata vocabulary that describes the format F. For instance, an OMDoc element can be represented as
in XHTML, using the OMDoc ontology [Lan10]. Support of RDFa relationships makes all XML-based formats theoretically equivalent, if they allow fine-grained text structuring with elements like XHTML’s div or span everywhere (so that arbitrary text fragments can be turned into objects). In particular, they become formats for multi-dimensional markup as respective other dimensions can always be added via RDFa. We have detailed the necessary extensions for the OMDoc format in [Koh10], so that analogous extensions for any of the XML-based formats used in the MKM community should be rather simple to create.

Note that the pragmatic restriction to XML-based representation formats is not a loss of generality. In the MKM sphere the three classes of non-XML languages used are computational logics, TeX/LaTeX, and PostScript/PDF. We see computational logics as compact front-end formats that are optimized for manual input of formal structured representations; it is our experience that these can be transformed into the XML-based OpenMath, MathML, or OMDoc without loss of information (but with a severe loss of notational conciseness). We consider TeX/LaTeX as analogous for informal structured representations; they can be transformed to XHTML+MathML by the LaTeXML system. The last category of formats are presentation/print-oriented output and archival formats where the situation is more problematic: PostScript (PS) is largely superseded by PDF, which allows standard document-level RDF annotations via XMP and the finer-granular annotations we need for structured representations via extensions as in [GMH+07] or [Eri07]. But PS/PDF are usually generated from other formats (mostly office formats or LaTeX), so that alternative generation into XML-based formats like XHTML or OMDoc can be used.

Note as well that a dimension typically corresponds to a vocabulary.
In the course of the SAMSDocs case study, most vocabularies have initially been implemented from scratch in a project-specific ad hoc way. But they can be elaborated towards ontologies via sTeX and these can be translated to RDF-based formats that automated reasoners understand [KKL10]. An alternative is reusing existing ontologies. This has the advantage that they are more widely used and thus, reusable services may already have been implemented for them. For instance, there already exists a vocabulary that defines basic properties of persons and organizations: FOAF (Friend of a Friend [BM07]). The widely known Dublin Core element set is also available as an ontology [Dub08]. DCMI Terms [DCM08], a modernized and extended version of the Dublin Core element set, offers a basic vocabulary for revision histories, but not for reviewing and certification. DOAP (Description of a Project [Dub10]) describes software projects, albeit focusing on the top-level structure of public open source projects. Lin et al. have developed an ontology for the requirements-related parts of the V-Model (cf. [LFB96]). Happel and Seedorf briefly review further ontologies about Software Engineering [HS06]. As, e.g. the SAMSDocs vocabularies can be integrated with existing ontologies by declaring appropriate subclass or equivalence relationships, services can make use of the best of both worlds.

5 Multi-Dimensional Services with MKM Technology

We will now study an avenue towards a concrete implementation of services based on the use cases described in sect. 3 to show how MKM technologies can cope with multi-dimensional information needs, demonstrating their feasibility. Concretely, we will study the task of project manager Nora to find a substitute for employee Alice. All required information is contained in the sTeX-SD documents. To abstract from the particulars of sTeX/OMDoc RDFa encoding, e.
g. the somewhat arbitrarily chosen direction of the relations or the interaction of metadata relations with the document and the special markup for the mathematical dimension, we extract a uniform RDF representation of the embedded structures, which can then be queried in the SPARQL language [PS08]. Listing 1.1 shows the necessary query in all detail.

Listing 1.1. Finding a Substitute for an Employee via the V-Model

    # declaration of vocabulary (= dimension) namespace URIs
    PREFIX vm:    <...>
    PREFIX omdoc: <...>   # OMDoc
    PREFIX semVM: <...>
    PREFIX dc:    <...>   # Dublin Core
    PREFIX foaf:  <...>   # FOAF
    PREFIX xsd:   <...>   # XML Schema datatypes

    SELECT ?potentialSubstituteName WHERE {
      # for each document Alice is responsible for, get all of its parts
      # i.e. any kind of semantic (sub)object in the document
      ?document vm:responsible <.../employees#Alice> ;
                omdoc:hasPart ?object .

      # find other objects that are related to each ?object
      # 1. in that ?object refines them via the V-model
      { ?object semVM:refines ?relatedObject }
      UNION
      # 2. or in that they are other mathematical symbols defined in terms
      #    of ?object (only applies if ?object itself is a symbol)
      { ?object omdoc:occursInDefinitionOf ?relatedObject }

      # find the document that contains the related object and the person
      # responsible for that document ...
      ?otherDocument omdoc:hasPart ?relatedObject ;
                     dc:date ?date ;
                     vm:responsible ?potentialSubstitute .

      # (only considering documents that are sufficiently up to date)
      FILTER (?date > "2009-01-01"^^xsd:date)

      # ... and the real name of that person
      ?potentialSubstitute foaf:name ?potentialSubstituteName .
    }

In this query we assume that Alice’s FOAF profile is a part of our collection, having the URI .../employees#Alice. Nora retrieves all documents in the collection for which Alice is known to be the responsible person. For any object O in each of these documents (e.g. the detailed specification of the braking distance function for straight-ahead driving s_G from fig.
4), she selects those objects that are refined by O in terms of the V-Model (e.g. the general braking distance s). Additionally, she considers the mathematical dimension and selects all objects that are related to O by mathematical definition, e.g. the braking function that uses s_G. Of any such related object, Nora finds out to what document it belongs. She is only interested in recent documents and therefore filters them by date. Finally, she determines the responsible persons via the version management links, and gets their names from their FOAF descriptions. The assumption behind this query is that, if, for example, Pierre is responsible for the specification that introduces the general braking distance s, which Alice has refined, Pierre can be considered as a substitute for Alice. Note that getting the answer draws on the collection structures of SAMSDocs (V-Model), on the mathematical structures, as well as on the organizational structures (version management). It is easy to imagine how additional formality dimensions can be employed for increasing precision or recall of the query, or for ranking results. Consider, for example, another filter that only accepts as substitutes employees who have never got a document rejected in any previous certification.

The complexity of the query in listing 1.1 is directly caused by the complexity of the underlying multi-dimensional structures and the non-triviality of answering high-level project management queries from the detailed information in SAMSDocs. As users like Nora would not want to deal with a machine-oriented query language, we have developed a system that integrates versioned storage of semantic document collections with human-oriented presentation with embedded interactive services [DKL+10]. Thus, the rendered documents serve as command centers for executing queries and displaying results (footnote 5).
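To make the steps of the substitute query easier to follow, here is a minimal Python sketch that walks the same path over an in-memory list of triples. It is not a SPARQL engine and not the system described in this paper: the helper names, the short identifiers such as `emp:Alice` and `doc:module-spec`, and the sample data (including the date) are assumptions chosen to mirror the Alice/Pierre example.

```python
from datetime import date


def objects_of(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_of(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the data."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def find_substitutes(triples, employee, newer_than):
    """Names of people responsible for recent documents related to the employee's."""
    names = set()
    for document in subjects_of(triples, "vm:responsible", employee):
        for part in objects_of(triples, document, "omdoc:hasPart"):
            # related objects: refined via the V-Model, or defined in terms of part
            related = (objects_of(triples, part, "semVM:refines")
                       + objects_of(triples, part, "omdoc:occursInDefinitionOf"))
            for related_object in related:
                for other in subjects_of(triples, "omdoc:hasPart", related_object):
                    dates = objects_of(triples, other, "dc:date")
                    if not any(d > newer_than for d in dates):
                        continue  # document has no sufficiently recent date
                    for person in objects_of(triples, other, "vm:responsible"):
                        names.update(objects_of(triples, person, "foaf:name"))
    return names


# Hypothetical data mirroring the example: Alice refined s into s_G,
# and Pierre is responsible for the document that introduces s.
triples = [
    ("doc:module-spec", "vm:responsible", "emp:Alice"),
    ("doc:module-spec", "omdoc:hasPart", "obj:sG"),
    ("obj:sG", "semVM:refines", "obj:s"),
    ("doc:system-spec", "omdoc:hasPart", "obj:s"),
    ("doc:system-spec", "dc:date", date(2009, 6, 1)),
    ("doc:system-spec", "vm:responsible", "emp:Pierre"),
    ("emp:Pierre", "foaf:name", "Pierre"),
]

print(find_substitutes(triples, "emp:Alice", date(2009, 1, 1)))
```

Like the query pattern it imitates, this sketch does not exclude the employee herself from the result if she is also responsible for a related document; a real service would probably add such a filter.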
The rendered documents provide access to queries in two ways: Queries with a fixed structure that have to be answered repeatedly will be made available right in the (rendered) documents in appropriate places. This is the case with our employee substitution query: This month, Alice may be ill, whereas next month, Bob may be on holiday. Access to this query can be given wherever an employee or a reference to an employee occurs in a document. Alternatively, non-prefabricated queries can be composed more intuitively on demand using a visual input form.

Footnote 5: In particular, the rendered XHTML+MathML also preserves the original semantic structure as parallel MathML markup and RDFa annotations, so that a suitable browser plugin can dynamically generate interaction points for semantic services; see [KKL10] for details.

These examples show that multi-dimensional queries like the ones naturally coming up in Software Engineering scenarios (sect. 3) can be answered with existing MKM technology. Moreover, they illustrate that multi-dimensional markup affords multi-dimensional services. If we interpret our dimensions as distinct contexts, our services become context-sensitive, as dimensions can be filtered in and out. For instance, the context menu of certification documents could be equipped with menu entries for committing an approval or rejection to the server, which would only be displayed to the certifier. The server could then trigger further actions, such as marking the document that contains a rejected object and all dependencies of that object as rejected, too. In general, the more dimensions are formalized in a document, the more context-sensitive services become available.

6 Conclusion and Further Work

In this paper we have studied the applicability of MKM technologies in Software Engineering beyond “Formal Methods” (based on the concrete SAMSDocs document collection and its formalization).
The initial hypothesis here is that contract documents, design specifications, user manuals, and integration reports can be partially formalized and integrated into a computer-supported software development process. To test this hypothesis we have studied a collection of documents created for the development of a safety zone computation, the formal verification that the braking trajectory always lies in the safety zone, and the SIL-3 certification of this fact by a public certification agency. As the project documents contain a wealth of (informal) mathematical content, MKM formats (in this case our OMDoc format) are well-suited for this task. During the formalization of the LaTeX part of the collection, we realized that the documents contain an open-ended, multi-dimensional space of formality that can be used for supporting projects, if made explicit.

We have shown that RDFa-based extensions of MKM formats, employing flexible “metadata” relationships referencing specific vocabularies, can be used to encode this formality space and put it into service. We have pointed out that the “dimensions” of this space can be seen to correspond to different metadata vocabularies. Note that the distinction between data and metadata blurs here as, for example, the OMDoc data model realized by native markup in the OMDoc format can also be seen as OMDoc metadata and could equally be realized by RDFa annotations to some text markup format, where the meaning of the annotations is given by the OMDoc ontology. This “metadata view” is applicable to all MKM formats that mark up informal mathematical texts (e.g. MathDox [CCB06] and MathLang [KWZ08]) as long as they formalize their data model in an ontology. This observation makes decisions about which parts of the formality space to support with native markup a purely pragmatic choice and opens up new possibilities in the design of representation formats.
It seems plausible that all MKM formats use native markup for mathematical knowledge structures (we think of them as primary formality structures for MKM) and differ mostly in the secondary ones they internalize. XHTML+MathML+RDFa might even serve as a baseline interchange format for MKM applications⁶, since it is minimally committed. Note that if the metadata ontologies are represented in modular formats that admit theory morphisms, then these can be used as crosswalks between secondary metadata for higher levels of interoperability. We leave the development of such crosswalks to future work.

⁶ Indeed, a similar proposal has been made for Semantic Wikis [VO06], which have related concerns but do not primarily involve mathematics.

The formalized secondary formality structures can be used for enriching interactive document browsing and for enabling multi-dimensional metadata queries over documents and collections. We have shown a set of exemplary multi-dimensional services based on the RDFa-encoded metadata, mostly centered around Linked Data approaches based on RDF-based queries. More services can be obtained by exporting Linked Data to the Semantic Web or a company intranet and thus enabling further reuse. In particular, the multi-dimensionality observed in this paper and its realization via flexible metadata regimes in representation formats allow the knowledge engineers to tailor the level of formality to the intended applications.

In our case study, the metadata vocabularies ranged from project-specific ones that had to be developed (e.g. definition tables) to general ones like the V-Model vocabulary, for which external ontologies could be reused later on.
We expect that such a range is generally the case for Software Engineering projects, and that the project-specific vocabularies may stabilize and be standardized in communities and companies, lowering the formalization effort entailed by each individual project. In fact we anticipate that such metadata vocabularies and the software development support services will become part of the strategic knowledge of technical organizations.

In [CF09, 241] Carette and Farmer challenge MKM researchers by assessing some of their technologies: “A lack of requirements analysis very often leads to interesting solutions to problems which did not need solving”. With this paper we hope to have shown that MKM technologies can be extended to cope with “real world concerns” (in Software Engineering). Indeed, industry is becoming more and more aware of and interested in Linked Data (see e.g. [Ser08] and [LDF, Question 14]), which increases the relevance of the multi-dimensional knowledge management methods presented in this paper.

References

ABMP08. Ben Adida, Mark Birbeck, Shane McCarron, and Steven Pemberton. RDFa in XHTML: Syntax and processing. W3C Recommendation, World Wide Web Consortium (W3C), October 2008.

BM07. Dan Brickley and Libby Miller. FOAF vocabulary specification 0.91. Technical report, ILRT Bristol, November 2007.

CCB06. A. M. Cohen, H. Cuypers, and E. Reinaldo Barreiro. MathDox: Mathematical documents on the web. In OMDoc – An open markup format for mathematical documents [Version 1.2] [Koh06], chapter 26.7, pages 278–282.

CF09. Jacques Carette and William Farmer. A review of mathematical knowledge management. In Jacques Carette, Lucas Dixon, Claudio Sacerdoti Coen, and Stephen M. Watt, editors, MKM/Calculemus 2009 Proceedings, number 5625 in LNAI, pages 233–246. Springer Verlag, July 2009.

DCM08. DCMI Usage Board. DCMI metadata terms.
DCMI recommendation, Dublin Core Metadata Initiative, 2008.

DKL+10. Catalin David, Michael Kohlhase, Christoph Lange, Florian Rabe, Nikita Zhiltsov, and Vyacheslav Zholudev. Publishing math lecture notes as linked data. In Lora Aroyo, Grigoris Antoniou, and Eero Hyvönen, editors, ESWC, number 6089 in Lecture Notes in Computer Science, pages 370–375. Springer, June 2010.

Dub08. Dublin Core metadata element set. DCMI recommendation, Dublin Core Metadata Initiative, 2008.

Dub10. Edd Dumbill. DOAP – description of a project. http://trac.usefulinc.com/doap, seen Mar. 2010.

Eri07. Henrik Eriksson. The semantic-document approach to combining documents and ontologies. International Journal of Human-Computer Studies, 65(7):624–639, 2007.

FHL+08. Udo Frese, Daniel Hausmann, Christoph Lüth, Holger Täubig, and Dennis Walter. The importance of being formal. In Hardi Hungar, editor, International Workshop on the Certification of Safety-Critical Software Controlled Systems SafeCert’08, volume 238 of Electronic Notes in Theoretical Computer Science, pages 57–70, September 2008.

For08. FormalSafe. http://www.dfki.de/sks/formalsafe/, 2008. seen March 2010.

GMH+07. Tudor Groza, Knud Möller, Siegfried Handschuh, Diana Trif, and Stefan Decker. SALT: Weaving the claim web. In Karl Aberer, Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung-Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux, editors, ISWC/ASWC, number 4825 in Lecture Notes in Computer Science, pages 197–210. Springer, 2007.

HS06. Hans-Jörg Happel and Stefan Seedorf. Applications of ontologies in software engineering. In Proc. 2nd International Workshop on Semantic Web Enabled Software Engineering (SWESE ’06), 2006.

KKL10. Andrea Kohlhase, Michael Kohlhase, and Christoph Lange. sTeX – a system for flexible formalization of linked data.
submitted to I-SEMANTICS 2010, 2010.

Koh06. Michael Kohlhase. OMDoc – An open markup format for mathematical documents [Version 1.2]. Number 4180 in LNAI. Springer Verlag, August 2006.

Koh08. Michael Kohlhase. Using LaTeX as a semantic markup format. Mathematics in Computer Science, 2(2):279–304, 2008.

Koh10. Michael Kohlhase. An open markup format for mathematical documents OMDoc [version 1.3]. Draft Specification, 2010.

KWZ08. Fairouz Kamareddine, J. B. Wells, and Christoph Zengler. Computerising mathematical text with MathLang. Electron. Notes Theor. Comput. Sci., 205:5–30, 2008.

Lan10. Christoph Lange. The OMDoc document ontology. web page at http://kwarc.info/projects/docOnto/omdoc.html, 2010. seen 3/2010.

LDF. Linked data FAQ. http://structureddynamics.com/linked_data.html.

LFB96. Jinxin Lin, Mark S. Fox, and Taner Bilgic. A requirement ontology for engineering design. In Proceedings of 3rd International Conference on Concurrent Engineering, pages 343–351. Technomic Publishing Company, Inc., August 1996.

MML07. Till Mossakowski, Christian Maeder, and Klaus Lüttich. The heterogeneous tool set. In Orna Grumberg and Michael Huth, editors, Proceedings of the 13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems TACAS-2007, number 4424 in LNCS, pages 519–522, Berlin, Germany, 2007. Springer Verlag.

NPW02. Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL – A Proof Assistant for Higher-Order Logic. Number 2283 in LNCS. Springer, 2002.

PS08. Eric Prud’hommeaux and Andy Seaborne. SPARQL query language for RDF. W3C Recommendation, World Wide Web Consortium (W3C), January 2008.

RDF04. Resource description framework (RDF). http://www.w3.org/RDF/, 2004.

SAM09. SAMS. SAMSDocs: The document collection of the SAMS project, 2009. http://www.sams-projekt.de.

Ser08. François-Paul Servant. Linking enterprise data.
In Christian Bizer, Tom Heath, Kingsley Idehen, and Tim Berners-Lee, editors, Linked Data on the Web (LDOW 2008), number 369 in CEUR Workshop Proceedings, April 2008.

SIM99. Frank M. Shipman III and Raymond J. McCall. Incremental formalization with the hyper-object substrate. ACM Trans. Inf. Syst., 17(2):199–227, 1999.

SKG+10. Heinrich Stamerjohanns, Michael Kohlhase, Deyan Ginev, Catalin David, and Bruce Miller. Transforming large collections of scientific publications to XML. Mathematics in Computer Science, 2010. in press.

Soc09. American Mathematical Society. Mathematics Subject Classification MSC2010. http://www.ams.org/mathscinet/msc/, 2009.

SWJL10. Milan Stankovic, Claudia Wagner, Jelena Jovanovic, and Philippe Laublet. Looking for experts? What can linked data do for you? In Christian Bizer, Tom Heath, Tim Berners-Lee, and Michael Hausenblas, editors, Linked Data on the Web (LDOW 2010), CEUR Workshop Proceedings, April 2010.

VO06. Max Völkel and Eyal Oren. Towards a Wiki Interchange Format (WIF). In Max Völkel, Sebastian Schaffert, and Stefan Decker, editors, Proceedings of the 1st Workshop on Semantic Wikis, European Semantic Web Conference 2006, number 206 in CEUR Workshop Proceedings, Budva, Montenegro, June 2006.