The Online Software, which manages the JUNO data acquisition (DAQ) system, is composed of many distributed components working in coordination. It is responsible for configuration, process management, control, and information sharing. Its service-oriented architecture (SOA), a design that reflects the current trend in distributed systems, makes the online software lightweight, loosely coupled, reusable, modular, self-contained, and easy to extend. All services in the SOA-based online software system send messages to one another directly, without a traditional broker in the middle, so that services can operate cooperatively yet independently. ZeroMQ is chosen, though it is not the only possible technical choice, as the low-level communication middleware because of its high performance and convenient communication model, while Google Protocol Buffers is used as the marshaling library to unify the format of message contents. To meet the general requirements of JUNO, the concepts of partition and segment are defined so that multiple small-scale DAQs can run simultaneously and can easily join or leave. All running data except the raw physics events will be transmitted, processed, and recorded to the database. High availability (HA) is also taken into account to address the inevitable single point of failure (SPOF) in the distributed system. This paper introduces the functionality and techniques of all the core services in detail.
The Jiangmen Underground Neutrino Observatory (JUNO) [1] is a multipurpose neutrino experiment under construction in southern China. It is designed to determine the neutrino mass hierarchy, precisely measure oscillation parameters, and carry out other frontier research by detecting reactor neutrinos from the Yangjiang and Taishan Nuclear Power Plants in Jiangmen, China, as shown in Fig. 1. JUNO [1] will be the largest liquid scintillator detector (20 kton), and it will use about 20000 20-inch photomultiplier tubes (PMTs) and 25000 3-inch PMTs, providing an unprecedented energy resolution.
Because of the large number of PMTs and the correspondingly large number of front-end electronics readout channels in JUNO, the DAQ is currently designed to comprise thousands of software processes for data taking, online processing, and event storage. The DAQ cluster therefore scales to several hundred computing nodes connected via a high-speed network, to provide sufficient computing power and satisfy the general requirement of high bandwidth.
The Online Software is a distributed system designed to control and monitor the whole DAQ system. It plays important global roles throughout data taking in JUNO, including configuration, monitoring, control, multi-process management, and information sharing, but not access to the raw events. It is a customizable distributed framework that essentially provides the ‘glue’ that holds the various sub-systems together [2] and makes them work in coordination. It provides interfaces to the Dataflow System (which is responsible for transporting the raw data from the readout drivers to mass storage) [3], to the remote web control system, and to the detector control system (DCS).
The Online Software is designed as a common framework suitable for different high energy physics experiments such as JUNO. The notion of a Partition is therefore strongly recommended when the detectors are independent or the experiment requires multiple DAQ systems running concurrently. A Partition relates to the organization and hierarchy of the DAQ architecture.
2) High availability Requirement
In a distributed system, the problem of a single point of failure (SPOF), especially in the core components, has to be solved to improve the availability of the online software system and, in turn, of the DAQ.
Service-Oriented Architecture (SOA) is generally defined as a deployment methodology that relies on the integration and interaction of loosely coupled services. In simple terms, it is a software architecture that treats any running application as a service. This design concept has become popular in recent years because SOA makes a distributed system flexible, modular, and reusable. A Service is a well-encapsulated functional unit that runs independently and can be accessed remotely.
Because in SOA all functional units act as service modules and services communicate by exchanging messages [4], the underlying communication layer should be considered first.
After a comparison of a number of openly available communication libraries, ZeroMQ [5] was chosen to implement the network communication layer [6] and to build the message model that transmits data among distributed services. ZeroMQ's application programming interface (API) resembles that of traditional Berkeley sockets, but it encapsulates low-level details and error-handling complexity. It provides several typical and advanced communication models: request-reply, publish-subscribe, and push-pull. Different services can use different communication models.
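As an illustration of the brokerless publish-subscribe model listed above, the following is a minimal in-process sketch in plain Python. It does not use ZeroMQ itself; the `Publisher` and `Subscriber` classes, the service names, and the topic strings are all hypothetical. The only behaviour it borrows is that subscriptions are matched as topic prefixes (as ZeroMQ subscriber sockets do) and that the publisher delivers to its peers directly rather than through an intermediary.

```python
class Subscriber:
    """Hypothetical service endpoint that keeps messages matching its topic prefixes."""

    def __init__(self, name):
        self.name = name
        self.prefixes = []
        self.inbox = []

    def subscribe(self, prefix):
        self.prefixes.append(prefix)

    def deliver(self, topic, payload):
        # Prefix matching: keep the message if any subscribed prefix matches.
        if any(topic.startswith(p) for p in self.prefixes):
            self.inbox.append((topic, payload))


class Publisher:
    """Hypothetical service that sends to its peers directly, with no broker in between."""

    def __init__(self):
        self.peers = []

    def connect(self, subscriber):
        self.peers.append(subscriber)

    def publish(self, topic, payload):
        for peer in self.peers:
            peer.deliver(topic, payload)


monitor = Subscriber("monitor")
monitor.subscribe("status.")
logger = Subscriber("logger")
logger.subscribe("")  # an empty prefix matches every topic

run_control = Publisher()
run_control.connect(monitor)
run_control.connect(logger)
run_control.publish("status.run", b"RUNNING")
run_control.publish("error.readout", b"timeout")
```

Here `monitor` ends up holding only the `status.run` message, while `logger` receives both. In a real deployment the direct method calls would be replaced by ZeroMQ sockets over the network.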
Protocol Buffers [7], developed by Google, is a language-neutral, platform-neutral, extensible way of serializing structured data (think XML, but smaller, faster, and simpler) for use in communications protocols, data storage, and more. It provides an easy serialization method with multi-language support, allowing users to define the ZeroMQ messages transported among services.
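The idea of a single, unified message envelope that every service serializes before sending and parses after receiving can be sketched as follows. This is only a stand-in under stated assumptions: it uses JSON from the Python standard library instead of Protocol Buffers, and the `ServiceMessage` class and its fields (`sender`, `msg_type`, `body`) are hypothetical, not the message schema of the online software.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ServiceMessage:
    # All field names are hypothetical, chosen for illustration only.
    sender: str
    msg_type: str
    body: dict

    def serialize(self) -> bytes:
        # Encode the message as bytes suitable for a transport layer.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def parse(cls, raw: bytes) -> "ServiceMessage":
        # Rebuild the message from the bytes produced by serialize().
        return cls(**json.loads(raw.decode("utf-8")))


original = ServiceMessage(sender="run_control", msg_type="COMMAND", body={"action": "start"})
wire_bytes = original.serialize()
decoded = ServiceMessage.parse(wire_bytes)
```

With Protocol Buffers, the envelope would instead be declared once in a schema file and compiled into classes for each language, which is what makes the same message definition usable from services written in different languages.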
When it comes to high availability (HA), the Master/Slave model is the primary solution to consider. What matters, however, is how to detect a failure and make the slave take over the task. ZooKeeper [8], a reliable, scalable distributed coordination system widely used in big data frameworks such as Hadoop, can solve these problems and act as the service broker in the SOA-based online software.
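The take-over step in a Master/Slave pair can be sketched with a small in-memory model. It loosely follows the common leader-election recipe for coordination services, in which each instance registers an entry with an increasing sequence number, the lowest surviving entry is the master, and an entry disappears when its owner's session is lost. The `Coordinator` class and the instance names below are hypothetical; this is not the ZooKeeper API and not necessarily how the online software implements failover.

```python
class Coordinator:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self._next_seq = 0
        self._members = {}  # sequence number -> service instance name

    def register(self, name):
        # Each instance gets the next sequence number when it joins.
        seq = self._next_seq
        self._next_seq += 1
        self._members[seq] = name
        return seq

    def session_lost(self, seq):
        # Mimics an ephemeral entry vanishing when its owner dies.
        self._members.pop(seq, None)

    def master(self):
        # The instance with the lowest surviving sequence number is the master.
        if not self._members:
            return None
        return self._members[min(self._members)]


coordinator = Coordinator()
primary = coordinator.register("run-control-a")
standby = coordinator.register("run-control-b")
before = coordinator.master()
coordinator.session_lost(primary)  # simulate a crash of the current master
after = coordinator.master()
```

In this sketch the standby instance becomes master as soon as the primary's entry is removed. A real deployment would additionally need the standby to be notified of the change and to recover the state of the failed master.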
Fig. 2 shows the SOA-based design of the online software. The green part consists of all the online software services discussed in this paper, which provide functionality to external components such as the orange parts (user applications or upper-level services). Table 1 gives an overview of the core services of the online software, which are described in detail below.
The Run Control service of the Online Software supplies all the necessary control and supervision for data taking by coordinating the different DAQ subsystems and detector operations from user