Finding the different patterns in buildings data using bag of words representation with clustering

📝 Abstract

Understanding how buildings operate has become a challenging task because of the large amount of data recorded in energy-efficient buildings. Experts today still rely on visual tools to analyze this data. To make the task feasible, this paper proposes a method that automatically detects the different patterns in building data. K-Means clustering is used to automatically identify the ON (operational) cycles of the chiller. Next, the ON cycles are transformed into a symbolic representation using the Symbolic Aggregate Approximation (SAX) method. The SAX symbols are then converted to a bag of words representation for hierarchical clustering. The proposed technique is applied to real-life data from an adsorption chiller, and its results are discussed and compared with those of a dynamic time warping (DTW) approach.

📄 Content

Finding the different patterns in buildings data using bag of words representation with clustering
Usman Habib, Gerhard Zucker
Energy Department, Sustainable Buildings and Cities
AIT Austrian Institute of Technology, Vienna, Austria
{usman.habib, gerhard.zucker}@ait.ac.at

Abstract—Understanding how buildings operate has become a challenging task because of the large amount of data recorded in energy-efficient buildings. Experts today still rely on visual tools to analyze this data. To make the task feasible, this paper proposes a method that automatically detects the different patterns in building data. K-Means clustering is used to automatically identify the ON (operational) cycles of the chiller. Next, the ON cycles are transformed into a symbolic representation using the Symbolic Aggregate Approximation (SAX) method. The SAX symbols are then converted to a bag of words representation for hierarchical clustering. The proposed technique is applied to real-life data from an adsorption chiller, and its results are discussed and compared with those of a dynamic time warping (DTW) approach.
Keywords— Building energy performance; Fault detection and diagnosis (FDD); clustering; symbolic aggregate approximation (SAX); Bag of words representation (BoWR); hierarchical clustering; Dynamic time warping (DTW); Coefficient of Performance (COP)

I. INTRODUCTION
A large amount of raw data is recorded while monitoring energy-efficient buildings [1]. This data is analyzed at a later stage to assess different aspects of building performance. Experts in the field usually analyze the data with visualization tools [2]. The sheer volume of recorded data makes a detailed performance analysis difficult, so the different patterns are hard to capture, which may in turn lead to faults in building components and reduce energy efficiency. Data mining techniques, particularly clustering, can help find the different patterns in building data [3]–[5]. Automatically extracting patterns from a large data set relieves experts of searching for them manually and supports a detailed analysis of the data. Finding the different patterns in the data thus becomes feasible and less labor-intensive [3], [4], [6].

In this paper, an approach for automatically finding the different patterns in the operation of building components is proposed. To validate the outcomes, the proposed method is applied to data from an adsorption chiller and compared with another approach, dynamic time warping (DTW). First, the ON/OFF cycles of the chiller are detected using the K-Means clustering algorithm, since the behavior of the chiller differs between these two states. The patterns during the ON (operational) cycle are of greater importance for assessing chiller performance and for fault detection and diagnosis (FDD); therefore, only the data from ON cycles is considered in the proposed approach.
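The ON/OFF detection step can be illustrated with a minimal sketch. The text here does not state which measured variables are fed to K-Means, so the example assumes a single scalar signal (a hypothetical `power` list) and k = 2; the function names `two_means_1d` and `extract_on_cycles` are illustrative and not taken from the paper.

```python
from itertools import groupby


def two_means_1d(values, iterations=50):
    """Lloyd's algorithm for k=2 on scalar data; returns (low_centre, high_centre)."""
    low, high = min(values), max(values)  # initialise centres at the extremes
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all samples identical: only one cluster exists
            break
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):  # centres stopped moving
            break
        low, high = new_low, new_high
    return low, high


def extract_on_cycles(values):
    """Return (start, end) index pairs, end exclusive, of contiguous ON runs.

    Assumption: a sample is ON when it lies nearer the higher cluster centre.
    """
    low, high = two_means_1d(values)
    if low == high:
        return []
    is_on = [abs(v - high) < abs(v - low) for v in values]
    cycles, index = [], 0
    for state, run in groupby(is_on):
        length = len(list(run))
        if state:
            cycles.append((index, index + length))
        index += length
    return cycles


# Hypothetical signal: near zero when the chiller is OFF, around 5 when ON.
power = [0.1, 0.0, 0.2, 5.1, 4.9, 5.3, 0.1, 0.0, 4.8, 5.0, 0.2]
print(extract_on_cycles(power))  # → [(3, 6), (8, 10)]
```

Each returned index pair delimits one ON cycle, which can then be sliced out of the recorded series for further processing. With several measured variables per sample, a general K-Means implementation would replace the scalar version used here.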
Discarding the OFF cycles also reduces the amount of data. The normalized ON cycles are discretized using symbolic aggregate approximation (SAX); the resulting discrete values are symbols, or words. After the ON cycles are transformed to words, a histogram is created for each ON cycle, called the bag of words representation (BoWR). The BoWRs of the ON cycles are then clustered using hierarchical clustering. Furthermore, the results of BoWR with hierarchical clustering and of DTW with hierarchical clustering are compared using the cophenetic correlation, which measures how strongly the cluster tree correlates with the distances between objects in the distance vector [7].
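The SAX and bag of words steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the alphabet size (4), the number of segments, the use of overlapping two-symbol words, and the Euclidean distance between histograms are all choices made for the example, and the function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from statistics import NormalDist, fmean, pstdev


def sax_symbols(series, n_segments, alphabet="abcd"):
    """Z-normalise a series, reduce it with PAA, and map segment means to letters."""
    n = len(series)
    if n < n_segments:
        raise ValueError("series must have at least n_segments samples")
    mean, std = fmean(series), pstdev(series)
    if std == 0:
        normalised = [0.0] * n
    else:
        normalised = [(x - mean) / std for x in series]
    # Piecewise aggregate approximation: mean of each near-equal chunk.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [fmean(normalised[a:b]) for a, b in zip(bounds, bounds[1:])]
    # Breakpoints split the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def bag_of_words(symbols, word_len=2):
    """Histogram of overlapping words (assumed here: n-grams of SAX symbols)."""
    return Counter(symbols[i:i + word_len]
                   for i in range(len(symbols) - word_len + 1))


def histogram_distance(bag_a, bag_b):
    """Euclidean distance between two word histograms (missing words count 0)."""
    vocabulary = set(bag_a) | set(bag_b)
    return sum((bag_a[w] - bag_b[w]) ** 2 for w in vocabulary) ** 0.5


# Hypothetical ON cycles: a ramp, the same ramp scaled by 10, and a falling ramp.
rising = [1, 2, 3, 4, 5, 6, 7, 8]
scaled = [10 * x for x in rising]
falling = rising[::-1]

bags = [bag_of_words(sax_symbols(c, n_segments=4)) for c in (rising, scaled, falling)]
print(sax_symbols(rising, 4), sax_symbols(falling, 4))  # → abcd dcba
print(histogram_distance(bags[0], bags[1]))             # → 0.0
```

Because of the normalization, the rising ramp and its scaled copy yield identical histograms, while the falling ramp shares no words with either. The pairwise histogram distances could then be handed to a hierarchical clustering routine; in SciPy, for example, `scipy.cluster.hierarchy.linkage` builds the cluster tree and `scipy.cluster.hierarchy.cophenet` returns the cophenetic correlation, although the text here does not say which implementation the authors used.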
The paper is structured as follows. Section II discusses the relevant research work. Section III describes the system design, and Section IV explains the methodology of the proposed method. Section V presents the results obtained on real-life data. Finally, conclusions and future directions are given in Section VI.
II. STATE OF THE ART
Advances in sensor technology have made it feasible to record huge amounts of data in commercial buildings. The volume of stored data makes it practically impossible to analyze in detail by hand. Experts in the field use different tools to visualize the data, but these still require manual analysis of building performance. This process can be time-consuming, and there is always a chance of overlooking some areas of interest [2]. Different procedures are available for fault detection and diagnosis (FDD) in building components, e.g. HVAC (Heating, Ventilation and Air-Conditioning). Although, use of the earlier knowledge about the sy

This content is AI-processed based on ArXiv data.
