Finding the different patterns in buildings data using bag of words representation with clustering
📝 Abstract
Understanding building operation has become a challenging task because of the large amount of data recorded in energy-efficient buildings. Experts still rely on visual tools to analyze the data. To make the task tractable, this paper proposes a method that automatically detects the different patterns in building data. K-Means clustering is used to automatically identify the ON (operational) cycles of the chiller. The ON cycles are then transformed into a symbolic representation using the Symbolic Aggregate Approximation (SAX) method, and the SAX symbols are converted to a bag of words representation for hierarchical clustering. The proposed technique is applied to real-life data from an adsorption chiller, and its results are discussed and compared with those of a dynamic time warping (DTW) approach.
📄 Content
Finding the different patterns in buildings data using bag of words representation with
clustering
Usman Habib, Gerhard Zucker
Energy Department, Sustainable Buildings and Cities
AIT Austrian Institute of Technology
Vienna, Austria
{usman.habib, gerhard.zucker}@ait.ac.at
Abstract—The understanding of the buildings operation has
become a challenging task due to the large amount of data
recorded in energy-efficient buildings. Experts still rely on
visual tools to analyze the data. To make the task tractable,
this paper proposes a method that automatically detects the
different patterns in building data. K-Means clustering is used
to automatically identify the ON (operational) cycles of the
chiller. The ON cycles are then transformed into a symbolic
representation using the Symbolic Aggregate Approximation (SAX)
method, and the SAX symbols are converted to a bag of words
representation for hierarchical clustering. The proposed
technique is applied to real-life data from an adsorption
chiller, and its results are discussed and compared with those
of a dynamic time warping (DTW) approach.
Keywords— Building energy performance; Fault detection and
diagnosis (FDD); clustering; symbolic aggregate approximation
(SAX); Bag of words representation (BoWR); hierarchical
clustering; Dynamic time warping (DTW); Coefficient of
Performance (COP)
I. INTRODUCTION
A large amount of raw data is recorded during the monitoring
of energy-efficient buildings [1]. The data is analyzed at
later stages to assess different aspects of building
performance. Experts in the field usually analyze the data
using visualization tools [2]. The sheer volume of recorded
data makes a detailed performance analysis difficult, so the
different patterns are hard to capture. Missed patterns may in
turn lead to faults in building components and reduce energy
efficiency.
Data mining techniques, particularly clustering, can help in
finding the different patterns in building data [3]–[5].
Automatic extraction of patterns from large data sets relieves
experts of searching for them manually and supports a detailed
analysis of the data. The process of finding patterns thus
becomes feasible and less labor intensive [3], [4], [6].
This paper proposes an approach for automatically finding the
different patterns in the operation of building components. To
validate the outcomes, the proposed method is applied to data
from an adsorption chiller and compared with an approach based
on dynamic time warping (DTW). First, the ON/OFF cycles of the
chiller are detected using the K-Means clustering algorithm, as
the behavior of the chiller differs between these two states.
The patterns during the ON (operational) cycle are of greater
importance for assessing chiller performance and for fault
detection and diagnosis (FDD), so only the data belonging to ON
cycles is considered in the proposed approach. Neglecting the
OFF cycles also reduces the amount of data. The normalized ON
cycles are discretized using symbolic aggregate approximation
(SAX); the discretized values are symbols, or words. After the
ON cycles have been transformed to words, a histogram is created
for each ON cycle, called the bag of words representation
(BoWR). The BoWRs of the ON cycles are then clustered using
hierarchical clustering. Finally, the results of the BoWR
method with hierarchical clustering and of DTW with hierarchical
clustering are compared using the cophenetic correlation, which
measures how strongly the cluster tree correlates with the
distances between objects in the distance vector [7].
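The first steps of this pipeline (ON-cycle detection, SAX
discretization, and the bag of words histogram) can be sketched
as follows. This is a minimal illustration under stated
assumptions, not the implementation used in this paper: the
function names, the min/max centroid initialization, the
four-letter alphabet, the number of segments, and the word
length are all hypothetical choices, and only the Python
standard library is used.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev


def two_means_on_mask(values, iters=50):
    """1-D K-Means with k=2; True where a sample is closer to the higher centroid."""
    lo, hi = min(values), max(values)  # assumed initialization: extremes of the data
    for _ in range(iters):
        on = [v for v in values if abs(v - hi) < abs(v - lo)]
        off = [v for v in values if abs(v - hi) >= abs(v - lo)]
        new_lo = mean(off) if off else lo
        new_hi = mean(on) if on else hi
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return [abs(v - hi) < abs(v - lo) for v in values]


def on_cycles(values, mask):
    """Split the series into contiguous runs of ON samples."""
    cycles, current = [], []
    for v, is_on in zip(values, mask):
        if is_on:
            current.append(v)
        elif current:
            cycles.append(current)
            current = []
    if current:
        cycles.append(current)
    return cycles


def sax(series, n_segments, alphabet="abcd"):
    """Z-normalize, reduce with PAA, then map each segment mean to a symbol."""
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]
    # Piecewise aggregate approximation: mean of each roughly equal segment.
    n = len(z)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    paa = [mean(z[a:b]) for a, b in zip(bounds, bounds[1:]) if b > a]
    # Breakpoints that cut the standard normal into equiprobable regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[sum(p > c for c in cuts)] for p in paa)


def bag_of_words(sax_string, word_len=1):
    """Histogram of overlapping words of length word_len in a SAX string."""
    return Counter(sax_string[i:i + word_len]
                   for i in range(len(sax_string) - word_len + 1))


# Toy signal (made up for illustration): two ON phases separated by OFF phases.
signal = [0.1, 0.2, 5.0, 6.0, 7.0, 6.0, 0.1, 0.0, 5.5, 5.6, 5.4, 5.5, 0.2]
for cycle in on_cycles(signal, two_means_on_mask(signal)):
    word = sax(cycle, n_segments=4)
    print(word, dict(bag_of_words(word)))
```

Each histogram can be turned into a fixed-length count vector
over the possible words and passed to a hierarchical clustering
routine. In SciPy, for example, scipy.cluster.hierarchy.linkage
builds the cluster tree and scipy.cluster.hierarchy.cophenet
returns the cophenetic correlation for a given distance vector.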
The paper is structured as follows. Section II discusses the
relevant research work. Section III describes the system
design, while Section IV explains the methodology of the
proposed method. Section V presents the results obtained on
real-life data. Finally, the conclusion and future directions
are given in Section VI.
II. STATE OF THE ART
Advances in sensor technology have made it feasible to record
huge amounts of data in commercial buildings. The volume of
stored data makes it practically impossible to analyze it
manually in detail. Experts in the field use various tools to
visualize the data, but assessing building performance still
requires further manual analysis. This process can be time
consuming, and there is always a chance of overlooking areas of
interest [2].
There are different procedures available for fault detection
and diagnosis (FDD) in building components, e.g. HVAC
(Heating, Ventilation and Air-Conditioning). Although, use of
the earlier knowledge about the sy