Researchers are often perplexed when their machine learning algorithms are required to deal with complex numbers. Various strategies are commonly employed to project complex numbers into real numbers, although doing so frequently sacrifices the information contained in the complex number. This paper proposes a new method and four techniques to represent complex numbers as real numbers without sacrificing the information they contain. The proposed techniques are also capable of retrieving the original complex number from the representing real number with little to no information loss. The promising applicability of the proposed techniques has been demonstrated and warrants further exploration in representing complex numbers.
The machine learning (ML) process depends heavily on the properties of its dataset. A well-constructed dataset often leads to satisfactory performance of the ML algorithm, and vice versa. Dataset attribute values can be either numerical or nominal [1], with numerical values being the most prominently used.
In most cases, the numerical value is a single real number. However, in some rare cases, the numerical value is a complex number. Researchers are often baffled when encountering these complex numbers, because ML algorithms and toolkits, such as WEKA [2], are commonly designed to handle only real numerical values.
Several strategies are employed to handle complex numbers. Some researchers choose to ignore the imaginary part of the complex number and deal only with the real part [3,4], whereas others split the complex number into two numerical values, effectively doubling the number of attributes used by the ML algorithm [5,6]. Others represent the complex number as a numerical or nominal value, providing a simple mapping from one value to another [7], while still others leave the complex number as is and treat it as nominal or textual data. There are also other strategies not covered in this paper.
Hence, it is evident that there is no universal or standard mechanism for handling complex numbers. Each strategy presents its own weaknesses. Ignoring the imaginary part considerably alters the nature of the data, since the information carried by the imaginary part of the complex number is lost [8].
On the other hand, splitting the complex number into two values theoretically retains the information, since both numbers remain intact; however, the relationship between the two values can be lost, since some ML algorithms, most notably feature selection algorithms, do not maintain the dependencies between two attributes [1,9]. A feature selection algorithm may deem the attribute containing the real part important and thus discard the attribute containing the imaginary part, or vice versa.
Conversely, mapping a complex number to another numerical value, preferably a natural number, or even to a nominal value, presents another set of challenges: the mapping grows increasingly large when the complex numbers are continuous. Lastly, even though treating the complex number as a textual value seems to be the safest option, since no information is lost, the ML algorithm may not be able to determine the similarity of one attribute value to another.
Hence, it is necessary to formulate a procedure for representing complex numbers in the ML domain without sacrificing the information they contain, thereby reducing the space required to retain the information and allowing better inference to be obtained. Therefore, this paper proposes novel complex number representation algorithms that retain the information and, more importantly, allow the original value to be reconstructed. This concept is similar to the feature projection method of dimensionality reduction. The remainder of the paper is structured as follows. In the next section, the proposed techniques are presented. In Section 3, the experimental setup, comprising the dataset preparation and experimental design, is described. The outcomes showcasing the reconstruction capability of the proposed techniques, and the conclusion and future works, are elaborated in Sections 5 and 6, respectively.
Every complex number can be expressed by specifying either its Cartesian coordinates (CC) or its polar coordinates (PC). The complex number 𝑐 can be represented in CC as

$$c = x + \hat{i}\,y \qquad (1)$$
where 𝑖̂ is the imaginary unit. The CC 𝑥, 𝑦 can be described as the PC 𝑟, 𝜑 with 𝑟 ≥ 0 and 𝜑 ∈ [0, 2𝜋) using

$$r = \sqrt{x^{2} + y^{2}}, \qquad \varphi = \operatorname{atan2}(y, x) \qquad (2)$$
where atan2(𝑦, 𝑥) is a special case of the arctangent function, restricted here to the range [0, 2𝜋), such that

$$\operatorname{atan2}(y, x) = \begin{cases} \arctan\!\left(\dfrac{y}{x}\right) & \text{if } x > 0,\ y \ge 0 \\[6pt] \arctan\!\left(\dfrac{y}{x}\right) + 2\pi & \text{if } x > 0,\ y < 0 \\[6pt] \arctan\!\left(\dfrac{y}{x}\right) + \pi & \text{if } x < 0 \\[6pt] \dfrac{\pi}{2} & \text{if } x = 0,\ y > 0 \\[6pt] \dfrac{3\pi}{2} & \text{if } x = 0,\ y < 0 \end{cases} \qquad (3)$$
and conversely using

$$x = r\cos\varphi, \qquad y = r\sin\varphi \qquad (4)$$
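As a minimal sketch of these conversions (Equations (2)–(4)), assuming only Python's standard math module, the [0, 2𝜋) range for 𝜑 can be obtained by reducing atan2's output modulo 2𝜋:

```python
import math


def cartesian_to_polar(x: float, y: float) -> tuple[float, float]:
    """Convert CC (x, y) to PC (r, phi) with r >= 0 and phi in [0, 2*pi)."""
    r = math.hypot(x, y)                     # Eq. (2): r = sqrt(x^2 + y^2)
    phi = math.atan2(y, x) % (2 * math.pi)   # fold atan2's output into [0, 2*pi)
    return r, phi


def polar_to_cartesian(r: float, phi: float) -> tuple[float, float]:
    """Convert PC (r, phi) back to CC (x, y) via Eq. (4)."""
    return r * math.cos(phi), r * math.sin(phi)


# Round trip for c = 3 - 4i (x = 3, y = -4): the CC values are recovered
# up to floating-point rounding.
r, phi = cartesian_to_polar(3.0, -4.0)
x, y = polar_to_cartesian(r, phi)
assert abs(x - 3.0) < 1e-12 and abs(y - (-4.0)) < 1e-12
```

The round trip at the end illustrates that the CC–PC conversion itself introduces essentially no information loss.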
Representing the complex number either as CC or as PC doubles the feature vector space, and the correlation between 𝑥, 𝑦 or 𝑟, 𝜑 can be lost in the ML process [8]. Thus, each value pair should be distinctly encoded into a single unique value so that the correlation between the two values is conserved. A pairing function (PF) can be employed to achieve this goal. In this paper, two renowned PFs are used: the Cantor [10] and Szudzik [11] PFs. The Cantor PF and its inverse are defined in Equations (5) and (6), respectively, as

$$C(p, q) = \frac{(p + q)(p + q + 1)}{2} + q \qquad (5)$$

$$w = \left\lfloor \frac{\sqrt{8C + 1} - 1}{2} \right\rfloor, \qquad q = C - \frac{w(w + 1)}{2}, \qquad p = w - q \qquad (6)$$
while the Szudzik PF and its inverse are defined in Equations (7) and (8), respectively, as

$$S(p, q) = \begin{cases} q^{2} + p & \text{if } p < q \\ p^{2} + p + q & \text{if } p \ge q \end{cases} \qquad (7)$$

$$(p, q) = \begin{cases} \left(S - \lfloor\sqrt{S}\rfloor^{2},\ \lfloor\sqrt{S}\rfloor\right) & \text{if } S - \lfloor\sqrt{S}\rfloor^{2} < \lfloor\sqrt{S}\rfloor \\[4pt] \left(\lfloor\sqrt{S}\rfloor,\ S - \lfloor\sqrt{S}\rfloor^{2} - \lfloor\sqrt{S}\rfloor\right) & \text{otherwise} \end{cases} \qquad (8)$$
where 𝑝 is the first number, 𝑞 is the second number, 𝐶 is the Cantor paired value, and 𝑆 is the Szudzik paired value.
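As an illustrative sketch, assuming the standard forms of the Cantor and Szudzik PFs given above, both functions and their inverses can be written in a few lines of Python; math.isqrt provides the integer square roots, and the assertions at the end confirm the lossless round trip:

```python
import math


def cantor_pair(p: int, q: int) -> int:
    """Cantor PF (Eq. 5): encodes two natural numbers as one."""
    return (p + q) * (p + q + 1) // 2 + q


def cantor_unpair(c: int) -> tuple[int, int]:
    """Inverse Cantor PF (Eq. 6): recovers (p, q) from the paired value."""
    w = (math.isqrt(8 * c + 1) - 1) // 2
    q = c - w * (w + 1) // 2
    return w - q, q


def szudzik_pair(p: int, q: int) -> int:
    """Szudzik PF (Eq. 7)."""
    return q * q + p if p < q else p * p + p + q


def szudzik_unpair(s: int) -> tuple[int, int]:
    """Inverse Szudzik PF (Eq. 8)."""
    root = math.isqrt(s)
    rem = s - root * root
    return (rem, root) if rem < root else (root, rem - root)


# Round-trip check: both PFs reconstruct the original pair exactly.
assert cantor_unpair(cantor_pair(12, 7)) == (12, 7)
assert szudzik_unpair(szudzik_pair(12, 7)) == (12, 7)
```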
However, since these PFs can only be employed to distinctly encode natural numbers [11,12], both 𝑥, 𝑦 and 𝑟, 𝜑, which are usually kept in the 64-bit double-precision floating-point format, must first be transformed into natural numbers before the PFs can be applied.
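Purely as an illustration of one possible lossless float-to-natural mapping, and not necessarily any of the four techniques proposed in this paper, a 64-bit double can be reinterpreted as an unsigned integer through its IEEE-754 bit pattern, for example with Python's struct module:

```python
import struct


def double_to_natural(value: float) -> int:
    """Reinterpret a 64-bit double's IEEE-754 bit pattern as a non-negative integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def natural_to_double(bits: int) -> float:
    """Recover the original double from its 64-bit bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Lossless round trip: the exact double is recovered from the natural number.
n = double_to_natural(-2.718281828459045)
assert natural_to_double(n) == -2.718281828459045
```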