Representation Learning for Medical Data

Reading time: 5 minutes

📝 Original Info

  • Title: Representation Learning for Medical Data
  • ArXiv ID: 2001.08269
  • Date: 2020-01-24
  • Authors: ** Author information was not provided in the paper. (Where author names and affiliations cannot be confirmed, please refer to the original paper.) **

📝 Abstract

We propose a representation learning framework for the medical diagnosis domain. It is based on a heterogeneous network-based model of diagnostic data, together with a modified metapath2vec algorithm for learning latent node representations. We compare the proposed algorithm with other representation learning methods in two practical case studies: symptom/disease classification and disease prediction. We observe a significant performance boost in these tasks resulting from learning representations of the domain data in the form of a heterogeneous network.

💡 Deep Analysis

📄 Full Content

Representation learning is a group of machine learning methods that aim to find useful representations of data. The "usefulness" is typically understood in terms of the extraction of features that are meaningful from the point of view of the target objective. For neural networks, such a representation is defined as a mapping f of input representations to a d-dimensional vector space: f : V → R^d. The development of representation learning is motivated by numerous experimental results showing that extracting features of the data improves the performance of a network compared to "naive" data encoding schemes such as binary or one-hot encoding. This is further encouraged by the observation that many deep learning architectures seem to naturally learn layer-wise representations of features during training, a phenomenon that some researchers point to as an important factor contributing to the strong performance of DL methods. Not without significance is also the fact that such internal representations can, at least in some cases, be interpreted by humans, which is a step toward improving the explainability of deep neural models.

To date, machine learning applications for

The exact algorithm of random walks differs between methods. The authors of node2vec note that there are two distinct kinds of node similarity: homophily (occurring in nodes that are close to each other) and structural equivalence (occurring in nodes that have similar structural roles in the network but are not necessarily closely interconnected). The random-walk procedure used in node2vec is characterized by two hyperparameters that incorporate both notions of similarity. The unnormalized transition probability from node v_i to node v_{i+1}, given the previous node v_{i-1}, is:

α_pq(v_{i-1}, v_{i+1}) = 1/p if d(v_{i-1}, v_{i+1}) = 0, 1 if d(v_{i-1}, v_{i+1}) = 1, and 1/q if d(v_{i-1}, v_{i+1}) = 2,   (1)

where d(v_{i-1}, v_{i+1}) denotes the shortest-path distance between the previous and the next node. The return parameter p controls the likelihood of returning to an already visited node, while the in-out parameter q controls the tendency to explore outward nodes.

Thanks to this, the random walk can result in different pairs of neighbors, depending on which similarity seems more suitable to the target task.

Specifically, the sampling strategy used in DeepWalk is the one where p=1 and q=1, meaning that each neighboring node has the same probability of being visited.
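The biased walk above can be sketched as follows; the graph representation and the function name are illustrative, not from the paper:

```python
import random

def node2vec_step(graph, prev, cur, p=1.0, q=1.0):
    """Sample the next node of a node2vec walk on an unweighted graph.

    graph maps each node to the set of its neighbors. The unnormalized
    weight of moving from cur to a neighbor x is alpha_pq(prev, x):
    1/p if x == prev (distance 0), 1 if x neighbors prev (distance 1),
    and 1/q otherwise (distance 2). With p = q = 1 this reduces to the
    uniform neighbor sampling used by DeepWalk.
    """
    neighbors = sorted(graph[cur])
    weights = []
    for x in neighbors:
        if x == prev:
            weights.append(1.0 / p)   # return to the previous node
        elif x in graph[prev]:
            weights.append(1.0)       # shortest path from prev is 1
        else:
            weights.append(1.0 / q)   # shortest path from prev is 2
    return random.choices(neighbors, weights=weights)[0]
```

A large q biases the walk toward nodes close to the previous node, while a small q encourages outward exploration.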

Metapath2vec is a modification of node2vec for heterogeneous networks. A heterogeneous network is defined as a graph G = (V, E, T) in which each node and each edge is associated with a type mapping, φ: V → T_V and ψ: E → T_E, respectively. Instead of using a random-walk scheme with explicit p and q parameters, metapath2vec utilizes the additional information about node types to provide an alternative method, called meta-path-based random walks. The flow of the walk is determined by the so-called meta-path, defined as

V_1 --R_1--> V_2 --R_2--> … --R_{l-1}--> V_l,

where V_1, …, V_l are respective node types and R_1, …, R_{l-1} are relations between them. The transition probability for a node at step i is given by:

p(v^{i+1} | v_t^i) = 1 / |N_{t+1}(v_t^i)| if (v^{i+1}, v_t^i) ∈ E and the type of v^{i+1} is V_{t+1}, and 0 otherwise,   (2)

where N_{t+1}(v_t^i) denotes the set of neighbors of v_t^i that have type V_{t+1}.
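A meta-path-guided walk can be sketched roughly as follows (a simplified, illustrative version: the dictionary-based graph, the function name, and the cycling rule over the pattern are assumptions, and edge weights are ignored):

```python
import random

def metapath_walk(graph, node_type, metapath, start, length):
    """Generate one meta-path-based random walk.

    graph maps a node to the set of its neighbors, node_type maps a node
    to its type label, and metapath is a cyclic pattern of type labels
    whose first and last entries coincide, e.g. ['d', 's', 'n', 's', 'd'].
    At each step only neighbors of the next type in the pattern are
    eligible, each with equal probability.
    """
    assert node_type[start] == metapath[0]
    walk = [start]
    for step in range(length - 1):
        cur = walk[-1]
        # cycle through the pattern (first and last type are the same)
        next_type = metapath[(step + 1) % (len(metapath) - 1)]
        candidates = [x for x in graph[cur] if node_type[x] == next_type]
        if not candidates:   # dead end: no neighbor of the required type
            break
        walk.append(random.choice(candidates))
    return walk
```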

Each of the above algorithms can also be used for learning edge representations. These are obtained by combining the node representations of adjacent nodes using binary operators, for example the average or the Hadamard product. This allows these algorithms to be used for edge-related tasks, such as link prediction (predicting whether two nodes should be connected or not) or edge classification.
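For instance, the two operators mentioned above can be applied element-wise to a pair of learned node vectors (a minimal NumPy sketch; the function name is illustrative):

```python
import numpy as np

def edge_embedding(u_vec, v_vec, op="hadamard"):
    """Combine the embeddings of two adjacent nodes into an edge vector."""
    if op == "hadamard":
        return u_vec * v_vec            # element-wise product
    if op == "average":
        return (u_vec + v_vec) / 2.0    # element-wise mean
    raise ValueError(f"unknown operator: {op}")
```

The resulting vector can then be fed to any standard classifier, e.g. logistic regression for link prediction.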

The original metapath2vec algorithm uses only a single meta-path to generate walks. This may be an issue if the relationships that we want to include do not form a path. The following methods were used:

• No pretraining: the embedding layer was initialized only with random values.

• node2vec: the original implementation, with parameter values p=1, q=1, d=2238, k=100, r=10, l=80.

• metapath2vec: a custom implementation based on the original code, with the same parameter values as above (p and q were not used) and with a single meta-path d-s-n-s-w-s-d.

• multi-metapath2vec: a custom implementation, with parameter values the same as

The task is to classify disease (d) or symptom name (n) nodes according to their subgroup in the ICD-10 classification. For example, the disease 'Other atopic dermatitis' (ICD code L20.8) is assigned to the subgroup 'L20-L30 Dermatitis and eczema'. The input vector consists of a single node, whereas the output vector is the one-hot-encoded subgroup label.
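The classification setup described above can be sketched as a lookup into an embedding table followed by a linear softmax head (a NumPy sketch; the class name, sizes, and initialization are illustrative assumptions, not the paper's exact architecture):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class NodeClassifier:
    """A single node id is mapped through an embedding table (optionally
    pre-initialized with node2vec/metapath2vec vectors) and a linear
    layer producing a distribution over the 43 ICD-10 subgroup labels."""

    def __init__(self, num_nodes, dim, num_classes=43, pretrained=None, seed=0):
        rng = np.random.default_rng(seed)
        # the embedding table plays the role of the (pretrained) embedding layer
        self.emb = (pretrained if pretrained is not None
                    else rng.normal(0.0, 0.1, (num_nodes, dim)))
        self.W = rng.normal(0.0, 0.1, (dim, num_classes))
        self.b = np.zeros(num_classes)

    def predict_proba(self, node_id):
        return softmax(self.emb[node_id] @ self.W + self.b)
```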

The full dataset contains a total of 43 classes: 33 classes of symptom names and 10 classes of diseases. To make the task non-trivial, a certain percentage of the training data is not used. The network should therefore rely on knowledge about neighboring nodes, in the form of the embedding layer, in order to make a correct classification. We analyze two ranges of such incompleteness: from 0% to 90% (with a step of 10%) and from 90% to 99% (with a step of 1%) of missing data. obtained the worst values, being outperformed by both metapath2vec and multi-metapath2vec for most of the ranges. However, fo

Reference

This content is AI-processed based on open access ArXiv data.
