IISCNLP at SemEval-2016 Task 2: Interpretable STS with ILP based Multiple Chunk Aligner


System Chunks Track: Chunking Module

When gold chunks are not given, we perform an additional chunking step. We use two methods for chunking: (1) the OpenNLP Chunker, and (2) the Stanford CoreNLP API for generating parse trees, combined with the chunklink tool for chunking based on those parse trees.

Before chunking, we preprocess each sentence to remove punctuation, unless the punctuation is space-separated (and therefore constitutes an independent token). We also convert Unicode characters to ASCII. The chunker output is then post-processed to combine each single prepositional phrase with the preceding phrase. We noted that the OpenNLP chunker ignores the last word of a sentence; in such cases, we append the last word as a separate chunk. With chunking based on the Stanford CoreNLP parser, we observed in several instances, particularly in the student-answers dataset, that a conjunction such as ‘and’ was consistently separated into an independent chunk; chunking could therefore be improved by combining the chunks around a conjunction. These processing heuristics are based on observations from the gold-chunks data. We observe that chunking quality has a large impact on the overall score in the system chunks track. As future work, we are exploring custom algorithms to improve chunking.
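The merge heuristic above can be sketched as follows. This is a minimal sketch: the `(label, words)` chunk representation and the `"PP"` label are assumptions for illustration, not the actual interface of the chunkers we used.

```python
def postprocess_chunks(chunks):
    """Merge each lone prepositional phrase ("PP") into the preceding chunk.

    `chunks` is a list of (label, words) pairs -- an assumed format;
    real chunker output (OpenNLP / chunklink) differs in detail.
    """
    merged = []
    for label, words in chunks:
        if label == "PP" and len(words) == 1 and merged:
            # attach the single-word preposition phrase to the chunk before it
            prev_label, prev_words = merged[-1]
            merged[-1] = (prev_label, prev_words + list(words))
        else:
            merged.append((label, list(words)))
    return merged
```

For example, `[("NP", ["the", "dog"]), ("PP", ["in"]), ("NP", ["the", "park"])]` becomes `[("NP", ["the", "dog", "in"]), ("NP", ["the", "park"])]`.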

Problem Formulation: We now define the problem formally. Consider a source sentence ($`Sent_1`$) with M chunks and a target sentence ($`Sent_2`$) with N chunks. Let $`C^1=\{ c_1^1,\ldots,c_M^1 \}`$ be the chunks of $`Sent_1`$ and $`C^2=\{ c_1^2,\ldots,c_N^2 \}`$ the chunks of $`Sent_2`$. Consider sets $`\mathscr S_1 \subseteq PowerSet(C^1)-\{\emptyset\}`$ and $`\mathscr S_2 \subseteq PowerSet(C^2)-\{\emptyset\}`$; that is, $`\mathscr S_1`$ and $`\mathscr S_2`$ are subsets of the power sets (the sets of all possible combinations of sentence chunks) of $`C^1`$ and $`C^2`$ respectively, excluding the empty set. An element $`S_1 \in \mathscr S_1`$ (respectively $`S_2 \in \mathscr S_2`$) denotes a specific subset of chunks that may be combined during alignment. Let $`concat(S_1)`$ denote the phrase resulting from concatenation of the chunks in $`S_1`$, and likewise $`concat(S_2)`$ for the chunks of $`S_2`$. Let $`Z_{S_1,S_2}`$ be a binary variable that takes value 1 if $`concat(S_1)`$ is aligned with $`concat(S_2)`$ and 0 otherwise.

The goal of the alignment module is to determine which decision variables $`Z_{S_1,S_2}`$ are non-zero. $`S_1`$ and $`S_2`$ can contain more than one chunk (multiple alignment), and those chunks are not necessarily contiguous. Aligned chunks are further classified using a type classifier and a score classifier. The type prediction module labels a pair of aligned chunks ($`concat(S_1), concat(S_2)`$) with a relation type such as EQUI (equivalent), OPPO (opposite), etc. The score classifier module assigns a similarity score between 0 and 5 to each pair of aligned chunks. For the system chunks track, the chunking module first converts the sentences $`Sent_1, Sent_2`$ into the chunk sets $`C^1, C^2`$.
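To make the notation concrete, the unrestricted candidate sets $`\mathscr S_1`$ (all non-empty subsets of a sentence's chunks) can be enumerated as below; the chunk names are illustrative, not actual data.

```python
from itertools import combinations

def nonempty_subsets(chunks):
    """PowerSet(C) minus the empty set, as a list of frozensets."""
    return [frozenset(s)
            for r in range(1, len(chunks) + 1)
            for s in combinations(chunks, r)]

C1 = ["c1_1", "c1_2", "c1_3"]          # M = 3 chunks of Sent_1
S1_candidates = nonempty_subsets(C1)   # 2^3 - 1 = 7 candidate sets
```

The exponential growth of this enumeration (2^M - 1 sets) is what motivates the restricted candidate sets introduced later.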

iMATCH: ILP based Monolingual Aligner for Multiple-Alignment at the Chunk Level

iMATCH: an example illustrating the notation

We approach the problem of multiple alignment (permitting non-contiguous chunk combinations) by formulating it as an Integer Linear Programming (ILP) optimization problem. We construct the objective function as the sum of all $`Z_{S_1,S_2}, \forall S_1,S_2`$, weighted by the similarity between $`concat(S_1)`$ and $`concat(S_2)`$, subject to constraints ensuring that each chunk is aligned at most once with any other chunk. This leads to the following ILP optimization problem:

\begin{equation}
\begin{aligned}
& \underset{Z}{\text{max}}
& & \sum_{S_1 \in \mathscr S_1,\, S_2 \in \mathscr S_2} Z_{S_1,S_2} \, \alpha(S_1,S_2) \, Sim(S_1,S_2) \\
& \text{s.t.}
& & \sum_{S_1 \in \mathscr S_1 :\, c^1 \in S_1, \;\, S_2 \in \mathscr S_2} Z_{S_1,S_2} \leq 1, \quad \forall \, c^1 \in C^1 \\
& & & \sum_{S_1 \in \mathscr S_1, \;\, S_2 \in \mathscr S_2 :\, c^2 \in S_2} Z_{S_1,S_2} \leq 1, \quad \forall \, c^2 \in C^2 \\
& & & Z_{S_1,S_2} \in \{0,1\}, \quad \forall \, S_1 \in \mathscr S_1, \, S_2 \in \mathscr S_2
\end{aligned}
\end{equation}
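As a sanity check, the two constraint families can be verified directly on any candidate set of alignments by brute force. This is a sketch with illustrative chunk names, not part of the submitted system.

```python
def satisfies_constraints(alignments):
    """Return True iff every chunk of either sentence occurs in at most
    one selected alignment (S1, S2) -- the two ILP constraint families."""
    used1, used2 = set(), set()
    for S1, S2 in alignments:
        if used1 & set(S1) or used2 & set(S2):
            return False        # some chunk would be aligned twice
        used1 |= set(S1)
        used2 |= set(S2)
    return True

# c1_1 appears in two alignments: infeasible
assert not satisfies_constraints(
    [({"c1_1"}, {"c2_1"}), ({"c1_1", "c1_2"}, {"c2_2"})])
# disjoint alignments: feasible
assert satisfies_constraints(
    [({"c1_1"}, {"c2_1"}), ({"c1_2"}, {"c2_2"})])
```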

The optimization constraints ensure that a particular chunk $`c`$ appears at most once in an alignment with any subset of chunks in the other sentence; therefore, each chunk can be part of at most one alignment. We note that all possible multiple alignments are explored by this optimization problem when $`\mathscr S_1 = PowerSet(C^1) - \{\emptyset\}`$ and $`\mathscr S_2 = PowerSet(C^2) - \{\emptyset\}`$. However, this leads to a very large number of decision variables $`Z_{S_1,S_2}`$, which is unsuitable for realistic use. Hence we consider the restricted case:

\begin{equation}
\begin{aligned}
\mathscr S_1 &= \{ \{c^1_1\}, \ldots, \{c^1_M\} \} \cup \{ \{c^1_i,c^1_j\} : 1 \leq i < j \leq M \} \\
\mathscr S_2 &= \{ \{c^2_1\}, \ldots, \{c^2_N\} \} \cup \{ \{c^2_i,c^2_j\} : 1 \leq i < j \leq N \}
\end{aligned}
\end{equation}

This leads to many-to-many alignments in which at most two chunks on one side are combined and aligned with at most two chunks on the other. For the iSTS task submission we restrict our experiments to this setting, since it worked well for the task, but $`\mathscr S_1`$ and $`\mathscr S_2`$ can be relaxed to cover combinations of three or more chunks. For efficiency, the candidate subsets could be further pruned using adjacency information or the existence of a dependency edge obtained from dependency parsing.

The similarity score $`Sim({S_1, S_2})`$, which measures the desirability of aligning $`concat(S_1)`$ with $`concat(S_2)`$, plays an important role in finding the optimal solution to the monolingual alignment task. We compute it as the maximum of the similarity scores obtained from the subset of features F1, F2, F3, F8, F10 and F11 given in Table 1, i.e., $`max(F1, F2, F3, F8, F10, F11)`$. In our implementation, the weighting term $`\alpha({S_1,S_2})`$ is a function of the cardinalities of $`S_1`$ and $`S_2`$; it ensures that aligning fewer individual chunks does not gain an undue advantage over multiple alignment (for instance, single alignments tend to increase the objective value more because they produce more aligned pairs, the similarity scores being normalized to lie between -1 and 1). This weighting is a hyper-parameter whose value is set using a simple grid search. We solve the resulting ILP optimization problem using PuLP, a Python toolkit for linear programming. Our system achieved the best alignment score on the headlines dataset in the gold chunks track.
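The restricted formulation can be sketched in PuLP as follows. This is a minimal sketch, not the submitted system: the `sim` and `alpha` arguments stand in for the feature-based similarity and the cardinality weighting, whose exact definitions are not reproduced here.

```python
from itertools import combinations
import pulp  # pip install pulp

def imatch_align(chunks1, chunks2, sim, alpha):
    """Solve the restricted iMATCH ILP: candidate sets are single chunks
    and unordered pairs; each chunk is used in at most one alignment."""
    cand = lambda C: ([frozenset([c]) for c in C] +
                      [frozenset(p) for p in combinations(C, 2)])
    S1s, S2s = cand(chunks1), cand(chunks2)

    prob = pulp.LpProblem("iMATCH", pulp.LpMaximize)
    Z = {(S1, S2): pulp.LpVariable(f"z_{i}_{j}", cat="Binary")
         for i, S1 in enumerate(S1s) for j, S2 in enumerate(S2s)}

    # objective: weighted similarity of the selected alignments
    prob += pulp.lpSum(Z[S1, S2] * alpha(S1, S2) * sim(S1, S2)
                       for S1 in S1s for S2 in S2s)
    # each source chunk aligned at most once
    for c in chunks1:
        prob += pulp.lpSum(Z[S1, S2] for S1 in S1s if c in S1
                           for S2 in S2s) <= 1
    # each target chunk aligned at most once
    for c in chunks2:
        prob += pulp.lpSum(Z[S1, S2] for S1 in S1s
                           for S2 in S2s if c in S2) <= 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [(set(S1), set(S2)) for (S1, S2), z in Z.items()
            if z.value() == 1]
```

With a toy similarity that rewards only the pairings `{a}`-`{x}` and `{b}`-`{y}`, the solver selects exactly those two alignments, since any pair combination would contribute a negative score.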