Automatic Analysis of Collaboration Through Human Conversational Data Resources: A Review


Authors: Yi Yu, Maria Boritchev, Chloé Clavel

Yi Yu (1,2), Maria Boritchev (2), Chloé Clavel (1,2)
(1) INRIA Paris, ALMAnaCH, 48, rue Barrault, 75013 Paris, France
(2) LTCI, Télécom Paris, Institut Polytechnique de Paris, 19 place Marguerite Perey, 91120 Palaiseau, France
{yi.yu, chloe.clavel}@inria.fr, maria.boritchev@telecom-paris.fr

Abstract

Collaboration is a task-oriented, high-level human behavior. In most cases, conversation serves as the primary medium for information exchange and coordination, making conversational data a valuable resource for the automatic analysis of collaborative processes. In this paper, we focus on verbal aspects of collaboration and conduct a review of collaboration analysis using task-oriented conversation resources, encompassing related theories, coding schemes, tasks, and modeling approaches. We aim to address the question of how to utilize task-oriented human-human conversational data for collaboration analysis. We hope our review will serve as a practical resource and illuminate unexplored areas for future collaboration analysis.

Keywords: Multimodal Task-Oriented Conversation Resources, Human Collaboration Analysis, Literature Review

1. Introduction

Collaboration analysis (CollA) seeks to use computational methods to model how people coordinate, think, and learn in shared tasks in order to gain insights that improve both collaboration processes and outcomes (Martinez-Maldonado et al., 2021). As collaboration is a fundamental human behavior, CollA has broad applications, including education (Jaques et al., 2023), management (Casey-Campbell and Martens, 2009), interface design (Prati et al., 2021), and AI agent development (Enayet and Sukthankar, 2023a; Zhang et al., 2024). Computational methods for CollA require data to understand the phenomena in play.
Human conversational resources are irreplaceable for two main reasons. First, conversations are instances of joint action (Clark, 1996), and human conversational data is a sequential record of multimodal communicative behavior that reflects both individual contributions and interpersonal dynamics. Additionally, linguistic research on human interpersonal phenomena supplies features for CollA, such as referring expressions (Heeman and Hirst, 1995; Clark and Wilkes-Gibbs, 1986) and multilevel entrainment (Lubold and Pon-Barry, 2014), as discussed in Section 4. We illustrate in Figure 1 how informative human task-oriented conversation can be for CollA. Second, although we have relatively mature metrics for evaluating task-related dimensions of CollA in task-oriented conversations (e.g., task decomposition, task completion; Guan et al. (2025)), interpersonal dynamics, which directly affect the collaboration process and quality, remain largely unexplored. In collaborative learning scenarios, collaboration itself can be a way of learning: it directly fosters both the internalization and the sharing of knowledge, which provide new dimensions for CollA. In addition, the quality of interpersonal dynamics can significantly influence learning outcomes (Yang, 2023). The variety of interaction patterns in human task-oriented dialogue also makes it possible to investigate how people come to know each other (e.g., through language use, personality traits, or educational background) during collaboration (Guo et al., 2025). CollA using human conversational data provides valuable insights for designing human-machine collaboration systems that balance individual benefits with the overall task performance of the group, and provides a better understanding of humanity.

Figure 1: Two players playing the Tower of Hanoi game and conversing with each other to solve the puzzle.
Their conversation provides many useful elements for CollA.

Several surveys have covered related but distinct topics. Praharaj et al. (2021) and Schürmann et al. (2024) examine multimodal features for CollA and existing collaboration measurements, but focus on the educational context. Zou et al. (2025) centers on LLM-based human-agent system building and examines how human feedback and control contribute to performance improvement; they review work on human-LLM systems and the conversation corpora for these systems, focusing on tasks where human feedback can be crucial for defining and assessing evaluation metrics. Vaccaro et al. (2024) surveys recent work comparing the performance of humans working alone, humans collaborating with machines, and machines operating independently, and highlights the challenges involved in integrating human intelligence with computational systems. While that study investigates the conditions under which human-machine collaboration surpasses human-only or machine-only performance, our work instead reviews computational models of multimodal discourse in collaborative contexts.

A holistic review of human task-oriented conversation resources and their usage for CollA remains unexplored. By task-oriented conversation, we mean conversation directed with clear intention toward the completion of tasks (Grosz and Sidner, 1986). For collaboration, building on the definitions provided by Wood and Gray (1991) and Randrup et al. (2016), we conceptualize collaboration as an interactive process comprising four core elements: a shared goal; a shared understanding of the task (including rules, norms, and structures); positive interdependency; and joint individual commitments reflected in participants' actions and decisions.
Then, we apply this definition to select task-oriented corpora from peer-reviewed published papers that provide collaboration annotations and evaluations, focusing particularly on settings that create positive interdependencies among participants (e.g., tasks that cannot be completed by a single participant under the defined settings). The step-by-step procedure and criteria for paper selection are explained in Appendix A, Figure 2. We conduct the review by systematically analyzing the coding schemes for capturing collaboration in conversation, extracting salient multimodal features, and examining recent collaboration modeling approaches applied to the selected corpora. After discussing coding schemes in Section 2, we present the criteria used to select CollA corpora and review their task settings in Section 3. We then examine CollA studies based on at least one of the selected corpora, discussing salient features and modeling approaches in Sections 4 and 5. We conclude by discussing recent advances and future directions of CollA in Section 6.

2. Coding Schemes for CollA

Coding schemes serve as a lens to showcase important elements for CollA (Chen et al., 2020) and are used to annotate collaboration in conversations for computational collaboration model building. This section reviews the coding schemes employed in the corpora discussed in Section 3, highlighting how different coding approaches capture both individual and group (including dyad) aspects of collaboration. We also examine applied questionnaires, given their flexibility (e.g., self-reports, annotation instruments, external evaluation) and wide application in CollA.

The full details of all the employed coding schemes are given in Table 1. We discuss the main categories of coding schemes and questionnaires, their theoretical foundations, and how they capture different aspects of collaboration, to illuminate current trends and challenges.
2.1. Individual Perspective

From the individual perspective, applied coding schemes and questionnaires focus on single-participant collaborative aspects, such as individual collaborative/cooperative behaviors and engagement, and on individual variables, such as collective orientation, the propensity to work in a collective manner in a team setting (Driskell et al., 2010). They are applied to individual audio and visual data to model participants' collaborative behaviors and their impact on the collaboration process.

On the linguistic side, Cavicchio and Poesio (2012) use a cooperation coding scheme in the Rovereto corpus, developed by Davies (1997, 2007). This scheme analyzes individual collaborativeness using Grice's cooperative principle (Grice, 1975) for conversation analysis. It is evaluative, i.e., used not only to label what the speakers do but also to assess it in terms of appropriateness, which can be subjective and hard for annotators to agree on. Findings in classroom discourse research reveal the important role of individual argument in the collaborative learning process (Engle and Conant, 2002). Building on that, Olshefski et al. (2020) develop a coding scheme to capture the function of collaborative argument moves in students' discussions in the Discussion Tracker corpus, exploring the collaboration dimension of argument in large-group collaborative learning tasks.

Drawing on behavior studies, Richey et al. (2016) applied a modified version of the collaborative behavior coding scheme from Johnson and Johnson (2013) to the SRI corpus. These coding schemes are rooted in social interdependence theory (Johnson and Johnson, 2009) and Vygotsky's cognitive developmental theory (Tudge and Rogoff, 2014), which highlight the interactive aspects of individual behavior and collaborative indicators of learning.
Individual:
- SRI, speech-based collaborative learning corpus | I Code (individual collaboration indicators): regulative/logistical, interaction, and cognitive indicators of teamwork behavior | Richey et al. (2016)
- MISC, information-seeking conversations | adapted User Engagement Scale (UES): scaled ratings of partner's collaborativeness | McDuff et al. (2017)
- RoomReader, multimodal multiparty conversational interactions corpus | online engagement: continuous scaled engagement in groups based on collaborators' behaviors and perceived intentions | Reverdy et al. (2022)
- Discussion Tracker, multiparty discussions | collaborative argumentation functions: classification of arguments as new ideas, agreements, extensions, probes/challenges | Olshefski et al. (2020)
- Rovereto, emotion and cooperation corpus | cooperative dialogue effort: evaluation of each turn along three dimensions: knowledge sharing, non-cooperative behavior, and cooperation level (scaled) | Cavicchio and Poesio (2012)
- MULTICOLLAB, multimodal dialogues | extreme emotion (frustration) in collaboration: participant self-assessed frustration level | Peechatt et al. (2024)
- Teams, multiparty dialogues for entrainment | collective orientation: participant self-assessed preference for teamwork | Litman et al. (2016)

Dyad & Group:
- SRI | Q Code (team collaboration quality; triads): team-level quality states (e.g., good collaboration, follow-the-leader), based on the number of engaged participants | Richey et al. (2016)
- MISC | adapted User Engagement Scale (UES): scaled ratings of the collaboration process | McDuff et al. (2017)
- Teams | team cohesion, satisfaction, potency/efficacy: between- and post-game questionnaires eliciting perceptions of team processes | Litman et al. (2016)
- GAME-ON, group analysis of multimodal expression of cohesion corpus | modified Group Environment Questionnaire (GEQ) for group cohesion: highlights the instrumental function (social vs. task facets); affective function optional depending on the study | Maman et al. (2020)
- GAP, group affect and performance corpus | teamwork experience (self-report): ratings of teamwork performance (time management, efficiency, overall work quality) | Braley and Murray (2018)
- AMI, augmented multiparty interaction corpus; PCC, patient consultation corpus | group cohesion: ratings of task cohesion, social cohesion, and leadership | Hung and Gatica-Perez (2010); Kantharaju and Pelachaud (2021)
- MULTISIMO, multimodal group interaction corpus | collaboration quality (overall): scaled rating of overall collaboration quality | Koutsombogera and Vogel (2018)
- PhotoBook, visually-grounded dialogues | collaboration performance: scaled ratings of overall collaboration performance and perceived mutual understanding | Haber et al. (2019)

Table 1: Applied coding schemes that capture different aspects of collaboration in conversational data.

Reverdy et al. (2022) adapt a multifacet classroom engagement behavior coding scheme (Goldberg et al., 2021) for online interaction in the RoomReader corpus, where head and hand position and eye gaze/focus play an important role in annotating individual engagement levels. Peechatt et al. (2024) design a coding scheme for annotating frustration levels in MULTICOLLAB to predict critical moments in the collaboration process. From a human factors perspective, Litman et al. (2016) collect collective orientation ("the propensity to work in a collective manner", Driskell et al. (2010)) from each participant via a self-assessment questionnaire in the Teams corpus, enabling further collaboration analysis alongside individual variables.

2.2.
Dyad and Group Perspective

The dyad and group interaction dynamics of the collaboration process have attracted considerable attention from the research community. These coding schemes and questionnaires focus on interpersonal dynamics and group behaviors, such as group cohesion, self-assessed collaboration performance, and perceived collaboration quality.

For collaboration quality evaluation, Haber et al. (2019) use a questionnaire in the dyadic conversation corpus PhotoBook to collect self-assessed, scaled overall collaboration performance, while McDuff et al. (2017) select items from the User Engagement Scale questionnaire (O'Brien and Toms, 2010) in MISC for collaboration process evaluation. For three-person groups, Richey et al. (2016) develop a coding scheme, Q codes (as shown in Table 4 of Richey et al. (2016)), in which collaboration quality is defined as the number of group members actively contributing to task completion, with a special focus on the balanced involvement of each member.

Group cohesion is a group phenomenon defined as "the group members' inclinations to forge social bonds" (Casey-Campbell and Martens, 2009), which impacts the collaboration process (Hung and Gatica-Perez, 2010; Kantharaju and Pelachaud, 2021). Hung and Gatica-Perez (2010) study both the social and task aspects of cohesion in a computational way. Their 27-item questionnaire for perceived group cohesion is based on group research (Carron and Brawley, 2000) and the psychology literature (Siebold, 1999), and it has been applied to the AMI corpus (Kraaij et al., 2005) and the PCC corpus (Kantharaju and Pelachaud, 2021). Severt and Estrada (2015) provide another coding scheme for cohesion studies that includes functional and structural dimensions of cohesion.
This scheme covers two psychological functions, affective and instrumental, but only the latter, with its social and task facets, has been used in the GAME-ON corpus (Maman et al., 2020), to align with the dominant approach (Braun et al., 2020).

2.3. Main Trends for CollA Coding Schemes

There is no universal coding scheme for CollA. At the segment level, earlier studies tend to provide annotations for individual-level collaborativeness, while recent coding schemes are mostly applied to study group-level collaborative phenomena. One possible explanation is that a group is not simply the sum of its dyads: group-level collaboration involves emergent dynamics that individual or dyadic models cannot capture. However, group-level emergent collaborative interaction is context-dependent and can be hard to capture with individual-level cues. Human manual annotation remains the most reliable approach for these studies, even though large language models (LLMs) are increasingly involved in recent annotation processes (Wang et al., 2024).

We observe that task-level annotation has been widely chosen to evaluate both individual-level (McDuff et al., 2017; Haber et al., 2019) and group-level (Koutsombogera and Vogel, 2018; Braley and Murray, 2018; Litman et al., 2016) collaboration. Most annotations are based on task-adapted questionnaires, such as perceived collaboration quality using the User Engagement Scale (O'Brien and Toms, 2010) in MISC, and self-assessed overall collaboration quality and satisfaction in MULTISIMO, GAP, and Teams. Both external and self-assessed annotations are valuable for CollA, from modeling collaborative conversation to detecting individual variables for dialogue system adaptation. However, task-level granularity can be insufficient for analyzing emergent phenomena.

3.
Task-Oriented Corpora for CollA

In this section, we examine open-source, human-human, task-oriented conversation corpora created in the last 20 years (from 2005) that require collaboration in their task settings and have direct annotations of collaboration (e.g., collaboration quality/skills, group cohesion, conflicts, etc.), in order to understand recent trends in task-setting design for CollA. Corpora with only measurable collaboration task results are not included here, e.g., ELEA (Sanchez-Cortes et al., 2011), since task results alone cannot reflect collaboration quality.

The task settings are vital for building CollA corpora, as they determine the level of interdependence among participants during the collaboration process. We first discuss CollA task settings, including group size and how they promote collaboration between participants, and provide details on their collaboration annotations in Table 2. We then compare the different choices and synthesize the main trends.

Game. Game scenarios constitute the dominant task-setting category in our selection of corpora. We categorize a corpus task as a game when the keyword "game" is used to describe the corpus' scenario.

Social game settings have been used intensively for group cohesion analysis. The Teams corpus (Litman et al., 2016) is built on a role-playing social game for CollA: a group of 3 or 4 players, each with a different adventurer role, must discuss strategies to collect enough treasures to complete the game. The GAME-ON corpus (Maman et al., 2020) is based on a multitask social game in which 3 players must cooperatively discover clues and solve several puzzles within a limited time frame. Perceived cohesion is evaluated through self-assessment after each puzzle. The authors argue that cohesion can take considerable time to emerge in groups of strangers, so they recruit only real-world friends to play the game.
Game settings can easily be adapted to a particular aspect of CollA. The MISC corpus (McDuff et al., 2017) employs a role-play, information-seeking setting, assigning the "seeker" role to one participant in each pair. This setting helps in understanding human interests during collaborative information seeking, in order to design and evaluate human-machine interfaces. The PhotoBook corpus (Haber et al., 2019) uses a remote multi-round image identification game setting. Each group of two players has access to different sets of images, and participants must mark each image as either common or different by discussing it with their partner. This setting enables referring analysis¹ in a collaborative task.

Education. Education is a domain where collaboration has been extensively studied in the context of collaborative learning. Due to privacy concerns, a significant number of corpora under this task setting are not public (Schneider and Bryant, 2022; Lämsä et al., 2021; Olsen et al., 2020; Spikol et al., 2017; Salinas et al., 2021). Recent corpora in collaborative learning settings often involve large-group interactions, making the capture of multimodal data more difficult.

Among the accessible resources, the Discussion Tracker corpus (Olshefski et al., 2020) captures classroom teacher-student interactions, focusing on collaborative argumentation in literature discussions. The SRI corpus (Richey et al., 2016) exemplifies collaborative problem solving, with triadic groups solving math problems, capturing both social and cognitive dimensions mainly from audio. The only corpus we found with video data and a collaborative learning task setting is the RoomReader corpus (Reverdy et al., 2022). It is based on online computer-supported student-tutor conversations and can be used to analyze engagement in collaboration and conversational dynamics.
¹ "John didn't come to class because he was sick." Referring analysis studies what the word "he" refers to, tracking who or what is being talked about, a key part of collaborative conversation.

| Corpus (year) | Lang. | Hours | Task type | Group size | Audio | Video | Transc. | Sensors | Annotation object | Object level | Annotator | Granularity | Dataset |
| AMI (2005) | EN | 100 | meeting | 4 | ✓ | ✓ | ✓ | ✗ | cohesion | group | external | segment | link |
| Rovereto (2012) | IT | 4.67 | game | 4 | ✓ | ✓ | ✓ | ✓ | cooperativeness | individual | external | segment | on request |
| SRI (2016) | EN | 26.6 | education | 3 | ✓ | ✗ | ✓ | ✗ | I Code, Q Code | both | external | segment | link |
| Teams (2016) | EN | 47 | game | 3-4 | ✓ | ✓ | ✓ | ✗ | participant's collective orientation; group cohesion, satisfaction, potency | both | self | task | link |
| MISC (2017) | EN | 42 | game | 2 | ✓ | ✓ | auto | ✗ | perceived collaboration (help/understanding/communication) | individual | external | task | link |
| MULTISIMO (2018) | EN | 4 | game | 3 | ✓ | ✓ | ✓ | ✗ | collaboration quality | group | external | task | link |
| GAP (2018) | EN | 4 | meeting | 2-4 | ✓ | ✗ | ✓ | ✗ | teamwork experience | group | self | task | link |
| PhotoBook (2019) | EN | - | game | 2 | ✗ | ✗ | ✓ | ✗ | collaboration performance | group | self | task | link |
| GAME-ON (2020) | IT | 11 | game | 3 | ✓ | ✓ | ✗ | ✓ | cohesion | group | self | task | link |
| Discussion Tracker (2020) | EN | - | education | 15 | ✗ | ✗ | ✓ | ✗ | collaborative argumentation | individual | external | segment | link |
| PCC (2021) | EN | 2 | other | 3-4 | ✓ | ✓ | ✗ | ✗ | cohesion | group | external | segment | on request |
| RoomReader (2022) | EN | 38 | education | 4-5 | ✓ | ✓ | ✓ | ✗ | engagement, cohesion | both | both | segment | link |
| MULTICOLLAB (2023) | EN | 3 | other | 2 | ✓ | ✓ | ✓ | ✓ | extreme emotion (frustration) | individual | self | both | to be released |

Notes: The transcripts in MISC are auto-generated, with no mention of a manual verification process. MULTICOLLAB is currently not public but is to be made available in the future. The video, transcription, and questionnaire data of Teams will be available in a future release.
Table 2: Overview of collaboration corpora, arranged chronologically to highlight the evolution of research focus in CollA. Comparative assessment across dimensions: language, recording size, task type, group size, multimodal data availability, and collaboration annotation (annotation object, object level, annotator type, and temporal granularity).

Meetings and Others. Scenarios adapted from real-world tasks are frequently used for CollA. A part of the AMI corpus (Kraaij et al., 2005) elicits collaboration using a role-playing functional meeting within a 4-person design team developing a new product protocol. The GAP corpus (Braley and Murray, 2018) also applies a meeting setting, in which a small group must rank the most important items to keep after a plane crash. The PCC corpus (Kantharaju and Pelachaud, 2021) simulates health consultations between patients and healthcare professionals, equipping the professionals with detailed background information to study cohesion.

The MULTICOLLAB corpus (Peechatt et al., 2024) adopts a role-playing setting, i.e., instructor and builder, for a block-building task in which some builders are instructed to deliberately disobey, stimulating critical moments in the collaboration process with strong interdependency. This task setting yields half of the data from non-collaborative builders.

Main Trends of CollA Task Settings. Common task settings for CollA include role-play and information asymmetry, which promote interdependence among participants and foster the observation of active joint contributions during the collaboration process. Co-located collaboration corpora with video recordings, especially with both individual- and room-level recordings, are interesting for studying interpersonal and group dynamic aspects of CollA (Kraaij et al., 2005; Litman et al., 2016; Koutsombogera and Vogel, 2018; Maman et al.
, 2020) and have attracted more attention in recent CollA. Remote meetings can also provide a view of group synchrony (e.g., RoomReader (Reverdy et al., 2022)), although it is recognized that remote settings may inhibit natural interaction (Poel et al., 2008). We also observe a trend toward analyzing group collaboration phenomena using segment-level annotations, rather than the individual collaborativeness targeted in earlier CollA studies.

4. CollA Features

This section synthesizes experimentally supported features extracted from text, audio, video, sensor signals², and cross-modal sources for CollA. We narrow the discussion to features extracted from our selection of conversation corpora presented in Section 3, arguing that task-oriented corpora built with particular settings (e.g., interdependency, information asymmetry between participants) that foster collaboration are better suited for identifying CollA features. We include both features that have shown significant associations with collaboration quality and features used in individual- and group-level collaboration modeling.

We also cover different feature-generation methods (e.g., natural language processing, signal processing, questionnaires) and how these features are exploited (e.g., used directly in modeling or incorporated into high-level construct building) to make the discussion more practical. An overview of features per modality is available in Table 3.

4.1. Text-Based Features

Text-based features drawn from lexical, syntactic, and semantic properties have been studied for CollA. The entrainment of pronoun usage (e.g., of

² We find the following sensor signals applied for CollA: electrocardiogram, electrodermal signals, galvanic skin response, photoplethysmography, and body motion.
Text (4.1): pronoun usage (Enayet and Sukthankar, 2021a); lexical entrainment (Rahimi and Litman, 2020); paralinguistic mimicry (Nanninga et al., 2017); dialogue act Be-Positive (Kantharaju et al., 2020); syntactic entrainment embedding (Enayet and Sukthankar, 2021a); DACT sequence embedding and sentiment embedding (Enayet and Sukthankar, 2023a, 2021a); SUBTL score, dependency parse features, and word psycholinguistic scores (Murray and Oertel, 2018); lexical cohesion (Rahimi and Litman, 2020); word embedding (Li et al., 2024); BERT embedding (Fenech et al., 2022); other applications underexplored.

Audio (4.2): intensity, frequency, shimmer, jitter (Peechatt et al., 2024; Litman et al., 2016); turn-taking (Sabry et al., 2021; Sassier-Roublin et al., 2025); laughter and backchannels (Kantharaju and Pelachaud, 2021); total overlapping and pause time (Hung and Gatica-Perez, 2010); audio embedding (Li et al., 2024); total speaking time, pitch, jitter, loudness (Peechatt et al., 2024; Litman et al., 2016; Sabry et al., 2021); shimmer, harmonics-to-noise ratio (Sabry et al., 2021); eGeMAPS features (Fenech et al., 2022); other applications underexplored.

Video (4.3): mutual gaze (Kantharaju and Pelachaud, 2021); automatically extracted facial expressions and head nod duration (Kantharaju et al., 2020); saccade peak velocity (Peechatt et al., 2024); focus of attention without mutual engagement (Sabry et al., 2021); facial action units from OpenFace (Fenech et al., 2022); other applications underexplored.

Sensor (4.3): group and individual proxemics and kinesics features (Sabry et al., 2021); bodily motion energy synchrony (Kantharaju and Pelachaud, 2021); galvanic skin response (GSR) (Peechatt et al., 2024); traveled distance, kinetic energy, posture expansion, amount of walking, and hand gestures (Sabry et al., 2021); other applications underexplored.

Cross-modality (4.4): representation-based leadership (Sabry et al., 2021); interpersonal synchrony (Sassier-Roublin et al., 2025); mutual gaze instances during interruption (Kantharaju et al., 2020); MUMIN coding of social cues (Kantharaju and Pelachaud, 2021); ratio between successful interruptions and speaking turns (Sabry et al., 2021); other applications underexplored.

Table 3: Features categorized by modality and by their application in group-level (group cohesion and entrainment; teamwork process and performance; engagement) and individual-level (collaboration behaviors; perceived personality and role) collaboration analysis, highlighting current trends and potential underexplored areas in CollA research. Detailed discussions of each modality are referenced in their respective sections.

singular/plural pronoun usage, 1st/2nd/3rd-person pronoun usage) has been applied in team performance level classification (Enayet and Sukthankar, 2021a). Discourse markers (e.g., "okay", "but", "because") signal the communicative function of a phrase (e.g., agreement, disagreement), and they can be used as individual-level collaborative behavior features (Koutsombogera and Vogel, 2018).

Both lexical and syntactic entrainment, describing how team members adopt similar speaking styles during conversation, have been studied: syntactic entrainment, calculated using automatic part-of-speech tagging, has been shown to be an effective predictor of team performance, but is expected to be effective only in the late stages of collaboration (Enayet and Sukthankar, 2021a). Lexical entrainment of function words, based on LIWC-derived categories of function words (Pennebaker et al., 2001), has been used to identify influencers, connectors, and passive members in multiparty collaboration (Rahimi and Litman, 2020).

BERT-based pretrained models (Devlin et al., 2019; Liu et al., 2019; He et al.
, 2020) can be used for text embedding generation in conversation, mapping high-dimensional spaces to lower dimensions while retaining only the most effective representations as sparse vectors. This approach has been employed for feature generation in engagement modeling (Li et al., 2024) and conflict modeling (Enayet and Sukthankar, 2023a).

4.2. Audio

The audio of collaborators' speech contains many useful features for CollA, such as intensity, frequency, and their variations (e.g., shimmer, jitter). They can be used to measure speaking energy, pitch, voice quality, and excitement, which have been found to be positively correlated with both extreme emotions such as frustration (Peechatt et al., 2024) and group cohesion (Litman et al., 2016), and often require cross-modality verification. For frustration identification, voice features (e.g., F0, intensity) can be more salient than visual features such as chin raises and brow furrows (Peechatt et al., 2024). Audio features can also be extracted automatically using tools such as OpenSMILE (Eyben et al., 2010) and pretrained wav2vec (Schneider et al., 2019); this has been applied to student engagement prediction (Li et al., 2024).

4.3. Video and Sensor Signals

Eye Gaze. Eye gaze refers to the direction and movement of a person's eyes, often used in communication to signal attention, engagement, and turn-taking (Kraaij et al., 2005). It helps regulate conversational flow; for example, it has been found that speakers avert their gaze to signal turn initiation and re-establish eye contact to yield the floor (Hung and Gatica-Perez, 2010). Mutual gaze, indicating shared attention, is found to be related to social cohesion (Kantharaju and Pelachaud, 2021).
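Gaze features of this kind are typically derived from frame-level annotations of who each participant is looking at. As a minimal sketch (the annotation format, participant ids, and frame rate below are illustrative assumptions, not the scheme of any corpus reviewed here), pairwise mutual-gaze time can be computed as:

```python
# Minimal sketch: pairwise mutual-gaze time from per-frame gaze-target labels.
# The label format (one gaze target per participant per frame) is an
# illustrative assumption, not the annotation scheme of any corpus above.

def mutual_gaze_seconds(gaze, a, b, fps=25.0):
    """Seconds during which participants `a` and `b` look at each other.

    `gaze` maps a participant id to a list of per-frame gaze targets:
    another participant id, or None when gazing elsewhere.
    """
    mutual_frames = sum(
        1
        for target_a, target_b in zip(gaze[a], gaze[b])
        if target_a == b and target_b == a  # a looks at b while b looks at a
    )
    return mutual_frames / fps

# Toy dyad annotated over 6 frames, at 2 fps for readability.
gaze = {
    "P1": ["P2", "P2", None, "P2", "P2", None],
    "P2": ["P1", None, None, "P1", "P1", "P1"],
}
print(mutual_gaze_seconds(gaze, "P1", "P2", fps=2.0))  # 3 mutual frames -> 1.5
```

Normalizing such durations by segment length, or counting mutual-gaze instances that co-occur with interruptions, yields features of the kind used in the cohesion studies cited above.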
Additionally, metrics such as saccade peak velocity, which can be measured via eye-tracking sensors, have been shown to indicate frustration during collaboration (Peechatt et al., 2024).

Facial Expression. Perceived facial expressions provide insights into participants' emotional states and can contribute to the functional meanings of human interactions. For example, the lip corner puller, automatically extracted using OpenFace (Amos et al., 2016), is observed as a more frequent and longer action in high-cohesive segments than in low-cohesive segments in small group meetings, while there are no significant differences for the outer brow raiser and brow lowerer (Kantharaju et al., 2020). Kantharaju and Pelachaud (2021) conduct a comprehensive facial action unit study to examine its correlation with high- and low-cohesive segments in the collaboration process.

Head, Hand, Body Motion and Sensor Signals. Interlocutors synchronize in overall body movement. Bodily motion energy synchrony, a simplified version of interpersonal synchrony, is frequently observed in highly cohesive teams (Kantharaju and Pelachaud, 2021). Sabry et al. (2021) compute proxemics (e.g., interpersonal distance) and kinesics (e.g., amount of walking, energy synchrony) features from motion capture to model emergent leadership and group cohesion. Biophysical signals, such as Galvanic Skin Response (GSR), provide physiological indicators of states such as frustration during collaboration (Peechatt et al., 2024).

4.4. Dialogic and Cross-Modality

Dialogue act (DACT) annotation codes the speaker's intention. Kantharaju et al. (2020) study 15 DACTs following the coding scheme used in the AMI corpus [3] and find the "Be-Positive" DACT to be highly related to group cohesion. However, as far as we know, no other DACTs have been experimentally proven to be salient in CollA.
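A minimal sketch of the kind of analysis behind such findings: comparing the relative frequency of each dialogue act across cohesion classes. The act names loosely follow AMI-style labels, but the segment data here is entirely invented for illustration:

```python
from collections import Counter

# Hypothetical (dialogue_act, cohesion_label) pairs for annotated segments.
segments = [
    ("be-positive", "high"), ("inform", "high"), ("be-positive", "high"),
    ("inform", "low"), ("elicit-inform", "low"), ("be-positive", "low"),
    ("inform", "high"), ("be-positive", "high"),
]

def dact_rates(segments):
    """Relative frequency of each dialogue act within each cohesion class."""
    totals = Counter(label for _, label in segments)      # segments per class
    pair_counts = Counter(segments)                       # (act, class) counts
    return {
        (act, label): count / totals[label]
        for (act, label), count in pair_counts.items()
    }

rates = dact_rates(segments)
# In this toy data, "be-positive" is relatively more frequent
# in high-cohesion segments than in low-cohesion ones.
print(round(rates[("be-positive", "high")], 2),
      round(rates[("be-positive", "low")], 2))  # -> 0.6 0.33
```

On real corpora, such per-class rates would of course be accompanied by a significance test before claiming that a DACT is salient.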
Turn-taking management can reveal both participant dominance and disengagement (Koutsombogera and Vogel, 2018), but requires further validation from other modalities to determine the specific functions of turns. The total pause time during the collaborative conversation can reflect participants' attentiveness and is consistently high in highly cohesive meetings (Hung and Gatica-Perez, 2010).

Cross-modality inter-speaker features are intuitively more grounded than unimodal individual features for group analysis. However, for hand-crafted cross-modality features, an early study by Hung and Gatica-Perez (2010) shows that silent motion (i.e., visual activity when a person is not talking) is a relatively salient feature in cohesion estimation but does not outperform the total pause time between individual turns, which is an audio-only feature. Audiovisual synchrony, either for the same person (e.g., body gestures aligned with speech) or across group members (e.g., interlocutors aligning with one another in both motion and prosody), can be a good indicator of rapport and comfort (Hung and Gatica-Perez, 2010).

[3] https://groups.inf.ed.ac.uk/ami/corpus/Guidelines/dialogue_acts_manual_1.0.pdf

Both the number of interruptions and the number of mutual gaze instances occurring during interruption are positively correlated with group cohesion (Kantharaju et al., 2020). Overlapping speech, combined with visual expressions and prosodic energy, helps identify dominance and task cohesion (Hung and Gatica-Perez, 2010). Shared laughter is observed more frequently and can also last longer in high-cohesion situations (Kantharaju and Pelachaud, 2021).

4.5. Main Trends of CollA Features

Many features have been shown to be statistically significant and are used in collaboration modeling, from collected individual low-level cues to assembled group-level constructs (e.g.
, entrainment, convergence).

To better capture group-level collaborative phenomena, some studies have explored feature-level fusion across modalities to obtain more grounded features (Hung and Gatica-Perez, 2010; Kantharaju et al., 2020; Kantharaju and Pelachaud, 2021). However, cross-modality features remain underexplored relative to single-modality features in CollA.

We also want to highlight the use of pretrained models for feature embedding in CollA studies. This approach has become common in recent CollA work, as it enables cross-corpora generalization of results. However, its effectiveness depends heavily on the robustness and stability of the underlying pretrained models. When applying such models for feature extraction, careful consideration must be given to the alignment between the pretraining data and the intended modeling objectives. For instance, studies have shown that facial action units automatically extracted by pretrained models performed poorly in classifying self-assessed frustration during collaboration (Peechatt et al., 2024), highlighting the importance of validating pretrained feature extractors for specific CollA tasks.

5. Models

As discussed in Section 3, existing CollA corpora can include both task-level collaboration annotations and segment-level annotations. In this section, we discuss existing collaboration models based on segment-level and task-level annotations, comparing them across their modeling objects, modeling approaches, feature designs, and modality fusion. The objective is to understand the possible reasons behind different choices of analysis granularity in the CollA research community and to identify valuable future directions. We discuss CollA studies that use at least one of the corpora listed in Section 3.

5.1.
Predicting Task-Level Annotations

Task-level collaboration annotations have previously been used to model team collaboration quality (Litman et al., 2016; Haber et al., 2019), individual collaboration effort (McDuff et al., 2017), and group cohesion (Maman et al., 2020). These evaluations are suitable for correlation studies between collaboration and task-level, temporal-difference CollA features, such as entrainment (Rahimi and Litman, 2020; Paletz et al., 2023), convergence (Rahimi and Litman, 2018), and dominance (Vogel et al., 2023), that change over time during the collaboration process. We also observe that when modeling with task-level annotations, features are typically aggregated across the entire session (e.g., average, standard deviation, min/max value, distribution (Walocha et al., 2020)). Multimodal embeddings of conversations generated by pretrained models have also been tested in the prediction of task-level collaboration (Enayet and Sukthankar, 2021a, 2023a; Rahimi and Litman, 2020).

For perceived collaboration quality, scaling data for supervised learning is relatively easier, since task-level evaluations of perceived dimensions can be added to existing task-oriented corpora, enabling model training on combined corpora. We observe that deep-learning approaches (e.g., LSTMs (Enayet and Sukthankar, 2023b,a) for conflict prediction and multimodal transformers (Fenech et al., 2022) for personality modeling) are more common in studies using combined, relatively large corpora with task-level evaluations.

5.2.
Predicting Segment-Level Annotations

Most segment-level annotations are not directly for group-level collaboration quality modeling, but rather for modeling group-level emergent collaboration phenomena (Kantharaju et al., 2020; Peechatt et al., 2024). Due to the context-dependent nature of collaboration, existing segment-level direct collaboration annotations can be too scenario-specific to be used in further studies of different collaborative situations (Richey et al., 2016). Collaboration-related aspects with broader applicability to human interaction (e.g., engagement (Reverdy et al., 2022), cohesion (Kantharaju et al., 2020), and frustration (Peechatt et al., 2024)) have therefore attracted attention from the research community.

Segment-level annotations enable research on emergent group behavior and phenomena by leveraging low-level cues and sequential temporal features that capture their context-dependent nature. However, collaboration modeling based on these segment-level annotations still faces several challenges. First, the relatively small, often imbalanced datasets limit the choice of supervised models to classical classifiers such as support vector machines and logistic regression (Enayet and Sukthankar, 2021a; Kantharaju and Pelachaud, 2021). Second, automatically extracted multimodal features can have alignment issues, and it is challenging to create a common representation space that preserves cross-modal relationships without losing modality-specific nuances.

Given data scarcity, solutions have been explored at different levels: at the data level, corpus combination (Enayet and Sukthankar, 2023b,a) and synonym replacement in conversational text (Enayet and Sukthankar, 2023a); at the feature level, feature oversampling (Corbellini et al., 2023).
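To make the data-level option concrete, here is a minimal sketch of synonym-replacement augmentation for conversational text. The synonym table is a toy, hand-picked dictionary of our own; a real implementation (this is not Enayet and Sukthankar's code) would draw on a lexical resource such as WordNet:

```python
import random

# Toy synonym table, invented for illustration only.
SYNONYMS = {
    "maybe": ["perhaps", "possibly"],
    "big": ["large", "huge"],
    "start": ["begin", "initiate"],
}

def augment(utterance, p=0.5, rng=None):
    """Replace each word found in SYNONYMS with a random synonym
    with probability p; other words are kept unchanged."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    out = []
    for tok in utterance.split():
        alts = SYNONYMS.get(tok.lower())
        out.append(rng.choice(alts) if alts and rng.random() < p else tok)
    return " ".join(out)

print(augment("maybe we start with the big map"))
```

Because each utterance is rewritten independently, this kind of augmentation changes each speaker's word choices separately, which is relevant to the caveat discussed next.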
These methods require careful adaptation to the modeling object to avoid introducing bias: for example, data augmentation with synonym replacement can erase lexical entrainment in collaboration dynamics. At the model level, graph-based neural network (GNN) methods have achieved state-of-the-art performance on the RoomReader corpus, which contains only 8 hours of recordings (Li et al., 2024), and have remained effective for social interaction classification across several corpora with different task settings (Corbellini et al., 2023). We therefore expect more studies on the potential of GNNs and other graph-learning methods in CollA.

5.3. State-of-the-Art Performance

Direct comparison of results remains challenging due to variations in collaboration tasks, group sizes, contextual settings, analytical approaches, and evaluation metrics. Model performance in collaboration analysis depends strongly on task formulation and label structure. For example, binary classification tasks, such as high/low group cohesion or high/low arousal, tend to yield higher accuracy, reaching up to 78% on AMI and PCC (Kantharaju and Pelachaud, 2021). Imbalanced multiclass problems are more challenging: in four-class group-level Q-code classification, average unweighted accuracy drops to around 49% on the relatively small SRI corpus (Bassiou et al., 2016). Similarly, tasks with well-defined and balanced classes, such as binary engagement detection, can exceed 90% accuracy with multimodal features (Li et al., 2024). Overall, deep-learning approaches outperform classical methods in both performance and scalability for multimodal modeling, especially on larger or combined datasets.
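Since several of the results above are reported as unweighted average accuracy, a short sketch of that metric (mean of per-class recalls, so every class counts equally regardless of its size; the toy labels are ours, not the cited corpora):

```python
from collections import defaultdict

def unweighted_average_accuracy(y_true, y_pred):
    """Mean of per-class recalls. Unlike plain accuracy, a majority-class
    predictor scores poorly on imbalanced label sets under this metric."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Imbalanced toy labels: always predicting the majority class "0" yields
# 75% plain accuracy but only 50% unweighted average accuracy.
y_true = ["0", "0", "0", "1"]
y_pred = ["0", "0", "0", "0"]
print(unweighted_average_accuracy(y_true, y_pred))  # -> 0.5
```

This is why unweighted averages are the more informative number to report for imbalanced multiclass CollA problems such as the four-class Q-code setting.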
For conflict prediction, Rahimi and Litman (2018) achieve 67.74% accuracy using an SVM, while Enayet and Sukthankar (2021b) use an LSTM for feature engineering and achieve 73.33% accuracy on the GitHub Issue Dataset [4], with training performed on the Teams corpus. Late fusion strategies, which combine modality-specific embeddings at the decision level, lose modality interactions but have been adopted in recent studies as a trade-off to enable training on several corpora. These results underscore the importance of both task formulation and model architecture in advancing automatic collaboration analysis.

6. Conclusion and Future Direction

This survey provides an overview of recent advances in the analysis of collaboration through human-human conversational corpora. We discuss coding schemes for individual- and group-level collaboration annotation, different task settings for building CollA corpora, salient multimodal features, and modeling approaches at different granularities. We highlight the evolution from classical statistical methods to deep learning and large language models, as well as the growing integration of multimodality to provide a more nuanced understanding of collaborative processes.

Several directions for future research on collaboration analysis emerge. First, deeper analysis and modeling of individual collaboration strategies using linguistic frameworks would enhance our understanding of collaboration dynamics (Haber et al., 2019). Second, more public multimodal corpora for CollA are needed, especially with recording settings that enable group-level multimodal feature capture, such as room-level video (Koutsombogera and Vogel, 2018) or the wearable sociometric badges used in the TeamSense corpus [5] (Zhang et al., 2018). Zero/few-shot and in-context learning approaches have been applied in many recent studies for the CollA of human-machine collaboration systems.
These studies often choose an aspect of human collaboration evaluation and aim to align human-machine or machine-only collaborative conversations with human-level collaboration, while human conversations are inevitably downgraded in their modality diversity to be comparable. To the best of our knowledge, the use of these methods for CollA within multimodal human conversational data is understudied.

[4] https://github.com/ayeshaEnayet/DAC-USE
[5] TeamSense is not a public corpus and is not included in the scope of this review.

A key insight is that collaboration analysis is inherently task-oriented and interpersonal, which is driving its growing application in human-centered human-machine collaboration systems. Automatic analysis of collaboration through conversational data is a rapidly evolving field. Continued interdisciplinary efforts, combining linguistics, computer science, psychology, and education, will be essential to address challenges and unlock the full potential of collaborative technologies in both research and real-world applications.

7. Acknowledgements

Comments from the reviewers are greatly appreciated. Special thanks also go to Aina Garí Soler and Elodie Etienne for their comments. This work was funded by the ANR-23-CE23-0033-01 SINNet project, and additional support was provided by the ANR under the France 2030 program PRAIRIE (ANR-23-IACL-0008).

8. Bibliographical References

Brandon Amos, Bartosz Ludwiczuk, Mahadev Satyanarayanan, et al. 2016. OpenFace: A General-Purpose Face Recognition Library with Mobile Applications. CMU School of Computer Science, 6(2):20.

Nikoletta Bassiou, Andreas Tsiartas, Jennifer Smith, Harry Bratt, Colleen Richey, Elizabeth Shriberg, Cynthia M D'Angelo, and Nonye Alozie. 2016. Privacy-Preserving Speech Analytics for Automatic Assessment of Student Collaboration. In Interspeech, pages 888–892.

McKenzie Braley and Gabriel Murray. 2018.
The Group Affect and Performance (GAP) Corpus. In Proceedings of the Group Interaction Frontiers in Technology, GIFT'18. Association for Computing Machinery.

Michael T Braun, Steve WJ Kozlowski, Tara A Brown, and Richard P DeShon. 2020. Exploring the Dynamic Team Cohesion–Performance and Coordination–Performance Relationships of Newly Formed Teams. Small Group Research, 51(5):551–580.

Albert V Carron and Lawrence R Brawley. 2000. Cohesion: Conceptual and Measurement Issues. Small Group Research, 31(1):89–106.

Milly Casey-Campbell and Martin L Martens. 2009. Sticking It All Together: A Critical Assessment of the Group Cohesion–Performance Literature. International Journal of Management Reviews, 11(2):223–246.

Federica Cavicchio and Massimo Poesio. 2012. The Rovereto Emotion and Cooperation Corpus: A New Resource to Investigate Cooperation and Emotions. Language Resources and Evaluation, 46(1):117–130.

Yuxin Chen, Christopher D Andrews, Cindy E Hmelo-Silver, and Cynthia D'Angelo. 2020. Coding Schemes as Lenses on Collaborative Learning. Information and Learning Sciences, 121(1-2):1–18.

Herbert H Clark. 1996. Using Language. Cambridge University Press.

Herbert H Clark and Deanna Wilkes-Gibbs. 1986. Referring as a Collaborative Process. Cognition, 22(1):1–39.

Nicola Corbellini, Jhony H Giraldo, Giovanna Varni, and Gualtiero Volpe. 2023. Few Labels are Enough! Semi-Supervised Graph Learning for Social Interaction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3060–3068.

Bethan L Davies. 1997. Empirical Examination of Cooperation, Effort and Risk in Task-Oriented Dialogue. Annexe Thesis Digitisation Project 2016 Block 5.

Bethan L Davies. 2007. Grice's Cooperative Principle: Meaning and Rationality. Journal of Pragmatics, 39(12):2308–2331.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

James E Driskell, Eduardo Salas, and Sandra Hughes. 2010. Collective Orientation and Team Performance: Development of an Individual Differences Measure. Human Factors, 52(2):316–328.

Ayesha Enayet and Gita Sukthankar. 2021a. Analyzing Team Performance with Embeddings from Multiparty Dialogues. In 2021 IEEE 15th International Conference on Semantic Computing (ICSC), pages 33–39. IEEE.

Ayesha Enayet and Gita Sukthankar. 2021b. Learning a Generalizable Model of Team Conflict from Multiparty Dialogues. International Journal of Semantic Computing, 15(04):441–460.

Ayesha Enayet and Gita Sukthankar. 2023a. A Proactive and Generalizable Conflict Prediction Model. In 2023 IEEE 17th International Conference on Semantic Computing (ICSC), pages 216–220. IEEE.

Ayesha Enayet and Gita Sukthankar. 2023b. Improving the Generalizability of Collaborative Dialogue Analysis With Multi-Feature Embeddings. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3551–3565, Dubrovnik, Croatia. Association for Computational Linguistics.

Randi A Engle and Faith R Conant. 2002. Guiding Principles for Fostering Productive Disciplinary Engagement: Explaining an Emergent Argument in a Community of Learners Classroom. Cognition and Instruction, 20(4):399–483.

Florian Eyben, Martin Wöllmer, and Björn Schuller. 2010. openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor. In Proceedings of the 18th ACM International Conference on Multimedia, pages 1459–1462.
Kristian Fenech, Ádám Fodor, Sean P Bergeron, Rachid R Saboundji, Catharine Oertel, and András Lőrincz. 2022. Perceived Personality State Estimation in Dyadic and Small Group Interaction with Deep Learning Methods. arXiv preprint arXiv:2211.04979.

Patricia Goldberg, Ömer Sümer, Kathleen Stürmer, Wolfgang Wagner, Richard Göllner, Peter Gerjets, Enkelejda Kasneci, and Ulrich Trautwein. 2021. Attentive or Not? Toward a Machine Learning Approach to Assessing Students' Visible Engagement in Classroom Instruction. Educational Psychology Review, 33(1):27–49.

Herbert Paul Grice. 1975. Logic and Conversation. Syntax and Semantics, 3:43–58.

Barbara J Grosz and Candace L Sidner. 1986. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12(3):175–204.

Shengyue Guan, Haoyi Xiong, Jindong Wang, Jiang Bian, Bin Zhu, and Jian-guang Lou. 2025. Evaluating LLM-Based Agents for Multi-Turn Conversations: A Survey. arXiv preprint arXiv:2503.22458.

Ao Guo, Atsumoto Ohashi, Ryu Hirai, Yuya Chiba, Yuiko Tsunomori, and Ryuichiro Higashinaka. 2025. User Personality and Its Influence on the Performance of Pipeline and End-to-End Task-Oriented Dialogue Systems. Scientific Reports, 15(1):20745.

Janosch Haber, Tim Baumgärtner, Ece Takmaz, Lieke Gelderloos, Elia Bruni, and Raquel Fernández. 2019. The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1895–1910, Florence, Italy. Association for Computational Linguistics.

Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. 2020. DeBERTa: Decoding-enhanced BERT with Disentangled Attention. arXiv preprint arXiv:2006.03654.

Peter A Heeman and Graeme Hirst. 1995. Collaborating on Referring Expressions. arXiv preprint cmp-lg/9504003.

Hayley Hung and Daniel Gatica-Perez. 2010.
Estimating Cohesion in Small Groups Using Audiovisual Nonverbal Behavior. IEEE Transactions on Multimedia, 12(6):563–575.

Patrícia Jaques, Adja Andrade, João Jung, Rafael Bordini, and Rosa Vicari. 2023. Using Pedagogical Agents to Support Collaborative Distance Learning. In Computer Support for Collaborative Learning, pages 546–547. Routledge.

David W Johnson and Roger T Johnson. 2009. An Educational Psychology Success Story: Social Interdependence Theory and Cooperative Learning. Educational Researcher, 38(5):365–379.

David W Johnson and Roger T Johnson. 2013. Cooperation and the Use of Technology. In Handbook of Research on Educational Communications and Technology, pages 777–803. Routledge.

Reshmashree Bangalore Kantharaju, Caroline Langlet, Mukesh Barange, Chloé Clavel, and Catherine Pelachaud. 2020. Multimodal Analysis of Cohesion in Multi-Party Interactions. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 498–507.

Reshmashree Bangalore Kantharaju and Catherine Pelachaud. 2021. Social Signals of Cohesion in Multi-Party Interactions. In Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents, pages 9–16.

Maria Koutsombogera and Carl Vogel. 2018. Modeling Collaborative Multimodal Behavior in Group Dialogues: The MULTISIMO Corpus. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).

Wessel Kraaij, Thomas Hain, Mike Lincoln, and Wilfried Post. 2005. The AMI Meeting Corpus. In Proc. International Conference on Methods and Techniques in Behavioral Research, pages 1–4.

Ming Li, Xiaosheng Zhuang, Lu Bai, and Weiping Ding. 2024. Multimodal Graph Learning Based on 3D Haar Semi-Tight Framelet for Student Engagement Prediction. Information Fusion, 105:102224.

Diane Litman, Susannah Paletz, Zahra Rahimi, Stefani Allegretti, and Caitlin Rice. 2016.
The Teams Corpus and Entrainment in Multi-party Spoken Dialogues. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1421–1431.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.

Nichola Lubold and Heather Pon-Barry. 2014. Acoustic-Prosodic Entrainment and Rapport in Collaborative Learning Dialogues. In Proceedings of the 2014 ACM Workshop on Multimodal Learning Analytics Workshop and Grand Challenge, pages 5–12.

Joni Lämsä, Pablo Uribe, Abelino Jiménez, Daniela Caballero, Raija H. Hämäläinen, and R. Araya. 2021. Deep Networks for Collaboration Analytics: Promoting Automatic Analysis of Face-to-Face Interaction in the Context of Inquiry-Based Learning.

Lucien Maman, Eleonora Ceccaldi, Nale Lehmann-Willenbrock, Laurence Likforman-Sulem, Mohamed Chetouani, Gualtiero Volpe, and Giovanna Varni. 2020. Game-On: A Multimodal Dataset for Cohesion and Group Analysis. IEEE Access, 8:124185–124203.

Roberto Martinez-Maldonado, Dragan Gašević, Vanessa Echeverria, Gloria Fernandez Nieto, Zachari Swiecki, and Simon Buckingham Shum. 2021. What Do You Mean by Collaboration Analytics? A Conceptual Model. Journal of Learning Analytics, 8(1):126–153.

Daniel McDuff, Paul Thomas, Mary Czerwinski, and Nick Craswell. 2017. Multimodal Analysis of Vocal Collaborative Search: A Public Corpus and Results. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 456–463.

Gabriel Murray and Catharine Oertel. 2018. Predicting Group Performance in Task-based Interaction. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, pages 14–20.
Marjolein C Nanninga, Yanxia Zhang, Nale Lehmann-Willenbrock, Zoltán Szlávik, and Hayley Hung. 2017. Estimating Verbal Expressions of Task and Social Cohesion in Meetings by Quantifying Paralinguistic Mimicry. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 206–215.

Heather L O'Brien and Elaine G Toms. 2010. The Development and Evaluation of a Survey to Measure User Engagement. Journal of the American Society for Information Science and Technology, 61(1):50–69.

Jennifer K Olsen, Kshitij Sharma, Nikol Rummel, and Vincent Aleven. 2020. Temporal Analysis of Multimodal Data to Predict Collaborative Learning Outcomes. British Journal of Educational Technology, 51(5):1527–1547.

Christopher Olshefski, Luca Lugini, Ravneet Singh, Diane Litman, and Amanda Godley. 2020. The Discussion Tracker Corpus of Collaborative Argumentation. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC'20).

Susannah BF Paletz, Diane Litman, Valerie Karuzis, Kelly M Jones, and Zahra Rahimi. 2023. Speaking Similarly: Team Personality Composition and Acoustic-Prosodic Entrainment. Small Group Research, 54(6):860–898.

Michael Peechatt, Cecilia Ovesdotter Alm, and Reynold Bailey. 2024. MULTICOLLAB: A Multimodal Corpus of Dialogues for Analyzing Collaboration and Frustration in Language. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11713–11722.

James W Pennebaker, Martha E Francis, Roger J Booth, et al. 2001. Linguistic Inquiry and Word Count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates, 71(2001):2001.

Mannes Poel, Ronald Poppe, and Anton Nijholt. 2008. Meeting Behavior Detection in Smart Environments: Nonverbal Cues that Help to Obtain Natural Interaction.
In 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, pages 1–6. IEEE.

Sambit Praharaj, Maren Scheffel, Hendrik Drachsler, and Marcus Specht. 2021. Literature Review on Co-located Collaboration Modeling Using Multimodal Learning Analytics—Can We Go the Whole Nine Yards? IEEE Transactions on Learning Technologies, 14(3):367–385.

Elisa Prati, Valeria Villani, Fabio Grandi, Margherita Peruzzini, and Lorenzo Sabattini. 2021. Use of Interaction Design Methodologies for Human-Robot Collaboration in Industrial Scenarios. IEEE Transactions on Automation Science and Engineering, 19(4):3126–3138.

Zahra Rahimi and Diane Litman. 2018. Weighting Model Based on Group Dynamics to Measure Convergence in Multi-party Dialogue. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pages 385–390, Melbourne, Australia. Association for Computational Linguistics.

Zahra Rahimi and Diane Litman. 2020. Entrainment2vec: Embedding Entrainment for Multi-Party Dialogues. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8681–8688.

Nils Randrup, Douglas Druckenmiller, and Robert O Briggs. 2016. Philosophy of Collaboration. In 2016 49th Hawaii International Conference on System Sciences (HICSS), pages 898–907. IEEE.

Justine Reverdy, Sam O'Connor Russell, Louise Duquenne, Diego Garaialde, Benjamin R Cowan, and Naomi Harte. 2022. RoomReader: A Multimodal Corpus of Online Multiparty Conversational Interactions. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2517–2527.

Colleen Richey, Cynthia M D'Angelo, Nonye Alozie, Harry Bratt, and Elizabeth Shriberg. 2016. The SRI Speech-Based Collaborative Learning Corpus. In INTERSPEECH, pages 1550–1554.

Soumaya Sabry, Lucien Maman, and Giovanna Varni. 2021.
An Exploratory Computational Study on the Effect of Emergent Leadership on Social and Task Cohesion. In Companion Publication of the 2021 International Conference on Multimodal Interaction, pages 263–272.

Omar Salinas, Fabian Riquelme, Roberto Muñoz, Cristian Cechinel, Roberto Martinez, and Diego Monsalves. 2021. Can Analytics of Speaking Time Serve as Indicators of Effective Team Communication and Collaboration? In Proceedings of the X Latin American Conference on Human Computer Interaction, pages 1–4.

Dairazalia Sanchez-Cortes, Oya Aran, and Daniel Gatica-Perez. 2011. An Audio Visual Corpus for Emergent Leader Analysis. In Workshop on Multimodal Corpora for Machine Learning: Taking Stock and Road Mapping the Future, ICMI-MLMI.

Mathilde Sassier-Roublin, Julien Saunier, and Alexandre Pauchet. 2025. Toward Real-Time Cohesion Estimation in Hybrid Groups: A Multimodal Social Signal Processing Approach. In Proceedings of the 25th ACM International Conference on Intelligent Virtual Agents, pages 1–9.

Bertrand Schneider and Tonya Bryant. 2022. Using Mobile Dual Eye-Tracking to Capture Cycles of Collaboration and Cooperation in Co-located Dyads.

Steffen Schneider, Alexei Baevski, Ronan Collobert, and Michael Auli. 2019. wav2vec: Unsupervised Pre-training for Speech Recognition. arXiv preprint arXiv:1904.05862.

Verena Schürmann, Nicki Marquardt, and Daniel Bodemer. 2024. Conceptualization and Measurement of Peer Collaboration in Higher Education: A Systematic Review. Small Group Research, 55(1):89–138.

Jamie B Severt and Armando X Estrada. 2015. On the Function and Structure of Group Cohesion. In Team Cohesion: Advances in Psychological Theory, Methods and Practice, pages 3–24. Emerald Group Publishing Limited.

Guy L Siebold. 1999. The Evolution of the Measurement of Cohesion. Military Psychology, 11(1):5–26.

Daniel Spikol, E Ruffaldi, and M Cukurova. 2017.
Using Multimodal Learning Analytics to Identify Aspects of Collaboration in Project-Based Learning.

Jonathan Tudge and Barbara Rogoff. 2014. Peer Influences on Cognitive Development: Piagetian and Vygotskian Perspectives. In Interaction in Human Development, pages 17–40. Psychology Press.

Michelle Vaccaro, Abdullah Almaatouq, and Thomas Malone. 2024. When Combinations of Humans and AI are Useful: A Systematic Review and Meta-Analysis. Nature Human Behaviour, 8(12):2293–2303.

Carl Vogel, Maria Koutsombogera, and Justine Reverdy. 2023. Aspects of Dynamics in Dialogue Collaboration. Electronics, 12(10):2210.

Fabian Walocha, Lucien Maman, Mohamed Chetouani, and Giovanna Varni. 2020. Modeling Dynamics of Task and Social Cohesion from the Group Perspective Using Nonverbal Motion Capture-Based Features. In Companion Publication of the 2020 International Conference on Multimodal Interaction, pages 182–190.

Xinru Wang, Hannah Kim, Sajjadur Rahman, Kushan Mitra, and Zhengjie Miao. 2024. Human-LLM Collaborative Annotation Through Effective Verification of LLM Labels. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–21.

Donna J Wood and Barbara Gray. 1991. Toward a Comprehensive Theory of Collaboration. The Journal of Applied Behavioral Science, 27(2):139–162.

Xigui Yang. 2023. A Historical Review of Collaborative Learning and Cooperative Learning. TechTrends, 67(4):718–728.

Jintian Zhang, Xin Xu, Ningyu Zhang, Ruibo Liu, Bryan Hooi, and Shumin Deng. 2024. Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14544–14607, Bangkok, Thailand. Association for Computational Linguistics.

Yanxia Zhang, Jeffrey Olenick, Chu-Hsiang Chang, Steve WJ Kozlowski, and Hayley Hung. 2018.
T eamSense: Assessing P ersonal Affect and Group Cohesion in Small T eams Through Dyadic Interaction and Beha vior Analysis with W earab le Sensors . Proceedings of the A CM on Interactive , Mobile, W earable and Ubiquitous T echnologies , 2(3):1–22. Henr y P eng Zou, Wei-Chieh Huang, Y aozu W u, Y ankai Chen, Chunyu Miao , Hoang Nguy en, Y ue Zhou, Weizhi Zhang, Liancheng F ang, Langzhou He, et al. 2025. A Survey on Large Language Model Based Human-Agent Systems . A uthorea Preprints . 9. Language Resource References Brale y , McK enzie and Murra y , Gabriel. 2018. The Group Affect and P erformance (GAP) Corpus . Association f or Computing Machiner y , GIFT’18. Cavicchio , F eder ica and P oesio , Massimo . 2012. The Rov ereto Emotion and Cooperation Corpus: A New Resource to In vestigate Cooperation and Emotions . Spr inger . Haber , Janosch and Baumgär tner , Tim and T ak- maz, Ece and Gelderloos, Lieke and Bruni, Elia and F er nández, Raquel. 2019. The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue . Association f or Computational Linguistics. Kantharaju, Reshmashree B and P elachaud, Catherine. 2021. Social Signals of Cohesion in Multi-P ar ty Interactions . K outsombogera, Mar ia and V ogel, Car l. 2018. Modeling Collaborativ e Multimodal Behavior in Group Dialogues: The MUL TISIMO Cor pus . Kraaij, Wessel and Hain, Thomas and Lincoln, Mike and P ost, Wilfr ied. 2005. The AMI Meeting Cor pus . Litman, Diane and P aletz, Susannah and Rahimi, Zahra and Allegretti, Stefani and Rice, Caitlin. 2016. The T eams Cor pus and Entrainment in Multi-par ty Spoken Dialogues . Maman, Lucien and Ceccaldi, Eleonora and Lehmann-Willenbrock, Nale and Likforman- Sulem, Laurence and Chetouani, Mohamed and V olpe, Gualtiero and V ar ni, Giov anna. 2020. Game-on: A Multimodal Dataset for Cohesion and Group Analysis . IEEE. McDuff , Daniel and Thomas, P aul and Czerwin- ski, Mar y and Cras well, Nick. 2017. 
Multimodal Analysis of V ocal Collaborative Search: A Public Cor pus and Results . Olshefski, Christopher and Lugini, Luca and Singh, Ravneet and Litman, Diane and Godle y , Amanda. 2020. The Discussion T rac ker Corpus of Collaborativ e Argumentation . P eechatt, Michael and Alm, Cecilia Ov esdotter and Bailey , Reynold. 2024. MUL TICOLLAB: A Mul- timodal Corpus of Dialogues for Analyzing Col- laboration and F r ustration in Language . Re verdy , Justine and Russell, Sam O’Connor and Duquenne, Louise and Garaialde, Diego and Cow an, Benjamin R and Har te, Naomi. 2022. Roomreader : A Multimodal Corpus of Online Multipar ty Conv ersational Interactions . Richey , Colleen and D’Angelo, Cynthia M and Alozie, Nony e and Bratt, Harry and Shriberg, Elizabeth. 2016. The SRI Speech-Based Collab- orativ e Lear ning Corpus. ISLRN 199-041-455- 836-2 . Appendix A - Corpora Selection Process for CollA: A Relatively Less-Resourced Domain Figure 2: Human-human conv ersation cor pus building with manual annotations is a costly pro- cess. It was par ticular ly difficult to find cor pora with direct annotations on collaboration. Our recur- sive approach is time-consuming, but the bound- ar y is clearly defined, which results in a relativ ely grounded and complete selection f or an overview of CollA cor pora.
