TopicResponse: A Marriage of Topic Modelling and Rasch Modelling for Automatic Measurement in MOOCs

Jiazhen He 1,3 · Benjamin I. P. Rubinstein 1 · James Bailey 1,3 · Rui Zhang 1 · Sandra Milligan 2

1 School of Computing & Information Systems, The University of Melbourne, Australia
2 Melbourne Graduate School of Education, The University of Melbourne, Australia
3 Data61/CSIRO, Australia
Jiazhen He, E-mail: jiazhenh@student.unimelb.edu.au
Benjamin I. P. Rubinstein, James Bailey, Rui Zhang and Sandra Milligan, E-mail: {brubinstein, baileyj, rui.zhang, s.milligan}@unimelb.edu.au

Abstract This paper explores the suitability of using automatically discovered topics from MOOC discussion forums for modelling students' academic abilities. The Rasch model from psychometrics is a popular generative probabilistic model that relates latent student skill, latent item difficulty, and observed student-item responses within a principled, unified framework. According to scholarly educational theory, discovered topics can be regarded as appropriate measurement items if (1) students' participation across the discovered topics is well fit by the Rasch model, and if (2) the topics are interpretable to subject-matter experts as being educationally meaningful. Such Rasch-scaled topics, with associated difficulty levels, could be of potential benefit to curriculum refinement, student assessment and personalised feedback. The technical challenge that remains is to discover meaningful topics that simultaneously achieve good statistical fit with the Rasch model. To address this challenge, we combine the Rasch model with non-negative matrix factorisation based topic modelling, jointly fitting both models. We demonstrate the suitability of our approach with quantitative experiments on data from three Coursera MOOCs, and with qualitative survey results on topic interpretability on a Discrete Optimisation MOOC.

Keywords MOOCs · Topic Modelling · Matrix Factorisation · Psychometrics · Item Response Theory · Rasch Model

1 Introduction

Massive Open Online Courses (MOOCs) have attracted wide attention due to the promise of delivering education at scale. This new learning environment produces a variety of data (e.g., demographic data, student engagement, and forum activities), which offer new opportunities for understanding student learning. While quizzes and assignments have dominated summative assessment, the many sources of rich student engagement data generated in MOOC platforms present new views on student learning and avenues for formative feedback. This paper explores whether students' participation across automatically discovered MOOC forum topics is suitable for modelling academic ability.

Our work is inspired by the importance of forum discussions as an active learning activity, and by recent research on quantitative measurement of student learning in the education community. In particular, i) MOOC discussion forums, as the main platform for student-instructor and student-student interactions, are important for gaining insights into student learning. ii) Recent research in education (Milligan, 2015) suggests that a distinctive and complex learning skill is required to promote learning in MOOCs. Educators are interested in whether and how the possession of this complex learning skill may be evidenced by latent complex patterns of engagement, instead of traditional assessment tools such as quizzes and assignments. iii) In order to validate such a hypothesis, measurement theory can be used (Rasch, 1993; Wright and Masters, 1982). A set of items is handcrafted from forum activities (e.g., "contributed a post attracting votes from others" and "made repeated thread visits in more than half the weeks"), and calibrated (e.g., deleted or changed) to fit a measurement model, as evidence of whether the set of items is appropriate for measuring the complex learning skill (Milligan, 2015). This process is human-intensive and time-consuming, as reflected by Figure 1.

[Figure 1: workflow diagram. MOOC data (videos, assignments/quizzes, forum post content, post/view/vote behaviours) feed either handcrafted items (Item 1: contributed a post attracting votes from others; Item 2: made repeated visits in more than half the weeks; ...) or automatic items (Item 1: programming problems; Item 2: assignment problems; ...). Items are tested against a measurement model; a poor fit leads to time-consuming item calibration, a good fit yields a scale (which differs depending on the model) running from less able persons and less difficult items to more able persons and more difficult items, marked from -3 to 3 with item 1, item 2, person 1 and person 2 placed on it.]

Fig. 1: Workflow for devising items manually versus automatically discovering topics as items for measurement. Traditionally, a set of items is handcrafted from MOOC forum behaviours, and then the students' dichotomous responses on the items are examined using the Rasch model. If the model fits well, then the students and items can be compared on an inferred scale (the ruler). Otherwise the items are refined (changed, added or deleted) manually until the model fits. The process of handcrafting and calibration is time-consuming. Instead, we aim to automatically generate topics from discussion posts as items that fit the Rasch model by design.

Driven by these observations, we investigate whether students' participation in automatically discovered forum topics can be used as an instrument to model students' ability. If students' participation across the discovered topics fits a measurement model (in this paper, we use the Rasch model) in terms of statistical effectiveness, and the topics are interpretable to subject-matter experts by way of qualitative effectiveness, then the discovered topics can be regarded as useful items for measurement. The resulting scaled topics, endowed with estimated difficulty levels, can assist in subsequent curriculum refinement, student assessment, and personalised feedback.

The technical challenge, then, is to automatically discover topics such that students' participation across them fits the Rasch model. He et al (2016) have adapted topic modelling of students' online forum postings, such that students' participation across these topics conforms to the Guttman scale. However, the Guttman scale is widely regarded as overly idealised and impractical in the real world. In contrast the Rasch model, one of the simplest item response theory (IRT) models and the basis for many extensions, has been widely used in education and psychology. It is a generative probabilistic model that represents student responses as noisy observations of latent student abilities related to item difficulties. It can be viewed as a stochastic counterpart to the Guttman scale, permitting measurement error. If a person's ability level is higher than an item's difficulty, the person will answer the item correctly in the Guttman scale, while in the Rasch model there is a certain probability of an incorrect response. While the Guttman scale only permits ordering of persons and items, Rasch models the locations on the scale and hence also meaningful differences (Scholten, 2011). The algorithm proposed for the Guttman scale (He et al, 2016) does not adapt readily to Rasch modelling. Instead we propose the TopicResponse algorithm, which simultaneously performs non-negative matrix factorisation and Rasch model fitting.
The main contributions of this paper include:

– The first study that combines topic modelling with Rasch modelling in psychometric testing: generating topics that measure students' academic abilities based on online forum postings;
– An algorithm TopicResponse fitting NMF and Rasch models simultaneously, for which we provide a proof of convergence; and
– Quantitative experiments on three Coursera MOOCs covering a broad swath of disciplines, establishing statistical effectiveness of our algorithm, and qualitative results on a Discrete Optimisation MOOC, supporting interpretability.

We review related work in Section 2. In Section 3, we present preliminaries and formalise our problem. Our algorithm is introduced in Section 4, and evaluated in Section 5. Section 6 concludes the paper.

2 Related Work

Many studies have focused on item response theory (IRT) or MOOC data analysis, but research on automatic discovery of items for measurement in MOOCs has received little attention. The main relevant work to this paper is (He et al, 2016), where NMF-based topic modelling is adapted and used for Guttman scaling (Guttman, 1950) in order to measure students' latent abilities based on their MOOC forum posts. A major drawback of that work is that the Guttman scale is regarded as the most restrictive IRT model and is overly idealised: it neither serves as the basis of more sophisticated (probabilistic) models, nor is it practical in the real world as a deterministic model. While the Guttman scale only models ordering of persons and items, the (probabilistic) Rasch model permits the interpretation of the differences between items and people (Scholten, 2011). The Rasch model is a generative model that models student responses as noisy observations of latent student abilities in relation to item difficulties. The algorithm for Guttman scaling (He et al, 2016) does not naturally extend to incorporating Rasch modelling.

2.1 Item Response Theory (IRT)

The field of IRT studies statistical models for measurement in education and psychology. Such models specify the probability of a person's response on an item as a mathematical function of the person's and item's latent attributes. A principal goal of IRT is to create a scale on which persons and items can be placed and compared meaningfully. IRT has been used for computerised adaptive testing (CAT), which aims to accurately and efficiently assess individuals' trait levels, and is used in the Scholastic Aptitude Test (SAT) and the Graduate Record Examination (GRE). Chen et al (2005) proposed a personalised e-learning system based on IRT, considering course material difficulty and learner ability.

As a statistical model, IRT has attracted attention in machine learning recently. Bergner et al (2012) applied model-based collaborative filtering to estimate the parameters for IRT models, considering IRT as a type of collaborative filtering task, where the user-item interactions are factorised into user and item parameters. Bachrach et al (2012) proposed a probabilistic graphical model that jointly models the difficulties of questions, the abilities of participants and the correct answers to questions in aptitude testing and crowdsourcing settings. In MOOCs, Champaign et al (2014) investigated the correlations between resource use and students' skill and relative skill improvement measured by IRT. Colvin et al (2014) analysed pre-post test questions using IRT, to compare the learning in MOOCs and a blended on-campus course. Past work has tended to focus on using already-devised items to measure student ability under IRT models, while we are interested in automatically discovering content-based items that are characteristic of measurement in MOOCs (Milligan, 2015).
2.2 MOOC Forums

MOOC forums have been of great interest recently, due to the availability of rich textual data and social behaviour. Various studies have been conducted, such as sentiment analysis, community finding, question recommendation, and answer and intervention prediction. Wen et al (2014) use sentiment analysis to monitor students' trending opinions towards the course and to correlate sentiment with dropouts over time using survival analysis. Yang et al (2015) predict students' confusion during learning activities as expressed in discussion forums, using discussion behaviour and clickstream data; they further explore the impact of confusion on student dropout. Ramesh et al (2015) predict sentiment in MOOC forums using hinge-loss Markov random fields. Gillani et al (2014) find communities using Bayesian Non-Negative Matrix Factorisation. Yang et al (2014) recommend questions of interest to students by designing a context-aware matrix factorisation model considering constraints on students and questions. MOOC forum data has also been leveraged in the task of predicting accepted answers to forum questions (Jenders et al, 2016) and predicting instructor intervention (Chaturvedi et al, 2014). Despite the variety of studies, little machine learning research has explored forum discussions for the purpose of measurement in MOOCs.

3 Preliminaries and Problem Formulation

We choose NMF as the basic approach to discover forum topics due to the interpretability of the topics produced, and the extensibility of its optimisation formulation. For the IRT model for measurement, we focus on the Rasch model for dichotomous data due to its popularity, and because it is the basis for many extensions in education and psychology. We next overview the Rasch model for dichotomous data and NMF, and then define our problem.
3.1 Rasch Model

The Rasch model (Wright and Masters, 1982; Bond and Fox, 2001) for dichotomous data (correct/incorrect, agree/disagree responses) specifies the probability of a person's positive response (correct, agree) on an item as a logistic function of the difference between the person's ability and the item's difficulty,

$$p_{ij} = P(X_{ij} = 1 \mid \beta_i, \theta_j) = \frac{1}{1 + \exp(-(\theta_j - \beta_i))}, \tag{1}$$

where latent $\theta_j \in \mathbb{R}$ denotes person $j$'s ability, latent $\beta_i \in \mathbb{R}$ denotes item $i$'s difficulty, $X_{ij} \in \{0, 1\}$ denotes person $j$'s observed random response on item $i$, and $p_{ij}$ is the probability of this response being positive. This probability is best illustrated with the Item Characteristic Curve (ICC), as depicted in Figure 2 and commonly used in the field of IRT. The higher a person's ability is relative to the difficulty of an item, the higher the probability of a positive response on that item. When a person's ability is equal to an item's difficulty on the latent scale, positive responses are observed with probability 0.5.

[Figure 2: plot of probability of positive response (y-axis, 0.0 to 1.0) against ability (x-axis, -4 to 4), with curves for Item 1 ($\beta_1 = -1$), Item 2 ($\beta_2 = 0$) and Item 3 ($\beta_3 = 1$), and the ability $\theta = 0$ marked.]

Fig. 2: The Item Characteristic Curves for three items (item 1 is the easiest, item 3 the most difficult). A person with ability $\theta = 0$ has 0.5 probability of responding positively on item 2 with difficulty $\beta = 0$, and a higher (respectively lower) probability on the easiest item 1 (respectively the most difficult item 3).

The latent measurement scale is analogous to the ruler shown in Figure 1, where persons and items are placed together and can be compared meaningfully. The Rasch model provides a way to construct the ruler using persons' responses on items. Persons and items are located along the scale according to their abilities $\theta_j$ and difficulties $\beta_i$ respectively.
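As a small illustration of Eq. (1), the sketch below evaluates the response probability for a person of ability $\theta = 0$ on the three items of Fig. 2. The function name `rasch_probability` and the dictionary of items are illustrative choices made here, not part of the paper.

```python
import math


def rasch_probability(theta: float, beta: float) -> float:
    """Probability of a positive response under Eq. (1): logistic(theta - beta)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# Difficulties of the three items drawn in Fig. 2.
item_difficulties = {"item 1": -1.0, "item 2": 0.0, "item 3": 1.0}

# A person of ability theta = 0 meets each item in turn.
for name, beta in item_difficulties.items():
    print(f"{name}: p = {rasch_probability(0.0, beta):.3f}")
```

The probability is exactly 0.5 when ability equals difficulty, and the probabilities for the items at $\beta = -1$ and $\beta = 1$ are symmetric around 0.5, matching the shape of the curves in Fig. 2.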
The Rasch model can be viewed as a stochastic counterpart to the Guttman scale. For example, in Figure 1, person 1 and person 2 will have a positive response on item 1 in a Guttman scale. In a Rasch scale, by contrast, there are certain probabilities that person 1 and person 2 will enjoy positive responses on item 1, with person 1's probability being higher. This error model leads to a higher level of measurement scale: the interval scale, where we can tell how much more able person 2 is compared to person 1. From the Guttman scale, by comparison, we can tell that person 2 is better than person 1 but not by how much.

Table 1 further illustrates our setup, with an example of items for measuring basic mathematical ability, alongside hypothetical students' responses. The initial estimates (see Equations 6 and 7) for item difficulties and person abilities are produced on a logit scale. For example, if person 1 responds to the items positively 20% of the time and negatively 80% of the time, then the person's initial ability estimate is approximately $-1.39$, obtained by taking the natural logarithm of the odds of a positive response, $0.2/0.8$.

Table 1: An example of items for measuring basic mathematical ability, students' responses, initial item difficulty estimates and student ability estimates.

| | Item 1 (Count) | Item 2 (+) | Item 3 (−) | Item 4 (×) | Item 5 (÷) | Proportion correct $p_{\theta_j}$ | Ability $\theta_j^0 = \log\frac{p_{\theta_j}}{1 - p_{\theta_j}}$ |
|---|---|---|---|---|---|---|---|
| Person 1 | 1 | 0 | 0 | 0 | 0 | 0.20 | -1.39 |
| Person 2 | 1 | 1 | 0 | 0 | 0 | 0.60 | 0.41 |
| Person 3 | 0 | 1 | 1 | 0 | 0 | 0.60 | 0.41 |
| Person 4 | 1 | 0 | 1 | 1 | 0 | 0.67 | 0.71 |
| Person 5 | 1 | 1 | 1 | 0 | 1 | 0.80 | 1.39 |
| Proportion correct $p_{\beta_i}$ | 0.80 | 0.33 | 0.33 | 0.20 | 0.20 | | |
| Difficulty $\beta_i^0 = \log\frac{1 - p_{\beta_i}}{p_{\beta_i}}$ | -1.39 | 0.71 | 0.71 | 1.39 | 1.39 | | |

3.1.1 Rasch Estimation

Given an observed response matrix $x = [x_{ij}]$ (e.g., Table 1), a basic goal is to estimate the person and item parameters $\theta_j$ and $\beta_i$. The most common estimation methods are based on maximum-likelihood estimation, including joint maximum-likelihood (JML) estimation, conditional maximum-likelihood (CML) estimation and marginal maximum-likelihood (MML) estimation (Baker and Kim, 2004). In this paper, we focus on JML.

Under the assumption that a sample of $n$ persons is drawn independently at random from a population of persons possessing a latent skill attribute, and the assumption of local independence that a person's responses to different items are statistically independent, the probability of an observed data matrix $x = [x_{ij}]$ with $k$ items and $n$ persons is the product of the probabilities of the individual responses, and is given by the joint likelihood function

$$L(\beta, \theta \mid x) = \prod_{i=1}^{k} \prod_{j=1}^{n} P(X_{ij} = 1 \mid \beta_i, \theta_j)^{x_{ij}} \left(1 - P(X_{ij} = 1 \mid \beta_i, \theta_j)\right)^{(1 - x_{ij})} = \prod_{i=1}^{k} \prod_{j=1}^{n} \frac{\exp(x_{ij}(\theta_j - \beta_i))}{1 + \exp(\theta_j - \beta_i)}. \tag{2}$$

The log-likelihood function is then

$$\log L(\beta, \theta \mid x) = \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij}(\theta_j - \beta_i) - \sum_{i=1}^{k} \sum_{j=1}^{n} \log(1 + \exp(\theta_j - \beta_i)). \tag{3}$$

The parameters of the Rasch model can be estimated by joint maximum likelihood, that is, by maximising this expression using Newton-Raphson (Bertsekas, 1999), which yields the following iterative solution for $\beta_i$ and $\theta_j$,

$$\beta_i^{t+1} = \beta_i^{t} - \frac{\sum_{j=1}^{n} (p_{ij} - x_{ij})}{-\sum_{j=1}^{n} p_{ij}(1 - p_{ij})} \quad \text{for } t \geq 0, \tag{4}$$

$$\theta_j^{t+1} = \theta_j^{t} - \frac{\sum_{i=1}^{k} (x_{ij} - p_{ij})}{-\sum_{i=1}^{k} p_{ij}(1 - p_{ij})} \quad \text{for } t \geq 0. \tag{5}$$

Convergence to a local optimum (with suitable step sizes) is guaranteed. The initial estimate $\theta_j^0$ of $\theta_j$ can be obtained by first calculating the proportion $p_{\theta_j}$ of items to which person $j$ responded correctly, and then taking the natural logarithm of the odds of person $j$'s correct response as shown in Table 1, which can be formalised as follows:

$$\theta_j^0 = \log\left(\frac{p_{\theta_j}}{1 - p_{\theta_j}}\right), \quad p_{\theta_j} = \frac{r_j}{k}, \quad r_j = \sum_{i=1}^{k} x_{ij}, \tag{6}$$

where $r_j$ denotes the number of items that person $j$ responded to positively. Similarly, the initial estimate $\beta_i^0$ of $\beta_i$ can be obtained by

$$\beta_i^0 = \log\left(\frac{1 - p_{\beta_i}}{p_{\beta_i}}\right), \quad p_{\beta_i} = \frac{s_i}{n}, \quad s_i = \sum_{j=1}^{n} x_{ij}, \tag{7}$$

where $s_i$ denotes the number of persons who responded correctly on item $i$, and $p_{\beta_i}$ denotes the proportion of persons who responded correctly on item $i$.

For items receiving no correct responses ($s_i = 0$) or no incorrect responses ($s_i = n$), and likewise for persons with $r_j = 0$ or $r_j = k$, the logit in Eqs. (6) and (7) is undefined. Some implementations of the Rasch model delete such items, while others handle the situation as follows (Baker and Kim, 2004), where $\epsilon$ is a small number (e.g., 1.0 is used in our experiments),

$$s_i = \begin{cases} \epsilon, & \text{if } s_i = 0 \\ n - \epsilon, & \text{if } s_i = n \end{cases}, \qquad r_j = \begin{cases} \epsilon, & \text{if } r_j = 0 \\ k - \epsilon, & \text{if } r_j = k \end{cases}.$$

These pseudo counts are similar to frequentist Laplace corrections, or (weak) uniform Bayesian priors.

3.1.2 Evaluating Model Fit

A set of items is said to measure a latent attribute on an interval scale when there is a close fit between data and model. The model-data fit is typically examined using infit and outfit statistics, two types of mean-square error statistics that convey information about the error in the estimates for each individual item and person.

Outfit and infit test statistics are defined for each item and person to test the fit of items and persons under the Rasch model, by carefully summarising the Rasch residuals. The Rasch residuals are the differences between the observed responses and the expected responses according to the Rasch model. Formally, the expected response of person $j$ on item $i$ under the Rasch model, $E[X_{ij}]$ (abbreviated to $E_{ij}$), is $E[X_{ij}] = p_{ij}$.
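The fitted probabilities $p_{ij}$ used as expected responses come from the JML procedure of Section 3.1.1. A minimal sketch of that procedure, assuming a response matrix stored as `x[i][j]` (item $i$, person $j$) in which every item and every person has a mixed response pattern, could look as follows. The function name `fit_rasch_jml`, the step clipping (one simple way to realise the "suitable step sizes" mentioned above) and the re-centring of difficulties at zero are choices made here for illustration; the toy matrix `X` is hypothetical data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def fit_rasch_jml(x, n_iter=500, max_step=1.0):
    """Alternating Newton-Raphson updates, Eqs. (4) and (5).

    x[i][j] is the 0/1 response of person j to item i (k items, n persons).
    Assumes no item or person has an all-0 or all-1 pattern, so the
    pseudo-count correction is not needed.
    """
    k, n = len(x), len(x[0])
    # Initial logit estimates, Eqs. (6) and (7).
    theta = [logit(sum(x[i][j] for i in range(k)) / k) for j in range(n)]
    beta = [-logit(sum(x[i]) / n) for i in range(k)]

    def clip(step: float) -> float:
        # Assumption of this sketch: cap each Newton step at max_step logits.
        return max(-max_step, min(max_step, step))

    for _ in range(n_iter):
        for i in range(k):  # item update, Eq. (4)
            p = [sigmoid(theta[j] - beta[i]) for j in range(n)]
            grad = sum(p[j] - x[i][j] for j in range(n))
            hess = -sum(pj * (1.0 - pj) for pj in p)
            beta[i] -= clip(grad / hess)
        for j in range(n):  # person update, Eq. (5)
            p = [sigmoid(theta[j] - beta[i]) for i in range(k)]
            grad = sum(x[i][j] - p[i] for i in range(k))
            hess = -sum(pi * (1.0 - pi) for pi in p)
            theta[j] -= clip(grad / hess)
        # The likelihood only depends on theta_j - beta_i, so fix the origin
        # by shifting both parameter sets until the mean difficulty is zero.
        shift = sum(beta) / k
        beta = [b - shift for b in beta]
        theta = [t - shift for t in theta]
    return beta, theta


# Hypothetical responses: 4 items (rows) by 6 persons (columns).
X = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
beta, theta = fit_rasch_jml(X)
```

At a JML solution the expected score of every item, $\sum_j p_{ij}$, equals its observed score $s_i$, and likewise for persons, which gives a convenient check on any implementation.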
The residual between the observation $x_{ij}$ and the expected response $E_{ij}$ is then $R_{ij} = x_{ij} - E_{ij}$. Standardised residuals are often used to assess the fit of a single person-item response,

$$Z_{ij} = \frac{X_{ij} - E_{ij}}{\sqrt{\mathrm{Var}(X_{ij} - E_{ij})}} = \frac{R_{ij}}{\sqrt{\mathrm{Var}(X_{ij})}}, \tag{8}$$

where $\mathrm{Var}(X_{ij}) = p_{ij}(1 - p_{ij})$ denotes the variance of $X_{ij}$ (abbreviated to $S_{ij}$). The outfit of item $i$ summarises the squared standardised residuals, averaged over the $n$ persons,

$$\mathrm{Outfit}_i = \frac{1}{n} \sum_{j=1}^{n} Z_{ij}^2 = \frac{1}{n} \sum_{j=1}^{n} \frac{R_{ij}^2}{S_{ij}}. \tag{9}$$

Typical treatments assume that the standardised residuals $Z_{ij}$ approximately follow a unit normal distribution. Their sum of squares therefore approximately follows a $\chi^2$ distribution. Dividing this sum by its degrees of freedom yields a mean-square value, with an expectation of 1.0 and taking values in the range of 0 to infinity.

Outfit is sensitive to unexpected responses to items, e.g., lucky guesses (e.g., a person responds 111001) or careless sequences of mistakes (e.g., a person responds 010100) (Linacre, 2002). Since outfit is sensitive to very unexpected observations (outliers), infit was devised to be more sensitive to the overall pattern of responses (Linacre, 2006). Infit is an information-weighted form of outfit: it weights the observations by their statistical information (model variance), which is larger for targeted observations and smaller for extreme observations (Bond and Fox, 2001). In this paper, we focus on infit. Formally, the infit of item $i$ is given by

$$\mathrm{Infit}_i = \frac{\sum_{j=1}^{n} S_{ij} Z_{ij}^2}{\sum_{j=1}^{n} S_{ij}} = \frac{\sum_{j=1}^{n} R_{ij}^2}{\sum_{j=1}^{n} S_{ij}}. \tag{10}$$

Both outfit and infit have an expected value of 1.0. Values larger than 1.0 indicate model underfitting, i.e., the data is less predictable than the model expects, while values less than 1.0 indicate overfitting, i.e., the observations are highly predictable (Wright et al, 1994).
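Equations (9) and (10) translate directly into code. The following sketch computes both statistics per item from a response matrix and given parameter estimates; the name `item_fit` and the small inputs are hypothetical, chosen only to exercise the formulas.

```python
import math


def item_fit(x, theta, beta):
    """Return (outfit, infit) lists, one value per item, following Eqs. (9) and (10).

    x[i][j] is the 0/1 response of person j to item i.
    """
    outfit, infit = [], []
    for i, row in enumerate(x):
        sq_resid = []  # R_ij^2
        variance = []  # S_ij = p_ij (1 - p_ij)
        for j, x_ij in enumerate(row):
            p = 1.0 / (1.0 + math.exp(-(theta[j] - beta[i])))
            sq_resid.append((x_ij - p) ** 2)
            variance.append(p * (1.0 - p))
        n = len(row)
        # Outfit: unweighted mean of the squared standardised residuals.
        outfit.append(sum(r / s for r, s in zip(sq_resid, variance)) / n)
        # Infit: the same residuals weighted by their model variance.
        infit.append(sum(sq_resid) / sum(variance))
    return outfit, infit


# Hypothetical check: when every ability equals every difficulty, p_ij = 0.5,
# so each squared residual equals its variance and both statistics are 1.
x = [[1, 0, 1, 0], [0, 0, 1, 1]]
outfit, infit = item_fit(x, theta=[0.0, 0.0, 0.0, 0.0], beta=[0.0, 0.0])

# Three able persons all failing an easy item is highly unexpected.
surprise_outfit, surprise_infit = item_fit([[0, 0, 0]], theta=[3.0, 3.0, 3.0], beta=[0.0])
```

In the second call the responses contradict the model strongly, so both statistics come out far above 1, the direction the text associates with underfit.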
Conventionally, the acceptable range is taken to be [0.7, 1.3] or [0.8, 1.2], depending on the application.

3.2 Non-Negative Matrix Factorisation (NMF)

Given a non-negative matrix $V \in \mathbb{R}^{m \times n}$ and a positive integer $k$, NMF factorises $V$ into the product of a non-negative matrix $W \in \mathbb{R}^{m \times k}$ and a non-negative matrix $H \in \mathbb{R}^{k \times n}$ such that $V \approx WH$:

$$\underset{(m \times n)}{V} \approx \underset{(m \times k)}{W} \times \underset{(k \times n)}{H}.$$

A commonly used measure for quantifying the quality of this approximation is the Frobenius norm of the difference between $V$ and $WH$. Thus, NMF involves solving

$$\operatorname*{argmin}_{W, H} \; \|V - WH\|_F^2 \quad \text{s.t. } W \geq 0, \; H \geq 0. \tag{11}$$

This objective function is convex in $W$ and $H$ separately, but not jointly. Therefore standard convex solvers are not expected to find a global optimum in general. The multiplicative update algorithm (Lee and Seung, 2001) is commonly used to find a local optimum, where $W$ and $H$ are updated by a multiplicative factor that depends on the quality of the approximation.

Word-student matrix $V$:

| | stud 1 | stud 2 | ··· | stud n |
|---|---|---|---|---|
| solver | 0.26 | 0.11 | ··· | 0.52 |
| optim | 0.32 | 0.18 | ··· | 0.06 |
| code | 0.68 | 0.01 | ··· | 0.83 |
| algorithm | 0.89 | 0.61 | ··· | 0.44 |
| ... | ... | ... | ... | ... |
| word m | 0.22 | 0.54 | ··· | 0.98 |

Word-topic matrix $W$:

| | topic 1 | topic 2 | ··· | topic k |
|---|---|---|---|---|
| solver | 0.22 | 0.01 | ··· | 0.12 |
| optim | 0.38 | 0.15 | ··· | 0.06 |
| code | 0.18 | 0.05 | ··· | 0.03 |
| algorithm | 0.09 | 0.21 | ··· | 0.01 |
| ... | ... | ... | ... | ... |
| word m | 0.02 | 0.04 | ··· | 0.12 |

Topic-student matrix $H$:

| | stud 1 | stud 2 | ··· | stud n |
|---|---|---|---|---|
| topic 1 | 0.83 | 0.17 | ··· | 0.04 |
| topic 2 | 0.21 | 0.75 | ··· | 0.16 |
| ... | ... | ... | ... | ... |
| topic k | 0.09 | 0.64 | ··· | 0.62 |

Fig. 3: Example matrices: word-student $V$, word-topic $W$, topic-student $H$.

In the present MOOC setting, we focus on the students who contributed posts or comments in forums. For each student, we aggregate all posts or comments that they contributed. Each student is represented by a bag of words as shown in the example word-student matrix $V$ in Figure 3, where $m$ represents the number of words, and $n$ represents the number of students. Using NMF, a word-student matrix $V$ can be factorised into two non-negative matrices: the word-topic matrix $W$ and the topic-student matrix $H$. For each student, the column vector of $V$ is approximated by a linear combination of the columns of $W$, weighted by the components of $H$. Therefore, each column vector of $W$ can be regarded as a topic, and the memberships of students in these topics are encoded by $H$, as shown in Figure 3.

3.3 Problem Statement

We seek to explore the feasibility of automatic discovery of forum discussion topics for measuring students' academic abilities in MOOCs, as quantified by the Rasch model. Our central tenet is that topics can be regarded as useful items for measuring a latent skill if student responses to these topics are well fit by the Rasch model, and if the topics are interpretable to domain experts for educational relevance. Therefore, we need to discover topics from students' posts and comments in MOOC forums in such a way that students' participation across these topics fits the Rasch model. A student's item response records whether the student posts on the corresponding topic or not. After discovery, topics must then be further assessed for interpretability to domain experts. Our goal is decision support.

In particular, under the NMF framework, a word-student matrix $V$ can be factorised into two non-negative matrices: the word-topic matrix $W$ and the topic-student matrix $H$.
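A plain NMF factorisation of this kind can be sketched with the standard Lee and Seung multiplicative updates for program (11). The version below is written in dependency-free Python for readability rather than speed; the helper names, the small constant `eps` guarding against division by zero, the random initialisation and the toy matrix `V` are all assumptions of this sketch, not details taken from the paper.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH, the objective of program (11)."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rwh in zip(v, wh) for a, b in zip(rv, rwh))


def nmf(v, k, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates; returns W (m x k), H (k x n) and the error trace."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise, using the new H.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(m)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Hypothetical word-student counts: 4 words (rows) by 3 students (columns).
V = [
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 1.0],
]
W, H, errors = nmf(V, k=2)
```

Because each update only rescales entries by non-negative factors, `W` and `H` stay non-negative throughout, and the recorded reconstruction error does not increase from one iteration to the next.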
Our application requires that the topic-student matrix $H$ be a) binary, ensuring that the response of a student to a topic is dichotomous; b) useful for measuring students' academic abilities; and c) well fit by the Rasch model. NMF provides an elegant framework for incorporating these constraints by adding novel regularisation terms, as detailed in the next section. A glossary of the symbols most used in this paper is given in Table 2.

Table 2: Glossary of symbols

| Symbol | Description |
|---|---|
| $m$ | the number of words |
| $n$ | the number of students |
| $k$ | the number of topics |
| $V = (v_{ij})_{m \times n}$ | word-student matrix |
| $W = (w_{ij})_{m \times k}$ | word-topic matrix |
| $H = (h_{ij})_{k \times n}$ | topic-student matrix |
| $H_{ideal} = ((h_{ideal})_{ij})_{1 \times n}$ | matrix holding, for each student, the ideal number of distinct topics posted |
| $1_r$ | all-ones matrix of size $1 \times k$ |
| $g_j$ | student $j$'s grade |
| $\beta = (\beta_i)_k$ | item difficulty vector |
| $\theta = (\theta_j)_n$ | student ability vector |
| $X_{ij}$ | binary response (0 or 1) of person $j$ to item $i$ |
| $x_{ij}$ | observed response of person $j$ to item $i$ |
| $p_{ij}$ | the probability of a positive response of person $j$ to item $i$ |
| $S_{ij}$ | variance of $X_{ij}$ |
| $Z_{ij}$ | standardised residual |
| $\lambda_0, \lambda_1, \lambda_2, \lambda_3$ | regularisation coefficients |

4 The TopicResponse Algorithm: Joint NMF-Rasch Estimation

To favour topics that fit the Rasch model, we jointly optimise both the NMF and Rasch models, which yields the objective function

$$g(W, H, \theta, \beta) = \|V - WH\|_F^2 - \lambda_0 f_R(H, \theta, \beta),$$

$$f_R(H, \theta, \beta) = \sum_{i=1}^{k} \sum_{j=1}^{n} h_{ij}(\theta_j - \beta_i) - \sum_{i=1}^{k} \sum_{j=1}^{n} \log(1 + \exp(\theta_j - \beta_i)),$$

where $f_R(H, \theta, \beta)$ is the log-likelihood function maximised in Rasch estimation (Eq. 3, with $h_{ij}$ in place of $x_{ij}$), and $\lambda_0 > 0$ is a user-specified parameter controlling the trade-off between the quality of factorisation and Rasch estimation.

Weak supervision of item responses. The fit between student topic responses $H$ and the Rasch model will provide statistical evidence of measuring skill attainment.
However, it is difficult to conclude what the topics are measuring without domain knowledge. To favour topics that can be used to measure students' academic abilities, we impose a constraint on $H$ based on some student grade, which provides an indicator of students' abilities (we discuss sources of auxiliary grade information below). In particular, we assume the following relationship between the ideal number of distinct topics that each student $j$ contributes to and their grade $g_j \in [0, 100]$,

$$(h_{ideal})_{1j} = \min\left(\left\lfloor \frac{g_j + \mathit{width}}{\mathit{width}} \right\rfloor, \; k - 1\right), \qquad \mathit{width} = \frac{100}{k - 1},$$

where $H_{ideal}$ is a $1 \times n$ matrix denoting the ideal number of distinct topics posted by students. For example, under $k = 10$ items, student $j$ scoring $g_j = 45$ should post on $(h_{ideal})_{1j} = 5$ topics. The minimum and maximum numbers of different topics that a student $j$ posts on are 1 and $k - 1$ respectively. This is motivated by the initialisation of $\theta$ and $\beta$ described in Section 3.1.1, where positive responses on 0 or $k$ topics are undesirable.

This supervision constraint is markedly weaker than a similar constraint found in (He et al, 2016), as demonstrated in Figure 4. He et al (2016) leverage the student grade to exactly determine the item responses for the Guttman scale. The Guttman scale, as a deterministic model, requires that if a student can get a difficult item correct, they can also achieve correct responses on all easier items. This assumption is very restrictive, and rarely makes sense in practice. The Rasch model allows errors in the responses, and we only constrain the number of distinct topics posted by a student, rather than the exact response pattern.

Most (MOOC) courses conduct multiple forms of assessment throughout the duration of teaching, for example weekly quizzes, take-home assignments, mid-term tests, projects and presentations. In the large-scale MOOC context, such evaluations may be peer-assessed. Students often enter courses with some cumulative grade-point average that may be (loosely) predictive of future performance. Any of these readily available sources of student information could reasonably be used to seed $H_{ideal}$. Even final course grades could be used, particularly when the ultimate application of TopicResponse is not measuring students, but refining curriculum.

In order to encourage satisfaction of the $H_{ideal}$ soft constraint on topic responses, we introduce a regularisation term on $H$, namely $\|1_r H - H_{ideal}\|_F^2$.

$H_{ideal}$ for the Guttman scale:

| grade | 8 | 25 | 46 | 67 | 89 | 98 | 78 | 35 | 55 |
|---|---|---|---|---|---|---|---|---|---|
| | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |

$H_{ideal}$ for the Rasch model:

| grade | 8 | 25 | 46 | 67 | 89 | 98 | 78 | 35 | 55 |
|---|---|---|---|---|---|---|---|---|---|
| | 1 | 3 | 5 | 7 | 9 | 9 | 8 | 4 | 5 |

Fig. 4: An example of $H_{ideal}$ in the Guttman scale and the Rasch model.

Quantising & Regularising the Response Matrix. We introduce the regularisation term $\|W\|_F^2$, commonly used to prevent overfitting in NMF. To encourage binary solutions, we impose an additional regularisation term $\|H \circ H - H\|_F^2$, where the operator $\circ$ denotes the Hadamard product. Binary matrix factorisation (BMF) is a variation of NMF, where the input matrix and the two factorised matrices are all binary. Our approach is inspired by those of Zhang et al (2007) and Zhang et al (2010). Our added term equals $\|H \circ (H - 1)\|_F^2$, which is minimised by (only) binary $H$.

TopicResponse Model. We have the following regularisations:

– $\|1_r H - H_{ideal}\|_F^2$ to encourage a grade-guided $H$;
– $\|W\|_F^2$ to prevent overfitting; and
– $\|H \circ H - H\|_F^2$ to encourage a binary item-response solution.
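The grade-to-topic-count mapping and the grade-guided penalty described above can be sketched in a few lines. The function names `ideal_topic_counts` and `grade_penalty` are hypothetical; the grades are the ones shown in Fig. 4, and the small matrix `H_example` is made up for illustration.

```python
import math


def ideal_topic_counts(grades, k):
    """Map grades in [0, 100] to the ideal number of distinct topics, in 1..k-1."""
    width = 100.0 / (k - 1)
    return [min(math.floor((g + width) / width), k - 1) for g in grades]


def grade_penalty(h, h_ideal):
    """Squared Frobenius norm of (1_r H - H_ideal).

    1_r H is the row of column sums of H, i.e. how many topics each student
    responds to when H is binary.
    """
    column_sums = [sum(col) for col in zip(*h)]
    return sum((c - t) ** 2 for c, t in zip(column_sums, h_ideal))


grades = [8, 25, 46, 67, 89, 98, 78, 35, 55]  # grades from Fig. 4
print(ideal_topic_counts(grades, k=10))       # [1, 3, 5, 7, 9, 9, 8, 4, 5]

# Hypothetical binary H with 3 topics and 2 students, posting on 2 and 1 topics.
H_example = [[1, 0], [1, 1], [0, 0]]
print(grade_penalty(H_example, [2, 1]))       # 0
```

Only the column sums of $H$ enter the penalty, so any response pattern with the right number of topics per student is equally acceptable, which is what makes this supervision weaker than fixing the full Guttman pattern.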
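These terms together with joint NMF-Rasch estimation yield the final objective

$$f(\mathbf{W},\mathbf{H},\boldsymbol{\theta},\boldsymbol{\beta}) = \|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2 - \lambda_0 f_R(\mathbf{H},\boldsymbol{\theta},\boldsymbol{\beta}) + \lambda_1\|\mathbf{W}\|_F^2 + \lambda_2\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2 + \lambda_3\|\mathbf{H}\circ\mathbf{H} - \mathbf{H}\|_F^2, \tag{12}$$

where $\lambda_1, \lambda_2, \lambda_3 > 0$ are user-specified regularisation parameters, with primal program

$$\operatorname*{argmin}_{\mathbf{W},\mathbf{H},\boldsymbol{\theta},\boldsymbol{\beta}} f(\mathbf{W},\mathbf{H},\boldsymbol{\theta},\boldsymbol{\beta}) \quad \text{s.t.} \quad \mathbf{W} \ge 0,\ \mathbf{H} \ge 0. \tag{13}$$

TopicResponse Fitting Procedure. A local optimum of program (13) is achieved via iteration

$$w_{ij} \leftarrow w_{ij}\,\frac{(\mathbf{V}\mathbf{H}^T)_{ij}}{(\mathbf{W}\mathbf{H}\mathbf{H}^T + \lambda_0\mathbf{W})_{ij}} \tag{14}$$

$$h_{ij} \leftarrow h_{ij}\,\frac{2(\mathbf{W}^T\mathbf{V})_{ij} + 8\lambda_2 h_{ij}^3 + 6\lambda_2 h_{ij}^2 + 2\lambda_1(\mathbf{1}_r^T\mathbf{H}_{\mathrm{ideal}})_{ij} + \lambda_3(\theta-\beta)^+_{ij}}{2(\mathbf{W}^T\mathbf{W}\mathbf{H})_{ij} + 12\lambda_2 h_{ij}^3 + 2\lambda_1(\mathbf{1}_r^T\mathbf{1}_r\mathbf{H})_{ij} + 2\lambda_2 h_{ij} + \lambda_3(\theta-\beta)^-_{ij}} \tag{15}$$

$$\beta_i \leftarrow \beta_i - \frac{\sum_{j=1}^{n}(p_{ij} - h_{ij})}{-\sum_{j=1}^{n} p_{ij}(1 - p_{ij})} \tag{16}$$

$$\theta_j \leftarrow \theta_j - \frac{\sum_{i=1}^{k}(h_{ij} - p_{ij})}{-\sum_{i=1}^{k} p_{ij}(1 - p_{ij})} \tag{17}$$

where $(\theta-\beta) = (\theta-\beta)^+ - (\theta-\beta)^-$ with

$$(\theta-\beta)^+_{ij} = \begin{cases} (\theta-\beta)_{ij} & \text{if } (\theta-\beta)_{ij} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (\theta-\beta)^-_{ij} = \begin{cases} -(\theta-\beta)_{ij} & \text{if } (\theta-\beta)_{ij} < 0 \\ 0 & \text{otherwise.} \end{cases}$$

Here $(\theta-\beta)^+$ and $(\theta-\beta)^-$ denote the positive part and negative part of matrix $(\theta-\beta)$ respectively. We next describe how these update rules are derived.

The update rules (16) and (17) can be obtained using Newton's method. The update rules (14) and (15) can be derived via the Karush-Kuhn-Tucker conditions necessary for local optimality. First we construct the unconstrained Lagrangian

$$\mathcal{L}(\mathbf{W},\mathbf{H},\boldsymbol{\theta},\boldsymbol{\beta},\boldsymbol{\alpha},\boldsymbol{\gamma}) = f(\mathbf{W},\mathbf{H},\boldsymbol{\theta},\boldsymbol{\beta}) + \operatorname{tr}(\boldsymbol{\alpha}\mathbf{W}) + \operatorname{tr}(\boldsymbol{\gamma}\mathbf{H}),$$

where $\alpha_{ij}, \gamma_{ij} \le 0$ are the Lagrangian dual variables for inequality constraints $w_{ij} \ge 0$ and $h_{ij} \ge 0$ respectively, and $\boldsymbol{\alpha} = [\alpha_{ij}]$, $\boldsymbol{\gamma} = [\gamma_{ij}]$ denote their corresponding matrices.

The KKT condition of stationarity requires that the derivatives of $\mathcal{L}$ with respect to $\mathbf{W}$ and $\mathbf{H}$ vanish at a local optimum $\mathbf{H}^\star, \mathbf{W}^\star, \boldsymbol{\alpha}^\star, \boldsymbol{\gamma}^\star$:

$$\frac{\partial\mathcal{L}}{\partial\mathbf{W}} = 2\left(\mathbf{W}^\star\mathbf{H}^\star\mathbf{H}^{\star T} - \mathbf{V}\mathbf{H}^{\star T} + \lambda_0\mathbf{W}^\star\right) + \boldsymbol{\alpha}^\star = 0,$$

$$\frac{\partial\mathcal{L}}{\partial\mathbf{H}} = 2\left(\mathbf{W}^{\star T}\mathbf{W}^\star\mathbf{H}^\star - \mathbf{W}^{\star T}\mathbf{V} + \lambda_1\mathbf{1}_r^T\mathbf{1}_r\mathbf{H}^\star + \lambda_2\mathbf{H}^\star - \lambda_1\mathbf{1}_r^T\mathbf{H}_{\mathrm{ideal}}\right) + 4\lambda_2\mathbf{H}^\star\circ\mathbf{H}^\star\circ\mathbf{H}^\star - 6\lambda_2\mathbf{H}^\star\circ\mathbf{H}^\star - \lambda_3\left((\theta-\beta)^+ - (\theta-\beta)^-\right) + \boldsymbol{\gamma}^\star = 0.$$

Complementary slackness, $\alpha^\star_{ij}w^\star_{ij} = 0$ and $\gamma^\star_{ij}h^\star_{ij} = 0$, implies:

$$0 = \left(\mathbf{V}\mathbf{H}^{\star T} - \mathbf{W}^\star\mathbf{H}^\star\mathbf{H}^{\star T} - \lambda_0\mathbf{W}^\star\right)_{ij} w^\star_{ij},$$

$$0 = \Big(2\mathbf{W}^{\star T}\mathbf{V} + 6\lambda_2\mathbf{H}^\star\circ\mathbf{H}^\star + 2\lambda_1\mathbf{1}_r^T\mathbf{H}_{\mathrm{ideal}} - 2\mathbf{W}^{\star T}\mathbf{W}^\star\mathbf{H}^\star - 2\lambda_1\mathbf{1}_r^T\mathbf{1}_r\mathbf{H}^\star - 4\lambda_2\mathbf{H}^\star\circ\mathbf{H}^\star\circ\mathbf{H}^\star - 2\lambda_2\mathbf{H}^\star + \lambda_3(\theta-\beta)^+ - \lambda_3(\theta-\beta)^- + 8\lambda_2\mathbf{H}^\star\circ\mathbf{H}^\star\circ\mathbf{H}^\star - 8\lambda_2\mathbf{H}^\star\circ\mathbf{H}^\star\circ\mathbf{H}^\star\Big)_{ij} h^\star_{ij}.$$

These two equations lead to the updating rules (14) and (15). Regarding the update rules (14), (15), (16) and (17) we have the following theorem:

Theorem 1 The objective function $f(\mathbf{W},\mathbf{H},\boldsymbol{\theta},\boldsymbol{\beta})$ of TopicResponse program (13) is non-increasing under update rules (14), (15), (16) and (17).

This result guarantees that the update rules of $\mathbf{W}$, $\mathbf{H}$, $\boldsymbol{\theta}$ and $\boldsymbol{\beta}$ eventually converge, and that the obtained solution will be a local optimum. The proof of Theorem 1 is given in the Appendix.

Algorithm 1 TopicResponse
Require: $\mathbf{V}$, $\mathbf{H}_{\mathrm{ideal}}$, $\lambda_0$, $\lambda_1$, $\lambda_2$, $\lambda_3$, $k$;
Ensure: A topic-student matrix $\mathbf{H}$, item difficulties $\boldsymbol{\beta}$, person abilities $\boldsymbol{\theta}$;
1: Initialise $\mathbf{W}$, $\mathbf{H}$ using NMF;
2: Normalise $\mathbf{W}$, $\mathbf{H}$ following (Zhang et al, 2007, 2010);
3: Initialise $\boldsymbol{\theta}$, $\boldsymbol{\beta}$ based on Eq. (6) and Eq. (7);
4: repeat
5:   Update $\mathbf{W}$, $\mathbf{H}$, $\boldsymbol{\beta}$, $\boldsymbol{\theta}$ iteratively based on Eq. (14) to Eq. (17);
6: until converged
7: return $\mathbf{H}$;

Algorithm. Our overall approach TopicResponse is described as Algorithm 1. $\mathbf{W}$ and $\mathbf{H}$ are initialised using plain NMF (Lee and Seung, 1999, 2001), then normalised (Zhang et al, 2007, 2010). $\boldsymbol{\theta}$ and $\boldsymbol{\beta}$ are initialised based on Eq. (6) and Eq. (7), where $x_{ij}$ is replaced by $h_{ij}$. At optimisation completion, estimates for topics, item difficulties and person abilities can be obtained together. Code for TopicResponse is available from the authors' websites.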
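To make the simpler update rules concrete, the following NumPy sketch implements the multiplicative update for $\mathbf{W}$ in Eq. (14) and the Newton steps for $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ in Eqs. (16) and (17). It is a minimal illustration under stated assumptions, not the authors' released code: the function names are hypothetical, `lam` stands for the weight on $\|\mathbf{W}\|_F^2$, the small `eps` in the denominator is added here only to avoid division by zero, $p_{ij}$ is taken to be the standard Rasch probability $1/(1 + e^{-(\theta_j - \beta_i)})$, and the more involved update for $\mathbf{H}$ in Eq. (15) is deliberately omitted.

```python
import numpy as np


def w_objective(V, W, H, lam):
    """Part of the objective that depends on W: ||V - WH||_F^2 + lam * ||W||_F^2."""
    return float(np.sum((V - W @ H) ** 2) + lam * np.sum(W ** 2))


def update_W(V, W, H, lam, eps=1e-12):
    """Multiplicative update of Eq. (14); eps is an added numerical safeguard."""
    numer = V @ H.T
    denom = W @ (H @ H.T) + lam * W + eps
    return W * numer / denom


def rasch_prob(theta, beta):
    """p[i, j]: probability that student j responds positively to item i."""
    return 1.0 / (1.0 + np.exp(-(theta[None, :] - beta[:, None])))


def newton_step_beta(H, theta, beta):
    """One Newton step for item difficulties, Eq. (16)."""
    p = rasch_prob(theta, beta)
    grad = (p - H).sum(axis=1)
    hess = -(p * (1.0 - p)).sum(axis=1)
    return beta - grad / hess


def newton_step_theta(H, theta, beta):
    """One Newton step for student abilities, Eq. (17)."""
    p = rasch_prob(theta, beta)
    grad = (H - p).sum(axis=0)
    hess = -(p * (1.0 - p)).sum(axis=0)
    return theta - grad / hess


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, W, H = rng.random((6, 5)), rng.random((6, 3)), rng.random((3, 5))
    before = w_objective(V, W, H, lam=0.1)
    after = w_objective(V, update_W(V, W, H, lam=0.1), H, lam=0.1)
    print(before >= after)
```

The multiplicative form keeps $\mathbf{W}$ non-negative without an explicit projection, since every factor in the update is non-negative. In the Newton steps, an item answered by few students moves towards a higher difficulty and a student responding to many items moves towards a higher ability, as expected from the signs in Eqs. (16) and (17).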
5 Experiments

We report on extensive experiments evaluating the effectiveness of TopicResponse on real MOOCs. In our experiments, we use the first offerings of three Coursera MOOCs from education, economics and computer science offered by The University of Melbourne: Assessment and Teaching of 21st Century Skills delivered in 2014, Principles of Macroeconomics delivered in 2013, and Discrete Optimisation delivered in 2013. We denote these three courses by EDU, ECON and OPT respectively.

5.1 Dataset Preparation

We focus on the students who contributed posts or comments in forums. For each student, we aggregate all the posts and comments that they contributed. After stemming and removing stop words, a word-student matrix with normalised tf-idf in [0,1] is produced. The statistics of words and students before and after preprocessing, the dominant words, and the sparsity of the word-student matrix (the percentage of non-zero values) for the three MOOCs are displayed in Table 3.

Table 3: Statistics of our three Coursera MOOC datasets.

| MOOC | #Students | #Words | #Words after preprocessing | Dominant words | Word-student matrix sparsity |
| EDU | 1,749 | 28,931 | 18,391 | student, learn, skill, work, teacher, use, assess, teach, problem, collabor | 0.59% |
| ECON | 1,551 | 26,370 | 21,412 | gdp, would, econom, think, product, good, one, economi, increas, invest | 0.50% |
| OPT | 1,092 | 19,284 | 16,128 | use, solut, get, time, one, tri, python, work, optim, would | 0.85% |

5.2 Baseline and Evaluation Metrics

We compare our algorithm TopicResponse with the baseline algorithm Grade-Guided NMF (GG-NMF), which minimises the following objective function

$$f_G(\mathbf{W},\mathbf{H}) = \|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2 + \lambda_1\|\mathbf{W}\|_F^2 + \lambda_2\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2 + \lambda_3\|\mathbf{H}\circ\mathbf{H} - \mathbf{H}\|_F^2.$$

A local optimum can be obtained using the Karush-Kuhn-Tucker conditions. Like TopicResponse, GG-NMF regularises $\mathbf{H}$ by considering the students' grades as an indicator of academic ability.
The difference is that TopicResponse optimises the Rasch estimation and NMF simultaneously, while in GG-NMF, the students' topic responses $\mathbf{H}$ are first obtained, and then are passed through the Rasch model. We evaluate the two algorithms in terms of the following metrics.

Quality of factorisation. We measure $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$ so as to record how well the factorisation approximates the word-student matrix.

Measuring student academic ability. Quality of constraint on students' topic participation, based on grades: $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$.

Negative log-likelihood. Log-likelihood measures fit of the Rasch model to the entire dataset. For convenience, we look at the negative log-likelihood, which should be minimised: smaller is better. This measure is our main focus for Rasch, as it is important to examine the model-level fit before looking at item-level fit.

Item infit. As illustrated in Section 3.1.2, item infit examines the fit of a particular item, with non-fitting items suitable for further refinement. We use the conventional acceptance range of [0.7, 1.3].

5.3 Hyperparameter Settings

Table 4 presents the parameter values used for our parameter sensitivity experiments, where the default values shown in boldface are used in experiments unless noted otherwise.

Table 4: Hyperparameter settings.

| Param. | Values Explored (Default) |
| $\lambda_0$ | [0.01, 0.1, 0.2, 0.3, 0.4, 0.5] |
| $\lambda_1$ | [$10^{-3}$, $10^{-2}$, $10^{-1}$, $10^{0}$, $10^{1}$, $10^{2}$] |
| $\lambda_2$ | [$10^{-3}$, $10^{-2}$, $10^{-1}$, $10^{0}$, $10^{1}$, $10^{2}$] |
| $\lambda_3$ | [$10^{-3}$, $10^{-2}$, $10^{-1}$, $10^{0}$, $10^{1}$, $10^{2}$] |
| $k$ | [5, 10, 15, 20, 25, 30] |

5.4 Main Results for GG-NMF and TopicResponse

In the first group of experiments, we examine the performance of GG-NMF (baseline) and TopicResponse in terms of negative log-likelihood, the quality of factorisation $\mathbf{W}\mathbf{H}$ in approximating $\mathbf{V}$ given by $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$, and the supervision soft constraint $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$. For GG-NMF, the factorisation and Rasch estimation are separated, where the topic-student response matrix $\mathbf{H}$ is first obtained using GG-NMF, and then is taken as input to Rasch estimation. For TopicResponse, the negative log-likelihood is optimised together with factorisation. The parameters are set using the boldface default values in Table 4.

Fig. 5: Negative log-likelihood as goodness of fit for GG-NMF and TopicResponse on ECON, OPT and EDU; smaller is better.

Fig. 6: Performance of GG-NMF and TopicResponse on ECON, OPT and EDU in terms of (a) quality of factorisation, $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$, and (b) quality of grade-guided constraint, $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$; smaller is better.

Figure 5 displays the negative log-likelihood of GG-NMF and TopicResponse. It can be seen that TopicResponse can yield superior negative log-likelihood, implying better fit between the topic-student response matrix $\mathbf{H}$ and the Rasch model. TopicResponse therefore provides greater confidence that other item-level fit statistics, such as infit, will be acceptable. Jointly optimising the matrix factorisation and Rasch estimation can bring us closer to global optima.

We present the results on quality of approximation $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$ and supervision term $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$ in Figure 6.
From these plots, we can see that without sacrificing approximation performance in terms of $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$, TopicResponse obtains superior $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$ (while obtaining excellent negative log-likelihoods as above). This performance again demonstrates that optimising the factorisation and Rasch estimation jointly can be superior to optimising them separately. We therefore conclude that TopicResponse is preferable to GG-NMF; we focus on results for TopicResponse in the remainder of our experiments.

5.5 Item Infit, Item Difficulty and Student Ability

We further examine the infit of each item, which indicates whether the set of topics conforms to the Rasch model and is appropriate for measurement. As illustrated in Section 3.1.2, a conventional acceptable range of infit is 0.7 to 1.3. As an example, we show the item infit in Figure 7 on the OPT MOOC. We can see that the infit of each item is in the acceptable range, with most very close to the (ideal) expected value of 1.0, indicating that the set of topics conforms to the Rasch model and is appropriate for measuring student ability.

Fig. 7: Item infit histogram for OPT MOOC; infit closer to 1 is better.

Fig. 8: Histograms of OPT MOOC student ability location (top) and item difficulty location (bottom) on a common academic ability scale in logits, from least able to most able and from topic 1 (easiest) to topic 10 (most difficult).

Additionally, we examine item difficulties and student abilities. Figure 8 displays the histogram of item difficulty and student ability along a common scale. According to the Rasch model, the higher a person's ability relative to the difficulty of a topic, the higher the probability that person posts on that topic.
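As a small worked illustration of this relationship, the sketch below evaluates the standard Rasch response probability $1/(1 + e^{-(\theta - \beta)})$ for a low-ability and a high-ability student against the easiest and hardest OPT topics. The difficulties ($-2.30$ and $1.73$ logits) are those reported in Table 5 and the abilities ($-2$ and $2$ logits) are the approximate locations discussed for Figure 8; the function name is a hypothetical choice for this example.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch probability of a positive response (here: posting on a topic)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    easiest, hardest = -2.30, 1.73  # topic 1 and topic 10 difficulties (Table 5)
    for ability in (-2.0, 2.0):
        p_easy = rasch_probability(ability, easiest)
        p_hard = rasch_probability(ability, hardest)
        print(f"ability {ability:+.1f}: easiest {p_easy:.3f}, hardest {p_hard:.3f}")
```

Under these numbers, a student at $-2$ logits is slightly more likely than not to post on topic 1 but very unlikely to post on topic 10, whereas a student at $2$ logits is more likely than not to post on every topic.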
It can be seen that most students, with low ability (around -2 logits), dominate only the "easiest" topic (topic 1 with difficulty -2.3 logits); this topic concerns general problem solving. In other words, these students are likely to post only on topic 1, and unlikely to post on other topics. By comparison, the most able students, with abilities around 2 logits, contribute with high probability to all the topics.

5.6 Topic Interpretation and Discussion

We qualitatively examine topic interpretation, in order to assess educational meaningfulness. Well-scaled topics can potentially be used for curriculum refinement. Table 5 presents the topics generated using TopicResponse, alongside inferred difficulties. Topics are interpreted by an instructor who teaches a similar course. As the topics are not all course content-related, we envision that instructors examine discovered topics prior to using all for refining curriculum or taking other actions. Additionally, the inferred student ability levels and topic difficulty levels could potentially be used for personalised feedback, by tailoring appropriate topics of course content or forum discussion to students with their individual ability level taken into account. For example, most students (lowest ability) only discuss solving problems in general, as shown in Figure 8. If they cannot obtain sufficient help from forum discussions, they may be prone to drop out without further topic exploration. Therefore, in intervening with at-risk students, it is advisable to leverage discovered topics to better focus measures. Such services may be useful in preventing dropout in early stages (when most dropouts typically occur).

Table 5: Topics and difficulty levels, by TopicResponse on OPT MOOC.

| No. | Topics | Interpretation | Inferred difficulty |
| 1 | use time problem get solut one optim algorithm tri work | Solving in general | -2.30 |
| 2 | cours thank would lectur realli great assign good like think | Course feedback | -0.93 |
| 3 | python use run program solver java matlab instal command work | Python/Java/Matlab (How to start) | -0.63 |
| 4 | problem thank solut get grade knapsack got feedback optim solv | How knapsack problem is solved and graded | -0.44 |
| 5 | memori dp use column bb implement solv algorithm bound tabl | Comparing algorithms memory/time | 0.23 |
| 6 | color node graph random edg greedi opt search swap iter | Graph coloring | 0.31 |
| 7 | item valu weight capac estim take solut calcul best list | Knapsack problem | 0.33 |
| 8 | file pi line solver data submit lib urllib2 solveit open | Using solvers | 0.52 |
| 9 | video http class load lecture org problem coursera optimization 001 | Platform | 1.17 |
| 10 | submit assign assignment error messag view assignment_id detail class coursera | Assignment submission | 1.73 |

5.7 Parameter Sensitivity

To validate the robustness of TopicResponse to parameter settings, a series of sensitivity experiments were conducted. The parameter settings are shown in Table 4. Negative log-likelihoods, $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$, $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$ and $\|\mathbf{H}\circ\mathbf{H} - \mathbf{H}\|_F^2$ are examined in these experiments. Due to space limitations, we report here results for $\lambda_0$ on the OPT MOOC. The reader is referred to Appendix B for results on parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, $k$ on all three MOOCs.

Effect of Parameter $\lambda_0$. As can be seen in Figure 9, as $\lambda_0$ is increased TopicResponse performs better in terms of negative log-likelihood, and performs worse in terms of the other three metrics due to the regularisation on the Rasch model. By contrast, the performance of GG-NMF does not change as there is no regularisation term on its Rasch estimation. Overall, TopicResponse performs well when $\lambda_0$ varies between 0.1 and 0.2.
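The two Rasch fit measures reported throughout Section 5, the negative log-likelihood and item infit, can be sketched as follows. This is an illustration under assumptions rather than the evaluation code used for the experiments: the function names are hypothetical, $\mathbf{H}$ is assumed to be a binary $k \times n$ topic-student matrix, and infit is computed as the standard information-weighted mean square (squared residuals divided by the summed Rasch variances), which is assumed to match the definition in Section 3.1.2.

```python
import numpy as np


def rasch_prob(theta, beta):
    """p[i, j]: Rasch probability that student j responds positively to item i."""
    return 1.0 / (1.0 + np.exp(-(theta[None, :] - beta[:, None])))


def negative_log_likelihood(H, theta, beta):
    """Negative Rasch log-likelihood of binary responses H (items x students)."""
    z = theta[None, :] - beta[:, None]
    # log(1 + exp(z)) is computed stably with logaddexp.
    return float(np.sum(np.logaddexp(0.0, z) - H * z))


def item_infit(H, theta, beta):
    """Information-weighted mean-square fit per item (assumed definition)."""
    p = rasch_prob(theta, beta)
    return ((H - p) ** 2).sum(axis=1) / (p * (1.0 - p)).sum(axis=1)


def misfitting_items(infit, low=0.7, high=1.3):
    """Indices of items whose infit falls outside the acceptance range."""
    infit = np.asarray(infit)
    return np.where((infit < low) | (infit > high))[0]


if __name__ == "__main__":
    H = np.array([[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]], dtype=float)
    theta, beta = np.zeros(4), np.zeros(4)
    print(negative_log_likelihood(H, theta, beta))
    print(item_infit(H, theta, beta))
```

The default acceptance range in `misfitting_items` is the [0.7, 1.3] interval used in the paper; items flagged by it would be candidates for further refinement.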
Fig. 9: Performance of GG-NMF and TopicResponse on OPT with varying $\lambda_0$. Panels: (a) negative log-likelihood, (b) $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$, (c) $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$, (d) $\|\mathbf{H}\circ\mathbf{H} - \mathbf{H}\|_F^2$.

6 Conclusion and Future Work

We have examined the suitability of content-based items (topics) discovered from MOOC forum discussions for modelling student abilities. Our central tenet is that topics can be regarded as useful items for measuring latent skills, if student responses to these topics fit the Rasch item-response theory model, and if the discovered topics are further interpretable to domain experts. We propose to jointly optimise NMF and Rasch modelling, in order to discover Rasch-scaled topics. We provide a quantitative validation on three Coursera MOOCs, demonstrating that TopicResponse yields better global fit to the Rasch model (observed with lower negative log-likelihood) and maintains good quality of factorisation approximation, while measuring the students' academic abilities (reflected by the grade-guided constraint on students' participation on topics). We also provide qualitative examination of topic interpretation with inferred difficulty levels on a Discrete Optimisation MOOC. The results on goodness of fit and our qualitative examination together suggest potential applications in curriculum refinement, student assessment and personalised feedback.
We opted to study the relatively simple Rasch model, as it forms the basis of very many subsequent models in the literature. One direction for extension is that any model (like Rasch) that fits parameters via maximum-likelihood estimation (or risk minimisation in general) can be augmented with NMF as an additional regularisation. For example, such an extension should be straightforward for polychotomous observations, hierarchical models on latent skills, models that include more flexible per-student variation, etc. These represent fruitful directions for future research. Another possible extension could involve augmenting the $\mathbf{W}$, $\mathbf{H}$ matrices in the NMF or Rasch objective terms with manually-crafted items, to make effective use of prior knowledge.

Acknowledgements We thank Jeffrey Chan for discussions related to this work, and the anonymous reviewers and editor for their thoughtful feedback. This work is supported by Data61, and the Australian Research Council (DE160100584).

References

Bachrach Y, Graepel T, Minka T, Guiver J (2012) How to grade a test without knowing the answers—a Bayesian graphical model for adaptive crowdsourcing and aptitude testing. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp 1183–1190
Baker FB, Kim SH (2004) Item response theory: Parameter estimation techniques. CRC Press
Bergner Y, Droschler S, Kortemeyer G, Rayyan S, Seaton D, Pritchard DE (2012) Model-based collaborative filtering analysis of student response data: Machine-learning item response theory. International Educational Data Mining Society
Bertsekas DP (1999) Nonlinear programming. Athena Scientific
Bond TG, Fox CM (2001) Applying the Rasch model: Fundamental measurement in the human sciences. Lawrence Erlbaum Associates Publishers
Champaign J, Colvin KF, Liu A, Fredericks C, Seaton D, Pritchard DE (2014) Correlating skill and improvement in 2 MOOCs with a student's time on tasks. In: Proceedings of the First ACM Conference on Learning@Scale Conference, ACM, pp 11–20
Chaturvedi S, Goldwasser D, Daumé III H (2014) Predicting instructor's intervention in MOOC forums. In: ACL (1), pp 1501–1511
Chen CM, Lee HM, Chen YH (2005) Personalized e-learning system using item response theory. Computers & Education 44(3):237–255
Colvin KF, Champaign J, Liu A, Fredericks C, Pritchard DE (2014) Comparing learning in a MOOC and a blended on-campus course. In: Educational Data Mining 2014
Gillani N, Eynon R, Osborne M, Hjorth I, Roberts S (2014) Communication communities in MOOCs. arXiv preprint arXiv:14034640
Guttman L (1950) The basis for scalogram analysis. In: Stouffer S (ed) Measurement and Prediction: The American Soldier, Wiley, New York
He J, Rubinstein BI, Bailey J, Zhang R, Milligan S, Chan J (2016) MOOCs meet measurement theory: A topic-modelling approach. In: Thirtieth AAAI Conference on Artificial Intelligence
Jenders M, Krestel R, Naumann F (2016) Which answer is best? Predicting accepted answers in MOOC forums. WWW'2016 Companion
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems, pp 556–562
Linacre JM (2002) What do infit and outfit, mean-square and standardized mean. Rasch Measurement Transactions 16(2):878
Linacre JM (2006) Misfit diagnosis: Infit outfit mean-square standardized. Retrieved June 1:2006
Milligan S (2015) Crowd-sourced learning in MOOCs: Learning analytics meets measurement theory. In: Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, ACM, pp 151–155
Ramesh A, Kumar SH, Foulds J, Getoor L (2015) Weakly supervised models of aspect-sentiment for online course discussion forums. In: Annual Meeting of the Association for Computational Linguistics (ACL)
Rasch G (1993) Probabilistic models for some intelligence and attainment tests. ERIC
Scholten AZ (2011) Admissible statistics from a latent variable perspective. Theory & Psychology 18:111–117
Wen M, Yang D, Rose C (2014) Sentiment analysis in MOOC discussion forums: What does it tell us? In: Educational Data Mining 2014
Wright BD, Masters GN (1982) Rating Scale Analysis. Rasch Measurement. ERIC
Wright BD, Linacre JM, Gustafson J, Martin-Lof P (1994) Reasonable mean-square fit values. Rasch Measurement Transactions 8(3):370
Yang D, Adamson D, Rosé CP (2014) Question recommendation with constraints for massive open online courses. In: Proceedings of the 8th ACM Conference on Recommender systems, ACM, pp 49–56
Yang D, Wen M, Howley I, Kraut R, Rose C (2015) Exploring the effect of confusion in discussion forums of massive open online courses. In: Proceedings of the Second (2015) ACM Conference on Learning@ Scale, ACM, pp 121–130
Zhang Z, Ding C, Li T, Zhang X (2007) Binary matrix factorization with applications. In: Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on, IEEE, pp 391–400
Zhang ZY, Li T, Ding C, Ren XW, Zhang XS (2010) Binary matrix factorization for analyzing gene expression data. Data Mining and Knowledge Discovery 20(1):28–52

A Proof of Theorem 1

The update rules for $\beta_i$ and $\theta_j$ are derived using the Newton-Raphson method (Bertsekas, 1999), where the convergence to a local optimum is guaranteed. Here, we focus on the proof for the update rule for $h_{ij}$. The update rule for $w_{ij}$ can be proved similarly.
We closely follow the procedure described in (Lee and Seung, 2001), where an auxiliary function similar to that used in the Expectation-Maximization (EM) algorithm is used for the proof.

Definition 2 (Lee and Seung 2001) $G(h, h')$ is an auxiliary function for $F(h)$ if the conditions
$$G(h, h') \ge F(h), \qquad G(h, h) = F(h),$$
are satisfied.

Lemma 1 (Lee and Seung 2001) If $G$ is an auxiliary function, then $F$ is non-increasing under the update
$$h^{t+1} = \operatorname*{argmin}_h G(h, h^t). \tag{18}$$

Proof The result follows from noting $F(h^{t+1}) \le G(h^{t+1}, h^t) \le G(h^t, h^t) = F(h^t)$. $\square$

For any element $h_{ij}$ in $\mathbf{H}$, let $F_{h_{ij}}$ denote the part of $f(\mathbf{W},\mathbf{H},\boldsymbol{\theta},\boldsymbol{\beta})$ in Eq. (12) relevant to $h_{ij}$. Since the update is essentially element-wise, it is sufficient to show that each $F_{h_{ij}}$ is non-increasing under the update rule of Eq. (15). To prove this, we define the auxiliary function regarding $h_{ij}$ as follows.

Lemma 2 Function
$$G(h_{ij}, h^t_{ij}) = F_{h_{ij}}(h^t_{ij}) + F'_{h_{ij}}(h^t_{ij})(h_{ij} - h^t_{ij}) + \phi_{ij}(h_{ij} - h^t_{ij})^2, \tag{19}$$
where
$$\phi_{ij} = \frac{2(\mathbf{W}^T\mathbf{W}\mathbf{H})_{ij} + 2\lambda_1(\mathbf{1}_r^T\mathbf{1}_r\mathbf{H})_{ij} + 12\lambda_2(h^t_{ij})^3 + 2\lambda_2 h^t_{ij} + \lambda_3(\theta-\beta)^-_{ij}}{2h^t_{ij}},$$
is an auxiliary function for $F_{h_{ij}}$.

Proof It is obvious that $G(h_{ij}, h_{ij}) = F_{h_{ij}}$. So we need only prove that $G(h_{ij}, h^t_{ij}) \ge F_{h_{ij}}$. Considering the Taylor series expansion of $F_{h_{ij}}$,
$$F_{h_{ij}} = F_{h_{ij}}(h^t_{ij}) + F'_{h_{ij}}(h^t_{ij})(h_{ij} - h^t_{ij}) + \tfrac{1}{2}F''_{h_{ij}}(h^t_{ij})(h_{ij} - h^t_{ij})^2,$$
$G(h_{ij}, h^t_{ij}) \ge F_{h_{ij}}$ is equivalent to $\phi_{ij} \ge \tfrac{1}{2}F''_{h_{ij}}(h^t_{ij})$, where
$$F''_{h_{ij}}(h^t_{ij}) = 2(\mathbf{W}^T\mathbf{W})_{ii} + 2\lambda_1(\mathbf{1}_r^T\mathbf{1}_r)_{ii} + 12\lambda_2(h^t_{ij})^2 - 12\lambda_2 h^t_{ij} + 2\lambda_2.$$
To prove the above inequality, we have
$$\begin{aligned}
\phi_{ij}h^t_{ij} &= (\mathbf{W}^T\mathbf{W}\mathbf{H})_{ij} + \lambda_1(\mathbf{1}_r^T\mathbf{1}_r\mathbf{H})_{ij} + 6\lambda_2(h^t_{ij})^3 + \lambda_2 h^t_{ij} + 0.5\lambda_3(\theta-\beta)^-_{ij} \\
&= \sum_{l=1}^{k}(\mathbf{W}^T\mathbf{W})_{il}h^t_{lj} + \lambda_1\sum_{l=1}^{k}(\mathbf{1}_r^T\mathbf{1}_r)_{il}h^t_{lj} + 6\lambda_2(h^t_{ij})^3 + \lambda_2 h^t_{ij} + 0.5\lambda_3(\theta-\beta)^-_{ij} \\
&\ge (\mathbf{W}^T\mathbf{W})_{ii}h^t_{ij} + \lambda_1(\mathbf{1}_r^T\mathbf{1}_r)_{ii}h^t_{ij} + 6\lambda_2(h^t_{ij})^3 + \lambda_2 h^t_{ij} - 12\lambda_2 h^t_{ij} \\
&\ge h^t_{ij}\left((\mathbf{W}^T\mathbf{W})_{ii} + \lambda_1(\mathbf{1}_r^T\mathbf{1}_r)_{ii} + 6\lambda_2(h^t_{ij})^2 - 6\lambda_2 h^t_{ij} + \lambda_2\right) \\
&= \tfrac{1}{2}F''_{h_{ij}}(h^t_{ij})\,h^t_{ij}.
\end{aligned}$$
Thus, $G(h_{ij}, h^t_{ij}) \ge F_{h_{ij}}$ as claimed. $\square$

Replacing $G(h_{ij}, h^t_{ij})$ in Eq. (18) by Eq. (19) and setting $\partial G(h_{ij}, h^t_{ij})/\partial h_{ij}$ to be 0 results in the update rule in Eq. (15). Since Eq. (19) is an auxiliary function, $F_{h_{ij}}$ is non-increasing under this update rule.

B Experimental Results of Parameter Sensitivity on Regularisation Parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, and $k$

a) Effect of Parameter $\lambda_1$: As we can see from Figure 11, GG-NMF and TopicResponse are not sensitive to $\lambda_1$, performing stably with varying $\lambda_1$. TopicResponse consistently performs better in terms of negative log-likelihood while maintaining comparable performance in terms of the other three metrics.

b) Effect of Parameter $\lambda_2$: It can be seen from Figure 12 that GG-NMF and TopicResponse perform well in terms of $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$ (Figure 12d to Figure 12f) and $\|\mathbf{H}\circ\mathbf{H} - \mathbf{H}\|_F^2$ (Figure 12j to Figure 12l) when $\lambda_2$ varies from $10^0$ to $10^2$, and from $10^{-3}$ to $10^0$ respectively. $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$ gets worse as $\lambda_2$ increases, but does not change a lot compared to $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$ and $\|\mathbf{H}\circ\mathbf{H} - \mathbf{H}\|_F^2$. As $\lambda_2$ increases, the performance of GG-NMF and TopicResponse in terms of negative log-likelihood decreases, and TopicResponse consistently performs better than GG-NMF. Overall, $\lambda_2$ with values around 1.0 is good for GG-NMF and TopicResponse.
c) Effect of Parameter $\lambda_3$: It can be seen that GG-NMF and TopicResponse perform well in terms of $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$ (Figure 13d to Figure 13f) and $\|\mathbf{H}\circ\mathbf{H} - \mathbf{H}\|_F^2$ (Figure 13j to Figure 13l) when $\lambda_3$ varies from $10^{-1}$ to $10^0$, and from $10^0$ to $10^2$ respectively. Similar to $\lambda_2$, $\lambda_3$ does not affect $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$ significantly. TopicResponse consistently achieves better negative log-likelihood than GG-NMF. Overall, $\lambda_3$ with values around 1.0 is good for GG-NMF and TopicResponse.

d) Effect of the number of topics $k$: It can be seen from Figure 14 that TopicResponse consistently outperforms GG-NMF in terms of negative log-likelihood, while getting slightly worse performance in the other three metrics. This is reasonable, as TopicResponse has more constraints and hence the model itself is less likely to perform as well as the less constrained GG-NMF in other metrics. Overall, GG-NMF and TopicResponse perform well in the experiments when $k$ is set to 10 or 15. We choose 10 as the value of $k$ since a smaller number of topics is easier to analyse.
Fig. 10: Performance of GG-NMF and TopicResponse with varying $\lambda_0$. Panels (a) to (c): negative log-likelihood on EDU, ECON and OPT; (d) to (f): $\|\mathbf{1}_r\mathbf{H} - \mathbf{H}_{\mathrm{ideal}}\|_F^2$ on EDU, ECON and OPT; (g) to (i): $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$ on EDU, ECON and OPT; (j) to (l): $\|\mathbf{H}\circ\mathbf{H} - \mathbf{H}\|_F^2$ on EDU, ECON and OPT.

Fig. 11: Performance of GG-NMF and TopicResponse with varying $\lambda_1$. Panels as in Fig. 10.

Fig. 12: Performance of GG-NMF and TopicResponse with varying $\lambda_2$. Panels as in Fig. 10.

Fig. 13: Performance of GG-NMF and TopicResponse with varying $\lambda_3$. Panels as in Fig. 10.

Fig. 14: Performance of GG-NMF and TopicResponse with varying $k$. Panels as in Fig. 10.
