Time-Series Adaptive Estimation of Vaccination Uptake Using Web Search Queries

Estimating vaccination uptake is an integral part of ensuring public health. It was recently shown that vaccination uptake can be estimated automatically from web data, instead of slowly collected clinical records or population surveys. All prior wor…

Authors: Niels Dalum Hansen, Kåre Mølbak, Ingemar J. Cox, Christina Lioma

Time-Series Adaptive Estimation of Vaccination Uptake Using Web Search Queries

Niels Dalum Hansen, University of Copenhagen/IBM, nhansen@di.ku.dk
Kåre Mølbak, Statens Serum Institut, KRM@ssi.dk
Ingemar J. Cox, University of Copenhagen, ingemar.cox@di.ku.dk
Christina Lioma, University of Copenhagen, c.lioma@di.ku.dk

ABSTRACT

Estimating vaccination uptake is an integral part of ensuring public health. It was recently shown that vaccination uptake can be estimated automatically from web data, instead of slowly collected clinical records or population surveys [2]. All prior work in this area assumes that features of vaccination uptake collected from the web are temporally regular. We present the first ever method to remove this assumption from vaccination uptake estimation: our method dynamically adapts to temporal fluctuations in time series web data used to estimate vaccination uptake. We show our method to outperform the state of the art compared to competitive baselines that use not only web data but also curated clinical data. This performance improvement is more pronounced for vaccines whose uptake has been irregular due to negative media attention (HPV-1 and HPV-2), problems in vaccine supply (DiTeKiPol), and targeted at children of 12 years old (whose vaccination is more irregular compared to younger children).

1. INTRODUCTION AND RELATED WORK

Vaccination programs are an efficient and cost effective method to improve public health. With sufficiently many people vaccinated the population gains herd immunity, meaning the disease cannot spread.
Timely actions to avoid drops in vaccination coverage are therefore of great importance. Many countries have no registries of timely vaccination uptake information, but rely for example on yearly surveys. In such countries estimations of near real-time vaccination uptake based solely on web data are valuable.

We extend prior work in this area [2], which showed that vaccination uptake can be estimated sufficiently accurately from web search data. Our extension consists of a new estimation method that adapts dynamically to temporal fluctuations in the signal (web search queries in our case) instead of assuming temporal stationarity as in [2]. This contribution is novel within vaccination uptake estimation.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. WWW 2017, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4913-0/17/04. http://dx.doi.org/10.1145/3041021.3054251

Linear models have been used previously to estimate health events, for instance by combining data from multiple sources with an ensemble of decision trees [5], or, closer to our work, by using query frequencies for influenza like illness [1] or vaccination uptake estimation [2]. These approaches are designed for stationary time-series analysis, i.e. they assume data is generated by a stationary stochastic process. Our motivation is that vaccination uptake often does not follow stationary seasonal patterns.
External events such as disease outbreaks, suspicion of adverse effects, or temporary vaccine shortages can alter uptake patterns for shorter or longer periods of time. Hence, while historical data is a good estimator in stable periods, as shown in [2], we reason that adapting the estimation to any instability can reduce estimation error. We experimentally confirm this on all official children vaccines data used in Denmark between 2011-2016.

2. AGGREGATION WITH REGRESSION TREES FOR TIME SERIES ADAPTATION

To account for seasonal non-stationarity, we use an online learning method, Aggregation Algorithm (AA) [6], designed to automatically reduce estimation error in a changing environment. AA was recently used in time series prediction combined with an ensemble of ARIMA models [3]. However, we reason that ARIMA models, or other traditional time series models, are not likely to be sufficient for vaccination uptake estimation in cases where: (i) there is more than one data source, e.g. vaccine uptake data and search frequency data, and (ii) when the time series data to be estimated are assumed to be unavailable (near real-time). To address these challenges, we combine AA with regression trees, motivated by recent research showing that random forests outperform ARIMA models on avian influenza prediction [4]. A random forest, i.e. an ensemble of decision trees, is well suited for our problem since it is easy to extend to multiple data sources.

Our method works as follows: We initially generate a set of regression trees.
For each time step the ensemble of regression trees is retrained based on the initial set of trees and a weighted sum is used to make the estimation. AA is used to continuously update the weights of each tree.

Each regression tree is trained based on a set of features and training samples. For each tree a feature set is drawn with replacement from the complete feature set. Training samples are selected based on time-relative indices, where index 0 corresponds to the current time step. The indices are uniformly drawn with replacement from the interval [0 : s], where s is a window size. We use trees with different window sizes to account for stationarity and non-stationarity of the signal.

Our adaptive vaccination estimation algorithm is shown in Algorithm 1, where η is the learning rate, RT a set of N regression trees, i the amount of initial training data and y the vaccination uptake.

3. EVALUATION

To facilitate direct comparison, we evaluate our method on the same data as [2]: monthly vaccination uptake of all official children vaccines in Denmark from January 2011 - June 2016. Vaccination uptake is defined as the total number of people vaccinated in a month divided by the birth cohort for that month. To estimate vaccination uptake, we use frequencies of web search queries extracted from Google Trends. We use the exact same frequencies of single term queries provided by [2].

We compare to two baselines: (1) Linear regression with lasso regularization where the hyper-parameter is found using three fold cross-validation on the training data; (2) Linear regression with elastic net regularization where the two hyper-parameters are also selected using three fold cross-validation.

Algorithm 1 Adaptive time series estimation
Require: RT, η, i
    W ← list with weights, initialize to be uniform
    X ← list with the first i training samples
    Y ← list with the first i observations of y
    Ŷ ← empty list of estimations
    t ← current time step, starting at i + 1
    while True do
        x_t ← receive new observation from data stream
        for n = 0 to N do
            Train RT[n] using X and Y
            Ŷ_temp[n] ← estimation of RT[n] given x_t
        end for
        Ŷ[t] ← Σ_{n=0}^{N} W[n] · Ŷ_temp[n]
        Y[t] ← observed y at time t
        for n = 0 to N do
            W[n] ← W[n] · exp(−η · (Ŷ_temp[n] − Y[t])^2)
        end for
        W ← normalize W
        X[t] ← x_t
        t ← t + 1
    end while
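As a rough illustration of Algorithm 1, the sketch below implements its weighting loop in plain Python. It is a minimal sketch under stated assumptions, not the authors' implementation: `WindowMeanExpert` is a hypothetical stand-in for a regression tree (it ignores the features and averages uptake values drawn with replacement from time-relative indices in [0 : s]), the name `adaptive_estimation` is introduced here, the stream is finite rather than endless, and the reset to uniform weights on numerical underflow is an added safeguard that Algorithm 1 does not mention.

```python
import math
import random


class WindowMeanExpert:
    """Hypothetical stand-in for one regression tree (an assumption of this
    sketch): predicts the mean of targets sampled with replacement from the
    most recent observations."""

    def __init__(self, window, rng):
        self.window = window  # window size s
        self.rng = rng
        self.prediction = 0.0

    def fit(self, X, Y):
        # X is unused by this toy expert; a real tree would split on it.
        # Time-relative indices: index 0 is the most recent observation.
        s = min(self.window, len(Y) - 1)
        indices = [self.rng.randint(0, s) for _ in range(s + 1)]
        self.prediction = sum(Y[-1 - k] for k in indices) / len(indices)

    def predict(self, x):
        return self.prediction


def adaptive_estimation(experts, eta, X, Y, i):
    """Weighted-ensemble loop of Algorithm 1 over a finite stream.

    The first i samples are initial training data (i >= 1). Returns the
    estimates for time steps i, i + 1, ... and the final expert weights.
    """
    n = len(experts)
    weights = [1.0 / n] * n  # uniform initial weights
    estimates = []
    for t in range(i, len(Y)):
        x_t = X[t]
        predictions = []
        for expert in experts:
            expert.fit(X[:t], Y[:t])  # retrain on data seen before time t
            predictions.append(expert.predict(x_t))
        # The estimate is the weighted sum of the individual predictions.
        estimates.append(sum(w * p for w, p in zip(weights, predictions)))
        y_t = Y[t]  # the true value becomes available after estimating
        # Multiplicative update: experts with large squared error lose weight.
        weights = [w * math.exp(-eta * (p - y_t) ** 2)
                   for w, p in zip(weights, predictions)]
        total = sum(weights)
        if total == 0.0:  # added safeguard, not part of Algorithm 1
            weights = [1.0 / n] * n
        else:
            weights = [w / total for w in weights]
    return estimates, weights


if __name__ == "__main__":
    rng = random.Random(0)
    uptake = [10.0] * 30 + [50.0] * 30  # synthetic series with a level shift
    features = [[float(t)] for t in range(len(uptake))]
    experts = [WindowMeanExpert(1, rng), WindowMeanExpert(40, rng)]
    _, final_weights = adaptive_estimation(experts, 0.01, features, uptake, 24)
    print(final_weights)
```

On a synthetic series with an abrupt level shift, the short-window expert should end up with nearly all of the weight, which is the behaviour that mixing trees with different window sizes is meant to exploit.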
We also include for reference two upper bounds corresponding to the best score reported in [2] when using (i) only web data, and (ii) web data combined with clinical data. These scores are not theoretical upper bounds, but just the best scores across all methods evaluated in [2]. We treat them as performance upper bounds because they do not correspond to any individual method, but to the best score per vaccine across all methods reported in [2]. Neither the baselines nor the upper bounds account for time-series adaptation, i.e. they all assume data stationarity.

The initial number of training samples, i, is set to 24. All algorithms are evaluated in a leave-one-out fashion, where all data prior to the data point being estimated is used for training. For our algorithm (ATSE) a parameter search is performed by randomly sampling from the following intervals: window size 1-46, number of features derived from vaccination data 0-45, number of features derived from web data 0-30, number of regression trees 500-10000, η between 0.001-0.25.

Table 1 displays the root mean squared error (RMSE) between the estimated vaccination uptake and the real vaccination uptake for all methods. Our method yields the overall best performance compared to the baselines (it outperforms all baselines for 8 out of 12 vaccines). Our method also outperforms the upper bounds of [2] (any of the two) for 6 vaccines.
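The evaluation protocol described above can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names, the uniform sampling distribution, the inclusive interval endpoints, and the toy `last_value` model are assumptions made here. Only the interval limits, i = 24, training on data prior to the estimated point, and the use of RMSE come from the text.

```python
import math
import random


def rmse(estimates, observed):
    """Root mean squared error between estimated and observed uptake."""
    if len(estimates) != len(observed) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - o) ** 2 for e, o in zip(estimates, observed)]
    return math.sqrt(sum(squared) / len(squared))


def evaluate_on_prior_data(fit_predict, X, Y, i=24):
    """Estimate every point t >= i using only data strictly before t.

    fit_predict(X_train, Y_train, x_new) is a hypothetical callable that
    trains a model on the training data and returns its estimate for x_new.
    """
    estimates = [fit_predict(X[:t], Y[:t], X[t]) for t in range(i, len(Y))]
    return rmse(estimates, Y[i:])


def sample_parameters(rng):
    """Draw one ATSE configuration from the intervals given in the text.

    Uniform sampling and inclusive endpoints are assumptions of this sketch.
    """
    return {
        "window_size": rng.randint(1, 46),
        "n_vaccination_features": rng.randint(0, 45),
        "n_web_features": rng.randint(0, 30),
        "n_trees": rng.randint(500, 10000),
        "eta": rng.uniform(0.001, 0.25),
    }


def last_value(X_train, Y_train, x_new):
    """Toy model used only for illustration: repeat the last observed uptake."""
    return Y_train[-1]


if __name__ == "__main__":
    uptake = [float(t) for t in range(30)]  # synthetic, steadily rising series
    features = [[u] for u in uptake]
    print(evaluate_on_prior_data(last_value, features, uptake))
    print(sample_parameters(random.Random(0)))
```

A full parameter search would call `sample_parameters` repeatedly, run the estimator for each configuration through `evaluate_on_prior_data`, and keep the configuration with the lowest RMSE. How many configurations were sampled is not stated in the text.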
Vaccine        LASS    EN      ATSE    UBW [2]   UBWC [2]
HPV-1          14.6    13.8    10.0*   11.5       9.3
HPV-2          15.9    16.1    10.1    15.4       8.7
MMR-1          12.9    12.9    12.6*   16.5      14.9
MMR-2(4)       15.5    14.7    14.2    12.4      12.3
MMR-2(12)      21.7    21.4    16.0*   20.8      16.5
DiTeKiPol-1    16.2    16.2    10.8     8.0       4.6
DiTeKiPol-2    14.1    14.2    12.4     9.9       7.1
DiTeKiPol-3    10.8    11.1    10.0*   17.1      16.4
DiTeKiPol-4    13.7*   14.3*   14.4*   15.4      14.4
PCV-1           7.5     7.8    10.0     7.7       5.2
PCV-2           9.6*    9.5    10.0     9.6       6.4
PCV-3           9.4*    9.5*   10.1*   10.3       6.6

Table 1: Estimation error when estimating vaccination uptake from web search queries with our method (ATSE), Lasso (LASS), Elastic Net (EN), and the two performance upper bounds of [2] with web search (UBW) and web search and clinical data (UBWC). Bold marks best (excluding upper bounds). Asterisk marks better or equal to any upper bound.

These results support our reasoning that adapting the estimation to temporal fluctuations is a better strategy than assuming data stationarity.

Our method yields the strongest performance improvements for HPV-1, HPV-2, MMR-2(12) and DiTeKiPol. All of these vaccines have temporally irregular uptake patterns, as explained next. HPV-1 and HPV-2 have been subject to a heavy media debate in Denmark, with a subsequent drop in vaccinations. MMR-2(12) denotes the second MMR vaccine, targeted at 12-year-olds. As children grow, parents are less likely to follow the recommended vaccination schedule and fluctuations correlated with measles outbreaks are observed, thus making the time series less stationary.
Lastly, in recent years there have been problems obtaining a sufficient supply of certain DiTeKiPol vaccines in Denmark, which might have forced people to postpone the initial vaccination, hence introducing irregularities in the signal. For PCV vaccines there have been no noted irregularities in their uptake patterns, which explains the slight drop in performance by our method compared to the baselines.

4. CONCLUSION

We presented an automatic method for near real time estimation of health events using web search query data. Our method combines an Aggregation Algorithm (AA) to automatically reduce estimation error in changing environments with regression trees. We applied our method to estimate vaccination uptake in all official Danish children vaccines, following [2], and showed that our approach overall outperformed strong baselines that assumed data to be temporally regular. Our method was particularly strong estimating uptake for vaccines with known irregularities in their usage, such as HPV-1, HPV-2, MMR-2(12) and DiTeKiPol. This work confirms recent findings that vaccination uptake can be automatically estimated only from web data, and further extends this area by accounting for irregular uptake patterns.

Funded by IBM and the Danish Agency for Science & Higher Education.

5. REFERENCES

[1] S. Yang, M. Santillana, and S. Kou. Accurate estimation of influenza epidemics using google search data via argo. Proceedings of the National Academy of Sciences, 112(47):14473–14478, 2015.
[2] N. Dalum Hansen, C. Lioma, and K. Mølbak. Ensemble learned vaccination uptake prediction using web search queries. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, CIKM '16, pages 1953–1956, 2016.
[3] W. Jamil, Y. Kalnishkan, and H. Bouchachia. Aggregation algorithm vs. average for time series prediction. In Proceedings of the ECML PKDD 2016 Workshop on Large-scale Learning from Data Streams in Evolving Environments, STREAMEVOLV-2016, 9, 2016.
[4] M. J. Kane, N. Price, M. Scotch, and P. Rabinowitz. Comparison of arima and random forest time series models for prediction of avian influenza h5n1 outbreaks. BMC bioinformatics, 15(1):1, 2014.
[5] M. Santillana, A. T. Nguyen, M. Dredze, M. J. Paul, E. O. Nsoesie, and J. S. Brownstein. Combining search, social media, and traditional data sources to improve influenza surveillance. PLoS Comput Biol, 11(10):e1004513, 2015.
[6] V. Vovk. Competitive on-line statistics. International Statistical Review/Revue Internationale de Statistique, pages 213–248, 2001.
