Designing Intelligent Automation based Solutions for Complex Social Problems
Deciding effective and timely preventive measures against complex social problems affecting relatively low income geographies is a difficult challenge. There is a strong need to adopt intelligent automation based solutions with low cost imprints to t…
Authors: Sanjay Podder, Janardan Misra, Senthil Kumaresan
Sanjay Podder (SANJAY.PODDER@ACCENTURE.COM), Accenture Labs, Bangalore 560066 India
Janardan Misra (JANARDAN.MISRA@ACCENTURE.COM), Accenture Labs, Bangalore 560066 India
Senthil Kumaresan (SENTHIL.KUMARESAN@ACCENTURE.COM), Accenture Labs, Bangalore 560066 India
Neville Dubash (NEVILLE.DUBASH@ACCENTURE.COM), Accenture Labs, Bangalore 560066 India
Indrani Bhattacharya (INDRANI@CININDIA.ORG), CINI, Kolkata, India

2016 ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, New York, NY, USA. Copyright by the author(s).

Abstract

Deciding effective and timely preventive measures against complex social problems affecting relatively low income geographies is a difficult challenge. There is a strong need to adopt intelligent automation based solutions with low cost imprints to tackle these problems at larger scales. Starting with the hypothesis that analytical modelling and analysis of social phenomena with high accuracy is in general inherently hard, in this paper we propose a design framework to enable a data-driven, machine learning based, adaptive solution approach towards more effective preventive measures. We use survey data about adolescent girls, collected from a socio-economically backward region of India, to illustrate the design approach.

1. Introduction

Relatively low income societies in countries like India face a multitude of challenges (UNICEF-India, 2016), including low empowerment of weaker sections of society, poor health and low nutrition, low quality of education, poor child protection, and poor quality of sanitation and hygiene. To address these challenges and associated problems like child trafficking (Sarkar, Siddhartha, 2014; Chaitra Arjunpuri, 2013), child labor, domestic violence, etc.,
the community outreach program (COP) has emerged as one of the prominent methods adopted in many Low and Middle Income Countries (LMIC) to deliver healthcare services, monitor the condition of vulnerable populations, propagate information during disasters, and so on.

The advent of Social, Mobile, Analytics and Cloud based digital technologies and the penetration of smartphones in rural areas have enabled many organizations to upskill their community facilitators (CFs) through digital technologies. Existing literature (Introne et al., 2013; Schoder et al., 2014) has highlighted the challenges in implementing a digital solution for delivering high-quality outreach services and the system architecture guidelines that have to be followed to overcome those challenges.

Since internet penetration in rural areas is still relatively low, organizations have started to adopt Mobile based Decision Support Systems (MDSS) that can work without any internet connectivity. MDSSs have helped many organizations catering to outreach care: their in-built rule sets categorize the target population and relieve outreach workers of complex analysis based upon multiple guidelines.

We argue in this paper that, instead of a fixed rule set, a machine-learning based, dynamic, and context-aware computational model could help to provide improved quality of care through community outreach programs. However, the primary challenge in applying such data-driven approaches in the context of social problems is the lack of verifiable, good quality data. Governments and researchers rely on NGOs that deliver outreach services as a primary source of data. But primary data collected by many organizations may be unreliable, and most of the information is collected to meet only organizational needs and may be unsuitable otherwise.

1.1. Example Scenario: Child Trafficking

30 million people are trafficked globally every year. Traffickers often utilize mobile phones, social media, online classifieds, and other networking channels to interact with their circles about the victims. To counter this, many NGOs like CINI (Child In Need Institute) India (CINI - Child In Need Institute, 2016) are leveraging field agents to monitor the vulnerable population of adolescent girls between the ages of 10 and 19. CINI field agents periodically collect data about vulnerable girls on aspects pertaining to verticals like education, protection, health, and nutrition, and analyze the data along all these verticals together to determine potential vulnerabilities.

2. Proposal for a Data Driven Dynamically Adaptive Design Framework

To solve complex social problems (Introne et al., 2013; Miller & Page, 2009), particularly those affecting relatively low income geographies, there is a greater need to adopt intelligent computing solutions involving minimal cost imprint with maximum empowerment of the potential victims (PVs), who remain underpowered owing to various socio-economic factors. We will refer to such AI-driven platforms or applications designed for Social Good to address complex social challenges as AI4SG. Next we argue for the adoption of certain design themes while developing these AI4SG applications and platforms.

For illustration, we use survey data collected by CINI from a socio-economically backward region of India about 1000 adolescent girls, towards their vulnerability analysis and consequent mitigation. Details can be found in an earlier published study (Ghosh et al., 2015). Illustrative analysis of the data was carried out using R 3.2.4.

2.1. Design Thinking Proposals

Social phenomena are inherently hard to model accurately (Introne et al., 2013; Miller & Page, 2009).
The primary reason for this could be attributed to the large number and variety of factors affecting the phenomena under study in ways too complex to be fully understood. To further complicate the matter, in the context of social problems controlled studies cannot be performed for ethical reasons: actual social events cannot be artificially created and can only be analyzed when they occur naturally.

Therefore, approaches applying static and analytical solutions (e.g., closed form formula based vulnerability analysis (Ghosh et al., 2015)) cannot reliably generalize to larger contexts and might remain only locally relevant, where most of the parameters in the model are approximately fixed and attributes with high predictive power are known from experience. When generalization beyond local boundaries and large scale adoption are critical goals to achieve, a data-driven and machine learning based approach may provide an effective resolution to this problem. Under such a design framework, a computational model is generated (instead of a manually defined analytical model) from sample data collected from field studies, with a feature set designed in consultation with social scientists specializing in that field.

Figure 1. High Level Component Architecture of the Proposed Design Approach for the problem of Child Trafficking. (Components shown: Survey Data/Feature Matrix; Past Survey Data; External Data Source; Regional Factors; Cultural Factors; Socio-economic Factors; Adaptive Data Analysis; Highly Relevant Features; Adaptive ML Predictive Model; Continuous Learning; Vulnerability Predictor.)

We aim to evolve an approach to build a framework to design such models for various social problems.
The framework is primarily based upon a data-driven design methodology together with the application of AI technologies, to render the eventual solution amenable to wider adoption with low cost imprint and to serve priorities at multiple levels: from potential victims (e.g., children as potential targets of trafficking), to field workers, to NGOs and Government agencies interested in analyzing the impact of their services, and eventually to social scientists interested in scientifically studying the underlying phenomena at larger scales.

For notational convenience, we will use community facilitator, field agent, and agent interchangeably.

Design Proposal 1: Eventual data-driven machine learning based predictive modelling

Analytically designing solutions for complex social problems is inherently hard, and the only effective alternative is to design a model which optimally conforms to the data collected from the field. Machine learning provides an operational solution to this problem, where patterns underlying the data which could provide clues to solving the problem can be computationally extracted and used in designing mitigation strategies.

Often, solving social problems requires an ability to make predictions well before an actual event may take place (e.g., vulnerability prediction for the child trafficking problem) using analysis of factors affecting potential victims. From this perspective, classification and regression techniques provide the required predictive model, though initial design trials may be necessary to determine the right prediction technique or a combination of several.

However, acquiring sufficient good quality data to train machine learning models in the context of social problems is very difficult. For example, even though the survey data collected by CINI covers many interesting details (Ghosh et al., 2015) pertaining to the eventual vulnerability of children, it does not yet contain information about those who were actually reported to be trafficked. This may result in a cold-start problem if only an ML based model is to be used to design AI4SG applications.

For this reason, this design proposal suggests that an ML based data-driven solution should be the eventual design goal; in order to start the application in field work by CFs (and PVs) and to gain their trust, one needs to have alternative heuristic solutions resulting from prior field experiences and designed in collaboration with social experts. For example, for the problem of child trafficking, Ghosh et al. (2015) discuss a linear convex model with 32 features along with a threshold to determine whether a child is vulnerable or not. This solution is currently being used by CFs of CINI to collect survey data and perform analysis.

However, it is important to add that with this proposal we strongly suggest that, except for feature engineering, the role of heuristic solutions should be minimized over time as more data gets acquired on a real-time basis (see next proposal), so that the final predictive model evolves by learning from actual data and less from the subjective experiences of solution designers, reducing biases and dependencies on AI4SG designers.

Design Proposal 2: Real-time continuous learning based dynamic evolution of predictive model

To motivate this design choice, let us consider a hypothetical scenario related to the human trafficking use case. In this scenario, there have recently been cases of child trafficking in a locality during election time; however, not all of the victims had been predicted to be vulnerable by the existing model. Therefore, to update the underlying prediction model, new data needs to be sent to its designers, which would then involve a new cycle of updating the predictive model and reloading it to the agent devices on a periodic basis.
Often such solutions, even if built using ML techniques, require centralized offline updates of the predictive model, and AI4SG applications running on agent devices cannot adapt themselves at run-time when new cases of actual victims become known.

Towards that, we suggest that solutions for social problems must be designed as continuously adaptive applications which learn (from potentially incomplete data) while in actual use, by retraining themselves automatically when information about new actual incidents is entered on the agent device running the application. Eventually, over time, each agent would have evolved her own unique predictive model based upon the incidents of trafficking known in her area and other cases where such trafficking did not take place for a known period of time. Applications should also update their prior predictions after improved training and send alerts about all those who are now in the danger zone but earlier were not.

Additionally, the agent device or a central server should be designed to analyze updated field data to infer which factors are becoming increasingly critical in the light of new incidents, so that the right mitigation strategies can be designed or existing ones adapted to meet the requirements of the emerging scenarios. For example, based upon these updated predictions, an AI4SG application for child trafficking should send alerts to all the registered children (and/or their caretakers) and community facilitators regarding changes in the mitigation strategies.

Design Proposal 3: Structural analysis of data using feature correlations, similarities, and clustering

Collecting details about actual victims of social problems is a known challenge (Introne et al., 2013), primarily because these victims are often out of reach for any examination and only indirect data points can be collected, with considerable effort.
On the other hand, data for non-victims is relatively easier to acquire, but it only makes the design of the prediction model harder owing to the inherent bias towards the non-victim class. An additional difficulty arises because, when a prediction model is used in practice, its predictions control mitigation strategies, which further bias the population towards its predictions and hence make it harder to know to what extent such a model is inherently accurate.

Under such a scenario, unsupervised ML techniques should be used for complementary analysis of the data even as new data points get added. Examples of such analysis are considered next.

Cluster and Similarity Analysis: Similarities among potential victims can be used to cluster them into social groups by applying clustering techniques, and to identify outliers. For example, a safety profile containing only those factors which may render a potential victim highly vulnerable could be defined, and all the known PVs having similar profiles within the same locality can be socially connected with each other so that they can work as a group to address their vulnerabilities together (see Figure 5).

Similarly, clustering analysis can be used to determine whether certain details about a new PV are far away from those of others in the same locality. (Note that in low income geographies, high levels of social similarity within the same locality are a commonly observed phenomenon.) If so, the AI4SG application alerts the agent to the factors where high deviations are present.

The similarity graphs or clusters can be further augmented with contextual knowledge about external environmental factors affecting the underlying phenomena (e.g., a large scale religious gathering making trafficking of children easier for anti-social elements (Chaitra Arjunpuri, 2013)).
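
As a rough illustration of the locality-deviation check described above, the sketch below flags the factors on which a new PV differs strongly from other PVs in the same locality. It is a minimal sketch under stated assumptions, not the analysis used in this paper: a simple per-feature z-score stands in for a full clustering technique, the feature names and the threshold of 2.0 are hypothetical, and survey answers are assumed to be numerically coded.

```python
from statistics import mean, pstdev


def deviating_factors(locality_records, new_record, z_threshold=2.0):
    """Return {feature: z_score} for features on which new_record deviates
    strongly from the records already known for the same locality.

    z_threshold is an assumed cut-off, not a value taken from the paper.
    """
    flagged = {}
    for feature, value in new_record.items():
        values = [record[feature] for record in locality_records]
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # Everyone in the locality gave the same answer, so any
            # different answer counts as a deviation.
            if value != mu:
                flagged[feature] = float("inf") if value > mu else float("-inf")
            continue
        z = (value - mu) / sigma
        if abs(z) > z_threshold:
            flagged[feature] = z
    return flagged


# Hypothetical, numerically coded survey answers for one locality.
locality = [
    {"school_days_per_week": 5, "meals_per_day": 3},
    {"school_days_per_week": 5, "meals_per_day": 2},
    {"school_days_per_week": 4, "meals_per_day": 3},
    {"school_days_per_week": 5, "meals_per_day": 3},
]
new_pv = {"school_days_per_week": 0, "meals_per_day": 3}

# Factors the application would surface to the agent for this new PV.
alerts = deviating_factors(locality, new_pv)
```

Here only the school attendance factor would be surfaced to the agent, since the meals answer is close to what is typical for the locality. A distance-to-cluster criterion computed on the full feature vector could replace the per-feature check once enough records exist for a locality.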
Such augmented graphs (a type of knowledge graph) can further assist in taking timely preventive measures as per the emerging contexts.

For illustration, when we applied hierarchical clustering on the CINI data in a 17 dimensional PCA space, an outlier cluster with 7 sample points emerged, which on closer analysis turned out to involve invalid value ranges for certain attributes. Figure 2 illustrates the hierarchical clustering as a dendrogram.

Figure 2. Hierarchical Clustering of Data Points with PCA Factor Map with 7 Clusters

Further similarity analysis of the data revealed interesting insights into the nature of the collected data samples. Figure 3 illustrates this. For example, in approximately 48% of the cases, a girl's data had at least one other identical data point present in the samples. Furthermore, the collected data turned out to be highly biased towards high similarity regions and lacking samples in the rest of the similarity spectrum: fewer than 5% of pairs of girls had similarities below 70%. This indicates a clear need for collecting survey data from multiple sources so that truly representative samples are available for designing predictive models and for further analysis.

Figure 3. Identifying distribution of Data Points on Similarity Scale

Feature Correlation Analysis: Comparison of positive and negative correlations among features can bring subtle insights into the intentional design of underlying choices. For example, Figure 4 depicts the overall correlogram with pair-wise correlations among the features (i.e., questions in the CINI survey data). In this correlogram, features are ordered by first principal component loadings. To complement this correlogram, Figure 4 also depicts the positive correlation graph among features as present in the survey data, with correlation strength of at least 0.5.

Figure 4. Correlogram depicting degree of associations among features and associated Positive Correlation Graph

This correlation analysis makes it clear which characteristics used in the underlying predictive model are actually related to each other and therefore possibly should be analyzed together. The correlation analysis should be further strengthened by Principal Component Analysis (PCA) for identifying the most explanatory features. For example, for the CINI data it turned out that the 1st principal component had only 21% explanatory power, and hence any single linear combination of features can achieve at most this much explanatory capacity. Another interesting insight has been that it would require at least 17 PCs to capture 85% of the variance in the data, implying that most of the originally designed survey questions for the CINI data were largely uncorrelated with one another. This may shed further light on the social dynamics of the underlying population being surveyed.

Design Proposal 4: Virtual agent based native spoken language interaction for decentralized empowerment of potential victims and for real-time data collection by CFs

Generally, community facilitators of COPs are required to visit in person and periodically collect data about potentially vulnerable sections of society (e.g., potentially vulnerable children). This could be a critical impediment to scaling such solutions. This design proposal envisions the use of native written and/or spoken language based virtual agents executing on the mobile devices of these CFs as well as (if feasible) of potential victims (or at places like school kiosks), so that CFs can interact with their MDSSs with relative ease, and each registered user (e.g., a girl child) having access to this virtual agent can send her current state to her assigned CF without requiring in-person interaction.
Effectively, each virtual agent on a potential victim's device acts as a proxy for the actual community facilitator, and thus enables CFs to stay connected to a larger number of potentially vulnerable victims (e.g., children) simultaneously. Figure 5 depicts a virtual agent based hypothetical advisory scenario on enrollment of a new member by a community facilitator. Such virtual agents may also be used to authenticate users, reduce deliberate or unintentional fudging of data, and enable emergency responses.

Figure 5. Illustrative Scenario where Virtual Assistant displays Vulnerability Analysis, Similarity based Clustering and recommends mitigation strategy to Community Facilitator on her mobile device

3. Conclusion

To address the grand challenges of complex social problems in low income geographies, this paper presents design proposals to implement purely data-driven machine learning based solutions for enabling dynamic decision making towards timely preventive measures which can be applied directly by field agents of community outreach programs. We primarily focused on the design of real-time continuous learning based predictive application scenarios and on structural analysis of data to enable fine grained analysis of the local population a field agent is responsible for. This needs to be augmented with additional design elements, including large scale data processing for wide scale adoption (Coulton et al., 2015), collective collaboration, and techniques for knowledge graph generation and their use in deciding preventive measures.

References

Chaitra Arjunpuri. India Faces Epidemic of Missing Children, 2013. URL http://www.aljazeera.com/indepth/features/2013/02/2013219121326666148.html.

CINI - Child In Need Institute. CINI in India, 2016. URL http://www.cini-india.org/.

Coulton, Claudia J, Goerge, Robert, Putnam-Hornstein, Emily, and de Haan, Benjamin. Harnessing Big Data for Social Good: A Grand Challenge for Social Work. 2015. URL http://aaswsw.org/wp-content/uploads/2015/12/WP11-with-cover.pdf.

Ghosh, Chiranjeeb, Kuntagod, Nataraj, Maitra, Anutosh, Paul, Sanjoy, and Bhattacharya, Indrani. Mitigating vulnerability of adolescent girls via innovative usage of digital technologies: Insights from a field trial. In 2015 2nd International Conference on Science and Social Research: SoBeS-Social and Behavioural Science (CSSR 2015-SoBeS-Social and Behavioural Science), Shah Alam, Malaysia, October 2015.

Introne, Joshua, Laubacher, Robert, Olson, Gary, and Malone, Thomas. Solving Wicked Social Problems with Socio-Computational Systems. KI-Künstliche Intelligenz, 27(1):45–52, 2013.

Miller, John H and Page, Scott E. Complex adaptive systems: An introduction to computational models of social life. Princeton University Press, 2009.

Sarkar, Siddhartha. Rethinking Human Trafficking in India: Nature, Extent and Identification of Survivors. The Round Table, 103(5):483–495, 2014.

Schoder, Detlef, Putzke, Johannes, Metaxas, Panagiotis Takis, Gloor, Peter A., and Fischbach, Kai. Information Systems for “Wicked Problems”. Business & Information Systems Engineering, 6(1):3–10, 2014. ISSN 1867-0202. URL http://dx.doi.org/10.1007/s12599-013-0303-3.

UNICEF-India. What we Do, 2016. URL http://unicef.in/Whatwedo.