Whose Knowledge Counts? Co-Designing Community-Centered AI Auditing Tools with Educators in Hawai‘i
Dora Zhao, Stanford University, Stanford, CA, USA, dorothyz@stanford.edu
Hannah Cha, Stanford University, Stanford, CA, USA, hcha417@stanford.edu
Michael J. Ryan, Stanford University, Stanford, CA, USA, michaeljryan@stanford.edu
Angelina Wang, Cornell Tech, New York, NY, USA, angelina.wang@cornell.edu
Rachel Baker-Ramos, Georgia Institute of Technology, Atlanta, GA, USA, rachelbaker@gatech.edu
Evyn-Bree Helekahi-Kaiwi, Ulu Lāhui Foundation, Honolulu, HI, USA, evyn-bree.hele-kaiwi@ululahui.org
Rebecca Diego, Ulu Lāhui Foundation, Honolulu, HI, USA, rebecca.diego@ululahui.org
Josiah Hester, Georgia Institute of Technology, Atlanta, GA, USA, josiah@gatech.edu
Diyi Yang, Stanford University, Stanford, CA, USA, diyiy@cs.stanford.edu

Abstract
Although generative AI is being deployed into classrooms with promises of aiding teachers, educators caution that these tools can have unintended pedagogical repercussions, including cultural misrepresentation and bias. These concerns are heightened in low-resource language and Indigenous education settings, where AI systems frequently underperform. We investigate these challenges in Hawai‘i, where public schools operate under a statewide mandate to integrate Hawaiian language and culture into education. Through four co-design workshops with 22 public school educators, we surfaced concerns about using generative AI in educational settings, particularly around cultural misrepresentation, and corresponding designs for auditing tools that address these issues. We find that educators envision tools grounded in specific Hawaiian cultural values and practices, such as tracing the genealogy of knowledge in source materials. Building on these insights, we conceptualize AI auditing as a community-oriented process rather than the work of isolated individuals, and discuss implications for designing auditing tools.
CCS Concepts
• Human-centered computing → Empirical studies in HCI.

Keywords
Indigenous technologies, end-user auditing, community-based research, generative AI in education

ACM Reference Format:
Dora Zhao, Hannah Cha, Michael J. Ryan, Angelina Wang, Rachel Baker-Ramos, Evyn-Bree Helekahi-Kaiwi, Rebecca Diego, Josiah Hester, and Diyi Yang. 2026. Whose Knowledge Counts? Co-Designing Community-Centered AI Auditing Tools with Educators in Hawai‘i. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26), April 13–17, 2026, Barcelona, Spain. ACM, New York, NY, USA, 24 pages. https://doi.org/10.1145/3772318.3790958

This work is licensed under a Creative Commons Attribution 4.0 International License. CHI ’26, Barcelona, Spain. © 2026 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-2278-3/2026/04. https://doi.org/10.1145/3772318.3790958

1 Introduction
Generative AI systems, such as large language models (LLMs), are being increasingly deployed for educational purposes. These technologies are touted to increase educators’ capacity to support students while maintaining high-quality instruction [62] and provide inexpensive ways to cater to diverse student populations [24, 33, 97]. Already, teachers have adopted generative AI to assist with many different tasks, including automated lesson planning [12, 44, 73, 75], brainstorming teaching strategies [69, 75, 88, 102], and adapting learning to increase student engagement [4, 136]. However, these technological solutions do not come without concerns: generative AI models are known to reproduce or exacerbate social and cultural biases by misrepresenting cultural content and overrepresenting Western narratives [91, 150]. For instance, when prompted with controversial questions, Anthropic’s Claude answers 11.7% more similarly to U.S. perspectives than Chinese ones [39]. This bias manifests in educational contexts as well. As shown in Fig.
1, generated outputs perpetuate dominant Western narratives, glorifying Captain Cook’s relationship with Hawai‘i and glossing over the known harms he caused [66, 131]. In doing so, these outputs reproduce imperial legacies, obscure the profound repercussions on Indigenous communities, and skew understandings of histories. These misrepresentations have outsized consequences when AI systems are deployed in education settings, as these technologies can shape both what students learn and how they learn, such as impacting critical-thinking abilities or social development [60]. As is the case for the example in Fig. 1, this risk is compounded in the context of low-resource languages and Indigenous education settings, where AI systems are more prone to inaccuracies and mistranslations [61, 122, 173]. However, while teachers want to be able to assess the quality and cultural relevance of AI-generated outputs, they often lack the time, tools, or training to make this tractable [23, 74, 98, 109, 117].

Figure 1: Output from ChatGPT when queried about Captain James Cook’s relationship with Hawai‘i presents a more sanitized depiction of Cook, failing to highlight the negative impacts he had on Hawai‘i [66, 131].

One hope is that model providers can proactively intervene to address these concerns. In practice, however, many institutional barriers hinder the adoption of responsible AI practices, making top-down approaches unreliable [68, 156]. A fruitful alternative is bottom-up efforts in which end users act as auditors, surfacing harmful algorithmic behaviors they encounter in their day-to-day interactions [37, 140]. Since end users possess situated knowledge about how tools are used in practice, they can identify harms practitioners or outsiders would overlook [92, 100, 139, 172].
Yet, many end users lack the technical expertise to conduct systematic audits independently, highlighting the need for auditing tools designed to scaffold their efforts [35, 83]. While prior end-user auditing systems have been proposed for more general settings, educational contexts introduce additional challenges: K–12 teachers face high-stakes environments, understaffed classrooms [111], and overwhelming workloads [48], alongside culturally situated demands to ensure AI-generated outputs align with curricular standards and pedagogical practices. These conditions raise critical questions about how auditing practices can be effectively integrated into educational contexts, such as what concerns audits should prioritize and what tools are best suited to support teachers.

To make progress on answering these questions, we conduct a case study on co-designing AI auditing tools with public school educators in O‘ahu, Hawai‘i. We situate our study in Hawai‘i due to the unique intersection of educational and cultural challenges that arise when integrating generative AI systems in this setting. Public school educators in Hawai‘i are rapidly adopting AI systems for pedagogical purposes [63]. At the same time, public schools in Hawai‘i operate under a government mandate to incorporate Hawaiian language and culture [114, 146], and Hawai‘i is one of the few states to have public language immersion programs in an Indigenous language [67], ‘Ōlelo Hawai‘i [122]. The dual pressure of harnessing generative AI for pedagogical benefit while avoiding cultural inaccuracies has already created tension among educators [98]. By focusing on this intersection, our case study surfaces how educators navigate both pedagogical and sociotechnical concerns regarding generative AI in culturally sensitive contexts.

In this work, we conducted four co-design workshops with 22 public school educators.
During the workshops, participants reflected on their current uses of generative AI, discussed instances of cultural misrepresentation they encountered, and engaged in design exercises to ideate potential auditing tools. The proposed tools centered on three priorities: identifying sources for generated outputs, distinguishing whether Western narratives or Kanaka (Native Hawaiian) experiences were represented, and flagging problematic model outputs. While these features resemble those supported by general-purpose auditing tools, our findings uncover the nuances behind participants’ requests that were shaped by community values and practices, impacting the resulting designs. For example, unlike conventional attribution or retrieval-based approaches [28, 104, 112], participants emphasized the importance of tracing the genealogy of sources given its central role in Hawaiian cultural practice.

Our findings on the specific harms and designs surfaced in our workshops are tied to a select group of educators in O‘ahu and may not generalize to other settings. However, our broader insight — that general-purpose auditing processes often fail to accommodate local epistemologies, values, and knowledge systems — extends beyond this context. Drawing from decolonial perspectives within HCI [3, 5, 164], we argue that designing effective auditing practices requires centering the plurality of community-specific ways of knowing rather than assuming a universal, Western default.
Under this framing, we argue for reorienting end-user auditing as a community-oriented process rather than a task carried out by isolated individuals, and propose three design recommendations: (1) determining who should participate as auditors, (2) developing infrastructures that embed community values, and (3) reorienting the desired outcomes of audits. In total, we contribute the following:

• Dimensions of cultural misrepresentation and corresponding auditing tool designs: We provide empirical insights from our co-design workshops on the types of cultural misrepresentations educators encounter when using generative AI and highlight auditing tools they proposed to address these concerns.

• Guidelines for designing community-centered auditing tools and infrastructures: We argue for framing auditing as a community-oriented process. Using decoloniality as a lens, we discuss design implications related to supporting diverse community perspectives, involving community members in the creation of auditing tools, and reconsidering the intended outcomes for AI auditing.

2 Background
In this section, we provide background on Hawai‘i and its public education system to contextualize the relevance and importance of this study.

2.1 Social and Cultural Context: Hawai‘i
We situate our study in Hawai‘i, as it presents a unique context where factors shaping culturally grounded AI auditing are critically converging in the education domain. Hawai‘i is home to Kanaka Maoli (Native Hawaiians) and is the only U.S. state to designate a Native language, ‘Ōlelo Hawai‘i, as an official state language [1]. Beyond its strong Indigenous presence, Hawai‘i is also among the most multicultural states in the U.S., with the highest proportion of Multiracial residents nationwide [154]. Hawaiian researchers have noted that island systems can also serve as model systems for studying social and technical interventions given the bounded environment [13].
Recognizing that community-based research can unintentionally exploit or harm communities, even when well-intentioned [54, 59], our work is grounded in a commitment to centering the needs of the communities we engage with. In this context, Hawai‘i offers a paradoxical advantage: despite the complexity of its cultural landscape, its island scale enables interventions to propagate more rapidly and be observed more clearly than in larger, more diffuse systems [13].

2.2 Cultural Education and AI Usage in Hawai‘i’s Public Education System
The education system in Hawai‘i represents a unique confluence where new generative AI systems are being used for curricula that cover culturally sensitive topics. Public schools in Hawai‘i are rapidly adopting AI technologies, spurred by state-level partnerships with AI providers [63]. This rapid integration intersects with long-standing commitments to culturally relevant education in Hawai‘i. Since 1987, the Department of Education in Hawai‘i has had a mandate to integrate the study of Hawaiian culture and language in classrooms; this mandate was further bolstered in 2015 when the Board of Education instituted policies around integrating Hawaiian education into classroom standards [114]. The Department of Education acknowledges that “the knowledge of our kūpuna is the guiding light that directs our purpose in support of Hawaiian education”, highlighting that learning must reflect Indigenous teachings and values [1]. As part of these cultural revitalization efforts, 22 public schools in Hawai‘i offer Ka Papahana Kaiapuni Hawai‘i, or Hawaiian language immersion programs, which contain culturally grounded instruction taught in ‘Ōlelo Hawai‘i [67]. Culturally responsive pedagogy in Hawai‘i is primarily based on place-based and ‘āina-based education [55, 98]¹, which ties educational activities to local environments, community knowledge, and Indigenous epistemologies [56, 110, 124].
In the ethos of community-based participatory research, we worked with public schools in O‘ahu with the goal of centering the needs and priorities of educators in the community [86]. Members of our team have years-long relationships with our partner schools, providing a foundation of trust for this project. These relationships shaped key decisions around study design, recruitment, and data collection, ensuring that our process aligned with community values and minimized extractive practices.

While we focus on public school educators in Hawai‘i for this work, our findings speak to broader challenges emerging as generative AI enters classrooms. The specific harms and tool designs surfaced are grounded in this community’s values and practices, but the challenges with designing auditing tools in these contexts — reasoning about cultural misrepresentation in data-scarce settings and designing tools that work across varied forms of expertise (e.g., cultural background, technical / AI literacy) — are likely to resonate with other Indigenous and marginalized communities. Furthermore, our insights into creating dual-purpose tools that both uncover problematic outputs and serve as pedagogical aids for fostering students’ critical thinking also extend to a wide range of educational contexts. Overall, Hawai‘i’s cultural and educational landscape makes these dynamics visible in ways that may be harder to detect elsewhere, offering an early window into challenges other communities will soon confront.

3 Related Work
3.1 Algorithmic Auditing
Algorithm auditing describes the process of surfacing harms, such as social bias, discrimination, or problematic behavior, by investigating outputs of algorithmic systems [19, 82]. The process often involves repeatedly feeding an input to an algorithm and analyzing its outputs to understand its behavior and subsequent impact [96].
In recent years, such audit-based methods have proven effective in identifying harmful behaviors when using AI systems in high-stakes domains including housing [6, 72, 175], healthcare [101, 113, 134], employment [27, 159, 161], and policing [14, 174]. Although AI audits are often conducted by researchers [22, 31, 42, 43, 128] or practitioners [96, 132, 148], end users can help identify harms from AI systems by leveraging their lived experiences [37, 140]. In fact, in many cases, users can discover issues that expert auditors overlooked, allowing for more comprehensive audits [92, 100, 139, 170, 172]. Prior work has introduced community-centered auditing practices through focus groups and workshops, demonstrating how this format is beneficial for identifying stereotypical representations of non-Western cultures [51] and the lack of disability [92] and gender [50] representation in text-to-image models. Qadri et al. [126] further illustrated how community-driven evaluation can surface culturally specific and nuanced harms that standard AI evaluation practices miss. Nonetheless, a remaining challenge is how to best support end users who are engaging in auditing. Currently, practitioners rely on existing crowdworking platforms for end-user auditing, which may not yield the most productive results [36]. As an alternative, systems such as Deng et al. [35]’s WeAudit and Lam et al. [83]’s IndieLabel provide examples of tooling built to better scaffold end users’ participation in auditing AI systems for text-to-image generation and toxicity classification, respectively. These system artifacts exemplify broader calls to design auditing tools in a more participatory fashion [115].

¹ ‘Āina means “that which sustains us,” referring to place as more than land, but also the people, culture, creatures, and environment.
In parallel, there is a growing body of research interested in how we can scaffold end-user auditing processes for target demographic groups. Much of this work has focused on how to support youths in AI auditing, as they are often early adopters of these technologies [107, 125, 144]. For example, Solyst et al. [144] conducted workshops exploring how to support teenagers in engaging in critical AI auditing work. However, much of this scholarship focuses on feasibility (i.e., demonstrating that youths can productively serve as auditors) rather than on concrete auditing tools or interfaces to best facilitate these processes. Overall, while there is a rich line of prior work that expands individual users’ role as auditors, there has been less focus on how to design auditing tools for specific communities and the harms they experience. Through our co-design workshops, we illustrate how the nuances of the community context in Hawai‘i shape the auditing process, from what outputs are considered concerning to what tools are required.

3.2 Generative AI for Low-Resource Languages and Indigenous Communities
Prior work has explored how generative AI systems can support the preservation and awareness of Indigenous knowledge systems [121]. Efforts include using AI for language revitalization, such as transcribing Indigenous languages [26, 30] and promoting language learning [127]. The Lauleo project, for instance, crowdsourced audio data in ‘Ōlelo Hawai‘i (the Hawaiian language) to improve voice-to-text systems [41]. Other work has used generative AI to support cultural education, such as Baker-Ramos et al. [9], who developed a system that integrates mo‘olelo (Hawaiian cultural stories) and proverbs into lesson plans for Hawaiian language immersion programs. While AI systems have promising applications for cultural and language education in Indigenous communities, these technologies come with a complex set of challenges.
Prior work contends that AI systems are fundamentally rooted in Western traditions and often fail to properly account for Indigenous epistemologies [17, 90, 93]. Another challenge arises from data scarcity. AI systems are trained on Western-centric data sources, making them prone to inaccuracies, cultural misrepresentations, and mistranslations for those underrepresented in this data, including Indigenous communities [2, 122, 173]. For Hawai‘i, models mistranslate mo‘olelo or omit proper ‘okina punctuation [98], reflecting a poor grasp of ‘Ōlelo Hawai‘i, a critically endangered language [122]. These errors can have profound effects on communities. Since Hawaiian culture is passed on via language and stories, which carry specific, ancient moral lessons, mistranslations can have generational impacts on understandings of traditions [94]. In addition to concerns about model outputs, generative AI systems raise environmental issues, such as the substantial energy and water requirements for training and operation [11, 70]. These resource demands compound existing pressures on Indigenous communities [40]. For example, there are already organizing movements emerging against data centers proposed to be built on Native land [20, 151].

Despite this need for community-led oversight, there are many complexities to scaffolding end-user auditing processes with Indigenous communities. Existing auditing approaches often fail to detect harms when auditors lack the lived experiences or cultural knowledge needed to recognize them [170], making Indigenous community involvement critical. Yet many Indigenous researchers remain skeptical of AI’s ability to respect cultural nuance [18], complicating efforts to conduct user-centered auditing.
Without community control, the digitization of Indigenous data can also enable new forms of colonialism, where researchers or companies appropriate cultural resources to train AI without returning benefits to the communities [87, 145, 169]. Indigenous communities have been vocal about this concern, known as data sovereignty, which outlines how communities should maintain ownership over their data and how it is used [25, 169]. Furthermore, prior research has also noted that standard approaches to fairness and bias are often insufficient in Indigenous contexts as they may ignore how a community defines bias [142]. Despite these challenges, one recent work has touted the potential of justice-oriented and community-based AI education for achieving data sovereignty [108]. Keeping this potential framework in mind, our work grapples with these complex considerations, providing an example of how to create auditing tools that embed Indigenous knowledge systems and values.

3.3 Harms of Generative AI in Educational Settings
Despite the potential boon that deploying generative AI tools holds for educational settings, there are corresponding concerns about their potential for harm. In particular, educators are concerned about the broader pedagogical impacts of generative AI systems and the detrimental effects on students’ learning, which are further compounded by students’ overreliance on AI systems [60, 171]. This concern is shared by educators and students alike [123]. Prior work has highlighted that overreliance and the subsequent decline in critical thinking skills is especially concerning given AI systems’ tendency to hallucinate [10, 47, 71]. This behavior means that educators must spend extra time not only to learn how to use and integrate these technologies in their classrooms but also to check AI outputs for potential errors [60]. Thus, what was initially
promised as a time-saver can in fact result in increased labor for educators [60, 116].

The risks of generative AI systems are exacerbated when deployed with students from historically marginalized communities or in culturally sensitive contexts. These inequalities can manifest in different ways, including the disproportionate focus on the English language [167] or differential performance based on demographic factors such as race or gender [8]. Teachers have also expressed concern about AI integration in the classroom exacerbating existing inequalities in education [60]. For example, Gelder et al. [49] find that emerging technologies “don’t work” as intended for Hawaiian immersion teachers unless they are specifically designed with cultural relevance. This misalignment is a harm in and of itself, as it undermines educators’ agency and devalues local pedagogical curricula. Consequently, while prior work has charted the repercussions of AI tools in educational contexts, less emphasis has been placed on how to best equip educators to discover and address these harms.

3.4 Equity-Driven Design Methods and Decolonial Computing
Prior participatory and co-design scholarship has focused on surfacing community priorities, redistributing design power, and mitigating harms when working with marginalized communities [32, 59, 142, 153]. From its origin, participatory design (PD) has supported a democratic approach to addressing social phenomena where power imbalances influence system design [21]. PD has been applied across diverse contexts, including asset design [166], civic engagement and community safety [7, 38], collectivist approaches to health inequities [119], and addressing economic disadvantages [38].
However, participation is not a panacea, and when applied uncritically, PD can reproduce or amplify existing power asymmetries [59]. To address these concerns, Harrington et al. [59] advocate for decolonizing PD practices, emphasizing equity-driven approaches in design.

Related to Harrington et al. [59]’s critique of PD, recent HCI scholarship has increasingly turned to decolonial theory to understand and challenge the colonial legacies embedded in computing [103, 141, 164]. This work moves beyond postcolonial critiques of representation to address coloniality, the underlying logic of domination and erasure borne from colonialism that continues to shape the many facets of our lives [99]. Within computing, coloniality manifests in the assumption of a universalist, Eurocentric worldview that overlooks diverse epistemologies and ways of knowing [3]. While the exact definition of “decolonial” varies across works in HCI and critical computing, these works share a unifying ethos focused on embracing pluriversality, or sustaining a “world of many worlds,” rather than designing for a single universal standard [85, 164]. Applying decolonial thinking to computing systems has led to critiques of technologies that enforce these universalist perspectives, such as content moderation on social media platforms [135], digital mental health practices [120], or data collection in natural language processing (NLP) [16, 65]. Other works highlight how Indigenous and Global South communities repurpose technologies, originally designed with Western defaults, to better fit local knowledge and values [15, 53, 80]. Beyond providing an analytical lens, decolonial theory has spurred researchers to consider what these systems would look like if they were designed in a fundamentally decolonial manner, such as those that emphasize the value of “care” [135, 157].
We build on this scholarship to inform recommendations around how we can design end-user auditing processes that support the plurality of needs, ways of knowing, and values across different communities.

4 Methods
4.1 Participants and Sites
Our co-design workshops were hosted in person at two public elementary schools in O‘ahu, Hawai‘i during March 2025. Of the two schools we hosted workshops at, one offers a Kaiapuni (Hawaiian language immersion) program. We recruited participants through partnerships with a local education organization comprising educators and cultural experts based in Hawai‘i. Through our partners, we circulated advertisements to educators via internal mailing lists at each of the partner schools. The demographics of workshop participants are reported in Fig. 3; to preserve the anonymity of participants we provide only aggregate statistics. Workshop participants were compensated with a $40 gift card. This project was approved by Stanford University’s Institutional Review Board.

4.2 Workshop Activities
In total, we ran four one-hour workshops with 22 participants, facilitated by two members of the research team. Each session hosted a different set of participants: five in Workshop 1, four in Workshop 2, seven in Workshop 3, and six in Workshop 4 (22 in total). To establish a shared understanding and vocabulary of generative AI’s applications and biases across participants, all workshops started with a short presentation from the facilitators; this presentation was identical across all workshops. Afterward, participants engaged in a design exercise. Participants in Workshops 1 and 2 completed a rapid prototyping exercise, whereas in Workshops 3 and 4, participants used storyboarding. The workshops used the following agenda:

(1) After introductions and setting workshop norms, participants were invited to share how they are currently using generative AI in the classroom (if at all).
The researchers on the team then provided a brief presentation about different educational applications of generative AI, covering both teacher-facing (e.g., making lesson plans, creating rubrics) and student-facing (e.g., creating interactive activities) use cases, since not all participants had prior experience with these technologies.

(2) Second, we invited participants to reflect on concerns they had with using generative AI in educational settings. We asked open-ended questions about their experiences, including whether they had personally encountered or heard of instances of misrepresentation, bias, or other harms. Importantly, participants were free to raise any topics they considered relevant, and the range of concerns extended beyond issues we introduced.

(3) In the third part, participants were shown three design probes and were asked to engage in open discussion to share their thoughts. The participants were not explicitly prompted to focus on any types of harm, including cultural representation.

(4) Finally, we gave a brief presentation on AI auditing with examples about what purpose auditing serves and what types of inputs / outputs are expected from auditing tools. Participants then engaged in one of two design exercises to prototype auditing tools they would find useful for identifying problematic outputs when using generative AI. All participants shared their designs with the group, fostering further discussion on the types of auditing tools participants wanted.

At the conclusion of the workshop, participants were given a brief survey covering their concerns with using generative AI tools, reflections from the workshop, and demographic information.

4.2.1 Design Probes. For each workshop, we presented participants with the following probes, which are intentionally designed to match how generative AI is commonly used in educational settings [46].
Our probes were developed across two iterative feedback sessions with educators in Hawai‘i and depict actual outputs generated from ChatGPT and Gemini in March 2025. The three probes are as follows:

• Localizing quiz generation: Our first probe presents an example of an educator creating a math problem for fifth grade students based in Honolulu, Hawai‘i, a common educational use case for generative AI systems (Fig. 2A). The generated quiz question setup is around “pineapple picking,” presenting a stereotypical and offensive example given the role of pineapple plantations in Hawai‘i’s colonization.

• Chatting with a historical figure: The second probe presents a scenario where a student is interacting with a generative AI system simulating a historical figure, a feature offered by existing educational AI platforms.² The student converses with Captain James Cook, who presents his exploits in Hawai‘i as feats of navigation as opposed to acts of colonization (Fig. 1).

• Generating images: Finally, our third probe covers text-to-image generation (Fig. 2), which educators may use when creating instructional materials. We generated two initial images: the first image depicting a “modern classroom in Hawai‘i” (Fig. 2B1) and the second an ‘ohe kāpala, a flat bamboo stamp used in Hawai‘i for fabric-making that students learn to craft themselves as part of arts courses (Fig. 2B2). Participants were allowed to explore additional images. In Workshop 2, participants generated an image depicting hukilau, a traditional Hawaiian fishing method.

4.2.2 Design Exercises. We conducted two exercises intended to facilitate divergent and convergent design thinking for participants. In the first two workshops, participants engaged in a rapid prototyping exercise intended to generate divergent designs [78].
Participants were given eight to ten minutes and asked to ideate as many ideas as possible for tools that would support them in discovering problematic outputs of generative AI systems. At the end of the time window, individuals presented their ideas to the workshop to seed further discussion on tool designs. In our second set of workshops, our goal was to refine this set of tool ideas. Participants formed groups of two to three and selected one of the previously generated tool ideas for a storyboarding exercise. For the selected tool idea, participants were asked to sketch panels detailing what the educator is using AI for, what types of cultural misrepresentation the teacher is concerned about, drawing an example of an AI auditing tool they would use, and listing what the auditing tool reveals about the AI system that would make them comfortable using it (or deciding not to use it). Similar to the first set of workshops, at the conclusion of the exercise, each group shared their storyboard to facilitate further reflective discussion.

² https://app.schoolai.com/spaces/clmqu2ycm00g93b664ot10jq9

4.3 Analysis
We audio-recorded and transcribed three of the four workshops. For the remaining workshop (Workshop 3), researchers took detailed notes, which were included in the analysis. We adopted an iterative, inductive approach to thematic analysis, with three members of the research team iteratively coding workshop transcripts and produced design artifacts (i.e., rapid prototyping sketches and storyboards). We first independently generated initial codes capturing salient ideas, which were organized on Miro, a collaborative whiteboarding tool. These codes were then grouped into broader themes over two synchronous meetings; these themes formed the basis of our codebook (see Appendix A.2). The same three researchers applied this codebook, independently analyzing all workshop transcripts and design artifacts.
We met synchronously after coding each transcript to refine theme definitions, add missing codes, and negotiate differences in interpretation. Each workshop transcript was coded independently by all three researchers. After analyzing all transcripts, we synthesized the exemplar quotes from the process, removing any duplicates that emerged during coding. We adopted a reflexive thematic analysis approach to analyze the transcripts, as our goal was to explore themes grounded in participants' experiences, which was important given the emergent and relatively underexplored nature of the workshop topics [29]. Following existing practices for inductive approaches, we do not report inter-rater reliability, as the goal of our analysis is surfacing salient patterns from the workshops rather than quantifying how often these themes arose [95]. Instead, the three authors analyzing the transcripts established a shared understanding of the meaning and application of themes through rounds of discussion and reflection. To preserve anonymity, we identify participants only by workshop number and omit personal identifiers.

4.4 Positionality
Our authors comprise a team of nine people with a range of backgrounds germane to cultural representation in Hawai‘i and AI auditing practices. The team consists of members with experience in algorithmic approaches to measuring and mitigating social biases in ML systems, community-engaged design, and AI for education. Two of our authors are of Kanaka Maoli descent; one of the authors identifies as Kama‘āina (local) and is a former public school educator in Hawai‘i; the remaining authors are cultural outsiders. Our workshop materials were reviewed by all members of the research team. Several team members have existing collaborations with the partner schools in O‘ahu, helping coordinate our workshops. The workshop materials were co-developed by all members of the research team, and coordination with the research sites was led by senior authors on the team who had established relationships with schools in Hawai‘i. The first and third authors (both cultural outsiders) conducted the workshops; in addition to the first and third authors, the second author (also a cultural outsider) developed the codebook and analyzed the transcripts. Finally, all members of the research team were invited to contribute feedback on the extracted themes and resulting draft.

Figure 2: Examples of design probes shown to participants during the design workshop. In addition to the example with Captain Cook shown in Fig. 1, participants were shown probes related to creating assessment questions (left) and generating images (right). These are actual outputs from ChatGPT and Gemini from March 2025. For the image generation probe, we asked the model to provide two images: the first was of a "modern classroom in Hawai‘i" (B1) and the second of an ‘ohe kāpala, a flat bamboo stamp typically carved with geometric patterns used to decorate clothing (B2).

5 Findings
In this section, we present learnings from our workshops. First, we situate how participants are engaging with AI systems, exploring how they are currently using or want to use generative AI in educational settings.
Next, we discuss the categories of harm — focusing on dimensions of cultural representation — that teachers are most concerned about. This understanding helps us localize what types of harms participants are likely auditing for and in what scenarios these tools will likely be used. We next discuss design features participants proposed for auditing tools, mapping them to harms. Finally, we provide a design vignette of an exemplar system created as a result of the co-design sessions.

5.1 How are teachers engaging with generative AI systems?
First, we sought to understand how participants use generative AI in their day-to-day roles as educators. Of the 22 participants, all but one reported having used AI in the past, and 68.2% (N=15) specifically used AI in their teaching practices. As shown in Fig. 3B1, the most commonly used generative AI tools among teachers included general-purpose tools, such as Gemini (N=13) and ChatGPT (N=11), as well as education-specific tools like MagicSchool AI (N=7). Many participants reported using AI systems to create lesson plans (N=14), generate impromptu classroom activities (N=10), or prepare materials for stand-in teachers (N=5). Some also used AI for tasks requiring knowledge of Hawaiian culture and practices, such as developing culturally relevant teaching materials or translating content into ‘Ōlelo Hawai‘i (Fig. 3B2). Overall, our participants' usage patterns mirror results from recent work surveying how K-12 teachers in the U.S. more broadly are employing generative AI tools [46]. However, using these tools to assist in developing culturally relevant educational material is unique to the sample we observe. We asked participants whether they had encountered any cases where AI-generated outputs did not align with the history or perspectives they had been taught or want to teach.
Half of our participants reported personally encountering instances of bias in the past when using generative AI tools for educational purposes. For example, a participant in Workshop 1 found the model's knowledge of Hawai‘i to be surface-level: "[generated outputs] are very mainstream... it knows the ones that most other people know like Aloha Oe but beyond that, when you're looking for kind of more specific things it's a little bit lacking in that area." Another participant in Workshop 3 mentioned that when they tried to generate images related to Hawai‘i, the output always contained people in hula skirts even when not specified. Finally, another participant in Workshop 4 mentioned how generated outputs have contained "incorrect facts about Hawaiian history."

Figure 3: We report summary statistics on our 22 workshop participants, including information on their demographics (A) and generative AI usage (B). Of our 22 participants, all but one had used AI tools in the past and 68.2% (N=15) had specifically used AI tools in educational settings. We find that educators had most frequently used AI for logistical tasks such as generating lesson plans and writing documents. While usage patterns in our sample broadly mirror those from surveys conducted with K-12 educators across the U.S. [46], approximately 40% of participants reported using AI tools for tasks related to Hawaiian culture (as noted in the dark green outlined bars in B2), which includes generating lesson materials and translating content.

5.2 What concerns should AI auditing tools address?
Next, we examined the types of concerns participants raised about using generative AI and what harms audits should address. While participants surfaced a wide range of concerns — from student misuse of AI (e.g., academic dishonesty, inhibited learning) to broader societal impacts (e.g., job loss) — we focus on issues with cultural representation in education settings, where auditing tools are especially well-suited to intervene. We identified five recurring concerns: (1) inaccurate outputs, (2) superficial cultural representation, (3) dominance of Western narratives, (4) lack of diversity, and (5) framing Hawaiian culture as historical rather than living. For each dimension, we discuss the pedagogical consequences participants linked to each concern. Finally, we report how participants' broader concerns about generative AI in education intersect with those related to cultural representation.

5.2.1 Presenting incorrect or hallucinated outputs. The most common concern that participants expressed pertained to hallucinated outputs, especially related to culturally incorrect information. This finding is consistent with prior work that has found users are most familiar with hallucinations as a form of harm or misrepresentation from generative AI systems [126]. In our case, participants were especially concerned about the tangible repercussions hallucinated outputs could have on their students' education.
For example, in Workshop 1, a participant mentioned how generated outputs would conflate parts of different mo‘olelo, or Hawaiian cultural stories:

I used ChatGPT to organize the different mo‘olelos throughout the year ... Like there was one about Maui and how we read about how to find the Hawaiian islands. And then I noticed in ChatGPT — wait a second — this is talking about Maui when he gets the sun. This isn't the same mo‘olelo that they read.

This participant was particularly concerned as learning mo‘olelo was a core aspect of state-level standards for their students. Participants also noted inaccuracies with language generation. For example, participants found that AI-generated phrases in ‘Ōlelo felt inconsistent with how native speakers write ("certain phrases you would never really use in a letter, but [the AI tool] uses that" [W2]) or are missing ‘okina, a symbol commonly used in ‘Ōlelo. These inaccuracies pose a particular challenge for teachers, such as those in Workshop 4 who are part of the Kaiapuni program (Hawaiian language immersion), in which classes are taught exclusively in ‘Ōlelo.

Figure 4: Educators are concerned that AI systems will misrepresent aspects of Hawaiian culture. We report survey results on what concerns workshop participants have with using AI in educational contexts (A). From our thematic analysis, we provide five dimensions of cultural misrepresentation that our participants believe AI auditing tools ought to address (B).

5.2.2 Providing a surface-level depiction of Hawaiian culture. A second dimension of cultural misrepresentation that participants discussed was related to the specificity of representation that occurred in both existing educational materials and generated outputs [126].
Educators mention a pedagogical concern that existing educational materials tend to cater towards the cultural experiences of those on the mainland, presenting concepts that were irrelevant or unfamiliar to students. For instance, participants gave examples of provided educational materials or standardized testing frequently making mention of concepts, such as "snow days" (W1), that their students had not heard of before: "or the attic, [standardized tests] always talk about playing in the basement and my kids are always like, what's a basement? We don't have those." (W1). Participants emphasized that these irrelevant examples would appear "in the middle of high-stakes testing" (W1), and the added confusion for students could have negative repercussions on their academic performance [118, 147]. This disconnect highlights a potential opportunity for generative AI tools, which could ideally be used to generate personalized content that resonates with students' local experiences.

While generative AI presents the opportunity to localize educational materials, participants note that the resulting outputs are superficial, reproducing the pedagogical repercussions they aim to avoid. Participants reported that AI outputs often provided a "mainland" perspective of Hawaiian culture that was incongruous with students' lived experiences and could even perpetuate harmful stereotypes. For example, in Workshop 3, participants found the design probe describing a math problem containing pineapples to provide a shallow reference to Hawaiian culture; they mentioned that they frequently encountered generated AI outputs with pineapple even though "[pineapples] don't represent us at all" (W3). Furthermore, participants pointed out that pineapples could be considered offensive given the fruit's association with the colonization of Hawai‘i [77].
As an alternative, participants wanted the generated output to use artifacts germane to their students' lives, such as kalo (taro). Thus, while generative AI tools hold promise for adapting learning content for Hawaiian students, this personalization risks becoming another form of misrepresentation, enacting the same pedagogical consequences.

5.2.3 Reinforcing Western narratives. It is not only what is in the outputs but whose narratives and viewpoints are showcased. Participants discussed how generative AI tools presented a Western perspective of history, glossing over historical injustices or harm to Hawaiian communities. For example, a participant in Workshop 1 highlighted how the outputs of generative AI tools skew in favor of Western colonialism of Hawai‘i, even going as far as omitting crucial Indigenous history entirely: "you're only going to read it from the perspective of the [U.S.] government, right? They came in to save the day. But where's the perspective of Queen Liliuokalani?" A participant in Workshop 3 made the same point, stating "we know certain things happened but AI said it hadn't." Predominantly Western-centric curricula and materials can not only marginalize Indigenous perspectives, but lead to the underachievement and disengagement of Indigenous students [129, 155]. These erasures of Hawaiian history actively undermine students' opportunities to engage critically with history, presenting a significant barrier to culturally grounded learning [45].

5.2.4 Failing to showcase diversity. One subset of stereotyping that participants were particularly concerned about related to outputs' inability to capture the true diversity of Hawai‘i. As mentioned in Sec. 2, Hawai‘i has a large multicultural population, yet as participants in Workshop 2 noticed when observing generated images of classrooms in Hawai‘i, "these pictures all seem to just be like Polynesians, where Hawai‘i is so much more than just Polynesians." Similarly, a participant from Workshop 1 commented that "it's interesting that there's not a white child... like the assumption there that everybody here is of Hawaiian descent or even a mix, but look at our school population — many students are white, right? Like 30% or something." The overrepresentation of Hawaiian individuals in generated outputs can help prevent cultural erasure [126]. However, it risks flattening the cultural diversity in Hawai‘i that more closely aligns with students' and teachers' lived experiences.

5.2.5 Representing Hawaiian culture as historical rather than living. Participants noted that AI outputs often portrayed Hawaiian culture as belonging to the past rather than the present. In response to a design probe showing an AI-generated classroom in Hawai‘i, participants described it as "old-fashioned" and "like how it would look in the 80s" (W2). Similarly, the initial image of a hukilau, a Hawaiian fishing practice, that participants in Workshop 2 generated was rendered first as a stylized historical line drawing; only after iterating on the prompt were participants able to generate more photorealistic and contemporary depictions. These observations reflect broader concerns about how K-12 education often engages in this whitewashing, where Indigenous people and history are framed as passive or "relics of the distant past" [138], rather than existing in the modern context.

5.2.6 Intersecting concerns around generative AI use. Finally, participants' concerns were not isolated; issues of cultural representation often intersected with broader worries about generative AI usage in educational settings.
Many educators focused on how cultural misrepresentations directly affected their students, which amplified existing concerns about student misuse of AI. While participants felt confident in their own ability to identify harmful representations, they were concerned that students lacked the same critical awareness to evaluate generated outputs. As one participant explained, "[students] are just believing whatever they see, you know? ... It's hard because you just believe it, right? Whatever you read, you believe. And a lot of it is biased" (W1). In addition, participants were concerned that students would become overreliant on generative AI. For example, a participant in Workshop 3 stated that they try to teach students that generative AI is "a resource but not the only resource." For students learning about Hawaiian culture, they emphasized the importance of seeking out other methods, such as speaking to community elders or learning stories that are traditionally passed down orally.

Outside of students interacting with generative AI systems, participants also highlighted the compounding effects of limited AI literacy. For example, when encountering hallucinated or stereotypical outputs, educators wondered if the problem stemmed from how they prompted the model. Some participants stated that they did not even know how to prompt models to get culturally relevant information. Furthermore, concerns about cultural misrepresentation often left participants feeling they had to "double-check" (W4) every AI-generated response. Ironically, these technologies marketed as time-saving may increase educators' workloads instead of reducing them [60, 133].

5.3 What functionalities do users want in AI auditing tools?
Finally, we present three categories of auditing tools ideated by participants during our workshops. As illustrated in Fig. 5, these include: (1) tracing sources, (2) identifying perspectives in generated outputs, and (3) flagging instances of harmful content. While these tools resemble existing, general-purpose approaches for auditing or mitigating hallucinations in generative AI, our findings show that participants' designs are shaped by the cultural context and educational setting. For each design, we describe the underlying concerns it addresses as well as key considerations for implementation.

5.3.1 Identifying sources. As shown in Fig. 5A, participants in Workshop 4 proposed an auditing tool that allows users to trace the genealogy of a linked source, displaying information about the author as well as the lineage of their teachers. Participants stressed that understanding where outputs are drawn from was essential for verifying whether they could trust outputs. But here, source verification is not simply attributing a citation to the output. As participants explained, in the context of cultural knowledge in Hawai‘i, there is often no definitive ground truth to compare outputs against, since cultural artifacts, such as mo‘olelo or proverbs, will differ depending on the practitioner's interpretation. As a participant in Workshop 4 stated: "Someone can take a proverb and write a whole different aspect, so it is just about who's your teacher? ... You type in [the author's] name, who was their kumu? You're always going back to who was their teacher? Because other than that, you wouldn't trust it." Because of these interpretive differences, participants described that it was often more valuable to understand the genealogy of knowledge — who authored a source and who taught them — than to evaluate the content of a generated output directly.
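As a concrete illustration, the "who was their kumu?" check participants described could be modeled as a walk over linked source records. The sketch below is our own minimal illustration, not a participant design: the `Source` class, the `VALIDATED_KUMU` set, and the trust rule are all hypothetical assumptions, and a real lineage would carry far richer community-held context than a single teacher link.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Source:
    """A cited source with a link to the record of its author's teacher (kumu).

    The single `kumu` link is a simplifying assumption for illustration.
    """
    title: str
    author: str
    kumu: Optional["Source"] = None

# Hypothetical set of teachers validated by the community (not the tool builder).
VALIDATED_KUMU = {"Kumu A"}

def genealogy(source: Source) -> list[str]:
    """Walk the author -> kumu chain: 'you're always going back to who was their teacher?'"""
    chain = []
    node: Optional[Source] = source
    while node is not None:
        chain.append(node.author)
        node = node.kumu
    return chain

def lineage_reaches_validated_kumu(source: Source) -> bool:
    """Surface a source as trustworthy only if its lineage reaches a validated kumu."""
    return any(author in VALIDATED_KUMU for author in genealogy(source))
```

Under these assumptions, a source whose lineage chains back to a community-validated teacher is surfaced with that genealogy attached, while an unattributed source is not; the point is that trust attaches to the lineage rather than to a single citation.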
Unlike traditional fact-checking or citation methods, which presuppose there is a correct or incorrect answer, this design enables the more interpretive work that participants described as essential for assessing culturally grounded knowledge.

How to implement such a feature presents challenges regarding technical feasibility. Knowledge attribution for generative AI is an active area of research within NLP [76, 152, 158]. Our setting provides additional challenges, as conventional citation-based systems often rely on digitized and unambiguous references, whereas sources here may not be digitally accessible or may be interpretive. Furthermore, this tool would require creating and maintaining a corpus of culturally validated material and would require users to contribute to its source genealogy, demanding significant community labor and governance structures for deciding who is allowed to contribute validated sources.

Another critical consideration is who has control over and access to the curated data. This consideration is tied to the rich discussion on Indigenous data sovereignty surfaced in the existing literature, particularly in the context of AI systems [18, 25]. We see similar concerns about the misappropriation of cultural resources in our workshops as well. For example, in Workshop 4, participants mentioned they fear sharing resources they have created for the Kaiapuni program on the Internet out of fear it might be criticized or misused. For instance, they mentioned online figures who are "totally anti-Hawaiian and flip everything around and use Hawaiian culture against us", or examples of newspapers that publish in the Hawaiian language but are incongruous with Hawaiian culture, such as claiming that "‘ohana is a made up word" (W4). Thus, for this design to be successful, it must not only integrate with Hawaiian epistemologies that center knowledge provenance, but also uphold data sovereignty, ensuring any source material or cultural interpretations collected remain in the hands of the community.

Figure 5: We present wireframes of auditing features based on the sketches and storyboards that participants created during the co-design sessions. The three design elements that occurred consistently across co-design workshops included source attribution (A), perspective visualization (B), and flagging problematic outputs (C). We map each wireframe to the participant concerns identified in Sec. 5.2 that the feature was designed to address.

5.3.2 Visualizing perspectives.
Stemming from concerns that models reinforce Western narratives and provide a narrow depiction of Hawai‘i, participants in Workshops 1 and 2 expressed interest in visualizing different viewpoints within AI-generated outputs (Fig. 5B). Participants wanted outputs that could showcase a multiplicity of perspectives. One proposed design was a lightweight visualization that displays the distribution of perspectives in a response: "How do you know that you're getting [a generated output] that is well-rounded... I want like a little visual that's like 'this [output] is solid'" (W1). Equally important, participants stressed, is visibility into omissions: "it would be interesting to be able to identify what perspectives went into that [generated] biography and what perspectives are missing" (W2). Overall, these auditing tools are especially critical given that existing curricula and teaching materials already tend to foreground mainland perspectives (see Sec. 5.2). Educators already have to expend extra effort to adapt these materials for their classrooms [168]. If AI systems are intended to help localize educational content [9, 52], it becomes crucial to ensure they do not reproduce the same biases and compound existing inequities.

Achieving such a visualization requires several technical considerations. Perspective auditing is closely related to the classic NLP task of stance classification [81], which infers whether text supports or opposes a given claim. One approach to perspective visualization is combining topic modeling [34], to surface the key issues in a text, with stance classification, to detect alignments on those issues. Detected stances can then be compared against a repository of known perspectives, either curated directly or gathered from external sources. These approaches are helpful for visualizing perspectives included in a text, but understanding what perspectives are missing is an infinitely large design space.
A tractable approach is to have users define a perspective and then check if it is present in an output, although this method assumes the user knows what perspectives they believe ought to be included.

One important distinction to make with such a feature is distinguishing when to showcase a plurality of perspectives versus actively addressing historical erasure [106]. Framing erasures or false Western narratives as mere "perspective" differences can undercut the severity of the issue. If we fail to acknowledge why Hawaiian perspectives have been historically suppressed, these auditing features can actually undermine efforts to address epistemic injustices [45, 149]. Even the notion of what constitutes a perspective demands careful consideration: if defined too broadly, it can flatten the rich diversity within a community and treat distinct voices as a monolith. For example, Workshop 4 participants noted that even Hawaiian-language newspapers offered contrasting editorial stances depending on who was in charge. As one participant explained, "Sometimes when you look at the Hawaiian newspaper and you read it, you're like, what am I reading? You know, it's all in Hawaiian, but they're telling you, oh, 'surfing is bad'."

5.3.3 Flagging problematic outputs. Finally, participants in Workshop 1 suggested tools for flagging instances of cultural misrepresentation (e.g., hallucinations, stereotyping) and other problematic outputs. Whereas the previous designs were centered around helping participants identify issues with generated outputs, participants were interested in using this feature for raising awareness. In particular, they wanted to use this tool when presenting AI-generated outputs to their students. As one participant suggested, "there needs to be a very blatant warning signal. I'm thinking about our kids like something that is very bright that grabs attention or like ways that they could report a bias on whatever they're reading." (W1). Participants also described using flags to prompt students to identify problematic outputs themselves. For example, a participant in Workshop 1 requested "a way that [they] could be 'like, I think this might be a bias,' and then it could have a category of types of bias, and then the kid could identify theirs." Participants wanted students to actively engage with AI outputs, honing their ability to discern whether outputs are problematic and develop their own point of view rather than passively absorbing information. These findings illustrate how educators view auditing tools as dual-purpose: not only for uncovering problematic outputs but also as a pedagogical aid to build students' critical thinking skills.

Flagging problematic outputs is more technically straightforward than the other features but poses sociotechnical challenges for implementation. Unlike more straightforward hallucination detection in generative AI tools, cultural misrepresentation requires reconciling diverse, subjective judgments. In Hawai‘i, many teachers have recently relocated from the mainland, bringing varying levels of cultural knowledge [64, 89]. For example, in Workshop 2, when we showed generated images that misrepresented an "‘ohe kāpala", a traditional Hawaiian stamp made from bamboo used to decorate clothing, our participants told us that they had not heard of the artifact before (see Fig. 2). In contrast, when reacting to the same design probes, a participant in Workshop 4 who identifies as Kanaka Maoli (Native Hawaiian) stated: "No, we're not doing that; we're culturally grounded. But I can definitely see it how a lot of new teachers in Hawai‘i, like, coming from the US would just use [the output], 100% would just use it without thinking twice."
When educators' backgrounds do not align with students' lived experiences, this can result in cultural disconnects or the inadvertent use of educational material that reinforces stereotypes or omits key cultural contexts. These challenges are compounded by uneven levels of cultural and AI literacy among educators, which may hinder their ability to audit AI outputs critically.

5.4 Design Vignette
We conclude by presenting an exemplar system distilled from our workshop findings (Fig. 6). Our design focuses on two core themes that emerged around the importance of knowledge genealogy within Hawaiian epistemology and the diversity of cultural expertise educators have.

5.4.1 Exemplar System Design. We propose a design that allows educators to trace and contest generative AI outputs in a collective fashion. When educators come across an output, they can inspect relevant sources, which contain attributes about the source's author and their background, allowing users to visualize the genealogy of knowledge. If the user views the output as problematic, they can provide their interpretation of the nature of the harm. These entries are compiled into a shared report database that grows over time. To facilitate knowledge provenance within this report database as well, submissions are linked to prior contributions by the same auditor along with information about that individual and additional trust signals (e.g., response verified by a cultural expert).

To account for the diversity of expertise, the system scaffolds the development of critical auditing skills. Users who feel uncertain about their interpretation can query the report database to surface related concerns that others have raised, facilitating learning through doing. In addition, they can flag potentially concerning outputs, which are then routed to a more experienced community member who offers their expertise or validates any interpretation that the user provided.
Reflecting participants’ desire to use auditing as an educational exercise, students can also engage in flagging outputs or viewing the report database. Importantly, the collected database of flagged outputs and any reference sources should remain locally hosted and shared only with a trusted community. For one, distrust in large tech corporations or fear of misappropriation may otherwise deter users from contributing. Moreover, as discussed in Sec. 5.3.1, digitizing Indigenous data can become a new form of colonialism if non-community members appropriate cultural resources to train AI systems without any benefit being returned to Indigenous communities [87, 145, 169].

In the current design, we present the tool as an independent browser extension that is interoperable across multiple generative AI platforms. We note that partnering with existing model or educational technology providers is an alternative choice that could ensure easier maintenance. However, based on survey results from Sec. 5.1, teachers’ generative AI workflows are distributed across many platforms, requiring auditing tools that provide more flexibility. Again, given the emphasis on ensuring data sovereignty in our workshops, building tools that can be hosted and controlled locally is a top priority.

5.4.2 Implementation Challenges and Considerations. We conclude by surveying which aspects of our design vignette are feasible in the short term and identify areas where future work can contribute. As discussed in Sec. 5.3.1, source attribution in LLMs remains a significant technical challenge and represents a growing area of research [76, 152, 158]. While reliable attribution may remain difficult, retrieving related sources may be possible using existing NLP and information retrieval techniques [158]. Beyond technical barriers, however, there are substantial sociotechnical challenges in implementing such a system.
Whose Knowledge Counts? Co-Designing Community-Centered AI Auditing Tools with Educators in Hawai‘i. CHI ’26, April 13–17, 2026, Barcelona, Spain.

Figure 6: We present an exemplar auditing tool based on the authors distilling the prototyped features, discussions, and context into a cohesive system. The tool is proposed to be built on top of existing AI platforms (e.g., ChatGPT, Gemini). Users can view if there are sources related to the generated output (A) and view more information about who authored the source as well as the genealogy of knowledge (B). If a user is concerned about the output, they can flag it to enter it into the report database (C). Newcomers or less experienced auditors can also participate by comparing outputs to preexisting reports in the database (D) and flagging potentially harmful results that are reviewed by more experienced community members (E).

A central, and contested, question is who should govern or maintain it. While our findings suggest that control should rest with the community, there was no clear consensus on what “community governance” should look like. Feedback from
our workshops reflected these tensions: participants in Workshop 4 favored oversight by state education officials (e.g., the Office of Hawaiian Education); Workshop 3 participants preferred avoiding any government or political involvement; and Workshop 1 participants expressed interest in a more decentralized, school-by-school approach.

6 Discussion

Our findings show that educators in Hawai‘i face competing dynamics around the use of generative AI. On one hand, participants described top-down encouragement from the state to integrate these tools, reflecting an institutional push toward adoption. At the individual level, many workshop participants expressed interest in using generative AI for many reasons, including saving time, creating more localized educational content, and offering more interactive classroom activities. At the same time, however, participants voiced a wide range of concerns about the potential harms these technologies may introduce, particularly when applied to tasks related to cultural knowledge. Given that Hawaiian culture is a mandated part of the curriculum, these shortcomings of generative AI are not a hypothetical risk but rather an inevitable problem with which participants need to reckon. These tensions highlight the gap between the push to adopt AI and the lack of safeguards ensuring its appropriateness. While end-user auditing provides a general approach to bridge this gap, our workshops underscore that participants require tools designed with Hawaiian values at the forefront, rather than relying on general-purpose solutions.

While our study is situated within the cultural and educational context of Hawai‘i, we speak to how our findings offer insights that might inform end-user auditing practices for other marginalized communities. Although the specific harms and designs that participants reported are not generalizable, our takeaway that we must consider community-specific designs for end-user auditing has broader implications.
To date, most work on auditing tools has focused on creating more general-purpose tools that are intended to work across many settings and for many groups of people [36, 115]. However, as Bird [17] argued about NLP methods more broadly, this “one-size-fits-all” approach problematically treats language as a decontextualized technical issue to solve, stripping away how language is actually used by people within specific communities. This operationalization may not account for the needs, values, or ways of knowing within a specific community, especially those that have been historically marginalized.

Within HCI, decoloniality has provided a useful framework for understanding how we can center a plurality of perspectives, rather than assuming a universalist default [135, 164, 165]. In particular, Alvarado Garcia et al. [5] provide an agenda for decolonizing HCI research, listing five pathways (understanding the why, reconsidering the how, changing the for whom, expanding the what, and reflecting on the what for) that can be applied to reorient how we design sociotechnical systems. Our following discussion touches on three of these pathways in the context of end-user auditing, discussing how to (1) accommodate diverse perspectives when auditing; (2) develop infrastructures that map to community values; and (3) ensure audit outputs benefit the community.

6.1 Determining who conducts community-centered audits

Our findings illustrate that auditing for harms in AI systems requires situated knowledge that individuals external to that community may lack [57, 126]. For example, as a participant in Workshop 1 noted when observing a bias probe: “we looked at that [generated] picture and we’re like that’s definitely not a picture of a Hawaiian classroom, but somebody in Kansas, they might think and say yeah, that’s what a classroom looks like because they’ve never come here.
” This point echoes calls from recent HCI work advocating for more participatory methods in AI auditing that involve end users, such as the educators from our workshops, as auditors [35, 115]. Moreover, our workshops revealed that within communities, we should not expect participation to be uniform: differences in cultural expertise, such as between Kānaka or Kama‘āina educators and those recently arrived from the mainland, shaped who could recognize harms. For example, while most participants in our workshops found the design probe related to “pineapple picking” to be offensive, some participants in Workshop 2 stated that they “don’t see it being biased.”

This raises an important question of how to design for the diversity of community members’ perspectives and experiences in auditing systems. As Alvarado Garcia et al. [5] argue, the decolonial path of “expanding the what” requires embracing multiple frames of reference and supporting plural knowledge systems. While leveraging users’ diverse range of lived expertise is a key benefit of end-user auditing, if applied uncritically, it also creates a veneer of equality, as not all community members bring the same forms of knowledge. Rather than treating auditing as a generic task that can be performed by any user, these asymmetries ought to shape whose judgments carry weight when conducting audits. At the same time, it is essential to ensure that the responsibility of auditing does not disproportionately fall on marginalized community members [143, 160]. Thus, auditing systems must center the epistemic authority of those most directly connected to the cultural contexts at stake, while still providing pathways for allies to participate.

Design Recommendations. We provide two design recommendations for how community-centered audits can explicitly account for these differences in their design.
First, auditing tasks should be designed to accommodate different tiers of involvement or “trust” rather than assuming auditors will come in with equal experience. Following practices from situated learning [84], those with less experience can start with less involved tasks. After observing experts and demonstrating that they can be trusted with more involved tasks, they can take on more key roles. How these levels of trust are operationalized will depend on the community. Examples could include delineating based on task type (e.g., more interpretive work versus fact-based validation) or issue severity. Second, auditing systems must include mechanisms to handle disagreements and provide pathways for discussion when auditors’ judgments conflict [35]. Especially when dealing with AI outputs that do not have a definitive right or wrong answer, as in our case when dealing with cultural artifacts, disagreements are both inevitable and informative. Rather than treating these divergences as errors, systems should scaffold structured dialogue [35, 137], such as by allowing auditors to document their reasoning, surface alternative interpretations, and view how others within the community have assessed similar cases. This not only supports transparency but also fosters collective sense-making, enabling auditing practices to better reflect the diversity of knowledge present even within communities.

6.2 Developing infrastructures to support community audits

While efforts to make AI auditing more participatory have largely focused on including end users in evaluating model outputs, we argue for extending this lens to the design of audit tools themselves. The technologies that these audits rely on also imbue their own standards and politics, in the same way that individuals’ positionality inevitably shapes auditing outputs [162].
Relying on one-size-fits-all infrastructures risks imposing universalist notions of what constitutes a harmful or undesired output that may not align with the communities being served. As illustrated in our workshops, points of friction arise when general-purpose tooling fails to accommodate what our participants need to conduct audits. For example, many participants expressed interest in source auditing. Ostensibly, knowledge attribution or retrieval-based systems could provide this functionality; however, conversations with participants revealed that what they needed was not only knowing what the source says but also its genealogy. Since cultural content, such as mo‘olelo and proverbs, is open to multiple interpretations, participants emphasized that assessing trustworthiness depends on knowing whose interpretation is being surfaced. Adopting a decolonial approach, in contrast, asks how we can move from this mindset of universality towards one of pluriversality, ensuring that auditing tools embed the local values of the communities they are intended to serve. This ties to Alvarado Garcia et al. [5]’s pathway of “reconsidering the how”, which invites us to think about how we can orient our methodologies to match what the communities we are engaging with want or value. In addition, this approach connects to Bird [17]’s framework for non-extractive NLP, which argues for building language technologies that grant agency to the communities from which they come, by leveraging cultural resources such as elders and engaging the community where they are at.

Design Recommendations. As a concrete recommendation, we propose expanding the frame of participatory auditing: from inviting communities to perform audits toward enabling them to shape the infrastructures that make auditing possible. How do we enact this in practice? In this work, we adopt practices from participatory design (PD) and conduct co-design sessions.
However, this method poses its own set of challenges. Prior work in HCI has documented the limitations and even harms of engaging in PD, especially when working with historically marginalized communities [59]. There are also concerns about how to maintain any tools built going forward, especially if they are created by researchers who may not be incentivized to invest in long-term maintenance [79]. Expanding on the exercises from our design workshops, we advocate for future work to draw on speculative design approaches, which have proven fruitful for imagining alternative futures, particularly for historically marginalized groups [58, 163]. In conjunction with extending PD methods to creating auditing tools, we advocate for further explorations into equipping community members with the means to build tools themselves, allowing for community self-determination rather than relying on researchers or model practitioners as intermediaries. We recognize that building tools requires more technical overhead; however, we see opportunities in leveraging advances in generative AI technologies to lower these barriers. For example, prompting and other lightweight interaction modalities, such as audio-based interfaces, could make it more feasible for people to create or customize auditing tools without requiring deep technical expertise. Already, there are initial explorations of how to support educators in developing their own AI tooling, which can be extended into the auditing context [60].

6.3 Reconsidering valued outcomes of AI audits

Finally, we question who benefits from AI audits, mapping to the pathway about changing for whom HCI research is being done [5]. At the moment, most AI auditing endeavors focus on offering technical remediation and improving the model output [105, 128, 130].
However, this design centralizes power in the hands of the platform or model providers, who can decide whether to address these harms. It also leaves community members uncertain of whether they will benefit from their labor. Furthermore, centering technical solutions presupposes that AI systems ought to be used and simply need remediation, when it is possible that refusal, or choosing not to use AI altogether, may be the best course of action in a given situation. These considerations lead us to the question: “Who are these audits intended to serve? Is the goal of the audit to improve the model’s performance, or is it to help the community?” In fact, participants in our workshops rarely expressed interest in influencing platforms holistically. Instead, they imagined auditing features that supported locally consequential decisions, such as whether a lesson plan was appropriate for a classroom or whether to use a generative AI system at all.

Design Recommendations. Drawing on Shahid and Vashistha [135]’s work on decolonial approaches to content moderation, we adopt their framing that the goal of such practices should be to “repair, educate, and sustain communities.” Applying this perspective to AI auditing, we provide design recommendations about how to create tools that can benefit communities directly. An important aspect of this is designing auditing tools that incorporate educational opportunities for community members. Already in our workshops, we saw how participants viewed auditing as a useful educational exercise for their students, encouraging them to engage more deeply with generated outputs rather than taking the content at face value. Lighter-weight interventions may involve reflection prompts that periodically appear as a user interacts with a generative AI system, asking them to think about what assumptions the model may be making when generating a response.
A more heavyweight approach could be building multi-tiered auditing tools, similar to our exemplar in Sec. 5.4, that can support community members in identifying problematic outputs while also scaffolding learning for students or newcomers. Prior work [144] has also suggested integrating best practices from the learning sciences into the design of auditing tools to help facilitate the process of learning by doing. While there are many design opportunities in this direction, the underlying principle is to build auditing tools that, first and foremost, directly benefit the communities using them.

7 Limitations and Broader Ethical Reflections

In this study, we sought to make observations and recommendations that could apply generally to practitioners seeking to build AI auditing tools for educators in Hawai‘i. To support this, we conducted workshops with teachers located in Hawai‘i. Due to the focused nature of our study, the scope of the participant pool is a limitation of our findings. Our workshops were all conducted with elementary-level public school teachers located on O‘ahu. Teachers on other islands, private or charter school teachers, and teachers at different grade levels would have different concerns about using AI in educational contexts. In addition, although our sample size is in line with similar works at CHI, the limited participant size (N=22) can hinder generalizability. Furthermore, our participants optionally chose to participate in our study after school hours, so the population of participants at each school was self-selecting. This may have led to a participant population that was more familiar or willing to engage with AI systems, whereas teachers who are less familiar with or even opposed to AI may have opted not to participate. Finally, we center educators’ perspectives in this study; however, there are other stakeholders, including students, administrators, and cultural practitioners, who are not represented in our workshops.
Thus, it remains future work to confirm how these observations align with shared values across a broader spectrum of educators in Hawai‘i.

Another limitation of this work is our focus on text-based generative AI systems. The proposed designs in Sec. 5.3 and the design vignette are primarily intended for natural language input and output. This decision reflects how participants are primarily using generative AI systems. We expect that the auditing needs, harm types, and required technical solutions will vary across modalities (e.g., for visual or audio-based systems), necessitating different mechanisms for surfacing harm that are not explored in depth within this work.

Finally, we want to provide a reflection on the broader ethical considerations of this work. Initially, the mission of this project was to build a general-purpose auditing tool that could be used by educators in Hawai‘i. However, after engaging in workshops, it became clear that a generalizable approach to cultural auditing would be detrimental to the community’s needs. As described by participants, cultural knowledge is embedded in localized, generational practices, and access to this knowledge requires building deep relationships and trust with community members. Hence, attempts to abstract these culturally significant practices into a generalized tool deployed across communities risk misrepresenting or flattening the epistemologies they aim to preserve. These considerations shifted us away from designing a generalizable tool and toward ideating on frameworks that support community-centered auditing infrastructures. Ultimately, we argue the decision to build, or not to build, should be made not only by considering technical feasibility, but through careful reflection about whether the system can be sustained over time and, most importantly, whether it provides long-lasting benefits to the community beyond the scope of an academic paper.
8 Conclusion

Overall, this work sheds light on educator concerns surrounding AI in the classroom and introduces a framework for co-developing community-centered auditing tools for Indigenous and low-resource contexts. Through four co-design workshops with 22 public school educators in O‘ahu, Hawai‘i, we identified key worries around AI use in education, particularly regarding cultural representation, and surfaced auditing practices educators envision to address these issues. Drawing on these insights, we argue that AI auditing should be framed as a community practice: one actively shaped by the people and communities most affected by these technologies. This reframing invites us to reconsider who serves as an auditor within this process, how auditing infrastructures ought to be designed, and what outcomes should be valued in the auditing process.

Acknowledgments

We thank our participants for their contributions, and our elementary school partners for helping facilitate the design workshops. In addition, we are grateful to the Ulu Lāhui Foundation, Purple Mai‘a Foundation, the Hawaii Department of Education, the Office of Hawaiian Education, Alika Spahn Naihe, and Amanda Nelson for taking the time to meet with members of our research team. Finally, we are grateful to members of the SALT Lab for their helpful feedback on this work. This research was partially supported by the National Science Foundation under award numbers IIS-2247357, CNS-2137784, and CNS-2145584. We would also like to acknowledge support from the Alfred P. Sloan Foundation, DSO National Laboratories (DSO), VMware, Google, and Catherine M. and James E. Allchin. Dora Zhao is supported in part by the Paul and Daisy Soros Fellowship for New Americans. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or other supporters.
References

[1] National Park Service. 2023. ‘Ōlelo Hawai‘i: Hawaiian Language. National Park Service. https://www.nps.gov/havo/learn/historyculture/olelo-hawaii.htm Hawai‘i Volcanoes National Park, accessed August 3, 2025.
[2] Kabir Ahuja, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, Prachi Jain, Akshay Nambi, Tanuja Ganu, Sameer Segal, Mohamed Ahmed, et al. 2023. MEGA: Multilingual Evaluation of Generative AI. In Conference on Empirical Methods in Natural Language Processing (EMNLP).
[3] Syed Mustafa Ali. 2016. A brief introduction to decolonial computing. XRDS: Crossroads, The ACM Magazine for Students 22, 4 (2016), 16–21.
[4] Marc Alier, María José Casañ, and Daniel Amo Filvà. 2023. Smart Learning Applications: Leveraging LLMs for Contextualized and Ethical Educational Technology. In International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM).
[5] Adriana Alvarado Garcia, Juan F Maestre, Manuhuia Barcham, Marilyn Iriarte, Marisol Wong-Villacres, Oscar A Lemus, Palak Dudani, Pedro Reynolds-Cuéllar, Ruotong Wang, and Teresa Cerratto Pargman. 2021. Decolonial pathways: Our manifesto for a decolonizing agenda in HCI research and design. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA).
[6] Joshua Asplund, Motahhare Eslami, Hari Sundaram, Christian Sandvig, and Karrie Karahalios. 2020. Auditing race and gender discrimination in online housing markets. In International AAAI Conference on Web and Social Media (ICWSM).
[7] Zinzi D Bailey, Nancy Krieger, Madina Agénor, Jasmine Graves, Natalia Linos, and Mary T Bassett. 2017. Structural racism and health inequities in the USA: evidence and interventions. The Lancet 389, 10077 (2017), 1453–1463.
[8] Ryan S Baker and Aaron Hawn. 2022. Algorithmic bias in education. International Journal of Artificial Intelligence in Education 32, 4 (2022), 1052–1092.
[9] Rachel Baker-Ramos, Will Gelder, Leah Cho, Jahnavi Kolakaluri, and Josiah Hester. 2025. Kumu Connect: Design Thinking for Place-Based Generative Educational Technology in Hawaiian Immersion Schools. In Conference on Research on Equitable and Sustained Participation in Engineering, Computing, and Technology (RESPECT).
[10] Matt Barnum. 2024. We tested an AI tutor for kids. It struggled with basic math. The Wall Street Journal (2024). https://www.wsj.com/tech/ai/ai-is-tutoring-students-but-still-struggles-with-basic-math-694e76d3
[11] Noman Bashir, Priya Donti, James Cuff, Sydney Sroka, Marija Ilic, Vivienne Sze, Christina Delimitrou, and Elsa Olivetti. 2024. The Climate and Sustainability Implications of Generative AI. An MIT Exploration of Generative AI (March 27, 2024). https://mit-genai.pubpub.org/pub/8ulgrckc.
[12] Ahmet Baytak. 2024. The Content Analysis of the Lesson Plans Created by ChatGPT and Google Gemini. Research in Social Sciences and Technology 9, 1 (2024), 329–350.
[13] Kamanamaikalani Beamer, Kawena Elkington, Pua Souza, Axel Tuma, Andrea Thorenz, Sandra Köhler, Kāneoka Kukea-Shultz, Keli‘i Kotubetey, and Kawika B Winter. 2023. Island and Indigenous systems of circularity: how Hawai‘i can inform the development of universal circular economy policy goals. Ecology and Society 28 (2023). Issue 1.
[14] Richard A Berk. 2021. Artificial intelligence, predictive policing, and risk assessment for law enforcement. Annual Review of Criminology 4, 1 (2021), 209–237.
[15] Nicola J Bidwell. 2021. Decolonising in the gaps: Community networks and the identity of African innovation. In Re-imagining Communication in Africa and the Caribbean: Global South Issues in Media, Culture and Technology. Springer, 97–115.
[16] Steven Bird. 2020. Decolonising speech and language technology. In International Conference on Computational Linguistics (COLING).
[17] Steven Bird. 2024. Must NLP be Extractive?
In Annual Meeting of the Association for Computational Linguistics (ACL).
[18] Abeba Birhane and Olivia Guest. 2021. Towards decolonising computational sciences. Kvinder, Køn & Forskning 29, 2 (2021), 60–73.
[19] Abeba Birhane, Ryan Steed, Victor Ojewale, Briana Vecchione, and Inioluwa Deborah Raji. 2024. AI auditing: The broken bus on the road to AI accountability. In IEEE Conference on Secure and Trustworthy Machine Learning (SaTML).
[20] Zach Boblitt. 2025. Indigenous environmental advocates say data centers risk water, culture and informed consent. https://www.publicradiotulsa.org/local-regional/2025-10-07/indigenous-environmental-advocates-say-data-centers-risk-water-culture-and-informed-consent
[21] Andrea Botero and Sampsa Hyysalo. 2013. Ageing together: Steps towards evolutionary co-design in everyday practices. CoDesign 9, 1 (2013), 37–54.
[22] Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency (FAccT).
[23] Beatriz Cabellos, Carlos De Aldama, and Juan-Ignacio Pozo. 2024. University teachers’ beliefs about the use of generative artificial intelligence for teaching and learning. Frontiers in Psychology 15 (2024).
[24] Miguel A. Cardona, Roberto J. Rodríguez, and Kristina Ishmael. 2023. Artificial Intelligence (AI) and the Future of Teaching and Learning: Insights and Recommendations. Technical Report. U.S. Department of Education, Office of Educational Technology (OET). https://www2.ed.gov/documents/ai-report/ai-report.pdf
[25] Rogelio E Cardona-Rivera, J Kaleo Alladin, Breanne K Litts, and Melissa Tehee. 2024. Indigenous futures in generative artificial intelligence: the paradox of participation. Teaching and Generative AI (2024).
[26] Kaavya D Chaparala, Guido Zarrella, Bruce Torres Fischer, Larry Kimura, and Oiwi Parker Jones. 2024.
Mai Ho‘omāuna i ka ‘Ai: Language Models Improve Automatic Speech Recognition in Hawaiian. In NeurIPS Workshop on Efficient Natural Language and Speech Processing (ENLSP).
[27] Le Chen, Ruijun Ma, Anikó Hannák, and Christo Wilson. 2018. Investigating the impact of gender on rank in resume search engines. In ACM CHI Conference on Human Factors in Computing Systems.
[28] Xinxi Chen, Li Wang, Wei Wu, Qi Tang, and Yiyao Liu. 2024. Honest AI: Fine-Tuning "Small" Language Models to Say "I Don’t Know", and Reducing Hallucination in RAG. In KDD Cup Workshop for Retrieval Augmented Generation.
[29] Victoria Clarke and Virginia Braun. 2014. Thematic analysis. In Encyclopedia of Critical Psychology. Springer, 1947–1952.
[30] Rolando Coto-Solano, Sally Akevai Nicholas, Samiha Datta, Victoria Quint, Piripi Wills, Emma Ngakuravaru Powell, Liam Koka’ua, Syed Tanveer, and Isaac Feldman. 2022. Development of Automatic Speech Recognition for the Documentation of Cook Islands Māori. In Language Resources and Evaluation Conference (LREC).
[31] Henriette Cramer, Jean Garcia-Gathright, Aaron Springer, and Sravana Reddy. 2018. Assessing and addressing algorithmic bias in practice. Interactions 25, 6 (2018), 58–63.
[32] Stefany Cruz, Alexander Redding, Connie W Chau, Claire Lu, Julia Persche, Josiah Hester, and Maia Jacobs. 2023. EquityWare: Co-designing wearables with and for low income communities in the US. In ACM CHI Conference on Human Factors in Computing Systems.
[33] Wei Dai, Jionghao Lin, Hua Jin, Tongguang Li, Yi-Shan Tsai, Dragan Gašević, and Guanliang Chen. 2023. Can large language models provide feedback to students? A case study on ChatGPT. In IEEE International Conference on Advanced Learning Technologies (ICALT). IEEE.
[34] Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (1990), 391–407.
[35] Wesley Hanwen Deng, Wang Claire, Howard Ziyu Han, Jason I Hong, Kenneth Holstein, and Motahhare Eslami. 2025. WeAudit: Scaffolding user auditors and AI practitioners in auditing generative AI. Proceedings of the ACM on Human-Computer Interaction 9, 7 (2025), 1–35.
[36] Wesley Hanwen Deng, Boyuan Guo, Alicia Devrio, Hong Shen, Motahhare Eslami, and Kenneth Holstein. 2023. Understanding practices, challenges, and opportunities for user-engaged algorithm auditing in industry practice. In ACM CHI Conference on Human Factors in Computing Systems.
[37] Alicia DeVos, Aditi Dhabalia, Hong Shen, Kenneth Holstein, and Motahhare Eslami. 2022. Toward User-Driven Algorithm Auditing: Investigating users’ strategies for uncovering harmful algorithmic behavior. In ACM CHI Conference on Human Factors in Computing Systems.
[38] Tawanna R Dillahunt and Amelia R Malone. 2015. The promise of the sharing economy among disadvantaged communities. In ACM CHI Conference on Human Factors in Computing Systems.
[39] Esin Durmus, Karina Nguyen, Thomas I Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, et al. 2024. Towards measuring the representation of subjective global opinions in language models. In Conference on Language Modeling (COLM).
[40] Nina-Simone Edwards. 2025. Unveiling the Environmental Impact of Large Language Models on Indigenous Communities. Tulane Environmental Law Journal 38, 1 (2025), 1–27.
[41] Susan Enright. 2025. UH Hilo Hawaiian language college turns to AI to help secure the future of ‘ōlelo Hawai‘i.
https://hilo.hawaii.edu/chancellor/stories/2025/02/ 26/hawaiian- language- college- turns- to- ai/ [42] Motahhare Eslami, Kristen V accaro, Karrie Karahalios, and Kevin Hamilton. 2017. “Be careful; things can be w orse than they appear”’: Understanding biased algorithms and users’ behavior around them in rating platforms. In International AAAI Conference on W eb and Social Media (ICWSM) . [43] Motahhare Eslami, Kristen V accaro, Min K yung Le e, Amit Elazari Bar On, Eric Gilbert, and Karrie K arahalios. 2019. User attitudes towards algorithmic opacity and transparency in online reviewing platforms. In ACM CHI Conference on Human Factors in Computing Systems . [44] Haoxiang Fan, Guanzheng Chen, Xingbo W ang, and Zhenhui Peng. 2024. Lesson- Planner: Assisting novice teachers to prepare pedagogy-driven lesson plans with large language models. In A nnual ACM Symposium on User Interface Software and T echnology (UIST) . [45] Miranda Fricker . 2007. Epistemic injustice: Power and the ethics of knowing . Oxford university press. [46] Gallup. 2025. T eaching for T omorrow: Unlocking Six W eeks a Y ear With AI . T e chni- cal Report. Gallup and W alton Family Foundation. https://ww w .gallup.com/le/ analytics/691922/W alton- Family- Foundation- Gallup- T eachers- AI- Report.p df [47] Catherine A Gao, Frederick M Howard, Nikolay S Markov , Emma C Dyer, Siddhi Ramesh, Y uan Luo, and Alexander T Pearson. 2023. Comparing scientic abstracts generated by ChatGPT to real abstracts with dete ctors and blinded human reviewers. NPJ Digital Medicine 6, 1 (2023), 75. [48] Tray Geiger and Margarita Pivovaro va. 2018. The eects of working conditions on teacher retention. Teachers and Teaching 24, 6 (2018), 604–625. [49] William Gelder , Rachel Baker-Ramos, A young Cho, Jahnavi K olakaluri, Judith Uchidiuno, and Josiah Hester. 2024. 
“Those don’t work for us”’: An Assets- Based Approach to Incorporating Emerging T echnologies in Viable Hawaiian T eacher Support T ools for Culturally Relevant CS Education. In Conference on Research on Equitable and Sustaine d Participation in Engineering, Computing, and T echnology (RESPECT) . [50] Sourojit Ghosh, Nina Lutz, and A ylin Caliskan. 2024. “I Don’t See Myself Represented Here at All’: User Experiences of Stable Diusion Outputs Con- taining Representational Harms across Gender Identities and Nationalities. In AAAI/ACM Conference on AI, Ethics, and Society (AIES) . [51] Sourojit Ghosh, Pranav Narayanan V enkit, Sanjana Gautam, Shomir Wilson, and A ylin Caliskan. 2024. Do generative AI models output harm while repre- senting non- W estern cultures: Evidence from a community-centered approach. In AAAI/ACM Conference on AI, Ethics, and Society (AIES) . [52] Melissa Ozlem Grab. 2025. Teaching for Equity: An Exploration of AI’s Role in Culturally Responsive Teaching in Higher Education Settings. Innovative Higher Education (2025), 1–22. [53] Carlos Guerrero Millan, Bettina Nissen, and Larissa Pschetz. 2024. Cosmovision Of Data: An Indigenous Approach to T echnologies for Self-Determination. In ACM CHI Conference on Human Factors in Computing Systems . [54] Mascha Gugganig. 2021. Hawai‘i as a laborator y paradise: Divergent sociotech- nical island imaginaries. Science as Culture 30, 3 (2021), 342–366. [55] Edward Smith Handy , Elizabeth Green Handy , and Mary Kawena Pukui. 1972. Native planters in old Hawaii: Their life, lore, and environment . Bishop Museum Press. [56] Violet Harada. 2016. The Power of Place-Based Learning: Caring for Our Island Earth. https://ww w2.hawaii.edu/~vharada/PDF/works/2016- P laceBased.pdf Accessed: July 20, 2025. [57] Donna Haraway . 1988. Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective. Feminist Studies 14, 3 (1988), 575–599. [58] Christina Harrington and Tawanna R Dillahunt. 2021. 
Eliciting tech futures among Black young adults: A case study of remote spe culative co-design. In ACM CHI Conference on Human Factors in Computing Systems . [59] Christina Harrington, Sheena Erete, and Anne Marie Piper . 2019. Deconstructing community-based collaborative design: T owards more equitable participatory design engagements. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019). [60] Emma Harvey , Allison Koenecke, and Rene F Kizilcec. 2025. “Don’t Forget the T eachers”’: T owards an Educator-Center ed Understanding of Harms from Large Language Models in Education. In ACM CHI Conference on Human Factors in Computing Systems . [61] Md Arid Hasan, Prerona T arannum, Krishno Dey , Imran Razzak, and Usman Naseem. 2024. Do large language models speak all languages equally? a com- parative study in low-resource settings. arXiv preprint arXiv:2408.02237 (2024). [62] Reem Hashem, Nagla Ali, Farah El Zein, Patricia Fidalgo, and Othman Abu Khurma. 2024. AI to the rescue: Exploring the p otential of ChatGPT as a teacher ally for workload relief and burnout prevention. Research & Practice in T echnology Enhance d Learning 19 (2024). [63] Hawai‘i State Department of Education. 2025. Articial Intelligence. https: //hawaiipublicschools.org/student- programs/articial- intelligence/. Accessed on July 23, 2025. [64] Hawai‘i State Department of Education, Oce of T alent Management. 2024. Employment Report: School Y ear 2023–24 . T echnical Report. Hawai‘i State De- partment of Education. https://hawaiipublicschools.org/wp- content/uploads/ Employment- Report- SY- 2023- 24.pdf [65] William Held, Camille Harris, Michael Best, and Diyi Y ang. 2023. A material lens on coloniality in NLP. arXiv preprint arXiv:2311.08391 (2023). [66] HISTORY .com Editors. 2025. Captain Cook killed in Hawaii. https://www.history .com/this- day- in- history/februar y- 14/captain- cook- killed- in- hawaii. Accessed on August 24, 2025. 
[67] Daniel Homan, Seungoh Paek, Peter Leong, and Rochelle Pi‘ilani Ka‘aloa. 2023. Advancing Culturally-Relevant Computing through a Researcher-Practitioner Partnership: Kumu Perspectives on LLMs for Culturally Revitalizing CS Educa- tion in Hawaiian Schools. In EdMedia + Innovate Learning 2023 . Association for the Advancement of Computing in Education ( AACE). [68] Kenneth Holstein, Jennifer W ortman V aughan, Hal Daumé III, Miro Dudik, and Hanna W allach. 2019. Improving fairness in machine learning systems: What do industry practitioners need?. In A CM CHI Conference on Human Factors in Computing Systems . [69] Bihao Hu, Longwei Zheng, Jiayi Zhu, Lishan Ding, Yilei W ang, and Xiaoqing Gu. 2024. T eaching plan generation and evaluation with GPT -4: Unleashing the potential of LLM in instructional design. IEEE Transactions on Learning T echnologies 17 (2024), 1445–1459. [70] Nidhal Jegham, Marwan Abdelatti, Chan Y oung Koh, Lassad Elmoubarki, and Abdeltawab Hendawi. 2025. How hungry is AI? benchmarking energy , water , and carbon footprint of llm inference. arXiv preprint arXiv:2505.09598 (2025). [71] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Y u, Dan Su, Y an Xu, Etsuko Ishii, Y e Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation. A CM Comput. Surveys 55, 12 (2023). [72] Y oung J Juhn, Euijung Ryu, Chung-Il Wi, Katherine S King, Momin Malik, Santiago Romero-Brufau, Chunhua W eng, Sunghwan Sohn, Richar d R Sharp, and John D Halamka. 2022. Assessing socioeconomic bias in machine learning algorithms in health care: a case study of the HOUSES index. Journal of the A merican Medical Informatics Association ( JAMIA) 29, 7 (2022), 1142–1151. [73] Muhammet Remzi Karaman et al . 2024. Are Lesson Plans Created by ChatGPT More Eective? An Experimental Study . International Journal of T echnology in Education 7, 1 (2024), 107–127. 
[74] Enkelejda Kasneci, Kathrin Seßler , Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer , Urs Gasser , Georg Groh, Stephan Günnemann, Eyke Hüllermeier , et al . 2023. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Dierences 103 (2023). [75] Frank Kehoe. 2023. Leveraging generativ e AI tools for enhanced lesson planning in initial teacher education at post primary . Irish Journal of T echnology Enhance d CHI ’26, April 13–17, 2026, Barcelona, Spain Zhao et al. Learning 7, 2 (2023), 172–182. [76] Brandon Khoo, Raphaël C- W Phan, and Chern-Hong Lim. 2022. Deepfake attribution: On the source identication of articially generated images. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12, 3 (2022). [77] Shana Klein. 2025. “The Kingdom Grown Out of a Little Boy’s Garden”’: Dole Pineapples and Hawaiian Occupation in US Art and Visual Culture. In The Routledge Companion to Art and the Formation of Empire . Routledge, 44–63. [78] Jake Knapp, John Zeratsky , and Braden Kowitz. 2016. Sprint: How to solve big problems and test new ideas in just ve days . Simon and Schuster. [79] Y asmine Kotturi, Julie Hui, TJ Johnson, Lutalo Sanifu, and T awanna R Dillahunt. 2024. Sustaining Community-Based Research in Computing: Lessons from T wo T ech Capacity Building Initiatives for Local Businesses. Proceedings of the ACM on Human-Computer Interaction 8, CSCW (2024). [80] Lindah Kotut and D Scott McCrickard. 2022. Winds of change: Seeking, pre- serving, and retelling indigenous knowledge through self-organized online communities. In ACM CHI Conference on Human Factors in Computing Systems . [81] Dilek Küçük and Fazli Can. 2020. Stance Detection: A Survey . ACM Comput. Surv . 53, 1, Article 12 (Feb. 2020). [82] Sabina Lacmanovic and Marinko Skare. 2025. Articial intelligence bias auditing–current approaches, challenges and lessons from practice. 
Review of Accounting and Finance 24, 3 (2025), 375–400. [83] Michelle S Lam, Mitchell L Gordon, Danaë Metaxa, Jerey T Hancock, James A Landay , and Michael S Bernstein. 2022. End-user audits: A system empowering communities to lead large-scale investigations of harmful algorithmic behavior. Proceedings of the ACM on Human-Computer Interaction 6, CSCW (2022). [84] Jean Lave and Etienne W enger . 1991. Situated learning: Legitimate peripheral participation . Cambridge University Press. [85] Shaimaa Lazem, Danilo Giglitto, Makuochi Samuel Nkwo, Hafeni Mthoko, Jes- sica Upani, and Anicia Peters. 2022. Challenges and paradoxes in decolonising HCI: A critical discussion. Computer Supported Cooperative W ork (CSCW) 31, 2 (2022), 159–196. [86] Christopher A. Le Dantec and Sarah Fox. 2015. Strangers at the Gate: Gaining Access, Building Rapport, and Co-Constructing Community-Based Research. In ACM Conference on Computer Supp orted Cooperative W ork & Social Computing (CSCW) . Association for Computing Machinery . [87] Brandon Ledward, Brennan T akayama, and W alter Kahumoku III. 2008. Hawai- ian Cultural Inuences in Education (HCIE): ‘Ohana and Community Integration in Culture-Based Education . Technical Report. https://w ww.ksbe.edu/assets/ research/collection/08_0128_ledward.pdf Accessed on July 12, 2025. [88] Gyeong-Geon Lee and Xiaoming Zhai. 2024. Using ChatGPT for science learning: A study on pre-service teachers’ lesson planning. IEEE Transactions on Learning T echnologies 17 (2024), 1643–1660. [89] Suevon Lee. 2019. The Challenges Of Finding Hawaii’s Next Generation Of T each- ers . https://www .civilbeat.org/2019/05/the- challenges- of- nding- hawaiis- next- generation- of- teachers/ [90] Jason Edward Lewis, H ¯ emi Whaanga, and Ceyda Y olgörmez. 2025. Abundant intelligences: placing AI within Indigenous knowledge frameworks. Ai & Society 40, 4 (2025), 2141–2157. [91] Zhaoming Liu. 2025. 
Cultural bias in large language models: A comprehensive analysis and mitigation strategies. Journal of Transcultural Communication 3, 2 (2025), 224–244. [92] Kelly Av ery Mack, Rida Qadri, Remi Denton, Shaun K Kane, and Cynthia L Bennett. 2024. “They only care to show us the wheelchair”’: disability represen- tation in text-to-image AI models. In ACM CHI Conference on Human Factors in Computing Systems . [93] Suvradip Maitra. 2020. Articial intelligence and indigenous perspectives: Protecting and empowering intelligent human beings. In AAAI/ACM Conference on AI, Ethics, and Society (AIES) . [94] David Malo. 2021. Hawaiian A ntiquities (Moolelo Hawaii) . Mint Editions. Origi- nally published 1903. [95] Nora McDonald, Sarita Schoenebeck, and Andrea Forte. 2019. Reliability and inter-rater reliability in qualitative resear ch: Norms and guidelines for CSCW and HCI practice. Proce edings of the ACM on Human-Computer Interaction 3, CSCW (2019). [96] Danaë Metaxa, Joon Sung Park, Ronald E. Robertson, Karrie K arahalios, Christo Wilson, Je Hancock, and Christian Sandvig. 2021. Auditing Algorithms: Un- derstanding Algorithmic Systems from the Outside In. Foundations and Trends ® in Human–Computer Interaction 14, 4 (2021), 272–344. [97] Jennifer Meyer , Thorb en Jansen, Ronja Schiller , Lucas W Liebenow, Marlene Steinbach, Andrea Horbach, and Johanna Fleckenstein. 2024. Using LLMs to bring evidence-based fe edback into the classroom: AI-generated feedback increases secondary students’ text revision, motivation, and positive emotions. Computers and Education: Articial Intelligence 6 (2024). [98] Manas Mhasakar , Rachel Baker-Ramos, Benjamin Carter, Evyn-Bree Helekahi- Kaiwi, and Josiah Hester . 2025. “I W ould Never Trust Anything W estern”’: Kumu (Educator) Perspectives on Use of LLMs for Culturally Revitalizing CS Education in Hawaiian Schools. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA) . [99] W alter D Mignolo. 2007. 
Delinking: The rhetoric of mo dernity , the logic of coloniality and the grammar of de-coloniality . Cultural Studies 21, 2-3 (2007), 449–514. [100] Nusrat Jahan Mim, Dipannita Nandi, Sadaf Sumyia Khan, Arundhuti Dey , and Syed Ishtiaque Ahmed. 2024. In-Between Visuals and Visible: The Impacts of T ext-to-Image Generative AI T o ols on Digital Image-making Practices in the Global South. In ACM CHI Conference on Human Factors in Computing Systems . [101] Mirja Mittermaier , Marium M Raza, and Joseph C K vedar. 2023. Bias in AI-based models for medical applications: challenges and mitigation strategies. NPJ Digital Medicine 6, 1 (2023), 113. [102] Reza Hadi Mogavi, Chao Deng, Justin Juho Kim, Pengyuan Zhou, Y oung D K won, Ahmed Hosny Saleh Metwally , Ahmed Tlili, Simone Bassanelli, Antonio Bucchiarone, Sujit Gujar , et al . 2024. ChatGPT in education: A blessing or a curse? A qualitative study exploring early adopters’ utilization and perceptions. Computers in Human Behavior: Articial Humans 2, 1 (2024). [103] Shakir Mohamed, Marie- Therese Png, and William Isaac. 2020. De colonial AI: Decolonial theory as sociotechnical for esight in articial intelligence. Philosophy & T echnology 33, 4 (2020), 659–684. [104] Marryam Y ahya Mohammed, Sama A yman Ali, Salma Khaled Ali, Ayad Abdul Majeed, and Ensaf Hussein Mohame d. 2025. Aftina: enhancing stability and preventing hallucination in AI-based Islamic fatwa generation using LLMs and RAG. Neural Computing and Applications (2025), 1–26. [105] Jakob Mökander . 2023. Auditing of AI: Legal, ethical and technical approaches. Digital Society 2, 3 (2023), 49. [106] W armhold Jan Thomas Mollema. 2025. A taxonomy of epistemic injustice in the context of AI and the case for generative hermeneutical erasure. AI and Ethics (July 2025). [107] Luis Morales-Navarro, Y asmin Kafai, V edya Konda, and Danaë Metaxa. 2024. Y outh as peer auditors: Engaging teenagers with algorithm auditing of machine learning applications. 
In ACM Interaction Design and Children Conference (IDC) . [108] Sukanya Kannan Moudgalya and Sai Swaminathan. 2024. T oward Data Sovereignty: Justice-oriente d and Community-based AI Education. In Conference on Research on Equitable and Sustaine d Participation in Engineering, Computing, and T echnology (RESPECT) . [109] Adelina Moura and Ana Amélia A Car valho. 2024. T eachers’ perceptions of the use of articial intelligence in the classroom. In International Conf erence on Lifelong Education and Leadership for All (ICLEL) . [110] Jackie Ng-Osorio. 2023. Learning from the Past to Grow the Future: Of ‘ ¯ Aina- Based Education. https://ww w .hauolimauloa.org/images/%CA%BBAina- Lit- Review_FINAL_05- 03- 23.pdf Accessed on July 20, 2025. [111] T uan D Nguyen, Chanh B Lam, and Paul Bruno. 2024. What do we know about the extent of teacher shortages nationwide? A systematic examination of reports of US teacher shortages. Aera Open 10 (2024). [112] Chaima Njeh, Haïfa Nakouri, and Fehmi Jaafar . 2024. Enhancing rag-retrieval to improve llms robustness and resilience to hallucinations. In International Conference on Hybrid A rticial Intelligence Systems . Springer , 201–213. [113] Ziad Obermeyer , Brian Powers, Christine V ogeli, and Sendhil Mullainathan. 2019. Dissecting racial bias in an algorithm used to manage the health of p opulations. Science 366, 6464 (2019), 447–453. [114] Oce of Hawaiian Education (OHE). 2015. P lan for Oce of Hawaiian Educa- tion Priorities. https://www.hawaiipublicschools.org/DOE%20Forms/Hawaiian/ OHE_DeliveryPlan.pdf . Accessed: 2025-07-23. [115] Victor Ojewale, Ryan Steed, Briana V ecchione, Abeba Birhane, and Inioluwa Deb- orah Raji. 2025. T owards AI accountability infrastructure: Gaps and opp ortuni- ties in AI audit tooling. In A CM CHI Conference on Human Factors in Computing Systems . [116] Luci Pangrazio. 2024. Data harms: The evidence against education data. Post- digital Science and Education 6, 4 (2024), 1049–1054. 
[117] Christina Ioanna Pappa, Despoina Georgiou, and Daniel Pittich. 2024. T echnol- ogy education in primary schools: addressing teachers’ perceptions, perceived barriers, and needs. International Journal of T e chnology and Design Education 34, 2 (2024), 485–503. [118] Babette Park, T erri Flowerday , and Roland Brünken. 2015. Cognitive and aec- tive eects of seductive details in multimedia learning. Computers in Human Behavior 44 (2015), 267–278. [119] Andrea G Parker and Rebecca E Grinter . 2014. Collectivistic health promotion tools: Accounting for the relationship b etween culture, food and nutrition. International Journal of Human-Computer Studies 72, 2 (2014), 185–206. [120] Sachin R Pendse, Daniel Nkemelu, Nicola J Bidwell, Sushrut Jadhav , Soumitra Pathare, Munmun De Choudhury , and Neha Kumar . 2022. From treatment to healing: envisioning a decolonial digital mental health. In ACM CHI Conference on Human Factors in Computing Systems . [121] Maneesha Perera, Rajith Vidanaarachchi, Sangeetha Chandrashekeran, Melissa Kennedy , Brendan Kennedy , and Saman Halgamuge. 2025. Indigenous peoples and articial intelligence: A systematic review and future directions. Big Data & Society 12, 2 (2025), 20539517251349170. Whose Knowledge Counts? Co-Designing Community-Centered AI Auditing T ools with Educators in Hawai‘i CHI ’26, April 13–17, 2026, Barcelona, Spain [122] Claudio S Pinhanez, Paulo R Cavalin, Marisa V asconcelos, Julio Nogima, et al . 2023. Balancing So cial Impact, Opportunities, and Ethical Constraints of Us- ing AI in the Documentation and Vitalization of Indigenous Languages. In International Joint Conference on Articial Intelligence (IJCAI) . [123] Grin Pitts, Viktoria Marcus, and Sanaz Motamedi. 2025. Student Persp ectives on the Benets and Risks of AI in Education. In ASEE A nnual Conference & Exposition . [124] Maureen K. Porter and Nik Cristobal. 2018. Cultivating Aloha ‘ ¯ Aina Through Critical Indigenous Pedagogies of Place. 
Journal of Folklore and Education (2018). https://jfepublications.org/article/cultivating- aloha- aina/ Accessed on July 20, 2025. [125] Snehal Prabhudesai, Ananya Prashant Kasi, Anmol Mansingh, Anindya Das An- tar , Hua Shen, and Nikola Banovic. 2025. “Here the GPT made a choice, and every choice can be biased”’: Ho w Students Critically Engage with LLMs through End- User A uditing Activity . In ACM CHI Conference on Human Factors in Computing Systems . [126] Rida Qadri, Mark Diaz, Ding W ang, and Michael Madaio. 2025. The Case for “Thick Evaluations”’ of Cultural Representation in AI. In AAAI/ACM Conference on AI, Ethics, and Society (AIES) . [127] Hazur Rahaman, Michelle Johnston, and Erik Champion. 2021. A udio- augmented arboreality: wildowers and language. Digital Creativity 32, 1 (2021), 22–37. [128] Inioluwa Deborah Raji, Andrew Smart, Reb ecca N White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. 2020. Closing the AI accountability gap: Dening an end-to-end frame- work for internal algorithmic auditing. In ACM Conference on Fairness, Account- ability , and Transparency (FA ccT) . [129] T asha Riley , Troy Meston, Chesley Cutler, Samantha Low-Choy , Brittany A McCormack, Eun-Ji Amy Kim, Sonal Nakar , and Daniela V asco. 2024. W eaving stories of strength: Ethically integrating Indigenous content in teacher education and professional development programmes. T eaching and Teacher Education 142 (2024), 104513. [130] Pedro Saleiro, Benedict Kuester , Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T . Rodolfa, and Rayid Ghani. 2019. Aequitas: A Bias and Fairness Audit T oolkit. arXiv:1811.05577 [cs.LG] https://ar xiv .org/abs/1811.05577 [131] Anne Salmond. 2004. The Trial of the Cannibal Dog: Captain Cook in the South Seas . Penguin Books, London. [132] Christian Sandvig, Kevin Hamilton, Karrie Karahalios, and Ce dric Langbort. 2014. 
A uditing algorithms: Research methods for detecting discrimination on internet platforms. Data and discrimination: converting critical concerns into productive inquiry 22, 2014 (2014), 4349–4357. [133] Neil Selwyn, Marita Ljungqvist, and Anders Sonesson. 2025. When the prompt- ing stops: Exploring teachers’ work around the educational frailties of generative AI tools. Learning, Media and Technology (2025), 1–14. [134] Laleh Seyyed-Kalantari, Haoran Zhang, Matthe w BA McDermott, Irene Y Chen, and Marzyeh Ghassemi. 2021. Underdiagnosis bias of articial intelligence algorithms applied to chest radiographs in under-served patient populations. Nature Medicine 27, 12 (2021), 2176–2182. [135] Farhana Shahid and Aditya V ashistha. 2023. Decolonizing content moderation: does uniform global community standard resemble utopian equality or western power hegemony?. In ACM CHI Conference on Human Factors in Computing Systems . [136] Shikhar Sharma, Manas Mhasakar , Apur v Mehra, Utkarsh V enaik, Ujjwal Sing- hal, Dhruv Kumar , and Kashish Mittal. 2024. Comuniqa: Exploring large lan- guage models for improving english speaking skills. In ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies (COMP ASS) . [137] Andrew Shaw, Andre Y e, Ranjay Krishna, and Amy Zhang. 2025. Agonistic Image Generation: Unsettling the Hegemony of Intention. In Procee dings of the 2025 ACM Conference on Fairness, Accountability , and Transparency . 438–463. [138] Sarah B Shear , Ryan T Knowles, Gregory J Soden, and Antonio J Castro. 2015. Manifesting destiny: Re/presentations of indigenous peoples in K–12 US history standards. The ory & Research in So cial Education 43, 1 (2015), 68–101. [139] Renee Shelby, Shalaleh Rismani, and Negar Rostamzadeh. 2024. Generative AI in Creative Practice: ML- Artist Folk Theories of T2I Use, Harm, and Harm- Reduction. In ACM CHI Conference on Human Factors in Computing Systems . [140] Hong Shen, Alicia De V os, Motahhare Eslami, and K enneth Holstein. 
2021. Every- day algorithm auditing: Understanding the power of everyday users in surfacing harmful algorithmic behaviors. Proceedings of the ACM on Human-Computer Interaction 5, CSCW (2021). [141] Eugenia Siapera. 2022. AI content moderation, racism and (de) coloniality . International journal of bullying prevention 4, 1 (2022), 55–65. [142] Mona Sloane , Emanuel Moss, Olaitan A womolo, and Laura Forlano. 2022. Partic- ipation Is not a Design Fix for Machine Learning. In ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAMMO) . New Y ork, NY, USA. [143] William A Smith, T ara J Y osso, and Daniel G Solórzano. 2011. Challenging racial battle fatigue on historically White campuses: A critical race examination of race-related stress. Covert racism 32 (2011), 211–237. [144] Jaemarie Solyst, Cindy Peng, W esley Hanwen Deng, Praneetha Pratapa, Amy Ogan, Jessica Hammer , Jason Hong, and Motahhare Eslami. 2025. Investigating Y outh AI Auditing. In ACM Conference on Fairness, Accountability , and Trans- parency (F AccT) . [145] Isabella Spano and Yuxiao Zhang. 2025. Indigenous data sovereignty in intangi- ble cultural heritage governance: A complementary approach to public–private partnerships. International Journal of Cultural Property (2025). [146] State of Hawaii Legislative Reference Bureau. n.d.. Hawaii Constitutional Con- vention Studies 1978. https://lrb.hawaii.gov/hawaii- constitutional- convention- studies- 1978/. Accessed on August 25, 2025. [147] NarayanKripa Sundararajan and Olusola Adesope. 2020. Keep it coherent: A meta-analysis of the seductive details eect. Educational Psychology Review 32, 3 (2020), 707–734. [148] Latanya Sweeney . 2013. Discrimination in online ad delivery . Commun. ACM 56, 5 (2013), 44–54. [149] Kim T allBear . 2019. Caretaking relations, not American dreaming. Kalfou: A Journal of Comparative and Relational Ethnic Studies 6, 1 (2019), 24–41. 
[150] Y an Tao, Olga Viberg, Ryan S Baker , and René F Kizilce c. 2024. Cultural bias and cultural alignment of large language models. PNAS Nexus 3, 9 (2024). [151] Honor the Earth. 2026. No Data Centers on Native Land: Campaign Overview . https://www.honor earth.org/stopdatacolonialism [152] Deepa Tilwani, Revathy V enkataramanan, and Amit P Sheth. 2024. Neurosym- bolic ai approach to attribution in large language mo dels. IEEE Intelligent Systems 39, 6 (2024), 10–17. [153] Emily T seng, Meg Y oung, Marianne Aubin Le Quéré, Aimee Rinehart, and Harini Suresh. 2025. “Ownership, Not Just Happy Talk”’: Co-Designing a Participa- tory Large Language Model for Journalism. In ACM Conference on Fairness, Accountability , and Transparency (F AccT) . [154] U.S. Census Bureau. 2010. Hawaii — Guide to 2010 State and Local Census Ge- ography . https://ww w .census.gov/geographies/reference- les/2010/geo/state- local- geo- guides- 2010/hawaii.html. [155] Jacky Vallée . 2018. Eurocentrism in the curriculum: A barrier to Indigenous student success. (2018). [156] Rama Adithya V aranasi and Nitesh Goyal. 2023. “It is currently hodgepodge”: Examining AI/ML practitioners’ challenges during co-production of responsible AI values. In ACM CHI Conference on Human Factors in Computing Systems . [157] T ania Villalob os Lujan, Pratim Sengupta, Derya Akbaba, Soa Alessio-Robles, Amelia Lee Dogan, Monica Meltis, Firuzeh Shokooh V alle, and Lora Oehlberg. 2025. Interaction Design as a Form of Decolonial Care. In Conference on Cre- ativity and Cognition (C&C) . [158] Jingtan W ang, Xinyang Lu, Zitong Zhao, Zhongxiang Dai, Chuan-Sheng Foo, See-Kiong Ng, and Br yan Kian Hsiang Low . 2024. Source Attribution for Large Language Model-Generated Data. arXiv:2310.00646 [cs.LG] https://arxiv .org/ abs/2310.00646 [159] Athena W en, T anush Patil, Ansh Saxena, Yicheng Fu, Sean O’Brien, and Kevin Zhu. 2025. F AIRE: Assessing Racial and Gender Bias in AI-Driven Resume Evaluations. 
arXiv preprint arXiv:2504.01420 (2025). [160] Apryl A Williams, Zaida Bryant, and Christopher Carvell. 2019. Uncompensated emotional labor , racial battle fatigue, and (in) civility in digital spaces. So ciology Compass 13, 2 (2019). [161] K yra Wilson and Aylin Caliskan. 2024. Gender , race, and intersectional bias in resume screening via language model retrieval. In AAAI/ACM Conference on AI, Ethics, and Society (AIES) , V ol. 7. [162] Langdon Winner. 1980. Do Artifacts Have Politics? Daedalus 109, 1 (1980), 121–136. http://w ww .jstor .org/stable/20024652 [163] Richmond Y W ong and V era Khovanskaya. 2018. Speculative design in HCI: from corporate imaginations to critical orientations. In New Directions in Third W ave Human-Computer Interaction: V olume 2-Methodologies . Springer , 175–202. [164] Marisol W ong- Villacres, Adriana Alvarado Garcia, Juan F Maestre, Pedro Reynolds-Cuéllar , Heloisa Candello, Marilyn Iriarte, and Carl DiSalvo. 2020. Decolonizing learning spaces for sociotechnical research and design. In Com- panion Publication of the 2020 Conference on Computer Supp orted Cooperative W ork and Social Computing . 519–526. [165] Marisol W ong- Villacres, Adriana Alvarado Gar cia, and Javier Tibau. 2020. Re- ections from the classroom and beyond: Imagining a decolonized hci education. In Extende d Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA) . [166] Marisol W ong- Villacres, Carl DiSalvo, Neha Kumar , and Betsy DiSalvo. 2020. Culture in action: Unpacking capacities to inform assets-based design. In ACM CHI Conference on Human Factors in Computing Systems . [167] Lixiang Y an, Lele Sha, Linxuan Zhao, Y uheng Li, Roberto Martinez-Maldonado, Guanliang Chen, Xinyu Li, Y ueqiao Jin, and Dragan Gašević. 2024. Practical and ethical challenges of large language models in e ducation: A systematic scoping review . British Journal of Educational T echnology 55, 1 (2024), 90–112. [168] D Lilinoe Y ong and Ellen S Homan. 2014. 
T eacher T e chnology Narratives: Native Hawaiian Views on Education and Change. Qualitative Report 19 (2014), 16. CHI ’26, April 13–17, 2026, Barcelona, Spain Zhao et al. [169] Jason C. Y oung. 2019. The New Knowledge Politics of Digital Colonialism. Environment and Planning A: Economy and Space 51, 7 (Oct. 2019), 1424–1441. [170] Meg Y oung, Lassana Magassa, and Batya Friedman. 2019. T oward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information T echnology 21, 2 (2019), 89–103. [171] Chunpeng Zhai, Santoso Wibowo, and Lily D Li. 2024. The eects of over- reliance on AI dialogue systems on students’ cognitive abilities: a systematic review . Smart Learning Environments 11, 1 (2024), 28. [172] Lili Zhang, Xi Liao , Zaijia Y ang, Baihang Gao, Chunjie W ang, Qiuling Y ang, and Deshun Li. 2024. Partiality and misconception: Investigating cultural represen- tativeness in text-to-image models. In A CM CHI Conference on Human Factors in Computing Systems . [173] Tianyang Zhong, Zhenyuan Y ang, Zhengliang Liu, Ruidong Zhang, Yiheng Liu, Haiyang Sun, Yi Pan, Yiwei Li, Yifan Zhou, Hanqi Jiang, Junhao Chen, and Tianming Liu. 2024. Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research. arXiv:2412.04497 [cs.CL] https://arxiv .org/abs/2412.04497 [174] Marta Ziosi and Dasha Pruss. 2024. Evidence of what, for whom? the socially contested role of algorithmic bias in a predictive policing tool. In ACM Conference on Fairness, Accountability , and Transparency (FAccT) . [175] Leying Zou and W arut Khern-am nuai. 2023. AI and housing discrimination: The case of mortgage applications. AI and Ethics 3, 4 (2023), 1271–1281. A Appendix A.1 Study Materials W e provide the materials use d during our co-design workshops including the design exercise prompts and survey questions. A.1.1 Design Exercises. In Fig. 
7, we provide the prompts for our two design exercises: rapid prototyping and storyboarding. After the design exercise, participants were invited to share out with the group about the designs they created. We asked the following questions to scaffold the sharing process:
(1) What is the purpose of the tool? What types of biases does it aim to audit?
(2) How would you use the tool?
(3) What did you prioritize when designing this tool?

A.1.2 Survey Questions. At the conclusion of the workshop, participants were invited to complete the following survey. Questions are shown in Fig. 8 and 9.

A.2 Codebook
We provide the codebook produced through our analysis process in Table 1.

Figure 7: Workshop participants completed one of two design exercises. Participants in Workshops 1 and 2 took part in a rapid design exercise (top) focused on ideating many divergent designs for auditing tools. Participants in Workshops 3 and 4, in groups of 2-3, completed a storyboarding exercise for a single design (bottom).

Figure 8: Online Qualtrics survey that participants took after the co-design workshop (Part 1).

Figure 9: Online Qualtrics survey that participants took after the co-design workshop (Part 2).
Concerns about Generative AI
    Coherence (e.g., alignment to Hawaiian culture): 6
    Connotations (e.g., Hawaiian culture being treated as part of history): 6
    Incorrectness and Hallucinations: 19
    Irrelevance of Outputs to Hawaiian Culture: 6
    Only surface-level representation of Hawaiian culture: 13
    Erasure of Cultural Elements: 11
    Skewedness of Cultural Depiction: 5
    Not representing the diversity of Hawai‘i: 6
    Specificity of Cultural Elements: 7

Challenges with identifying harms
    Disagreements about what is biased: 11
    Lack of data and sources about Hawaiian culture: 9
    Need community knowledge to do auditing / identify harms: 20
    No ground truth to compare against: 5
    Lack of self-confidence when identifying bias: 7

Factors influencing design
    Bigger-picture issues beyond representational bias: 5
    Grade level and subject matter influence needs: 4
    Importance of social support for educators: 6
    Accounting for the positionality of auditors: 5
    Being able to trust sources and vet reliability of information: 37
    Scarcity of data: 4
    Integrating Hawaiian Values: 4
    Kilo (learning via observation): 2
    Knowledge Provenance: 8

Tensions in design
    Automation vs. Reflection: 9
    Cultural specificity vs. Unawareness: 9
    Data Sharing vs. Co-option: 3
    Efficiency vs. Quality: 3
    Stereotyping vs. Personalization: 7

Auditing Tool Ideas
    Perspective auditing: 14
    Auditing the prompt or person, not the output: 2
    Comparing across models and outputs: 9
    Source auditing: 22
    Flagging bias: 3

Table 1: Codebook with themes and sub-codes surfaced through the reflexive thematic analysis described in Sec. 4.3. The count of coded excerpts follows each sub-code.
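To make the theme/sub-code/count structure of the codebook concrete, the sketch below represents a subset of it as a nested mapping and tallies counts per theme. The data are copied from Table 1, but the representation and names (`codebook`, `theme_totals`) are our own illustrative choices: the study's coding was performed manually through reflexive thematic analysis, not with this or any code.

```python
# Illustrative only: a subset of the Table 1 codebook as a nested
# mapping from theme -> {sub-code: count}. Names are hypothetical.
from typing import Dict

codebook: Dict[str, Dict[str, int]] = {
    "Tensions in design": {
        "Automation vs. Reflection": 9,
        "Cultural specificity vs. Unawareness": 9,
        "Data Sharing vs. Co-option": 3,
        "Efficiency vs. Quality": 3,
        "Stereotyping vs. Personalization": 7,
    },
    "Auditing Tool Ideas": {
        "Perspective auditing": 14,
        "Auditing the prompt or person, not the output": 2,
        "Comparing across models and outputs": 9,
        "Source auditing": 22,
        "Flagging bias": 3,
    },
}

def theme_totals(cb: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum sub-code counts within each theme."""
    return {theme: sum(codes.values()) for theme, codes in cb.items()}

print(theme_totals(codebook))
# -> {'Tensions in design': 31, 'Auditing Tool Ideas': 50}
```

Summing within themes this way makes it easy to see, for instance, that source auditing (22) dominates the tool ideas surfaced in the workshops.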