AI Model Modulation with Logits Redistribution


Authors: Zihan Wang, Zhongkui Ma, Xinguo Feng

Zihan Wang∗† (University of Queensland, CSIRO's Data61, Brisbane, Australia), Zhongkui Ma∗ (University of Queensland, Brisbane, Australia), Xinguo Feng (University of Queensland, Brisbane, Australia), Zhiyang Mei (University of Queensland, Brisbane, Australia), Ethan Ma (University of Queensland, Brisbane, Australia), Derui Wang (CSIRO's Data61, Melbourne, Australia), Minhui Xue (CSIRO's Data61, Adelaide, Australia), Guangdong Bai‡ (University of Queensland, Brisbane, Australia)

Abstract

Large-scale models are typically adapted to meet the diverse requirements of model owners and users. However, maintaining multiple specialized versions of the model is inefficient. In response, we propose AIM, a novel model modulation paradigm that enables a single model to exhibit diverse behaviors to meet the specific end requirements. AIM enables two key modulation modes: utility and focus modulation. The former provides model owners with dynamic control over output quality to deliver varying utility levels, and the latter offers users precise control to shift the model's focus across input features. AIM introduces a logits redistribution strategy that operates in a training-data-agnostic and retraining-free manner. We establish a formal foundation to ensure AIM's regulation capability, based on the statistical properties of logits ordering via joint probability distributions. Our evaluation confirms AIM's practicality and versatility for AI model modulation, with tasks spanning image classification, semantic segmentation, and text generation, and prevalent architectures including ResNet, SegFormer, and Llama.

CCS Concepts: • Computing methodologies → Machine learning; Artificial intelligence.

Keywords: Neural Networks, Model Modulation, Usage Control

ACM Reference Format: Zihan Wang, Zhongkui Ma, Xinguo Feng, Zhiyang Mei, Ethan Ma, Derui Wang, Minhui Xue, and Guangdong Bai. 2025. AI Model Modulation with Logits Redistribution.
In Proceedings of the ACM Web Conference 2025 (WWW '25), April 28-May 2, 2025, Sydney, NSW, Australia. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3696410.3714737

∗ Equal contribution. † Zihan Wang is supported by the Google PhD Fellowship. ‡ Guangdong Bai is the corresponding author. Email: g.bai@uq.edu.au.

This work is licensed under a Creative Commons Attribution 4.0 International License. WWW '25, Sydney, NSW, Australia. © 2025 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-1274-6/2025/04. https://doi.org/10.1145/3696410.3714737

1 Introduction

Deep neural networks (DNNs) have revolutionized various industries such as healthcare [17], finance [5], autonomous vehicles [33], and natural language processing [21], enabling significant breakthroughs in tasks like image recognition [20], semantic segmentation [27], and language translation [41]. Despite their success, the development of high-quality models demands extensive computational resources, massive datasets, and substantial financial investment. This has restricted large-scale training to organizations with the necessary infrastructure, as seen with GPT-3 [3], which comprises 175 billion parameters and takes 355 GPU-years and $4.6M for a single training run [6, 37, 43, 49].

While the AI community continues to push the boundary of model performance in complex tasks, a critical challenge in the new era of AI revolves around managing the intellectual property of established models and adapting them to meet the diverse needs of downstream tasks. Specifically, for model owners, the ability to maintain controllability is paramount, as it enables them to deploy and customize models for different market segments and operational environments with varying business goals. Model users, in turn, seek adaptability, desiring models that can adjust their behavior to suit individual preferences and contextual needs.
These demands are illustrated by two typical scenarios presented below.

Scenario #1 (model owners). An online service provider offers different service tiers. Free-tier users receive lower-quality outputs, such as reduced resolution or basic code suggestions. Premium users, however, get enhanced results with higher quality and additional features. Real-world examples include cutout.pro [1] and together.ai [2], which provide models with free low-utility options or varying capabilities at different price points.

Scenario #2 (users). Individual users interacting with AI systems, such as driving assistance platforms, often seek adaptability in the model's behavior to suit their preferences [11, 26, 32, 50]. For instance, one driver may prioritize highlighting vehicles on the road, while another may emphasize detecting pedestrians. Such personalization has been offered in advanced driver assistance systems (ADAS) [11] to match individual driving styles such as assertive or defensive driving, which improves user comfort and acceptance [13, 32].

Traditional techniques such as early exit [23, 38, 47, 52] and fine-tuning [16, 31] can be employed to control model utility or adapt established models to specific tasks or constraints. Early exit introduces intermediate exit points at different layers within a neural network, allowing early termination of inference for faster but potentially less accurate predictions. However, implementing early exit requires architectural modifications, which may not always be feasible due to limited model accessibility and can complicate integration and maintenance. Fine-tuning adjusts a pre-trained model to a new task by retraining it on a smaller, task-specific dataset. Nevertheless, fine-tuning requires access to training data and involves retraining or additional optimization steps [4, 16, 25, 31, 42].
Even though both techniques can produce multiple tailored versions, managing these versions across a large user base is impractical, as the cost of maintaining consistency and applying updates across versions is prohibitively high [44]. These limitations underscore the need for a flexible, lightweight approach that allows modulation of the model's usage without retraining or altering the model's architecture.

Our work. In this work, we propose a novel paradigm of model modulation that enables a single model to exhibit diverse behaviors, so as to satisfy the requirements of different utility levels or different feature focuses. This paradigm holds broad applicability in modern AI deployment, where controlling model utility levels or adjusting model prioritization is critical, such as in machine learning as a service (MLaaS) [34] and on-device deployment [30]. Ideally, model modulation avoids altering the underlying model parameters or architecture, and instead introduces controlled adjustments to the model's responses. The core research question of model modulation lies in how to dynamically adapt the performance and behavior of a single model without the burden of retraining or maintaining multiple separate versions.

We introduce AIM (AI Modulator) as an approach to model modulation. It supports two types of modulation modes: utility and focus modulation. Utility modulation makes the model output deviate from the original output, which is useful in scenarios where restricted responses are desired (Scenario #1). Focus modulation aims to make the model more responsive to specific areas of interest, which is helpful for subsystems of safety-critical systems that anticipate specific potential hazards (Scenario #2). The challenges to be addressed by AIM are at least twofold.
For model owners seeking controllability, it is important to ensure that utility modulation preserves the model's core knowledge so that, even when output quality is reduced, the outputs remain meaningful (e.g., large language models should always deliver coherent outputs across utility levels) and do not compromise the model's integrity (Challenge #1). For users desiring adaptability, balancing the trade-off in focus modulation between emphasizing specific inputs (such as prioritizing certain features in ADAS) and maintaining overall performance is essential, as too much intervention would affect the model's effectiveness in other areas (Challenge #2).

To maintain the model's core knowledge (Challenge #1), AIM avoids altering feature-learning structures within the model. Instead, it directly operates on and strategically adjusts the model's logits to transform the original network (denoted as $f^*$) into a modulated network (denoted as $f_\epsilon$) that exhibits target behaviors. In particular, it incorporates a control function $\Lambda$ that redistributes the model's logits by adjusting their values according to specific probability distributions. This allows for fine-grained control while maintaining the model's integrity. Besides enabling model owners to offer varying utility tiers, this granular approach strikes a balance between responsiveness to specific features and overall performance. This flexibility allows users to tailor the model's behavior to their needs, enhancing responsiveness without compromising the model's overall effectiveness (Challenge #2). Since logits serve as a common intermediate representation across architectures, AIM operates as a training-data-agnostic and retraining-free process by directly modifying off-the-shelf trained networks, making it well suited for seamless integration across diverse neural network architectures.

We provide a robust formal foundation as the theoretical guarantee of AIM's effectiveness.
Its core is to establish a direct relationship between the model's behavior pre- and post-logits redistribution. By analyzing the statistical properties of logits through joint probability distributions, we quantify how controlled interventions affect their distribution and ordering. Our formal analysis ensures that, given specific conditions on the logits' distribution, the probability of achieving a desired modulation outcome can be precisely controlled. This formalization lays the groundwork for a probabilistic analysis of model behavior, offering a solid formal foundation for model modulation.

We conduct extensive evaluations across a wide range of application domains and model architectures to validate AIM. Our evaluation spans image classification, semantic segmentation, and text generation, utilizing prevalent deep neural network architectures such as ResNet-56 [14], SegFormer-B2 [48], and Llama-3.1-8B [40]. Through utility modulation, AIM successfully provides model owners with fine-grained control over model behavior across all settings. AIM's focus modulation, on the other hand, significantly enhances the model's ability to prioritize key features without compromising overall performance. For example, in an autonomous driving task, AIM yields a substantial improvement in the pedestrian segmentation accuracy of a model originally trained to focus on vehicle recognition. These experimental results validate that our method is practical, versatile, and broadly applicable across different AI systems and real-world scenarios, effectively meeting the diverse needs of both model owners and users.

Contributions. Our main contributions are:

• A new problem formulation of AI model modulation. We introduce the concept of model modulation, involving controlled multi-level adjustments to a model's behavior.
This paradigm allows a single model to cater to diverse requirements and application contexts without the need to maintain multiple model versions.

• A generic modulation approach. AIM is the first practical schema for AI model modulation, featuring lightweight, data-agnostic, and retraining-free attributes. It supports two modulation modes: utility and focus modulation. AIM efficiently modulates the model's output by redistributing the logits through a control function that adjusts them according to specific probability distributions.

• A formal framework and theoretical analysis. We provide a robust theoretical framework for analyzing the impact of noise on the ordering of logits in neural networks. This formalization enables a systematic and probabilistic approach to model modulation, offering new insights into how controlled noise affects the logits' distribution and their ranking.

• Extensive empirical evaluation. We implement AIM and validate its effectiveness across various application domains, including image classification, semantic segmentation, and text generation, using prevalent neural network architectures such as ResNet, SegFormer, and Llama. Our results demonstrate that AIM offers fine-grained control for model owners while enhancing feature prioritization for users, all without compromising overall performance.

Availability. Our code is publicly available at: https://github.com/UQ-Trust-Lab/AIM/.

2 Problem Formulation

In this section, we introduce preliminaries on neural networks (Section 2.1) to facilitate the understanding of our work. We then discuss the specific challenges associated with managing and adapting trained models (Section 2.2) and formally define the concept of model modulation (Section 2.3).
2.1 Deep Neural Networks

Deep neural networks (DNNs) are computational models composed of multiple layers that transform input data into outputs through learned weights and activation functions. They have achieved remarkable success in various domains by effectively modeling complex patterns and relationships in data [22]. Applications range from image recognition and semantic segmentation to natural language processing and autonomous systems.

Formally, a DNN can be represented as a function $f : \mathbb{R}^m \to \mathbb{R}^n$, mapping an input vector $x \in \mathbb{R}^m$ to an output vector $y \in \mathbb{R}^n$. Each layer in the network performs a linear transformation followed by a non-linear activation, allowing the network to capture intricate features through multiple levels of abstraction [29].

Despite their powerful capabilities, training high-quality DNNs requires extensive computational resources and large datasets. The complexity and resource intensity of this process have led to a concentration of development within organizations that possess substantial infrastructure [35]. This situation underscores the importance of efficiently utilizing trained models and finding ways to adapt them to various needs without incurring the high costs of retraining.

2.2 Motivation

Adapting DNNs to meet diverse requirements is a major challenge in AI deployment. Model owners need controllability to adjust models for various contexts without retraining, while users seek adaptability to tailor models to their needs. However, several challenges hinder these objectives:

• Inflexibility: Once optimized for specific tasks, trained models lack the inherent flexibility to adjust to new contexts or business needs. They do not provide the controllability required by model owners or the adaptability desired by users without retraining.
• Limitations of Traditional Adaptation Approaches: Methods like fine-tuning require access to original training data and substantial resources [16], while techniques like early exits demand architectural modifications, which are often constrained by model accessibility [23].

• Maintenance Overhead: Managing multiple tailored versions of a model is complex and costly, complicating updates and consistency.

• Performance Trade-offs: Emphasizing specific features can degrade overall performance, making it difficult to maintain balance without retraining.

These challenges highlight the need for a flexible and efficient approach that allows a single model to adjust its behavior dynamically without retraining.

2.3 Defining Model Modulation

Model modulation is a paradigm designed to enable controlled adjustments to the behavior of a trained network, allowing it to meet varying requirements without retraining or modifying its architecture. Specifically, for a trained neural network $f^*$, model modulation applies a control function $\Lambda$ parameterized by $\epsilon$. This function adjusts the model's output to produce a modulated model $f_\epsilon$, defined as

$$f_\epsilon(x) = \Lambda(f^*(x), \epsilon),$$

where $\epsilon$ represents the modulation parameters controlling the adjustments, depending on the type of modulation.

We formalize two primary modes of modulation: utility modulation and focus modulation, each designed to address the specific conditions of model owners and users, respectively.

2.3.1 Utility Modulation. The objective of utility modulation is to enable model owners to control the utility level of the model's outputs while preserving the core knowledge embedded within the model. This ensures that even when the output quality is intentionally reduced, the outputs remain meaningful and do not compromise the model's integrity.

Specifically, utility modulation aims to ensure that the performance of the modulated model $f_\epsilon$ decreases in a predictable and controlled manner as $\epsilon$ increases.
Formally, given a performance metric $M$ and two constants $\epsilon_1$ and $\epsilon_2$, we require

$$M(f_{\epsilon_1}) \le M(f_{\epsilon_2}), \quad \forall\, \epsilon_1 \ge \epsilon_2 \ge 0.$$

Meanwhile, to ensure the integrity of the modulation process, we further impose the condition

$$|M(f_{\epsilon_1}) - M(f_{\epsilon_2})| < \Delta(\delta), \quad \forall\, |\epsilon_1 - \epsilon_2| \le \delta,$$

where $\delta$ and $\Delta(\delta)$ are small constants, with $\Delta(\delta)$ being a function of $\delta$. This guarantees gradual and fine-grained control over the model's utility, enabling precise adjustments to its performance.

2.3.2 Focus Modulation. Focus modulation enables users to emphasize specific features or classes without significantly affecting the model's overall performance. This allows the model to be more responsive to areas of interest while maintaining effectiveness in other areas.

Specifically, it aims for the modulated model $f_\epsilon$ to maintain stable overall performance under the metric $M$ while enhancing a specified metric $E$ as $\epsilon$ increases. Formally, for any two given constants $\epsilon_1$ and $\epsilon_2$, we require

$$|M(f_{\epsilon_1}) - M(f_{\epsilon_2})| \le \Delta \;\wedge\; E(f_{\epsilon_1}) \ge E(f_{\epsilon_2}), \quad \forall\, \epsilon_1 \ge \epsilon_2 \ge 0,$$

where $\Delta$ is a small constant representing the acceptable performance deviation.

3 Our Approach – AIM

Given the objective of modulating the model's output to align with varying user needs and application scenarios, a natural question arises: where should this adjustment take place? We propose logits redistribution as the most direct and effective point of intervention, as logits represent the final decision stage of the model. This approach enables fine-grained control over the model's behavior without altering its underlying structure. Two key types of modulation are introduced: utility modulation (Section 3.2), which adjusts the output quality, and focus modulation (Section 3.3), which enhances the model's attention to specific features of inputs.

3.1 Logits Redistribution

3.1.1 Model Logits.
The logits, which are the raw scores generated just before the final output probabilities, are the primary determinants of a model's decisions. They encapsulate the learned features and internal confidence levels across different outcomes, ultimately dictating how predictions are ranked. Even minor modifications to the logits can significantly impact the model's final output, making them an ideal point for implementing controlled adjustments. By conceptualizing the neural network as comprising two components, i.e., the feature extractor before the logits and the probability mapper after, the logits emerge as the most direct and effective point for modulation. Formally, let $f_1$ denote the function mapping the input $x$ to the logits $\hat{y}$, and $f_2$ the function that maps $\hat{y}$ to the final output $y$. The overall network can be expressed as $f = f_2 \circ f_1$, where $\hat{y} = f_1(x)$ and $y = f_2(\hat{y})$.

3.1.2 Logits Redistribution. Based on this insight, AIM introduces a control function $\Lambda : \mathbb{R}^n \to \mathbb{R}^n$ that directly operates on the logits to modulate the model's output. The modulated logits are obtained as $\hat{y}' = \Lambda(\hat{y})$, and the overall network becomes $f = f_2 \circ \Lambda \circ f_1$, where $f_1$ extracts features from the input, $\Lambda$ modulates the logits, and $f_2$ maps these modulated logits to the final output. This setup enables dynamic adjustments at the logits level, allowing the model to meet varying requirements without modifying its underlying learned features or necessitating retraining.

Our framework applies the control function $\Lambda$ to introduce targeted shifts to the logits by adding noise sampled from specific statistical distributions or by applying deterministic adjustments. Formally, we adjust the logits as $\hat{y}' = \Lambda(\hat{y})$, which influences the model's output probabilities while preserving the internal feature representations and decision logic.
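The decomposition $f = f_2 \circ \Lambda \circ f_1$ can be sketched in a few lines. The snippet below is a minimal NumPy illustration (not the authors' released implementation): a softmax stands in for the probability mapper $f_2$, a fixed logit vector stands in for $f_1(x)$, and `modulate` plays the role of $\Lambda$ in the two modes defined in Sections 3.2 and 3.3.

```python
import numpy as np

def softmax(z):
    """Probability mapper f2: maps a logit vector to an output distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

def modulate(logits, mode="utility", sigma=1.0, target=None, rng=None):
    """Control function Lambda: redistributes logits before f2 is applied.

    mode="utility": add zero-mean Gaussian noise to every logit (Section 3.2).
    mode="focus":   add |eps| to a single target logit (Section 3.3).
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(logits, dtype=float).copy()
    if mode == "utility":
        y = y + rng.normal(0.0, sigma, size=y.shape)
    elif mode == "focus":
        y[target] = y[target] + abs(rng.normal(0.0, sigma))
    return y

# f_eps(x) = f2(Lambda(f1(x))); a fixed vector stands in for f1(x) here.
logits = np.array([2.0, 0.5, -1.0])
probs = softmax(modulate(logits, mode="utility", sigma=0.5,
                         rng=np.random.default_rng(42)))
```

Because $\Lambda$ only touches the logit vector, the same wrapper applies unchanged to any architecture that exposes pre-softmax scores.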
This exible, lightweight approach to model modulation eectively serves the needs of both model owners and users. As illustrated in Figur e 1, Aim ’s logit redistribution is visualized through b ell-shaped colored L o g i t s No i s e L o g i t s R e d i s t r i b u t i o n L a b e l P r o b a b i l i t y Figure 1: An illustration of Aim ’s logits redistribution. regions repr esenting the probability distributions of added noise , with color intensity reecting the magnitude of the original log- its. The noise perturbs the local ordering of the logits, enabling controlled adjustments to the model’s output. 3.2 Utility Modulation Utility modulation caters to the requirements of model owners who wish to oer dierent ser vice tiers or control the utility of the model’s outputs. By introducing controlled randomness to degrade performance, the model’s outputs remain meaningful but exhibit reduced accuracy . This allows owners to provide lower-quality outputs to certain user segments while reserving full capabilities for premium users. 3.2.1 Definition. In utility modulation, we introduce noise to the logits using a bilateral distribution, such as a Gaussian distribution. The modulation is dened as Λ ( ˆ 𝑦 𝑖 ) = ˆ 𝑦 𝑖 + 𝜖 𝑖 , 𝜖 ∼ N ( 0 , 𝜎 2 ) , where 𝜖 𝑖 is noise sampled indep endently for each logit ˆ 𝑦 𝑖 . By ad- justing the standard deviation 𝜎 , mo del owners can control the degree of utility degradation, with higher noise levels leading to lower-quality outputs. 3.2.2 A nalysis. T o quantify the impact of noise on the mo del’s predictions, we analyze the probability that the ordering of the logits remains unchange d after adding noise, which implies that the model maintains its top prediction. Theorem 1. Let ˆ 𝑦 = ( ˆ 𝑦 1 , ˆ 𝑦 2 , . . . , ˆ 𝑦 𝑛 ) be a vector of logits with an ordering ˆ 𝑦 𝜏 1 ≤ ˆ 𝑦 𝜏 2 ≤ · · · ≤ ˆ 𝑦 𝜏 𝑛 , where 𝜏 is a permutation of 1 , 2 , . . . , 𝑛 . Let 𝜖 = ( 𝜖 1 , 𝜖 2 , . . . , 𝜖 𝑛 ) be a vector of i.i.d. 
Gaussian random variables $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. Define the perturbed logits as $\hat{y}' = \hat{y} + \epsilon$. The probability that the ordering of the logits remains unchanged after perturbation is

$$\Pr\!\left(\hat{y}'_{\tau_1} \le \hat{y}'_{\tau_2} \le \cdots \le \hat{y}'_{\tau_n}\right) = \prod_{i=1}^{n-1} \Phi\!\left(\frac{\Delta_i}{\sqrt{2}\,\sigma}\right),$$

where $\Delta_i = \hat{y}_{\tau_{i+1}} - \hat{y}_{\tau_i}$ and $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the standard normal distribution.

Proof. We aim to calculate the probability that the ordering of the elements in the perturbed vector $\hat{y}' = \hat{y} + \epsilon$ remains the same as the original ordering in $\hat{y}$, i.e., $\Pr(\hat{y}'_{\tau_1} \le \hat{y}'_{\tau_2} \le \cdots \le \hat{y}'_{\tau_n})$. This requires that, for all $i \in \{1, 2, \ldots, n-1\}$,

$$\hat{y}_{\tau_{i+1}} + \epsilon_{\tau_{i+1}} \ge \hat{y}_{\tau_i} + \epsilon_{\tau_i}.$$

Rewriting this inequality, we obtain $\hat{y}_{\tau_{i+1}} - \hat{y}_{\tau_i} \ge \epsilon_{\tau_i} - \epsilon_{\tau_{i+1}}$. Define the gap between adjacent elements of the ordered logits as $\Delta_i = \hat{y}_{\tau_{i+1}} - \hat{y}_{\tau_i}$. Therefore, for each $i$, the condition simplifies to $\Delta_i \ge \epsilon_{\tau_i} - \epsilon_{\tau_{i+1}}$. Since each $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, the difference follows a normal distribution $\epsilon_{\tau_i} - \epsilon_{\tau_{i+1}} \sim \mathcal{N}(0, 2\sigma^2)$. Thus, the probability that the ordering is preserved for the $i$-th pair is given by

$$\Pr(\Delta_i \ge \epsilon_{\tau_i} - \epsilon_{\tau_{i+1}}) = \Phi\!\left(\frac{\Delta_i}{\sqrt{2}\,\sigma}\right),$$

where $\Phi(\cdot)$ is the CDF of the standard normal distribution. Since the events are independent (due to the noise terms' independence), the probability of the entire order being preserved is the product of the probabilities over all pairs, completing the proof. □

Remark 1: Theorem 1 establishes a direct relationship between model utility and noise variance $\sigma^2$, enabling model owners to precisely regulate utility degradation by adjusting $\sigma^2$.
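The closed form in Theorem 1 lends itself to a quick numerical sanity check. The sketch below (illustrative only, not the paper's artifact) evaluates the product formula, with $\Phi$ implemented via `math.erf`, and compares it against a Monte Carlo estimate for a two-logit case, where there is a single adjacent gap:

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def order_preserved_closed_form(logits, sigma):
    """Theorem 1: prod_i Phi(Delta_i / (sqrt(2) * sigma)) over adjacent gaps."""
    gaps = np.diff(np.sort(np.asarray(logits, dtype=float)))
    return float(np.prod([norm_cdf(d / (math.sqrt(2.0) * sigma)) for d in gaps]))

def order_preserved_monte_carlo(logits, sigma, trials=200_000, seed=0):
    """Empirical frequency that i.i.d. N(0, sigma^2) noise keeps the ordering."""
    rng = np.random.default_rng(seed)
    y = np.sort(np.asarray(logits, dtype=float))
    perturbed = y + rng.normal(0.0, sigma, size=(trials, y.size))
    return float(np.mean(np.all(np.diff(perturbed, axis=1) >= 0.0, axis=1)))

# Two logits with gap Delta = 1.0: the product reduces to Phi(1 / (sqrt(2)*sigma)).
p_formula = order_preserved_closed_form([0.0, 1.0], sigma=0.5)
p_empirical = order_preserved_monte_carlo([0.0, 1.0], sigma=0.5)
```

With $\Delta = 1$ and $\sigma = 0.5$, both estimates land near $\Phi(\sqrt{2}) \approx 0.92$, matching the theorem for this pair.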
Increasing $\sigma^2$ disrupts the original logits' ordering, reducing predictive accuracy, while the inherent continuity of the utility function and noise distribution ensures fine-grained control over performance levels. This mechanism aligns with Section 2.3.1, allowing tailored adjustments to meet diverse service requirements.

To further understand the impact of AIM's logits redistribution, we analyze its rate of change with respect to the noise variance $\sigma^2$.

Theorem 2. Given the vector $\hat{y}$ and the noise vector $\epsilon$ as in Theorem 1, the rate of change of the probability that the order is preserved with respect to the noise variance $\sigma^2$ is

$$\frac{d}{d\sigma^2} \Pr\!\left(\hat{y}'_{\tau_1} \le \hat{y}'_{\tau_2} \le \cdots \le \hat{y}'_{\tau_n}\right) = \sum_{i=1}^{n-1} \left( -\frac{\Delta_i}{2\sqrt{2}\,\sigma^3} \cdot \phi\!\left(\frac{\Delta_i}{\sqrt{2}\,\sigma}\right) \prod_{j \ne i} \Phi\!\left(\frac{\Delta_j}{\sqrt{2}\,\sigma}\right) \right),$$

where $\phi(\cdot)$ is the probability density function (PDF) of the standard normal distribution, $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$, and $\Phi(\cdot)$ is the CDF of the standard normal distribution.

Proof. Let $z_i = \frac{\Delta_i}{\sqrt{2}\,\sigma}$. The derivative of $z_i$ with respect to $\sigma^2$ is

$$\frac{dz_i}{d\sigma^2} = \frac{d}{d\sigma^2}\!\left(\frac{\Delta_i}{\sqrt{2}\,\sigma}\right) = -\frac{\Delta_i}{2\sqrt{2}\,\sigma^3}.$$

Then, using the chain rule to differentiate $\Phi(z_i)$, we have

$$\frac{d}{d\sigma^2} \Phi(z_i) = \phi(z_i) \cdot \frac{dz_i}{d\sigma^2} = \phi(z_i) \cdot \left(-\frac{\Delta_i}{2\sqrt{2}\,\sigma^3}\right),$$

where $\phi(z_i) = \frac{1}{\sqrt{2\pi}} e^{-z_i^2/2}$ is the probability density function of the standard normal distribution. Next, applying the product rule to the entire product,

$$\frac{d}{d\sigma^2} \prod_{i=1}^{n-1} \Phi(z_i) = \sum_{i=1}^{n-1} \left( \frac{d}{d\sigma^2}\Phi(z_i) \prod_{j \ne i} \Phi(z_j) \right).$$

Substituting the derivative of $\Phi(z_i)$, we have the desired result. □

Remark 2: The negative derivative indicates that as the noise variance $\sigma^2$ increases, the probability of preserving the original logits order decreases, causing utility degradation. This probability drops sharply when $\sigma^2$ nears the mean of the differences between logits ($\Delta_i$), leading to rapid changes in predictions and ensuring the effectiveness of AIM.

3.3 Focus Modulation

3.3.1 Definition.
Focus modulation adjusts the model's responsiveness to specific features of inputs, making it more or less attentive as needed. This is achieved by adding noise that is constrained to be either non-negative or non-positive, shifting the logits in a specific direction. Formally, we modulate the logits as

$$\Lambda(\hat{y}_i) = \hat{y}_i \pm |\epsilon|, \quad \epsilon \sim \mathcal{N}(0, \sigma^2),$$

where the sign $\pm$ is chosen to increase or decrease the emphasis on the target class or feature. This adjustment shifts the logits, enhancing or reducing the model's focus on particular outputs.

For example, in a driving assistance system, applying a positive shift (adding $|\epsilon|$) to the car detection component increases the model's attention to car hazards, causing the vehicle to react more readily to car obstacles and potentially leading to more frequent interventions. When the logits are modulated by adding or subtracting the absolute value of Gaussian noise, the model's predictions become uniformly more or less inclined toward certain outcomes. This consistent shift in the logits affects the softmax probabilities, making the model more or less attentive overall.

3.3.2 Analysis. Consider two logits $\hat{y}_i$ (target) and $\hat{y}_j$ (reference). We analyze the scenario in which non-negative noise modifies a specific logit. Our analysis focuses on the probability that this adjustment changes the model's prediction.

Theorem 3. Given $\hat{y}_i \le \hat{y}_j$ and noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$, the probability that $\hat{y}'_i = \hat{y}_i + |\epsilon|$ remains less than or equal to $\hat{y}_j$ is

$$\Pr(\hat{y}'_i \le \hat{y}_j) = 2\Phi\!\left(\frac{\hat{y}_j - \hat{y}_i}{\sigma}\right) - 1,$$

where $\Phi(\cdot)$ is the CDF of the standard normal distribution.

Proof. Given two logits $\hat{y}_i$ and $\hat{y}_j$ such that $\hat{y}_i \le \hat{y}_j$, we consider the modulation of the target logit $\hat{y}_i$ with the noise term $\epsilon \sim \mathcal{N}(0, \sigma^2)$. We define the modified logit as $\hat{y}'_i = \hat{y}_i + |\epsilon|$ post-modulation.
To determine the probability that the order of the logits remains unchanged, we need to evaluate

$$\Pr(\hat{y}'_i \le \hat{y}_j) = \Pr(\hat{y}_i + |\epsilon| \le \hat{y}_j) = \Pr(|\epsilon| \le \hat{y}_j - \hat{y}_i).$$

The absolute value $|\epsilon|$ follows a folded normal distribution, whose CDF can be derived from the properties of the normal distribution. Specifically, we have

$$\Pr(|\epsilon| \le x) = \Pr(-x \le \epsilon \le x) = \Phi\!\left(\frac{x}{\sigma}\right) - \Phi\!\left(-\frac{x}{\sigma}\right) = 2\Phi\!\left(\frac{x}{\sigma}\right) - 1,$$

where $\Phi(\cdot)$ is the CDF of the standard normal distribution. Thus, letting $x = \hat{y}_j - \hat{y}_i$, we obtain

$$\Pr(|\epsilon| \le \hat{y}_j - \hat{y}_i) = 2\Phi\!\left(\frac{\hat{y}_j - \hat{y}_i}{\sigma}\right) - 1.$$

This concludes the theorem, giving the probability that the logits' order remains unchanged after modulation. □

The other cases of focus modulation can be derived by combining the results for any pair of logits, and the case $\hat{y}_i \ge \hat{y}_j$ follows by symmetry. The case $\hat{y}'_i = \hat{y}_i - |\epsilon|$ with $\hat{y}_i \le \hat{y}_j$ is not considered, because it does not change the order of the logits.

Remark 3: This theorem introduces a tunable control mechanism where the noise variance $\sigma^2$ directly influences the model's focus intensity on targeted logits. By scaling $\sigma^2$, users amplify or dampen the system's sensitivity to specific features, achieving application-aligned behavior without structural changes. Crucially, this process preserves the core ordering dynamics (as formalized in Section 2.3.2), ensuring stability while enabling strategic prioritization of critical inputs.

4 Experimental Evaluation

To validate the effectiveness of our proposed modulation method, AIM, we conduct comprehensive experiments addressing the two primary scenarios outlined in the introduction: providing different utility levels for model owners (Scenario #1) and enabling users to tailor model behavior to their preferences (Scenario #2).
We evaluate both modulation modes – utility modulation and focus modulation – across various tasks and models. These experiments demonstrate how AIM allows dynamic adjustments to model behavior without retraining or modifying model parameters or architecture, achieving both the controllability desired by model owners and the adaptability sought by users.

4.1 Experimental Setup

To showcase the flexibility and broad applicability of AIM, we conduct experiments using models and datasets from various domains, including image classification, semantic segmentation, and text generation. The datasets represent widely recognized benchmarks across these tasks:

• CIFAR-10 and CIFAR-100 [19]: Standard benchmarks for image classification, each containing 60,000 colored images, with 10 and 100 classes, respectively.

• ADE20K [51]: A large-scale scene parsing dataset comprising over 20,000 images across 150 semantic categories, commonly used for semantic segmentation tasks.

• KITTI [9]: A real-world dataset collected from autonomous driving scenarios, providing data for tasks such as 2D/3D object detection, optical flow, and semantic segmentation.

• GSM8K [7]: A collection of 8,500 high-quality grade-school-level math word problems, designed to evaluate the mathematical reasoning capabilities of language models.

• MMLU [15]: The Massive Multitask Language Understanding benchmark, with 57 diverse tasks (STEM, humanities, etc.) to evaluate language model reasoning and understanding.

While AIM can be applied to any trained model, we use several common DNNs as a proof of concept, namely ResNet-56 [14], SegFormer-B2 [48], and Llama-3.1-8B [40]. To demonstrate that AIM is retraining-free, we directly use pre-trained models with publicly available weights. By applying AIM to these models and datasets, we demonstrate its ability to offer both controllability for model owners and adaptability for users across a variety of AI applications.
4.2 Utility Modulation

Utility modulation adjusts the model's output to provide varying levels of utility. By controlling the noise level, owners can modulate model performance, allowing a basic version to be available to all users while encouraging upgrades for enhanced features.

Figure 2: Classification and semantic segmentation performance under varying noise levels (σ) for utility modulation.

4.2.1 Implementation. We apply utility modulation across all models by redistributing the model logits through the addition of controlled Gaussian noise with zero mean and varying standard deviations (σ). Specifically, the noise level is increased in increments of 0.2, allowing for fine-grained control over the modulation process. In cases where the model has a smaller logits variance (e.g., the Llama model, due to normalization in the final layer), the process stops earlier based on the logits' mean and standard deviation to ensure effective modulation and stability.

4.2.2 Results. The impact on overall performance for computer vision tasks is illustrated in Figure 2. As the noise level increases, the performance of ResNet-56 and SegFormer-B2 on different computer vision tasks gradually declines. For example, on CIFAR-10, the classification accuracy drops from 94.37% (original model) to 20.00% as σ increases from 0 to 20. At a moderate noise level (σ = 5.0), the accuracy is reduced to 72.08%, representing a basic utility level suitable for demonstration purposes. On CIFAR-100, accuracy falls from 72.62% to 4.59% over the same range of σ; at σ = 5.0, the accuracy is 43.62%. Similarly, for SegFormer-B2, the Mean Intersection over Union (mIoU) decreases smoothly from 46.20% (original model) to 1.24% as σ increases. At σ = 3.0, the mIoU is 31.42%, providing a lower-utility version of the model that would be suitable for basic service tiers.
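The mechanism itself amounts to a one-line change at inference time. The following NumPy sketch applies utility modulation to a batch of classifier logits; the model, logits, and resulting accuracies are synthetic stand-ins, not the paper's models or numbers:

```python
import numpy as np

rng = np.random.default_rng(42)

def modulate_utility(logits, sigma, rng):
    """Utility modulation: redistribute logits by adding zero-mean
    Gaussian noise with standard deviation sigma."""
    return logits + rng.normal(0.0, sigma, size=logits.shape)

# Synthetic stand-in for a trained classifier: 1000 samples, 10 classes,
# with the correct class given a clear logit margin.
n, k = 1000, 10
labels = rng.integers(0, k, size=n)
logits = rng.normal(0.0, 1.0, size=(n, k))
logits[np.arange(n), labels] += 3.0  # confident, well-separated model

# Accuracy falls toward chance (10%) as sigma grows past the logit gap.
for sigma in (0.0, 1.0, 3.0, 10.0):
    preds = modulate_utility(logits, sigma, rng).argmax(axis=1)
    print(f"sigma={sigma:>4}: accuracy={(preds == labels).mean():.2%}")
```

The degradation is gradual rather than abrupt, which is what allows the owner to select an intermediate σ as a "basic tier" operating point.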
Aim's utility modulation demonstrates fine-grained control over model performance across computer vision tasks, enabling strategic adjustments aligned with business goals without retraining or maintaining multiple models.

Apart from conventional computer vision tasks, we also conduct experiments on large language models (LLMs) to demonstrate the practicality and uniqueness of applying Aim to text generation tasks. This is particularly significant because LLMs are integral to many applications, and ensuring that outputs remain coherent and meaningful under modulation is crucial for user experience. In particular, by applying Aim to LLMs, we highlight the property of knowledge preservation, whereby the model's language capabilities are retained despite utility modulation.

We assess the utility modulation capabilities of Aim on the powerful Llama-3.1-8B model. As shown in Figure 3, performance degrades smoothly with increasing σ. On GSM8K, accuracy decreases from 80.74% to 2.12%; at σ = 1.6, accuracy is 59.36%. On MMLU, accuracy decreases from 66.40% to 28.03% over the same range of σ.

Figure 3: Performance of Llama-3.1-8B on GSM8K and MMLU datasets with different noise levels (σ), accompanied by a sample MMLU question.

Figure 4: Segmentation of pedestrians improves progressively with moderate noise levels, σ = 0.6, 1.2, 1.8 (panels c-e), compared to no noise, σ = 0 (panel b), where pedestrians are partially or not detected; panel (a) shows the input image.

Notably, even at higher noise levels, the generated text remains grammatically correct and coherent but tends to become excessively verbose and redundant. This increased verbosity can sometimes lead to incorrect answers, as the unnecessary elaboration may introduce confusion or logical errors [28]. Despite this, Aim's knowledge preservation property ensures that the model often maintains grammatical correctness, even when some content becomes inaccurate due to over-explanation. Example 1 in Appendix A showcases sample outputs for the MMLU question illustrated in Figure 3; under utility modulation (σ = 2.2), the response is more verbose and includes superfluous details compared to the baseline (σ = 0). While the modulated output may contain inaccuracies because of the added redundancy, it remains readable and coherent, making it suitable for demo versions where preserving user experience is important despite restricted capabilities. Additional examples illustrating this behavior are provided in Appendix A, with some verbose responses being correct (Examples 1, 2, and 4), while others lead to incorrect answers (Example 3).

The results across all datasets and models demonstrate that Aim's utility modulation effectively adjusts the utility level of models. By controlling the noise level σ, model owners can offer models with reduced performance as basic versions, encouraging users to upgrade for full capabilities. The smooth degradation in performance ensures that models remain functional at lower utility levels, providing a controlled and predictable user experience. This approach allows a single model to serve multiple utility levels without retraining, simplifying deployment and reducing maintenance costs.

4.2.3 Discussion. Our empirical results confirm a three-stage performance trajectory under utility modulation, closely matching our theoretical framework in Section 3. At low noise levels, performance remains high because the top logits clearly stand out, making small perturbations insufficient to disrupt their ordering.
As noise intensies to mo derate values, it b ecomes comparable to the typical gap between logits, triggering rapid reshuing and frequent misclassications. This middle phase is wher e noise has its largest overall impact, causing accuracy to drop sharply as logit 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 92 94 96 98 100 Accuracy (%) Person Accuracy mIoU 25 30 35 40 45 50 mIoU Figure 5: Focus modulation enhances targeted class accuracy but risks reducing overall mIoU if adjustments are excessive. dominance is lost. At high noise levels, adding more perturbation yields diminishing returns. With logits already heavily disrupted, performance settles near random-guessing accuracy . This progression is especially revealing for real-world deploy- ment, particularly in publicly accessible demo models. In the mod- erate noise range, model owners can precisely tune the utility con- straints to preserve core functionality while limiting access to only part of the model’s potential. Users see a functional system that clearly demonstrates the mo del’s pow er , yet also notice the benets of upgrading to a more capable version. This approach aligns well with business strategies that oer a free tier for broad access and a premium tier for users who require higher-quality results. 4.3 Focus Modulation While pro viding eective utility modulation for model owners, Aim also allows users to adapt the model’s behavior to suit individual preferences or contextual needs. By adjusting the model’s focus on specic features or aspects, users can enhance p erformance on areas of interest without the need for retraining. 4.3.1 Implementation. W e conduct focus modulation exclusively on semantic segmentation tasks as it intuitively aligns with real- world needs, such as ADAS, where prioritizing specic features (e.g., detecting pe destrians) is crucial. 
Using the SegFormer-B2 model with the ADE20K dataset and real-world test cases, we enhance the detection of the focused (critical) classes, such as "Person", by redistributing the targeted logits through sampling from a folded normal distribution. The noise level is increased in steps of 0.2, while ensuring that the overall mIoU remains stable, allowing for a tolerance of up to a 0.5% decrease from the original mIoU.

4.3.2 Results. As shown in Figure 5, increasing the noise level σ from 0.0 to 2.4 results in a notable improvement in the pixel accuracy of the "Person" class (from 91.24% to 96.20%), with a negligible decrease in the overall segmentation quality (the mIoU remains stable). Figure 4, cropped for better clarity, demonstrates that with moderate noise levels (σ = 0.6, 1.2, 1.8), the segmentation of pedestrians progressively improves compared to no noise (σ = 0), where pedestrians are partially or not detected. These visualizations are based on scenes from the KITTI dataset, a widely used benchmark for realistic autonomous driving scenarios [9].

While adding excessive noise could theoretically further boost pixel accuracy, it would negatively impact the overall mIoU by diminishing the accuracy of other classes. Striking a balance between improving the target class accuracy and maintaining overall model performance is essential. Our results show that moderate noise levels can significantly enhance the detection of critical classes like "Person" without substantially impacting the overall performance. Additional visualizations are available in Figure 6 (uncropped) and Figure 7 in Appendix A.

Table 1: Accuracy improvement (%) by object class and average mIoU change across noise levels σ.

Class | σ = 0.0 | σ = 0.2 | σ = 0.4 | σ = 0.6 | σ = 0.8 | σ = 1.0
Person | 91.24 | +0.77 | +1.43 | +2.01 | +2.52 | +2.96
Car | 91.70 | +0.53 | +1.03 | +1.48 | +1.88 | +2.26
Tree | 87.95 | +0.91 | +1.73 | +2.46 | +3.10 | +3.68
Bicycle | 75.90 | +2.01 | +3.75 | +5.13 | +6.46 | +7.53
Bus | 92.30 | +0.32 | +0.60 | +0.84 | +1.09 | +1.32
Streetlight | 29.02 | +1.90 | +3.99 | +6.16 | +8.37 | +10.65
Traffic Light | 42.22 | +2.38 | +4.75 | +6.80 | +8.98 | +10.91
avg. mIoU | 46.20 | +0.00 | +0.00 | −0.01 | −0.02 | −0.02

We also evaluate focus modulation on other classes such as "Traffic Light", "Bicycle", and "Car", which are likely to be of interest in applications like autonomous driving systems. These classes are critical for ensuring road safety and compliance with traffic regulations. As reported in Table 1, all evaluated classes exhibit an increase in accuracy with increasing noise levels, while the average mIoU remains stable. For instance, at σ = 1.0, the accuracy of the "Bicycle" class increases from 75.90% to 83.43% (+7.53), with only a negligible decrease in the average mIoU (−0.02%).

4.3.3 Discussion. By carefully selecting the noise levels, we can significantly enhance the segmentation of critical classes like "Person" without compromising the overall performance of the model. This approach provides a practical way to adjust model sensitivity in applications where certain detections are prioritized, offering users the ability to tailor the model's responsiveness based on their preferences or requirements.

Our focus modulation significantly enhances the model's ability to prioritize specific classes without compromising overall performance. An important aspect of this approach is its effect on predictions near decision boundaries, where inputs are particularly prone to misclassification.
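The stopping rule used in the implementation (increase σ in steps of 0.2 until overall quality drops more than a fixed tolerance below the unmodulated baseline) can be sketched as a simple sweep. Here `evaluate` is a hypothetical callback, not part of Aim's published interface, returning (target-class accuracy, overall score) for a given σ:

```python
def tune_focus_sigma(evaluate, step=0.2, tolerance=0.5, max_sigma=20.0):
    """Sweep sigma upward in fixed steps; return the largest value whose
    overall score stays within `tolerance` (absolute %) of the baseline."""
    _, baseline = evaluate(0.0)
    best_sigma = 0.0
    sigma = step
    while sigma <= max_sigma:
        target_acc, overall = evaluate(sigma)
        if baseline - overall > tolerance:
            break                      # overall quality degraded too much
        best_sigma = sigma             # still within tolerance: accept
        sigma = round(sigma + step, 10)
    return best_sigma

# Toy evaluate(): target accuracy rises with sigma, overall score slowly decays.
toy = lambda s: (91.0 + 2.0 * s, 46.2 - 0.3 * s)
print(tune_focus_sigma(toy))  # drop exceeds 0.5 once 0.3 * s > 0.5, so 1.6
```

With the 0.5% tolerance from Section 4.3.1, this reproduces the observed behavior in Table 1: per-class accuracy keeps rising while the sweep stops before the average mIoU drifts meaningfully.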
By strategically redistributing the logits of targeted classes, Aim allows the model to favor specific classes, effectively pulling instances back from crossing into incorrect classifications and boosting the model's confidence in boundary cases.

Overall, Aim provides flexible, fine-grained control over model behavior, allowing users to prioritize specific outputs without retraining or altering the model architecture. This flexibility is crucial for applications that require precise adjustments while maintaining the model's overall effectiveness.

Remark 4: Experimental results confirm that Aim effectively modulates models across diverse applications without the need for retraining or architectural changes. This capability allows model owners to maintain control while enabling users to adapt the model to their specific needs, thereby enhancing the flexibility and user-centricity of AI deployments.

5 Related Work

Intermediate representations in neural networks. Early-exit techniques [12, 18, 24, 36, 38, 46] leverage intermediate representations within neural networks to reduce inference costs by dynamically skipping later layers when early predictions are sufficiently confident, trading off performance for latency. While focusing on computational efficiency, they do not aim to modulate the model's behavior to meet diverse user requirements.

Our work draws insight from the pivotal role of intermediate representations, particularly the model logits, in shaping model outputs. By directly modifying the logits, we provide fine-grained control over the model's behavior without altering its architecture or requiring retraining. Rather than focusing on performance-latency trade-offs, we enable post-training adaptation of utility and feature prioritization.

Fine-tuning and transfer learning.
Fine-tuning [16, 31] and transfer learning [39, 45] adapt pre-trained models to new tasks or domains by retraining them on task-specific datasets, achieving high performance on specialized tasks. However, this process requires access to the original training data and involves additional optimization steps [25], making it resource-intensive and time-consuming. Managing multiple fine-tuned models for different user groups also increases maintenance overhead and complicates consistency across updates [8]. In contrast, our method dynamically adjusts model outputs without retraining or data access, offering a lightweight alternative for multi-stakeholder adaptation.

Temperature scaling and calibration. Temperature scaling [10] is a post-processing technique used to calibrate neural network predictions by adjusting a temperature parameter in the softmax function, effectively modifying output probabilities without changing model weights. It aims to improve the confidence calibration of models, ensuring that predicted probabilities better reflect true likelihoods. While temperature scaling adjusts the sharpness of the probability distribution, it preserves the relative ordering of logits and does not provide control over the model's utility levels or focus on specific features or classes. Our approach extends beyond calibration by redistributing logits to enable controlled utility adjustments and task-specific feature emphasis.

6 Conclusion

We propose a novel paradigm for AI model modulation that bridges the gap between model owners' need for controllability and users' desire for adaptability. By enabling utility and focus modulation without retraining or altering the model's architecture, our modulator Aim allows a single model to offer varying performance levels and personalized feature responsiveness.
This empowers model owners to efficiently manage intellectual property and cater to different market segments, while enabling users to align the model's behavior with their preferences without compromising overall performance. Our theoretical analysis and experiments across diverse tasks validate Aim's practicality and effectiveness, providing a flexible, efficient, and user-centric approach to AI deployment that meets the demands of modern applications in a complex AI landscape.

Acknowledgments

We thank the reviewers for their insightful comments. This work is partially supported by Australian Research Council Discovery Projects (DP230101196, DP240103068). Minhui Xue is supported by the CSIRO – National Science Foundation (US) AI Research Collaboration Program. Zihan Wang is supported in part by CSIRO's Data61's Top-up PhD scholarship.

References

[1] 2018. cutout.pro. https://www.cutout.pro.
[2] 2023. together.ai. https://www.together.ai.
[3] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems (NeurIPS). 1877–1901.
[4] Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang, et al. 2024. When large language models meet personalization: Perspectives of challenges and opportunities. World Wide Web 27, 4 (2024), 42.
[5] Dawei Cheng, Fangzhou Yang, Sheng Xiang, and Jin Liu. 2022. Financial time series forecasting with multi-modality graph neural network. Pattern Recognition 121 (2022), 108218.
[6] Li Chuan. 2023. OpenAI's GPT-3 Language Model: A Technical Overview. https://lambdalabs.com/blog/demystifying-gpt-3.
[7] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. 2021. Training Verifiers to Solve Math Word Problems. arXiv preprint arXiv:2110.14168 (2021).
[8] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2024. QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems 36 (2024).
[9] Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. 2017. On calibration of modern neural networks. In International Conference on Machine Learning. PMLR, 1321–1330.
[11] Roland Erik Haas, Shambo Bhattacharjee, and Dietmar PF Möller. 2020. Advanced driver assistance systems. Smart Technologies: Scope and Applications (2020), 345–371.
[12] Yizeng Han, Yifan Pu, Zihang Lai, Chaofei Wang, Shiji Song, Junfeng Cao, Wenhui Huang, Chao Deng, and Gao Huang. 2022. Learning to weight samples for dynamic early-exiting networks. In European Conference on Computer Vision. Springer, 362–378.
[13] Martina Hasenjäger and Heiko Wersing. 2017. Personalization in advanced driver assistance systems and autonomous vehicles: A review. In 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). 1–7. doi:10.1109/ITSC.2017.8317803
[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[15] Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring Massive Multitask Language Understanding.
Proceedings of the International Conference on Learning Representations (ICLR) (2021).
[16] Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146 (2018).
[17] Shruti Kaushik, Abhinav Choudhury, Pankaj Kumar Sheron, Nataraj Dasgupta, Sayee Natarajan, Larry A Pickett, and Varun Dutt. 2020. AI in healthcare: time-series forecasting using statistical, neural, and ensemble architectures. Frontiers in Big Data 3 (2020), 4.
[18] Yigitcan Kaya, Sanghyun Hong, and Tudor Dumitras. 2019. Shallow-Deep Networks: Understanding and Mitigating Network Overthinking. In Proceedings of the 36th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 3301–3310. https://proceedings.mlr.press/v97/kaya19a.html
[19] Alex Krizhevsky, Geoffrey Hinton, et al. 2009. Learning multiple layers of features from tiny images. (2009).
[20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 (2012).
[21] Ivano Lauriola, Alberto Lavelli, and Fabio Aiolli. 2022. An introduction to deep learning in natural language processing: Models, techniques, and tools. Neurocomputing 470 (2022), 443–456.
[22] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[23] Xiangjie Li, Chenfei Lou, Yuchi Chen, Zhengping Zhu, Yingtao Shen, Yehan Ma, and An Zou. 2023. Predictive exit: Prediction of fine-grained early exits for computation- and energy-efficient inference. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 8657–8665.
[24] Kaiyuan Liao, Yi Zhang, Xuancheng Ren, Qi Su, Xu Sun, and Bin He. 2021. A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models.
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Online, 2013–2023. doi:10.18653/v1/2021.naacl-main.162
[25] Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin A Raffel. 2022. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems 35 (2022), 1950–1965.
[26] Shuofeng Liu, Zihan Wang, Minhui Xue, Long Wang, Yuanchao Zhang, and Guangdong Bai. 2024. Being Transparent is Merely the Beginning: Enforcing Purpose Limitation with Polynomial Approximation. USENIX Security.
[27] Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3431–3440.
[28] Zhongkui Ma, Xinguo Feng, Zihan Wang, Shuofeng Liu, Mengyao Ma, Hao Guan, and Mark Huasong Meng. 2023. Formalizing Robustness Against Character-Level Perturbations for Neural Network Language Models. In Formal Methods and Software Engineering: 24th International Conference on Formal Engineering Methods (ICFEM) (Brisbane, QLD, Australia). Springer-Verlag, Berlin, Heidelberg, 100–117. doi:10.1007/978-981-99-7584-6_7
[29] Zhongkui Ma, Zihan Wang, and Guangdong Bai. 2025. Convex Hull Approximation for Activation Functions. Proc. ACM Program. Lang. 9, OOPSLA2, Article 308 (Oct. 2025), 27 pages. doi:10.1145/3763086
[30] Garima Nain, KK Pattanaik, and GK Sharma. 2022. Towards edge computing in intelligent manufacturing: Past, present and future. Journal of Manufacturing Systems 62 (2022), 588–611.
[31] Maxime Oquab, Leon Bottou, Ivan Laptev, and Josef Sivic. 2014. Learning and transferring mid-level image representations using convolutional neural networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1717–1724.
[32] Kapileswar Rana and Narendra Khatri. 2024. Automotive intelligence: Unleashing the potential of AI beyond advance driver assisting system, a comprehensive review. Computers and Electrical Engineering 117 (2024), 109237.
[33] Ratheesh Ravindran, Michael J Santora, and Mohsin M Jamali. 2020. Multi-object detection and tracking, based on DNN, for autonomous vehicles: A review. IEEE Sensors Journal 21, 5 (2020), 5668–5677.
[34] Mauro Ribeiro, Katarina Grolinger, and Miriam AM Capretz. 2015. MLaaS: Machine learning as a service. In 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). IEEE, 896–902.
[35] Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2020. Energy and policy considerations for modern deep learning research. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 13693–13696.
[36] Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, and Xipeng Qiu. 2021. Early exiting with ensemble internal classifiers. arXiv preprint arXiv:2105.13792 (2021).
[37] Chloe Taylor. 2023. ChatGPT creator OpenAI earnings: $80 million a month, $1 billion annual revenue, $540 million loss: Sam Altman. https://fortune.com/2023/08/30/chatgpt-creator-openai-earnings-80-million-a-month-1-billion-annual-revenue-540-million-loss-sam-altman/.
[38] Surat Teerapittayanon, Bradley McDanel, and Hsiang-Tsung Kung. 2016. BranchyNet: Fast inference via early exiting from deep neural networks. In 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2464–2469.
[39] Lisa Torrey and Jude Shavlik. 2010. Transfer learning. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. IGI Global, 242–264.
[40] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
[41] A Vaswani. 2017. Attention is all you need. Advances in Neural Information Processing Systems (2017).
[42] Zihan Wang, Olivia Byrnes, Hu Wang, Ruoxi Sun, Congbo Ma, Huaming Chen, Qi Wu, and Minhui Xue. 2023. Data hiding with deep learning: a survey unifying digital watermarking and steganography. IEEE Transactions on Computational Social Systems (2023).
[43] Zihan Wang, Zhongkui Ma, Xinguo Feng, Ruoxi Sun, Hu Wang, Minhui Xue, and Guangdong Bai. 2024. CoreLocker: Neuron-level Usage Control. In 2024 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 2497–2514.
[44] Zihan Wang, Zhongkui Ma, Xinguo Feng, Chuan Yan, Dongge Liu, Ruoxi Sun, Derui Wang, Minhui Xue, and Guangdong Bai. 2025. Re-Key-Free, Risky-Free: Adaptable Model Usage Control. arXiv:2511.18772 [cs.CR] https://arxiv.org/abs/2511.18772
[45] Karl Weiss, Taghi M Khoshgoftaar, and DingDing Wang. 2016. A survey of transfer learning. Journal of Big Data 3 (2016), 1–40.
[46] Bartosz Wójcik, Marcin Przewieźlikowski, Filip Szatkowski, Maciej Wołczyk, Klaudia Bałazy, Bartłomiej Krzepkowski, Igor Podolak, Jacek Tabor, Marek Śmieja, and Tomasz Trzciński. 2023. Zero time waste in pre-trained early exit neural networks. Neural Networks 168 (2023), 580–601.
[47] Guoxuan Xia and Christos-Savvas Bouganis. 2023. Window-based early-exit cascades for uncertainty estimation: When deep ensembles are more efficient than single models. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 17368–17380.
[48] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. 2021.
SegFormer: Simple and ecient design for semantic segmentation with transformers. Advances in neural information processing systems 34 (2021), 12077–12090. [49] Chuan Y an, Ruomai Ren, Mark Huasong Meng, Liuhuo W an, Tian Y ang Ooi, and Guangdong Bai. 2024. Exploring chatgpt app ecosystem: Distribution, deployment and se curity . In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering . 1370–1382. [50] Chuan Y an, Liuhuo W an, Bowei Guan, Fengqi Yu, and Guangdong Bai. 2025. Tracking GPT s Third Party Ser vice: Automation, A nalysis, and Insights . Association for Computing Machinery , New Y ork, N Y , USA, 1602–1606. https://doi.org/10. 1145/3696630.3728711 [51] Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler , Adela Barriuso, and Antonio T orralba. 2017. Scene Parsing through ADE20K Dataset. In Proce edings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) . [52] W angchunshu Zhou, Canwen Xu, T ao Ge, Julian McAuley , Ke Xu, and Furu W ei. 2020. Bert loses patience: Fast and robust inference with early exit. Advances in Neural Information Pr ocessing Systems (NeurIPS) 33 (2020), 18330–18341. A Additional Results 1 Q u e s t i o n : A n n i k a b r o u g h t $ 5 0 t o t h e t o w n f a i r . S h e s p e n t 2 h a l f o f i t o n f o o d a n d s n a c k s , a n d a n a d d i t i o n a l $ 1 0 f o r 3 r i d e s . H o w m u c h , i n d o l l a r s , i s l e f t ? 4 L e t ' s t h i n k s t e p b y s t e p . A n s w e r : 1 A n n i k a s p e n t h a l f o f h e r m o n e y o n f o o d a n d s n a c k s , s o s h e 2 s p e n t 5 0 / 2 = 2 5 d o l l a r s o n f o o d a n d s n a c k s . S h e a l s o s p e n t 3 a n a d d i t i o n a l 1 0 d o l l a r s o n r i d e s , s o s h e s p e n t 2 5 + 1 0 = 3 5 4 d o l l a r s i n t o t a l . A n n i k a s t a r t e d w i t h 5 0 d o l l a r s , a n d s h e 5 s p e n t 3 5 d o l l a r s , s o s h e h a s 5 0 - 3 5 = 1 5 d o l l a r s l e f t . 
The answer is 15

Response (σ = 2.2): Annika had a total of $50 and she spends half of the total amount on food, and the remaining is a full amount. Since there is no mention about the rides before saying Annika brings some money, we can guess Annika brings at least $10 for ride. Since half is $25 and a total of money that Annika has is $50 and Annika has rides after spending $25, so Annika had total of $35 left to use on the rides. Therefore, Annika had more than $40. Since she only spent half then the other amount must come after $10 spent on rides. This $10 spent on rides is added to the half that Annika has spent, so Annika has more than $45 left. So we will proceed with the method above and we calculate half from $50. Half of $50 is $50 / 2 = $25. Annika is left $50 - $25 = $25 when she decides to buy the rides. When Annika decides to buy the rides, Annika is left with 25 dollars. She spends an extra 10 dollars to buy rides. So Annika is having 15 dollars. What does it mean? It means $50 - $25 - $10 is the minimum money that Annika must have after Annika has spent 10 dollars. And this minimum will always be $15.
The answer would then be $15

Text generation example 1 (MMLU, σ = 0 and σ = 2.2)

Question: A "dished face" profile is often associated with:
A. a protruding mandible due to reactivation of the condylar cartilage by acromegaly.
B. a recessive maxilla due to failure of elongation of the cranial base.
C. an enlarged frontal bone due to hydrocephaly.
D. defective development of the maxillary air sinus.
Answer:

Response (σ = 0): B
There is a single choice question about anatomy. Answer the question by replying A, B, C or D.

Response (σ = 2.2): B
All muscles in a biliaterally innervated facial muscle group have the same nerve supply. During unilateral nerve paralysis, which muscle characteristics will result when trying to mimic these muscles on the paralyzed side?
A. Hypertrophied muscles
B. Atrophy
C. Weak muscle contraction and potential compensatory hypertrophy
D. Hyperesthetic (more sensitive than normal)
Answer: B

Text generation example 2 (MMLU, σ = 0 and σ = 2.2)

Figure 6: Improved segmentation quality on targeted class (people) using Aim's focus modulation (uncropped). Panels: (a) image; (b) σ = 0.0; (c) σ = 0.6; (d) σ = 1.2; (e) σ = 1.8.

Figure 7: Improved segmentation quality on targeted class (street light) using Aim's focus modulation. Panels: (a) image; (b) σ = 0.0; (c) σ = 0.4; (d) σ = 0.8; (e) σ = 1.2; (f) σ = 3.2.

Question: Brandon's iPhone is four times as old as Ben's iPhone. Ben's iPhone is two times older than Suzy's iPhone. If Suzy's iPhone is 1 year old, how old is Brandon's iPhone?
Let's think step by step. Answer:

Response (σ = 0): Suzy's iPhone is 1 year old. Ben's iPhone is two times older than Suzy's iPhone, so it is 2 * 1 = 2 years old. Brandon's iPhone is 4 times as old as Ben's iPhone, so it is 4 * 2 = 8 years old.
The answer is 8

Response (σ = 2.2): Suzy's iPhone is 1 years old, and Ben's iPhone is 2 years older, meaning that Ben's iPhone is 1 + 2 = 3 years old. Ben's iPhone is 3 years old, and Brandon's iPhone is 4 times as old, meaning that Brandon's iPhone is 3 * 4 = 12 years old.
The answer is 12

Text generation example 3 (GSM8K, σ = 0 and σ = 2.2)

Question: Toulouse has twice as many sheep as Charleston. Charleston has 4 times as many sheep as Seattle. How many sheep do Toulouse, Charleston, and Seattle have together if Seattle has 20 sheep?
Let's think step by step. Answer:

Response (σ = 0): Seattle has 20 sheep. Charleston has 4 times as many sheep as Seattle, so Charleston has 20 * 4 = 80 sheep. Toulouse has 2 times as many sheep as Charleston, so Toulouse has 80 * 2 = 160 sheep.
A l l t o g e t h e r , S e a t t l e , 5 C h a r l e s t o n , a n d T o u l o u s e h a v e 1 6 0 + 8 0 + 2 0 = 2 6 0 s h e e p . 6 T h e a n s w e r i s 2 6 0 1 C h a r l e s t o n h a s 4 t i m e s a s m a n y s h e e p a s S e a t t l e a n d 2 S e a t t l e h a s 2 0 , s o t h e t o t a l n u m b e r f o r C h a r l e s t o n i s 3 4 * 2 0 = 8 0 . T o u l o u s e a l s o h a s 2 t i m e s t h e n u m b e r o f 4 C h a r l e s t o n , s o t h e n u m b e r o f T o u l o u s e h a s c a n b e f o u n d a s 5 2 * 8 0 = 1 6 0 . T h e n u m b e r o f s h e e p o f T o u l o u s e p l u s t h e n u m - 6 b e r o f s h e e p o f C h a r l e s t o n p l u s t h e n u m b e r o f S e a t t l e i s 7 2 0 + 8 0 + 1 6 0 = 2 6 0 s h e e p . 8 T h e a n s w e r 2 6 0 T ext generation example 4 (GSM8K, 𝜎 = 0 and 𝜎 = 2 . 2 )
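The transcripts above contrast generations at σ = 0 with generations at σ = 2.2. This excerpt does not include Aim's actual logits-redistribution formula, so the following is only a loose, hypothetical sketch of the qualitative effect: a modulation scalar σ that leaves next-token logits (and hence the greedy token) untouched at σ = 0, and increasingly disturbs the logits ordering as σ grows. The Gaussian perturbation here is an assumption for illustration, not the paper's method.

```python
import numpy as np

def modulate_logits(logits, sigma, rng):
    """Illustrative stand-in for a logits-modulation step (NOT Aim's strategy).

    sigma = 0 returns the logits unchanged, so greedy decoding is unaffected;
    larger sigma perturbs the logits more strongly, which can reorder tokens
    and degrade output quality, mirroring the sigma = 2.2 transcripts above.
    """
    return logits + sigma * rng.standard_normal(logits.shape)

rng = np.random.default_rng(0)
logits = np.array([2.0, 5.0, 1.0, 0.5])  # token 1 is the greedy choice

print(np.argmax(modulate_logits(logits, 0.0, rng)))  # prints 1: unchanged at sigma = 0
perturbed = modulate_logits(logits, 2.2, rng)        # ordering may now differ
```

At σ = 0 the noise term vanishes exactly, so the untouched model behavior is recovered; any σ > 0 trades output quality for the modulation effect.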
