Computer Science / Machine Learning

All posts under category "Computer Science / Machine Learning"

116 posts total
Sorted by date
Exact Computation with Infinitely Wide Neural Networks

How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its width --- namely, number of channels in convolutional layers, and number of nodes in fully-connected internal layers --- is allowed to increase to infinity? Such questions have come to the forefront in the quest to theoretically understand deep learning and its mysteries about optimization and generalization. They also connect deep learning to notions such as Gaussian processes and kernels. A recent paper [Jacot et al., 2018] introduced the Neural Tangent Kernel (NTK) which captures the behavior of fully-connected deep nets in the infinite width limit trained by gradient descent; this object was implicit in some other recent papers. An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width. The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, which we call Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm. This results in a significant new benchmark for the performance of a pure kernel-based method on CIFAR-10, being 10% higher than the methods reported in [Novak et al., 2019], and only 6% lower than the performance of the corresponding finite deep net architecture (once batch normalization, etc. are turned off). Theoretically, we also give the first non-asymptotic proof showing that a fully-trained sufficiently wide net is indeed equivalent to the kernel regression predictor using NTK.
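
To make the kernel-regression view concrete, here is a minimal sketch of the analytic NTK of a two-layer (one-hidden-layer) ReLU network together with the resulting kernel regression predictor. The closed form follows one common convention (constants differ across papers), and the toy data and ridge term are illustrative assumptions; the paper's CNTK for convolutional architectures is considerably more involved.

```python
# Minimal sketch: kernel regression with the analytic NTK of a two-layer
# (one-hidden-layer) ReLU network. Constants follow one common convention and
# are illustrative; the CNTK for convolutional nets is more involved.
import numpy as np

def ntk_two_layer_relu(X1, X2):
    """NTK between rows of X1 and X2 for an infinitely wide 2-layer ReLU net."""
    n1 = np.linalg.norm(X1, axis=1, keepdims=True)            # (m, 1)
    n2 = np.linalg.norm(X2, axis=1, keepdims=True)            # (n, 1)
    dot = X1 @ X2.T                                           # (m, n)
    u = np.clip(dot / (n1 * n2.T), -1.0, 1.0)                 # cosine similarity
    theta = np.arccos(u)
    kappa0 = (np.pi - theta) / np.pi                          # degree-0 arc-cosine kernel
    kappa1 = (u * (np.pi - theta) + np.sqrt(1.0 - u**2)) / np.pi  # degree-1 arc-cosine kernel
    # Hidden-layer and output-layer contributions (one common convention).
    return dot * kappa0 + (n1 * n2.T) * kappa1

rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 10))
y_train = np.sign(X_train[:, 0] * X_train[:, 1])              # toy labels
X_test = rng.standard_normal((50, 10))

K = ntk_two_layer_relu(X_train, X_train)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(K)), y_train)   # (near-ridgeless) kernel regression
y_pred = ntk_two_layer_relu(X_test, X_train) @ alpha
print("toy test accuracy:",
      np.mean(np.sign(y_pred) == np.sign(X_test[:, 0] * X_test[:, 1])))
```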

paper research
From Interpretability to Inference: An Estimation Framework for Universal Approximators

We present a novel framework for estimation and inference with the broad class of universal approximators. Estimation is based on the decomposition of model predictions into Shapley values. Inference relies on analyzing the bias and variance properties of individual Shapley components. We show that Shapley value estimation is asymptotically unbiased, and we introduce Shapley regressions as a tool to uncover the true data-generating process from noisy data alone. The well-known case of linear regression is recovered as a special case of our framework when the model is linear in its parameters. We present theoretical, numerical, and empirical results for the estimation of heterogeneous treatment effects as our guiding example.
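
As a toy illustration of the Shapley-regression idea, the sketch below decomposes a fitted model's predictions into per-feature Shapley components and then regresses the target on those components. To stay dependency-free it uses the linear closed form $\phi_j(x) = \beta_j (x_j - \bar{x}_j)$, which is an assumption of this example; the paper's framework covers general universal approximators and supplies the accompanying inference theory.

```python
# Toy sketch of the Shapley-regression idea: decompose model predictions into
# per-feature Shapley components, then regress the target on those components.
# The linear closed form phi_j(x) = beta_j * (x_j - mean(x_j)) keeps this
# sketch dependency-free; the paper treats general universal approximators.
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 3
X = rng.standard_normal((n, d))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.5 * rng.standard_normal(n)

Xc, yc = X - X.mean(axis=0), y - y.mean()
beta_hat = np.linalg.lstsq(Xc, yc, rcond=None)[0]     # stand-in fitted "model"

phi = beta_hat * Xc                                   # Shapley components, shape (n, d)
# Efficiency property: components plus the mean prediction reconstruct each prediction.
assert np.allclose(phi.sum(axis=1) + y.mean(), Xc @ beta_hat + y.mean())

# Shapley regression: regress the (centered) target on the Shapley components.
# Coefficients near 1 indicate components aligned with the data-generating process.
gamma = np.linalg.lstsq(phi, yc, rcond=None)[0]
print("Shapley-regression coefficients:", np.round(gamma, 3))
```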

paper research
Approximate Query Processing Using Deep Generative Models

Data is generated at an unprecedented rate, surpassing our ability to analyze it. The database community has pioneered many novel techniques for Approximate Query Processing (AQP) that could give approximate results in a fraction of the time needed for computing exact results. In this work, we explore the usage of deep learning (DL) for answering aggregate queries, specifically for interactive applications such as data exploration and visualization. We use deep generative models, an unsupervised learning based approach, to learn the data distribution faithfully, such that aggregate queries could be answered approximately by generating samples from the learned model. The model is often compact - a few hundred KBs - so that arbitrary AQP queries could be answered on the client side without contacting the database server. Our other contributions include identifying model bias and minimizing it through a rejection sampling based approach, and an algorithm to build model ensembles for AQP for improved accuracy. Our extensive experiments show that our proposed approach can provide answers with high accuracy and low latency.
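
A toy sketch of the workflow described above: fit a compact generative model to a table, then answer an aggregate query by sampling from the model instead of scanning the data. The Gaussian mixture, the synthetic table, and the query are illustrative stand-ins for the deep generative models and workloads used in the paper.

```python
# Toy sketch of model-based AQP: fit a compact generative model to a table,
# then answer an aggregate query from model samples instead of scanning data.
# A Gaussian mixture stands in for the deep generative model used in the paper.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Synthetic "table" with columns (age, salary); the exact answer is kept for reference.
age = rng.integers(20, 65, size=100_000)
salary = 20_000 + 1_000 * age + rng.normal(0, 5_000, size=age.size)
table = np.column_stack([age, salary])

gm = GaussianMixture(n_components=8, random_state=0).fit(table)   # the "model"

# Query: SELECT AVG(salary) WHERE age BETWEEN 30 AND 40
samples, _ = gm.sample(20_000)
mask = (samples[:, 0] >= 30) & (samples[:, 0] <= 40)
approx = samples[mask, 1].mean()
exact = salary[(age >= 30) & (age <= 40)].mean()
print(f"approximate: {approx:,.0f}   exact: {exact:,.0f}")
```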

paper research
Communication-Efficient Federated Deep Learning with Asynchronous Model Updates and Temporally Weighted Aggregation

Federated learning obtains a central model on the server by aggregating models trained locally on clients. As a result, federated learning does not require clients to upload their data to the server, thereby preserving the data privacy of the clients. One challenge in federated learning is to reduce the client-server communication, since the end devices typically have very limited communication bandwidth. This paper presents an enhanced federated learning technique by proposing an asynchronous learning strategy on the clients and a temporally weighted aggregation of the local models on the server. In the asynchronous learning strategy, different layers of the deep neural networks are categorized into shallow and deep layers, and the parameters of the deep layers are updated less frequently than those of the shallow layers. Furthermore, a temporally weighted aggregation strategy is introduced on the server to make use of the previously trained local models, thereby enhancing the accuracy and convergence of the central model. The proposed algorithm is empirically evaluated on two datasets with different deep neural networks. Our results demonstrate that the proposed asynchronous federated deep learning outperforms the baseline algorithm both in terms of communication cost and model accuracy.
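
The sketch below illustrates the server-side temporally weighted aggregation: local models that were updated more recently receive larger weights before averaging. The exponential-decay weighting and the data-size factor are plausible illustrative choices, not necessarily the paper's exact formula.

```python
# Sketch of temporally weighted aggregation on the server: local models updated
# more recently receive larger weights. The exponential decay used here is an
# illustrative choice, not necessarily the paper's exact formula.
import numpy as np

def temporally_weighted_aggregate(local_weights, data_sizes, timestamps, current_round, a=2.0):
    """local_weights: list of flat parameter vectors, one per client.
    timestamps: round in which each client's model was last updated."""
    staleness = current_round - np.asarray(timestamps, dtype=float)
    freshness = a ** (-staleness)                       # newer updates count more
    w = np.asarray(data_sizes, dtype=float) * freshness
    w /= w.sum()
    return sum(wi * theta for wi, theta in zip(w, local_weights))

clients = [np.full(4, fill) for fill in (1.0, 2.0, 3.0)]    # toy "models"
print(temporally_weighted_aggregate(clients, data_sizes=[100, 100, 100],
                                    timestamps=[10, 7, 3], current_round=10))
```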

paper research
AI-Based Detection of Pilgrims Using Convolutional Neural Networks

Pilgrimage represents the most important Islamic religious gathering in the world, where millions of pilgrims visit the holy places of Makkah and Madinah to perform their rituals. The safety and security of pilgrims is the highest priority for the authorities. In Makkah, 5000 cameras are spread around the holy sites for monitoring pilgrims, but it is almost impossible to track all events by humans considering the huge number of images collected every second. To address this issue, we propose to use an artificial intelligence technique based on deep learning and convolutional neural networks to detect and identify pilgrims and their features. For this purpose, we built a comprehensive dataset for the detection of pilgrims and their genders. Then, we developed two convolutional neural networks based on YOLOv3 and Faster-RCNN for the detection of pilgrims. Experimental results show that Faster RCNN with an Inception v2 feature extractor provides the best mean average precision over all classes, at 51%.

paper research
Optimizing a Supply-Side Platform's Header Bidding Strategy with Thompson Sampling

Over the last decade, digital media (web or app publishers) generalized the use of real-time ad auctions to sell their ad spaces. Multiple auction platforms, also called Supply-Side Platforms (SSP), were created. Because of this multiplicity, publishers started to create competition between SSPs. In this setting, there are two successive auctions: a second-price auction in each SSP and a secondary, first-price auction, called the header bidding auction, between SSPs. In this paper, we consider an SSP competing with other SSPs for ad spaces. The SSP acts as an intermediary between an advertiser wanting to buy ad spaces and a web publisher wanting to sell its ad spaces, and needs to define a bidding strategy to be able to deliver to the advertisers as many ads as possible while spending as little as possible. The revenue optimization of this SSP can be written as a contextual bandit problem, where the context consists of the information available about the ad opportunity, such as properties of the internet user or of the ad placement. Using classical multi-armed bandit strategies (such as the original versions of UCB and EXP3) is inefficient in this setting and yields a low convergence speed, as the arms are very correlated. In this paper we design and experiment with a version of the Thompson Sampling algorithm that easily takes this correlation into account. We combine this Bayesian algorithm with a particle filter, which makes it possible to handle non-stationarity by sequentially estimating the distribution of the highest bid to beat in order to win an auction. We apply this methodology on two real auction datasets, and show that it significantly outperforms more classical approaches. The strategy defined in this paper is being developed to be deployed on thousands of publishers worldwide.
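
For readers unfamiliar with the baseline idea, the sketch below runs Thompson Sampling over a discrete grid of bid levels with independent Beta posteriors on each level's win probability. The environment and utility model are hypothetical, and the paper's key ingredients (modeling the distribution of the highest competing bid so that arms share information, and tracking it with a particle filter) are deliberately omitted.

```python
# Minimal Thompson Sampling sketch over a discrete grid of bid levels with
# independent Beta posteriors on each level's win probability. The paper's
# version instead models the distribution of the highest competing bid,
# tracked with a particle filter; this is only the baseline idea.
import numpy as np

rng = np.random.default_rng(3)
bid_levels = np.array([0.5, 1.0, 1.5, 2.0, 2.5])      # possible bids (illustrative)
value = 3.0                                            # value of delivering the ad
alpha = np.ones_like(bid_levels)                       # Beta posterior parameters
beta = np.ones_like(bid_levels)

def true_win_prob(bid):                                # unknown environment
    return 1.0 / (1.0 + np.exp(-4.0 * (bid - 1.4)))

total_utility = 0.0
for t in range(5_000):
    theta = rng.beta(alpha, beta)                      # sample win prob per arm
    arm = int(np.argmax(theta * (value - bid_levels)))  # maximize sampled expected utility
    win = rng.random() < true_win_prob(bid_levels[arm])
    alpha[arm] += win
    beta[arm] += 1 - win
    total_utility += win * (value - bid_levels[arm])

print("posterior mean win prob per bid:", np.round(alpha / (alpha + beta), 2))
print("average utility per auction:", round(total_utility / 5_000, 3))
```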

paper research
A Parallel Projection Technique for Optimization Under Metric Constraints

Many clustering applications in machine learning and data mining rely on solving metric-constrained optimization problems. These problems are characterized by $O(n^3)$ constraints that enforce triangle inequalities on distance variables associated with $n$ objects in a large dataset. Despite its usefulness, metric-constrained optimization is challenging in practice due to the cubic number of constraints and the high-memory requirements of standard optimization software. Recent work has shown that iterative projection methods are able to solve metric-constrained optimization problems on a much larger scale than was previously possible, thanks to their comparatively low memory requirement. However, the major limitation of projection methods is their slow convergence rate. In this paper we present a parallel projection method for metric-constrained optimization which allows us to speed up the convergence rate in practice. The key to our approach is a new parallel execution schedule that allows us to perform projections at multiple metric constraints simultaneously without any conflicts or locking of variables. We illustrate the effectiveness of this execution schedule by implementing and testing a parallel projection method for solving the metric-constrained linear programming relaxation of correlation clustering. We show numerous experimental results on problems involving up to 2.9 trillion constraints.
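
The elementary building block of such projection methods is the Euclidean projection of a distance triple onto a single triangle inequality $d_{ij} \le d_{ik} + d_{jk}$, sketched below. A full solver repeats this over all $O(n^3)$ constraints with Dykstra-style correction terms, and the paper's contribution is a conflict-free schedule for doing so in parallel; none of that machinery is shown here.

```python
# Building block of projection methods for metric-constrained optimization:
# the Euclidean projection of a distance triple onto one triangle inequality
# d_ij <= d_ik + d_jk. Full solvers repeat this over all O(n^3) constraints.
import numpy as np

def project_triangle(d_ij, d_ik, d_jk):
    """Project (d_ij, d_ik, d_jk) onto the halfspace d_ij - d_ik - d_jk <= 0."""
    violation = d_ij - d_ik - d_jk
    if violation <= 0:
        return d_ij, d_ik, d_jk                       # already feasible
    shift = violation / 3.0                            # projection along normal (1, -1, -1)
    return d_ij - shift, d_ik + shift, d_jk + shift

print(project_triangle(5.0, 1.0, 1.0))                 # -> (4.0, 2.0, 2.0)
```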

paper research
Large-Scale Traffic Signal Control with a New Multi-Agent Reinforcement Learning Approach

Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multi-Agent Reinforcement Learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and modeling the behaviors of other agents for each individual agent. In this paper, a new MARL algorithm, called Cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the UCB policy, which can eliminate the over-estimation problem existing in traditional independent Q-learning while ensuring exploration. It uses mean field approximation to model the interaction among agents, thereby making agents learn a better cooperative strategy. In order to improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on a multi-traffic-signal simulator. According to the results obtained on several traffic scenarios, Co-DQL outperforms several state-of-the-art decentralized MARL algorithms. It can effectively shorten the average waiting time of the vehicles in the whole road system.
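
The sketch below shows two of the tabular ingredients named above, double Q-learning with two estimators (to curb over-estimation) and a UCB exploration bonus, for a single agent on a toy chain MDP. The mean-field interaction term, the reward allocation mechanism, and the traffic-signal environment of Co-DQL are omitted, and the toy dynamics are an assumption of this example.

```python
# Sketch: double Q-learning (two estimators) with a UCB exploration bonus for a
# single agent on a toy 3-state chain MDP. Co-DQL's mean-field interaction and
# reward-allocation mechanism are omitted.
import numpy as np

rng = np.random.default_rng(4)
n_states, n_actions = 3, 2
QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))
counts = np.ones((n_states, n_actions))               # visit counts for the UCB bonus
gamma, lr, c_ucb = 0.9, 0.1, 1.0

def step(s, a):
    """Toy dynamics: action 1 moves right (reward at the end), action 0 resets."""
    if a == 1:
        s2 = min(s + 1, n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
    else:
        s2, r = 0, 0.1
    return s2, r + 0.1 * rng.standard_normal()         # noisy rewards

s = 0
for t in range(20_000):
    q_mean = (QA[s] + QB[s]) / 2.0
    bonus = c_ucb * np.sqrt(np.log(t + 2) / counts[s])  # UCB exploration term
    a = int(np.argmax(q_mean + bonus))
    s2, r = step(s, a)
    counts[s, a] += 1
    if rng.random() < 0.5:                              # update one estimator using the
        a_star = int(np.argmax(QA[s2]))                 # other's value at its own argmax
        QA[s, a] += lr * (r + gamma * QB[s2, a_star] - QA[s, a])
    else:
        a_star = int(np.argmax(QB[s2]))
        QB[s, a] += lr * (r + gamma * QA[s2, a_star] - QB[s, a])
    s = s2

print("greedy policy per state:", np.argmax(QA + QB, axis=1))
```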

paper research
Distributed Deep Convolutional Neural Networks for the Internet-of-Things

Severe constraints on memory and computation characterizing Internet-of-Things (IoT) units may prevent the execution of Deep Learning (DL)-based solutions, which typically demand large memory and high processing load. In order to support real-time execution of the considered DL model at the IoT unit level, DL solutions must be designed with the memory and processing constraints of the chosen IoT technology in mind. In this paper, we introduce a design methodology aiming at allocating the execution of Convolutional Neural Networks (CNNs) across a distributed IoT application. Such a methodology is formalized as an optimization problem where the latency between the data-gathering phase and the subsequent decision-making one is minimized, within the given constraints on memory and processing load at the unit level. The methodology supports multiple sources of data as well as multiple CNNs in execution on the same IoT system, allowing the design of CNN-based applications demanding autonomy, low decision latency, and high Quality-of-Service.

paper research
A Novel Approach to Distributed Hypothesis Testing and Non-Bayesian Learning: Enhancing Learning Speed and Byzantine Resilience

We study a setting where a group of agents, each receiving partially informative private signals, seek to collaboratively learn the true underlying state of the world (from a finite set of hypotheses) that generates their joint observation profiles. To solve this problem, we propose a distributed learning rule that differs fundamentally from existing approaches, in that it does not employ any form of belief-averaging. Instead, agents update their beliefs based on a min-rule. Under standard assumptions on the observation model and the network structure, we establish that each agent learns the truth asymptotically almost surely. As our main contribution, we prove that with probability 1, each false hypothesis is ruled out by every agent exponentially fast at a network-independent rate that is strictly larger than existing rates. We then develop a computationally-efficient variant of our learning rule that is provably resilient to agents who do not behave as expected (as represented by a Byzantine adversary model) and deliberately try to spread misinformation.
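
The sketch below shows one plausible instantiation of a min-rule update of the kind described above: each agent performs a local Bayesian update with its private signal and then adopts the normalized element-wise minimum of its neighbors' beliefs. The likelihood matrix and network are toy assumptions, and the paper's exact rule and its Byzantine-resilient variant differ in details.

```python
# One plausible instantiation of a min-rule belief update: local Bayesian update
# followed by a normalized element-wise minimum over neighbors' beliefs. The
# paper's exact rule and its Byzantine-resilient variant differ in details.
import numpy as np

rng = np.random.default_rng(5)
true_hyp = 0
# Row i gives P(signal = 1 | hypothesis) for agent i. No single agent can
# identify the truth: agent 0 cannot separate h0 from h1, agent 1 cannot
# separate h0 from h2, agents 2 and 3 are uninformative.
likelihoods = np.array([
    [0.8, 0.8, 0.2],
    [0.3, 0.7, 0.3],
    [0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5],
])
n_agents, n_hyp = likelihoods.shape
beliefs = np.full((n_agents, n_hyp), 1.0 / n_hyp)
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]     # path graph, self-loops included

for t in range(500):
    obs = rng.random(n_agents) < likelihoods[:, true_hyp]          # private signals
    like = np.where(obs[:, None], likelihoods, 1.0 - likelihoods)   # P(obs | hypothesis)
    bayes = beliefs * like                                          # local Bayesian update
    bayes /= bayes.sum(axis=1, keepdims=True)
    new = np.array([bayes[nb].min(axis=0) for nb in neighbors])     # min over neighbors
    beliefs = new / new.sum(axis=1, keepdims=True)

print("belief in the true hypothesis per agent:", np.round(beliefs[:, true_hyp], 3))
```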

paper research
Multi-Task Regression-Based Learning for Autonomous Drone Flight Control in Unstructured Outdoor Environments

Increased growth in the global Unmanned Aerial Vehicles (UAV) (drone) industry has expanded possibilities for fully autonomous UAV applications. A particular application which has in part motivated this research is the use of UAV in wide area search and surveillance operations in unstructured outdoor environments. The critical issue with such environments is the lack of structured features that could aid in autonomous flight, such as road lines or paths. In this paper, we propose an End-to-End Multi-Task Regression-based Learning approach capable of defining flight commands for navigation and exploration under the forest canopy, regardless of the presence of trails or additional sensors (i.e. GPS). Training and testing are performed using a software in the loop pipeline which allows for a detailed evaluation against state-of-the-art pose estimation techniques. Our extensive experiments demonstrate that our approach excels in performing dense exploration within the required search perimeter, is capable of covering wider search regions, generalises to previously unseen and unexplored environments and outperforms contemporary state-of-the-art techniques.

paper research
An Introduction to Decentralized Stochastic Optimization with Gradient Tracking

Decentralized solutions to finite-sum minimization are of significant importance in many signal processing, control, and machine learning applications. In such settings, the data is distributed over a network of arbitrarily-connected nodes and raw data sharing is often prohibitive due to communication or privacy constraints. In this article, we review decentralized stochastic first-order optimization methods and illustrate some recent improvements based on gradient tracking and variance reduction, focusing particularly on smooth and strongly-convex objective functions. We provide intuitive illustrations of the main technical ideas as well as applications of the algorithms in the context of decentralized training of machine learning models.
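
As a reference point for the methods reviewed here, the sketch below implements the basic deterministic gradient-tracking recursion for a least-squares problem split across four nodes; the mixing matrix and step size are illustrative, and the stochastic and variance-reduced variants discussed in the article build on this template.

```python
# Minimal sketch of decentralized gradient tracking (deterministic form) for a
# least-squares problem split across nodes. Each node i holds (A_i, b_i) and runs:
#   x_i^{k+1} = sum_j W_ij x_j^k - alpha * y_i^k
#   y_i^{k+1} = sum_j W_ij y_j^k + grad_i(x_i^{k+1}) - grad_i(x_i^k)
# so y_i tracks the network-average gradient. Step size and mixing matrix are illustrative.
import numpy as np

rng = np.random.default_rng(6)
n_nodes, dim = 4, 5
A = [rng.standard_normal((20, dim)) for _ in range(n_nodes)]
b = [Ai @ np.ones(dim) + 0.1 * rng.standard_normal(20) for Ai in A]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# Doubly stochastic mixing matrix for a ring of 4 nodes.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

X = np.zeros((n_nodes, dim))                                 # local iterates x_i
Y = np.array([grad(i, X[i]) for i in range(n_nodes)])        # gradient trackers y_i
alpha = 0.2
for _ in range(500):
    X_new = W @ X - alpha * Y
    Y = W @ Y + np.array([grad(i, X_new[i]) - grad(i, X[i]) for i in range(n_nodes)])
    X = X_new

print("max deviation from the all-ones solution:", float(np.abs(X - 1.0).max()))
```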

paper research
Training DNN IoT Applications for Deployment on Analog NVM Crossbars

A trend towards energy-efficiency, security and privacy has led to a recent focus on deploying DNNs on microcontrollers. However, limits on compute and memory resources restrict the size and the complexity of the ML models deployable in these systems. Computation-In-Memory architectures based on resistive nonvolatile memory (NVM) technologies hold great promise of satisfying the compute and memory demands, inherent in modern DNNs, at high performance and low power. Nevertheless, these technologies are still immature and suffer from both the intrinsic analog-domain noise problems and the inability to represent negative weights in the NVM structures, incurring larger crossbar sizes with concomitant impact on ADCs and DACs. In this paper, we provide a training framework for addressing these challenges and quantitatively evaluate the circuit-level efficiency gains thus accrued. We make two contributions. Firstly, we propose a training algorithm that eliminates the need for tuning individual layers of a DNN, ensuring uniformity across layer weights and activations. This allows analog blocks to be reused and peripheral hardware to be substantially reduced. Secondly, using NAS methods, we propose the use of unipolar-weighted (either all-positive or all-negative weights) matrices/sub-matrices. Weight unipolarity obviates the need for doubling crossbar area, leading to simplified analog periphery. We validate our methodology with CIFAR10 and HAR applications by mapping to crossbars using 4-bit and 2-bit devices. We achieve up to 92.91% accuracy (95% floating-point) using 2-bit only-positive weights for HAR. A combination of the proposed techniques leads to 80% area improvement and up to 45% energy reduction.

paper research
I Feel You: A Theory of Mind Experiment in Games

In this study into the player's emotional theory of mind of gameplaying agents, we investigate how an agent's behaviour and the player's own performance and emotions shape the recognition of a frustrated behaviour. We focus on the perception of frustration as it is a prevalent affective experience in human-computer interaction. We present a testbed game tailored towards this end, in which a player competes against an agent with a frustration model based on theory. We collect gameplay data, an annotated ground truth about the player's appraisal of the agent's frustration, and apply face recognition to estimate the player's emotional state. We examine the collected data through correlation analysis and predictive machine learning models, and find that the player's observable emotions are not correlated highly with the perceived frustration of the agent. This suggests that our subjects' theory of mind is a cognitive process based on the gameplay context. Our predictive models---using ranking support vector machines---corroborate these results, yielding moderately accurate predictors of players' theory of mind.

paper research
Multimodal Functional Maximum Correlation for Emotion Recognition

Emotional states manifest as coordinated yet heterogeneous physiological responses across central and autonomic systems, posing a fundamental challenge for multimodal representation learning in affective computing. Learning such joint dynamics is further complicated by the scarcity and subjectivity of affective annotations, which motivates the use of self-supervised learning (SSL). However, most existing SSL approaches rely on pairwise alignment objectives, which are insufficient to characterize dependencies among more than two modalities and fail to capture higher-order interactions arising from coordinated brain and autonomic responses. To address this limitation, we propose Multimodal Functional Maximum Correlation (MFMC), a principled SSL framework that maximizes higher-order multimodal dependence through a Dual Total Correlation (DTC) objective. By deriving a tight sandwich bound and optimizing it using a functional maximum correlation analysis (FMCA) based trace surrogate, MFMC captures joint multimodal interactions directly, without relying on pairwise contrastive losses. Experiments on three public affective computing benchmarks demonstrate that MFMC consistently achieves state-of-the-art or competitive performance under both subject-dependent and subject-independent evaluation protocols, highlighting its robustness to inter-subject variability. In particular, MFMC improves subject-dependent accuracy on CEAP-360VR from 78.9% to 86.8%, and subject-independent accuracy from 27.5% to 33.1% using the EDA signal alone. Moreover, MFMC remains within 0.8 percentage points of the best-performing method on the most challenging EEG subject-independent split of MAHNOB-HCI. Our code is available at https://github.com/DY9910/MFMC.
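
For reference, the Dual Total Correlation that MFMC maximizes is, in its standard information-theoretic form (the paper's sandwich bound and FMCA-based trace surrogate are not reproduced here):

```latex
% Dual Total Correlation of modalities X_1, ..., X_M: joint entropy minus the
% sum of conditional entropies of each modality given all the others.
D(X_1, \dots, X_M) \;=\; H(X_1, \dots, X_M) \;-\; \sum_{m=1}^{M} H\!\left(X_m \mid X_{\setminus m}\right)
```

It is non-negative and vanishes exactly when the modalities are mutually independent, which is why maximizing it encourages representations that capture joint, higher-order structure.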

paper research
A DRL-Based Method with Bayesian Optimization for Joint Link Adaptation and Device Scheduling in URLLC Industrial IoT Networks

In this article, we consider an industrial internet of things (IIoT) network supporting multi-device dynamic ultra-reliable low-latency communication (URLLC) while the channel state information (CSI) is imperfect. A joint link adaptation (LA) and device scheduling (including the serving order) design is provided, aiming at maximizing the total transmission rate under strict block error rate (BLER) constraints. In particular, a Bayesian optimization (BO) driven Twin Delayed Deep Deterministic Policy Gradient (TD3) method is proposed, which adaptively determines the order in which devices are served and the corresponding modulation and coding scheme (MCS) based on the imperfect CSI. Note that the imperfection of CSI, error sample imbalance in URLLC networks, as well as the parameter sensitivity of the TD3 algorithm likely diminish the algorithm's convergence speed and reliability. To address this issue, we propose a BO-based training mechanism that improves convergence speed by providing a more reliable learning direction and a sample selection method to tackle the sample imbalance problem. Via extensive simulations, we show that the proposed algorithm achieves faster convergence and higher sum-rate performance compared to existing solutions.

paper research
Causify DataFlow: A Framework for High-Performance Machine Learning Stream Processing

We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflows assume finite datasets and require substantial reimplementation when moving from batch prototypes to streaming production systems. This gap introduces causality violations, batch boundary artifacts, and poor reproducibility of real-time failures. DataFlow resolves these issues through a unified execution model based on directed acyclic graphs (DAGs) with point-in-time idempotency: outputs at any time t depend only on a fixed-length context window preceding t. This guarantee ensures that models developed in batch mode execute identically in streaming production without code changes. The framework enforces strict causality by automatically tracking knowledge time across all transformations, eliminating future-peeking bugs. DataFlow supports flexible tiling across temporal and feature dimensions, allowing the same model to operate at different frequencies and memory profiles via configuration alone. It integrates natively with the Python data science stack and provides fit/predict semantics for online learning, caching and incremental computation, and automatic parallelization through DAG-based scheduling. We demonstrate its effectiveness across domains including financial trading, IoT, fraud detection, and real-time analytics.
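
A toy sketch of the point-in-time contract described above: a node whose output at time t is a function only of a fixed-length window ending at t, so that evaluating the full history in batch and replaying it tick by tick produce identical results. The class and method names below are hypothetical illustrations, not DataFlow's actual API.

```python
# Toy sketch of the point-in-time contract: a node's output at time t depends
# only on a fixed-length window ending at t, so batch evaluation and tick-by-tick
# replay agree. Class and method names are hypothetical, not DataFlow's API.
import numpy as np
import pandas as pd

class RollingMeanNode:
    """A 'node' whose output at t uses only the `window` observations up to t."""
    def __init__(self, window: int):
        self.window = window

    def predict(self, series: pd.Series) -> pd.Series:
        return series.rolling(self.window, min_periods=self.window).mean()

idx = pd.date_range("2024-01-01", periods=8, freq="D")
prices = pd.Series(range(8), index=idx, dtype=float)
node = RollingMeanNode(window=3)

batch = node.predict(prices)                                               # batch mode
stream = pd.Series({t: node.predict(prices.loc[:t]).iloc[-1] for t in idx})  # streaming replay

assert np.allclose(batch.dropna().values, stream.dropna().values)          # identical outputs
print(batch.dropna())
```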

paper research
Boosting the Low-Altitude Economy: A Reliability-Aware Dynamic Weight Allocation for Multi-modal UAV Beam Prediction

The low-altitude economy (LAE) is rapidly expanding, driven by urban air mobility, logistics drones, and aerial sensing, and fast and accurate beam prediction in uncrewed aerial vehicle (UAV) communications is crucial for achieving reliable connectivity. Current research is shifting from single-signal to multi-modal collaborative approaches. However, existing multi-modal methods mostly employ fixed or empirical weights, assuming equal reliability across modalities at any given moment. In practice, the importance of different modalities fluctuates dramatically with UAV motion scenarios, and static weighting amplifies the negative impact of degraded modalities. Furthermore, modal mismatch and weak alignment further undermine cross-scenario generalization. To this end, we propose a reliability-aware dynamic weighting scheme applied to a semantic-aware multi-modal beam prediction framework, named SaM2B. Specifically, SaM2B leverages lightweight cues such as environmental visual, flight posture, and geospatial data to adaptively allocate contributions across modalities at different time points through reliability-aware dynamic weight updates. Moreover, by utilizing cross-modal contrastive learning, we align the multi-source representations with the beam semantics associated with specific beam information in a shared semantic space, thereby enhancing discriminative power and robustness under modal noise and distribution shifts. Experiments on real-world low-altitude UAV datasets show that SaM2B achieves more satisfactory results than baseline methods.

paper research
Can Small Training Runs Reliably Guide Data Curation? Rethinking the Use of Proxy Models

Data teams at frontier AI companies routinely train small proxy models to make critical decisions about pretraining data recipes for full-scale training runs. However, the community has a limited understanding of whether and when conclusions drawn from small-scale experiments reliably transfer to full-scale model training. In this work, we uncover a subtle yet critical issue in the standard experimental protocol for data recipe assessment: the use of identical small-scale model training configurations across all data recipes in the name of fair comparison. We show that the experiment conclusions about data quality can flip with even minor adjustments to training hyperparameters, as the optimal training configuration is inherently data-dependent. Moreover, this fixed-configuration protocol diverges from full-scale model development pipelines, where hyperparameter optimization is a standard step. Consequently, we posit that the objective of data recipe assessment should be to identify the recipe that yields the best performance under data-specific tuning. To mitigate the high cost of hyperparameter tuning, we introduce a simple patch to the evaluation protocol using reduced learning rates for proxy model training. We show that this approach yields relative performance that strongly correlates with that of fully tuned large-scale LLM pretraining runs. Theoretically, we prove that for random-feature models, this approach preserves the ordering of datasets according to their optimal achievable loss. Empirically, we validate this approach across 23 data recipes covering four critical dimensions of data curation, demonstrating dramatic improvements in the reliability of small-scale experiments.

paper research
MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponential Stabilization

Achieving provable stability in model-free reinforcement learning (RL) remains a challenge, particularly in balancing exploration with rigorous safety. This article introduces MSACL, a framework that integrates exponential stability theory with maximum entropy RL through multi-step Lyapunov certificate learning. Unlike methods relying on complex reward engineering, MSACL utilizes off-policy multi-step data to learn Lyapunov certificates satisfying theoretical stability conditions. By introducing Exponential Stability Labels (ESL) and a $\lambda$-weighted aggregation mechanism, the framework effectively balances the bias-variance trade-off in multi-step learning. Policy optimization is guided by a stability-aware advantage function, ensuring the learned policy promotes rapid Lyapunov descent. We evaluate MSACL across six benchmarks, including stabilization and nonlinear tracking tasks, demonstrating its superiority over state-of-the-art Lyapunov-based RL algorithms. MSACL achieves exponential stability and rapid convergence under simple rewards, while exhibiting significant robustness to uncertainties and generalization to unseen trajectories. Sensitivity analysis establishes the multi-step horizon $n=20$ as a robust default across diverse systems. By linking Lyapunov theory with off-policy actor-critic frameworks, MSACL provides a foundation for verifiably safe learning-based control. Source code and benchmark environments will be made publicly available.

paper research
HFedMoE: Resource-Aware Heterogeneous Federated Learning with Mixture-of-Experts

While federated learning (FL) enables fine-tuning of large language models (LLMs) without compromising data privacy, the substantial size of an LLM renders on-device training impractical for resource-constrained clients, such as mobile devices. Thus, Mixture-of-Experts (MoE) models have emerged as a computation-efficient solution, which activates only a sparse subset of experts during model training to reduce computing burden without sacrificing performance. Though integrating MoE into FL fine-tuning holds significant potential, it still encounters three key challenges: i) selecting appropriate experts for clients remains challenging due to the lack of a reliable metric to measure each expert's impact on local fine-tuning performance, ii) the heterogeneous computing resources across clients severely hinder MoE-based LLM fine-tuning, as dynamic expert activations across diverse input samples can overwhelm resource-constrained devices, and iii) client-specific expert subsets and routing preferences undermine global aggregation, where misaligned expert updates and inconsistent gating networks introduce destructive interference. To address these challenges, we propose HFedMoE, a heterogeneous MoE-based FL fine-tuning framework that customizes a subset of experts to each client for computation-efficient LLM fine-tuning. Specifically, HFedMoE identifies the expert importance based on its contributions to fine-tuning performance, and then adaptively selects a subset of experts from an information bottleneck perspective to align with each client's computing budget. A sparsity-aware model aggregation strategy is also designed to aggregate the actively fine-tuned experts and gating parameters with importance-weighted contributions. Extensive experiments demonstrate that HFedMoE outperforms state-of-the-art benchmarks in training accuracy and convergence speed.

paper research
Scale-Adaptive Power Flow Analysis with Local Topology Slicing and Multi-Task Graph Learning

Developing deep learning models with strong adaptability to topological variations is of great practical significance for power flow analysis. To enhance model performance under variable system scales and improve robustness in branch power prediction, this paper proposes a Scale-adaptive Multi-task Power Flow Analysis (SaMPFA) framework. SaMPFA introduces a Local Topology Slicing (LTS) sampling technique that extracts subgraphs of different scales from the complete power network to strengthen the model's cross-scale learning capability. Furthermore, a Reference-free Multi-task Graph Learning (RMGL) model is designed for robust power flow prediction. Unlike existing approaches, RMGL predicts bus voltages and branch powers instead of phase angles. This design not only avoids the risk of error amplification in branch power calculation but also guides the model to learn the physical relationships of phase angle differences. In addition, the loss function incorporates extra terms that encourage the model to capture the physical patterns of angle differences and power transmission, further improving consistency between predictions and physical laws. Simulations on the IEEE 39-bus system and a real provincial grid in China demonstrate that the proposed model achieves superior adaptability and generalization under variable system scales, with accuracy improvements of 4.47% and 36.82%, respectively.

paper research
REE-TTT: Adaptive Radar Echo Extrapolation Through Test-Time Training

Precipitation nowcasting is critically important for meteorological forecasting. Deep learning-based Radar Echo Extrapolation (REE) has become a predominant nowcasting approach, yet it suffers from poor generalization due to its reliance on high-quality local training data and static model parameters, limiting its applicability across diverse regions and extreme events. To overcome this, we propose REE-TTT, a novel model that incorporates an adaptive Test-Time Training (TTT) mechanism. The core of our model lies in the newly designed Spatio-temporal Test-Time Training (ST-TTT) block, which replaces the standard linear projections in TTT layers with task-specific attention mechanisms, enabling robust adaptation to non-stationary meteorological distributions and thereby significantly enhancing the feature representation of precipitation. Experiments under cross-regional extreme precipitation scenarios demonstrate that REE-TTT substantially outperforms state-of-the-art baseline models in prediction accuracy and generalization, exhibiting remarkable adaptability to data distribution shifts.

paper research
Digital Twin-Driven Communication-Efficient Federated Anomaly Detection for Industrial IoT

Anomaly detection is increasingly becoming crucial for maintaining the safety, reliability, and efficiency of industrial systems. Recently, with the advent of digital twins and data-driven decision-making, several statistical and machine-learning methods have been proposed. However, these methods face several challenges, such as dependence on only real sensor datasets, limited labeled data, high false alarm rates, and privacy concerns. To address these problems, we propose a suite of digital twin-integrated federated learning (DTFL) methods that enhance global model performance while preserving data privacy and communication efficiency. Specifically, we present five novel approaches: Digital Twin-Based Meta-Learning (DTML), Federated Parameter Fusion (FPF), Layer-wise Parameter Exchange (LPE), Cyclic Weight Adaptation (CWA), and Digital Twin Knowledge Distillation (DTKD). Each method introduces a unique mechanism to combine synthetic and real-world knowledge, balancing generalization with communication overhead. We conduct an extensive experiment using a publicly available cyber-physical anomaly detection dataset. For a target accuracy of 80%, CWA reaches the target in 33 rounds, FPF in 41 rounds, LPE in 48 rounds, and DTML in 87 rounds, whereas the standard FedAvg baseline and DTKD do not reach the target within 100 rounds. These results highlight substantial communication-efficiency gains (up to 62% fewer rounds than DTML and 31% fewer than LPE) and demonstrate that integrating DT knowledge into FL accelerates convergence to operationally meaningful accuracy thresholds for IIoT anomaly detection.

paper research
Sparse Threats, Focused Defense: Robust Reinforcement Learning Aware of Criticality for Safe Autonomous Driving

Reinforcement learning (RL) has shown considerable potential in autonomous driving (AD), yet its vulnerability to perturbations remains a critical barrier to real-world deployment. As a primary countermeasure, adversarial training improves policy robustness by training the AD agent in the presence of an adversary that deliberately introduces perturbations. Existing approaches typically model the interaction as a zero-sum game with continuous attacks. However, such designs overlook the inherent asymmetry between the agent and the adversary and thus fail to reflect the sparsity of safety-critical risks, rendering the achieved robustness inadequate for practical AD scenarios. To address these limitations, we introduce criticality-aware robust RL (CARRL), a novel adversarial training approach for handling sparse, safety-critical risks in autonomous driving. CARRL consists of two interacting components: a risk exposure adversary (REA) and a risk-targeted robust agent (RTRA). We model the interaction between the REA and RTRA as a general-sum game, allowing the REA to focus on exposing safety-critical failures (e.g., collisions) while the RTRA learns to balance safety with driving efficiency. The REA employs a decoupled optimization mechanism to better identify and exploit sparse safety-critical moments under a constrained budget. However, such focused attacks inevitably result in a scarcity of adversarial data. The RTRA copes with this scarcity by jointly leveraging benign and adversarial experiences via a dual replay buffer and enforces policy consistency under perturbations to stabilize behavior. Experimental results demonstrate that our approach reduces the collision rate by at least 22.66% across all cases compared to state-of-the-art baseline methods.

paper research
Dynamic Radar Network of UAVs: A Joint Navigation and Tracking Approach

Nowadays there is growing research interest in the possibility of enriching small flying robots with autonomous sensing and online navigation capabilities. This will enable a large number of applications spanning from remote surveillance to logistics, smarter cities and emergency aid in hazardous environments. In this context, an emerging problem is to track unauthorized small unmanned aerial vehicles (UAVs) hiding behind buildings or concealing in large UAV networks. In contrast with current solutions mainly based on static and on-ground radars, this paper proposes the idea of a dynamic radar network of UAVs for real-time and high-accuracy tracking of malicious targets. To this end, we describe a solution for real-time navigation of UAVs to track a dynamic target using heterogeneously sensed information. Such information is shared by the UAVs with their neighbors via multi-hops, allowing the target to be tracked by a local Bayesian estimator running at each agent. Since not all paths are equal from an information-gathering point of view, the UAVs plan their own trajectories by minimizing the posterior covariance matrix of the target state under UAV kinematic and anti-collision constraints. Our results show how a dynamic network of radars attains better localization results compared to a fixed configuration and how the on-board sensor technology impacts the accuracy in tracking a target with different radar cross sections, especially in non-line-of-sight (NLOS) situations.

paper research
Neural Turtle Graphics for Modeling City Road Layouts

We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts. Specifically, we represent the road layout using a graph where nodes in the graph represent control points and edges in the graph represent road segments. NTG is a sequential generative model parameterized by a neural network. It iteratively generates a new node and an edge connecting to an existing node conditioned on the current graph. We train NTG on Open Street Map data and show that it outperforms existing approaches using a set of diverse performance metrics. Moreover, our method allows users to control styles of generated road layouts mimicking existing cities as well as to sketch parts of the city road layout to be synthesized. In addition to synthesis, the proposed NTG finds uses in an analytical task of aerial road parsing. Experimental results show that it achieves state-of-the-art performance on the SpaceNet dataset.

paper research
A Comprehensive Study on Temporal Modeling for Online Action Detection

Online action detection (OAD) is a practical yet challenging task, which has attracted increasing attention in recent years. A typical OAD system mainly consists of three modules: a frame-level feature extractor, which is usually based on pre-trained deep Convolutional Neural Networks (CNNs), a temporal modeling module, and an action classifier. Among them, the temporal modeling module, which aggregates discriminative information from historical and current features, is crucial. Though many temporal modeling methods have been developed for OAD and other topics, their effects on OAD have not been investigated fairly. This paper aims to provide a comprehensive study on temporal modeling for OAD, including four meta types of temporal modeling methods, i.e., temporal pooling, temporal convolution, recurrent neural networks, and temporal attention, and to uncover some good practices to produce a state-of-the-art OAD system. Many of them are explored in OAD for the first time, and extensively evaluated with various hyper-parameters. Furthermore, based on our comprehensive study, we present several hybrid temporal modeling methods, which outperform the recent state-of-the-art methods with sizable margins on THUMOS-14 and TVSeries.

paper research
sql4ml: A Declarative End-to-End Workflow for Machine Learning

We present sql4ml, a system for expressing supervised machine learning (ML) models in SQL and automatically training them in TensorFlow. The primary motivation for this work stems from the observation that in many data science tasks there is a back-and-forth between a relational database that stores the data and a machine learning framework. Data preprocessing and feature engineering typically happen in a database, whereas learning is usually executed in separate ML libraries. This fragmented workflow requires users to juggle different programming paradigms and software systems. With sql4ml the user can express both feature engineering and ML algorithms in SQL, while the system translates this code to an appropriate representation for training inside a machine learning framework. We describe our translation method, present experimental results from applying it to three well-known ML algorithms, and discuss the usability benefits of concentrating the entire workflow on the database side.

paper research
VC Dimensions of Nondeterministic Finite Automata for Words of Equal Length

Let $NFA_b(q)$ denote the set of languages accepted by nondeterministic finite automata with $q$ states over an alphabet with $b$ letters. Let $B_n$ denote the set of words of length $n$. We give a quadratic lower bound on the VC dimension of $NFA_2(q) \cap B_n = \{ L \cap B_n \mid L \in NFA_2(q) \}$ as a function of $q$. Next, the work of Gruber and Holzer (2007) gives an upper bound for the nondeterministic state complexity of finite languages contained in $B_n$, which we strengthen using our methods. Finally, we give some theoretical and experimental results on the dependence on $n$ of the VC dimension and testing dimension of $NFA_2(q) \cap B_n$.

paper research
Coordinate Matrix Machine: A Human-Level Approach to Classifying Very Similar Documents

Human-level concept learning argues that humans typically learn new concepts from a single example, whereas machine learning algorithms typically require hundreds of samples to learn a single concept. Our brain subconsciously identifies important features and learns more effectively. Contribution: In this paper, we present the Coordinate Matrix Machine (CM$^2$). This purpose-built small model augments human intelligence by learning document structures and using this information to classify documents. While modern Red AI trends rely on massive pre-training and energy-intensive GPU infrastructure, CM$^2$ is designed as a Green AI solution. It achieves human-level concept learning by identifying only the structurally important features a human would consider, allowing it to classify very similar documents using only one sample per class. Advantage: Our algorithm outperforms traditional vectorizers and complex deep learning models that require larger datasets and significant compute. By focusing on structural coordinates rather than exhaustive semantic vectors, CM$^2$ offers: 1. high accuracy with minimal data (one-shot learning); 2. geometric and structural intelligence; 3. Green AI and environmental sustainability; 4. suitability for CPU-only environments; 5. inherent explainability (glass-box model); 6. faster computation and low latency; 7. robustness against unbalanced classes; 8. economic viability; and 9. a generic, expandable, and extendable design.

paper research
Attention Demands Focus: A Unified View on Attention Allocation

The Transformer architecture, a cornerstone of modern Large Language Models (LLMs), has achieved extraordinary success in sequence modeling, primarily due to its attention mechanism. However, despite its power, the standard attention mechanism is plagued by two well-documented issues: representational collapse and attention sink. Although prior work has proposed approaches for these issues, they are often studied in isolation, obscuring their deeper connection. In this paper, we present a unified perspective, arguing that both can be traced to a common root -- improper attention allocation. We identify two failure modes: 1) Attention Overload, where tokens receive comparably high weights, blurring semantic features and leading to representational collapse; 2) Attention Underload, where no token is semantically relevant, yet attention is still forced to distribute, resulting in spurious focus such as attention sink. Building on this insight, we introduce Lazy Attention, a novel mechanism designed for a more focused attention distribution. To mitigate overload, it employs positional discrimination across both heads and dimensions to sharpen token distinctions. To counteract underload, it incorporates Elastic-Softmax, a modified normalization function that relaxes the standard softmax constraint to suppress attention on irrelevant tokens. Experiments on the FineWeb-Edu corpus, evaluated across nine diverse benchmarks, demonstrate that Lazy Attention successfully mitigates attention sink and achieves competitive performance compared to both standard attention and modern architectures, while reaching up to 59.58% attention sparsity.
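
The abstract does not specify Elastic-Softmax, so the sketch below illustrates only the general idea of relaxing the softmax constraint: adding a constant to the denominator (sometimes called softmax-off-by-one) lets the attention weights sum to less than one when every score is low, so a head can effectively abstain. This is an assumption for illustration, not necessarily the paper's formulation.

```python
# Hedged illustration of relaxing the softmax constraint so attention can
# abstain: adding a constant to the denominator ("softmax-off-by-one") lets the
# weights sum to less than 1 when every score is low. This is an illustrative
# assumption, not necessarily the exact Elastic-Softmax used in the paper.
import numpy as np

def relaxed_softmax(scores, slack=1.0):
    e = np.exp(scores - scores.max())                 # numerically stable exponentials
    scale = np.exp(-scores.max())                     # keeps the +slack term consistent
    return e / (slack * scale + e.sum())

relevant = np.array([4.0, 0.1, 0.2])                  # one clearly relevant token
irrelevant = np.array([-3.0, -3.2, -2.9])             # no relevant token
for s in (relevant, irrelevant):
    w = relaxed_softmax(s)
    print(np.round(w, 3), "sum =", round(float(w.sum()), 3))
```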

paper research
A Generalized UCB Bandit Algorithm for ML-Based Estimators

We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary machine learning models into multi-armed bandit frameworks. A fundamental challenge in deploying sophisticated ML models for sequential decision-making is the lack of tractable concentration inequalities required for principled exploration. We overcome this limitation by directly modeling the learning curve behavior of the underlying estimator. Specifically, assuming the Mean Squared Error decreases as a power law in the number of training samples, we derive a generalized concentration inequality and prove that ML-UCB achieves sublinear regret. This framework enables the principled integration of any ML model whose learning curve can be empirically characterized, eliminating the need for model-specific theoretical analysis. We validate our approach through experiments on a collaborative filtering recommendation system using online matrix factorization with synthetic data designed to simulate a simplified two-tower model, demonstrating substantial improvements over LinUCB.
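
A sketch of the core idea: if an arm's reward estimator has MSE decaying like $C n^{-\beta}$ in the number of samples $n$, use a confidence radius proportional to $\sqrt{C}\, n^{-\beta/2}$ as the exploration bonus. The environment, the running-mean estimator standing in for an ML model, and the constants below are illustrative, not the paper's algorithm or regret analysis.

```python
# Sketch of the ML-UCB idea: if an arm's estimator has MSE ~ C * n^(-beta) in
# the number of samples n, use a confidence radius ~ sqrt(C) * n^(-beta/2) as
# the exploration bonus. Constants, learning-curve parameters, and the
# environment are illustrative, not the paper's algorithm or regret analysis.
import numpy as np

rng = np.random.default_rng(7)
true_means = np.array([0.3, 0.5, 0.7])
C, beta = 1.0, 1.0                                    # assumed learning-curve parameters

n_arms = len(true_means)
pulls = np.zeros(n_arms)
est = np.zeros(n_arms)                                # stand-in "ML estimator": running mean
rewards = 0.0

for t in range(10_000):
    radius = np.sqrt(C) * np.maximum(pulls, 1.0) ** (-beta / 2.0)   # power-law bonus
    arm = int(np.argmax(np.where(pulls == 0, np.inf, est + radius)))  # pull untried arms first
    r = rng.normal(true_means[arm], 0.5)
    pulls[arm] += 1
    est[arm] += (r - est[arm]) / pulls[arm]           # incremental estimator update
    rewards += r

print("pull counts:", pulls.astype(int))
print("regret vs always playing the best arm:", round(10_000 * true_means.max() - rewards, 1))
```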

paper research
A Graph-based Framework for Online Time Series Anomaly Detection Using Model Ensemble

With the increasing volume of streaming data in industrial systems, online anomaly detection has become a critical task. The diverse and rapidly evolving data patterns pose significant challenges for online anomaly detection. Many existing anomaly detection methods are designed for offline settings or have difficulty in handling heterogeneous streaming data effectively. This paper proposes GDME, an unsupervised graph-based framework for online time series anomaly detection using model ensemble. GDME maintains a dynamic model pool that is continuously updated by pruning underperforming models and introducing new ones. It utilizes a dynamic graph structure to represent relationships among models and employs community detection on the graph to select an appropriate subset for ensemble. The graph structure is also used to detect concept drift by monitoring structural changes, allowing the framework to adapt to evolving streaming data. Experiments on seven heterogeneous time series demonstrate that GDME outperforms existing online anomaly detection methods, achieving improvements of up to 24%. In addition, its ensemble strategy provides superior detection performance compared with both individual models and average ensembles, with competitive computational efficiency.

paper research
Accelerating Storage-Based Training for Graph Neural Networks

Graph neural networks (GNNs) have achieved breakthroughs in various real-world downstream tasks due to their powerful expressiveness. As the scale of real-world graphs has been continuously growing, a storage-based approach to GNN training has been studied, which leverages external storage (e.g., NVMe SSDs) to handle such web-scale graphs on a single machine. Although such storage-based GNN training methods have shown promising potential in large-scale GNN training, we observed that they suffer from a severe bottleneck in data preparation since they overlook a critical challenge: how to handle a large number of small storage I/Os. To address the challenge, in this paper, we propose a novel storage-based GNN training framework, named AGNES, that employs a method of block-wise storage I/O processing to fully utilize the I/O bandwidth of high-performance storage devices. Moreover, to further enhance the efficiency of each storage I/O, AGNES employs a simple yet effective strategy, hyperbatch-based processing, based on the characteristics of real-world graphs. Comprehensive experiments on five real-world graphs reveal that AGNES consistently outperforms four state-of-the-art methods, running up to 4.1X faster than the best competitor. Our code is available at https://github.com/Bigdasgit/agnes-kdd26.

paper research
Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives

Deep reinforcement learning (DRL) has shown great promise in addressing multi-objective combinatorial optimization problems (MOCOPs). Nevertheless, the robustness of these learning-based solvers has remained insufficiently explored, especially across diverse and complex problem distributions. In this paper, we propose a unified robustness-oriented framework for preference-conditioned DRL solvers for MOCOPs. Within this framework, we develop a preference-based adversarial attack to generate hard instances that expose solver weaknesses, and quantify the attack impact by the resulting degradation on Pareto-front quality. We further introduce a defense strategy that integrates hardness-aware preference selection into adversarial training to reduce overfitting to restricted preference regions and improve out-of-distribution performance. The experimental results on multi-objective traveling salesman problem (MOTSP), multi-objective capacitated vehicle routing problem (MOCVRP), and multi-objective knapsack problem (MOKP) verify that our attack method successfully learns hard instances for different solvers. Furthermore, our defense method significantly strengthens the robustness and generalizability of neural solvers, delivering superior performance on hard or out-of-distribution instances.

paper research
AutoFed: Manual-Free Federated Traffic Prediction via Personalized Prompt

Accurate traffic prediction is essential for Intelligent Transportation Systems, including ride-hailing, urban road planning, and vehicle fleet management. However, due to significant privacy concerns surrounding traffic data, most existing methods rely on local training, resulting in data silos and limited knowledge sharing. Federated Learning (FL) offers an efficient solution through privacy-preserving collaborative training; however, standard FL struggles with the non-independent and identically distributed (non-IID) problem among clients. This challenge has led to the emergence of Personalized Federated Learning (PFL) as a promising paradigm. Nevertheless, current PFL frameworks require further adaptation for traffic prediction tasks, such as specialized graph feature engineering, data processing, and network architecture design. A notable limitation of many prior studies is their reliance on hyper-parameter optimization across datasets, information that is often unavailable in real-world scenarios, thus impeding practical deployment. To address this challenge, we propose AutoFed, a novel PFL framework for traffic prediction that eliminates the need for manual hyper-parameter tuning. Inspired by prompt learning, AutoFed introduces a federated representor that employs a client-aligned adapter to distill local data into a compact, globally shared prompt matrix. This prompt then conditions a personalized predictor, allowing each client to benefit from cross-client knowledge while maintaining local specificity. Extensive experiments on real-world datasets demonstrate that AutoFed consistently achieves superior performance across diverse scenarios. The code of this paper is provided at https://github.com/RS2002/AutoFed.

paper research
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way responses that lack emotional engagement. We identify two key challenges toward truly interactive avatars: generating motion in real time under causal constraints, and learning expressive, vibrant reactions without additional labeled data. To address these challenges, we propose Avatar Forcing, a new framework for interactive head avatar generation that models real-time user-avatar interactions through diffusion forcing. This design allows the avatar to process real-time multimodal inputs, including the user's audio and motion, with low latency for instant reactions to both verbal and non-verbal cues such as speech, nods, and laughter. Furthermore, we introduce a direct preference optimization method that leverages synthetic losing samples constructed by dropping user conditions, enabling label-free learning of expressive interaction. Experimental results demonstrate that our framework enables real-time interaction with low latency (approximately 500ms), achieving a 6.8X speedup compared to the baseline, and produces reactive and expressive avatar motion, which is preferred over the baseline in more than 80% of comparisons.

paper research
BandiK  Efficient Multi-Task Decomposition Using a Multi-Bandit Framework

BandiK Efficient Multi-Task Decomposition Using a Multi-Bandit Framework

The challenge of effectively transferring knowledge across multiple tasks is of critical importance and is also present in downstream tasks with foundation models. However, the nature of transfer, in particular whether it is transitive or intransitive, is still an open problem, and negative transfer remains a significant obstacle. Selection of beneficial auxiliary task sets in multi-task learning is frequently hindered by the high computational cost of their evaluation, the high number of plausible candidate auxiliary sets, and the varying complexity of selection across target tasks. To address these constraints, we introduce BandiK, a novel three-stage multi-task auxiliary task subset selection method using multi-bandits, where each arm pull evaluates candidate auxiliary sets by training and testing a multiple-output neural network on a single random train-test dataset split. First, BandiK estimates the pairwise transfers between tasks, which helps in identifying which tasks are likely to benefit from joint learning. Second, it constructs a linear number of candidate sets of auxiliary tasks (in the number of all tasks) for each target task based on the initial estimations, significantly reducing the exponential number of potential auxiliary task sets. Third, it employs a Multi-Armed Bandit (MAB) framework for each task, where the arms correspond to the performance of candidate auxiliary sets realized as multiple-output neural networks over train-test dataset splits. To enhance efficiency, BandiK integrates these individual task-specific MABs into a multi-bandit structure. The proposed multi-bandit solution exploits the fact that the same neural network realizes multiple arms of different individual bandits corresponding to a given candidate set. This semi-overlapping arm property defines a novel multi-bandit cost/reward structure utilized in BandiK.

paper research
Bayesian Geometry in Large Language Models

Bayesian Geometry in Large Language Models

Recent work has shown that small transformers trained in controlled wind-tunnel settings can implement exact Bayesian inference, and that their training dynamics produce a geometric substrate (low-dimensional value manifolds and progressively orthogonal keys) that encodes posterior structure. We investigate whether this geometric signature persists in production-grade language models. Across Pythia, Phi-2, Llama-3, and Mistral families, we find that last-layer value representations organize along a single dominant axis whose position strongly correlates with predictive entropy, and that domain-restricted prompts collapse this structure into the same low-dimensional manifolds observed in synthetic settings. To probe the role of this geometry, we perform targeted interventions on the entropy-aligned axis of Pythia-410M during in-context learning. Removing or perturbing this axis selectively disrupts the local uncertainty geometry, whereas matched random-axis interventions leave it intact. However, these single-layer manipulations do not produce proportionally specific degradation in Bayesian-like behavior, indicating that the geometry is a privileged readout of uncertainty rather than a singular computational bottleneck. Taken together, our results show that modern language models preserve the geometric substrate that enables Bayesian inference in wind tunnels, and organize their approximate Bayesian updates along this substrate.

paper research
Benchmarking the Computational and Representational Efficiency of State Space Models against Transformers on Long-Context Dyadic Sessions

State Space Models (SSMs) have emerged as a promising alternative to Transformers for long-context sequence modeling, offering linear $O(N)$ computational complexity compared to the Transformer's quadratic $O(N^2)$ scaling. This paper presents a comprehensive benchmarking study comparing the Mamba SSM against the LLaMA Transformer on long-context sequences, using dyadic therapy sessions as a representative test case. We evaluate both architectures across two dimensions: (1) computational efficiency, where we measure memory usage and inference speed from 512 to 8,192 tokens, and (2) representational efficiency, where we analyze hidden state dynamics and attention patterns. Our findings provide actionable insights for practitioners working with long-context applications, establishing precise conditions under which SSMs offer advantages over Transformers.

paper research
Beyond Invariance  Le Cam's Path to Robust Transfer Learning

Beyond Invariance Le Cam's Path to Robust Transfer Learning

Distribution shift is the defining challenge of real-world machine learning. The dominant paradigm, Unsupervised Domain Adaptation (UDA), enforces feature invariance, aligning source and target representations via symmetric divergence minimization [Ganin et al., 2016]. We demonstrate that this approach is fundamentally flawed when domains are unequally informative (e.g., high-quality vs. degraded sensors): strict invariance necessitates information destruction, causing negative transfer that can be catastrophic in safety-critical applications [Wang et al., 2019]. We propose a decision-theoretic framework grounded in Le Cam's theory of statistical experiments [Le Cam, 1986], using constructive approximations to replace symmetric invariance with directional simulability. We introduce Le Cam Distortion, quantified by the Deficiency Distance $\delta(E_1, E_2)$, as a rigorous upper bound for transfer risk conditional on simulability. Our framework enables transfer without source degradation by learning a kernel that simulates the target from the source. Across five experiments (genomics, vision, reinforcement learning), Le Cam Distortion achieves (1) near-perfect frequency estimation in HLA genomics (correlation $r=0.999$, matching classical methods), (2) zero source utility loss in CIFAR-10 image classification (81.2% accuracy preserved vs. 34.7% drop for CycleGAN), and (3) safe policy transfer in RL control where invariance-based methods suffer catastrophic collapse. Le Cam Distortion provides the first principled framework for risk-controlled transfer learning in domains where negative transfer is unacceptable: medical imaging, autonomous systems, and precision medicine.

paper research
Beyond Solo Giants  The Power of Multi-Model Teams

Beyond Solo Giants The Power of Multi-Model Teams

Recent advances in large language models (LLMs) have been largely driven by scaling laws for individual models, which predict performance improvements as model parameters and data volume increase. However, the capabilities of any single LLM are inherently bounded. One solution originates from intricate interactions among multiple LLMs, allowing their collective performance to surpass that of any constituent model. Despite the rapid proliferation of multi-model integration techniques such as model routing and post-hoc ensembling, a unifying theoretical framework of performance scaling for multi-model collaboration remains absent. In this work, we propose the Law of Multi-model Collaboration, a scaling law that predicts the performance limits of LLM ensembles based on their aggregated parameter budget. To quantify the intrinsic upper bound of multi-model collaboration, we adopt a method-agnostic formulation and assume an idealized integration oracle where the total cross-entropy loss of each sample is determined by the minimum loss of any model in the model pool. Experimental results reveal that multi-model systems follow a power-law scaling with respect to the total parameter count, exhibiting a more significant improvement trend and a lower theoretical loss floor compared to single-model scaling. Moreover, ensembles of heterogeneous model families achieve better performance scaling than those formed within a single model family, indicating that model diversity is a primary driver of collaboration gains. These findings suggest that model collaboration represents a critical axis for extending the intelligence frontier of LLMs.
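To make the idealized integration oracle concrete, here is a minimal sketch of how its loss would be computed from per-sample cross-entropy losses; the numbers below are synthetic and purely illustrative, not taken from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-sample cross-entropy losses for a hypothetical pool of 4 models
# evaluated on 10,000 samples (illustration only).
losses = rng.gamma(shape=2.0, scale=1.0, size=(4, 10_000))

per_model_loss = losses.mean(axis=1)      # what each model achieves on its own
oracle_loss = losses.min(axis=0).mean()   # integration oracle: best model per sample

print("best single model:", per_model_loss.min())
print("oracle ensemble  :", oracle_loss)  # by construction never worse than the best single model
```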

paper research
Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models

Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models

Categorical data are prevalent in domains such as healthcare, marketing, and bioinformatics, where clustering serves as a fundamental tool for pattern discovery. A core challenge in categorical data clustering lies in measuring similarity among attribute values that lack inherent ordering or distance. Without appropriate similarity measures, values are often treated as equidistant, creating a semantic gap that obscures latent structures and degrades clustering quality. Although existing methods infer value relationships from within-dataset co-occurrence patterns, such inference becomes unreliable when samples are limited, leaving the semantic context of the data underexplored. To bridge this gap, we present ARISE (Attention-weighted Representation with Integrated Semantic Embeddings), which draws on external semantic knowledge from Large Language Models (LLMs) to construct semantic-aware representations that complement the metric space of categorical data for accurate clustering. That is, the LLM is adopted to describe attribute values for representation enhancement, and the LLM-enhanced embeddings are combined with the original data to explore semantically prominent clusters. Experiments on eight benchmark datasets demonstrate consistent improvements over seven representative counterparts, with gains of 19-27%. Code is available at https://github.com/develop-yang/ARISE

paper research
CoLog  Multimodal Anomaly Detection in OS Logs

CoLog Multimodal Anomaly Detection in OS Logs

Log anomaly detection is crucial for preserving the security of operating systems. Depending on the source of log data collection, various kinds of information are recorded in logs, and these can be considered log modalities. Unimodal methods often struggle because they ignore these different modalities of log data, while existing multimodal methods fail to handle the interactions between the modalities. Applying multimodal sentiment analysis to log anomaly detection, we propose CoLog, a framework that collaboratively encodes logs utilizing various modalities. CoLog utilizes collaborative transformers and multi-head impressed attention to learn interactions among several modalities, ensuring comprehensive anomaly detection. To handle the heterogeneity caused by these interactions, CoLog incorporates a modality adaptation layer, which adapts the representations from different log modalities. This methodology enables CoLog to learn nuanced patterns and dependencies within the data, enhancing its anomaly detection capabilities. Extensive experiments demonstrate CoLog's superiority over existing state-of-the-art methods. Furthermore, in detecting both point and collective anomalies, CoLog achieves a mean precision of 99.63%, a mean recall of 99.59%, and a mean F1 score of 99.61% across seven benchmark datasets for log-based anomaly detection. The comprehensive detection capabilities of CoLog make it highly suitable for cybersecurity, system monitoring, and operational efficiency. CoLog represents a significant advancement in log anomaly detection, providing a sophisticated and effective solution to point and collective anomaly detection through a unified framework, and a solution to the complex challenges that automatic log data analysis poses. We also provide the implementation of CoLog at https://github.com/NasirzadehMoh/CoLog.

paper research
Combatting Reward Hacking through Information-Theoretic Bias Reduction

Combatting Reward Hacking through Information-Theoretic Bias Reduction

Reward models (RMs) are essential in reinforcement learning from human feedback (RLHF) to align large language models (LLMs) with human values. However, RM training data is commonly recognized as low-quality, containing inductive biases that can easily lead to overfitting and reward hacking. For example, more detailed and comprehensive responses are usually human-preferred but contain more words, so response length becomes one of the inevitable inductive biases. A limited number of prior RM debiasing approaches either target a single specific type of bias or model the problem with only simple linear correlations, e.g., Pearson coefficients. To mitigate more complex and diverse inductive biases in reward modeling, we introduce a novel information-theoretic debiasing method called Debiasing via Information optimization for RM (DIR). Inspired by the information bottleneck (IB), we maximize the mutual information (MI) between RM scores and human preference pairs, while minimizing the MI between RM outputs and biased attributes of preference inputs. With theoretical justification from information theory, DIR can handle more sophisticated types of biases with non-linear correlations, broadly extending the real-world application scenarios for RM debiasing methods. In experiments, we verify the effectiveness of DIR with three types of inductive biases: response length, sycophancy, and format. We discover that DIR not only effectively mitigates target inductive biases but also enhances RLHF performance across diverse benchmarks, yielding better generalization abilities. The code and training recipes are available at https://github.com/Qwen-Applications/DIR.

paper research
Complexity-based code embeddings

Complexity-based code embeddings

This paper presents a generic method for transforming the source code of various algorithms into numerical embeddings, by dynamically analysing the behaviour of computer programs against different inputs and by tailoring multiple generic complexity functions to the analysed metrics. The resulting algorithm embeddings are based on r-Complexity. Using the proposed code embeddings, we present an XGBoost classifier, evaluated by average F1-score, on a multi-label dataset with 11 classes built from real-world code snippets submitted to programming competitions on the Codeforces platform.

paper research
Data Complexity-aware Deep Model Performance Forecasting

Data Complexity-aware Deep Model Performance Forecasting

Deep learning models are widely used across computer vision and other domains. When inducing a model, selecting the right architecture for a given dataset often relies on repetitive trial-and-error procedures. This procedure is time-consuming, resource-intensive, and difficult to automate. While previous work has explored performance prediction using partial training or complex simulations, these methods often require significant computational overhead or lack generalizability. In this work, we propose an alternative approach: a lightweight, two-stage framework that can estimate model performance before training, given an understanding of the dataset and the deep model structure of interest. The first stage predicts a baseline based on the analysis of measurable properties of the dataset, while the second stage adjusts the estimate with additional information on the model's architectural and hyperparameter details. This setup allows the framework to generalize across datasets and model types. Moreover, we find that some of the underlying features used for prediction, such as dataset variance, can offer practical guidance for model selection and can serve as early indicators of data quality. As a result, the framework can be used not only to forecast model performance, but also to guide architecture choices, inform necessary preprocessing procedures, and detect potentially problematic datasets before training begins.

paper research
Data-Driven Assessment of Concrete Mixture Compositions on Chloride Transport via Standalone Machine Learning Algorithms

Data-Driven Assessment of Concrete Mixture Compositions on Chloride Transport via Standalone Machine Learning Algorithms

This paper employs a data-driven approach to determine the impact of concrete mixture compositions on the temporal evolution of chloride in concrete structures. This is critical for assessing the service life of civil infrastructure subjected to aggressive environments. The adopted methodology relies on several simple and complex standalone machine learning (ML) algorithms, with the primary objective of establishing confidence in the unbiased prediction of the underlying hidden correlations. The simple algorithms include linear regression (LR), k-nearest neighbors (KNN) regression, and kernel ridge regression (KRR). The complex algorithms entail support vector regression (SVR), Gaussian process regression (GPR), and two families of artificial neural networks, including a feedforward network (multilayer perceptron, MLP) and a gated recurrent unit (GRU). The MLP architecture cannot explicitly handle sequential data, a limitation addressed by the GRU. A comprehensive dataset is considered. The performance of ML algorithms is evaluated, with KRR, GPR, and MLP exhibiting high accuracy. Given the diversity of the adopted concrete mixture proportions, the GRU was unable to accurately reproduce the response in the test set. Further analyses elucidate the contributions of mixture compositions to the temporal evolution of chloride. The results obtained from the GPR model unravel latent correlations through clear and explainable trends. The MLP, SVR, and KRR also provide acceptable estimates of the overall trends. The majority of mixture components exhibit an inverse relation with chloride content, while a few components demonstrate a direct correlation. These findings highlight the potential of surrogate approaches for describing the physical processes involved in chloride ingress and the associated correlations, toward the ultimate goal of enhancing the service life of civil infrastructure.

paper research
DatBench  Discriminative, Faithful, and Efficient VLM Evaluations

DatBench Discriminative, Faithful, and Efficient VLM Evaluations

Empirical evaluation serves as the primary compass guiding research progress in foundation models. Despite a large body of work focused on training frontier vision-language models (VLMs), approaches to their evaluation remain nascent. To guide their maturation, we propose three desiderata that evaluations should satisfy: (1) faithfulness to the modality and application, (2) discriminability between models of varying quality, and (3) efficiency in compute. Through this lens, we identify critical failure modes that violate faithfulness and discriminability, misrepresenting model capabilities: (i) multiple-choice formats reward guessing, poorly reflect downstream use cases, and saturate early as models improve; (ii) blindly solvable questions, which can be answered without images, constitute up to 70% of some evaluations; and (iii) mislabeled or ambiguous samples compromise up to 42% of examples in certain datasets. Regarding efficiency, the computational burden of evaluating frontier models has become prohibitive: by some accounts, nearly 20% of development compute is devoted to evaluation alone. Rather than discarding existing benchmarks, we curate them via transformation and filtering to maximize fidelity and discriminability. We find that converting multiple-choice questions to generative tasks reveals sharp capability drops of up to 35%. In addition, filtering blindly solvable and mislabeled samples improves discriminative power while simultaneously reducing computational cost. We release DatBench-Full, a cleaned evaluation suite of 33 datasets spanning nine VLM capabilities, and DatBench, a discriminative subset that achieves 13x average speedup (up to 50x) while closely matching the discriminative power of the original datasets. Our work outlines a path toward evaluation practices that are both rigorous and sustainable as VLMs continue to scale.

paper research
Deep Delta Learning

The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector $\mathbf{k}(\mathbf{X})$ and a gating scalar $\beta(\mathbf{X})$. We provide a spectral analysis of this operator, demonstrating that the gate $\beta(\mathbf{X})$ enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
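As a rough illustration of the kind of update the abstract describes, the sketch below implements one plausible Householder-style parameterization: erase the component of the input along a learned direction and write new content along that same direction, with a single gate acting as the step size. The exact layer in the paper may be parameterized differently; all module and variable names here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeltaResidual(nn.Module):
    """One plausible rank-1, gated modification of the identity shortcut:
    y = x - beta(x) * k(x) (k(x)^T x) + beta(x) * k(x) v(x)."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_k = nn.Linear(dim, dim)     # reflection direction
        self.to_v = nn.Linear(dim, 1)       # scalar content written along k
        self.to_beta = nn.Linear(dim, 1)    # gate / dynamic step size

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (batch, dim)
        k = F.normalize(self.to_k(x), dim=-1)               # unit direction
        beta = 2.0 * torch.sigmoid(self.to_beta(x))         # in (0, 2): identity -> projection -> reflection
        old = (x * k).sum(dim=-1, keepdim=True)             # component of x along k
        new = self.to_v(x)                                   # new feature magnitude
        return x + beta * k * (new - old)                    # erase old, write new, one rank-1 step

x = torch.randn(8, 64)
print(DeltaResidual(64)(x).shape)
```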

paper research
Deep Networks Learn Deep Hierarchical Models

We consider supervised learning with $n$ labels and show that layerwise SGD on residual networks can efficiently learn a class of hierarchical models. This model class assumes the existence of an (unknown) label hierarchy $L_1 \subseteq L_2 \subseteq \dots \subseteq L_r = [n]$, where labels in $L_1$ are simple functions of the input, while for $i > 1$, labels in $L_i$ are simple functions of simpler labels. Our class surpasses models that were previously shown to be learnable by deep learning algorithms, in the sense that it reaches the depth limit of efficient learnability. That is, there are models in this class that require polynomial depth to express, whereas previous models can be computed by log-depth circuits. Furthermore, we suggest that learnability of such hierarchical models might eventually form a basis for understanding deep learning. Beyond their natural fit for domains where deep learning excels, we argue that the mere existence of human "teachers" supports the hypothesis that hierarchical structures are inherently available. By providing granular labels, teachers effectively reveal "hints" or "snippets" of the internal algorithms used by the brain. We formalize this intuition, showing that in a simplified model where a teacher is partially aware of their internal logic, a hierarchical structure emerges that facilitates efficient learnability.

paper research
DéjàQ  Open-Ended Evolution of Diverse, Learnable and Verifiable Problems

DéjàQ Open-Ended Evolution of Diverse, Learnable and Verifiable Problems

Recent advances in reasoning models have yielded impressive results in mathematics and coding. However, most approaches rely on static datasets, which have been suggested to encourage memorisation and limit generalisation. We introduce DéjàQ, a framework that departs from this paradigm by jointly evolving a diverse set of synthetic mathematical problems alongside model training. This evolutionary process adapts to the model's ability throughout training, optimising problems for learnability. We propose two LLM-driven mutation strategies in which the model itself mutates the training data, either by altering contextual details or by directly modifying problem structure. We find that the model can generate novel and meaningful problems, and that these LLM-driven mutations improve RL training. We analyse key aspects of DéjàQ, including the validity of generated problems and computational overhead. Our results underscore the potential of dynamically evolving training data to enhance mathematical reasoning and indicate broader applicability, which we will support by open-sourcing our code.

paper research
Dynamic Large Concept Models  Latent Reasoning in an Adaptive Semantic Space

Dynamic Large Concept Models Latent Reasoning in an Adaptive Semantic Space

Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictable spans while under-allocating computation to semantically critical transitions. We propose Dynamic Large Concept Models (DLCM), a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient. DLCM discovers variable-length concepts end-to-end without relying on predefined linguistic units. Hierarchical compression fundamentally changes scaling behavior. We introduce the first compression-aware scaling law, which disentangles token-level capacity, concept-level reasoning capacity, and compression ratio, enabling principled compute allocation under fixed FLOPs. To stably train this heterogeneous architecture, we further develop a decoupled $\mu$P parametrization that supports zero-shot hyperparameter transfer across widths and compression regimes. At a practical setting ($R=4$, corresponding to an average of four tokens per concept), DLCM reallocates roughly one-third of inference compute into a higher-capacity reasoning backbone, achieving a +2.69% average improvement across 12 zero-shot benchmarks under matched inference FLOPs.

paper research
E-GRPO  High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

E-GRPO High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

Recent reinforcement learning has enhanced flow matching models for human preference alignment. While stochastic sampling enables the exploration of denoising directions, existing methods, which optimize over multiple denoising steps, suffer from sparse and ambiguous reward signals. We observe that high entropy steps enable more efficient and effective exploration, while low entropy steps result in undistinguished roll-outs. To this end, we propose E-GRPO, an entropy-aware Group Relative Policy Optimization that increases the entropy of SDE sampling steps. Since integrating stochastic differential equations over multiple steps suffers from ambiguous reward signals due to the accumulated stochasticity, we merge consecutive low entropy steps into one high entropy step for SDE sampling, while applying ODE sampling at the other steps. Building upon this, we introduce a multi-step group normalized advantage, which computes group-relative advantages within samples sharing the same consolidated SDE denoising step. Experimental results under different reward settings demonstrate the effectiveness of our method.

paper research
Edge AI Surge  Hardware-Powered Real-Time Dynamics Prediction

Edge AI Surge Hardware-Powered Real-Time Dynamics Prediction

Physical AI at the edge, enabling autonomous systems to understand and predict real-world dynamics in real time, requires hardware-efficient learning and inference. Model recovery (MR), which identifies governing equations from sensor data, is a key primitive for safe and explainable monitoring in mission-critical autonomous systems operating under strict latency, compute, and power constraints. However, state-of-the-art MR methods (e.g., EMILY and PINN+SR) rely on Neural ODE formulations that require iterative solvers and are difficult to accelerate efficiently on edge hardware. We present MERINDA (Model Recovery in Reconfigurable Dynamic Architecture), an FPGA-accelerated MR framework designed to make physical AI practical on resource-constrained devices. MERINDA replaces expensive Neural ODE components with a hardware-friendly formulation that combines (i) GRU-based discretized dynamics, (ii) dense inverse-ODE layers, (iii) sparsity-driven dropout, and (iv) lightweight ODE solvers. The resulting computation is structured for streaming parallelism, enabling critical kernels to be fully parallelized on the FPGA. Across four benchmark nonlinear dynamical systems, MERINDA delivers substantial gains over GPU implementations: 114$\times$ lower energy (434 J vs. 49,375 J), 28$\times$ smaller memory footprint (214 MB vs. 6,118 MB), and 1.68$\times$ faster training, while matching state-of-the-art model-recovery accuracy. These results demonstrate that MERINDA can bring accurate, explainable MR to the edge for real-time monitoring of autonomous systems.

paper research
Efficient Deployment of OpenPangu Models with Post-Training Quantization

Efficient Deployment of OpenPangu Models with Post-Training Quantization

Huawei's openPangu-Embedded-1B and openPangu-Embedded-7B are variants of the openPangu large language model, designed for efficient deployment on Ascend NPUs. The 7B variant supports three distinct Chain-of-Thought (CoT) reasoning paradigms, namely slow_think, auto_think, and no_think, while the 1B variant operates exclusively in the no_think mode, which employs condensed reasoning for higher efficiency. Although CoT reasoning enhances capability, the generation of extended reasoning traces introduces substantial memory and latency overheads, posing challenges for practical deployment on Ascend NPUs. This paper addresses these computational constraints by leveraging low-bit quantization, which transforms FP16 computations into more efficient integer arithmetic. We introduce a unified low-bit inference framework, supporting INT8 (W8A8) and W4A8 quantization, specifically optimized for openPangu-Embedded models on the Atlas A2. Our comprehensive evaluation on code generation benchmarks (HumanEval and MBPP) demonstrates the efficacy of this approach. INT8 quantization consistently preserves over 90% of the FP16 baseline accuracy and achieves a 1.5x prefill speedup on the Atlas A2. Furthermore, W4A8 quantization significantly reduces memory consumption, albeit with a moderate trade-off in accuracy. These findings collectively indicate that low-bit quantization effectively facilitates efficient CoT reasoning on Ascend NPUs, maintaining high model fidelity.
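For readers unfamiliar with the W8A8 idea, the sketch below shows per-channel symmetric INT8 quantization and dequantization of a weight matrix. It illustrates the general mechanism only, not the openPangu/Ascend toolchain; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Per-output-channel symmetric INT8 quantization: map each row of the
    FP weight matrix to int8 with its own scale (the 'W8' half of W8A8)."""
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print("max abs reconstruction error:", np.abs(w - dequantize(q, s)).max())
```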

paper research
Entropy-Adaptive Fine-Tuning  Resolving Confident Conflicts to Mitigate Forgetting

Entropy-Adaptive Fine-Tuning Resolving Confident Conflicts to Mitigate Forgetting

Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning (RL) effectively preserves general capabilities. We investigate this discrepancy and identify a fundamental distributional gap: while RL aligns with the model's internal belief, SFT forces the model to fit external supervision. This mismatch often manifests as Confident Conflicts: tokens characterized by low probability but low entropy. In these instances, the model is highly confident in its own prediction but is forced to learn a divergent ground truth, triggering destructive gradient updates. To address this, we propose Entropy-Adaptive Fine-Tuning (EAFT). Unlike methods relying solely on prediction probability, EAFT utilizes token-level entropy as a gating mechanism to distinguish between epistemic uncertainty and knowledge conflict. This allows the model to learn from uncertain samples while suppressing gradients on conflicting data. Extensive experiments on Qwen and GLM series (ranging from 4B to 32B parameters) across mathematical, medical, and agentic domains confirm our hypothesis. EAFT consistently matches the downstream performance of standard SFT while significantly mitigating the degradation of general capabilities.
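As a concrete (but illustrative) reading of the gating idea, the sketch below computes token-level entropy from the logits and zeroes out the loss on tokens that are both low-entropy and low-probability under the ground truth, i.e. the "confident conflict" case. The actual EAFT gate and its thresholds are not specified here; this hard gate is an assumption made for illustration.

```python
import torch
import torch.nn.functional as F

def entropy_gated_sft_loss(logits, targets, entropy_threshold=1.0, prob_threshold=0.1):
    """Assumed hard-gated variant: drop the loss on "confident conflict" tokens,
    i.e. tokens where predictive entropy is low but the ground-truth probability
    is also low. logits: (batch, seq, vocab), targets: (batch, seq)."""
    logp = F.log_softmax(logits, dim=-1)
    probs = logp.exp()
    entropy = -(probs * logp).sum(dim=-1)                            # (batch, seq)
    tgt_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)    # log p(ground truth)
    conflict = (entropy < entropy_threshold) & (tgt_logp.exp() < prob_threshold)
    gate = torch.where(conflict, torch.zeros_like(entropy), torch.ones_like(entropy))
    return -(gate * tgt_logp).sum() / gate.sum().clamp(min=1.0)

logits = torch.randn(2, 16, 100)
targets = torch.randint(0, 100, (2, 16))
print(entropy_gated_sft_loss(logits, targets).item())
```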

paper research
Escaping the Homogeneity Trap in DSM Deep Networks

Escaping the Homogeneity Trap in DSM Deep Networks

Doubly-stochastic matrices (DSM) are increasingly utilized in structure-preserving deep architectures, such as Optimal Transport layers and Sinkhorn-based attention, to enforce numerical stability and probabilistic interpretability. In this work, we identify a critical spectral degradation phenomenon inherent to these constraints, termed the Homogeneity Trap. We demonstrate that the maximum-entropy bias, typical of Sinkhorn-based projections, drives the mixing operator towards the uniform barycenter, thereby suppressing the subdominant singular value $\sigma_2$ and filtering out high-frequency feature components. We derive a spectral bound linking $\sigma_2$ to the network's effective depth, showing that high-entropy constraints restrict feature transformation to a shallow effective receptive field. Furthermore, we formally demonstrate that Layer Normalization fails to mitigate this collapse in noise-dominated regimes; specifically, when spectral filtering degrades the Signal-to-Noise Ratio (SNR) below a critical threshold, geometric structure is irreversibly lost to noise-induced orthogonal collapse. Our findings highlight a fundamental trade-off between entropic stability and spectral expressivity in DSM-constrained networks.
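To see the suppression of $\sigma_2$ numerically, here is a small, self-contained experiment (not from the paper): project random score matrices onto the doubly-stochastic set with Sinkhorn normalization at several entropy levels and inspect the top singular values.

```python
import numpy as np

def sinkhorn(logits, n_iters=200):
    """Project a score matrix onto (approximately) the doubly-stochastic set by
    alternating row and column normalization of exp(logits)."""
    P = np.exp(logits - logits.max())
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)   # rows sum to 1
        P /= P.sum(axis=0, keepdims=True)   # columns sum to 1
    return P

rng = np.random.default_rng(0)
scores = rng.normal(size=(32, 32))
for scale in (0.1, 1.0, 10.0):              # small scale = strong entropic bias
    P = sinkhorn(scale * scores)
    s = np.linalg.svd(P, compute_uv=False)
    print(f"scale={scale:5.1f}  sigma_1={s[0]:.3f}  sigma_2={s[1]:.3f}")
# As the entropic bias grows (small scale), sigma_2 collapses toward 0 and the
# mixing operator approaches the uniform barycenter described in the abstract.
```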

paper research
Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning

Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning

Learning from Preferences in Reinforcement Learning (PbRL) has gained attention recently, as it is a natural fit for complicated tasks where the reward function is not easily available. However, preferences often come with uncertainty and noise when they are not produced by perfect teachers. Much prior literature has aimed to detect noise, but only for limited noise types, most of them uniformly distributed and unconnected to observations. In this work, we formalize the notion of targeted feature-dependent noise and propose several variants, such as trajectory feature noise, trajectory similarity noise, uncertainty-aware noise, and Language Model noise. We evaluate feature-dependent noise, where noise is correlated with certain features, in complex continuous control tasks from DMControl and Meta-world. Our experiments show that in some feature-dependent noise settings, the learning performance of the state-of-the-art noise-robust PbRL method deteriorates significantly, while a PbRL method with no explicit denoising can surprisingly outperform noise-robust PbRL in the majority of settings. We also find that Language Model noise exhibits characteristics similar to feature-dependent noise, thereby simulating realistic human teachers, and we call for further study of learning robustly under feature-dependent noise.

paper research
Evolving Networks, Shifting Datasets  A Stability Test

Evolving Networks, Shifting Datasets A Stability Test

Machine learning (ML) represents an efficient and popular approach for network traffic classification. However, network traffic classification is a challenging domain, and trained models may degrade soon after deployment due to obsolete datasets and the quick evolution of computer networks as new or updated protocols appear. Moreover, a significant change in the behavior of a traffic type (and, therefore, in the underlying features representing the traffic) can produce a large and sudden performance drop of the deployed model, known as data or concept drift. In most cases, complete retraining is performed, often without further investigation of root causes, as good dataset quality is assumed. However, this is not always the case, and further investigation must be performed. This paper proposes a novel methodology to evaluate the stability of datasets and a benchmark workflow that can be used to compare them. The proposed framework is based on a concept drift detection method that also uses ML feature weights to boost detection performance. The benefits of this work are demonstrated on the CESNET-TLS-Year22 dataset. We provide an initial dataset stability benchmark that describes the stability and weak points of the dataset and identifies the next steps for optimization. Lastly, using the proposed benchmarking methodology, we show the impact of these optimizations on the created dataset variants.

paper research
FedSCAM  Scam-resistant SAM for Robust Federated Optimization in Heterogeneous Environments

FedSCAM Scam-resistant SAM for Robust Federated Optimization in Heterogeneous Environments

Federated Learning (FL) enables collaborative model training across decentralized edge devices while preserving data privacy. However, statistical heterogeneity among clients, often manifested as non-IID label distributions, poses significant challenges to convergence and generalization. While Sharpness-Aware Minimization (SAM) has been introduced to FL to seek flatter, more robust minima, existing approaches typically apply a uniform perturbation radius across all clients, ignoring client-specific heterogeneity. In this work, we propose FedSCAM (Federated Sharpness-Aware Minimization with Clustered Aggregation and Modulation), a novel algorithm that dynamically adjusts the SAM perturbation radius and aggregation weights based on client-specific heterogeneity scores. By calculating a heterogeneity metric for each client and modulating the perturbation radius inversely to this score, FedSCAM prevents clients with high variance from destabilizing the global model. Furthermore, we introduce a heterogeneity-aware weighted aggregation mechanism that prioritizes updates from clients that align with the global optimization direction. Extensive experiments on CIFAR-10 and Fashion-MNIST under various degrees of Dirichlet-based label skew demonstrate that FedSCAM achieves competitive performance against state-of-the-art baselines, including FedSAM and FedLESAM, in terms of convergence speed and final test accuracy.
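The abstract specifies inverse modulation of the radius and alignment-based aggregation without giving exact formulas, so the sketch below uses simple assumed forms purely for illustration; every formula and name here is an assumption, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def modulated_radius(base_rho, heterogeneity):
    """Assumed inverse modulation: clients with higher heterogeneity scores
    get a smaller SAM perturbation radius."""
    return base_rho / (1.0 + heterogeneity)

def heterogeneity_aware_aggregate(updates, heterogeneity, global_direction):
    """Assumed aggregation rule: weight each client update by its alignment with
    the global optimization direction, discounted by its heterogeneity score."""
    U = np.stack(updates)                                           # (clients, dim)
    cos = U @ global_direction / (
        np.linalg.norm(U, axis=1) * np.linalg.norm(global_direction) + 1e-8)
    w = np.clip(cos, 0.0, None) / (1.0 + heterogeneity)
    w /= w.sum() + 1e-8
    return w @ U

h = np.array([0.1, 0.5, 2.0])                                       # per-client heterogeneity scores
updates = [rng.normal(size=16) for _ in h]
global_dir = np.mean(updates, axis=0)
print("radii  :", np.round(modulated_radius(0.05, h), 4))
print("update :", heterogeneity_aware_aggregate(updates, h, global_dir)[:4])
```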

paper research
Flow Equivariant World Models  Memory for Partially Observed Dynamic Environments

Flow Equivariant World Models Memory for Partially Observed Dynamic Environments

Embodied systems experience the world as a symphony of flows: a combination of many continuous streams of sensory input coupled to self-motion, interwoven with the dynamics of external objects. These streams obey smooth, time-parameterized symmetries, which combine through a precisely structured algebra; yet most neural network world models ignore this structure and instead repeatedly re-learn the same transformations from data. In this work, we introduce Flow Equivariant World Models, a framework in which both self-motion and external object motion are unified as one-parameter Lie group flows. We leverage this unification to implement group equivariance with respect to these transformations, thereby providing a stable latent world representation over hundreds of timesteps. On both 2D and 3D partially observed video world modeling benchmarks, we demonstrate that Flow Equivariant World Models significantly outperform comparable state-of-the-art diffusion-based and memory-augmented world modeling architectures, particularly when there are predictable world dynamics outside the agent's current field of view. We show that flow equivariance is particularly beneficial for long rollouts, generalizing far beyond the training horizon. By structuring world model representations with respect to internal and external motion, flow equivariance charts a scalable route to data efficient, symmetry-guided, embodied intelligence. Project link: https://flowequivariantworldmodels.github.io.

paper research
Generative Classifiers Avoid Shortcut Solutions

Generative Classifiers Avoid Shortcut Solutions

Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail even under minor distribution shift. This failure mode stems from an overreliance on features that are spuriously correlated with the label. We show that generative classifiers, which use class-conditional generative models, can avoid this issue by modeling all features, both core and spurious, instead of mainly spurious ones. These generative classifiers are simple to train, avoiding the need for specialized augmentations, strong regularization, extra hyperparameters, or knowledge of the specific spurious correlations to avoid. We find that diffusion-based and autoregressive generative classifiers achieve state-of-the-art performance on five standard image and text distribution shift benchmarks and reduce the impact of spurious correlations in realistic applications, such as medical or satellite datasets. Finally, we carefully analyze a Gaussian toy setting to understand the inductive biases of generative classifiers, as well as the data properties that determine when generative classifiers outperform discriminative ones.
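For context, a class-conditional generative classifier in its simplest Gaussian form works as below: fit p(x | y) per class and classify with Bayes' rule. This is only a toy illustration of the mechanics echoed by the paper's Gaussian analysis, not the diffusion-based or autoregressive classifiers the paper actually evaluates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes with full-covariance Gaussian class-conditionals (toy setting).
mu = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
cov = [np.array([[1.0, 0.6], [0.6, 1.0]]), np.array([[1.0, -0.3], [-0.3, 1.0]])]
X = np.vstack([rng.multivariate_normal(mu[c], cov[c], 500) for c in (0, 1)])
y = np.repeat([0, 1], 500)

def fit_class_conditionals(X, y):
    """Fit p(x | y) as a Gaussian per class and p(y) as class frequencies."""
    out = []
    for c in np.unique(y):
        Xc = X[y == c]
        out.append((Xc.mean(0), np.cov(Xc.T), len(Xc) / len(X)))
    return out

def predict(X, params):
    """Generative classification: argmax_y log p(x | y) + log p(y)."""
    scores = []
    for m, S, prior in params:
        d = X - m
        inv, logdet = np.linalg.inv(S), np.linalg.slogdet(S)[1]
        scores.append(-0.5 * (np.einsum('ni,ij,nj->n', d, inv, d) + logdet) + np.log(prior))
    return np.argmax(np.column_stack(scores), axis=1)

print("train accuracy:", (predict(X, fit_class_conditionals(X, y)) == y).mean())
```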

paper research
Geometric and Dynamic Scaling in Deep Transformers

Geometric and Dynamic Scaling in Deep Transformers

Despite their empirical success, pushing Transformer architectures to extreme depth often leads to a paradoxical failure: representations become increasingly redundant, lose rank, and ultimately collapse. Existing explanations largely attribute this phenomenon to optimization instability or vanishing gradients, yet such accounts fail to explain why collapse persists even under modern normalization and initialization schemes. In this paper, we argue that the collapse of deep Transformers is fundamentally a geometric problem. Standard residual updates implicitly assume that feature accumulation is always beneficial, but offer no mechanism to constrain update directions or to erase outdated information. As depth increases, this leads to systematic drift off the semantic manifold and monotonic feature accumulation, causing representational degeneracy. We propose a unified geometric framework that addresses these failures through two orthogonal principles. First, manifold-constrained hyper-connections restrict residual updates to valid local tangent directions, preventing uncontrolled manifold drift. Second, deep delta learning introduces data-dependent, non-monotonic updates that enable reflection and erasure of redundant features rather than their unconditional accumulation. Together, these mechanisms decouple the direction and sign of feature updates, yielding a stable geometric evolution across depth. We term the resulting architecture the Manifold-Geometric Transformer (MGT). Our analysis predicts that enforcing geometric validity while allowing dynamic erasure is essential for avoiding rank collapse in ultra-deep networks. We outline an evaluation protocol for Transformers exceeding 100 layers to test the hypothesis that geometry, rather than depth itself, is the key limiting factor in deep representation learning.

paper research
Geometric Regularization in Mixture-of-Experts  The Disconnect Between Weights and Activations

Geometric Regularization in Mixture-of-Experts The Disconnect Between Weights and Activations

Mixture-of-Experts (MoE) models achieve efficiency through sparse activation, but the role of geometric regularization in expert specialization remains unclear. We apply orthogonality loss to enforce expert diversity and find that it fails on multiple fronts: it does not reduce weight-space overlap (MSO actually increases by up to 114%), activation-space overlap remains high (~0.6) regardless of regularization, and its effects on performance are inconsistent, with a marginal improvement on WikiText-103 (-0.9%), a slight degradation on TinyStories (+0.9%), and highly variable results on PTB (std > 1.0). Our analysis across 7 regularization strengths reveals no significant correlation (r = -0.293, p = 0.523) between weight and activation orthogonality. These findings demonstrate that weight-space regularization neither achieves its geometric goal nor reliably improves performance, making it unsuitable for MoE diversity.
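To make the measured quantities concrete, the sketch below shows one common form of a weight-space orthogonality penalty across experts and a mean pairwise activation-overlap measure; the paper's exact regularizer and overlap metric (e.g., the precise definition of MSO) may differ, so treat this as an assumed illustration.

```python
import torch
import torch.nn.functional as F

def expert_orthogonality_loss(weights):
    """Weight-space orthogonality penalty across experts: flatten and normalize
    each expert's weight matrix, then penalize squared pairwise cosine similarity."""
    W = F.normalize(torch.stack([w.flatten() for w in weights]), dim=1)
    G = W @ W.T                                   # pairwise cosine similarities
    off_diag = G - torch.eye(len(weights))
    return (off_diag ** 2).sum() / (len(weights) * (len(weights) - 1))

def mean_activation_overlap(acts):
    """Mean pairwise cosine similarity of expert activations on the same batch,
    i.e. an activation-space overlap of the kind reported in the abstract."""
    A = F.normalize(torch.stack([a.flatten() for a in acts]), dim=1)
    G = A @ A.T
    n = len(acts)
    return (G.sum() - n) / (n * (n - 1))

experts = [torch.randn(64, 64) for _ in range(8)]
acts = [torch.randn(128, 64) for _ in range(8)]
print(expert_orthogonality_loss(experts).item(), mean_activation_overlap(acts).item())
```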

paper research
Geometry of Reason  Spectral Signatures of Valid Mathematical Reasoning

Geometry of Reason Spectral Signatures of Valid Mathematical Reasoning

We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. By treating attention matrices as adjacency matrices of dynamic graphs over tokens, we extract four interpretable spectral diagnostics: the Fiedler value (algebraic connectivity), the high-frequency energy ratio (HFER), graph signal smoothness, and spectral entropy. These diagnostics exhibit statistically significant differences between valid and invalid mathematical proofs. Experiments across seven transformer models from four independent architectural families (Meta Llama, Alibaba Qwen, Microsoft Phi, and Mistral AI) demonstrate that this spectral signature produces effect sizes up to Cohen's $d = 3.30$ ($p < 10^{-116}$), enabling 85.0-95.6% classification accuracy under rigorous evaluation, with calibrated thresholds reaching 93-95% on the full dataset. The method requires no training data, fine-tuning, or learned classifiers: a single threshold on a spectral metric suffices for high accuracy. Through systematic label correction, we discover that the spectral method detects logical coherence rather than compiler acceptance, identifying mathematically valid proofs that formal verifiers reject due to technical failures. We further identify an architectural dependency: Mistral-7B's Sliding Window Attention shifts the discriminative signal from HFER to late-layer Smoothness ($d = 2.09$, $p_{\text{MW}} = 1.16 \times 10^{-48}$), revealing that attention mechanism design affects which spectral features capture reasoning validity. These findings establish spectral graph analysis as a principled framework for reasoning verification with immediate applications to hallucination detection and AI safety monitoring.
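To make the four diagnostics concrete, here is a small, self-contained sketch that treats a single attention matrix as a weighted token graph and computes the Fiedler value, an HFER-style ratio, smoothness, and spectral entropy using standard graph signal processing conventions; the paper's exact normalizations and its aggregation over layers and heads are not reproduced here.

```python
import numpy as np

def spectral_diagnostics(attn, hidden):
    """attn: (T, T) attention weights for one head/layer; hidden: (T, d) token
    representations used as graph signals. Standard GSP-style definitions;
    the paper's exact formulas may differ."""
    W = 0.5 * (attn + attn.T)                      # symmetrize into a weighted graph
    L = np.diag(W.sum(axis=1)) - W                 # combinatorial graph Laplacian
    evals, evecs = np.linalg.eigh(L)
    evals = np.clip(evals, 0.0, None)
    fiedler = evals[1]                             # algebraic connectivity
    p = evals / (evals.sum() + 1e-12)
    spectral_entropy = -(p * np.log(p + 1e-12)).sum()
    coeffs = evecs.T @ hidden                      # graph Fourier transform of the signal
    energy = (coeffs ** 2).sum(axis=1)
    hfer = energy[len(energy) // 2:].sum() / (energy.sum() + 1e-12)   # upper half = high frequencies
    smoothness = np.einsum('td,tu,ud->', hidden, L, hidden) / ((hidden ** 2).sum() + 1e-12)
    return fiedler, hfer, smoothness, spectral_entropy

T, d_model = 32, 16
A = np.random.dirichlet(np.ones(T), size=T)        # row-stochastic stand-in for attention
H = np.random.randn(T, d_model)
print(spectral_diagnostics(A, H))
```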

paper research
Hierarchical Agents Tackle Real-World SWE with Bandit Optimization

Large language models (LLMs) have shown strong reasoning and coding capabilities, yet they struggle to generalize to real-world software engineering (SWE) problems that are long-horizon and out of distribution. Existing systems often rely on a single agent to handle the entire workflow (interpreting issues, navigating large codebases, and implementing fixes) within one reasoning chain. Such monolithic designs force the model to retain irrelevant context, leading to spurious correlations and poor generalization. Motivated by how human engineers decompose complex problems, we propose structuring SWE agents as orchestrators coordinating specialized sub-agents for sub-tasks such as localization, editing, and validation. The challenge lies in discovering effective hierarchies automatically: as the number of sub-agents grows, the search space becomes combinatorial, and it is difficult to attribute credit to individual sub-agents within a team. We address these challenges by formulating hierarchy discovery as a multi-armed bandit (MAB) problem, where each arm represents a candidate sub-agent and the reward measures its helpfulness when collaborating with others. This framework, termed Bandit Optimization for Agent Design (BOAD), enables efficient exploration of sub-agent designs under limited evaluation budgets. On SWE-bench-Verified, BOAD outperforms single-agent and manually designed multi-agent systems. On SWE-bench-Live, featuring more recent and out-of-distribution issues, our 36B system ranks second on the leaderboard at the time of evaluation, surpassing larger models such as GPT-4 and Claude. These results demonstrate that automatically discovered hierarchical multi-agent systems significantly improve generalization on challenging long-horizon SWE tasks. Code is available at https://github.com/iamxjy/BOAD-SWE-Agent.
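To illustrate the bandit machinery in isolation, the sketch below runs a standard UCB1 loop over a handful of hypothetical sub-agent candidates with made-up helpfulness probabilities; BOAD's actual reward definition, credit assignment, and multi-agent evaluation are not modeled here.

```python
import math
import random

def ucb_select(counts, rewards, t, c=1.4):
    """Pick the candidate with the highest upper confidence bound (UCB1)."""
    scores = []
    for n, r in zip(counts, rewards):
        if n == 0:
            return counts.index(0)                 # try every candidate once first
        scores.append(r / n + c * math.sqrt(math.log(t) / n))
    return scores.index(max(scores))

# Toy simulation: 5 candidate sub-agent configurations with unknown
# "helpfulness" probabilities (hypothetical numbers, not SWE-bench results).
random.seed(0)
true_helpfulness = [0.15, 0.30, 0.22, 0.45, 0.10]
counts, rewards = [0] * 5, [0.0] * 5
for t in range(1, 501):
    arm = ucb_select(counts, rewards, t)
    reward = float(random.random() < true_helpfulness[arm])   # e.g. did the team resolve the issue?
    counts[arm] += 1
    rewards[arm] += reward
print("pull counts per candidate:", counts)                    # budget concentrates on the best design
```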

paper research
HOLOGRAPH Active Causal Discovery via Sheaf-Theoretic Alignment of Large Language Model Priors

Causal discovery from observational data remains fundamentally limited by identifiability constraints. Recent work has explored leveraging Large Language Models (LLMs) as sources of prior causal knowledge, but existing approaches rely on heuristic integration that lacks theoretical grounding. We introduce HOLOGRAPH, a framework that formalizes LLM-guided causal discovery through sheaf theory, representing local causal beliefs as sections of a presheaf over variable subsets. Our key insight is that coherent global causal structure corresponds to the existence of a global section, while topological obstructions manifest as non-vanishing sheaf cohomology. We propose the Algebraic Latent Projection to handle hidden confounders and Natural Gradient Descent on the belief manifold for principled optimization. Experiments on synthetic and real-world benchmarks demonstrate that HOLOGRAPH provides rigorous mathematical foundations while achieving competitive performance on causal discovery tasks with 50-100 variables. Our sheaf-theoretic analysis reveals that while the Identity, Transitivity, and Gluing axioms are satisfied to numerical precision ($<10^{-6}$), the Locality axiom fails for larger graphs, suggesting fundamental non-local coupling in latent variable projections. Code is available at https://github.com/hyunjun1121/holograph.

paper research
HyperCLOVA X 8B Omni

HyperCLOVA X 8B Omni

In this report, we present HyperCLOVA X 8B Omni, the first any-to-any omnimodal model in the HyperCLOVA X family that supports text, audio, and vision as both inputs and outputs. By consolidating multimodal understanding and generation into a single model rather than separate modality-specific pipelines, HyperCLOVA X 8B Omni serves as an 8B-scale omni-pathfinding point toward practical any-to-any omni assistants. At a high level, the model unifies modalities through a shared next-token prediction interface over an interleaved multimodal sequence, while vision and audio encoders inject continuous embeddings for fine-grained understanding and grounding. Empirical evaluations demonstrate competitive performance against comparably sized models across diverse input-output combinations spanning text, audio, and vision, in both Korean and English. We anticipate that the open-weight release of HyperCLOVA X 8B Omni will support a wide range of research and deployment scenarios.

paper research
Infini-Attention  Boosting Small-Scale Pretraining Limits

Infini-Attention Boosting Small-Scale Pretraining Limits

This study investigates small-scale pretraining for Small Language Models (SLMs) to enable efficient use of limited data and compute, improve accessibility in low-resource settings, and reduce costs. To enhance long-context extrapolation in compact models, we focus on Infini-attention, which builds a compressed memory from past segments while preserving local attention. In our work, we conduct an empirical study using 300M-parameter LLaMA models pretrained with Infini-attention. The model demonstrates training stability and outperforms the baseline in long-context retrieval. We identify the balance factor as a key component of model performance, and we find that retrieval accuracy drops with repeated memory compressions over long sequences. Even so, Infini-attention still effectively compensates for the SLM's limited parameters. In particular, despite performance degradation at a 16,384-token context, the Infini-attention model achieves up to 31% higher accuracy than the baseline. Our findings suggest that achieving robust long-context capability in SLMs benefits from architectural memory like Infini-attention.
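For orientation, the sketch below shows a linear-attention-style compressive memory of the kind Infini-attention builds over segments: a retrieval read followed by an additive update. It is a simplified single-head version using the commonly used ELU+1 feature map, and it omits the local-attention branch, the delta-rule variant, and the learned balance factor discussed in the abstract.

```python
import torch
import torch.nn.functional as F

def sigma(x):                       # ELU + 1 feature map used by linear attention
    return F.elu(x) + 1.0

def compress(memory, z, K, V):
    """Fold a processed segment's keys/values into the running compressive memory."""
    memory = memory + sigma(K).transpose(-2, -1) @ V     # (d_key, d_value)
    z = z + sigma(K).sum(dim=-2)                         # normalizer, (d_key,)
    return memory, z

def retrieve(memory, z, Q):
    """Read from the compressed past for the current segment's queries."""
    q = sigma(Q)
    return (q @ memory) / (q @ z.unsqueeze(-1) + 1e-6)   # (seq, d_value)

d_key, d_value, seg = 32, 32, 128
memory, z = torch.zeros(d_key, d_value), torch.zeros(d_key)
for _ in range(4):                                       # stream four segments
    K, V, Q = torch.randn(seg, d_key), torch.randn(seg, d_value), torch.randn(seg, d_key)
    out = retrieve(memory, z, Q)                         # attend to the compressed past
    memory, z = compress(memory, z, K, V)                # then fold in the new segment
print(out.shape)
```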

paper research
Interpretability-Guided Bi-objective Optimization Aligning Accuracy and Explainability

This paper introduces Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains interpretable models by incorporating structured domain knowledge via a bi-objective formulation. IGBO encodes feature importance hierarchies as a Directed Acyclic Graph (DAG) via Central Limit Theorem-based construction and uses Temporal Integrated Gradients (TIG) to measure feature importance. To address the Out-of-Distribution (OOD) problem in TIG computation, we propose an Optimal Path Oracle that learns data-manifold-aware integration paths. Theoretical analysis establishes convergence properties via a geometric projection mapping $\mathcal{P}$ and proves robustness to mini-batch noise. The Central Limit Theorem-based construction of the interpretability DAG ensures statistical validity of edge orientation decisions. Empirical results on time-series data demonstrate IGBO's effectiveness in enforcing DAG constraints with minimal accuracy loss, outperforming standard regularization baselines.
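As background for the attribution step, the sketch below computes vanilla integrated gradients along a straight line from a baseline to an input; TIG and the Optimal Path Oracle replace this straight path with a learned, data-manifold-aware one, so this shows only the standard starting point being adapted, with a hypothetical toy model.

```python
import torch

def integrated_gradients(model, x, baseline, steps=64):
    """Vanilla integrated gradients: average gradients along a straight-line
    path from baseline to x (Riemann approximation), scaled by (x - baseline).

    x, baseline: (features,) tensors; model: callable returning one score per row.
    """
    alphas = torch.linspace(0.0, 1.0, steps).unsqueeze(1)    # (steps, 1)
    path = baseline + alphas * (x - baseline)                # points on the straight path
    path.requires_grad_(True)
    model(path).sum().backward()                             # gradients at every path point
    avg_grad = path.grad.mean(dim=0)
    return (x - baseline) * avg_grad                         # per-feature attribution

model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
x, baseline = torch.randn(8), torch.zeros(8)
print(integrated_gradients(model, x, baseline))
```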

paper research
IRPO  Scaling the Bradley-Terry Model via Reinforcement Learning

IRPO Scaling the Bradley-Terry Model via Reinforcement Learning

Generative Reward Models (GRMs) have attracted considerable research interest in reward modeling due to their interpretability, inference-time scalability, and potential for refinement through reinforcement learning (RL). However, widely used pairwise GRMs create a computational bottleneck when integrated with RL algorithms such as Group Relative Policy Optimization (GRPO). This bottleneck arises from two factors: (i) the $O(n^2)$ time complexity of pairwise comparisons required to obtain relative scores, and (ii) the computational overhead of repeated sampling or additional chain-of-thought (CoT) reasoning to improve performance. To address the first factor, we propose Intergroup Relative Preference Optimization (IRPO), a novel RL framework that incorporates the well-established Bradley-Terry model into GRPO. By generating a pointwise score for each response, IRPO enables efficient evaluation of arbitrarily many candidates during RL training while preserving interpretability and fine-grained reward signals. Experimental results demonstrate that IRPO achieves state-of-the-art (SOTA) performance among pointwise GRMs across multiple benchmarks, with performance comparable to that of current leading pairwise GRMs. Furthermore, we show that IRPO significantly outperforms pairwise GRMs in post-training evaluations.
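To spell out the complexity argument, the sketch below pairs the standard Bradley-Terry objective on pointwise scores with a GRPO-style group-normalized advantage that needs only one score per response (O(n) scoring calls) rather than all pairwise comparisons; it is a schematic of the general recipe, not IRPO's training code.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(score_chosen, score_rejected):
    """Bradley-Terry objective on pointwise scores: the probability that the
    chosen response beats the rejected one is sigmoid(s_chosen - s_rejected)."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

def group_advantages(scores):
    """Group-relative advantages from pointwise scores: standardize within the
    group of responses sampled for the same prompt."""
    return (scores - scores.mean()) / (scores.std() + 1e-6)

# Toy usage: train the scorer on a preference pair, then reuse its pointwise
# scores for 8 sampled responses to one prompt during RL.
print(bradley_terry_loss(torch.tensor([1.2]), torch.tensor([0.3])).item())
print(group_advantages(torch.randn(8)))
```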

paper research
KernelEvolve  Automating DLRM Kernels for AI Heterogeneity

KernelEvolve Automating DLRM Kernels for AI Heterogeneity

Making deep learning recommendation model (DLRM) training and inference fast and efficient is important. However, this presents three key system challenges: model architecture diversity, kernel primitive diversity, and hardware generation and architecture heterogeneity. This paper presents KernelEvolve, an agentic kernel coding framework, to tackle heterogeneity at scale for DLRM. KernelEvolve is designed to take kernel specifications as input and automate the process of kernel generation and optimization for recommendation models across heterogeneous hardware architectures. KernelEvolve does so by operating at multiple programming abstractions, from Triton and CuTe DSL to low-level hardware-agnostic languages, spanning the full hardware-software optimization stack. The kernel optimization process is described as a graph-based search with a selection policy, universal operator, fitness function, and termination rule, and it dynamically adapts to the runtime execution context through retrieval-augmented prompt synthesis. We designed, implemented, and deployed KernelEvolve to optimize a wide variety of production recommendation models across generations of NVIDIA and AMD GPUs, as well as Meta's AI accelerators. We validate KernelEvolve on the publicly available KernelBench suite, achieving a 100% pass rate on all 250 problems across three difficulty levels, and on 160 PyTorch ATen operators across three heterogeneous hardware platforms, demonstrating 100% correctness. KernelEvolve reduces development time from weeks to hours and achieves substantial performance improvements over PyTorch baselines across diverse production use cases and for heterogeneous AI systems at scale. Beyond performance efficiency improvements, KernelEvolve significantly mitigates the programmability barrier for new AI hardware by enabling automated kernel generation for in-house developed AI hardware.

paper research
KL Divergence in Alignment  Covering Modes vs. Seeking Rewards

KL Divergence in Alignment Covering Modes vs. Seeking Rewards

Two divergence regimes dominate modern alignment practice. Supervised fine-tuning and many distillation-style objectives implicitly minimize the forward KL divergence $\mathrm{KL}(q \| \pi_\theta)$, yielding stable mode-covering updates but often under-exploiting high-reward modes. In contrast, PPO-style online reinforcement learning from human feedback behaves closer to the reverse KL divergence $\mathrm{KL}(\pi_\theta \| q)$, enabling mode-seeking improvements but risking mode collapse. Recent anchored methods, such as ADPO, show that performing the projection in anchored coordinates can substantially improve stability, yet they typically commit to a single divergence. We introduce Alpha-Divergence Preference Optimization (APO), an anchored framework that uses the Csiszár alpha-divergence to continuously interpolate between forward and reverse KL behavior within the same anchored geometry. We derive unified gradient dynamics parameterized by alpha, analyze gradient variance properties, and propose a practical reward-and-confidence-guarded alpha schedule that transitions from coverage to exploitation only when the policy is both improving and confidently calibrated. Experiments on Qwen3-1.7B with math-level3 demonstrate that APO achieves competitive performance with GRPO and GSPO baselines while maintaining training stability.
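For a quick numerical check of the interpolation claim, the snippet below evaluates one common parameterization of the alpha-divergence on random discrete distributions and shows that it approaches the two KL directions at the ends of the alpha range; it is a generic identity check, not APO's anchored objective.

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    """Csiszár/Amari alpha-divergence between discrete distributions (one common
    parameterization): alpha -> 1 recovers KL(p || q), alpha -> 0 recovers KL(q || p)."""
    return (1.0 - np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha * (1.0 - alpha))

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
p, q = rng.dirichlet(np.ones(16)), rng.dirichlet(np.ones(16))
print("KL(p||q)      :", kl(p, q))
print("alpha = 0.999 :", alpha_divergence(p, q, 0.999))   # close to KL(p||q)
print("alpha = 0.001 :", alpha_divergence(p, q, 0.001))   # close to KL(q||p)
print("KL(q||p)      :", kl(q, p))
print("alpha = 0.5   :", alpha_divergence(p, q, 0.5))     # an intermediate regime
```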

paper research
LearnAD  Learning Interpretable Rules for Brain Networks in Alzheimer's Disease Classification

LearnAD Learning Interpretable Rules for Brain Networks in Alzheimer's Disease Classification

We introduce LearnAD, a neuro-symbolic method for predicting Alzheimer's disease from brain magnetic resonance imaging data, learning fully interpretable rules. LearnAD applies statistical models, Decision Trees, Random Forests, or GNNs to identify relevant brain connections, and then employs FastLAS to learn global rules. Our best instance outperforms Decision Trees, matches Support Vector Machine accuracy, and performs only slightly below Random Forests and GNNs trained on all features, all while remaining fully interpretable. Ablation studies show that our neuro-symbolic approach improves interpretability with comparable performance to pure statistical models. LearnAD demonstrates how symbolic learning can deepen our understanding of GNN behaviour in clinical neuroscience.

paper research
Learning from Historical Activations in Graph Neural Networks

Learning from Historical Activations in Graph Neural Networks

Graph Neural Networks (GNNs) have demonstrated remarkable success in various domains such as social networks, molecular chemistry, and more. A crucial component of GNNs is the pooling procedure, in which the node features calculated by the model are combined to form an informative final descriptor to be used for the downstream task. However, previous graph pooling schemes rely on the last GNN layer features as an input to the pooling or classifier layers, potentially under-utilizing important activations of previous layers produced during the forward pass of the model, which we regard as historical graph activations. This gap is particularly pronounced in cases where a node's representation can shift significantly over the course of many graph neural layers, and is worsened by graph-specific challenges such as over-smoothing in deep architectures. To bridge this gap, we introduce HISTOGRAPH, a novel two-stage attention-based final aggregation layer that first applies a unified layer-wise attention over intermediate activations, followed by node-wise attention. By modeling the evolution of node representations across layers, our HISTOGRAPH leverages both the activation history of nodes and the graph structure to refine features used for final prediction. Empirical results on multiple graph classification benchmarks demonstrate that HISTOGRAPH offers strong performance that consistently improves over traditional techniques, with particularly strong robustness in deep GNNs.
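
A minimal PyTorch sketch of the two-stage idea under stated assumptions (module and tensor names are hypothetical, and the actual HISTOGRAPH layer is more elaborate): attend over a node's stored per-layer activations first, then attend over nodes to pool a graph descriptor.

import torch
import torch.nn as nn

class TwoStageHistoryPooling(nn.Module):
    # Input: activations from every GNN layer, shape (num_layers, num_nodes, dim).
    # Output: a single graph descriptor of shape (dim,).
    def __init__(self, dim: int):
        super().__init__()
        self.layer_score = nn.Linear(dim, 1)  # stage 1: layer-wise attention scores
        self.node_score = nn.Linear(dim, 1)   # stage 2: node-wise attention scores

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        w_layers = torch.softmax(self.layer_score(history), dim=0)   # (L, N, 1)
        node_repr = (w_layers * history).sum(dim=0)                  # (N, D): per-node mix of layers
        w_nodes = torch.softmax(self.node_score(node_repr), dim=0)   # (N, 1)
        return (w_nodes * node_repr).sum(dim=0)                      # (D,): pooled graph descriptor

pool = TwoStageHistoryPooling(32)
print(pool(torch.randn(4, 10, 32)).shape)  # 4 layers, 10 nodes, 32-dim features -> torch.Size([32])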

paper research
Length-Aware Adversarial Training for Variable-Length Trajectories  Digital Twins for Mall Shopper Paths

Length-Aware Adversarial Training for Variable-Length Trajectories Digital Twins for Mall Shopper Paths

We study generative modeling of variable-length trajectories -- sequences of visited locations/items with associated timestamps -- for downstream simulation and counterfactual analysis. A recurring practical issue is that standard mini-batch training can be unstable when trajectory lengths are highly heterogeneous, which in turn degrades distribution matching for trajectory-derived statistics. We propose length-aware sampling (LAS), a simple batching strategy that groups trajectories by length and samples batches from a single length bucket, reducing within-batch length heterogeneity (and making updates more consistent) without changing the model class. We integrate LAS into a conditional trajectory GAN with auxiliary time-alignment losses and provide (i) a distribution-level guarantee for derived variables under mild boundedness assumptions, and (ii) an IPM/Wasserstein mechanism explaining why LAS improves distribution matching by removing length-only shortcut critics and targeting within-bucket discrepancies. Empirically, LAS consistently improves matching of derived-variable distributions on a multi-mall dataset of shopper trajectories and on diverse public sequence datasets (GPS, education, e-commerce, and movies), outperforming random sampling across dataset-specific metrics.
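
The batching strategy itself is simple enough to sketch; the following Python generator is an illustrative reading of LAS (bucket width, uniform bucket choice, and sampling with replacement are assumptions, not the paper's exact settings):

import random
from collections import defaultdict

def length_bucketed_batches(trajectories, bucket_width=5, batch_size=32, seed=0):
    # Group trajectories by length bucket; every mini-batch comes from one bucket only.
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for traj in trajectories:
        buckets[len(traj) // bucket_width].append(traj)
    bucket_list = list(buckets.values())
    while True:
        bucket = rng.choice(bucket_list)  # one length bucket per batch (uniform choice here)
        yield rng.choices(bucket, k=min(batch_size, len(bucket)))

# Toy usage: variable-length "trajectories" of location ids.
data = [list(range(random.randint(2, 40))) for _ in range(1000)]
batch = next(length_bucketed_batches(data))
print(len(batch), {len(t) // 5 for t in batch})  # all sampled lengths fall in a single bucket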

paper research
LION-DG  Layer-Informed Initialization with Deep Gradient Protocols for Accelerated Neural Network Training

LION-DG Layer-Informed Initialization with Deep Gradient Protocols for Accelerated Neural Network Training

Weight initialization remains decisive for neural network optimization, yet existing methods are largely layer-agnostic. We study initialization for deeply-supervised architectures with auxiliary classifiers, where untrained auxiliary heads can destabilize early training through gradient interference. We propose LION-DG, a layer-informed initialization that zero-initializes auxiliary classifier heads while applying standard He-initialization to the backbone. We prove that this implements Gradient Awakening: auxiliary gradients are exactly zero at initialization, then phase in naturally as weights grow -- providing an implicit warmup without hyperparameters. Experiments on CIFAR-10 and CIFAR-100 with DenseNet-DS and ResNet-DS architectures demonstrate (1) DenseNet-DS: +8.3% faster convergence on CIFAR-10 with comparable accuracy, (2) hybrid approach: combining LSUV with LION-DG achieves the best accuracy (81.92% on CIFAR-10), (3) ResNet-DS: positive speedup on CIFAR-100 (+11.3%) with a side-tap auxiliary design. We identify architecture-specific trade-offs and provide clear guidelines for practitioners. LION-DG is simple, requires zero hyperparameters, and adds no computational overhead.
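
The initialization rule is concrete enough for a short sketch. A plausible PyTorch rendering, assuming auxiliary heads can be identified by a (hypothetical) name prefix: zero-initialize auxiliary classifier heads and He-initialize everything else, so the auxiliary losses contribute no gradient to the backbone at step zero and phase in as the head weights grow.

import torch.nn as nn

def lion_dg_init(model: nn.Module, aux_head_prefix: str = "aux_head"):
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        if name.startswith(aux_head_prefix):
            nn.init.zeros_(module.weight)  # auxiliary head: zero init
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        else:
            nn.init.kaiming_normal_(module.weight, nonlinearity="relu")  # He init for backbone
            if module.bias is not None:
                nn.init.zeros_(module.bias)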

paper research
LOFA  Online Influence Maximization under Full-Bandit Feedback using Lazy Forward Selection

LOFA Online Influence Maximization under Full-Bandit Feedback using Lazy Forward Selection

We study the problem of influence maximization (IM) in an online setting, where the goal is to select a subset of nodes -- called the seed set -- at each time step over a fixed time horizon, subject to a cardinality budget constraint, to maximize the expected cumulative influence. We operate under a full-bandit feedback model, where only the influence of the chosen seed set at each time step is observed, with no additional structural information about the network or diffusion process. It is well-established that the influence function is submodular, and existing algorithms exploit this property to achieve low regret. In this work, we leverage this property further and propose the Lazy Online Forward Algorithm (LOFA), which achieves a lower empirical regret. We conduct experiments on a real-world social network to demonstrate that LOFA achieves superior performance compared to existing bandit algorithms in terms of cumulative regret and instantaneous reward.

paper research
Mental Game  Predicting Personality-Job Fit for Software Developers Using Multi-Genre Games and Machine Learning

Mental Game Predicting Personality-Job Fit for Software Developers Using Multi-Genre Games and Machine Learning

Personality assessment in career guidance and personnel selection traditionally relies on self-report questionnaires, which are susceptible to response bias, fatigue, and intentional distortion. Game-based assessment offers a promising alternative by capturing implicit behavioral signals during gameplay. This study proposes a multi-genre serious-game framework combined with machine-learning techniques to predict suitability for software development roles. Developer-relevant personality and behavioral traits were identified through a systematic literature review and an empirical study of professional software engineers. A custom mobile game was designed to elicit behaviors related to problem solving, planning, adaptability, persistence, time management, and information seeking. Fine-grained gameplay event data were collected and analyzed using a two-phase modeling strategy where suitability was predicted exclusively from gameplay-derived behavioral features. Results show that our model achieved up to 97% precision and 94% accuracy. Behavioral analysis revealed that suitable candidates exhibited distinct gameplay patterns, such as more wins in puzzle-based games, more side challenges completed, more frequent menu navigation, and fewer pauses, retries, and surrender actions. These findings demonstrate that implicit behavioral traces captured during gameplay are promising for predicting software-development suitability without explicit personality testing, supporting serious games as a scalable, engaging, and less biased alternative for career assessment.

paper research
ML Compass  Bridging Capability to Deployment

ML Compass Bridging Capability to Deployment

We study how organizations should select among competing AI models when user utility, deployment costs, and compliance requirements jointly matter. Widely used capability leaderboards do not translate directly into deployment decisions, creating a capability -- deployment gap; to bridge it, we take a systems-level view in which model choice is tied to application outcomes, operating constraints, and a capability-cost frontier. We develop ML Compass, a framework that treats model selection as constrained optimization over this frontier. On the theory side, we characterize optimal model configurations under a parametric frontier and show a three-regime structure in optimal internal measures: some dimensions are pinned at compliance minima, some saturate at maximum levels, and the remainder take interior values governed by frontier curvature. We derive comparative statics that quantify how budget changes, regulatory tightening, and technological progress propagate across capability dimensions and costs. On the implementation side, we propose a pipeline that (i) extracts low-dimensional internal measures from heterogeneous model descriptors, (ii) estimates an empirical frontier from capability and cost data, (iii) learns a user- or task-specific utility function from interaction outcome data, and (iv) uses these components to target capability-cost profiles and recommend models. We validate ML Compass with two case studies: a general-purpose conversational setting using the PRISM Alignment dataset and a healthcare setting using a custom dataset we build using HealthBench. In both environments, our framework produces recommendations -- and deployment-aware leaderboards based on predicted deployment value under constraints -- that can differ materially from capability-only rankings, and clarifies how trade-offs between capability, cost, and safety shape optimal model choice.

paper research
MODE  Efficient Time Series Prediction with Mamba Enhanced by Low-Rank Neural ODEs

MODE Efficient Time Series Prediction with Mamba Enhanced by Low-Rank Neural ODEs

Time series prediction plays a pivotal role across diverse domains such as finance, healthcare, energy systems, and environmental modeling. However, existing approaches often struggle to balance efficiency, scalability, and accuracy, particularly when handling long-range dependencies and irregularly sampled data. To address these challenges, we propose MODE, a unified framework that integrates Low-Rank Neural Ordinary Differential Equations (Neural ODEs) with an Enhanced Mamba architecture. As illustrated in our framework, the input sequence is first transformed by a Linear Tokenization Layer and then processed through multiple Mamba Encoder blocks, each equipped with an Enhanced Mamba Layer that employs Causal Convolution, SiLU activation, and a Low-Rank Neural ODE enhancement to efficiently capture temporal dynamics. This low-rank formulation reduces computational overhead while maintaining expressive power. Furthermore, a segmented selective scanning mechanism, inspired by pseudo-ODE dynamics, adaptively focuses on salient subsequences to improve scalability and long-range sequence modeling. Extensive experiments on benchmark datasets demonstrate that MODE surpasses existing baselines in both predictive accuracy and computational efficiency. Overall, our contributions include (1) a unified and efficient architecture for long-term time series modeling, (2) integration of Mamba s selective scanning with low-rank Neural ODEs for enhanced temporal representation, and (3) substantial improvements in efficiency and scalability enabled by low-rank approximation and dynamic selective scanning.

paper research
More Than Bits  Multi-Envelope Double Binary Factorization for Extreme Quantization

More Than Bits Multi-Envelope Double Binary Factorization for Extreme Quantization

For extreme low-bit quantization of large language models (LLMs), Double Binary Factorization (DBF) is attractive as it enables efficient inference without sacrificing accuracy. However, the scaling parameters of DBF are too restrictive; after factoring out signs, all rank components share the same magnitude profile, resulting in performance saturation. We propose Multi-envelope DBF (MDBF), which retains a shared pair of 1-bit sign bases but replaces the single envelope with a rank-$l$ envelope. By sharing sign matrices among envelope components, MDBF effectively maintains a binary carrier and utilizes the limited memory budget for magnitude expressiveness. We also introduce a closed-form initialization and an alternating refinement method to optimize MDBF. Across the LLaMA and Qwen families, MDBF enhances perplexity and zero-shot accuracy over previous binary formats at matched bits per weight while preserving the same deployment-friendly inference primitive.

paper research
Neural Chains and Discrete Dynamical Systems

Neural Chains and Discrete Dynamical Systems

We inspect the analogy between machine-learning (ML) applications based on the transformer architecture without self-attention, neural chains hereafter, and discrete dynamical systems associated with discretised versions of neural integral and partial differential equations (NIE, PDE). A comparative analysis of the numerical solution of the (viscid and inviscid) Burgers and Eikonal equations via standard numerical discretization (also cast in terms of neural chains) and via PINN learning is presented and commented on. It is found that standard numerical discretization and PINN learning provide two different paths to acquire essentially the same knowledge about the dynamics of the system. PINN learning proceeds through random matrices which bear no direct relation to the highly structured matrices associated with finite-difference (FD) procedures. Random matrices leading to acceptable solutions are far more numerous than the unique tridiagonal form in matrix space, which explains why the PINN search typically lands on the random ensemble. The price is a much larger number of parameters, causing lack of physical transparency (explainability) as well as large training costs with no counterpart in the FD procedure. However, our results refer to one-dimensional dynamic problems, hence they don't rule out the possibility that PINNs, and ML in general, may offer better strategies for high-dimensional problems.

paper research
Optimizing LSTM Neural Networks for Resource-Constrained Retail Sales Forecasting  A Model Compression Study

Optimizing LSTM Neural Networks for Resource-Constrained Retail Sales Forecasting A Model Compression Study

Standard LSTM (Long Short-Term Memory) neural networks provide accurate predictions for retail sales data but require substantial computing power, which can be challenging for small and mid-sized retailers. This paper examines LSTM model compression by gradually reducing the number of hidden units from 128 to 16. We used the Kaggle Store Item Demand Forecasting dataset, which has 913,000 daily sales records from 10 stores and 50 items, to examine the trade-off between model size and prediction accuracy. Experiments show that lowering the number of hidden LSTM units to 64 not only preserves accuracy but improves it: the mean absolute percentage error (MAPE) drops from 23.6% for the full 128-unit model to 12.4% for the 64-unit model. The optimized model is 73% smaller (from 280KB to 76KB) and 47% more accurate. These results show that larger models do not always yield better forecasts.
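
The compression knob is just the LSTM hidden size; a small PyTorch sketch (architecture details are assumptions, not the paper's exact model) makes the trade-off concrete by comparing parameter counts of the 128-, 64-, and 16-unit variants.

import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    # Single-layer LSTM plus a linear head that forecasts from the last time step.
    def __init__(self, hidden_units: int, num_features: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(num_features, hidden_units, batch_first=True)
        self.head = nn.Linear(hidden_units, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)          # (batch, time, hidden)
        return self.head(out[:, -1])   # next-step forecast

for units in (128, 64, 16):
    model = LSTMForecaster(units)
    n_params = sum(p.numel() for p in model.parameters())
    print(units, n_params)  # parameter count shrinks roughly quadratically in the hidden size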

paper research
OptRot  Taming Weight Outliers with Data-Free Rotations

OptRot Taming Weight Outliers with Data-Free Rotations

The presence of outliers in the weights and activations of Large Language Models (LLMs) makes them difficult to quantize. Recent work has leveraged rotations to mitigate these outliers. In this work, we propose methods that learn fusible rotations by minimizing principled and cheap proxy objectives for the weight quantization error. We primarily focus on GPTQ as the quantization method. Our main method is OptRot, which reduces weight outliers simply by minimizing the element-wise fourth power of the rotated weights. We show that OptRot outperforms both Hadamard rotations and more expensive, data-dependent methods like SpinQuant and OSTQuant for weight quantization. It also improves activation quantization in the W4A8 setting. We also propose a data-dependent method, OptRot$^{+}$, that further improves performance by incorporating information on the activation covariance. In the W4A4 setting, we see that both OptRot and OptRot$^{+}$ perform worse, highlighting a trade-off between weight and activation quantization.
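
The stated proxy objective is easy to sketch: learn an orthogonal rotation R that minimizes the element-wise fourth power of the rotated weights. The PyTorch snippet below uses the built-in orthogonal parametrization; applying R on the right and the optimizer settings are assumptions for illustration, not the paper's setup.

import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

W = torch.randn(512, 512)                          # stand-in for a weight matrix to be rotated
rot = orthogonal(nn.Linear(512, 512, bias=False))  # .weight is kept orthogonal by the parametrization
opt = torch.optim.Adam(rot.parameters(), lr=1e-3)

for step in range(200):
    R = rot.weight
    loss = torch.mean((W @ R) ** 4)   # fourth-power proxy: penalizes weight outliers after rotation
    opt.zero_grad()
    loss.backward()
    opt.step()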

paper research
Output Embedding Centering for Stable LLM Pretraining

Output Embedding Centering for Stable LLM Pretraining

Pretraining of large language models is not only expensive but also prone to certain training instabilities. A specific instability that often occurs for large learning rates at the end of training is output logit divergence. The most widely used mitigation strategy, z-loss, merely addresses the symptoms rather than the underlying cause of the problem. In this paper, we analyze the instability from the perspective of the geometry of the output embeddings and identify its cause. Based on this, we propose output embedding centering (OEC) as a new mitigation strategy, and prove that it suppresses output logit divergence. OEC can be implemented in two different ways, as a deterministic operation called μ-centering, or a regularization method called μ-loss. Our experiments show that both variants outperform z-loss in terms of training stability and learning rate sensitivity. In particular, they ensure that training converges even for large learning rates when z-loss fails. Furthermore, we find that μ-loss is significantly less sensitive to regularization hyperparameter tuning than z-loss.
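
The deterministic variant is simple to sketch. Assuming μ-centering amounts to subtracting the mean output-embedding vector from every row of the unembedding matrix (a reading of the abstract, not the paper's verbatim procedure), the operation leaves softmax probabilities unchanged, because for any hidden state h the same scalar (mean embedding dotted with h) is subtracted from every logit, while the logit mean is kept near zero.

import torch

@torch.no_grad()
def mu_center_(unembedding_weight: torch.Tensor) -> None:
    # unembedding_weight: (vocab_size, hidden_dim); e.g. call after each optimizer step.
    unembedding_weight -= unembedding_weight.mean(dim=0, keepdim=True)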

paper research
Path Integral Solution for Dissipative Generative Dynamics

Path Integral Solution for Dissipative Generative Dynamics

Can purely mechanical systems generate intelligent language? We prove that dissipative quantum dynamics with analytically tractable non-local context aggregation produce coherent text generation, while conservation laws cause fundamental failure. Employing Koopman operators with closed-form path integral propagators, we show irreversible computation fundamentally requires both controlled information dissipation and causal context aggregation. Spectral analysis reveals emergent eigenvalue structure, separating into decay modes (forgetting), growth modes (amplification), and neutral modes (preservation) -- the essential ingredients for directed information flow. Hamiltonian constraints force the elimination of these dissipative modes, degrading performance despite unchanged model capacity. This establishes language generation as dissipative quantum field theory, proving mechanical systems acquire intelligence through the combination of dissipation and non-locality, not through conservation.

paper research
Practical Geometric and Quantum Kernel Methods for Predicting Skeletal Muscle Outcomes in chronic obstructive pulmonary disease

Practical Geometric and Quantum Kernel Methods for Predicting Skeletal Muscle Outcomes in chronic obstructive pulmonary disease

Skeletal muscle dysfunction is a clinically relevant extra-pulmonary manifestation of chronic obstructive pulmonary disease (COPD) and is closely linked to systemic and airway inflammation. This motivates predictive modelling of muscle outcomes from minimally invasive biomarkers that can be acquired longitudinally. We study a small-sample preclinical dataset comprising 213 animals across two conditions (Sham versus cigarette-smoke exposure), with blood and bronchoalveolar lavage fluid measurements and three continuous targets: tibialis anterior muscle weight (milligram, mg), specific force (millinewton, mN), and a derived muscle quality index (mN per mg). We benchmark tuned classical baselines, geometry-aware symmetric positive definite (SPD) descriptors with Stein divergence, and quantum kernel models designed for low-dimensional tabular data. In the muscle-weight setting, quantum kernel ridge regression using four interpretable inputs (blood C-reactive protein, neutrophil count, bronchoalveolar lavage cellularity, and condition) attains a test root mean squared error of 4.41 mg and a coefficient of determination of 0.605, improving over a matched ridge baseline on the same feature set (4.70 mg and 0.553). Geometry-informed Stein-divergence prototype distances yield a smaller but consistent gain in the biomarker-only setting (4.55 mg versus 4.79 mg). Screening-style evaluation, obtained by thresholding the continuous outcome at 0.8 times the training Sham mean, achieves an area under the receiver operating characteristic curve (ROC-AUC) of up to 0.90 for detecting low muscle weight. These results indicate that geometric and quantum kernel lifts can provide measurable benefits in low-data, low-feature biomedical prediction problems, while preserving interpretability and transparent model selection.

paper research
Preventing Reward Hacking in Reinforced Diffusion Models

Preventing Reward Hacking in Reinforced Diffusion Models

Fine-tuning diffusion models via online reinforcement learning (RL) has shown great potential for enhancing text-to-image alignment. However, since precisely specifying a ground-truth objective for visual tasks remains challenging, the models are often optimized using a proxy reward that only partially captures the true goal. This mismatch often leads to reward hacking, where proxy scores increase while real image quality deteriorates and generation diversity collapses. While common solutions add regularization against the reference policy to prevent reward hacking, they compromise sample efficiency and impede the exploration of novel, high-reward regions, as the reference policy is usually sub-optimal. To address the competing demands of sample efficiency, effective exploration, and mitigation of reward hacking, we propose Gated and Adaptive Regularization with Diversity-aware Optimization (GARDO), a versatile framework compatible with various RL algorithms. Our key insight is that regularization need not be applied universally; instead, it is highly effective to selectively penalize a subset of samples that exhibit high uncertainty. To address the exploration challenge, GARDO introduces an adaptive regularization mechanism wherein the reference model is periodically updated to match the capabilities of the online policy, ensuring a relevant regularization target. To address the mode collapse issue in RL, GARDO amplifies the rewards for high-quality samples that also exhibit high diversity, encouraging mode coverage without destabilizing the optimization process. Extensive experiments across diverse proxy rewards and hold-out unseen metrics consistently show that GARDO mitigates reward hacking and enhances generation diversity without sacrificing sample efficiency or exploration, highlighting its effectiveness and robustness.

paper research
Pruning for Precision  Aligning LLM Training and Inference

Pruning for Precision Aligning LLM Training and Inference

Reinforcement learning for large language models (LLMs) faces a fundamental tension: high-throughput inference engines and numerically-precise training systems produce different probability distributions from the same parameters, creating a training-inference mismatch. We prove this mismatch has an asymmetric effect: the bound on log-probability mismatch scales as $(1-p)$ where $p$ is the token probability. For high-probability tokens, this bound vanishes, contributing negligibly to sequence-level mismatch. For low-probability tokens in the tail, the bound remains large, and moreover, when sampled, these tokens exhibit systematically biased mismatches that accumulate over sequences, destabilizing gradient estimation. Rather than applying post-hoc corrections, we propose constraining the RL objective to a dynamically pruned "safe vocabulary" that excludes the extreme tail. By pruning such tokens, we trade large, systematically biased mismatches for a small, bounded optimization bias. Empirically, our method achieves stable training; theoretically, we bound the optimization bias introduced by vocabulary pruning.
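
One plausible reading of the safe-vocabulary constraint, sketched in PyTorch (the threshold rule is an assumption for illustration, not the paper's exact criterion): drop sampled tokens whose probability under the current policy falls in the extreme tail before forming the sequence-level objective.

import torch

def safe_vocab_sequence_logprob(logits: torch.Tensor, tokens: torch.Tensor,
                                min_prob: float = 1e-4) -> torch.Tensor:
    # logits: (T, V); tokens: (T,) sampled token ids.
    logprobs = torch.log_softmax(logits, dim=-1)
    tok_logp = logprobs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)   # (T,)
    safe = (tok_logp.exp() >= min_prob).float()                        # mask out extreme-tail tokens
    return (tok_logp * safe).sum()                                     # masked sequence log-probability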

paper research

Reducing Off-Policy Mismatch in LLM-RL with Trust Region Masking

Policy gradient methods for large language models optimize a surrogate objective computed from samples of a rollout policy $π_{roll}$. When $π_{roll} \neq π_θ$, there is approximation error between the surrogate and the true objective. Prior work has shown that this off-policy mismatch is unavoidable in modern LLM-RL due to implementation divergence, mixture-of-experts routing discontinuities, and distributed training staleness. Classical trust region bounds on the resulting error scale as $O(T^2)$ with sequence length $T$, rendering them vacuous for long-horizon tasks. We derive two tighter bounds: a Pinsker-Marginal bound scaling as $O(T^{3/2})$ and a Mixed bound scaling as $O(T)$. Crucially, both bounds depend on $D_{KL}^{tok,max}$ -- the maximum token-level KL divergence across all positions in a sequence. This is inherently a sequence-level quantity: it requires examining the entire trajectory to compute, and therefore cannot be controlled by token-independent methods like PPO clipping. We propose Trust Region Masking (TRM), which excludes entire sequences from gradient computation if any token violates the trust region, providing the first non-vacuous monotonic improvement guarantees for long-horizon LLM-RL.
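
A minimal sketch of sequence-level masking (the per-token KL estimator and the threshold are illustrative assumptions): estimate the token-level divergence between rollout and current policies and discard an entire sequence whenever any token exceeds the trust region, since the bound depends on the maximum over positions.

import torch

def trust_region_mask(logp_roll: torch.Tensor, logp_theta: torch.Tensor,
                      eps: float = 0.05) -> torch.Tensor:
    # logp_roll, logp_theta: (batch, T) log-probs of the sampled tokens under each policy.
    log_ratio = logp_theta - logp_roll
    kl_tok = log_ratio.exp() - 1.0 - log_ratio          # nonnegative per-token KL estimate
    return (kl_tok.max(dim=-1).values <= eps).float()   # 1 = keep sequence, 0 = exclude from gradient

# Usage: multiply each sequence's policy-gradient loss by this mask before averaging.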

paper research
Refinement Provenance Inference  Detecting LLM-Refined Training Prompts from Model Behavior

Refinement Provenance Inference Detecting LLM-Refined Training Prompts from Model Behavior

Instruction tuning increasingly relies on LLM-based prompt refinement, where prompts in the training corpus are selectively rewritten by an external refiner to improve clarity and instruction alignment. This motivates an instance-level audit problem: for a fine-tuned model and a training prompt-response pair, can we infer whether the model was trained on the original prompt or its LLM-refined version within a mixed corpus? This matters for dataset governance and dispute resolution when training data are contested. However, it is non-trivial in practice: refined and raw instances are interleaved in the training corpus with unknown, source-dependent mixture ratios, making it harder to develop provenance methods that generalize across models and training setups. In this paper, we formalize this audit task as Refinement Provenance Inference (RPI) and show that prompt refinement yields stable, detectable shifts in teacher-forced token distributions, even when semantic differences are not obvious. Building on this phenomenon, we propose RePro, a logit-based provenance framework that fuses teacher-forced likelihood features with logit-ranking signals. During training, RePro learns a transferable representation via shadow fine-tuning, and uses a lightweight linear head to infer provenance on unseen victims without training-data access. Empirically, RePro consistently attains strong performance and transfers well across refiners, suggesting that it exploits refiner-agnostic distribution shifts rather than rewrite-style artifacts.

paper research
RiskGuard RL  Navigating High-Leverage Futures Trading

RiskGuard RL Navigating High-Leverage Futures Trading

Futures are contracts obligating the exchange of an asset at a predetermined date and price; they are notable for their high leverage and liquidity and therefore thrive in the crypto market. RL has been widely applied to various quantitative tasks. However, most methods focus on the spot market and cannot be directly applied to the high-leverage futures market because of two challenges. First, high leverage amplifies reward fluctuations, making training stochastic and difficult to converge. Second, prior works lacked self-awareness of capability boundaries, exposing them to the risk of significant loss when encountering a new market state (e.g., a black swan event like COVID-19). To tackle these challenges, we propose the Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading (FineFT), a novel three-stage ensemble RL framework with stable training and proper risk management. In stage I, ensemble Q-learners are selectively updated by ensemble TD errors to improve convergence. In stage II, we filter the Q-learners based on their profitability and train VAEs on market states to identify the capability boundaries of the learners. In stage III, we choose between the filtered ensemble and a conservative policy, guided by the trained VAEs, to maintain profitability and mitigate risk under new market states. Through extensive experiments on crypto futures in a high-fidelity, high-frequency trading environment with 5x leverage, we demonstrate that FineFT outperforms 12 SOTA baselines on 6 financial metrics, reducing risk by more than 40% while achieving superior profitability compared to the runner-up. Visualization of the selective update mechanism shows that different agents specialize in distinct market dynamics, and ablation studies confirm that routing with VAEs effectively reduces maximum drawdown and that the selective update improves convergence and performance.

paper research
Robust and Efficient Zeroth-Order LLM Fine-Tuning via Adaptive Bayesian Subspace Optimizer

Robust and Efficient Zeroth-Order LLM Fine-Tuning via Adaptive Bayesian Subspace Optimizer

Fine-tuning large language models (LLMs) with zeroth-order (ZO) optimization reduces memory by approximating gradients through function evaluations. However, existing methods essentially perform updates in a one-dimensional space, and suffer from collapse or substantial performance degradation under low-precision training. We introduce BSZO, an adaptive Bayesian Subspace Zeroth-Order Optimizer, which applies Kalman filtering to combine finite-difference information across multiple perturbation directions within a subspace. By treating each finite-difference measurement as a noisy observation, BSZO builds a posterior distribution over the subspace-projected gradient and updates it through Bayesian inference, with a residual-based adaptive mechanism to adapt to noise variations. Theoretical analysis shows that BSZO improves the convergence rate by a factor of $k/γ$ compared to standard ZO methods. Experiments on RoBERTa, Mistral, and OPT models show that BSZO outperforms the baselines across various tasks, achieving up to 6.67% absolute average improvement on OPT-13B while remaining robust under fp16/bf16 precision and keeping memory usage close to inference-only baselines (1.00x to 1.08x of MeZO).

paper research
Safety at One Shot  Patching Fine-Tuned LLMs with A Single Instance

Safety at One Shot Patching Fine-Tuned LLMs with A Single Instance

Fine-tuning safety-aligned large language models (LLMs) can substantially compromise their safety. Previous approaches require many safety samples or calibration sets, which not only incur significant computational overhead during realignment but also lead to noticeable degradation in model utility. Contrary to this belief, we show that safety alignment can be fully recovered with only a single safety example, without sacrificing utility and at minimal cost. Remarkably, this recovery is effective regardless of the number of harmful examples used in fine-tuning or the size of the underlying model, and convergence is achieved within just a few epochs. Furthermore, we uncover the low-rank structure of the safety gradient, which explains why such efficient correction is possible. We validate our findings across five safety-aligned LLMs and multiple datasets, demonstrating the generality of our approach.

paper research

Sat-EnQ Satisficing to Optimize in Reinforcement Learning

Deep Q-learning algorithms remain notoriously unstable, especially during early training when the maximization operator amplifies estimation errors. Inspired by bounded rationality theory and developmental learning, we introduce Sat-EnQ, a two-phase framework that first learns to be "good enough" before optimizing aggressively. In Phase 1, we train an ensemble of lightweight Q-networks under a satisficing objective that limits early value growth using a dynamic baseline, producing diverse, low-variance estimates while avoiding catastrophic overestimation. In Phase 2, the ensemble is distilled into a larger network and fine-tuned with standard Double DQN. We prove theoretically that satisficing induces bounded updates and cannot increase target variance, with a corollary quantifying conditions for substantial reduction. Empirically, Sat-EnQ achieves 3.8x variance reduction, eliminates catastrophic failures (0% vs 50% for DQN), maintains 79% performance under environmental noise, and requires 2.5x less compute than bootstrapped ensembles. Our results highlight a principled path toward robust reinforcement learning by embracing satisficing before optimization.

paper research
SB-TRPO  Balancing Safety and Reward in Reinforcement Learning

SB-TRPO Balancing Safety and Reward in Reinforcement Learning

In safety-critical domains, reinforcement learning (RL) agents must often satisfy strict, zero-cost safety constraints while accomplishing tasks. Existing model-free methods frequently either fail to achieve near-zero safety violations or become overly conservative. We introduce Safety-Biased Trust Region Policy Optimisation (SB-TRPO), a principled algorithm for hard-constrained RL that dynamically balances cost reduction with reward improvement. At each step, SB-TRPO updates via a dynamic convex combination of the reward and cost natural policy gradients, ensuring a fixed fraction of optimal cost reduction while using remaining update capacity for reward improvement. Our method comes with formal guarantees of local progress on safety, while still improving reward whenever gradients are suitably aligned. Experiments on standard and challenging Safety Gymnasium tasks demonstrate that SB-TRPO consistently achieves the best balance of safety and task performance in the hard-constrained regime.

paper research
Scaling Laws Beyond Singles  The Familial Model Revolution

Scaling Laws Beyond Singles The Familial Model Revolution

Neural scaling laws have become foundational for optimizing large language model (LLM) training, yet they typically assume a single dense model output. This limitation effectively overlooks familial models, a transformative paradigm essential for realizing ubiquitous intelligence across heterogeneous device-edge-cloud hierarchies. Transcending static architectures, familial models integrate early exits with relay-style inference to spawn G deployable sub-models from a single shared backbone. In this work, we theoretically and empirically extend the scaling law to capture this one-run, many-models paradigm by introducing Granularity (G) as a fundamental scaling variable alongside model size (N) and training tokens (D). To rigorously quantify this relationship, we propose a unified functional form L(N, D, G) and parameterize it using large-scale empirical runs. Specifically, we employ a rigorous IsoFLOP experimental design to strictly isolate architectural impact from computational scale. Across fixed budgets, we systematically sweep model sizes (N) and granularities (G) while dynamically adjusting tokens (D). This approach effectively decouples the marginal cost of granularity from the benefits of scale, ensuring high-fidelity parameterization of our unified scaling law. Our results reveal that the granularity penalty follows a multiplicative power law with an extremely small exponent. Theoretically, this bridges fixed-compute training with dynamic architectures. Practically, it validates the "train once, deploy many" paradigm, demonstrating that deployment flexibility is achievable without compromising the compute-optimality of dense baselines.
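
One concrete reading of the unified form, under the assumption of a standard Chinchilla-style dense term (the abstract specifies only the multiplicative structure of the granularity penalty): $L(N, D, G) = (E + A/N^{a} + B/D^{b}) \cdot G^{c}$ with a very small exponent $c$, so that at a fixed compute budget the loss penalty for supporting $G$ sub-models enters only as a mild multiplicative factor.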

paper research
Self-Critique Empowers LLM Planning Gains

Self-Critique Empowers LLM Planning Gains

We demonstrate an approach for LLMs to critique their own answers with the goal of enhancing their performance, leading to significant improvements over established planning benchmarks. Despite the findings of earlier research that has cast doubt on the effectiveness of LLMs leveraging self-critique methods, we show significant performance gains on planning datasets in the Blocksworld domain through intrinsic self-critique, without an external source such as a verifier. We also demonstrate similar improvements on Logistics and Mini-grid datasets, exceeding strong baseline accuracies. We employ a few-shot learning technique and progressively extend it to a many-shot approach as our base method and demonstrate that it is possible to gain substantial improvement on top of this already competitive approach by employing an iterative process for correction and refinement. We illustrate how self-critique can significantly boost planning performance. Our empirical results present a new state-of-the-art on the class of models considered, namely LLM model checkpoints from October 2024. Our primary focus lies on the method itself, demonstrating intrinsic self-improvement capabilities that are applicable regardless of the specific model version, and we believe that applying our method to more complex search techniques and more capable models will lead to even better performance.

paper research
Semi-overlapping Multi-bandit Best Arm Identification for Sequential Support Network Learning

Semi-overlapping Multi-bandit Best Arm Identification for Sequential Support Network Learning

Many modern AI and ML problems require evaluating partners' contributions through shared yet asymmetric, computationally intensive processes and the simultaneous selection of the most beneficial candidates. Sequential approaches to these problems can be unified under a new framework, Sequential Support Network Learning (SSNL), in which the goal is to select the most beneficial candidate set of partners for all participants using trials; that is, to learn a directed graph that represents the highest-performing contributions. We demonstrate that a new pure-exploration model, the semi-overlapping multi-(multi-armed) bandit (SOMMAB), in which a single evaluation provides distinct feedback to multiple bandits due to structural overlap among their arms, can be used to learn a support network from sparse candidate lists efficiently. We develop a generalized GapE algorithm for SOMMABs and derive new exponential error bounds that improve the best known constant in the exponent for multi-bandit best-arm identification. The bounds scale linearly with the degree of overlap, revealing significant sample-complexity gains arising from shared evaluations. From an application point of view, this work provides a theoretical foundation and improved performance guarantees for sequential learning tools for identifying support networks from sparse candidates in multiple learning problems, such as in multi-task learning (MTL), auxiliary task learning (ATL), federated learning (FL), and in multi-agent systems (MAS).

paper research

Sharper Bounds for Private and Robust Model Alignment

In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online settings. We consider preference labels subject to privacy constraints and/or adversarial corruption, and analyze two distinct interplays between them: privacy-first and corruption-first. For the privacy-only setting, we show that log loss with an MLE-style algorithm achieves near-optimal rates, in contrast to conventional wisdom. For the joint privacy-and-corruption setting, we first demonstrate that existing offline algorithms in fact provide stronger guarantees -- simultaneously in terms of corruption level and privacy parameters -- than previously known, which further yields improved bounds in the corruption-only regime. In addition, we also present the first set of results for private and robust online alignment. Our results are enabled by new uniform convergence guarantees for log loss and square loss under privacy and corruption, which we believe have broad applicability across learning theory and statistics.

paper research
SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation

SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation

SmartFlow is a multi-layered framework that integrates Reinforcement Learning and Agentic AI to address the dynamic rebalancing problem in urban bike-sharing services. Its architecture separates strategic, tactical, and communication functions for clarity and scalability. At the strategic level, a Deep Q-Network (DQN) agent, trained in a high-fidelity simulation of New York's Citi Bike network, learns robust rebalancing policies by modelling the challenge as a Markov Decision Process. These high-level strategies feed into a deterministic tactical module that optimises multi-leg journeys and schedules just-in-time dispatches to minimise fleet travel. Evaluation across multiple seeded runs demonstrates SmartFlow's high efficacy, reducing network imbalance by over 95% while requiring minimal travel distance and achieving strong truck utilisation. A communication layer, powered by a grounded Agentic AI with a Large Language Model (LLM), translates logistical plans into clear, actionable instructions for operational staff, ensuring interpretability and execution readiness. This integration bridges machine intelligence with human operations, offering a scalable solution that reduces idle time, improves bike availability, and lowers operational costs. SmartFlow provides a blueprint for interpretable, AI-driven logistics in complex urban mobility networks.

paper research
Splitwise  Adaptive Edge-Cloud LLM Partitioning with DRL

Splitwise Adaptive Edge-Cloud LLM Partitioning with DRL

Deploying large language models (LLMs) on edge devices is challenging due to their limited memory and power resources. Cloud-only inference reduces device burden but introduces high latency and cost. Static edge-cloud partitions optimize a single metric and struggle when bandwidth fluctuates. We propose Splitwise, a novel Lyapunov-assisted deep reinforcement learning (DRL) framework for fine-grained, adaptive partitioning of LLMs across edge and cloud environments. Splitwise decomposes transformer layers into attention heads and feed-forward sub-blocks, exposing more partition choices than layer-wise schemes. A hierarchical DRL policy, guided by Lyapunov optimization, jointly minimizes latency, energy consumption, and accuracy degradation while guaranteeing queue stability under stochastic workloads and variable network bandwidth. Splitwise also guarantees robustness via partition checkpoints with exponential backoff recovery in case of communication failures. Experiments on Jetson Orin NX, Galaxy S23, and Raspberry Pi 5 with GPT-2 (1.5B), LLaMA-7B, and LLaMA-13B show that Splitwise reduces end-to-end latency by 1.4x-2.8x and cuts energy consumption by up to 41% compared with existing partitioners. It lowers the 95th-percentile latency by 53-61% relative to cloud-only execution, while maintaining accuracy and modest memory requirements.

paper research
SPoRC-VIST  A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models

SPoRC-VIST A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models

Vision-Language Models (VLMs) have achieved remarkable success in descriptive tasks such as image captioning and visual question answering (VQA). However, their ability to generate engaging, long-form narratives -- specifically multi-speaker podcast dialogues -- remains under-explored and difficult to evaluate. Standard metrics like BLEU and ROUGE fail to capture the nuances of conversational naturalness, personality, and narrative flow, often rewarding safe, repetitive outputs over engaging storytelling. In this work, we present a novel pipeline for end-to-end visual podcast generation, and fine-tune a Qwen3-VL-32B model on a curated dataset of 4,000 image-dialogue pairs. Crucially, we use a synthetic-to-real training strategy: we train on high-quality podcast dialogues from the Structured Podcast Research Corpus (SPoRC) paired with synthetically generated imagery, and evaluate on real-world photo sequences from the Visual Storytelling Dataset (VIST). This rigorous setup tests the model's ability to generalize from synthetic training data to real-world visual domains. We propose a comprehensive evaluation framework that moves beyond textual overlap, and use AI-as-a-judge (Gemini 3 Pro, Claude Opus 4.5, GPT 5.2) and novel style metrics (average turn length, speaker switch rate) to assess quality. Our experiments demonstrate that our fine-tuned 32B model significantly outperforms a 235B base model in conversational naturalness (>80% win rate) and narrative depth (+50% turn length), while maintaining identical visual grounding capabilities (CLIPScore 20.39).

paper research

Stronger Approximation Guarantees for Non-Monotone γ-Weakly DR-Submodular Maximization

Maximizing submodular objectives under constraints is a fundamental problem in machine learning and optimization. We study the maximization of a nonnegative, non-monotone $γ$-weakly DR-submodular function over a down-closed convex body. Our main result is an approximation algorithm whose guarantee depends smoothly on $γ$; in particular, when $γ=1$ (the DR-submodular case) our bound recovers the $0.401$ approximation factor, while for $γ<1$ the guarantee degrades gracefully and improves upon previously reported bounds for $γ$-weakly DR-submodular maximization under the same constraints. Our approach combines a Frank-Wolfe-guided continuous-greedy framework with a $γ$-aware double-greedy step, yielding a simple yet effective procedure for handling non-monotonicity. This results in state-of-the-art guarantees for non-monotone $γ$-weakly DR-submodular maximization over down-closed convex bodies.

paper research
The Two-Stage Decision-Sampling Hypothesis  Understanding the Emergence of Self-Reflection in RL-Trained LLMs

The Two-Stage Decision-Sampling Hypothesis Understanding the Emergence of Self-Reflection in RL-Trained LLMs

Self-reflection capabilities emerge in Large Language Models after RL post-training, with multi-turn RL achieving substantial gains over SFT counterparts. Yet the mechanism of how a unified optimization objective gives rise to functionally distinct capabilities of generating solutions and evaluating when to revise them remains opaque. To address this question, we introduce the Gradient Attribution Property to characterize how reward gradients distribute across policy components, formalized through the Two-Stage Decision-Sampling (DS) Hypothesis, which decomposes the policy into sampling ($π_{sample}$) for generation and decision ($π_{d}$) for verification. We prove that surrogate rewards exhibit Balanced Gradient Attribution, while SFT and KL penalties exhibit Unbalanced Gradient Attribution, with length-weighting creating asymmetric regularization that constrains $π_{sample}$ while leaving $π_{d}$ under-optimized, providing a theoretical explanation of why RL succeeds where SFT fails. Empirical validation of our theoretical predictions on arithmetic reasoning demonstrates that RL's superior generalization stems primarily from improved decision-making ($π_{d}$) rather than sampling capabilities, providing a first-principles mechanistic explanation for self-correction in thinking models.

paper research
Theoretical Convergence of SMOTE-Generated Samples

Theoretical Convergence of SMOTE-Generated Samples

Imbalanced data affects a wide range of machine learning applications, from healthcare to network security. As SMOTE is one of the most popular approaches to addressing this issue, it is imperative to validate it not only empirically but also theoretically. In this paper, we provide a rigorous theoretical analysis of SMOTE's convergence properties. Concretely, we prove that the synthetic random variable Z converges in probability to the underlying random variable X. We further prove a stronger convergence in mean when X is compact. Finally, we show that lower values of the nearest neighbor rank lead to faster convergence, offering actionable guidance to practitioners. The theoretical results are supported by numerical experiments using both real-life and synthetic data. Our work provides a foundational understanding that enhances data augmentation techniques beyond imbalanced data scenarios.
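
For concreteness, a small numpy sketch of the SMOTE interpolation that the analysis concerns, plus an empirical peek at the k-dependence (brute-force neighbors, toy data, illustrative parameters): each synthetic point is Z = X_i + U (X_nn - X_i) with U uniform on (0, 1) and X_nn one of the k nearest minority neighbors.

import numpy as np

def smote_samples(X: np.ndarray, k: int, n_new: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # brute-force pairwise distances
    np.fill_diagonal(d, np.inf)
    nn_idx = np.argsort(d, axis=1)[:, :k]                        # k nearest neighbors of each point
    base = rng.integers(0, len(X), size=n_new)
    nbr = nn_idx[base, rng.integers(0, k, size=n_new)]
    u = rng.random((n_new, 1))
    return X[base] + u * (X[nbr] - X[base])                      # interpolate toward a random neighbor

X = np.random.default_rng(1).normal(size=(500, 2))
for k in (1, 5, 25):
    Z = smote_samples(X, k=k, n_new=2000)
    gap = np.min(np.linalg.norm(Z[:, None, :] - X[None, :, :], axis=-1), axis=1).mean()
    print(k, round(float(gap), 3))  # smaller k typically keeps synthetic points closer to X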

paper research
Uncertainty-Aware Evidential Deep Learning  Theory & Evaluation

Uncertainty-Aware Evidential Deep Learning Theory & Evaluation

Evidential deep learning (EDL) models, based on Subjective Logic, introduce a principled and computationally efficient way to make deterministic neural networks uncertainty-aware. The resulting evidential models can quantify fine-grained uncertainty using learned evidence. However, the Subjective-Logic framework constrains evidence to be non-negative, requiring specific activation functions whose geometric properties can induce activation-dependent learning-freeze behavior: a regime where gradients become extremely small for samples mapped into low-evidence regions. We theoretically characterize this behavior and analyze how different evidential activations influence learning dynamics. Building on this analysis, we design a general family of activation functions and corresponding evidential regularizers that provide an alternative pathway for consistent evidence updates across activation regimes. Extensive experiments on four benchmark classification problems (MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet), two few-shot classification problems, and a blind face restoration problem empirically validate the developed theory and demonstrate the effectiveness of the proposed generalized regularized evidential models.
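
For orientation, a minimal sketch of the Subjective-Logic-style evidential head the paper builds on (softplus is one common choice of non-negative activation; the paper's generalized activations and regularizers are not reproduced here):

import torch
import torch.nn.functional as F

def evidential_outputs(logits: torch.Tensor):
    evidence = F.softplus(logits)                 # non-negative evidence per class
    alpha = evidence + 1.0                        # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1, keepdim=True)
    probs = alpha / strength                      # expected class probabilities
    uncertainty = logits.shape[-1] / strength     # vacuity: K / sum(alpha)
    return probs, uncertainty

probs, u = evidential_outputs(torch.full((1, 10), -10.0))  # near-zero evidence
print(probs, u)  # roughly uniform probabilities, uncertainty close to 1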

paper research

Unified Learning Framework Imitation Meets Reinforcement for LLMs

We present a unified framework for Large Language Model (LLM) fine-tuning that integrates Imitation Learning and Reinforcement Learning. By analyzing the gradient of a composite objective combining trajectory-level KL divergence with task rewards, we derive a natural decomposition into two components: (1) an analytically computable Dense Gradient for token-level imitation, and (2) a Monte Carlo estimated Sparse Gradient for long-horizon reward optimization. The Dense Gradient admits a closed-form logit-level formula, enabling efficient GPU implementation.

paper research
Uniform Convergence in Generative & Vision-Language Models with Limited Data

Uniform Convergence in Generative & Vision-Language Models with Limited Data

Modern generative and vision-language models (VLMs) are increasingly used in scientific and medical decision support, where predicted probabilities must be both accurate and well calibrated. Despite strong empirical results with moderate data, it remains unclear when such predictions generalize uniformly across inputs, classes, or subpopulations, rather than only on average, a critical issue in biomedicine, where rare conditions and specific groups can exhibit large errors even when overall loss is low. We study this question from a finite-sample perspective and ask: under what structural assumptions can generative and VLM-based predictors achieve uniformly accurate and calibrated behavior with practical sample sizes? Rather than analyzing arbitrary parameterizations, we focus on induced families of classifiers obtained by varying prompts or semantic embeddings within a restricted representation space. When model outputs depend smoothly on a low-dimensional semantic representation (an assumption supported by spectral structure in text and joint image-text embeddings), classical uniform convergence tools yield meaningful non-asymptotic guarantees. Our main results give finite-sample uniform convergence bounds for accuracy and calibration functionals of VLM-induced classifiers under Lipschitz stability with respect to prompt embeddings. The implied sample complexity depends on intrinsic/effective dimension, not ambient embedding dimension, and we further derive spectrum-dependent bounds that make explicit how eigenvalue decay governs data requirements. We conclude with implications for data-limited biomedical modeling, including when current dataset sizes can support uniformly reliable predictions and why average calibration metrics may miss worst-case miscalibration.

paper research
Value-guided action planning with JEPA world models

Value-guided action planning with JEPA world models

Building deep learning models that can reason about their environment requires capturing its underlying dynamics. Joint-Embedded Predictive Architectures (JEPA) provide a promising framework to model such dynamics by learning representations and predictors through a self-supervised prediction objective. However, their ability to support effective action planning remains limited. We propose an approach to enhance planning with JEPA world models by shaping their representation space so that the negative goal-conditioned value function for a reaching cost in a given environment is approximated by a distance (or quasi-distance) between state embeddings. We introduce a practical method to enforce this constraint during training and show that it leads to significantly improved planning performance compared to standard JEPA models on simple control tasks.

paper research
VL-RouterBench  Evaluating Vision-Language Model Routing Systems

VL-RouterBench Evaluating Vision-Language Model Routing Systems

Multi-model routing has evolved from an engineering technique into essential infrastructure, yet existing work lacks a systematic, reproducible benchmark for evaluating vision-language models (VLMs). We present VL-RouterBench to assess the overall capability of VLM routing systems systematically. The benchmark is grounded in raw inference and scoring logs from VLMs and constructs quality and cost matrices over sample-model pairs. In scale, VL-RouterBench covers 14 datasets across 3 task groups, totaling 30,540 samples, and includes 15 open-source models and 2 API models, yielding 519,180 sample-model pairs and a total input-output token volume of 34,494,977. The evaluation protocol jointly measures average accuracy, average cost, and throughput, and builds a ranking score from the harmonic mean of normalized cost and accuracy to enable comparison across router configurations and cost budgets. On this benchmark, we evaluate 10 routing methods and baselines and observe a significant routability gain, while the best current routers still show a clear gap to the ideal Oracle, indicating considerable room for improvement in router architecture through finer visual cues and modeling of textual structure. We will open-source the complete data construction and evaluation toolchain to promote comparability, reproducibility, and practical deployment in multimodal routing research.

paper research

Warp-Cortex An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware

Current multi-agent Large Language Model (LLM) frameworks suffer from linear memory scaling, rendering System 2 parallel reasoning impractical on consumer hardware. We present Warp Cortex, an asynchronous architecture that theoretically enables million-agent cognitive scaling by decoupling agent logic from physical memory. Through Singleton Weight Sharing and a novel Topological Synapse--inspired by hybrid landmarking techniques from Topological Data Analysis (TDA)--we reduce memory complexity from O(N * L) to O(1) for weights and O(N * k) for context, where k << L. By treating the KV-cache as a point cloud in latent space, we apply witness-complex-inspired sparsification to preserve persistent homological features of the context manifold. On a single NVIDIA RTX 4090, we empirically demonstrate 100 concurrent agents at 2.2 GB total VRAM, with theoretical capacity exceeding 1,000 agents before compute latency becomes the bottleneck. We further introduce Referential Injection, a non-intrusive KV-cache update mechanism that allows asynchronous sub-agents to influence primary generation without stream disruption.

paper research

ZTA-FL Unbreachable IIoT Security Through Zero-Trust Learning

Recent attacks on critical infrastructure, including the 2021 Oldsmar water treatment breach and 2023 Danish energy sector compromises, highlight urgent security gaps in Industrial IoT (IIoT) deployments. While Federated Learning (FL) enables privacy-preserving collaborative intrusion detection, existing frameworks remain vulnerable to Byzantine poisoning attacks and lack robust agent authentication. We propose Zero-Trust Agentic Federated Learning (ZTA-FL), a defense-in-depth framework combining (1) TPM-based cryptographic attestation achieving a false acceptance rate below 0.0000001, (2) a novel SHAP-weighted aggregation algorithm providing explainable Byzantine detection under non-IID conditions with theoretical guarantees, and (3) privacy-preserving on-device adversarial training. Comprehensive experiments across three IDS benchmarks (Edge-IIoTset, CIC-IDS2017, UNSW-NB15) demonstrate that ZTA-FL achieves 97.8 percent detection accuracy, 93.2 percent accuracy under 30 percent Byzantine attacks (outperforming FLAME by 3.1 percent, p less than 0.01), and 89.3 percent adversarial robustness while reducing communication overhead by 34 percent. We provide theoretical analysis, failure mode characterization, and release code for reproducibility.

paper research
