Computer Science / Computer Vision

Exact Computation with Infinitely Wide Neural Networks

How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its width --- namely, number of channels in convolutional layers, and number of nodes in fully-connected internal layers --- is allowed to increase to infinity? Such questions have come to the forefront in the quest to theoretically understand deep learning and its mysteries about optimization and generalization. They also connect deep learning to notions such as Gaussian processes and kernels. A recent paper [Jacot et al., 2018] introduced the Neural Tangent Kernel (NTK) which captures the behavior of fully-connected deep nets in the infinite width limit trained by gradient descent; this object was implicit in some other recent papers. An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width. The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, which we call Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm. This results in a significant new benchmark for the performance of a pure kernel-based method on CIFAR-10, being $10 %$ higher than the methods reported in [Novak et al., 2019], and only $6 %$ lower than the performance of the corresponding finite deep net architecture (once batch normalization, etc. are turned off). Theoretically, we also give the first non-asymptotic proof showing that a fully-trained sufficiently wide net is indeed equivalent to the kernel regression predictor using NTK.

Exact Computation with Infinitely Wide Neural Networks

AI-Based Detection of Pilgrims Using Convolutional Neural Networks

S&CNet Monocular Depth Completion for Autonomous Systems and 3D Reconstruction

Multi-Task Regression-Based Learning for Autonomous Drone Flight Control in Unstructured Outdoor Environments

Genetic Programming for Evolving an Interpretable Model Front for Data Visualization

PathoSyn MRI Synthesis of Imaging-Pathology through Disentangled Deviation Diffusion

Evolving, Not Training Zero-Shot Reasoning Segmentation through Evolutionary Prompting

DarkEQA Assessing Vision-Language Models for Embodied Question Answering in Dimly Lit Indoor Settings

Decoupling Amplitude and Phase Attention in the Frequency Domain for RGB-Event-Based Visual Object Tracking

Neural Turtle Graphics for Modeling City Road Layouts

A Comparative Study of Custom CNNs, Pre-trained Models, and Transfer Learning Across Multiple Visual Datasets

A Comprehensive Dataset for Human vs. AI Generated Image Detection

Adaptive Hybrid Optimizer-based Framework for Lumpy Skin Disease Identification

Agentic Retoucher for Text-To-Image Generation

AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures

Analyzing the Shopping Journey Computing Shelf Browsing Visits in a Physical Retail Store

Application of Deep Learning Techniques in Non-Contrast CT Pulmonary Angiogram for Diagnosing Pulmonary Embolism

Boosting LLMs for AI Vision Few-Shot Prompting & Validation Breakthroughs

CoFi-Dec Combating Hallucinations in LVLMs with Coarse-to-Fine Feedback

CogFlow Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving

Counterfactually Guiding MLLMs Curbing Visual Hallucinations

Deep Learning Aids in Skin Disease Diagnosis

Detecting Performance Degradation under Data Shift in Pathology Vision-Language Model

DrivingGen A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

EgoGrasp World-Space Hand-Object Interaction Estimation from Egocentric Videos

EndoRare One-Shot Synthesis for Gastrointestinal Rarity Training

Enhancing Histopathological Image Classification via Integrated HOG and Deep Features with Robust Noise Performance

Enhancing Object Detection with Privileged Information A Model-Agnostic Teacher-Student Approach

Enhancing Ocular Disease Diagnosis with Pathology Context Networks

EscherVerse An Open World Benchmark and Dataset for Teleo-Spatial Intelligence with Physical-Dynamic and Intent-Driven Understanding

Evaluating Contextual Intelligence in Recyclability A Comprehensive Study of Image-Based Reasoning Systems

Evaluating the Impact of Compression Techniques on the Robustness of CNNs under Natural Corruptions

Evolving CNN Architectures From Custom Designs to Deep Residual Models for Diverse Image Classification and Detection Tasks

F2IDiff Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model

FaithSCAN Guarding Visual Truth in AI Responses

FALCON Few-Shot Adversarial Learning for Cross-Domain Medical Image Segmentation

ForCM Mapping Forest Cover with Deep Learning & OBIA

HaineiFRDM Exploring Diffusion for Restoring Defects in High-Speed Films

HarmoniAD Bridging Structure and Semantics for Precise Anomaly Detection

Holi-DETR Contextual Holistic Fashion Detection Transformer

HY-Motion 1.0 Text-To-3D Motion Revolution

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment

ITSELF Attention Guided Fine-Grained Alignment for Vision-Language Retrieval

LinMU Simplifying Multimodal Understanding with Linearization

Luminark Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models

MF-RSVLM Enhancing Remote Sensing with Multi-Feature Fusion

Noise-Robust Tiny Object Localization with Flows

OpenGround Pioneering Zero-Shot 3D Visual Grounding

PathFound Dynamic Evidence-seeking in Pathological Diagnosis

PEG-DRNet Hybrid Modeling for Infrared Gas Leak Detection

PipeFlow Scalable Video Editing with Motion-Aware Frame Selection

PointRAFT Estimating Potato Tuber Weight from Incomplete 3D Data

RefSR-Adv Adversarial Attack on Reference-based Image Super-Resolution Models

Remote Sensing Change Detection via Weak Temporal Supervision

Revolutionizing Thin Structure Segmentation Meet TopoLoRA-SAM

RSAgent Iterative Reasoning for Precision Text-Guided Segmentation

ShowUI-$π$ The Dexterous Hand of GUIs

Slot-ID Identity-Preserving Video Generation from Reference Videos via Slot-Based Temporal Identity Encoding

SpaceTimePilot Generative Rendering of Dynamic Scenes Across Space and Time

Synthetic Boost Enhancing Anomaly Detection in Manufacturing

Temporal Inpainting for Anomaly Detection in Satellite Imagery

Temporal Precision Unlocking Event-Level Video-Text Synchronization

VerLM Explaining Face Verification Using Natural Language

VIBE Visual Instruction Based Editor

Video and Language Alignment in 2D Systems for 3D Multi-object Scenes with Multi-Information Derivative-Free Control

VideoSpeculateRAG Efficient Visual Knowledge Integration for QA

ViLaCD-R1 Semantically Smart Remote Sensing Change Detection

Virtual-Eyes Enhancing LDCT Quality for Lung Cancer AI Detection

VisNet Efficient ReID with Alpha-Divergence and Dynamic Learning

VIT-Ped Visionary Intention Transformer for Pedestrian Behavior Analysis

WildIng A Wildlife Image Invariant Representation Model for Geographical Domain Shift

< Category Statistics (Total: 566) >

Start searching

No results found