Computer Vision

A Comparative Study of Custom CNNs, Pre-trained Models, and Transfer Learning Across Multiple Visual Datasets

Convolutional Neural Networks (CNNs) are a standard approach to visual recognition because they learn hierarchical representations directly from raw pixels. Practitioners typically choose among three options: (i) training a compact custom CNN from scratch, (ii) using a large pre-trained CNN as a fixed feature extractor, and (iii) performing transfer learning via partial or full fine-tuning of a pre-trained backbone. This report presents a controlled comparison of these three paradigms across five real-world image classification datasets spanning road-surface defect recognition, agricultural variety identification, fruit/leaf disease recognition, pedestrian walkway encroachment recognition, and unauthorized vehicle recognition. Models are evaluated using accuracy and macro F1-score, complemented by efficiency metrics including training time per epoch and parameter counts. The results show that transfer learning consistently yields the strongest predictive performance, while the custom CNN provides an attractive efficiency-accuracy trade-off, especially when compute and memory budgets are constrained.
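The abstract reports both accuracy and macro F1-score. As a minimal sketch of why the two can diverge on imbalanced classes, the snippet below computes both in plain Python. The function names and the toy labels ("crack", "pothole") are illustrative assumptions, not taken from the report or its datasets.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class
        # has no true or predicted instances contributing.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels: a classifier that always predicts the majority class.
y_true = ["crack", "crack", "crack", "crack", "pothole"]
y_pred = ["crack", "crack", "crack", "crack", "crack"]

print(accuracy(y_true, y_pred))   # high, because the majority class dominates
print(macro_f1(y_true, y_pred))   # much lower, because "pothole" is never found
```

In this toy case the minority class contributes an F1 of zero, so macro F1 falls well below accuracy, which is the usual motivation for reporting both metrics.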

A Comprehensive Dataset for Human vs. AI Generated Image Detection

Adaptive Hybrid Optimizer-based Framework for Lumpy Skin Disease Identification

Agentic Retoucher for Text-To-Image Generation

AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures

Analyzing the Shopping Journey: Computing Shelf Browsing Visits in a Physical Retail Store

Application of deep learning techniques in non-contrast computed tomography pulmonary angiogram for pulmonary embolism diagnosis

CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving

DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments

Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking

DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion

Detecting Performance Degradation under Data Shift in Pathology Vision-Language Model

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos

Enhancing Histopathological Image Classification via Integrated HOG and Deep Features with Robust Noise Performance

Enhancing Object Detection with Privileged Information: A Model-Agnostic Teacher-Student Approach

EscherVerse: An Open World Benchmark and Dataset for Teleo-Spatial Intelligence with Physical-Dynamic and Intent-Driven Understanding

Evaluating Contextual Intelligence in Recyclability: A Comprehensive Study of Image-Based Reasoning Systems

Evaluating the Impact of Compression Techniques on the Robustness of CNNs under Natural Corruptions

Evolving CNN Architectures: From Custom Designs to Deep Residual Models for Diverse Image Classification and Detection Tasks

Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting

F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model

FALCON: Few-Shot Adversarial Learning for Cross-Domain Medical Image Segmentation

HaineiFRDM: Exploring Diffusion for Restoring Defects in High-Speed Films

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment

ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval

LinMU: Simplifying Multimodal Understanding with Linearization

Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models

Noise-Robust Tiny Object Localization with Flows

PathoSyn: Imaging-Pathology MRI Synthesis via Disentangled Deviation Diffusion

RefSR-Adv: Adversarial Attack on Reference-based Image Super-Resolution Models

Remote Sensing Change Detection via Weak Temporal Supervision

Slot-ID: Identity-Preserving Video Generation from Reference Videos via Slot-Based Temporal Identity Encoding

SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

SwinIFS: Landmark Guided Swin Transformer For Identity Preserving Face Super Resolution

VerLM: Explaining Face Verification Using Natural Language

VIBE: Visual Instruction Based Editor

Video and Language Alignment in 2D Systems for 3D Multi-object Scenes with Multi-Information Derivative-Free Control

VIT-Ped: Visionary Intention Transformer for Pedestrian Behavior Analysis

WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift

< Category Statistics (Total: 301) >
