Image Rotation Angle Estimation: Comparing Circular-Aware Methods

Maximilian Woehrer (a,b)
a) Research Group Software Architecture, Faculty of Computer Science, University of Vienna, Vienna, Austria
b) MAXI solutions e.U., Vienna, Austria

Keywords: rotation estimation, circular regression, angular prediction, transfer learning, image orientation, deep learning

Abstract

Automatic image rotation estimation is a key preprocessing step in many vision pipelines. This task is challenging because angles have circular topology, creating boundary discontinuities that hinder standard regression methods. We present a comprehensive study of five circular-aware methods for global orientation estimation: direct angle regression with circular loss, classification via angular binning, unit-vector regression, phase-shifting coder, and circular Gaussian distribution. Using transfer learning from ImageNet-pretrained models, we systematically evaluate these methods across sixteen modern architectures by adapting their output heads for rotation-specific predictions. Our results show that probabilistic methods, particularly the circular Gaussian distribution, are the most robust across architectures, while classification achieves the best accuracy on well-matched backbones but suffers training instabilities on others. The best configuration (classification with EfficientViT-B3) achieves a mean absolute error (MAE) of 1.23° (mean across five independent runs) on the DRC-D dataset, while the circular Gaussian distribution with MambaOut Base achieves a virtually identical 1.24° with greater robustness across backbones. Training and evaluating our top-performing method-architecture combinations on COCO 2014, the best configuration reaches 3.71° MAE, improving substantially over prior work, with further improvement to 2.84° on the larger COCO 2017 dataset.
1. Introduction

Image rotation estimation is a common preprocessing step in computer vision pipelines, ensuring that images are properly oriented before further analysis. This task can be approached in two ways: discrete classification, which predicts cardinal rotations (e.g., 0°, 90°, 180°, 270°), or continuous prediction across the full 360° range.

Continuous angle prediction is particularly challenging because angles have circular topology, creating two fundamental problems. First, the same physical orientation can be expressed by infinitely many numerical values: 0°, 360°, and 720° all describe identical rotations. Second, the circular boundary creates artificial discontinuities where physically similar angles appear numerically distant: 359° and 1° are only 2° apart in reality but appear 358° apart numerically.

These circular properties cause significant problems for standard deep learning approaches. When neural networks predict angles as scalar values using typical L1 or L2 losses, they treat equivalent orientations as different targets and compute artificially large errors at the circular boundary [1, 2]. This severely hampers learning and model performance unless the circular nature of the data is explicitly addressed.

While recent work in oriented object detection and pose estimation has shown the benefits of circular-aware methods, these insights have not been systematically applied to image rotation estimation. Furthermore, no comprehensive study has compared different circular-aware methods across modern neural architectures. With the evolution from traditional CNNs to Vision Transformers and efficient architectures, practitioners lack clear guidance on which methods work best with which models.

maximilian.woehrer@univie.ac.at (M. Woehrer); ORCID: 0000-0001-8536-4900 (M. Woehrer)

To address these limitations, we present a systematic evaluation of five circular-aware methods: direct angle regression with circular loss (DA), classification via angular binning (CLS), unit vector regression (UV), phase-shifting coder (PSC), and circular Gaussian distribution (CGD). We test these methods across sixteen modern architectures to identify the best combinations and provide practical guidance for method selection.

2. Related Work

Traditional rotation estimation methods relied on handcrafted features such as color distributions, edge orientations, and texture descriptors combined with classical machine learning approaches. The introduction of deep convolutional networks marked a paradigm shift, with Fischer et al. [3] establishing a foundational CNN-based approach that achieved 20.97° MAE on COCO 2014 images, though it used standard regression losses that treat angular data as scalar values. Maji et al. [4] achieved 8.38° MAE using an Xception architecture with specialized angular loss functions, and their code repository has since been updated with a Vision Transformer model claiming 6.5° MAE, though this later result remains unpublished. Transfer learning from ImageNet-pretrained models has further improved results across rotation estimation tasks [5, 6]. These developments highlight both the potential of modern architectures and the persistent challenges posed by circular data topology.

The boundary discontinuity at 0°/360° has driven a range of circular-aware methods, largely developed in oriented object detection [7, 8] and pose estimation [9]. Theoretical analyses have shown that low-dimensional rotation representations are inherently discontinuous, motivating boundary-free parameterizations [1, 2]. Direct regression

M. Woehrer: Preprint submitted to Elsevier. Under review at Pattern Recognition Letters.
with circular losses [4] computes minimum angular distances but remains vulnerable to gradient conflicts. Classification approaches discretize the angular space, with techniques like Circular Smooth Labels [10] and Dense Coded Labels [11] incorporating periodic smoothing and Gray coding for boundary handling. Probabilistic methods such as von Mises mixtures [12, 13] and Circular Gaussian Distributions [14] model angles as distributions over discretized bins, enabling uncertainty quantification. Phase-Shifting Coders [15] provide continuous, boundary-free representations through multiple cosine components with different phase offsets. Recent work has further sought theoretically guaranteed continuous representations [16], reflecting ongoing interest in this problem. While Xu et al. [17] provide an overview of classification and regression approaches for orientation estimation, these methods have not been systematically compared for global image rotation estimation across modern architectures.

3. Circular-Aware Methods

Building on the foundations established in related work, we now present five circular-aware methods that address the boundary discontinuity problem through different paradigms. We organize these methods into regression and classification approaches, each addressing the circular topology challenge through distinct strategies. We provide a brief overview of each method, referring to the cited literature for more details.

3.1. Regression Methods

Regression methods predict continuous angle values while carefully handling the circular topology of angular data. These methods address the boundary discontinuity at 0°/360° through specialized loss functions, continuous parameterizations, or encoding schemes that maintain differentiability for gradient-based optimization.
3.1.1. Direct Angle with Circular Loss

The most intuitive approach predicts orientation angles directly through a single output neuron. While conceptually straightforward, this method requires careful loss function design to handle the circular nature of angular data. Traditional loss functions treat angles near boundaries as numerically distant despite their geometric proximity, leading to optimization difficulties.

Circular-aware loss functions address this limitation by incorporating angular distance into the objective function. Following Maji and Bose [4], we use a circular mean absolute error that computes the loss based on the shorter angular distance min(|θ̂ − θ|, 360° − |θ̂ − θ|), where θ̂ and θ are the predicted and ground truth angles respectively. This formulation ensures that predictions near angular boundaries receive appropriate gradient signals during training, as the loss correctly treats 1° and 359° as only 2° apart rather than 358° apart.

The direct angle approach remains vulnerable to gradient conflicts when network predictions span the 0°/360° boundary during training, but the circular loss formulation significantly improves convergence compared to standard regression losses. Our implementation uses a single regression head with circular MAE loss, providing a straightforward baseline for comparison with more sophisticated circular representations.

3.1.2. Unit Vector Approach

Unit vector representation avoids angular boundaries entirely by parameterizing orientations as points on the unit circle. The network outputs two values representing cosine and sine components: [cos(θ), sin(θ)]. This representation naturally handles the circular topology since unit vectors provide continuous coverage of the angular space without discontinuities. Tsai et al.
[18] demonstrated the effectiveness of this unit vector coding approach for precise orientation estimation in rotated object detection.

Rather than explicitly normalizing outputs to unit length during forward passes, our implementation uses regularization terms that encourage unit magnitude while preserving gradient flow, following Pavllo et al. [9]. The total loss combines MAE between predicted and target unit vectors with a regularization term λ(‖v‖ − 1)², where λ = 0.01 penalizes deviation from unit magnitude. This strategy balances mathematical correctness with optimization stability. Decoding predicted unit vectors to angles uses the two-argument arctangent function: θ̂ = atan2(v_sin, v_cos), which correctly handles all quadrants and provides unique angle recovery across the full 360° range.

3.1.3. Phase-Shifting Coder

Phase-shifting approaches encode angles through multiple cosine components with different phase offsets, providing a continuous and boundary-free representation [15]. The encoding uses M phase-shifted terms: m_n = cos(ωθ + 2πn/M) for n = 0, 1, …, M − 1. This parameterization distributes angular information across multiple outputs while maintaining differentiability.

The phase-shifting formulation addresses boundary discontinuity by ensuring smooth variation across the angular space. Since cosine functions are periodic and continuous, the encoded representation avoids the sharp transitions that plague direct angle prediction. Decoding uses θ̂ = −(1/ω) arctan(S_s/S_c), where S_s = Σ_n m_n sin(2πn/M) and S_c = Σ_n m_n cos(2πn/M). The network is trained to predict the phase-shifted cosine values directly using MAE loss between predicted and target PSC representations.

Our implementation uses three phases with unit frequency (ω = 1), providing a balance between representation capacity and computational efficiency.
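As a concrete sketch of the two regression encodings above (using the paper's settings M = 3 and ω = 1; this is an illustration of the encode/decode math, not our training code):

```python
import math

def uv_encode(theta_deg):
    """Unit-vector encoding: a point on the unit circle, [cos(θ), sin(θ)]."""
    t = math.radians(theta_deg)
    return (math.cos(t), math.sin(t))

def uv_decode(v_cos, v_sin):
    """Recover the angle in [0°, 360°) with the two-argument arctangent."""
    return math.degrees(math.atan2(v_sin, v_cos)) % 360.0

def psc_encode(theta_deg, M=3, omega=1.0):
    """Phase-shifting coder: M cosines m_n = cos(ωθ + 2πn/M)."""
    t = math.radians(theta_deg)
    return [math.cos(omega * t + 2.0 * math.pi * n / M) for n in range(M)]

def psc_decode(m, omega=1.0):
    """Decode via θ = -(1/ω)·atan2(S_s, S_c); full-range recovery assumes ω = 1."""
    M = len(m)
    S_s = sum(m[n] * math.sin(2.0 * math.pi * n / M) for n in range(M))
    S_c = sum(m[n] * math.cos(2.0 * math.pi * n / M) for n in range(M))
    return (-math.degrees(math.atan2(S_s, S_c)) / omega) % 360.0
```

Both decodings recover the angle modulo 360°, so values such as 359° and 1° round-trip without any boundary special-casing.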
3.2. Classification Methods

Classification methods discretize the 360° angular space into bins, framing orientation estimation as a multi-class prediction problem. This paradigm leverages established classification frameworks and often provides stable optimization dynamics. The key challenge is handling the circular relationship between bins, particularly the boundary between 0° and 360°, which can be addressed through specialized loss functions and probabilistic formulations.

3.2.1. Classification

Classification approaches discretize the 360° angular space into N bins, treating rotation estimation as a standard multi-class problem. Each bin represents a range of angles, with bin centers serving as discrete angle representatives. This discretization enables the use of well-established classification techniques and loss functions. This approach has been applied in automatic photo rotation estimation [6] and transfer learning scenarios [5], though those works discretize into only four coarse classes (0°, 90°, 180°, 270°); we extend the idea to fine-grained 360-bin classification for continuous rotation estimation.

Standard classification treats each bin independently, potentially creating artificial boundaries between adjacent angle classes. While circular-aware losses like Circular Smooth Labels (CSL) [10] and Dense Coded Labels (DCL) [11] have been developed to address this limitation, our baseline implementation uses standard cross-entropy loss for simplicity and stability.

The classification approach offers several practical advantages, including stable training dynamics and straightforward integration with existing classification frameworks.
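For illustration, the angular binning can be sketched as follows (a minimal sketch assuming bins of width 360°/N starting at 0°, with bin centers offset by half a width; the exact convention in our implementation may differ):

```python
N_BINS = 360  # 1° per bin, as in our implementation

def angle_to_bin(theta_deg, n_bins=N_BINS):
    """Map a continuous angle to its class index in [0, n_bins)."""
    w = 360.0 / n_bins
    return int((theta_deg % 360.0) // w)

def bin_to_angle(idx, n_bins=N_BINS):
    """Decode a class index back to its bin-center angle."""
    w = 360.0 / n_bins
    return (idx + 0.5) * w
```

With 360 bins, the decoding error of a correctly classified angle is bounded by half the bin width (0.5°).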
Cross-entropy loss provides robust optimization, and the predicted angle is decoded as the center of the highest-probability bin (argmax of softmax outputs), yielding angle resolution equal to the bin width. Our implementation employs 360 bins (1° per bin) with standard cross-entropy loss for training.

3.2.2. Circular Gaussian Distribution

The Circular Gaussian Distribution (CGD) approach represents angles as probability distributions over discretized angular bins. Rather than predicting point estimates, the network outputs a probability distribution that captures both the predicted angle and associated uncertainty. Ground truth targets are encoded as Gaussian distributions centered at the true angle. This approach was introduced by Xu et al. [14] for rotated object detection to address boundary discontinuity issues in oriented bounding box regression.

Target encoding creates smooth probability distributions using circular distances, with probability proportional to exp(−d²(φ_i, θ)/(2σ²)), where d is the circular distance between bin center φ_i and target angle θ. Training uses Kullback-Leibler divergence between predicted and target distributions, while decoding uses argmax over the predicted distribution to identify the peak bin. The original formulation uses 180 bins for the [−90°, 90°) range of oriented bounding boxes; we adapt it to 360 bins covering the full [0°, 360°) range for global image rotation, keeping σ = 6° as recommended by Xu et al.

4. Experiments

4.1. Datasets

The DRC-D dataset was introduced for content-preserving rotation correction [19]. We use the ground truth images from this dataset, which contain a large diversity of natural scenes with clear upright indicators, avoiding ambiguous orientations such as aerial views or abstract scenes that lack clear orientation cues.
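For concreteness, the CGD target encoding described in Section 3.2.2 can be sketched as follows (an illustrative list-based sketch assuming half-width-offset bin centers; the training code additionally applies the KL-divergence loss against the predicted distribution):

```python
import math

def circular_distance(a_deg, b_deg):
    """Shorter arc between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def cgd_target(theta_deg, n_bins=360, sigma=6.0):
    """Gaussian-over-circular-distance target, normalized to a probability vector.

    Each bin i gets weight exp(-d(phi_i, theta)^2 / (2*sigma^2)), where phi_i is
    the bin center; sigma = 6 degrees follows Xu et al. [14].
    """
    centers = [(i + 0.5) * 360.0 / n_bins for i in range(n_bins)]
    weights = [math.exp(-circular_distance(c, theta_deg) ** 2 / (2.0 * sigma ** 2))
               for c in centers]
    total = sum(weights)
    return [w / total for w in weights]
```

Because the distance is circular, a target near 0° places symmetric probability mass on both ends of the bin range, which is exactly the wrap-around behavior that plain one-hot or linear-distance targets lack.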
The dataset's relevance to a similar problem context and manageable size (1,474 unique training, 535 unique testing images) make it feasible to train all 80 architecture-method combinations (16 architectures × 5 methods) with five seeds each, enabling the extensive comparison presented in this work.

To enable comparison with prior work, we train and test our most promising configurations on the COCO 2014 dataset [20] (~83K training images), as used by Fischer et al. [3] and Maji and Bose [4]. We use the same 1,030 test images as Fischer et al., comparing against their reported MAE while noting that we cannot replicate their exact rotation methodology due to undisclosed test rotation parameters. For Maji and Bose's method [4], whose exact test set of 1,000 filtered COCO images is unknown, we evaluate their released model on the Fischer test set to maintain consistent comparison conditions across all works.

4.2. Rotation Methodology

During training and testing, we apply synthetic rotations to images and handle the resulting geometric transformations using the largest rotated rectangle approach. This method crops the maximum area from rotated images while avoiding black border artifacts that can confuse rotation estimation. Alternative approaches exist, such as Follmann and Böttger's [21] circular cropping method, which uses a circle with radius equal to half the image size and smooths black borders with a Gaussian kernel (size 5 pixels), but we choose the rotated rectangle approach for its area maximization properties. What matters more than the specific method is ensuring uniform, artifact-free representations that force the neural network to learn genuine orientation cues from image content rather than geometric processing artifacts.
4.3. Transfer Learning Framework

Our approach follows established transfer learning methodology similar to Amjoud and Amrouch [5], leveraging ImageNet-pretrained [22] models as feature extractors. Each architecture serves as a backbone with its final classifier replaced by task-specific heads optimized for the respective circular-aware formulation. This strategy enables efficient comparison across methods while benefiting from rich pretrained representations. Figure 1 illustrates the model architecture.

We select eight architecture families that span the major design paradigms in modern visual recognition: pure transformers (ViT [23], Swin [24]), pure convolutions (ConvNeXt V2 [25], EfficientNetV2 [26]), hybrid designs (EfficientViT [27], EdgeNeXt [28]), focal attention (FocalNet [29]), and state-space-inspired gating (MambaOut [30]).

Figure 1: Model architecture. A pretrained backbone extracts features from the rotated input, which are passed to one of five circular-aware angle heads (DA, UV, PSC, CLS, CGD) to predict the rotation angle θ̂.

For each family we include a small and a large variant to disentangle the effect of model capacity from architectural inductive bias. The complete set of 16 backbones is listed in Table 1. The timm library [31] provides consistent backbone handling across different architectures.
For regression approaches (direct angle, unit vector, PSC), we use a regression head with progressive dimensionality reduction (from backbone features through hidden layers of decreasing size), layer normalization, ReLU activation, and dropout regularization. Classification and CGD approaches use standard linear layers to produce the required number of output logits.

4.4. Training and Optimization

All experiments were run on a single NVIDIA A100 GPU. We use PyTorch Lightning with mixed-precision training, AdamW [32] with square-root batch-size learning-rate scaling (to adapt each architecture's base learning rate to the training batch size), ReduceLROnPlateau learning-rate scheduling, and early stopping with a patience of 15 epochs, training for up to 1,000 epochs. We use a batch size of 16; on a small dataset, larger batches yield too few gradient steps per epoch relative to the epoch-based patience window, preventing adequate fine-tuning of pretrained weights. No data augmentations are applied during training to avoid interfering with orientation learning; input resolution follows each backbone's default (for example, 224 × 224 for most models).

We use fixed random seeds for reproducibility: a predefined seed controls train/validation splits (10% validation) and generates consistent validation rotation angles, while training applies random rotations in [0°, 360°) each epoch. On DRC-D, a single fixed test seed is shared across all configurations to ensure directly comparable evaluation; the reported variability reflects training randomness alone. On COCO 2014, we additionally average over five test seeds to account for test-rotation variability.

4.5. Metrics

We define the circular distance between predicted angle θ̂ and true angle θ as the minimum angular separation: d(θ̂, θ) = min(|θ̂ − θ|, 360° − |θ̂ − θ|). This distance metric forms the foundation for evaluating all methods in our study.
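A minimal sketch of this distance (the same quantity underlies the circular MAE loss of Section 3.1.1) together with the derived MAE and Acc@k° metrics; illustrative only:

```python
def circ_dist(pred_deg, true_deg):
    """d(pred, true) = min(|Δ|, 360° − |Δ|), the minimum angular separation."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)

def circular_mae(preds, trues):
    """Mean absolute error under the circular distance."""
    return sum(circ_dist(p, t) for p, t in zip(preds, trues)) / len(preds)

def acc_at_k(preds, trues, k_deg):
    """Acc@k°: fraction of predictions within k degrees of ground truth."""
    return sum(circ_dist(p, t) <= k_deg for p, t in zip(preds, trues)) / len(preds)
```

Note that a prediction of 359° against a ground truth of 1° contributes an error of 2°, not 358°, which is the behavior the scalar L1/L2 losses of Section 1 fail to provide.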
We report MAE, RMSE, and median error from circular distances, together with Acc@k° and AUC@k° for k ∈ {2, 5, 10}, and the 90th and 95th percentile errors (P90, P95) to characterize tail behavior. To quantify variability, DRC-D results are reported as means and standard deviations across five independent training runs with one test seed; COCO results use a single training run averaged over five test seeds.

5. Results

5.1. DRC-D: Summary Across Heads and Backbones

Table 1 shows results across all architecture-method combinations, with each cell reporting the mean and standard deviation across five independent training runs. CLS with EfficientViT-B3 achieves the best overall mean MAE of 1.23°, but exhibits training instability on several backbone combinations where some runs fail to converge: MambaOut Tiny (35.9°), MambaOut Base (55.1°), Swin Base (20.3°), ConvNeXt V2 Base (20.7°), and EfficientNetV2-RW T (29.4°). CGD is the most robust approach, winning on 9 of 16 architectures with consistent convergence across all backbones; CGD with MambaOut Base reaches 1.24°, virtually matching the best CLS result. PSC and UV achieve competitive results on the ConvNeXt V2 family, with PSC+ConvNeXt V2 Base being the most stable configuration across all runs (std = 0.16°). Direct-angle regression remains unstable across all backbones despite the circular loss.

Backbone trends. EfficientViT-B3 is the standout backbone for classification, achieving the best single-configuration result (1.23°). MambaOut architectures pair best with CGD, producing the top-2 CGD results (1.24° for MambaOut Base and 1.33° for MambaOut Tiny). ConvNeXt V2 Base pairs most reliably with PSC and UV, yielding the most consistent results across runs. The CLS instability pattern (training failures on some architectures) appears more pronounced for deeper or more complex models (MambaOut, Swin Base, ConvNeXt V2 Base).
Direct-angle regression collapses across all backbones, confirming analyses that smoothing the loss alone does not remove boundary issues [33, 34].

5.2. COCO Evaluation and Historical Comparison

We transfer the best architecture per method from DRC-D to COCO and compare against two historical baselines: Net-360 [3] and OAD-360 [4], whose released model we re-evaluate in our pipeline. Table 2 summarizes all results. On COCO 2014, CGD with MambaOut Base achieves the best MAE of 3.71°, followed by UV (4.51°) and CLS (4.67°). All four structured methods (CLS, UV, PSC, CGD) substantially outperform OAD-360 (10.07°), which in turn beats the early CNN baseline Net-360 (20.97°). DA reaches 14.79°, improving over Net-360 but falling well behind the other circular-aware methods, confirming that circular loss alone is insufficient for competitive performance. The released OAD-360 model is a Vision Transformer, updated from the Xception architecture used in the original publication. Our re-evaluated MAE of 10.07° is higher than the

Table 1: DRC-D test performance across methods and backbones. MAE° values are means (standard deviation in parentheses) across five independent training runs. Best method per architecture in bold; column-best marked with ∗; overall best underlined. Right-hand metrics are 5-run means for the best method per architecture (bold). †Several CLS configurations exhibit high variance due to intermittent training instability; reported means include all five runs.
Architecture | DA | CLS | UV | PSC | CGD | Med° | RMSE° | Acc (2°/5°/10°) | AUC (2°/5°/10°) | P90/P95°
ViT-Tiny | 18.39(1.76) | 6.85(0.74) | 4.16(0.76) | 4.17(0.22) | 5.69(1.56) | 2.31 | 7.69 | 0.43/0.81/0.96 | 0.22/0.48/0.69 | 6.49/11.82
ViT-Base | 15.16(2.83) | 2.97(1.80) | 2.89(0.65) | 2.80(0.58) | 2.10(0.49) | 1.30 | 3.97 | 0.70/0.98/1.00 | 0.37/0.69/0.84 | 2.99/5.53
EfficientViT-B0 | 15.68(4.06) | 7.46(1.13) | 5.87(2.01) | 4.94(0.87) | 4.87(0.52) | 1.68 | 12.25 | 0.56/0.92/0.97 | 0.29/0.59/0.77 | 6.57/17.71
EfficientViT-B3 | 8.25(1.84) | 1.23(0.61)∗ | 2.41(0.35) | 2.74(1.53) | 1.93(0.33) | 0.79 | 2.17 | 0.90/0.99/1.00 | 0.55/0.80/0.90 | 1.89/3.18
ConvNeXt V2 Atto | 7.88(2.11)∗ | 3.04(1.55) | 2.69(0.41) | 2.83(0.64) | 3.46(1.54) | 1.77 | 4.13 | 0.53/0.89/0.98 | 0.28/0.57/0.76 | 4.89/7.06
ConvNeXt V2 Base | 61.42(8.16) | 20.71(33.63)† | 1.54(0.36)∗ | 1.47(0.16)∗ | 2.22(1.23) | 0.96 | 2.44 | 0.80/0.99/1.00 | 0.46/0.75/0.87 | 2.47/3.81
EfficientNetV2-RW T | 12.79(1.10) | 29.39(30.06)† | 3.16(0.44) | 2.83(0.49) | 3.14(1.01) | 1.97 | 4.13 | 0.51/0.87/0.98 | 0.26/0.54/0.74 | 5.18/7.07
EfficientNetV2-RW M | 11.42(2.02) | 2.68(0.87) | 2.83(0.85) | 2.50(0.96) | 3.47(0.50) | 1.64 | 3.94 | 0.61/0.90/0.98 | 0.33/0.61/0.78 | 4.38/6.52
MambaOut Tiny | 60.23(10.47) | 35.94(40.68)† | 2.20(0.81) | 2.09(0.70) | 1.33(0.22) | 0.82 | 2.52 | 0.90/1.00/1.00 | 0.53/0.80/0.90 | 1.88/3.48
MambaOut Base | 70.07(8.65) | 55.15(42.98)† | 1.99(0.38) | 2.57(1.21) | 1.24(0.53)∗ | 0.81 | 2.07 | 0.88/1.00/1.00 | 0.52/0.79/0.89 | 2.04/3.16
FocalNet Tiny LRF | 12.59(3.94) | 3.31(0.92) | 2.34(0.60) | 2.66(0.82) | 2.28(0.63) | 1.32 | 4.48 | 0.68/0.97/0.99 | 0.36/0.68/0.83 | 3.15/6.22
FocalNet Base LRF | 11.32(2.69) | 3.18(1.64) | 2.28(0.59) | 2.35(0.37) | 1.81(0.59) | 1.08 | 3.43 | 0.77/0.99/1.00 | 0.42/0.73/0.86 | 2.65/4.85
EdgeNeXt XX-Small | 20.16(2.70) | 3.29(1.28) | 4.22(1.48) | 3.54(0.64) | 2.10(0.61) | 1.05 | 4.38 | 0.78/0.99/0.99 | 0.43/0.73/0.86 | 3.19/6.22
EdgeNeXt Base | 14.78(3.02) | 1.77(1.74) | 1.88(0.53) | 2.30(0.82) | 2.34(1.08) | 0.59 | 4.48 | 0.92/0.99/0.99 | 0.66/0.85/0.92 | 2.76/7.00
Swin Tiny | 74.23(6.96) | 2.42(0.98) | 3.30(1.39) | 2.21(0.62) | 2.03(0.90) | 0.99 | 4.58 | 0.80/0.99/0.99 | 0.46/0.75/0.87 | 2.44/5.96
Swin Base | 78.87(10.09) | 20.33(34.64)† | 2.31(0.65) | 2.89(0.99) | 1.39(0.39) | 0.89 | 2.52 | 0.86/1.00/1.00 | 0.50/0.78/0.89 | 2.11/3.62

Table 2: COCO evaluation (tested on 1,030 val images). Each method is trained once; all values are means across five test seeds; MAE standard deviations in parentheses. OAD-360 results are from our re-evaluation using their released model. Best on COCO 2014 in italic bold; overall best in bold.

Method | Arch. | MAE° | Med° | RMSE° | Acc (2°/5°/10°) | AUC (2°/5°/10°) | P90/P95°
COCO 2014 (~83K train images)
Net-360 [3] | AlexNet-like | 20.97 | – | – | –/–/– | –/–/– | –/–
OAD-360 [4] | ViT | 10.07(1.07) | 1.35 | 35.17 | 0.68/0.91/0.93 | 0.37/0.65/0.79 | 4.29/88.18
Ours (DA) | ConvNeXt V2-Atto | 14.79(0.28) | 3.35 | 31.32 | 0.34/0.63/0.79 | 0.18/0.38/0.55 | 41.33/65.67
Ours (CLS) | EfficientViT-B3 | 4.67(0.42) | 0.79 | 13.47 | 0.81/0.94/0.96 | 0.51/0.74/0.85 | 7.21/21.04
Ours (UV) | ConvNeXt V2-Base | 4.51(0.29) | 1.40 | 11.26 | 0.63/0.89/0.96 | 0.35/0.62/0.78 | 6.67/16.94
Ours (PSC) | ConvNeXt V2-Base | 4.88(0.18) | 1.41 | 12.52 | 0.62/0.89/0.95 | 0.35/0.61/0.77 | 6.78/18.41
Ours (CGD) | MambaOut Base | 3.71(0.48) | 0.68 | 11.03 | 0.86/0.96/0.97 | 0.56/0.79/0.88 | 4.93/16.15
COCO 2017 (~117K train images)
Ours (CGD) | MambaOut Base | 2.84(0.43) | 0.55 | 8.45 | 0.90/0.98/0.98 | 0.63/0.82/0.90 | 3.54/12.00

8.37° reported by Maji et al., which we attribute to differences in the test set and rotation sampling; their evaluation protocol is not fully specified, precluding exact reproduction. Notably, OAD-360 achieves the best COCO 2014 P90 (4.29°) but its P95 jumps to 88.18°, revealing a bimodal error distribution: the model is accurate on most images but produces catastrophic errors on a small subset, a failure pattern characteristic of direct-angle regression.

Training the best-performing configuration, CGD with MambaOut Base, on the larger COCO 2017 train set (~117K images) further reduces MAE to 2.84°, surpassing all prior methods across all reported metrics. This improvement suggests that performance is currently limited by dataset scale rather than method design.

6. Analysis

CLS and CGD as competitive top approaches. CLS achieves the best single-configuration result on DRC-D (EfficientViT-B3: 1.23°), and its peak performance can be understood from two properties. First, discretizing 360° into bins eliminates the periodicity problem entirely: the 0°/360° wrap-around boundary that complicates regression-based losses simply does not exist. Second, the classification objective is structurally identical to the ImageNet pretraining task, so the backbone's learned feature representations are naturally compatible with a classification head, requiring no repurposing of the feature space. The predicted angle is decoded as the center of the highest-probability bin (argmax).

However, CLS is not uniformly reliable across backbones: several architecture-approach combinations exhibit training failures across multiple runs. The instability is likely a consequence of how cross-entropy interacts with the pretrained feature space on a small dataset. Cross-entropy loss aggressively concentrates probability mass on a single class, and when the backbone's pretrained representations do not naturally separate rotation-relevant cues, the optimization can become trapped in poor regions of the loss landscape before meaningful orientation features emerge. A strong ImageNet prior on texture or shape may provide little initial gradient signal for orientation, causing early training dynamics to fail to escape high-loss plateaus. Architectures whose inductive biases happen to surface orientation-relevant features early in fine-tuning are more resistant to this failure mode, which explains the strong
dependence on the specific backbone rather than on model capacity alone.

Figure 2: Qualitative results of CGD (MambaOut Base) on COCO 2014. Left six: accurate predictions; right six: failure cases.

CGD is nearly tied at the top on DRC-D (MambaOut Base: 1.24°) while remaining reliably stable across all tested backbones, arguably making it the more practical choice for deployment. Probabilistic supervision in CGD avoids sharp cross-entropy gradients and never creates a reward for constant-output predictions, which explains its robustness.

PSC provides a continuous code and reversible decoding, which is attractive when the code must be blended with other parameters (as in oriented detection) [15, 34, 33]. On global orientation, CGD tends to have an edge due to explicit uncertainty modeling, though PSC remains competitive on larger backbones. Even with circular losses, the direct scalar formulation is brittle: gradients conflict when predictions straddle the boundary, and the network can settle into large-error modes. Our results quantify this effect across many backbones.

Architecture-approach compatibility. Not all combinations are equally stable. A recurring observation in Table 1 is that for a given method, scaling up the backbone does not reliably improve accuracy. With only 1,474 training images, larger models risk overfitting, and their stronger pretrained representations, optimized for a different task distribution, can be harder to redirect toward rotation estimation. The interaction between method and architecture is the dominant factor: a well-matched smaller backbone can outperform a mismatched larger one, highlighting the importance of systematic cross-architecture evaluation.

Qualitative error analysis.
Figure 2 shows representative successes and failures of our best model (CGD with MambaOut Base) on COCO 2014. The vast majority of images are corrected to within a few degrees. Among the worst predictions, errors tend to cluster near cardinal rotations (90°, 180°), where the model identifies a plausible orientation axis but selects the wrong polarity or perpendicular direction. This confusion typically occurs on images with weak gravitational cues, such as close-up views or symmetric scenes where upright and inverted orientations are visually similar.

7. Limitations

Several limitations should be acknowledged when interpreting our results. Deep learning training is inherently non-deterministic due to random weight initialization, data shuffling, and hardware-specific floating-point operations. To quantify this variability, all results in Table 1 report means and standard deviations across five independent training runs with different random seeds. This multi-run protocol substantially improves reliability compared to single-run evaluations: for some architecture-approach combinations, individual runs can vary by several degrees in MAE, while five-run means are considerably more stable. Standard deviations in the table reflect the true run-to-run variability and should be consulted alongside means when interpreting results.

Our primary comparison uses DRC-D, which contains only 1,474 training images. While this small size makes the 80-configuration, five-seed study computationally feasible, it also means that method rankings may shift on larger and more diverse datasets. The COCO 2014 evaluation provides initial evidence that our top findings transfer, but validating the full architecture-method grid on a larger dataset would strengthen the conclusions.

8. Conclusion and Future Work

Modeling circular structure is essential for reliable angle prediction.
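Concretely, the wrap-around structure can be baked directly into the training targets, as the probabilistic approaches compared here do. A minimal sketch of a circular (wrapped) Gaussian soft label over angular bins, with illustrative bin count and width rather than the paper's exact parameterization:

```python
import numpy as np

def circular_gaussian_target(gt_deg, num_bins=360, sigma=6.0):
    """Soft target over angular bins: Gaussian of the circular distance
    to the ground-truth angle. Illustrative construction only; the
    paper's CGD parameterization may differ."""
    centers = (np.arange(num_bins) + 0.5) * (360.0 / num_bins)  # bin centers, degrees
    diff = (centers - gt_deg) % 360.0
    circ = np.minimum(diff, 360.0 - diff)        # circular distance to ground truth
    target = np.exp(-0.5 * (circ / sigma) ** 2)  # unnormalized Gaussian bump
    return target / target.sum()                 # normalize to a distribution

t = circular_gaussian_target(1.0)
# Because the distance is circular, a ground truth near 0° also places
# probability mass on bins near 360°: there is no boundary discontinuity.
```

A cross-entropy-style loss against such a distribution, unlike one-hot supervision, penalizes predictions in proportion to angular distance while remaining smooth across the 0°/360° wrap.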
In a comparison of five circular-aware methods across sixteen architectures, classification and probabilistic methods consistently outperform direct scalar regression. CLS and CGD are competitive at the top on DRC-D: CLS achieves the best single-configuration result (EfficientViT-B3: 1.23°) while CGD offers equivalent accuracy (MambaOut Base: 1.24°) with better robustness across backbone choices. Transferring our best configuration (CGD with MambaOut Base) to COCO 2014 yields 3.71° MAE, improving substantially over prior work, with further improvement to 2.84° when trained on the larger COCO 2017 dataset. These findings, together with our reproducible framework, contribute to future work in circular regression, oriented object detection, and pose estimation.

Data and Code Availability

Code and data are available at https://github.com/maxwoe/image-rotation-angle-estimation. A demo is available at https://huggingface.co/spaces/maxwoe/image-rotation-angle-estimation.

References

[1] Y. Zhou, C. Barnes, J. Lu, J. Yang, H. Li, On the Continuity of Rotation Representations in Neural Networks, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Long Beach, CA, USA, 2019, pp. 5738–5746. doi:10.1109/CVPR.2019.00589.
[2] J. Levinson, C. Esteves, K. Chen, N. Snavely, A. Kanazawa, A. Rostamizadeh, A. Makadia, An Analysis of SVD for Deep Rotation Estimation.
[3] P. Fischer, A. Dosovitskiy, T. Brox, Image Orientation Estimation with Convolutional Networks, in: J. Gall, P. Gehler, B. Leibe (Eds.), Pattern Recognition, Vol. 9358, Springer International Publishing, Cham, 2015, pp. 368–378. doi:10.1007/978-3-319-24947-6_30.
[4] S. Maji, S.
Bose, Deep Image Orientation Angle Detection (Jun. 2020). doi:10.48550/arXiv.2007.06709.
[5] A. B. Amjoud, M. Amrouch, Transfer Learning for Automatic Image Orientation Detection Using Deep Learning and Logistic Regression, IEEE Access 10 (2022) 128543–128553. doi:10.1109/ACCESS.2022.3225455.
[6] U. Joshi, M. Guerzhoy, Automatic Photo Orientation Detection with Convolutional Neural Networks, in: 2017 14th Conference on Computer and Robot Vision (CRV), IEEE, Edmonton, AB, 2017, pp. 103–108. doi:10.1109/CRV.2017.59.
[7] G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, L. Zhang, DOTA: A Large-Scale Dataset for Object Detection in Aerial Images, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, UT, 2018, pp. 3974–3983. doi:10.1109/CVPR.2018.00418.
[8] J. Ding, N. Xue, Y. Long, G.-S. Xia, Q. Lu, Learning RoI Transformer for Oriented Object Detection in Aerial Images, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Long Beach, CA, USA, 2019, pp. 2844–2853. doi:10.1109/CVPR.2019.00296.
[9] D. Pavllo, D. Grangier, M. Auli, QuaterNet: A Quaternion-based Recurrent Model for Human Motion (Jul. 2018). arXiv:1805.06485, doi:10.48550/arXiv.1805.06485.
[10] X. Yang, J. Yan, Arbitrary-Oriented Object Detection with Circular Smooth Label, in: A. Vedaldi, H. Bischof, T. Brox, J.-M. Frahm (Eds.), Computer Vision – ECCV 2020, Vol. 12353, Springer International Publishing, Cham, 2020, pp. 677–694. doi:10.1007/978-3-030-58598-3_40.
[11] X. Yang, L. Hou, Y. Zhou, W. Wang, J. Yan, Dense Label Encoding for Boundary Discontinuity Free Rotation Detection, in: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Nashville, TN, USA, 2021, pp. 15814–15824. doi:10.1109/CVPR46437.2021.01556.
[12] S. Prokudin, P. Gehler, S.
Nowozin, Deep Directional Statistics: Pose Estimation with Uncertainty Quantification, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision – ECCV 2018, Vol. 11213, Springer International Publishing, Cham, 2018, pp. 542–559. doi:10.1007/978-3-030-01240-3_33.
[13] I. Gilitschenski, R. Sahoo, W. Schwarting, A. Amini, S. Karaman, D. Rus, Deep Orientation Uncertainty Learning based on a Bingham Loss (2020).
[14] H. Xu, X. Liu, Y. Ma, Z. Zhu, S. Wang, C. Yan, F. Dai, Rotated Object Detection with Circular Gaussian Distribution, Electronics 12 (15) (2023) 3265. doi:10.3390/electronics12153265.
[15] Y. Yu, F. Da, Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection (Mar. 2023). arXiv:2211.06368, doi:10.48550/arXiv.2211.06368.
[16] Z.-K. Xiao, G.-Y. Yang, X. Yang, T.-J. Mu, J. Yan, S.-M. Hu, Theoretically Achieving Continuous Representation of Oriented Bounding Boxes, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 16912–16922. doi:10.1109/CVPR52733.2024.01600.
[17] R. Xu, Y. Shi, Z. Qi, Image Orientation Estimation Based On Deep Learning - A Survey, Procedia Computer Science 242 (2024) 1193–1197. doi:10.1016/j.procs.2024.08.176.
[18] C.-Y. Tsai, W.-C. Lin, Precise Orientation Estimation for Rotated Object Detection Based on a Unit Vector Coding Approach, Electronics 13 (22) (2024) 4402. doi:10.3390/electronics13224402.
[19] L. Nie, C. Lin, K. Liao, S. Liu, Y. Zhao, Deep Rotation Correction without Angle Prior, IEEE Trans. on Image Process. 32 (2023) 2879–2888. doi:10.1109/TIP.2023.3275869.
[20] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, P. Dollár, Microsoft COCO: Common Objects in Context (2015).
[21] P. Follmann, T.
Böttger, A rotationally-invariant convolution module by feature map back-rotation, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 784–792. doi:10.1109/WACV.2018.00091.
[22] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, L. Fei-Fei, ImageNet Large Scale Visual Recognition Challenge (Jan. 2015). arXiv:1409.0575, doi:10.48550/arXiv.1409.0575.
[23] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Jun. 2021). doi:10.48550/arXiv.2010.11929.
[24] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Montreal, QC, Canada, 2021, pp. 9992–10002. doi:10.1109/ICCV48922.2021.00986.
[25] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, S. Xie, ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Vancouver, BC, Canada, 2023, pp. 16133–16142. doi:10.1109/CVPR52729.2023.01548.
[26] M. Tan, Q. V. Le, EfficientNetV2: Smaller Models and Faster Training.
[27] X. Liu, H. Peng, N. Zheng, Y. Yang, H. Hu, Y. Yuan, EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Vancouver, BC, Canada, 2023, pp. 14420–14430. doi:10.1109/CVPR52729.2023.01386.
[28] M. Maaz, A. Shaker, H. Cholakkal, S. Khan, S. W. Zamir, R. M. Anwer, F. S. Khan, EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications (Oct. 2022).
arXiv:2206.10589, doi:10.48550/arXiv.2206.10589.
[29] J. Yang, C. Li, X. Dai, J. Gao, Focal Modulation Networks.
[30] W. Yu, X. Wang, MambaOut: Do We Really Need Mamba for Vision?
[31] R. Wightman, PyTorch image models (2019). doi:10.5281/zenodo.4414861.
[32] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization (Jan. 2019). doi:10.48550/arXiv.1711.05101.
[33] H. Xu, X. Liu, H. Xu, Y. Ma, Z. Zhu, C. Yan, F. Dai, Rethinking Boundary Discontinuity Problem for Oriented Object Detection, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 17406–17415. doi:10.1109/CVPR52733.2024.01648.
[34] Y. Yu, F. Da, On Boundary Discontinuity in Angle Regression Based Arbitrary Oriented Object Detection, IEEE Trans. Pattern Anal. Mach. Intell. 46 (10) (2024) 6494–6508. doi:10.1109/TPAMI.2024.3378777.