Image Rotation Angle Estimation: Comparing Circular-Aware Methods

Maximilian Woehrer (a,b)
a) Research Group Software Architecture, Faculty of Computer Science, University of Vienna, Vienna, Austria
b) MAXI solutions e.U., Vienna, Austria

Keywords: rotation estimation, circular regression, angular prediction, transfer learning, image orientation, deep learning

Abstract

Automatic image rotation estimation is a key preprocessing step in many vision pipelines. This task is challenging because angles have circular topology, creating boundary discontinuities that hinder standard regression methods. We present a comprehensive study of five circular-aware methods for global orientation estimation: direct angle regression with circular loss, classification via angular binning, unit-vector regression, phase-shifting coder, and circular Gaussian distribution. Using transfer learning from ImageNet-pretrained models, we systematically evaluate these methods across sixteen modern architectures by adapting their output heads for rotation-specific predictions. Our results show that probabilistic methods, particularly the circular Gaussian distribution, are the most robust across architectures, while classification achieves the best accuracy on well-matched backbones but suffers training instabilities on others. The best configuration (classification with EfficientViT-B3) achieves a mean absolute error (MAE) of 1.23° (mean across five independent runs) on the DRC-D dataset, while the circular Gaussian distribution with MambaOut Base achieves a virtually identical 1.24° with greater robustness across backbones. Training and evaluating our top-performing method-architecture combinations on COCO 2014, the best configuration reaches 3.71° MAE, improving substantially over prior work, with further improvement to 2.84° on the larger COCO 2017 dataset.
1. Introduction

Image rotation estimation is a common preprocessing step in computer vision pipelines, ensuring that images are properly oriented before further analysis. This task can be approached in two ways: discrete classification, which predicts cardinal rotations (e.g., 0°, 90°, 180°, 270°), or continuous prediction across the full 360° range.

Continuous angle prediction is particularly challenging because angles have circular topology, creating two fundamental problems. First, the same physical orientation can be expressed by infinitely many numerical values: 0°, 360°, and 720° all describe identical rotations. Second, the circular boundary creates artificial discontinuities where physically similar angles appear numerically distant: 359° and 1° are only 2° apart in reality but appear 358° apart numerically.

These circular properties cause significant problems for standard deep learning approaches. When neural networks predict angles as scalar values using typical L1 or L2 losses, they treat equivalent orientations as different targets and compute artificially large errors at the circular boundary [1, 2]. This severely hampers learning and model performance unless the circular nature of the data is explicitly addressed.

While recent work in oriented object detection and pose estimation has shown the benefits of circular-aware methods, these insights have not been systematically applied to image rotation estimation. Furthermore, no comprehensive study has compared different circular-aware methods across modern neural architectures. With the evolution from traditional CNNs to Vision Transformers and efficient architectures, practitioners lack clear guidance on which methods work best with which models.

maximilian.woehrer@univie.ac.at (M. Woehrer); ORCID: 0000-0001-8536-4900 (M. Woehrer)

To address these limitations, we present a systematic evaluation of five circular-aware methods: direct angle regression with circular loss (DA), classification via angular binning (CLS), unit vector regression (UV), phase-shifting coder (PSC), and circular Gaussian distribution (CGD). We test these methods across sixteen modern architectures to identify the best combinations and provide practical guidance for method selection.

2. Related Work

Traditional rotation estimation methods relied on handcrafted features such as color distributions, edge orientations, and texture descriptors combined with classical machine learning approaches. The introduction of deep convolutional networks marked a paradigm shift, with Fischer et al. [3] establishing a foundational CNN-based approach that achieved 20.97° MAE on COCO 2014 images, though it used standard regression losses that treat angular data as scalar values. Maji et al. [4] achieved 8.38° MAE using an Xception architecture with specialized angular loss functions, and their code repository has since been updated with a Vision Transformer model claiming 6.5° MAE, though this later result remains unpublished. Transfer learning from ImageNet-pretrained models has further improved results across rotation estimation tasks [5, 6]. These developments highlight both the potential of modern architectures and the persistent challenges posed by circular data topology.

The boundary discontinuity at 0°/360° has driven a range of circular-aware methods, largely developed in oriented object detection [7, 8] and pose estimation [9]. Theoretical analyses have shown that low-dimensional rotation representations are inherently discontinuous, motivating boundary-free parameterizations [1, 2]. Direct regression

M. Woehrer: Preprint submitted to Elsevier. Under review at Pattern Recognition Letters.
with circular losses [4] computes minimum angular distances but remains vulnerable to gradient conflicts. Classification approaches discretize the angular space, with techniques like Circular Smooth Labels [10] and Dense Coded Labels [11] incorporating periodic smoothing and Gray coding for boundary handling. Probabilistic methods such as von Mises mixtures [12, 13] and Circular Gaussian Distributions [14] model angles as distributions over discretized bins, enabling uncertainty quantification. Phase-Shifting Coders [15] provide continuous, boundary-free representations through multiple cosine components with different phase offsets. Recent work has further sought theoretically guaranteed continuous representations [16], reflecting ongoing interest in this problem. While Xu et al. [17] provide an overview of classification and regression approaches for orientation estimation, these methods have not been systematically compared for global image rotation estimation across modern architectures.

3. Circular-Aware Methods

Building on the foundations established in related work, we now present five circular-aware methods that address the boundary discontinuity problem through different paradigms. We organize these methods into regression and classification approaches, each addressing the circular topology challenge through distinct strategies. We provide a brief overview of each method, referring to the cited literature for more details.

3.1. Regression Methods

Regression methods predict continuous angle values while carefully handling the circular topology of angular data. These methods address the boundary discontinuity at 0°/360° through specialized loss functions, continuous parameterizations, or encoding schemes that maintain differentiability for gradient-based optimization.
3.1.1. Direct Angle with Circular Loss

The most intuitive approach predicts orientation angles directly through a single output neuron. While conceptually straightforward, this method requires careful loss function design to handle the circular nature of angular data. Traditional loss functions treat angles near boundaries as numerically distant despite their geometric proximity, leading to optimization difficulties.

Circular-aware loss functions address this limitation by incorporating angular distance into the objective function. Following Maji and Bose [4], we use a circular mean absolute error that computes the loss based on the shorter angular distance min(|θ̂ − θ|, 360° − |θ̂ − θ|), where θ̂ and θ are the predicted and ground truth angles respectively. This formulation ensures that predictions near angular boundaries receive appropriate gradient signals during training, as the loss correctly treats 1° and 359° as only 2° apart rather than 358° apart.

The direct angle approach remains vulnerable to gradient conflicts when network predictions span the 0°/360° boundary during training, but the circular loss formulation significantly improves convergence compared to standard regression losses. Our implementation uses a single regression head with circular MAE loss, providing a straightforward baseline for comparison with more sophisticated circular representations.

3.1.2. Unit Vector Approach

Unit vector representation avoids angular boundaries entirely by parameterizing orientations as points on the unit circle. The network outputs two values representing cosine and sine components: [cos(θ), sin(θ)]. This representation naturally handles the circular topology since unit vectors provide continuous coverage of the angular space without discontinuities. Tsai et al.
[18] demonstrated the effectiveness of this unit vector coding approach for precise orientation estimation in rotated object detection.

Rather than explicitly normalizing outputs to unit length during forward passes, our implementation uses regularization terms that encourage unit magnitude while preserving gradient flow, following Pavllo et al. [9]. The total loss combines MAE between predicted and target unit vectors with a regularization term λ(‖v‖ − 1)², where λ = 0.01 penalizes deviation from unit magnitude. This strategy balances mathematical correctness with optimization stability. Decoding predicted unit vectors to angles uses the two-argument arctangent function: θ̂ = atan2(v_sin, v_cos), which correctly handles all quadrants and provides unique angle recovery across the full 360° range.

3.1.3. Phase-Shifting Coder

Phase-shifting approaches encode angles through multiple cosine components with different phase offsets, providing a continuous and boundary-free representation [15]. The encoding uses M phase-shifted terms: m_n = cos(ωθ + 2πn/M) for n = 0, 1, …, M − 1. This parameterization distributes angular information across multiple outputs while maintaining differentiability.

The phase-shifting formulation addresses boundary discontinuity by ensuring smooth variation across the angular space. Since cosine functions are periodic and continuous, the encoded representation avoids the sharp transitions that plague direct angle prediction. Decoding uses θ̂ = −(1/ω) arctan(S_s/S_c), where S_s = Σ_n m_n sin(2πn/M) and S_c = Σ_n m_n cos(2πn/M). The network is trained to predict the phase-shifted cosine values directly using MAE loss between predicted and target PSC representations.

Our implementation uses three phases with unit frequency (ω = 1), providing a balance between representation capacity and computational efficiency.
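As a concrete sketch of the two regression encodings above (using the paper's settings M = 3 and ω = 1; this is an illustration of the encode/decode math, not our training code):

```python
import math

def uv_encode(theta_deg):
    """Unit-vector encoding: a point on the unit circle, [cos(θ), sin(θ)]."""
    t = math.radians(theta_deg)
    return (math.cos(t), math.sin(t))

def uv_decode(v_cos, v_sin):
    """Recover the angle in [0°, 360°) with the two-argument arctangent."""
    return math.degrees(math.atan2(v_sin, v_cos)) % 360.0

def psc_encode(theta_deg, M=3, omega=1.0):
    """Phase-shifting coder: M cosines m_n = cos(ωθ + 2πn/M)."""
    t = math.radians(theta_deg)
    return [math.cos(omega * t + 2.0 * math.pi * n / M) for n in range(M)]

def psc_decode(m, omega=1.0):
    """Decode via θ = -(1/ω)·atan2(S_s, S_c); full-range recovery assumes ω = 1."""
    M = len(m)
    S_s = sum(m[n] * math.sin(2.0 * math.pi * n / M) for n in range(M))
    S_c = sum(m[n] * math.cos(2.0 * math.pi * n / M) for n in range(M))
    return (-math.degrees(math.atan2(S_s, S_c)) / omega) % 360.0
```

Both decodings recover the angle modulo 360°, so values such as 359° and 1° round-trip without any boundary special-casing.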
3.2. Classification Methods

Classification methods discretize the 360° angular space into bins, framing orientation estimation as a multi-class prediction problem. This paradigm leverages established classification frameworks and often provides stable optimization dynamics. The key challenge is handling the circular relationship between bins, particularly the boundary between 0° and 360°, which can be addressed through specialized loss functions and probabilistic formulations.

3.2.1. Classification

Classification approaches discretize the 360° angular space into N bins, treating rotation estimation as a standard multi-class problem. Each bin represents a range of angles, with bin centers serving as discrete angle representatives. This discretization enables the use of well-established classification techniques and loss functions. This approach has been applied in automatic photo rotation estimation [6] and transfer learning scenarios [5], though those works discretize into only four coarse classes (0°, 90°, 180°, 270°); we extend the idea to fine-grained 360-bin classification for continuous rotation estimation.

Standard classification treats each bin independently, potentially creating artificial boundaries between adjacent angle classes. While circular-aware losses like Circular Smooth Labels (CSL) [10] and Dense Coded Labels (DCL) [11] have been developed to address this limitation, our baseline implementation uses standard cross-entropy loss for simplicity and stability.

The classification approach offers several practical advantages, including stable training dynamics and straightforward integration with existing classification frameworks.
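For illustration, the angular binning can be sketched as follows (a minimal sketch assuming bins of width 360°/N starting at 0°, with bin centers offset by half a width; the exact convention in our implementation may differ):

```python
N_BINS = 360  # 1° per bin, as in our implementation

def angle_to_bin(theta_deg, n_bins=N_BINS):
    """Map a continuous angle to its class index in [0, n_bins)."""
    w = 360.0 / n_bins
    return int((theta_deg % 360.0) // w)

def bin_to_angle(idx, n_bins=N_BINS):
    """Decode a class index back to its bin-center angle."""
    w = 360.0 / n_bins
    return (idx + 0.5) * w
```

With 360 bins, the decoding error of a correctly classified angle is bounded by half the bin width (0.5°).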
Cross-entropy loss provides robust optimization, and the predicted angle is decoded as the center of the highest-probability bin (argmax of softmax outputs), yielding angle resolution equal to the bin width. Our implementation employs 360 bins (1° per bin) with standard cross-entropy loss for training.

3.2.2. Circular Gaussian Distribution

The Circular Gaussian Distribution (CGD) approach represents angles as probability distributions over discretized angular bins. Rather than predicting point estimates, the network outputs a probability distribution that captures both the predicted angle and associated uncertainty. Ground truth targets are encoded as Gaussian distributions centered at the true angle. This approach was introduced by Xu et al. [14] for rotated object detection to address boundary discontinuity issues in oriented bounding box regression.

Target encoding creates smooth probability distributions using circular distances, with probability proportional to exp(−d²(φ_i, θ)/(2σ²)), where d is the circular distance between bin center φ_i and target angle θ. Training uses Kullback-Leibler divergence between predicted and target distributions, while decoding uses argmax over the predicted distribution to identify the peak bin. The original formulation uses 180 bins for the [−90°, 90°) range of oriented bounding boxes; we adapt it to 360 bins covering the full [0°, 360°) range for global image rotation, keeping σ = 6° as recommended by Xu et al.

4. Experiments

4.1. Datasets

The DRC-D dataset was introduced for content-preserving rotation correction [19]. We use the ground truth images from this dataset, which contain a large diversity of natural scenes with clear upright indicators, avoiding ambiguous orientations such as aerial views or abstract scenes that lack clear orientation cues.
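For concreteness, the CGD target encoding described in Section 3.2.2 can be sketched as follows (an illustrative list-based sketch assuming half-width-offset bin centers; the training code additionally applies the KL-divergence loss against the predicted distribution):

```python
import math

def circular_distance(a_deg, b_deg):
    """Shorter arc between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def cgd_target(theta_deg, n_bins=360, sigma=6.0):
    """Gaussian-over-circular-distance target, normalized to a probability vector.

    Each bin i gets weight exp(-d(phi_i, theta)^2 / (2*sigma^2)), where phi_i is
    the bin center; sigma = 6 degrees follows Xu et al. [14].
    """
    centers = [(i + 0.5) * 360.0 / n_bins for i in range(n_bins)]
    weights = [math.exp(-circular_distance(c, theta_deg) ** 2 / (2.0 * sigma ** 2))
               for c in centers]
    total = sum(weights)
    return [w / total for w in weights]
```

Because the distance is circular, a target near 0° places symmetric probability mass on both ends of the bin range, which is exactly the wrap-around behavior that plain one-hot or linear-distance targets lack.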
The dataset's relevance to a similar problem context and manageable size (1,474 unique training, 535 unique testing images) make it feasible to train all 80 architecture-method combinations (16 architectures × 5 methods) with five seeds each, enabling the extensive comparison presented in this work.

To enable comparison with prior work, we train and test our most promising configurations on the COCO 2014 dataset [20] (~83K training images), as used by Fischer et al. [3] and Maji and Bose [4]. We use the same 1,030 test images as Fischer et al., comparing against their reported MAE while noting that we cannot replicate their exact rotation methodology due to undisclosed test rotation parameters. For Maji and Bose's method [4], whose exact test set of 1,000 filtered COCO images is unknown, we evaluate their released model on the Fischer test set to maintain consistent comparison conditions across all works.

4.2. Rotation Methodology

During training and testing, we apply synthetic rotations to images and handle the resulting geometric transformations using the largest rotated rectangle approach. This method crops the maximum area from rotated images while avoiding black border artifacts that can confuse rotation estimation. Alternative approaches exist, such as Follmann and Böttger's [21] circular cropping method, which uses a circle with radius equal to half the image size and smooths black borders with a Gaussian kernel (size 5 pixels), but we choose the rotated rectangle approach for its area maximization properties. What matters more than the specific method is ensuring uniform, artifact-free representations that force the neural network to learn genuine orientation cues from image content rather than geometric processing artifacts.
4.3. Transfer Learning Framework

Our approach follows established transfer learning methodology similar to Amjoud and Amrouch [5], leveraging ImageNet-pretrained [22] models as feature extractors. Each architecture serves as a backbone with its final classifier replaced by task-specific heads optimized for the respective circular-aware formulation. This strategy enables efficient comparison across methods while benefiting from rich pretrained representations. Figure 1 illustrates the model architecture.

We select eight architecture families that span the major design paradigms in modern visual recognition: pure transformers (ViT [23], Swin [24]), pure convolutions (ConvNeXt V2 [25], EfficientNetV2 [26]), hybrid designs (EfficientViT [27], EdgeNeXt [28]), focal attention (FocalNet [29]), and state-space-inspired gating (MambaOut [30]).

Figure 1: Model architecture. A pretrained backbone extracts features from the rotated input, which are passed to one of five circular-aware angle heads (DA, UV, PSC, CLS, CGD) to predict the rotation angle θ̂.

For each family we include a small and a large variant to disentangle the effect of model capacity from architectural inductive bias. The complete set of 16 backbones is listed in Table 1. The timm library [31] provides consistent backbone handling across different architectures.
For regression approaches (direct angle, unit vector, PSC), we use a regression head with progressive dimensionality reduction (from backbone features through hidden layers of decreasing size), layer normalization, ReLU activation, and dropout regularization. Classification and CGD approaches use standard linear layers to produce the required number of output logits.

4.4. Training and Optimization

All experiments were run on a single NVIDIA A100 GPU. We use PyTorch Lightning with mixed-precision training, AdamW [32] with square-root batch-size learning-rate scaling (to adapt each architecture's base learning rate to the training batch size), ReduceLROnPlateau learning-rate scheduling, and early stopping with a patience of 15 epochs, training for up to 1,000 epochs. We use a batch size of 16; on a small dataset, larger batches yield too few gradient steps per epoch relative to the epoch-based patience window, preventing adequate fine-tuning of pretrained weights. No data augmentations are applied during training to avoid interfering with orientation learning; input resolution follows each backbone's default (for example, 224 × 224 for most models).

We use fixed random seeds for reproducibility: a predefined seed controls train/validation splits (10% validation) and generates consistent validation rotation angles, while training applies random rotations in [0°, 360°) each epoch. On DRC-D, a single fixed test seed is shared across all configurations to ensure directly comparable evaluation; the reported variability reflects training randomness alone. On COCO 2014, we additionally average over five test seeds to account for test-rotation variability.

4.5. Metrics

We define the circular distance between predicted angle θ̂ and true angle θ as the minimum angular separation: d(θ̂, θ) = min(|θ̂ − θ|, 360° − |θ̂ − θ|). This distance metric forms the foundation for evaluating all methods in our study.
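A minimal sketch of this distance (the same quantity underlies the circular MAE loss of Section 3.1.1) together with the derived MAE and Acc@k° metrics; illustrative only:

```python
def circ_dist(pred_deg, true_deg):
    """d(pred, true) = min(|Δ|, 360° − |Δ|), the minimum angular separation."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)

def circular_mae(preds, trues):
    """Mean absolute error under the circular distance."""
    return sum(circ_dist(p, t) for p, t in zip(preds, trues)) / len(preds)

def acc_at_k(preds, trues, k_deg):
    """Acc@k°: fraction of predictions within k degrees of ground truth."""
    return sum(circ_dist(p, t) <= k_deg for p, t in zip(preds, trues)) / len(preds)
```

Note that a prediction of 359° against a ground truth of 1° contributes an error of 2°, not 358°, which is the behavior the scalar L1/L2 losses of Section 1 fail to provide.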
We report MAE, RMSE, and median error from circular distances, together with Acc@k° and AUC@k° for k ∈ {2, 5, 10}, and the 90th and 95th percentile errors (P90, P95) to characterize tail behavior. To quantify variability, DRC-D results are reported as means and standard deviations across five independent training runs with one test seed; COCO results use a single training run averaged over five test seeds.

5. Results

5.1. DRC-D: Summary Across Heads and Backbones

Table 1 shows results across all architecture-method combinations, with each cell reporting the mean and standard deviation across five independent training runs. CLS with EfficientViT-B3 achieves the best overall mean MAE of 1.23°, but exhibits training instability on several backbone combinations where some runs fail to converge: MambaOut Tiny (35.9°), MambaOut Base (55.1°), Swin Base (20.3°), ConvNeXt V2 Base (20.7°), and EfficientNetV2-RW T (29.4°). CGD is the most robust approach, winning on 9 of 16 architectures with consistent convergence across all backbones; CGD with MambaOut Base reaches 1.24°, virtually matching the best CLS result. PSC and UV achieve competitive results on the ConvNeXt V2 family, with PSC+ConvNeXt V2 Base being the most stable configuration across all runs (std = 0.16°). Direct-angle regression remains unstable across all backbones despite the circular loss.

Backbone trends. EfficientViT-B3 is the standout backbone for classification, achieving the best single-configuration result (1.23°). MambaOut architectures pair best with CGD, producing the top-2 CGD results (1.24° for MambaOut Base and 1.33° for MambaOut Tiny). ConvNeXt V2 Base pairs most reliably with PSC and UV, yielding the most consistent results across runs. The CLS instability pattern (training failures on some architectures) appears more pronounced for deeper or more complex models (MambaOut, Swin Base, ConvNeXt V2 Base).
Direct-angle regression collapses across all backbones, confirming analyses that smoothing the loss alone does not remove boundary issues [33, 34].

5.2. COCO Evaluation and Historical Comparison

We transfer the best architecture per method from DRC-D to COCO and compare against two historical baselines: Net-360 [3] and OAD-360 [4], whose released model we re-evaluate in our pipeline. Table 2 summarizes all results. On COCO 2014, CGD with MambaOut Base achieves the best MAE of 3.71°, followed by UV (4.51°) and CLS (4.67°). All four structured methods (CLS, UV, PSC, CGD) substantially outperform OAD-360 (10.07°), which in turn beats the early CNN baseline Net-360 (20.97°). DA reaches 14.79°, improving over Net-360 but falling well behind the other circular-aware methods, confirming that circular loss alone is insufficient for competitive performance. The released OAD-360 model is a Vision Transformer, updated from the Xception architecture used in the original publication. Our re-evaluated MAE of 10.07° is higher than the

Table 1: DRC-D test performance across methods and backbones. MAE° values are means (standard deviation in parentheses) across five independent training runs. Best method per architecture in bold; column-best marked with ∗; overall best underlined. Right-hand metrics are 5-run means for the best method per architecture (bold). †Several CLS configurations exhibit high variance due to intermittent training instability; reported means include all five runs.
Architecture | DA | CLS | UV | PSC | CGD | Med° | RMSE° | Acc (2°/5°/10°) | AUC (2°/5°/10°) | P90/P95°
ViT-Tiny | 18.39(1.76) | 6.85(0.74) | 4.16(0.76) | 4.17(0.22) | 5.69(1.56) | 2.31 | 7.69 | 0.43/0.81/0.96 | 0.22/0.48/0.69 | 6.49/11.82
ViT-Base | 15.16(2.83) | 2.97(1.80) | 2.89(0.65) | 2.80(0.58) | 2.10(0.49) | 1.30 | 3.97 | 0.70/0.98/1.00 | 0.37/0.69/0.84 | 2.99/5.53
EfficientViT-B0 | 15.68(4.06) | 7.46(1.13) | 5.87(2.01) | 4.94(0.87) | 4.87(0.52) | 1.68 | 12.25 | 0.56/0.92/0.97 | 0.29/0.59/0.77 | 6.57/17.71
EfficientViT-B3 | 8.25(1.84) | 1.23(0.61)∗ | 2.41(0.35) | 2.74(1.53) | 1.93(0.33) | 0.79 | 2.17 | 0.90/0.99/1.00 | 0.55/0.80/0.90 | 1.89/3.18
ConvNeXt V2 Atto | 7.88(2.11)∗ | 3.04(1.55) | 2.69(0.41) | 2.83(0.64) | 3.46(1.54) | 1.77 | 4.13 | 0.53/0.89/0.98 | 0.28/0.57/0.76 | 4.89/7.06
ConvNeXt V2 Base | 61.42(8.16) | 20.71(33.63)† | 1.54(0.36)∗ | 1.47(0.16)∗ | 2.22(1.23) | 0.96 | 2.44 | 0.80/0.99/1.00 | 0.46/0.75/0.87 | 2.47/3.81
EfficientNetV2-RW T | 12.79(1.10) | 29.39(30.06)† | 3.16(0.44) | 2.83(0.49) | 3.14(1.01) | 1.97 | 4.13 | 0.51/0.87/0.98 | 0.26/0.54/0.74 | 5.18/7.07
EfficientNetV2-RW M | 11.42(2.02) | 2.68(0.87) | 2.83(0.85) | 2.50(0.96) | 3.47(0.50) | 1.64 | 3.94 | 0.61/0.90/0.98 | 0.33/0.61/0.78 | 4.38/6.52
MambaOut Tiny | 60.23(10.47) | 35.94(40.68)† | 2.20(0.81) | 2.09(0.70) | 1.33(0.22) | 0.82 | 2.52 | 0.90/1.00/1.00 | 0.53/0.80/0.90 | 1.88/3.48
MambaOut Base | 70.07(8.65) | 55.15(42.98)† | 1.99(0.38) | 2.57(1.21) | 1.24(0.53)∗ | 0.81 | 2.07 | 0.88/1.00/1.00 | 0.52/0.79/0.89 | 2.04/3.16
FocalNet Tiny LRF | 12.59(3.94) | 3.31(0.92) | 2.34(0.60) | 2.66(0.82) | 2.28(0.63) | 1.32 | 4.48 | 0.68/0.97/0.99 | 0.36/0.68/0.83 | 3.15/6.22
FocalNet Base LRF | 11.32(2.69) | 3.18(1.64) | 2.28(0.59) | 2.35(0.37) | 1.81(0.59) | 1.08 | 3.43 | 0.77/0.99/1.00 | 0.42/0.73/0.86 | 2.65/4.85
EdgeNeXt XX-Small | 20.16(2.70) | 3.29(1.28) | 4.22(1.48) | 3.54(0.64) | 2.10(0.61) | 1.05 | 4.38 | 0.78/0.99/0.99 | 0.43/0.73/0.86 | 3.19/6.22
EdgeNeXt Base | 14.78(3.02) | 1.77(1.74) | 1.88(0.53) | 2.30(0.82) | 2.34(1.08) | 0.59 | 4.48 | 0.92/0.99/0.99 | 0.66/0.85/0.92 | 2.76/7.00
Swin Tiny | 74.23(6.96) | 2.42(0.98) | 3.30(1.39) | 2.21(0.62) | 2.03(0.90) | 0.99 | 4.58 | 0.80/0.99/0.99 | 0.46/0.75/0.87 | 2.44/5.96
Swin Base | 78.87(10.09) | 20.33(34.64)† | 2.31(0.65) | 2.89(0.99) | 1.39(0.39) | 0.89 | 2.52 | 0.86/1.00/1.00 | 0.50/0.78/0.89 | 2.11/3.62

Table 2: COCO evaluation (tested on 1,030 val images). Each method is trained once; all values are means across five test seeds; MAE standard deviations in parentheses. OAD-360 results are from our re-evaluation using their released model. Best on COCO 2014 in italic bold; overall best in bold.

Method | Arch. | MAE° | Med° | RMSE° | Acc (2°/5°/10°) | AUC (2°/5°/10°) | P90/P95°
COCO 2014 (~83K train images)
Net-360 [3] | AlexNet-like | 20.97 | – | – | –/–/– | –/–/– | –/–
OAD-360 [4] | ViT | 10.07(1.07) | 1.35 | 35.17 | 0.68/0.91/0.93 | 0.37/0.65/0.79 | 4.29/88.18
Ours (DA) | ConvNeXt V2-Atto | 14.79(0.28) | 3.35 | 31.32 | 0.34/0.63/0.79 | 0.18/0.38/0.55 | 41.33/65.67
Ours (CLS) | EfficientViT-B3 | 4.67(0.42) | 0.79 | 13.47 | 0.81/0.94/0.96 | 0.51/0.74/0.85 | 7.21/21.04
Ours (UV) | ConvNeXt V2-Base | 4.51(0.29) | 1.40 | 11.26 | 0.63/0.89/0.96 | 0.35/0.62/0.78 | 6.67/16.94
Ours (PSC) | ConvNeXt V2-Base | 4.88(0.18) | 1.41 | 12.52 | 0.62/0.89/0.95 | 0.35/0.61/0.77 | 6.78/18.41
Ours (CGD) | MambaOut Base | 3.71(0.48) | 0.68 | 11.03 | 0.86/0.96/0.97 | 0.56/0.79/0.88 | 4.93/16.15
COCO 2017 (~117K train images)
Ours (CGD) | MambaOut Base | 2.84(0.43) | 0.55 | 8.45 | 0.90/0.98/0.98 | 0.63/0.82/0.90 | 3.54/12.00

8.37° reported by Maji et al., which we attribute to differences in the test set and rotation sampling; their evaluation protocol is not fully specified, precluding exact reproduction. Notably, OAD-360 achieves the best COCO 2014 P90 (4.29°) but its P95 jumps to 88.18°, revealing a bimodal error distribution: the model is accurate on most images but produces catastrophic errors on a small subset, a failure pattern characteristic of direct-angle regression.

Training the best-performing configuration, CGD with MambaOut Base, on the larger COCO 2017 train set (~117K images) further reduces MAE to 2.84°, surpassing all prior methods across all reported metrics. This improvement suggests that performance is currently limited by dataset scale rather than method design.

6. Analysis

CLS and CGD as competitive top approaches. CLS achieves the best single-configuration result on DRC-D (EfficientViT-B3: 1.23°), and its peak performance can be understood from two properties. First, discretizing 360° into bins eliminates the periodicity problem entirely: the 0°/360° wrap-around boundary that complicates regression-based losses simply does not exist. Second, the classification objective is structurally identical to the ImageNet pretraining task, so the backbone's learned feature representations are naturally compatible with a classification head, requiring no repurposing of the feature space. The predicted angle is decoded as the center of the highest-probability bin (argmax).

However, CLS is not uniformly reliable across backbones: several architecture-approach combinations exhibit training failures across multiple runs. The instability is likely a consequence of how cross-entropy interacts with the pretrained feature space on a small dataset. Cross-entropy loss aggressively concentrates probability mass on a single class, and when the backbone's pretrained representations do not naturally separate rotation-relevant cues, the optimization can become trapped in poor regions of the loss landscape before meaningful orientation features emerge. A strong ImageNet prior on texture or shape may provide little initial gradient signal for orientation, causing early training dynamics to fail to escape high-loss plateaus. Architectures whose inductive biases happen to surface orientation-relevant features early in fine-tuning are more resistant to this failure mode, which explains the strong
dependence on the specific backbone rather than on model capacity alone.

Figure 2: Qualitative results of CGD (MambaOut Base) on COCO 2014. Left six: accurate predictions; right six: failure cases.

CGD is nearly tied at the top on DRC-D (MambaOut Base: 1.24°) while remaining reliably stable across all tested backbones, arguably making it the more practical choice for deployment. Probabilistic supervision in CGD avoids sharp cross-entropy gradients and never creates a reward for constant-output predictions, which explains its robustness.

PSC provides a continuous code and reversible decoding, which is attractive when the code must be blended with other parameters (as in oriented detection) [15, 34, 33]. On global orientation, CGD tends to have an edge due to explicit uncertainty modeling, though PSC remains competitive on larger backbones. Even with circular losses, the direct scalar formulation is brittle: gradients conflict when predictions straddle the boundary, and the network can settle into large-error modes. Our results quantify this effect across many backbones.

Architecture-approach compatibility. Not all combinations are equally stable. A recurring observation in Table 1 is that for a given method, scaling up the backbone does not reliably improve accuracy. With only 1,474 training images, larger models risk overfitting, and their stronger pretrained representations, optimized for a different task distribution, can be harder to redirect toward rotation estimation. The interaction between method and architecture is the dominant factor: a well-matched smaller backbone can outperform a mismatched larger one, highlighting the importance of systematic cross-architecture evaluation.

Qualitative error analysis.
Figure 2 shows representative successes and failures of our best model (CGD with MambaOut Base) on COCO 2014. The vast majority of images are corrected to within a few degrees. Among the worst predictions, errors tend to cluster near cardinal rotations (90°, 180°), where the model identifies a plausible orientation axis but selects the wrong polarity or perpendicular direction. This confusion typically occurs on images with weak gravitational cues, such as close-up views or symmetric scenes where upright and inverted orientations are visually similar.

7. Limitations

Several limitations should be acknowledged when interpreting our results. Deep learning training is inherently non-deterministic due to random weight initialization, data shuffling, and hardware-specific floating-point operations. To quantify this variability, all results in Table 1 report means and standard deviations across five independent training runs with different random seeds. This multi-run protocol substantially improves reliability compared to single-run evaluations: for some architecture-approach combinations, individual runs can vary by several degrees in MAE, while five-run means are considerably more stable. Standard deviations in the table reflect the true run-to-run variability and should be consulted alongside means when interpreting results.

Our primary comparison uses DRC-D, which contains only 1,474 training images. While this small size makes the 80-configuration, five-seed study computationally feasible, it also means that method rankings may shift on larger and more diverse datasets. The COCO 2014 evaluation provides initial evidence that our top findings transfer, but validating the full architecture-method grid on a larger dataset would strengthen the conclusions.

8. Conclusion and Future Work

Modeling circular structure is essential for reliable angle prediction.
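Concretely, the wrap-around structure can be baked directly into the training targets, as the probabilistic approaches compared here do. A minimal sketch of a circular (wrapped) Gaussian soft label over angular bins, with illustrative bin count and width rather than the paper's exact parameterization:

```python
import numpy as np

def circular_gaussian_target(gt_deg, num_bins=360, sigma=6.0):
    """Soft target over angular bins: Gaussian of the circular distance
    to the ground-truth angle. Illustrative construction only; the
    paper's CGD parameterization may differ."""
    centers = (np.arange(num_bins) + 0.5) * (360.0 / num_bins)  # bin centers, degrees
    diff = (centers - gt_deg) % 360.0
    circ = np.minimum(diff, 360.0 - diff)        # circular distance to ground truth
    target = np.exp(-0.5 * (circ / sigma) ** 2)  # unnormalized Gaussian bump
    return target / target.sum()                 # normalize to a distribution

t = circular_gaussian_target(1.0)
# Because the distance is circular, a ground truth near 0° also places
# probability mass on bins near 360°: there is no boundary discontinuity.
```

A cross-entropy-style loss against such a distribution, unlike one-hot supervision, penalizes predictions in proportion to angular distance while remaining smooth across the 0°/360° wrap.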
In a comparison of five circular-aware methods across sixteen architectures, classification and probabilistic methods consistently outperform direct scalar regression. CLS and CGD are competitive at the top on DRC-D: CLS achieves the best single-configuration result (EfficientViT-B3: 1.23°) while CGD offers equivalent accuracy (MambaOut Base: 1.24°) with better robustness across backbone choices. Transferring our best configuration (CGD with MambaOut Base) to COCO 2014 yields 3.71° MAE, improving substantially over prior work, with further improvement to 2.84° when trained on the larger COCO 2017 dataset. These findings, together with our reproducible framework, contribute to future work in circular regression, oriented object detection, and pose estimation.

Data and Code Availability

Code and data are available at https://github.com/maxwoe/image-rotation-angle-estimation. A demo is available at https://huggingface.co/spaces/maxwoe/image-rotation-angle-estimation.

References

[1] Y. Zhou, C. Barnes, J. Lu, J. Yang, H. Li, On the Continuity of Rotation Representations in Neural Networks, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Long Beach, CA, USA, 2019, pp. 5738–5746. doi:10.1109/CVPR.2019.00589.
[2] J. Levinson, C. Esteves, K. Chen, N. Snavely, A. Kanazawa, A. Rostamizadeh, A. Makadia, An Analysis of SVD for Deep Rotation Estimation.
[3] P. Fischer, A. Dosovitskiy, T. Brox, Image Orientation Estimation with Convolutional Networks, in: J. Gall, P. Gehler, B. Leibe (Eds.), Pattern Recognition, Vol. 9358, Springer International Publishing, Cham, 2015, pp. 368–378. doi:10.1007/978-3-319-24947-6_30.
[4] S. Maji, S.
Bose, Deep Image Orientation Angle Detection (Jun. 2020). doi:10.48550/arXiv.2007.06709.
[5] A. B. Amjoud, M. Amrouch, Transfer Learning for Automatic Image Orientation Detection Using Deep Learning and Logistic Regression, IEEE Access 10 (2022) 128543–128553. doi:10.1109/ACCESS.2022.3225455.
[6] U. Joshi, M. Guerzhoy, Automatic Photo Orientation Detection with Convolutional Neural Networks, in: 2017 14th Conference on Computer and Robot Vision (CRV), IEEE, Edmonton, AB, 2017, pp. 103–108. doi:10.1109/CRV.2017.59.
[7] G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, L. Zhang, DOTA: A Large-Scale Dataset for Object Detection in Aerial Images, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, UT, 2018, pp. 3974–3983. doi:10.1109/CVPR.2018.00418.
[8] J. Ding, N. Xue, Y. Long, G.-S. Xia, Q. Lu, Learning RoI Transformer for Oriented Object Detection in Aerial Images, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Long Beach, CA, USA, 2019, pp. 2844–2853. doi:10.1109/CVPR.2019.00296.
[9] D. Pavllo, D. Grangier, M. Auli, QuaterNet: A Quaternion-based Recurrent Model for Human Motion (Jul. 2018). arXiv:1805.06485, doi:10.48550/arXiv.1805.06485.
[10] X. Yang, J. Yan, Arbitrary-Oriented Object Detection with Circular Smooth Label, in: A. Vedaldi, H. Bischof, T. Brox, J.-M. Frahm (Eds.), Computer Vision – ECCV 2020, Vol. 12353, Springer International Publishing, Cham, 2020, pp. 677–694. doi:10.1007/978-3-030-58598-3_40.
[11] X. Yang, L. Hou, Y. Zhou, W. Wang, J. Yan, Dense Label Encoding for Boundary Discontinuity Free Rotation Detection, in: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Nashville, TN, USA, 2021, pp. 15814–15824. doi:10.1109/CVPR46437.2021.01556.
[12] S. Prokudin, P. Gehler, S.
Nowozin, Deep Directional Statistics: Pose Estimation with Uncertainty Quantification, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision – ECCV 2018, Vol. 11213, Springer International Publishing, Cham, 2018, pp. 542–559. doi:10.1007/978-3-030-01240-3_33.
[13] I. Gilitschenski, R. Sahoo, W. Schwarting, A. Amini, S. Karaman, D. Rus, Deep Orientation Uncertainty Learning based on a Bingham Loss (2020).
[14] H. Xu, X. Liu, Y. Ma, Z. Zhu, S. Wang, C. Yan, F. Dai, Rotated Object Detection with Circular Gaussian Distribution, Electronics 12 (15) (2023) 3265. doi:10.3390/electronics12153265.
[15] Y. Yu, F. Da, Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection (Mar. 2023). arXiv:2211.06368, doi:10.48550/arXiv.2211.06368.
[16] Z.-K. Xiao, G.-Y. Yang, X. Yang, T.-J. Mu, J. Yan, S.-M. Hu, Theoretically Achieving Continuous Representation of Oriented Bounding Boxes, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 16912–16922. doi:10.1109/CVPR52733.2024.01600.
[17] R. Xu, Y. Shi, Z. Qi, Image Orientation Estimation Based On Deep Learning - A Survey, Procedia Computer Science 242 (2024) 1193–1197. doi:10.1016/j.procs.2024.08.176.
[18] C.-Y. Tsai, W.-C. Lin, Precise Orientation Estimation for Rotated Object Detection Based on a Unit Vector Coding Approach, Electronics 13 (22) (2024) 4402. doi:10.3390/electronics13224402.
[19] L. Nie, C. Lin, K. Liao, S. Liu, Y. Zhao, Deep Rotation Correction without Angle Prior, IEEE Trans. on Image Process. 32 (2023) 2879–2888. doi:10.1109/TIP.2023.3275869.
[20] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, P. Dollár, Microsoft COCO: Common Objects in Context (2015).
[21] P. Follmann, T.
Böttger, A rotationally-invariant convolution module by feature map back-rotation, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 784–792. doi:10.1109/WACV.2018.00091.
[22] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, L. Fei-Fei, ImageNet Large Scale Visual Recognition Challenge (Jan. 2015). arXiv:1409.0575, doi:10.48550/arXiv.1409.0575.
[23] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Jun. 2021). doi:10.48550/arXiv.2010.11929.
[24] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Montreal, QC, Canada, 2021, pp. 9992–10002. doi:10.1109/ICCV48922.2021.00986.
[25] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, S. Xie, ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Vancouver, BC, Canada, 2023, pp. 16133–16142. doi:10.1109/CVPR52729.2023.01548.
[26] M. Tan, Q. V. Le, EfficientNetV2: Smaller Models and Faster Training.
[27] X. Liu, H. Peng, N. Zheng, Y. Yang, H. Hu, Y. Yuan, EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Vancouver, BC, Canada, 2023, pp. 14420–14430. doi:10.1109/CVPR52729.2023.01386.
[28] M. Maaz, A. Shaker, H. Cholakkal, S. Khan, S. W. Zamir, R. M. Anwer, F. S. Khan, EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications (Oct. 2022).
arXiv:2206.10589, doi:10.48550/arXiv.2206.10589.
[29] J. Yang, C. Li, X. Dai, J. Gao, Focal Modulation Networks.
[30] W. Yu, X. Wang, MambaOut: Do We Really Need Mamba for Vision?
[31] R. Wightman, PyTorch image models (2019). doi:10.5281/zenodo.4414861.
[32] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization (Jan. 2019). doi:10.48550/arXiv.1711.05101.
[33] H. Xu, X. Liu, H. Xu, Y. Ma, Z. Zhu, C. Yan, F. Dai, Rethinking Boundary Discontinuity Problem for Oriented Object Detection, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 17406–17415. doi:10.1109/CVPR52733.2024.01648.
[34] Y. Yu, F. Da, On Boundary Discontinuity in Angle Regression Based Arbitrary Oriented Object Detection, IEEE Trans. Pattern Anal. Mach. Intell. 46 (10) (2024) 6494–6508. doi:10.1109/TPAMI.2024.3378777.