Insider Purchase Signals in Microcap Equities: Gradient Boosting Detection of Abnormal Returns
This paper examines whether SEC Form 4 insider purchase filings predict abnormal returns in U.S. microcap stocks. The analysis covers 17,237 open-market purchases across 1,343 issuers from 2018 through 2024, restricted to market capitalizations between $30M and $500M. A gradient boosting classifier trained on insider identity, transaction history, and market conditions at disclosure achieves an AUC of 0.70 on out-of-sample 2024 data. At an optimized threshold of 0.20, precision is 0.38 and recall is 0.69. The distance from the 52-week high dominates feature importance, accounting for 36% of the predictive signal. A momentum pattern emerges in the data: transactions disclosed after price appreciation exceeding 10% yield the highest mean cumulative abnormal return (6.3%) and the highest probability of outperformance (36.7%). This contrasts with the simple mean-reversion intuition often applied to post-run-up entries. The result is robust to winsorization and holds across subsamples. These patterns are consistent with slower information incorporation in illiquid markets, where trend confirmation may filter for higher-conviction insider signals.
💡 Research Summary
This paper investigates whether SEC Form 4 insider purchase filings can be used to predict abnormal returns in U.S. micro-cap equities (market capitalizations between $30 million and $500 million). The author assembles a dataset covering 17,237 open-market purchase transactions reported between 2018 and 2024 for 1,343 distinct issuers. Preprocessing consists of ticker matching, exclusion of filings with reporting lags over 90 days, a minimum transaction value of $5,000, and a minimum average daily dollar volume of $200,000, so that only economically meaningful trades in sufficiently liquid securities are retained.
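The screening rules above can be sketched as a single predicate. This is a minimal illustration, not the author's code: the `Purchase` record and its field names are hypothetical, ticker matching is not modeled, and rejecting negative reporting lags is an added assumption about data hygiene.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Purchase:
    """Hypothetical record for one Form 4 open-market purchase."""
    ticker: str
    trade_date: date
    filing_date: date
    value_usd: float                # dollar value of the purchase
    market_cap_usd: float           # market capitalization at filing
    avg_daily_dollar_volume: float  # average daily dollar volume


def passes_filters(p: Purchase) -> bool:
    """Apply the thresholds stated in the summary."""
    lag_days = (p.filing_date - p.trade_date).days
    return (
        0 <= lag_days <= 90                      # drop lags over 90 days (negative lags: assumption)
        and p.value_usd >= 5_000                 # minimum transaction value
        and p.avg_daily_dollar_volume >= 200_000 # minimum liquidity
        and 30e6 <= p.market_cap_usd <= 500e6    # micro-cap range
    )
```

A filing made 150 days after the trade would be dropped by the lag rule even if it satisfies every other threshold.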
Four groups of features are constructed: (1) insider characteristics (a title score ranging from CEO = 5 to "Other" = 1, and the raw transaction value), (2) recent trading history (an indicator for the insider's first purchase in the past 12 months, and the ratio of the current transaction to the insider's historical average), (3) market-condition variables measured at the disclosure date (percentage distance from the 52-week high and low, month-to-date return, 30-day annualized volatility, market cap, average daily volume, and a binary biotech sector flag), and (4) a price-deviation variable that captures the change between the transaction price and the closing price on the filing day.
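A few of these features reduce to simple ratios. The sketch below shows one plausible way to compute them. The function names and the sign conventions (negative distance when the price is below the 52-week high, positive deviation when the filing-day close is above the transaction price) are assumptions, since the summary does not specify them.

```python
from statistics import mean
from typing import Optional, Sequence


def pct_distance_from_high(price: float, high_52w: float) -> float:
    """Relative gap to the 52-week high; zero or negative under this convention."""
    return (price - high_52w) / high_52w


def pct_distance_from_low(price: float, low_52w: float) -> float:
    """Relative gap to the 52-week low; zero or positive under this convention."""
    return (price - low_52w) / low_52w


def price_deviation(transaction_price: float, filing_close: float) -> float:
    """Change from the insider's transaction price to the filing-day close."""
    return (filing_close - transaction_price) / transaction_price


def size_vs_history(value: float, past_values: Sequence[float]) -> Optional[float]:
    """Current transaction value relative to the insider's historical average.

    Returns None when the insider has no prior purchases on record.
    """
    if not past_values:
        return None
    return value / mean(past_values)
```

Under this convention, a purchase made at $10.00 and disclosed with the stock closing at $11.50 has a price deviation of 0.15, which would place it in the "risen more than 10%" group.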
The target variable is the cumulative abnormal return (CAR) over the 30 trading days following the filing, computed using the Fama-French three-factor model to adjust for systematic risk. A binary label is assigned: y = 1 if CAR > 10% (roughly the top quartile of the empirical distribution, given the positive rate below), otherwise y = 0. This creates a moderately imbalanced classification problem with a positive class rate of 27%.
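The target construction can be sketched as follows, assuming the three factor loadings have already been estimated for each stock. The summary does not describe the estimation window or whether an intercept is included in the expected return, so `betas` and `alpha` are inputs here rather than something this sketch estimates.

```python
from typing import Sequence


def cumulative_abnormal_return(
    stock_excess: Sequence[float],  # daily stock return minus risk-free rate
    mkt: Sequence[float],           # daily market excess return
    smb: Sequence[float],           # daily size factor return
    hml: Sequence[float],           # daily value factor return
    betas: Sequence[float],         # (b_mkt, b_smb, b_hml), estimated elsewhere
    alpha: float = 0.0,             # intercept; zero is an assumption
) -> float:
    """Sum of daily abnormal returns over the event window."""
    if not (len(stock_excess) == len(mkt) == len(smb) == len(hml)):
        raise ValueError("all return series must have the same length")
    b_mkt, b_smb, b_hml = betas
    car = 0.0
    for r, m, s, h in zip(stock_excess, mkt, smb, hml):
        expected = alpha + b_mkt * m + b_smb * s + b_hml * h
        car += r - expected
    return car


def label(car: float, threshold: float = 0.10) -> int:
    """Binary target: 1 if CAR strictly exceeds the threshold, else 0."""
    return 1 if car > threshold else 0
```

For the 30-day target, each series would hold the 30 trading days after the filing date.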
The primary predictive model is XGBoost (gradient‑boosted decision trees), chosen for its strength on heterogeneous tabular data. Logistic regression and random forest serve as baseline comparators. The sample is split temporally: 2018‑2022 for training (11,609 observations), 2023 for validation (2,982), and 2024 for out‑of‑sample testing (2,646). Hyper‑parameters are tuned via time‑series cross‑validation, and class‑weighting is applied to mitigate imbalance. The decision threshold is optimized on the validation set to maximize the F1 score, yielding an optimal cutoff of 0.20.
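The temporal split and the threshold search can be illustrated in a few lines. This is a sketch under stated assumptions: the candidate grid (0.05 to 0.95 in steps of 0.05) and the tie-breaking rule (lowest threshold wins) are not given in the summary, and model fitting, class weighting, and time-series cross-validation are omitted.

```python
from typing import Optional, Sequence


def split_for(year: int) -> str:
    """Assign an observation to a split by filing year, as described above."""
    if year <= 2022:
        return "train"
    if year == 2023:
        return "validation"
    return "test"


def f1_at(threshold: float, probs: Sequence[float], labels: Sequence[int]) -> float:
    """F1 score when predicting positive for probabilities >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = p >= threshold
        if pred and y == 1:
            tp += 1
        elif pred and y == 0:
            fp += 1
        elif not pred and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    grid: Optional[Sequence[float]] = None,
) -> float:
    """Pick the grid value that maximizes F1 on validation data."""
    if grid is None:
        grid = [i / 100 for i in range(5, 96, 5)]  # assumed search grid
    # max() keeps the first maximizer, so ties resolve to the lowest threshold
    return max(grid, key=lambda t: f1_at(t, probs, labels))
```

The search should be run on validation-year predictions only, so that the test year stays untouched until the final evaluation.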
On the 2024 test set, XGBoost achieves an area under the ROC curve (AUC) of 0.70, ahead of logistic regression (AUC = 0.67) and narrowly ahead of random forest (AUC = 0.69). At the optimized threshold, precision is 0.38, recall is 0.69, and F1 is 0.49. By contrast, the default 0.5 threshold yields a recall of only 0.17, underscoring the necessity of a lower cutoff for practical signal extraction.
Feature‑importance analysis (average gain) reveals that “percentage distance from the 52‑week high” dominates, contributing 36 % of the total predictive power—more than four times the next most important variable. The remaining top contributors are month‑to‑date return (8.1 %), 30‑day volatility (7.2 %), market capitalization at filing (6.6 %), and distance from the 52‑week low (6.5 %). Insider‑specific attributes such as title score and transaction value rank lower, suggesting that market conditions at the moment of disclosure carry more information than the identity of the insider.
A stratified analysis by price deviation at disclosure uncovers a striking momentum pattern. Transactions disclosed after the stock has risen more than 10 % relative to the price at the time of the trade exhibit a mean 30‑day CAR of 6.3 % and a 36.7 % probability of exceeding the 10 % threshold—the highest among all buckets. By contrast, disclosures occurring after a price decline produce a mean CAR of only 2.3 % and a 22.6 % outperformance probability. The difference between the lowest and highest buckets is statistically significant (t = ‑5.13, p < 0.001). Winsorized means and medians confirm that the result is not driven by outliers (median CAR in the highest bucket is 1.93 %).
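The bucket comparison can be sketched as a per-bucket summary plus a two-sample t-statistic. The summary does not say which test produced t = -5.13. Welch's unequal-variance form is used here as an assumption, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, Sequence


def bucket_summary(cars: Sequence[float], threshold: float = 0.10) -> Dict[str, float]:
    """Mean CAR and share of observations with CAR above the threshold."""
    return {
        "mean_car": mean(cars),
        "hit_rate": sum(1 for c in cars if c > threshold) / len(cars),
    }


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch t-statistic for mean(a) - mean(b), using sample variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error
```

With the lowest bucket passed as `a` and the highest as `b`, a negative statistic indicates that the lowest bucket has the smaller mean CAR, which matches the sign reported above.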
Robustness checks include varying the return window to 20 and 60 days (the pattern persists, though predictive power weakens at the longer horizon), testing performance in low-volatility regimes (VIX < 20), where the model performs better, and examining sector effects (the biotech dummy ranks seventh in feature importance, indicating modest sector influence). Calibration curves show reasonable alignment between predicted probabilities and actual positive rates in the 0.2–0.5 range, where most predictions fall.
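A calibration check of the kind described can be sketched by binning predicted probabilities and comparing each bin's mean prediction with its observed positive rate. Equal-width bins and the choice of ten bins are assumptions, as the summary does not state the binning used.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    probs: Sequence[float],
    labels: Sequence[int],
    n_bins: int = 10,
) -> List[Tuple[float, float, int, float, float]]:
    """Return (bin_low, bin_high, count, mean_predicted, observed_rate) per non-empty bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue
        mean_pred = sum(p for p, _ in items) / len(items)
        observed = sum(y for _, y in items) / len(items)
        rows.append((i / n_bins, (i + 1) / n_bins, len(items), mean_pred, observed))
    return rows
```

A well-calibrated model has `mean_predicted` close to `observed_rate` in each row. The bins between 0.2 and 0.5 are the ones that matter most here, since that is where most predictions are said to fall.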
The discussion interprets the dominance of the 52-week-high distance as reflecting two non-exclusive mechanisms. First, a behavioral "buy-low" motive: insiders may preferentially purchase when the price is far below its recent peak, anticipating a rebound. Second, a mechanical effect: the 10% CAR target is easier to achieve when the stock is farther from its high, simply because a given percentage gain from a depressed price corresponds to a smaller absolute price move. The momentum finding (higher abnormal returns following purchases disclosed during price strength) contradicts the conventional mean-reversion intuition that buying into a run-up is unattractive. The author argues that in illiquid micro-caps, price appreciation may signal the early stage of information diffusion rather than the completion of a trend, allowing insider purchases to act as a confirmation of high-conviction information.
Limitations are acknowledged. Transaction‑cost estimates are simplistic (2 % spread, 1 % price impact), and the back‑test does not model execution slippage or market impact in detail. Potential information leakage between the insider’s trade and the public filing is not examined. The sample spans the COVID‑19 pandemic, so regime‑specific effects may be present.
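To make the stated cost assumptions concrete, the sketch below nets them against a gross return. How the paper applies the 2% spread and 1% price impact is not described in the summary. Charging each once per round trip and subtracting them linearly is an assumption made purely for illustration.

```python
def net_return(gross: float, spread: float = 0.02, impact: float = 0.01) -> float:
    """Gross return minus assumed round-trip costs (spread plus price impact)."""
    return gross - (spread + impact)
```

Under these assumptions a gross return of 6.3% nets to about 3.3%, which shows how sensitive any trading interpretation is to the cost model.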
Future research directions include (i) analyzing shorter post‑disclosure windows (1‑5 days) to capture more immediate price reactions, (ii) incorporating high‑frequency order‑book data to better model execution costs, (iii) testing alternative target definitions such as risk‑adjusted excess returns, and (iv) exploring interaction effects between liquidity measures and insider signals.
In conclusion, the study demonstrates that machine-learning classifiers, particularly gradient-boosted trees, can extract a predictive signal from Form 4 insider purchase filings in the micro-cap segment. Market-state variables, especially distance from the 52-week high, carry the bulk of the predictive weight, while insider identity contributes modestly. The observed momentum pattern, where purchases disclosed after appreciable price gains outperform those disclosed after declines, challenges the prevailing mean-reversion heuristic and underscores the importance of slow information incorporation and liquidity constraints in micro-cap markets. The findings suggest that, in environments with sparse public information, price momentum may validate rather than erode the informational value of insider trades.