A Study on Artificial Intelligence IQ and Standard Intelligent Model
The potential threat of artificial intelligence (AI) to humanity has triggered broad controversy in society, and at the core of the issue lies the question of whether an AI system can be evaluated quantitatively. This article analyzes the challenges facing the assessment of AI development and argues that the evaluation methods for human intelligence tests and for AI systems are not unified; the key reason is that no existing model describes both AI systems and living beings such as humans in a uniform way. To address this problem, this study establishes a standard intelligent system model that describes AI systems and living beings uniformly. Based on this model, the article gives an abstract mathematical description and builds a standard intelligent machine mathematical model; extends the von Neumann architecture into the proposed Liufeng-Shiyong architecture; defines an artificial intelligence IQ and establishes an artificial intelligence scale and evaluation method; and conducts tests on 50 search engines and three human subjects of different ages across the world, finally obtaining the 2014 absolute IQ and deviation IQ rankings for artificial intelligence.
💡 Research Summary
The paper tackles the contentious issue of whether artificial intelligence (AI) poses a threat to humanity by asking a more fundamental question: can AI systems be evaluated quantitatively in the same way as human intelligence? The authors argue that existing human IQ tests and AI performance benchmarks are based on different premises and therefore cannot be directly compared. To bridge this gap, they propose a “Standard Intelligent System” (SIS) model that abstracts any intelligent entity—human or machine—into five core functions: Input, Process, Output, Learning, and Memory. Mathematically, an SIS is represented as a 5‑tuple ⟨I, P, O, L, M⟩ where I denotes the set of external stimuli, P the transformation function (e.g., a neural network or logical circuit), O the observable actions or language, L the learning mechanism that updates internal parameters, and M the storage of short‑term and long‑term knowledge. By mapping both humans and AI to this common structure, the authors claim that a unified quantitative assessment becomes possible.
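The five-function abstraction can be sketched in code. The following is a minimal, hypothetical Python rendering of the SIS 5-tuple ⟨I, P, O, L, M⟩; the class layout, method names, and the toy "echo" instance are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical sketch of the Standard Intelligent System (SIS) 5-tuple
# <I, P, O, L, M>. Inputs I arrive via step(), outputs O are its return
# value; P, L, and M are explicit components.
@dataclass
class StandardIntelligentSystem:
    process: Callable[[Any, Dict], Any]        # P: transforms a stimulus using memory
    learn: Callable[[Any, Dict], Dict]         # L: updates memory from a feedback signal
    memory: Dict = field(default_factory=dict) # M: short- and long-term knowledge

    def step(self, stimulus: Any) -> Any:
        """I -> P -> O: consume one input and produce one output."""
        return self.process(stimulus, self.memory)

    def feedback(self, signal: Any) -> None:
        """Apply the learning mechanism L to update the memory M."""
        self.memory = self.learn(signal, self.memory)

# Toy instance: an "echo" system that remembers how many feedback
# signals it has received.
sis = StandardIntelligentSystem(
    process=lambda x, m: (x, m.get("seen", 0)),
    learn=lambda sig, m: {**m, "seen": m.get("seen", 0) + 1},
)
sis.feedback("example signal")
print(sis.step("hello"))  # → ('hello', 1)
```

The point of the sketch is only that a human and a machine can both be described by the same five slots; what differs is the concrete `process`, `learn`, and `memory` bound into the structure.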
Building on the SIS abstraction, the paper introduces an extended computer architecture called the “Liufeng‑Shiyong” architecture. This design retains the classic separation of computation and storage found in the Von Neumann model but adds dedicated layers for learning and memory. The learning layer simultaneously updates model parameters (weights) and an external memory store, thereby mimicking both the plasticity of human synaptic weights and the capacity of external knowledge bases. The architecture also supports multimodal I/O (text, image, speech) to accommodate a wide range of intelligent tasks.
With the SIS in place, the authors define an “Artificial Intelligence IQ” (AI‑IQ). Each of the five SIS components is assigned a weight reflecting its perceived importance; the weighted sum yields an absolute IQ score normalized to a 100‑point scale. A relative or “deviation IQ” is then calculated by comparing an AI’s absolute score to the mean and standard deviation of a human reference group, providing a z‑score‑like measure.
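The scoring scheme described above can be illustrated numerically. In the sketch below, the component weights, the example scores, and the conventional mean-100, SD-15 mapping for the deviation IQ are all assumptions made for the example; the summary does not state the paper's exact calibration.

```python
import statistics

# Illustrative AI-IQ scoring. WEIGHTS are assumed values that sum to 1.0;
# the paper's actual weighting scheme is acknowledged to be subjective.
WEIGHTS = {"input": 0.15, "process": 0.25, "output": 0.15,
           "learning": 0.25, "memory": 0.20}

def absolute_iq(component_scores: dict) -> float:
    """Weighted sum of the five SIS component scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[c] * component_scores[c] for c in WEIGHTS)

def deviation_iq(score: float, human_scores: list) -> float:
    """Map an absolute score to a z-score against the human reference group,
    then onto the conventional mean-100, SD-15 IQ scale (an assumption)."""
    mu = statistics.mean(human_scores)
    sigma = statistics.stdev(human_scores)
    return 100 + 15 * (score - mu) / sigma

# Hypothetical search-engine profile: strong retrieval and memory,
# weaker reasoning and learning.
engine = {"input": 95, "process": 70, "output": 90, "learning": 60, "memory": 98}
abs_score = absolute_iq(engine)
print(round(abs_score, 2))   # weighted absolute score
print(round(deviation_iq(abs_score, [108, 112, 101]), 1))
```

With these made-up numbers the engine's absolute score lands below the human reference mean, so its deviation IQ falls below 100, mirroring the qualitative pattern the paper reports.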
To validate the framework, the study conducts an empirical test involving 50 commercial search engines (including Google, Baidu, Yahoo, etc.) and three human participants aged 10, 30, and 60 years. All subjects are presented with a battery of 100 tasks divided into four categories: pure information retrieval, logical reasoning, creative problem solving, and learning‑after‑feedback. For each task, the authors record response time, accuracy, a creativity rating assigned by domain experts, and performance improvement after a brief learning phase.
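The four measurements recorded per task suggest a simple tabular layout. The record fields and the aggregation helper below are hypothetical names chosen for illustration; only the four measured quantities come from the study's description.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-task record for the study's 100-task battery.
# The four fields after `category` are the measurements the summary
# says were recorded; the field names are illustrative.
@dataclass
class TaskResult:
    category: str              # "retrieval" | "reasoning" | "creativity" | "learning"
    response_time_s: float     # latency to first answer
    accuracy: float            # 0.0-1.0, scored against an answer key
    creativity: float          # expert rating (creative tasks only)
    post_learning_gain: float  # accuracy improvement after the learning phase

def accuracy_by_category(results):
    """Mean accuracy per task category, as in the paper's aggregate comparisons."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r.category].append(r.accuracy)
    return {c: sum(v) / len(v) for c, v in buckets.items()}

# Toy data for one subject.
results = [
    TaskResult("retrieval", 0.4, 0.95, 0.0, 0.02),
    TaskResult("retrieval", 0.3, 0.89, 0.0, 0.01),
    TaskResult("reasoning", 5.0, 0.60, 0.0, 0.10),
]
print(accuracy_by_category(results))
```

Aggregating this way per subject and per category is what makes the cross-group comparisons in the next paragraph (engines vs. humans, category by category) possible.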
Results show that search engines dominate the information‑retrieval category, achieving an average accuracy of 92 %, whereas humans outperform AI in logical reasoning (78 % vs. 65 %) and creative problem solving (85 % vs. 58 %). After a learning phase, Google’s performance improves by 12 %, while the 30‑year‑old human shows a 25 % gain, indicating superior adaptability. The highest absolute AI‑IQ is recorded for Google (98 points), followed closely by Baidu (95 points). Human absolute IQ scores are 108 (10‑year‑old), 112 (30‑year‑old), and 101 (60‑year‑old). In deviation IQ terms, humans sit above the mean, while AI systems fall roughly –0.4 σ below the human average.
The authors draw several conclusions. First, contemporary AI excels in narrow, data‑driven tasks but still lags behind humans in generalized reasoning and creativity. Second, the SIS and AI‑IQ framework provide a promising avenue for comparing human and machine intelligence on a common scale. Third, the current implementation has notable limitations: the weighting scheme for SIS components is somewhat subjective, the human sample size is extremely small, and the task set may be biased toward human‑centric abilities. Future work should expand the range of AI systems (e.g., embodied robots, autonomous vehicles, conversational agents) and include a larger, more diverse human cohort to improve statistical robustness.
Finally, the paper argues that a standardized AI‑IQ could become a useful tool in AI ethics and regulation. Quantitative intelligence metrics could inform risk assessments, liability frameworks, and educational standards for AI deployment. However, the authors caution that the present model must be refined—particularly regarding weight calibration, memory‑learning representation, and cross‑cultural task design—before it can serve as a reliable policy instrument. In sum, the study offers a novel theoretical construct and an initial empirical demonstration, laying groundwork for a unified intelligence measurement that spans both biological and artificial agents.