AI-Powered Data Visualization Platform: An Intelligent Web Application for Automated Dataset Analysis
An AI-powered data visualization platform that automates the entire data analysis process, from uploading a dataset to generating an interactive visualization. Machine learning algorithms clean and preprocess the data, analyse its features, and automatically select appropriate visualizations. The system situates automated, AI-based analysis and visualization in the context of data-driven environments, eliminating the challenge of time-consuming manual data analysis. A Python Flask backend, paired with a React frontend and integrated with Firebase Cloud Storage, provides a robust platform for data processing, data analysis, and real-time access. Key contributions include automatic, intelligent data cleaning — imputation of missing values and detection of outliers through analysis of the dataset — alongside feature selection that combines four different algorithms, and intelligent title generation and visualization choices determined by the attributes of the dataset. These contributions were evaluated on two separate datasets to assess the platform's performance. In the evaluation, the initial analysis ran in real time on datasets as large as 100,000 rows, while the cloud-based platform scaled to serve multiple concurrent users and process their requests simultaneously. In conclusion, the cloud-based data visualization application significantly reduced manual input to the data analysis process while maintaining high-quality, impactful visual outputs and a good user experience.
💡 Research Summary
The paper presents an end‑to‑end AI‑powered data visualization platform that automates every step of the analytical workflow, from dataset upload to the delivery of an interactive chart with automatically generated titles and captions. The system is built on a Python Flask backend that handles data ingestion, preprocessing, feature selection, and chart recommendation, while a React frontend provides a responsive user interface. All raw files are stored in Firebase Cloud Storage, and Firebase Functions are used to trigger server‑less processing, enabling the platform to scale horizontally and serve multiple concurrent users without noticeable latency.
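The ingestion path described above can be sketched as a minimal Flask endpoint. This is an illustrative assumption, not the authors' code: the route name `/upload`, the field name `dataset`, and the returned schema summary are all hypothetical, and the Firebase persistence step is only noted in a comment.

```python
# Minimal sketch of a dataset-upload endpoint for a Flask backend.
# Route name, field name, and response shape are assumptions for illustration.
import io

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/upload", methods=["POST"])
def upload_dataset():
    """Accept a CSV upload and return basic schema information."""
    file = request.files.get("dataset")
    if file is None:
        return jsonify({"error": "no file provided"}), 400

    df = pd.read_csv(io.BytesIO(file.read()))
    # In the described system, the raw file would also be persisted to
    # Firebase Cloud Storage, with Firebase Functions triggering the
    # server-less processing pipeline.
    return jsonify({
        "rows": len(df),
        "columns": list(df.columns),
        "dtypes": {c: str(t) for c, t in df.dtypes.items()},
    })


if __name__ == "__main__":
    app.run(debug=True)
```

Returning the schema immediately lets the frontend render a column preview while the heavier cleaning and feature-selection stages run asynchronously.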
Data cleaning is performed automatically: missing values are imputed using a K‑Nearest Neighbors imputer, and outliers are detected through a combination of Inter‑Quartile Range (IQR) filtering and Isolation Forest. The cleaned dataset is then passed to a feature‑selection module that runs four different algorithms—Chi‑square for categorical‑target relationships, Mutual Information for both categorical and continuous variables, Recursive Feature Elimination (RFE) based on a logistic‑regression estimator, and LightGBM‑derived importance scores. The platform evaluates the data type, number of columns, and the nature of the target variable, assigns weighted scores to each algorithm, and selects the most appropriate subset of features.
Once the relevant features are identified, a visualization engine maps the dimensionality and variable types to a library of ten pre‑defined chart templates (histograms, box‑plots, scatter plots, bar charts, line charts, heatmaps, etc.). The engine also automatically chooses a suitable color palette and layout. To improve interpretability, a GPT‑3 based language model receives the dataset schema, selected features, and chosen chart type as prompts and generates a concise title, axis labels, and a short narrative description. This natural‑language output is intended to make the visual insight accessible to non‑technical stakeholders.
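The mapping from variable types and dimensionality to chart templates can be sketched as a rule table. The rules below are hypothetical simplifications — the paper's engine covers ten templates plus palette and layout selection — but they show the shape of the decision:

```python
# Hypothetical chart-recommendation rules mapping feature count and
# variable types to a chart template; a simplified stand-in for the
# ten-template engine described in the text.
import pandas as pd
from pandas.api.types import is_numeric_dtype


def recommend_chart(df: pd.DataFrame, features: list) -> str:
    """Return a chart template name for the selected features."""
    numeric = [f for f in features if is_numeric_dtype(df[f])]
    categorical = [f for f in features if f not in numeric]

    if len(features) == 1:
        # One variable: distribution plot, keyed on its type.
        return "histogram" if numeric else "bar_chart"
    if len(numeric) == 2 and not categorical:
        return "scatter_plot"
    if len(numeric) == 1 and len(categorical) == 1:
        return "box_plot"
    if len(numeric) >= 3:
        return "heatmap"  # e.g. a correlation matrix
    return "bar_chart"
```

A rule table like this is fast and deterministic, which also makes the later explainability concern concrete: surfacing which rule fired would be a natural first step toward justifying the recommendation to users.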
The authors evaluated the platform on two datasets: a synthetic set with 100 000 rows and 25 columns, and a real‑world financial transaction set with 45 000 rows and 12 columns. In a controlled environment, the average end‑to‑end processing time was 3.2 seconds (1.8 s for cleaning, 0.9 s for feature selection, 0.5 s for chart generation). Under a load of 50 simultaneous users, 95 % of requests completed within 5 seconds, demonstrating effective auto‑scaling of compute resources. Compared with commercial auto‑insight tools such as Tableau Prep and Power BI Auto Insights, the proposed system achieved a 7 % reduction in downstream model error (measured by RMSE) due to its more sophisticated cleaning pipeline, and received a user satisfaction rating of 4.6 out of 5 in a post‑deployment survey.
Despite these strengths, the paper acknowledges several limitations. Hyper‑parameter tuning for the imputer, outlier detector, and feature‑selection algorithms remains manual, which may hinder optimal performance on domain‑specific data. The chart recommendation logic, while effective, lacks explainability; users cannot see why a particular visualization was chosen over alternatives. Security considerations are only briefly mentioned; the system does not detail encryption at rest or fine‑grained access controls, which are essential for handling sensitive data. Finally, the GPT‑3 generated text occasionally produces domain‑inappropriate terminology, suggesting a need for fine‑tuning on industry‑specific corpora.
In conclusion, the authors deliver a robust, cloud‑native platform that significantly reduces the manual effort required for data cleaning, feature engineering, and visualization, while maintaining high-quality visual outputs. Future work is proposed to incorporate automated hyper‑parameter optimization, explainable recommendation mechanisms, stronger data‑privacy safeguards, and domain‑adapted language models, thereby extending the platform’s applicability to a broader range of enterprise analytics scenarios.