Qingyan Zhitan: Economic Census Data Mining, Visual Analysis, Exploration and Implementation

The economic census, the population census and the agricultural census are China's three periodic national censuses. The economic census is conducted every five years, in years ending in 3 and 8. China carried out national economic censuses in 2004, 2008, 2013 and 2018, and this year marks the fifth.

The economic census is a large-scale survey of national conditions and national strength. Its purpose is to let the state grasp the scale and layout of the secondary and tertiary industries; understand China's industrial organization, industrial structure, industrial technology and the composition of the various factors of production; establish the basic situation of all types of enterprises and the energy consumption of census units; build and improve a basic unit directory database, a basic information database and a statistical electronic geographic information system covering all industries of the national economy; and support the formulation of national economic and social development plans, thereby improving the level of decision-making and management.

The current state of research on China's economic census data

China's quality control and evaluation of economic census data run through the entire census process: dedicated control and evaluation methods exist for the pre-census listing, for registration and aggregation during the census, and for sample checks afterward. At this stage, however, research on economic census data by China's statistical agencies still lags well behind that of other countries. Many developed economies, such as those in Europe and the United States, have moved on to web applications, data warehouses and intelligent data analysis, whereas China's statistical agencies have not yet truly applied intelligent data analysis to economic census data and still obtain their results mainly through network and database technology. As China's economy keeps developing and adjustments during the census process multiply, the workload has become overwhelming, and the many manual steps introduce errors and delay the work. Census work therefore needs more advanced technology for network data transmission, and should apply intelligent data analysis or data mining methods systematically to study census results.

Analysis of data mining requirements for China's economic census

The economic census is a major large-scale survey of national conditions and national strength, covering all legal entities, industrial activity units and self-employed households, and its content varies with the type of respondent. Ultimately, data mining and visual analysis of economic census data are a form of data processing, and data processing cannot be separated from the management of data storage.

Data storage management serves the data processing needs of the economic census and has two main functions: data entry and storage, and data analysis and processing. Database storage technology meets the entry needs, and database scripting languages meet the needs of table processing. Starting from this idea and progressively refining the requirements analysis, the following functional points were identified: user management, data entry, data query and aggregation, system navigation and help, and system security management.
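To make the split between "storage for entry" and "scripting for table processing" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `census_unit`, its columns, and the helper functions `open_store`, `enter_units` and `summarize_by_region` are hypothetical illustrations, not the schema or API of any actual census system; a production system would use a server database with user management and security controls.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the design of any actual census system.
SCHEMA = """
CREATE TABLE IF NOT EXISTS census_unit (
    unit_id      TEXT PRIMARY KEY,
    region       TEXT NOT NULL,
    industry     TEXT NOT NULL,
    employees    INTEGER CHECK (employees >= 0),
    output_value REAL CHECK (output_value >= 0)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def enter_units(conn, rows):
    """Data entry: insert (unit_id, region, industry, employees, output_value) rows.

    The CHECK constraints reject negative values; the whole batch is rolled
    back if any row fails.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.executemany("INSERT INTO census_unit VALUES (?, ?, ?, ?, ?)", rows)


def summarize_by_region(conn):
    """Query and summary: unit count, total employees and total output per region."""
    return conn.execute(
        """
        SELECT region, COUNT(*), SUM(employees), SUM(output_value)
        FROM census_unit
        GROUP BY region
        ORDER BY region
        """
    ).fetchall()
```

The constraint checks sit in the database rather than in application code so that every entry path, online or batch, is validated the same way.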

Intelligent data analysis builds on data collection, statistics and aggregation. It uses intelligent analysis models to uncover relationships in census data, problems in economic development, regional divisions by level of economic development and the levels of economic indicators, and, based on those indicators, to derive countermeasures and suggestions for the layout of national economic development. It must meet three needs: effective statistics and clustering for data reports that existing database technology cannot aggregate; helping database technology uncover problems more comprehensively, so that the overall situation can be grasped more fully; and providing sound basic analysis and suggestions for the next stage of economic development.

Summary of the data mining requirements analysis for the economic census.

Methods and implementation of intelligent data analysis for the economic census

Data quality is the lifeline of census work. The Fifth National Economic Census has new characteristics: the number of survey subjects has increased significantly, verifying census units has become far more difficult than ever before, and input-output surveys are being conducted in a coordinated way for the first time. Intelligent data analysis methods are therefore very important. Our approach has three steps. First, we preprocess the raw data, including data cleansing and missing-value handling. Next, we use cluster analysis to classify industries and association rule mining to find correlations between industries. Finally, we use visualizations such as bar charts and line charts to show the output value and employment of different industries, along with trends in economic growth and employment over time.
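The preprocessing step can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: records are assumed to be tuples `(unit_id, industry, employees, output_value)` with `None` for missing values, "cleansing" is assumed to mean removing exact duplicates and impossible negative values, and missing employee counts are filled with the median of the same industry, which is only one of many possible imputation strategies. The function names `clean` and `impute_employees` are hypothetical.

```python
from statistics import median

# Hypothetical record layout: (unit_id, industry, employees, output_value),
# with None marking a missing value. Real census fields will differ.


def clean(records):
    """Data cleansing: drop exact duplicates and rows with negative values."""
    seen = set()
    kept = []
    for rec in records:
        if rec in seen:
            continue
        seen.add(rec)
        _, _, employees, output = rec
        if (employees is not None and employees < 0) or (
            output is not None and output < 0
        ):
            continue
        kept.append(rec)
    return kept


def impute_employees(records):
    """Missing values: fill a missing employee count with its industry's median."""
    observed = {}
    for _, industry, employees, _ in records:
        if employees is not None:
            observed.setdefault(industry, []).append(employees)
    medians = {ind: median(vals) for ind, vals in observed.items()}
    filled = []
    for unit_id, industry, employees, output in records:
        if employees is None:
            # Stays None when no unit in the industry reported a value.
            employees = medians.get(industry)
        filled.append((unit_id, industry, employees, output))
    return filled
```

Cleansing runs before imputation so that invalid rows cannot distort the medians used to fill gaps.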

There are many conventional intelligent data analysis methods, such as rough fuzzy sets, probabilistic rough sets, genetic algorithms, decision-tree classification, Bayesian classification, hierarchical clustering, Bayesian networks, Markov networks, influence diagrams for decision-making, reinforcement learning algorithms and data fusion analysis. Given the characteristics of the economic census, classification and clustering methods can be used to realize intelligent analysis, chiefly the fuzzy clustering algorithm, the MMD algorithm (maximum-minimum distance algorithm), the k-means clustering method and the FCM algorithm (fuzzy C-means clustering).

Fuzzy clustering algorithm

The fuzzy clustering algorithm is a widely used fuzzy mathematical method. It constructs a fuzzy matrix from the attributes of the objects under study and, on that basis, determines cluster relationships according to a chosen degree of membership. Clustering is an important unsupervised learning method: it groups similar samples into the same class, so that samples within a class are highly similar (close to each other), while dissimilar samples fall into different classes.

A fuzzy clustering algorithm usually represents the assignment of a data point as a vector with one entry per cluster: the larger the value in a given dimension, the closer the point is to the corresponding cluster, that is, the higher its degree of membership in that cluster. In fuzzy cluster analysis, each sample has a separate degree of membership for every cluster, rather than simply belonging or not belonging to a class.
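One classic way to realize the "fuzzy matrix plus degree of membership" idea is fuzzy equivalence clustering: build a fuzzy similarity matrix, take its transitive closure under max-min composition to obtain a fuzzy equivalence matrix, then cut it at a threshold λ so that samples whose closure similarity is at least λ share a class. The sketch below assumes min-max scaled features and the similarity r_ij = 1 - d_ij / max(d) based on Euclidean distance; both choices, and the function names, are illustrative assumptions rather than the article's own method.

```python
import numpy as np


def similarity_matrix(X):
    """Fuzzy similarity r_ij = 1 - d_ij / max(d) on min-max scaled features."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant columns then contribute nothing
    Z = (X - X.min(axis=0)) / span
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    dmax = D.max()
    return np.ones_like(D) if dmax == 0 else 1.0 - D / dmax


def maxmin_compose(R, S):
    """Max-min composition: (R o S)_ij = max_k min(R_ik, S_kj)."""
    return np.max(np.minimum(R[:, :, None], S[None, :, :]), axis=1)


def transitive_closure(R):
    """Square R under max-min composition until it stops changing.

    Because r_ii = 1, each squaring can only raise entries, and max-min only
    reuses existing values, so the loop terminates with an equivalence matrix.
    """
    while True:
        R2 = maxmin_compose(R, R)
        if np.array_equal(R2, R):
            return R2
        R = R2


def lambda_cut_clusters(C, lam):
    """Label samples so that i and j share a class iff C_ij >= lam."""
    labels = -np.ones(C.shape[0], dtype=int)
    current = 0
    for i in range(C.shape[0]):
        if labels[i] < 0:
            labels[C[i] >= lam] = current
            current += 1
    return labels
```

Lowering λ merges classes and raising it splits them, so scanning several λ values gives a hierarchy of partitions from which an analyst can pick a suitable level.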

MMD algorithm

The abbreviation MMD is used for two different methods, which should not be confused. In transfer learning, Maximum Mean Discrepancy (MMD) measures the difference between two distributions and is widely used as a loss function. It maps samples into a feature space through a kernel function, typically the Gaussian kernel, and compares the mean embeddings of two samples drawn from different distributions; a small value indicates that the two distributions are similar.
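The Maximum Mean Discrepancy described above has a simple sample estimate. The sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel; the bandwidth `sigma=1.0` is an illustrative assumption (in practice it is often set by a heuristic such as the median pairwise distance), and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every pair of rows of A and B."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * sigma**2))


def mmd2_biased(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X (n x d) and Y (m x d).

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 mean k(x, y); it is zero when the
    two samples are identical and grows as their distributions move apart.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return float(
        gaussian_kernel(X, X, sigma).mean()
        + gaussian_kernel(Y, Y, sigma).mean()
        - 2.0 * gaussian_kernel(X, Y, sigma).mean()
    )
```

For example, two samples drawn from the same distribution give a value near zero, while a sample shifted by a few standard deviations gives a clearly larger value.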

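An advantage of Maximum Mean Discrepancy is that it is nonparametric: it works directly from the samples, without assuming a parametric form for the two distributions or estimating their densities, although the kernel bandwidth must still be chosen. The clustering method listed above under the same abbreviation, the maximum-minimum distance algorithm, is a different technique: a pattern recognition algorithm based on Euclidean distance that picks each new cluster seed as far as possible from the existing ones, which avoids seeds lying too close together and so gives better clustering performance.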
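The maximum-minimum distance algorithm mentioned in the list of clustering methods can be sketched as follows, following its usual textbook form: start from one sample, take the farthest sample as the second seed, then repeatedly promote the sample whose distance to its nearest seed is largest, as long as that distance exceeds `theta` times the distance between the first two seeds. The starting sample `first=0`, `theta=0.5` and the function name are illustrative assumptions.

```python
import numpy as np


def max_min_distance(X, theta=0.5, first=0):
    """Maximum-minimum distance clustering.

    Returns (indices of the samples chosen as seeds, cluster label per sample).
    """
    X = np.asarray(X, dtype=float)
    centers = [first]
    d = np.linalg.norm(X - X[first], axis=1)
    second = int(np.argmax(d))
    if d[second] == 0:  # all samples identical: one cluster
        return np.array(centers), np.zeros(len(X), dtype=int)
    centers.append(second)
    threshold = theta * d[second]
    while True:
        # Distance from every sample to every current seed, then to the nearest one.
        dist = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=2)
        nearest = dist.min(axis=1)
        candidate = int(np.argmax(nearest))
        if nearest[candidate] > threshold:
            centers.append(candidate)  # far from all seeds: becomes a new seed
        else:
            # No sample is far enough away: assign each to its nearest seed.
            return np.array(centers), dist.argmin(axis=1)
```

Because every new seed must be far from all existing ones, the number of clusters emerges from `theta` rather than being fixed in advance; the seeds can also serve as starting centers for k-means.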

k-means clustering method

The k-means clustering method is an unsupervised learning algorithm that divides the data into k groups so that data points within each group are as similar as possible and points in different groups are as dissimilar as possible. It works iteratively: each data point is assigned to its nearest cluster center, each center is then recomputed as the mean of the points assigned to it, and the two steps repeat until the assignments stop changing, yielding k clusters.
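The assign-then-update loop described above can be written in a few lines of NumPy. This is a minimal sketch of standard k-means (Lloyd's iteration) with random initial centers chosen from the data; the function names, `seed` and `n_iter` are illustrative assumptions, and a library implementation such as scikit-learn's would add smarter initialization and multiple restarts.

```python
import numpy as np


def _assign(X, centers):
    """Index of the nearest center (Euclidean distance) for every sample."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Standard k-means: returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its samples; keep it if it lost them all.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers
```

In the census setting, each row of `X` might hold a region's or an industry's indicators (after scaling), and the resulting labels give the grouping by economic level.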

The fuzzy clustering algorithm is closely related to k-means. k-means assigns each sample to exactly one cluster based on the Euclidean distance between samples, while fuzzy clustering uses a similarity measure between samples and allows partial membership. Some ideas from k-means can therefore be borrowed when performing fuzzy cluster analysis; fuzzy C-means, for instance, is essentially a soft version of k-means.

FCM algorithm

The FCM algorithm, also known as fuzzy C-means clustering, is a membership-based soft clustering method. It divides a dataset into c classes; each sample has a degree of membership in every class, and the memberships of each sample across all classes sum to 1.

The FCM algorithm determines the cluster centers and the membership matrix by minimizing the objective function

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert x_k - v_i \rVert^{2}$$

where $x_k$ is the $k$-th of $n$ samples, $u_{ik}$ is the membership of $x_k$ in class $i$ (so $U = [u_{ik}]$ is the $c \times n$ membership matrix), $V = (v_1, v_2, \ldots, v_c)$ are the cluster centers, and $m > 1$ is the fuzzy parameter, which determines how fuzzy the clustering is, that is, the degree to which a data point can belong to several classes. In most cases $m = 2$.
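The objective above is usually minimized by the standard alternating updates: each center $v_i$ becomes the $u_{ik}^m$-weighted mean of the samples, and each membership is recomputed as $u_{ik} = 1 / \sum_{j=1}^{c} (\lVert x_k - v_i \rVert / \lVert x_k - v_j \rVert)^{2/(m-1)}$. The NumPy sketch below follows those updates; the random initialization, the tolerance, the small floor on distances (to avoid division by zero when a sample sits on a center) and the function names are illustrative assumptions.

```python
import numpy as np


def fcm(X, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Fuzzy C-means: returns (centers V as c x d, memberships U as c x n)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # each sample's memberships sum to 1
    for _ in range(max_iter):
        W = U**m
        V = (W @ X) / W.sum(axis=1, keepdims=True)  # weighted means of samples
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # c x n
        D = np.fmax(D, 1e-12)  # guard against a sample lying on a center
        # ratio[i, j, k] = (D_ik / D_jk)^(2/(m-1)); sum over j gives 1 / u_ik.
        ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return V, U


def fcm_objective(X, U, V, m=2.0):
    """J_m(U, V) = sum_i sum_k u_ik^m * ||x_k - v_i||^2."""
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)  # c x n
    return float(np.sum((U**m) * sq_dist))
```

Taking `U.argmax(axis=0)` turns the soft memberships into hard labels when a single class per sample is needed, while the memberships themselves show how clearly each sample belongs to its class.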

Prospects for intelligent data mining and visual analysis of the economic census

With the rise of big data, the Internet of Things, artificial intelligence and other technologies, the economic situation has grown more complex. China pays attention to both the speed and the quality of development, which is an important reflection of the country's comprehensive strength. Given the key challenges of the economic census, intelligent data mining and visual analysis of census data will have the following characteristics:

It covers the whole economic census process

From data collection and entry into the database system, through effective analysis of the data, to generating data reports and presenting the analysis results, a visual intelligent data mining system for the economic census must cover all of these stages as an integrated whole.

The data storage system ensures the timeliness and sharing of census data

In the past, census staff visited units and entered the data into electronic devices by hand. The current system instead lets census units enter their data directly online while census staff view and review it in real time, which makes the process far more timely. At the same time, the census data network allows units at every level, from top to bottom, to view and work with the data, so a superior unit no longer has to wait for a lengthy report from its subordinates to understand the data; the system thus enables data sharing.

The intelligent data analysis system applies today's popular data mining techniques

Data mining emerged against a background of information "explosion" but "knowledge poverty" on the network; its purpose is to discover latent patterns and useful "knowledge" in massive amounts of information. National economic census data are exactly this kind of large and varied data: they contain latent patterns, but also many inherent problems that are hard to detect, and efficient visual data mining techniques are needed to make up for this shortcoming.

MATLAB plots are accurate and comprehensive

MATLAB is a mature mathematical software package whose plotting functions can produce a wide variety of charts, and using it to present the results of economic census data analysis helps ensure that the plotted data are accurate. MATLAB also makes it relatively easy to draw three-dimensional figures, so trends in economic census data can be shown more intuitively and comprehensively; general-purpose statistical software either lacks this capability at present or offers it with less clear results.

In short, mining and visual analysis of economic census data is an important task in the era of big data. It helps us better understand how the national economy operates and provides strong support for formulating macroeconomic policy. In future research, we will continue to explore more efficient and accurate methods for mining and visually analyzing economic census data, so as to contribute more to China's economic development.

Written by | Wang Qiuhui, researcher at the Intelligent Data Mining Research Department of Qingyan Group

Editor | Chen Zexi

