Some time ago, we released the Listed Company Data Asset Index Measurement Data (2007-2022). Now, to further enrich the research resources available to our members, we have added the Estimated Data on the Information Disclosure Level of Data Assets of Listed Companies (2001-2022), making the research material on data assets richer and more complete.
Data Description: Because data assets have not yet been included in the balance sheet, their disclosure remains voluntary. The level of data asset information disclosure can therefore be evaluated by mining data-asset-related information from the annual reports of listed companies.
The specific steps are as follows:
Step 1. Use "data assets" as a seed word.
Step 2. Drawing on the view, released by the China Academy of Information and Communications Technology in 2019, that "data assets are data resources that can bring economic benefits", "data resources" is also used as a seed word.
Step 3. Starting from the two seed words "data assets" and "data resources", a word2vec neural network model is used to obtain the set of words similar to each seed word. To improve measurement accuracy, only words with a similarity greater than or equal to 0.5 are retained, which completes the construction of the dictionary.
Step 4. Count the frequencies of the seed words and similar words in the annual financial reports and compute the level of data asset information disclosure with the following formula:
$$Data_{it} = \frac{\sum_{n} DictionaryWords_{itn}}{TotalWords_{it}} \times 10$$
where $Data_{it}$ denotes the level of data asset information disclosure; $DictionaryWords_{itn}$ is the exact frequency, in the annual financial report of **i in year t, of the n-th seed or similar word in the dictionary; and $TotalWords_{it}$ is the total word count of **i's annual report in year t.
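The dictionary construction and the disclosure formula can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the word vectors are assumed to come from an already trained word2vec model (replaced here by hypothetical 2-D toy vectors), similarity is measured as cosine similarity, and each report is assumed to be already segmented into words so that every dictionary term is a single token. The names `build_dictionary`, `disclosure_level`, `EMBEDDINGS`, `SEEDS`, and `REPORT_TOKENS` are hypothetical; the 0.5 threshold and the ×10 scale factor follow the text.

```python
import math
from collections import Counter

# Hypothetical toy vectors standing in for a trained word2vec model.
# Real vectors would have many more dimensions and a much larger vocabulary.
EMBEDDINGS = {
    "data assets": [1.0, 0.0],
    "data resources": [0.9, 0.1],
    "data elements": [0.8, 0.3],
    "revenue": [0.0, 1.0],
    "inventory": [0.2, 1.0],
}
SEEDS = ["data assets", "data resources"]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_dictionary(seeds, embeddings, threshold=0.5):
    """Return the seed words plus every word whose similarity to a seed is >= threshold."""
    dictionary = set(seeds)
    for seed in seeds:
        if seed not in embeddings:
            continue  # assumption: a seed missing from the vocabulary is kept but not expanded
        for word, vec in embeddings.items():
            if word != seed and cosine(embeddings[seed], vec) >= threshold:
                dictionary.add(word)
    return dictionary


def disclosure_level(tokens, dictionary, scale=10):
    """Data_it: total frequency of dictionary words / total words in the report, times scale."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in dictionary)  # Counter returns 0 for absent words
    return hits * scale / len(tokens)


# Hypothetical, already word-segmented annual report for one firm-year.
REPORT_TOKENS = ["data assets", "revenue", "data elements", "inventory", "revenue",
                 "data assets", "profit", "profit", "profit", "profit"]

dictionary = build_dictionary(SEEDS, EMBEDDINGS)
score = disclosure_level(REPORT_TOKENS, dictionary)
```

With the toy data, the dictionary keeps "data elements" but not "revenue" or "inventory", and `score` is 3.0 (3 dictionary hits out of 10 words, times 10). Running `disclosure_level` once per firm-year report yields the raw panel of $Data_{it}$ values.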
In addition, because the data asset information disclosure variable has a right-skewed distribution, the indicator is log-transformed (the logarithm of the value plus 1) and then normalized to [0, 1]. The result is an index that measures the level of data asset information disclosure of listed companies.
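The log-plus-one transform and the [0, 1] normalization can be sketched as follows. The text does not name the normalization method, so min-max scaling is assumed here, and the minimum and maximum are assumed to be taken over the whole panel of firm-years passed in (the scaling could instead be done per year). The function name `normalize_index` and the sample scores are hypothetical.

```python
import math


def normalize_index(raw_scores):
    """Log-transform (log of value + 1) and min-max scale raw Data_it values to [0, 1].

    raw_scores maps a (firm, year) key to its raw Data_it value. The minimum
    and maximum are taken over every observation passed in (an assumption;
    the text does not say whether scaling is done per year or over the panel).
    """
    if not raw_scores:
        return {}
    logged = {key: math.log1p(value) for key, value in raw_scores.items()}
    lo, hi = min(logged.values()), max(logged.values())
    if hi == lo:
        # Every observation is identical; map all of them to 0.0 to avoid dividing by zero.
        return {key: 0.0 for key in logged}
    return {key: (value - lo) / (hi - lo) for key, value in logged.items()}


# Hypothetical raw scores for two firms.
raw = {("A", 2020): 0.0, ("A", 2021): 3.0, ("B", 2021): 1.0}
index = normalize_index(raw)
```

Taking the logarithm of the value plus 1, rather than of the value itself, keeps reports with zero dictionary hits defined, and min-max scaling preserves the ranking of firm-years while mapping them onto a common [0, 1] range.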
A total of 68,724 texts of listed companies were mined, including annual reports, social responsibility reports, ESG reports, sustainability reports, and environmental reports.
Of these, the 68,724 texts comprise 56,176 annual reports and 12,548 other texts (social responsibility reports, ESG reports, sustainability reports, environmental reports, and similar documents).
Data Retrieval and **
The data can be obtained by joining the data membership database (membership also lets you ** more high-quality datasets; search Baidu for details).