The development of big data

Mondo Technology Updated on 2024-02-01

Precisely because big data is so pervasive, solving big data problems is highly challenging, and its wide application has prompted more and more people to pay attention to and study it. The following are some representative events in the development of big data.

In 2005, the Hadoop project was born. Hadoop was originally inspired by Google's MapReduce programming model and was at first used only for web indexing; it was later adopted by the Apache Software Foundation as a distributed system infrastructure. Hadoop lets users develop distributed programs without understanding the underlying details of distribution and exploit the full power of a cluster for high-speed computing and storage, processing data in a reliable, efficient, and scalable way. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it.

At the end of 2008, "big data" was recognized by some well-known computer science researchers in the United States, when the industry organization Computing Community Consortium published an influential report, "Big-Data Computing: Creating Revolutionary Breakthroughs in Commerce, Science, and Society". By arguing that what really matters about big data is the new uses and new insights, rather than the data itself, the report changed the way people thought about it. The Computing Community Consortium was among the first to put forward the concept of big data.

In mid-2009, the United States government launched data.gov to make a wide variety of government data available to the public. The site's more than 445,000 data sets were used to power smartphone apps that track information such as flight status, product recalls, and region-specific unemployment rates, spurring similar initiatives in countries from Kenya to the UK.

In February 2010, Kenneth Cukier published a 14-page special report on big data in The Economist, "Data, Data Everywhere". In it, Cukier wrote: "The world contains an unimaginably vast amount of digital information, and it is growing ever faster. From economics to science, from government to the arts, the effects of this deluge of information are being felt in many ways. Scientists and computer engineers have coined a new term for the phenomenon: big data." Cukier thus became one of the first observers to recognize the trends of the big data era.

In February 2011, IBM's Watson supercomputer, which scanned and analyzed 4 terabytes of data (about 200 million pages of text), defeated two human contestants to win the famous American quiz show "Jeopardy!". The New York Times later called the moment "a victory for big data computing".

In May 2011, the McKinsey Global Institute (MGI), the research arm of the world-renowned consulting firm McKinsey & Company, released the report "Big Data: The Next Frontier for Innovation, Competition, and Productivity", the first comprehensive study of big data by a professional organization. According to the report, big data had already permeated every industry and business function, becoming an important factor of production, and the mining and use of massive data heralded a new wave of productivity growth and consumer surplus. The report also noted that "big data" stems from a dramatic increase in the capacity and speed at which data is produced and collected: as more people, devices, and sensors are connected through digital networks, the ability to generate, transmit, share, and access data has been revolutionized.

In December 2011, China's Ministry of Industry and Information Technology issued the 12th Five-Year Plan for the Internet of Things, which listed information processing technology as one of four key technological innovation projects, covering massive data storage, data mining, and intelligent image analysis, all of which are important components of big data.

In January 2012, big data was one of the main themes at the World Economic Forum in Davos, Switzerland, where the report "Big Data, Big Impact" was released, declaring that data had become a new class of economic asset, just like currency or gold.

In March 2012, the Obama administration launched the Big Data Research and Development Initiative at the White House, marking big data's emergence as a defining feature of the era. On March 22, 2012, Obama announced a $200 million investment in the field of big data, a watershed in the rise of big data technology from commercial practice to national science and technology strategy. National digital sovereignty means the possession and control of data; after border, coastal, and air defense, digital sovereignty will be another arena of great-power competition.

In April 2012, the American software company Splunk was successfully listed on NASDAQ on the 19th, becoming the first publicly traded big data processing company. Against the backdrop of a persistently sluggish U.S. economy, Splunk's first-day trading performance stood out, with its share price more than doubling. Founded in 2003, Splunk is a leading provider of big data monitoring and analytics software. Its successful listing drew the capital market's attention to big data, and IT vendors in turn accelerated their big data deployments.

In July 2012, the United Nations released a report on big data in New York summarizing how governments can use big data to better serve and protect their people. The report describes the respective roles, motivations, and needs of individuals, the public sector, and the private sector in the data ecosystem. For example, individuals provide data and crowdsourced information out of a desire for attention and better service, while demanding privacy and the right to opt out; the public sector provides statistical data, device information, health indicators, and tax and consumption information for the purpose of improving services and efficiency, with the same need for privacy and opt-out rights. The report also points out that the abundance of data resources available today, both old and new, makes it possible to analyze social demographics in real time as never before.

In April 2014, the World Economic Forum released the 13th edition of its Global Information Technology Report, themed "The Rewards and Risks of Big Data". According to the report, policies for various ICTs will become even more important in the coming years. The growing activity of the global big data industry and the accelerating pace of technological evolution and application innovation have led countries to gradually recognize the great significance of big data in promoting economic development, improving public services, enhancing people's well-being, and even safeguarding national security.

In May 2014, the White House released the research report "Big Data: Seizing Opportunities, Preserving Values". The report encourages the use of data to drive social progress, especially in areas where markets and existing institutions do not otherwise support such progress; at the same time, it calls for frameworks, structures, and research to protect Americans' strong belief in individual privacy, to ensure fairness, and to prevent discrimination.

In March 2016, China's 13th Five-Year Plan called for implementing a national big data strategy: treating big data as a basic strategic resource, comprehensively promoting its development, accelerating the sharing, opening, development, and application of data resources, and supporting industrial transformation and upgrading as well as innovation in social governance. It called for efficiently collecting and effectively integrating big data in key areas, deepening the correlation analysis and integrated use of government and social data, and improving the precision and effectiveness of macro regulation, market supervision, social governance, and public services. It also called for accelerating research on key technologies for massive data collection, storage, cleaning, analysis and mining, visualization, and security and privacy protection.

In December 2018, China held the National Conference on Industry and Information Technology, which proposed deeply integrating big data with cutting-edge technologies such as cloud computing and artificial intelligence. These technologies all arise from advances in modes of production and the growth of the information technology industry, and their integration enables ultra-large-scale computing, intelligent automation, and massive data analysis, completing highly complex and precise information processing in a short time.

Big data is a revolution that will change the way we live, work, and think. The transformation brought about by huge volumes of new data sources has attracted great attention from academia, industry, and government.

2 The development of big data technology

Big data technology is a new generation of technologies and architectures that extract value from very large volumes of diverse data through low-cost, rapid collection, processing, and analysis. Big data technologies continue to emerge and develop, making it easier, cheaper, and faster for us to process massive amounts of data; they have become capable assistants for putting data to use and are even changing the business models of many industries.

1) In the direction of big data collection and preprocessing. The most common problem here is that data comes from many sources and in many forms, which leads to uneven data quality and seriously affects data usability. In response, many companies have launched data cleaning and quality-control tools (such as IBM's DataStage).
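As a toy illustration of the kind of cleaning such tools automate, here is a small Python sketch; the record fields and cleaning rules are hypothetical, and real tools handle far more cases at far larger scale:

```python
# Hypothetical raw records: one exact duplicate, one missing field.
records = [
    {"id": 1, "name": "Alice ", "age": "34"},
    {"id": 1, "name": "Alice ", "age": "34"},   # exact duplicate
    {"id": 2, "name": "Bob", "age": ""},        # missing age
]

def clean(rows):
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:        # drop exact duplicates
            continue
        seen.add(key)
        out.append({
            "id": row["id"],
            "name": row["name"].strip(),                     # trim stray whitespace
            "age": int(row["age"]) if row["age"] else None,  # fix type, mark missing
        })
    return out

cleaned = clean(records)
# cleaned == [{"id": 1, "name": "Alice", "age": 34},
#             {"id": 2, "name": "Bob", "age": None}]
```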

2) In the direction of big data storage and management. The most common challenges here are the large scale of storage, the complexity of storage management, and the need to handle structured, unstructured, and semi-structured data alike. The development of distributed file systems and distributed databases is effectively addressing these problems. Within this direction, the development of big data indexing and query technology, along with real-time and streaming big data storage and processing, deserves particular attention.
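One basic technique underlying such distributed stores is hash partitioning: a stable hash of the key decides which node holds which record, so any client can locate data without a central lookup. A minimal Python sketch, with hypothetical node names:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def node_for(key: str) -> str:
    # Hash partitioning: a deterministic hash maps each key to one node.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every client computes the same placement for the same key.
assignment = {k: node_for(k) for k in ["user:1", "user:2", "user:3"]}
```

Real systems refine this idea (consistent hashing, replication, rebalancing) so that adding or removing a node moves as little data as possible.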

3) In the direction of big data hardware and software architecture. A core tenet of big data computing is to attend to both software and hardware: start from the specific application, carefully select the software and hardware architecture, and continuously co-optimize them during operation. The most successful and popular example of such co-optimization in today's big data applications is the neural-network-based deep learning system. Leading Internet companies have built large clusters dedicated to deep learning for vision and speech, then optimized software and hardware together in operation to improve the learning systems' efficiency. Intel, for example, has funded and participated in the GraphLab and Petuum open-source systems.

4) In the direction of big data computing modes. Owing to the diverse needs of big data processing, a variety of typical computing modes have emerged, including query and analysis computing (such as Hive), batch computing (such as Hadoop MapReduce), stream computing (such as Storm), iterative computing (such as HaLoop), graph computing (such as Pregel), and in-memory computing (such as HANA). Hybrids of these modes will become an effective means of meeting the diverse needs of big data processing and applications.
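To make the contrast between batch and stream computing concrete: a batch job re-scans a complete dataset, while a stream system updates its result incrementally as each event arrives. A toy Python sketch of a sliding-window error counter (the window size and event labels are illustrative):

```python
from collections import deque

class WindowedCounter:
    """Stream-computing sketch: count 'error' events over the last `size` events,
    updating incrementally instead of re-reading the whole dataset."""

    def __init__(self, size: int):
        self.window = deque(maxlen=size)  # old events fall off automatically

    def observe(self, event: str) -> int:
        self.window.append(event)
        return sum(1 for e in self.window if e == "error")

counter = WindowedCounter(size=3)
results = [counter.observe(e) for e in ["ok", "error", "ok", "error", "error"]]
# results == [0, 1, 1, 2, 2]
```

Systems like Storm generalize this pattern: operators hold small amounts of state and process an unbounded sequence of events with low latency.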

5) In the direction of big data analysis and mining. As data volumes expand rapidly, deep analysis and mining are needed and the demand for automated analysis keeps rising, so more and more big data analysis tools and products have emerged, such as the RHadoop packages for big data mining and data mining algorithms implemented on MapReduce.

6) In the direction of big data visualization and analysis. Helping people explore and interpret complex data through visualization helps decision-makers uncover the business value of data, which in turn contributes to the development of big data. Many companies are conducting research to bring visualization into their data analysis and display products, and related products will continue to appear. The successful launch of the visualization tool Tableau reflects the demand for big data visualization.

7) In the direction of big data security. While we use big data analysis and data mining to obtain business value, hackers may well be attacking us to gather useful information. The security of big data has therefore long been a major research concern for both enterprises and academia. Technologies such as file access control that limits how data is exposed, device-level encryption, anonymization protection, and encrypted storage protect data to the greatest possible extent.
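As a small illustration of anonymization protection, the sketch below pseudonymizes an identifier with a salted one-way hash, so records from different sources remain joinable without exposing the raw value. The salt and truncation length are illustrative, and this alone is not a complete privacy scheme:

```python
import hashlib

SALT = b"example-salt"  # illustrative; real deployments use a secret, rotated salt

def pseudonymize(identifier: str) -> str:
    # A one-way hash replaces the direct identifier with a stable token:
    # the same input always yields the same token, but the token cannot
    # be reversed to recover the original value.
    return hashlib.sha256(SALT + identifier.encode()).hexdigest()[:16]

token = pseudonymize("alice@example.com")
assert token == pseudonymize("alice@example.com")   # stable, so joins still work
assert token != pseudonymize("bob@example.com")     # distinct users stay distinct
```

Hashing alone does not defeat re-identification when the input space is small and guessable, which is why it is combined with access control and encryption in practice.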

The Big Data Knowledge Series is written by Professor Fan Chongjun's team; the articles do not strictly depend on one another. (Please credit the source when reprinting this article.)
