Object Storage Revisited in 2020, Part 1: From Academic Projects to National Standards

2020-06-22

This article is about the practical application of object storage and feedback from real cases. Its main purpose is to discuss which scenarios object storage suits today, and the factors that decide this. As for the basic concepts and technical details of object storage, I believe you have already picked up enough background elsewhere, so I won't go into them in depth here.



The content is divided into three parts. The first part reviews the development and current status of object storage. The second part focuses on the technical characteristics of object storage and the scenarios they suit. The third part summarizes some practical application cases.


So first, let's look at the development of object storage.


As I see it, the industry's understanding of object storage falls into several clearly separated periods. Given how fast technology changes in the IT industry, I half-jokingly call these "generations" of object storage awareness.


The standard for object storage originally came from academic research. In 1995 the Parallel Data Laboratory at Carnegie Mellon University in the United States started a project called Network-Attached Secure Disks (NASD), and this project was the first to propose the concept of object storage. As the project moved forward, it gained wider recognition. In 1999, SNIA (the Storage Networking Industry Association) established a working group called OSD (Object Storage Device). This working group released the ANSI X3 T10 standard, which is the prototype of object storage.


Five years later, in 2004, SNIA officially released the OSD 1.0 standard. The first wave of object storage technology emerged, a typical journey from academia to industry. The best-known implementation of this first wave is Lustre, which later passed to Oracle through its acquisition of Sun. People in universities and research institutions will be familiar with it, since it is widely used in scientific computing and similar fields. It can be described as the first generation of well-known object storage.


The second wave of object storage technology was driven by Amazon. AWS officially launched in 2006, and S3 object storage was its first cloud service. With the global boom in cloud computing, object storage became far more widely known and developed rapidly. I believe many people first became aware of object storage through S3.


The third technology boom, in my opinion, has a lot to do with the rise of OpenStack. The two earliest components of the OpenStack project are Nova and Swift. Nova, contributed by NASA, addresses virtualization, while Swift is the object storage contributed by Rackspace. Around 2012, a star project, Ceph, appeared in the OpenStack ecosystem, and everyone is familiar with it today. The feature of this unified storage project that first attracted attention was its distributed block storage, but few expected that over the following 8 years its most popular and most widely used capability would be object storage. Meanwhile SwiftStack, a leading specialist object storage company, was in the news this year for being acquired by NVIDIA. How things develop is truly unpredictable.


Back in China, a wave of object storage began to appear in 2018: not only did many object storage companies enter the market, but traditional IT vendors also launched their own object storage products, and a large number of users in different industries began to adopt object storage solutions. In 2020, object storage is still receiving a great deal of attention.


Object storage has now been developing for about 15 years. It is worth looking at it from a few interesting angles, and the architecture diagrams of different periods each carry the flavor of their time.



From the simple sketches attached to the early papers and standards to the flat two-dimensional architecture diagrams of later years, the information conveyed moves from the abstract to the increasingly concrete, with growing emphasis on describing the logical architecture of a specific deployment.



At the same time, if you look at the technical highlights of the object storage products launched in different years, you will find that they have shifted from the sophistication of the network to integration with container technology. From this one detail you can glimpse the whole picture and feel wave after wave of change in IT.



The relevant technical standards for object storage have also developed and changed in recent years.


Before 2015, the industry still followed de facto standards and vendor standards, and the main references were Amazon's S3 and OpenStack Swift.



In September 2015 and August 2019, the national standardization committee issued two relevant national standards. They are:


"Standard Number: GB/T 31916.2-2015; Standard Name: Information Technology - Cloud Data Storage and Management - Part 2: Object-based Cloud Storage Application Interface"


"Standard Number: GB/T 37732-2019; Standard Name: Information Technology - Cloud Computing - General Technical Requirements for Distributed Block Storage System"



Strictly speaking, both standards belong to the broader category of cloud computing, but they include cloud storage technology, and object storage, as a sub-category, takes up a large share of the text.


These two standards describe the system architecture, functions and interface protocol specifications of object storage in considerable detail and are worth reading. (The related material can be found online.)


The main focus of this article is on application scenarios, so let's first look at the application scenarios for the open source object storage Swift as they were described 8 years ago, in 2012.


1. The first is the mobile Internet: the huge number of terminals means a huge number of users. The amount of data those users generate is growing rapidly, and a very large share of it is unstructured. Swift is well suited to this scenario.


2. The second is gaming: the share of browser games and mobile games is increasing, the number of players is increasing, and the game-related data of each player is increasing too, on top of the need for highly concurrent multiplayer online play. These are all things Swift is good at.


3. Beyond traditional archiving, a need for "hot archiving" has emerged in recent years, and the requirements on data response time have tightened greatly, from a few hours to a few minutes. In one case test Swift reached response times on the order of seconds and performed excellently, so this is also a new field.


4. In big data, some people considered combining Hadoop with Swift and ran tests, mainly by using Swift in place of HDFS. The attempt was motivated by several benefits of Swift. On high availability, native HDFS carried the risk of a single point of failure, because the NameNode had no HA. In addition, the HDFS client cache is 64 MB, which is sometimes not a good fit. Finally, Swift itself is designed with multi-tenant support, which is more convenient if the system being built is meant to be shared.
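To make the multi-tenancy point in item 4 concrete, here is a minimal sketch of how an HDFS-style path could be mapped onto Swift's object addressing. The `/v1/{account}/{container}/{object}` layout is Swift's own API convention; the endpoint `swift.example.com`, the tenant accounts, and the helper names are hypothetical and only serve the illustration. This is not how any particular Hadoop connector is implemented.

```python
from urllib.parse import quote


def swift_object_url(endpoint: str, account: str, container: str, obj: str) -> str:
    """Build an object URL following Swift's /v1/{account}/{container}/{object} layout."""
    # Slashes in the object name are kept: Swift stores a flat name,
    # and "directories" are only a naming convention.
    return "/".join([
        endpoint.rstrip("/"),
        "v1",
        quote(account, safe=""),
        quote(container, safe=""),
        quote(obj, safe="/"),
    ])


def hdfs_path_to_swift(endpoint: str, account: str, container: str, hdfs_path: str) -> str:
    """Hypothetical mapping: the HDFS path, minus its leading slash, becomes the object name."""
    return swift_object_url(endpoint, account, container, hdfs_path.lstrip("/"))


# Two tenants (assumed account names) store the same path without colliding,
# because the account is part of every object's address.
url_a = hdfs_path_to_swift("https://swift.example.com", "AUTH_tenant_a", "logs", "/2012/06/part-00000")
url_b = hdfs_path_to_swift("https://swift.example.com", "AUTH_tenant_b", "logs", "/2012/06/part-00000")
print(url_a)
print(url_b)
```

Because the tenant's account sits in the address itself, one cluster can be shared by several analysis teams without separate namespaces having to be built on top.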


Of course, by 2020 the application scenarios have expanded greatly. Beyond audio and video data, object storage has begun to see practical use in high-performance computing, scientific research, big data analysis, AI, seismology, remote sensing, meteorology, finance, healthcare, transportation and security.



The aggregate bandwidth of object storage has increased tremendously in recent years. For typical big data and AI computing and analysis frameworks such as Spark, Presto, TensorFlow, Teradata, Vertica and Splunk, bandwidth of 10 GB/s is very attractive. Many MPP databases have already adopted object storage as their back-end storage, and its influence keeps growing.
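A rough back-of-the-envelope sketch shows what a figure like 10 GB/s means for an analysis job. It assumes ideal linear scaling across storage nodes and decimal units (1 TB = 1000 GB); the node count, per-node throughput and dataset size are hypothetical numbers chosen for illustration, not measurements.

```python
def aggregate_bandwidth_gb_s(nodes: int, per_node_gb_s: float) -> float:
    """Aggregate bandwidth under the (idealized) assumption of linear scaling."""
    return nodes * per_node_gb_s


def scan_time_seconds(dataset_tb: float, bandwidth_gb_s: float) -> float:
    """Time to read a whole dataset once at a given sustained bandwidth (1 TB = 1000 GB)."""
    return dataset_tb * 1000 / bandwidth_gb_s


# Hypothetical cluster: 20 storage nodes, each sustaining 0.5 GB/s.
bandwidth = aggregate_bandwidth_gb_s(nodes=20, per_node_gb_s=0.5)
# Hypothetical job: one full scan of a 100 TB dataset.
seconds = scan_time_seconds(dataset_tb=100, bandwidth_gb_s=bandwidth)
print(f"{bandwidth} GB/s aggregate, full scan in about {seconds / 3600:.1f} hours")
```

Real clusters fall short of linear scaling, so the result is best read as a lower bound on scan time.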

