A high performance multilingual serialization framework, Fury

github address:

FURY is a multi-language serialization framework based on JIT dynamic compilation and zero copy, providing the ultimate in performance and ease of use

It supports the mainstream programming languages J**A Python C++ Golang J**Ascript, and other languages can be easily extended.

Unified multilingual serialization core competency:

Highly optimized serialized primitives.

zero-copy serialization support, out of band serialization protocol, and off-heap memory read/write support.

Based on JIT dynamic compilation technology, asynchronous multi-threading automatically generates serialization at runtime, optimizes performance, increases method inline, caches and eliminates dead, reduces virtual method calls, conditional branches, hash lookups, metadata writes, memory reads and writes, etc.

Multi-protocol support: Combines the flexibility and ease of use of dynamic serialization with the cross-language capabilities of static serialization.

j**a serialization:

Seamlessly replaces JDK Kryo Hessian, without modifying anything**, but provides up to 170 times the performance, which can greatly improve the efficiency of RPC calling, data transfer, and object persistence in high-performance scenarios.

100% compatible with JDK serialization, native support for JDK custom serialization methods WriteObject ReadObject WriteReplace ReadResolve ReadObjectNoData

Cross-language object graph serialization:

Multilingual Automatically serialize arbitrary objects across languages without the need to create IDL files, manually compile schema generation**, and convert objects to intermediate formats.

Multilingual Automatically serialize shared and circular references across languages without caring about data duplication or recursive errors.

Object type polymorphism is supported, and multiple subtype objects can be serialized at the same time.

Row-memory serialization:

It provides a cache-friendly binary random access row-memory format that supports skip serialization and partial serialization, which is suitable for high-performance computing and large-scale data transmission scenarios.

It supports automatic conversion with arrow column storage.

** author Nanqiu * Fury serialization framework usage example * @slf4jpublic class furyexample ** Deserialization * param bytes byte array * return object * public static object deserialize(byte bytes) public static void main(string args).", arrays.tostring(furyexample.serialize(furyobject)))furyobject furydeserialize = (furyobject)furyexample.deserialize(bytes); log.info("Deserialization: [id:{}name:{}text:{}furyobject2:{}", furydeserialize.getid(),furydeserialize.getname(),furydeserialize.gettext(),furydeserialize.getfuryobject2())

FURY defines and implements a set of basic serialization capabilities, based on which different multi-language serialization protocols can be quickly built, and optimized through compilation acceleration to achieve high performance. At the same time, the performance optimization of one protocol in terms of basic capabilities can also benefit all serialization protocols.

Common operations involved in serialization include:

bitmap bit operation.

Integer codec.

Integer compression. String creation & copy optimization.

String encoding: ASCII UTF8 UTF16.

Memory copy optimization.

Array copy compression optimization.

Metadata encoding & compression & caching.

Fury has made a lot of optimizations for each language for these operations, combining SIMD instructions and advanced language features to push the performance to the extreme, so that it is convenient to use different protocols.

In large-scale data transmission scenarios, there are often multiple binary buffers inside an object graph, and the serialization framework will write these data to an intermediate buffer during the serialization process, introducing multiple time-consuming memory copies. Borrowing from the zero-copy design of pickle5, ray and arrow, Fury implements a set of out-of-band serialization protocols, which can directly capture all binary buffers in an object graph, avoid intermediate copies of buffers, and reduce the memory copy overhead to 0 during serialization.

fury turns off reference support when zero-copy serialization process. As shown in Fig

Currently, FURY has built-in zero-copy support for the following types:

j**a: all basic types of arrays, bytebuffer, arrowrecordbatch, and vectorschemaroot.

python: all arrays, numpy arrays, pyarrowstable、pyarrow.recordbatch。

golang：byte slice。

Users can also extend the new zero-copy type based on the Fury interface.

For custom type objects to be serialized, which usually contain a large amount of type information, FURY uses these type information to directly generate efficient serialization at runtime, and completes a large number of runtime operations in the dynamic compilation stage, thereby increasing method inlining and caching, reducing virtual method calls, conditional branches, hash lookups, metadata writes, in-memory reads and writes, etc., and finally greatly accelerates serialization performance.

For the J**A language, FURY implements a set of runtime generation frameworks, defines a set of operator expressions ir for serialization logic, makes type inference based on the generic information of object types at runtime, and then builds an expression tree describing serialization logic, and generates efficient j**a according to the expression tree , and then compile it into bytecode through Janino at runtime, and then load it into the user's classloader or the classloader created by Fury, and finally compile it into an efficient assembly through j**a jit.

Since the JVM JIT will skip the compilation and inlining of large methods, Fury has also implemented a set of optimizers to recursively split large methods into small methods, so as to ensure that all ** generated by FURY can be compiled and inlined, squeezing the performance of JVM to the extreme.

As shown in Fig

At the same time, FURY also supports asynchronous multi-threaded dynamic compilation, which submits the ** generation tasks of different serializers to the thread pool for execution, and executes them in interpreted mode before the compilation is completed, so as to ensure that there will be no serialization glitches and no need to warm up all types of serialization in advance.

Because serialization requires close manipulation of the objects of each programming language, and the programming language does not expose the low-level API of the memory model, there is a large overhead in calling through native methods, so it is not possible to build a unified serializer JIT framework through LLVM, but it is necessary to implement a specific ** generation framework and serializer construction logic in combination with language characteristics within each language.

While JIT compilation can greatly improve serialization efficiency and regenerate better serializations based on the statistical distribution of data at runtime, languages such as C++ Rust do not support reflection, do not have a virtual machine, and do not have a low-level API that provides an in-memory model, so they cannot generate serialization through JIT dynamic compilation for such languages.

For such scenarios, Fury is implementing an AOT static generation framework that generates serializations in advance based on the object's schema at compile time, and then uses the generated ones for automatic serialization. For Rust, Rust's macro will also be generated at compile time in the future, providing better ease of use.

When serializing custom types, the fields will be reordered to ensure that the fields of the same interface type are serialized sequentially, increasing the probability of cache hits, and also promoting CPU instruction caching to achieve more efficient serialization. For basic type fields, the write order is sorted in descending order of byte field size, so that if the start address is aligned, subsequent reads and writes will occur at the memory address alignment location, and the CPU execution is more efficient.

Based on the core capabilities of multilingual serialization provided by Fury, three serialization protocols are built on top of this, which are suitable for different scenarios

J**A serialization: It is suitable for pure J**A serialization scenarios and provides a performance improvement of more than 100 times.

Cross-language object graph serialization: Ideal for application-oriented multilingual programming, as well as high-performance cross-language serialization.

Row-memory serialization: suitable for distributed computing engines, such as Spark Flink Dories Velox Sample Stream Processing Framework Feature Storage.

In addition, users can also build their own protocols based on Fury's serialization capabilities.

Due to the widespread use of J**A in big data, cloud native, microservices, and enterprise-level applications, the performance optimization of J**A serialization can greatly reduce system latency, improve throughput, and reduce server costs.

That's why FURY has made a lot of extreme performance optimizations for j**a serialization, so that it has the following capabilities:

Extreme performance: By using the type and generic information of J**A objects, combined with JIT compilation and unsafe low-order operations, Fury has up to 170 times higher performance than JDK and up to 50 100 times better performance than Kryo Hessian.

100% JDK serialization API compatibility: The semantics of all JDK custom serialization methods WriteObject ReadObject WriteReplace ReadResolve ReadObjectNoData are supported to ensure the correctness of replacing JDK serialization in any scenario. Existing J**A serialization frameworks such as Kryo Hessian have certain correctness problems in these scenarios.

Backward and backward compatibility: If the class schema of the deserialization side and the serialization side are inconsistent, the deserialization can still be correctly deserialized, and the application can be independently upgraded and deployed, and fields can be added or deleted independently. In addition, the metadata is extremely compressed and shared, and the type-compatible mode achieves almost no performance loss compared with the type-consistent mode.

Metadata sharing: Metadata (class name, field name, final field type information, etc.) is shared between multiple serializations in a certain context (TCP connection), and this information will be sent to the peer during the first serialization in this context, and the peer can rebuild the same deserializer according to the type information.

Zero copy support: Supports out of band zero copy and off-heap memory read/write.

Cross-language object graph serialization is mainly used in scenarios where there are higher requirements for dynamics and ease of use. Although frameworks such as Protobuf Flatbuffer provide multilingual serialization capabilities, there are still some shortcomings:

IDL needs to be written in advance and statically compiled and generated**, which is not dynamic and flexible enough.

The resulting class does not conform to object-oriented design, cannot add behavior to the class, and cannot be used directly as a domain object for multilingual application development.

Subclass serialization is not supported. The main feature of object-oriented programming is that subclass methods are invoked through interfaces. This type of model is also not well supported. Although flatbuffer provides union, protobuf provides oneof any features, which require the type of object to be determined during serialization and deserialization, which is not in line with the design of object-oriented programming.

Loops and shared references are not supported, you need to redefine a set of idls for domain objects and implement reference parsing by yourself, and then write ** in each language to realize the conversion between domain objects and protocol objects, if the object graph has a deep number of nested layers, you need to write more**.

Combined with the above points, FURY implements a set of cross-language object graph serialization protocols:

Multi-language Automatic serialization of arbitrary objects across languages: Define two classes on the serialization and deserialization sides to automatically serialize objects in one language to objects in another language, without the need to create IDL files, compile schema generation, and convert handwriting.

Multilingual: Automatically serialize shared and circular references across languages.

Object type polymorphism is supported, which conforms to the object-oriented programming paradigm, and multiple subtype objects can be automatically deserialized at the same time, without the need for users to manually deal with them.

At the same time, we also support out of band zero copy on this protocol

Example of automatic cross-language serialization:

For high-performance computing and large-scale data transmission scenarios, data serialization and transmission are often the performance bottlenecks of the entire system. If the user only needs to read part of the data, or filter based on a field of an object, deserializing the entire data will incur additional overhead. Therefore, Fury also provides a set of binary data structures, which can read and write directly on binary data, avoiding serialization.

Apache Arrow is a mature column-store format that supports binary reads and writes. However, column storage cannot meet the requirements of all scenarios, and the data in link and stream computing scenarios is naturally a row storage structure, and the column computing engine will also use the row storage structure when data changes and hash join aggregation operations are involved.

Computing engines such as Spark Flink Doris Velox define a set of rowstore formats, which do not support cross-language and can only be used internally by their own engines and cannot be used in other frameworks. Although FlatBuffer can support on-demand deserialization, it requires static compilation of schema IDL and management offset, which cannot meet the dynamic and easy-to-use requirements of complex scenarios.

Therefore, in the early days, Fury borrowed the Spark Tungsten and Apache Arrow formats to implement a set of binary row memory structures that can be accessed randomly, and now implements the J**A Python C++ version, which realizes direct reading and writing on binary data, avoiding all serialization overhead.

The binary format of the fury row format, as shown in the image:

The format is densely stored, data aligned, cache-friendly, and read/write faster. By avoiding deserialization, the J**A GC pressure can be reduced. At the same time, it reduces the overhead of Python, and due to the dynamic nature of Python, Fury's data structure implements getattr getitem slice and other special methods to ensure that the behavior is consistent with the Python dataclass list object, and the user does not have any awareness.

The following is some j**a serialized performance data, in which the charts with the title of compatible are the performance data under the compatibility of supported types, and the charts with the title of the charts that do not contain compatible are the performance data under the compatibility of unsupported types. To be fair, all test furies have turned off the zero-copy feature.

Note: This article mainly refers to the Alibaba Cloud community blog post of Yang Chaokun, the author of Fury.

Read Recommended] More exciting content, such as:

Redis series.

Data Structures & Algorithms.

NACOS series.

MySQL series.

JVM series.

Kafka series.

Please move to the personal homepage of [Nanqiu] for reference. The content is constantly being updated.

About the author] An old babe who loves technology and life, focuses on the field of J**A, and pays attention to [Nanqiu classmates] to take you to learn and grow Xi together

A high performance multilingual serialization framework, Fury

Related Pages

10 games in a winning streak?Liao Basket is a ride in the dust!CCTV5 made a proper decision, and Guo

India's economic growth is on the rise, but it may be the first target of the dollar

"Return to the Ruins of the South China Sea" Chen Muchi's ancient guess is unbeatable, and I am sinc

OnePlus Ace Pro's top-of-the-line processor has high performance and runs smoothly, only 2299 yuan!

1 4313 is a high-strength, high-corrosion resistant stainless steel material Encyclopedia