tags : System Design, Systems, Codec

The descriptions here are themselves vague, so don’t get too pedantic; this is for personal reference.

FAQ

Why is little endian useful?

  • It lets the CPU widen a number easily: the existing bytes stay where they are and zero bytes are appended. E.g. 0x11223344:
    44 33 22 11 (little end, int32)
    44 33 22 11 00 00 00 00 (little end, int64)
  • You can’t do this with big endian; appending zeros changes the number.
  • This is mostly useful at the compiler/CPU level, not so much at the protocol level.
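
A quick sketch of the widening trick above, using Python's struct module. The value 0x11223344 matches the byte dump; the variable names are made up for illustration.

```python
import struct

value = 0x11223344

# Little endian: least significant byte first -> 44 33 22 11
le32 = struct.pack("<I", value)
# Widen to 64 bits by appending zero bytes; the first 4 bytes are untouched.
le64 = le32 + b"\x00" * 4
widened_le = struct.unpack("<Q", le64)[0]

# Big endian: most significant byte first -> 11 22 33 44
be32 = struct.pack(">I", value)
# The same append trick shifts the value up by 32 bits instead.
be64_wrong = be32 + b"\x00" * 4
widened_be = struct.unpack(">Q", be64_wrong)[0]

print(hex(widened_le))  # same number as before
print(hex(widened_be))  # a different, much larger number
```

With big endian the zeros would have to be prepended, which means moving the existing bytes.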

What’s the deal with variable-length encoding?

  • Main idea: small numbers should take up less space than big numbers.
  • There are different schemes for doing so. Go’s encoding/binary package uses the same varint scheme as Protocol Buffers.
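
A minimal sketch of the unsigned varint scheme that Protocol Buffers describes (7 bits of payload per byte, high bit set when more bytes follow). The function names encode_uvarint and decode_uvarint are my own, not from any library.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)  # last byte has the high bit clear
    return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the varint starting at offset."""
    result = 0
    shift = 0
    pos = offset
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


print(encode_uvarint(1).hex())    # 1 byte
print(encode_uvarint(300).hex())  # 2 bytes
```

Small numbers fit in one byte, while a fixed-width int64 always costs eight.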

More resources

Encoding

Serialization is a specific instance of encoding. How serialization relates to the term “encoding” is slightly vague.

Binary to Text (Eg. Base64)

Encoding of binary data in a sequence of printable characters.
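
For example, with Python's standard base64 module (the three bytes are the usual start of a JPEG file, picked only as sample binary data):

```python
import base64

raw = bytes([0xFF, 0xD8, 0xFF])   # not printable as text
text = base64.b64encode(raw)      # 3 raw bytes become 4 printable characters
back = base64.b64decode(text)     # decoding gives back the original bytes

print(text.decode("ascii"))
```

The cost is size: every 3 bytes of input turn into 4 characters of output.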

Text to Binary

This is what computers do anyway: text is always stored as bytes via some character encoding (e.g. UTF-8).

Fun experiments

Data serialization

Data serialization refers to the process of translating data structures or object state (from memory) into a format that can be stored (in memory or in a file) or transmitted, and then reconstructed at a different point.
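
A tiny round trip as a sketch, using JSON as the format. The User class and its fields are hypothetical, just something to serialize.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class User:  # hypothetical in-memory object
    name: str
    age: int


original = User(name="Ada", age=36)

# Serialize: object state -> bytes that can be stored or sent
wire = json.dumps(asdict(original)).encode("utf-8")

# Deserialize: bytes -> an equivalent object, possibly in another process
restored = User(**json.loads(wire.decode("utf-8")))

print(wire)
print(restored == original)
```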

Use-cases

  • Transferring data over the wire
  • Creating file formats
  • Network transmission
  • RPC via an Interface Definition Language (IDL)
  • Other uses

Formats

How to

  • File formats can be implemented using data serialization formats. E.g. Apache Parquet encodes its file metadata using Apache Thrift.

Serialization and De-serialization

When you serialize, you also expect to deserialize. The serializer and the deserializer are not part of the serialization format itself; they are separate pieces of software that implement it.

Serialization/Writing/Generation

  • E.g. Construct (Python only) does both parsing and serialization.
  • Boost.Serialization, cereal, bitsery and msgpack also do serialization and deserialization, but they each have their own binary format.

Deserialization/Reading/Parsing

  • E.g. Kaitai Struct just does parsing.

Serialization formats

These can be binary or plaintext. They make different tradeoffs: fixed-width integers vs. varints, length-prefixed strings, support for circular and weak references, suppression of repeated values, efficient or special-cased encodings, headers or the lack of them, having to define schemas in advance, etc.

Binary formats

Text formats

  • .txt, markdown, yaml, xml

Protocols

How to

  • When we design a protocol, we need to design a protocol handler.
  • Binary vs. text is not about how binary blobs are encoded. (On the network everything is bytes anyway.)
  • It is about whether the protocol is oriented around data structures or around text strings. The question to ask is whether it is supposed to be read by humans or by machines. E.g. HTTP (1.x) is a text protocol: even when it sends a JPEG (binary) image, it just sends the raw bytes, not a text encoding of them.
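
To make the HTTP point concrete, here is a rough sketch of an HTTP/1.1 response built by hand. build_http_response is a made-up helper, not a real library call, and the 4-byte body stands in for an actual image.

```python
def build_http_response(body: bytes, content_type: str) -> bytes:
    # The head is human-readable text lines separated by CRLF.
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    # The body is appended as raw bytes, with no Base64 or other text encoding.
    return head.encode("ascii") + body


jpeg_start = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # placeholder for image data
response = build_http_response(jpeg_start, "image/jpeg")

# Split at the first blank line: text head on one side, binary body on the other.
head_bytes, _, body_bytes = response.partition(b"\r\n\r\n")
print(head_bytes.decode("ascii"))
print(body_bytes)
```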

Binary protocols

  • Will usually be more space efficient than text protocols.
  • Examples: RTP, TCP, IP, TLS, SSH, MQTT.

Text protocols

  • Examples: SMTP, HTTP, SIP.

Binary vs Text protocols

Not a strict definition, but the general idea.

  • How a computer parses JSON:
    • You advance one rune at a time.
    • You maintain some look-back, looking for object delimiters.
    • You keep state for how deeply nested in the object you are, etc.
    • It is a complicated, stateful process.
  • How a computer parses binary data:
    • The format says the next field coming up is a string, and it’s 70 bytes long.
    • The parser just grabs the next 70 bytes.
    • It interprets them as a string in memory and is done.
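
The binary case above can be sketched as a length-prefixed string reader. The layout (4-byte little-endian length, then UTF-8 bytes) is an assumption for illustration, not any particular real format, and write_string / read_string are hypothetical helpers.

```python
import struct


def write_string(s: str) -> bytes:
    data = s.encode("utf-8")
    # 4-byte little-endian length prefix, then the raw bytes
    return struct.pack("<I", len(data)) + data


def read_string(buf: bytes, offset: int = 0):
    """Return (string, next_offset) for the field starting at offset."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    # No scanning for delimiters: just grab exactly `length` bytes.
    return buf[start:end].decode("utf-8"), end


buf = write_string("hello") + write_string("wörld")
first, pos = read_string(buf)
second, pos = read_string(buf, pos)
print(first, second)
```

The reader never looks at the contents to find where a field ends, which is why quotes, braces or newlines inside the string need no escaping.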

Some Theory

RPC

  • Data Serialization Formats
  • Protocols
  • RPC sort of combines both.
    • Not really, it depends on the implementation, but you get the idea.
    • Some data in some format gets transferred via some protocol.

gRPC

What about gRPC?

  • A wrapper on top of an HTTP/2 (atm) server that communicates using Protocol Buffers (PB).
  • Offers features such as streaming, cancellation, circuit breaking, load balancing, tracing, metric collection, header propagation, authorization, IDL, etc.

REST vs gRPC

REST is not an RPC framework at all; it is an architectural style for constructing web services. So in a way, it’s an apples to oranges comparison. But there are use cases where you might want to use gRPC over REST and vice versa. Here’s a table from the internets.

  • See REpresentational State Transfer
  • With an RPC system, peculiarities of serialization (like, say, JSON’s lack of 64-bit numbers) are a non-issue.
  • REST can be implemented without HTTP; a home-grown binary substitute can be used and you can still be RESTful.
  • You can deploy a RESTful service over ordinary email exchange, for instance.
  • But using HTTP has benefits, such as having HTTP’s caching infrastructure at your disposal.
  • A detailed comparison of REST and gRPC | Hacker News