tags : Cryptography, Data Structures, HMAC, Encryption

FAQ

Are collisions expected?

  • Yes, because input can be any string, but the number returned is within some promised range, it’s possible that two different inputs can return the same number. This itself is the definition of an collision.
  • Good hash functions try to minimise how many collisions they produce. It’s not possible to completely eliminate collisions.
  • We have ways to handle collisions, when they occur.
  • Few hash functions are collision-free, but most are designed so that collisions are minimized. See Perfect hash function

Avalanche Effect

  • Refers to how many bits in the output value change when just a single bit of the input changes.
  • To say that a hash function has a good avalanche effect, a single bit flip in the input should result in an average of 50% the output bits flipping.

Non-Crypographic hashes?

  • Can be reversed using various methods, including brute-force attacks, rainbow table attacks, or other techniques depending on the specific hash function and its design.

Is MD5 cryptographic hash or no?

  • Originally it was a cryptographic hash function but later attacks were shown
  • Now its popularly used for non-cryptographic usage

TODO Length Extension Attack? what it do with HMAC?

MAC and HMAC

See HMAC

Types by usage

See The many flavors of hashing — Volution Notes

TypeExampleCryptographic
Shuffling Hashes/Lookup Hashesxxh3/SipHashNo
Integrity hashesNo
Similar Itemssoundex/perceptual hashNo
Signature hashesYes

Shuffling/Lookup hashes

  • What: Given an input, return a small number (16, 32, or 64 bits).
  • performance for small inputs
  • Eg. xxh3/xxh128/highwayhash/SipHash/djb2/Murmur3 (See comparision)
    • SipHash is mostly used as the hash function in hash-table implementation in most programming languages
  • Usage
    • Implementing data structures like hash tables
      • Use a hash function to determine which bucket an item belongs to(evenly distribute items among the buckets)
      • Converting a key into an index for memory addresses when constructing hash tables
    • Classifying inputs into a small number of buckets in sharding (see Scaling Databases)
    • Can be used for storage, but for storage, usually cryptographically strong hashes are used.
  • Tips

Integrity hashes

  • Not cryptographic
  • Eg. CRC, MD5

Signature Hashes/Crypographic Hashes/Secure hashes

  • Even theoretically ‘broken’ ones like MD5, these hashes exhibit distribution that’s indistinguishable from randomness
  • Non-invertible and must be resistant to
    • Collision attacks
    • Preimage attacks
  • Larger output size than general hashing functions
  • Eg. BLAKE3, SHA256(with HMAC).
  • Whether or not cryptographic hash functions are truly one-way functions is like P vs NP (Complexity Theory)
  • Not to be used for password hashes, to be used with libraries like libsodium etc.

Find similar items hashes

Key Derivation Hashes (KDF) / Token Hash

  • See PKI
  • Taking a known (and secret) key to generate any number of unguessable tokens.
  • It is possible to construct a public key signature using a cryptographic hash as the only underlying primitive. These can be proven secure based on the usual assumptions about the security of the underlying cryptographic hash.
  • A major drawback is that they are expensive to compute, and the signatures require a huge number of bits compared to other public key algorithms.
  • Example

Password Hashing

  • This is similar to KDF
  • When users are sending a username and password over HTTP
  • Not all password hashes are meant for key derivation. Eg. bcrypt should not be used for KDF. (Argon2 and scrypt) can be safely used as a password hashing function or as a KDF.
  • Eg. Argon2, scrypt, bcrypt. (See Aaron Toponce : Let’s Talk Password Hashing)

Encryption

  • Hashes are used as mixing functions in encryption
  • But you can also trivially turn a hash function into a cipher and encrypt with it (apologies if I missed an explanation of this in the article). Just hash a key and a counter together to create a keystream and XOR your plaintext to it. That’s how Salsa20 and ChaCha20 work.

Other uses

Properties of hash functions

See Hash Map in Data Structures

Uniform Distribution

One way function (Some)

Avalanche behavior

collision resistance

Performance

Cryptographic/Non-Cryptographic

Other Rough properties

  • Some hash functions work only for numbers, others only for strings, some work for anything, etc
  • Optionally takes a second parameter, called a seed or key;
  • Some hash functions are reusable across multiple topics, some are not