tags : Cryptography, Data Structures, HMAC, Encryption

## FAQ

### Are collisions expected?

- Yes, because
`input`

can be any string, but the number returned is`within some promised range`

, it’s possible that two different inputs can return the same number. This itself is the definition of an collision. - Good hash functions try to minimise how many collisions they produce. It’s not possible to completely eliminate collisions.
- We have ways to handle collisions, when they occur.
- Few hash functions are collision-free, but most are designed so that collisions are minimized. See Perfect hash function

### Avalanche Effect

- Refers to how many bits in the output value change when just a single bit of the input changes.
- To say that a hash function has a good avalanche effect, a single bit flip in the input should result in an average of 50% the output bits flipping.

### Non-Crypographic hashes?

- Can be reversed using various methods, including brute-force attacks, rainbow table attacks, or other techniques depending on the specific hash function and its design.

### Is MD5 cryptographic hash or no?

- Originally it was a cryptographic hash function but later attacks were shown
- Now its popularly used for non-cryptographic usage

### TODO Length Extension Attack? what it do with HMAC?

### MAC and HMAC

See HMAC

## Types by usage

- See The many flavors of hashing — Volution Notes
- See Use Fast Data Algorithms | Joey Lynch’s Site (You might as-well use something other than md5)

Type | Example | Cryptographic |
---|---|---|

Shuffling Hashes/Lookup Hashes | xxh3/SipHash | No |

Integrity hashes | No | |

Similar Items | soundex/perceptual hash | No |

Signature hashes | Yes |

### Shuffling/Lookup hashes/Similar

- What: Given an input, return a small number (16, 32, or 64 bits).
- performance for small inputs
- Eg. xxh3/xxh128/highwayhash/SipHash/djb2/Murmur3 (See comparision)
- SipHash is mostly used as the hash function in hash-table implementation in most programming languages

- Usage
- Implementing data structures like
`hash tables`

- Use a hash function to determine which bucket an item belongs to(evenly distribute items among the buckets)
- Converting a key into an index for memory addresses when constructing hash tables

- Classifying inputs into a small number of buckets in sharding (see Scaling Databases)
- Can be used for storage, but for storage, usually cryptographically strong hashes are used.

- Implementing data structures like
- Tips
- Do not use for security
- If input is controlled by user, consider using seed/key.
- Expect regular collisions
- Go with prime number or powers of 2 (avoids expensive modulo operations) for bucket count.

#### Find similar items hashes

- Text: Soundex / metaphone / doublemetaphone.
- Images: ahash/phash/dhash/whash/blockmeat/colormoment/colorhas/marrHildreth/Perceptual hashing - Wikipedia
- perceptual hash functions do not provide the same properties as cryptographic hash functions.
- AsuharietYgvar/AppleNeuralHash2ONNX#1 Working Collision?

- Audio: acoustid
- Digest: MD5

### Integrity hashes

- Not cryptographic
- Eg. CRC, MD5

### Signature Hashes/Crypographic Hashes/Secure hashes

- See Cryptographic and Non-Cryptographic Hash Functions · Dadario’s Blog
- Even theoretically ‘broken’ ones like MD5, these hashes exhibit distribution that’s indistinguishable from randomness
- Non-invertible and must be resistant to
- Collision attacks
- Preimage attacks

- Larger output size than general hashing functions
- Eg. BLAKE3, SHA256(with HMAC).
- Whether or not cryptographic hash functions are truly one-way functions is like P vs NP (Complexity Theory)
- Not to be used for password hashes, to be used with libraries like libsodium etc.

### Key Derivation Hashes (KDF) / Token Hash

- See PKI
- Taking a known (and secret) key to generate any number of unguessable tokens.
- It is possible to construct a public key signature using a cryptographic hash as the only underlying primitive. These can be proven secure based on the usual assumptions about the security of the underlying cryptographic hash.
- A major drawback is that they are expensive to compute, and the signatures require a huge number of bits compared to other public key algorithms.
- Example
- HKDF is a valid KDF (stricter requirements than a PRF, Based on HMAC)
- Blake3

### Password Hashing

- This is similar to KDF
- When users are sending a username and password over HTTP
- Not all password hashes are meant for key derivation. Eg. bcrypt should not be used for KDF. (Argon2 and scrypt) can be safely used as a password hashing function or as a KDF.
- Eg. Argon2, scrypt, bcrypt. (See Aaron Toponce : Let’s Talk Password Hashing)

### Encryption

- Hashes are used as
`mixing functions`

in encryption - But you can also trivially turn a hash function into a cipher and encrypt with it (apologies if I missed an explanation of this in the article). Just hash a key and a counter together to create a keystream and XOR your plaintext to it. That’s how Salsa20 and ChaCha20 work.

### Other uses

## Properties of hash functions

See Hash Map in Data Structures

### Uniform Distribution

### One way function (Some)

### Avalanche behavior

### collision resistance

### Performance

### Cryptographic/Non-Cryptographic

### Other Rough properties

- Some hash functions work only for numbers, others only for strings, some work for anything, etc
- Optionally takes a second parameter, called a seed or key;
- Some hash functions are reusable across multiple topics, some are not