tags : Cryptography, Data Structures, HMAC, Encryption
FAQ
Are collisions expected?
- Yes, because
input
can be any string, but the number returned iswithin some promised range
, it’s possible that two different inputs can return the same number. This itself is the definition of an collision. - Good hash functions try to minimise how many collisions they produce. It’s not possible to completely eliminate collisions.
- We have ways to handle collisions, when they occur.
- Few hash functions are collision-free, but most are designed so that collisions are minimized. See Perfect hash function
Avalanche Effect
- Refers to how many bits in the output value change when just a single bit of the input changes.
- To say that a hash function has a good avalanche effect, a single bit flip in the input should result in an average of 50% the output bits flipping.
Non-Crypographic hashes?
- Can be reversed using various methods, including brute-force attacks, rainbow table attacks, or other techniques depending on the specific hash function and its design.
Is MD5 cryptographic hash or no?
- Originally it was a cryptographic hash function but later attacks were shown
- Now its popularly used for non-cryptographic usage
TODO Length Extension Attack? what it do with HMAC?
MAC and HMAC
See HMAC
Types by usage
- See The many flavors of hashing — Volution Notes
- See Use Fast Data Algorithms | Joey Lynch’s Site (You might as-well use something other than md5)
Type | Example | Cryptographic |
---|---|---|
Shuffling Hashes/Lookup Hashes | xxh3/SipHash | No |
Integrity hashes | No | |
Similar Items | soundex/perceptual hash | No |
Signature hashes | Yes |
Shuffling/Lookup hashes/Similar
- What: Given an input, return a small number (16, 32, or 64 bits).
- performance for small inputs
- Eg. xxh3/xxh128/highwayhash/SipHash/djb2/Murmur3 (See comparision)
- SipHash is mostly used as the hash function in hash-table implementation in most programming languages
- Usage
- Implementing data structures like
hash tables
- Use a hash function to determine which bucket an item belongs to(evenly distribute items among the buckets)
- Converting a key into an index for memory addresses when constructing hash tables
- Classifying inputs into a small number of buckets in sharding (see Scaling Databases)
- Can be used for storage, but for storage, usually cryptographically strong hashes are used.
- Implementing data structures like
- Tips
- Do not use for security
- If input is controlled by user, consider using seed/key.
- Expect regular collisions
- Go with prime number or powers of 2 (avoids expensive modulo operations) for bucket count.
Find similar items hashes
- Text: Soundex / metaphone / doublemetaphone.
- Images: ahash/phash/dhash/whash/blockmeat/colormoment/colorhas/marrHildreth/Perceptual hashing - Wikipedia
- perceptual hash functions do not provide the same properties as cryptographic hash functions.
- AsuharietYgvar/AppleNeuralHash2ONNX#1 Working Collision?
- Audio: acoustid
- Digest: MD5
Integrity hashes
- Not cryptographic
- Eg. CRC, MD5
Signature Hashes/Crypographic Hashes/Secure hashes
- See Cryptographic and Non-Cryptographic Hash Functions · Dadario’s Blog
- Even theoretically ‘broken’ ones like MD5, these hashes exhibit distribution that’s indistinguishable from randomness
- Non-invertible and must be resistant to
- Collision attacks
- Preimage attacks
- Larger output size than general hashing functions
- Eg. BLAKE3, SHA256(with HMAC).
- Whether or not cryptographic hash functions are truly one-way functions is like P vs NP (Complexity Theory)
- Not to be used for password hashes, to be used with libraries like libsodium etc.
Key Derivation Hashes (KDF) / Token Hash
- See PKI
- Taking a known (and secret) key to generate any number of unguessable tokens.
- It is possible to construct a public key signature using a cryptographic hash as the only underlying primitive. These can be proven secure based on the usual assumptions about the security of the underlying cryptographic hash.
- A major drawback is that they are expensive to compute, and the signatures require a huge number of bits compared to other public key algorithms.
- Example
- HKDF is a valid KDF (stricter requirements than a PRF, Based on HMAC)
- Blake3
Password Hashing
- This is similar to KDF
- When users are sending a username and password over HTTP
- Not all password hashes are meant for key derivation. Eg. bcrypt should not be used for KDF. (Argon2 and scrypt) can be safely used as a password hashing function or as a KDF.
- Eg. Argon2, scrypt, bcrypt. (See Aaron Toponce : Let’s Talk Password Hashing)
Encryption
- Hashes are used as
mixing functions
in encryption - But you can also trivially turn a hash function into a cipher and encrypt with it (apologies if I missed an explanation of this in the article). Just hash a key and a counter together to create a keystream and XOR your plaintext to it. That’s how Salsa20 and ChaCha20 work.
Other uses
Properties of hash functions
See Hash Map in Data Structures
Uniform Distribution
One way function (Some)
Avalanche behavior
collision resistance
Performance
Cryptographic/Non-Cryptographic
Other Rough properties
- Some hash functions work only for numbers, others only for strings, some work for anything, etc
- Optionally takes a second parameter, called a seed or key;
- Some hash functions are reusable across multiple topics, some are not