tags: Algorithms, Bit Shifting
Image found on Twitter (@gkcs_)
Abstract data types
- When we start questioning “is this a data structure or an algorithm?”
- These are defined by what you can do with them, i.e. what operations they support.
- Answer: it uses a data structure and runs an algorithm on top of that data structure.
Maps
Hashmap/Hashtable
Requirement
- Stability: Given the same key, your hash function must return the same answer.
- Distribution: Given two near-identical keys, the results should be wildly different.
- Load factor: Once the ratio of entries to buckets passes some threshold (the load factor):
- The map will grow by doubling the number of buckets
- Redistributing the entries across them.
Implementation
Chaining
- Implemented using array/linked list
- An array of buckets each of which contains a pointer to an array of key/value entries.
- As entries are added to the map, a good hash function distribution spreads them evenly across buckets.
- `Hash(Key) = Hashed_Key` => mask it & keep the bottom few bits to get the bucket offset
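A minimal sketch of chaining in Java (`ChainedMap` and its internals are illustrative names, not a real library API). It shows the hash → mask → bucket step, plus the load-factor check that doubles the buckets and redistributes entries:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal chained hash map sketch — illustrative, not production code.
class ChainedMap<K, V> {
    private static class Entry<K, V> {
        final K key; V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // Bucket count kept as a power of two so masking works as a cheap modulo.
    private List<List<Entry<K, V>>> buckets = new ArrayList<>();
    private int size = 0;

    ChainedMap() {
        for (int i = 0; i < 8; i++) buckets.add(new ArrayList<>());
    }

    private int bucketIndex(K key) {
        // Hash(Key) = Hashed_Key => mask it to keep the bottom few bits.
        return key.hashCode() & (buckets.size() - 1);
    }

    V get(K key) {
        for (Entry<K, V> e : buckets.get(bucketIndex(key)))
            if (e.key.equals(key)) return e.value; // linear walk within one bucket
        return null;
    }

    void put(K key, V value) {
        List<Entry<K, V>> bucket = buckets.get(bucketIndex(key));
        for (Entry<K, V> e : bucket)
            if (e.key.equals(key)) { e.value = value; return; }
        bucket.add(new Entry<>(key, value));
        size++;
        // Load factor check: grow by doubling the buckets and redistributing.
        if ((double) size / buckets.size() > 0.75) resize();
    }

    private void resize() {
        List<List<Entry<K, V>>> old = buckets;
        buckets = new ArrayList<>();
        for (int i = 0; i < old.size() * 2; i++) buckets.add(new ArrayList<>());
        for (List<Entry<K, V>> bucket : old)
            for (Entry<K, V> e : bucket)
                buckets.get(bucketIndex(e.key)).add(e); // redistribute
    }
}
```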
Open Addressing
- No hash buckets: entries are stored directly in one flat array. On a collision, probe for the next free slot (e.g. linear probing).
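A rough linear-probing sketch (illustrative and fixed-size for brevity; a real table would track a load factor and resize well before filling up):

```java
// Open addressing with linear probing — entries live directly in the arrays.
class ProbingMap {
    private final String[] keys = new String[16];   // power of two, fixed for brevity
    private final String[] values = new String[16];

    void put(String key, String value) {
        int i = key.hashCode() & (keys.length - 1);
        // Probe forward until we find the key or an empty slot.
        // (A real table resizes before it gets full, or this loop never ends.)
        while (keys[i] != null && !keys[i].equals(key))
            i = (i + 1) & (keys.length - 1);
        keys[i] = key;
        values[i] = value;
    }

    String get(String key) {
        int i = key.hashCode() & (keys.length - 1);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return values[i];
            i = (i + 1) & (keys.length - 1);
        }
        return null; // hit an empty slot: key is certainly absent
    }
}
```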
Worst case
- The worst case is caused by bad distribution/collisions. E.g. all key resolutions point to the same bucket; within that bucket it’s a linear search, so if everything gets dumped into the same bucket you essentially have a linked list, i.e. O(n) running time for lookup.
Resources
Merklizing the key/value store
Bloom filters
A fast index to tell if a value is probably in a table or certainly isn’t.
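A toy sketch of the idea (the `BloomFilter` class and the seed-multiplication “hash functions” are assumptions for illustration, not a real filter design):

```java
import java.util.BitSet;

// Toy Bloom filter: k hash functions simulated by multiplying with different seeds.
class BloomFilter {
    private static final int NUM_BITS = 1 << 16;
    private final BitSet bits = new BitSet(NUM_BITS);
    private final int[] seeds = {17, 31, 53}; // one "hash function" per seed

    private int index(String value, int seed) {
        return Math.abs((value.hashCode() * seed) % NUM_BITS);
    }

    void add(String value) {
        for (int seed : seeds) bits.set(index(value, seed));
    }

    boolean mightContain(String value) {
        for (int seed : seeds)
            if (!bits.get(index(value, seed))) return false; // certainly not present
        return true; // probably present (false positives are possible)
    }
}
```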
Map ADTs 👀
LRU
- Usually used for caching. It’s not exactly a FIFO because we update the position of things based on when they are accessed.
- We need fast lookup: Hashmap
- Once a value is fetched from the cache, we need to move the fetched value to the `front` of the list. So the end of the list will have the `lru` value.
- It’s like a combination of hashmap + linked list. The hashmap’s `value` points to the actual list nodes directly. Therefore we get:
    - lookup/get: moving from position to front and returning the `value`
    - update: `insertion-at-front` (if new key), moving from position to front (if updating existing key)
        - Even though an update technically doesn’t mean the value was fetched, it was touched. Our logic says: if it’s touched, it’s `used`.
    - expand_cache: `deletion-at-end` (remove `lru`)
Implementation
- Eviction
    - When over capacity, we need to evict things.
    - Since eviction happens without user input, we need a way to know what `key` we have for our `value`. For this we need a `reverseLookup` map; then we can also delete the original `key` from the `lookup` map.
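A minimal sketch pulling the above together (hashmap + doubly linked list). One simplification: instead of a separate `reverseLookup` map, the key is stored on each node, which plays the same role:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal LRU cache sketch: hashmap for O(1) lookup + doubly linked list for order.
class LruCache<K, V> {
    private static class Node<K, V> {
        K key; V value;           // key kept here in place of a reverseLookup map
        Node<K, V> prev, next;
    }

    private final int capacity;
    private final Map<K, Node<K, V>> lookup = new HashMap<>();
    private Node<K, V> head, tail; // head = most recently used, tail = LRU

    LruCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        Node<K, V> node = lookup.get(key);
        if (node == null) return null;
        moveToFront(node);          // touched => most recently used
        return node.value;
    }

    void put(K key, V value) {
        Node<K, V> node = lookup.get(key);
        if (node != null) {         // updating existing key: touch it
            node.value = value;
            moveToFront(node);
            return;
        }
        node = new Node<>();        // new key: insertion-at-front
        node.key = key;
        node.value = value;
        lookup.put(key, node);
        addToFront(node);
        if (lookup.size() > capacity) { // over capacity: deletion-at-end
            lookup.remove(tail.key);    // key recovered from the node itself
            detach(tail);
        }
    }

    private void moveToFront(Node<K, V> node) {
        if (node == head) return;
        detach(node);
        addToFront(node);
    }

    private void addToFront(Node<K, V> node) {
        node.prev = null;
        node.next = head;
        if (head != null) head.prev = node;
        head = node;
        if (tail == null) tail = node;
    }

    private void detach(Node<K, V> node) {
        if (node.prev != null) node.prev.next = node.next; else head = node.next;
        if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
        node.prev = node.next = null;
    }
}
```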
Lists
Array
- Pedantic definition here: `capacity` must be defined & allocated before use, otherwise you cannot form an array
- Cannot expand
- Good for random access
Linked list
Technically, every linked list is a tree, which means every linked list is a graph.
- Node based data structure
- Singly linked, Doubly linked
- Better control over memory than an array; you could create an object pool, etc.
- You can’t do a binary search on a linked list; you must traverse. (Well, you can, but it won’t be efficient for obvious reasons.) So it’s usually a bad choice if you need random access.
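A tiny sketch of why random access is O(n) here: reaching index `i` means following `i` pointers (names are illustrative):

```java
// Singly linked node; traversal is the only way to reach the i-th element.
class ListNode<T> {
    T value;
    ListNode<T> next;
    ListNode(T value) { this.value = value; }
}

class Traverse {
    // O(n) walk to index i — this is why a linked list is a bad fit for random access.
    static <T> T get(ListNode<T> head, int i) {
        ListNode<T> cur = head;
        for (int step = 0; step < i && cur != null; step++) cur = cur.next;
        if (cur == null) throw new IndexOutOfBoundsException();
        return cur.value;
    }
}
```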
Linked List BAD for modern CPUs!
From a reddit post:
When linked lists were first discussed in algorithms books in the 1970’s, processor speeds were roughly comparable to the speed of accessing main memory. So there wasn’t a substantial difference in cost between accessing the next element of an array vs the cost of following a pointer to the next element in a linked list.
But on modern computers the difference is staggering: the cpu can execute hundreds or even thousands of instructions in the time it takes to access a random address in memory. (And accessing the disk is thousands of times slower than that.)
To make up for this difference, for decades processors have been using caches: the L1 cache is built into the processor and is extremely fast but relatively small, the L2 cache is larger and slightly slower to access, the L3 cache is even larger and even slower.
The caches do a lot to mitigate the cost of accessing memory, but they work best when you’re accessing contiguous blocks of memory.
And that’s why linked lists don’t do as well in practice as they seem to when you just count the number of instructions they execute.
With a linked list, every time you traverse to the next item in the list, you’re jumping to a totally random location in memory, and there’s a decent chance that address isn’t in cache.
With an array, getting the next element in the array is in adjacent memory, so after the first access the rest of the array is probably cached.
So, what most people find is that in practice, a dynamic array is much faster in practice for many common list sizes.
Also, keep in mind that linked lists are not necessarily that efficient in terms of memory: if each item is 8 bytes or smaller, then every item in the list needs an additional 8 bytes of overhead (for the pointer). So a linked list might have 2x overhead, while a dynamically growing array might perform quite well with only 25% overhead.
Of course, it totally depends on the exact usage. There are absolutely cases where linked lists perform better than alternatives, and they should still be used in those cases.
- c++ - Bjarne Stroustrup says we must avoid linked lists - Stack Overflow
- In defense of linked lists | Hacker News
Dynamic Array/Array List
- Rapper names (aliases): growable array, array list
- Implementations: Go Slices, Java ArrayList, Python’s list, Javascript []
- You get: Random access. Well suited for stack-likes (LIFO)
    - `insert/remove` at the end/tail is O(1) amortized, good stuff
    - `insert/remove` at the start/head is O(n), pretty bad
    - An optimal `stack` implementation is possible; an optimal `queue` implementation is not.
        - i.e. Implementing a `queue` in Javascript with `.push` and `.shift` is probably not the best idea. But there have been improvements in `.shift` recently.
- Whenever there is no room for appending a new element
- It creates a larger array
- Copies the contents of the current array to it
- Deletes the current array
- Sets the newly created array as current
- Finally, appends the element normally.
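A minimal growable-array sketch showing exactly these steps (in Java the copy/delete/swap collapses into `Arrays.copyOf` plus reassignment, since the old array is garbage-collected):

```java
import java.util.Arrays;

// Growable array sketch: doubling on overflow gives amortized O(1) appends.
class DynamicArray {
    private int[] data = new int[4];
    private int length = 0;

    void append(int value) {
        if (length == data.length) {
            // No room: allocate a larger array, copy the contents, swap it in.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[length++] = value; // then append normally
    }

    int get(int index) {
        if (index >= length) throw new IndexOutOfBoundsException();
        return data[index]; // random access: one add + one load
    }
}
```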
Array Buffer / Ring Buffer / Circular buffer
- No fixed head or tail: both are indices that wrap around the buffer. A growable variant expands like a dynamic array while still maintaining order.
- No shifting of other elements is needed on insertion/removal, unlike a dynamic array/arraylist. So it’s well suited for queue-like interfaces (FIFO).
- You can implement things like a queue/double-ended queue (deque)/circular queue etc. with it.
- Circular Buffer Performance Trick - Cybernetist
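A minimal fixed-capacity sketch (a growable variant would reallocate and re-order the elements on overflow instead of rejecting the write):

```java
// Fixed-capacity ring buffer: head/tail wrap with modulo, nothing is ever shifted.
class RingBuffer {
    private final int[] data;
    private int head = 0; // index of the next slot to read
    private int size = 0;

    RingBuffer(int capacity) { data = new int[capacity]; }

    boolean enqueue(int value) {
        if (size == data.length) return false;     // full; growable variant resizes here
        data[(head + size) % data.length] = value; // tail position wraps around
        size++;
        return true;
    }

    Integer dequeue() {
        if (size == 0) return null;
        int value = data[head];
        head = (head + 1) % data.length; // head wraps too; no shifting needed
        size--;
        return value;
    }
}
```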
List ADTs 👀
Queue
- Can be implemented on top of a Linked List data structure.
- It constrains what you can do with a linked list.
- We don’t even need a doubly linked list to implement a basic queue.
- FIFO
- The algorithm a queue implements is FIFO.
- FIFO is bad for temporal locality
- Interface
    - `insert-at-end` (aka `enqueue`)
    - `remove-at-front` (aka `dequeue`)
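A minimal sketch of a queue on a singly linked list — as noted above, no doubly linked list is needed; a tail pointer is enough for O(1) enqueue:

```java
// Queue on a singly linked list: enqueue at tail, dequeue at head, both O(1).
class LinkedQueue<T> {
    private static class Node<T> {
        T value; Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> head, tail;

    void enqueue(T value) {            // insert-at-end
        Node<T> node = new Node<>(value);
        if (tail != null) tail.next = node; else head = node;
        tail = node;
    }

    T dequeue() {                      // remove-at-front
        if (head == null) return null;
        T value = head.value;
        head = head.next;
        if (head == null) tail = null; // queue became empty
        return value;
    }
}
```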
Stack
- LIFO
- LIFO is good for temporal locality
- Good to think of arrows pointing backwards from the head. So usually just `prev` on a Node works.
- Interface
    - `insert-at-end` (aka `push`)
    - `remove-at-end` (aka `pop`)
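A minimal sketch — one `prev` pointer per node, with arrows pointing backwards from the head, exactly as above:

```java
// Stack on linked nodes: push/pop both work at the head in O(1).
class LinkedStack<T> {
    private static class Node<T> {
        T value; Node<T> prev;
        Node(T value, Node<T> prev) { this.value = value; this.prev = prev; }
    }

    private Node<T> head; // top of the stack

    void push(T value) {               // insert-at-end
        head = new Node<>(value, head);
    }

    T pop() {                          // remove-at-end
        if (head == null) return null;
        T value = head.value;
        head = head.prev;              // follow the backwards arrow
        return value;
    }
}
```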
Deque (double ended queue)
- Not to be confused with the `dequeue` operation. (Spelling is different: no `ue`!)
- This is an abstract data type; Java has an implementation of this interface. It can be implemented via a linked list or some custom implementation, like Java does with `ArrayDeque`.
- Interface
    - `insert-at-end` (aka `enqueue`/`push`)
    - `remove-at-end` (aka `pop`)
    - `remove-at-front` (aka `dequeue`)
    - `insert-at-front`
In Java, `ArrayDeque` doesn’t have:
- The overhead of node allocations that `LinkedList` has
- The overhead of shifting the array contents left on remove that `ArrayList` has
- So e.g. in Java, if you had to implement a `Queue`, you’d go with `ArrayDeque` as the underlying data structure rather than `LinkedList` or `ArrayList`.
- Summary: It’s pretty good.
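A quick usage sketch with the real `java.util.ArrayDeque` API, used both ways:

```java
import java.util.ArrayDeque;

public class DequeDemo {
    public static void main(String[] args) {
        ArrayDeque<Integer> deque = new ArrayDeque<>();

        // As a FIFO queue: enqueue at the tail, dequeue at the head.
        deque.offerLast(1);
        deque.offerLast(2);
        System.out.println(deque.pollFirst()); // 1

        // As a LIFO stack: push/pop both work on the head.
        deque.push(3);
        System.out.println(deque.pop());       // 3
    }
}
```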
Trees
- Technically, trees are just linked-lists with multiple paths.
- Trees are also graphs
- i.e. `linked list = tree = graph`
See Trees
Tree ADTs 👀
- Priority Queue (Implemented using Heap)
Graphs
- See Graphs