tags : Linux, syscalls, O, Memory Allocation

Usecases

  • Allocate anon memory (malloc uses it internally)
  • Read and write files more efficiently than with read() and write() (for some access patterns)
  • Glue some resource (a portion of a file) into the virtual address space (VAS) of a process.
  • Multiple processes can each map the same resource into their memory space
  • IPC (MAP_SHARED | MAP_ANONYMOUS)
  • More control over permissions (PROT_READ, PROT_WRITE, PROT_EXEC per mapping)
  • Eliminates the protection domain crossings of system calls and the kernel/userspace data copies that I/O system calls such as read() imply. ( See O )
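
A minimal sketch of the file-mapping use case, using Python's standard mmap module as a thin wrapper over the syscall. The helper name patch_file_via_mmap and the demo file are made up for illustration. Once the mapping exists, reads and writes are plain memory accesses rather than read()/write() calls.

```python
import mmap
import os
import tempfile


def patch_file_via_mmap(path: str, offset: int, data: bytes) -> bytes:
    """Overwrite len(data) bytes at offset through a mapping; return the old bytes."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file. On Unix the default is a shared,
        # readable and writable mapping.
        with mmap.mmap(f.fileno(), 0) as m:
            old = m[offset:offset + len(data)]    # memory load, no read() call
            m[offset:offset + len(data)] = data   # memory store, no write() call
            m.flush()                             # ask the kernel to write dirty pages back
            return old


# Demo on a throwaway file (hypothetical contents).
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"hello mmap world")
os.close(fd)

old_bytes = patch_file_via_mmap(demo_path, 6, b"MMAP")
with open(demo_path, "rb") as f:
    new_contents = f.read()
os.remove(demo_path)
```

The slice assignment must keep the length unchanged, since a mapping cannot grow the file by itself.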

Issues with using it in Databases

Transactional Safety

  • The OS can flush dirty pages to disk at any time (even if the writing transaction hasn’t committed!)
  • The best case is a read-only workload, where there are no dirty pages to flush
  • Solutions: OS copy-on-write (CoW), shadow paging
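
The OS copy-on-write option can be illustrated with a private mapping. This is only a sketch of the mechanism, not a transaction scheme: Python's ACCESS_COPY gives a copy-on-write mapping, so modifications land in process-private pages and the OS never flushes them to the file. How committed changes then reach the file (for example through a separate log) is left out and would be up to the database.

```python
import mmap
import os
import tempfile

# Throwaway file standing in for a database file (hypothetical contents).
fd, path = tempfile.mkstemp()
os.write(fd, b"committed")
os.close(fd)

with open(path, "r+b") as f:
    # ACCESS_COPY: writes affect memory only and are never written to the file.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
        m[0:9] = b"UNCOMMITD"      # uncommitted change, private to this process
        in_memory = m[:]

with open(path, "rb") as f:
    on_disk = f.read()             # the file still holds the old contents
os.remove(path)
```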

I/O Stalls

  • With mmap, the OS manages your file I/O, so you have no idea whether a page is in memory or on disk
  • Touching a page that is not resident triggers a page fault and a read from disk, which stalls the thread

Error Handling

  • We want to validate checksum of a page when it’s loaded to memory
  • We want to check for corruption before writing things back to disk
  • We cannot do either, because the OS handles the I/O and we have no access at that level
  • Accessing mmap’d data can raise SIGBUS (for example on an I/O error, or if the file was truncated)
  • Workaround: instead of living in the buffer pool module, the error handling now ends up spread across the rest of your codebase
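
For contrast, a minimal sketch of what a buffer pool can do when it issues its own reads: verify a checksum at the moment a page is loaded, and see I/O failures as ordinary exceptions in one place. The page size, the page layout (a 4-byte CRC32 followed by the payload), and the function names are all assumptions for illustration, not any real database's format.

```python
import os
import struct
import tempfile
import zlib

PAGE_SIZE = 4096               # hypothetical page size
HEADER = struct.Struct("<I")   # hypothetical layout: 4-byte CRC32, then the payload


def make_page(payload: bytes) -> bytes:
    """Build one page: checksum header followed by the zero-padded payload."""
    body = payload.ljust(PAGE_SIZE - HEADER.size, b"\0")
    return HEADER.pack(zlib.crc32(body)) + body


def read_page(fd: int, page_no: int) -> bytes:
    """Explicit read: I/O errors raise OSError here, and the checksum is checked before use."""
    raw = os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)
    if len(raw) != PAGE_SIZE:
        raise OSError(f"short read on page {page_no}")
    (stored,) = HEADER.unpack_from(raw)
    body = raw[HEADER.size:]
    if zlib.crc32(body) != stored:
        raise ValueError(f"checksum mismatch on page {page_no}")
    return body


# Demo: two pages, then one byte of page 1 is overwritten to simulate corruption.
fd, path = tempfile.mkstemp()
os.write(fd, make_page(b"good") + make_page(b"bad"))
os.pwrite(fd, b"X", PAGE_SIZE + 100)

good = read_page(fd, 0).rstrip(b"\0")
try:
    read_page(fd, 1)
    corruption_detected = False
except ValueError:
    corruption_detected = True

os.close(fd)
os.remove(path)
```

With mmap there is no equivalent of read_page to hook into: the page simply appears in memory on first access.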

Performance Issues

  • TODO : Add details?

File backed vs non-file backed (for PostgreSQL)

  • When using MAP_ANONYMOUS there’s no file backing. Databases such as PostgreSQL use this for the static shared memory allocation at startup.
  • But using mmap is discouraged for the dynamic shared memory feature in PostgreSQL (dynamic_shared_memory_type), because there the mapping has to be backed by a file, and since it is file backed and managed by the OS (Linux)
    • it may write modified pages back to disk repeatedly, increasing system I/O load
    • it might also cause inconsistency (if the OS flushes dirty pages!)
    • Instead it recommends using shm_open (a library function, not a syscall)
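
A small sketch of what a POSIX shared memory segment looks like from user code. It uses Python's multiprocessing.shared_memory, which on POSIX systems is built on shm_open as far as I know. This only illustrates the mechanism and is not how PostgreSQL itself is written. Both handles live in one process here to keep the example short; normally the second one would be opened by another process using the segment name.

```python
from multiprocessing import shared_memory

# Create a named segment; it is not backed by a regular on-disk file.
seg = shared_memory.SharedMemory(create=True, size=4096)
seg.buf[:5] = b"hello"

# Attach to the same segment by name, as a peer process would.
peer = shared_memory.SharedMemory(name=seg.name)
seen = bytes(peer.buf[:5])

peer.close()
seg.close()
seg.unlink()   # remove the segment once nobody needs it
```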

Random notes

Interesting flags

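  • MAP_SHARED (Writes are visible to other processes mapping the same region, and are carried through to the file)
  • MAP_PRIVATE (Uses copy-on-write; writes stay private to the process)
  • MAP_ANONYMOUS (No file backing; pages start zero-filled and live in RAM or swap)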
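
A minimal sketch of how these flags behave across fork(), assuming Linux. The function name is made up. The child writes into two anonymous mappings; only the MAP_SHARED one is visible to the parent afterwards, which is the basis of the MAP_SHARED | MAP_ANONYMOUS IPC trick.

```python
import mmap
import os


def child_write_visibility():
    """Return what the parent sees in a shared and a private anonymous mapping."""
    # fileno -1 means no file; both regions start zero-filled.
    shared = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    private = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

    pid = os.fork()
    if pid == 0:
        # Child: write to both mappings, then exit without running anything else.
        shared[:5] = b"child"
        private[:5] = b"child"   # copy-on-write: only the child's copy changes
        os._exit(0)

    os.waitpid(pid, 0)
    result = (shared[:5], private[:5])
    shared.close()
    private.close()
    return result


shared_view, private_view = child_write_visibility()
```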

Precautionary Tales

  • Use memory maps if you access data randomly (sparse reads).
  • Read files normally if you access data sequentially.
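
The two access patterns can be sketched side by side. The record format (one little-endian 64-bit integer per record) and the function names are assumptions for illustration. The example makes no performance claim; it only shows the shape of each approach.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<Q")   # hypothetical fixed-size record: one 64-bit integer


def sum_sequential(path: str) -> int:
    """Sequential scan with plain buffered reads."""
    total = 0
    with open(path, "rb") as f:
        # Chunk size is a multiple of the record size, so chunks split cleanly.
        while chunk := f.read(RECORD.size * 1024):
            total += sum(value for (value,) in RECORD.iter_unpack(chunk))
    return total


def lookup_random(path: str, indexes):
    """Sparse lookups through a read-only mapping: only the touched pages are faulted in."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return [RECORD.unpack_from(m, i * RECORD.size)[0] for i in indexes]


# Demo file holding the integers 0..9999.
fd, path = tempfile.mkstemp()
os.write(fd, b"".join(RECORD.pack(i) for i in range(10_000)))
os.close(fd)

total = sum_sequential(path)
picked = lookup_random(path, [0, 9999, 1234])
os.remove(path)
```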

More resources