tags : Operating Systems, Systems, Linux, Observability

Logs give us context into the long tail that averages and percentiles don’t surface.

Writers/Sources

systemd journal

  • Requires systemd to be running on the host
  • systemd services: when they log to stderr / stdout, journald picks the output up and writes it in a binary format to its journal.
  • Usually stored in binary form under /var/log/journal; use journalctl to view it
  • journald can forward all its messages to syslog if needed

syslog

  • Usually the setup on systems without systemd
  • Services can write to /dev/log in the syslog format
  • A syslog daemon (e.g. rsyslog, syslog-ng) picks the messages up and writes them to its destinations (files or other outputs)
  • By default writes files to /var/log
  • Because it’s file based, we’ll probably need to use something like logrotate
  • History
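    • The original syslog came first. It started in the 1980s and birthed the syslog protocol. UDP only.
    • syslog-ng came in 1998. TCP support, encryption etc.
    • rsyslog came in 2004. It began as a fork of sysklogd (not of syslog-ng) and offers comparable features such as TCP and TLS, plus RELP.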
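
To make "the syslog format" a bit more concrete: every syslog message starts with a priority value computed as facility * 8 + severity. The sketch below builds a simplified frame without timestamp or hostname, roughly the shape a client hands to /dev/log. The function name `syslog_frame` and the small lookup tables are made up for illustration, and actually sending the bytes over the socket is left out.

```python
# Subset of the standard syslog facility and severity codes.
FACILITY = {"kern": 0, "user": 1, "daemon": 3, "auth": 4, "local0": 16}
SEVERITY = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def syslog_frame(facility: str, severity: str, tag: str, message: str) -> str:
    """Build a simplified syslog line: <PRI>tag: message (hypothetical helper)."""
    pri = FACILITY[facility] * 8 + SEVERITY[severity]
    return f"<{pri}>{tag}: {message}"


if __name__ == "__main__":
    print(syslog_frame("user", "info", "myapp", "hello"))
    print(syslog_frame("daemon", "err", "myapp", "disk full"))
```

The daemon uses the priority to route the message, e.g. into different files under /var/log.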

kernel

  • Writes its own logs to a ring buffer.
  • journald or syslog daemons can read logs from this buffer then write to files or journal
  • dmesg can be used to view these
  • Audit Logs: Special case of kernel messages designed for auditing actions such as file access. Probably can be replaced by BPF.

Applications

Logging to stdout / stderr

On systems with systemd…

  • If your program writes to stderr/stdout and runs under systemd, its output goes to the systemd journal. That’s just the way it works.
  • More and more, I think the only time you want your program dealing with log files directly is if you have a need for specific multiple logs. But even then, it seems like adding labels or tags to your log stream would be better.
  • Even in non-containerized apps, it seems writing to stdout/stderr is the way to go these days. If it’s something designed to be daemonized, whatever executes the process (systemd, for example) can be told where to send the logs, defaulting to a common location like /var/log/syslog. If it’s running inside a Docker container or k8s, those have their own ways to ship logs.
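
A minimal sketch of the "one stream plus tags" idea using Python’s standard logging module: everything goes to stderr, and a tag replaces what would otherwise be separate log files. The `component` field, the `build_logger` helper and the line format are assumptions for illustration, not a convention from any particular tool.

```python
import logging
import sys


class DefaultComponent(logging.Filter):
    """Give every record a `component` tag so the format string never fails."""

    def filter(self, record):
        if not hasattr(record, "component"):
            record.component = "-"
        return True


def build_logger(name, stream=None):
    """Return a logger writing one line per event to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the sketch independent of root logger config
    logger.handlers.clear()

    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.addFilter(DefaultComponent())
    # `component` is a hypothetical tag used instead of separate log files.
    handler.setFormatter(
        logging.Formatter("%(levelname)s component=%(component)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("myapp")
    log.info("payment accepted", extra={"component": "billing"})
    log.warning("login failed", extra={"component": "auth"})
```

Whatever runs the process (systemd, Docker, k8s) then decides where the stream ends up, and the tag can be filtered on downstream.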

Language specifics

  • For Python, the standard logging module goes a long way; additionally I like structlog
  • For Golang, there’s Zap, there’s Logrus and a few more. But since Go 1.21 we have a standard one: it’s called slog (log/slog).
    • It sort of follows the port-adapter pattern using Handlers (See Design Patterns)
    • slog ships with JSON and text (logfmt-style key=value) handlers. Third-party handlers are also supported, e.g. ones backed by zap or Logrus

How?

  • It should be async: don’t block the main thread on logging
  • Prefer structured logging if the output is not intended for a tty
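
Both points can be sketched with the Python standard library alone: a `QueueHandler` makes the logging call a cheap enqueue, while a `QueueListener` thread does the formatting and I/O, here as one JSON object per line. The `JsonFormatter` class, the `start_async_logging` helper and the chosen field names are assumptions for illustration.

```python
import json
import logging
import logging.handlers
import queue
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (field names are illustrative)."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        return json.dumps(payload)


def start_async_logging(logger, stream=None):
    """Attach a queue-backed handler; returns the listener so it can be stopped."""
    q = queue.Queue(-1)  # unbounded, so enqueueing never blocks the caller
    sink = logging.StreamHandler(stream if stream is not None else sys.stderr)
    sink.setFormatter(JsonFormatter())
    listener = logging.handlers.QueueListener(q, sink)
    listener.start()  # background thread performs the actual write
    logger.addHandler(logging.handlers.QueueHandler(q))
    return listener


if __name__ == "__main__":
    log = logging.getLogger("myapp")
    log.setLevel(logging.INFO)
    listener = start_async_logging(log)
    log.info("user %s logged in", "alice")
    listener.stop()  # drains the queue before returning
```

Calling `listener.stop()` on shutdown matters, otherwise records still in the queue can be lost.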

Sampling

  • Not sampling means more storage
  • Sampling means it’ll cost cpu cycles to sample and trim things down.
  • We can adjust the sample rate based on traffic etc.
  • Types
    • Head / Ingestion: The first service in the request chain can make the decision about whether to trace, which can reduce tracing overhead on services further down.
    • Tail / Index: The tracing backend can make a determination about whether to persist the trace after the trace has been collected. This has tracing overheads, but allows you to make decisions like “always keep errors”.
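
A rough sketch of the two decision points, applied to per-request events rather than to a real tracing backend. The function names, the CRC-based bucketing and the `level` key are assumptions for illustration. Hashing the request ID makes the head decision deterministic, so every service that sees the same ID reaches the same verdict.

```python
import zlib


def head_sample(request_id: str, rate: float) -> bool:
    """Decide up front whether to keep a request, based only on its ID."""
    bucket = zlib.crc32(request_id.encode("utf-8")) % 10_000
    return bucket < rate * 10_000


def tail_sample(events: list, rate: float, request_id: str) -> bool:
    """Decide after the request finished: always keep errors, sample the rest."""
    if any(event.get("level") == "error" for event in events):
        return True
    return head_sample(request_id, rate)


if __name__ == "__main__":
    print(head_sample("req-42", 0.1))
    print(tail_sample([{"level": "error", "msg": "timeout"}], 0.0, "req-42"))
```

The tail variant needs the events buffered until the request completes, which is the overhead the bullet above refers to.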

Redaction

  • Remove sensitive info
  • This can be done inside the application, but that doesn’t guarantee it’ll never leak: dependencies may log on their own, someone may forget, etc.
  • The other way is to redact outside the application with regex or custom tooling such as Ragel.
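
A small sketch of regex-based redaction. The `redact` function works on plain strings, so it could run in a processing stage outside the application; the `RedactingFilter` shows the in-application variant as a logging filter. The two patterns are illustrative assumptions only and will not catch every sensitive value.

```python
import logging
import re

# Illustrative patterns; real deployments need patterns for their own data.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"(password|token)=\S+", re.IGNORECASE), r"\1=[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace every match of every pattern with its placeholder."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message before a handler emits it."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = None  # arguments are already merged into msg
        return True


if __name__ == "__main__":
    print(redact("reset requested by bob@example.com with token=abc123"))
```

Attaching the filter to the handler rather than to a single logger means records from dependencies pass through it as well.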

Libraries and Logging

  • There’s some tension around whether libraries should log
  • Avoid taking a logger in any libraries you write if you can. Structure them so that the calling code logs, either by only returning errors or providing callbacks.
  • If possible, disable logging in the library. If the library will not let you disable logging, consider looking for a new library. If you can’t do that, write a no-op logger adapter.
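
A sketch of both sides in Python. `fetch_with_retry` is a hypothetical library function that never logs: it reports retries through an optional callback and failures through an exception, leaving the logging decision to the caller. `silence_library` is a hypothetical helper showing one way to mute a third-party library that logs through the standard logging module.

```python
import logging
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], str],
    attempts: int = 3,
    on_retry: Optional[Callable[[int, Exception], None]] = None,
) -> str:
    """Hypothetical library code: no logger, only a callback and exceptions."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
            # Only report attempts that will actually be retried.
            if on_retry is not None and attempt < attempts:
                on_retry(attempt, exc)
    raise RuntimeError(f"failed after {attempts} attempts") from last_error


def silence_library(logger_name: str) -> None:
    """Mute a library's logger: drop its handlers and stop propagation."""
    lib_logger = logging.getLogger(logger_name)
    lib_logger.handlers.clear()
    lib_logger.addHandler(logging.NullHandler())
    lib_logger.propagate = False
```

The calling code can pass something like `on_retry=lambda n, exc: log.warning("retry %d: %s", n, exc)` if it wants the retries logged.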

Centralized logging

Log collector

  • journald
  • syslog-ng / rsyslogd (use RELP if you require delivery guarantees)

Log routers/ log processors

People call these by different names.

  • Examples: systemd-journal-upload, Fluentd, Logstash, Filebeat, Promtail

Central logging/ storage/query

Practices

Canonical Log Line

  • Resist emitting random logs throughout the request as this makes analysis difficult.
  • A canonical log line is a log emitted at the end of the request with everything that happened during the request. This makes querying the logs bliss.
  • See Fast and flexible observability with canonical log lines
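
A minimal sketch of the pattern: fields are collected while the request is handled, and a single line is emitted in a `finally` block so it is written even when the handler fails. The `CanonicalLogLine` class, the `handle_request` function and all field names are assumptions for illustration.

```python
import json
import time


class CanonicalLogLine:
    """Accumulates key/value pairs for one request and renders them once."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self._start = time.monotonic()

    def add(self, **fields):
        self.fields.update(fields)

    def render(self) -> str:
        elapsed_ms = (time.monotonic() - self._start) * 1000
        self.fields["duration_ms"] = round(elapsed_ms, 3)
        return json.dumps(self.fields, sort_keys=True)


def handle_request(emit=print):
    """Hypothetical request handler; `emit` stands in for the real log call."""
    line = CanonicalLogLine(http_method="GET", http_path="/v1/charges")
    try:
        line.add(user_id="usr_123")  # e.g. set by an auth step
        line.add(db_queries=2)       # e.g. set by the data layer
        line.add(http_status=200)
    finally:
        emit(line.render())          # exactly one line per request


if __name__ == "__main__":
    handle_request()
```

Because every request produces one record with the same keys, queries such as "all 500s for this user slower than one second" become a single filter.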

Log levels

Other resources