Faster logs delivering from Fluentd

Recently, I faced a problem with slow log collection and delivery to our log aggregator, Graylog. So I looked into the fluentd documentation and dug through its sources to understand how fluentd collects logs and how it delivers them.

It’s not an article in a regular sense; it’s more of a small tip…

Collecting logs

I should start with how fluentd collects logs. For this purpose, it has input plugins.

Input plugins speak for themselves: they have only one job, which is to collect logs from different sources and transform them into fluentd records. But the records are not output immediately. Instead, they are collected into buffers.

Let’s take the tail plugin, which collects logs from files by tailing them. For that plugin, I’ve used the following configuration:

<source>
  @type tail
  tag kubernetes.containers.*
  path /var/log/containers/*.log
  refresh_interval 2
  read_from_head true
  pos_file /var/log/fluentd-containers.log.pos
  rotate_wait 5
  enable_watch_timer true
  enable_stat_watcher false
  open_on_every_update false
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

What does it do? In effect, it behaves like tail -F, but a little smarter. It remembers the last read position, supports different parsers, etc. With this configuration, you get a fluentd record for every single line in your log files.

All of these records are stored in a fluentd buffer (which can be memory, a file, or anything else, if you have a plugin for that).
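To make the "tail with a remembered position" idea concrete, here is a minimal Python sketch. It is not fluentd's implementation: the function name `read_new_records` and the single-offset position file are assumptions made for illustration, and every complete line is assumed to be valid JSON.

```python
import json
import os


def read_new_records(log_path, pos_path):
    """Return records for lines appended since the last call, then save the new offset."""
    # Resume from the saved byte offset, or from the start of the file
    # (comparable to read_from_head true) when no position has been saved yet.
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as pos_file:
            offset = int(pos_file.read().strip() or 0)

    records = []
    with open(log_path, "rb") as log_file:
        log_file.seek(offset)
        while True:
            line = log_file.readline()
            # Stop at end of file, or at a partial line that is still being written.
            if not line.endswith(b"\n"):
                break
            offset = log_file.tell()
            records.append(json.loads(line))  # one record per complete line

    # Remember where we stopped, so the next call only sees new lines.
    with open(pos_path, "w") as pos_file:
        pos_file.write(str(offset))
    return records
```

Calling the function repeatedly on the same file returns only the lines added since the previous call, which is the property the position file exists to provide. File rotation, which the real plugin also handles, is left out of this sketch.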

Delivering logs

When fluentd has parsed logs and pushed them into the buffer, it starts pulling logs from the buffer and outputting them somewhere else. For this, fluentd has output plugins.

An output plugin receives the fluentd record, converts it into the appropriate format for the specified output (in our case Graylog) and delivers it over a transport (HTTP, UDP, TCP, whatever…).
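As an illustration of the "convert the record into the output's format" step, here is a small Python sketch that turns a record into a GELF 1.1 payload. It is not the gelf plugin's code: the function name `to_gelf`, the host name and the record keys (`log` and `stream`, as found in Docker's JSON log files) are assumptions.

```python
import json


def to_gelf(record, host, timestamp):
    """Build a GELF 1.1 payload (as a dict) from a fluentd-style record."""
    message = {
        "version": "1.1",
        "host": host,
        # Assumption: the log text lives under "log", as in Docker's JSON log files.
        "short_message": str(record.get("log", "")).rstrip("\n"),
        "timestamp": timestamp,  # seconds since the UNIX epoch
    }
    for key, value in record.items():
        # "log" is already used above; "_id" is reserved by the GELF spec.
        if key in ("log", "id"):
            continue
        # GELF requires additional fields to be prefixed with an underscore.
        message["_" + key] = value
    return message


# Serialize one example record the way it could be put on the wire.
payload = json.dumps(to_gelf({"log": "started\n", "stream": "stdout"}, "node-1", 1700000000.5))
```

The resulting JSON string is what a transport would then send to the Graylog input. Compression and chunking for UDP are out of scope here.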

I’ll show an example for Graylog. Here is our configuration for outputting logs to Graylog:

<match **>
  @type copy
  <store>
    @type gelf
    host "#{ENV['GELF_HOST']}"
    port "#{ENV['GELF_PORT']}"
    protocol "#{ENV['GELF_PROTOCOL']}"
  </store>
</match>

The configuration tells fluentd and the plugin that all records should be delivered to our Graylog server over the GELF protocol.

Where was the slowdown?

It turns out that the input/output plugins and their configurations had nothing to do with the slow delivery (by default).

The problem was in the buffer configuration itself. The default buffer configuration specifies an interval at which buffered records are enqueued for delivery.

For our purposes, we need to deliver logs from specific containers as fast as possible. Fortunately, you can change how fluentd works with the buffer. The directive that controls it is called buffer.

So, let’s change the output plugin’s configuration to force logs to be sent from the buffer immediately:

<match **>
  @type copy
  <store>
    @type gelf
    host "#{ENV['GELF_HOST']}"
    port "#{ENV['GELF_PORT']}"
    protocol "#{ENV['GELF_PROTOCOL']}"
    <buffer>
      flush_at_shutdown true
      flush_mode immediate
      flush_thread_count 8
      flush_thread_interval 1
      flush_thread_burst_interval 1
      retry_forever true
      retry_type exponential_backoff
    </buffer>
  </store>
</match>

You can change the behavior of the buffer per output plugin. I set flush_mode to immediate, so right after a fluentd record is pushed into the buffer, it is enqueued for delivery to our Graylog cluster.
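The difference between an interval flush and an immediate flush can be sketched with a toy model in Python. This is a simplification, not fluentd's scheduler: it assumes the timer fires at exact multiples of the interval, ignores network and flush-thread time, and the function name `delivery_latencies` is made up for this example.

```python
import math


def delivery_latencies(arrival_times, flush_interval=None):
    """Return how long each record waits in the buffer before it is enqueued.

    flush_interval=None models an immediate flush; a number models a timer
    that enqueues everything buffered so far every flush_interval seconds.
    """
    latencies = []
    for arrived in arrival_times:
        if flush_interval is None:
            flushed = arrived  # enqueued as soon as it is buffered
        else:
            # Wait for the next timer tick at or after the arrival time.
            flushed = math.ceil(arrived / flush_interval) * flush_interval
        latencies.append(flushed - arrived)
    return latencies


arrivals = [0.5, 1.0, 4.0, 6.5]  # seconds at which records reach the buffer
print(delivery_latencies(arrivals, flush_interval=5))  # [4.5, 4.0, 1.0, 3.5]
print(delivery_latencies(arrivals))                    # [0.0, 0.0, 0.0, 0.0]
```

With a 5-second interval, a record can wait in the buffer for almost the whole interval. In immediate mode the buffer itself adds no waiting time, which matches the behavior described above.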


Before my configuration changes (default behavior), logs were sent every 5 seconds.

After I applied the buffer configuration with the immediate flush mode, logs were delivered much faster.

I know the fluentd documentation covers this case, but I hope this tip saves you some time for other work.

Eugene Obrezkov, Senior Software Engineer at, Kyiv, Ukraine.

6 thoughts on “Faster logs delivering from Fluentd”

    1. We used it to send logs from tiny containers, but there were a lot of them. Each container could produce anywhere from several kilobytes to hundreds of kilobytes, and there were roughly 300 to 500 containers. Those are not huge payloads, but there was load, for sure.


  1. hello! we are missing the logs for fast restarting pods. We have set the refresh_interval to 5. Do we also need to set flush_mode immediate? thanks

