Compiling and Testing a FluentD Plugin
This post will be in the context of running FluentD on a VM using the td-agent and filebeat packages.
Background
I’ve been looking into how to optimize FluentD. I want aggregators that can handle high throughput and utilize every CPU core. FluentD is written in Ruby, and is thus subject to the constraints of a Global Interpreter Lock, much like Python.
This approach seems to work well: break your `/etc/td-agent/td-agent.conf` into separate `<process>` blocks, each handling one input plugin (one plugin per process isn't strictly necessary, but it keeps things tidy). You can split it out into many sub-configurations; it's just a bit harder to maintain and scale.
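As a sketch, the `<process>` block layout comes from the fluent-plugin-multiprocess plugin, where each child runs its own sub-configuration. The child config paths and log paths below are illustrative, not from my actual setup:

```
<source>
  @type multiprocess
  <process>
    cmdline -c /etc/td-agent/td-agent-beats.conf --log /var/log/td-agent/td-agent-beats.log
    sleep_before_start 1s
    sleep_before_shutdown 5s
  </process>
  <process>
    cmdline -c /etc/td-agent/td-agent-syslog.conf --log /var/log/td-agent/td-agent-syslog.log
  </process>
</source>
```

Each `cmdline` spawns a full child fluentd process, which is how you sidestep the GIL: one interpreter (and one core) per sub-configuration.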
I tried the more native `workers` configuration property first, but plugins must explicitly support it.
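The opt-in mechanism is a single method: at startup, Fluentd asks each plugin `multi_workers_ready?` and refuses to run it under `workers N` unless it returns true. Here's a minimal pure-Ruby sketch of the idea (no fluentd dependency; class names are illustrative, not the real Fluentd classes):

```ruby
# Sketch of Fluentd's multi-worker opt-in mechanism (illustrative classes).
class PluginBase
  # Fluentd's default stance: assume a plugin is NOT safe to run
  # across multiple worker processes.
  def multi_workers_ready?
    false
  end
end

class BeatsInput < PluginBase
  # A plugin that can safely share its listen socket across workers
  # overrides the method to return true.
  def multi_workers_ready?
    true
  end
end

puts BeatsInput.new.multi_workers_ready?
```

With the real API, an input plugin that doesn't override this method causes a configuration error the moment you set `workers` greater than 1 in `<system>`, which is exactly what the Beats plugin was doing.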
Beats is a common way to ship logs, so I was surprised to find that the FluentD Beats plugin doesn't support multiple workers. To test some changes to the plugin, I needed to be able to build it from source. And off we go with my forked repo…
The Process
Download, Build, Compile
- Prep the file system and clone the plugin repo:

```sh
mkdir -p /tmp/fluent-plugin-beats && \
cd /tmp/fluent-plugin-beats && \
git clone --single-branch -b multi-workers https://github.com/chicken231/fluent-plugin-beats
```
- Use `td-agent`'s `gem` wrapper to build the gem from inside the cloned repo (the clone above lands in a `fluent-plugin-beats` subdirectory):

```sh
cd fluent-plugin-beats
td-agent-gem build fluent-plugin-beats.gemspec
```
- The `build` command generates a gem and embeds a version number in the file name. Now you can install the gem to make it available to FluentD:

```sh
td-agent-gem install fluent-plugin-beats-0.1.4.gem
```
Configure, Install, Test
Assumptions:
- `td-agent` is installed and configured. Here's an excerpt from my `/etc/td-agent/td-agent.conf`:
```
# general system config. Note the log format for later.
<system>
  workers 4
  suppress_config_dump
  <log>
    format json
  </log>
</system>

# beats input
<source>
  @type beats
  metadata_as_tag
  port 5044
  bind 0.0.0.0
</source>

# use this when testing to print to stdout and thus the log file
<match **>
  @type stdout
</match>
```
- Filebeat is installed, configured, and enabled, with its Logstash output pointed at `localhost:5044` and capturing files matching `/var/log/*.log` (the default).
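For reference, the relevant `filebeat.yml` excerpt would look something like this. This is a sketch of a Filebeat 6.x configuration matching the defaults described above, not a copy of my actual file:

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log

output.logstash:
  hosts: ["localhost:5044"]
```

Filebeat speaks the Lumberjack protocol to its Logstash output, which is exactly what the FluentD Beats plugin listens for on port 5044.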
- Start `td-agent` and `filebeat`:

```sh
systemctl start td-agent filebeat
```
- Anything that comes in as an input to FluentD will be written to `/var/log/td-agent/td-agent.log` (thanks to the `stdout` match above), so tail it:

```sh
tail -f /var/log/td-agent/td-agent.log
```
- `echo` some stuff and append it to a file in a directory that `filebeat` is monitoring:

```sh
echo WOOOO $(date) >> /var/log/temp.log
```
- Observe a log line printed to `td-agent`'s log. Note the logs are in JSON. An unflattened example of a message from `filebeat`:
```json
{
  "@timestamp": "2018-10-11T02:26:50.313Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.4.2"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "name": "centos-beats",
    "hostname": "centos-beats",
    "version": "6.4.2"
  },
  "host": {
    "name": "centos-beats"
  },
  "source": "/var/log/temp.log",
  "offset": 754,
  "message": "WOOOO Thu Oct 11 02:26:45 UTC 2018",
  "prospector": {
    "type": "log"
  }
}
```
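If you want to poke at one of these records programmatically (say, while prototyping a filter), they parse as ordinary Ruby hashes. A quick sketch using a trimmed-down copy of the record above:

```ruby
require 'json'

# A trimmed-down copy of the Filebeat record shown above.
raw = <<~JSON
  {
    "source": "/var/log/temp.log",
    "offset": 754,
    "message": "WOOOO Thu Oct 11 02:26:45 UTC 2018",
    "beat": { "hostname": "centos-beats" }
  }
JSON

record = JSON.parse(raw)
puts record['message']              # the line we echoed into /var/log/temp.log
puts record.dig('beat', 'hostname') # nested fields via Hash#dig
```

This is also roughly the shape your records take inside a FluentD `filter` plugin, which is handy when deciding which Filebeat metadata fields to keep or strip.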
And there we are. We downloaded a fork of a plugin with changes enabling multiple workers, built it, installed it, tested it, and watched the messages from Filebeat stream through the FluentD logs.