ELK Stack – Installing and Configuring Curator

In this post I am going to quickly cover what is needed to get Curator up and running on the ELK stack. In the last few posts about the ELK stack I covered everything needed to get it installed, configured and ingesting logs reliably. If you missed those posts, you can find them here:
ELK 5 on Ubuntu 16.04

Once the ELK stack has been running for a while, you will likely notice disk space disappearing quickly as you store more data for longer periods of time. If you're anything like me, the last thing you need is another manual task to remember, like logging in and clearing out old data, and you also want that cleanup to happen consistently. It turns out Elastic already has this covered with Curator. With Curator you define an actions file that tells it which indices to clean up and how many days' worth of data to retain. For example, for the Winlogbeat indices I can tell Curator to delete any index older than 60 days, then schedule Curator to run as a cron job once a week, and that's it: Curator will maintain roughly 60 days' worth of logs on the instance.

Installing Curator

There are a few different ways to install Curator, but installing it with Python pip generally seems to be the easiest, so that is what I am going to cover here on Ubuntu 16.04.

You can find more info on Curator's official page here:
https://www.elastic.co/guide/en/elasticsearch/client/curator/current/index.html

1.) Make sure you have Python pip installed:
$ sudo apt-get install python-pip

2.) Install Curator:
$ sudo pip install elasticsearch-curator

3.) Create a new directory to store the configuration files in and let’s also create the Curator Config file:
$ mkdir Curator
$ cd Curator
$ nano Curator-Config.yml

Both configuration files are available as a download (Curator.zip, 2KB), or you can just copy and paste the following:
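For reference, here is a minimal sketch of what Curator-Config.yml can contain, assuming Curator 5.x and Elasticsearch listening on 127.0.0.1:9200 with no SSL and no authentication. The keys follow the sample client configuration in the Curator documentation; the host, port and logging values are assumptions to adjust for your own setup. Instead of pasting into nano, this writes the file from the shell (run it inside the Curator directory created above).

```shell
#!/usr/bin/env bash
# Sketch: write Curator-Config.yml in the current directory (~/Curator).
# Assumptions: Curator 5.x, Elasticsearch on 127.0.0.1:9200, no SSL, no auth.
set -euo pipefail

cat > Curator-Config.yml <<'EOF'
---
# Connection settings for Curator's Elasticsearch client.
# Leave a key empty if it has no value.
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

# An empty logfile sends Curator's log output to stdout.
logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
EOF
```

If your Elasticsearch node is on another host or port, only the hosts and port entries need to change.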

4.) Now we need to create an Actions File, this is where you define what data to delete and after how long:
$ nano Actions-File.yml

This file is included in the same Curator.zip download, or you can just copy and paste the following:
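A sketch of an actions file along those lines, assuming Curator 5.x and the default Beats 5.x index names (winlogbeat-, filebeat-, packetbeat- and metricbeat- followed by a %Y.%m.%d date). Each numbered action pairs a pattern filter (which indices) with an age filter (how old, read from the index name). The prefixes and the 28-day retention are assumptions matching the description that follows; adjust them to your own index names.

```shell
#!/usr/bin/env bash
# Sketch: write Actions-File.yml in the current directory (~/Curator).
# Assumptions: Curator 5.x, Beats indices named <beat>-YYYY.MM.DD, keep 28 days.
set -euo pipefail

cat > Actions-File.yml <<'EOF'
---
actions:
  1:
    action: delete_indices
    description: "Delete winlogbeat- indices older than 28 days (based on index name)"
    options:
      ignore_empty_list: True   # don't fail when nothing is old enough yet
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: winlogbeat-
    - filtertype: age
      source: name              # read the date from the index name
      direction: older
      timestring: '%Y.%m.%d'    # must match the date pattern in the name
      unit: days
      unit_count: 28
  2:
    action: delete_indices
    description: "Delete filebeat- indices older than 28 days (based on index name)"
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: filebeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 28
  3:
    action: delete_indices
    description: "Delete packetbeat- indices older than 28 days (based on index name)"
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: packetbeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 28
  4:
    action: delete_indices
    description: "Delete metricbeat- indices older than 28 days (based on index name)"
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: metricbeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 28
EOF
```

Keeping one action per Beat makes it easy to give each data source its own retention later by changing only that entry's unit_count.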

In this configuration I have an entry for each Beats agent (Winlogbeat, Filebeat, Packetbeat and Metricbeat), since the data from each one is stored in its own set of indices. Also pay attention to the timestring! Here it is set to %Y.%m.%d, but it needs to match the date pattern used in your index names. If you followed my ELK Stack – Tips, Tricks and Troubleshooting post, where I made some changes to limit the number of shards created, the timestring pattern you will want to use is:

The timestring is what Curator uses to work out how old each index is, by reading the date out of the index name. Lastly, with the unit set to days, unit_count is the number of days' worth of data to retain; I have set it to 28 days here.
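To make the age check concrete, here is a rough shell approximation (not Curator's own code) of what the age filter does with source: name, timestring '%Y.%m.%d', unit: days and unit_count: 28: take the date out of the index name and compare it with a cutoff of "now minus 28 days". It assumes GNU date (standard on Ubuntu) and index names ending in -YYYY.MM.DD; is_older is a hypothetical helper, and Curator's exact boundary and time zone handling may differ by up to a day.

```shell
#!/usr/bin/env bash
# Rough approximation of Curator's age filter (source: name). Not Curator's code.
# Assumes GNU date and index names ending in -YYYY.MM.DD, e.g. winlogbeat-2017.05.01.
set -euo pipefail

unit_count=28   # days of data to keep, as in the actions file

# Cutoff in epoch seconds: now minus unit_count days.
cutoff=$(date -d "${unit_count} days ago" +%s)

# is_older INDEX: succeeds if the date in INDEX's name is before the cutoff.
is_older() {
  local stamp="${1##*-}"       # winlogbeat-2017.05.01 -> 2017.05.01
  local day="${stamp//./-}"    # 2017.05.01 -> 2017-05-01, a form GNU date parses
  local ts
  ts=$(date -d "$day" +%s)
  [ "$ts" -lt "$cutoff" ]
}

# Example: report what would happen to a couple of sample names.
for index in "winlogbeat-2017.01.15" "filebeat-$(date +%Y.%m.%d)"; do
  if is_older "$index"; then
    echo "would delete: $index"
  else
    echo "would keep:   $index"
  fi
done
```

This is also why a mismatched timestring matters: if the date in the name can't be parsed with the pattern you give, Curator has no age to compare for that index.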

If you are not sure which pattern your indices use, you can easily check by looking at the shards:

Here we can see that the indices follow the %Y.%m.%d format, so the actions file as configured should match them just fine.
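To run the same check from the shell, here is a small sketch using the Elasticsearch _cat/indices API, assuming Elasticsearch answers on http://localhost:9200 without authentication. ES_URL, list_indices and index_dates are hypothetical names for this example: the first function lists index names, the second keeps only the trailing date so you can compare it with your timestring.

```shell
#!/usr/bin/env bash
# Sketch: show the date suffixes used by your index names.
# Assumption: Elasticsearch on http://localhost:9200, no auth (override ES_URL).
set -euo pipefail

ES_URL="${ES_URL:-http://localhost:9200}"

# One index name per line, sorted. h=index limits _cat/indices to that column.
list_indices() {
  curl -s "${ES_URL}/_cat/indices?h=index" | sort
}

# Keep only the trailing date part, e.g. winlogbeat-2017.05.01 -> 2017.05.01.
# Names without a date suffix (such as .kibana) are dropped; trailing spaces are ignored.
index_dates() {
  sed -n 's/.*-\([0-9][0-9.]*\)[[:space:]]*$/\1/p'
}
```

Usage: list_indices | index_dates | sort -u. Dates like 2017.05.01 point to a timestring of %Y.%m.%d, while dates like 2017.05 point to %Y.%m.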

More info on the date patterns can be found here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html#built-in-date-formats

5.) Time to put it all together and perform a “dry run”. The following command runs Curator and outputs which indices it would delete, without actually deleting anything:
$ sudo curator --config /home/rob/Curator/Curator-Config.yml --dry-run /home/rob/Curator/Actions-File.yml

You should see something similar to the following:

We can see there are a few indices that are old enough to be deleted, such as the following index:

6.) Now that we know the configuration is good, this time we'll run the same command but without --dry-run:
$ sudo curator --config /home/rob/Curator/Curator-Config.yml /home/rob/Curator/Actions-File.yml

7.) Now recheck the shards and verify that there are not any indices older than what is defined in the Actions-File (28 days):

Awesome, everything appears to work as expected and we just freed up all of that disk space!

Note: You may have noticed some gaps in the indices shown above, as well as different timestamp patterns on some of them. That is just because this is a demo box that has been on and off over the last few months and has had a few config changes.

8.) Now to automate this task just set up a simple cron job:
$ crontab -e

Then add the same Curator command used above along with the schedule; the following line will run Curator every day at midnight:
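A sketch of that crontab entry, assuming pip put curator at /usr/local/bin/curator (check with which curator) and the files live in /home/rob/Curator as above. cron runs jobs with a minimal PATH, so the full path to curator is used. sudo is left out on purpose: Curator only talks to Elasticsearch over HTTP, so it should not need root, and a sudo password prompt can't be answered from cron. The snippet just builds and prints the line; CRON_ENTRY is a hypothetical variable name.

```shell
#!/usr/bin/env bash
# Sketch: build the crontab entry for a nightly Curator run.
# Assumptions: curator at /usr/local/bin/curator (confirm with `which curator`),
# config files in /home/rob/Curator as in the steps above.
set -euo pipefail

CURATOR_BIN="/usr/local/bin/curator"
CURATOR_DIR="/home/rob/Curator"

# Fields: minute hour day-of-month month day-of-week, so "0 0 * * *" = midnight daily.
CRON_ENTRY="0 0 * * * ${CURATOR_BIN} --config ${CURATOR_DIR}/Curator-Config.yml ${CURATOR_DIR}/Actions-File.yml"

# Print the line so it can be pasted into `crontab -e`.
echo "$CRON_ENTRY"
```

Paste the printed line into crontab -e, or append it without opening an editor with: (crontab -l 2>/dev/null; echo "$CRON_ENTRY") | crontab -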

And that is it, the ELK stack will now take care of itself!
