
elasticsearch-logstash-index-mgmt's Introduction

Elasticsearch Logstash Index Management

Please note that Elasticsearch provides the Python-based Curator, which manages closing/deleting and maintenance with lots of tuning capabilities. It is worth investigating Curator as an Elasticsearch-maintained solution for your cluster's time-based index maintenance needs.

If you are using Curator with Elasticsearch >= 1.0.0 (and Hubot) and you want a way to restore old indices, try hubot-elk-restore.

If you prefer to roll your own, or need functionality that's not (yet) available in Curator, these scripts may serve as a good starting point. This is a collection of bash scripts for managing Elasticsearch indices, designed specifically around the daily index pattern used by Logstash.

Support is integrated for uploading backups to S3 using s3cmd.

Each script includes usage samples; use '-h' or check the source. The default index is 'logstash', but this is configurable with '-g', e.g. for 'marvel' or custom index names.
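For example, to view the options for the delete script, or to run it against marvel indices while keeping only the most recent one:

./elasticsearch-remove-old-indices.sh -h
./elasticsearch-remove-old-indices.sh -e http://localhost:9200 -g marvel -i 1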

These are heavily inspired by a previous collection of scripts.

elasticsearch-remove-old-indices.sh

This script generically walks through the indices, sorts them lexicographically, and deletes anything older than the configured number of indices.
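The core pattern looks roughly like this (a simplified sketch of the approach, not the script itself; it assumes GNU coreutils and ES 1.0+ for the _cat API, and the prefix and count are illustrative):

# Keep the newest $KEEP 'logstash-*' indices; delete the rest.
KEEP=21
curl -s 'http://localhost:9200/_cat/indices' | awk '{print $3}' | grep '^logstash-' | sort | head -n -"$KEEP" |
while read -r INDEX; do
  curl -s -XDELETE "http://localhost:9200/$INDEX"
done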

elasticsearch-remove-expired-indices.sh

This script generically walks through the indices, and deletes anything older than the configured expiration date.
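The same listing approach works here, with each index's date suffix compared against a cutoff (again a sketch; the 30-day window and prefix are illustrative, and GNU date is assumed):

# Delete logstash-YYYY.MM.DD indices with a date suffix older than the cutoff.
CUTOFF=$(date -d '30 days ago' +%Y.%m.%d)
curl -s 'http://localhost:9200/_cat/indices' | awk '{print $3}' | grep '^logstash-' |
while read -r INDEX; do
  [ "${INDEX#logstash-}" \< "$CUTOFF" ] && curl -s -XDELETE "http://localhost:9200/$INDEX"
done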

elasticsearch-close-old-indices.sh

This script generically walks through the indices, sorts them lexicographically, and closes indices older than the configured number of indices.
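Closing an index uses Elasticsearch's standard close API; for a single index (name illustrative):

curl -XPOST 'http://localhost:9200/logstash-2014.01.01/_close'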

elasticsearch-backup-index.sh

Backup handles making a backup file and a restore script for a given index. The default is yesterday's index, or you can pass a specific date to back up. You can optionally keep the backup locally and/or push it to S3. If you want to get tricky, you can override the s3cmd command and have this script push the backups to a storage server on your network (or wherever you like, really).
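For example, using the same flags shown in the cron sample below, backing up yesterday's index to S3 with a custom s3cmd configuration (paths are illustrative):

/opt/es/elasticsearch-backup-index.sh -b "s3://es-bucket" -i "/opt/elasticsearch/data/elasticsearch/nodes/0/indices" -c "s3cmd put -c /path/to/.s3cfg"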

elasticsearch-restore-index.sh

Restore handles retrieving a backup file and restore script (from S3), and then executing the restore script locally after download.

Cron

Manual

Something like this might be helpful, assuming you placed the scripts in the /opt/es/ directory (formatted for an /etc/cron.d/ file):

00 7 * * * root /bin/bash /opt/es/elasticsearch-backup-index.sh -b "s3://es-bucket" -i "/opt/elasticsearch/data/elasticsearch/nodes/0/indices" -c "s3cmd put -c /path/to/.s3cfg"
00 9 * * * root /bin/bash /opt/es/elasticsearch-remove-old-indices.sh -i 21

es-backup-index.sh

Alternatively, @sergedu provided a sample cron script, es-backup-index.sh.

The es-backup-index script is ready for use, and assuming your setup is consistent, these instructions (included in the script itself) should be enough to get started:

# This is a wrapper script for daily run
# i.e. you can run it by cron as follows
## m h  dom mon dow   command
#  11 4 * * * /opt/es/es-backup-index.sh >> /var/log/elasticsearch/esindexbackup.log 

elasticsearch-logstash-index-mgmt's People

Contributors

badlamer, dasrecht, imperialwicket, lyrixx, microphobic, sergedu


elasticsearch-logstash-index-mgmt's Issues

Add support for multi-node clusters and distributed shards

Intentionally supporting only nodes that have all data at the moment.

Discussion:
it looks like
https://github.com/dpippen/elasticsearch-logstash-index-mgmt/blob/master/elasticsearch-backup-index.sh
assumes a 1 or 2 node ES cluster
...
yeah, it would probably be a matter of grabbing routings and then making a worklist to run on each node
...
if you get the routing and filter for primary shard you might have a list of 7 nodes (some nodes end up with 2 primary shards)
to get it running somewhat naturally with the current code, i think i'd want a state entry in redis and strategically offset (via cron) bash scripts.
but i really don't feel good about that
wasn't archiving supposed to make it to ES by 1.0?
you could rely on the es notion of the state
dump the mappings, create a snapshot of node and shard assignments
each node is responsible for making a tar of its shards and copying to S3
so in my example there would be 7 tar files for the index on S3
do you want to be able to restore those via script?
the other 13 nodes would just exit, nothing to do
the restore would be similar, just pull down logstash-$DATE-*.tgz
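A rough per-node sketch of that proposal (all paths and names are hypothetical; each node archives only the shards it holds locally, per the discussion above):

# Run on every node via cron; nodes without local shards simply exit.
DATE=$(date -d yesterday +%Y.%m.%d)
NODE=$(hostname -s)
SHARD_DIR="/opt/elasticsearch/data/elasticsearch/nodes/0/indices/logstash-$DATE"
if [ -d "$SHARD_DIR" ]; then
  tar czf "/tmp/logstash-$DATE-$NODE.tgz" -C "$SHARD_DIR" .
  s3cmd put "/tmp/logstash-$DATE-$NODE.tgz" "s3://es-bucket/"
fi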

Merge effort with elasticsearch/curator?

Howdy!

Lovely tool you're working on! Any possibility of channeling effort on this project into the Elasticsearch Curator tool instead? It has similar goals to this project and is officially maintained by us at Elasticsearch. The project itself is fairly old (2 years? maybe more?) and supports lots of index management activities, with more on the way! We're adding snapshotting, shard routing, and other cool features very soon, and I would love your help in improving Curator.

Thoughts?

You support marvel or heka too

Hello.

The command ./elasticsearch-remove-old-indices.sh works like a charm. Thanks.

But I suggest adding to your readme that you also support marvel and hekad.

In my case, I just ran

./elasticsearch-remove-old-indices.sh -e http://es.lxc:9200 -g marvel -i 1

And it works perfectly.

support for es 2.0

Looks like there is a problem when restoring. Mapping has changed on ES 2.
MapperParsingException[Failed to parse mapping [testing-2016.03.22]: Root mapping definition has unsupported parameters...

Elasticsearch 2.1.1 was used for testing this script.
As a test scenario:
curl -XPUT 'http://localhost:9200/testing-2016.03.22/' -d '{"settings":{"number_of_shards":5,"number_of_replicas":0},"mappings":{"testing-2016.03.22":{"mappings":{"doc":{"properties":{"date":{"type":"date","format":"strict_date_optional_time||epoch_millis"},"sentence":{"type":"string"},"value":{"type":"long"}}}}}}}'
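Note that the command above nests a second "mappings" object under the type name, which may be what triggers the MapperParsingException. For comparison, a shape ES 2.x accepts keys the mappings object directly by type (a sketch derived from the command above; the type name 'doc' is taken from it):

curl -XPUT 'http://localhost:9200/testing-2016.03.22/' -d '{"settings":{"number_of_shards":5,"number_of_replicas":0},"mappings":{"doc":{"properties":{"date":{"type":"date","format":"strict_date_optional_time||epoch_millis"},"sentence":{"type":"string"},"value":{"type":"long"}}}}}'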

Problem with elasticsearch-remove-old-indices.sh

Hi,

I use Elasticsearch and Logstash for several websites. For each website I have two indices per day. For example, for my website www.ZZZZZZ.com there is one index named "logstash-ZZZZZZ_index-YYYY.MM.DD" and another named "logstash-ZZZZZZ_failure-YYYY.MM.DD".

When I do:
elasticsearch-remove-old-indices.sh -i 1 -g logstash-ZZZZZZ_failure
I don't have any problem, but when I do:
elasticsearch-remove-old-indices.sh -i 1 -g logstash-ZZZZZZ_index

I have this error message:
"No indices returned containing 'logstash-ZZZZZZ_index' from http://localhost:9200"
Although there are 3 indices open in Elasticsearch:
logstash-ZZZZZZ_index.2013.07.27
logstash-ZZZZZZ_index.2013.07.28
logstash-ZZZZZZ_index.2013.07.29

I don't understand why I have this problem. Can you help me?

Remove s3cmd-specific variables.

The scripts should be clearer about optional backup techniques.

Remove 'S3CMD' in favor of something like 'BACKUP_COMMAND', and add a check to ensure that the supplied command is valid on the system.

This way s3cmd, boto, scp, cp, etc. all make more sense and using an alternative to s3cmd feels less hacky.

Need to be sure this works properly in the restoration process, too.
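A minimal sketch of what that could look like (BACKUP_COMMAND is the name proposed in this issue; $BACKUP_FILE and $DESTINATION are hypothetical placeholders; the check uses the shell's own command lookup):

BACKUP_COMMAND=${BACKUP_COMMAND:-"s3cmd put"}
# Verify the underlying binary exists before attempting a backup.
if ! command -v "${BACKUP_COMMAND%% *}" > /dev/null 2>&1; then
  echo "Backup command not found: $BACKUP_COMMAND" >&2
  exit 1
fi
# $BACKUP_FILE and $DESTINATION would be set elsewhere in the script.
$BACKUP_COMMAND "$BACKUP_FILE" "$DESTINATION"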

Failure scenarios need to clean up ES

Some failures during restore result in partial restore states. When failures occur mid-restore, proper cleanup should happen (deleting the newly created ES index, for example).
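One way to guarantee that cleanup, sketched as a bash trap (the index name and URL are illustrative):

INDEX="logstash-2014.01.01"
cleanup() {
  # Delete the partially restored index so a re-run starts from a clean state.
  curl -s -XDELETE "http://localhost:9200/$INDEX"
}
trap cleanup ERR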

Remove index close

Hi,

Thank you for your help with my last problem.

For my different sites, I keep only one index open: that of the current day. Every day my crontab closes all of the previous day's indices. This part is okay.

I would also like to remove indices older than 7 days, but "http://localhost:9200/_status?pretty=true" only gives the list of open indices. How can I get a list of all indices?

It would be highly appreciated if someone could help or share any ideas. Many thanks.

Regards,
Hubert
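For reference: closed indices are omitted from _status, but they remain in the cluster state metadata, and on ES 1.0+ the _cat API lists them with a 'close' status:

curl -s 'http://localhost:9200/_cluster/state'   # metadata.indices includes closed indices
curl -s 'http://localhost:9200/_cat/indices'     # ES 1.0+; closed indices show status 'close'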

support for ES 5

Any idea when you will add support for ES 5? Because in ES 5, index data is stored in folders with UUID names.
