Production ES suddenly reports "too many open files"
First, check the ES cluster's current file descriptor status with the following command:
```
curl -XGET '127.0.0.1:9200/_cat/nodes?v&h=ip,fdc,fdm'
```
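The output has one row per node; the values below are illustrative, not from the actual incident:

```
ip           fdc    fdm
10.0.0.1   63900  64000
10.0.0.2   41210  64000
```

A node whose fdc is close to fdm is about to hit the error.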
Terminology:
- fdc: file_desc.current (aliases: fdc, fileDescriptorCurrent), the number of file descriptors currently in use
- fdm: file_desc.max (aliases: fdm, fileDescriptorMax), the maximum number of file descriptors allowed
Fix the problem first
Increase the file descriptor limit
- Display the current hard limits of your machine. The hard limit is the maximum value that can be set without tuning kernel parameters in the proc filesystem:
```
$ ulimit -aH
```
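The relevant line in the output is "open files"; on many distributions it looks like this (illustrative):

```
open files                      (-n) 4096
```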
Edit /etc/security/limits.conf:
```
$ sudo cat /etc/security/limits.conf | grep -v '#'
* soft nofile 128000
* hard nofile 128000
* soft nproc 65535
* hard nproc 65535
es soft memlock unlimited
es hard memlock unlimited
```
Each entry follows the format <domain> <type> <item> <value>; the new limits take effect for new login sessions. Then add this parameter to ES at startup:
```
-Des.max-open-files=true
```

With this flag, ES prints the number of file descriptors available to it at startup, which lets you confirm the new limit actually took effect.
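You can also ask the cluster directly at any time via the nodes stats API; the process section reports both the current and maximum fd counts:

```
# Look for "open_file_descriptors" and "max_file_descriptors" in the response
curl -XGET '127.0.0.1:9200/_nodes/stats/process?pretty'
```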
With the problem fixed, let's look at the root cause
File Descriptor
fd: Linux follows the principle that "everything is a file", so operations in Linux are all operations on files of one kind or another. But a file cannot be located by scanning from the beginning every time, so, much like an index in a database, Linux keeps an index over open files; that index entry is the file descriptor.
A fd is essentially an index number plus a pointer to the file object. Note that fds belong to a single process: each process has its own fd table.
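That per-process table is visible under /proc. A quick way to poke at it (the pgrep pattern below is an assumption; adjust it to match your setup):

```
# List the current shell's own open file descriptors
ls -l /proc/self/fd

# Count how many fds the Elasticsearch JVM currently holds
ls /proc/$(pgrep -f org.elasticsearch)/fd | wc -l
```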
And ES uses a huge number of files:
Lucene uses a very large number of files. At the same time, Elasticsearch uses a large number of sockets to communicate between nodes and HTTP clients. All of this requires available file descriptors.
Sadly, many modern Linux distributions ship with a paltry 1,024 file descriptors allowed per process. This is far too low for even a small Elasticsearch node, let alone one that is handling hundreds of indices.
You should increase your file descriptor count to something very large, such as 64,000. This process is irritatingly difficult and highly dependent on your particular OS and distribution. Consult the documentation for your OS to determine how best to change the allowed file descriptor count.
So this value has to be increased…
Besides that, ES sharding (primary and replica shards) is another reason ES holds so many files open: every shard is a full Lucene index with its own set of segment files, so shard count multiplies the fd usage (see the commands below).
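To see how shards and segments add up on your cluster:

```
# One line per shard in the cluster
curl -XGET '127.0.0.1:9200/_cat/shards?v'

# One line per Lucene segment; each segment is backed by several files on disk
curl -XGET '127.0.0.1:9200/_cat/segments?v'
```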
See also: ES performance tuning, a very good reference.
In addition, you need to control the index life cycle: use index templates together with rollover, delete, and similar operations.
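A minimal sketch of that idea, assuming a recent ES version with ILM available (the policy and template names here are made up for illustration):

```
# Hypothetical policy: roll over hot indices at 50 GB or 7 days, delete after 30 days
curl -XPUT '127.0.0.1:9200/_ilm/policy/logs-policy' -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_size": "50gb", "max_age": "7d" } } },
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}'

# Attach the policy via an index template so every new logs-* index picks it up
curl -XPUT '127.0.0.1:9200/_template/logs-template' -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "settings": {
    "index.lifecycle.name": "logs-policy",
    "index.lifecycle.rollover_alias": "logs"
  }
}'
```

Keeping old indices from accumulating keeps both the segment count and the open-fd count bounded.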