Elasticsearch 7 Cluster Setup: The Complete Guide (Multi-Server Ubuntu Edition)
Overview
Elasticsearch 7 is a distributed search and analytics engine suited to full-text search, structured search, analytics, and any combination of the three. This guide walks through building a production-grade Elasticsearch 7 cluster across multiple Ubuntu servers.
System Requirements
Hardware
- CPU: at least 2 cores, 4 or more recommended
- Memory: 4GB minimum, 8GB+ recommended (16GB+ for production)
- Disk: SSD recommended, at least 20GB free
- Network: all cluster nodes must be able to reach each other; keep intra-cluster latency under 5ms
Software
- OS: Ubuntu 20.04 LTS or 22.04 LTS
- Java: OpenJDK 11 or newer
- Docker: 20.10+ (if deploying with Docker)
Network Plan
- Node 1: 192.168.1.101 (master + data)
- Node 2: 192.168.1.102 (master + data)
- Node 3: 192.168.1.103 (master + data)
- Load balancer: 192.168.1.110 (optional; a quick reachability check across these nodes is sketched below)
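Before installing anything, it is worth confirming the planned machines can actually see each other. A minimal sketch, assuming the IPs above, SSH already listening on the nodes, and the netcat-openbsd package on the machine you run it from:
# Run from any one machine; every node should answer both probes
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== $ip ==="
  ping -c 2 "$ip"        # basic reachability and latency
  nc -zv -w 3 "$ip" 22   # TCP probe against the SSH port
done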
Option 1: Docker Multi-Server Deployment (Recommended)
1. System initialization (run on all nodes)
Ubuntu system tuning
# Update the system
sudo apt update && sudo apt upgrade -y
# Install basic tooling
sudo apt install -y curl wget vim net-tools htop unzip
# Kernel parameters
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
echo "vm.swappiness=1" | sudo tee -a /etc/sysctl.conf
echo "fs.file-max=65536" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# File descriptor limits
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* soft memlock unlimited" | sudo tee -a /etc/security/limits.conf
echo "* hard memlock unlimited" | sudo tee -a /etc/security/limits.conf
# Hostname entries (adjust to your actual IPs)
sudo tee -a /etc/hosts << EOF
192.168.1.101 es-node-01
192.168.1.102 es-node-02
192.168.1.103 es-node-03
EOF
Install Docker (all nodes)
# Remove old versions
sudo apt remove docker docker-engine docker.io containerd runc
# Install prerequisites
sudo apt install -y apt-transport-https ca-certificates curl gnupg lsb-release
# Add the Docker GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Set up the stable repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Start Docker and enable it at boot
sudo systemctl start docker
sudo systemctl enable docker
# Configure the Docker daemon
sudo tee /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "live-restore": true
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
# Verify the installation (docker-compose-plugin provides the "docker compose" subcommand)
docker --version
docker compose version
2. Create the cross-server Docker network (manager node)
# Overlay networks require swarm mode; initialize it on node 1 first
docker swarm init --advertise-addr 192.168.1.101
# Print the worker join command, then run it on node 2 and node 3
docker swarm join-token worker
# Create an attachable overlay network on the manager node
docker network create --driver overlay --attachable elasticsearch-net
# Verify the network exists
docker network ls | grep elasticsearch-net
3. Per-node configuration and startup
Node 1 configuration (192.168.1.101)
Create /opt/elasticsearch/docker-compose.yml:
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.15
    container_name: elasticsearch
    hostname: es-node-01
    environment:
      - node.name=es-node-01
      - cluster.name=es-multi-server-cluster
      - node.master=true
      - node.data=true
      - node.ingest=true
      - discovery.seed_hosts=es-node-02:9300,es-node-03:9300
      - cluster.initial_master_nodes=es-node-01,es-node-02,es-node-03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      # Transport TLS requires a shared certificate. Generate and distribute
      # one first (see the certificate sketch after the startup step), or set
      # the xpack.security.* flags to false for an initial unsecured bring-up.
      - xpack.security.enabled=true
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
      - xpack.security.http.ssl.enabled=false
      - network.host=0.0.0.0
      - network.publish_host=192.168.1.101
      - transport.publish_host=192.168.1.101
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    extra_hosts:
      # Containers do not inherit the host's /etc/hosts; map the peer node
      # names to the host IPs so discovery can reach the published ports
      - "es-node-02:192.168.1.102"
      - "es-node-03:192.168.1.103"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
      - elasticsearch-logs:/usr/share/elasticsearch/logs
      - ./certs:/usr/share/elasticsearch/config/certs:ro
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elasticsearch-net
    restart: unless-stopped
volumes:
  elasticsearch-data:
    driver: local
  elasticsearch-logs:
    driver: local
networks:
  elasticsearch-net:
    # Created manually above, so only mark it external; the driver is a
    # property of the pre-created network
    external: true
Node 2 configuration (192.168.1.102)
Create /opt/elasticsearch/docker-compose.yml:
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.15
    container_name: elasticsearch
    hostname: es-node-02
    environment:
      - node.name=es-node-02
      - cluster.name=es-multi-server-cluster
      - node.master=true
      - node.data=true
      - node.ingest=true
      - discovery.seed_hosts=es-node-01:9300,es-node-03:9300
      - cluster.initial_master_nodes=es-node-01,es-node-02,es-node-03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      # Transport TLS: the same shared certificate as node 1 must be present
      # under ./certs before startup
      - xpack.security.enabled=true
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
      - xpack.security.http.ssl.enabled=false
      - network.host=0.0.0.0
      - network.publish_host=192.168.1.102
      - transport.publish_host=192.168.1.102
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    extra_hosts:
      - "es-node-01:192.168.1.101"
      - "es-node-03:192.168.1.103"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
      - elasticsearch-logs:/usr/share/elasticsearch/logs
      - ./certs:/usr/share/elasticsearch/config/certs:ro
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elasticsearch-net
    restart: unless-stopped
volumes:
  elasticsearch-data:
    driver: local
  elasticsearch-logs:
    driver: local
networks:
  elasticsearch-net:
    external: true
Node 3 configuration (192.168.1.103)
Create /opt/elasticsearch/docker-compose.yml:
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.15
    container_name: elasticsearch
    hostname: es-node-03
    environment:
      - node.name=es-node-03
      - cluster.name=es-multi-server-cluster
      - node.master=true
      - node.data=true
      - node.ingest=true
      - discovery.seed_hosts=es-node-01:9300,es-node-02:9300
      - cluster.initial_master_nodes=es-node-01,es-node-02,es-node-03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      # Transport TLS: the same shared certificate as node 1 must be present
      # under ./certs before startup
      - xpack.security.enabled=true
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12
      - xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
      - xpack.security.http.ssl.enabled=false
      - network.host=0.0.0.0
      - network.publish_host=192.168.1.103
      - transport.publish_host=192.168.1.103
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    extra_hosts:
      - "es-node-01:192.168.1.101"
      - "es-node-02:192.168.1.102"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
      - elasticsearch-logs:/usr/share/elasticsearch/logs
      - ./certs:/usr/share/elasticsearch/config/certs:ro
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elasticsearch-net
    restart: unless-stopped
volumes:
  elasticsearch-data:
    driver: local
  elasticsearch-logs:
    driver: local
networks:
  elasticsearch-net:
    external: true
4. Start the service on each node
# On node 1
cd /opt/elasticsearch
docker compose up -d
# On node 2
cd /opt/elasticsearch
docker compose up -d
# On node 3
cd /opt/elasticsearch
docker compose up -d
# Watch the logs
docker compose logs -f
# Give the nodes time to start and elect a master (roughly 2-3 minutes)
sleep 180
# Check cluster health (with security enabled, add -u elastic:<password> once passwords are set)
curl -X GET "http://localhost:9200/_cluster/health?pretty"方案二:Ubuntu 二进制包多服务器部署
Option 2: Ubuntu Binary-Package Multi-Server Deployment
1. System initialization (run on all nodes)
# Update the system
sudo apt update && sudo apt upgrade -y
# Install basic tooling plus OpenJDK 11
sudo apt install -y curl wget vim net-tools htop unzip openjdk-11-jdk
# Set the hostname (adjust per node)
sudo hostnamectl set-hostname es-node1 # node 1
# sudo hostnamectl set-hostname es-node2 # node 2
# sudo hostnamectl set-hostname es-node3 # node 3
# /etc/hosts entries (identical on all nodes)
sudo tee -a /etc/hosts << EOF
192.168.1.101 es-node1
192.168.1.102 es-node2
192.168.1.103 es-node3
EOF
# System tuning
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
echo "vm.swappiness=1" | sudo tee -a /etc/sysctl.conf
echo "net.core.rmem_max = 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.core.wmem_max = 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_rmem = 4096 87380 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_wmem = 4096 65536 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.core.netdev_max_backlog = 16384" | sudo tee -a /etc/sysctl.conf
echo "net.core.somaxconn = 16384" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog = 8192" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_fin_timeout = 30" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_time = 120" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_intvl = 30" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_probes = 3" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# File descriptor and process limits
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* soft nproc 4096" | sudo tee -a /etc/security/limits.conf
echo "* hard nproc 4096" | sudo tee -a /etc/security/limits.conf
echo "elasticsearch soft memlock unlimited" | sudo tee -a /etc/security/limits.conf
echo "elasticsearch hard memlock unlimited" | sudo tee -a /etc/security/limits.conf
# Create data and log directories
sudo mkdir -p /usr/local/elasticsearch/data
sudo mkdir -p /usr/local/elasticsearch/logs
2. Download the package (run on all nodes)
# Download Elasticsearch 7.17.15
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.15-linux-x86_64.tar.gz
# Verify the checksum
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.15-linux-x86_64.tar.gz.sha512
sha512sum -c elasticsearch-7.17.15-linux-x86_64.tar.gz.sha512
# Unpack and install. /usr/local/elasticsearch already exists (data/ and logs/
# were created above), so copy the contents in rather than moving the whole
# directory, which would end up nested inside it
tar -xzf elasticsearch-7.17.15-linux-x86_64.tar.gz
sudo cp -a elasticsearch-7.17.15/. /usr/local/elasticsearch/
sudo chown -R root:root /usr/local/elasticsearch
3. Create the user and set permissions (run on all nodes)
# Create the elasticsearch user
sudo useradd -M -r -s /bin/false elasticsearch
# Set ownership and permissions; the config directory must also be writable,
# because Elasticsearch creates config/elasticsearch.keystore on first start
sudo chown -R elasticsearch:elasticsearch /usr/local/elasticsearch/data
sudo chown -R elasticsearch:elasticsearch /usr/local/elasticsearch/logs
sudo chown -R elasticsearch:elasticsearch /usr/local/elasticsearch/config
sudo chmod -R 755 /usr/local/elasticsearch
4. Node 1 configuration (192.168.1.101)
Create /usr/local/elasticsearch/config/elasticsearch.yml:
# Cluster
cluster.name: es-multi-server-cluster
node.name: es-node1
node.master: true
node.data: true
node.ingest: true
# Network
network.host: 192.168.1.101
http.port: 9200
transport.port: 9300
# Discovery
discovery.seed_hosts: ["192.168.1.101:9300", "192.168.1.102:9300", "192.168.1.103:9300"]
cluster.initial_master_nodes: ["es-node1", "es-node2", "es-node3"]
# Data paths
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs
# Memory locking
bootstrap.memory_lock: true
# The JVM heap is set in jvm.options:
# -Xms2g
# -Xmx2g
5. Node 2 configuration (192.168.1.102)
Create /usr/local/elasticsearch/config/elasticsearch.yml:
# Cluster
cluster.name: es-multi-server-cluster
node.name: es-node2
node.master: true
node.data: true
node.ingest: true
# Network
network.host: 192.168.1.102
http.port: 9200
transport.port: 9300
# Discovery
discovery.seed_hosts: ["192.168.1.101:9300", "192.168.1.102:9300", "192.168.1.103:9300"]
cluster.initial_master_nodes: ["es-node1", "es-node2", "es-node3"]
# Data paths
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs
# Memory locking
bootstrap.memory_lock: true
# The JVM heap is set in jvm.options:
# -Xms2g
# -Xmx2g
6. Node 3 configuration (192.168.1.103)
Create /usr/local/elasticsearch/config/elasticsearch.yml:
# Cluster
cluster.name: es-multi-server-cluster
node.name: es-node3
node.master: true
node.data: true
node.ingest: true
# Network
network.host: 192.168.1.103
http.port: 9200
transport.port: 9300
# Discovery
discovery.seed_hosts: ["192.168.1.101:9300", "192.168.1.102:9300", "192.168.1.103:9300"]
cluster.initial_master_nodes: ["es-node1", "es-node2", "es-node3"]
# Data paths
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs
# Memory locking
bootstrap.memory_lock: true
# The JVM heap is set in jvm.options:
# -Xms2g
# -Xmx2g
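The three elasticsearch.yml files above differ only in node.name and network.host, so they can be generated from a single template instead of being hand-edited on each machine. A sketch, assuming the same paths as above:
# Set these two variables per node, then regenerate the config
NODE_NAME=es-node1        # es-node2 / es-node3 on the other machines
NODE_IP=192.168.1.101     # 192.168.1.102 / 192.168.1.103
sudo tee /usr/local/elasticsearch/config/elasticsearch.yml << EOF
cluster.name: es-multi-server-cluster
node.name: ${NODE_NAME}
node.master: true
node.data: true
node.ingest: true
network.host: ${NODE_IP}
http.port: 9200
transport.port: 9300
discovery.seed_hosts: ["192.168.1.101:9300", "192.168.1.102:9300", "192.168.1.103:9300"]
cluster.initial_master_nodes: ["es-node1", "es-node2", "es-node3"]
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs
bootstrap.memory_lock: true
EOF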
7. JVM configuration (shared by all nodes)
Edit /usr/local/elasticsearch/config/jvm.options:
# Heap size
-Xms2g
-Xmx2g
# GC settings
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=16m
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
# Heap dumps
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/usr/local/elasticsearch/logs
# GC logging
-Xlog:gc*,gc+age=trace,safepoint:file=/usr/local/elasticsearch/logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
# Other optimizations
-XX:+UseStringDeduplication
-XX:+UseCompressedOops
-XX:MaxDirectMemorySize=1g
8. Create the systemd service (run on all nodes)
Create /etc/systemd/system/elasticsearch.service:
[Unit]
Description=Elasticsearch
Documentation=https://www.elastic.co
Wants=network-online.target
After=network-online.target
[Service]
Type=notify
RuntimeDirectory=elasticsearch
PrivateTmp=true
Environment=ES_HOME=/usr/local/elasticsearch
Environment=ES_PATH_CONF=/usr/local/elasticsearch/config
Environment=PID_DIR=/run/elasticsearch
Environment=ES_SD_NOTIFY=true
EnvironmentFile=-/etc/default/elasticsearch
WorkingDirectory=/usr/local/elasticsearch
User=elasticsearch
Group=elasticsearch
ExecStart=/usr/local/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet
# Elasticsearch writes its own logs to /usr/local/elasticsearch/logs; stdout
# goes to the systemd journal, which helps when debugging startup errors
StandardOutput=journal
StandardError=inherit
# Resource limits. bootstrap.memory_lock requires LimitMEMLOCK here, because
# /etc/security/limits.conf does not apply to systemd services
LimitNOFILE=65536
LimitNPROC=4096
LimitMEMLOCK=infinity
LimitAS=infinity
LimitFSIZE=infinity
# Shutdown behavior
TimeoutStopSec=0
KillSignal=SIGTERM
KillMode=process
SendSIGKILL=no
SuccessExitStatus=143
TimeoutStartSec=75
[Install]
WantedBy=multi-user.target
Then run:
# Reload systemd
sudo systemctl daemon-reload
9. Start the service (all nodes, one at a time)
# Start Elasticsearch
sudo systemctl start elasticsearch
# Enable it at boot
sudo systemctl enable elasticsearch
# Check the service status
sudo systemctl status elasticsearch
# Follow the logs
sudo journalctl -u elasticsearch -f
# Give the node time to start (roughly 2-3 minutes)
sleep 180
# Verify the node responds
curl -X GET "http://localhost:9200/"
10. Verify the cluster
# Check cluster health from any node
curl -X GET "http://localhost:9200/_cluster/health?pretty"
# 查看节点信息
curl -X GET "http://localhost:9200/_nodes?pretty"
# 查看集群状态
curl -X GET "http://localhost:9200/_stats?pretty"集群验证与测试
1. Health checks (multi-server verification)
# Check cluster health (verify from every node)
curl -X GET "http://192.168.1.101:9200/_cluster/health?pretty"
curl -X GET "http://192.168.1.102:9200/_cluster/health?pretty"
curl -X GET "http://192.168.1.103:9200/_cluster/health?pretty"
# Node statistics
curl -X GET "http://192.168.1.101:9200/_nodes/stats?pretty"
# Cluster-wide index statistics
curl -X GET "http://192.168.1.101:9200/_stats?pretty"
# List indices
curl -X GET "http://192.168.1.101:9200/_cat/indices?v"
2. Cross-node functional test
# Create an index on node 1
curl -X PUT "http://192.168.1.101:9200/test_index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'
# Index a document via node 2
curl -X POST "http://192.168.1.102:9200/test_index/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "message": "Hello from node2",
  "timestamp": "2024-01-01T00:00:00"
}'
# Index another document via node 3
curl -X POST "http://192.168.1.103:9200/test_index/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "message": "Hello from node3",
  "timestamp": "2024-01-01T01:00:00"
}'
# Search from any node (verifies the data is visible cluster-wide)
curl -X GET "http://192.168.1.101:9200/test_index/_search?pretty"
# Check the shard distribution across nodes
curl -X GET "http://192.168.1.101:9200/_cat/shards?v"
# Clean up
curl -X DELETE "http://192.168.1.101:9200/test_index?pretty"
3. High-availability test
# Test node failover
# 1. Stop the service on node 2 (simulates a node failure)
sudo systemctl stop elasticsearch # run on node 2
# 2. Check cluster health from node 1 or node 3
curl -X GET "http://192.168.1.101:9200/_cluster/health?pretty"
# Expect "status": "yellow" (some replica shards are unassigned)
# 3. List the nodes (node 2 should be missing)
curl -X GET "http://192.168.1.101:9200/_cat/nodes?v"
# 4. Start node 2 again
sudo systemctl start elasticsearch # run on node 2
# 5. Confirm the cluster recovers
curl -X GET "http://192.168.1.101:9200/_cluster/health?pretty"
# Expect "status": "green"
4. Cluster status reference
- green: every primary and replica shard is assigned
- yellow: every primary is assigned, but at least one replica is not
- red: at least one primary shard is unassigned (the command below shows why)
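When the status is yellow or red, the allocation explain API reports exactly why a shard is unassigned; called with no body, it picks the first unassigned shard it finds (and returns an error if nothing is unassigned):
curl -X GET "http://192.168.1.101:9200/_cluster/allocation/explain?pretty"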
Performance Tuning (Ubuntu, Multi-Server)
1. JVM tuning (all nodes)
# Edit the JVM options
sudo vim /usr/local/elasticsearch/config/jvm.options
# Recommended heap sizes (scale with server memory)
# On an 8GB server:
-Xms4g # initial heap 4GB, suits small-to-medium data volumes
-Xmx4g # max heap 4GB; keep it equal to -Xms to avoid heap resizing
# On a 16GB server:
-Xms8g # initial heap 8GB, suits moderate data volumes and query load
-Xmx8g # max heap 8GB, i.e. 50% of the 16GB of physical memory
# On a 32GB server:
-Xms16g # initial heap 16GB, suits large data volumes and high query concurrency
-Xmx16g # max heap 16GB; 50% of physical memory is a sound ceiling
# GC settings
-XX:+UseG1GC # G1 collector, well suited to large heaps
-XX:MaxGCPauseMillis=200 # 200ms pause target balances throughput and latency
-XX:+UseStringDeduplication # deduplicate strings to reduce heap usage
# Note: -XX:+UseCGroupMemoryLimitForHeap (behind UnlockExperimentalVMOptions)
# is obsolete on JDK 11+, where container awareness is enabled by default
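After restarting a node, you can confirm which heap size and flags it actually picked up via the nodes info API (assuming it is reachable on localhost:9200):
# heap_max_in_bytes shows the effective -Xmx; input_arguments lists every JVM flag
curl -s "http://localhost:9200/_nodes/jvm?pretty" | grep -E -A 12 'heap_max_in_bytes|input_arguments'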
2. Ubuntu system tuning (all nodes)
# Kernel parameters
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
echo "vm.swappiness=1" | sudo tee -a /etc/sysctl.conf
# 文件系统优化
echo "fs.file-max=65536" | sudo tee -a /etc/sysctl.conf
echo "fs.nr_open=65536" | sudo tee -a /etc/sysctl.conf
# 网络优化
echo "net.core.rmem_max = 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.core.wmem_max = 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_rmem = 4096 87380 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_wmem = 4096 65536 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.core.netdev_max_backlog = 16384" | sudo tee -a /etc/sysctl.conf
echo "net.core.somaxconn = 16384" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog = 8192" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_fin_timeout = 30" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_time = 120" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_intvl = 30" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_probes = 3" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_tw_reuse = 1" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_tw_recycle = 0" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.ip_local_port_range = 1024 65535" | sudo tee -a /etc/sysctl.conf
# 虚拟内存优化
echo "vm.vfs_cache_pressure=50" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_background_bytes=67108864" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_bytes=536870912" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_ratio=80" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_background_ratio=5" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_expire_centisecs=12000" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# 禁用 swap(重要)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
# CPU 频率优化(可选)
echo 'GOVERNOR="performance"' | sudo tee /etc/default/cpufrequtils
sudo systemctl restart cpufrequtils3. Elasticsearch 配置优化(所有节点)
Edit /usr/local/elasticsearch/config/elasticsearch.yml and append the following settings:
# Memory
indices.memory.index_buffer_size: 30%
indices.queries.cache.size: 10%
indices.fielddata.cache.size: 20%
# Search thread pool (the min/max queue-resizing settings are deprecated in 7.x)
thread_pool.search.size: 10
thread_pool.search.queue_size: 2000
thread_pool.search.min_queue_size: 1000
thread_pool.search.max_queue_size: 3000
thread_pool.search.auto_queue_frame_size: 2000
thread_pool.search.target_response_time: 1s
# Write thread pool (keep the size at or below the CPU core count)
thread_pool.write.size: 8
thread_pool.write.queue_size: 1000
# Network
network.tcp.keep_alive: true
network.tcp.reuse_address: true
network.tcp.send_buffer_size: 64kb
network.tcp.receive_buffer_size: 64kb
# Shard allocation and recovery
cluster.routing.allocation.cluster_concurrent_rebalance: 4
cluster.routing.allocation.node_concurrent_recoveries: 2
cluster.routing.allocation.node_initial_primaries_recoveries: 4
# Request/query caches. indices.queries.cache.size and
# indices.fielddata.cache.size are already set above; a key may appear only
# once in elasticsearch.yml or the node refuses to start
indices.queries.cache.count: 10000
indices.requests.cache.size: 2%
# Circuit breakers (guard against out-of-memory errors)
indices.breaker.total.use_real_memory: false
indices.breaker.total.limit: 70%
indices.breaker.request.limit: 60%
indices.breaker.fielddata.limit: 60%
4. Disk I/O tuning (Ubuntu)
# SSD tuning: on modern (blk-mq) kernels the valid schedulers are "none" and
# "mq-deadline"; "none" is usually the right choice for SSDs
cat /sys/block/sda/queue/scheduler
echo "none" | sudo tee /sys/block/sda/queue/scheduler
# (this does not persist across reboots; see the udev rule below)
# Mount options
# Edit /etc/fstab and add to the relevant entry (note: nobarrier trades crash
# safety for speed; use it only with battery-backed storage):
# UUID=xxx /data ext4 defaults,noatime,nodiratime,nobarrier 0 2
# Filesystem cache tuning (same settings as in the kernel section above)
echo "vm.vfs_cache_pressure=50" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_background_bytes=67108864" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_bytes=536870912" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
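To make the scheduler choice survive reboots, a udev rule can apply it to every matching disk at boot. A sketch, assuming sd* device names and that the SSDs report themselves as non-rotational:
sudo tee /etc/udev/rules.d/60-io-scheduler.rules << 'EOF'
# Use the "none" scheduler for every non-rotational (SSD) disk
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
EOF
sudo udevadm control --reload-rules
sudo udevadm trigger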
5. Network tuning (multi-server)
# NIC tuning (adjust the interface name to your hardware)
sudo ethtool -G eth0 rx 4096 tx 4096   # larger ring buffers, if the NIC supports it
sudo ethtool -K eth0 gso on
sudo ethtool -K eth0 tso on
sudo ethtool -K eth0 gro on
# Network buffer defaults
echo "net.core.rmem_default=262144" | sudo tee -a /etc/sysctl.conf
echo "net.core.wmem_default=262144" | sudo tee -a /etc/sysctl.conf
echo "net.core.optmem_max=65536" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
Monitoring and Maintenance
1. Install Kibana for monitoring
# Download Kibana
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.17.15-linux-x86_64.tar.gz
# Unpack
tar -xzf kibana-7.17.15-linux-x86_64.tar.gz
sudo mv kibana-7.17.15-linux-x86_64 /usr/local/kibana
# Configure kibana.yml (replace "password" with the elastic password from setup-passwords)
sudo tee /usr/local/kibana/config/kibana.yml << EOF
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.username: "elastic"
elasticsearch.password: "password"
EOF
# Start Kibana (see the systemd unit sketch below for a more robust setup)
cd /usr/local/kibana
nohup ./bin/kibana &
2. Multi-server monitoring commands
# Cluster health (check from any node)
curl -X GET "http://192.168.1.101:9200/_cluster/health?pretty"
curl -X GET "http://192.168.1.102:9200/_cluster/health?pretty"
curl -X GET "http://192.168.1.103:9200/_cluster/health?pretty"
# 节点统计信息
curl -X GET "http://192.168.1.101:9200/_nodes/stats?pretty"
# 索引统计信息
curl -X GET "http://192.168.1.101:9200/_stats?pretty"
# 集群设置
curl -X GET "http://192.168.1.101:9200/_cluster/settings?pretty"
# 查看分片分配
curl -X GET "http://192.168.1.101:9200/_cat/shards?v"
# 查看节点信息
curl -X GET "http://192.168.1.101:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,load_5m,load_15m,disk.used_percent"故障排除(Ubuntu 多服务器环境)
1. Common multi-server problems
Problem 1: The cluster fails to form
- Check network connectivity between all nodes:
ping 192.168.1.102
ping 192.168.1.103
telnet 192.168.1.102 9300
telnet 192.168.1.103 9300
- Confirm the firewall rules on every node:
sudo ufw status
sudo ufw allow 9200/tcp
sudo ufw allow 9300/tcp
- Verify the IP addresses and hostnames in the config files
- Check that every node has a unique node name
- Confirm the cluster name is identical on all nodes
Problem 2: Out of memory (multi-server)
- Check memory usage on each node:
free -h
top -p $(pgrep -d',' java)
- Increase the JVM heap (stay at or below 50% of the server's physical memory)
- Reduce the shard count (too many shards consume a lot of heap)
- Optimize queries (avoid heavy aggregations and deep pagination)
- Add more data nodes (to spread the memory pressure)
Problem 3: Running out of disk space (multi-server)
- Check disk space on all nodes:
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== Node $ip ==="
  ssh user@$ip "df -h"
done
- Delete old indices that are no longer needed
- Add disk capacity (grow storage or add new nodes)
- Configure index lifecycle management (ILM) to age out and delete old indices automatically
- Check and trim log files (Elasticsearch logs can grow large)
Problem 4: Network partitions (multi-server)
- Check physical connectivity (cabling, switch port status)
- Verify switch/router configuration (VLANs, link aggregation, STP)
- Measure network latency:
ping -c 10 192.168.1.102
ping -c 10 192.168.1.103
- Raise the discovery fault-detection timeouts so brief network hiccups don't eject nodes from the cluster (a sketch follows this list)
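A sketch of the relevant fault-detection settings, assuming the binary-install config path; these are static settings, so restart each node afterwards (the ES 7 defaults are a 10s timeout and 3 retries):
sudo tee -a /usr/local/elasticsearch/config/elasticsearch.yml << 'EOF'
# Tolerate brief network hiccups before a node is removed from the cluster
cluster.fault_detection.follower_check.timeout: 30s
cluster.fault_detection.follower_check.retry_count: 5
cluster.fault_detection.leader_check.timeout: 30s
cluster.fault_detection.leader_check.retry_count: 5
EOF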
2. Multi-server log analysis
# Collect recent errors from every node (the log file is named after the
# cluster, so here it is es-multi-server-cluster.log)
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== Errors on Node $ip ==="
  ssh user@$ip "sudo grep ERROR /usr/local/elasticsearch/logs/es-multi-server-cluster.log | tail -10"
done
# Cluster state changes (all nodes)
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== Cluster state changes on Node $ip ==="
  ssh user@$ip "sudo grep 'cluster state updated' /usr/local/elasticsearch/logs/es-multi-server-cluster.log | tail -5"
done
# Shard allocation messages (all nodes)
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== Shard allocation on Node $ip ==="
  ssh user@$ip "sudo grep 'shard allocation' /usr/local/elasticsearch/logs/es-multi-server-cluster.log | tail -5"
done
# Master election messages
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== Master election on Node $ip ==="
  ssh user@$ip "sudo grep 'master.*elected' /usr/local/elasticsearch/logs/es-multi-server-cluster.log | tail -5"
done
3. Multi-server network diagnostics
# Check port connectivity
for ip in 192.168.1.102 192.168.1.103; do
  echo "=== Testing connection to $ip ==="
  nc -zv $ip 9200
  nc -zv $ip 9300
done
# Check the interface configuration
ip addr show
netstat -tlnp | grep -E '9200|9300'
# Check hostname resolution (getent consults /etc/hosts; nslookup only queries DNS)
getent hosts es-node1
getent hosts es-node2
getent hosts es-node3
# Check the routing table
route -n
4. Multi-server performance diagnostics
# Resource usage per node
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== Resource usage on Node $ip ==="
  ssh user@$ip "top -bn1 | head -20"
done
# Disk I/O (iostat comes from the sysstat package)
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== Disk I/O on Node $ip ==="
  ssh user@$ip "iostat -x 1 3"
done
# Network bandwidth (requires "iperf3 -s" running on the target host)
for ip in 192.168.1.102 192.168.1.103; do
  echo "=== Network bandwidth test to $ip ==="
  iperf3 -c $ip -t 10
done
5. Raw multi-server log review
# Most recent log lines on every node
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== Recent logs on Node $ip ==="
  ssh user@$ip "sudo tail -50 /usr/local/elasticsearch/logs/es-multi-server-cluster.log"
done
# GC logs on every node
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== GC logs on Node $ip ==="
  ssh user@$ip "sudo tail -20 /usr/local/elasticsearch/logs/gc.log"
done
# Restart every node in sequence (use with care)
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  echo "=== Restarting Node $ip ==="
  ssh user@$ip "sudo systemctl restart elasticsearch"
  sleep 30 # pause so the cluster never loses more than one node at a time
           # (a stricter variant that waits for green follows below)
done
Summary
This guide covered two complete approaches to building an Elasticsearch 7 cluster across multiple Ubuntu servers:
- Docker deployment: quick to stand up and easy to manage; a good fit for testing and fast rollouts
- Binary-package deployment: a better fit for production, with finer-grained control and better performance
Key points:
- System tuning: kernel parameters, file descriptors, and memory locking are critical for performance
- Networking: make sure every node can reach every other node and that the firewall rules are correct
- Memory: set the JVM heap to roughly 50% of physical memory, and never above 32GB
- Monitoring: check cluster health regularly and deal with failures promptly
- Data safety: configure replica shards so data remains highly available
Performance-tuning highlights:
- G1 garbage-collector tuning
- Kernel parameter tuning
- Network buffer tuning
- Disk I/O tuning
- Elasticsearch cache configuration
Operations:
- A complete set of monitoring commands
- A detailed troubleshooting guide
- Log-analysis techniques for multi-server environments
- Performance-diagnosis tooling
By following this guide you can build a high-performance, highly available Elasticsearch 7 cluster that meets production search and analytics needs.
Next steps:
- Upgrade Elasticsearch regularly to pick up new features and security fixes
- Set up automated monitoring and alerting
- Establish a solid backup and restore process
- Keep tuning the cluster configuration against your real workload
- Consider X-Pack security features to further harden the cluster
We hope this guide helps you build your Elasticsearch cluster. If you run into problems, see the troubleshooting section or consult the official documentation.