Contents

Elasticsearch 7 Cluster Setup Guide (Ubuntu, Multi-Server)

Overview

Elasticsearch 7 is a distributed search and analytics engine suited to full-text search, structured search, analytics, and combinations of the three. This guide walks through building a production-grade Elasticsearch 7 cluster across multiple Ubuntu servers.

System Requirements

Hardware

  • CPU: at least 2 cores; 4+ recommended
  • Memory: 4 GB minimum, 8 GB+ recommended (16 GB+ for production)
  • Disk: SSD recommended, at least 20 GB free
  • Network: all cluster nodes must be able to reach each other; intra-network latency < 5 ms is recommended

Software

  • OS: Ubuntu 20.04 LTS or 22.04 LTS
  • Java: OpenJDK 11 or later (the Elasticsearch 7.17 tarball bundles its own JDK, so a system JDK is only needed if you override it)
  • Docker: 20.10+ (only for the Docker-based deployment)

Network Plan

  • Node 1: 192.168.1.101 (master + data)
  • Node 2: 192.168.1.102 (master + data)
  • Node 3: 192.168.1.103 (master + data)
  • Load balancer: 192.168.1.110 (optional)

Option 1: Docker Multi-Server Deployment (Recommended)

1. System Initialization (run on all nodes)

Ubuntu system tuning

# Update the system
sudo apt update && sudo apt upgrade -y

# Install basic tools
sudo apt install -y curl wget vim net-tools htop unzip

# Kernel parameters required by Elasticsearch
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
echo "vm.swappiness=1" | sudo tee -a /etc/sysctl.conf
# (fs.file-max already defaults to a much higher value on modern kernels,
# so it is not lowered here)
sudo sysctl -p

# File descriptor limits
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* soft memlock unlimited" | sudo tee -a /etc/security/limits.conf
echo "* hard memlock unlimited" | sudo tee -a /etc/security/limits.conf

# Hostname entries in /etc/hosts (adjust the IPs to your environment)
sudo tee -a /etc/hosts << EOF
192.168.1.101 es-node-01
192.168.1.102 es-node-02
192.168.1.103 es-node-03
EOF
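The limits.conf changes above only apply to new login sessions. A quick sketch of how to verify what the current shell actually received (the threshold mirrors the value set above):

```shell
# Read back the open-files limit of this session and compare it to the target
nofile=$(ulimit -n)
if [ "$nofile" = "unlimited" ] || [ "$nofile" -ge 65536 ]; then
  echo "nofile limit OK: $nofile"
else
  echo "nofile limit still low: $nofile (log out and back in, or check pam_limits)"
fi
```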

Install Docker (all nodes)

# Remove old versions
sudo apt remove docker docker-engine docker.io containerd runc

# Install prerequisites
sudo apt install -y apt-transport-https ca-certificates curl gnupg lsb-release

# Add Docker's GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# Add the stable repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker (docker-compose-plugin provides the "docker compose" subcommand)
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Start Docker and enable it at boot
sudo systemctl start docker
sudo systemctl enable docker

# Configure the Docker daemon
sudo tee /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "live-restore": true
}
EOF

sudo systemctl daemon-reload
sudo systemctl restart docker

# Verify the installation
docker --version

2. Create the Cross-Host Docker Network (on the manager node)

# Overlay networks require Docker Swarm mode. Initialize a swarm on node 1,
# then join nodes 2 and 3 with the "docker swarm join ..." command it prints:
docker swarm init --advertise-addr 192.168.1.101

# Create an attachable overlay network (on the swarm manager)
docker network create --driver overlay --attachable elasticsearch-net

# Verify the network exists
docker network ls | grep elasticsearch-net

3. Per-Node Configuration and Startup

Node 1 (192.168.1.101)

Create /opt/elasticsearch/docker-compose.yml:

version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.15
    container_name: elasticsearch
    hostname: es-node-01
    environment:
      - node.name=es-node-01
      - cluster.name=es-multi-server-cluster
      - node.master=true
      - node.data=true
      - node.ingest=true
      - discovery.seed_hosts=es-node-02:9300,es-node-03:9300
      - cluster.initial_master_nodes=es-node-01,es-node-02,es-node-03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=true
      # NOTE: transport SSL requires certificates (generate them with
      # elasticsearch-certutil and mount them into the container); without
      # them the node will fail to start. For an initial, unsecured
      # bring-up, set both security options to false.
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.http.ssl.enabled=false
      - network.host=0.0.0.0
      - network.publish_host=192.168.1.101
      - transport.publish_host=192.168.1.101
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
      - elasticsearch-logs:/usr/share/elasticsearch/logs
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      elasticsearch-net:
        aliases:
          - es-node-01  # DNS alias so the other nodes' seed_hosts entries resolve
    restart: unless-stopped

volumes:
  elasticsearch-data:
    driver: local
  elasticsearch-logs:
    driver: local

networks:
  elasticsearch-net:
    external: true  # pre-created overlay network; no "driver" key on an external network

Node 2 (192.168.1.102)

Create /opt/elasticsearch/docker-compose.yml:

version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.15
    container_name: elasticsearch
    hostname: es-node-02
    environment:
      - node.name=es-node-02
      - cluster.name=es-multi-server-cluster
      - node.master=true
      - node.data=true
      - node.ingest=true
      - discovery.seed_hosts=es-node-01:9300,es-node-03:9300
      - cluster.initial_master_nodes=es-node-01,es-node-02,es-node-03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=true
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.http.ssl.enabled=false
      - network.host=0.0.0.0
      - network.publish_host=192.168.1.102
      - transport.publish_host=192.168.1.102
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
      - elasticsearch-logs:/usr/share/elasticsearch/logs
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      elasticsearch-net:
        aliases:
          - es-node-02  # DNS alias so the other nodes' seed_hosts entries resolve
    restart: unless-stopped

volumes:
  elasticsearch-data:
    driver: local
  elasticsearch-logs:
    driver: local

networks:
  elasticsearch-net:
    external: true  # pre-created overlay network; no "driver" key on an external network

Node 3 (192.168.1.103)

Create /opt/elasticsearch/docker-compose.yml:

version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.15
    container_name: elasticsearch
    hostname: es-node-03
    environment:
      - node.name=es-node-03
      - cluster.name=es-multi-server-cluster
      - node.master=true
      - node.data=true
      - node.ingest=true
      - discovery.seed_hosts=es-node-01:9300,es-node-02:9300
      - cluster.initial_master_nodes=es-node-01,es-node-02,es-node-03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - xpack.security.enabled=true
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.http.ssl.enabled=false
      - network.host=0.0.0.0
      - network.publish_host=192.168.1.103
      - transport.publish_host=192.168.1.103
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
      - elasticsearch-logs:/usr/share/elasticsearch/logs
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      elasticsearch-net:
        aliases:
          - es-node-03  # DNS alias so the other nodes' seed_hosts entries resolve
    restart: unless-stopped

volumes:
  elasticsearch-data:
    driver: local
  elasticsearch-logs:
    driver: local

networks:
  elasticsearch-net:
    external: true  # pre-created overlay network; no "driver" key on an external network

4. Start the Service on Each Node

# On node 1
cd /opt/elasticsearch
docker compose up -d

# On node 2
cd /opt/elasticsearch
docker compose up -d

# On node 3
cd /opt/elasticsearch
docker compose up -d

# Follow the logs
docker compose logs -f

# Wait for the services to come up fully (typically 2-3 minutes)
sleep 180

# Verify the cluster state (with X-Pack security enabled, add credentials,
# e.g. curl -u elastic:<password> ...)
curl -X GET "http://localhost:9200/_cluster/health?pretty"
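A fixed sleep either wastes time or cuts out too early. A minimal sketch of a polling loop instead: it reads `_cluster/health` until the status reaches yellow or green. The health command is injected through `$CHECK`, so the loop itself can be tried without a running cluster (the canned response below is purely illustrative):

```shell
# Poll the injected health command until the status is yellow or green
wait_for_cluster() {
  tries=0
  while [ "$tries" -lt 60 ]; do
    status=$($CHECK 2>/dev/null | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
    case "$status" in
      green|yellow) echo "cluster is $status"; return 0 ;;
    esac
    tries=$((tries + 1))
    sleep 1
  done
  echo "timed out waiting for the cluster"; return 1
}

# Real use: CHECK='curl -s http://localhost:9200/_cluster/health'
CHECK='echo {"status":"green"}'   # canned response, for illustration only
wait_for_cluster
```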

Option 2: Ubuntu Tarball Multi-Server Deployment

1. System Initialization (run on all nodes)

# Update the system
sudo apt update && sudo apt upgrade -y

# Install basic tools (OpenJDK is optional here: the ES 7.17 tarball bundles its own JDK)
sudo apt install -y curl wget vim net-tools htop unzip openjdk-11-jdk

# Set the hostname (pick the line matching the node)
sudo hostnamectl set-hostname es-node1  # node 1
# sudo hostnamectl set-hostname es-node2  # node 2
# sudo hostnamectl set-hostname es-node3  # node 3

# hosts entries (identical on all nodes)
sudo tee -a /etc/hosts << EOF
192.168.1.101 es-node1
192.168.1.102 es-node2
192.168.1.103 es-node3
EOF

# Kernel tuning
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
echo "vm.swappiness=1" | sudo tee -a /etc/sysctl.conf
echo "net.core.rmem_max = 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.core.wmem_max = 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_rmem = 4096 87380 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_wmem = 4096 65536 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.core.netdev_max_backlog = 16384" | sudo tee -a /etc/sysctl.conf
echo "net.core.somaxconn = 16384" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog = 8192" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_fin_timeout = 30" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_time = 120" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_intvl = 30" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_probes = 3" | sudo tee -a /etc/sysctl.conf

sudo sysctl -p

# File descriptor and process limits
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* soft nproc 4096" | sudo tee -a /etc/security/limits.conf
echo "* hard nproc 4096" | sudo tee -a /etc/security/limits.conf
echo "elasticsearch soft memlock unlimited" | sudo tee -a /etc/security/limits.conf
echo "elasticsearch hard memlock unlimited" | sudo tee -a /etc/security/limits.conf

# Data and log directories
sudo mkdir -p /usr/local/elasticsearch/data
sudo mkdir -p /usr/local/elasticsearch/logs

2. Download and Install (all nodes)

# Download Elasticsearch 7.17.15
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.15-linux-x86_64.tar.gz

# Verify the download's integrity
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.15-linux-x86_64.tar.gz.sha512
sha512sum -c elasticsearch-7.17.15-linux-x86_64.tar.gz.sha512

# Extract and install. /usr/local/elasticsearch already exists (the data and
# log directories were created in step 1), so copy the tree into it - a plain
# "mv elasticsearch-7.17.15 /usr/local/elasticsearch" would nest the extracted
# directory inside it instead
tar -xzf elasticsearch-7.17.15-linux-x86_64.tar.gz
sudo cp -a elasticsearch-7.17.15/. /usr/local/elasticsearch/
sudo chown -R root:root /usr/local/elasticsearch
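To make the checksum step concrete, here are the same `sha512sum -c` mechanics run against a throwaway file (the `/tmp/es-demo.txt` path is purely illustrative) - this is what a successful verification looks like:

```shell
# Create a file, record its checksum, then verify it the same way as above
printf 'demo payload\n' > /tmp/es-demo.txt
sha512sum /tmp/es-demo.txt > /tmp/es-demo.txt.sha512
if sha512sum -c /tmp/es-demo.txt.sha512 >/dev/null 2>&1; then
  echo "checksum verified"
else
  echo "checksum FAILED - do not install" >&2
fi
```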

3. Users and Permissions (all nodes)

# Create a dedicated elasticsearch user (the service must not run as root)
sudo useradd -M -r -s /bin/false elasticsearch

# Ownership: data, logs, and config must be writable by the service user
# (Elasticsearch creates elasticsearch.keystore under config/ on first start)
sudo chown -R elasticsearch:elasticsearch /usr/local/elasticsearch/data
sudo chown -R elasticsearch:elasticsearch /usr/local/elasticsearch/logs
sudo chown -R elasticsearch:elasticsearch /usr/local/elasticsearch/config
sudo chmod -R 755 /usr/local/elasticsearch

4. Node 1 Configuration (192.168.1.101)

Create /usr/local/elasticsearch/config/elasticsearch.yml:

# Cluster
cluster.name: es-multi-server-cluster
node.name: es-node1
# node.master/node.data/node.ingest are deprecated in 7.x (node.roles replaces
# them) but still accepted
node.master: true
node.data: true
node.ingest: true

# Network
network.host: 192.168.1.101
http.port: 9200
transport.port: 9300

# Discovery
discovery.seed_hosts: ["192.168.1.101:9300", "192.168.1.102:9300", "192.168.1.103:9300"]
cluster.initial_master_nodes: ["es-node1", "es-node2", "es-node3"]

# Paths
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs

# Memory
bootstrap.memory_lock: true

# Heap size is set in jvm.options:
# -Xms2g
# -Xmx2g

5. Node 2 Configuration (192.168.1.102)

Create /usr/local/elasticsearch/config/elasticsearch.yml:

# Cluster
cluster.name: es-multi-server-cluster
node.name: es-node2
node.master: true
node.data: true
node.ingest: true

# Network
network.host: 192.168.1.102
http.port: 9200
transport.port: 9300

# Discovery
discovery.seed_hosts: ["192.168.1.101:9300", "192.168.1.102:9300", "192.168.1.103:9300"]
cluster.initial_master_nodes: ["es-node1", "es-node2", "es-node3"]

# Paths
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs

# Memory
bootstrap.memory_lock: true

# Heap size is set in jvm.options:
# -Xms2g
# -Xmx2g

6. Node 3 Configuration (192.168.1.103)

Create /usr/local/elasticsearch/config/elasticsearch.yml:

# Cluster
cluster.name: es-multi-server-cluster
node.name: es-node3
node.master: true
node.data: true
node.ingest: true

# Network
network.host: 192.168.1.103
http.port: 9200
transport.port: 9300

# Discovery
discovery.seed_hosts: ["192.168.1.101:9300", "192.168.1.102:9300", "192.168.1.103:9300"]
cluster.initial_master_nodes: ["es-node1", "es-node2", "es-node3"]

# Paths
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs

# Memory
bootstrap.memory_lock: true

# Heap size is set in jvm.options:
# -Xms2g
# -Xmx2g

7. JVM Configuration (all nodes)

Edit /usr/local/elasticsearch/config/jvm.options:

# Heap (keep Xms and Xmx equal to avoid resize pauses)
-Xms2g
-Xmx2g

# GC settings
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=16m
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

# Heap dumps on out-of-memory errors
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/usr/local/elasticsearch/logs

# GC logging
-Xlog:gc*,gc+age=trace,safepoint:file=/usr/local/elasticsearch/logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m

# Other tuning
-XX:+UseStringDeduplication
-XX:+UseCompressedOops
-XX:MaxDirectMemorySize=1g

8. Create a Systemd Service (all nodes)

Create /etc/systemd/system/elasticsearch.service:

[Unit]
Description=Elasticsearch
Documentation=https://www.elastic.co
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
RuntimeDirectory=elasticsearch
PrivateTmp=true
Environment=ES_HOME=/usr/local/elasticsearch
Environment=ES_PATH_CONF=/usr/local/elasticsearch/config
Environment=PID_DIR=/run/elasticsearch
Environment=ES_SD_NOTIFY=true
EnvironmentFile=-/etc/default/elasticsearch

WorkingDirectory=/usr/local/elasticsearch

User=elasticsearch
Group=elasticsearch

ExecStart=/usr/local/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet

# Elasticsearch writes its own log files under /usr/local/elasticsearch/logs;
# stdout is also sent to the journal, which helps when debugging startup errors
StandardOutput=journal
StandardError=inherit

# Resource limits (limits.conf does not apply to systemd services, so the
# memlock limit required by bootstrap.memory_lock must be set here)
LimitNOFILE=65536
LimitNPROC=4096
LimitMEMLOCK=infinity
LimitAS=infinity
LimitFSIZE=infinity

# Shutdown and startup timeouts
TimeoutStopSec=0
KillSignal=SIGTERM
KillMode=process
SendSIGKILL=no
SuccessExitStatus=143
TimeoutStartSec=75

[Install]
WantedBy=multi-user.target

Then run:

# Reload systemd so it picks up the new unit
sudo systemctl daemon-reload

9. Start the Service (node by node)

# Start Elasticsearch
sudo systemctl start elasticsearch

# Enable it at boot
sudo systemctl enable elasticsearch

# Check the service status
sudo systemctl status elasticsearch

# Follow the logs
sudo journalctl -u elasticsearch -f

# Wait for startup to complete (typically 2-3 minutes)
sleep 180

# Verify the node responds
curl -X GET "http://localhost:9200/"

10. Verify the Cluster

# Cluster health (from any node)
curl -X GET "http://localhost:9200/_cluster/health?pretty"

# Node information
curl -X GET "http://localhost:9200/_nodes?pretty"

# Index-level statistics (_stats reports on indices, not cluster state)
curl -X GET "http://localhost:9200/_stats?pretty"

Cluster Verification and Testing

1. Health Checks (multi-server)

# Cluster health (verify from every node)
curl -X GET "http://192.168.1.101:9200/_cluster/health?pretty"
curl -X GET "http://192.168.1.102:9200/_cluster/health?pretty"
curl -X GET "http://192.168.1.103:9200/_cluster/health?pretty"

# Node statistics
curl -X GET "http://192.168.1.101:9200/_nodes/stats?pretty"

# Index statistics
curl -X GET "http://192.168.1.101:9200/_stats?pretty"

# Index listing
curl -X GET "http://192.168.1.101:9200/_cat/indices?v"

2. Cross-Node Functional Tests

# Create an index on node 1
curl -X PUT "http://192.168.1.101:9200/test_index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'

# Insert a document on node 2
curl -X POST "http://192.168.1.102:9200/test_index/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "message": "Hello from node2",
  "timestamp": "2024-01-01T00:00:00"
}'

# Insert another document on node 3
curl -X POST "http://192.168.1.103:9200/test_index/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "message": "Hello from node3",
  "timestamp": "2024-01-01T01:00:00"
}'

# Search from any node (confirms the data is replicated across the cluster)
curl -X GET "http://192.168.1.101:9200/test_index/_search?pretty"

# Inspect how the shards are distributed across nodes
curl -X GET "http://192.168.1.101:9200/_cat/shards?v"

# Clean up the test index
curl -X DELETE "http://192.168.1.101:9200/test_index?pretty"
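The shard count that `_cat/shards` reports for test_index follows directly from the index settings above - a quick sanity check of the arithmetic:

```shell
# With 3 primaries and 1 replica each, the index has 3 * (1 + 1) = 6 shard
# copies, which a 3-node cluster balances out to 2 per node
primaries=3
replicas=1
nodes=3
total=$(( primaries * (1 + replicas) ))
echo "total shard copies: $total (about $(( total / nodes )) per node)"
```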

3. High-Availability Test

# Test failover when a node goes down
# 1. Stop the service on node 2 (simulating a node failure)
sudo systemctl stop elasticsearch  # run on node 2

# 2. Check the cluster state from node 1 or 3
curl -X GET "http://192.168.1.101:9200/_cluster/health?pretty"
# expect "status": "yellow" (some replica shards are now unassigned)

# 3. List the nodes (node 2 should be missing)
curl -X GET "http://192.168.1.101:9200/_cat/nodes?v"

# 4. Restart node 2
sudo systemctl start elasticsearch  # run on node 2

# 5. Confirm the cluster recovers
curl -X GET "http://192.168.1.101:9200/_cluster/health?pretty"
# expect "status": "green" once the replicas are reassigned

4. Cluster Status Colors

  • green: all primary and replica shards are assigned
  • yellow: all primary shards are assigned, but at least one replica is not
  • red: at least one primary shard is unassigned
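The three colors can be read as a simple function of which shards are unassigned - a small sketch that encodes the rules above:

```shell
# Classify the cluster color from unassigned shard counts
cluster_color() {  # args: <unassigned primaries> <unassigned replicas>
  if [ "$1" -gt 0 ]; then
    echo red      # missing primaries: some data is unavailable
  elif [ "$2" -gt 0 ]; then
    echo yellow   # primaries fine, but redundancy is reduced
  else
    echo green    # everything assigned
  fi
}

cluster_color 0 0   # all shards assigned
cluster_color 0 2   # replicas missing (e.g. one node down)
cluster_color 1 0   # a primary is missing
```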

Performance Tuning (Ubuntu, Multi-Server)

1. JVM Tuning (all nodes)

# Edit the JVM options
sudo vim /usr/local/elasticsearch/config/jvm.options

# Recommended heap sizes (about 50% of physical RAM, and always below 32 GB
# so compressed oops stay enabled):
# 8 GB server:
-Xms4g
-Xmx4g

# 16 GB server:
-Xms8g
-Xmx8g

# 32 GB server:
-Xms16g
-Xmx16g

# GC settings
-XX:+UseG1GC  # G1 collector, well suited to large heaps
-XX:MaxGCPauseMillis=200  # pause-time target balancing throughput and latency
-XX:+UseStringDeduplication  # deduplicate strings to reduce heap usage
# Note: UseCGroupMemoryLimitForHeap was an experimental JDK 8 flag removed in
# JDK 10; on the JDK 11+ that ES 7.17 bundles, container memory limits are
# honoured by default, so no extra flag is needed
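The sizing rule above (50% of RAM, capped below 32 GB) can be written as a small helper; the cap of 31g is a conservative assumption to stay safely under the compressed-oops threshold:

```shell
# Derive -Xms/-Xmx from total physical RAM in GB: half the RAM, capped at 31g
heap_for() {  # arg: physical RAM in GB
  half=$(( $1 / 2 ))
  if [ "$half" -gt 31 ]; then half=31; fi
  echo "-Xms${half}g -Xmx${half}g"
}

heap_for 8    # -> -Xms4g -Xmx4g
heap_for 16   # -> -Xms8g -Xmx8g
heap_for 64   # capped: -Xms31g -Xmx31g
```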

2. Ubuntu System Tuning (all nodes)

# Kernel parameters
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
echo "vm.swappiness=1" | sudo tee -a /etc/sysctl.conf
# (fs.file-max and fs.nr_open already default to much higher values on modern
# kernels, so they are not lowered here)

# Network tuning
echo "net.core.rmem_max = 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.core.wmem_max = 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_rmem = 4096 87380 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_wmem = 4096 65536 16777216" | sudo tee -a /etc/sysctl.conf
echo "net.core.netdev_max_backlog = 16384" | sudo tee -a /etc/sysctl.conf
echo "net.core.somaxconn = 16384" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog = 8192" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_fin_timeout = 30" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_time = 120" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_intvl = 30" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_probes = 3" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_tw_reuse = 1" | sudo tee -a /etc/sysctl.conf
# (net.ipv4.tcp_tw_recycle was removed in Linux 4.12; do not set it)
echo "net.ipv4.ip_local_port_range = 1024 65535" | sudo tee -a /etc/sysctl.conf

# Virtual memory / writeback tuning
echo "vm.vfs_cache_pressure=50" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_background_bytes=67108864" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_bytes=536870912" | sudo tee -a /etc/sysctl.conf
# (the *_bytes settings take precedence over dirty_ratio/dirty_background_ratio,
# so the ratio variants are intentionally omitted)
echo "vm.dirty_expire_centisecs=12000" | sudo tee -a /etc/sysctl.conf

sudo sysctl -p

# Disable swap (important for Elasticsearch)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab

# CPU frequency governor (optional; requires the cpufrequtils package)
echo 'GOVERNOR="performance"' | sudo tee /etc/default/cpufrequtils
sudo systemctl restart cpufrequtils

3. Elasticsearch Configuration Tuning (all nodes)

Edit /usr/local/elasticsearch/config/elasticsearch.yml and append the following
(each setting may appear only once - Elasticsearch rejects a YAML file with
duplicate keys):

# Memory and caches
indices.memory.index_buffer_size: 30%
indices.queries.cache.size: 10%
indices.fielddata.cache.size: 20%
indices.requests.cache.size: 2%

# Search thread pool
# (the auto queue-resize settings such as min_queue_size/max_queue_size are
# deprecated in 7.x and omitted here)
thread_pool.search.size: 10
thread_pool.search.queue_size: 2000

# Write thread pool
thread_pool.write.size: 8
thread_pool.write.queue_size: 1000

# Network (byte-size settings need explicit units)
network.tcp.keep_alive: true
network.tcp.reuse_address: true
network.tcp.send_buffer_size: 64kb
network.tcp.receive_buffer_size: 64kb

# Shard allocation and recovery
cluster.routing.allocation.cluster_concurrent_rebalance: 4
cluster.routing.allocation.node_concurrent_recoveries: 2
cluster.routing.allocation.node_initial_primaries_recoveries: 4

# Circuit breakers (guard against out-of-memory)
indices.breaker.total.use_real_memory: false
indices.breaker.total.limit: 70%
indices.breaker.request.limit: 60%
indices.breaker.fielddata.limit: 60%

4. Disk I/O Tuning (Ubuntu)

# I/O scheduler for SSDs: on modern multi-queue kernels the choices are
# "none" and "mq-deadline" (list them with: cat /sys/block/sda/queue/scheduler)
echo "none" | sudo tee /sys/block/sda/queue/scheduler

# Mount options
# Edit /etc/fstab and add noatime/nodiratime to the data mount, e.g.:
# UUID=xxx /data ext4 defaults,noatime,nodiratime 0 2
# (the old "nobarrier" option has been removed from modern kernels and risked
# data loss - omit it)

# Filesystem writeback tuning (same values as in section 2; skip if already set)
echo "vm.vfs_cache_pressure=50" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_background_bytes=67108864" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_bytes=536870912" | sudo tee -a /etc/sysctl.conf

sudo sysctl -p

5. Network Tuning (multi-server)

# NIC ring buffers and offloads (adjust the interface name to your system)
sudo ethtool -G eth0 rx 4096 tx 4096
sudo ethtool -K eth0 gso on
sudo ethtool -K eth0 tso on
sudo ethtool -K eth0 gro on

# Default socket buffer sizes
echo "net.core.rmem_default=262144" | sudo tee -a /etc/sysctl.conf
echo "net.core.wmem_default=262144" | sudo tee -a /etc/sysctl.conf
echo "net.core.optmem_max=65536" | sudo tee -a /etc/sysctl.conf

sudo sysctl -p

Monitoring and Maintenance

1. Install Kibana for Monitoring

# Download Kibana (the version must match Elasticsearch)
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.17.15-linux-x86_64.tar.gz

# Extract
tar -xzf kibana-7.17.15-linux-x86_64.tar.gz
sudo mv kibana-7.17.15-linux-x86_64 /usr/local/kibana

# Configure kibana.yml (the username/password lines are only needed when
# X-Pack security is enabled; "password" is a placeholder)
sudo tee /usr/local/kibana/config/kibana.yml << EOF
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.username: "elastic"
elasticsearch.password: "password"
EOF

# Start Kibana (do not run it as root; a systemd unit is preferable in production)
cd /usr/local/kibana
nohup ./bin/kibana &

2. Multi-Server Monitoring Commands

# Cluster health (check from any node)
curl -X GET "http://192.168.1.101:9200/_cluster/health?pretty"
curl -X GET "http://192.168.1.102:9200/_cluster/health?pretty"
curl -X GET "http://192.168.1.103:9200/_cluster/health?pretty"

# Node statistics
curl -X GET "http://192.168.1.101:9200/_nodes/stats?pretty"

# Index statistics
curl -X GET "http://192.168.1.101:9200/_stats?pretty"

# Cluster settings
curl -X GET "http://192.168.1.101:9200/_cluster/settings?pretty"

# Shard allocation
curl -X GET "http://192.168.1.101:9200/_cat/shards?v"

# Per-node resource overview
curl -X GET "http://192.168.1.101:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,load_5m,load_15m,disk.used_percent"
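These checks are easy to automate. A minimal alert sketch suitable for cron: pipe a `_cluster/health` response into it and it prints an alert line whenever the status is not green. Below it is fed a canned sample instead of a live curl, purely for illustration:

```shell
# Read a _cluster/health JSON body from stdin and flag any non-green status
check_health() {
  status=$(sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
  if [ "$status" = "green" ]; then
    echo "cluster healthy"
  else
    echo "ALERT: cluster status is ${status:-unknown}"
  fi
}

# Real use: curl -s http://192.168.1.101:9200/_cluster/health | check_health
echo '{"cluster_name":"es-multi-server-cluster","status":"yellow"}' | check_health
```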

Troubleshooting (Ubuntu, Multi-Server)

1. Common Multi-Server Problems

Problem 1: the cluster does not form

  • Check connectivity between all nodes:
    ping 192.168.1.102
    ping 192.168.1.103
    
    telnet 192.168.1.102 9300
    telnet 192.168.1.103 9300
  • Confirm the firewall rules on every node:
    sudo ufw status
    sudo ufw allow 9200/tcp
    sudo ufw allow 9300/tcp
  • Verify the IP addresses and hostnames in the configuration files
  • Make sure every node has a unique node name
  • Make sure every node uses the same cluster name

Problem 2: out of memory (multi-server)

  • Check memory usage on each node:
    free -h
    top -p $(pgrep -d',' java)
  • Increase the JVM heap (but keep it at or below 50% of physical memory)
  • Reduce the number of shards (too many shards consume significant memory)
  • Optimize queries (avoid heavy aggregations and deep pagination)
  • Add more data nodes to spread the memory load

Problem 3: running out of disk space (multi-server)

  • Check free disk space on every node:
    for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
        echo "=== Node $ip ==="
        ssh user@$ip "df -h"
    done
  • Delete old indices that are no longer needed
  • Add disk capacity (extend storage or add new nodes)
  • Configure index lifecycle management (ILM) to age out old indices automatically
  • Check and rotate log files (Elasticsearch logs can grow large)

Problem 4: network partitions (multi-server)

  • Check the physical network (cabling, switch port status)
  • Verify switch/router configuration (VLANs, link aggregation, STP)
  • Measure latency between nodes:
    ping -c 10 192.168.1.102
    ping -c 10 192.168.1.103
  • Raise the discovery timeouts so transient jitter does not eject nodes from the cluster

2. Multi-Server Log Analysis

# With the tarball install, logs live under /usr/local/elasticsearch/logs and
# the main log file is named after the cluster: es-multi-server-cluster.log

# Collect recent errors from every node
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "=== Errors on Node $ip ==="
    ssh user@$ip "sudo grep ERROR /usr/local/elasticsearch/logs/es-multi-server-cluster.log | tail -10"
done

# Cluster state changes (all nodes)
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "=== Cluster state changes on Node $ip ==="
    ssh user@$ip "sudo grep 'cluster state updated' /usr/local/elasticsearch/logs/es-multi-server-cluster.log | tail -5"
done

# Shard allocation messages (all nodes)
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "=== Shard allocation on Node $ip ==="
    ssh user@$ip "sudo grep 'shard allocation' /usr/local/elasticsearch/logs/es-multi-server-cluster.log | tail -5"
done

# Master election messages
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "=== Master election on Node $ip ==="
    ssh user@$ip "sudo grep 'master.*elected' /usr/local/elasticsearch/logs/es-multi-server-cluster.log | tail -5"
done

3. Multi-Server Network Diagnostics

# Check port reachability
for ip in 192.168.1.102 192.168.1.103; do
    echo "=== Testing connection to $ip ==="
    nc -zv $ip 9200
    nc -zv $ip 9300
done

# Check the local network configuration
ip addr show
netstat -tlnp | grep -E '9200|9300'

# Check name resolution (getent also consults /etc/hosts, unlike nslookup,
# which only queries DNS)
getent hosts es-node1
getent hosts es-node2
getent hosts es-node3

# Check the routing table
route -n

4. Multi-Server Performance Diagnostics

# Resource usage on each node
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "=== Resource usage on Node $ip ==="
    ssh user@$ip "top -bn1 | head -20"
done

# Disk I/O
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "=== Disk I/O on Node $ip ==="
    ssh user@$ip "iostat -x 1 3"
done

# Network bandwidth (requires "iperf3 -s" running on the target node)
for ip in 192.168.1.102 192.168.1.103; do
    echo "=== Network bandwidth test to $ip ==="
    iperf3 -c $ip -t 10
done

5. Raw Log Inspection (multi-server)

# Tail recent logs on every node
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "=== Recent logs on Node $ip ==="
    ssh user@$ip "sudo tail -50 /usr/local/elasticsearch/logs/es-multi-server-cluster.log"
done

# Tail the GC logs on every node
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "=== GC logs on Node $ip ==="
    ssh user@$ip "sudo tail -20 /usr/local/elasticsearch/logs/gc.log"
done

# Restart all nodes one by one (use with care)
for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    echo "=== Restarting Node $ip ==="
    ssh user@$ip "sudo systemctl restart elasticsearch"
    sleep 30  # give the node time to rejoin; never restart all nodes at once
done
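A gentler rolling restart disables replica allocation around each node's restart, a common pattern from the Elasticsearch documentation. This sketch runs in dry-run mode and only prints each step (unset DRY_RUN to execute; the host list and `user@` names are illustrative):

```shell
# Dry-run rolling restart: print each step instead of executing it
DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
  # keep primaries allocatable, but stop replica shuffling during the restart
  run curl -s -X PUT "http://$ip:9200/_cluster/settings" \
      -H 'Content-Type: application/json' \
      -d '{"persistent":{"cluster.routing.allocation.enable":"primaries"}}'
  run ssh user@$ip "sudo systemctl restart elasticsearch"
  # here you would poll _cluster/health until the node rejoins, then:
  run curl -s -X PUT "http://$ip:9200/_cluster/settings" \
      -H 'Content-Type: application/json' \
      -d '{"persistent":{"cluster.routing.allocation.enable":null}}'
done
```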

Summary

This guide covered two complete approaches to building an Elasticsearch 7 cluster across multiple Ubuntu servers:

  1. Docker deployment: quick to stand up and easy to manage - well suited to testing
  2. Tarball deployment: better suited to production, with finer control and better performance

Key points:

  • System tuning: kernel parameters, file descriptor limits, and memory locking matter greatly for performance
  • Networking: all nodes must reach each other, with correct firewall rules
  • Memory: set the JVM heap to about 50% of physical RAM, and never above 32 GB
  • Monitoring: watch cluster health regularly and act on failures promptly
  • Data safety: configure replica shards so data stays highly available

Performance-tuning highlights:

  • G1 garbage collector tuning
  • Kernel parameter tuning
  • Network buffer tuning
  • Disk I/O tuning
  • Elasticsearch internal cache configuration

Operations:

  • A complete set of monitoring commands
  • A detailed troubleshooting guide
  • Log-analysis methods for multi-server environments
  • Performance-diagnostic tooling

Following this guide, you can build a high-performance, highly available Elasticsearch 7 cluster that meets production search and analytics needs.

Next steps:

  1. Keep Elasticsearch updated for new features and security fixes
  2. Set up automated monitoring and alerting
  3. Establish a solid backup and restore process
  4. Keep tuning the cluster against real workloads
  5. Consider X-Pack security features to further harden the cluster

We hope this guide helps you build your Elasticsearch cluster. If problems arise, consult the troubleshooting section or the official documentation.