RabbitMQ Handbook

ADVANCED

Clustering & Node Discovery

What happens if a single RabbitMQ server crashes? Clustering lets several servers run as one logical system, so if one node goes down the others keep serving.

Level: Advanced. This section assumes experience with production clusters. Complete the Basic and Intermediate topics first.

📖 Technical detail: All nodes in a cluster share user, exchange, and binding information. Message contents, however, are not replicated by default; for that you need Quorum Queues (covered on the Quorum Queues and High Availability page).
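As a minimal sketch of what opting into message replication looks like from a client, the snippet below builds the declaration parameters for a quorum queue. The helper name `quorum_queue_declaration` is hypothetical, and the commented usage line assumes the pika client; the `x-queue-type` argument and the durability requirement are standard RabbitMQ behaviour.

```python
def quorum_queue_declaration(name: str) -> dict:
    """Build keyword arguments for declaring a quorum queue (hypothetical helper)."""
    return {
        "queue": name,
        "durable": True,  # quorum queues must be declared durable
        "arguments": {"x-queue-type": "quorum"},  # selects the replicated queue type
    }


params = quorum_queue_declaration("orders-q")
print(params)

# Assumed usage with the pika client (requires a reachable broker):
# channel.queue_declare(**params)
```

The queue type is fixed at declaration time, so an existing classic queue cannot be switched to quorum by redeclaring it with different arguments.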

Diagram: RabbitMQ Cluster (Erlang Distribution)
rabbit@node1: Disc Node, Metadata (replicated), Queue Leader: orders-q, Queue Follower: payments-q
rabbit@node2: Disc Node, Metadata (replicated), Queue Leader: payments-q, Queue Follower: orders-q
rabbit@node3: Disc Node, Metadata (replicated), Queue Follower: orders-q, Queue Follower: payments-q
Ports on every node: 5672 (AMQP), 15672 (Mgmt), 25672 (Erlang dist), 4369 (epmd)

Cluster Formation Methods

Method | Use Case | Configuration
Config file | Static environments, VMs | cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
DNS | Environments with DNS A/AAAA records for a seed hostname | cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns
Kubernetes | K8s StatefulSet | cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
Consul | Environments that use HashiCorp Consul | cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul
etcd | Environments that use etcd | cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd
Manual CLI | Test, dev, one-off setup | rabbitmqctl join_cluster rabbit@node1
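For the config file method, every node gets the same static list of peers. The sketch below renders those rabbitmq.conf lines from a list of node names; the function name `render_classic_config` and the node names are illustrative assumptions, while the configuration keys are the ones used in this handbook.

```python
def render_classic_config(nodes: list[str]) -> str:
    """Render rabbitmq.conf lines for classic config peer discovery."""
    lines = [
        "cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config"
    ]
    # Node entries are numbered from 1 in rabbitmq.conf.
    for index, node in enumerate(nodes, start=1):
        lines.append(f"cluster_formation.classic_config.nodes.{index} = {node}")
    return "\n".join(lines)


rendered = render_classic_config(["rabbit@node1", "rabbit@node2", "rabbit@node3"])
print(rendered)
```

Generating the list from one source of truth helps keep the peer list identical on all nodes.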

Required Ports

Port | Protocol | Purpose
4369 | TCP | epmd (Erlang port mapper daemon, used for node discovery)
5672 | TCP | AMQP 0-9-1 / AMQP 1.0 client connections
15672 | HTTP | Management UI & HTTP API
25672 | TCP | Erlang distribution (inter-node communication)
35672-35682 | TCP | CLI tools (Erlang distribution client)
6000-6500 | TCP | Stream replication
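Before forming a cluster it can help to confirm that these ports are reachable between hosts. The following is a rough sketch using only the Python standard library; the helper names and the `localhost` target are assumptions for illustration, and a successful TCP connect only shows that something is listening, not that it is RabbitMQ.

```python
import socket

# Single ports from the table above (port ranges are left out for brevity).
CLUSTER_PORTS = {
    4369: "epmd",
    5672: "AMQP",
    15672: "Management UI / HTTP API",
    25672: "Erlang distribution",
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or the name did not resolve
        return False


def check_node(host: str) -> dict[int, bool]:
    """Probe every port in CLUSTER_PORTS on one host."""
    return {port: port_is_open(host, port) for port in CLUSTER_PORTS}


if __name__ == "__main__":
    for port, reachable in check_node("localhost").items():
        status = "open" if reachable else "closed"
        print(f"{port} ({CLUSTER_PORTS[port]}): {status}")
```

Run it from each node against every other node, since firewall rules are often asymmetric.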

Network Partition Handling

Strategy | Behavior | When to Use | Risk
pause_minority | Nodes on the minority side pause themselves | Recommended (most production setups) | Minority side is temporarily unavailable
autoheal | When the partition heals, the losing side restarts | When data loss can be tolerated | Non-replicated messages on the losing side are discarded
ignore | Both sides keep running (split-brain) | Do not use in production | Split-brain, data inconsistency

Do NOT use a 2-node cluster: when a two-node cluster partitions, neither side can form a majority, because each side holds 1 of 2 nodes and a majority means more than half. pause_minority therefore pauses both sides → complete unavailability. A minimum of 3 nodes (an odd count) is required.
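The majority rule behind pause_minority is simple arithmetic: a side keeps running only if it holds strictly more than half of the cluster's nodes. The sketch below illustrates that rule; the function names are hypothetical and this models the rule only, not RabbitMQ's actual implementation.

```python
def has_majority(side_size: int, cluster_size: int) -> bool:
    """A side has a majority only with strictly more than half of all nodes."""
    return 2 * side_size > cluster_size


def surviving_sides(cluster_size: int, partition_sizes: list[int]) -> list[int]:
    """Return the sizes of the partition sides that would keep running."""
    return [size for size in partition_sizes if has_majority(size, cluster_size)]


print(surviving_sides(2, [1, 1]))  # → []   (both sides pause: full outage)
print(surviving_sides(3, [2, 1]))  # → [2]  (the two-node side keeps running)
print(surviving_sides(4, [2, 2]))  # → []   (an even split has no majority)
```

The last case shows why an odd node count is preferred: a fourth node adds no tolerance for an even split.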

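Real-world scenario: a 3-node RabbitMQ cluster on Kubernetes is built from a StatefulSet, a headless Service, and the K8s peer discovery plugin. Every pod reads the same Erlang cookie from a Secret. Setting the pod management policy to Parallel prevents a restart deadlock, where a pod waiting for its peers to come back blocks the ordered start of those same peers.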
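With a StatefulSet and a headless Service, pod hostnames are predictable, which is what lets nodes find each other by name. The sketch below derives the expected node names; the StatefulSet name `rabbitmq` is an assumption, the suffix matches the `hostname_suffix` value used in this handbook's configuration, and the helper name is hypothetical.

```python
def statefulset_node_names(statefulset: str, replicas: int, hostname_suffix: str) -> list[str]:
    """Derive RabbitMQ node names for the pods of a StatefulSet.

    Pods are named <statefulset>-<ordinal>, with ordinals starting at 0.
    """
    return [
        f"rabbit@{statefulset}-{ordinal}{hostname_suffix}"
        for ordinal in range(replicas)
    ]


names = statefulset_node_names(
    "rabbitmq",  # assumed StatefulSet name
    3,
    ".rabbitmq-headless.messaging.svc.cluster.local",
)
for name in names:
    print(name)
```

Fully qualified names like these assume the nodes are started with long node names enabled.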

# Cluster formation
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
cluster_formation.k8s.service_name = rabbitmq-headless
cluster_formation.k8s.hostname_suffix = .rabbitmq-headless.messaging.svc.cluster.local
cluster_formation.node_cleanup.interval = 30
cluster_formation.node_cleanup.only_log_warning = true

# Partition handling
cluster_partition_handling = pause_minority

# Resource limits
vm_memory_high_watermark.relative = 0.7
disk_free_limit.absolute = 2GB

# Queue defaults
queue_leader_locator = balanced

# Quorum queue settings
quorum_queue.initial_cluster_size = 3

# Consumer timeout (milliseconds; 1800000 = 30 minutes)
consumer_timeout = 1800000

# docker-compose.yml
services:
  rabbit1:
    image: rabbitmq:4.3-management
    hostname: rabbit1
    environment:
      RABBITMQ_ERLANG_COOKIE: "secret-cookie-value"
      RABBITMQ_NODENAME: "rabbit@rabbit1"
    ports:
      - "5672:5672"
      - "15672:15672"
    volumes:
      - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
    healthcheck:
      test: rabbitmq-diagnostics check_port_connectivity
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
    restart: unless-stopped
    networks:
      - rabbitmq-net

  rabbit2:
    image: rabbitmq:4.3-management
    hostname: rabbit2
    environment:
      RABBITMQ_ERLANG_COOKIE: "secret-cookie-value"
      RABBITMQ_NODENAME: "rabbit@rabbit2"
    ports:
      - "5673:5672"
      - "15673:15672"
    volumes:
      - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
    healthcheck:
      test: rabbitmq-diagnostics check_port_connectivity
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
    restart: unless-stopped
    networks:
      - rabbitmq-net

  rabbit3:
    image: rabbitmq:4.3-management
    hostname: rabbit3
    environment:
      RABBITMQ_ERLANG_COOKIE: "secret-cookie-value"
      RABBITMQ_NODENAME: "rabbit@rabbit3"
    ports:
      - "5674:5672"
      - "15674:15672"
    volumes:
      - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
    healthcheck:
      test: rabbitmq-diagnostics check_port_connectivity
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
    restart: unless-stopped
    networks:
      - rabbitmq-net

networks:
  rabbitmq-net:
    driver: bridge

Erlang Cookie: Every cluster node must have the same Erlang cookie value. If the cookies differ, the nodes cannot connect to each other. In production, fetch this value from a secret manager (Vault, K8s Secret); do not hardcode it in an environment variable.
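One way to avoid a hardcoded cookie is to generate a random value once, store it in the secret manager, and write it to an owner-only file on each node. The sketch below covers the generate and write steps; the function names, the 32-character length, and the file path are assumptions, and the secret manager integration is deliberately left out.

```python
import os
import secrets
import string
import tempfile


def generate_cookie(length: int = 32) -> str:
    """Generate a random alphanumeric cookie value (length is an assumption)."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def write_cookie(path: str, cookie: str) -> None:
    """Write the cookie so that only the file owner can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(cookie)
    os.chmod(path, 0o600)  # enforce the mode even if the file already existed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        cookie_path = os.path.join(workdir, ".erlang.cookie")  # illustrative path
        write_cookie(cookie_path, generate_cookie())
        print(oct(os.stat(cookie_path).st_mode & 0o777))
```

The same value must reach every node, so generate it once and distribute it rather than generating it per node.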

# ❌ 2-node cluster: no majority can be formed during a partition
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = rabbit@node1
cluster_formation.classic_config.nodes.2 = rabbit@node2

# pause_minority → both sides are a minority → FULL OUTAGE
cluster_partition_handling = pause_minority
# autoheal → one side is reset → DATA LOSS
# ignore → split-brain → DATA INCONSISTENCY

# ✅ 3-node: even with 1 node down, a majority (2/3) remains
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = rabbit@node1
cluster_formation.classic_config.nodes.2 = rabbit@node2
cluster_formation.classic_config.nodes.3 = rabbit@node3

# pause_minority works safely: the minority (1) pauses, the majority (2) keeps running
cluster_partition_handling = pause_minority
queue_leader_locator = balanced
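The difference between the two configurations above can be checked mechanically. The sketch below counts the classic config node entries in a rabbitmq.conf text and flags counts that are below three or even; the function names and warning texts are hypothetical, and the parsing is a simplification that handles only this key.

```python
import re

# Matches lines such as: cluster_formation.classic_config.nodes.1 = rabbit@node1
NODE_LINE = re.compile(
    r"^\s*cluster_formation\.classic_config\.nodes\.\d+\s*=\s*(\S+)\s*$"
)


def configured_nodes(conf_text: str) -> list[str]:
    """Extract node names from classic config peer discovery entries."""
    nodes = []
    for line in conf_text.splitlines():
        match = NODE_LINE.match(line)  # comment lines never match
        if match:
            nodes.append(match.group(1))
    return nodes


def node_count_warnings(conf_text: str) -> list[str]:
    """Return warnings for node counts that weaken partition handling."""
    count = len(configured_nodes(conf_text))
    warnings = []
    if count < 3:
        warnings.append(f"{count} node(s) configured: at least 3 are needed")
    if count % 2 == 0:
        warnings.append(f"{count} is even: an even split leaves no majority")
    return warnings


two_node = """
cluster_formation.classic_config.nodes.1 = rabbit@node1
cluster_formation.classic_config.nodes.2 = rabbit@node2
"""
print(node_count_warnings(two_node))
```

A check like this could run in CI against the rendered configuration before it is deployed.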