ADVANCED

Clustering & Node Discovery

What happens if a single RabbitMQ server crashes? With a cluster, several servers run as one logical system: if one node goes down, the others keep serving clients.

Level: Advanced. This section assumes production cluster experience. Complete the Basic and Intermediate topics first.

📖 Technical detail: All nodes in a cluster share user, exchange, and binding definitions. Message contents, however, are not replicated by default; for that you need Quorum Queues (covered on the Quorum Queues and High Availability page).
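As a small illustration of opting in to replication, the sketch below builds the optional queue arguments a client would send when declaring a quorum queue. The helper name `quorum_queue_arguments` is hypothetical; `x-queue-type` and `x-quorum-initial-group-size` are the argument keys RabbitMQ recognizes.

```python
def quorum_queue_arguments(initial_group_size: int = 3) -> dict:
    """Build the optional arguments that request a replicated quorum queue.

    Hypothetical helper for illustration; only the dictionary keys are
    defined by RabbitMQ.
    """
    if initial_group_size < 1:
        raise ValueError("initial_group_size must be at least 1")
    return {
        # Selects the quorum queue type instead of the default classic queue.
        "x-queue-type": "quorum",
        # Number of replicas to create when the queue is first declared.
        "x-quorum-initial-group-size": initial_group_size,
    }


if __name__ == "__main__":
    print(quorum_queue_arguments(3))
```

With a client library such as pika, this dictionary would typically be passed as the `arguments` parameter of `channel.queue_declare`, together with `durable=True`, since quorum queues must be durable.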

Cluster Formation Methods

Method	Use Case	Configuration
Config file	Static environments, VMs	`cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config`
DNS	Environments where a seed hostname resolves to the nodes through DNS A/AAAA records	`cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns`
Kubernetes	Kubernetes StatefulSets	`cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s`
Consul	Environments using HashiCorp Consul	`cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul`
etcd	Environments using etcd	`cluster_formation.peer_discovery_backend = rabbit_peer_discovery_etcd`
Manual CLI	Test, dev, one-off setup	`rabbitmqctl join_cluster rabbit@node1`
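For the config file method, the node list is a set of numbered `classic_config.nodes.N` keys. The sketch below generates those lines from a list of node names; the function name `render_classic_config` and the example node names are assumptions for illustration, not part of any RabbitMQ tooling.

```python
def render_classic_config(nodes: list[str]) -> str:
    """Render rabbitmq.conf lines for classic config peer discovery.

    Hypothetical helper: it only formats text and does not talk to a broker.
    """
    for node in nodes:
        # Node names have the form name@host, e.g. rabbit@node1.
        if "@" not in node:
            raise ValueError(f"expected name@host, got {node!r}")
    lines = [
        "cluster_formation.peer_discovery_backend = "
        "rabbit_peer_discovery_classic_config"
    ]
    # Keys are numbered from 1, one per cluster member.
    for index, node in enumerate(nodes, start=1):
        lines.append(f"cluster_formation.classic_config.nodes.{index} = {node}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_classic_config(["rabbit@node1", "rabbit@node2", "rabbit@node3"]))
```

Every node in the cluster would receive the same generated list, so each one can find its peers regardless of boot order.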

Gerekli Portlar

Port	Protocol	Purpose
`4369`	TCP	epmd (Erlang peer discovery daemon)
`5672`	TCP	AMQP 0-9-1 / AMQP 1.0 client connections
`15672`	HTTP	Management UI & HTTP API
`25672`	TCP	Erlang distribution (inter-node communication)
`35672-35682`	TCP	CLI tools (Erlang distribution client)
`6000-6500`	TCP	Stream replication
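Firewall rules that block any of these ports are a common reason nodes fail to cluster. A minimal reachability check, using only the Python standard library, might look like the sketch below. The names `CLUSTER_PORTS`, `port_is_open`, and `check_ports` are hypothetical, and a successful TCP connect only shows that something is listening, not that it is RabbitMQ.

```python
import socket

# Single-port entries taken from the table above.
CLUSTER_PORTS = {
    4369: "epmd",
    5672: "AMQP",
    15672: "management",
    25672: "Erlang distribution",
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host: str, timeout: float = 2.0) -> dict[int, bool]:
    """Check every port in CLUSTER_PORTS on one host."""
    return {port: port_is_open(host, port, timeout) for port in CLUSTER_PORTS}


if __name__ == "__main__":
    # Replace "localhost" with the hostname of a cluster node.
    for port, is_open in check_ports("localhost").items():
        state = "open" if is_open else "closed"
        print(f"{port} ({CLUSTER_PORTS[port]}): {state}")
```

Running the check from each node against every other node verifies the inter-node paths, which matter more for clustering than client-facing access.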

Network Partition Handling

Strategy	Behavior	When to Use	Risk
`pause_minority`	Nodes on the minority side pause themselves	Recommended (most production setups)	Minority side is temporarily unavailable
`autoheal`	When the partition heals, the losing side restarts	When data loss is tolerable	Non-replicated messages on the losing side are deleted
`ignore`	Both sides keep running (split-brain)	Do not use in production	Split-brain, data inconsistency

Do NOT use a 2-node cluster: when a two-node cluster partitions, each side holds exactly half of the nodes, so neither has a majority. pause_minority then pauses both sides → full unavailability. Run at least 3 nodes, and keep the node count odd.
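The majority arithmetic can be made concrete with a small model. This is a simplified sketch of the rule, not RabbitMQ's implementation: it assumes a side keeps running only if it holds strictly more than half of all cluster nodes. The function name `paused_sides` is hypothetical.

```python
def paused_sides(total_nodes: int, partition_sizes: list[int]) -> list[bool]:
    """For each side of a partition, report whether pause_minority pauses it.

    Simplified model: a side is paused when it holds half of the nodes or
    fewer, i.e. when it is not a strict majority.
    """
    if sum(partition_sizes) != total_nodes:
        raise ValueError("partition sizes must add up to total_nodes")
    return [2 * size <= total_nodes for size in partition_sizes]


if __name__ == "__main__":
    # 2 nodes split 1/1: both sides are paused, the cluster is fully down.
    print(paused_sides(2, [1, 1]))  # [True, True]
    # 3 nodes split 2/1: only the single isolated node is paused.
    print(paused_sides(3, [2, 1]))  # [False, True]
    # 4 nodes split 2/2: an even split again pauses everything.
    print(paused_sides(4, [2, 2]))  # [True, True]
```

The 4-node case shows why an odd node count is preferred: an even cluster can split into two equal halves, and neither half is a majority.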

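Real-world scenario: a 3-node RabbitMQ cluster on Kubernetes, built from a StatefulSet, a Headless Service, and the K8s peer discovery plugin. Every pod reads the same Erlang cookie from a Secret. Setting the StatefulSet's podManagementPolicy to Parallel avoids a restart deadlock, because pods no longer wait for one another to become ready before starting.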

# Cluster formation
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
cluster_formation.k8s.service_name = rabbitmq-headless
cluster_formation.k8s.hostname_suffix = .rabbitmq-headless.messaging.svc.cluster.local
cluster_formation.node_cleanup.interval = 30
cluster_formation.node_cleanup.only_log_warning = true

# Partition handling
cluster_partition_handling = pause_minority

# Resource limits
vm_memory_high_watermark.relative = 0.7
disk_free_limit.absolute = 2GB

# Queue defaults
queue_leader_locator = balanced

# Quorum queue settings
quorum_queue.initial_cluster_size = 3

# Consumer timeout
consumer_timeout = 1800000
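A typo in a value such as `cluster_partition_handling` is easy to miss, for example a hyphen where the setting expects an underscore. The sketch below parses `key = value` lines and flags a few such mistakes before the file is deployed. It is an illustrative linter under stated assumptions, not an official tool: the function names are hypothetical, and it assumes comments occupy whole lines starting with `#`.

```python
# Values accepted by cluster_partition_handling in rabbitmq.conf.
VALID_PARTITION_HANDLING = {
    "ignore", "pause_minority", "autoheal", "pause_if_all_down",
}


def parse_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def find_problems(settings: dict[str, str]) -> list[str]:
    """Return human-readable warnings for a few common mistakes."""
    problems = []
    mode = settings.get("cluster_partition_handling")
    if mode is not None and mode not in VALID_PARTITION_HANDLING:
        problems.append(f"unknown cluster_partition_handling: {mode}")
    watermark = settings.get("vm_memory_high_watermark.relative")
    if watermark is not None and not 0.0 < float(watermark) <= 1.0:
        problems.append(f"memory watermark out of range: {watermark}")
    return problems


if __name__ == "__main__":
    sample = """
    # Partition handling
    cluster_partition_handling = pause-minority
    vm_memory_high_watermark.relative = 0.7
    """
    print(find_problems(parse_conf(sample)))
```

The sample deliberately uses the hyphenated spelling, so the script reports one problem for the partition handling line and none for the watermark.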

services:
  rabbit1:
    image: rabbitmq:4.3-management
    hostname: rabbit1
    environment:
      RABBITMQ_ERLANG_COOKIE: "secret-cookie-value"
      RABBITMQ_NODENAME: "rabbit@rabbit1"
    ports:
      - "5672:5672"
      - "15672:15672"
    volumes:
      - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
    healthcheck:
      test: rabbitmq-diagnostics check_port_connectivity
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
    restart: unless-stopped
    networks:
      - rabbitmq-net

  rabbit2:
    image: rabbitmq:4.3-management
    hostname: rabbit2
    environment:
      RABBITMQ_ERLANG_COOKIE: "secret-cookie-value"
      RABBITMQ_NODENAME: "rabbit@rabbit2"
    ports:
      - "5673:5672"
      - "15673:15672"
    volumes:
      - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
    healthcheck:
      test: rabbitmq-diagnostics check_port_connectivity
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
    restart: unless-stopped
    networks:
      - rabbitmq-net

  rabbit3:
    image: rabbitmq:4.3-management
    hostname: rabbit3
    environment:
      RABBITMQ_ERLANG_COOKIE: "secret-cookie-value"
      RABBITMQ_NODENAME: "rabbit@rabbit3"
    ports:
      - "5674:5672"
      - "15674:15672"
    volumes:
      - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
    healthcheck:
      test: rabbitmq-diagnostics check_port_connectivity
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '2.0'
    restart: unless-stopped
    networks:
      - rabbitmq-net

networks:
  rabbitmq-net:
    driver: bridge

Erlang Cookie: All cluster nodes must share the same Erlang cookie value. If the cookies differ, the nodes cannot connect to each other. In production, load this value from a secret manager (Vault, K8s Secret); do not hardcode it in an environment variable.
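One way to avoid a hardcoded cookie is to generate a random value once and distribute it as a file with owner-only permissions. The sketch below is a minimal example under stated assumptions: the helper name `write_erlang_cookie` is hypothetical, the target path is chosen by the caller, and delivery to each node (Vault, a K8s Secret, and so on) is out of scope.

```python
import os
import secrets
import tempfile
from pathlib import Path
from typing import Optional


def write_erlang_cookie(path: Path, cookie: Optional[str] = None) -> str:
    """Write a cookie file readable only by its owner and return the value.

    Refuses to overwrite an existing file, so an established cluster's
    cookie is never replaced by accident.
    """
    # 64 hexadecimal characters: alphanumeric and well under 255 characters.
    value = cookie or secrets.token_hex(32)
    # O_EXCL makes creation fail with FileExistsError if the file exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(value)
    # Tighten to read-only for the owner once the content is written.
    os.chmod(path, 0o400)
    return value


if __name__ == "__main__":
    # Demonstration in a temporary directory, not a real RabbitMQ data dir.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / ".erlang.cookie"
        value = write_erlang_cookie(target)
        print(f"wrote {len(value)} characters to {target}")
```

The same generated value has to reach every node and every machine that runs CLI tools against the cluster; otherwise authentication between them fails.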

# ❌ 2-node cluster: when a partition occurs, neither side has a majority
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = rabbit@node1
cluster_formation.classic_config.nodes.2 = rabbit@node2

# pause_minority → both sides are a minority → FULL OUTAGE
cluster_partition_handling = pause_minority
# autoheal → one side is reset → DATA LOSS
# ignore → split-brain → DATA INCONSISTENCY

# ✅ 3-node cluster: a majority (2/3) survives even if 1 node goes down
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = rabbit@node1
cluster_formation.classic_config.nodes.2 = rabbit@node2
cluster_formation.classic_config.nodes.3 = rabbit@node3

# pause_minority works safely: the minority (1) pauses, the majority (2) keeps running
cluster_partition_handling = pause_minority
queue_leader_locator = balanced