Table of contents: Technical Reviewers Brief Contents Contents in Detail Acknowledgments Introduction Properties of a Linux Enterprise Cluster Architecture of the Linux Enterprise Cluster The Load Balancer The Shared Storage Device The Print Server The Cluster Node Manager No Single Point of Failure In Conclusion Primer High Availability Terminology Linux Enterprise Cluster Terminology PART I: Cluster Resources 1: Starting Services How Do Cluster Services Get Started? Starting Services with init The /etc/inittab File Respawning Services with init Managing the init Script Symbolic Links with chkconfig Managing the init Script Symbolic Links with ntsysv Removing Services You Do Not Need Using the Red Hat init Scripts on Cluster Nodes In Conclusion 2: Handling Packets Netfilter A Brief History of Netfilter Setting the Default Chain Policy Using iptables and ipchains Clear Existing Rules Set the Default INPUT Chain Policy FTP Passive FTP DNS Telnet SSH Email HTTP ICMP Review and Save Your Rules Routing Packets with the Linux Kernel Matching Packets for a Particular Destination Network Matching Packets for a Particular Destination Host Matching Packets for Any Destination or Host To View Your Routing Rules Making Your Routing Rules Permanent iptables and Routing Resource Script The ip Command In Conclusion 3: Compiling the Kernel What You Will Need Step 1: Get the Source Code Using the Stock Kernel Using the Kernel Supplied with Your Distribution Decide Which Kernel Version to Use Step 2: Set the Options You Want Installing a New Kernel Upgrading or Patching the Kernel Upgrading a Kernel from a Distribution Vendor Set Your Kernel Options Step 3: Compile the Code Step 4: Install the Object Code and Configuration File Install the System.map Save the Kernel Configuration File Step 5: Configure Your Boot Loader In Conclusion PART II: High Availability 4: Synchronizing Servers with rsync and SSH rsync Open SSH 2 and rsync SSH: A Secure Data Transport SSH Encryption Keys
Establishing the Two-Way Trust Relationship: Part 1 Establishing the Two-Way Trust Relationship: Part 2 Two-Node SSH Client-Server Recipe Create a User Account That Will Own the Data Files Configure the Open SSH2 Host Key on the SSH Server Create a User Encryption Key for This New Account on the SSH Client Copy the Public Half of the User Encryption Key from the SSH Client to the SSH Server Test the Connection from the SSH Client to the SSH Server Using the Secure Data Transport Improved Security rsync over SSH Copying a Single File with rsync rsync over Slow WAN Connections Scheduled rsync Snapshots ipchains/iptables Firewall Rules for rsync and SSH In Conclusion 5: Cloning Systems with SystemImager SystemImager Cloning the Golden Client with SystemImager SystemImager Recipe Install the SystemImager Server Software on the SystemImager Server Using the Installer Program from SystemImager Install the SystemImager Client Software on the Golden Client Create a System Image of the Golden Client on the SystemImager Server Make the Primary Data Server into a DHCP Server Create a Boot Floppy for the Golden Client Start rsync as a Daemon on the Primary Data Server Install the Golden Client System Image on the New Clone Post-installation Notes Performing Maintenance: Updating Clients SystemInstaller System Configurator In Conclusion 6: Heartbeat Introduction and Theory The Physical Paths of the Heartbeats Serial Cable Connection Ethernet Cable Connection Partitioned Clusters and STONITH Heartbeat Control Messages Heartbeats Cluster Transition Messages Retransmission Requests Ethernet Heartbeat Control Messages Security and Heartbeat Control Messages How Client Computers Access Resources Failover Using IP Address Takeover (IPAT) Secondary IP Addresses and IP Aliases Ethernet NIC Device Names Secondary IP Address Names IP Aliases Offering Services Gratuitous ARP (GARP) Broadcasts Resource Scripts Status of the Resource Resource Ownership Using init Scripts as Heartbeat 
Resource Scripts Heartbeat Configuration Files In Conclusion 7: A Sample Heartbeat Configuration Recipe Preparations Step 1: Install Heartbeat Step 2: Configure /etc/ha.d/ha.cf Step 3: Configure /etc/ha.d/haresources Configure the haresources File Step 4: Configure /etc/ha.d/authkeys Step 5: Install Heartbeat on the Backup Server Step 6: Set the System Time Step 7: Launch Heartbeat Launch Heartbeat on the Primary Server Launch Heartbeat on the Backup Server Examining the Log Files on the Primary Server Stopping and Starting Heartbeat Monitoring Resources In Conclusion 8: Heartbeat Resources and Maintenance The Haresources File Syntax Haresources File Syntax: Primary-Server Name Haresources File Syntax: IP Alias Heartbeat’s Automated Network Interface Card Selection Process Specifying a Network Interface Card Customizing IP Address Takeover with the iptakeover Script The Haresources File Syntax: Resources Load Sharing with Heartbeat Load Sharing with Heartbeat: Round-Robin DNS Problems with Round Robin DNS Load Balancing Wide-Area Load Balancing Operator Alerts: Audible Alarm Operator Alerts: Email Alerts Heartbeat Maintenance Changing Heartbeat Configuration Files Server Maintenance and the Heartbeat auto_failback Option Forcing the Primary Server into Standby Mode Tuning Heartbeat’s Deadtime Value Informational Messages in Heartbeat’s Log Failover and Respawn (Automatically Restarting Failed Resources) License Manager Failover In Conclusion 9: Stonith and ipfail Stonith An Unconventional Approach: Using a Single Stonith Device Sample Heartbeat with Stonith Configuration Stonith Sequence of Events Stonith Devices Viewing the Current List of Supported Stonith Devices The Stonith Meatware “Device” Using the Stonith Meatware Device with Heartbeat Using a “Real” Stonith Device Avoiding Multiple Stonith Events Network Failures ipfail Watchdog and Softdog Enable Watchdog in the Kernel Kernel Panic-Hang or Reboot? 
Configure Heartbeat to Support Watchdog Testing Your Heartbeat Configuration In Conclusion PART III: Cluster Theory and Practice 10: How to Build a Linux Enterprise Cluster Steps for Building a Linux Enterprise Cluster NAS Server Kernel Netfilter and Kernel Packet Routing Cloning a Linux Machine Cluster Naming Scheme Applying System Configuration Changes to All Nodes Building an LVS-NAT Cluster Building an LVS-DR Cluster Installing Software to Remove Failed Cluster Nodes Installing Software to Monitor the Cluster Nodes Monitoring the Performance of Cluster Nodes Updating Software on Cluster Nodes and Servers Centralizing User Account Administration Installing a Printing System Installing a Highly Available Batch Job-Scheduling System Purchasing the Cluster Nodes In Conclusion 11: The Linux Virtual Server: Introduction and Theory LVS IP Address Name Conventions The Virtual IP (VIP) The Real IP (RIP) The Director’s IP (DIP) The Client Computer’s IP (CIP) IP Addresses in an LVS Cluster Types of LVS Clusters Network Address Translation (LVS-NAT) Direct Routing (LVS-DR) IP Tunneling (LVS-TUN) LVS Scheduling Methods Fixed (or Non-dynamic) Scheduling Methods Dynamic Scheduling Methods In Conclusion 12: The LVS-NAT Cluster How Client Computers Access LVS-NAT Cluster Resources Virtual IP Addresses on LVS-NAT Real Servers Building an LVS-NAT Web Cluster Recipe for LVS-NAT Step 1: Install the Operating System Step 2: Configure and Start Apache on the Real Server Step 3: Set the Default Route on the Real Server Step 4: Install the LVS Software on the Director Step 5: Configure LVS on the Director Step 6: Test the Cluster Configuration LocalNode: Using the Director as a Real Server In Conclusion 13: The LVS-DR Cluster How Client Computers Access LVS-DR Cluster Services ARP Broadcasts and the LVS-DR Cluster Client Computers and ARP Broadcasts In Conclusion 14: The Load Balancer LVS and Netfilter The Director’s Connection Tracking Table Hash Table Structure Controlling the Hash 
Buckets Viewing the Connection Tracking Table Timeout Values for Connection Tracking Records Return Packets and the Netfilter Hooks LVS Without Persistence LVS Persistence Persistent Connection Template Types of Persistent Connections Persistent Client Connection (PCC) Persistent Port Connection (PPC) Port Affinity Netfilter Marked Packets In Conclusion 15: The High-Availability Cluster Redundant LVS Directors High-Availability Cluster Design Goals The High-Availability LVS-DR Cluster Introduction to ldirectord How ldirectord Monitors Cluster Nodes (LVS Real Servers) LVS, Heartbeat, and ldirectord Recipe Hide the Loopback Interface Install the Heartbeat on a Primary and a Backup Director Install ldirectord and Its Required Software Components Install ldirectord Test Your ldirectord Installation Create the ldirectord Configuration File Create the Health Check Web Page Start ldirectord Manually and Test Your Configuration Add ldirectord to the Heartbeat Configuration Stateful Failover of the IPVS Table Modifications to Allow Failover to a Real Server Inside the Cluster In Conclusion 16: The Network File System Lock Arbitration The Lock Arbitrator The Existing Kernel Lock Arbitration Methods The Network Lock Manager (NLM) NLM and Kernel Lock Arbitration NLM and Kernel BSD flock NLM and Kernel System V lockf NLM and Kernel Posix fcntl NFS and File Lock (dotlock) Arbitration Finding the Locks Held by the Linux Kernel Performance Issues with NFS-Bottlenecks and Perceptions Single Transactions and User Perception of NFS Performance Multiple Transactions and User Perception of NFS Performance Managing Lock and GETATTR Operations in a Cluster Environment Managing Attribute Caching Managing Interactive User Applications and Batch Jobs in a Cluster Environment Run Batch Jobs Outside the Cluster Use Multiple NAS Servers Measuring NFS Latency Measuring Total I/O Operations Achieving the Best NAS Performance Possible NFS Client Configuration Options Putting It All Together 
Developing NFS Additional Starting Points for Information on Linux and NFS In Conclusion PART IV: Maintenance and Monitoring 17: The Simple Network Management Protocol and Mon Mon Mon Alerts Mon Monitoring Scripts Where to Run Mon Basic Mon Recipe Step 1: Compile and Install the fping Package Step 2: Install the SNMP Package Step 3: Install the Required CPAN Modules for Mon Step 4: Install the Mon Software Step 5: Create the /etc/mon/mon.cf Configuration File Step 6: Test by Running the fping.monitor and mail.alert Scripts Manually Step 7: Create the Mon Log Directory and Mon Log File Step 8: Start the Mon Program in Debugging Mode and Test Mon and SNMP “Proof of Concept” Recipe Step 1: Install Net-SNMP Client Software on Each Cluster Node Step 2: Create the snmp.conf Configuration File Step 3: Start the SNMP Agent Step 4: View the SNMP MIB Locally Step 5: Install the SNMP Monitoring Agents Examine the Output of netsnmp-freespace.monitor Install Mon on All Cluster Nodes Mon and SNMP “Real-World” Recipe Step 1: Create the snmpd.conf on All Cluster Nodes Step 2: Install netsnmp-proc.monitor and the New mon.cf File on the Cluster Node Manager Step 3: Install the Mon init Script Step 4: Run SNMP and Mon Email Alerts from Mon Creating Your Own SNMP Script A Sample Custom SNMP Script Monitoring Your SNMP Script with Mon Things to Monitor with SNMP Monitoring Scripts Forcing a Stonith Event with Mon Forcing a Heartbeat Failover with Mon In Conclusion 18: Ganglia Introduction to Ganglia gmond gmetad Installing Ganglia’s Prerequisite Packages Install PHP Install RRDtool Install Apache on the Cluster Node Manager Installing Ganglia on the Cluster Node Manager Installing Ganglia on the Cluster Nodes Configuring gmetad and gmond on the Cluster Node Manager Modify /etc/gmetad.conf Modify /etc/gmond.conf Start gmond and gmetad Add the Ganglia Page to Your Apache Configuration The Ganglia Web Package The Title Section The Node Snapshot Section Examining the Cluster Node from the 
Ganglia Web Package Examining the Cluster Node from the Shell Prompt gstat Running a Command on the Least-Loaded Cluster Node Using gstat Creating Custom Metrics with gmetric In Conclusion 19: Case Studies in Cluster Administration Administering Accounts Without Active Directory Legacy Unix Account Administration Methods: The Problem The Best of Both Worlds Building a Reliable Cluster Account Authentication Mechanism Using the Local passwd and Group File on Each Cluster Node A Simple Script Building a Fault-Tolerant Print Spooler Cluster Nodes and Job Ordering LPRng: A Linux Enterprise Cluster Printing System Cluster Nodes and Print Jobs Building the Cluster Printing System Based on LPRng Install LPRng on the Central Print Server Install LPRng on the Cluster Nodes Modify the /etc/printcap.local File on the Cluster Nodes Modify the /etc/printcap File on the Central Print Server Managing Print Jobs on the Central Print Server Managing Print Jobs from the Cluster Nodes Rebooting Nodes for Preventative Maintenance Using ipvsadm Commands to Remove a Cluster Node Changing the Weight of a Cluster Node to 0 Disabling Telnet Access to One of the Cluster Nodes Sending and Receiving Email in a Cluster Environment Creating a Batch Job-Scheduling System with No Single Point of Failure Run ssh-keygen Modify the sshd_config File on Each Cluster Node Create the (RSA) known_hosts Entries on the Cluster Node Manager The Batch Job Scheduler In Conclusion 20: The Linux Cluster Environment The Linux Enterprise Cluster Applications Running on the Cluster The Cluster Node Manager The Clients The NAS Server High-Availability NAS Server Highly Available Serial Devices Serial-to-IP Communication Device High-Availability Modems Highly Available Database Server High-Availability SQL Server Putting It All Together The Cluster Environment In Conclusion A: Downloading Software from the Internet (from a Text Terminal) Using Lynx Using Wget What to Do with the File You Downloaded tar Files rpm 
Files Creating Your Own Tape Archive (tar) File B: Troubleshooting with the tcpdump Utility C: Adding Network Interface Cards to Your System Monolithic Versus Modular Kernels View Your Existing Configuration Install the Card and Reboot Run linuxconf Testing the New NIC Changes Made by linuxconf Using the New NIC Testing and Troubleshooting D: Strategies for Dependency Failures What Are Dependencies? Dynamic Executables and Shared Objects rpm Packages and Shared Library Dependencies Fixing a Dependency Failure Manually Fixing a Dependency Failure Automatically Using Yum to Install rpm Packages Installing a New rpm Package with Yum In Conclusion E: Other Potential Cluster Filesystems and Lock Arbitration Methods F: LVS Clusters and the Apache Configuration File ServerName DocumentRoot BindAddress Port Listen Apache Virtual Host Configuration on Cluster Nodes Apache IP-Based Virtual Hosts Name-Based Virtual Hosts Self-Referential (Redirection) URLs IP-Based Virtual Hosts and Self-Referential URLs Name-Based Virtual Hosts and Self-Referential URLs Verify Your Virtual Host Configuration Index Updates About the CD-ROM CD License Agreement