RED HAT® TRAINING
Comprehensive, hands-on training that solves real world problems
Red Hat OpenStack Administration II Student Workbook
© 2017 Red Hat, Inc.
RED HAT OPENSTACK ADMINISTRATION II
Red Hat OpenStack Administration II
Red Hat OpenStack Platform 10.1
CL210 Red Hat OpenStack Administration II
Edition 2, 20171006

Authors: Adolfo Vazquez, Snehangshu Karmakar, Razique Mahroua, Morgan Weetman, Victor Costea, Michael Jarrett, Philip Sweany, Fiona Allen, Prasad Mukhedkar
Editors: Seth Kenlon, David O'Brien, Forrest Taylor, Robert Locke
Copyright © 2017 Red Hat, Inc. The contents of this course and all its modules and related materials, including handouts to audience members, are Copyright © 2017 Red Hat, Inc.

No part of this publication may be stored in a retrieval system, transmitted or reproduced in any way, including, but not limited to, photocopy, photograph, magnetic, electronic or other record, without the prior written permission of Red Hat, Inc.

This instructional program, including all material provided herein, is supplied without any guarantees from Red Hat, Inc. Red Hat, Inc. assumes no liability for damages or legal action arising from the use or misuse of contents or details contained herein.

If you believe Red Hat training materials are being used, copied, or otherwise improperly distributed, please e-mail [email protected] or phone toll-free (USA) +1 (866) 626-2994 or +1 (919) 754-3700.

Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, Hibernate, Fedora, the Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux® is the registered trademark of Linus Torvalds in the United States and other countries. Java® is a registered trademark of Oracle and/or its affiliates. XFS® is a registered trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries. The OpenStack® Word Mark and OpenStack Logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries, and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community. All other trademarks are the property of their respective owners.
Document Conventions
  Notes and Warnings

Introduction
  Red Hat OpenStack Administration II
  Orientation to the Classroom Environment
  Internationalization

1. Managing an Enterprise OpenStack Deployment
  Describing Undercloud and Overcloud Architectures
  Quiz: Describing Undercloud and Overcloud Architectures
  Describing Undercloud Components
  Guided Exercise: Describing Undercloud Components
  Verifying the Functionality of Overcloud Services
  Guided Exercise: Verifying the Functionality of Overcloud Services
  Lab: Managing an Enterprise OpenStack Deployment
  Summary

2. Managing Internal OpenStack Communication
  Describing the Identity Service Architecture
  Quiz: Describing the Identity Service Architecture
  Administering the Service Catalog
  Guided Exercise: Administering the Service Catalog
  Managing Message Brokering
  Guided Exercise: Managing Message Brokering
  Lab: Managing Internal OpenStack Communication
  Summary

3. Building and Customizing Images
  Describing Image Formats
  Quiz: Describing Image Formats
  Building an Image
  Guided Exercise: Building an Image
  Customizing an Image
  Guided Exercise: Customizing an Image
  Lab: Building and Customizing Images
  Summary

4. Managing Storage
  Describing Storage Options
  Quiz: Describing Storage Options
  Configuring Ceph Storage
  Guided Exercise: Configuring Ceph Storage
  Managing Object Storage
  Guided Exercise: Managing Object Storage
  Lab: Managing Storage
  Summary

5. Managing and Troubleshooting Virtual Network Infrastructure
  Managing SDN Segments and Subnets
  Guided Exercise: Managing SDN Segments and Subnets
  Tracing Multitenancy Network Flows
  Guided Exercise: Tracing Multitenancy Network Flows
  Troubleshooting Network Issues
  Guided Exercise: Troubleshooting Network Issues
  Lab: Managing and Troubleshooting Virtual Network Infrastructure
  Summary

6. Managing Resilient Compute Resources
  Configuring an Overcloud Deployment
  Guided Exercise: Configuring an Overcloud Deployment
  Scaling Compute Nodes
  Guided Exercise: Scaling Compute Nodes
  Migrating Instances using Block Storage
  Guided Exercise: Migrating Instances using Block Storage
  Migrating Instances with Shared Storage
  Guided Exercise: Migrating Instances with Shared Storage
  Lab: Managing Resilient Compute Resources
  Summary

7. Troubleshooting OpenStack Issues
  Troubleshooting Compute Nodes
  Guided Exercise: Troubleshooting Compute Nodes
  Troubleshooting Authentication and Messaging
  Guided Exercise: Troubleshooting Authentication and Messaging
  Troubleshooting OpenStack Networking, Image, and Volume Services
  Guided Exercise: Troubleshooting OpenStack Networking, Image, and Volume Services
  Lab: Troubleshooting OpenStack
  Summary

8. Monitoring Cloud Metrics for Autoscaling
  Describing OpenStack Telemetry Architecture
  Quiz: Describing OpenStack Telemetry Architecture
  Analyzing Cloud Metrics for Autoscaling
  Guided Exercise: Analyzing Cloud Metrics for Autoscaling
  Lab: Monitoring Cloud Metrics for Autoscaling
  Summary

9. Orchestrating Deployments
  Describing Orchestration Architecture
  Quiz: Describing Orchestration Architecture
  Writing Heat Orchestration Templates
  Guided Exercise: Writing Heat Orchestration Templates
  Configuring Stack Autoscaling
  Quiz: Configuring Stack Autoscaling
  Summary
Document Conventions

Notes and Warnings

Note
"Notes" are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should have no negative consequences, but you might miss out on a trick that makes your life easier.

Important
"Important" boxes detail things that are easily missed: configuration changes that only apply to the current session, or services that need restarting before an update will apply. Ignoring a box labeled "Important" will not cause data loss, but may cause irritation and frustration.

Warning
"Warnings" should not be ignored. Ignoring warnings will most likely cause data loss.

References
"References" describe where to find external documentation relevant to a subject.
Introduction

Red Hat OpenStack Administration II
Red Hat OpenStack Administration II (CL210) is designed for system administrators who intend to implement a cloud computing environment using OpenStack. Students will learn how to configure, use, and maintain Red Hat OpenStack Platform.

The focus of this course is managing OpenStack using the unified command-line interface, managing instances, and maintaining an enterprise deployment of OpenStack. Exam competencies covered in the course include: expand compute nodes on Red Hat OpenStack Platform using the undercloud (Red Hat OpenStack Platform director); manage images, networking, object storage, and block storage; provide orchestration and autoscaling (scale-out and scale-in); and build a customized image.
Objectives
• Expand compute nodes on the overcloud.
• Customize instances.
• Troubleshoot individual services as well as OpenStack holistically.
• Manage the migration of live instances.
• Create templates and configure autoscaling of stacks.

Audience
• Cloud administrators, cloud operators, and system administrators interested in, or responsible for, maintaining a private cloud.

Prerequisites
• Red Hat Certified System Administrator (RHCSA in Red Hat Enterprise Linux) certification or equivalent experience.
• Red Hat OpenStack Administration I (CL110) course or equivalent experience.
Orientation to the Classroom Environment
Figure 0.1: CL210 classroom architecture

Student systems share an external IPv4 network, 172.25.250.0/24, with a gateway of 172.25.250.254 (workstation.lab.example.com). DNS services for the private network are provided by 172.25.250.254. The OpenStack overcloud virtual machines share internal IPv4 networks, 172.24.X.0/24, and connect to the undercloud virtual machine and power interfaces on the 172.25.249.0/24 network. The networks used by instances include the 192.168.Y.0/24 IPv4 networks and allocate from 172.25.250.0/24 for public access.

The workstation virtual machine is the only one that provides a graphical user interface. In most cases, students should log in to the workstation virtual machine and use ssh to connect to the other virtual machines. A web browser can also be used to log in to the Red Hat OpenStack Platform Dashboard web interface.

The following table lists the virtual machines that are available in the classroom environment:

Classroom Machines

Machine name | IP addresses | Role
workstation.lab.example.com, workstationN.example.com | 172.25.250.254, 172.25.252.N | Graphical workstation
director.lab.example.com | 172.25.250.200, 172.25.249.200 | Undercloud node
power.lab.example.com | 172.25.250.100, 172.25.249.100, 172.25.249.101+ | IPMI power management of nodes
controller0.overcloud.example.com | 172.25.250.1, 172.25.249.P, 172.24.X.1 | Overcloud controller node
compute0.overcloud.example.com | 172.25.250.2, 172.25.249.R, 172.24.X.2 | Overcloud first compute node
compute1.overcloud.example.com | 172.25.250.12, 172.25.249.S, 172.24.X.12 | Overcloud additional compute node
ceph0.overcloud.example.com | 172.25.250.3, 172.25.249.T, 172.24.X.3 | Overcloud storage node
classroom.example.com | 172.25.254.254, 172.25.252.254, 172.25.253.254 | Classroom utility server
The environment runs a central utility server, classroom.example.com, which acts as a NAT router for the classroom network to the outside world. It provides DNS, DHCP, HTTP, and other content services to the student lab machines. It uses two alternative names, content.example.com and materials.example.com, to provide course content used in the hands-on exercises.
Note
Access to the classroom utility server is restricted; shell access is unavailable.
System and Application Credentials

System credentials | User name | Password
Unprivileged shell login | student | student
Privileged shell login | root | redhat
OpenStack Packages and Documentation
Repositories suitable for package installation are available at http://content.example.com/rhosp10.1/x86_64/dvd/. This URL also provides a docs subdirectory, containing a documentation snapshot in PDF format (docs/pdfs) and HTML format (docs/html).
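A quick way to confirm the repository URL is reachable from workstation (a sketch, not part of the course scripts; any 200 or 30x response on the first header line indicates the content server is answering):

[student@workstation ~]$ curl -sI http://content.example.com/rhosp10.1/x86_64/dvd/ | head -n 1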
Lab Exercise Setup and Grading
Most activities use the lab command, executed on workstation, to prepare and evaluate exercises. The lab command takes two arguments: the activity's name and a verb of setup, grade, or cleanup, as illustrated in the example after the following list.
• The setup verb is used at the beginning of an exercise or lab. It verifies that the systems are ready for the activity, possibly making some configuration changes to them.
• The grade verb is executed at the end of a lab. It provides external confirmation that the activity's requested steps were performed correctly.
• The cleanup verb can be used to selectively undo elements of the activity before moving on to later activities.
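A minimal usage sketch, using a hypothetical activity name (each exercise states the real name to use):

[student@workstation ~]$ lab example-activity setup
[student@workstation ~]$ lab example-activity grade
[student@workstation ~]$ lab example-activity cleanup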
Instructor-Led Training (ILT)
In an Instructor-Led Training classroom, students are assigned a physical computer (foundationX.ilt.example.com) to access the virtual machines running on that host. Students are automatically logged in to the host as user kiosk with the password redhat.

Controlling the Virtual Machines
On foundationX, the rht-vmctl command is used to work with the virtual machines. The rht-vmctl commands in the following table must be run as kiosk on foundationX, and can be used with controller0 (as in the examples) or any virtual machine.

rht-vmctl Commands

Action | Command
Start controller0 machine. | rht-vmctl start controller0
View physical console to log in and work with controller0 machine. | rht-vmctl view controller0
Reset controller0 machine to its previous state and restart the virtual machine. Caution: Any work generated on the disk will be lost. | rht-vmctl reset controller0
At the start of a lab exercise, if instructed to reset a single virtual machine node, then you are expected to run rht-vmctl reset nodename on the foundationX system as the kiosk user. At the start of a lab exercise, if instructed to reset all virtual machines, then run the rht-vmctl reset all command on the foundationX system as the kiosk user. In this course, however, "resetting all virtual machines" normally refers to resetting only the overcloud nodes and the undercloud node, as described in the following section.
Starting the Overcloud from a New Provision
The course lab environment automatically starts only the foundation lab nodes workstation, power, and director. If the environment is not yet started, first start it using the rht-vmctl command:

[kiosk@foundationX ~]$ rht-vmctl start all
Wait sufficiently to ensure that all nodes have finished booting and initializing services. The rht-vmctl output displays RUNNING when the nodes are initialized, but this is not an indication that the nodes have completed their startup procedures. When ready, open a workstation console to continue. Log in as student, password student. Confirm that the nova-compute service is running.
[student@workstation ~]$ ssh stack@director
[stack@director ~]$ openstack compute service list
+----+----------------+--------------------------+----------+---------+-------+
| ID | Binary         | Host                     | Zone     | Status  | State |
+----+----------------+--------------------------+----------+---------+-------+
|  1 | nova-cert      | director.lab.example.com | internal | enabled | up    |
|  2 | nova-scheduler | director.lab.example.com | internal | enabled | up    |
|  3 | nova-conductor | director.lab.example.com | internal | enabled | up    |
|  4 | nova-compute   | director.lab.example.com | nova     | enabled | down  |
+----+----------------+--------------------------+----------+---------+-------+
Verify that the nova-compute service is up, or comes up within 60 seconds. Uncommonly, after environment resets, nova-compute can appear to remain in a down state. Restart nova-compute to resolve this issue. Although the openstack-service restart nova-compute command works correctly, using the systemctl command may be faster because it is a lower-level operating system request. Use of sudo, for root privilege, is required.

[stack@director ~]$ sudo systemctl restart openstack-nova-compute
[stack@director ~]$ openstack compute service list
+----+----------------+--------------------------+----------+---------+-------+
| ID | Binary         | Host                     | Zone     | Status  | State |
+----+----------------+--------------------------+----------+---------+-------+
|  1 | nova-cert      | director.lab.example.com | internal | enabled | up    |
|  2 | nova-scheduler | director.lab.example.com | internal | enabled | up    |
|  3 | nova-conductor | director.lab.example.com | internal | enabled | up    |
|  4 | nova-compute   | director.lab.example.com | nova     | enabled | up    |
+----+----------------+--------------------------+----------+---------+-------+
Do not continue until the nova-compute service is up. Determine whether the overcloud nodes are actually running, from the viewpoint of the hypervisor environment underneath the virtual machines, not from the viewpoint of the openstack server list. For a newly provisioned environment, the overcloud nodes will still be off, but it is recommended practice to always check. Use the rht-vmctl command. Are the overcloud nodes controller0, ceph0, and compute0 still DEFINED as expected?

[kiosk@foundationX ~]$ rht-vmctl status all
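Output similar to the following sketch is expected at this point. The listing is illustrative only; the exact rht-vmctl status layout varies by classroom build. The key point is that the overcloud nodes report DEFINED (shut off) while workstation, power, and director report RUNNING.

workstation   RUNNING
director      RUNNING
power         RUNNING
controller0   DEFINED
compute0      DEFINED
compute1      DEFINED
ceph0         DEFINED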
Return to the director system and start each node using the openstack command. Under all normal circumstances, do not use rht-vmctl to start overcloud nodes! Include compute1 only when working in the chapter where the second compute node is built and used. In all other chapters, compute1 is powered off and ignored.

[stack@director ~]$ openstack server list -c Name -c Status -c Networks
+-------------------------+---------+-----------------------+
| Name                    | Status  | Networks              |
+-------------------------+---------+-----------------------+
| overcloud-compute-0     | SHUTOFF | ctlplane=172.25.249.P |
| overcloud-cephstorage-0 | SHUTOFF | ctlplane=172.25.249.Q |
| overcloud-controller-0  | SHUTOFF | ctlplane=172.25.249.R |
+-------------------------+---------+-----------------------+
[stack@director ~]$ openstack server start overcloud-controller-0
[stack@director ~]$ openstack server start overcloud-cephstorage-0
[stack@director ~]$ openstack server start overcloud-compute-0
Stopping Cleanly at the End of a Session
When finished for the day, or whenever you are done practicing for a while, you may shut down your course lab environment safely. Start by shutting down the overcloud nodes.

[stack@director ~]$ openstack server list -c Name -c Status -c Networks
+-------------------------+--------+-----------------------+
| Name                    | Status | Networks              |
+-------------------------+--------+-----------------------+
| overcloud-compute-0     | ACTIVE | ctlplane=172.25.249.P |
| overcloud-cephstorage-0 | ACTIVE | ctlplane=172.25.249.Q |
| overcloud-controller-0  | ACTIVE | ctlplane=172.25.249.R |
+-------------------------+--------+-----------------------+
[stack@director ~]$ openstack server stop overcloud-controller-0
[stack@director ~]$ openstack server stop overcloud-cephstorage-0
[stack@director ~]$ openstack server stop overcloud-compute-0
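To watch for all three nodes to reach SHUTOFF before powering off the rest of the environment, one option (a sketch, not part of the course scripts; watch is a standard RHEL utility, and Ctrl+C exits it once every node shows SHUTOFF) is:

[stack@director ~]$ watch openstack server list -c Name -c Status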
Wait until OpenStack has stopped the overcloud nodes, then shut down the rest of the environment. Use the rht-vmctl command to stop the remaining virtual machines.

[kiosk@foundationX ~]$ rht-vmctl stop all
Starting After an Unclean Shutdown
If your classroom system or environment was shut down without using the clean shutdown procedure above, you may end up with an environment where the OpenStack knowledge about the nodes does not match the physical or running status of the nodes. This is simple to determine and resolve. As with a clean startup, verify that the nova-compute service is up. Use the sudo systemctl command if necessary. Do not continue until the nova-compute service is up.

[stack@director ~]$ sudo systemctl restart openstack-nova-compute
[stack@director ~]$ openstack compute service list
+----+----------------+--------------------------+----------+---------+-------+
| ID | Binary         | Host                     | Zone     | Status  | State |
+----+----------------+--------------------------+----------+---------+-------+
|  1 | nova-cert      | director.lab.example.com | internal | enabled | up    |
|  2 | nova-scheduler | director.lab.example.com | internal | enabled | up    |
|  3 | nova-conductor | director.lab.example.com | internal | enabled | up    |
|  4 | nova-compute   | director.lab.example.com | nova     | enabled | up    |
+----+----------------+--------------------------+----------+---------+-------+
At this point, it is expected that the overcloud nodes are not running yet, because the course lab environment only auto-starts workstation, power, and director. Check using the rht-vmctl command.

[kiosk@foundationX ~]$ rht-vmctl status all
This is an important step
The node status at the hypervisor level determines the correct command to use from director, so that both the hypervisor and director agree about the overcloud nodes' state. Return to director and determine the overcloud node status from an OpenStack viewpoint.
[stack@director ~]$ openstack server list
Use the following table to determine the correct command to use to either start or synchronize the state of the overcloud nodes. The hypervisor state is down the left column. The OpenStack state is along the top row.

Action choice depends on the hypervisor and OpenStack states

Hypervisor state | OpenStack state SHUTOFF   | OpenStack state ACTIVE
DEFINED          | openstack server start    | openstack server reboot
RUNNING          | nova reset-state --active | No action needed
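For example, if the hypervisor reports a node as RUNNING while OpenStack still lists it as SHUTOFF, synchronize the OpenStack record from director and then confirm the listing; the node name below is illustrative:

[stack@director ~]$ nova reset-state --active overcloud-controller-0
[stack@director ~]$ openstack server list -c Name -c Status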
The exception to the rule, for critical scenarios
Starting overcloud nodes from director physically starts the nodes at the hypervisor level. On rare occasions, nodes appear to start in OpenStack, but remain DEFINED in the rht-vmctl status. To resolve, use rht-vmctl start for each affected node. Hung or unresponsive nodes are handled similarly: resolve by using rht-vmctl poweroff, and once the node powers off, use rht-vmctl start to boot only the affected node.
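For instance, to recover a single hung or unresponsive node (compute0 is used here only as an example):

[kiosk@foundationX ~]$ rht-vmctl poweroff compute0
[kiosk@foundationX ~]$ rht-vmctl start compute0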
Checking the Health of the Overcloud Environment
The overcloud-health-check script checks the general health of the course overcloud environment. It is invoked, with a prompt that allows skipping, at the beginning of every exercise setup script, and can also be invoked manually at any time to verify the overcloud.

[student@workstation ~]$ lab overcloud-health-check setup
Checking the health of the overcloud:

This script's initial task thoroughly validates the overcloud environment,
taking a minute or more, but checking is not required before each exercise.
If you are without overcloud problems, with a stable environment, say (n)o.
Pressing 'Enter' or allowing the 20 second timeout will default to (n)o.

You should *always* say (y)es if any of the following conditions are true:
- You have just reset the overcloud nodes using "rht-vmctl reset" in ILT.
- You have just reset the overcloud nodes from the Online course dashboard.
- You have restarted or rebooted overcloud nodes or any critical services.
- You suspect your environment has a problem and would prefer to validate.

[?] Check the overcloud environment? (y|N)
Verifying overcloud nodes
 · Retrieving state for overcloud-compute-0.................... SUCCESS
 · Retrieving state for overcloud-cephstorage-0................ SUCCESS
 · Retrieving state for overcloud-controller-0................. SUCCESS
 · Waiting for overcloud-compute-0 to be available............. SUCCESS
 · Waiting for overcloud-cephstorage-0 to be available......... SUCCESS
 · Waiting for overcloud-controller-0 to be available.......... SUCCESS
 · Verifying ceph0 access...................................... SUCCESS
 · Starting ceph0 disk arrays and restarting ceph.target....... SUCCESS
 · Verifying ceph0 service, please wait........................ SUCCESS
 · Checking RabbitMQ (5m timer)................................ SUCCESS
 · Ensuring the Downloads directory exists..................... SUCCESS
 · Ensuring OpenStack services are running, please wait........ SUCCESS
Ceph Command Summary
As a general troubleshooting technique, these commands can be performed at any time if the Ceph services are found to be down or unresponsive. These commands are also built into the overcloud-health-check script and are performed when that script is run.

[student@workstation ~]$ ssh stack@director
[stack@director ~]$ ssh heat-admin@ceph0
[heat-admin@overcloud-ceph-storage-0 ~]$ systemctl list-units ceph\*
UNIT                 LOAD   ACTIVE SUB     DESCRIPTION
[email protected]   loaded active running Ceph object storage daemon
[email protected]   loaded active running Ceph object storage daemon
[email protected]   loaded active running Ceph object storage daemon
ceph-mon.target      loaded active running ceph target allowing to start...
ceph-osd.target      loaded active running ceph target allowing to start...
ceph-radosgw.target  loaded active running ceph target allowing to start...
ceph.target          loaded active running ceph target allowing to start...
... output omitted ...
You should see three ceph-osd@# services. If these services do not exist at all, then the systemd services that were to create the OSD services for each disk device did not complete successfully. In this scenario, manually create the OSDs by starting these device services:

[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdb1
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdb2
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdc1
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdc2
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdd1
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdd2
These ceph-disk services will complete and then exit when their corresponding OSD service is created. If the ceph-disk services exist in a failed state, then an actual problem exists with the physical or virtual storage devices used as the Ceph storage: /dev/vdb, /dev/vdc, and /dev/vdd. If the ceph-osd@# services exist in a failed state, they can usually be fixed by restarting them.

[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl restart ceph-osd@0
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl restart ceph-osd@1
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl restart ceph-osd@2
The above three commands are equivalent to the single command below. Target services are designed to simplify starting sets of services, or to declare the services that represent a functional state. After starting the OSDs, use the ceph -s command to verify that Ceph has a status of HEALTH_OK.

[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl restart ceph-osd.target
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo ceph -s
    cluster 8b57b9ee-a257-11e7-bac9-52540001fac8
     health HEALTH_OK
     monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
            election epoch 5, quorum 0 overcloud-controller-0
     osdmap e47: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v1004: 224 pgs, 6 pools, 4751 MB data, 1125 objects
            5182 MB used, 53152 MB / 58334 MB avail
                 224 active+clean
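If the cluster does not report HEALTH_OK, the ceph osd tree command is a quick way to see which OSDs are down before restarting them (a general Ceph command, not specific to the course scripts):

[heat-admin@overcloud-ceph-storage-0 ~]$ sudo ceph osd tree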
To Reset Your Environment

Critical concept
When the CL210 coursebook instructs you to reset virtual machines, the intention is to reset only the overcloud to an initial state. Unless something else is wrong with a physical system or online environment that is deemed unfixable, there is no reason to reset all virtual machines or to reprovision a new lab environment.

What "resetting the overcloud" means
Whether you are working in a physical or online environment, certain systems never need to be reset, because they remain materially unaffected by exercises and labs. This list shows the systems never to be reset and those intended to be reset as a group during this course:

Never to be reset: classroom, workstation, power
Always reset as a group: controller0, compute0, compute1, ceph0, director
Technically, the director system is the undercloud. However, in the context of "resetting the overcloud", director must be included, because director's services and databases are full of control, management, and monitoring information about the overcloud it is managing. Therefore, to reset the overcloud without resetting director is to load a fresh overcloud with director still retaining stale information about the previous overcloud just discarded.

In a physical classroom, use the rht-vmctl command to reset only the relevant nodes. Although you can type one rht-vmctl command per node, which is tedious, there is an interactive option to choose which nodes to reset and which nodes to skip. Don't forget the -i option, or else you will inadvertently reset all of your virtual machines. While not catastrophic, it can be an annoying time-waster.

[kiosk@foundation ~]$ rht-vmctl reset -i all
Are you sure you want to reset workstation? (y/n) n
Are you sure you want to reset director? (y/n) y
Are you sure you want to reset controller0? (y/n) y
Are you sure you want to reset compute0? (y/n) y
Are you sure you want to reset compute1? (y/n) n
Are you sure you want to reset ceph0? (y/n) y
Are you sure you want to reset power? (y/n) n
Powering off director.
Resetting director.
Creating virtual machine disk overlay for cl210-director-vda.qcow2
Starting director.
Powering off controller0.
Resetting controller0.
Creating virtual machine disk overlay for cl210-controller0-vda.qcow2
Powering off compute0.
Resetting compute0.
Creating virtual machine disk overlay for cl210-compute0-vda.qcow2
Powering off ceph0.
Resetting ceph0.
Creating virtual machine disk overlay for cl210-ceph0-vda.qcow2
Creating virtual machine disk overlay for cl210-ceph0-vdb.qcow2
Creating virtual machine disk overlay for cl210-ceph0-vdc.qcow2
Creating virtual machine disk overlay for cl210-ceph0-vdd.qcow2
The director node is configured to start automatically, while the overcloud nodes are configured to not start automatically. This is the same behavior as a newly provisioned lab environment. Give director sufficient time to finish booting and initializing services, then ssh to director to complete the normal overcloud node startup tasks.

[student@workstation ~]$ ssh stack@director
[stack@director ~]$ openstack compute service list
[stack@director ~]$ openstack server list
[stack@director ~]$ openstack server start overcloud-controller-0
[stack@director ~]$ openstack server start overcloud-cephstorage-0
[stack@director ~]$ openstack server start overcloud-compute-0
[stack@director ~]$ openstack server start overcloud-compute-1
Wait sufficiently to allow the overcloud nodes to finish booting and initializing services. Then use the health check script to validate the overcloud lab environment.

[stack@director ~]$ exit
[student@workstation ~]$ lab overcloud-health-check setup
What if "resetting the overcloud" does not result in a stable environment?
Resetting the overcloud properly always creates a stable environment. However, there could be further technical issues outside of simple control in a physical system or the online environment. It is possible to have improper disks, foundation domain configuration, images, blueprints, or virtual volumes, or damage caused by misconfiguration, typing mistakes, misuse of setup scripts, or neglecting to use cleanup scripts. Resetting everything, if deemed necessary, takes time but results in a fresh environment. Use rht-vmctl fullreset to pull down and start clean disk images from the classroom system.

[kiosk@foundationX ~]$ rht-vmctl fullreset all
After the environment is re-provisioned, start again with the instructions for a new environment.
Internationalization
Language support
Red Hat Enterprise Linux 7 officially supports 22 languages: English, Assamese, Bengali, Chinese (Simplified), Chinese (Traditional), French, German, Gujarati, Hindi, Italian, Japanese, Kannada, Korean, Malayalam, Marathi, Odia, Portuguese (Brazilian), Punjabi, Russian, Spanish, Tamil, and Telugu.
Per-user language selection
Users may prefer to use a different language for their desktop environment than the system-wide default. They may also want to set their account to use a different keyboard layout or input method.

Language settings
In the GNOME desktop environment, the user may be prompted to set their preferred language and input method on first login. If not, then the easiest way for an individual user to adjust their preferred language and input method settings is to use the Region & Language application. Run the command gnome-control-center region, or from the top bar, select (User) > Settings. In the window that opens, select Region & Language. The user can click the Language box and select their preferred language from the list that appears. This will also update the Formats setting to the default for that language. The next time the user logs in, these changes will take full effect.

These settings affect the GNOME desktop environment and any applications, including gnome-terminal, started inside it. However, they do not apply to that account if accessed through an ssh login from a remote system or a local text console (such as tty2).
Note
A user can make their shell environment use the same LANG setting as their graphical environment, even when they log in through a text console or over ssh. One way to do this is to place code similar to the following in the user's ~/.bashrc file. This example code will set the language used on a text login to match the one currently set for the user's GNOME desktop environment:

i=$(grep 'Language=' /var/lib/AccountsService/users/${USER} \
  | sed 's/Language=//')
if [ "$i" != "" ]; then
    export LANG=$i
fi
Japanese, Korean, Chinese, or other languages with a non-Latin character set may not display properly on local text consoles.
Individual commands can be made to use another language by setting the LANG variable on the command line:

[user@host ~]$ LANG=fr_FR.utf8 date
jeu. avril 24 17:55:01 CDT 2014
Subsequent commands will revert to using the system's default language for output. The locale command can be used to check the current value of LANG and other related environment variables.

Input method settings
GNOME 3 in Red Hat Enterprise Linux 7 automatically uses the IBus input method selection system, which makes it easy to change keyboard layouts and input methods quickly.

The Region & Language application can also be used to enable alternative input methods. In the Region & Language application's window, the Input Sources box shows what input methods are currently available. By default, English (US) may be the only available method. Highlight English (US) and click the keyboard icon to see the current keyboard layout.

To add another input method, click the + button at the bottom left of the Input Sources window. An Add an Input Source window will open. Select your language, and then your preferred input method or keyboard layout.

Once more than one input method is configured, the user can switch between them quickly by typing Super+Space (sometimes called Windows+Space). A status indicator will also appear in the GNOME top bar, which has two functions: it indicates which input method is active, and it acts as a menu that can be used to switch between input methods or select advanced features of more complex input methods.

Some of the methods are marked with gears, which indicate that those methods have advanced configuration options and capabilities. For example, the Japanese Japanese (Kana Kanji) input method allows the user to pre-edit text in Latin and use the Down Arrow and Up Arrow keys to select the correct characters to use.

US English speakers may also find this useful. For example, under English (United States) is the keyboard layout English (international AltGr dead keys), which treats AltGr (or the right Alt) on a PC 104/105-key keyboard as a "secondary-shift" modifier key and dead key activation key for typing additional characters. There are also Dvorak and other alternative layouts available.
Note
Any Unicode character can be entered in the GNOME desktop environment if the user knows the character's Unicode code point, by typing Ctrl+Shift+U, followed by the code point. After Ctrl+Shift+U has been typed, an underlined u will be displayed to indicate that the system is waiting for Unicode code point entry. For example, the lowercase Greek letter lambda has the code point U+03BB, and can be entered by typing Ctrl+Shift+U, then 03bb, then Enter.
System-wide default language settings
The system's default language is set to US English, using the UTF-8 encoding of Unicode as its character set (en_US.utf8), but this can be changed during or after installation. From the command line, root can change the system-wide locale settings with the localectl command. If localectl is run with no arguments, it will display the current system-wide locale settings.
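For example (the values shown are illustrative of a default US English installation):

[root@host ~]# localectl
   System Locale: LANG=en_US.utf8
       VC Keymap: us
      X11 Layout: us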
To set the system-wide language, run the command localectl set-locale LANG=locale, where locale is the appropriate $LANG from the "Language Codes Reference" table in this chapter. The change will take effect for users on their next login, and is stored in /etc/locale.conf.

[root@host ~]# localectl set-locale LANG=fr_FR.utf8
In GNOME, an administrative user can change this setting from Region & Language by clicking the Login Screen button at the upper-right corner of the window. Changing the Language of the login screen will also adjust the system-wide default language setting stored in the /etc/locale.conf configuration file.
Important
Local text consoles such as tty2 are more limited in the fonts that they can display than gnome-terminal and ssh sessions. For example, Japanese, Korean, and Chinese characters may not display as expected on a local text console. For this reason, it may make sense to use English or another language with a Latin character set for the system's text console.

Likewise, local text consoles are more limited in the input methods they support, and this is managed separately from the graphical desktop environment. The available global input settings can be configured through localectl for both local text virtual consoles and the X11 graphical environment. See the localectl(1), kbd(4), and vconsole.conf(5) man pages for more information.
Language packs
When using non-English languages, you may want to install additional "language packs" to provide additional translations, dictionaries, and so forth. To view the list of available langpacks, run yum langavailable. To view the list of langpacks currently installed on the system, run yum langlist. To add an additional langpack to the system, run yum langinstall code, where code is the code in square brackets after the language name in the output of yum langavailable.
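For example, to check for and then add the Japanese language pack (a sketch; it assumes the yum-langpacks plugin is installed and that Japanese appears in the available list with the code shown):

[root@host ~]# yum langavailable | grep -i japanese
Japanese [ja]
[root@host ~]# yum langinstall ja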
References
locale(7), localectl(1), kbd(4), locale.conf(5), vconsole.conf(5), unicode(7), utf-8(7), and yum-langpacks(8) man pages

Conversions between the names of the graphical desktop environment's X11 layouts and their names in localectl can be found in the file /usr/share/X11/xkb/rules/base.lst.
Language Codes Reference

Language Codes

Language               | $LANG value
English (US)           | en_US.utf8
Assamese               | as_IN.utf8
Bengali                | bn_IN.utf8
Chinese (Simplified)   | zh_CN.utf8
Chinese (Traditional)  | zh_TW.utf8
French                 | fr_FR.utf8
German                 | de_DE.utf8
Gujarati               | gu_IN.utf8
Hindi                  | hi_IN.utf8
Italian                | it_IT.utf8
Japanese               | ja_JP.utf8
Kannada                | kn_IN.utf8
Korean                 | ko_KR.utf8
Malayalam              | ml_IN.utf8
Marathi                | mr_IN.utf8
Odia                   | or_IN.utf8
Portuguese (Brazilian) | pt_BR.utf8
Punjabi                | pa_IN.utf8
Russian                | ru_RU.utf8
Spanish                | es_ES.utf8
Tamil                  | ta_IN.utf8
Telugu                 | te_IN.utf8
CHAPTER 1
MANAGING AN ENTERPRISE OPENSTACK DEPLOYMENT

Overview

Goal: Manage the Undercloud, the Overcloud, and related services.

Objectives:
• Describe the Undercloud architecture and the Overcloud architecture.
• Describe the Undercloud components used for building the Overcloud.
• Verify the functionality of the Undercloud and the Overcloud services.

Sections:
• Describing Undercloud and Overcloud Architectures (and Quiz)
• Describing Undercloud Components (and Guided Exercise)
• Verifying the Functionality of Undercloud and Overcloud Services (and Guided Exercise)

Lab:
• Managing an Enterprise OpenStack Deployment
Describing Undercloud and Overcloud Architectures

Objectives
After completing this section, students should be able to:
• Describe the OpenStack overcloud architecture and terminology.
• Describe the OpenStack undercloud architecture and terminology.
• Describe the benefits of using OpenStack to install OpenStack.
Introducing Red Hat OpenStack Platform
The Red Hat OpenStack Platform consists of interacting components implemented as services that control computing, storage, and networking resources. Cloud administrators manage their infrastructure to configure, control, and automate the provisioning and monitoring of OpenStack resources. Figure 1.1: OpenStack core components provides an overview of the OpenStack architecture as presented in the prerequisite OpenStack Administration I (CL110) course.

Figure 1.1: OpenStack core components

The following table reviews the OpenStack core services. Together, these components provide the services necessary to deploy either tenant workload systems or OpenStack infrastructure systems.
OpenStack Core Component Services

Dashboard: Provides a modular, graphical user interface to manage OpenStack. It can launch server instances, configure networking topology, set role-based access controls, provision persistent and ephemeral storage, monitor run-time metrics, and organize projects and users.

Identity: Provides user authentication and authorization to OpenStack components. Identity supports multiple authentication mechanisms, including user name and password credentials, tokens, and other authentication protocols. As the central user and service account catalog, Identity acts as a single sign-on (SSO) for command-line and graphical end user activity and the inter-component service API.

OpenStack Networking: Provides the creation and management of a virtual networking infrastructure in an OpenStack cloud, including networks, subnets, routers, firewalls, and virtual private networks (VPN). Designed as a pluggable architecture, OpenStack Networking supports multiple vendors and networking technologies.

Block Storage: Provides persistent block storage and management to create and delete virtual disk devices, and to attach and detach server instance block devices. It also manages snapshots, backups, and boot functionality.

Compute: Provides and schedules on-demand virtual machines deployed and run on preconfigured compute nodes operating on nested virtualization or bare metal hardware. The Compute service scales by adding additional virtualization resources, such as hypervisor hosts utilizing libvirtd, Qemu, and KVM technologies.

Image Storage: Provides a registry service for virtual disk images, storing prebuilt images, system snapshots, and vendor-supplied appliances for retrieval and use as templates to deploy server instances and applications.

Object Storage: Provides HTTP-accessible, redundant, distributed, and replicated storage for large amounts of data, including static entities such as pictures, videos, email messages, files, disk images, and backups.

Telemetry: Provides central collection, storage, and retrieval for user-level usage metrics on OpenStack clouds. Data is collected from component-aware agent notifications or infrastructure polling, and is used for alerting, system monitoring, customer billing, and implementing advanced features such as autoscaling.

Orchestration: Provides a template-based methodology for creating and managing OpenStack cloud storage, networking, and compute resources. A heat orchestration template (HOT) defines a collection of resources, known as a stack, to be provisioned and deployed as a single, repeatable, running entity. In addition to recognizing essential resource types such as server instances, subnets, volumes, security groups, and floating IPs, templates provide additional configuration for advanced functionality, such as high availability, autoscaling, authentication, and nested stacks.
The OpenStack core components provide a comprehensive set of services to provision end user cloud workloads consisting of deployed server instances organized by tenant projects. With orchestration, arrangements of complex multi-server applications have become easy to define and deploy with push-button simplicity. Still, the installation and management of OpenStack cloud infrastructure itself had remained difficult to master and maintain, until the introduction of Red Hat OpenStack Platform (RHOSP) director.

The RHOSP director is a standalone OpenStack all-in-one installation, providing a tool set for installing and managing a complete OpenStack infrastructure environment. It is based primarily on the OpenStack Deployment component developed in the TripleO project, which is an abbreviation for "OpenStack-On-OpenStack". The Deployment service uses OpenStack components running on the dedicated all-in-one installation (the undercloud) to install an operational OpenStack cloud (the overcloud), utilizing extended core components, plus new components, to locate, provision, deploy, and configure bare metal systems as OpenStack controller, compute, networking, and storage nodes. The following table describes the OpenStack deployment component services.

OpenStack Component Services for OpenStack-On-OpenStack

Orchestration for TripleO: Provides a set of YAML-based templates to define configuration and provisioning instructions to deploy OpenStack infrastructure servers. Orchestration, defined previously as a core component, defines server roles to provision OpenStack infrastructure.

Bare Metal Provisioning: Enables provisioning server instance deployments to physical (bare metal) machines using hardware-specific drivers. Bare Metal Provisioning integrates with the Compute service to provision the bare metal machines in the same way as virtual machines, first introspecting the physical machines to obtain hardware attributes and configuration.

Workflow: Managed by the Mistral workflow service. A user typically writes a workflow using a YAML-based workflow language and uploads the workflow definition to Mistral with its REST API. The user can then start this workflow manually using the same API, or configure a trigger to start the workflow on some event. The Workflow service provides a set of workflows for certain RHOSP director-specific actions, such as importing and deploying plans.

Messaging: Provides a secure and scalable messaging service for asynchronous communication between intra-cloud applications. Other OpenStack components integrate with Messaging to provide functional equivalence to third-party Simple Queue Service (SQS) and Simple Notification Service (SNS) services. Messaging provides the communication for the Workflow service.

Deployment: Provides a tool set for installing, upgrading, and operating OpenStack clouds using OpenStack components and methods.
Introducing the Undercloud and Overcloud
The Red Hat OpenStack Platform uses a pair of terms to distinguish between the standalone RHOSP director cloud used to deploy and manage production clouds, and the production cloud or clouds used to deploy and manage end-user production workloads: undercloud and overcloud.
Figure 1.2: An undercloud deploys an overcloud

The undercloud is the Red Hat OpenStack Platform director machine itself, plus the provisioning network and resources required to perform undercloud tasks. During the building process for the overcloud, the machine nodes being provisioned to become controller, compute, network, and storage systems are considered to be the workload of the undercloud. When deployment and all configuration stages are complete, these nodes reboot to become the overcloud.

The overcloud is a Red Hat OpenStack Platform environment resulting from a template configuration deployed from the undercloud. Prior to the introduction of the undercloud, any similar Red Hat OpenStack Platform environment would have simply been called the cloud. Using the terms undercloud and overcloud provides a distinction between the two Red Hat OpenStack Platform installations. Each cloud has a complete set of component services, endpoints, authentication, and purpose. To access and manage the undercloud, connect to the Identity service endpoint of the RHOSP director system. To access and manage the overcloud, connect to the Identity service endpoint on a controller system in the overcloud.

Stated again: the undercloud installs the overcloud. However, the undercloud is not only an installation tool set. It is a comprehensive platform for managing, monitoring, upgrading, scaling, and deleting overclouds. Currently, the undercloud supports deploying and managing a single overcloud. In the future, the undercloud will allow an administrator to deploy and manage many tenant overclouds.
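On the classroom director system this distinction appears as two credential files; sourcing one or the other points the openstack client at the undercloud or the overcloud Identity endpoint. The file names below are the conventional defaults created during deployment and are an assumption here:

[stack@director ~]$ source ~/stackrc        # undercloud credentials
[stack@director ~]$ openstack endpoint list
[stack@director ~]$ source ~/overcloudrc    # overcloud credentials
[stack@director ~]$ openstack endpoint list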
What about Packstack?
The Puppet-based Packstack installer was the original tool for effective installations of Red Hat OpenStack Platform. Packstack is deprecated, and will be discontinued in a future Red Hat OpenStack Platform release. Packstack is no longer the preferred tool for common cloud installations, but remains useful for limited use cases.

Packstack was an internal tool developed to create proof-of-concept (POC) deployments of one or possibly a few systems. First-adopter RHOSP clients and analysts popularized it, and some have pushed the tool beyond recommended use. Compared to RHOSP director, there are advantages and disadvantages:

Packstack advantages and disadvantages

Advantages | Disadvantages
Easy to use command-line interface | Command line only, no GUI, no web UI
Permits installations to preinstalled hosts | Requires preinstalled hosts, no bare metal
Puppet-driven configuration is powerful | Requires Puppet mastery to extend
One install host drives multi-host deployment | Does not scale well to larger deployments
Permits limited changes by rerunning tool | Not guaranteed safe or idempotent
Simple to use, debugged through log files | No workflow or orchestration
Single controller POC installation | No multiple controller or HA installs
Single customizable answer file | Complex configurations are difficult
Installs controller and compute nodes | No custom roles, no storage nodes
Simple configure-and-run implementation | No validation, no deploy monitoring
Single interface implementation | No composable roles
Installation only | No upgrading, monitoring or management
Undercloud Recommended Practices

Red Hat OpenStack Platform director is a tool used to install and manage the deployment and life cycle of Red Hat OpenStack Platform 7 (Kilo) and later versions. It is targeted at cloud operator use cases where managed updates, upgrades, and infrastructure control are critical for underlying OpenStack operations. It also provides an API-driven framework for hardware introspection, environment monitoring, capacity planning, utilization metrics, service allocation, and stack management.

Lifecycle management for cloud infrastructure involves operational tasks similar to legacy enterprise management, but also incorporates Continuous Integration and DevOps practices. The cloud industry differentiates stages of lifecycle management by categorizing tasks as Day 0 (Planning), Day 1 (Deploying), and Day 2 (Operations).

• Planning - introspection, network topology, service parameters, resource capacity.
• Deployment - deployment orchestration, service configuration, sanity checks, testing.
• Operations - updates and upgrades, scaling up and down, change management, compliance.

As a Day 0 Planning tool, director provides default, customizable configuration files to define cloud architecture, including networking and storage topologies, OpenStack service parameters, and third-party plugin integration. These default files and templates implement Red Hat's highly available reference architecture and recommended practices.

Director is most commonly recognized as a Day 1 Deployment tool, performing orchestration, configuration, and validation for building overclouds. Tasks include hardware preparation, software deployment using Puppet manifests, and validation using Tempest scripts. Learning and implementing customizations within the director framework is a recommended practice for consistency and reusability.
Director is designed as central management for ongoing Day 2 Operations. It can perform environment health checks, auto-scale an overcloud by adding or replacing nodes, apply minor release updates and major version upgrades, and support patching, monitoring, and regulatory compliance. To retain Day 2 management through the undercloud, all management must be accomplished using the undercloud CLI or APIs. Currently, there is no reasonable expectation that the undercloud can detect, interpret, or reconcile manual changes not implemented through the undercloud. Using outside tool sets loses the ability to perform safe and predictable updates, upgrades, and scaling. Integration with third-party tools that exclusively call undercloud APIs is recommended, and does not break Day 2 operation support. Recommended examples include integration between the undercloud and Red Hat CloudForms, Red Hat Satellite, and Ansible Tower by Red Hat.

The undercloud uses a variety of popular and stable OpenStack components to provide required services, including the Deployment Service for image deployment, creation, and environment templating, the Bare Metal service for bare metal introspection, the Orchestration service for component definition, ordering, and deployment, and Puppet for post-instantiation configuration. The undercloud includes tools that help with hardware testing, and is architected to facilitate future functionality for automated OpenStack upgrades and patch management, centralized log collection, and problem identification.

Overcloud nodes are deployed from the undercloud machine using a dedicated, isolated provisioning network. Overcloud nodes must be configured to PXE boot on this provisioning network, with network booting on other NICs disabled. These nodes must also support the Intelligent Platform Management Interface (IPMI). Each candidate system needs a single NIC on the provisioning network. This NIC must not be used for remote connectivity, because the deployment process reconfigures NICs for Open vSwitch bridging. Minimal information must be gathered about candidate nodes before beginning deployment configuration: the MAC address of the provisioning NIC, the IP address of the IPMI NIC, and the IPMI user name and password (see the node definition sketch below).

Later in this course, you will view and learn about the undercloud configuration used to build the classroom overcloud on your student system. No previous undercloud knowledge is required, but it is recommended that you become proficient with the technologies mentioned in this section before using the undercloud to deploy and manage a production environment.
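The gathered node data is typically registered with the undercloud through a JSON node definition file, commonly named instackenv.json. The following is only a minimal sketch: the MAC address is illustrative, and the IPMI address and credentials are the classroom values used elsewhere in this chapter.

[stack@director ~]$ cat ~/instackenv.json
{
  "nodes": [
    {
      "mac": ["52:54:00:aa:bb:cc"],
      "pm_type": "pxe_ipmitool",
      "pm_addr": "172.25.249.112",
      "pm_user": "admin",
      "pm_password": "password"
    }
  ]
}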
References

Further information is available about RHOSP director at:

Red Hat OpenStack Platform Director Life Cycle
https://access.redhat.com/support/policy/updates/openstack/platform/director

TripleO Architecture
https://docs.openstack.org/tripleo-docs/latest/install/introduction/architecture.html

TripleO documentation
https://docs.openstack.org/tripleo-docs/latest/
Quiz: Describing Undercloud and Overcloud Architectures

Choose the correct answer(s) to the following questions:

1. Which tool is recommended for all production Red Hat OpenStack Platform installs?
a. The overcloud
b. Foreman
c. Packstack
d. RHOSP director (undercloud)
e. Manual package install

2. Which four of these components are services of the undercloud? (Choose four.)
a. Data Processing
b. Deployment
c. Bare Metal
d. Database
e. Orchestration
f. Workflow

3. Which four of these capabilities are part of the undercloud's duties? (Choose four.)
a. Application scaling
b. Automated upgrades
c. Patch management
d. Central log collection
e. Monitoring
Solution

Choose the correct answer(s) to the following questions:

1. Which tool is recommended for all production Red Hat OpenStack Platform installs?
a. The overcloud
b. Foreman
c. Packstack
d. RHOSP director (undercloud) (correct)
e. Manual package install

2. Which four of these components are services of the undercloud? (Choose four.)
a. Data Processing
b. Deployment (correct)
c. Bare Metal (correct)
d. Database
e. Orchestration (correct)
f. Workflow (correct)

3. Which four of these capabilities are part of the undercloud's duties? (Choose four.)
a. Application scaling
b. Automated upgrades (correct)
c. Patch management (correct)
d. Central log collection (correct)
e. Monitoring (correct)
Describing Undercloud Components

Objectives

After completing this section, students should be able to:
• Describe the OpenStack components performing undercloud services.
• Describe the technologies that implement bare metal deployment.
• Start the overcloud from the undercloud.
Undercloud Services

Red Hat OpenStack Platform director is a deployment cloud for OpenStack infrastructure, in which the cloud workload is the overcloud systems themselves: controllers, compute nodes, and storage nodes. Because infrastructure nodes are commonly built directly on physical hardware systems, the undercloud may be referred to as a bare metal cloud. However, as you will experience in this course, an undercloud can deploy infrastructure to virtual systems for learning, testing, and specific use cases. Similarly, overclouds almost exclusively deploy virtual machines and containers, but can be used to deploy tenant workloads directly to dedicated physical systems, such as blade servers or enterprise rack systems, by incorporating bare metal drivers and methods. Therefore, the terms bare metal cloud and tenant workload cloud are only a convenient frame of reference.

Deployment Service Architecture

The Deployment Service is an architecture designed to use native OpenStack component APIs to configure, deploy, and manage OpenStack environments using other existing, supported OpenStack components. By utilizing the technology of other current projects, the Deployment Service developers can focus on creating the additional technology required to manage the deployment process instead of attempting to reimplement services already provided by other components. When these other components receive feature requests, patches, and bug fixes, the undercloud automatically inherits these enhancements. System administrators will find the Deployment Service architecture relatively easy to learn, because they are already experienced with the standard OpenStack components that it uses. For example, the Deployment Service:

• stores its images in the Image service.
• creates Heat templates for resource deployment by the Orchestration service.
• obtains physical machine configuration using the Bare Metal service.
• performs complex post-deployment configuration using Puppet manifests.
• manages task interaction and prerequisite ordering using the Workflow service.
• configures network interfaces using the Networking service.
• obtains provisioning volumes from the Block Storage service.

The Deployment service generates the data required to instruct subordinate services to perform deployment and installation tasks. It comes preconfigured with custom configurations and sample templates for common deployment scenarios. The following table describes the primary concepts and tasks introduced by the Deployment service.
Deployment Service Terminology

Bare metal provisioning: Building a node starting with a machine that has no operating system installed, or repurposing an existing system by replacing its boot disk and configuration. Bare metal provisioning methodology also works for building virtual machine nodes from scratch, as is demonstrated in this course.

Introspection: Discovering the attributes of physical or virtual nodes to determine whether they meet deployment requirements. This process PXE network boots candidate nodes using prebuilt images designed to query IPMI attributes and communicate the information back to a database-recording service on the undercloud.

overcloud-full: A prebuilt boot-disk image containing an unconfigured but completely installed set of OpenStack and Ceph software packages. Used to create overcloud nodes quickly.

Orchestration: Deploying the foundation configuration to the overcloud nodes from predefined overcloud role templates, tailored for a specific use case by additional environment files.

High availability: Controller nodes can be built with redundancy by creating more than one, using Pacemaker clustering to provide failover for each component service between the controller nodes. Compute nodes, by design, are already redundantly scalable.

Deployment roles: The Deployment service comes preconfigured with a handful of well-defined and customizable overcloud node deployment roles: Controller (API service node), Compute (hypervisor node), CephStorage (Ceph block (RADOS) and object (RGW) storage node), BlockStorage (Cinder block storage node), and ObjectStorage (Swift object storage node). Node roles may be allocated by manual tagging, or by configuring automated detection using the Automated Health Check (AHC) tools.

Composable services: A pattern-based design architecture for OpenStack node roles, allowing custom service placement, collocation, and new service integration beyond the five predefined deployment roles.

Workflow: A predeployment plan generated to manage task ordering and intercommunication. Workflow allows administrators to monitor the provisioning process, and to troubleshoot, customize, and restart provisioning tasks.
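As an illustration of manual role tagging, the following sketch sets a profile in a node's capabilities property on the undercloud. The node UUID is a placeholder and the profile name is illustrative; depending on the client version in your environment, the older ironic node-update syntax may be required instead.

[stack@director ~]$ openstack baremetal node set \
    --property capabilities='profile:compute,boot_option:local' NODE_UUID
[stack@director ~]$ openstack baremetal node show NODE_UUID -c properties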
Orchestration Service (Heat)

The Orchestration service provides a template-based engine for the undercloud, used to create and manage resources such as storage, networking, instances, and applications as a repeatable running environment. The default Heat templates are located at /usr/share/openstack-tripleo-heat-templates. Templates create stacks. Stacks are collections of resources such as server instances, virtual disk volumes, fixed and floating IP addresses, users, projects, and configuration files. The packaged templates include working examples of multiple configurations for tailoring a custom infrastructure stack. The following table describes the primary concepts and entities of the Orchestration service templates.
Orchestration Terminology

Resources: A template section that defines infrastructure elements to deploy, such as virtual machines, network ports, and storage disks.

Parameters: A template section that defines deployment-specific parameter settings provided to satisfy template resource requirements. Most templates define default values for all settings.

Outputs: Output parameters dynamically generated during deployment and specified as information required to be passed back to the administrator, for example, public IP addresses, instance names, and other deployment results.

Template directory: A location for storing and invoking modified templates, allowing the default templates to remain unmodified and reusable.

Environment directory: A location for storing environment files. Environment files are specific to a deployment event, containing parameter settings that define this particular deployment. The design allows a specific overcloud design to be reused with new resource names and settings, without modifying the underlying templates. Environment files affect the runtime behavior of a template, overriding resource implementations and parameters.
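To make these terms concrete, the following is a minimal, illustrative Heat Orchestration Template (HOT) sketch showing the parameters, resources, and outputs sections. The network name demo-net and file name are hypothetical; the flavor and image names are the classroom values used elsewhere in this course.

[stack@director ~]$ cat ~/demo-stack.yaml
heat_template_version: 2016-10-14

parameters:
  flavor:
    type: string
    default: m1.web

resources:
  demo_server:
    type: OS::Nova::Server
    properties:
      image: rhel7
      flavor: { get_param: flavor }
      networks:
        - network: demo-net

outputs:
  server_ip:
    description: First IP address assigned to the server
    value: { get_attr: [demo_server, first_address] }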
An overcloud deployment is invoked by specifying a template directory and a location for the environment files:

[user@undercloud]$ openstack overcloud deploy \
    --templates /my_template_dir --environment-directory /my_environment_files_dir
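Individual environment files can also be layered with repeated -e (--environment-file) options; later files override earlier ones. A sketch only: the custom override file name is hypothetical, while the network-isolation environment ships with the default templates.

[user@undercloud]$ openstack overcloud deploy --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e ~/templates/custom-overrides.yaml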
Bare Metal Service (Ironic)

The Bare Metal provisioning service first performs introspection on each candidate node, to query and record node-specific hardware capabilities and configuration. It also provides a mechanism, through the use of PXE and iSCSI, to install a boot disk image onto a qualified node, as depicted in Figure 6.1: Bare Metal boot disk provisioning later in this course. The default image is called overcloud-full; it has a full set of OpenStack and Ceph software packages preinstalled, ready to be configured as any of the overcloud deployment roles. A correctly built custom image may also be used as a deployment disk, to create a specialized enterprise or cloud server instance.

Workflow Service (Mistral)

The Workflow service creates and implements task execution plans called workflows. Complex multi-step deployments have task sets and interconnected task relationships that determine order of execution and prioritization. The Workflow service provides state management, correct execution order, parallelism, synchronization, and high availability. Cloud administrators can define and modify plans to coordinate resource building and redeployment. The Workflow service does not perform the actual tasks, but acts as a coordinator for worker processes and manages asynchronous event messaging and notification to track task execution. The design allows for the creation of custom work processes and the ability to scale and be highly available.

Intelligent Platform Management Interface
The Intelligent Platform Management Interface (IPMI) is an industry-standard specification for out-of-band management, monitoring, and configuration of computer systems. It is independent of any installed operating system, providing access to a hardware-implemented, message-based network management interface. IPMI provides the ability to perform power management tasks (power down, power up, reboot) on systems even when an operating system or CPU is nonfunctional. Management can be used to interact with a system during failures, or as part of boot or initialization procedures. IPMI can also be used to gather run-time information about hardware state, including component status, temperatures, and voltages, and may include the ability to send alerts.

A baseboard management controller (BMC) provides the chip-level functionality of IPMI. Commonly implemented as an embedded microcontroller, the BMC manages the interaction and reporting between the relevant IPMI and system buses. IPMI is designed as a server remote access and control interface specification. It remains consistent across a variety of vendor hardware implementations, including CIMC, DRAC, iDRAC, iLO, ILOM, and IMM hardware platform interfaces. The primary functions of the specification include monitoring, power control, logging, and inventory management. IPMI is intended to be used with systems management software, although it can be invoked directly through simple command-line utilities.
Note

In this course, the overcloud is deployed on virtual machines possessing no hardware or IPMI layer. Instead, a single virtual machine named power emulates a separate IPMI interface for each overcloud virtual machine. IPMI commands are sent to a node-specific IP address on power, where virtual BMC software performs power management activities by communicating with the hypervisor to perform platform management requests. A subset of the IPMI specification is implemented: to power up, power down, and obtain configuration and state notifications.
Completed Classroom Topology

Figure 1.3: Completed classroom overcloud portrays four deployed nodes: controller0, compute0, compute1, and ceph0. The compute1 node will be deployed later in this chapter as an overcloud stack upgrade. Use this diagram as a reference when verifying the live overcloud configuration.
Figure 1.3: Completed classroom overcloud
Overcloud Management

Following deployment, the overcloud can be managed from the undercloud. Use the OpenStack CLI to start, stop, and monitor the status of the overcloud nodes. Use openstack server list to determine the servers' current status.

[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+---------+
| Name                    | Status  |
+-------------------------+---------+
| overcloud-compute-0     | SHUTOFF |
| overcloud-controller-0  | SHUTOFF |
| overcloud-cephstorage-0 | SHUTOFF |
+-------------------------+---------+
Use openstack server start to boot each node. The servers should be started in the order shown. The servers may take many minutes to display an ACTIVE status, so be patient and continue to recheck until all servers are running.

[stack@director ~]$ openstack server start overcloud-controller-0
[stack@director ~]$ openstack server start overcloud-cephstorage-0
[stack@director ~]$ openstack server start overcloud-compute-0
[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+--------+
| Name                    | Status |
+-------------------------+--------+
| overcloud-compute-0     | ACTIVE |
| overcloud-controller-0  | ACTIVE |
| overcloud-cephstorage-0 | ACTIVE |
+-------------------------+--------+
You may experience a scenario where the status of the nodes is ACTIVE, but checking the virtual machine power state from the online environment or the hypervisor shows that the nodes are actually powered off. In this scenario, the undercloud must instruct the nodes to stop first (to synchronize the recognized node state, even though the nodes are already off) before the nodes are started again. This can be accomplished with a single command per node; enter openstack server reboot for each node.

[stack@director ~]$ openstack server reboot overcloud-controller-0
[stack@director ~]$ openstack server reboot overcloud-cephstorage-0
[stack@director ~]$ openstack server reboot overcloud-compute-0
[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+--------+
| Name                    | Status |
+-------------------------+--------+
| overcloud-compute-0     | REBOOT |
| overcloud-controller-0  | REBOOT |
| overcloud-cephstorage-0 | REBOOT |
+-------------------------+--------+
The nodes will first display a status of REBOOT, but will quickly switch to ACTIVE while they continue to start.
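The same CLI is used to shut the overcloud nodes down cleanly, for example before halting the classroom environment. A minimal sketch follows; stopping the compute and storage nodes before the controller is an assumption here, mirroring the reverse of the start order shown above:

[stack@director ~]$ openstack server stop overcloud-compute-0
[stack@director ~]$ openstack server stop overcloud-cephstorage-0
[stack@director ~]$ openstack server stop overcloud-controller-0
[stack@director ~]$ openstack server list -c Name -c Status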
References

The Director Installation & Usage guide for Red Hat OpenStack Platform 10
https://access.redhat.com/documentation/en-US/index.html

The Architecture Guide for Red Hat OpenStack Platform 10
https://access.redhat.com/documentation/en-US/index.html

Intelligent Platform Management Interface Specification
https://www.intel.com/content/www/us/en/servers/ipmi/ipmi-second-gen-interface-spec-v2-rev1-1.html
Guided Exercise: Describing Undercloud Components

In this exercise, you will connect to the undercloud node, director, to launch the predefined overcloud. You will use the OpenStack CLI on the undercloud to manage the overcloud nodes.

Outcomes

You should be able to:
• Connect to and observe the undercloud system.
• Launch the overcloud from the undercloud.

Steps

1. Confirm that the infrastructure and undercloud virtual machines (workstation, power, and director) are started and accessible.

1.1. Log in to workstation as student with a password of student.

1.2. Log in to power as student, using SSH, then exit.

[student@workstation ~]$ ssh power.lab.example.com
[student@power ~]$ exit
1.3. Log in to director as the stack user, using SSH. The login is passwordless when coming from workstation.

[student@workstation ~]$ ssh stack@director
[stack@director ~]$
2. As the stack user on director, check the status of the undercloud. If the nova-compute service displays as down, wait until the status changes to up before continuing. The wait should be no more than a minute or two.

2.1. Use the OpenStack CLI to list the status of the undercloud compute services.

[stack@director ~]$ openstack compute service list -c Binary -c Status -c State
+----------------+---------+-------+
| Binary         | Status  | State |
+----------------+---------+-------+
| nova-cert      | enabled | up    |
| nova-scheduler | enabled | up    |
| nova-conductor | enabled | up    |
| nova-compute   | enabled | up    |
+----------------+---------+-------+

Wait until nova-compute displays as up before trying to start the overcloud nodes.

3. As the stack user on director, check the overcloud status. If necessary, start the overcloud.
3.1. Use the OpenStack CLI to list the overcloud server names and current status.

[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+---------+
| Name                    | Status  |
+-------------------------+---------+
| overcloud-compute-0     | SHUTOFF |
| overcloud-controller-0  | SHUTOFF |
| overcloud-cephstorage-0 | SHUTOFF |
+-------------------------+---------+
In the above output, the overcloud nodes are SHUTOFF and need to be started.

3.2. Use the OpenStack CLI to start the overcloud nodes in the order shown.

[stack@director ~]$ openstack server start overcloud-controller-0
[stack@director ~]$ openstack server start overcloud-cephstorage-0
[stack@director ~]$ openstack server start overcloud-compute-0
3.3. Use the OpenStack CLI to confirm that the overcloud nodes have transitioned to ACTIVE. When done, log out from director.

[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+--------+
| Name                    | Status |
+-------------------------+--------+
| overcloud-compute-0     | ACTIVE |
| overcloud-controller-0  | ACTIVE |
| overcloud-cephstorage-0 | ACTIVE |
+-------------------------+--------+
[stack@director ~]$ exit
Note

The classroom environment uses virtualization and imaging techniques that would not be appropriate if used for a production OpenStack infrastructure. Due to these techniques, it is possible for the node status reported by the undercloud, the power state determined by the IPMI service, and the actual virtual machine state to initially be out of sync. If an initial openstack server list command displays all nodes as ACTIVE, but the actual virtual machines are shut down, run openstack server reboot for each node.

[stack@director ~]$ openstack server reboot overcloud-compute-0
[stack@director ~]$ openstack server reboot overcloud-cephstorage-0
[stack@director ~]$ openstack server reboot overcloud-controller-0
If the openstack server start or openstack server reboot commands generate errors, or the nodes fail to become ACTIVE, first confirm that the nova-compute service is up, then run the openstack server set command for each node, followed by the openstack server reboot command for each node. Allow each set of commands, for all three nodes, to show the expected state before continuing with the next set of commands:

[stack@director ~]$ openstack compute service list

[stack@director ~]$ openstack server set --state active overcloud-compute-0
[stack@director ~]$ openstack server set --state active overcloud-cephstorage-0
[stack@director ~]$ openstack server set --state active overcloud-controller-0

[stack@director ~]$ openstack server reboot overcloud-compute-0
[stack@director ~]$ openstack server reboot overcloud-cephstorage-0
[stack@director ~]$ openstack server reboot overcloud-controller-0
Verifying the Functionality of Overcloud Services

Objectives

After completing this section, students should be able to:
• Locate and view output from overcloud provisioning.
• Test specific overcloud functionality.
• Run tests on overcloud components.
Verifying an Undercloud

The undercloud is architected to be more than an installation tool. This course discusses orchestration both for the initial install and for compute node scaling. RHOSP director also performs numerous Day 2 activities. Therefore, the undercloud system is not intended to be uninstalled or decommissioned after the overcloud is installed.

The undercloud can be checked for proper configuration (see the command sketch at the end of this section):
• view service and network configuration
• view introspection results to confirm accurate node capability assessment
• view workflow configurations

Currently, the undercloud is capable of installing a single overcloud with the stack name overcloud. The Workflow Service is capable of managing multiple plans and stacks. In a future release, the undercloud will be able to install, access, and manage multiple overclouds. Currently supported ongoing activities for the undercloud include:
• monitoring the health of an overcloud
• gathering and storing metrics from an overcloud
• validating and introspecting new nodes for overcloud scaling
• performance testing of nodes, components, and scenarios
• performing minor release updates, such as security fixes
• performing automated major version upgrades to Red Hat OpenStack Platform
• auto-scaling or replacing HA controllers and compute nodes; currently, scaling storage nodes is handled by the storage platform, not the undercloud
• managing platform-level access to infrastructure nodes, including power management
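A minimal verification sketch, run as the stack user with the undercloud credentials loaded; it assumes the standard undercloud clients are installed, and the exact output depends on your environment:

[stack@director ~]$ source ~/stackrc
[stack@director ~]$ openstack service list -c Name -c Type    # undercloud service catalog
[stack@director ~]$ openstack baremetal node list             # registered nodes, power and provisioning state
[stack@director ~]$ openstack stack list                      # the overcloud deployment stack
[stack@director ~]$ openstack workflow list                   # Workflow service (Mistral) workflows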
Verifying an Overcloud

Once built, an overcloud is a production infrastructure with many interacting components. To avoid damaging live data and applications, verify installation operation before deploying production workloads. Verification involves multiple levels of checking:
• view introspection results to confirm accurate node capability assessment
• compare compute, storage, and network configuration to the original templates
• perform power management testing on every node, not just selected nodes
• install and run the Testing service to confirm component-specific operation
• deploy an internal-only server instance to validate console access
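As an illustration of the last check, a sketch of deploying an internal-only test instance from the CLI: the server name is hypothetical, the flavor and image are the classroom values, and INTERNAL_NET_ID is a placeholder for an internal project network ID.

[stack@director ~]$ source ~/overcloudrc
[stack@director ~]$ openstack server create --flavor m1.web --image rhel7 \
    --nic net-id=INTERNAL_NET_ID demo-internal
[stack@director ~]$ openstack server show demo-internal -c status -c addresses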
Viewing introspection results

The bare metal introspection process returned data to the ironic-inspector listener to store as baremetal node parameters, which may be viewed using openstack baremetal node show. Extra data was stored in the Object Store, one text file object per node, in an object container called ironic-inspector. The container is owned by the ironic user in the service project. To view this data, download the object file and parse it with a JSON tool such as jq.

Downloading an object file created by a service account (or, more broadly, running any OpenStack command as a service user) requires using that service user's authentication. It is not necessary to create a permanent authentication rc file for a service account, because running commands as a service user is not a typical or regular task. Instead, override the current authentication environment by prepending only the service account's environment variables to the one-off command. For example:

[user@demo ~]$ OS_TENANT_NAME=service OS_USERNAME=SERVICE_ACCOUNT \
OS_PASSWORD=SERVICE_PASSWORD openstack OBJECT_ACTION
The container name in which the files are stored matches the name of the service that created them; for the introspection process, the service is ironic-inspector. The password for the ironic service user is found in the undercloud-passwords.conf file. Use the openstack baremetal node show command to locate the file name used to store introspection results for a node.

[user@demo ~]$ openstack baremetal node show 5206cc66-b513-4b01-ac1b-cd2d6de06b7d -c extra
+-------+----------------------------------------------------------------+
| Field | Value                                                          |
+-------+----------------------------------------------------------------+
| extra | {u'hardware_swift_object':                                     |
|       | 'extra_hardware-5206cc66-b513-4b01-ac1b-cd2d6de06b7d'}         |
+-------+----------------------------------------------------------------+
The information required to download introspection results is summarized in the following table.

Locating introspection node data objects

service user: ironic
service passwords: /home/stack/undercloud-passwords.conf
container name: ironic-inspector
baremetal node field name: extra
parameter name in field: hardware_swift_object
object name: extra_hardware-node-id (one object per node)

The following example downloads the extra_hardware-node-id file:
[user@demo ~]$ OS_TENANT_NAME=service OS_USERNAME=ironic \
OS_PASSWORD=260f5ab5bd24adc54597ea2b6ea94fa6c5aae326 \
openstack object save ironic-inspector \
extra_hardware-5206cc66-b513-4b01-ac1b-cd2d6de06b7d
Parse the resulting JSON structure with the jq command:

[user@demo ~]$ jq . < extra_hardware-5206cc66-b513-4b01-ac1b-cd2d6de06b7d
[
  [
    "disk",
    "logical",
    "count",
    "1"
  ],
  [
    "disk",
    "vda",
    "size",
    "42"
  ],
...output omitted...
The displayed data are attributes of the introspected node. This data can be used to verify that the introspection process correctly analyzed the node, or to customize the introspection process. Such customization is an advanced RHOSP director installation topic and is beyond the scope of this course.

Viewing orchestration results

The orchestration process deployed each of the registered nodes as one of the standard server roles. Compare the orchestration templates and environment files to those finished servers. To browse those servers, use the heat-admin user, the same Linux user account used by orchestration to access and configure the systems using SSH. When using the provisioning network for direct access from director, the stack user has password-less SSH access. The heat-admin user has sudo privileges; use sudo -i to switch to the root user without a password.

View the following resources to verify the configuration of your course-specific overcloud:
• list the services on each node to view which systemd-configured services are running on each type of deployment role server
• compare the static IP addresses set in the orchestration network template files to the network addresses on each overcloud node
• compare the NIC configuration of the controller deployment role network orchestration template to the network interfaces and Open vSwitch bridges on controller0
• compare the NIC configuration of the compute deployment role network orchestration template to the network interfaces on compute0
• compare the NIC configuration of the ceph-storage deployment role network orchestration template to the network interfaces on ceph0
• compare the disk configuration in the orchestration storage template file to the output of the ceph osd and ceph status commands
Testing IPMI power management

The power virtual machine acts like an IPMI hardware layer for each of the overcloud nodes. One virtual address per node is added to the provisioning network interface on power, and is configured to listen on port 623. Properly structured IPMI commands sent to a listener are translated by the IPMI emulator into requests for the underlying hypervisor system, which performs the action on the requested node.

IPMI IP addresses

Node name and KVM domain name   IP address on provisioning network   Virtual IP address on power IPMI emulator
controller0                     172.25.249.1                         172.25.249.101
compute0                        172.25.249.2                         172.25.249.102
compute1                        172.25.249.12                        172.25.249.112
ceph0                           172.25.249.3                         172.25.249.103
This classroom does not require the full IPMI set of capabilities, only the ability to power cycle or start nodes programmatically on demand. The command-line utility to test the functionality of the power IPMI emulation uses this syntax:

[user@demo ~]$ ipmitool -I lanplus -U admin -P password -H IP power status|on|off
The -I interface options are compiled into the command and may be seen with ipmitool -h. The lanplus choice indicates the use of the IPMI v2.0 RMCP+ LAN Interface. For example, to view the power status of the controller0 node, run the following command.

[user@demo ~]$ ipmitool -I lanplus -U admin -P password -H 172.25.249.101 power status
Chassis Power is on
Testing OpenStack components

Red Hat OpenStack Platform includes a Testing service module (codenamed Tempest) with preconfigured per-module tests to perform rigorous testing prior to beginning production. The standard tests are designed to load the overcloud and run for many hours to prove readiness. The Testing service also includes a shorter and simpler set of tests known as the smoke tests. These tests also perform standard OpenStack operations, but are designed to confirm a working configuration. Failures in these atomic tests (an inability to perform typical OpenStack project user tasks) indicate a probable misconfiguration or inoperable hardware.

Using the Testing service requires some preparatory tasks:
• Testing is invoked as the admin user of the overcloud to be tested. The current environment file must be loaded before starting.
• The system running the tests must have access to the internal API network. This can be a temporary interface configured only for the duration of testing.
• An external network and subnet must exist before running testing.
• Internet access is expected by default, to obtain a CirrOS image to use in testing. In our classroom, we specify a local image from the command line to avoid this requirement.
• The heat_stack_user role must exist in the tested overcloud.
• Installing the openstack-tempest-all package installs all component tests, including tests for components not installed on the overcloud. Manual editing of the tempest configuration file can turn off unneeded components.
The Testing service API tests are designed to use only the OpenStack API, and not one of the Python client interfaces. The intent is for this testing to validate the API, by performing both valid and invalid API invocations against component APIs to ensure stability and proper error handling. The Testing service can also be used to test client tool implementations if they can operate in a raw testing mode that allows passing JSON directly to the client. Scenario tests are also included. These tests are a related series of steps that create more complex objects and project states, confirm them for functionality, and then remove them.

The Testing service runs the full-length tests by default. However, the service also provides a method for running only the shorter smoke tests or for skipping tests, by creating a text file that lists tests by name, then including the file as an option when testing is run. This is useful for including or excluding tests as required, such as skipping tests that may be inoperable due to component updates or customization, or where individual features have been disabled. Adding *.smoke to the skip list limits tests to the smoke tests.

One method for running tests is the tools/run-tests.sh script, which uses a skip list file with both include and exclude regular expression syntax for selecting tests. This course uses this method because the Testing service CLI in RHOSP 10 is not yet feature complete. However, the tempest run command is available as another simple test invocation method. The newer Testing service CLI also includes the useful tempest cleanup command, which can find and delete resources created by the Testing service, even if tests have aborted or completed with a failed status and left orphaned resources. To use this tool, first run the command with the --init-saved-state option before running any tests. This option creates a saved_state.json file containing a list of existing resources from the current cloud deployment that will be preserved from subsequent cleanup commands. The following example demonstrates the correct order in which to use the tempest cleanup commands.

[user@demo ~]$ tempest cleanup --init-saved-state
[user@demo ~]$ tempest run --smoke
[user@demo ~]$ tempest cleanup
Using VNC to access an internal-only instance console

An internal-only instance, by definition, is a server available on an internal project network without external access. Because an internal-only server requires the absolute minimum number of prerequisite objects, it is common to use one to test basic cloud functionality. The objects required include a flavor, an image, a network, and a subnet available for use in this user's project. The objects may be owned by this project or shared from another project. No external networks, routers, floating IPs, security groups, key pairs, persistent volumes, or other non-core resources are required. The subnet to which the instance is deployed can use DHCP or not; however, an IP address must be available.

An internal-only server instance is not accessible from any system other than an authorized controller node for that overcloud. To gain access to the server's console, a user may access the controller through a VNC- or Spice-enabled browser, or a websockets-implemented VNC or Spice client. Since Red Hat OpenStack Platform support for Spice is not yet released, this course uses and describes VNC console components and configuration.

Each compute node runs a vncserver process, listening on the internal API network at one or more ports starting at 5900 and going up, depending on the number of instances deployed on that compute node. Each controller node runs a novncproxy process, listening at port 6080 on the same internal API network. The remaining services belong to the Compute service (codenamed Nova), with components on both the controller and compute nodes.
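A quick way to confirm these listeners on a deployed overcloud is sketched below; treat it as an assumption-level check only, since the exact nova.conf option names and the set of listening ports depend on the release and on how many instances are running:

[heat-admin@overcloud-compute-0 ~]$ sudo grep vnc /etc/nova/nova.conf | grep -v '^#'
[heat-admin@overcloud-controller-0 ~]$ sudo ss -tlnp | grep -E ':6080|:59[0-9][0-9]'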
To access the console, a user clicks the server's instance name on the Dashboard Project/Compute/Instances screen to reach the instance detail screen, which has four sub-tabs. Clicking the Console sub-tab initiates a request for a VNC console connection. The following list describes the resulting Compute service and VNC interactions to build the instance-specific URL. Each component named is followed by its access location in parentheses (node name, network name, port number):

• A client browser (workstation, external), configured with the NoVNC plug-in, connects to the Dashboard haproxy (controller0, external, port 80) to request to open a console to a specific running server instance.
• Haproxy passes an access URL request to nova-api (controller0, internal API, port 8774).
• Nova-api passes a get_vnc_console request to nova-compute (compute0, internal API, AMQP).
• Nova-compute passes the get_vnc_console request to libvirt (compute0), which returns a host IP and port.
• Nova-compute returns a generated token and a connect_info object to nova-api (controller0, internal API, AMQP).
• Nova-api passes an authorize_console request to nova-consoleauth (controller0, internal API, AMQP), which caches the connect_info object with the token as the index, waiting for the actual connection request to occur.
• Nova-api returns a nova-novncproxy URL and the instance-specific token to the Dashboard (controller0, internal API), which passes the URL and token to the browser (workstation, external).

In Figure 1.4: The constructed nova-novncproxy instance-specific URL, notice the URL, which includes inline parameters for the token and instance ID for the requested server instance demo, at the bottom of the Dashboard screen as the mouse hovers over the clickable link in the blue message area.
Figure 1.4: The constructed nova-novncproxy instance-specific URL
Note

The requirement that a user clicks the link titled Click here to show only console, plus any messages about keyboard non-response, is not an error. It is the result of browser settings forbidding cross-domain scripts from running automatically. A user could select settings, such as show all content or load unsafe scripts, that disable protective security policies, but this is not recommended. Instead, manually click the link.
The Compute service has obtained connection information that it has cached with the console authorization service, to be requested and used by any user who provides the correct token. The URL passed to the browser is not the direct address of the demo instance, but instead is the novncproxy address, which constructs a reverse proxy connection to allow the demo instance to initiate console screen refreshes. The following list describes the remaining interactions to complete the reverse proxy VNC connection when the URL is clicked:

• The browser (workstation, external) connects to the URL, proxied by haproxy (controller0, external, port 6080) to reach nova-novncproxy (controller0, internal API, port 6080). nova-novncproxy parses the token and instance ID from the URL.
• Using the token, nova-novncproxy retrieves the connect_info object from nova-consoleauth (controller0, internal API, AMQP).
• nova-novncproxy connects directly to vncserver (compute0, internal API, 5900+) at the port designated for the requested VM, and creates a reverse proxy to send graphics back through the Dashboard haproxy (controller0, internal API, port 80) to the user's browser (workstation, external).

Deploying and connecting to the VNC console of an internal-only server instance validates core Compute service, Messaging service, and network access functionality.
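The same instance-specific URL can also be generated from the CLI instead of the Dashboard. A minimal sketch, reusing the demo instance name from the discussion above; the command prints a console access URL that can be opened in a NoVNC-capable browser:

[stack@director ~]$ source ~/overcloudrc
[stack@director ~]$ openstack console url show demo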
Verifying an Overcloud

The following steps outline the process to verify an overcloud deployment.

1. On the undercloud, add a port to the control plane interface br-ctlplane and assign it an IP address.
2. Install the openstack-tempest package and component test packages.
3. Create a testing configuration directory and populate it with configuration files.
4. Create a provider network on the overcloud and retrieve its ID.
5. Run the config_tempest tool configuration script using the external network ID as an argument.
6. Optionally, edit the /etc/tempest.conf file to select or clear the services to be tested.
7. Use the tempest-smoke-skip-sample sample file to create the tempest-smoke-skip file. The file lists tests to run and tests to skip.
8. Run the tools/run-tests.sh --skip-file ./tempest-smoke-skip command to test the environment.
References

Intelligent Platform Management Interface
https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface

Access an instance from console
https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/console-url.html

How to select virtual consoles
https://docs.openstack.org/security-guide/compute/how-to-select-virtual-consoles.html

Further information is available in the OpenStack Integration Test Suite Guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en-US/index.html
Steps 1. Log in to director as the stack user. Observe that the stackrc environment file automatically loaded. You will use the stack user's authentication environment to query and manage the undercloud. 1.1. SSH to the stack user on the director system. No password is required. View the stack user's environment, which is used to connect to the undercloud. [student@workstation ~]$ ssh stack@director [stack@director ~]$ env | grep OS_ OS_IMAGE_API_VERSION=1 OS_PASSWORD=96c087815748c87090a92472c61e93f3b0dcd737 OS_AUTH_URL=https://172.25.249.201:13000/v2.0 OS_USERNAME=admin OS_TENANT_NAME=admin OS_NO_CACHE=True OS_CLOUDNAME=undercloud
1.2. View the current overcloud server list to find the provisioning network address for each node. The IP addresses shown here may differ from yours.

[stack@director ~]$ openstack server list -c Name -c Status -c Networks
+-------------------------+---------+------------------------+
| Name                    | Status  | Networks               |
+-------------------------+---------+------------------------+
| overcloud-controller-0  | ACTIVE  | ctlplane=172.25.249.52 |
| overcloud-compute-0     | ACTIVE  | ctlplane=172.25.249.53 |
| overcloud-cephstorage-0 | ACTIVE  | ctlplane=172.25.249.58 |
+-------------------------+---------+------------------------+
2. Log in to each overcloud system to view the unique services running on each node type, using the heat-admin account that was provisioned during deployment. The heat-admin user on each node is configured with the SSH keys for the stack user from director to allow password-less access.

2.1. Using SSH, log in to the controller0 service API node. List relevant services and network configuration, then log out.

[stack@director ~]$ ssh heat-admin@controller0
[heat-admin@overcloud-controller-0 ~]$ ip addr | grep -E 'eth0|vlan|br-ex'
2: eth0: mtu 1500 qdisc pfifo_fast state
    inet 172.25.249.59/24 brd 172.25.249.255 scope global eth0
    inet 172.25.249.50/32 brd 172.25.249.255 scope global eth0
9: br-ex: mtu 1500 qdisc noqueue state
    inet 172.25.250.1/24 brd 172.25.250.255 scope global br-ex
    inet 172.25.250.50/32 brd 172.25.250.255 scope global br-ex
10: vlan40: mtu 1500 qdisc noqueue state
    inet 172.24.4.1/24 brd 172.24.4.255 scope global vlan40
    inet 172.24.4.50/32 brd 172.24.4.255 scope global vlan40
11: vlan20: mtu 1500 qdisc noqueue state
    inet 172.24.2.1/24 brd 172.24.2.255 scope global vlan20
12: vlan10: mtu 1500 qdisc noqueue state
    inet 172.24.1.1/24 brd 172.24.1.255 scope global vlan10
    inet 172.24.1.51/32 brd 172.24.1.255 scope global vlan10
    inet 172.24.1.50/32 brd 172.24.1.255 scope global vlan10
13: vlan30: mtu 1500 qdisc noqueue state
    inet 172.24.3.1/24 brd 172.24.3.255 scope global vlan30
    inet 172.24.3.50/32 brd 172.24.3.255 scope global vlan30
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl list-br
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl list-ifaces br-trunk
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl list-ifaces br-ex
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ systemctl -t service list-units \
open\* neutron\* ceph\*
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ exit
[stack@director ~]$
2.2. Using SSH, log in to the compute0 hypervisor node. List relevant services and network configuration, then log out.

[stack@director ~]$ ssh heat-admin@compute0
[heat-admin@overcloud-compute-0 ~]$ ip addr | grep -E 'eth0|vlan|eth2'
2: eth0: mtu 1500 qdisc pfifo_fast state
    inet 172.25.249.57/24 brd 172.25.249.255 scope global eth0
4: eth2: mtu 1500 qdisc pfifo_fast state
    inet 172.25.250.2/24 brd 172.25.250.255 scope global eth2
10: vlan20: mtu 1500 qdisc noqueue state
    inet 172.24.2.2/24 brd 172.24.2.255 scope global vlan20
11: vlan10: mtu 1500 qdisc noqueue state
    inet 172.24.1.2/24 brd 172.24.1.255 scope global vlan10
12: vlan30: mtu 1500 qdisc noqueue state
    inet 172.24.3.2/24 brd 172.24.3.255 scope global vlan30
[heat-admin@overcloud-compute-0 ~]$ sudo ovs-vsctl list-br
...output omitted...
[heat-admin@overcloud-compute-0 ~]$ sudo ovs-vsctl list-ifaces br-trunk
...output omitted...
[heat-admin@overcloud-compute-0 ~]$ systemctl -t service list-units \
open\* neutron\* ceph\*
...output omitted...
[heat-admin@overcloud-compute-0 ~]$ exit
[stack@director ~]$
2.3. Using SSH, log in to the ceph0 storage node. List relevant services and network configuration, then log out.

[stack@director ~]$ ssh heat-admin@ceph0
[heat-admin@overcloud-cephstorage-0 ~]$ ip addr | grep -E 'eth0|vlan|eth2'
2: eth0: mtu 1500 qdisc pfifo_fast state
    inet 172.25.249.56/24 brd 172.25.249.255 scope global eth0
4: eth2: mtu 1500 qdisc pfifo_fast state
    inet 172.25.250.3/24 brd 172.25.250.255 scope global eth2
6: vlan40: mtu 1500 qdisc noqueue state
    inet 172.24.4.3/24 brd 172.24.4.255 scope global vlan40
7: vlan30: mtu 1500 qdisc noqueue state
    inet 172.24.3.3/24 brd 172.24.3.255 scope global vlan30
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ovs-vsctl show
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ systemctl -t service list-units \
open\* neutron\* ceph\*
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph status
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd lspools
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd ls
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ lsblk -fs
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ exit
[stack@director ~]$
3. Test the IPMI emulation software that performs power management for the overcloud's virtual machine nodes.

3.1. Use the IPMI command-line tool to power the compute1 node on and off. The compute1 node will be provisioned as the second compute node in a later chapter, but is not currently in use. All other nodes are currently functioning cloud nodes; do not perform these commands on any other nodes. The IPMI address for compute1 is 172.25.249.112. Start by checking the node's current power status.

[stack@director ~]$ ipmitool -I lanplus -U admin -P password \
-H 172.25.249.112 power status
Chassis Power is off
3.2. Toggle the compute1 power on and off. When you are finished practicing the IPMI functionality, leave the compute1 node powered off.
You might receive a failure message, as in the example commands below. This can indicate that the command request was received while the host was transitioning between states. Wait, then submit the command request again.

[stack@director ~]$ ipmitool -I lanplus -U admin -P password \
-H 172.25.249.112 power on
Chassis Power Control: Up/On
[stack@director ~]$ ipmitool -I lanplus -U admin -P password \
-H 172.25.249.112 power off
Set Chassis Power Control to Down/Off failed: Command response could not be provided
[stack@director ~]$ ipmitool -I lanplus -U admin -P password \
-H 172.25.249.112 power off
Chassis Power Control: Down/Off
4. Authenticate as the admin user in the admin project in the overcloud. Source the overcloudrc authentication environment file. The loaded environment provides admin user access in the overcloud.

[stack@director ~]$ source overcloudrc
5. Confirm that the heat_stack_user role is available in the overcloud.

[stack@director ~]$ openstack role list -c Name
+-----------------+
| Name            |
+-----------------+
| heat_stack_user |
| ResellerAdmin   |
| _member_        |
| swiftoperator   |
| admin           |
+-----------------+
6. Install the Tempest testing service and component tests. Create a test configuration directory, and populate it with configuration files using the configure-tempest-directory script. Run the config_tempest script to configure the tests for the overcloud, using overcloudrc environment parameters, the external provider-172.25.250 network ID, and the cirros-0.3.4-x86_64-disk.img image from http://materials.example.com.

6.1. Install the tempest package and all available component test packages.

[stack@director ~]$ sudo yum -y install openstack-tempest{,-all}
6.2. Create a test configuration working directory. Run the configure-tempest-directory script from the new directory. The script populates the working directory with configuration files.

[stack@director ~]$ mkdir ~/tempest
[stack@director ~]$ cd ~/tempest
[stack@director tempest]$ /usr/share/openstack-tempest-13.0.0/tools/configure-tempest-directory
6.3. Locate the network ID for the provider-172.25.250 external network.

[stack@director tempest]$ openstack network show provider-172.25.250 \
-c id -f value
1eef8ec9-d4be-438b-bf18-381a40cbec60
6.4. Run the config_tempest setup script using the external network ID. This populates the tempest configuration files based on the components currently installed.

[stack@director tempest]$ tools/config_tempest.py \
--deployer-input ~/tempest-deployer-input.conf --debug \
--create identity.uri $OS_AUTH_URL identity.admin_password $OS_PASSWORD \
--image http://materials.example.com/cirros-0.3.4-x86_64-disk.img \
--network-id 1eef8ec9-d4be-438b-bf18-381a40cbec60
2017-06-19 16:02:56.499 10562 INFO tempest [-] Using tempest config file /etc/tempest/tempest.conf
2017-06-19 16:02:57.415 10562 INFO __main__ [-] Reading defaults from file '/home/stack/tempest/etc/default-overrides.conf'
2017-06-19 16:02:57.418 10562 INFO __main__ [-] Adding options from deployer-input file '/home/stack/tempest-deployer-input.conf'
...output omitted...
7. Configure and run a smoke test. The dynamic configuration in the previous step included mistral and designate component tests, which are not installed in this overcloud. Edit the configuration to disable mistral and designate testing. Use the test skip file found in student's Downloads directory on workstation to also exclude tests for API versions not in use on this overcloud. Exit from director after the test run.

7.1. Edit the etc/tempest.conf testing configuration file to mark components as not available. Locate and edit the service_available section to disable mistral and designate testing. Leave existing entries; only add mistral and designate as False. The section should appear as shown when done.

[stack@director tempest]$ cat ./etc/tempest.conf
...output omitted...
[service_available]
glance = True
manila = False
cinder = True
swift = True
sahara = False
nova = True
neutron = True
trove = False
ceilometer = True
ironic = False
heat = True
zaqar = False
horizon = True
mistral = False
designate = False
...output omitted...
7.2. Create a file named tempest-smoke-skip to list tests to run and tests to skip. Locate the sample file named tempest-smoke-skip-sample in student's Downloads directory on workstation. Copy the file to the Testing service working directory on director and rename it. Review the entries in the skip file. [stack@director tempest]$ scp \ student@workstation:Downloads/tempest-smoke-skip-sample ./tempest-smoke-skip Warning: Permanently added 'workstation,172.25.250.254' (ECDSA) to the list of known hosts. student@workstation's password: student tempest-smoke-skip 100% 998 1.0KB/s 00:00 [stack@director tempest]$ cat ./tempest-smoke-skip +.*smoke -ceilometer.* -designate_tempest_plugin.* -inspector_tempest_plugin.* -manila_tempest_tests.* -mistral_tempest_tests.* -neutron.* -neutron_fwaas.* -neutron_vpnaas.* -sahara_tempest_plugin.* -tempest.api.data_processing.* -tempest.api.identity.* -tempest.api.image.* -tempest.api.network.* -tempest.api.object_storage.* -tempest.api.orchestration.* -tempest.api.volume.* -tempest.scenario.*
7.3. Run the tempest cleanup command to save a list of pre-existing cloud resources. [stack@director tempest]$ tempest cleanup --init-saved-state
7.4. Run the tests, specifying tempest-smoke-skip as the skip file. Although no test failures are expected, view the output for any that occur to observe the troubleshooting information provided by the Testing Service. This command may take 10 minutes or longer to complete. [stack@director tempest]$ tools/run-tests.sh --skip-file ./tempest-smoke-skip \ --concurrency 1 ====== Totals ====== Ran: 13 tests in 93.0000 sec. - Passed: 13 - Skipped: 0 - Expected Fail: 0 - Unexpected Success: 0 - Failed: 0 Sum of execute time for each test: 15.4832 sec. ============== Worker Balance ============== - Worker 0 (13 tests) => 0:01:26.541830
7.5. Run the tempest cleanup command to remove resources not listed in the earlier save list. There may be none to delete, if all tests completed successfully and performed their own cleanups. [stack@director tempest]$ tempest cleanup
7.6. Finish the test results review, then exit director. [stack@director tempest]$ exit [student@workstation ~]$
Cleanup On workstation, run the lab deployment-overcloud-verify cleanup script to clean up this exercise. [student@workstation ~]$ lab deployment-overcloud-verify cleanup
Lab: Managing an Enterprise OpenStack Deployment
In this lab, you will validate that the overcloud is functional by creating the required resources and deploying a server instance as a new user in a new project. The lab is designed to be accomplished using the OpenStack CLI, but you can also perform tasks using the dashboard (http://dashboard.overcloud.example.com). You can find the admin password in the /home/stack/overcloudrc file on director.
Outcomes
You should be able to:
• Create the resources required to deploy a server instance.
• Deploy and verify an external instance.
Before you begin
Log in to workstation as student using student as the password. On workstation, run the lab deployment-review setup command. The script checks that the m1.web flavor, the rhel7 image, and the provider-172.25.250 network exist to test instance deployment. The script also checks that the default admin account is available. [student@workstation ~]$ lab deployment-review setup
Steps
1. On workstation, load the admin user environment file. To prepare for deploying a server instance, create the production project in which to work, and an operator1 user with the password redhat. Create an authentication environment file for this new user.
2. The lab setup script preconfigured an external provider network and subnet, an image, and multiple flavors. Working as the operator1 user, create the security resources required to deploy this server instance, including a key pair named operator1-keypair1.pem placed in student's home directory, and a production-ssh security group with rules for SSH and ICMP.
3. Create the network resources required to deploy an external instance, including a production-network1 network, a production-subnet1 subnet using the range 192.168.0.0/24, a DNS server at 172.25.250.254, and a production-router1 router. Use the external provider-172.25.250 network to provide a floating IP address.
4. Deploy the production-web1 server instance using the rhel7 image and the m1.web flavor.
5. When deployed, use ssh to log in to the instance. From the instance, verify network connectivity by using ping to reach the external gateway at 172.25.250.254. Exit the production-web1 instance when finished.
Evaluation On workstation, run the lab deployment-review grade command to confirm the success of this exercise.
[student@workstation ~(operator1-production)]$ lab deployment-review grade
Cleanup On workstation, run the lab deployment-review cleanup script to clean up this exercise. [student@workstation ~(operator1-production)]$ lab deployment-review cleanup
Solution
In this lab, you will validate that the overcloud is functional by creating the required resources and deploying a server instance as a new user in a new project. The lab is designed to be accomplished using the OpenStack CLI, but you can also perform tasks using the dashboard (http://dashboard.overcloud.example.com). You can find the admin password in the /home/stack/overcloudrc file on director.
Outcomes
You should be able to:
• Create the resources required to deploy a server instance.
• Deploy and verify an external instance.
Before you begin
Log in to workstation as student using student as the password. On workstation, run the lab deployment-review setup command. The script checks that the m1.web flavor, the rhel7 image, and the provider-172.25.250 network exist to test instance deployment. The script also checks that the default admin account is available. [student@workstation ~]$ lab deployment-review setup
Steps
1. On workstation, load the admin user environment file. To prepare for deploying a server instance, create the production project in which to work, and an operator1 user with the password redhat. Create an authentication environment file for this new user. 1.1. On workstation, source the admin-rc authentication environment file in the student home directory. View the admin password in the OS_PASSWORD variable. [student@workstation ~]$ source admin-rc [student@workstation ~(admin-admin)]$ env | grep "^OS_" OS_REGION_NAME=regionOne OS_PASSWORD=mbhZABea3qjUTZGNqVMWerqz8 OS_AUTH_URL=http://172.25.250.50:5000/v2.0 OS_USERNAME=admin OS_TENANT_NAME=admin
1.2. As admin, create the production project and the operator1 user. [student@workstation ~(admin-admin)]$ openstack project create \ --description Production production ...output omitted... [student@workstation ~(admin-admin)]$ openstack user create \ --project production --password redhat --email [email protected] operator1
1.3. Create a new authentication environment file by copying the existing admin-rc file. [student@workstation ~(admin-admin)]$ cp admin-rc operator1-production-rc
1.4. Edit the file with the new user's settings. Match the settings shown here.
unset OS_SERVICE_TOKEN export OS_AUTH_URL=http://172.25.250.50:5000/v2.0 export OS_PASSWORD=redhat export OS_REGION_NAME=regionOne export OS_TENANT_NAME=production export OS_USERNAME=operator1 export PS1='[\u@\h \W(operator1-production)]\$ '
2. The lab setup script preconfigured an external provider network and subnet, an image, and multiple flavors. Working as the operator1 user, create the security resources required to deploy this server instance, including a key pair named operator1-keypair1.pem placed in student's home directory, and a production-ssh security group with rules for SSH and ICMP. 2.1. Source the new environment file. Remaining lab tasks must be performed as this production project member. [student@workstation ~(admin-admin)]$ source operator1-production-rc
2.2. Create a keypair. Redirect the command output into the operator1-keypair1.pem file. Set the required permissions on the key pair file. [student@workstation ~(operator1-production)]$ openstack keypair create \ operator1-keypair1 > /home/student/operator1-keypair1.pem [student@workstation ~(operator1-production)]$ chmod 600 operator1-keypair1.pem
2.3. Create a security group with rules for SSH and ICMP access. [student@workstation ~(operator1-production)]$ openstack security group \ create production-ssh ...output omitted... [student@workstation ~(operator1-production)]$ openstack security group \ rule create --protocol tcp --dst-port 22 production-ssh ...output omitted... [student@workstation ~(operator1-production)]$ openstack security group \ rule create --protocol icmp production-ssh ...output omitted...
3. Create the network resources required to deploy an external instance, including a production-network1 network, a production-subnet1 subnet using the range 192.168.0.0/24, a DNS server at 172.25.250.254, and a production-router1 router. Use the external provider-172.25.250 network to provide a floating IP address. 3.1. Create a project network and subnet. [student@workstation ~(operator1-production)]$ openstack network create \ production-network1 ...output omitted... [student@workstation ~(operator1-production)]$ openstack subnet create \ --dhcp \ --subnet-range 192.168.0.0/24 \ --dns-nameserver 172.25.250.254 \ --network production-network1 \
production-subnet1 ...output omitted...
3.2. Create a router. Set the gateway address. Add the internal network interface. [student@workstation ~(operator1-production)]$ openstack router create \ production-router1 ...output omitted... [student@workstation ~(operator1-production)]$ neutron router-gateway-set \ production-router1 provider-172.25.250 ...output omitted... [student@workstation ~(operator1-production)]$ openstack router add subnet \ production-router1 production-subnet1 ...output omitted...
3.3. Create a floating IP, taken from the external network. You will use this address to deploy the server instance. [student@workstation ~(operator1-production)]$ openstack floating ip \ create provider-172.25.250 +---------------------+--------------------------------------+ | Field | Value | +---------------------+--------------------------------------+ ...output omitted... | floating_ip_address | 172.25.250.N | ...output omitted...
4. Deploy the production-web1 server instance using the rhel7 image and the m1.web flavor. 4.1. Deploy the server instance, and verify the instance has an ACTIVE status. [student@workstation ~(operator1-production)]$ openstack server create \ --nic net-id=production-network1 \ --security-group production-ssh \ --image rhel7 \ --flavor m1.web \ --key-name operator1-keypair1 \ --wait production-web1 ...output omitted... [student@workstation ~(operator1-production)]$ openstack server show \ production-web1 -c status -f value ACTIVE
4.2. Attach the floating IP address to the active server. [student@workstation ~(operator1-production)]$ openstack server add \ floating ip production-web1 172.25.250.N ...output omitted...
5. When deployed, use ssh to log in to the instance. From the instance, verify network connectivity by using ping to reach the external gateway at 172.25.250.254. Exit the production-web1 instance when finished.
5.1. Use the ssh command with the key pair to log in to the instance as the cloud-user user at the floating IP address. [student@workstation ~(operator1-production)]$ ssh -i operator1-keypair1.pem \ cloud-user@172.25.250.N
5.2. Test for external network access. Ping the network gateway from production-web1. [cloud-user@production-web1 ~]$ ping -c3 172.25.250.254 PING 172.25.250.254 (172.25.250.254) 56(84) bytes of data. 64 bytes from 172.25.250.254: icmp_seq=1 ttl=63 time=0.804 ms 64 bytes from 172.25.250.254: icmp_seq=2 ttl=63 time=0.847 ms 64 bytes from 172.25.250.254: icmp_seq=3 ttl=63 time=0.862 ms --- 172.25.250.254 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2001ms rtt min/avg/max/mdev = 0.804/0.837/0.862/0.041 ms
5.3. When finished testing, exit the production-web1 server instance. [cloud-user@production-web1 ~]$ exit [student@workstation ~(operator1-production)]$
Evaluation On workstation, run the lab deployment-review grade command to confirm the success of this exercise. [student@workstation ~(operator1-production)]$ lab deployment-review grade
Cleanup On workstation, run the lab deployment-review cleanup script to clean up this exercise. [student@workstation ~(operator1-production)]$ lab deployment-review cleanup
Summary
In this chapter, you learned:
• Enterprise clouds today are built using multiple, interconnected cloud structures. The undercloud is a provisioning and management cloud for building and managing the production clouds. Red Hat OpenStack Platform director is the undercloud in Red Hat OpenStack Platform.
• An enterprise production cloud is known as an overcloud. Underclouds and overclouds utilize the same technologies, but manage different workloads. Underclouds manage cloud infrastructure, while overclouds manage production, tenant workloads.
• There are three major steps in overcloud provisioning. Introspection discovers and queries available systems to gather node capabilities. Orchestration uses templates and environment files to configure everything about the cloud deployment. Testing is designed to validate all the standard functionality of the components that were installed.
• Common open technologies are used in physical and virtual clouds. Intelligent Platform Management Interface (IPMI) is the power management technology used to control nodes. Virtual Network Computing (VNC) is the remote access technology used to access deployed instance consoles.
• The introspection process defines the basic technical characteristics of nodes to be deployed. Using those characteristics, overcloud deployment can automatically assign deployment roles to specific nodes.
• The orchestration process defines the specific configuration for each node's hardware and software. The provided default templates cover a majority of common use cases and designs.
• OpenStack includes a testing component which has hundreds of tests to verify every component in an overcloud. Tests and configuration are completely customizable, and include short, validation smoke tests and longer running, more comprehensive full tests.
CHAPTER 2
MANAGING INTERNAL OPENSTACK COMMUNICATION
Overview
Goal: Administer the Keystone identity service and the AMQP messaging service.
Objectives:
• Describe the user and service authentication architecture.
• Administer the service catalog.
• Manage messages with the message broker.
Sections:
• Describing the Identity Service Architecture (and Quiz)
• Administering the Service Catalog (and Guided Exercise)
• Managing Message Brokering (and Guided Exercise)
Lab:
• Managing Internal OpenStack Communication
Describing the Identity Service Architecture
Objectives
After completing this section, students should be able to:
• Describe the Identity Service architecture
• Compare and contrast the available token providers
• Describe differences between Identity Service versions
Identity Service Architecture
The OpenStack Identity Service (code named Keystone) provides authentication, role-based authorization, policy management, and token handling using internal service functions categorized as identity, policy, token, resource, role assignment, and catalog. The Identity Service API is available at configurable endpoints segregated by public and internal traffic. The API can be provided redundantly by multiple Controller nodes using Pacemaker with a virtual IP (VIP) address. The internal service functions manage different aspects of the Identity Service:
Identity
Identity encompasses authentication and authorization functions. Users are a digital representation of a person, system, or service using other OpenStack services. Users are authenticated before requesting services from OpenStack components. Users must be assigned a role to participate in a project. Users may be managed using groups, introduced in Identity Service v3, which can be assigned roles and attached to projects the same as individual users. Projects (also referred to by the deprecated term tenant) are collections of owned resources such as networks, images, servers, and security groups. These are structured according to the development needs of an organization. A project can represent a customer, account, or any organizational unit. With Identity Service v3, projects can contain sub-projects, which inherit project role assignments and quotas from higher projects.
Resource
Resource functions manage domains, which are an Identity Service v3 entity for creating segregated collections of users, groups, and projects. Domains allow multiple organizations to share a single OpenStack installation. Users, projects, and resources created in one domain cannot be transferred to another domain; by design, they must be recreated. OpenStack creates a single domain named default for a new installation. In Identity Service v2, multiple domains are not recognized and all activities use the default domain.
Token
Token functions create, manage, and validate time-limited tokens which users pass to other OpenStack components to request service. A token is a structured enumeration of user access rights designed to simplify the requirement that each individual OpenStack service request be verified for sufficient user privilege. Token protocols have evolved since the early OpenStack days, and are discussed further in this chapter.
Policy
Policy functions provide a rule-based authorization engine and an associated rule management interface. Policy rules define the capabilities of roles. Default roles include admin, _member_, swiftoperator, and heat_stack_user. Custom roles may be created by building policies.
Role Assignment
Role assignment functions are used to assign users to projects. Users do not belong to projects; instead, they have a role in a project. Users may be assigned multiple roles for the same project, and may also be assigned different roles in multiple projects. Roles define a set of user privileges to perform specific operations on OpenStack services, defined by policy definitions. The most commonly recognized roles are _member_, which can perform all normal activities within a project, and admin, which adds additional permissions to create users, projects, and other restricted resource objects.
Catalog
Catalog functions store connection information about every other OpenStack service component, in the form of endpoints. The catalog contains multiple endpoint entries for each service, to allow service traffic to be segregated by public, internal, and administration tasks for traffic management and security reasons. Since OpenStack services may be redundantly installed on multiple controller and compute nodes, the catalog contains endpoints for each. When users authenticate and obtain a token to use when accessing services, they are, at the same time, being given the current URL of the requested service.
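Role assignments can be inspected and changed with the openstack client. The following is a minimal sketch, not taken from the classroom exercises; demo-project and demo-user are placeholder names, and the commands assume an admin-scoped authentication environment file has already been sourced.
[user@demo ~(admin)]$ openstack role add --project demo-project --user demo-user _member_
[user@demo ~(admin)]$ openstack role assignment list --project demo-project --user demo-user
The first command grants the _member_ role in the project; the second lists the resulting assignment so the change can be verified.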
Note
Red Hat OpenStack Platform supports both Identity Service v2 and v3. Identity v3 requires the use of the new authentication environment variables OS_IDENTITY_API_VERSION and OS_DOMAIN_NAME, and a change to the OS_AUTH_URL for the new version's endpoint. This Red Hat OpenStack Administration II course only uses Identity Service v2.
Each listed Identity Service function supports multiple choices of back ends, defined through plug-ins, which can be one of the following types (not all functions support all back-end types):
• Key Value Store: A file-based or in-memory dictionary using primary key lookups.
• Memcached: A distributed-memory shared caching structure.
• Structured Query Language: OpenStack uses SQLAlchemy as the default persistent data store for most components. SQLAlchemy is a Python-based SQL toolkit.
• Pluggable Authentication Module: Using the Linux PAM authentication service.
• Lightweight Directory Access Protocol: Uses the LDAP protocol to connect to an existing back-end directory, such as IdM or AD, for user authentication and role information.
Configuration files are located in the /etc/keystone directory:
Configuration and Log Files in /etc/keystone
keystone.conf: The primary configuration file defines drivers, credentials, token protocols, filters and policies, and security attributes.
keystone-paste.ini: Specified by the config_file parameter in the primary configuration file, this file provides PasteDeploy configuration entries. PasteDeploy is a method for configuring a WSGI pipeline and server, specified from an INI-style file rather than being hard-coded into program code. This configuration file defines the WSGI server, the applications used, and the middleware pipelines and filters that process requests.
logging.conf: Specifies the logging configuration for the Identity Service.
policy.json: Specifies role-based access policies determining which user can access which objects and how they can be accessed.
default_catalog.templates: The relative API endpoints for all OpenStack services are defined in this template file, which is referenced in the primary configuration file.
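On a node running the Identity Service (the undercloud, or an overcloud controller), the presence of these files and individual settings can be checked quickly. This is a minimal sketch, assuming the crudini utility is available on the node; the [token] provider option queried here selects the token provider discussed in the next section.
[user@demo ~]$ ls /etc/keystone/
[user@demo ~]$ sudo crudini --get /etc/keystone/keystone.conf token provider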
Authentication Tokens
The Identity Service confirms a user's identity through an authentication process specified through plug-in configuration, then provides the user with a token that represents the user's identity. A typical user token is scoped, meaning that it lists the resources and access for which it may be used. Tokens have a limited time frame, allowing the user to perform service requests without further authentication until the token expires or is revoked. A scoped token lists the user rights and privileges, as defined in roles relevant to the current project. A requested OpenStack service checks the provided roles and requested resource access, then either allows or denies the requested service. Any user may use the openstack token issue command to request a current scoped token with output showing the user id, the (scope) project, and the new token expiration. This token type is actually one of three types of authorization scope: unscoped, project-scoped, and domain-scoped. Because domains are a new feature supported in the Identity Service v3, earlier documentation may refer only to scoped and unscoped tokens, in which scope is project-based.
Unscoped: Unscoped tokens are authentication-only tokens that do not contain a project, role, and service information payload. For example, an unscoped token may be used when authentication is provided by an Identity Provider other than the Identity Service, such as an LDAP, RADIUS, or AD server. The token is used to authenticate with the Identity Service, which then exchanges the unscoped token with the authenticated user's appropriate scoped token. An unscoped token may also be referred to as an Identity Service default token, which is not associated with a project or domain and may be exchanged for a scoped token.
Project-scoped: Project-scoped tokens provide authorization to perform operations on a service endpoint utilizing the resources of a single project, allowing activities specified by the user's role in that project. These tokens contain the relevant service catalog, roles, and project information as payload and are considered to be associated to a specific project.
Domain-scoped: Domain-scoped tokens apply to services that occur at the domain level, not at the project or user level. This token's payload contains the domain's service catalog, and is limited to services that do not require per-project endpoints. The token payload also contains project and role information for the user, within the specified domain.
Token Providers
There are four types of token providers: UUID, PKI, PKIZ, and the newest provider, Fernet (pronounced fehr'nεt). All tokens are comprised of a payload, in JSON or random-generated UUID format, contained in a transport format, such as a URL-friendly hexadecimal or cryptographic message syntax (CMS) packaging. The default OpenStack recommended token provider has changed a few times, as the OpenStack developers have addressed token size, security, and performance issues.
UUID Tokens
UUID tokens were the original and default token provider up until the Folsom release. They are 32 byte randomly generated UUIDs, which must be persistently stored in the Identity Service's configured back end to permit the Identity Service to validate the UUID each time a user makes a service request to any service endpoint. Although UUIDs are lightweight and easy to validate with a simple lookup, they have two disadvantages. First, because UUID tokens must be retained by the Identity Service back end for repetitive lookups, the storage space used grows as new tokens are generated. Until recently, expired tokens were not regularly purged from the back-end store, leading to service performance degradation. Second, every individual service API call must bundle the request and token together to send to the service component, where the service unpacks the UUID and sends a validation request to the Identity Service. The Identity Service looks up the token's identity to determine the roles and authorizations of the user, sending the information back to the resource service to determine if the service component will process the user request. This generates a tremendous amount of network traffic and activity to and from the Identity Service, which creates a scaling limitation.
PKI and PKIZ Tokens
Public Key Infrastructure (PKI) tokens were introduced in the Grizzly release as a solution that would decrease the scale-limiting overhead on the Identity Service back end and increase the security of tokens by using certificates and keys to sign and validate tokens. PKI uses a JSON payload, asymmetric keys, and the cryptographic message syntax (CMS) transport format. PKIZ tokens apply zlib compression after the JSON payload in an attempt to shrink the total token size, which typically exceeds 1600 bytes. The payload contains the service catalog with a size generally proportional to the number of service entries in the catalog. The advantage of PKI tokens, because of the public key methodology, is the ability of the requested resource service component to verify and read the payload authorizations without needing to send the token back to the Identity Service for every request. To process request tokens, the requested service is only required to obtain the Identity Service's signing certificate, the current revocation list, and the CA public certificate that validates the signing certificate. Validated and unencoded tokens and payloads can be stored and shared using memcache, eliminating some repetitive token processing overhead. The disadvantage of the PKI token provider method is unacceptable performance due to oversized shared caches, increased load on the identity service back end, and other problems associated with handling tokens with large payloads. PKI tokens take longer to create and to validate than UUID tokens. Subsequently, UUID tokens again became the recommended token provider. PKI/PKIZ token support was deprecated in the Mitaka release and was removed in the Ocata release.
Fernet Tokens
Fernet tokens are an implementation of a symmetric key cryptographic authentication method, which uses the same key to both encrypt and decrypt, designed specifically to process service API request tokens. Fernet supports using multiple keys, always using the first key (the current key) in the list to perform encryption and attempting other keys in the list (former keys and about-to-become-current staged keys) to perform decryption. This technique allows Fernet keys to be rotated regularly for increased security, while still allowing tokens created with previous keys to be decrypted. Fernet tokens do not exceed 250 bytes and are not persisted in the Identity Service back end. Fernet token payloads use the MessagePack binary serialization format to efficiently carry the authentication and authorization metadata, which is then encrypted and signed. Fernet tokens do not require persistence nor do they require maintenance, as they are created and validated instantaneously on any Identity Service node that can access the Fernet symmetric keys. The symmetric keys are stored and shared on all Identity Service nodes in a key repository located by default at /etc/keystone/fernet-keys/. The Fernet token provider was introduced in the Kilo release and is the default token provider in the Ocata release. In earlier OpenStack developer documentation, these tokens were referred to as authenticated encryption (AE) tokens.
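Fernet key management is handled with the keystone-manage utility. The following is a minimal, illustrative sketch of initializing and rotating the key repository on a node with local access to /etc/keystone/fernet-keys/; it is not part of the course exercises.
[user@demo ~]# keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
[user@demo ~]# keystone-manage fernet_rotate --keystone-user keystone --keystone-group keystone
[user@demo ~]# ls /etc/keystone/fernet-keys/
After a rotation, the file with the highest index is the primary key and the file named 0 is the staged key, as described under Identity Service Administration later in this section.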
Warning
All of these token providers (UUID, PKI, PKIZ, and Fernet) are known as bearer tokens, which means that anyone holding the token can impersonate the user represented in that token without having to provide any authentication credentials. Bearer tokens must be protected from unnecessary disclosure to prevent unauthorized access.
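To see why bearer tokens need protection, note that the token value alone is enough to call a service API. A minimal sketch, using the classroom Identity Service public URL and the same X-Auth-Token header that appears later in this chapter:
[user@demo ~(admin)]$ TOKEN=$(openstack token issue -f value -c id)
[user@demo ~(admin)]$ curl -s -H "X-Auth-Token: $TOKEN" http://172.25.250.50:5000/v2.0/tenants
Anyone who obtains the value stored in TOKEN can issue the same request until the token expires or is revoked.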
Identity Service Administration
Token providers typically require minimal management tasks after they have been properly installed and configured. UUID tokens require flushing expired tokens regularly. Fernet tokens require rotating keys regularly for security, and distributing the key repository among all Identity Service nodes in a multi-node HA deployment. PKI token providers require maintenance of certificates, expirations, and revocations, plus management of the persistent store. Since PKI tokens are deprecated, this section only discusses UUID and Fernet token tasks.
Flushing Expired UUID Tokens
By default, the Identity Service's expired tokens remain stored in its database, increasing the database size and degrading service performance. Red Hat recommends changing the daily token_flush cron job to run hourly to find and flush expired tokens. In /var/spool/cron/keystone, modify the task to be hourly (instead of the default daily) and redirect output to a log file: PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh @hourly keystone-manage token_flush &> /var/log/keystone/keystone-tokenflush.log
If necessary, the tokens flushed in the last hour can be viewed in the log file /var/log/keystone/keystone-tokenflush.log. The log file does not grow in size, since the cron
job overwrites the log file each hour. When the cron job is first modified, the token database will be larger than it will need to be in the future, since it will now be flushed hourly. However, the database will not automatically reclaim unused space and should be truncated to relinquish all currently used disk space: [user@demo ~]# echo "TRUNCATE TABLE token" | sudo mysql -D keystone
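The same flush that the cron job performs can also be run on demand; a minimal sketch, run on the Identity Service node (the keystone user is used so that any files written keep the expected ownership):
[user@demo ~]# sudo -u keystone keystone-manage token_flush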
Rotating Fernet Tokens
Fernet tokens do not require persistence, but the Fernet symmetric keys must be shared by all Identity Service nodes that may be asked to validate a Fernet token. Since tokens should be replaced on a regular basis to minimize the ability to create impersonated Fernet tokens, the Fernet token provider uses a rotation method to put new symmetric keys into use without breaking the ability to decrypt Fernet tokens created with a previous key. To understand key rotation, the terminology of Fernet key usage is descriptive:
• Primary Key: the primary key is considered to be the current key. There can only be one primary key on a single Identity Service node, recognized because its file name always has the highest index number. Primary keys are used to both encrypt and decrypt Fernet tokens.
• Secondary Key: a secondary key is the key that was formerly a primary key and has been replaced (rotated out). It is only used to decrypt Fernet tokens; specifically, to decrypt any remaining Fernet tokens that it had originally encrypted. A secondary key's file is named with an index that is lower than the highest, but never has the index of 0.
• Staged Key: a staged key is a newly added key that will be the next primary key when the keys are next rotated. Similar to a secondary key, it is only used to decrypt tokens, which seems unnecessary since it has not yet been a primary key and has never encrypted tokens on this Identity Service node. However, in a multi-node Identity Service configuration, after the key repository has been updated with a new staged key and distributed to all Identity Service nodes, those nodes will perform key rotation one at a time. A staged key on one node may be needed to decrypt tokens created by another node where that key has already become the primary key. The staged key is always recognized by having a file name with the index of 0.
Service Account Deprecation
OpenStack users learn about the default service accounts that are created by a typical installation and are role-assigned to the service project. These service users have names for each of the OpenStack service components, such as keystone, nova, glance, neutron, swift, and cinder. The primary purpose of these accounts is to be the service side of two-way PKI certificate authentication protocols. As the OpenStack Identity Service developers move towards future tokenless authentication methods in the Pike, Queens, and later releases, and the removal of the PKI token provider in the Ocata release, these service accounts will no longer be necessary and will also be removed in a future release.
Endpoint Deprecation for adminURL
Since the beginning of the Identity Service, there have always been three types of endpoints: publicURL, internalURL, and adminURL. Originally, the endpoints were designed to segregate traffic onto public or private networks for security reasons. The admin_url endpoint was implemented only in the Identity Service, where a small set of additional API functions allowed an admin to bootstrap the Identity Service. Other services did not implement admin-only API distinctions. In later OpenStack releases, having a separate adminURL endpoint became unnecessary because users could be checked for their role privileges no matter which endpoint they used, and allowed access to admin-only privileges accordingly.
When the Identity Service v2 API becomes deprecated in some future release, the last remaining adminURL distinction, that of the end user and admin CRUD PasteDeploy pipeline routines, will no longer be necessary and the adminURL endpoint will also be deprecated and removed.
References
Keystone tokens: https://docs.openstack.org/keystone/latest/admin/identity-tokens.html
Quiz: Describing the Identity Service Architecture
Choose the correct answer(s) to the following questions:
1. Which service in the Keystone architecture is responsible for domains?
a. Policy
b. Resource
c. Catalog
d. Token
e. User
2. Which service in the Keystone architecture provides a rule-based authorization engine?
a. Policy
b. Resource
c. Catalog
d. Token
e. User
3. Which type of token authorization describes tokens that are not attached to a project?
a. Scoped Token
b. Domain Token
c. Unscoped Token
d. PKI Token
4. Which Keystone configuration file contains role-based access policy entries that determine which user can access which objects and how they can be accessed?
a. policy.json
b. default_catalog.templates
c. keystone-paste.ini
d. keystone-env.conf
5. Which two token providers use cryptographic message syntax (CMS)? (Choose two.)
a. Fernet
b. PKI
c. PKIZ
d. Scoped token
e. UUID
Solution
Choose the correct answer(s) to the following questions:
1. Which service in the Keystone architecture is responsible for domains?
a. Policy
b. Resource (correct)
c. Catalog
d. Token
e. User
2. Which service in the Keystone architecture provides a rule-based authorization engine?
a. Policy (correct)
b. Resource
c. Catalog
d. Token
e. User
3. Which type of token authorization describes tokens that are not attached to a project?
a. Scoped Token
b. Domain Token
c. Unscoped Token (correct)
d. PKI Token
4. Which Keystone configuration file contains role-based access policy entries that determine which user can access which objects and how they can be accessed?
a. policy.json (correct)
b. default_catalog.templates
c. keystone-paste.ini
d. keystone-env.conf
5. Which two token providers use cryptographic message syntax (CMS)? (Choose two.)
a. Fernet
b. PKI (correct)
c. PKIZ (correct)
d. Scoped token
e. UUID
Administering the Service Catalog
Objective
After completing this section, students should be able to administer the service catalog.
Keystone Service Catalog
The service catalog is a crucial element of the Keystone architecture. The service catalog provides a list of endpoint URLs that can be dynamically discovered by API clients. The service endpoints that can be accessed by a token are provided by the service catalog. Without a service catalog, API clients would be unaware of which URL an API request should use. The openstack catalog show command displays catalog information for a service. The service name is passed as an argument to the command. For example, to view the service catalog data for nova compute, use the following command:
[user@demo ~(admin)]$ openstack catalog show nova
+-----------+----------------------------------------------------+
| Field     | Value                                              |
+-----------+----------------------------------------------------+
| endpoints | regionOne                                          |
|           |   publicURL: http://172.25.250.50:8774/v2.1        |
|           |   internalURL: http://172.24.1.50:8774/v2.1        |
|           |   adminURL: http://172.24.1.50:8774/v2.1           |
| name      | nova                                               |
| type      | compute                                            |
+-----------+----------------------------------------------------+
In this output, regionOne is the region for the URLs; the endpoints field lists the public, internal, and admin URLs by which an API client request can access the Nova compute service; name is the user-facing service name; and type is the OpenStack registered type, such as image-service and object-store.
Endpoints
An endpoint is a URL that an API client uses to access a service in OpenStack. Every service has one or more endpoints. There are three types of endpoint URLs: adminURL, publicURL, and internalURL. The adminURL is meant only for access requiring administrative privileges. The internalURL is used by services to communicate with each other on a network that is unmetered or free of bandwidth charges. The publicURL is designed with the intention of being consumed by end users from a public network. To list the services and their endpoints, use the openstack catalog list command as the OpenStack admin user. [user@demo ~(admin)]$ openstack catalog list +---------+---------+------------------------------------------------+
| Name | Type | Endpoints | +---------+---------+------------------------------------------------+ | nova | compute | regionOne | | | | publicURL: https://172.25.249.201:13774/v2.1 | | | | internalURL: http://172.25.249.200:8774/v2.1 | | | | adminURL: http://172.25.249.200:8774/v2.1 | | | | | | neutron | network | regionOne | | | | publicURL: https://172.25.249.201:13696 | | | | internalURL: http://172.25.249.200:9696 | | | | adminURL: http://172.25.249.200:9696 | ...output omitted...
To list the ID, region, service name, and service type of all the endpoints, use the openstack endpoint list command. [user@demo ~(admin)]$ openstack endpoint list +----------------------------------+-----------+--------------+----------------+ | ID | Region | Service Name | Service Type | +----------------------------------+-----------+--------------+----------------+ | d1812da138514794b27d266a22f66b15 | regionOne | aodh | alarming | | b1484c933ba74028965a51d4d0aa9f04 | regionOne | nova | compute | | 4c6117b491c243aabbf40d7dfdf5ce9a | regionOne | heat-cfn | cloudformation | | eeaa5964c26042e38c632d1a12e001f3 | regionOne | heat | orchestration | | 1aeed510fa9a433795a4ab5db80e19ec | regionOne | glance | image | ...output omitted...
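Catalog administration also covers registering a new service and its endpoints. The following is a minimal, hypothetical sketch using the Identity v2 form of the commands, consistent with this course; the demo-metrics name, the demo-metric type, and the URLs are placeholders and are not part of the classroom environment.
[user@demo ~(admin)]$ openstack service create --name demo-metrics \
  --description "Demo metrics API" demo-metric
[user@demo ~(admin)]$ openstack endpoint create --region regionOne \
  --publicurl http://172.25.250.50:9999 \
  --internalurl http://172.24.1.50:9999 \
  --adminurl http://172.24.1.50:9999 \
  demo-metrics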
Troubleshooting
A proper catalog and endpoint configuration are essential for the OpenStack environment to function effectively. Common issues that lead to troubleshooting are misconfigured endpoints and failed user authentication. There is a known issue documented in BZ-1404324 where the scheduled token flushing job is not effective enough for large deployments; the fix is reviewed in the following guided exercise. When issues do arise, there are steps that can be taken to investigate and find a resolution to the issue. The following is a list of troubleshooting steps: • Ensure the authentication credentials and token are appropriate using the curl command to retrieve the service catalog. [user@demo ~(admin)]$ curl -s -X POST http://172.25.250.50:35357/v2.0/tokens \ -d '{"auth": {"passwordCredentials": {"username":"admin", \ "password":"Y7Q72DfAjKjUgA2G87yHEJ2Bz"}, "tenantName":"admin"}}' \ -H "Content-type: application/json" | jq . { "access": { "metadata": { "roles": [ "f79b6d8bfada4ab89a7d84ce4a0747ff" ], "is_admin": 0 }, "user": { "name": "admin", "roles": [ { "name": "admin" } ], "id": "15ceac73d7bb4437a34ee26670571612",
"roles_links": [], "username": "admin" }, "serviceCatalog": [ { "name": "nova", "type": "compute", "endpoints_links": [], "endpoints": [ ...output omitted...
• Inspect the /var/log/keystone/keystone.log for [Errno 111] Connection refused errors. This indicates there is an issue connecting to a service endpoint. 2017-06-04 14:07:49.332 2855 ERROR oslo.messaging._drivers.impl_rabbit [req-1b8d5196-d787-49db-be60-025ce0ab575d - - - - -] [73809126-9833-487aa69a-4a7d9dffd08c] AMQP server on 172.25.249.200:5672 is unreachable: [Errno 111] Connection refused. Trying again in 1 seconds. Client port: None
• Every service has an API log that should be inspected when troubleshooting endpoints. For example, if an operator cannot retrieve Glance image data, an inspection of /var/log/glance/api.log may provide useful information. Query the file for DiscoveryFailure. DiscoveryFailure: Could not determine a suitable URL for the plugin 2017-05-30 04:31:17.650 277258 INFO eventlet.wsgi.server [-] 172.24.3.1 - - [30/May/2017 04:31:17] "GET /v2/images HTTP/1.1" 500 139 0.003257
• Include the --debug option with the openstack catalog show command (or with any openstack command) to view the HTTP request from the client and the responses from the endpoints. For example, the following lists the HTTP request from nova compute and the response from the endpoint. [user@demo ~(admin)]$ openstack catalog show nova --debug ...output omitted... Get auth_ref REQ: curl -g -i -X GET http://172.25.250.50:5000/v2.0 -H "Accept: application/json" -H "User-Agent: osc-lib keystoneauth1/2.12.2 python-requests/2.10.0 CPython/2.7.5" Starting new HTTP connection (1): 172.25.250.50 "GET /v2.0 HTTP/1.1" 200 230 RESP: [200] Date: Mon, 05 Jun 2017 08:11:19 GMT Server: Apache Vary: X-Auth-Token,Accept-Encoding x-openstack-request-id: req-64ed1753-5c56-4f61-b62a-46bf3097c912 Content-Encoding: gzip Content-Length: 230 Content-Type: application/json Making authentication request to http://172.25.250.50:5000/v2.0/tokens "POST /v2.0/tokens HTTP/1.1" 200 1097
Administering the Service Catalog
The following steps outline the process for displaying the service catalog and service endpoints; a combined sketch follows the list.
1. Use the openstack token issue command to retrieve a scoped token.
2. Verify the token by using the curl command with the token to list projects.
3. Display the service catalog using the openstack catalog list command.
4. Display endpoints and the ID for a particular service using the openstack catalog show command, for instance, passing the service name nova as an argument.
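A combined pass through these four steps might look like the following sketch, using the classroom admin credentials; output is omitted, and <token-id> stands for the id value returned by the first command.
[user@demo ~(admin)]$ openstack token issue
[user@demo ~(admin)]$ curl -H "X-Auth-Token: <token-id>" http://172.25.250.50:5000/v2.0/tenants
[user@demo ~(admin)]$ openstack catalog list
[user@demo ~(admin)]$ openstack catalog show nova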
References
Identity Concepts: https://docs.openstack.org/keystone/latest/admin/identity-concepts.html
API endpoint configuration: https://docs.openstack.org/security-guide/api-endpoints/api-endpoint-configuration-recommendations.html
Guided Exercise: Administering the Service Catalog In this exercise, you will view the Keystone endpoints and catalog, issue a token, and manage token expiration. Outcomes You should be able to: • View the Keystone service catalog. • View the Keystone service endpoints. • Issue a Keystone token. • Clear expired tokens from the database. Before you begin Log in to workstation as student using student as the password. On workstation, run the lab communication-svc-catalog setup command. This script will ensure the OpenStack services are running and the environment is properly configured for this guided exercise. [student@workstation ~]$ lab communication-svc-catalog setup
Steps 1. On workstation, source the Keystone admin-rc file and list the Keystone endpoints registry. Take note of the available service names and types. [student@workstation ~]$ source admin-rc [student@workstation ~(admin-admin)]$ openstack endpoint list +----------------------------------+-----------+--------------+----------------+ | ID | Region | Service Name | Service Type | +----------------------------------+-----------+--------------+----------------+ | d1812da138514794b27d266a22f66b15 | regionOne | aodh | alarming | | b1484c933ba74028965a51d4d0aa9f04 | regionOne | nova | compute | | 4c6117b491c243aabbf40d7dfdf5ce9a | regionOne | heat-cfn | cloudformation | | eeaa5964c26042e38c632d1a12e001f3 | regionOne | heat | orchestration | | 1aeed510fa9a433795a4ab5db80e19ec | regionOne | glance | image | | 77010d1ff8684b3292aad55e30a3db29 | regionOne | gnocchi | metric | | 1d023037af8e4feea5e23ff57ad0cb77 | regionOne | keystone | identity | | 30b535478d024416986a8e3cc52a7971 | regionOne | cinderv2 | volumev2 | | 23fef1b434664188970e2e6b011eb3fa | regionOne | ceilometer | metering | | 4cf973e0d1f34f2497f8c521b6128ca7 | regionOne | swift | object-store | | 853e51122b5e490ab0b85289ad879371 | regionOne | cinderv3 | volumev3 | | 7f2a3a364a7a4a608f8581aed3b7b9e0 | regionOne | neutron | network | | ca01cf7bee8542b7bd5c068f873bcd51 | regionOne | cinder | volume | +----------------------------------+-----------+--------------+----------------+
2. View the Keystone service catalog and notice the endpoint URLs (especially the IP addresses), the version number, and the port number.
[student@workstation ~(admin-admin)]$ openstack catalog list -f value ...output omitted... keystone identity regionOne publicURL: http://172.25.250.50:5000/v2.0 internalURL: http://172.24.1.50:5000/v2.0 adminURL: http://172.25.249.50:35357/v2.0
3. Issue an admin token that can be used to query OpenStack manually with curl. [student@workstation ~(admin-admin)]$ openstack token issue +------------+----------------------------------+ | Field | Value | +------------+----------------------------------+ | expires | 2017-05-26 09:21:38+00:00 | | id | 1cdacca5070b44ada325f861007461c1 | | project_id | fd0ce487ea074bc0ace047accb3163da | | user_id | 15ceac73d7bb4437a34ee26670571612 | +------------+----------------------------------+
4. Verify the token retrieved in the previous command. Use the curl command with the token ID to retrieve the projects (tenants) for the admin user. [student@workstation ~(admin-admin)]$ curl -H "X-Auth-Token:\ 1cdacca5070b44ada325f861007461c1" http://172.25.250.50:5000/v2.0/tenants {"tenants_links": [], "tenants": [{"description": "admin tenant", "enabled": true, "id": "0b73c3d8b10e430faeb972fec5afa5e6", "name": "admin"}]}
5. Use SSH to connect to director as the user root. The database, MariaDB, resides on director and provides storage for expired tokens. Accessing MariaDB enables you to determine the amount of space used for expired tokens. [student@workstation ~(admin-admin)]$ ssh root@director
6. Log in to MariaDB. [root@director ~]# mysql -u root Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 981 Server version: 5.5.52-MariaDB MariaDB Server ...output omitted...
7. Use an SQL statement to list the tables and pay special attention to the size of the token table. MariaDB [(none)]> use keystone MariaDB [keystone]> SELECT table_name, (data_length+index_length) tablesize \ FROM information_schema.tables; +----------------------------------------------+-----------+ | table_name | tablesize | +----------------------------------------------+-----------+ ...output omitted... | token | 4308992 |
...output omitted...
8. Use an SQL statement to count the expired Keystone tokens stored in the token table. MariaDB [keystone]> SELECT COUNT(*) FROM token WHERE token.expires < \ CONVERT_TZ(NOW(), @@session.time_zone, '+00:00'); +----------+ | COUNT(*) | +----------+ | 149 | +----------+ 1 row in set (0.00 sec)
9. Truncate the token table, then confirm that the count of expired tokens is now zero.
MariaDB [keystone]> TRUNCATE TABLE token; Query OK, 0 rows affected (0.04 sec) MariaDB [keystone]> SELECT COUNT(*) FROM token WHERE token.expires < \ CONVERT_TZ(NOW(), @@session.time_zone, '+00:00'); +----------+ | COUNT(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec)
10. Log out of MariaDB. MariaDB [keystone]> exit Bye [root@director ~]#
11. Ensure that the Keystone user has a cron job to flush tokens from the database. [root@director ~]# crontab -u keystone -l ...output omitted... PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh 1 0 * * * keystone-manage token_flush >>/dev/null 2>&1
12. Modify the cron job to run keystone-manage token_flush hourly. [root@director ~]# crontab -u keystone -e ...output omitted... PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh @hourly keystone-manage token_flush >>/dev/null 2>&1
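Although the exercise does not require it, you can confirm the edit before moving on; the entry should now begin with @hourly.
[root@director ~]# crontab -u keystone -l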
13. Log out of director. [root@director ~]# exit [student@workstation ~(admin-admin)]$
Cleanup From workstation, run the lab communication-svc-catalog cleanup script to clean up the resources created in this exercise. [student@workstation ~(admin-admin)]$ lab communication-svc-catalog cleanup
Managing Message Brokering
Objective
After completing this section, students should be able to manage messages and the message broker.
RabbitMQ Overview
OpenStack software provides a collection of services covering all the functionality associated with a private cloud solution. Those services are composed internally of different components, allowing a flexible and scalable configuration. OpenStack services base their back end on two services, a database for persistence and a message broker for supporting communications among the components of each service. Any message broker solution supporting AMQP can be used as a message broker back end. Red Hat includes RabbitMQ as the message broker to be used on its OpenStack architecture, since it provides enterprise-level features useful for setting up advanced configurations. The following list provides some common RabbitMQ terms and definitions.
Exchange: retrieves published messages from the producer and distributes them to queues
Publisher/Producer: applications that publish the message
Consumer: applications that process the message
Queues: stores the message
Routing Key: used by the exchange to determine how to route the message
Binding: the link between a queue and an exchange
A message broker allows message sending and receiving among producer and consumer applications. Internally, this communication is executed by RabbitMQ using exchanges, queues, and the bindings between the two. When an application produces a message that it wants to send to one or more consumer applications, it places that message on an exchange to which one or more queues are bound. Consumers can subscribe to those queues in order to receive the message from the producer. The communication is based on the routing key included in the message to be transmitted.
Exchange Overview
The exchange's interaction with a queue is based on the match between the routing key included in the message and the binding key associated to the queue on the related exchange. Depending on the usage of those two elements, there are several types of exchanges in RabbitMQ (a short rabbitmqadmin sketch follows this list).
• Direct: Consumers are subscribed to a queue with an associated binding key, and the producer sets the routing key of the message to be the same as that of the binding key of the queue to which the desired consumer is subscribed.
• Topic: Consumers are subscribed to a queue that has a binding key including wildcards, so producers can send messages with different but related routing keys to that queue.
• Fanout: The message is broadcast to all the subscribed queues without regard for whether the routing and binding keys match.
• Headers: This makes use of the header properties of the message to perform the match against the binding arguments of the queue.
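The rabbitmqadmin utility, covered at the end of this section, can make these relationships concrete. The following is a minimal sketch, not part of the classroom configuration; demo.ex, demo.q, and demo.key are placeholder names, and the commands assume a user with sufficient RabbitMQ permissions.
[root@demo ~]# rabbitmqadmin declare exchange name=demo.ex type=direct
[root@demo ~]# rabbitmqadmin declare queue name=demo.q
[root@demo ~]# rabbitmqadmin declare binding source=demo.ex destination_type=queue destination=demo.q routing_key=demo.key
A message published to demo.ex with the routing key demo.key is delivered to demo.q; with type=fanout, the routing key would be ignored.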
Configuration Files and Logs
The following list provides a description of the RabbitMQ configuration and log files.
/etc/rabbitmq/enabled_plugins: contains a list of the enabled plugins
/etc/rabbitmq/rabbitmq-env.conf: overrides the defaults built in to the RabbitMQ startup scripts
/etc/rabbitmq/rabbitmq.config: provides the standard Erlang configuration file that allows the RabbitMQ core application, Erlang services, and RabbitMQ plugins to be configured
/var/log/rabbitmq/rabbit@<hostname>.log: contains logs of runtime events
Troubleshooting
OpenStack services follow a component architecture. The functionality of a service is split into different components, and each component communicates with the others using the message broker. To troubleshoot a problem with an OpenStack service, it is important to understand the workflow a request follows as it moves through the different components of the service.
Generally, each OpenStack service provides a dedicated component that exposes its API. The Cinder block storage service, for example, exposes its API through the cinder-api service. The API component is the entry point to the rest of the component architecture of its service, so when trying to isolate a problem with a service, check its API provider first. After the API component has been verified, and if no errors appear in its log files, confirm that the remaining components can communicate without issue. Any error related to the RabbitMQ message broker, or to its configuration in the related service configuration file, should appear in the log files of the service.
For the Cinder block storage service, after cinder-api has processed a request through the Cinder API, the request is handled by the cinder-volume and cinder-scheduler processes. These components communicate among themselves using the RabbitMQ message broker to create the volume on the most suitable storage back end. Cinder components (cinder-scheduler, for example) do not function correctly when the RabbitMQ back end is broken or crashes unexpectedly. Debug the issue by checking the component-related logs, such as /var/log/cinder/scheduler.log, and then check for problems with the component as a client of the RabbitMQ message broker. When a component fails because of RabbitMQ-related issues, it is usually due to a misconfiguration of either authorization or encryption. These errors are in the related configuration file, such as /etc/cinder/cinder.conf for Cinder components. Sometimes, however, a failure occurs for reasons other than RabbitMQ, such as unavailable Cinder block storage services.
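As a minimal troubleshooting sketch, the following commands check a Cinder component log for recent broker errors and then review the RabbitMQ-related settings in the service configuration file. The grep patterns are intentionally broad because the exact option names vary between releases.
[root@demo ~]# grep -i 'amqp\|rabbit' /var/log/cinder/scheduler.log | tail -n 20
[root@demo ~]# grep -i rabbit /etc/cinder/cinder.conf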
RabbitMQ Utilities
RabbitMQ provides a suite of utilities to check the RabbitMQ daemon status and to execute administrative operations. These tools are used to inspect the configurable elements of a RabbitMQ instance, including the queues used by producers and consumers to share messages, the exchanges to which those queues are connected, and the bindings between them. The following list describes the RabbitMQ utility commands.
• rabbitmqctl: command-line tool for managing a RabbitMQ broker.
• rabbitmqadmin: provided by the management plugin; performs the same actions as the web-based UI and can be used in scripts.
Commands for rabbitmqctl The following is a list of typical commands that are used with the rabbitmqctl command. • Use the report command to show a summary of the current status of the RabbitMQ daemon, including the number and types of exchanges and queues. [user@demo ~]$ rabbitmqctl report {listeners,[{clustering,25672,"::"},{amqp,5672,"172.25.249.200"}]}, ...output omitted...
• Use the add_user command to create RabbitMQ users. For example, to create a RabbitMQ user named demo with redhat as the password, use the following command: [user@demo ~]$ rabbitmqctl add_user demo redhat
• Use the set_permissions command to set the authorization for a RabbitMQ user. This option sets the configure, write, and read permissions that correspond to the three wildcards used in the command, respectively. For example, to set configure, write, and read permissions for the RabbitMQ user demo, use the following command: [user@demo ~]$ rabbitmqctl set_permissions demo ".*" ".*" ".*"
• Use the list_users command to list the RabbitMQ users. [user@demo ~]$ rabbitmqctl list_users Listing users ... c65393088ebee0e2170b044f924f2d924ae78276 [administrator] demo
• Use the set_user_tags command to enable authorization for the management back end. For example, to assign the RabbitMQ user demo administrator access, use the following command.
[user@demo ~]$ rabbitmqctl set_user_tags demo administrator
• Use the list_exchanges command with rabbitmqctl to show the default configured exchanges on the RabbitMQ daemon. [user@demo ~]$ rabbitmqctl list_exchanges Listing exchanges ... amq.match headers keystone topic q-agent-notifier-security_group-update_fanout fanout ...output omitted...
• Use the list_queues command to list the available queues and their attributes. [user@demo ~]$ rabbitmqctl list_queues Listing queues ... q-agent-notifier-port-delete.director.lab.example.com 0 ...output omitted...
• Use the list_consumers command to list all the consumers and the queues to which they are subscribed. [user@demo ~]$ rabbitmqctl list_consumers Listing consumers ... q-agent-notifier-port-delete.director.lab.example.com [email protected]> 2 true 0 [] q-agent-notifier-port-update.director.lab.example.com [email protected]> 2 true 0 [] mistral_executor.0.0.0.0 [email protected]> 2 true 0 [] ...output omitted...
Commands for rabbitmqadmin
The following is a list of typical commands that are used with the rabbitmqadmin command. The rabbitmqadmin command must be executed as the root user or as a RabbitMQ user with appropriate permissions. Before using the command, enable the rabbitmq_management plugin with the rabbitmq-plugins enable rabbitmq_management command, and make sure the rabbitmqadmin binary is in the PATH environment variable and is executable by the root user.
• Use the declare queue command to create a queue. For example, to create a new queue named demo.queue, use the following command: [root@demo ~]# rabbitmqadmin -u demo -p redhat declare queue name=demo.queue
• Use the declare exchange command to create an exchange. For example, to create a topic exchange named demo.topic, use the following command: [root@demo ~]# rabbitmqadmin -u demo -p redhat declare exchange name=demo.topic \ type=topic
• Use the publish command to publish a message to a queue. For example, to publish the message 'demo message!' to the demo.queue queue, execute the command, type the message, then press Ctrl+D to publish the message. [root@demo ~]# rabbitmqadmin -u demo -p redhat publish routing_key=demo.queue 'demo message!' Ctrl+D Message published
• Use the get command to display a message for a queue. For example, to display the message published to the queue demo.queue use the following command: [root@demo ~]# rabbitmqadmin -u rabbitmqauth -p redhat get queue=demo.queue -f json { "exchange": "", "message_count": 0, "payload": "'demo message!'\n", "payload_bytes": 15, "payload_encoding": "string", "properties": [], "redelivered": true, "routing_key": "demo.queue" } ]
Publishing a Message to a Queue The following steps outline the process for publishing a message to a queue. 1.
Create a RabbitMQ user using the rabbitmqctl add_user command.
2.
Configure the user permissions using the rabbitmqctl set_permissions command.
3.
Set the user tag to administrator or guest, using the rabbitmqctl set_user_tags command.
4.
Create a message queue using the rabbitmqadmin declare queue command.
5.
Publish a message to a queue using the rabbitmqadmin publish command.
6.
Display the queued message using the rabbitmqadmin get command.
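Putting these steps together, the following sketch shows one possible end-to-end sequence using the commands described in this section. The user name, queue name, and payload are examples only.
[root@demo ~]# rabbitmq-plugins enable rabbitmq_management
[root@demo ~]# rabbitmqctl add_user demo redhat
[root@demo ~]# rabbitmqctl set_permissions demo ".*" ".*" ".*"
[root@demo ~]# rabbitmqctl set_user_tags demo administrator
[root@demo ~]# rabbitmqadmin -u demo -p redhat declare queue name=demo.queue
[root@demo ~]# rabbitmqadmin -u demo -p redhat publish routing_key=demo.queue payload="hello"
[root@demo ~]# rabbitmqadmin -u demo -p redhat get queue=demo.queue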
References
Management CLI: https://www.rabbitmq.com/management-cli.html
Management Plugins: https://www.rabbitmq.com/management.html
Troubleshooting: https://www.rabbitmq.com/troubleshooting.html
Guided Exercise: Managing Message Brokering
In this exercise, you will enable the RabbitMQ Management Plugin, create an exchange and a queue, publish a message, and retrieve it.
Resources
Files: http://materials.example.com/cl210_producer, http://materials.example.com/cl210_consumer
Outcomes
You should be able to:
• Authorize a RabbitMQ user.
• Enable the RabbitMQ Management Plugin.
• Create a message exchange.
• Create a message queue.
• Publish a message to a queue.
• Retrieve a published message.
Before you begin Log in to workstation as student using student as the password. On workstation, run the lab communication-msg-brokering setup command. This ensures that the required utility is available on director. [student@workstation ~]$ lab communication-msg-brokering setup
Steps 1. From workstation, use SSH to connect to director as the stack user. Use sudo to become the root user. [student@workstation ~]$ ssh stack@director [stack@director ~]$ sudo -i
2.
Create a rabbitmq user named rabbitmqauth with redhat as the password. [root@director ~]# rabbitmqctl add_user rabbitmqauth redhat Creating user "rabbitmqauth" ...
3.
Configure permissions for the rabbitmqauth user. Use wildcard syntax to assign all resources to each of the three permissions for configure, write, and read. [root@director ~]# rabbitmqctl set_permissions rabbitmqauth ".*" ".*" ".*" Setting permissions for user "rabbitmqauth" in vhost "/" ...
4.
Set the administrator user tag to enable privileges for rabbitmqauth. [root@director ~]# rabbitmqctl set_user_tags rabbitmqauth administrator Setting tags for user "rabbitmqauth" to [administrator] ...
5.
Verify that a RabbitMQ Management configuration file exists in root's home directory. The contents should match as shown here. [root@director ~]# cat ~/.rabbitmqadmin.conf [default] hostname = 172.25.249.200 port = 15672 username = rabbitmqauth password = redhat
6.
Verify that rabbitmqauth is configured as an administrator. [root@director ~]# rabbitmqctl list_users Listing users ... c65393088ebee0e2170b044f924f2d924ae78276 [administrator] rabbitmqauth [administrator]
7.
Create an exchange topic named cl210.topic. [root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf declare exchange \ name=cl210.topic type=topic exchange declared
8.
Verify that the exchange topic is created. [root@director ~]# rabbitmqctl list_exchanges | grep cl210.topic cl210.topic topic
9.
Download the scripts cl210_producer and cl210_consumer from http://materials.example.com/ to /root and make them executable. [root@director ~]# wget http://materials.example.com/cl210_producer [root@director ~]# wget http://materials.example.com/cl210_consumer
10. On workstation, open a second terminal. Using SSH, log in as the stack user to director. Switch to the root user. Launch the cl210_consumer script using anonymous.info as the routing key. [student@workstation ~]$ ssh stack@director [stack@director ~]$ sudo -i [root@director ~]# python /root/cl210_consumer anonymous.info
11.
In the first terminal, launch the cl210_producer script to send messages using the routing key anonymous.info. [root@director ~]# python /root/cl210_producer [x] Sent 'anonymous.info':'Hello World!'
12. In the second terminal, the sent messages are received and displayed. Running the cl210_producer script multiple times sends multiple messages.
[x] 'anonymous.info':'Hello World!'
Exit this cl210_consumer terminal after observing the message(s) being received. You are finished with the example publisher-consumer exchange scripts. 13. The next practice is to observe a message queue. Create a queue named redhat.queue. [root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf declare queue \ name=redhat.queue queue declared
14. Verify that the queue is created. The message count is zero. [root@director ~]# rabbitmqctl list_queues | grep redhat redhat.queue 0
15. Publish messages to the redhat.queue queue. These first two examples include the message payload on the command line. [root@director ~]# rabbitmqadmin publish routing_key=redhat.queue Message published [root@director ~]# rabbitmqadmin publish routing_key=redhat.queue Message published
-c ~/.rabbitmqadmin.conf \ payload="a message" -c ~/.rabbitmqadmin.conf \ payload="another message"
16. Publish a third message to the redhat.queue queue, but without using the payload parameter. When executing the command without specifying a payload, rabbitmqadmin waits for multi-line input. Press Ctrl+D when the cursor is alone at the first space of a new line to end message entry and publish the message. [root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf \ publish routing_key=redhat.queue message line 1 message line 2 message line 3 Ctrl+D Message published
17.
Verify that the redhat queue has an increased message count. [root@director ~]# rabbitmqctl list_queues | grep redhat redhat.queue 3
18. Display the first message in the queue. The message_count field indicates how many more messages exist after this one. [root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf get queue=redhat.queue \ -f pretty_json [ {
"exchange": "", "message_count": 2, "payload": "a message", "payload_bytes": 9, "payload_encoding": "string", "properties": [], "redelivered": false, "routing_key": "redhat.queue" } ]
19. Display multiple messages using the count option. Each displayed message indicates how many more messages follow. The redelivered field indicates whether you have previously viewed this specific message. [root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf get queue=redhat.queue \ count=2 -f pretty_json [ { "exchange": "", "message_count": 2, "payload": "a message", "payload_bytes": 9, "payload_encoding": "string", "properties": [], "redelivered": true, "routing_key": "redhat.queue" } { "exchange": "", "message_count": 1, "payload": "another message", "payload_bytes": 15, "payload_encoding": "string", "properties": [], "redelivered": false, "routing_key": "redhat.queue" } ]
20. When finished, delete the queue named redhat.queue. Return to workstation. [root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf delete queue \ name=redhat.queue queue deleted [root@director ~]# exit [stack@director ~]$ exit [student@workstation ~]$
Cleanup From workstation, run lab communication-msg-brokering cleanup to clean up resources created for this exercise. [student@workstation ~]$ lab communication-msg-brokering cleanup
Lab: Managing Internal OpenStack Communication
In this lab, you will troubleshoot and fix issues with the Keystone identity service and the RabbitMQ message broker.
Outcomes
You should be able to:
• Troubleshoot the Keystone identity service.
• Troubleshoot the RabbitMQ message broker.
Scenario
During a recent deployment of the overcloud, cloud administrators are reporting issues with the Compute and Image services. Cloud administrators are not able to access the Image service or the Compute service APIs. You have been tasked with troubleshooting and fixing these issues.
Before you begin
Log in to workstation as student with a password of student. On workstation, run the lab communication-review setup command. This ensures that the OpenStack services are running and the environment has been properly configured for this lab.
[student@workstation ~]$ lab communication-review setup
Steps 1. From workstation, verify the issue by attempting to list instances as the OpenStack admin user. The command is expected to hang. 2.
Use SSH to connect to controller0 as the heat-admin user to begin troubleshooting.
3.
Check the Compute service logs for any applicable errors.
4.
Investigate and fix the issue based on the error discovered in the log. Modify the incorrect rabbitmq port value in /etc/rabbitmq/rabbitmq-env.conf and use the HUP signal to respawn the beam.smp process. Log out of the controller0 node when finished.
5.
From workstation, attempt again to list instances, to verify that the issue is fixed. This command is expected to display instances or return to a command prompt without hanging.
6.
Next, attempt to list images as well. The command is expected to fail, returning an internal server error.
7.
Use SSH to connect to controller0 as the heat-admin user to begin troubleshooting.
8.
Inspect the Image service logs for any applicable errors.
9.
The error in the Image service log indicates a communication issue with the Image service API and the Identity service. In a previous step, you verified that the Identity service could
communicate with the Compute service API, so the next logical step is to focus on the Image service configuration. Investigate and fix the issue based on the traceback found in the Image service log. 10. From workstation, again attempt to list images to verify the fix. This command should succeed, returning a command prompt without error. Cleanup From workstation, run the lab communication-review cleanup script to clean up the resources created in this exercise. [student@workstation ~]$ lab communication-review cleanup
Solution
In this lab, you will troubleshoot and fix issues with the Keystone identity service and the RabbitMQ message broker.
Outcomes
You should be able to:
• Troubleshoot the Keystone identity service.
• Troubleshoot the RabbitMQ message broker.
Scenario
During a recent deployment of the overcloud, cloud administrators are reporting issues with the Compute and Image services. Cloud administrators are not able to access the Image service or the Compute service APIs. You have been tasked with troubleshooting and fixing these issues.
Before you begin
Log in to workstation as student with a password of student. On workstation, run the lab communication-review setup command. This ensures that the OpenStack services are running and the environment has been properly configured for this lab.
[student@workstation ~]$ lab communication-review setup
Steps 1. From workstation, verify the issue by attempting to list instances as the OpenStack admin user. The command is expected to hang. 1.1. From workstation, source the admin-rc credential file. Attempt to list any running instances. The command is expected to hang, and does not return to the command prompt. Use Ctrl+C to escape the command. [student@workstation ~]$ source admin-rc [student@workstation ~(admin-admin)]$ openstack server list
2.
Use SSH to connect to controller0 as the heat-admin user to begin troubleshooting. 2.1. From workstation, use SSH to connect to controller0 as the heat-admin user. [student@workstation ~(admin-admin)]$ ssh heat-admin@controller0
3.
Check the Compute service logs for any applicable errors. 3.1. Check /var/log/nova/nova-conductor.log on controller0 for a recent error from the AMQP server. [heat-admin@controller0 ~]$ sudo tail /var/log/nova/nova-conductor.log 2017-05-30 02:54:28.223 6693 ERROR oslo.messaging._drivers.impl_rabbit [-] [3a3a6e2f-00bf-4a4a-8ba5-91bc32c381dc] AMQP server on 172.24.1.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None
4.
Investigate and fix the issue based on the error discovered in the log. Modify the incorrect rabbitmq port value in /etc/rabbitmq/rabbitmq-env.conf and use HUP signal to respawn the beam.smp process. Log out of the controller0 node when finished. 4.1. Modify the incorrect rabbitmq port value in /etc/rabbitmq/rabbitmq-env.conf by setting the variable NODE_PORT to 5672. Check that the variable is correct by displaying the value again with the --get option. Because this file does not have a section header, crudini requires specifying the section as "". [heat-admin@controller0 ~]$ sudo crudini \ --set /etc/rabbitmq/rabbitmq-env.conf \ "" NODE_PORT 5672 [heat-admin@controller0 ~]$ sudo crudini \ --get /etc/rabbitmq/rabbitmq-env.conf \ "" NODE_PORT 5672
4.2. List the process ID for the beam.smp process. The beam.smp process is the application virtual machine that interprets the Erlang language bytecode in which RabbitMQ works. By locating and restarting this process, RabbitMQ reloads the fixed configuration. [heat-admin@controller0 ~]$ sudo ps -ef | grep beam.smp rabbitmq 837197 836998 10 03:42 ? 00:00:01 /usr/lib64/erlang/erts-7.3.1.2/bin/ beam.smp -rabbit tcp_listeners [{"172.24.1.1",56721
4.3. Restart beam.smp by sending a hangup signal to the retrieved process ID. [heat-admin@controller0 ~]$ sudo kill -HUP 837197
4.4. List the beam.smp process ID to verify the tcp_listeners port is now 5672. [heat-admin@controller0 ~]$ sudo ps -ef |grep beam.smp rabbitmq 837197 836998 10 03:42 ? 00:00:01 /usr/lib64/erlang/erts-7.3.1.2/bin/ beam.smp -rabbit tcp_listeners [{"172.24.1.1",5672
4.5. Log out of controller0. [heat-admin@controller0 ~]$ exit [student@workstation ~(admin-admin)]$
5.
From workstation, attempt again to list instances, to verify that the issue is fixed. This command is expected to display instances or return to a command prompt without hanging. 5.1. From workstation, list the instances again. [student@workstation ~(admin-admin)]$ openstack server list [student@workstation ~(admin-admin)]$
6.
Next, attempt to list images as well. The command is expected to fail, returning an internal server error. 6.1. Attempt to list images. [student@workstation ~(admin-admin)]$ openstack image list Internal Server Error (HTTP 500)
7.
Use SSH to connect to controller0 as the heat-admin user to begin troubleshooting. 7.1. From workstation, use SSH to connect to controller0 as the heat-admin user. [student@workstation ~(admin-admin)]$ ssh heat-admin@controller0
8.
Inspect the Image service logs for any applicable errors. 8.1. Inspect /var/log/glance/api.log on controller0 and focus on tracebacks that involve auth and URL [heat-admin@controller0 ~]$ sudo tail /var/log/glance/api.log -n 30 raise exceptions.DiscoveryFailure('Could not determine a suitable URL ' DiscoveryFailure: Could not determine a suitable URL for the plugin 2017-05-30 04:31:17.650 277258 INFO eventlet.wsgi.server [-] 172.24.3.1 - - [30/ May/2017 04:31:17] "GET /v2/images HTTP/1.1" 500 139 0.003257
9.
The error in the Image service log indicates a communication issue with the Image service API and the Identity service. In a previous step, you verified that the Identity service could communicate with the Compute service API, so the next logical step is to focus on the Image service configuration. Investigate and fix the issue based on the traceback found in the Image service log. 9.1. First, view the endpoint URL for the Identity service. [student@workstation ~(admin-admin)]$ openstack catalog show identity +-----------+---------------------------------------------+ | Field | Value | +-----------+---------------------------------------------+ | endpoints | regionOne | | | publicURL: http://172.25.250.50:5000/v2.0 | | | internalURL: http://172.24.1.50:5000/v2.0 | | | adminURL: http://172.25.249.50:35357/v2.0 | | | | | name | keystone | | type | identity | +-----------+---------------------------------------------+
9.2. The traceback in /var/log/glance/api.log indicated an issue determining the authentication URL. Inspect /etc/glance/glance-api.conf to verify auth_url setting, noting the incorrect port. [heat-admin@controller0 ~]$ sudo grep 'auth_url' /etc/glance/glance-api.conf #auth_url = None auth_url=http://172.25.249.60:3535
9.3. Modify the auth_url setting in /etc/glance/glance-api.conf to use port 35357. Check that the variable is correct by displaying the value again with the --get option. [heat-admin@controller0 ~]$ sudo crudini \ --set /etc/glance/glance-api.conf keystone_authtoken \ auth_url http://172.25.249.50:35357 [heat-admin@controller0 ~]$ sudo crudini \ --get /etc/glance/glance-api.conf \ keystone_authtoken auth_url http://172.25.249.50:35357
9.4. Restart the openstack-glance-api service. When finished, exit from controller0. [heat-admin@controller0 ~]$ sudo systemctl restart openstack-glance-api [heat-admin@controller0 ~]$ exit [student@workstation ~(admin-admin)]$
10. From workstation, again attempt to list images to verify the fix. This command should succeed, returning a command prompt without error. 10.1. From workstation, attempt to list images. This command should succeed, returning a command prompt without error. [student@workstation ~(admin-admin)]$ openstack image list [student@workstation ~(admin-admin)]$
Cleanup From workstation, run the lab communication-review cleanup script to clean up the resources created in this exercise. [student@workstation ~]$ lab communication-review cleanup
Summary
In this chapter, you learned:
• RabbitMQ provides a suite of utilities to check the RabbitMQ daemon status and to execute administrative operations on it.
• Red Hat OpenStack Platform recommends creating a cron job that runs hourly to purge expired Keystone tokens.
• The Keystone endpoint adminURL should only be consumed by those who require administrative access to a service endpoint.
• PKIZ tokens add compression using zlib, making them smaller than PKI tokens.
• Fernet tokens have a maximum limit of 250 bytes, which makes them small enough to be ideal for API calls and minimize the data kept on disk.
CHAPTER 3
BUILDING AND CUSTOMIZING IMAGES
Overview
Goal: Build and customize images
Objectives:
• Describe common image formats for OpenStack.
• Build an image using diskimage-builder.
• Customize an image using guestfish and virt-customize.
Sections:
• Describing Image Formats (and Quiz)
• Building an Image (and Guided Exercise)
• Customizing an Image (and Guided Exercise)
Lab:
• Building and Customizing Images
Describing Image Formats
Objective
After completing this section, students should be able to describe the common image formats used within Red Hat OpenStack Platform.
Common Image Formats
Building custom images is a great way to implement items that are standard across your organization. Instances may be short-lived, so adding them to a configuration management system may or may not be desirable. Items that could be configured in a custom image include security hardening, third-party agents for monitoring or backup, and operator accounts with associated SSH keys. Red Hat OpenStack Platform supports many virtual disk image formats, including RAW, QCOW2, AMI, VHD, and VMDK. In this chapter we will discuss the RAW and QCOW2 formats, their features, and their use in Red Hat OpenStack Platform.
Image Format Overview
• RAW: A RAW format image usually has an img extension, and contains an exact copy of a disk.
• QCOW2: The QEMU Copy On Write v2 format.
• AMI: Amazon Machine Image format.
• VHD: Virtual Hard Disk format, used in Microsoft Virtual PC.
• VMDK: Virtual Machine Disk format, created by VMware but now an open format.
The RAW format is a bootable, uncompressed virtual disk image, whereas the QCOW2 format is more complex and supports many features. File systems that support sparse files allow RAW images to be only the size of the used data. This means that a RAW image of a 20 GiB disk may only be 3 GiB in size. The attributes of both are compared below.
Comparison of RAW and QCOW2 Image Formats
• Image Size
  RAW: Takes up the same amount of disk space as the data it contains, as long as the image file is sparse. Unused space in the source does not consume space in the image.
  QCOW2: A sparse representation of the virtual disk image. Consequently, it is smaller than a RAW image of the same source. It also supports compression using zlib.
• Performance
  RAW: Considered better than QCOW2 because disk space is all allocated on VM creation. This avoids the latencies introduced by allocating space as required.
  QCOW2: Considered not as good as RAW due to the latency of performing disk allocation as space is required.
• Encryption
  RAW: Not applicable.
  QCOW2: Optional. Uses 128-bit AES.
• Snapshots
  RAW: Not applicable.
  QCOW2: Supports multiple snapshots, which are a read-only record of the image at a particular point in time.
• Copy-on-write
  RAW: Not applicable.
  QCOW2: Reduces storage consumption by writing changes back to a copy of the data to be modified. The original is left unchanged.
When choosing between improved VM performance and reduced storage consumption, reduced storage consumption is usually preferred. The performance difference between RAW and QCOW2 images is not great enough to outweigh the cost of allocated but underused storage.
Images in OpenStack Services
The OpenStack Compute service is the role that runs instances within Red Hat OpenStack Platform. The image format required depends on the back end storage system configured. With the default file-based back end, QCOW2 is the preferred image format because libvirt does not support snapshots of RAW images. However, when using Ceph, the image must be converted to RAW in order to leverage the cluster's own snapshot capabilities.
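As a minimal sketch of that conversion, the qemu-img utility can inspect an image and rewrite it in RAW format before it is uploaded to the Image service; the file names here are examples only.
[user@demo ~]$ qemu-img info demo-rhel-base.qcow2
[user@demo ~]$ qemu-img convert -f qcow2 -O raw demo-rhel-base.qcow2 demo-rhel-base.raw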
References
Further information is available in the documentation for Red Hat OpenStack Platform at https://access.redhat.com/documentation/en-us/red_hat_openstack_platform
Quiz: Describing Image Formats
Choose the correct answers to the following questions:
1. What is the correct image format when using Ceph as the back end for the OpenStack Image service?
   a. QCOW2
   b. VHD
   c. VMDK
   d. RAW
2. Which four image formats are supported by Red Hat OpenStack Platform? (Choose four.)
   a. VMDK
   b. VBOX
   c. VHD
   d. QCOW2
   e. RAW
3. Which three features are part of the QCOW2 format? (Choose three.)
   a. Encryption
   b. DFRWS support
   c. Snapshots
   d. Multi-bit error correction
   e. Copy-on-write
Solution
Choose the correct answers to the following questions:
1. What is the correct image format when using Ceph as the back end for the OpenStack Image service?
   a. QCOW2
   b. VHD
   c. VMDK
   d. RAW (correct)
2. Which four image formats are supported by Red Hat OpenStack Platform? (Choose four.)
   a. VMDK (correct)
   b. VBOX
   c. VHD (correct)
   d. QCOW2 (correct)
   e. RAW (correct)
3. Which three features are part of the QCOW2 format? (Choose three.)
   a. Encryption (correct)
   b. DFRWS support
   c. Snapshots (correct)
   d. Multi-bit error correction
   e. Copy-on-write (correct)
Building an Image
Objective
After completing this section, students should be able to build an image using diskimage-builder.
Building a Custom Image
The benefits of building custom images include: ensuring monitoring agents are present; aligning with the organization's security policy; and provisioning a common set of troubleshooting tools. diskimage-builder is a tool for building and customizing cloud images. It can output virtual disk images in a variety of formats, such as QCOW2 and RAW. Elements are applied by diskimage-builder during the build process to customize the image. An element is a code set that runs within a chroot environment and alters how an image is built. For example, the docker element exports a tar file from a named container, allowing other elements to build on top of it, and the bootloader element installs grub2 on the boot partition of the system.
Diskimage-builder Architecture
diskimage-builder bind mounts /proc, /sys, and /dev in a chroot environment. The image-building process produces minimal systems that possess all the required bits to fulfill their purpose with OpenStack. Images can be as simple as a file system image or can be customized to provide whole disk images. Upon completion of the file system tree, a loopback device with a file system (or partition table and file system) is built and the file system tree is copied into it.
Diskimage-builder Elements
Elements are used to specify what goes into the image and any modifications that are desired. Images are required to use at least one base distribution element, and there are multiple elements for a given distribution. For example, the distribution element could be rhel7, and then other elements are used to modify the rhel7 base image. Scripts are invoked and applied to the image based on multiple elements.
Diskimage-builder Element Dependencies
Each element has the ability to use element-deps and element-provides to define or affect dependencies. element-deps is a plain-text file containing a list of elements that will be added to the list of elements built into the image when it is created. element-provides is a plain-text file that contains a list of elements that are provided by this element. These particular elements are not included with the elements built into the image at creation time. The diskimage-builder package includes numerous elements:
[user@demo ~]$ ls /usr/share/diskimage-builder/elements
apt-conf docker apt-preferences dpkg apt-sources dracut-network architecture-emulation-binaries dracut-ramdisk baremetal dynamic-login base element-manifest bootloader enable-serial-console cache-url epel centos fedora centos7 fedora-minimal
pip-and-virtualenv pip-cache pkg-map posix proliant-tools pypi python-brickclient ramdisk ramdisk-base rax-nova-agent
centos-minimal cleanup-kernel-initrd cloud-init cloud-init-datasources cloud-init-disable-resizefs cloud-init-nocloud debian debian-minimal debian-systemd debian-upstart debootstrap deploy deploy-baremetal deploy-ironic deploy-kexec deploy-targetcli deploy-tgtadm devuser dhcp-all-interfaces dib-init-system dib-python dib-run-parts disable-selinux dkms
gentoo growroot grub2 hpdsa hwburnin hwdiscovery ilo install-bin install-static install-types ironic-agent ironic-discoverd-ramdisk iso local-config manifests mellanox modprobe-blacklist no-final-image oat-client openssh-server opensuse opensuse-minimal package-installs partitioning-sfdisk
redhat-common rhel rhel7 rhel-common rpm-distro runtime-ssh-host-keys select-boot-kernel-initrd selinux-permissive serial-console simple-init source-repositories stable-interface-names svc-map sysctl uboot ubuntu ubuntu-core ubuntu-minimal ubuntu-signed vm yum yum-minimal zypper zypper-minimal
Each element has scripts that are applied to the images as they are built. The following example shows the scripts for the base element. [user@demo ~]$ tree /usr/share/diskimage-builder/elements/base /usr/share/diskimage-builder/elements/base/ |-- cleanup.d | |-- 01-ccache | `-- 99-tidy-logs |-- element-deps |-- environment.d | `-- 10-ccache.bash |-- extra-data.d | `-- 50-store-build-settings |-- install.d | |-- 00-baseline-environment | |-- 00-up-to-date | |-- 10-cloud-init | |-- 50-store-build-settings | `-- 80-disable-rfc3041 |-- package-installs.yaml |-- pkg-map |-- pre-install.d | `-- 03-baseline-tools |-- README.rst `-- root.d `-- 01-ccache 6 directories, 15 files
Diskimage-builder Phase Subdirectories
Phase subdirectories should be located under an element directory; they may or may not exist by default, so create them as required. They contain executable scripts that have a two-digit numerical prefix, and are executed in numerical order. The convention is to store data files in the element directory, but to only store executable scripts in the phase subdirectory. If a script is not executable, it will not run. The phase subdirectories are processed in the following order:
• root.d: Builds or modifies the initial root file system content. This is where customizations are added, such as building on an existing image. Only one element can use this at a time unless particular care is taken not to overwrite, but instead to adapt the context extracted by other elements.
• extra-data.d: Includes extra data from the host environment that hooks may need when building the image. This copies any data, such as SSH keys or HTTP proxy settings, under $TMP_HOOKS_PATH.
• pre-install.d: Runs in a chroot environment prior to any customization or package installation.
• install.d: Runs in a chroot environment; in this phase the operating system and packages are installed.
• post-install.d: The recommended phase for performing tasks that must be handled after the operating system and application installation, but before the first boot of the image. For example, running systemctl enable to enable required services.
• block-device.d: Customizes the block device, for example, to make partitions. Runs before the cleanup.d phase, but after the target tree is fully populated.
• finalize.d: Runs in a chroot environment upon completion of the root file system content being copied to the mounted file system. Tuning of the root file system is performed in this phase, so it is important to limit the operations to only those necessary to affect the file system metadata and the image itself. post-install.d is preferred for most operations.
• cleanup.d: The root file system content is cleaned of temporary files.
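As a minimal sketch of how a phase script is used, the following hypothetical post-install.d script (the file name and service are examples only) would be placed under a working copy of an element, given a two-digit prefix, and marked executable. diskimage-builder then runs it inside the chroot after package installation.
#!/bin/bash
# 05-enable-chronyd: illustrative post-install.d phase script.
set -eux
# Enable a service so that it starts on the first boot of the image.
systemctl enable chronyd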
Diskimage-builder Environment Variables
A number of environment variables must be exported, depending upon the required image customization. Typically, at a minimum, the following variables are exported:
Minimal Diskimage-builder Variables
• DIB_LOCAL_IMAGE: The base image to build from.
• NODE_DIST: The distribution of the base image, for example rhel7.
• DIB_YUM_REPO_CONF: The client yum repository configuration files to be copied into the chroot environment during image building.
• ELEMENTS_PATH: The path to a working copy of the elements.
Important Yum repository configuration files specified by DIB_YUM_REPO_CONF are copied into /etc/yum.repos.d during the image build and removed when the build is done. The intention is to provide the specified yum repository access only during the build and not to leave that yum repository access in the final image. However, this removal behavior may cause an unintended result; a yum repository configuration file specified in DIB_YUM_REPO_CONF that matches an already existing configuration file in the starting base image will result in that configuration file being removed from the final image at the end of the build. Be sure to check for existing repository configuration and exclude it from DIB_YUM_REPO_CONF if it should remain in the final built image.
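A minimal sketch of exporting these variables before a build follows; the paths and file names are examples only and would be adapted to your environment.
[user@demo ~]$ export DIB_LOCAL_IMAGE=/home/user/rhel-guest-image-7.qcow2
[user@demo ~]$ export NODE_DIST=rhel7
[user@demo ~]$ export DIB_YUM_REPO_CONF=/home/user/openstack.repo
[user@demo ~]$ export ELEMENTS_PATH=/home/user/elements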
Diskimage-builder Options We will examine some of the options available in the context of the following example: [user@demo ~]$ disk-image-create vm rhel7 -n \ -p python-django-compressor -a amd64 -o web.img 2>&1 | tee diskimage-build.log
The vm element provides sane defaults for virtual machine disk images. The next option is the distribution; the rhel7 option is provided to specify that the image will be Red Hat Enterprise Linux 7. The -n option skips the default inclusion of the base element, which might be desirable if you prefer not to have cloud-init and package updates installed. The -p option specifies which packages to install; here we are installing the python-django-compressor package. The -a option specifies the architecture of the image. The -o option specifies the output image name.
Diskimage-builder Execution
Each element contains a set of scripts to execute. In the following excerpt from the diskimage-build.log file, we see the scripts that were executed as part of the root phase.
Target: root.d
Script                          Seconds
01-ccache                       0.017
10-rhel7-cloud-image            93.202
50-yum-cache                    0.045
90-base-dib-run-parts           0.037
The run time for each script is shown on the right. Scripts that reside in the extra-data.d phase subdirectory were then executed:
Target: extra-data.d
Script                          Seconds
01-inject-ramdisk-build-files   0.031
10-create-pkg-map-dir           0.114
20-manifest-dir                 0.021
50-add-targetcli-module         0.038
50-store-build-settings         0.006
75-inject-element-manifest      0.040
98-source-repositories          0.041
99-enable-install-types         0.023
99-squash-package-install       0.221
99-yum-repo-conf                0.039
From these examples, you can confirm the order that the phases were executed in and the order of script execution in each phase.
Red Hat-provided Images: Cloud-Init
CloudInit is included in images provided by Red Hat and provides an interface for complex customization early in the instance initialization process. It can accept customization data in several formats, including standard shell script and cloud-config. The choice of customization method must be considered carefully because each offers different features. To avoid the proliferation of images, you can choose to add customization that is common across the organization to images, and then perform more granular customization with CloudInit. If only a small variety of system types are required, it might be simpler to perform all customization using diskimage-builder.
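As a hedged sketch of the shell script form of customization data, the following hypothetical user-data file could be passed to an instance at creation time (for example, with the --user-data option of openstack server create); cloud-init runs it on the first boot.
#!/bin/bash
# Illustrative user-data script: install and start a web server.
yum -y install httpd
systemctl enable httpd
systemctl start httpd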
Building an Image The following steps outline the process for building an image with diskimage-builder. 1.
Download a base image.
2.
Open a terminal and create a working copy of the diskimage-builder elements.
3.
Add a script to perform the desired customization under the working copy of the relevant element phase directory.
4.
Export the variables that diskimage-builder requires: ELEMENTS_PATH, DIB_LOCAL_IMAGE, NODE_DIST, and DIB_YUM_REPO_CONF.
5.
Build the image using the disk-image-create command and appropriate options.
6.
Upload the image to the OpenStack Image service.
7.
Launch an instance using the custom image.
8.
Attach a floating IP to the instance.
9.
Connect to the instance using SSH and verify the customization was executed.
References
Diskimage-builder Documentation: https://docs.openstack.org/diskimage-builder/latest/
Guided Exercise: Building an Image
In this exercise you will build and customize a disk image using diskimage-builder.
Resources
Base Image: http://materials.example.com/osp-small.qcow2
Working Copy of diskimage-builder Elements: /home/student/elements
Outcomes
You should be able to:
• Build and customize an image using diskimage-builder.
• Upload the image into the OpenStack image service.
• Spawn an instance using the customized image.
Before you begin
Log in to workstation as student using student as the password. On workstation, run the lab customization-img-building setup command. This ensures that the required packages are installed on workstation, and provisions the environment with a public network, a private network, a private key, and security rules to access the instance.
[student@workstation ~]$ lab customization-img-building setup
Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://materials.example.com/osp-small.qcow2 and save it under /home/student/. [student@workstation ~]$ wget http://materials.example.com/osp-small.qcow2
2.
Create a copy of the diskimage-builder elements directory to work with under /home/student/. [student@workstation ~]$ cp -a /usr/share/diskimage-builder/elements /home/student/
3.
Create a post-install.d directory under the working copy of the rhel7 element. [student@workstation ~]$ mkdir -p /home/student/elements/rhel7/post-install.d
4.
Add three scripts under the rhel7 element post-install.d directory to enable the vsftpd service, add vsftpd:ALL to /etc/hosts.allow, and disable anonymous ftp in /etc/vsftpd/vsftpd.conf. [student@workstation ~]$ cd /home/student/elements/rhel7/post-install.d/ [student@workstation post-install.d]$ cat /etc/hosts.allow fi EOF [student@workstation post-install.d]$ cat exit
Cleanup From workstation, run the lab customization-img-building cleanup command to clean up this exercise. [student@workstation ~]$ lab customization-img-building cleanup
Customizing an Image
Objectives
After completing this section, students should be able to customize an image using guestfish and virt-customize.
Making Minor Image Customizations Building an image using diskimage-builder can take several minutes, and may require a copy of the elements directory to be used for each image. If you only require a small number of customizations, you could save time by using the guestfish or virt-customize commands to modify a base image, such as the one provided by Red Hat in the rhel-guest-image-7 package. The image provided in the rhel-guest-image-7 package has a minimal set of packages and has cloud-init installed and enabled. You can download the rhel-guest-image-7 package from https://access.redhat.com/downloads.
Guestfish and Virt-customize Internals guestfish and virt-customize both use the libguestfs API to perform their functions. Libguestfs needs a back end that can work with the various image formats, and by default it uses libvirt. The process to open an image for editing with a libvirt back end includes creating an overlay file for the image, creating an appliance, booting the appliance with or without network support, and mounting the partitions. You can investigate the process in more detail by exporting two environment variables, LIBGUESTFS_DEBUG=1 and LIBGUESTFS_TRACE=1, and then executing guestfish or virt-customize with the -a option to add a disk.
Using Guestfish to Customize Images guestfish is a low-level tool that exposes the libguestfs API directly, which means that you can manipulate images in a very granular fashion. The following example uses the -i option to mount partitions automatically, the -a option to add the disk image, and the --network option to enable network access. It then installs the aide package, sets the password for root, and restores SELinux file contexts. [user@demo ~]$ guestfish -i --network -a ~/demo-rhel-base.qcow2 Welcome to guestfish, the guest filesystem shell for editing virtual machine filesystems and disk images. Type: 'help' for help on commands 'man' to read the manual 'quit' to quit the shell Operating system: Red Hat Enterprise Linux Server 7.3 (Maipo) /dev/sda1 mounted on / > command "yum -y install aide" ...output omitted... > command "echo letmein | passwd --stdin root" > selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /
Using Virt-customize to Customize Images
virt-customize is a high-level tool that also uses the libguestfs API, but eases image building by performing tasks using simple options that may have required multiple API calls to achieve using guestfish or the libguestfs API directly. The following example shows virt-customize using the -a option to add the disk, installing a package, setting the root password, and resetting SELinux contexts. [user@demo ~]$ virt-customize -a ~/demo-rhel-base.qcow2 \ --install aide \ --root-password password:letmein \ --selinux-relabel ...output omitted...
The following list compares the two tools.
Comparison of the guestfish and virt-customize Commands
• Complexity
  guestfish: A low-level tool that exposes the guestfs API directly.
  virt-customize: A high-level tool that is easier to use and simplifies common tasks.
• SELinux support
  guestfish: Use the selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts / command to restore SELinux file contexts.
  virt-customize: Use the --selinux-relabel option to restore file contexts. This option will use the touch /.autorelabel command if relabeling is unsuccessful.
• Options
  guestfish: For low-level tasks such as manipulating partitions, scripting, and remote access.
  virt-customize: For common tasks such as installing packages, changing passwords, setting the host name and time zone, and registering with Subscription Manager.
The --selinux-relabel customization option relabels files in the guest so that they have the correct SELinux label. This option tries to relabel files immediately. If unsuccessful, /.autorelabel is created on the image. This schedules the relabel operation for the next time the image boots.
Use Cases For most common image customization tasks, virt-customize is the best choice. However, as listed in the table above, the less frequent low-level tasks should be performed with the guestfish command.
Important When working with images that have SELinux enabled, ensure that the correct SELinux relabeling syntax is used to reset proper labels on files modified. Files with incorrectly labeled context will cause SELinux access denials. If the mislabeled files are critical system files, the image may not be able to boot until labeling is fixed.
Customizing an Image with guestfish
The following steps outline the process for customizing an image with guestfish. 1.
Download a base image.
2.
Execute the guestfish command. Use -i to automatically mount the partitions and use -a to add the image.
3.
Perform the changes you require, using commands such as add, rm, and command.
Important If your image will have SELinux enabled, ensure you relabel any affected files using the selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts / command.
4.
Exit the guestfish shell.
5.
Upload the image to the OpenStack Image service.
6.
Launch an instance using the custom image.
7.
Attach a floating IP to the instance.
8.
Connect to the instance using SSH and verify the customization was executed.
Customizing an Image with virt-customize The following steps outline the process for customizing an image with virt-customize. 1.
Download a base image.
2.
Execute the virt-customize command. Use -a to add the image, and then use other options such as --run-command, --install, --write and --root-password.
Important If your image will have SELinux enabled, ensure you use the --selinux-relabel option last. Running the restorecon command inside the image will not work through virt-customize.
3.
Upload the image to the OpenStack Image service.
4.
Launch an instance using the custom image.
5.
Attach a floating IP to the instance.
6.
Connect to the instance using SSH and verify the customization was executed.
References
guestfish - the guest file system shell: http://libguestfs.org/guestfish.1.html
virt-customize - Customize a virtual machine: http://libguestfs.org/virt-customize.1.html
Guided Exercise: Customizing an Image
In this exercise you will customize disk images using guestfish and virt-customize.
Resources
Base Image: http://materials.example.com/osp-small.qcow2
Outcomes
You should be able to:
• Customize an image using guestfish.
• Customize an image using virt-customize.
• Upload an image into Glance.
• Spawn an instance using a customized image.
Before you begin
Log in to workstation as student using student as the password. On workstation, run the lab customization-img-customizing setup command. This ensures that the required packages are installed on workstation, and provisions the environment with a public network, a private network, a private key, and security rules to access the instance.
[student@workstation ~]$ lab customization-img-customizing setup
Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://materials.example.com/osp-small.qcow2 and save it as /home/student/finance-rhel-db.qcow2. [student@workstation ~]$ wget http://materials.example.com/osp-small.qcow2 \ -O ~/finance-rhel-db.qcow2
2.
Using the guestfish command, open the image for editing and include network access. [student@workstation ~]$ guestfish -i --network -a ~/finance-rhel-db.qcow2 Welcome to guestfish, the guest filesystem shell for editing virtual machine filesystems and disk images. Type: 'help' for help on commands 'man' to read the manual 'quit' to quit the shell Operating system: Red Hat Enterprise Linux Server 7.3 (Maipo) /dev/sda1 mounted on / >
3.
Install the mariadb and mariadb-server packages.
> command "yum -y install mariadb mariadb-server" ...output omitted... Installed: mariadb.x86_64 1:5.5.52-1.el7 mariadb-server.x86_64 1:5.5.52-1.el7 Dependency Installed: libaio.x86_64 0:0.3.109-13.el7 perl-Compress-Raw-Bzip2.x86_64 0:2.061-3.el7 perl-Compress-Raw-Zlib.x86_64 1:2.061-4.el7 perl-DBD-MySQL.x86_64 0:4.023-5.el7 perl-DBI.x86_64 0:1.627-4.el7 perl-Data-Dumper.x86_64 0:2.145-3.el7 perl-IO-Compress.noarch 0:2.061-2.el7 perl-Net-Daemon.noarch 0:0.48-5.el7 perl-PlRPC.noarch 0:0.2020-14.el7 Complete!
4.
Enable the mariadb service. > command "systemctl enable mariadb"
5.
Because there was no output, ensure the mariadb service was enabled. > command "systemctl is-enabled mariadb" enabled
6.
Ensure the SELinux contexts for all affected files are correct.
Important Files modified from inside the guestfish tool are written without valid SELinux context. Failure to relabel critical modified files, such as /etc/passwd, will result in an unusable image, because SELinux properly denies access to files with improper context during the boot process. Although a relabel can be configured using touch /.autorelabel from within guestfish, this would be persistent on the image, resulting in a relabel being performed on every boot for every instance deployed using this image. Instead, the following step performs the relabel just once, right now.
> selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /
7.
Exit from the guestfish shell. > exit [student@workstation ~]$
8.
As the developer1 OpenStack user, upload the finance-rhel-db.qcow2 image to the image service as finance-rhel-db, with a minimum disk requirement of 10 GiB, and a minimum RAM requirement of 2 GiB. 8.1. Source the developer1-finance-rc credential file. [student@workstation ~]$ source developer1-finance-rc [student@workstation ~(developer1-finance)]$
8.2. As the developer1 OpenStack user, upload the finance-rhel-db.qcow2 image to the image service as finance-rhel-db. [student@workstation ~(developer1-finance)]$ openstack image create \ --disk-format qcow2 \ --min-disk 10 \ --min-ram 2048 \ --file finance-rhel-db.qcow2 \ finance-rhel-db ...output omitted...
9.
Launch an instance in the environment using the following attributes:
Instance Attributes
• flavor: m1.database
• key pair: developer1-keypair1
• network: finance-network1
• image: finance-rhel-db
• security group: finance-db
• name: finance-db1
[student@workstation ~(developer1-finance)]$ openstack server create \ --flavor m1.database \ --key-name developer1-keypair1 \ --nic net-id=finance-network1 \ --security-group finance-db \ --image finance-rhel-db \ --wait finance-db1 ...output omitted...
10. List the available floating IP addresses, and then allocate one to finance-db1. 10.1. List the floating IPs; unallocated IPs have None listed as their Port value. [student@workstation ~(developer1-finance)]$ openstack floating ip list \ -c "Floating IP Address" -c Port +---------------------+------+ | Floating IP Address | Port | +---------------------+------+ | 172.25.250.P | None | | 172.25.250.R | None |
+---------------------+------+
10.2. Attach an unallocated floating IP to the finance-db1 instance. [student@workstation ~(developer1-finance)]$ openstack server add floating \ ip finance-db1 172.25.250.P
11.
Use ssh to connect to the finance-db1 instance. Ensure the mariadb-server package is installed, and that the mariadb service is enabled and running. 11.1. Log in to the finance-db1 instance using ~/developer1-keypair1.pem with ssh. [student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \ [email protected] Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts. [cloud-user@finance-db1 ~]$
11.2. Confirm that the mariadb-server package is installed. [cloud-user@finance-db1 ~]$ rpm -q mariadb-server mariadb-server-5.5.52-1.el7.x86_64
11.3. Confirm that the mariadb service is enabled and running, and then log out. [cloud-user@finance-db1 ~]$ systemctl status mariadb ...output omitted... Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled) Active: active (running) since Mon 2017-05-29 20:49:37 EDT; 9min ago Process: 1033 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=0/SUCCESS) Process: 815 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS) Main PID: 1031 (mysqld_safe) ...output omitted... [cloud-user@finance-db1 ~]$ exit logout Connection to 172.25.250.P closed. [student@workstation ~(developer1-finance)]$
12. From workstation, retrieve the osp-small.qcow2 image from http://materials.example.com/osp-small.qcow2 and save it as /home/student/finance-rhel-mail.qcow2. [student@workstation ~(developer1-finance)]$ wget \ http://materials.example.com/osp-small.qcow2 -O ~/finance-rhel-mail.qcow2
13. Use the virt-customize command to customize the ~/finance-rhel-mail.qcow2 image. Enable the postfix service, configure postfix to listen on all interfaces, and relay all mail to workstation.lab.example.com. Install the mailx package to enable sending a test email. Ensure the SELinux contexts are restored.
[student@workstation ~(developer1-finance)]$ virt-customize \ -a ~/finance-rhel-mail.qcow2 \ --run-command 'systemctl enable postfix' \ --run-command 'postconf -e "relayhost = [workstation.lab.example.com]"' \ --run-command 'postconf -e "inet_interfaces = all"' \ --run-command 'yum -y install mailx' \ --selinux-relabel [ 0.0] Examining the guest ... [ 84.7] Setting a random seed [ 84.7] Running: systemctl enable postfix [ 86.5] Running: postconf -e "relayhost = [workstation.lab.example.com]" [ 88.4] Running: postconf -e "inet_interfaces = all" [ 89.8] Running: yum -y install mailx [ 174.0] SELinux relabelling [ 532.7] Finishing off
14. As the developer1 OpenStack user, upload the finance-rhel-mail.qcow2 image to the image service as finance-rhel-mail, with a minimum disk requirement of 10 GiB, and a minimum RAM requirement of 2 GiB. 14.1. Use the openstack command to upload the finance-rhel-mail.qcow2 image to the image service. [student@workstation ~(developer1-finance)]$ openstack image create \ --disk-format qcow2 \ --min-disk 10 \ --min-ram 2048 \ --file ~/finance-rhel-mail.qcow2 \ finance-rhel-mail ...output omitted...
15. Launch an instance in the environment using the following attributes:

Instance Attributes
Attribute          Value
flavor             m1.web
key pair           developer1-keypair1
network            finance-network1
image              finance-rhel-mail
security group     finance-mail
name               finance-mail1
[student@workstation ~(developer1-finance)]$ openstack server create \ --flavor m1.web \ --key-name developer1-keypair1 \ --nic net-id=finance-network1 \ --security-group finance-mail \ --image finance-rhel-mail \ --wait finance-mail1 ...output omitted...
16. List the available floating IP addresses, and allocate one to finance-mail1.
16.1. List the available floating IPs. [student@workstation ~(developer1-finance)]$ openstack floating ip list \ -c "Floating IP Address" -c Port +---------------------+--------------------------------------+ | Floating IP Address | Port | +---------------------+--------------------------------------+ | 172.25.250.P | 1ce9ffa5-b52b-4581-a696-52f464912500 | | 172.25.250.R | None | +---------------------+--------------------------------------+
16.2. Attach an available floating IP to the finance-mail1 instance. [student@workstation ~(developer1-finance)]$ openstack server add floating \ ip finance-mail1 172.25.250.R
17.
Use ssh to connect to the finance-mail1 instance. Ensure the postfix service is running, that postfix is listening on all interfaces, and that the relay_host directive is correct. 17.1. Log in to the finance-mail1 instance using ~/developer1-keypair1.pem with ssh. [student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \ [email protected] Warning: Permanently added '172.25.250.R' (ECDSA) to the list of known hosts. [cloud-user@finance-mail1 ~]$
17.2. Ensure the postfix service is running. [cloud-user@finance-mail1 ~]$ systemctl status postfix ...output omitted... Loaded: loaded (/usr/lib/systemd/system/postfix.service; enabled; vendor preset: disabled) Active: active (running) since Mon 2017-05-29 00:59:32 EDT; 4s ago Process: 1064 ExecStart=/usr/sbin/postfix start (code=exited, status=0/ SUCCESS) Process: 1061 ExecStartPre=/usr/libexec/postfix/chroot-update (code=exited, status=0/SUCCESS) Process: 1058 ExecStartPre=/usr/libexec/postfix/aliasesdb (code=exited, status=0/SUCCESS) Main PID: 1136 (master) ...output omitted...
17.3. Ensure postfix is listening on all interfaces. [cloud-user@finance-mail1 ~]$ sudo ss -tnlp | grep master LISTEN 0 100 *:25 *:* users:(("master",pid=1136,fd=13)) LISTEN 0 100 :::25 :::* users:(("master",pid=1136,fd=14))
17.4. Ensure the relayhost directive is configured correctly. [cloud-user@finance-mail1 ~]$ postconf relayhost
relayhost = [workstation.lab.example.com]
17.5. Send a test email to [email protected]. [cloud-user@finance-mail1 ~]$ mail -s "Test" [email protected] Hello World! . EOT
17.6. Return to workstation. Use the mail command to confirm that the test email arrived. [cloud-user@finance-mail1 ~]$ exit [student@workstation ~]$ mail Heirloom Mail version 12.5 7/5/10. Type ? for help. "/var/spool/mail/student": 1 message 1 new >N 1 Cloud User Mon May 29 01:18 22/979 "Test" & q
Cleanup From workstation, run the lab customization-img-customizing cleanup command to clean up this exercise. [student@workstation ~]$ lab customization-img-customizing cleanup
Lab: Building and Customizing Images
In this lab, you will build a disk image using diskimage-builder, and then modify it using guestfish.

Resources
Base Image URL                          http://materials.example.com/osp-small.qcow2
Diskimage-builder elements directory    /usr/share/diskimage-builder/elements
Outcomes You will be able to: • Build an image using diskimage-builder. • Customize the image using the guestfish command. • Upload the image to the OpenStack image service. • Spawn an instance using the customized image. Before you begin Log in to workstation as student using student as the password. On workstation, run the lab customization-review setup command. This ensures that the required packages are installed on workstation, and provisions the environment with a public network, a private network, a key pair, and security rules to access the instance. [student@workstation ~]$ lab customization-review setup
Steps 1. From workstation, retrieve the osp-small.qcow2 image from http://materials.example.com/osp-small.qcow2 and save it in the /home/student/ directory. 2.
Create a copy of the diskimage-builder elements directory to work with in the /home/student/ directory.
3.
Create a post-install.d directory under the working copy of the rhel7 element.
4.
Add a script under the rhel7/post-install.d directory to enable the httpd service.
5.
Export the following environment variables, which diskimage-builder requires.

Environment Variables
Variable             Content
NODE_DIST            rhel7
DIB_LOCAL_IMAGE      /home/student/osp-small.qcow2
DIB_YUM_REPO_CONF    "/etc/yum.repos.d/openstack.repo"
ELEMENTS_PATH        /home/student/elements
6.
Build a RHEL 7 image named production-rhel-web.qcow2 using the diskimage-builder elements configured previously. Include the httpd package in the image.
7.
Add a custom web index page to the production-rhel-web.qcow2 image using guestfish. Include the text production-rhel-web in the index.html file. Ensure the SELinux context of /var/www/html/index.html is correct.
8.
As the operator1 user, create a new OpenStack image named production-rhel-web using the production-rhel-web.qcow2 image, with a minimum disk requirement of 10 GiB, and a minimum RAM requirement of 2 GiB.
9.
As the operator1 user, launch an instance using the following attributes:

Instance Attributes
Attribute          Value
flavor             m1.web
key pair           operator1-keypair1
network            production-network1
image              production-rhel-web
security group     production-web
name               production-web1
10. List the available floating IP addresses, and then allocate one to production-web1. 11.
Log in to the production-web1 instance using operator1-keypair1.pem with ssh. Ensure the httpd package is installed, and that the httpd service is enabled and running.
12. From workstation, confirm that the custom web page, displayed from production-web1, contains the text production-rhel-web. Evaluation From workstation, run the lab customization-review grade command to confirm the success of this exercise. Correct any reported failures and rerun the command until successful. [student@workstation ~]$ lab customization-review grade
Cleanup From workstation, run the lab customization-review cleanup command to clean up this exercise. [student@workstation ~]$ lab customization-review cleanup
Solution
In this lab, you will build a disk image using diskimage-builder, and then modify it using guestfish.

Resources
Base Image URL                          http://materials.example.com/osp-small.qcow2
Diskimage-builder elements directory    /usr/share/diskimage-builder/elements
Outcomes You will be able to: • Build an image using diskimage-builder. • Customize the image using the guestfish command. • Upload the image to the OpenStack image service. • Spawn an instance using the customized image. Before you begin Log in to workstation as student using student as the password. On workstation, run the lab customization-review setup command. This ensures that the required packages are installed on workstation, and provisions the environment with a public network, a private network, a key pair, and security rules to access the instance. [student@workstation ~]$ lab customization-review setup
Steps 1. From workstation, retrieve the osp-small.qcow2 image from http://materials.example.com/osp-small.qcow2 and save it in the /home/student/ directory. [student@workstation ~]$ wget http://materials.example.com/osp-small.qcow2 \ -O /home/student/osp-small.qcow2
2.
Create a copy of the diskimage-builder elements directory to work with in the /home/student/ directory. [student@workstation ~]$ cp -a /usr/share/diskimage-builder/elements /home/student/
3.
Create a post-install.d directory under the working copy of the rhel7 element. [student@workstation ~]$ mkdir -p /home/student/elements/rhel7/post-install.d
4.
Add a script under the rhel7/post-install.d directory to enable the httpd service. 4.1. Add a script to enable the httpd service.
[student@workstation ~]$ cd /home/student/elements/rhel7/post-install.d/ [student@workstation post-install.d]$ cat
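The remainder of this listing, covering steps 4.1 through 7.1, does not appear here. As a hedged sketch only, not the workbook's verbatim solution, commands of the following form complete those steps; the script name 01-enable-services and the exact disk-image-create element list and options are assumptions.

[student@workstation post-install.d]$ cat << 'EOF' > 01-enable-services
#!/bin/bash
systemctl enable httpd
EOF
[student@workstation post-install.d]$ chmod +x 01-enable-services
[student@workstation post-install.d]$ cd ~
[student@workstation ~]$ export NODE_DIST=rhel7
[student@workstation ~]$ export DIB_LOCAL_IMAGE=/home/student/osp-small.qcow2
[student@workstation ~]$ export DIB_YUM_REPO_CONF="/etc/yum.repos.d/openstack.repo"
[student@workstation ~]$ export ELEMENTS_PATH=/home/student/elements
[student@workstation ~]$ disk-image-create vm rhel7 -t qcow2 -p httpd \
-o production-rhel-web
[student@workstation ~]$ guestfish -a production-rhel-web.qcow2 -i

After guestfish starts, the steps that follow (7.2 onward) continue inside the guestfish shell.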
7.2. Create a new /var/www/html/index.html file.
> touch /var/www/html/index.html
7.3. Edit the /var/www/html/index.html file and include the required key words. > edit /var/www/html/index.html This instance uses the production-rhel-web image.
7.4. To ensure the new index page works with SELinux in enforcing mode, restore the /var/www/ directory context (including the index.html file). > selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /var/www/
7.5. Exit the guestfish shell. > exit [student@workstation ~]$
8.
As the operator1 user, create a new OpenStack image named production-rhel-web using the production-rhel-web.qcow2 image, with a minimum disk requirement of 10 GiB, and a minimum RAM requirement of 2 GiB. 8.1. Source the operator1-production-rc credentials file. [student@workstation ~]$ source operator1-production-rc [student@workstation ~(operator1-production)]$
8.2. Upload the production-rhel-web.qcow2 image to the OpenStack Image service. [student@workstation ~(operator1-production)]$ openstack image create \ --disk-format qcow2 \ --min-disk 10 \ --min-ram 2048 \ --file production-rhel-web.qcow2 \ production-rhel-web ...output omitted...
9.
As the operator1 user, launch an instance using the following attributes:

Instance Attributes
Attribute          Value
flavor             m1.web
key pair           operator1-keypair1
network            production-network1
image              production-rhel-web
security group     production-web
name               production-web1
[student@workstation ~(operator1-production)]$ openstack server create \ --flavor m1.web \ --key-name operator1-keypair1 \ --nic net-id=production-network1 \ --image production-rhel-web \ --security-group production-web \ --wait production-web1 ...output omitted...
10. List the available floating IP addresses, and then allocate one to production-web1. 10.1. List the floating IPs. Available IP addresses have the Port attribute set to None. [student@workstation ~(operator1-production)]$ openstack floating ip list \ -c "Floating IP Address" -c Port +---------------------+------+ | Floating IP Address | Port | +---------------------+------+ | 172.25.250.P | None | +---------------------+------+
10.2. Attach an available floating IP to the production-web1 instance. [student@workstation ~(operator1-production)]$ openstack server add \ floating ip production-web1 172.25.250.P
11.
Log in to the production-web1 instance using operator1-keypair1.pem with ssh. Ensure the httpd package is installed, and that the httpd service is enabled and running. 11.1. Use SSH to log in to the production-web1 instance using operator1-keypair1.pem. [student@workstation ~(operator1-production)]$ ssh -i operator1-keypair1.pem \ [email protected] Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts. [cloud-user@production-web1 ~]$
11.2. Confirm that the httpd package is installed. [cloud-user@production-web1 ~]$ rpm -q httpd httpd-2.4.6-45.el7.x86_64
11.3. Confirm that the httpd service is running. [cloud-user@production-web1 ~]$ systemctl status httpd ...output omitted... Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled) Active: active (running) since Wed 2017-05-24 23:55:42 EDT; 8min ago Docs: man:httpd(8)
man:apachectl(8) Main PID: 833 (httpd) ...output omitted...
11.4. Exit the instance to return to workstation. [cloud-user@production-web1 ~]$ exit
12. From workstation, confirm that the custom web page, displayed from production-web1, contains the text production-rhel-web. [student@workstation ~(operator1-production)]$ curl http://172.25.250.P/index.html This instance uses the production-rhel-web image.
Evaluation From workstation, run the lab customization-review grade command to confirm the success of this exercise. Correct any reported failures and rerun the command until successful. [student@workstation ~]$ lab customization-review grade
Cleanup From workstation, run the lab customization-review cleanup command to clean up this exercise. [student@workstation ~]$ lab customization-review cleanup
Summary In this chapter, you learned: • The pros and cons of building an image versus customizing an existing one, such as meeting organization security standards, including third-party agents, and adding operator accounts. • When to use the guestfish or virt-customize tools. Use guestfish when you need to perform low-level tasks such as partitioning disks. Use virt-customize for all common customization tasks such as setting passwords and installing packages. • Making changes to an image using these tools affects SELinux file contexts, because SELinux is not supported directly in the chroot environment.
TRAINING
CHAPTER 4

MANAGING STORAGE

Overview
Goal          Manage Ceph and Swift storage for OpenStack.
Objectives    • Describe back-end storage options for OpenStack services.
              • Configure Ceph as the back-end storage for OpenStack services.
              • Manage Swift as object storage.
Sections      • Describing Storage Options (and Quiz)
              • Configuring Ceph Storage (and Guided Exercise)
              • Managing Object Storage (and Guided Exercise)
Lab           • Managing Storage
Describing Storage Options Objectives After completing this section, students should be able to describe back-end storage options for OpenStack services.
Storage in Red Hat OpenStack Platform A cloud environment such as Red Hat OpenStack Platform requires applications that take advantage of the features provided by this environment. They should be designed to leverage the scalability of the compute and storage resources in Red Hat OpenStack Platform. Although some storage configurations can use simple back ends, such as the volume group for the OpenStack block storage service, Red Hat OpenStack Platform also supports enterprise-level back ends. This support includes the most common SAN infrastructures, as well as support for DAS and NAS devices. This allows reuse of existing storage infrastructure as a back end for OpenStack. In a physical enterprise environment, servers are often installed with local storage drives attached to them, and use external storage to scale that local storage. The same is true of a cloud-based instance, which has some associated local storage and uses external storage to scale beyond it. In cloud environments, storage is a key resource that needs to be managed appropriately so that the maximum number of users can take advantage of it. Local storage for instances resides on the compute nodes where those instances run, and Red Hat OpenStack Platform recycles this local storage when an instance terminates. This type of storage is known as ephemeral storage, and it includes both the effective storage space a user can use inside an instance and the storage used for swap memory by the instance. All ephemeral storage resources are removed when the instance terminates. The disk drive space of the physical servers on which instances run limits the available local storage. To scale the storage of an instance, Red Hat OpenStack Platform provisions additional space with the OpenStack block storage service, object storage service, or file share service. The storage resources provided by those services are persistent, so they remain after the instance terminates.
Storage Options for OpenStack Services OpenStack services require two types of storage: ephemeral storage and persistent storage. Ephemeral storage uses the local storage available in the compute nodes on which instances run. This storage usually provides better performance because it uses the same back end as the instance's virtual disk. Because of this, ephemeral storage is usually the best option for elements that require high performance, such as the operating system or swap disks. Although ephemeral storage usually provides better performance, sometimes users need to store data persistently. Red Hat OpenStack Platform services provide persistent storage in the form of block storage and object storage. The block storage service allows storing data on a device available in the instance file system. The object storage service provides an external storage infrastructure available to instances. Red Hat OpenStack Platform supports several storage systems as the back end for its services. Those storage systems include:
LVM The block storage service supports LVM as a storage back end. LVM is available but not officially supported by Red Hat. An LVM-based back end requires a volume group. Each block storage volume uses a logical volume as its back end. Red Hat Ceph Storage The block storage and image services both support Red Hat Ceph Storage as a storage back end. Red Hat Ceph Storage provides petabyte-scale storage and has no single point of failure. Red Hat OpenStack Platform uses RBD to access Red Hat Ceph Storage. Each new volume or image created in Red Hat OpenStack Platform uses an RBD image on Red Hat Ceph Storage. NFS Red Hat OpenStack Platform services such as the block storage service support NFS as a storage back end. Each volume resides in one of the NFS shares specified in the driver options in the block storage service configuration file. Vendor-specific Storage Supported storage hardware vendors provide drivers that allow Red Hat OpenStack Platform services to use their storage infrastructure as a back end.
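As a hedged illustration of the NFS case, a minimal block storage back-end definition in /etc/cinder/cinder.conf might resemble the following; the back-end section name and the export shown here are assumptions, not values from this course environment.

# /etc/cinder/cinder.conf -- NFS back end for the block storage service
[nfs]
volume_backend_name = nfs
volume_driver = cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config = /etc/cinder/nfs_shares

The file named by nfs_shares_config then lists one NFS export per line, for example nfshost.example.com:/exports/cinder.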
Note Red Hat provides support for Red Hat Ceph Storage and NFS.
Benefits, Recommended Practices, and Use Cases The undercloud currently supports both Red Hat Ceph Storage and NFS as storage back ends for Red Hat OpenStack Platform systems. Most of the existing NAS and SAN solutions can export storage using NFS. In addition, some storage hardware vendors provide drivers for the different Red Hat OpenStack Platform services. These drivers can interact natively with the storage systems provided by those vendors. LVM is suitable for use in test environments. The storage volumes are created on the local storage of the machine where the block storage service is running. This back end uses that machine as an iSCSI target to export those storage volumes. This configuration is a bottleneck when scaling up the environment. Red Hat Ceph Storage is a separate infrastructure from Red Hat OpenStack Platform. This storage system provides fault tolerance and scalability. Red Hat Ceph Storage is not the best choice for some proof-of-concept environments, because of its hardware requirements. The undercloud can collocate some Red Hat Ceph Storage services on the controller node. This configuration reduces the number of resources needed. Because of the growing demand for computing and storage resources, the undercloud now supports hyper-converged infrastructures (HCI). These infrastructures use compute nodes where both Red Hat OpenStack Platform and Red Hat Ceph Storage services run. Hyper-converged nodes help satisfy the need for better utilization of the underlying hardware resources.
Storage Architecture for OpenStack Services The architectures used by Red Hat Ceph Storage and the object storage service (Swift) are as follows:
Red Hat Ceph Storage Architecture The Red Hat Ceph Storage architecture has two main elements: monitors (MONs) and object storage devices (OSDs). The monitors manage the cluster metadata, and they are the front end for the Ceph cluster. A client that wants to access a Ceph cluster needs at least one monitor IP address or host name. Each object storage device has a disk device associated with it. A node can have several object storage devices. The undercloud deploys Ceph with one monitor running on the controller node. The Red Hat Ceph Storage architecture is discussed further in a later section.
Note The Red Hat OpenStack Platform block storage and image services support Red Hat Ceph Storage as their storage back end.
Swift Architecture The Red Hat OpenStack Platform Swift service architecture has a front-end service, the proxy server (swift-proxy), and three back-end services: account server (swift-account); object server (swift-object); and container server (swift-container). The proxy server maintains the Swift API. Red Hat OpenStack Platform configures the Keystone endpoint for Swift with the URI for this API.
Figure 4.1: Swift architecture
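To see the URI that Keystone registers for this API, the service catalog can be queried from any host with OpenStack credentials loaded; a hedged example follows, with the output omitted because the endpoint addresses depend on the deployment.

[user@demo ~]$ openstack catalog show object-store
...output omitted...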
References Further information is available in the Storage Guide for Red Hat OpenStack Platform at https://access.redhat.com/documentation/en/red-hat-openstack-platform
Quiz: Describing Storage Options
Choose the correct answers to the following questions:

1. Red Hat provides support for which two storage back ends? (Choose two.)
a. In-memory
b. NFS
c. Red Hat Ceph Storage
d. Raw devices
e. LVM

2. Which two benefits are provided by a Red Hat Ceph Storage-based back end over NFS? (Choose two.)
a. Snapshots
b. No single point of failure
c. Petabyte-scale storage
d. Thin provisioning
e. Integration with Red Hat OpenStack Platform

3. What is an LVM-based back end suitable for in Red Hat OpenStack Platform?
a. Production-ready environments
b. Cluster environments
c. Proof of concept environments
d. High performance environments (local storage based)

4. Which method does the Red Hat OpenStack Platform block storage service use to access Ceph?
a. CephFS
b. Ceph Gateway (RADOSGW)
c. RBD
d. Ceph native API (librados)

5. Which two Red Hat OpenStack Platform services are supported to use Red Hat Ceph Storage as their back end? (Choose two.)
a. Share file system service
b. Block storage service
c. Image service
d. Compute service
e. Object storage service
Solution
Choose the correct answers to the following questions:

1. Red Hat provides support for which two storage back ends? (Choose two.)
b. NFS
c. Red Hat Ceph Storage

2. Which two benefits are provided by a Red Hat Ceph Storage-based back end over NFS? (Choose two.)
b. No single point of failure
c. Petabyte-scale storage

3. What is an LVM-based back end suitable for in Red Hat OpenStack Platform?
c. Proof of concept environments

4. Which method does the Red Hat OpenStack Platform block storage service use to access Ceph?
c. RBD

5. Which two Red Hat OpenStack Platform services are supported to use Red Hat Ceph Storage as their back end? (Choose two.)
b. Block storage service
c. Image service
Configuring Ceph Storage Objectives After completing this section, students should be able to configure Ceph as the back-end storage for OpenStack services.
Red Hat Ceph Storage Architecture Hardware-based storage infrastructures have inherently limited scalability. Cloud computing infrastructures require a storage system that can scale in parallel with the computing resources. Software-defined storage systems, such as Red Hat Ceph Storage, can scale at the same pace as the computing resources. Red Hat Ceph Storage also supports features such as snapshotting and thin provisioning. The Ceph architecture is based on the daemons listed in Figure 4.2: Red Hat Ceph storage architecture. Multiple OSDs can run on a single server, and OSDs can also be distributed across many servers. These daemons can be scaled out to meet the requirements of the architecture being deployed.
Figure 4.2: Red Hat Ceph storage architecture Ceph Monitors Ceph monitors (MONs) are daemons that maintain a master copy of the cluster map. The cluster map is a collection of five maps that contain information about the Ceph cluster state and configuration. Ceph daemons and clients can check in periodically with the monitors to be sure they have the most recent copy of the map. In this way they provide consensus for distributed
decision making. The monitors must establish a consensus regarding the state of the cluster. This means that an odd number of monitors is required to avoid a stalled vote, and a minimum of three monitors must be configured. For the Ceph Storage cluster to be operational and accessible, more than 50% of monitors must be running and operational. If the number of active monitors falls below this threshold, the complete Ceph Storage cluster becomes inaccessible to any client. This is done to protect the integrity of the data. Ceph Object Storage Devices Ceph Object Storage Devices (OSDs) are the building blocks of a Ceph Storage cluster. OSDs connect a disk to the Ceph Storage cluster. Each hard disk to be used for the Ceph cluster has a file system on it, and an OSD daemon associated with it. Red Hat Ceph Storage currently supports only the XFS file system. Extended Attributes (xattrs) are used to store information about the internal object state, snapshot metadata, and Ceph Gateway Access Control Lists (ACLs). Extended attributes are enabled by default on XFS file systems. The goal for the OSD daemon is to bring the computing power as close as possible to the physical data to improve performance. Each OSD has its own journal, not related to the file-system journal. Journals use raw volumes on the OSD nodes, and should be configured on a separate device, preferably a fast device such as an SSD, for performance-oriented or write-heavy environments. Depending on the Ceph deployment tool used, the journal is configured such that if a Ceph OSD, or a node where a Ceph OSD is located, fails, the journal is replayed when the OSD restarts. The replay sequence starts after the last sync operation, as previous journal records were trimmed out. Metadata Server The Ceph Metadata Server (MDS) is a service that provides POSIX-compliant, shared file-system metadata management, which supports both directory hierarchy and file metadata, including ownership, time stamps, and mode. MDS uses RADOS to store metadata instead of local storage, and has no access to file content, because it is only required for file access. RADOS is an object storage service and is part of Red Hat Ceph Storage. MDS also enables CephFS to interact with the Ceph Object Store, mapping an inode to an object, and recording where data is stored within a tree. Clients accessing a CephFS file system first make a request to an MDS, which provides the information needed to get files from the correct OSDs.
Note The metadata server is not deployed by the undercloud in the default Ceph configuration.
Ceph Access Methods The following methods are available for accessing a Ceph cluster: • The Ceph native API (librados): native interface to the Ceph cluster. Service interfaces built on this native interface include the Ceph Block Device, the Ceph Gateway, and the Ceph File System. • The Ceph Gateway (RADOSGW): RESTful APIs for Amazon S3 and Swift compatibility. The Ceph Gateway is referred to as radosgw.
• The Ceph Block Device (RBD, librbd): This is a Python module that provides file-like access to Ceph Block Device images. • The Ceph File System (CephFS, libcephfs): provides access to a Ceph cluster via a POSIX-like interface.
Red Hat Ceph Storage Terminology Pools Pools are logical partitions for storing objects under the same name tag, which support multiple name spaces. The Controlled Replication Under Scalable Hashing (CRUSH) algorithm is used to select the OSDs hosting the data for a pool. Each pool is assigned a single CRUSH rule for its placement strategy. The CRUSH rule is responsible for determining which OSDs receive the data for all the pools using that particular CRUSH rule. A pool name must be specified for each I/O request. When a cluster is deployed without creating a pool, Ceph uses the default pools for storing data. By default, only the rbd pool is created when Red Hat Ceph Storage is installed. The ceph osd lspools command displays the current pools in the cluster. This includes the pools created by the undercloud to integrate Red Hat Ceph Storage with Red Hat OpenStack Platform services. [root@demo]# ceph osd lspools 0 rbd,1 metrics,2 images,3 backups,4 volumes,5 vms,
Users A Ceph client, which can be either a user or a service, requires a Ceph user to access the Ceph cluster. By default, Red Hat Ceph Storage creates the admin user. The admin user can create other users and their associated key-ring files. Each user has an associated key-ring file. The usual location of this file is the /etc/ceph directory on the client machine. Permissions are granted at the pool level for each Ceph user, either for all pools or to one or more specific pools. These permissions can be read, write, or execute. The users available in a Ceph cluster can be listed using the ceph auth list command. These users include the admin user created by default, and the openstack user created by the undercloud for integration with Red Hat OpenStack Platform services. [root@demo]# ceph auth list installed auth entries: [... output omitted ...] client.admin key: AQBELB9ZAAAAABAAt7mbiBwBA8H60Z7p34D6hA== caps: [mds] allow * caps: [mon] allow * caps: [osd] allow * [... output omitted ...] client.openstack key: AQBELB9ZAAAAABAAmS+6yVgIuc7aZA/CL8rZoA== caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics
Integration with Red Hat OpenStack Platform The undercloud supports the deployment of Red Hat Ceph Storage as the back end for Red Hat OpenStack Platform services, such as the block storage and the image services. Both the block storage and image services use RBD images as the back end for volumes and images respectively. Each service requires a Ceph user and a Ceph pool. The undercloud creates a pool named images for the image service, and a pool named volumes for the block storage service. The undercloud also creates by default the openstack user, who has access to both the block storage service pools and the image service pools.
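The undercloud writes this integration into the service configuration files on the overcloud nodes. As a hedged illustration only (the back-end section name and the secret UUID placeholder shown here are assumptions, and the exact contents are generated by the deployment), the relevant settings typically resemble the following in /etc/cinder/cinder.conf and /etc/glance/glance-api.conf.

# /etc/cinder/cinder.conf -- block storage volumes stored as RBD images in the volumes pool
[tripleo_ceph]
volume_backend_name = tripleo_ceph
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = openstack
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = <libvirt secret UUID>

# /etc/glance/glance-api.conf -- images stored as RBD images in the images pool
[glance_store]
default_store = rbd
rbd_store_pool = images
rbd_store_user = openstack
rbd_store_ceph_conf = /etc/ceph/ceph.conf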
Hyper-converged Infrastructures The demand for computing and storage resources in cloud computing environments is growing. This growing demand is pushing for better utilization of the underlying hardware resources. The undercloud supports this initiative by supporting the deployment of hyper-converged nodes. These hyper-converged nodes include both compute and Red Hat Ceph Storage services. The undercloud supports the deployment and management of Red Hat OpenStack Platform environments that only use hyper-converged nodes, as well as Red Hat OpenStack Platform environments with a mix of hyper-converged and compute nodes without any Ceph service. Hyper-converged node configuration needs to be adjusted manually after deployment to avoid degradation of either computing or storage services, because of shared hardware resources.
Troubleshooting Ceph Red Hat Ceph Storage uses a configuration file, ceph.conf, under the /etc/ceph directory. All machines running Ceph daemons, as well as Ceph clients, use this configuration file. Each Ceph daemon creates a log file on the machine where it is running. These log files are located in the /var/log/ceph directory. The Red Hat Ceph Storage CLI tools provide several commands that you can use to determine the status of the Ceph cluster. For example, the ceph health command determines the current health status of the cluster. This status can be HEALTH_OK when no errors are present, HEALTH_WARN, or HEALTH_ERR when the cluster has some issues. [root@demo]# ceph health HEALTH_OK
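As a hedged illustration of the ceph.conf file mentioned above, a minimal classroom configuration typically contains entries like the following; the fsid and monitor address match the cluster shown in this section, but the file generated by the undercloud contains additional settings.

[global]
fsid = 2ff74e60-3cb9-11e7-96f3-52540001fac8
mon_host = 172.24.3.1
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx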
The ceph -s command provides more details about the Ceph cluster's status, such as the number of MONs and OSDs and the status of the current placement groups (PGs). [root@demo]# ceph -s cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8 health HEALTH_OK monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0} election epoch 4, quorum 0 overcloud-controller-0 osdmap e53: 3 osds: 3 up, 3 in flags sortbitwise pgmap v1108: 224 pgs, 6 pools, 595 MB data, 404 objects 1897 MB used, 56437 MB / 58334 MB avail 224 active+clean
The ceph -w command, in addition to the Ceph cluster's status, returns Ceph cluster events. Enter Ctrl+C to exit this command.
[root@demo]# ceph -w cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8 health HEALTH_OK monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0} election epoch 4, quorum 0 overcloud-controller-0 osdmap e53: 3 osds: 3 up, 3 in flags sortbitwise pgmap v1108: 224 pgs, 6 pools, 595 MB data, 404 objects 1897 MB used, 56437 MB / 58334 MB avail 224 active+clean 2017-05-30 15:43:35.402634 mon.0 [INF] from='client.? 172.24.3.3:0/2002402609' entity='client.admin' cmd=[{"prefix": "auth list"}]: dispatch ...output omitted...
There are other commands available, such as the ceph osd tree command, which shows the status of the OSD daemons, either up or down. This command also displays the machine where those OSD daemons are running. [root@demo]# ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -1 0.05499 root default -2 0.05499 host overcloud-cephstorage-0 0 0.01799 osd.0 up 1.00000 1.00000 1 0.01799 osd.1 up 1.00000 1.00000 2 0.01799 osd.2 up 1.00000 1.00000
OSD daemons can be managed using systemd unit files. The systemctl stop ceph-osd@osdid command supports the management of a single OSD daemon with the ID osdid. This command has to be executed on the Ceph node where the OSD with the corresponding ID is located. If the OSD with an ID of 0 is located on the demo server, the following command would be used to stop that OSD daemon: [root@demo]# systemctl stop ceph-osd@0
Troubleshooting OSD Problems If the cluster is not healthy, the ceph -s command displays a detailed status report. This status report contains the following information: • Current status of the OSDs (up, down, out, in). An OSD's status is up if the OSD is running, and down if the OSD is not running. An OSD's status is in if the OSD allows data read and write, or out if the OSD does not. • OSD capacity limit information (nearfull or full). • Current status of the placement groups (PGs). Although Ceph is built for seamless scalability, this does not mean that the OSDs cannot run out of space. Space-related warning or error conditions are reported both by the ceph -s and ceph health commands, and OSD usage details are reported by the ceph osd df command. When an OSD reaches the full threshold, it stops accepting write requests, although read requests are still served.
Troubleshooting MON Problems Monitor servers (MONs) maintain the cluster map to ensure cluster quorum and to avoid typical split-brain situations. For a Ceph cluster to be healthy, it has to have quorum, which means that more than half of the configured MON servers are operational, and that the operational MON servers communicate with each other. If a MON only sees half of all other MONs or fewer, it becomes non-operational. This behavior prohibits normal operation, and can lead to cluster downtime, affecting users. Usually a MON failure is related to network problems, but additional information about what caused the crash can be gained using the ceph daemon mon.monid quorum_status command, and investigating the /var/log/ceph/ceph.log and /var/log/ceph/ceph-mon.hostname.log files. In the previous command, a MON ID uses the following format: mon.monid, where monid is the ID of the MON (a number starting at 0). Recovery typically involves restarting any failed MONs. If the MON with an ID of 1 is located on the demo server, the following command would be used to get additional information about the quorum status for the MON: [root@demo]# ceph daemon mon.1 quorum_status
Configuring Ceph Storage The following steps outline the process for managing Ceph MON and OSD daemons and verifying their status. 1.
Log in to an OpenStack controller node.
2.
Verify the availability of the ceph client key rings.
3.
Verify the monitor daemon and authentication settings in the Ceph cluster's configuration file.
4.
Verify that the Ceph cluster health is HEALTH_OK.
5.
Verify the number of MON and OSD daemons configured in the Ceph cluster.
6.
Verify that the MON daemon's associated service, ceph-mon, is running.
7.
Locate the log file for the MON daemon.
8.
Log in to a Ceph node.
9.
Verify which two OSDs are in the up state.
10.
Locate the log files for the three OSD daemons.
References Further information is available in the Red Hat Ceph Storage for the Overcloud Guide for Red Hat OpenStack Platform at https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Configuring Ceph Storage In this exercise, you will verify the status of a Ceph cluster. You will also verify the Ceph cluster configuration as the back end for OpenStack services. Finally you will troubleshoot and fix an issue with a Ceph OSD. Outcomes You should be able to: • Verify the status of a Ceph cluster. • Verify Ceph pools and user for Red Hat OpenStack Platform services. • Troubleshoot and fix an issue with a Ceph OSD. Before you begin Log in to workstation as student using student as the password. From workstation, run lab storage-config-ceph setup to verify that OpenStack services are running and the resources created in previous sections are available. [student@workstation ~]$ lab storage-config-ceph setup
Steps 1. Verify that the Ceph cluster status is HEALTH_OK. 1.1. Log in to controller0 using the heat-admin user. [student@workstation ~]$ ssh heat-admin@controller0
1.2. Verify Ceph cluster status using the sudo ceph health command. [heat-admin@overcloud-controller-0 ~]$ sudo ceph health HEALTH_OK
2.
Verify the status of the Ceph daemons and the cluster's latest events. 2.1. Using the sudo ceph -s command, you will see a MON daemon and three OSD daemons. The three OSD daemons' states will be up and in. [heat-admin@overcloud-controller-0 ~]$ sudo ceph -s cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8 health HEALTH_OK monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0} election epoch 4, quorum 0 overcloud-controller-0 osdmap e50: 3 osds: 3 up, 3 in flags sortbitwise pgmap v556: 224 pgs, 6 pools, 1358 kB data, 76 objects 121 MB used, 58213 MB / 58334 MB avail 224 active+clean
2.2. Display the Ceph cluster's latest events using the sudo ceph -w command. Press Ctrl+C to break the event listing. [heat-admin@overcloud-controller-0 ~]$ sudo ceph -w cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8 health HEALTH_OK monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0} election epoch 4, quorum 0 overcloud-controller-0 osdmap e50: 3 osds: 3 up, 3 in flags sortbitwise pgmap v556: 224 pgs, 6 pools, 1358 kB data, 76 objects 121 MB used, 58213 MB / 58334 MB avail 224 active+clean 2017-05-22 10:48:03.427574 mon.0 [INF] pgmap v574: 224 pgs: 224 active+clean; 1359 kB data, 122 MB used, 58212 MB / 58334 MB avail ...output omitted... Ctrl+C
3.
Verify that the pools and the openstack user, required for configuring Ceph as the back end for Red Hat OpenStack Platform services, are available. 3.1. Verify that the images and volumes pools are available using the sudo ceph osd lspools command. [heat-admin@overcloud-controller-0 ~]$ sudo ceph osd lspools 0 rbd,1 metrics,2 images,3 backups,4 volumes,5 vms,
3.2. Verify that the openstack user is available using the sudo ceph auth list command. This user will have rwx permissions for both the images and volumes pools. [heat-admin@overcloud-controller-0 ~]$ sudo ceph auth list ...output omitted... client.openstack key: AQBELB9ZAAAAABAAmS+6yVgIuc7aZA/CL8rZoA== caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics ...output omitted...
4.
Stop the OSD daemon with ID 0. Verify the Ceph cluster's status. 4.1. Verify that the Ceph cluster's status is HEALTH_OK, and the three OSD daemons are up and in. [heat-admin@overcloud-controller-0 ~]$ sudo ceph -s cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8 health HEALTH_OK monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0} election epoch 4, quorum 0 overcloud-controller-0 osdmap e50: 3 osds: 3 up, 3 in flags sortbitwise pgmap v556: 224 pgs, 6 pools, 1358 kB data, 76 objects
121 MB used, 58213 MB / 58334 MB avail 224 active+clean
4.2. Log out of controller0. Log in to ceph0 as heat-admin. [heat-admin@overcloud-controller-0 ~]$ exit [student@workstation ~]$ ssh heat-admin@ceph0
4.3. Use the systemd unit file for ceph-osd to stop the OSD daemon with ID 0. [heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl stop ceph-osd@0
4.4. Verify that the OSD daemon with ID 0 is down. [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY ...output omitted... 0 0.01799 osd.0 down 1.00000 1.00000 ...output omitted...
4.5. Verify that the Ceph cluster's status is HEALTH_WARN. Two of the three OSD daemons are up and in. [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph -w cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8 health HEALTH_WARN 224 pgs degraded 224 pgs undersized recovery 72/216 objects degraded (33.333%) 1/3 in osds are down monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0} election epoch 4, quorum 0 overcloud-controller-0 osdmap e43: 3 osds: 2 up, 3 in; 224 remapped pgs flags sortbitwise pgmap v153: 224 pgs, 6 pools, 1720 kB data, 72 objects 114 MB used, 58220 MB / 58334 MB avail 72/216 objects degraded (33.333%) 224 active+undersized+degraded mon.0 [INF] pgmap v153: 224 pgs: 224 active+undersized+degraded; 1720 kB data, 114 MB used, 58220 MB / 58334 MB avail; 72/216 objects degraded (33.333%) mon.0 [INF] osd.0 out (down for 304.628763) mon.0 [INF] osdmap e44: 3 osds: 2 up, 2 in ...output omitted... Ctrl+C
5.
Start the OSD daemon with ID 0 to fix the issue. Verify that the Ceph cluster's status is HEALTH_OK. 5.1. Use the systemd unit file for ceph-osd to start the OSD daemon with ID 0. [heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl start ceph-osd@0
5.2. Verify the Ceph cluster's status is HEALTH_OK. The three OSD daemons are up and in. It may take some time until the cluster status changes to HEALTH_OK. [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph -s cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8 health HEALTH_OK monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0} election epoch 4, quorum 0 overcloud-controller-0 osdmap e50: 3 osds: 3 up, 3 in flags sortbitwise pgmap v556: 224 pgs, 6 pools, 1358 kB data, 76 objects 121 MB used, 58213 MB / 58334 MB avail 224 active+clean
5.3. Exit the ceph0 node to return to workstation. [heat-admin@overcloud-cephstorage-0 ~]$ exit [student@workstation ~]$
Cleanup From workstation, run the lab storage-config-ceph cleanup script to clean up this exercise. [student@workstation ~]$ lab storage-config-ceph cleanup
Managing Object Storage Objectives After completing this section, students should be able to manage Swift as object storage.
Swift Architecture
Swift is a fully distributed storage solution, where both static data and binary objects are stored. It is neither a file system nor a real-time data storage system. It can easily scale to multiple petabytes or billions of objects. The Swift components listed in the following table are all required for the architecture to work properly.

Component          Description
Proxy Server       Processes all API calls and locates the requested object. Encodes and decodes data if Erasure Code is being used.
Ring               Maps the names of entities to their stored location on disk. Accounts, containers, and object servers each have their own ring.
Account Server     Holds a list of all containers.
Container Server   Holds a list of all objects.
Object Server      Stores, retrieves, and deletes objects.
The proxy server interacts with the appropriate ring to route requests and locate objects. The ring stores a mapping between stored entities and their physical location. By default, each partition of the ring is replicated three times to ensure a fully distributed solution. Data is evenly distributed across the capacity of the cluster. Zones ensure that data is isolated. Because data is replicated across zones, failure in one zone does not impact the rest of the cluster.
Removing and Rebalancing Zones It is important to understand the concepts behind a storage system, to comprehend the policies, and to design and plan carefully before production. Zones are created to ensure that failure is not an option. Each data replica should reside within a different zone. Zone configuration ensures that should one zone fail there are still two up and running that can either accept new objects or retrieve stored objects. The recommended number of zones is five, on five separate nodes. As mentioned previously, Swift, by default, writes three replicas. If there are only three zones and one becomes unavailable, Swift cannot hand off the replica to another node. With five nodes, Swift has options and can automatically write the replica to another node ensuring that eventually there will be three replicas. After Swift is set up and configured, it is possible to rectify or alter the storage policy. Extra devices can be added at any time.
Storage rings can be built on any hardware that has the appropriate version of Swift installed. Upon building or rebalancing (changing) the ring structure, the rings must be redistributed to all of the servers in the cluster. The swift-ring-builder utility is used to build and manage rings. To build the three rings for account, object, and container, the following syntax is used to add a new device to a ring: [root@demo]# swift-ring-builder account.builder add zzone-ipaddress:6202/device weight [root@demo]# swift-ring-builder container.builder add zzone-ipaddress:6201/device weight [root@demo]# swift-ring-builder object.builder add zzone-ipaddress:6200/device weight
The zone includes a number as the ID for the rack to which the server belongs. The ipaddress is the IP address of the server. The device is the device partition to add. The weight includes the size of the device's partition.
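For example, using hypothetical values (a storage node at 172.24.4.11, a device named vdb, and a weight of 100, none of which come from this course environment), adding the device to all three rings and then rebalancing might look like this:

[root@demo]# swift-ring-builder account.builder add z1-172.24.4.11:6202/vdb 100
[root@demo]# swift-ring-builder container.builder add z1-172.24.4.11:6201/vdb 100
[root@demo]# swift-ring-builder object.builder add z1-172.24.4.11:6200/vdb 100
[root@demo]# swift-ring-builder account.builder rebalance
[root@demo]# swift-ring-builder container.builder rebalance
[root@demo]# swift-ring-builder object.builder rebalance

After rebalancing, the updated ring files must be copied to every node in the cluster, as noted above.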
Note Prior to the Newton release of OpenStack, the Object service used ports 6002, 6001, and 6000 for the account, container, and object services. These earlier default Swift ports overlapped with ports already registered with IANA for X-Server, causing SELinux policy conflicts and security risks. Red Hat OpenStack Platform switched to the new ports in the Juno release, and the upstream Swift project completed the switch in Newton.
Swift Commands There are two sets of commands for Swift, an older version and a newer version. The older commands, for example, swift post, swift list, and swift stat, are still supported. However, OpenStack is moving to the OpenStack Unified CLI described below.
Note By default, the following commands require the OpenStack user to have either the admin or swiftoperator roles.
The openstack container command is used to manage object storage containers in OpenStack. The openstack container create command is used to create containers: [user@demo ~]$ openstack container create cont1
The openstack container list command displays all containers available to the user: [user@demo ~]$ openstack container list +------------+ | Name | +------------+ | cont1 | +------------+
The openstack container delete command deletes the specified container: [user@demo ~]$ openstack container delete cont1
The openstack object create command uploads an existing object to the specified container: [user@demo ~]$ openstack object create cont1 object.cont1 +--------------+-----------+----------------------------------+ | object | container | etag | +--------------+-----------+----------------------------------+ | object.cont1 | cont1 | d41d8cd98f00b204e9800998ecf8427e | +--------------+-----------+----------------------------------+
The openstack container save command saves the contents of an existing container locally: [user@demo ~]$ openstack container save cont1 [user@demo ~]$ ls -al -rw-rw-r--. 1 user user 0 May 29 09:45 object.cont1
The openstack object list command lists all of the objects stored in the specified container: [user@demo ~]$ openstack object list cont1 +--------------+ | Name | +--------------+ | object.cont1 | +--------------+
The openstack object delete command deletes an object from the specified container: [user@demo ~]$ openstack object delete cont1 object.cont1
Comparing Ceph with Swift for Object Storage Both Swift and Ceph are open source Object Storage systems. They both use standard hardware, allow scale-out storage, and are easy to deploy in enterprises of all sizes. This is perhaps where the similarities end. Ceph lends itself to block access storage, transactional storage, and is recommended for single sites. Swift uses Object API access to storage, and is recommended for unstructured data and geographical distribution. Applications that mostly use block access storage are built in a different way from those that use object access storage. The decision might come down to which applications need object storage and how they access it. Swift protects written data first and can therefore take additional time to update the entire cluster. Ceph does not do this, which makes it a better candidate for databases and real-time data. Swift would be a better choice for large-scale, geographically dispersed, unstructured data. This means that you might need or want both Ceph and Swift. This decision will depend on the types of applications, the geographical structure of your data centers, the type of objects that need to be stored, consistency of the data replicated, transactional performance requirements, and the number of objects to be stored.
Benefits, Use Cases, and Recommended Practices Comparing object storage with block storage One of the main differences between block storage and object storage is that a volume can only be accessed via instances, and by one instance at a time, whereas any instance or service can access the objects stored in containers as all objects stored within Swift have an accessible URL. Swift also supports the Amazon Simple Storage Service (S3) API. The Benefits of Using Swift Object storage has several distinct advantages over volume storage. As previously mentioned, it is accessible from any OpenStack service, it supports the Amazon S3 API, and is fully distributed. The reduced cost can also be an advantage, with object storage you only pay for the amount of storage that you use—you upload 5GB, you pay for 5GB. With volume storage, you pay for the size of the disk you create; if you create a 50GB volume, you will pay for all 50GB whether or not it is all used. However, be aware that if you use Swift over multiple data centers then the cost can spiral because you are moving a lot of data over the internet; this can get expensive. Swift is best used for large pools of small objects. It is easily scalable, whereas volumes are not. Use Cases A major university uses Swift to store videos of every sporting event for both men's and women's sporting events. All events for an entire year are stored in an omnipresent and easily accessible storage solution. Students, alumni, and fans can use any internet-enabled web browser to access the university's web site and click a link to view, in its entirety, their desired sporting event. Recommended Practice: Disk Failure It is Friday night and a drive has failed. You do not want to start messing with it before the weekend. Swift starts an automatic, self-healing, workaround by writing replicas to a hand-off node. Monday comes around and you change the failed drive, format it and mount it. The drive is, of course, empty. Swift, however, will automatically start replicating data that is supposed to be in that zone. In this case, you do not even have to do anything to the ring as the physical drive was simply replaced—zones do not change so no need to rebalance the ring.
Note If you were to change the size of the physical drive, then you would have to rebalance the ring.
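As a quick sketch of the URL-based access described above, any client that holds a valid token can retrieve an object directly over HTTP. The endpoint, account hash, container, and object names below are placeholders, not values from this environment:

[user@demo ~]$ TOKEN=$(openstack token issue -f value -c id)
[user@demo ~]$ curl -o demo-object -H "X-Auth-Token: ${TOKEN}" \
    http://swift.example.com:8080/v1/AUTH_0123abcd/demo-container/demo-object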
Configuration and Log Files

/var/log/swift/swift.log
    Default location of all log entries.

/var/log/messages
    Location of all messages related to HAProxy and Swift configuration, and Swift CLI tool requests.

/etc/swift/object-server.conf
    Holds the configuration for the different back-end Swift services supporting replication (object-replicator), object information management in containers (object-updater), and object integrity (object-auditor).
Troubleshooting Swift

Swift logs all troubleshooting events in /var/log/swift/swift.log. You should start your troubleshooting process here. Swift logging is very verbose and the generated logs can be used for monitoring, audit records, and performance. Logs are organized by log level and syslog facility. Log lines for the same request have the same transaction ID. Make sure that all processes are running; the basic ones required are Proxy Server, Account Server, Container Server, Object Server, and Auth Server.

[user@demo ~]$ ps -aux | grep swift
swift 2249 ...... /usr/bin/python2 /usr/bin/swift-container-updater /etc/swift/container-server.conf
swift 2267 ...... /usr/bin/python2 /usr/bin/swift-account-replicator /etc/swift/account-server.conf
swift 2275 ...... /usr/bin/python2 /usr/bin/swift-container-auditor /etc/swift/container-server.conf
swift 2276 ...... /usr/bin/python2 /usr/bin/swift-account-reaper /etc/swift/account-server.conf
swift 2281 ...... /usr/bin/python2 /usr/bin/swift-container-replicator /etc/swift/container-server.conf
swift 2294 ...... /usr/bin/python2 /usr/bin/swift-object-updater /etc/swift/object-server.conf
swift 2303 ...... /usr/bin/python2 /usr/bin/swift-account-auditor /etc/swift/account-server.conf
swift 2305 ...... /usr/bin/python2 /usr/bin/swift-object-replicator /etc/swift/object-server.conf
swift 2306 ...... /usr/bin/python2 /usr/bin/swift-object-auditor /etc/swift/object-server.conf
swift 2311 ...... /usr/bin/python2 /usr/bin/swift-container-server /etc/swift/container-server.conf
swift 2312 ...... /usr/bin/python2 /usr/bin/swift-account-server /etc/swift/account-server.conf
swift 2313 ...... /usr/bin/python2 /usr/bin/swift-object-server /etc/swift/object-server.conf
swift 2314 ...... /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf
swift 2948 ...... /usr/bin/python2 /usr/bin/swift-account-server /etc/swift/account-server.conf
swift 2954 ...... /usr/bin/python2 /usr/bin/swift-container-server /etc/swift/container-server.conf
swift 2988 ...... /usr/bin/python2 /usr/bin/swift-object-server /etc/swift/object-server.conf
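Because every log line for a given request carries the same transaction ID, a single request can be traced end to end with a simple search. The transaction ID shown below is only an example:

[user@demo ~]$ sudo grep tx3b1c6c8912f54a1dbd2c8e8f3 /var/log/swift/swift.log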
Detecting Failed Drives

Swift has a script called swift-drive-audit, which you can run either manually or via cron. This script checks for bad drives and unmounts them if any errors are found. Swift then works around the bad drives by replicating data to another drive. The output of the script is written to /var/log/kern.log.

Drive Failure

It is imperative to unmount the failed drive; this should be the first step taken. This action makes object retrieval by Swift much easier. Replace the drive, format it and mount it, and let the replication feature take over. The new drive will quickly populate with replicas. If a drive cannot be replaced immediately, ensure that it is unmounted, that the mount point is owned by root, and that the device weight is set to 0. Setting the weight to 0 is preferable to removing it from the ring because it gives Swift the chance to try to replicate from the failing disk (it could be that some data is retrievable), and after the disk has been replaced you can increase the weight of the disk, removing the need to rebuild the ring. The following commands show how to change the weight of a device using the swift-ring-builder command. In the following command, service is either account, object, or container, device is the device's partition name, and weight is the new weight.

[root@demo]# swift-ring-builder service.builder set_weight device weight
For example, to set the weight of a device named vdd to 0, the command must be run against each of the three ring builder files, as follows:

[root@demo]# swift-ring-builder account.builder set_weight z1-172.24.4.1:6002/vdd 0
d1r1z1-172.24.4.1:6002R172.24.4.1:6002/vdd_"" weight set to 0.0
[root@demo]# swift-ring-builder container.builder set_weight z1-172.24.4.1:6001/vdd 0
d1r1z1-172.24.4.1:6002R172.24.4.1:6001/vdd_"" weight set to 0.0
[root@demo]# swift-ring-builder object.builder set_weight z1-172.24.4.1:6000/vdd 0
d1r1z1-172.24.4.1:6002R172.24.4.1:6000/vdd_"" weight set to 0.0
The three rings must then be rebalanced:

[root@demo]# swift-ring-builder account.builder rebalance
[root@demo]# swift-ring-builder container.builder rebalance
[root@demo]# swift-ring-builder object.builder rebalance
The device can be added back to Swift using the swift-ring-builder set_weight command with the new weight for the device. The device's weight has to be updated in all three rings. For example, to change a device's weight to 100, run the following commands against the three ring builder files:

[root@demo]# swift-ring-builder account.builder set_weight z1-172.24.4.1:6002/vdd 100
d1r1z1-172.24.4.1:6002R172.24.4.1:6002/vdd_"" weight set to 100.0
[root@demo]# swift-ring-builder container.builder set_weight z1-172.24.4.1:6001/vdd 100
d1r1z1-172.24.4.1:6002R172.24.4.1:6001/vdd_"" weight set to 100.0
[root@demo]# swift-ring-builder object.builder set_weight z1-172.24.4.1:6000/vdd 100
d1r1z1-172.24.4.1:6002R172.24.4.1:6000/vdd_"" weight set to 100.0
The three rings must then be rebalanced. The weight associated with each device on each ring can then be obtained using the swift-ring-builder command. The following command returns information for each device, including the weight associated with the device in that ring:

[root@demo]# swift-ring-builder /etc/swift/account.builder
/etc/swift/account.builder, build version 6
...output omitted...
Devices:  id  region  zone  ip address:port      replication ip:port   name  weight  partitions  balance  flags  meta
           0       1     1  172.24.4.1:6002          172.24.4.1:6002    vdd  100.00        1024   100.00
Server Failure

Should a server be experiencing hardware issues, ensure that the Swift services are not running. This guarantees that Swift will work around the failure and start replicating to another server. If the problem can be fixed within a relatively short time, for example, a couple of hours, then let Swift work around the failure automatically and get the server back online. When online again, Swift will ensure that anything missing during the downtime is updated. If the problem is more severe, or no quick fix is possible, it is best to remove the devices from the ring. After repairs have been carried out, add the devices to the ring again. Remember to reformat the devices before adding them to the ring, because they will almost certainly be responsible for a different set of partitions than before.
Managing Object Storage

The following steps outline the process for managing object storage using the OpenStack unified CLI; a condensed command sketch follows the list.

1. Source the keystone credentials environment file.
2. Create a new container.
3. Verify that the container has been correctly created.
4. Create a file to upload to the container.
5. Upload the file as an object to the container.
6. Verify that the object has been correctly created.
7. Download the object.
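The following is a condensed sketch of those steps; the credentials file, container name, and file name are examples only:

[user@demo ~]$ source ~/demo-rc
[user@demo ~]$ openstack container create demo-container
[user@demo ~]$ openstack container list
[user@demo ~]$ echo "sample data" > demo.txt
[user@demo ~]$ openstack object create demo-container demo.txt
[user@demo ~]$ openstack object list demo-container
[user@demo ~]$ openstack object save demo-container demo.txt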
References

Further information is available in the Object Storage section of the Storage Guide for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Managing Object Storage In this exercise, you will upload an object to the OpenStack object storage service, retrieve that object from an instance, and then verify that the object has been correctly downloaded to the instance. Resources Files
/home/student/developer1-finance-rc
Outcomes You should be able to: • Upload an object to the OpenStack object storage service. • Download an object from the OpenStack object storage service to an instance. Before you begin Log in to workstation as student using student as the password. From workstation, run lab storage-obj-storage setup to verify that OpenStack services are running and the resources created in previous sections are available. [student@workstation ~]$ lab storage-obj-storage setup
Steps 1. Create a 10MB file named dataset.dat. As the developer1 user, create a container called container1 in the OpenStack object storage service. Upload the dataset.dat file to this container. 1.1. Create a 10MB file named dataset.dat. [student@workstation ~]$ dd if=/dev/zero of=~/dataset.dat bs=10M count=1
1.2. Load the credentials for the developer1 user. This user has been configured by the lab script with the role swiftoperator. [student@workstation ~]$ source developer1-finance-rc
1.3. Create a new container named container1. [student@workstation ~(developer1-finance)]$ openstack container create \ container1 +--------------------+------------+---------------+ | account | container | x-trans-id | +--------------------+------------+---------------+ | AUTH_c968(...)020a | container1 | tx3b(...)e8f3 | +--------------------+------------+---------------+
1.4. Upload the dataset.dat file to the container1 container.
[student@workstation ~(developer1-finance)]$ openstack object create \ container1 dataset.dat +-------------+------------+----------------------------------+ | object | container | etag | +-------------+------------+----------------------------------+ | dataset.dat | container1 | f1c9645dbc14efddc7d8a322685f26eb | +-------------+------------+----------------------------------+
2.
Download the dataset.dat object to the finance-web1 instance created by the lab script. 2.1. Verify that the finance-web1 instance's status is ACTIVE. Verify the floating IP address associated with the instance. [student@workstation ~(developer1-finance)]$ openstack server show \ finance-web1 +------------------------+---------------------------------------------+ | Field | Value | +------------------------+---------------------------------------------+ ...output omitted... | addresses | finance-network1=192.168.0.N, 172.25.250.P | ...output omitted... | key_name | developer1-keypair1 | | name | finance-web1 | ...output omitted... | status | ACTIVE | ...output omitted... +------------------------+---------------------------------------------+
2.2. Copy the credentials file for the developer1 user to the finance-web1 instance. Use the cloud-user user and the /home/student/developer1-keypair1.pem key file. [student@workstation ~(developer1-finance)]$ scp -i developer1-keypair1.pem \ developer1-finance-rc \ [email protected]:~
2.3. Log in to the finance-web1 instance using cloud-user as the user and the /home/ student/developer1-keypair1.pem key file. [student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \ [email protected]
2.4. Load the credentials for the developer1 user. [cloud-user@finance-web1 ~]$ source developer1-finance-rc
2.5. Download the dataset.dat object from the object storage service. [cloud-user@finance-web1 ~(developer1-finance)]$ openstack object save \ container1 dataset.dat
2.6. Verify that the dataset.dat object has been downloaded. When done, log out from the instance. [cloud-user@finance-web1 ~(developer1-finance)]$ ls -lh dataset.dat -rw-rw-r--. 1 cloud-user cloud-user 10M May 26 06:58 dataset.dat [cloud-user@finance-web1 ~(developer1-finance)]$ exit
Cleanup From workstation, run the lab storage-obj-storage cleanup script to clean up this exercise. [student@workstation ~]$ lab storage-obj-storage cleanup
Lab: Managing Storage In this lab, you will fix an issue in the Ceph environment. You will also upload a MOTD file to the OpenStack object storage service. Finally, you will retrieve that MOTD file inside an instance. Resources Files:
http://materials.example.com/motd.custom
Outcomes You should be able to: • Fix an issue in a Ceph environment. • Upload a file to the Object storage service. • Download and implement an object in the Object storage service inside an instance. Before you begin Log in to workstation as student using student as the password. From workstation, run lab storage-review setup, which verifies OpenStack services and previously created resources. This script also misconfigures Ceph and launches a productionweb1 instance with OpenStack CLI tools. [student@workstation ~]$ lab storage-review setup
Steps
1. The Ceph cluster has a status issue. Fix the issue to return the status to HEALTH_OK.
2. As the operator1 user, create a new container called container4 in the Object storage service. Upload the custom MOTD file available at http://materials.example.com/motd.custom to this container.
3. Log in to the production-web1 instance, and download the motd.custom object from Swift to /etc/motd. Use the operator1 user credentials.
4. Verify that the MOTD file includes the message Updated MOTD message.
Evaluation On workstation, run the lab storage-review grade command to confirm success of this exercise. [student@workstation ~]$ lab storage-review grade
Cleanup From workstation, run the lab storage-review cleanup script to clean up this exercise. [student@workstation ~]$ lab storage-review cleanup
Solution In this lab, you will fix an issue in the Ceph environment. You will also upload a MOTD file to the OpenStack object storage service. Finally, you will retrieve that MOTD file inside an instance. Resources Files:
http://materials.example.com/motd.custom
Outcomes You should be able to: • Fix an issue in a Ceph environment. • Upload a file to the Object storage service. • Download and implement an object in the Object storage service inside an instance. Before you begin Log in to workstation as student using student as the password. From workstation, run lab storage-review setup, which verifies OpenStack services and previously created resources. This script also misconfigures Ceph and launches a productionweb1 instance with OpenStack CLI tools. [student@workstation ~]$ lab storage-review setup
Steps 1. The Ceph cluster has a status issue. Fix the issue to return the status to HEALTH_OK. 1.1. Log in to ceph0 as the heat-admin user. [student@workstation ~]$ ssh heat-admin@ceph0
1.2. Determine the Ceph cluster status. This status will be HEALTH_WARN. [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph health HEALTH_WARN 224 pgs degraded; 224 pgs stuck unclean; 224 pgs undersized; recovery 501/870 objects degraded (57.586%)
1.3. Determine what the issue is by verifying the status of the Ceph daemons. Only two OSD daemons will be reported as up and in, instead of the expected three up and three in. [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph -s health HEALTH_WARN ...output omitted... osdmap e50: 3 osds: 2 up, 2 in; 224 remapped pgs flags sortbitwise ...output omitted...
1.4. Determine which OSD daemon is down. The status of the OSD daemon with ID 0 on ceph0 is down.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -1 0.05499 root default -2 0.05499 host overcloud-cephstorage-0 0 0.01799 osd.0 down 0 1.00000 1 0.01799 osd.1 up 1.00000 1.00000 2 0.01799 osd.2 up 1.00000 1.00000
1.5. Start the OSD daemon with ID 0 using the systemd unit file. [heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl start ceph-osd@0
1.6. Verify that the Ceph cluster status is HEALTH_OK. Initial displays may show the Ceph cluster in recovery mode, with the percentage still degraded shown in parenthesis. [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph health HEALTH_WARN 8 pgs degraded; recovery 26/27975 objects degraded (0.093%) [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph health HEALTH_OK
1.7. Exit the ceph0 node to return to workstation. [heat-admin@overcloud-cephstorage-0 ~]$ exit
2.
As the operator1 user, create a new container called container4 in the Object storage service. Upload the custom MOTD file available at http://materials.example.com/ motd.custom to this container. 2.1. Download the motd.custom file from http://materials.example.com/ motd.custom. [student@workstation ~]$ wget http://materials.example.com/motd.custom
2.2. View the contents of the motd.custom file. This file contains a new MOTD message. [student@workstation ~]$ cat ~/motd.custom Updated MOTD message
2.3. Load the credentials for the operator1 user. [student@workstation ~]$ source operator1-production-rc
2.4. Create a new container named container4. [student@workstation ~(operator1-production)]$ openstack container create \ container4 +--------------------+------------+---------------+ | account | container | x-trans-id | +--------------------+------------+---------------+ | AUTH_fd0c(...)63da | container4 | txb9(...)8011 |
+--------------------+------------+---------------+
2.5. Create a new object in the container4 container using the motd.custom file. [student@workstation ~(operator1-production)]$ openstack object create \ container4 motd.custom +-------------+------------+----------------------------------+ | object | container | etag | +-------------+------------+----------------------------------+ | motd.custom | container4 | 776c9b861983c6e95da77499046113bf | +-------------+------------+----------------------------------+
3.
Log in to the production-web1 instance, and download the motd.custom object from Swift to /etc/motd. Use the operator1 user credentials. 3.1. Verify the floating IP for the production-web1 instance. [student@workstation ~(operator1-production)]$ openstack server list \ -c Name -c Networks +-----------------+-------------------------------------------------+ | Name | Networks | +-----------------+-------------------------------------------------+ | production-web1 | production-network1=192.168.0.N, 172.25.250.P | +-----------------+-------------------------------------------------+
3.2. Copy the operator1 user credentials to the production-web1 instance. Use clouduser as the user and the /home/student/operator1-keypair1.pem key file. [student@workstation ~(operator1-production)]$ scp \ -i ~/operator1-keypair1.pem \ operator1-production-rc \ [email protected]:~
3.3. Log in to the production-web1 instance as the cloud-user user. Use the /home/ student/operator1-keypair1.pem key file. [student@workstation ~(operator1-production)]$ ssh \ -i ~/operator1-keypair1.pem \ [email protected]
3.4. Load the operator1 user credentials. [cloud-user@production-web1 ~]$ source operator1-production-rc
3.5. Download the motd.custom object from the Object service using the operator1production-rc user credentials. Use the --file option to save the object as /etc/ motd. Because writing /etc files requires root privileges, use sudo. Use the -E option to carry the operator1 shell environment credentials into the new sudo root child shell, because this command requires operator1's access to the Object storage container while also requiring root privilege to write the /etc/motd file.
[cloud-user@production-web1 ~(operator1-production)]$ sudo -E \ openstack object save \ --file /etc/motd \ container4 \ motd.custom
4.
Verify that the MOTD file includes the message Updated MOTD message. 4.1. Verify that the MOTD file was updated. [cloud-user@production-web1 ~(operator1-production)]$ cat /etc/motd Updated MOTD message
Evaluation On workstation, run the lab storage-review grade command to confirm success of this exercise. [student@workstation ~]$ lab storage-review grade
Cleanup From workstation, run the lab storage-review cleanup script to clean up this exercise. [student@workstation ~]$ lab storage-review cleanup
Summary

In this chapter, you learned:

• Red Hat OpenStack Platform supports both Red Hat Ceph Storage and NFS as storage back ends.
• The Red Hat Ceph Storage architecture is based on monitor (MON) daemons and object storage device (OSD) daemons.
• Red Hat Ceph Storage features include seamless scalability and no single point of failure.
• The Red Hat OpenStack Platform block storage and image services use RBDs to access Ceph, and require both a user and pool to access the cluster.
• The Red Hat OpenStack Platform object storage service (Swift) provides object storage for instances.
• The Swift architecture includes a front-end service, the proxy server, and three back-end services: the account server, the object server, and the container server.
• Users can create containers in Swift, and upload objects to those containers.
TRAINING CHAPTER 5
MANAGING AND TROUBLESHOOTING VIRTUAL NETWORK INFRASTRUCTURE

Overview

Goal: Manage and troubleshoot virtual network infrastructure

Objectives:
• Manage software-defined networking (SDN) segments and subnets.
• Follow multi-tenant network paths.
• Troubleshoot software-defined network issues.

Sections:
• Managing SDN Segments and Subnets (and Guided Exercise)
• Tracing Multitenancy Network Flows (and Guided Exercise)
• Troubleshooting Network Issues (and Guided Exercise)

Lab:
• Managing and Troubleshooting Virtual Network Infrastructure
Managing SDN Segments and Subnets

Objectives
After completing this section, students should be able to:
• Discuss software-defined networking (SDN).
• Discuss SDN implementation and use cases.
Software-defined Networking

Software-defined networking (SDN) is a networking model that allows network administrators to manage network services through the abstraction of several networking layers. SDN decouples the software that decides how traffic is handled, called the control plane, from the underlying mechanisms that forward the traffic, called the data plane, and it enables communication between the two planes. For example, the OpenFlow project, combined with the OpenDaylight project, provides such an implementation. SDN does not change the underlying protocols used in networking; rather, it enables the use of application knowledge to provision networks. Networking protocols, such as TCP/IP and the Ethernet standards, rely on manual configuration by administrators and have no knowledge of the applications using the network, such as their network usage, their endpoint requirements, or how much and how fast data needs to be transferred. The goal of SDN is to extract knowledge of how an application is being used, either from the application administrator or from the application's configuration data itself.

History
The origins of SDN development can be traced to around the mid 1990s. Research and development continued through the early 2000s at several universities and organizations. In 2011, the Open Networking Foundation (ONF) was founded to promote SDN and other related technologies such as OpenFlow.

Benefits of SDN
Consumers continue to demand fast, reliable, secure, and ubiquitous network connections for their personal mobile devices such as smartphones and tablets, and service providers are using virtualization and SDN technologies to better meet those needs. Benefits of SDN include:

• The decoupling of the control plane and the data plane enables both planes to evolve independently, which results in advantages such as high flexibility, vendor-agnostic solutions, open programmability, and a centralized network view.
• Security features that allow administrators to route traffic through a single, centrally located firewall. One advantage of this is the ability to apply intrusion detection methods to real-time captures of network traffic.
• Automated load balancing in SDNs enhances the performance of server load balancing and reduces the complexity of implementation.
• Network scalability allows data centers to use features of software-defined networking along with virtualized servers and storage to implement dynamic environments where computing resources are added and removed as needed.
• Reduced operational costs by minimizing the need to deploy, maintain, and replace expensive hardware such as many of the servers and network switches within a data center.
Benefits of SDN over Hardware for Networking

Hardware-based networking solutions require extensive manual deployment, configuration, and maintenance, as well as a replacement plan. Traditional network infrastructures are mostly static configurations that mix proprietary hardware and software solutions from different vendors, which makes it difficult to scale to business needs. The SDN architecture delivers an open technology that eliminates costly vendor lock-in and proprietary networking devices. The arguments for using SDN instead of dedicated hardware are growing as the technology continues to develop into a smart and inexpensive approach to deploying network solutions. Many companies and organizations currently use SDN technology within their data centers, taking advantage of cost savings, performance factors, and scalability.
SDN Architecture and Services

SDN is based on the concept of separation between controlled services and the controllers that control those services. Controllers manipulate services by way of interfaces. Interfaces are mainly API invocations through some library or system call. However, such interfaces may be extended with protocol definitions that use local inter-process communication (IPC) or a protocol that can also act remotely. A protocol may be defined as an open standard or in a proprietary manner.

Architectural Components
The following list defines and explains the architectural components:

• Application Plane: The plane where applications and services that define network behavior reside.
• Management Plane: Handles monitoring, configuration, and maintenance of network devices, such as making decisions regarding the state of a network device.
• Control Plane: Responsible for making decisions on how packets should be forwarded by one or more network devices, and for pushing such decisions down to the network devices for execution.
• Operational Plane: Responsible for managing the operational state of the network device, such as whether the device is active or inactive, the number of ports available, the status of each port, and so on.
• Forwarding Plane: Responsible for handling packets in the data path based on the instructions received from the control plane. Actions of the forwarding plane include forwarding, dropping, and changing packets.
Figure 5.1: SDN Architecture
SDN Terminology

Application
    SDN applications are programs that communicate their network requirements and desired network behavior to the SDN controller over a northbound interface (NBI).

Datapath
    The SDN datapath is a logical network device that exposes visibility and control over its advertised forwarding and data processing capabilities. An SDN datapath comprises a Control to Data-Plane Interface (CDPI) agent and a set of one or more traffic forwarding engines.

Controller
    The SDN controller is a logically centralized entity in charge of translating the requirements from the SDN application layer down to the SDN datapaths. SDN controllers provide a view of the network to the SDN applications.

Control to Data-Plane Interface (CDPI)
    The CDPI is the interface defined between an SDN controller and an SDN datapath that provides control of all forwarding operations, capabilities advertisement, statistics reporting, and event notification.

Northbound Interfaces (NBI)
    NBIs are interfaces between SDN applications and SDN controllers. They typically provide network views and enable expression of network behavior and requirements.
Introduction to Networking

Administrators should be familiar with networking concepts when working with Red Hat OpenStack Platform. The Neutron networking service is the SDN networking project that provides Networking-as-a-Service (NaaS) in virtual environments. It implements traditional networking features, such as subnetting, bridging, and VLANs, and more recent technologies, such as VXLANs and GRE tunnels.

Network Bridges
A network bridge is a network device that connects multiple network segments. Bridges can connect multiple devices, and each device can send Ethernet frames to other devices without having the frame removed and replaced by a router. Bridges keep the traffic isolated, and in most cases the switch is aware of which MAC addresses are accessible at each port. Switches monitor network activity and maintain a MAC learning table.

Generic Routing Encapsulation (GRE)
The Generic Routing Encapsulation (GRE) protocol is an encapsulation protocol developed by Cisco Systems that encapsulates a wide variety of network layer protocols inside virtual point-to-point links, called tunnels, over an IP network. A point-to-point connection is a connection between two nodes, or endpoints. The GRE protocol is used to run networks on top of other networks; within an existing TCP/IP network, two endpoints can be configured with GRE tunnels. The GRE data is encapsulated in a GRE header, itself encapsulated in the header of the underlying TCP/IP network. The endpoints can either be bridged or routed if IP addresses are manually assigned by administrators. For routed traffic, a single link is the next hop in a routing table.
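For illustration only, the following iproute2 commands create a GRE tunnel between two hypothetical endpoints, 192.0.2.1 (local) and 192.0.2.2 (remote), and assign an address to the tunnel interface. OpenStack Networking builds equivalent tunnels automatically; this sketch only shows the underlying mechanism:

[root@demo ~]# ip tunnel add gre1 mode gre local 192.0.2.1 remote 192.0.2.2 ttl 255
[root@demo ~]# ip link set gre1 up
[root@demo ~]# ip addr add 10.10.10.1/24 dev gre1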
Figure 5.2: GRE Ethernet header

Virtual LAN (VLAN)
You can partition a single layer 2 network to create multiple distinct broadcast domains that are mutually isolated, so that packets can only pass between them through one or more routers. Such segregation is referred to as a Virtual Local Area Network (VLAN). VLANs provide the segmentation services traditionally provided only by routers in LAN configurations, and they address issues such as scalability, security, and network management. Routers in VLAN topologies provide broadcast filtering, security, address summarization, and traffic-flow management. VLANs can also help to create multiple layer 3 networks on a single physical infrastructure. For example, if a DHCP server is plugged into a switch, it serves any host on that switch that is configured for DHCP. By using VLANs, the network can easily be split up, so that some hosts do not use that DHCP server and instead obtain link-local addresses, or an address from a different DHCP server. VLANs are defined by the IEEE 802.1Q standard for carrying traffic on an Ethernet network. 802.1Q VLANs are distinguished by a 4-byte VLAN tag inserted in the Ethernet header. Within this 4-byte tag, 12 bits represent the VLAN ID, which limits the number of VLAN IDs on a network to 4096.
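As a minimal sketch of 802.1Q tagging on a Linux host, the following commands create a tagged subinterface; the parent interface name and VLAN ID are examples only:

[root@demo ~]# ip link add link eth0 name eth0.500 type vlan id 500
[root@demo ~]# ip link set eth0.500 up
[root@demo ~]# ip -d link show eth0.500
...output omitted...
    vlan protocol 802.1Q id 500 ...output omitted...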
Figure 5.3: VLAN header

VXLAN Tunnels
Virtual eXtensible LAN (VXLAN) is a network virtualization technology that solves the scalability problems associated with large cloud computing deployments. It increases scalability up to 16 million logical networks and allows the adjacency of layer 2 links across IP networks. The VXLAN protocol encapsulates L2 networks and tunnels them over L3 networks.
Figure 5.4: VXLAN Ethernet header
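A minimal sketch of a manually created VXLAN interface is shown below; the VXLAN network identifier (VNI), parent device, multicast group, and port are example values, and OpenStack Networking normally manages such tunnels through Open vSwitch:

[root@demo ~]# ip link add vxlan42 type vxlan id 42 dev eth0 group 239.1.1.1 dstport 4789
[root@demo ~]# ip link set vxlan42 up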
Introducing the Neutron Networking Service

The OpenStack Networking (Neutron) project provides networking as a service, which is consumed by other OpenStack projects, such as the Nova compute service or the Designate DNS-as-a-Service (DNSaaS). Similar to the other OpenStack services, OpenStack Networking exposes a set of Application Program Interfaces (APIs) to programmatically build rich networking topologies and implement networking policies, such as multi-tier application topologies or highly available web applications. OpenStack Networking ships with a set of core plug-ins that administrators can install and configure based on their needs. This design allows administrators to use a variety of layer 2 and layer 3 networking technologies. Figure 5.5: The OpenStack Networking service shows how OpenStack Networking services can be deployed: the two compute nodes run the Open vSwitch agent, which communicates with the network node, itself running a set of dedicated OpenStack Networking services. These services include the metadata server and the Neutron networking server, as well as a set of extra components, such as Firewall-as-a-Service (FWaaS) or Load Balancing-as-a-Service (LBaaS).
Figure 5.5: The OpenStack Networking service

OpenStack Networking Terminology and Concepts
OpenStack Networking defines two types of networks: tenant networks and provider networks. Administrators can share either type of network among projects as part of the network creation process. The following lists some of the OpenStack Networking concepts administrators should be familiar with.

• Tenant networks
  OpenStack users create tenant networks for connectivity within projects. By default, these networks are completely isolated and are not shared among projects. OpenStack Networking supports the following types of network isolation and overlay technologies:
  ◦ Flat: All instances reside on the same network, which can also be shared with the underlying hosts. Flat networks do not recognize the concepts of VLAN tagging or network segregation. Use cases for flat networks are limited to testing or proof-of-concept deployments because no overlap is allowed and only one network is supported, which limits the number of available IP addresses.
  ◦ VLAN: This type of networking allows users to create multiple tenant networks using VLAN IDs, allowing network segregation. One use case is a web layer instance with traffic segregated from database layer instances.
  ◦ GRE and VXLAN: These networks provide encapsulation for overlay networks to activate and control communication between compute instances.

• Provider networks
  These networks map to the existing physical network in a data center and are usually flat or VLAN networks.

• Subnets
  A subnet is a block of IP addresses associated with a tenant or provider network, from which addresses are allocated whenever new ports are created.

• Ports
  A port is a connection point for attaching a single device, such as the virtual NIC of an instance, to the virtual network. Ports also provide the associated configuration, such as the MAC address and IP address, to be used on that port.

• Routers
  Routers forward data packets between networks. They provide L3 and NAT forwarding for instances on tenant networks to external networks. A router is required to send traffic outside of the tenant networks. Routers can also be used to connect a tenant network to an external network using a floating IP address. Routers are created by authenticated users within a project and are owned by that project. When tenant instances require external access, users can assign networks that have been declared external by an OpenStack administrator to their project-owned router. Routers implement Source Network Address Translation (SNAT) to provide outbound external connectivity and Destination Network Address Translation (DNAT) for inbound external connectivity.

• Security groups
  A security group is a virtual firewall allowing instances to control outbound and inbound traffic. It contains a set of security group rules, which are parsed when data packets are sent out of or into an instance.

Managing Networks
Before launching instances, the virtual network infrastructure to which instances will connect must be created. Prior to creating a network, it is important to consider what subnets will be used. A router is used to direct traffic from one subnet to another.

Create provider network
The provider network enables external access to instances. It allows external access from instances using Network Address Translation (NAT), a floating IP address, and suitable security group rules.

• To create a provider network, run the openstack network create command. Specify the network type by using the --provider-network-type option.

[user@demo ~]$ openstack network create \
    --external \
    --provider-network-type vlan \
    --provider-physical-network datacentre \
    --provider-segment 500 \
    provider-demo
• Similar to a physical network, the virtual network requires a subnet. The provider network shares the same subnet and gateway associated with the physical network connected to the provider network. To create a subnet for a provider network, run the openstack subnet create command:

[user@demo ~]$ openstack subnet create \
    --no-dhcp \
    --subnet-range 172.25.250.0/24 \
    --gateway 172.25.250.254 \
    --dns-nameserver 172.25.250.254 \
    --allocation-pool start=172.25.250.101,end=172.25.250.189 \
    --network provider-demo \
    provider-subnet-demo
Managing Tenant Networks
Tenant networks provide internal network access for instances of a particular project.

• To create a tenant network, run the openstack network create command.

[user@demo ~]$ openstack network create demo-network1
• Create the corresponding subnet for the tenant network, specifying the tenant network CIDR. By default, this subnet uses DHCP so the instances can obtain IP addresses. The first IP address of the subnet is reserved as the gateway IP address.

[user@demo ~]$ openstack subnet create \
    --network demo-network1 \
    --subnet-range=192.168.1.0/24 \
    --dns-nameserver=172.25.250.254 \
    --dhcp demo-subnet1
Layer 2 Traffic Flow

Figure 5.6: Layer 2 Traffic Flow describes the network flow for an instance running in an OpenStack environment, using Open vSwitch as a virtual switch; a short inspection sketch follows the numbered steps.
Figure 5.6: Layer 2 Traffic Flow

1. Packets leaving the eth0 interface of the instance are routed to a Linux bridge.
2. The Linux bridge is connected to an Open vSwitch bridge by a vEth pair. The Linux bridge is used for inbound and outbound firewall rules, as defined by the security groups. Packets traverse the vEth pair to reach the integration bridge, usually named br-int.
3. Packets are then moved to the external bridge, usually br-ex, over patch ports. OVS flows manage packet headers according to the network configuration. For example, flows are used to strip VLAN tags from network packets before forwarding them to the physical interfaces.
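Assuming Open vSwitch is the mechanism driver in use, this path can be inspected on a compute or controller node with the following commands; the bridge and port names shown by a real deployment will differ:

[root@demo ~]# brctl show                   # Linux bridges (qbrXXXX) holding the tap and qvb devices
[root@demo ~]# ovs-vsctl show               # br-int, br-ex, and their qvo and patch ports
[root@demo ~]# ovs-ofctl dump-flows br-int  # flows that tag and forward the traffic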
Managing Networks and Subnets

The following steps outline the process for managing networks and subnets in OpenStack.

1. Create the provider network.
2. Create a subnet for the provider network, and specify the floating IP address range using the --allocation-pool option.
3. Create a tenant network; for example, a VXLAN-based network.
4. Create the corresponding subnet for the tenant network, specifying the tenant network CIDR. The first IP address of the subnet is reserved as the gateway IP address.
References

Further information is available in the Networking Guide for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Managing SDN Segments and Subnets In this exercise, you will manage networks and routers. You will also review the implementation of the network environment. Outcomes You should be able to: • Create networks • Create routers • Review the network implementation Before you begin Log in to workstation as student using student as the password. Run the lab network-managing-sdn setup command. This script ensures that the OpenStack services are running and the environment is properly configured for this exercise. The script creates the OpenStack user developer1 and the OpenStack administrative user architect1 in the research project. The script also creates the rhel7 image and the m1.small flavor. [student@workstation ~]$ lab network-managing-sdn setup
Steps 1. From workstation, source the developer1-research-rc credentials file. As the developer1 user, create a network for the project. Name the network researchnetwork1. [student@workstation ~]$ source developer1-research-rc [student@workstation ~(developer1-research)]$ openstack network create \ research-network1 +-------------------------+--------------------------------------+ | Field | Value | +-------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | | | created_at | 2017-06-07T18:43:05Z | | description | | | headers | | | id | b4b6cea6-51ed-45ae-95ff-9e67512a4fc8 | | ipv4_address_scope | None | | ipv6_address_scope | None | | mtu | 1446 | | name | research-network1 | | port_security_enabled | True | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | qos_policy_id | None | | revision_number | 3 | | router:external | Internal |
| shared | False | | status | ACTIVE | | subnets | | | tags | [] | | updated_at | 2017-06-07T18:43:05Z | +-------------------------+--------------------------------------+
2.
Create the subnet research-subnet1 for the network in the 192.168.1.0/24 range. Use 172.25.250.254 as the DNS server. [student@workstation ~(developer1-research)]$ openstack subnet create \ --network research-network1 \ --subnet-range=192.168.1.0/24 \ --dns-nameserver=172.25.250.254 \ --dhcp research-subnet1 +-------------------+--------------------------------------+ | Field | Value | +-------------------+--------------------------------------+ | allocation_pools | 192.168.1.2-192.168.1.254 | | cidr | 192.168.1.0/24 | | created_at | 2017-06-07T18:47:44Z | | description | | | dns_nameservers | 172.25.250.254 | | enable_dhcp | True | | gateway_ip | 192.168.1.1 | | headers | | | host_routes | | | id | f952b9e9-bf30-4889-bb89-4303b4e849ae | | ip_version | 4 | | ipv6_address_mode | None | | ipv6_ra_mode | None | | name | research-subnet1 | | network_id | b4b6cea6-51ed-45ae-95ff-9e67512a4fc8 | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | revision_number | 2 | | service_types | [] | | subnetpool_id | None | | updated_at | 2017-06-07T18:47:44Z | +-------------------+--------------------------------------+
3.
Open another terminal and log in to the controller node, controller0, to review the ML2 configuration. Ensure that there are driver entries for VLAN networks. 3.1. Log in to the controller node as the heat-admin user and become root. [student@workstation ~]$ ssh heat-admin@controller0 [heat-admin@overcloud-controller-0 ~]$ sudo -i [root@overcloud-controller-0 ~]#
3.2. Go to the /etc/neutron/ directory. Use the crudini command to retrieve the values for the type_drivers key in the ml2 group. Ensure that the vlan driver is included. [root@overcloud-controller-0 heat-admin]# cd /etc/neutron [root@overcloud-controller-0 neutron]# crudini --get plugin.ini ml2 type_drivers vxlan,vlan,flat,gre
3.3. Retrieve the name of the physical network used by VLAN networks. ML2 groups are named after the driver, for example, ml2_type_vlan.

[root@overcloud-controller-0 neutron]# crudini --get plugin.ini \
ml2_type_vlan network_vlan_ranges
datacentre:1:1000
[root@overcloud-controller-0 neutron]# exit
[heat-admin@overcloud-controller-0 ~]$ exit
[student@workstation ~]$ exit
4.
On workstation, as the architect1 user, create the provider network provider-172.25.250. The network will be used to provide external connectivity. Use vlan as the provider network type with an segment ID of 500. Use datacentre as the physical network name, as defined in the ML2 configuration file. [student@workstation ~(developer1-research)]$ source architect1-research-rc [student@workstation ~(architect1-research)]$ openstack network create \ --external \ --provider-network-type vlan \ --provider-physical-network datacentre \ --provider-segment 500 \ provider-172.25.250 +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | | | created_at | 2017-06-07T20:33:50Z | | description | | | headers | | | id | e4ab7774-8f69-4383-817f-e6e1d063c7d3 | | ipv4_address_scope | None | | ipv6_address_scope | None | | is_default | False | | mtu | 1496 | | name | provider-172.25.250 | | port_security_enabled | True | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | provider:network_type | vlan | | provider:physical_network | datacentre | | provider:segmentation_id | 500 | | qos_policy_id | None | | revision_number | 4 | | router:external | External | | shared | False | | status | ACTIVE | | subnets | | | tags | [] | | updated_at | 2017-06-07T20:33:50Z | +---------------------------+--------------------------------------+
5.
Create the subnet for the provider network provider-172.25.250 with an allocation pool of 172.25.250.101 - 172.25.250.189. Name the subnet providersubnet-172.25.250. Use 172.25.250.254 for both the DNS server and the gateway. Disable DHCP for this network.
[student@workstation ~(architect1-research)]$ openstack subnet create \ --no-dhcp \ --subnet-range 172.25.250.0/24 \ --gateway 172.25.250.254 \ --dns-nameserver 172.25.250.254 \ --allocation-pool start=172.25.250.101,end=172.25.250.189 \ --network provider-172.25.250 \ provider-subnet-172.25.250 +-------------------+--------------------------------------+ | Field | Value | +-------------------+--------------------------------------+ | allocation_pools | 172.25.250.101-172.25.250.189 | | cidr | 172.25.250.0/24 | | created_at | 2017-06-07T20:42:26Z | | description | | | dns_nameservers | 172.25.250.254 | | enable_dhcp | False | | gateway_ip | 172.25.250.254 | | headers | | | host_routes | | | id | 07ea3c70-18ab-43ba-a334-717042842cf7 | | ip_version | 4 | | ipv6_address_mode | None | | ipv6_ra_mode | None | | name | provider-subnet-172.25.250 | | network_id | e4ab7774-8f69-4383-817f-e6e1d063c7d3 | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | revision_number | 2 | | service_types | [] | | subnetpool_id | None | | updated_at | 2017-06-07T20:42:26Z | +-------------------+--------------------------------------+
6.
As the developer1 user, create the router research-router1. Add an interface to research-router1 in the research-subnet1 subnet. Define the router as a gateway for the provider-172.25.250 network. 6.1. Source the developer1-research-rc credentials file and create the researchrouter1 router. [student@workstation ~(architect1-research)]$ source developer1-research-rc [student@workstation ~(developer1-research)]$ openstack router create \ research-router1 +-------------------------+--------------------------------------+ | Field | Value | +-------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | | | created_at | 2017-06-07T20:56:46Z | | description | | | external_gateway_info | null | | flavor_id | None | | headers | | | id | dbf911e3-c3c4-4607-b4e2-ced7112c7541 | | name | research-router1 | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | revision_number | 3 |
| routes                  |                                      |
| status                  | ACTIVE                               |
| updated_at              | 2017-06-07T20:56:46Z                 |
+-------------------------+--------------------------------------+
6.2. Add an interface to research-router1 in the research-subnet1 subnet. The command does not produce any output. [student@workstation ~(developer1-research)]$ openstack router add \ subnet research-router1 research-subnet1
6.3. Use the neutron command to define the router as a gateway for the provider-172.25.250 network. [student@workstation ~(developer1-research)]$ neutron router-gateway-set \ research-router1 provider-172.25.250 Set gateway for router research-router1
7.
Create a floating IP in the provider network, provider-172.25.250. [student@workstation ~(developer1-research)]$ openstack floating ip \ create provider-172.25.250 +---------------------+--------------------------------------+ | Field | Value | +---------------------+--------------------------------------+ | created_at | 2017-06-07T22:44:51Z | | description | | | fixed_ip_address | None | | floating_ip_address | 172.25.250.P | | floating_network_id | e4ab7774-8f69-4383-817f-e6e1d063c7d3 | | headers | | | id | 26b0ab61-170e-403f-b67d-558b94597e08 | | port_id | None | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | project_id | 6b2eb5c2e59743b9b345ee54a7f87321 | | revision_number | 1 | | router_id | None | | status | DOWN | | updated_at | 2017-06-07T22:44:51Z | +---------------------+--------------------------------------+
8.
Launch the research-web1 instance in the environment. Use the m1.small flavor and the rhel7 image. Connect the instance to the research-network1 network. [student@workstation ~(developer1-research)]$ openstack server create \ --image rhel7 \ --flavor m1.small \ --nic net-id=research-network1 \ --wait research-web1 +--------------------------------------+--------------------------------------+ | Field | Value | +--------------------------------------+--------------------------------------+ | OS-DCF:diskConfig | MANUAL | | OS-EXT-AZ:availability_zone | nova | | OS-EXT-STS:power_state | Running | | OS-EXT-STS:task_state | None | | OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2017-06-07T22:50:55.000000 | | OS-SRV-USG:terminated_at | None | | accessIPv4 | | | accessIPv6 | | | addresses | research-network1=192.168.1.N | | adminPass | CEkrjL8hKWtR | ...output omitted... +--------------------------------------+--------------------------------------+
9.
Associate the floating IP, created previously, to the instance. 9.1. View the floating IP created earlier. [student@workstation ~(developer1-research)]$ openstack floating ip list \ -f value -c 'Floating IP Address' 172.25.250.P
9.2. Associate the IP to the research-web1 instance. [student@workstation ~(developer1-research)]$ openstack server add \ floating ip research-web1 172.25.250.P
9.3. List the network ports. Locate the UUID of the port corresponding to the instance in the research-network1 network. In the output, f952b9e9-bf30-4889-bb89-4303b4e849ae is the ID of the subnet for the research-network1 network. [student@workstation ~(developer1-research)]$ openstack subnet list \ -c ID -c Name +--------------------------------------+------------------+ | ID | Name | +--------------------------------------+------------------+ | f952b9e9-bf30-4889-bb89-4303b4e849ae | research-subnet1 | ...output omitted.. +--------------------------------------+------------------+ [student@workstation ~(developer1-research)]$ openstack port list -f json [ { "Fixed IP Addresses": "ip_address='192.168.1.N', subnet_id='f952b9e9bf30-4889-bb89-4303b4e849ae'", "ID": "1f5285b0-76b5-41db-9cc7-578289ddc83c", "MAC Address": "fa:16:3e:f0:04:a9", "Name": "" }, ...output omitted...
10. Open another terminal. Use the ssh command to log in to the compute0 virtual machine as the heat-admin user. [student@workstation ~]$ ssh heat-admin@compute0 [heat-admin@overcloud-compute-0 ~]$
11.
List the Linux bridges in the environment. Ensure that there is a qbr bridge that uses the first ten characters of the Neutron port in its name. The bridge has two ports in it: the TAP
device that the instance uses and the qvb vEth pair, which connects the Linux bridge to the integration bridge.

[heat-admin@overcloud-compute-0 ~]$ brctl show
qbr1f5285b0-76    8000.ce25a52e5a32    no    qvb1f5285b0-76
                                             tap1f5285b0-76
12. Exit from the compute node and connect to the controller node. [heat-admin@overcloud-compute-0 ~]$ exit [student@workstation ~]$ ssh heat-admin@controller0 [heat-admin@overcloud-controller-0 ~]$
13. To determine the port ID of the phy-br-ex bridge, use the ovs-ofctl command. The output lists the ports in the br-ex bridge. [heat-admin@overcloud-controller-0 ~]$ sudo ovs-ofctl show br-ex OFPT_FEATURES_REPLY (xid=0x2): dpid:000052540002fa01 n_tables:254, n_buffers:256 capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst 1(eth2): addr:52:54:00:02:fa:01 config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max 2(phy-br-ex): addr:1a:5d:d6:bb:01:a1 config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max ...output omitted...
14. Dump the flows for the external bridge, br-ex. Review the entries to locate the flow for the packets passing through the tenant network. Locate the rule that handles packets in the phy-br-ex port. The following output shows how the internal VLAN ID, 2, is replaced with the VLAN ID 500 as defined by the --provider-segment 500 option. [heat-admin@overcloud-controller-0 ~]$ sudo ovs-ofctl dump-flows br-ex NXST_FLOW reply (xid=0x4): cookie=0xbcb9ae293ed51406, duration=2332.961s, table=0, n_packets=297, n_bytes=12530, idle_age=872, priority=4,in_port=2,dl_vlan=2 actions=mod_vlan_vid:500,NORMAL ...output omitted...
15. Exit from the controller0 node. [heat-admin@overcloud-controller-0 ~]$ exit [student@workstation ~]$
Cleanup From workstation, run the lab network-managing-sdn cleanup script to clean up the resources created in this exercise.
[student@workstation ~]$ lab network-managing-sdn cleanup
Tracing Multitenancy Network Flows

Objectives
After completing this section, students should be able to:
• Discuss network flow and network paths.
• Discuss VLAN translation in OpenStack.
• Discuss network tunneling.
• Discuss the usage of Netfilter in OpenStack.
• Discuss the various network devices used in OpenStack.
• Discuss security groups and floating IPs.
Introduction to Modular Layer 2 (ML2)

The Modular Layer 2 (ML2) plug-in is a framework that enables the use of various networking technologies. For instance, administrators can interact with Open vSwitch, a technology that provides virtual switching, or with Cisco equipment, using the various plug-ins available for OpenStack Networking.

ML2 Drivers and Network Types
Starting with Red Hat OpenStack Platform 4 (Havana), the introduction of the ML2 architecture allows users to use more than one networking technology. Before the introduction of the ML2 plug-in, it was not possible to simultaneously run multiple network plug-ins such as Linux bridges and Open vSwitch bridges. The ML2 framework creates a layer of abstraction that separates the management of network types from the mechanisms used to access these networks, and allows multiple mechanism drivers to access the same networks simultaneously. ML2 also makes it possible for companies and manufacturers to develop their own plug-ins. To date, there are more than 20 drivers available from various manufacturers, including Cisco, Microsoft, Nicira, Ryu, and Lenovo. Drivers implement a set of extensible mechanisms that allow various network back ends to communicate with OpenStack Networking services. The implementations can either utilize layer 2 agents with Remote Procedure Calls (RPC) or use the OpenStack Networking mechanism drivers to interact with external devices or controllers.

In OpenStack, each network type is managed by an ML2 driver. Such drivers maintain any needed network state, and can perform network validation or the creation of networks for OpenStack projects. The ML2 plug-in currently includes drivers for the following network types:

• Local: a network that can only be implemented on a single host. Local networks must only be used in proof-of-concept or development environments.
• Flat: a network that does not support segmentation. A traditional layer 2 Ethernet network can be considered a flat network. Servers that are connected to flat networks can listen to the broadcast traffic and can contact each other. In OpenStack terminology, flat networks are used to connect instances to existing layer 2 networks, or provider networks.
• VLAN: a network that uses VLANs for segmentation. When users create VLAN networks, a VLAN identifier (ID) is assigned from the range defined in the OpenStack Networking configuration. Administrators must configure the network switches to trunk the corresponding VLANs.
• GRE and VXLAN: networks that are similar to VLAN networks. GRE and VXLAN are overlay networks that encapsulate network traffic. Both network types receive a unique tunnel identifier. However, unlike VLANs, overlay networks do not require any synchronization between the OpenStack environment and layer 2 switches.
The following lists some of the available OpenStack Networking ML2 plug-ins:
• Open vSwitch
• Cisco UCS and Nexus
• Linux Bridge
• Nicira Network Virtualization Platform (NVP)
• Ryu and OpenFlow Controller
• NEC OpenFlow
• Big Switch Controller
• Cloudbase Hyper-V
• MidoNet
• PLUMgrid
• Embrane
• IBM SDN-VE
• Nuage Networks
• OpenContrail
• Lenovo Networking
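Which type drivers and mechanism drivers are active is declared in the ML2 configuration file on the controller. The following is a minimal sketch of /etc/neutron/plugins/ml2/ml2_conf.ini shown only as an illustration; the exact driver list depends on the deployment, and the values below are assumptions rather than the classroom configuration.
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan
mechanism_drivers = openvswitch,l2population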
OpenStack Networking Concepts
OpenStack Networking manages services such as network routing, DHCP, and the injection of metadata into instances. OpenStack Networking services can either be deployed on a stand-alone node, usually referred to as the network node, or alongside other OpenStack services. In a stand-alone configuration, servers perform dedicated network tasks, such as managing layer 3 routing for the network traffic to and from the instances.
Note Red Hat OpenStack Platform 10 adds support for composable roles. Composable roles allow administrators to separate the network services into a custom role.
Layer 2 Population
The layer 2 (L2) population driver enables broadcast, multicast, and unicast traffic to scale out on large overlay networks. By default, Open vSwitch GRE and VXLAN networks replicate broadcasts to every agent, including those that do not host the destination network, which leads to significant network and processing overhead. L2 population is a mechanism driver for the OpenStack Networking ML2 plug-in that leverages the implementation of overlay networks. The driver works by gaining full knowledge of the topology, which includes the MAC address and the IP address of each port. As a result, forwarding tables can be programmed beforehand and the processing of ARP requests is optimized. By populating the forwarding tables of virtual switches, such as Linux bridges or Open vSwitch bridges, the driver decreases the broadcast traffic inside the physical networks.
Introduction to Layer 2 and Layer 3 Networking
When designing their virtual networks, administrators need to anticipate where the majority of traffic is going to be sent. In general, network traffic moves faster within the same logical network than between different networks. This is because traffic between logical networks, which use different subnets, must pass through a router, resulting in additional latency and overhead. Figure 5.7: Network routing on separate VLANs shows the network traffic flowing between instances on separate VLANs:
Figure 5.7: Network routing on separate VLANs
Switching occurs at a lower level of the network, that is, at layer 2, which is faster than routing, which occurs at layer 3. Administrators should consider having as few network hops as possible between instances. Figure 5.8: Network switching shows a switched network that spans two physical systems, which allows two instances to communicate directly without using a router. The instances share the same subnet, which indicates that they are on the same logical network:
Figure 5.8: Network switching
Introduction to Subnets
A subnet is a logical subdivision of an IP network. On TCP/IP networks, the logical subdivision is defined as all devices whose IP addresses have the same prefix. For example, using a /24 subnet mask, all devices with IP addresses in 172.16.0.0/24 would be part of the same subnet, with 256 possible addresses. Addresses on the /24 subnet include a network address of 172.16.0.0 and a broadcast address of 172.16.0.255, leaving 254 available host addresses. A /24 subnet can be split by using a /25 subnet mask: 172.16.0.0/25 and 172.16.0.128/25, with 126 hosts per subnet. The first subnet ranges from 172.16.0.0 (network) to 172.16.0.127 (broadcast), leaving 126 available host addresses. The second subnet ranges from 172.16.0.128 (network) to 172.16.0.255 (broadcast), also leaving 126 available host addresses. This demonstrates that networks can be divided into one or more subnets depending on their subnet mask.
A subnet may be used to represent all servers present in the same geographic location, or on the same Local Area Network (LAN). By using subnets to divide the network, administrators can connect many devices spread across multiple segments to the Internet. Subnets are a useful way to share a network and create subdivisions on segments. The practice of creating subnets is called subnetting. Figure 5.9: Network subnets shows three subnets connected to the same router.
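The subnet arithmetic above can be checked from any host that has Python 3 installed; the following one-liner is only an illustration of the /25 split described in this section.
[user@demo ~]$ python3 -c '
import ipaddress
# Split 172.16.0.0/24 into two /25 subnets and print network, broadcast, and usable hosts.
for sub in ipaddress.ip_network("172.16.0.0/24").subnets(new_prefix=25):
    print(sub, sub.network_address, sub.broadcast_address, sub.num_addresses - 2)
'
172.16.0.0/25 172.16.0.0 172.16.0.127 126
172.16.0.128/25 172.16.0.128 172.16.0.255 126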
Figure 5.9: Network subnets
Subnets can be represented in two ways:
• Variable Length Subnet Mask (VLSM): subnet addresses are traditionally displayed using the network address accompanied by the subnet mask. For example:
Network Address: 192.168.100.0
Subnet mask: 255.255.255.0
• Classless Inter-Domain Routing (CIDR): this format shortens the subnet mask into its total number of active bits. For example, in 192.168.100.0/24, the /24 is a shortened representation of 255.255.255.0, indicating the number of bits set to one when the mask is converted to binary.
Management of Subnets in OpenStack
The same networking concept of subnetting applies in OpenStack. OpenStack Networking provides the API for virtual networking capabilities, which includes not only subnet management, but also routers and firewalls. The virtual network infrastructure allows instances to communicate with each other, as well as externally using the physical network. In OpenStack, a subnet is attached to a network, and a network can have one or multiple subnets. IP addresses are generally first allocated in blocks of subnets. For example, the IP address range of 192.168.100.0 - 192.168.100.255 with a subnet mask of 255.255.255.0 allows for 254 IP addresses to be used; the first and last addresses are reserved for the network and broadcast.
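As a sketch of how a network and one of its subnets are created from the command line (the names demo-net and demo-subnet1 are illustrative, not part of the classroom environment):
[user@demo ~]$ openstack network create demo-net
[user@demo ~]$ openstack subnet create --network demo-net \
    --subnet-range 192.168.100.0/24 \
    --allocation-pool start=192.168.100.10,end=192.168.100.200 \
    demo-subnet1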
Note
Since all layer 2 plug-ins provide total isolation between layer 2 networks, administrators can use overlapping subnets. This is made possible by the use of network namespaces, which have their own routing tables. As each namespace has its own routing table, OpenStack Networking is able to provide overlapping addresses in different virtual networks.
Administrators can use both the Horizon dashboard and the command-line interface to manage subnets. The following output shows two subnets, each belonging to a network. [user@demo ~]$ openstack subnet list -c Name -c Network -c Subnet +--------------+--------------------------------------+-----------------+ | Name | Network | Subnet | +--------------+--------------------------------------+-----------------+ | subinternal1 | 0062e02b-7e40-407f-ac43-49e84de096ed | 192.168.0.0/24 | | subexternal1 | 8d633bda-3ef4-4267-878f-265d5845f20a | 172.25.250.0/24 | +--------------+--------------------------------------+-----------------+
The subinternal1 subnet is an internal subnet, which provides internal networking for instances. The openstack subnet show command allows administrators to review the details for a given subnet. [user@demo ~]$ openstack subnet show subinternal1 +-------------------+--------------------------------------+ | Field | Value | +-------------------+--------------------------------------+ | allocation_pools | 192.168.0.2-192.168.0.254 | | cidr | 192.168.0.0/24 | | created_at | 2017-05-03T16:47:29Z | | description | | | dns_nameservers | | | enable_dhcp | True | | gateway_ip | 192.168.0.1 | | host_routes | | | id | 9f42ecca-0f8b-4968-bb53-a01350df7c7c | | ip_version | 4 | | ipv6_address_mode | None | | ipv6_ra_mode | None | | name | subinternal1 | | network_id | 0062e02b-7e40-407f-ac43-49e84de096ed | | project_id | c06a559eb68d4c5a846d9b7c829b50d2 | | project_id | c06a559eb68d4c5a846d9b7c829b50d2 | | revision_number | 2 | | service_types | [] | | subnetpool_id | None | | updated_at | 2017-05-03T16:47:29Z | +-------------------+--------------------------------------+
The Network Topology view in the Horizon dashboard allows administrators to review their network infrastructure. Figure 5.10: Network topology shows a basic topology comprised of an external network and a private network, connected by a router:
Figure 5.10: Network topology
Introduction to Network Namespaces
A Linux network namespace is a copy of the Linux network stack, which can be seen as a container for a set of identifiers. Namespaces provide a level of indirection to specific identifiers and make it possible to differentiate between identifiers with the same exact name. Namespaces give administrators the possibility to have different and separate instances of network interfaces and routing tables that operate independently of each other. Network namespaces have their own network routes and their own firewall rules, as well as their own network devices. Linux network namespaces are used to prevent collisions between the physical networks on the network host and the logical networks used by the virtual machines. They also prevent collisions across different logical networks that are not routed to each other.
Usage of Namespaces in OpenStack
Networks for OpenStack projects might overlap with those of the physical network. For example, if a management network is implemented on the eth2 device, and also happens to be on the 192.168.101.0/24 subnet, routing problems occur because the host cannot determine whether to send a packet to the subnet of a project network or to eth2. If end users are permitted to create their own logical networks and subnets, then the system must be designed to avoid the possibility of such collisions. OpenStack Networking uses Linux network namespaces to prevent collisions between the physical networks on the network host and the logical networks used by the instances. OpenStack Networking typically implements two types of namespaces:
• Namespaces for routers, named qrouter-UUID, where UUID is the router ID. The router namespace contains TAP devices like qr-YYY, qr-ZZZ, and qg-VVV as well as the corresponding routes.
• Namespaces for projects that use DHCP services, named qdhcp-UUID, where UUID is the network ID. The project namespace contains the tapXXX interfaces and the dnsmasq process that listens on that interface in order to provide DHCP services for project networks. This namespace allows overlapping IPs between various subnets on the same network host.
The following output shows the implementation of network namespaces after the creation of a project. In this setup, the namespaces are created on the controller, which also runs the networking services.
[user@demo ~]$ ip netns list
qrouter-89bae387-396c-4b24-a064-241103bcdb14
qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed
The UUID of an OpenStack Networking router. The UUID of an OpenStack Networking network.
Administrators can access the various network devices in a namespace by running the ip netns exec qdhcp-UUID command. The following output shows the TAP device that the DHCP server uses for providing IP leases to the instances in a qdhcp namespace:
[user@demo ~]$ sudo ip netns exec qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed ip a
...output omitted...
21: tapae83329c-91: mtu 1446 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:f2:48:da brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.2/24 brd 192.168.0.255 scope global tapae83329c-91
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fef2:48da/64 scope link
       valid_lft forever preferred_lft forever
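Because each namespace carries its own routing table, the same inspection technique applies to router namespaces; a sketch, reusing the qrouter namespace listed above:
[user@demo ~]$ sudo ip netns exec qrouter-89bae387-396c-4b24-a064-241103bcdb14 ip route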
Introduction to Floating IPs
In OpenStack terminology, a floating IP is an IP address allocated from a pool defined for a network. A floating IP is a routable IP address that is publicly reachable. Floating IPs enable communication from the external network to instances that have a floating IP assigned. Routing from a floating IP to the private IP assigned to an instance is performed by the OpenStack Networking L3 agent, which manages the routers as well as the floating IPs. The service generates a set of routing rules to create a static one-to-one mapping from a floating IP on the external network to the private IP assigned to an instance. The OpenStack Networking L3 agent interacts with the Netfilter service in order to create a routing topology for the floating IPs.
Implementation of Floating IPs
Floating IP addresses are not directly assigned to instances. Rather, a floating IP is an IP address attached to an OpenStack Networking virtual device; floating IPs are IP aliases defined on router interfaces. The following sequence is a high-level description of how the floating IP address 172.25.250.28 is implemented when a user assigns it to an instance. It does not describe the extra configuration performed by the various network agents.
1. When a floating IP is attached to an instance, an IP alias is added to the qg-UUID device, where UUID is the truncated identifier of the router port in the external network. Administrators can view the IP address on the network node by listing the IP addresses in the network namespace for the router:
[user@demo ~]$ ip netns exec qrouter-UUID ip addr sh dev qg-XXX
23: qg-9d11d7d6-45: mtu 1496 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:56:28:e4 brd ff:ff:ff:ff:ff:ff
    inet 172.25.250.25/24 brd 172.25.250.255 scope global qg-9d11d7d6-45
       valid_lft forever preferred_lft forever
    inet 172.25.250.28/32 brd 172.25.250.28 scope global qg-9d11d7d6-45
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe56:28e4/64 scope link
       valid_lft forever preferred_lft forever
2. A set of Netfilter rules is created in the router namespace. These rules route packets between the instance's IP and the floating IP. OpenStack Networking implements a rule for incoming traffic (DNAT) as well as for outgoing traffic (SNAT). The following output shows the two Netfilter rules in the router namespace.
[user@demo ~]$ ip netns exec qrouter-UUID iptables -L -nv -t nat | grep 250.28
24 1632 DNAT all -- * * 0.0.0.0/0 172.25.250.28 to:192.168.0.11
8 672 SNAT all -- * * 192.168.0.11 0.0.0.0/0 to:172.25.250.28
Note The same network can be used to allocate floating IP addresses to instances even if they have been added to private networks at the same time. The addresses allocated as floating IPs from this network are bound to the qrouter namespace on the network node, and perform both the Source Network Address Translation (SNAT) and Destination Network Address Translation (DNAT) to the associated private IP address. In contrast, the IP address allocated to the instance for direct external network access is bound directly inside the instance, and allows the instance to communicate directly with external networks.
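From the command line, allocating a floating IP from an external network and attaching it to an instance follows the pattern sketched below; the network and instance names are placeholders.
[user@demo ~]$ openstack floating ip create provider-net
[user@demo ~]$ openstack server add floating ip demo-instance1 172.25.250.28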
Usage of Netfilter by OpenStack Networking
Netfilter, a framework provided by the Linux kernel, allows networking-related operations to be implemented in the form of rules. Netfilter analyzes and inspects packets in order to determine how to handle them. It uses user-defined rules to route each packet through the network stack. OpenStack Networking uses Netfilter for handling network packets, managing security groups, and routing network packets for the floating IPs allocated to instances. Figure 5.11: Netfilter packets inspection shows how network packets are handled.
Figure 5.11: Netfilter packets inspection (diagram): the Linux kernel inspection points Prerouting, Input, Forward, Output, and Postrouting, arranged around the routing decision and local processes.
OpenStack Networking uses Netfilter to:
• Set basic rules for various network services, such as NTP, VXLAN, or SNMP traffic.
• Allow source NAT on outgoing traffic, which is the traffic originating from instances.
• Set a default rule that drops any unmatched traffic.
• Create a rule that directs traffic from the instance's network devices to the security group chain.
• Set rules that allow traffic from a defined set of IP and MAC address pairs.
• Allow DHCP traffic from DHCP servers to the instances.
• Prevent DHCP spoofing by the instances.
• Drop any packet that is not associated with a state. States include NEW, ESTABLISHED, RELATED, INVALID, and UNTRACKED.
• Route packets that are associated with a known session to the RETURN chain.
The following output shows some of the rules implemented on a compute node. The neutron-openvswi-FORWARD chain contains the two rules that direct the instance's traffic to the security group chain. In the following output, the instance's security group chain is named neutron-openvswi-scb2aafd8-b.
...output omitted...
Chain neutron-openvswi-FORWARD (1 references)
pkts bytes target prot opt in out source destination
4593 387K neutron-openvswi-sg-chain all -- * * 0.0.0.0/0 0.0.0.0/0 PHYSDEV match --physdev-out tapcb2aafd8-b1 --physdev-is-bridged /* Direct traffic from the VM interface to the security group chain. */
4647 380K neutron-openvswi-sg-chain all -- * * 0.0.0.0/0 0.0.0.0/0 PHYSDEV match --physdev-in tapcb2aafd8-b1 --physdev-is-bridged /* Direct traffic from the VM interface to the security group chain. */
...output omitted...
Chain neutron-openvswi-scb2aafd8-b (1 references)
pkts bytes target prot opt in out source destination
4645 379K RETURN all -- * * 192.168.0.11 0.0.0.0/0 MAC FA:16:3E:DC:58:D1 /* Allow traffic from defined IP/MAC pairs. */
0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* Drop traffic without an IP/MAC allow rule. */
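To inspect a single chain rather than the full rule set, the chain name shown in the output above can be passed directly to iptables; a sketch:
[user@demo ~]$ sudo iptables -nvL neutron-openvswi-scb2aafd8-b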
Virtual Network Devices
OpenStack Networking uses virtual devices for various purposes. Routers, floating IPs, instances, and DHCP servers are the main virtual objects that require virtual network devices. Assuming the use of Open vSwitch as the network plug-in, there are four distinct types of virtual networking devices: TAP devices, vEth pairs, Linux bridges, and Open vSwitch bridges.
A TAP device, such as vnet0, is how hypervisors such as KVM implement a virtual network interface card. Virtual network cards are typically called VIFs or vNICs. An Ethernet frame sent to a TAP device is received by the guest operating system.
A vEth pair is a pair of virtual network interfaces connected together. An Ethernet frame sent to one end of a vEth pair is received by the other end. OpenStack Networking makes use of vEth pairs as virtual patch cables to make connections between virtual bridges (see the short vEth sketch after the list below).
A Linux bridge behaves like a hub: administrators can connect multiple network interface devices, whether physical or virtual, to a Linux bridge. Any Ethernet frame that comes in from one interface attached to the bridge is transmitted to all of the other devices. Moreover, bridges are aware of the MAC addresses of the devices attached to them.
An Open vSwitch bridge behaves like a virtual switch: network interface devices connect to the Open vSwitch bridge's ports, and the ports can be configured like a physical switch's ports, including VLAN configurations.
For an Ethernet frame to travel from eth0, the local network interface of an instance, to the physical network, it must pass through six devices inside the host:
1. A TAP device, such as vnet0.
2. A Linux bridge, such as qbrcb2aafd8-b1.
3. A project vEth pair, such as qvbcb2aafd8-b1 and qvocb2aafd8-b1.
4. The Open vSwitch integration bridge, br-int.
5. The provider vEth pair, int-br-eth1 and phy-br-eth1.
6. The physical network interface card; for example, eth1.
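The vEth pair behavior can be demonstrated outside of OpenStack with plain iproute2 commands; the device names below are invented for the illustration.
[user@demo ~]$ sudo ip link add demo-qvb type veth peer name demo-qvo
[user@demo ~]$ ip link show type veth
[user@demo ~]$ sudo ip link delete demo-qvb
Deleting either end of the pair removes both interfaces.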
Introduction to Security Groups
Security groups and security rules filter the type and direction of network traffic sent to, and received from, an OpenStack Networking port. This provides an additional layer of security to complement any firewall rules present on compute nodes. Security groups are containers of objects with one or more security rules. A single security group can manage traffic to multiple OpenStack instances. Both the ports created for floating IP addresses, as well as the instances, are associated with a security group. If none is specified, the port is then associated with the default security group. Additional security rules can be added to the default security group to modify its behavior, or new security groups can be created as necessary.
Note By default, the group drops all inbound traffic and allows all outbound traffic.
Implementation of Security Groups
When a new security group is created, OpenStack Networking and the Nova compute service define an adequate set of Netfilter rules. For example, if administrators add a security rule to allow ICMP traffic to reach instances in a project, a set of rules is implemented to route the traffic from the external network to the instances. Netfilter rules are created on the compute node. Each time a new security rule is created, a Netfilter rule is inserted in the neutron-openvswi-XXX chain. The following output shows the Netfilter rule that allows remote connections to TCP port 565 after the creation of a security group rule.
[user@demo ~]$ iptables -L -nv
Chain neutron-openvswi-icb2aafd8-b (1 references)
...output omitted...
0 0 RETURN tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:565
...output omitted...
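The command that produces such a rule might look like the following sketch; the security group name and the port number are illustrative only.
[user@demo ~]$ openstack security group rule create --ingress \
    --protocol tcp --dst-port 565 default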
br-tun and VLAN Translation
When creating virtual networks, the translation between VLAN IDs and tunnel IDs is performed by OpenFlow rules running on the br-tun tunnel bridge. The tunnel bridge is connected to the Open vSwitch integration bridge, br-int, through patch ports. The OpenFlow rules manage the traffic in the tunnel, translating VLAN-tagged traffic from the integration bridge into GRE tunnels. The following output shows the flow rules on the bridge before the creation of any instance. There is a single rule that causes the bridge to drop all traffic.
[user@demo ~]$ ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=871.283s, table=0, n_packets=4, n_bytes=300, idle_age=862, priority=1 actions=drop
After an instance is running on a compute node, the rules are modified to look something like the following output.
[user@demo ~]$ ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=422.158s, table=0, n_packets=2, n_bytes=120, idle_age=55, priority=3,tun_id=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=mod_vlan_vid:1,output:1
cookie=0x0, duration=421.948s, table=0, n_packets=64, n_bytes=8337, idle_age=31, priority=3,tun_id=0x2,dl_dst=fa:16:3e:dd:c1:62 actions=mod_vlan_vid:1,NORMAL
cookie=0x0, duration=422.357s, table=0, n_packets=82, n_bytes=10443, idle_age=31, priority=4,in_port=1,dl_vlan=1 actions=set_tunnel:0x2,NORMAL
cookie=0x0, duration=1502.657s, table=0, n_packets=8, n_bytes=596, idle_age=423, priority=1 actions=drop
The Open vSwitch agent is responsible for configuring flow rules on both the integration bridge and the external bridge for VLAN translation. For example, when br-ex receives a frame marked with VLAN ID of 1 on the port associated with phy-br-eth1, it modifies the VLAN ID in the frame to 101. Similarly, when the integration bridge, br-int receives a frame marked with VLAN ID of 101 on the port associated with int-br-eth1, it modifies the VLAN ID in the frame to 1.
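To view only the VLAN translation rules on both bridges, the flow dumps can be filtered; a sketch assuming the bridge names used above:
[user@demo ~]$ sudo ovs-ofctl dump-flows br-ex | grep mod_vlan_vid
[user@demo ~]$ sudo ovs-ofctl dump-flows br-int | grep mod_vlan_vid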
OpenStack Networking DHCP
The OpenStack Networking DHCP agent manages the network namespaces as well as the IP allocations for instances in projects. The DHCP agent uses the dnsmasq process to manage the IP addresses allocated to the virtual machines.
Note If the OpenStack Networking DHCP agent is enabled and running when a subnet is created, then by default, the subnet has DHCP enabled.
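DHCP can also be toggled on an existing subnet from the command line; a sketch, assuming a subnet named demo-subnet1:
[user@demo ~]$ openstack subnet set --no-dhcp demo-subnet1
[user@demo ~]$ openstack subnet set --dhcp demo-subnet1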
The DHCP agent runs inside a network namespace, named qdhcp-UUID, where UUID is the UUID of a project network. [user@demo ~]$ ip netns list qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed
Inside the namespace, the dnsmasq process binds to a TAP device, such as tapae83329c-91. The following output shows the TAP device on a network node, inside a namespace. [user@demo ~]$ ip netns exec qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed ip a ...output omitted... 21: tapae83329c-91: mtu 1446 qdisc noqueue state UNKNOWN qlen 1000 link/ether fa:16:3e:f2:48:da brd ff:ff:ff:ff:ff:ff inet 192.168.0.2/24 brd 192.168.0.255 scope global tapae83329c-91 valid_lft forever preferred_lft forever inet6 fe80::f816:3eff:fef2:48da/64 scope link valid_lft forever preferred_lft forever
This interface is a port in the integration bridge. [user@demo ~]$ ovs-vsctl show Bridge br-int ...output omitted... Port "tapae83329c-91" tag: 5 Interface "tapae83329c-91" type: internal ...output omitted...
Administrators can locate the dnsmasq process associated with the namespace by searching the output of the ps command for the UUID of the network.
[user@demo ~]$ ps -fe | grep 0062e02b-7e40-407f-ac43-49e84de096ed dnsmasq --no-hosts --no-resolv --strict-order --except-interface=lo --pid-file=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed /pid --dhcp-hostsfile=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/host --addn-hosts=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/opts --dhcp-leasefile=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/leases --dhcp-match=set:ipxe,175 --bind-interfaces --interface=tapae83329c-91 --dhcp-range=set:tag0,192.168.0.0,static,86400s --dhcp-option-force=option:mtu,1446 --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal
The network identifier. The TAP device that the dnsmasq process listens on.
Instance Network Flow
The following scenario describes the usage of a VLAN provider network that connects instances directly to external networks. Each instance belongs to a different project. For this scenario, Open vSwitch and Linux bridges are the two network back ends. The scenario assumes the following:
• vlan is declared as an ML2 type driver in /etc/neutron/plugins/ml2/ml2_conf.ini.
[ml2]
type_drivers = vlan
• A range of VLAN IDs that reflects the physical network is set in /etc/neutron/plugins/ml2/ml2_conf.ini. For example, 171-172.
[ml2_type_vlan]
network_vlan_ranges=physnet1:171:172
• The br-ex bridge is set on the compute node, with eth1 enslaved to it.
• The physical network, physnet1, is mapped to the br-ex bridge in /etc/neutron/plugins/ml2/openvswitch_agent.ini.
bridge_mappings = physnet1:br-ex
• The external_network_bridge option has an empty value in /etc/neutron/l3_agent.ini. This allows the use of provider-based networks instead of bridge-based networks.
external_network_bridge =
Figure 5.12: Network flow between two VLANs shows the implementation of the various network bridges, ports, and virtual interfaces.
Figure 5.12: Network flow between two VLANs
Such a scenario can be used by administrators for connecting multiple VLAN-tagged interfaces on a single network device to multiple provider networks. This scenario uses the physical network called physnet1, mapped to the br-ex bridge. The VLANs use the IDs 171 and 172; the network nodes and compute nodes are connected to the physical network using eth1 as the physical interface.
Note The ports of the physical switch on which these interfaces are connected must be configured to trunk the VLAN ranges. If the trunk is not configured, the traffic will be blocked.
The following procedure shows the creation of the two networks and their associated subnets.
1. The following commands create the two networks. Optionally, administrators can mark the networks as shared.
[user@demo ~(keystone_admin)]$ neutron net-create provider-vlan171 \
  --provider:network_type vlan \
  --router:external true \
  --provider:physical_network physnet1 \
  --provider:segmentation_id 171 \
  --shared
[user@demo ~(keystone_admin)]$ neutron net-create provider-vlan172 \
  --provider:network_type vlan \
  --router:external true \
  --provider:physical_network physnet1 \
  --provider:segmentation_id 172 \
  --shared
2. The following commands create a subnet for each external network.
[user@demo ~(keystone_admin)]$ openstack subnet create \
  --network provider-vlan171 \
  --subnet-range 10.65.217.0/24 \
  --dhcp \
  --gateway 10.65.217.254 \
  subnet-provider-171
[user@demo ~(keystone_admin)]$ openstack subnet create \
  --network provider-vlan172 \
  --subnet-range 10.65.218.0/24 \
  --dhcp \
  --gateway 10.65.218.254 \
  subnet-provider-172
Traffic Flow Implementation
The following describes the implementation of the traffic flow. The qbr bridge is connected to the integration bridge, br-int, via a vEth pair. qvb is the end point connected to the Linux bridge; qvo is the end point connected to the Open vSwitch bridge. Run the brctl command to review the Linux bridges and their ports.
[user@demo ~]$ brctl show
bridge name       bridge id           STP enabled   interfaces
qbr84878b78-63    8000.e6b3df9451e0   no            qvb84878b78-63
                                                    tap84878b78-63
qbr86257b61-5d    8000.3a3c888eeae6   no            qvb86257b61-5d
                                                    tap86257b61-5d
The project bridge. The end point of the vEth pair connected to the project bridge. The TAP device, which is the network interface of the instance.
Run the ovs-vsctl command to review the implementation of the Open vSwitch bridges.
[user@demo ~]$ ovs-vsctl show
Bridge br-int
    fail_mode: secure
    Port int-br-ex
        Interface int-br-ex
            type: patch
            options: {peer=phy-br-ex}
    Port br-int
        Interface br-int
            type: internal
    Port patch-tun
        Interface patch-tun
            type: patch
            options: {peer=patch-int}
    Port "qvo86257b61-5d"
        tag: 3
        Interface "qvo86257b61-5d"
    Port "qvo84878b78-63"
        tag: 2
        Interface "qvo84878b78-63"
The Open vSwitch integration bridge. The patch that connects the integration bridge, br-int, to the external bridge, br-ex. The end point of the vEth pair that connects the project bridge to the integration bridge for the second project. The end point of the vEth pair that connects the project bridge to the integration bridge for the first project.
Outgoing Traffic Flow
The following describes the network flow for the two instances for packets destined to an external network.
1. The packets that leave the instances from the eth0 interface arrive at the Linux bridge, qbr. The instances use the virtual device, tap, as the network device. The device is set as a port in the qbr bridge.
2. Each qvo end point residing in the Open vSwitch bridge is tagged with the internal VLAN tag associated with the VLAN provider network. In this example, the internal VLAN tag 2 is associated with the VLAN provider network provider-vlan171, and the internal VLAN tag 3 is associated with the VLAN provider network provider-vlan172. When a packet reaches the qvo end point, the VLAN tag is added to the packet header.
3. The packet is then moved to the Open vSwitch bridge br-ex using the patch between int-br-ex and phy-br-ex. Run the ovs-vsctl show command to view the ports in the br-ex and br-int bridges.
[user@demo ~]$ ovs-vsctl show
Bridge br-ex
    Port phy-br-ex
        Interface phy-br-ex
            type: patch
            options: {peer=int-br-ex}
...output omitted...
Bridge br-int
    Port int-br-ex
        Interface int-br-ex
            type: patch
            options: {peer=phy-br-ex}
Patch port in the br-ex bridge. Patch port in the br-int bridge.
4. When the packet reaches the endpoint phy-br-ex on the br-ex bridge, an Open vSwitch flow inside the br-ex bridge replaces the internal VLAN tag with the actual VLAN tag associated with the VLAN provider network. Run the ovs-ofctl show br-ex command to retrieve the port number of the phy-br-ex port. In the following example, the port phy-br-ex has a value of 4.
[user@demo ~]$ ovs-ofctl show br-ex
 4(phy-br-ex): addr:32:e7:a1:6b:90:3e
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
5. The following output shows how an Open vSwitch flow handles packets arriving on the phy-br-ex port (in_port=4) with a VLAN ID of 2 (dl_vlan=2): Open vSwitch replaces the VLAN tag with 171 (actions=mod_vlan_vid:171,NORMAL), then forwards the packet. The output also shows that packets arriving on the phy-br-ex port (in_port=4) with the VLAN tag 3 (dl_vlan=3) have the VLAN tag replaced with 172 (actions=mod_vlan_vid:172,NORMAL) before being forwarded.
Note These rules are automatically added by the OpenStack Networking Open vSwitch Agent.
[user@demo ~]$ ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=6527.527s, table=0, n_packets=29211, n_bytes=2725576, idle_age=0, priority=1 actions=NORMAL
cookie=0x0, duration=2939.172s, table=0, n_packets=117, n_bytes=8296, idle_age=58, priority=4,in_port=4,dl_vlan=3 actions=mod_vlan_vid:172,NORMAL
cookie=0x0, duration=6111.389s, table=0, n_packets=145, n_bytes=9368, idle_age=98, priority=4,in_port=4,dl_vlan=2 actions=mod_vlan_vid:171,NORMAL
cookie=0x0, duration=6526.675s, table=0, n_packets=82, n_bytes=6700, idle_age=2462, priority=2,in_port=4 actions=drop
The bridge identifier. The identifier of the input VLAN. The VLAN identifier to apply to the packet.
6. The packet is then forwarded to the physical interface, eth1.
Incoming Traffic Flow
The following describes the network flow for incoming traffic to the instances.
1. Incoming packets destined to instances from the external network first reach the eth1 network device. They are then forwarded to the br-ex bridge. From the br-ex bridge, packets are moved to the integration bridge, br-int, over the peer patch that connects the two bridges (phy-br-ex and int-br-ex). The following output shows the port with a number of 18.
[user@demo ~]$ ovs-ofctl show br-int
 18(int-br-ex): addr:fe:b7:cb:03:c5:c1
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
2. When the packet passes through the int-br-ex port, an Open vSwitch flow rule inside the bridge adds the internal VLAN tag 2 if the packet belongs to the provider-vlan171 network, or the VLAN tag 3 if the packet belongs to the provider-vlan172 network. Run the ovs-ofctl dump-flows br-int command to view the flows in the integration bridge:
[user@demo ~]$ ovs-ofctl dump-flows br-int
...output omitted...
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=6770.572s, table=0, n_packets=1239, n_bytes=127795, idle_age=106, priority=1 actions=NORMAL
cookie=0x0, duration=3181.679s, table=0, n_packets=2605, n_bytes=246456, idle_age=0, priority=3,in_port=18,dl_vlan=172 actions=mod_vlan_vid:3,NORMAL
cookie=0x0, duration=6353.898s, table=0, n_packets=5077, n_bytes=482582, idle_age=0, priority=3,in_port=18,dl_vlan=171 actions=mod_vlan_vid:2,NORMAL
cookie=0x0, duration=6769.391s, table=0, n_packets=22301, n_bytes=2013101, idle_age=0, priority=2,in_port=18 actions=drop
cookie=0x0, duration=6770.463s, table=23, n_packets=0, n_bytes=0, idle_age=6770, priority=0 actions=drop
...output omitted...
The port identifier of the integration bridge. The VLAN ID of the packet. The tagging of the packet.
In the previous output, the second rule states that packets passing through the int-br-ex port (in_port=18) with a VLAN tag of 172 (dl_vlan=172) have the VLAN tag replaced with 3 (actions=mod_vlan_vid:3,NORMAL) and are then forwarded. The third rule states that packets passing through the int-br-ex port (in_port=18) with a VLAN tag of 171 (dl_vlan=171) have the VLAN tag replaced with 2 (actions=mod_vlan_vid:2,NORMAL) and are then forwarded. These rules are automatically added by the OpenStack Networking Open vSwitch agent. With the internal VLAN tag added to the packet, the qvo interface accepts it and forwards it to the qvb interface after the VLAN tag has been stripped. The packet then reaches the instance.
Tracing Multitenancy Network Flows
The following steps outline the process for tracing multitenancy network flows.
1. Create the provider network and its associated subnet.
2. Create a router and connect it to all of the projects' subnets. This allows for connectivity between two instances in separate projects.
3. Set the router as a gateway for the provider network.
4. Connect to the network node and use the tcpdump command against all network interfaces (see the capture sketch after this list).
5. Connect to the compute node and use the tcpdump command against the qvb devices.
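A capture limited to ICMP echo traffic, as used in the guided exercise that follows, can be sketched as:
[user@demo ~]$ sudo tcpdump -i any -n 'icmp[icmptype] = icmp-echo or icmp[icmptype] = icmp-echoreply'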
References
Further information is available in the Networking Guide for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Highly recommended document: "Networking in too much detail"
https://www.rdoproject.org/networking/networking-in-too-much-detail/
Guided Exercise: Tracing Multitenancy Network Flows
In this exercise, you will manage the network flow for two projects. You will review the network implementation for multitenancy and trace packets between projects.
Outcomes
You should be able to:
• Create a router for multiple projects.
• Review the network implementation for multiple projects.
• Use Linux tools to trace network packets between multiple projects.
Before you begin
Log in to workstation as student using student as the password. Run the lab network-tracing-net-flows setup command. The script ensures that OpenStack services are running and the environment is properly configured for this exercise. The script creates two projects: research and finance. The developer1 user is a member of the research project, and the developer2 user is a member of the finance project. The architect1 user is the administrative user for the two projects. The script also spawns one instance in each project.
[student@workstation ~]$ lab network-tracing-net-flows setup
Steps 1. As the architect1 administrative user, review the instances for each of the two projects. 1.1. From workstation, source the credential file for the architect1 user in the finance project, available at /home/student/architect1-finance-rc. List the instances in the finance project. [student@workstation ~]$ source architect1-finance-rc [student@workstation ~(architect1-finance)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "finance-network1=192.168.2.F", "ID": "fcdd9115-5e05-4ec6-bd1c-991ab36881ee", "Image Name": "rhel7", "Name": "finance-app1" } ]
1.2. Source the credential file of the architect1 user for the research project, available at /home/student/architect1-research-rc. List the instances in the project. [student@workstation ~(architect1-finance)]$ source architect1-research-rc [student@workstation ~(architect1-research)]$ openstack server list -f json [
{ "Status": "ACTIVE", "Networks": "research-network1=192.168.1.R", "ID": "d9c2010e-93c0-4dc7-91c2-94bce5133f9b", "Image Name": "rhel7", "Name": "research-app1" } ]
2.
As the architect1 administrative user in the research project, create a shared external network to provide external connectivity for the two projects. Use provider-172.25.250 as the name of the network. The environment uses flat networks with datacentre as the physical network name. [student@workstation ~(architect1-research)]$ openstack network create \ --external --share \ --provider-network-type flat \ --provider-physical-network datacentre \ provider-172.25.250 +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | | | created_at | 2017-06-09T21:03:49Z | | description | | | headers | | | id | 56b18acd-4f5a-4da3-a83a-fdf7fefb59dc | | ipv4_address_scope | None | | ipv6_address_scope | None | | is_default | False | | mtu | 1496 | | name | provider-172.25.250 | | port_security_enabled | True | | project_id | c4606deb457f447b952c9c936dd65dcb | | project_id | c4606deb457f447b952c9c936dd65dcb | | provider:network_type | flat | | provider:physical_network | datacentre | | provider:segmentation_id | None | | qos_policy_id | None | | revision_number | 4 | | router:external | External | | shared | True | | status | ACTIVE | | subnets | | | tags | [] | | updated_at | 2017-06-09T21:03:49Z | +---------------------------+--------------------------------------+
3.
Create the subnet for the provider network in the 172.25.250.0/24 range. Name the subnet provider-subnet-172.25.250. Disable the DHCP service for the network and use an allocation pool of 172.25.250.101 - 172.25.250.189. Use 172.25.250.254 as the DNS server and the gateway for the network. [student@workstation ~(architect1-research)]$ openstack subnet create \ --network provider-172.25.250 \ --no-dhcp --subnet-range 172.25.250.0/24 \ --gateway 172.25.250.254 \
--dns-nameserver 172.25.250.254 \ --allocation-pool start=172.25.250.101,end=172.25.250.189 \ provider-subnet-172.25.250 +-------------------+--------------------------------------+ | Field | Value | +-------------------+--------------------------------------+ | allocation_pools | 172.25.250.101-172.25.250.189 | | cidr | 172.25.250.0/24 | | created_at | 2017-06-09T22:28:03Z | | description | | | dns_nameservers | 172.25.250.254 | | enable_dhcp | False | | gateway_ip | 172.25.250.254 | | headers | | | host_routes | | | id | e5d37f20-c976-4719-aadf-1b075b17c861 | | ip_version | 4 | | ipv6_address_mode | None | | ipv6_ra_mode | None | | name | provider-subnet-172.25.250 | | network_id | 56b18acd-4f5a-4da3-a83a-fdf7fefb59dc | | project_id | c4606deb457f447b952c9c936dd65dcb | | project_id | c4606deb457f447b952c9c936dd65dcb | | revision_number | 2 | | service_types | [] | | subnetpool_id | None | | updated_at | 2017-06-09T22:28:03Z | +-------------------+--------------------------------------+
4.
List the subnets present in the environment. Ensure that there are three subnets: one subnet for each project and one subnet for the external network. [student@workstation ~(architect1-research)]$ openstack subnet list -f json [ { "Network": "14f8182a-4c0f-442e-8900-daf3055e758d", "Subnet": "192.168.2.0/24", "ID": "79d5d45f-e9fd-47a2-912e-e1acb83c6978", "Name": "finance-subnet1" }, { "Network": "f51735e7-4992-4ec3-b960-54bd8081c07f", "Subnet": "192.168.1.0/24", "ID": "d1dd16ee-a489-4884-a93b-95028b953d16", "Name": "research-subnet1" }, { "Network": "56b18acd-4f5a-4da3-a83a-fdf7fefb59dc", "Subnet": "172.25.250.0/24", "ID": "e5d37f20-c976-4719-aadf-1b075b17c861", "Name": "provider-subnet-172.25.250" } ]
5.
Create the research-router1 router and connect it to the two subnets, finance and research. 5.1. Create the router. [student@workstation ~(architect1-research)]$ openstack router create \
research-router1 +-------------------------+--------------------------------------+ | Field | Value | +-------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | | | created_at | 2017-06-09T23:03:15Z | | description | | | distributed | False | | external_gateway_info | null | | flavor_id | None | | ha | False | | headers | | | id | 3fed0799-5da7-48ac-851d-c2b3dee01b24 | | name | research-router1 | | project_id | c4606deb457f447b952c9c936dd65dcb | | project_id | c4606deb457f447b952c9c936dd65dcb | | revision_number | 3 | | routes | | | status | ACTIVE | | updated_at | 2017-06-09T23:03:15Z | +-------------------------+--------------------------------------+
5.2. Connect the router to the research-subnet1 subnet. [student@workstation ~(architect1-research)]$ openstack router add subnet \ research-router1 research-subnet1
5.3. Connect the router to the finance-subnet1 subnet. [student@workstation ~(architect1-research)]$ openstack router add subnet \ research-router1 finance-subnet1
6. Define the router as a gateway for the provider network, provider-172.25.250.
[student@workstation ~(architect1-research)]$ neutron router-gateway-set \
research-router1 provider-172.25.250
Set gateway for router research-router1
7.
Ensure that the router is connected to the three networks by listing the router ports. [student@workstation ~(architect1-research)]$ neutron router-port-list \ research-router1 -f json [ { "mac_address": "fa:16:3e:65:71:68", "fixed_ips": "{\"subnet_id\": \"0e6db9a7-40b6-4b10-b975-9ac32c458879\", \"ip_address\": \"192.168.2.1\"}", "id": "ac11ea59-e50e-47fa-b11c-1e93d975b534", "name": "" }, { "mac_address": "fa:16:3e:5a:74:28", "fixed_ips": "{\"subnet_id\": \"e5d37f20-c976-4719-aadf-1b075b17c861\", \"ip_address\": \"172.25.250.S\"}", "id": "dba2aba8-9060-4cef-be9f-6579baa016fb",
"name": "" }, { "mac_address": "fa:16:3e:a1:77:5f", "fixed_ips": "{\"subnet_id\": \"d1dd16ee-a489-4884-a93b-95028b953d16\", \"ip_address\": \"192.168.1.1\"}", "id": "fa7dab05-e5fa-4c2d-a611-d78670006ddf", "name": "" } ]
8.
As the developer1 user, create a floating IP and attach it to the research-app1 virtual machine. 8.1. Source the credentials for the developer1 user and create a floating IP. [student@workstation ~(architect1-finance)]$ source developer1-research-rc [student@workstation ~(developer1-research)]$ openstack floating ip create \ provider-172.25.250 +---------------------+--------------------------------------+ | Field | Value | +---------------------+--------------------------------------+ | created_at | 2017-06-10T00:40:51Z | | description | | | fixed_ip_address | None | | floating_ip_address | 172.25.250.N | | floating_network_id | 56b18acd-4f5a-4da3-a83a-fdf7fefb59dc | | headers | | | id | d9c2010e-93c0-4dc7-91c2-94bce5133f9b | | port_id | None | | project_id | c4606deb457f447b952c9c936dd65dcb | | project_id | c4606deb457f447b952c9c936dd65dcb | | revision_number | 1 | | router_id | None | | status | DOWN | | updated_at | 2017-06-10T00:40:51Z | +---------------------+--------------------------------------+
8.2. Attach the floating IP to the research-app1 virtual machine. [student@workstation ~(developer1-research)]$ openstack server add floating ip \ research-app1 172.25.250.N
9.
As the developer2 user, create a floating IP and attach it to the finance-app1 virtual machine. 9.1. Source the credentials for the developer2 user and create a floating IP. [student@workstation ~(developer1-research)]$ source developer2-finance-rc [student@workstation ~(developer2-finance)]$ openstack floating ip create \ provider-172.25.250 +---------------------+--------------------------------------+ | Field | Value | +---------------------+--------------------------------------+ | created_at | 2017-06-10T00:40:51Z | | description | | | fixed_ip_address | None | | floating_ip_address | 172.25.250.P |
| floating_network_id | 56b18acd-4f5a-4da3-a83a-fdf7fefb59dc | | headers | | | id | 797854e4-1253-4059-a6d1-3cb5a99a98ec | | port_id | None | | project_id | cd68b32fa14942d587a4be838ac722be | | project_id | cd68b32fa14942d587a4be838ac722be | | revision_number | 1 | | router_id | None | | status | DOWN | | updated_at | 2017-06-10T00:40:51Z | +---------------------+--------------------------------------+
9.2. Attach the floating IP to the finance-app1 virtual machine. [student@workstation ~(developer2-finance)]$ openstack server add floating ip \ finance-app1 172.25.250.P
10. Source the credentials for the developer1 user and retrieve the floating IP attached to the research-app1 virtual machine. [student@workstation ~(developer2-finance)]$ source developer1-research-rc [student@workstation ~(developer1-research)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "research-network1=192.168.1.R, 172.25.250.N", "ID": "d9c2010e-93c0-4dc7-91c2-94bce5133f9b", "Image Name": "rhel7", "Name": "research-app1" } ]
11. Test the connectivity to the instance research-app1, running in the research project, by using the ping command.
[student@workstation ~(developer1-research)]$ ping -c 3 172.25.250.N
PING 172.25.250.N (172.25.250.N) 56(84) bytes of data.
64 bytes from 172.25.250.N: icmp_seq=1 ttl=63 time=1.77 ms
64 bytes from 172.25.250.N: icmp_seq=2 ttl=63 time=0.841 ms
64 bytes from 172.25.250.N: icmp_seq=3 ttl=63 time=0.861 ms

--- 172.25.250.N ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.841/1.159/1.776/0.437 ms
12. As the developer2 user, retrieve the floating IP attached to the finance-app1 virtual machine so you can test connectivity. [student@workstation ~(developer1-research)]$ source developer2-finance-rc [student@workstation ~(developer2-finance)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "finance-network1=192.168.2.F, 172.25.250.P", "ID": "797854e4-1253-4059-a6d1-3cb5a99a98ec", "Image Name": "rhel7", "Name": "finance-app1"
} ]
13. Use the ping command to reach the 172.25.250.P IP. Leave the command running, as you will connect to the overcloud nodes to review how the packets are routed. [student@workstation ~(developer2-finance)]$ ping 172.25.250.P PING 172.25.250.P (172.25.250.P) 56(84) bytes of data. 64 bytes from 172.25.250.P: icmp_seq=1 ttl=63 time=1.84 ms 64 bytes from 172.25.250.P: icmp_seq=2 ttl=63 time=0.639 ms 64 bytes from 172.25.250.P: icmp_seq=3 ttl=63 time=0.708 ms ...output omitted...
14. Open another terminal. Use the ssh command to log in to controller0 as the heat-admin user.
[student@workstation ~]$ ssh heat-admin@controller0
[heat-admin@overcloud-controller-0 ~]$
15. Run the tcpdump command against all interfaces. Notice the two IP address to which the ICMP packets are routed: 192.168.2.F, which is the private IP of the finance-app1 virtual machine, and 172.25.250.254, which is the gateway for the provider network. [heat-admin@overcloud-controller-0 ~]$ sudo tcpdump \ -i any -n -v \ 'icmp[icmptype] = icmp-echoreply' \ or 'icmp[icmptype] = icmp-echo' tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 16:15:09.301102 IP (tos 0x0, ttl 64, id 31032, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 172.25.250.P: ICMP echo request, id 24572, seq 10, length 64 16:15:09.301152 IP (tos 0x0, ttl 63, id 31032, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 192.168.2.F: ICMP echo request, id 24572, seq 10, length 64 16:15:09.301634 IP (tos 0x0, ttl 64, id 12980, offset 0, flags [none], proto ICMP (1), length 84) 192.168.2.F > 172.25.250.254: ICMP echo reply, id 24572, seq 10, length 64 16:15:09.301677 IP (tos 0x0, ttl 63, id 12980, offset 0, flags [none], proto ICMP (1), length 84) 172.25.250.P > 172.25.250.254: ICMP echo reply, id 24572, seq 10, length 64 16:15:10.301102 IP (tos 0x0, ttl 64, id 31282, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 172.25.250.P: ICMP echo request, id 24572, seq 11, length 64 16:15:10.301183 IP (tos 0x0, ttl 63, id 31282, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 192.168.2.F: ICMP echo request, id 24572, seq 11, length 64 16:15:10.301693 IP (tos 0x0, ttl 64, id 13293, offset 0, flags [none], proto ICMP (1), length 84) 192.168.2.F > 172.25.250.254: ICMP echo reply, id 24572, seq 11, length 64 16:15:10.301722 IP (tos 0x0, ttl 63, id 13293, offset 0, flags [none], proto ICMP (1), length 84) 172.25.250.P > 172.25.250.254: ICMP echo reply, id 24572, seq 11, length 64 ...output omitted...
16. Cancel the tcpdump command by pressing Ctrl+C and list the network namespaces. Retrieve the routes in the qrouter namespace to determine the network device that handles the routing for the 192.168.2.0/24 network. The following output indicates that packets destined to the 192.168.2.0/24 network are routed through the qr-ac11ea59-e5 device (the IDs and names will be different in your output).
[heat-admin@overcloud-controller-0 ~]$ ip netns list
qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \
qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24 \
ip route
172.25.250.0/24 dev qg-dba2aba8-90 proto kernel scope link src 172.25.250.107
192.168.1.0/24 dev qr-fa7dab05-e5 proto kernel scope link src 192.168.1.1
192.168.2.0/24 dev qr-ac11ea59-e5 proto kernel scope link src 192.168.2.1
17.
Within the qrouter namespace, run the ping command to confirm that the private IP of the finance-app1 virtual machine, 192.168.2.F, is reachable. [heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24 \ ping -c 3 -I qr-ac11ea59-e5 192.168.2.F PING 192.168.2.F (192.168.2.F) from 192.168.2.1 qr-ac11ea59-e5: 56(84) bytes of data. 64 bytes from 192.168.2.F: icmp_seq=1 ttl=64 time=0.555 ms 64 bytes from 192.168.2.F: icmp_seq=2 ttl=64 time=0.507 ms 64 bytes from 192.168.2.F: icmp_seq=3 ttl=64 time=0.601 ms --- 192.168.2.F ping statistics --3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.507/0.554/0.601/0.042 ms
18. From the first terminal, cancel the ping command by pressing Ctrl+C. Rerun the ping command against the floating IP of the research-app1 virtual machine, 172.25.250.N. Leave the command running, as you will be inspecting the packets from the controller0. [student@workstation ~(developer2-finance)]$ ping 172.25.250.N PING 172.25.250.N (172.25.250.N) 56(84) bytes of data. 64 bytes from 172.25.250.N: icmp_seq=1 ttl=63 time=1.84 ms 64 bytes from 172.25.250.N: icmp_seq=2 ttl=63 time=0.639 ms 64 bytes from 172.25.250.N: icmp_seq=3 ttl=63 time=0.708 ms ...output omitted...
19. From the terminal connected to the controller-0, run the tcpdump command. Notice the two IP address to which the ICMP packets are routed: 192.168.1.R, which is the private IP of the research-app1 virtual machine, and 172.25.250.254, which is the IP address of the gateway for the provider network. [heat-admin@overcloud-controller-0 ~]$ sudo tcpdump \ -i any -n -v \ 'icmp[icmptype] = icmp-echoreply' or \ 'icmp[icmptype] = icmp-echo' tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 16:58:40.340643 IP (tos 0x0, ttl 64, id 65405, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 172.25.250.N: ICMP echo request, id 24665, seq 47, length 64
16:58:40.340690 IP (tos 0x0, ttl 63, id 65405, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 192.168.1.R: ICMP echo request, id 24665, seq 47, length 64 16:58:40.341130 IP (tos 0x0, ttl 64, id 41896, offset 0, flags [none], proto ICMP (1), length 84) 192.168.1.R > 172.25.250.254: ICMP echo reply, id 24665, seq 47, length 64 16:58:40.341141 IP (tos 0x0, ttl 63, id 41896, offset 0, flags [none], proto ICMP (1), length 84) 172.25.250.N > 172.25.250.254: ICMP echo reply, id 24665, seq 47, length 64 16:58:41.341051 IP (tos 0x0, ttl 64, id 747, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 172.25.250.N: ICMP echo request, id 24665, seq 48, length 64 16:58:41.341102 IP (tos 0x0, ttl 63, id 747, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 192.168.1.R: ICMP echo request, id 24665, seq 48, length 64 16:58:41.341562 IP (tos 0x0, ttl 64, id 42598, offset 0, flags [none], proto ICMP (1), length 84) 192.168.1.R > 172.25.250.254: ICMP echo reply, id 24665, seq 48, length 64 16:58:41.341585 IP (tos 0x0, ttl 63, id 42598, offset 0, flags [none], proto ICMP (1), length 84) 172.25.250.N > 172.25.250.254: ICMP echo reply, id 24665, seq 48, length 64 ...output omitted...
20. Cancel the tcpdump command by pressing Ctrl+C and list the network namespaces. Retrieve the routes in the qrouter namespace to determine the network device that handles routing for the 192.168.1.0/24 network. The following output indicates that packets destined to the 192.168.1.0/24 network are routed through the qr-fa7dab05e5 device (the IDs and names will be different in your output). [heat-admin@overcloud-controller-0 ~]$ sudo ip netns list qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24 ...output omitted... [heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24 \ ip route 172.25.250.0/24 dev qg-dba2aba8-90 proto kernel scope link src 172.25.250.107 192.168.1.0/24 dev qr-fa7dab05-e5 proto kernel scope link src 192.168.1.1 192.168.2.0/24 dev qr-ac11ea59-e5 proto kernel scope link src 192.168.2.1
21. Within the qrouter namespace, run the ping command to confirm that the private IP of the research-app1 virtual machine, 192.168.1.R, is reachable. [heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24 \ ping -c 3 -I qr-fa7dab05-e5 192.168.1.R PING 192.168.1.R (192.168.1.R) from 192.168.1.1 qr-fa7dab05-e5: 56(84) bytes of data. 64 bytes from 192.168.1.R: icmp_seq=1 ttl=64 time=0.500 ms 64 bytes from 192.168.1.R: icmp_seq=2 ttl=64 time=0.551 ms 64 bytes from 192.168.1.R: icmp_seq=3 ttl=64 time=0.519 ms --- 192.168.1.R ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.500/0.523/0.551/0.028 ms
22. Exit from controller0 and connect to compute0. [heat-admin@overcloud-controller-0 ~]$ exit
[student@workstation ~]$ ssh heat-admin@compute0 [heat-admin@overcloud-compute-0 ~]$
23. List the Linux bridges. The following output indicates two bridges with two ports each. Each bridge corresponds to an instance. The TAP device in each bridge corresponds to the instance's virtual NIC; the qvb device corresponds to the vEth pair that connects the Linux bridge to the integration bridge, br-int. [heat-admin@overcloud-compute-0 ~]$ brctl show bridge name bridge id STP enabled interfaces qbr03565cda-b1 8000.a2117e24b27b no qvb03565cda-b1 tap03565cda-b1 qbr92387a93-92 8000.9a21945ec452 no qvb92387a93-92 tap92387a93-92
24. Run the tcpdump command against either of the two qvb interfaces while the ping command is still running against the 172.25.250.N floating IP. If the output does not show any packets being captured, press CTRL+C and rerun the command against the other qvb interface. [heat-admin@overcloud-compute-0 ~]$ sudo tcpdump -i qvb03565cda-b1 \ -n -vv 'icmp[icmptype] = icmp-echoreply' or \ 'icmp[icmptype] = icmp-echo' tcpdump: WARNING: qvb03565cda-b1: no IPv4 address assigned tcpdump: listening on qvb03565cda-b1, link-type EN10MB (Ethernet), capture size 65535 bytes CTRL+C [heat-admin@overcloud-compute-0 ~]$ sudo tcpdump -i qvb92387a93-92 \ -n -vv 'icmp[icmptype] = icmp-echoreply' or \ 'icmp[icmptype] = icmp-echo' tcpdump: WARNING: qvb92387a93-92: no IPv4 address assigned tcpdump: listening on qvb92387a93-92, link-type EN10MB (Ethernet), capture size 65535 bytes 17:32:43.781928 IP (tos 0x0, ttl 63, id 48653, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 192.168.1.R: ICMP echo request, id 24721, seq 1018, length 64 17:32:43.782197 IP (tos 0x0, ttl 64, id 37307, offset 0, flags [none], proto ICMP (1), length 84) 192.168.1.R > 172.25.250.254: ICMP echo reply, id 24721, seq 1018, length 64 17:32:44.782026 IP (tos 0x0, ttl 63, id 49219, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 192.168.1.R: ICMP echo request, id 24721, seq 1019, length 64 17:32:44.782315 IP (tos 0x0, ttl 64, id 38256, offset 0, flags [none], proto ICMP (1), length 84) 192.168.1.R > 172.25.250.254: ICMP echo reply, id 24721, seq 1019, length 64 ...output omitted...
25. From the first terminal, cancel the ping command. Rerun the command against 172.25.250.P, the floating IP of the finance-app1 instance. [student@workstation ~(developer2-finance)]$ ping 172.25.250.P PING 172.25.250.P (172.25.250.P) 56(84) bytes of data. 64 bytes from 172.25.250.P: icmp_seq=1 ttl=63 time=0.883 ms 64 bytes from 172.25.250.P: icmp_seq=2 ttl=63 time=0.779 ms 64 bytes from 172.25.250.P: icmp_seq=3 ttl=63 time=0.812 ms 64 bytes from 172.25.250.P: icmp_seq=4 ttl=63 time=0.787 ms
...output omitted...
26. From the terminal connected to compute0 node, enter CTRL+C to cancel the tcpdump command. Rerun the command against the second qvb interface, qvb03565cda-b1. Confirm that the output indicates some activity. [heat-admin@overcloud-compute-0 ~]$ sudo tcpdump -i qvb03565cda-b1 \ -n -vv 'icmp[icmptype] = icmp-echoreply' or \ 'icmp[icmptype] = icmp-echo' tcpdump: WARNING: qvb03565cda-b1: no IPv4 address assigned 17:40:20.596012 IP (tos 0x0, ttl 63, id 58383, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 192.168.2.F: ICMP echo request, id 24763, seq 172, length 64 17:40:20.596240 IP (tos 0x0, ttl 64, id 17005, offset 0, flags [none], proto ICMP (1), length 84) 192.168.2.F > 172.25.250.254: ICMP echo reply, id 24763, seq 172, length 64 17:40:21.595997 IP (tos 0x0, ttl 63, id 58573, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 192.168.2.F: ICMP echo request, id 24763, seq 173, length 64 17:40:21.596294 IP (tos 0x0, ttl 64, id 17064, offset 0, flags [none], proto ICMP (1), length 84) 192.168.2.F > 172.25.250.254: ICMP echo reply, id 24763, seq 173, length 64 17:40:22.595953 IP (tos 0x0, ttl 63, id 59221, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.254 > 192.168.2.F: ICMP echo request, id 24763, seq 174, length 64 17:40:22.596249 IP (tos 0x0, ttl 64, id 17403, offset 0, flags [none], proto ICMP (1), length 84) 192.168.2.F > 172.25.250.254: ICMP echo reply, id 24763, seq 174, length 64 ...output omitted...
27. From the first terminal, cancel the ping and confirm that the IP address 192.168.2.F is the private IP of the finance-app1 instance. 27.1. Retrieve the private IP of the finance-app1 instance. [student@workstation ~(developer2-finance)]$ openstack server show \ finance-app1 -f json { "OS-EXT-STS:task_state": null, "addresses": "finance-network1=192.168.2.F, 172.25.250.P", ...output omitted...
28. Log in to the finance-app1 instance as the cloud-user user. Run the ping command against the floating IP assigned to the research-app1 virtual machine, 172.25.250.N. 28.1. Use the ssh command as the cloud-user user to log in to finance-app1, with an IP address of 172.25.250.P. Use the developer2-keypair1.pem key located in the home directory of the student user. [student@workstation ~(developer2-finance)]$ ssh -i developer2-keypair1.pem \ [email protected] [cloud-user@finance-app1 ~]$
28.2. Run the ping command against the floating IP of the research-app1 instance, 172.25.250.N.
[cloud-user@finance-app1 ~]$ ping 172.25.250.N
29. From the terminal connected to compute-0, enter CTRL+C to cancel the tcpdump command. Rerun the command without specifying any interface. Confirm that the output indicates some activity. [heat-admin@overcloud-compute-0 ~]$ sudo tcpdump -i any \ -n -v 'icmp[icmptype] = icmp-echoreply' or \ 'icmp[icmptype] = icmp-echo' tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
18:06:05.030442 IP (tos 0x0, ttl 64, id 39160, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.2.F > 172.25.250.N: ICMP echo request, id 12256, seq 309, length 64
18:06:05.030489 IP (tos 0x0, ttl 63, id 39160, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.P > 192.168.1.R: ICMP echo request, id 12256, seq 309, length 64
18:06:05.030774 IP (tos 0x0, ttl 64, id 32646, offset 0, flags [none], proto ICMP (1), length 84) 192.168.1.R > 172.25.250.P: ICMP echo reply, id 12256, seq 309, length 64
18:06:05.030786 IP (tos 0x0, ttl 63, id 32646, offset 0, flags [none], proto ICMP (1), length 84) 172.25.250.N > 192.168.2.F: ICMP echo reply, id 12256, seq 309, length 64
18:06:06.030527 IP (tos 0x0, ttl 64, id 40089, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.2.F > 172.25.250.N: ICMP echo request, id 12256, seq 310, length 64
18:06:06.030550 IP (tos 0x0, ttl 63, id 40089, offset 0, flags [DF], proto ICMP (1), length 84) 172.25.250.P > 192.168.1.R: ICMP echo request, id 12256, seq 310, length 64
18:06:06.030880 IP (tos 0x0, ttl 64, id 33260, offset 0, flags [none], proto ICMP (1), length 84) 192.168.1.R > 172.25.250.P: ICMP echo reply, id 12256, seq 310, length 64
18:06:06.030892 IP (tos 0x0, ttl 63, id 33260, offset 0, flags [none], proto ICMP (1), length 84) 172.25.250.N > 192.168.2.F: ICMP echo reply, id 12256, seq 310, length 64
...output omitted...
The output indicates the following flow for the ICMP sequence 309 (seq 309):
• The private IP of the finance-app1 instance, 192.168.2.F, sends an echo request to the floating IP of the research-app1 instance, 172.25.250.N.
• The floating IP of the finance-app1 instance, 172.25.250.P, sends an echo request to the private IP of the research-app1 instance, 192.168.1.R.
• The private IP of the research-app1 instance, 192.168.1.R, sends an echo reply to the floating IP of the finance-app1 instance, 172.25.250.P.
• The floating IP of the research-app1 instance, 172.25.250.N, sends an echo reply to the private IP of the finance-app1 instance, 192.168.2.F.
30. Close the terminal connected to compute-0. Cancel the ping command, and log out of finance-app1.
Cleanup
From workstation, run the lab network-tracing-net-flows cleanup script to clean up the resources created in this exercise.
[student@workstation ~]$ lab network-tracing-net-flows cleanup
Troubleshooting Network Issues
Objectives
After completing this section, students should be able to:
• Troubleshoot common networking issues.
• Review OpenStack service configuration files.
Common Networking Issues
While software-defined networking may seem to introduce complexity at first glance, the diagnostic process for troubleshooting network connectivity in OpenStack is similar to that of a physical network. The OpenStack virtual infrastructure can be treated like a physical infrastructure, so administrators can use the same tools and utilities they would use when troubleshooting physical servers. The following table lists some of the basic tools that administrators can use to troubleshoot their environment.
Troubleshooting Utilities
ping: Sends packets to network hosts. The ping command is a useful tool for analyzing network connectivity problems. It works by sending traffic to specified destinations and then reporting back whether the attempts were successful; the results serve as a basic indicator of network connectivity.
ip: Manipulates routing tables, network devices, and tunnels. The command allows you to review IP addresses, network devices, namespaces, and tunnels.
traceroute: Tracks the route that packets take from an IP network on their way to a given host.
tcpdump: A packet analyzer that displays TCP/IP and other packets being transmitted or received over a network to which a computer is attached.
ovs-vsctl: High-level interface for managing the Open vSwitch database. The command allows the management of Open vSwitch bridges, ports, tunnels, and patch ports.
ovs-ofctl: Administers OpenFlow switches. It can also show the current state of an OpenFlow switch, including features, configuration, and table entries.
brctl: Manages Linux bridges. Administrators can retrieve MAC addresses, device names, and bridge configurations.
openstack: The OpenStack unified CLI. The command can be used to review networks and network ports.
neutron: The OpenStack Networking (Neutron) service CLI. The command can be used to review router ports and network agents.
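As a quick, hedged illustration of how a few of these utilities complement each other, the following sketch inspects the virtual switching layer on a compute node. The br-int bridge name is the default integration bridge used throughout this course; any other names in the output would be specific to your deployment.
# List Open vSwitch bridges, their ports, and any VLAN tags or patch ports
[user@demo ~]$ sudo ovs-vsctl show
# Dump the OpenFlow rules programmed on the integration bridge
[user@demo ~]$ sudo ovs-ofctl dump-flows br-int
# List the Linux bridges that connect each instance TAP device to br-int
[user@demo ~]$ sudo brctl show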
Troubleshooting Scenarios
Troubleshooting procedures help isolate and mitigate issues. There are some basic recurring scenarios in OpenStack environments that administrators are likely to face. The following potential scenarios include basic troubleshooting steps.
Note Some of the resolution steps outlined in the following scenarios can overlap.
Instances are not able to reach the external network.
1. Use the ip command from within the instance to ensure that DHCP provided an IP address.
2. Review the bridges on the compute node to ensure that a vEth pair connects the project bridge to the integration bridge.
3. Review the network namespaces on the network node. Ensure that the router namespace exists and that routes are properly set.
4. Review the security group that the instance uses to make sure that there is a rule that allows outgoing traffic.
5. Review the OpenStack Networking configuration to ensure that the mapping between the physical interfaces and the provider network is properly set (a short sketch of such checks follows this list).
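The following is a minimal sketch of how the last two items might be checked, assuming the default security group and the datacentre:br-ex bridge mapping used elsewhere in this course; the file path applies to deployments that use the Open vSwitch mechanism driver.
# Confirm that the security group contains egress rules
[user@demo ~]$ openstack security group rule list default
# Confirm the mapping between the provider network and the external bridge
[user@demo ~]$ sudo grep bridge_mappings /etc/neutron/plugins/ml2/openvswitch_agent.ini
# Verify that a physical interface is attached to the external bridge
[user@demo ~]$ sudo ovs-vsctl list-ports br-ex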
Instances do not retrieve an IP address.
1. Use the ps command to ensure that the dnsmasq service is running on the controller (or network) node.
2. Review the namespaces to ensure that the qdhcp namespace exists and has the TAP device that the dnsmasq service uses.
3. If the environment uses VLANs, ensure that the switch ports are set in trunk mode or that the right VLAN ID is set for the port.
4. If a firewall manages the compute node, ensure that no conflicting rules prevent the DHCP traffic from passing.
5. Use the neutron command to review the state of the DHCP agent (a short sketch of such checks follows this list).
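A minimal sketch of such checks, run on the controller (or network) node; the agent, namespace, and process names shown are the defaults assumed in this course.
# Confirm that the DHCP agent is reported as alive by the OpenStack Networking server
[user@demo ~]$ neutron agent-list | grep 'DHCP agent'
# Verify that a qdhcp namespace exists for the project network
[user@demo ~]$ sudo ip netns list | grep qdhcp
# Verify that a dnsmasq process is serving that namespace
[user@demo ~]$ ps ax | grep [d]nsmasq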
Metadata is not injected into instances.
1. Ensure that the cloud-init package is installed in the source image.
2. Review the namespace to make sure that there is a route for the 169.254.169.254/32 address, and that it uses the right network interface. This IP address is used in Amazon EC2 and other cloud computing platforms to distribute metadata to cloud instances. In OpenStack, a Netfilter rule redirects packets destined to this IP address to the IP address of the node that runs the metadata service.
3. Ensure that there is a Netfilter rule that redirects calls to the 169.254.169.254 IP address to the Nova metadata service (see the sketch after this list).
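A minimal sketch of checking the last two items inside a router namespace; the qrouter UUID below is a placeholder and will differ in your environment.
# Review the routes known inside the router namespace
[user@demo ~]$ sudo ip netns exec qrouter-UUID ip route
# Confirm the Netfilter rule that redirects metadata requests to the metadata proxy
[user@demo ~]$ sudo ip netns exec qrouter-UUID iptables -t nat -S | grep 169.254.169.254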
Using the ping Command for Troubleshooting
The ping command is a useful tool when troubleshooting an environment. By sending traffic to specified destinations, and then reporting back the status, it helps administrators analyze potential network connectivity problems. The results are a good indicator of network connectivity, or lack thereof. However, the command might not reveal some connectivity issues, such as traffic being blocked by a firewall.
Note As a general practice, it is not recommended to configure firewalls to block ICMP packets. Doing so makes troubleshooting more difficult.
The ping command can be run from the instance, the network node, and the compute node. The -I interface option allows administrators to send packets from the specified interface. The command allows the validation of multiple layers of the network infrastructure, such as: • Name resolution, which implies the availability of the DNS server. • IP routing, which uses the routing table rules. • Network switching, which implies proper connectivity between the various network devices. Results from a test using the ping command can reveal valuable information, depending on which destination is tested. For example, in the following diagram, the instance VM1 is experiencing connectivity issues. The possible destinations are numbered and the conclusions drawn from a successful or failed result are presented below.
Figure 5.13: A basic troubleshooting scenario
1. Internet: a common first step is to send a packet to an external network, such as www.redhat.com.
• If the packet reaches the Internet, it indicates that all the various network points are working as expected. This includes both the physical and virtual infrastructures.
• If the packet does not reach the Internet, while other servers are able to reach it, it indicates that an intermediary network point is at fault.
2.
Physical router: this is the IP address of the physical router, as configured by the network administrator to direct the OpenStack internal traffic to the external network. • If the packet reaches the IP address of the router, it indicates that the underlying switches are properly set. Note that the packets at this stage do not traverse the router, therefore, this step cannot be used to determine if there is a routing issue present on the default gateway. • If the packet does not reach the router, it indicates a failure in the path between the instance and the router. The router or the switches could be down, or the gateway could be improperly set.
3.
Physical switch: the physical switch connects the different nodes on the same physical network. • If the instance is able to reach an instance on the same subnet, this indicates that the physical switch allows the packets to pass. • If the instance is not able to reach an instance on the same subnet, this could indicate that switch ports do not trunk the required VLANs.
4.
OpenStack Networking router: the virtual OpenStack Networking router that directs the traffic of the instances. • If the instance is able to reach the virtual router, this indicates that there are rules that allow the ICMP traffic. This also indicates that the OpenStack Networking network node is available and properly synchronized with the OpenStack Networking server. • If the instance is not able to reach the virtual router, this could indicate that the security group that the instance uses does not allow ICMP packets to pass. This could also indicate that the L3 agent is down or not properly registered to the OpenStack Networking server.
5.
VM2: the instance running on the same compute node. • If the instance VM1 is able to reach VM2, this indicates that the network interfaces are properly configured. • If the instance VM1 is not able to reach VM2, this could indicate that VM2 prevents the ICMP traffic. This could also indicate that the virtual bridges are not set correctly.
Troubleshooting VLANs
OpenStack Networking trunks VLAN networks through SDN switches. The support of VLAN-tagged provider networks means that instances are able to communicate with servers located in the physical network. To troubleshoot connectivity to a VLAN provider network, administrators can use the ping command to reach the IP address of the gateway defined during the creation of the network.
There are many ways to review the mapping of VLAN networks. For example, to discover which internal VLAN tag is in use for a given external VLAN, administrators can use the ovs-ofctl command. The following scenario assumes the creation of a provider network with a segmentation ID of 6, and of a subnet on that network.
[user@demo ~]$ openstack network create \
--external \
--provider-network-type vlan \
--provider-physical-network datacentre \
--provider-segment 6 \
provider-net
[user@demo ~]$ openstack subnet create \
--network provider-net \
--subnet-range=192.168.1.0/24 \
--dns-nameserver=172.25.250.254 \
--allocation-pool start=192.168.1.100,end=192.168.1.254 \
--dhcp provider-subnet
1. Retrieve the VLAN ID of the network, referred to as provider:segmentation_id. [user@demo ~]$ openstack network show provider-net +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ ...output omitted... | provider:segmentation_id | 6 | ...output omitted... +---------------------------+--------------------------------------+
2.
Connect to the compute node and run the ovs-ofctl dump-flows command against the integration bridge. Review the flow to make sure that there is a matching rule for the VLAN tag 6. The following output shows that packets received on port ID 1 with the VLAN tag 6 are modified to have the internal VLAN tag 15. [user@demo ~]$ ovs-ofctl dump-flows br-int NXST_FLOW reply (xid=0x4): cookie=0xa6bd2d041ea176d1, duration=547156.698s, table=0, n_packets=1184, n_bytes=145725, idle_age=82, hard_age=65534, priority=3,in_port=1,dl_vlan=6 actions=mod_vlan_vid:15,NORMAL ...
3.
Run the ovs-ofctl show br-int command to access the flow table and the ports of the integration bridge. The following output shows that the port with the ID of 1 is assigned to the int-br-ex port. [user@demo ~]$ ovs-ofctl show br-int OFPT_FEATURES_REPLY (xid=0x2): dpid:0000828a09b0f949 n_tables:254, n_buffers:256 capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst 1(int-br-ex): addr:82:40:82:eb:0a:58 config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max
4. Use tools such as the ping command throughout the various network layers to detect potential connectivity issues. For example, if packets are lost between the compute node and the controller node, this may indicate network congestion on the equipment that connects the two nodes.
5. Review the configuration of the physical switches to ensure that the ports through which the project traffic passes allow network packets tagged with the provider segmentation ID. Usually, ports need to be set in trunk mode. On the OpenStack side, the same mapping can also be confirmed from the Open vSwitch database, as sketched below.
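A minimal sketch of that check on a node running the Open vSwitch agent; the output fields are the standard ovs-vsctl ones, and on such deployments the other_config column of each port typically records the mapping to the provider segmentation ID.
# Show the bridges and the internal VLAN tag assigned to each port
[user@demo ~]$ sudo ovs-vsctl show
# Show the name, internal tag, and recorded segmentation ID of every port
[user@demo ~]$ sudo ovs-vsctl list port | grep -E 'name|^tag|other_config'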
Troubleshooting OpenStack Networking Agents
OpenStack Networking agents are services that perform a set of particular tasks. They can be seen as software that handles data packets. Such agents include the DHCP agent, the L3 agent, the metering agent, and the LBaaS agent. The neutron agent-list command can be used to review the state of the agents. If an agent is out of synchronization or not properly registered, this can lead to unexpected results. For example, if the DHCP agent is not marked as alive, instances will not retrieve any IP address from the agent. The following commands show how neutron agent-list and neutron agent-show can be used to retrieve more information about OpenStack Networking agents.
[user@demo ~]$ neutron agent-list
+-----------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| id                    | agent_type         | host                               | availability_zone | alive | admin_state_up | binary                    |
+-----------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| 878cd9a3-addf11b7b302 | DHCP agent         | overcloud-controller-0.localdomain | nova              | :-)   | True           | neutron-dhcp-agent        |
| c60d8343-ed6ba0320a76 | Metadata agent     | overcloud-controller-0.localdomain |                   | :-)   | True           | neutron-metadata-agent    |
| cabd8fe5-a37f9aa68111 | L3 agent           | overcloud-controller-0.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| cc054b29-32b83bf41a95 | Open vSwitch agent | overcloud-compute-0.localdomain    |                   | xxx   | True           | neutron-openvswitch-agent |
| f7921f0a-6c89cea15286 | Open vSwitch agent | overcloud-controller-0.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
+-----------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
[user@demo ~]$ neutron agent-show cabd8fe5-82e1-467a-b59c-a37f9aa68111
+---------------------+--------------------------------------------------------------+
| Field               | Value                                                        |
+---------------------+--------------------------------------------------------------+
| admin_state_up      | True                                                         |
| agent_type          | L3 agent                                                     |
| alive               | True                                                         |
| availability_zone   | nova                                                         |
| binary              | neutron-l3-agent                                             |
| configurations      | {                                                            |
|                     |   "agent_mode": "legacy",                                    |
|                     |   "gateway_external_network_id": "",                         |
|                     |   "handle_internal_only_routers": true,                      |
|                     |   "routers": 0,                                              |
|                     |   "interfaces": 0,                                           |
|                     |   "floating_ips": 0,                                         |
|                     |   "interface_driver": "neutron.agent.linux.interface.OVSInterfaceDriver", |
|                     |   "log_agent_heartbeats": false,                             |
|                     |   "external_network_bridge": "",                             |
|                     |   "ex_gw_ports": 0                                           |
|                     | }                                                            |
| created_at          | 2017-04-29 01:47:50                                          |
| description         |                                                              |
| heartbeat_timestamp | 2017-05-10 00:56:15                                          |
| host                | overcloud-controller-0.localdomain                           |
| id                  | cabd8fe5-82e1-467a-b59c-a37f9aa68111                         |
| started_at          | 2017-05-09 19:22:14                                          |
| topic               | l3_agent                                                     |
+---------------------+--------------------------------------------------------------+
In these outputs, the agent_type column shows the agent type, the host column shows the host that the agent runs on, and the alive column shows the status of the agent: :-) indicates that the agent is alive and registered, while xxx indicates that the agent is not able to contact the OpenStack Networking server. The configurations field shows extra information about the agent configuration.
Troubleshooting OpenStack Networking Configuration Files
OpenStack Networking configuration files orchestrate the behavior of OpenStack Networking services. They allow administrators to configure each network service. For example, dhcp_agent.ini is used by the DHCP agent. Most OpenStack Networking configuration files use the INI file format. INI files are text files that specify options as key=value pairs. Each entry belongs to a group, such as DEFAULT. The following output shows the ovs_integration_bridge key with a value of br-int in the DEFAULT group. The entry is commented out, as this is the default value that OpenStack Networking defines.
[DEFAULT]
#
# From neutron.base.agent
#
# Name of Open vSwitch bridge to use (string value)
#ovs_integration_bridge = br-int
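A value can be checked without opening an editor, for example with grep, or with the crudini utility if it happens to be installed; the key and file below are simply examples and the output will depend on your deployment.
# Locate a key, whether it is set or still commented out with its default
[user@demo ~]$ sudo grep -n 'allow_overlapping_ips' /etc/neutron/neutron.conf
# If the crudini utility is installed, read the value that is currently set
[user@demo ~]$ sudo crudini --get /etc/neutron/neutron.conf DEFAULT allow_overlapping_ips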
OpenStack Networking configuration files are automatically configured by the undercloud when deploying both the undercloud and the overcloud. The installers parse values defined in undercloud.conf or in the Heat template files. However, these tools do not check for environment-related errors, such as missing connectivity to external networks or misconfigured interfaces. The following table lists the configuration files for OpenStack Networking services, located in /etc/neutron:
OpenStack Networking Configuration Files
dhcp_agent.ini: Used by the OpenStack Networking DHCP agent.
l3_agent.ini: Used by the OpenStack Networking L3 agent.
lbaas_agent.ini: Used by the OpenStack Networking LBaaS agent.
metadata_agent.ini: Used by the OpenStack Networking metadata agent.
metering_agent.ini: Used by the OpenStack Networking metering agent.
neutron.conf: Used by the OpenStack Networking server.
conf.d/agent: The conf.d directory contains extra directories for each OpenStack Networking agent. This directory can be used to configure OpenStack Networking services with custom user-defined configuration files.
plugins/ml2: The ml2 directory contains a configuration file for each plug-in. For example, openvswitch_agent.ini contains the configuration for the Open vSwitch plug-in.
plugins/ml2/ml2_conf.ini: Defines the configuration for the ML2 framework. In this file, administrators can set the VLAN ranges or the drivers to enable.
Most of the options in the configuration files are documented with a short comment explaining how the option is used by the service. Therefore, administrators can understand what the option does before setting the value. Consider the ovs_use_veth option in the dhcp_agent.ini, which provides instructions for using vEth interfaces: # Uses veth for an OVS interface or not. Support kernels with limited namespace # support (e.g. RHEL 6.5) so long as ovs_use_veth is set to True. (boolean # value) #ovs_use_veth = false
Important While some options use boolean values, such as true or false, other options require a value. Even if the text above each value specifies the type (string value or boolean value), administrators need to understand the option before changing it.
Note Modified configuration files in the overcloud are reset to their default state when the overcloud is updated. If custom options are set, administrators must update the configuration files after each overcloud update.
Administrators are likely to need to troubleshoot the configuration files when some action related to a service fails. For example, upon creation of a VXLAN network, if OpenStack Networking complains about a missing provider, administrators need to review the configuration of ML2. They would then make sure that the type_drivers key in the ml2 section of the /etc/neutron/plugins/ml2/ml2_conf.ini configuration file has the proper value set. [ml2] type_drivers = vxlan
They also have to make sure that the VLAN range in the section dedicated to VLAN is set correctly. For example:
[ml2_type_vlan] network_vlan_ranges=physnet1:171:172
Common Configuration Errors
The following list describes some of the most common errors related to misconfigured files, and their resolution.
• Traffic does not reach the external network: administrators should review the bridge mapping. Traffic that leaves the provider network from the router arrives in the integration bridge. A patch port between the integration bridge and the external bridge allows the traffic to pass through the bridge of the provider network and out to the physical network. Administrators must ensure that there is an interface connected to the Internet that belongs to the external bridge. The bridge mapping is defined in /etc/neutron/plugins/ml2/openvswitch_agent.ini:
bridge_mappings = datacentre:br-ex
The bridge mapping configuration must correlate with that of the VLAN range. For the example given above, the network_vlan_ranges should be set as follows: network_vlan_ranges = datacentre:1:1000
• Packets in a VLAN network are not passing through the switch ports: administrators should review the network_vlan_ranges in the /etc/neutron/plugin.ini configuration file to make sure it matches the VLAN IDs allowed to pass through the switch ports.
• The OpenStack Networking metadata server is unreachable by instances: administrators should review the enable_isolated_metadata setting in /etc/neutron/dhcp_agent.ini. If the instances are directly attached to a provider's external network, and have an external router configured as their default gateway, OpenStack Networking routers are not used. Therefore, the OpenStack Networking routers cannot be used to proxy metadata requests from instances to the Nova metadata server. This can be resolved by setting the enable_isolated_metadata key to True:
enable_isolated_metadata = True
• Support of overlapping IPs is disabled: overlapping IPs require the use of Linux network namespaces. To enable the support of overlapping IPs, administrators must set the allow_overlapping_ips key to True in the /etc/neutron/neutron.conf configuration file:
# MUST be set to False if OpenStack Networking is being used in conjunction with Nova
# security groups. (boolean value)
# allow_overlapping_ips = True
allow_overlapping_ips=True
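Changes to these files only take effect after the affected service is restarted. A minimal sketch, assuming a non-containerized Red Hat OpenStack Platform 10 controller where the OpenStack Networking services run as systemd units:
# Restart the OpenStack Networking server after editing neutron.conf or ml2_conf.ini
[user@demo ~]$ sudo systemctl restart neutron-server
# Restart an individual agent after editing its .ini file, for example the DHCP agent
[user@demo ~]$ sudo systemctl restart neutron-dhcp-agent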
Troubleshooting Project Networks
When using namespaces, project traffic is contained within the network namespaces. As a result, administrators can use both the openstack and ip commands to review the implementation of the project network on the physical system that acts as a network node. The following output shows the available project networks. In this example, there is only one network, internal1. [user@demo ~]$ openstack network list +----------------------------+-----------+----------------------------+ | ID | Name | Subnets | +----------------------------+-----------+----------------------------+ | 0062e02b-7e40-49e84de096ed | internal1 | 9f42ecca-0f8b-a01350df7c7c | +----------------------------+-----------+----------------------------+
Notice the UUID of the network, 0062e02b-7e40-49e84de096ed. This value is appended to the network namespace, as shown by the following output. [user@demo ~]$ ip netns list qdhcp-0062e02b-7e40-49e84de096ed
This mapping allows for further troubleshooting. For example, administrators can review the routing table for this project network. [user@demo ~]$ sudo ip netns exec qdhcp-0062e02b-7e40-49e84de096ed ip route default via 192.168.0.1 dev tapae83329c-91 192.168.0.0/24 dev tapae83329c-91 proto kernel scope link src 192.168.0.2
The tcpdump command can also be used within the namespace. Administrators can, for example, open another terminal window while trying to reach an external server. [user@demo ~]$ sudo ip netns exec qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed ping -c 3 172.25.250.254 PING 172.25.250.254 (172.25.250.254) 56(84) bytes of data. 64 bytes from 172.25.250.254: icmp_seq=1 ttl=63 time=0.368 ms 64 bytes from 172.25.250.254: icmp_seq=2 ttl=63 time=0.265 ms 64 bytes from 172.25.250.254: icmp_seq=3 ttl=63 time=0.267 ms --- 172.25.250.254 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2001ms rtt min/avg/max/mdev = 0.265/0.300/0.368/0.048 ms
[user@demo ~]$ sudo ip netns exec qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed tcpdump -qnntpi any icmp IP 192.168.0.2 > 172.25.250.254: ICMP echo request, id 42731, seq 1, length 64 IP 172.25.250.254 > 192.168.0.2: ICMP echo reply, id 42731, seq 1, length 64 IP 192.168.0.2 > 172.25.250.254: ICMP echo request, id 42731, seq 2, length 64 IP 172.25.250.254 > 192.168.0.2: ICMP echo reply, id 42731, seq 2, length 64 IP 192.168.0.2 > 172.25.250.254: ICMP echo request, id 42731, seq 3, length 64 IP 172.25.250.254 > 192.168.0.2: ICMP echo reply, id 42731, seq 3, length 64
OpenStack Log Files Each OpenStack service uses a log file located in /var/log/service/, where service is the name of the service, such as nova. Inside the service's directory, there is one log file per component. The following output lists the log files for OpenStack Networking services:
[root@overcloud-controller-0 neutron]# ls -al
total 512
drwxr-x---.  2 neutron neutron    144 May 30 20:34 .
drwxr-xr-x. 40 root    root      4096 May 31 16:10 ..
-rw-r--r--.  1 neutron neutron  54540 May 31 16:16 dhcp-agent.log
-rw-r--r--.  1 neutron neutron  37025 May 31 16:17 l3-agent.log
-rw-r--r--.  1 neutron neutron  24094 May 31 16:16 metadata-agent.log
-rw-r--r--.  1 neutron neutron  91136 May 31 16:17 openvswitch-agent.log
-rw-r--r--.  1 neutron neutron   1097 May 31 16:13 ovs-cleanup.log
-rw-r--r--.  1 neutron neutron 298734 May 31 17:13 server.log
The log files use the standard logging levels defined by RFC 5424. The following table lists all log levels and provides some examples that administrators are likely to encounter:
OpenStack Logging Levels
TRACE: Only logged if the service has a stack trace, for example: 2015-09-18 17:32:47.156 649 TRACE neutron __import__(mod_str) 2015-09-18 17:32:47.156 649 TRACE neutron ValueError: Empty module name
DEBUG: Logs all statements when debug is set to true in the service's configuration file.
INFO: Logs informational messages, for example an API call to the service: 2017-05-31 13:29:10.565 3537 INFO neutron.wsgi [req-d75c4ac3-c338-410e-ae43-57f12fa34151 3b98aed2205547dca61dae9d774c228f b51d4c2d48de4dc4a867a60ef1e24201 - - -] 172.25.249.200 - - [31/May/2017 13:29:10] "GET /v2.0/ports.json?network_id=3772b5f7-ee03-4ac2-9361-0119c15c5747&device_owner=network%3Adhcp HTTP/1.1" 200 1098 0.075020
AUDIT: Logs significant events affecting server state or resources.
WARNING: Logs non-fatal errors that prevent a request from executing successfully: 2017-05-30 16:33:10.628 3135 WARNING neutron.agent.securitygroups_rpc [-] Driver configuration doesn't match with enable_security_group
ERROR: Logs errors, such as miscommunication between two services: 2017-05-31 12:12:54.954 3540 ERROR oslo.messaging._drivers.impl_rabbit [-] [0536dd37-7342-4763-a9b6-ec24e605ec1e] AMQP server on 172.25.249.200:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 6 seconds. Client port: None
CRITICAL: Logs critical errors that prevent a service from properly functioning: 2017-05-30 16:43:37.259 3082 CRITICAL nova [req-4b5efa91-5f1f-4a68-8da5-8ad1f6b7e2f1 - - - - -] MessagingTimeout: Timed out waiting for a reply to message ID d778513388b748e5b99944aa42245f56
Most of the errors contain explicit statements about the nature of the problem, helping administrators troubleshoot their environment. However, there are cases where the error that is logged does not indicate the root cause of the problem. For example, if there is a critical error
being logged, this does not say anything about what caused that error. Such an error can be caused by a firewall rule, or by a congested network. OpenStack services communicate through a message broker, which provides a resilient communication mechanism between the services. This allows most of the services to receive the messages even if there are network glitches. Log files contain many entries, which makes it difficult to locate errors. Administrators can use the grep command to filter on a specific log level. The following output indicates a network timeout while a message was being exchanged between the DHCP agent and the OpenStack Networking server. [root@demo]# grep -R ERROR /var/log/neutron/dhcp-agent.log ERROR neutron.agent.dhcp.agent [req-515b6204-73c5-41f3-8ac6-70561bbad73f - - - - -] Failed reporting state! ERROR neutron.agent.dhcp.agent Traceback (most recent call last): ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 688, in ERROR neutron.agent.dhcp.agent ctx, self.agent_state, True) ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 88, in report_state ERROR neutron.agent.dhcp.agent return method(context, 'report_state', **kwargs) ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in ERROR neutron.agent.dhcp.agent retry=self.retry) ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send ERROR neutron.agent.dhcp.agent timeout=timeout, retry=retry) ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line ERROR neutron.agent.dhcp.agent retry=retry) ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line ERROR neutron.agent.dhcp.agent result = self._waiter.wait(msg_id, timeout) ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line ERROR neutron.agent.dhcp.agent message = self.waiters.get(msg_id, timeout=timeout) ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line ERROR neutron.agent.dhcp.agent 'to message ID %s' % msg_id) ERROR neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply to message ID 486663d30ddc488e98c612363779f4be ERROR neutron.agent.dhcp.agent
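Filtering on a request ID is another useful technique: every log line that belongs to the same API request carries the same req- identifier, so a failing request can be followed across log files. A minimal sketch, reusing the request identifier shown in the grep output above; your identifiers will differ.
# Follow a single request across all OpenStack Networking log files
[root@demo]# grep -R 'req-515b6204-73c5-41f3-8ac6-70561bbad73f' /var/log/neutron/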
Enabling Debug Mode
All OpenStack services use the same parameter to enable the debug level, named debug, in the DEFAULT section. To enable debug mode for a given service, locate and open the configuration file. For example, to enable debug mode for the OpenStack Networking DHCP agent, edit the /etc/neutron/dhcp_agent.ini configuration file. In the file, locate the debug key, set it to True, and restart the service. To disable debug mode, give the key a value of False.
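For example, enabling debug logging for the DHCP agent might look like the following sketch on a non-containerized controller; remember to set the key back to False afterwards, because debug logs grow quickly.
In /etc/neutron/dhcp_agent.ini:
[DEFAULT]
debug = True
Then restart the agent so the change takes effect:
[user@demo ~]$ sudo systemctl restart neutron-dhcp-agent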
Troubleshooting Tips When troubleshooting, administrators can start by drawing a diagram that details the network topology. This helps to review the network interfaces being used, and how the servers are connected to each other. They should also get familiar with most of the troubleshooting tools presented in the table titled “Troubleshooting Utilities” of this section. When troubleshooting, administrators can ask questions like: • Are the OpenStack Networking services running?
• Did the instance retrieve an IP address?
• If an instance failed to boot, was there a port binding issue?
• Are the bridges present on the network node?
• Can the instance be reached with the ping command in the project's network namespace?
Introduction to easyOVS
easyOVS, available on GitHub, is an open source tool for OpenStack that lists the rules or validates the configuration of Open vSwitch bridges, Netfilter rules, and DVR configuration. It can be used to map the IP address of an instance to its virtual port, or to the VLAN tags and namespaces in use. The tool is fully compatible with network namespaces. The following output lists the Netfilter rules associated with a particular IP address:
EasyOVS > ipt vm 192.168.0.2
## IP = 192.168.0.2, port = qvo583c7038-d
 PKTS   SOURCE       DESTINATION  PROT  OTHER
#IN:
  672   all          all          all   state RELATED,ESTABLISHED
    0   all          all          tcp   tcp dpt:22
    0   all          all          icmp
    0   192.168.0.4  all          all
    3   192.168.0.5  all          all
    8   10.0.0.2     all          all
85784   192.168.0.3  all          udp   udp spt:67 dpt:68
#OUT:
 196K   all          all          udp   udp spt:68 dpt:67
86155   all          all          all   state RELATED,ESTABLISHED
 1241   all          all          all
#SRC_FILTER:
59163   192.168.0.2  all          all   MAC FA:16:3E:9C:DC:3A
The following output shows information related to a port. In the following example, the query matches both an IP address, 10.0.0.2, and the first portion of a port UUID, c4493802. EasyOVS > query 10.0.0.2,c4493802 ## port_id = f47c62b0-dbd2-4faa-9cdd-8abc886ce08f status: ACTIVE name: allowed_address_pairs: [] admin_state_up: True network_id: ea3928dc-b1fd-4a1a-940e-82b8c55214e6 tenant_id: 3a55e7b5f5504649a2dfde7147383d02 extra_dhcp_opts: [] binding:vnic_type: normal device_owner: compute:az_compute mac_address: fa:16:3e:52:7a:f2 fixed_ips: [{u'subnet_id': u'94bf94c0-6568-4520-aee3-d12b5e472128', u'ip_address': u'10.0.0.2'}] id: f47c62b0-dbd2-4faa-9cdd-8abc886ce08f security_groups: [u'7c2b801b-4590-4a1f-9837-1cceb7f6d1d0'] device_id: c3522974-8a08-481c-87b5-fe3822f5c89c ## port_id = c4493802-4344-42bd-87a6-1b783f88609a status: ACTIVE name: allowed_address_pairs: [] admin_state_up: True network_id: ea3928dc-b1fd-4a1a-940e-82b8c55214e6
tenant_id: 3a55e7b5f5504649a2dfde7147383d02 extra_dhcp_opts: [] binding:vnic_type: normal device_owner: compute:az_compute mac_address: fa:16:3e:94:84:90 fixed_ips: [{u'subnet_id': u'94bf94c0-6568-4520-aee3-d12b5e472128', u'ip_address': u'10.0.0.4'}] id: c4493802-4344-42bd-87a6-1b783f88609a security_groups: [u'7c2b801b-4590-4a1f-9837-1cceb7f6d1d0'] device_id: 9365c842-9228-44a6-88ad-33d7389cda5f
Troubleshooting Network Issues
The following steps outline a general process for tracing and troubleshooting network issues; each step is sketched with example commands after this list.
1. Review the security group rules to ensure that, for example, ICMP traffic is allowed.
2. Connect to the network nodes to review the implementation of routers and network namespaces.
3. Use the ping command within the network namespaces to reach the various network devices, such as the interface for the router in the internal network.
4. Review the list of OpenStack Networking agents and use the ps command to make sure that their associated processes are running.
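A compact, hedged walk-through of these four checks; the router UUID and the IP address are placeholders, and the default security group is only an example.
# 1. Review the security group rules
[user@demo ~]$ openstack security group rule list default
# 2. Review the router and DHCP namespaces on the network node
[user@demo ~]$ sudo ip netns list
# 3. Reach the internal router interface from inside its namespace
[user@demo ~]$ sudo ip netns exec qrouter-UUID ping -c 3 192.168.1.1
# 4. Review the agents and confirm that their processes are running
[user@demo ~]$ neutron agent-list
[user@demo ~]$ ps ax | grep [n]eutron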
References
Further information is available in the Networking Guide for Red Hat OpenStack Platform at https://access.redhat.com/documentation/en/red-hat-openstack-platform/
easyOVS GitHub: https://github.com/yeasy/easyOVS
easyOVS Launchpad: https://launchpad.net/easyovs
RFC 5424, The Syslog Protocol: https://datatracker.ietf.org/doc/rfc5424/
Guided Exercise: Troubleshooting Network Issues In this exercise, you will troubleshoot network connectivity issues in a project network. Outcomes You should be able to: • Review the network implementation for a project. • Use Linux tools to troubleshoot network connectivity issues. Scenario Users are complaining that they cannot get to their instances using the floating IPs. A user has provided an instance to test named research-app1 that can be used to troubleshoot the issue. Before you begin Log in to workstation as student using student as the password. On workstation, run the lab network-troubleshooting setup command. This script creates the research project for the developer1 user and creates the /home/student/ developer1-research-rc credentials file. The SSH public key is available at /home/ student/developer1-keypair1. The script deploys the instance research-app1 in the research project with a floating IP in the provider-172.25.250 network. [student@workstation ~]$ lab network-troubleshooting setup
Steps 1. From workstation, source the credentials for the developer1 user and review the environment. 1.1. Source the credentials for the developer1 user located at /home/student/ developer1-research-rc. List the instances in the environment. [student@workstation ~]$ source developer1-research-rc [student@workstation ~(developer1-research)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "research-network1=192.168.1.N, 172.25.250.P", "ID": "2cfdef0a-a664-4d36-b27d-da80b4b8626d", "Image Name": "rhel7", "Name": "research-app1" } ]
1.2. Retrieve the name of the security group that the instance uses. [student@workstation ~(developer1-research)]$ openstack server show \ research-app1 -f json { ...output omitted...
"security_groups": [ { "name": "default" } ], ...output omitted... }
1.3. List the rules for the default security group. Ensure that there is one rule that allows traffic for SSH connections and one rule for ICMP traffic. [student@workstation ~(developer1-research)]$ openstack security group \ rule list default -f json [ ...output omitted... { "IP Range": "0.0.0.0/0", "Port Range": "22:22", "Remote Security Group": null, "ID": "3488a2cd-bd85-4b6e-b85c-e3cd7552fea6", "IP Protocol": "tcp" }, ...output omitted... { "IP Range": "0.0.0.0/0", "Port Range": "", "Remote Security Group": null, "ID": "f7588545-2d96-44a0-8ab7-46aa7cfbdb44", "IP Protocol": "icmp" } ]
1.4. List the networks in the environment. [student@workstation ~(developer1-research)]$ openstack network list -f json [ { "Subnets": "8647161a-ada4-468f-ad64-8b7bb6f97bda", "ID": "93e91b71-402e-45f6-a006-53a388e053f6", "Name": "provider-172.25.250" }, { "Subnets": "ebdd4578-617c-4301-a748-30b7ca479e88", "ID": "eed90913-f5f4-4e5e-8096-b59aef66c8d0", "Name": "research-network1" } ]
1.5. List the routers in the environment. [student@workstation ~(developer1-research)]$ openstack router list -f json [ { "Status": "ACTIVE", "Name": "research-router1", "Distributed": false, "Project": "ceb4194a5a3c40839a5b9ccf25c6794b", "State": "UP", "HA": false,
"ID": "8ef58601-1b60-4def-9e43-1935bb708938" } ]
The output indicates that there is one router, research-router1. 1.6. Ensure that the router research-router1 has an IP address defined as a gateway for the 172.25.250.0/24 network and an interface in the research-network1 network. [student@workstation ~(developer1-research)]$ neutron router-port-list \ research-router1 -f json [ { "mac_address": "fa:16:3e:28:e8:85", "fixed_ips": "{\"subnet_id\": \"ebdd4578-617c-4301-a748-30b7ca479e88\", \"ip_address\": \"192.168.1.S\"}", "id": "096c6e18-3630-4993-bafa-206e2f71acb6", "name": "" }, { "mac_address": "fa:16:3e:d2:71:19", "fixed_ips": "{\"subnet_id\": \"8647161a-ada4-468f-ad64-8b7bb6f97bda\", \"ip_address\": \"172.25.250.R\"}", "id": "c684682c-8acc-450d-9935-33234e2838a4", "name": "" } ]
2.
Retrieve the floating IP assigned to the research-app1 instance and run the ping command against the floating IP assigned to the instance, 172.25.250.P. The command should fail. [student@workstation ~(developer1-research)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "research-network1=192.168.1.N, 172.25.250.P", "ID": "2cfdef0a-a664-4d36-b27d-da80b4b8626d", "Image Name": "rhel7", "Name": "research-app1" } ] [student@workstation ~(developer1-research)]$ ping -c 3 172.25.250.P PING 172.25.250.P (172.25.250.P) 56(84) bytes of data. From 172.25.250.254 icmp_seq=1 Destination Host Unreachable From 172.25.250.254 icmp_seq=2 Destination Host Unreachable From 172.25.250.254 icmp_seq=3 Destination Host Unreachable --- 172.25.250.P ping statistics --3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 1999ms
3. Attempt to connect to the instance as the root user at its floating IP. The command should fail. [student@workstation ~(developer1-research)]$ ssh [email protected] ssh: connect to host 172.25.250.P port 22: No route to host
4. Reach the IP address assigned to the router in the provider network, 172.25.250.R. [student@workstation ~(developer1-research)]$ openstack router show \ research-router1 -f json { "external_gateway_info": "{\"network_id\": ...output omitted... \"ip_address\": \"172.25.250.R\"}]}", ...output omitted... } [student@workstation ~(developer1-research)]$ ping -c 3 172.25.250.R PING 172.25.250.R (172.25.250.R) 56(84) bytes of data. 64 bytes from 172.25.250.R: icmp_seq=1 ttl=64 time=0.642 ms 64 bytes from 172.25.250.R: icmp_seq=2 ttl=64 time=0.238 ms 64 bytes from 172.25.250.R: icmp_seq=3 ttl=64 time=0.184 ms
--- 172.25.250.R ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.184/0.354/0.642/0.205 ms
5. Review the namespace implementation on controller0. Use the ping command within the qrouter namespace to reach the router's private IP. 5.1. Retrieve the UUID of the router research-router1. You will compare this UUID with the one used in the qrouter namespace name. [student@workstation ~(developer1-research)]$ openstack router show \ research-router1 -f json { ...output omitted... "id": "8ef58601-1b60-4def-9e43-1935bb708938", "name": "research-router1" }
5.2. Open another terminal and use the ssh command to log in to controller0 as the heat-admin user. Review the namespace implementation. Ensure that the qrouter namespace uses the ID returned by the previous command. [student@workstation ~]$ ssh heat-admin@controller0 [heat-admin@overcloud-controller-0 ~]$ sudo ip netns list qrouter-8ef58601-1b60-4def-9e43-1935bb708938
5.3. List the network devices in the qrouter namespace. [heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qrouter-8ef58601-1b60-4def-9e43-1935bb708938 ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host 52: qr-096c6e18-36: mtu 1446 qdisc noqueue state UNKNOWN qlen 1000 link/ether fa:16:3e:28:e8:85 brd ff:ff:ff:ff:ff:ff inet 192.168.1.S/24 brd 192.168.1.255 scope global qr-096c6e18-36
valid_lft forever preferred_lft forever inet6 fe80::f816:3eff:fe28:e885/64 scope link valid_lft forever preferred_lft forever 53: qg-c684682c-8a: mtu 1496 qdisc noqueue state UNKNOWN qlen 1000 link/ether fa:16:3e:d2:71:19 brd ff:ff:ff:ff:ff:ff inet 172.25.250.R/24 brd 172.25.250.255 scope global qg-c684682c-8a valid_lft forever preferred_lft forever inet 172.25.250.P/32 brd 172.25.250.108 scope global qg-c684682c-8a valid_lft forever preferred_lft forever inet6 fe80::f816:3eff:fed2:7119/64 scope link valid_lft forever preferred_lft forever
The output indicates that there are three devices: the loopback interface lo, the qr device with the router's private IP, 192.168.1.S, and the qg device with the IPs 172.25.250.R and 172.25.250.P. 5.4. Within the qrouter namespace, run the ping command against the private IP of the router, 192.168.1.S. [heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qrouter-8ef58601-1b60-4def-9e43-1935bb708938 ping -c 3 192.168.1.S PING 192.168.1.S (192.168.1.S) 56(84) bytes of data. 64 bytes from 192.168.1.S: icmp_seq=1 ttl=64 time=0.070 ms 64 bytes from 192.168.1.S: icmp_seq=2 ttl=64 time=0.041 ms 64 bytes from 192.168.1.S: icmp_seq=3 ttl=64 time=0.030 ms --- 192.168.1.S ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 1999ms rtt min/avg/max/mdev = 0.030/0.047/0.070/0.016 ms
6. From the first terminal, retrieve the private IP of the research-app1 instance. From the second terminal, run the ping command against that private IP within the qrouter namespace. 6.1. From the first terminal, retrieve the private IP of the research-app1 instance. [student@workstation ~(developer1-research)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "research-network1=192.168.1.N, 172.25.250.P", "ID": "2cfdef0a-a664-4d36-b27d-da80b4b8626d", "Image Name": "rhel7", "Name": "research-app1" } ]
6.2. From the second terminal, run the ping command in the qrouter namespace against 192.168.1.N. The output indicates that the command fails. [heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qrouter-8ef58601-1b60-4def-9e43-1935bb708938 ping -c 3 192.168.1.N PING 192.168.1.N (192.168.1.N) 56(84) bytes of data. From 192.168.1.S icmp_seq=1 Destination Host Unreachable From 192.168.1.S icmp_seq=2 Destination Host Unreachable From 192.168.1.S icmp_seq=3 Destination Host Unreachable --- 192.168.1.N ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2000ms
7. The earlier namespace listing showed only a qrouter namespace, which suggests that the qdhcp namespace is missing. Review the namespaces on controller0 to confirm that the namespace is missing. [heat-admin@overcloud-controller-0 ~]$ sudo ip netns list qrouter-8ef58601-1b60-4def-9e43-1935bb708938
8.
The qdhcp namespace is created for the DHCP agents. List the running processes on controller0. Use the grep command to filter dnsmasq processes. The output indicates that no dnsmasq is running on the server. [heat-admin@overcloud-controller-0 ~]$ ps axl | grep dnsmasq 0 1000 579973 534047 20 0 112648 960 pipe_w S+ pts/1 0:00 grep --color=auto dnsmasq
9.
From the first terminal, source the credentials of the administrative user, architect1, located at /home/student/architect1-research-rc. List the Neutron agents to ensure that there is one DHCP agent. [student@workstation ~(developer1-research)]$ source architect1-research-rc [student@workstation ~(architect1-research)]$ neutron agent-list -f json ...output omitted... { "binary": "neutron-dhcp-agent", "admin_state_up": true, "availability_zone": "nova", "alive": ":-)", "host": "overcloud-controller-0.localdomain", "agent_type": "DHCP agent", "id": "98fe6c9b-3f66-4d14-a88a-bfd7d819ddb7" }, ...output omitted...
10. List the Neutron ports to check whether an IP is assigned to a DHCP agent in the 192.168.1.0/24 network. [student@workstation ~(architect1-research)]$ openstack port list \ -f json | grep 192.168.1 "Fixed IP Addresses": "ip_address='192.168.1.S', subnet_id='ebdd4578-617c-4301-a748-30b7ca479e88'", "Fixed IP Addresses": "ip_address='192.168.1.N', subnet_id='ebdd4578-617c-4301-a748-30b7ca479e88'",
The output lists only two ports in the subnet: the router interface and the instance port. Because no port is allocated to a DHCP agent, the research-subnet1 subnet does not run a DHCP server.
11.
Update the subnet to run a DHCP server and confirm the updates in the environment. 11.1. Review the subnet properties. Locate the enable_dhcp property and confirm that it reads False. [student@workstation ~(architect1-research)]$ openstack subnet show \ research-subnet1
+-------------------+--------------------------------------+ | Field | Value | +-------------------+--------------------------------------+ | allocation_pools | 192.168.1.2-192.168.1.254 | | cidr | 192.168.1.0/24 | ...output omitted... | enable_dhcp | False | ...output omitted...
11.2. Run the openstack subnet set command to update the subnet. The command does not produce any output. [student@workstation ~(architect1-research)]$ openstack subnet \ set --dhcp research-subnet1
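For reference only, the change can be reverted later with the inverse flag. This is a sketch and is not part of this exercise:
[student@workstation ~(architect1-research)]$ openstack subnet set --no-dhcp research-subnet1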
11.3. Review the updated subnet properties. Locate the enable_dhcp property and confirm that it reads True. [student@workstation ~(architect1-research)]$ openstack subnet show \ research-subnet1 +-------------------+--------------------------------------+ | Field | Value | +-------------------+--------------------------------------+ | allocation_pools | 192.168.1.2-192.168.1.254 | | cidr | 192.168.1.0/24 | ...output omitted... | enable_dhcp | True | ...output omitted...
11.4. From the terminal connected to controller0, rerun the ps command. Ensure that a dnsmasq is now running. [heat-admin@overcloud-controller-0 ~]$ ps axl | grep dnsmasq 5 99 649028 1 20 0 15548 892 poll_s S ? 0:00 dnsmasq --no-hosts \ --no-resolv \ --strict-order \ --except-interface=lo \ ...output omitted... --dhcp-match=set:ipxe,175 \ --bind-interfaces \ --interface=tapdc429585-22 \ --dhcp-range=set:tag0,192.168.1.0,static,86400s \ --dhcp-option-force=option:mtu,1446 \ --dhcp-lease-max=256 \ --conf-file= \ --domain=openstacklocal 0 1000 650642 534047 20 0 112648 960 pipe_w S+ pts/1 0:00 grep --color=auto dnsmasq
11.5. From the first terminal, rerun the openstack port list command. Ensure that there is a third IP in the research-subnet1 network. [student@workstation ~(architect1-research)]$ openstack port list \ -f json | grep 192.168.1
"Fixed IP Addresses": "ip_address='192.168.1.S', subnet_id='ebdd4578-617c-4301-a748-30b7ca479e88'",
"Fixed IP Addresses": "ip_address='192.168.1.N', subnet_id='ebdd4578-617c-4301-a748-30b7ca479e88'",
"Fixed IP Addresses": "ip_address='192.168.1.2', subnet_id='ebdd4578-617c-4301-a748-30b7ca479e88'",
11.6. From the terminal connected to controller0, list the network namespaces. Ensure that there is a new namespace called qdhcp. [heat-admin@overcloud-controller-0 ~]$ ip netns list qdhcp-eed90913-f5f4-4e5e-8096-b59aef66c8d0 qrouter-8ef58601-1b60-4def-9e43-1935bb708938
11.7. List the interfaces in the qdhcp namespace. Confirm that there is an interface with an IP address of 192.168.1.2. [heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qdhcp-eed90913-f5f4-4e5e-8096-b59aef66c8d0 ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 61: tap7e9c0f8b-a7: mtu 1446 qdisc noqueue state UNKNOWN qlen 1000 link/ether fa:16:3e:7c:45:e1 brd ff:ff:ff:ff:ff:ff inet 192.168.1.2/24 brd 192.168.1.255 scope global tap7e9c0f8b-a7 valid_lft forever preferred_lft forever inet6 fe80::f816:3eff:fe7c:45e1/64 scope link valid_lft forever preferred_lft forever
12. From the first terminal, stop then start the research-app1 instance to reinitialize IP assignment and cloud-init configuration. 12.1. Stop the instance. [student@workstation ~(architect1-research)]$ openstack server stop \ research-app1
12.2. Confirm that the instance is down. [student@workstation ~(architect1-research)]$ openstack server show \ research-app1 -c status -f value SHUTOFF
12.3. Start the instance. [student@workstation ~(architect1-research)]$ openstack server start \ research-app1
13. Confirm that the instance is reachable.
13.1. Verify the floating IP is assigned to the research-app1 instance. [student@workstation ~(architect1-research)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "research-network1=192.168.1.N, 172.25.250.P", "ID": "2cfdef0a-a664-4d36-b27d-da80b4b8626d", "Image Name": "rhel7", "Name": "research-app1" } ]
13.2.Run the ping command against the floating IP 172.25.250.P until it responds. [student@workstation ~(architect1-research)]$ ping 172.25.250.P PING 172.25.250.P (172.25.250.P) 56(84) bytes of data. ...output omitted... From 172.25.250.P icmp_seq=22 Destination Host Unreachable From 172.25.250.P icmp_seq=23 Destination Host Unreachable From 172.25.250.P icmp_seq=24 Destination Host Unreachable From 172.25.250.P icmp_seq=25 Destination Host Unreachable 64 bytes from 172.25.250.P: icmp_seq=26 ttl=63 time=1.02 ms 64 bytes from 172.25.250.P: icmp_seq=27 ttl=63 time=0.819 ms 64 bytes from 172.25.250.P: icmp_seq=28 ttl=63 time=0.697 ms ...output omitted... ^C --- 172.25.250.P ping statistics --35 packets transmitted, 10 received, +16 errors, 71% packet loss, time 34019ms rtt min/avg/max/mdev = 4.704/313.475/2025.262/646.005 ms, pipe 4
13.3. Use ssh to connect to the instance. When finished, exit from the instance. [student@workstation ~(architect1-research)]$ ssh -i developer1-keypair1.pem \ cloud-user@172.25.250.P [cloud-user@research-app1 ~]$ exit
Cleanup From workstation, run the lab network-troubleshooting cleanup script to clean up the resources created in this exercise. [student@workstation ~]$ lab network-troubleshooting cleanup
Lab: Managing and Troubleshooting Virtual Network Infrastructure
In this lab, you will troubleshoot the network connectivity of OpenStack instances.
Outcomes You should be able to:
• Use Linux tools to review the network configuration of instances.
• Review the network namespaces for a project.
• Restore the network connectivity of OpenStack instances.
Scenario Cloud users reported issues reaching their instances via their floating IPs. Both ping and ssh connections time out. You have been tasked with troubleshooting and fixing these issues.
Before you begin Log in to workstation as student using student as the password. On workstation, run the lab network-review setup command. This script creates the production project for the operator1 user and creates the /home/student/operator1-production-rc credentials file. The SSH private key is available at /home/student/operator1-keypair1.pem. The script deploys the instance production-app1 in the production project with a floating IP in the provider-172.25.250 network. [student@workstation ~]$ lab network-review setup
Steps 1. As the operator1 user, list the instances present in the environment. The credentials file for the user is available at /home/student/operator1-production-rc. Ensure that the instance production-app1 is running and has an IP in the 192.168.1.0/24 network.
2.
Attempt to reach the instance via its floating IP by using the ping and ssh commands. Confirm that the commands time out. The private key for the SSH connection is available at /home/student/operator1-keypair1.pem.
3.
Review the security rules for the security group assigned to the instance. Ensure that there is a rule that authorizes packets sent by the ping command to pass.
4.
As the administrative user, architect1, ensure that the external network provider-172.25.250 is present. The credentials file for the user is available at /home/student/architect1-production-rc. Review the network type and the physical network defined for the network. Ensure that the network is a flat network that uses the datacentre provider network.
5.
As the operator1 user, list the routers in the environment. Ensure that production-router1 is present, has a private network port, and is the gateway for the external network.
6.
From the compute node, review the network implementation by listing the Linux bridges and ensure that the ports are properly defined. Ensure that there is one bridge with two ports in it. The bridge and the port names should be named after the first 10 characters of the port UUID in the private network for the instance production-app1.
7.
From workstation, use the ssh command to log in to controller0 as the heat-admin user. List the network namespaces to ensure that there is a namespace for the router and for the internal network production-network1. Review the UUID of the router and the UUID of the internal network to make sure they match the UUIDs of the namespaces. List the interfaces in the network namespace for the internal network. Within the private network namespace, use the ping command to reach the private IP address of the router. Run the ping command within the qrouter namespace against the IP assigned as a gateway to the router. From the tenant network namespace, use the ping command to reach the private IP of the instance.
8.
From controller0, review the bridge mappings configuration. Ensure that the provider network named datacentre is mapped to the br-ex bridge. Review the configuration of the Open vSwitch bridge br-int. Ensure that there is a patch port for the connection between the integration bridge and the external bridge. Retrieve the name of the peer port for the patch from the integration bridge to the external bridge. Make any necessary changes.
9.
From workstation, use the ping command to reach the IP defined as a gateway for the router and the floating IP associated with the instance. Use the ssh command to log in to the instance production-app1 as the cloud-user user. The private key is available at /home/student/operator1-keypair1.pem.
Evaluation From workstation, run the lab network-review grade command to confirm the success of this exercise. Correct any reported failures and rerun the command until successful. [student@workstation ~]$ lab network-review grade
Cleanup From workstation, run the lab network-review cleanup command to clean up this exercise. [student@workstation ~]$ lab network-review cleanup
Solution
In this lab, you will troubleshoot the network connectivity of OpenStack instances.
Outcomes You should be able to:
• Use Linux tools to review the network configuration of instances.
• Review the network namespaces for a project.
• Restore the network connectivity of OpenStack instances.
Scenario Cloud users reported issues reaching their instances via their floating IPs. Both ping and ssh connections time out. You have been tasked with troubleshooting and fixing these issues.
Before you begin Log in to workstation as student using student as the password. On workstation, run the lab network-review setup command. This script creates the production project for the operator1 user and creates the /home/student/operator1-production-rc credentials file. The SSH private key is available at /home/student/operator1-keypair1.pem. The script deploys the instance production-app1 in the production project with a floating IP in the provider-172.25.250 network. [student@workstation ~]$ lab network-review setup
Steps 1. As the operator1 user, list the instances present in the environment. The credentials file for the user is available at /home/student/operator1-production-rc. Ensure that the instance production-app1 is running and has an IP in the 192.168.1.0/24 network. 1.1. From workstation, source the operator1-production-rc file and list the running instances. [student@workstation ~]$ source ~/operator1-production-rc [student@workstation ~(operator1-production)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "production-network1=192.168.1.N, 172.25.250.P", "ID": "ab497ff3-0335-4b17-bd3d-5aa2a4497bf0", "Image Name": "rhel7", "Name": "production-app1" } ]
2.
Attempt to reach the instance via its floating IP by using the ping and ssh commands. Confirm that the commands time out. The private key for the SSH connection is available at /home/student/operator1-keypair1.pem. 2.1. Run the ping command against the floating IP 172.25.250.P. The command should fail.
[student@workstation ~(operator1-production)]$ ping -c 3 172.25.250.P PING 172.25.250.P (172.25.250.P) 56(84) bytes of data. From 172.25.250.254 icmp_seq=1 Destination Host Unreachable From 172.25.250.254 icmp_seq=2 Destination Host Unreachable From 172.25.250.254 icmp_seq=3 Destination Host Unreachable --- 172.25.250.P ping statistics --3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 1999ms
2.2. Attempt to connect to the instance using the ssh command. The command should fail. [student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \ cloud-user@172.25.250.P ssh: connect to host 172.25.250.P port 22: No route to host
3.
Review the security rules for the security group assigned to the instance. Ensure that there is a rule that authorizes packets sent by the ping command to pass. 3.1. Retrieve the name of the security group that the instance production-app1 uses. [student@workstation ~(operator1-production)]$ openstack server show \ production-app1 +--------------------------------------+-------------------------+ | Field | Value | +--------------------------------------+-------------------------+ ...output omitted... | security_groups | [{u'name': u'default'} | ...output omitted... +--------------------------------------+-------------------------+
3.2. List the rules in the default security group. Ensure that there is a rule for ICMP traffic. [student@workstation ~(operator1-production)]$ openstack security group \ rule list default -f json [ { "IP Range": "0.0.0.0/0", "Port Range": "", "Remote Security Group": null, "ID": "68baac6e-7981-4326-a054-e8014565be6e", "IP Protocol": "icmp" }, ...output omitted...
The output indicates that there is a rule that allows ICMP traffic, so the security group is not blocking the ping test and the environment requires further troubleshooting.
4.
As the administrative user, architect1, ensure that the external network provider-172.25.250 is present. The credentials file for the user is available at /home/student/architect1-production-rc. Review the network type and the physical network defined for the network. Ensure that the network is a flat network that uses the datacentre provider network.
4.1. Source the architect1 credentials. List the networks. Confirm that the provider-172.25.250 network is present. [student@workstation ~(operator1-production)]$ source ~/architect1-production-rc [student@workstation ~(architect1-production)]$ openstack network list -f json [ { "Subnets": "2b5110fd-213f-45e6-8761-2e4a2bcb1457", "ID": "905b4d65-c20f-4cac-88af-2b8e0d2cf47e", "Name": "provider-172.25.250" }, { "Subnets": "a4c40acb-f532-4b99-b8e5-d1df14aa50cf", "ID": "712a28a3-0278-4b4e-94f6-388405c42595", "Name": "production-network1" } ]
4.2. Review the provider-172.25.250 network details, including the network type and the physical network defined. [student@workstation ~(architect1-production)]$ openstack network \ show provider-172.25.250 +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | nova | | created_at | 2017-06-02T16:37:48Z | | description | | | id | 905b4d65-c20f-4cac-88af-2b8e0d2cf47e | | ipv4_address_scope | None | | ipv6_address_scope | None | | is_default | False | | mtu | 1496 | | name | provider-172.25.250 | | port_security_enabled | True | | project_id | 91f3ed0e78ad476495a6ad94fbd7d2c1 | | project_id | 91f3ed0e78ad476495a6ad94fbd7d2c1 | | provider:network_type | flat | | provider:physical_network | datacentre | | provider:segmentation_id | None | | qos_policy_id | None | | revision_number | 6 | | router:external | External | | shared | True | | status | ACTIVE | | subnets | 2b5110fd-213f-45e6-8761-2e4a2bcb1457 | | tags | [] | | updated_at | 2017-06-02T16:37:51Z | +---------------------------+--------------------------------------+
5.
As the operator1 user, list the routers in the environment. Ensure that production-router1 is present, has a private network port, and is the gateway for the external network. 5.1. Source the operator1-production-rc credentials file and list the routers in the environment.
[student@workstation ~(architect1-production)]$ source ~/operator1-production-rc [student@workstation ~(operator1-production)]$ openstack router list -f json [ { "Status": "ACTIVE", "Name": "production-router1", "Distributed": "", "Project": "91f3ed0e78ad476495a6ad94fbd7d2c1", "State": "UP", "HA": "", "ID": "e64e7ed3-8c63-49ab-8700-0206d1b0f954" } ]
5.2. Display the router details. Confirm that the router is the gateway for the external network provider-172.25.250. [student@workstation ~(operator1-production)]$ openstack router show \ production-router1 +-------------------------+-----------------------------------------+ | Field | Value | +-------------------------+-----------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | nova | | created_at | 2017-06-02T17:25:00Z | | description | | | external_gateway_info | {"network_id": "905b(...)f47e", | | | "enable_snat": true, | | | "external_fixed_ips": | | | [{"subnet_id": "2b51(...)1457", | | | "ip_address": | | | "172.25.250.S"}]} | | flavor_id | None | | id | e64e7ed3-8c63-49ab-8700-0206d1b0f954 | | name | production-router1 | | project_id | 91f3ed0e78ad476495a6ad94fbd7d2c1 | | project_id | 91f3ed0e78ad476495a6ad94fbd7d2c1 | | revision_number | 7 | | routes | | | status | ACTIVE | | updated_at | 2017-06-02T17:25:04Z | +-------------------------+-----------------------------------------+
5.3. Use ping to test the IP defined as the router gateway interface. Observe the command timing out. [student@workstation ~(operator1-production)]$ ping -W 5 -c 3 172.25.250.S PING 172.25.250.S (172.25.250.S) 56(84) bytes of data. --- 172.25.250.S ping statistics --3 packets transmitted, 0 received, 100% packet loss, time 1999ms
The ping test was unable to reach the external gateway interface of the router from an external host, but the root cause is still unknown, so continue troubleshooting.
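One optional way to narrow down where packets are lost is to capture ICMP traffic inside the router namespace on controller0 while repeating the ping from workstation. This is a sketch only; the namespace name is qrouter- followed by the router ID shown above, and it assumes tcpdump is available on the controller node.
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \
qrouter-e64e7ed3-8c63-49ab-8700-0206d1b0f954 tcpdump -n -i any icmp
If no ICMP requests appear in the capture, the traffic is being dropped before it reaches the router namespace, which points at the external bridge wiring examined later in this lab.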
6.
From the compute node, review the network implementation by listing the Linux bridges and ensure that the ports are properly defined. Ensure that there is one bridge with two ports in it. The bridge and the port names should be named after the first 10 characters of the port UUID in the private network for the instance production-app1. From workstation, use ssh to connect to compute0 as the heat-admin user. Review the configuration of the Open vSwitch integration bridge. Ensure that the vEth pair, which has a port associated to the bridge, has another port in the integration bridge. Exit from the virtual machine. 6.1. From the first terminal, list the network ports. Locate the port whose fixed IP address matches the private IP of the instance. In this example, the port UUID is 04b3f285-7183-4673-836b-317d80c27904; its first 10 characters appear in the bridge and port names checked in the next step. [student@workstation ~(operator1-production)]$ openstack port list -f json [ { "Fixed IP Addresses": "ip_address='192.168.1.N', subnet_id='a4c40acb-f532-4b99-b8e5-d1df14aa50cf'", "ID": "04b3f285-7183-4673-836b-317d80c27904", "MAC Address": "fa:16:3e:c8:cb:3d", "Name": "" }, ...output omitted...
6.2. Use the ssh command to log in to compute0 as the heat-admin user. Use the brctl command to list the Linux bridges. Ensure that there is a qbr bridge with two ports in it. The bridge and the ports should be named after the first 10 characters of the port of the instance in the private network. [student@workstation ~] $ ssh heat-admin@compute0 [heat-admin@overcloud-compute-0 ~]$ brctl show bridge name bridge id STP enabled interfaces qbr04b3f285-71 8000.9edbfc39d5a5 no qvb04b3f285-71 tap04b3f285-71
6.3. As the root user from the compute0 virtual machine, list the network ports in the integration bridge, br-int. Ensure that the port of the vEth pair qvo is present in the integration bridge. [heat-admin@overcloud-compute-0 ~]$ sudo ovs-vsctl list-ifaces br-int int-br-ex patch-tun qvo04b3f285-71
The qvo port exists as expected, so continue troubleshooting. 6.4. Exit from compute0. [heat-admin@overcloud-compute-0 ~]$ exit [student@workstation ~]$
7.
From workstation, use the ssh command to log in to controller0 as the heat-admin user. List the network namespaces to ensure that there is a namespace for the router and for the internal network production-network1. Review the UUID of the router and the UUID of the internal network to make sure they match the UUIDs of the namespaces. List the interfaces in the network namespace for the internal network. Within the private network namespace, use the ping command to reach the private IP address of the router. Run the ping command within the qrouter namespace against the IP assigned as a gateway to the router. From the tenant network namespace, use the ping command to reach the private IP of the instance. 7.1. Use the ssh command to log in to controller0 as the heat-admin user. List the network namespaces. [student@workstation ~] $ ssh heat-admin@controller0 [heat-admin@overcloud-controller-0 ~]$ ip netns list qrouter-e64e7ed3-8c63-49ab-8700-0206d1b0f954 qdhcp-712a28a3-0278-4b4e-94f6-388405c42595
7.2. From the previous terminal, retrieve the UUID of the router production-router1. Ensure that the output matches the qrouter namespace. [student@workstation ~(operator1-production)]$ openstack router show \ production-router1 +-------------------------+-------------------------------------------+ | Field | Value | +-------------------------+-------------------------------------------+ ...output omitted... | | flavor_id | None | | id | e64e7ed3-8c63-49ab-8700-0206d1b0f954 | | name | production-router1 | ...output omitted... | updated_at | 2017-06-02T17:25:04Z | +-------------------------+-------------------------------------------+
7.3. Retrieve the UUID of the private network, production-network1. Ensure that the output matches the qdhcp namespace. [student@workstation ~(operator1-production)]$ openstack network \ show production-network1 +-------------------------+--------------------------------------+ | Field | Value | +-------------------------+--------------------------------------+ ...output omitted... | description | | | id | 712a28a3-0278-4b4e-94f6-388405c42595 | ...output omitted... +-------------------------+--------------------------------------+
7.4. Use the neutron command to retrieve the interfaces of the router productionrouter1. [student@workstation ~(operator1-production)]$ neutron router-port-list \ production-router1 +--------------------------------------+------+-------------------+
| id | name | mac_address | +--------------------------------------+------+-------------------+ | 30fc535c-85a9-4be4-b219-e810deec88d1 | | fa:16:3e:d4:68:d3 | | bda4e07f-64f4-481d-a0bd-01791c39df92 | | fa:16:3e:90:4f:45 | +--------------------------------------+------+-------------------+ -------------------------------------------------------------------+ fixed_ips | -------------------------------------------------------------------+ {"subnet_id": "a4c40acb-f532-4b99-b8e5-d1df14aa50cf", "ip_address": "192.168.1.R"} | {"subnet_id": "2b5110fd-213f-45e6-8761-2e4a2bcb1457", "ip_address": "172.25.250.S"} | -------------------------------------------------------------------+
7.5. From the terminal connected to the controller, use the ping command within the qdhcp namespace to reach the private IP of the router. [heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qdhcp-712a28a3-0278-4b4e-94f6-388405c42595 ping -c 3 192.168.1.R PING 192.168.1.R (192.168.1.R) 56(84) bytes of data. 64 bytes from 192.168.1.R: icmp_seq=1 ttl=64 time=0.107 ms 64 bytes from 192.168.1.R: icmp_seq=2 ttl=64 time=0.041 ms 64 bytes from 192.168.1.R: icmp_seq=3 ttl=64 time=0.639 ms --- 192.168.1.1 ping statistics --3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.041/0.262/0.639/0.268 ms
7.6. Within the router namespace, qrouter, run the ping command against the IP defined as a gateway in the 172.25.250.0/24 network. [heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qrouter-e64e7ed3-8c63-49ab-8700-0206d1b0f954 ping -c 3 172.25.250.S PING 172.25.250.S (172.25.250.S) 56(84) bytes of data. 64 bytes from 172.25.250.S: icmp_seq=1 ttl=64 time=0.091 ms 64 bytes from 172.25.250.S: icmp_seq=2 ttl=64 time=0.037 ms 64 bytes from 172.25.250.S: icmp_seq=3 ttl=64 time=0.597 ms --- 172.25.250.25 ping statistics --3 packets transmitted, 3 received, 0% packet loss, time 1999ms rtt min/avg/max/mdev = 0.037/0.241/0.597/0.252 ms
7.7. Retrieve the IP of the instance in the internal network. [student@workstation ~(operator1-production)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "production-network1=192.168.1.N, 172.25.250.P", "ID": "ab497ff3-0335-4b17-bd3d-5aa2a4497bf0", "Image Name": "rhel7", "Name": "production-app1" } ]
7.8. Use the ping command in the same namespace to reach the private IP of the instance production-app1.
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \ qdhcp-712a28a3-0278-4b4e-94f6-388405c42595 ping -c 3 192.168.1.N PING 192.168.1.N (192.168.1.N) 56(84) bytes of data. 64 bytes from 192.168.1.N: icmp_seq=1 ttl=64 time=0.107 ms 64 bytes from 192.168.1.N: icmp_seq=2 ttl=64 time=0.041 ms 64 bytes from 192.168.1.N: icmp_seq=3 ttl=64 time=0.639 ms --- 192.168.1.1 ping statistics --3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.041/0.262/0.639/0.268 ms
8.
From controller0, review the bridge mappings configuration. Ensure that the provider network named datacentre is mapped to the br-ex bridge. Review the configuration of the Open vSwitch bridge br-int. Ensure that there is a patch port for the connection between the integration bridge and the external bridge. Retrieve the name of the peer port for the patch from the integration bridge to the external bridge. Make any necessary changes. 8.1. From controller0, as the root user, review the bridge mappings configuration. Bridge mappings for Open vSwitch are defined in the /etc/neutron/plugins/ml2/ openvswitch_agent.ini configuration file. Ensure that the provider network name, datacentre, is mapped to the br-ex bridge. [heat-admin@overcloud-controller-0 ~]$ cd /etc/neutron/plugins/ml2/ [heat-admin@overcloud-controller-0 ml2]$ sudo grep \ bridge_mappings openvswitch_agent.ini #bridge_mappings = bridge_mappings =datacentre:br-ex
8.2. Review the ports in the integration bridge br-int. Ensure that there is a patch port in the integration bridge. The output lists phy-br-ex as the peer for the patch [heat-admin@overcloud-controller-0 ml2]$ sudo ovs-vsctl show ...output omitted... Bridge br-int Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port "tapfabe9e7e-0b" tag: 1 Interface "tapfabe9e7e-0b" type: internal Port "qg-bda4e07f-64" tag: 3 Interface "qg-bda4e07f-64" type: internal Port int-br-ex Interface int-br-ex type: patch options: {peer=phy-br-ex} Port patch-tun Interface patch-tun type: patch options: {peer=patch-int} Port "qr-30fc535c-85" tag: 1 Interface "qr-30fc535c-85"
type: internal Port br-int Interface br-int type: internal
8.3. List the ports in the external bridge, br-ex. The output indicates that the port phy-br-ex is absent from the bridge. [heat-admin@overcloud-controller-0 ml2]$ sudo ovs-vsctl show ...output omitted... Bridge br-ex Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port "eth2" Interface "eth2" Port br-ex Interface br-ex type: internal
8.4. Patch ports are managed by the neutron-openvswitch-agent, which uses the bridge mappings for Open vSwitch bridges. Restart the neutron-openvswitch-agent. [heat-admin@overcloud-controller-0 ml2]$ sudo systemctl restart \ neutron-openvswitch-agent.service
8.5. Wait a minute then list the ports in the external bridge, br-ex. Ensure that the patch port phy-br-ex is present in the external bridge. [heat-admin@overcloud-controller-0 ml2]$ sudo ovs-vsctl show ...output omitted... Bridge br-ex Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure Port "eth2" Interface "eth2" Port br-ex Interface br-ex type: internal Port phy-br-ex Interface phy-br-ex type: patch options: {peer=int-br-ex}
9.
From workstation, use the ping command to reach the IP defined as a gateway for the router and the floating IP associated with the instance. Use the ssh command to log in to the instance production-app1 as the cloud-user user. The private key is available at /home/student/operator1-keypair1.pem. 9.1. Use the ping command to reach the IP of the router defined as a gateway. [student@workstation ~(operator1-production)]$ ping -W 5 -c 3 172.25.250.S PING 172.25.250.S (172.25.250.S) 56(84) bytes of data. 64 bytes from 172.25.250.S: icmp_seq=1 ttl=64 time=0.658 ms
64 bytes from 172.25.250.S: icmp_seq=2 ttl=64 time=0.273 ms 64 bytes from 172.25.250.S: icmp_seq=3 ttl=64 time=0.297 ms --- 172.25.250.S ping statistics --3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.273/0.409/0.658/0.176 ms
9.2. Retrieve the floating IP allocated to the production-app1 instance. [student@workstation ~(operator1-production)]$ openstack server list -f json [ { "Status": "ACTIVE", "Networks": "production-network1=192.168.1.N, 172.25.250.P", "ID": "ab497ff3-0335-4b17-bd3d-5aa2a4497bf0", "Image Name": "rhel7", "Name": "production-app1" } ]
9.3. Use the ping command to reach the floating IP allocated to the instance. [student@workstation ~(operator1-production)]$ ping -W 5 -c 3 172.25.250.P PING 172.25.250.P (172.25.250.P) 56(84) bytes of data. 64 bytes from 172.25.250.P: icmp_seq=1 ttl=63 time=0.658 ms 64 bytes from 172.25.250.P: icmp_seq=2 ttl=63 time=0.616 ms 64 bytes from 172.25.250.P: icmp_seq=3 ttl=63 time=0.690 ms
--- 172.25.250.P ping statistics --3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.616/0.654/0.690/0.042 ms
9.4. Use the ssh command to log in to the instance as the cloud-user user. The private key is available at /home/student/operator1-keypair1.pem. Exit from the instance. [student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \ cloud-user@172.25.250.P [cloud-user@production-app1 ~]$ exit [student@workstation ~(operator1-production)]$
Evaluation From workstation, run the lab network-review grade command to confirm the success of this exercise. Correct any reported failures and rerun the command until successful. [student@workstation ~]$ lab network-review grade
Cleanup From workstation, run the lab network-review cleanup command to clean up this exercise. [student@workstation ~]$ lab network-review cleanup
Summary
In this chapter, you learned:
• Software-defined networking (SDN) is a networking model that allows network administrators to manage network services through the abstraction of several networking layers. SDN decouples the software that handles the traffic, called the control plane, from the underlying mechanisms that route the traffic, called the data plane.
• OpenStack Networking (Neutron) is the SDN networking project that provides Networking-as-a-Service (NaaS) in virtual environments. It implements traditional networking features such as subnetting, bridging, and VLANs, as well as more recent technologies, such as VXLAN and GRE tunnels.
• The Modular Layer 2 (ML2) plug-in is a framework that enables the use of various layer-2 technologies. Administrators can interact with Open vSwitch or any vendor technology, such as Cisco equipment, thanks to the various plug-ins available for OpenStack Networking.
• When troubleshooting, administrators can use a variety of tools, such as ping, ip, traceroute, and tcpdump.
TRAINING CHAPTER 6
MANAGING RESILIENT COMPUTE RESOURCES Overview Goal
Add compute nodes, manage shared storage, and perform instance live migration.
Objectives
• View introspection data, orchestration templates, and configuration manifests used to build the Overcloud. • Add a compute node to the Overcloud using the Undercloud • Perform instance live migration using block storage. • Configure shared storage for Nova compute services and perform instance live migration with shared storage.
Sections
• Configuring an Overcloud Deployment (and Guided Exercise) • Scaling Compute Nodes (and Guided Exercise) • Migrating Instances using Block Storage (and Guided Exercise) • Migrating Instances using Shared Storage (and Guided Exercise)
Lab
• Managing Resilient Compute Resources
Configuring an Overcloud Deployment
Objectives After completing this section, students should be able to:
• Prepare to deploy an overcloud
• Describe the undercloud introspection process
• Describe the overcloud orchestration process
Red Hat OpenStack Platform director is the undercloud, with components for provisioning and managing the infrastructure nodes that will become the overcloud. An undercloud is responsible for planning overcloud roles, creating the provisioning network configuration and services, locating and inventorying nodes prior to deployment, and running the workflow service that facilitates the deployment process. The Red Hat OpenStack Platform director installation comes complete with sample deployment templates and both command-line and web-based user interface tools for configuring and monitoring overcloud deployments.
Note Underclouds and tools for provisioning overclouds are relatively new technologies and are still evolving. The choices for overcloud design and configuration are as limitless as the use cases for which they are built. The following demonstration and lecture is an introduction to undercloud tasks and overcloud preparation, and is not intended to portray recommended practice for any specific use case. The cloud architecture presented here is designed to satisfy the technical requirements of this classroom.
Introspecting Nodes To provision overcloud nodes, the undercloud is configured with a provisioning network and IPMI access information about the nodes it will manage. The provisioning network is a large-capacity, dedicated, and isolated network, separate from the normal public network. During deployment, orchestration will reconfigure nodes' network interfaces with Open vSwitch bridges, which would cause the deployment process to disconnect if the provisioning and deployed networks shared the same interface. After deployment, Red Hat OpenStack Platform director will continue to manage and update the overcloud across this isolated, secure provisioning network, completely segregated from both external and internal OpenStack traffic. Verify the provisioning network View the undercloud.conf file, created to build the undercloud, to verify the provisioning network. In the output below, the DHCP address range, from dhcp_start to dhcp_end, is the scope for the OpenStack Networking dnsmasq service managing the provisioning subnet. Nodes deployed to the provisioning network are assigned an IP address from this scope for their provisioning NIC. The inspection_iprange is the scope for the bare metal dnsmasq service, for assigning addresses temporarily to registered nodes during the PXE boot at the start of the introspection process. [user@undercloud]$ head -12 undercloud.conf
[DEFAULT]
local_ip = 172.25.249.200/24
undercloud_public_vip = 172.25.249.201
undercloud_admin_vip = 172.25.249.202
local_interface = eth0
masquerade_network = 172.25.249.0/24
dhcp_start = 172.25.249.51
dhcp_end = 172.25.249.59
network_cidr = 172.25.249.0/24
network_gateway = 172.25.249.200
inspection_iprange = 172.25.249.150,172.25.249.180
generate_service_certificate = true
View the undercloud's configured network interfaces. The br-ctlplane bridge is the 172.25.249.0 provisioning network; the eth1 interface is the 172.25.250.0 public network. [user@undercloud]$ ip a | grep -E 'br-ctlplane|eth1' 3: eth1: mtu 1500 qdisc pfifo_fast inet 172.25.250.200/24 brd 172.25.250.255 scope global eth1 7: br-ctlplane: mtu 1500 qdisc noqueue inet 172.25.249.200/24 brd 172.25.249.255 scope global br-ctlplane inet 172.25.249.202/32 scope global br-ctlplane inet 172.25.249.201/32 scope global br-ctlplane
The provisioning subnet is configured for DHCP. The Networking service has configured a dnsmasq instance to manage the scope. Verify the subnet with the location of the DNS nameserver, to be handed out to DHCP clients as a scope option with a default gateway. [user@undercloud]$ openstack subnet list -c ID -c Name +--------------------------------------+-----------------+ | ID | Subnet | +--------------------------------------+-----------------+ | 5e627758-6ec6-48f0-9ea6-1d4803f0196d | 172.25.249.0/24 | +--------------------------------------+-----------------+ [user@undercloud]$ openstack subnet show 5e627758-6ec6-48f0-9ea6-1d4803f0196d +-------------------+------------------------------------------------------------+ | Field | Value | +-------------------+------------------------------------------------------------+ | allocation_pools | 172.25.249.51-172.25.249.59 | | cidr | 172.25.249.0/24 | | dns_nameservers | 172.25.250.200 | | enable_dhcp | True | | host_routes | destination='169.254.169.254/32', gateway='172.25.249.200' | ...output omitted...
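If the nameserver ever needed to be set or corrected on this subnet, a command along the following lines could be used. This is a sketch only; the classroom subnet is already configured, and the exact option names can vary slightly between client versions.
[user@undercloud]$ openstack subnet set --dns-nameserver 172.25.250.200 \
5e627758-6ec6-48f0-9ea6-1d4803f0196d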
Confirm resources for the nodes The nodes to be deployed are typically bare metal physical systems, such as blade servers or rack systems, with IPMI management interfaces for remote power off access and administration. Access each node to verify that the systems are configured with multiple NICs, and the correct configuration of CPU, RAM, and hard disk space for the assigned deployment role. In this course, the nodes are virtual machines with a small-scale configuration. Power management in a cloud environment normally uses the IPMI management NIC built into a server chassis. However, virtual machines do not normally have a lights-out-management platform interface. Instead, they are controlled by the appropriate virtualization management software, which connects to the running virtual machine's hypervisor to request power management actions and events. In this classroom, a Baseboard Management Controller (BMC)
emulator is running on the power virtual machine, configured with one unique IP address per virtual machine node. Upon receiving a valid IPMI request at the correct listener, the BMC emulator sends the request to the hypervisor, which performs the request on the corresponding virtual machine. Define and verify the MAC address, IPMI address, power management user name and password, for each node to be registered, in the instack configuration file instackenv-initial.json. This node registration file can be either JSON or YAML format. The following example shows the instack configuration file in JSON format.
[user@undercloud]$ cat instackenv-initial.json
{
  "nodes": [
    {
      "name": "controller0",
      "pm_user": "admin",
      "arch": "x86_64",
      "mac": [ "52:54:00:00:f9:01" ],
      "cpu": "2",
      "memory": "8192",
      "disk": "40",
      "pm_addr": "172.25.249.101",
      "pm_type": "pxe_ipmitool",
      "pm_password": "password"
    }
..output omitted...
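Because the registration file may also be written in YAML, the same controller0 entry could be expressed as follows. This is only a sketch that restates the JSON fields above in YAML syntax:
nodes:
  - name: controller0
    arch: x86_64
    cpu: "2"
    memory: "8192"
    disk: "40"
    mac:
      - "52:54:00:00:f9:01"
    pm_type: pxe_ipmitool
    pm_addr: "172.25.249.101"
    pm_user: admin
    pm_password: password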
The next step is to register the nodes with the Bare Metal service. The Workflow service manages this task set, which includes the ability to schedule and monitor multiple tasks and actions. [user@undercloud]$ openstack baremetal import --json instackenv-initial.json Started Mistral Workflow. Execution ID: 112b4907-2499-4538-af5d-37d3f934f31c Successfully registered node UUID 5206cc66-b513-4b01-ac1b-cd2d6de06b7d Successfully registered node UUID 099b3fd5-370d-465b-ba7d-e9a19963d0af Successfully registered node UUID 4fef49a8-fe55-4e96-ac26-f23f192a6408 Started Mistral Workflow. Execution ID: 2ecd83b1-045d-4536-9cf6-74a2db52baca Successfully set all nodes to available.
Single or multiple hosts may be introspected simultaneously. When building new clouds, performing bulk introspection is common. After an overcloud is operational, it is best to set a manageable provisioning state on selected nodes, then invoke introspection only on those selected nodes. Introspection times vary depending on the number of nodes and the throughput capacity of the provisioning network, because the introspection image must be pushed to each node during the PXE boot. If introspection appears to not finish, check the Bare Metal services logs for troubleshooting. [user@undercloud]$ openstack baremetal node manage controller0 [user@undercloud]$ openstack baremetal node manage compute0 [user@undercloud]$ openstack baremetal node manage ceph0 [user@undercloud]$ openstack baremetal node list -c Name -c "Power State" \ -c "Provisioning State" -c Maintenance +-------------+-------------+--------------------+-------------+ | Name | Power State | Provisioning State | Maintenance | +-------------+-------------+--------------------+-------------+ | controller0 | power off | manageable | False | | compute0 | power off | manageable | False | | ceph0 | power off | manageable | False |
+-------------+-------------+--------------------+-------------+ [user@undercloud]$ openstack overcloud node introspect --all-manageable --provide Started Mistral Workflow. Execution ID: 28ea0111-fac8-4298-8d33-1aaeb633f6b7 Waiting for introspection to finish... Introspection for UUID 4fef49a8-fe55-4e96-ac26-f23f192a6408 finished successfully. Introspection for UUID 099b3fd5-370d-465b-ba7d-e9a19963d0af finished successfully. Introspection for UUID 5206cc66-b513-4b01-ac1b-cd2d6de06b7d finished successfully. Introspection completed. Started Mistral Workflow. Execution ID: 8a3ca1c5-a641-4e4c-8ef9-95ff9e35eb33
What happened during Introspection? The managed nodes are configured to PXE boot by default. When introspection starts, IPMI (in your classroom, the BMC emulation on the power node) is contacted to reboot the nodes. Each node requests a DHCP address, a kernel, and a RAM disk to network boot, seen in the listing below as bm-deploy-ramdisk and bm-deploy-kernel. This boot image extensively queries and benchmarks the node, then reports the results to a Bare Metal listener on the undercloud, which updates the Bare Metal database and the Object Store. The node is then shut down and is available for the orchestration provisioning steps. [user@undercloud]$ openstack image list +--------------------------------------+------------------------+--------+ | ID | Name | Status | +--------------------------------------+------------------------+--------+ | 7daae61f-18af-422a-a350-d9eac3fe9549 | bm-deploy-kernel | active | | 6cee6ed5-bee5-47ef-96b9-3f0998876729 | bm-deploy-ramdisk | active | ...output omitted...
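Once introspection has finished, the data collected for a node can be reviewed from the undercloud. A minimal sketch, assuming the ironic-inspector client plugin installed with director and using the controller0 UUID shown earlier:
[user@undercloud]$ openstack baremetal introspection data save \
5206cc66-b513-4b01-ac1b-cd2d6de06b7d | python -m json.tool   # pretty-print the JSON report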
Introspecting Nodes The following steps outline the process to introspect managed nodes from the undercloud. 1.
Install the undercloud and verify available services.
2.
Create separate provisioning and public networks.
3.
Verify undercloud baremetal DHCP configuration and listeners.
4.
Configure provisioning network with DNS nameserver location.
5.
Upload baremetal and overcloud network boot images to the Image Service.
6.
Check baremetal nodes for correct NIC and disk physical configuration.
7.
Gather node MAC addresses, IPMI addresses, access user names and passwords.
8.
Create and import an instack node registration file.
9.
Set nodes to manageable status; invoke the introspection process.
10.
Review reported node characteristics.
Orchestrating an Overcloud The undercloud has obtained sizing and configuration information about each node through introspection. Nodes can be dynamically assigned to overcloud roles (controller, compute, ceph-storage, block-storage, or object-storage) by comparing each node to capability
conditions set by the cloud administrator. Different roles usually have recognizable sizing distinctions. In this classroom, the nodes are small-scale virtual machines that could be assigned automatically, but assigning deployment roles manually is useful in many cases. Overcloud deployment roles can be assigned in the orchestration templates by including scheduler hints that direct the orchestration engine how to assign roles. Instead, here we assign roles by creating profile tags to attach to flavors and nodes. First, create flavors for each of the deployment roles with sufficient CPU, RAM, and disk for the role. The baremetal flavor is used for building infrastructure servers other than one of the predefined roles, such as a Hadoop data-processing application cluster. It is not mandatory to use the role name as the flavor name but it is recommended for simplicity. [user@undercloud]$ openstack flavor list -c Name -c RAM -c Disk -c Ephemeral -c VCPUs +---------------+------+------+-----------+-------+ | Name | RAM | Disk | Ephemeral | VCPUs | +---------------+------+------+-----------+-------+ | ceph-storage | 2048 | 10 | 0 | 1 | | compute | 4096 | 20 | 0 | 1 | | swift-storage | 2048 | 10 | 0 | 1 | | control | 4096 | 30 | 0 | 1 | | baremetal | 4096 | 20 | 0 | 1 | | block-storage | 2048 | 10 | 0 | 1 | +---------------+------+------+-----------+-------+
Add the correct profile tag to each flavor as a property using the capabilities index. Use the same tag names when setting a profile on each node. [user@undercloud]$ openstack flavor show control +----------------------------+------------------------------------------------+ | Field | Value | +----------------------------+------------------------------------------------+ | disk | 30 | | id | a761d361-5529-4992-8b99-6f9b2f0a3a42 | | name | control | | properties | capabilities:boot_option='local', | | | capabilities:profile='control', | | | cpu_arch='x86_64', name='control' | | ram | 4096 | | vcpus | 1 | ...output omitted...
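The capabilities properties shown above are typically attached to a flavor with a command similar to the following. This is a sketch only, because the classroom flavors are already tagged:
[user@undercloud]$ openstack flavor set \
--property "capabilities:boot_option"="local" \
--property "capabilities:profile"="control" control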
Add the correct matching profile tag to each node as a property using the capabilities index. [user@undercloud]$ openstack baremetal node show controller0 +------------------------+----------------------------------------------------+ | Field | Value | +------------------------+----------------------------------------------------+ | console_enabled | False | | created_at | 2017-06-05T04:05:23+00:00 | | driver | pxe_ipmitool | | driver_info | {u'ipmi_password': u'******', u'ipmi_address': | | | u'172.25.249.101', u'deploy_ramdisk': | | | u'6cee6ed5-bee5-47ef-96b9-3f0998876729', | | | u'deploy_kernel': u'7daae61f-18af| | | 422a-a350-d9eac3fe9549', u'ipmi_username': | | | u'admin'} | | name | controller0 | | properties | {u'memory_mb': u'8192', u'cpu_arch': u'x86_64', |
| | u'local_gb': u'39', u'cpus': u'2', |
| | u'capabilities': |
| | u'profile:control,boot_option:local'} |
| uuid | 5206cc66-b513-4b01-ac1b-cd2d6de06b7d |
...output omitted...
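A node is usually tagged with the matching profile in a similar way; again a sketch only, because the classroom nodes are already tagged:
[user@undercloud]$ openstack baremetal node set controller0 \
--property capabilities='profile:control,boot_option:local'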
Organizing orchestration templates Red Hat OpenStack Platform director ships with a full set of working overcloud templates, including many optional configuration environment files, in the /usr/share/openstack-tripleo-heat-templates/ directory. During provisioning, the top-level file invoked is the overcloud.j2.yaml file, referencing objects defined in the top-level overcloud-resource-registry-puppet.j2.yaml file. These files are constructed using the Jinja2 template engine for Python. The remaining template files are either YAML files or Puppet manifests. These main files reference the remaining resource files or scripts, depending on the environment files chosen.
[user@undercloud]$ ls -l /usr/share/openstack-tripleo-heat-templates/
-rw-r--r--. 1 root root   808 Jan  2 19:14 all-nodes-validation.yaml
-rw-r--r--. 1 root root   583 Jan  2 19:14 bootstrap-config.yaml
-rw-r--r--. 1 root root 20903 Jan  2 19:14 capabilities-map.yaml
drwxr-xr-x. 5 root root    75 May 19 13:22 ci
-rw-r--r--. 1 root root   681 Jan  2 19:14 default_passwords.yaml
drwxr-xr-x. 3 root root   128 May 19 13:22 deployed-server
drwxr-xr-x. 4 root root   168 May 19 13:22 docker
drwxr-xr-x. 4 root root  4096 May 19 13:22 environments
drwxr-xr-x. 6 root root    73 May 19 13:22 extraconfig
drwxr-xr-x. 2 root root   162 May 19 13:22 firstboot
-rw-r--r--. 1 root root   735 Jan  2 19:14 hosts-config.yaml
-rw-r--r--. 1 root root   325 Jan  2 19:14 j2_excludes.yaml
-rw-r--r--. 1 root root  2594 Jan  2 19:14 net-config-bond.yaml
-rw-r--r--. 1 root root  1895 Jan  2 19:14 net-config-bridge.yaml
-rw-r--r--. 1 root root  2298 Jan  2 19:14 net-config-linux-bridge.yaml
-rw-r--r--. 1 root root  1244 Jan  2 19:14 net-config-noop.yaml
-rw-r--r--. 1 root root  3246 Jan  2 19:14 net-config-static-bridge-with-external-dhcp.yaml
-rw-r--r--. 1 root root  2838 Jan  2 19:14 net-config-static-bridge.yaml
-rw-r--r--. 1 root root  2545 Jan  2 19:14 net-config-static.yaml
drwxr-xr-x. 5 root root  4096 May 19 13:22 network
-rw-r--r--. 1 root root 25915 Jan  2 19:14 overcloud.j2.yaml
-rw-r--r--. 1 root root 13866 Jan 17 12:44 overcloud-resource-registry-puppet.j2.yaml
drwxr-xr-x. 5 root root  4096 May 19 13:22 puppet
-rw-r--r--. 1 root root  6555 Jan 17 12:44 roles_data.yaml
drwxr-xr-x. 2 root root    26 May 19 13:22 validation-scripts
Recommended practice is to copy this whole directory structure to a new working directory, to ensure that local customizations are not overwritten by package updates. In this classroom, the working directory is /home/stack/templates/. The environment subdirectory contains the sample configuration files to choose features and configurations for this overcloud deployment. Create a new environment working subdirectory and copy only the needed environment files into it. Similarly, create a configuration working subdirectory and save any modified template files into it. The subdirectories are cl210-environment and cl210-configuration. The classroom configuration includes environment files to build trunked VLANs, statically configured node IP address, an explicit Ceph server layout and more. The need for 3 NICs per virtual machine required customizing existing templates, which were copied to the configuration subdirectory before modification. Browse these files of interest to correlate template settings to the live configuration:
• templates/cl210-environment/30-network-isolation.yaml
• templates/cl210-environment/32-network-environment.yaml
• templates/cl210-configuration/single-nic-vlans/controller.yaml
• templates/cl210-configuration/single-nic-vlans/compute.yaml
• templates/cl210-configuration/single-nic-vlans/ceph-storage.yaml
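A minimal sketch of how such a working copy might be prepared, assuming the target directory does not already exist; the classroom environment already provides this structure, so these commands are illustrative only:
[user@undercloud]$ cp -a /usr/share/openstack-tripleo-heat-templates /home/stack/templates
[user@undercloud]$ mkdir /home/stack/templates/cl210-environment \
/home/stack/templates/cl210-configuration   # working subdirectories for selected environment and modified template files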
The final step is to start the deployment, specifying the main working directories for templates and environment files. Deployment time varies greatly, depending on the number of nodes being deployed and the features selected. Orchestration processes tasks in dependency order. Although many tasks may be running on different nodes simultaneously, some tasks must finish before others can begin. This required structure is organized into a workflow plan, which manages the whole provisioning orchestration process. [user@demo ~]$ openstack overcloud deploy \ --templates /home/stack/templates \ --environment-directory /home/stack/templates/cl210-environment Removing the current plan files Uploading new plan files Started Mistral Workflow. Execution ID: 7c29ea92-c54e-4d52-bfb1-9614a485fa2d Plan updated Deploying templates in the directory /tmp/tripleoclient-0_T1mA/tripleo-heat-templates Started Mistral Workflow. Execution ID: 1edb7bb3-27f5-4b0a-a248-29bf949a4d57
The undercloud returns the status of the overcloud stack. [user@undercloud]$ openstack stack list +--------------------+------------+--------------------+----------------------+ | ID | Stack Name | Stack Status | Creation Time | +--------------------+------------+--------------------+----------------------+ | 6ce5fe42-5d16-451d | overcloud | CREATE_IN_PROGRESS | 2017-06-05T06:05:35Z | | -88f9-a89de206d785 | | | | +--------------------+------------+--------------------+----------------------+
Monitor the orchestration process on the console where the deployment command was invoked. Orchestration plans that do not complete can be corrected, edited, and restarted. The following text displays when the overcloud stack deployment is complete. Stack overcloud CREATE_COMPLETE Started Mistral Workflow. Execution ID: 6ab02187-fc99-4d75-8b45-8354c8826066 /home/stack/.ssh/known_hosts updated. Original contents retained as /home/stack/.ssh/known_hosts.old Overcloud Endpoint: http://172.25.250.50:5000/v2.0 Overcloud Deployed
Query the undercloud about the status of the overcloud stack. [user@undercloud]$ openstack stack list +--------------------+------------+-----------------+----------------------+ | ID | Stack Name | Stack Status | Creation Time | +--------------------+------------+-----------------+----------------------+ | 6ce5fe42-5d16-451d | overcloud | CREATE_COMPLETE | 2017-06-05T06:05:35Z | | -88f9-a89de206d785 | | | | +--------------------+------------+-----------------+----------------------+
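Beyond the top-level stack status, individual orchestration resources can be listed to see which deployment steps ran and their state. A sketch, with output omitted; the -n option sets how many nested stack levels to display:
[user@undercloud]$ openstack stack resource list -n 5 overcloud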
What happened during Orchestration?
The managed nodes are configured to PXE boot by default. When orchestration starts, IPMI (in our case, the BMC emulation on the power node) is contacted to reboot the nodes. Each node requests a DHCP address and network boot image, seen in the listing below as overcloud-full-initrd and overcloud-full-vmlinuz. This boot image runs as an iSCSI target server to configure and publish the node's boot disk as an iSCSI target. It then contacts the Bare Metal conductor, which connects to that iSCSI LUN, partitions it, then overwrites it, copying the overcloud-full image to become the node's boot disk. The system then prepares and performs a reboot, coming up on this new, permanent disk.
[user@undercloud]$ openstack image list
+--------------------------------------+------------------------+--------+
| ID                                   | Name                   | Status |
+--------------------------------------+------------------------+--------+
...output omitted...
| f5725232-7474-4d78-90b9-92f75fe84615 | overcloud-full         | active |
| daca43d2-67a3-4333-896e-69761e986431 | overcloud-full-vmlinuz | active |
| 1e346cba-a7f2-4535-b9a6-d9fa0bf68491 | overcloud-full-initrd  | active |
+--------------------------------------+------------------------+--------+
On the following page, Figure 6.1: Bare Metal boot disk provisioning visually describes the procedure for delivering a new boot disk to a node being provisioned. The overcloud-full image is a working Red Hat Enterprise Linux system with all of the Red Hat OpenStack Platform and Red Hat Ceph Storage packages already installed but not configured. By pushing the same overcloud-full image to all nodes, any node could be sent instructions to build any of the supported deployment roles: Controller, Compute, Ceph-Storage, Image-Storage, or Block-Storage. When the node boots this image for the first time, the image is configured to send a callback message to the Orchestration service to say that it is ready to be configured. Orchestration then coordinates the sending and processing of resource instructions and Puppet invocations that accomplish the remainder of the build and configuration of the node. When orchestration is complete, the result is a complete server running as one of the deployment roles.
Figure 6.1: Bare Metal boot disk provisioning
The following steps outline the process to orchestrate an overcloud from the undercloud.
1. Create the flavors for each node deployment role.
2. Assign matching profile tags to specify which nodes will be selected for which flavors.
3. Copy the default template directory, located at /usr/share/openstack-tripleo-heat-templates/, to a new working directory.
4. Create the environment files required to customize the overcloud deployment.
5. Run the openstack overcloud deploy command. Use the --templates parameter to specify the template directory. Use the --environment-directory parameter to specify the environment file directory.
6. Use ssh to connect to each deployed node as the heat-admin user, to verify deployment.
7. Review the network interfaces, bridges, and disks to verify that each is correctly configured, as shown in the sketch after this list.
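The following is a sketch of steps 6 and 7. The controller address 172.25.249.52 is taken from the openstack server list output shown later in this section; the verification commands are ordinary Linux tools rather than a prescribed procedure.
[stack@director ~]$ openstack server list -c Name -c Networks
[stack@director ~]$ ssh [email protected]
[heat-admin@overcloud-controller-0 ~]$ ip addr show
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl show
[heat-admin@overcloud-controller-0 ~]$ lsblk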
Completed Classroom Topology
On the following page, Figure 6.2: Completed classroom overcloud portrays four deployed nodes: controller0, compute0, compute1, and ceph0. The compute1 node will be deployed later in this chapter as an overcloud stack upgrade. Use this diagram as a reference when verifying the live overcloud configuration.
Figure 6.2: Completed classroom overcloud
References
The Ironic developer documentation page
https://docs.openstack.org/developer/ironic/
The Mistral documentation page
https://docs.openstack.org/developer/mistral/index.html
The Heat documentation page
https://docs.openstack.org/developer/heat/
Further information about Red Hat OpenStack Platform director is available in the Director Installation & Usage guide for Red Hat OpenStack Platform 10, at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Further information about template customization is available in the Advanced Overcloud Customization guide for Red Hat OpenStack Platform 10, at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Configuring an Overcloud Deployment
In this exercise, you will view the results of the deployment tasks that created the overcloud on your virtual machines. You will verify the configuration and status of the undercloud, then verify the configuration and status of the overcloud.
Outcomes
You should be able to:
• Connect to and observe the undercloud system.
• View an introspection configuration.
• View an orchestration configuration.
• View an overcloud deployment configuration.
Before you begin
Log in to workstation as student using student as the password. On workstation, run the lab resilience-overcloud-depl setup command. The script verifies that overcloud nodes are accessible and running the correct OpenStack services.
[student@workstation ~]$ lab resilience-overcloud-depl setup
Steps
1. Log in to director and review the environment.
1.1. Use the ssh command to connect to director. Review the environment variables for the stack user. OpenStack environment variables all begin with OS_.
[student@workstation ~]$ ssh stack@director
[stack@director ~]$ env | grep "OS_"
OS_IMAGE_API_VERSION=1
OS_PASSWORD=9ee0904a8dae300a37c4857222b10fb10a2b6db5
OS_AUTH_URL=https://172.25.249.201:13000/v2.0
OS_USERNAME=admin
OS_TENANT_NAME=admin
OS_NO_CACHE=True
OS_CLOUDNAME=undercloud
1.2. View the environment file for the stack user. This file is automatically sourced when the stack user logs in. The OS_AUTH_URL variable in this file defines the Identity Service endpoint of the undercloud. [stack@director ~]$ grep "^OS_" stackrc OS_PASSWORD=$(sudo hiera admin_password) OS_AUTH_URL=https://172.25.249.201:13000/v2.0 OS_USERNAME=admin OS_TENANT_NAME=admin OS_BAREMETAL_API_VERSION=1.15 OS_NO_CACHE=True
OS_CLOUDNAME=undercloud
2.
Review the network configuration for the undercloud. 2.1. Inspect the /home/stack/undercloud.conf configuration file. Locate the variables that define the provisioning network, such as undercloud_admin_vip. [DEFAULT] local_ip = 172.25.249.200/24 undercloud_public_vip = 172.25.249.201 undercloud_admin_vip = 172.25.249.202 local_interface = eth0 masquerade_network = 172.25.249.0/24 dhcp_start = 172.25.249.51 dhcp_end = 172.25.249.59 network_cidr = 172.25.249.0/24 network_gateway = 172.25.249.200 inspection_iprange = 172.25.249.150,172.25.249.180 ...output omitted...
2.2. Compare the IP addresses in the configuration file with the IP address assigned to the virtual machine. Use the ip command to list the devices. [stack@director ~]$ ip addr | grep -E 'eth1|br-ctlplane' 3: eth1: mtu 1500 qdisc pfifo_fast state inet 172.25.250.200/24 brd 172.25.250.255 scope global eth1 7: br-ctlplane: mtu 1500 qdisc noqueue state inet 172.25.249.200/24 brd 172.25.249.255 scope global br-ctlplane inet 172.25.249.202/32 scope global br-ctlplane inet 172.25.249.201/32 scope global br-ctlplane
2.3. List the networks configured in the undercloud. If an overcloud is currently deployed, then approximately six networks are displayed. If the overcloud has been deleted or has not been deployed, only one network will display. Look for the provisioning network named ctlplane. This display includes the subnets configured within the networks listed. You will use the ID for the provisioning network's subnet in the next step. [stack@director ~]$ openstack network list --long -c Name -c Subnets \ -c "Network Type" +--------------+--------------------------------------+--------------+ | Name | Subnets | Network Type | +--------------+--------------------------------------+--------------+ | external | 52af2265-5c3f-444f-b595-5cbdb56f434f | flat | | tenant | 6d6f5f79-ed32-4c3c-8147-b7d84fb1e02c | flat | | internal_api | 0e703161-c389-47b3-b6ab-e984e9b15bef | flat | | storage_mgmt | 2ee6fb45-77bb-46b5-bb66-978d687b9558 | flat | | ctlplane | 64f6a0a6-dc27-4c92-a81a-b294d1bb22a4 | flat | | storage | 2d146a94-effc-461d-a38b-f7e4da319a2e | flat | +--------------+--------------------------------------+--------------+
2.4. Display the subnet for the ctlplane provisioning network using the subnet ID obtained in the previous step. The allocation_pools field is the DHCP scope, and the dns_nameservers and gateway_ip fields are DHCP options, for an overcloud node's provisioning network interface.
[stack@director ~]$ openstack subnet show 64f6a0a6-dc27-4c92-a81a-b294d1bb22a4 +-------------------+------------------------------------------------------+ | Field | Value | +-------------------+------------------------------------------------------+ | allocation_pools | 172.25.249.51-172.25.249.59 | | cidr | 172.25.249.0/24 | | created_at | 2017-06-12T19:37:40Z | | description | | | dns_nameservers | 172.25.250.200 | | enable_dhcp | True | | gateway_ip | 172.25.249.200 | ...output omitted...
3.
List the services, and their passwords, installed for the undercloud. 3.1. List the undercloud services running on director. [stack@director ~]$ openstack service list -c Name -c Type +------------------+-------------------------+ | Name | Type | +------------------+-------------------------+ | zaqar-websocket | messaging-websocket | | heat | orchestration | | swift | object-store | | aodh | alarming | | mistral | workflowv2 | | ceilometer | metering | | keystone | identity | | nova | compute | | zaqar | messaging | | glance | image | | ironic | baremetal | | ironic-inspector | baremetal-introspection | | neutron | network | +------------------+-------------------------+
3.2. Review the admin and other component service passwords located in the /home/ stack/undercloud-passwords.conf configuration file. You will use various service passwords in later exercises. [auth] undercloud_db_password=eb35dd789280eb196dcbdd1e8e75c1d9f40390f0 undercloud_admin_token=529d7b664276f35d6c51a680e44fd59dfa310327 undercloud_admin_password=96c087815748c87090a92472c61e93f3b0dcd737 undercloud_glance_password=6abcec10454bfeec6948518dd3de6885977f6b65 undercloud_heat_encryption_key=45152043171b30610cb490bb40bff303 undercloud_heat_password=a0b7070cd8d83e59633092f76a6e0507f85916ed undercloud_neutron_password=3a19afd3302615263c43ca22704625db3aa71e3f undercloud_nova_password=d59c86b9f2359d6e4e19d59bd5c60a0cdf429834 undercloud_ironic_password=260f5ab5bd24adc54597ea2b6ea94fa6c5aae326 ...output omitted...
4.
View the configuration used to prepare for deploying the overcloud and the resulting overcloud nodes. 4.1. View the /home/stack/instackenv-initial.json configuration file. The file was used to define each overcloud node, including power management access settings.
{ "nodes": [ { "name": "controller0", "pm_user": "admin", "arch": "x86_64", "mac": [ "52:54:00:00:f9:01" ], "cpu": "2", "memory": "8192", "disk": "40", "pm_addr": "172.25.249.101", "pm_type": "pxe_ipmitool", "pm_password": "password" ...output omitted...
4.2. List the provisioned nodes in the current overcloud environment. This command lists the nodes that were created using the configuration file shown in the previous step. [stack@director ~]$ openstack baremetal node list -c Name -c 'Power State' \ -c 'Provisioning State' -c 'Maintenance' +-------------+-------------+--------------------+-------------+ | Name | Power State | Provisioning State | Maintenance | +-------------+-------------+--------------------+-------------+ | controller0 | power on | active | False | | compute0 | power on | active | False | | ceph0 | power on | active | False | +-------------+-------------+--------------------+-------------+
4.3. List the servers in the environment. Review the status and the IP addresses of the nodes. This command lists the overcloud servers built on the bare-metal nodes defined in the previous step. The IP addresses assigned to the nodes are reachable from the director virtual machine.
[stack@director ~]$ openstack server list -c Name -c Status -c Networks
+-------------------------+--------+------------------------+
| Name                    | Status | Networks               |
+-------------------------+--------+------------------------+
| overcloud-controller-0  | ACTIVE | ctlplane=172.25.249.52 |
| overcloud-compute-0     | ACTIVE | ctlplane=172.25.249.53 |
| overcloud-cephstorage-0 | ACTIVE | ctlplane=172.25.249.58 |
+-------------------------+--------+------------------------+
5.
Using the controller0 node and the control role as an example, review the settings that define how a node is selected to be built for a server role. 5.1. List the flavors created for each server role in the environment. These flavors were created to define the sizing for each deployment server role. It is recommended practice that flavors are named for the roles for which they are used. However, properties set on a flavor, not the flavor's name, determine its use. [stack@director ~]$ openstack flavor list -c Name -c RAM -c Disk -c Ephemeral \ -c VCPUs +---------------+------+------+-----------+-------+ | Name | RAM | Disk | Ephemeral | VCPUs | +---------------+------+------+-----------+-------+
| ceph-storage | 2048 | 10 | 0 | 1 | | compute | 4096 | 20 | 0 | 1 | | swift-storage | 2048 | 10 | 0 | 1 | | control | 4096 | 30 | 0 | 1 | | baremetal | 4096 | 20 | 0 | 1 | | block-storage | 2048 | 10 | 0 | 1 | +---------------+------+------+-----------+-------+
5.2. Review the control flavor's properties by running the openstack flavor show command. The capabilities settings include the profile='control' tag. When this flavor is specified, it will only work with nodes that match these requested capabilities, including the profile='control' tag. [stack@director ~]$ openstack flavor show control +----------------------------+------------------------------------------------+ | Field | Value | +----------------------------+------------------------------------------------+ | disk | 30 | | id | a761d361-5529-4992-8b99-6f9b2f0a3a42 | | name | control | | properties | capabilities:boot_option='local', | | | profile='control', cpu_arch='x86_64', | | ram | 4096 | | vcpus | 1 | ...output omitted...
5.3. Review the controller0 node's properties field. The capabilities settings include the same profile:control tag as defined on the control flavor. When a flavor is specified during deployment, only a node that matches a flavor's requested capabilities is eligible to be selected for deployment. [stack@director ~]$ openstack baremetal node show controller0 +------------------------+----------------------------------------------------+ | Field | Value | +------------------------+----------------------------------------------------+ | console_enabled | False | | created_at | 2017-06-05T04:05:23+00:00 | | driver | pxe_ipmitool | | driver_info | {u'ipmi_password': u'******', u'ipmi_address': | | | u'172.25.249.101', u'deploy_ramdisk': | | | u'6cee6ed5-bee5-47ef-96b9-3f0998876729', | | | u'deploy_kernel': u'7daae61f-18af| | | 422a-a350-d9eac3fe9549', u'ipmi_username': | | | u'admin'} | | extra | {u'hardware_swift_object': u'extra_hardware| | | 5206cc66-b513-4b01-ac1b-cd2d6de06b7d'} | | name | controller0 | | properties | {u'memory_mb': u'8192', u'cpu_arch': u'x86_64', | | | u'local_gb': u'39', u'cpus': u'2', | | | u'capabilities':boot_option:local', | | | u'profile:control} | | uuid | 5206cc66-b513-4b01-ac1b-cd2d6de06b7d | ...output omitted...
6.
Review the template and environment files that were used to define the deployment configuration. 6.1. Locate the environment files used for the overcloud deployment.
[stack@director ~]$ ls ~/templates/cl210-environment/ 00-node-info.yaml 32-network-environment.yaml 20-storage-environment.yaml 34-ips-from-pool-all.yaml 30-network-isolation.yaml 40-compute-extraconfig.yaml
50-pre-config.yaml 60-post-config.yaml
6.2. Locate the configuration files used for the overcloud deployment. [stack@director ~]$ ls ~/templates/cl210-configuration/single-nic-vlans/ ceph-storage.yaml compute.yaml controller.yaml
6.3. Review the /home/stack/templates/cl210-environment/30-network-isolation.yaml environment file that defines the networks and VLANs for each server. For example, this partial output lists the networks (port attachments) to be configured on a node assigned the controller role.
...output omitted...
# Port assignments for the controller role
OS::TripleO::Controller::Ports::ExternalPort: ../network/ports/external.yaml
OS::TripleO::Controller::Ports::InternalApiPort: ../network/ports/internal...
OS::TripleO::Controller::Ports::StoragePort: ../network/ports/storage.yaml
OS::TripleO::Controller::Ports::StorageMgmtPort: ../network/ports/storage_...
OS::TripleO::Controller::Ports::TenantPort: ../network/ports/tenant.yaml
...output omitted...
6.4. Review the /home/stack/templates/cl210-environment/32-network-environment.yaml environment file that defines the overall network configuration for the overcloud. For example, this partial output lists the IP addressing used for the Internal API VLAN.
...output omitted...
# Internal API - used for private OpenStack services traffic
InternalApiNetCidr: '172.24.1.0/24'
InternalApiAllocationPools: [{'start': '172.24.1.60','end': '172.24.1.99'}]
InternalApiNetworkVlanID: 10
InternalApiVirtualFixedIPs: [{'ip_address':'172.24.1.50'}]
RedisVirtualFixedIPs: [{'ip_address':'172.24.1.51'}]
...output omitted...
6.5. View the /home/stack/templates/cl210-configuration/single-nic-vlans/controller.yaml configuration file that defines the network interfaces for the controller0 node. For example, this partial output shows the Internal API interface, using variables seen previously in the 32-network-environment.yaml file.
...output omitted...
type: vlan
# mtu: 9000
vlan_id: {get_param: InternalApiNetworkVlanID}
addresses:
  ip_netmask: {get_param: InternalApiIpSubnet}
...output omitted...
6.6. View the /home/stack/templates/cl210-configuration/single-nic-vlans/compute.yaml configuration file that defines the network interfaces for the compute0 node. This partial output shows that compute nodes also use the Internal API VLAN.
...output omitted...
type: vlan
# mtu: 9000
vlan_id: {get_param: InternalApiNetworkVlanID}
addresses:
  ip_netmask: {get_param: InternalApiIpSubnet}
...output omitted...
6.7. View the /home/stack/templates/cl210-configuration/single-nic-vlans/ceph-storage.yaml configuration file that defines the network interfaces for the ceph0 node. This partial output shows that Ceph nodes connect to the Storage VLAN.
...output omitted...
type: vlan
# mtu: 9000
vlan_id: {get_param: StorageNetworkVlanID}
addresses:
  ip_netmask: {get_param: StorageIpSubnet}
...output omitted...
7.
Confirm the successful completion of the overcloud deployment. 7.1. Review the status of the stack named overcloud. [stack@director ~]$ openstack stack list -c "Stack Name" -c "Stack Status" \ -c "Creation Time" +------------+-----------------+----------------------+ | Stack Name | Stack Status | Creation Time | +------------+-----------------+----------------------+ | overcloud | CREATE_COMPLETE | 2017-06-12T19:46:07Z | +------------+-----------------+----------------------+
7.2. Source the overcloudrc authentication environment file. The OS_AUTH_URL variable in this file defines the Identity Service endpoint of the overcloud. [stack@director ~]$ source overcloudrc [stack@director ~]$ env | grep "OS_" OS_PASSWORD=y27kCBdDrqkkRHuzm72DTn3dC OS_AUTH_URL=http://172.25.250.50:5000/v2.0 OS_USERNAME=admin OS_TENANT_NAME=admin OS_NO_CACHE=True OS_CLOUDNAME=overcloud
7.3. List the services running on the overcloud. [stack@director ~]$ openstack service list -c Name -c Type +------------+----------------+
| Name | Type | +------------+----------------+ | keystone | identity | | cinderv3 | volumev3 | | neutron | network | | cinder | volume | | cinderv2 | volumev2 | | swift | object-store | | aodh | alarming | | glance | image | | nova | compute | | gnocchi | metric | | heat | orchestration | | ceilometer | metering | +------------+----------------+
7.4. Review general overcloud configuration. This listing contains default settings, formats, and core component version numbers. The currently empty network field displays networks created, although none yet exist in this new overcloud. [stack@director ~]$ openstack configuration show +---------------------------------+--------------------------------+ | Field | Value | +---------------------------------+--------------------------------+ | alarming_api_version | 2 | | api_timeout | None | | auth.auth_url | http://172.25.250.50:5000/v2.0 | | auth.password | | | auth.project_name | admin | | auth.username | admin | | auth_type | password | ...output omitted... | networks | [] | ...output omitted...
7.5. Exit from director. [stack@director ~]$ exit [student@workstation ~]$
Scaling Compute Nodes
Objective
After completing this section, students should be able to add a compute node to the overcloud using the undercloud.
Scaling
An important feature of cloud computing is the ability to rapidly scale up or down an infrastructure. Administrators can provision their infrastructure with nodes that can fulfill multiple roles (for example, computing, storage, or controller) and can be pre-installed with a base operating system. Administrators can then integrate these nodes into their environment as needed. Cloud computing provides services that automatically take into account the increase or decrease in load, and adequately warn the administrators in case the environment needs to be scaled. In traditional computing models, it is often necessary to manually install, configure, and integrate new servers into existing environments, requiring extra time and effort to provision the node. Autoscaling is one of the main benefits that the cloud-computing model provides, as it permits, for example, quick response to load spikes.
Red Hat OpenStack Platform director, with the Heat orchestration service, implements scaling features. Administrators can rerun the command used to deploy an overcloud, increasing or decreasing the roles based on infrastructure requirements. For example, the overcloud environment can scale by adding two additional compute nodes, bringing the total to three. Red Hat OpenStack Platform director then automatically reviews the current configuration and reconfigures the available services to provision the OpenStack environment with the three compute nodes.
Heat Orchestration Service
The Orchestration service provides a template-based orchestration engine for the undercloud, which can be used to create and manage resources such as storage, networking, instances, and applications as a repeatable running environment. Templates are used to create stacks, which are collections of resources (for example, instances, floating IPs, volumes, security groups, or users). The Orchestration service offers access to all the undercloud core services through a single modular template, with additional orchestration capabilities such as autoscaling and basic high availability.
An Orchestration stack is a collection of multiple infrastructure resources deployed and managed through the same interface. Stacks can be used to standardize and speed up delivery, by providing a unified human-readable format. While the Heat project started as an analog of AWS CloudFormation, making it compatible with the template formats used by CloudFormation (CFN), it also supports its own native template format, called HOT, for Heat Orchestration Templates. The undercloud provides a collection of Heat templates in order to deploy the different overcloud elements.
Note
Administrators must give a special role to OpenStack users that allows them to manage stacks. The role name is defined by the heat_stack_user_role variable in /etc/heat/heat.conf. The default role name is heat_stack_user.
A Heat template is written using YAML syntax, and has three major sections; a minimal example template is sketched after this list.
1. Parameters: Input parameters provided when deploying from the template.
2. Resources: Infrastructure elements to deploy, such as virtual machines or network ports.
3. Outputs: Output parameters dynamically generated by the Orchestration service; for example, the public IP of an instance that has been deployed using the template.
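The following is a minimal sketch of a HOT file containing the three sections; it is not taken from the course materials. The resource type OS::Nova::Server is a standard Orchestration resource, but the image name rhel7, the flavor default m1.small, and the file and stack names are assumptions about the target cloud (source overcloudrc before creating the stack).
[user@demo ~]$ cat demo-server.yaml
heat_template_version: 2016-10-14

parameters:
  flavor:
    type: string
    default: m1.small

resources:
  demo_server:
    type: OS::Nova::Server
    properties:
      image: rhel7
      flavor: {get_param: flavor}

outputs:
  demo_server_ip:
    description: First IP address assigned to the server
    value: {get_attr: [demo_server, first_address]}
[user@demo ~]$ openstack stack create -t demo-server.yaml demo-stack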
The openstack command supports stack management, including the commands shown in the following table; a brief usage sketch follows the table.
Heat Stack Management with the openstack Command
stack create: Create a stack.
stack list: List the user's stacks.
stack show: Show the details for a stack.
stack delete: Delete a stack.
resource list STACKNAME: Show the list of resources created by a stack. The -n option is used to specify the depth of nested stacks for which resources are to be displayed.
deployment list: List the software deployed and its deployment ID.
deployment show ID: Show the details for the software components being deployed.
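A brief usage sketch of the stack commands, run against the hypothetical demo-stack created from the template above; the --yes option to stack delete skips the confirmation prompt, and the nesting depth of 2 is an illustrative value.
[user@demo ~]$ openstack stack list
[user@demo ~]$ openstack stack show demo-stack
[user@demo ~]$ openstack stack resource list -n 2 demo-stack
[user@demo ~]$ openstack stack delete --yes demo-stack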
Troubleshooting the Heat orchestration service requires administrators to understand how the underlying infrastructure has been configured, because Heat makes use of these resources in order to create the stack. For example, when creating an instance, the Orchestration service invokes the Compute service API the same way a user would, through Identity service authentication. When a network port is requested from the OpenStack Networking service, that request is also made to the API through the Identity service. This means the infrastructure needs to be configured and working. Administrators must ensure that the resources requested through Heat can also be requested manually. Orchestration troubleshooting includes:
• Ensuring all undercloud services that the templates refer to are configured and running.
• Ensuring the resources, such as images or key pairs, exist.
• Ensuring the infrastructure has the capacity to deploy the stack.
After the troubleshooting has completed, administrators can review the configuration of Orchestration services:
• Orchestration service API configuration.
• Identity service configuration.
Figure 6.3: The Orchestration service
Orchestration Terminology
The following list defines the terms that administrators should be familiar with to properly administer their cloud with the Orchestration service.
Orchestration service API: The Orchestration service API server provides a REST API that forwards orchestration requests to the Orchestration service engine using remote procedure calls (RPCs).
Orchestration service engine: The service that applies templates and orchestrates the creation and launch of cloud resources. It reports event status back to the API customer.
YAML: The YAML format is a human-readable data serialization language. Orchestration templates are YAML-based, providing administrators with a convenient way to manage their cloud infrastructure.
Heat Orchestration Template (HOT): A Heat Orchestration Template (HOT) is a YAML-based configuration file that administrators write and pass to the Orchestration service API to deploy their cloud infrastructure. HOT is a template format designed to replace the legacy Orchestration CloudFormation-compatible format (CFN).
CloudFormation template (CFN): CFN is a legacy template format used by Amazon AWS services. The heat-api-cfn service manages this legacy format.
Orchestration template parameters: Orchestration template parameters are settings passed to the Orchestration service that provide a way to customize a stack. They are defined in a Heat template file, with optional default values used when values are not passed. These are defined in the parameters section of a template.
Orchestration template resources: Orchestration template resources are the specific objects that are created and configured as part of a stack. OpenStack contains a set of core resources that span all components. These are defined in the resources section of a Heat template.
Orchestration template outputs: Orchestration template outputs are values, defined in a Heat template file, that are returned by the Orchestration service after a stack is created. Users can access these values either through the Orchestration service API or client tools. These are defined in the outputs section of a template.
Bare Metal Provisioning Service (Ironic)
The Red Hat OpenStack Platform bare metal provisioning service, Ironic, supports the provisioning of both virtual and physical machines to be used for the overcloud deployment. All the information about a node is retrieved through a process called introspection. After introspection has completed, the node is ready to be used to deploy overcloud services. The Bare Metal service makes use of the different services included in the undercloud to deploy the overcloud services. The Bare Metal service supports different drivers to run the introspection process, based on what the environment hardware supports (for example, IPMI or DRAC). The following list includes the most common openstack baremetal commands for provisioning a new node in Red Hat OpenStack Platform director.
Bare Metal Management with the openstack baremetal Command
node list: List nodes registered with Ironic.
node show: Show node details.
node set: Update node information.
node maintenance set: Change the maintenance state for a node.
Scaling Compute Nodes
To scale the overcloud with additional compute nodes, the following pre-deployment steps are required:
• Import an appropriate overcloud environment JSON file.
• Run introspection of the additional compute node(s).
• Update the appropriate properties.
• Deploy the overcloud.
Import the Overcloud Environment JSON File
A node definition template file, instackenv.json, is required to define the overcloud node. This file contains the hardware and power management details for the overcloud node.
{
  "nodes": [
    {
      "pm_user": "admin",
      "arch": "x86_64",
      "name": "compute1",
      "pm_addr": "172.25.249.112",
      "pm_password": "password",
      "pm_type": "pxe_ipmitool",
      "mac": [
        "52:54:00:00:f9:0c"
      ],
      "cpu": "2",
      "memory": "6144",
      "disk": "40"
    }
  ]
}
The node description contains the following required fields:
• pm_type: Power management driver to be used by the nodes
• pm_addr: Power management server address
• pm_user, pm_password: Power management server user name and password used to access it
The following are optional fields used when the introspection has completed:
• mac: List of MAC addresses of the overcloud nodes
• cpu: Number of CPUs in these nodes
• arch: CPU architecture
• memory: Memory size in MiB
• disk: Hard disk size in GiB
• capabilities: Ironic node capabilities
"capabilities": "profile:compute,boot_option:local"
There are various Ironic drivers provided for power management, which include:
• pxe_ipmitool: Driver that uses the ipmitool utility to manage nodes.
• pxe_ssh: Driver which can be used in a virtual environment. It uses virtualized environment commands to power on and power off the VMs over SSH.
• pxe_ilo: Used on HP servers with iLO interfaces.
• pxe_drac: Used on DELL servers with DRAC interfaces.
• fake_pxe: All power management for this driver requires manual intervention. It can be used as a fallback for unusual or older hardware.
To import the instackenv.json file, use the openstack baremetal import command.
[user@undercloud]$ source stackrc
[user@undercloud]$ openstack baremetal import --json instackenv.json
Introspection of Overcloud Nodes
Introspection of nodes allows for collecting system information such as CPU count, memory, disk space, and network interfaces. Introspection allows advanced role matching, which ensures that correct roles are allocated to the most appropriate nodes. In cases where advanced role matching with Advanced Health Check (AHC) is not performed, manual tagging can be used to set the profile and extra capabilities for a node (replace NODE with the node name).
[user@undercloud]$ openstack baremetal node set NODE \
--property "capabilities=profile:compute,boot_option:local"
When overcloud nodes are booted into the introspection stage, they are provided with the discovery images, located under /httpboot, by the ironic-inspector service. The import process assigns each node the bm_deploy_kernel and bm_deploy_ramdisk images automatically. Manual use of openstack baremetal configure boot is no longer needed. In the following output, verify that deploy_kernel and deploy_ramdisk are assigned to the new nodes.
[user@undercloud]$ openstack baremetal node show compute2 | grep -A1 deploy
| driver_info | {u'ssh_username': u'stack', u'deploy_kernel': u'7bfa6b9e-2d2a-42ab-ac5d- |
| | 7b7db9370982', u'deploy_ramdisk': |
| | u'd402e2a9-a782-486f-8934-6c20b31c92d3', |
| | u'ssh_key_contents': u'----|
To introspect the hardware attributes of all registered nodes, run the command openstack baremetal introspection bulk start. [user@undercloud]$ openstack baremetal introspection bulk start Started Mistral Workflow. Execution ID: d9191784-e730-4179-9cc4-a73bc31b5aec Waiting for introspection to finish...
To limit the introspection to nodes that are in the manageable provision state, use the --all-manageable --provide options with the openstack overcloud node introspect command.
[user@undercloud]$ openstack overcloud node introspect --all-manageable --provide
Monitor and troubleshoot the introspection process with the following command.
[user@undercloud]$ sudo journalctl -l -u openstack-ironic-inspector \
-u openstack-ironic-inspector-dnsmasq -u openstack-ironic-conductor -f
Creating Flavors and Setting Appropriate Properties
Red Hat OpenStack Platform director requires flavors to provision the overcloud nodes. The overcloud Orchestration templates look for a fallback flavor named baremetal. A flavor must be created, and can be used to specify the hardware used to create the overcloud nodes.
[user@undercloud]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 \
baremetal
The capabilities properties must also be set on the flavor: capabilities:boot_option sets the boot mode for the flavor, and capabilities:profile defines the node profile to use with the flavor.
[user@undercloud]$ openstack flavor set \
--property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" \
--property "capabilities:profile"="compute" compute
Deploy Overcloud
Red Hat OpenStack Platform undercloud uses the Orchestration service to orchestrate the deployment of the overcloud with a stack definition. These Orchestration templates can be customized to suit various deployment patterns. The stack templates define all resources required for the deployment, and maintain the dependencies for these resource deployments. Red Hat OpenStack Platform can deploy these nodes:
• control: A node with the controller role.
• compute: A node on which the Compute instances are run.
• ceph-storage: A node that runs the Ceph OSDs. Monitors run on the controller node.
• block-storage: A dedicated node providing the Block Storage service (Cinder).
• object-storage: A dedicated node with the Object Storage service (Swift).
The overcloud is deployed using the openstack overcloud deploy command.
[user@undercloud]$ openstack overcloud deploy \
--templates ~/templates \
--environment-directory ~/templates/cl210-environment
• --templates: Must specify the template location. If no location is specified, the default template location of /usr/share/openstack-tripleo-heat-templates is used.
• --environment-directory: Specifies the directory location of the environment Orchestration templates to be processed. This reduces the complexity of the deployment syntax by not requiring every template to be listed individually.
Note
The --compute-scale deployment option is deprecated in Red Hat OpenStack Platform 10 (Newton) in favor of using an environment file. Administrators can define the number of nodes to scale out in an environment file and supply that environment file to the overcloud deployment, as sketched below. All the --*-scale deployment parameters, which include --compute-scale, --swift-storage-scale, --block-storage-scale, and --ceph-storage-scale, will be discontinued in a future Red Hat OpenStack Platform release.
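The following is a minimal sketch of such a node-count environment file. The parameter names (ControllerCount, ComputeCount, CephStorageCount, and the Overcloud*Flavor parameters) are standard tripleo-heat-templates parameters, but the full contents of the classroom's 00-node-info.yaml are not reproduced here, so the values shown are illustrative.
[stack@director ~]$ cat ~/templates/cl210-environment/00-node-info.yaml
parameter_defaults:
  ControllerCount: 1
  ComputeCount: 2
  CephStorageCount: 1
  OvercloudControlFlavor: control
  OvercloudComputeFlavor: compute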
Phases of Overcloud Node Deployment
Registration:
• The stack user uploads information about additional overcloud nodes. The information includes credentials for power management.
• The information is saved in the Ironic database and used during the introspection phase.
Introspection:
• The Bare Metal service uses PXE (Preboot eXecution Environment) to boot nodes over a network.
• The Bare Metal service connects to the registered nodes to gather more details about the hardware resources.
• The discovery kernel and ramdisk images are used during this process.
Deployment:
• The stack user deploys overcloud nodes, allocating resources and nodes that were discovered during the introspection phase.
• Hardware profiles and Orchestration templates are used during this phase.
Registering an Overcloud Node
Registering an overcloud node consists of adding it to the Bare Metal service list of possible nodes for the overcloud. The undercloud needs the following information to register a node:
• The type of power management, such as IPMI or PXE over SSH, being used. The various power management drivers supported by the Bare Metal service can be listed using ironic driver-list.
• The power management IP address for the node.
• The credentials to be used for the power management interface.
• The MAC address for the NIC on the PXE/provisioning network.
• The kernel and ramdisk used for introspection.
CL210-RHOSP10.1-en-2-20171006
Rendered for Nokia. Please do not distribute.
261
All of this information can be passed using a JSON (JavaScript Object Notation) file or a CSV file. The openstack baremetal import command imports this file into the Bare Metal service database.
[user@undercloud]$ openstack baremetal import --json instackenv.json
For the introspection and discovery of overcloud nodes, the Bare Metal service uses PXE (Preboot eXecution Environment), provided by the undercloud. The dnsmasq service is used to provide DHCP and PXE capabilities to the Bare Metal service. The PXE discovery images are delivered over HTTP. Prior to introspection, the registered nodes must have a valid kernel and ramdisk assigned to them, and every node for introspection has the following settings:
• Power State set to power off.
• Provision State set to manageable.
• Maintenance set to False.
• Instance UUID set to None.
A quick check of these settings is sketched at the end of this passage. The openstack overcloud node introspect command is used to start the introspection, and --all-manageable --provide informs the Bare Metal service to perform introspection on nodes that are in the manageable provision state.
[user@undercloud]$ openstack overcloud node introspect --all-manageable --provide
Started Mistral Workflow. Execution ID: d9191784-e730-4179-9cc4-a73bc31b5aec
Waiting for introspection to finish...
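The node fields can be queried before starting introspection; this is a sketch that uses the compute1 node name from the exercise later in this section. Expect the values power off, manageable, False, and None respectively.
[user@undercloud]$ openstack baremetal node show compute1 \
-f value -c power_state -c provision_state -c maintenance -c instance_uuid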
Overcloud Deployment Roles
After introspection, the undercloud knows which nodes are used for the deployment of the overcloud, but it may not know what overcloud node types are to be deployed. Flavors are used to assign node deployment roles, and they correspond to the overcloud node types:
• control: for a controller node
• compute: for a compute node
• ceph-storage: for a Ceph storage node
• block-storage: a Cinder storage node
• object-storage: a Swift storage node
The undercloud uses the baremetal hard-coded flavor, which must be set as the default flavor for any unused roles; otherwise, the role-specific flavors are used.
[user@undercloud]$ openstack flavor create --id auto --ram 6144 --disk 38 --vcpus 2 \
baremetal
[user@undercloud]$ openstack flavor create --id auto --ram 6144 --disk 38 --vcpus 2 \
compute
The undercloud performs automated role matching to apply appropriate hardware for each flavor of node. When nodes are on identical hardware and no flavors are created, the deployment roles
are randomly chosen for each node. Manual tagging can also be used to tie the deployment role to a node. To use these deployment profiles, they need to be associated to the respective flavors using the capabilities:profile property. The capabilities:boot_option property is required to set the boot mode for flavors.
Scaling Overcloud Compute Nodes
The following steps outline the process for adding an additional compute node to the overcloud. A condensed command sketch follows this list.
1. Modify any applicable Orchestration templates located in the /home/stack/templates/cl210-environment directory on the undercloud node.
2. On the undercloud node, create an instackenv.json file containing definitions for the additional compute node.
3. Import the instackenv.json file using the command openstack baremetal import.
4. Assign boot images to the additional compute node using the command openstack baremetal configure boot.
5. Set the provisioning state to manageable using the command openstack baremetal node manage.
6. Use the command openstack overcloud node introspect --all-manageable --provide to begin introspection.
7. After introspection has completed successfully, update the node profile to use the compute role.
8. Deploy the overcloud with the command openstack overcloud deploy --templates ~/templates --environment-directory ~/templates/cl210-environment.
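The following is a condensed sketch of that workflow, using the classroom node name compute1 and the instackenv-onenode.json file from the exercise that follows. The sed one-liner assumes the environment file currently sets ComputeCount: 1 and is shown only as one possible way to make the edit.
[stack@director ~]$ source ~/stackrc
[stack@director ~]$ openstack baremetal import --json ~/instackenv-onenode.json
[stack@director ~]$ openstack baremetal node manage compute1
[stack@director ~]$ openstack overcloud node introspect --all-manageable --provide
[stack@director ~]$ openstack baremetal node set compute1 \
--property "capabilities=profile:compute,boot_option:local"
[stack@director ~]$ sed -i 's/ComputeCount: 1/ComputeCount: 2/' \
~/templates/cl210-environment/00-node-info.yaml
[stack@director ~]$ openstack overcloud deploy --templates ~/templates \
--environment-directory ~/templates/cl210-environment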
References
Further information is available for Adding Additional Nodes in the Director Installation and Usage guide for Red Hat OpenStack Platform 10, at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Scaling Compute Nodes
In this exercise, you will add a compute node to the overcloud.
Resources
Files: http://materials.example.com/instackenv-onenode.json
Outcomes
You should be able to add a compute node to the overcloud.
Before you begin
Log in to workstation as student with a password of student. On workstation, run the lab resilience-scaling-nodes setup command. This ensures that the OpenStack services are running and the environment has been properly configured for this lab.
[student@workstation ~]$ lab resilience-scaling-nodes setup
Steps
1. Use SSH to connect to director as the user stack and source the stackrc credentials file.
1.1. From workstation, use SSH to connect to director as the user stack and source the stackrc credentials file.
[student@workstation ~]$ ssh stack@director
[stack@director ~]$ source stackrc
2. Prepare compute1 for introspection.
2.1. Download the instackenv-onenode.json file for the introspection of compute1 from http://materials.example.com to /home/stack.
[stack@director ~]$ wget http://materials.example.com/instackenv-onenode.json
2.2. Verify that the instackenv-onenode.json file is for compute1. [stack@director ~]$ cat ~/instackenv-onenode.json { "nodes": [ { "pm_user": "admin", "arch": "x86_64", "name": "compute1", "pm_addr": "172.25.249.112", "pm_password": "password", "pm_type": "pxe_ipmitool", "mac": [ "52:54:00:00:f9:0c" ], "cpu": "2",
"memory": "6144", "disk": "40" } ] }
2.3. Import instackenv-onenode.json into the Bare Metal service using the command openstack baremetal import, and ensure that the node has been properly imported. [stack@director ~]$ openstack baremetal import --json \ /home/stack/instackenv-onenode.json Started Mistral Workflow. Execution ID: 8976a32a-6125-4c65-95f1-2b97928f6777 Successfully registered node UUID b32d3987-9128-44b7-82a5-5798f4c2a96c Started Mistral Workflow. Execution ID: 63780fb7-bff7-43e6-bb2a-5c0149bc9acc Successfully set all nodes to available [stack@director ~]$ openstack baremetal node list \ -c Name -c 'Power State' -c 'Provisioning State' -c Maintenance +-------------+--------------------+-------------+-------------+ | Name | Provisioning State | Power State | Maintenance | +-------------+--------------------+-------------+-------------+ | controller0 | active | power on | False | | compute0 | active | power on | False | | ceph0 | active | power on | False | | compute1 | available | power off | False | +-------------+--------------------+-------------+-------------+
2.4. Prior to starting introspection, set the provisioning state for compute1 to manageable. [stack@director ~]$ openstack baremetal node manage compute1
3. Begin introspection of compute1.
3.1. Initiate introspection of compute1. Introspection may take a few minutes.
[stack@director ~]$ openstack overcloud node introspect \
--all-manageable --provide
Started Mistral Workflow. Execution ID: d9191784-e730-4179-9cc4-a73bc31b5aec
Waiting for introspection to finish...
...output omitted...
4.
Update the node profile for compute1. 4.1. Update the node profile for compute1 to assign it the compute profile. [stack@director ~]$ openstack baremetal node set compute1 \ --property "capabilities=profile:compute,boot_option:local"
5.
Configure 00-node-info.yaml to scale two compute nodes. 5.1. Edit /home/stack/templates/cl210-environment/00-node-info.yaml to scale to two compute nodes. ...output omitted...
ComputeCount: 2 ...output omitted...
6.
Deploy the overcloud, to scale out the compute nodes. 6.1. Deploy the overcloud, to scale out compute node by adding one more node. The deployment may take 40 minutes or more to complete. [stack@director ~]$ openstack overcloud deploy \ --templates ~/templates \ --environment-directory ~/templates/cl210-environment Removing the current plan files Uploading new plan files Started Mistral Workflow. Execution ID: 6de24270-c3ed-4c52-8aac-820f3e1795fe Plan updated Deploying templates in the directory /tmp/tripleoclient-WnZ2aA/tripleo-heattemplates Started Mistral Workflow. Execution ID: 50f42c4c-d310-409d-8d58-e11f993699cb ...output omitted...
Cleanup From workstation, run the lab resilience-scaling-nodes cleanup command to clean up this exercise. [student@workstation ~]$ lab resilience-scaling-nodes cleanup
Migrating Instances using Block Storage
Objectives
After completing this section, students should be able to:
• Describe the principal concepts and terminology of migration
• Describe use cases for implementing block-based live migration
• Configure block-based live migration
Introduction to Migration
Migration is the process of moving a server instance from one compute node to another. In this and the following section of this chapter, the major lecture topic is live migration. Live migration relocates a server instance (virtual machine) from one compute node hypervisor to another while the server application is running, offering uninterrupted service. This section discusses the method known as block-based live migration and the next section discusses an alternative method known as shared storage live migration.
First, however, it is important to define what is meant by migration, because one of the primary design goals of cloud architecture is to eliminate the need for legacy server management techniques, including many former use cases for migration. A major feature of cloud-designed applications is that they are resilient, scalable, distributed, and stateless; commonly implemented in what is known as a microservices architecture. A microservice application scales, relocates, and self-repairs by deploying itself as replicated components instantiated as virtual machines or containers across many compute nodes, cells, zones, and regions. Applications designed this way share live state information such that the loss of any single component instance has little or no effect on the application or the service being offered. By definition, microservice cloud applications do not need to perform live migration. If a microservices component is to be relocated for any reason, a new component is instantiated in the desired location from the appropriate component image. The component joins the existing application and begins work while the unwanted component instance is simply terminated.
Legacy applications, also referred to as enterprise applications, may also include resilient, scalable, and distributed features, but are distinguished by their need to remain stateful. Enterprise application server instances cannot be terminated and discarded without losing application state or data, or corrupting data storage structures. Such applications must be migrated to relocate from one compute node to another.
The simplest form of migration is cold migration. In legacy computing, a virtual machine is shut down, preserving configuration and state on its assigned disks, then rebooted on another hypervisor or in another data center after relocating the physical or virtual disks. This same concept remains available in OpenStack today. Cold migration is accomplished by taking an instance snapshot on a running, quiesced instance, then saving the snapshot as an image. As with legacy computing, the image is relocated and used to boot a new instance. The original instance remains in service, but the state transferred to the new instance only matches that which existed when the snapshot was taken.
CL210-RHOSP10.1-en-2-20171006
Rendered for Nokia. Please do not distribute.
267
Chapter 6. Managing Resilient Compute Resources
Block Based Live Migration
Live migration transfers a running instance from its current location to a new compute node, maintaining active client connections and performing work during the migration. The current state of the original instance is transferred to the new instance. Applications and users communicating with the applications and services on this instance should not detect interruption, other than some slight delays discernible at the moment of the final hand off to the new instance. Restated, live migration involves transferring memory-based active kernel and process structures from one virtual machine to another, with the destination taking over the activity while the source is eventually discarded.
What about the disks on the source virtual machine, such as the root disk, extra ephemeral disks, swap disk, and persistent volumes? These disks also must be transferred and attached to the destination virtual machine. The method used, block based or shared storage, is directly related to the overcloud storage architecture that is implemented.
With the shared storage method, if both the source and destination compute nodes connect to and have sufficient access privileges for the same shared storage locations containing the migrating instance's disks, then no physical disk movement occurs. The source compute node stops using the disks while the destination compute node takes over disk activity.
Block-based live migration is the alternate method used when shared storage is not implemented. When the source and destination compute nodes do not share common-access storage, the root, ephemeral, swap, and persistent volumes must be transferred to the storage location used by the destination compute node. When performance is a primary focus, block-based live migration should be avoided. Instead, implement shared storage structures across common networks where live migration occurs regularly. A command sketch for requesting a block-based live migration appears after the requirements list below.
Block-based Live Migration Use Cases
Red Hat recommended practice for overcloud deployment is to install shared storage using a Ceph RBD storage cluster, but earlier use cases offered various configurations for managing instance disks:
• Original, proof of concept installations, such as default Packstack installations, used the Compute service (Nova) to manage non-persistent root disks, ephemeral disks, and swap disks. Instance virtual disks managed by the Compute service are found in subdirectories in /var/lib/nova/instances on each compute node's own disk.
• Although /var/lib/nova/instances can be shared across compute nodes using GlusterFS or NFS, the default configuration had each compute node maintaining disk storage for each instance scheduled to its hypervisor. An instance rescheduled or redeployed to another hypervisor would cause a duplicate set of that instance's disks to be deployed on the new compute node.
• Different compute nodes, even when operating in the same networks, can be connected to different storage arrays, Red Hat Virtualization data stores, or other back end storage subsystems.
• Instances can be deployed using the Block Storage service volume-based transient or persistent disks instead of using the Compute service ephemeral storage, but compute nodes configured with different back ends require block-based migration.
Implementation Requirements for Block Based Live Migration
There are specific requirements for implementing block-based live migration:
• Both source and destination compute nodes must be located in the same subnet.
• Both compute nodes must use the same processor type.
• All controller and compute nodes must have consistent name resolution for all other nodes.
• The UID and GID of the nova and libvirt users must be identical on all compute nodes.
• Compute nodes must be using KVM with libvirt, which is expected when using Red Hat OpenStack Platform. The KVM with libvirt platform has the best coverage of features and stability for live migration.
• The permissions and system access of local directories must be consistent across all nodes.
• libvirt must be able to securely communicate between nodes.
• Consistent multipath device naming must be used on both the source and destination compute nodes. Instances expect to resolve multipath device names similarly in both locations.
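Once these requirements are met, a block-based live migration can be requested with admin credentials. This is a sketch only: the instance name finance-web1 and the target hypervisor name are illustrative, and the --live and --block-migration options should be checked against openstack server migrate --help for the client version in use.
[user@demo ~]$ source overcloudrc
[user@demo ~]$ openstack hypervisor list -c "Hypervisor Hostname" -c State
[user@demo ~]$ openstack server migrate --live overcloud-compute-1.localdomain \
--block-migration finance-web1
[user@demo ~]$ openstack server show finance-web1 -c OS-EXT-SRV-ATTR:host -c status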
Configuring Block-based Live Migration
Preparing for block-based live migration requires configuration for secure transfer of disk blocks over TCP, opening firewall ports for the block transfer, adding common users and access across all compute nodes, and configuring controllers with the configured access information.
Secure TCP for Live Migration
There are three secure options for remote access over TCP that are typically used for live migration, using a libvirtd TCP socket with one of these methods to match your environment's authentication resources:
• TLS for encryption, X.509 client certificates for authentication
• GSSAPI/Kerberos for both encryption and authentication
• TLS for encryption, Kerberos for authentication
Edit the /etc/libvirt/libvirtd.conf file with the chosen strategy:
TCP Security Strategy Settings
TLS with X.509:
listen_tls = 1
listen_tcp = 0
auth_tls = "none"
tls_no_verify_certificate = 0
tls_allowed_dn_list = ["distinguished name"]
GSSAPI with Kerberos:
listen_tls = 0
listen_tcp = 1
auth_tcp = "sasl"
sasl_allowed_username_list = ["Kerberos principal name"]
TLS with Kerberos:
listen_tls = 1
listen_tcp = 0
auth_tls = "sasl"
sasl_allowed_username_list = ["Kerberos principal name"]
Inform libvirt about which security strategy is implemented. • Update the /etc/sysconfig/libvirtd file to include: LIBVIRTD_ARGS="--listen"
• Update the access URI string in /etc/nova/nova.conf to match the strategy. Use "live_migration_uri=qemu+ACCESSTYPE://USER@%s/system", where ACCESSTYPE is tcp or tls and USER is nova, or use %s, which defaults to the root user.
• Restart the libvirtd service. [root@compute]# systemctl restart libvirtd.service
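As an illustration only, the TLS with X509 strategy from the settings above might be wired together as follows. The file contents are assembled from the values already listed, the distinguished name remains a placeholder, and the crudini invocation mirrors the style used elsewhere in this chapter; treat this as a sketch rather than the required procedure.
# /etc/libvirt/libvirtd.conf
listen_tls = 1
listen_tcp = 0
auth_tls = "none"
tls_no_verify_certificate = 0
tls_allowed_dn_list = ["distinguished name"]
# /etc/sysconfig/libvirtd
LIBVIRTD_ARGS="--listen"
# Point the Compute service at the TLS transport, then restart libvirtd
[root@compute]# crudini --set /etc/nova/nova.conf DEFAULT live_migration_uri "qemu+tls://nova@%s/system"
[root@compute]# systemctl restart libvirtd.service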
Configure Compute Nodes for Live Migration
On each compute node, make the following configuration changes:
• Ensure that OpenStack utilities and the VNC proxy are installed, using: [root@compute]# yum -y install iptables openstack-utils \ openstack-nova-novncproxy
• Add the nova group to the /etc/group file with a line like the following: nova:x:162:nova
• Add the nova user to the /etc/passwd file with a line like the following: nova:x:162:162:OpenStack Nova Daemons:/var/lib/nova:/sbin/nologin
• Allow the nova user access to the compute node's ephemeral directory: [root@compute]# chown nova:nova /var/lib/nova/instances [root@compute]# chmod 775 /var/lib/nova/instances
• Add rules for TCP, TLS, and the ephemeral ports to the firewall: If using TCP: [root@compute]# iptables -v -I INPUT 1 -p tcp --dport 16509 -j ACCEPT [root@compute]# iptables -v -I INPUT -p tcp --dport 49152:49261 -j ACCEPT
If using TLS: [root@compute]# iptables -v -I INPUT 1 -p tcp --dport 16514 -j ACCEPT [root@compute]# iptables -v -I INPUT -p tcp --dport 49152:49261 -j ACCEPT
• Save the firewall rules. [root@compute]# service iptables save
• Update Qemu with three settings in the /etc/libvirt/qemu.conf file: user="root" group="root" vnc_listen="0.0.0.0"
• Restart the libvirtd service to reflect these changes: [root@compute]# systemctl restart libvirtd.service
• Make the following changes to the compute service configuration file: [root@compute]# crudini --set /etc/nova/nova.conf DEFAULT \ instances_path /var/lib/nova/instances [root@compute]# crudini --set /etc/nova/nova.conf DEFAULT \ novncproxy_base_url http://controller0.overcloud.example.com:6080/vnc_auto.html [root@compute]# crudini --set /etc/nova/nova.conf DEFAULT \ vncserver_listen 0.0.0.0 [root@compute]# crudini --set /etc/nova/nova.conf DEFAULT \ block_migration_flag \ VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,\ VIR_MIGRATE_NON_SHARED_INC
• Restart the firewall and the OpenStack services: [root@compute]# systemctl restart iptables.service [user@compute]$ sudo openstack-service restart
Configure Controller Nodes for Live Migration
On each controller node, make the following configuration changes:
• Ensure that OpenStack utilities and the VNC proxy are installed, using: [root@controller]# yum -y install openstack-utils openstack-nova-novncproxy
• Make the following changes to the compute service configuration file: [root@controller]# crudini --set /etc/nova/nova.conf DEFAULT \ vncserver_listen 0.0.0.0
• Restart the OpenStack services: [user@controller]$ openstack-service restart
Migrate an Instance Using Block-based Live Migration
Locate the instance to be migrated, and verify the size and settings required. List available compute nodes by checking for active nova-compute services. Ensure that the intended destination compute node has sufficient resources for the migration. Invoke the migration using the syntax for block-based live migration: [user@workstation ~]$ openstack server migrate --block-migration \ --live dest_compute_node instance
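For example, assuming an instance named web1 currently running on overcloud-compute-1 and a destination of overcloud-compute-0 (hypothetical names that follow the classroom conventions), the pre-checks and the migration might look like this:
[user@workstation ~]$ openstack compute service list -c Binary -c Host -c State    # confirm nova-compute is up on the destination
[user@workstation ~]$ openstack host show overcloud-compute-0.localdomain          # compare (total) against (used_now)
[user@workstation ~]$ openstack server migrate --block-migration \
--live overcloud-compute-0.localdomain web1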
Troubleshooting
When migrations fail or appear to take too long, check the activity in the Compute service log files on both the source and the destination compute nodes:
• /var/log/nova/nova-api.log
• /var/log/nova/nova-compute.log
• /var/log/nova/nova-conductor.log
• /var/log/nova/nova-scheduler.log
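One hedged way to watch a migration in flight is to follow the compute log on each node and filter on the instance ID; the UUID below is a placeholder for the instance being migrated.
[root@compute]# tail -f /var/log/nova/nova-compute.log | grep -i INSTANCE_UUID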
Migrating Instances with Block Storage
The following steps outline the process for the live migration of an instance using the block storage method; a short verification example follows the list.
1. Ensure that the overcloud has more than one compute node added.
2. Configure block storage and live migration on all compute nodes. Ensure that SELinux is set to permissive mode, and appropriate iptables rules are configured.
3. On the controller node, update the vncserver_listen variable to listen for all connections in the /etc/nova/nova.conf file.
4. As an administrator, ensure the instance to be migrated is in a running state.
5. Using the administrator credentials, live migrate the instance to the destination compute node using the openstack server migrate command.
6. Verify that the instance migrated successfully to the destination compute node.
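A minimal sketch of that final verification, using a hypothetical instance name, mirrors the check performed in the guided exercise that follows:
[user@workstation ~]$ openstack server show web1 -f json | grep hypervisor_hostname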
References Further information is available for Configuring Block Migration in the Migrating Instances guide for Red Hat OpenStack Platform 10 at https://access.redhat.com/documentation/en/red-hat-openstack-platform/ Further information is available for Migrating Live Instances in the Migrating Instances guide for Red Hat OpenStack Platform 10 at https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Migrating Instances using Block Storage
In this exercise, you will migrate a live instance using block storage.
Outcomes
You should be able to:
• Configure block storage.
• Live migrate an instance using block storage.
Before you begin
Log in to workstation as student with a password of student. This guided exercise requires two compute nodes, as configured in a previous guided exercise which added compute1 to the overcloud. If you did not successfully complete that guided exercise, have reset your overcloud systems, or for any reason have an overcloud with only a single installed compute node, you must first run the command lab resilience-block-storage add-compute on workstation. The command's add-compute task adds the compute1 node to the overcloud, taking between 40 and 90 minutes to complete.
Important As described above, only run this command if you still need to install a second compute node. If you already have two functioning compute nodes, skip this task and continue with the setup task. [student@workstation ~]$ lab resilience-block-storage add-compute
After the add-compute task has completed successfully, continue with the setup task in the following paragraph.
Start with the setup task if you have two functioning compute nodes, either from having completed the previous overcloud scaling guided exercise, or by completing the extra add-compute task described above. On workstation, run the lab resilience-block-storage setup command. This command verifies the OpenStack environment and creates the project resources used in this exercise. [student@workstation ~]$ lab resilience-block-storage setup
Steps 1. Configure compute0 to use block-based migration. Later in this exercise, you will repeat these steps on compute1. 1.1. Log into compute0 as heat-admin and switch to the root user. [student@workstation ~]$ ssh heat-admin@compute0 [heat-admin@overcloud-compute-0 ~]$ sudo -i
[root@overcloud-compute-0 ~]#
1.2. Configure iptables for live migration. [root@overcloud-compute-0 ~]# iptables -v -I INPUT 1 -p tcp \ --dport 16509 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509 [root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \ --dport 49152:49261 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261 [root@overcloud-compute-0 ~]# service iptables save
1.3. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf. Include the following lines at the bottom of the file. user="root" group="root" vnc_listen="0.0.0.0"
1.4. The classroom overcloud deployment uses Ceph as shared storage by default. Demonstrating block-based migration requires disabling shared storage for the Compute service. Enable the compute0 node to store virtual disk images, associated with running instances, locally under /var/lib/nova/instances. Edit the /etc/nova/nova.conf file to set the images_type variable to default. [root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \ libvirt images_type default
1.5. Configure /etc/nova/nova.conf for block-based live migration.
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT block_migration_flag VIR_MIGRATE_UNDEFINE_SOURCE,\
VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_NON_SHARED_INC
1.6. Restart OpenStack services and log out of compute0. [root@overcloud-compute-0 ~]# openstack-service restart [root@overcloud-compute-0 ~]# exit [heat-admin@overcloud-compute-0 ~]$ exit [student@workstation ~]$
2. Configure compute1 to use block-based migration.
2.1. Log into compute1 as heat-admin and switch to the root user.
[student@workstation ~]$ ssh heat-admin@compute1 [heat-admin@overcloud-compute-1 ~]$ sudo -i [root@overcloud-compute-1 ~]#
2.2. Configure iptables for live migration. [root@overcloud-compute-1 ~]# iptables -v -I INPUT 1 -p tcp \ --dport 16509 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509 [root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \ --dport 49152:49261 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261 [root@overcloud-compute-1 ~]# service iptables save
2.3. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf. Include the following lines at the bottom of the file. user="root" group="root" vnc_listen="0.0.0.0"
2.4. The classroom overcloud deployment uses Ceph as shared storage by default. Demonstrating block-based migration requires disabling shared storage for the Compute service. Enable the compute1 node to store virtual disk images, associated with running instances, locally under /var/lib/nova/instances. Edit the /etc/nova/nova.conf file to set the images_type variable to default. [root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \ libvirt images_type default
2.5. Configure /etc/nova/nova.conf for block-based live migration.
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT block_migration_flag VIR_MIGRATE_UNDEFINE_SOURCE,\
VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_NON_SHARED_INC
2.6. Restart OpenStack services and log out of compute1. [root@overcloud-compute-1 ~]# openstack-service restart [root@overcloud-compute-1 ~]# exit [heat-admin@overcloud-compute-1 ~]$ exit [student@workstation ~]$
3. Configure controller0 for block-based live migration.
3.1. Log into controller0 as heat-admin and switch to the root user. [student@workstation ~]$ ssh heat-admin@controller0 [heat-admin@overcloud-controller-0 ~]$ sudo -i [root@overcloud-controller-0 ~]#
3.2. Update the vncserver_listen variable in /etc/nova/nova.conf. [root@overcloud-controller-0 ~]# openstack-config --set /etc/nova/nova.conf \ DEFAULT vncserver_listen 0.0.0.0
3.3. Restart the OpenStack Compute services. Exit controller0. [root@overcloud-controller-0 ~]# openstack-service restart nova [root@overcloud-controller-0 ~]# exit [heat-admin@overcloud-controller-0 ~]$ exit [student@workstation ~]$
4. From workstation, source the /home/student/developer1-finance-rc environment file and launch an instance as the user developer1 using the following attributes:
Instance Attributes
• flavor: m1.web
• key pair: developer1-keypair1
• network: finance-network1
• image: rhel7
• security group: finance-web
• name: finance-web1
[student@workstation ~]$ source ~/developer1-finance-rc [student@workstation ~(developer1-finance)]$ openstack server create \ --flavor m1.web \ --key-name developer1-keypair1 \ --nic net-id=finance-network1 \ --security-group finance-web \ --image rhel7 finance-web1 --wait ...output omitted...
5.
List the available floating IP addresses, then allocate one to the finance-web1 instance. 5.1. List the floating IPs. An available one has the Port attribute set to None. [student@workstation ~(developer1-finance)]$ openstack floating ip list \ -c "Floating IP Address" -c Port +---------------------+------+ | Floating IP Address | Port | +---------------------+------+ | 172.25.250.N | None |
+---------------------+------+
5.2. Attach an available floating IP to the instance finance-web1. [student@workstation ~(developer1-finance)]$ openstack server add \ floating ip finance-web1 172.25.250.N
5.3. Log in to the finance-web1 instance using /home/student/developer1-keypair1.pem with ssh to ensure it is working properly, then log out of the instance. [student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \ cloud-user@172.25.250.N Warning: Permanently added '172.25.250.N' (ECDSA) to the list of known hosts. [cloud-user@finance-web1 ~]$ exit [student@workstation ~(developer1-finance)]$
6. Migrate the instance finance-web1 using block-based live migration.
6.1. To perform live migration, the user developer1 must have the admin role assigned for the project finance. Assign the admin role to developer1 for the project finance. The developer1 user may already have been assigned the admin role.
[student@workstation ~(developer1-finance)]$ source ~/admin-rc
[student@workstation ~(admin-admin)]$ openstack role add --user \
developer1 --project finance admin
[student@workstation ~(admin-admin)]$ source ~/developer1-finance-rc
6.2. Determine whether the instance is currently running on overcloud-compute-0 or overcloud-compute-1. This example starts with the instance running on overcloud-compute-1. [student@workstation ~(developer1-finance)]$ openstack server show \ finance-web1 -f json | grep compute "OS-EXT-SRV-ATTR:host": "overcloud-compute-1.localdomain", "OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-1.localdomain",
6.3. Prior to migration, ensure the destination compute node has sufficient resources to host the instance. In this example, the current server instance location node is overcloudcompute-1.localdomain, and the destination to check is overcloud-compute-0. Modify the command to reflect your actual source and destination compute nodes. Estimate whether the total minus the amount used now is sufficient. [student@workstation ~(developer1-finance)]$ openstack host show \ overcloud-compute-0.localdomain -f json [ { "Project": "(total)", "Disk GB": 39, "Host": "overcloud-compute-0.localdomain", "CPU": 2, "Memory MB": 6143 }, {
"Project": "(used_now)", "Disk GB": 0, "Host": "overcloud-compute-0.localdomain", "CPU": 0, "Memory MB": 2048 }, { "Project": "(used_max)", "Disk GB": 0, "Host": "overcloud-compute-0.localdomain", "CPU": 0, "Memory MB": 0 }
6.4. Migrate the instance finance-web1 to a new compute node. In this example, the instance is migrated from overcloud-compute-1 to overcloud-compute-0. Your scenario may require migrating in the reverse direction. [student@workstation ~(developer1-finance)]$ openstack server migrate \ --block-migration \ --live overcloud-compute-0.localdomain \ --wait finance-web1 Complete
7. Use the command openstack server show to verify that the migration of finance-web1 using block storage migration was successful. The compute node displayed should be the destination node. [student@workstation ~(developer1-finance)]$ openstack server show \ finance-web1 -f json | grep compute "OS-EXT-SRV-ATTR:host": "overcloud-compute-0.localdomain", "OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-0.localdomain",
Cleanup From workstation, run the lab resilience-block-storage cleanup command to clean up this exercise. [student@workstation ~]$ lab resilience-block-storage cleanup
Important - your next step After this Guided Exercise, if you intend to either continue directly to ending Chapter Lab or skip directly to the next Chapter, you must first reset your virtual machines. Save any data that you would like to keep from the virtual machines. After the data is saved, reset all of the virtual machines. In the physical classroom environment, use the rht-vmctl reset all command. In the online classroom environment, delete the current lab environment and provision a new lab environment. If you intend to repeat either of the two Live Migration Guided Exercises in this chapter that require two compute nodes, do not reset your virtual machines. Because your overcloud currently has two functioning compute nodes, you may repeat the Live Migration Guided Exercises without running the add-compute task that was required to build the second compute node.
Migrating Instances with Shared Storage
Objectives
After completing this section, students should be able to:
• Configure shared storage for the Compute services.
• Perform instance live migration with shared storage.
Shared Storage for Live Migration
Live migration using shared storage is one of the two methods used for live migration. With the shared storage method, if both the source and destination compute nodes connect to, and have sufficient access privileges for, the same shared storage locations containing the migrating instance's disks, then no disk data transfer occurs. The source compute node stops using the disks while the destination compute node takes over disk activity.
When the openstack server migrate command is issued, the source sends the instance's memory content to the destination. During the transfer, memory pages on the source host are still being modified in real time. The source host tracks the memory pages that were modified during the transfer and retransmits them after the initial bulk transfer is completed. The instance's memory content must therefore be transferred faster than memory pages are written on the source virtual machine. After all retransmittal is complete, an identical instance is started on the destination host. In parallel, the virtual network infrastructure redirects the network traffic.
Live migration using block storage follows a similar process. However, with block storage, disk content is copied before the memory content is transferred, which makes live migration with shared storage quicker and more efficient.
Live migration configuration options
The following live migration configuration options are available for libvirt in /etc/nova/nova.conf, shown with their default values; an example of adjusting them follows the list.
• live_migration_retry_count = 30: Number of retries needed in live migration.
• max_concurrent_live_migrations = 1: Maximum number of concurrent live migrations to run.
• live_migration_bandwidth = 0: Maximum bandwidth to be used in MiB/s. If set to 0, a suitable value is chosen automatically.
• live_migration_completion_timeout = 800: Timeout value in seconds for a successful migration to complete before aborting the operation.
• live_migration_downtime = 500: Maximum permitted downtime, in milliseconds.
• live_migration_downtime_delay = 75: Time to wait, in seconds, between each step increase of the migration downtime.
• live_migration_downtime_steps = 10: Number of incremental steps to reach the maximum downtime value.
• live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED: Migration flags to be set for live migration.
• live_migration_progress_timeout = 150: Time to wait, in seconds, for migration to make progress in transferring data before aborting the operation.
• live_migration_uri = qemu+tcp://%s/system: Migration target URI.
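As a hedged example, these values can be adjusted with the same openstack-config/crudini tooling the course uses for other nova.conf changes. The numbers below are illustrative rather than recommendations, and the section that holds them (DEFAULT here, as in this course's other examples) should be confirmed against the nova.conf shipped with your release.
[root@compute]# openstack-config --set /etc/nova/nova.conf DEFAULT live_migration_completion_timeout 600
[root@compute]# openstack-config --set /etc/nova/nova.conf DEFAULT live_migration_bandwidth 100
[root@compute]# openstack-service restart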
Configuring Shared Storage Live Migration
Preparing for shared storage based live migration requires configuring a shared file system, opening firewall ports, ensuring common users and access across all compute nodes, and configuring controllers with the required access information.
Secure TCP for Live Migration
There are three secure options for remote access over TCP that are typically used for live migration. Use a libvirtd TCP socket, with one of these methods to match your environment's authentication resources:
• TLS for encryption, X.509 client certificates for authentication
• GSSAPI/Kerberos for both encryption and authentication
• TLS for encryption, Kerberos for authentication
Edit the /etc/libvirt/libvirtd.conf file with the chosen strategy:
TCP Security Strategy Settings
• TLS with X509: listen_tls = 1, listen_tcp = 0, auth_tls = "none", tls_no_verify_certificate = 0, tls_allowed_dn_list = ["distinguished name"]
• GSSAPI with Kerberos: listen_tls = 0, listen_tcp = 1, auth_tcp = "sasl", sasl_allowed_username_list = ["Kerberos principal name"]
• TLS with Kerberos: listen_tls = 1, listen_tcp = 0, auth_tls = "sasl", sasl_allowed_username_list = ["Kerberos principal name"]
Inform libvirt about which security strategy is implemented.
• Update the /etc/sysconfig/libvirtd file to include: LIBVIRTD_ARGS="--listen"
• Update the access URI string in /etc/nova/nova.conf to match the strategy. Use "live_migration_uri=qemu+ACCESSTYPE://USER@%s/system", where ACCESSTYPE is tcp or tls and USER is nova, or use '%s', which defaults to the root user.
• Restart the libvirtd service. [user@compute ~]$ sudo systemctl restart libvirtd.service
Shared Storage Live Migration Configuration for Controllers
The following outlines the process for configuring controllers for live migration using shared storage; a condensed example of steps 4 through 6 follows the list.
1. Ensure that the nfs-utils, openstack-nova-novncproxy, and openstack-utils packages are installed.
2. Configure /etc/sysconfig/nfs to set fixed ports for NFS server services.
3. Add firewall rules for NFS, TCP, TLS, and Portmap.
4. Configure the /etc/exports file to export /var/lib/nova/instances to the compute nodes.
5. Start and enable the NFS service.
6. Export the NFS directory.
7. Update /etc/nova/nova.conf with vncserver_listen 0.0.0.0 to enable VNC access.
8. Restart the OpenStack services.
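A condensed sketch of steps 4 through 6 on a controller follows; the compute node addresses are the classroom values used in the guided exercise later in this section and are assumptions outside that environment.
[root@controller]# echo '/var/lib/nova/instances 172.25.250.2(rw,sync,fsid=0,no_root_squash)' >> /etc/exports
[root@controller]# echo '/var/lib/nova/instances 172.25.250.12(rw,sync,fsid=0,no_root_squash)' >> /etc/exports
[root@controller]# systemctl enable nfs --now
[root@controller]# exportfs -r
[root@controller]# exportfs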
Shared Storage Live Migration Configuration for Compute
The following outlines the process for configuring the compute nodes for live migration using shared storage.
1. Ensure that the nfs-utils and openstack-utils packages are installed.
2. Add rules for TCP, TLS, and the ephemeral ports to the firewall.
3. Update qemu with three settings in /etc/libvirt/qemu.conf for user, group, and vnc_listen.
4. Restart the libvirtd service to activate these changes.
5. Edit nova.conf to set Compute service configuration parameters.
6. Restart the compute node services.
Migrating an Instance with Shared Storage
The following steps outline the process for live migrating an instance using the shared storage method; an example invocation follows the list.
1. Determine which compute node the instance is currently running on.
2. Ensure the destination compute node has sufficient resources to host the instance.
3. Migrate the instance from one compute node to another by using the openstack server migrate command.
4. Verify that the instance has migrated successfully.
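For instance, with a hypothetical instance named web1 and the classroom compute node names, the final two steps map to:
[user@workstation ~]$ openstack server migrate --shared-migration \
--live overcloud-compute-1.localdomain --wait web1
[user@workstation ~]$ openstack server show web1 -f json | grep hypervisor_hostname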
Troubleshooting When migration fails or takes too long, check the activity in the Compute service log files on both the source and the destination compute nodes: • /var/log/nova/nova-api.log • /var/log/nova/nova-compute.log • /var/log/nova/nova-conductor.log • /var/log/nova/nova-scheduler.log
References Further information is available for Configuring NFS Shared Storage in the Migrating Instances guide for Red Hat OpenStack Platform 10 at https://access.redhat.com/documentation/en/red-hat-openstack-platform/ Further information is available for Migrating Live Instances in the Migrating Instances guide for Red Hat OpenStack Platform 10 at https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Migrating Instances with Shared Storage
In this exercise, you will configure shared storage and migrate a live instance.
Outcomes
You should be able to:
• Configure shared storage.
• Live migrate an instance using shared storage.
Before you begin
Log in to workstation as student with a password of student. This guided exercise requires two compute nodes, as configured in a previous guided exercise which added compute1 to the overcloud. If you did not successfully complete that guided exercise, have reset your overcloud systems, or for any reason have an overcloud with only a single installed compute node, you must first run the command lab resilience-shared-storage add-compute on workstation. The command's add-compute task adds the compute1 node to the overcloud, taking between 40 and 90 minutes to complete.
Important As described above, only run this command if you still need to install a second compute node. If you already have two functioning compute nodes, skip this task and continue with the setup task. [student@workstation ~]$ lab resilience-shared-storage add-compute
After the add-compute task has completed successfully, continue with the setup task in the following paragraph.
Start with the setup task if you have two functioning compute nodes, either from having completed the previous overcloud scaling guided exercise, or by completing the extra add-compute task described above. On workstation, run the lab resilience-shared-storage setup command. This command verifies the OpenStack environment and creates the project resources used in this exercise. [student@workstation ~]$ lab resilience-shared-storage setup
Steps 1. Configure controller0 for shared storage. 1.1. Log into controller0 as heat-admin and switch to the root user. [student@workstation ~]$ ssh heat-admin@controller0 [heat-admin@overcloud-controller-0 ~]$ sudo -i [root@overcloud-controller-0 ~]#
1.2. Install the nfs-utils package. [root@overcloud-controller-0 ~]# yum -y install nfs-utils
1.3. Configure iptables for NFSv4 shared storage.
[root@overcloud-controller-0 ~]# iptables -v -I INPUT \
-p tcp --dport 2049 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:2049
[root@overcloud-controller-0 ~]# service iptables save
1.4. Configure /etc/exports to export /var/lib/nova/instances via NFS to compute0 and compute1. Add the following lines to the bottom of the file. /var/lib/nova/instances 172.25.250.2(rw,sync,fsid=0,no_root_squash) /var/lib/nova/instances 172.25.250.12(rw,sync,fsid=0,no_root_squash)
1.5. Enable and start the NFS service. [root@overcloud-controller-0 ~]# systemctl enable nfs --now
1.6. Confirm the directory is exported. [root@overcloud-controller-0 ~]# exportfs /var/lib/nova/instances 172.25.250.2 /var/lib/nova/instances 172.25.250.12
1.7. Update the vncserver_listen variable in /etc/nova/nova.conf. [root@overcloud-controller-0 ~]# openstack-config --set /etc/nova/nova.conf \ DEFAULT vncserver_listen 0.0.0.0
1.8. Restart OpenStack Compute services, then log out of controller0. [root@overcloud-controller-0 ~]# openstack-service restart nova [root@overcloud-controller-0 ~]# exit [heat-admin@overcloud-controller-0 ~]$ exit [student@workstation ~]$
2. Configure compute0 to use shared storage. Later in this exercise, you will repeat these steps on compute1.
2.1. Log into compute0 as heat-admin and switch to the root user. [student@workstation ~]$ ssh heat-admin@compute0 [heat-admin@overcloud-compute-0 ~]$ sudo -i [root@overcloud-compute-0 ~]#
2.2. Configure /etc/fstab to mount the directory /var/lib/nova/instances, exported from controller0. Add the following line to the bottom of the file. Confirm that the entry is on a single line in the file; the two-line display here in the book is due to insufficient width. 172.25.250.1:/ /var/lib/nova/instances nfs4 context="system_u:object_r:nova_var_lib_t:s0" 0 0
2.3. Mount the export from controller0 on /var/lib/nova/instances. [root@overcloud-compute-0 ~]# mount -v /var/lib/nova/instances
2.4. Configure iptables to allow shared storage live migration. [root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \ --dport 16509 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509 [root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \ --dport 49152:49261 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261 [root@overcloud-compute-0 ~]# service iptables save
2.5. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf Add the following lines to the bottom of the file. user="root" group="root" vnc_listen="0.0.0.0"
2.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live migration. Use the NFS-mounted /var/lib/nova/instances directory to store instance virtual disks.
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT live_migration_flag \
VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
2.7. Restart OpenStack services and log out of compute0. [root@overcloud-compute-0 ~]# openstack-service restart [root@overcloud-compute-0 ~]# exit [heat-admin@overcloud-compute-0 ~]$ exit [student@workstation ~]$
3. Configure compute1 to use shared storage.
3.1. Log into compute1 as heat-admin and switch to the root user. [student@workstation ~]$ ssh heat-admin@compute1 [heat-admin@overcloud-compute-1 ~]$ sudo -i [root@overcloud-compute-1 ~]#
3.2. Configure /etc/fstab to mount the directory /var/lib/nova/instances, exported from controller0. Add the following line to the bottom of the file. Confirm that the entry is on a single line in the file; the two line display here in the book is due to insufficient width. 172.25.250.1:/ /var/lib/nova/instances nfs4 context="system_u:object_r:nova_var_lib_t:s0" 0 0
3.3. Mount the export from controller0 on /var/lib/nova/instances. [root@overcloud-compute-1 ~]# mount -v /var/lib/nova/instances
3.4. Configure iptables for live migration. [root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \ --dport 16509 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509 [root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \ --dport 49152:49261 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261 [root@overcloud-compute-1 ~]# service iptables save
3.5. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf Add the following lines to the bottom of the file. user="root" group="root" vnc_listen="0.0.0.0"
3.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live migration. Use the NFS-mounted /var/lib/nova/instances directory to store instance virtual disks.
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT live_migration_flag \
VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
3.7. Restart OpenStack services and log out of compute1. [root@overcloud-compute-1 ~]# openstack-service restart [root@overcloud-compute-1 ~]# exit [heat-admin@overcloud-compute-1 ~]$ exit [student@workstation ~]$
4. From workstation, source the /home/student/developer1-finance-rc environment file, and launch an instance as the user developer1 using the following attributes:
Instance Attributes
• flavor: m1.web
• key pair: developer1-keypair1
• network: finance-network1
• image: rhel7
• security group: finance-web
• name: finance-web2
[student@workstation ~]$ source ~/developer1-finance-rc [student@workstation ~(developer1-finance)]$ openstack server create \ --flavor m1.web \ --key-name developer1-keypair1 \ --nic net-id=finance-network1 \ --security-group finance-web \ --image rhel7 finance-web2 --wait ...output omitted...
5.
List the available floating IP addresses, then allocate one to the finance-web2 instance. 5.1. List the floating IPs. An available one has the Port attribute set to None. [student@workstation ~(developer1-finance)]$ openstack floating ip list \ -c "Floating IP Address" -c Port +---------------------+------+ | Floating IP Address | Port | +---------------------+------+ | 172.25.250.N | None | +---------------------+------+
5.2. Attach an available floating IP to the instance finance-web2. [student@workstation ~(developer1-finance)]$ openstack server add \ floating ip finance-web2 172.25.250.N
5.3. Log in to the finance-web2 instance using /home/student/developer1-keypair1.pem with ssh to ensure it is working properly, then log out of the instance. [student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \
cloud-user@172.25.250.P Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts. [cloud-user@finance-web2 ~]$ exit [student@workstation ~(developer1-finance)]$
6. Migrate the instance finance-web2 using shared storage migration.
6.1. To perform live migration, the developer1 user must have the admin role assigned for the project finance. Assign the admin role to developer1 for the project finance. The developer1 user may already have been assigned the admin role.
[student@workstation ~(developer1-finance)]$ source ~/admin-rc
[student@workstation ~(admin-admin)]$ openstack role add --user \
developer1 --project finance admin
[student@workstation ~(admin-admin)]$ source ~/developer1-finance-rc
6.2. Determine whether the instance is currently running on overcloud-compute-0 or overcloud-compute-1. In the following example the instance is running on overcloud-compute-1. However, your instance may be running on overcloudcompute-0. [student@workstation ~(developer1-finance)]$ openstack server show \ finance-web2 -f json | grep compute "OS-EXT-SRV-ATTR:host": "overcloud-compute-1.localdomain", "OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-1.localdomain",
6.3. Prior to migration, ensure the destination compute node has sufficient resources to host the instance. In this example, the current server instance location node is overcloudcompute-1.localdomain, and the destination to check is overcloud-compute-0. Modify the command to reflect your actual source and destination compute nodes. Estimate whether the total, minus the amount used now, is sufficient. [student@workstation ~(developer1-finance)]$ openstack host show \ overcloud-compute-0.localdomain -f json [ { "Project": "(total)", "Disk GB": 56, "Host": "overcloud-compute-0.localdomain", "CPU": 2, "Memory MB": 6143 }, { "Project": "(used_now)", "Disk GB": 0, "Host": "overcloud-compute-0.localdomain", "CPU": 0, "Memory MB": 2048 }, { "Project": "(used_max)", "Disk GB": 0, "Host": "overcloud-compute-0.localdomain", "CPU": 0, "Memory MB": 0 }
6.4. Migrate the instance finance-web2 to a new compute node. In this example, the instance is migrated from overcloud-compute-1 to overcloud-compute-0. Your scenario may require migrating in the opposite direction. [student@workstation ~(developer1-finance)]$ openstack server migrate \ --shared-migration \ --live overcloud-compute-0.localdomain \ --wait finance-web2 Complete
7.
Use the command openstack server show to verify that finance-web2 is now running on the other compute node. [student@workstation ~(developer1-finance)]$ openstack server show \ finance-web2 -f json | grep compute "OS-EXT-SRV-ATTR:host": "overcloud-compute-0.localdomain", "OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-0.localdomain",
Cleanup From workstation, run the lab resilience-shared-storage cleanup command to clean up this exercise. [student@workstation ~]$ lab resilience-shared-storage cleanup
Important - your next step After this Guided Exercise, if you intend to either continue directly to ending Chapter Lab or skip directly to the next Chapter, you must first reset your virtual machines. Save any data that you would like to keep from the virtual machines. After the data is saved, reset all of the virtual machines. In the physical classroom environment, use the rht-vmctl reset all command. In the online classroom environment, delete the current lab environment and provision a new lab environment. If you intend to repeat either of the two Live Migration Guided Exercises in this chapter that require two compute nodes, do not reset your virtual machines. Because your overcloud currently has two functioning compute nodes, you may repeat the Live Migration Guided Exercises without running the add-compute task that was required to build the second compute node.
Lab: Managing Resilient Compute Resources In this lab, you will add compute nodes, manage shared storage, and perform instance live migration. Resources Files:
http://materials.example.com/instackenv-onenode.json
Outcomes You should be able to: • Add a compute node. • Configure shared storage. • Live migrate an instance using shared storage. Before you begin If you have not done so already, save any data that you would like to keep from the virtual machines. After the data is saved, reset all of the virtual machines. In the physical classroom environment, reset all of the virtual machines using the command rht-vmctl. In the online environment, delete and provision a new classroom lab environment. Log in to workstation as student with a password of student. On workstation, run the lab resilience-review setup command. The script ensures that OpenStack services are running and the environment has been properly configured for the lab. [student@workstation ~]$ lab resilience-review setup
Steps
1. Use SSH to connect to director as the user stack and source the stackrc credentials file.
2. Prepare compute1 for introspection. Use the details available in the http://materials.example.com/instackenv-onenode.json file.
3. Initiate introspection of compute1. Introspection may take a few minutes.
4. Update the node profile for compute1 to use the compute profile.
5. Configure 00-node-info.yaml to scale to two compute nodes.
6. Deploy the overcloud to scale compute by adding one node.
7. Prepare compute1 for the next part of the lab. [student@workstation ~]$ lab resilience-review prep-compute1
8. Configure controller0 for shared storage.
9. Configure shared storage for compute0.
10. Configure shared storage for compute1.
11. Launch an instance named production1 as the user operator1 using the following attributes:
Instance Attributes
• flavor: m1.web
• key pair: operator1-keypair1
• network: production-network1
• image: rhel7
• security group: production
• name: production1
12. List the available floating IP addresses, then allocate one to the production1 instance.
13. Ensure that the production1 instance is accessible by logging in to the instance as the user cloud-user, then log out of the instance.
14. Migrate the instance production1 using shared storage.
15. Verify that the migration of production1 using shared storage was successful.
Evaluation
From workstation, run the lab resilience-review grade command to confirm the success of this exercise. Correct any reported failures and rerun the command until successful. [student@workstation ~]$ lab resilience-review grade
Cleanup Save any data that you would like to keep from the virtual machines. After the data is saved, reset all of the overcloud virtual machines and the director virtual machine. In the physical classroom environment, reset all of the overcloud virtual machines and the director virtual machine using the rht-vmctl command. In the online environment, reset and start the director and overcloud nodes.
Solution In this lab, you will add compute nodes, manage shared storage, and perform instance live migration. Resources Files:
http://materials.example.com/instackenv-onenode.json
Outcomes You should be able to: • Add a compute node. • Configure shared storage. • Live migrate an instance using shared storage. Before you begin If you have not done so already, save any data that you would like to keep from the virtual machines. After the data is saved, reset all of the virtual machines. In the physical classroom environment, reset all of the virtual machines using the command rht-vmctl. In the online environment, delete and provision a new classroom lab environment. Log in to workstation as student with a password of student. On workstation, run the lab resilience-review setup command. The script ensures that OpenStack services are running and the environment has been properly configured for the lab. [student@workstation ~]$ lab resilience-review setup
Steps 1. Use SSH to connect to director as the user stack and source the stackrc credentials file. [student@workstation ~]$ ssh stack@director [stack@director ~]$
2. Prepare compute1 for introspection. Use the details available in the http://materials.example.com/instackenv-onenode.json file.
2.1. Download the instackenv-onenode.json file from http://materials.example.com to /home/stack for introspection of compute1. [stack@director ~]$ wget http://materials.example.com/instackenv-onenode.json
2.2. Verify that the instackenv-onenode.json file is for compute1. [stack@director ~]$ cat ~/instackenv-onenode.json { "nodes": [ { "pm_user": "admin",
"arch": "x86_64", "name": "compute1", "pm_addr": "172.25.249.112", "pm_password": "password", "pm_type": "pxe_ipmitool", "mac": [ "52:54:00:00:f9:0c" ], "cpu": "2", "memory": "6144", "disk": "40" } ] }
2.3. Import instackenv-onenode.json into the baremetal service using openstack baremetal import, and ensure that the node has been properly imported. [stack@director ~]$ openstack baremetal import --json \ /home/stack/instackenv-onenode.json Started Mistral Workflow. Execution ID: 8976a32a-6125-4c65-95f1-2b97928f6777 Successfully registered node UUID b32d3987-9128-44b7-82a5-5798f4c2a96c Started Mistral Workflow. Execution ID: 63780fb7-bff7-43e6-bb2a-5c0149bc9acc Successfully set all nodes to available [stack@director ~]$ openstack baremetal node list \ -c Name -c 'Power State' -c 'Provisioning State' -c Maintenance +-------------+--------------------+-------------+-------------+ | Name | Provisioning State | Power State | Maintenance | +-------------+--------------------+-------------+-------------+ | controller0 | active | power on | False | | compute0 | active | power on | False | | ceph0 | active | power on | False | | compute1 | available | power off | False | +-------------+--------------------+-------------+-------------+
2.4. Prior to starting introspection, set the provisioning state for compute1 to manageable. [stack@director ~]$ openstack baremetal node manage compute1
3.
Initiate introspection of compute1. Introspection may take a few minutes. [stack@director ~]$ openstack overcloud node introspect \ --all-manageable --provide Started Mistral Workflow. Execution ID: d9191784-e730-4179-9cc4-a73bc31b5aec Waiting for introspection to finish... ...output omitted...
4. Update the node profile for compute1 to use the compute profile. [stack@director ~]$ openstack baremetal node set compute1 \ --property "capabilities=profile:compute,boot_option:local"
5. Configure 00-node-info.yaml to scale to two compute nodes. Update the ComputeCount line as follows.
ComputeCount: 2
6.
Deploy the overcloud, to scale compute by adding one node. [stack@director ~]$ openstack overcloud deploy \ --templates ~/templates \ --environment-directory ~/templates/cl210-environment Removing the current plan files Uploading new plan files Started Mistral Workflow. Execution ID: 6de24270-c3ed-4c52-8aac-820f3e1795fe Plan updated Deploying templates in the directory /tmp/tripleoclient-WnZ2aA/tripleo-heattemplates Started Mistral Workflow. Execution ID: 50f42c4c-d310-409d-8d58-e11f993699cb ...output omitted...
7. Prepare compute1 for the next part of the lab. [student@workstation ~]$ lab resilience-review prep-compute1
8.
Configure controller0 for shared storage. 8.1. Log into controller0 as heat-admin and switch to the root user. [student@workstation ~]$ ssh heat-admin@controller0 [heat-admin@overcloud-controller-0 ~]$ sudo -i [root@overcloud-controller-0 ~]#
8.2. Install the nfs-utils package. [root@overcloud-controller-0 ~]# yum -y install nfs-utils
8.3. Configure iptables for NFSv4 shared storage.
[root@overcloud-controller-0 ~]# iptables -v -I INPUT \
-p tcp --dport 2049 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:2049
[root@overcloud-controller-0 ~]# service iptables save
8.4. Configure /etc/exports to export /var/lib/nova/instances via NFS to compute0 and compute1. Add the following lines to the bottom of the file. /var/lib/nova/instances 172.25.250.2(rw,sync,fsid=0,no_root_squash) /var/lib/nova/instances 172.25.250.12(rw,sync,fsid=0,no_root_squash)
8.5. Enable and start the NFS service. [root@overcloud-controller-0 ~]# systemctl enable nfs --now
Chapter 6. Managing Resilient Compute Resources 8.6. Confirm the directory is exported. [root@overcloud-controller-0 ~]# exportfs /var/lib/nova/instances 172.25.250.2 /var/lib/nova/instances 172.25.250.12
8.7. Update the vncserver_listen variable in /etc/nova/nova.conf. [root@overcloud-controller-0 ~]# openstack-config --set /etc/nova/nova.conf \ DEFAULT vncserver_listen 0.0.0.0
8.8. Restart OpenStack Compute services, then log out of controller0. [root@overcloud-controller-0 ~]# openstack-service restart nova [root@overcloud-controller-0 ~]# exit [heat-admin@overcloud-controller-0 ~]$ exit [student@workstation ~]$
9.
Configure shared storage for compute0. 9.1. Log into compute0 as heat-admin and switch to the root user. [student@workstation ~]$ ssh heat-admin@compute0 [heat-admin@overcloud-compute-0 ~]$ sudo -i [root@overcloud-compute-0 ~]#
9.2. Configure /etc/fstab to mount the directory /var/lib/nova/instances, exported from controller0. Add the following line to the bottom of the file. Confirm that the entry is on a single line in the file; the two line display here in the book is due to insufficient width. 172.25.250.1:/ /var/lib/nova/instances nfs4 context="system_u:object_r:nova_var_lib_t:s0" 0 0
9.3. Mount the export from controller0 on /var/lib/nova/instances. [root@overcloud-compute-0 ~]# mount -v /var/lib/nova/instances
9.4. Configure iptables to allow shared storage live migration. [root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \ --dport 16509 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509 [root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \ --dport 49152:49261 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261 [root@overcloud-compute-0 ~]# service iptables save
Solution 9.5. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf Add the following lines to the bottom of the file. user="root" group="root" vnc_listen="0.0.0.0"
9.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live migration. Use the NFS-mounted /var/lib/nova/instances directory to store instance virtual disks.
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT live_migration_flag \
VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
9.7. Restart OpenStack services and log out of compute0. [root@overcloud-compute-0 ~]# openstack-service restart [root@overcloud-compute-0 ~]# exit [heat-admin@overcloud-compute-0 ~]$ exit [student@workstation ~]$
10. Configure shared storage for compute1. 10.1. Log into compute1 as heat-admin and switch to the root user. [student@workstation ~]$ ssh heat-admin@compute1 [heat-admin@overcloud-compute-1 ~]$ sudo -i [root@overcloud-compute-1 ~]#
10.2.Configure /etc/fstab to mount the directory /var/lib/nova/instances, exported from controller0. Add the following line to the bottom of the file. Confirm that the entry is on a single line in the file; the two line display here in the book is due to insufficient width. 172.25.250.1:/ /var/lib/nova/instances nfs4 context="system_u:object_r:nova_var_lib_t:s0" 0 0
10.3.Mount the export from controller0 on /var/lib/nova/instances. [root@overcloud-compute-1 ~]# mount -v /var/lib/nova/instances
10.4.Configure iptables for live migration.
[root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \ --dport 16509 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509 [root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \ --dport 49152:49261 -j ACCEPT ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261 [root@overcloud-compute-1 ~]# service iptables save
10.5.Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf Add the following lines to the bottom of the file. user="root" group="root" vnc_listen="0.0.0.0"
10.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live migration. Use the NFS-mounted /var/lib/nova/instances directory to store instance virtual disks.
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT live_migration_flag \
VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
10.7. Restart OpenStack services and log out of compute1. [root@overcloud-compute-1 ~]# openstack-service restart [root@overcloud-compute-1 ~]# exit [heat-admin@overcloud-compute-1 ~]$ exit [student@workstation ~]$
11. Launch an instance named production1 as the user operator1 using the following attributes:
Instance Attributes
• flavor: m1.web
• key pair: operator1-keypair1
• network: production-network1
• image: rhel7
• security group: production
• name: production1
[student@workstation ~]$ source ~/operator1-production-rc [student@workstation ~(operator1-production)]$ openstack server create \ --flavor m1.web \ --key-name operator1-keypair1 \ --nic net-id=production-network1 \ --security-group production \ --image rhel7 \ --wait production1 ...output omitted...
12. List the available floating IP addresses, then allocate one to the production1 instance. 12.1. List the floating IPs. An available one has the Port attribute set to None. [student@workstation ~(operator1-production)]$ openstack floating ip list \ -c "Floating IP Address" -c Port +---------------------+------+ | Floating IP Address | Port | +---------------------+------+ | 172.25.250.P | None | +---------------------+------+
12.2.Attach an available floating IP to the instance production1. [student@workstation ~(operator1-production)]$ openstack server add \ floating ip production1 172.25.250.P
13. Ensure that the production1 instance is accessible by logging in to the instance as the user cloud-user, then log out of the instance. [student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \ [email protected] Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts. [cloud-user@production1 ~]$ exit [student@workstation ~(operator1-production)]$
14. Migrate the instance production1 using shared storage. 14.1. To perform live migration, the user operator1 must have the admin role assigned for the project production. Assign the admin role to operator1 for the project production. Source the /home/student/admin-rc file to export the admin user credentials. [student@workstation ~(operator1-production)]$ source ~/admin-rc [student@workstation ~(admin-admin)]$ openstack role add --user \ operator1 --project production admin
14.2.Determine whether the instance is currently running on compute0 or compute1. In the example below, the instance is running on compute0, but your instance may be running on compute1.
Source the /home/student/operator1-production-rc file to export the operator1 user credentials. [student@workstation ~(admin-admin)]$ source ~/operator1-production-rc [student@workstation ~(operator1-production)]$ openstack server show \ production1 -f json | grep compute "OS-EXT-SRV-ATTR:host": "overcloud-compute-0.localdomain", "OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-0.localdomain",
14.3.Prior to migration, ensure compute1 has sufficient resources to host the instance. The example below uses compute1, however you may need to use compute0. The compute node should contain 2 VCPUs, a 56 GB disk, and 2048 MBs of available RAM. [student@workstation ~(operator1-production)]$ openstack host show \ overcloud-compute-1.localdomain -f json [ { "Project": "(total)", "Disk GB": 56, "Host": "overcloud-compute-1.localdomain", "CPU": 2, "Memory MB": 6143 }, { "Project": "(used_now)", "Disk GB": 0, "Host": "overcloud-compute-1.localdomain", "CPU": 0, "Memory MB": 2048 }, { "Project": "(used_max)", "Disk GB": 0, "Host": "overcloud-compute-1.localdomain", "CPU": 0, "Memory MB": 0 }
14.4. Migrate the instance production1 using shared storage. In the example below, the instance is migrated from compute0 to compute1, but you may need to migrate the instance from compute1 to compute0. [student@workstation ~(operator1-production)]$ openstack server migrate \ --shared-migration \ --live overcloud-compute-1.localdomain \ production1
15. Verify that the migration of production1 using shared storage was successful.
15.1. Display the compute node that now hosts the instance. The example below displays compute1, but your output may display compute0.
[student@workstation ~(operator1-production)]$ openstack server show \
production1 -f json | grep compute
"OS-EXT-SRV-ATTR:host": "overcloud-compute-1.localdomain",
"OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-1.localdomain",
Evaluation From workstation, run the lab resilience-review grade command to confirm the success of this exercise. Correct any reported failures and rerun the command until successful. [student@workstation ~]$ lab resilience-review grade
Cleanup Save any data that you would like to keep from the virtual machines. After the data is saved, reset all of the overcloud virtual machines and the director virtual machine. In the physical classroom environment, reset all of the overcloud virtual machines and the director virtual machine using the rht-vmctl command. In the online environment, reset and start the director and overcloud nodes.
Summary
In this chapter, you learned:

• The Red Hat OpenStack Platform Bare Metal provisioning service, Ironic, supports the provisioning of both virtual and physical machines to be used for the overcloud deployment.
• Red Hat OpenStack Platform director (undercloud) uses the Orchestration service (Heat) to orchestrate the deployment of the overcloud with a stack definition.
• Low-level system information, such as the CPU count, memory, disk space, and network interfaces of a node, is retrieved through a process called introspection.
• Block-based live migration is the alternate method used when shared storage is not implemented.
• When migrating using shared storage, the instance's memory content must be transferred faster than memory pages are written to the source instance.
• When using block-based live migration, disk content is copied before memory content is transferred, which makes shared storage live migration quicker and more efficient.
CHAPTER 7
TROUBLESHOOTING OPENSTACK ISSUES

Overview

Goal
Holistically diagnose and troubleshoot OpenStack issues.

Objectives
• Diagnose and troubleshoot instance launch issues on a compute node.
• Diagnose and troubleshoot the identity and messaging services.
• Diagnose and troubleshoot the OpenStack networking, image, and volume services.

Sections
• Troubleshooting Compute Nodes (and Guided Exercise)
• Troubleshooting Authentication and Messaging (and Guided Exercise)
• Troubleshooting OpenStack Networking, Image, and Volume Services (and Guided Exercise)

Lab
• Troubleshooting OpenStack Issues
Troubleshooting Compute Nodes

Objectives
After completing this section, students should be able to diagnose and troubleshoot instance launch issues on a compute node.
The OpenStack Compute Service Architecture

The OpenStack compute service supports the deployment of instances on compute nodes. Like many other OpenStack services, the compute service is modular, and its components are deployed on different machines, with each component playing a different role in the deployment.

Several Compute components are deployed on the controller node and provide a front end through the Compute API, which is provided by the Nova API component. The Compute components deployed on the controller node also handle the scheduling of instances based on configurable, customizable scheduling algorithms. Scheduling is performed by the Compute scheduler component and is based on data retrieved from the compute nodes, such as the hardware resources currently available on each node, like the available memory or the number of CPUs.

The Nova compute component, which runs on each compute node, captures this data. This component uses the RabbitMQ messaging service to connect to the Compute core components deployed on the controller node. The Nova compute component also gathers all the resources required to launch an instance, a task that includes scheduling the instance in the hypervisor running on the compute node. In addition to the RabbitMQ messaging service, Compute uses the MariaDB service to store its data. Communication with both RabbitMQ and MariaDB is handled by the Compute conductor component, which runs on the controller node.

The log files for the Compute components are in the /var/log/nova directory on both the controller node and the compute node, and each component logs its events to a different log file. The Nova compute component logs to the /var/log/nova/compute.log file on the compute node. The Compute components running on the controller node log to the /var/log/nova directory on that node.

Nova Compute Service        Log File
Scheduler                   /var/log/nova/scheduler.log
Conductor                   /var/log/nova/conductor.log
API                         /var/log/nova/api.log
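When hunting for a failure, a quick scan of the Compute log files for errors is often enough to locate the failing component. A minimal sketch, run as root on the controller node (the same pattern works on a compute node); the output is omitted here:

[root@demo]# grep -i error /var/log/nova/*.log | tail
...output omitted...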
Compute service commands provide additional visibility on the status of the different Compute components on each node. This status can help troubleshooting issues created by other auxiliary services used by Compute components, such as RabbitMQ or MariaDB. The openstack compute service list command displays the hosts where the Compute components are running on the controller and compute nodes as follows: [user@demo]$ openstack compute service list -c Binary -c Host +------------------+------------------------------------+ | Binary | Host | +------------------+------------------------------------+ | nova-consoleauth | overcloud-controller-0.localdomain |
| nova-scheduler | overcloud-controller-0.localdomain | | nova-conductor | overcloud-controller-0.localdomain | | nova-compute | overcloud-compute-0.localdomain | +------------------+------------------------------------+
This command also shows the status, state, and last update of the Compute components, as follows: [user@demo]$ openstack compute service list -c Binary -c Status -c State -c "Updated At" +------------------+---------+-------+----------------------------+ | Binary | Status | State | Updated At | +------------------+---------+-------+----------------------------+ | nova-consoleauth | enabled | up | 2017-06-16T19:38:38.000000 | | nova-scheduler | enabled | up | 2017-06-16T19:38:39.000000 | | nova-conductor | enabled | up | 2017-06-16T19:38:35.000000 | | nova-compute | enabled | up | 2017-06-16T19:38:39.000000 | +------------------+---------+-------+----------------------------+
The previous output shows the node where each Compute component is deployed in the Host field, the status of the component in the Status field, and the state of the component in the State field. The Status field shows whether the Compute component is enabled or disabled. The previous command is used to detect issues related to RabbitMQ. A RabbitMQ unavailability issue is indicated when all the Nova Compute components are down.
Note
The openstack compute service list command requires admin credentials.

A Compute component can be enabled or disabled using the openstack compute service set command. This command is useful, for example, when a compute node has to be put under maintenance, as follows: [user@demo]$ openstack compute service set --disable \ overcloud-compute-0.localdomain \ nova-compute [user@demo]$ openstack compute service list -c Binary -c Host -c Status +------------------+------------------------------------+----------+ | Binary | Host | Status | +------------------+------------------------------------+----------+ | nova-consoleauth | overcloud-controller-0.localdomain | enabled | | nova-scheduler | overcloud-controller-0.localdomain | enabled | | nova-conductor | overcloud-controller-0.localdomain | enabled | | nova-compute | overcloud-compute-0.localdomain | disabled | +------------------+------------------------------------+----------+
When the compute node maintenance finishes, the compute node can be enabled again, as follows: [user@demo]$ openstack compute service set --enable \ overcloud-compute-0.localdomain \ nova-compute [user@demo]$ openstack compute service list -c Binary -c Host -c Status +------------------+------------------------------------+---------+ | Binary | Host | Status | +------------------+------------------------------------+---------+ | nova-consoleauth | overcloud-controller-0.localdomain | enabled |
| nova-scheduler | overcloud-controller-0.localdomain | enabled | | nova-conductor | overcloud-controller-0.localdomain | enabled | | nova-compute | overcloud-compute-0.localdomain | enabled | +------------------+------------------------------------+---------+
All Compute components use the /etc/nova/nova.conf file as their configuration file, whether they run on a controller node or on a compute node. That configuration file contains configuration settings for the different Compute components, and also for connecting them to the back-end services. For example, the messaging-related settings are identified by the rabbit prefix (for RabbitMQ).

On the compute node, other Compute settings can be configured, such as the ratio between the physical resources provided by the compute node and the virtual resources offered to instances. The following settings specify this ratio:

• The ram_allocation_ratio parameter for the memory ratio.
• The disk_allocation_ratio parameter for the disk ratio.
• The cpu_allocation_ratio parameter for the CPU ratio.

For example, specifying a CPU ratio of 1.5 allows cloud users to use 1.5 times as many virtual CPUs as there are physical CPUs available.
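These ratios can be adjusted with openstack-config, the same tool used for /etc/nova/nova.conf elsewhere in this course. A minimal sketch; the values shown are illustrative, not recommendations:

[root@demo]# openstack-config --set /etc/nova/nova.conf DEFAULT cpu_allocation_ratio 1.5
[root@demo]# openstack-config --set /etc/nova/nova.conf DEFAULT ram_allocation_ratio 1.0
[root@demo]# openstack-service restart nova

The Compute services on the node must be restarted for the scheduler to take the new ratios into account.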
Compute Node Placement Algorithm

The Compute scheduler component uses a scheduling algorithm to select which compute node is used to deploy an instance. This algorithm is configurable using the scheduler_driver parameter in the /etc/nova/nova.conf configuration file on the controller node. By default, the Compute scheduler component uses filter_scheduler, an algorithm based on filters. This algorithm uses a collection of filters to select a suitable host for deploying instances. The filters eliminate hosts based on facts such as the RAM available on each host. The remaining hosts are then sorted according to cost functions implemented in the Compute scheduler component, and a list of suitable hosts, with their associated costs, is generated.

Some of the filters applied by the Compute scheduler when using the filter-based algorithm are:

• The RetryFilter filter excludes hosts that have already been attempted for the request.
• The RamFilter filter identifies the hosts with enough RAM to deploy the instance.
• The ComputeFilter filter identifies the compute nodes available to deploy the instance.
Note The Compute scheduler component supports the usage of custom scheduling algorithms.
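The active scheduler settings can be checked directly in /etc/nova/nova.conf on the controller node. A minimal sketch; the output shown is illustrative, and parameter names can differ between OpenStack releases:

[root@demo]# grep ^scheduler_ /etc/nova/nova.conf
scheduler_driver=filter_scheduler
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,...
...output omitted...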
Regions, Availability Zones, Host Aggregates, and Cells

The OpenStack compute service supports the usage of a hierarchy to define its architecture. The top element of that hierarchy is a region. A region usually includes a complete Red Hat OpenStack Platform environment. Inside a region, several availability zones are defined to group compute nodes. A user can specify in which availability zone an instance needs to be deployed.

In addition to availability zones, the OpenStack compute service supports host aggregates to group compute nodes. Compute nodes can be grouped in a region using both availability zones and host aggregates. Host aggregates are only visible to cloud administrators.

The usage of auxiliary services to connect the different components, like RabbitMQ or MariaDB, can cause issues affecting the OpenStack compute service availability.

The OpenStack compute service also supports a different hierarchy based on cells. This hierarchy groups compute nodes into cells. Each cell runs all the Compute components except for the Compute API component, which runs on a top-level node. This configuration uses the nova-cells service to select the cell in which to deploy a new instance. The default OpenStack compute service configuration does not support cells.
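A minimal sketch of grouping a compute node into a host aggregate that also defines an availability zone; admin credentials are required, and the aggregate and zone names are illustrative:

[user@demo]$ openstack aggregate create --zone demo-az demo-aggregate
[user@demo]$ openstack aggregate add host demo-aggregate overcloud-compute-0.localdomain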
Common Issues with Compute Nodes

Compute node issues are usually related to:

• A hardware failure on the compute node.
• A failure in the messaging service connecting the Nova compute service with the Compute scheduler service.
• Lack of resources, for example CPU or RAM, on the available compute nodes.

Those issues usually produce a "no valid host" error in the Compute conductor logs, because the Compute conductor and scheduler services cannot find a suitable Nova compute service to deploy the instance.
[root@demo]# cat /var/log/nova/nova-conductor.log
NoValidHost: No valid host was found. There are not enough hosts available.
WARNING [instance: 1685(...)02f8] Setting instance to ERROR state.
This error can also be related to the lack of resources on the available compute nodes. The current resources available in the compute nodes running on the Red Hat OpenStack Platform environment can be retrieved using the openstack host list and openstack host show commands as follows. [user@demo]$ openstack host list +------------------------------------+-------------+----------+ | Host Name | Service | Zone | +------------------------------------+-------------+----------+ | overcloud-controller-0.localdomain | consoleauth | internal | | overcloud-controller-0.localdomain | scheduler | internal | | overcloud-controller-0.localdomain | conductor | internal | | overcloud-compute-0.localdomain | compute | nova | +------------------------------------+-------------+----------+ [user@demo]$ openstack host show overcloud-compute-0.localdomain +---------------------------------+------------+-----+-----------+---------+ | Host | Project | CPU | Memory MB | Disk GB | +---------------------------------+------------+-----+-----------+---------+ | overcloud-compute-0.localdomain | (total) | 2 | 6143 | 56 |
| overcloud-compute-0.localdomain | (used_now) | 0 | 2048 | 0 | | overcloud-compute-0.localdomain | (used_max) | 0 | 0 | 0 | +---------------------------------+------------+-----+-----------+---------+
Note
If there is an instance deployed on a compute node, the openstack host show command also shows the usage of CPU, memory, and disk for that instance.

The Compute conductor log file also includes the messages related to issues caused by those auxiliary services. For example, the following message in the Compute conductor log file indicates that the RabbitMQ service is not available:
[root@demo]# cat /var/log/nova/conductor.log
ERROR oslo.messaging._drivers.impl_rabbit [-] [3cb7...857f] AMQP server on 172.24.1.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 16 seconds. Client port: None
The following message indicates that the MariaDB service is not available: [root@demo]# less /var/log/nova/conductor.log WARNING oslo_db.sqlalchemy.engines [req-(...)ac35 - - - - -] SQL connection failed. -1 attempts left.
Troubleshooting Compute Nodes
The following steps outline the process for troubleshooting issues in compute nodes.
1. Log into an OpenStack controller node.
2. Locate the Compute services log files.
3. Review the log file for the Compute conductor service.
4. Review the log file for the Compute scheduler service.
5. Load admin credentials.
6. List the Compute services available.
7. Disable a Nova compute service.
8. Enable the previous Nova compute service.
References
Further information is available in the Logging, Monitoring, and Troubleshooting Guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Troubleshooting Compute Nodes

In this exercise, you will fix an issue with the Nova compute service that prevents it from launching instances. Finally, you will verify that the fix was correctly applied by launching an instance.

Outcomes
You should be able to troubleshoot and fix an issue in the Nova compute service.

Before you begin
Log in to workstation as student using student as the password. From workstation, run lab troubleshooting-compute-nodes setup to verify that OpenStack services are running, and that resources created in previous sections are available. This script also intentionally breaks the Nova compute service.
[student@workstation ~]$ lab troubleshooting-compute-nodes setup
Steps
1. Launch an instance named finance-web1 using the rhel7 image, the m1.web flavor, the finance-network1 network, the finance-web security group, and the developer1-keypair1 key pair. These resources were all created by the setup script. The instance deployment will return an error.
1.1. Load the developer1 user credentials.
[student@workstation ~]$ source ~/developer1-finance-rc
1.2. Verify that the rhel7 image is available. [student@workstation ~(developer1-finance)]$ openstack image list +---------------+-------+--------+ | ID | Name | Status | +---------------+-------+--------+ | 926c(...)4600 | rhel7 | active | +---------------+-------+--------+
1.3. Verify that the m1.web flavor is available. [student@workstation ~(developer1-finance)]$ openstack flavor list +---------------+--------+------+------+-----------+-------+-----------+ | ID | Name | RAM | Disk | Ephemeral | VCPUs | Is Public | +---------------+--------+------+------+-----------+-------+-----------+ | dd1b(...)6900 | m1.web | 2048 | 10 | 0 | 1 | True | +---------------+--------+------+------+-----------+-------+-----------+
1.4. Verify that the finance-network1 network is available. [student@workstation ~(developer1-finance)]$ openstack network list
+---------------+---------------------+---------------+ | ID | Name | Subnets | +---------------+---------------------+---------------+ | b0b7(...)0db4 | finance-network1 | a29f(...)855e | ... output omitted ...
1.5. Verify that the finance-web security group is available. [student@workstation ~(developer1-finance)]$ openstack security group list +---------------+-------------+------------------------+---------------+ | ID | Name | Description | Project | +---------------+-------------+------------------------+---------------+ | bdfd(...)b154 | finance-web | finance-web | d9cc(...)ae0f | ... output omitted ...
1.6. Verify that the developer1-keypair1 key pair, and its associated file located at / home/student/developer1-keypair1.pem are available. [student@workstation ~(developer1-finance)]$ openstack keypair list +---------------------+-----------------+ | Name | Fingerprint | +---------------------+-----------------+ | developer1-keypair1 | cc:59(...)0f:f9 | +---------------------+-----------------+ [student@workstation ~(developer1-finance)]$ file ~/developer1-keypair1.pem /home/student/developer1-keypair1.pem: PEM RSA private key
1.7. Launch an instance named finance-web1 using the rhel7 image, the m1.web flavor, the finance-network1 network, the finance-web security group, and the developer1-keypair1 key pair. The instance deployment will return an error. [student@workstation ~(developer1-finance)]$ openstack server create \ --image rhel7 \ --flavor m1.web \ --security-group finance-web \ --key-name developer1-keypair1 \ --nic net-id=finance-network1 \ finance-web1 ...output omitted...
1.8. Verify the status of the finance-web1 instance. The instance status will be ERROR. [student@workstation ~(developer1-finance)]$ openstack server show \ finance-web1 -c name -c status +--------+--------------+ | Field | Value | +--------+--------------+ | name | finance-web1 | | status | ERROR | +--------+--------------+
2. Verify on which host the Nova scheduler and Nova conductor services are running. You will need to load the admin credentials located at the /home/student/admin-rc file.
2.1. Load the admin credentials located at the /home/student/admin-rc file.
[student@workstation ~(developer1-finance)]$ source ~/admin-rc
2.2. Verify in which host the Nova scheduler and Nova conductor services are running. Both services are running in controller0. [student@workstation ~(admin-admin)]$ openstack host list +------------------------------------+-------------+----------+ | Host Name | Service | Zone | +------------------------------------+-------------+----------+ | overcloud-controller-0.localdomain | scheduler | internal | | overcloud-controller-0.localdomain | conductor | internal | ...output omitted...
3. Review the logs for the Compute scheduler and conductor services in controller0. Find the issue related to no valid host being found for the finance-web1 instance in the Compute conductor log file located at /var/log/nova/nova-conductor.log. Also find the issue related to no hosts being found by the compute filter in the Compute scheduler log file located at /var/log/nova/nova-scheduler.log.
3.1. Log in to controller0 as the heat-admin user.
[student@workstation ~(admin-admin)]$ ssh heat-admin@controller0
3.2. Become root in controller0. [heat-admin@overcloud-controller-0 ~]$ sudo -i
3.3. Locate the log message in the Compute conductor log file, which sets the finance-web1 instance's status to error, since no valid host is available to deploy the instance. The log file shows the instance ID.
[root@overcloud-controller-0 heat-admin]# cat /var/log/nova/nova-conductor.log
...output omitted...
NoValidHost: No valid host was found. There are not enough hosts available. (...) WARNING (...) [instance: 168548c9-a7bb-41e1-a7ca-aa77dca302f8] Setting instance to ERROR state.
...output omitted...
3.4. Locate the log message in the Compute scheduler log file, which returns zero hosts for the compute filter. When done, log out of the root account.
[root@overcloud-controller-0 heat-admin]# cat /var/log/nova/nova-scheduler.log
...output omitted...
(...) Filter ComputeFilter returned 0 hosts
(...) Filtering removed all hosts for the request with instance ID '168548c9-a7bb-41e1-a7ca-aa77dca302f8'. (...)
[root@overcloud-controller-0 heat-admin]# exit
4. Verify how many Nova compute services are enabled.
4.1. Load the admin credentials.
[heat-admin@overcloud-controller-0 ~]$ source overcloudrc
4.2. List the Compute services. The nova-compute service running on compute0 is disabled. [heat-admin@overcloud-controller-0 ~]$ openstack compute service list \ -c Binary -c Host -c Status +------------------+------------------------------------+---------+ | Binary | Host | Status | +------------------+------------------------------------+---------+ | nova-compute | overcloud-compute-0.localdomain | disabled | ...output omitted... +------------------+------------------------------------+---------+
5. Enable and verify the Nova compute service on compute0.
5.1. Enable the Nova compute service on compute0.
[heat-admin@overcloud-controller-0 ~]$ openstack compute service set \
--enable \
overcloud-compute-0.localdomain \
nova-compute
5.2. Verify that the Nova compute service has been correctly enabled on compute0. When done, log out from the controller node. [heat-admin@overcloud-controller-0 ~]$ openstack compute service list \ -c Binary -c Host -c Status +------------------+------------------------------------+---------+ | Binary | Host | Status | +------------------+------------------------------------+---------+ | nova-compute | overcloud-compute-0.localdomain | enabled | ...output omitted... [heat-admin@overcloud-controller-0 ~]$ exit
6. Launch the finance-web1 instance again from workstation using the developer1 user credentials. Use the rhel7 image, the m1.web flavor, the finance-network1 network, the finance-web security group, and the developer1-keypair1 key pair. The instance will be deployed without errors. You will need to delete the previous instance, which is in an error state, before deploying the new one.
6.1. Load the developer1 user credentials.
[student@workstation ~(admin-admin)]$ source ~/developer1-finance-rc
6.2. Delete the previous finance-web1 instance, whose deployment failed.
[student@workstation ~(developer1-finance)]$ openstack server delete \
finance-web1
6.3. Verify that the instance has been correctly deleted. The command should not return any instances named finance-web1. [student@workstation ~(developer1-finance)]$ openstack server list
6.4. Launch the finance-web1 instance again, using the rhel7 image, the m1.web flavor, the finance-network1 network, the finance-web security group, and the developer1-keypair1 key pair. [student@workstation ~(developer1-finance)]$ openstack server create \ --image rhel7 \ --flavor m1.web \ --security-group finance-web \ --key-name developer1-keypair1 \ --nic net-id=finance-network1 \ --wait finance-web1 ... output omitted ...
6.5. Verify the status of the finance-web1 instance. The instance status will be ACTIVE. It may take some time for the instance's status to become ACTIVE. [student@workstation ~(developer1-finance)]$ openstack server show \ finance-web1 -c name -c status +--------+--------------+ | Field | Value | +--------+--------------+ | name | finance-web1 | | status | ACTIVE | +--------+--------------+
Cleanup From workstation, run the lab troubleshooting-compute-nodes cleanup script to clean up this exercise. [student@workstation ~]$ lab troubleshooting-compute-nodes cleanup
Troubleshooting Authentication and Messaging

Objectives
After completing this section, students should be able to diagnose and troubleshoot the Identity and Messaging services.
The OpenStack Identity Service Architecture The Keystone identity service supports user authentication and authorization. This service is the front-end service for a Red Hat OpenStack Platform environment. The cloud administrator creates credentials for each user. These credentials usually include a user name, a password, and an authentication URL. This authentication URL points to the Identity API. The Identity API is used to authenticate to the Red Hat OpenStack Platform environment. The Keystone identity service, like other OpenStack services, has three endpoints associated to it. Those endpoints are the public endpoint, the admin endpoint, and the internal endpoint. The public endpoint, by default bound to port TCP/5000, provides the API functionality required for an external user to use Keystone authentication. This endpoint is usually the one used as the authentication URL provided to cloud users. A user's machine needs to have access to the TCP/5000 port on the machine where the Keystone identity service is running to authenticate in the Red Hat OpenStack Platform environment. The Keystone identity service usually runs on the controller node. The admin endpoint provides additional functionality to the public endpoint. The other Red Hat OpenStack Platform services use the internal endpoint to run authentication and authorization queries on the Keystone identity service. The openstack catalog show identity command displays the list of endpoints available for the user credentials. [user@demo]$ openstack catalog show identity +-----------+---------------------------------------------+ | Field | Value | +-----------+---------------------------------------------+ | endpoints | regionOne | | | publicURL: http://172.25.250.50:5000/v2.0 | | | internalURL: http://172.24.1.50:5000/v2.0 | | | adminURL: http://172.25.249.50:35357/v2.0 | | | | | name | keystone | | type | identity | +-----------+---------------------------------------------+
In the previous output, each endpoint uses a different IP address based on the availability required for each of those endpoints. The HAProxy service manages all of these IP addresses. This service runs on the controller node. The HAProxy configuration file includes two services to manage the three endpoints' IP addresses: keystone_admin and keystone_public. Both services include two IP addresses, one internal and one external. For example, the keystone_public service serves the public endpoint using both an internal IP address and an external IP address: [user@demo]$ less /etc/haproxy/haproxy.cfg ...output omitted... listen keystone_public
bind 172.24.1.50:5000 transparent
bind 172.25.250.50:5000 transparent
mode http
http-request set-header X-Forwarded-Proto https if { ssl_fc }
http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
server overcloud-controller-0.internalapi.localdomain 172.24.1.1:5000 check fall 5 inter 2000 rise 2
...output omitted...
In the previous definition for the keystone_public service, the first IP address 172.24.1.50, is configured with the internal IP address. This IP address is used by other OpenStack services for user authentication and authorization, made possible by the Keystone identity service. The second IP address configured for the keystone_public service, 172.25.250.50, is configured with the external IP address. Cloud users use this IP address in their authorization URL. The Keystone identity service runs on top of the httpd service. Issues in Keystone are usually related to the configuration or availability of either the HAProxy or httpd service. If the httpd service is not available, the following error message is displayed: [user@demo]$ openstack volume create --size 1 demo-volume Discovering versions from the identity service failed when creating the password plugin. Attempting to determine version from URL. Service Unavailable (HTTP 503)
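Because the public Identity endpoint is served by the httpd service behind HAProxy, a quick first check when this error appears is to confirm that both services are running on the controller node and that the public port is listening. A minimal sketch; the port follows the default value described above:

[root@demo]# systemctl is-active haproxy httpd
active
active
[root@demo]# ss -tnlp | grep ':5000'
...output omitted...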
The OpenStack Messaging Service Architecture Most of the OpenStack services are modular, so they can easily scale. These services run several components that communicate using a messaging service. Red Hat OpenStack Platform supports RabbitMQ as the default messaging service. When a component wants to send a message to another component, the component places that message in a queue. Both a user and a password are required to send the message to that queue. All Red Hat OpenStack Platform services use the guest user to log into RabbitMQ. Pacemaker manages the RabbitMQ service as a resource. The name for the Pacemaker resource is rabbitmq. An issue with RabbitMQ availability usually means a blocked request for the cloud user. The status of the RabbitMQ service can be obtained using the rabbitmqctl cluster_status command. This command displays basic information about the RabbitMQ cluster status. [root@demo]# rabbitmqctl cluster_status Cluster status of node 'rabbit@overcloud-controller-0' ... [{nodes,[{disc,['rabbit@overcloud-controller-0']}]}, {running_nodes,['rabbit@overcloud-controller-0']}, {cluster_name,}, {partitions,[]}, {alarms,[{'rabbit@overcloud-controller-0',[]}]}]
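The guest account that the services use can be confirmed with rabbitmqctl; the output below is abbreviated and illustrative:

[root@demo]# rabbitmqctl list_users
Listing users ...
guest   [administrator]
...output omitted...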
Additional information, like the IP address where RabbitMQ is listening, is available using the rabbitmqctl status command. [root@demo]# rabbitmqctl status Cluster status of node 'rabbit@overcloud-controller-0' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0']}]}, {running_nodes,['rabbit@overcloud-controller-0']}, {cluster_name,}, ...output omitted... {memory,[{total,257256704}, {connection_readers,824456}, {connection_writers,232456}, {connection_channels,1002976}, {connection_other,2633224}, {queue_procs,3842568}, {queue_slave_procs,0}, ...output omitted... {listeners,[{clustering,25672,"::"},{amqp,5672,"172.24.1.1"}]}, ...output omitted...
The status of the Pacemaker resource for RabbitMQ can be viewed using the pcs status command. This command shows the status and any error reports of all the resources configured in the Pacemaker cluster. [root@demo]# pcs status Cluster name: tripleo_cluster ....output omitted... Clone Set: haproxy-clone [haproxy] Started: [ overcloud-controller-0 ] ... output omitted...
In case of failure of the rabbitmq resource, the resource can be restarted using the pcs resource cleanup and the pcs resource debug-start as follows: [root@demo]# pcs resource cleanup rabbitmq Cleaning up rabbitmq:0 on overcloud-controller-0, removing fail-count-rabbitmq Waiting for 1 replies from the CRMd. OK [root@demo]# pcs resource debug-start rabbitmq Operation start for rabbitmq:0 (ocf:heartbeat:rabbitmq-cluster) returned 0 > stderr: DEBUG: RabbitMQ server is running normally > stderr: DEBUG: rabbitmq:0 start : 0
Troubleshooting Authentication and Messaging
The following steps outline the process for troubleshooting issues in authentication and messaging services.
1. Log into an OpenStack controller node.
2. Review the HAProxy configuration for the keystone_public and keystone_admin services.
3. Verify the RabbitMQ cluster's status.
4. Verify the Pacemaker cluster's status.
5. Verify the rabbitmq-clone resource's status.
6. Load admin credentials.
7. Verify that the cinder-scheduler and cinder-volume services are enabled (see the sketch after this list).
8. Review the Cinder messaging configuration.
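For step 7, the openstack client can report the state of the Block Storage components in the same way it does for Compute. A minimal sketch; admin credentials are required and the output is omitted here:

[user@demo]$ openstack volume service list -c Binary -c Host -c Status
...output omitted...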
References
Further information is available in the Logging, Monitoring, and Troubleshooting Guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Troubleshooting Authentication and Messaging

In this exercise, you will fix an issue with the authentication and messaging services.

Outcomes
You should be able to:
• Troubleshoot and fix an issue in the Keystone identity service.
• Troubleshoot and fix an issue related to RabbitMQ.

Before you begin
Log in to workstation as student using student as the password. From workstation, run lab troubleshooting-authentication setup to verify that OpenStack services are running, and the resources created in previous sections are available. This script will also break the Keystone identity service and the RabbitMQ service.
[student@workstation ~]$ lab troubleshooting-authentication setup
Steps 1. Create a 1 GB volume named finance-volume1 using developer1 user credentials. The command will raise an issue. 1.1. Load the developer1 user credentials. [student@workstation ~]$ source ~/developer1-finance-rc
1.2. Create a 1 GB volume named finance-volume1. This command raises a service unavailable issue. [student@workstation ~(developer1-finance)]$ openstack volume create \ --size 1 finance-volume1 Discovering versions from the identity service failed when creating the password plugin. Attempting to determine version from URL. Service Unavailable (HTTP 503)
2. Verify that the IP address used in the authentication URL of the developer1 user credentials file is the same one configured as a virtual IP in the HAProxy service for the keystone_public service. The HAProxy service runs on controller0.
2.1. Find the authentication URL in the developer1 user credentials file.
[student@workstation ~(developer1-finance)]$ cat ~/developer1-finance-rc
unset OS_SERVICE_TOKEN
export OS_USERNAME=developer1
export OS_PASSWORD=redhat
export OS_AUTH_URL=http://172.25.250.50:5000/v2.0
export PS1='[\u@\h \W(developer1-finance)]\$ '
export OS_TENANT_NAME=finance
export OS_REGION_NAME=regionOne
2.2. Log in to controller0 as heat-admin. [student@workstation ~(developer1-finance)]$ ssh heat-admin@controller0
2.3. Find the virtual IP address configured in the HAProxy service for the keystone_public service. [heat-admin@overcloud-controller-0 ~]$ sudo less /etc/haproxy/haproxy.cfg ...output omitted... listen keystone_public bind 172.24.1.50:5000 transparent bind 172.25.250.50:5000 transparent mode http http-request set-header X-Forwarded-Proto https if { ssl_fc } http-request set-header X-Forwarded-Proto http if !{ ssl_fc } server overcloud-controller-0.internalapi.localdomain 172.24.1.1:5000 check fall 5 inter 2000 rise 2
2.4. Verify that the HAProxy service is active. [heat-admin@overcloud-controller-0 ~]$ systemctl status haproxy haproxy.service - Cluster Controlled haproxy Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; vendor preset: disabled) Drop-In: /run/systemd/system/haproxy.service.d └─50-pacemaker.conf Active: active (running) since Thu 2017-06-15 08:45:47 UTC; 1h 8min ago Main PID: 13096 (haproxy-systemd) ...output omitted...
2.5. Verify the status for the httpd service. The httpd service is inactive. [heat-admin@overcloud-controller-0 ~]$ systemctl status httpd httpd.service - The Apache HTTP Server Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled) Drop-In: /usr/lib/systemd/system/httpd.service.d └─openstack-dashboard.conf Active: inactive (dead) since Thu 2017-06-15 09:37:15 UTC; 21min ago ...output omitted...
3. Start the httpd service. It may take some time for the httpd service to be started.
3.1. Start the httpd service.
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl start httpd
3.2. Verify that the httpd service is active. When done, log out from the controller node. [heat-admin@overcloud-controller-0 ~]$ systemctl status httpd httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled) Drop-In: /usr/lib/systemd/system/httpd.service.d └─openstack-dashboard.conf Active: active (running) since Thu 2017-06-15 10:13:15 UTC; 1min 8s ago [heat-admin@overcloud-controller-0 ~]$ logout
4. On workstation, try to create a 1 GB volume named finance-volume1 again. The command will hang because the Keystone identity service is not able to respond. Press Ctrl+C to get back to the prompt.
[student@workstation ~(developer1-finance)]$ openstack volume create \
--size 1 \
finance-volume1
Ctrl+C
5. Verify that the previous issue is caused by the RabbitMQ service.
5.1. Log in to controller0 as heat-admin.
[student@workstation ~(developer1-finance)]$ ssh heat-admin@controller0
5.2. Verify that the log file for the Keystone identity service reports that the RabbitMQ service is unreachable. [heat-admin@overcloud-controller-0 ~]$ sudo less /var/log/keystone/keystone.log ...output omitted... (...) AMQP server on 172.24.1.1:5672 is unreachable: [Errno 111] Connection refused. (...)
5.3. Verify that the RabbitMQ cluster is down. [heat-admin@overcloud-controller-0 ~]$ sudo rabbitmqctl cluster_status Cluster status of node 'rabbit@overcloud-controller-0' ... Error: unable to connect to node 'rabbit@overcloud-controller-0': nodedown ...output omitted...
6. Verify that the root cause for the RabbitMQ cluster unavailability is that the rabbitmq Pacemaker resource is disabled. When done, enable the rabbitmq Pacemaker resource.
6.1. Verify that the root cause for the RabbitMQ cluster unavailability is that the rabbitmq Pacemaker resource is disabled.
[heat-admin@overcloud-controller-0 ~]$ sudo pcs status
Cluster name: tripleo_cluster
Stack: corosync
...output omitted...
Clone Set: rabbitmq-clone [rabbitmq]
Stopped (disabled): [ overcloud-controller-0 ]
...output omitted...
6.2. Enable the rabbitmq resource in Pacemaker. When done, log out from the controller node. [heat-admin@overcloud-controller-0 ~]$ sudo pcs resource enable rabbitmq --wait Resource 'rabbitmq' is running on node overcloud-controller-0. [heat-admin@overcloud-controller-0 ~]$ logout
7. On workstation, try again to create a 1 GB volume named finance-volume1. The volume will be created successfully.
7.1. On workstation, try again to create a 1 GB volume named finance-volume1.
[student@workstation ~(developer1-finance)]$ openstack volume create \
--size 1 finance-volume1
...output omitted...
7.2. Verify that the volume has been created successfully. [student@workstation ~(developer1-finance)]$ openstack volume list +---------------+-----------------+-----------+------+-------------+ | ID | Display Name | Status | Size | Attached to | +---------------+-----------------+-----------+------+-------------+ | 9a21(...)2d1a | finance-volume1 | available | 1 | | +---------------+-----------------+-----------+------+-------------+
Cleanup From workstation, run the lab troubleshooting-authentication cleanup script to clean up this exercise. [student@workstation ~]$ lab troubleshooting-authentication cleanup
Troubleshooting OpenStack Networking, Image, and Volume Services

Objectives
After completing this section, students should be able to diagnose and troubleshoot the OpenStack networking, image, and volume services.
Networking

This section discusses the different methods, commands, procedures, and log files you can use to troubleshoot OpenStack networking issues.

Unreachable Instances
Problem: You have created an instance but are unable to assign it a floating IP. This problem can occur when the network is not set up correctly. If a router is not set as the gateway for the external network, then users will not be able to assign a floating IP address to an instance. Use the neutron router-gateway-set command to set the router as a gateway for the external network. Then use the openstack server add floating ip command to assign a floating IP address to the instance.
Note
Floating IPs can be created even if the router is not connected to the external gateway, but when the user attempts to associate a floating IP address with an instance, an error is displayed.
[user@demo]$ openstack server add floating ip finance-web1 172.25.250.N Error: External network 7aaf57c1-3c34-45df-94d3-dbc12754b22e is not reachable from subnet cfc7ddfa-4403-41a7-878f-e8679596eafd.
If a router is not set as the gateway for the external network, then users will not be able to assign a floating IP address to an instance. [user@demo]$ openstack router show finance-router1 +--------------------------------------------------------------+ | Field | Value | +--------------------------------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | nova | | created_at | 2017-06-15T09:39:07Z | | description | | | external_gateway_info | null | | flavor_id | None | ...output omitted ...
Use the neutron router-gateway-set command to set the router as a gateway for the external network.
[user@demo]$ neutron router-gateway-set finance-router1 provider-172.25.250 [user@demo]$ openstack router show finance-router1 -f json { "external_gateway_info": "{\"network_id\": \"65606551-51f5-44f0-a389-1c96b728e05f\", \"enable_snat\": true, \"external_fixed_ips\": [{\"subnet_id\": \"9d12d02f-7818-486b-8cbf-015798e28a4d\", \"ip_address\": \"172.25.250.32\"}]}",
Use the openstack server add floating ip command to assign a floating IP address to the instance. [user@demo]$ openstack server add floating ip finance-web1 172.25.250.N
Use the openstack server list command to verify that a floating IP address has been associated with the instance. [user@demo]$ openstack server list -c Name -c Networks +-----------------+---------------------------------------------+ | Name | Networks | +-----------------+---------------------------------------------+ | finance-web1 | finance-network1=192.168.0.P, 172.25.250.N | +-----------------+---------------------------------------------+
Problem: You cannot connect to the instance using SSH; the connection fails with Permission denied. Check that a security group has been assigned to the instance and that a rule has been added to allow SSH traffic. SSH rules are not included by default. [user@demo]$ ssh -i developer1-keypair.pem [email protected] Warning: Permanently added '172.25.250.N' (ECDSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Verify that a security group has been assigned to the instance and that it has a rule allowing SSH; by default, it does not. The rule with the port range 22:22 should be associated with the same security group as the instance. Verify this by comparing the IDs. [user@demo]$ openstack server show finance-web1 -f json ...output omitted... "security_groups": [ { "name": "finance-web" } ], ...output omitted... [user@demo]$ openstack security group list +---------------+-------------+-------------+---------------+ | ID | Name | Description | Project | +---------------+-------------+-------------+---------------+ | 1728(...)443f | finance-web | | 1e7d(...)b191 | ...output omitted... [user@demo]$ openstack security group rule list +---------------+-------------+-----------+------------+---------------+ | ID | IP Protocol | IP Range | Port Range | Security Group| +---------------+-------------+-----------+------------+---------------+ | 0049(...)dddc | None | None | | 98d4(...)43e5 | | 31cf(...)aelb | tcp | 0.0.0.0/0 | 22:22 | 1728(...)443f | +---------------+-------------+-----------+------------+---------------+
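If the SSH rule is missing, it can be added to the group; the change applies immediately to instances already using that group. A minimal sketch using the security group name from the example above:

[user@demo]$ openstack security group rule create --protocol tcp \
--dst-port 22:22 finance-web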
This problem can also occur if the internal network was attached to the router after the instance was created. In this situation, the instance is not able to contact the metadata service at boot, therefore the key is not added to the authorized_keys for the cloud-user user. This can be verified by checking the /var/log/cloud-init.log log file on the instance itself. Alternatively, check the contents of /home/cloud-user/.ssh/authorized-keys. You can gain access to the instance via Horizon. [root@host-192-168-0-N ~]# less /var/log/cloud-init.log ...output omitted... [ 134.170335] cloud-init[475]: 2014-07-01 07:33:22,857 url_helper.py[WARNING]: Calling 'http://192.168.0.1//latest/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='192.168.0.1', port=80): Max retries exceeded with url: //latest/meta-data/instance-id (...) [Errno 113] No route to host)] ...output omitted... [root@host-192-168-0-N ~]# cat /home/cloud-user/.ssh/authorized-keys [root@host-192-168-0-N ~]#
In this situation, there is no option but to delete the instance, attach the subnet to the router, and re-create the instance. [user@demo]$ openstack server delete finance-web1 [user@demo]$ openstack subnet list +---------------+----------------------------+---------------+-----------------+ | ID | Name | Network | Subnet | +---------------+----------------------------+---------------+-----------------+ | 72c4(...)cc37 | provider-subnet-172.25.250 | 8b00(...)5285 | 172.25.250.0/24 | | a520(...)1d9a | finance-subnet1 | f33a(...)42b2 | 192.168.0.0/24 | +---------------+----------------------------+---------------+-----------------+ [user@demo]$ openstack router add subnet finance-router1 finance-subnet1 [user@demo]$ neutron router-port-list finance-router1 -c fixed-ips +-------------------------------------------------------------+ | fixed_ips | +-------------------------------------------------------------+ | {"subnet_id": "dbac(...)673d", "ip_address": "192.168.0.1"} | +-------------------------------------------------------------+
Problem: A key pair was not assigned to the instance at creation. SSH will not be possible. In this scenario, the instance must be destroyed and re-created and a key pair assigned at creation. [user@demo]$ openstack server delete finance-web1 [user@demo]$ openstack server create \ --flavor m1.web \ --nic net-id=finance-network1 \ --key-name developer1-keypair1 \ --security-group finance-web \ --image finance-rhel7-web finance-web1 --wait [user@demo]$ openstack server show finance-web1 -f json ...output omitted... "key_name": "developer1-keypair1", ...output omitted...
Images

The Glance image service stores images and metadata. Images can be created by users and uploaded to the Image service. The Glance image service has a RESTful API that allows users to query the metadata of an image, as well as obtaining the actual image.

Logging
The Image service has two log files. Their use can be configured by altering the [DEFAULT] section of the /etc/glance/glance-api.conf configuration file. In this file, you can dictate where and how logs should be stored, which storage method should be used, and its specific configuration. You can also configure the Glance image service size limit. Use the image_size_cap=SIZE parameter in the [DEFAULT] section of the file. You can also specify a storage capacity per user by setting the user_storage_quota=SIZE parameter in the [DEFAULT] section.

Service                                    Service Name                         Log Path
OpenStack Image Service API Server         openstack-glance-api.service        /var/log/glance/api.log
OpenStack Image Service Registry Server    openstack-glance-registry.service   /var/log/glance/registry.log
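These limits can be set with openstack-config, following the same pattern used for /etc/nova/nova.conf elsewhere in this course. A minimal sketch; the sizes are illustrative values in bytes:

[root@demo]# openstack-config --set /etc/glance/glance-api.conf DEFAULT image_size_cap 10737418240
[root@demo]# openstack-config --set /etc/glance/glance-api.conf DEFAULT user_storage_quota 21474836480
[root@demo]# systemctl restart openstack-glance-api

The openstack-glance-api service must be restarted for the new limits to take effect.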
Managing Images
When creating a new image, a user can choose to protect that image from deletion with the --protected option. This prevents an image from being deleted even by the administrator. It must be unprotected first, then deleted. [user@demo]$ openstack image delete rhel7-web Failed to delete image with name or ID '21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef': 403 Forbidden Image 21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef is protected and cannot be deleted. (HTTP 403) Failed to delete 1 of 1 images. [user@demo]$ openstack image set --unprotected rhel7-web [user@demo]$ openstack image delete rhel7-web
Volumes

Ceph
The OpenStack block storage service can use Ceph as a storage back end. Each volume created in the block storage service has an associated RBD image in Ceph, whose name is based on the ID of the block storage volume. The OpenStack block storage service requires a user and a pool in Ceph in order to use it. The user is openstack, the same user configured for other services using Ceph as their back end, like the OpenStack image service. The undercloud also creates a dedicated Ceph pool for the block storage services, named volumes. The volumes pool contains all the RBD images associated with volumes. These settings are included in the /etc/cinder/cinder.conf configuration file.
[user@demo]$ grep rbd_ /etc/cinder/cinder.conf
rbd_pool=volumes
rbd_user=openstack
...output omitted...
Permissions within Ceph are known as capabilities, and are granted by daemon type, such as MON or OSD. Three capabilities are available within Ceph: read (r) to view, write (w) to modify, and execute (x) to execute extended object classes. All daemon types support these three capabilities. For the OSD daemon type, permissions can be restricted to one or more pools, for example osd 'allow rwx pool=rbd, allow rx pool=mydata'. If no pool is specified, the permission is granted on all existing pools. The openstack user has capabilities on all the pools used by OpenStack services. The openstack user requires read, write, and execute capabilities in both the volumes and the images pools to be used by the OpenStack block storage service. The images pool is the dedicated pool for the OpenStack image service. [user@demo]$ ceph auth list installed auth entries: ...output omitted... client.openstack key: AQCg7T5ZAAAAABAAI6ZtsCQEuvVNqoyRKzeNcw== caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics
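To confirm that a volume is actually backed by an RBD image in the volumes pool, the volume ID can be cross-checked against the pool contents. A minimal sketch, assuming the Ceph client configuration and keyring are available on the node, as they are on the overcloud controller; the IDs shown are illustrative:

[user@demo]$ openstack volume show finance-volume1 -f value -c id
9a21(...)2d1a
[root@demo]# rbd -p volumes ls | grep 9a21
volume-9a21(...)2d1a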
Attach and Detach Workflow
There are 3 API calls for each attach and detach operation:

• The status of the volume is updated in the database.
• Connection operations on the volume are handled.
• The status of the volume is finalized and the resource is released.

In order to attach a volume, it must be in the available state. Any other state results in an error message. It can happen that a volume is stuck in a detaching state. The state can be altered by an admin user.
[user@demo]$ cinder reset-state --state available volume_id
If you try to delete a volume and it fails, you can forcefully delete that volume using the --force option. [user@demo]$ openstack volume delete --force volume_id
Incorrect volume configurations cause the most common block storage errors. Consult the Cinder block storage service log files in case of error.

Log Files
Service                                   Log Path
OpenStack Block Storage API Server        /var/log/cinder/api.log
OpenStack Block Storage Volume Service    /var/log/cinder/volume.log
OpenStack Block Storage Scheduler         /var/log/cinder/scheduler.log
The Block Storage service api.log is useful in determining whether the error is due to an endpoint or connectivity error. That is, if you try to create a volume and it fails, then the api.log is the one you should review. If the create request was received by the Block Storage service, then you can verify the request in this api.log log file. Assuming the request is logged in the api.log but there are no errors, check the volume.log for errors that may have occurred during the create request. For Cinder Block Storage services to function properly, it must be configured to use the RabbitMQ messaging service. All Block Storage configuration can be found in the /etc/ cinder/cinder.conf configuration file, stored on the controller node. The default rabbit_userid is guest. If that user is wrongly configured and the Block Storage services are restarted, RabbitMQ will not respond to Block Storage service requests. Any volume created during that period results in a status of ERROR. Any volume with a status of ERROR must be deleted and re-created once the Cinder Block Storage service has been restarted and is running properly. To determine the problem in this scenario, review the /var/log/cinder/scheduler.log log file on the controller node. If the problem is RabbitMQ, you will see the following: [user@demo]$ sudo less /var/log/cinder/scheduler.log ...output omitted... 201 (...) Failed to run task cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create: No valid host was found. No weighed hosts available
Verify that both the RabbitMQ cluster and the rabbitmq-clone Pacemaker resource are available. If both resources are available, the problem is likely in the cinder.conf configuration file. Check that all user names, passwords, IP addresses, and URLs in the /etc/cinder/cinder.conf configuration file are correct.
[user@demo]$ sudo rabbitmqctl status
Status of node 'rabbit@overcloud-controller-0' ...
...output omitted...
{listeners,[{clustering,25672,"::"},{amqp,5672,"172.24.1.1"}]},
...output omitted...
[user@demo]$ sudo pcs resource show rabbitmq-clone
Clone: rabbitmq-clone
Meta Attrs: interleave=true ordered=true
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"
Meta Attrs: notify=true
Operations: monitor interval=10 timeout=40 (rabbitmq-monitor-interval-10)
start interval=0s timeout=200s (rabbitmq-start-interval-0s)
stop interval=0s timeout=200s (rabbitmq-stop-interval-0s)
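You can also confirm that the user configured in cinder.conf actually exists in RabbitMQ; for example:
[user@demo]$ sudo rabbitmqctl list_users
Listing users ...
...output omitted...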
Troubleshooting OpenStack Networking, Image, and Volume Services
The following steps outline the process for troubleshooting issues in networking, image, and volume services.
1. Load user credentials.
2. Try to delete a protected image.
3. Unprotect a protected image.
4. Delete the image.
5. Load admin credentials.
6. Verify that a router has an external network configured as a gateway.
7. Log into an OpenStack controller.
8. Verify the Ceph back end configuration for the Cinder volume service.
9. Verify the capabilities configured for the Cinder volume service user in Ceph.
References
Further information is available in the Logging, Monitoring, and Troubleshooting Guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Troubleshooting OpenStack Networking, Image, and Volume Services In this exercise, you will fix an issue related to image requirements. You will also fix an issue with the accessibility of the metadata service. Finally, you will fix an issue with the Ceph back end for the OpenStack Block Storage service. Outcomes You should be able to: • Troubleshoot and fix an issue in the OpenStack Image service. • Troubleshoot and fix an issue in the OpenStack Networking service. • Troubleshoot and fix an issue in the OpenStack Block Storage service. Before you begin Log in to workstation as student using student as the password. From workstation, run lab troubleshooting-services setup to verify that OpenStack services are running, and resources created in previous sections are available. This script also creates the m1.lite flavor, and detaches the finance-subnet1 subnetwork from the finance-router1 router. Finally, the script will break the Ceph back end configuration for the OpenStack Block Storage service. [student@workstation ~]$ lab troubleshooting-services setup
Steps 1. Launch an instance named finance-web1. Use the rhel7 image, the finance-web security group, the developer1-keypair1 key pair, the m1.lite flavor, and the finance-network1 network. The instance's deployment will fail because the flavor does not meet the image's minimal requirements. 1.1. Load the credentials for the developer1 user. [student@workstation ~]$ source ~/developer1-finance-rc
1.2. Verify that the rhel7 image is available. [student@workstation ~(developer1-finance)]$ openstack image list +---------------+-------+--------+ | ID | Name | Status | +---------------+-------+--------+ | 5864(...)ad03 | rhel7 | active | +---------------+-------+--------+
1.3. Verify that the finance-web security group is available. [student@workstation ~(developer1-finance)]$ openstack security group list +---------------+-------------+------------------------+---------------+ | ID | Name | Description | Project |
+---------------+-------------+------------------------+---------------+ | 0cb6(...)5c7e | finance-web | finance-web | 3f73(...)d660 | ...output omitted...
1.4. Verify that the developer1-keypair1 key pair, and its associated key file located at / home/student/developer1-keypair1.pem are available. [student@workstation ~(developer1-finance)]$ openstack keypair list +---------------------+-----------------+ | Name | Fingerprint | +---------------------+-----------------+ | developer1-keypair1 | 04:9c(...)cb:1d | +---------------------+-----------------+ [student@workstation ~(developer1-finance)]$ file ~/developer1-keypair1.pem /home/student/developer1-keypair1.pem: PEM RSA private key
1.5. Verify that the m1.lite flavor is available. [student@workstation ~(developer1-finance)]$ openstack flavor list +---------------+---------+------+------+-----------+-------+-----------+ | ID | Name | RAM | Disk | Ephemeral | VCPUs | Is Public | +---------------+---------+------+------+-----------+-------+-----------+ | 7998(...)bc36 | m1.lite | 1024 | 5 | 0 | 1 | True | ...output omitted...
1.6. Verify that the finance-network1 network is available. [student@workstation ~(developer1-finance)]$ openstack network list +---------------+---------------------+--------------------------------------+ | ID | Name | Subnets | +---------------+---------------------+--------------------------------------+ | a4c9(...)70ff | finance-network1 | ec0d(...)480b | ...output omitted... +---------------+---------------------+--------------------------------------+
1.7. Create an instance named finance-web1. Use the rhel7 image, the finance-web security group, the developer1-keypair1 key pair, the m1.lite flavor, and the finance-network1 network. The instance's deployment will fail because the flavor does not meet the image's minimal requirements. [student@workstation ~(developer1-finance)]$ openstack server create \ --image rhel7 \ --security-group finance-web \ --key-name developer1-keypair1 \ --flavor m1.lite \ --nic net-id=finance-network1 \ finance-web1 Flavor's memory is too small for requested image. (HTTP 400) (...)
2.
Verify the rhel7 image requirements for memory and disk, and the m1.lite flavor specifications. 2.1. Verify the rhel7 image requirements for both memory and disk. The minimum disk required is 10 GB. The minimum memory required is 2048 MB.
[student@workstation ~(developer1-finance)]$ openstack image show rhel7
+------------------+----------+
| Field            | Value    |
+------------------+----------+
...output omitted...
| min_disk         | 10       |
| min_ram          | 2048     |
| name             | rhel7    |
...output omitted...
2.2. Verify the m1.lite flavor specifications. The disk and memory specifications for the m1.lite flavor do not meet the rhel7 image requirements. [student@workstation ~(developer1-finance)]$ openstack flavor show m1.lite +-------------+--------------------------------------+ | Field | Value | +-------------+--------------------------------------+ ...output omitted... | disk | 5 | | name | m1.lite | | ram | 1024 | ...output omitted...
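If no existing flavor met the image requirements, an admin user could create one that does; a minimal sketch with a hypothetical flavor name (flavor creation normally requires admin credentials, and is not needed in this exercise because m1.web already qualifies):
[student@workstation ~]$ openstack flavor create --ram 2048 --disk 10 --vcpus 1 m1.custom
...output omitted...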
3.
Verify that the m1.web flavor meets the rhel7 image requirements. Launch an instance named finance-web1. Use the rhel7 image, the finance-web security group, the developer1-keypair1 key pair, the m1.web flavor, and the finance-network1 network. The instance's deployment will be successful. 3.1. Verify that the m1.web flavor meets the rhel7 image requirements. [student@workstation ~(developer1-finance)]$ openstack flavor show m1.web +-------------+--------------------------------------+ | Field | Value | +-------------+--------------------------------------+ ...output omitted... | disk | 10 | | name | m1.web | | ram | 2048 | ...output omitted...
3.2. Launch an instance named finance-web1. Use the rhel7 image, the finance-web security group, the developer1-keypair1 key pair, the m1.web flavor, and the finance-network1 network. [student@workstation ~(developer1-finance)]$ openstack server create \ --image rhel7 \ --security-group finance-web \ --key-name developer1-keypair1 \ --flavor m1.web \ --nic net-id=finance-network1 \ --wait \ finance-web1 ...output omitted...
3.3. Verify that the finance-web1 instance is ACTIVE. [student@workstation ~(developer1-finance)]$ openstack server list \ -c Name -c Status +--------------+--------+ | Name | Status | +--------------+--------+ | finance-web1 | ACTIVE | +--------------+--------+
4.
Attach an available floating IP to the finance-web1. The floating IP will not be attached because the external network is not reachable from the internal network. 4.1. Verify which floating IPs are available. [student@workstation ~(developer1-finance)]$ openstack floating ip list +----------------+---------------------+------------------+------+ | ID | Floating IP Address | Fixed IP Address | Port | +----------------+---------------------+------------------+------+ | a49b(...)a7812 | 172.25.250.P | None | None | +----------------+---------------------+------------------+------+
4.2. Attach the previous floating IP to the finance-web1. The floating IP will not be attached because the external network is not reachable from the internal network. [student@workstation ~(developer1-finance)]$ openstack server add \ floating ip finance-web1 172.25.250.P Unable to associate floating IP 172.25.250.P to fixed IP 192.168.0.N (...) Error: External network cb3a(...)6a35 is not reachable from subnet ec0d(...)480b.(...)
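Before fixing the issue, you could confirm how the router is wired by checking whether it has an external gateway configured; for example:
[student@workstation ~(developer1-finance)]$ openstack router show finance-router1 \
-c external_gateway_info
...output omitted...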
5.
Fix the previous issue by adding the finance-subnet1 subnetwork to the finance-router1 router. 5.1. Verify that the finance-router1 router is ACTIVE. [student@workstation ~(developer1-finance)]$ openstack router list \ -c Name -c Status -c State +-----------------+--------+-------+ | Name | Status | State | +-----------------+--------+-------+ | finance-router1 | ACTIVE | UP | +-----------------+--------+-------+
5.2. Verify the current subnetworks added to the finance-router1 router. No output will display because the subnetwork has not been added. [student@workstation ~(developer1-finance)]$ neutron router-port-list \ finance-router1
5.3. Add the finance-subnet1 subnetwork to the finance-router1 router.
[student@workstation ~(developer1-finance)]$ openstack router add subnet \ finance-router1 finance-subnet1
5.4. Verify that the finance-subnet1 subnetwork has been correctly added to the finance-router1 router [student@workstation ~(developer1-finance)]$ neutron router-port-list \ finance-router1 -c fixed_ips +-------------------------------------------------------------+ | fixed_ips | +-------------------------------------------------------------+ | {"subnet_id": "dbac(...)673d", "ip_address": "192.168.0.1"} | +-------------------------------------------------------------+
6.
Attach the available floating IP to the finance-web1 instance. When done, log in to the finance-web1 instance as the cloud-user user, using the /home/student/developer1-keypair1.pem key file. Even though the floating IP address is attached to the finance-web1 instance, logging in to the instance will fail. This issue will be resolved in an upcoming step in this exercise. 6.1. Attach the available floating IP to the finance-web1 instance. [student@workstation ~(developer1-finance)]$ openstack server add floating ip \ finance-web1 172.25.250.P
6.2. Log in to the finance-web1 instance as the cloud-user user, using the /home/student/developer1-keypair1.pem key file. [student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \ [email protected] Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
7.
Verify that the instance was not able to contact the metadata service at boot time. The metadata service was not reachable because the finance-subnet1 subnetwork was not connected to the finance-router1 router when the finance-web1 instance was created. This is the root cause of the previous issue: because the metadata service could not be reached, the key was never added to the authorized_keys file for the cloud-user user. 7.1. Obtain the console URL for the finance-web1 instance. [student@workstation ~(developer1-finance)]$ openstack console url show \ finance-web1 +-------+-------------------------------------------------------------+ | Field | Value | +-------+-------------------------------------------------------------+ | type | novnc | | url | http://172.25.250.50:6080/vnc_auto.html?token=c93c(...)d896 | +-------+-------------------------------------------------------------+
7.2. Open Firefox, and navigate to the finance-web1 instance's console URL.
7.3. Log in to the finance-web1 instance's console as the root user, using redhat as a password. 7.4. Verify that the authorized_keys file for the cloud-user is empty. No key has been injected by cloud-init during the instance's boot process. [root@host-192-168-0-N ~]# cat /home/cloud-user/.ssh/authorized_keys
7.5. Verify in the cloud-init log file, located at /var/log/cloud-init.log, that the finance-web1 instance cannot reach the metadata service during its boot process. [root@host-192-168-0-N ~]# less /var/log/cloud-init.log ...output omitted... [ 134.170335] cloud-init[475]: 2014-07-01 07:33:22,857 url_helper.py[WARNING]: Calling 'http://192.168.0.1//latest/meta-data/instanceid' failed [0/120s]: request error [HTTPConnectionPool(host='192.168.0.1', port=80): Max retries exceeded with url: //latest/meta-data/instance-id (...) [Errno 113] No route to host)] ...output omitted...
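Once the subnetwork is attached to the router, you could also verify from inside the instance's console that the metadata service answers at the standard metadata address; a quick, optional check:
[root@host-192-168-0-N ~]# curl http://169.254.169.254/latest/meta-data/instance-id
...output omitted...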
8.
On workstation, stop and then start the finance-web1 instance to allow cloud-init to recover. The metadata service is now reachable because the finance-subnet1 subnetwork is connected to the finance-router1 router. 8.1. Stop the finance-web1 instance. [student@workstation ~(developer1-finance)]$ openstack server stop \ finance-web1
8.2. Verify that the finance-web1 instance is in the SHUTOFF state. [student@workstation ~(developer1-finance)]$ openstack server show \ finance-web1 -c status -f value SHUTOFF
8.3. Start the finance-web1 instance. [student@workstation ~(developer1-finance)]$ openstack server start \ finance-web1
8.4. Log in to the finance-web1 instance as the cloud-user user, using the /home/ student/developer1-keypair1.pem key file. [student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \ [email protected] Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts.
8.5. Verify that the authorized_keys file for the cloud-user user has had a key injected into it. When done, log out from the instance.
[cloud-user@finance-web1 ~]$ cat .ssh/authorized_keys ssh-rsa AAAA(...)JDGZ Generated-by-Nova [cloud-user@finance-web1 ~]$ exit
9.
On workstation, create a 1 GB volume named finance-volume1. The volume creation will fail. 9.1. On workstation, create a 1 GB volume, named finance-volume1. [student@workstation ~(developer1-finance)]$ openstack volume create \ --size 1 finance-volume1 ...output omitted...
9.2. Verify the status of the volume finance-volume1. The volume's status will be error. [student@workstation ~(developer1-finance)]$ openstack volume list +---------------+-----------------+--------+------+-------------+ | ID | Display Name | Status | Size | Attached to | +---------------+-----------------+--------+------+-------------+ | b375(...)0008 | finance-volume1 | error | 1 | | +---------------+-----------------+--------+------+-------------+
10. Confirm the reason that the finance-volume1 volume was not correctly created. It is because no valid host was found by the Block Storage scheduler service. 10.1. Log in to controller0 as heat-admin. [student@workstation ~(developer1-finance)]$ ssh heat-admin@controller0
10.2. Verify that the Block Storage scheduler log file, located at /var/log/cinder/scheduler.log, reports a no valid host issue. [heat-admin@overcloud-controller-0 ~]$ sudo less /var/log/cinder/scheduler.log ...output omitted... (...) in rbd.RBD.create (rbd.c:3227)\n', u'PermissionError: error creating image \n'] (...) No valid host was found. (...)
11.
Verify that the Block Storage volume service's status is up, to rule out any issue related to RabbitMQ. 11.1. Load admin credentials. [heat-admin@overcloud-controller-0 ~]$ source overcloudrc
11.2. Verify that the Block Storage volume service's status is up. [heat-admin@overcloud-controller-0 ~]$ openstack volume service list \ -c Binary -c Status -c State +------------------+---------+-------+ | Binary | Status | State |
+------------------+---------+-------+ | cinder-volume | enabled | up | ...output omitted... +------------------+---------+-------+
12. Verify that the Block Storage service is configured to use the openstack user and the volumes pool. When done, verify that the volume creation error is related to the permissions of the openstack user in Ceph. This user needs read, write, and execute capabilities on the volumes pool. 12.1. Verify that the Block Storage service is configured to use the openstack user and the volumes pool. [heat-admin@overcloud-controller-0 ~]$ sudo grep "rbd_" \ /etc/cinder/cinder.conf ...output omitted... rbd_pool=volumes rbd_user=openstack ...output omitted...
12.2. Log in to ceph0 as heat-admin. [heat-admin@overcloud-controller-0 ~]$ exit [student@workstation ~(developer1-finance)]$ ssh heat-admin@ceph0
12.3. Verify that the volumes pool is available. [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd lspools 0 rbd,1 metrics,2 images,3 backups,4 volumes,5 vms,
12.4. Verify that the openstack user has no capabilities on the volumes pool. [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth get client.openstack exported keyring for client.openstack [client.openstack] key = AQCg7T5ZAAAAABAAI6ZtsCQEuvVNqoyRKzeNcw== caps mon = "allow r" caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics"
13. Fix the issue by adding read, write, and execute capabilities to the openstack user on the volumes pool. 13.1. Add the read, write, and execute capabilities to the openstack user on the volumes pool. Unfortunately, you cannot simply append to the existing list of capabilities; you must retype it entirely.
Important Please note that the line starting with osd must be entered as a single line.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth caps \ client.openstack \ mon 'allow r' \ osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics' updated caps for client.openstack
13.2. Verify that the openstack user's capabilities have been correctly updated. When done, log out from the Ceph node. [heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth get client.openstack exported keyring for client.openstack [client.openstack] key = AQCg7T5ZAAAAABAAI6ZtsCQEuvVNqoyRKzeNcw== caps mon = "allow r" caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics" [heat-admin@overcloud-cephstorage-0 ~]$ logout
14. On workstation, try again to create a 1 GB volume named finance-volume1. This time the volume creation will succeed. First, you need to delete the failed finance-volume1 volume. 14.1. Delete the failed finance-volume1 volume. [student@workstation ~(developer1-finance)]$ openstack volume delete \ finance-volume1
14.2. Create a 1 GB volume named finance-volume1. [student@workstation ~(developer1-finance)]$ openstack volume create \ --size 1 finance-volume1
14.3. Verify that the finance-volume1 volume has been correctly created. The volume status should show available. If the status is error, ensure that the permissions were set correctly in the previous step. [student@workstation ~(developer1-finance)]$ openstack volume list +---------------+-----------------+-----------+------+-------------+ | ID | Display Name | Status | Size | Attached to | +---------------+-----------------+-----------+------+-------------+ | e454(...)ddc8 | finance-volume1 | available | 1 | | +---------------+-----------------+-----------+------+-------------+
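Although not required by this exercise, the new volume could now be attached to the running instance to confirm that the storage path works end to end; for example:
[student@workstation ~(developer1-finance)]$ openstack server add volume finance-web1 finance-volume1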
Cleanup From workstation, run the lab troubleshooting-services cleanup script to clean up this exercise.
[student@workstation ~]$ lab troubleshooting-services cleanup
Lab: Troubleshooting OpenStack
In this lab, you will find and fix issues in the OpenStack environment. You will solve problems in the areas of authentication, networking, compute nodes, and security. Finally, you will launch an instance and ensure that everything is working as it should. Outcomes You should be able to: • Troubleshoot authentication issues within OpenStack • Search log files to help describe the nature of the problem • Troubleshoot messaging issues within OpenStack • Troubleshoot networking issues within OpenStack Before you begin Log in to workstation as student using student as the password. From workstation, run lab troubleshooting-review setup, which verifies that OpenStack services are running and the resources required for the lab are available. This script also breaks the nova configuration, authentication, and networking. This script downloads the QCOW2 file that you need to create images, and creates the rc files (admin-rc and operator1-production-rc) that you will need during this lab. [student@workstation ~]$ lab troubleshooting-review setup
Steps 1. As the operator1 user, remove the existing image called production-rhel7. The operator1-production-rc file can be found in student's home directory on workstation. Troubleshoot any problems. 2.
Source the admin-rc credential file, then run lab troubleshooting-review break to set up the next part of the lab exercise. [student@workstation ~]$ source ~/admin-rc [student@workstation ~(admin-admin)]$ lab troubleshooting-review break
3.
Re-source the /home/student/operator1-production-rc and attempt to list the images. It should fail. Troubleshoot any issues and fix the problem.
4.
Create a new server instance named production-web1. Use the m1.web flavor, the operator1-keypair1 key pair, the production-network1 network, the production-web security group, and the rhel7 image. This action will fail. Troubleshoot any issues and fix the problem.
5.
Create a floating IP address and assign it to the instance. Troubleshoot any issues and fix the problem.
6.
Access the instance using SSH. An error will occur. Troubleshoot any issues and fix the problem.
7.
Create a volume named production-volume1, size 1 GB. Verify the volume status. Use the admin user's Identity service rc file on controller0 at /home/heat-admin/overcloudrc. Troubleshoot any issues and fix the problem.
Evaluation On workstation, run the lab troubleshooting-review grade command to confirm success of this exercise. [student@workstation ~]$ lab troubleshooting-review grade
Cleanup From workstation, run the lab troubleshooting-review cleanup script to clean up this exercise. [student@workstation ~]$ lab troubleshooting-review cleanup
Solution
In this lab, you will find and fix issues in the OpenStack environment. You will solve problems in the areas of authentication, networking, compute nodes, and security. Finally, you will launch an instance and ensure that everything is working as it should. Outcomes You should be able to: • Troubleshoot authentication issues within OpenStack • Search log files to help describe the nature of the problem • Troubleshoot messaging issues within OpenStack • Troubleshoot networking issues within OpenStack Before you begin Log in to workstation as student using student as the password. From workstation, run lab troubleshooting-review setup, which verifies that OpenStack services are running and the resources required for the lab are available. This script also breaks the nova configuration, authentication, and networking. This script downloads the QCOW2 file that you need to create images, and creates the rc files (admin-rc and operator1-production-rc) that you will need during this lab. [student@workstation ~]$ lab troubleshooting-review setup
Steps 1. As the operator1 user, remove the existing image called production-rhel7. The operator1-production-rc file can be found in student's home directory on workstation. Troubleshoot any problems. 1.1. Source the /home/student/operator1-production-rc file. [student@workstation ~]$ source ~/operator1-production-rc
1.2. Delete the existing image. [student@workstation ~(operator1-production)]$ openstack image delete \ production-rhel7 Failed to delete image with name or ID '21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef': 403 Forbidden Image 21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef is protected and cannot be deleted. (HTTP 403) Failed to delete 1 of 1 images.
1.3. The error you see is because the image is currently protected. You need to unprotect the image before it can be deleted. [student@workstation ~(operator1-production)]$ openstack image set \ --unprotected production-rhel7 [student@workstation ~(operator1-production)]$ openstack image delete \
production-rhel7
2.
Source the admin-rc credential file, then run lab troubleshooting-review break to set up the next part of the lab exercise. [student@workstation ~]$ source ~/admin-rc [student@workstation ~(admin-admin)]$ lab troubleshooting-review break
3.
Re-source the /home/student/operator1-production-rc and attempt to list the images. It should fail. Troubleshoot any issues and fix the problem. 3.1.
[student@workstation ~(admin-admin)]$ source ~/operator1-production-rc [student@workstation ~(operator1-production)]$ openstack image list Discovering versions from the identity service failed when creating the password plugin. Attempting to determine version from URL. Unable to establish connection to http://172.25.251.50:5000/v2.0/tokens: HTTPConnectionPool(host='172.25.251.50', port=5000): Max retries exceeded with url: /v2.0/tokens.................: Failed to establish a new connection: [Errno 110] Connection timed out',)
3.2. The error occurs because OpenStack cannot authenticate the operator1 user. This can happen when the rc file for the user has a bad IP address. Check the rc file and note the OS_AUTH_URL address. Compare this IP address to the one that can be found in /etc/haproxy/haproxy.cfg on controller0. Search for the line: listen keystone_public. The second IP address is the one that must be used in the user's rc file. When done, log out from the controller node. [student@workstation ~(operator1-production)]$ ssh heat-admin@controller0 \ cat /etc/haproxy/haproxy.cfg ...output omitted... listen keystone_public bind 172.24.1.50:5000 transparent bind 172.25.250.50:5000 transparent ...output omitted...
3.3. Compare the IP address from HAProxy with the one in the rc file. You need to change the rc file to use the correct IP address to continue. ...output omitted... export OS_AUTH_URL=http://172.25.251.50:5000/v2.0 ...output omitted...
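One quick way to confirm which address actually serves the Identity API is to request the version document from the public endpoint listed in haproxy.cfg; for example:
[student@workstation ~(operator1-production)]$ curl http://172.25.250.50:5000/v2.0/
...output omitted...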
3.4. Edit the file and correct the IP address. ...output omitted... export OS_AUTH_URL=http://172.25.250.50:5000/v2.0 ...output omitted...
3.5. Source the operator1-production-rc again. Use the openstack image list command to ensure that the OS_AUTH_URL option is correct. [student@workstation ~(operator1-production)]$ source ~/operator1-production-rc
[student@workstation ~(operator1-production)]$ openstack image list +--------------------------------------+-------+--------+ | ID | Name | Status | +--------------------------------------+-------+--------+ | 21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef | rhel7 | active | +--------------------------------------+-------+--------+
4.
Create a new server instance named production-web1. Use the m1.web flavor, the operator1-keypair1 key pair, the production-network1 network, the productionweb security group, and the rhel7 image. This action will fail. Troubleshoot any issues and fix the problem. 4.1. Create a new server instance. [student@workstation ~(operator1-production)]$ openstack server create \ --flavor m1.web \ --key-name operator1-keypair1 \ --nic net-id=production-network1 \ --security-group production-web \ --image rhel7 --wait production-web1 Error creating server: production-web1 Error creating server
4.2. This error is due to a problem with the nova compute service. List the Nova services. You need to source the /home/student/admin-rc first, as operator1 does not have permission to interact directly with nova services. [student@workstation ~(operator1-production)]$ source ~/admin-rc [student@workstation ~(admin-admin)]$ nova service-list +----+-----------------+-----------------------------------+----------+------+ | ID | Binary | Host | Status | State| +----+-----------------+-----------------------------------+----------+------+ | 3 | nova-consoleauth| overcloud-controller-0.localdomain| enabled | up | | 4 | nova-scheduler | overcloud-controller-0.localdomain| enabled | up | | 5 | nova-conductor | overcloud-controller-0.localdomain| enabled | up | | 7 | nova-compute | overcloud-compute-0.localdomain | disabled | down | +----+-----------------+-----------------------------------+----------+------+
4.3. Enable the nova-compute service on the compute node. [student@workstation ~(admin-admin)]$ nova service-enable \ overcloud-compute-0.localdomain \ nova-compute +---------------------------------+--------------+---------+ | Host | Binary | Status | +---------------------------------+--------------+---------+ | overcloud-compute-0.localdomain | nova-compute | enabled | +---------------------------------+--------------+---------+
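You can confirm that the compute service is enabled and reports up before retrying; for example:
[student@workstation ~(admin-admin)]$ openstack compute service list -c Binary -c Status -c State
...output omitted...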
4.4. Source the operator1 rc file and try to create the instance again. First, delete the instance that is currently showing an error status. The instance deployment will finish correctly. [student@workstation ~(admin-admin)]$ source ~/operator1-production-rc [student@workstation ~(operator1-production)]$ openstack server delete \
production-web1 [student@workstation ~(operator1-production)]$ openstack server list [student@workstation ~(operator1-production)]$ openstack server create \ --flavor m1.web \ --nic net-id=production-network1 \ --key-name operator1-keypair1 \ --security-group production-web \ --image rhel7 --wait production-web1
5.
Create a floating IP address and assign it to the instance. Troubleshoot any issues and fix the problem. 5.1. Create the floating IP [student@workstation ~(operator1-production)]$ openstack floating ip create \ provider-172.25.250 +--------------------+---------------------+------------------+------+ | ID | Floating IP Address | Fixed IP Address | Port | +--------------------+---------------------+------------------+------+ | ce31(...)9ecb | 172.25.250.N | None | None | +--------------------+---------------------+------------------+------+ [student@workstation ~(operator1-production)]$ openstack server add \ floating ip production-web1 172.25.250.N Unable to associate floating IP 172.25.250.N to fixed IP 192.168.0.6 for instance a53e66d9-6413-4ae4-b95b-2012dd52f908. Error: External network 7aaf57c1-3c34-45df-94d3-dbc12754b22e is not reachable from subnet cfc7ddfa-4403-41a7-878f-e8679596eafd. Therefore, cannot associate Port dcb6692d-0094-42ec-bc8e-a52fd97d7a4c with a Floating IP. Neutron server returns request_ids: ['req-4f88fb24-7628-4155-a921-ff628cb4b371'] (HTTP 400) (Request-ID: req-d6862c58-66c4-44b6-a4d1-bf26514bf04b)
This error message occurs because the external network is not attached to the router of the internal network. 5.2. Create an interface. [student@workstation ~(operator1-production)]$ neutron router-gateway-set \ production-router1 provider-172.25.250
5.3. Attach the floating IP address to the instance. Verify that the instance has been assigned the floating IP address. [student@workstation ~(operator1-production)]$ openstack server add \ floating ip production-web1 172.25.250.N [student@workstation ~(operator1-production)]$ openstack server list \ -c Name -c Networks +-----------------+-----------------------------------------------+ | Name | Networks | +-----------------+-----------------------------------------------+ | production-web1 | production-network1=192.168.0.P, 172.25.250.N | +-----------------+-----------------------------------------------+
6.
Access the instance using SSH. An error will occur. Troubleshoot any issues and fix the problem.
6.1. Attempt to access the instance using SSH. [student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \ [email protected] ssh: connect to host 172.25.250.N port 22: Connection timed out
6.2. Find out which security group the instance is using, then list the rules in that security group. [student@workstation ~(operator1-production)]$ openstack server show \ production-web1 -f json ...output omitted... "security_groups": [ { "name": "production-web" } ], ...output omitted... [student@workstation ~(operator1-production)]$ openstack security group rule \ list production-web +---------------+-------------+----------+------------+-----------------------+ | ID | IP Protocol | IP Range | Port Range | Remote Security Group | +---------------+-------------+----------+------------+-----------------------+ | cc92(...)95b1 | None | None | | None | | eb84(...)c6e7 | None | None | | None | +---------------+-------------+----------+------------+-----------------------+
6.3. We can see that there is no rule allowing SSH to the instance. Create the security group rule. [student@workstation ~(operator1-production)]$ openstack security group rule \ create --protocol tcp --dst-port 22:22 production-web +-------------------+--------------------------------------+ | Field | Value | +-------------------+--------------------------------------+ | created_at | 2017-06-12T07:24:34Z | | description | | | direction | ingress | | ethertype | IPv4 | | headers | | | id | 06070264-1427-4679-bd8e-e3a8f2e189e9 | | port_range_max | 22 | | port_range_min | 22 | | project_id | 9913a8abd192443c96587a8dc1c0a364 | | project_id | 9913a8abd192443c96587a8dc1c0a364 | | protocol | tcp | | remote_group_id | None | | remote_ip_prefix | 0.0.0.0/0 | | revision_number | 1 | | security_group_id | ac9ae6e6-0056-4501-afea-f83087b8297f | | updated_at | 2017-06-12T07:24:34Z | +-------------------+--------------------------------------+
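If you also want to test reachability with ping, a similar rule could be added for ICMP (optional, not required by this lab):
[student@workstation ~(operator1-production)]$ openstack security group rule create \
--protocol icmp production-web
...output omitted...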
6.4. Now try to access the instance again. [student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \ [email protected]
Warning: Permanently added '172.25.250.N' (ECDSA) to the list of known hosts. [cloud-user@production-web1 ~]$
6.5. Log out from production-web1. [cloud-user@production-web1 ~]$ exit Connection to 172.25.250.N closed.
7.
Create a volume named production-volume1, size 1 GB. Verify the volume status. Use the admin user's Identity service rc file on controller0 at /home/heat-admin/overcloudrc. Troubleshoot any issues and fix the problem. 7.1. Create the volume. [student@workstation ~(operator1-production)]$ openstack volume create \ --size 1 production-volume1 ...output omitted...
7.2. Check the status of production-volume1. [student@workstation ~(operator1-production)]$ openstack volume list +---------------+--------------------+--------+------+-------------+ | ID | Display Name | Status | Size | Attached to | +---------------+--------------------+--------+------+-------------+ | 0da8(...)be3f | production-volume1 | error | 1 | | +---------------+--------------------+--------+------+-------------+
7.3. The volume displays an error status. The Block Storage scheduler service is unable to find a valid host on which to create the volume. The Block Storage volume service is currently down. Log into controller0 as heat-admin. [student@workstation ~(operator1-production)]$ ssh heat-admin@controller0
7.4. Verify that no valid host was found to create the production-volume1 in the Block Storage scheduler's log file. [heat-admin@overcloud-controller-0 ~]$ sudo less /var/log/cinder/scheduler.log ...output omitted... 201 (...) Failed to run task cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create: No valid host was found. No weighed hosts available
7.5. Load the admin credentials and verify that the Cinder volume service is down. The admin credential can be found in /home/heat-admin/overcloudrc. [heat-admin@overcloud-controller-0 ~]$ source ~/overcloudrc [heat-admin@overcloud-controller-0 ~]$ openstack volume service list \ -c Binary -c Host -c Status -c State +------------------+------------------------+---------+-------+ | Binary | Host | Status | State | +------------------+------------------------+---------+-------+ | cinder-scheduler | hostgroup | enabled | up | | cinder-volume | hostgroup@tripleo_ceph | enabled | down |
+------------------+------------------------+---------+-------+
7.6. Confirm that the IP address and port for the RabbitMQ cluster and the rabbitmq-clone Pacemaker resource are correct. [heat-admin@overcloud-controller-0 ~]$ sudo rabbitmqctl status Status of node 'rabbit@overcloud-controller-0' ... ...output omitted... {listeners,[{clustering,25672,"::"},{amqp,5672,"172.24.1.1"}]}, ...output omitted...
7.7. Verify the Cinder configuration file. The user name for rabbit_userid is wrong. In the following output, you can see that the default is guest, but it is currently set to change_me.
[heat-admin@overcloud-controller-0 ~]$ sudo cat /etc/cinder/cinder.conf \
| grep rabbit_userid
#rabbit_userid = guest
rabbit_userid = change_me
7.8. Using crudini, change the RabbitMQ user name in the Cinder configuration file. Then restart the openstack-cinder-volume resource in the Pacemaker cluster to apply the changes, and log out. [heat-admin@overcloud-controller-0 ~]$ sudo crudini --set \ /etc/cinder/cinder.conf \ oslo_messaging_rabbit rabbit_userid guest [heat-admin@overcloud-controller-0 ~]$ sudo pcs resource restart \ openstack-cinder-volume [heat-admin@overcloud-controller-0 ~]$ exit
7.9. On workstation, delete the incorrect volume and recreate it. Verify it has been properly created. [student@workstation ~(operator1-production]$ openstack volume delete \ production-volume1 [student@workstation ~(operator1-production)]$ openstack volume create \ --size 1 production-volume1 +---------------------+--------------------------------------+ | Field | Value | +---------------------+--------------------------------------+ | attachments | [] | | availability_zone | nova | | bootable | false | | consistencygroup_id | None | | created_at | 2017-06-14T08:08:01.726844 | | description | None | | encrypted | False | | id | 128a9514-f8bd-4162-9f7e-72036f684cba | | multiattach | False | | name | production-volume1 | | properties | | | replication_status | disabled | | size | 1 | | snapshot_id | None |
| source_volid | None | | status | creating | | type | None | | updated_at | None | | user_id | 0ac575bb96e24950a9551ac4cda082a4 | +---------------------+--------------------------------------+ [student@workstation ~(operator1-production)]$ openstack volume list +--------------------------------------+--------------------+-----------+------+ | ID | Display Name | Status | Size | +--------------------------------------+--------------------+-----------+------+ | 128a9514-f8bd-4162-9f7e-72036f684cba | production-volume1 | available | 1 | +--------------------------------------+--------------------+-----------+------+
Evaluation On workstation, run the lab troubleshooting-review grade command to confirm success of this exercise. [student@workstation ~]$ lab troubleshooting-review grade
Cleanup From workstation, run the lab troubleshooting-review cleanup script to clean up this exercise. [student@workstation ~]$ lab troubleshooting-review cleanup
Summary
In this chapter, you learned:
• The overcloud uses the HAProxy service to balance traffic to OpenStack services.
• The OpenStack compute service is composed of different components running on both the controller and the compute nodes. These components include the Compute scheduler and the Nova compute services.
• The Compute scheduler component selects a compute node to deploy an instance based on an algorithm. By default, this algorithm is filter-based.
• The Compute component orchestrates the instance deployment and sends the compute node status to the Compute scheduler component. The no valid host error means that the Compute scheduler has not identified a compute node that can provide the resources required by the instance.
• The keystone_admin and the keystone_public services in HAProxy support the three endpoints for the Keystone identity service: public, admin, and internal.
• Issues in OpenStack services are usually related either to failing communication because of a nonfunctioning messaging service, or to a misconfiguration or issue in the storage back end, such as Ceph.
• The RabbitMQ service is managed by a Pacemaker cluster running on the controller node.
• To access an instance using a floating IP, both the external network associated with that floating IP and the internal network to which the instance is connected have to be connected using a router.
• If an image is set as protected, it cannot be removed.
• The OpenStack Block Storage service requires that the openstack user has read, write, and execute capabilities on both the volumes and the images pools in Ceph.
CHAPTER 8
MONITORING CLOUD METRICS FOR AUTOSCALING
Overview
Goal: Monitor and analyze cloud metrics for use in orchestration autoscaling.
Objectives:
• Describe the architecture of Ceilometer, Aodh, Gnocchi, Panko, and agent plugins.
• Analyze OpenStack metrics for use in autoscaling.
Sections:
• Describing OpenStack Telemetry Architecture (and Quiz)
• Analyzing Cloud Metrics for Autoscaling (and Guided Exercise)
Lab:
• Monitoring Cloud Metrics for Autoscaling
Describing OpenStack Telemetry Architecture
Objective
After completing this section, students should be able to describe the architecture of Ceilometer, Aodh, Gnocchi, Panko, and agent plugins.
Telemetry Architecture and Services
In Red Hat OpenStack Platform, the Telemetry service provides user-level usage data for OpenStack components. These data are used for system monitoring, alerts, and for generating customer usage billing. The Telemetry service collects data using polling agents and notification agents. The polling agents poll the OpenStack infrastructure resources, such as the hypervisor, to publish the meters on the notification bus. The notification agent listens to the notifications on the OpenStack notification bus and converts them into meter events and samples. Most OpenStack resources are able to send such events using the notification system built into oslo.messaging. The normalized data collected by the Telemetry service is then published to various targets.
The sample data collected by the various agents is stored in the database by the OpenStack Telemetry collector service. The Telemetry collector service uses a pluggable storage system and various databases, such as MongoDB. The Telemetry API service allows authenticated users to run query requests against this data store. The query requests on a data store return a list of resources and statistics based on the various metrics collected. With this architecture, the Telemetry API encountered scalability issues as query requests to read the metric data from the data store increased. Each query request requires the data store to do a full scan of all sample data stored in the database.
A new metering service named Gnocchi was introduced to decouple the storing of metric data from the Telemetry service to increase efficiency. Similarly, alerts that were once handled by the Telemetry service were handed over to a new alarming service named Aodh. The Panko service now stores all the events generated by the Telemetry service. By decoupling these services from Telemetry, the scalability of the Telemetry service is greatly enhanced.
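On an overcloud controller, these agents run as systemd services; a quick way to see them, assuming the default RHOSP 10 unit names:
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl status openstack-ceilometer-notification \
openstack-ceilometer-central openstack-ceilometer-collector
...output omitted...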
Figure 8.1: Telemetry Service Architecture
The Telemetry Service (Ceilometer)
The Telemetry service collects data by using two built-in plugins:
• Notification agents: This is the preferred method for collecting data. An agent monitors the message bus for data sent by different OpenStack services such as Compute, Image, Block Storage, Orchestration, and Identity. Messages are then processed by various plugins to convert them into events and samples.
• Polling agents: These agents poll services for collecting data. Polling agents are configured either to get information about the hypervisor or to use a remote API such as IPMI to gather the power state of a compute node. This method is less preferred because this approach increases the load on the Telemetry service API endpoint.
Data gathered by notification and polling agents is processed by various transformers to generate data samples. For example, to get a CPU utilization percentage, multiple CPU utilization sample data collected over a period can be aggregated. The processed data samples get published to Gnocchi for long term storage or to an external system using a publisher.
Polling Agent Plugins
The Telemetry service uses a polling agent to gather information about the infrastructure that is not published by events and notifications from OpenStack components. The polling agents use the APIs exposed by the different OpenStack services and other hardware assets such as compute nodes. The Telemetry service uses agent plugins to support this polling mechanism. The three default agent plugins used for polling are:
• Compute agent: This agent gathers resource data about all instances running on different compute nodes. The compute agent is installed on every compute node to facilitate interaction with the local hypervisor. Sample data collected by a compute agent is sent to the message bus. The sample data is processed by the notification agent and published to different publishers.
• Central agent: These agents use the REST APIs of various OpenStack services to gather additional information that was not sent as a notification. A central agent polls networking, object storage, block storage, and hardware resources using SNMP. The sample data collected is sent to the message bus to be processed by the notification agent.
• IPMI agent: This agent uses the ipmitool utility to gather IPMI sensor data. An IPMI-capable host requires that an IPMI agent is installed. The sample data gathered is used for providing metrics associated with the physical hardware.
Gnocchi
Gnocchi is based on a time series database used to store metrics and resources published by the Telemetry service. A time series database is optimized for handling data that contains arrays of numbers indexed by time stamp. The Gnocchi service provides a REST API to create or edit metric data. The gnocchi-metricd service computes statistics, in real time, on received data. This computed data is stored and indexed for fast retrieval. Gnocchi supports various back ends for storing the metric data and indexed data. Currently supported storage drivers for storing metric data include file, Ceph, Swift, S3, and Redis. The default storage driver is file. An overcloud deployment uses the ceph storage driver as the storage for the metric data. Gnocchi can use a PostgreSQL or a MySQL database to store indexed data and any associated metadata. The default storage driver for indexed data is PostgreSQL.
Figure 8.2: Gnocchi Service Architecture
The Telemetry service uses the Gnocchi API service to publish data samples to Gnocchi for processing and storage. Received data samples are stored in temporary measure storage. The gnocchi-metricd service reads the measures from the measure storage. The gnocchi-metricd service then computes the measures based on the archive policy and the aggregation methods defined for the meter. The computed statistics are then stored for long term in the metric storage. To retrieve the metric data, a client, such as the Telemetry alarming service, uses the Gnocchi API service to read the metric measures from the metric storage, and the metric metadata stored in the index storage.
Aodh
Aodh provides the alarming services within the Telemetry architecture. For example, you might want to trigger an alarm when CPU utilization of an instance reaches 70% for more than 10 minutes. To create an Aodh alarm, an alarm action and conditions need to be defined. An alarm rule is used to define when the alarm is to be triggered. The alarm rule can be based on an event or on a computed statistic. The definition of an action to be taken when the alarm is triggered supports multiple forms:
• An HTTP callback URL, invoked when the alarm is triggered.
• A log file to log the event information.
• A notification sent using the messaging bus.
Panko
Panko provides the service to store events collected by the Telemetry service from various OpenStack components. The Panko service allows storing event data in long term storage, to be used for auditing and system debugging.
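As an illustration of an alarm rule based on a computed statistic, an alarm like the CPU example above might be created with the aodh client. This is only a sketch; the alarm name, resource ID, and log action are placeholders, and the threshold values would depend on your use case:
[user@demo]$ aodh alarm create --name cpu-high --type gnocchi_resources_threshold \
--metric cpu_util --threshold 70 --comparison-operator gt --aggregation-method mean \
--granularity 600 --evaluation-periods 1 --resource-type instance \
--resource-id 11111111-2222-3333-4444-555555555555 --alarm-action 'log://'
...output omitted...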
Telemetry Use Cases and Best Practices
The Telemetry service provides metric data to support billing systems for OpenStack cloud resources. The Telemetry service gathers information about the system and stores it to provide data required for billing purposes.
This data can be fed into cloud management software, such as Red Hat CloudForms, to provide itemized billing and a charge-back to the cloud users.
Best Practices for using Telemetry
The Telemetry service collects all data based on OpenStack components. A cloud administrator may not require all of the data gathered by the Telemetry service. Reducing the amount of data sent to the underlying storage increases performance, as this reduces the number of CPU cycles spent on transformation. To decrease the data being collected by the Telemetry service, an OpenStack administrator can edit the /etc/ceilometer/pipeline.yaml file to include only the relevant meters. This decreases the data gathered by the Telemetry service. The Telemetry service polls the service APIs every 10 minutes by default. Increasing the polling interval makes the service wait longer before sending metric data to storage, which may improve performance. Editing the /etc/ceilometer/pipeline.yaml file is covered in further detail later in this section.
Best Practices for using Gnocchi
Gnocchi aggregates the data dynamically when it receives the data from the Telemetry service. Gnocchi does not store the data as is, but aggregates it over a given period. An archive policy defines the time span and the level of precision that is kept when aggregating data. The time span defines how long the time series archive will be retained in the metric storage. The level of precision represents the granularity to be used when performing the aggregation. For example, if an archive policy defines a policy of 20 points with a granularity of 1 second, then the archive keeps up to 20 seconds of data, each point representing an aggregation over 1 second. Three archive policies are defined by default: low, medium, and high. The archive policy to be used depends on your use case. Depending on the usage of the data, you can either use one of the default policies or define your own archive policy.
Gnocchi Default Archive Policies
Policy name   Archive policy definition
low           • Stores metric data with 5 minute granularity over 30 days.
medium        • Stores metric data with one minute granularity over 7 days.
              • Stores metric data with one hour granularity over 365 days.
high          • Stores metric data with one second granularity over one hour.
              • Stores metric data with one minute granularity over 7 days.
              • Stores metric data with one hour granularity over 365 days.
The gnocchi-metricd daemon is used to compute the statistics of gathered data samples. If the amount of data to be processed increases, the gnocchi-metricd daemon can be scaled out to any number of servers.
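You can inspect the default archive policies and the metrics stored under them with the Gnocchi client; for example:
[user@demo]$ gnocchi archive-policy list
...output omitted...
[user@demo]$ gnocchi metric list
...output omitted...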
Configuration Files and Logs
The Telemetry service defines various configuration files present under the /etc/ceilometer directory. These files include:
Telemetry Configuration Files
File Name                 Description
ceilometer.conf           Configures Telemetry services and agents.
event_definitions.yaml    Defines how events received from other OpenStack components translate to Telemetry events.
pipeline.yaml             Defines the pipeline for the Telemetry service to transform and publish data. This file can be modified to adjust polling intervals and the number of samples generated by the Telemetry module.
meters.yaml               Defines meters. New meters can be added by updating this file.
gnocchi-resources.yaml    Defines the mapping between Telemetry samples and Gnocchi resources and metrics.
event_pipeline.yaml       Defines which notification event types are captured and where the events are published.
policy.json               Defines access control policies for the Telemetry service.
The /etc/ceilometer/ceilometer.conf file defines the dispatcher for the processing of metering data with the meter_dispatchers variable. Gnocchi is used as the default meter dispatcher in the overcloud environment. The output below shows the dispatcher configured to use Gnocchi for processing the metering data (which is the default).
# Dispatchers to process metering data. (multi valued)
# Deprecated group/name - [DEFAULT]/dispatcher
#meter_dispatchers = database
meter_dispatchers=gnocchi
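To check which dispatcher is currently configured, you could query the option directly; a sketch, assuming crudini is available on the controller node:
[user@demo]$ sudo crudini --get /etc/ceilometer/ceilometer.conf DEFAULT meter_dispatchers
gnocchi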
Pipelines are defined in the /etc/ceilometer/pipeline.yaml file. The processing of sample data is handled by notification agents. The source of data is events or samples gathered by the notification agents from the notification bus. Pipelines describe a coupling between sources of data and the corresponding sinks for the transformation and publication of data.

The sinks section defined in the /etc/ceilometer/pipeline.yaml file provides the logic for sample data transformation and defines how the processed data is published. In the pipeline configuration below, the cpu meter, collected at a 600-second interval, is subjected to two sinks, named cpu_sink and cpu_delta_sink. The rate_of_change transformer generates the cpu_util meter from the sample values of the cpu counter, which represents cumulative CPU time in nanoseconds, using the conversion defined by the scale parameter.

---
sources:
    - name: cpu_source
      interval: 600
      meters:
          - "cpu"
      sinks:
          - cpu_sink
          - cpu_delta_sink
sinks:
    - name: cpu_sink
      transformers:
          - name: "rate_of_change"
            parameters:
                target:
                    name: "cpu_util"
                    unit: "%"
                    type: "gauge"
                    scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
      publishers:
          - notifier://
    - name: cpu_delta_sink
      transformers:
          - name: "delta"
            parameters:
                target:
                    name: "cpu.delta"
                growth_only: True
      publishers:
          - notifier://
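As a sketch of the best practice described earlier, the sources section can be trimmed so that only the relevant meters are collected and the polling interval is lengthened. The 3600-second interval and the meter selection below are only example values, not recommendations from the course:

sources:
    - name: cpu_source
      interval: 3600
      meters:
          - "cpu"
          - "memory.usage"
      sinks:
          - cpu_sink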
The processed data is published over the messaging bus to the persistent storage of several consumers. The publishers section in pipeline.yaml defines the destination for published data. The Telemetry service supports three types of publishers:

• gnocchi: stores the metric data in the Gnocchi time series database.
• panko: stores the event data in the Panko data store.
• notifier: sends the data to the AMQP messaging bus.

Troubleshooting the Telemetry Service

To troubleshoot the Telemetry service, an administrator must analyze the following Telemetry service log files, found in /var/log/ceilometer/:

Telemetry Log Files

agent-notification.log
    Logs the information generated by the notification agent.

central.log
    Logs the information generated by the central agent.

collector.log
    Logs the information generated by the collector service.
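For example (an illustrative command, not part of the course exercises), recent errors reported by the notification agent on a controller node can be located with grep:

[heat-admin@overcloud-controller-0 ~]$ sudo grep -i error \
/var/log/ceilometer/agent-notification.log | tail -n 5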
References

Gnocchi Project Architecture
http://gnocchi.xyz/architecture.html

Telemetry service
https://docs.openstack.org/newton/config-reference/telemetry.html

Telemetry service overview
https://docs.openstack.org/mitaka/install-guide-rdo/common/get_started_telemetry.html

Ceilometer architecture
https://docs.openstack.org/ceilometer/latest/admin/telemetry-system-architecture.html
Quiz: Describing OpenStack Telemetry Architecture

Choose the correct answer(s) to the following questions:

1. Which service is responsible for storing metering data gathered by the Telemetry service?
   a. Panko
   b. Oslo
   c. Aodh
   d. Ceilometer
   e. Gnocchi

2. What two data collection mechanisms are leveraged by the Telemetry service? (Choose two.)
   a. Polling agent
   b. Publisher agent
   c. Push agent
   d. Notification agent

3. Which configuration file contains the meter definitions for the Telemetry service?
   a. /etc/ceilometer/ceilometer.conf
   b. /etc/ceilometer/meters.conf
   c. /etc/ceilometer/definitions.yaml
   d. /etc/ceilometer/meters.yaml
   e. /etc/ceilometer/resources.yaml

4. What three publisher types are supported by the Telemetry service? (Choose three.)
   a. Panko
   b. Aodh
   c. Notifier
   d. Gnocchi

5. What two default archive policies are defined in the Gnocchi service? (Choose two.)
   a. low
   b. coarse
   c. medium
   d. sparse
   e. moderate
Solution

Choose the correct answer(s) to the following questions:

1. Which service is responsible for storing metering data gathered by the Telemetry service?
   a. Panko
   b. Oslo
   c. Aodh
   d. Ceilometer
   e. Gnocchi (correct)

2. What two data collection mechanisms are leveraged by the Telemetry service? (Choose two.)
   a. Polling agent (correct)
   b. Publisher agent
   c. Push agent
   d. Notification agent (correct)

3. Which configuration file contains the meter definitions for the Telemetry service?
   a. /etc/ceilometer/ceilometer.conf
   b. /etc/ceilometer/meters.conf
   c. /etc/ceilometer/definitions.yaml
   d. /etc/ceilometer/meters.yaml (correct)
   e. /etc/ceilometer/resources.yaml

4. What three publisher types are supported by the Telemetry service? (Choose three.)
   a. Panko (correct)
   b. Aodh
   c. Notifier (correct)
   d. Gnocchi (correct)

5. What two default archive policies are defined in the Gnocchi service? (Choose two.)
   a. low (correct)
   b. coarse
   c. medium (correct)
   d. sparse
   e. moderate
Analyzing Cloud Metrics for Autoscaling

Objective
After completing this section, students should be able to analyze OpenStack metrics for use in autoscaling.
Retrieve and Analyze OpenStack Metrics

The Telemetry service stores the metrics associated with various OpenStack services persistently using the Time Series Database (Gnocchi) service. An authenticated user is allowed to send a request to a Time Series Database service API endpoint to read the measures stored in the data store.

Time Series Database Resources
Resources are objects that represent cloud components, such as an instance, volume, image, load balancer VIP, host, IPMI sensor, and so on. The measures stored in the Gnocchi Time Series Database service are indexed based on the resource and its attributes.

Time Series Database Measure
A measure in the Time Series Database service is the data gathered for a resource at a given time. The Time Series Database service stores each measure as a lightweight record that includes a time stamp and a value.

[user@demo ~]$ openstack metric measures show \
--resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \
cpu_util
+---------------------------+-------------+----------------+
| timestamp                 | granularity | value          |
+---------------------------+-------------+----------------+
| 2017-06-14T00:00:00+00:00 |     86400.0 | 0.542669194306 |
| 2017-06-14T15:00:00+00:00 |      3600.0 | 0.542669194306 |
| 2017-06-14T15:40:00+00:00 |       300.0 | 0.542669194306 |
+---------------------------+-------------+----------------+
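For example, the full set of resource types known to the Time Series Database service can be listed with the metric client; the same command is used in the guided exercise later in this chapter:

[user@demo ~]$ openstack metric resource-type list -c name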
Time Series Database Metrics
The Time Series Database service provides an entity called a metric that stores an aspect of the resource in the data store. For example, if the resource is an instance, the aspect is the CPU utilization, which is stored as a metric. Each metric has several attributes: a metric ID, a name, an archive policy that defines storage lifespan, and different aggregates of the measures.

[user@demo ~]$ openstack metric metric show \
--resource-id 6bd6e073-4e97-4a48-92e4-d37cb365cddb \
image.serve
+------------------------------------+------------------------------------------------+
| Field                              | Value                                          |
+------------------------------------+------------------------------------------------+
| archive_policy/aggregation_methods | std, count, 95pct, min, max, sum, median, mean |
| archive_policy/back_window         | 0                                              |
| archive_policy/definition          | - points: 12, granularity: 0:05:00, timespan:  |
|                                    |   1:00:00                                      |
|                                    | - points: 24, granularity: 1:00:00, timespan:  |
|                                    |   1 day, 0:00:00                               |
|                                    | - points: 30, granularity: 1 day, 0:00:00,     |
|                                    |   timespan: 30 days, 0:00:00                   |
| archive_policy/name                | low                                            |
...output omitted...
| id                                 | 2a8329f3-8378-49f6-aa58-d1c5d37d9b62           |
| name                               | image.serve                                    |
...output omitted...
Time Series Database Archive Policy The archive policy defined by an OpenStack administrator specifies the data storage policy in the Time Series Database service. For example, an administrator can define a policy to store data for one day with one second granularity, or one hour granularity of data to be stored for one year, or both. Aggregation methods, such as min, max, mean, sum, and so on, provided by the Time Series Database service are used to aggregate the measures based on granularity specified in the policy. The aggregated data is stored in the database according to the archive policies. Archive policies are defined on a per-metric basis and are used to determine the lifespan of stored aggregated data. Using the OpenStack CLI to Analyze Metrics The command-line tool provided by the python-gnocchiclient package helps retrieve and analyze the metrics stored in the Time Series Database service. The openstack metric command is used to retrieve and analyze the Telemetry metrics. To retrieve all the resources and the respective resource IDs, use the openstack metric resource list command. [user@demo ~]$ openstack metric resource list -c type -c id +--------------------------------------+----------------------------+ | id | type | +--------------------------------------+----------------------------+ | 4464b986-4bd8-48a2-a014-835506692317 | image | | 05a6a936-4a4c-5d1b-b355-2fd6e2e47647 | instance_disk | | cef757c0-6137-5905-9edc-ce9c4d2b9003 | instance_network_interface | | 6776f92f-0706-54d8-94a1-2dd8d2397825 | instance_disk | | dbf53681-540f-5ee1-9b00-c06bb53cbd62 | instance_disk | | cebc8e2f-3c8f-45a1-8f71-6f03f017c623 | swift_account | | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | instance | +--------------------------------------+----------------------------+
The Time Series Database service allows you to create custom resource types to enable the use of elements that are part of your architecture but are not tied to any OpenStack resources. For example, when using a hardware load balancer in the architecture, a custom resource type can be created. These custom resource types use all the features provided by the Time Series Database service, such as searching through the resources, associating metrics, and so on. To create a custom resource type, use the openstack metric resource-type create. The --attribute option is used to specify various attributes that are associated with the resource type. These attributes are used to search for resources associated with a resource type. [user@demo ~]$ openstack metric resource-type create \ --attribute display_name:string:true:max_length=255 \ mycustomresource +-------------------------+----------------------------------------------------------+ | Field | Value | +-------------------------+----------------------------------------------------------+ | attributes/display_name | max_length=255, min_length=0, required=True, type=string | | name | mycustomresource |
| state                   | active                                                    |
+-------------------------+-----------------------------------------------------------+
To list the metrics associated with a resource, use the openstack metric resource show command. The resource ID is retrieved using the openstack metric resource list -type command, which filters based on resource type. [user@demo ~]$ openstack metric resource list --type image -c type -c id +--------------------------------------+----------------------------+ | id | type | +--------------------------------------+----------------------------+ | 4464b986-4bd8-48a2-a014-835506692317 | image | +--------------------------------------+----------------------------+ [user@demo ~]$ openstack metric resource show 4464b986-4bd8-48a2-a014-835506692317 +-----------------------+------------------------------------------------------+ | Field | Value | +-----------------------+------------------------------------------------------+ | created_by_project_id | d42393f674a9488abe11bd0ef6d18a18 | | created_by_user_id | 7521059a98cc4d579eea897027027575 | | ended_at | None | | id | 4464b986-4bd8-48a2-a014-835506692317 | | metrics | image.download: 7b52afb7-3b25-4722-8028-3d3cc3041316 | | | image.serve: 2e0027b9-bc99-425f-931a-a3afad313cb3 | | | image.size: ff4b7310-e6f9-4871-98b8-fff2006fb897 | | | image: b0feed69-078b-4ab7-9f58-d18b293c110e | | original_resource_id | 4464b986-4bd8-48a2-a014-835506692317 | | project_id | fd0ce487ea074bc0ace047accb3163da | | revision_end | None | | revision_start | 2017-05-16T03:48:57.218470+00:00 | | started_at | 2017-05-16T03:48:57.218458+00:00 | | type | image | | user_id | None | +-----------------------+------------------------------------------------------+
New metrics can be added to a resource by an administrator using the openstack metric resource update command. The --add-metric option can be used to add any existing metric. The --create-metric option is used to create and then add a metric. The --createmetric option requires the metric name and the archive policy to be attached to the metric. To add a new metric named custommetric with the low archive policy to an image resource, use the command as shown. The resource ID in this example is the ID that was shown previously. [user@demo ~]$ openstack metric resource update \ --type image \ --create-metric custommetric:low \ 4464b986-4bd8-48a2-a014-835506692317 +-----------------------+------------------------------------------------------+ | Field | Value | +-----------------------+------------------------------------------------------+ | container_format | bare | | created_by_project_id | df179bcea2e540e398f20400bc654cec | | created_by_user_id | b74410917d314f22b0301c55c0edd39e | | disk_format | qcow2 | | ended_at | None | | id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb | | metrics | custommetric: ff016814-9047-4ee7-9719-839c9b79e837 | | | image.download: fc82d8eb-2f04-4a84-8bc7-fe35130d28eb | | | image.serve: 2a8329f3-8378-49f6-aa58-d1c5d37d9b62 | | | image.size: 9b065b52-acf0-4906-bcc6-b9604efdb5e5 | | | image: 09883163-6783-4106-96ba-de15201e72f9 |
| name                  | finance-rhel7                                        |
| original_resource_id  | 6bd6e073-4e97-4a48-92e4-d37cb365cddb                 |
| project_id            | cebc8e2f3c8f45a18f716f03f017c623                     |
| revision_end          | None                                                 |
| revision_start        | 2017-05-23T04:06:12.958634+00:00                     |
| started_at            | 2017-05-23T04:06:12.958618+00:00                     |
| type                  | image                                                |
| user_id               | None                                                 |
+-----------------------+------------------------------------------------------+
All the metrics provided by the Telemetry service can be listed by an OpenStack administrator using the openstack metric metric list command. [user@demo]$ openstack metric metric list -c name -c unit -c archive_policy/name +---------------------+---------------------------------+-----------+ | archive_policy/name | name | unit | +---------------------+---------------------------------+-----------+ | low | disk.iops | None | | low | disk.root.size | GB | | low | subnet.create | None | | low | storage.objects.outgoing.bytes | None | | low | disk.allocation | B | | low | network.update | None | | low | disk.latency | None | | low | disk.read.bytes | B | ...output omitted...
The openstack metric metric show command shows the metric details. The resource ID of a resource is retrieved using the openstack metric resource list command. To list the detailed information of the image.serve metric for an image with the 6bd6e073-4e97-4a48-92e4-d37cb365cddb resource ID, run the following command: [user@demo ~]$ openstack metric metric show \ --resource-id 6bd6e073-4e97-4a48-92e4-d37cb365cddb \ image.serve +------------------------------------+------------------------------------------------+ | Field | Value | +------------------------------------+------------------------------------------------+ | archive_policy/aggregation_methods | std, count, 95pct, min, max, sum, median, mean | | archive_policy/back_window | 0 | | archive_policy/definition | - points: 12, granularity: 0:05:00, timespan: | | | 1:00:00 | | | - points: 24, granularity: 1:00:00, timespan: | | | 1 day, 0:00:00 | | | - points: 30, granularity: 1 day, 0:00:00, | | | timespan: 30 days, 0:00:00 | | archive_policy/name | low | | created_by_project_id | df179bcea2e540e398f20400bc654cec | | created_by_user_id | b74410917d314f22b0301c55c0edd39e | | id | 2a8329f3-8378-49f6-aa58-d1c5d37d9b62 | | name | image.serve | | resource/created_by_project_id | df179bcea2e540e398f20400bc654cec | | resource/created_by_user_id | b74410917d314f22b0301c55c0edd39e | | resource/ended_at | None | | resource/id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb | | resource/original_resource_id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb | | resource/project_id | cebc8e2f3c8f45a18f716f03f017c623 | | resource/revision_end | None | | resource/revision_start | 2017-05-23T04:06:12.958634+00:00 |
| resource/started_at                | 2017-05-23T04:06:12.958618+00:00               |
| resource/type                      | image                                          |
| resource/user_id                   | None                                           |
| unit                               | None                                           |
+------------------------------------+------------------------------------------------+
The openstack metric archive-policy list command list the archive policies. [user@demo ~]$ openstack metric archive-policy list -c name -c definition +--------+---------------------------------------+ | name | definition | +--------+---------------------------------------+ | high | - points: 3600, granularity: 0:00:01, | | | timespan: 1:00:00 | | | - points: 10080, granularity: | | | 0:01:00, timespan: 7 days, 0:00:00 | | | - points: 8760, granularity: 1:00:00, | | | timespan: 365 days, 0:00:00 | | low | - points: 12, granularity: 0:05:00, | | | timespan: 1:00:00 | | | - points: 24, granularity: 1:00:00, | | | timespan: 1 day, 0:00:00 | | | - points: 30, granularity: 1 day, | | | 0:00:00, timespan: 30 days, 0:00:00 | | medium | - points: 1440, granularity: 0:01:00, | | | timespan: 1 day, 0:00:00 | | | - points: 168, granularity: 1:00:00, | | | timespan: 7 days, 0:00:00 | | | - points: 365, granularity: 1 day, | | | 0:00:00, timespan: 365 days, 0:00:00 | +--------+---------------------------------------+ [user@demo ~]$ openstack metric archive-policy list -c name -c aggregation_methods +--------+---------------------------------------+ | name | aggregation_methods | +--------+---------------------------------------+ | high | std, count, 95pct, min, max, sum, | | | median, mean | | low | std, count, 95pct, min, max, sum, | | | median, mean | | medium | std, count, 95pct, min, max, sum, | | | median, mean | +--------+---------------------------------------+
A Telemetry service administrator can add measures to the data store using the openstack metric measures add command. To view measures, use the openstack metric measures show command. Both commands require the metric name and resource ID as parameters.

The Time Series Database service uses the ISO 8601 time stamp format for output. In ISO 8601 notation, the date, time, and time zone are represented in the following format: yyyymmddThhmmss+|-hhmm. The date -u "+%FT%T.%6N" command converts the current date and time into the ISO 8601 timestamp format. Measures are added using the yyyymmddThhmmss+|-hhmm@value format. Multiple measures can be added using the openstack metric measures add --measure command.

The resource ID of a resource is retrieved using the openstack metric resource list command. To list the metrics associated with a resource, use the openstack metric resource show command. The default aggregation method used by the openstack metric measures show command is mean.
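For instance, running the date command shown above produces a timestamp in the required ISO 8601 format; the value below is only an example of the expected shape:

[user@demo ~]$ date -u "+%FT%T.%6N"
2017-06-14T15:42:10.123456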
Important For removing measures, administrator privileges are required. The final entry in the following output shows the average CPU utilization for a resource. [user@demo ~]$ openstack metric measures add \ --resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \ --measure $(date -u "+%FT%T.%6N")@23 \ cpu_util [user@demo ~]$ openstack metric measures show \ --resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \ --refresh cpu_util +---------------------------+-------------+----------------+ | timestamp | granularity | value | +---------------------------+-------------+----------------+ | 2017-04-27T00:00:00+00:00 | 86400.0 | 11.0085787895 | | 2017-04-27T11:00:00+00:00 | 3600.0 | 0.312042039086 | | 2017-04-27T12:00:00+00:00 | 3600.0 | 16.3568471647 | | 2017-04-27T11:45:00+00:00 | 300.0 | 0.374260637142 | | 2017-04-27T11:55:00+00:00 | 300.0 | 0.24982344103 | | 2017-04-27T12:05:00+00:00 | 300.0 | 0.263997402134 | | 2017-04-27T12:15:00+00:00 | 300.0 | 0.163391256752 | | 2017-04-27T12:20:00+00:00 | 300.0 | 32.5 | +---------------------------+-------------+----------------+
Note
For querying and adding measures, a few other time stamp formats are supported. For example, 50 minutes indicates 50 minutes from now, and - 50 minutes indicates 50 minutes ago. Time stamps based on the UNIX epoch are also supported.

Use aggregation methods such as min, max, mean, and sum to display the measures based on the granularity. The following command shows how to list measures with a particular aggregation method. The command uses the resource ID associated with an instance to display the minimum CPU utilization for different granularities. The --refresh option is used to include all new measures. The final entry of the following screen capture shows the minimum CPU utilization for the resource.

[user@demo ~]$ openstack metric measures show \
--resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \
--aggregation min \
--refresh cpu_util
+---------------------------+-------------+----------------+
| timestamp                 | granularity | value          |
+---------------------------+-------------+----------------+
| 2017-04-27T00:00:00+00:00 |     86400.0 | 0.163391256752 |
| 2017-04-27T11:00:00+00:00 |      3600.0 | 0.24982344103  |
| 2017-04-27T12:00:00+00:00 |      3600.0 | 0.163391256752 |
| 2017-04-27T11:45:00+00:00 |       300.0 | 0.374260637142 |
| 2017-04-27T11:55:00+00:00 |       300.0 | 0.24982344103  |
| 2017-04-27T12:05:00+00:00 |       300.0 | 0.263997402134 |
| 2017-04-27T12:15:00+00:00 |       300.0 | 0.163391256752 |
| 2017-04-27T12:20:00+00:00 |       300.0 | 23.0           |
+---------------------------+-------------+----------------+
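Combining the relative time stamp formats described in the note with the --start option, a query can be limited to a recent window. The following is a sketch only; the 30-minute window is an arbitrary example value:

[user@demo ~]$ openstack metric measures show \
--resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \
--start "- 30 minutes" \
--aggregation min \
cpu_util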
Querying the Telemetry Metrics The telemetry metrics stored in the database can be queried based on several conditions. For example, you can specify a time range to look for measures based on the aggregation method. Operators like equal to (eq), less than or equal to (le), greater than or equal to (ge), lesser than (lt), greater than (gt), and various not operators can be used in the query. The operators or, and, and not are also supported. The --query option uses attributes associated with a resource type. The following command displays the mean CPU utilization for all provisioned instances that use the flavor with an ID of 1, or that use the image with an ID of 6bd6e073-4e97-4a48-92e4-d37cb365cddb. [user@demo ~]$ openstack metric resource-type show instance +-------------------------+-----------------------------------------------------------+ | Field | Value | +-------------------------+-----------------------------------------------------------+ | attributes/display_name | max_length=255, min_length=0, required=True, type=string | | attributes/flavor_id | max_length=255, min_length=0, required=True, type=string | | attributes/host | max_length=255, min_length=0, required=True, type=string | | attributes/image_ref | max_length=255, min_length=0, required=False, type=string | | attributes/server_group | max_length=255, min_length=0, required=False, type=string | | name | instance | | state | active | +-------------------------+-----------------------------------------------------------+ [user@demo ~]$ openstack metric measures \ aggregation \ --metric cpu_util \ --aggregation mean \ --resource-type instance \ --query '(flavor_id='1')or(image_ref='6bd6e073-4e97-4a48-92e4-d37cb365cddb')' \ --refresh +---------------------------+-------------+----------------+ | timestamp | granularity | value | +---------------------------+-------------+----------------+ | 2017-06-14T00:00:00+00:00 | 86400.0 | 0.575362745414 | ...output omitted...
Use the --start option and the --stop option in the openstack metric measures aggregation command to provide the time range for computing aggregation statistics. For example, the server_group attribute of the instance resource type can be used in the --query option to group a specific set of instances, which can then be monitored for autoscaling. It is also possible to search for values in the metrics by using one or more levels of granularity. Use the --granularity option to make queries based on the granularity.

Common Telemetry Metrics
The Telemetry service collects different meters by polling the infrastructural components or by consuming notifications provided by various OpenStack services. There are three types of metric data provided by the Telemetry service:

• Cumulative: A cumulative meter provides measures that are accumulated over time. For example, total CPU time used.
• Gauge: A gauge meter records the current value at the time that a reading is recorded. For example, the number of images.
• Delta: A delta meter records the change between values recorded over a particular time period. For example, network bandwidth.

Some of the meters collected for instances are:

Compute Service Meters

Meter Name                 Meter Type    Description
memory                     gauge         The amount of memory in MB allocated to the instance.
memory.usage               gauge         Amount of memory in MB consumed by the instance.
cpu                        cumulative    Total CPU time used, in nanoseconds (ns).
cpu_util                   gauge         Average CPU utilization in percentage.
disk.read.requests         cumulative    Total number of read requests.
disk.read.requests.rate    gauge         Average rate of read requests per second.
The meters collected for the images are:

Image Service Meters

Meter Name        Meter Type    Description
image             gauge         The size of the image uploaded.
image.download    delta         The number of bytes downloaded for an image.
image.serve       delta         The number of bytes served out for an image.
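Any of the meters in these tables can be inspected with the same measures commands shown earlier. For example, using the example instance resource ID from earlier in this section, the memory.usage meter is read as follows:

[user@demo ~]$ openstack metric measures show \
--resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \
memory.usage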
Backing Up and Restoring Telemetry Data

To recover from the loss of Telemetry data, the database associated with the metering data needs to have been backed up. Both the indexed and metric storage databases associated with the Time Series Database service can be backed up using the native database tools. The indexed data stored in PostgreSQL or MySQL can be backed up using the database dump utilities. Similarly, if the metering data is stored on Ceph, Swift, or the file system, then a snapshot must be taken regularly. The procedure for restoring both data stores is to restore the data backup using the native database utilities. The Time Series Database services should be restarted after restoring the databases. The procedure to back up and restore is beyond the scope of this course.
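As a minimal sketch only (the database name and the backup destination are assumptions, and the full procedure is outside the scope of this course), the indexed data kept in MariaDB could be dumped on a controller node with mysqldump:

[heat-admin@overcloud-controller-0 ~]$ sudo mysqldump gnocchi > /tmp/gnocchi-indexer.sql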
Creating and Managing Telemetry Alarms

Aodh is the alarming service in the Telemetry service architecture. Aodh allows OpenStack users to create alarms based on events and metrics provided by OpenStack services. When creating an alarm based on metrics, the alarm can be set for a single meter or a combination of many meters. For example, an alarm can be configured to be triggered when the memory consumption of the instance breaches 70% and the CPU utilization is more than 80%. In the case of an event alarm, the change in state of an OpenStack resource triggers the alarm. For example, updating an image property would trigger an event alarm for the image. The alarm action defines the action that needs to be taken when an alarm is triggered. In Aodh, the alarm notifier signals the activation of an alarm by using one of three methods: triggering the HTTP callback URL, writing to a log file, or sending notifications to the messaging bus.
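For an event alarm such as the image update example above, the alarm is created with --type event instead of a threshold type. The following is a hedged sketch; the alarm name is an illustrative assumption:

[user@demo ~]$ openstack alarm create \
--type event \
--name demo-image-update \
--event-type image.update \
--alarm-action 'log://'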
You can create a threshold alarm that activates when the aggregated statistics of a metric breach the threshold value. In the following example, an alarm is created to trigger when the average CPU utilization metric of the instance exceeds 80%. The alarm action specified adds an entry to the log. A query is used so that the alarm monitors the CPU utilization of a particular instance with an instance ID of 5757edba-6850-47fc-a8d4-c18026e686fb.

[usr@demo ~]$ openstack alarm create \
--type gnocchi_aggregation_by_resources_threshold \
--name high_cpu_util \
--description 'GnocchiAggregationByResourceThreshold' \
--metric cpu_util \
--aggregation-method mean \
--comparison-operator 'ge' \
--threshold 80 \
--evaluation-periods 2 \
--granularity 300 \
--alarm-action 'log://' \
--resource-type instance \
--query '{"=": {"id": "5757edba-6850-47fc-a8d4-c18026e686fb"}}'
+---------------------------+-------------------------------------------------------+
| Field                     | Value                                                 |
+---------------------------+-------------------------------------------------------+
| aggregation_method        | mean                                                  |
| alarm_actions             | [u'log://']                                           |
| alarm_id                  | 1292add6-ac57-4ae1-bd49-6147b68d8879                  |
| comparison_operator       | ge                                                    |
| description               | GnocchiAggregationByResourceThreshold                 |
| enabled                   | True                                                  |
| evaluation_periods        | 2                                                     |
| granularity               | 300                                                   |
| insufficient_data_actions | []                                                    |
| metric                    | cpu_util                                              |
| name                      | high_cpu_util                                         |
| ok_actions                | []                                                    |
| project_id                | fd0ce487ea074bc0ace047accb3163da                      |
| query                     | {"=": {"id": "5757edba-6850-47fc-a8d4-c18026e686fb"}} |
| repeat_actions            | False                                                 |
| resource_type             | instance                                              |
| severity                  | low                                                   |
| state                     | insufficient data                                     |
| state_timestamp           | 2017-05-19T06:46:19.235846                            |
| threshold                 | 80.0                                                  |
| time_constraints          | []                                                    |
| timestamp                 | 2017-05-19T06:46:19.235846                            |
| type                      | gnocchi_aggregation_by_resources_threshold            |
| user_id                   | 15ceac73d7bb4437a34ee26670571612                      |
+---------------------------+-------------------------------------------------------+
To get the alarm state, use the openstack alarm state get command. The alarm history can be viewed using the openstack alarm-history show command. This checks the alarm state transition and shows the related time stamps. [usr@demo ~]$ openstack alarm state get 1292add6-ac57-4ae1-bd49-6147b68d8879 +-------+-------+ | Field | Value | +-------+-------+ | state | alarm | +-------+-------+ [usr@demo ~]$ openstack alarm-history show 1292add6-ac57-4ae1-bd49-6147b68d8879 \ -c timestamp -c type -c details -f json
[
  {
    "timestamp": "2017-06-08T16:28:54.002079",
    "type": "state transition",
    "detail": "{\"transition_reason\": \"Transition to ok due to 2 samples inside threshold, most recent: 0.687750180591\", \"state\": \"ok\"}"
  },
  {
    "timestamp": "2017-06-08T15:25:53.525213",
    "type": "state transition",
    "detail": "{\"transition_reason\": \"2 datapoints are unknown\", \"state\": \"insufficient data\"}"
  },
  {
    "timestamp": "2017-06-08T14:05:53.477088",
    "type": "state transition",
    "detail": "{\"transition_reason\": \"Transition to alarm due to 2 samples outside threshold, most recent: 70.0\", \"state\": \"alarm\"}"
  },
...output omitted...
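In addition to openstack alarm state get and openstack alarm-history show, all alarms owned by the current project can be reviewed with openstack alarm list; for example, selecting only a few columns:

[user@demo ~]$ openstack alarm list -c alarm_id -c name -c state -c enabled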
Telemetry Metrics in Autoscaling

When using Auto Scaling to scale instances in and out, the Telemetry service alerts provided by an Aodh alarm trigger the execution of the Auto Scaling policy. The alarm watches a single metric published by the metering service and sends messages to Auto Scaling when the metric crosses the threshold value. The alarm can monitor any metric provided by the Telemetry metering service. The most common metrics for autoscaling an instance are cpu_util, memory.usage, disk.read.requests.rate, and disk.write.requests.rate. However, custom metrics can also be used to trigger autoscaling.
Monitoring Cloud Resources With the Telemetry Service

The following steps outline the process for monitoring cloud resources using the Telemetry service.

1. Use the openstack metric resource list command to find the resource ID of the desired resource.
2. Use the openstack metric resource show command with the resource ID found in the previous step to view the available meters for the resource. Make note of the metric ID.
3. Use the openstack metric metric show command with the metric ID found in the previous step to view the details of the desired meter.
4. Create an alarm based on the desired meter using the openstack alarm create command. Use the --alarm-action option to define the action to be taken after the alarm is triggered.
5. Verify the alarm state using the openstack alarm state get command.
6. List the alarm history using the openstack alarm-history show command to check the alarm state transition time stamps.
References

Further information is available in the Monitoring Using the Telemetry Service chapter of the Logging, Monitoring, and Troubleshooting Guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Guided Exercise: Analyzing Cloud Metrics for Autoscaling

In this exercise, you will view and analyze common metrics required for autoscaling.

Outcomes
You should be able to:
• List the available metrics associated with a resource.
• Analyze metrics to view the aggregated values.

Before you begin
Log in to workstation as the student user with student as the password. On workstation, run the lab monitoring-analyzing-metrics setup command. This script ensures that the OpenStack services are running and that the environment is properly configured for this exercise. The script also creates an instance named finance-web1.

[student@workstation ~]$ lab monitoring-analyzing-metrics setup
Steps
1. From workstation, connect to the controller0 node. Open the /etc/ceilometer/ceilometer.conf file and check which meter dispatcher is configured for the Telemetry service. On workstation, run the ceilometer command (which should produce an error) to verify that the Gnocchi telemetry service is running instead of Ceilometer.

1.1. Use SSH to connect to controller0 as the user heat-admin.

[student@workstation ~]$ ssh heat-admin@controller0
[heat-admin@overcloud-controller-0 ~]$
1.2. Open the /etc/ceilometer/ceilometer.conf and search for the meter_dispatchers variable. The meter dispatcher is set to gnocchi, which is storing the metering data. [heat-admin@overcloud-controller-0 ~]$ sudo grep meter_dispatchers \ /etc/ceilometer/ceilometer.conf #meter_dispatchers = database meter_dispatchers=gnocchi
1.3. Log out of the controller0 node. [heat-admin@overcloud-controller-0 ~]$ exit [student@workstation ~]$
1.4. From workstation, source the /home/student/developer1-finance-rc file. Verify that the ceilometer command returns an error because Gnocchi is set as the meter dispatcher.
[student@workstation ~]$ source ~/developer1-finance-rc [student@workstation ~(developer1-finance)]$ ceilometer --debug meter-list ...output omitted... DEBUG (client) RESP BODY: {"error_message": "410 Gone\n\nThis resource is no longer available. No forwarding address is given. \n\n This telemetry installation is configured to use Gnocchi. Please use the Gnocchi API available on the metric endpoint to retrieve data. "} ...output omitted...
2.
List the resource types available in the Telemetry metering service. Use the resource ID of the instance resource type to list all the meters available. 2.1. List the resource types available. [student@workstation ~(developer1-finance)]$ openstack metric resource-type \ list -c name +----------------------------+ | name | +----------------------------+ | ceph_account | | generic | | host | | host_disk | | host_network_interface | | identity | | image | | instance | | instance_disk | | instance_network_interface | | ipmi | | network | | stack | | swift_account | | volume | +----------------------------+
2.2. List the resources accessible by the developer1 user. Note the resource ID of the instance resource type. [student@workstation ~(developer1-finance)]$ openstack metric resource list \ -c id -c type +--------------------------------------+----------------------------+ | id | type | +--------------------------------------+----------------------------+ | 6bd6e073-4e97-4a48-92e4-d37cb365cddb | image | | 05a6a936-4a4c-5d1b-b355-2fd6e2e47647 | instance_disk | | cef757c0-6137-5905-9edc-ce9c4d2b9003 | instance_network_interface | | 6776f92f-0706-54d8-94a1-2dd8d2397825 | instance_disk | | dbf53681-540f-5ee1-9b00-c06bb53cbd62 | instance_disk | | cebc8e2f-3c8f-45a1-8f71-6f03f017c623 | swift_account | | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | instance | +--------------------------------------+----------------------------+
2.3. Verify that the instance ID of the finance-web1 instance is the same as the resource ID.
[student@workstation ~(developer1-finance)]$ openstack server list \ -c ID -c Name +--------------------------------------+--------------+ | ID | Name | +--------------------------------------+--------------+ | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | finance-web1 | +--------------------------------------+--------------+
2.4. Using the resource ID, list all the meters associated with the finance-web1 instance. [student@workstation ~(developer1-finance)]$ openstack metric resource show \ a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 +-----------------------+------------------------------------------------------+ | Field | Value | +-----------------------+------------------------------------------------------+ | created_by_project_id | d42393f674a9488abe11bd0ef6d18a18 | | created_by_user_id | 7521059a98cc4d579eea897027027575 | | ended_at | None | | id | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | | metrics | cpu.delta: 75369002-85ca-47b9-8276-88f5314aa9ad | | | cpu: 71d9c293-f9ba-4b76-aaf8-0b1806a3b280 | | | cpu_util: 37274980-f825-4aef-b1b9-fe46d266d1d8 | | | disk.allocation: 93597246-5a02-4f65-b51c| | | 2b4946f411cd | | | disk.capacity: cff713de-cdcc-4162-8a5b-16f76f86cf10 | | | disk.ephemeral.size: | | | b7cccbb5-bc27-40fe-9296-d62dfc22dfce | | | disk.iops: ccf52c4d-9f59-4f78-8b81-0cb10c02b8e3 | | | disk.latency: c6ae52ee-458f-4b8f-800a-cb00e5b1c1a6 | | | disk.read.bytes.rate: 6a1299f2-a467-4eab| | | 8d75-f8ea68cad213 | | | disk.read.bytes: | | | 311bb209-f466-4713-9ac9-aa5d8fcfbc4d | | | disk.read.requests.rate: 0a49942b-bbc9-4b2b| | | aee1-b6acdeeaf3ff | | | disk.read.requests: 2581c3bd-f894-4798-bd5b| | | 53410de25ca8 | | | disk.root.size: b8fe97f1-4d5e-4e2c-ac11-cdd92672c3c9 | | | disk.usage: 0e12b7e5-3d0b-4c0f-b20e-1da75b2bff01 | | | disk.write.bytes.rate: a2d063ed| | | 28c0-4b82-b867-84c0c6831751 | | | disk.write.bytes: | | | 8fc5a997-7fc0-43b5-88a0-ca28914e47cd | | | disk.write.requests.rate: | | | db5428d6-c6d7-4d31-888e-d72815076229 | | | disk.write.requests: a39417a5-1dca| | | 4a94-9934-1deaef04066b | | | instance: 4ee71d49-38f4-4368-b86f-a72d73861c7b | | | memory.resident: 8795fdc3-0e69-4990-bd4c| | | 61c6e1a12c1d | | | memory.usage: 902c7a71-4768-4d28-9460-259bf968aac5 | | | memory: 277778df-c551-4573-a82e-fa7d3349f06f | | | vcpus: 11ac9f36-1d1f-4e72-a1e7-9fd5b7725a14 | | original_resource_id | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | | project_id | 861b7d43e59c4edc97d1083e411caea0 | | revision_end | None | | revision_start | 2017-05-26T04:10:16.250620+00:00 | | started_at | 2017-05-26T03:32:08.440478+00:00 | | type | instance | | user_id | cbcc0ad8d6ab460ca0e36ba96528dc03 |
CL210-RHOSP10.1-en-2-20171006
Rendered for Nokia. Please do not distribute.
373
+-----------------------+------------------------------------------------------+
3.
List the meters associated with the image resource type. 3.1. Retrieve the resource ID associated with the image resource type. [student@workstation ~(developer1-finance)]$ openstack metric resource list \ --type image -c id -c type +--------------------------------------+-------+ | id | type | +--------------------------------------+-------+ | 6bd6e073-4e97-4a48-92e4-d37cb365cddb | image | +--------------------------------------+-------+
3.2. List the meters associated with the image resource ID. [student@workstation ~(developer1-finance)]$ openstack metric resource show \ 6bd6e073-4e97-4a48-92e4-d37cb365cddb +-----------------------+------------------------------------------------------+ | Field | Value | +-----------------------+------------------------------------------------------+ | created_by_project_id | df179bcea2e540e398f20400bc654cec | | created_by_user_id | b74410917d314f22b0301c55c0edd39e | | ended_at | None | | id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb | | metrics | image.download: fc82d8eb-2f04-4a84-8bc7-fe35130d28eb | | | image.serve: 2a8329f3-8378-49f6-aa58-d1c5d37d9b62 | | | image.size: 9b065b52-acf0-4906-bcc6-b9604efdb5e5 | | | image: 09883163-6783-4106-96ba-de15201e72f9 | | original_resource_id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb | | project_id | cebc8e2f3c8f45a18f716f03f017c623 | | revision_end | None | | revision_start | 2017-05-23T04:06:12.958634+00:00 | | started_at | 2017-05-23T04:06:12.958618+00:00 | | type | image | | user_id | None | +-----------------------+------------------------------------------------------+
4.
Using the resource ID, list the details for the disk.read.requests.rate metric associated with the finance-web1 instance. [student@workstation ~(developer1-finance)]$ openstack metric metric show \ --resource-id a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 \ disk.read.requests.rate +------------------------------------+-----------------------------------------+ | Field | Value | +------------------------------------+-----------------------------------------+ | archive_policy/aggregation_methods | std, count, 95pct, min, max, sum, | | | median, mean | | archive_policy/back_window | 0 | | archive_policy/definition | - points: 12, granularity: 0:05:00, | | | timespan: 1:00:00 | | | - points: 24, granularity: 1:00:00, | | | timespan: 1 day, 0:00:00 | | | - points: 30, granularity: 1 day, | | | 0:00:00, timespan: 30 days, 0:00:00 | | archive_policy/name | low | | created_by_project_id | d42393f674a9488abe11bd0ef6d18a18 | | created_by_user_id | 7521059a98cc4d579eea897027027575 |
| id | 0a49942b-bbc9-4b2b-aee1-b6acdeeaf3ff | | name | disk.read.requests.rate | | resource/created_by_project_id | d42393f674a9488abe11bd0ef6d18a18 | | resource/created_by_user_id | 7521059a98cc4d579eea897027027575 | | resource/ended_at | None | | resource/id | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | | resource/original_resource_id | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | | resource/project_id | 861b7d43e59c4edc97d1083e411caea0 | | resource/revision_end | None | | resource/revision_start | 2017-05-26T04:10:16.250620+00:00 | | resource/started_at | 2017-05-26T03:32:08.440478+00:00 | | resource/type | instance | | resource/user_id | cbcc0ad8d6ab460ca0e36ba96528dc03 | | unit | None | +------------------------------------+-----------------------------------------+
The disk.read.requests.rate metric uses the low archive policy. The low archive policy uses granularity as fine as 5 minutes for aggregation, and the maximum life span of the aggregated data is 30 days.
5.
Display the measures gathered and aggregated by the disk.read.requests.rate metric associated with the finance-web1 instance. The number of records returned in the output may vary. [student@workstation ~(developer1-finance)]$ openstack metric measures show \ --resource-id a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 \ disk.read.requests.rate +---------------------------+-------------+----------------+ | timestamp | granularity | value | +---------------------------+-------------+----------------+ | 2017-05-23T00:00:00+00:00 | 86400.0 | 0.277122710561 | | 2017-05-23T04:00:00+00:00 | 3600.0 | 0.0 | | 2017-05-23T05:00:00+00:00 | 3600.0 | 0.831368131683 | | 2017-05-23T06:00:00+00:00 | 3600.0 | 0.0 | | 2017-05-23T07:00:00+00:00 | 3600.0 | 0.0 | | 2017-05-23T05:25:00+00:00 | 300.0 | 0.0 | | 2017-05-23T05:35:00+00:00 | 300.0 | 4.92324971194 | | 2017-05-23T05:45:00+00:00 | 300.0 | 0.0 | | 2017-05-23T05:55:00+00:00 | 300.0 | 0.0 | | 2017-05-23T06:05:00+00:00 | 300.0 | 0.0 | | 2017-05-23T06:15:00+00:00 | 300.0 | 0.0 | | 2017-05-23T06:25:00+00:00 | 300.0 | 0.0 | | 2017-05-23T06:35:00+00:00 | 300.0 | 0.0 | | 2017-05-23T06:45:00+00:00 | 300.0 | 0.0 | | 2017-05-23T06:55:00+00:00 | 300.0 | 0.0 | | 2017-05-23T07:05:00+00:00 | 300.0 | 0.0 | | 2017-05-23T07:15:00+00:00 | 300.0 | 0.0 | +---------------------------+-------------+----------------+
Observe the value column, which displays the aggregated values based on the archive policy associated with the metric. The 86400, 3600, and 300 granularity column values represent aggregation periods of 1 day, 1 hour, and 5 minutes, respectively, expressed in seconds.
6.
Using the resource ID, list the maximum measures associated with the cpu_util metric with 300 seconds granularity. The number of records returned in the output may vary. [student@workstation ~(developer1-finance)]$ openstack metric measures show \ --resource-id a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 \ --aggregation max \
--granularity 300 \
cpu_util
+---------------------------+-------------+-----------------+
| timestamp                 | granularity | value           |
+---------------------------+-------------+-----------------+
| 2017-05-23T05:45:00+00:00 |       300.0 | 0.0708371692841 |
| 2017-05-23T05:55:00+00:00 |       300.0 | 0.0891683788482 |
| 2017-05-23T06:05:00+00:00 |       300.0 | 0.0907790288644 |
| 2017-05-23T06:15:00+00:00 |       300.0 | 0.0850440360854 |
| 2017-05-23T06:25:00+00:00 |       300.0 | 0.0691660923575 |
| 2017-05-23T06:35:00+00:00 |       300.0 | 0.0858326136269 |
| 2017-05-23T06:45:00+00:00 |       300.0 | 0.0666668728895 |
| 2017-05-23T06:55:00+00:00 |       300.0 | 0.0658094259754 |
| 2017-05-23T07:05:00+00:00 |       300.0 | 0.108326315232  |
| 2017-05-23T07:15:00+00:00 |       300.0 | 0.066695508806  |
| 2017-05-23T07:25:00+00:00 |       300.0 | 0.0666670677802 |
| 2017-05-23T07:35:00+00:00 |       300.0 | 0.0666727313294 |
+---------------------------+-------------+-----------------+
7.
List the average CPU utilization for all instances provisioned using the rhel7 image. Query for all instances containing the word finance in the instance name. 7.1. List the attributes supported by the instance resource type. The command returns the attributes that may be used to query this resource type. [student@workstation ~(developer1-finance)]$ openstack metric resource-type \ show instance +-------------------------+----------------------------------------------------+ | Field | Value | +-------------------------+----------------------------------------------------+ | attributes/display_name | max_length=255, min_length=0, required=True, | | | type=string | | attributes/flavor_id | max_length=255, min_length=0, required=True, | | | type=string | | attributes/host | max_length=255, min_length=0, required=True, | | | type=string | | attributes/image_ref | max_length=255, min_length=0, required=False, | | | type=string | | attributes/server_group | max_length=255, min_length=0, required=False, | | | type=string | | name | instance | | state | active | +-------------------------+----------------------------------------------------+
7.2. Only users with the admin role can query measures using resource attributes. Use the architect1 user's Identity credentials to execute the command. The architect1 credentials are stored in the /home/student/architect1-finance-rc file. [student@workstation ~(developer1-finance)]$ source ~/architect1-finance-rc [student@workstation ~(architect1-finance)]$
7.3. Retrieve the image ID for the finance-rhel7 image. [student@workstation ~(architect1-finance)]$ openstack image list +--------------------------------------+---------------+--------+ | ID | Name | Status | +--------------------------------------+---------------+--------+ | 6bd6e073-4e97-4a48-92e4-d37cb365cddb | finance-rhel7 | active |
+--------------------------------------+---------------+--------+
7.4. List the average CPU utilization for all the instances using the openstack metric measures aggregation command. Use the --query option to filter the instances. Continue using the architect1 credentials sourced in the previous step. The instance resource type has the attributes image_ref and display_name. The image_ref attribute specifies the image used for provisioning. The display_name attribute specifies the instance name. The query uses the like operator to search for the finance substring. Combine the query conditions using the and operator. The --refresh option is used to force aggregation of all known measures. The number of records returned in the output may vary.

[student@workstation ~(architect1-finance)]$ openstack metric measures \
aggregation \
--metric cpu_util \
--aggregation mean \
--resource-type instance \
--query '(display_name like "finance%")and(image_ref='6bd6e073-4e97-4a48-92e4-d37cb365cddb')' \
--refresh
+---------------------------+-------------+-----------------+
| timestamp                 | granularity | value           |
+---------------------------+-------------+-----------------+
| 2017-05-23T00:00:00+00:00 |     86400.0 | 0.107856401515  |
| 2017-05-23T04:00:00+00:00 |      3600.0 | 0.0856332847432 |
| 2017-05-23T05:00:00+00:00 |      3600.0 | 0.214997947668  |
| 2017-05-23T06:00:00+00:00 |      3600.0 | 0.0772163449665 |
| 2017-05-23T07:00:00+00:00 |      3600.0 | 0.0761148056641 |
| 2017-05-23T08:00:00+00:00 |      3600.0 | 0.073333038879  |
| 2017-05-23T09:00:00+00:00 |      3600.0 | 0.111944170402  |
| 2017-05-23T10:00:00+00:00 |      3600.0 | 0.110803068583  |
| 2017-05-23T08:15:00+00:00 |       300.0 | 0.0675114757132 |
| 2017-05-23T08:25:00+00:00 |       300.0 | 0.0858683130787 |
| 2017-05-23T08:35:00+00:00 |       300.0 | 0.0658268878936 |
| 2017-05-23T08:45:00+00:00 |       300.0 | 0.065833179174  |
| 2017-05-23T08:55:00+00:00 |       300.0 | 0.0658398475278 |
| 2017-05-23T09:05:00+00:00 |       300.0 | 0.109115311727  |
| 2017-05-23T09:15:00+00:00 |       300.0 | 0.141717706062  |
| 2017-05-23T09:25:00+00:00 |       300.0 | 0.159984446046  |
| 2017-05-23T09:35:00+00:00 |       300.0 | 0.0858446020112 |
| 2017-05-23T09:45:00+00:00 |       300.0 | 0.0875042966068 |
| 2017-05-23T09:55:00+00:00 |       300.0 | 0.087498659958  |
| 2017-05-23T10:05:00+00:00 |       300.0 | 0.110803068583  |
+---------------------------+-------------+-----------------+
Cleanup From workstation, run the lab monitoring-analyzing-metrics cleanup command to clean up this exercise. [student@workstation ~]$ lab monitoring-analyzing-metrics cleanup
Lab: Monitoring Cloud Metrics for Autoscaling

In this lab, you will analyze the Telemetry metric data and create an Aodh alarm. You will also set the alarm to trigger when the maximum CPU utilization of an instance exceeds a threshold value.

Outcomes
You should be able to:
• Search and list the metrics available with the Telemetry service for a particular user.
• View the usage data collected for a metric.
• Check which archive policy is in use for a particular metric.
• Add new measures to a metric.
• Create an alarm based on aggregated usage data of a metric, and trigger it.
• View and analyze an alarm history.

Before you begin
Log in to workstation as student with a password of student. On workstation, run the lab monitoring-review setup command. This will ensure that the OpenStack services are running and the environment has been properly configured for this lab. The script also creates an instance named production-rhel7.

[student@workstation ~]$ lab monitoring-review setup
Steps
1. List all of the instance type telemetry resources accessible by the user operator1. Ensure the production-rhel7 instance is available. Observe the resource ID of the instance. Credentials for user operator1 are in /home/student/operator1-production-rc on workstation.
2. List all metrics associated with the production-rhel7 instance.
3. List the available archive policies. Verify that the cpu_util metric of the production-rhel7 instance uses the archive policy named low.
4. Add new measures to the cpu_util metric. Observe that the newly added measures are available using the min and max aggregation methods. Use the values from the following table. The measures must be added using the architect1 user's credentials, because manipulating data points requires an account with the admin role. Credentials of user architect1 are stored in the /home/student/architect1-production-rc file.

   Measures
   Parameter         Value
   Timestamp         Current time in ISO 8601 formatted timestamp
   Measure values    30, 42

   The measure values 30 and 42 are manual data values added to the cpu_util metric.
5. Create a threshold alarm named cputhreshold-alarm based on aggregation by resources. Set the alarm to trigger when the maximum CPU utilization for the production-rhel7 instance exceeds 50% for two consecutive 5-minute periods.
6. Simulate a high CPU utilization scenario by manually adding new measures to the cpu_util metric of the instance. Observe that the alarm triggers when the aggregated CPU utilization exceeds the 50% threshold through two evaluation periods of 5 minutes each. To simulate high CPU utilization, manually add a measure with a value of 80 once every minute until the alarm triggers. It is expected to take between 5 and 10 minutes to trigger.
Evaluation
On workstation, run the lab monitoring-review grade command to confirm success of this exercise. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab monitoring-review grade
Cleanup
From workstation, run the lab monitoring-review cleanup command to clean up this exercise.

[student@workstation ~]$ lab monitoring-review cleanup
Solution

In this lab, you will analyze the Telemetry metric data and create an Aodh alarm. You will also set the alarm to trigger when the maximum CPU utilization of an instance exceeds a threshold value.

Outcomes
You should be able to:
• Search and list the metrics available with the Telemetry service for a particular user.
• View the usage data collected for a metric.
• Check which archive policy is in use for a particular metric.
• Add new measures to a metric.
• Create an alarm based on aggregated usage data of a metric, and trigger it.
• View and analyze an alarm history.

Before you begin
Log in to workstation as student with a password of student. On workstation, run the lab monitoring-review setup command. This will ensure that the OpenStack services are running and the environment has been properly configured for this lab. The script also creates an instance named production-rhel7.

[student@workstation ~]$ lab monitoring-review setup
Steps 1. List all of the instance type telemetry resources accessible by the user operator1. Ensure the production-rhel7 instance is available. Observe the resource ID of the instance. Credentials for user operator1 are in /home/student/operator1-production-rc on workstation. 1.1. From workstation, source the /home/student/operator1-production-rc file to use operator1 user credentials. Find the ID associated with the user. [student@workstation ~]$ source ~/operator1-production-rc [student@workstation ~(operator1-production)]$ openstack user show operator1 +------------+----------------------------------+ | Field | Value | +------------+----------------------------------+ | enabled | True | | id | 4301d0dfcbfb4c50a085d4e8ce7330f6 | | name | operator1 | | project_id | a8129485db844db898b8c8f45ddeb258 | +------------+----------------------------------+
1.2. Use the retrieved user ID to search the resources accessible by the operator1 user. Filter the output based on the instance resource type. [student@workstation ~(operator1-production)]$ openstack metric resource \ search user_id=4301d0dfcbfb4c50a085d4e8ce7330f6 \ -c id -c type -c user_id --type instance -f json
[
  {
    "user_id": "4301d0dfcbfb4c50a085d4e8ce7330f6",
    "type": "instance",
    "id": "969b5215-61d0-47c4-aa3d-b9fc89fcd46c"
  }
]
1.3. Observe that the ID of the resource in the previous output matches the instance ID of the production-rhel7 instance. The production-rhel7 instance is available. [student@workstation ~(operator1-production)]$ openstack server show \ production-rhel7 -c id -c name -c status +--------+--------------------------------------+ | Field | Value | +--------+--------------------------------------+ | id | 969b5215-61d0-47c4-aa3d-b9fc89fcd46c | | name | production-rhel7 | | status | ACTIVE | +--------+--------------------------------------+
The production-rhel7 instance resource ID matches the production-rhel7 instance ID. Note this resource ID, as it will be used in upcoming lab steps. 2.
List all metrics associated with the production-rhel7 instance. 2.1. Use the production-rhel7 instance resource ID to list the available metrics. Verify that the cpu_util metric is listed. [student@workstation ~(operator1-production)]$ openstack metric resource \ show 969b5215-61d0-47c4-aa3d-b9fc89fcd46c --type instance +--------------+---------------------------------------------------------------+ |Field | Value | +--------------+---------------------------------------------------------------+ |id | 969b5215-61d0-47c4-aa3d-b9fc89fcd46c | |image_ref | 280887fa-8ca4-43ab-b9b0-eea9bfc6174c | |metrics | cpu.delta: a22f5165-0803-4578-9337-68c79e005c0f | | | cpu: e410ce36-0dac-4503-8a94-323cf78e7b96 | | | cpu_util: 6804b83c-aec0-46de-bed5-9cdfd72e9145 | | | disk.allocation: 0610892e-9741-4ad5-ae97-ac153bb53aa8 | | | disk.capacity: 0e0a5313-f603-4d75-a204-9c892806c404 | | | disk.ephemeral.size: 2e3ed19b-fb02-44be-93b7-8f6c63041ac3 | | | disk.iops: 83122c52-5687-4134-a831-93d80dba4b4f | | | disk.latency: 11e2b022-b602-4c5a-b710-2acc1a82ea91 | | | disk.read.bytes.rate: 3259c60d-0cb8-47d0-94f0-cded9f30beb2 | | | disk.read.bytes: eefa65e9-0cbd-4194-bbcb-fdaf596a3337 | | | disk.read.requests.rate: 36e0b15c-4f6c-4bda-bd03-64fcea8a4c70 | | | disk.read.requests: 6f14131e-f15c-401c-9599-a5dbcc6d5f2e | | | disk.root.size: 36f6f5c1-4900-48c1-a064-482d453a4ee7 | | | disk.usage: 2e510e08-7820-4214-81ee-5647bdaf0db0 | | | disk.write.bytes.rate: 059a529e-dcad-4439-afd1-7d199254bec9 | | | disk.write.bytes: 68be5427-df81-4dac-8179-49ffbbad219e | | | disk.write.requests.rate: 4f86c785-35ef-4d92-923f-b2a80e9dd14f| | | disk.write.requests: 717ce076-c07b-4982-95ed-ba94a6993ce2 | | | instance: a91c09e3-c9b9-4f9a-848b-785f9028b78a | | | memory.resident: af7cd10e-6784-4970-98ff-49bf1e153992 | | | memory.usage: 2b9c9c3f-05ce-4370-a101-736ca2683607 | | | memory: dc4f5d14-1b55-4f44-a15c-48aac461e2bf | | | vcpus: c1cc42a0-4674-44c2-ae6d-48df463a6586 | |resource_id | 969b5215-61d0-47c4-aa3d-b9fc89fcd46c |
| ... output omitted... | +--------------+---------------------------------------------------------------+
3.
List the available archive policies. Verify that the cpu_util metric of the production-rhel7 instance uses the archive policy named low. 3.1. List the available archive policies and their supported aggregation methods. [student@workstation ~(operator1-production)]$ openstack metric archive-policy \ list -c name -c aggregation_methods +--------+------------------------------------------------+ | name | aggregation_methods | +--------+------------------------------------------------+ | high | std, count, 95pct, min, max, sum, median, mean | | low | std, count, 95pct, min, max, sum, median, mean | | medium | std, count, 95pct, min, max, sum, median, mean | +--------+------------------------------------------------+
3.2. View the definition of the low archive policy. [student@workstation ~(operator1-production)]$ openstack metric archive-policy \ show low -c name -c definition +------------+---------------------------------------------------------------+ | Field | Value | +------------+---------------------------------------------------------------+ | definition | - points: 12, granularity: 0:05:00, timespan: 1:00:00 | | | - points: 24, granularity: 1:00:00, timespan: 1 day, 0:00:00 | | | - points: 30, granularity: 1 day, 0:00:00, timespan: 30 days | | name | low | +------------+---------------------------------------------------------------+
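As a quick check of the low archive policy definition shown above: 12 points at a granularity of 0:05:00 cover 12 x 5 minutes = 1 hour, 24 points at 1:00:00 cover 24 hours (1 day), and 30 points at 1 day cover 30 days, which matches the timespan value in each row.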
3.3. Use the resource ID of the production-rhel7 instance to check which archive policy is in use for the cpu_util metric. [student@workstation ~(operator1-production)]$ openstack metric metric \ show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \ -c archive_policy/name \ cpu_util +---------------------+-------+ | Field | Value | +---------------------+-------+ | archive_policy/name | low | +---------------------+-------+
3.4. View the measures collected for the cpu_util metric associated with the production-rhel7 instance to ensure that it uses granularities according to the definition of the low archive policy. [student@workstation ~(operator1-production)]$ openstack metric measures \ show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \ cpu_util +---------------------------+-------------+----------------+ | timestamp | granularity | value | +---------------------------+-------------+----------------+ | 2017-05-28T00:00:00+00:00 | 86400.0 | 0.838532808055 | | 2017-05-28T15:00:00+00:00 | 3600.0 | 0.838532808055 |
| 2017-05-28T18:45:00+00:00 | 300.0 | 0.838532808055 | +---------------------------+-------------+----------------+
4.
Add new measures to the cpu_util metric. Observe that the newly added measures are available using the min and max aggregation methods. Use the values from the following table. The measures must be added using the architect1 user's credentials, because manipulating data points requires an account with the admin role. Credentials of the architect1 user are stored in the /home/student/architect1-production-rc file.
Measures
Parameter       Value
Timestamp       Current time in ISO 8601 formatted timestamp
Measure values  30, 42
The measure values 30 and 42 are manual data values added to the cpu_util metric. 4.1. Source architect1 user's credential file. Add 30 and 42 as new measure values. [student@workstation ~(operator1-production)]$ source ~/architect1-production-rc [student@workstation ~(architect1-production)]$ openstack metric measures add \ --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \ --measure $(date -u --iso=seconds)@30 cpu_util [student@workstation ~(architect1-production)]$ openstack metric measures add \ --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \ --measure $(date -u --iso=seconds)@42 cpu_util
4.2. Verify that the new measures have been successfully added for the cpu_util metric. Force the aggregation of all known measures. The default aggregation method is mean, so you will see a value of 36 (the mean of 30 and 42). The number of records and their values returned in the output may vary. [student@workstation ~(architect1-production)]$ openstack metric measures \ show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \ cpu_util --refresh +---------------------------+-------------+----------------+ | timestamp | granularity | value | +---------------------------+-------------+----------------+ | 2017-05-28T00:00:00+00:00 | 86400.0 | 15.419266404 | | 2017-05-28T15:00:00+00:00 | 3600.0 | 15.419266404 | | 2017-05-28T19:55:00+00:00 | 300.0 | 0.838532808055 | | 2017-05-28T20:30:00+00:00 | 300.0 | 36.0 | +---------------------------+-------------+----------------+
4.3. Display the maximum and minimum values for the cpu_util metric measure. [student@workstation ~(architect1-production)]$ openstack metric measures \ show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \ cpu_util --refresh --aggregation max +---------------------------+-------------+----------------+ | timestamp | granularity | value | +---------------------------+-------------+----------------+ | 2017-05-28T00:00:00+00:00 | 86400.0 | 42.0 | | 2017-05-28T15:00:00+00:00 | 3600.0 | 42.0 | | 2017-05-28T19:55:00+00:00 | 300.0 | 0.838532808055 | | 2017-05-28T20:30:00+00:00 | 300.0 | 42.0 | +---------------------------+-------------+----------------+
[student@workstation ~(architect1-production)]$ openstack metric measures \ show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \ cpu_util --refresh --aggregation min +---------------------------+-------------+----------------+ | timestamp | granularity | value | +---------------------------+-------------+----------------+ | 2017-05-28T00:00:00+00:00 | 86400.0 | 0.838532808055 | | 2017-05-28T15:00:00+00:00 | 3600.0 | 0.838532808055 | | 2017-05-28T20:30:00+00:00 | 300.0 | 30.0 | +---------------------------+-------------+----------------+
5.
Create a threshold alarm named cputhreshold-alarm based on aggregation by resources. Set the alarm to trigger when maximum CPU utilization for the production-rhel7 instance exceeds 50% for two consecutive 5-minute periods. 5.1. Create the alarm. [student@workstation ~(architect1-production)]$ openstack alarm create \ --type gnocchi_aggregation_by_resources_threshold \ --name cputhreshold-alarm \ --description 'Alarm to monitor CPU utilization' \ --enabled True \ --alarm-action 'log://' \ --comparison-operator ge \ --evaluation-periods 2 \ --threshold 50.0 \ --granularity 300 \ --aggregation-method max \ --metric cpu_util \ --query '{"=": {"id": "969b5215-61d0-47c4-aa3d-b9fc89fcd46c"}}' \ --resource-type instance +--------------------+-------------------------------------------------------+ | Field | Value | +--------------------+-------------------------------------------------------+ | aggregation_method | max | | alarm_actions | [u'log://'] | | alarm_id | f93a2bdc-1ac6-4640-bea8-88195c74fb45 | | comparison_operator| ge | | description | Alarm to monitor CPU utilization | | enabled | True | | evaluation_periods | 2 | | granularity | 300 | | metric | cpu_util | | name | cputhreshold-alarm | | ok_actions | [] | | project_id | ba5b8069596541f2966738ee0fee37de | | query | {"=": {"id": "969b5215-61d0-47c4-aa3d-b9fc89fcd46c"} | | repeat_actions | False | | resource_type | instance | | severity | low | | state | insufficient data | | state_timestamp | 2017-05-28T20:41:43.872594 | | threshold | 50.0 | | time_constraints | [] | | timestamp | 2017-05-28T20:41:43.872594 | | type | gnocchi_aggregation_by_resources_threshold | | user_id | 1beb5c527a8e4b42b5858fc04257d1cd | +--------------------+-------------------------------------------------------+
5.2. View the newly created alarm. Verify that the state of the alarm is either ok or insufficient data. According to the alarm definition, data is insufficient until two evaluation periods have been recorded. Continue with the next step if the state is ok or insufficient data. [student@workstation ~(architect1-production)]$ openstack alarm list -c name \ -c state -c enabled +--------------------+-------+---------+ | name | state | enabled | +--------------------+-------+---------+ | cputhreshold-alarm | ok | True | +--------------------+-------+---------+
6.
Simulate a high CPU utilization scenario by manually adding new measures to the cpu_util metric of the instance. Observe that the alarm triggers when the aggregated CPU utilization exceeds the 50% threshold through two evaluation periods of 5 minutes each. To simulate high CPU utilization, manually add a measure with a value of 80 once every minute until the alarm triggers. It is expected to take between 5 and 10 minutes to trigger. 6.1. Open two terminal windows, either stacked vertically or side-by-side. The second terminal will be used in subsequent steps to add data points until the alarm triggers. In the first window, use the watch command to repeatedly display the alarm state. [student@workstation ~(architect1-production)]$ watch openstack alarm list \ -c alarm_id -c name -c state Every 2.0s: openstack alarm list -c alarm_id -c name -c state +--------------------------------------+--------------------+-------+ | alarm_id | name | state | +--------------------------------------+--------------------+-------+ | 82f0b4b6-5955-4acd-9d2e-2ae4811b8479 | cputhreshold-alarm | ok | +--------------------------------------+--------------------+-------+
6.2. In the second terminal window, add new measures to the cpu_util metric of the production-rhel7 instance once every minute. A value of 80 will simulate high CPU utilization, since the alarm is set to trigger at 50%. [student@workstation ~(architect1-production)]$ openstack metric measures \ add --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \ --measure $(date -u --iso=seconds)@80 cpu_util
Repeat this command about once per minute. Be patient, as the trigger must detect a maximum value greater than 50 in 2 consecutive 5-minute evaluation periods. This is expected to take between 6 and 10 minutes. As long as you continue adding one measure per minute, the alarm will trigger. [student@workstation ~(architect1-production)]$ openstack metric measures \ add --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \ --measure $(date -u --iso=seconds)@80 cpu_util
Note: In a real-world environment, measures are collected automatically using various polling and notification agents. Manually adding data point measures for a metric is only for alarm configuration testing purposes.
6.3. The alarm-evaluator service will detect the new manually added measures. Within the expected 6 to 10 minutes, the alarm changes state to alarm in the first terminal window. Stop manually adding new data measures as soon as the new alarm state occurs. Observe the new alarm state. The alarm state will transition back to ok after one more evaluation period, because high CPU utilization values are no longer being received. Press CTRL-C to stop the watch. Every 2.0s: openstack alarm list -c alarm_id -c name -c state +--------------------------------------+--------------------+-------+ | alarm_id | name | state | +--------------------------------------+--------------------+-------+ | 82f0b4b6-5955-4acd-9d2e-2ae4811b8479 | cputhreshold-alarm | alarm | +--------------------------------------+--------------------+-------+
6.4. After stopping the watch and closing the second terminal, view the alarm history to analyze when the alarm transitioned from the ok state to the alarm state. The output may look similar to the lines displayed below. [student@workstation ~(architect1-production)]$ openstack alarm-history show \ 82f0b4b6-5955-4acd-9d2e-2ae4811b8479 -c timestamp -c type -c detail -f json [ { "timestamp": "2017-06-08T14:05:53.477088", "type": "state transition", "detail": "{\"transition_reason\": \"Transition to alarm due to 2 samples outside threshold, most recent: 70.0\", \"state\": \"alarm\"}" }, { "timestamp": "2017-06-08T13:18:53.356979", "type": "state transition", "detail": "{\"transition_reason\": \"Transition to ok due to 2 samples inside threshold, most recent: 0.579456043152\", \"state\": \"ok\"}" }, { "timestamp": "2017-06-08T13:15:53.338924", "type": "state transition", "detail": "{\"transition_reason\": \"2 datapoints are unknown\", \"state\": \"insufficient data\"}" }, { "timestamp": "2017-06-08T13:11:51.328482", "type": "creation", "detail": "{\"alarm_actions\": [\"log:/tmp/alarm.log\"], \"user_id\": \"b5494d9c68eb4938b024c911d75f7fa7\", \"name\": \"cputhreshold-alarm\", \"state\": \"insufficient data\", \"timestamp\": \"2017-06-08T13:11:51.328482\", \"description\": \"Alarm to monitor CPU utilization\", \"enabled\": true, \"state_timestamp\": \"2017-06-08T13:11:51.328482\", \"rule\":
Solution {\"evaluation_periods\": 2, \"metric\": \"cpu_util\", \"aggregation_method\": \"max\", \"granularity\": 300, \"threshold\": 50.0, \"query\": \"{\\\"=\\\": {\\\"id\\\": \\ \"969b5215-61d0-47c4-aa3d-b9fc89fcd46c\\\"}}\", \"comparison_operator \": \"ge\", \"resource_type\": \"instance\"},\"alarm_id\": \"82f0b4b6-5955-4acd-9d2e-2ae4811b8479\", \"time_constraints\": [], \ "insufficient_data_actions\": [], \"repeat_actions\": false, \"ok_actions \": [], \"project_id\": \"4edf4dd1e80c4e3b99c0ba797b3f3ed8\", \"type\": \"gnocchi_aggregation_by_resources_threshold\", \"severity\": \"low\"}"
Evaluation On workstation, run the lab monitoring-review grade command to confirm success of this exercise. Correct any reported failures and rerun the command until successful. [student@workstation ~]$ lab monitoring-review grade
Cleanup From workstation, run the lab monitoring-review cleanup command to clean up this exercise. [student@workstation ~]$ lab monitoring-review cleanup
Summary
In this chapter, you learned:
• Telemetry data is used for system monitoring, alerts, and for generating customer usage billing.
• The Telemetry service collects data using polling agents and notification agents.
• The Time Series Database (Gnocchi) service was introduced to decouple the storing of metric data from the Telemetry service and to increase efficiency.
• The gnocchi-metricd service is used to compute, in real time, statistics on received data.
• The Alarm (Aodh) service provides alarming services within the Telemetry service architecture.
• The Event Storage (Panko) service stores events collected by the Telemetry service from various OpenStack components.
• The measures stored in the Time Series Database are indexed based on the resource and its attributes.
• The aggregated data is stored in the metering database according to the archive policies defined on a per-metric basis.
• In the Alarm service, the alarm notifier notifies the activation of an alarm by using an HTTP callback URL, writing to a log file, or sending notifications using the messaging bus.
CHAPTER 9. ORCHESTRATING DEPLOYMENTS
Overview
Goal
Deploy Orchestration stacks that automatically scale.
Objectives
• Describe the Orchestration service architecture and use cases. • Write templates using the Heat Orchestration Template (HOT) language. • Configure automatic scaling for a stack.
Sections
• Describing Orchestration Architecture (and Quiz) • Writing Heat Orchestration Templates (and Guided Exercise) • Configuring Stack Autoscaling (and Quiz)
Describing Orchestration Architecture Objectives After completing this section, students should be able to describe Heat orchestration architecture and use cases.
Heat Orchestration and Services When managing an OpenStack infrastructure with ad hoc scripts, it can be difficult to create and manage all the infrastructural resources. Version control and tracking changes in the infrastructure can also be challenging, and replicating production environments across multiple development and testing environments becomes much harder. The Orchestration service (Heat) provides developers and system administrators an easy and repeatable way to create and manage a collection of related OpenStack resources. The Heat orchestration service deploys OpenStack resources in an orderly and predictable fashion. The user creates a Heat Orchestration Template (HOT) to describe the OpenStack resources and run time parameters required to execute an application. The Orchestration service orders the deployment of these OpenStack resources and resolves any dependencies. When provisioning your infrastructure with the Orchestration service, the Orchestration template describes the resources to be provisioned and their settings. Since templates are text files, they can be kept under a version control system to track changes to the infrastructure.
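For illustration only (this template is not part of the course materials, and the image and flavor names are assumptions), a minimal HOT template that launches a single instance might look like the following sketch:

heat_template_version: 2016-10-14

description: >
  Minimal illustrative template that launches a single instance.

parameters:
  image_name:
    type: string
    description: Name or ID of an existing image (assumed to exist)
  instance_flavor:
    type: string
    default: m1.small

resources:
  demo_server:
    type: OS::Nova::Server
    properties:
      name: demo-instance
      image: { get_param: image_name }
      flavor: { get_param: instance_flavor }

outputs:
  server_ip:
    description: First IP address of the instance
    value: { get_attr: [demo_server, first_address] }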
Heat Orchestration Service Architecture An orchestration stack is a collection of multiple infrastructure resources deployed and managed through the same interface, either the Horizon dashboard or the command-line interface. Stacks standardize and speed up delivery by providing a unified, human-readable format. The Heat orchestration project started as an analog of AWS CloudFormation, making it compatible with the template format used by CloudFormation (CFN), but it also supports its native template format, Heat Orchestration Templates (HOT). The orchestration service executes Heat Orchestration Templates (HOT) written in YAML. The YAML format is a human-readable data serialization language. The template, along with its input parameters, is submitted to the Orchestration REST APIs to deploy the stack, using either the Horizon dashboard or the OpenStack CLI commands. The Heat orchestration API service forwards requests to the Orchestration engine service using remote procedure calls (RPCs) over AMQP. Optionally, the Orchestration CFN service sends AWS CloudFormation-compatible requests to the Orchestration engine service over RPC. The Orchestration engine service interprets the orchestration template and launches the stack. The events generated by the Orchestration engine service are consumed by the Orchestration API service to provide the status of the Orchestration stack that was launched.
Figure 9.1: Heat Orchestration Service Architecture
Heat Orchestration Service In Red Hat OpenStack Platform, the orchestration service is provided by Heat. An orchestration template specifies the relationships between the resources to be deployed. The relationships specified in the template enable the orchestration engine to call different OpenStack APIs to deploy the resources in the correct order. The Orchestration template uses resource types to create various resources such as instances, volumes, and security groups. More complex resources are created using a nested stack. The Orchestration templates primarily deploy various infrastructural components. Different software configuration management tools, such as Ansible, Puppet, and others, can be integrated with the Orchestration templates to deploy software and to make configuration changes to this software.
Orchestration Use Cases and Recommended Practices The Orchestration template can be used repeatedly to create identical copies of the same Orchestration stack. A template, written in YAML formatted text, can be placed under a source control system to maintain various versions of the infrastructure deployment. Orchestration makes it easy to organize and deploy a collection of OpenStack resources, allowing you to describe dependencies and pass specific parameters at run time. Orchestration template parameters are used to customize aspects of the template at run time during the creation of the stack. The following recommended practices help you plan and organize the deployment of your Orchestration stack.
• Using multiple layers of stacks that build on top of one another is the best way to organize an orchestration stack. Putting all the resources in one stack becomes cumbersome to manage when the stack is scaled, and broadens the scope of resources to be provisioned.
• When using nested stacks, resource names or IDs can be hard-coded into the calling stack. However, hard coding resource names or IDs can make templates difficult to reuse, and may increase the overhead to get the stack deployed.
• The changes in the infrastructure after updating a stack should be verified first by doing a dry run of the stack.
• Before launching a stack, ensure all the resources to be deployed by the orchestration stack are within the project quota limits.
• With the growth of infrastructure, declaring resources in each template becomes repetitive. Such shared resources should be maintained as a separate stack and used inside a nested stack. Nested stacks are the stacks that create other stacks.
• When declaring parameters in the orchestration template, use constraints to define the format for the input parameters. Constraints allow you to describe legal input values so that the Orchestration engine catches any invalid values before creating the stack (a short sketch follows this list).
• Before using a template to create or update a stack, you can use the OpenStack CLI to validate it. Validating a template helps catch syntax and some semantic errors, such as circular dependencies, before the Orchestration stack creates any resources.
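As a minimal sketch of parameter constraints (not taken from the course materials; the parameter names and allowed values are assumptions), the parameters section of a template could constrain user input like this:

parameters:
  instance_flavor:
    type: string
    default: m1.small
    constraints:
      - allowed_values: [ m1.small, m1.medium ]
        description: Only small or medium flavors are allowed
  instance_name:
    type: string
    constraints:
      - allowed_pattern: "[a-z0-9-]+"
        description: Lowercase letters, digits, and dashes only
      - length: { min: 3, max: 30 }
        description: Name must be between 3 and 30 characters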
Configuration Files and Logs The Orchestration service uses the /etc/heat/heat.conf file for configuration. Some of the most common configuration options can be found in the following table:
encrypt_parameters_and_properties
  Encrypts template parameters that are marked as hidden, and resource properties, before storing them in the database. The parameter accepts a Boolean value.
heat_stack_user_role
  The Identity user role name associated with the user who is responsible for launching the stack. The parameter accepts the value as a string. The default value is the heat_stack_user role.
num_engine_workers
  The number of heat-engine processes to fork and run on the host. The parameter accepts the value as an integer. The default value is either the number of CPUs on the host running the heat-engine service or 4, whichever is greater.
stack_action_timeout
  The default timeout period in seconds for the creation and update of a stack. The default value is 3600 seconds (1 hour).
The log files for the orchestration service are stored in the /var/log/heat directory of the host on which the heat-api, heat-engine, and heat-manage services are running.
heat-api.log
  Stores the log related to the orchestration API service.
heat-engine.log
  Stores the log related to the orchestration engine service.
heat-manage.log
  Stores the log related to the orchestration events service.
Troubleshooting Orchestration Service Most of the errors occur while deploying the orchestration stack. The following are some of the common errors and ways to troubleshoot them.
• Editing an existing template might introduce syntax errors. Various tools help detect syntax errors in template files (for example, python -m json.tool for JSON-formatted templates). Using the --dry-run option with the openstack stack create command also validates some of the template syntax.
• If an instance goes into the ERROR state after launching a stack, troubleshoot the problem by looking at the /var/log/nova/scheduler.log log file on the compute node. If the error shows No valid host was found, the compute node does not have the required resources to launch the instance. Check the resources consumed by the instances running on the compute nodes and, if possible, change the allocation ratios in the /etc/nova/nova.conf file. To overcommit the CPU, RAM, and disk allocated on the compute nodes, use the following commands to change the allocation ratios. The ratios shown in the commands are arbitrary. [user@demo ~]$ crudini --set /etc/nova/nova.conf DEFAULT disk_allocation_ratio 2.0 [user@demo ~]$ crudini --set /etc/nova/nova.conf DEFAULT cpu_allocation_ratio 8.0 [user@demo ~]$ crudini --set /etc/nova/nova.conf DEFAULT ram_allocation_ratio 1.5
• When validating a template using the --dry-run option, the Orchestration service checks for the existence of the resources required by the template and its run time parameters. Using custom constraints allows template parameters to be validated at an early stage rather than failing during the launch of the stack.
References Further information is available in the Components chapter of the Architecture Guide for Red Hat OpenStack Platform at https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Quiz: Describing Orchestration Architecture
Choose the correct answer(s) to the following questions:
1. Which OpenStack service provides orchestration functionality in Red Hat OpenStack Platform?
a. Nova
b. Glance
c. Heat
d. Ceilometer
2. Which two template formats are supported by the Orchestration service? (Choose two.)
a. OpenStack Orchestration Template (OOT)
b. Heat Orchestration Template (HOT)
c. Rapid Deployment Template (RDP)
d. CloudFormation (CFN)
3. In what language are Orchestration templates written?
a. XML
b. JSON
c. YAML
d. HTML
4. What is the default timeout period for a stack creation?
a. 86400 Seconds
b. 3600 Seconds
c. 300 Seconds
d. 600 Seconds
5. In which log file does information related to the Orchestration engine service get logged?
a. /var/log/heat/heat-api.log
b. /var/log/heat/heat-manage.log
c. /var/log/heat/engine.log
d. /var/log/heat/heat-engine.log
6. Which command-line interface option helps to validate a template?
a. --validate
b. --run-dry
c. --dry-run
d. --yaml
Solution
Choose the correct answer(s) to the following questions:
1. Which OpenStack service provides orchestration functionality in Red Hat OpenStack Platform?
a. Nova
b. Glance
c. Heat (correct)
d. Ceilometer
2. Which two template formats are supported by the Orchestration service? (Choose two.)
a. OpenStack Orchestration Template (OOT)
b. Heat Orchestration Template (HOT) (correct)
c. Rapid Deployment Template (RDP)
d. CloudFormation (CFN) (correct)
3. In what language are Orchestration templates written?
a. XML
b. JSON
c. YAML (correct)
d. HTML
4. What is the default timeout period for a stack creation?
a. 86400 Seconds
b. 3600 Seconds (correct)
c. 300 Seconds
d. 600 Seconds
5. In which log file does information related to the Orchestration engine service get logged?
a. /var/log/heat/heat-api.log
b. /var/log/heat/heat-manage.log
c. /var/log/heat/engine.log
d. /var/log/heat/heat-engine.log (correct)
6. Which command-line interface option helps to validate a template?
a. --validate
b. --run-dry
c. --dry-run (correct)
d. --yaml
Writing Heat Orchestration Templates Objectives After completing this section, students should be able to write templates using the Heat Orchestration Template (HOT) language.
Introduction to YAML Orchestration templates are written using the YAML Ain't Markup Language (YAML) language. Therefore, it is necessary to understand the basics of YAML syntax to write an orchestration template. YAML was designed primarily for the representation of data structures, such as lists and associative arrays, in an easily written, human-readable format. This design objective is accomplished primarily by abandoning traditional enclosure syntax, such as brackets, braces, or opening and closing tags, commonly used by other languages to denote the structure of a data hierarchy. Instead, in YAML, data hierarchy structures are maintained using outline indentation. Data structures are represented using an outline format with space characters for indentation. There is no strict requirement regarding the number of space characters used for indentation, other than that data elements must be further indented than their parents to indicate nested relationships. Data elements at the same level in the data hierarchy must have the same indentation. Blank lines can optionally be added for readability. Indentation can only be performed using the space character. Indentation is critical to the proper interpretation of YAML. Since tabs are treated differently by various editors and tools, YAML forbids the use of tabs for indentation. Adding the following line to the user's $HOME/.vimrc file configures vi to insert two-space indentation when the Tab key is pressed and to auto-indent subsequent lines. autocmd FileType yaml setlocal ai ts=2 sw=2 et
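For example, the following small snippet (illustrative only; the key names are assumptions) shows a mapping that contains a nested mapping and a list, using two-space indentation:

server:
  name: demo-instance
  flavor: m1.small
  networks:
    - network: demo-net1
    - network: demo-net2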
Heat Orchestration Template (HOT) Language Heat Orchestration Template (HOT) is a language supported by the Heat orchestration service. The template uses the YAML syntax to describe various resources and properties. Each orchestration template must include the heat_template_version key with a correct orchestration template version. The orchestration template version defines both the supported format of the template and the features that are valid and supported for the Orchestration service. The orchestration template version is in a date format or uses the release name, such as newton. The openstack orchestration template version list command lists all the supported template versions. [user@demo ~]$ openstack orchestration template version list +--------------------------------------+------+ | version | type | +--------------------------------------+------+ | AWSTemplateFormatVersion.2010-09-09 | cfn | | HeatTemplateFormatVersion.2012-12-12 | cfn | | heat_template_version.2013-05-23 | hot |
| heat_template_version.2014-10-16 | hot | | heat_template_version.2015-04-30 | hot | | heat_template_version.2015-10-15 | hot | | heat_template_version.2016-04-08 | hot | | heat_template_version.2016-10-14 | hot | +--------------------------------------+------+
The description key in a template is optional, but can include some useful text that describes the purpose of the template. You can add multi-line text to the description key by using folded blocks (>) in YAML. Folded blocks replace each line break with a single space, ignoring indentation. heat_template_version: 2016-10-14 description: > This is multi-line description that describes the template usage.
Parameters The orchestration templates allow users to customize the template during deployment of the orchestration stack by use of input parameters. The input parameters are defined in the parameters section of the orchestration template. Each parameter is defined as a separate nested block with required attributes such as type or default. In the orchestration template, the parameters section uses the following syntax and attributes to define an input parameter for the template.
parameters:
  <parameter name>:
    type: <string | number | json | comma_delimited_list | boolean>
    label: <human-readable name of the parameter>
    description: <description of the parameter>
    default: <default value for the parameter>
    hidden: <true | false>
    constraints:
      <parameter constraints>
    immutable: <true | false>
type
  Data type of the parameter. The supported data types are string, number, JSON, comma delimited list, and boolean.
label
  Human-readable name for the parameter. This attribute is optional.
description
  Short description of the parameter. This attribute is optional.
default
  Default value to be used in case the user does not enter any value for the parameter. This attribute is optional.
hidden
  Determines whether the value of the parameter is hidden when the user lists information about a stack created by the orchestration template. This attribute is optional and defaults to false.
constraints
  Constraints to be applied to validate the input value provided by the user for a parameter. The constraints attribute can apply lists of different constraints. This attribute is optional.
immutable
  Defines whether the parameter can be updated. The stack fails to be updated if the parameter value is changed and the attribute value is set to true.
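As an illustrative sketch (not from the course materials; the parameter names and values are assumptions), the hidden and immutable attributes might be used as follows:

parameters:
  database_password:
    type: string
    label: Database password
    description: Password that should not be shown in stack output
    hidden: true
    immutable: true
  key_name:
    type: string
    default: demo-keypair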
The custom_constraint constraint adds an extra step of validation to verify whether the required resource exists in the environment. Custom constraints are implemented using Orchestration plugins. The custom_constraint attribute uses the name associated with the Orchestration plugin. For example, use the following syntax to ensure the existence of a Block Storage (Cinder) volume: parameters: volume_name: type: string description: volume name constraints: - custom_constraint: cinder.volume
Resources The resources section in the orchestration template defines resources provisioned during deployment of a stack. Each resource is defined as a separate nested block with its required attributes, such as type and properties. The properties attribute defines the properties required to provision the resource. The resources section in a template uses the following syntax and attributes to define a resource for the stack.
resources:
  <resource ID>:
    type: <resource type>
    properties:
      <property name>: <property value>
resource ID
  A resource name. This must be uniquely referenced within the resources section of the template.
type
  The attribute uses the resource type name. The core OpenStack resources are included in the Orchestration engine service as a built-in resource. The Orchestration service provides support for resource plugins using custom resources. This attribute is mandatory and must be specified when declaring a resource.
properties
  This attribute is used to specify a list of properties associated with a resource type. The property value is either hard-coded or uses intrinsic functions to retrieve the value. This attribute is optional.
Resource Types A resource requires a type attribute, such as OS::Nova::Server for an instance, and various properties that depend on the resource type. To list the available resource types, use the openstack orchestration resource type list command. [user@demo ~]$ openstack orchestration resource type list +----------------------------------------------+
| Resource Type | +----------------------------------------------+ ...output omitted... | OS::Nova::FloatingIP | | OS::Nova::FloatingIPAssociation | | OS::Nova::KeyPair | | OS::Nova::Server | | OS::Nova::ServerGroup | | OS::Swift::Container | +----------------------------------------------+
The OS::Heat::ResourceGroup resource type creates one or more identical resources. The resource definition is passed as a nested stack. The required property for the ResourceGroup resource type is resource_def. The value of the resource_def property is the definition of the resource to be provisioned. The count property sets the number of resources to provision. resources: my_group: type: OS::Heat::ResourceGroup properties: count: 2 resource_def: type: OS::Nova::Server properties: name: { get_param: instance_name } image: { get_param: instance_image }
Intrinsic Functions HOT provides several built-in functions that are used to perform specific tasks in the template. Intrinsic functions in the Orchestration template assign values to the properties that are available during creation of the stack. Some of the widely used intrinsic functions are listed below: • get_attr: The get_attr function references an attribute of a resource. This function takes the resource name and the attribute name as the parameters to retrieve the attribute value for the resource. resources: the_instance: type: OS::Nova::Server ...output omitted... outputs: instance_ip: description: IP address of the instance value: { get_attr: [the_instance, first_address] }
• get_param: The get_param function references an input parameter of a template and returns the value of the input parameter. This function takes the parameter name as the parameter to retrieve the value of the input parameter declared in the template. parameters: instance_flavor: type: string description: Flavor to be used by the instance. resources:
the_instance: type: OS::Nova::Server properties: flavor: { get_param: instance_flavor }
• get_resource: The get_resource function references a resource in the template. The function takes the resource name as the parameter to retrieve the resource ID of the referenced resource. resources: the_port: type: OS::Neutron::Port ...output omitted... the_instance: type: OS::Nova::Server properties: networks: port: { get_resource: the_port }
• str_replace: The str_replace function substitutes variables in an input string with values that you specify. The input string, along with its variables, is passed to the template property of the function. The values of the variables are provided using the params property as key-value pairs. outputs: website_url: description: The website URL of the application. value: str_replace: template: http://varname/MyApps params: varname: { get_attr: [ the_instance, first_address ] }
• list_join: The list_join function joins a set of strings into a single value, separated by the specified delimiter. If the delimiter is an empty string, it concatenates all of the strings. resources: random: type: OS::Heat::RandomString properties: length: 2 the_instance: type: OS::Nova::Server properties: name: { list_join: [ '-', [ {get_param: instance_name}, {get_attr: [random, value]} ] ] }
Software Configuration using Heat Orchestration Template Orchestration templates allow a variety of options to configure software on the instance provisioned by the Orchestration stack. The frequency of the software configuration changes to be applied to the software installed on the instance is the deciding factor on how to
implement software configurations. There are, broadly, three options to implement the software configuration changes using the orchestration template:
• Using a custom image that includes installed and configured software. This method can be used when there is no change in software configuration required during the life cycle of an instance.
• Using the user data script and cloud-init to configure the pre-installed software in the image. This method can be used when there is a software configuration change required once during the life cycle of an instance (at boot time). An instance must be replaced with a new instance when software configuration changes are made using this option.
• Using the OS::Heat::SoftwareDeployment resource, which allows any number of software configuration changes to be applied to an instance throughout its life cycle.
Using User Data Scripts in a Heat Orchestration Template When provisioning an instance, you can specify a user-data script to configure the software installed on the instance. Software can be baked into the image, or installed using a user-data script. In the HOT language, user data is provided using the user_data property for the OS::Nova::Server resource type. The data provided using the user_data property can be a shell script or a cloud-init script. The str_replace intrinsic function is used to set the variable value based on the parameters or the resources in a stack. The user_data_format property defines the way user data is processed by an instance. Using RAW as the value of the user_data_format property, the user data is passed to the instance unmodified. resources: the_instance: type: OS::Nova::Server properties: ...output omitted... user_data_format: RAW user_data: str_replace: template: | #!/bin/bash echo "Hello World" > /tmp/$demo params: $demo: demofile
When the user data is changed and the orchestration stack is updated using the openstack stack update command, the instance is deleted and recreated using the updated user data script. To provide complex scripts using the user-data property, one must use the get_file intrinsic function. The get_file function takes the name of a file as its argument. resources: the_instance: type: OS::Nova::Server properties: ...output omitted... user_data_format: RAW user_data: str_replace: template: { get_file: demoscript.sh } params: $demo: demofile
Using the Software Deployment Resource Type Use the OS::Heat::SoftwareDeployment resource type to initiate software configuration changes without replacing the instance with a new instance. An example use case is any situation where an instance cannot be replaced with a new instance, but software configuration changes are needed during the life cycle of the instance. The OS::Heat::SoftwareDeployment resource type allows you to add or remove software configuration multiple times from an instance during its life cycle. There are three resource types required to perform the software configuration changes using the orchestration stack. • The OS::Heat::SoftwareConfig resource type enables integration with various software configuration tools such as an Ansible playbook, shell script, or Puppet manifest. The resource type creates an immutable software configuration so that any change to software configuration replaces the old configuration with a new configuration. Properties of the OS::Heat::SoftwareConfig are config, group, inputs, and outputs. The group property defines the name of the software configuration tool to be used, such as script, ansible, or puppet. The config property sets the configuration script or manifest that specifies the actual software configuration performed on the instance. The inputs and the outputs properties represent the input parameters and the output parameters for the software configuration. resources: the_config: type: OS::Heat::SoftwareConfig properties: group: script inputs: - name: filename - name: content outputs: - name: result config: get_file: demo-script.sh
• The OS::Heat::SoftwareDeployment resource type applies the software configuration defined using the OS::Heat::SoftwareConfig resource. The SoftwareDeployment resource type allows you to provide input values, based on the input variables defined using the inputs property of the SoftwareConfig resource. When the state changes to the IN_PROGRESS state, the software configuration that has been replaced with the variable values is made available to the instance. The state is changed to the CREATE_COMPLETE state when a success or failure signal is received from the Orchestration API. The required property for the OS::Heat::SoftwareDeployment resource type is the server property. The server property is a reference to the ID of the resource to which configuration changes are applied. Other optional properties include actions, config, and input_values. The actions property defines when the software configuration needs to be initiated based on the orchestration stack state. The actions property supports CREATE, UPDATE, SUSPEND, and RESUME actions. The config property references the resource ID of the software configuration resource to execute when applying changes to the instance. The input_values property maps values to the input variables defined in the software configuration resource. resources: the_deployment: type: OS::Heat::SoftwareDeployment
properties: server: get_resource: the_server actions: - CREATE - UPDATE config: get_resource: the_config input_values: filename: demofile content: 'Hello World'
• The OS::Nova::Server resource type defines the instance on which the software configuration changes are applied. The user_data_format property of the OS::Nova::Server resource type must use the SOFTWARE_CONFIG value to support the software configuration changes using the OS::Heat::SoftwareDeployment resource. resources: the_server: type: OS::Nova::Server properties: ...output omitted... user_data_format: SOFTWARE_CONFIG
An instance that uses OS::Heat::SoftwareDeployment resources for software configuration requires orchestration agents to collect and process the configuration changes by polling the Orchestration API. You must embed these orchestration agents into the image. The python-heat-agent package must be included; it provides support for software configuration via shell scripts. Support for other software configuration tools is available from the python-heat-agent-ansible package (for Ansible playbooks) or the python-heat-agent-puppet package (for Puppet manifests).
Figure 9.2: SoftwareDeployment Workflow
Other agents used to apply software configuration changes on an instance follow:
• The os-collect-config agent polls the Orchestration API for updated resource metadata that is associated with the OS::Nova::Server resource.
• The os-refresh-config agent is executed when the os-collect-config agent detects a change in the software configuration. It refreshes the configuration by deleting the older configuration and replacing it with the newer configuration.
The os-refresh-config agent uses the group property defined for the deployment to process configuration. It uses the heat-config-hook script to apply the software configuration changes. The heat-config-hook scripts are provided by the python-heat-agent-* packages. Upon completion, the hook notifies the Orchestration API of a successful or failed configuration deployment using the heat-config-notify element. • The os-apply-config agent transforms software configuration data provided by the orchestration template into a service configuration file.
Using SoftwareDeployment Resource from an Orchestration Stack The following steps outline the process to use the OS::Heat::SoftwareDeployment resource for software configuration of an instance. A combined template sketch follows the list.
1. Create a Heat Orchestration Template file to define the orchestration stack.
2. Set the required input parameters in the orchestration stack.
3. Specify the OS::Nova::Server resource to apply the software configuration.
4. Define the OS::Heat::SoftwareConfig resource to create the configuration to be applied to the OS::Nova::Server resource.
5. Define the OS::Heat::SoftwareDeployment resource. Reference the OS::Heat::SoftwareConfig resource to set the configuration to be used. Set the server property of the OS::Heat::SoftwareDeployment resource to use the OS::Nova::Server resource. Pass the required input parameters to the OS::Heat::SoftwareDeployment resource so that they are made available to the instance during runtime. Optionally, specify the actions property to define life cycle actions that trigger the deployment.
6. Optionally, specify the output of the stack using the attributes of the OS::Heat::SoftwareDeployment resource.
7. Create the environment file with all input parameters required for launching the orchestration stack.
8. Execute a dry run to test stack creation.
9. Initiate the orchestration stack to configure the software using the openstack stack create command.
10. Optionally, change the software configuration either by editing the configuration script or by changing the input parameters passed during runtime. Commit the configuration changes to the instance by updating the stack using the openstack stack update command.
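The following combined sketch ties these steps together. It is illustrative only, not the course lab template; the resource and parameter names are assumptions, and the image is assumed to already contain the heat agents described above:

heat_template_version: 2016-10-14

description: >
  Illustrative stack combining SoftwareConfig, SoftwareDeployment, and a server.

parameters:
  image_name:
    type: string
    description: Image that already contains the heat agents (assumption)
  flavor_name:
    type: string
    default: m1.small

resources:
  demo_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      inputs:
        - name: content
      outputs:
        - name: result
      config: |
        #!/bin/bash
        # The script hook exposes inputs as environment variables
        echo "$content" > /tmp/demo-output
        # Outputs are returned by writing to ${heat_outputs_path}.<output name>
        echo "wrote /tmp/demo-output" > "${heat_outputs_path}.result"

  demo_server:
    type: OS::Nova::Server
    properties:
      image: { get_param: image_name }
      flavor: { get_param: flavor_name }
      user_data_format: SOFTWARE_CONFIG

  demo_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: { get_resource: demo_config }
      server: { get_resource: demo_server }
      actions:
        - CREATE
        - UPDATE
      input_values:
        content: 'Hello World'

outputs:
  deployment_result:
    description: Output collected from the deployment
    value: { get_attr: [demo_deployment, result] }

Changing input_values (or the script) and then running openstack stack update re-applies the configuration without replacing the instance.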
References
Template Guide
https://docs.openstack.org/heat/latest/template_guide/index.html
Software configuration
https://docs.openstack.org/heat/latest/template_guide/software_deployment.html
Guided Exercise: Writing Heat Orchestration Templates In this exercise, you will edit an orchestration template to launch a customized instance. You will use a preexisting template and troubleshoot orchestration issues. Resources Files:
http://materials.example.com/heat/finance-app1.yaml http://materials.example.com/heat/ts-stack.yaml http://materials.example.com/heat/ts-environment.yaml
Outcomes You should be able to: • Edit an orchestration template to launch a customized instance. • Launch a stack using the orchestration template. • Provision identical resources using the OS::Heat::ResourceGroup resource type. • Troubleshoot orchestration issues. Before you begin Log in to workstation as the student user with student as the password. On workstation, run the lab orchestration-heat-templates setup command. This script ensures the OpenStack services are running and the environment is properly configured for the exercise. The script also confirms that the resources needed for launching the stack are available. [student@workstation ~]$ lab orchestration-heat-templates setup
Steps 1. On workstation, create a directory named /home/student/heat-templates. The /home/student/heat-templates directory will store downloaded template files and environment files used for orchestration. [student@workstation ~]$ mkdir ~/heat-templates
2.
When you edit YAML files, you must use spaces, not the tab character, for indentation. If you use vi for text editing, add a setting in the .vimrc file to set auto-indentation and set the tab stop and shift width to two spaces for YAML files. Create the /home/student/.vimrc file with the content, as shown: autocmd FileType yaml setlocal ai ts=2 sw=2 et
3.
Download the http://materials.example.com/heat/finance-app1.yaml file in the /home/student/heat-templates directory. Edit the orchestration template to launch a customized instance. The orchestration template must orchestrate the following items: • The finance-web1 instance must install the httpd package. • The httpd service must be started and enabled. • The web server must host a web page containing the following content: You are connected to $public_ip The private IP address is: $private_ip Red Hat Training
The $public_ip variable is the floating IP address of the instance. The $private_ip variable is the private IP address of the instance. You will define these variables in the template. • The orchestration stack must retry once to execute the user data script. The user data script must return success on the successful execution of the script. The script must return the fail result if it is unable to execute the user data script within 600 seconds due to timeout. 3.1. Change to the /home/student/heat-templates directory. Download the orchestration template file from http://materials.example.com/heat/ finance-app1.yaml in the /home/student/heat-templates directory. [student@workstation ~]$ cd ~/heat-templates [student@workstation heat-templates]$ wget \ http://materials.example.com/heat/finance-app1.yaml
3.2. Use the user_data property to define the user data script to install the httpd package. The httpd service must be started and enabled to start at boot time. The user_data_format property for the OS::Nova::Server resource type must be set to RAW. Edit the /home/student/heat-templates/finance-app1.yaml file, as shown: web_server: type: OS::Nova::Server properties: name: { get_param: instance_name } image: { get_param: image_name } flavor: { get_param: instance_flavor } key_name: { get_param: key_name } networks: - port: { get_resource: web_net_port } user_data_format: RAW user_data: str_replace: template: | #!/bin/bash yum -y install httpd
systemctl restart httpd.service systemctl enable httpd.service
3.3. In the user_data property, create a web page with the following content: You are connected to $public_ip The private IP address is: $private_ip Red Hat Training
The web page uses the $public_ip and the $private_ip variables passed as parameters. These parameters are defined using the params property of the str_replace intrinsic function. The $private_ip variable uses the web_net_port resource attribute fixed_ips to retrieve the first IP address associated with the network interface. The $public_ip variable uses the web_floating_ip resource attribute floating_ip_address to set the public IP address associated with the instance. Edit the /home/student/heat-templates/finance-app1.yaml file, as shown: web_server: type: OS::Nova::Server properties: name: { get_param: instance_name } image: { get_param: image_name } flavor: { get_param: instance_flavor } key_name: { get_param: key_name } networks: - port: { get_resource: web_net_port } user_data_format: RAW user_data: str_replace: template: | #!/bin/bash yum -y install httpd systemctl restart httpd.service systemctl enable httpd.service sudo touch /var/www/html/index.html sudo cat << EOF > /var/www/html/index.html You are connected to $public_ip The private IP address is:$private_ip Red Hat Training EOF params: $private_ip: {get_attr: [web_net_port,fixed_ips,0,ip_address]} $public_ip: {get_attr: [web_floating_ip,floating_ip_address]}
3.4. Use the WaitConditionHandle resource to send a signal about the status of the user data script. The $wc_notify variable is set to the wait handle URL using the curl_cli attribute of the wait_handle resource. The $wc_notify command returns the status as SUCCESS if the web page deployed by the script is accessible and returns 200 as the HTTP status code. The web_server resource state is marked as CREATE_COMPLETE when the WaitConditionHandle resource signals SUCCESS.
The WaitConditionHandle returns FAILURE if the web page is not accessible or if it times out after 600 seconds. The web_server resource state is marked as CREATE_FAILED. Edit the /home/student/heat-templates/finance-app1.yaml file, as shown: web_server: type: OS::Nova::Server properties: name: { get_param: instance_name } image: { get_param: image_name } flavor: { get_param: instance_flavor } key_name: { get_param: key_name } networks: - port: { get_resource: web_net_port } user_data_format: RAW user_data: str_replace: template: | #!/bin/bash yum -y install httpd systemctl restart httpd.service systemctl enable httpd.service sudo touch /var/www/html/index.html sudo cat << EOF > /var/www/html/index.html You are connected to $public_ip The private IP address is:$private_ip Red Hat Training EOF export response=$(curl -s -k \ --output /dev/null \ --write-out %{http_code} http://$public_ip/) [[ ${response} -eq 200 ]] && $wc_notify \ --data-binary '{"status": "SUCCESS"}' \ || $wc_notify --data-binary '{"status": "FAILURE"}' params: $private_ip: {get_attr: [web_net_port,fixed_ips,0,ip_address]} $public_ip: {get_attr: [web_floating_ip,floating_ip_address]} $wc_notify: {get_attr: [wait_handle,curl_cli]}
Save and exit the file. 4.
Create the /home/student/heat-templates/environment.yaml environment file. Enter the values for all input parameters defined in the /home/student/heat-templates/finance-app1.yaml template file. Edit the /home/student/heat-templates/environment.yaml file with the content, as shown: parameters: image_name: finance-rhel7 instance_name: finance-web1 instance_flavor: m1.small key_name: developer1-keypair1 public_net: provider-172.25.250 private_net: finance-network1 private_subnet: finance-subnet1
5. Launch the stack and verify it by accessing the web page deployed on the instance. Use the developer1 user credentials to launch the stack.

5.1. Using the developer1 user credentials, dry run the stack to check the resources that will be created when launching the stack. Rectify all errors before proceeding to the next step to launch the stack. Use the finance-app1.yaml template file and the environment.yaml environment file. Name the stack finance-app1.
Note
Before running the dry run of the stack, download the http://materials.example.com/heat/finance-app1.yaml-final template file to the /home/student/heat-templates directory. Use the diff command to compare your edited finance-app1.yaml template file with the known good template file, finance-app1.yaml-final. Fix any differences you find, then proceed to launch the stack.

[student@workstation heat-templates]$ wget \
http://materials.example.com/heat/finance-app1.yaml-final
[student@workstation heat-templates]$ diff finance-app1.yaml \
finance-app1.yaml-final
[student@workstation heat-templates]$ source ~/developer1-finance-rc
[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment environment.yaml \
--template finance-app1.yaml \
--dry-run -c description \
finance-app1
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| description         | spawning a custom web server         |
+---------------------+--------------------------------------+
5.2. Launch the stack using the finance-app1.yaml template file and the environment.yaml environment file. Name the stack finance-app1. If the dry run is successful, run the openstack stack create command with the --enable-rollback option. Do not use the --dry-run option while launching the stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment environment.yaml \
--template finance-app1.yaml \
--enable-rollback \
--wait \
finance-app1
[finance-app1]: CREATE_IN_PROGRESS Stack CREATE started
[finance-app1.wait_handle]: CREATE_IN_PROGRESS state changed
[finance-app1.web_security_group]: CREATE_IN_PROGRESS state changed
[finance-app1.wait_handle]: CREATE_COMPLETE state changed
[finance-app1.web_security_group]: CREATE_COMPLETE state changed
[finance-app1.wait_condition]: CREATE_IN_PROGRESS state changed
[finance-app1.web_net_port]: CREATE_IN_PROGRESS state changed
[finance-app1.web_net_port]: CREATE_COMPLETE state changed
[finance-app1.web_floating_ip]: CREATE_IN_PROGRESS state changed
[finance-app1.web_floating_ip]: CREATE_COMPLETE state changed
[finance-app1.web_server]: CREATE_IN_PROGRESS state changed
[finance-app1.web_server]: CREATE_COMPLETE state changed
[finance-app1.wait_handle]: SIGNAL_COMPLETE Signal: status:SUCCESS reason:Signal 1 received
[finance-app1.wait_condition]: CREATE_COMPLETE state changed
[finance-app1]: CREATE_COMPLETE Stack CREATE completed successfully
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| id                  | 23883f81-19b0-4446-a2b8-7f261958a0f1 |
| stack_name          | finance-app1                         |
| description         | spawning a custom web server         |
| creation_time       | 2017-06-01T08:04:29Z                 |
| updated_time        | None                                 |
| stack_status        | CREATE_COMPLETE                      |
| stack_status_reason | Stack CREATE completed successfully  |
+---------------------+--------------------------------------+
5.3. List the output returned by the finance-app1 stack. Check the website_url output value.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
output list finance-app1
+----------------+------------------------------------------------------+
| output_key     | description                                          |
+----------------+------------------------------------------------------+
| web_private_ip | IP address of first web server in private network    |
| web_public_ip  | Floating IP address of the web server                |
| website_url    | This URL is the "external" URL that                  |
|                | can be used to access the web server.                |
+----------------+------------------------------------------------------+
[student@workstation heat-templates(developer1-finance)]$ openstack stack \
output show finance-app1 website_url
+--------------+--------------------------------------------+
| Field        | Value                                      |
+--------------+--------------------------------------------+
| description  | This URL is the "external" URL that can be |
|              | used to access the web server.             |
| output_key   | website_url                                |
| output_value | http://172.25.250.N/                       |
+--------------+--------------------------------------------+
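For reference, output values such as these come from the outputs section of the template. A sketch of an outputs section that would produce them is shown below; the attribute paths follow the params used earlier in this exercise, while the exact wording in your finance-app1.yaml may differ:

outputs:
  web_private_ip:
    description: IP address of first web server in private network
    value: { get_attr: [web_net_port, fixed_ips, 0, ip_address] }
  web_public_ip:
    description: Floating IP address of the web server
    value: { get_attr: [web_floating_ip, floating_ip_address] }
  website_url:
    description: >
      This URL is the "external" URL that can be used to access the
      web server.
    value:
      str_replace:
        template: http://host/
        params:
          host: { get_attr: [web_floating_ip, floating_ip_address] }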
5.4. Verify that the instance was provisioned and the user data was executed successfully on the instance. Use the curl command to access the URL returned as the value for the website_url output.

[student@workstation heat-templates(developer1-finance)]$ curl \
http://172.25.250.N/
You are connected to 172.25.250.N
The private IP address is:192.168.1.P
Red Hat Training
In the previous output, the N represents the last octet of the floating IP address associated with the instance. The P represents the last octet of the private IP address associated with the instance.

6. Delete the finance-app1 stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
delete --yes --wait finance-app1
2017-06-01 08:19:01Z [finance-app1]: DELETE_IN_PROGRESS Stack DELETE started
7. Use the OS::Heat::ResourceGroup resource type to provision identical resources. The stack must orchestrate a maximum of two such resources. The main stack must call the /home/student/heat-templates/finance-app1.yaml template for provisioning the resource defined in the file. Edit the orchestration template after downloading it from http://materials.example.com/heat/nested-stack.yaml to the /home/student/heat-templates directory.

7.1. Download the orchestration template file from http://materials.example.com/heat/nested-stack.yaml to the /home/student/heat-templates directory.

[student@workstation heat-templates(developer1-finance)]$ wget \
http://materials.example.com/heat/nested-stack.yaml
7.2. Edit the /home/student/heat-templates/nested-stack.yaml orchestration template. Add a new input parameter named instance_count under the parameters section. Use the range constraint to define the minimum number as 1 and the maximum number as 2.

parameters:
...output omitted...
  instance_count:
    type: number
    description: count of servers to be provisioned
    constraints:
      - range: { min: 1, max: 2 }
7.3. Edit the /home/student/heat-templates/nested-stack.yaml orchestration template. Add a resource named my_resource under the resources section. Use the OS::Heat::ResourceGroup resource type and set the count property to use the instance_count input parameter.

...output omitted...
resources:
  my_resource:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: instance_count }
...output omitted...
Save and exit the file.
7.4. Edit the /home/student/heat-templates/environment.yaml environment file to initialize the instance_count input parameter.

parameters:
  image_name: finance-rhel7
  instance_name: finance-web1
  instance_flavor: m1.small
  key_name: developer1-keypair1
  public_net: provider-172.25.250
  private_net: finance-network1
  private_subnet: finance-subnet1
  instance_count: 2
7.5. Edit the /home/student/heat-templates/environment.yaml environment file to define a custom resource type named My::Server::Custom::WebServer. The My::Server::Custom::WebServer custom resource type must point to the finance-app1.yaml template.

resource_registry:
  My::Server::Custom::WebServer: finance-app1.yaml

parameters:
  image_name: finance-rhel7
  instance_name: finance-web1
  instance_flavor: m1.small
  key_name: developer1-keypair1
  public_net: provider-172.25.250
  private_net: finance-network1
  private_subnet: finance-subnet1
  instance_count: 2
7.6. Open and edit the /home/student/heat-templates/nested-stack.yaml orchestration template. Set the resource_def property of the my_resource resource to use the My::Server::Custom::WebServer custom resource type. The My::Server::Custom::WebServer custom resource type uses the input parameters required to provision the instance. Edit the file to add the content, as shown:

resources:
  my_resource:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: instance_count }
      resource_def:
        type: My::Server::Custom::WebServer
        properties:
          instance_name: { get_param: instance_name }
          instance_flavor: { get_param: instance_flavor }
          image_name: { get_param: image_name }
          key_name: { get_param: key_name }
          public_net: { get_param: public_net }
          private_net: { get_param: private_net }
          private_subnet: { get_param: private_subnet }
Save and exit the file.
7.7. Use the developer1 user credentials to dry run the stack and check the resources that will be created. Name the stack finance-app2. Use the nested-stack.yaml template and the environment.yaml environment file. Rectify any errors before proceeding to the next step to launch the stack.
Note
Before running the dry run of the stack, download the http://materials.example.com/heat/nested-stack.yaml-final template file to the /home/student/heat-templates directory. Use the diff command to compare your edited nested-stack.yaml template file with the known good template file, nested-stack.yaml-final. Fix any differences you find, then proceed to launch the stack.

[student@workstation heat-templates]$ wget \
http://materials.example.com/heat/nested-stack.yaml-final
[student@workstation heat-templates]$ diff nested-stack.yaml \
nested-stack.yaml-final
[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment environment.yaml \
--template nested-stack.yaml \
--dry-run \
finance-app2
7.8. Launch the stack using the nested-stack.yaml template file and the environment.yaml environment file. Name the stack finance-app2. If the dry run succeeds, run the openstack stack create command with the --enable-rollback option. Do not use the --dry-run option while launching the stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment environment.yaml \
--template nested-stack.yaml \
--enable-rollback \
--wait \
finance-app2
2017-06-01 08:48:03Z [finance-app2]: CREATE_IN_PROGRESS Stack CREATE started
2017-06-01 08:48:03Z [finance-app2.my_resource]: CREATE_IN_PROGRESS state changed
2017-06-01 08:51:10Z [finance-app2.my_resource]: CREATE_COMPLETE state changed
2017-06-01 08:51:10Z [finance-app2]: CREATE_COMPLETE Stack CREATE completed successfully
+---------------------+------------------------------------------------------+
| Field               | Value                                                |
+---------------------+------------------------------------------------------+
| id                  | dbb32889-c565-495c-971e-8f27b4e35588                 |
| stack_name          | finance-app2                                         |
| description         | Using ResourceGroup to scale out the custom instance |
| creation_time       | 2017-06-01T08:48:02Z                                 |
| updated_time        | None                                                 |
| stack_status        | CREATE_COMPLETE                                      |
| stack_status_reason | Stack CREATE completed successfully                  |
+---------------------+------------------------------------------------------+
7.9. Verify that the finance-app2 stack provisioned two instances.

[student@workstation heat-templates(developer1-finance)]$ openstack server \
list -c Name -c Status -c Networks
+--------------+--------+----------------------------------------------+
| Name         | Status | Networks                                     |
+--------------+--------+----------------------------------------------+
| finance-web1 | ACTIVE | finance-network1=192.168.1.N, 172.25.250.P   |
| finance-web1 | ACTIVE | finance-network1=192.168.1.Q, 172.25.250.R   |
+--------------+--------+----------------------------------------------+
7.10. Delete the finance-app2 stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
delete --yes --wait finance-app2
2017-06-01 08:52:01Z [finance-app2]: DELETE_IN_PROGRESS Stack DELETE started
8. Download the template from http://materials.example.com/heat/ts-stack.yaml. Download the environment file from http://materials.example.com/heat/ts-environment.yaml. Troubleshoot the template and fix the issues to deploy the orchestration stack successfully.

8.1. Download the template and the environment files to the /home/student/heat-templates directory.

[student@workstation heat-templates(developer1-finance)]$ wget \
http://materials.example.com/heat/ts-stack.yaml
[student@workstation heat-templates(developer1-finance)]$ wget \
http://materials.example.com/heat/ts-environment.yaml
8.2. Verify that the Heat template does not contain any errors. Use the developer1 user credentials to dry run the stack and check for any errors. Name the stack finance-app3. Use the ts-stack.yaml template and the ts-environment.yaml environment file.

The finance-app3 stack dry run returns the following error:

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment ts-environment.yaml \
--template ts-stack.yaml \
--dry-run \
finance-app3
Error parsing template file:///home/student/heat-templates/ts-stack.yaml
while parsing a block mapping
  in "<unicode string>", line 58, column 5:
        type: OS::Nova::Server
        ^
expected <block end>, but found '<block mapping start>'
  in "<unicode string>", line 61, column 7:
        image: { get_param: image_name }
8.3. Fix the indentation error for the name property of the OS::Nova::Server resource type.

  web_server:
    type: OS::Nova::Server
    properties:
      name: { get_param: instance_name }
8.4. Verify the indentation fix by running the dry run of the finance-app3 stack again.

The finance-app3 stack dry run returns the following error:

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment ts-environment.yaml \
--template ts-stack.yaml \
--dry-run \
finance-app3
ERROR: Parameter 'key_name' is invalid: Error validating value 'finance-keypair1': The Key (finance-keypair1) could not be found.
8.5. Resolve the error; the key pair passed in the ts-environment.yaml file does not exist. Check which key pair exists.

[student@workstation heat-templates(developer1-finance)]$ openstack keypair \
list
+---------------------+-------------------------------------------------+
| Name                | Fingerprint                                     |
+---------------------+-------------------------------------------------+
| developer1-keypair1 | e3:f0:de:43:36:7e:e9:a4:ee:04:59:80:8b:71:48:dc |
+---------------------+-------------------------------------------------+
Edit the /home/student/heat-templates/ts-environment.yaml file. Enter the correct key pair name, developer1-keypair1.

8.6. Verify the key pair name fix in the /home/student/heat-templates/ts-environment.yaml file. The finance-app3 stack dry run must not return any error.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment ts-environment.yaml \
--template ts-stack.yaml \
--dry-run \
finance-app3
8.7. Launch the stack using the ts-stack.yaml template file and the ts-environment.yaml environment file. Name the stack finance-app3.
If the dry run succeeds, run the openstack stack create command with the --enable-rollback option. Do not use the --dry-run option while launching the stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment ts-environment.yaml \
--template ts-stack.yaml \
--enable-rollback \
--wait \
finance-app3
[finance-app3]: CREATE_IN_PROGRESS Stack CREATE started
[finance-app3.wait_handle]: CREATE_IN_PROGRESS state changed
[finance-app3.web_security_group]: CREATE_IN_PROGRESS state changed
[finance-app3.web_security_group]: CREATE_COMPLETE state changed
[finance-app3.wait_handle]: CREATE_COMPLETE state changed
[finance-app3.wait_condition]: CREATE_IN_PROGRESS state changed
[finance-app3.web_net_port]: CREATE_IN_PROGRESS state changed
[finance-app3.web_net_port]: CREATE_COMPLETE state changed
[finance-app3.web_floating_ip]: CREATE_IN_PROGRESS state changed
[finance-app3.web_server]: CREATE_IN_PROGRESS state changed
[finance-app3.web_floating_ip]: CREATE_COMPLETE state changed
[finance-app3.web_server]: CREATE_COMPLETE state changed
[finance-app3.wait_handle]: SIGNAL_COMPLETE Signal: status:SUCCESS reason:Signal 1 received
[finance-app3.wait_condition]: CREATE_COMPLETE state changed
[finance-app3]: CREATE_COMPLETE Stack CREATE completed successfully
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| id                  | 839ab589-1ded-46b2-8987-3fe18e5e823b |
| stack_name          | finance-app3                         |
| description         | spawning a custom server             |
| creation_time       | 2017-06-01T12:08:22Z                 |
| updated_time        | None                                 |
| stack_status        | CREATE_COMPLETE                      |
| stack_status_reason | Stack CREATE completed successfully  |
+---------------------+--------------------------------------+
Cleanup

From workstation, run the lab orchestration-heat-templates cleanup command to clean up this exercise.

[student@workstation ~]$ lab orchestration-heat-templates cleanup
Configuring Stack Autoscaling

Objective
After completing this section, students should be able to implement autoscaling.
Overview of Autoscaling and its Benefits

Autoscaling gives cloud applications the ability to dynamically adjust resource capacity to meet service requirements. Adding autoscaling to your architecture provides scalability, availability, and fault tolerance. Automatic scaling of the cloud infrastructure gives the cloud provider the following benefits:

• Autoscaling detects an unhealthy instance, terminates it, and launches a new instance to replace it.
• Autoscaling allows cloud resources to run with the capacity required to handle the demand.

There are two types of scaling architecture: scale-up and scale-out. In scale-up architecture, scaling adds more capacity by increasing resources such as memory, CPU, and disk IOPS. In scale-out architecture, scaling adds more capacity by increasing the number of servers that handle the load.

The scale-up architecture is simple to implement but sooner or later hits a saturation point. If you keep adding more memory to an existing cloud instance to adapt to the current load, saturation is reached once the instance's host itself runs out of resources. The scaling scope in this case depends entirely upon the hardware capacity of the node where the cloud instance is hosted. In scale-out architecture, new identical resources are created to handle the load, with virtually unlimited scaling scope. Therefore, the scale-out architecture is the preferred and recommended approach for cloud infrastructure.

Autoscaling requires a trigger, generated by an alarming service, to scale out or scale in. In Red Hat OpenStack Platform, the Orchestration service implements autoscaling by using utilization data gathered from the Telemetry service. An alarm acts as the trigger to autoscale an orchestration stack, based on the resource utilization threshold or the event pattern defined in the alarm.
Autoscaling Architecture and Services

The Orchestration service implements the autoscaling feature. An administrator creates a stack that dynamically scales based on defined scaling policies. The Telemetry service monitors cloud instances and other resources in OpenStack. The metrics collected by the Telemetry service are stored and aggregated by the Time Series Database service (Gnocchi). Based on data collected by the Time Series Database service, an alarm determines the condition upon which scaling triggers. To trigger autoscaling based on the current workload, use the metric, alarm, and scaling policy resources.
Figure 9.3: Autoscaling with Heat orchestration

An orchestration stack can also be scaled automatically using Aodh event alarms. For example, when an instance abruptly stops, the stack marks the server unhealthy and launches a new server to replace it.
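Event-driven self-healing of this kind is typically expressed with the OS::Aodh::EventAlarm resource type. The following is only a sketch and is not part of this course's templates: the event_type, the trait used in the query, and the alarm action are illustrative assumptions, and the alarm here simply signals a scale-out policy (such as the scaleup_policy shown later in this section) so that the group is rebuilt.

  instance_error_alarm:
    type: OS::Aodh::EventAlarm
    properties:
      description: React when an instance in this stack changes state
      # Fire on compute state-change notifications from the message queue.
      event_type: compute.instance.update
      query:
        # Restrict the alarm to events whose state trait reports an error;
        # the trait name and value are illustrative.
        - field: traits.state
          op: eq
          value: error
      # Invoke the scale-out policy so a replacement server is launched.
      alarm_actions:
        - { get_attr: [scaleup_policy, signal_url] }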
Autoscaling Use Cases and Best Practices

Autoscaling builds complex environments that automatically adjust capacity by dynamically adding or removing resources. This aids performance, availability, and control over infrastructure cost and usage. Autoscaling supports all the use cases where an application architecture demands scalability to maintain performance, and decreased capacity during periods of low demand to reduce cost. Autoscaling is well suited to applications that have stable usage patterns or that show variability in usage during a given period.

Autoscaling also supports creating a self-healing application architecture. In a self-healing architecture, unhealthy cloud instances in which the application is not responding are replaced by terminating the instance and launching a new one.
Figure 9.4: Deploying Infrastructure using an Orchestration Stack

Consider a deployment, illustrated in Figure 9.4: Deploying Infrastructure using an Orchestration Stack, in which the orchestration stack uses a public IP address, a load balancer pool, a load balancer, a set of cloud instances, and alarms to monitor events. The stack uses predefined event patterns generated in the OpenStack messaging queue.
Figure 9.5: Self-Healing Infrastructure using Orchestration Stack

When the event alarm associated with the load balancer detects an event indicating that one of the instances in the pool is stopped or deleted, a scaling event occurs. It first marks the server as unhealthy, and then begins a stack update to replace it with a new identical server automatically.

The following recommended practices help you to plan and organize autoscaling with an orchestration stack:

• Scale-out architecture is more suitable for cloud computing and autoscaling, whereas scale-up is a better option for a traditional virtualization platform.
• Stateless application architecture is most appropriate for autoscaling. When a server goes down or transitions into an error state, it is not repaired, but is removed from the stack and replaced by a new server.
• It is better to scale up faster than you scale down. For example, when scaling up, do not add one server after five minutes and then another one after ten minutes. Instead, add two servers at once.
• Avoid unnecessary scaling by defining a reasonable cool-down period in the Autoscaling group.
• Ensure that the Telemetry service is operating correctly and emitting the metrics required for autoscaling to work.
• The granularity defined for the alarm must match the archive policy used by the metric.
• Test your scaling policies by simulating real-world data. For example, use the openstack metric measures add command to push new measures directly to the metric and check whether that triggers the scaling as expected, as in the example below.
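A minimal sketch of such a test follows; the SERVER_ID variable, the memory metric, and the measure value are placeholders, and the exact client options can vary between gnocchiclient versions:

# SERVER_ID holds the UUID of one instance in the scaling group (placeholder).
[student@workstation ~(developer1-finance)]$ openstack metric measures add \
--resource-id ${SERVER_ID} \
--measure $(date -u +%Y-%m-%dT%H:%M:%S)@900 \
memory
# Confirm whether the corresponding alarm changed state.
[student@workstation ~(developer1-finance)]$ openstack alarm list \
-c alarm_id -c name -c state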
Autoscaling Configuration

In a template, the Autoscaling resource group defines the resource to be provisioned. It launches the number of instances defined by the desired capacity or minimum group size parameters. Telemetry alarms are defined to trigger autoscaling to either scale out or scale in, based on the alarm rules. Primarily there are two alarms: one for scaling out and the other for scaling in. The action for these alarms invokes the URL associated with the scaling-out policy or the scaling-in policy.

The Autoscaling policy defines the number of resources that need to be added or removed in the event of a scale out or scale in. It uses the defined Autoscaling group. To adjust to various usage patterns, multiple Autoscaling policies can be defined to automatically scale the infrastructure. Almost all metrics monitored by the Telemetry service can be used to scale orchestration stacks dynamically.

The following Orchestration resource types are used to create resources for autoscaling:

OS::Heat::AutoScalingGroup
This resource type is used to define an Autoscaling resource group. Required properties include max_size, min_size, and resource. Optional properties include cooldown, desired_capacity, and rolling_updates.

The resource property defines the resource, and its properties, that are created in the Autoscaling group. The max_size property defines the maximum number of identical resources in the Autoscaling group. The min_size property defines the minimum number of identical resources that must be running in the Autoscaling group. The desired_capacity property defines the desired initial number of resources. If not specified, the value of desired_capacity is equal to the value of min_size. The optional cooldown property defines the time gap, in seconds, between two consecutive scaling events.

The rolling_updates property defines the sequence for rolling out updates. It staggers the update rather than taking down the entire service at the same time. The optional max_batch_size and min_in_service parameters of the property define the maximum and minimum numbers of resources to be replaced at once. The pause_time parameter defines a time to wait between two consecutive updates.

web_scaler:
  type: OS::Heat::AutoScalingGroup
  properties:
    desired_capacity: 2
    cooldown: 100
    max_size: 5
    min_size: 1
    resource:
      type: My::Server::Custom::WebServer
      properties:
        instance_name: { list_join: [ '-', [ {get_param: instance_name}, {get_attr: [random, value]} ] ] }
        instance_flavor: {get_param: instance_flavor}
        image_name: {get_param: image_name}
        key_name: {get_param: key_name}
        public_net: {get_param: public_net}
        private_net: {get_param: private_net}
        private_subnet: {get_param: private_subnet}
        instance_metadata: { "metering.server_group": {get_param: "OS::stack_id"} }
OS::Heat::ScalingPolicy
The OS::Heat::ScalingPolicy resource type defines the Autoscaling policy used to manage scaling in the Autoscaling group. Required properties include adjustment_type, auto_scaling_group_id, and scaling_adjustment. Optional properties include cooldown and min_adjustment_step.

The Autoscaling policy uses the adjustment_type property to decide on the type of adjustment needed. When a scaling policy is executed, it changes the current capacity of the Autoscaling group using the scaling_adjustment specified in the policy. The value of the adjustment_type property can be set to change_in_capacity, exact_capacity, or percentage_change_in_capacity. The Autoscaling policy uses the auto_scaling_group_id property to apply the policy to the Autoscaling group. The scaling_adjustment property defines the size of the adjustment. A positive value indicates that resources should be added. A negative value terminates resources. The cooldown property defines the time gap, in seconds, between two consecutive scaling events.

The min_adjustment_step property is used in conjunction with the percentage_change_in_capacity adjustment type. The property defines the minimum number of resources that are added or terminated when the Autoscaling group scales out or scales in.

The resource returns two attributes: alarm_url and signal_url. The alarm_url attribute returns a signed URL to handle the alarm associated with the scaling policy. This attribute is used by an alarm to send a request to either scale in or scale out, depending on the associated scaling policy. The signal_url attribute is a URL that handles the alarm using the native API used for scaling. The attribute value must be invoked as a REST API call with a valid authentication token.

scaleup_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: change_in_capacity
    auto_scaling_group_id: { get_resource: web_scaler }
    cooldown: 180
    scaling_adjustment: 1

scaledown_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: change_in_capacity
    auto_scaling_group_id: { get_resource: web_scaler }
    cooldown: 180
    scaling_adjustment: -1
OS::Aodh::GnocchiAggregationByResourcesAlarm
This resource type defines an Aodh telemetry alarm based on the aggregation of resources. The alarm monitors the usage of all the sub-resources of a resource. Required properties include metric, query, resource_type, and threshold. Optional properties include aggregation_method, alarm_actions, comparison_operator, evaluation_periods, and granularity.

The alarm_actions property defines the action to be taken when the alarm is triggered. When the alarm associated with a scaling policy is triggered, the alarm_actions property calls the signal_url attribute of the Autoscaling policy. The signal_url attribute is the URL that handles an alarm. The metric property defines the metric to be monitored. The evaluation_periods property sets the number of periods over which the metric measures are evaluated before setting off the alarm. The threshold property defines the value that, when exceeded or dropped below, triggers the alarm.

memory_alarm_high:
  type: OS::Aodh::GnocchiAggregationByResourcesAlarm
  properties:
    description: Scale up if memory usage is 50% for 5 minutes
    metric: memory
    aggregation_method: mean
    granularity: 300
    evaluation_periods: 1
    threshold: 600
    resource_type: instance
    comparison_operator: gt
    alarm_actions:
      - str_replace:
          template: trust+url
          params:
            url: {get_attr: [scaleup_policy, signal_url]}
    query:
      str_replace:
        template: '{"=": {"server_group": "stack_id"}}'
        params:
          stack_id: {get_param: "OS::stack_id"}
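A second alarm is normally defined to drive the scale-down policy. The following sketch mirrors memory_alarm_high, inverts the comparison operator, and calls the scaledown_policy shown earlier; the threshold value here is illustrative:

memory_alarm_low:
  type: OS::Aodh::GnocchiAggregationByResourcesAlarm
  properties:
    description: Scale down if memory usage stays low for 5 minutes
    metric: memory
    aggregation_method: mean
    granularity: 300
    evaluation_periods: 1
    # Illustrative threshold; trigger when the aggregated value drops below it.
    threshold: 300
    resource_type: instance
    comparison_operator: lt
    alarm_actions:
      # Call the signed URL of the scale-down policy.
      - str_replace:
          template: trust+url
          params:
            url: {get_attr: [scaledown_policy, signal_url]}
    query:
      str_replace:
        template: '{"=": {"server_group": "stack_id"}}'
        params:
          stack_id: {get_param: "OS::stack_id"}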
Manually Scaling an Orchestration Stack

Manually scaling allows you to test the orchestration stack before deploying it with the associated alarms. The following steps outline the process for manually scaling an orchestration stack using the signal_url attribute.

1. Write an orchestration template to autoscale a stack using the AutoScalingGroup and ScalingPolicy resources.
2. Define the outputs section to return the output values using the signal_url attribute of the ScalingPolicy resources.
3. Launch the orchestration stack. List the output values returned by the signal_url attribute for both the scaling-out and scaling-in policies.
4. Use the openstack token issue command to retrieve an authentication token.
5. Manually scale out or scale in by invoking the REST API, using the signal_url attribute value along with the generated token ID, as in the sketch that follows.
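A minimal sketch of steps 3 through 5 follows. The stack name finance-scaling and the output name scale_out_url are hypothetical placeholders; the signal URL itself comes from the stack outputs defined in step 2:

# Retrieve the signal URL exposed by the stack outputs (step 3).
[student@workstation ~(developer1-finance)]$ SIGNAL_URL=$(openstack stack \
output show finance-scaling scale_out_url -f value -c output_value)
# Retrieve an authentication token (step 4).
[student@workstation ~(developer1-finance)]$ TOKEN=$(openstack token issue \
-f value -c id)
# Invoke the native signal REST API to force a scale-out (step 5).
[student@workstation ~(developer1-finance)]$ curl -s -X POST \
-H "X-Auth-Token: ${TOKEN}" \
-H "Content-Type: application/json" \
-d '{}' "${SIGNAL_URL}"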
Troubleshooting Autoscaling Issues

When a stack with automatic scaling is deployed, useful information is logged in the log files of the Orchestration service. The default logging level for the Orchestration service is ERROR. Enabling DEBUG logging gives more insight and helps to trace complex issues. To enable DEBUG logging, edit the /etc/heat/heat.conf file on the host where the Orchestration components are deployed.

The following log files for the orchestration services are stored in the /var/log/heat directory on the host where the Orchestration components are deployed.

heat-api.log
The /var/log/heat/heat-api.log log file records API calls to the Orchestration service.

heat-engine.log
The /var/log/heat/heat-engine.log log file records the processing of orchestration templates and the requests to the underlying APIs for the resources defined in the template.

heat-manage.log
The /var/log/heat/heat-manage.log log file records the events that occur when deploying a stack, or when a scaling event is triggered.

Alarms play an important role in the autoscaling of instances. The following log files for the Aodh alarming service are stored in the /var/log/aodh directory of the controller node.

listener.log
Logs related to the Aodh alarming service querying the Gnocchi metering service are recorded in this file. The /var/log/aodh/listener.log log file provides information to troubleshoot situations when the Alarming service is unable to reach the Telemetry service to evaluate the alarm condition.

notifier.log
Logs related to notifications provided by an Aodh alarm are recorded in this file. The /var/log/aodh/notifier.log log file is helpful when troubleshooting situations where the Alarming service is unable to reach the signal_url defined for the alarm to trigger autoscaling.

evaluator.log
The Alarming service evaluates the usage data every minute, or as defined in the alarm definition. Should the evaluation fail, errors are logged in the /var/log/aodh/evaluator.log log file.

Troubleshooting a deployed orchestration stack that is not performing scale-up or scale-down operations starts by looking at the orchestration events. After the orchestration stack provisions the resources, the openstack stack event list command returns the SIGNAL_COMPLETE status once a scaling event completes. More information about the scaling event can be viewed using the openstack stack event show command.

If the autoscaling stack fails to deploy, use the openstack stack commands to identify the failed component.
Use the openstack stack list command with the --show-nested option to view all nested stacks. The command returns the nested stack IDs, names, and stack status. Use the openstack stack resource list command to identify the failed resource. The command returns the resource name, physical resource ID, resource type, and its status. The physical resource ID can then be queried using the openstack stack resource show command to check the output value returned while creating the resource.
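Put together, a typical debugging pass looks like the following sketch; the finance-scaling stack name and the web_scaler resource name are placeholders for this example:

# Optionally raise the logging level first: in /etc/heat/heat.conf, under
# [DEFAULT], set debug = True, then restart the Orchestration services.
# List the parent stack and all of its nested stacks.
[student@workstation ~(developer1-finance)]$ openstack stack list --show-nested
# Review recent events for the stack, then drill into a specific resource.
[student@workstation ~(developer1-finance)]$ openstack stack event list finance-scaling
[student@workstation ~(developer1-finance)]$ openstack stack resource list finance-scaling
[student@workstation ~(developer1-finance)]$ openstack stack resource show \
finance-scaling web_scaler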
References

Further information is available in the Configure Autoscaling for Compute section of the Autoscaling for Compute guide for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Quiz: Configuring Stack Autoscaling

Choose the correct answer(s) to the following questions:

1. Which OpenStack service provides the evaluation criteria for triggering auto-scaling?
   a. Nova
   b. Gnocchi
   c. Aodh
   d. Ceilometer

2. Which two statements are true about autoscaling using an orchestration stack? (Choose two.)
   a. Autoscaling allows you to scale the resources in but not out.
   b. Autoscaling allows you to manually scale the resources both in and out.
   c. Autoscaling allows you to scale the resources automatically out but not in.
   d. Autoscaling allows you to scale the resources automatically both in and out.

3. What is the resource type required to define the Auto Scaling policy using an orchestration stack?
   a. OS::Heat::AutoScalingPolicy
   b. OS::Nova::Server
   c. OS::Heat::ScalingPolicy
   d. OS::Heat::AutoScalingGroup

4. Which property of the AutoScalingGroup resource is used to define the time gap between two consecutive scaling events?
   a. cooldown
   b. wait
   c. pause
   d. timeout

5. Which three are allowed values for the adjustment_type property of a scaling policy resource? (Choose three.)
   a. change_capacity
   b. change_in_capacity
   c. exact_capacity
   d. exact_in_capacity
   e. percentage_change_in_capacity
   f. percentage_change_capacity

6. Which attribute of the scaling policy returns the signed URL to handle the alarm associated with the scaling policy?
   a. signed_URL
   b. signal_URL
   c. alarm_URL
   d. scale_URL
Solution

Choose the correct answer(s) to the following questions:

1. Which OpenStack service provides the evaluation criteria for triggering auto-scaling?
   a. Nova
   b. Gnocchi
   c. Aodh (correct)
   d. Ceilometer

2. Which two statements are true about autoscaling using an orchestration stack? (Choose two.)
   a. Autoscaling allows you to scale the resources in but not out.
   b. Autoscaling allows you to manually scale the resources both in and out. (correct)
   c. Autoscaling allows you to scale the resources automatically out but not in.
   d. Autoscaling allows you to scale the resources automatically both in and out. (correct)

3. What is the resource type required to define the Auto Scaling policy using an orchestration stack?
   a. OS::Heat::AutoScalingPolicy
   b. OS::Nova::Server
   c. OS::Heat::ScalingPolicy (correct)
   d. OS::Heat::AutoScalingGroup

4. Which property of the AutoScalingGroup resource is used to define the time gap between two consecutive scaling events?
   a. cooldown (correct)
   b. wait
   c. pause
   d. timeout

5. Which three are allowed values for the adjustment_type property of a scaling policy resource? (Choose three.)
   a. change_capacity
   b. change_in_capacity (correct)
   c. exact_capacity (correct)
   d. exact_in_capacity
   e. percentage_change_in_capacity (correct)
   f. percentage_change_capacity

6. Which attribute of the scaling policy returns the signed URL to handle the alarm associated with the scaling policy?
   a. signed_URL
   b. signal_URL
   c. alarm_URL (correct)
   d. scale_URL
Summary

In this chapter, you learned:

• The Orchestration service (Heat) provides developers and system administrators with an easy and repeatable way to create and manage a collection of related OpenStack resources.
• The Orchestration API service forwards requests to the Orchestration engine service using remote procedure calls (RPCs) over AMQP.
• The Orchestration engine service interprets the orchestration template and launches the stack.
• Using multiple layers of stacks that build on top of one another is the best way to organize an orchestration stack.
• Changes in infrastructure after updating a stack must be verified first by doing a dry run of the stack.
• Intrinsic functions in the Heat orchestration template assign values to properties that are available during the creation of a stack.
• Using the OS::Heat::SoftwareDeployment resource allows any number of software configuration changes to be applied to an instance throughout its life cycle.
• When the user data is changed and the orchestration stack is updated using the openstack stack update command, the instance is deleted and recreated using the updated user data script.
• The AutoScalingGroup and ScalingPolicy resources of an orchestration stack help build self-healing infrastructure.
• Stateless servers are more suitable for autoscaling. If a server goes down or transitions into an error state, instead of repairing the server, it is replaced with a new server.