Commit

more wordsmithing

llpeterson committed Oct 20, 2021
1 parent f4388c6 commit df0ae01
Showing 3 changed files with 79 additions and 66 deletions.
33 changes: 16 additions & 17 deletions arch.rst
@@ -42,8 +42,8 @@ Platform-as-a-Service (PaaS).

Aether supports this combination by implementing both the RAN and the
user plane of the Mobile Core on-prem, as cloud-native workloads
co-located on the Aether cluster. This is often referred to as local
breakout because it enables direct communication between mobile
co-located on the Aether cluster. This is often referred to as *local
breakout* because it enables direct communication between mobile
devices and edge applications without data traffic leaving the
enterprise. This scenario is depicted in :numref:`Figure %s
<fig-hybrid>`, which does not name the edge applications, but
@@ -62,7 +62,7 @@ example.

The approach includes both edge (on-prem) and centralized (off-prem)
components. This is true for edge apps, which often have a centralized
counterpart running in a commodity cloud. It is also true for the
counterpart running in a commodity cloud. It is also true for the 5G
Mobile Core, where the on-prem User Plane (UP) is paired with a
centralized Control Plane (CP). The central cloud shown in this figure
might be private (i.e., operated by the enterprise), public (i.e.,
@@ -72,9 +72,9 @@ cloud). Also shown in :numref:`Figure %s <fig-hybrid>` is a
centralized *Control and Management Platform*. This represents all the
functionality needed to offer Aether as a managed service, with system
administrators using a portal exported by this platform to operate the
underlying infrastructure and services. The rest of this book is about
everything that goes into implementing that *Control and Management
Platform*.
underlying infrastructure and services within their enterprise. The
rest of this book is about everything that goes into implementing that
*Control and Management Platform*.

2.1 Edge Cloud
--------------
@@ -112,8 +112,8 @@ the SD-Fabric), are deployed as a set of microservices, but details
about the functionality implemented by these containers are otherwise
not critical to this discussion. For our purposes, they are
representative of any cloud native workload. (The interested reader is
referred to our 5G and SDN books for more information about the
internal working of SD-RAN, SD-Core, and SD-Fabric.)
referred to our companion 5G and SDN books for more information about
the internal working of SD-RAN, SD-Core, and SD-Fabric.)

.. _reading_5g:
.. admonition:: Further Reading
@@ -151,8 +151,8 @@ Platform (AMP).
Each SD-Core CP controls one or more SD-Core UPs, as specified by
3GPP, the standards organization responsible for 5G. Exactly how CP
instances (running centrally) are paired with UP instances (running at
the edges) is a configuration-time decision, and depends on the degree
of isolation the enterprise sites require. AMP is responsible for
the edges) is a runtime decision, and depends on the degree of
isolation the enterprise sites require. AMP is responsible for
managing all the centralized and edge subsystems (as introduced in the
next section).

@@ -173,12 +173,12 @@ we started with in :numref:`Figure %s <fig-hw>` of Chapter 1).\ [#]_
This is because, while each ACE site usually corresponds to a physical
cluster built out of bare-metal components, each of the SD-Core CP
subsystems shown in :numref:`Figure %s <fig-aether>` is actually
deployed as a logical Kubernetes cluster on a commodity cloud. The
deployed in a logical Kubernetes cluster on a commodity cloud. The
same is true for AMP. Aether’s centralized components are able to run
in Google Cloud Platform, Microsoft Azure, and Amazon’s AWS. They also
run as an emulated cluster implemented by a system like
KIND—Kubernetes in Docker—making it possible for developers to run
these components on a laptop.
these components on their laptop.

.. [#] Confusingly, Kubernetes adopts generic terminology, such as
“cluster” and “service”, and gives it very specific meaning. In
@@ -190,8 +190,7 @@ these components on a laptop.
potentially thousands of such logical clusters. And as we'll
see in a later chapter, even an ACE edge site sometimes hosts
more than one Kubernetes cluster (e.g., one running production
services and one used for development and testing of new
services).
services and one used for trial deployments of new services).
2.3 Control and Management
--------------------------
@@ -304,7 +303,7 @@ both physical and virtual resources.
2.3.2 Lifecycle Management
~~~~~~~~~~~~~~~~~~~~~~~~~~

Lifecycle Management is the process of integrating fixed, extended,
Lifecycle Management is the process of integrating debugged, extended,
and refactored components (often microservices) into a set of
artifacts (e.g., Docker containers and Helm charts), and subsequently
deploying those artifacts to the operational cloud. It includes a
@@ -368,7 +367,7 @@ the cloud offers to end users. Thus, we can generalize the figure so
Runtime Control mediates access to any of the underlying microservices
(or collections of microservices) the cloud designer wishes to make
publicly accessible, including the rest of AMP! In effect, Runtime
Control implements an abstraction layer, codified with programmatic
Control implements an abstraction layer, codified with a programmatic
API.

Given this mediation role, Runtime Control provides mechanisms to
@@ -434,7 +433,7 @@ operators a way to both read (monitor) and write (control) various
parameters of a running system. Connecting those two subsystems is how
we build closed loop control.
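
To make the read (monitor) and write (control) pairing concrete, here is a minimal closed-loop sketch in Python. The metric name, target value, and proportional gain are invented for illustration and are not part of Aether.

```python
# A minimal closed-loop sketch: monitoring exposes a read path, runtime
# control a write path, and connecting the two closes the loop. The
# metric ("load"), target, and gain are invented for illustration.

def control_loop(read_metric, write_param, target, gain=0.5):
    """One iteration: nudge the controlled parameter toward the target."""
    error = target - read_metric()
    adjustment = gain * error
    write_param(adjustment)
    return adjustment

state = {"load": 80.0}
adjust = control_loop(
    read_metric=lambda: state["load"],
    write_param=lambda delta: state.update(load=state["load"] + delta),
    target=60.0,
)
print(state["load"])  # 70.0: moved halfway toward the target
```

In a real deployment, the read path would presumably be a query against the monitoring subsystem and the write path a call through Runtime Control's API; the loop structure is the same.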

A third example is even more ambiguous. Lifecycle management usually
A third example is even more nebulous. Lifecycle management usually
takes responsibility for *configuring* each component, while runtime
control takes responsibility for *controlling* each component. Where
you draw the line between configuration and control is somewhat
82 changes: 48 additions & 34 deletions intro.rst
@@ -72,6 +72,12 @@ perspective on the problem. We return to the confluence of enterprise,
cloud, and access technologies later in this chapter, but we start by
addressing the terminology challenge.

.. _reading_aether:
.. admonition:: Further Reading

`Aether: 5G-Connected Edge Cloud
<https://opennetworking.org/aether/>`__.

1.1 Terminology
---------------

@@ -107,7 +113,7 @@ terminology.
* **OSS/BSS:** Another Telco acronym (Operations Support System,
Business Support System), referring to the subsystem that
implements both operational logic (OSS) and business logic
(BSS). Usually the top-most component in the overall O&M
(BSS). It is usually the top-most component in the overall O&M
hierarchy.

* **EMS:** Yet another Telco acronym (Element Management System),
@@ -164,34 +170,34 @@ terminology.
* **Continuous Integration / Continuous Deployment (CI/CD):** An
approach to Lifecycle Management in which the path from
development (producing new functionality) to testing, integration,
and ultimately deployment is an automated pipeline. Typically
implies continuously making small incremental changes rather than
performing large disruptive upgrades.
and ultimately deployment is an automated pipeline. CI/CD
typically implies continuously making small incremental changes
rather than performing large disruptive upgrades.

* **DevOps:** An engineering discipline (usually implied by CI/CD)
that balances feature velocity against system stability. It is a
practice typically associated with container-based (also known as
*cloud native*) systems, and typified by *Site Reliability
*cloud native*) systems, as typified by *Site Reliability
Engineering (SRE)* practiced by cloud providers like Google.

* **In-Service Software Upgrade (ISSU):** A requirement that a
component continue running during the deployment of an upgrade,
with minimal disruption to the service delivered to
end-users. Generally implies the ability to incrementally roll-out
(and roll-back) an upgrade, but is specifically a requirement on
individual components (as opposed to the underlying platform used
to manage a set of components).
end-users. ISSU generally implies the ability to incrementally
roll-out (and roll-back) an upgrade, but is specifically a
requirement on individual components (as opposed to the underlying
platform used to manage a set of components).

* **Monitoring & Logging:** Collecting data from system components to aid
in management decisions. This includes diagnosing faults, tuning
performance, doing root cause analysis, performing security audits,
and provisioning additional capacity.

* **Analytics:** A program (often using statistical models) that
produces additional insights (value) from raw data. Can be used to
close a control loop (i.e., auto-reconfigure a system based on
produces additional insights (value) from raw data. It can be used
to close a control loop (i.e., auto-reconfigure a system based on
these insights), but could also be targeted at a human operator
(that subsequently takes some action).
that subsequently takes some action.

Another way to talk about operations is in terms of stages, leading to
a characterization that is common for traditional network devices:
@@ -301,9 +307,9 @@ manageable:
majority of configuration involves initializing software parameters,
which is more readily automated.

* Cloud native implies a set best-practices for addressing many of the
FCAPS requirements, especially as they relate to availability and
performance, both of which are achieved through horizontal
* Cloud native implies a set of best-practices for addressing many of
the FCAPS requirements, especially as they relate to availability
and performance, both of which are achieved through horizontal
scaling. Secure communication is also typically built into cloud RPC
mechanisms.

@@ -319,17 +325,19 @@ monitoring data in a uniform way, and (d) continually integrating and
deploying individual microservices as they evolve over time.

Finally, because a cloud is infinitely programmable, the system being
managed has the potential to change substantially over time.\ [#]_ This
means that the cloud management system must itself be easily extended
to support new features (as well as the refactoring of existing
features). This is accomplished in part by implementing the cloud
management system as a cloud service, but it also points to taking
advantage of declarative specifications of how all the disaggregated
pieces fit together. These specifications can then be used to generate
elements of the management system, rather than having to manually
recode them. This is a subtle issue we will return to in later
chapters, but ultimately, we want to be able to auto-configure the
subsystem responsible for auto-configuring the rest of the system.
managed has the potential to change substantially over time.\ [#]_
This means that the cloud management system must itself be easily
extended to support new features (as well as the refactoring of
existing features). This is accomplished in part by implementing the
cloud management system as a cloud service, which means we will see a
fair amount of recursive dependencies throughout this book. It also
points to taking advantage of declarative specifications of how all
the disaggregated pieces fit together. These specifications can then
be used to generate elements of the management system, rather than
having to manually recode them. This is a subtle issue we will return
to in later chapters, but ultimately, we want to be able to
auto-configure the subsystem responsible for auto-configuring the rest
of the system.
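
The idea of generating elements of the management system from a declarative specification, rather than hand-coding them, can be sketched as follows. This is a toy illustration of the approach, not Aether's actual mechanism; the spec layout and the component and parameter names are invented.

```python
# Toy sketch of generating management-system elements from a declarative
# specification, rather than hand-coding them. The spec layout and the
# component/parameter names are invented for this example.

SPEC = {
    "components": [
        {"name": "sd-core-up", "params": ["mtu", "qos-profile"]},
        {"name": "sd-ran", "params": ["handover-threshold"]},
    ]
}

def generate_config_schema(spec):
    """Flatten the spec into component/parameter paths the management
    system can expose, so adding a component needs no new code."""
    return [
        f"{component['name']}/{param}"
        for component in spec["components"]
        for param in component["params"]
    ]

print(generate_config_schema(SPEC))
# ['sd-core-up/mtu', 'sd-core-up/qos-profile', 'sd-ran/handover-threshold']
```

The point of the exercise: when a new component is added to the spec, the management surface for it is derived automatically instead of being recoded by hand.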

.. [#] For example, compare the two services Amazon offered ten years
ago (EC2 and S3) with the well over 100 services available on
@@ -371,13 +379,19 @@ identifies the technology we assume.
~~~~~~~~~~~~~~~~~~~~~~~

The assumed hardware building blocks are straightforward. We start
with bare-metal servers and switches, built using merchant
silicon. These might, for example, be ARM or x86 processor chips and
with bare-metal servers and switches, built using merchant silicon
chips. These might, for example, be ARM or x86 processor chips and
Tomahawk or Tofino switching chips, respectively. The bare-metal boxes
also include a bootstrap mechanism (e.g., BIOS for servers and ONIE
for switches), and a remote device management interface (e.g., IPMI or
Redfish).

.. _reading_redfish:
.. admonition:: Further Reading

Distributed Management Task Force (DMTF) `Redfish
<https://www.dmtf.org/standards/redfish>`__.

A physical cloud cluster is then constructed with the hardware
building blocks arranged as shown in :numref:`Figure %s <fig-hw>`: one
or more racks of servers connected by a leaf-spine switching
@@ -397,11 +411,11 @@ that software running on the servers controls the switches.
software components, which we describe next. Collectively, all the
hardware and software components shown in the figure form the
*platform*. Where we draw the line between what's *in the platform*
and what runs *on top of the platform* will become clear in later
chapters, but the summary is that different mechanisms will be
responsible for (a) bringing up the platform and prepping it to host
workloads, and (b) managing the various workloads that need to be
deployed on that platform.
and what runs *on top of the platform*, and why it is important, will
become clear in later chapters, but the summary is that different
mechanisms will be responsible for (a) bringing up the platform and
prepping it to host workloads, and (b) managing the various workloads
that need to be deployed on that platform.


1.3.2 Server Virtualization
@@ -415,7 +429,7 @@ resources, all running on the commodity processors in the cluster:
2. Kubernetes instantiates and interconnects containers.

3. Helm charts specify how collections of related containers are
interconnected.
interconnected to build applications.
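
The role a Helm chart plays can be illustrated with a toy analogy: pair a manifest template with default values, and "render" by substituting the values into the template, with per-deployment overrides winning. This is only an analogy sketched in Python using `string.Template`; it is not Helm's actual template engine, and the manifest fields shown are invented.

```python
from string import Template

# Toy analogy for a Helm chart: a manifest template plus default values,
# where rendering substitutes values into the template and per-deployment
# overrides win (loosely like helm's --set flag). Illustration only; this
# is not Helm's actual template engine.

manifest_template = Template("image: $image:$tag\nreplicas: $replicas")
default_values = {"image": "nginx", "tag": "1.25", "replicas": "3"}

def render(template, values, overrides=None):
    """Merge defaults with overrides, then substitute into the template."""
    merged = {**values, **(overrides or {})}
    return template.substitute(merged)

print(render(manifest_template, default_values, {"replicas": "5"}))
# image: nginx:1.25
# replicas: 5
```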

These are all well known and ubiquitous, and so we only summarize them
here. Links to related information for anyone that is not familiar
30 changes: 15 additions & 15 deletions preface.rst
@@ -11,21 +11,21 @@ job of it.
The answer, we believe, is that the cloud is becoming ubiquitous in
another way, as it moves from hundreds of datacenters to tens of
thousands of enterprises. And while it is clear that the commodity
cloud providers will happily manage those edge clusters as a logical
cloud providers are eager to manage those edge clusters as a logical
extension of their datacenters, they do not have a lock on the
know-how for making that happen.

This book lays out a roadmap that a small team of engineers followed
over a course of a year to stand-up and operationalize a hybrid cloud
spanning a dozen enterprises, and hosting a non-trivial cloud native
service (5G connectivity in our case, but that’s just an example). The
team was able to do this by leveraging 20+ open source components,
but selecting those components is just a start. There were dozens of
technical decisions to make along the way, and a few thousand lines of
configuration code to write. We believe this is a repeatable exercise,
which we report in this book. (And the code for those configuration
files is open source, for those that want to pursue the topic in more
detail.)
over the course of a year to stand-up and operationalize a hybrid
cloud that spans a dozen enterprises, and hosts a non-trivial cloud
native service (5G connectivity in our case, but that’s just an
example). The team was able to do this by leveraging 20+ open source
components, but selecting those components is just a start. There were
dozens of technical decisions to make along the way, and a few
thousand lines of configuration code to write. We believe this is a
repeatable exercise, which we report in this book. (And the code for
those configuration files is open source, for those that want to
pursue the topic in more detail.)

Our roadmap may not be the right one for all circumstances, but it
does shine a light on the fundamental challenges and trade-offs
@@ -41,8 +41,8 @@ How to operationalize a computing system is a question that’s as old
as the field of *Operating Systems*. Operationalizing a cloud is just
today’s version of that fundamental problem, which has become all the
more interesting as we move up the stack, from managing *devices* to
managing *services*. The fact that this topic is both timely and
foundational are among the reasons it is worth studying.
managing *services*. That this topic is both timely and foundational
are among the reasons it is worth studying.


Guided Tour of Open Source
@@ -80,11 +80,11 @@ Sunay for his influence on its overall design. Suchitra Vemuri's
insights into testing and quality assurance were also invaluable.

This book is still very much a work-in-progress, and we will happily
acknowledge anyone that provides feedback. Please send us your
acknowledge everyone that provides feedback. Please send us your
comments using the `Issues Link
<https://github.com/SystemsApproach/ops/issues>`__. Also see the
`Wiki <https://github.com/SystemsApproach/ops/wiki>`__ for the TODO
list we're working on.
list we're currently working on.

| Larry Peterson, Scott Baker, Andy Bavier, Zack Williams, and Bruce Davie
| October 2021
