diff --git a/D3126_Overview/tex/overview.tex b/D3126_Overview/tex/overview.tex index d5e90a4..2acf645 100644 --- a/D3126_Overview/tex/overview.tex +++ b/D3126_Overview/tex/overview.tex @@ -2,13 +2,52 @@ %% \chapter{Overview} \section{Overview} -Graphs, used in ML and other \textbf{scientific} domains, as well as \textbf{industrial} and \textbf{general} programming, -do \textbf{not} presently exist in the C++ standard. In ML, a graph forms the underlying structure of an \textbf{artificial neural network} (ANN). -In a \textbf{game}, a graph can be used to represent the \textbf{map} of a game world. In \textbf{business} environments, graphs arise as -\textbf{entity relationship diagrams} (ERD) or \textbf{data flow diagrams} (DFD). In the realm of \textbf{social media}, a graph represents a -\textbf{social network}. - -All documents, taken as a whole for a Graph Library, proposes the addition of \textbf{graph algorithms, operators, views, adaptors}, the +%% Graphs, used in ML and other \textbf{scientific} domains, as well as \textbf{industrial} and \textbf{general} programming, +%% do \textbf{not} presently exist in the C++ standard. In ML, a graph forms the underlying structure of an \textbf{artificial neural network} (ANN). +%% In a \textbf{game}, a graph can be used to represent the \textbf{map} of a game world. In \textbf{business} environments, graphs arise as +%% \textbf{entity relationship diagrams} (ERD) or \textbf{data flow diagrams} (DFD). In the realm of \textbf{social media}, a graph represents a +%% \textbf{social network}. + +The original STL revolutionized the way that C++ programmers could apply algorithms to different kinds of containers, by defining \emph{generic} algorithms, realized via function templates. 
+A hierarchy of \emph{iterators} was the mechanism by which algorithms could be made generic with respect to different kinds of containers. +Named requirements specified the valid expressions and associated types that algorithms required of their arguments. As of C++20, we now have both ranges and concepts, which +provide language-based mechanisms for specifying requirements for generic algorithms. + +As powerful as the algorithms in the standard library are, the underlying basis for them is a range (or iterator pair), which inherently can only specify a one-dimensional container. +Iterator pairs (equiv.\ ranges) specify a \lstinline{begin()} and an \lstinline{end()} and can move between those two limits in various ways, depending on the type of iterator. +As a result, important classes of problems that programmers are regularly faced with use structures that are not one-dimensional containers, and so the standard library algorithms cannot be used directly. +Multi-dimensional arrays are an example of one such kind of data structure. Matrices do have the nice property that they (typically) have the ability to be ``raveled'', i.e., the data underlying the matrix can still be treated as a one-dimensional container. Multi-dimensional arrays also have the property that, even though they can be thought of as hierarchical containers, the hierarchy is uniform: an N-dimensional array is a container of (N-1)-dimensional arrays. + +Another important problem domain that does not fit into the category of one-dimensional ranges is that of \emph{graph algorithms and data structures}. +Graphs are a powerful abstraction for modeling relationships between entities in a given problem domain, +irrespective of what the actual entities are, and irrespective of what the actual relationships are. +In that sense, graphs are, by their very nature, generic. +Graphs are a fundamental abstraction in computer science, and are ubiquitous in real-world applications.
+ +Any problem concerned with connectivity can be modeled as a graph. +A small set of examples includes +Internet routing, circuit partitioning and layout, and finding the best route to take to a destination on a map. +There are also relationships between entities that are inferred from large sets of data, for example the graph of consumers who have purchased the same product, or who have viewed the same movie. +Yet more interesting structures (hypergraphs or k-partite graphs) can arise when we want to model relationships between diverse types of data, such as the graph of consumers, the products they have purchased, and the vendors of the products. +And, of course, graphs play a critical role in multiple aspects of machine learning. + +Along with these graph abstractions are the graph algorithms that are widely used for solving problems from these domains. +Well-known graph algorithms include breadth-first search, Dijkstra's algorithm, connected components, and so on. +% +Because graphs can come from so many different problem domains, they will also be represented with many different kinds of data structures. +Making graph algorithms as usable as possible across arbitrary representations requires applying the same principles that were used in the original STL: +identifying a collection of related algorithms from a problem domain (in our case, graphs), +minimizing the requirements imposed by the algorithms on their arguments, +systematically organizing the requirements, and +realizing this framework of requirements in the form of concepts. + +There are also many uses of graphs that would not be met by a standard set of algorithms. A standardized interface for graphs is eminently useful in such situations as well. +In the most basic case, it would provide a well-defined framework for development. But in keeping with the foundational goal of generic programming to enable reuse, it would also empower users to develop and deploy their own reusable graph components.
In the best case, such algorithms would be available to the broader C++ programmer community. + +Because graphs are so ubiquitous and so important to modern software systems, a standardized library of graph algorithms and data structures would have enormous benefit to the C++ development community. +This proposal contains the specification of such a library, developed using the principles above. + +In total, all documents, taken as a whole for a Graph Library, propose the addition of \textbf{graph algorithms, operators, views, adaptors}, the \textbf{graph container interface}, and a \textbf{graph container implementation} to the C++ library to support \textbf{machine learning} (ML), as well as other applications. ML is a large and growing field, both in the \textbf{research community} and \textbf{industry}, that has received a great deal of attention in recent years. This documents presents an \textbf{interface} of the proposed algorithms, operators, adaptors, @@ -76,20 +115,58 @@ \subsection{Future Roadmap} \end{itemize} -\section{Examples} +% \section{Examples} +%% %\andrew{Where do examples really belong? In P2300 they are up front here, but I think there is too much forward referencing for that.} +%% The following code demonstrates how a simple graph can be created as a range of ranges, using the standard containers. +%% % \phil{Duplicated in Introduction. OK?} +%% {\small +%% \lstinputlisting[firstline=26,lastline=48]{D3126_Overview/src/bacon.cpp} +%% } +%% \tcode{target_id(g,uv)} defines the required function to get a target\_id for an edge in the graph \tcode{G}. Other functions can also +%% be overridden to allow a developer to adapt their own graph data structures to the library. -%\andrew{Where do examples really belong? 
In P2300 they are up front here, but I think there is too much forward referencing for that.} +\section{Example: Six Degrees of Kevin Bacon} +\label{sec:bacon} -The following code demonstrates how a simple graph can be created as a range of ranges, using the standard containers. +A classic example of the use of a graph algorithm is the game ``The Six Degrees of Kevin Bacon.'' +The game is played by connecting actors to each other through movies they have appeared in together. +The goal is to find the smallest number of movies that connect a given actor to Kevin Bacon. +That number is called the ``Bacon number'' of the actor. Kevin Bacon himself has a Bacon number of 0. +Since Kevin Bacon appeared with Tom Cruise in ``A Few Good Men'', Tom Cruise has a Bacon number of 1. -\phil{Duplicated in Introduction. OK?} +The following program computes the Bacon number for a small selection of actors. +% \phil{Duplicated in Overview's Examples. OK?} {\small \lstinputlisting[firstline=26,lastline=48]{D3126_Overview/src/bacon.cpp} } -\tcode{target_id(g,uv)} defines the required function to get a target\_id for an edge in the graph \tcode{G}. Other functions can also -be overridden to allow a developer to adapt their own graph data structures to the library. + +\noindent +Output: +\begin{lstlisting} +Tom Cruise has Bacon number 1 +Kevin Bacon has Bacon number 0 +Hugo Weaving has Bacon number 3 +Carrie-Anne Moss has Bacon number 4 +Natalie Portman has Bacon number 2 +Jack Nicholson has Bacon number 1 +Kelly McGillis has Bacon number 2 +Harrison Ford has Bacon number 1 +Sebastian Stan has Bacon number 3 +Mila Kunis has Bacon number 3 +Michelle Pfeiffer has Bacon number 1 +Keanu Reeves has Bacon number 4 +Julia Roberts has Bacon number 1 +\end{lstlisting} + + +In graph parlance, we are creating a graph where the vertices are actors and the edges are movies. +The number of movies that connect an actor to Kevin Bacon is the shortest path in the graph +from Kevin Bacon to that actor. 
In the example above, we compute shortest paths from Kevin +Bacon to all other actors and print the results. +Note, however, that actor-actor relationships are not how data about actors +is available in the wild (from IMDB, for example). Rather, two available types of data are actor-movie and movie-actor relationships. See Section~\ref{sec:bipartite} below. \section{What this proposal is \textbf{not}} @@ -130,6 +207,7 @@ \section{Performance Considerations} Performance analysis from those algorithms can be found in the peer-reviewed papers for NWGraph~\cite{REF_nwgraph_paper,gapbs_2023}. \section{Prior Art} + \textbf{boost::graph} has been an important C++ graph implementation since 2001. It was developed with the goal of providing a modern (at the time) generic library that addressed all the needs of a graph library user. It is still a viable library used today, attesting to the value it brings. diff --git a/D3126_Overview/tex/revision.tex b/D3126_Overview/tex/revision.tex index 011ef7b..9230a04 100644 --- a/D3126_Overview/tex/revision.tex +++ b/D3126_Overview/tex/revision.tex @@ -30,4 +30,6 @@ \subsection*{\paperno r3} \item Change the reference implementation from the \tcode{std::graph} to the \tcode{graph} namespace. This will make it more accessible to the community and allow for easier experimentation outside of this proposal. \item Update the status on supporting more versatile BFS and DFS algorithms. + \item Add additional motivation for a graph library in the Overview section. + \item Extend the Six Degrees of Kevin Bacon example to include the output and additional description. \end{itemize} diff --git a/D3127_Terminology/tex/revision.tex b/D3127_Terminology/tex/revision.tex index 0efd1a0..9482bdb 100644 --- a/D3127_Terminology/tex/revision.tex +++ b/D3127_Terminology/tex/revision.tex @@ -6,3 +6,13 @@ \subsection*{\paperno r0} \item Split from the P1709r5 \textit{Overview and Introduction} section and expanded with more details and examples. 
Also added \textit{Getting Started} section. \end{itemize} + +\subsection*{\paperno r1} + +\begin{itemize} + \item Move text from the Motivation section to the Overview section in P3126. + \item Remove the Six Degrees of Kevin Bacon example, which duplicates the same example in P3126. + \item Update the Direct Representations section with C++ code examples, and add content for special cases that + occur in graphs such as \emph{self-loops}, \emph{multigraphs}, \emph{cycles}, \emph{trees}, etc. + \item Add sections on Incidence Matrices and Regarding Algorithms. +\end{itemize} diff --git a/D3127_Terminology/tex/terminology_0.tex b/D3127_Terminology/tex/terminology_0.tex index d480b31..1e275f1 100644 --- a/D3127_Terminology/tex/terminology_0.tex +++ b/D3127_Terminology/tex/terminology_0.tex @@ -5,128 +5,52 @@ %% \chapter{Introduction} %% \label{ch:introduction} - -\section{Motivation} - -The original STL revolutionized the way that C++ programmers could apply algorithms to different kinds of containers, by defining \emph{generic} algorithms, realized via function templates. -A hierarchy of \emph{iterators} were the mechanism by which algorithms could be made generic with respect to different kinds of containers, -Named requirements specified the valid expressions and associated types that algorithms required of their arguments. As of C++20, we now have both ranges and concepts, which -provide language-based mechanisms for specifying requirements for generic algorithms. - -As powerful as the algorithms in the standard library are, the underlying basis for them is a range (or iterator pair), which inherently can only specify a one-dimensional container. -Iterator pairs (equiv.\ ranges) specify a \lstinline{begin()} and an \lstinline{end()} and can move between those two limits in various ways, depending on the type of iterator.
-As a result, important classes of problems that programmers are regularly faced with use structures that are not one-dimensional containers, and so the standard library algorithms can't be directly used. -Multi-dimensional arrays are an example of one such kind of data structure. Matrices do have the nice property that they (typically) have the ability to be ``raveled'', i.e., the data underlying the matrix can still be treated as a one-dimensional container. Multi-dimensional arrays also have the property that, even though they can be thought of as hierarchical containers, the hierarchy is uniform---an N-dimensional array is a container of N-1 dimensional arrays. - -Another important problem domain that does not fit into the category of one-dimensional ranges is that of \emph{graph algorithms and data structures}. -Graphs are a powerful abstraction for modeling relationships between entities in a given problem domain, -irrespective of what the actual entities are, and irrespective of what the actual relationships are. -In that sense, graphs are, by their very nature, generic. -Graphs are a fundamental abstraction in computer science, and are ubiquitous in real-world applications. - -Any problem concerned with connectivity can be modeled as a graph. -Just a small set of examples include -Internet routing, circuit partitioning and layout, and finding the best route to take to a destination on map. -There are also relationships between entities that are inferred from large sets of data, for example the graph of consumers who have purchased the same product, or who have viewed the same movie. -Yet more interesting structures (hypergraphs or k-partite graphs) can arise when we want to model relationships between diverse types of data, such as the graph of consumers, the products they have purchased, and the vendors of the products. -And, of course, graphs play a critical role in multiple aspects of machine learning. 
- -Along with these graph abstractions are the graph algorithms that are widely used for sovling problems from these domains. -Well-known graph algorithms include breadth-first search, Dijkstra's algorithm, connected components, and so on. -% -Because graphs can come from so many different problem domains, they will also be represented with many different kinds of data structures. -To make graph algorithms as usable as possible across arbitrary representations requires application of the same principles that were used in the original STL: -a collection of related algorithms from a problem domain (in our case, graphs), -minimizing the requirements imposed by the algorithms on their arguments, -systematically organizing the requirements, and -realizing this framework of requirements in the form of concepts. - -There are also many uses of graphs that would not be met by a standard set of algorithms. A standardized interface for graphs is eminently useful in such situations as well. -In the most basic case, it would provide a well-defined framework for development. But in keeping with the foundational goal of generic programming to enable reuse, it would also empower users to develop and deploy their own reusable graph components. In the best case, such algorithms would be available to the broader C++ programmer community. - -Because graphs are so ubiquitous and so important to modern software systems, a standardized library of graph algorithms and data structures would have enormous benefit to the C++ development community. -This proposal contains the specification of such a library, developed using the principles above. - - - -\section{Example: Six Degrees of Kevin Bacon} -\label{sec:bacon} - -A classic example of the use of a graph algorithm is the game ``The Six Degrees of Kevin Bacon.'' -The game is played by connecting actors to each other through movies they have appeared in together. 
-The goal is to find the smallest number of movies that connect a given actor to Kevin Bacon. -That number is called the ``Bacon number'' of the actor. Kevin Bacon himself has a Bacon number of 0. -Since Kevin Bacon appeared with Tom Cruise in ``A Few Good Men'', Tom Cruise has a Bacon number of 1. - -The following program computes the Bacon number for a small selection of actors. - -% \phil{Duplicated in Overview's Examples. OK?} -{\small - \lstinputlisting[firstline=26,lastline=48]{D3126_Overview/src/bacon.cpp} -} - - -\noindent -Output: -\begin{lstlisting} -Tom Cruise has Bacon number 1 -Kevin Bacon has Bacon number 0 -Hugo Weaving has Bacon number 3 -Carrie-Anne Moss has Bacon number 4 -Natalie Portman has Bacon number 2 -Jack Nicholson has Bacon number 1 -Kelly McGillis has Bacon number 2 -Harrison Ford has Bacon number 1 -Sebastian Stan has Bacon number 3 -Mila Kunis has Bacon number 3 -Michelle Pfeiffer has Bacon number 1 -Keanu Reeves has Bacon number 4 -Julia Roberts has Bacon number 1 -\end{lstlisting} - - -In graph parlance, we are creating a graph where the vertices are actors and the edges are movies. -The number of movies that connect an actor to Kevin Bacon is the shortest path in the graph -from Kevin Bacon to that actor. In the example above, we compute shortest paths from Kevin -Bacon to all other actors and print the results. -Note, however, that actor-actor relationships are not how data about actors -is available in the wild (from IMDB, for example). Rather, two available types of data are actor-movie and movie-actor relationships. See Section~\ref{sec:bipartite} below. +% \section{Motivation} \section{Graph Background} % and Terminology} -For clarity, we briefly review some of the basic terminology of graphs. -We use commonly accepted terminology for graph data structures and algorithms and -adopt the particular terminology used in the textbook by -Cormen, Leiserson, Rivest, and Stein (``CLRS'')~\cite{CLRS2022}. 
+\andrew{We should use a standard latex macro for C++ -- it doesn't really look right in plain text.} + +For clarity in the material contained in other documents, here we briefly review some +of the basic terminology of graphs. We use commonly accepted terminology for graph +data structures and algorithms and specifically adopt the terminology used in the +textbook by Cormen, Leiserson, Rivest, and Stein (``CLRS'')~\cite{CLRS2022}. % In defining terminology that is rich enough, yet precise enough, to be used as the basis of a C++ graph library, we emphasize the difference between a graph (an abstraction of entities in a domain, along with their relationships) and the \emph{representation} of a graph (a structure suitable for use by algorithms and/or for code)\footnote{An example of the kind of ambiguity about graphs arising in typical usage is shown in Appendix~\ref{sec:ambiguity}}. -% We note that because of the precision with which we define representations, there will be some unexpected results. +% +We note that because of the precision with which we define representations, there are results that may be unexpected for some. \section{Summary of Key Takeaways} A very brief summary of our terminology is the following: \begin{itemize} - \item A graph comprises a set of vertices $\{V\}$ and a set of edges $\{E\}$, written $G=\{V, E\}$. + \item A graph comprises a set of \emph{vertices} $\{V\}$ and a set of \emph{edges} $\{E\}$, and is written $G=\{V, E\}$. \item Expressing algorithms (mathematically as well as in code) requires a \emph{representation} of a graph, the most basic of which is an \emph{adjacency matrix}. An adjacency matrix is constructed using an \emph{enumeration} of the vertices, not the vertices themselves. - \item In addition to the (dense) adjacency matrix representation, we consider three sparse representations: coordinate, compressed, and packed coordinate. The sparse forms store \emph{indices} used in the enumeration. 
- \item The coordinate and compressed forms of the adjacency matrix, as taken from linear algebra, respectively correspond to the graph theoretical representations of \emph{edge list} and \emph{adjacency list}. + \item In addition to the (dense) adjacency matrix representation, we consider three sparse representations: coordinate, compressed, and packed coordinate. The sparse forms store \emph{indices} defined by the enumeration. + \item The \emph{coordinate} and \emph{compressed} forms of the adjacency matrix\footnote{The terms ``coordinate'' and ``compressed'' are taken from linear algebra.} correspond, respectively, to the graph-theoretical representations +\emph{edge list} and \emph{adjacency list}. \end{itemize} -\subsection{Basic Terminology} - -To model the relationships between entities, a \emph{graph} $G$ comprises two sets: -a \emph{vertex set} $V$, whose elements correspond to the entities, and an \emph{edge set} $E$, whose -elements are pairs corresponding to elements in $V$ that have some relationship with each other. That is, -if $u$ and $v$ are members of $V$ that have some relationship that we wish to capture, then there is -a pair $\{u, v\}$ in $E$. We write $G=\{V, E\}$ to express that the two sets $V$ and $E$ define a graph $G$. +\section{Basic Terminology} -Figures~\ref{subfig:airport} and~\ref{subfig:instagram} show two examples of graph models, -a network of airline routes between cities and a social network of names and followers. -The figures indicate the domain-specific data to be modeled and the sets $V$ and $E$ for each graph.
-Each figure also includes a node and link diagram, a commonly-used graphical\footnote{An unfortunate collision of terminology.} +To model the relationships between entities in some domain, a \emph{graph} $G$ +comprises two sets: a \emph{vertex set} $V$, whose elements correspond to the domain +entities, and an \emph{edge set} $E$, whose elements are pairs corresponding to +elements in $V$ that have some relationship with each other. That is, if $u$ and $v$ +are members of $V$ that have some relationship that we wish to capture, then we can +express that by the existence of a pair $\{u, v\}$ in $E$. We write $G=\{V, E\}$ to +express that the two sets $V$ and $E$ define a graph $G$. We can also describe set +membership of a vertex in $V$ or an edge in $E$ with set notation as $v\in V$ or +$e\in E$, but we will generally try to avoid using too much purely mathematical notation. +Figures~\ref{subfig:airport} and~\ref{subfig:instagram} show two examples of graph +models, a network of airline routes between cities and a social network of names and +followers. The figures indicate the domain-specific data to be modeled and the sets +$V$ and $E$ for each graph. Each figure also includes a node and link diagram, a +commonly used graphical\footnote{An unfortunate collision of terminology.} notation. + \begin{figure}[ht] \begin{center} \subcaptionbox{An undirected graph representing airline routes between cities. Shown are the list of airports (the vertices) and the list of routes between them (the edges). Also shown are a node and link diagram and the set-based description.\label{subfig:airport}} @@ -182,6 +106,41 @@ \subsection{Graph Representation: Enumerating the Vertices} \end{itemize} + +There are some special cases that deserve mention, as their presence or absence may determine algorithmic properties. +\begin{itemize} +% +\item A \emph{self-loop} is an edge from a vertex $v_i$ to itself, that is, there is an edge $\{v_i, v_i\}$ in $E$.
+% +\item An \emph{isolated vertex} $v_i$ is one that has no edge incident on it, that is, a vertex $v_i$ for which there is neither an edge $\{v_i, v_j\}$ nor an edge $\{v_j, v_i\}$. +% +\item A \emph{multigraph} is a graph $G$ for which there exist multiple edges between the same pair of vertices, i.e., there are + multiple edges $\{v_i, v_j\}$ for the same $i$ and $j$ in $E$. +% +\item A \emph{hypergraph} is a graph $G=\{V,E\}$ for which the elements of $E$ are arbitrary subsets of $V$. That is, + elements of $E$ may be $\{v_i, v_j, v_k, \ldots\}$. Consideration of hypergraphs is outside the scope of this + proposal. +\item A \emph{hypersparse} graph is a graph for which the enumeration is not contiguous. That is, for $V = \{v_i, v_j, v_k, \ldots\}$, with $i < j < k < \ldots$, the set $\{i, j, k, \ldots\}$ may not be contiguous and may not start at $0$. +\item A \emph{path} is a sequence of edges $\{v_i, v_j\}, \{v_j, v_k\}, \{v_k, v_l\}, \ldots $ such that every $v_i$ is distinct. + That is, +any $v_i$ appears at most once as the second vertex of an edge $\{v_j, v_i\}$ and at most once as the first vertex of an +edge $\{v_i, v_k\}$. +\item A \emph{cycle} is a path such that every vertex appears twice, that is, for every $v_i$ there is an edge + $\{v_j, v_i\}$ and an +edge $\{v_i, v_k\}$. In terms of the sequence above, a path is a cycle if the second vertex of the last edge is the first vertex of the first edge. +\item A \emph{directed acyclic graph (DAG)} is a directed graph with no cycles. +\item A \emph{tree} is a connected graph with no cycles. Trees are a special case of graphs but are important enough that they have their own + rich theory (and corresponding software). As such, we omit trees from this proposal and look forward to + separate library proposals for trees. + \item A \emph{subgraph} of $G=\{V,E\}$ is a graph $H=\{V, F\}$ such that $F$ is a subset of $E$. +\item A \emph{spanning tree} is a subgraph of $G$ that is also a tree.
+\end{itemize} +If any of these properties is important to the correct functioning of an algorithm, either positively or negatively, it will be part of the corresponding requirements of the algorithm. In general, we assume that graphs are not multigraphs, are not hypersparse, and do not have self-loops. + +\andrew{We should probably have pictures for all of these -- and others -- saves 1k (or 10k) words.} + + + \subsection{Adjacency-Based Representations} We begin our development of graph representations @@ -302,6 +261,81 @@ \subsection{Adjacency-Based Representations} \end{figure} +\subsection{Incidence Matrices} + +An \emph{incidence matrix} of a directed graph $G$ with no self-loops is a +$|V|\times |E|$ matrix $B = (b_{ij})$ such that +\[ + b_{i j} = + \left\{ + \begin{array}{rl} + -1 & \textrm{if edge } e_j \textrm{ leaves vertex } v_i \\ + 1 & \textrm{if edge } e_j \textrm{ enters vertex } v_i \\ + 0 & \textrm{otherwise} + \end{array} + \right. + \] + where $e_j$ denotes the $j$th edge in an enumeration of $E$. + We note that the product $BB^\top$ of an incidence matrix $B$ with its transpose is a $|V|\times |V|$ matrix whose diagonal entries are the vertex degrees (in-degree plus out-degree) and whose off-diagonal entry $(i, k)$ is the negated number of edges between $v_i$ and $v_k$. + That is, $BB^\top = D - A$, where $D$ is the diagonal degree matrix and $A$ is the adjacency matrix of $G$ with edge directions ignored. + + +\section{Direct Representations} + +Another approach to representing a graph is to model an adjacency list (e.g., Figure~\ref{fig:airport-compressed-sparse-adjacency} or~\ref{fig:instagram-compressed-sparse-adjacency}) directly. That is, we can represent a vertex as a class and an edge as a class, and use pointers to represent adjacency. + +For example, the following structures could be used to directly represent a graph: +\begin{lstlisting} +#include <forward_list> +#include <string> +#include <vector> + +struct Vertex; + +struct Arc { + Vertex* tip; + double distance; +}; + +struct Vertex { + std::forward_list<Arc> edges; + std::string name; +}; + +struct Graph { + std::vector<Vertex> vertices; +}; +\end{lstlisting} + +Much of the terminology for graphs still applies in a direct representation, except, of course, we have structures representing the different components of a graph, rather than their indices.
There are a number of variations on this representation that one could consider, such as using \lstinline{std::vector} rather than \lstinline{std::forward_list} to store the outgoing \lstinline{Arc}s of a \lstinline{Vertex}. + +Direct representations of graphs will often be implicitly part of other structures (``embedded'') in a given application. +For example, +one might have data structures to represent an electronic circuit: +\begin{lstlisting} + struct node; + + struct two_terminal { + node* from; + node* to; + double current; + }; + + struct resistor : public two_terminal { + double conductance; + }; + + struct capacitor : public two_terminal { + double capacitance; + }; + + struct node { + std::forward_list<two_terminal*> elements; + double voltage; + }; + + struct circuit { + std::vector<node> nodes; + }; +\end{lstlisting} +Note that, although the structures and fields are named differently, a circuit is inherently a direct representation of a graph, with \lstinline{node}s as vertices and \lstinline{two_terminal} elements as edges. + + + \section{Bipartite Graphs} \label{sec:bipartite} @@ -399,6 +433,9 @@ \section{Bipartite Graphs} A structurally bipartite graph explicitly captures distinct vertex categories. % +We note that a structurally bipartite graph may have an edge $(u_i, v_i)$, that is, an edge between two vertices with the same index. Even though $u_i$ and $v_i$ are not the same vertex, we still call such an edge a self-loop. + + \section{Partitioned Graphs} In contrast to structurally bipartite graphs, there are certainly cases @@ -413,80 +450,6 @@ \section{Partitioned Graphs} We note that partitioned graphs are not restricted to two partitions---a partitioned graph can represent an arbitrary number of partitions, i.e., a \emph{multipartite} graph (a graph with multiple subsets of vertices such that edges only go between subsets).
While partitioned graphs can be used to model multipartite graphs, partitioned graphs are not necessarily multipartite; edges can comprise vertices within a partition as well as well as across partitions. -\section{Direct Representations} - -Another approach to representing a graph is to model an adjacency list (e.g., Figure~\ref{fig:airport-compressed-sparse-adjacency} or~\ref{fig:instagram-compressed-sparse-adjacency} ) directly. That is, we can represent a vertex as a class and an edge as a class, and use pointers to represent adjacency. - -For example, the following structures could be used to directly represent a graph (these are simplified, and slightly modernized, versions of the structs used in Donald Knuth's original Stanford Graph Base\footnote{Stanford Graph Base was written in pre-ANSI C}): -\begin{lstlisting} -class Vertex { - Arc* arcs; - int u, v, w; -}; - -class Arc { - Vertex* tip; - Arc* next; - int a, b; -}; - -class Graph { - Vertex* vertices; - int uu, vv, ww; -}; -\end{lstlisting} -In this example, a `lstinline{Graph}` is an array of \lstinline{Vertex}. Each \lstinline{Vertex} has a pointer to a list of \lstinline{Arc} (an edge) and each \lstinline{Arc} has a pointer to its target \lstinline{Vertex}. Each of the classes contains not only the adjacency information, but properties as well (represented here by \lstinline{int} fields in each of the classes); - -Using modern C++ practice, we might instead express this as -\begin{lstlisting} -template -class Vertex { - std::forward_list arcs; - std::tuple props; -}; - -template -class Arc { - Vertex* tip; - std::tuple props; -}; - -template -class Graph { - std::vector vertices; - std::tuple props; -}; -\end{lstlisting} - -Much of terminology for graphs still applies in a direct representation, except, of course, we have structures representing the different components of a graph, rather than their indices. 
There are a number of variations one could consider to this representation, such as using \lstinline{std::vector} rather than \lstinline{std::forward_list} to store outgoing \lstinline{Arc} in a \lstinline{Vertex}. - -One advantage of a direct representation is that it can be embedded. That is, the \lstinline{Vertex} / \lstinline{Arc} representation may be implicit in (or easily incorporated into) existing structures within an application. For example, -one might have data structures to represent an electronic circuit: -\begin{lstlisting} - struct two_terminal { - node* from; - node* to; - double current; - }; - - struct resistor : public two_terminal { - double conductance; - }; - - struct capacitor : public two_terminal { - double capacitance; - }; - - struct node { - std::forward_list elements; - double voltage; - }; - - struct circuit { - std::vector nodes; - }; -\end{lstlisting} -Note that, although structures and fields are differently named, a circuit is inherently a direct representation of a graph. % \andrew{We should have a note that anything wrapped in a graph adapter has a `num\_vertices` % method (and other members). @@ -515,6 +478,11 @@ \section{Direct Representations} % \subsection{Edge List and Adjacency List: Compressed Edge List} % \andrew{Using a sort and group-by (or a sort, a run-length encoding, and a scan), we can compactify the edge-list representation and at the same time obtain an adjacency-list representation -- one that is memory and compute efficient. Best of both worlds. Has same basic structural principles as CSR / CSC matrices in linear algebra -- but much more general.} +\section{Regarding Algorithms} + +\andrew{We need to note that we can see some abstract properties of graph representations that are universal and can be used for defining concepts.
I suggest that that be done where we define our concepts.} + + \appendix \section{On Ambiguous Terminology} @@ -619,3 +587,32 @@ \subsection{Converting an Edge List to an Adjacency List} adj_list[dst].push_back (src, val); } \end{lstlisting} + + +\section{Graphs and Sparse Matrices} + +The relationship between graphs and sparse matrices is natural and important enough that a few words are in order. + +In numerical linear algebra, a sparse matrix is one that only stores elements of interest\footnote{These are typically called ``non-zeroes'', though the stored value could be zero.}. Elements that are not stored are assumed to be zero. The information necessary to use a matrix includes the row index, the column index, and the entry value itself---abstractly a triple $(i, j, v)$. +The elements can be stored in coordinate form, where each triple is stored (either as separate arrays or as tuples in a single array), or in compressed sparse form, which compresses one of the index dimensions and stores the other index and the value. + +A sparse matrix can be considered as a structurally bipartite graph representation. Suppose that we have a sparse matrix $A$ represented as $\{(i, j, a_{ij})\}$. We can create a structurally bipartite graph $J = \{U, V, E\}$ such that $(i,j)$ is in $E$ if and only if $(i, j, a_{ij})$ is in $A$. The coordinate and compressed representations for a sparse matrix are the same as for the graph (and, in fact, the terminology ``coordinate'' and ``compressed sparse'' originate in sparse numerical linear algebra). For a matrix, the sets $U$ and $V$ have a particular meaning. Either the indices of $U$ consist of row numbers and $V$ of column numbers, or vice versa.
In the former case, a compressed representation is known as ``compressed sparse row.'' In the latter case, it is known as ``compressed sparse column.'' + +\andrew{This code is from nwgraph, need to bring it up to std::graph} +The following code snippet illustrates a sparse matrix-vector product when a compressed adjacency representation is interpreted as a compressed sparse row matrix. +\begin{lstlisting}[language=C++] + for (auto&& [row, u_neighbors] : make_neighbor_range(graph)) { + for (auto&& [col, val] : u_neighbors) { + y[row] += x[col] * val; + } + } +\end{lstlisting} + +The following code snippet illustrates the same sparse matrix-vector product when the compressed adjacency representation is instead interpreted as a compressed sparse column matrix. +\begin{lstlisting}[language=C++] + for (auto&& [col, u_neighbors] : make_neighbor_range(graph)) { + for (auto&& [row, v] : u_neighbors) { + y[row] += x[col] * v; + } + } +\end{lstlisting}
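
To make the correspondence between the coordinate and compressed forms concrete, the following is a minimal, self-contained sketch in plain C++17. It does not use nwgraph or the proposed library interface. The names \lstinline{csr_matrix}, \lstinline{triple}, \lstinline{make_csr}, and \lstinline{spmv} are hypothetical and were chosen only for this illustration. The sketch assumes zero-based indices and builds a compressed sparse row structure from a list of $(i, j, v)$ triples by counting the entries per row, taking a prefix sum, and scattering the triples into place. It then computes $y = Ax$ with the same loop structure as the compressed sparse row snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical illustration types; not part of nwgraph or the proposed library.
using triple = std::tuple<std::size_t, std::size_t, double>;  // (row i, column j, value v)

struct csr_matrix {
  std::size_t              num_rows = 0;
  std::vector<std::size_t> row_ptr;  // size num_rows + 1; row r occupies [row_ptr[r], row_ptr[r+1])
  std::vector<std::size_t> col_idx;  // column index of each stored entry
  std::vector<double>      values;   // value of each stored entry
};

// Convert coordinate form (an unordered list of triples) to compressed sparse row form.
csr_matrix make_csr(std::size_t num_rows, const std::vector<triple>& coo) {
  csr_matrix a;
  a.num_rows = num_rows;
  a.row_ptr.assign(num_rows + 1, 0);
  a.col_idx.resize(coo.size());
  a.values.resize(coo.size());

  // Count the entries in each row, storing the count for row r at row_ptr[r + 1].
  for (const auto& t : coo) {
    assert(std::get<0>(t) < num_rows);
    ++a.row_ptr[std::get<0>(t) + 1];
  }

  // Prefix sum: row_ptr[r] becomes the offset of the first entry of row r.
  for (std::size_t r = 0; r < num_rows; ++r) {
    a.row_ptr[r + 1] += a.row_ptr[r];
  }

  // Scatter each triple into the next free slot of its row.
  std::vector<std::size_t> next(a.row_ptr.begin(), a.row_ptr.end() - 1);
  for (const auto& [i, j, v] : coo) {
    const std::size_t k = next[i]++;
    a.col_idx[k] = j;
    a.values[k]  = v;
  }
  return a;
}

// Sparse matrix-vector product y = A x for a compressed sparse row matrix.
// The caller is assumed to pass an x with at least as many elements as A has columns.
std::vector<double> spmv(const csr_matrix& a, const std::vector<double>& x) {
  std::vector<double> y(a.num_rows, 0.0);
  for (std::size_t row = 0; row < a.num_rows; ++row) {
    for (std::size_t k = a.row_ptr[row]; k < a.row_ptr[row + 1]; ++k) {
      y[row] += a.values[k] * x[a.col_idx[k]];
    }
  }
  return y;
}
```

Read as a graph, \lstinline{row_ptr} delimits the outgoing edges of each vertex in $U$, \lstinline{col_idx} holds the target vertices in $V$, and \lstinline{values} holds the edge properties. Swapping the roles of the row and column indices in \lstinline{make_csr} would yield the compressed sparse column interpretation instead.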