GVPR(1) GVPR(1)
NAME
gvpr - graph pattern scanning and processing language
( previously known as gpr )
SYNOPSIS
gvpr [-icV?] [ -o outfile ] [ -a args ] [ 'prog' | -f progfile ] [
files ]
DESCRIPTION
gvpr is a graph stream editor inspired by awk. It copies input graphs
to its output, possibly transforming their structure and attributes,
creating new graphs, or printing arbitrary information. The graph
model is that provided by libagraph(3). In particular, gvpr reads and
writes graphs using the dot language.
Basically, gvpr traverses each input graph, denoted by $G, visiting
each node and edge, matching it with the predicate-action rules
supplied in the input program. The rules are evaluated in order. For
each predicate evaluating to true, the corresponding action is
performed. During the traversal, the current node or edge being visited
is denoted by $.
For each input graph, there is a target subgraph, denoted by $T,
initially empty and used to accumulate chosen entities, and an output
graph, $O, used for final processing and then written to output. By
default, the output graph is the target graph. The output graph can be
set in the program or, in a limited sense, on the command line.
OPTIONS
The following options are supported:
-a args
The string args is split into whitespace-separated tokens, with
the individual tokens available as strings in the gvpr program
as ARGV[0],...,ARGV[ARGC-1].
-c Use the source graph as the output graph.
-i Derive the node-induced subgraph extension of the output graph
in the context of its root graph.
-o outfile
Causes the output stream to be written to the specified file; by
default, output is written to stdout.
-f progfile
Use the contents of the specified file as the program to execute
on the input. If -f is not given, gvpr will use the first
non-option argument as the program.
-V Causes the program to print version information and exit.
-? Causes the program to print usage information and exit.
OPERANDS
The following operand is supported:
files Names of files containing one or more graphs in the dot language.
If no -f option is given, the first name is removed from the
list and used as the input program. If the list of files is
empty, stdin will be used.
PROGRAMS
A gvpr program consists of a list of predicate-action clauses, having
one of the forms:
BEGIN { action }
BEG_G { action }
N [ predicate ] { action }
E [ predicate ] { action }
END_G { action }
END { action }
A program can contain at most one of each of the BEGIN, BEG_G, END_G
and END clauses. There can be any number of N and E statements, the
first applied to nodes, the second to edges. The top-level semantics
of a gvpr program are:
Evaluate the BEGIN clause, if any.
For each input graph G {
    Set G as the current graph and current object.
    Evaluate the BEG_G clause, if any.
    For each node and edge in G {
        Set the node or edge as the current object.
        Evaluate the N or E clauses, as appropriate.
    }
    Set G as the current object.
    Evaluate the END_G clause, if any.
}
Evaluate the END clause, if any.
The actions of the BEGIN, BEG_G, END_G and END clauses are performed
when the clauses are evaluated. For N or E clauses, either the
predicate or action may be omitted. If there is no predicate with an
action, the action is performed on every node or edge, as appropriate.
If there is no action and the predicate evaluates to true, the
associated node or edge is added to the target graph.
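The evaluation order above can be modelled outside gvpr. The following is a minimal Python sketch, not gvpr code: run_program, the dictionary-based graphs and the clause callables are hypothetical names, and visiting all nodes before all edges is an assumption made only to keep the model small.

```python
# Illustrative model of gvpr's top-level evaluation order (not gvpr code).
# Graphs are plain dicts with "nodes" and "edges" lists; all names here
# are hypothetical.

def run_program(graphs, begin=None, beg_g=None, n_rules=(), e_rules=(),
                end_g=None, end=None):
    """Evaluate clauses in the documented order; return one target per graph."""
    targets = []
    if begin:
        begin()
    for g in graphs:
        target = {"nodes": [], "edges": []}
        if beg_g:
            beg_g(g)
        # Assumption: nodes are visited first, then edges.
        for kind, rules in (("nodes", n_rules), ("edges", e_rules)):
            for obj in g[kind]:
                for predicate, action in rules:
                    if predicate is not None and not predicate(obj):
                        continue
                    if action is not None:
                        action(obj)
                    elif obj not in target[kind]:
                        # No action and a true predicate: add to the target.
                        target[kind].append(obj)
        if end_g:
            end_g(g)
        targets.append(target)
    if end:
        end()
    return targets


# A rule with a predicate but no action selects the matching nodes.
graph = {
    "nodes": [{"name": "a", "color": "blue"}, {"name": "b", "color": "red"}],
    "edges": [],
}
selected = run_program(
    [graph], n_rules=[(lambda n: n["color"] == "blue", None)]
)
```

Here the single rule plays the role of N[color=="blue"] with no action, so only node a ends up in the modelled target.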
Predicates and actions are sequences of statements in the C dialect
supported by the libexpr(3) library. The only difference between
predicates and actions is that the former must have a type that may be
interpreted as either true or false. Here the usual C convention is
followed, in which a non-zero value is considered true. This would
include non-empty strings and non-empty references to nodes, edges, etc.
However, if a string can be converted to an integer, this value is used.
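The truth rule for predicates can be restated as a small Python sketch. The helper is_true is a hypothetical name, and Python's int() is used as a stand-in for the string-to-integer conversion, which may accept slightly different input than libexpr does.

```python
def is_true(value):
    """Model of predicate truth; a hypothetical helper, not part of gvpr."""
    if value is None:                # stands in for a NULL object reference
        return False
    if isinstance(value, str):
        try:
            return int(value) != 0   # integer-like strings use their value
        except ValueError:
            return value != ""       # otherwise non-empty means true
    return value != 0                # numbers follow the C convention
```

Under this reading, the string "0" is false even though it is non-empty, while "abc" is true.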
In addition to the usual C base types (void, int, char, float, long,
unsigned and double), gvpr provides string as a synonym for char*, and
the graph-based types node_t, edge_t, graph_t and obj_t. The obj_t
type can be viewed as a supertype of the other 3 concrete types; the
correct base type is maintained dynamically. Besides these base types,
the only other supported type expressions are (associative) arrays.
Constants follow C syntax, but strings may be quoted with either "..."
or '...'. In certain contexts, string values are interpreted as
patterns for the purpose of regular expression matching. Patterns use
ksh(1) file match pattern syntax. gvpr uses C++ comments.
A statement can be a declaration of a function, a variable or an array,
or an executable statement. For declarations, there is a single scope.
Array declarations have the form:
type array [ var ]
where the var is optional. As in C, variables and arrays must be
declared. In particular, an undeclared variable will be interpreted as
the name of an attribute of a node, edge or graph, depending on the
context.
Executable statements can be one of the following:
{ [ statement ... ] }
expression // commonly var = expression
if( expression ) statement [ else statement ]
for( expression ; expression ; expression ) statement
for( array [ var ]) statement
while( expression ) statement
switch( expression ) case statements
break [ expression ]
continue [ expression ]
return [ expression ]
In the second form of the for statement, the variable var is set to
each value used as an index in the specified array and then the
associated statement is evaluated. Function definitions can only appear
in the BEGIN clause.
Expressions include the usual C expressions. String comparisons using
== and != treat the right hand operand as a pattern. gvpr will attempt
to use an expression as a string or numeric value as appropriate.
Expressions of graphical type (i.e., graph_t, node_t, edge_t, obj_t)
may be followed by a field reference in the form of .name. The
resulting value is the value of the attribute named name of the given
object. In addition, in certain contexts an undeclared, unmodified
identifier is taken to be an attribute name. Specifically, such
identifiers denote attributes of the current node or edge, respectively,
in N and E clauses, and the current graph in BEG_G and END_G clauses.
As usual in the libagraph(3) model, attributes are string-valued. In
addition, gvpr supports certain pseudo-attributes of graph objects, not
necessarily string-valued. These reflect intrinsic properties of the
graph objects and cannot be set by the user.
head : node_t
the head of an edge.
tail : node_t
the tail of an edge.
name : string
the name of an edge, node or graph. The name of an edge has the
form "<tail-name><edge-op><head-name>[<key>]", where <edge-op>
is "->" or "--" depending on whether the graph is directed or
not. The bracket part [<key>] only appears if the edge has a
non-trivial key.
indegree : int
the indegree of a node.
outdegree : int
the outdegree of a node.
degree : int
the degree of a node.
root : graph_t
the root graph of an object. The root of a root graph is itself.
parent : graph_t
the parent graph of a subgraph. The parent of a root graph is
NULL.
n_edges : int
the number of edges in the graph.
n_nodes : int
the number of nodes in the graph.
directed : int
true (non-zero) if the graph is directed.
strict : int
true (non-zero) if the graph is strict.
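The form of an edge's name pseudo-attribute can be illustrated with a short Python sketch. The function edge_name is a hypothetical helper, and treating the empty string as the trivial key is an assumption.

```python
def edge_name(tail, head, directed, key=""):
    """Build "<tail-name><edge-op><head-name>[<key>]" (hypothetical helper)."""
    op = "->" if directed else "--"
    # Assumption: an empty key counts as trivial, so no bracket part is added.
    suffix = f"[{key}]" if key else ""
    return f"{tail}{op}{head}{suffix}"
```

For instance, edge_name("a", "b", True) yields "a->b", and an undirected edge with key k1 yields "a--b[k1]".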
BUILT-IN FUNCTIONS
The following functions are built into gvpr. Those functions returning
references to graph objects return NULL in case of failure.
Graphs and subgraphs
graph(s : string, t : string) : graph_t
creates a graph whose name is s and whose type is specified by
the string t. Ignoring case, the characters U, D, S, N have the
interpretation undirected, directed, strict, and non-strict,
respectively. If t is empty, a directed, non-strict graph is
generated.
subg(g : graph_t, s : string) : graph_t
creates a subgraph in graph g with name s. If the subgraph
already exists, it is returned.
isSubg(g : graph_t, s : string) : graph_t
returns the subgraph in graph g with name s, if it exists, or
NULL otherwise.
fstsubg(g : graph_t) : graph_t
returns the first subgraph in graph g, or NULL if none exists.
nxtsubg(sg : graph_t) : graph_t
returns the next subgraph after sg, or NULL.
isDirect(g : graph_t) : int
returns true if and only if g is directed.
isStrict(g : graph_t) : int
returns true if and only if g is strict.
nNodes(g : graph_t) : int
returns the number of nodes in g.
nEdges(g : graph_t) : int
returns the number of edges in g.
Nodes
node(sg : graph_t, s : string) : node_t
creates a node in graph sg of name s. If such a node already
exists, it is returned.
subnode(sg : graph_t, n : node_t) : node_t
inserts the node n into the subgraph sg. Returns the node.
fstnode(g : graph_t) : node_t
returns the first node in graph g, or NULL if none exists.
nxtnode(n : node_t) : node_t
returns the next node after n, or NULL.
isNode(sg : graph_t, s : string) : node_t
looks for a node in graph sg of name s. If such a node exists, it
is returned. Otherwise, NULL is returned.
Edges
edge(t : node_t, h : node_t, s : string) : edge_t
creates an edge with tail node t, head node h and name s. If the
graph is undirected, the distinction between head and tail nodes
is unimportant. If such an edge already exists, it is returned.
subedge(g : graph_t, e : edge_t) : edge_t
inserts the edge e into the subgraph g. Returns the edge.
isEdge(t : node_t, h : node_t, s : string) : edge_t
looks for an edge with tail node t, head node h and name s. If
the graph is undirected, the distinction between head and tail
nodes is unimportant. If such an edge exists, it is returned.
Otherwise, NULL is returned.
fstout(n : node_t) : edge_t
returns the first out edge of node n.
nxtout(e : edge_t) : edge_t
returns the next out edge after e.
fstin(n : node_t) : edge_t
returns the first in edge of node n.
nxtin(e : edge_t) : edge_t
returns the next in edge after e.
fstedge(n : node_t) : edge_t
returns the first edge of node n.
nxtedge(e : edge_t) : edge_t
returns the next edge after e.
Graph I/O
write(g : graph_t) : void
prints g in dot format onto the output stream.
writeG(g : graph_t, fname : string) : void
prints g in dot format into the file fname.
fwriteG(g : graph_t, fd : int) : void
prints g in dot format onto the open stream denoted by the
integer fd.
readG(fname : string) : graph_t
returns a graph read from the file fname. The graph should be in
dot format. If no graph can be read, NULL is returned.
freadG(fd : int) : graph_t
returns the next graph read from the open stream fd. Returns
NULL at end of file.
Graph miscellany
delete(g : graph_t, x : obj_t) : void
deletes object x from graph g. If g is NULL, the function uses
the root graph of x. If x is a graph or subgraph, it is closed
unless x is locked.
isIn(g : graph_t, x : obj_t) : int
returns true if x is in subgraph g. If x is a graph, this
indicates that g is the immediate parent graph of x.
clone(g : graph_t, x : obj_t) : obj_t
creates a clone of object x in graph g. In particular, the new
object has the same name/value attributes and structure as the
original object. If an object with the same key as x already
exists, its attributes are overlaid by those of x and the object
is returned. If an edge is cloned, both endpoints are
implicitly cloned. If a graph is cloned, all nodes, edges and
subgraphs are implicitly cloned. If x is a graph, g may be NULL,
in which case the cloned object will be a new root graph.
copy(g : graph_t, x : obj_t) : obj_t
creates a copy of object x in graph g, where the new object has
the same name/value attributes as the original object. If an
object with the same key as x already exists, its attributes are
overlaid by those of x and the object is returned. Note that
this is a shallow copy. If x is a graph, none of its nodes,
edges or subgraphs are copied into the new graph. If x is an
edge, the endpoints are created if necessary, but they are not
cloned. If x is a graph, g may be NULL, in which case the
copied object will be a new root graph.
induce(g : graph_t) : void
extends g to its node-induced subgraph extension in its root
graph.
compOf(g : graph_t, n : node_t) : graph_t
returns the connected component of the graph g containing node
n, as a subgraph of g. The subgraph only contains the nodes. One
can use induce to add the edges. The function fails and returns
NULL if n is not in g. Connectivity is based on the underlying
undirected graph of g.
lock(g : graph_t, v : int) : int
implements graph locking on root graphs. If the integer v is
positive, the graph is set so that future calls to delete have
no immediate effect. If v is zero, the graph is unlocked. If
there has been a call to delete the graph while it was locked,
the graph is closed. If v is negative, nothing is done. In all
cases, the previous lock value is returned.
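The interaction between lock and delete can be pictured as a small state machine. The Python sketch below is only a model: RootGraph and its fields are hypothetical, and storing the lock value as 0 or 1 is an assumption, since the text says only that the previous lock value is returned.

```python
class RootGraph:
    """Hypothetical stand-in for a root graph; tracks lock state only."""

    def __init__(self):
        self.lock_value = 0          # assumption: 0 = unlocked, 1 = locked
        self.delete_pending = False
        self.closed = False


def lock(g, v):
    """Mimic the documented lock(g, v); returns the previous lock value."""
    previous = g.lock_value
    if v > 0:
        g.lock_value = 1
    elif v == 0:
        g.lock_value = 0
        if g.delete_pending:         # a delete arrived while locked
            g.closed = True
            g.delete_pending = False
    # v < 0: nothing is done, the call only queries the lock value
    return previous


def delete(g):
    """Mimic delete() applied to the graph itself."""
    if g.lock_value:
        g.delete_pending = True      # no immediate effect while locked
    else:
        g.closed = True
```

In this model, deleting a locked graph is deferred until a later lock(g, 0), and a negative v can be used to read the lock state without changing it.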
Strings
sprintf(fmt : string, ...) : string
returns the string resulting from formatting the values of the
expressions occurring after fmt according to the printf(3)
format fmt.
gsub(str : string, pat : string) : string
gsub(str : string, pat : string, repl : string) : string
returns str with all substrings matching pat deleted or replaced
by repl, respectively.
sub(str : string, pat : string) : string
sub(str : string, pat : string, repl : string) : string
returns str with the leftmost substring matching pat deleted or
replaced by repl, respectively. The characters '^' and '$' may
be used at the beginning and end, respectively, of pat to anchor
the pattern to the beginning or end of str.
substr(str : string, idx : int) : string
substr(str : string, idx : int, len : int) : string
returns the substring of str starting at position idx to the end
of the string or of length len, respectively. Indexing starts
at 0. If idx is negative or idx is greater than the length of
str, a fatal error occurs. Similarly, in the second case, if len
is negative or idx + len is greater than the length of str, a
fatal error occurs.
length(s : string) : int
returns the length of the string s.
index(s : string, t : string) : int
returns the index of the character in string s where the
leftmost copy of string t can be found, or -1 if t is not a
substring of s.
match(s : string, p : string) : int
returns the index of the character in string s where the
leftmost match of pattern p can be found, or -1 if no substring of s
matches p.
canon(s : string) : string
returns a version of s appropriate to be used as an identifier
in a dot file.
xOf(s : string) : string
returns the string "x" if s has the form "x,y", where both x and
y are numeric.
yOf(s : string) : string
returns the string "y" if s has the form "x,y", where both x and
y are numeric.
llOf(s : string) : string
returns the string "llx,lly" if s has the form
"llx,lly,urx,ury", where all of llx, lly, urx, and ury are
numeric.
urOf(s : string) : string
returns the string "urx,ury" if s has the form
"llx,lly,urx,ury", where all of llx, lly, urx, and ury are
numeric.
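The bounds rules for substr and the coordinate helpers can be sketched in Python as follows. These functions are illustrative models, not gvpr's implementation: a ValueError stands in for the fatal error, Python's float() is used as the test for "numeric", and returning the empty string when the input does not match is an assumption, since the text above does not say what is returned in that case.

```python
def substr(s, idx, length=None):
    """Model of substr(); raises ValueError where gvpr reports a fatal error."""
    if idx < 0 or idx > len(s):
        raise ValueError("idx out of range")
    if length is None:
        return s[idx:]
    if length < 0 or idx + length > len(s):
        raise ValueError("len out of range")
    return s[idx:idx + length]


def _is_numeric(text):
    """Assumption: anything float() accepts counts as numeric."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def x_of(s):
    """Model of xOf(): the x part of "x,y"."""
    parts = s.split(",")
    if len(parts) == 2 and all(_is_numeric(p) for p in parts):
        return parts[0]
    return ""  # assumption: result for non-matching input is unspecified


def ll_of(s):
    """Model of llOf(): the "llx,lly" part of "llx,lly,urx,ury"."""
    parts = s.split(",")
    if len(parts) == 4 and all(_is_numeric(p) for p in parts):
        return ",".join(parts[:2])
    return ""  # same assumption as x_of
```

yOf and urOf would follow the same pattern, returning the second coordinate and the last two coordinates, respectively.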
I/O
print(...) : void
print( expr, ... ) prints a string representation of each
argument in turn onto stdout, followed by a newline.
printf(fmt : string, ...) : int
printf(fd : int, fmt : string, ...) : int
prints the string resulting from formatting the values of the
expressions following fmt according to the printf(3) format fmt.
Returns 0 on success. By default, it prints on stdout. If the
optional integer fd is given, output is written on the open
stream associated with fd.
openF(s : string, t : string) : int
opens the file s as an I/O stream. The string argument t
specifies how the file is opened. The arguments are the same as for
the C function fopen(3). It returns an integer denoting the
stream, or -1 on error.
As usual, streams 0, 1 and 2 are already open as stdin, stdout,
and stderr, respectively. Since gvpr may use stdin to read the
input graphs, the user should avoid using this stream.
closeF(fd : int) : int
closes the open stream denoted by the integer fd. Streams 0, 1
and 2 cannot be closed. Returns 0 on success.
readL(fd : int) : string
returns the next line read from the input stream fd. It returns
the empty string "" on end of file. Note that the newline
character is left in the returned string.
Math
exp(d : double) : double
returns e to the dth power.
log(d : double) : double
returns the natural log of d.
sqrt(d : double) : double
returns the square root of the double d.
pow(d : double, x : double) : double
returns d raised to the xth power.
cos(d : double) : double
returns the cosine of d.
sin(d : double) : double
returns the sine of d.
atan2(y : double, x : double) : double
returns the arctangent of y/x in the range -pi to pi.
Miscellaneous
exit() : void
exit(v : int) : void
causes gvpr to exit with the exit code v. v defaults to 0 if
omitted.
rand() : double
returns a pseudo-random double between 0 and 1.
srand() : int
srand(v : int) : int
sets a seed for the random number generator. The optional
argument gives the seed; if it is omitted, the current time is used.
The previous seed value is returned. srand should be called
before any calls to rand.
BUILT-IN VARIABLES
gvpr provides certain special, built-in variables, whose values are set
automatically by gvpr depending on the context. Except as noted, the
user cannot modify their values.
$ : obj_t
denotes the current object (node, edge, graph) depending on the
context. It is not available in BEGIN or END clauses.
$F : string
is the name of the current input file.
$G : graph_t
denotes the current graph being processed. It is not available
in BEGIN or END clauses.
$O : graph_t
denotes the output graph. Before graph traversal, it is
initialized to the target graph. After traversal and any END_G actions,
if it refers to a non-empty graph, that graph is printed onto
the output stream. It is only valid in N, E and END_G clauses.
The output graph may be set by the user.
$T : graph_t
denotes the current target graph. It is a subgraph of $G and is
available only in N, E and END_G clauses.
$tgtname : string
denotes the name of the target graph. By default, it is set to
"gvpr_result". If used multiple times during the execution of
gvpr, the name will be appended with an integer. This variable
may be set by the user.
$tvroot : node_t
indicates the starting node for a (directed or undirected)
depth-first traversal of the graph (cf. $tvtype below). The
default value is NULL for each input graph.
$tvtype : tvtype_t
indicates how gvpr traverses a graph. At present, it can only
take one of six values: TV_flat, TV_dfs, TV_fwd, TV_rev, TV_ne,
and TV_en. TV_flat is the default. The meaning of these values
is discussed below.
ARGC : int
denotes the number of arguments specified by the -a args
command-line argument.
ARGV : string array
denotes the array of arguments specified by the -a args
command-line argument. The ith argument is given by ARGV[i].
BUILT-IN CONSTANTS
There are several symbolic constants defined by gvpr.
NULL : obj_t
a null object reference, equivalent to 0.
TV_flat : tvtype_t
a simple, flat traversal, with graph objects visited in
seemingly arbitrary order.
TV_ne : tvtype_t
a traversal which first visits all of the nodes, then all of the
edges.
TV_en : tvtype_t
a traversal which first visits all of the edges, then all of the
nodes.
TV_dfs : tvtype_t
a traversal of the graph using a depth-first search on the
underlying undirected graph. To do the traversal, gvpr will
check the value of $tvroot. If this has the same value that it
had previously (at the start, the previous value is initialized
to NULL), gvpr will simply look for some unvisited node and
traverse its connected component. On the other hand, if $tvroot
has changed, its connected component will be toured, assuming it
has not been previously visited or, if $tvroot is NULL, the
traversal will stop. Note that using TV_dfs and $tvroot, it is
possible to create an infinite loop.
TV_fwd : tvtype_t
a traversal of the graph using a depth-first search on the graph
following only forward arcs. In libagraph(3), edges in
undirected graphs are given an arbitrary direction, which is used
for this traversal. The choice of roots for the traversal is the
same as described for TV_dfs above.
TV_rev : tvtype_t
a traversal of the graph using a depth-first search on the graph
following only reverse arcs. In libagraph(3), edges in
undirected graphs are given an arbitrary direction, which is used
for this traversal. The choice of roots for the traversal is the
same as described for TV_dfs above.
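The difference between the three depth-first modes comes down to which arcs are followed. The Python sketch below is a simplified model: dfs_order is a hypothetical function, it reports node order only, and it treats root as a fixed $tvroot that is never changed during the traversal. The order in which gvpr picks neighbours or unvisited nodes is not specified above, so list order is used here as an assumption.

```python
def dfs_order(nodes, edges, mode="dfs", root=None):
    """Return nodes in depth-first order.

    mode "dfs" follows arcs in both directions (underlying undirected
    graph), "fwd" follows tail -> head only, "rev" follows head -> tail only.
    """
    adj = {n: [] for n in nodes}
    for tail, head in edges:
        if mode in ("dfs", "fwd"):
            adj[tail].append(head)
        if mode in ("dfs", "rev"):
            adj[head].append(tail)

    visited, order = set(), []

    def visit(start):
        stack = [start]
        while stack:
            n = stack.pop()
            if n in visited:
                continue
            visited.add(n)
            order.append(n)
            # Reverse so that neighbours are explored in list order.
            stack.extend(reversed(adj[n]))

    # Start from the root, if any, then pick up any unvisited nodes.
    starts = ([root] if root is not None else []) + list(nodes)
    for s in starts:
        visit(s)
    return order


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "b"), ("c", "d")]
undirected = dfs_order(nodes, edges)                  # ['a', 'b', 'c', 'd']
forward = dfs_order(nodes, edges, "fwd", root="c")    # ['c', 'b', 'd', 'a']
reverse = dfs_order(nodes, edges, "rev", root="b")    # ['b', 'a', 'c', 'd']
```

With forward arcs only, node a is unreachable from c and is picked up afterwards as a fresh starting point, which mirrors the "look for some unvisited node" step described for TV_dfs.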
EXAMPLES
    gvpr -i 'N[color=="blue"]' file.dot
Generate the node-induced subgraph of all nodes with color blue.
    gvpr -c 'N[color=="blue"]{color = "red"}' file.dot
Make all blue nodes red.
    BEGIN { int n, e; int tot_n = 0; int tot_e = 0; }
    BEG_G {
        n = nNodes($G);
        e = nEdges($G);
        printf ("%d nodes %d edges %s\n", n, e, $G.name);
        tot_n += n;
        tot_e += e;
    }
    END { printf ("%d nodes %d edges total\n", tot_n, tot_e) }
Version of the program gc.
    gvpr -c ""
Equivalent to nop.
    BEG_G { graph_t g = graph ("merge", "S"); }
    E {
        node_t h = clone(g,$.head);
        node_t t = clone(g,$.tail);
        edge_t e = edge(t,h,"");
        e.weight = e.weight + 1;
    }
    END_G { $O = g; }
Produces a strict version of the input graph, where the weight
attribute of an edge indicates how many edges from the input graph the
edge represents.
    BEGIN {node_t n; int deg[]}
    E{deg[head]++; deg[tail]++; }
    END_G {
        for (deg[n]) {
            printf ("deg[%s] = %d\n", n.name, deg[n]);
        }
    }
Computes the degrees of nodes with edges.
BUGS
When the program is given as a command line argument, the usual shell
interpretation takes place, which may affect some of the special names
in gvpr. To avoid this, it is best to wrap the program in single
quotes.
The constants TV_flat, TV_dfs, TV_fwd, and TV_rev
There is a single scope and the extent of all variables is the entire
life of the program. It might be preferable for scope to reflect the
natural nesting of the clauses, or for the program to at least reset
locally declared variables.
The expr library does not support string values of (char*)0. This
means we can't distinguish between "" and (char*)0 edge keys. For the
purposes of looking up and creating edges, we translate "" to be
(char*)0, since this latter value is necessary in order to look up any
edge with a matching head and tail.
The language inherits the usual C problems such as dangling references
and the confusion between '=' and '=='.
AUTHOR
Emden R. Gansner <erg@research.att.com>
SEE ALSO
awk(1), gc(1), dot(1), nop(1), libexpr(3), libagraph(3)
14 November 2003 GVPR(1)