Free Software Foundation GPLv3 LGPL Electronic Frontier Foundation
Mozilla.org OpenOffice.org Linux.com
If you want to write programs that are compatible with Open-Graphtheory.org, then you need to know how .gxl-files are structured. The same format is used for communication between the programs through Pipes.

GXL (Graph eXchange Language) is an XML format, cooperatively designed by multiple universities around the world. It is even ratified by Dagstuhl and that's saying something. From dozens of graph formats, it was chosen as Open-Graphtheory.orgs standard format (even after a self-designed format had already been implemented) for a reason.
You find GXLs official homepage here, but since there isn't much useful information there (an XSD, a DTD and some overly complicated examples) we decided to write a more practical guide to the format here.

Here are some example GXL files for you to tinker with. A (simplified) map of the german autobahn system

Jump to:

Skeleton
Comments
Graph Declaration
Vertex / Node Declaration
Edge and Arc Declaration
Hypergraphs
Arbitrary Attributes
If your program can't handle a feature
About other formats

Skeleton

A .gxl file can start with a standard XML-declaration. The root-element must be a tag "gxl".
<?xml
version="1.0" encoding="UTF-8"
?>

<!DOCTYPE
gxl SYSTEM "http://www.gupro.de/GXL/gxl-1.0.dtd"
>

<gxl
xmlns:xlink=" http://www.w3.org/1999/xlink"
>

  ...
</gxl>
The simple_xml library (which Open-Graphtheory.org uses internally) currently ignores the declaration, if it is present.

Comments

Comments in GXL are standard XML-comments, starting with <!-- and ending with -->
<!-- www.Open-Graphtheory.org -->
The simple_xml library currently ignores comments, so if you load a file and save it again, comments are lost!

Graph Declaration

The description of one or more graphs starts with a tag "graph" as subtag of the "gxl" root-element. This tag can have optional attributes
  • id in case you transmit multiple graphs and must reference one of them later
  • edgeids (true or false) specifies whether the edges have ids - this attribute is ignored by the opengt library. If edges have ids, they are used and automatically generated otherwise
  • hypergraph (true or false) also ignored, since all graphs are processed as hypergraphs
  • edgemode (directed, undirected, defaultdirected or defaultundirected) this attribute is mostly ignored, it is only used to decide whether an edge is directed or undirected IF it has no isdirected attribute (so opengt interprets "directed" as "defaultdirected" and "undirected" as "defaultundirected")
<graph
id="23" edgeids="true" edgemode="defaultundirected" hypergraph="false"
>

  ...
</graph>

Vertex / Node Declaration

Between the "graph" tags, you can declare a vertex / node by using a tag "node", which must have an attribute "id", whose value must be unique throughout all vertices. coordinates, labels, weights etc. are defined as arbitrary attributes
<node
id="v2"
>

  
<attr
name="name"
>

    
<string>
Hello World!
</string>

  
</attr>

  
<attr
name="location"
>

    
<seq>

      
<float>
100
</float>

      
<float>
50
</float>

      
<float>
25
</float>

    
</seq>

  
</attr>

  
<attr
name="weight"
>

    
<float>
10
</float>

  
</attr>

</node>

Edge and Arc Declaration

An edge is declared by a tag "edge" which can also have an attribute id. Whether edges have ids or not should be daclared in the graph declaration. Open-Graphtheory ignores these ids if they are present, but when writing to a stream, they are written because this can do no harm.
To specify which vertices are connected by the edge, you use attributes from and to which contain the id of the respective vertex. To specify whether the edge is directed or undirected, the attribute isdirected is used (value "true" or "false"). Notice that this attribute can be omitted, if the attribute edgemode is set accordingly in the graph declaration. Using isdirected overrides the edgemode defaultdirected/defaultundirected, but in Open-Graphtheory, it also overrides the edgemode directed/undirected, which does not exactly obey the GXL standard. It's just a modification that will usually have no effect, but could fix errors in some GXL documents. If the edgemode is not set, the GXL standard defines "directed" as the default value.
Edges can have a label and a weight. These values are stored in arbitrary attributes name (string) and weight (float)
<edge
from="v1" to="v2" isdirected="true" id="e1"
/>

<edge
from="v2" to="v3" isdirected="false" id="e2"
>

  
<attr
name="name"
>

    
<string>
This is an edge from v2 to v3
</string>

  
</attr>

  
<attr
name="weight"
>

    
<float>
42
</float>

  
</attr>

</edge>

Hypergraphs

The GXL standard and Open-Graphtheorys implementation support Hypergraphs. A Hypergraph is a graph that contains at least one hyper-edge, which is an edge that connects more or less than two vertices. If it "connects" 1 vertex, it is called a half edge, if it "connects" 0 vertices, it is called a loose edge. Notice the difference between a half edge and a loop: a loop contains the same vertex twice, whereas a loose edge contains it only once. A hyperedge can also be an edge connecting 2 vertices, that is "directed on one end and undirected on the other end" or it is positively/negatively connected to both ends
To define a hyperedge, you use a tag "rel". Since there can be more or less than 2 connections to vertices, the connections cannot be defined as attributes of the tag - they must be declared individually. To declare a connection of a hyperedge to a vertex, you use sub-tags "relend" with the attributes target (containing the id of the one vertex) and direction with value "in", "out" or "none". You might think of a vertex as a room and a hyperedge as a hallway with multiple doors - some of which being one-way revolving doors.
<rel
id="e3"
>

  
<relend
target="v1" direction="in"
/>

  
<relend
target="v2" direction="out"
/>

  
<relend
target="v3" direction="none"
/>

  
<attr
name="name"
>

    
<string>
This is a hyperedge
</string>

  
</attr>

  
<attr
name="weight"
>

    
<float>
42
</float>

  
</attr>

</rel>

Please note

"rel" tags can have arbitrary attributes, but "relend" tags can not.

Now you might argue that a hallway could be viewed as a room also. When you look at visual representations of hypergraphs, hyperedges are often depicted like vertices with different shapes and connections between hyperedges and vertices become lines/arcs just like edges/arcs between vertices in non-hypergraphs. The borders are blurry here, because these things are logically the same. Hypergraphs and attributed graphs are just two different interpretations of the same logical structure.

Arbitrary attributes

Arbitrary attributes are one of the main reasons, why we chose the GXL format as OpenGraphtheorys standard-format. With arbitrary attributes you can add many additional informations to graphs, vertices and edges.
To add an arbitrary attribute to a graph, vertex or edge, you just add additional sub-tags to the respective xml-tags. The additional tag must have the name attr and an xml-attribute name, that contains the name under which you want to find the attribute in your program. The actual type of the data and it's content is stored in a sub-sub-tag
<node id="v1">

  
<attr
name="color1"
>

    
<int>
23
</int>

  
</attr>

  
<attr
name="color2"
>

    
<int>
42
</int>

  
</attr>

</node>
According to the GXL standard, attributes can have the following types:
  • bool e.g. <bool>true</bool>
  • int e.g. <int>23</int>
  • float e.g. <float>13.37</float>
  • string e.g. e.g. <string>foobar</string>
also, you can add composite attributes, which contain more attributes in sub-tags
  • seq (sequence)
  • set
  • tup (tuple)
  • bag
but they all have the same structure in GXLs document type definition so they are logically the same. The only difference they can make is by using different container classes (std::set, std::vector, std::list) which might have an impact on running times - although the choice of datastructure should NOT be a concern of a filetype - especially not in case of an XML-file, because XML is all about the logic of the data - representation in memory is what it is supposed to abstract from!

Please note that the GXL standard demands that composite attributes contain only attributes of the same type, although Open-Graphtheorys class hierarchy allows you to have any combination of values here.

If your program can't handle a feature

Whether you are to lazy to write your program abstract enough or your algorithm is based on a theorem that only holds for certain graphs or maybe your program computes a property which isn't even defined for all graph-classes - that's fine. Just make sure your program rejects graphs that it cannot handle! To help you with this, open-graphtheory offers the following boolean methods:
  • bool Graph::EdgeIterator::IsHyperedge()
  • bool Graph::HasHyperedges()
  • bool Graph::ArcIterator::IsHyperarc()
  • bool Graph::HasHyperarcs()
  • bool Graph::Hypergraph()

  • bool Graph::Directed()
  • bool Graph::Undirected()
  • bool Graph::Mixed()

  • bool Graph::VertexIterator::HasParallelEdges()
  • bool Graph::HasParallelEdges()
  • bool Graph::VertexIterator::HasParallelArcs()
  • bool Graph::HasParallelArcs()
  • bool Graph::EdgeIterator::IsLoop()
  • bool Graph::HasLoops()
  • bool Graph::Simple()
  • bool Graph::Multigraph()
  • bool Graph::Multidigraph()
For example
if
( ! ( G.Undirected() && G.Simple())) { cerr <<
"illegal file"
;
return
1
; }
If you want to contribute your algorithm to the open-graphtheory project, then we will try to incorporate it in a library for absolutely general graphs, which selects your algorithm in cases where it can be applied and that uses worse algorithms whenever neccessary.

About other formats

One of the main goals of the open-graphtheory project is establishing a common open standard for a graph-filetype. There are several other graph-fileformats already, but we think they are not sufficient to be the common standard for various reasons.
None the less, as open source advocates we support freedom of choice and actively fight vendor lock-in's, so we have import and export filters for almost all of these filetypes.
Tom Sawyer Software
  • (The format is) not open. This is also why we don't even know about more possible shortcommings
VRMLgraph
  • Unmaintained since 2013, no progress since 2001
  • No documentation - you have to read the parsers' sourcecode to learn the syntax. Actually, the parser looks like a broken parser for the TGF format, but the format is definitely different from TGF because if a line contains only one identifier, StringTokenizer. nextToken throws an exception in line 71 of GraphFileInputParser.java. This exception is caught, but rethrown in line 136 and causes termination of all the DemoMain programs (see lines 35 or 36 respectively in the DemoMain programs)
  • Not uniquely decodable (if it comes through a stream and there is more data after the graph)
  • No hypergraphs
  • No directed or mixed graphs
  • No isolated vertices
  • No loops
  • No labels, weights and coordinates
TGF
  • The files contain no information about whether the graph is to be interpreted as directed or undirected graph
  • Not uniquely decodable (if it comes through a stream and there is more data after the graph)
  • No mixed graphs
  • No hypergraphs
  • No weights, X- and Y-coordinates
TEI
  • No mixed graphs
  • No hypergraphs
  • No weights, X- and Y-coordinates
GML
  • Defines its own language for description of attributed tree-structures (instead of going with a format that is already commonly used, like XML)
  • Very much fitted to the application of drawing graphs in the Tk framework (infact, the definition includes a full vector-graphic format) instead of general computations in any framework and any programming language
  • No hypergraphs
  • Distinction between edges and arcs through a graphics attribute (type arc / type line)
  • No weights
RGML
XGMML
  • Is the same as GML, except that XGMML uses XML and has weights, so GML's first and last drawbacks drop out
  • Bad documentation (a DTD is only a sufficient documentation IF you are a full XML parser!)
DOT
  • Extremely fitted to the application of graph-drawing (this should be no surprise - dot is designed for the GraphViz software collection)
  • No hypergraphs
  • No weights
  • Distinction between edges and arcs through a graphics attribute (dir="forward" / dir="none"). Yet the header contains an information about whether the graph is directed or undirected (but not mixed!)
Don't get this wrong - GraphViz is a great software collection and DOT is probably the best format for that one application, but all these graphics attributes are just overkill for a general-purpose standard
GraphML GraphML is a good format. Especially the ports are an interesting feature that we might steal some day
Contributors Disclaimer