RishabhBhatnagar edited this page Aug 23, 2020 · 1 revision

Welcome to the wiki of the gordf parser. This document serves as a report on the work done by Rishabh Bhatnagar during Google Summer of Code (2020). Check out gsoc-logs for a detailed, time-stamped log of tasks, along with explanations of the changes made to the original design and some key considerations during development.


Project Guided and Mentored By

  1. Gary O'Neall
  2. Rohit Lodha
  3. Stephen Winslow

Student Details

Name Rishabh Bhatnagar
Github /RishabhBhatnagar
Email /bhatnagarrishabh4
Linkedin /bhatnagar-rishabh

Overview of GoRDF

GoRDF is a Go library that provides an RDF/XML parser for working with data in the RDF format. Because RDF/XML is hierarchical and many of its tags are independent of the other structures in the document, those tags can be parsed without depending on any other tag. GoRDF makes extensive use of Go's concurrency features to speed up parsing without changing the semantics of a linear RDF parser. The library provides two major functionalities:

  1. RDFLoader: Generates triples from an RDF/XML file.
  2. RDFWriter: Writes triples to an RDF/XML file.
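The idea behind the concurrent parsing can be illustrated with a small, self-contained sketch. It is not GoRDF's implementation: the `block` type and the `parseBlock` function are hypothetical stand-ins. Because sibling blocks do not depend on each other, each child subtree is handed to its own goroutine, and the results are merged in child order so the output matches a sequential walk.

```go
package main

import (
	"fmt"
	"sync"
)

// block is a hypothetical, simplified stand-in for an XML element:
// a name plus nested child elements.
type block struct {
	name     string
	children []*block
}

// parseBlock walks one subtree and returns a flat list of
// "parent -> child" edges. Sibling subtrees are independent,
// so each child is processed in its own goroutine.
func parseBlock(b *block) []string {
	results := make([][]string, len(b.children))
	var wg sync.WaitGroup
	for i, child := range b.children {
		wg.Add(1)
		go func(i int, child *block) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no mutex is needed.
			edge := b.name + " -> " + child.name
			results[i] = append([]string{edge}, parseBlock(child)...)
		}(i, child)
	}
	wg.Wait()

	// Merge in child order so the output equals that of a sequential walk.
	var out []string
	for _, r := range results {
		out = append(out, r...)
	}
	return out
}

func main() {
	root := &block{name: "rdf:RDF", children: []*block{
		{name: "spdx:Document", children: []*block{{name: "spdx:name"}}},
		{name: "spdx:Package"},
	}}
	for _, edge := range parseBlock(root) {
		fmt.Println(edge)
	}
}
```

Writing into a pre-sized slice indexed by child position keeps the result deterministic without any locking; a real parser would collect triples instead of edge strings.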

RDFLoader

The module works in two phases:

  1. XML Reader: Parses the XML structure of the file and returns a rootBlock if the input XML file is a valid RDF document; otherwise, it reports an error.
  2. RDF Parser: Uses the rootBlock produced by the previous phase to generate RDF triples.

Why are two phases needed?

  1. An RDF file can be represented in many data formats, such as XML, JSON, N-Triples, YAML, etc.
  2. The two-phase design allows the code to switch between these formats without rewriting everything.
  3. To add support for a new representation of the RDF format, a programmer only has to parse the given file structure and store it in the block format; the code that generates RDF triples from the blocks is reused as-is.
  4. Easier testing and debugging.
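The two-phase split can be expressed with a small interface. In this hypothetical sketch (the names `Block`, `Reader`, `indentReader`, `countBlocks` and `load` are illustrative assumptions, not GoRDF's API), phase two depends only on the intermediate block tree, so supporting a new input format means writing another `Reader` implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a hypothetical intermediate tree shared by both phases.
type Block struct {
	Name     string
	Children []*Block
}

// Reader is phase one: anything that can turn some input format into a Block tree.
type Reader interface {
	Read() (*Block, error)
}

// indentReader is a toy phase-one reader for an indentation-based format:
// one tag name per line, two extra spaces meaning "child of the line above".
type indentReader struct{ src string }

func (r indentReader) Read() (*Block, error) {
	root := &Block{Name: "root"}
	stack := []*Block{root}
	for _, line := range strings.Split(strings.TrimSpace(r.src), "\n") {
		depth := (len(line) - len(strings.TrimLeft(line, " "))) / 2
		if depth+1 > len(stack) {
			return nil, fmt.Errorf("bad indentation: %q", line)
		}
		stack = stack[:depth+1]
		b := &Block{Name: strings.TrimSpace(line)}
		stack[depth].Children = append(stack[depth].Children, b)
		stack = append(stack, b)
	}
	return root, nil
}

// countBlocks stands in for phase two: it sees only the Block tree,
// never the original input format.
func countBlocks(b *Block) int {
	n := 1
	for _, c := range b.Children {
		n += countBlocks(c)
	}
	return n
}

// load wires the phases together; a new format only needs a new Reader.
func load(r Reader) (int, error) {
	root, err := r.Read()
	if err != nil {
		return 0, err
	}
	return countBlocks(root) - 1, nil // do not count the synthetic root
}

func main() {
	n, err := load(indentReader{src: "a\n  b\n  c\nd"})
	fmt.Println(n, err) // 4 <nil>
}
```

Here `countBlocks` is only a placeholder for triple generation; the point is that `load` never needs to change when another `Reader` is added.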

Code Entry Point

github.com/spdx/gordf/rdfloader

RDFLoader Phase 1: XML Reader

Description

This is the first phase of the RDF loader provided by the GoRDF module. It provides an interface for reading an XML file and returning either a rootBlock or the error encountered while parsing the XML structure. The XML Reader is a dependency of the RDF Loader. Because of GoRDF's two-phase structure, programmers can easily replace the XML Reader with a reader for any other representation of the RDF format.

Code Entry Point

github.com/spdx/gordf/rdfloader/xmlreader


Construction And Invocation

Invocation Method: XMLReader.Read()

An XMLReader can be obtained from:

  1. XMLReaderFromFileObject
  2. XMLReaderFromFilePath

Returns

The Read function returns a RootBlock (or an error if the XML structure cannot be parsed):

Grammar And Definitions of Return Arguments:

RootBlock -> Block
Block -> OpeningTag Children BlockValue | BLOCK
OpeningTag -> Tag
Tag -> SchemaName Name Attributes | TAG
Attributes -> SingleAttribute Attributes | ϵ
SingleAttribute -> Name SchemaName AttributeValue | ATTRIBUTE
Name -> STRING
SchemaName -> STRING
AttributeValue -> URI_STRING
Children -> Blocks
Blocks -> Block Blocks | ϵ
BlockValue -> STRING

Tree Structure of Return Arguments

RootBlock  
└───Block  
    ├───BlockValue  
    │   └───STRING  
    ├───Children  
    │   └───Blocks  
    │       ├───Block  
    │       └───ϵ  
    └───OpeningTag  
        ├───Attributes  
        │   ├───Attributes  
        │   ├───SingleAttribute  
        │   │   ├───AttributeValue  
        │   │   │   └───URI_STRING  
        │   │   ├───Name  
        │   │   │   └───STRING  
        │   │   └───SchemaName  
        │   │       └───STRING  
        │   └───ϵ  
        ├───Name  
        │   └───STRING  
        └───SchemaName  
            └───STRING  

Examples:

Example For XML Reader: 1-xmlreader


Associated Significant Commits:

  1. 3e71687: Initial version of XMLReader
  2. d8bd2d8: Example For XML-Reader
  3. c428e8a: New Package for dealing with URIs
  4. d3bb55e: Splitted xmlReader into Utils and Reader for modularity
  5. aee9bfa: Wrote Tests for XML-Reader utils

RDFLoader Phase 2: RDF Parser

Description

This is the second phase of the RDF loader provided by the GoRDF module. It provides an interface for generating triples from the rootBlock, reporting an error if one is encountered. The RDF Parser is a dependency of the RDF Loader. It works independently of the previous phase, so users can replace phase one with any reader that produces the data structures this phase needs for parsing.


Code Entry Point

github.com/spdx/gordf/rdfloader/rdfparser


Construction And Invocation

Invocation Method: Parse(RootBlock)


Returns

The Parse function returns an error, if any, and populates the parser with SchemaDefinition and Triples.

Grammar And Definitions of Return Arguments:

SchemaDefinition:
A map with string keys and string values. The key is the namespace abbreviation and the value is the corresponding absolute URI.
For example: key="rdf", value="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
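As an illustration of how such a map is used, the hypothetical helper `expand` below (not part of GoRDF) resolves an abbreviated name such as `rdf:type` into its absolute URI by looking up the prefix in a SchemaDefinition-style map and appending the local part.

```go
package main

import (
	"fmt"
	"strings"
)

// expand resolves an abbreviated name such as "rdf:type" against a
// SchemaDefinition-style map (abbreviation -> absolute URI).
func expand(schemaDefinition map[string]string, name string) (string, error) {
	parts := strings.SplitN(name, ":", 2)
	if len(parts) != 2 {
		return "", fmt.Errorf("%q has no prefix", name)
	}
	base, ok := schemaDefinition[parts[0]]
	if !ok {
		return "", fmt.Errorf("unknown prefix %q", parts[0])
	}
	return base + parts[1], nil
}

func main() {
	schemaDefinition := map[string]string{
		"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
	}
	uri, err := expand(schemaDefinition, "rdf:type")
	fmt.Println(uri, err) // http://www.w3.org/1999/02/22-rdf-syntax-ns#type <nil>
}
```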

Grammar of Triples

Triples -> Triple Triples | ϵ
Triple -> Subject Predicate Object | TRIPLE
Subject -> Node
Predicate -> Node
Object -> Node
Node -> NodeType ID | NODE
NodeType -> LITERAL | RESOURCELITERAL | NODEIDLITERAL | BLANK | IRI
ID -> STRING

Tree Structure of Return Arguments

Triples
├───Triple
│   ├───Object
│   │   └───Node
│   │       ├───ID
│   │       └───NodeType
│   ├───Predicate
│   │   └───Node
│   │       ├───ID
│   │       └───NodeType
│   └───Subject
│       └───Node
│           ├───ID
│           └───NodeType
└───ϵ
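The triple grammar translates into two small Go types. The sketch below is an assumed model (the constant, type and field names are illustrative and may differ from GoRDF's `rdfparser` package) with one constant per NodeType and `String` methods that print a triple in an N-Triples-like form.

```go
package main

import "fmt"

// NodeType enumerates the node kinds listed in the grammar.
type NodeType int

const (
	LITERAL NodeType = iota
	RESOURCELITERAL
	NODEIDLITERAL
	BLANK
	IRI
)

// Node -> NodeType ID
type Node struct {
	NodeType NodeType
	ID       string
}

// Triple -> Subject Predicate Object
type Triple struct {
	Subject, Predicate, Object *Node
}

// String renders a node roughly as N-Triples would: <iri> or _:blank;
// all other kinds are printed as Go-quoted strings for simplicity.
func (n *Node) String() string {
	switch n.NodeType {
	case IRI:
		return "<" + n.ID + ">"
	case BLANK:
		return "_:" + n.ID
	default:
		return fmt.Sprintf("%q", n.ID)
	}
}

func (t Triple) String() string {
	return fmt.Sprintf("%s %s %s .", t.Subject, t.Predicate, t.Object)
}

func main() {
	triples := []Triple{{
		Subject:   &Node{NodeType: IRI, ID: "http://example.org/doc"},
		Predicate: &Node{NodeType: IRI, ID: "http://spdx.org/rdf/terms#name"},
		Object:    &Node{NodeType: LITERAL, ID: "example"},
	}}
	for _, t := range triples {
		fmt.Println(t)
	}
}
```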

Examples:

Example For RDF Parser: 2-rdfparser


Associated Significant Commits:

  1. b70edda: Add RDF Parser
  2. dd3e455: Example For RDF Parser
  3. 3169b32: Tests For RDF Parser
  4. 348ba4c, 68d0d13: Support Concurrency
  5. 3998529: Tests And Bug Fixes
  6. 2595dfa: Allow Parser Triples to be a slice of Triples instead of a Map
  7. f7b9334: Add Support For CDATA Tags
  8. bfa2733: Closure For Terminal Tags
  9. 1a36ab0: Use String Instead Of Pointer As Keys For A Map
  10. 752599d: NodeToTriples Now returns a dynamic slice of unique triples
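Commits 1a36ab0 and 752599d concern indexing triples by node. The sketch below uses hypothetical simplified types and a hypothetical `nodeToTriples` function (not GoRDF's code) to show why a string key is sturdier than a pointer key: two distinct `*Node` values describing the same node would land in different buckets of a pointer-keyed map, whereas a key built from the node's type and ID groups them together. It also drops triples that are equal by value.

```go
package main

import "fmt"

// Node and Triple are simplified, hypothetical stand-ins for the parser's types.
type Node struct {
	NodeType string // e.g. "IRI", "BLANK", "LITERAL"
	ID       string
}

type Triple struct {
	Subject, Predicate, Object *Node
}

// key identifies a node by its contents rather than by its pointer address.
func key(n *Node) string {
	return n.NodeType + ":" + n.ID
}

// tripleKey identifies a whole triple by value, for de-duplication.
type tripleKey struct{ s, p, o string }

// nodeToTriples groups triples by subject using string keys and keeps
// only one copy of triples that are equal by value.
func nodeToTriples(triples []*Triple) map[string][]*Triple {
	index := map[string][]*Triple{}
	seen := map[tripleKey]bool{}
	for _, t := range triples {
		k := tripleKey{key(t.Subject), key(t.Predicate), key(t.Object)}
		if seen[k] {
			continue
		}
		seen[k] = true
		index[k.s] = append(index[k.s], t)
	}
	return index
}

func main() {
	// Two distinct pointers that describe the same subject node.
	a := &Node{"IRI", "http://example.org/doc"}
	b := &Node{"IRI", "http://example.org/doc"}
	name := &Node{"IRI", "http://spdx.org/rdf/terms#name"}
	lit := &Node{"LITERAL", "example"}

	index := nodeToTriples([]*Triple{{a, name, lit}, {b, name, lit}})
	// With *Node keys there would be two buckets; with string keys there is
	// one, and the duplicate triple is dropped.
	fmt.Println(len(index), len(index[key(a)])) // 1 1
}
```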

RDFWriter

Description

The RDF Writer is the serialisation counterpart of the RDF Loader provided by the GoRDF module. It provides an interface for writing a set of triples, together with their schema definitions, to an io.Writer in RDF/XML format, returning an error if one is encountered during serialisation.


Code Entry Point

github.com/spdx/gordf/rdfwriter


Construction And Invocation

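Invocation Method: WriteToFile(writer, triples, schemaDefinition)
writer: io.Writer object to which the content will be written. triples: list of unordered triples with the same structure as the Triples returned by the RDF Parser. schemaDefinition: the namespace abbreviations and their absolute URIs, as described under SchemaDefinition above.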


Returns

An error, if one is encountered while serialising the triples into RDF/XML format.


Examples:

Example For RDF Writer: 4-rdfwriter


Associated Significant Commits:

  1. 503dea6: Biggest update to the RDF Writer. Adds topological sorting, utils for the RDF Writer, and tests for the same. Almost all of the RDF Writer was added in this commit.
  2. f9e43c5: Add Support For Default NameSpaces
  3. 1a36ab0: The nodeToTriples map now uses strings as keys instead of pointers, providing better and sturdier querying.
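Commit 503dea6 mentions topological sorting. One plausible use in a writer (an assumption here; the page does not say exactly what is sorted) is ordering subject nodes so that each node is emitted before the nodes it refers to. The sketch below applies Kahn's algorithm to a hypothetical map from node IDs to the node IDs their triples reference, breaking ties alphabetically for deterministic output and reporting cycles as errors.

```go
package main

import (
	"fmt"
	"sort"
)

// topoSort orders nodes so that every node appears before the nodes it
// references (Kahn's algorithm). refs maps a node ID to the IDs its triples
// point to. Ties are broken alphabetically to keep the output deterministic.
func topoSort(refs map[string][]string) ([]string, error) {
	inDegree := map[string]int{}
	for n, targets := range refs {
		if _, ok := inDegree[n]; !ok {
			inDegree[n] = 0
		}
		for _, t := range targets {
			inDegree[t]++
		}
	}
	var ready []string
	for n, d := range inDegree {
		if d == 0 {
			ready = append(ready, n)
		}
	}
	var order []string
	for len(ready) > 0 {
		sort.Strings(ready)
		n := ready[0]
		ready = ready[1:]
		order = append(order, n)
		for _, t := range refs[n] {
			inDegree[t]--
			if inDegree[t] == 0 {
				ready = append(ready, t)
			}
		}
	}
	if len(order) != len(inDegree) {
		return nil, fmt.Errorf("cycle detected among %d nodes", len(inDegree)-len(order))
	}
	return order, nil
}

func main() {
	refs := map[string][]string{
		"document": {"package"},
		"package":  {"file", "license"},
		"file":     {"license"},
	}
	order, err := topoSort(refs)
	fmt.Println(order, err) // [document package file license] <nil>
}
```

A real RDF graph may contain cycles, so a writer would need some fallback (for example, emitting a reference by URI instead of nesting); here a cycle is simply reported as an error.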

UseCase and Applications of GoRDF

  1. tools-golang: for more info, read "Application Of GoRDF (tools-golang)"

What is tools-golang?

tools-golang is a collection of Go packages intended to make it easier for Go programs to work with SPDX® files.
As of 27th August 2020, the repository provides the following functionalities:

  1. Tag Value Loader
  2. Tag Value Saver
  3. SPDX Document builder
  4. Compare Licenses

For more examples and use cases, refer to examples.

Description Of The UseCase

The main branch doesn't provide functionality for loading RDF files or for writing SPDX files in RDF format. A new branch called gordf attempts to add this support to the library.
Currently, the gordf branch doesn't support writing an SPDX document in RDF format, but it does allow users to load their RDF files into an SPDX Document or validate them.


Packages Provided By The gordf Branch

  1. RDFLoader

Code Entry Point

github.com/spdx/tools-golang/gordf/rdfloader


Construction And Invocation

Invocation Method: Load2_2(reader)
reader: an io.Reader object from which the RDF file content will be read.

Returns

Load2_2 returns an SPDX document and an error, if any.

Associated Significant Commits

  1. f609198: Biggest update for this branch. It integrates gordf to generate triples and includes code to parse the triples and populate the SPDX document. It doesn't fully support licenses.
  2. d8eb2d8: Added support for all types of SPDX license.