1. Parsing Support for Operations Software Messages
    1. Interface
      1. Reply Headers
      2. Ordered Keyword Dictionaries
      3. Keyword Objects
      4. String Representations
    2. Implementation

Parsing Support for Operations Software Messages

This page describes the interface and implementation of parsers for the standard operations software messages. The underlying protocols being parsed are described here.

Interface

The command and reply string parsers return Command and Reply objects (defined in opscore.protocols.messages) that each contain an ordered dictionary of Keyword objects. For example, a list of reply keywords can be processed in order using:

from opscore.protocols.parser import ReplyParser,ParseError

rParser = ReplyParser()
try:
    reply = rParser.parse("tui.operator 911 BossICC : type=decaf;blend = 20:80, Kenyan,Bolivian ; now")
    for keyword in reply.keywords:
        print ' ',keyword.name,keyword.values
except ParseError:
    ...

A Command provides two additional attributes, name and values, which hold the command verb and the values attached to it. A command string can be parsed in a similar way:

from opscore.protocols.parser import CommandParser,ParseError

cParser = CommandParser()
try:
    cmd = cParser.parse("drink 'coffee' type=decaf blend = 20:80, Kenyan,Bolivian now")
    print cmd.name,cmd.values
    for keyword in cmd.keywords:
        print ' ',keyword.name,keyword.values
except ParseError:
    ...

Both the Command and Reply classes include a string attribute that stores the original input text before parsing.

The message classes (Command, Reply, Keywords, Keyword, Values) are designed to be instantiated by the parser and so only perform minimal protocol validation and do not test assertions that are already guaranteed by the grammar. A message validation framework, built on top of the parser, is documented here.

Reply Headers

A parsed reply has a header attribute that stores the fields described here. To access the four individual words use:

hdr = reply.header
print hdr.cmdrName,hdr.commandId,hdr.actor,hdr.code

The subfields of the commander name are also available and will be empty strings in case an optional subfield is not present in the reply message:

print hdr.programName,hdr.userName,hdr.actorStack
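
As a rough illustration of the four header words, the sketch below splits the sample reply on whitespace. It is not the opscore parser: the Header tuple and the split_reply function are hypothetical names, and it assumes that the four words are always separated by whitespace and that the command id is an integer.

```python
from collections import namedtuple

# Hypothetical stand-in for the parsed header; not the opscore class.
Header = namedtuple("Header", "cmdrName commandId actor code")

def split_reply(text):
    """Split a reply into its four header words and the remaining keyword text."""
    words = text.split(None, 4)
    if len(words) < 4:
        raise ValueError("reply header needs four words: %r" % (text,))
    cmdrName, commandId, actor, code = words[:4]
    # Assumption: the command id is always an integer.
    rest = words[4] if len(words) > 4 else ""
    return Header(cmdrName, int(commandId), actor, code), rest

hdr, keyword_text = split_reply(
    "tui.operator 911 BossICC : type=decaf;blend = 20:80, Kenyan,Bolivian ; now")
```

The real grammar also handles the commander name subfields and the keyword list, which this sketch leaves as unparsed text.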

The code attribute is an instance of an enumerated type defined as follows:

MsgCode = types.Enum('>','D','I','W',':','F','!',
    labelHelp=['Queued','Debug','Information','Warning','Finished','Error','Fatal'],
    name='code',help='Reply header status code')

Header codes can be directly compared against the enumeration labels:

if hdr.code == 'F':
   print 'Error'

You can also check if the code is contained within a set of codes using, for example:

if hdr.code in 'F!:':
   print 'Done'
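
Both comparisons work naturally if the code behaves like a one-character string. The sketch below shows one way to get that behaviour with a str subclass; the Code class is a hypothetical illustration and is not how types.Enum is implemented in opscore.

```python
class Code(str):
    """A status code that compares like its one-character label (hypothetical)."""

    # Labels and help text copied from the MsgCode definition above.
    LABELS = {'>': 'Queued', 'D': 'Debug', 'I': 'Information', 'W': 'Warning',
              ':': 'Finished', 'F': 'Error', '!': 'Fatal'}

    def __new__(cls, label):
        if label not in cls.LABELS:
            raise ValueError("invalid reply code: %r" % (label,))
        return str.__new__(cls, label)

    @property
    def help(self):
        return self.LABELS[self]

code = Code('F')
is_error = (code == 'F')   # plain string equality
is_done = code in 'F!:'    # substring test against the set of codes
```

Because the set of codes is written as a string, the membership test is a substring search, which is safe here only because every code is a single character.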

Ordered Keyword Dictionaries

The examples above iterate through cmd.keywords and reply.keywords as ordered lists of Keyword objects:

for keyword in parsed.keywords:
   ...

Keywords can also be directly indexed by position starting from zero:

first = parsed.keywords[0]
second = parsed.keywords[1]
...
last = parsed.keywords[-1]

Valid indices are less than len(parsed.keywords).

In cases where keyword order is not significant, keywords can also be indexed by keyword name (this does not introduce any ambiguities since integers are not valid keyword names):

mode = parsed.keywords['mode']

The presence of a keyword name can be tested with the usual in operator:

if 'mode' in parsed.keywords:
   ...

Note that keyword name matching is always case insensitive, but keyword.name preserves the actual case used.

Finally, keywords can be sliced to support mixed positional and unordered processing logic:

required = parsed.keywords[0]
if 'optional' in parsed.keywords[1:]:
  ...
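
The access patterns in this section can be combined in a single container. The following is a minimal sketch, assuming a list subclass is an acceptable model; the Keyword and Keywords classes here are hypothetical stand-ins and not the classes from opscore.protocols.messages.

```python
class Keyword(object):
    """Hypothetical stand-in holding a keyword name and its values."""
    def __init__(self, name, values=None):
        self.name = name
        self.values = list(values or [])

class Keywords(list):
    """Ordered keywords with positional, sliced and by-name access."""

    def _find(self, name):
        # Name matching ignores case; the first match wins.
        for keyword in self:
            if keyword.name.lower() == name.lower():
                return keyword
        return None

    def __getitem__(self, index):
        if isinstance(index, str):
            keyword = self._find(index)
            if keyword is None:
                raise KeyError(index)
            return keyword
        result = list.__getitem__(self, index)
        # Re-wrap slices so that name lookups keep working on them.
        return Keywords(result) if isinstance(index, slice) else result

    def __contains__(self, item):
        if isinstance(item, str):
            return self._find(item) is not None
        return list.__contains__(self, item)

parsed = Keywords([Keyword('type', ['decaf']),
                   Keyword('Blend', ['20:80', 'Kenyan', 'Bolivian']),
                   Keyword('now')])
blend = parsed['blend']              # matches 'Blend' regardless of case
has_now = 'NOW' in parsed[1:]        # name test on a slice
```

Integer and string indices never collide because an integer is not a valid keyword name, which is the property the text above relies on.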

Keyword Objects

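A keyword object has two fields: name, the keyword name with the case used in the message, and values, the list of values attached to the keyword (empty for a bare keyword).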

After parsing, values are stored as opaque strings. Use the validation framework to check for the expected number of values and their data types and, if successful, replace the string values with their typed equivalents.
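
To make the idea of replacing string values with typed equivalents concrete, here is a small sketch. It does not use the validation framework: convert_values is a hypothetical helper, and the list of converter functions is an assumption chosen for illustration.

```python
def convert_values(values, converters):
    """Return typed values, or raise ValueError if the count or a type is wrong."""
    if len(values) != len(converters):
        raise ValueError("expected %d values, got %d"
                         % (len(converters), len(values)))
    # int() and float() raise ValueError themselves on malformed input.
    return [convert(value) for convert, value in zip(converters, values)]

typed = convert_values(['20', '0.5', 'Kenyan'], [int, float, str])
```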

String Representations

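Message classes provide three string representations: a diagnostic representation that shows the parsed structure, a canonical form returned by canonical(), and a tokenized form returned by tokenized().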

For the sample command parsed above, the corresponding string representations are:

CMD('drink'=['coffee'];[KEY(type)=['decaf'], KEY(blend)=['20:80', 'Kenyan', 'Bolivian'], KEY(now)=[]])
drink "coffee" type="decaf" blend=20:80,Kenyan,Bolivian now
VERB 123 KEY=123 KEY=123,123,123 KEY

The last two representations are themselves valid commands that can be re-parsed. A correct parser implementation will satisfy the following round-trip assertions:

assert(result.canonical() == parser.parse(result.canonical()).canonical())
assert(result.tokenized() == parser.parse(result.canonical()).tokenized())
assert(result.tokenized() == parser.parse(result.tokenized()).tokenized())
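
Judging from the sample output above, the tokenized form replaces the verb with VERB, each keyword name with KEY and each value with 123. The sketch below reproduces the sample under that assumption. The Keyword class and the tokenize_command function are hypothetical stand-ins, not the opscore implementation, and joining several command values with commas is a guess.

```python
class Keyword(object):
    """Hypothetical stand-in holding a keyword name and its values."""
    def __init__(self, name, values=()):
        self.name = name
        self.values = list(values)

    def tokenized(self):
        # Replace the name with KEY and every value with 123.
        if not self.values:
            return 'KEY'
        return 'KEY=' + ','.join('123' for _ in self.values)

def tokenize_command(values, keywords):
    """Build the tokenized form of a command from its values and keywords."""
    parts = ['VERB']
    if values:
        # Assumption: command values are comma-separated like keyword values.
        parts.append(','.join('123' for _ in values))
    parts.extend(keyword.tokenized() for keyword in keywords)
    return ' '.join(parts)

tokens = tokenize_command(
    ['coffee'],
    [Keyword('type', ['decaf']),
     Keyword('blend', ['20:80', 'Kenyan', 'Bolivian']),
     Keyword('now')])
```

Since every name and value collapses to a fixed placeholder, two messages have the same tokenized form exactly when they have the same structure, which is why this form is useful in the round-trip checks.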

Implementation

The formal message grammar has been implemented with two different python parsing libraries, PLY and pyparsing.

Both are lightweight and pure python and so would be reasonable external dependencies for the operations software. As of 21 Oct 2008, we have selected the PLY version as the recommended parser for SDSS-3 operations software and made it available in the module opscore.protocols.parser together with a bundled version of the PLY package in ops.lib.ply.

PLY remains very close to its LEX and YACC origins, including the optimized performance and syntax idiosyncrasies. One pythonic wrinkle it adds is that the parser rules are declared implicitly, in the docstrings of specially named methods, rather than through explicit statements. As an example, the following PLY code specifies how a single command keyword should be parsed:

def p_keyword_with_values(self,p):
    "keyword : NAME_OR_VALUE values"
    p[0] = Keyword(p[1],p[2])

def p_bare_keyword(self,p):
    "keyword : NAME_OR_VALUE"
    p[0] = Keyword(p[1])

def p_raw_keyword(self,p):
    "keyword : RAW LINE"
    p[0] = Keyword(p[1],[p[2]])

The PLY command parser is about 200 lines long.
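
The implicit declaration style can be illustrated without PLY itself. The sketch below collects grammar rules from the docstrings of methods whose names start with p_, which is the convention the code above follows. ToyGrammar and collect_rules are hypothetical names, and the sketch simplifies what PLY does: it neither builds parse tables nor preserves the order in which rules were defined.

```python
class ToyGrammar(object):
    """Rule methods with empty bodies; only the docstrings matter here."""

    def p_keyword_with_values(self, p):
        "keyword : NAME_OR_VALUE values"

    def p_bare_keyword(self, p):
        "keyword : NAME_OR_VALUE"

    def p_raw_keyword(self, p):
        "keyword : RAW LINE"

def collect_rules(grammar):
    """Map each p_* method name to the (target, symbols) rule in its docstring."""
    rules = {}
    for name in sorted(dir(grammar)):
        if not name.startswith('p_'):
            continue
        target, _, body = getattr(grammar, name).__doc__.partition(':')
        rules[name] = (target.strip(), body.split())
    return rules

rules = collect_rules(ToyGrammar())
```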

Pyparsing is an object oriented framework where parser rules are declared explicitly via python expressions. On the whole it is considerably more expressive than PLY but does not offer the same level of run-time performance. A "packrat" caching optimization is included but does not improve the performance with our grammar. Here is the pyparsing specification of the same command keyword rules:

keyword = (RAW + Group(LINE)) | (NAME_OR_VALUE + Group(Optional(values)))
keyword.setParseAction(lambda s,l,token: Keyword(token[0],token[1].asList()) )

The pyparsing command parser is about 100 lines long.

Both implementations have been validated against a unit test suite with reasonable (but not exhaustive) coverage and including the round-trip assertions mentioned above.

The table below summarizes the performance of the two implementations on a timing test suite of about 400 replies and 1000 commands of various lengths and grammars. Raw keywords are not included in the command test suite since they are intended primarily as a diagnostic tool (in any case, they are generally faster to parse and do not change the overall timing ratios). The test suite repeats shorter messages more often, mirroring the expected usage pattern, to calculate a net throughput normalized for a 100-character message:

Parser Implementation              Replies   Commands
PLY 2.5.0                          2.2 kHz   2.2 kHz
pyparsing 1.5.0                    460 Hz    230 Hz
pyparsing 1.5.0, packrat enabled   270 Hz    170 Hz

These numbers are obviously dependent on the testing platform (an Intel MacBook Pro in this case), but clearly indicate that PLY is about 5 times faster than pyparsing for replies and about 10 times faster for commands, and that it is fast enough for the intended application. The pyparsing throughput is low enough in these tests to justify selecting PLY as the standard parser for the operations software, despite pyparsing's more expressive syntax. The equal PLY reply and command parsing speeds are expected since table-driven parsing performance should be roughly independent of the grammar complexity. The significantly worse pyparsing performance with packrat caching enabled is unexpected.
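
The throughput figures can be read as messages per second once every 100 characters of input are counted as one message. The sketch below shows one way such a number might be computed. It is an interpretation of the normalization described above and not the original timing suite: normalized_throughput and time_parser are hypothetical names, and str.split is only a stand-in for a real parser's parse method.

```python
from timeit import default_timer

def normalized_throughput(total_chars, elapsed_seconds, norm_length=100):
    """Messages per second, counting every norm_length characters as one message."""
    return (total_chars / float(norm_length)) / elapsed_seconds

def time_parser(parse, messages, repeats=10):
    """Parse all messages repeatedly and return the normalized throughput in Hz."""
    start = default_timer()
    for _ in range(repeats):
        for message in messages:
            parse(message)
    elapsed = default_timer() - start
    total_chars = repeats * sum(len(message) for message in messages)
    return normalized_throughput(total_chars, elapsed)

# str.split stands in for the parse method of a real parser.
rate = time_parser(str.split, ["drink 'coffee' type=decaf now"] * 100)
```

With the table values, the quoted speed ratios are 2200 / 460, or about 4.8, for replies and 2200 / 230, or about 9.6, for commands.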