Skip to content
Filipe Funenga edited this page Jun 20, 2013 · 27 revisions

Table of Contents

Normalize Vocabulary

Normalize vocabulary to correspond the upcoming docopt.py v0.7.0.

Element should be Pattern:

Pattern
    ChildPattern  # a better name??: TerminalPattern or LeafPattern
        Option
        Argument
        Command 
    ParentPattern  # a better name??: CollectionPattern
        Required
        Optional
        AnyOptions
        OneOrMore
        Either

New C-Struct Ideas

How Pattern C-struct could look like:

typedef {
    // probably more conventional to have enum
    // all UPPER CASE
    enum {
        OPTION,
        ARGUMENT,
        COMMAND,

        REQUIRED,
        OPTIONAL,
        ONEORMORE,
        EITHER,

        NONE,
    } type;
    union {
        Option option;
        Argument argument;
        Command command;

        // maybe having a single container for
        // required/optional/oneoremore/either,
        // since they all hold the same kind of data.
        Container container;
    } payload;
} Pattern;

typedef struct {
    // in Python these are `short` and `long`,
    // but these are reserved in C
    const char *oshort;  // maybe *flag?
    const char *olong;   // maybe *full?
    bool argcount;       // number of arguments 0 or 1
    bool value;          // value if argcount is 0
    char *argument;      // value if argcount is 1
    // maybe instead of having `value` and `argument`
    // we could have just `char *value` and treat it
    // as NULL/non-NULL for true/false (if argcount == 0)
    // and as value/no-value when argcount == 1?
    // I wonder, would that be portable:
    // char *value = true;
    //
    // Oh, it is actually more complicated than that.
    // In Python, Option's value could be either:
    //  * True/False, if it's a single option without argument
    //  * a number, if it's an option (w/o argument) that could be repeated
    //  * a string, if it's an option with argument non-repeated
    //  * array of strings, if it's an option with argument, repeated
    //  Well:
    //          repeatable |       no          yes
    //          -----------+----------------------
    //            argcount |
    //                   0 |      bool        int
    //                   1 |      char*       char**
    //
    // So we need to handle all these cases.
    //
    // a) Maybe go full type-unsafe and declare it as char**
    // and use it as bool/char*/int by casting? :-)
    //
    // b) Another variant would be to have a union of these types.
    //
    // c) Yet another would be to store all these in the same struct.
} Option;

typedef struct {
     char *name;
     bool repeating;  // Maybe int count; instead?
                      // (to explicitly state the length of **array).
     char *value;     // Maybe get rid of this in favor of
                      // just having 1 item in **array.
     char **array;    // I can think of 2 ways how to allocate this array.
                      // 1. It could be statically allocated
                      //    to fit, say, max 32 elements, and then each
                      //    pointer in it will point to an item in argv.
                      // 2. Make it point to (a part of) argv directly
                      //    and then use `count` to see how long it is.
} Argument;

typedef struct {
     char *name;
     // Command's value could be either True/False in Python
     // or the number of times the command was mentioned.
     // We could use the fact that 0 is falsy in C, and
     // declare it simply as int:
     int value; // Maybe int count; instead?
} Command;

// In Python version required/optional/etc hold an array
// called `children` which is an array of all child nodes.
// Since we don't want to allocate these arrays dynamically
// or overallocate them statically (by allocating, say
// 32 items "just in case"), I was thinking of 2 variants:
//
// 1. Use linked list like the following:
typedef struct {
    Pattern *pattern;
    Container *next;
} Container;
// and transform patterns in Python into a linked list form:
//
// Required(a, b, c) into Required(a, Required(b, Required(c, NULL)))
//
// 2. Another way could be to *generate* code for each struct. I.e. if the
// pattern is `usage: prog [<this> <that>] (-a | -b | -c)` or in Python terms:
// Required(Optional(<this>, <that>), Either(-a, -b, -c))
//
// Then just generate precisely these containers:
//
typedef struct {
    Pattern patterns[2];
} ContainerRequired1;

typedef struct {
    Pattern patterns[2];
} ContainerOptional1;

typedef struct {
    Pattern patterns[3];
} ContainerEither1;

Ideas to Parse ARGV

<Add new ideas here!>

Workflow of docopt Function

  1. Runs function parse_args to populate Element *options array
  2. Overrides default values in struct DocoptArgs with the values parsed from argv
  3. Verifies that the Pattern is respected.

Note: It might possible to integrate step 3 into step 2.

Similar-To-Pythonic-Approach [STPA]

  • Advantages
    • all generated C code is readable because only pattern tree is generated, not any real code, (maybe just a little);
    • implementation similar to the python one, which makes it only a matter of porting code to C
  • Disadvantages (- this needs to port considerable portion of Python code)
    • Complexity: O(Size_of_the_pattern_tree)
    • complex patterns might use a lot of call stack.

VK: First thing that comes into mind for me is to mimic Python implementation. Python parses pattern into a tree of nodes like:

'''usage: program --hai
          program --bye <arg>'''

=> parse_pattern =>

Either(Option('--hai'), Required(Option('--bye'), Argument('<arg>')))

Then Python parses argv:

['--bye', 'arg'] => parse_argv => [Option('--bye'), Argument('arg')]

Then matches one against another. We could generate the pattern using Python. Write parse_argv function similar to Python one and write a bunch of match functions similar to Python match methods for each kind of pattern (Option, Argument, Either, etc.).

Non-Recursive Similar-To-Pythonic-Approach [NR-STPA]

  • Advantages
    • Complexity: O(Number_of_arguments_provided_in_CLI)
    • Simple memory consumption (stack usage is not dependent on the pattern tree size). Definite number of stack levels.
    • all generated C code is readable because only sub-patterns are generated, not any real code
  • Disadvantages
    • Non-recursive means more code, which might be a bad thing in a single file

VK: I don't see advantages in being non-recursive. How does it make memory consumption simpler? We are not going to run out of stack anyway.

This approach starts by dividing the pattern tree into multiple sub_patterns. Example:

Usage:
    prog ship new [--coordinator] <name>
    prog ship move <x> <y>
    prog fire <gun> <x> <y> [--burst]

This pattern has three sub-patterns.

When we are iterating over the argv elements, a sub_pattern can be in two states: 'unsatisfiable' and 'not unsatisfiable'.

When a sub_pattern is said to be unsatisfiable, it means the algorithm has come to the conclusion that no matter what the next elements of argv might be, the sub_pattern will not be satisfied. Caution: if there are still argv elements to be parsed, a sub_pattern might not be 'satisfiable'. Hence, the 'not unsatisfiable' notation.

The main workflow is to assume all sub-patterns are NOT UNsatisfiable in the beginning and, after parsing each element of argv, all sub-patterns are tested if they have become unsatisfiable.

Example with the previous usage pattern:

user inserted argv: $ ./prog ship new SuperBoat
Initial sub_patterns list:
    sub_pattern[0].IsNotSatisfiable() == False
    sub_pattern[1].IsNotSatisfiable() == False
    sub_pattern[2].IsNotSatisfiable() == False
Iteration 1:
    parsed from argv: Command('ship')
    sub_pattern[0].IsNotSatisfiable() == False
    sub_pattern[1].IsNotSatisfiable() == False
    sub_pattern[2].IsNotSatisfiable() == True
Iteration 2:
    parsed from argv: Command('new')
    sub_pattern[0].IsNotSatisfiable() == False
    sub_pattern[1].IsNotSatisfiable() == True
    sub_pattern[2].IsNotSatisfiable() == True
# (...)

In C, we can have an array of sub_patterns that keep their state along iterations.

With this approach its easy to define a flowchart/state_machine:

  • If all sub_patterns become unsatisfiable before all argv elements are parsed, then present usage_message.
  • If all argv elements were parsed and there is not an unique SATISFIABLE(*) sub_pattern, then present usage_message.

(*) After all argv elements are parsed, the algorithm can conclude which sub_patterns are satisfiable. At the end of argv, the set of ["unsatisfiable", "not unsatisfiable"] sub_patterns can be converted to a set of ["unsatisfiable", "satisfiable"] sub_patterns.

Pseudocode Workflow

  1. Iterate over argv
    1. Parse argv[i] to (type, value [, name])
    2. Iterate over all not unsatisfiable sub_patterns
      1. Change the state to unsatisfiable if test fails
    3. If all patterns are unsatisfiable, present usage_message and exit(1)
  2. Iterate over all not unsatisfiable sub_patterns and convert them to satisfiable or unsatisfiable
  3. If there is not one single unique satisfiable sub_pattern, present usage_message and exit(1)
  4. Compile resulting struct DocoptArgs from the unique satisfiable sub_pattern (copy parsed values and insert defaults)

How a sub_pattern can be implemented

In 2013-06-06, @halst said at docopt/docopt:

docopt does respect the order of arguments and commands in argv, but options could go in any order, anywhere.

We these characteristics in mind, a sub_pattern can be implemented by separating the ordered elements (aka statics) and Options (options):

typedef struct { /*(...)*/ } STATIC;
typedef enum {UNSAT, NOTUNSAT, SAT} SAT_STATE;
typedef struct {
    /* the current state of the the sub_pattern */
    SAT_STATE satisfiability;
    
    /* This is an array of elements in [Argument, Command, SUBPATTERN] */
    int statics_len;
    STATIC *statics_array;
    int statics_idx;        /* This index indicates what is the next element
                             * that needs to exist in the pattern. */
    
    /* This is an array of Options. */
    int options_len;
    OPTION *options_array;
} SUBPATTERN;

Switch-Case-Chars [SCC]

  • Advantages
    • It is probably simple to implement in docopt_c.py
  • Disadvantages
    • The produced code is unreadable :(((
    • VK: I don't thing switch-case is a good idea. What about options—they can be positioned anywhere, so you need to take them into account at every point of time?

This approach needs to build a "tree of switch-cases" based on the chars in words of argv. For instance, see this docopt string:

Usage:
    program tcp <host> <port> [--timeout=<seconds>]
    program serial <port> [--baud=9600] [--timetout=<seconds>]
    program -h | --help | --version

can have in its argv[1] the following words:

  • tcp
  • serial
  • -h
  • --help
  • --version

The tree of switch-cases for the first word in argv can be summarized in the following diagram:

argv[1][0] == 't' -> argv[1] is "tcp"
argv[1][0] == 's' -> argv[1] is "serial"
argv[1][0] == '-'
               |-- argv[1][1] == 'h' -> argv[1] is "-h"
               |-- argv[1][1] == '-'
                                  |-- argv[1][2] == 'h' -> argv[1] is "--help"
                                  |-- argv[1][2] == 'v' -> argv[1] is "--version"

Check out this simple example in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


/* "Usage:\n"
 * "    program tcp <host> <port> [--timeout=<seconds>]\n"
 * "    program serial <port> [--baud=9600] [--timetout=<seconds>]\n"
 * "    program -h | --help | --version\n"
 */


int main(int argc, char *argv[]) {
    int i = 1;
    char *word;
    size_t word_len;

    if (argc == 1) {
        fprintf(stderr, "Usage:\n"
                        "    program tcp <host> <port> [--timeout=<seconds>]\n"
                        "    program serial <port> [--baud=9600] [--timetout=<seconds>]\n"
                        "    program -h | --help | --version\n");
        exit(0);
    }

    word = argv[1];
    word_len = strlen(word);

    if (word_len < 1) {
        fprintf(stderr, "error: '%s' not recognized\n", word);
        exit(1);
    }
    switch (word[0]) {
        case 't':
            if (strcmp(word, "tcp")) {
                fprintf(stderr, "error: '%s' not recognized\n", word);
                exit(1);
            }
            /* Keep on parsing the rest of the CLI knowing that argv[1] is "tcp" */
            /* (...) */
            break;
        case 's':
            if (strcmp(word, "serial")) {
                fprintf(stderr, "error: '%s' not recognized\n", word);
                exit(1);
            }
            /* Keep on parsing the rest of the CLI knowing that argv[1] is "serial" */
            /* (...) */
            break;
        case '-':
            if (word_len < 2) {
                fprintf(stderr, "error: '%s' not recognized\n", word);
                exit(1);
            }
            switch (word[1]) {
                case 'h':
                    if (strcmp(word, "-h")) {
                        fprintf(stderr, "error: '%s' not recognized\n", word);
                        exit(1);
                    }
                    /* Executes the -h action */
                    /* (...) */
                    break;
                case '-':
                    if (word_len < 3) {
                        fprintf(stderr, "error: '%s' not recognized\n", word);
                        exit(1);
                    }
                    switch (word[2]) {
                        case 'h':
                            if (strcmp(word, "--help")) {
                                fprintf(stderr, "error: '%s' not recognized\n", word);
                                exit(1);
                            }
                            /* Executes the --help action */
                            /* (...) */
                            break;
                        case 'v':
                            if (strcmp(word, "--version")) {
                                fprintf(stderr, "error: '%s' not recognized\n", word);
                                exit(1);
                            }
                            /* Executes the --version action */
                            /* (...) */
                            break;
                        default:
                            fprintf(stderr, "error: '%s' not recognized\n", word);
                            exit(1);
                    }
                    break;
                default:
                    fprintf(stderr, "error: '%s' not recognized\n", word);
                    exit(1);
            }
            break;
        default:
            fprintf(stderr, "error: '%s' not recognized\n", word);
            exit(1);
    }

    return 0;
}
Clone this wiki locally