Filipe Funenga edited this page Jun 6, 2013 · 27 revisions


Normalize Vocabulary

Normalize the vocabulary to correspond to the upcoming docopt.py v0.7.0.

Element should be renamed to Pattern:

Pattern
    ChildPattern  # a better name??: TerminalPattern or LeafPattern
        Option
        Argument
        Command 
    ParentPattern  # a better name??: CollectionPattern
        Required
        Optional
        AnyOptions
        OneOrMore
        Either

New C-Struct Ideas

What the Pattern C struct could look like:

typedef struct {
    // probably more conventional to have enum
    // all UPPER CASE
    enum {
        OPTION,
        ARGUMENT,
        COMMAND,

        REQUIRED,
        OPTIONAL,
        ONEORMORE,
        EITHER,

        NONE,
    } type;
    union {
        Option option;
        Argument argument;
        Command command;

        // maybe having a single container for
        // required/optional/oneormore/either,
        // since they all hold the same kind of data.
        Container container;
    } payload;
} Pattern;

typedef struct {
    // in Python these are `short` and `long`,
    // but these are reserved in C
    const char *oshort;  // maybe *flag?
    const char *olong;   // maybe *full?
    bool argcount;       // number of arguments 0 or 1
    bool value;          // value if argcount is 0
    char *argument;      // value if argcount is 1
    // maybe instead of having `value` and `argument`
    // we could have just `char *value` and treat it
    // as NULL/non-NULL for true/false (if argcount == 0)
    // and as value/no-value when argcount == 1?
    // I wonder, would that be portable:
    // char *value = true;
    //
    // Oh, it is actually more complicated than that.
    // In Python, Option's value could be either:
    //  * True/False, if it's a single option without argument
    //  * a number, if it's an option (w/o argument) that could be repeated
    //  * a string, if it's an option with argument non-repeated
    //  * array of strings, if it's an option with argument, repeated
    //  Well:
    //          repeatable |       no          yes
    //          -----------+----------------------
    //            argcount |
    //                   0 |      bool        int
    //                   1 |      char*       char**
    //
    // So we need to handle all these cases.
    //
    // a) Maybe go full type-unsafe and declare it as char**
    // and use it as bool/char*/int by casting? :-)
    //
    // b) Another variant would be to have a union of these types.
    //
    // c) Yet another would be to store all these in the same struct.
} Option;

typedef struct {
     char *name;
     bool repeating;  // Maybe int count; instead?
                      // (to explicitly state the length of **array).
     char *value;     // Maybe get rid of this in favor of
                      // just having 1 item in **array.
     char **array;    // I can think of 2 ways to allocate this array.
                      // 1. It could be statically allocated
                      //    to fit, say, max 32 elements, and then each
                      //    pointer in it will point to an item in argv.
                      // 2. Make it point to (a part of) argv directly
                      //    and then use `count` to see how long it is.
} Argument;

typedef struct {
     char *name;
     // Command's value could be either True/False in Python
     // or the number of times the command was mentioned.
     // We could use the fact that 0 is falsy in C, and
     // declare it simply as int:
     int value; // Maybe int count; instead?
} Command;

// In the Python version, required/optional/etc. hold an array
// called `children`, which contains all child nodes.
// Since we don't want to allocate these arrays dynamically
// or overallocate them statically (by allocating, say
// 32 items "just in case"), I was thinking of 2 variants:
//
// 1. Use a linked list like the following:
typedef struct Container {
    Pattern *pattern;
    struct Container *next;  // a struct tag is needed for the self-reference
} Container;
// and transform patterns in Python into a linked list form:
//
// Required(a, b, c) into Required(a, Required(b, Required(c, NULL)))
//
// 2. Another way could be to *generate* code for each struct. I.e. if the
// pattern is `usage: prog [<this> <that>] (-a | -b | -c)` or in Python terms:
// Required(Optional(<this>, <that>), Either(-a, -b, -c))
//
// Then just generate precisely these containers:
//
typedef struct {
    Pattern patterns[2];
} ContainerRequired1;

typedef struct {
    Pattern patterns[2];
} ContainerOptional1;

typedef struct {
    Pattern patterns[3];
} ContainerEither1;
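
The repeatable/argcount table in the Option comments above maps fairly directly onto variant (b), a union with a tag. The following is only a sketch of that idea, not a settled design: the names `ValueKind`, `OptionValue`, `value_kind` and `value_is_set` are hypothetical and do not exist in docopt.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* One tag per cell of the repeatable/argcount table (hypothetical names). */
typedef enum {
    VALUE_BOOL,    /* argcount 0, not repeatable */
    VALUE_COUNT,   /* argcount 0, repeatable     */
    VALUE_STRING,  /* argcount 1, not repeatable */
    VALUE_LIST     /* argcount 1, repeatable     */
} ValueKind;

typedef struct {
    ValueKind kind;              /* says which union member is valid */
    union {
        bool flag;
        int count;
        const char *string;
        struct {
            const char **items;  /* could point into argv, no allocation */
            size_t len;
        } list;
    } as;
} OptionValue;

/* Pick the tag from the two properties known once the usage pattern is parsed. */
ValueKind value_kind(bool takes_argument, bool repeatable) {
    if (takes_argument)
        return repeatable ? VALUE_LIST : VALUE_STRING;
    return repeatable ? VALUE_COUNT : VALUE_BOOL;
}

/* True if the option appeared at least once, whatever its kind. */
bool value_is_set(const OptionValue *v) {
    switch (v->kind) {
        case VALUE_BOOL:   return v->as.flag;
        case VALUE_COUNT:  return v->as.count > 0;
        case VALUE_STRING: return v->as.string != NULL;
        case VALUE_LIST:   return v->as.list.len > 0;
    }
    return false;
}
```

Compared with variant (a), the tag keeps every access type-checked. Compared with variant (c), the struct stays as small as its largest member plus the tag.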

Ideas to Parse ARGV

(Add new ideas here.)

Switch-Case-Chars

This approach builds a "tree of switch-cases" keyed on the characters of each word in argv. For instance, consider this docopt string:

Usage:
    program tcp <host> <port> [--timeout=<seconds>]
    program serial <port> [--baud=9600] [--timeout=<seconds>]
    program -h | --help | --version

A program with this usage can have any of the following words in argv[1]:

  • tcp
  • serial
  • -h
  • --help
  • --version

The tree of switch-cases for the first word in argv can be summarized in the following diagram:

argv[1][0] == 't' -> argv[1] is "tcp"
argv[1][0] == 's' -> argv[1] is "serial"
argv[1][0] == '-'
               |-- argv[1][1] == 'h' -> argv[1] is "-h"
               |-- argv[1][1] == '-'
                                  |-- argv[1][2] == 'h' -> argv[1] is "--help"
                                  |-- argv[1][2] == 'v' -> argv[1] is "--version"

Check out this simple example in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


/* "Usage:\n"
 * "    program tcp <host> <port> [--timeout=<seconds>]\n"
 * "    program serial <port> [--baud=9600] [--timeout=<seconds>]\n"
 * "    program -h | --help | --version\n"
 */


int main(int argc, char *argv[]) {
    int i = 1;
    char *word;
    size_t word_len;

    if (argc == 1) {
        fprintf(stderr, "Usage:\n"
                        "    program tcp <host> <port> [--timeout=<seconds>]\n"
                        "    program serial <port> [--baud=9600] [--timeout=<seconds>]\n"
                        "    program -h | --help | --version\n");
        exit(0);
    }

    word = argv[1];
    word_len = strlen(word);

    if (word_len < 1) {
        fprintf(stderr, "error: '%s' not recognized\n", word);
        exit(1);
    }
    switch (word[0]) {
        case 't':
            if (strcmp(word, "tcp")) {
                fprintf(stderr, "error: '%s' not recognized\n", word);
                exit(1);
            }
            /* Keep on parsing the rest of the CLI knowing that argv[1] is "tcp" */
            /* (...) */
            break;
        case 's':
            if (strcmp(word, "serial")) {
                fprintf(stderr, "error: '%s' not recognized\n", word);
                exit(1);
            }
            /* Keep on parsing the rest of the CLI knowing that argv[1] is "serial" */
            /* (...) */
            break;
        case '-':
            if (word_len < 2) {
                fprintf(stderr, "error: '%s' not recognized\n", word);
                exit(1);
            }
            switch (word[1]) {
                case 'h':
                    if (strcmp(word, "-h")) {
                        fprintf(stderr, "error: '%s' not recognized\n", word);
                        exit(1);
                    }
                    /* Executes the -h action */
                    /* (...) */
                    break;
                case '-':
                    if (word_len < 3) {
                        fprintf(stderr, "error: '%s' not recognized\n", word);
                        exit(1);
                    }
                    switch (word[2]) {
                        case 'h':
                            if (strcmp(word, "--help")) {
                                fprintf(stderr, "error: '%s' not recognized\n", word);
                                exit(1);
                            }
                            /* Executes the --help action */
                            /* (...) */
                            break;
                        case 'v':
                            if (strcmp(word, "--version")) {
                                fprintf(stderr, "error: '%s' not recognized\n", word);
                                exit(1);
                            }
                            /* Executes the --version action */
                            /* (...) */
                            break;
                        default:
                            fprintf(stderr, "error: '%s' not recognized\n", word);
                            exit(1);
                    }
                    break;
                default:
                    fprintf(stderr, "error: '%s' not recognized\n", word);
                    exit(1);
            }
            break;
        default:
            fprintf(stderr, "error: '%s' not recognized\n", word);
            exit(1);
    }

    return 0;
}
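
The same decision tree can also be folded into a function that only classifies the word and leaves the actions to the caller. This is a sketch under assumptions, not what docopt_c.py generates: `Word`, `confirm` and `classify_word` are hypothetical names. It relies on the terminating NUL of a C string matching no case label, so the explicit length checks are not needed.

```c
#include <string.h>

/* Possible values of argv[1] for the usage string above (hypothetical names). */
typedef enum {
    WORD_UNKNOWN,
    WORD_TCP,
    WORD_SERIAL,
    WORD_SHORT_HELP,
    WORD_HELP,
    WORD_VERSION
} Word;

/* Return `match` if `word` equals `expected`, WORD_UNKNOWN otherwise. */
Word confirm(const char *word, const char *expected, Word match) {
    return strcmp(word, expected) == 0 ? match : WORD_UNKNOWN;
}

/* Walk the switch tree one character at a time. word[1] is read only
 * when word[0] is '-', and word[2] only when word[1] is '-', so no
 * read goes past the terminating NUL. */
Word classify_word(const char *word) {
    switch (word[0]) {
        case 't': return confirm(word, "tcp", WORD_TCP);
        case 's': return confirm(word, "serial", WORD_SERIAL);
        case '-':
            switch (word[1]) {
                case 'h': return confirm(word, "-h", WORD_SHORT_HELP);
                case '-':
                    switch (word[2]) {
                        case 'h': return confirm(word, "--help", WORD_HELP);
                        case 'v': return confirm(word, "--version", WORD_VERSION);
                        default:  break;
                    }
                    break;
                default:
                    break;
            }
            break;
        default:
            break;
    }
    return WORD_UNKNOWN;
}
```

A caller could then `switch` on `classify_word(argv[1])`, print the "not recognized" error once for `WORD_UNKNOWN`, and keep each action in its own case.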

Pros and Cons

  • Pros
    • It is probably simple to implement in docopt_c.py
  • Cons
    • The produced code is unreadable :(((