-
Notifications
You must be signed in to change notification settings - Fork 48
Home
Table of Contents
Normalize vocabulary to correspond the upcoming docopt.py v0.7.0.
Element should be Pattern:
Pattern
ChildPattern # a better name??: TerminalPattern or LeafPattern
Option
Argument
Command
ParentPattern # a better name??: CollectionPattern
Required
Optional
AnyOptions
OneOrMore
Either
How Pattern C-struct could look like:
typedef {
// probably more conventional to have enum
// all UPPER CASE
enum {
OPTION,
ARGUMENT,
COMMAND,
REQUIRED,
OPTIONAL,
ONEORMORE,
EITHER,
NONE,
} type;
union {
Option option;
Argument argument;
Command command;
// maybe having a single container for
// required/optional/oneoremore/either,
// since they all hold the same kind of data.
Container container;
} payload;
} Pattern;
typedef struct {
// in Python these are `short` and `long`,
// but these are reserved in C
const char *oshort; // maybe *flag?
const char *olong; // maybe *full?
bool argcount; // number of arguments 0 or 1
bool value; // value if argcount is 0
char *argument; // value if argcount is 1
// maybe instead of having `value` and `argument`
// we could have just `char *value` and treat it
// as NULL/non-NULL for true/false (if argcount == 0)
// and as value/no-value when argcount == 1?
// I wonder, would that be portable:
// char *value = true;
//
// Oh, it is actually more complicated than that.
// In Python, Option's value could be either:
// * True/False, if it's a single option without argument
// * a number, if it's an option (w/o argument) that could be repeated
// * a string, if it's an option with argument non-repeated
// * array of strings, if it's an option with argument, repeated
// Well:
// repeatable | no yes
// -----------+----------------------
// argcount |
// 0 | bool int
// 1 | char* char**
//
// So we need to handle all these cases.
//
// a) Maybe go full type-unsafe and declare it as char**
// and use it as bool/char*/int by casting? :-)
//
// b) Another variant would be to have a union of these types.
//
// c) Yet another would be to store all these in the same struct.
} Option;
typedef struct {
char *name;
bool repeating; // Maybe int count; instead?
// (to explicitly state the length of **array).
char *value; // Maybe get rid of this in favor of
// just having 1 item in **array.
char **array; // I can think of 2 ways how to allocate this array.
// 1. It could be statically allocated
// to fit, say, max 32 elements, and then each
// pointer in it will point to an item in argv.
// 2. Make it point to (a part of) argv directly
// and then use `count` to see how long it is.
} Argument;
typedef struct {
char *name;
// Command's value could be either True/False in Python
// or the number of times the command was mentioned.
// We could use the fact that 0 is falsy in C, and
// declare it simply as int:
int value; // Maybe int count; instead?
} Command;
// In Python version required/optional/etc hold an array
// called `children` which is an array of all child nodes.
// Since we don't want to allocate these arrays dynamically
// or overallocate them statically (by allocating, say
// 32 items "just in case"), I was thinking of 2 variants:
//
// 1. Use linked list like the following:
typedef struct {
Pattern *pattern;
Container *next;
} Container;
// and transform patterns in Python into a linked list form:
//
// Required(a, b, c) into Required(a, Required(b, Required(c, NULL)))
//
// 2. Another way could be to *generate* code for each struct. I.e. if the
// pattern is `usage: prog [<this> <that>] (-a | -b | -c)` or in Python terms:
// Required(Optional(<this>, <that>), Either(-a, -b, -c))
//
// Then just generate precisely these containers:
//
typedef struct {
Pattern patterns[2];
} ContainerRequired1;
typedef struct {
Pattern patterns[2];
} ContainerOptional1;
typedef struct {
Pattern patterns[3];
} ContainerEither1;
<Add new ideas here!>
- Runs function
parse_args
to populateElement *options
array - Overrides default values in
struct DocoptArgs
with the values parsed fromargv
- Verifies that the Pattern is respected.
Note: It might possible to integrate step 3 into step 2.
- Advantages
- all generated C code is readable because only pattern tree is generated, not any real code, (maybe just a little);
- implementation similar to the python one, which makes it only a matter of porting code to C
- Disadvantages
(- this needs to port considerable portion of Python code)
- Complexity: O(Size_of_the_pattern_tree)
- complex patterns might use a lot of call stack.
VK: First thing that comes into mind for me is to mimic Python implementation. Python parses pattern into a tree of nodes like:
'''usage: program --hai
program --bye <arg>'''
=> parse_pattern =>
Either(Option('--hai'), Required(Option('--bye'), Argument('<arg>')))
Then Python parses argv:
['--bye', 'arg'] => parse_argv => [Option('--bye'), Argument('arg')]
Then matches one against another. We could generate the pattern using Python. Write parse_argv
function similar to Python one and write a bunch of match
functions similar to Python match
methods for each kind of pattern (Option, Argument, Either, etc.).
- Advantages
- Complexity: O(Number_of_arguments_provided_in_CLI)
- Simple memory consumption (stack usage is not dependent on the pattern tree size). Definite number of stack levels.
- all generated C code is readable because only sub-patterns are generated, not any real code
- Disadvantages
- Non-recursive means more code, which might be a bad thing in a single file
VK: I don't see advantages in being non-recursive. How does it make memory consumption simpler? We are not going to run out of stack anyway.
This approach starts by dividing the pattern tree into multiple sub_patterns
. Example:
Usage:
prog ship new [--coordinator] <name>
prog ship move <x> <y>
prog fire <gun> <x> <y> [--burst]
This pattern has three sub-patterns
.
When we are iterating over the argv
elements, a sub_pattern
can be in two states: 'unsatisfiable' and 'not unsatisfiable'.
When a sub_pattern
is said to be unsatisfiable, it means the algorithm has come to the conclusion that no matter what the next elements of argv
might be, the sub_pattern
will not be satisfied. Caution: if there are still argv
elements to be parsed, a sub_pattern
might not be 'satisfiable'. Hence, the 'not unsatisfiable' notation.
The main workflow is to assume all sub-patterns
are NOT UNsatisfiable in the beginning and, after parsing each element of argv
, all sub-patterns
are tested if they have become unsatisfiable.
Example with the previous usage pattern:
user inserted argv: $ ./prog ship new SuperBoat
Initial sub_patterns list:
sub_pattern[0].IsNotSatisfiable() == False
sub_pattern[1].IsNotSatisfiable() == False
sub_pattern[2].IsNotSatisfiable() == False
Iteration 1:
parsed from argv: Command('ship')
sub_pattern[0].IsNotSatisfiable() == False
sub_pattern[1].IsNotSatisfiable() == False
sub_pattern[2].IsNotSatisfiable() == True
Iteration 2:
parsed from argv: Command('new')
sub_pattern[0].IsNotSatisfiable() == False
sub_pattern[1].IsNotSatisfiable() == True
sub_pattern[2].IsNotSatisfiable() == True
# (...)
In C, we can have an array of sub_patterns
that keep their state along iterations.
With this approach its easy to define a flowchart/state_machine:
- If all
sub_patterns
become unsatisfiable before allargv
elements are parsed, then presentusage_message
. - If all
argv
elements were parsed and there is not an unique SATISFIABLE(*)sub_pattern
, then presentusage_message
.
(*) After all argv
elements are parsed, the algorithm can conclude which sub_patterns
are satisfiable. At the end of argv
, the set of ["unsatisfiable", "not unsatisfiable"] sub_patterns
can be converted to a set of ["unsatisfiable", "satisfiable"] sub_patterns
.
- Iterate over
argv
- Parse
argv[i]
to(type, value [, name])
- Iterate over all not unsatisfiable
sub_patterns
- Change the state to unsatisfiable if test fails
- If all patterns are unsatisfiable, present
usage_message
and exit(1)
- Parse
- Iterate over all not unsatisfiable
sub_patterns
and convert them to satisfiable or unsatisfiable - If there is not one single unique satisfiable
sub_pattern
, presentusage_message
and exit(1) - Compile resulting
struct DocoptArgs
from the unique satisfiablesub_pattern
(copy parsed values and insert defaults)
In 2013-06-06, @halst said at docopt/docopt:
docopt does respect the order of arguments and commands in argv, but options could go in any order, anywhere.
We these characteristics in mind, a sub_pattern
can be implemented by separating the ordered elements (aka statics
) and Options (options
):
typedef struct { /*(...)*/ } STATIC;
typedef enum {UNSAT, NOTUNSAT, SAT} SAT_STATE;
typedef struct {
/* the current state of the the sub_pattern */
SAT_STATE satisfiability;
/* This is an array of elements in [Argument, Command, SUBPATTERN] */
int statics_len;
STATIC *statics_array;
int statics_idx; /* This index indicates what is the next element
* that needs to exist in the pattern. */
/* This is an array of Options. */
int options_len;
OPTION *options_array;
} SUBPATTERN;
- Advantages
- It is probably simple to implement in
docopt_c.py
- It is probably simple to implement in
- Disadvantages
- The produced code is unreadable :(((
- VK: I don't thing switch-case is a good idea. What about options—they can be positioned anywhere, so you need to take them into account at every point of time?
This approach needs to build a "tree of switch-cases" based on the chars in words of argv
. For instance, see this docopt string:
Usage:
program tcp <host> <port> [--timeout=<seconds>]
program serial <port> [--baud=9600] [--timetout=<seconds>]
program -h | --help | --version
can have in its argv[1]
the following words:
tcp
serial
-h
--help
--version
The tree of switch-cases for the first word in argv
can be summarized in the following diagram:
argv[1][0] == 't' -> argv[1] is "tcp"
argv[1][0] == 's' -> argv[1] is "serial"
argv[1][0] == '-'
|-- argv[1][1] == 'h' -> argv[1] is "-h"
|-- argv[1][1] == '-'
|-- argv[1][2] == 'h' -> argv[1] is "--help"
|-- argv[1][2] == 'v' -> argv[1] is "--version"
Check out this simple example in C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* "Usage:\n"
* " program tcp <host> <port> [--timeout=<seconds>]\n"
* " program serial <port> [--baud=9600] [--timetout=<seconds>]\n"
* " program -h | --help | --version\n"
*/
int main(int argc, char *argv[]) {
int i = 1;
char *word;
size_t word_len;
if (argc == 1) {
fprintf(stderr, "Usage:\n"
" program tcp <host> <port> [--timeout=<seconds>]\n"
" program serial <port> [--baud=9600] [--timetout=<seconds>]\n"
" program -h | --help | --version\n");
exit(0);
}
word = argv[1];
word_len = strlen(word);
if (word_len < 1) {
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
switch (word[0]) {
case 't':
if (strcmp(word, "tcp")) {
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
/* Keep on parsing the rest of the CLI knowing that argv[1] is "tcp" */
/* (...) */
break;
case 's':
if (strcmp(word, "serial")) {
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
/* Keep on parsing the rest of the CLI knowing that argv[1] is "serial" */
/* (...) */
break;
case '-':
if (word_len < 2) {
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
switch (word[1]) {
case 'h':
if (strcmp(word, "-h")) {
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
/* Executes the -h action */
/* (...) */
break;
case '-':
if (word_len < 3) {
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
switch (word[2]) {
case 'h':
if (strcmp(word, "--help")) {
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
/* Executes the --help action */
/* (...) */
break;
case 'v':
if (strcmp(word, "--version")) {
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
/* Executes the --version action */
/* (...) */
break;
default:
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
break;
default:
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
break;
default:
fprintf(stderr, "error: '%s' not recognized\n", word);
exit(1);
}
return 0;
}