Skip to content

Latest commit

 

History

History
172 lines (126 loc) · 5.88 KB

QUESTIONS.md

File metadata and controls

172 lines (126 loc) · 5.88 KB

Open equestions/assumptions:

  • Q: Impose a minimum vector length of 4 for speed?
    A:

  • Q: Trig functions range
    A: 0..2pi

  • Q: Work with matrix objects or on arrays?
    A: Raw arrays as default

  • Q: Work with explicit C99 native types for DSP library?
    A: Yes, no savings in hidden structures, except for types

  • Q: Library vs API portability?
    A: ??

  • Q: 32/64 bit portability
    A:

  • Q: Argument order, "natural" or compatible with other the "other" library
    A: Natural order, except for when we want to implement exact API

  • Q: Saturation/overflow policy
    A: GIGO

  • Q: Include scatter gather in basic memcpy operations
    A: Yes, fundemental to DMAs, should be in hal

  • Q: Include barrier?
    A: Yes, fundemental to all models, based on absolute address like mutex

  • Q: Perfect API compatibility?
    A: The ability to run the exact same code on multiple platforms is magic!
    Strive for MPI and POSIX same name no change compatibility.

  • Q: What DSP/math API's where researched?
    A:

  • Q: Which POSIX starting point
    A: Bionic, newlib, uClibc, other?

  • Q: int dd or explicit structure type?
    A: file descriptor..

  • Q: Should we have the ability to open device as stream, what does it mean?
    A: No, that is one level up.(fifo=memory+state)

  • Q: Use argc/argv for passing variables?
    A: Make the argv of type void* args[], not just chars

  • Q: NAN, denormal numbers in math
    A: pass through (gigo)

  • Q: Error checking in library
    A: How about a "debug" #define flag that gets turned off in production.
    (would prefer not having it at all...)

  • Q: Do we need a level of abstraction above the memory pointer?
    A: Yes

  • Q: Do we still need a fast memcpy like interface?
    A: Yes, but it's niche for shared memory systems

  • Q: What should the processor id be?
    A: A simple integer

  • Q: At what level should we be doing work?
    A: At the team level

  • Q: What memory operations to support?
    A: host writing to epiphany registers-->p_write
    epiphany writing to other epiphany-->p_write
    host to shared memory-->p_write, assign alloc to a team

  • Q: Assume contiguous memory in all buffers?
    A: Yes

  • Q: Support no holds bar memory direct read/writes to addresses?
    A:

    1. declare a team that holds all the cores in the system
    2. create a 2D array of pal_mem_t structs, each one 32KB
  • Q: Is "remote malloc" on processor 'n' a bad idea?
    A: No, but implementation might be tricky:-) Can it work in a request/signal fashion?

  • Q: Make all data movement calls non-blocking?
    A: Yes, you can always add the blocking part in wrapper above.
    Non-blocking is the lego block.

  • Q: pal_* or p_*
    A: Clarity vs brevity?

  • Q: The building blocks.
    A:

    1. A current processor "X", ie "me".
    2. Other processors "Y(s)", ie "not me"
    3. A set of named memory buffers belonging to processor "X", AND "Y"
    4. A team of N processors, at least one X and one Y
    5. A startup program on processor X, ie "root processor".
  • Q: Make data structures completely opaque?
    A: Yes (a processor could be almost anything)

  • Q: Return value as return argument or in argument list?
    A: Prefer simple functions with short argument lists.
    We know what functional programming would suggest..

  • Q: Typedefs for some structs
    A: Yes, because the intent is to make these completely opaque

  • Q: Include events in argument list?
    A: No. Complicated, can't think of a clean solution

  • Q: Include event information in memory/team objects
    A: Read/write ordering and "done" is too tricky for everyone.

  • Q: Argument ordering in functions: inputs, flags, output
    A: Having flags last does make some sense

  • Q: Argument passing to run
    A: Enable passing arguments in "main style" and function-kernel style.
    Contract between caller/callee program. Short cuts allowed

  • Q: Is it practical to run the same executable on bare metal and in Linux?
    A:

  • Q: Use of "node" to represent a processor?
    A: The reason I chose this name was to abstract away as much as possible the biases people have. The word "core" has been completely bastardized by the industry and spans anything from a multiply unit up to a CPU running Linux. The word processor could work, but also carries with it a heavy history and suggests a microprocessor. Node is non-descript which is a good thing. I also like that a node is often represented by a dot, which ironically is what compute hardware has become. Programmers should think about memory, dots, and lines when architecting algorithms and programs. Anything beyond that is too much mental overhead.

  • Q: Is there a need for a new memory object?
    A: Yes, memory is not the same thing as a file and files are not appropriate as something to work with when doing HPC. We need very fast and direct manipulations of array data.

  • Q: Make ssize_t a first class citizen for memory manipulation
    A: Yes, we can no longer assume that all writes succeed. The faulty domain of clusters is making it down to chips.

  • Q: Why not make the PAL memory objects true file descriptors?
    A:

  • Q: Math arg name output, change to 'res' for clarity
    A:

  • Q: Include consts for input pointers to functions, any other keyword?
    A: