Skip to content

65C816 Platform Porting Guide

EtchedPixels edited this page Apr 26, 2018 · 1 revision

FUZIX 65C816 Guide

PRELIMINARY

Introduction

The 65C816 target for FUZIX provides a single implementation of the core code required to support FUZIX on typical configurations of these processors.

Providing your platform fits the basic structure anticipated only small amounts of non driver platform code are required. The main work is laying out the memory map.

The port shares an ABI with the 6502 target, although processes can use the additional features with certain limitations imposed by the CPU architecture.

Limitations of The Port

o CC65 currently has no floating point support. This needs someone to write an open source library to unpack/pack IEEE float and do basic maths operations upon it.

o There are architectural differences between the 6502/65C02/65C816 that cannot be hidden. In particular the 65C816 requires that the processor stack and zero page reside in bank 0, while the 6502/65C02 places them at the bottom of the current bank.

o The 65C816 supports separate 64K code and data spaces. This is not yet supported. In theory with the right kernel magic multiple code banks are supportable.

o Relocation of loaded executables is not yet supported.

o Swapping is not yet supported.

o cc65 does not know how to generate 65C816 optimized code. For the kernel in particular it would be a good project for someone to teach ANSI pcc how to generate suitable 65c816 code.

o No proper support for exceptions such as abort yet.

Reference Platform

The reference platform for development and testing is v65c816. This is a simple software 65C816 simulation of a platform with basic I/O devices and 8MB of RAM from address 0, with a 256 byte I/O window at $FE00 and a simple simulated framebuffer.

We require 512 bytes (ZP and CPU stack) for each process in memory at the same time. Thus with 512K RAM we have 7 processes in memory (and bank 0 holds the rest), so we need 3.5K for DP/ZP spaces.

With 8MB of RAM that rises to 63.5K which requires that the kernel moves to another bank. In practice it's quite hard to use 8MB of RAM for processes on a 65C816 as that amounts to 128 processes which for such a slow processor is unlikely to be a practical workload.

There are two reference kernels: v65c816 for the reference platform places the kernel in bank 0, along with the CPU stack and direct page (zero page) for each process. This model should be directly usable for smaller systems up to about 512K RAM.

For larger systems v65c816-big shows how to develop a platform where you only place stacks and direct pages in bank 0. In the reference kernel the kernel code lives in bank 1 and the kernel data in bank 2.

Execution Environment

Userspace applications are loaded at the load address indicated in their header providing they fit within the available memory. Thus even though we don't yet support relocations a binary linked at $2000 can be loaded on a system with userspace from $0200-$FE00 as it will fit with space below.

Applications are entered in 65C816 mode in a8i8 form. DP and S are set to suitable memory spaces and S is at the top of a 256 byte stack that cc65 uses only for call/return paths and briefly when doing work. On entry the registers are defined as follows

X:	high byte of load address
Y:	zero page base (will be used with relocating)
A:	currently zero (undefined)

The C stack top holds argc, argv, and the environment (see Library/libs/crt0_6502.s).

The application memory is arranged with the code at the bottom, then any writeable data then bss (variables initialized to 0). Above that is space and at the top of the available user memory is the C language stack, a construct maintained entirely by cc65. The stack expands down and sbrk() allows the data to expand upwards.

System calls are made by calling [$00FE] (which should not be assumed to be the same as $FE in zero page). At the moment this is hard coded but does need to end up in the relocation system for 6502 platforms where those bytes are not usable. Calls should be made in a8i8 mode, and the return from them will likewise be a8i8.

System calls are made like normal C calls and the syscall interface pulls the arguments off the application C stack as if it was a function call. This also means that the compiler knows that zp:0 is the C stack pointer, and this is also hardcoded until we get relocations.

System call state is returned in XY and the error code if any is in A. If there is no error then A is zero.

At any point the binary that is being executed may be swapped out and at some point swapped back in from storage and resumed. When it is resumed it may well be allocated a different bank, different S and different DP.

An application must not

  • Assume that DP is $0000 in your current bank, or that the CPU stack is $0100 in your current bank. On 65C816 this will not be true even if it is true on the 6502 itself.

  • Save or restore DP, B or S itself

  • Do any long operations (rtl etc) or use mvn/mvp (as we might move bank mid mvn/mvp)

  • Compute addresses based upon S or DP. They may not remain valid. It is ok to do things like LDA 6,S but not to do TSX LDA 6,X. Treat S as if it was a stack that lives in its own memory space not accessible elsewhere

cc65 (as of the latest version) handles these rules correctly. Older cc65 assumes ZP is at $0000 but the C library has a workaround to replace the problem routine with its own. The C library also contains a less than pretty implementation of longjmp that does the right thing.

Signal Handling

Signals are particularly tricky in the 6502/65C816 world as the C stack is a software construction. In order to isolate the kernel from the complexities of what each compiler does a signal is handled by pushing a system specific save area followed by the signal number and the vector to invoke. Rather than returning via that vector the kernel calls [load_address + $20] which is a pointer to a helper routine in user space (again see crt0_6502.s).

When the signal handler completes the helper is expected to restore the C state and return. The platform specific handler provided by the kernel will then restore state and return to the interrupted point.

Porting To A New Platform

It is easiest to start with a copy of the v65c816 platform.

config.h holds the system configuration as seen from C. kernel.def holds the configuration in a form the assembler can use. Always remember to change both when a symbol is present in both places.

Start with config.h. The only things you should need to change are

MAX_MAPS: The number of user processes that we can have loaded at once. This is normally 1 per 64K of memory minus one for the kernel (3 for 256K 7 for 512K, 15 for 1MB).

STACK_BANKOFF: This specifies where in bank 0 to begin allocation of the zero pages and direct pages for each process. This can be adjusted to suit the memory map and the number of pages (two per process in memory) required

TICKSPERSEC: Number of timer interrupt events per second. If you can configure this then 10/second is ideal.

The rest should not need touching for a typical setup.

kernel.def contains another copy of STACK_BANKOFF that should be set the same.

The following other values might need touching

KERNEL_BANK - the bank number of the kernel KERNEL_FAR - the 24bit base of the kernel

The low level platform code lives in v65.s and crt0.s. crt0.s is the setup code that runs before the C environment and kernel. It is responsible for things like wiping the BSS and setting the stack pointers. This will need rewriting for whatever boot loader and boot you are using.

v65.s contains the platform specific routines.

_trap_monitor: jumps to a monitor program if present, otherwise halts the system. It is used on a detected crash to help debug. One option is to load a debugger into the top bank and drop MAX_MAPS by 1 when debugging

_trap_reboot: if possible causes a reboot, if not halts the system

__hard_di: __hard_ei: __hard_irqrestore:

These are low level wrappers for the interrupt masking. They can be copied from the reference port unless you need to implemnt 'soft' IRQ disabling to get better handling of things like serial interrupts.

init_early:

This is called from crt0 and in our example case we use it to place the exception vectors in the correct addresses

init_hardware:

This does any early hardware setup. Interrupts are off so things like timers

kernel.def contains another copy of STACK_BANKOFF that should be set the same.

The following other values might need touching

KERNEL_BANK - the bank number of the kernel KERNEL_FAR - the 24bit base of the kernel

But note that as of writing only bank 0 kernel has been tested

The low level platform code lives in v65.s and crt0.s. crt0.s is the setup code that runs before the C environment and kernel. It is responsible for things like wiping the BSS and setting the stack pointers. This will need rewriting for whatever boot loader and boot you are using.

v65.s contains the platform specific routines.

_trap_monitor: jumps to a monitor program if present, otherwise halts the system. It is used on a detected crash to help debug. One option is to load a debugger into the top bank and drop MAX_MAPS by 1 when debugging

_trap_reboot: if possible causes a reboot, if not halts the system

__hard_di: __hard_ei: __hard_irqrestore:

These are low level wrappers for the interrupt masking. They can be copied from the reference port unless you need to implemnt 'soft' IRQ disabling to get better handling of things like serial interrupts.

init_early:

This is called from crt0 and in our example case we use it to place the exception vectors in the correct addresses

init_hardware:

This does any early hardware setup. Interrupts are off so things like timers can be enabled ready, memory sized etc.

_program_vectors:

On some architectures this copies vectors between banks. For the 65C816 this isn't needed so can be an rts

outchar:

This routine should print the contents of the A to whatever is a good debug port to use. It must not corrupt X or Y. This is used to output debugging messages and crash messages from assembler code for debug.

The file also contains examples of block read/write code for a disk driver.

C Files

main.c: This holds routines used in the set up and general running of the platform.

platform_idle: is called in a tight loop when there is no process to run. If the platform is entirely IRQ driven it could sleep to save power, but with systems that have polled serial it is usually best to keep polling the serial to get the best responsiveness and least dropped characters.

do_beep: is only used if you enable VT as opposed to serial ports and should make the beep noise for control-G.

pagemap_init/map_init should not need touching for a 65c816 port

platform_param: is called with each string on the boot line that is not known by the core code. The platform can claim a string by returning 1, and then do whatever setup is required for this.

devtty.c provides a very simple example serial port driver for the console port and on the v65c816 platform a simple video driver.

devhd.c provides a very simple example block device.

devices.c provides the tables and helper functions that bind all the devices and the kernel core together. The table can be altered to suit the needs of the system. device_init() is a hook to initialize/probe any devices during system startup.

Large Systems

Once the amount of memory rises above 512K it becomes trickier to find a space for all of the ZP and DP blocks that are in use. In addition the number of buffers and inode cache entries required for lots of processes puts further pressure on allocations.

A large configuration would therefore want to place the ZP/DP buffers in bank 0, maybe along with tty buffers to use up space if needed. The kernel can in theory run split I/D across banks 1 and 2, which would allow enough space for networking and a lot of buffers. Such a configuration could support up to 8MB without the core code changing. Beyond that point there are assumptions in lowlevel-65c816.s that you can take the user page number shift it left and add it to a base to get an offset. This breaks once you pass 128 banks (8MB). In addition once you hit 128 processes you run out of bank 0 memory to hold their DP/stack.

Moving the kernel also of course requires updating the exception vectors to jump into the right bank, and their return paths to do the right thing.

For an example see platform-v65c816-big.