Creating a Robust Firmware Architecture for a CubeSat
By FerretCode
Background
At my university, I started a CubeSat project that began in the summer. The goal for this year is to develop a prototype satellite (not necessarily flight ready) to launch on a student high-power rocket, with the aim of sending it to low Earth orbit sometime down the line.
For this prototype, the goal is to build an MVP based on four custom subsystems, which are covered in more detail below. Since the subsystems must communicate and coordinate with one another to manage system state, perform experiments, and communicate with the ground, having a firmware architecture that's highly testable, modular, and easy to work with on all levels of abstraction is important, especially in a team environment.
This article details a firmware architecture "standard" that we're using for our project that integrates well with CI, Hardware-In-The-Loop testing, supports high levels of modularity, and is easy to reason about.
Satellite Subsystems
Before diving into the nitty gritty of the framework, some context about what each subsystem is responsible for helps explain where the need for a coordinated system comes in.
EPS
The first and most important subsystem on board is the EPS (Electrical Power System). This subsystem is responsible for charging a multi-cell battery pack from solar energy, regulating that energy into system rails, and then further distributing it to the rest of the subsystems. On top of this, the EPS also must provide a significant level of protection against harmful events like overcurrents or brownouts.
It's also the last barrier of protection in case any of the other subsystems go down--it acts as an external watchdog for the other boards, requesting heartbeats and asserting RESET signals if a board stops responding.
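In practice, that external watchdog boils down to a per-board timeout check. Here is a minimal host-side sketch of the idea--the names, struct, and timeout value are all hypothetical, and real firmware would drive an actual RESET GPIO:

```c
#include <stdbool.h>
#include <stdint.h>

#define HEARTBEAT_TIMEOUT_MS 5000 /* hypothetical timeout window */

/* Hypothetical per-board record the EPS could keep for each subsystem. */
typedef struct {
    uint32_t last_heartbeat_ms; /* timestamp of the last heartbeat reply */
    bool     reset_asserted;    /* whether RESET is currently being held */
} board_watchdog_t;

/* Returns true if the board missed its heartbeat window and should
 * have its RESET line asserted (here we only flip a flag). */
bool watchdog_check(board_watchdog_t *wd, uint32_t now_ms) {
    if (!wd->reset_asserted &&
        (now_ms - wd->last_heartbeat_ms) > HEARTBEAT_TIMEOUT_MS) {
        wd->reset_asserted = true; /* real code: drive the RESET GPIO */
        return true;
    }
    return false;
}
```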
OBC
The OBC (On Board Computer) is the main controller & coordinator for the rest of the satellite. On system initialization, it handles health checks for other subsystems, and ensures that everything comes up safely and in the correct order. Furthermore, it also handles the main mission logic at a high level by communicating with other subsystems to coordinate the flow between running experiments, collecting data, and downlinking it to the ground.
Comms
The communications board for this prototype will be a simple LoRa board that's directly managed through the OBC (not via another MCU) for telemetry & data downlink.
Payload
The payload board is the actual science instrument on the satellite. Its goal is to capture images in multiple wavelengths of light, which can then be further analyzed by algorithms like NDVI (Normalized Difference Vegetation Index) or NDCI (Normalized Difference Chlorophyll Index), which provide data on key climate health indicators like vegetation health.
Communication & Synchronization
Each of these subsystems is interdependent, which makes reliable communication and synchronization a first-order priority. For our prototype, subsystems communicate over two UART lines--a primary channel & an auxiliary backup channel--using a shared packet/codec format that standardizes commands, telemetry, and events across the satellite.
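To give a feel for what such a format standardizes, here is a hypothetical frame header--the project's real OSUSatPacket layout is not shown in this article, so every field and type code below is illustrative:

```c
#include <stdint.h>

/* Hypothetical wire-frame header; the real OSUSatPacket layout
 * is not reproduced here. */
typedef struct {
    uint8_t  start_byte;  /* fixed sync marker, e.g. 0x7E */
    uint8_t  src_id;      /* sending subsystem (EPS, OBC, ...) */
    uint8_t  dst_id;      /* destination subsystem */
    uint8_t  type;        /* command, telemetry, or event */
    uint16_t payload_len; /* number of payload bytes that follow */
    uint16_t crc;         /* integrity check over header + payload */
} packet_header_t;

/* Hypothetical type codes used to route a received frame. */
enum { PKT_TYPE_COMMAND = 1, PKT_TYPE_TELEMETRY = 2, PKT_TYPE_EVENT = 3 };

int packet_is_telemetry(const packet_header_t *h) {
    return h->type == PKT_TYPE_TELEMETRY;
}
```

The point of a shared header like this is that every subsystem can parse any frame far enough to route it, even before decoding the payload.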
While this defines how data moves throughout the system, it still leaves how each subsystem manages its own internal state, hardware peripherals, and external command interface up in the air. Without a clearly defined structure for how mission logic, driver code, and data will flow throughout the firmware, the code can quickly become messy, tightly coupled, and difficult to reason about or test.
As such, we needed an architecture that clearly separates concerns like command handling, policy, peripheral control, and driver code, while still allowing these pieces to glue together into a cohesive system.
Concept 1: Layered Architecture
The first principle of our firmware architecture is a clear separation between high-level mission logic, peripheral management/utilities, and drivers. We designed the architecture around three layers (ordered from lowest -> highest level):
HAL/Drivers
The HAL (Hardware Abstraction Layer)/driver layer is where all of the hardware-specific code lives. Specifically, it provides interfaces for higher-level layers to interact with the hardware. For example, the EPS firmware HAL provides interfaces for interacting with GPIO, UART, I2C, ADCs, and more.
Each HAL provides a simple, high-level interface for consumers to interact with the hardware. For example, here is an excerpt from the EPS GPIO driver interface:
/**
* @defgroup gpio_types Types
* @ingroup gpio
* @brief Types used by the GPIO driver.
*
* @{
*/
/**
* @typedef gpio_callback_t
* @brief Function signature for GPIO interrupt handlers.
*
* @param[in] pin The pin ID that triggered the interrupt.
* @param[in] ctx User context pointer provided at registration.
*/
typedef void (*gpio_callback_t)(uint8_t pin, void *ctx);
/**
* @struct gpio_pin_t
* @brief GPIO pin descriptor
*/
typedef struct {
gpio_mode_t mode; /**< GPIO pin mode */
gpio_state_t state; /**< GPIO pin state */
gpio_pull_t pull; /**< GPIO pull */
bool irq_enabled; /**< Whether the IRQ is enabled on this pin */
gpio_callback_t cb; /**< Callback that fires when pin state changes */
void *ctx; /**< Context passed to the callback when fired */
} gpio_pin_t;
/** @} */ // end gpio_types
/**
* @defgroup gpio_api Public API
* @ingroup gpio
* @brief External interface for interacting with the GPIO driver.
*
* @{
*/
/**
* @brief Initialize the GPIO driver.
*
* This should be called before using any other GPIO functions.
*/
void hal_gpio_init(void);
/**
* @brief Set the mode of a GPIO pin.
*
* If an interrupt mode is selected, this function will configure
* the NVIC (Interrupt Controller) but will NOT enable the callback
* until gpio_register_callback() is called.
*
* @param[in] pin The GPIO pin
* @param[in] mode The GPIO mode
*/
void hal_gpio_set_mode(uint8_t pin, gpio_mode_t mode);
// ...
As seen, the interface exposes a small set of functions & types that make interacting with the hardware straightforward.
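A hypothetical consumer of this interface might look like the following. The HAL functions are stubbed here so the example runs on a host machine; `gpio_register_callback()` is the registration function mentioned in the header comments, while the pin assignment and "deploy switch" scenario are made up:

```c
#include <stddef.h>
#include <stdint.h>

/* --- Stand-in stubs so this example links on a host machine; the
 * real implementations live in the GPIO HAL. --- */
typedef void (*gpio_callback_t)(uint8_t pin, void *ctx);
typedef enum { GPIO_MODE_INPUT_IRQ } gpio_mode_t;

static gpio_callback_t s_cb;
static void *s_ctx;

void hal_gpio_init(void) {}
void hal_gpio_set_mode(uint8_t pin, gpio_mode_t mode) { (void)pin; (void)mode; }
void gpio_register_callback(uint8_t pin, gpio_callback_t cb, void *ctx) {
    (void)pin;
    s_cb = cb;
    s_ctx = ctx;
}

/* --- Hypothetical consumer of the interface --- */
#define PIN_DEPLOY_SWITCH 4 /* made-up pin assignment */

int g_deploy_events; /* counts observed pin changes */

static void on_deploy_switch(uint8_t pin, void *ctx) {
    (void)pin;
    (void)ctx;
    g_deploy_events++; /* a real service would translate this into an event */
}

void deploy_monitor_init(void) {
    hal_gpio_init();
    hal_gpio_set_mode(PIN_DEPLOY_SWITCH, GPIO_MODE_INPUT_IRQ);
    gpio_register_callback(PIN_DEPLOY_SWITCH, on_deploy_switch, NULL);
}

/* Stand-in for the ISR invoking the registered callback. */
void gpio_simulate_irq(uint8_t pin) {
    if (s_cb) s_cb(pin, s_ctx);
}
```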
Services
The next layer up from the HAL is the service layer, which provides both utilities (like UART packet framing or logging) and granular mission logic (e.g., selecting which power rails to enable based on a selected power profile). Additionally, the service layer handles peripheral management, like petting watchdogs or managing an MPPT controller. The service layer interacts directly with the HAL--each service consumes what it needs to fulfill its role.
Services are also designed to be loosely coupled--services do not directly call or interact with other services, instead using events (more on this later) for inter-service communication. This avoids heavy dependencies between services and allows each service to evolve naturally over time without forcing rewrites further up the call stack. This event-driven architecture, combined with the direct consumption of HAL drivers, also enforces an important rule: HAL drivers are not allowed to subscribe to or publish events, which keeps the HAL policy-free.
Because of this, though, when services need to interact with an asynchronous protocol like UART (and by extension packets that come from other subsystems), a pattern must be in place where multiple consumers can easily interact with the HAL in some asynchronous fashion.
The solution to this problem comes in the form of upgrading asynchronous callbacks from the HAL into events on the service layer. Each HAL driver that is asynchronous (UART, I2C using DMA, etc.) exposes a single callback "socket." Upon an event (or error) occurring, like the I2C DMA buffer filling with bytes, the driver fires the callback. The subscriber to this single callback is a service--when the callback is fired, the service handles it, correctly translating the HAL event into an event that other services can understand.
Nowhere is this pattern clearer than in the interaction between the UART HAL driver and the UART events service. The UART HAL handles an interrupt and processes incoming bytes via DMA like this:
/**
* @brief Process new bytes from DMA buffer into User Ring Buffer
*
* This calculates how many bytes the DMA has written since we last checked,
* copies them to the ring buffer, and fires the user callback.
*/
static void process_dma_input(uart_port_t port) {
uart_port_state_t *state = &g_uart_state[port];
// ...
if (state->rx_callback != NULL) {
state->rx_callback(port, state->rx_callback_ctx);
}
}
The important bit to note here is how the user callback is fired. When this happens, the UART events service handles it like this:
/**
* @brief Reassembly State Machine
* Reconstructs a full packet frame from the byte stream.
*/
static void uart_process_byte(uart_events_t *uart, uint8_t byte) {
uint8_t *current_buf = uart->packet_pool[uart->pool_index];
// ...
switch (uart->rx_state) {
case RX_STATE_WAIT_START_BYTE:
// ...
break;
case RX_STATE_READ_HEADER:
// ...
break;
case RX_STATE_READ_PAYLOAD:
current_buf[uart->decode_index++] = byte;
if (uart->decode_index >= uart->expected_packet_len) {
OSUSatPacket rx_packet;
OSUSatPacketResult res = osusat_packet_unpack(
&rx_packet, current_buf, uart->expected_packet_len);
if (res == OSUSAT_PACKET_OK) {
osusat_event_bus_publish(UART_EVENT_PACKET_RECEIVED, &rx_packet,
sizeof(OSUSatPacket));
// ...
LOG_INFO(COMPONENT_UART_PRIMARY,
"Successfully decoded a packet of length %d",
uart->expected_packet_len);
} else {
// ...
LOG_ERROR(COMPONENT_UART_PRIMARY,
"Failed to decode a packet of expected length %d",
uart->expected_packet_len);
}
// ...
}
}
}
The overall flow looks like: HAL event happens -> call user callback -> handle in a service -> translate into event & publish -> subscribers process.
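The end of that chain is just another subscriber. A minimal sketch of what a packet subscriber might look like--the event struct shape follows the core library shown later in the article, while the composite ID value, payload size, and handler name are assumptions:

```c
#include <stdint.h>
#include <string.h>

/* Event object shape from the core library (shown later in the
 * article); the payload size here is an assumption. */
#define OSUSAT_EVENT_MAX_PAYLOAD 64
typedef uint32_t osusat_event_id_t;
typedef struct {
    osusat_event_id_t id;
    uint8_t payload[OSUSAT_EVENT_MAX_PAYLOAD];
    uint8_t payload_len;
} osusat_event_t;

#define UART_EVENT_PACKET_RECEIVED 0x55410001u /* hypothetical ID value */

int g_packets_handled;

/* A subscriber at the end of the chain: the payload carries a copy of
 * the decoded packet, so the handler only interprets those bytes. */
void on_packet_received(const osusat_event_t *e, void *ctx) {
    (void)ctx;
    if (e->id != UART_EVENT_PACKET_RECEIVED || e->payload_len == 0) {
        return; /* not our event, or empty payload */
    }
    g_packets_handled++; /* real code would dispatch on the packet type */
}
```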
Tangent aside, a service is simply a single unit that can encapsulate utilities, peripheral management, and granular pieces of mission logic. The architecture of a service is discussed in more detail in the section about events.
Applications
This is the final & highest-level layer. Applications handle the broad & high-level mission logic for the subsystem they manage. There may be multiple applications for one subsystem. For instance, the EPS firmware has two applications--a command handler, and a power policies application. The command handler processes commands from the OBC, and the power policies app responds to system changes and updates power rails/distribution accordingly.
The power policies application looks like:
void power_policies_init(power_policies_t *app) {
if (app == NULL) {
return;
}
memset(app, 0, sizeof(power_policies_t));
app->initialized = true;
// battery management events
osusat_event_bus_subscribe(BATTERY_EVENT_CRITICAL_LOW, handle_battery_event,
app);
osusat_event_bus_subscribe(BATTERY_EVENT_FULLY_CHARGED,
handle_battery_event, app);
// mppt controller events
osusat_event_bus_subscribe(MPPT_EVENT_FAULT_DETECTED, handle_mppt_event,
app);
osusat_event_bus_subscribe(MPPT_EVENT_PGOOD_CHANGED, handle_mppt_event,
app);
}
static void handle_battery_event(const osusat_event_t *e, void *ctx) {
power_policies_t *app __attribute__((unused)) = (power_policies_t *)ctx;
switch (e->id) {
case BATTERY_EVENT_CRITICAL_LOW:
// on critical battery, request to switch to safe mode
osusat_event_bus_publish(APP_EVENT_REQUEST_POWER_PROFILE_SAFE, NULL, 0);
break;
case BATTERY_EVENT_FULLY_CHARGED:
// if battery is fully charged, we can request to go back to nominal
// mode
osusat_event_bus_publish(APP_EVENT_REQUEST_POWER_PROFILE_NOMINAL, NULL,
0);
break;
default:
break;
}
}
static void handle_mppt_event(const osusat_event_t *e, void *ctx) {
power_policies_t *app __attribute__((unused)) = (power_policies_t *)ctx;
switch (e->id) {
case MPPT_EVENT_FAULT_DETECTED:
if (e->payload_len >= sizeof(uint8_t)) {
uint8_t failed_channel = e->payload[0];
osusat_event_bus_publish(APP_EVENT_REQUEST_MPPT_DISABLE_CHANNEL,
&failed_channel, sizeof(uint8_t));
// ...
}
// ...
}
}
Here, the app subscribes to events that denote a change in system state, and responds by publishing requests. Note that applications use the event bus too--it isn't just for services. Alongside the applications themselves, the application layer also provides a set of events specifically for communication between applications & services:
/**
* @defgroup app_events Application Events
* @brief Defines events originating from the application layer.
*
* @{
*/
#define APP_SERVICE_UID 0xA00 // service UID for application-level events
/**
* @enum app_event_id_t
* @brief Application-level event IDs.
*/
typedef enum {
REQUEST_POWER_PROFILE_NOMINAL = 0x10,
REQUEST_POWER_PROFILE_SAFE,
REQUEST_MPPT_ENABLE_CHANNEL,
REQUEST_MPPT_DISABLE_CHANNEL,
REQUEST_RAIL_CONTROLLER_ENABLE_RAIL,
REQUEST_RAIL_CONTROLLER_DISABLE_RAIL,
REQUEST_LOGGING_FLUSH_LOGS,
REQUEST_REDUNDANCY_HEALTH,
REQUEST_REDUNDANCY_COMPONENT_STATUS,
REQUEST_REDUNDANCY_FAULT_LIST,
REQUEST_REDUNDANCY_CLEAR_FAULT,
REQUEST_REDUNDANCY_CLEAR_ALL,
// ...
} app_event_id_t;
// ...
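The APP_EVENT_* names published by the power policies app presumably come from combining these local codes with the service UID via the core library's ID-building macro (defined later, in the event-bus section). A sketch of that mapping, with the macro reproduced so the example is self-contained:

```c
#include <stdint.h>

/* The core library's ID-building macro (covered later, in Concept 2). */
#define OSUSAT_BUILD_EVENT_ID(svc_uid, code) \
    (((uint32_t)(svc_uid) << 16) | ((uint32_t)(code) & 0xFFFF))

#define APP_SERVICE_UID 0xA00

/* Local codes from app_event_id_t above. */
#define REQUEST_POWER_PROFILE_NOMINAL 0x10
#define REQUEST_POWER_PROFILE_SAFE    0x11

/* Composite IDs like the APP_EVENT_* names used by the power
 * policies application earlier. */
#define APP_EVENT_REQUEST_POWER_PROFILE_NOMINAL \
    OSUSAT_BUILD_EVENT_ID(APP_SERVICE_UID, REQUEST_POWER_PROFILE_NOMINAL)
#define APP_EVENT_REQUEST_POWER_PROFILE_SAFE \
    OSUSAT_BUILD_EVENT_ID(APP_SERVICE_UID, REQUEST_POWER_PROFILE_SAFE)
```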
So, applications provide the highest level of mission logic & system synchronization--but how is everything glued together?
Glue
At the tippy top of the pyramid is the main entry point, which merges the HAL, services, and applications together:
int main(void) {
// initialize BSP HAL
HAL_Init();
bsp_clock_init();
MX_DMA_Init();
MX_GPIO_Init();
MX_ADC2_Init();
MX_I2C1_Init();
MX_I2C2_Init();
MX_I2C3_Init();
MX_I2C4_Init();
MX_USART1_UART_Init();
MX_USART3_UART_Init();
MX_IWDG_Init();
// initialize event bus
osusat_event_bus_init(event_queue, EVENT_QUEUE_SIZE);
// initialize HAL
hal_time_init();
uart_config_t uart_config = {.baudrate = 115200};
hal_uart_init(UART_PORT_1, &uart_config);
hal_uart_init(UART_PORT_3, &uart_config);
// initialize services
uart_events_init(&usart1_events_service, UART_PORT_1);
uart_events_init(&usart3_events_service, UART_PORT_3);
logging_init(OSUSAT_SLOG_INFO, &usart1_events_service,
&usart3_events_service);
rail_controller_init(&rail_controller);
power_profiles_init(&power_profiles_service, &rail_controller);
mppt_init(&mppt_controller_service);
redundancy_manager_init(&redundancy_manager_service);
// initialize applications
command_handler_init(&command_handler);
power_policies_init(&power_policies);
LOG_INFO(EPS_COMPONENT_MAIN, "Initialization complete");
while (1) {
osusat_event_bus_process();
}
return 0;
}
It handles initialization for the HAL, services, and applications, and then provides the main super-loop that processes and dispatches events.
Overall, the tiered architecture provides both a clean separation between layers and a set of patterns that glue them together into a cohesive system.
Concept 2: Event Driven-ness
It's now established that both services and applications rely on events to communicate with each other instead of direct calls. This provides a few benefits:
- No polling: consumers catch events right as they happen
- No tight coupling: allowing services to stay separated makes maintainability easier
- Clear structuring: the nature of how services interact exposes a template that makes building new services straightforward
Service Architecture
To the last point, having an easily expandable library of events, as well as some core system events (like system ticks) makes each service look similar. Take the battery management service as an example; its interface consists of a few main components:
- A service ID & library of events:
/**
* @brief Service Unique Identifier (16-bit).
* Used to construct unique Event IDs.
* "BA77" = BATT
*/
#define BATTERY_SERVICE_UID 0xBA77
typedef enum {
/**
* @brief Published when a critical fault is detected.
* Payload: battery_status_t (Snapshot at time of failure)
*/
BATTERY_FAULT_DETECTED = 0x10,
/**
* @brief Published when the battery management service passes its
* self-check. Payload: NULL
*/
BATTERY_SELF_CHECK_PASSED,
/**
* @brief Published when the battery management service fails its
* self-check. Payload: failure mode
*/
BATTERY_SELF_CHECK_FAILED,
/**
* @brief Published when voltage drops below critical threshold.
* Payload: float (Current Voltage)
*/
BATTERY_CRITICAL_LOW,
/**
* @brief Published when charging starts or stops.
* Payload: bool (true = charging started, false = stopped)
*/
BATTERY_CHARGING_CHANGE,
/**
* @brief Published when battery reaches 100% SoC.
* Payload: NULL
*/
BATTERY_FULLY_CHARGED,
/**
* @brief Periodic telemetry broadcast (e.g., every 10s or 1 min).
* Payload: battery_status_t
*/
BATTERY_TELEMETRY
} battery_event_id_t;
- State management types:
/**
* @struct battery_status_t
* @brief Snapshot of system battery state
*
* This structure is returned by ::battery_get_status and is
* used internally by the BMS to make decisions.
*/
typedef struct {
float voltage; /**< Current pack voltage in volts */
float current; /**< Pack current in amps (+ = charging, - = discharging) */
float temperature; /**< Average pack temperature in C */
float soc; /**< State of charge (0-100%) */
float soh; /**< State of health estimate (0-100%) */
bool charging; /**< True if charging is currently active */
bool balancing; /**< True if balancing circuits are enabled */
bool protection; /**< True if in battery protection mode (could be due to a
fault, etc.) */
} battery_status_t;
/**
* @struct battery_management_t
* @brief The battery management service
*/
typedef struct {
battery_status_t battery_status; /**< The battery status */
bool initialized; /**< True if the BMS is initialized */
uint32_t tick_counter; /**< Internal counter for update loop */
uint32_t telemetry_tick_counter; /**< Internal counter for telemetry */
} battery_management_t;
- The API
/**
* @brief Initialize the Battery Management Service.
*
* This must be called once at startup before any other BMS functions.
* Initializes internal state, and performs a startup self-check.
*
* @note If called more than once, the internal battery state will be reset.
*
* @param[out] manager The battery manager
*/
void battery_init(battery_management_t *manager);
/**
* @brief Apply charge-control policy.
*
* Enables or disables charging circuits based on SoC, temperature,
* EPS power budget, and safety limits.
*
* @param[in] manager The battery manager
* @param[in] enable True to enable charging, false to disable it
*/
void battery_charge_control(battery_management_t *manager, bool enable);
/**
* @brief Enter battery protection mode.
*
* Used during overvoltage, deep discharge, or
* other critical conditions. May disable EPS rails or charging.
*
* @param[in] manager The battery manager
*/
void battery_protect_mode(battery_management_t *manager);
The events define the interface through which other actors interact with the service, the state types are used by the service to make decisions, and the API allows for a clean initialization flow & direct commands from the app when needed.
Additionally, the implementations look similar as well--each service contains something like:
- Internal event handlers & procedures
#define BATTERY_UPDATE_INTERVAL_TICKS 10
#define TELEMETRY_INTERVAL_CYCLES 600
/**
* @brief System Tick Handler.
* Called automatically by the Event Bus.
*
* @param e The event (SYSTICK).
* @param ctx The context pointer (points to battery_management_t).
*/
static void battery_handle_tick(const osusat_event_t *e, void *ctx);
/**
* @brief Internal update logic (reads sensors).
*/
static void battery_perform_update(battery_management_t *manager);
/**
* @brief Hardware self-check (I2C comms, initial voltage).
*/
static bool battery_run_diagnostics(battery_management_t *manager);
- Initialization
void battery_init(battery_management_t *manager) {
if (manager == NULL) {
return;
}
memset(manager, 0, sizeof(battery_management_t));
bool healthy = battery_run_diagnostics(manager);
if (healthy) {
manager->initialized = true;
osusat_event_bus_publish(BATTERY_EVENT_SELF_CHECK_PASSED, NULL, 0);
} else {
// ...
}
osusat_event_bus_subscribe(EVENT_SYSTICK, battery_handle_tick, manager);
}
Note the subscription to the system tick event. Time in the satellite is treated as just another event, allowing services to schedule their own work without managing timers or delays directly. This is where the service is linked to the event bus.
- Tick handling & state updates
static void battery_handle_tick(const osusat_event_t *e, void *ctx) {
(void)e;
battery_management_t *manager = (battery_management_t *)ctx;
if (manager == NULL || !manager->initialized) {
return;
}
manager->tick_counter++;
if (manager->tick_counter >= BATTERY_UPDATE_INTERVAL_TICKS) {
manager->tick_counter = 0;
battery_perform_update(manager);
}
}
static void battery_perform_update(battery_management_t *manager) {
float voltage = 0; // TODO: replace with real read
manager->battery_status.voltage = voltage;
if (voltage < CRITICAL_BATTERY_VOLTAGE_THRESHOLD &&
!manager->battery_status.protection) {
battery_protect_mode(manager);
osusat_event_bus_publish(BATTERY_EVENT_CRITICAL_LOW, &voltage,
sizeof(float));
}
manager->telemetry_tick_counter++;
if (manager->telemetry_tick_counter >= TELEMETRY_INTERVAL_CYCLES) {
manager->telemetry_tick_counter = 0;
osusat_event_bus_publish(BATTERY_EVENT_TELEMETRY,
&manager->battery_status,
sizeof(battery_status_t));
}
}
// ...
Each service has a clear contract to fulfill: an external event interface, state management, commands, and internal event handling.
Because all events are processed through a simple dispatch loop, ordering and determinism are preserved, which makes system behavior easy to reason about when a fault occurs, and makes debugging easier compared to navigating through deeply nested callbacks.
Not every interaction is modeled as an event-driven flow, though. Events are reserved for asynchronous state changes, notifications, and requests, while direct function calls/commands are still used for synchronous configuration and for one-off commands during init or simple state changes.
In practice, this results in services that all share a common shape: a well-defined external event interface, internal state, and a small set of commands. This consistency lowers the cognitive overhead of adding new functionality, and makes it easier for contributors to think about interactions between services and the broader satellite.
The Mechanism
Now that we've covered how events work in practice, we can look at how they are published, subscribed to, and dispatched under the hood. We have a core library that provides primitives like ring buffers, structured logging (more on this in the traceability section), and, most importantly, the event bus.
Each interaction with the event bus starts at the definition level; services need to define their events and register them with the event bus. This is done through a set of macros that transform local enum values into globally unique IDs:
/**
* @brief Event Identifier Type (32-bit).
*
* Constructed using OSUSAT_BUILD_EVENT_ID().
*/
typedef uint32_t osusat_event_id_t;
/**
* @brief Helper to build a unique ID from a Service UID and Local Code.
*
* @param svc_uid Unique 16-bit Service Identifier (e.g. 0xBA77 for Batt).
* @param code Local enum value (0-65535).
*/
#define OSUSAT_BUILD_EVENT_ID(svc_uid, code) \
(((uint32_t)(svc_uid) << 16) | ((uint32_t)(code) & 0xFFFF))
/**
* @brief Helper to extract the Service UID from an Event ID.
*/
#define OSUSAT_GET_SERVICE_UID(event_id) ((uint16_t)((event_id) >> 16))
/**
* @brief Helper to extract the Local Code from an Event ID.
*/
#define OSUSAT_GET_LOCAL_CODE(event_id) ((uint16_t)((event_id) & 0xFFFF))
/**
* @brief Reserved UID for Core System Events.
*/
#define OSUSAT_SERVICE_UID_SYSTEM 0x0000
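For example, building a battery event ID from the BATTERY_SERVICE_UID shown earlier (0xBA77) and extracting its parts again works like this. The filtering helper at the end is a hypothetical illustration of why extraction is useful (e.g., routing or logging by originating service):

```c
#include <stdint.h>

/* The ID macros from above, reproduced so the example is self-contained. */
#define OSUSAT_BUILD_EVENT_ID(svc_uid, code) \
    (((uint32_t)(svc_uid) << 16) | ((uint32_t)(code) & 0xFFFF))
#define OSUSAT_GET_SERVICE_UID(event_id) ((uint16_t)((event_id) >> 16))
#define OSUSAT_GET_LOCAL_CODE(event_id) ((uint16_t)((event_id) & 0xFFFF))

/* Hypothetical helper: filter events by their originating service. */
int event_is_from_service(uint32_t event_id, uint16_t svc_uid) {
    return OSUSAT_GET_SERVICE_UID(event_id) == svc_uid;
}
```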
Note that the event bus itself also defines a few core system events:
/**
* @brief System Event Codes.
*/
typedef enum {
SYSTEM_SYSTICK = 1, /**< Periodic heartbeat (e.g. 100Hz) */
SYSTEM_INIT_DONE, /**< All services initialized */
SYSTEM_HEARTBEAT /**< Heartbeat event for health monitoring */
} osusat_system_code_t;
#define EVENT_SYSTICK \
OSUSAT_BUILD_EVENT_ID(OSUSAT_SERVICE_UID_SYSTEM, SYSTEM_SYSTICK)
#define EVENT_SYSTEM_INIT \
OSUSAT_BUILD_EVENT_ID(OSUSAT_SERVICE_UID_SYSTEM, SYSTEM_INIT_DONE)
Next is the actual event bus interface itself:
/**
* @struct osusat_event_t
* @brief The event object stored in the queue.
*/
typedef struct {
osusat_event_id_t id; /**< Composite Event ID */
uint8_t payload[OSUSAT_EVENT_MAX_PAYLOAD]; /**< Data copy */
uint8_t payload_len; /**< Valid bytes in payload */
} osusat_event_t;
/**
* @brief Event Handler Callback definition.
*
* @param[in] event Pointer to the event data.
* @param[in] ctx User context pointer registered during subscription.
*/
typedef void (*osusat_event_handler_t)(const osusat_event_t *event, void *ctx);
/** @} */ // end osusat_event_bus_types
/**
* @defgroup osusat_event_bus_api Public API
* @ingroup osusat_event_bus
* @brief External interface for interacting with the Event Bus.
*
* @{
*/
/**
* @brief Initialize the Event Bus.
*
* Configures the internal ring buffer and clears subscribers.
*
* @param[in] queue_storage Pointer to allocated array of event structs.
* @param[in] queue_capacity Number of elements in the storage array.
*/
void osusat_event_bus_init(osusat_event_t *queue_storage,
size_t queue_capacity);
/**
* @brief Subscribe to an event.
*
* Registers a callback to be invoked when the specific Event ID occurs.
*
* @param[in] event_id The Composite ID to listen for.
* @param[in] handler The function to call.
* @param[in] ctx Optional context pointer passed to the handler.
*
* @retval true Subscription added successfully.
* @retval false Subscriber table full (increase OSUSAT_EVENT_MAX_SUBSCRIBERS).
*/
bool osusat_event_bus_subscribe(osusat_event_id_t event_id,
osusat_event_handler_t handler, void *ctx);
/**
* @brief Publish an event to the bus.
*
* Copies the event data into the queue. Safe to call from ISRs.
*
* @param[in] event_id The Composite ID of the event.
* @param[in] payload Pointer to data to copy (can be NULL).
* @param[in] len Length of data (must be <= OSUSAT_EVENT_MAX_PAYLOAD).
*
* @retval true Event queued successfully.
* @retval false Queue full (Event Dropped!).
*/
bool osusat_event_bus_publish(osusat_event_id_t event_id, const void *payload,
size_t len);
/**
* @brief Process the Event Queue.
*
* Pops all pending events and executes their subscribers.
* @warning Must be called from the main loop (Thread Mode).
*/
void osusat_event_bus_process(void);
Importantly, it defines the format in which events are passed to subscribers, as well as how contextual information & payloads are provided to event handlers. Every time an event is published, the implementation pushes it into a queue. On the next iteration of the main loop, the process function drains the queue and dispatches events to their subscribers, which are stored internally.
At a high level, the event bus acts as the system's synchronization mechanism. Events can be published from anywhere, but they are always processed & dispatched in a single place in the main loop, which makes reasoning about the system easy, during both nominal operation & fault scenarios.
This mechanism favors simplicity & predictability over throughput. It remains easy to test & debug, but is sufficient for coordinating behavior between services & applications.
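To make the mechanism concrete, here is a deliberately simplified, host-side model of the publish/drain cycle described above. It uses fixed-size arrays instead of the core library's ring buffer primitive, has no ISR safety, and all names are made up:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_CAP   8
#define MAX_SUBS    8
#define MAX_PAYLOAD 16

typedef struct {
    uint32_t id;
    uint8_t  payload[MAX_PAYLOAD];
    uint8_t  len;
} toy_event_t;

typedef void (*toy_handler_t)(const toy_event_t *e, void *ctx);

static toy_event_t q[QUEUE_CAP];
static size_t q_head, q_tail, q_count;

static struct { uint32_t id; toy_handler_t fn; void *ctx; } subs[MAX_SUBS];
static size_t sub_count;

bool toy_subscribe(uint32_t id, toy_handler_t fn, void *ctx) {
    if (sub_count >= MAX_SUBS) return false; /* table full */
    subs[sub_count].id = id;
    subs[sub_count].fn = fn;
    subs[sub_count].ctx = ctx;
    sub_count++;
    return true;
}

bool toy_publish(uint32_t id, const void *payload, size_t len) {
    if (q_count >= QUEUE_CAP || len > MAX_PAYLOAD) return false; /* dropped! */
    toy_event_t *e = &q[q_tail];
    e->id = id;
    e->len = (uint8_t)len;
    if (payload != NULL) memcpy(e->payload, payload, len);
    q_tail = (q_tail + 1) % QUEUE_CAP;
    q_count++;
    return true;
}

/* Drain the queue in publish order, calling every matching subscriber. */
void toy_process(void) {
    while (q_count > 0) {
        toy_event_t *e = &q[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_count--;
        for (size_t i = 0; i < sub_count; i++) {
            if (subs[i].id == e->id) subs[i].fn(e, subs[i].ctx);
        }
    }
}

static int g_toy_hits;
static void count_handler(const toy_event_t *e, void *ctx) {
    (void)e;
    (void)ctx;
    g_toy_hits++;
}

/* Demo: one subscriber, two published events, one process() pass. */
int toy_demo(void) {
    toy_subscribe(0xBA770010u, count_handler, NULL);
    toy_publish(0xBA770010u, NULL, 0);
    toy_publish(0xBA770010u, NULL, 0);
    toy_process();
    return g_toy_hits;
}
```

The key property this model preserves is that publishing only enqueues--nothing runs until `toy_process()` drains the queue from the main loop, which is what makes ordering deterministic.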
Concept 3: Testing & The Build System
With testability being a core pillar of our architecture, we needed an easy way to simulate hardware components and interactions without relying on physical hardware.
This approach starts with mock implementations of real hardware interactions.
#include "hal_adc_mock.h"
#include <stdint.h>
#include <stdio.h>
#define MAX_ADC_CHANNELS ADC_CHANNEL_MAX
static uint16_t mock_adc_values[MAX_ADC_CHANNELS];
void hal_adc_init(void) {
printf("MOCK: ADC initialized\n");
for (int i = 0; i < MAX_ADC_CHANNELS; i++) {
mock_adc_values[i] = 0;
}
}
uint16_t hal_adc_read(adc_channel_t channel) {
if (channel >= MAX_ADC_CHANNELS) {
printf("MOCK ERROR: ADC channel %d out of bounds\n", channel);
return 0;
}
uint16_t value = mock_adc_values[channel];
printf("MOCK: Reading ADC channel %d => %u\n", channel, value);
return value;
}
void mock_adc_set_value(adc_channel_t channel, uint16_t value) {
if (channel >= MAX_ADC_CHANNELS) {
printf("MOCK ERROR: ADC channel %d out of bounds\n", channel);
return;
}
printf("MOCK: Setting ADC channel %d to %u\n", channel, value);
mock_adc_values[channel] = value;
}
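In a host-side test, the dependency-injection seam looks like this. Below is a condensed version of the mock above plus a hypothetical service-level check built on top of it--the channel name and voltage threshold are made up for illustration:

```c
#include <stdint.h>

/* Condensed ADC mock: same interface the real driver implements. */
typedef enum { ADC_CHANNEL_BATTERY_V, ADC_CHANNEL_MAX } adc_channel_t;

static uint16_t mock_vals[ADC_CHANNEL_MAX];

void mock_adc_set_value(adc_channel_t ch, uint16_t v) { mock_vals[ch] = v; }
uint16_t hal_adc_read(adc_channel_t ch) { return mock_vals[ch]; }

/* Service code is unchanged between targets -- it only ever sees the
 * hal_adc_read() interface, never which implementation backs it. */
int battery_voltage_ok(void) {
    return hal_adc_read(ADC_CHANNEL_BATTERY_V) > 1000; /* made-up threshold */
}
```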
Since each mock uses the same interface, we can use build-time dependency injection to conditionally link the real or mock implementation based on the build target (e.g., building for HITL, x86 architectures, etc.):
if(TARGET_ARCH STREQUAL "ARM")
file(GLOB HAL_SRC
hal/*.c
bsp/hal_driver/Src/*.c
bsp/mcu/Src/*.c
bsp/system/*.c
bsp/startup/*.c
bsp/startup/*.s
)
add_executable(eps_firmware
main.c
${APP_SRC}
${SERVICES_SRC}
${HAL_SRC}
)
target_include_directories(eps_firmware PRIVATE
bsp/cmsis/Include
bsp/cmsis/Device/ST/STM32L4xx/Include
bsp/hal_driver/Inc
bsp/mcu/Inc
bsp/system
hal
app
services
config
)
target_compile_definitions(eps_firmware PRIVATE
STM32L496xx
USE_HAL_DRIVER
)
elseif(BUILD_HITL)
message(STATUS "Building for HITL testing...")
add_definitions(-DHITL)
set(HITL_HAL_MOCKS
mocks/bsp/hal_stubs.c
mocks/hal_adc_mock.c
mocks/hal_gpio_mock.c
mocks/hal_uart_mock.c
mocks/hal_i2c_mock.c
mocks/hal_time_mock.c
mocks/bsp/clock_mock.c
)
add_executable(eps_firmware
main.c
${APP_SRC}
${SERVICES_SRC}
${HITL_HAL_MOCKS}
)
include_directories(
mocks/bsp
hal
mocks
app
services
config
)
endif()
This way, based on the options passed to CMake, we use the correct configuration of mock & real implementations.
Furthermore, to support our testing requirement, we use CTest to run our unit tests:
void test_battery_critical_low(void) {
printf("Running test: %s\n", __func__);
mock_event_bus_reset();
battery_management_t manager;
battery_init(&manager);
mock_event_bus_reset_published(); // clear events from init
// trigger the tick handler enough times to call battery_perform_update
for (int i = 0; i < BATTERY_UPDATE_INTERVAL_TICKS; i++) {
mock_event_bus_trigger(EVENT_SYSTICK, NULL, 0);
}
assert(mock_event_bus_get_published_count() > 0);
bool critical_event_found = false;
for (int i = 0; i < mock_event_bus_get_published_count(); ++i) {
captured_event_t event = mock_event_bus_get_published_event(i);
if (event.id == BATTERY_EVENT_CRITICAL_LOW) {
critical_event_found = true;
}
}
assert(critical_event_found);
assert(manager.battery_status.protection);
printf("Test passed.\n");
}
Tests are written at both the service & driver level--this means we can test not only how drivers behave, but also how events affect the way services run and interact. Furthermore, we have a HITL test harness in the works for application-level testing--when running the app, it stimulates events in the system & causes services to perform work.
Concept 4: The BSP
All of our subsystems are based on STM32 microcontrollers (the EPS uses an STM32L4, the OBC an STM32H7, and so on). While our architecture defines how the system is structured and interacts, it doesn't define how hardware-specific code is written. To supplement this, we introduced a BSP (Board Support Package). Using STM32CubeMX, we generated a project with the correct hardware & peripheral configuration, and brought the generated code into a supporting, linkable format. The file tree looks like:
.
├── cmsis
│ ├── Device
│ │ └── ST
│ │ └── STM32L4xx
│ │ ├── Include
│ │ │ ├── stm32l496xx.h
│ │ │ ├── stm32l4xx.h
│ │ │ └── system_stm32l4xx.h
│ │ ├── License.md
│ │ └── LICENSE.txt
│ └── Include
│ ├── ...
├── hal_driver
│ ├── Inc
│ │ ├── ...
│ └── Src
│ ├── ...
├── mcu
│ ├── Inc
│ │ ├── adc.h
│ │ ├── dma.h
│ │ ├── gpio.h
│ │ ├── i2c.h
│ │ ├── iwdg.h
│ │ ├── main.h
│ │ ├── stm32l4xx_hal_conf.h
│ │ ├── stm32l4xx_it.h
│ │ ├── tim.h
│ │ └── usart.h
│ └── Src
│ ├── adc.c
│ ├── dma.c
│ ├── gpio.c
│ ├── i2c.c
│ ├── iwdg.c
│ ├── stm32l4xx_hal_msp.c
│ ├── stm32l4xx_it.c
│ ├── syscalls.c
│ ├── sysmem.c
│ ├── tim.c
│ └── usart.c
├── startup
│ ├── startup_stm32l496xx.s
│ ├── STM32L496XX_FLASH.ld
│ └── system_stm32l4xx.c
└── system
├── clock.c
├── clock.h
├── dma.c
├── dma.h
├── error.c
└── error.h
Instead of building our firmware around ST's project structure, we brought all of the supporting generated code into our own project and link it as needed.
Earlier, in the entry point, you may have noticed that all of the STM32 HAL components were initialized:
// initialize BSP HAL
HAL_Init();
bsp_clock_init();
MX_DMA_Init();
MX_GPIO_Init();
MX_ADC2_Init();
MX_I2C1_Init();
MX_I2C2_Init();
MX_I2C3_Init();
MX_I2C4_Init();
MX_USART1_UART_Init();
MX_USART3_UART_Init();
MX_IWDG_Init();
Our CMake build system then compiles and links the BSP components:
file(GLOB HAL_SRC
    hal/*.c
    bsp/hal_driver/Src/*.c
    bsp/mcu/Src/*.c
    bsp/system/*.c
    bsp/startup/*.c
    bsp/startup/*.s
)

add_executable(eps_firmware
    main.c
    ${APP_SRC}
    ${SERVICES_SRC}
    ${HAL_SRC}
)

target_include_directories(eps_firmware PRIVATE
    bsp/cmsis/Include
    bsp/cmsis/Device/ST/STM32L4xx/Include
    bsp/hal_driver/Inc
    bsp/mcu/Inc
    bsp/system
    hal
    app
    services
    config
)

target_compile_definitions(eps_firmware PRIVATE
    STM32L496xx
    USE_HAL_DRIVER
)
This setup lets us use the STM32 HAL within our architecture. And because the BSP can be conditionally linked against either mock implementations or the real HAL, we can also build for targets other than Arm Cortex-M--like x86 host machines.
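One way to express that conditional linking is a configure-time switch. The `USE_MOCK_BSP` option and `mocks/` directory below are hypothetical, not our exact build files--this is just a sketch of the pattern:

```cmake
option(USE_MOCK_BSP "Link mock BSP implementations for host builds" OFF)

if(USE_MOCK_BSP)
    # host (x86) build: mock sources stand in for the HAL and startup code
    file(GLOB BSP_SRC mocks/bsp/*.c)
    set(BSP_DEFS "")
else()
    # target build: real HAL, MCU init code, and startup assembly
    file(GLOB BSP_SRC
        bsp/hal_driver/Src/*.c
        bsp/mcu/Src/*.c
        bsp/system/*.c
        bsp/startup/*.c
        bsp/startup/*.s
    )
    set(BSP_DEFS STM32L496xx USE_HAL_DRIVER)
endif()
```

The rest of the build then consumes `BSP_SRC` and `BSP_DEFS` without caring which side of the switch it's on, so the same service & app code builds for both the bench and the board.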
Concept 5: Traceability & Observability
The final pillar of our firmware architecture is making the system observable: being able to easily dissect logs, trace the system through its flow, and debug issues with enough information in hand. This is accomplished in two main parts:
Part 1. Per-Subsystem Structured Logging
As mentioned previously, our core library provides a primitive for structured logging. Within each subsystem's firmware, logging is performed using macros provided by the core lib:
/**
* @brief Main logging macro.
*
* Automatically captures the source line number and formats the message.
*
* @param[in] level Severity level (osusat_slog_level_t).
* @param[in] component Component identifier (uint8_t).
* @param[in] fmt Printf-style format string.
* @param[in] ... Variable arguments for format string.
*
* Example:
* @code
* OSUSAT_SLOG(OSUSAT_SLOG_WARN, EPS_BATTERY, "Voltage low: %dmV", voltage);
* @endcode
*/
#define OSUSAT_SLOG(level, component, fmt, ...) \
osusat_slog_write_internal(level, component, __LINE__, fmt, ##__VA_ARGS__)
/**
* @brief Log a DEBUG level message.
*
* @param[in] component Component identifier.
* @param[in] fmt Printf-style format string.
* @param[in] ... Variable arguments.
*/
#define LOG_DEBUG(component, fmt, ...) \
OSUSAT_SLOG(OSUSAT_SLOG_DEBUG, component, fmt, ##__VA_ARGS__)
/**
* @brief Log an INFO level message.
*
* @param[in] component Component identifier.
* @param[in] fmt Printf-style format string.
* @param[in] ... Variable arguments.
*/
#define LOG_INFO(component, fmt, ...) \
OSUSAT_SLOG(OSUSAT_SLOG_INFO, component, fmt, ##__VA_ARGS__)
/**
* @brief Log a WARN level message.
*
* @param[in] component Component identifier.
* @param[in] fmt Printf-style format string.
* @param[in] ... Variable arguments.
*/
#define LOG_WARN(component, fmt, ...) \
OSUSAT_SLOG(OSUSAT_SLOG_WARN, component, fmt, ##__VA_ARGS__)
/**
* @brief Log an ERROR level message.
*
* @param[in] component Component identifier.
* @param[in] fmt Printf-style format string.
* @param[in] ... Variable arguments.
*/
#define LOG_ERROR(component, fmt, ...) \
OSUSAT_SLOG(OSUSAT_SLOG_ERROR, component, fmt, ##__VA_ARGS__)
/**
* @brief Log a CRITICAL level message.
*
* @param[in] component Component identifier.
* @param[in] fmt Printf-style format string.
* @param[in] ... Variable arguments.
*/
#define LOG_CRITICAL(component, fmt, ...) \
OSUSAT_SLOG(OSUSAT_SLOG_CRITICAL, component, fmt, ##__VA_ARGS__)
Then a subsystem can use it like:
LOG_INFO(EPS_COMPONENT_MAIN, "System health changed to %s", health_str);
Internally, the slog implementation pushes log entries produced by the firmware into a ring buffer for later use.
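The entry layout and names below are illustrative rather than our real slog internals, but a minimal overwrite-oldest ring buffer in that spirit might look like:

```c
#include <stddef.h>
#include <stdint.h>

#define SLOG_RING_CAPACITY 64
#define SLOG_MSG_MAX       48

typedef struct {
    uint8_t level;
    uint8_t component;
    uint16_t line;
    char msg[SLOG_MSG_MAX];
} slog_entry_t;

typedef struct {
    slog_entry_t entries[SLOG_RING_CAPACITY];
    size_t head;  /* next slot to write */
    size_t count; /* valid entries, capped at capacity */
} slog_ring_t;

/* push a finished entry; the oldest entry is overwritten when full */
void slog_ring_push(slog_ring_t *ring, const slog_entry_t *entry) {
    ring->entries[ring->head] = *entry;
    ring->head = (ring->head + 1) % SLOG_RING_CAPACITY;
    if (ring->count < SLOG_RING_CAPACITY)
        ring->count++;
}

/* pop the oldest entry into *out; returns 0 if the ring is empty */
int slog_ring_pop(slog_ring_t *ring, slog_entry_t *out) {
    if (ring->count == 0)
        return 0;
    size_t tail =
        (ring->head + SLOG_RING_CAPACITY - ring->count) % SLOG_RING_CAPACITY;
    *out = ring->entries[tail];
    ring->count--;
    return 1;
}
```

Dropping the oldest entries on overflow is a deliberate choice for a satellite: the freshest logs are usually the ones that explain a fault, and the buffer can never block the firmware that's producing them.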
Part 2. Log Flushing
In flight, no one will be up in space with a laptop plugged into the board's serial monitor. This means that in order to retrieve logs from the satellite for debugging or telemetry purposes, we must downlink them to the ground station. Our slog module handles this by keeping the core implementation generic and leaving the flushing mechanism up to the consumer.
For instance, the EPS's flushing mechanism sends each log over UART to the OBC in a packet:
static void send_log_packet(log_flush_context_t *ctx, bool is_last) {
    // don't flush if we have no UART service connected
    if (ctx->payload_offset == 0 || g_active_uart == NULL ||
        !g_active_uart->initialized) {
        return;
    }

    OSUSatPacket packet = {.version = 1,
                           .destination = OSUSatDestination_OBC,
                           .source = OSUSatDestination_EPS,
                           .message_type = OSUSatMessageType_LOG,
                           .command_id = OSUSatCommonCommand_LOG,
                           .sequence = ctx->sequence,
                           .is_last_chunk = is_last,
                           .payload_len = (uint8_t)ctx->payload_offset,
                           .payload = ctx->payload_buffer};

    uart_events_send_packet(g_active_uart, &packet);
    ctx->payload_offset = 0;
}
Then, when the OBC receives a set of log packets from a subsystem, it can queue them up for downlink and send them down to the ground station.
This is extra powerful combined with our support for mock implementations. During bench testing, it's inconvenient to flush logs all the way to the "OBC" just to read them--we can simply register a mock that flushes with printf & see the logs right on the desktop.
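As a sketch--the entry layout, level table, and function names here are assumptions, not the real slog API--a bench-only printf flush might look like:

```c
#include <stdio.h>

/* assumed shape of a log entry; mirrors whatever the slog core stores */
typedef struct {
    int level; /* index into level_names below */
    int component;
    int line;
    const char *msg;
} slog_entry_t;

static const char *level_names[] = {"DEBUG", "INFO", "WARN", "ERROR",
                                    "CRITICAL"};

/* format one entry into a caller-supplied buffer */
int slog_format_entry(char *buf, size_t n, const slog_entry_t *e) {
    return snprintf(buf, n, "[%s] comp=%d line=%d: %s",
                    level_names[e->level], e->component, e->line, e->msg);
}

/* hypothetical bench-only flush hook: print instead of packetizing to the OBC */
void printf_flush(const slog_entry_t *e) {
    char buf[128];
    slog_format_entry(buf, sizeof buf, e);
    printf("%s\n", buf);
}
```

Since the flush mechanism is just a consumer-provided hook, swapping between this and the UART packet sender is a link-time decision--the slog core never changes.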
Final Thoughts
Designing firmware for a distributed system like a CubeSat presents challenges--complex synchronization, communication, and data flow--that demand an architecture beyond simple scripts. Reliability, testability, observability, maintainability, and the ability for multiple contributors to reason about the system all become priorities in firmware design.
The architecture outlined here aims to address those constraints by implementing a few core ideas:
- Clear separation of concerns via a layered design
- Event-driven communication to reduce coupling and improve determinism
- Build-time dependency injection to enable testing without hardware
- A simple, predictable execution model that favors "debuggability" over throughput
- Structured logging & traceability to make failures diagnosable after the fact
Combined into one architecture, these ideas form a framework that scales well as complexity grows. Services & applications share common shapes, interactions are explicit & traceable through events, and most of the system can be implemented and tested before the hardware arrives. This consistency makes onboarding contributors easier: each component makes sense both on its own and as part of the larger system.
There are tradeoffs, of course--an event bus & super-loop architecture have their drawbacks compared to an RTOS-based system--but for our prototype & student team, this approach has proven to work well.