							\section{Detailed Intermittent Execution Model}
\label{sec:detailed_execution_model}

In this section, we describe our execution model and its implications for software design.
In Sec.~\ref{sec:system_description}, we introduce the target architecture and the reference system used for evaluation.
Sec.~\ref{sec:execution_model} presents the proposed execution model, designed based on the key observations from experimental results.
In the following three sections, we discuss how this model affects both the power efficiency and correctness of software design.
Finally, in Sec.~\ref{sec:other_architectures}, we evaluate the effectiveness of our model across systems with different architectural configurations. 

\subsection{System Description}
\label{sec:system_description}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figs/cropped/system.pdf}
    \caption{A typical hardware setup of intermittent systems.}
    \label{fig:hardware_setup}
\end{figure}

A typical intermittent system consists of two main components: a power management system and a computing system, as illustrated in Fig.~\ref{fig:hardware_setup}.
The power management system accumulates the incoming energy in a storage capacitor and supplies a regulated voltage to the computing system.
The computing system comprises an MCU, peripherals, and non-volatile memories (NVMs), and uses the NVMs to retain state across power failures.

This setup includes two notable decoupling capacitors that affect the execution model of intermittent systems.
The first one (C1 in the figure) is placed in the power management system, as voltage regulators require a capacitance larger than a device-specific minimum for stable operation.
In addition, the computing system has its own decoupling capacitor (C2) to stabilize its operating voltage.

Recent studies increasingly explore 32-bit architectures for the computing system~\cite{shihIntermittent2024,wuIntOS2024,kimRapid2024,akhunovEnabling2023,kimLACT2024,kimLivenessAware2023,parkEnergyHarvestingAware2023,kortbeekWARio2022,khanDaCapo2023,barjamiIntermittent2024,songTaDA2024}, as emerging applications on intermittent systems, such as Deep Neural Networks (DNNs)~\cite{houTale2024,yenKeep2023,khanDaCapo2023,gobieskiIntelligence2019,islamEnabling2022,kangMore2022,leeNeuro2019,islamZygarde2020,custodeFastInf2024,barjamiIntermittent2024,songTaDA2024}, demand greater computational capabilities~\cite{bakarProtean2023a,carontiFinegrained2023}.
In this context, we employ a custom-built board featuring a 32-bit ARM Cortex-M33 processor (STM32L5, operating at 16MHz) with 512KB of Ferroelectric RAM (FRAM) as the reference system.
A TI BQ25570-based board is used for the power management system, with power-on and power-off thresholds of 4.9V and 3.4V, respectively.
We empirically select 22uF and 220uF capacitors for C1 and C2, respectively, as these are the minimum capacitor sizes that allow stable checkpoint and recovery.
Sec.~\ref{sec:other_architectures} evaluates the generality of our model on other architectures, such as systems with Magnetic RAM (MRAM) and a 16-bit core (e.g., MSP430).

% In this work, our goal is to model the buffering effects of these capacitors and evaluate their implications on software designs.
% (Recent studies present the need for better computing capability~\cite{bakarProtean2023a})
% For model validation and evaluation, we use a custom-built board equipped with an ARM Cortex-M33 core and 512KB of FRAM.
% Our setup requires XXuF and 220uF capacitors for C1 and C2, respectively, for stable execution of checkpoint and recovery.
% Sec.~\ref{sec:other_architectures} evaluates our model in different architectures.

\subsection{Execution Model}
\label{sec:execution_model}

\begin{figure}
    \centering
    \begin{subfigure}{\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_8a_cropped.pdf}
        \caption{Voltage traces for one power cycle.}
        \label{fig:execution_trace_one_cycle}
        \vspace{5pt}
    \end{subfigure}
    \begin{subfigure}{\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_8b_cropped.pdf}
        \caption{Voltage traces around the first power-on.}
        \label{fig:execution_trace_detailed}
    \end{subfigure}
    \caption{Voltage traces of the energy storage and Vdd.}
    \label{fig:execution_trace}
\end{figure}

To derive a general execution model that captures the effects of decoupling capacitors, we first present a sample measurement from our reference system.
To obtain an operation time of 50ms under a 1.5mA current supply, we use a 470uF capacitor for energy storage.
Fig.~\ref{fig:execution_trace_one_cycle} shows the traces of the energy storage voltage and the MCU operating voltage (Vdd) for one power cycle.
Note that Vdd is maintained by the decoupling capacitors after the current supply from the power management system stops.
The shaded areas represent the intervals in which the system executes the application code.
% Fig.~\ref{fig:execution_trace_one_cycle} shows the trace during one power cycle, and Fig.~\ref{fig:execution_trace_detailed} presents the first execution cycle in more detail.

Fig.~\ref{fig:execution_trace_detailed} presents the first execution cycle in more detail. It reveals several differences between the traditional execution model and the actual operation.
Among them, we highlight three key observations that affect software designers' decisions.

\begin{itemize}
    \item \textbf{O1}: The storage capacitor voltage drops quickly when the system wakes up, as it charges the decoupling capacitors ($t1$--$t2$).
    \item \textbf{O2}: The system keeps executing at sub-normal voltage using the decoupling capacitors, even after the power supply stops ($t4$--$t5$).
    \item \textbf{O3}: The decoupling capacitors discharge while the system is powered off (after $t5$, as shown in Fig.~\ref{fig:execution_trace_one_cycle}).
\end{itemize}

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figs/cropped/detailed_execution_model.pdf}
    \caption{Detailed execution model of intermittent systems.}
    \label{fig:detailed_execution_model}
\end{figure}

% As we discuss in the following sections, all three observations significantly impact the performance of intermittent system designs.
% We propose a detailed execution model which reflects these observations.
Fig.~\ref{fig:detailed_execution_model} shows our detailed execution model, which reflects these key observations.
When the capacitor voltage reaches the power-on threshold, it drops quickly due to the buffering effect (\circled{1}) instead of decreasing gradually.
After initialization (\circled{2}), the system starts to execute at the normal voltage (\circled{3}), e.g., 3.3V.
When the voltage hits the power-off threshold, the power supply stops, but the system continues to execute using the buffered energy (\circled{4}).
Since the voltage of the decoupling capacitors decreases as they discharge, the system executes at sub-normal voltage until the voltage falls to a level at which it can no longer operate (e.g., 2.5V).
% This voltage is known as Brown-Out Reset (BOR) voltage and is typically in a range of 1.7V to 2.5V in modern MCUs~\cite{}.
Finally, the energy remaining in the decoupling capacitors continues to discharge until the next power-on (\circled{5}).

Software designers need to understand this model when designing intermittent systems, especially those targeting small capacitors.
In the following sections, we discuss the impact of this model on software design in more detail.

\subsection{Impact on Power Efficiency}
\label{sec:power_efficiency}

The traditional model implies that the energy consumed between the power-on and power-off thresholds is entirely used by the computing system.
However, our model reveals that considerable energy is spent charging the decoupling capacitors (\textbf{O1}) and is dissipated during power-off periods (\textbf{O3}).
This implies that far less energy may be available for useful computation than the designer expects.
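
As a rough illustration of this gap, consider an idealized estimate that ignores regulator losses, leakage, and energy harvested during execution. The symbols $E_{\mathrm{trad}}$, $E_{C2}$, $C_s$, $V_{\mathrm{on}}$, and $V_{\mathrm{off}}$ are introduced here only for this sketch. The energy that the traditional model attributes to one power cycle of a storage capacitor $C_s$ is
\[
    E_{\mathrm{trad}} = \frac{1}{2} C_s \left( V_{\mathrm{on}}^2 - V_{\mathrm{off}}^2 \right)
    = \frac{1}{2} \cdot 470\,\mu\mathrm{F} \cdot \left( 4.9^2 - 3.4^2 \right) \mathrm{V}^2 \approx 2.93\,\mathrm{mJ},
\]
using the 470uF storage capacitor and the thresholds of the reference system.
Assuming that C2 starts fully discharged and is charged to a Vdd of 3.3V, the energy stored in it is
\[
    E_{C2} = \frac{1}{2} C_2 V_{dd}^2 = \frac{1}{2} \cdot 220\,\mu\mathrm{F} \cdot (3.3\,\mathrm{V})^2 \approx 1.20\,\mathrm{mJ},
\]
which is roughly 41\% of $E_{\mathrm{trad}}$ before any computation takes place.
This is only a first-order estimate under the stated assumptions; the measured breakdown is the authoritative reference.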

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figs/plot_expr_5_cropped.pdf}
    \caption{Distribution of energy consumed in a power cycle for different capacitor sizes (1mA current supply).}
    \label{fig:power_distribution}
\end{figure}

Fig.~\ref{fig:power_distribution} shows the distribution of the energy consumed for each stage of operation within one power cycle, averaged over 50 executions, where 1mA of input current is provided at 1.9V.
The x-axis represents different capacitor sizes, and the line on the secondary axis represents the average operation time of the application code.
The checkpoint is triggered by an interrupt from the power management system~\cite{}, which is generated when the capacitor voltage reaches the power-off threshold (3.4V).
Note that this is the most efficient point for checkpoint execution according to the traditional model.

The results show that significant energy is wasted in the decoupling capacitors.
For example, 60.7\% of the energy is wasted during the power-off period (denoted as \emph{Discharged}) in the 470uF case.
The discharging behavior can be modeled as an RC discharging circuit (i.e., $q = CVe^{-t/(RC)}$), which discharges exponentially.
As a result, the relative cost of discharging is higher when the storage capacitor is small;
in our case, 50\% of the energy is discharged in the first 161ms.
The discharged share decreases as the capacitor size increases, down to 28.5\% in the 1320uF case, which is still not negligible.
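
To make the exponential behavior concrete, the following short derivation assumes a single ideal RC stage, which is a simplification of the actual circuit. Since the energy stored in a capacitor is $E = q^2/(2C)$, the RC model implies
\[
    E(t) = \frac{1}{2} C V^2 e^{-2t/(RC)},
    \qquad
    t_{1/2} = \frac{RC}{2} \ln 2,
\]
where $t_{1/2}$ (a symbol introduced here) is the time after which half of the stored energy is lost.
If the 161ms figure is read as this half-energy time, it would correspond to a time constant of $RC = 2 \cdot 161\,\mathrm{ms} / \ln 2 \approx 0.46\,\mathrm{s}$.
This value is back-calculated from the reported number under the single-RC assumption and is not a measured quantity.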
% The cost is more expensive when the capacitor size is small since the discharge rate follows the RC-discharging circuits.
% While using smaller capacitors shortens the power-off durations, the discharging behavior penalizes them most since RC-discharging circuits discharge exponentially (in our case, 50\% of energy is discharged at the first 161 ms).
% As a result, 60.7\% of power is wasted in 470uF, and the rate decreases as the capacitor size increases, down to 28.5\% in 1320uF case.

Another important observation is the error introduced by the traditional model.
The traditional model assumes that both the \emph{Execution} and \emph{Discharged} energies are used for computation.
This introduces significant errors, up to 5.62x in the 470uF setup.
Likewise, the traditional model predicts that using a 470uF capacitor instead of a 1320uF one costs merely 1.22x in energy efficiency, whereas the actual energy efficiency differs by 4.71x.
% However, our model shows that the actual energy efficiency differs by xx\% in reality, brining xx\% error in the traditional model.
This can significantly mislead system designers when they choose the capacitor size by weighing overall efficiency against reactiveness.
In Sec.~\ref{sec:design_guidelines}, we discuss options to minimize overhead from discharging when designing software techniques.
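
The relation between these figures can be sketched as follows, assuming that the reported error is the ratio of the energy the traditional model counts as useful to the energy actually used for execution. Let $e$ and $d$ (symbols introduced here) denote the \emph{Execution} and \emph{Discharged} shares of the energy in one power cycle. Then
\[
    \frac{e + d}{e} = 5.62, \quad d = 60.7\%
    \quad \Longrightarrow \quad
    e = \frac{60.7\%}{5.62 - 1} \approx 13.1\%
\]
for the 470uF setup.
Applying the same reading to the 1320uF case gives $e + d \approx 1.22 \cdot 73.8\% \approx 90.1\%$ and thus $e \approx 90.1\% - 28.5\% \approx 61.6\%$, i.e., about 4.7x the execution share of the 470uF setup, in line with the 4.71x reported above up to rounding.
These shares are derived from the reported ratios under the stated assumption rather than read from the measurements.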

% More importantly, this wasted energy is expected to be used for the computation in traditional execution model, as all the energy except for the initialization and checkpoint/recovery is expected to be used in computations.
% It brings significant errors between the two models in available energy for the execution.
% In 470uF case, the actual energy efficiency (Execution) and the expectation from the traditional model (Execution and Discharged) differs by 4.99 times.

% (Limitations of power failure injection and simulation based evaluations).

% In Sec.~\ref{sec:design_guidelines}, we discuss our guidelines to maximize power efficiency with software-level designs.

\subsection{Impact on Predicting Power Failures}
\label{sec:predicting_power_failures}

According to the traditional model, the system state should be saved to NVM before the power-off threshold is reached, as the system halts at this point.
In contrast, our model shows that the system may continue to operate afterward using the energy stored in the decoupling capacitors (\textbf{O2}).
Since modern MCUs can operate over a range of supply voltages (e.g., from 1.7V to 3.6V in STM32L5 and MSP430), the computing system keeps running until the voltage of the decoupling capacitors reaches the minimum operating voltage.
% Modern MCUs can operate on a range of supply voltages (e.g., from 1.7V to 3.6V for STM32L5 and MSP430).
% Since the voltage of decoupling capacitors decreases as the discharge, the computing system is executed until the voltage reaches the minimum operating voltage.
% While the voltage of decoupling capacitors decreases as they discharge, the computing system operates since modern MCUs can operate on a range of supply voltages (e.g., from 1.7V to 3.6V for STM32L5 and MSP430).
As a result, the energy storage voltage is a poor estimate of the remaining time for which the system can execute.

\begin{figure}
    \centering
    \begin{subfigure}{\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_6a_cropped.pdf}
        \caption{Input current = 1mA.}
        \label{fig:sub_voltage_execution_1mA}
        \vspace{5pt}
    \end{subfigure}
    \begin{subfigure}{\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_6b_cropped.pdf}
        \caption{Input current = 3mA.}
        \label{fig:sub_voltage_execution_3mA}
    \end{subfigure}
    \caption{Ratio of sub-voltage operations in total execution time.}
    \label{fig:sub_voltage_execution}
\end{figure}

% Modern MCUs can operate on wide range of operating voltages (e.g., from 1.7V to 3.6V for STM32L5 and MSP430).

Fig.~\ref{fig:sub_voltage_execution} shows the ratio of the time executed at sub-normal voltage to the total execution time, averaged over 30 measurements.
The x-axis shows the different capacitor sizes, and the colors represent the voltages at which the system stops operating.
We evaluate voltages ranging from 1.7V to 2.5V since not all components in the computing system may operate at the lowest voltage (Sec.~\ref{sec:sub_normal_execution}).
We also present two cases with input currents of 1mA (Fig.~\ref{fig:sub_voltage_execution_1mA}) and 3mA (Fig.~\ref{fig:sub_voltage_execution_3mA}) to evaluate the impact of input power.

The figure shows that a significant portion of MCU operation occurs at sub-normal voltage.
For example, when a 470uF capacitor is used at 1mA input current (Fig.~\ref{fig:sub_voltage_execution_1mA}), 82.8\% of the computation is executed \emph{after} the power-off threshold.
The ratio decreases when the system powers off earlier (reduced sub-voltage operation time) or when the input current increases (longer operation time at normal voltage).
Capacitors below 1000uF are the main focus of this paper.

These ratios translate directly into the inefficiency of systems designed around the traditional model.
For example, in the 470uF case with 1mA input current, a system that executes its checkpoint at the power-off threshold operates for 16.3ms, although it could operate 29.4ms longer if it executed the checkpoint at 2.5V.
At the next power-on, the decoupling capacitors are discharged to similar voltages in either case, as capacitors discharge exponentially (Sec.~\ref{sec:power_efficiency}).
As a result, failing to execute at sub-normal voltage introduces a significant power efficiency overhead.
% Although early checkpoint execution may save some energy in decoupling capacitors, the saved energy is not preserved as discussed in Sec.~\ref{sec:power_efficiency}.
In Sec.~\ref{sec:design_guidelines}, we validate this aspect and propose a method to predict the power-off time more accurately.
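
The 470uF example above can be worked through explicitly. The share of the execution time that lies after the power-off threshold, and the resulting gain in operation time, are
\[
    \frac{29.4\,\mathrm{ms}}{16.3\,\mathrm{ms} + 29.4\,\mathrm{ms}} \approx 64\%,
    \qquad
    \frac{16.3\,\mathrm{ms} + 29.4\,\mathrm{ms}}{16.3\,\mathrm{ms}} \approx 2.8.
\]
As a first-order sketch, and assuming a constant load current $I_{\mathrm{load}}$ drawn from a total decoupling capacitance $C_d$ (both symbols are introduced here for illustration), the sub-voltage operation time can be approximated as
\[
    t_{\mathrm{sub}} \approx \frac{C_d \left( V_{dd} - V_{\mathrm{min}} \right)}{I_{\mathrm{load}}},
\]
where $V_{\mathrm{min}}$ is the voltage at which the system stops operating.
The actual load current varies with voltage and workload, so this expression only indicates the trend: a lower $V_{\mathrm{min}}$ or a larger $C_d$ lengthens the sub-voltage phase.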

\subsection{Impact of Sub-normal Voltage Execution}
\label{sec:sub_normal_execution}

The traditional model leads software designers to assume that the system executes at a stable voltage.
However, the majority of execution may happen after the power-off threshold at sub-normal voltage (\textbf{O2}), as discussed in Sec.~\ref{sec:predicting_power_failures}.
Software designers need to be aware of this since peripherals and analog components may behave differently at sub-normal voltage.

The two most critical examples are Analog-to-Digital Converters (ADCs) and external NVMs.
Both play an important role in checkpointing and are likely to be used at sub-normal voltage: ADCs are often used to estimate the power-off time by reading the capacitor voltage, and NVMs have to store the checkpoint data safely.
Incorrect operation of these components may lead to unsafe or incomplete checkpoints.

\begin{figure}
    \centering
    \begin{subfigure}{0.45\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_2_cropped.pdf}
        \caption{Analog-Digital Converter.}
        \label{fig:adc_error}
    \end{subfigure}
    \hfill
    \begin{subfigure}{0.52\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_3_cropped.pdf}
        \caption{External FRAM.}
        \label{fig:fram_drror}
    \end{subfigure}
    \caption{Incorrectly functioning components at sub-normal voltage.} 
    \label{fig:adc_and_fram_error}
\end{figure}

Fig.~\ref{fig:adc_error} shows the behavior of the ADC at sub-normal voltage.
Because the STM32L5 uses Vdd as the ADC reference voltage, sampling at sub-normal Vdd yields inflated readings.
Fig.~\ref{fig:fram_drror} shows the error rate of the FRAM at different voltages.
In our case, the FRAM does not operate correctly when the supply voltage is below 2.4V.
When the system is configured to execute down to the lowest MCU operating voltage, which is the default in STM32L5 and cannot be changed in MSP430, executing a checkpoint below the safe voltage corrupts the checkpoint data.
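
The ADC effect can be illustrated with an idealized conversion model, assuming an $N$-bit ADC referenced to Vdd and software that converts codes back to volts using a nominal reference of 3.3V. The symbols $D$, $V_{\mathrm{in}}$, and $V_{\mathrm{est}}$ are introduced here only for this sketch. The output code and the voltage estimated by software are
\[
    D \approx \frac{V_{\mathrm{in}}}{V_{dd}} \left( 2^N - 1 \right),
    \qquad
    V_{\mathrm{est}} = \frac{D}{2^N - 1} \cdot 3.3\,\mathrm{V} \approx V_{\mathrm{in}} \cdot \frac{3.3\,\mathrm{V}}{V_{dd}}.
\]
At $V_{dd} = 2.5\,\mathrm{V}$, for example, the estimate is inflated by a factor of $3.3 / 2.5 = 1.32$, so a system that reads the capacitor voltage in this way would overestimate the remaining energy unless it compensates for the actual reference voltage.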

\subsection{Sensitivity to Architectural Designs}
\label{sec:other_architectures}

\begin{table}[]
    \centering
    \caption{Architectures for generality evaluation}
    \label{tab:architectures}
    \renewcommand{\arraystretch}{0.9} % Reduce vertical spacing
    \setlength{\tabcolsep}{3pt} % Reduce horizontal spacing
    \resizebox{0.95\columnwidth}{!}{%
    \begin{tabular}{@{}cccccccc@{}}
    \toprule
    \multirow{2}{*}{} & \multirow{2.5}{*}{Core} & \multirow{2.5}{*}{\begin{tabular}[c]{@{}c@{}}Core\\ Freq.\end{tabular}} & \multicolumn{3}{c}{Capacitance (uF)} & \multirow{2.5}{*}{Current} & \multirow{2.5}{*}{Memory}                                   \\ \cmidrule(lr){4-6}
                      &                       &                                                                       & C1       & C2        & Storage       &                          &                                                           \\ \midrule
    A1              & STM32L5               & 16MHz                                                                 & 22       & 220       & 1,320         & 3mA                      & \begin{tabular}[c]{@{}c@{}}MRAM\\ (off-chip)\end{tabular} \\
    A2            & MSP430FR5994          & 8MHz                                                                  & 22       & 10        & 40            & 100uA                    & \begin{tabular}[c]{@{}c@{}}FRAM\\ (on-chip)\end{tabular}  \\ \bottomrule
    \end{tabular}%
    }
    \end{table}

To evaluate the generality of our model, we employ two additional architectural setups.
Table~\ref{tab:architectures} lists their detailed parameters.
A1 is the same setup as the reference system but with MRAM (Everspin MR5A16ACYS35).
We evaluate this setup because MRAM is also gaining attention as a next-generation NVM.
The second target, A2, is the MSP430, which has been the most popular 16-bit platform in intermittent systems research.
For both systems, we set the architectural parameters to obtain an operation time of around 50ms.

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figs/plot_expr_9_cropped.pdf}
    \caption{Energy breakdown and the ratio of sub-voltage operations in different architectures.}
    \label{fig:other_architectures}
\end{figure}

Fig.~\ref{fig:other_architectures} shows the results at different power-off voltages.
The bar on the left shows the energy breakdown in one power cycle, and the one on the right represents the ratio of the execution time spent at sub-voltage.
The most noticeable difference is the share of energy consumed for ramp-up and init.
While A1 consumes 63.4\% of the energy at this stage on average, A2 consumes only 5.6\%.
This is because A1 shows a larger leakage current due to the external MRAM, which consumes more current than FRAM in our case.
However, both architectures show high sub-voltage execution rates, up to 70.1\% in A2.
In addition, the discharged energy accounts for a considerable portion in both A1 (31.4\%) and A2 (52.0\%) at the 3.3V configuration, which represents techniques based on the traditional model.
In summary, the evaluation reveals that the buffering effect of the system's capacitance and its implications generalize to other systems.