\section{Design Guidelines}
\label{sec:design_guidelines}

Based on the insights from our model, we propose design guidelines to implement efficient intermittent systems.
The effectiveness of these guidelines is evaluated using seven benchmarks on the reference system used in Sec.~\ref{sec:detailed_execution_model}. 
We ported five benchmarks from the MiBench~\cite{guthausMiBench2001} benchmark suite and implemented two computation kernels (\emph{matmul} and \emph{conv2d}) commonly used in the evaluation of intermittent systems in the literature~\cite{kimLACT2024,maengSupporting2019,bhattacharyyaNvMR2022,ganesanWhat2019,akhunovEnabling2023}.

We evaluate two popular existing checkpointing schemes: \emph{static} and \emph{dynamic}.
The static scheme~\cite{ransfordMementos2011,kimLivenessAware2023,kimLACT2024,maengAdaptive2018} inserts checkpoint triggers at every loop latch in the program during compilation.
At runtime, each checkpoint trigger examines $V_{ES}$ and executes a checkpoint only when it is below a predefined threshold.
In contrast, the dynamic scheme~\cite{jayakumarQUICKRECALL2014,maengSupporting2019,balsamoHibernus2016,balsamoHibernus2015,kortbeekTimesensitive2020} does not modify the original program code.
Instead, it executes checkpoints via interrupts from the power management system, generated when $V_{ES}$ reaches $V_l$.
All evaluations are conducted with a 470$\mu$F energy storage and 1mA of input current at 1.9V, unless otherwise stated.

\subsection{Delaying Checkpoint Executions}
\label{sec:delay_checkpoint_execution}

The first design practice we propose is to delay checkpoint executions until the last possible moment.
While this practice is generally regarded as desirable in existing works~\cite{ransfordMementos2011,bhattiHarvOS2017}, it has not been recognized as a critical property.
Under the traditional execution model, early checkpoint execution is often considered acceptable as it allows the system to wake up sooner, incurring only minor costs for initialization and recovery.
For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}, which can be overly pessimistic~\cite{raffeckWoCA2024}.
In contrast, our model reveals that significant energy is wasted each time the system powers off (Sec.~\ref{sec:power_efficiency}), which makes delaying checkpoint executions more important than previously assumed.
% As a result, the importance of delaying checkpoint executions is greater than previously assumed.

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figs/plot_expr_7_cropped.pdf}
    \caption{Execution times across various checkpoint voltages, normalized to the 3.4V configuration.}
    \label{fig:expr_checkpoint_voltages}
\end{figure}

Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times under the dynamic checkpointing scheme across various checkpoint execution voltages.
A 1100$\mu$F capacitor is used as the energy storage, and the execution times are normalized to the 3.4V configuration.
The results show that executing checkpoints earlier substantially increases execution time: by 1.38x and 2.45x in the 3.7V and 4.0V configurations, respectively.
Moreover, the overhead is consistent across all benchmarks since early checkpoint executions directly reduce the energy available for the computing system.
Consequently, delaying checkpoint executions is crucial when designing state-retention techniques.
Achieving this fundamentally depends on accurately predicting imminent power failures, which is the focus of the next section.
% Consequently, it is important to execute as long as possible whenever the system wakes up.
% In the next section, we discuss how this can be implemented in the existing intermittent systems.

\subsection{Using $V_{dd}$ with a Reference Voltage for Checkpoint Signals}
\label{sec:use_vdd_for_checkpoint}

Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a good estimator of the system's remaining execution time.
Instead, we propose using $V_{dd}$ to predict imminent power-off events more accurately, similar to the approaches used in systems without a power management system (Sec.~\ref{sec:related_work}).
Our setups are designed to work below the normal $V_{dd}$ by accounting for the behavior of the ADC under sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
% Additionally, when obtaining $V_{dd}$, it is important to account for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).

For consistent operation of ADCs, we adopt a voltage source with a known value of $V_{ref}$.
In STM32L5 and MSP430, an internal reference voltage source of 1.2V is available; alternatively, an external voltage reference (e.g., TI LMV431~\cite{texasinstrumentsLMV431}) can be used.
Note that $V_{ref}$ should be lower than the minimum operating voltage of the MCU (e.g., 1.7V), as $V_{ref}$ is generated by regulating $V_{dd}$.
We propose two efficient implementations, $S_{sta}$ and $S_{dyn}$, to accurately detect the imminent power-off events in static and dynamic checkpoint schemes, respectively.

$S_{sta}$ is designed for static checkpoint techniques.
Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$. 
When operating at normal voltage, this yields a constant value of $\lfloor V_{ref}/V_{dd} \cdot 2^n \rfloor$, where $n$ is the ADC resolution in bits.
During sub-normal voltage executions, this value increases as $V_{dd}$ decreases, as discussed in Sec.~\ref{sec:sub_normal_execution}.
As a result, given a target threshold voltage $V_{th}$ for checkpoint execution, software designers can compare the ADC value against $\lfloor V_{ref}/V_{th} \cdot 2^n \rfloor$ and execute a checkpoint once the reading reaches this value.
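
As a hedged worked example, assume a 12-bit ADC ($n = 12$), the internal reference $V_{ref} = 1.2$V, a normal supply voltage of $V_{dd} = 3.3$V, and a threshold of $V_{th} = 1.8$V; the ADC resolution, supply voltage, and threshold are chosen here only for illustration.
The reading at normal voltage is then
\[
\left\lfloor \frac{V_{ref}}{V_{dd}} \cdot 2^{n} \right\rfloor = \left\lfloor \frac{1.2}{3.3} \cdot 4096 \right\rfloor = 1489,
\]
and the comparison value for the checkpoint trigger is
\[
\left\lfloor \frac{V_{ref}}{V_{th}} \cdot 2^{n} \right\rfloor = \left\lfloor \frac{1.2}{1.8} \cdot 4096 \right\rfloor = 2730.
\]
Under these assumptions, the reading stays at 1489 during normal-voltage operation and rises to 2730 only when $V_{dd}$ has dropped to about $V_{th}$.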

On the other hand, $S_{dyn}$ utilizes an on-chip comparator, which is available in most modern MCUs including STM32L5 and MSP430.
As $V_{ref}$ is always lower than $V_{dd}$, we use a voltage divider consisting of two resistors, $R1$ and $R2$, to scale $V_{dd}$ and compare it with $V_{ref}$.
Specifically, we configure $R1$ and $R2$ to satisfy $\frac{R2}{R1+R2} \cdot V_{th} = V_{ref}$, so that the comparator generates an interrupt when $V_{dd}$ falls to the threshold $V_{th}$.
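
For illustration only, take $V_{ref} = 1.2$V and a hypothetical threshold of $V_{th} = 1.8$V.
The divider ratio then follows directly from the condition above:
\[
\frac{R2}{R1+R2} = \frac{V_{ref}}{V_{th}} = \frac{1.2}{1.8} = \frac{2}{3}
\quad\Longrightarrow\quad
R1 = \frac{R2}{2},
\]
which is met by, e.g., $R1 = 1$M$\Omega$ and $R2 = 2$M$\Omega$.
These resistor values are assumptions of this example; at an assumed supply of 3.3V they draw $V_{dd}/(R1+R2) = 3.3/(3 \times 10^{6}) = 1.1\mu$A, which suggests that large resistances help keep the static current of the divider small.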

% T2 is setup for static checkpoint techniques, which poll the capacitor voltage to determine whether execute checkpoint or not.
% Instead of reading the capacitor voltage, it reads the reference voltage.
% As we discussed in Sec.~\ref{sec:sub_normal_execution}, the voltage remains same while the system executes at normal voltage but the value increases during sub-normal voltage execution.

% \begin{itemize}
%     \item T1 utilizes a on-chip comparator (available both in STM32L5 and MSP430) with a reference voltage.
%     \item T2.
% \end{itemize}

\begin{figure}
    \centering
    \begin{subfigure}{\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_11_cropped.pdf}
        \caption{Static checkpointing with $S_{sta}$.}
        \label{fig:expr_precise_checkpoint_timings_static}
        \vspace{3pt}
    \end{subfigure}
    \begin{subfigure}{\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_10_cropped.pdf}
        \caption{Dynamic checkpointing with $S_{dyn}$.}
        \label{fig:expr_precise_checkpoint_timings_dynamic}
    \end{subfigure}
    \caption{Impact of precise checkpoint timing on end-to-end execution times.}
    \label{fig:expr_precise_checkpoint_timings}
\end{figure}

Fig.~\ref{fig:expr_precise_checkpoint_timings} shows the average end-to-end execution times of the benchmarks over 30 iterations, comparing the traditional systems with the proposed setups.
Fig.~\ref{fig:expr_precise_checkpoint_timings_static} illustrates the performance of $S_{sta}$ and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} presents the result for $S_{dyn}$.
The error bars indicate the minimum and maximum measured execution times for each benchmark.
The results clearly demonstrate that extending the operation at sub-normal voltages significantly improves the execution time in both systems: by 3.04x with $S_{sta}$ and 2.85x with $S_{dyn}$.
Furthermore, these improvements are consistent across all benchmarks, regardless of the application characteristics, highlighting the general effectiveness of the proposed setups.

Another advantage of the proposed setups is their simplicity and practical applicability.
Since both setups only modify the method used to detect imminent power failures and leave the checkpoint algorithms unchanged, it is straightforward to apply them to existing techniques.
Furthermore, the proposed setups can reduce the system complexity, as they eliminate the need for communication between the energy storage system and the computing system (e.g., interrupt or access to $V_{ES}$).

% \subsection{Checkpoint Techniques and Evaluation Methods}
\subsection{On Selecting Hardware Components}

Our model also helps designers select efficient hardware components across various parameters.
For example, it reveals that the operating voltage of peripherals (e.g., external NVMs) is a critical design consideration (Sec.~\ref{sec:sub_normal_execution}), often more important than other factors such as latency.
% We evaluate this tradeoff by simulating an external FRAM having faster access latency but smaller operating voltage.
To evaluate this tradeoff, we simulate two FRAM configurations, F1 and F2, in our reference system.
F1 represents a slower setup capable of operating down to 2.5V, achieved by doubling the software-configurable wait time for FRAM accesses.
F2 uses the lowest access latency, but the system stops operating at 2.8V.

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figs/plot_expr_12_cropped.pdf}
    \caption{Impact of peripheral operating voltage.}
    \label{fig:expr_peripheral_voltage}
\end{figure}

Fig.~\ref{fig:expr_peripheral_voltage} presents the execution times of the benchmarks for the two configurations in $S_{dyn}$, averaged over 20 runs.
Despite its doubled latency, F1 completes the workloads 1.46x faster on average, with consistent improvements across all benchmarks.
These results suggest that using a slower FRAM that operates down to 1.8V (e.g.,~\cite{fujitsuMB85R4M2T}) could considerably improve the performance of our reference system.
This example clearly shows that operating voltage, often overlooked in the traditional execution model, should be considered a critical design parameter.

Finally, our model highlights the advantages of using smaller decoupling capacitors.
Larger buffers not only increase the ratio of sub-normal voltage operations but also raise the amount of energy discharged during power-offs.
Indeed, in our reference system with $C_{ES} = 1100\mu$F, we observe that it takes 1.18x and 1.36x longer to complete the benchmarks when 440$\mu$F and 660$\mu$F capacitors are used as C2, respectively, compared to our setup with a 220$\mu$F capacitor.
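
The second effect can be sketched with the standard capacitor energy formula.
As a simplification, assume that C2 is at a voltage $V_{off}$ when the system powers off and that the energy remaining in it is entirely lost; $V_{off}$ and its value below are assumptions of this example and not measurements.
The energy discharged per power-off is then
\[
E_{\mathrm{C2}} = \frac{1}{2} \cdot \mathrm{C2} \cdot V_{off}^{2},
\]
which grows linearly with the capacitance.
For instance, with $V_{off} = 1.7$V, this gives approximately 0.318mJ, 0.636mJ, and 0.954mJ for 220$\mu$F, 440$\mu$F, and 660$\mu$F capacitors, respectively.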
% As a result, it is a good design practice to use the smallest decoupling capacitors for efficiency of intermittent systems.

% \begin{figure}
%     \centering
%     \includegraphics[width=\linewidth]{figs/plot_expr_12_cropped.pdf}
%     \caption{Execution times with varying decoupling capacitors.}
%     % \label{fig:expr_checkpoint_voltages}
% \end{figure}

% Power failure injection (soft reset)~\cite{wuIntOS2024,yildizEfficient2023}.