\section{Design Guidelines}
\label{sec:design_guidelines}

Based on the insights from our model, we propose design guidelines for efficient and safe intermittent systems.
The effectiveness of the guidelines is evaluated using seven benchmarks on the reference system used in Sec.~\ref{sec:detailed_execution_model}. 
We ported five benchmarks from the MiBench~\cite{guthausMiBench2001} benchmark suite and implemented two computation kernels (\emph{matmul} and \emph{conv2d}) commonly used for evaluating intermittent systems in the literature~\cite{kimLACT2024,maengSupporting2019,bhattacharyyaNvMR2022,ganesanWhat2019,akhunovEnabling2023}.

We evaluate two popular existing checkpointing schemes: \emph{static} and \emph{dynamic}.
The static scheme~\cite{ransfordMementos2011,kimLivenessAware2023,kimLACT2024,maengAdaptive2018} inserts checkpoint triggers at every loop latch in the program during compilation.
At runtime, each checkpoint trigger examines $V_{ES}$ and executes a checkpoint only when $V_{ES}$ is below a predefined threshold.
In contrast, the dynamic scheme~\cite{jayakumarQUICKRECALL2014,maengSupporting2019,balsamoHibernus2016,balsamoHibernus2015,kortbeekTimesensitive2020} does not modify the original program code.
Instead, it executes checkpoints via interrupts from the power management system, generated when $V_{ES}$ reaches $V_l$.
All evaluations are conducted with 470$\mu$F of energy storage and an input current of 1mA at 1.9V, unless otherwise stated.

\subsection{Delaying Checkpoint Executions}
\label{sec:delay_checkpoint_execution}

The first design practice we propose is to delay checkpoint executions until the last possible moment.
While this practice is generally regarded as desirable in existing works~\cite{ransfordMementos2011,bhattiHarvOS2017}, it has not been recognized as a critical property.
Under the traditional execution model, early checkpoint execution is often considered acceptable as it allows the system to wake up sooner, incurring only minor costs for initialization and recovery.
For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}, which can be overly pessimistic~\cite{raffeckWoCA2024}.
In contrast, our model reveals that significant energy is wasted each time the system powers off (Sec.~\ref{sec:power_efficiency}), which makes delaying checkpoint executions far more important than previously assumed.
% As a result, the importance of delaying checkpoint executions is greater than previously assumed.

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figs/plot_expr_7_cropped.pdf}
    \caption{Execution times across various checkpoint voltages, normalized to the 3.4V configuration.}
    \label{fig:expr_checkpoint_voltages}
\end{figure}

Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times under the dynamic checkpoint scheme across various checkpoint execution voltages.
A 1100$\mu$F capacitor is used as the energy storage, and the execution times are normalized to the 3.4V case.
The results show that executing checkpoints earlier is significantly less efficient: execution slows down by 1.38$\times$ and 2.45$\times$ in the 3.7V and 4.0V configurations, respectively.
Moreover, the overhead is consistent across all benchmarks since early checkpoint executions directly reduce the energy available for the computing system.
Consequently, delaying checkpoint executions is crucial when designing state-retention techniques.
Achieving this fundamentally depends on accurately predicting imminent power failures, which is the focus of the next section.
% Consequently, it is important to execute as long as possible whenever the system wakes up.
% In the next section, we discuss how this can be implemented in the existing intermittent systems.

\subsection{Using $V_{dd}$ with a Reference Voltage for Checkpoint Signals}
\label{sec:use_vdd_for_checkpoint}

Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a good estimate of the system's remaining execution time.
Instead, we propose using $V_{dd}$ to estimate imminent power-off events more accurately, similar to approaches used in systems without a power management system (Sec.~\ref{sec:related_work}).
Additionally, when obtaining $V_{dd}$, it is important to account for the behavior of the ADC under sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).

For consistent operation of ADCs, we adopt a voltage source with a known value of $V_{ref}$.
On the STM32L5 and MSP430, an internal reference voltage source of 1.2V is available; alternatively, an external voltage reference (e.g., TI LMV431~\cite{texasinstrumentsLMV431}) can be used.
Note that $V_{ref}$ should be lower than the minimum operating voltage of the MCU (e.g., 1.7V), as $V_{ref}$ is generated by regulating $V_{dd}$.
We propose two efficient implementations, $S_{sta}$ and $S_{dyn}$, to accurately detect imminent power-off events in the static and dynamic checkpoint schemes, respectively.

$S_{sta}$ is designed for static checkpoint techniques.
Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$.
This yields a constant value of $\lfloor V_{ref}/V_{dd} \cdot 2^n \rfloor$ when operating at normal voltage, where $n$ is the ADC resolution.
During sub-normal voltage execution, this value increases as $V_{dd}$ decreases, as discussed in Sec.~\ref{sec:sub_normal_execution}.
Given that the target threshold voltage for checkpoint execution is $V_{th}$, software designers can compare the ADC value against $\lfloor V_{ref}/V_{th} \cdot 2^n \rfloor$ to determine whether to execute a checkpoint.
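
As an illustrative calculation with assumed values (a 12-bit ADC, $V_{ref} = 1.2$V, a nominal $V_{dd}$ of 3.3V, and a threshold of $V_{th} = 1.8$V; these numbers are chosen only for illustration and are not necessarily those of the reference system), the ADC reading at normal voltage and the comparison threshold would be
\[
\left\lfloor \frac{1.2}{3.3} \cdot 2^{12} \right\rfloor = 1489,
\qquad
\left\lfloor \frac{1.2}{1.8} \cdot 2^{12} \right\rfloor = 2730.
\]
Under these assumptions, a checkpoint trigger would execute a checkpoint once the ADC reading of $V_{ref}$ reaches or exceeds 2730.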

On the other hand, $S_{dyn}$ utilizes an on-chip comparator, which is available in most modern MCUs including STM32L5 and MSP430.
As $V_{ref}$ is always lower than $V_{dd}$, we use a voltage divider consisting of two resistors, $R1$ and $R2$, to scale $V_{dd}$ and compare it with $V_{ref}$.
Specifically, we configure $R1$ and $R2$ to satisfy $\frac{R2}{R1+R2} \cdot V_{th} = V_{ref}$, so the comparator generates an interrupt when $V_{dd}$ reaches the threshold $V_{th}$.
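
For illustration, rearranging this condition gives the required resistor ratio; the numeric values used here ($V_{ref} = 1.2$V and $V_{th} = 1.8$V) are assumed for the example and are not taken from the reference system:
\[
\frac{R1}{R2} = \frac{V_{th}}{V_{ref}} - 1 = \frac{1.8}{1.2} - 1 = 0.5 .
\]
Any resistor pair with this ratio (e.g., a hypothetical $R1 = 500\,\mathrm{k}\Omega$ and $R2 = 1\,\mathrm{M}\Omega$) would satisfy the condition; larger resistances reduce the static current drawn through the divider.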

% T2 is setup for static checkpoint techniques, which poll the capacitor voltage to determine whether execute checkpoint or not.
% Instead of reading the capacitor voltage, it reads the reference voltage.
% As we discussed in Sec.~\ref{sec:sub_normal_execution}, the voltage remains same while the system executes at normal voltage but the value increases during sub-normal voltage execution.

% \begin{itemize}
%     \item T1 utilizes a on-chip comparator (available both in STM32L5 and MSP430) with a reference voltage.
%     \item T2.
% \end{itemize}

\begin{figure}
    \centering
    \begin{subfigure}{\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_11_cropped.pdf}
        \caption{Static checkpointing with $S_{sta}$.}
        \label{fig:expr_precise_checkpoint_timings_static}
        \vspace{7pt}
    \end{subfigure}
    \begin{subfigure}{\linewidth}
        \includegraphics[width=\textwidth]{figs/plot_expr_10_cropped.pdf}
        \caption{Dynamic checkpointing with $S_{dyn}$.}
        \label{fig:expr_precise_checkpoint_timings_dynamic}
    \end{subfigure}
    \caption{Impact of precise checkpoint timings on the end-to-end execution times.}
    \label{fig:expr_precise_checkpoint_timings}
\end{figure}

Fig.~\ref{fig:expr_precise_checkpoint_timings} shows the average end-to-end execution times of the benchmarks over 30 iterations, comparing the traditional systems with the proposed setups.
Fig.~\ref{fig:expr_precise_checkpoint_timings_static} illustrates the performance of $S_{sta}$, and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} presents the results for $S_{dyn}$.
The error bars indicate the minimum and maximum measured execution times for each benchmark.
The results demonstrate that extending operation at sub-normal voltages significantly improves the execution time in both systems: by 3.04$\times$ with $S_{sta}$ and 2.85$\times$ with $S_{dyn}$.
Furthermore, these improvements are consistent across all benchmarks, regardless of the application characteristics, highlighting the general effectiveness of the proposed setups.

Another advantage of the proposed setups is their simplicity and practical applicability.
Since both setups only modify the method used to detect imminent power failures and leave the checkpoint algorithms unchanged, it is straightforward to apply them to existing techniques.
Furthermore, the proposed setups can reduce the system complexity, as they eliminate the need for communication (e.g., interrupt or access to $V_{ES}$) between the energy storage system and the computing system.

% \subsection{Checkpoint Techniques and Evaluation Methods}
\subsection{On Selecting Hardware Components}

Our model also helps designers select efficient hardware components.
For example, it implies that the operating voltage of peripherals (e.g., external NVMs) is a critical design parameter, often more important than their latency.
% We evaluate this tradeoff by simulating an external FRAM having faster access latency but smaller operating voltage.
We evaluate this tradeoff by simulating two FRAM configurations, F1 and F2, in our reference system.
F1 represents a slower setup that operates down to 2.5V; for this setup, we double the software-configurable wait time for FRAM accesses.
F2 uses the fastest FRAM access parameters, but the system stops operating at 2.8V.

\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figs/plot_expr_12_cropped.pdf}
    \caption{Impact of peripheral operating voltage.}
    \label{fig:expr_peripheral_voltage}
\end{figure}

Fig.~\ref{fig:expr_peripheral_voltage} presents the results.
They show that the operating voltage of peripherals should be considered, even though it can be ignored under the traditional execution model.

Finally, our model highlights the advantages of using smaller decoupling capacitors.
Using larger decoupling capacitors not only increases the ratio of sub-normal voltage operations but also increases the amount of energy discharged during power-offs.
Indeed, we observe that our reference system requires xx\% and xx\% longer on average to execute the benchmarks when xx$\mu$F and xx$\mu$F decoupling capacitors are used, compared to our design with 220$\mu$F.
As a result, using the smallest decoupling capacitors is a good design practice for the efficiency of intermittent systems.

% \begin{figure}
%     \centering
%     \includegraphics[width=\linewidth]{figs/plot_expr_12_cropped.pdf}
%     \caption{Execution times with varying decoupling capacitors.}
%     % \label{fig:expr_checkpoint_voltages}
% \end{figure}

% Power failure injection (soft reset)~\cite{wuIntOS2024,yildizEfficient2023}.