|
|
@@ -6,10 +6,11 @@ The effectiveness of these guidelines is evaluated using seven benchmarks on the
|
|
|
We ported five benchmarks from the MiBench~\cite{guthausMiBench2001} benchmark suite and implemented two computation kernels (\emph{matmul} and \emph{conv2d}) commonly used in the evaluation of intermittent systems in the literature~\cite{kimLACT2024,maengSupporting2019,bhattacharyyaNvMR2022,ganesanWhat2019,akhunovEnabling2023}.
|
|
|
|
|
|
We evaluate two popular existing checkpointing schemes: \emph{static} and \emph{dynamic}.
|
|
|
-The static scheme~\cite{ransfordMementos2011,kimLivenessAware2023,kimLACT2024,maengAdaptive2018} inserts checkpoint triggers at every loop latch in the program during compilation.
|
|
|
+In \emph{static}, checkpoint triggers are inserted at every loop latch in the program during compilation~\cite{ransfordMementos2011,kimLivenessAware2023,kimLACT2024,maengAdaptive2018}.
|
|
|
At runtime, checkpoint triggers examine $V_{ES}$ and execute a checkpoint only when it is below a predefined threshold.
|
|
|
-In contrast, the dynamic scheme~\cite{jayakumarQUICKRECALL2014,maengSupporting2019,balsamoHibernus2016,balsamoHibernus2015,kortbeekTimesensitive2020} does not modify the original program code.
|
|
|
+In contrast, \emph{dynamic}~\cite{jayakumarQUICKRECALL2014,maengSupporting2019,balsamoHibernus2016,balsamoHibernus2015,kortbeekTimesensitive2020} does not modify the original program code.
|
|
|
Instead, it executes checkpoints via interrupts from the power management system, generated when $V_{ES}$ reaches $V_l$.
|
|
|
+These schemes are considered since most checkpoint techniques exploit $V_{ES}$ by either actively polling it (as in \emph{static}) or receiving a signal (as in \emph{dynamic}).
|
|
|
All the evaluations are conducted with 470uF energy storage and 1mA of input current at 1.9V, unless otherwise stated.
|
|
|
|
|
|
\subsection{Delaying Checkpoint Executions}
|
|
|
@@ -17,9 +18,10 @@ All the evaluations are conducted with 470uF energy storage and 1mA of input cur
|
|
|
|
|
|
The first design practice we propose is to delay checkpoint executions until the last possible moment.
|
|
|
While this practice is generally regarded as desirable in existing works~\cite{ransfordMementos2011,bhattiHarvOS2017}, it has not been recognized as a critical property.
|
|
|
-Under the traditional execution model, early checkpoint execution is often considered acceptable as it allows the system to wake up sooner, incurring only minor costs for initialization and recovery.
|
|
|
-For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}, which can be overly pessimistic~\cite{raffeckWoCA2024}.
|
|
|
-On the other hand, our model reveals that significant energy is wasted each time the system powers off (Sec.~\ref{sec:power_efficiency}), highlighting the impact of delaying checkpoint executions.
|
|
|
+Under the traditional execution model, early checkpoint execution is often considered acceptable as it makes the system wake up sooner, incurring only minor costs for initialization and recovery.
|
|
|
+For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}.
|
|
|
+% For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}, which can be overly pessimistic~\cite{raffeckWoCA2024}.
|
|
|
+In contrast, our model reveals that significant energy is wasted each time the system powers off (Sec.~\ref{sec:power_efficiency}).%, highlighting the impact of delaying checkpoint executions.
|
|
|
% As a result, the importance of delaying checkpoint executions is greater than previously assumed.
|
|
|
|
|
|
\begin{figure}
|
|
|
@@ -29,11 +31,16 @@ On the other hand, our model reveals that significant energy is wasted each time
|
|
|
\label{fig:expr_checkpoint_voltages}
|
|
|
\end{figure}
|
|
|
|
|
|
-Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times in dynamic checkpoint scheme, across various checkpoint execution voltages.
|
|
|
-A 1100uF capacitor is used as an energy storage and the execution times are normalized to the 3.4V configuration.
|
|
|
-The results show that executing checkpoints earlier is significantly inefficient: by 1.38x and 2.45x in 3.7V and 4.0V configurations, respectively.
|
|
|
+We evaluate the impact of delaying checkpoint executions in \emph{dynamic}, by varying the interrupt voltage.
|
|
|
+A 1100uF capacitor is used for $C_{ES}$.
|
|
|
+% Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times in dynamic checkpoint scheme, across various checkpoint execution voltages.
|
|
|
+% A 1100uF capacitor is used for $C_{ES}$ and the execution times are normalized to the 3.4V configuration.
|
|
|
+Fig.~\ref{fig:expr_checkpoint_voltages} presents the average execution times of the benchmarks over 20 runs, normalized to the 3.4V configuration.
|
|
|
+The results show that, contrary to existing expectations, executing checkpoints earlier is significantly inefficient: execution is slower by 1.38x in the 3.7V setup and by 2.45x in the 4.0V setup, on average.
|
|
|
Moreover, the overhead is consistent across all benchmarks since early checkpoint executions directly reduce the energy available for the computing system.
|
|
|
-Consequently, delaying checkpoint executions is crucial when designing state-retention techniques.
|
|
|
+% Consequently, to design efficient checkpoint techniques, it important to minimize the margin between checkpoint execution and the power-off.
|
|
|
+Consequently, for maximum power efficiency, checkpoint techniques should be able to minimize the margin between the checkpoint execution and the power-off.
|
|
|
+% Consequently, delaying checkpoint executions is crucial when designing state-retention techniques.
|
|
|
Achieving this fundamentally depends on accurately predicting imminent power failures, which is the focus of the next section.
|
|
|
% Consequently, it is important to execute as long as possible whenever the system wakes up.
|
|
|
% In the next section, we discuss how this can be implemented in the existing intermittent systems.
|
|
|
@@ -41,20 +48,25 @@ Achieving this fundamentally depends on accurately predicting imminent power fai
|
|
|
\subsection{Using $V_{dd}$ with a Reference Voltage for Checkpoint Signals}
|
|
|
\label{sec:use_vdd_for_checkpoint}
|
|
|
|
|
|
-Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a good estimate for the system's remaining execution time.
|
|
|
-Instead, we propose using $V_{dd}$ to more accurately estimate the imminent power-off events, similar to approaches used in works without power management system (Sec.~\ref{sec:related_work}).
|
|
|
-Our setups are designed to work below the normal $V_{dd}$ by accounting for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
|
|
|
+Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a reliable estimate for the system's remaining execution time and that low $V_{dd}$ is the direct cause of power-off.
|
|
|
+Based on this insight, we propose using $V_{dd}$ to detect imminent power failures more accurately, as in works without a power management system (Sec.~\ref{sec:related_work}).
|
|
|
+We present two efficient implementations, $S_{sta}$ and $S_{dyn}$, to accurately detect the imminent power-off events in approaches similar to \emph{static} and \emph{dynamic}, respectively.
|
|
|
+
|
|
|
+% Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a good estimate for the system's remaining execution time.
|
|
|
+% Instead, we propose using $V_{dd}$ to more accurately estimate the imminent power-off events, similar to approaches used in works without power management system (Sec.~\ref{sec:related_work}).
|
|
|
+% We propose setups that operate correctly below the normal $V_{dd}$, by accounting for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
|
|
|
% Additionally, when obtaining $V_{dd}$, it is important to account for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
|
|
|
|
|
|
+Meanwhile, when designing techniques using $V_{dd}$, designers should account for the behaviors of analog components at sub-normal voltages (Sec.~\ref{sec:sub_normal_execution}).
|
|
|
For consistent operation of ADCs, we adopt a voltage source with a known value of $V_{ref}$.
|
|
|
In STM32L5 and MSP430, an internal reference voltage source of 1.2V is available; alternatively, an external voltage reference (e.g., TI LMV431~\cite{texasinstrumentsLMV431}) can be used.
|
|
|
Note that $V_{ref}$ should be lower than the minimum operating voltage of the MCU (e.g., 1.7V), as $V_{ref}$ is generated by regulating $V_{dd}$.
|
|
|
-We propose two efficient implementations, $S_{sta}$ and $S_{dyn}$, to accurately detect the imminent power-off events in static and dynamic checkpoint schemes, respectively.
|
|
|
|
|
|
-$S_{sta}$ is designed for static checkpoint techniques.
|
|
|
-Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$.
|
|
|
+$S_{sta}$ is designed for techniques like \emph{static}, which query the voltage at checkpoint triggers to decide whether to execute a checkpoint.
|
|
|
+Since directly reading $V_{dd}$ is infeasible (because $V_{dd}$ itself serves as the reference voltage), $S_{sta}$ reads $V_{ref}$ instead.
|
|
|
+% Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$.
|
|
|
This results in the same value of $\lfloor V_{ref}/V_{dd} \cdot 2^n \rfloor$ when operating on normal voltage, where $n$ is the ADC resolution.
|
|
|
-During sub-normal voltage executions, this value increases as $V_{dd}$ decreases, as discussed in Sec.~\ref{sec:sub_normal_execution}.
|
|
|
+On the other hand, during sub-normal voltage executions, this value increases as $V_{dd}$ decreases, as discussed in Sec.~\ref{sec:sub_normal_execution}.
|
|
|
As a result, given that the target threshold voltage for checkpoint execution is $V_{th}$, software designers can compare the ADC value against $\lfloor V_{ref}/V_{th} \cdot 2^n \rfloor$ to determine whether to execute a checkpoint.
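As a purely illustrative calculation, take $V_{ref} = 1.2$V as above and assume a 12-bit ADC ($n = 12$), a normal $V_{dd}$ of 3.3V, and a target threshold of $V_{th} = 1.8$V; the last three values are assumptions chosen for this example rather than the evaluated configuration.
The comparison value is then
\[
\left\lfloor \frac{V_{ref}}{V_{th}} \cdot 2^{n} \right\rfloor
= \left\lfloor \frac{1.2}{1.8} \cdot 4096 \right\rfloor
= \lfloor 2730.67 \rfloor
= 2730,
\]
whereas at the assumed normal voltage the ADC returns $\lfloor 1.2/3.3 \cdot 4096 \rfloor = 1489$.
Under these assumptions, a checkpoint trigger would execute a checkpoint once the ADC value rises from 1489 to at least 2730.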
|
|
|
|
|
|
On the other hand, $S_{dyn}$ utilizes an on-chip comparator, which is available in most modern MCUs including STM32L5 and MSP430.
|
|
|
@@ -87,7 +99,7 @@ Specifically, we configure $R1$ and $R2$ to satisfy $\frac{R2}{R1+R2} \cdot V_{t
|
|
|
\label{fig:expr_precise_checkpoint_timings}
|
|
|
\end{figure}
|
|
|
|
|
|
-Fig.~\ref{fig:expr_precise_checkpoint_timings} shows the average end-to-end execution times of the benchmarks over 30 iterations, comparing the traditional systems with the proposed setups.
|
|
|
+Fig.~\ref{fig:expr_precise_checkpoint_timings} shows the average execution times of the benchmarks over 30 iterations, comparing the traditional systems with the proposed setups.
|
|
|
Fig.~\ref{fig:expr_precise_checkpoint_timings_static} illustrates the performance of $S_{sta}$ and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} presents the result for $S_{dyn}$.
|
|
|
The error bars indicate the minimum and maximum measured execution times for each benchmark.
|
|
|
The results clearly demonstrate that the execution time is significantly improved in both systems by extending the operation at sub-normal voltages: 3.04x in $S_{sta}$ and 2.85x in $S_{dyn}$.
|
|
|
@@ -103,9 +115,11 @@ Furthermore, the proposed setups can reduce the system complexity, as they elimi
|
|
|
Our model also helps designers in selecting efficient hardware components across various parameters.
|
|
|
For example, it reveals that the operating voltage of peripherals (e.g., external NVMs) is a critical design consideration (Sec.~\ref{sec:sub_normal_execution}), often more important than other factors such as latency.
|
|
|
% We evaluate this tradeoff by simulating an external FRAM having faster access latency but smaller operating voltage.
|
|
|
+
|
|
|
To evaluate this tradeoff, we simulate two FRAM configurations, F1 and F2, in our reference system.
|
|
|
-F1 represents a slower setup capable of operating down to 2.5V, achieved by doubling the software-configurable wait time for FRAM accesses.
|
|
|
-F2 is set to have the lowest access latency but the system stops at 2.8V.
|
|
|
+F1 represents a slower setup capable of operating down to 2.5V.
|
|
|
+This is achieved by doubling the software-configurable wait time for FRAM accesses.
|
|
|
+F2 is set to have the lowest access latency but requires the system to stop operating at 2.8V.
|
|
|
|
|
|
\begin{figure}
|
|
|
\centering
|
|
|
@@ -120,8 +134,8 @@ These results suggest that using slower FRAM that operates until 1.8V (e.g.,~\ci
|
|
|
This example clearly shows that operating voltage, often overlooked in the traditional execution model, should be considered a critical design parameter.
|
|
|
|
|
|
Finally, our model highlights advantages of using smaller decoupling capacitors.
|
|
|
-Larger buffers not only increases the ratio of sub-normal voltage operations but also raises the amount of discharged energy during power-offs.
|
|
|
-Indeed, in our reference system with $C_{ES}$ = 1100uF, we observe that it takes 1.18x and 1.36x longer to complete the benchmarks, when 440uF and 660uF capacitors are used as C2, respectively, compared to our setup with a 220uF capacitor.
|
|
|
+Larger buffers not only increase the ratio of sub-normal voltage operations but also raise the amount of discharged energy during power-offs.
|
|
|
+Indeed, in our reference system with $C_{ES}$ = 1100uF, we observe that completing benchmarks takes 1.18x and 1.36x longer on average, when 440uF and 660uF capacitors are used as C2, respectively, compared to our setup with a 220uF capacitor.
|
|
|
% As a result, it is a good design practice to use the smallest decoupling capacitors for efficiency of intermittent systems.
|
|
|
|
|
|
% \begin{figure}
|