|
|
@@ -1,9 +1,9 @@
|
|
|
\section{Design Guidelines}
|
|
|
\label{sec:design_guidelines}
|
|
|
|
|
|
-Based on the insights from our model, we propose design guidelines for efficient and safe intermittent systems.
|
|
|
-The effectiveness of the guidelines is evaluated using seven benchmarks on the reference system used in Sec.~\ref{sec:detailed_execution_model}.
|
|
|
-We ported five benchmarks from miBench~\cite{guthausMiBench2001} benchmark suite and implemented two computation kernels (\emph{matmul} and \emph{conv2d}) commonly used for evaluating intermittent systems in literature~\cite{kimLACT2024,maengSupporting2019,bhattacharyyaNvMR2022,ganesanWhat2019,akhunovEnabling2023}.
|
|
|
+Based on the insights from our model, we propose design guidelines to implement efficient intermittent systems.
|
|
|
+The effectiveness of these guidelines is evaluated using seven benchmarks on the reference system used in Sec.~\ref{sec:detailed_execution_model}.
|
|
|
+We ported five benchmarks from the MiBench~\cite{guthausMiBench2001} benchmark suite and implemented two computation kernels (\emph{matmul} and \emph{conv2d}) commonly used in the evaluation of intermittent systems in the literature~\cite{kimLACT2024,maengSupporting2019,bhattacharyyaNvMR2022,ganesanWhat2019,akhunovEnabling2023}.
|
|
|
|
|
|
We evaluate two popular existing checkpointing schemes: \emph{static} and \emph{dynamic}.
|
|
|
The static scheme~\cite{ransfordMementos2011,kimLivenessAware2023,kimLACT2024,maengAdaptive2018} inserts checkpoint triggers at every loop latch in the program during compilation.
|
|
|
@@ -30,7 +30,7 @@ On the other hand, our model reveals that significant energy is wasted each time
|
|
|
\end{figure}
|
|
|
|
|
|
Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times under the dynamic checkpoint scheme across various checkpoint execution voltages.
|
|
|
-A 1100 uF capacitor is used as an energy storage and the execution times are normalized to the 3.4V case.
|
|
|
+A 1100uF capacitor is used as the energy storage, and the execution times are normalized to the 3.4V configuration.
|
|
|
The results show that executing checkpoints earlier is significantly less efficient: execution takes 1.38x and 2.45x as long in the 3.7V and 4.0V configurations, respectively.
|
|
|
Moreover, the overhead is consistent across all benchmarks since early checkpoint executions directly reduce the energy available for the computing system.
|
|
|
Consequently, delaying checkpoint executions is crucial when designing state-retention techniques.
|
|
|
@@ -42,8 +42,9 @@ Achieving this fundamentally depends on accurately predicting imminent power fai
|
|
|
\label{sec:use_vdd_for_checkpoint}
|
|
|
|
|
|
Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a good estimate for the system's remaining execution time.
|
|
|
-Instead, we propose using $V_{dd}$ to more accurately estimate the imminent power-off events, similar to approaches used in systems without power management system (Sec.~\ref{sec:related_work}).
|
|
|
-Additionally, when obtaining $V_{dd}$, it is important to account for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
|
|
|
+Instead, we propose using $V_{dd}$ to more accurately estimate imminent power-off events, similar to approaches used in prior work on systems without a power management system (Sec.~\ref{sec:related_work}).
|
|
|
+Our setups are designed to work below the normal $V_{dd}$ by accounting for the operation of the ADC under sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
|
|
|
+% Additionally, when obtaining $V_{dd}$, it is important to account for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
|
|
|
|
|
|
For consistent operation of ADCs, we adopt a voltage source with a known value of $V_{ref}$.
|
|
|
In STM32L5 and MSP430, an internal reference voltage source of 1.2V is available; alternatively, an external voltage reference (e.g., TI LMV431~\cite{texasinstrumentsLMV431}) can be used.
|
|
|
@@ -53,8 +54,8 @@ We propose two efficient implementations, $S_{sta}$ and $S_{dyn}$, to accurately
|
|
|
$S_{sta}$ is designed for static checkpoint techniques.
|
|
|
Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$.
|
|
|
At normal operating voltage, this always yields the same value, $\lfloor V_{ref}/V_{dd} \cdot 2^n \rfloor$, where $n$ is the ADC resolution.
|
|
|
-During sub-voltage execution, this value increases as $V_{dd}$ decreases, as discussed in Sec.~\ref{sec:sub_normal_execution}.
|
|
|
-Given that the target threshold voltage for checkpoint execution is $V_{th}$, software designers can compare the ADC value against $\lfloor V_{ref}/V_{th} \cdot 2^n \rfloor$ to determine whether to execute a checkpoint.
|
|
|
+During sub-normal voltage executions, this value increases as $V_{dd}$ decreases, as discussed in Sec.~\ref{sec:sub_normal_execution}.
|
|
|
+As a result, given that the target threshold voltage for checkpoint execution is $V_{th}$, software designers can compare the ADC value against $\lfloor V_{ref}/V_{th} \cdot 2^n \rfloor$ to determine whether to execute a checkpoint.
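+As a worked illustration of this comparison, take $V_{ref} = 1.2$V as above and assume, purely for illustration, a 12-bit ADC ($n = 12$) and a threshold of $V_{th} = 2.4$V; both values are hypothetical and are not taken from our evaluation.
+The comparison value is then
+\[
+  \left\lfloor \frac{V_{ref}}{V_{th}} \cdot 2^{n} \right\rfloor
+  = \left\lfloor \frac{1.2}{2.4} \cdot 4096 \right\rfloor
+  = 2048 .
+\]
+At a hypothetical $V_{dd}$ of 3.0V, the ADC reads $\lfloor 1.2/3.0 \cdot 4096 \rfloor = 1638$, which is below 2048, so no checkpoint is executed; once $V_{dd}$ falls to 2.4V or lower, the reading reaches 2048 and the checkpoint is triggered.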
|
|
|
|
|
|
On the other hand, $S_{dyn}$ utilizes an on-chip comparator, which is available in most modern MCUs including STM32L5 and MSP430.
|
|
|
As $V_{ref}$ is always lower than $V_{dd}$, we use a voltage divider consisting of two resistors, $R1$ and $R2$, to scale $V_{dd}$ and compare it with $V_{ref}$.
|
|
|
@@ -94,16 +95,16 @@ Furthermore, these improvements are consistent across all benchmarks, regardless
|
|
|
|
|
|
Another advantage of the proposed setups is their simplicity and practical applicability.
|
|
|
Since both setups only modify the method used to detect imminent power failures and leave the checkpoint algorithms unchanged, it is straightforward to apply them to existing techniques.
|
|
|
-Furthermore, the proposed setups can reduce the system complexity, as they eliminate the need for communication (e.g., interrupt or access to $V_{ES}$) between the energy storage system and the computing system.
|
|
|
+Furthermore, the proposed setups can reduce the system complexity, as they eliminate the need for communication between the energy storage system and the computing system (e.g., interrupt or access to $V_{ES}$).
|
|
|
|
|
|
% \subsection{Checkpoint Techniques and Evaluation Methods}
|
|
|
\subsection{On Selecting Hardware Components}
|
|
|
|
|
|
-Our model helps designers in selecting efficient hardware components across various parameters.
|
|
|
-For example, it implies that operating voltage of peripherals (e.g., external NVMs) is a critical design consideration (Sec.~\ref{sec:sub_normal_execution}), often more important than other factors such as latency.
|
|
|
+Our model also helps designers select efficient hardware components across various parameters.
|
|
|
+For example, it reveals that the operating voltage of peripherals (e.g., external NVMs) is a critical design consideration (Sec.~\ref{sec:sub_normal_execution}), often more important than other factors such as latency.
|
|
|
% We evaluate this tradeoff by simulating an external FRAM having faster access latency but smaller operating voltage.
|
|
|
To evaluate this tradeoff, we simulate two FRAM configurations, F1 and F2, in our reference system.
|
|
|
-F1 represents slower setup capable of operating down to 2.5V, achieved by doubling the software-configurable wait time for FRAM accesses.
|
|
|
+F1 represents a slower setup capable of operating down to 2.5V, achieved by doubling the software-configurable wait time for FRAM accesses.
|
|
|
F2 is set to have the lowest access latency but the system stops at 2.8V.
|
|
|
|
|
|
\begin{figure}
|
|
|
@@ -120,7 +121,7 @@ This example clearly shows that operating voltage, often overlooked in the tradi
|
|
|
|
|
|
Finally, our model highlights advantages of using smaller decoupling capacitors.
|
|
|
Larger buffers not only increase the ratio of sub-normal voltage operations but also raise the amount of energy discharged during power-offs.
|
|
|
-Indeed, in our reference system with $C_{ES}$ = 1100uF, we observe that it takes xx\% and xx\% longer to complete the benchmarks, when 440uF and 660uF capacitors are used as C2, respectively, compared to our setup with a 220uF capacitor.
|
|
|
+Indeed, in our reference system with $C_{ES}$ = 1100uF, we observe that it takes 1.18x and 1.36x longer to complete the benchmarks, when 440uF and 660uF capacitors are used as C2, respectively, compared to our setup with a 220uF capacitor.
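+As a rough first-order intuition (a simplifying assumption, not a result of our model), suppose the charge remaining on C2 when the system powers off at a voltage $V_{off}$ is lost; $V_{off}$ and $E_{off}$ are symbols introduced here only for illustration, and $C_{2}$ denotes the capacitance of C2.
+The discarded energy then scales linearly with the capacitance:
+\[
+  E_{off} = \frac{1}{2}\, C_{2}\, V_{off}^{2},
+  \qquad
+  \frac{E_{off}(440\,\mathrm{uF})}{E_{off}(220\,\mathrm{uF})} = 2,
+  \qquad
+  \frac{E_{off}(660\,\mathrm{uF})}{E_{off}(220\,\mathrm{uF})} = 3 .
+\]
+Under this assumption, and for the same $V_{off}$, doubling or tripling C2 doubles or triples the energy lost at each power-off.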
|
|
|
% As a result, it is a good design practice to use the smallest decoupling capacitors for efficiency of intermittent systems.
|
|
|
|
|
|
% \begin{figure}
|