- \section{Design Guidelines}
- \label{sec:design_guidelines}
- Based on the insights from our model, we propose design guidelines for implementing efficient intermittent systems.
- We evaluate the effectiveness of these guidelines using seven benchmarks on the reference system used in Sec.~\ref{sec:detailed_execution_model}.
- We ported five benchmarks from the MiBench~\cite{guthausMiBench2001} benchmark suite and implemented two computation kernels (\emph{matmul} and \emph{conv2d}) that are commonly used to evaluate intermittent systems in the literature~\cite{kimLACT2024,maengSupporting2019,bhattacharyyaNvMR2022,ganesanWhat2019,akhunovEnabling2023}.
- We evaluate two popular existing checkpointing schemes: \emph{static} and \emph{dynamic}.
- In \emph{static}, checkpoint triggers are inserted at every loop latch in the program during compilation~\cite{ransfordMementos2011,kimLivenessAware2023,kimLACT2024,maengAdaptive2018}.
- At runtime, each checkpoint trigger examines $V_{ES}$ and executes a checkpoint only when it is below a predefined threshold.
- In contrast, \emph{dynamic}~\cite{jayakumarQUICKRECALL2014,maengSupporting2019,balsamoHibernus2016,balsamoHibernus2015,kortbeekTimesensitive2020} does not modify the original program code.
- Instead, it executes checkpoints in response to interrupts that the power management system generates when $V_{ES}$ reaches $V_l$.
- We consider these schemes because most checkpointing techniques rely on $V_{ES}$, either by actively polling it (as in \emph{static}) or by receiving a signal when it crosses a threshold (as in \emph{dynamic}).
- Unless otherwise stated, all evaluations use a 470uF energy storage capacitor and a 1mA input current at 1.9V.
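- To make the \emph{static} decision concrete, the following is a minimal off-target sketch, assuming $V_{ES}$ has already been sampled in millivolts by an ADC read that is not shown.
- The function names (\texttt{static\_trigger\_fires}, \texttt{first\_firing\_latch}) and the sample values are hypothetical; \emph{dynamic} needs no such per-latch check, because its checkpoint runs from the interrupt raised at $V_l$.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decision taken by a compiler-inserted trigger at a loop latch in the
 * static scheme (hypothetical name): checkpoint only when the sampled
 * V_ES, in millivolts, is below the predefined threshold. */
bool static_trigger_fires(uint32_t ves_mv, uint32_t threshold_mv)
{
    return ves_mv < threshold_mv;
}

/* Off-target illustration (hypothetical helper): given V_ES sampled at
 * successive loop latches, return the index of the first latch whose
 * trigger fires, or -1 if none does. Code between two latches runs
 * without any voltage check. */
long first_firing_latch(const uint32_t *ves_trace_mv, size_t n_latches,
                        uint32_t threshold_mv)
{
    assert(ves_trace_mv != NULL || n_latches == 0);
    for (size_t i = 0; i < n_latches; i++) {
        if (static_trigger_fires(ves_trace_mv[i], threshold_mv)) {
            return (long)i;
        }
    }
    return -1;
}
```

- For example, with samples of 3600, 3500, 3450, 3390, and 3300mV and an illustrative 3400mV threshold, the trigger first fires at the fourth latch (index 3).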
- \subsection{Delaying Checkpoint Executions}
- \label{sec:delay_checkpoint_execution}
- The first design practice we propose is to delay checkpoint execution until the last possible moment.
- Although existing works generally regard this practice as desirable~\cite{ransfordMementos2011,bhattiHarvOS2017}, they have not recognized it as critical.
- Under the traditional execution model, early checkpoint execution is often considered acceptable because it lets the system wake up sooner, at only a minor cost for initialization and recovery.
- For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024,raffeckWoCA2024}.
- % For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}, which can be overly pessimistic~\cite{raffeckWoCA2024}.
- In contrast, our model reveals that significant energy is wasted each time the system powers off (Sec.~\ref{sec:power_efficiency}).%, highlighting the impact of delaying checkpoint executions.
- % As a result, the importance of delaying checkpoint executions is greater than previously assumed.
- \begin{figure}
- \centering
- \includegraphics[width=\linewidth]{figs/plot_expr_7_cropped.pdf}
- \caption{Execution times across various checkpoint voltages, normalized to the 3.4V configuration.}
- \label{fig:expr_checkpoint_voltages}
- \end{figure}
- We evaluate the impact of delaying checkpoint execution in \emph{dynamic} by varying the interrupt voltage, using a 1100uF capacitor for $C_{ES}$.
- % Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times in dynamic checkpoint scheme, across various checkpoint execution voltages.
- % A 1100uF capacitor is used for $C_{ES}$ and the execution times are normalized to the 3.4V configuration.
- Fig.~\ref{fig:expr_checkpoint_voltages} presents the average execution times of the benchmarks over 30 runs, normalized to the 3.4V configuration.
- Contrary to existing expectations, executing checkpoints earlier is significantly inefficient: on average, execution takes 1.38x longer in the 3.7V setup and 2.45x longer in the 4.0V setup.
- Moreover, the overhead is consistent across all benchmarks since early checkpoint executions directly reduce the energy available for the computing system.
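- As a rough, idealized illustration of this effect (ignoring converter efficiency, leakage, and the power management system's own consumption), firing the checkpoint at $V_{ck}$ instead of a lower voltage $V'_{ck}$ (both symbols introduced here for illustration) leaves the following energy in $C_{ES}$ that is not spent on computation in that power cycle:
- \begin{equation*}
- \Delta E = \tfrac{1}{2} C_{ES} \left( V_{ck}^{2} - {V'_{ck}}^{2} \right)
- \end{equation*}
- With $C_{ES} = 1100$uF and $V'_{ck} = 3.4$V, this idealized estimate is about 1.17mJ per power cycle for $V_{ck} = 3.7$V and about 2.44mJ for $V_{ck} = 4.0$V.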
- % Consequently, to design efficient checkpoint techniques, it important to minimize the margin between checkpoint execution and the power-off.
- Consequently, for maximum power efficiency, checkpoint techniques should minimize the margin between checkpoint execution and power-off.
- % Consequently, delaying checkpoint executions is crucial when designing state-retention techniques.
- Achieving this fundamentally depends on accurately predicting imminent power failures, which is the focus of the next section.
- % Consequently, it is important to execute as long as possible whenever the system wakes up.
- % In the next section, we discuss how this can be implemented in the existing intermittent systems.
- \subsection{Using $V_{dd}$ with a Reference Voltage for Checkpoint Signals}
- \label{sec:use_vdd_for_checkpoint}
- Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a reliable indicator of the system's remaining execution time and that a low $V_{dd}$ is the direct cause of power-off.
- Based on this insight, we propose using $V_{dd}$ to detect imminent power failures more accurately, as done in works without a power management system (Sec.~\ref{sec:related_work}).
- We present two efficient implementations, $S_{sta}$ and $S_{dyn}$, that accurately detect imminent power-off events in approaches similar to \emph{static} and \emph{dynamic}, respectively.
- % Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a good estimate for the system's remaining execution time.
- % Instead, we propose using $V_{dd}$ to more accurately estimate the imminent power-off events, similar to approaches used in works without power management system (Sec.~\ref{sec:related_work}).
- % We propose setups that operate correctly below the normal $V_{dd}$, by accounting for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
- % Additionally, when obtaining $V_{dd}$, it is important to account for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
- When designing techniques based on $V_{dd}$, however, designers should account for the behavior of analog components at sub-normal voltages (Sec.~\ref{sec:sub_normal_execution}).
- To keep ADC operation consistent, we adopt a voltage source with a known value, $V_{ref}$.
- Both STM32L5 and MSP430 provide an internal 1.2V reference voltage source; alternatively, an external voltage reference (e.g., TI LMV431~\cite{texasinstrumentsLMV431}) can be used.
- Note that $V_{ref}$ should be lower than the minimum operating voltage of the MCU (e.g., 1.7V), because $V_{ref}$ is generated by regulating $V_{dd}$.
- $S_{sta}$ is designed for techniques similar to \emph{static}, which decide at each checkpoint trigger whether to execute a checkpoint.
- Since $V_{dd}$ cannot be read directly (i.e., $V_{dd}$ itself serves as the ADC's reference voltage), $S_{sta}$ reads $V_{ref}$ instead.
- % Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$.
- At normal voltage, this reading stays at the constant value $\lfloor V_{ref}/V_{dd} \cdot 2^n \rfloor$, where $n$ is the ADC resolution in bits.
- During sub-normal voltage execution, in contrast, this value increases as $V_{dd}$ decreases, as discussed in Sec.~\ref{sec:sub_normal_execution}.
- As a result, for a target checkpoint threshold voltage $V_{th}$, software can compare the ADC reading against $\lfloor V_{ref}/V_{th} \cdot 2^n \rfloor$ and execute a checkpoint once the reading reaches this value.
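- The $S_{sta}$ check can be sketched in integer arithmetic as follows, assuming voltages in millivolts and, in the usage example, a 1.2V $V_{ref}$ and a 12-bit ADC.
- The function names are hypothetical, and the actual ADC sampling of $V_{ref}$ is platform-specific and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Threshold code floor(V_ref / V_th * 2^n) for the S_sta check
 * (hypothetical helper). Voltages are in millivolts; unsigned integer
 * division yields the floor exactly. */
uint32_t sta_threshold_code(uint32_t vref_mv, uint32_t vth_mv, unsigned n_bits)
{
    assert(vref_mv > 0 && vref_mv < vth_mv); /* V_ref must stay below V_th */
    assert(n_bits > 0 && n_bits <= 16);
    return (uint32_t)(((uint64_t)vref_mv << n_bits) / vth_mv);
}

/* Ideal code obtained when the ADC, referenced to V_dd, samples V_ref
 * (hypothetical helper for off-target checks); clamped to full scale. */
uint32_t vref_adc_code(uint32_t vref_mv, uint32_t vdd_mv, unsigned n_bits)
{
    assert(vdd_mv > 0 && n_bits > 0 && n_bits <= 16);
    uint64_t code = ((uint64_t)vref_mv << n_bits) / vdd_mv;
    uint64_t full_scale = (1ull << n_bits) - 1;
    return (uint32_t)(code > full_scale ? full_scale : code);
}

/* S_sta decision at a checkpoint trigger (hypothetical name): the V_ref
 * reading grows as V_dd falls, so checkpoint once it reaches the
 * threshold code, i.e., once V_dd has dropped to about V_th. */
bool sta_should_checkpoint(uint32_t vref_reading, uint32_t threshold_code)
{
    return vref_reading >= threshold_code;
}
```

- For instance, with $V_{th} = 1.8$V the threshold code is 2730; at $V_{dd} = 3.0$V the reading is 1638 (no checkpoint), while at $V_{dd} = 1.79$V it rises to 2745 and the check fires.
- Because both values are floored, the check can fire while $V_{dd}$ is still marginally above $V_{th}$ (within one ADC code), which errs on the safe side.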
- $S_{dyn}$, in turn, uses an on-chip comparator, which is available in most modern MCUs, including STM32L5 and MSP430.
- Because $V_{ref}$ is always lower than $V_{dd}$, we scale $V_{dd}$ down with a voltage divider consisting of two resistors, $R1$ and $R2$, before comparing it with $V_{ref}$.
- Specifically, we choose $R1$ and $R2$ to satisfy $\frac{R2}{R1+R2} \cdot V_{th} = V_{ref}$, so that the comparator generates an interrupt when $V_{dd}$ drops to the threshold $V_{th}$.
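- Rearranging the divider condition gives the resistor ratio directly; the numbers that follow are an illustrative worked example, not the configuration of our reference system:
- \begin{equation*}
- \frac{R2}{R1+R2} \cdot V_{th} = V_{ref}
- \;\Longrightarrow\;
- R1 = R2 \cdot \frac{V_{th} - V_{ref}}{V_{ref}}
- \end{equation*}
- For $V_{ref} = 1.2$V and $V_{th} = 1.8$V, this yields $R1 = 0.5 \cdot R2$ (e.g., $R2 = 1$M$\Omega$ and $R1 = 500$k$\Omega$), and the divider then draws $V_{dd}/(R1+R2) = 1.2\mu$A at $V_{dd} = 1.8$V.
- Since standard resistor values and tolerances rarely match the ideal ratio exactly, the resulting trip point can be checked with $V_{th} = V_{ref} \cdot (R1+R2)/R2$.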
- % T2 is setup for static checkpoint techniques, which poll the capacitor voltage to determine whether execute checkpoint or not.
- % Instead of reading the capacitor voltage, it reads the reference voltage.
- % As we discussed in Sec.~\ref{sec:sub_normal_execution}, the voltage remains same while the system executes at normal voltage but the value increases during sub-normal voltage execution.
- % \begin{itemize}
- % \item T1 utilizes a on-chip comparator (available both in STM32L5 and MSP430) with a reference voltage.
- % \item T2.
- % \end{itemize}
- \begin{figure}
- \centering
- \begin{subfigure}{\linewidth}
- \includegraphics[width=\textwidth]{figs/plot_expr_11_cropped.pdf}
- \caption{Static checkpointing with $S_{sta}$.}
- \label{fig:expr_precise_checkpoint_timings_static}
- \vspace{3pt}
- \end{subfigure}
- \begin{subfigure}{\linewidth}
- \includegraphics[width=\textwidth]{figs/plot_expr_10_cropped.pdf}
- \caption{Dynamic checkpointing with $S_{dyn}$.}
- \label{fig:expr_precise_checkpoint_timings_dynamic}
- \end{subfigure}
- \caption{Impact of precise checkpoint timing on end-to-end execution times.}
- \label{fig:expr_precise_checkpoint_timings}
- \end{figure}
- Fig.~\ref{fig:expr_precise_checkpoint_timings} compares the average execution times of the benchmarks over 30 runs between the traditional systems and the proposed setups.
- Fig.~\ref{fig:expr_precise_checkpoint_timings_static} and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} illustrate the results of $S_{sta}$ and $S_{dyn}$, respectively.
- % illustrates the performance of $S_{sta}$ and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} presents the result for $S_{dyn}$.
- The whiskers indicate the minimum and maximum execution times for each benchmark.
- The results show significant reductions in execution time for both systems, with average speedups of 3.04x for $S_{sta}$ and 2.85x for $S_{dyn}$.
- While the effectiveness of checkpoint schemes varies with application characteristics, our setups consistently improve performance across all benchmarks.
- This underscores the importance of accurately detecting power-off events for efficient intermittent system operation.
- % It clearly demonstrates that the both setups can extend the operation at sub-normal voltages: 3.04x in $S_{sta}$ and 2.85x in $S_{dyn}$.
- % Furthermore, these improvements are consistent across all benchmarks, regardless of the application characteristics, highlighting the general effectiveness of the proposed setups.
- Another advantage of the proposed setups is their simplicity and practical applicability.
- Since both setups only change how imminent power failures are detected and leave the checkpoint algorithms unchanged, they are straightforward to apply to existing techniques.
- Furthermore, the proposed setups can reduce system complexity, as they eliminate the need for communication between the energy storage system and the computing system (e.g., interrupts or access to $V_{ES}$).
- % \subsection{Checkpoint Techniques and Evaluation Methods}
- \subsection{On Selecting Hardware Components}
- Our model also helps designers select efficient hardware components with respect to various parameters.
- For example, it reveals that the operating voltage of peripherals (e.g., external NVMs) is a critical design consideration (Sec.~\ref{sec:sub_normal_execution}), often more important than other factors such as latency.
- % We evaluate this tradeoff by simulating an external FRAM having faster access latency but smaller operating voltage.
- To evaluate this tradeoff, we simulate two FRAM configurations, F1 and F2, in our reference system.
- F1 represents a slower setup that can operate down to 2.5V, achieved by doubling the software-configurable wait time for FRAM accesses.
- F2 has the lowest access latency but forces the system to stop operating at 2.8V.
- \begin{figure}
- \centering
- \includegraphics[width=\linewidth]{figs/plot_expr_12_cropped.pdf}
- \caption{Impact of peripheral operating voltage.}
- \label{fig:expr_peripheral_voltage}
- \end{figure}
- Fig.~\ref{fig:expr_peripheral_voltage} presents the execution times of the benchmarks for the two configurations in $S_{dyn}$, averaged over 30 runs.
- Despite its doubled latency, F1 completes the workloads 1.46x faster on average, with consistent improvements across all benchmarks.
- These results suggest that using a slower FRAM that operates down to 1.8V (e.g.,~\cite{fujitsuMB85R4M2T}) could considerably improve the performance of our reference system.
- This example clearly shows that operating voltage, often overlooked in the traditional model, should be considered a critical design parameter.
- Finally, our model highlights the advantages of using smaller decoupling capacitors.
- Larger decoupling capacitors not only increase the fraction of execution at sub-normal voltages but also raise the amount of energy discharged during power-offs.
- Indeed, in our reference system with $C_{ES}$ = 1100uF, the benchmarks take 1.18x and 1.36x longer on average to complete when 440uF and 660uF capacitors are used as C2, respectively, than with our setup's 220uF capacitor.
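- The energy-loss effect of larger decoupling capacitors can be illustrated, in an idealized way, by the energy held in a decoupling capacitor of capacitance $C$ at voltage $V$, assuming this charge is dissipated rather than used for computation when the system powers off:
- \begin{equation*}
- E = \tfrac{1}{2} C V^{2}
- \end{equation*}
- At a given voltage, this energy grows linearly with $C$: at an illustrative 1.8V, a 220uF capacitor holds about 0.36mJ, whereas 440uF and 660uF capacitors hold about 0.71mJ and 1.07mJ, respectively.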
- % As a result, it is a good design practice to use the smallest decoupling capacitors for efficiency of intermittent systems.
- % \begin{figure}
- % \centering
- % \includegraphics[width=\linewidth]{figs/plot_expr_12_cropped.pdf}
- % \caption{Execution times with varying decoupling capacitors.}
- % % \label{fig:expr_checkpoint_voltages}
- % \end{figure}
- % Power failure injection (soft reset)~\cite{wuIntOS2024,yildizEfficient2023}.