1 year ago · 6bae084df7
--- a/sections/Conclusion.tex
+++ b/sections/Conclusion.tex
@@ -1,7 +1,12 @@
 
				 \section{Conclusion}
			
 
				 
			
 
				-When designing software supports for intermittent systems, designers rely on an execution model that abstracts the hardware-level operations and describes the key behaviors of the system.
			
 
				-However, the traditional model is failing to accurately model the actual behaviors as recent systems target smaller energy storages and more power-demanding architectures. 
			
 
				-In this paper, we propose a new execution model, which incorporates the major source of this inconsistency: the buffering effects due to the system's inherent capacitance.
			
 
				-Our model reveals that the traditional model can mislead the power efficiency of the system up to 5.62x and also may lead to unsafe checkpoint executions.
			
 
				-Also, we propose several design guidelines, including methods to predict imminent power failure more accurately, which can improve the performance of existing checkpoint techniques up to 3.04x.
			
 
				+As recent intermittent systems target smaller energy storage and shorter operation times, the traditional execution model for intermittent systems fails to accurately represent actual system behavior.
			
 
				+In this paper, we propose a new execution model that incorporates the buffering effects of the system's inherent capacitance, a primary source of the traditional model's discrepancies.
			
 
				+Our model reveals that systems designed based on the traditional model can be up to 5.62x less power-efficient than expected and may result in unsafe checkpoint executions.
			
 
				+Based on the insights from our model, we propose several design guidelines, including setups that improve the performance of existing static and dynamic checkpoint techniques by 3.04x and 2.85x on average, respectively.
			
 
				+
			
 
				+% When designing software supports for intermittent systems, designers rely on an execution model that abstracts the hardware-level operations and describes the key behaviors of the system.
			
 
				+% However, the traditional model is failing to accurately model the actual behaviors as recent systems target smaller energy storages and more power-demanding architectures. 
			
 
				+% In this paper, we propose a new execution model, which incorporates the major source of this inconsistency: the buffering effects due to the system's inherent capacitance.
			
 
				+% Our model reveals that the traditional model can mislead the power efficiency of the system up to 5.62x and also may lead to unsafe checkpoint executions.
			
 
				+% Also, we propose several design guidelines, including methods to predict imminent power failure more accurately, which can improve the performance of existing checkpoint techniques up to 3.04x.
			
--- a/sections/OurApproach.tex
+++ b/sections/OurApproach.tex
@@ -6,10 +6,11 @@ The effectiveness of these guidelines is evaluated using seven benchmarks on the
 
				 We ported five benchmarks from miBench~\cite{guthausMiBench2001} benchmark suite and implemented two computation kernels (\emph{matmul} and \emph{conv2d}) commonly used in the evaluation of intermittent systems in the literature~\cite{kimLACT2024,maengSupporting2019,bhattacharyyaNvMR2022,ganesanWhat2019,akhunovEnabling2023}.
			
 
				 
			
 
				 We evaluate two popular existing checkpointing schemes: \emph{static} and \emph{dynamic}.
			
 
				-The static scheme~\cite{ransfordMementos2011,kimLivenessAware2023,kimLACT2024,maengAdaptive2018} inserts checkpoint triggers at every loop latch in the program during compilation.
			
 
				+In \emph{static}, checkpoint triggers are inserted at every loop latch in the program during compilation~\cite{ransfordMementos2011,kimLivenessAware2023,kimLACT2024,maengAdaptive2018}.
			
 
				 At runtime, checkpoint triggers examine $V_{ES}$ and execute a checkpoint only when it is below a predefined threshold.
			
 
				-In contrast, the dynamic scheme~\cite{jayakumarQUICKRECALL2014,maengSupporting2019,balsamoHibernus2016,balsamoHibernus2015,kortbeekTimesensitive2020} does not modify the original program code.
			
 
				+In contrast, \emph{dynamic}~\cite{jayakumarQUICKRECALL2014,maengSupporting2019,balsamoHibernus2016,balsamoHibernus2015,kortbeekTimesensitive2020} does not modify the original program code.
			
 
				 Instead, it executes checkpoints via interrupts from the power management system, generated when $V_{ES}$ reaches $V_l$.
			
 
				+We consider these two schemes because most checkpoint techniques rely on $V_{ES}$, either by actively polling it (as in \emph{static}) or by receiving a signal when it reaches a threshold (as in \emph{dynamic}).
			
 
				 All the evaluations are conducted with 470uF energy storage and 1mA of input current at 1.9V, unless otherwise stated.
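The following minimal C sketch contrasts the two baselines. It assumes $V_{ES}$ samples in millivolts, a hypothetical 3400mV threshold that serves both as the polling threshold and as $V_l$, and a linear per-iteration discharge in place of real hardware. The names `static_trigger_fires`, `pms_edge`, and `first_static_checkpoint` are illustrative. They are not code from the cited systems.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold in millivolts, used both as the static polling
 * threshold and as V_l for the dynamic scheme (not taken from the paper). */
#define VES_TH_MV 3400u

/* static: a call to this check is inserted at every loop latch. It polls
 * V_ES (passed in here in place of an ADC read) and checkpoints only when
 * V_ES is below the predefined threshold. */
static bool static_trigger_fires(uint32_t ves_mv) {
    return ves_mv < VES_TH_MV;
}

/* dynamic: the program is untouched; the power management system raises an
 * interrupt when V_ES reaches V_l. Modeled as a falling edge between two
 * consecutive samples. */
static bool pms_edge(uint32_t prev_mv, uint32_t cur_mv, uint32_t vl_mv) {
    return prev_mv > vl_mv && cur_mv <= vl_mv;
}

/* Simulates a loop whose every iteration drains drop_mv from V_ES and ends at
 * a loop latch carrying a static trigger. Returns the first iteration whose
 * trigger fires, or -1 if none does within iters iterations. */
static int first_static_checkpoint(uint32_t ves_mv, uint32_t drop_mv, int iters) {
    for (int i = 0; i < iters; i++) {
        /* ... loop body: computation drains the energy storage ... */
        ves_mv = (ves_mv > drop_mv) ? ves_mv - drop_mv : 0u;
        /* loop latch: compiler-inserted checkpoint trigger */
        if (static_trigger_fires(ves_mv))
            return i;
    }
    return -1;
}
```

Consider a start of 4000mV with a drop of 100mV per iteration. The static trigger first fires at iteration 6, when $V_{ES}$ = 3300mV is strictly below the threshold. The dynamic edge occurs as soon as $V_{ES}$ reaches 3400mV at iteration 5. In real programs, the static trigger additionally lags by however long the current iteration takes to reach its latch.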
			
 
				 
			
 
				 \subsection{Delaying Checkpoint Executions}
			
@@ -17,9 +18,10 @@ All the evaluations are conducted with 470uF energy storage and 1mA of input cur
 
				 
			
 
				 The first design practice we propose is to delay checkpoint executions until the last possible moment.
			
 
				 While this practice is generally regarded as desirable in existing works~\cite{ransfordMementos2011,bhattiHarvOS2017}, it has not been recognized as a critical property.
			
 
				-Under the traditional execution model, early checkpoint execution is often considered acceptable as it allows the system to wake up sooner, incurring only minor costs for initialization and recovery.
			
 
				-For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}, which can be overly pessimistic~\cite{raffeckWoCA2024}.
			
 
				-On the other hand, our model reveals that significant energy is wasted each time the system powers off (Sec.~\ref{sec:power_efficiency}), highlighting the impact of delaying checkpoint executions.
			
 
				+Under the traditional execution model, early checkpoint execution is often considered acceptable, as it lets the system wake up sooner while incurring only minor costs for initialization and recovery.
			
 
				+For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}.
			
 
				+% For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}, which can be overly pessimistic~\cite{raffeckWoCA2024}.
			
 
				+In contrast, our model reveals that significant energy is wasted each time the system powers off (Sec.~\ref{sec:power_efficiency}).%, highlighting the impact of delaying checkpoint executions.
			
 
				 % As a result, the importance of delaying checkpoint executions is greater than previously assumed.
			
 
				 
			
 
				 \begin{figure}
			
@@ -29,11 +31,16 @@ On the other hand, our model reveals that significant energy is wasted each time
 
				     \label{fig:expr_checkpoint_voltages}
			
 
				 \end{figure}
			
 
				 
			
 
				-Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times in dynamic checkpoint scheme, across various checkpoint execution voltages.
			
 
				-A 1100uF capacitor is used as an energy storage and the execution times are normalized to the 3.4V configuration. 
			
 
				-The results show that executing checkpoints earlier is significantly inefficient: by 1.38x and 2.45x in 3.7V and 4.0V configurations, respectively.
			
 
				+We evaluate the impact of delaying checkpoint executions in \emph{dynamic} by varying the interrupt voltage.
			
 
				+A 1100uF capacitor is used for $C_{ES}$.
			
 
				+% Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times in dynamic checkpoint scheme, across various checkpoint execution voltages.
			
 
				+% A 1100uF capacitor is used for $C_{ES}$ and the execution times are normalized to the 3.4V configuration. 
			
 
				+Fig.~\ref{fig:expr_checkpoint_voltages} presents the average execution times of the benchmarks over 20 runs, normalized to the 3.4V configuration.
			
 
				+Contrary to existing expectations, the results show that executing checkpoints earlier is significantly inefficient: on average, the benchmarks take 1.38x and 2.45x longer in the 3.7V and 4.0V setups, respectively.
			
 
				 Moreover, the overhead is consistent across all benchmarks since early checkpoint executions directly reduce the energy available for the computing system.
			
 
				-Consequently, delaying checkpoint executions is crucial when designing state-retention techniques.
			
 
				+% Consequently, to design efficient checkpoint techniques, it is important to minimize the margin between checkpoint execution and the power-off.
			
 
				+Consequently, for maximum power efficiency, checkpoint techniques should minimize the margin between checkpoint execution and power-off.
			
 
				+% Consequently, delaying checkpoint executions is crucial when designing state-retention techniques.
			
 
				 Achieving this fundamentally depends on accurately predicting imminent power failures, which is the focus of the next section.
			
 
				 % Consequently, it is important to execute as long as possible whenever the system wakes up.
			
 
				 % In the next section, we discuss how this can be implemented in the existing intermittent systems.
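A first-order calculation shows why raising the interrupt voltage is costly. It assumes an ideal $C_{ES}$ with no leakage or converter losses, and it assumes the system stops computing once the checkpoint completes. Both are simplifying assumptions, not part of the model above. Let $V_{ck}$ (a symbol introduced here for illustration) denote the interrupt voltage. Relative to the 3.4V configuration, each power cycle then leaves the following energy in the 1100uF $C_{ES}$ unused:

```latex
\begin{align*}
\Delta E(V_{ck}) &= \tfrac{1}{2}\, C_{ES} \left( V_{ck}^2 - (3.4\,\mathrm{V})^2 \right) \\
\Delta E(3.7\,\mathrm{V}) &= \tfrac{1}{2} \cdot 1100\,\mu\mathrm{F} \cdot (13.69 - 11.56)\,\mathrm{V}^2 \approx 1.17\,\mathrm{mJ} \\
\Delta E(4.0\,\mathrm{V}) &= \tfrac{1}{2} \cdot 1100\,\mu\mathrm{F} \cdot (16.00 - 11.56)\,\mathrm{V}^2 \approx 2.44\,\mathrm{mJ}
\end{align*}
```

This estimate ignores the buffering effects the model captures, so it is not meant to reproduce the measured 1.38x and 2.45x slowdowns. It only illustrates that the unused energy grows with the square of the interrupt voltage.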
			
@@ -41,20 +48,25 @@ Achieving this fundamentally depends on accurately predicting imminent power fai
 
				 \subsection{Using $V_{dd}$ with a Reference Voltage for Checkpoint Signals}
			
 
				 \label{sec:use_vdd_for_checkpoint}
			
 
				 
			
 
				-Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a good estimate for the system's remaining execution time.
			
 
				-Instead, we propose using $V_{dd}$ to more accurately estimate the imminent power-off events, similar to approaches used in works without power management system (Sec.~\ref{sec:related_work}).
			
 
				-Our setups are designed to work below the normal $V_{dd}$ by accounting for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
			
 
				+Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a reliable estimate for the system's remaining execution time and that low $V_{dd}$ is the direct cause of power-off.
			
 
				+Based on this insight, we propose using $V_{dd}$ to detect imminent power failures more accurately, as in works without a power management system (Sec.~\ref{sec:related_work}).
			
 
				+We present two efficient implementations, $S_{sta}$ and $S_{dyn}$, that accurately detect imminent power-off events for techniques similar to \emph{static} and \emph{dynamic}, respectively.
			
 
				+
			
 
				+% Sec.~\ref{sec:predicting_power_failures} demonstrates that $V_{ES}$ is not a good estimate for the system's remaining execution time.
			
 
				+% Instead, we propose using $V_{dd}$ to more accurately estimate the imminent power-off events, similar to approaches used in works without power management system (Sec.~\ref{sec:related_work}).
			
 
				+% We propose setups that operate correctly below the normal $V_{dd}$, by accounting for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
			
 
				 % Additionally, when obtaining $V_{dd}$, it is important to account for the operations of ADC in sub-normal voltage conditions (Sec.~\ref{sec:sub_normal_execution}).
			
 
				 
			
 
				+However, when designing techniques that use $V_{dd}$, designers should account for the behavior of analog components at sub-normal voltages (Sec.~\ref{sec:sub_normal_execution}).
			
 
				 For consistent operation of ADCs, we adopt a voltage source with a known value of $V_{ref}$.
			
 
				 In STM32L5 and MSP430, an internal reference voltage source of 1.2V is available; alternatively, an external voltage reference (e.g., TI LMV431~\cite{texasinstrumentsLMV431}) can be used.
			
 
				 Note that $V_{ref}$ should be lower than the minimum operating voltage of the MCU (e.g., 1.7V), as $V_{ref}$ is generated by regulating $V_{dd}$.
			
 
				-We propose two efficient implementations, $S_{sta}$ and $S_{dyn}$, to accurately detect the imminent power-off events in static and dynamic checkpoint schemes, respectively.
			
 
				 
			
 
				-$S_{sta}$ is designed for static checkpoint techniques.
			
 
				-Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$. 
			
 
				+$S_{sta}$ targets techniques like \emph{static}, which query a voltage at each checkpoint trigger to decide whether to execute a checkpoint.
			
 
				+Since $V_{dd}$ cannot be read directly (i.e., $V_{dd}$ itself serves as the ADC's reference voltage), $S_{sta}$ reads $V_{ref}$ instead.
			
 
				+% Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$. 
			
 
				 At normal voltage, this yields a constant ADC value of $\lfloor V_{ref}/V_{dd} \cdot 2^n \rfloor$, where $n$ is the ADC resolution.
			
 
				-During sub-normal voltage executions, this value increases as $V_{dd}$ decreases, as discussed in Sec.~\ref{sec:sub_normal_execution}.
			
 
				+During sub-normal voltage execution, however, this value increases as $V_{dd}$ decreases, as discussed in Sec.~\ref{sec:sub_normal_execution}.
			
 
				 As a result, given a target threshold voltage $V_{th}$ for checkpoint execution, software designers can compare the ADC value against $\lfloor V_{ref}/V_{th} \cdot 2^n \rfloor$ and execute a checkpoint once the value reaches this bound.
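The $S_{sta}$ decision can be sketched in C as follows. The sketch assumes an ideal $n$-bit ADC whose full scale is $V_{dd}$ and integer millivolt inputs. The names `sta_threshold_code`, `ideal_adc_read_vref`, and `sta_should_checkpoint` are hypothetical. On real hardware, the reading would come from the MCU's ADC sampling the 1.2V reference channel rather than from a formula.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold code floor(V_ref / V_th * 2^n); integer division yields the floor.
 * Voltages are in millivolts, n_bits is the ADC resolution. */
static uint32_t sta_threshold_code(uint32_t vref_mv, uint32_t vth_mv, unsigned n_bits) {
    return (uint32_t)(((uint64_t)vref_mv << n_bits) / vth_mv);
}

/* Ideal ADC conversion of V_ref using V_dd as the ADC reference:
 * floor(V_ref / V_dd * 2^n), saturated at full scale. Stand-in for a real read. */
static uint32_t ideal_adc_read_vref(uint32_t vref_mv, uint32_t vdd_mv, unsigned n_bits) {
    uint64_t code = ((uint64_t)vref_mv << n_bits) / vdd_mv;
    uint64_t full_scale = (1ull << n_bits) - 1u;
    return (uint32_t)(code > full_scale ? full_scale : code);
}

/* Checkpoint trigger for S_sta: the reading grows as V_dd sags, so a checkpoint
 * is due once the reading reaches the code computed for V_th. */
static bool sta_should_checkpoint(uint32_t adc_code, uint32_t th_code) {
    return adc_code >= th_code;
}
```

Take $V_{ref}$ = 1.2V, an assumed 12-bit ADC, and a hypothetical $V_{th}$ = 1.8V. The threshold code is then 2730. A reading of 1489 at an assumed normal $V_{dd}$ of 3.3V does not trigger a checkpoint, while a reading of 2891 at $V_{dd}$ = 1.7V does. Under the ideal-ADC assumption, the $\geq$ comparison means quantization can only make the checkpoint fire slightly early, never late.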
			
 
				 
			
 
				 On the other hand, $S_{dyn}$ utilizes an on-chip comparator, which is available in most modern MCUs including STM32L5 and MSP430.
			
@@ -87,7 +99,7 @@ Specifically, we configure $R1$ and $R2$ to satisfy $\frac{R2}{R1+R2} \cdot V_{t
 
				     \label{fig:expr_precise_checkpoint_timings}
			
 
				 \end{figure}
			
 
				 
			
 
				-Fig.~\ref{fig:expr_precise_checkpoint_timings} shows the average end-to-end execution times of the benchmarks over 30 iterations, comparing the traditional systems with the proposed setups.
			
 
				+Fig.~\ref{fig:expr_precise_checkpoint_timings} shows the average execution times of the benchmarks over 30 iterations, comparing the traditional systems with the proposed setups.
			
 
				 Fig.~\ref{fig:expr_precise_checkpoint_timings_static} illustrates the performance of $S_{sta}$ and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} presents the result for $S_{dyn}$.
			
 
				 The error bars indicate the minimum and maximum measured execution times for each benchmark.
			
 
				 The results clearly demonstrate that the execution time is significantly improved in both systems by extending the operation at sub-normal voltages: 3.04x in $S_{sta}$ and 2.85x in $S_{dyn}$.
			
@@ -103,9 +115,11 @@ Furthermore, the proposed setups can reduce the system complexity, as they elimi
 
				 Our model also helps designers in selecting efficient hardware components across various parameters.
			
 
				 For example, it reveals that the operating voltage of peripherals (e.g., external NVMs) is a critical design consideration (Sec.~\ref{sec:sub_normal_execution}), often more important than other factors such as latency.
			
 
				 % We evaluate this tradeoff by simulating an external FRAM having faster access latency but smaller operating voltage.
			
 
				+
			
 
				 To evaluate this tradeoff, we simulate two FRAM configurations, F1 and F2, in our reference system.
			
 
				-F1 represents a slower setup capable of operating down to 2.5V, achieved by doubling the software-configurable wait time for FRAM accesses.
			
 
				-F2 is set to have the lowest access latency but the system stops at 2.8V.
			
 
				+F1 represents a slower setup capable of operating down to 2.5V. 
			
 
				+This is achieved by doubling the software-configurable wait time for FRAM accesses.
			
 
				+F2 has the lowest access latency but forces the system to stop operating at 2.8V.
			
 
				 
			
 
				 \begin{figure}
			
 
				     \centering
			
@@ -120,8 +134,8 @@ These results suggest that using slower FRAM that operates until 1.8V (e.g.,~\ci
 
				 This example clearly shows that operating voltage, often overlooked in the traditional execution model, should be considered a critical design parameter.
			
 
				 
			
 
				 Finally, our model highlights the advantages of using smaller decoupling capacitors.
			
 
				-Larger buffers not only increases the ratio of sub-normal voltage operations but also raises the amount of discharged energy during power-offs.
			
 
				-Indeed, in our reference system with $C_{ES}$ = 1100uF, we observe that it takes 1.18x and 1.36x longer to complete the benchmarks, when 440uF and 660uF capacitors are used as C2, respectively, compared to our setup with a 220uF capacitor.
			
 
				+Larger buffers not only increase the ratio of sub-normal voltage operations but also raise the amount of discharged energy during power-offs.
			
 
				+Indeed, in our reference system with $C_{ES}$ = 1100uF, we observe that completing the benchmarks takes 1.18x and 1.36x longer on average when 440uF and 660uF capacitors, respectively, are used as C2 instead of the 220uF capacitor in our setup.
			
 
				 % As a result, it is a good design practice to use the smallest decoupling capacitors for efficiency of intermittent systems.
			
 
				 
			
 
				 % \begin{figure}