|
@@ -10,7 +10,7 @@ In \emph{static}, checkpoint triggers are inserted at every loop latch in the pr
|
|
|
At runtime, checkpoint triggers examine $V_{ES}$ and execute a checkpoint only when it is below a predefined threshold.
|
|
At runtime, checkpoint triggers examine $V_{ES}$ and execute a checkpoint only when it is below a predefined threshold.
|
|
|
In contrast, \emph{dynamic}~\cite{jayakumarQUICKRECALL2014,maengSupporting2019,balsamoHibernus2016,balsamoHibernus2015,kortbeekTimesensitive2020} does not modify the original program code.
|
|
In contrast, \emph{dynamic}~\cite{jayakumarQUICKRECALL2014,maengSupporting2019,balsamoHibernus2016,balsamoHibernus2015,kortbeekTimesensitive2020} does not modify the original program code.
|
|
|
Instead, it executes checkpoints via interrupts from the power management system, generated when $V_{ES}$ reaches $V_l$.
|
|
Instead, it executes checkpoints via interrupts from the power management system, generated when $V_{ES}$ reaches $V_l$.
|
|
|
-These schemes are considered since most checkpoint techniques exploit $V_{ES}$ by either actively polling it (as in \emph{static}) or receiving a signal (as in \emph{dynamic}).
|
|
|
|
|
|
|
+These schemes are considered since most checkpoint techniques utilize $V_{ES}$ by either actively polling it (as in \emph{static}) or by receiving a signal (as in \emph{dynamic}).
|
|
|
All the evaluations are conducted with 470uF energy storage and 1mA of input current at 1.9V, unless otherwise stated.
|
|
All the evaluations are conducted with 470uF energy storage and 1mA of input current at 1.9V, unless otherwise stated.
|
|
|
|
|
|
|
|
\subsection{Delaying Checkpoint Executions}
|
|
\subsection{Delaying Checkpoint Executions}
|
|
@@ -19,7 +19,7 @@ All the evaluations are conducted with 470uF energy storage and 1mA of input cur
|
|
|
The first design practice we propose is to delay checkpoint executions until the last possible moment.
|
|
The first design practice we propose is to delay checkpoint executions until the last possible moment.
|
|
|
While this practice is generally regarded as desirable in existing works~\cite{ransfordMementos2011,bhattiHarvOS2017}, it has not been recognized as a critical property.
|
|
While this practice is generally regarded as desirable in existing works~\cite{ransfordMementos2011,bhattiHarvOS2017}, it has not been recognized as a critical property.
|
|
|
Under the traditional execution model, early checkpoint execution is often considered acceptable as it makes the system wake up sooner, incurring only minor costs for initialization and recovery.
|
|
Under the traditional execution model, early checkpoint execution is often considered acceptable as it makes the system wake up sooner, incurring only minor costs for initialization and recovery.
|
|
|
-For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}.
|
|
|
|
|
|
|
+For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024,raffeckWoCA2024}.
|
|
|
% For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}, which can be overly pessimistic~\cite{raffeckWoCA2024}.
|
|
% For example, some approaches have explored proactive power-offs based on the program's worst-case execution time~\cite{choiCompilerDirected2022,reymondSCHEMATIC2024}, which can be overly pessimistic~\cite{raffeckWoCA2024}.
|
|
|
In contrast, our model reveals that significant energy is wasted each time the system powers off (Sec.~\ref{sec:power_efficiency}).%, highlighting the impact of delaying checkpoint executions.
|
|
In contrast, our model reveals that significant energy is wasted each time the system powers off (Sec.~\ref{sec:power_efficiency}).%, highlighting the impact of delaying checkpoint executions.
|
|
|
% As a result, the importance of delaying checkpoint executions is greater than previously assumed.
|
|
% As a result, the importance of delaying checkpoint executions is greater than previously assumed.
|
|
@@ -35,7 +35,7 @@ We evaluate the impact of delaying checkpoint executions in \emph{dynamic}, by v
|
|
|
A 1100uF capacitor is used for $C_{ES}$.
|
|
A 1100uF capacitor is used for $C_{ES}$.
|
|
|
% Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times in dynamic checkpoint scheme, across various checkpoint execution voltages.
|
|
% Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times in dynamic checkpoint scheme, across various checkpoint execution voltages.
|
|
|
% A 1100uF capacitor is used for $C_{ES}$ and the execution times are normalized to the 3.4V configuration.
|
|
% A 1100uF capacitor is used for $C_{ES}$ and the execution times are normalized to the 3.4V configuration.
|
|
|
-Fig.~\ref{fig:expr_checkpoint_voltages} presents the average execution times of the benchmarks over 30 runs, normalized to the 3.V configuration.
|
|
|
|
|
|
|
+Fig.~\ref{fig:expr_checkpoint_voltages} presents the average execution times of the benchmarks over 30 runs, normalized to the 3.4V configuration.
|
|
|
The results show that, contrary to existing expectations, executing checkpoints earlier is significantly less efficient: by 1.38x in the 3.7V setup and 2.45x in the 4.0V setup, on average.
|
|
The results show that, contrary to existing expectations, executing checkpoints earlier is significantly less efficient: by 1.38x in the 3.7V setup and 2.45x in the 4.0V setup, on average.
|
|
|
Moreover, the overhead is consistent across all benchmarks since early checkpoint executions directly reduce the energy available for the computing system.
|
|
Moreover, the overhead is consistent across all benchmarks since early checkpoint executions directly reduce the energy available for the computing system.
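As a rough illustration, assume an ideal capacitor and ignore regulator and leakage losses (simplifications introduced here, not part of the evaluation setup), and let $V_{ckpt}$ be a hypothetical symbol for the checkpoint execution voltage.
The energy held in $C_{ES}$ between $V_{ckpt}$ and the 3.4V baseline can then no longer be drawn by the program before the checkpoint in each power cycle:
\[
\Delta E = \frac{1}{2} C_{ES} \left( V_{ckpt}^2 - 3.4^2 \right), \qquad
\Delta E_{3.7} = \frac{1}{2} \cdot 1100\,\mu\mathrm{F} \cdot (13.69 - 11.56)\,\mathrm{V}^2 \approx 1.17\,\mathrm{mJ}, \qquad
\Delta E_{4.0} = \frac{1}{2} \cdot 1100\,\mu\mathrm{F} \cdot (16.00 - 11.56)\,\mathrm{V}^2 \approx 2.44\,\mathrm{mJ}.
\]
This back-of-the-envelope estimate only conveys the scale of the effect; it does not by itself account for the measured slowdowns.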
|
|
|
% Consequently, to design efficient checkpoint techniques, it important to minimize the margin between checkpoint execution and the power-off.
|
|
% Consequently, to design efficient checkpoint techniques, it important to minimize the margin between checkpoint execution and the power-off.
|
|
@@ -62,7 +62,7 @@ For consistent operation of ADCs, we adopt a voltage source with a known value o
|
|
|
In STM32L5 and MSP430, an internal reference voltage source of 1.2V is available; alternatively, an external voltage reference (e.g., TI LMV431~\cite{texasinstrumentsLMV431}) can be used.
|
|
In STM32L5 and MSP430, an internal reference voltage source of 1.2V is available; alternatively, an external voltage reference (e.g., TI LMV431~\cite{texasinstrumentsLMV431}) can be used.
|
|
|
Note that $V_{ref}$ should be lower than the minimum operating voltage of the MCU (e.g., 1.7V), as $V_{ref}$ is generated by regulating $V_{dd}$.
|
|
Note that $V_{ref}$ should be lower than the minimum operating voltage of the MCU (e.g., 1.7V), as $V_{ref}$ is generated by regulating $V_{dd}$.
|
|
|
|
|
|
|
|
-$S_{sta}$ is designed for the techniques like \emph{static}, which query to decide checkpoint execution at checkpoint triggers.
|
|
|
|
|
|
|
+$S_{sta}$ is designed for techniques similar to \emph{static}, which query whether to execute a checkpoint at checkpoint triggers.
|
|
|
Since directly reading $V_{dd}$ is infeasible (i.e., $V_{dd}$ itself is a reference voltage), $S_{sta}$ reads $V_{ref}$ instead.
|
|
Since directly reading $V_{dd}$ is infeasible (i.e., $V_{dd}$ itself is a reference voltage), $S_{sta}$ reads $V_{ref}$ instead.
|
|
|
% Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$.
|
|
% Instead of reading $V_{ES}$ at checkpoint triggers, $S_{sta}$ reads $V_{ref}$.
|
|
|
This results in the same value of $\lfloor V_{ref}/V_{dd} \cdot 2^n \rfloor$ when operating at normal voltage, where $n$ is the ADC resolution.
|
|
This results in the same value of $\lfloor V_{ref}/V_{dd} \cdot 2^n \rfloor$ when operating at normal voltage, where $n$ is the ADC resolution.
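As an illustrative calculation, assume a 12-bit ADC ($n = 12$) and a nominal $V_{dd}$ of 3.3V (both values are assumptions chosen for this example), together with the 1.2V reference mentioned above:
\[
\left\lfloor \frac{V_{ref}}{V_{dd}} \cdot 2^{n} \right\rfloor
= \left\lfloor \frac{1.2}{3.3} \cdot 2^{12} \right\rfloor
= \lfloor 1489.45 \rfloor = 1489 .
\]
Under the same relation, if $V_{dd}$ were to sag to 1.8V, the conversion would yield $\lfloor 1.2/1.8 \cdot 2^{12} \rfloor = 2730$, so a reading above the nominal value indicates that $V_{dd}$ has dropped.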
|
|
@@ -102,7 +102,7 @@ Specifically, we configure $R1$ and $R2$ to satisfy $\frac{R2}{R1+R2} \cdot V_{t
|
|
|
Fig.~\ref{fig:expr_precise_checkpoint_timings} compares the average execution times of the benchmarks over 30 iterations between traditional systems and the proposed setups.
|
|
Fig.~\ref{fig:expr_precise_checkpoint_timings} compares the average execution times of the benchmarks over 30 iterations between traditional systems and the proposed setups.
|
|
|
Fig.~\ref{fig:expr_precise_checkpoint_timings_static} and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} illustrate the results of $S_{sta}$ and $S_{dyn}$, respectively.
|
|
Fig.~\ref{fig:expr_precise_checkpoint_timings_static} and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} illustrate the results of $S_{sta}$ and $S_{dyn}$, respectively.
|
|
|
% illustrates the performance of $S_{sta}$ and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} presents the result for $S_{dyn}$.
|
|
% illustrates the performance of $S_{sta}$ and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} presents the result for $S_{dyn}$.
|
|
|
-The error bars indicate the minimum and maximum execution times for each benchmark.
|
|
|
|
|
|
|
+The whiskers indicate the minimum and maximum execution times for each benchmark.
|
|
|
The results show significant improvements in execution times for both systems, with an average gain of 3.04x in $S_{sta}$ and 2.85x in $S_{dyn}$.
|
|
The results show significant improvements in execution times for both systems, with an average gain of 3.04x in $S_{sta}$ and 2.85x in $S_{dyn}$.
|
|
|
While the effectiveness of checkpoint schemes varies depending on application characteristics, our setups evenly enhance performance across all benchmarks.
|
|
While the effectiveness of checkpoint schemes varies depending on application characteristics, our setups evenly enhance performance across all benchmarks.
|
|
|
This underscores the importance of accurately detecting power-off events for efficient intermittent system operation.
|
|
This underscores the importance of accurately detecting power-off events for efficient intermittent system operation.
|
|
@@ -135,7 +135,7 @@ F2 is set to have the lowest access latency but requires the system stop operati
|
|
|
Fig.~\ref{fig:expr_peripheral_voltage} presents the execution times of the benchmarks for the two configurations in $S_{dyn}$, averaged over 30 runs.
|
|
Fig.~\ref{fig:expr_peripheral_voltage} presents the execution times of the benchmarks for the two configurations in $S_{dyn}$, averaged over 30 runs.
|
|
|
Despite its doubled latency, F1 completes the workloads 1.46x faster on average, with consistent improvements across all benchmarks.
|
|
Despite its doubled latency, F1 completes the workloads 1.46x faster on average, with consistent improvements across all benchmarks.
|
|
|
These results suggest that using a slower FRAM that operates down to 1.8V (e.g.,~\cite{fujitsuMB85R4M2T}) could considerably improve the performance of our reference system.
|
|
These results suggest that using a slower FRAM that operates down to 1.8V (e.g.,~\cite{fujitsuMB85R4M2T}) could considerably improve the performance of our reference system.
|
|
|
-This example clearly shows that operating voltage, often overlooked in the traditional execution model, should be considered a critical design parameter.
|
|
|
|
|
|
|
+This example clearly shows that operating voltage, often overlooked in the traditional model, should be considered a critical design parameter.
|
|
|
|
|
|
|
|
Finally, our model highlights the advantages of using smaller decoupling capacitors.
|
|
Finally, our model highlights the advantages of using smaller decoupling capacitors.
|
|
|
Larger buffers not only increase the ratio of sub-normal voltage operations but also raise the amount of discharged energy during power-offs.
|
|
Larger buffers not only increase the ratio of sub-normal voltage operations but also raise the amount of discharged energy during power-offs.
|