|
|
@@ -35,7 +35,7 @@ We evaluate the impact of delaying checkpoint executions in \emph{dynamic}, by v
|
|
|
A 1100uF capacitor is used for $C_{ES}$.
|
|
|
% Fig.~\ref{fig:expr_checkpoint_voltages} presents the benchmark execution times in dynamic checkpoint scheme, across various checkpoint execution voltages.
|
|
|
% A 1100uF capacitor is used for $C_{ES}$ and the execution times are normalized to the 3.4V configuration.
|
|
|
-Fig.~\ref{fig:expr_checkpoint_voltages} presents the average execution times of the benchmarks over 20 runs, normalized to the 3.V configuration.
|
|
|
+Fig.~\ref{fig:expr_checkpoint_voltages} presents the average execution times of the benchmarks over 30 runs, normalized to the 3.4V configuration.
|
|
|
 The results show that, contrary to existing expectations, executing checkpoints earlier is significantly less efficient: execution is slower by 1.38x in the 3.7V setup and by 2.45x in the 4.0V setup, on average.
|
|
|
Moreover, the overhead is consistent across all benchmarks since early checkpoint executions directly reduce the energy available for the computing system.
|
|
|
% Consequently, to design efficient checkpoint techniques, it important to minimize the margin between checkpoint execution and the power-off.
|
|
|
@@ -99,11 +99,15 @@ Specifically, we configure $R1$ and $R2$ to satisfy $\frac{R2}{R1+R2} \cdot V_{t
|
|
|
\label{fig:expr_precise_checkpoint_timings}
|
|
|
\end{figure}
|
|
|
|
|
|
-Fig.~\ref{fig:expr_precise_checkpoint_timings} shows the average execution times of the benchmarks over 30 iterations, comparing the traditional systems with the proposed setups.
|
|
|
-Fig.~\ref{fig:expr_precise_checkpoint_timings_static} illustrates the performance of $S_{sta}$ and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} presents the result for $S_{dyn}$.
|
|
|
-The error bars indicate the minimum and maximum measured execution times for each benchmark.
|
|
|
-The results clearly demonstrate that the execution time is significantly improved in both systems by extending the operation at sub-normal voltages: 3.04x in $S_{sta}$ and 2.85x in $S_{dyn}$.
|
|
|
-Furthermore, these improvements are consistent across all benchmarks, regardless of the application characteristics, highlighting the general effectiveness of the proposed setups.
|
|
|
+Fig.~\ref{fig:expr_precise_checkpoint_timings} compares the average execution times of the benchmarks over 30 iterations between traditional systems and the proposed setups.
|
|
|
+Fig.~\ref{fig:expr_precise_checkpoint_timings_static} and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} illustrate the results of $S_{sta}$ and $S_{dyn}$, respectively.
|
|
|
+% illustrates the performance of $S_{sta}$ and Fig.~\ref{fig:expr_precise_checkpoint_timings_dynamic} presents the result for $S_{dyn}$.
|
|
|
+The error bars indicate the minimum and maximum execution times for each benchmark.
|
|
|
+The results show significant improvements in execution time for both systems, with average gains of 3.04x in $S_{sta}$ and 2.85x in $S_{dyn}$.
|
|
|
+While the effectiveness of checkpoint schemes varies with application characteristics, our setups improve performance consistently across all benchmarks.
|
|
|
+This underscores the importance of accurately detecting power-off events for efficient intermittent system operation.
|
|
|
+% It clearly demonstrates that the both setups can extend the operation at sub-normal voltages: 3.04x in $S_{sta}$ and 2.85x in $S_{dyn}$.
|
|
|
+% Furthermore, these improvements are consistent across all benchmarks, regardless of the application characteristics, highlighting the general effectiveness of the proposed setups.
|
|
|
|
|
|
Another advantage of the proposed setups is their simplicity and practical applicability.
|
|
|
 Since both setups only modify the method for detecting imminent power failures and leave the checkpoint algorithms unchanged, they are straightforward to apply to existing techniques.
|
|
|
@@ -128,7 +132,7 @@ F2 is set to have the lowest access latency but requires the system stop operati
|
|
|
\label{fig:expr_peripheral_voltage}
|
|
|
\end{figure}
|
|
|
|
|
|
-Fig.~\ref{fig:expr_peripheral_voltage} presents the execution times of the benchmarks for the two configurations in $S_{dyn}$, averaged over 20 runs.
|
|
|
+Fig.~\ref{fig:expr_peripheral_voltage} presents the execution times of the benchmarks for the two configurations in $S_{dyn}$, averaged over 30 runs.
|
|
|
Despite its doubled latency, F1 completes the workloads 1.46x faster on average, with consistent improvements across all benchmarks.
|
|
|
 These results suggest that using slower FRAM that operates down to 1.8V (e.g.,~\cite{fujitsuMB85R4M2T}) could considerably improve the performance of our reference system.
|
|
|
This example clearly shows that operating voltage, often overlooked in the traditional execution model, should be considered a critical design parameter.
|