Pausing tasks causes task restart

Message boards : Number crunching : Pausing tasks causes task restart

Michael H.W. Weber
Joined: 27 Jun 20
Posts: 9
Credit: 788,006
RAC: 0
Message 87 - Posted: 28 Jun 2020, 9:38:39 UTC

So far, I have processed only one task. After it started yesterday on my NVIDIA GTX1060, I paused computation about 10 minutes in. When I returned to my machine, the run time had been reset to zero, and upon resuming, the GPU task started from scratch.
At least it completed successfully after around 4 hours and was validated.

I have never seen such behaviour before (I have been participating 24/7 in distributed computing projects since 2001).

Michael.
President of Rechenkraft.net - This world's first and largest distributed computing organization. We make those things possible that supercomputers don't.
ID: 87
chip
Joined: 14 Jun 20
Posts: 78
Credit: 1,321,619
RAC: 0
Message 88 - Posted: 28 Jun 2020, 12:47:26 UTC - in response to Message 87.  

Hey, checkpointing is available in a development branch. It’ll be pushed shortly to replace the current app version, hopefully by the end of the day.

Long story short, we thought it was already sorted, but alas, here we are.
ID: 88
Jord
Volunteer moderator
Help desk expert
Joined: 24 Jun 20
Posts: 85
Credit: 207,156
RAC: 0
Message 94 - Posted: 28 Jun 2020, 15:38:11 UTC - in response to Message 88.  
Last modified: 28 Jun 2020, 15:39:26 UTC

Not so sure I would call this checkpointing...

28/06/2020 16:52:47 | minecrafthome | Starting task kaktwoos_1.0.7_0218_268465497051539_0
28/06/2020 16:52:47 | minecrafthome | [cpu_sched] Starting task kaktwoos_1.0.7_0218_268465497051539_0 using kaktwoos version 112 (opencl_amd) in slot 0
28/06/2020 17:32:31 | minecrafthome | [checkpoint] result kaktwoos_1.0.7_0218_268465497051539_0 checkpointed
28/06/2020 17:32:34 | minecrafthome | Computation for task kaktwoos_1.0.7_0218_268465497051539_0 finished

That's a single checkpoint, written just seconds before the task finishes. The remaining-time estimate is now 2 hours when the task starts, and progress only gets up to 22% before the task is finished.

(My preferences are to checkpoint at most every 65 seconds)
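
To put numbers on the log excerpt above, here is a small sketch that parses the four quoted lines and measures the gaps between events. It assumes the timestamps are day/month/year in local time; the helper name `parse_events` is made up for this illustration and is not part of BOINC.

```python
from datetime import datetime

# Log lines copied from the post above (assumed to be day/month/year).
LOG = """\
28/06/2020 16:52:47 | minecrafthome | Starting task kaktwoos_1.0.7_0218_268465497051539_0
28/06/2020 16:52:47 | minecrafthome | [cpu_sched] Starting task kaktwoos_1.0.7_0218_268465497051539_0 using kaktwoos version 112 (opencl_amd) in slot 0
28/06/2020 17:32:31 | minecrafthome | [checkpoint] result kaktwoos_1.0.7_0218_268465497051539_0 checkpointed
28/06/2020 17:32:34 | minecrafthome | Computation for task kaktwoos_1.0.7_0218_268465497051539_0 finished
"""


def parse_events(log):
    """Split each line into (timestamp, message); hypothetical helper."""
    events = []
    for line in log.splitlines():
        stamp, _project, message = (part.strip() for part in line.split("|", 2))
        events.append((datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), message))
    return events


events = parse_events(LOG)
start = next(t for t, m in events if m.startswith("Starting task"))
checkpoint = next(t for t, m in events if m.startswith("[checkpoint]"))
finish = next(t for t, m in events if "finished" in m)

# Seconds from task start to the first (and only) checkpoint,
# and from that checkpoint to the end of the task.
start_to_checkpoint = int((checkpoint - start).total_seconds())
checkpoint_to_finish = int((finish - checkpoint).total_seconds())
print(start_to_checkpoint, checkpoint_to_finish)
```

By this count the task ran for 2384 seconds (39 min 44 s) before its only checkpoint, which came 3 seconds before it finished. With a 65-second minimum interval, dozens of checkpoints would have been permitted over the same period.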
ID: 94
Hy
Project developer
Joined: 15 Jun 20
Posts: 74
Credit: 19,537,761
RAC: 0
Message 102 - Posted: 29 Jun 2020, 4:59:59 UTC
Last modified: 29 Jun 2020, 5:00:16 UTC

ID: 102
Michael H.W. Weber
Joined: 27 Jun 20
Posts: 9
Credit: 788,006
RAC: 0
Message 103 - Posted: 29 Jun 2020, 7:28:28 UTC

It is not about checkpointing. Until now, I assumed that a paused task is simply halted and kept in memory. The experience I described above, however, suggests that Minecraft discards all of its data after a certain period of time, which I find baffling. ;-)

Michael.
ID: 103
Henk Haneveld
Joined: 24 Jun 20
Posts: 3
Credit: 1,715,763
RAC: 8
Message 106 - Posted: 29 Jun 2020, 11:45:29 UTC - in response to Message 103.  

When a GPU task is paused it is removed from the GPU memory. Without checkpointing it has to restart from the beginning.
The "Leave in Memory" setting works only for CPU tasks.
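
For illustration, here is a minimal sketch of the checkpoint-and-resume pattern described above. This is not the project's app and does not use the BOINC API; the file name, the function names, and the toy workload (summing integers) are all made up for the example. The `stop_after` parameter simulates the task being paused and dropped from memory.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so an interrupted write
    cannot leave a half-written checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, n_items, every, stop_after=None):
    """Sum 0..n_items-1, saving a checkpoint every `every` items.

    `stop_after` simulates a pause that evicts the task after that
    many items in this run; the function then returns None.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next"] < n_items:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # in-memory progress since the last checkpoint is lost
        state["total"] += state["next"]
        state["next"] += 1
        done_this_run += 1
        if state["next"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "checkpoint.json")
    first = run(ckpt, n_items=1000, every=100, stop_after=250)  # interrupted
    resumed_from = load_checkpoint(ckpt)["next"]                # last saved position
    result = run(ckpt, n_items=1000, every=100)                 # resumes, then completes
print(first, resumed_from, result)
```

In this sketch the interrupted run loses only the 50 items processed since the last checkpoint and resumes at item 200. Without the checkpoint file, `load_checkpoint` would return the fresh state and the second run would start again from item 0, which matches the restart-from-scratch behaviour reported in this thread.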
ID: 106
Michael H.W. Weber
Joined: 27 Jun 20
Posts: 9
Credit: 788,006
RAC: 0
Message 108 - Posted: 29 Jun 2020, 14:03:11 UTC - in response to Message 106.  

Thanks for the information.

Michael.
ID: 108
chip
Joined: 14 Jun 20
Posts: 78
Credit: 1,321,619
RAC: 0
Message 118 - Posted: 1 Jul 2020, 13:10:52 UTC

This is solved in the latest round of updates; see the latest news post.
ID: 118