Checkpoint(Pausing) might be causing Invalid Results
Author | Message |
---|---|
Joined: 11 Aug 24 Posts: 1 Credit: 152,500 RAC: 3,322 |
In short: interrupting the calculation (by a pause, a reboot, etc.) clears its result file (seeds.txt), which I think is what eventually leads to the invalid result.
Hey guys! For the project "1.20 Find seeds with zero villages within a radius v1.01 (cuda)", I got my first task error, so I decided to take a look at other users' errors. Having checked three users' tasks and the work units they belong to, I noticed that every task that had been paused at least once failed, while all of the successful tasks ran without a pause. Although I haven't done a complete investigation, I assume that an interrupted calculation eventually produces an invalid result.
I monitored my BOINC data directory and found that after a pause or reboot, the seeds.txt file is cleared and "checkpoint loaded..." is appended to stderr.txt. I couldn't find the original seeds anywhere else, so I guess the program simply uploads seeds.txt as the result.
Now I'm running a test on my laptop, with a scheduled job backing up the seeds file every 5 minutes. I re-imported the first two seeds that had been cleared, and the task is now nearly complete. Let's see whether it validates this time...
[1.19, 3:49 p.m. UTC+8] Yes! I got it! Please look at https://minecraftathome.com/minecrafthome/workunit.php?wuid=5093723, where Task 10832714 is mine. Task 10310763 was paused 2 times and came out wrong, while Task 10310764 was never paused. Although my task was paused 4 times, it still succeeded because I re-imported the 2 missing seeds that had been cleared by the checkpoint.
So this is my conclusion: interrupting the calculation (by a pause, a reboot, etc.) clears its result file (seeds.txt), and I think that eventually leads to the calculation error. Surprisingly, a busy CPU doesn't trigger this; the program is probably suspended through signals in that case rather than being killed and resumed from a checkpoint.
In my opinion, the program could be improved by not clearing the file at startup, and instead checking whether it exists and appending to its end. After all, the BOINC client automatically clears the task's slot once the task has finished. |
Joined: 15 Jun 20 Posts: 32 Credit: 101,415,555 RAC: 110,632 |
Hi! Thanks for the bug report! I think you're correct. Here's where we're opening seeds.txt: https://github.com/MinecraftAtHome/LoneliestSeed/blob/main/main.cu#L4867 I believe the mode should be "a", not "w+", as "w"/"w+" would replace the file in some fashion. I'll update that and push the new version up to the server once it's finished. I appreciate the heads-up about this, and apologies for any inconvenience caused by the validation errors. Thank you! |
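To illustrate the difference between the two `fopen` modes mentioned in this reply, here is a minimal C sketch. It is not the project's code: `write_seed` and `count_lines` are hypothetical helpers, and each call to `write_seed` stands in for one run segment (start, write a seed, get interrupted). It only assumes standard C behavior, where "w+" truncates an existing file on open and "a" appends to it.

```c
#include <stdio.h>

/* Hypothetical helper: simulates one run segment that opens the result
   file with the given mode, writes a single seed, and closes the file. */
int write_seed(const char *path, const char *mode, long long seed) {
    FILE *f = fopen(path, mode);
    if (f == NULL) {
        return -1; /* could not open the file */
    }
    fprintf(f, "%lld\n", seed);
    return fclose(f); /* 0 on success */
}

/* Hypothetical helper: counts newline-terminated lines, i.e. how many
   seeds are still present in the file. Returns -1 if it cannot be read. */
int count_lines(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    int lines = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            lines++;
        }
    }
    fclose(f);
    return lines;
}
```

Calling `write_seed` twice on the same path with "w+" leaves only the second seed in the file, because the second open truncates it; doing the same with "a" leaves both seeds. That matches the symptom described in the first post, where seeds found before a restart went missing.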
Joined: 9 Sep 24 Posts: 3 Credit: 25,487,500 RAC: 2,418 |
I see a validation error in one of my tasks (running v1.07) that had to checkpoint. Will this be fixed in v1.08? https://minecraftathome.com/minecrafthome/result.php?resultid=10340374
<core_client_version>7.18.1</core_client_version>
<![CDATA[
<stderr_txt>
boinc gpu 0 gpuindex: 0
No checkpoint to load
boinc gpu 0 gpuindex: 0
Checkpoint loaded, task time 1097 s, seed pos: 119
checked = 1073741824
time taken = 9809.283000
seeds per second: 124341.598919
07:08:31 (24849): called boinc_finish(0)
</stderr_txt>
]]> |