Xoroshigo2 v1.04 - New plan classes
Author | Message |
---|---|
Joined: 15 Jun 20 Posts: 70 Credit: 131,995,555 RAC: 940,864 |
Hello everyone, We've generated some new work, and while I was at it, I implemented the plan class changes I alluded to in discussion with a few users who were trying to run xoroshigo on Windows 7. Python no longer supports Windows 7: Python 3.8, released in 2019, was the last version that runs on it, and 3.9 made breaking changes and will not run on Windows 7. To remedy the issue, we've implemented "win-modern" and "lin-modern" as plan classes we can use if our apps won't run on older OSes. Put simply, on Linux, if your GLIBC is older than 2.27, you won't get work for xoroshigo2 (or any other app that requires lin-modern, which is likely to be all of our work in the future). On Windows, your Windows version must be 8.1 or higher. Let us know if you run into any issues running our apps. Happy crunching! |
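
For anyone who wants to check a host against the requirements in the announcement above, here is a minimal sketch. The function names are hypothetical, and the real eligibility decision is made by the project's scheduler, not by anything like this. It assumes Windows 8.1 reports itself as NT version 6.3 and that `platform.libc_ver()` can detect GLIBC on the host.

```python
import platform
import sys

MIN_GLIBC = (2, 27)    # from the announcement: GLIBC older than 2.27 gets no lin-modern work
MIN_WINDOWS = (6, 3)   # assumption: Windows 8.1 identifies itself as NT 6.3


def parse_version(text):
    """Turn a string such as '2.31' into (2, 31); non-numeric parts are skipped."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_modern_plan_class(system, glibc=None, windows=None):
    """Hypothetical host-side check mirroring the stated thresholds."""
    if system == "Linux":
        return glibc is not None and glibc >= MIN_GLIBC
    if system == "Windows":
        return windows is not None and windows >= MIN_WINDOWS
    return False


def check_this_host():
    """Gather version info for the current machine and apply the check."""
    system = platform.system()
    if system == "Linux":
        lib, version = platform.libc_ver()
        glibc = parse_version(version) if lib == "glibc" else None
        return meets_modern_plan_class(system, glibc=glibc)
    if system == "Windows":
        v = sys.getwindowsversion()
        return meets_modern_plan_class(system, windows=(v.major, v.minor))
    return False
```

Calling `check_this_host()` returns `True` or `False` for the machine it runs on. Versions are compared as integer tuples so that 2.3 sorts below 2.27, which a plain string or float comparison would get wrong.
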
Joined: 8 Mar 21 Posts: 81 Credit: 877,820,473 RAC: 9,313,686 |
Thanks for the new apps. They should reduce the error rate from older hosts. Already running some lin-modern tasks with no issues. |
Joined: 3 Sep 20 Posts: 7 Credit: 507,652,549 RAC: 7,449,993 |
Is this new app making WUs run 10x longer? |
Joined: 8 Mar 21 Posts: 81 Credit: 877,820,473 RAC: 9,313,686 |
"Is this new app making WUs run 10x longer?" Don't know. We never saw this on the earlier apps. The developer said he tested the app on the first 50 configuration files and saw no issues. But the latest configuration files are screwing the app up or something. |
Joined: 15 Jun 20 Posts: 70 Credit: 131,995,555 RAC: 940,864 |
Hey Keith, I’ll write up an explainer tonight about what happened and what we’re working on to try to correct it. The short answer though is that we did not anticipate these workunits to run as long as they are and I’ve been hard at work for much of the day analyzing and diagnosing it. I hope to have a solid fix in place tonight or tomorrow, but I’ll still write up a more detailed post tonight regardless. |
Joined: 8 Mar 21 Posts: 81 Credit: 877,820,473 RAC: 9,313,686 |
Thanks for the progress update. Hope it gets resolved soonest. |
Joined: 15 Jun 20 Posts: 70 Credit: 131,995,555 RAC: 940,864 |
So the gist of the issue: Some of the new configuration files produce situations where the function that used to account for most of the runtime no longer does. This reveals room for optimization, as another function we had not yet created a native implementation for is now consuming most of the runtime. We've been working on a native implementation of that function, which has necessitated a re-implementation of the RNG used by numpy, and we're getting results that are close to the original. Not perfect, but close. The efforts I've put in so far have netted a 6.6x runtime improvement for the worst-case config file I tested for 1 million iterations. But, again, it's not quite ready to release because of the inaccuracy I mentioned before. Once we decide whether we're satisfied with "close enough", or we find the bug in our implementation that is causing the inaccuracy, we'll be ready to push this out to the wider BOINC project and hopefully address the runtime discrepancy for everyone. |
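
One common way to hunt for this kind of inaccuracy is to run the reference generator and the re-implementation side by side from the same seed and report the first draw where they differ. The sketch below illustrates that idea only; it is not the project's code. `first_mismatch` and `lcg` are hypothetical names, and the textbook LCG is a stand-in for illustration, not the algorithm numpy uses.

```python
from itertools import islice


def first_mismatch(reference, candidate, count):
    """Compare the first `count` draws of two generators.

    Returns (index, reference_value, candidate_value) for the first
    differing draw, or None if all compared draws are identical.
    """
    pairs = zip(islice(reference, count), islice(candidate, count))
    for index, (ref, cand) in enumerate(pairs):
        if ref != cand:
            return index, ref, cand
    return None


def lcg(seed, bug_at=None):
    """Stand-in generator: a textbook 32-bit LCG (NOT numpy's algorithm).

    If `bug_at` is set, the draw at that index has its lowest bit flipped
    to imitate a defect in a re-implementation.
    """
    state = seed
    n = 0
    while True:
        state = (1664525 * state + 1013904223) % 2**32
        yield state ^ 1 if n == bug_at else state
        n += 1
```

For example, `first_mismatch(lcg(42), lcg(42, bug_at=10), 1000)` reports index 10, while two unmodified generators with the same seed return `None`. Knowing the first diverging draw narrows the search to the code path that produced it.
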
Joined: 3 Apr 25 Posts: 1 Credit: 7,165,000 RAC: 293,993 |
Any reason why some v1.04 tasks are taking 5-20x longer than previously? Previous avg runtime was about 2hrs. See e.g. https://minecraftathome.com/minecrafthome/workunit.php?wuid=7371217. I have another task that's about 22hrs in with 14hrs remaining. I don't want to abort these extra-long tasks, but with constant-credit I feel like I'm getting the short end of the stick here. |
Joined: 8 Mar 21 Posts: 81 Credit: 877,820,473 RAC: 9,313,686 |
If the tasks get fixed back to the intended fixed runtimes, there will be no need to update the fixed credit system. But I agree we are getting shorted on credit for the current long-running tasks, given the amount of computation we are putting into the effort. Maybe we can be credited accordingly for these temporary long-running tasks in the meantime. |
Joined: 5 Mar 25 Posts: 4 Credit: 23,315,000 RAC: 791,016 |
A task has been running for about 40 hours with 37 per cent done. Its progress rate is declining. Since yesterday, its estimated remaining time has increased rather than decreased. If the admins won't abort these tasks, I'd like to see whether it can be done in a week. <checkpoint_cpu_time>137811.800000</checkpoint_cpu_time> <checkpoint_elapsed_time>139462.959419</checkpoint_elapsed_time> <fraction_done>0.371080</fraction_done> Estimated time remaining 2d 17:39:28 Progress rate 1.080% per hour |
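
As a rough sanity check on the figures in the post above, the sketch below extrapolates the remaining time linearly from the elapsed time and `fraction_done`. This is an assumption about how such an estimate can be formed, not a statement of how the BOINC client computes it, and the function names are hypothetical.

```python
def linear_eta_seconds(elapsed_seconds, fraction_done):
    """Remaining time if the task keeps its average pace so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds * (1.0 - fraction_done) / fraction_done


def format_duration(seconds):
    """Render seconds as 'Nd HH:MM:SS'."""
    total = int(round(seconds))
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


# Values copied from the post above.
eta = linear_eta_seconds(139462.959419, 0.371080)
print(format_duration(eta))
```

With those inputs the extrapolation comes out at roughly 2 days 17 hours 39 minutes, in line with the estimate quoted in the post. A linear estimate like this keeps growing when the progress rate falls over time, which fits the rising remaining time described.
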
Joined: 15 Jun 20 Posts: 70 Credit: 131,995,555 RAC: 940,864 |
Just an update, but I'm much closer on the binary fixes. The RNG I was re-implementing is producing the exact values I expected. Now I'm just debugging including that re-implementation in our project's code. I hope to have this finalized tomorrow. We need to address the task deadline issue still, of course. I'm considering bumping the credit amount up considerably as a temporary consolation once the new binary is in place. |
Joined: 25 Jun 20 Posts: 15 Credit: 494,111,845 RAC: 7,023,004 |
Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too. |
Joined: 11 Mar 21 Posts: 5 Credit: 219,419,901 RAC: 2,226,139 |
"Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too." +1 |
Joined: 28 Jun 20 Posts: 6 Credit: 125,834,723 RAC: 1,966,711 |
"Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too." +1 |
Joined: 3 Sep 20 Posts: 7 Credit: 507,652,549 RAC: 7,449,993 |
"Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too." Doesn't matter, the runtimes are 10-20x longer or more... the credits need to be adjusted to make up not only for the extended runtimes, but also for the fact that even before this batch a lot of tasks from earlier this month went Invalid. That is quite a lot of resources wasted with no reward. |
Joined: 8 Mar 21 Posts: 81 Credit: 877,820,473 RAC: 9,313,686 |
"Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too." +1 I agree, especially for the already returned ones that were in fact valid and only invalidated because the tasks were cancelled by bad admin configurations. |
Joined: 25 Jun 20 Posts: 8 Credit: 97,507,028 RAC: 2,059,890 |
I have half a dozen running on my Windows laptop and they don't appear on my account. All have run times of 2+ days. Should I kill them? Conan. EDIT: I found the work units; all have been thrown in the error box as Timeout No Response. Will I get any credit, I wonder? They are past 70% done. |
Joined: 5 Mar 25 Posts: 4 Credit: 23,315,000 RAC: 791,016 |
They may have timed out, but if they are completed and reported before the third validation arrives, the timed out tasks may still be valid? But I think a task that long might not end up getting validated even if completed. I have a task that has been running for over three days and it is still running and has timed out. Name xoroshigo_2.07_config-053-hxlreg-fullinfo-rank005-tamTZ5DN_12 CPU time 3d 03:08:40 Elapsed time 3d 03:39:36 Estimated time remaining 2d 19:25:36 |
Joined: 21 Jul 20 Posts: 18 Credit: 5,862,364 RAC: 8,448 |
I aborted one task, 15046751, because it had run for 2 days 14 hours 50 min 4 sec and was only 34% complete. The estimate to complete was getting exponentially longer every hour it was running. |
Joined: 13 Oct 20 Posts: 14 Credit: 119,231,591 RAC: 2,827,872 |
As some WUs now run for several days, I would strongly suggest increasing the latest return time to at least a week, if not 10 days! As soon as all the WUs are back to the usual run times, you can bring that time back to three days. |