Xoroshigo2 v1.04 - New plan classes

boysanic
Project administrator
Project developer
Joined: 15 Jun 20
Posts: 70
Credit: 131,995,555
RAC: 940,864
Message 1125 - Posted: 13 Apr 2025, 1:15:04 UTC

Hello everyone,

We've generated some new work, and while I was at it, I implemented the plan class changes I alluded to in discussion with a few users who were trying to use Windows 7 to run xoroshigo.

Python no longer supports Windows 7. Python 3.8, released in 2019, was the last version that did; 3.9 made breaking changes and will not run on Windows 7.

To remedy the issue, we've implemented "win-modern" and "lin-modern" as plan classes we can use if our apps won't run on older OSes.

Put simply, on Linux, if your glibc is older than 2.27, you won't get work for xoroshigo2 (or any other app that requires lin-modern, which is likely to be all of our work in the future).
On Windows, your Windows version must be 8.1 or higher.
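
For anyone who wants to check a host locally, here is a minimal Python sketch of the two thresholds above. It is not the project's scheduler logic (that check happens server-side); the function names are made up for illustration, and the only assumptions beyond the post are that Windows 8.1 reports itself as NT 6.3 and that versions are compared as integer tuples so that 2.9 sorts below 2.27.

```python
import platform
import sys

MIN_GLIBC = (2, 27)    # lin-modern threshold stated in the post
MIN_WINDOWS = (6, 3)   # assumption: Windows 8.1 identifies itself as NT 6.3


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_modern(version_text):
    """True if the given glibc version string is at least 2.27."""
    return parse_version(version_text) >= MIN_GLIBC


def windows_is_modern(major, minor):
    """True if the given Windows NT version is at least 6.3 (Windows 8.1)."""
    return (major, minor) >= MIN_WINDOWS


def host_is_modern():
    """Best-effort local check; returns None when the platform is not recognised."""
    if sys.platform.startswith("linux"):
        lib, version = platform.libc_ver()
        # libc_ver() can return empty strings, for example on non-glibc systems.
        return glibc_is_modern(version) if lib == "glibc" and version else None
    if sys.platform == "win32":
        info = sys.getwindowsversion()
        return windows_is_modern(info.major, info.minor)
    return None


if __name__ == "__main__":
    print("meets modern plan class requirements:", host_is_modern())
```

Comparing tuples rather than strings or floats matters here: as text or as a float, "2.9" would look newer than "2.27".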

Let us know if you run into any issues running our apps. Happy crunching!
ID: 1125
Keith Myers
Joined: 8 Mar 21
Posts: 81
Credit: 877,820,473
RAC: 9,313,686
Message 1127 - Posted: 13 Apr 2025, 1:21:24 UTC - in response to Message 1125.  

Thanks for the new apps. Should reduce the error rate from older hosts. Already running some lin-modern tasks with no issues.
ID: 1127
bluestang
Joined: 3 Sep 20
Posts: 7
Credit: 507,652,549
RAC: 7,449,993
Message 1135 - Posted: 14 Apr 2025, 15:03:52 UTC

Is this new app making WUs run 10x longer?
ID: 1135
Keith Myers
Joined: 8 Mar 21
Posts: 81
Credit: 877,820,473
RAC: 9,313,686
Message 1141 - Posted: 14 Apr 2025, 22:52:53 UTC - in response to Message 1135.  

> Is this new app making WUs run 10x longer?

Don't know. We never saw this on the earlier apps. The developer said he tested the app on the first 50 configuration files and saw no issues. But the latest configuration files are screwing the app up or something.
ID: 1141
boysanic
Project administrator
Project developer
Joined: 15 Jun 20
Posts: 70
Credit: 131,995,555
RAC: 940,864
Message 1142 - Posted: 15 Apr 2025, 4:37:28 UTC - in response to Message 1141.  

Hey Keith,

I’ll write up an explainer tonight about what happened and what we’re working on to try to correct it.

The short answer though is that we did not anticipate these workunits to run as long as they are and I’ve been hard at work for much of the day analyzing and diagnosing it.

I hope to have a solid fix in place tonight or tomorrow, but I’ll still write up a more detailed post tonight regardless.
ID: 1142
Keith Myers
Joined: 8 Mar 21
Posts: 81
Credit: 877,820,473
RAC: 9,313,686
Message 1144 - Posted: 15 Apr 2025, 7:20:04 UTC - in response to Message 1142.  

Thanks for the progress update. Hope it gets resolved soonest.
ID: 1144
boysanic
Project administrator
Project developer
Joined: 15 Jun 20
Posts: 70
Credit: 131,995,555
RAC: 940,864
Message 1145 - Posted: 15 Apr 2025, 7:30:01 UTC

So the gist of the issue:

Some of the new configuration files produce situations where the function that used to account for most of the runtime no longer does.
This reveals room for optimization: another function, which we had not yet written a native implementation for, is now consuming most of the runtime.
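
To illustrate how a shift like this shows up, here is a small Python sketch using the standard `cProfile` module. The functions `step_a`, `step_b` and `workload` are hypothetical stand-ins, not the project's code; the point is only that the same program can have a different hot spot depending on its input.

```python
import cProfile
import pstats


def step_a(n):
    """Hypothetical stand-in for the function that used to dominate."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def step_b(n):
    """Hypothetical stand-in for the function with no native implementation yet."""
    total = 0
    for i in range(n):
        total ^= i
    return total


def workload(a_iters, b_iters):
    """One 'configuration' is just a pair of iteration counts in this sketch."""
    return step_a(a_iters) + step_b(b_iters)


def hottest_function(func, *args):
    """Profile func(*args) and return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    # stats maps (file, line, name) -> (calls, primitive calls, tottime, cumtime, callers)
    stats = pstats.Stats(profiler).stats
    (_, _, name), _ = max(stats.items(), key=lambda item: item[1][2])
    return name


if __name__ == "__main__":
    print(hottest_function(workload, 200_000, 1_000))   # step_a dominates
    print(hottest_function(workload, 1_000, 200_000))   # step_b dominates
```

Profiling only a handful of inputs can hide this kind of input-dependent hot spot, which is roughly the situation described above.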

We've been working on this new implementation of that function, which has necessitated a re-implementation of the RNG used by numpy, and we're getting results that are close to the original. Not perfect, but close.

The efforts I've put in so far have netted a 6.6x runtime improvement for the worst-case config file I tested for 1 million iterations. But, again, it's not quite ready to release because of the inaccuracy I mentioned before.

Once we decide whether we're satisfied with "close enough", or we find the bug in the implementation we've written that is causing the inaccuracy, we'll be ready to push this out to the wider BOINC project and hopefully address the runtime discrepancy for everyone.
ID: 1145
Gnarwhals
Joined: 3 Apr 25
Posts: 1
Credit: 7,165,000
RAC: 293,993
Message 1146 - Posted: 15 Apr 2025, 14:25:11 UTC

Any reason why some v1.04 tasks are taking 5-20x longer than previously? Previous avg runtime was about 2hrs. See e.g. https://minecraftathome.com/minecrafthome/workunit.php?wuid=7371217. I have another task that's about 22hrs in with 14hrs remaining. I don't want to abort these extra-long tasks, but with constant-credit I feel like I'm getting the short end of the stick here.
ID: 1146
Keith Myers
Joined: 8 Mar 21
Posts: 81
Credit: 877,820,473
RAC: 9,313,686
Message 1147 - Posted: 15 Apr 2025, 15:51:39 UTC - in response to Message 1146.  

If the tasks get fixed back to the intended runtimes, there will be no need to update the fixed credit system. But I agree we are getting shorted on credit for the current long-running tasks, given the amount of computation we are putting into the effort.

Maybe we can be credited accordingly for these temporary long running tasks in the meantime.
ID: 1147
esek
Joined: 5 Mar 25
Posts: 4
Credit: 23,315,000
RAC: 791,016
Message 1149 - Posted: 15 Apr 2025, 16:11:18 UTC

A task has been running for about 40 hours with 37 per cent done. Its progress rate is declining.

Since yesterday, its estimated remaining time has increased rather than decreased. If the admins won't abort these tasks, I'd like to see whether it can be done in a week.

<checkpoint_cpu_time>137811.800000</checkpoint_cpu_time>
<checkpoint_elapsed_time>139462.959419</checkpoint_elapsed_time>
<fraction_done>0.371080</fraction_done>
Estimated time remaining 2d 17:39:28
Progress rate 1.080% per hour
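
As a rough cross-check of these figures, a plain linear extrapolation from `fraction_done` and the elapsed time can be sketched in Python. This is only an illustration, not how the BOINC client necessarily computes its estimate; the helper names are made up. With the checkpoint values above it lands within a few seconds of the displayed 2d 17:39:28.

```python
def estimate_remaining_seconds(elapsed_seconds, fraction_done):
    """Linear extrapolation: assumes the rest of the task runs at the average rate so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds * (1.0 - fraction_done) / fraction_done


def format_duration(seconds):
    """Format a number of seconds as 'Nd HH:MM:SS'."""
    total = int(seconds)
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


# Values taken from the checkpoint data quoted above.
remaining = estimate_remaining_seconds(139462.959419, 0.371080)
print(format_duration(remaining))
```

If the progress rate keeps declining, a linear estimate like this will keep growing over time instead of counting down.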
ID: 1149
boysanic
Project administrator
Project developer
Joined: 15 Jun 20
Posts: 70
Credit: 131,995,555
RAC: 940,864
Message 1150 - Posted: 16 Apr 2025, 8:21:02 UTC

Just an update, but I'm much closer on the binary fixes.
The RNG I was re-implementing is producing the exact values I expected. Now I'm just debugging the integration of that re-implementation into our project's code.
I hope to have this finalized tomorrow.
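
For readers curious what "producing the exact values" means in practice, here is a sketch of that kind of check in Python. The thread does not say which generator is involved, so this assumes MT19937 purely for illustration (it is the generator behind numpy's legacy `RandomState` and Python's own `random` module). The reference here is Python's `random.Random`, not numpy, and the seeding follows CPython's integer-seed path; numpy seeds differently, so a real check against numpy would have to compare with numpy's own output.

```python
import random

N, M = 624, 397
MASK32 = 0xFFFFFFFF


class MT19937:
    """Pure-Python MT19937, seeded the way CPython seeds from a non-negative int."""

    def __init__(self, seed):
        # Split the seed into 32-bit words, least significant word first.
        key = []
        while True:
            key.append(seed & MASK32)
            seed >>= 32
            if seed == 0:
                break
        self._init_by_array(key)

    def _init_genrand(self, s):
        self.mt = [0] * N
        self.mt[0] = s & MASK32
        for i in range(1, N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & MASK32
        self.index = N  # force a twist before the first output

    def _init_by_array(self, key):
        self._init_genrand(19650218)
        mt = self.mt
        i, j = 1, 0
        for _ in range(max(N, len(key))):
            prev = mt[i - 1]
            mt[i] = ((mt[i] ^ ((prev ^ (prev >> 30)) * 1664525)) + key[j] + j) & MASK32
            i += 1
            j += 1
            if i >= N:
                mt[0] = mt[N - 1]
                i = 1
            if j >= len(key):
                j = 0
        for _ in range(N - 1):
            prev = mt[i - 1]
            mt[i] = ((mt[i] ^ ((prev ^ (prev >> 30)) * 1566083941)) - i) & MASK32
            i += 1
            if i >= N:
                mt[0] = mt[N - 1]
                i = 1
        mt[0] = 0x80000000

    def _twist(self):
        mt = self.mt
        for k in range(N):
            y = (mt[k] & 0x80000000) | (mt[(k + 1) % N] & 0x7FFFFFFF)
            mt[k] = mt[(k + M) % N] ^ (y >> 1) ^ (0x9908B0DF if y & 1 else 0)
        self.index = 0

    def next_u32(self):
        """Return the next tempered 32-bit output."""
        if self.index >= N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & MASK32


def matches_reference(seed, count=2000):
    """Compare raw 32-bit outputs against Python's built-in generator, value for value."""
    reference = random.Random(seed)
    candidate = MT19937(seed)
    return all(candidate.next_u32() == reference.getrandbits(32) for _ in range(count))


if __name__ == "__main__":
    print(matches_reference(12345))
```

Comparing raw integer outputs first, before any floating-point conversion or distribution sampling built on top, makes it easier to tell a generator bug from a bug in the code that consumes the random numbers.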


We need to address the task deadline issue still, of course. I'm considering bumping the credit amount up considerably as a temporary consolation once the new binary is in place.
ID: 1150
mmonnin
Joined: 25 Jun 20
Posts: 15
Credit: 494,111,845
RAC: 7,023,004
Message 1151 - Posted: 16 Apr 2025, 9:04:58 UTC

Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too.
ID: 1151
klepel
Joined: 11 Mar 21
Posts: 5
Credit: 219,419,901
RAC: 2,226,139
Message 1152 - Posted: 16 Apr 2025, 10:48:50 UTC - in response to Message 1151.  
Last modified: 16 Apr 2025, 10:50:31 UTC

> Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too.

+1
ID: 1152
tito
Joined: 28 Jun 20
Posts: 6
Credit: 125,834,723
RAC: 1,966,711
Message 1153 - Posted: 16 Apr 2025, 17:41:26 UTC - in response to Message 1151.  

> Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too.

+1
ID: 1153
bluestang
Joined: 3 Sep 20
Posts: 7
Credit: 507,652,549
RAC: 7,449,993
Message 1154 - Posted: 16 Apr 2025, 20:19:00 UTC - in response to Message 1151.  
Last modified: 16 Apr 2025, 20:19:18 UTC

> Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too.

Doesn't matter, the runtimes are 10-20x longer or more... the credits need to be adjusted to make up for not only the extended runtimes, but also the fact that even before this batch a lot of tasks from earlier this month went Invalid. That is quite a bit of resources wasted with no reward.
ID: 1154
Keith Myers
Joined: 8 Mar 21
Posts: 81
Credit: 877,820,473
RAC: 9,313,686
Message 1155 - Posted: 16 Apr 2025, 21:49:52 UTC - in response to Message 1154.  

>> Credit was already quite high for the given run time. I wouldn't suggest going higher. You're fixing the issue. If run times return to a more normal 1-2 hours the deadline is acceptable too.
>
> Doesn't matter, the runtimes are 10-20x longer or more... the credits need to be adjusted to make up for not only the extended runtimes, but also the fact that even before this batch a lot of tasks from earlier this month went Invalid. That is quite a bit of resources wasted with no reward.

+1 I agree, especially for the already returned ones that were in fact valid and only invalidated because the tasks were cancelled by bad admin configurations.
ID: 1155
Conan
Joined: 25 Jun 20
Posts: 8
Credit: 97,507,028
RAC: 2,059,890
Message 1156 - Posted: 17 Apr 2025, 6:55:32 UTC
Last modified: 17 Apr 2025, 7:00:21 UTC

I have half a dozen running on my Windows laptop, and they don't appear on my account. All have run times of 2 days or more. Should I kill them?

Conan

EDIT: I found the work units. All have been thrown in the error box as "Timeout No Response". I wonder whether I will get any credit. They have passed 70% done.
ID: 1156
esek
Joined: 5 Mar 25
Posts: 4
Credit: 23,315,000
RAC: 791,016
Message 1157 - Posted: 17 Apr 2025, 7:01:45 UTC - in response to Message 1156.  

They may have timed out, but if they are completed and reported before the third validation arrives, the timed out tasks may still be valid? But I think a task that long might not end up getting validated even if completed. I have a task that has been running for over three days and it is still running and has timed out.

Name
xoroshigo_2.07_config-053-hxlreg-fullinfo-rank005-tamTZ5DN_12
CPU time
3d 03:08:40
Elapsed time
3d 03:39:36
Estimated time remaining
2d 19:25:36
ID: 1157
Dr Who Fan
Joined: 21 Jul 20
Posts: 18
Credit: 5,862,364
RAC: 8,448
Message 1158 - Posted: 17 Apr 2025, 18:07:08 UTC

I aborted one Task 15046751 because it had run for 2 days 14 hours 50 min 4 sec, and was only 34% complete.
The estimate to complete was getting exponentially longer every hour it was running.
ID: 1158
Drago75
Joined: 13 Oct 20
Posts: 14
Credit: 119,231,591
RAC: 2,827,872
Message 1160 - Posted: 18 Apr 2025, 10:01:17 UTC - in response to Message 1158.  

As some WUs now run for several days, I would strongly suggest increasing the latest return time to at least a week, if not 10 days! As soon as all the WUs are back to the usual run times, you can bring that time back to three days.
ID: 1160