BAD_WORK_UNIT and AMD GPUs
#1
I've had quite a few BAD_WORK_UNIT errors over the last few hours/days (some of these WUs are also FAULTY for other folders, cf. https://apps.foldingathome.org/wu )
Code:
01:53:33:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:11777 run:0 clone:23631 gen:17 core:0x22 unit:0x0000001c287234c95e7749fffffe0e1c
09:10:52:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:11779 run:0 clone:1948 gen:31 core:0x22 unit:0x0000002d0d5a98395e73c5c57518e51b
16:09:58:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:11779 run:0 clone:437 gen:44 core:0x22 unit:0x000000360d5a98395e73c5de6d83a3fa
17:21:57:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:11776 run:0 clone:4209 gen:9 core:0x22 unit:0x0000001d287234c95e73c45dd1e3c4fa
18:10:23:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:11764 run:0 clone:2502 gen:40 core:0x22 unit:0x0000003e80fccb0a5e6d8271bb659f30
00:53:47:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:11781 run:0 clone:4816 gen:29 core:0x22 unit:0x000000310d5a98395e73c4f893e38b3a
01:02:49:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:11764 run:0 clone:3187 gen:33 core:0x22 unit:0x0000003780fccb0a5e6d85da25ea0604
09:02:17:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:11778 run:0 clone:29892 gen:23 core:0x22 unit:0x00000025287234c95e77482e18da6e1d
09:02:51:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:11759 run:0 clone:2960 gen:37 core:0x22 unit:0x0000003f80fccb0a5e6d7ca7854675bc
09:06:47:WU03:FS00:Sending unit results: id:03 state:SEND error:FAULTY project:11776 run:0 clone:12249 gen:19 core:0x22 unit:0x00000026287234c95e74333c7fa22e47
10:24:49:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:11776 run:0 clone:8419 gen:28 core:0x22 unit:0x0000002f287234c95e743375b03a9d44
10:29:49:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:11781 run:0 clone:5226 gen:26 core:0x22 unit:0x000000320d5a98395e758911e444f866
10:37:24:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:11781 run:0 clone:6558 gen:41 core:0x22 unit:0x000000420d5a98395e7588fe03d35e85

Always with the same error message.
Code:
01:02:48:WU00:FS00:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
01:02:48:WU00:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
01:02:49:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
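For reference, the numeric codes in these lines can be decoded. The sketch below assumes the `(-5)` is the raw OpenCL status returned by `clEnqueueNDRangeKernel`; the table is only a subset of the codes defined in the OpenCL header `CL/cl.h`, and the function name `decode_kernel_error` is made up for illustration.

```python
import re

# Subset of the status codes defined in the OpenCL header CL/cl.h.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches e.g. "Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)"
KERNEL_ERROR = re.compile(r"Error invoking kernel (\w+): (\w+) \((-?\d+)\)")


def decode_kernel_error(line):
    """Return (kernel, OpenCL call, code, symbolic name), or None if no match."""
    m = KERNEL_ERROR.search(line)
    if m is None:
        return None
    kernel, call, code = m.group(1), m.group(2), int(m.group(3))
    return kernel, call, code, OPENCL_ERRORS.get(code, "UNKNOWN")


line = ("01:02:48:WU00:FS00:0x22:ERROR:exception: "
        "Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)")
print(decode_kernel_error(line))
print(hex(114))  # the FahCore return code, shown in the log as "114 = 0x72"
```

Under that assumption, -5 maps to `CL_OUT_OF_RESOURCES`, i.e. the kernel launch itself was rejected.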

I'm apparently not the only one:
- https://foldingforum.org/viewtopic.php?f=19&t=32301
- https://foldingforum.org/viewtopic.php?f=74&t=32991
- https://foldingforum.org/viewtopic.php?f=81&t=32771

It seems related to the OpenCL implementation in certain drivers / on certain AMD cards (I have an RX 460)
Code:
09:02:51:WU03:FS00:Requesting new work unit for slot 00: READY gpu:0:Baffin XT [Radeon RX 460] from 40.114.52.201

Joe_H wrote: Someone else looked at this kind of error, it appears from their research that there is a limit set in the AMD OpenCL support that may be causing problems if the size of the WU exceeds some large number of atoms.  This will need to be checked into, it is not clear whether this just applies to some version of the drivers, is actually the problem, and so on.  The projects running into this error may get restricted from assignments to AMD GPU's at some point, a report was passed on to the F@h group.  Or the fix might be something else, too soon to be more definite.

muziqaz wrote: Researchers are looking into disabling those projects on AMD GPUs until fix has been found.
Thank you for understanding

I'll check during the day whether a driver update exists, since I keep getting this type of WU.
#2
Yes, it's a driver bug on those GPUs (all except Navi for now)... it is actually listed among the known issues in the latest version.

Most of the affected projects should be excluded for these GPUs in the meantime, but a few may remain...
#3
Same problem with an RX580: I installed the latest beta driver, 20.4.1, and I've only managed to complete one WU since yesterday. It's a bit frustrating to run only the CPU given its low output.
#4
But are these WUs still being assigned to your GPUs?
#5
I've just restarted the client because the delay between attempts keeps increasing... The GPU started working, but only for a few minutes before a new error. I'm pasting the log below because I'm far from understanding all of it! :)
Code:
*********************** Log Started 2020-04-07T09:34:19Z ***********************
09:34:27:FS01:Unpaused
09:34:27:WU01:FS01:Connecting to 65.254.110.245:8080
09:34:27:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
09:34:27:WU01:FS01:Connecting to 18.218.241.186:80
09:34:28:WU01:FS01:Assigned to work server 40.114.52.201
09:34:28:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580/590] from 40.114.52.201
09:34:28:WU01:FS01:Connecting to 40.114.52.201:8080
09:34:53:WU01:FS01:Downloading 29.70MiB
09:34:59:WU01:FS01:Download 39.99%
09:35:05:WU01:FS01:Download 82.93%
09:35:07:WU01:FS01:Download complete
09:35:07:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11777 run:0 clone:9482 gen:6 core:0x22 unit:0x0000000f287234c95e7432cf933d2a85
09:35:07:WU01:FS01:Starting
09:35:07:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\ADM20\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 12280 -checkpoint 20 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
09:35:07:WU01:FS01:Started FahCore on PID 5116
09:35:07:WU01:FS01:Core PID:10660
09:35:07:WU01:FS01:FahCore 0x22 started
09:35:08:WU01:FS01:0x22:*********************** Log Started 2020-04-07T09:35:07Z ***********************
09:35:08:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
09:35:08:WU01:FS01:0x22:      Type: 0x22
09:35:08:WU01:FS01:0x22:      Core: Core22
09:35:08:WU01:FS01:0x22:    Website: https://foldingathome.org/
09:35:08:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
09:35:08:WU01:FS01:0x22:    Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
09:35:08:WU01:FS01:0x22:            <rafal.wiewiora@choderalab.org>
09:35:08:WU01:FS01:0x22:      Args: -dir 01 -suffix 01 -version 705 -lifeline 5116 -checkpoint 20
09:35:08:WU01:FS01:0x22:            -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
09:35:08:WU01:FS01:0x22:    Config: <none>
09:35:08:WU01:FS01:0x22:************************************ Build *************************************
09:35:08:WU01:FS01:0x22:    Version: 0.0.2
09:35:08:WU01:FS01:0x22:      Date: Dec 6 2019
09:35:08:WU01:FS01:0x22:      Time: 21:30:31
09:35:08:WU01:FS01:0x22: Repository: Git
09:35:08:WU01:FS01:0x22:  Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
09:35:08:WU01:FS01:0x22:    Branch: HEAD
09:35:08:WU01:FS01:0x22:  Compiler: Visual C++ 2008
09:35:08:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:35:08:WU01:FS01:0x22:  Platform: win32 10
09:35:08:WU01:FS01:0x22:      Bits: 64
09:35:08:WU01:FS01:0x22:      Mode: Release
09:35:08:WU01:FS01:0x22:************************************ System ************************************
09:35:08:WU01:FS01:0x22:        CPU: AMD Ryzen 5 2600 Six-Core Processor
09:35:08:WU01:FS01:0x22:    CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
09:35:08:WU01:FS01:0x22:      CPUs: 12
09:35:08:WU01:FS01:0x22:    Memory: 15.95GiB
09:35:08:WU01:FS01:0x22:Free Memory: 8.58GiB
09:35:08:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
09:35:08:WU01:FS01:0x22: OS Version: 6.2
09:35:08:WU01:FS01:0x22:Has Battery: true
09:35:08:WU01:FS01:0x22: On Battery: false
09:35:08:WU01:FS01:0x22: UTC Offset: 2
09:35:08:WU01:FS01:0x22:        PID: 10660
09:35:08:WU01:FS01:0x22:        CWD: C:\Users\ADM20\AppData\Roaming\FAHClient\work
09:35:08:WU01:FS01:0x22:        OS: Windows 10 Pro
09:35:08:WU01:FS01:0x22:    OS Arch: AMD64
09:35:08:WU01:FS01:0x22:********************************************************************************
09:35:08:WU01:FS01:0x22:Project: 11777 (Run 0, Clone 9482, Gen 6)
09:35:08:WU01:FS01:0x22:Unit: 0x0000000f287234c95e7432cf933d2a85
09:35:08:WU01:FS01:0x22:Reading tar file core.xml
09:35:08:WU01:FS01:0x22:Reading tar file integrator.xml
09:35:08:WU01:FS01:0x22:Reading tar file state.xml
09:35:08:WU01:FS01:0x22:Reading tar file system.xml
09:35:08:WU01:FS01:0x22:Digital signatures verified
09:35:08:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
09:35:08:WU01:FS01:0x22:Version 0.0.2
09:35:23:WU01:FS01:0x22:Completed 0 out of 2000000 steps (0%)
09:35:23:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:37:36:WU01:FS01:0x22:Completed 20000 out of 2000000 steps (1%)
09:39:49:WU01:FS01:0x22:Completed 40000 out of 2000000 steps (2%)
09:42:05:WU01:FS01:0x22:Completed 60000 out of 2000000 steps (3%)
09:43:40:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:43:40:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
09:43:44:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:43:44:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
09:43:47:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:43:47:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
09:43:47:WU01:FS01:0x22:ERROR:114: Max Retries Reached
09:43:47:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
09:43:47:WU01:FS01:0x22:Saving result file badstate-0.xml
09:43:47:WU01:FS01:0x22:Saving result file badstate-1.xml
09:43:47:WU01:FS01:0x22:Saving result file badstate-2.xml
09:43:47:WU01:FS01:0x22:Saving result file checkpointState.xml
09:43:48:WU01:FS01:0x22:Saving result file checkpt.crc
09:43:48:WU01:FS01:0x22:Saving result file positions.xtc
09:43:48:WU01:FS01:0x22:Saving result file science.log
09:43:48:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
09:43:48:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:43:48:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11777 run:0 clone:9482 gen:6 core:0x22 unit:0x0000000f287234c95e7432cf933d2a85
09:43:48:WU01:FS01:Uploading 57.69MiB to 40.114.52.201
09:43:48:WU01:FS01:Connecting to 40.114.52.201:8080
09:43:48:WU02:FS01:Connecting to 65.254.110.245:8080
09:43:49:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
09:43:49:WU02:FS01:Connecting to 18.218.241.186:80
09:43:49:WARNING:WU02:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
09:43:49:ERROR:WU02:FS01:Exception: Could not get an assignment
09:43:50:WU02:FS01:Connecting to 65.254.110.245:8080
09:43:50:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
09:43:50:WU02:FS01:Connecting to 18.218.241.186:80
09:43:51:WU02:FS01:Assigned to work server 128.252.203.10
09:43:51:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580/590] from 128.252.203.10
09:43:51:WU02:FS01:Connecting to 128.252.203.10:8080
09:44:02:WU01:FS01:Upload 0.22%
09:44:12:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
09:44:12:WU02:FS01:Connecting to 128.252.203.10:80
09:44:18:WU01:FS01:Upload 0.33%
09:44:24:WU01:FS01:Upload 1.41%
09:44:30:WU01:FS01:Upload 2.06%
09:44:33:ERROR:WU02:FS01:Exception: Failed to connect to 128.252.203.10:80: Une tentative de connexion a échoué car le parti connecté n’a pas répondu convenablement au-delà d’une certaine durée ou une connexion établie a échoué car l’hôte de connexion n’a pas répondu.
09:44:36:WU01:FS01:Upload 2.60%
09:44:43:WU01:FS01:Upload 3.14%
09:44:49:WU01:FS01:Upload 3.79%
09:44:50:WU02:FS01:Connecting to 65.254.110.245:8080
09:44:50:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
09:44:50:WU02:FS01:Connecting to 18.218.241.186:80
09:44:51:WARNING:WU02:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
09:44:51:ERROR:WU02:FS01:Exception: Could not get an assignment
09:44:55:WU01:FS01:Upload 4.33%
09:45:01:WU01:FS01:Upload 4.88%
09:45:07:WU01:FS01:Upload 5.63%
09:45:14:WU01:FS01:Upload 6.28%
09:45:20:WU01:FS01:Upload 6.93%
09:45:26:WU01:FS01:Upload 7.69%
09:45:32:WU01:FS01:Upload 8.56%
09:45:38:WU01:FS01:Upload 9.43%
09:45:44:WU01:FS01:Upload 10.29%
09:45:50:WU01:FS01:Upload 11.27%
09:45:56:WU01:FS01:Upload 12.13%
09:46:02:WU01:FS01:Upload 12.89%
09:46:08:WU01:FS01:Upload 13.76%
09:46:14:WU01:FS01:Upload 14.63%
09:46:20:WU01:FS01:Upload 15.60%
09:46:26:WU01:FS01:Upload 16.58%
09:46:27:WU02:FS01:Connecting to 65.254.110.245:8080
09:46:27:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
09:46:27:WU02:FS01:Connecting to 18.218.241.186:80
09:46:28:WARNING:WU02:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
09:46:28:ERROR:WU02:FS01:Exception: Could not get an assignment
09:46:32:WU01:FS01:Upload 17.44%
09:46:38:WU01:FS01:Upload 18.31%
09:46:44:WU01:FS01:Upload 19.07%
09:46:50:WU01:FS01:Upload 19.83%
09:46:56:WU01:FS01:Upload 20.58%
09:47:02:WU01:FS01:Upload 21.13%
09:47:08:WU01:FS01:Upload 21.88%
09:47:15:WU01:FS01:Upload 22.75%
09:47:21:WU01:FS01:Upload 23.29%
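The growing delay between assignment attempts mentioned in this post can be read off the log itself. The sketch below measures the gaps between the "Could not get an assignment" errors; the helper name `retry_gaps` is made up, and it assumes all timestamps fall on the same day (no wrap past midnight).

```python
import re
from datetime import datetime

# Client log lines start with an HH:MM:SS timestamp followed by a colon.
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}):")


def retry_gaps(lines, marker="Could not get an assignment"):
    """Seconds elapsed between consecutive log lines containing `marker`."""
    times = []
    for line in lines:
        m = STAMP.match(line)
        if m and marker in line:
            times.append(datetime.strptime(m.group(1), "%H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# The three assignment failures from the log above.
SAMPLE = [
    "09:43:49:ERROR:WU02:FS01:Exception: Could not get an assignment",
    "09:44:51:ERROR:WU02:FS01:Exception: Could not get an assignment",
    "09:46:28:ERROR:WU02:FS01:Exception: Could not get an assignment",
]
gaps = retry_gaps(SAMPLE)
print(gaps)
```

On this excerpt the gap grows from about one minute to about a minute and a half, which matches the observation; three data points are too few to say anything about the client's actual retry policy.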
#6
(07-04-2020 10:35:54)toTOW wrote: But are these WUs still being assigned to your GPUs?
They account for 4 of the last 6 WUs processed
cf. my EOC stats / Hourly Production on the right (when there are 3 in the same update, 2 of them are FAULTY)
https://folding.extremeoverclocking.com/...?s=&u=8310

Ah well, two more just a little while ago :(
Code:
09:02:50:WU00:FS00:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
09:02:50:WU00:FS00:0x22:Saving result file ..\logfile_01.txt
09:02:50:WU00:FS00:0x22:Saving result file science.log
09:02:50:WU00:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
09:02:51:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:02:51:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:11759 run:0 clone:2960 gen:37 core:0x22

Code:
09:06:46:WU03:FS00:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
09:06:46:WU03:FS00:0x22:Saving result file ..\logfile_01.txt
09:06:46:WU03:FS00:0x22:Saving result file science.log
09:06:46:WU03:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
09:06:47:WARNING:WU03:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:06:47:WU03:FS00:Sending unit results: id:03 state:SEND error:FAULTY project:11776 run:0 clone:12249 gen:19 core:0x22 unit:0x00000026287234c95e74333c7fa22e47



I've updated the first post with the list of the latest WUs returned (except the one that has been failing to upload to 140.163.4.231:80 for several days :( )
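To see at a glance which projects are failing, the "Sending unit results" lines from the first post can be tallied per project. This is only a sketch: the function name `tally_results` is made up, and the sample is four lines copied from that list.

```python
import re
from collections import Counter, defaultdict

# Captures the error status and project number from a result line.
RESULT = re.compile(r"Sending unit results:.*?error:(\w+) project:(\d+)")


def tally_results(lines):
    """Map project number -> Counter of error statuses (NO_ERROR, FAULTY, ...)."""
    counts = defaultdict(Counter)
    for line in lines:
        m = RESULT.search(line)
        if m:
            status, project = m.groups()
            counts[project][status] += 1
    return counts


SAMPLE = [
    "17:21:57:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:11776 run:0 clone:4209 gen:9 core:0x22 unit:0x0000001d287234c95e73c45dd1e3c4fa",
    "18:10:23:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:11764 run:0 clone:2502 gen:40 core:0x22 unit:0x0000003e80fccb0a5e6d8271bb659f30",
    "01:02:49:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:11764 run:0 clone:3187 gen:33 core:0x22 unit:0x0000003780fccb0a5e6d85da25ea0604",
    "09:02:17:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:11778 run:0 clone:29892 gen:23 core:0x22 unit:0x00000025287234c95e77482e18da6e1d",
]
counts = tally_results(SAMPLE)
for project in sorted(counts):
    print(project, dict(counts[project]))
```

Feeding it the whole client log instead of the sample would give the FAULTY / NO_ERROR split for every project seen by the slot.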
#7
(07-04-2020 10:47:37)Osteofold wrote:
Code:
09:43:40:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:43:40:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
09:43:44:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:43:44:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
09:43:47:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:43:47:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan

Apparently not the same problem: is your system overclocked?
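The two failure signatures seen in this thread can be told apart mechanically. A minimal sketch, assuming a plain substring match on the core log is enough; the function name and the labels it returns are made up for illustration and are not Folding@home terminology.

```python
def classify_failure(log_text):
    """Rough label for a failed WU based on the messages found in its log."""
    if "clEnqueueNDRangeKernel" in log_text:
        # OpenCL kernel launch rejected: the driver issue discussed in this thread.
        return "opencl_kernel_failure"
    if "Particle coordinate is nan" in log_text:
        # Simulation blew up numerically: usually points at stability or overclocking.
        return "nan_bad_state"
    return "unknown"


rx460 = ("09:02:50:WU00:FS00:0x22:ERROR:exception: "
         "Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)")
rx580 = "09:43:40:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan"
print(classify_failure(rx460))
print(classify_failure(rx580))
```

Both cases end with the same `BAD_WORK_UNIT (114 = 0x72)` return code, so the message before the shutdown line is what distinguishes them.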
#8
And 3 more since my last message...

I'm going to stop everything, clean up the f@h folder, update the driver, and we'll see...
#9
The CPU slightly; the GPU nothing extreme. I just tried 2 different factory presets and it doesn't seem to change anything.
#10
Damn, impossible to install the latest WHQL drivers (20.2.2)... what a pain...
It tells me the package is incomplete...

I'll try 20.4.1