Find a way to ignore Errors within a chord

The task_reject_on_worker_lost setting should solve the problem.

For it to take effect, however, enabling task_acks_late seems to be mandatory.

This way, if the worker is killed externally, the task that was running at the time is put back in the queue to be run later, and the chord ends up with all of its expected results.

See:

  • https://docs.celeryq.dev/en/stable/userguide/configuration.html#task-reject-on-worker-lost
  • https://docs.celeryq.dev/en/stable/userguide/configuration.html#std-setting-task_acks_late
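
A minimal sketch of what the two settings could look like in a Celery configuration module. The module name `celeryconfig` and the way it is loaded are assumptions, not taken from this project; only the two setting names come from the Celery documentation linked above.

```python
# celeryconfig.py (hypothetical module name; adapt to the project's layout)

# Acknowledge the message only after the task has finished, instead of
# right before it starts executing.
task_acks_late = True

# If the worker process running the task exits abruptly (e.g. SIGSEGV),
# reject and requeue the message instead of acknowledging it.
task_reject_on_worker_lost = True

# Assumed loading step, to be placed where the Celery app is created:
#   app.config_from_object("celeryconfig")
```

Note that the Celery documentation warns that task_reject_on_worker_lost can cause message loops: a task that crashes its worker every time would be requeued again and again, so the tasks in the chord should be safe to run more than once.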

Sentry Issue: WAVEQC-W

ChordError: Dependency 4ed627ec-7d38-482c-ba36-2e2e128d733a raised WorkerLostError('Worker exited prematurely: signal 11 (SIGSEGV) Job: 4596.')
  File "celery/backends/redis.py", line 528, in on_chord_part_return
    resl = [unpack(tup, decode) for tup in resl]
  File "celery/backends/redis.py", line 528, in <listcomp>
    resl = [unpack(tup, decode) for tup in resl]
  File "celery/backends/redis.py", line 434, in _unpack_chord_result
    raise ChordError(f'Dependency {tid} raised {retval!r}')

Chord "93a7605a-84b6-447c-81e2-97a07cf34d33" raised: "ChordError(\"Dependency 4ed627ec-7d38-482c-ba36-2e2e128d733a raised WorkerLostError('Worker exited prematurely: signal 11 (SIGSEGV) Job: 4596.')\")"
Edited May 15, 2024 by Simon Panay