Properly handle HTTP retries / timeouts when downloading ESO header files
I am encountering a lot of HTTP timeouts when downloading ESO headers this afternoon: requests.get("http://archive.eso.org/hdr?DpId={0}".format(header_id))
That call seems too simple: retries and proper timeout handling are needed (requests applies no timeout by default, so a call can hang indefinitely)...
I found a few blog posts on that topic (relying on urllib3 Retry):
- https://www.peterbe.com/plog/best-practice-with-retries-with-requests
- https://findwork.dev/blog/advanced-usage-python-requests-timeouts-retries-hooks/
I quickly tried:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):
    # Reuse the caller's session if given, otherwise create a new one
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    # Apply the retry policy to both HTTP and HTTPS requests
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

html_page = requests_retry_session().get("http://archive.eso.org/hdr?DpId={0}".format(header_id), timeout=10)
But it does not retry if a timeout happens:
2020-03-25 15:07:09,059 DEBUG [urllib3.connectionpool:225][MainThread] Starting new HTTP connection (1): archive.eso.org:80
2020-03-25 15:07:09,606 DEBUG [urllib3.connectionpool:437][MainThread] http://archive.eso.org:80 "GET /hdr?DpId=MATIS.2020-02-15T02:37:37.267 HTTP/1.1" 200 None
--- Logging error ---
Traceback (most recent call last):
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
yield
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/site-packages/urllib3/response.py", line 755, in read_chunked
chunk = self._handle_chunk(amt)
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/site-packages/urllib3/response.py", line 708, in _handle_chunk
returned_chunk = self._fp._safe_read(self.chunk_left)
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/http/client.py", line 620, in _safe_read
chunk = self.fp.read(min(amt, MAXAMOUNT))
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/site-packages/requests/models.py", line 750, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/site-packages/urllib3/response.py", line 560, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
self._original_response.close()
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/home/bourgesl/.pyenv/versions/3.7.6/lib/python3.7/site-packages/urllib3/response.py", line 430, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='archive.eso.org', port=80): Read timed out.
Too sad.
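A possible explanation, judging from the traceback: the server had already answered 200, and the timeout hit while the chunked body was being read. As far as I can tell, the urllib3 Retry policy only covers the phase up to receiving the response headers, so a read timeout during body streaming is outside its reach. One workaround is an explicit retry loop around the whole request, body read included. The sketch below is only that: call_with_retries and download_header are hypothetical names, and the attempt count and backoff values are assumptions borrowed from the snippet above.

```python
import time


def call_with_retries(func, exceptions, attempts=3, backoff_factor=0.3, sleep=time.sleep):
    """Call func() up to `attempts` times, retrying when it raises one of `exceptions`."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            # Exponential backoff: 0.3 s, then 0.6 s, ... with the defaults
            sleep(backoff_factor * 2 ** (attempt - 1))


def download_header(header_id, timeout=10):
    """Hypothetical helper: fetch one ESO header page, retrying on requests errors."""
    import requests  # imported here so call_with_retries stays stdlib-only

    def fetch():
        # Without stream=True, requests reads the full body inside get(),
        # so a read timeout during the body surfaces here and triggers a retry.
        response = requests.get(
            "http://archive.eso.org/hdr?DpId={0}".format(header_id),
            timeout=timeout,
        )
        return response.text

    # RequestException is the base class of requests' Timeout and ConnectionError
    return call_with_retries(fetch, (requests.exceptions.RequestException,))
```

The retry loop is kept separate from the HTTP call so it can be exercised without network access. It could also be combined with the mounted Retry adapter, which would still handle connection errors and the 5xx statuses in status_forcelist.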