Check file size before downloading it with Python
If you're yolo-ing on the web and downloading a lot of content, especially arbitrary media files using a crawler, it's useful to check each file's MIME type and size before downloading it.
To do this with Python's requests module, you'll have to set stream=True and examine the headers for size & mime type. Following that, you can retrieve the content.
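More specifically, 'Content-Length' gives the file size in bytes while 'Content-Type' gives the mime type. Neither is fully reliable: a server can omit 'Content-Length' (for example with chunked responses) or report the wrong 'Content-Type'.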
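Header values often need a little normalising before comparison: Content-Type can carry parameters such as a charset, and Content-Length may be absent or malformed. A minimal sketch, assuming a hypothetical helper named parse_headers (not part of requests) that accepts any dict-like headers object:

```python
def parse_headers(headers):
    """Return (mime_type, size) from a dict-like headers object.

    mime_type is lowercased with parameters such as "; charset=utf-8" removed.
    size is None when Content-Length is missing, negative, or not a number.
    """
    raw_type = headers.get("Content-Type", "")
    mime_type = raw_type.split(";", 1)[0].strip().lower()

    raw_length = headers.get("Content-Length")
    try:
        size = int(raw_length)
    except (TypeError, ValueError):
        size = None
    if size is not None and size < 0:
        size = None

    return mime_type, size
```

With requests, resp.headers can be passed in directly because it is a case-insensitive mapping. A plain dict needs the exact key spelling used above.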
Here's a quick example.
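import requests

MAX_SIZE = 2**20  # 1 MiB
url = "https://i.imgur.com/AD3MbBi.jpeg"

# stream=True defers the body download until resp.content is accessed
resp = requests.get(url, stream=True)

# Both conditions must hold. A missing Content-Length falls back to
# MAX_SIZE, which fails the size check.
if all([
    resp.headers.get("Content-Type", "") == "image/jpeg",
    int(resp.headers.get("Content-Length", MAX_SIZE)) < MAX_SIZE,
]):
    content = resp.content
    with open("image.jpg", "wb") as f:
        f.write(content)

resp.close()  # release the connection if the body was skipped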
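Because Content-Length can be missing or wrong, a crawler may also want to enforce the limit while reading the body. A rough sketch, assuming a hypothetical helper named read_capped that accepts any iterable of byte chunks, such as resp.iter_content(chunk_size=8192) on a streamed requests response:

```python
def read_capped(chunks, max_size):
    """Join byte chunks, returning None as soon as the total exceeds max_size."""
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        if len(received) > max_size:
            return None  # too large: stop reading early
    return bytes(received)


# Usage with requests (needs network access, so left commented out):
# resp = requests.get(url, stream=True)
# data = read_capped(resp.iter_content(chunk_size=8192), MAX_SIZE)
# resp.close()
```

Here a None result means the download was abandoned, and at most one extra chunk beyond the limit is ever held in memory.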