Gadicc/squid-ssl-zero - caching proxy for your builds

Docker’s layered filesystem and build cache are pretty awesome, and if you’re only ever editing your app.py, they’re all you’ll ever need.

However, if you’re often changing things at a lower level (requirements.txt, pip install, conda install, apt-get, etc.), redownloading all those files (and especially your models in download.py) can be a big time waster, particularly over a slow connection.

Zero-config caching proxy to the rescue! It’s as simple as 1-2-3:

1. Run this once:

$ docker run -d -p 3128:3128 -p 3129:80 \
  --name squid --restart=always \
  -v /usr/local/squid:/usr/local/squid \
  gadicc/squid-ssl-zero
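Once the container is up, you can sanity-check the proxy from the host before touching your Dockerfile (example.com here is just a placeholder):

```shell
# Fetch a page through the proxy; run it twice and the second request
# should be served from squid's cache (for a cacheable response).
curl -x http://localhost:3128 -sI http://example.com | head -n 1
```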

2. Edit /usr/local/squid/etc/squid.conf to comment out the “default” policy and enable the “super aggressive” one (or write your own policies as desired):

# Default:
# refresh_pattern .		0	20%	4320   # <-- comment out this
# SUPER aggressive (breaks the HTTP standard but can be very useful)
refresh_pattern . 52034400 50% 52034400 override-expire override-lastmod reload-into-ims ignore-reload ignore-no-store ignore-private refresh-ims store-stale  # <-- uncomment this

Squid will reload automatically when you save the file.
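To confirm the cache is actually being hit, you can watch Squid’s access log from the host. The path below assumes the volume mount from step 1 and Squid’s default log location under its install prefix:

```shell
# TCP_HIT / TCP_MEM_HIT entries mean the object came from the cache;
# TCP_MISS means it was fetched from the origin server.
tail -f /usr/local/squid/var/logs/access.log
```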

3. Add this near the top of your Dockerfile (just after FROM):

# Note, docker uses HTTP_PROXY and HTTPS_PROXY (uppercase)
# We purposefully want those managed independently, as we want docker
# to manage its own cache.  This is just for pip, models, etc.
ARG http_proxy
ENV http_proxy=${http_proxy}
ARG https_proxy
ENV https_proxy=${https_proxy}
RUN if [ -n "$http_proxy" ] ; then \
    echo quit \
    | openssl s_client -proxy $(echo ${https_proxy} | cut -b 8-) -servername google.com -connect google.com:443 -showcerts \
    | sed 'H;1h;$!d;x; s/^.*\(-----BEGIN CERTIFICATE-----.*-----END CERTIFICATE-----\)\n---\nServer certificate.*$/\1/' \
    > /usr/local/share/ca-certificates/squid-self-signed.crt ; \
    update-ca-certificates ; \
  fi
ENV REQUESTS_CA_BUNDLE=${http_proxy:+/usr/local/share/ca-certificates/squid-self-signed.crt}
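As an aside, the `cut -b 8-` in the RUN step just strips the 7-character `http://` scheme, so openssl’s -proxy flag gets a bare host:port. A quick illustration (the address is only an example):

```shell
# Drop the first 7 bytes ("http://") of the proxy URL, keeping host:port.
https_proxy=http://172.17.0.1:3128
echo ${https_proxy} | cut -b 8-
```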

The RUN line is a little long because it’s guarded by an if, so that your project still builds even when the http_proxy build-arg is not defined, i.e. your project will still build in other environments.
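With that in place, point the build at the proxy by passing the build-args. The address is an assumption: 172.17.0.1 is the default docker0 bridge gateway on Linux, which is typically how a build container reaches the host; adjust for your setup (e.g. host.docker.internal on Docker Desktop):

```shell
docker build \
  --build-arg http_proxy=http://172.17.0.1:3128 \
  --build-arg https_proxy=http://172.17.0.1:3128 \
  -t myapp .
```

Omit the build-args entirely and the Dockerfile falls back to building without the proxy.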

And you’re done. Never worry about wasted redownload time again.

Extra Info

A lot of projects require HTTPS now. Obviously we can’t usually inspect the contents of these connections to cache them, since they’re encrypted. The repo has Squid set up to generate, on request, its own self-signed certificate for the domain you’re asking for and serve the response with that (a so-called “man-in-the-middle” (MITM) attack, if it were used maliciously). For this to work, your container must trust the Certificate Authority behind those certificates; that’s what the steps in the Dockerfile do.
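You can see the substitution happen by asking for a site’s certificate through the proxy; the issuer will be Squid’s self-signed CA rather than a public one (the hostnames here are just examples):

```shell
# Show who issued the certificate presented through the proxy.
echo quit | openssl s_client -proxy localhost:3128 \
    -connect example.com:443 -servername example.com 2>/dev/null \
  | openssl x509 -noout -issuer
```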

Let me know if you have any other questions or problems.