Purging content on dynamically scalable caching proxies

Hey, guys!

I am trying to set up a Plone deployment on Kubernetes.

This setup calls for scalable ZEO clients and Varnish instances.

The thing is, when the number of Varnish replicas changes dynamically, it becomes very difficult to keep the portal cache configuration up to date. It would be much easier if, given a headless Kubernetes service for Varnish (or, in an off-Kubernetes scenario, something like a round-robin DNS entry), the run method in plone.cachepurging.purger could start a producer for each IP address resolved from the caching proxy service name.
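To make the idea concrete: with a headless service (clusterIP: None), the cluster DNS returns one A record per ready pod, so a single lookup yields every Varnish replica at once. A quick interactive check, assuming a hypothetical service named varnish in a namespace plone:

import socket

# With a headless service, one lookup returns the IP of every ready pod.
socket.gethostbyname_ex('varnish.plone.svc.cluster.local')
# e.g. ('varnish.plone.svc.cluster.local', [], ['10.42.0.12', '10.42.1.7'])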

Something like:

diff --git a/plone/cachepurging/purger.py b/plone/cachepurging/purger.py
index f5670b2..bfb984b 100644
--- a/plone/cachepurging/purger.py
+++ b/plone/cachepurging/purger.py
@@ -27,6 +27,8 @@ import requests
 import six
 import sys
 import threading
+import socket
+import urllib.parse
 
 
 logger = logging.getLogger(__name__)
@@ -179,6 +181,36 @@ class Worker(threading.Thread):
     def stop(self):
         self.stopping = True
 
+    def unfoldURL(self, url):
+        logger.info('Unfolding URL: %s', url)
+
+        output = [url,]
+        
+        urlinfo = list(urllib.parse.urlparse(url))
+        if not urlinfo[1]:
+            return output
+        
+        netlocinfo = urlinfo[1].split(':')
+
+        port = ''
+        netloc = netlocinfo[0]
+        
+        if len(netlocinfo) > 1:
+            port = netlocinfo[1]
+
+        try:
+            unfolded_netlocs = socket.gethostbyname_ex(netloc)[2]
+        except socket.gaierror:
+            return output
+
+        output = []
+        for un in unfolded_netlocs:
+            new_netloc = un + ':' + port if port else un
+            urlinfo[1] = new_netloc
+            output.append(urllib.parse.urlunparse(urlinfo))
+
+        return output
+
     def run(self):
         logger.debug("%s starting", self)
         # queue should always exist!
@@ -196,35 +228,40 @@ class Worker(threading.Thread):
                         )
                         break
                     url, httpVerb = item
+                    unfoldedURLS = self.unfoldURL(url)
+                    logger.info('Unfolded URLs: %s', unfoldedURLS)
 
-                    # Loop handling errors (other than connection errors) doing
-                    # the actual purge.
-                    for i in range(5):
-                        if self.stopping:
-                            break
-                        # Got an item, purge it!
-                        try:
-                            resp, msg, err = self.producer.purge(
-                                session, url, httpVerb
-                            )
-                            if resp.status_code == requests.codes.ok:
-                                break  # all done with this item!
-                            if resp.status_code == requests.codes.not_found:
-                                # not found is valid
-                                logger.debug(
-                                    "Purge URL not found: {0}".format(url)
+                    for uu in unfoldedURLS:
+                        url = uu
+                        # Loop handling errors (other than connection errors) doing
+                        # the actual purge.
+                        for i in range(5):
+                            if self.stopping:
+                                break
+                            # Got an item, purge it!
+                            try:
+                                resp, msg, err = self.producer.purge(
+                                    session, url, httpVerb
                                 )
-                                break  # all done with this item!
-                        except Exception:
-                            # All other exceptions are evil - we just disard
-                            # the item.  This prevents other logic failures etc
-                            # being retried.
-                            logger.exception("Failed to purge {0}".format(url))
-                            break
-                        logger.debug(
-                            "Transient failure on {0} for {1}, "
-                            "retrying: {2}".format(httpVerb, url, i)
-                        )
+                                logger.info("url %s purged", url) # aqui
+                                if resp.status_code == requests.codes.ok:
+                                    break  # all done with this item!
+                                if resp.status_code == requests.codes.not_found:
+                                    # not found is valid
+                                    logger.debug(
+                                        "Purge URL not found: {0}".format(url)
+                                    )
+                                    break  # all done with this item!
+                            except Exception:
+                                # All other exceptions are evil - we just discard
+                                # the item.  This prevents other logic failures etc
+                                # being retried.
+                                logger.exception("Failed to purge {0}".format(url))
+                                break
+                            logger.debug(
+                                "Transient failure on {0} for {1}, "
+                                "retrying: {2}".format(httpVerb, url, i)
+                            )

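For anyone who wants to play with the unfolding outside of Plone, here is a standalone sketch of the same logic (runnable on its own; the example hostname is hypothetical):

import socket
import urllib.parse

def unfold_url(url):
    """Expand url into one URL per A record of its host."""
    parts = list(urllib.parse.urlparse(url))
    if not parts[1]:  # no netloc, nothing to unfold
        return [url]
    host, _, port = parts[1].partition(':')
    try:
        addresses = socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return [url]  # unresolvable, keep the original URL
    result = []
    for address in addresses:
        parts[1] = address + ':' + port if port else address
        result.append(urllib.parse.urlunparse(parts))
    return result

# unfold_url('http://varnish:6081/front-page') might return, e.g.:
# ['http://10.42.0.12:6081/front-page', 'http://10.42.1.7:6081/front-page']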
That would work! I think this could be an optional configuration, so the lookup overhead is only active in environments where it is needed. Would you mind opening an issue or pull request?

I totally agree. It would be much better if we could avoid the lookup overhead with a configuration setting. I just wonder where this configuration would best be placed.

I would add it to the registry as part of the cache configuration control-panel.
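Something along these lines, with a hypothetical record name (it does not exist in plone.cachepurging today):

from plone.registry.interfaces import IRegistry
from zope.component import getUtility

registry = getUtility(IRegistry)
# Hypothetical boolean record: whether purge URLs should be unfolded
# into one URL per resolved IP address.
unfold = registry.get('plone.cachepurging.unfoldURLs', False)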

Isn't purging happening in a separate thread? In that case, does performance matter much? Wouldn't it be better to have one less thing to configure and perhaps just cache the DNS unfolding for an hour or so?

Good point, I forgot about that! I think you're right, no additional setting is needed.
+1 for caching the DNS. And maybe check whether an IP address was used and, in that case, bypass the lookup.

When you have the horizontal pod autoscaler enabled in a Kubernetes deployment, for example, things can get really hectic: Varnish instances can come and go much more frequently than once an hour. By default, the Kubernetes horizontal pod autoscaler checks every 15 seconds whether its pods need to be scaled up or down (the kube-controller-manager's --horizontal-pod-autoscaler-sync-period).

If a new Varnish instance is left out because of a stale cache entry, recently changed content won't be refreshed there until the TTL expires. Maybe this DNS cache TTL should be really short. What do you think?

+1 for bypassing the lookup when an IP address is used!

Where would you guys store this cache?

I suppose you mean something like this:

diff --git a/plone/cachepurging/purger.py b/plone/cachepurging/purger.py
index f5670b2..1a1e132 100644
--- a/plone/cachepurging/purger.py
+++ b/plone/cachepurging/purger.py
@@ -27,7 +27,12 @@ import requests
 import six
 import sys
 import threading
+import socket
+import urllib.parse
+import ipaddress
+import datetime
 
+DNSCACHETTL = 15  # seconds
 
 logger = logging.getLogger(__name__)
 
@@ -172,6 +177,9 @@ class Worker(threading.Thread):
         self.producer = producer
         self.queue = queue
         self.stopping = False
+        self.dnscachettl = DNSCACHETTL
+        self.dnscache = {}
+
         super(Worker, self).__init__(
             name="PurgeThread for %s://%s" % (scheme, host)
         )
@@ -179,6 +187,62 @@
     def stop(self):
         self.stopping = True
 
+    def resolveHostname(self, netloc):
+        try:
+            # if it is an ip address there is no need to resolve
+            ipaddress.ip_address(netloc)
+            logger.info('Netloc is an IP address.')
+            return [netloc,]
+        except ValueError:
+            pass
+
+        if netloc in self.dnscache:
+            info = self.dnscache.get(netloc)
+            logger.info('Netloc entry found in cache')
+            if datetime.datetime.now() < info.get('good_until'):
+                logger.info('Netloc used from cache')
+                return info.get('addresses')
+
+        logger.info('Netloc lookup is needed')
+        try:
+            addresses = socket.gethostbyname_ex(netloc)[2]
+        except socket.gaierror:
+            logger.exception('Lookup failed for %s', netloc)
+            return [netloc]
+        info = {
+            'addresses': addresses,
+            'good_until': datetime.datetime.now() + datetime.timedelta(seconds=self.dnscachettl),
+        }
+
+        self.dnscache[netloc] = info
+        return addresses
+
+    def unfoldURL(self, url):
+        logger.info('Unfolding URL: %s', url)
+
+        output = [url,]
+        
+        urlinfo = list(urllib.parse.urlparse(url))
+        if not urlinfo[1]:
+            return output
+        
+        netlocinfo = urlinfo[1].split(':')
+
+        port = ''
+        netloc = netlocinfo[0]
+        
+        if len(netlocinfo) > 1:
+            port = netlocinfo[1]
+
+        unfolded_netlocs = self.resolveHostname(netloc)
+
+        output = []
+        for un in unfolded_netlocs:
+            urlinfo[1] = un + ':' + port if port else un
+            output.append(urllib.parse.urlunparse(urlinfo))
+
+        return output
+
     def run(self):
         logger.debug("%s starting", self)
         # queue should always exist!
@@ -196,35 +260,40 @@
                         )
                         break
                     url, httpVerb = item
+                    unfoldedURLs = self.unfoldURL(url)
+                    logger.info('Unfolded URLs: %s', unfoldedURLs)
 
-                    # Loop handling errors (other than connection errors) doing
-                    # the actual purge.
-                    for i in range(5):
-                        if self.stopping:
-                            break
-                        # Got an item, purge it!
-                        try:
-                            resp, msg, err = self.producer.purge(
-                                session, url, httpVerb
-                            )
-                            if resp.status_code == requests.codes.ok:
-                                break  # all done with this item!
-                            if resp.status_code == requests.codes.not_found:
-                                # not found is valid
-                                logger.debug(
-                                    "Purge URL not found: {0}".format(url)
+                    for uu in unfoldedURLs:
+                        url = uu
+                        # Loop handling errors (other than connection errors) doing
+                        # the actual purge.
+                        for i in range(5):
+                            if self.stopping:
+                                break
+                            # Got an item, purge it!
+                            try:
+                                resp, msg, err = self.producer.purge(
+                                    session, url, httpVerb
                                 )
-                                break  # all done with this item!
-                        except Exception:
-                            # All other exceptions are evil - we just disard
-                            # the item.  This prevents other logic failures etc
-                            # being retried.
-                            logger.exception("Failed to purge {0}".format(url))
-                            break
-                        logger.debug(
-                            "Transient failure on {0} for {1}, "
-                            "retrying: {2}".format(httpVerb, url, i)
-                        )
+                                logger.info("url %s purged", url) 
+                                if resp.status_code == requests.codes.ok:
+                                    break  # all done with this item!
+                                if resp.status_code == requests.codes.not_found:
+                                    # not found is valid
+                                    logger.debug(
+                                        "Purge URL not found: {0}".format(url)
+                                    )
+                                    break  # all done with this item!
+                            except Exception:
+                                # All other exceptions are evil - we just discard
+                                # the item.  This prevents other logic failures etc
+                                # being retried.
+                                logger.exception("Failed to purge {0}".format(url))
+                                break
+                            logger.debug(
+                                "Transient failure on {0} for {1}, "
+                                "retrying: {2}".format(httpVerb, url, i)
+                            )
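One note on where the cache is stored: keeping dnscache as an attribute on the Worker makes it per-thread, so no locking is needed. The trade-off is that each worker keeps its own TTL window and does its own lookups, which seems acceptable at a 15-second TTL.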