How to copy a Plone site to serve with nginx

I've been poking around with wget to grab a static copy of the https://2015.ploneconf.org site to serve it with nginx.

Googling around, I found http://www.linuxjournal.com/content/downloading-entire-web-site-wget, so the command I used is:

wget --recursive --no-clobber --page-requisites --html-extension --domains 2015.ploneconf.org https://2015.ploneconf.org

You can see the static copy at https://2015new.ploneconf.org/ (though any links on that page are not rewritten to use 2015new.ploneconf.org)

This does seem to grab everything correctly, but I've noticed that links to, say, Things To Do at https://2015.ploneconf.org/venue/#things-to-do are not converted to use the .html extension. You get a 403 error, because the static URL should really be https://2015new.ploneconf.org/venue.html#things-to-do (and sure enough, the venue.html page was grabbed by wget).

Plone is smart with its traversal... it just figures out that "venue#things-to-do" is not a directory but an anchor within a page.

Any suggestions on how to handle that sort of Plone URL with nginx?

OK, trying as per https://linuxaria.com/pills/how-to-modify-an-url-extension-with-a-nginx-rewrite

# add .html to URI and serve file, directory, or symlink if it exists
if (-e $request_filename.html) {
  rewrite ^/(.*)$ /$1.html last;
  break;
}

Holy cr*p, I think it works now :) https://2015.ploneconf.org
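
For what it's worth, the same kind of mapping can usually be expressed without an `if` block by using nginx's try_files directive. This is only a minimal sketch, not the configuration actually running on the site: the root path below is a hypothetical location for the wget output, and the block would need adapting to the real server config.

```nginx
location / {
    # Hypothetical directory holding the wget copy (an assumption, not from the original setup)
    root /var/www/2015.ploneconf.org;

    # Try the request path as a file first, then the same path with ".html"
    # appended, then as a directory; return 404 if none of them exist.
    try_files $uri $uri.html $uri/ =404;
}
```

With this, a request for /venue should be served from venue.html even when a venue/ directory also exists, because a try_files entry without a trailing slash only matches files. As far as I can tell, neither this sketch nor the `if` version maps a URL with a trailing slash (such as /venue/) to venue.html, so links of that form may still need separate handling.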

To identify broken links, I used this command:

wget --spider -o wget.log -e robots=off --wait 1 -r -p https://2015.ploneconf.org

It worked great... tracked down and fixed 40+ broken links!

Take a look at https://github.com/jcu-eresearch/static-plone-wget for a fairly general way of turning a Plone site static. The author has written a script that takes care of a lot of the oddities. I used it recently to make a static backup of a site, including the private parts. Some of the recent changes are my fault, because of issues I encountered. :)
Jonathan

I know I'm asking years later.
Out of curiosity why not httrack for this?

Because I never thought of it? :) Looking now...

httrack is the tool for grabbing websites.

Indeed! I don't know why I didn't think of it or run into it in my earlier searches. https://www.httrack.com/

I will shortly be static-ifying the 2016.ploneconf.org and 2017.ploneconf.org sites, so thx for your timely question, @pigeonflight!

Maybe this would have been fun too?

We are using this in a Plone 4.3.x site with a client and it works.
