tkimnguyen
(T. Kim Nguyen)
December 31, 2016, 4:41pm
1
I've been poking around with wget to grab a static copy of the https://2015.ploneconf.org site to serve it with nginx.
Googling around, I found http://www.linuxjournal.com/content/downloading-entire-web-site-wget and so the command I've used is:
wget --recursive --no-clobber --page-requisites --html-extension --domains 2015.ploneconf.org https://2015.ploneconf.org
You can see the static copy at https://2015new.ploneconf.org/ (though links on that page are not rewritten to use 2015new.ploneconf.org).
This does seem to grab everything correctly, but I've noticed that links to, say, Things To Do at https://2015.ploneconf.org/venue/#things-to-do are not converted to use the .html extension, i.e. you get a 403 error because the static URL should really be https://2015new.ploneconf.org/venue.html#things-to-do (and sure enough, wget did grab the venue.html page).
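Plone is smart with its traversal... it serves "venue/" as the venue page rather than treating it as a directory, and the browser then jumps to the "things-to-do" anchor within that page (the #fragment itself is never sent to the server).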
Any suggestions on how to handle that sort of Plone URL with nginx?
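One option for the link problem described above, offered as a sketch rather than a tested fix: wget's --convert-links rewrites links in the downloaded pages to point at the local files, which may take care of links such as /venue/#things-to-do, and --adjust-extension is the current name for the older --html-extension. The function name mirror_site and the DRY_RUN variable are hypothetical, introduced only for this example. --no-clobber is left out on the assumption that recent wget versions ignore it when --convert-links is also given.

```shell
#!/bin/sh
# Sketch only: mirror_site and DRY_RUN are hypothetical names for this example.
mirror_site() {
    url=$1
    # Derive the host from the URL,
    # e.g. https://2015.ploneconf.org/venue -> 2015.ploneconf.org
    host=${url#*://}
    host=${host%%/*}
    # Build the wget command line as the function's positional parameters.
    set -- wget --recursive --page-requisites --adjust-extension \
        --convert-links --domains "$host" "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"   # print the command instead of running it
    else
        "$@"        # run wget (requires network access)
    fi
}

# Usage:
#   DRY_RUN=1 mirror_site https://2015.ploneconf.org   # show the command only
#   mirror_site https://2015.ploneconf.org             # actually mirror the site
```

Running it with DRY_RUN=1 first makes it easy to review the flags before starting a long crawl.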
tkimnguyen
(T. Kim Nguyen)
December 31, 2016, 4:59pm
2
OK, trying as per https://linuxaria.com/pills/how-to-modify-an-url-extension-with-a-nginx-rewrite
# add .html to URI and serve file, directory, or symlink if it exists
if (-e $request_filename.html) {
    rewrite ^/(.*)$ /$1.html last;
    break;
}
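To reason about which requests the rule above would rewrite, the existence check can be roughly mimicked in shell. This is only an approximation: it assumes $request_filename is the document root followed by the request URI, and the names resolve_uri and docroot are hypothetical.

```shell
#!/bin/sh
# Sketch only: approximates the nginx "-e $request_filename.html" test.
# resolve_uri is a hypothetical helper, not part of nginx.
resolve_uri() {
    docroot=$1
    uri=$2
    # If <docroot><uri>.html exists, the rule would rewrite the URI to <uri>.html.
    if [ -e "${docroot}${uri}.html" ]; then
        printf '%s.html\n' "$uri"
    else
        # Otherwise the URI is left unchanged.
        printf '%s\n' "$uri"
    fi
}

# Usage (the document root here is an assumed path):
#   resolve_uri /var/www/2015 /venue
```

Note that in this sketch a URI with a trailing slash, such as /venue/, tests for venue/.html and is therefore left unchanged.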
tkimnguyen
(T. Kim Nguyen)
December 31, 2016, 5:04pm
3
Holy cr*p I think it works now https://2015.ploneconf.org
tkimnguyen
(T. Kim Nguyen)
January 1, 2017, 7:51pm
4
To identify broken links, I used this command:
wget --spider -o wget.log -e robots=off --wait 1 -r -p https://2015.ploneconf.org
It worked great... tracked down and fixed 40+ broken links!
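For pulling the broken URLs out of wget.log afterwards, something like the following might help. It is a sketch that assumes the spider run ends its log with a "Found N broken links." line followed by a blank line and then one URL per line; the function name list_broken_links is hypothetical.

```shell
#!/bin/sh
# Sketch only: list_broken_links is a hypothetical helper. It assumes the log
# contains a "Found N broken links." summary followed by the broken URLs.
list_broken_links() {
    awk '
        # Start collecting once the summary line is seen.
        /^Found [0-9]+ broken links?\.$/ { in_list = 1; next }
        # Print each URL listed after the summary line.
        in_list && /^https?:\/\// { print; seen = 1; next }
        # Stop at the first blank line after the URL list.
        in_list && seen && /^$/ { exit }
    ' "$1"
}

# Usage:
#   list_broken_links wget.log
```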
gutow
(Gutow)
January 5, 2017, 2:26pm
5
Take a look at https://github.com/jcu-eresearch/static-plone-wget for a fairly general way to turn a Plone site static. He's written a script that takes care of a lot of the oddities. I used it recently to make a static backup of a site, including the private parts. Some of the recent changes are my fault, prompted by issues I encountered.
Jonathan
pigeonflight
(David Bain (Will Theme Plone Sites))
January 25, 2019, 6:59pm
6
I know I'm asking years later.
Out of curiosity, why not httrack for this?
tkimnguyen
(T. Kim Nguyen)
January 25, 2019, 7:16pm
7
Because I never thought of it? Looking now...
zopyx
(Andreas Jung)
January 25, 2019, 7:38pm
8
httrack is the tool for grabbing websites.
tkimnguyen
(T. Kim Nguyen)
January 25, 2019, 7:41pm
9
Indeed! I don't know why I didn't think of it or run into it in my earlier searches. https://www.httrack.com/
I will shortly be static-ifying the 2016.ploneconf.org and 2017.ploneconf.org sites, so thx for your timely question, @pigeonflight!
djowett
(Djowett)
January 28, 2019, 10:26pm
10
Maybe this would have been fun too?
erral
(Mikel Larreategi)
January 29, 2019, 6:47am
11
We are using this in a Plone 4.3.x site with a client and it works.