Why don't I get a 404 when the same path is added several times in the URL?

Hello,

I'd like to understand why Plone doesn't return 404 when the same path is added several times in the virtual path (path_info).

plone_website_url/community/contribute/community/contribute/community/contribute/contribute/contribute

The above URL works, but from my point of view it should not: it should return a 404.

Imagine if you had <a href="contribute">test</a> in the code of this webpage you'd have a url loop and all robots would traverse the same page over and over again adding contribute to the path. This can create a DoS or DDoS because this can bypass any caching.

This issue can be reproduced on any Plone-based website, including the Plone website and the FBI website.

I managed to find this issue in our apache logs with the following commands:
grep -h -E '/([^/]+/[^/]+)/\1/\1/' access*.log
zgrep -h -E '/([^/]+)/\1/\1/' access*.gz
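For anyone who prefers to scan logs from Python, here is a minimal sketch of the same two backreference patterns. The function name and the sample lines are made up for illustration, and the character class also excludes whitespace so a match stays inside the request path; that is a small deviation from the grep patterns above.

```python
import re

# One segment repeated three times in a row, e.g. /contribute/contribute/contribute/
ONE_SEGMENT = re.compile(r"/([^/\s]+)/\1/\1/")
# A pair of segments repeated three times in a row, e.g. /a/b/a/b/a/b/
TWO_SEGMENTS = re.compile(r"/([^/\s]+/[^/\s]+)/\1/\1/")


def find_repeated_paths(lines):
    """Return the log lines whose request path matches either pattern."""
    return [
        line for line in lines
        if ONE_SEGMENT.search(line) or TWO_SEGMENTS.search(line)
    ]


# Hypothetical log lines, not taken from a real access log.
sample = [
    "GET /community/contribute HTTP/1.1",
    "GET /en/contribute/contribute/contribute/ HTTP/1.1",
    "GET /en/about/missions/about/missions/about/missions/x HTTP/1.1",
]
for line in find_repeated_paths(sample):
    print(line)
```

Like the grep patterns, both expressions need a trailing slash after the third repetition, so a URL that ends exactly on the third copy is not reported.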

I would appreciate it if you could help me find a github issue that talks about this or where to create it.

Thank you.

Hi @jejbq. I recommend reading up on Traversal and Acquisition in Plone/Zope to understand the behaviour a bit more:

You can prevent this by using Products.CMFCore >= 3.1 . This will become the default behaviour starting with Plone 7.

Currently the pages probably have the canonical URL set, so search engines should read it and conclude these are not the pages they are looking for.

Hi @jaroel, I've ported the following patch to our older version of CMFCore but I'm not getting a 404 when the same path is added twice or more in the URL. I've added a logger to the patch and the CMFCore.interfaces._explicitacquisition class is loaded correctly.

github_url/zopefoundation/Products.CMFCore/commit/264a9fb8ef10fab49a2b38ed644f2f17f2e45491

I don't understand how the behavior is changed in Plone 7 and if it's possible to port this behavior back to Plone 5 or 6.

Thank you.

Hi @JeffersonBledsoe.

Thank you for the doc.

I've been trying to understand the concept of Traversal vs Acquisition in Plone and have come to the conclusion that there is no real concept of a complete hierarchy in Plone that can be assimilated to an absolute path like in other CMS. Traversal is closer to a pseudo-absolute path than Acquisition. It seems that the virtual path is more like an inheritance of objects and attributes than the implementation of a real hierarchy, isn't it?

The most disturbing thing about Plone's default behavior is that robots or attackers can traverse alternative paths to the same page, or even loop if a link is relative to the same object.

This Plone behavior can be used as a fingerprint (by duplicating the path twice or more) to detect that the website is indeed using Plone and not another type of CMS.

I still don't understand how to move an existing Plone from Acquisition to Traversal (or the opposite) and what steps are necessary to achieve this.

Please make sure you copied over all changes in all files (changelog is optional) from the pull request "Provide a way to not publish items that are acquired" (Pull Request #129, zopefoundation/Products.CMFCore on GitHub), including the tests.

Then run the tests by running bin/test -t test_explicitacquisition in the project root.
You should see something like

  Running:
                
  Ran 3 tests with 0 failures, 0 errors, 0 skipped in 0.046 seconds.

If it says something like Ran 0 tests[...] or No tests found, please make sure you have edited the right files.
Please make sure you do not set the env var PUBLISHING_EXPLICIT_ACQUISITION=false - setting it to false will skip the new checks.

For fingerprinting you can just look at the <meta name="generator" content="Plone - https://plone.org/"/> header in the html - it's there by default.

Thanks @jaroel but I don't have any bin/test in zinstance.

To be honest, I tried to port your patch to Products.CMFCore 2.2.10 (directly by patching the Products.CMFCore-2.2.10-py2.7.egg) and we are using Plone 5.0.6. We tried to migrate to Plone 6 but we couldn't pass the migration from Python 2.7 to Python 3 and the conversion from latin1 to utf8 ... all our website broke after 5.1.6 because of some modules.

For the fingerprint, I mean that if a page still loads when you repeat the current path in the URL, the website is probably running a Plone/Zope-based CMS. This can be verified on the Plone website, the FBI website, our website, etc.

Given that you're running a Python2.7 site, the potential for DDOS-ing is not something to worry about.
The fingerprinting has never been an issue. Hiding the fact that you're running Plone or Zope will not deter anyone. No one will try for acquisition effects via url - there are easier and faster methods to check.

I'm not really sure what you expect to gain from back porting, but hey it's your party.

Anyhow, if you add raise ValueError("Yay!") to the top of def after_traversal_hook, do you get an error page/exception/traceback of some kind for any url?
If not - your code is not being loaded.

Make sure to restart your instance!

The code is correctly loaded but doesn't handle 404 the way I expected

Traceback (innermost last):
Module ZPublisher.Publish, line 129, in publish
Module zope.event, line 31, in notify
Module zope.component.event, line 24, in dispatch
Module zope.component._api, line 136, in subscribers
Module zope.component.registry, line 321, in subscribers
Module zope.interface.adapter, line 585, in subscribers
Module Products.CMFCore.explicitacquisition, line 18, in after_traversal_hook
ValueError: Yay!

But if I duplicate the URL path or the last part of the URL, I still get the content and not a 404. I have the same problem on the main official Plone website.

https_plone_org/why-plone/what-is-plone/what-is-plone/what-is-plone/what-is-plone/what-is-plone

We had a DDoS because of Robots repeating the URL ... because of a relative link instead of an absolute link in our website. Plone didn't return 404, so we had to create a ModRewrite to return 404 ...
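For sites served through WSGI, the same kind of rule could be written as a small middleware. This is only a sketch under assumptions: the thread does not show the actual rewrite rule, `has_repeated_run` and `RejectRepeatedPaths` are hypothetical names, and the threshold (a run of one or two segments occurring three times in a row) simply mirrors the grep patterns from the first post.

```python
def has_repeated_run(path, max_run=2, repeats=3):
    """Return True if a run of 1..max_run segments occurs `repeats` times in a row."""
    segments = [s for s in path.split("/") if s]
    for size in range(1, max_run + 1):
        for start in range(len(segments) - size * repeats + 1):
            run = segments[start:start + size]
            if all(
                segments[start + i * size:start + (i + 1) * size] == run
                for i in range(1, repeats)
            ):
                return True
    return False


class RejectRepeatedPaths:
    """WSGI middleware that answers 404 for paths containing a repeated run."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if has_repeated_run(environ.get("PATH_INFO", "")):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found\n"]
        # Anything else is passed through to the wrapped application unchanged.
        return self.app(environ, start_response)
```

A rule like this is a blunt instrument: it would also reject legitimate content whose real path happens to repeat a segment three times, so the threshold is a trade-off rather than a fix for acquisition itself.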

"PhxBot/0.1 (phxbot@protonmail.com)"

You could try collective.explicitacquisition (collective/collective.explicitacquisition on GitHub: "Disallow access to acquired content outside the current path") instead.

Could you check and see what both context.aq_chain and context.aq_inner.aq_chain are for some of the urls? I don't have a Plone 5 lying around.

Could you see if you can find the time to figure out how to run the tests?
You could try to add a test case for your use-case, ie trying /foo/foo/join-us or something like that.

Jejbq via Plone Community wrote at 2023-7-13 14:19 +0000:

...
I've been trying to understand the concept of Traversal vs Acquisition in Plone and have come to the conclusion that there is no real concept of a complete hierarchy in Plone that can be assimilated to an absolute path like in other CMS. Traversal is closer to a pseudo-absolute path than Acquisition. It seems that the virtual path is more like an inheritance of objects and attributes than the implementation of a real hierarchy, isn't it?

Acquisition was introduced in the old days of Zope,
when there was not yet a good distinction between
tools (providing logic) and content objects.
It allowed objects to provide functionality to siblings and
their descendants (thus implementing tool objects).

Actually, Acquisition supports a strict hierarchy.
It calls this "containment" and Traversable.getPhysicalPath
(and derived from this Traversable.absolute_url) use this
to get correct hierarchical URLs.
To get the hierarchical parent of an object o you
use aq_parent(aq_inner(o)).

The primary purpose of the so called "acquisition wrapper"s is to remember
the access path to an object in order to allow the object
to use attributes (e.g. methods, properties) from ancestors in
this path -- as you have found out some kind of (path ancestor) inheritance.
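
The difference between the containment hierarchy and the access path can be illustrated with a toy model in plain Python. This is not how Zope or the Acquisition package is implemented; `Item`, `traverse` and the other names are invented for the sketch. It only shows why a URL with repeated segments can resolve to an object whose physical path is still the short one.

```python
class Item:
    """A content object with exactly one real container (its containment parent)."""

    def __init__(self, name, container=None):
        self.name = name
        self.container = container
        self.children = {}
        if container is not None:
            container.children[name] = self


def containment_chain(item):
    """Objects from the root down to item, following containment only."""
    chain = []
    while item is not None:
        chain.append(item)
        item = item.container
    return chain[::-1]


def physical_path(item):
    """Path built from containment, similar in spirit to getPhysicalPath."""
    return "/".join(obj.name for obj in containment_chain(item)) or "/"


def traverse(root, url_path):
    """Look each segment up on the current object, else on earlier objects in the access path."""
    access_path = [root]
    for segment in url_path.strip("/").split("/"):
        for candidate in reversed(access_path):
            if segment in candidate.children:
                access_path.append(candidate.children[segment])
                break
        else:
            raise KeyError(segment)
    return access_path


def reached_by_containment(access_path):
    """True when the access path is exactly the containment chain of the final object."""
    return access_path == containment_chain(access_path[-1])


root = Item("")
community = Item("community", root)
contribute = Item("contribute", community)

clean = traverse(root, "/community/contribute")
looped = traverse(root, "/community/contribute/community/contribute")
print(physical_path(looped[-1]))
print(reached_by_containment(clean), reached_by_containment(looped))
```

In this model both URLs end at the same object, and only the access path differs. A check along the lines of `reached_by_containment` is roughly the idea behind refusing to publish acquired items, though the real packages work on acquisition wrappers rather than on a list like this.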

Could you check and see what both context.aq_chain and context.aq_inner.aq_chain are for some of the urls? I don't have a Plone 5 lying around.

URL called fqdn/en/about/missions-and-values/about/missions-and-values/about/missions-and-values/missions-and-values. No 404 returned and the same content as fqdn/en/about/missions-and-values.

context.aq_chain and context.aq_inner.aq_chain are the same object pointer.

def after_traversal_hook(event):
    ...
    context = event.request["PARENTS"][0]
    import logging
    logger = logging.getLogger('CMFCore.explicitacquisition')
    logger.info("context.aq_chain:{} context.aq_inner.aq_chain:{}".format(context.aq_chain, context.aq_inner.aq_chain))

INFO CMFCore.explicitacquisition

context.aq_chain:[
<Document at /fqdn/en/about/missions-and-values>,
<Folder at /fqdn/en/about used for /fqdn/en/about/missions-and-values>,
<Document at /fqdn/en/about/missions-and-values>,
<Folder at /fqdn/en/about used for /fqdn/en/about/missions-and-values>,
<Document at /fqdn/en/about/missions-and-values>,
<Folder at /fqdn/en/about>,
<Container at /fqdn/en>,
<PloneSite at /fqdn>,
,
<ZPublisher.BaseRequest.RequestContainer object at 0x7f854ea18a90>
]

context.aq_inner.aq_chain:[
<Document at /fqdn/en/about/missions-and-values>,
<Folder at /fqdn/en/about used for /fqdn/en/about/missions-and-values>,
<Document at /fqdn/en/about/missions-and-values>,
<Folder at /fqdn/en/about used for /fqdn/en/about/missions-and-values>,
<Document at /fqdn/en/about/missions-and-values>,
<Folder at /fqdn/en/about>,
<Container at /fqdn/en>,
<PloneSite at /fqdn>,
,
<ZPublisher.BaseRequest.RequestContainer object at 0x7f854ea18a90>
]

Imagine if you had <a href="contribute">test</a> in the code of this webpage you'd have a url loop and all robots would traverse the same page over and over again adding contribute to the path. This can create a DoS or DDoS because this can bypass any caching.

https_plone_org/why-plone/what-is-plone

Imagine that <a aria-current="page" class="active" href="why-plone/what-is-plone"><span>What is Plone?</span></a> is a menu item present on every page of your plone website...

https_plone_org/try-plone/why-plone/what-is-plone
https_plone_org/news-and-events/news/why-plone/what-is-plone
https_plone_org/foundation/contact-us/why-plone/what-is-plone
...

So any URL on your website will NOT return 404 when you add ... "/why-plone/what-is-plone" to the end of the URL and will point to the same plone object over and over again, creating an infinite hierarchy of the same object for crawlers or web browsers... add multiple bots from multiple IPs/Ranges into the mix.... DDoS !

https_plone_org/try-plone/why-plone/what-is-plone/why-plone/what-is-plone/why-plone/what-is-plone
https_plone_org/news-and-events/news/why-plone/what-is-plone/why-plone/what-is-plone/why-plone/what-is-plone
https_plone_org/foundation/contact-us/why-plone/what-is-plone/why-plone/what-is-plone/why-plone/what-is-plone

still NO 404 ... and this is the official plone website ...
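
How a crawler ends up on ever-longer URLs can be sketched with the standard library's `urljoin`, which applies the usual relative-reference resolution. The host `example.org` and the helper name are placeholders. Standard resolution replaces the last path segment rather than appending to it, so the exact URLs differ from the ones listed above, but the path still grows on every hop as long as the server keeps answering 200.

```python
from urllib.parse import urljoin


def follow_relative_link(start_url, href, hops):
    """Resolve the same relative href repeatedly, as a crawler following it would."""
    urls = [start_url]
    for _ in range(hops):
        urls.append(urljoin(urls[-1], href))
    return urls


# Placeholder host; the href is the menu link quoted above.
for url in follow_relative_link(
    "https://example.org/news-and-events/news", "why-plone/what-is-plone", 3
):
    print(url)
```

With an absolute href such as "/why-plone/what-is-plone", every hop resolves to the same URL and the loop never starts.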

Plone Community Website also.
The following link will display announcements section in jobs section instead of 404:
https_community_plone_org/c/jobs/38/announcements/27

Imagine all the people, living in harmony!

Please try collective.explicitacquisition.