Does zope make two reads to publish an object?

You do not need to understand manage_workspace to solve your current task (making LocalFS Python3/Zope 4 compatible). That LocalFS sometimes reads a file twice is not a serious problem (likely, it was there all the time), just a small inefficiency. Even if you want to fix the problem, you can do so without thinking about manage_workspace -- by following my recommendation in a previous response.

But I do not want to withhold the information about manage_workspace. It is an element of the ZMI (= "Zope Management Interface") and facilitates the construction of ZMI URL references to Zope objects. Whenever a ZMI view wants to create a URL to the ZMI page of a Zope object obj, it can append /manage_workspace to the URL of obj. In short: manage_workspace's task is to present the ZMI page for the object on which it is called.
The default implementation of manage_workspace checks which of obj's ZMI actions are available to the current user and redirects to the first of those actions. A class can override this definition should the default not give an appropriate ZMI page for its instances.
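As a hedged illustration, such an override might look roughly like this. The REQUEST/RESPONSE objects are stubbed so the sketch runs outside Zope (the publisher supplies the real ones), and manage_main is just an assumed target view:

```python
class Response:
    """Minimal stand-in for Zope's HTTPResponse."""
    def redirect(self, url):
        self.location = url


class MyObject:
    def manage_workspace(self, REQUEST):
        "Send the ZMI workspace to this object's manage_main view."
        # instead of the default "first available action" redirect,
        # always go to one specific view of this object
        REQUEST['RESPONSE'].redirect(REQUEST['URL1'] + '/manage_main')


resp = Response()
MyObject().manage_workspace({'RESPONSE': resp, 'URL1': 'http://localhost/obj'})
print(resp.location)  # http://localhost/obj/manage_main
```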

@dieter, Thanks for your feedback. I finally had a bit of time to look into it. As it is, LocalFS creates a file object with OFS.Image.File(id, '', file), where file is the local file handle. The object is then wrapped with a mixin (Wrapper, OFS.Image.File). It would be possible to create the properties you mention there (in the Wrapper class), although I don't know the nature of the corresponding attributes in OFS.Image.File, nor whether inheritance indeed allows an attribute to be replaced by a property.

I had a brief look at File.__init__() and _read_data(). It's not simple code, but in fact I got the impression that it already does what you suggest, that is, it reads the actual data only when needed, at least for big files.

Finally, for completeness: REQUEST arrives in __bobo_traverse__ with an actual response already, which makes sense. There is a key, TraversalRequestNameStack, which contains ['manage_workspace']. So it would be possible for LocalFS to detect this specific situation (as an alternative solution) and return a minimal object. However, I don't know what Zope expects in that case.

In any case, I don't have the time to handle this corner situation of LocalFS, which does not affect my use case, and which might not even be a real issue considering the Image.File implementation. At least for now, I will simply document the issue and point to this thread. Thanks again for all the patient and detailed feedback!

It is possible to override an attribute with a property.
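A minimal, plain-Python sketch (no Zope involved) showing that a property defined in a subclass shadows an ordinary attribute inherited from the base class:

```python
class Base:
    data = b""  # ordinary class attribute, stand-in for File.data


class Lazy(Base):
    @property
    def data(self):
        # the property in the subclass shadows the inherited attribute
        return b"read on demand"


print(Lazy().data)  # b'read on demand'
```

One caveat: if the base class ever *assigns* to self.data (as OFS.Image.File does in its constructor), the subclass also needs a property setter, otherwise the assignment raises AttributeError.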

Regarding the "nature of the attributes": you can look at how the attributes are used. data is likely a bit difficult: it has either a string/binary-like value (small file) or a recursively chained value (large file). The latter does not fit well with a file system file. The easiest solution would be to let data return the complete external file content -- but this could have serious consequences for huge files. To cope with huge files, other methods would likely require adaptation as well.

I discourage this route: you should not try to identify individual situations (like a call to manage_workspace). Instead, I would try to create an OFS.Image.File object with a faked content (more precisely: the empty content) and ensure that the true file content is accessed only when it is really required. This way, there is no need to identify individual situations but only to identify the methods/attributes which have a need to access the true file content.

Let me stress again: all this is not necessary, it is only an optimization. Likely, "LocalFS" has always accessed the file content more often than necessary and it has not been a serious problem. There is no real need to change it at this point. If you want to tackle it, you may think of it as a secondary project.

Any idea why Zope does not actually do the same for big objects? At first sight it doesn't matter whether the object's content comes from a file system file or from a ZODB object; in the end it all comes from a file. It all depends on which one is more efficient for big objects/files (which one has the better file/object index structure). Usually I don't put big files in Data.fs; I use Apache to serve those files instead. Just wondering.

It does: the recursively chained structure used for the data attribute of OFS.Image.File is used to prevent big files from causing huge structures in memory. But you do not have this structure for a file in the file system -- and the trivial replacement does not cope well with big files in the file system. It gets significantly more complex when one wants to change this.
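A toy, non-persistent model of that chained structure. Chunk plays the role of Pdata here; real Pdata objects are separate ZODB records, which is what lets each chunk be loaded and released independently:

```python
class Chunk:
    """Plain-Python stand-in for a Pdata record: one piece plus a link."""
    def __init__(self, data, next=None):
        self.data, self.next = data, next


def make_chain(data, size):
    # build back to front so each node links to its successor
    head = None
    for i in range(len(data) - size, -size, -size):
        head = Chunk(data[max(i, 0):i + size], head)
    return head


def read_chain(chunk):
    # walk the chain; in ZODB each step could load one record and
    # deactivate the previous one, bounding memory use
    out = b''
    while chunk is not None:
        out += chunk.data
        chunk = chunk.next
    return out


print(read_chain(make_chain(b'abcdefgh', 3)))  # b'abcdefgh'
```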

Well, that was my initial impression from a quick reading, but after a second look it appears that, for Data.fs objects, the object is read in chunks and then stored back in Data.fs as Pdata parts? At least that's what I think is happening with big files. It does release the main memory, but it does not seem to prevent the reading in the first place, right? It just breaks the big object into parts for better memory management, but it doesn't avoid two reads from secondary memory (Data.fs).

        data = Pdata(read(end - pos))
        self._p_jar.add(data)
        data.next = _next

        # Save the object so that we can release its memory.
        transaction.savepoint(optimistic=True)
        data._p_deactivate()

I mean, one read at object creation (first line) and then another read at serving time.

Your description is not very precise: with "Data.fs object" you apparently designate an OFS.Image.File object. The data of such an object is stored either as a (binary) string (small data) or as a chain of Pdata objects (large data).

That highly depends on the method you are using. If you pass a file parameter to __init__, then it must read this file (as it changes the data representation). index_html, on the other hand, does not in all cases read the complete data (it does not for "range requests" and maybe not for "if modified since" requests).

If you really want to remove the inefficiency (reading the file several times), then you would pass b"" as file to File.__init__ and handle the file content access in your derived class. If you do the latter intelligently, then there is no need to read the file content for manage_workspace (and many other ZMI) requests.

Ahh, yes, indeed I was confusing things. For normal Data.fs objects, the read call there happens at upload time. For LocalFS, the object has to be uploaded into memory/Data.fs at serving time.

So, what is the life cycle of big LocalFS objects (> 2 * (1 << 16))? ZODB implements MVCC for ZEO storage; do the big files stay there forever? Are they reused as a cache for subsequent requests?