[GSoC] Ideas about integration with Hadoop

Hello guys,
I am Matteo Cossu, a Master's student in Computer Science in Freiburg (Germany), and I would like to contribute to Plone for the next Google Summer of Code. Over the last year I have gained some experience with Hadoop, especially Impala and Spark. Could a Plone developer tell me whether a plugin that gives Plone access to big data tools would make sense? Or is it perhaps a risky idea?

There is no need for that. I did similar integrations for cloud storage etc. and got almost no feedback or adoption, so this is really too specific to be worth the time and resources.


What sort of things do you see this integration helping with?

For example, visualizing analytics data that comes from the Hadoop ecosystem. I understand that this idea might only help a niche of users, so it's up to you to tell me whether it is worthwhile.

I personally have not heard of anyone using Plone with Hadoop but that doesn't mean much... Also, if your integration did cool & useful things, it would open up Plone to Hadoop users, which is great for Plone.

Some perspective... I use Plone to build data/analytic applications. What I build has few upstream dependencies and is mostly a custom-built data-collection / data-viz / query solution. My sense is that any wins for Plone in the "data science" niche would start by focusing on the existing Python-based data-science toolsets, e.g.:

  • Hosting IPython/Jupyter notebooks in a CMS setting;
  • Supporting pandas data frames in ZODB (not sure whether this is possible or desirable);
  • Integrating data visualization tools around data sets uploaded into Plone as CSV or Excel worksheets. The merits of this are debatable at the scale a GSoC project could accomplish.
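
For the third point, a first step would be turning an uploaded CSV worksheet into records that a visualization tool can consume. The sketch below uses only the Python standard library; the function names `csv_to_records` and `column_types` are hypothetical (not part of any Plone API), and how the raw bytes are read from a Plone File item is left out. The "quantitative"/"nominal" labels are borrowed from Vega-Lite's field types as an assumption about the downstream charting tool.

```python
import csv
import io


def _coerce(value):
    """Convert a CSV cell to int or float when possible; otherwise keep it as-is."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            # TypeError covers None, which DictReader uses for missing cells.
            pass
    return value


def csv_to_records(raw, encoding="utf-8"):
    """Parse the bytes of an uploaded CSV file into a list of row dicts.

    Assumes the first row is a header. Numeric-looking cells are converted so
    that a charting library can treat those columns as quantitative.
    """
    reader = csv.DictReader(io.StringIO(raw.decode(encoding)))
    return [{key: _coerce(val) for key, val in row.items()} for row in reader]


def column_types(records):
    """Guess a field type per column: 'quantitative' if every value is numeric."""
    if not records:
        return {}
    types = {}
    for key in records[0]:
        numeric = all(isinstance(r[key], (int, float)) for r in records)
        types[key] = "quantitative" if numeric else "nominal"
    return types
```

For example, `csv_to_records(b"city,visits\nFreiburg,120\nBerlin,95\n")` gives `[{'city': 'Freiburg', 'visits': 120}, {'city': 'Berlin', 'visits': 95}]`, and `column_types` on that result gives `{'city': 'nominal', 'visits': 'quantitative'}`.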

Hadoop, Spark, etc. are interesting for "big data" or "data warehouse" projects, but the degree and variety of ways they could benefit from integration with a CMS are possibly too tricky to generalize without years of experience with both kinds of toolsets.


It might be interesting to work on adapting a package like Altair, which uses the Vega grammar, to a Plone setting. Allowing declarative graph descriptions to be written TTW (through the web) and rendered inline in some context within the CMS could be a compelling story for some users.
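
To make the TTW idea concrete: Altair ultimately emits Vega-Lite JSON, so a user could paste or edit such a JSON description and the server would only need to sanity-check it before storing it for a browser-side renderer (e.g. vega-embed). The sketch below is an assumption about how that check might look; `parse_ttw_spec` and `ALLOWED_MARKS` are hypothetical names, only single-view specs are handled, and it is not a substitute for validating against the full Vega-Lite JSON schema.

```python
import json

# A subset of Vega-Lite mark types accepted by this sketch (Vega-Lite supports more).
ALLOWED_MARKS = {"bar", "line", "point", "area", "tick", "rect", "arc"}


def parse_ttw_spec(text, records=None):
    """Parse a Vega-Lite spec typed through the web and run basic sanity checks.

    Only single-view specs (a top-level "mark" plus "encoding") are handled;
    layered or concatenated views are rejected. If ``records`` is given, they
    are inlined as the chart's data.
    """
    try:
        spec = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"spec is not valid JSON: {exc}") from exc
    if not isinstance(spec, dict):
        raise ValueError("spec must be a JSON object")
    mark = spec.get("mark")
    # "mark" may be a string ("bar") or an object ({"type": "bar", ...}).
    mark_type = mark.get("type") if isinstance(mark, dict) else mark
    if not isinstance(mark_type, str) or mark_type not in ALLOWED_MARKS:
        raise ValueError(f"unsupported or missing mark: {mark_type!r}")
    if not isinstance(spec.get("encoding"), dict):
        raise ValueError("spec needs an 'encoding' object")
    if records is not None:
        spec["data"] = {"values": records}
    return spec
```

The declarative description stays data, not code: the server never executes anything the user typed, it only parses JSON and checks its shape.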

I'm definitely a fan of Vega grammars and the GoG (Grammar of Graphics) approach generally, though I admit I am not using it in my current D3 projects. This could be interesting if it is reasonable to model such a grammar as a hierarchy of content, assuming Dexterity is rich enough to model the concepts 1:1. Of course, that might assume that either we have a data-grid field that is not long in the tooth, or we write a special-purpose data-grid widget (wrapped via patternslib) that provides a purpose-specific grid with a hard-coded schema (saving JSON to a hidden form input).
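
A rough sketch of the hierarchy idea, outside Plone: a chart acts as a container whose children are encoding channels, the tree is flattened into a Vega-Lite spec, and the channel rows are round-tripped through the kind of JSON string a hard-coded-schema grid widget might keep in a hidden input. Plain dataclasses stand in for Dexterity types here as an assumption; `Chart`, `EncodingChannel` and the helper functions are hypothetical names.

```python
import json
from dataclasses import dataclass, field

VEGA_LITE_SCHEMA = "https://vega.github.io/schema/vega-lite/v5.json"


@dataclass
class EncodingChannel:
    """Hypothetical child item: one encoding channel (e.g. x, y, color)."""
    channel: str
    field_name: str
    data_type: str  # "quantitative", "nominal", "ordinal" or "temporal"


@dataclass
class Chart:
    """Hypothetical container item: a chart whose children are its encodings."""
    title: str
    mark: str
    encodings: list = field(default_factory=list)

    def to_vega_lite(self, records):
        """Flatten the content hierarchy into a single-view Vega-Lite spec."""
        return {
            "$schema": VEGA_LITE_SCHEMA,
            "title": self.title,
            "data": {"values": records},
            "mark": self.mark,
            "encoding": {
                e.channel: {"field": e.field_name, "type": e.data_type}
                for e in self.encodings
            },
        }


def encodings_to_hidden_input(chart):
    """Serialize the grid rows (the encodings) as JSON for a hidden form input."""
    rows = [
        {"channel": e.channel, "field": e.field_name, "type": e.data_type}
        for e in chart.encodings
    ]
    return json.dumps(rows)


def encodings_from_hidden_input(value):
    """Rebuild the child items from the JSON submitted with the form."""
    return [
        EncodingChannel(row["channel"], row["field"], row["type"])
        for row in json.loads(value)
    ]
```

Keeping the grid rows as a flat JSON list with a fixed set of keys is what makes a purpose-specific widget feasible: the server only has to map each row back to one child item.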