[GSoC] Ideas about integration with Hadoop

Hello guys,
I am Matteo Cossu, and I am a Master's student in Computer Science in Freiburg (Germany). I would like to contribute to Plone for the next Google Summer of Code. Over the last year I have gained some experience with Hadoop, especially Impala and Spark. Could a Plone developer tell me whether a plugin that allows access to big data tools from Plone would make sense? Or is it a risky idea?

There is no need for that. I did similar integrations for cloud storage etc. and got almost no feedback or adoption, so it is really too specific to spend time and resources on here.

-aj

What sort of things do you see this integration helping to do?

For example, visualizing data analytics that come from the Hadoop ecosystem. I understand that this idea might only help a niche of users, so it's up to you to tell me whether it is worthwhile or not.

I personally have not heard of anyone using Plone with Hadoop but that doesn't mean much... Also, if your integration did cool & useful things, it would open up Plone to Hadoop users, which is great for Plone.

Some perspective... I use Plone to build data/analytic applications. What I build has few upstream dependencies, and is mostly a custom-built data-collection / data-viz / query solution. My sense is that any wins in the "data science" niche for Plone would start by focusing on the existing Python-based data-science toolsets, e.g.:

  • Hosting IPython/Jupyter notebooks in a CMS setting;
  • Supporting pandas data frames in ZODB (not sure if this is possible or desirable);
  • Integrating data visualization tools around data sets uploaded into Plone as CSV or Excel worksheets. The merits of this are debatable at the scale a GSoC project could accomplish.
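
As a rough illustration of the third idea, the sketch below turns the text of an uploaded CSV into a list of records that a charting library could consume. It uses only the Python standard library; the function name `csv_to_records` and the sample data are made up for this example, and nothing here touches Plone's own APIs.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, converting numeric cells to float."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for key, value in row.items():
            try:
                record[key] = float(value)
            except (TypeError, ValueError):
                record[key] = value  # keep non-numeric or missing cells as-is
        records.append(record)
    return records


# Hypothetical upload: in Plone this text would come from an uploaded file.
sample = "region,visits\nnorth,120\nsouth,95\n"
records = csv_to_records(sample)
print(json.dumps(records))
```

Keeping the result as plain dicts means it can be dumped to JSON and handed to whatever visualization layer is chosen later.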

Hadoop, Spark, etc. are interesting for "big data" or "data warehouse" projects, but the degree and variety of ways they could benefit from integration with a CMS are possibly too tricky to generalize without years of experience in both kinds of toolsets.

Sean

It might be interesting to work on adapting a package like Altair, which uses the Vega grammar, to a Plone setting. Allowing declarative graph descriptions to be written TTW (through the web) and rendered inline in some context within the CMS could be a compelling story for some users.
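
To make the idea concrete, here is a minimal sketch of what such a declarative description could look like: a Vega-Lite spec built as a plain Python dict and serialized to JSON, which is roughly the kind of output Altair generates. The helper `bar_chart_spec` and the sample values are hypothetical, and how the JSON would be stored or rendered inside Plone is left open.

```python
import json


def bar_chart_spec(values, x_field, y_field):
    """Build a minimal Vega-Lite bar chart spec as a plain dict."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},  # inline data rows
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Made-up sample data, for illustration only.
spec = bar_chart_spec(
    [{"region": "north", "visits": 120}, {"region": "south", "visits": 95}],
    x_field="region",
    y_field="visits",
)
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Because the spec is just JSON, it could be edited as text and passed to a client-side renderer such as vega-embed without any Python plotting code running on the server.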

Definitely a fan of Vega grammars and the GoG approach, generally (though I admit to not using it in my current D3 projects). This could be interesting, provided it is reasonable to model such a grammar as a hierarchy of content, assuming there is enough richness in Dexterity to model the concepts 1:1. Of course, that assumes that either we have a data-grid field that is not long in the tooth, or we write a special-purpose data-grid widget (wrapped via patternslib) that provides a purpose-specific grid with a hard-coded schema (saving JSON to a hidden form input).
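
The hard-coded-schema variant could be sketched roughly like this: the widget posts its rows as a JSON string in a hidden input, and the server checks each row against a fixed column list before saving. The column names, the `GRID_SCHEMA` constant and `parse_grid` are assumptions for illustration, not part of Dexterity or patternslib.

```python
import json

# Hypothetical fixed schema: column name -> accepted Python type(s).
GRID_SCHEMA = {"label": str, "value": (int, float)}


def parse_grid(hidden_input_value):
    """Decode the JSON from the hidden input and validate every row."""
    rows = json.loads(hidden_input_value)
    if not isinstance(rows, list):
        raise ValueError("grid data must be a JSON array of rows")
    for index, row in enumerate(rows):
        if not isinstance(row, dict) or set(row) != set(GRID_SCHEMA):
            raise ValueError(f"row {index}: unexpected columns")
        for column, expected in GRID_SCHEMA.items():
            cell = row[column]
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(cell, bool) or not isinstance(cell, expected):
                raise ValueError(f"row {index}: bad value for {column!r}")
    return rows


rows = parse_grid('[{"label": "north", "value": 120}]')
print(rows)
```

Validating on the server keeps the stored JSON predictable even though the browser-side widget is free-form.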

Sean