[GSoC] Ideas about integration with Hadoop

matteuan · March 20, 2017, 3:55pm

Hello guys,
I am Matteo Cossu, a Master's student in Computer Science in Freiburg (Germany). I would like to contribute to Plone in the next Google Summer of Code. Over the last year I gained some experience with Hadoop, especially Impala and Spark. Could a Plone developer tell me whether a plugin that gives access to big data tools from Plone would make sense? Or is it a risky idea?

zopyx · March 20, 2017, 8:58pm

There is no need for that. I did similar integrations for cloud storage etc... almost no feedback or adoption... so it is really too specific to be worth the time and resources here.

-aj

tkimnguyen · March 21, 2017, 4:22am

What sort of things do you see this integration helping to do?

matteuan · March 21, 2017, 11:33am

For example, visualizing data analytics that come from the Hadoop ecosystem. I understand that this idea might only help a niche of users, so it's up to you to tell me whether it is worthwhile or not.

tkimnguyen · March 21, 2017, 5:10pm

I personally have not heard of anyone using Plone with Hadoop but that doesn't mean much... Also, if your integration did cool & useful things, it would open up Plone to Hadoop users, which is great for Plone.

seanupton · March 24, 2017, 3:33pm

Some perspective... I use Plone to build data/analytic applications. What I build has few upstream dependencies, and is mostly a custom-built data-collection / data-viz / query solution. My sense is that any wins in the "data science" niche for Plone would start by focusing on the existing Python-based data-science toolsets, e.g.:

Hosting IPython/Jupyter notebooks in a CMS setting;
Supporting pandas data frames in ZODB (not sure if this is possible or desirable);
Integrating data visualization tools around data-sets uploaded into Plone as CSV or Excel worksheets. The merits of this are debatable at the scale a GSoC project could accomplish.
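
As a rough illustration of the CSV idea, here is a minimal sketch using only the Python standard library. The function name `csv_to_records` and the sample columns are made up for this example, and how the text would actually be read from an uploaded file in Plone is left out.

```python
import csv
import io


def csv_to_records(text):
    """Parse CSV text into a list of dicts, converting numeric cells to float."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {}
        for key, value in row.items():
            try:
                record[key] = float(value)
            except (TypeError, ValueError):
                record[key] = value  # keep non-numeric cells unchanged
        records.append(record)
    return records


# Hypothetical upload: in a CMS this text would come from an uploaded file.
sample = "site,visits\nA,120\nB,85\n"
records = csv_to_records(sample)
```

A list of plain dicts like this is the shape most JavaScript charting tools accept once it is serialized to JSON, which is why it is a convenient intermediate form.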

Hadoop, Spark, etc. are interesting for "big data" or "data warehouse" projects, but the degree and variety of ways they could benefit from integration with a CMS are possibly too tricky to generalize without years of experience in both kinds of toolsets.

Sean

cewing · March 24, 2017, 3:59pm

It might be interesting to work on adapting a package like Altair, which uses the Vega grammar, to a Plone setting. Allowing declarative graph descriptions, written TTW (through the web), to be rendered inline in some context within the CMS could be a compelling story for some users.
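
To make "declarative graph description" concrete, here is a minimal sketch of a Vega-Lite-style bar chart spec built as a plain Python dict. The helper name `bar_chart_spec` and the sample data are assumptions for illustration; a real spec would normally also carry a `$schema` entry, and nothing here uses Altair or any Plone API.

```python
import json


def bar_chart_spec(records, x_field, y_field):
    """Build a Vega-Lite-style bar chart description as a plain dict."""
    return {
        "data": {"values": records},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Hypothetical sample data.
records = [{"site": "A", "visits": 120}, {"site": "B", "visits": 85}]
spec = bar_chart_spec(records, "site", "visits")

# The JSON text is what a TTW editor could store and a JS renderer could consume.
spec_json = json.dumps(spec, indent=2)
```

The chart is described entirely by data (mark type plus field-to-channel mappings), so it can be stored, validated, and edited like any other content.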

seanupton · March 27, 2017, 11:37pm

Definitely a fan of Vega grammars and the GoG (Grammar of Graphics) approach, generally (though I admit to not using them in my current D3 projects). This could be interesting if it is reasonable to model such a grammar as a hierarchy of content, assuming there is enough richness in Dexterity to model the concepts 1:1. Of course, that assumes that either we have a data-grid field that is not long in the tooth, or we have a special-purpose data-grid widget written (and wrapped via patternslib) to provide a purpose-specific grid with a hard-coded schema (saving JSON to a hidden form input).
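
A minimal sketch of what "grammar as a hierarchy of content" plus "JSON in a hidden input" could look like, using plain dataclasses as stand-ins. The `Chart` and `Channel` classes and the `hidden_input` helper are hypothetical; they are not Dexterity types or patternslib code, only an illustration of the shape.

```python
import html
import json
from dataclasses import dataclass, field


@dataclass
class Channel:
    """One encoding channel, e.g. x or y (stand-in for a child content item)."""
    name: str
    field_name: str
    type: str


@dataclass
class Chart:
    """A chart as a container of channels (stand-in for a parent content item)."""
    mark: str
    channels: list = field(default_factory=list)

    def to_spec(self):
        """Flatten the hierarchy into a Vega-Lite-style dict."""
        return {
            "mark": self.mark,
            "encoding": {
                c.name: {"field": c.field_name, "type": c.type}
                for c in self.channels
            },
        }


def hidden_input(name, spec):
    """Render the spec as JSON inside a hidden form input, escaped for HTML."""
    value = html.escape(json.dumps(spec, sort_keys=True), quote=True)
    return '<input type="hidden" name="{}" value="{}">'.format(
        html.escape(name, quote=True), value
    )


chart = Chart("bar", [Channel("x", "site", "nominal"),
                      Channel("y", "visits", "quantitative")])
markup = hidden_input("chart_spec", chart.to_spec())
```

Whether each channel deserves to be its own content item, or just a row in a grid widget, is exactly the open question above; the sketch only shows that either form can serialize to the same JSON.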

Sean