Some perspective... I use Plone to build data/analytic applications. What I build has few upstream dependencies and is mostly a custom-built data-collection / data-viz / query solution. My sense is that any wins in the "data science" niche for Plone would start by focusing on the existing Python-based data-science toolsets, e.g.:
- Hosting IPython/Jupyter notebooks in a CMS setting.
- Supporting pandas data frames in ZODB (not sure whether this is possible or desirable).
- Integrating data visualization tools around data-sets uploaded into Plone as CSV or Excel worksheets. The merits of this are debatable at the scale a GSoC project could accomplish.
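
A minimal sketch of the last two ideas, assuming pandas is installed and the bytes of an uploaded CSV are already in hand. `CSV_BYTES`, `frame_from_upload`, and `survives_pickle` are made-up names for illustration, and nothing here touches Plone or ZODB itself. ZODB persists objects by pickling them, so a pickle round-trip is a cheap first check of whether storing a data frame is possible at all; it says nothing about whether it is desirable for large frames.

```python
import io
import pickle

import pandas as pd

# Hypothetical stand-in for the bytes of a CSV file uploaded into Plone.
CSV_BYTES = b"site,visits\nalpha,10\nbeta,32\ngamma,18\n"


def frame_from_upload(data: bytes) -> pd.DataFrame:
    """Parse uploaded CSV bytes into a pandas DataFrame."""
    return pd.read_csv(io.BytesIO(data))


def survives_pickle(frame: pd.DataFrame) -> bool:
    """Check that the frame round-trips through pickle unchanged."""
    restored = pickle.loads(pickle.dumps(frame))
    return frame.equals(restored)


frame = frame_from_upload(CSV_BYTES)
total_visits = int(frame["visits"].sum())  # simple aggregate a chart could use
picklable = survives_pickle(frame)
```

From here, `frame` could feed whatever visualization tool gets chosen; the open questions are where the parsing should happen in a CMS and how often the stored frame would need rewriting.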
Hadoop, Spark, etc. are interesting for "big data" or "data warehouse" projects, but the degree and variety of ways they could benefit from integration with a CMS are possibly too tricky to generalize without years of experience in both kinds of toolsets.