Session roadmap

  • What we've come to expect from analytics software
  • Drupal for managing "web experiences"
  • Proposed architecture and improvements to existing components

Let's talk about analytics

Analytics

Analytics is the systematic analysis of data or statistics.


  • Data analyzed includes existing application data necessary for day-to-day operations as well as purpose-collected metrics.
  • Analytics data is often collected and stored in purpose-built databases or data warehouses for later analysis
  • Data is queried, aggregated, and analyzed to make informed decisions within an organization
  • Analytics comes in many flavors; for our purposes, we'll focus on web analytics and software analytics

Web analytics basics

Thanks to ubiquitous tools like Google Analytics (GA), we've come to expect...


  • Automated metric collection on events at the page or session level
    • Page view events, typically keyed off of page URL or path
    • Default metrics like: browser, OS, geo, etc.
  • Aggregation on records over time for those events
    • How many hits to this page in the last month?
    • How many unique visitors?
  • Segmentation and filtering on collected metrics
    • How many hits from Tokyo?
    • How many IE visitors to this page from China yesterday?
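
Questions like these reduce to filtering and counting event records. A minimal sketch, assuming a hypothetical in-memory list of page view events (the field names `path`, `visitorId`, `browser`, `city`, and `timestamp` are made up for illustration and are not any tool's schema):

```javascript
// Hypothetical page view records; field names are assumptions for illustration.
const pageViews = [
  { path: '/about', visitorId: 'a', browser: 'IE', city: 'Tokyo', timestamp: Date.UTC(2013, 4, 1) },
  { path: '/about', visitorId: 'b', browser: 'Chrome', city: 'Beijing', timestamp: Date.UTC(2013, 4, 2) },
  { path: '/about', visitorId: 'a', browser: 'IE', city: 'Tokyo', timestamp: Date.UTC(2013, 4, 3) },
  { path: '/contact', visitorId: 'c', browser: 'Firefox', city: 'Tokyo', timestamp: Date.UTC(2013, 4, 3) },
];

// Segmentation: keep only events matching every key/value pair in `segment`
// and falling inside the [from, to) time window.
function filterEvents(events, segment = {}, from = -Infinity, to = Infinity) {
  return events.filter((event) =>
    event.timestamp >= from &&
    event.timestamp < to &&
    Object.entries(segment).every(([key, value]) => event[key] === value)
  );
}

// Aggregation: total hits and unique visitors for a set of events.
function summarize(events) {
  return {
    hits: events.length,
    uniqueVisitors: new Set(events.map((event) => event.visitorId)).size,
  };
}

// "How many hits to this page in May, and from how many unique visitors?"
const aboutInMay = summarize(
  filterEvents(pageViews, { path: '/about' }, Date.UTC(2013, 4, 1), Date.UTC(2013, 5, 1))
);

// "How many hits from Tokyo?"
const fromTokyo = summarize(filterEvents(pageViews, { city: 'Tokyo' }));
```

A real analytics store would push this work into a database query, but the shape of the question (filter by segment and time, then count) stays the same.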

Web analytics extensibility

Built on those basics, we've come to find the following indispensable:


  • Ad hoc events, decorated with the usual metrics
    • User clicked [this element]
    • User focused on [this field] with [this value]
  • Ad hoc definition of custom metrics and dimensions
    • This page is a taxonomy term with [this tid]
    • This visitor is associated with [this industry]
  • Native integration of the above with the aforementioned basics
    • How many people in [this industry] clicked [this element] last week?

Deriving value & meaningful insights

We can also translate this into value at an organizational level:


  • Goals define relationships between multiple analytic events
    • User visited [this page]
    • Then, user clicked [this button]
  • Conversions happen when visitors complete defined goals
    • Conversions typically represent core organizational value
    • Typically the #1 area where optimization occurs
  • Conversion analysis is also natively integrated
    • What's the success rate of [becoming an engaged constituent]?
    • Segment conversion rate on custom dimensions over time
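
To make the goal and conversion vocabulary concrete, here is a rough sketch, assuming a goal is an ordered list of steps and a conversion is a visitor whose events satisfy those steps in order. The event shape (`visitorId`, `type`, `target`) and the function names are hypothetical, not taken from any analytics product:

```javascript
// A goal is an ordered list of steps; each step is a predicate over an event.
// The event shape ({ visitorId, type, target }) is an assumption for illustration.
const signupGoal = [
  (event) => event.type === 'pageview' && event.target === '/signup',
  (event) => event.type === 'click' && event.target === '#submit',
];

// True when a visitor's events (assumed to be in chronological order)
// satisfy each goal step in order.
function completedGoal(visitorEvents, goal) {
  let step = 0;
  for (const event of visitorEvents) {
    if (step < goal.length && goal[step](event)) step += 1;
  }
  return step === goal.length;
}

// Conversion rate = visitors who completed the goal / all visitors seen.
function conversionRate(events, goal) {
  const byVisitor = new Map();
  for (const event of events) {
    if (!byVisitor.has(event.visitorId)) byVisitor.set(event.visitorId, []);
    byVisitor.get(event.visitorId).push(event);
  }
  if (byVisitor.size === 0) return 0;
  let conversions = 0;
  for (const visitorEvents of byVisitor.values()) {
    if (completedGoal(visitorEvents, goal)) conversions += 1;
  }
  return conversions / byVisitor.size;
}

const sampleEvents = [
  { visitorId: 'a', type: 'pageview', target: '/signup' },
  { visitorId: 'a', type: 'click', target: '#submit' },
  { visitorId: 'b', type: 'pageview', target: '/signup' }, // visited, never clicked
  { visitorId: 'c', type: 'click', target: '#submit' },    // clicked without the visit step
  { visitorId: 'd', type: 'pageview', target: '/about' },
];
const signupRate = conversionRate(sampleEvents, signupGoal);
```

Segmenting the rate on a custom dimension amounts to filtering the events before passing them to `conversionRate`.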

Focus on developer experience

Though it is often difficult to conceptualize and define goals, metrics, and events across an organization's full digital presence, developer implementation can be trivial.

  • Performance is front-and-center
    • The analytics package is loaded asynchronously
    • Prior to its availability, event and metric calls are queued
  • All functionality is available via a simple facade

    ga('send', 'event', 'category', 'action');
    ga('set', 'dimension20', 'foo');
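
The queue-then-replay idea behind that facade can be sketched in a few lines. This is not GA's actual implementation; `createTracker` and `attach` are hypothetical names used only to illustrate the pattern:

```javascript
// Sketch of a queueing facade; `createTracker` and `attach` are hypothetical names.
function createTracker() {
  const queue = [];   // calls made before the real library is ready
  let backend = null; // set once the asynchronously loaded library arrives

  // The facade: callers use the same signature before and after load.
  function track(...args) {
    if (backend) {
      backend(...args);
    } else {
      queue.push(args);
    }
  }

  // Called when the library finishes loading: replay queued calls in order.
  track.attach = (realBackend) => {
    backend = realBackend;
    while (queue.length > 0) {
      backend(...queue.shift());
    }
  };

  return track;
}

const track = createTracker();
track('send', 'event', 'category', 'action'); // queued
track('set', 'dimension20', 'foo');           // queued

// Stand-in for the real library: just records what it receives.
const received = [];
track.attach((...args) => received.push(args)); // replays both queued calls
track('send', 'pageview');                      // delivered immediately
```

Because page code only ever touches the facade, it never needs to know whether the analytics package has loaded yet.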

Software analytics

Same concept, different data.


  • Instead of pageview and click events, we have transaction, network connectivity, and other app-centric events
  • Rather than aggregating hits or visitors, we typically aggregate performance and scalability metrics
  • Rather than conversion tracking, we focus on performance benchmarks and optimization

Does Drupal do analytics?

  • Well, so the Statistics module can count hits to nodes, right?
  • And, y'know, performance and other request details are tracked in the accesslog
  • I guess there's some interesting stuff in the watchdog
  • But no... Drupal does not do analytics the way we want
  • Instead, we rely on Drupal doing what it does best: integrating with specialized third-party services

So what now?

Let's talk web experience management

Web experience management

WEM is kind of a fluffy marketing term that basically means "a content management system, but with personalization."

  • Personas: users are here for different reasons with different goals
  • Content delivery should be optimized for those personas and their respective goals
  • Most importantly, optimization should be tested, measured, and improved constantly

WEM implementation fundamentals

A minimal framework for personalized content delivery would need:


  • Some mechanism to customize the way content is delivered
  • Some mechanism to determine a visitor's persona
  • Some system to collect and store personalization analytics
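
As a toy illustration of how those three pieces fit together (not a proposal for an actual API), assume persona rules keyed off visited paths, one headline variant per persona, and a plain array standing in for the analytics store. Every name below is hypothetical:

```javascript
// All names here (rules, variants, event fields) are hypothetical.

// Determining a visitor's persona: rules are checked in order.
const personaRules = [
  { persona: 'donor', matches: (visitor) => visitor.visitedPaths.includes('/donate') },
  { persona: 'volunteer', matches: (visitor) => visitor.visitedPaths.includes('/volunteer') },
];

// Customizing delivery: one content variant per persona.
const contentVariants = {
  donor: 'Your gift at work',
  volunteer: 'Upcoming volunteer events',
  default: 'Welcome',
};

// The first matching rule wins; unmatched visitors get the default persona.
function resolvePersona(visitor) {
  const rule = personaRules.find((candidate) => candidate.matches(visitor));
  return rule ? rule.persona : 'default';
}

// Pick the variant for the persona and record which one was served,
// so its effectiveness can be measured later.
function deliver(visitor, log) {
  const persona = resolvePersona(visitor);
  const headline = contentVariants[persona];
  log.push({ type: 'variant_served', visitorId: visitor.id, persona, headline });
  return headline;
}

const personalizationLog = [];
const headline = deliver({ id: 'v1', visitedPaths: ['/', '/volunteer'] }, personalizationLog);
```

Note that the persona rules themselves depend on knowing what the visitor has done, which is exactly the data question the rest of this session is about.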

Drupal as the foundation for WEM

How does Drupal look as a basis for managing experiences?


  • Content delivery customization
    • Blocks, regions: UI, hooks; condition plugins (D8)
    • Context module: block placement, theme, etc.
    • Views plugins: contextual filters, query alters, etc.
    • Panels
  • Visitor segmentation
    • Taxonomy module
    • Decent work in contrib: browscap, IP Geolocation
    • Some integration with third-party real-time ID technologies
    • Field-able user entities

Analytics: the elephant in the Drupal room

The key to WEM is data, and Drupal is completely ignorant of it.


  • Without ongoing knowledge of a user's actions, Drupal is fundamentally handicapped in assigning a persona to a visitor
  • Accurate visitor segmentation is difficult without that data, and inaccurate segmentation yields low-quality personalization
  • Most importantly, without personalization metrics, administrators are fundamentally handicapped in improving visitor experience

The fact of the matter

If we want to do WEM, our application architecture must be data-first.


  • Analytic data collection and storage should be core to Drupal
  • Data collection and storage should be pluggable
  • Analytics should be a task targeted at site administrators, not developers
  • Just as responsive web design (RWD) has become a baseline expectation, modules that aren't data-minded should be thought of as "broken"
  • Analytics data should be fully integrated with all other Drupal systems

So how do we get there?

Scalability is hard

But we're in luck!


  • A path to app-layer scalability
    • Web services in core
    • Bootstrap re-architecture: the kernel
  • A path to DB-layer scalability
    • Proliferation of highly scalable databases
    • Consumerization of "big data" software as a service

Leveraging content entities

Imagine a Stat Entity where each instance is an analytic event that bundles configurable dimensions and metrics by event type.

  • The Entity API gives us pluggable storage for free
  • We also get bundle configuration UI for free
  • Most importantly, content entities already integrate seamlessly with existing subsystems like Views
  • Entity Query supports relationships and aggregation, and is storage-agnostic

Plugins for metric collection

Imagine Stat Data Plugins provided by core and contrib modules associated with custom fields on Stat bundles.

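use Drupal\Core\Plugin\ContainerFactoryPluginInterface;
use Symfony\Component\DependencyInjection\ContainerInterface;
use Symfony\Component\HttpFoundation\Request;

// StatDataPluginBase is the imagined base class; this sketch assumes it
// extends Drupal's PluginBase and keeps its constructor arguments.
class UserAgent extends StatDataPluginBase implements ContainerFactoryPluginInterface {
  /** @var \Symfony\Component\HttpFoundation\Request */
  protected $request;

  // Factory method: pulls the current request from the service container.
  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    return new static($configuration, $plugin_id, $plugin_definition, $container->get('request_stack')->getCurrentRequest());
  }

  public function __construct(array $configuration, $plugin_id, $plugin_definition, Request $request) {
    parent::__construct($configuration, $plugin_id, $plugin_definition);
    $this->request = $request;
  }

  // Returns the value to store on the Stat field: the raw User-Agent header.
  public function execute() {
    return $this->request->headers->get('user-agent', '');
  }
}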

Plugins for data management

Imagine Data Manager Plugins to perform regular tasks we commonly perform on similar data in Drupal today.

  • Data truncation (truncate by # of records, truncate by time)
  • Data aggregation (across configurable segments for configurable periods of time)
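
Both tasks are simple to state in code. A rough sketch, assuming records carry a millisecond `timestamp` plus whatever dimensions were collected (the function names and the `browser` field are illustrative only):

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Truncate by number of records: keep only the newest `maxRecords`.
function truncateByCount(records, maxRecords) {
  if (maxRecords <= 0) return [];
  return [...records]
    .sort((a, b) => a.timestamp - b.timestamp)
    .slice(-maxRecords);
}

// Truncate by time: drop records older than `maxAgeMs` relative to `now`.
function truncateByAge(records, maxAgeMs, now) {
  return records.filter((record) => now - record.timestamp <= maxAgeMs);
}

// Aggregate: count records per (period start, segment value) pair.
function aggregate(records, periodMs, segmentKey) {
  const counts = {};
  for (const record of records) {
    const periodStart = Math.floor(record.timestamp / periodMs) * periodMs;
    const key = `${periodStart}|${record[segmentKey]}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Hypothetical sample data spanning three days.
const statRecords = [
  { timestamp: 0 * DAY_MS + 1000, browser: 'IE' },
  { timestamp: 0 * DAY_MS + 2000, browser: 'IE' },
  { timestamp: 1 * DAY_MS + 1000, browser: 'Chrome' },
  { timestamp: 2 * DAY_MS + 1000, browser: 'IE' },
];

const newestTwo = truncateByCount(statRecords, 2);
const lastDay = truncateByAge(statRecords, DAY_MS, 2 * DAY_MS + 1000);
const dailyByBrowser = aggregate(statRecords, DAY_MS, 'browser');
```

Running aggregation before truncation is what lets a site keep long-term trends without keeping every raw event.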

Metric collection

So we've defined an entity and a way to populate fields...


  • How do we empower site builders to actually collect data?
  • Do we build a UI on top of Symfony events?
  • Do we pull Rules into core?
  • How do we handle events that are purely client-side?

Data analysis

So we have this giant pile of data. How do we understand it?


  • Views is a powerful way to query data, but...
  • Building a View is hard. We should work to make Views more intuitive for site builders
  • To support custom entity storage, we need to use Entity Query as Views' query service
  • We need to find a way to support Views relationships between entities across storage backends

Visualization

Tables of numbers and text are not useful, but visualization is hard.


  • If there were ever a time to pull in a visualization framework, this would be it
  • We should definitely not build our own viz framework
  • We should definitely define a light API so the framework we do pull in is swappable
  • Views integration is a no-brainer.
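
A "light API" here could be as small as a renderer registry: calling code asks for a chart, and whichever framework is plugged in draws it. The sketch below is purely illustrative; `registerRenderer`, `useRenderer`, `render`, and the chart spec shape are hypothetical names, and the text renderer stands in for an adapter around a real visualization library:

```javascript
// A light registry: site code calls `render`, never a charting library directly.
// `registerRenderer`, `useRenderer`, and `render` are hypothetical names.
const renderers = new Map();
let activeRenderer = null;

// Adapters register themselves under a name and must implement renderChart(spec).
function registerRenderer(name, renderer) {
  if (typeof renderer.renderChart !== 'function') {
    throw new TypeError(`Renderer "${name}" must implement renderChart(spec)`);
  }
  renderers.set(name, renderer);
}

// Swapping frameworks is a one-line configuration change.
function useRenderer(name) {
  if (!renderers.has(name)) throw new Error(`Unknown renderer: ${name}`);
  activeRenderer = renderers.get(name);
}

// The only function calling code needs to know about.
function render(spec) {
  if (!activeRenderer) throw new Error('No renderer selected');
  return activeRenderer.renderChart(spec);
}

// A stand-in renderer producing a text bar chart; a real adapter would wrap a viz library.
registerRenderer('text', {
  renderChart: ({ rows }) =>
    rows.map(({ label, value }) => `${label}: ${'#'.repeat(value)}`).join('\n'),
});
useRenderer('text');

const chart = render({
  type: 'bar',
  rows: [{ label: 'IE', value: 3 }, { label: 'Chrome', value: 1 }],
});
```

The chart spec is the real contract: as long as every adapter accepts the same spec, the framework underneath can change without touching the code that builds charts.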

TL;DR

  • Everyone needs analytics, but Drupal doesn't do it right now
  • If WEM is a thing we want to do, we need native analytics data
  • Even if not, it'd be nice to standardize analytics integrations
  • Much of the work that would go into a native analytics framework is already in core
  • The remaining problems to be solved are not completely daunting

Thanks!

Let's keep talking: