Keynote by David Campbell (Microsoft) on Big Data Challenges at VLDB 2011, Seattle

This post collects the insights I took away from the interesting keynote speech on Big Data challenges given by David Campbell (Microsoft) on August 31st, 2011 at VLDB 2011 in Seattle.

The challenge of big data is not interesting just because of the “big” per se: it is a multi-faceted concept, and all of its perspectives need to be considered.
The point is that this big data must be made available on small devices and with a shorter time-to-concept (or time-to-insight) than in the past.
We can no longer afford the traditional paradigm, in which the lifecycle is:

  • pose question
  • conceptual model
  • collect data
  • logical model
  • physical model
  • respond question

The question lifecycle can be summarized by the graph below:

However, the current lead time of this cycle is too long (weeks or months). The true challenge is that we have much more data than we can model: the bottleneck is becoming the modeling phase, as shown below:

The correct cycle to adopt is the sensemaking loop developed by Pirolli and Card in 2005 within the intelligence analysis community.
The notion is to have a frame that explains the data and, vice versa, data that supports the explanatory frame, in a continuous feedback and interdependent relationship (see the data-frame theory of sensemaking by Klein et al.).
So far this has been viable in modeled domains; big data expands it to unmodeled domains, which requires enabling automatic model generation.
The other challenge is to guarantee that the new paradigm encompasses traditional data applications, so that it gets the best of both traditional data and big data.
A few patterns have been identified for big data:

  • Digital shoebox: retain all the ambient data to enable sensemaking. This is motivated by the cost of data acquisition and data storage going toward zero. I simply augment the raw data with a sourceID and an instanceID and keep it for future usage or sensemaking (see the sketch after this list).
  • Information production: turn the acquired data from the digital shoebox into other events, states, and results, thus transforming raw data into information (still requiring subsequent processing). The results go back into the digital shoebox.
  • Model development: enable sensemaking directly over the digital shoebox, without extensive up-front modeling, so as to create knowledge. Simple visualizations often suffice for getting the big picture of a trend or a behaviour (e.g., home automation sensors can reveal the habits of a family).
  • Monitor, mine, manage: develop and use the generated models to perform active management or intervention. Models (or algorithms) are automatically generated so that they can be installed as a new system (think, e.g., of fraud detection).
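
To make the first two patterns concrete, here is a minimal Java sketch of a digital shoebox, assuming hypothetical names of my own (ShoeboxRecord, DigitalShoebox): raw data is only augmented with a sourceID and an instanceID, and information production derives higher-level events that go back into the shoebox.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "digital shoebox" pattern: raw data is
// retained untouched, only augmented with source and instance identifiers.
record ShoeboxRecord(String sourceId, String instanceId, Instant acquiredAt, String rawPayload) {}

class DigitalShoebox {
    private final List<ShoeboxRecord> records = new ArrayList<>();

    // Digital shoebox: retain everything, with no up-front modeling.
    void retain(String sourceId, String instanceId, String rawPayload) {
        records.add(new ShoeboxRecord(sourceId, instanceId, Instant.now(), rawPayload));
    }

    // Information production: derive higher-level events from the raw data;
    // the results go back into the shoebox for later processing.
    void produceInformation() {
        List<ShoeboxRecord> derived = new ArrayList<>();
        for (ShoeboxRecord r : records) {
            if (r.rawPayload().contains("motion")) { // trivial stand-in for a real detector
                derived.add(new ShoeboxRecord("event-producer", r.instanceId(),
                        Instant.now(), "PRESENCE_DETECTED"));
            }
        }
        records.addAll(derived);
    }

    public static void main(String[] args) {
        DigitalShoebox box = new DigitalShoebox();
        box.retain("home-sensor-7", "reading-42", "motion at front door");
        box.produceInformation(); // a PRESENCE_DETECTED event is now in the shoebox
    }
}
```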

I think that these patterns can actually be seen more as new development phases than as patterns. Their application can significantly shorten the time-to-insight and is independent of the size of the data source.
On the other hand, I think this paradigm applies more to sensor data than to big data in general (e.g., data sources on the web), but it still has huge potential for personal information management, social networking data, and also enterprise management.

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my twitter account (@MarcoBrambi).

Social BPM: motivation and impact on the BPM lifecycle

A lot of discussion is ongoing about the motivations and role of social BPM, in particular about how and when it should impact the classical BPM lifecycle.

In terms of motivations, I think the social extension of a business process can be regarded as a specific optimization phase, where the organization seeks efficiency by extending the reach of a business process to a broader class of stakeholders. This general objective breaks down into different optimization goals, which constitute the motivations of the process socialization effort:

  • Exploitation of weak ties and implicit knowledge: the goal is discovering and exploiting informal knowledge and relationships to improve activity execution.
  • Transparency: the goal is making the decision procedures internal to the process more visible to the affected stakeholders.
  • Participation: the goal is engaging a broader community to raise the awareness about, or the acceptance of, the process outcome.
  • Activity distribution: the goal is assigning an activity to a broader set of performers or finding appropriate contributors for its execution.
  • Decision distribution: the goal is eliciting opinions that contribute to the making of a decision.
  • Social feedback: the goal is acquiring feedback from a broader set of stakeholders, for process improvement.
  • Knowledge sharing: the goal is disseminating knowledge in order to improve task execution; at an extreme, this could entail fostering mutual support among users to avoid performing costly activities (e.g., technical support).

To attain these objectives, the social BPM features (or levels of adoption) must be incorporated into the business process lifecycle.
I tried to understand which phases of the classical BPM cycle the social BPM levels of adoption impact most, and I ended up with this simple mapping:

Social BPM levels mapped to the classical BPM cycle

While participatory design obviously impacts mostly the design and modeling phases, social enactment and participatory enactment apply to the execution phase. Finally, process mining involves some technical aspects at execution time (e.g., the logging of events) but then plays its role mainly in the monitoring and optimization phases.

I also wish to highlight that, once the model of the social process is consolidated, the deployment phase might also play an important role: it consists of the technical phase that produces the actual executable version of the social process enactment application. This task might be complicated by the need to interact at runtime with social software to support the social interactions required by the process model (in the case of social or participatory enactment). These platforms are available online and can be used as a service in the enactment of the process (e.g., LinkedIn for skill and people search, Doodle for decision distribution, etc.). However, the integration of the BPM runtime with the social services is a nontrivial task, complicated by the absence of an interoperability standard masking the technical details of the APIs of each platform.
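
In the absence of such a standard, one pragmatic option is a thin adapter layer that masks each platform behind a uniform interface. The following Java sketch is purely illustrative and assumes hypothetical types of my own (SocialServiceAdapter, DoodleAdapter); it does not correspond to any existing library or to the actual APIs of the platforms mentioned above.

```java
import java.util.List;

// Hypothetical uniform interface between the BPM runtime and social services.
interface SocialServiceAdapter {
    // Activity distribution: offer a task to a broader set of performers.
    void publishTask(String taskId, String description, List<String> candidateUsers);

    // Decision distribution / social feedback: collect opinions on a task.
    List<String> collectResponses(String taskId);
}

// One adapter per platform hides the platform-specific API calls.
class DoodleAdapter implements SocialServiceAdapter {
    public void publishTask(String taskId, String description, List<String> candidateUsers) {
        // here one would call the Doodle API to create a poll (details omitted)
    }

    public List<String> collectResponses(String taskId) {
        return List.of(); // here one would fetch the poll answers from Doodle
    }
}
```

With such a layer, the BPM runtime would only depend on the SocialServiceAdapter interface, while the technical details of each platform stay confined to its adapter.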

Facing this problem means supporting easy and quick deployment, which is a critical enabler for convincing management to embrace a social approach at limited cost. For that, one can rely on technical architectures and development tools that automate the generation of process enactment applications from Social BPMN process models.
In the paper we recently wrote for the BPMS2 workshop, we describe how to exploit the WebRatio architecture and tool for this purpose. WebRatio applies model transformations to:

  • first map Social BPMN models into WebML Domain-Specific Language (DSL) application models,
  • and then map the WebML models into Java components connected to social software APIs (see the sketch below).
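
The sketch below only illustrates the shape of this two-step transformation chain; the types (BpmnModel, WebMLModel, JavaComponent, ModelTransformation) are placeholders of mine and do not reflect the actual WebRatio or WebML APIs.

```java
// Hypothetical placeholder types: they do not correspond to real WebRatio APIs.
record BpmnModel(String socialBpmnXml) {}
record WebMLModel(String webmlSpec) {}
record JavaComponent(String source) {}

// A generic model-to-model (or model-to-code) transformation step.
interface ModelTransformation<S, T> {
    T transform(S source);
}

// Step 1: map Social BPMN models into WebML DSL application models.
class SocialBpmnToWebML implements ModelTransformation<BpmnModel, WebMLModel> {
    public WebMLModel transform(BpmnModel m) {
        // map BPMN tasks, gateways, and social annotations to WebML units
        return new WebMLModel("webml derived from: " + m.socialBpmnXml());
    }
}

// Step 2: map WebML models into Java components wired to social software APIs.
class WebMLToJava implements ModelTransformation<WebMLModel, JavaComponent> {
    public JavaComponent transform(WebMLModel m) {
        // emit Java source that invokes the social services at runtime
        return new JavaComponent("// generated from " + m.webmlSpec());
    }
}
```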

We will delve further into these issues in a research project called BPM4People, which has recently been funded by the European Commission within the FP7 Capacities programme for SMEs.

Do you see any other possible motivations or impacts of social BPM?

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my twitter account (@MarcoBrambi).