Scientific Foundations and Challenges


Challenge: Theory unification and predictive models

We need a theoretical framework that is rich and encompassing enough to provide practical guidance on how to design an online community. There is now an opportunity for a new approach that synthesizes psychological, organizational, economic, and social theory (among others), that can be tested against real digital traces of online communities, and that can be used to navigate the complex trade-offs involved in designing socio-technical systems that improve individual and community performance. The framework must be rich enough to support integrated models that (a) decompose macroscale phenomena into microscale mechanisms, (b) are relevant to understanding and designing online communities that evolve over months to years and encompass large numbers of people, and (c) accurately predict the effects and trade-offs of design decisions made at levels ranging from moment-by-moment user interaction to long-term social dynamics.

Whether the models take the form of agent-based simulations, dynamical systems, or some other approach, there is an opportunity to harden and integrate them into a new, unified theoretical framework. Broadly speaking, conditions are ripe for a burst of theory development that will lead toward more complex, nuanced, and integrated models of individual and social phenomena in online communities. Massive amounts of data from online communities are becoming available. These data trace phenomena at multiple time scales and can be used to test and validate models of the interplay among psychological mechanisms, behavioral economics, social relations, network dynamics, and more. Thousands of online communities varying in purpose and architecture provide a vast natural laboratory for testing and integrating such theories, and for understanding what makes those communities work or fail. These data are attracting interest from a variety of fields, and although there is interest in cross-fertilization, much work remains to develop a unified framework that can serve as the foundation for predicting and prescribing successful communities.

One can ask: Do we need theoretical integration? Human behavior is the result of a hierarchically organized set of systems, rooted in physics and biology at one end of the spectrum and extending to large-scale social and cultural phenomena at the other. The time scale on which each system level operates increases by roughly a factor of 10 as one moves up the hierarchy. This hierarchical organization produces layers of phenomena in which different mechanisms and factors dominate: neural, psychological, economic, and social, to name only the more familiar. Theories are needed to understand and predict how microscale factors at the level of the individual (such as changes in usability or communication costs) percolate upward to yield emergent macroscale phenomena at the social level (such as increased participation or improved social intelligence). Similarly, theories are needed to understand how macroscale social factors percolate downward to shape individual behavior.
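The factor-of-10 rule turns questions about how far a design change must percolate into simple arithmetic: two phenomena whose characteristic durations differ by a ratio R sit roughly log10(R) levels apart. The Python sketch below assumes an exact factor of 10 per level; the function name and the example durations are hypothetical and chosen only for illustration.

```python
import math


def levels_between(t_low_s: float, t_high_s: float, factor: float = 10.0) -> float:
    """Approximate number of hierarchy levels separating two characteristic
    time scales, assuming each level runs about `factor` times slower than
    the level below it."""
    return math.log(t_high_s / t_low_s, factor)


# Hypothetical characteristic durations in seconds, chosen only for illustration.
INTERACTION_S = 1.0              # a single click or keystroke-level action
MONTH_S = 30 * 24 * 3600.0       # community dynamics over about a month
YEAR_S = 365 * 24 * 3600.0       # community dynamics over about a year

print(f"interaction -> month: {levels_between(INTERACTION_S, MONTH_S):.1f} levels")
print(f"interaction -> year:  {levels_between(INTERACTION_S, YEAR_S):.1f} levels")
```

Under these assumptions, a change felt at the scale of a one-second interaction sits roughly six to seven levels below community dynamics that unfold over months to a year, which is one way to see why integrating theories across levels is hard.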
JCT: One of the interesting challenges in this regard is integrating theories across many levels of scale. This seems particularly problematic for human behavior (and HCI) because problems at a lower level can (but do not always) propagate up to higher levels. Example: someone designs a system to support electronic voting, but perceptual errors or timing issues in the hardware cause the higher-level system to fail. In other cases, lower-level details can be safely ignored because a robust design lets people work around them. Is there a principled way to predict when and how lower levels affect higher levels?
PP: Perhaps there are two issues, both important: (1) decomposition/emergence, the idea that phenomena at higher levels can, in principle, be decomposed into lower-level causal stories; and (2) relevance, the idea that models necessarily abstract away irrelevant details, so for any particular set of design, engineering, or scientific questions it is important to know what is relevant, what can be approximated, and what is truly crucial.

Challenge: Google Social Earth

Imagine a visualization of the structure and dynamics of one or more online communities that could reveal patterns at multiple levels of granularity, much like a “Google Earth” for the social world (naturally following the mantra “overview first, zoom and filter, then details-on-demand”). Building it would require sufficient modeling and measurement capabilities to drive such a visualization, so the visualization is really a concrete driver for the scientific challenge outlined above. It would also require dealing with severe privacy and security issues (see below).

There are several analogies for this challenge. In astronomy, for instance, there is the recent challenge of visualizing the entire Milky Way as seen from imagined vantage points outside the galaxy. Creating such a map turns out to be technically difficult, but the result would be compelling. Similarly, the Hubble Space Telescope and COBE were responses to the challenge of seeing as far back toward the beginning of the universe as possible. NSF’s many science-visualization projects can be seen as partial attempts to visualize online (scientific) communities.

Challenge: Models and metrics of online growth and sustainability

Virtual communities are typically valuable when they can harness the information and insight of a large and diverse user base. Although some virtual communities have been highly successful, many fail. Wikipedia’s success, for example, is the exception rather than the rule: of the more than 9,000 wikis using the MediaWiki platform, more than half have seven or fewer contributors. An important reason for these failures is a lack of evidence-based, scientific guidance for building and managing online communities. To succeed, virtual communities must overcome challenges that are endemic to many groups and organizations. They must handle the start-up paradox: early in their lifecycle they have few members to generate content and little content to attract members. Throughout their lifecycle, they must recruit and socialize newcomers, encourage commitment and contribution from members, solve problems of coordination, and encourage appropriate behavior among members and interlopers alike.
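One way to make the start-up paradox concrete, borrowed from population ecology rather than taken from any study cited here, is a growth model with an Allee (critical-mass) effect: below a threshold a the community shrinks, and above it the community grows toward a carrying capacity k. The sketch integrates dN/dt = rN(N/a - 1)(1 - N/k) with Euler steps; the function name and all parameter values are hypothetical, not fitted to MediaWiki or any other data.

```python
def simulate_community(n0: float, r: float = 0.1, a: float = 10.0,
                       k: float = 500.0, dt: float = 0.1, steps: int = 5000) -> float:
    """Euler-integrate dN/dt = r*N*(N/a - 1)*(1 - N/k) and return the final N.

    N is the number of active contributors, a is a hypothetical critical mass
    below which the community shrinks, and k is a hypothetical carrying capacity.
    """
    n = n0
    for _ in range(steps):
        growth = r * n * (n / a - 1.0) * (1.0 - n / k)
        n = max(n + dt * growth, 0.0)  # membership cannot go negative
    return n


for start in (5, 9, 11, 20):
    print(f"start={start:>3}  after 500 time units: {simulate_community(start):7.1f}")
```

Communities that start just below the hypothetical critical mass (5 or 9 contributors) decay toward zero, while those starting just above it (11 or 20) grow toward k, mirroring the few-members/little-content trap described above.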

Challenge: Trade-offs in growth and sustainability

Understanding and designing for online growth and sustainability will depend on integrating a variety of mid-level (sometimes competing) theories from the cognitive and social sciences, each of which partially explains contribution behavior and how group membership changes it. For instance, information overload theory (cognitive psychology), the collective effort model (social psychology), public goods theory (economics), and structural hole theory (social networks) each provide partial explanations that, if combined, would likely yield a richer understanding of the complex trade-offs involved in growing a vibrant online community.

Challenge: Online dynamics and stability conditions.

In what ways do models of the population dynamics of online communities differ from models of natural populations? Are online communities fundamentally chaotic or unstable? Many complex systems, such as biological and non-human social ecologies, have stable equilibria but can become unstable when perturbed by outside conditions.
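A natural-population baseline for the stability question is the logistic model dN/dt = rN(1 - N/k). The sketch below, with a hypothetical function name and parameter values, grows a community to its carrying capacity, removes half its members in a single step to mimic an outside perturbation, and tracks the recovery.

```python
from typing import List


def logistic_with_shock(n0: float = 50.0, r: float = 0.5, k: float = 1000.0,
                        dt: float = 0.1, steps: int = 2000,
                        shock_step: int = 1000, shock_fraction: float = 0.5) -> List[float]:
    """Euler-integrate logistic growth dN/dt = r*N*(1 - N/k) and, at `shock_step`,
    remove `shock_fraction` of the population to mimic an outside perturbation."""
    n = n0
    history = []
    for step in range(steps):
        if step == shock_step:
            n *= 1.0 - shock_fraction  # sudden exogenous loss of members
        n += dt * r * n * (1.0 - n / k)
        history.append(n)
    return history


trajectory = logistic_with_shock()
print(f"before shock: {trajectory[999]:.1f}, "
      f"just after: {trajectory[1000]:.1f}, end: {trajectory[-1]:.1f}")
```

Linearizing gives f'(N) = r(1 - 2N/k), so f'(k) = -r < 0: the equilibrium at k is stable and the shock decays. Whether online communities share this property, or instead have thresholds or chaotic regimes that such a simple model rules out, is exactly the open question.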

Challenge: How and why do the structure, mechanisms, and dynamics of an online community impact aggregate social intelligence and instrumental action?

What does it mean to achieve social intelligence? A simple but useful definition of individual intelligence (due to Newell) is the effective and efficient marshaling of available knowledge to act in pursuit of some purpose. Collections of people, however, do not operate the way individual brains do. Individual cognition tends to be driven by fewer and more coherent goals, and the communication, flow, and integration of information are fast compared with action. Simple sociotechnical architectures such as prediction markets are impressive in often surpassing individual expertise, in part because they have a simple, well-defined goal, participants are motivated to pursue that goal rationally (usually through some bet), and the nuggets of unique individual knowledge are efficiently communicated and aggregated within the system. Most online communities have multiple, less coherent purposes and more diverse (sometimes contradictory) motivations, and hence a greater need for complex mechanisms for marshaling and using information.
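To make the aggregation step concrete, here is a sketch of one widely used prediction-market mechanism, Hanson's logarithmic market scoring rule (LMSR), in which an automated market maker prices n mutually exclusive outcomes with the cost function C(q) = b log(sum_i exp(q_i / b)). The class name and the liquidity value b = 100 are illustrative assumptions; the source does not specify any particular market design.

```python
import math
from typing import List


class LMSRMarket:
    """Logarithmic market scoring rule market maker for mutually exclusive outcomes."""

    def __init__(self, n_outcomes: int, b: float = 100.0):
        self.b = b                    # liquidity: larger b means prices move less per share
        self.q = [0.0] * n_outcomes   # outstanding shares per outcome

    def _cost(self, q: List[float]) -> float:
        # C(q) = b * log(sum_i exp(q_i / b)), computed with the log-sum-exp trick
        m = max(x / self.b for x in q)
        return self.b * (m + math.log(sum(math.exp(x / self.b - m) for x in q)))

    def prices(self) -> List[float]:
        # p_i = exp(q_i / b) / sum_j exp(q_j / b); prices sum to 1
        m = max(x / self.b for x in self.q)
        weights = [math.exp(x / self.b - m) for x in self.q]
        total = sum(weights)
        return [w / total for w in weights]

    def buy(self, outcome: int, shares: float) -> float:
        """Buy `shares` of `outcome`; return what the trader pays, C(q') - C(q)."""
        new_q = list(self.q)
        new_q[outcome] += shares
        paid = self._cost(new_q) - self._cost(self.q)
        self.q = new_q
        return paid


market = LMSRMarket(n_outcomes=2, b=100.0)
paid = market.buy(0, 50.0)
print(f"paid {paid:.2f}; new prices {[round(p, 3) for p in market.prices()]}")
```

A trader who believes outcome 0 is underpriced buys shares and pays C(q') - C(q); the price of outcome 0 then rises, so the private knowledge behind the trade is folded into a single market probability.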

Challenge: What are the underlying mechanics that determine how social embedding and connections affect individual intelligence, preferences, biases, and ultimately behavior?

Individuals are situated in multiple tributaries of information that reach them through social relations. The social capital that accrues to individuals is affected by their position in social networks. Online communities, social networking sites, lifestreams, email, and similar media have altered our social networks. Because of the richness of the available data, we may now be in a position to resolve some current mysteries about social network phenomena. First and foremost are the underlying causal mechanics, from individual and person-to-person interactions on up, that give rise to well-known social network phenomena such as idea contagion; the network spread of obesity, smoking, and happiness; the effects of social brokerage on innovation; the effects of network closure on trust and reputation; and so on. Just as the Internet is an abstraction implemented by the mechanics of different protocol layers (application, transport, network, link) that are ultimately realized in computers, routers, cables, and wireless links, so too are social networks a theoretical abstraction realized by the social and cognitive mechanisms of people. How do unhealthy behaviors actually move from person to person? How do the signals and stories about others in a community that arrive over an interface get processed by the individual mind to form a judgment of reputation or trust? How does participation in diverse communities lead the individual mind to synthesize information and alter its judgment biases? We know that such phenomena exist, but we do not know how or why they work the way they do.
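One candidate causal mechanic for how behaviors move through a network is a threshold rule: a person adopts once enough of their contacts have. The sketch below implements a deterministic linear-threshold model; it is a standard modeling assumption offered for illustration, not an established account of how obesity, smoking, or happiness actually spread, and the function name, graph, and threshold values are hypothetical.

```python
from typing import Dict, Set


def threshold_cascade(graph: Dict[str, Set[str]], seeds: Set[str], theta: float) -> Set[str]:
    """Deterministic linear-threshold contagion on an undirected graph.

    A node adopts once at least a fraction `theta` of its neighbors have
    adopted; adoption is never reversed. Returns the final set of adopters.
    """
    adopted = set(seeds)
    changed = True
    while changed:  # sweep until no new node crosses its threshold
        changed = False
        for node, neighbors in graph.items():
            if node in adopted or not neighbors:
                continue
            if len(neighbors & adopted) / len(neighbors) >= theta:
                adopted.add(node)
                changed = True
    return adopted


# Hypothetical network: two tight triangles joined by the bridge c-d.
network = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(sorted(threshold_cascade(network, {"a", "b"}, theta=0.5)))
print(sorted(threshold_cascade(network, {"a", "b"}, theta=0.3)))
```

With theta = 0.5 adoption stalls at the bridge node d, which sees only one adopting neighbor out of three; with theta = 0.3 it crosses the bridge and reaches everyone. Small changes in the micro-level rule change the macro-level outcome, which is why the mechanism matters.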

Challenge: How does technology affect Dunbar’s Number?

This question is meant to capture a richer set of questions about the impact of sociotechnical architectures on how individuals engage their social relations. Dunbar’s Number derives from the relation, across primates, between mean social group size and the proportion of brain volume devoted to the neocortex. For humans, this number is approximately 150. There has been media interest lately in whether this constraint holds in online social networking, implicitly questioning whether social networking sites actually alter the quantity and quality of bilateral social relations a person can sustain. People engage in a great variety of longer-term bilateral social relations (the distinction between “weak” and “strong” ties being only the coarsest), and have only finite resources to seek out, maintain, synthesize, and use these connections. When people trade face-to-face time for time on Twitter and Facebook, what do they gain and what do they lose? How does the distribution of a person’s social relations shift under different sociotechnical architectures for online communities? How do user profiles, social network updates, RSS feeds, and the like enhance or hinder an individual’s ability to maintain and exploit mental models of their social world?

Challenge: How do incentive mechanisms drive online communities? How do norms and governance evolve? How does social transparency and provenance shape participation and production?

There is a sense among many researchers that participation and production in online communities resemble markets, and that online communities can be shaped by structuring incentive mechanisms. But online communities, peer-production systems, and the like capture our imagination precisely because people are not driven by monetary reward (at least not in any direct way). Social motivations, such as the desire for attention, reputation, and credibility, have been vastly understudied. How do norms and governance emerge and evolve? Why do they vary? Why and how do they adapt (or fail to adapt) to perturbations? How do sociotechnical mechanisms for social transparency affect reputation and trust, and ultimately shape communal participation and action?
Related to these issues are the mechanisms that lead to optimally structured social networks that balance diversity (e.g., bridging relations across diverse groups) and coherence (e.g., redundant relations within a group). The former is believed to foster learning, creativity, and improved decision making, whereas the latter is believed to foster reputation, trust, and improved mobilization to action. Communities and groups often grow beyond their optimal size (in terms of rewards to individual members), which frequently leads members to develop mechanisms to exclude newcomers. The challenge is to understand the incentive mechanisms, governance, norms, and social signaling that drive online communities toward particular structures and dynamics.
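The diversity/coherence trade-off can be measured on an individual's ego network. The sketch computes two standard measures on an undirected, unweighted graph: Burt's effective size in its simplified binary form, n - 2t/n (high for brokers whose contacts are not connected to each other), and the density of ties among the contacts, i.e. the local clustering coefficient (high under network closure). The function name and example graphs are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def ego_metrics(graph: Dict[str, Set[str]], ego: str) -> Tuple[float, float]:
    """Return (effective_size, closure) for `ego` in an undirected, unweighted graph.

    effective_size: n - 2t/n, where n is the number of alters and t the number
        of ties among them; high values indicate non-redundant, bridging contacts.
    closure: t / (n(n-1)/2), the density of ties among the alters (local
        clustering coefficient); high values indicate a redundant neighborhood.
    """
    alters = graph[ego]
    n = len(alters)
    t = sum(1 for u, v in combinations(sorted(alters), 2) if v in graph[u])
    effective_size = n - 2 * t / n if n else 0.0
    closure = t / (n * (n - 1) / 2) if n > 1 else 0.0
    return effective_size, closure


# Hypothetical ego networks: a broker with unconnected contacts, and a clique.
broker = {"ego": {"a", "b", "c", "d"}, "a": {"ego"}, "b": {"ego"}, "c": {"ego"}, "d": {"ego"}}
members = ["ego", "a", "b", "c", "d"]
clique = {m: set(members) - {m} for m in members}
print(ego_metrics(broker, "ego"))
print(ego_metrics(clique, "ego"))
```

The broker scores (4.0, 0.0): four non-redundant contacts and no closure. The ego inside the clique scores (1.0, 1.0): its four contacts are fully redundant. Real communities fall between these extremes.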

Challenge: Security vs freedom.

On the one hand, in the ideal social networking universe, anyone can connect with anyone else. In the ideal secure universe, on the other hand, there are safeguards against exactly that. For instance, scores of online communities in the US Army have arisen as grass-roots movements to exchange information and offer support. These communities initially thrive because they have few barriers to entry. However, security rapidly becomes an issue because much of the information is quite sensitive (e.g., information about day-to-day tactics is useful to the community but dangerous in the hands of the enemy).

Challenge: Development of new protocols, instruments, tools to study online communities

Although massive amounts of data about online communities have become available, we need methods capable of providing traces at the granularity and richness relevant to developing theory. An analogy is the effect that verbal protocol analysis had on the development of cognitive psychology: detailed traces of cognitive states and dynamics made it feasible to develop and test more complex and integrated models.
In addition to providing more detail, the protocols we develop need to reveal intentionality. Online data logs may provide detailed records of behavior, but it is important to understand the goals, beliefs, perceptions, etc. that drive that behavior.

One set of methods that has provided some early results, and could provide additional footholds in understanding online communities, is the application of complex systems methods such as network analysis and agent-based modeling. By embodying theories of individual behavior in computational agents and then allowing those agents to interact within the space of a social network, it is possible to create testable models of social participation in online communities. That said, large challenges remain in this area. For instance, describing and characterizing dynamic networks is an open research question. Many static network metrics exist, but they offer little insight into understanding and generalizing over networks that change over time as users and relationships are added and deleted. Moreover, developing novel methods for building, analyzing, and using large-scale models of behavior in which individuals are represented one-to-one, and validating these models against real-world online communities, is also a necessary step toward understanding large-scale social participation.
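As a minimal illustration of the gap between static and dynamic network description, the sketch below replays a stream of timestamped edge additions and removals and recomputes a static metric (density, here) at chosen checkpoints. The event format, the function name, and the choice to count only nodes that currently have at least one edge are assumptions for illustration; real dynamic-network characterization remains the open problem described above.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Event = Tuple[int, str, str, str]  # hypothetical format: (time, "add" | "remove", u, v)


def snapshot_metrics(events: Iterable[Event],
                     checkpoints: List[int]) -> List[Tuple[int, int, int, float]]:
    """Replay edge events in time order and return (time, nodes, edges, density)
    at each checkpoint. Events sharing a timestamp are applied in sorted order."""
    edges: Set[FrozenSet[str]] = set()
    pending = sorted(events)
    results = []
    i = 0
    for cp in sorted(checkpoints):
        while i < len(pending) and pending[i][0] <= cp:
            _, op, u, v = pending[i]
            edge = frozenset((u, v))
            if op == "add":
                edges.add(edge)
            elif op == "remove":
                edges.discard(edge)
            i += 1
        nodes = {x for e in edges for x in e}  # only nodes with a current tie
        n, m = len(nodes), len(edges)
        density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
        results.append((cp, n, m, density))
    return results


stream = [(1, "add", "a", "b"), (2, "add", "b", "c"), (3, "add", "a", "c"),
          (5, "remove", "a", "b"), (6, "add", "c", "d")]
print(snapshot_metrics(stream, [3, 6]))
```

Each snapshot is summarized correctly, but nothing in the output says which ties were dropped or which replaced them; that is the kind of temporal structure static metrics miss.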

Challenge: Availability and shareability of data and tools.

A major problem facing researchers today is the availability of data about online communities. Currently, private companies such as Yahoo, Google, and Facebook provide data to privileged researchers, but this inhibits replication and extension and, more generally, the building of a community of practice among researchers. Private companies and members of online communities are extremely sensitive about data privacy and security. There is no simple solution (e.g., anonymization is non-trivial), which makes this a grand challenge.
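A small example of why anonymization is non-trivial: replacing names with pseudonyms leaves the network structure intact, and an attacker who knows how many contacts a target has can single out any node whose degree is unique. The sketch checks a release against the k-degree anonymity criterion from the graph-anonymization literature; the function names and the example graph are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def degree_anonymity_k(graph: Dict[str, Set[str]]) -> int:
    """Largest k for which the graph is k-degree-anonymous: the size of the
    smallest group of nodes sharing a degree value (0 for an empty graph)."""
    counts = Counter(len(nbrs) for nbrs in graph.values())
    return min(counts.values()) if counts else 0


def reidentifiable_by_degree(graph: Dict[str, Set[str]]) -> Set[str]:
    """Pseudonymous nodes whose degree is unique, and therefore identifiable by
    anyone who knows how many contacts the target has."""
    counts = Counter(len(nbrs) for nbrs in graph.values())
    return {node for node, nbrs in graph.items() if counts[len(nbrs)] == 1}


# Hypothetical pseudonymized release: names replaced by opaque IDs.
released = {
    "u1": {"u2", "u3", "u4"},
    "u2": {"u1"},
    "u3": {"u1", "u4"},
    "u4": {"u1", "u3"},
}
print(degree_anonymity_k(released))
print(sorted(reidentifiable_by_degree(released)))
```

Here k = 1, and the nodes with unique degrees (u1 and u2) can be re-identified from degree knowledge alone, even though no names were released.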

Related to this is the challenge of developing a shared set of tools and techniques for research. Individual disciplines have their signature techniques, ranging from survey techniques, to agent-based simulations, to dynamical systems of nonlinear equations. A near-term challenge could involve the development of workshops for researcher training and the development of curricula for higher education.

Challenge: Help for Theory of Mind.

Humans typically have the ability to develop a "theory of mind"; that is, to imagine what the world is "like" from another person's viewpoint. In fact, we can even imagine what person A thinks about person B's perspective on person C. However, this ability seems limited in several important ways. First, we cannot seem to extend this process indefinitely (whereas causal chains seem to be indefinitely extendable). [It may be related to the difficulty of understanding center-embedded sentences.] Second, while we are generally capable of doing this with respect to physical perspective and states of knowledge, it seems more difficult with respect to people who have different belief systems (e.g., different religious beliefs or political ideologies). Third, some people (young children and people on the autism spectrum, for instance) have difficulty with theory-of-mind tasks. Fourth, even when people have the capacity to use theory of mind, they often fail to do so (competence vs. performance). The question is this: Is it possible to design tools that help extend human capacities to develop and use theory of mind?