We now consider some of these issues more formally. Consider a licensor (an individual, group of individuals, or a corporation) choosing among K different licenses, k=1,...,K. License k confers expected payoffs or utilities $U_L^k$ and $U_C^k$ for the licensor and the (representative member of the) developer community, respectively.
For example, these payoffs may take the following form (for i equal to L or C):
$$U_i^k = B_i^k + F_i^k,$$
where
$B_i^k$ is the sum of the signalling benefit (peer recognition, career concerns) and the potential benefit of being able to tailor the code for one's specific usage ($B_i^k$ may also include the pleasure of working in a type-k open source environment), and
$F_i^k$ is the expected commercial incentive. For an individual, this would include the option of providing services based on the open source project, perhaps through a start-up. For the corporate licensor, this would include the option of privatizing the code later on, increased sales of complementary proprietary software due to the development of the open source project, or a reduction in the mark-up on other commercial software due to competitive pressure from the open source program.
Letting $U_C^0$ denote the opportunity cost of participating in the open source project of the (representative member of the) community,17 then the choice of license is governed by a constrained optimization. All else being equal, the licensor would like to choose her preferred license, but must satisfy the open source community's participation constraint (CPC):
$$\max_k \; U_L^k \quad \text{subject to} \quad U_C^k \ge U_C^0.$$
Furthermore, the resulting choice must satisfy the “licensor's participation constraint” (LPC). Let $U_L^0$ denote the payoff to the licensor of keeping the code private rather than releasing it under an open source license (the licensor may undertake the project as a proprietary project, or may just work on alternative projects). Then the chosen license k must satisfy the constraint:
$$U_L^k \ge U_L^0.$$
Let us for simplicity assume that there are only two types of licenses, restrictive (R) and permissive (P), and make the following assumption:
If $U_L^R \ge U_L^P$, then a fortiori $U_C^R \ge U_C^P$.
The motivation for the assumption that the project leadership (the licensor) is relatively more likely to benefit from a permissive license is that the ability to demonstrate talent to one's peers and/or to the labor market is not much affected by the choice of license. But commercial benefits, which are probably larger under a permissive license, are likely to flow disproportionately to the project leaders: as Lerner and Tirole document, there are numerous examples where project leaders have parlayed participation in these projects into such opportunities. Put another way, because the leadership of the project is likely to benefit more than the community from a permissive license, if a restrictive license is better for the leadership, then one can assume that such a license will also be better for the community. Note that this assumption says nothing about absolute preferences. The leadership and the community may both prefer the restrictive license or both prefer the permissive license.18
Ignoring for the moment the licensor's participation constraint, we can distinguish two cases:
Strong community appeal: The community will participate even if the licensor opts for the permissive license (that is, the community's participation constraint is satisfied under the permissive license). In this case, the licensor chooses between the restrictive and permissive licenses in an unconstrained fashion.
Fragile community appeal: The licensor will not obtain participation if she opts for the permissive license (that is, the community's participation constraint is violated under the permissive license). Thus, the licensor must opt for the restrictive license, whether or not the latter is her unconstrained preferred choice.
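The two cases can be made concrete with a small numerical sketch. The function below is only an illustration of the constrained choice described above: the names `choose_license`, `u_licensor`, `u_community`, `u_c0`, and `u_l0`, as well as all payoff values, are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional


def choose_license(
    u_licensor: Dict[str, float],
    u_community: Dict[str, float],
    u_c0: float,
    u_l0: Optional[float] = None,
) -> Optional[str]:
    """Pick the licensor's preferred license among those the community accepts.

    u_licensor / u_community: payoff of each license to each party (made-up values).
    u_c0: the community's opportunity cost of participating.
    u_l0: the licensor's payoff from keeping the code private (None = ignore it).
    Returns None when no license satisfies the constraints.
    """
    # Community participation constraint: keep licenses worth joining.
    feasible = [k for k in u_licensor if u_community[k] >= u_c0]
    if not feasible:
        return None
    # Among feasible licenses, the licensor takes her preferred one.
    best = max(feasible, key=lambda k: u_licensor[k])
    # Licensor participation constraint: releasing must beat staying private.
    if u_l0 is not None and u_licensor[best] < u_l0:
        return None
    return best


# Hypothetical payoffs: the licensor prefers P, the community prefers R.
licensor = {"R": 2.0, "P": 3.0}
community = {"R": 2.5, "P": 1.5}

strong = choose_license(licensor, community, u_c0=1.0)   # both licenses feasible
fragile = choose_license(licensor, community, u_c0=2.0)  # only R is feasible
private = choose_license(licensor, community, u_c0=2.0, u_l0=2.5)
```

With a low opportunity cost for the community, the licensor's unconstrained preference prevails. With a higher one, only the restrictive license attracts participation. In the last call the restrictive license is worth less to the licensor than keeping the code private, so no release takes place.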
To illustrate the impact of community participation on licensing choice, suppose that the community of developers expects no financial reward from being able to commercialize complementary proprietary software or support (that is, its expected commercial incentive is zero). Suppose further that their benefits from ego gratification, career concerns, and open source interactions satisfy the condition that:
This setting is depicted in Figure 1. For convenience, the figure normalizes the licensor's benefits to be equal to those of the community, although in practice they may differ (e.g., the leadership may get greater benefits). The line S0 represents an indifference curve, denoting combinations of benefits that provide equal levels of satisfaction to the licensor. If the choice is between points 1 and 2, the leadership prefers the restrictive license: the higher financial prospects from a permissive license are not sufficiently large to make it appealing to the licensor. The restrictive license is then a Pareto choice to the extent that both parties prefer it. By way of contrast, if the choice were between points 2 and 3, the licensor would prefer the permissive license, but instead chooses the restrictive one so as to enlist other programmers.
One application of this framework concerns projects with unsophisticated end users as the intended audience, such as desktop applications and games. It is plausible to regard these as part of the “fragile community appeal” category:
Ego gratification and career concerns incentives do not have much power, as the audience mostly does not look at the code and is not composed of the programmers’ peers.
The benefits from tailoring the code for particular applications are weak.
By way of contrast, code aimed at developers, and to a lesser extent, system administrators, is more likely to belong to the “strong community appeal” category. This reasoning suggests that code aimed at developers is more likely to be licensed under a permissive license than code oriented towards unsophisticated end users.19
One interesting question relates to the licenses chosen by corporations when they release code. It might be thought at first glance that corporations would universally employ permissive licenses, since they wish to retain the right to commercially exploit discoveries. But we should actually not be surprised if we see firms choosing more restrictive licenses. The very fact that the licensor could keep the code private (the licensor's participation constraint) implies that the act of releasing the code creates a “truncation effect.” Corporations release some pieces of code because their chance of winning the commercial battle against a rival has become small (say, due to technological differences or network externalities). As a result, the corporation prefers gambling on a different strategy, such as selling consulting services or licensing the code on a case-by-case basis to customers that for some reason cannot make use of the product if subject to a restrictive license.20 Such code may therefore belong to the fragile community appeal category, and thereby be more likely to receive a restrictive license.
A related, but different, point is that the choice of initiating an open source project may be subject to an adverse selection problem if the community is less well informed than the licensor. The community may be suspicious about the project’s prospects (the licensor may have released the code because the commercial prospects were low) or about the licensor's intent (such as its commitment to the project rather than to commercially adjacent segments and the possibility that the firm will reprivatize the project). This latter concern about reprivatization may induce the licensor to choose a restrictive license in order to “prove” her good faith.21
4. Constructing the Sample
The dataset consisted of all software development projects listed on (and for a subset of the analyses, hosted on) SourceForge.net. SourceForge is a free service that since 1999 has offered hosting and project administration tools to software development projects. The site’s operations have been funded since its inception by VA Software (formerly known as VA Linux), which at the time of the site’s creation was primarily selling computer systems optimized for Linux. Today, VA Software has abandoned the hardware business, and intends to ultimately earn a profit by selling a version of the SourceForge service to corporations to manage the development of software for internal (proprietary) applications.
SourceForge contained (as of May 2002, when the data were accessed) approximately 39 thousand projects. Essentially, it accepts listings of (and is willing to host) all projects that conform to the Open Source Definition discussed above, as well as selected projects operating under licenses that are not compliant with that definition.22 Not all open source projects, however, are hosted on SourceForge. Many of the largest projects instead have their own web sites. Other projects are hosted at competing sites. These sites tend, however, to be much smaller: Savannah, often referred to as SourceForge’s leading competitor, had 790 active projects in May 2002.23 Even when projects are hosted elsewhere, they are in many cases still listed on SourceForge (the reader is simply encouraged to go elsewhere to make a code contribution or report a bug). In cases where a project was listed on SourceForge but hosted elsewhere, we are able to gather the basic data about the project, even if we cannot determine the extent of activity in the project.
We accessed the data in two forms:
The basic data about each project was downloaded from the SourceForge web site. This information included the stage of development of the project, the environment in which the project operated (e.g., Windows-based systems, handheld devices, Internet applications), the type of license employed, the human language in which the programmers operated, the operating system under which the program ran, and the intended audience. Since project leaders report these data to SourceForge, a natural question relates to their accuracy. An important point to note, though, is that the project leaders are trying to recruit users to make an extended time commitment to their project. Undertaking a “bait-and-switch” strategy at the time of recruiting new users—e.g., by making the project appear to be something other than what it really is—is unlikely to be a positive signal to prospective developers. Only in approximately 40% of the cases, however, was the full information on the project (and especially the license type) available. This reflected the fact that project leaders did not always complete this information at the time the project was established on SourceForge.
We obtained directly from the SourceForge staff various supplemental measures, including the date at which the project was first posted on SourceForge and the activity at the web sites (e.g., bug reports submitted and resolved) since the inception of the site in 1999. The latter data were available only for approximately 10 thousand projects. In the other instances, the projects could have attracted no activity whatsoever, or else the activity was concentrated on another site.24
The two datasets were then merged. The set of projects in the SourceForge database is summarized in the final columns of Tables 1 and 2. In each case, we indicate the distribution for all projects and for the subset of projects where the site has had active postings from SourceForge users.
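One way to picture the merge is as a left join of the downloaded project descriptors with the staff-supplied activity measures, keyed on a project identifier. The text does not describe the mechanics actually used, so the sketch below is an assumption throughout: the function `merge_projects`, the field names, and the three toy records are invented for illustration.

```python
from typing import Dict, List


def merge_projects(
    descriptors: Dict[str, dict], activity: Dict[str, dict]
) -> List[dict]:
    """Left-join activity measures onto project descriptors by project id.

    Every described project is kept; projects without activity records are
    flagged rather than dropped, so both tabulations can be built later.
    """
    merged = []
    for project_id, fields in descriptors.items():
        row = {"project_id": project_id, **fields}
        extra = activity.get(project_id)
        row["has_activity"] = extra is not None
        if extra is not None:
            row.update(extra)
        merged.append(row)
    return merged


# Toy records with hypothetical field names and values.
descriptors = {
    "p1": {"license": "GPL", "audience": "Developers"},
    "p2": {"license": "BSD", "audience": "End Users"},
    "p3": {"license": "LGPL", "audience": "System Administrators"},
}
activity = {
    "p1": {"bugs_submitted": 12, "bugs_resolved": 9},
    "p3": {"bugs_submitted": 3, "bugs_resolved": 3},
}

rows = merge_projects(descriptors, activity)
active_rows = [r for r in rows if r["has_activity"]]
```

Keeping the unmatched projects and flagging them, rather than discarding them, corresponds to reporting each distribution both for the full set and for the subset with activity on the site.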
Several patterns are evident from these tabulations. First, the dominant role of the General Public License is clear. Fully 72% of the licenses are the GPL, and its less constraining cousin, the Lesser GPL, represents another 10%. The BSD license, which represents 7% of the sample, is third.25 Second, the sample is dominated by early-stage projects. This dominance is somewhat less pronounced in the tabulation of projects with contributions: not surprisingly, the youngest projects have garnered the fewest contributions to date. Third, the sample is dominated by projects in English, oriented to end-users and developers, and geared to two families of operating systems (the POSIX family—which includes Linux, BSD, and Sun’s Solaris—and Microsoft) or else independent of any operating system.
A natural concern is the extent to which the measures are collinear: in other words, the extent to which the characteristics of the projects are highly correlated with each other. Table 3 provides an illustrative cross-tabulation of project topic and intended audience. To be sure, there is some clustering: for instance, projects geared towards system administrators disproportionately involve security and systems tools (where they presumably derive greater private benefits from tailoring the projects to their needs). But certainly, a considerable degree of diversity exists in this and the other comparisons.
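A cross-tabulation of this kind can be sketched as a count of (topic, audience) pairs, followed by the share of each topic within an audience. The helper names `cross_tab` and `column_shares` and the toy records are hypothetical, and the sketch assumes each project reports a single topic and a single audience, which need not hold in the actual data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def cross_tab(projects: Iterable[dict], row_key: str, col_key: str) -> Counter:
    """Count projects for each (row value, column value) pair."""
    counts: Counter = Counter()
    for project in projects:
        counts[(project[row_key], project[col_key])] += 1
    return counts


def column_shares(counts: Counter) -> Dict[Tuple[str, str], float]:
    """Convert counts to shares within each column (here, each audience)."""
    totals: Counter = Counter()
    for (_, col), n in counts.items():
        totals[col] += n
    return {(row, col): n / totals[col] for (row, col), n in counts.items()}


# Toy records, invented for illustration only.
projects = [
    {"topic": "Security", "audience": "System Administrators"},
    {"topic": "Security", "audience": "System Administrators"},
    {"topic": "Internet", "audience": "System Administrators"},
    {"topic": "Security", "audience": "Developers"},
    {"topic": "Games", "audience": "End Users"},
]

counts = cross_tab(projects, "topic", "audience")
shares = column_shares(counts)
```

A share close to one in a single cell of a column would indicate that the two characteristics carry nearly the same information. Shares spread across several cells correspond to the diversity noted above.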