Forking is a Feature

September 10, 2010

While Linus Torvalds is best known as the creator of Linux, it's one of his more geeky creations, and the social implications of its design, that may well end up being his greatest legacy. Because Linus has, in just a few short years, changed the social dynamic around forking, turning the idea of multiple versions of a work from a cultural weakness into a cultural strength. Perhaps the technologies that let us easily collaborate together online have finally matured enough to let our work reflect the reality that some problems are better solved with lots of different efforts instead of one committee-built compromise.

GitHub's map of code forks

The idea of "open source" in the technology world is really a set of cultural beliefs, despite usually masquerading as technical or legal choices made by a community. All cultures have norms, and standards of behavior, and most importantly, they have behaviors they consider antisocial or destructive.

For most of the first three decades of open source's ascendance, the most destructive action that one could threaten to do, the nuclear option, was to fork.

Tuning

There are several related technical concepts that can answer to the name "fork", but the one I reference here is the dramatic moment when a software project undergoes a schism on ideological or technical grounds. Instead of merely taking their ball and going home, those who forked were taking a copy of your ball and going to a new playground. And while splitting a community could obviously cause an open source community's momentum to grind to a halt, even the mere threat of a fork could cause significant problems, by revealing conflicting goals or desires or motivations within a previously-united community.

Still, forks have had important consequences. Firefox (earlier Firebird and Phoenix) was originally a fork of Mozilla, the open source browser that had been mired in indecision for half a decade. WordPress was born as a fork of B2, a neglected early blogging tool.

Outside of tech, forks have an even bigger meaning. You could make a pretty strong argument that the Reformation was a fork, or that Christianity itself is a fork. So clearly, forking a community can have a significant, even profound impact. But in tech, it had largely been seen solely as a violent, schismatic action.

Part of the predicate for forks being so disruptive was the idea that there is One True Version — a creation, like a piece of software, a written work, or anything else, that can only be accurately represented by a single ideal expression. Even some of the most disruptive technological innovations like Wikipedia are still built around the idea of achieving consensus on a definitive work, and striving mightily to avoid forks arising within the community.

Until a git named Linus changed that.

In The Road

You know Linus Torvalds, he's the guy who created (and nearly eponymized) Linux. But perhaps his most impressive act of creating technological culture is in fathering Git, the enormously popular distributed revision control system. That mouthful of a description basically means "system that lets a decentralized group of creators efficiently collaborate on a complicated bit of software". Other systems had enabled distributed revision control before, making it easier to rapidly evolve software, and to appropriately assign blame to whoever had introduced a bug into the program, but few had found any notable degree of popularity.

Forking on GitHub

But Linus' pedigree, influence and outstanding implementation immediately put Git at the forefront of choices to solve this class of problem. Even better, his credibility with the new generation of social software creators inspired the rapid launch and brilliant evolution of GitHub, a social network for developers that relies on the technology of Git as its underpinnings, but has also embraced the philosophy of Git as its fundamental interaction model.

Often, the very first thing a coder does when she sees an interesting new project on GitHub is to make a fork of it and start tinkering. That's only one of the reasons that GitHub is so important, though; the GitHub principle of "see it, make your own version, and then get to work" has started to filter into other disciplines, as exemplified by design sites like Dribbble, and upcoming new sites for creatives such as Forrst.

These new sites are admittedly still in their formative stages (Dribbble just had its breakout moment with the recent popularity of redesigning the iTunes icon), but it's easy to imagine a more mature version where, instead of merely focusing on the pretty pixels on the screen, the designers who frequent the site were encouraged to describe their rationale, and to use the site's replying abilities (called "rebounds" on Dribbble) to do something more akin to forking, where raw Photoshop and Illustrator files were shared.

The One True Version

Most importantly, the new culture of ubiquitous forking can have profound impacts on lots of other categories of software. There have been recent rumblings that participation in Wikipedia editing has plateaued, or even begun to decline. Aside from the (frankly, absurd) idea that "everything's already been documented!" one of the best ways for Wikipedia to reinvigorate itself, and to break away from the stultifying and arcane editing discussions that are its worst feature, could be to embrace the idea that there's not One True Version of every Wikipedia article.

A new-generation Wikipedia based on Git-style technologies could allow there to be not just one Ocelot article per language, but an infinite number of them, each of which could be easily mixed and merged into your own preferred version. Wikipedia already technically has similar abilities on the back end, of course, but the software's cultural bias is still towards producing a definitive consensus version instead of seeing multiple variations as beneficial.

There are plenty of other cultural predecessors for the idea of forking, all demonstrating that moving away from the need for a forced consensus can be great for innovation, while also reducing social tensions. Our work on ThinkUp at Expert Labs has seen a tremendous increase in programmers participating, without any of the usual flame wars or antagonism that frequently pop up on open source mailing lists. Some part of that is attributable to the cultural infrastructure GitHub provides for participation.

Moving forward, there are a lot more lessons we can learn if we build our social tools with the assumption that no one version of any document, app, or narrative needs to be the definitive one. We might even make our software, and our communities, more inclusive if we embrace the forking ourselves.

18 Comments

I'm not so sure this idea of pervasive forking is actually a good idea. I think where it works with Linus Torvalds, it's due to the hidden strength of the cult of personality.

We still struggle with this at the community level, for example, when looking at Unix/Linux sites versus Ubuntu community sites:

http://blog.stackoverflow.com/2010/09/fork-it/

Way back in the day there was a web server - the NCSA HTTPD - and it was free and Lo! many did use it. But it lacked features so people produced patches but the patches were rarely blessed back into the code and so getting the damn thing set up was a huge pain in the arse and it was hard to debug problems because there was an infinite number of variations of patches and all their versions working against other patches and their versions.

So some people collected the patches together and produced a new HTTP server called ... Apache ("A Patchy Webserver") and everything was good again.

But now I'm seeing that Git and GitHub are causing this kind of fragmentation again. If I want an OAuth library for Ruby then there's pelle/oauth which tells me to look at mojodna/oauth which has 34 forks. 34.

Which one should I use?

Which is to say - I'm torn on the whole affair. I like the idea but the community seems to lack the maturity to use the tools properly (ObSpiderman: with great power etc etc).

The nice thing about all this is that Git and GitHub have turned open source software development into a law-of-the-jungle, survival-of-the-fittest model rather than a we-are-the-one-true-project-because-we-were-here-first model. It's all about influence, marketing, personality, (some) money and of course, technical excellence.

So @Simon: just use the most popular fork (or original). You can't go far wrong.

Google Knol is almost doing this. I could see Knol being a very interesting experiment if they switched to more of a Git based collaboration style.

Simon, you make a great point: "I like the idea but the community seems to lack the maturity to use the tools properly."

It caused me to recall "The Wisdom of Crowds," and Surowiecki's oft-ignored point that only crowds that are well organized and submitted to a wise leader (of sorts) become truly wise.

Perhaps the forking that github has implemented should be seen as one iteration toward what will be truly great, once those forks can be *wisely* judged and sorted and voted on by the crowd, to make it easier for people to make sense of the significance of each fork as it compares to the others.

Drupal and other open source projects suffer from the same general effect: multiple modules are created to handle the same task; modules that serve different functions but share common features stifle one another. There is no idea of "forking" here but the problem's the same: there need to be tools not only to create things, but to make sense of the things created.

All in good time, all in good time.

Stellar piece, Anil. It won't surprise me to see people referencing it years from now.

Fork as feature provides a new lens through which to see Wikipedia's history. And maybe its future. It began as a hostile fork of Nupedia, edged out from within; the hostility came from those wanting to distance their work from such a relaxed drafting process. Refusing to bend was Nupedia's undoing, both as a project and as a general collaboration exercise.

More tellingly, wikis began as CVS applied to text. Wikipedia's open revision base is a readymade starting point for experiments of the kind you describe. Edits posted to many different web servers could be algorithmically threaded together, anthologized by readers, and even reconciled with main articles over time.

James Boyle wrote, "We are systematically likely to undervalue the importance, viability, and productive power of open systems." Granular, permissive forking could likewise have a net effect of heading off schism at the level of concepts or worldviews. I'd love to see more intellectual content built to fork. Less taking my ball and going home, more meeting you down the road.

Your opening point about his legacy is true, I believe. I know little of the world Mr. Torvalds has built, but the idea of open-source and the use of what comes of it pervades my world.

I work with retailers and consumer products companies to build "culture" around their brands. The idea of "forking" (if I understand it correctly) makes a great deal of sense in my work.

You see, much of my world has been consultants (designers, architects, advisors) like me developing a solution to a problem when the reality has been that a complex situation warrants a set of responses, oftentimes each with a significant investment necessary to execute.

When we allow ourselves to be inspired by ideas such as open-source we develop structures akin to language. And subsequently, grammar, and perhaps even poetry. Just because the words are out there, don't mean nobody knows how to use 'em right.

So, yes the legacy is there. We see it today with the most sophisticated (big and small) companies being open with their culture (that means they need to nail down their purpose) and allowing their customers to make new things. Facebook is the best example of everyday people taking bits and pieces and making new value for themselves and those in their orbits.

This construct actually liberates us from the duality of problem/solution and gives us a plurality of opportunities/pathways. The trick is and always has been, deciding what to do then.

Chuck Palmer
ConsumerX


(If you don't mind, I've posted your article and my response on my blog at www.ConsumerXretail.com to share with my readers.)

I agree that easy forking is critical, however, easy merging is equally important. We no longer fear forking primarily because it's not a permanent bifurcation. Forks are mostly temporary and easily re-merged.

Git's design and Github's tools make merging others' forks trivially easy. Using Github's Fork Queue you can merge another's fork in one click. Github encourages you to view the commits on other people's forks as a todo list of sorts, things you should merge into your own fork. In fact as soon as you merge the changes from other forks, they disappear from both your Network view and from your Fork Queue view, showing you only the remaining "outstanding" forks/commits to be merged.

That being said, sometimes forks do persist. For projects where there are many unmerged forks, sophisticated developers use the Network view to determine which of the forks is the best to use; it's not always the one from the person who started the project. The key is that the Network view, with its information about who is forking and merging from who else, serves to give developers insight into the social dynamics of the history of the project, allowing them to make an informed choice about which fork to use. There's nothing saying that these tools couldn't also be made available on Wikipedia, however it takes a fair amount of thought to divine meaning from such analytics.

Simon's story of HTTPDs is amusing because of how/where it cuts off. Yes, the Apache HTTPD is pretty much the most common staple of web servers in the Open Source World today, but that's not the end of it. Think of how many other projects have been born out of perceived faults with Apache HTTPD: LigHTTPD, nginx, WEBrick, etc.

Anil, I think it's important to distinguish between the various types of forking you mention. In the "Tuning" section, you describe famous forks that involved a change in direction or personnel; or else involved evolving what someone else had let languish.

A GitHub-style "fork" would be the first step in this process if it were to happen today. But GitHub promotes forking (not in a PR sense, but in a UI sense) as a way to contribute patches back to the original project. By my observation, the great majority of people who fork Prototype do so without the intent of taking the project in a different direction. They often don't even intend to maintain their patched version by keeping it in sync with later commits to the repository whence they forked.

At any rate, I simply can't imagine how this can be applied to Wikipedia. It's a site for the aggregation and sourcing of facts. In what respects would forked variants of articles differ from the originals? (Do you envision different political slants? Different standards of notability?)

Most importantly, how could a user make an informed decision about which forks to use? Simon's comment above describes how helpless he feels when he has to choose between 34 versions of an OAuth library. What happens when a popular article has 5,000 forks, most of which involve only minor edits to whatever snapshot of the page existed when the fork was created?

You don't answer any of the above questions, or even make much of an argument about why a forkable Wikipedia would be useful. But you still say, almost parenthetically, that Wikipedia's devotion to canon is a "cultural bias," rather than a decision arrived at through a reason-based process. Why are you so certain?

Not to sound like a "git" ...but it is Aristotelian theory "truth is found through the dialectic" ...but the body politic can be warped if there is not agreement on base standards persuaded/structured by (common) ethos, pathos, logos.

In other words...too much forking, too spread, too "microcosmed" and you have a mess.

I don't believe in this. What you are saying is that a somewhat unnecessary version control system (dist scm is not anything close to new) and a blog site to manage developers is somehow different.

I'll give you another take on it.

Software only becomes open when it's a commodity that is difficult to profit from. It's uninteresting and becomes a hobby potentially for many people.

Although it's a good hobby and fun, most of these things simply do not matter. Unless of course you are trying to dump software on the market to devalue the for-pay software of a giant.

Managing forks is not a problem. The fact that a lot of people are wasting their time developing software only in the hot house of their minds for no general benefit is.

Changing software or rewriting it just to do it, is of questionable value, especially when nobody really cares why or for what purpose.

It's self-flagellation and a waste of developers' resources.


There are two ways of looking at applying the idea of forking to Wikipedia.

First is a complete fork, an entirely separate copy and branching of the encyclopedia, which is something that has happened already in various forms. It ranges from wholesale copying of the site (which is mostly spam mirrors and mobile-specific sites) to starting almost entirely over again (Citizendium and Conservapedia).

The other way of looking at forking is forking of individual articles inside Wikipedia, which seems to be what you're getting at. Frankly, as a Wikipedian, this is never going to happen. The idea of a single, neutral encyclopedia article is so core to the nature of the site and its community that to change it would destroy what's been built already. Multiple articles on a single subject also presents serious logistical challenges for readers and editors, as Knol has shown.

In the general space of applying revision systems to wikis, it will be interesting to see how the new wikis at Github play out, since they recently moved to using a git-backed system for those.

I'm happy that both the merging and the network view issues were addressed in the previous comments. I have been interested in extending the Git and GitHub models beyond software myself. I understand the interest in considering Wikipedia as the next logical step for networked collaboration right after code, but I think there is a fundamental difference between the two. While software code contains a set of rules that would operate a system, Wikipedia's model is almost opposite - it documents a system that is already happening or has happened. Wikipedia attempts to document a monolithic past while software attempts to imagine the multiplicity of the future(s).

If there is room for the distributed model to be extended beyond software, I believe we should try to find other creative processes that are aimed at the future. One of these fields which I am very interested in these days (and I know you Anil are too) is legislation.

We've already established that "Code is Law", but we have not realized that it also means we can/should fork it and hack it, and then possibly merge it too. Most Open Gov initiatives have been focused on government transparency - making the past activity of the gov't more accessible hoping this would make representatives more accountable in the future. We often use software to mash up the past data rather than to help create it.

I say, rather than just promote or fight a bill (traditional pre-Internet models of engagement) we should fork it. Don't send me tiring petitions about why this is wrong, send me a "diff" highlighting your proposed patches (then we can fight together for a "pull request").

While there will only be one "build" we'll be able to "execute" together, the git model is not just about forking. It's about mitigating individual creativity and autonomy with the collective production. In software projects, you can fork and follow your own individual trajectory at the possible high price of losing the benefits of a community. The same can be said about democracy. It's about open leadership... People largely choose to engage rather than live in the hills, so I believe we should encourage them to fork and trust that they will have enough incentive to merge.

Anil, or anyone else interested in further developing these ideas, I'd love to hear your thoughts here or @Mushon(.com)

Content creation and forking?

That is Tumblr.. though Google really has problems with canonicalization.

Another interesting concept will be forking of conversations with those "salmon" being distributed all over the web swimming up and down stream.

I would fork this blog post, but CC by-nc-sa in some respects prevents that on blogs for commercial use.

Licensing is going to get interesting when comments start moving between sites at the behest of the site owner.

I think this post posits an interesting, tho disturbing, explanation for the decline in wikipedia participation: that people are frustrated with a single, centrally-and-consensually-determined version of what is "true", and that participation would increase again if they gave people the option of creating multiple "forks" of wikipedia articles. in other words, create their own reality.

So we could have one "fork" of the entry for Barack Obama that explains that he's a Muslim born in Somalia who wants to take away your guns, and another "fork" that would say he's a Christian born in Hawaii. and who knows how many other versions. And there'd be no way for the casual reader to decide which he wants to read or believe. And Anil seems to claim this would "reduce social tensions" as well.

how very post-modern.

I love the idea of open-sourcing everything the government is tasked to do. It's archaic and unrealistic (foolish?) to expect a few (mostly rich, white males) to make good decisions that affect hundreds of millions or billions of people. It's 2010! We have a lot of technology at our disposal yet we don't even vote electronically. Ridiculous! An open source approach would allow for an organic proposal-iteration-approval process. It's proven itself, and, I believe, is our best hope toward breaking out of the stalemate we're currently in and toward creating a better today. Let the people who care about a topic/issue (likely those who will be most greatly affected by its outcome) do the work to move the idea forward, not some unattached, disinterested government worker sitting in an office hundreds of miles away with nothing on the line. Let's let go of the really, really old ideas we have about how government can and should operate and embrace and make use of the great tools we now have at our disposal.

I'm noticing a thread through the critical comments; forking isn't helpful if there's no consensus on which fork is the "master". Forking works well for Linux, because you justify your fork with "it's better than Linus's tree because I do X, Y and Z", and if your fork really does improve things, Linus will eventually merge your fork back into his version.

With other projects (e.g. the OAuth implementations Simon mentions), there's no baseline to compare to, or gold-standard tree to aim to be included in. As a result, it's harder to explain why I should use your fork, and harder for me to find a project that I can use as a suitable baseline.
