Linus Torvalds on Distributed Version Control with Git

February 23rd, 2008

A friend directed me to the YouTube video, posted in May 2007, in which Linus Torvalds talks about git and the general advantages of distributed version control systems (DVCS’s) over centralized version control systems (CVCS’s).

Here are some of the key points, liberally paraphrased by me.

  • Advantages of DVCS:
    1. While working offline, developers can still commit their changes. This is a big win when you have a large community of developers who are only loosely coupled, i.e. they don’t all have commit access to a central repository.
    2. Branches occur as everyday events. Merging is not the cumbersome and specialized task we have come to think of under CVS.
    3. Everyone has commit access. This eliminates the distinction between trusted insiders with commit access and untrusted outsiders without it. When you complete work on your branch, you offer it back to the community, where it may be accepted or rejected. Linus says this point alone is reason enough for every open source project to use a distributed model of VC.
    4. Release handling is facilitated: any release team has its own branch, built by merging selected branches from #3. [And multiple release teams with differing goals and timetables are possible, without getting in one another's way.]
    5. A centralized code base is a deterrent to commits. Many shops have a strict policy: don’t commit [or merge to a major branch] until you are sure everything works, you have run a huge test suite, etc. If you don’t branch very often, you end up with huge branches and a large merge liability. If you do branch often, you put a lot of resources into testing and re-testing. It’s expensive either way.
    6. With DVCS, smaller groups can pull from one another and get a start on integration and testing across teams. This allows hierarchical development and reduces the need for a software priesthood at the top gating all changes into production code.
    7. Lots and lots of branches can happen without cluttering a central repository.
    8. DVCS is the model the Linux kernel uses, and it works. It helps that the networking team can share branches with the NFS developers and vice versa.
    9. When merging for a release, you don’t try to cover all available branches. You pull from your network of trusted developers, maybe 5-15 individuals. “At some point trust means you have to accept other people’s decisions.”
    10. Git is much easier to use than CVS, once you get over the hump of what distribution is about.
    11. Merging is much easier with git than with CVS. The Linux kernel has over 22,000 files. The kernel developers have been using git for 2 years, with an average of 4.5 merges per day.
    12. When a merge conflict between two branches occurs, instead of pulling out his hair, Linus accepts one branch, then kicks the task of resolution back to the author of the code in the other conflicting branch. That way Linus does not need expertise in every aspect of the kernel; the second developer does the hard work instead of Linus.
    13. DVCS is clear winner if you have teams collaborating at different sites, unless you have a very fast and reliable network.
    14. Diffing against the repository and committing are orders of magnitude faster in DVCS than in CVCS, even if you have a good network, and many projects do not have a good network.
    15. In CVCS, one deterrent to branching is the global namespace for all branches. What do you call your test branch - “test”? Oops, there are already branches test1 [or test0001] through test2000. So you invent bureaucratic branch naming conventions.
    16. Merging several times a day with minimal headache (often done in less than a second with 22,000 files) keeps away the big, dreaded consolidated merge task.
    17. CVCS merges (CVS and SVN) lose history, making repeated merges difficult.
    18. DVCS eliminates the single point of failure and uses natural replication of data.
    19. Git uses SHA-1 to check consistency at the project level; it is not really a security feature. If you keep the 20-byte hash of your latest commit, you can download a copy of an entire repository from a completely untrusted source and be confident in the integrity of the content. Google Code, which is based on SVN, does not offer this degree of assurance.
    20. Graphic display of branch/merge history, done quickly (gitk), is an important feature of git, especially for release teams. When there is a bug, merge guys don’t care about a single file; they care about subsystems. What happened in the SCSI subsystem in the 15,000 commits since last week? Restricting gitk to that subsystem’s files can cut that down to 50 commits.
    21. Git is 100,000 lines of C. Some of it is complex because performance is a goal, for example optimizing handling of an entire tree vs. simple one-file-at-a-time traversal.
    22. CVS annotate is faster than git’s annotate at the file level, but git can track changes across files, e.g. when a function moves from one file to another.
  • Observations on DVCS:
    1. As soon as you commit code offline, you are creating a branch.
    2. If you have an active development community, backups of a central repository are far less of an issue. If you lose your local copy, pull the corresponding branch from someone else. This is what Linus does.
    3. In centralized version control systems, there is a notion of commit access. You have gurus with commit access and morons whose code can’t be trusted. This leads to political struggles within the project and saps creative energy from both groups.
    4. In a centralized repository, branches are global.
    5. Development typically proceeds with hundreds or thousands of branches, but only a few are seen by the outside world, or indeed by the majority of insiders.
    6. Merging and releases are based on a network of trust. As Linus points out, if you have ever done security work, you realize it boils down at some point to a network of trust. The same is true of release management in a distributed environment. Trust networks fan out, which allows multiple tiers of effort.
    7. Linus encourages having multiple releases of the kernel going on, and says his own branch probably gets more weight than it should.
    8. Git and Mercurial (hg) are the two DVCS’s Linus recommends, with preference for git.
    9. Some companies that are not using DVCS officially may have developer teams that check out from SVN or CVS and work within git until they are ready to merge back into the central repository.
    10. Distribution means nobody is special.
    11. Centralized systems have been used successfully in tightly controlled corporate environments for decades.
    12. You do not track a single file in git. You track all content of the project. (Atomic commits, etc.)
    13. KDE kept everything in a single CVS repository, 8 GB. SVN blew that up by a factor of 3. Git might compress that down to 1.3 GB. But in this scenario, initial cloning of the tree with git is expensive. It’s much better to put separate components into separate repositories.
    14. Git uses a content addressable storage system, which saves considerable space when many copies of the same content are involved.
    15. Use a “super project” to coordinate different components and deal with a shared build structure.
    16. There is a significant difference between something taking 30 seconds vs. taking half a second. When a task takes less time, people will use the tools differently.
    17. Branching is not the issue. CVCS’s can create branches very quickly. Merging is the issue.
    18. Git tracks content, not files. History is on project basis, not file basis.
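Point 19 and observation 14 both come down to content addressing: git names each piece of content by the SHA-1 of that content. As a small sketch in Erlang (the language of the other posts here), this is how the object ID git gives a file’s contents (a “blob”) can be computed, assuming git’s default SHA-1 object format; the module name git_blob is hypothetical.

```erlang
-module(git_blob).
-export([blob_id/1]).

%% Compute the ID git assigns to a blob: the SHA-1 of the header
%% "blob <size>\0" followed by the raw bytes. Identical content always
%% yields the same 20-byte digest, so it only needs to be stored once.
blob_id(Content) when is_binary(Content) ->
    Header = <<"blob ", (integer_to_binary(byte_size(Content)))/binary, 0>>,
    Digest = crypto:hash(sha, [Header, Content]),
    to_hex(Digest).

%% Render the 20-byte digest as 40 lowercase hex characters.
to_hex(Bin) ->
    << <<(hex_digit(N))>> || <<N:4>> <= Bin >>.

hex_digit(N) when N < 10 -> $0 + N;
hex_digit(N) -> $a + N - 10.
```

For example, git_blob:blob_id(<<"hello\n">>) should match the output of `echo hello | git hash-object --stdin`. Because trees and commits are hashed the same way over the IDs they contain, one trusted hash at the top vouches for everything beneath it.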

Other commentary on this talk: Linus Torvalds on GIT and SCM
Slides and text of a talk (pdf): git - A Stupid Content Tracker ~ Junio Hamano

yaws page - mnesia table backup

April 7th, 2007

Here’s a page you can drop into the yaws directory of any Erlang node to back up local mnesia tables. You can click on any table name. A backup of the table’s local RAM contents is written to the mnesia directory, named for the table and the day of the week: on Saturday, table xtab will be backed up to file xtab.6.

To restore the table from that backup file, run, for example:
mnesia:restore("/path/to/xtab.6", [{recreate_tables, [xtab]}]).

backup_tables.yaws
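The page itself is not reproduced here. As a minimal sketch of what such a backup step could look like, assuming mnesia’s checkpoint API is used (the original page may do it differently), the hypothetical module tab_backup below activates a checkpoint covering one table, asks for the RAM copy via ram_overrides_dump, and writes a backup file that mnesia:restore/2 accepts.

```erlang
-module(tab_backup).
-export([backup_table/1, backup_table/2]).

%% Back up Tab into the mnesia directory as <Tab>.<DayOfWeek>.
backup_table(Tab) ->
    backup_table(Tab, mnesia:system_info(directory)).

%% Back up Tab into Dir, e.g. Dir/xtab.6 on a Saturday.
backup_table(Tab, Dir) ->
    Day = calendar:day_of_the_week(date()),   % 1 = Monday ... 7 = Sunday
    File = filename:join(Dir, atom_to_list(Tab) ++ "." ++ integer_to_list(Day)),
    %% Checkpoint covering just this table; ram_overrides_dump makes the
    %% backup read the RAM copy. Crashes with badmatch if activation fails.
    {ok, Cp, _Nodes} = mnesia:activate_checkpoint([{min, [Tab]},
                                                   {ram_overrides_dump, true}]),
    try mnesia:backup_checkpoint(Cp, File) of
        ok -> {ok, File};
        {error, Reason} -> {error, Reason}
    after
        %% Always release the checkpoint, even if the backup fails.
        mnesia:deactivate_checkpoint(Cp)
    end.
```

Since calendar:day_of_the_week/1 numbers Monday as 1, Saturday gives 6, matching the xtab.6 naming above; keeping one file per weekday gives a rolling week of backups.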

mnesia table viewer

March 21st, 2007

Here’s a page you can drop into the yaws directory of any Erlang node to view mnesia tables. You can click on any table name. If the table has 500 rows or fewer, its entire contents are displayed; otherwise only slot 0 is shown.

The yaws page is divided into sections so that each section’s code can be reused independently.

mnesia_tables.yaws
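A hedged sketch of the row-selection rule described above, assuming dirty reads are acceptable for a read-only viewer; tab_view and table_rows/1 are hypothetical names, not taken from the page.

```erlang
-module(tab_view).
-export([table_rows/1]).

-define(MAX_ROWS, 500).

%% Return {all, Rows} when the table is small enough to show in full,
%% otherwise {slot0, Rows} with just the records stored in slot 0.
table_rows(Tab) ->
    case mnesia:table_info(Tab, size) of
        N when N =< ?MAX_ROWS ->
            Pattern = mnesia:table_info(Tab, wild_pattern),
            {all, mnesia:dirty_match_object(Tab, Pattern)};
        _ ->
            case mnesia:dirty_slot(Tab, 0) of
                '$end_of_table' -> {slot0, []};
                Rows -> {slot0, Rows}
            end
    end.
```

For set and bag tables, slot 0 holds whatever records hash to the first bucket, so it is a cheap sample rather than the first rows in key order.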

partial-mesh Erlang clusters

December 13th, 2005

The possibility of breaking a cluster of Erlang nodes into smaller domains, with messaging allowed via designated border nodes, is attractive for two reasons:

  1. it eliminates the n-squared growth in network traffic of a full mesh
  2. it allows communication between distinct legacy clusters on the same network
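To put numbers on point 1, assume (hypothetically) n nodes split into k equal domains of m nodes, each domain fully meshed internally, with one border node per domain and the border nodes fully meshed among themselves. Counting distribution connections:

```latex
% Full mesh of n nodes
L_{\text{full}}(n) = \binom{n}{2} = \frac{n(n-1)}{2}

% k domains of m nodes (n = km), each a full mesh,
% plus a full mesh among the k border nodes
L_{\text{partial}}(k, m) = k \cdot \frac{m(m-1)}{2} + \frac{k(k-1)}{2}

% Example: n = 100, k = m = 10
L_{\text{full}}(100) = 4950, \qquad L_{\text{partial}}(10, 10) = 450 + 45 = 495
```

Under these assumptions a 100-node cluster needs a tenth as many connections, at the cost of routing inter-domain traffic through the border nodes.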


components of CM

December 7th, 2005

Software deployed over a platform falls into discrete CM layers.

automating CM

December 5th, 2005

A configuration management system for software will have these components:

updating code in a running OTP node

December 5th, 2005

Word is, the release handler is not heavily used, inside Ericsson or outside. If we chose to use it for maintaining nodes on the fly, odds are we would find some key feature missing and would have to allocate time to add it.

Consider starting a node with minimal pieces needed for OTP plus sasl and mnesia. A mnesia table for the node is consulted and further beams are loaded from it.

  1. How do we handle code updates?
  2. How do we track CM information?
  3. How do we update applications that are shared, such as yaws?
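One possible shape for the bootstrap step described above, as a sketch only: a hypothetical node_beam table holds compiled module binaries, and the node loads them with code:load_binary/3. It does not answer the three questions; it deliberately skips release_handler, code_change/3 upgrades, version tracking, and shared applications.

```erlang
-module(beam_loader).
-export([create_table/0, store/2, load_all/0]).

%% Hypothetical table: one record per module, keyed by module name.
-record(node_beam, {module :: module(), binary :: binary()}).

create_table() ->
    mnesia:create_table(node_beam,
                        [{attributes, record_info(fields, node_beam)},
                         {ram_copies, [node()]}]).

%% Store the object code for Module (e.g. the result of compile:forms/1).
store(Module, Bin) when is_atom(Module), is_binary(Bin) ->
    mnesia:dirty_write(#node_beam{module = Module, binary = Bin}).

%% Load (or reload) every module in the table; returns one
%% {module, M} or {error, Reason} per entry.
load_all() ->
    [load_one(M) || M <- mnesia:dirty_all_keys(node_beam)].

load_one(Module) ->
    [#node_beam{binary = Bin}] = mnesia:dirty_read(node_beam, Module),
    %% Drop old code if no process still runs it, so that loading new
    %% code does not fail with not_purged.
    _ = code:soft_purge(Module),
    code:load_binary(Module, atom_to_list(Module) ++ ".beam", Bin).
```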

don’t register transient processes

December 5th, 2005

Transient processes, those spawned at runtime and not running for the lifetime of the node, should be known only to the other processes that need them. Use a globally registered name for permanent processes representing fixed resources.

Use of atoms should similarly be limited to permanent entities.
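A small sketch of this rule, with hypothetical names: a permanent resource server is registered with global:register_name/2, while a transient worker stays anonymous and is reached only through the pid and reference its creator holds. No per-worker atoms are created (for example with list_to_atom/1), since atoms are never garbage collected.

```erlang
-module(naming_demo).
-export([start_permanent/0, run_job/1]).

%% Permanent process representing a fixed resource: give it a global name.
start_permanent() ->
    Pid = spawn(fun resource_loop/0),
    yes = global:register_name(resource_server, Pid),
    Pid.

resource_loop() ->
    receive
        {From, ping} -> From ! {self(), pong}, resource_loop();
        stop -> ok
    end.

%% Transient process: never registered. Only the caller knows it exists,
%% and its reply is matched by a unique reference instead of a name.
run_job(Fun) ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} -> Result
    after 5000 ->
        {error, timeout}
    end.
```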

why doesn’t pg2 use mnesia?

December 5th, 2005

One reason pg2 and other Ericsson apps don’t use mnesia is that it would complicate configuration management (CM). A node can run only one instance of mnesia; that instance is coupled to the purpose of the node and should not be affected by the base OTP release.

Limiting growth of sasl log on a running node

November 29th, 2005

I think the way to deal with monster sasl logs is to enable log_mf_h. This can be done without stopping the nodes: set the sasl application environment variables, then stop and start sasl.
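A minimal sketch of that procedure, for OTP releases where sasl installs log_mf_h from its configuration parameters at start-up; the module name sasl_logs is hypothetical. The parameters are sasl’s error_logger_mf_dir, error_logger_mf_maxbytes and error_logger_mf_maxfiles; the directory must already exist, and MaxFiles must be below 256.

```erlang
-module(sasl_logs).
-export([enable_wrap_logs/3]).

%% Point sasl at a wrap-around log: MaxFiles files of at most MaxBytes
%% each, with the oldest file overwritten once all are full.
enable_wrap_logs(Dir, MaxBytes, MaxFiles) ->
    ok = application:set_env(sasl, error_logger_mf_dir, Dir),
    ok = application:set_env(sasl, error_logger_mf_maxbytes, MaxBytes),
    ok = application:set_env(sasl, error_logger_mf_maxfiles, MaxFiles),
    %% sasl reads these parameters only when it starts, so restart it.
    _ = application:stop(sasl),
    application:start(sasl).
```

For example, sasl_logs:enable_wrap_logs("/var/log/mynode", 10485760, 10) caps the reports at about 100 MB on disk; they can then be browsed with rb:start([{report_dir, "/var/log/mynode"}]).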