Ted Tso: Kernel Hacker

Syndicate content
Musings about Open Source, Linux, and Life by Theodore Tso
Updated: 48 min 17 sec ago

Ext4 is now the primary filesystem on my laptop

Mon, 06/30/2008 - 14:06

Over the weekend, I converted my laptop to use the ext4 filesystem.  So far so good!  So far I’ve found one bug as a result of my using ext4 in production (if delayed allocation is enabled, i_blocks doesn’t get updated until the block allocation takes place, so files can appear to have 0k blocksize right after they are created, which is confusing/unfortunate), but nothing super serious yet.  I will be doing backups a bit more frequently until I’m absolutely sure things are rock solid, though!

I am using the latest ext4 patches and the tip of the e2fsprogs git repository.  Hopefully when we get the bulk of the patches merged into the mainline kernel after the 2.6.26 ships and the 2.6.27 merge window opens, and after I ship out e2fsprogs 1.41 (I have one work-in-progress pre-release, with another coming soon), it’ll be ready for much more wide-spread testing.

In addition to the excellent crew of ext4 developers, I’d like to call out for special thanks Gary Howco and Holger Kiehl, two early users/benchmarkers of ext4 who tried our latest code, and reported bugs that had previously escaped attention by developers (who had been mostly testing the code via the same old test suites); their additional workloads and benchmarks flushed out a few additional bugs.   Thanks, guys!!

Hopefully after a few weeks of my using ext4 for real-live work, I’ll find a few last few bugs to be fixed, and/or feel much more confident it’s ready for me to recommend to others for their production data.

Learning how to communicate

Fri, 06/13/2008 - 16:15

Pity poor Dr. Ari Jaaksi from Nokia. He gave a talk at the Handsets World conference in Berlin on Tuesday, where according to ZDnet, he lectured Open Source Developers that they needed to learn why DRM and other closed technologies were necessary, because of business issues such as subsidized (device) business models. I suspect he wasn’t prepared for the reaction, which took the form of a major fuss on Slashdot, as well as some declarations from a few people on the maemo-users mailing lists that they would never buy another Nokia device. Dr Jaaski then posted today on his blog an entry entitled, “Some learning to do?”, where he stated that while Nokia needs to learn how the open source world works (not just licenses and legal issues, but also the spirit), that the open source world also needed to learn as well — about WHY things are the way they are. He ended the post with “I’m not a teacher, I’m a learner” — which is fair enough.

I actually think that Dr. Jaaksi is right; we all should be interested in learning, and having talked with a number of Nokia employees, I can testify that their hearts are very much in the right place, and there are some strong reasons regarding the demands of carriers, the subsidized business model of handsets in the US, the attitudes of suppliers for components in mobile devices, that very much tie their hands — as well the hands of every other handset company out there. Progress in this space will come slowly, and but I believe there is hope that in the long run things will get better.

However, at the same time, I would suggest to Dr. Jaaksi that it’s also important how to communicate with the open source community — or with any community that holds views sometimes with great passion. Many people do not react well to being told that they need to learn something — even if it is true that they do. Admitting that they need to learn is for some people tantamount to admitting ignorance, or even worse admitting stupidity, and so they are loath to do this. Telling them this just makes them defensive and angry.

My suggestion for how get people to learn without making them defensive is to use a modified Socratic method. It works especially well with the Open Source crowd, because many of them are engineers, and engineers are by nature problem solvers. So if you give them a problem to solve, they will go to work trying to find potential solutions, and thus start thinking and learning about the problem from a different point of view. Hence, a better approach is to give them a series of business constraints (time to market, the demands of carriers, the fact that most end users prefer to buy subsidized phones with 1-2 year lock-in contracts, the intransigence of suppliers, why a single company can’t afford to create their own chipsets from scratch and have to rely on companies like Broadcom, etc.) and then tell them — we’d love to delight our customers in any way we can, and clearly you are very passionate about DRM, open device drivers, and other OSS issues. We however have to create successful products and sell them at a price such that we don’t go out of business — can you help us work through all of these constraints? Let’s work together so we can make an open source mobile platform that meets the spirit as well as the legal requirements of the open source world!

Dr. Jaaksi and his team are struggling with a hard problem, and I am very sympathetic to their issues. And, they have to be commended for investing as much time and effort on the Maemo platform, as well as other Linux and Open Source platforms as they have to date. The strength of the Open Source community is that we work together to solve our mutual problems, while celebrating and giving permission to people to “scratch their own itches”. In order to do that, we need to learn how to better communicate with each other, and that is something that goes in both directions.

When I was at MIT as an undergraduate, far too many years ago than I care to admit, I took economics classes and Sloan School of Management classes (even though they were not necessary for a Computer Science degree) for that very reason. I encourage open source advocates to learn how to walk a mile or two in business people’s shoes, since if you want to influence them them to change in the ways that you desire, you need to know where they are coming from — just shouting and throwing rotten tomatoes generally won’t get you very far. On the flip side, people who are on the corporate side of the divide also need to understand that certain ways of engaging the community can be more productive than others. At the end of the day, hopefully we can all agree with Dr. Jaaksi that everyone will benefit if we all commit to learning from each other.

Update: I’ve fixed Dr. Jaaksi’s name so it is spelled correctly; my apologies for the error.

Donald Knuth: “I trust my family jewels only to Linux”

Sat, 04/26/2008 - 20:02

Andrew Binstock interviewed Donald Knuth recently, and one of the more amusing tidbits was this:

I currently use Ubuntu Linux, on a standalone laptop—it has no Internet connection. I occasionally carry flash memory drives between this machine and the Macs that I use for network surfing and graphics; but I trust my family jewels only to Linux.

More seriously, I found his comments about about multi-core computers to be very interesting:

I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX….

I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

This is a very interesting issue, because it raises the question of what next-generation CPU’s need to do in order to be successful. Given that it is no longer possible to just double the clock frequency every 18 months, should CPU architects just start doubling the number of cores every 18 months instead? Or should they try to concentrate a lot more computing power into an individual core, and optimize for a fast and dense interconnect between the CPU’s? The latter is much more difficult, and the advantage of doing the first is that it’s really easy for marketing types to use some cheesy benchmark such as SPECint to help sell the chip, but then people find out that it’s not very useful in real life.

Why? Because programmers have proven that they have a huge amount of trouble writing programs that take advantage of these very large multicore computers. Ultimately, I suspect that we will need a radically different way of programming in order to take advantage of these systems, and perhaps a totally new programming language before we will be able to use them.

Professor Knuth is highly dubious that the later approach will work, and while I hope he’s wrong (since I suspect the hardware designers are starting to run out of ideas, so it’s time software engineers started doing some innovating), he’s a pretty smart guy, and he may well be right. Of course, another question is whether what would we do with all of that computing power? Whatever happened to the predictions that computers would be able to support voice or visual recognition? And of course, what about the power and cooling issues for these super-high-powered chips? All I can say is, the next couple of years is going to be interesting, as we try to sort out all of these issues.

Organic vs. Non-Organic Open Source, Revisited

Fri, 04/25/2008 - 23:21

There’s been some controversy generated over my use of the terminology of “Organic” and “Non-Organic” Open Source. Asa Dotzler noted that it wasn’t Mozilla’s original intent to “make a distinction between how Mozilla does open source and how others do open source”. Nessance complained that he didn’t like the term “Non-Organic”, because it was “raw and vague - is it alien, poison, silicon-based?” and suggested instead the term “Synthetic Open Source”, referencing a paper by Siobhán O’Mahony, ” What makes a project open source? Migrating from organic to synthetic communities”. Nessance referenced a series of questions and answers by Stephen O’ Grady from Red Monk, where he claimed the distinction between the two doesn’t matter. (Although given that Sun is a paying customer of Red Monk, Stephen admits that this might have influenced his thinking and so he might be “brainwashed” :-).

So let’s take some of these issues in reverse order. Does the distinction matter? After all, if the distinction doesn’t matter, then there’s no reason to create or define specialized terminology to describe the difference. Certainly, Brian Aker, a senior technologist from MySQL, thinks it does, as do folks like me and Amanda McPherson and Mike Dolan; but does it really? Are we just saying that because we want to take a cheap shot at Sun?

Well, to answer that, let’s go back and ask the question, “Why is Open Source a good thing in the first place?” It’s gotten to the point where people just assume that it’s a good thing, because everybody says it is. But if we go back to first principals maybe it will become much clearer why this dinction is so important.

Consider the Apache web server; it was able to completely dominate the web server market, easily besting all of its proprietary competitors, including the super-deep-pocketed Microsoft. Why? It won because a large number of volunteers were able to collaborate together to create a very fully featured product, using a “stone soup” model where each developer “scratched their own itch”. Many, if not most, of these volunteers were compensated by their employers for their work. Since their employers were not in the web server business, but instead needed a web server as means (a critical means, to be sure) to pursue their business, there was no economic reason not to let their engineers contribute their improvements back to the Apache project. Indeed, it was cheaper to let their engineers work on Apache collaboratively than it was to purchase a product that would be less suited for their needs. In other words, it was a collective “build vs. buy” decision, with the twist that because a large number of companies were involved in the collaboration, it was far, far cheaper than the traditional “build” option. This is a powerful model, and the fact that Sun originally asked Roy Felding from the Apache Foundation to assist in forming the Solaris community indicates that at least some people in Sun appreciated why this was so important.

There are other benefits of having code released under the Open Source license, such as the ability for others to see the implementation details of your operating system — but in truth, Sun had already made the Source Code for Solaris available for a nominal fee years before. And, of course, there are plenty of arguments over the exact licensing terms that should be used, such as GPLv2, GPLv3, CDDL, the CPL, MPL, etc., but sometimes those arguments can be a distraction from the central issue. While the legal issues that arise from the choice of license are important, at the end of the day, the most crucial issue is the development community. It is the strength and the diversity of the development community which is the best indicator for the health and the well-being of an Open Source project.

But what about end-users, I hear people cry? End users are important, to the extent that they provide ego-strokes to the developers, and to the extent that they provide testing and bug reports to the developers, and to the extent that they provide an economic justification to companies who employ open source developers to continue to do so. But ultimately, the effects of end-users on an open source project is only in a very indirect way.

Moreover, if you ask commercial end users what they value about Open Source, a survey by Computer Economics indicated that the number one reason why customers valued open source was “reduced dependence on software vendors”, which end users valued 2 to 1 over “lower total cost of ownership”. (Which is why Sun Salescritters who were sending around TCO analysis comparing 24×7 phone support form Red Hat with Support-by-email from Sun totally missed the point.) What’s important to commercial end users is that they be able to avoid the effects of vendor lock-in, which implies that if all of the developers are employed by one vendor, it doesn’t provide the value the end users were looking for.

This is why whether a project’s developers are dominated by employees from a single company is so important. The license under which the code is released is merely just the outward trappings of an open source project. What’s really critical is the extent to which the development costs are shared across a vast global community of developers who have many different means of support. This saves costs to the companies who are using a product being developed in such a fashion; it gives choice to customers about whether they can get their support from company A or company B; programmers who don’t like the way things are going at one company have an easier time changing jobs while still working on the same project; it’s a win-win-win scenario.

In contrast, if a project decides to release its code under an open source license, but nearly all the developers remain employed by a single company, it doesn’t really change the dynamic compared to when the project was previously under a closed-source license. It is a necessary but not sufficient step towards attracting outside contributors, and eventually migrating towards having a true open source development community. But if those further steps are not taken, the hopes that users will think that some project is “cool” because it is under an open-source license will ultimately be in vain. The “Generation Y”/Millennial Generation in particular are very sensitive indeed to Astroturfing-style marketing tactics.

Ok, so this is why the distinction matters. Given that it does, what terms shall we use? I still like “Organic” vs “Non-organic”. While it may not have been intended by the Mozilla Foundation, the description in their web page, “only a small percentage of whom are actual employees [of the Mozilla Foundation]“, is very much what I and others have been trying to describe. And while I originally used the description “Projects which have an Open Source Development Community” vs “Projects with an Open Source License but which are dominated by employees from a single company”, I think we can all agree these are very awkward. We need a better shorthand.

When Brian Aker from MySQL suggested “Organic” vs “Non-Organic” Open Source, and I think those terms work well. If some folks think that “Non-Organic” is somehow pejorative (hey, at least we didn’t say “genetically modified Open Source” :-), I suppose we could use Synthetic Open Source. I’m not really convinced that is any much more appetizing, myself, however.

So what would be better terms to use? Please give me some suggestions, and maybe we can come up with a better set of words that everyone is happy with.

Links — 2008-04-25

Fri, 04/25/2008 - 09:57
The Open Source Commands
Really good ideas that companies should take to heart.
Open Source Commandments II: Passover Penguins
More really good ideas, especially for companies like Sun…
Did Canonical Just Get Punked by Red Hat and Novell?
Interesting thoughts about Linux desktop strategies
rPath to OEM SUSE Linux Enterprise Server from Novell for Appliances
I know a bunch of the folks at rPath, and I very much respect their technology; I think this is a very good thing for them.
Does Microsoft CEO Steve Ballmer need an intervention?
Does anyone think a Microsoft/Yahoo merger makes sense besides Mr. Ballmer?

Organic vs. Non-organic Open Source

Thu, 04/24/2008 - 07:24

Brian Aker dropped by and replied to my previous essay by making the following comment:

I believe you are hitting the nail on the “organic” vs “nonorganic” open source. I do not believe we have a model for going from one to the other. Linux and Apache both have very different models for contribution… but I don’t believe either are really optimized at this point.

Optimization to me would lead to a system of “less priests” and more inclusion.

I made an initial reply as comment, and then decided it was so long that I should promote it to a top-level post.

I assume that when Brian talks about “organic open source” what he means is what I was calling an “open source development community”. Some googling turned up the following definition from Mozilla Firefox’s organic software page: “Our most well-known product, Firefox, is created by an international movement of thousands, only a small percentage of whom are actual employees.”

This puts it in contrast with “non-organic” software, where all or nearly all of the developers are employed by one company. (And anyone who proves talented at adding features to that source base soon gets a job offer by that one company. :-) By that definition we can certainly see projects like Wine, Mysql, Ghostscript (at one time), and others as fitting into that model, and being quite successful. There’s nothing really wrong with the non-organic software model, although many of them have struggled to make enough money when competing with pure proprietary softare competitors, with MySQL perhaps being the exception which proves the rule.

In most of these cases, though, the project started more as an organic open source, and then transitioned into the non-organic model when there was a desire to monetize the project — and/or when the open source programmers decided that it would be nice if they could turn their avocation into a vocation, and let their hobby put food on the family table.

Solaris, of course, is doing something else quite different, though. They are trying to make the transition from a proprietary customer/supplier relationship to trying to develop an Open Source community — and what John’s candidate statement pointed out is that they weren’t really interested in creating an organic open source developer community at all, but they wanted the fruits of an open source community — with plenty of application developers, end-users, etc., all participating in that community.

We don’t have a lot of precedent for projects who try to go in this direction, but I suspect they are skipping a step when they try to go to the end step without bothering to try to make themselves open to outside developers. And by continuing to act like a corporation, they end up shooting themselves in the foot. For example, the OpenSolaris license still prohibits people from publishing benchmarks or comparisons with other operating systems. Very common in closed-source operating systems and databases, but it discourages people from even trying to make things better, both within and outside of the Open Solaris core team. Instead, they respond to posts like David Miller’s with “Have you ever kissed a girl?”. (Thanks, Simon, for that quote; I had seen it before, but not for a while, and it pretty well sums up the sheer arrogance of the Open Solaris development team.)

So while Linux may not be completely optimized in terms of “less priests” and more inclusion, at least over 1200 developers contributed to 2.6.25 during its development cycle. Compared to that, Open Solaris is positively dominated by “high priests” and with a “you may not touch the holy-of-holies” attitude; heck, they won’t even allow you to compare them to other religions without branding you a heretic and suing you for licensing violations!

What Sun was trying to do with Open Solaris

Sat, 04/19/2008 - 16:51

I was recently checking to see what, if any follow-up there had been from Sun’s ham-handed handling of the Open Solaris Trademark, and I ran across this very interesting comment from John Plocher’s Candidate Statement for the Open Solaris Governing Board:

“I also think there was a misunderstanding about what Sun desired when it launched the community (in part) to encourage developers to adopt and use Solaris. My take is that, while there *is* value in getting more kernel, driver and utility developers contributing to and porting the (open) Solaris operating system, there is significantly *more* value in having a whole undivided ecosystem based on a compatible set of distributions, where application developers, university students, custom distro builders and users are all able to take advantage of each other’s work.

Put these two things together, and you can see Sun’s predicament. Sun *wanted* a community that empowered application developers, but *got* a community aimed squarely at kernel hackers. Whether you see this as the “kernel.org -vs- Ubuntu” fight, or the “fully open -vs- MySQL model” argument, in my opinion, it all is simply a reflection of the above mismatched expectations.”

So that explains why it’s take three long years to try to get basic open source development tools (such as putting Open Solaris source code in a distributed SCM located outside of the Sun firewall) for Open Solaris. It was never was Sun’s intention to try to promote a kernel engineering community, or at least, it was certainly not a high priority for them to do so. This can be shown by the fact that as of this writing they still are using the incredibly clunky requester/sponsor system for getting patches into Solaris; setting up a git or mercurial server is not rocket science. This lack explains why Linus gets more contributions while brushing his teeth than Open Solaris gets in a week.

So if you run into a Sun salescritter or a Sun CEO claiming that OpenSolaris is just like Linux, it’s not. Fundamentally, Open Solaris has been released under a Open Source license, but it is not an Open Source development community. Maybe it will be someday, as some Sun executives have claimed, but it’s definitely not a priority by Sun; if it was, it would have been done before now. And why not? After all, they are getting all of the marketing benefit of claiming that Solaris is “just like Linux”, without having to deal with any of the messy costs of working with an outside community. As a tactical measure, astroturfing is certainly a valid marketing trick. But after three years, the excuse of “just you wait a little longer, we’re just trying to figure this open source community stuff out”, is starting to wear a little thin.

Furthermore, if (as John Ploncher claims) this was about “empowering application programmers”, why was it that Sun’s first act was to trumpet how wonderful it was to release the Solaris source code under a Open Source license? This only seems to make sense if the Open Solaris initiative was really a cynical marketing tactic to try to save Solaris from being viewed as irrelevant. If that was Sun’s intention, I think it is fair to say that from a marketing point of view, the tactic has been at least partially successful — although as John has admitted, the goal of creating a full community with application developers, university students, and so on, hasn’t materialized for Open Solaris. Sun has the dream; the Linux community is living it.

However, from business standpoint, I wonder if Sun will really be able to sustain their Solaris engineering team if they will really be doing all of the work themselves, and outside contributions continue at the rate of 0.6 patches per day. After all, the margins when you are selling low-cost AMD servers are much lower than when you are selling über-expensive SPARC servers. With Linux, we have a major advantage in that kernel improvements are coming from multiple companies in the ecosystem, instead of being paid for by a single company. And given that 70-80% of Sun’s AMD servers are running Linux, not Solaris, it’s not clear how Sun justifies their Solaris engineering costs to their shareholders. Furthermore, if Solaris on x86_64 were to actually take off, there’s nothing to stop competitors from selling Solaris support — except the competitors won’t have to pay the engineering costs to maintain and improve Solaris, so they would be able to provide the support much more cheaply than Sun could. So while Sun’s marketing tactics have kept Solaris alive in some verticals, I have to question how successful Sun will be in the long-term.

Update: I’ve posted a reply to Brian Aker’s comments as a follow up that would probably be interesting to those folks who are interested in the ideas found in this essay.

Update^2: John Plocher’s name is spelled with an ‘h’, which I had omitted. My deep apologies to him for getting his name wrong. I’ve fixed it in the post.

AT&T: Customer support horrors

Tue, 04/15/2008 - 10:23

This morning, I just wasted two hours of my life trying to deal with a bill with AT&T. I am a work-at-home employee, and my company has a contract with AT&T so that when I deal 1-700-xxx-xxxx, I can reach the internal corporate phone network. In addition, long distance calls on my home office line are billed to the company at the pre-negotiated corporate rates. I also had a (long-dormant) AT&T long distance account, dating from before I started working at this company. Starting at the beginning of the year, that account grew a $18 monthly fee. When I tried to make it go away, the AT&T consumer side of the house said that according to Verizon, that was because I had my long distance service through AT&T. Which was true, in a sense — AT&T was providing my service, but through a corporate account. But the consumer side of AT&T didn’t understand that, and were in my opinion, willfully ignorant.

Several phone calls later, I got the 1-800 number for the AT&T corporate side of the house, and those folks said they couldn’t look at consumer/personal AT&T accounts, and implied it was my fault that I had both accounts on the line. (This took over an hour while the support person tried multiple things, all of which was not helpful at all.)

I finally googled for AT&T CEO’s office, and found a number on the consumerist.com web site that claimed to be the AT&T CEO’s office. It had long since been directed to a help center, but after I told my tale, I was quickly transferred to an executive customer support person, who was able to fix the problem in ten minutes.

The only questions remaining are:

1) Why can’t the different parts of AT&T talk to one another?
2) Will this problem really be solved, or will I see another bill in a month or two and have to spend more time dealing with this mess all over again?
3) When will VOIP services put AT&T long distance out of its misery? (Given the absolute frustration of this morning, this can’t happen soon enough.)
4) Will AT&T compensate me for two hours of frustration and of my life that I will never get back. (Not bloody likely.)

One thing is for certain, it will be a long, long, LONG time before I will ever voluntarily choose AT&T to provide service for just about anything. Even an iPhone wouldn’t be enough inducement….