December 30, 2011

What is sequencing going to do to DNA chips?

When it comes to genotyping, DNA chips do a lot of the work. In this post, I ask how cheap sequencing might change that.

DNA chips provide information about the presence or absence of a limited number of known variants, but are incapable of providing information about new sequences. In other words, they only provide information about variants that are already known to exist. Despite this, DNA chips are widely used for genotyping, presumably because they're still much cheaper than sequencing, and because they can often provide more accurate information. The applications of DNA chips are widespread: Research both academic and commercial, molecular diagnostics, genetic counseling, personal genomics, forensics, hygiene monitoring, genealogy, and lots of others.

The ongoing decline in the cost of next-generation sequencing calls into question the dominance of DNA chips and microarrays in many applications. A report by J. P. Morgan from earlier this year states that "In general, we view arrays as a technology in decline, but likely with only modest changes in utilization (up or down) over the next 1-2 years [...] Greater declines are expected in 2013 and beyond".

Chip does not equal chip
It's unlikely that all areas of the genotyping market will be equally affected by the emergence of sequencing. A useful way to segment the genotyping market is into singleplex and multiplex. Singleplex genotyping only considers a single site in a genome, resulting in a simple yes or no answer about the presence of a genetic variant. Well-known technology platforms include Taqman and Invader. Multiplex genotyping, on the other hand, considers multiple sites. Well-known platforms include Affymetrix and Illumina.

To me it seems that multiplex genotyping is much more under threat from DNA sequencing than singleplex genotyping. The reason is that both multiplex genotyping and sequencing are suitable for interrogating a large number of sites in a small number of samples. Singleplex genotyping on the other hand is more suitable for interrogating a low number of sites in a large number of samples, and is therefore less under threat from DNA sequencing.
Sequencing is more likely to be a threat to multiplex than to singleplex genotyping. The arrow indicates the most likely trajectory for the sequencing market (conceptual)

Other advantages of singleplex genotyping compared to sequencing are that the turnaround time is quicker (in the region of one hour, compared to at least two hours for the fastest sequencers), that less sample preparation is needed, and that data management costs are lower. The emergence of cheap sequencing may even be an advantage for current singleplex platform providers: Sequencing may drive discoveries which in turn may increase demand for singleplex genotyping.

In summary, it seems likely to me that next-generation sequencing will be a threat to more sophisticated multiplex genotyping, whilst singleplex genotyping may not be affected or may even benefit. Do you consider this to be a sensible assessment, or am I missing something important?

December 23, 2011

Is disruptive innovation a threat to established sequencing companies?

In his book The Innovator's Dilemma, Clayton M. Christensen asks: Why do established companies fail when faced with disruptive technology? A commonly heard answer is that large companies grow complacent and that their managers just don't understand new technology. 

Christensen's answer is more nuanced than that. After all, large companies often have excellent management, yet struggle with disruptive innovation.
Disruptive innovation can be defined as technology that helps to create a new market that did not exist before. Because this market is new, it is still small, and therefore it does not make financial sense for large established companies to invest in it. For them, it is much more sensible to invest in markets that offer a return that is in proportion to their firm's size. For example, a €2bn company would not see the point of investing in a €200k market.

Because well-managed companies do not invest in non-existent or tiny markets, this leaves an opening that start-up companies can exploit. Start-ups, because they are small, are happy with a smaller market and the associated smaller profit. As the market grows, the start-ups grow with it. However, eventually the growing market will start to take over the established companies' market.

There seems to be a feeling amongst many people working in the field that the sequencing market is due for a disruptive innovation.
If the sequencing market develops in the way outlined in The Innovator's Dilemma, established providers of sequencing machines would continue to provide equipment to large research centres, whilst start-ups would move into new niches that are currently uninteresting for established providers. In the following posts, I'll invest more thought into whether this is a likely scenario.

December 16, 2011

Is the sequencing market saturated?

Since 2007, the cost of sequencing has declined by around 80% each year. This is amazing progress, a testimony to the ingenuity of the sequencing industry - and, I'll argue in this post, increasingly irrelevant.

My claim is based on the observation that sequence generation (library preparation and running the sequencing machine) is only part of the cost of a sequencing project. In a recent paper, Mark Gerstein and coworkers estimated that currently around a third of the cost of a sequencing project is spent on sequence generation, whilst the rest of the money goes to sample collection, data management, data analysis and other tasks. In a few years, those other costs are likely to remain at a similar level, whilst the cost of sequence generation is likely to continue to decline massively. Less than $100 per genome is not unrealistic.

This means that the cost of sequence generation in comparison to other costs becomes negligible, and that any further decreases in that cost will be irrelevant compared to the potential savings that can be achieved elsewhere.
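
To make this concrete, here is a minimal sketch of how quickly the share of sequence generation could shrink. It only uses figures from this post: generation is around a third of project cost today and falls by roughly 80% a year, whilst all other costs are assumed to stay flat. The function name and the five-year horizon are my own choices for illustration, not part of any published model.

```python
def generation_share(years, initial_share=1 / 3, annual_decline=0.80):
    """Fraction of total project cost spent on sequence generation.

    Simplifying assumptions: the other costs stay constant, and the
    cost of sequence generation falls by `annual_decline` every year.
    """
    generation = initial_share * (1 - annual_decline) ** years
    other = 1 - initial_share  # sample collection, analysis, etc., held constant
    return generation / (generation + other)


# Print the share for today (year 0) and the following five years.
for year in range(6):
    print(f"year {year}: {generation_share(year):.1%}")
```

Under these assumptions the share drops from a third to roughly 9% after one year and to well below 1% after three years.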

The cost of library preparation and actually running a sequencing machine (sequence generation) is likely to decline as a proportion of the total cost of a sequencing project (conceptual)
In other words, very soon the market demand for ever cheaper sequencing may be saturated. Nevertheless, it seems that low cost is still the single most important thing established sequencing companies are aiming for.

My guess is that this will provide an opportunity for start-ups that concentrate on different performance metrics, such as speed, simplicity or ease of use. In part, this is happening already. Ion Torrent has started selling a sequencer that is more expensive to run than the competition's. However, it is also more convenient to use and takes up less space. It is reported to be selling well.

Do you agree that future decreases in the cost of sequencing will become less important, whilst other performance metrics will become more important? If yes, what are these performance metrics likely to be?

December 9, 2011

Is it cheaper to re-sequence a genome than to save it in computer memory?

Scenario: Each time you visit your doctor for a check-up, she measures your blood pressure and sequences your genome. After looking for signs that there is something wrong with you and finding nothing, she forgets your blood pressure and deletes your genome from her computer. The next time you see her, and at every subsequent check-up, she goes through the same procedure again.
Would it really be cheaper to re-sequence a genome rather than storing it?

The short answer is: No, it wouldn't - yet. Currently the cost of sequencing a human genome is $4,000, and the cost of saving it on a hard disk is $140. Clearly, it's cheaper to save than to sequence.

This is going to change. The reason is that the cost of sequencing decreases much faster than the cost of data storage. If things continue like this, the question is: When will it become cheaper to re-sequence than to store?

To get a feel for what the answer may look like, I extrapolated from current trends. I know that only half a year ago, in July 2011, it cost around $10,000 to sequence a genome. Since 2007 this cost has decreased by a staggering 81% a year.

Similarly, I know that according to Kryder's law, a derivative of Moore's law, the cost of storing a byte of data on a hard disk has decreased by 29% a year over the last few decades.

Extrapolating from those two trends, it emerges that it will be cheaper to re-sequence than to store genomic data at some point in 2014 (the point where the two lines cross).
Cost of sequencing a genome, and of storing it on a hard drive, extrapolated from current trends (conceptual)
Something that may be even more unbelievable is that if sequencing costs keep falling at the same rate as they have since 2007, it will cost less than one dollar to sequence a genome at some point in 2016.
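
For readers who want to check the extrapolation, here is a rough sketch of the arithmetic. It assumes that both costs keep falling by a constant percentage each year, starting from the late-2011 figures quoted above. Treating "now" as December 2011 and the function names are my own choices, and the result is only as good as those assumptions.

```python
import math

# Figures quoted in this post (late 2011)
SEQ_COST = 4000.0       # $ to sequence one human genome
STORE_COST = 140.0      # $ to store one genome on a hard disk
SEQ_DECLINE = 0.81      # yearly fall in sequencing cost since 2007
STORE_DECLINE = 0.29    # yearly fall in storage cost (Kryder's law)
START = 2011 + 11 / 12  # December 2011 as a decimal year (my assumption)


def years_until_crossover():
    """Years until sequencing a genome costs the same as storing it.

    Solves SEQ_COST * (1 - SEQ_DECLINE)**t == STORE_COST * (1 - STORE_DECLINE)**t.
    """
    ratio_now = SEQ_COST / STORE_COST
    relative_decay = (1 - STORE_DECLINE) / (1 - SEQ_DECLINE)
    return math.log(ratio_now) / math.log(relative_decay)


def years_until_sequencing_costs(target):
    """Years until sequencing a genome costs `target` dollars."""
    return math.log(SEQ_COST / target) / -math.log(1 - SEQ_DECLINE)


print(f"crossover: {START + years_until_crossover():.1f}")
print(f"$1 genome: {START + years_until_sequencing_costs(1.0):.1f}")
```

With these inputs the crossover falls about two and a half years after December 2011 (mid-2014), and the one-dollar genome about five years after it (late 2016), in line with the estimates in the text.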

Do you think that this is realistic, and if yes, what are the implications?
Underlying assumptions
Above I made some assumptions that are worth spelling out explicitly.

Whilst I expected the cost of sequencing per base to continue decreasing, I expected coverage (i.e. how often you have to sequence a base to be sure your data is correct) to remain the same. I also assumed that it'd be desirable to store all sequence data, rather than just information on where the sequence differs from what is expected (variation), which would require much less disk space.

I haven't included sample analysis or experimental design in the sequencing cost. I haven't included the cost of running the hard drive either. That's because it isn't necessary to run the drive unless you put data on it or take it off.

Probably the biggest assumption is that the cost of sequencing and computer memory will decline at the same rates as in the past. Nevertheless, I think that the main conclusion - that it will be cheaper to re-sequence than to save - will hold even if the rates of decline of sequencing and memory cost change, as long as sequencing cost still declines faster than memory cost. The only difference will be when it becomes cheaper to re-sequence than to save.
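
As a quick sanity check of that last point, the sketch below varies the rate of decline of the sequencing cost. The starting costs are the late-2011 figures from this post. The alternative rates (60%, 40% and 29% a year) are hypothetical values I picked purely for illustration, and the function name is my own.

```python
import math


def crossover_years(seq_decline, store_decline, seq_cost=4000.0, store_cost=140.0):
    """Years until re-sequencing is cheaper than storing, or None if never.

    Assumes constant yearly percentage declines and that sequencing
    starts out more expensive than storage.
    """
    if seq_decline <= store_decline:
        return None  # sequencing never catches up with storage
    relative_decay = (1 - store_decline) / (1 - seq_decline)
    return math.log(seq_cost / store_cost) / math.log(relative_decay)


# First pair: the historical rates from the post. The rest are hypothetical.
for seq_d, store_d in [(0.81, 0.29), (0.60, 0.29), (0.40, 0.29), (0.29, 0.29)]:
    years = crossover_years(seq_d, store_d)
    label = "never" if years is None else f"after {years:.1f} years"
    print(f"sequencing -{seq_d:.0%}/yr, storage -{store_d:.0%}/yr: {label}")
```

The crossover moves further into the future as the gap between the two rates narrows, and disappears only once sequencing stops declining faster than storage.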

December 3, 2011

What's Seqonomics about?

Genome sequencing is evolving rapidly. The most obvious change is that it's becoming cheaper. The first human genome, released in 2001, cost a billion dollars. As of late 2011, the cost of sequencing a genome is $4,000 - this means that today you could sequence a quarter of a million people with the budget of the Human Genome Project.

But sequencing is changing in other ways too: Sequencing machines are becoming smaller and easier to use. As a result, new markets and new applications emerge.

In this blog, I'll explore the changes that are happening in the DNA sequencing market right now, asking questions such as:
  • Does the history of the computing industry teach us anything about the future of the sequencing industry?
  • What new applications will there be for radically cheaper DNA sequencing technology?
  • Will genome sequences be mainly stored in the cloud?
  • How will the purchase criteria for sequencing machines change in the future?
For the next few months, I'll publish a new post every Friday. My entries will follow the dictum "Better boldly wrong rather than not having tried at all". As a result, I expect my posts to sometimes be controversial, and I hope that they'll cause some interesting discussion.