December 9, 2011

Is it cheaper to re-sequence a genome than to save it in computer memory?

Scenario: Each time you visit your doctor for a check-up, she measures your blood pressure and sequences your genome. After looking for signs that there is something wrong with you and finding nothing, she forgets your blood pressure and deletes your genome from her computer. The next time you see her, and at every subsequent check-up, she goes through the same procedure again.
Would it really be cheaper to re-sequence a genome than to store it?

The short answer is: No, it wouldn't - yet. Currently the cost of sequencing a human genome is $4000, and the cost of saving it on a hard disk is $140. Clearly, it's cheaper to save than to sequence.

This is going to change. The reason is that the cost of sequencing decreases much faster than the cost of data storage. If things continue like this, the question is: When will it become cheaper to re-sequence than to store?

To get a feel for what the answer may look like, I extrapolated from current trends. I know that only half a year ago, in July 2011, it cost around $10,000 to sequence a genome. Since 2007 this cost has decreased by a staggering 81% a year.

Similarly, I know that according to Kryder's law, a derivative of Moore's law, the cost of storing a byte of data on a hard disk has decreased by 29% a year over the last decades.

Extrapolating from those two trends, it emerges that it will be cheaper to re-sequence than to store genomic data at some point in 2014 (the point where the two lines cross).

Figure: Cost of sequencing a genome, and of storing it on a hard drive, extrapolated from current trends (conceptual)

Something that may be even more unbelievable is that if sequencing costs keep falling at the same rate as they have since 2007, it will cost less than one dollar to sequence a genome at some point in 2016.
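
For anyone who wants to check the arithmetic, here is a minimal sketch of the extrapolation in Python. It uses only the figures quoted above ($4000 and $140 today, annual declines of 81% and 29%) and assumes both costs keep falling at a constant exponential rate. The function names and the decimal start date for December 2011 are my own illustrative choices. With these inputs the crossover lands in 2014 and the one-dollar genome in 2016.

```python
import math

# Figures quoted in the post (December 2011).
SEQ_COST_NOW = 4000.0         # dollars to sequence one human genome
STORE_COST_NOW = 140.0        # dollars to store one genome on a hard disk
SEQ_ANNUAL_DECLINE = 0.81     # sequencing cost falls by 81% a year
STORE_ANNUAL_DECLINE = 0.29   # storage cost falls by 29% a year (Kryder's law)
START_YEAR = 2011 + 11 / 12   # assumption: early December 2011 as a decimal year


def cost_after(cost_now, annual_decline, years):
    """Cost after `years` of constant exponential decline."""
    return cost_now * (1.0 - annual_decline) ** years


def years_until(cost_now, annual_decline, target):
    """Years until a steadily declining cost reaches `target`."""
    return math.log(target / cost_now) / math.log(1.0 - annual_decline)


def crossover_years(seq_now, seq_decline, store_now, store_decline):
    """Years until the sequencing cost equals the storage cost.

    Solves seq_now * (1 - seq_decline)**t == store_now * (1 - store_decline)**t.
    """
    ratio_now = store_now / seq_now
    ratio_per_year = (1.0 - seq_decline) / (1.0 - store_decline)
    return math.log(ratio_now) / math.log(ratio_per_year)


t_cross = crossover_years(
    SEQ_COST_NOW, SEQ_ANNUAL_DECLINE, STORE_COST_NOW, STORE_ANNUAL_DECLINE
)
t_dollar = years_until(SEQ_COST_NOW, SEQ_ANNUAL_DECLINE, 1.0)

print(f"Re-sequencing becomes cheaper than storing around {START_YEAR + t_cross:.1f}")
print(f"Sequencing drops below one dollar around {START_YEAR + t_dollar:.1f}")
```

Note that the crossover date depends only on the ratio of the two costs today and on the ratio of the two yearly decline factors, so the absolute dollar amounts matter less than how far apart they are.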

Do you think that this is realistic, and if so, what are the implications?
Underlying assumptions
Above I made some assumptions that are worth spelling out explicitly.

Whilst I assumed that the cost of sequencing per base would continue to decrease, I assumed that coverage (i.e. how many times you have to sequence each base to be sure your data is correct) would remain the same. I also assumed that it'd be desirable to store all sequence data, rather than just information on where the sequence differs from what is expected (variation), which would require much less disk space.

I haven't included sample analysis or experimental design in the sequencing cost. I haven't included the cost of running the hard drive either. That's because it isn't necessary to run the drive unless you put data on it or take it off.

Probably the biggest assumption is that the costs of sequencing and of computer memory will keep declining at the same rates as in the past. Nevertheless, I think that the main conclusion - that it will be cheaper to re-sequence than to save - will hold even if the rates of decline change, as long as sequencing cost still declines faster than memory cost. The only difference will be when it becomes cheaper to re-sequence than to save.
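
To illustrate that last point, here is a rough sensitivity check in Python. It keeps the storage decline at 29% a year and tries a few alternative sequencing decline rates. The alternative rates (60%, 40%, 29%) are hypothetical values I picked for illustration, not measurements, and the function name is my own. As long as sequencing falls faster than storage, a crossover exists; it just moves further into the future.

```python
import math


def crossover_years(seq_now, seq_decline, store_now, store_decline):
    """Years until sequencing costs the same as storage, or None if never.

    Assumes both costs decline at constant annual rates.
    """
    if seq_decline <= store_decline:
        # Sequencing is not falling faster than storage, so it never catches up.
        return None
    ratio_now = store_now / seq_now
    ratio_per_year = (1.0 - seq_decline) / (1.0 - store_decline)
    return math.log(ratio_now) / math.log(ratio_per_year)


SEQ_COST_NOW = 4000.0        # dollars, from the post
STORE_COST_NOW = 140.0       # dollars, from the post
STORE_ANNUAL_DECLINE = 0.29  # from the post (Kryder's law)

# 0.81 is the rate from the post; the others are hypothetical slower rates.
for seq_decline in (0.81, 0.60, 0.40, 0.29):
    t = crossover_years(
        SEQ_COST_NOW, seq_decline, STORE_COST_NOW, STORE_ANNUAL_DECLINE
    )
    if t is None:
        print(f"sequencing -{seq_decline:.0%}/year: no crossover")
    else:
        print(f"sequencing -{seq_decline:.0%}/year: crossover in {t:.1f} years")
```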


  1. Thought provoking post - thanks!

    Key point surely is that the next-gen of next-gen will include a whole slew of epigenetic modifications which will presumably mean one has to retain the data to be able to compare...

    "Cost" perhaps more considerable for data mining - intriguing to think that GPs could rely on this intensive data analysis in future.

    (By no means an expert in this area).

  2. Maybe you could try to factor in
    a) labour costs of seq prep etc, which are also declining
    b) analysis costs

    and also mention that
    c) rare / irreplaceable samples will need to be kept.