“Early Experiments in Cloud Computing”
by Galen Gruman
InfoWorld.com
April 7, 2008

What do the New York Times and the Nasdaq have in common?  Both companies have made a critical and substantial leap into the world of cloud computing through Amazon.com.  Amazon offers its cloud computing infrastructure to third parties for “internet-provisioned computing and storage services.” [pg. 3]  The two Amazon products involved are called Elastic Compute Cloud (EC2) and Simple Storage Service (S3).
The New York Times used S3 and EC2 to convert 11 million articles published between 1851 (the year the newspaper was founded) and 1989 from TIFF to PDF files.  The newspaper converted the files to shrink them and make them accessible through the nytimes.com search engine.  What actually happened?  The New York Times cut old newspapers into columns that fit its scanners, scanned them into TIFF format, and uploaded the files to S3.  In total, the TIFF data took up 4 TB of storage space (1 TB = 1,000 GB).  Then, the New York Times used EC2 to convert the 4 TB of raw data (the articles as scanned) into PDF files, reducing the total size of the data to roughly 1.5 TB.  The greatest part of the story is that “[t]he Times didn’t coordinate the job with Amazon – someone in IT just signed up for the service on the web using a credit card, then began uploading the data.” [pg. 4]  The New York Times IT staff completed the job in roughly 24 hours using 100 Linux computers.
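To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the variable names are made up for this summary, the averages assume the work was spread evenly across the 100 machines, and the sizes use the decimal convention above (1 TB = 1,000 GB = 1,000,000,000 KB).

```python
# Figures quoted in the summary above; everything derived is a rough average.
ARTICLES = 11_000_000   # articles converted
TIFF_TB = 4.0           # raw scanned data uploaded to S3
PDF_TB = 1.5            # approximate size after conversion on EC2
MACHINES = 100          # Linux computers used
HOURS = 24              # "roughly 24 hours"

# Total compute consumed, assuming every machine ran for the whole job.
machine_hours = MACHINES * HOURS

# Fraction of storage saved by converting TIFF to PDF.
size_reduction = 1 - PDF_TB / TIFF_TB

# Average throughput per machine, assuming an even split of the work.
articles_per_machine_hour = ARTICLES / machine_hours

# Average TIFF data per article in KB (decimal units: 1 TB = 1e9 KB).
avg_tiff_kb_per_article = TIFF_TB * 1_000_000_000 / ARTICLES

print(f"machine-hours: {machine_hours}")
print(f"size reduction: {size_reduction:.1%}")
print(f"articles per machine-hour: {articles_per_machine_hour:.0f}")
print(f"average TIFF data per article: {avg_tiff_kb_per_article:.0f} KB")
```

On these assumptions the job amounts to about 2,400 machine-hours and cuts the stored data by a little over 60 percent.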
Nasdaq sought to earn extra revenue by selling historical data on stocks and investment funds.  Nasdaq turned to Amazon to host the data on S3 and built a special reader application with Adobe AIR technology so that customers could view the data they purchased.
“The traditional approach wouldn’t have gotten off the ground economically…[t]he expenses of keeping all that data online was too high.  So Nasdaq took its market data and created flat files for every entity, each holding enough data for a 10-minute replay of the stock’s or fund’s price changes, on a second-by-second basis.  (It adds 100,000 files per day to the several million it started with.)  The Adobe AIR app Courbois’ [the VP for Data Products at Nasdaq] team put together in just a couple of days pulls in the flat files stored at Amazon.com and then creates the replay animations from them.” [pg. 4]
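The flat-file layout described in the quote can be illustrated with a short Python sketch. The article does not give Nasdaq's actual file format or naming scheme, so the key layout and the function names below are purely hypothetical; only the 10-minute window, the second-by-second sampling, and the 100,000 files per day come from the quote.

```python
from datetime import datetime

WINDOW_SECONDS = 10 * 60           # each flat file covers a 10-minute replay
SAMPLES_PER_FILE = WINDOW_SECONDS  # one price sample per second
FILES_PER_DAY = 100_000            # new files added daily, per the quote

# Upper bound on price samples added per day if every file is full.
samples_per_day = SAMPLES_PER_FILE * FILES_PER_DAY


def window_start(ts: datetime) -> datetime:
    """Return the start of the 10-minute window that contains ts."""
    return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)


def flat_file_key(symbol: str, ts: datetime) -> str:
    """Build a hypothetical storage key for the file holding ts.

    The "SYMBOL/YYYYMMDD-HHMM.txt" layout is an assumption made for
    illustration, not Nasdaq's real naming scheme.
    """
    return f"{symbol}/{window_start(ts):%Y%m%d-%H%M}.txt"


key = flat_file_key("XYZ", datetime(2008, 4, 7, 9, 47, 12))
print(key)
print(f"samples per file: {SAMPLES_PER_FILE}, per day: {samples_per_day:,}")
```

Under this assumed scheme, a reader application only has to compute one key per symbol and time window, then fetch that single small file to animate the replay.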
Several issues in this story are pertinent to further discussion of policy matters and cloud computing.  As in many other situations, there is a concern about privacy.  Both Nasdaq and the New York Times should also be concerned about the safety and availability of their data.  If either company wants access to its information, it is at the mercy of Amazon, rather than controlling its own data by purchasing and maintaining its own additional servers to do the job.  Of course, both companies weighed this risk, and evidently each determined that the cost (the risk of holding its data remotely) was outweighed by the benefits (saving considerable money on hardware as well as on the labor to maintain the hardware that houses the data).