Privacy


“Early Experiments in Cloud Computing”
by Galen Gruman
InfoWorld.com
April 7, 2008

What do the New York Times and the Nasdaq have in common?  Both companies have made a critical and substantial leap into the world of cloud computing through Amazon.com.  Amazon offers its cloud computing infrastructure to third parties for “internet-provisioned computing and storage services.” [pg. 3]  Amazon’s two products are called Elastic Compute Cloud (EC2) and Simple Storage Service (S3).
The New York Times used Amazon’s services to convert 11 million articles published between 1851 (the year the newspaper was founded) and 1989 from TIFF to PDF files. The newspaper converted the files to shrink them and make them accessible through the nytimes.com search engine. What actually happened? The New York Times cut up old newspapers into columns that fit its scanners, scanned them in TIFF format, and then uploaded the files to S3. In total, the TIFF data took up 4 TB of storage space (1 TB = 1,000 GB). Then the New York Times used EC2 to convert the 4 TB of raw data (the articles as scanned) into PDF files, reducing the total size of the data to roughly 1.5 TB. The most striking part of the story is that “[t]he Times didn’t coordinate the job with Amazon – someone in IT just signed up for the service on the web using a credit card, then began uploading the data.” [pg. 4] The New York Times IT staff completed the job in roughly 24 hours using 100 Linux computers.
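To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers reported above and makes two assumptions that the article does not state: that each article is one file, and that the work was split evenly across the 100 computers. The function and key names are illustrative only.

```python
# Figures as reported in the summary above ("roughly" 24 hours, "roughly" 1.5 TB).
ARTICLES = 11_000_000
MACHINES = 100
HOURS = 24
TIFF_TB = 4.0
PDF_TB = 1.5


def job_estimates(articles, machines, hours, tiff_tb, pdf_tb):
    """Estimate per-machine workload and storage savings.

    Assumes one file per article and an even split of work across
    machines; neither detail is given in the article.
    """
    per_machine = articles / machines
    per_machine_per_second = per_machine / (hours * 3600)
    size_reduction = 1 - pdf_tb / tiff_tb
    # Decimal units, matching the text: 1 TB = 1,000 GB = 1,000,000,000 KB.
    avg_tiff_kb = tiff_tb * 1_000_000_000 / articles
    return {
        "articles_per_machine": per_machine,
        "articles_per_machine_per_second": per_machine_per_second,
        "size_reduction": size_reduction,
        "avg_tiff_kb": avg_tiff_kb,
    }


estimates = job_estimates(ARTICLES, MACHINES, HOURS, TIFF_TB, PDF_TB)
```

Under those assumptions, each computer handled about 110,000 articles, a little more than one per second, and the PDF versions took up 62.5 percent less space than the TIFF originals.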
Nasdaq sought to earn extra revenue by selling historical data on stocks and investment funds. Nasdaq turned to Amazon to host the data on S3, and its own team built a special reader application using Adobe AIR so that customers could view the data they purchased.
“The traditional approach wouldn’t have gotten off the ground economically…[t]he expenses of keeping all that data online was too high.  So Nasdaq took its market data and created flat files for every entity, each holding enough data for a 10-minute replay of the stock’s or fund’s price changes, on a second-by-second basis.  (It adds 100,000 files per day to the several million it started with.)  The Adobe AIR app Courbois’ (the VP for Data Products at Nasdaq) team put together in just a couple of days pulls in the flat files stored at Amazon.com and then creates the replay animations from them.” [pg. 4]
Several issues arise in this story that are pertinent to further discussion of policy matters and cloud computing. As in many other situations, there is a concern about privacy. Both Nasdaq and the New York Times should also be concerned about the safety and availability of their data. If either company wants access to its information, it is at the mercy of Amazon, rather than controlling its own data by purchasing and maintaining additional servers of its own to do the job. Of course, both companies weighed this risk, and it is evident that in both cases they determined that the cost (the risk of holding their data remotely) was outweighed by the benefits (saving a great deal of money on hardware, as well as on the labor needed to maintain the hardware that houses the data).

“Computing In The Clouds”
by Aaron Weiss
The Guide to Computing Literature, Networker Magazine
December 2007

In 1943, IBM Chairman Thomas Watson said, “I think there’s a world market for maybe five computers.” [pg. 18] The personal computing industry that began in the 1970s and the current popularity of cloud computing prove that Watson’s statement could not have been more wrong. Weiss defines cloud computing generally as the ability to distribute computing processes over a large number of small computers or servers in order to use resources as efficiently as possible. The idea is that when a user runs an internet search through Google, for example, Google can distribute the work of the search over a large number of computers rather than having one large, powerful computer do the search and return the results. The relevant question, in this case, is how Google can most efficiently distribute the search across many individual computers or servers in order to shorten the time it takes to conduct the search and return the results to the user.
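The idea of spreading one search over many machines can be illustrated with a toy sketch in Python. This is not how Google actually implements search; the article gives no such detail. It only shows the general pattern Weiss describes: split the data into pieces ("shards"), search every piece at the same time, and combine the partial results. Threads stand in for separate computers, and the shard contents and function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Return the documents in one shard whose text contains the term."""
    term = term.lower()
    return [doc for doc in shard if term in doc.lower()]


def distributed_search(shards, term):
    """Send the query to every shard in parallel, then merge the hits."""
    workers = max(1, len(shards))  # ThreadPoolExecutor needs at least 1 worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in shard order, so the merged list is stable.
        partial_results = list(
            pool.map(search_shard, shards, [term] * len(shards))
        )
    return [doc for hits in partial_results for doc in hits]


# Hypothetical data: each inner list stands in for one small server's documents.
SHARDS = [
    ["Cloud computing basics", "Thin client history"],
    ["Privacy in the cloud", "Utility computing pricing"],
    ["Web-based email", "Cloud storage services"],
]

results = distributed_search(SHARDS, "cloud")
```

Because each shard is searched independently, the user waits roughly as long as the slowest shard takes rather than the sum of all of them, which is the efficiency gain the article points to.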

The article also provides working definitions of software as a service (SaaS) and utility computing in order to show how they relate to, or should be considered part of, the larger cloud computing phenomenon. The most important and influential example of SaaS to date is web-based email. While many individuals, organizations and companies do not depend entirely on web-based email, the trend is quickly moving in that direction. Weiss describes SaaS, as in the case of web-based email, as merely a revival of an older concept known as “thin client” computing. In the realm of cloud computing, the most pressing concern is privacy, because operating in the cloud and allowing a third party to store and/or process your digital information requires a high level of trust.

This article is relevant because it sheds light on the fact that cloud computing is a popular “buzzword almost designed to be vague[.]” [pg. 25] One reaction to this piece is to conclude that it is not possible to provide a complete definition of the term ‘cloud computing.’ A more appropriate conclusion, however, might be to think of cloud computing as a trend that “draws on many existing technologies and architectures.” [pg. 25]

For a while, I used the RSS reader Newsgator.  The cool thing about this reader was that when I was at work, I could check my feeds online, but when I got home, I could synchronize my desktop reader (which is more powerful than its online counterpart) to recognize what I had saved online.  Pretty handy.

But without fundamental privacy safeguards in place, this means that Newsgator can tap into my online and desktop activity and use it however it sees fit. Don’t get me wrong, I love the convenience of desktop-online synchronization (and for free, no less!), but should convenience and privacy really be an either/or choice?