Greetings everyone! I am back after a long break. I was busy with some other aspects of my life, and now I have decided to keep updating this blog to share my knowledge and experience with you all. It always gives me a great feeling when I share something with you and learn a lot from you through your mails, comments, and chats.
One day, I was thinking about some scenarios involving large databases, when a question clicked in my mind: what is the size of the largest database in the world, and what database are they using? My first guess was Google, so I searched the net and found some interesting facts that I would like to share with you all. Let's get started, and just keep in mind that size alone does not determine how big a database is; it's the information it contains in the form of fields and records, and ultimately it all depends on the technology being employed for storage and management.
No 1. World Data Centre for Climate (WDCC) :
Operated by the Max Planck Institute for Meteorology and the German Climate Computing Centre, the World Data Centre for Climate is the largest database in the world, with 220 terabytes of data readily available on the internet. Add to that 110 terabytes of climate simulation data and 6 petabytes of data stored on magnetic tapes.
The WDCC is included in the CERA database system. Access to the CERA database is possible from the Internet using a Java-based browser. The CERA (Climate and Environmental Retrieving and Archiving) data archive is realised on an Oracle database connected to an STK silo system. Large data sets can thus be stored under the control of this system, while the metadata associated with CERA make it easy to locate the data that have to be retrieved.
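To make that idea concrete, here is a minimal sketch of how such a metadata-driven archive lookup might work. The table layout, column names and the use of sqlite3 as a stand-in for the Oracle back end are all assumptions for illustration; this is not the actual CERA schema.

```python
import sqlite3

# Stand-in for the Oracle-backed CERA metadata catalogue (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cera_metadata (
        dataset_id   TEXT PRIMARY KEY,
        experiment   TEXT,
        variable     TEXT,
        time_range   TEXT,
        storage_tier TEXT,   -- e.g. 'disk' or 'tape-silo'
        tape_volume  TEXT    -- which silo volume holds the data blobs
    )
""")
conn.execute(
    "INSERT INTO cera_metadata VALUES (?, ?, ?, ?, ?, ?)",
    ("exp42_tas_1990", "exp42", "near-surface temperature",
     "1990-01..1990-12", "tape-silo", "VOL01234"),
)

# A retrieval request first consults the metadata, so it knows exactly
# which storage location (disk or tape silo volume) to stage data from.
row = conn.execute(
    "SELECT storage_tier, tape_volume FROM cera_metadata WHERE dataset_id = ?",
    ("exp42_tas_1990",),
).fetchone()
print("stage from:", row)
```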
No 2. National Energy Research Scientific Computing Center (NERSC) :
Based in Oakland, California, the National Energy Research Scientific Computing Center (NERSC) is operated by Lawrence Berkeley National Laboratory for the U.S. Department of Energy. Included in its database of 2.8 petabytes is information on atomic energy research, high-energy physics experiments and simulations of the early universe.
The High Performance Storage System (HPSS) is a modern, flexible, performance-oriented mass storage system. It has been used at NERSC for archival storage since 1998.
No 3. AT&T :
Bigger than Sprint, AT&T boasts 1.9 trillion calling records, which add up to 323 terabytes of information. One factor behind the massiveness of its database is that AT&T has been maintaining databases since before the technology to store terabytes was even available.
The Daytona® data management system is used by AT&T to solve a wide spectrum of data management problems. For example, Daytona is managing over 312 terabytes of data in a 7x24 production data warehouse whose largest table contained over 743 billion records as of September 2005. Indeed, for this database, Daytona is managing over 1.924 trillion records and, according to AT&T, could easily manage more; they simply ran out of data. Update: as of June 2007, Daytona is managing over 2.8 trillion records in this same data warehouse, with over 938 billion records in the largest table.
AT&T is the sole source for the Daytona product, service and support and is the only company authorized to use the Daytona trademark for a database product.
No 4. Google :
The list wouldn't be complete without Google. Handling an estimated 91 to 100 million searches per day, Google runs one of the largest databases in the world, said to hold over 33 trillion entries. Although the exact size of Google's database is unknown, it's said that Google records every single search made each day and mines patterns from previous searches so that users can be directed more easily. Google also collects information about its users and stores it as entries in its database, which is said to exceed 33 trillion entries. On top of that, Google has simply expanded its database with Gmail and Google Ads, and with acquisitions like YouTube.
Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).
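As Google's Bigtable paper describes it, the data model is essentially a sparse, sorted map indexed by a row key, a column key, and a timestamp. The tiny in-memory sketch below only illustrates that indexing idea; it is not Google's implementation, and the row and column names are made up.

```python
import time
from collections import defaultdict

class TinyBigtable:
    """Toy illustration of Bigtable's (row, column, timestamp) -> value map."""

    def __init__(self):
        # row key -> column key -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else time.time()
        cells = self._rows[row][column]
        cells.append((ts, value))
        cells.sort(reverse=True)          # keep the newest version first

    def get(self, row, column, n_versions=1):
        return self._rows[row][column][:n_versions]

# Web-indexing style usage: the row key is the (reversed) URL and the
# "contents:" column holds timestamped snapshots of the page.
table = TinyBigtable()
table.put("com.example/index.html", "contents:", "<html>v1</html>", ts=1)
table.put("com.example/index.html", "contents:", "<html>v2</html>", ts=2)
print(table.get("com.example/index.html", "contents:"))   # newest version only
```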
No 5. Sprint :
The third largest wireless telecommunications network in the US has a database of over 55 million users. Sprint processes over 365 million call detail records per day. Making up its huge database are 2.85 trillion rows of information.
No 6. LexisNexis :
LexisNexis is a company providing computer-assisted legal research services. It bought ChoicePoint in 2008; ChoicePoint was in the business of acquiring information about the American population, including everything from phone numbers to criminal histories, and had accumulated over 250 terabytes of data on the American population by the time it was bought by LexisNexis.
No 7. YouTube :
With over 60 hours of video uploaded per minute (yes, per minute), YouTube has a video database of around 45 terabytes. Reports say that about 100 million videos are watched on YouTube every day, which is about 60% of the overall number of videos watched online.
No 8. Amazon :
Containing records of more than 60 million active users, Amazon also has more than 250,000 full text books available online and allows users to comment and interact on virtually every page of the website. Overall, the Amazon database is over 42 terabytes in size.
Amazon SimpleDB is a highly available and flexible non-relational data store that offloads the work of database administration. Developers simply store and query data items via web services requests and Amazon SimpleDB does the rest.
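To give a feel for that model, here is a purely local, toy sketch of SimpleDB's domain / item / attribute structure and a simple attribute query. The real service is reached through signed web-service requests (for example via an AWS SDK); the domain, item and attribute names below are invented for illustration.

```python
# Toy, in-memory illustration of SimpleDB's data model:
# a domain holds items, and each item is a bag of attribute name/value pairs.
domain = {}   # pretend this is a SimpleDB domain called "products"

def put_attributes(item_name, attributes):
    """Store or update an item's attributes in the domain."""
    domain.setdefault(item_name, {}).update(attributes)

def select(attr, value):
    """Roughly what "SELECT * FROM products WHERE colour = 'red'" would return."""
    return {name: attrs for name, attrs in domain.items()
            if attrs.get(attr) == value}

put_attributes("item_001", {"title": "Coffee mug", "colour": "red", "price": "7.99"})
put_attributes("item_002", {"title": "Notebook", "colour": "blue", "price": "3.49"})

print(select("colour", "red"))   # -> {'item_001': {...}}
```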
No 9. Central Intelligence Agency :
With its exact size understandably not made public, the CIA maintains comprehensive statistics on more than 250 countries of the world and also collects and distributes information on people, places and things. Although the database is not open to the public, portions of it are made available; the Freedom of Information Act (FOIA) Electronic Reading Room is one such example, where hundreds of items from the database are added monthly.
The ARC maintains an automated system containing information concerning each individual accession ("job"). The ARC database includes detailed information at the file-folder level for each accession retired after 1978, including the job number, box and file number, file title, level of security classification, inclusive dates, and disposition instructions, including the date when the disposition action will be taken. Less detailed information is maintained for accessions retired before 1978.
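A rough way to picture one of those accession records is as a small structured entry like the sketch below. The field names simply mirror the description above and are not the ARC system's actual schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AccessionRecord:
    """Hypothetical shape of one file-folder-level accession entry."""
    job_number: str
    box_number: str
    file_number: str
    file_title: str
    classification: str          # level of security classification
    inclusive_dates: str         # e.g. "1979-1984"
    disposition: str             # what will eventually happen to the records
    disposition_date: date       # when the disposition action will be taken

record = AccessionRecord(
    job_number="79-0001", box_number="12", file_number="3",
    file_title="Example folder title", classification="UNCLASSIFIED",
    inclusive_dates="1979-1984", disposition="destroy",
    disposition_date=date(2005, 1, 1),
)
print(record.job_number, record.disposition_date)
```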
No 10. Library of Congress :
With over 130 million items, including 29 million books, photographs and maps, 10,000 new items added each day and nearly 530 miles of shelves, the Library of Congress is a wonder to behold in itself. The text portion of the library alone would take up 20 terabytes of space. If the internet isn't helping you out in your research, head to the oldest federal cultural institution in the United States, in Washington, DC.
The Library of Congress offers a wide variety of online databases and Internet resources to the public via the Web, including its own online catalog. In addition, LC provides an easy-to-use gateway for searching other institutions' online catalogs and extensive links to resources on the Internet.
Enjoy :-)