Saturday, April 20, 2013

Largest Databases of the World

Greetings everyone. I am back after a long break. I was busy with other aspects of my life, and now I have decided to keep updating this blog to share my knowledge and experience with you all. It always gives me a great feeling when I share something with you all and learn a lot from you through your mails, comments and chat.

One day, I was thinking through some scenarios involving a large database when a question clicked in my mind: "What is the size of the largest database in the world, and what database software does it run on?" I guessed it might be Google, so I searched the net and found some interesting facts which I would like to share with you all. Let's get started, and keep in mind that raw size alone does not determine how big a database is; what matters is the information it contains in the form of fields and records, and ultimately it all depends on the technology employed for storage and management.

No 1. The World Data Centre for Climate (WDCC)
The WDCC is operated by the Max Planck Institute for Meteorology and the German Climate Computing Centre. It is the largest database in the world, with 220 terabytes of data readily available on the internet. Add to that 110 terabytes of climate simulation data and 6 petabytes of data stored on magnetic tapes.

The WDCC is included in the CERA database system. The CERA database can be accessed from the Internet using a Java-based browser. The CERA (Climate and Environmental Retrieving and Archiving) data archive is implemented on an Oracle database connected to an STK silo system. Large data sets can thus be stored under the control of this system, while the metadata associated with CERA make it easy to locate the data that have to be retrieved.

No 2. National Energy Research Scientific Computing Center (NERSC) :
Based in Oakland, California, the National Energy Research Scientific Computing Center, or NERSC, is owned and operated by Lawrence Berkeley National Laboratory and the U.S. Department of Energy. Its database of 2.8 petabytes includes information on atomic energy research, high energy physics experiments and simulations of the early universe.

The High Performance Storage System (HPSS) is a modern, flexible, performance-oriented mass storage system.  It has been used at NERSC for archival storage since 1998.

No 3.  AT&T :
Bigger than Sprint, AT&T boasts 1.9 trillion calling records, which amount to 323 terabytes of information. One factor behind the sheer size of its database is that AT&T has been maintaining databases since before the technology to store terabytes was even available.
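As a rough sanity check on those figures, the quoted totals can be turned into an average record size. This is only a back-of-the-envelope sketch in Python: the function name is my own, and it assumes decimal terabytes (1 TB = 10^12 bytes) and that the 323 terabytes are spread evenly over the 1.9 trillion records, neither of which the source states.

```python
def average_bytes_per_record(total_terabytes: float, record_count: float) -> float:
    """Average record size in bytes.

    Assumption: decimal terabytes, i.e. 1 TB = 10**12 bytes.
    """
    total_bytes = total_terabytes * 10**12
    return total_bytes / record_count


# Figures quoted above for AT&T: 323 TB across 1.9 trillion calling records.
att_average = average_bytes_per_record(323, 1.9e12)
print(f"AT&T: about {att_average:.0f} bytes per calling record")
```

Under those assumptions the average comes out at about 170 bytes per record, which is a plausible size for a compact call detail record.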

The Daytona® data management system is used by AT&T to solve a wide spectrum of data management problems. For example, Daytona is managing over 312 terabytes of data in a 7x24 production data warehouse whose largest table contains over 743 billion records as of Sept 2005. Indeed, for this database, Daytona is managing over 1.924 trillion records; it could easily manage more but we ran out of data. Update: as of June 2007, Daytona is managing over 2.8 trillion records in this same data warehouse, with over 938 billion records in the largest table.

AT&T is the sole source for the Daytona product, service and support and is the only company authorized to use the Daytona trademark for a database product.

No 4. Google  :
The list wouldn't be complete without Google. Handling somewhere between 91 and 100 million searches per day (the figures I found vary), Google is one of the largest databases in the world and is said to hold over 33 trillion database entries. Although the exact size of Google's database is unknown, it is said that Google records every single search made each day. Google stores every search and builds patterns from previous searches so that users can be directed more easily. Google also collects information about its users and stores it as entries in its database. On top of that, Google has expanded its database with Gmail and Google ads, and with acquisitions like YouTube.

Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).
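To give a feel for the data model, here is a toy in-memory sketch in Python. Google's Bigtable paper describes the system as a sparse, sorted map indexed by row key, column key and timestamp; the code below imitates only that shape. The class name `ToyBigtable` and its methods are my own invention for illustration, not Google's API, and nothing here is distributed or persistent.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Tuple


class ToyBigtable:
    """In-memory toy: maps (row key, column key, timestamp) -> value."""

    def __init__(self) -> None:
        # row key -> column key -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one version of a cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the value at `timestamp`, or the newest version if omitted."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self, start_row: str,
             end_row: str) -> Iterator[Tuple[str, Dict[str, Dict[int, bytes]]]]:
        """Yield (row, columns) for start_row <= row < end_row, in sorted row order."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row, {col: dict(v) for col, v in self._rows[row].items()}


# Hypothetical sample data; reversed domain names keep related pages adjacent.
table = ToyBigtable()
table.put("com.cnn.www", "contents:", 1, b"<html>old")
table.put("com.cnn.www", "contents:", 2, b"<html>new")
table.put("com.example", "anchor:home", 1, b"Example")

print(table.get("com.cnn.www", "contents:"))      # newest version of the cell
print(table.get("com.cnn.www", "contents:", 1))   # an explicit older version
print([row for row, _ in table.scan("com.a", "com.d")])
```

Keeping rows in sorted order is what makes range scans cheap in this model, which is why row keys such as reversed host names are chosen so that related data sits together.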

No 5. Sprint  :
The third largest wireless telecommunications network in the US has a database of over 55 million users. Sprint processes over 365 million call detail records per day. Making up its huge database are 2.85 trillion rows of information.
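To put the daily figure in perspective, it can be converted into an average per-second rate. This is a quick sketch only: the helper name is mine, and it assumes the 365 million records arrive evenly over a 24-hour day, which real call traffic certainly does not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_second(records_per_day: float) -> float:
    """Average rate, assuming records are spread evenly over the day."""
    return records_per_day / SECONDS_PER_DAY


# Figure quoted above for Sprint: 365 million call detail records per day.
sprint_rate = records_per_second(365e6)
print(f"Sprint: about {sprint_rate:,.0f} call detail records per second")
```

That works out to a little over 4,200 records per second on average, with peaks presumably well above that.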

No 6. LexisNexis  :
LexisNexis is a company providing computer-assisted legal research services. It bought ChoicePoint in 2008; ChoicePoint was in the business of acquiring information about the American population, including everything from phone numbers to criminal histories. ChoicePoint had over 250 terabytes of data on the American population when it was bought by LexisNexis.

No 7. YouTube  :
With over 60 hours of video uploaded per minute (yes, per minute), YouTube has a video database of around 45 terabytes. Reports say that about 100 million videos are watched on YouTube every day, which is about 60% of the overall number of videos watched online.

No 8. Amazon  :
Containing records of more than 60 million active users, Amazon also has more than 250,000 full-text books available online and allows users to comment and interact on virtually every page of the website. Overall, the Amazon database is over 42 terabytes in size.

Amazon SimpleDB is a highly available and flexible non-relational data store that offloads the work of database administration. Developers simply store and query data items via web services requests and Amazon SimpleDB does the rest.
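For readers who have not used it, SimpleDB organises data into domains that hold named items, and each item carries attribute name/value pairs whose values are strings; an attribute may have several values. The Python sketch below imitates that model in memory only. `ToyDomain` and its methods are hypothetical names of mine, not the real web service API, and the sample data is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class ToyDomain:
    """In-memory stand-in for one domain: item name -> attribute -> set of string values."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._items = defaultdict(lambda: defaultdict(set))

    def put_attributes(self, item_name: str,
                       attributes: Iterable[Tuple[str, object]]) -> None:
        """Add (attribute, value) pairs; values are stored as strings."""
        for attribute, value in attributes:
            self._items[item_name][attribute].add(str(value))

    def get_attributes(self, item_name: str) -> Dict[str, List[str]]:
        """Return all attributes of an item, or an empty dict if it is unknown."""
        if item_name not in self._items:
            return {}
        return {a: sorted(v) for a, v in self._items[item_name].items()}

    def select_where(self, attribute: str, value: object) -> List[str]:
        """Names of items that have `value` among the values of `attribute`."""
        wanted = str(value)
        return sorted(
            name for name, attrs in self._items.items()
            if wanted in attrs.get(attribute, ())
        )


# Hypothetical sample data.
books = ToyDomain("books")
books.put_attributes("item-001", [("title", "Databases"), ("tag", "sql"), ("tag", "storage")])
books.put_attributes("item-002", [("title", "Clouds"), ("tag", "storage"), ("pages", 320)])

print(books.get_attributes("item-001"))
print(books.select_where("tag", "storage"))
```

Because every value is a string, numbers such as the page count above are compared as text, which is why fixed-width or zero-padded encodings are commonly used when numeric ordering matters.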

No 9. Central Intelligence Agency  :
With its exact size understandably not made public, the CIA has comprehensive statistics on more than 250 countries of the world and also collects and distributes information on people, places and things. Although the database is not open to the public, portions of it are made available. The Freedom of Information Act (FOIA) Electronic Reading Room is one such example, where hundreds of items from the database are added monthly.

The ARC maintains an automated system containing information concerning each individual accession ("job"). The ARC database includes detailed information at the file folder level for each accession retired after 1978, including the job number, box and file number, file title, level of security classification, inclusive dates, and disposition instructions, including the date when disposition action will be taken. Less detailed information is maintained for accessions retired before 1978.

No 10.  Library of Congress  :
With over 130 million items including 29 million books, photographs and maps, 10,000 new items added each day and nearly 530 miles of shelves, the Library of Congress is a wonder to behold in itself. The text portion of the library alone would take up 20 terabytes of space. If the internet isn't helping you out in your research, head to the oldest federal cultural institution in the United States, in Washington, DC.

The Library of Congress offers a wide variety of online databases and Internet resources to the public via the Web, including its own online catalog. In addition, LC provides an easy-to-use gateway for searching other institutions' online catalogs and extensive links to resources on the Internet.

Source ::

Enjoy :-)


Vishwanath Sharma said...

Good piece of information. Thanks for sharing.

SDBExplorer said...

Amazon SimpleDB can be useful for those who need a non-relational database for storage of smaller, non-structural data. Amazon SimpleDB has restricted storage size to 10GB per domain. Amazon SimpleDB offers simplicity and flexibility. SimpleDB automatically indexes all data. Amazon SimpleDB pricing is based on your actual box usage. You can store any UTF-8 string data in Amazon SimpleDB.

SDB Explorer provides an industry-leading and intuitive Graphical User Interface (GUI) to explore Amazon SimpleDB service in a thorough manner, and in a very efficient and user friendly way.

Parvinder said...

interesting information dear

sneha farthyal said...

Nice article to read..

Keep updating us with more such interesting facts..


Interesting information... Thanks, Sir, for sharing this.