Tuesday, February 2, 2010

NoSQL databases gaining momentum...

The next generation of non-relational (NoSQL) databases is gaining a lot of popularity in the world of web-scale data stores. There is an interesting shift happening from traditional row-and-column relational databases to key/value based non-SQL databases. Many of them are also called document-centric databases.

Some of the limitations of traditional relational databases are as follows:

1. The data has to be normalized and must fit a row and column format. Data of that shape is the best candidate for a relational data store. Conversely, relational databases are not a good fit for storing unstructured data that does not come in rows and columns.

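2. Normalization can reduce read performance, because related data is spread across several tables and has to be joined back together at query time.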

3. Replication between nodes is painful and expensive.

4. Relational databases are very hard to scale horizontally.

On the other hand, key/value based data stores provide similar storage and lookup features with more flexibility, and they are well suited to storing and processing large amounts of data. They also work well with unstructured or semi-structured data sets, including metadata.
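To make the contrast concrete, here is a minimal sketch in Python that uses plain in-memory dictionaries instead of a real database. The table names, keys, and fields are made up for illustration only.

```python
# Normalized, relational-style layout: each "table" is keyed by a row id.
authors = {1: {"name": "Alice"}}
posts = {10: {"author_id": 1, "title": "Hello NoSQL"}}
comments = [
    {"post_id": 10, "text": "Nice post"},
    {"post_id": 10, "text": "Thanks for sharing"},
]


def load_post_relational(post_id):
    """Rebuild a post by joining three tables, as a relational query would."""
    post = posts[post_id]
    return {
        "title": post["title"],
        "author": authors[post["author_id"]]["name"],
        "comments": [c["text"] for c in comments if c["post_id"] == post_id],
    }


# De-normalized, document-style layout: the whole post lives under one key.
documents = {
    "post:10": {
        "title": "Hello NoSQL",
        "author": "Alice",
        "comments": ["Nice post", "Thanks for sharing"],
    }
}


def load_post_document(key):
    """Fetch the whole post with a single key lookup, no joins needed."""
    return documents[key]
```

Both functions return the same data. The document version pays for its single lookup by duplicating the author name inside every post, which is the usual trade-off of de-normalized storage.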

The following are the key features of these new data stores:

1. Schema-free, de-normalized document storage

2. Key/value based lookups

3. Good candidates for horizontal scaling (they scale well to a very large number of nodes)

4. Support for MapReduce style programming

5. Built-in replication

6. Simple HTTP/REST based APIs

7. Most suitable for cloud based applications
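As a rough illustration of the first three features, the following Python sketch stores schema-free documents by key and spreads the keys over several "nodes". The `ShardedStore` class and its methods are hypothetical names invented for this example, and each node is just a dictionary rather than a separate server.

```python
import hashlib


class ShardedStore:
    """Toy key/value store that spreads keys over several in-memory nodes."""

    def __init__(self, node_count):
        # Each node is a plain dict here; a real system would use separate servers.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key so that the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, document):
        self._node_for(key)[key] = document

    def get(self, key):
        return self._node_for(key).get(key)


store = ShardedStore(node_count=4)
# Schema free: the two documents do not share the same fields.
store.put("user:1", {"name": "Alice", "email": "alice@example.com"})
store.put("user:2", {"name": "Bob", "tags": ["admin", "beta"]})
```

Simple modulo hashing like this remaps most keys when the number of nodes changes, so it should be read as a teaching aid only. Production stores typically use more careful schemes such as consistent hashing.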

Some of the most popular document and key/value based data stores are:

1. Apache CouchDB

2. MongoDB

3. Riak

4. Redis

5. ThruDB

6. Tokyo Cabinet

7. Memcached

Apache CouchDB : Apache CouchDB was created by Damien Katz. It is a document-oriented, highly distributed, schema-free database written in Erlang, ideal for highly concurrent applications.

The database can be queried and indexed in a MapReduce style. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.
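The MapReduce idea can be sketched in a few lines. CouchDB views are normally written in JavaScript; the version below is plain Python so that it runs on its own, and the function names and sample documents are invented for illustration.

```python
from collections import defaultdict

documents = [
    {"_id": "a", "type": "post", "author": "alice"},
    {"_id": "b", "type": "post", "author": "bob"},
    {"_id": "c", "type": "post", "author": "alice"},
    {"_id": "d", "type": "comment", "author": "alice"},
]


def map_doc(doc):
    """Emit (key, value) pairs for one document: one count per post author."""
    if doc.get("type") == "post":
        yield doc["author"], 1


def reduce_values(key, values):
    """Combine all values emitted for one key into a single result."""
    return sum(values)


def run_view(docs):
    """Apply the map step to every document, group by key, then reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(key, values) for key, values in grouped.items()}


posts_per_author = run_view(documents)
```

Because the map step looks at one document at a time, the work can be split across many nodes and the partial results combined by the reduce step.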

CouchDB provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests.
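As a sketch of what such HTTP access looks like, the Python code below builds the requests for storing and reading one JSON document. It assumes a CouchDB server on localhost at the default port 5984 and a database named "blog"; the database name, document id, and helper function names are hypothetical. The requests are only constructed, not sent, so the code runs without a server.

```python
import json
import urllib.request

# Assumptions: a local CouchDB on its default port, and a database named "blog".
BASE_URL = "http://localhost:5984"


def build_put_document(db, doc_id, document):
    """Build (but do not send) the HTTP request that stores a JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get_document(db, doc_id):
    """Build the HTTP request that reads the same document back."""
    return urllib.request.Request(url=f"{BASE_URL}/{db}/{doc_id}", method="GET")


put_request = build_put_document("blog", "post-10", {"title": "Hello NoSQL"})
get_request = build_get_document("blog", "post-10")
# Passing either request to urllib.request.urlopen() would send it to the server.
```

Since the interface is plain HTTP and JSON, the same two requests could be issued from a browser, a shell script, or any other language with an HTTP client.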

It is one of the first true document-oriented databases designed to scale with the web, and it is already used by many software companies.

MongoDB : MongoDB is a widely used database in this category. It is written in C++ and provides almost all the features of CouchDB. MongoDB is more mature and is commercially supported by a company called 10gen. The database manages collections of JSON-like documents, which are stored in a binary format referred to as BSON.

Riak : This is one of the new entrants in this space. It combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.

Redis : This is also a new project, hosted on Google Code. Redis is a key/value based database system written in ANSI C, and it is very fast because it keeps its data set in memory. In that respect it is very similar to Memcached. It is available on most platforms and provides most of the features of the other data stores listed here.

ThruDB : ThruDB is also hosted on Google Code. It is a set of simple services built on top of the Apache Thrift framework. It offers fast, flexible, easy-to-use services that can enhance or replace traditional data storage and access layers.

Others : There are many other implementations of document-oriented databases. Most of them try to provide the key features mentioned above. This is a new way of storing and retrieving data. These database models provide an alternative and very flexible way to solve large-scale web data problems, which were traditionally a big limitation of relational databases and other database models.

