What is Bigdata? An Introduction
Bigdata, Bigdata, Bigdata: nowadays it has become a buzzword. So what is Bigdata? Today I will try to explain it in the easiest way possible.
What is Bigdata?
Bigdata is just a term for a very large set of data. It is not a technology in the real sense, but rather a name given to a huge data set that can't be handled by traditional RDBMS systems. So Bigdata can be defined as a huge set of data that has Volume, Variety and Velocity. These three are the characteristics of Bigdata.
Volume: the quantity of data generated.
Variety: the type of content, for example structured, semi-structured and unstructured data.
Velocity: the speed at which data is generated and processed.
Structured data: data that can be stored in an RDBMS database server in row-column format. Example: data in a MySQL table.
Semi-structured data: data that does not fit the fixed row-column format of an RDBMS, but has some organizational properties or structure, so it can be analysed directly or processed and then stored in an RDBMS. Example: XML and JSON data.
Unstructured data: data with no predefined structure, such as email messages, MS Word files, videos, images, audio files and presentations.
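To make the three kinds of data a little more concrete, here is a minimal Python sketch. It uses the built-in sqlite3 module as a stand-in for an RDBMS such as MySQL, and the users table, the sample records and the flatten_user helper are all made up for illustration.

```python
import json
import sqlite3

# Structured: fixed columns, stored as a row in a relational table.
# (sqlite3 is only a stand-in here for an RDBMS such as MySQL.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users (id, name, city) VALUES (?, ?, ?)", (1, "Asha", "Pune"))
structured_row = conn.execute("SELECT id, name, city FROM users").fetchone()

# Semi-structured: JSON has keys and nesting, but no fixed table schema.
semi_structured = json.loads(
    '{"id": 2, "name": "Ravi", "tags": ["sports", "music"], "address": {"city": "Delhi"}}'
)


def flatten_user(doc):
    """Hypothetical helper: pick the fields that fit the users table."""
    return (doc["id"], doc["name"], doc.get("address", {}).get("city"))


# After processing, the semi-structured record can be stored as a row too.
conn.execute("INSERT INTO users (id, name, city) VALUES (?, ?, ?)", flatten_user(semi_structured))
rows = conn.execute("SELECT id, name, city FROM users ORDER BY id").fetchall()

# Unstructured: free text (or raw bytes) with no fields to map to columns.
unstructured = "Hi team, please find the report attached. Thanks!"

print(structured_row)
print(rows)
```

Notice that the JSON record had to be flattened before it could go into the table, and its "tags" field was dropped along the way. The free-text message has no fields at all, which is why it counts as unstructured.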
Today, we are living in a world of data. We use social media like Facebook, Twitter and LinkedIn in our day-to-day life to stay connected, sharing things like text messages, images, audio and videos with our friends, which generates terabytes (TBs) or petabytes (PBs) of data in a single day.
Large companies like Google and Amazon.com collect data from your cookies or previous searches and show you products or services that you might be interested in, which helps them sell those products.
Bigdata is now used extensively for analysis by governments, news agencies and weather forecast agencies.
Bigdata and Hadoop
For some people, Bigdata is Hadoop and Hadoop is Bigdata, but this is not really so. Bigdata is a term given to huge amounts of data, while Hadoop is a software framework, a distributed system that we use to process Bigdata. For the sake of better understanding, we can say that Bigdata is the problem and Hadoop is a solution. This doesn't mean Hadoop is the only solution for Bigdata; there are a lot of other solutions available in the market.
How do I choose the Bigdata solution that best suits my data?
Different companies have different amounts of data, and the variety of the data is also a constraint. Experts generally suggest not going for Hadoop if your data is only some gigabytes (GBs) or terabytes (TBs). Hadoop can be helpful if your data is in petabytes (PBs); otherwise you have other solutions like MongoDB, Cassandra, CouchDB, Amazon Redshift, DynamoDB and many more, depending on your requirements.
We need these solutions, including Hadoop, for processing Bigdata and for analytics.
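The size-based rule of thumb above can be written down as a tiny Python sketch. The exact cut-off of 1 PB and the suggest_solutions function are assumptions made for illustration only; a real choice also depends on the variety of your data and on your requirements.

```python
# Binary size units, in bytes.
GB = 1024 ** 3
TB = 1024 ** 4
PB = 1024 ** 5

# Assumption: treat "data in petabytes" as 1 PB or more.
HADOOP_THRESHOLD_BYTES = PB

# The alternatives named in the text, in the same order.
OTHER_SOLUTIONS = ["MongoDB", "Cassandra", "CouchDB", "Amazon Redshift", "DynamoDB"]


def suggest_solutions(data_size_bytes):
    """Hypothetical helper: a first guess based only on data volume."""
    if data_size_bytes >= HADOOP_THRESHOLD_BYTES:
        return ["Hadoop"]
    return list(OTHER_SOLUTIONS)


print(suggest_solutions(200 * GB))  # GB-scale data: the non-Hadoop options
print(suggest_solutions(3 * PB))    # PB-scale data: Hadoop
```

This only looks at volume. Variety and velocity would need their own checks before settling on any one tool.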