What is Big Data?


Big data is characterized by three Vs: Volume, Velocity and Variety.

Volume refers to the huge amount of consumer data collected in real time from numerous markets and other sources. This data can be used as an indicator of customer response to the products and services available in the market.

Velocity refers to the speed at which data is collected from online and offline sources. Data is generated at high speed by thousands of retail websites, mobile social media sites, electronic POS terminals at retail stores, banking transactions, air/rail/bus ticket booking counters, audio/video downloads and so on. Millions of events happen every second across the globe, generating billions of records that must be captured and stored immediately for further processing.

Variety refers to the different types of data that are generated and stored in big data systems. The data includes numbers, text messages, image files, audio/video clips, animations, 3D and HD content, unstructured data, log files, financial data, social media posts and so on. Each of these data types has its own format and needs to be stored in such a way that it can be reconstructed immediately when required.

Traditional database systems are designed to handle structured, low-volume, low-velocity data. They cannot cope with big data, so special technology is required to store and analyze it.

To store huge amounts of unstructured data arriving at a fast rate from a large variety of sources, NoSQL ("not only SQL") databases are used. Many of them store data as objects (documents) or key/value pairs rather than as rows in fixed tables. This approach suits distributed data stores and offers faster operations and a more flexible schema. Another important big data technology is Apache Hadoop, a Java-based distributed computing platform that provides high data throughput across a cluster of machines. MapReduce, a programming model introduced by Google and implemented in Hadoop, breaks a processing job into a number of smaller map and reduce tasks that can run on any node of a cluster, close to the data held in a distributed file system such as the Hadoop Distributed File System (HDFS). Some other allied technologies are Apache Hive (a data warehouse built on Hadoop) and Apache HBase (a distributed database built on Hadoop).
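The MapReduce idea described above can be illustrated with a small sketch in plain Python. It does not use Hadoop or any real framework: the `records` store, its keys and the function names are all hypothetical, and the three phases run one after another in a single process, whereas a real cluster would run map and reduce tasks in parallel on many nodes. The records are held as key/value pairs, in the spirit of a NoSQL store, and the job counts how often each word occurs.

```python
from collections import defaultdict

# Hypothetical records held as key/value pairs rather than table rows.
records = {
    "post:1": {"type": "social", "text": "big data needs big storage"},
    "log:7": {"type": "log", "text": "storage node online"},
    "post:2": {"type": "social", "text": "data arrives fast"},
}

def map_phase(key, value):
    """Emit one (word, 1) pair for every word in a record's text."""
    for word in value["text"].split():
        yield word, 1

def shuffle(pairs):
    """Group the emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(word, counts):
    """Combine all values emitted for one key into a single result."""
    return word, sum(counts)

def run_job(store):
    """Run map, shuffle and reduce sequentially over every record."""
    pairs = [pair for key, value in store.items() for pair in map_phase(key, value)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())

word_counts = run_job(records)
print(word_counts)
```

Because each call to `map_phase` depends only on its own record, the records could be split across machines and mapped independently, which is what makes the model suitable for distributed storage.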

Big data can be of two types: online big data and off-line big data. Online big data is generated through numerous online events and is collected and stored on cloud servers. Users can subscribe to the cloud service and download and analyze the data whenever required. Off-line big data is collected from various off-line batch processes, stored in distributed Hadoop storage and processed using MapReduce. Examples of off-line big data systems are data warehouses, Extract, Transform and Load (ETL) systems and Business Intelligence tools. Major vendors of big data products and services include IBM, SAP, Oracle, Microsoft, Teradata and Amazon Web Services (AWS).