Student Training Program on Big Data Analytics
20th January, 2017
A Student Training Program on Big Data Analytics was conducted by NITTTR, Chandigarh on January 20, 2017 at Dronacharya Group of Institutions, Greater Noida through ICT. The basic objective of the course was to understand the benefits and market perspectives of Big Data technology. Fourth-semester students from the departments of Computer Science & Engineering, Information Technology, and Computer Science & Information Technology attended the training. Ms. Gargi Amoli (Assistant Professor, Information Technology) and Ms. Monika Sinha (Assistant Professor, MBA) coordinated the workshop.
The session was initiated by Mr. Aditya Bhardwaj (PhD Scholar, Dept. of CSE, NITTTR, Chandigarh) with a presentation on “Introduction to Big Data Analytics”. He discussed the various sources of Big Data, such as social media and networks, scientific instruments, mobile devices and sensor technology, and the growth in the popularity of social networking. He stated that the world’s data volume grows by 40% every year, described the types of data generated and stored by various sectors, and outlined the benefits of Big Data. He discussed the 3Vs of Big Data: Data Velocity, Data Volume and Data Variety. He then covered analytics tools such as Microsoft Excel, Hadoop, RapidMiner and Tableau. He gave an insight into Analytical Big Data systems and Operational Big Data systems and differentiated between the two categories in terms of latency, concurrency, access patterns, data scope and technology.
Mr. Bhardwaj then outlined the skills needed by a Data Scientist, such as Python, the Hadoop platform and predictive analytics tools, and there was a discussion on the role of Data Science in Big Data Analytics. He gave an overview of how online advertising works. The challenges associated with Big Data were discussed; the major ones include capturing data, storage, searching, sharing, transfer, analysis and presentation. An essential skill for Big Data Analytics is the curiosity to dig deeper into data and to trace a problem to its root cause. The Big Data value chain and the predicted shortage of talent for its related technologies were also discussed.
Ms. Geetika Goyal (M.Tech. Student, Dept. of CSE, NITTTR, Chandigarh) gave a presentation on “Introduction to Hadoop”. She discussed the origin and milestones of the Hadoop platform and defined Hadoop as an open-source software framework, licensed under the Apache v2 license, that supports data-intensive distributed applications. The framework is designed to scale up from single servers to thousands of machines, each offering local computation and storage. She stated the features of Hadoop such as fault tolerance, high scalability and availability, and described the core layers of the Hadoop architecture: the processing/computation layer and the storage layer. She also talked about the Hadoop ecosystem, which covers data management, data access, data processing and data storage.
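As a point of reference, the two core layers described above correspond to settings a Hadoop client can be pointed at. The snippet below is a minimal illustrative sketch in Java, not taken from the presentation; the localhost address assumes a default single-node setup and will differ per installation.

    import org.apache.hadoop.conf.Configuration;

    public class HadoopLayers {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Storage layer: address of the HDFS NameNode (assumed single-node setup).
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            // Processing layer: run MapReduce jobs on YARN rather than locally.
            conf.set("mapreduce.framework.name", "yarn");
            System.out.println("Default file system: " + conf.get("fs.defaultFS"));
        }
    }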
Ms. Goyal stated the three main applications of Hadoop: advertisement, search and security, and gave an insight into the design requirements of Hadoop in Facebook Messages. She then explained the process of validating data. There was a discussion on Hadoop MapReduce and its components, which include JobClient, JobTracker and TaskTracker. She defined MapReduce as a parallel programming model, devised at Google, for writing distributed applications that efficiently process large amounts of data on large clusters. There was also a discussion on the Hadoop Distributed File System (HDFS), which is based on the Google File System (GFS) and is designed to run on commodity hardware.
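To make the MapReduce model concrete, a word-count job written against the standard Hadoop MapReduce API is sketched below. It is an illustrative example rather than material from the presentation: the map step emits (word, 1) pairs, the reduce step sums the counts for each word, and the input and output paths are supplied on the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map step: emit (word, 1) for every word in a line of input.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce step: sum the counts emitted for each word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }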
In the afternoon there was a hands-on session on Big Data Analytics tools by Mr. Nitesh Kumar (M.Tech. Student, Dept. of CSE, NITTTR, Chandigarh). He stated that prior exposure to Core Java, database concepts and any flavour of the Linux operating system is required to work on the tools. He walked through the pre-installation setup of the Hadoop framework and demonstrated the installation of the Java Development Kit (JDK), a prerequisite for the Hadoop platform, showing how to verify the presence of Java on the system with the command “java -version”. The steps to create a user were demonstrated; Mr. Nitesh recommended creating a separate user for Hadoop to isolate the Hadoop file system from the Unix file system. He then showed the process of setting up the environment variables, the verification of the Hadoop installation and access to Hadoop through a browser. With the help of commands he demonstrated how Hadoop runs code across a cluster of computers, and he discussed the core tasks that Hadoop performs.
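As a complement to the command-line checks shown in the session, the short Java sketch below illustrates one way to confirm that HDFS is reachable after installation by listing the contents of its root directory. It is not taken from the session material and assumes the configuration files (such as core-site.xml) generated during setup are available on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCheck {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS from the core-site.xml configured during installation.
            Configuration conf = new Configuration();
            try (FileSystem fs = FileSystem.get(conf)) {
                // Listing the HDFS root directory confirms the NameNode is reachable.
                for (FileStatus status : fs.listStatus(new Path("/"))) {
                    System.out.println((status.isDirectory() ? "dir  " : "file ") + status.getPath());
                }
            }
        }
    }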
Mr. Nitesh stated the advantages of the Hadoop framework. Students from the participating institutes raised queries about the Hadoop framework, the Hadoop Distributed File System and the role of Big Data in the Internet of Things. Mr. Nitesh concluded the session with a discussion on the major applications of Big Data.
All the presentations of the training program were informative and engaging, and the participatory nature of the sessions heightened the learning experience. The workshop was well crafted and the course material was well structured and complete. Students aspiring to become software professionals, analytics professionals and ETL developers benefitted from the training program.