Wednesday, 02 August 2017 05:08

Iteration 2 - Understanding the Hadoop architecture and Hadoop data analysis

Student name: Pranathi Uppala


The process of understanding Hadoop architecture and Hadoop data analysis shall take two weeks. During the first half of the first week, I shall meet with the cloud solutions specialist, Mr. Willie, to understand big data analytics architecture. The activities shall include drawing architectural designs such as business intelligence and IT transformation management using a cloud computing data center, as well as data warehousing and data mining designs (D. K. Willie, personal communication, May 1, 2017). The second task shall involve personal research aimed at understanding data-driven online websites. In that session, I shall practice big data approaches by filling in an architectural design with search systems, log processing systems, analytical systems, video and image analysis systems, and data retention using data analytics with Hadoop (D. K. Willie, personal communication, May 2, 2017).
The second-week session shall cover big data architectures that support business intelligence operations. I shall be required to understand storage architectures such as direct-attached storage (DAS), network-attached storage (NAS), and storage area network (SAN) architecture (F. G. Naomi, personal communication, May 3, 2017). The facilitator for the second week shall be Miss Naomi, a specialist in big data analytics with Hadoop data nodes. The last session shall cover big data application domains such as digital marketing operations. It shall include Hadoop web analytics, golden path analysis, data exploitation and exploration, recovery, and fraud detection and protection. I shall also be required to understand real-time analysis using five big data analytics tools built on Hadoop (F. G. Naomi, personal communication, May 4, 2017).
I took part in designing the structure of distributed data analytics. The structures drawn included distributed file systems that support Hadoop data operations on big data. The session covered architectures that support cloud computing and the Internet of Things. We were able to draw systems that can scale to data volumes of several petabytes through the use of virtualization technology (D. K. Willie, personal communication, May 5, 2017). I also carried out personal research on website operations. A Hadoop architecture for such sites includes the processing of web logs, system logs, and system metrics. The research work involved benchmarking blogs and network systems, as well as filtering big data for analytics.
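The web log processing mentioned above can be pictured with a small MapReduce-style sketch. It is a single-machine illustration in plain Python, not Hadoop code: the sample log lines, the regular expression, and the function names are assumptions made for the example, and a real job would run the map and reduce steps on many nodes.

```python
import re
from collections import defaultdict

# Hypothetical sample lines in Common Log Format; not taken from the study.
LOG_LINES = [
    '10.0.0.1 - - [01/May/2017:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [01/May/2017:10:00:01 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.1 - - [01/May/2017:10:00:02 +0000] "POST /login HTTP/1.1" 200 230',
    'corrupted line without the expected fields',
]

# Captures the three-digit status code that follows the quoted request.
STATUS_PATTERN = re.compile(r'"\s(\d{3})\s')


def map_phase(line):
    """Emit (status_code, 1) for a parsable line and nothing otherwise."""
    match = STATUS_PATTERN.search(line)
    if match:
        yield match.group(1), 1


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one status code."""
    return key, sum(values)


def count_status_codes(lines):
    """Run map, shuffle, and reduce in sequence over the given lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(count_status_codes(LOG_LINES))
```

Lines that do not match the pattern are skipped, which mirrors the usual practice of tolerating malformed records in large log files.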
The big data architectures that support business intelligence included storage designs such as SAN, DAS, and NAS. The activities included designing similar boxes with the aim of reducing network latency. While designing business intelligence systems, I was introduced to designing for semi-structured and unstructured data and to network switches. Business intelligence system architecture incorporates machine-generated data (F. G. Naomi, personal communication, May 7, 2017). The design applied remote device insight, remote sensing, and location-based intelligence. For the big data application domains, I performed big data analysis with Hadoop. With the assistance of Miss Naomi, I was able to design digital marketing applications that support extract, transform, and load (ETL) operations. The business applications that we designed to use Hadoop included big data analytics tools for retail, trade surveillance, and customer relationship management for insurance. The analytical tools incorporated included online security analysis, data analysis, and business analysis tools (Chen, Mao, & Liu, 2014).
The session on Hadoop architecture and Hadoop data analysis led to several discoveries about efficient architectures. I realized that Hadoop is a distributed computing framework, often deployed in the cloud, that data scientists use to perform highly parallelized operations with big data techniques. Through the study, I explored Hadoop and discovered that it has many levels of complexity and that its architecture implements a massively parallel programming model. Hadoop systems offer high scalability as well as high availability when handling large volumes of data (Di, Kondo, & Cappello, 2014). I observed that Hadoop architectures suit big data because of the huge databases behind websites, blogs, social media, and smart sensors. I also discovered that machine learning algorithms can be run on the Hadoop architecture, with the learning distributed across the cluster. Techniques such as data mining, data warehousing, and data marts are applied to classify and learn from the analyzed data (Gani, Siddiqa, Shamshirband, & Hanum, 2016).
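Much of the scalability and availability noted above comes from the way the Hadoop Distributed File System (HDFS) splits files into blocks and keeps several copies of each block. The sketch below illustrates that idea under simplifying assumptions: the 128 MB block size and replication factor of 3 are the HDFS defaults in Hadoop 2 and later, while the node names and the round-robin placement are hypothetical (real HDFS placement is rack-aware).

```python
import math

BLOCK_SIZE_MB = 128      # HDFS default block size in Hadoop 2 and later
REPLICATION_FACTOR = 3   # HDFS default replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the number of blocks needed to store a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes in round-robin order.

    Simplification: real HDFS placement considers racks; this sketch does not.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_node):
    """Return True if every block still has a replica on a surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical data nodes
    placement = place_replicas(split_into_blocks(1000), nodes)
    for block, replicas in placement.items():
        print(block, replicas)
    print(readable_after_failure(placement, "node-a"))
```

Because every block lives on three different nodes, the loss of any single node leaves the whole file readable, and adding nodes increases capacity without changing the scheme.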
I observed that various technologies are applied to make different architectures compatible. These include tools such as Apache Sqoop, for ingesting data from relational databases, and Apache Flume, for ingesting log and event streams. Through the training session facilitated by Miss Naomi, I observed how hands-on, project-based techniques can be applied to Hadoop systems in supporting real-time and higher-profile systems. The practical principles I observed included changes such as eliminating physical and manual work in analyzing data. Through benchmarking, I observed how Apache Pig can be used to analyze huge data sets.
The study session has given me a thorough understanding of the core concepts of Hadoop and cluster computing. Through working closely with Mr. Willie, I have acquired skills in design patterns and parallel analytical algorithms used to create distributed data analysis tools. The research also covered the methodologies for managing, mining, and warehousing data in a distributed context that relies on Apache Hive and the Hadoop database. The study has given me the experience required to develop Hadoop program constructs, including handling program complexity, working with the Hadoop system itself, and building applications with Apache Pig and Spark DataFrames (Di, Kondo, & Cappello, 2014). Through the study, I have understood how machine learning techniques are applied. The research allowed me to implement concepts such as classification, clustering, and collaborative filtering using Spark’s MLlib.
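To make the clustering concept concrete, the following is a minimal k-means sketch. It does not call Spark or MLlib; it is a single-machine, plain Python illustration of the idea, and the sample points, the starting centroids, and the function names are assumptions chosen for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    count = len(points)
    return tuple(sum(coords) / count for coords in zip(*points))


def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            mean(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical two-dimensional sample data with two visible groups.
    sample = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
    centroids, clusters = kmeans(sample, [(1.0, 1.0), (8.0, 8.0)])
    print(centroids)
    print(clusters)
```

A distributed implementation follows the same two steps, with the assignment step spread across the cluster and the per-cluster sums combined before the centroids are updated.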
The study has provided direct ways of dealing with data sets in distributed systems. Apache Hive can be applied to control and manage large datasets in a distributed storage system. Future studies shall involve learning how to configure the Hadoop Distributed File System (HDFS) (Zezula, 2015). I have acquired techniques for ingesting data for applications that use MapReduce. Future research work shall cover copying data efficiently from one cluster to another. It shall also cover summarizing data through queries.
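Summarizing data through queries can be sketched with a small SQL example. Hive exposes datasets through a SQL-like language, but the sketch below uses Python's built-in sqlite3 module as a stand-in so that it runs without a cluster. The table name, column names, and sample rows are hypothetical, and the statement is ordinary SQL rather than a tested Hive script.

```python
import sqlite3

# Hypothetical page-view records: (page, visitor, bytes_sent).
ROWS = [
    ("/home", "alice", 1200),
    ("/home", "bob", 800),
    ("/cart", "alice", 300),
    ("/home", "alice", 1000),
]


def summarize_page_views(rows):
    """Return (page, views, distinct_visitors, total_bytes) per page, sorted by page."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute(
            "CREATE TABLE page_views (page TEXT, visitor TEXT, bytes_sent INTEGER)"
        )
        connection.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
        cursor = connection.execute(
            """
            SELECT page, COUNT(*), COUNT(DISTINCT visitor), SUM(bytes_sent)
            FROM page_views
            GROUP BY page
            ORDER BY page
            """
        )
        return cursor.fetchall()
    finally:
        connection.close()


if __name__ == "__main__":
    for row in summarize_page_views(ROWS):
        print(row)
```

The grouping and aggregation shown here are the kind of summary that a query engine over distributed storage would compute in parallel across the nodes holding the data.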


Chen, M., Mao, S., & Liu, Y. (April 01, 2014). Big Data: A Survey. Mobile Networks and Applications : the Journal of Special Issues on Mobility of Systems, Users, Data and Computing, 19, 2, 171-209.
Deng, Z., Hu, Y., Zhu, M., Huang, X., & Du, B. (June 01, 2015). A scalable and fast OPTICS for clustering trajectory big data. Cluster Computing : the Journal of Networks, Software Tools and Applications, 18, 2, 549-562.
Di, S., Kondo, D., & Cappello, F. (July 01, 2014). Characterizing and modeling cloud applications/jobs on a Google data center. The Journal of Supercomputing : an International Journal of High-Performance Computer Design, Analysis, and Use, 69, 1, 139-160.
Gani, A., Siddiqa, A., Shamshirband, S., & Hanum, F. (February 01, 2016). A survey on indexing techniques for big data: taxonomy and performance evaluation. Knowledge and Information Systems : an International Journal, 46, 2, 241-284.
Zezula, P. (2015). Similarity Searching for the Big Data: Challenges and Research Objectives. Mobile Networks and Applications : the Journal of Special Issues on Mobility of Systems, Users, Data and Computing, 20, 4, 487-496.

Last modified on Wednesday, 02 August 2017 05:13
