[Questions] Hadoop Interview Questions on the Basics of Hadoop and Hadoop Component Technologies

Though it says interview questions, this page lists questions that can also be used to test your understanding of the basics of Big Data and Hadoop, and of the component technologies that make up the Hadoop technology stack. It does not go deep into any individual component of the stack. Having the bigger picture and knowing how the components fit together will help you choose the right component and use it in the right way.

These are questions without answers, but most questions come with hints. Interviewers can use the hints to give additional input to the candidate, and candidates can use them to quickly see what a question is about, such as where a component fits in the overall picture.

  1. BigData and Hadoop Basics

    1. What is Big Data?

      1. Hint: Broad term for data sets so large or complex that traditional data processing applications are inadequate.

    2. What is Hadoop?

      1. Hint: HDFS + MapReduce + Libraries (HBase, Pig etc.).

    3. How is Google File System (GFS) related to Hadoop?

      1. Hint: HDFS, Hadoop's file system, was originally based on Google's white paper on GFS.

    4. Why Hadoop?

      1. Hint: Cheaper (commodity hardware), Faster (parallel processing).

    5. List a few use cases where Hadoop can be used.

      1. Hint: Risk modeling, recommendation engine, Ad targeting, search engine quality.

    6. What are the core Hadoop components?

      1. Hint: HDFS and MapReduce.

    7. What are the differences between Hadoop 1.x and Hadoop 2.x?

      1. Hint: YARN

    8. What are the differences between RDBMS and Hadoop way of treating the data?

      1. Hint: RDBMS = schema on write, Hadoop = schema on read.

    9. What are the disadvantages of using traditional relational databases for data analytics?

      1. Hint: Scalability, Speed etc.

    10. Many people compare RDBMS with Hadoop. Is Hadoop a database?

      1. Hint: Hadoop is a distributed file system with a processing framework, not a database. It may be used along with a database (mostly a NoSQL database).

    11. Will RDBMS still be useful with the popularity of Hadoop?

      1. Hint: They solve a different problem.

    12. What are NoSQL Databases?

      1. Hint: Databases that model data by means other than the tabular relations used in relational databases; no fixed columns.

    13. List a few types of NoSQL databases with examples.

      1. Hint: key/value, column store, document store etc.

    14. What is a wide column store NoSQL Database?

      1. Hint: The set of columns can vary from row to row. E.g. HBase.

    15. What do you know about CAP theorem?

      1. Hint: Consistency, Availability, Partition tolerance.

      2. Ref: https://en.wikipedia.org/wiki/CAP_theorem

    16. How and where does Hadoop fit in the CAP theorem?

      1. Hint: Scalability (Partitioning), Flexibility (Availability).

    17. What kinds of data are good fit for Hadoop?

      1. Hint: Behavioral Data.

    18. What kinds of data are not a good fit for Hadoop?

      1. Hint: Transactional data.

    19. Mostly Hadoop is used along with NoSQL databases. Can Hadoop be used with RDBMS? Explain.

    20. What are Hadoop’s alternative products or solutions?

      1. Hint: Disco, Filemap, Zillabyte etc.

    21. What are the different distributions of Hadoop available?

      1. Hint: open source (Apache Hadoop), commercial (Cloudera, Hortonworks, MapR), cloud (AWS with open source or commercial Hadoop, Windows Azure HDInsight).

    22. What are the different Hadoop solutions available from Cloudera?

      1. Hint: Cloudera Enterprise, Cloudera Live etc.

    23. What is Hue?

      1. Hint: Hadoop User Experience, an open-source web GUI for Hadoop that originated at Cloudera and is included in Cloudera's distributions.

    24. What do you know about the Hadoop solutions available from Hortonworks?

      1. Hint: Windows and Linux versions, VMs with installations.

    25. What do you know about the Hadoop solutions available from MapR?

      1. Hint: NoSQL-DB file system, add ons to apache projects, sandboxes.

    26. What do you know about cloud initiatives based on Hadoop?

      1. Hint: Amazon Elastic MapReduce (EMR), Microsoft HDInsight.

    27. What do you mean by Hadoop incubator projects? Can you list any of them?

      1. Ref: http://incubator.apache.org/projects

    28. How do you compare Hadoop data processing with grid computing?

  2. Hadoop Technology Stack

    1. What do you know about the below components (or libraries) and how are they related to Hadoop?

      1. HDFS

        1. Hint: Hadoop Distributed File System, part of Hadoop core.

      2. MapReduce

        1. Hint: Programming model for processing data in Hadoop, part of Hadoop core.

        2. Ref: https://en.wikipedia.org/wiki/MapReduce

      3. YARN

        1. Hint: Stands for Yet Another Resource Negotiator; also referred to as MapReduce v2. Part of Hadoop core in Hadoop v2. It is a cluster resource management system that allows any distributed program (not just MapReduce) to run on data in a Hadoop cluster.

      4. HBase

        1. Hint: A key-value store, wide columnstore, NoSQL, uses HDFS for its underlying storage. 

      5. Hive

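        1. Hint: Data warehouse layer on top of Hadoop; HQL (HiveQL) is its SQL-like query language for data stored in HDFS, and it can also be integrated with HBase.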

      6. Pig

        1. Hint: Scripting language (Pig Latin) for describing data flows.

      7. Mahout

        1. Hint: Machine learning, predictive analysis.

      8. Oozie

        1. Hint: Workflow, coordination of jobs.

      9. ZooKeeper

        1. Hint: Coordination

      10. Sqoop

        1. Hint: Data Exchange (RDBMS)

      11. Flume

        1. Hint: Log Collector.

      12. Ambari

        1. Hint: Managing Hadoop Clusters

      13. Cassandra

      14. Drill

      15. Spark

      16. Shark

      17. HCatalog

      18. Lucene

      19. Hama

      20. Crunch

      21. Avro

      22. Thrift

      23. Chukwa

    2. What are the differences between MapReduce 1 and YARN?

    3. What GUI tools are available for managing Hadoop HDFS, MapReduce and/or YARN? Have you used any?

    4. Can you run Hadoop (MapReduce) on a regular file system without HDFS?

      1. Hint: Standalone (local) mode.

    5. Can you run Hadoop (MapReduce) on a cloud file system without HDFS?

      1. Hint: Amazon S3, Azure BLOB storage.
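
To make the MapReduce hint above (a programming model for processing data in Hadoop) more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps for a word count, written in plain Python. It does not use Hadoop or any Hadoop API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made only for illustration. In a real cluster the framework would run many map and reduce tasks in parallel across nodes and perform the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its input from HDFS.
    sample = ["Hadoop stores data in HDFS", "MapReduce processes data in Hadoop"]
    print(word_count(sample))
```

A useful follow-up for a candidate is to explain which of these three functions a developer writes in an actual MapReduce job (the map and reduce logic) and which part the framework provides (the shuffle).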



  • Wikipedia pages for all products listed here (if available).

  • CBT Nuggets Apache Hadoop

  • Lynda.com Hadoop Fundamentals


