It has been a while since my last post, and in that time I have received several questions through comments on my posts. Almost all of them are about Hadoop, so I thought I would start this year with a post that answers them. Many readers ask questions that I have already answered several times in my replies, so I went through all the questions, put together a frequently asked set, and answered them in one place. Hopefully this saves you from digging through the comments on each post and my responses to them.
Q: I am from XYZ background/technology/domain. Can I learn Hadoop?
A: Hadoop is an open source framework that is used by many organizations to solve the problem of processing large amounts of data. It is a technology just like any other, but it is gaining attention because of its applications and widespread adoption. You can come from any background, technology or domain. If you are interested, nothing should really stop you from learning Hadoop. Some will pick it up quickly, and for others it may take longer. It all depends on whether the technology interests you and whether you are willing to put in the effort and persistence required to learn it. Learning Hadoop is just like learning any other framework or technology.
Q: What is the future of Hadoop?
A: The present and the future of Hadoop are bright. I say this because the ecosystem, i.e. the applications being built on top of Hadoop, caters to almost every kind of use case. Adoption of Hadoop by organizations within their core data teams is also increasing. The more the adoption, the better the future.
Q: I don’t have a technical background; can I still pick up Hadoop?
A: You can be from any background. Hadoop is not rocket science. It is vast, and the ecosystem is growing every day. Having said that, the community has done a pretty good job of documenting the workings and applications of Hadoop on the respective Apache project pages.
Q: I don’t know Java, but I am good at X, Y and Z, will Hadoop be the right choice for me?
A: You don’t need to know Java to learn Hadoop. Hadoop is the right choice only if you want to learn about Hadoop. If you have an inclination or itch to try out Hadoop, go ahead and try it out. You should be a good judge of whether you are enjoying it or not.
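One reason Java is optional is Hadoop Streaming, which lets the map and reduce steps be any program that reads lines from standard input and writes tab-separated key/value lines to standard output. Below is a minimal sketch of a word count written in that style in Python. The function names and the `run_locally` helper are my own for illustration and are not part of Hadoop. The helper only imitates the sort that Hadoop performs between the map and reduce steps so that the logic can be tried on a laptop.

```python
from itertools import groupby


def map_words(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_counts(sorted_lines):
    """Reducer: sum the counts per word. Input lines must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Hypothetical helper: sorting stands in for Hadoop's shuffle phase."""
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    for row in run_locally(["Hello hadoop", "hello world"]):
        print(row)
```

In an actual Streaming job, each function would sit in its own small script that loops over `sys.stdin` and prints its results. The scripts would then be passed to the Hadoop Streaming jar as the mapper and the reducer.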
Q: Can you suggest good books to learn Hadoop?
A: There are a few books out there. You should be able to Google them and read the reviews. I would recommend Hadoop: The Definitive Guide.
Q: Where can I practice Hadoop?
A: If you have a computer with a good amount of RAM (around 8GB or more) and a reasonably fast CPU, you can spin up a few virtual machines and install Hadoop on the cluster of VMs. You could also download one of the various VMs that come with Hadoop pre-installed.
Q: Can you guide me on a career path in the field of Hadoop?
A: There is really not much guidance one can give when it comes to Hadoop. Everything is out there. Tons of blog posts, books and the Apache project pages should easily give you an idea of how Hadoop is doing and what is required to pick it up. If you can play around with it and get a feel for it, you yourself are the best person to decide whether this is for you. Just put your head down and work, work real hard.
Q: Can you guide me to crack the interview?
A: Frankly, I am not sure how this is even possible. If you are prepared, you will crack the interview. There is no set way or magic pill to crack an interview. Read the job description, try to speak to others who do a similar job, prepare a good resume and give it your best shot. All the best!
Q: Can you please answer the below exam questions?
A: NO.
Q: Should I do the Admin course or the Developer course?
A: If managing a cluster (networking, monitoring, troubleshooting, Linux stuff) is what interests you, you should do the administrator course. If you like to write code (Pig scripts, Hive queries, MapReduce, etc.) and solve data problems (which include data engineering problems), you should consider doing a developer-centric course. However, before doing any course, try some self-study and get a feel for it.
Q: I am a Tester; will Hadoop be a good fit for me?
A: Anyone interested can learn Hadoop. But if the question is, “Do Testers have a role in the Hadoop world?” then the answer is YES. A Hadoop distribution company needs testers who will test all the components and their interoperability before a distribution is released. A tester with a good understanding of Hadoop and its inner workings would be a great fit for such organizations.
Q: I followed the steps mentioned in your post, but it does not work for me. Can you help?
A: I would love to help, but practically it is just not possible. I can help if I can see something obvious, but if it requires a lot of back and forth in comments or emails, I really won’t have the time. Ideally, if you follow the instructions as they are (including the versions of the components used), the steps should work.
Q: I have X years of experience in XYZ field, will Hadoop give me a good start?
A: You should get into any field because you want to get into it, not because it will give you a good start. If the technology excites you, you should play around with it, talk about it with others in the field and work hard to learn it. The same goes for Hadoop.
Q: What is the salary I can get if I learn Hadoop?
A: I don’t know.
Q: Is it mandatory to learn UNIX scripting to learn Hadoop?
A: It is NOT mandatory.
Q: Can you provide Hadoop certification and exam details?
A: NO.
Q: Where can I get large data sets for processing data in Hadoop?
A: Google it and you will find tons of free data sets from various domains. A quick search turns up the following link from Quora: http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
Q: Where should I start, if I need to learn Hadoop?
A: The Apache project pages, YouTube/Vimeo videos, blog posts and the book Hadoop: The Definitive Guide.
Q: Could you please suggest universities that are reputed among companies in Silicon Valley?
A: I don’t know the answer to this.
Q: I have enrolled in a course with XYZ course provider. Are they good trainers? Can you suggest any good trainers? Do you teach?
A: No, I am not aware of any good trainers or good training institutes. Also, I am not a trainer and I don’t teach Hadoop. I blog about it, so you may get to learn something from my blog posts. I also started www.hadoopscreencasts.com to share what I have learned. If I do get the time to work on it, you should see more videos there.
Q: Will coding questions be asked in the certification exams?
A: I don’t know.
Q: I have done the administration certification; can I clear the developer certification?
A: I don’t know. Look at the respective certification details for this.
Q: Is Hadoop a good career choice? Will Hadoop boost my career?
A: Hadoop is being used by many organizations, and there is strong demand for people with Hadoop knowledge. Having said that, there are many other technologies and frameworks that are in demand too. You have to do what you like to do.
Q: Which is better, Pig or Hive?
A: Both are good and are widely used. If you like writing scripts, you can try Pig. If you like writing SQL-like queries, use Hive.
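To get a feel for the difference in style, here is a rough analogy in plain Python. It is neither Pig Latin nor HiveQL, and the sample data and function names are made up for illustration. The first function works step by step on the data, roughly the way a Pig script reads. The second states the whole result as one SQL query, roughly the way a Hive query reads, using Python's built-in `sqlite3` module as a stand-in.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: (visitor, page) records.
visits = [("ann", "home"), ("bob", "home"), ("ann", "cart"), ("ann", "home")]


def count_script_style(rows):
    """Step-by-step data flow: filter, project, then group and count."""
    home_only = [row for row in rows if row[1] == "home"]
    visitors = [visitor for visitor, _ in home_only]
    return sorted(Counter(visitors).items())


def count_sql_style(rows):
    """One declarative query that describes the result wanted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (visitor TEXT, page TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    query = (
        "SELECT visitor, COUNT(*) FROM visits "
        "WHERE page = 'home' GROUP BY visitor ORDER BY visitor"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_script_style(visits))
    print(count_sql_style(visits))
```

Both functions return the same counts. Whichever of the two reads more naturally to you is a reasonable hint about whether to start with Pig or with Hive.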
Q: Can you suggest some projects that deal with Big Data?
A: For this, you will have to be creative. If you can’t think of anything, redo what people have already done. Take data sets that are freely available and think of all the questions you can ask.
Q: Do you teach Hadoop?
A: I don’t teach Hadoop. I blog and have been trying to work on www.hadoopscreencasts.com whenever I get the time.
Q: Is it difficult to get a job in the Hadoop field?
A: NO. If you can work hard and prove yourself, I think it should be easy.