Looking for high-salary IT technologies? This survey is a good place to start.

2014 Data Science Salary Survey by O'Reilly

http://www.oreilly.com/data/free/files/2014-data-science-salary-survey.pdf

Conclusion
This report highlights some trends in the data space that many who work in its core have been aware of for some time: Hadoop is on the rise; cloud-based data services are important; and those who know how to use the advanced, recently developed tools of Big Data typically earn high salaries. What might be new here is in the details: which tools specifically tend to be used together, and which correspond to the highest salaries (pay attention to Spark and Storm!); which other factors most clearly affect data science salaries, and by how much. Clearly the bulk of the variation is determined by factors not at all specific to data, such as geographical location or position in the company hierarchy, but there is significant room for movement based on specific data skills … See the link above for the full report.

Installing a Hadoop Single-Node Cluster – Compiling from Source

OS Version: Ubuntu 14.04.2 Server version

Java Version: 1.7.0_79

Hadoop Version: 2.6.0

1. Install Java
sudo apt-get install default-jdk

java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

cd ~user1
vi .bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
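After editing .bashrc, reload it and confirm the variables are picked up (a quick check; the expected values assume the OpenJDK path above):
source ~/.bashrc
echo $JAVA_HOME     # should print /usr/lib/jvm/java-7-openjdk-amd64
java -version       # should report 1.7.0_79 as shown above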

2. >>>Download and install protobuf and the other tools required for compilation
curl -# -O https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
gunzip protobuf-2.5.0.tar.gz
tar -xvf protobuf-2.5.0.tar
cd protobuf-2.5.0/
sudo ./configure --prefix=/usr
sudo make
sudo make install
cd java
mvn install
mvn package

sudo apt-get install -y gcc g++ make maven cmake zlib1g zlib1g-dev libcurl4-openssl-dev
Note: zlib1g and zlib1g-dev were already installed, so the following is sufficient:
sudo apt-get install -y gcc g++ make maven cmake libcurl4-openssl-dev
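Before compiling Hadoop it is worth verifying the toolchain, since Hadoop 2.6.0 requires protobuf 2.5.0 exactly (a quick sanity check, not part of the original steps):
protoc --version     # should print: libprotoc 2.5.0
mvn -version         # Maven is needed for the Hadoop build
cmake --version      # cmake is needed for the native (-Pnative) build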

3. >>>Download the Hadoop 2.6.0 source from an Apache Hadoop mirror site.
wget http://mirror.nus.edu.sg/apache/hadoop/common/stable/hadoop-2.6.0-src.tar.gz
sudo gunzip hadoop-2.6.0-src.tar.gz
sudo tar -xvf hadoop-2.6.0-src.tar
cd hadoop-2.6.0-src/

4. >>>Compile the source
cd /home/user1/hadoop-2.6.0-src/
mvn clean install -DskipTests
cd hadoop-mapreduce-project/
export Platform=x64
mvn clean install assembly:assembly -Pnative
mvn package -Pdist,native -DskipTests=true -Dtar

This will create the binaries and a tar file under:
cd /home/user1/hadoop-2.6.0-src/hadoop-dist/target/hadoop-2.6.0/
Set the Hadoop path:
sudo ln -s /home/user1/hadoop-2.6.0-src/hadoop-dist/target/hadoop-2.6.0 /usr/local/hadoop
sudo vi /etc/environment
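The contents of /etc/environment are not shown above; a minimal sketch (an assumption, merge with your existing PATH rather than replacing it) that puts the Hadoop binaries and scripts on every user's PATH would be:
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin"
JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"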

5. >>>Configuring Hadoop
>>>cd to the Hadoop root folder
user1@Master:~$ cd /usr/local/hadoop
user1@Master:/usr/local/hadoop$ ls
bin  conf  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share

>>>Create the folder /app/hadoop/tmp to hold Hadoop metadata
user1@Master:/usr/local/hadoop/conf$ sudo mkdir -p /app/hadoop/tmp
user1@Master:/usr/local/hadoop/conf$ sudo chown user1 -R /app
user1@Master:/usr/local/hadoop/conf$ ls -ld /app
drwxr-xr-x 3 user1 root 4096 Jun 30 00:34 /app

>>>Configure Hadoop by creating the following configuration files
user1@Master:/usr/local/hadoop$ cd conf/
user1@Master:/usr/local/hadoop/conf$ vi core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://master:54310/</value>
</property>
</configuration>
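The hostname master used in fs.default.name must resolve on this machine. If it does not, an /etc/hosts entry along these lines (the address shown is just an example for a single-node setup) is enough:
127.0.1.1   master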

user1@Master:/usr/local/hadoop/conf$ vi mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at.  If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>

user1@Master:/usr/local/hadoop/conf$ vi hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
    The default of 3 is used if replication is not specified.
    </description>
</property>
</configuration>

user1@Master:/usr/local/hadoop/conf$ vi yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.dispatcher.exit-on-error</name>
<value>true</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>master:8034</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>

user1@Master:/usr/local/hadoop/conf$ vi capacity-scheduler.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.state</name>
<value>RUNNING</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<value>*</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<value>*</value>
</property>
<property>
<name>yarn.scheduler.capacity.node-locality-delay</name>
<value>-1</value>
</property>
</configuration>

user1@Master:/usr/local/hadoop/conf$ vi hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/conf
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HADOOP_YARN_HOME=/usr/local/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/conf

user1@Master:/usr/local/hadoop/conf$ cp hadoop-env.sh yarn-env.sh

>>>We are building a single-node Hadoop cluster, so the master node itself is declared as a data node in the slaves file.

user1@Master:/usr/local/hadoop/conf$ hostname -f
Master
user1@Master:/usr/local/hadoop/conf$ vi slaves
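The contents of the slaves file are not shown above; for a single-node cluster it simply lists the master host itself, one hostname per line (use the name that hostname -f returned, or whichever form resolves on this machine):
master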

6. >>>Format the HDFS filesystem

user1@Master:/usr/local/hadoop/conf$ hdfs namenode -format
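If the format succeeds, the NameNode metadata ends up under the hadoop.tmp.dir configured in core-site.xml (dfs/name is the Hadoop 2.x default subdirectory), which can be checked with:
ls /app/hadoop/tmp/dfs/name/current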

7. >>>Start the Hadoop services and list them with jps
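The start commands themselves are not listed above; with this layout they come from /usr/local/hadoop/sbin (assuming it is on the PATH as set in /etc/environment earlier):
start-dfs.sh      # starts NameNode, SecondaryNameNode and DataNode
start-yarn.sh     # starts ResourceManager and NodeManager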

user1@Master:~$ jps
2475 NodeManager
1875 NameNode
2550 Jps
2208 SecondaryNameNode
2028 DataNode
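As a quick smoke test (the directory name is just an example), create a directory in HDFS and list the root; the ResourceManager web UI should also be reachable at http://master:8088, as configured in yarn-site.xml above:
hdfs dfs -mkdir -p /user/user1
hdfs dfs -ls /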

Hadoop Documentation Resources

  1. Always start with the creators: go to Apache Hadoop.
  2. Then go to the Hadoop distributors Cloudera and Hortonworks.

Good Hadoop Documentation Websites

A good website of Hadoop documents, covering installation, the ecosystem, and many other topics.

Compiling Hadoop from its source files

Getting started with Hadoop 2.2.0 — Building