Pivotal HAWQ Lands In The Hortonworks Sandbox

May 4, 2015 Dan Baskette

I have been working with Apache Hadoop® for quite a few years now and frequently find myself needing to try bits of code out on multiple distributions. During these times, the single node virtual editions of the various Hadoop distributions have always been my go-to resource.

When Pivotal first announced Pivotal HAWQ would be available on Hortonworks Data Platform (HDP), some of my first thoughts were about how nice it would be to provide customers the ability to install HAWQ directly onto the Hortonworks Sandbox to provide them with a place to take the software for a spin.

Earlier this week, I had a request to do a live customer demonstration of installing HAWQ on HDP 2.2.4 leveraging Apache Ambari. This activity kicked off those Sandbox thoughts again, and I decided to leverage the Sandbox for the demo.

I have to admit that even I was surprised: my initial fears that this would take too long and I’d miss my deadline melted away as I logged in to the completely configured environment just one hour later. I took the rest of my allotted time to work through some additional functionality that I knew would be needed in any follow-on efforts. One nice feature I spent a good deal of time on was automating the Ambari piece of the install via the extremely robust Ambari REST API (see below for more detail).

One challenge that I ran into immediately was a versioning issue. Hortonworks provides an HDP 2.2-based VM that runs Ambari 1.7, and a 2.2.4-based VM that runs the just-released Ambari 2.0. When developing the plugin that allows HAWQ to be installed as a service within Ambari, our developers were working with what was then the newest release, Ambari 1.7, so at release the Pivotal HAWQ installation requires Ambari 1.7. So, I had two options:

  • Update the HDP stack in the 2.2 based VM
  • Give the HAWQ installation a whirl on the 2.2.4 VM and Ambari 2.0

I decided to move forward with #2 just to see what would happen, and because I was unsure how upgrading all of the Hadoop stack might affect some of the other tutorials that Hortonworks provides. It worked. What you find below are the results of that first hour or so of work.

NOTE: Please keep in mind that this installation on the Sandbox results in what would be considered an unsupported configuration for production use, because it leverages Ambari 2.0. However, for simply exploring the capabilities of HAWQ, it seems to work just fine so far. The process might show a few warnings, but you can safely ignore those.


Step by Step Guide for the Installation


  • Download the Hortonworks Sandbox 2.2.4 and install it according to the instructions on the Hortonworks site.
  • Boot the VM. Once it has booted, the console displays the SSH command needed to log in to the Sandbox. The default root password is: hadoop. Using a terminal, SSH into the VM.
  • Outside of the VM: Download the Pivotal HAWQ package (PADS-1.3.0.1-13761 or newer) and the HAWQ plugin for Ambari on HDP (hawq-plugin-hdp-1.1-125 or newer). Move the files into the VM; this can be accomplished via a shared drive or scp. If the versions you downloaded are different, adjust the commands below to match. As an example:
scp /User/dbaskette/Downloads/hawq-plugin-hdp-1.1-125.tar.gz root@192.168.9.131:/opt
  • Untar and uncompress the downloaded files inside the VM.
cd /opt
tar xvf ./hawq-plugin-hdp-1.1-125.tar.gz
tar xvf ./PADS-1.3.0.1-13761.tar
  • Install the HAWQ plugin and build the local repository for the HAWQ installation.
yum install -y /opt/hawq-plugin-hdp-1.1-125/hawq-plugin-1.1-125.noarch.rpm
./PADS-1.3.0.1/setup_repo.sh
  • Modify the HUE proxy configuration so that requests for the local repository are not proxied.
sed -i "4i ProxyPass /PADS-1.3.0.1 !" /etc/httpd/conf.d/hue.conf
  • Restart the HTTP and Ambari services.
service httpd restart
service ambari restart
  • The Sandbox is now ready for the HAWQ installation. To make things a little simpler and more programmatic, you can leverage the Ambari REST API to create the HAWQ service and install it onto the Sandbox. First, we create the services within Ambari. This step makes the specified services available to be installed on the managed cluster, in this case the Sandbox.
curl -u admin:admin -H X-Requested-By:devops -i -X POST -d '{"ServiceInfo":{"service_name":"HAWQ"}}' "http://localhost:8080/api/v1/clusters/Sandbox/services"
curl -u admin:admin -H X-Requested-By:devops -i -X POST -d '{"ServiceInfo":{"service_name":"PXF"}}' "http://localhost:8080/api/v1/clusters/Sandbox/services"
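If you want to confirm that both services were registered before moving on, an optional check (not part of the original procedure) is to list the services Ambari now knows about; HAWQ and PXF should appear in the response.
curl -u admin:admin -H X-Requested-By:devops -i -X GET "http://localhost:8080/api/v1/clusters/Sandbox/services"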
  • Next, you identify the components that will be installed as part of each service.
curl -u admin:admin -H X-Requested-By:devops -i -X POST "http://localhost:8080/api/v1/clusters/Sandbox/services/HAWQ/components/HAWQMASTER"
curl -u admin:admin -H X-Requested-By:devops -i -X POST "http://localhost:8080/api/v1/clusters/Sandbox/services/HAWQ/components/HAWQSTANDBY"
curl -u admin:admin -H X-Requested-By:devops -i -X POST "http://localhost:8080/api/v1/clusters/Sandbox/services/HAWQ/components/HAWQSEGMENT"
curl -u admin:admin -H X-Requested-By:devops -i -X POST "http://localhost:8080/api/v1/clusters/Sandbox/services/PXF/components/PXF"
  • Make the required changes to the XML configuration files, as well as the HAWQ configuration files. Since the HAWQ configuration is new to the cluster, it must be created and then modified.
curl -u admin:admin -H X-Requested-By:devops -i -X PUT -d '{ "Clusters" : {"desired_config": {"type": "hawq-site", "tag" : "0" }}}'  "http://localhost:8080/api/v1/clusters/Sandbox"
curl -u admin:admin -H X-Requested-By:devops -i -X POST -d '{"type": "hawq-site", "tag": "1", "properties" : { "hawq.master.port" : 15432,"hawq.segments.per.node" : 1,"hawq.temp.directory" : "/data/hawq/temp"}}' "http://localhost:8080/api/v1/clusters/Sandbox/configurations"
curl -u admin:admin -H X-Requested-By:devops -i -X PUT -d '{ "Clusters" : {"desired_config": {"type": "hawq-site", "tag" : "1" }}}'  "http://localhost:8080/api/v1/clusters/Sandbox"
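As an optional sanity check, you can read the new configuration back out of Ambari and verify that the three hawq-site properties were stored under tag 1.
curl -u admin:admin -H X-Requested-By:devops -i -X GET "http://localhost:8080/api/v1/clusters/Sandbox/configurations?type=hawq-site&tag=1"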
  • Now, we let Ambari know which service will run on which cluster node. In this case, all services will run on the Sandbox (localhost).
curl -u admin:admin -H X-Requested-By:devops -i -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"HAWQMASTER"}}] }' "http://localhost:8080/api/v1/clusters/Sandbox/hosts?Hosts/host_name=sandbox.hortonworks.com"
curl -u admin:admin -H X-Requested-By:devops -i -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"HAWQSTANDBY"}}] }' "http://localhost:8080/api/v1/clusters/Sandbox/hosts?Hosts/host_name=sandbox.hortonworks.com"
curl -u admin:admin -H X-Requested-By:devops -i -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"HAWQSEGMENT"}}] }' "http://localhost:8080/api/v1/clusters/Sandbox/hosts?Hosts/host_name=sandbox.hortonworks.com"
curl -u admin:admin -H X-Requested-By:devops -i -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"PXF"}}] }' "http://localhost:8080/api/v1/clusters/Sandbox/hosts?Hosts/host_name=sandbox.hortonworks.com"
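Again optional, but you can verify the component-to-host assignments by listing the host components now registered on the Sandbox host.
curl -u admin:admin -H X-Requested-By:devops -i -X GET "http://localhost:8080/api/v1/clusters/Sandbox/hosts/sandbox.hortonworks.com/host_components?fields=HostRoles/state"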
  • Install HAWQ and PXF into the Sandbox.
curl -u admin:admin -H X-Requested-By:devops -i -X PUT -d '{"RequestInfo": {"context" :"Installing HAWQ via API"}, "Body": {"ServiceInfo": {"state" : "INSTALLED"}}}'  "http://localhost:8080/api/v1/clusters/Sandbox/services/HAWQ"
curl -u admin:admin -H X-Requested-By:devops -i -X PUT -d '{"RequestInfo": {"context" :"Installing PXF via API"}, "Body": {"ServiceInfo": {"state" : "INSTALLED"}}}'  "http://localhost:8080/api/v1/clusters/Sandbox/services/PXF"
  • Monitor the installation until the state is listed as INSTALLED. You can run these commands multiple times to watch the status of the installs.
curl -u admin:admin -H X-Requested-By:devops -i -X GET "http://localhost:8080/api/v1/clusters/Sandbox/services/HAWQ?fields=ServiceInfo/state"
curl -u admin:admin -H X-Requested-By:devops -i -X GET "http://localhost:8080/api/v1/clusters/Sandbox/services/PXF?fields=ServiceInfo/state"
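If you would rather not re-run those commands by hand, here is a rough convenience loop (my own sketch, not part of the original procedure) that polls the HAWQ service state every ten seconds until it reports INSTALLED; the same loop works for PXF if you swap the service name.
while true; do
  state=$(curl -s -u admin:admin -H X-Requested-By:devops "http://localhost:8080/api/v1/clusters/Sandbox/services/HAWQ?fields=ServiceInfo/state" | grep '"state"')
  echo "HAWQ ${state}"
  echo "${state}" | grep -q "INSTALLED" && break
  sleep 10
done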
  • After the installs are complete, the last step within the Ambari API is starting the services. This step will create users, set paths, and share SSH keys for the gpadmin user.
curl -u admin:admin -H X-Requested-By:devops -i -X PUT -d '{"RequestInfo": {"context" :"Starting HAWQ via API"}, "Body": {"ServiceInfo": {"state" : "STARTED"}}}'  "http://localhost:8080/api/v1/clusters/Sandbox/services/HAWQ"
curl -u admin:admin -H X-Requested-By:devops -i -X PUT -d '{"RequestInfo": {"context" :"Starting PXF via API"}, "Body": {"ServiceInfo": {"state" : "STARTED"}}}'  "http://localhost:8080/api/v1/clusters/Sandbox/services/PXF"
  • Because this is a single-node system, the HAWQ standby master will not start properly, which causes the initialization of the database to end prematurely. So, we must re-initialize the database as the gpadmin user. The password for the gpadmin user is gpadmin. If you are copying and pasting, make sure to type the first command (su - gpadmin) yourself and then paste the remaining lines.
su - gpadmin
source /usr/local/hawq/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/hawq/master/gpseg-1/
gpstop -m
rm -rf /data/hawq/master/*
rm -rf /data/hawq/segments/*
rm -rf /data/hawq/temp/*
cd /tmp/hawq
gpinitsystem -a -c ./gpinitsystem_config -h hostfile

Now HAWQ is up and running on the Hortonworks Sandbox. To verify the system is operational, you can execute the following commands inside the Sandbox. You can skip the first two commands if you are using the same terminal from which you just initialized the database.

su - gpadmin
source /usr/local/hawq/greenplum_path.sh
psql
select version();
create table heroes(firstname text,lastname text);
insert into heroes values ('Peter','Parker'),('Victor','VonDoom'),('Reed','Richards');
select * from heroes;

If the query shows the rows you added, then you have successfully installed HAWQ into the Hortonworks Sandbox.

Note: The following step was omitted from the guide above: edit the file /usr/local/hawq/etc/hdfs-client.xml and change the value of output.replace-datanode-on-failure to false. Save the file, and then, as the gpadmin user, run gpstop -u to reload the configuration.
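For reference, the property in /usr/local/hawq/etc/hdfs-client.xml should end up looking roughly like this (the surrounding XML in your copy of the file may differ slightly):

<property>
  <name>output.replace-datanode-on-failure</name>
  <value>false</value>
</property>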

Editor’s Note: ©2015 Pivotal Software, Inc. All rights reserved. Pivotal and HAWQ are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and/or other countries. Apache, Apache Hadoop, Hadoop, and Apache Ambari are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. Hortonworks and HDP are either registered trademarks or trademarks of Hortonworks Inc. in the United States and/or other countries.

About the Author

Dan Baskette

Dan is Director of Technical Marketing at Pivotal with over 20 years of experience in various pre-sales and engineering roles with Sun Microsystems, EMC Corporation, and Pivotal Software. In addition to his technical marketing duties, Dan is frequently called upon to roll up his sleeves for various "Will this work?" type projects. Dan is an avid collector of Marvel Comics gear and you can usually find him wearing a Marvel shirt. In his spare time, Dan enjoys playing tennis and hiking in the Smoky Mountains.
