Docker-based Ambari 1.7 cluster install wizard repo URL dead (404) while 'Running setup agent script'

I'm trying to get a simple 2-node cluster (including 1 node with the Ambari server) set up from the Ambari source. During the installation of the Ambari agent on the DataNode, it reaches out to a URL, http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/repodata/repomd.xml, which seems to be dead. I'm pretty green at this and can't work out a) what the right URL is and b) where to change it in the script.

Setup

As per the official docs for Ambari Development in Docker, I downloaded and built the Ambari (1.7, the most recent release) Docker image (I'm on OS X) with:

    git clone https://github.com/apache/ambari.git
    cd ambari
    docker build -t ambari/build ./dev-support/docker/docker

The build took a fairly long time (hours), but I was glad to see it worked. Docker rocks!

I then started a Docker container named ambari-master from the image built above, in interactive mode (using Docker's -it flags). This container serves as the Ambari server. Once inside, I wrote its SSH private key somewhere I could copy it from when the Ambari cluster install wizard asks for it. The Docker command for the server looks like this (slightly modified from the docs, and split across lines for readability):

    # From the cloned {ambari_root} directory:
    docker run \
        --privileged \
        -h master.coderigo.com \
        --name ambari-master \
        -p 80:80 -p 5005:5005 -p 8080:8080 \
        -v "$(pwd)":/tmp/ambari \
        -it \
        ambari/build bash
    
    # From here on in, we are INSIDE the created container (ambari-master).
    
    # Copy the ssh private key to give the install wizard.
    [root@ambari-master tmp]: cat ~/.ssh/id_rsa > /tmp/ambari/coderigo-ambari-server-id_rsa
    
    # Add an entry for the slave node to /etc/hosts. The slave is created in the next step,
    # so I'm pre-empting its IP address here (ambari-master itself has 172.17.0.25).
    [root@ambari-master tmp]: echo "172.17.0.26    slave1.coderigo.com slave1" >> /etc/hosts
    
    # Now fetch the ambari repo that is to be copied to all slaves by the Ambari install wizard
    # and place it where Ambari install wizard expects it (/etc/yum.repos.d/)
    [root@ambari-master tmp]: wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/1.x/GA/ambari.repo -O /etc/yum.repos.d/ambari.repo
    
    # Finally, fire up the ambari server on this container
    [root@ambari-master tmp]: /tmp/ambari-build-docker/bin/ambaribuild.py server
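To see what the wizard will actually push to the slaves (the registration log further down shows /etc/yum.repos.d/ambari.repo being scp'd across), it can help to list each repository section in that file together with its baseurl. The following is a minimal sketch, assuming the usual INI-style yum .repo layout; `list_repo_baseurls` is a hypothetical helper, not part of Ambari.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ambari): print "<repo id><TAB><baseurl>"
# for every section of a yum .repo file, assuming the usual INI-style layout.
list_repo_baseurls() {
  local repo_file="$1"
  awk '
    # A line such as [Updates-ambari-1.x] starts a new repository section;
    # the text between the brackets is the repo id that yum reports.
    /^\[.*\][[:space:]]*$/ {
      section = $0
      sub(/^\[/, "", section)
      sub(/\][[:space:]]*$/, "", section)
      next
    }
    # Strip the "baseurl=" prefix and print the URL next to its section id.
    /^baseurl[[:space:]]*=/ {
      url = $0
      sub(/^baseurl[[:space:]]*=[[:space:]]*/, "", url)
      print section "\t" url
    }
  ' "$repo_file"
}

# Example, inside ambari-master:
#   list_repo_baseurls /etc/yum.repos.d/ambari.repo
```

If the repository id that yum later complains about (Updates-ambari-1.x in the log) appears in this output, its baseurl would be the one being requested.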
    

Since I'm on OS X, I can run boot2docker ip in a new terminal, which gives me the IP address 192.168.59.103. To load the Ambari server web UI, I simply go to http://192.168.59.103:8080, and the Ambari web UI comes up. Good so far.

Next, I want to create a new Docker container to act as the other node in this small test cluster. I do so with the following, from a new terminal:

    # Start the slave docker container in interactive mode (again, split across lines for readability).
    # --link automatically links this container to the ambari-master node.
    docker run \
        --privileged \
        -h slave1.coderigo.com \
        --name ambari-slave1 \
        --link ambari-master:master.coderigo.com \
        -it \
        ambari/build bash
    
    # From here on in, we are INSIDE the created container (ambari-slave1)
    
    # Start ssh server (so that master can ssh into it)
    [root@ambari-slave1 tmp]: /etc/init.d/sshd start
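Because ambari-master's /etc/hosts entry for slave1 was written before the slave existed, it may be worth confirming that the pre-empted 172.17.0.26 matches what Docker actually assigned. A minimal sketch: `hosts_ip` is a hypothetical helper, and the `docker inspect` call is meant to be run from the OS X host terminal, not inside a container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IP address that a hosts-format file maps NAME to.
# The first matching entry wins; full-line comments are ignored.
hosts_ip() {
  local hosts_file="$1" name="$2"
  awk -v name="$name" '
    /^[[:space:]]*#/ { next }
    {
      # Field 1 is the IP; the remaining fields are hostnames and aliases.
      for (i = 2; i <= NF; i++) {
        if ($i == name) { print $1; exit }
      }
    }
  ' "$hosts_file"
}

# On the host: the IP Docker actually gave the slave container.
#   docker inspect --format '{{ .NetworkSettings.IPAddress }}' ambari-slave1
# Inside ambari-master: the IP the wizard will use for slave1.coderigo.com.
#   hosts_ip /etc/hosts slave1.coderigo.com
```

If the two addresses differ, the /etc/hosts line on ambari-master would need updating before running the wizard.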
    

Ambari install wizard

From here on, I work purely in the Ambari web UI at http://192.168.59.103:8080 (if you're not on OS X, the IP address may differ).

In the cluster install wizard, I select the following on the first three screens:

1. Cluster Name: clusterbomb, then Next.
2. Stack: HDP 2.2, then Next.
3. Target Hosts (FQDNs): slave1.coderigo.com; SSH Private Key: uploaded coderigo-ambari-server-id_rsa, created within the ambari-master container (see above), then Register and Confirm.
4. At this stage I get a screen showing install progress for slave1.coderigo.com, and after a few seconds it reports a failure. Clicking through to the log shows the following registration log for slave1.coderigo.com:

      ==========================
      Creating target directory…
      ==========================

      Command start time 2014-12-18 08:11:48

      Connection to slave1.coderigo.com closed.
      SSH command execution finished
      host=slave1.coderigo.com, exitcode=0
      Command end time 2014-12-18 08:11:48

      ==========================
      Copying common functions script…
      ==========================

      Command start time 2014-12-18 08:11:48

      scp /usr/lib/python2.6/site-packages/ambari_commons
      host=slave1.coderigo.com, exitcode=0
      Command end time 2014-12-18 08:11:48

      ==========================
      Copying OS type check script…
      ==========================

      Command start time 2014-12-18 08:11:48

      scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
      host=slave1.coderigo.com, exitcode=0
      Command end time 2014-12-18 08:11:49

      ==========================
      Running OS type check…
      ==========================

      Command start time 2014-12-18 08:11:49
      Cluster primary/cluster OS family is redhat6 and local/current OS family is redhat6

      Connection to slave1.coderigo.com closed.
      SSH command execution finished
      host=slave1.coderigo.com, exitcode=0
      Command end time 2014-12-18 08:11:49

      ==========================
      Checking 'sudo' package on remote host…
      ==========================

      Command start time 2014-12-18 08:11:49
      sudo-1.8.6p3-15.el6.x86_64

      Connection to slave1.coderigo.com closed.
      SSH command execution finished
      host=slave1.coderigo.com, exitcode=0
      Command end time 2014-12-18 08:11:49

      ==========================
      Copying repo file to 'tmp' folder…
      ==========================

      Command start time 2014-12-18 08:11:49

      scp /etc/yum.repos.d/ambari.repo
      host=slave1.coderigo.com, exitcode=0
      Command end time 2014-12-18 08:11:50

      ==========================
      Moving file to repo dir…
      ==========================

      Command start time 2014-12-18 08:11:50

      Connection to slave1.coderigo.com closed.
      SSH command execution finished
      host=slave1.coderigo.com, exitcode=0
      Command end time 2014-12-18 08:11:50

      ==========================
      Copying setup script file…
      ==========================

      Command start time 2014-12-18 08:11:50

      scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
      host=slave1.coderigo.com, exitcode=0
      Command end time 2014-12-18 08:11:50

      ==========================
      Running setup agent script…
      ==========================

      Command start time 2014-12-18 08:11:50
      http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
      Trying other mirror.
      Error: Cannot retrieve repository metadata (repomd.xml) for repository: Updates-ambari-1.x. Please verify its path and try again
      http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
      Trying other mirror.
      Error: Cannot retrieve repository metadata (repomd.xml) for repository: Updates-ambari-1.x. Please verify its path and try again
      /bin/sh: /usr/sbin/ambari-agent: No such file or directory
      {'exitstatus': 1, 'log': ('', None)}

      Connection to slave1.coderigo.com closed.
      SSH command execution finished
      host=slave1.coderigo.com, exitcode=1
      Command end time 2014-12-18 08:11:52

      ERROR: Bootstrap of host slave1.coderigo.com fails because previous action finished with non-zero exit code (1)
      ERROR MESSAGE: tcgetattr: Inappropriate ioctl for device
      Connection to slave1.coderigo.com closed.

      STDOUT: http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
      Trying other mirror.
      Error: Cannot retrieve repository metadata (repomd.xml) for repository: Updates-ambari-1.x. Please verify its path and try again
      http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
      Trying other mirror.
      Error: Cannot retrieve repository metadata (repomd.xml) for repository: Updates-ambari-1.x. Please verify its path and try again
      /bin/sh: /usr/sbin/ambari-agent: No such file or directory
      {'exitstatus': 1, 'log': ('', None)}

      Connection to slave1.coderigo.com closed.
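To tie the failure back to the repo file, the repository id can be pulled out of a saved copy of the registration log. A minimal sketch, assuming the log text above has been saved to a file; `failing_repo_ids` is a hypothetical helper that matches yum's "Cannot retrieve repository metadata" line as it appears here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct repository ids that yum reported as
# "Cannot retrieve repository metadata (repomd.xml) for repository: <id>."
failing_repo_ids() {
  local log_file="$1"
  # Capture the id between "for repository: " and the ". Please verify" that follows it.
  sed -n 's/.*for repository: \([^ ]*\)\. Please verify.*/\1/p' "$log_file" | sort -u
}

# Example (registration.log is a hypothetical saved copy of the log above):
#   failing_repo_ids registration.log
# For the log above this prints: Updates-ambari-1.x
```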

Problem

So, based on the log, I can see that almost everything works except for one dead URL: http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/repodata/repomd.xml, which returns a 404. I can confirm this by cURL-ing the URL manually: it returns a 404 with some XML saying it doesn't exist.
From digging around, I have tried other repo files, specifically these:

    http://s3.amazonaws.com/public-repo-1.hortonworks.com/AMBARI-1.x/repos/centos6/ambari.repo
    http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.2.3.7/ambari.repo

but these cause a failure earlier in the install process because, AFAICT, they don't match the Ambari server version.
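The manual cURL check can be scripted so that each candidate repo is tested the same way. yum requests <baseurl>/repodata/repomd.xml, which is why the 404 names that path. A minimal sketch: `repomd_url` and `http_status` are hypothetical helpers, while `-s`, `-o /dev/null` and `-w '%{http_code}'` are standard curl options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: yum looks for repository metadata at <baseurl>/repodata/repomd.xml.
repomd_url() {
  local baseurl="${1%/}"   # drop one trailing slash, if present
  printf '%s/repodata/repomd.xml\n' "$baseurl"
}

# Hypothetical helper: print only the HTTP status code for a URL
# (curl reports 000 if it could not connect at all).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
  echo
}

# Example: check the baseurl that yum complained about.
#   http_status "$(repomd_url http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates)"
```

A 200 would mean the metadata exists under that baseurl; the failure in the log would show up here as 404.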

Question

I never expected dead URLs to cause an issue (a redirect on dead URLs would be nice).
I would love some suggestions or pointers on how to get over this hurdle. I'm going to try a manual slave setup, but the SSH-based setup would be preferable. I hope the above is reproducible. Specifically, does anyone know:

a) what the right URL is; and

b) where to change it to make it work?
