hortonworks / ansible-hortonworks
Ansible playbooks for deploying Hortonworks Data Platform and DataFlow using Ambari Blueprints
License: Apache License 2.0
I enabled three edge nodes, intending to use them as a standalone NiFi cluster, but then the blueprint fails. I modified the inventory/openstack/group_vars/all file like this:
- group: "{{ name_prefix }}-edge"
count: 3
image: CentOS 7.2
flavor: m3.medium
public_ip: false
But then when I run apply_blueprint.sh:
fatal: [sebtestansible-master]: FAILED! => {"changed": false, "connection": "close", "content": "{\n \"status\" : 400,\n \"message\" : \"Invalid host_group specified: sebtestansible-edge. All request host groups must have a corresponding host group in the specified blueprint\"\n}", "content_type": "text/plain", "expires": "Thu, 01 Jan 1970 00:00:00 GMT", "failed": true, "msg": "Status code was not [200, 201, 202]: HTTP Error 400: Bad Request", "redirected": false, "server": "Jetty(8.1.19.v20160209)", "set_cookie": "AMBARISESSIONID=y9cr5j3jou9jp3pjay50g7w;Path=/;HttpOnly", "status": 400, "url": "http://sebtestansible-master:8080/api/v1/clusters/sebtestansible", "user": "admin", "x_frame_options": "DENY", "x_xss_protection": "1; mode=block"}
If I check /tmp/mytestcluster_blueprint:
"host_groups" : [
{
"name" : "sebtestansible-slave",
...
},
{
"name" : "sebtestansible-master",
But there is no entry for sebtestansible-edge.
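The error suggests the edge host group exists in the inventory but has no matching entry in the generated blueprint. As a sketch (the services shown are illustrative assumptions, not from this thread), the group would also need an entry in the `blueprint_dynamic` section of `playbooks/group_vars/all` so that the blueprint contains a matching host group:

```yaml
# Sketch: add a matching host group to blueprint_dynamic so the
# blueprint contains an "edge" group (services are illustrative only).
- host_group: "hdp-edge"
  clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT']
  services:
    - NIFI_MASTER
    - METRICS_MONITOR
```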
Hi,
could we add those two parameters (NIFI_REGISTRY and the log server) when we run ansible-hortonworks?
Thanks
Hi,
Could I set the version of NiFi with Ansible on Ubuntu 16.04?
Thanks.
Best regards.
The method used to count NameNode servers (to determine whether we have to enable HA) can be unsuitable when several NameNodes are all in the same group.
In the current version, NameNodes are counted by summing the number of groups containing the NAMENODE service:
# ansible-hortonworks/playbooks/set_variables.yml
set_fact:
namenode_groups: "{{ namenode_groups }} + [ '{{ item.host_group }}' ]"
when: groups[item.host_group] is defined and groups[item.host_group]|length > 0 and 'NAMENODE' in item.services
with_items: "{{ blueprint_dynamic }}"
no_log: True
#ansible-hortonworks/playbooks/roles/ambari-blueprint/templates/blueprint_dynamic.j2
# Check if we have multiple NN servers:
namenode_groups|length > 1
The thing is that we can have 2 NameNodes in the same group (if both servers host exactly the same services).
Example :
- role: hdp-namenode-1
clients: "{{ hdp_namenode_1_client }}" # not relevant here
services:
- NAMENODE
- ZKFC
- JOURNALNODE
- RESOURCEMANAGER
- ZOOKEEPER_SERVER
- METRICS_MONITOR
[…]
To handle this case, we need to change the method of counting NameNodes:
- name: Initialize the control variables
set_fact:
namenode_groups: []
namenode_count: 0
[...]
- name: Populate the namenode groups list
set_fact:
namenode_groups: "{{ namenode_groups }} + [ '{{ item.host_group }}' ]"
namenode_count: "{{ namenode_count | int + groups[item.host_group]|length }}"
when: groups[item.host_group] is defined and groups[item.host_group]|length > 0 and 'NAMENODE' in item.services
with_items: "{{ blueprint_dynamic }}"
no_log: True
And accordingly the checks for whether we have at least 2 NameNodes:
"xasecure.audit.destination.hdfs.dir" : "hdfs://{% if namenode_count | int > 1 %}{{ hdfs_ha_name }}{% else %}{{ hostvars[groups[namenode_groups.0]|sort|list|first]['ansible_fqdn'] }}:8020{% endif %}/ranger/audit",
[…]
{% if namenode_count | int > 1 -%}
Same thing for other HA like Ranger KMS, RM, …
Do you agree with this approach?
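To illustrate the difference between the two counting methods, here is a minimal Python sketch with a hypothetical inventory (all names are made up for the example):

```python
# Hypothetical inventory: a single host group that hosts two NameNodes.
groups = {"hdp-namenode": ["nn1.example.com", "nn2.example.com"]}
blueprint_dynamic = [
    {"host_group": "hdp-namenode", "services": ["NAMENODE", "ZKFC", "JOURNALNODE"]},
    {"host_group": "hdp-worker", "services": ["DATANODE"]},
]

# Current method: count the host groups that contain NAMENODE.
namenode_groups = [
    entry["host_group"]
    for entry in blueprint_dynamic
    if entry["host_group"] in groups
    and groups[entry["host_group"]]
    and "NAMENODE" in entry["services"]
]

# Proposed method: count the actual hosts in those groups.
namenode_count = sum(len(groups[g]) for g in namenode_groups)

print(len(namenode_groups))  # 1 -> the group-based check would NOT enable HA
print(namenode_count)        # 2 -> the host-based check would enable HA
```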
Hello,
Thank you for this script, it's very useful for us.
I have just one remark: I tried to do an installation in offline mode on CentOS nodes, but the EPEL repo usage is blocking me.
Would it be possible to check whether the needed packages are available and use EPEL only if packages are missing?
Thanks,
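As a sketch of what such a check could look like (the `epel_packages` variable is hypothetical; the yum module's `list` option returns the matching packages in `results`), EPEL could be enabled only when something is actually missing:

```yaml
# Sketch only: query each needed package, then install epel-release
# only if at least one package could not be found in the existing repos.
- name: Check whether the needed packages are already available
  yum:
    list: "{{ item }}"
  register: pkg_check
  with_items: "{{ epel_packages }}"   # hypothetical list of required packages

- name: Enable the EPEL repo only if some packages are missing
  package:
    name: epel-release
    state: present
  when: pkg_check.results | selectattr('results', 'equalto', []) | list | length > 0
```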
If you attempt to build a cluster with Ansible 2.6, the following error occurs:
TASK [ambari-blueprint : Generate the cluster blueprint] *********************** fatal: [m5.hadoop.cnc1.log.blackberry]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'template'. Error was a <class 'ansible.errors.AnsibleError'>, original message: template error while templating string: expected token '=', got '.'. String: {\n \"configurations\" : [\n{% if security|lower == \"mit-kdc\" or security|lower == \"active-directory\" %}\n
(I cut off the rest of the error; it's just JSON, and I assume the contents are what the template attempted to render.)
If you revert back to Ansible 2.5, the cluster builds as expected.
The requirements state Ansible 2.5+; I didn't see any notes about 2.6.
ERROR: Management pack solr-ambari-mpack-3.0.0 already installed!
TASK [ambari-config : Install the HDP Search Management Pack creates=/var/lib/ambari-server/resources/mpacks/{{ mpack_filename | regex_replace('.tar.gz$','') }}, _raw_params=echo yes | ambari-server install-mpack --mpack={{ repo_base_url }}/HDP-SOLR/hdp-solr-ambari-mp/{{ mpack_filename }}] ***
fatal: [customer-master-0]: FAILED! => {"changed": true, "cmd": "echo yes | ambari-server install-mpack --mpack=http://ip-10-42-4-10.ec2.internal/repos/HDP-SOLR/hdp-solr-ambari-mp/solr-service-mpack-3.0.0.tar.gz", "delta": "0:00:00.232108", "end": "2018-07-26 09:04:45.547090", "msg": "non-zero return code", "rc": 255, "start": "2018-07-26 09:04:45.314982", "stderr": "", "stderr_lines": [], "stdout": "Using python /usr/bin/python\nInstalling management pack\n\nERROR: Management pack solr-ambari-mpack-3.0.0 already installed!\nERROR: Exiting with exit code -1. \nREASON: Management pack solr-ambari-mpack-3.0.0 already installed!", "stdout_lines": ["Using python /usr/bin/python", "Installing management pack", "", "ERROR: Management pack solr-ambari-mpack-3.0.0 already installed!", "ERROR: Exiting with exit code -1. ", "REASON: Management pack solr-ambari-mpack-3.0.0 already installed!"]}
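A possible way to make this task tolerate reruns (a sketch; the real task lives in the ambari-config role, and `mpack_url` here is a hypothetical shorthand for the full URL) is to treat "already installed" as success instead of a failure:

```yaml
# Sketch: don't fail the play when the management pack is already installed.
- name: Install the HDP Search Management Pack
  shell: "echo yes | ambari-server install-mpack --mpack={{ mpack_url }}"
  register: mpack_result
  failed_when: >
    mpack_result.rc != 0 and
    'already installed' not in mpack_result.stdout
```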
Hi,
I am trying to run the apply_blueprint playbook, but using the example-hdp-ha-3-masters-with-ranger dynamic blueprint I get this error in the Ambari error log:
WARN [pool-20-thread-1] BlueprintConfigurationProcessor:1546 - The property 'dfs.namenode.secondary.http-address' is associated with the component 'SECONDARY_NAMENODE' which isn't mapped to any host group. This may affect configuration topology resolution. INFO [pool-20-thread-1] ConfigureClusterTask:74 - Some host groups require more hosts, cluster configuration cannot begin
But I don't find 'dfs.namenode.secondary.http-address' in the generated blueprint.
Is this normal, or have I misconfigured something?
Thanks
When using a local repository (synced with reposync) the build.id file from the original repository is not available, so the following code will not work.
- name: Attempt to read the HDP repo build.id file (Ambari >= 2.6)
  uri:
    url: "{{ hdp_main_repo_url }}/build.id"
    method: GET
    return_content: yes
  register: hdp_repo_build_id
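One possible workaround (a sketch, not the project's current behaviour) is to tolerate a missing build.id and let a later task fall back to a manually configured hdp_build_number:

```yaml
# Sketch: don't fail when the local repo has no build.id file; a later
# task could then fall back to an explicitly set hdp_build_number.
- name: Attempt to read the HDP repo build.id file (Ambari >= 2.6)
  uri:
    url: "{{ hdp_main_repo_url }}/build.id"
    method: GET
    return_content: yes
  register: hdp_repo_build_id
  failed_when: false
```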
From the Schema Registry Server start log:
"migrate" option failed : org.flywaydb.core.internal.exception.FlywayEnterpriseUpgradeRequiredException: Flyway Enterprise or PostgreSQL upgrade required: PostgreSQL 9.2 is past regular support by PostgreSQL and no longer supported by Flyway Community Edition and Flyway Pro Edition, but still supported by Flyway Enterprise Edition.
The tested installation contains an (almost) full HDP-3.0.1 stack, plus NiFi and the 2 registries (REGISTRY_SERVER being the Schema Registry):
- host_group: "hdp-master"
services:
- REGISTRY_SERVER
- NIFI_REGISTRY_MASTER ...
database_options.external_hostname
database_options
I'm running through the Install.md for OpenStack but can't get past the nova --insecure list command. I keep getting the following error:
(ansible)vagrant@localhost:~/nifi_examples/ansible-hdp$ nova --insecure list
No handlers could be found for logger "keystoneauth.identity.generic.base"
ERROR (ConnectFailure): Unable to establish connection to https://192.175.27.106:5000/v3/auth/tokens: HTTPSConnectionPool(host='192.175.27.106', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x1e872d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Hi,
Now Ansible stops at the cluster creation task:
[ambari-blueprint : Fail if the cluster create task is in an error state]
Actually, when I check the Ambari dashboard, some of the services, like ZooKeeper, run successfully. However, most of them are down and raise alerts. I have no idea how Ambari works inside, but all of these alerts are connection refused, and restarting the services didn't help. So do I have to open these ports manually, or disable the firewall? It would be a great help if you could point out the problem.
I also took a screenshot of this error.
Hosts information:
10.80.64.51 host0 master01 (which I run ansible on)
10.80.64.110 host1 slave01
10.80.64.34 host2 slave02
Thanks in advance!
I tried to use this example file to deploy a high-availability HDFS cluster: https://github.com/hortonworks/ansible-hortonworks/blob/master/playbooks/group_vars/example-hdp-ha-3-masters-with-storm-kafka , but it didn't work.
I got the error message "Unable to fetch namespace information from active NN".
After I added the following properties:
"dfs.ha.fencing.methods" : "sshfence",
"dfs.ha.fencing.ssh.private-key-files" : "/root/.ssh/id_rsa"
it started working.
Hi Team,
I was trying to install a cluster with NameNode and ResourceManager HA. The cluster installed, but some of the services (RM, MapReduce2, Hive, Oozie, Spark2) failed to start with the error below:
2018-07-13 10:33:22,032 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1293)) - Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: java.io.IOException: Incomplete HDFS URI, no host: hdfs://hwx_ansible
I am using the dynamic blueprint. By default, the nameservice is configured with the cluster name:
"dfs.nameservices" : "{{ cluster_name }}",
https://github.com/hortonworks/ansible-hortonworks/blob/master/playbooks/roles/ambari-blueprint/templates/blueprint_dynamic.j2#L326
cluster name: hwx_ansible (note: an underscore is not a valid character in a URI host, which is likely why the hdfs://hwx_ansible URI is rejected as having no host)
Running the HDP install using the static blueprint. Suddenly I'm getting this error: "No package matching 'ambari-agent' found available, installed or updated"
I have
Jinja 2.10,
boto==2.49.0
boto3==1.9.2
botocore==1.12.2
installed, which were problematic earlier.
Below is the error from the verbose output:
fatal: [hdp-single-node-cluster]: FAILED! => {
"changed": false,
"invocation": {
"module_args": {
"allow_downgrade": false,
"conf_file": null,
"disable_gpg_check": false,
"disablerepo": null,
"enablerepo": null,
"exclude": null,
"install_repoquery": true,
"installroot": "/",
"list": null,
"name": [
"ambari-agent"
],
"security": false,
"skip_broken": false,
"state": "latest",
"update_cache": true,
"validate_certs": true
}
},
"msg": "No package matching 'ambari-agent' found available, installed or updated",
"rc": 126,
"results": [
"No package matching 'ambari-agent' found available, installed or updated"
]
}
Hello,
As mentioned by @fredrikhgrelland, commit 4d41737 breaks the blueprint_dynamic.j2 template.
TASK [ambari-blueprint : Upload the blueprint and the cluster creation template] ***********************************************************************************************************
failed: [master-2] (item={u'dest': u'/tmp/astcluster_blueprint', u'src': u'blueprint_dynamic.j2'}) => {"changed": false, "item": {"dest": "/tmp/astcluster_blueprint", "src": "blueprint_dynamic.j2"}, "msg": "AnsibleError: template error while templating string: expected token '=', got '.'.
My playbooks/group_vars/ambari-server file is configured like this:
#############################
#############################
blueprint_name: '{{ cluster_name }}_blueprint' # the name of the blueprint as it will be stored in Ambari
blueprint_file: 'blueprint_dynamic.j2' # the blueprint JSON file - 'blueprint_dynamic.j2' is a Jinja2 template that generates the required JSON
blueprint_dynamic: # properties for the dynamic blueprint - these are only used by the 'blueprint_dynamic.j2' template to generate the JSON
  - role: "brokers"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'HCAT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'LOGSEARCH_LOGFEEDER']
    services:
  - role: "name-node"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'HIVE_CLIENT', 'HCAT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'LOGSEARCH_LOGFEEDER']
    services:
  - role: "sname-node"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'HIVE_CLIENT', 'HCAT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'LOGSEARCH_LOGFEEDER']
    services:
  - role: "worker1"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'HIVE_CLIENT', 'HCAT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'LOGSEARCH_LOGFEEDER']
    services:
  - role: "workers"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'SLIDER', 'PIG', 'HIVE_CLIENT', 'HCAT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'LOGSEARCH_LOGFEEDER']
    services:
Thanks for your help,
To prevent the line from being added to the file at each playbook run, we suggest changing the lines:
dest: /etc/hosts
insertafter: "^127..*$"
to:
path: /etc/hosts
regexp: '{{ item }}$'
Att.
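As a sketch of the suggested change (variable names here are hypothetical; the real task lives in the project's playbooks), using lineinfile with a regexp keyed on the managed line keeps the task idempotent across reruns:

```yaml
# Sketch: match an existing entry for the same line instead of always
# inserting after the 127.* line, so reruns update instead of appending.
- name: Manage /etc/hosts entries idempotently
  lineinfile:
    path: /etc/hosts
    regexp: '{{ item }}$'
    line: '{{ item }}'
  with_items: "{{ etc_hosts_entries }}"   # hypothetical list of host lines
```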
While provisioning a single node cluster with only HDFS and YARN I forgot to add HISTORYSERVER component.
Here is the error message I got in Ambari: Logical Request: Provision Cluster 'hdp-yarn' FAILED: Unable to update configuration property 'yarn.log.server.url' with topology information. Component 'HISTORYSERVER' is mapped to an invalid number of hosts '0'.
This was my blueprint_dynamic:
master_clients: "['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT']"
master_services: "- ZOOKEEPER_SERVER\n - NAMENODE\n - DATANODE\n - SECONDARY_NAMENODE\n - APP_TIMELINE_SERVER\n - RESOURCEMANAGER\n - NODEMANAGER\n - AMBARI_SERVER\n - METRICS_COLLECTOR\n - METRICS_MONITOR\n"
I added HISTORYSERVER and it worked. Shouldn't it fail earlier in the process?
First, I just want to say this is a great collection of playbooks/scripts. Thank you for this.
I am attempting to use updated versions of the ansible and azure python packages. After adding/tweaking the pip dependencies, I was able to get build_cloud to execute successfully (Azure).
I'm currently stuck at the prepare_nodes playbook. It won't get past TASK [Populate the namenode groups list]:
fatal: [host-mgmt.example.com]: FAILED! => {
"failed": true,
"msg": "'dict object' has no attribute 'blueprint_dynamic'"
}
The problem is in playbooks/set_variables.yml, line 38. It seems as though the vars in playbooks/group_vars/ambari-server are not being added to the host, and therefore blueprint_dynamic is not a var on the host. I can't get past this. Do you have any idea why those vars are not being added?
Here are my initial values:
cluster_name: 'hdp3'
ambari_version: '2.7.0.0' # must be the 4-part full version number
hdp_version: '3.0.0.0' # must be the 4-part full version number
hdp_build_number: '1634' # the HDP build number from docs.hortonworks.com (if set to 'auto', Ansible will try to get it from the repository)
Here is the error message:
An internal system exception occurred: Stack data, Stack HDP 3.0 is not found in Ambari metainfo
It fails on this task: TASK [ambari-config : Register the VDF with Ambari (Ambari >= 2.6)] ************
I see Ambari 2.6.2.2 is installed
This worked some weeks ago, did I miss something?
In playbooks/group_vars/ambari-server:
ambari_admin_user: 'admin'
ambari_admin_password: 'admin'
Are we supposed to change these values? I have found several instances where admin:admin is hardcoded. In one of my tests, changing this also caused an Ansible failure.
I guess we should keep it for the initial installation and change the password via the UI after the cluster is set up? And then can we update it for future runs, e.g. to add new slaves? I'm a bit confused here and would appreciate some guidance.
Thank you.
Sorry, I am just curious: what is the correct result of apply_blueprint.sh? I have retried the install from scratch (taking a snapshot before applying, then retrying) about 20 times with different combinations, but I never get a success message from it. Also, what is the minimal combination of services needed to successfully apply the blueprint? I have tried removing Solr, Solr+Hive, and Solr+HST+Hive+Spark, but still can't pass the apply phase. I need some help or a few tips here; any help is appreciated.
Hi,
I'm trying to configure a cluster with 2 NameNodes and 4 DataNodes.
I derived the cluster from the example-hdp-ha-3-masters example and added the host_group definitions (hdp-management, hdp-masternode-01/02 and hdp-worker) in playbooks/group_vars/all.
prepare_nodes.yml, install_ambari.yml and configure_ambari.yml apparently ran fine, but when I execute the playbook to apply the blueprint, I hit what looks like a Jinja2 template error during blueprint generation.
I'm not yet familiar with Jinja and blueprints, and I'm seeking help here to find possible root causes.
TASK
[ambari-blueprint : Generate the cluster blueprint] *********************************************************************************************************************************************************************************
task path: /home/pm/github.com/ansible-hortonworks/playbooks/roles/ambari-blueprint/tasks/main.yml:57
fatal: [manager]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'template'. Error was a <class 'ansible.errors.AnsibleError'>, original message: template error while templating string: expected token '=', got '.'. String: {\n "configurations" :
....
I was hoping I could test the template playbooks/roles/ambari-blueprint/templates/blueprint_dynamic.j2 from IPython, but I'm getting nowhere close to reproducing the Ansible blueprint generation error above.
Looking forward to your help figuring this out.
I actually found that there's already a similar issue on the same subject: #38.
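For reference, here is a minimal sketch of rendering a Jinja2 template outside Ansible (using a tiny stand-in template; rendering the real blueprint_dynamic.j2 would additionally require Ansible's custom filters and all the inventory variables, so not every template error reproduces this way):

```python
from jinja2 import Environment, StrictUndefined

# Stand-in template modelled on the kind of expression the blueprint uses.
template_text = '{% if security|lower == "mit-kdc" %}kerberos{% else %}plain{% endif %}'

# StrictUndefined makes missing variables fail loudly, like Ansible does.
env = Environment(undefined=StrictUndefined)
rendered = env.from_string(template_text).render(security="none")
print(rendered)  # -> plain
```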
Trying to install this on GCP. install_ambari.sh results in
fatal: [mytestcluster-hdp-slave-02]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'ambari-agent' found available, installed or updated", "rc": 126, "results": ["No package matching 'ambari-agent' found available, installed or updated"]}
###########################
###########################
cluster_name: 'WS-HDP'
ambari_version: '2.7.0.0' # must be the 4-part full version number
hdp_version: '3.0.0.0' # must be the 4-part full version number
hdp_build_number: 'auto' # the HDP build number from docs.hortonworks.com (if set to 'auto', Ansible will try to get it from the repository)
hdf_version: '3.2.0.0' # must be the 4-part full version number
hdf_build_number: 'auto' # the HDF build number from docs.hortonworks.com (if set to 'auto', Ansible will try to get it from the repository)
hdpsearch_version: '3.0.0' # must be the full version number
hdpsearch_build_number: '100' # the HDP Search build number from docs.hortonworks.com (hardcoded to 100 for the moment)
repo_base_url: 'http://public-repo-1.hortonworks.com' # change this if using a Local Repository
TASK [Fail if the selected components should not be part of an HDP 3 blueprint] ********************************************************************************************************************************
failed: [master01] (item=HCAT) => {"changed": false, "item": "HCAT", "msg": "When installing HDP 3 the component HCAT must not be part of the blueprint."}
failed: [master01] (item=SLIDER) => {"changed": false, "item": "SLIDER", "msg": "When installing HDP 3 the component SLIDER must not be part of the blueprint."}
failed: [master01] (item=WEBHCAT_SERVER) => {"changed": false, "item": "WEBHCAT_SERVER", "msg": "When installing HDP 3 the component WEBHCAT_SERVER must not be part of the blueprint."}
to retry, use: --limit @/Users/chandler/Documents/Projects/esxi/ansible/ansible-hortonworks/playbooks/configure_ambari.retry
playbooks/check_dynamic_blueprint.yml
- name: Fail if the selected components should not be part of an HDP 3 blueprint
  fail:
    msg: "When installing HDP 3 the component {{ item }} must not be part of the blueprint."
  when: install_hdp and hdp_major_version == '3' and item in blueprint_all_clients | union(blueprint_all_services)
  with_items:
    - 'HCAT'
    - 'SLIDER'
    - 'WEBHCAT_SERVER'
Is there any way to get around this? Thanks.
When I deploy Ambari (one master, 2 slaves) on Ubuntu 16.04, Ansible stops at TASK [ambari-blueprint : Generate the cluster blueprint] with:
"An unhandled exception occurred while running the lookup plugin 'template'. Error was a <class 'ansible.errors.AnsibleError'>, original message: template error while templating string: expected token 'end of statement block', got '.'
Is there anything wrong in the template file: blueprint_dynamic.j2?
WebHCat is marked as not supported in HDP 3.0, as well as HCat and Slider, but they are still present in playbooks/group_vars examples for HDP 3.0 deployments.
When using the example blueprint "example-hdp-ha-3-masters" I get the following error:
TASK [ambari-blueprint : Upload the blueprint to the Ambari server] ************
fatal: [management.local]: FAILED! => {"cache_control": "no-store", "changed": false, "connection": "close", "content": "{\n \"status\" : 400,\n \"message\" : \"Cluster Topology validation failed. Invalid service component count: [MYSQL_SERVER(actual=2, required=0-1)]. To disable topology validation and create the blueprint, add the following to the end of the url: '?validate_topology=false'\"\n}", "content_type": "text/plain", "expires": "Thu, 01 Jan 1970 00:00:00 GMT", "failed": true, "msg": "Status code was not [200, 201, 202, 404]: HTTP Error 400: Bad Request", "pragma": "no-cache", "redirected": false, "set_cookie": "AMBARISESSIONID=ddkcn9it0pzbq8d7g0lnsa0j;Path=/;HttpOnly", "status": 400, "url": "http://management.local:8080/api/v1/blueprints/mytestcluster_blueprint", "user": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "x_content_type_options": "nosniff", "x_frame_options": "DENY", "x_xss_protection": "1; mode=block"}
indicating that MYSQL_SERVER should only be present 0 or 1 time.
This is caused by the following line of code in blueprint_dynamic.j2, at line 522:
{ "name" : "{{ service }}" }{% if service == "HIVE_METASTORE" and database == "embedded" %},{ "name" : "MYSQL_SERVER" }{% endif %}{% if not loop.last %},{% endif %}
MYSQL_SERVER should be added only to the first HIVE_METASTORE node.
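A possible fix is sketched below; `hive_metastore_groups` is a hypothetical variable holding the host groups that contain HIVE_METASTORE, which the template would need to compute beforehand. The idea is to emit MYSQL_SERVER only for the first such group:

```jinja
{# Sketch: attach MYSQL_SERVER only to the first host group with HIVE_METASTORE #}
{ "name" : "{{ service }}" }{% if service == "HIVE_METASTORE" and database == "embedded" and item.host_group == hive_metastore_groups | first %},{ "name" : "MYSQL_SERVER" }{% endif %}{% if not loop.last %},{% endif %}
```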
I am attempting to deploy to AWS in a new VPC and am getting the following error at the TASK [Create hdp-master node(s) (EBS root volume)] step:
fatal: [localhost]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: list object has no element 0\n\nThe error appears to have been in '/ansible-hortonworks/playbooks/clouds/build_aws_nodes.yml': line 10, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create {{ outer_loop.role }} node(s) (EBS root volume)\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - "{{ foo }}"\n"}
I have looked over the file and cannot seem to locate the issue.
The Resource Manager wouldn't start because it expects port 53 to be available. After the start failed, I manually changed the port to 54 and it worked.
Our standard server design is that we run a local dnsmasq process on port 53, and run all DNS queries through this.
In the Hortonworks 3.0 deployment, the Yarn Registry DNS service also attempts to bind to port 53, and this is different from the upstream default of 5335 in the Apache Hadoop 3.1 distribution.
http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html
So we have two questions:
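A possible workaround is sketched below. The property name is taken from the linked Apache RegistryDNS documentation; the exact configuration section Ambari expects may differ, so treat this as an assumption to verify:

```yaml
# Sketch: move the Registry DNS bind port off port 53 so it doesn't
# collide with a local DNS server such as dnsmasq.
configurations:
  - yarn-site:
      hadoop.registry.dns.bind-port: "5335"
```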
Error: RuntimeError: Failed to execute command '/usr/bin/yum -y install hadoop_3_0_0_0_auto', exited with code '1', message: 'Error: Nothing to do
Ambari 2.7.0 installs fine using the Ansible scripts; the error happens in apply_blueprint.yml.
I'm using CentOS 7 on AWS, with 1 master node and 1 data node.
I'm enclosing the output from Ambari:
std-err_std-out.txt
Edit:
If it is relevant: I am running the Ansible playbooks as-is, using --extra-vars to override the configurations from group_vars/all, for AWS and for HDP.
I wanted to use your Ansible playbook(s) to generate and apply a blueprint to an existing Ambari installation. Unfortunately the URL scheme (http) and the Ambari port (8080) are hardcoded in multiple roles/playbooks, and in my company's environment it's currently not possible to change either the port or the scheme.
Rather than manipulating your playbook(s) (and maintaining a "fork" of them in the future) I'd like to submit a PR which makes the port and URL scheme configurable. It's likely that there are other people around the world who could benefit from this.
If you agree, I'm happy to submit a PR with the changes - please let me know.
Regards, Thomas
Hi,
can you create a dynamic blueprint template example to install a cluster with Atlas?
Thank you!
Is there any way to re-apply the blueprint after a previously failed apply?
I got the following message, and it seems you can't remove an existing cluster:
fatal: [HDP-0]: FAILED! => {"changed": false, "msg": "Cluster HDP already exists!"}
Thanks for help.
I'm trying to configure the Ambari agent on Ubuntu 14.04, which uses Python 2.7.6, and it gives the following error when the agent starts:
'module' object has no attribute 'PROTOCOL_TLSv1_2'
But it seems to work with Python 2.7.9.
See ansible-hortonworks/playbooks/roles/ambari-agent/tasks/main.yml, lines 17 to 27 (commit c6aa731).
In my case I have multiple IPs (network adapters) and the default one is not the first. This results in a pg_hba.conf which looks like this:
local all ambari,mapred md5
host all ambari,mapred 0.0.0.0/0 md5
host all ambari,mapred ::/0 md5
host ambari ambari 10.0.2.15/32 md5
host hive hive 10.0.2.15/32 md5
host oozie oozie 10.0.2.15/32 md5
host ranger ranger 10.0.2.15/32 md5
My /etc/hosts
127.0.0.1 localhost
192.168.2.2 ansible.local ansible
192.168.2.3 repository.local repository
192.168.2.10 single.local single
And here is the bug when starting ranger
2018-06-19 16:12:51,664 [I] --------- Verifying Ranger DB connection ---------
2018-06-19 16:12:51,664 [I] Checking connection
2018-06-19 16:12:51,664 [JISQL] /usr/jdk64/jdk1.8.0_112/bin/java -cp /usr/hdp/2.6.3.0-235/ranger-admin/ews/lib/postgresql-jdbc.jar:/usr/hdp/current/ranger-admin/jisql/lib/* org.apache.util.sql.Jisql -driver postgresql -cstring jdbc:postgresql://single.local/ranger -u ranger -p '********' -noheader -trim -c \; -query "SELECT 1;"
SQLException : SQL state: 28000 org.postgresql.util.PSQLException: FATAL: no pg_hba.conf entry for host "192.168.2.10", user "ranger", database "ranger", SSL off ErrorCode: 0
2018-06-19 16:12:52,035 [E] Can't establish connection
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/RANGER/0.4.0/package/scripts/ranger_admin.py", line 231, in <module>
RangerAdmin().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 375, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/RANGER/0.4.0/package/scripts/ranger_admin.py", line 93, in start
self.configure(env, upgrade_type=upgrade_type, setup_db=params.stack_supports_ranger_setup_db_on_start)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 120, in locking_configure
original_configure(obj, *args, **kw)
File "/var/lib/ambari-agent/cache/common-services/RANGER/0.4.0/package/scripts/ranger_admin.py", line 135, in configure
setup_ranger_db()
File "/var/lib/ambari-agent/cache/common-services/RANGER/0.4.0/package/scripts/setup_ranger_xml.py", line 274, in setup_ranger_db
user=params.unix_user,
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-python-wrap /usr/hdp/current/ranger-admin/db_setup.py' returned 1. 2018-06-19 16:12:55,537 [I] DB FLAVOR :POSTGRES
Would the best choice be to add all the interfaces to pg_hba.conf?
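For illustration (a sketch based on the addresses shown above; adjust to the actual network and repeat for each database user), that would mean one entry per local interface:

```
# Sketch: pg_hba.conf entries covering both local interfaces
host ranger ranger 10.0.2.15/32     md5
host ranger ranger 192.168.2.10/32  md5
```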
Hi,
After fixing issue #52, I reran apply_blueprint.sh -vvv and got an error like:
Got error like "Topology validation failed: org.apache.ambari.server.topology.InvalidTopologyException: the following hosts are mapped to multiple host groups: [hostsA]...."
Indeed, 'hostA' is mapped to both the "hdp-management" group and "hdp-masternode-01", and I guess that's the problem. The thing is that I have 2 NameNodes, one of which I'd also like to use to host KNOX/ZEPPELIN/RANGER/ATLAS/KERBEROS, while keeping the 4 DataNodes to host the HBASE_REGIONSERVER and SOLR instances.
So I moved the ambari-server/Knox/Zeppelin/Ranger/Atlas to hdp-masternode-02 along with the SECONDARY_NAMENODE service.
Running apply_blueprint.sh -vvv --check leads to "Fail if could not get a VersionDefinition from Ambari", yet playbooks/group_vars/all contains the definition:
ambari_version: '2.6.2.2'
I would appreciate some guidance on this.
Thanks
Hi,
I am trying to deploy a simple HDFS cluster without a resource manager. Here's my blueprint template:
blueprint_name: '{{ cluster_name }}_blueprint'
blueprint_file: 'blueprint_dynamic.j2'
blueprint_dynamic:
- host_group: "hdp-master"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT']
services:
- ZOOKEEPER_SERVER
- NAMENODE
- SECONDARY_NAMENODE
- AMBARI_SERVER
- METRICS_COLLECTOR
- METRICS_GRAFANA
- METRICS_MONITOR
- host_group: "hdp-slave"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT']
services:
- DATANODE
- NODEMANAGER
- METRICS_MONITOR
But the deployment keeps failing with:
FAILED! => {"cache_control": "no-store", "changed": false, "connection": "close", "content": "{\n "status" : 400,\n "message" : "Cluster Topology validation failed. Invalid service component count: [APP_TIMELINE_SERVER(actual=0, required=1), RESOURCEMANAGER(actual=0, required=1-2), TIMELINE_READER(actual=0, required=1), YARN_CLIENT(actual=0, required=1+)]. To disable topology validation and create the blueprint, add the following to the end of the url: '?validate_topology=false'"\n}", "content_type": "text/plain;charset=utf-8", "date": "Mon, 17 Sep 2018 10:50:56 GMT", "expires": "Thu, 01 Jan 1970 00:00:00 GMT", "json": {"message": "Cluster Topology validation failed. Invalid service component count: [APP_TIMELINE_SERVER(actual=0, required=1), RESOURCEMANAGER(actual=0, required=1-2), TIMELINE_READER(actual=0, required=1), YARN_CLIENT(actual=0, required=1+)]. To disable topology validation and create the blueprint, add the following to the end of the url: '?validate_topology=false'", "status": 400}, "msg": "Status code was 400 and not [200, 201, 202, 409]: HTTP Error 400: Bad Request", "pragma": "no-cache", "redirected": false, "set_cookie": "AMBARISESSIONID=node014endllt0wk5010rxj91xc3xxv9.node0;Path=/;HttpOnly", "status": 400, "url": "http://hdfs01.adm01.com:8080/api/v1/blueprints/mytestcluster_blueprint", "user": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "x_content_type_options": "nosniff, nosniff", "x_frame_options": "DENY", "x_xss_protection": "1; mode=block"}
Any help is really appreciated!
Sadek
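For what it's worth, the 400 above is Ambari's topology validation firing: NODEMANAGER is a YARN component, so once it appears the blueprint must also carry RESOURCEMANAGER, APP_TIMELINE_SERVER, TIMELINE_READER and YARN_CLIENT (exactly the components the error lists). For a pure HDFS cluster, one sketch of a fix is to drop NODEMANAGER from the slave group; the alternative, per the error text itself, is to append '?validate_topology=false' to the blueprint URL.

```yaml
# Sketch only: hdp-slave group without the YARN NodeManager, so no
# YARN components are required by topology validation.
- host_group: "hdp-slave"
  clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT']
  services:
    - DATANODE
    - METRICS_MONITOR
```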
In an HA deployment one of the HBase Masters crashes; it looks like a race condition. If I start the crashed master afterwards, it works fine.
The crashed master's log:
2018-10-08 15:21:35,831 ERROR [master/hdp-master-03:16000] master.HMaster: Failed to become active master
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hbase/data/.tmp/hbase.version could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
at org.apache.hadoop.ipc.Client.call(Client.java:1443)
at org.apache.hadoop.ipc.Client.call(Client.java:1353)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:510)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1078)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1865)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2018-10-08 15:21:35,832 ERROR [master/hdp-master-03:16000] master.HMaster: ***** ABORTING master hdp-master-03.caf.net,16000,1539012062336: Unhandled exception. Starting shutdown. *****
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hbase/data/.tmp/hbase.version could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and no node(s) are excluded in this operation.
Blueprint
blueprint_dynamic: # properties for the dynamic blueprint - these are only used by the 'blueprint_dynamic.j2' template to generate the JSON
- host_group: "hdp-gateway"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'HIVE_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
services:
- AMBARI_SERVER
- host_group: "hdp-masternode-01"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'HIVE_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT', 'INFRA_SOLR_CLIENT']
services:
- ZOOKEEPER_SERVER
- NAMENODE
- ZKFC
- JOURNALNODE
- RESOURCEMANAGER
- HBASE_MASTER
- HIVE_SERVER
- HIVE_METASTORE
- INFRA_SOLR
- AMBARI_SERVER
- host_group: "hdp-masternode-02"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'HIVE_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT', 'INFRA_SOLR_CLIENT']
services:
- ZOOKEEPER_SERVER
- NAMENODE
- ZKFC
- JOURNALNODE
- RESOURCEMANAGER
- APP_TIMELINE_SERVER
- YARN_REGISTRY_DNS
- TIMELINE_READER
- HIVE_SERVER
- HIVE_METASTORE
- host_group: "hdp-masternode-03"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'HIVE_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT', 'INFRA_SOLR_CLIENT']
services:
- ZOOKEEPER_SERVER
- JOURNALNODE
- HIVE_METASTORE
- HBASE_MASTER
- HISTORYSERVER
- METRICS_COLLECTOR
- METRICS_GRAFANA
- METRICS_MONITOR
- SPARK2_JOBHISTORYSERVER
- host_group: "hdp-worker"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'HIVE_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT', 'INFRA_SOLR_CLIENT']
services:
- DATANODE
- NODEMANAGER
- KAFKA_BROKER
- HBASE_REGIONSERVER
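The "could only be written to 0 of the 1 minReplication nodes ... 0 datanode(s) running" line above says HDFS had no live DataNodes yet when the HBase Master tried to write /apps/hbase/data/.tmp/hbase.version, which matches the race-condition theory. A minimal sketch of a sanity check before restarting the failed master (the canned report text below is illustrative only, not output from this cluster):

```shell
# On a real node this would be:
#   hdfs dfsadmin -report | grep -i 'live datanodes'
# Illustrated here against a canned excerpt of that report:
report='Configured Capacity: 1099511627776 (1 TB)
Live datanodes (3):
Dead datanodes (0):'
echo "$report" | grep -i 'live datanodes'
```

Once the report shows a non-zero live-DataNode count, restarting the aborted HBase Master from Ambari should let it write hbase.version and become active.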
I am using CentOS 7 with 1 master and 2 slave nodes, with the standard configuration. I changed the IPs in the static inventory file.
When starting the blueprint installation I always get this error:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/hook.py", line 37, in <module>
    BeforeInstallHook().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 335, in execute
    if self.should_expose_component_version(self.command_name):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 246, in should_expose_component_version
    if stack_version_formatted and check_stack_feature(StackFeature.ROLLING_UPGRADE, stack_version_formatted):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/stack_features.py", line 78, in check_stack_feature
    raise Fail("Stack features not defined by stack")
resource_management.core.exceptions.Fail: Stack features not defined by stack
I then need to rerun the host installation via Ambari.
What might be causing this problem?
Thanks
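A hedged suggestion: "Stack features not defined by stack" in before-INSTALL hooks is commonly caused by a stale ambari-agent script cache (for instance after an Ambari or stack version change), since the failing hook.py runs out of /var/lib/ambari-agent/cache. A sketch of the usual remedy, guarded so it is a no-op on machines without the agent:

```shell
# Clear the cached stack scripts and restart the agent so it re-downloads
# fresh copies from the Ambari server. Run as root on each affected node.
cache=/var/lib/ambari-agent/cache/stacks/HDP
if [ -d "$cache" ]; then
  rm -rf "$cache"
  ambari-agent restart
else
  echo "agent cache not present on this machine"
fi
```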
The value 'auto' didn't work for me for 3.0.0 and it doesn't work for me on 3.0.1.
hdp_build_number for 3.0.0 was 1634.
What is build number for 3.0.1?
Is there a way to check this myself?
Thank you!
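Not authoritative, but two ways to find it: a 3.0.1 install reported elsewhere in this thread shows "Hadoop 3.1.1.3.0.1.0-187", i.e. build 187; and on any node that already has the HDP bits installed, the full version string including the build suffix can be read locally:

```shell
# hdp-select prints every installed HDP version with its build suffix
# (e.g. 3.0.1.0-187). Guarded so the sketch is safe to run on
# machines without HDP installed.
if command -v hdp-select >/dev/null 2>&1; then
  hdp-select versions
else
  echo "hdp-select not installed here"
fi
```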
Include support for installing HCP packages, e.g. Metron.
Ref : https://docs.hortonworks.com/HDPDocuments/HCP1/HCP-1.4.0/bk_installation/bk_installation.pdf
Hi,
Does the database property (which defaults to 'embedded') apply to both Ambari and Apache Ranger? I am trying to install RANGER_ADMIN/RANGER_KMS_SERVER on a host backed by MySQL (installed/configured by this playbook), while leaving Ambari on the default (embedded) database.
Also, what would be the easiest way to add RANGER to the stack after Ambari has been deployed with the embedded DB?
Thanks!
Sadek
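For reference, the dynamic blueprint template quoted later in this thread reads Ranger's connection settings from the 'database' and 'database_options' variables, so a MySQL-backed Ranger would be declared roughly as below (variable names as used by the playbook's templates; the values are placeholders, and whether Ambari itself can stay on the embedded database at the same time is exactly the open question here):

```yaml
# Sketch only - placeholder values.
database: 'mysql'
database_options:
  external_hostname: ''              # empty: DB installed on the Ambari node
  rangeradmin_db_name: 'ranger'
  rangeradmin_db_username: 'ranger'
  rangeradmin_db_password: 'ChangeMe'
  rangerkms_db_name: 'rangerkms'
  rangerkms_db_username: 'rangerkms'
  rangerkms_db_password: 'ChangeMe'
```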
Hi,
I've walked through all the playbooks successfully, with a topology of 2 NN and 4 DN (2 'hdp-worker-zk' and 2 'hdp-worker'), as per below.
Now, when in Ambari I try to start everything, the NameNodes are not starting, and the log shows:
2018-10-01 12:33:00,455 WARN namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(716)) - Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
Has anyone encountered this issue?
I see /hadoop/hdfs/namenode is completely empty.
What would be the recommended course of action to get both NameNodes up and running?
Should one be started with 'hdfs namenode -bootstrapStandby' and the other one then formatted?
The hadoop version used : Hadoop 3.1.1.3.0.1.0-187
hdfs getconf -namenodes returns the expected 2 nodes
One last thing.
In core-site I have ha.zookeeper.quorum pointing to 4 nodes (2 NN and 2 DN) listening on port 2181.
When I check each of these nodes, one of the DNs is not listening.
The same host that was supposed to be a ZooKeeper server is apparently also missing its JournalNode. Could this be related to the NameNode formatting issue?
Should I clean up the HDFS config, removing the missing node from the following advanced configs?
ha.zookeeper.quorum
dfs.namenode.shared.edits.dir
If so, what are then the suggested actions to get the NameNodes formatted and the cluster up and running?
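On the formatting question: the usual manual bootstrap order for an HA NameNode pair (a sketch; verify against the HDFS HA documentation for your exact version, and only on a cluster with no data to lose) is to format one NameNode, bootstrap the standby from it, then initialize the failover state in ZooKeeper:

```shell
# Steps as comments; these are the standard HDFS HA bootstrap commands,
# run as the hdfs user on the hosts indicated.
#   1. On the first NameNode:   hdfs namenode -format
#   2. Start it, then on the second NameNode:
#                               hdfs namenode -bootstrapStandby
#   3. On either NameNode:      hdfs zkfc -formatZK
#   4. Start both NameNodes and ZKFCs from Ambari.
echo "HA bootstrap: format -> bootstrapStandby -> zkfc -formatZK"
```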
For reference, here are snippets of the template host_groups definition in the playbook/group_vars/all file:
blueprint_name: '{{ cluster_name }}_blueprint'
blueprint_file: 'blueprint_dynamic.j2'
blueprint_dynamic:
- host_group: "hdp-master1"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
services:
- ZOOKEEPER_SERVER
- NAMENODE
- ZKFC
- JOURNALNODE
- RESOURCEMANAGER
- APP_TIMELINE_SERVER
- TIMELINE_READER
- YARN_REGISTRY_DNS
- HISTORYSERVER
- SPARK2_JOBHISTORYSERVER
- ZEPPELIN_MASTER
- HIVE_SERVER
- HIVE_METASTORE
- HBASE_MASTER
- HST_SERVER
- ACTIVITY_ANALYZER
- ACTIVITY_EXPLORER
- HST_AGENT
- METRICS_MONITOR
- host_group: "hdp-master2"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
services:
- AMBARI_SERVER
- INFRA_SOLR
- ZOOKEEPER_SERVER
- NAMENODE
- ZKFC
- JOURNALNODE
- HIVE_SERVER
- HIVE_METASTORE
- OOZIE_SERVER
- ACTIVITY_ANALYZER
- KNOX_GATEWAY
- HST_AGENT
- METRICS_COLLECTOR
- METRICS_GRAFANA
- METRICS_MONITOR
- host_group: "hdp-worker-zk"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
services:
- ZOOKEEPER_SERVER
- JOURNALNODE
- DATANODE
- NODEMANAGER
- HBASE_REGIONSERVER
- ACTIVITY_ANALYZER
- HST_AGENT
- METRICS_MONITOR
#- SOLR_SERVER
- host_group: "hdp-worker"
clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT', 'MAPREDUCE2_CLIENT', 'TEZ_CLIENT', 'PIG', 'SQOOP', 'HIVE_CLIENT', 'OOZIE_CLIENT', 'INFRA_SOLR_CLIENT', 'SPARK2_CLIENT', 'HBASE_CLIENT']
services:
- DATANODE
- NODEMANAGER
- HBASE_REGIONSERVER
- HST_AGENT
- METRICS_MONITOR
#- SOLR_SERVER
Hi,
When I execute "install_cluster.sh" against a GCE project, I receive the following error:
fatal: [mytestcluster-hdp-master]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'template'. Error was a <class 'ansible.errors.AnsibleError'>, original message: template error while templating string: expected token 'end of statement block', got '.'. String: {\n "configurations" : [\n{% if security|lower == "mit-kdc" or security|lower == "active-directory" %}\n {\n "kerberos-env": {\n{% if security|lower == "active-directory" %}\n "ldap_url": "{{ security_options.ldap_url }}",\n "container_dn": "{{ security_options.container_dn }}",\n{% endif %}\n "manage_identities": "true",\n "install_packages": "true",\n "realm" : "{{ security_options.realm }}",\n "kdc_type" : "{{ security }}",\n "kdc_hosts" : "{{ security_options.external_hostname|default(ansible_fqdn,true) }}",\n "admin_server_host" : "{{ security_options.external_hostname|default(ansible_fqdn,true) }}"\n }\n },\n {\n "krb5-conf": {\n "manage_krb5_conf" : "true"\n }\n },\n{% endif %}\n{% if rangeradmin_hosts|length > 0 %}\n {\n "admin-properties" : {\n "DB_FLAVOR" : "{{ database|regex_replace('mariadb', 'mysql')|upper }}",\n "SQL_CONNECTOR_JAR" : "{{ hostvars[inventory_hostname][database + '_jdbc_location'] }}",\n "db_host" : "{{ database_options.external_hostname|default(ansible_fqdn,true) }}",\n "db_name" : "{{ database_options.rangeradmin_db_name }}",\n "db_user" : "{{ database_options.rangeradmin_db_username }}",\n "db_password" : "{{ database_options.rangeradmin_db_password }}",\n "policymgr_external_url" : "http://%HOSTGROUP::{{ rangeradmin_groups[0] }}%:6080"\n }\n },\n {\n "ranger-admin-site" : {\n "ranger.externalurl" : "http://%HOSTGROUP::{{ rangeradmin_groups[0] }}%:6080",\n {% if database == "mysql" or database == "mariadb" -%}\n "ranger.jpa.jdbc.driver": "com.mysql.jdbc.Driver",\n "ranger.jpa.jdbc.url": "jdbc:mysql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ mysql_port }}/{{ database_options.rangeradmin_db_name }}",\n {% endif 
-%}\n {% if database == "postgres" -%}\n "ranger.jpa.jdbc.driver": "org.postgresql.Driver",\n "ranger.jpa.jdbc.url": "jdbc:postgresql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ postgres_port }}/{{ database_options.rangeradmin_db_name }}",\n {% endif -%}\n "ranger.audit.source.type" : "solr",\n "ranger.audit.solr.zookeepers" : "{% for zk in zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}/infra-solr"\n }\n },\n{% if ranger_options.enable_plugins|default(no) %}\n{% if namenode_groups|length > 0 %}\n {\n "ranger-hdfs-plugin-properties" : {\n "ranger-hdfs-plugin-enabled" : "Yes"\n }\n },\n {\n "ranger-hdfs-security" : {\n "ranger.plugin.hdfs.policy.rest.url" : "http://{{ hostvars[rangeradmin_hosts|sort|list|first]['ansible_fqdn'] }}:6080",\n "ranger.plugin.hdfs.policy.pollIntervalMs" : "30000"\n }\n },\n {\n "ranger-hdfs-audit" : {\n "xasecure.audit.destination.db" : "false",\n "xasecure.audit.destination.hdfs" : "true",\n "xasecure.audit.destination.hdfs.dir" : "hdfs://{% if namenode_groups|length > 1 %}{{ hdfs_ha_name }}{% else %}{{ hostvars[groups[namenode_groups.0]|sort|list|first]['ansible_fqdn'] }}:8020{% endif %}/ranger/audit",\n "xasecure.audit.destination.solr" : "true",\n "xasecure.audit.destination.solr.zookeepers" : "{% for zk in zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}/infra-solr"\n }\n },\n {\n "ranger-hive-plugin-properties" : {\n "ranger-hive-plugin-enabled" : "Yes"\n }\n },\n {\n "ranger-hive-security" : {\n "ranger.plugin.hive.policy.rest.url" : "http://{{ hostvars[rangeradmin_hosts|sort|list|first]['ansible_fqdn'] }}:6080",\n "ranger.plugin.hive.policy.pollIntervalMs" : "30000"\n }\n },\n {\n "ranger-hive-audit" : {\n "xasecure.audit.destination.db" : "false",\n "xasecure.audit.destination.hdfs" : "true",\n "xasecure.audit.destination.hdfs.dir" : "hdfs://{% if namenode_groups|length > 1 %}{{ hdfs_ha_name 
}}{% else %}{{ hostvars[groups[namenode_groups.0]|sort|list|first]['ansible_fqdn'] }}:8020{% endif %}/ranger/audit",\n "xasecure.audit.destination.solr" : "true",\n "xasecure.audit.destination.solr.zookeepers" : "{% for zk in zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}/infra-solr"\n }\n },\n {\n "ranger-yarn-plugin-properties" : {\n "ranger-yarn-plugin-enabled" : "Yes"\n }\n },\n {\n "ranger-yarn-security" : {\n "ranger.plugin.yarn.policy.rest.url" : "http://{{ hostvars[rangeradmin_hosts|sort|list|first]['ansible_fqdn'] }}:6080",\n "ranger.plugin.yarn.policy.pollIntervalMs" : "30000"\n }\n },\n {\n "ranger-yarn-audit" : {\n "xasecure.audit.destination.db" : "false",\n "xasecure.audit.destination.hdfs" : "true",\n "xasecure.audit.destination.hdfs.dir" : "hdfs://{% if namenode_groups|length > 1 %}{{ hdfs_ha_name }}{% else %}{{ hostvars[groups[namenode_groups.0]|sort|list|first]['ansible_fqdn'] }}:8020{% endif %}/ranger/audit",\n "xasecure.audit.destination.solr" : "true",\n "xasecure.audit.destination.solr.zookeepers" : "{% for zk in zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}/infra-solr"\n }\n },\n {\n "ranger-hbase-plugin-properties" : {\n "ranger-hbase-plugin-enabled" : "Yes"\n }\n },\n {\n "ranger-hbase-security" : {\n "ranger.plugin.hbase.policy.rest.url" : "http://{{ hostvars[rangeradmin_hosts|sort|list|first]['ansible_fqdn'] }}:6080",\n "ranger.plugin.hbase.policy.pollIntervalMs" : "30000"\n }\n },\n {\n "ranger-hbase-audit" : {\n "xasecure.audit.destination.db" : "false",\n "xasecure.audit.destination.hdfs" : "true",\n "xasecure.audit.destination.hdfs.dir" : "hdfs://{% if namenode_groups|length > 1 %}{{ hdfs_ha_name }}{% else %}{{ hostvars[groups[namenode_groups.0]|sort|list|first]['ansible_fqdn'] }}:8020{% endif %}/ranger/audit",\n "xasecure.audit.destination.solr" : "true",\n "xasecure.audit.destination.solr.zookeepers" : "{% for zk in 
zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}/infra-solr"\n }\n },\n{% endif %}\n{% if hdf_hosts|length > 0 %}\n {\n "ranger-nifi-plugin-properties" : {\n "ranger-nifi-plugin-enabled" : "Yes"\n }\n },\n {\n "ranger-nifi-security" : {\n "ranger.plugin.nifi.policy.rest.url" : "http://{{ hostvars[rangeradmin_hosts|sort|list|first]['ansible_fqdn'] }}:6080",\n "ranger.plugin.nifi.policy.pollIntervalMs" : "30000"\n }\n },\n {\n "ranger-nifi-audit" : {\n{% if namenode_groups|length > 0 %}\n "xasecure.audit.destination.hdfs" : "true",\n "xasecure.audit.destination.hdfs.dir" : "hdfs://{% if namenode_groups|length > 1 %}{{ hdfs_ha_name }}{% else %}{{ hostvars[groups[namenode_groups.0]|sort|list|first]['ansible_fqdn'] }}:8020{% endif %}/ranger/audit",\n{% else %}\n "xasecure.audit.destination.hdfs" : "false",\n{% endif %}\n "xasecure.audit.destination.solr" : "true",\n "xasecure.audit.destination.solr.zookeepers" : "{% for zk in zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}/infra-solr"\n }\n },\n{% endif %}\n{% if security|lower == "mit-kdc" or security|lower == "active-directory" -%}\n {\n "ranger-storm-plugin-properties" : {\n "ranger-storm-plugin-enabled" : "Yes"\n }\n },\n {\n "ranger-storm-security" : {\n "ranger.plugin.storm.policy.rest.url" : "http://{{ hostvars[rangeradmin_hosts|sort|list|first]['ansible_fqdn'] }}:6080",\n "ranger.plugin.storm.policy.pollIntervalMs" : "30000"\n }\n },\n {\n "ranger-storm-audit" : {\n "xasecure.audit.destination.db" : "false",\n{% if namenode_groups|length > 0 %}\n "xasecure.audit.destination.hdfs" : "true",\n "xasecure.audit.destination.hdfs.dir" : "hdfs://{% if namenode_groups|length > 1 %}{{ hdfs_ha_name }}{% else %}{{ hostvars[groups[namenode_groups.0]|sort|list|first]['ansible_fqdn'] }}:8020{% endif %}/ranger/audit",\n{% else %}\n "xasecure.audit.destination.hdfs" : "false",\n{% endif %}\n 
"xasecure.audit.destination.solr" : "true",\n "xasecure.audit.destination.solr.zookeepers" : "{% for zk in zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}/infra-solr"\n }\n },\n{% endif %}\n {\n "ranger-kafka-plugin-properties" : {\n "ranger-kafka-plugin-enabled" : "Yes"\n }\n },\n {\n "ranger-kafka-security" : {\n "ranger.plugin.kafka.policy.rest.url" : "http://{{ hostvars[rangeradmin_hosts|sort|list|first]['ansible_fqdn'] }}:6080",\n "ranger.plugin.kafka.policy.pollIntervalMs" : "30000"\n }\n },\n {\n "ranger-kafka-audit" : {\n "xasecure.audit.destination.db" : "false",\n{% if namenode_groups|length > 0 %}\n "xasecure.audit.destination.hdfs" : "true",\n "xasecure.audit.destination.hdfs.dir" : "hdfs://{% if namenode_groups|length > 1 %}{{ hdfs_ha_name }}{% else %}{{ hostvars[groups[namenode_groups.0]|sort|list|first]['ansible_fqdn'] }}:8020{% endif %}/ranger/audit",\n{% else %}\n "xasecure.audit.destination.hdfs" : "false",\n{% endif %}\n "xasecure.audit.destination.solr" : "true",\n "xasecure.audit.destination.solr.zookeepers" : "{% for zk in zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}/infra-solr"\n }\n },\n{% endif %}\n {\n "ranger-env" : {\n{% if ranger_options.enable_plugins|default(no) %}\n{% if namenode_groups|length > 0 %}\n "ranger-hdfs-plugin-enabled" : "Yes",\n "ranger-hive-plugin-enabled" : "Yes",\n "ranger-yarn-plugin-enabled" : "Yes",\n "ranger-hbase-plugin-enabled" : "Yes",\n{% endif %}\n{% if hdf_hosts|length > 0 %}\n "ranger-nifi-plugin-enabled" : "Yes",\n{% endif %}\n{% if security|lower == "mit-kdc" or security|lower == "active-directory" -%}\n "ranger-storm-plugin-enabled" : "Yes",\n{% endif %}\n "ranger-kafka-plugin-enabled" : "Yes",\n{% endif %}\n "admin_username" : "admin",\n "admin_password" : "{{ ranger_options.ranger_admin_password }}",\n "ranger_admin_username" : "amb_ranger_admin",\n "ranger_admin_password" : "{{ 
ranger_options.ranger_admin_password }}",\n{% if hdp_major_version|int >= 3 -%}\n "rangerusersync_user_password" : "{{ ranger_options.ranger_admin_password }}",\n "rangertagsync_user_password" : "{{ ranger_options.ranger_admin_password }}",\n "keyadmin_user_password" : "{{ ranger_options.ranger_keyadmin_password }}",\n{% endif %}\n "xasecure.audit.destination.db" : "false",\n{% if namenode_groups|length > 0 %}\n "xasecure.audit.destination.hdfs" : "true",\n{% else %}\n "xasecure.audit.destination.hdfs" : "false",\n{% endif %}\n "xasecure.audit.destination.solr" : "true",\n "is_solrCloud_enabled": "true",\n "create_db_dbuser": "false"\n }\n },\n{% endif %}\n{% if rangerkms_hosts|length > 0 %}\n {\n "kms-properties" : {\n "DB_FLAVOR" : "{{ database|regex_replace('mariadb', 'mysql')|upper }}",\n "SQL_CONNECTOR_JAR" : "{{ hostvars[inventory_hostname][database + '_jdbc_location'] }}",\n "KMS_MASTER_KEY_PASSWD" : "{{ ranger_options.kms_master_key_password }}",\n "db_host" : "{{ database_options.external_hostname|default(ansible_fqdn,true) }}",\n "db_name" : "{{ database_options.rangerkms_db_name }}",\n "db_user" : "{{ database_options.rangerkms_db_username }}",\n "db_password" : "{{ database_options.rangerkms_db_password }}"\n }\n },\n {\n "dbks-site" : {\n {% if database == "mysql" or database == "mariadb" -%}\n "ranger.ks.jpa.jdbc.url": "jdbc:mysql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ mysql_port }}/{{ database_options.rangerkms_db_name }}",\n "ranger.ks.jpa.jdbc.driver": "com.mysql.jdbc.Driver"\n {% endif -%}\n {% if database == "postgres" -%}\n "ranger.ks.jpa.jdbc.url": "jdbc:postgresql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ postgres_port }}/{{ database_options.rangerkms_db_name }}",\n "ranger.ks.jpa.jdbc.driver": "org.postgresql.Driver"\n {% endif -%}\n }\n },\n {\n "kms-env" : {\n "create_db_user" : "false"\n }\n },\n {\n "kms-site" : {\n{% if rangerkms_hosts|length > 1 %}\n 
"hadoop.kms.cache.enable" : "false",\n "hadoop.kms.cache.timeout.ms" : "0",\n "hadoop.kms.current.key.cache.timeout.ms" : "0",\n "hadoop.kms.authentication.signer.secret.provider" : "zookeeper",\n "hadoop.kms.authentication.signer.secret.provider.zookeeper.connection.string" : "{% for zk in zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}",\n{% endif %}\n "hadoop.kms.proxyuser.HTTP.hosts" : "",\n "hadoop.kms.proxyuser.HTTP.users" : "",\n "hadoop.kms.proxyuser.ranger.groups" : "",\n "hadoop.kms.proxyuser.ranger.hosts" : "",\n "hadoop.kms.proxyuser.ranger.users" : "",\n "hadoop.kms.proxyuser.yarn.groups" : "",\n "hadoop.kms.proxyuser.yarn.hosts" : "",\n "hadoop.kms.proxyuser.yarn.users" : ""\n }\n },\n {\n "ranger-kms-audit" : {\n "xasecure.audit.destination.db" : "false",\n "xasecure.audit.destination.hdfs" : "true",\n "xasecure.audit.destination.hdfs.dir" : "hdfs://{% if namenode_groups|length > 1 %}{{ hdfs_ha_name }}{% else %}{{ hostvars[groups[namenode_groups.0]|sort|list|first]['ansible_fqdn'] }}:8020{% endif %}/ranger/audit",\n "xasecure.audit.destination.solr" : "true",\n "xasecure.audit.destination.solr.zookeepers" : "{% for zk in zookeeper_hosts %}{{ hostvars[zk]['ansible_fqdn'] }}:2181{% if not loop.last %},{% endif %}{% endfor %}/infra-solr"\n }\n },\n{% endif %}\n{% if hdf_hosts|length > 0 %}\n {\n "nifi-ambari-config" : {\n "nifi.node.ssl.port": "9091",\n "nifi.node.port": "9090",\n "nifi.security.encrypt.configuration.password": "{{ default_password }}",\n "nifi.sensitive.props.key": "{{ default_password }}"\n }\n },\n {\n "nifi-env" : {\n "nifi_group" : "nifi",\n "nifi_user" : "nifi"\n }\n },\n{% if streamline_hosts|length > 0 %}\n {\n "streamline-common" : {\n {% if database == "mysql" or database == "mariadb" -%}\n "database_name" : "{{ database_options.streamline_db_name }}",\n "streamline.storage.type": "mysql",\n "streamline.storage.connector.user": "{{ database_options.streamline_db_username 
}}",\n "streamline.storage.connector.password": "{{ database_options.streamline_db_password }}",\n "streamline.storage.connector.connectURI": "jdbc:mysql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ mysql_port }}/{{ database_options.streamline_db_name }}",\n {% endif -%}\n {% if database == "postgres" -%}\n "database_name" : "{{ database_options.streamline_db_name }}",\n "streamline.storage.type": "postgresql",\n "streamline.storage.connector.user": "{{ database_options.streamline_db_username }}",\n "streamline.storage.connector.password": "{{ database_options.streamline_db_password }}",\n "streamline.storage.connector.connectURI": "jdbc:postgresql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ postgres_port }}/{{ database_options.streamline_db_name }}",\n {% endif -%}\n "jar.storage.type" : "local",\n "streamline.storage.query.timeout" : "30"\n }\n },\n{% endif %}\n{% if registry_hosts|length > 0 %}\n {\n "registry-common" : {\n {% if database == "mysql" or database == "mariadb" -%}\n "database_name" : "{{ database_options.registry_db_name }}",\n "registry.storage.type": "mysql",\n "registry.storage.connector.user": "{{ database_options.registry_db_username }}",\n "registry.storage.connector.password": "{{ database_options.registry_db_password }}",\n "registry.storage.connector.connectURI": "jdbc:mysql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ mysql_port }}/{{ database_options.registry_db_name }}",\n {% endif -%}\n {% if database == "postgres" -%}\n "database_name" : "{{ database_options.registry_db_name }}",\n "registry.storage.type": "postgresql",\n "registry.storage.connector.user": "{{ database_options.registry_db_username }}",\n "registry.storage.connector.password": "{{ database_options.registry_db_password }}",\n "registry.storage.connector.connectURI": "jdbc:postgresql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ postgres_port }}/{{ 
database_options.registry_db_name }}",\n {% endif -%}\n "jar.storage.type" : "local",\n "registry.storage.query.timeout" : "30"\n }\n },\n{% endif %}\n{% endif %}\n{% if namenode_groups|length > 0 %}\n {\n "hadoop-env" : {\n "dtnode_heapsize" : "1024m",\n "namenode_heapsize" : "2048m",\n "namenode_opt_maxnewsize" : "384m",\n "namenode_opt_newsize" : "384m"\n }\n },\n {\n "hdfs-site" : {\n {% if namenode_groups|length > 1 -%}\n "dfs.client.failover.proxy.provider.{{ hdfs_ha_name }}" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",\n "dfs.ha.automatic-failover.enabled" : "true",\n "dfs.ha.fencing.methods" : "shell(/bin/true)",\n "dfs.ha.namenodes.{{ hdfs_ha_name }}" : "nn1,nn2",\n "dfs.namenode.http-address.{{ hdfs_ha_name }}.nn1" : "%HOSTGROUP::{{ namenode_groups[0] }}%:50070",\n "dfs.namenode.http-address.{{ hdfs_ha_name }}.nn2" : "%HOSTGROUP::{{ namenode_groups[1] }}%:50070",\n "dfs.namenode.https-address.{{ hdfs_ha_name }}.nn1" : "%HOSTGROUP::{{ namenode_groups[0] }}%:50470",\n "dfs.namenode.https-address.{{ hdfs_ha_name }}.nn2" : "%HOSTGROUP::{{ namenode_groups[1] }}%:50470",\n "dfs.namenode.rpc-address.{{ hdfs_ha_name }}.nn1" : "%HOSTGROUP::{{ namenode_groups[0] }}%:8020",\n "dfs.namenode.rpc-address.{{ hdfs_ha_name }}.nn2" : "%HOSTGROUP::{{ namenode_groups[1] }}%:8020",\n "dfs.namenode.shared.edits.dir" : "qjournal://{% for jn in journalnode_groups %}%HOSTGROUP::{{ jn }}%:8485{% if not loop.last %};{% endif %}{% endfor %}/{{ hdfs_ha_name }}",\n "dfs.nameservices" : "{{ hdfs_ha_name }}",\n {% endif -%}\n{% if rangeradmin_hosts|length > 0 and ranger_options.enable_plugins|default(no) %}\n "dfs.namenode.inode.attributes.provider.class" : "org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer",\n{% endif %}\n{% if rangerkms_hosts|length > 0 %}\n "dfs.encryption.key.provider.uri" : "kms://http@{% for kmshost in rangerkms_hosts %}{{ hostvars[kmshost]['ansible_fqdn'] }}{% if not loop.last %};{% endif %}{% endfor %}:9292/kms",\n{% 
endif %}\n "dfs.datanode.data.dir" : "/hadoop/hdfs/data",\n "dfs.datanode.failed.volumes.tolerated" : "0",\n "dfs.replication" : "3"\n }\n },\n {\n "yarn-site" : {\n {% if resourcemanager_groups|length > 1 -%}\n "hadoop.registry.zk.quorum": "{% for zk in zookeeper_groups %}%HOSTGROUP::{{ zk }}%:2181{% if not loop.last %},{% endif %}{% endfor %}",\n "yarn.resourcemanager.recovery.enabled": "true",\n "yarn.resourcemanager.store.class" : "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",\n "yarn.resourcemanager.cluster-id" : "yarn-cluster",\n "yarn.resourcemanager.ha.enabled" : "true",\n "yarn.resourcemanager.ha.automatic-failover.zk-base-path" : "/yarn-leader-election",\n "yarn.resourcemanager.ha.rm-ids" : "rm1,rm2",\n "yarn.resourcemanager.address": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%:8050",\n "yarn.resourcemanager.scheduler.address": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%:8030",\n "yarn.resourcemanager.resource-tracker.address": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%:8025",\n "yarn.resourcemanager.admin.address": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%:8141",\n "yarn.resourcemanager.hostname": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%:8088",\n "yarn.resourcemanager.hostname.rm1": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%",\n "yarn.resourcemanager.hostname.rm2": "%HOSTGROUP::{{ resourcemanager_groups[1] }}%",\n "yarn.resourcemanager.webapp.address": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%:8088",\n "yarn.resourcemanager.webapp.address.rm1": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%:8088",\n "yarn.resourcemanager.webapp.address.rm2": "%HOSTGROUP::{{ resourcemanager_groups[1] }}%:8088",\n "yarn.resourcemanager.webapp.https.address": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%:8090",\n "yarn.resourcemanager.webapp.https.address.rm1": "%HOSTGROUP::{{ resourcemanager_groups[0] }}%:8090",\n "yarn.resourcemanager.webapp.https.address.rm2": "%HOSTGROUP::{{ resourcemanager_groups[1] }}%:8090",\n 
"yarn.resourcemanager.zk-address": "{% for zk in zookeeper_groups %}%HOSTGROUP::{{ zk }}%:2181{% if not loop.last %},{% endif %}{% endfor %}",\n {% endif -%}\n{% if rangeradmin_hosts|length > 0 and ranger_options.enable_plugins|default(no) %}\n "yarn.acl.enable" : "true",\n "yarn.authorization-provider": "org.apache.ranger.authorization.yarn.authorizer.RangerYarnAuthorizer",\n{% endif %}\n "yarn.client.nodemanager-connect.retry-interval-ms" : "10000"\n }\n },\n {\n "hive-site" : {\n {% if database != "embedded" -%}\n {% if database == "mysql" or database == "mariadb" -%}\n "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",\n "javax.jdo.option.ConnectionURL": "jdbc:mysql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ mysql_port }}/{{ database_options.hive_db_name }}",\n {% endif -%}\n {% if database == "postgres" -%}\n "javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",\n "javax.jdo.option.ConnectionURL": "jdbc:postgresql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ postgres_port }}/{{ database_options.hive_db_name }}",\n {% endif -%}\n "javax.jdo.option.ConnectionUserName": "{{ database_options.hive_db_username }}",\n "javax.jdo.option.ConnectionPassword": "{{ database_options.hive_db_password }}",\n {% endif -%}\n{% if rangeradmin_hosts|length > 0 and ranger_options.enable_plugins|default(no) %}\n "hive.security.authorization.enabled" : "true",\n{% endif %}\n "hive.metastore.failure.retries" : "24"\n }\n },\n {\n "hiveserver2-site" : {\n{% if rangeradmin_hosts|length > 0 and ranger_options.enable_plugins|default(no) %}\n "hive.security.authorization.enabled" : "true",\n "hive.security.authorization.manager" : "org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory",\n "hive.conf.restricted.list" : 
"hive.security.authenticator.manager,hive.security.authorization.manager,hive.security.metastore.authorization.manager,hive.security.metastore.authenticator.manager,hive.users.in.admin.role,hive.server2.xsrf.filter.enabled,hive.security.authorization.enabled",\n{% endif %}\n "hive.metastore.metrics.enabled" : "true"\n }\n },\n {\n "hive-env" : {\n {% if database != "embedded" -%}\n {% if database == "mysql" or database == "mariadb" -%}\n "hive_database": "Existing MySQL / MariaDB Database",\n "hive_database_type": "mysql",\n {% endif -%}\n {% if database == "postgres" -%}\n "hive_database": "Existing PostgreSQL Database",\n "hive_database_type": "postgres",\n {% endif -%}\n "hive_database_name": "{{ database_options.hive_db_name }}",\n {% endif -%}\n{% if rangeradmin_hosts|length > 0 and ranger_options.enable_plugins|default(no) %}\n "hive_security_authorization" : "Ranger",\n{% endif %}\n "hive_user" : "hive"\n }\n },\n {\n "oozie-site" : {\n {% if database != "embedded" -%}\n {% if database == "mysql" or database == "mariadb" -%}\n "oozie.service.JPAService.jdbc.driver": "com.mysql.jdbc.Driver",\n "oozie.service.JPAService.jdbc.url": "jdbc:mysql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ mysql_port }}/{{ database_options.oozie_db_name }}",\n {% endif -%}\n {% if database == "postgres" -%}\n "oozie.service.JPAService.jdbc.driver": "org.postgresql.Driver",\n "oozie.service.JPAService.jdbc.url": "jdbc:postgresql://{{ database_options.external_hostname|default(ansible_fqdn,true) }}:{{ postgres_port }}/{{ database_options.oozie_db_name }}",\n {% endif -%}\n "oozie.db.schema.name": "{{ database_options.oozie_db_name }}",\n "oozie.service.JPAService.jdbc.username": "{{ database_options.oozie_db_username }}",\n "oozie.service.JPAService.jdbc.password": "{{ database_options.oozie_db_password }}",\n {% endif -%}\n{% if (security|lower == "mit-kdc" or security|lower == "active-directory") and security_options.http_authentication|default(no) 
%}\n "oozie.authentication.cookie.domain" : "{{ security_options.realm|lower }}",\n{% endif %}\n "oozie.action.retry.interval" : "30"\n }\n },\n {\n "oozie-env" : {\n {% if database != "embedded" -%}\n {% if database == "mysql" or database == "mariadb" -%}\n "oozie_database": "Existing MySQL / MariaDB Database",\n {% endif -%}\n {% if database == "postgres" -%}\n "oozie_database": "Existing PostgreSQL Database",\n {% endif -%}\n {% endif -%}\n "oozie_user" : "oozie"\n }\n },\n {\n "hbase-site" : {\n {% if namenode_groups|length > 1 -%}\n "hbase.rootdir": "hdfs://{{ hdfs_ha_name }}/apps/hbase/data",\n {% endif -%}\n{% if rangeradmin_hosts|length > 0 and ranger_options.enable_plugins|default(no) %}\n "hbase.security.authorization" : "true",\n "hbase.coprocessor.master.classes" : "org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor",\n "hbase.coprocessor.region.classes" : "org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint,org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor",\n "hbase.coprocessor.regionserver.classes" : "org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor",\n{% endif %}\n "hbase.client.retries.number" : "35"\n }\n },\n {\n "core-site": {\n {% if namenode_groups|length > 1 -%}\n "fs.defaultFS" : "hdfs://{{ hdfs_ha_name }}",\n "ha.zookeeper.quorum" : "{% for zk in zookeeper_groups %}%HOSTGROUP::{{ zk }}%:2181{% if not loop.last %},{% endif %}{% endfor %}",\n {% endif -%}\n{% if (security|lower == "mit-kdc" or security|lower == "active-directory") and security_options.http_authentication|default(no) %}\n "hadoop.http.authentication.simple.anonymous.allowed" : "false",\n "hadoop.http.authentication.signature.secret.file" : "/etc/security/http_secret",\n "hadoop.http.authentication.type" : "kerberos",\n "hadoop.http.authentication.kerberos.keytab" : "/etc/security/keytabs/spnego.service.keytab",\n "hadoop.http.authentication.kerberos.principal" : "HTTP/_HOST@{{ security_options.realm }}",\n 
"hadoop.http.filter.initializers" : "org.apache.hadoop.security.AuthenticationFilterInitializer",\n "hadoop.http.authentication.cookie.domain" : "{{ security_options.realm|lower }}",\n{% else %}\n "hadoop.http.authentication.simple.anonymous.allowed" : "true",\n "hadoop.http.authentication.type" : "simple",\n{% endif %}\n{% if rangerkms_hosts|length > 0 %}\n "hadoop.security.key.provider.path" : "kms://http@{% for kmshost in rangerkms_hosts %}{{ hostvars[kmshost]['ansible_fqdn'] }}{% if not loop.last %};{% endif %}{% endfor %}:9292/kms",\n "hadoop.proxyuser.kms.groups" : "*",\n{% endif %}\n "fs.trash.interval" : "360"\n }\n },\n{% endif %}\n {\n "storm-site": {\n{% if rangeradmin_hosts|length > 0 and ranger_options.enable_plugins|default(no) and (security|lower == "mit-kdc" or security|lower == "active-directory") %}\n "nimbus.authorizer" : "org.apache.ranger.authorization.storm.authorizer.RangerStormAuthorizer",\n{% endif %}\n "storm.zookeeper.retry.intervalceiling.millis" : "30000"\n }\n },\n {\n "kafka-broker": {\n{% if rangeradmin_hosts|length > 0 and ranger_options.enable_plugins|default(no) %}\n "authorizer.class.name" : "org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer",\n{% endif %}\n "zookeeper.session.timeout.ms" : "30000"\n }\n },\n {\n "zoo.cfg": {\n "clientPort" : "2181"\n }\n }\n ],\n\n "host_groups" : [\n {% set ns = namespace(hivedb_embedded_defined=false) %}\n {% for blueprint_item in blueprint_dynamic if groups[blueprint_item.host_group] is defined and groups[blueprint_item.host_group]|length > 0 -%}\n\n {\n "name" : "{{ blueprint_item.host_group }}",\n "configurations" : [ ],\n "components" : [\n {% for client in blueprint_item.clients -%}\n { "name" : "{{ client }}" },\n {% endfor %}\n{% if security|lower == "mit-kdc" or security|lower == "active-directory" -%}\n { "name" : "KERBEROS_CLIENT" },\n{% endif %}\n\n {% for service in blueprint_item.services -%}\n { "name" : "{{ service }}" }{% if service == "HIVE_METASTORE" and 
database == "embedded" and not ns.hivedb_embedded_defined %},{ "name" : "MYSQL_SERVER" }{% set ns.hivedb_embedded_defined=true %}{% endif %}{% if not loop.last %},{% endif %}\n\n {% endfor %}\n\n ]\n }{% if not loop.last and groups[blueprint_item.host_group]|length > 0 %},{% endif %}\n\n {% endfor %}\n\n ],\n "Blueprints" : {\n{% if security|lower == "mit-kdc" or security|lower == "active-directory" %}\n "security" : {\n "type" : "KERBEROS"\n },\n{% endif %}\n "stack_name" : "{% if namenode_groups|length > 0 %}HDP{% else %}HDF{% endif %}",\n "stack_version" : "{% if namenode_groups|length > 0 %}{{ hdp_minor_version }}{% else %}{{ hdf_minor_version }}{% endif %}"\n }\n}\n"}
I'm using the latest commit.
Hello Alexandru,
In your scripts, when a database engine other than Postgres is used, you configure the official MariaDB or MySQL repo on the Ambari server node. Would it be possible to try installing the packages from the OS repository before adding the internet repo?
For offline installations, we have to create a specific repo and change vars in the database role. It would be more comfortable to install MariaDB directly from our OS repository.
Thanks,
We need to add and mount an extra EBS volume on our cluster. The current script, with only a root volume, looks like this (the default):
nodes:
- host_group: "hdp-master"
count: 1
image: ami-2a7d75c0 # Ubuntu 16.04 AMI in eu-west-1 only (change this ID if using a different region)
type: r4.xlarge
public_ip: true
security_groups: default_cluster_access,ambari_access
root_volume:
ebs: true # non-EBS root volumes are not supported at the moment
type: gp2
size: 15
How can we add an additional disk here? What are all the changes we need to make in the script?
One thing we are considering is to add this below root_volume:
additional_volume:
ebs: true
type: gp2
size: 15
Will this work, or are changes needed in playbooks/clouds/build_aws_nodes.yml as well?
Thanks for your help!
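For context, a sketch of what the corresponding playbook change might look like. This is a hypothetical example, not the repo's actual task: the variable names (`node.additional_volume.*`) and device names are assumptions, and it relies on the Ansible `ec2` module's documented `volumes` parameter. Formatting and mounting the extra disk inside the OS would still be a separate step.

```yaml
# Hypothetical task for playbooks/clouds/build_aws_nodes.yml (names assumed):
# the ec2 module accepts a `volumes` list, so an extra EBS volume can be
# attached alongside the root device at launch time.
- name: Launch EC2 nodes with an additional EBS volume
  ec2:
    image: "{{ node.image }}"
    instance_type: "{{ node.type }}"
    count: "{{ node.count }}"
    volumes:
      - device_name: /dev/sda1                         # root volume
        volume_type: "{{ node.root_volume.type }}"
        volume_size: "{{ node.root_volume.size }}"
        delete_on_termination: true
      - device_name: /dev/sdb                          # extra data volume (assumed key)
        volume_type: "{{ node.additional_volume.type }}"
        volume_size: "{{ node.additional_volume.size }}"
        delete_on_termination: true
```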
After the "Upload the blueprint to the Ambari Server" task, I get an error from the Ambari server:
"Invalid service component count: [APP_TIMELINE_SERVER [actual=0, required=1]"
The ambari-server group_vars file has it defined.
It seems to find all the other services except that one.
Help!
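For reference, the blueprint host groups are generated from the `blueprint_dynamic` variable in group_vars, so `APP_TIMELINE_SERVER` must appear in the `services` list of exactly one host group whose inventory group is non-empty. A minimal sketch (the group name and the other services shown are illustrative, not the repo's exact defaults):

```yaml
blueprint_dynamic:
  - host_group: "hdp-master"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT']
    services:
      - NAMENODE
      - RESOURCEMANAGER
      - APP_TIMELINE_SERVER   # must be present in one host group that has hosts
      - ZOOKEEPER_SERVER
```

If the inventory group backing that host group has no hosts, the component is silently dropped from the generated blueprint, which produces the same error.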
Hello,
I completed an installation of Hortonworks making small changes to the provided configuration.
After it completed, I exported the blueprint and tried to install another cluster using this blueprint.
I am encountering the following issue:
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: '**zookeeper_log_max_backup_size**' is undefined\n\nThe error appears to have been in 'my-path/ansible-hortonworks/playbooks/set_variables.yml': line 35, column 11, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Read the static blueprint content\n ^ here\n"}
The blueprint correctly contains the following:
"zookeeper_log_max_backup_size": "10",
and
log4j.appender.ROLLINGFILE.MaxFileSize={{zookeeper_log_max_backup_size}}MB
If I hard-code the variable's value (10) in the blueprint everywhere the variable appears, the error changes as follows:
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: '**zookeeper_log_number_of_backup_files**' is undefined\n\nThe error appears to have been in 'my-path/ansible-hortonworks/playbooks/set_variables.yml': line 35, column 11, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Read the static blueprint content\n ^ here\n"}
For this reason, my understanding is that it cannot work with variables. Is that expected?
Do I need to remove all the variables from the blueprint?
Thanks
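One likely explanation: the playbooks read the static blueprint file as a Jinja2 template, so every `{{ variable }}` it contains must be defined somewhere in Ansible's variable scope before rendering. A hedged sketch of a workaround, assuming you add the definitions to your inventory group_vars (the values below are taken from the exported blueprint):

```yaml
# Hypothetical additions to inventory group_vars/all: defining the variables
# referenced by the exported blueprint lets the Jinja2 template render
# instead of failing with "'...' is undefined".
zookeeper_log_max_backup_size: "10"
zookeeper_log_number_of_backup_files: "10"
```

The alternative, as you found, is to replace the variables with literal values throughout the blueprint.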