Configuring Databricks on AWS

May 5, 2021

Despite the excellent QuickStart tools, this was way harder than I thought. For some reason I had enormous difficulty creating a Databricks workspace on AWS.

Here are some tips that might help others who get stuck.

A. Be clear which “Account ID” to enter where

  1. My Account ID on Databricks
  2. My Account ID on AWS
  3. Databricks’ Account ID on AWS (414351767826)

Sometimes a document will say “Databricks ID”, which could mean either (1) or (3). The trick is to remember that (1) is a hexadecimal string, while AWS account IDs, including (3), are 12-digit decimal numbers.
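If you are staring at an unlabeled ID, a quick format check can tell them apart. This is a rough bash sketch (the function name classify_account_id is made up), assuming AWS IDs are exactly 12 digits and Databricks IDs are hex, possibly with UUID-style dashes:

```shell
#!/usr/bin/env bash
# Rough format check for the three kinds of "account ID".
# Assumes AWS account IDs are exactly 12 decimal digits and Databricks
# account IDs are hexadecimal, possibly with UUID-style dashes.
classify_account_id() {
  local id="$1"
  if [[ "$id" =~ ^[0-9]{12}$ ]]; then
    if [[ "$id" == "414351767826" ]]; then
      echo "aws-databricks"   # (3) Databricks' own AWS account
    else
      echo "aws"              # (2) an AWS account, e.g. yours
    fi
  elif [[ "$id" =~ ^[0-9a-fA-F-]+$ ]]; then
    echo "databricks"         # (1) your Databricks account ID
  else
    echo "unknown"
  fi
}
```

For example, classify_account_id 414351767826 prints aws-databricks.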

B. The Policy goes “inline” – do not “attach”

The documentation is clear about this, but the AWS user interface is confusing enough that I missed it twice. This is probably the cause of the mysterious error message:

MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellation, Create Placement Group, Delete Tags, Describe Availability Zones, Describe instances, Describe Instance Status, Describe Placement Group, Describe Route Tables, Describe Security Groups, Describe Spot Instances, Describe Spot Price History, Describe Subnets, Describe Volumes, Describe Vpcs, Request Spot Instances, Create Internet Gateway, Create VPC, Delete VPC, Allocate Address, Release Address, Describe Nat Gateways, Delete Nat Gateway, Delete Vpc Endpoints, Create Route Table, Disassociate Route Table
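From the command line the distinction is explicit: aws iam put-role-policy stores an inline policy inside the role, while aws iam attach-role-policy attaches a separate managed policy. A sketch, assuming the role is named trust-databricks and the permissions JSON from the Databricks docs is saved in a local file; the function name, policy name, and policy.json are hypothetical:

```shell
#!/usr/bin/env bash
# Add the cross-account permissions to the role as an INLINE policy.
# Hypothetical names: put_inline_policy, the policy name, and policy.json.
# Set AWS=echo to print the command instead of running it.
put_inline_policy() {
  local role_name="$1" policy_name="$2" policy_file="$3"
  # put-role-policy embeds the policy document in the role itself (inline).
  # attach-role-policy would instead attach a separate *managed* policy by
  # ARN, which is the step to avoid here.
  "${AWS:-aws}" iam put-role-policy \
    --role-name "$role_name" \
    --policy-name "$policy_name" \
    --policy-document "file://$policy_file"
}
```

Usage: put_inline_policy trust-databricks databricks-cross-account policy.json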

C. Be sure to create your root S3 buckets in the proper region

I’m not entirely sure, but I think that is why the QuickStart kept failing on “createStorageConfiguration” and “createCredentials”, with mostly useless error messages, even in the CloudWatch log:

Received response status [FAILED] from custom resource. Message returned: See the details in CloudWatch Log Stream: 2021/05/05/[$LATEST]38cbfe886b2b4925b11c5f5375bba095 (RequestId: b3b7f3ba-33ab-47f6-a398-503140221b1c)
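One way to rule out the region problem is to create the root bucket with an explicit region and then ask S3 where it actually lives. A sketch using the standard AWS CLI s3api commands; the function names, bucket, and region are placeholders, and the AWS variable only exists so you can dry-run with AWS=echo:

```shell
#!/usr/bin/env bash
# Create the root bucket explicitly in the region you plan to use.
# Hypothetical helper names; set AWS=echo for a dry run.
create_root_bucket() {
  local bucket="$1" region="$2"
  if [[ "$region" == "us-east-1" ]]; then
    # us-east-1 is the default and rejects an explicit LocationConstraint.
    "${AWS:-aws}" s3api create-bucket --bucket "$bucket" --region "$region"
  else
    "${AWS:-aws}" s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi
}

# Print the bucket's LocationConstraint; "None" (null) means us-east-1.
check_bucket_region() {
  "${AWS:-aws}" s3api get-bucket-location --bucket "$1" \
    --query LocationConstraint --output text
}
```

Running check_bucket_region before the QuickStart (or the curl calls) at least makes a region mismatch visible instead of buried in a CloudWatch log.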

D. Try using curl from the command line

This was the version that finally got it working for me. Possibly that is because it forced me to be precise about which Account ID I needed, and also because it included an explicit AWS region. NOTE: This requires you to have properly created the IAM role whose ARN you pass to the credentials call.
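Creating the role properly mostly comes down to its trust policy, which is separate from the inline permissions policy in B. A minimal sketch, based on my reading of the Databricks docs rather than anything verified here: the role trusts Databricks’ AWS account (3) and requires your Databricks account ID (1) as the sts:ExternalId. The function name trust_policy_json is hypothetical:

```shell
#!/usr/bin/env bash
# Print a trust policy letting Databricks' AWS account assume the role.
# Assumption: sts:ExternalId must be YOUR Databricks account ID (1).
trust_policy_json() {
  local databricks_account_id="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::414351767826:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "${databricks_account_id}" }
      }
    }
  ]
}
EOF
}
```

For example, trust_policy_json my-databricks-hex-account-id > trust.json, then aws iam create-role --role-name trust-databricks --assume-role-policy-document file://trust.json. The resulting arn:aws:iam::my-aws-account-id:role/trust-databricks is the ARN that goes into the credentials call.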

// Bucket policy on the root S3 bucket, granting Databricks' AWS account access (not the trust-databricks role's policy)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Grant Databricks Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::414351767826:root"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::my-previously-created-aws-bucket/*",
        "arn:aws:s3:::my-previously-created-aws-bucket"
      ]
    }
  ]
}
# ~/.netrc (curl's -n flag reads the login from here)
machine accounts.cloud.databricks.com
login ernest@my-company.com
password whatsit-toyou
curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/my-databricks-hex-account-id/credentials' \
  -d '{
  "credentials_name": "databricks-workspace-credentials-v2",
  "aws_credentials": {
    "sts_role": {
      "role_arn": "arn:aws:iam::my-aws-account-id:role/trust-databricks"
    }
  }
}'
curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/my-databricks-hex-account-id/storage-configurations' \
  -d '{
  "storage_configuration_name": "databricks-workspace-storageconf-v1",
  "root_bucket_info": {
    "bucket_name": "my-previously-created-aws-bucket"
  }
}'
# credentials_id and storage_configuration_id come from the responses to the two calls above
curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/my-databricks-hex-account-id/workspaces' \
  -d '{
  "workspace_name": "my-databricks-workspace",
  "aws_region": "us-west-1",
  "credentials_id": "credentials-id-from-first-response",
  "storage_configuration_id": "storage-configuration-id-from-second-response"
}'
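The three calls depend on each other: the workspace call needs the credentials_id and storage_configuration_id returned by the first two. A sketch that chains them, assuming the ~/.netrc above and that each ID appears on one line of its JSON response; db_post, json_field, and create_workspace are hypothetical helper names. I used sed rather than jq only to avoid an extra dependency; jq would be more robust.

```shell
#!/usr/bin/env bash
# Chain the credentials, storage-configuration, and workspace calls.
API="https://accounts.cloud.databricks.com/api/2.0/accounts"

db_post() {   # db_post <account-id> <endpoint> <json-body>
  curl -sS -X POST -n "$API/$1/$2" -d "$3"
}

json_field() {   # json_field <key>: first string value for <key> on stdin
  sed -n "s/.*\"$1\" *: *\"\([^\"]*\)\".*/\1/p" | head -n 1
}

create_workspace() {   # create_workspace <account-id> <role-arn> <bucket> <region>
  local acct="$1" role_arn="$2" bucket="$3" region="$4" cred_id storage_id
  cred_id="$(db_post "$acct" credentials \
    "{\"credentials_name\": \"databricks-workspace-credentials-v2\",
      \"aws_credentials\": {\"sts_role\": {\"role_arn\": \"$role_arn\"}}}" \
    | json_field credentials_id)"
  storage_id="$(db_post "$acct" storage-configurations \
    "{\"storage_configuration_name\": \"databricks-workspace-storageconf-v1\",
      \"root_bucket_info\": {\"bucket_name\": \"$bucket\"}}" \
    | json_field storage_configuration_id)"
  # An error response (e.g. MALFORMED_REQUEST) has no ID field, so stop here.
  if [[ -z "$cred_id" || -z "$storage_id" ]]; then
    echo "credentials or storage configuration call failed" >&2
    return 1
  fi
  db_post "$acct" workspaces \
    "{\"workspace_name\": \"my-databricks-workspace\",
      \"aws_region\": \"$region\",
      \"credentials_id\": \"$cred_id\",
      \"storage_configuration_id\": \"$storage_id\"}"
}
```

Usage: create_workspace my-databricks-hex-account-id arn:aws:iam::my-aws-account-id:role/trust-databricks my-previously-created-aws-bucket us-west-1, passing the same region the root bucket was created in.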


You are currently reading Configuring DataBricks on AWS at iHack, therefore iBlog.
