
Spark packages from a password-protected repository

11 Dec, 2017

At my current client, we use Sonatype Nexus to store our artifacts. The repository is secured with a username and password for both publishing and downloading artifacts.

Spark supports additional remote repositories through the --repositories option.

We use it like this, with the credentials embedded in the repository URL:

pyspark \
  --repositories https://readonly:secret_password@nexus/repository/maven-public/ \
  --packages com.example:foobar:1.0.0

Unfortunately, we ran into the following issue:

    ==== repo-1: tried
      https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.pom
      -- artifact com.example#foobar;1.0.0!foobar.jar:
      https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
        ::::::::::::::::::::::::::::::::::::::::::::::
        ::          UNRESOLVED DEPENDENCIES         ::
        ::::::::::::::::::::::::::::::::::::::::::::::
        :: com.example#foobar;1.0.0: not found
        ::::::::::::::::::::::::::::::::::::::::::::::

The strange thing is that the URL is correct. With curl we can download the dependency without any problem:

curl -s -o /dev/null -v https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.pom
* Hostname was NOT found in DNS cache
*   Trying 35...
* Connected to foobar.com (35.xxx.xxx.x) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
...
...
200 OK
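curl takes the credentials from the userinfo part of the URL and sends them as an HTTP Basic Authorization header with the very first request. The stdlib-only Python sketch below reproduces that step, so you can see what curl actually puts on the wire. basic_auth_header is our own illustrative helper, not part of any library.

```python
import base64
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit


def basic_auth_header(url: str) -> Tuple[str, Optional[str]]:
    """Split user:password out of a URL, as curl does for HTTP Basic auth.

    Returns the URL without the userinfo part and the value of the
    Authorization header, or None when the URL carries no credentials.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    # Userinfo may be percent-encoded (an '@' in a password becomes %40).
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Drop everything up to and including the last '@' from the netloc.
    host_port = parts.netloc.rpartition("@")[2]
    return parts._replace(netloc=host_port).geturl(), f"Basic {token}"


clean_url, header = basic_auth_header(
    "https://readonly:secret_password@nexus/repository/maven-public/"
    "com/example/foobar/1.0.0/foobar-1.0.0.pom"
)
print(clean_url)  # https://nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.pom
print(header)     # Basic cmVhZG9ubHk6c2VjcmV0X3Bhc3N3b3Jk
```

Keep in mind that Basic auth is only base64 encoding, not encryption: anything that echoes these URLs, such as the Ivy output above, leaks the password.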

Spark resolves --packages with Apache Ivy, so let's debug this by using Ivy directly.

Ivy uses a settings file to configure the Nexus repository, so I tried this ivy.settings:


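<ivysettings>
  <settings defaultResolver="nexus"/>
  <property name="nexus-public"
            value="https://readonly:secret_password@nexus/repository/maven-public"/>
  <resolvers>
    <ibiblio name="nexus" m2compatible="true" root="${nexus-public}"/>
  </resolvers>
</ivysettings>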

curl -L -o ivy-2.4.0.jar "http://search.maven.org/remotecontent?filepath=org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar"
java -jar ivy-2.4.0.jar -settings ivy.settings -dependency com.example foobar 1.0.0 -debug

We end up with the same error, so the problem is not in Spark but in Ivy:

    ==== nexus: tried
      https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.pom
      -- artifact com.example#foobar;1.0.0!foobar.jar:
      https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
        ::::::::::::::::::::::::::::::::::::::::::::::
        ::          UNRESOLVED DEPENDENCIES         ::
        ::::::::::::::::::::::::::::::::::::::::::::::
        :: com.example#foobar;1.0.0: not found
        ::::::::::::::::::::::::::::::::::::::::::::::

With the -debug option we find the following:

HTTP response status: 401 url=https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
CLIENT ERROR: Unauthorized url=https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
    nexus: resource not reachable for com/example#foobar;1.0.0: res=https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
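The 401 suggests that Nexus rejected the request because no usable credentials arrived. Ivy can instead be given credentials per host and realm, and the realm has to match the one Nexus announces in its WWW-Authenticate response header. A minimal stdlib sketch to look that realm up, assuming the repository answers an unauthenticated request with 401; fetch_realm and parse_realm are hypothetical helper names, and https://nexus/... is the placeholder host from this post. Running curl -sI against the same URL and reading the header works just as well.

```python
import re
import urllib.error
import urllib.request
from typing import Optional

# HTTP auth parameter names are case-insensitive, hence IGNORECASE.
_REALM = re.compile(r'realm="([^"]*)"', re.IGNORECASE)


def parse_realm(www_authenticate: str) -> Optional[str]:
    """Return the realm from a WWW-Authenticate header value, if present."""
    match = _REALM.search(www_authenticate)
    return match.group(1) if match else None


def fetch_realm(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send an unauthenticated HEAD request and read the realm from the 401."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        urllib.request.urlopen(request, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return parse_realm(err.headers.get("WWW-Authenticate", ""))
        raise
    return None  # the server did not ask for authentication


# Example against the (placeholder) repository from this post, without credentials:
# fetch_realm("https://nexus/repository/maven-public/")
```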

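Now that we understand the issue, we can start searching. I found a StackOverflow question about exactly this problem.

So let's replace the basic authentication in the URL with a credentials block. Note that the realm must match the one Nexus sends back in its WWW-Authenticate header.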


<ivysettings>
  <settings defaultResolver="nexus"/>
  <property name="nexus-public"
            value="https://nexus/repository/maven-public"/>
  <credentials host="nexus" realm="Sonatype Nexus Repository Manager"
               username="readonly" passwd="secret_password"/>
  <resolvers>
    <ibiblio name="nexus" m2compatible="true" root="${nexus-public}"/>
  </resolvers>
</ivysettings>

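Now everything works like a charm. Time to fix the pyspark command. Since Spark 2.2.0, a custom Ivy settings file can be passed with spark.jars.ivySettings. The --repositories option is no longer needed, because the repository is now defined in the settings file.

pyspark \
  --packages com.example:foobar:1.0.0 \
  --conf spark.jars.ivySettings=/tmp/ivy.settings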
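If you start Spark from Python code instead of the pyspark shell, the same two settings can go on the session builder. A minimal sketch, assuming PySpark is installed and no SparkSession (and thus no JVM) is running yet, because spark.jars.packages is only resolved when the JVM starts. The helper names and the app name are our own; the configuration is kept in a plain dict so it can be checked without Spark.

```python
from typing import Dict, List

IVY_SETTINGS = "/tmp/ivy.settings"  # path used in the pyspark command above
PACKAGES = ["com.example:foobar:1.0.0"]


def spark_package_conf(ivy_settings: str, packages: List[str]) -> Dict[str, str]:
    """Spark settings equivalent to --packages / --conf on the command line."""
    return {
        # Comma-separated Maven coordinates, resolved through Ivy at startup.
        "spark.jars.packages": ",".join(packages),
        # Custom Ivy settings file that carries the <credentials> block.
        "spark.jars.ivySettings": ivy_settings,
    }


def build_session(conf: Dict[str, str]):
    """Create a SparkSession with the given settings (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily: optional dependency

    builder = SparkSession.builder.appName("nexus-packages")  # hypothetical app name
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (needs a local Spark installation):
# spark = build_session(spark_package_conf(IVY_SETTINGS, PACKAGES))
```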

Now Spark can download the packages as well, and I'm a happy camper again. All that is left for us to do is add this setup to the init script that initializes our new Dataproc clusters.
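For that init script, one option is to generate the settings file on the node instead of shipping a copy that contains the password. The sketch below builds the same ivy.settings with Python's xml.etree.ElementTree and writes it to /tmp/ivy.settings. It assumes the credentials arrive in the environment variables NEXUS_USER and NEXUS_PASSWORD (hypothetical names; use whatever your secret store provides) and that it runs on the nodes where you start pyspark or spark-submit, since that is where --packages is resolved. The file is created readable by its owner only; if Spark runs as a different user than the init script, pass a wider mode or change the owner.

```python
import os
import xml.etree.ElementTree as ET

# Values from this post; adjust to your own Nexus installation.
NEXUS_HOST = "nexus"
NEXUS_REALM = "Sonatype Nexus Repository Manager"
NEXUS_URL = "https://nexus/repository/maven-public"


def ivy_settings_xml(username: str, password: str) -> str:
    """Build an ivysettings document that authenticates with a <credentials> block."""
    root = ET.Element("ivysettings")
    ET.SubElement(root, "settings", defaultResolver="nexus")
    ET.SubElement(root, "property", name="nexus-public", value=NEXUS_URL)
    ET.SubElement(root, "credentials", host=NEXUS_HOST, realm=NEXUS_REALM,
                  username=username, passwd=password)
    resolvers = ET.SubElement(root, "resolvers")
    ET.SubElement(resolvers, "ibiblio", name="nexus", m2compatible="true",
                  root="${nexus-public}")
    # tostring escapes special characters such as &, < and " in attribute values.
    return ET.tostring(root, encoding="unicode")


def write_ivy_settings(path: str = "/tmp/ivy.settings", mode: int = 0o600) -> None:
    """Write the settings using credentials from NEXUS_USER / NEXUS_PASSWORD."""
    xml = ivy_settings_xml(os.environ["NEXUS_USER"], os.environ["NEXUS_PASSWORD"])
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    os.fchmod(fd, mode)  # also applies when the file already existed
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(xml)
```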
