Welcome to the Open Source at GoDataDriven, April 2018 edition.
Kris worked on his docker-kafka project, contributing PR 6 and 7 to upgrade the Kafka version
and use OpenJDK instead of Oracle Java. His docker-kafka project, self-contained with Zookeeper,
is invaluable if you want to get started. The fact that you can find the instance by name instead
of by IP also makes it a great candidate if you want to use it in a training environment (where not
all students will be familiar with the intricacies of Docker and IPs).
Kris didn’t stop there and contributed PR 13 to Scruid, to fix response parsing!
Henk then made this world a better place by improving the documentation of our tool to provision
a training environment on GCP (Google Cloud Platform).
Fokko instead (he doesn’t like to sit idle, does he?) contributed to Homebrew with PR 26793
to upgrade Scala to 2.11.12 (I mean, can you imagine Fokko running an outdated version of Scala?).
He went on to work on Airflow with PR 3252 and 3201. To top it off, he improved Divolte
with PR 203, 215, and 216.
Rodrigo, even though his PR is still waiting to be merged, contributed
PR 10913 to scikit-learn to enable the handling of unseen labels in the MultiLabelBinarizer!
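To give an idea of what handling unseen labels can mean, here is a minimal sketch in plain Python. It is not scikit-learn's code and not necessarily what the PR does: the binarize function, its arguments, and the choice to ignore unknown labels with a warning are all assumptions made for illustration.

```python
import warnings


def binarize(samples, classes):
    """Turn lists of labels into 0/1 rows, one column per known class.

    Hypothetical helper: labels that are not in `classes` are skipped
    and reported in a single warning instead of raising an error.
    """
    index = {label: i for i, label in enumerate(classes)}
    rows = []
    unknown = set()
    for labels in samples:
        row = [0] * len(classes)
        for label in labels:
            if label in index:
                row[index[label]] = 1
            else:
                # Unseen label: remember it, but keep going.
                unknown.add(label)
        rows.append(row)
    if unknown:
        warnings.warn(f"unknown labels ignored: {sorted(unknown)}")
    return rows


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = binarize([["spark", "hive"], ["kafka"]], classes=["hive", "spark"])

print(result)
print(len(caught))
```

The second sample only contains a label the binarizer has never seen, so its row stays all zeros and one warning is recorded.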
Tim open sourced a package to access your Google Fit data, aptly called py_google_fit!
To conclude the show, I also tried to make Airflow better: first I open sourced hmsclient, a
Python package to interact with the Hive metastore. Airflow was, in fact, using a deprecated
client for all the interactions with the metastore. As a result, the part of Airflow interacting
with the metastore was not Python 3 compatible. With my PR 3239, that is now fixed.
Not departing from the "big" data world, I also contributed PR 19 to findspark, so that
PYSPARK_PYTHON is now a respected environment variable.
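For the curious, respecting an environment variable usually boils down to only filling in a default when the user has not set a value already. The sketch below shows that pattern with the standard library only. It is not findspark's actual code: resolve_pyspark_python and the example paths are hypothetical names chosen for illustration.

```python
import sys


def resolve_pyspark_python(env, default=sys.executable):
    """Return the interpreter to use, preferring a value that is already set.

    `env` is any dict-like mapping, such as os.environ.
    """
    # setdefault only writes the key when it is absent, so an existing
    # PYSPARK_PYTHON is left untouched.
    return env.setdefault("PYSPARK_PYTHON", default)


# The user already chose an interpreter: it wins over the default.
user_env = {"PYSPARK_PYTHON": "/opt/conda/bin/python"}
print(resolve_pyspark_python(user_env, default="/usr/bin/python3"))

# Nothing set: the default is used and stored in the mapping.
empty_env = {}
print(resolve_pyspark_python(empty_env, default="/usr/bin/python3"))
```

Passing os.environ as env would apply the same logic to the real process environment.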
That’s it for this edition! Don’t forget we’re hiring! Especially if you are a software engineer
who would like to move into the data space, get in touch!
And if you want more rambling throughout the month, follow me on Twitter: I’m gglanzani there!