Targeting multiple Spark versions with sbt-projectmatrix
I was recently involved in projects that had to work on multiple Spark versions.
CI made simpler with Earthly
Building applications in containers has well-known benefits. Encapsulating the whole build environment in a container removes the possibly lengthy and hard-to-reproduce process of setting up the build environment, while the smart use of build stages and layers opens up opportunities for caching and parallelism. For builds that consist of seve...
Migrating LDBC SNB Datagen to Spark
LDBC’s Social Network Benchmark (LDBC SNB) is an industrial and academic initiative, formed by principal actors in the field of graph-like data management. Its goal is to define a framework where different graph-based technologies can be fairly tested and compared, that can drive the identification of systems’ bottlenecks and required functional...
Deriving Spark Encoders and Schemas Using Implicits
Since Apache Spark 2.0, the Dataset API is the preferred way of programming over low-level RDDs. However, when migrating complex business entities from RDDs to Datasets, a handful of problems arise. One is the lack of support for user-defined types, confining the developer to a predefined set of types and severely hindering the usefulness of Datas...
Set up multiple Git users on your machine
It is quite common to have separate users for different Git repos. For example, you might have a public account for all your open-source GitHub stuff and a work account for your employer’s private Git remote.
Implementing Kleene logic two different ways in Scala
Recently I’ve been writing a graph query engine at the Fault Tolerant System Research Group at uni. The frontend is Cypher, a language popularized by Neo Technology and shipped OOTB with their graph database, Neo4j.