Home

I was recently involved in projects that had to work on multiple Spark versions.

Building applications in containers has well-known benefits. Encapsulating the whole build environment in a container removes the possibly lengthy and hard-to-reproduce process of setting up the build environment, while the smart use of build stages and layers opens up opportunities for caching and parallelism. For builds that consist of seve...

LDBC’s Social Network Benchmark (LDBC SNB) is an industrial and academic initiative, formed by principal actors in the field of graph-like data management. Its goal is to define a framework where different graph-based technologies can be fairly tested and compared, that can drive the identification of systems’ bottlenecks and required functional...

Since Apache Spark 2.0, the Dataset API has been the preferred way of programming over low-level RDDs. However, when migrating complex business entities from RDDs to Datasets, a handful of problems arise. One is the lack of support for user-defined types, confining the developer to a predefined set of types and severely hindering the usefulness of Datas...

It is quite common to have separate users for different git repos: for example, a public account for all your open-source GitHub stuff and a work account for your employer’s private git remote.

Recently I’ve been writing a graph query engine at the Fault Tolerant System Research Group at uni. The frontend is Cypher, a language popularized by Neo Technology and shipped out of the box with their graph database, Neo4j.

Targeting multiple Spark versions with sbt-projectmatrix

CI made simpler with Earthly

Migrating LDBC SNB Datagen to Spark

Deriving Spark Encoders and Schemas Using Implicits

Who's watching? 👀

Set up multiple Git users on your machine

Implementing Kleene logic two different ways in Scala