Data Craze Weekly #7

This message was first sent to subscribers of the Data Craze Weekly newsletter.


    Week in Data

    Data Errors in the Big World

    We often hear about interesting products, clever algorithms and… great earnings in large technology companies.

    What we rarely hear about are the problems they face, which (like the scale of the company) are often large.

    That is why it is all the more worthwhile to read about the bugs that the “great ones” of this world have to deal with.

    This case from LinkedIn shows the scale:

    Back in October 2018, we had an instance at LinkedIn when data quality problems affected the job recommendations platform. Client job views and usage decreased by 40 to 60% for a short period of time. Once this decline in views was detected, it took a total of 5 engineers 8 days to identify the root cause and 11 days to resolve the issue.

    Link: https://medium.com/@kylejameskirwan/real-oh-damn-moments-from-data-engineers-d900f1961c14

    More about contracts

    I have already written about what contracts are in the world of data in one of the previous editions of the newsletter.

    In short, a data contract is nothing more than an agreement between teams (usually frontend and backend/data) on the schema in which data or results are transmitted.
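To make the idea concrete, here is a minimal sketch of a contract expressed in code. It is only an illustration: the `Field` class, the `PAYMENT_CONTRACT` definition and the `validate` helper are hypothetical names invented for this example and are not taken from any particular company's implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Field:
    """One field agreed on in the contract (hypothetical structure)."""
    name: str
    type: type
    required: bool = True


# Hypothetical contract for a "payment" record shared between two teams.
PAYMENT_CONTRACT = [
    Field("payment_id", str),
    Field("amount_cents", int),
    Field("currency", str),
    Field("note", str, required=False),
]


def validate(record: dict[str, Any], contract: list[Field]) -> list[str]:
    """Return a list of contract violations; an empty list means the record conforms."""
    errors = []
    for field in contract:
        if field.name not in record:
            # Optional fields may be absent; required ones may not.
            if field.required:
                errors.append(f"missing required field: {field.name}")
            continue
        value = record[field.name]
        if not isinstance(value, field.type):
            errors.append(
                f"{field.name}: expected {field.type.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


good = {"payment_id": "p1", "amount_cents": 100, "currency": "EUR"}
bad = {"payment_id": "p1", "amount_cents": "100"}

print(validate(good, PAYMENT_CONTRACT))
print(validate(bad, PAYMENT_CONTRACT))
```

The producing team can run such a check before publishing data, and the consuming team can rely on the same definition when reading it. Real-world contracts usually cover more than field types, for example ownership, versioning and allowed values.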

    This time I would like to return to this topic with an example from GoCardless.

    The article does not go deep into the technical details, because those will be specific to the company itself (GoCardless).

    What is more important is what has been achieved.

    It has allowed us to build what we refer to as our contract-driven data infrastructure, where from a Data Contract we can deploy all the tooling and services required to generate, manage and consume that data.

    Although the concept itself is not new, we will hear more and more about it in the world of data. Data quality is, and always has been, crucial, and any element of the SDLC (Software Development Lifecycle) that helps keep it at the highest possible level will be eagerly adopted.

    Link: https://medium.com/gocardless-tech/implementing-data-contracts-at-gocardless-3b5c49074d13

    Data visualizations – more than a bar chart

    A few years ago, I devoted a lot of time in my daily work to data visualization.

    Thanks to great tools, I didn’t have to create them from scratch.

    My task was to find the visualization that best matched the story the data was telling… and at the end of the day it all ended up in Excel 😀

    However, what has always fascinated me is the work of people who take visualizations to a different, much higher level. If you work with data every day and one of your tasks is to visualize it, let yourself be inspired.

    Link: https://nightingaledvs.com/five-inspiring-data-visualization-galleries/

    Tools

    Pluralith – visualize Terraform infrastructure directly from your codebase, completely automated.

    Do you use Terraform to build infrastructure in your company or project? This tool is a good fit for visualizing it. It shows how the elements are connected to each other, without major hassle or additional work.

    There is a paid option and a completely free one.

    Link: https://www.pluralith.com

    Check Your Skills

    #SQL

    Create two queries that are equivalent, meaning they return the same result set. Each query should join sales data (SALES table) with product data (PRODUCTS table) on the PRODUCT_ID join key.

    Solution: https://www.db-fiddle.com/f/3AqtpSy5NX8mUNC2BGoS53/0
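If you would like to experiment locally, the sketch below shows one possible pair of equivalent queries, run against an in-memory SQLite database from Python. It is not necessarily the same as the linked solution. Only the table names and the PRODUCT_ID key come from the task; the other columns (`product_name`, `sale_id`, `amount`) and the sample rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database with hypothetical columns and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
    INSERT INTO products VALUES (1, 'Keyboard'), (2, 'Mouse'), (3, 'Monitor');
    INSERT INTO sales VALUES (10, 1, 49.0), (11, 2, 19.5), (12, 1, 51.0);
""")

# Variant 1: explicit INNER JOIN with the join condition in ON.
explicit_join = """
    SELECT s.sale_id, p.product_name, s.amount
    FROM sales s
    JOIN products p ON p.product_id = s.product_id
    ORDER BY s.sale_id
"""

# Variant 2: comma-separated tables with the join condition in WHERE.
implicit_join = """
    SELECT s.sale_id, p.product_name, s.amount
    FROM sales s, products p
    WHERE p.product_id = s.product_id
    ORDER BY s.sale_id
"""

rows_explicit = conn.execute(explicit_join).fetchall()
rows_implicit = conn.execute(implicit_join).fetchall()

# Both variants should return the same rows in the same order.
print(rows_explicit == rows_implicit)
for row in rows_explicit:
    print(row)
```

Both variants are inner joins, so a product without any sales (the hypothetical 'Monitor' row here) does not appear in either result. The explicit JOIN form is generally easier to read because it keeps the join condition separate from other filters.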

    You can find more SQL-related questions at SQL - Q&A.

    Data Jobs

    Skills sought: GCP / AWS, ETL tools (e.g. Matillion), Data Architecture, SQL, RDBMS, Python

    Skills sought: MS SQL Server, SSRS, SSIS, SQL