Data Craze Weekly #4

This message was sent first to subscribers of Data Craze Weekly newsletter.


    Week in Data

    The Myth of Self-Service BI

    I have heard many times that self-service BI, like a unicorn gliding over a rainbow, will fix all of corporate analytics.

    Self-service, i.e. putting analytics (the ability to create reports, metrics, etc.) in the hands of users, is a great thing; there is no doubt about it.

    As always, the problem is in the details.

    If we do not take care of data quality, tool monitoring (how many reports users create, how often they are used, how much memory they take up, etc.), periodic cleanup and training in how to use the tool, I can say with a high degree of probability: “Houston, we have a problem.”

    Unfortunately, there are no magic solutions to analytics, but there is a lot of tedious work that, if done well, will produce great results.

    Quoting from the author:

    In this case, Self-Service BI is not the golden ticket to solve all company’s problems, rather it’s a tool to enable the top-notch analytics.

    Link: https://analysiswithanh.medium.com/why-does-self-service-bi-fail-and-what-could-enterprises-do-to-turn-the-tide-a7e2e577cc9e

    SLA, SLO, SLI for teams responsible for data

    You may be familiar with the gold standard of 11 9’s (99.999999999%) for service availability or durability.

    Can such metrics be reproduced in the world of data? Does your team have its own SLA, SLO or SLI? What are these metrics?

    If a customer reports a data error to us, do we have a process for how to fix it? Can we provide information on how long the repair will take?

    Xiaoxu Gu explains what these terms mean, and how to implement them step by step, in the article linked below.

    What matters, no matter what we call the metrics or processes, is this: if we do not check and measure elements of the process as a team, we will never know what is going well, what is going wrong and whether there is anything to improve.
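
    To make the three terms concrete, here is a minimal sketch, not the method from the linked article. It assumes the usual reading: an SLI is a measured indicator, an SLO is the internal target for that indicator, and an SLA is the promise made to the customer. The `PipelineRun` record, the one-hour freshness limit, the 95% target and the sample delays are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PipelineRun:
    """One scheduled load of a dataset (hypothetical record shape)."""
    scheduled_at: datetime
    finished_at: datetime


def freshness_sli(runs, max_delay):
    """SLI: fraction of runs that delivered data within max_delay."""
    if not runs:
        raise ValueError("need at least one run to compute an SLI")
    on_time = sum(1 for r in runs if r.finished_at - r.scheduled_at <= max_delay)
    return on_time / len(runs)


def meets_slo(sli, target):
    """SLO check: is the measured SLI at or above the internal target?"""
    return sli >= target


# Hypothetical week of daily loads scheduled at 06:00; one run finished late.
start = datetime(2024, 1, 1, 6, 0)
delays_min = [20, 35, 15, 190, 25, 30, 40]
runs = [
    PipelineRun(start + timedelta(days=d), start + timedelta(days=d, minutes=m))
    for d, m in enumerate(delays_min)
]

sli = freshness_sli(runs, max_delay=timedelta(hours=1))
print(f"freshness SLI: {sli:.1%}, SLO met: {meets_slo(sli, target=0.95)}")
```

    With six of seven runs on time, the indicator lands below the assumed 95% target, which is exactly the kind of signal a team cannot see without measuring.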

    You can’t improve what you don’t measure. As a mature data team, you should start thinking about data reliability from today. It’s not a hot topic in the industry yet, but it brings long-term value to the team as data is becoming the backbone of the organization. This is also a niche that differentiates your team from the rest.

    Link: https://towardsdatascience.com/its-time-to-set-sla-slo-sli-for-your-data-team-only-3-steps-ed3c93009aa5

    Kendrick Lamar Python Discography

    Are you looking for a project to finally play with Python? Here you go.

    The author of this article used Python to create visualizations of words, their sentiment (positive / negative / neutral), etc.

    Beyond the artist himself, the article goes step by step through the topics of sentiment analysis, data collection and cleaning, and scoring of the “rich” linguistic background. For each of these there is an example code fragment.
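
    If you want a feel for the positive / negative / neutral labelling before opening the article, here is a toy sketch. It is not the article's code and does not use its libraries: it assumes a tiny hand-made word list (`POSITIVE`, `NEGATIVE`) and made-up lines instead of real lyrics, and simply counts hits.

```python
import re
from collections import Counter

# Tiny hypothetical lexicon; a real project would use a full lexicon or a model.
POSITIVE = {"love", "good", "happy", "win"}
NEGATIVE = {"pain", "hate", "bad", "lost"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def score_line(line):
    """Return (label, score), where score = positive hits minus negative hits."""
    counts = Counter(tokenize(line))
    score = sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


# Made-up lines, not actual lyrics.
lines = [
    "Today feels good and I love this happy song",
    "So much pain, I hate the bad nights",
    "Walking down the street at noon",
]
labels = [score_line(line)[0] for line in lines]
print(Counter(labels))
```

    Swapping the word sets for a proper sentiment lexicon, and the sample lines for scraped lyrics, is where the real project starts.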

    And now let’s get to work: Sławomir, Zenek and other Polish greats are just waiting for their turn!

    Link: https://medium.com/geekculture/analyzing-and-scraping-the-lyrics-of-every-kendrick-lamar-album-in-python-b0551dcb563a

    Tools

    explain.depesz.com - “PostgreSQL’s explain analyze made readable”

    Are you working with a PostgreSQL database? Are you checking query execution plans? This tool will help you more than once.

    A great website where you can easily paste your query execution plan and get a clear view of which parts of the plan are worth paying attention to. Additionally, you can keep the plan public or anonymize the result.
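
    To have something to paste, you first need the plan text. The helper below is a small sketch that only builds the statement to run in your own client; the function name `explain_statement` and the sample query are hypothetical. Keep in mind that `EXPLAIN` with `ANALYZE` really executes the query, so be careful with statements that modify data.

```python
def explain_statement(query, analyze=True, buffers=False):
    """Prefix a SQL query with PostgreSQL's EXPLAIN and the chosen options."""
    if buffers and not analyze:
        # Older PostgreSQL versions reject BUFFERS unless ANALYZE is also set.
        raise ValueError("BUFFERS is only used together with ANALYZE here")
    options = []
    if analyze:
        options.append("ANALYZE")  # ANALYZE actually runs the query
    if buffers:
        options.append("BUFFERS")
    prefix = f"EXPLAIN ({', '.join(options)})" if options else "EXPLAIN"
    # Normalise the trailing semicolon so the result is a single statement.
    return f"{prefix} {query.strip().rstrip(';')};"


# Hypothetical query; run the printed statement in psql and paste its output.
statement = explain_statement(
    "SELECT product_name FROM products WHERE product_category = 'books';",
    buffers=True,
)
print(statement)
```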

    Link: https://explain.depesz.com/

    Check Your Skills

    #SQL

    “Based on all the products (PRODUCTS table), create a list of product names (PRODUCT_NAME column) and, for each product, an array of the unique categories (PRODUCT_CATEGORY column) to which it belongs.”

    Solution: https://www.db-fiddle.com/f/pDaBkugoEEMjTC9eEcx6J9/0
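
    If you want to experiment locally first, here is a sketch of the same idea in Python. It is not the linked solution: it uses an in-memory SQLite database, which has no array type, so the distinct categories are concatenated with `group_concat` and split into a list afterwards. In PostgreSQL you would typically reach for `array_agg(DISTINCT ...)` instead. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, product_category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("Laptop", "electronics"),
        ("Laptop", "office"),
        ("Laptop", "electronics"),  # duplicate category on purpose
        ("Desk", "office"),
    ],
)

rows = conn.execute(
    """
    SELECT product_name, group_concat(DISTINCT product_category)
    FROM products
    GROUP BY product_name
    ORDER BY product_name
    """
).fetchall()
conn.close()

# group_concat joins with commas, so this assumes category names contain none.
result = {name: sorted(categories.split(",")) for name, categories in rows}
print(result)
```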

    You can find more SQL-related questions at SQL - Q&A

    Data Jobs

    Skills sought: Analytics / Dashboarding Tools (Qlik / Tableau / PowerBI), Analytical skills, SQL