Once upon a time it used to be simple: you had a single database with a few tables. When you had a question about your data you would either connect to the database yourself or ask one of the engineers to do it for you.
With the explosion of data in the era of big data, this is less and less feasible.
Depending on your company and what it does, there is a very good chance the company’s data is spread across one or more cloud providers, each holding different types of data sets, including data lakes, data warehouses and old-fashioned databases.
There is also a good chance you don’t know each of these data sets intimately and have no idea how to pull the right data when you have a data-related question.
If you don’t have a data analyst on your team, then most likely you are sending your questions to the engineering team. They will pull the data for you (well… most of the time…), but let’s face it: that’s not the best use of their time, right? You want the engineering team to focus on releasing features, not on running SQL queries.
This is where data analysts come in.
They will help you extract the data you need, when you need it.
But let’s discuss what they can do in further detail.
Wait a second – what’s the difference between a data analyst, a BI developer, a BI analyst and a data scientist?
I hear you. I totally get the confusion around this, especially since the industry itself is not clear-cut about it.
Before delving into the various differences – I’d like to do one thing – and that’s to take the data scientist out of this discussion.
Data scientists deserve a post of their own, and they cannot be put in the same group as data analysts and BI analysts/developers since their responsibilities are quite different and require a different skill set, training and background.
Ok, so now that the data scientists are out, let’s discuss the high-level terms ‘analyst’ vs. ‘BI’.
In short, ‘analysts’ deal with organizing the data, visualizing it and extracting information. ‘BI’ (which stands for ‘Business Intelligence’) is more about supporting the decision-making process in the organization by extracting insights from the data provided.
Now, you can make things even more confusing by further classifying the analysts into ‘business analysts’ and ‘product analysts’, and the BI people into ‘BI analysts’ and ‘BI developers’.
To be honest – I’m getting a bit lost here too. Here is my take on this based on what I’ve observed in the Israeli hi-tech scene:
- Business analysts are quite rare and exist mainly in big companies where there are plenty of analysts serving different departments. In such cases, the business analysts serve the business people by digging up the relevant data for them and visualizing it.
- Product data analysts do exactly the same work as business analysts, but they serve different stakeholders: the product team. On top of what business analysts do, they help the product team make data-driven decisions about the products.
- BI analysts: well… I’ve rarely seen those around me. But when they exist in the organization, they are expected to be much more proactive about the data and to generate insights that serve the business based on the business KPIs, not necessarily in response to an ad-hoc request.
- BI developers are a bit different, from what I’ve observed. They work much more closely with the engineers and focus on organizing the data in a way that makes sense and makes it easier to extract. As the name hints, it’s more about development, but in the realm of cloud databases and ETLs. Most small and medium-sized companies simply expect their engineers or DevOps to do this work.
It could be that in your reality the people in these roles have a slightly different set of responsibilities, so don’t cling to these definitions too tightly.
In the context of this post I’ll be focusing on product data analysts (PDA for short). Small companies usually refer to them simply as data analysts, while bigger companies, which have dedicated business analysts, call them PDAs.
How can a PDA help you?
A PDA can assist you and your team with the following:
- Answer ad-hoc data-related questions
- Build dashboards for common and repeated queries (visualization)
- Support the engineering team with the data strategy
- Spot trends early on (taking a proactive approach)
- Investigate data anomalies and pinpoint the root cause
When expanding the team – how soon should I focus on hiring a PDA?
As soon as possible. And you’ll thank me for that.
Understand this: without a PDA you don’t have easy access to the data. Without easy access to the data it’d be very hard to make data-driven decisions.
I rest my case.
Seriously, think about this scenario: you just released a feature and you want to understand its activation rate and how it affects various KPIs. How can you know this without a PDA?
You have two options:
- Reach out to the engineering team and ask them to pull this data for you
- Devote a great amount of time yourself to become familiar with the various data sets and build the queries yourself.
I already noted that the first one doesn’t sound like a good use of the engineering team’s time. And the second doesn’t sound like a good use of yours.
In the era of big data I see very little choice here but to employ a PDA as soon as you can afford one.
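To make the scenario above a bit more concrete, here is a minimal Python sketch of what ‘activation rate’ could mean in practice. Everything in it is an assumption for illustration: the event names `feature_shown` and `feature_used`, the `Event` record, and the definition itself (users who used the feature divided by users who were shown it). Your company may well define activation differently, and in real life the data would come from a warehouse query rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One tracked event (hypothetical schema, for illustration only)."""
    user_id: str
    name: str


def activation_rate(events, exposed_event, activated_event):
    """Share of exposed users who also triggered the activation event."""
    exposed = {e.user_id for e in events if e.name == exposed_event}
    activated = {e.user_id for e in events if e.name == activated_event}
    if not exposed:
        return 0.0  # nobody saw the feature, so there is nothing to measure
    # Only count activations from users who were actually exposed.
    return len(exposed & activated) / len(exposed)


# Made-up sample data: four users saw the feature, two of them used it.
events = [
    Event("u1", "feature_shown"), Event("u1", "feature_used"),
    Event("u2", "feature_shown"), Event("u2", "feature_used"),
    Event("u3", "feature_shown"),
    Event("u4", "feature_shown"),
    Event("u5", "feature_used"),  # used without a tracked exposure: ignored
]

rate = activation_rate(events, "feature_shown", "feature_used")
print(f"Activation rate: {rate:.0%}")
```

Even this toy version forces a few decisions (who counts as ‘exposed’? what about users with missing events?), and answering them against the real data sets is where the time goes.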
Who manages the PDA?
Well, the best practice in the industry is that the PDA belongs to the product department and reports to one of the product managers in the department (usually the director/head of product).
I do want to note that I know a big company in which all the data & BI analysts were put in a ‘data’ department of their own – and you could only make data related requests using a ticketing system.
Personally, I find this an ineffective approach: you never get to know the people involved, and the one who handles your request is a random analyst who may or may not have the relevant expertise for your group.
How do I effectively utilize the PDA?
Whether they report to you directly or you are just ‘authorized’ to assign them tasks, their utilization looks the same. PDAs will be tasked with the following:
- Extracting data per an ad-hoc request from an internal stakeholder. If the company is small, it could be that the PDA is the sole analyst in the company. In that case they may receive such data queries from the business side or even from customers (indirectly, through customer success, business or sales). This comes on top of the product-related queries, so it can easily become a time-waster if not managed properly.
- Visualizing the data by building dashboards. Again, if the company is small, requests for data visualization can flood in from all departments.
- Supporting product-related queries and/or following a feature’s performance.
- Setting up data alerts and monitors for spotting negative trends early on.
- Investigating a data anomaly (like a drop in revenue) and finding the root cause.
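As an illustration of the ‘alerts and monitors’ item above, here is a minimal Python sketch of one possible rule: flag a metric when the latest value drops too far below its recent average. The function name, the 20% threshold and the revenue numbers are all assumptions made up for this example. Real monitoring tools offer far more robust options (seasonality, confidence bands and so on), so treat this as a sketch of the idea and not a recommended implementation.

```python
from statistics import mean


def is_negative_anomaly(history, latest, max_drop=0.2):
    """Return True if `latest` is more than `max_drop` below the mean of `history`.

    `max_drop` is a fraction (0.2 means a 20% drop). The threshold is an
    arbitrary illustrative default, not an industry standard.
    """
    if not history:
        return False  # no baseline to compare against
    baseline = mean(history)
    if baseline <= 0:
        return False  # a relative drop is not meaningful here
    relative_drop = (baseline - latest) / baseline
    return relative_drop > max_drop


# Made-up daily revenue for the past week.
daily_revenue = [1000, 1040, 980, 1010, 990, 1020, 1000]

# Today's revenue is roughly 30% below the weekly average, so the rule fires.
alert_today = is_negative_anomaly(daily_revenue, latest=700)
print("Raise alert:", alert_today)
```

A rule this simple will produce false alarms on naturally noisy metrics, which is one more reason to have someone who knows the data choose the thresholds.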
If you are the one responsible for their time, I strongly recommend the following strategy to manage it efficiently:
- Single access point: you are the only one assigning them tasks. Even if the CEO wants something from them, they need to go through you. Of course, the brutal approach of announcing to the world that ‘no one is talking to this guy but me’ is strongly discouraged. Explain to your boss (and even to the CEO separately if needed) why it’s needed and why everyone benefits when someone who sees the full picture manages the PDA’s priorities.
- The general approach should be that both the external stakeholders (customers) and the internal ones (business, executives, sales, etc.) get answers to 95% of their data-related questions from dashboards (whether internal or customer-facing ones). If a query is repeated often, or you expect it to be repeated often, task the PDA with building a dashboard.
- Data investigations cost a lot of time. Minimize them by setting up an alerting & monitoring system early on. If you design it properly and prioritize it, many negative trends and anomalies will be spotted quite early and the root cause will most likely be obvious. The worst that can happen is that the customer notices the negative trend first and it comes to you as a surprise. In that case your PDA will probably need to drop everything and look into it right away.
- Invest the PDA’s time in getting familiar with the data structure and the visualization tools early on. Tell them you expect them to become power users of the tools they are going to use and not just ‘users’. For example, if your company is using Looker as the internal visualization tool, make sure the PDA becomes a Looker wizard very quickly. It will have a very high ROI. Trust me.
- Connect them with a guild or mentor. Most likely you won’t be able to address all their technical questions and all of their data modeling considerations. Make sure you connect them with someone they can consult with. Many veteran analysts will be willing to be their mentors (even if not in your company) and if you can’t find one – then focus on a ‘guild’ they can belong to.
- Make them part of the Scrum process. If you are working with sprints, make sure they become part of the sprint and have their own commitments. This serves two purposes:
  - Make them feel they are part of the team and part of the overall effort of achieving the business goals.
  - Cut the interruptions and context switches by educating everyone that everything can wait for ‘the next sprint’.
- Force them to think ‘modular’ and reuse components and queries. In software engineering, one of the worst things you can do is duplicate code. It has tons of drawbacks, but mainly that every mistake has to be fixed in each place the code was copied to, and that development time increases (because the code is not written in a reusable way). From what I’ve observed, the same applies to the data world. So make sure the PDA never duplicates queries and reuses the visualization components they have already built across dashboards.
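The ‘no duplicated queries’ point above can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the `events` table, its columns, the two dashboard functions and the in-memory SQLite database are all hypothetical stand-ins for whatever warehouse and visualization tool you actually use. The point is just that the metric is defined once and every dashboard calls that single definition.

```python
import sqlite3

# The metric is defined in exactly one place. If the definition of an
# "active user" changes, only this query needs to be fixed.
ACTIVE_USERS_SQL = """
    SELECT COUNT(DISTINCT user_id)
    FROM events
    WHERE day BETWEEN ? AND ?
"""


def active_users(conn, start_day, end_day):
    """Count distinct users with at least one event in the date range."""
    row = conn.execute(ACTIVE_USERS_SQL, (start_day, end_day)).fetchone()
    return row[0]


def product_dashboard(conn):
    """Hypothetical product dashboard: weekly view of the shared metric."""
    return {"active_users_week": active_users(conn, "2024-01-01", "2024-01-07")}


def executive_dashboard(conn):
    """Hypothetical executive dashboard: monthly view of the same metric."""
    return {"active_users_month": active_users(conn, "2024-01-01", "2024-01-31")}


# Made-up data in an in-memory database, just to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", "2024-01-02"), ("u2", "2024-01-03"),
     ("u1", "2024-01-05"), ("u3", "2024-01-20")],
)

product = product_dashboard(conn)
executive = executive_dashboard(conn)
print(product, executive)
```

If each dashboard carried its own copy of the query, a change in the definition would have to be hunted down in every copy, which is exactly the duplication problem described above.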
If you do a good job as their manager, the context switches will be minimal and most of their time will go into building dashboards, so that all the stakeholders can get their data without interrupting them. The time it takes them to set up a dashboard will also decline over time, until it becomes a ‘no-brainer’.
That wraps up the post for today.
If you found this post/series useful – let me know in the comments. If you think others can benefit from it – feel free to share it with them.
Thank you, and until next time 🙂