We recently advertised a Data Analyst role, which had the following desirable skills listed:
- creating and querying data models using SQL
- working with both structured and semi-structured data
- exploring and visualising data
- using probability and statistics to perform and support analyses
- drawing insights from large and complex datasets
- using hypothesis tests to create rigorous insights from data
- working in an agile manner to continuously deliver work
- articulating results to a broad range of audiences
We didn’t expect any candidates to tick every single box, but the most compelling applications at the first stage ticked a number of them.
The skills we looked for
When writing the job ad, we wanted the “desirable skills” to describe three broad areas: “Data Engineering”, “Data Analysis” and “Data Evangelism”. This blog covers the first of these: “Data Engineering”.
What is Data Engineering?
Data Engineering is all about working with and structuring data. A large part of most data roles is sourcing, cleaning and shaping data so that it is fit for analysis or further use. We are no different, and we build and maintain multiple datasets.
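As an illustration of that sourcing, cleaning and shaping work, here is a minimal sketch in Python using pandas. It assumes pandas is installed, and the records and field names (a hypothetical fitness-tracker export with nested "activity" objects) are invented for the example. It flattens the semi-structured JSON into a table, fixes the column types and drops incomplete rows so the result is fit for analysis.

```python
import json

import pandas as pd

# Hypothetical semi-structured export, e.g. daily records from a fitness tracker.
# Measurements are nested inside an "activity" object, and one value is missing.
raw_json = """
[
  {"date": "2023-05-01", "activity": {"steps": "8042", "minutes_active": 51}},
  {"date": "2023-05-02", "activity": {"steps": null, "minutes_active": 12}},
  {"date": "2023-05-03", "activity": {"steps": "10311", "minutes_active": 64}}
]
"""
records = json.loads(raw_json)

# Shape: flatten nested objects into columns named "activity.steps" and "activity.minutes_active".
df = pd.json_normalize(records)
df = df.rename(columns={"activity.steps": "steps", "activity.minutes_active": "minutes_active"})

# Clean: parse the dates, convert step counts stored as strings into numbers
# (anything missing or unparseable becomes NaN), then drop days with no step count
# so they do not skew later analysis.
df["date"] = pd.to_datetime(df["date"])
df["steps"] = pd.to_numeric(df["steps"], errors="coerce")
df = df.dropna(subset=["steps"])

print(df)
```

Dropping the incomplete day is just one possible choice; depending on the analysis, you might instead keep it and flag it, or fill the gap.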
How does it relate to the desirable skills?
In our job ad, the first two bullet points were the Data Engineering skills we were interested in:
- creating and querying data models using SQL
- working with both structured and semi-structured data
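To make the first of these skills a little more concrete, here is a minimal sketch of creating and querying a small table with SQL, run through Python's built-in sqlite3 module so it needs no database server. The "orders" table and its columns are hypothetical.

```python
import sqlite3

# An in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Create: define a table with typed columns and a primary key, then load a few rows.
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 5.5), ("alice", 12.5)],
)

# Query: total spend per customer, largest first.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()

print(rows)  # [('alice', 32.5), ('bob', 5.5)]
conn.close()
```

The same CREATE TABLE, INSERT and SELECT ... GROUP BY statements carry over, with minor dialect differences, to most relational databases.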
How to learn Data Engineering
Now that you know the competencies we were looking for, I have outlined below where you can go to learn them. I hope you find it useful!
What do you need to know? | How can you learn? |
---|---|
How to work with structured data: i.e. data stored in a relational database | To work with structured data, you need to learn SQL, the language used to query databases. Search for “learn SQL online”, and you will find lots of courses that teach the basics. Many courses are free, but charge for completion certificates (which I don’t consider necessary). |
How to work with semi-structured data: i.e. data (usually) not in a database, but with some form of tagging to allow the data to be described, such as a JSON or XML file | It’s harder to find an online course for this specifically, but many of the Data Analysis courses covered in the next blog in this series include elements of it (e.g. you’ll use the pandas read_json function in the Python courses). You could also look at some of the Kaggle challenges, or find or build your own dataset and analyse it. For instance, smart watches, Fitbits and similar devices record a wealth of data: can you shape and analyse that to find something of interest? If you want to move into data, at some point you’re likely to come across Python, as it is a common language for processing and analysing data. The 100 Days of Python course will help you go from “absolute beginner” to “extremely competent” in Python, and should help you write good-quality code while building your data engineering skills at the same time. |
How to model data: this means the best way to shape and store data based on its expected use | Gaining a general appreciation of how to structure data is harder. Structuring data for an OLTP (online transaction processing) system is a very different proposition from structuring it for a data warehouse. Some of this comes with experience, and you may pick some of it up as a side benefit of doing SQL courses. However, there are definitely patterns and anti-patterns that can be learnt. I think The Data Warehouse Toolkit by Ralph Kimball is very good from a data warehousing perspective (sorry Inmon fans!), but be warned: it quickly gets technical, and implicitly assumes prior database experience. |
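To give a flavour of the dimensional modelling that Kimball's book describes, here is a minimal star-schema sketch, again using Python's built-in sqlite3 module. A fact table holds numeric measures at a chosen grain (here, one row per sale) and links to dimension tables that hold descriptive attributes. The table and column names are hypothetical, and the sale date is kept as plain text for brevity, whereas a fuller Kimball-style design would normally give dates their own dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension table: descriptive attributes used to filter and group results.
conn.execute(
    "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT)"
)

# Fact table: one row per sale, holding measures plus a foreign key to the dimension.
conn.execute(
    """
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product(product_key),
        sale_date   TEXT,
        quantity    INTEGER,
        revenue     REAL
    )
    """
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Kettle", "Kitchen"), (2, "Toaster", "Kitchen"), (3, "Lamp", "Lighting")],
)
conn.executemany(
    "INSERT INTO fact_sales (product_key, sale_date, quantity, revenue) VALUES (?, ?, ?, ?)",
    [(1, "2023-05-01", 2, 40.0), (2, "2023-05-01", 1, 25.0), (3, "2023-05-02", 3, 45.0)],
)

# Typical analytical query: join facts to a dimension and aggregate measures by an attribute.
rows = conn.execute(
    """
    SELECT p.category, SUM(f.quantity) AS units, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()

print(rows)  # [('Kitchen', 3, 65.0), ('Lighting', 3, 45.0)]
conn.close()
```

The contrast with OLTP design is the point: a transactional schema is usually normalised to make many small writes safe and cheap, while a star schema deliberately keeps descriptive attributes together in dimensions so that analytical queries need only simple joins and aggregations.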
Once you have started Data Engineering, don’t forget to check out the next blog in the series covering a skill so vital it’s in the job title: Data Analysis.