What's data engineering?

Data engineering is the practice of designing and building software for collecting, storing, and managing data. The most common goal in data engineering is to enable stakeholders (such as product managers, marketing, or the C-suite) to make informed decisions with data. Other common goals include providing data to external users, supplying features to machine learning models, and enabling applications to react to events.

To support these workflows, data engineers build infrastructure and processes that deliver data when it's needed. Many of these processes address the following difficulties:

  • If the data is scattered, they make it easier to combine (for example, taxi trips and weather data)
  • If the data is inconsistent, they clean and test it so stakeholders can trust it
  • If the data is large, they optimize how it's processed to reduce time and cost
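To make the first two points concrete, here is a minimal sketch in plain Python of combining scattered data (taxi trips and daily weather, as in the example above) and testing the result. The record types, field names, and sample values are hypothetical and chosen only for illustration; a real pipeline would read from actual sources and would typically use a dataframe library or a warehouse query rather than in-memory lists.

```python
from dataclasses import dataclass
from datetime import date, datetime


# Hypothetical record types, for illustration only.
@dataclass(frozen=True)
class TaxiTrip:
    pickup_time: datetime
    fare_usd: float


@dataclass(frozen=True)
class DailyWeather:
    day: date
    precipitation_mm: float


def clean_trips(trips: list[TaxiTrip]) -> list[TaxiTrip]:
    """Cleaning step: drop trips that fail a basic sanity check (negative fares)."""
    return [t for t in trips if t.fare_usd >= 0]


def join_trips_with_weather(
    trips: list[TaxiTrip], weather: list[DailyWeather]
) -> list[tuple[TaxiTrip, float]]:
    """Combining step: attach each trip's daily precipitation, keyed on pickup date."""
    precip_by_day = {w.day: w.precipitation_mm for w in weather}
    joined = []
    for trip in trips:
        day = trip.pickup_time.date()
        # Inner join: trips with no weather record for their day are skipped.
        if day in precip_by_day:
            joined.append((trip, precip_by_day[day]))
    return joined


def check_joined(rows: list[tuple[TaxiTrip, float]]) -> None:
    """Testing step: raise if any joined row violates a simple expectation."""
    for trip, precip in rows:
        if trip.fare_usd < 0:
            raise ValueError(f"negative fare: {trip}")
        if precip < 0:
            raise ValueError(f"negative precipitation on {trip.pickup_time.date()}")


# Hypothetical sample data from two "sources".
trips = [
    TaxiTrip(datetime(2024, 1, 1, 8, 30), 12.5),
    TaxiTrip(datetime(2024, 1, 1, 9, 0), -3.0),   # bad record, removed by cleaning
    TaxiTrip(datetime(2024, 1, 2, 18, 15), 20.0),
    TaxiTrip(datetime(2024, 1, 3, 7, 0), 9.0),    # no weather for this day
]
weather = [
    DailyWeather(date(2024, 1, 1), 0.0),
    DailyWeather(date(2024, 1, 2), 4.2),
]

rows = join_trips_with_weather(clean_trips(trips), weather)
check_joined(rows)
```

Here the negative-fare trip is dropped during cleaning, and the trip from 2024-01-03 is skipped because no weather record matches it, so `rows` ends up with two trips. Whether unmatched trips should be dropped (an inner join, as here) or kept with a missing value (a left join) is a design choice that depends on what stakeholders need from the data.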

Data engineering can be difficult and time-consuming because data often comes from many disparate sources. As data grows larger and more complex, manual workflows become too slow and unreliable to maintain. At this point, data practitioners may consider adopting an orchestrator, a tool that schedules, runs, and monitors data pipelines.