A data pipeline is a method for moving raw data from different [[Data Source|data sources]] to a data store, such as a data lake or data warehouse, for analysis. In most cases, data is processed before it enters a data repository. This processing includes transformations that guarantee proper [[Data Integration|data integration]] and standards, such as filtering, masking, and aggregation. This is particularly important when a relational database is the dataset's final destination: to update existing data with new data, this kind of repository requires alignment, that is, the matching of data columns and types.
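For illustration, the sketch below applies those three transformations to a handful of made-up records before they would be loaded into a repository. The field names (<code>user</code>, <code>email</code>, <code>amount</code>) and the masking rule are assumptions for the example, not part of any particular pipeline.

<syntaxhighlight lang="python">
# Minimal sketch of pre-load transformations: filtering, masking, aggregation.
# Records and field names are invented for the example.
raw_records = [
    {"user": "alice", "email": "alice@example.com", "amount": 120.0},
    {"user": "bob", "email": "bob@example.com", "amount": -5.0},
    {"user": "alice", "email": "alice@example.com", "amount": 80.0},
]

# Filtering: drop records that fail a validity check.
valid = [r for r in raw_records if r["amount"] > 0]

# Masking: hide sensitive values before they reach the repository.
for r in valid:
    name, _, domain = r["email"].partition("@")
    r["email"] = name[0] + "***@" + domain

# Aggregation: summarize per user so the result matches the
# destination table's columns and types.
totals = {}
for r in valid:
    totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]

print(totals)  # {'alice': 200.0} -- bob's invalid record was filtered out
</syntaxhighlight>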
As their name suggests, data pipelines serve as the "piping" for [[Data Science|data science]] initiatives and business intelligence dashboards. Data may be obtained from a wide range of sources, including APIs, SQL and NoSQL [[Database|databases]], and files, but most of it is not immediately usable. [[Data Scientist|Data scientists]] or data engineers typically handle data preparation, organizing the data to suit the requirements of the business use case. The kind of processing a [[Data Pipeline|data pipeline]] needs is usually determined by a combination of exploratory data analysis and clearly stated business objectives. Once the data has been properly filtered, combined, and summarized, it can be stored and made available for use. Well-organized [[Data Pipeline|data pipelines]] provide the foundation for a variety of data projects, including exploratory data analysis, [[Data Visualizations|data visualizations]], and [[Machine Learning|machine learning]] tasks.
== Types of Data Pipelines ==
* [[Batch Processing]]
* [[Streaming Data]]
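The two types differ chiefly in ''when'' records are processed: batch pipelines run over a complete dataset, typically on a schedule, while streaming pipelines handle records as they arrive. The sketch below contrasts the two using a hypothetical <code>clean</code> step; it is a minimal illustration under those assumptions, not a production pattern.

<syntaxhighlight lang="python">
from typing import Iterable, Iterator

def clean(record: str) -> str:
    # Stand-in for whatever transformation the pipeline applies.
    return record.strip().lower()

# Batch processing: collect the full dataset first, then process it
# in one run, typically on a schedule.
def run_batch(dataset: list) -> list:
    return [clean(r) for r in dataset]

# Streaming: process each record as it arrives, so results become
# available with low latency instead of after the whole job finishes.
def run_streaming(source: Iterable) -> Iterator:
    for record in source:
        yield clean(record)

data = ["  Alpha ", "BETA", " gamma"]
print(run_batch(data))            # entire result at once: ['alpha', 'beta', 'gamma']
for out in run_streaming(iter(data)):
    print(out)                    # one cleaned record per arriving record
</syntaxhighlight>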
== Data pipeline architecture ==
* [[Data Ingestion]]
* [[Data Transformation]]
* [[Data Storage]]
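The sketch below wires these three stages together in miniature, using an in-memory CSV as the source and SQLite as a stand-in for the destination warehouse. The column names and table schema are assumptions made for the example.

<syntaxhighlight lang="python">
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from an API,
# database, or file system.
RAW_CSV = """user,amount
alice,120.0
bob,-5.0
alice,80.0
"""

def ingest(text: str) -> list:
    """Ingestion: pull raw rows from a source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list) -> list:
    """Transformation: filter invalid rows and cast types to match the target schema."""
    return [(r["user"], float(r["amount"])) for r in rows if float(r["amount"]) > 0]

def store(rows: list, conn: sqlite3.Connection) -> None:
    """Storage: load the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
store(transform(ingest(RAW_CSV)), conn)
print(conn.execute("SELECT user, SUM(amount) FROM payments GROUP BY user").fetchall())
# [('alice', 200.0)] -- bob's negative-amount row was filtered during transformation
</syntaxhighlight>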
== Use cases of data pipelines ==
* [[Exploratory Data Analysis]]
* [[Data Visualizations]]
* [[Machine Learning|Machine learning]]