Content provided by Tobias Macey. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Tobias Macey or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://sv.player.fm/legal.
Achieving Data Reliability: The Role of Data Contracts in Modern Data Management
Summary
Data contracts are both an enforcement mechanism for data quality and a promise to downstream consumers. In this episode Tom Baeyens returns to discuss the purpose and scope of data contracts, emphasizing their importance in achieving reliable analytical data and preventing issues before they arise. He explains how data contracts can be used to enforce guarantees and requirements, and how they fit into the broader context of data observability and quality monitoring. The discussion also covers the challenges and benefits of implementing data contracts, the organizational impact, and the potential for standardization in the field.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- At Outshift, the incubation engine from Cisco, they are driving innovation in AI, cloud, and quantum technologies with the powerful combination of enterprise strength and startup agility. Their latest innovation for the AI ecosystem is Motific, addressing a critical gap in going from prototype to production with generative AI. Motific is your vendor and model-agnostic platform for building safe, trustworthy, and cost-effective generative AI solutions in days instead of months. Motific provides easy integration with your organizational data, combined with advanced, customizable policy controls and observability to help ensure compliance throughout the entire process. Move beyond the constraints of traditional AI implementation and ensure your projects are launched quickly and with a firm foundation of trust and efficiency. Go to motific.ai today to learn more!
- Your host is Tobias Macey and today I'm interviewing Tom Baeyens about using data contracts to build a clearer API for your data
- Introduction
- How did you get involved in the area of data management?
- Can you describe the scope and purpose of data contracts in the context of this conversation?
- In what way(s) do they differ from data quality/data observability?
- Data contracts are also known as the API for data. Can you elaborate on this?
- What are the types of guarantees and requirements that you can enforce with these data contracts?
- What are some examples of constraints or guarantees that cannot be represented in these contracts?
- Are data contracts related to the shift-left?
- The obvious application of data contracts is in the context of pipeline execution flows, to prevent failing checks from propagating further in the data flow. What are some of the other ways that these contracts can be integrated into an organization's data ecosystem?
- How did you approach the design of the syntax and implementation for Soda's data contracts?
- Guarantees and constraints around data in different contexts have been implemented in numerous tools and systems. What are the areas of overlap with, e.g., dbt and Great Expectations?
- Are there any emerging standards or design patterns around data contracts/guarantees that will help encourage portability and integration across tooling/platform contexts?
- What are the most interesting, innovative, or unexpected ways that you have seen data contracts used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on data contracts at Soda?
- When are data contracts the wrong choice?
- What do you have planned for the future of data contracts?
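
As a rough illustration of the pipeline use case raised in the questions above (stopping a failing batch before it propagates downstream), here is a minimal sketch in plain Python. It is not Soda's API or contract syntax; `ColumnRule`, `DataContract`, `ContractViolation`, and `publish_if_valid` are hypothetical names invented for this example, and the contract is assumed to be a simple list of per-column rules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class ContractViolation(Exception):
    """Raised to halt the pipeline when a batch breaks the contract."""


@dataclass
class ColumnRule:
    # Hypothetical rule shape: a column name, an expected Python type,
    # whether the value must be present, and an optional validity check.
    name: str
    dtype: type
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None


@dataclass
class DataContract:
    dataset: str
    columns: list

    def verify(self, rows: list) -> list:
        """Return a list of human-readable failures (empty if the batch passes)."""
        failures = []
        for i, row in enumerate(rows):
            for rule in self.columns:
                value = row.get(rule.name)
                if value is None:
                    if rule.required:
                        failures.append(f"row {i}: '{rule.name}' is missing")
                    continue
                if not isinstance(value, rule.dtype):
                    failures.append(
                        f"row {i}: '{rule.name}' should be {rule.dtype.__name__}"
                    )
                    continue
                if rule.check is not None and not rule.check(value):
                    failures.append(f"row {i}: '{rule.name}' failed its validity check")
        return failures


def publish_if_valid(contract: DataContract, rows: list, publish: Callable) -> None:
    """Circuit-breaker step: publish only when every contract check passes."""
    failures = contract.verify(rows)
    if failures:
        raise ContractViolation(f"{contract.dataset}: " + "; ".join(failures))
    publish(rows)


# Example contract for a hypothetical 'orders' dataset.
orders_contract = DataContract(
    dataset="orders",
    columns=[
        ColumnRule("order_id", int),
        ColumnRule("amount", float, check=lambda v: v >= 0),
        ColumnRule("coupon", str, required=False),
    ],
)
```

In this sketch the producer calls `publish_if_valid` as the last step of its job, so consumers only ever see batches that satisfy the contract; a real setup would more likely declare the contract in a file and run the checks through a dedicated tool.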
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- Soda
- Podcast Episode
- JBoss
- Data Contract
- Airflow
- Unit Testing
- Integration Testing
- OpenAPI
- GraphQL
- Circuit Breaker Pattern
- SodaCL
- Soda Data Contracts
- Data Mesh
- Great Expectations
- dbt Unit Tests
- Open Data Contracts
- ODCS == Open Data Contract Standard
- ODPS == Open Data Product Specification