Software at Scale 13 - Emma Tang: ex Data Infrastructure Lead, Stripe

Effective Management of Big Data Platforms


Emma Tang was the Engineering Manager of the Data Infrastructure team at Stripe. She was also a Lead Software Engineer at Aggregate Knowledge, where she worked on the data platform.

We explore the technological and organizational challenges of maintaining big data platforms. We discuss when a company would require a “Big Data” system, the properties of a good system, what some of these systems look like today, tools and frameworks that work well, hiring the right engineers, and unsolved problems in the field.


0:30 - “Big Data” for software engineers: when does a company need a big data solution?

2:30 - The transition from a regular database to a big data solution, with Stripe as a motivating example.

4:20 - Verification of processed output. Some of the tools involved: Amazon S3, Parquet, Kafka, and MongoDB.

9:00 - The cost of ensuring correctness in data processing. Using tools like Protobuf to enforce schemas.

13:30 - Data Governance as a trend

16:30 - Why should a company have a data platform organization?

21:30 - Hiring for data infrastructure/platform engineers

24:00 - How does a data organization maintain quality? What metrics do they look at?

28:30 - Trends in some problem areas in this space.

33:30 - Emma’s interest in data infrastructure, and advice for those looking to get into the field.