Senior Data Engineer / ML Ops (Media Data & AI Enablement)
Engineering Role Details
Posted Apr 02, 2025

At TeamStation AI, we are on a mission to bring together the brightest minds to solve tomorrow’s toughest technology challenges. Our work is about more than just AI—it’s about building the future through collaboration and innovation. We believe that the key to solving the world’s most complex problems lies in aligning diverse talents and perspectives. Our AI-powered platform enables cutting-edge scientific and technical teams to work smarter, faster, and together. By joining us, you’ll help unlock new technological breakthroughs and drive innovation where it matters most.
Join the Mission at TeamStation AI!
Who are we looking for? We are seeking visionaries, innovators, and problem solvers who thrive in fast-paced, collaborative environments. If you’re passionate about AI, technology, and solving critical challenges, we want to hear from you. Come be part of a team where your ideas can drive the future.
About the Role:
TeamStation AI is partnering exclusively with a leading global technology solutions provider serving major Hollywood studios and the broader media and entertainment industry. We are seeking a highly skilled Senior Data Engineer to play a pivotal role in enabling advanced AI and Machine Learning capabilities within this dynamic sector.
The core challenge? Unlocking the potential hidden within vast amounts of complex studio data – including video, audio, image, and text formats. As a Senior Data Engineer, you will be the architect and builder of the critical data infrastructure needed by our partner's AI team. You'll design, implement, and manage robust data pipelines and a scalable data lakehouse, transforming raw media assets into structured, analysis-ready data. Your work will directly automate workflows and provide the clean, reliable data foundation essential for training ML models, powering computer vision systems, and driving innovation in media workflows.
Responsibilities:
- ✓ Architect, build, and maintain scalable, automated data pipelines for ingesting, cleaning, validating, and transforming large volumes of diverse media data (video, audio, text, metadata).
- ✓ Design, implement, and manage a robust and efficient data lakehouse architecture (utilizing technologies like Delta Lake on platforms such as Databricks or Snowflake) optimized for media data types and downstream AI/ML consumption.
- ✓ Develop, deploy, and monitor complex ETL/ELT workflows using orchestration tools (e.g., Apache Airflow, Prefect, Databricks Workflows), ensuring reliability and performance.
- ✓ Implement rigorous data quality checks, monitoring frameworks, and data governance practices throughout the data lifecycle.
- ✓ Optimize data processing tasks (primarily using Apache Spark) for scalability, efficiency, and cost-effectiveness, particularly for compute-intensive media processing.
- ✓ Collaborate closely with AI Engineers, ML Engineers, and Data Scientists at our partner company to deeply understand their data requirements and ensure the data platform effectively supports feature engineering, model training, and evaluation.
- ✓ Manage data storage within AWS (S3, Glacier, etc.), implementing best practices for organization, security, access control, and lifecycle management.
- ✓ Utilize Infrastructure as Code (IaC) principles and tools (e.g., Terraform) to provision and manage data infrastructure components reliably and repeatably.
- ✓ Proactively troubleshoot and resolve complex data pipeline and platform issues, ensuring high data availability and integrity.
- ✓ Stay abreast of emerging technologies and best practices in data engineering, particularly concerning big data, lakehouse patterns, and media processing.
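To give a flavor of the data quality checks described above, here is a minimal sketch in plain Python. The field names (`asset_id`, `media_type`, `duration_sec`) and validation rules are hypothetical illustrations, not part of the role description; in practice such checks would typically run inside Spark jobs or a dedicated quality framework.

```python
# Minimal data-quality sketch for media metadata records.
# Field names and rules are hypothetical examples.

ALLOWED_MEDIA_TYPES = {"video", "audio", "image", "text"}


def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable issues; an empty list means the record passes."""
    issues = []
    if not record.get("asset_id"):
        issues.append("missing asset_id")
    if record.get("media_type") not in ALLOWED_MEDIA_TYPES:
        issues.append(f"unknown media_type: {record.get('media_type')!r}")
    duration = record.get("duration_sec")
    if duration is not None and duration < 0:
        issues.append(f"negative duration_sec: {duration}")
    return issues


def partition_records(records):
    """Split records into clean rows and quarantined (row, reasons) pairs."""
    clean, quarantined = [], []
    for rec in records:
        problems = validate_record(rec)
        if problems:
            quarantined.append((rec, problems))
        else:
            clean.append(rec)
    return clean, quarantined
```

Quarantining failing rows with explicit reasons, rather than silently dropping them, is one common pattern for keeping downstream training data clean while preserving an audit trail.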
Key Technologies:
- Data Processing & Orchestration:
- Python, SQL, Apache Spark
- Apache Airflow, Prefect, Databricks Workflows, or similar orchestrators
- Data Storage & Lakehouse:
- AWS S3, Delta Lake (essential)
- Databricks, Snowflake, or similar platforms
- PostgreSQL or other relational/NoSQL databases as needed
- Cloud & Infrastructure:
- AWS (core services like Glue, EMR, Lambda, IAM, EC2, VPC, etc.)
- Terraform
- Docker (for development/deployment environments)
- CI/CD:
- GitHub Actions or similar
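As a rough illustration of what the orchestration layer in the stack above coordinates, here is a hedged, stdlib-only Python sketch of dependency-ordered task execution. The task names are hypothetical; real pipelines would declare these dependencies as an Airflow DAG or a Prefect flow rather than hand-rolling a runner.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical pipeline: each task maps to its upstream dependencies,
# mirroring how a DAG might declare ingest -> clean -> {transform, index} -> publish.
DAG = {
    "ingest": set(),
    "clean": {"ingest"},
    "transform": {"clean"},
    "index_metadata": {"clean"},
    "publish": {"transform", "index_metadata"},
}


def run_pipeline(dag):
    """Execute tasks in an order that respects every declared dependency."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        # In a real orchestrator each task would launch a Spark job,
        # a SQL transformation, or a media-processing step.
        print(f"running {task}")
    return order
```

What a full orchestrator adds on top of this ordering is the operational machinery the role calls for: scheduling, retries, alerting, and per-task monitoring.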
Qualifications:
- ✓ 5+ years of dedicated experience in Data Engineering roles.
- ✓ Proven track record of designing, building, and optimizing complex data pipelines processing large-scale datasets, with demonstrable experience handling unstructured or semi-structured data (video, audio, image, and text formats are a strong plus).
- ✓ Strong, hands-on experience with data lakehouse concepts and technologies, particularly Delta Lake.
- ✓ Expert-level proficiency in Apache Spark for distributed data processing.
- ✓ Advanced programming skills in Python and extensive experience with SQL.
- ✓ Deep understanding and practical experience with AWS cloud data services (S3, Glue, EMR, Lambda, IAM are key).
- ✓ Significant experience with workflow orchestration tools like Apache Airflow or Prefect.
- ✓ Experience implementing Infrastructure as Code (IaC) using Terraform.
- ✓ Solid understanding of data modeling, data warehousing principles, and data quality assurance techniques.
- ✓ Excellent analytical and problem-solving skills, with the ability to debug complex distributed systems.
- ✓ Strong communication skills and a collaborative mindset to work effectively with AI/ML teams and stakeholders.
Bonus Points For:
- ☐ Direct experience working within the Media & Entertainment industry, including familiarity with common media formats, metadata standards, or post-production workflows.
- ☐ Hands-on experience with the Databricks unified data platform.
- ☐ Experience using Snowflake for data warehousing/lakehouse implementations.
- ☐ Familiarity with stream processing technologies (e.g., Apache Kafka, AWS Kinesis).
- ☐ Basic understanding of MLOps principles and the lifecycle of machine learning models.
Why Join Us?:
- Foundational Impact: Build the critical data infrastructure that unlocks AI innovation for major Hollywood studios through our industry-leading technology partner.
- Unique Data Challenges: Work with massive, diverse, and complex media datasets at scale.
- Cutting-Edge Technology: Utilize modern data stack components like Spark, Delta Lake, Airflow, Terraform, and AWS.
- Collaborative Environment: Partner closely with talented AI/ML engineers and industry experts at a top M&E tech company.
- High Visibility: Your work directly enables key strategic initiatives for a global industry leader.
- Remote-First Culture: Enjoy the flexibility to work from anywhere while collaborating effectively.
- Growth Opportunities: Be part of a dynamic environment with opportunities to learn and lead.