Senior Data & AI Engineer

FP Inc.

Toronto, ON, Canada
Contract
Hybrid
C$60 - C$75

Daily Responsibilities:

  • Triage overnight pipeline failures (Airflow DAGs, ETL jobs) and resolve data quality issues
  • Write and optimize complex SQL across Trino, Snowflake, and SQL Server for IAM reporting
  • Build and iterate on MCP server tools / agentic AI workflows — prompt tuning, RAG testing, LLM output validation
  • Develop and maintain ETL/ELT pipelines (Airflow, Spark, DBT, Iceberg) for new and existing data feeds
  • Translate IAM business rules (PAM, entitlements, password compliance) into queryable data models
  • Collaborate with IAM analysts and stakeholders to clarify requirements and deliver data solutions
  • Maintain Snowflake semantic layer views and Cortex AI features for dashboards and AI agents
  • Run QA test cases — validate pipeline outputs, AI agent responses, and source-to-target accuracy
  • Monitor AI governance controls — cost logs, guardrails, audit trails
  • Review and merge GitHub PRs; maintain CI/CD workflows via GitHub Actions
  • Update technical documentation (data dictionaries, runbooks, READMEs)
  • Design, build, and optimize data models and semantic views in Snowflake, including Snowflake Cortex for AI-powered natural language querying and ML-assisted analytics.
  • Build and maintain semantic layer definitions that expose IAM data domains in a consistent, reusable format for AI agents and dashboards.
  • Design and implement production-ready agentic AI systems using MCP server architecture and LLM frameworks and APIs such as LangChain, the Claude/Anthropic APIs, or equivalent tooling.
  • Build RAG (Retrieval-Augmented Generation) pipelines with vector database integration (e.g. Pinecone, Chroma) and embedding-based retrieval strategies.
  • Write and iterate on prompt engineering strategies — chain-of-thought patterns, tool use, structured output formatting, and context management at scale.
  • Enforce AI governance: LLM cost controls, output monitoring, logging, guardrails, and audit trails to ensure responsible AI use within IAM.
  • Integrate MCP servers and AI agents with existing Trino, SQL Server, and Snowflake data sources.
  • Rapidly onboard to the IAM domain: PAM (Privileged Access Management), entitlement profiling, PAI datamart structures, password compliance, and access certification workflows.
  • Translate complex IAM business rules and access governance logic into queryable data models and automated pipelines.
  • Collaborate with IAM analysts and stakeholders to capture and formalize business requirements into technical specifications.
  • Build and maintain QA frameworks and structured test cases for data pipelines, AI agent outputs, and SQL transformations.
  • Validate data accuracy, completeness, and business rule compliance across source-to-target data flows; perform regression testing on pipeline or logic changes.
  • Write complex analytical SQL across Trino, Snowflake, and SQL Server.
  • Build and maintain ETL/ELT pipelines using Apache Airflow (DAG authoring, scheduling, dependency management), Apache Spark, and DBT.
  • Work with Apache Iceberg and Hadoop-based data sources; understand distributed data processing patterns in hybrid on-prem and cloud environments (AWS/Azure).
  • Use GitHub for version control, branch management, pull requests, and peer code review; author and maintain GitHub Actions CI/CD workflows.
  • Produce and maintain clear technical documentation (README files, runbooks, data dictionaries).

Essential program/technology/software knowledge, and the capacity in which the selected candidate will use it:

  • Python: Used as the baseline language for all AI and data tooling: building pipelines, AI agents, automation scripts, and IAM business logic transformations.
  • Advanced SQL (Snowflake, Trino, SQL Server): Used daily to write complex analytical queries across multiple IAM data sources and platforms, and to translate business rules into queryable data models.
  • Snowflake + Cortex AI: Used to design and build semantic views and data models; Cortex is used specifically for AI-powered natural language querying and ML-assisted analytics on IAM data.
  • MCP Server Architecture: Used to build and deploy production-grade agentic AI tools that connect LLMs to IAM data sources (Trino, Snowflake, SQL Server).
  • LLM APIs (Claude/Anthropic, GPT): Used to design, implement, and govern AI agent systems, including prompt engineering, cost monitoring, output validation, and audit logging.
  • LangChain + RAG Pipelines + Vector DBs (Pinecone, Chroma): Used to build retrieval-augmented generation systems that allow AI agents to reason over IAM domain knowledge and documentation.
  • Apache Airflow: Used to author, schedule, and manage ETL/ELT pipeline DAGs across IAM data domains.
  • Apache Spark: Used for large-scale distributed data processing within ETL/ELT pipelines.
  • DBT: Used for SQL-based data transformations and maintaining data models in the lakehouse layer.
  • Apache Iceberg / Hadoop: Used to work with lakehouse-format data sources in hybrid on-prem and cloud environments.
  • AWS / Azure: Used to deploy and manage data pipelines and AI workloads across hybrid cloud infrastructure.
  • GitHub + GitHub Actions: Used for version control, branch/PR workflows, peer code review, and CI/CD automation of pipeline and AI tool deployments.

Must-have Skills/Experience and/or Education, certifications, qualifications, designations:

  • Python
  • Advanced SQL — Snowflake, Trino, SQL Server
  • Snowflake + Cortex AI (semantic views, ML-assisted analytics)
  • MCP Server architecture (agentic AI tooling)
  • LLM APIs — Claude/Anthropic, GPT or equivalent
  • LangChain / RAG pipelines
  • Prompt engineering (chain-of-thought, context management, guardrails)
  • Apache Airflow (DAG authoring, scheduling)
  • Apache Spark (ETL/ELT)
  • DBT (data transformations)
  • Apache Iceberg / Hadoop (lakehouse / distributed processing)
  • AWS and/or Azure (hybrid cloud)
  • GitHub + GitHub Actions (CI/CD, version control)
  • Proficiency in Python (baseline for all AI/data tooling) and advanced SQL across at least two platforms (Snowflake, Trino, SQL Server).
  • Hands-on experience with Snowflake SQL, Cortex AI features, and semantic view design.
  • Practical experience building MCP servers or agentic AI frameworks, including LLM API integration (Claude/Anthropic, GPT, or equivalent).
  • Experience building RAG pipelines with LangChain or similar orchestration frameworks and vector database integration (Pinecone, Chroma, or similar).
  • Demonstrated ability to design effective prompts with context management, chain-of-thought patterns, and governance controls (cost monitoring, logging, guardrails).
  • Familiarity with IAM/PAM domain concepts (privileged access, entitlements, access certifications).
  • Experience writing structured test plans and executing test cases for data or software systems.
  • Experience authoring Apache Airflow DAGs, Apache Spark jobs, and managing pipeline dependencies.
  • Knowledge of modern data lakehouse tooling: Apache Iceberg, DBT, and/or Hadoop ecosystem (HDFS, Hive).
  • Experience with hybrid cloud deployments (AWS and/or Azure) alongside on-prem infrastructure.
  • Proficient with Git workflows, GitHub Actions CI/CD pipeline configuration, and code review practices.
  • Ability to produce clear technical documentation and data dictionaries.
  • Strong problem-solving skills and the ability to deliver on critical timelines with minimal oversight.
  • Excellent communication skills — comfortable engaging with both technical teams and IAM business stakeholders.

Nice-to-have Skills/Experience and/or Education, certifications, qualifications, designations:

  • SailPoint IdentityIQ
  • CyberArk / BeyondTrust
  • Apache Kafka (streaming)
  • SharePoint (reporting integration)
  • Tableau BI
  • KubeFlow / Dagster / Temporal
  • Experience with Trino (distributed SQL query engine) — actively used in this environment.
  • Exposure to SailPoint IdentityIQ or other IGA platforms.
  • Familiarity with PAM tools such as CyberArk or BeyondTrust.
  • Real-time streaming experience with Apache Kafka.
  • Knowledge of SharePoint integration for reporting outputs.
  • Prior work in a regulated financial services environment.
  • GitHub portfolio of shipped AI/LLM projects or production RAG systems.
  • Experience with additional orchestration tools: KubeFlow, Dagster, or Temporal.
  • QA-related certification (e.g. ISTQB, Agile Testing).
  • Computer Engineering, Computer Science, or related technical degree/diploma or equivalent experience.
  • Experience with tools such as Tableau, Power BI, or the Airflow UI.

Soft skills:

  • Strong Problem-Solving Skills — Ability to work through complex IAM data and AI challenges independently, delivering solutions under critical timelines with minimal oversight.
  • Excellent Communication Skills — Must be comfortable engaging with both technical teams (engineers, platform teams) and non-technical IAM business stakeholders; translating business requirements into technical specifications.
  • Self-Starter / Fast Learner — Expected to rapidly onboard to the IAM domain (PAM, entitlements, PAI datamart, access certifications) with no hand-holding implied.
  • Collaboration — Works cross-functionally with IAM analysts, platform teams, and business stakeholders across on-prem and cloud environments.