Senior Data & AI Engineer at FP Inc.

Daily Responsibilities:

Triage overnight pipeline failures (Airflow DAGs, ETL jobs) and resolve data quality issues
Write and optimize complex SQL across Trino, Snowflake, and SQL Server for IAM reporting
Build and iterate on MCP server tools / agentic AI workflows — prompt tuning, RAG testing, LLM output validation
Develop and maintain ETL/ELT pipelines (Airflow, Spark, DBT, Iceberg) for new and existing data feeds
Translate IAM business rules (PAM, entitlements, password compliance) into queryable data models
Collaborate with IAM analysts and stakeholders to clarify requirements and deliver data solutions
Maintain Snowflake semantic layer views and Cortex AI features for dashboards and AI agents
Run QA test cases — validate pipeline outputs, AI agent responses, and source-to-target accuracy
Monitor AI governance controls — cost logs, guardrails, audit trails
Review and merge GitHub PRs; maintain CI/CD workflows via GitHub Actions
Update technical documentation (data dictionaries, runbooks, READMEs)
Design, build, and optimize data models and semantic views in Snowflake, including Snowflake Cortex for AI-powered natural language querying and ML-assisted analytics.
Build and maintain semantic layer definitions that expose IAM data domains in a consistent, reusable format for AI agents and dashboards.
Design and implement production-ready agentic AI systems using MCP server architecture and LLM frameworks such as LangChain, Claude/Anthropic APIs, or equivalent tooling.
Build RAG (Retrieval-Augmented Generation) pipelines with vector database integration (e.g. Pinecone, Chroma) and embedding-based retrieval strategies.
Write and iterate on prompt engineering strategies — chain-of-thought patterns, tool use, structured output formatting, and context management at scale.
Enforce AI governance: LLM cost controls, output monitoring, logging, guardrails, and audit trails to ensure responsible AI use within IAM.
Integrate MCP servers and AI agents with existing Trino, SQL Server, and Snowflake data sources.
Rapidly onboard to the IAM domain: PAM (Privileged Access Management), entitlement profiling, PAI datamart structures, password compliance, and access certification workflows.
Translate complex IAM business rules and access governance logic into queryable data models and automated pipelines.
Collaborate with IAM analysts and stakeholders to capture and formalize business requirements into technical specifications.
Build and maintain QA frameworks and structured test cases for data pipelines, AI agent outputs, and SQL transformations.
Validate data accuracy, completeness, and business rule compliance across source-to-target data flows; perform regression testing on pipeline or logic changes.
Write complex analytical SQL across Trino, Snowflake, and SQL Server.
Build and maintain ETL/ELT pipelines using Apache Airflow (DAG authoring, scheduling, dependency management), Apache Spark, and DBT.
Work with Apache Iceberg and Hadoop-based data sources; understand distributed data processing patterns in hybrid on-prem and cloud environments (AWS/Azure).
Use GitHub for version control, branch management, pull requests, and peer code review; author and maintain GitHub Actions CI/CD workflows.
Produce and maintain clear technical documentation (README files, runbooks, data dictionaries).

What program/technology/software knowledge is essential for this role? Describe in what capacity the selected candidate will be using it:

Python: Used as the baseline language for all AI and data tooling, including pipelines, AI agents, automation scripts, and IAM business logic transformations.
Advanced SQL (Snowflake, Trino, SQL Server): Used daily to write complex analytical queries across multiple IAM data sources and platforms, and to translate business rules into queryable data models.
Snowflake + Cortex AI: Used to design and build semantic views and data models; Cortex specifically for AI-powered natural language querying and ML-assisted analytics on IAM data.
MCP Server Architecture: Used to build and deploy production-grade agentic AI tools that connect LLMs to IAM data sources (Trino, Snowflake, SQL Server).
LLM APIs (Claude/Anthropic, GPT): Used to design, implement, and govern AI agent systems, including prompt engineering, cost monitoring, output validation, and audit logging.
LangChain + RAG Pipelines + Vector DBs (Pinecone, Chroma): Used to build retrieval-augmented generation systems that allow AI agents to reason over IAM domain knowledge and documentation.
Apache Airflow: Used to author, schedule, and manage ETL/ELT pipeline DAGs across IAM data domains.
Apache Spark: Used for large-scale distributed data processing within ETL/ELT pipelines.
DBT: Used for SQL-based data transformations and maintaining data models in the lakehouse layer.
Apache Iceberg / Hadoop: Used to work with lakehouse-format data sources in hybrid on-prem and cloud environments.
AWS / Azure: Used to deploy and manage data pipelines and AI workloads across hybrid cloud infrastructure.
GitHub + GitHub Actions: Used for version control, branch/PR workflows, peer code review, and CI/CD automation of pipeline and AI tool deployments.

Must-have Skills/Experiences and/or Education, certifications, qualifications, designations:

Python
Advanced SQL — Snowflake, Trino, SQL Server
Snowflake + Cortex AI (semantic views, ML-assisted analytics)
MCP Server architecture (agentic AI tooling)
LLM APIs — Claude/Anthropic, GPT or equivalent
LangChain / RAG pipelines
Prompt engineering (chain-of-thought, context management, guardrails)
Apache Airflow (DAG authoring, scheduling)
Apache Spark (ETL/ELT)
DBT (data transformations)
Apache Iceberg / Hadoop (lakehouse / distributed processing)
AWS and/or Azure (hybrid cloud)
GitHub + GitHub Actions (CI/CD, version control)
Proficiency in Python (baseline for all AI/data tooling) and advanced SQL across at least two platforms (Snowflake, Trino, SQL Server).
Hands-on experience with Snowflake SQL, Cortex AI features, and semantic view design.
Practical experience building MCP servers or agentic AI frameworks, including LLM API integration (Claude/Anthropic, GPT, or equivalent).
Experience building RAG pipelines with LangChain or similar orchestration frameworks and vector database integration (Pinecone, Chroma, or similar).
Demonstrated ability to design effective prompts with context management, chain-of-thought patterns, and governance controls (cost monitoring, logging, guardrails).
Familiarity with IAM/PAM domain concepts (privileged access, entitlements, access certifications).
Experience writing structured test plans and executing test cases for data or software systems.
Experience authoring Apache Airflow DAGs, Apache Spark jobs, and managing pipeline dependencies.
Knowledge of modern data lakehouse tooling: Apache Iceberg, DBT, and/or Hadoop ecosystem (HDFS, Hive).
Experience with hybrid cloud deployments (AWS and/or Azure) alongside on-prem infrastructure.
Proficient with Git workflows, GitHub Actions CI/CD pipeline configuration, and code review practices.
Ability to produce clear technical documentation and data dictionaries.
Strong problem-solving skills and the ability to deliver on critical timelines with minimal oversight.
Excellent communication skills — comfortable engaging with both technical teams and IAM business stakeholders.

Nice-to-have Skills/Experience and/or Education, certifications, qualifications, designations:

SailPoint IdentityIQ
CyberArk / BeyondTrust
Apache Kafka (streaming)
SharePoint (reporting integration)
Tableau BI
Kubeflow / Dagster / Temporal
Experience with Trino (distributed SQL query engine) — actively used in this environment.
Exposure to SailPoint IdentityIQ or other IGA platforms.
Familiarity with PAM tools such as CyberArk or BeyondTrust.
Real-time streaming experience with Apache Kafka.
Knowledge of SharePoint integration for reporting outputs.
Prior work in a regulated financial services environment.
GitHub portfolio of shipped AI/LLM projects or production RAG systems.
Experience with additional orchestration tools: Kubeflow, Dagster, or Temporal.
QA-related certification (e.g. ISTQB, Agile Testing).
Computer Engineering, Computer Science, or related technical degree/diploma or equivalent experience.
Any experience with tools like Tableau, Power BI, or Airflow UI will be an added advantage.

Soft skills:

Strong Problem-Solving Skills: Ability to work through complex IAM data and AI challenges independently, delivering solutions on critical timelines with minimal oversight.
Excellent Communication Skills: Must be comfortable engaging with both technical teams (engineers, platform teams) and non-technical IAM business stakeholders, translating business requirements into technical specifications.
Self-Starter / Fast Learner: Expected to rapidly onboard to the IAM domain (PAM, entitlements, PAI datamart, access certifications) without hand-holding.
Collaboration: Works cross-functionally with IAM analysts, platform teams, and business stakeholders across on-prem and cloud environments.