Daily Responsibilities:
- Triage overnight pipeline failures (Airflow DAGs, ETL jobs) and resolve data quality issues
- Write and optimize complex SQL across Trino, Snowflake, and SQL Server for IAM reporting
- Build and iterate on MCP server tools / agentic AI workflows — prompt tuning, RAG testing, LLM output validation
- Develop and maintain ETL/ELT pipelines (Airflow, Spark, DBT, Iceberg) for new and existing data feeds
- Translate IAM business rules (PAM, entitlements, password compliance) into queryable data models
- Collaborate with IAM analysts and stakeholders to clarify requirements and deliver data solutions
- Maintain Snowflake semantic layer views and Cortex AI features for dashboards and AI agents
- Run QA test cases — validate pipeline outputs, AI agent responses, and source-to-target accuracy
- Monitor AI governance controls — cost logs, guardrails, audit trails
- Review and merge GitHub PRs; maintain CI/CD workflows via GitHub Actions
- Update technical documentation (data dictionaries, runbooks, READMEs)
- Design, build, and optimize data models and semantic views in Snowflake, including Snowflake Cortex for AI-powered natural language querying and ML-assisted analytics.
- Build and maintain semantic layer definitions that expose IAM data domains in a consistent, reusable format for AI agents and dashboards.
- Design and implement production-ready agentic AI systems using MCP server architecture and LLM frameworks such as LangChain, Claude/Anthropic APIs, or equivalent tooling.
- Build RAG (Retrieval-Augmented Generation) pipelines with vector database integration (e.g., Pinecone, Chroma) and embedding-based retrieval strategies.
- Write and iterate on prompt engineering strategies — chain-of-thought patterns, tool use, structured output formatting, and context management at scale.
- Enforce AI governance: LLM cost controls, output monitoring, logging, guardrails, and audit trails to ensure responsible AI use within IAM.
- Integrate MCP servers and AI agents with existing Trino, SQL Server, and Snowflake data sources.
- Rapidly onboard to the IAM domain: PAM (Privileged Access Management), entitlement profiling, PAI datamart structures, password compliance, and access certification workflows.
- Translate complex IAM business rules and access governance logic into queryable data models and automated pipelines.
- Collaborate with IAM analysts and stakeholders to capture and formalize business requirements into technical specifications.
- Build and maintain QA frameworks and structured test cases for data pipelines, AI agent outputs, and SQL transformations.
- Validate data accuracy, completeness, and business rule compliance across source-to-target data flows; perform regression testing on pipeline or logic changes.
- Write complex analytical SQL across Trino, Snowflake, and SQL Server.
- Build and maintain ETL/ELT pipelines using Apache Airflow (DAG authoring, scheduling, dependency management), Apache Spark, and DBT.
- Work with Apache Iceberg and Hadoop-based data sources; understand distributed data processing patterns in hybrid on-prem and cloud environments (AWS/Azure).
- Use GitHub for version control, branch management, pull requests, and peer code review; author and maintain GitHub Actions CI/CD workflows.
- Produce and maintain clear technical documentation (README files, runbooks, data dictionaries).
What program/technology/software knowledge is essential for this role? Describe in what capacity the selected candidate will be using it:
- Python: Used as the baseline language for all AI and data tooling, including building pipelines, AI agents, automation scripts, and IAM business logic transformations.
- Advanced SQL (Snowflake, Trino, SQL Server): Used daily to write complex analytical queries across multiple IAM data sources and platforms; translating business rules into queryable data models.
- Snowflake + Cortex AI: Used to design and build semantic views and data models; Cortex specifically for AI-powered natural language querying and ML-assisted analytics on IAM data.
- MCP Server Architecture: Used to build and deploy production-grade agentic AI tools that connect LLMs to IAM data sources (Trino, Snowflake, SQL Server).
- LLM APIs (Claude/Anthropic, GPT): Used to design, implement, and govern AI agent systems; includes prompt engineering, cost monitoring, output validation, and audit logging.
- LangChain + RAG Pipelines + Vector DBs (Pinecone, Chroma): Used to build retrieval-augmented generation systems that allow AI agents to reason over IAM domain knowledge and documentation.
- Apache Airflow: Used to author, schedule, and manage ETL/ELT pipeline DAGs across IAM data domains.
- Apache Spark: Used for large-scale distributed data processing within ETL/ELT pipelines.
- DBT: Used for SQL-based data transformations and maintaining data models in the lakehouse layer.
- Apache Iceberg / Hadoop: Used to work with lakehouse-format data sources in hybrid on-prem and cloud environments.
- AWS / Azure: Used to deploy and manage data pipelines and AI workloads across hybrid cloud infrastructure.
- GitHub + GitHub Actions: Used for version control, branch/PR workflows, peer code review, and CI/CD automation of pipeline and AI tool deployments.
Must-have Skills/Experiences and/or Education, certifications, qualifications, designations:
- Python
- Advanced SQL — Snowflake, Trino, SQL Server
- Snowflake + Cortex AI (semantic views, ML-assisted analytics)
- MCP Server architecture (agentic AI tooling)
- LLM APIs — Claude/Anthropic, GPT or equivalent
- LangChain / RAG pipelines
- Prompt engineering (chain-of-thought, context management, guardrails)
- Apache Airflow (DAG authoring, scheduling)
- Apache Spark (ETL/ELT)
- DBT (data transformations)
- Apache Iceberg / Hadoop (lakehouse / distributed processing)
- AWS and/or Azure (hybrid cloud)
- GitHub + GitHub Actions (CI/CD, version control)
- Proficiency in Python (baseline for all AI/data tooling) and advanced SQL across at least two platforms (Snowflake, Trino, SQL Server).
- Hands-on experience with Snowflake SQL, Cortex AI features, and semantic view design.
- Practical experience building MCP servers or agentic AI frameworks, including LLM API integration (Claude/Anthropic, GPT, or equivalent).
- Experience building RAG pipelines with LangChain or similar orchestration frameworks and vector database integration (Pinecone, Chroma, or similar).
- Demonstrated ability to design effective prompts with context management, chain-of-thought patterns, and governance controls (cost monitoring, logging, guardrails).
- Familiarity with IAM/PAM domain concepts (privileged access, entitlements, access certifications).
- Experience writing structured test plans and executing test cases for data or software systems.
- Experience authoring Apache Airflow DAGs, Apache Spark jobs, and managing pipeline dependencies.
- Knowledge of modern data lakehouse tooling: Apache Iceberg, DBT, and/or Hadoop ecosystem (HDFS, Hive).
- Experience with hybrid cloud deployments (AWS and/or Azure) alongside on-prem infrastructure.
- Proficient with Git workflows, GitHub Actions CI/CD pipeline configuration, and code review practices.
- Ability to produce clear technical documentation and data dictionaries.
- Strong problem-solving skills and ability to deliver under critical timelines with minimal oversight.
- Excellent communication skills — comfortable engaging with both technical teams and IAM business stakeholders.
Nice-to-have Skills/Experience and/or Education, certifications, qualifications, designations:
- SailPoint IdentityIQ
- CyberArk / BeyondTrust
- Apache Kafka (streaming)
- SharePoint (reporting integration)
- Tableau BI
- Kubeflow / Dagster / Temporal
- Experience with Trino (distributed SQL query engine) — actively used in this environment.
- Exposure to SailPoint IdentityIQ or other IGA platforms.
- Familiarity with PAM tools such as CyberArk or BeyondTrust.
- Real-time streaming experience with Apache Kafka.
- Knowledge of SharePoint integration for reporting outputs.
- Prior work in a regulated financial services environment.
- GitHub portfolio of shipped AI/LLM projects or production RAG systems.
- Experience with additional orchestration tools: Kubeflow, Dagster, or Temporal.
- QA-related certification (e.g., ISTQB, Agile Testing).
- Computer Engineering, Computer Science, or related technical degree/diploma or equivalent experience.
- Experience with tools such as Tableau, Power BI, or the Airflow UI is an added advantage.
Soft skills:
- Strong Problem-Solving Skills: Ability to work through complex IAM data and AI challenges independently, delivering solutions under critical timelines with minimal oversight.
- Excellent Communication Skills: Must be comfortable engaging with both technical teams (engineers, platform teams) and non-technical IAM business stakeholders, and translating business requirements into technical specifications.
- Self-Starter / Fast Learner: Expected to rapidly onboard to the IAM domain (PAM, entitlements, PAI datamart, access certifications) with minimal hand-holding.
- Collaboration: Works cross-functionally with IAM analysts, platform teams, and business stakeholders across on-prem and cloud environments.