AI/ML

AI training platform with Synthesia + Bedrock RAG + Lex/Transcribe

An AI-powered training platform delivering course content with Synthesia-generated video instructors, a RAG-backed Q&A chat on Amazon Bedrock, and conversational practice via Amazon Lex and Transcribe — built with Ryanair's training organisation.

Ryanair · Software Development Engineer (AWS) · Oct 2024 – Mar 2025
Training modules: Multi-format
Languages: Multi-lingual
Modalities: Video + chat + voice

Problem

Operational training inside a major airline is an unglamorous, expensive, never-ending workload. New staff need to be trained, current staff need to be re-certified, regulations change, and languages and locales vary across the workforce. Recording video courses with human instructors is slow, and the recordings are hard to update.

Ryanair wanted to test whether modern generative AI could change that pipeline: AI-generated instructors that could be regenerated when content changes, a chat assistant that knew the training material and could answer trainee questions, and conversational practice for procedures where you'd otherwise need a human role-playing partner.

Process

I joined as the AWS engineer driving the solution architecture. The platform had three modalities, each with its own engineering shape:

  1. Course delivery via Synthesia. Course material authored in markdown, fed to Synthesia's API to generate video lessons with selectable AI presenters and languages. Updating a course = re-rendering the affected lessons; no studio time.
  2. Q&A chat on Bedrock with RAG. A trainee could ask "what's the procedure for X?" mid-course; the chat assistant retrieved the relevant section from the course corpus and answered with a citation to it. Anti-hallucination guardrails were essential: this is training content, not a general-purpose chatbot.
  3. Voice practice via Lex + Transcribe. Specific scenarios (customer-service handling, safety announcements) require speaking practice. Amazon Lex modelled the dialogue, Transcribe handled speech-to-text, and we evaluated the trainee's response against a rubric.

A lot of the work was integration discipline rather than novel ML — getting the pieces talking with sane error handling, locale awareness threaded through, and content versioning so a trainee in the middle of a module didn't see a half-updated course.

Outcome

The platform ran across multiple training modules in multiple languages, combining all three modalities. Updating a course module became a matter of hours, not weeks. The Q&A chat answered with citations into the course material itself, which made trainers comfortable with what trainees were seeing. Voice practice covered the scenarios that previously needed paired human practice sessions.

For engineers: Technical Deep Dive

High-level architecture

Course markdown (versioned in S3)
   │
   ├─ Synthesia render pipeline → MP4 + captions per language
   │
   ├─ Embeddings generated → OpenSearch (RAG index)
   │
   └─ Bedrock guardrail config (course-specific allowed topics)

Trainee experience (React app):
   ├─ Video player (Synthesia output served from CloudFront)
   ├─ Q&A chat (Bedrock streaming + RAG)
   └─ Voice practice (Lex bot per scenario, Transcribe STT)

Per-trainee state:
   DynamoDB (progress, attempts, scores)
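Because a course update only re-renders the affected lessons, the render pipeline needs some way to tell which lessons changed between two versions of the markdown. A minimal sketch of one way to do that, assuming each lesson is a separate markdown string keyed by a lesson id (the ids, the content, and the hashing approach are illustrative assumptions, not the project's confirmed implementation):

```python
import hashlib


def lesson_hashes(lessons: dict[str, str]) -> dict[str, str]:
    """Map lesson id -> SHA-256 digest of its whitespace-trimmed markdown."""
    return {
        lesson_id: hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        for lesson_id, text in lessons.items()
    }


def lessons_to_render(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return ids of lessons that are new or whose content hash changed."""
    return sorted(
        lesson_id
        for lesson_id, digest in current.items()
        if previous.get(lesson_id) != digest
    )


# Hypothetical course content for two consecutive versions.
old = lesson_hashes({"intro": "# Welcome", "boarding": "# Boarding\nStep 1"})
new = lesson_hashes({
    "intro": "# Welcome",
    "boarding": "# Boarding\nStep 1 (revised)",
    "safety": "# Safety",
})
changed = lessons_to_render(old, new)
print(changed)  # ['boarding', 'safety']
```

Only the ids in `changed` would be submitted for a new render; the untouched `intro` video can be reused as-is.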

RAG with course versioning

The Q&A chat needed to answer from the course version a given trainee was on, not the latest one. We tagged every embedded chunk with (course_id, version, chunk_id), and the trainee's session pinned the version. When a trainer published a new version, in-progress trainees finished on their pinned version; new starts went to the latest.
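The pinning rule and the version-scoped retrieval can be sketched as follows. This is an illustration under assumptions: the `Session` shape, the `embedding` field name, and the example course id are hypothetical, and the query body follows OpenSearch's k-NN filter syntax as I understand it rather than the project's actual index mapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    course_id: str
    pinned_version: Optional[int] = None  # None until the trainee starts the course


def start_or_resume(session: Session, latest_version: int) -> Session:
    """Pin new starts to the latest version; leave in-progress sessions untouched."""
    if session.pinned_version is not None:
        return session
    return Session(session.course_id, latest_version)


def build_knn_query(session: Session, question_vector: list[float], k: int = 5) -> dict:
    """Build a k-NN search body restricted to the session's course and pinned version."""
    if session.pinned_version is None:
        raise ValueError("session must be pinned before retrieval")
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {  # assumed name of the vector field
                    "vector": question_vector,
                    "k": k,
                    "filter": {
                        "bool": {
                            "must": [
                                {"term": {"course_id": session.course_id}},
                                {"term": {"version": session.pinned_version}},
                            ]
                        }
                    },
                }
            }
        },
    }


# A trainee mid-course on version 3 stays there even though version 4 is published.
in_progress = start_or_resume(Session("cabin-safety", 3), latest_version=4)
new_start = start_or_resume(Session("cabin-safety"), latest_version=4)
query = build_knn_query(in_progress, [0.1, 0.2, 0.3])
```

Filtering inside the vector query, rather than after it, keeps chunks from other versions from crowding the top-k results.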

Citations were enforced. The system prompt required every answer to quote a specific chunk by ID, and the UI rendered the chunk inline as a "from your course material" block.
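A prompt instruction alone cannot guarantee a citation, so one plausible complement is a check on the model's answer before it reaches the UI. The sketch below assumes a `[chunk:ID]` marker format and a set of chunk ids returned by retrieval; both are illustrative assumptions, not details confirmed by the project.

```python
import re

# Assumed citation marker format, e.g. "[chunk:c-014]".
CITATION = re.compile(r"\[chunk:([A-Za-z0-9_-]+)\]")


def check_citations(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Return the cited chunk ids in order of first appearance.

    Raises ValueError if the answer has no citation, or cites a chunk
    that was not part of the retrieved context.
    """
    cited = CITATION.findall(answer)
    if not cited:
        raise ValueError("answer contains no citation")
    unknown = [chunk_id for chunk_id in cited if chunk_id not in retrieved_ids]
    if unknown:
        raise ValueError(f"answer cites chunks that were not retrieved: {unknown}")
    return list(dict.fromkeys(cited))  # de-duplicate while preserving order


retrieved = {"c-014", "c-015"}
ids = check_citations(
    "Close the door first [chunk:c-014], then confirm with the captain [chunk:c-015].",
    retrieved,
)
print(ids)  # ['c-014', 'c-015']
```

The returned ids are what the UI would need to render the matching chunks inline; a rejected answer could be retried or replaced with a fallback message.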

Voice practice with Lex

Amazon Lex modelled each practice scenario as an intent slot graph: prompt → expected response shape → branch. Transcribe converted the trainee's spoken response to text; Lex matched it (with intent/utterance flexibility), then handed off to Lambda for scoring.
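The hand-off to Lambda can be sketched as a fulfillment handler that reads the matched intent and the transcript from the Lex V2 event and returns a closing response. The event and response keys follow the Lex V2 Lambda format; the intent name, the scorer, and its pass criterion are hypothetical stand-ins for the real rubric scoring.

```python
from typing import Callable

Scorer = Callable[[str, str], tuple[bool, str]]


def make_handler(score: Scorer):
    """Build a Lex V2 fulfillment handler around a pluggable scoring function."""

    def handler(event: dict, context: object = None) -> dict:
        intent_name = event["sessionState"]["intent"]["name"]
        transcript = event.get("inputTranscript", "")
        passed, feedback = score(intent_name, transcript)
        return {
            "sessionState": {
                "dialogAction": {"type": "Close"},
                "intent": {
                    "name": intent_name,
                    "state": "Fulfilled" if passed else "Failed",
                },
            },
            "messages": [{"contentType": "PlainText", "content": feedback}],
        }

    return handler


def keyword_scorer(intent_name: str, transcript: str) -> tuple[bool, str]:
    """Stand-in scorer: passes if the trainee apologised for the delay."""
    passed = "apolog" in transcript.lower()
    return passed, "Good: you acknowledged the delay." if passed else "Try opening with an apology."


handler = make_handler(keyword_scorer)
sample_event = {
    "sessionState": {"intent": {"name": "DelayAnnouncement"}},  # hypothetical intent
    "inputTranscript": "We apologise for the delay to your flight.",
}
response = handler(sample_event)
```

Keeping the scorer injectable means the same handler shape works for every scenario bot, with only the scoring logic varying.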

Scoring used a rubric prompt against Bedrock — not a trained classifier — because the criteria varied per scenario and we needed quick iteration with the training team. Latency was acceptable because voice practice is a turn-taking experience, not real-time.
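Rubric-prompt scoring of this kind can be sketched as two small steps: render the scenario's criteria into a prompt, then parse and sanity-check the model's structured reply. The prompt wording, the JSON reply shape, and the `invoke_model` callable are assumptions for illustration; in practice the callable would wrap a Bedrock model call, and here it is a fake so the example runs offline.

```python
import json
from typing import Callable


def build_rubric_prompt(scenario: str, criteria: list[str], transcript: str) -> str:
    """Render a grading prompt listing numbered criteria for one scenario."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, start=1))
    return (
        f"You are grading a trainee's spoken response for the scenario: {scenario}\n"
        f"Criteria:\n{numbered}\n"
        f"Trainee response:\n{transcript}\n"
        'Reply with JSON only: {"met": [<criterion numbers met>], "feedback": "<one sentence>"}'
    )


def score_response(
    scenario: str,
    criteria: list[str],
    transcript: str,
    invoke_model: Callable[[str], str],
) -> dict:
    """Ask the model to grade against the rubric and validate what comes back."""
    if not criteria:
        raise ValueError("rubric needs at least one criterion")
    result = json.loads(invoke_model(build_rubric_prompt(scenario, criteria, transcript)))
    # Keep only criterion numbers that actually exist in this rubric.
    met = sorted({n for n in result["met"] if type(n) is int and 1 <= n <= len(criteria)})
    return {
        "met": met,
        "score": len(met) / len(criteria),
        "feedback": str(result.get("feedback", "")),
    }


def fake_model(prompt: str) -> str:
    """Offline stand-in for the model call; note the out-of-range criterion 9."""
    return '{"met": [1, 3, 9], "feedback": "Clear, but state the new departure time."}'


criteria = ["Apologises for the delay", "States the new departure time", "Stays calm and polite"]
result = score_response("Gate delay announcement", criteria, "Sorry for the delay, folks.", fake_model)
```

Because the rubric is just a list of strings, the training team can change criteria per scenario without any retraining, which is the iteration speed the paragraph above refers to.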

Multilingual considerations

Synthesia's multi-language presenters did most of the heavy lifting on output. The harder part was input: the chat and voice flows. We built per-language Bedrock prompts and Lex bots rather than translating everything into and out of English; quality was meaningfully better when the model worked in-language end-to-end.
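One way to keep the in-language rule honest is a per-language registry that fails loudly for an unsupported language instead of silently falling back to English. The languages, prompt text, and field names below are hypothetical; the source does not list which languages were supported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageConfig:
    system_prompt: str   # Bedrock system prompt written natively in this language
    lex_locale_id: str   # locale of the Lex bot built for this language


# Hypothetical registry; one entry per supported language.
CONFIGS = {
    "en": LanguageConfig(
        "Answer only from the course material provided, citing chunk ids.",
        "en_GB",
    ),
    "es": LanguageConfig(
        "Responde solo con el material del curso proporcionado, citando los identificadores de fragmento.",
        "es_ES",
    ),
}


def config_for(language: str) -> LanguageConfig:
    """Return the in-language configuration, refusing unsupported languages."""
    try:
        return CONFIGS[language]
    except KeyError:
        raise ValueError(f"no in-language configuration for {language!r}") from None


spanish = config_for("es")
```

Raising on a missing language makes a gap in coverage visible at configuration time rather than as a degraded, translated experience for the trainee.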

Trade-offs

  • No fine-tuning, again. Same reasoning as other Bedrock work: stronger RAG and guardrails beat fine-tuning for this kind of factual, citable use case.
  • Lex over a custom LLM-driven dialogue. We considered driving voice practice purely through an LLM. Lex won because the dialogue shape was structured and Lex's slot-fill model was easier to debug for the training team than freeform LLM transitions.
  • Synthesia vs in-house generation. We used Synthesia rather than building a video generation pipeline. The trade-off was cost and lock-in; the win was an order-of-magnitude faster delivery.