AI-Native Engineering

Real-world RAG, agent patterns, MCP, LLMOps, evaluation: what separates an AI prototype from an AI-native system in production.

11 articles

1000 XP total

🧩 RAG: why "just throw everything at the LLM" doesn't work

Retrieval-Augmented Generation in practice: context limits, hallucination, the two-stage architecture (retrieve → generate), and when RAG beats fine-tuning.

⏱ 16 min · +80 XP

🔪 Chunking and Embeddings: the decisions that make or break your RAG

Fixed vs. semantic vs. recursive chunking, overlap, contextual retrieval, choosing an embedding model (OpenAI, Voyage, BGE), dimensionality reduction, cosine vs. dot product.

⏱ 17 min · +85 XP

🎯 Hybrid Search + Reranking: from BM25 to the cross-encoder

BM25 + vector search, reciprocal rank fusion, cross-encoder reranking (Cohere, Jina, Voyage), HyDE, query expansion: the production retrieval pipeline.

⏱ 18 min · +90 XP

📊 Evaluating RAG: recall@k, nDCG, and LLM-as-judge

Golden dataset, retrieval metrics (recall@k, MRR, nDCG), generation metrics (faithfulness, context relevance, answer relevance), RAGAS, LLM-as-judge without leakage.

⏱ 16 min · +80 XP

🤖 Agent Patterns: ReAct, Reflexion, and Tree of Thoughts

ReAct (think-act-observe), Reflexion (self-critique), Tree of Thoughts, Plan-and-Execute, Router: empirical patterns, with when each one works and when it breaks.

⏱ 18 min · +90 XP

🕸️ Multi-Agent Systems: orchestrator-worker, swarms, and handoffs

Orchestrator-worker, swarms with handoffs (OpenAI Swarm), CrewAI, hierarchies, when multi-agent pays off (and when it only adds cost).

⏱ 17 min · +85 XP

🧠 Context Engineering: prompt caching, subagents, and skills

Anthropic prompt caching, context window, compaction, subagent delegation, skills (Agent Skills), CLAUDE.md/AGENTS.md, context window budget.

⏱ 16 min · +80 XP

🔌 MCP Deep Dive: building a professional server

Model Context Protocol in depth: stdio vs. HTTP, tools/resources/prompts, authentication, rate limiting, logging, a real example in TypeScript and Python.

⏱ 19 min · +90 XP

🚀 LLM APIs in Production: streaming, structured output, batch, and cache

SSE streaming, tool use, structured output with JSON Schema/Zod, batch API (50% discount), prompt caching, retry with jitter, rate limit handling.

⏱ 16 min · +80 XP

📈 LLMOps: eval harness, drift detection, and prompt canaries

Eval harness (promptfoo, LangSmith, custom), prompt regression, canary/A-B testing of prompts, drift detection, cost attribution, quality SLOs.

⏱ 18 min · +90 XP

🏁 Capstone: production-grade RAG, end to end

Project: a complete RAG system with ingestion (chunking + embedding), store (Postgres pgvector or Pinecone), retrieval (hybrid BM25 + vector + reranker), eval harness (golden set + LLM judge), observability (Langfuse), and cost attribution. Deploy behind a feature flag for a canary rollout.

⏱ 45 min · +150 XP
