Search & Information Retrieval

For readers who already know the basics and want to go deep. The topic here is how models work in production: memory, routing, tools, agents. The technical side that few people explain well.

7 articles

410 XP total

📐 Search: what matters (precision, recall, NDCG)

Metrics: precision@k, recall@k, F1, MRR (Mean Reciprocal Rank), NDCG (Normalized Discounted Cumulative Gain). When the user wants top-1 vs. top-10. A golden set for evaluation.

⏱ 12 min·+50 XP
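The ranking metrics named in this card can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names and document ids are hypothetical, and NDCG here uses the linear-gain form (gain / log2(rank + 1)), while some libraries use 2^gain - 1 instead.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result (0.0 if none). MRR is the mean over queries."""
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, gains, k):
    """DCG of the ranking divided by the DCG of the ideal ordering (linear gain)."""
    dcg = sum(gains.get(d, 0) / math.log2(rank + 1)
              for rank, d in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical golden-set entry: graded relevance per document id for one query.
gains = {"d1": 3, "d2": 2, "d5": 1}
ranked = ["d3", "d1", "d7", "d2"]
print(precision_at_k(ranked, set(gains), 3))
print(reciprocal_rank(ranked, set(gains)))
print(ndcg_at_k(ranked, gains, 3))
```

Precision, recall and reciprocal rank only need a binary relevant set, whereas NDCG needs graded judgments, which is one reason a golden set usually stores a grade per (query, document) pair.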


🐘 Full-text search in Postgres: tsvector + GIN

tsvector (normalized tokens), tsquery, ranking (ts_rank, ts_rank_cd), GIN index (GIN vs GiST), weights (A/B/C/D), multilingual (Portuguese dictionary), fuzzy via pg_trgm.

⏱ 13 min·+55 XP


🔍 Elasticsearch/OpenSearch: when to move off Postgres

When Postgres FTS falls short: large scale (>10M docs), advanced relevance tuning, complex aggregations. Elasticsearch vs OpenSearch (the AWS fork created after the license change). Inverted index, shards, replicas.

⏱ 13 min·+55 XP


📏 BM25 and TF-IDF demystified

TF (term frequency) × IDF (inverse document frequency) = classic TF-IDF. BM25 refines it: TF saturation (the k1 parameter) and length normalization (the b parameter). Default in Elasticsearch, Lucene, tantivy.

⏱ 12 min·+50 XP
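To make the roles of k1 and b concrete, here is a small BM25 scorer. It is a sketch under stated assumptions: whitespace tokenization, the Lucene-style non-negative IDF `log(1 + (N - df + 0.5) / (df + 0.5))`, and illustrative defaults `k1=1.2`, `b=0.75`. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with BM25."""
    tokenized = [d.lower().split() for d in docs]  # assumption: naive tokenizer
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n or 1.0  # average document length

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # b scales the penalty for long documents; k1 caps the gain from repeats.
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "postgres full text search",
    "vector search with hnsw",
    "cooking pasta at home",
]
print(bm25_scores("postgres search", docs))
```

As the term frequency grows, the per-term contribution approaches `idf * (k1 + 1)` instead of growing without bound, which is the saturation that plain TF-IDF lacks.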


🧭 Vector search: HNSW, IVF, approximate indexes

1536-dimensional embeddings. Brute-force O(n) search is impractical at 10M vectors. HNSW (Hierarchical Navigable Small World): graph-based, 10 ms p99. IVF (inverted file): cluster-based. Postgres pgvector, Pinecone, Qdrant, Weaviate.

⏱ 14 min·+60 XP
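For reference, the brute-force baseline that HNSW and IVF approximate looks roughly like the sketch below: every query compares against every stored vector, hence O(n) per query. The names `cosine` and `exact_top_k` and the two-dimensional toy vectors are hypothetical; this is pure Python for clarity, not something to run at scale.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def exact_top_k(query, vectors, k):
    """Exact nearest neighbours: scans all n vectors, returns (score, id) pairs."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)

# Toy index; real embeddings would have far more dimensions.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for score, doc_id in exact_top_k([1.0, 0.1], vectors, k=2):
    print(doc_id, round(score, 3))
```

Approximate indexes trade a small amount of recall for skipping most of these comparisons, so an exact scan like this is still useful as ground truth when measuring an ANN index's recall.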


🎯 Hybrid search + reranking: RRF and cross-encoder

Hybrid: BM25 (keyword) + vector (semantic) combined. Fusion: RRF (Reciprocal Rank Fusion, simple and robust) or weighted. Cross-encoder reranker (Cohere Rerank, BGE-reranker) over the top 50. +30% NDCG is typical.

⏱ 13 min·+55 XP
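Reciprocal Rank Fusion is simple enough to show in full. The sketch below assumes each retriever returns an ordered list of document ids and uses the commonly cited constant `k=60`; the function name `rrf_fuse` and the ids are hypothetical.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Merge several ranked lists: each doc scores sum(1 / (k + rank)) over the lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25_hits = ["d1", "d2", "d3"]    # keyword retriever output
vector_hits = ["d3", "d1", "d4"]  # semantic retriever output
print(rrf_fuse([bm25_hits, vector_hits]))
# → ['d1', 'd3', 'd2', 'd4']
```

Because RRF uses only ranks, it needs no score normalization between BM25 and cosine similarity, which is much of its appeal compared with weighted score fusion. The fused list is then what a cross-encoder reranker would rescore.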


🏁 Capstone: multimodal search (text + filters + vector)

Build a search system for e-commerce products. BM25 (title/desc) + vector (semantic) + filters (price/category) + facet aggregation. Reranker for the top 20. Dataset of 100k+ items. Target: NDCG@10 > 0.8, latency < 200 ms p95.

⏱ 18 min·+85 XP

