A LLM semantic caching system aiming to enhance user experience by reducing response time via cached query-result pairs.
-
Updated
Jun 30, 2025 - Python
A LLM semantic caching system aiming to enhance user experience by reducing response time via cached query-result pairs.
Redis Vector Library (RedisVL) -- the AI-native Python client for Redis.
Unified AI Gateway for 30+ LLMs (OpenAI, Anthropic, Bedrock, Azure etc) with Caching, Guardrails, A/B test & cost controls. Go-native Fastest & Scalable AI Gateway LiteLLM & Kong AI Gateway alternative.
mimir is a drop-in proxy that caches LLM API responses using semantic similarity, reducing costs and latency for repeated or similar queries.
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.
Reliable and Efficient Semantic Prompt Caching with vCache
RAMen is a fast in-memory data store like Redis, but built for AI: drop-in Redis protocol, native vector search, semantic caching, and a built-in MCP server for agents. Single Go binary, BSD-3.
Redis integration for Google Agent Development Kit (ADK) - Memory, Sessions, Search Tools, MCP
Unified multi-layer caching library for AI/agent pipelines — LangChain, LangGraph, AutoGen, CrewAI, Agno, A2A
Redis Vector Library (RedisVL) -- the AI-native Java client for Redis.
Enterprise AI traffic gateway — unified compliance, routing across 20+ LLM providers, semantic cache, quotas, and audit. SDK / network / OS-layer intercept.
This is a RAG based chatbot in which semantic cache and guardrails have been incorporated.
This repository contains sample code demonstrating how to implement a verified semantic cache using Amazon Bedrock Knowledge Bases to prevent hallucinations in Large Language Model (LLM) responses while improving latency and reducing costs.
ToolOps is a framework-agnostic middleware SDK that treats every tool call as a first-class operation. By wrapping your tools in a single decorator, you instantly upgrade them with industrial-grade caching, resilience, and observability.
High-performance LLM query cache with semantic search. Reduce API costs 80% and latency from 8.5s to 1ms using Redis + Qdrant vector DB. Multi-provider support (OpenAI, Anthropic).
Local-first semantic cache for AI agents. A small C daemon + CLI that remembers what your agent learned across sessions. Plugs into Claude Code, Codex, Gemini CLI, and Claude Desktop / ChatGPT via MCP. No LLM calls, no SaaS, no API key.
RouterArena #1 among known public baselines: 96.77% accuracy, $0.0768/1K, 1.0000 robustness. OpenAI-compatible LLM router across 47+ providers.
AI real-estate automation platform: Telegram bot, RAG, apartment search, CRM workflows, voice agent, Langfuse observability, and Dockerized AI runtime.
Enhance LLM retrieval performance with Azure Cosmos DB Semantic Cache. Learn how to integrate and optimize caching strategies in real-world web applications.
Add a description, image, and links to the semantic-cache topic page so that developers can more easily learn about it.
To associate your repository with the semantic-cache topic, visit your repo's landing page and select "manage topics."