🐢 Open-Source Evaluation & Testing library for LLM Agents
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
RAG evaluation without the need for "golden answers"
Dataset and benchmark for RAG on company internal documents.
Red-teaming Python framework for testing chatbots and GenAI systems.
Convert and validate your Markdown, then choose the best chunking strategy for your RAG pipeline.
RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo sessioning, Celery ingestion pipeline, Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs).
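The Hit-Rate and MRR metrics named in the evaluation suite above can be computed directly from ranked retrieval results. A minimal sketch, assuming each query maps to a ranked list of document ids and a set of relevant ids (illustrative only, not this repository's API):

```python
# Illustrative Hit-Rate and MRR over ranked retrieval results (not this repo's API).

def hit_rate(results, relevant, k=10):
    """Fraction of queries whose top-k results contain at least one relevant document."""
    hits = sum(
        any(doc_id in relevant[q] for doc_id in ranked[:k])
        for q, ranked in results.items()
    )
    return hits / len(results)

def mrr(results, relevant, k=10):
    """Mean Reciprocal Rank: average over queries of 1 / rank of the first relevant hit."""
    total = 0.0
    for q, ranked in results.items():
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(results)

# results: query -> ranked doc ids; relevant: query -> set of relevant doc ids
results = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d2"]}
relevant = {"q1": {"d1"}, "q2": {"d4"}}
print(hit_rate(results, relevant, k=3), mrr(results, relevant, k=3))  # 0.5 0.25
```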
Lightweight RAG provenance middleware. Verifies every claim in an LLM response is grounded in a retrieved source - without an LLM call.
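One way such an LLM-free grounding check can work is simple lexical overlap between each claim and the retrieved sources; the tokenization and threshold below are hypothetical assumptions, not this middleware's actual method:

```python
# Hypothetical LLM-free grounding check: a claim counts as grounded when enough of its
# content tokens appear in at least one retrieved source. The stopword list and threshold
# are illustrative assumptions, not this project's implementation.
import re

STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "to", "and", "that", "for"}

def content_tokens(text):
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def is_grounded(claim, sources, threshold=0.7):
    """Return True if at least `threshold` of the claim's content tokens occur in one source."""
    claim_toks = content_tokens(claim)
    if not claim_toks:
        return True  # nothing substantive to verify
    return any(
        len(claim_toks & content_tokens(src)) / len(claim_toks) >= threshold
        for src in sources
    )

sources = ["Acme's Q3 revenue grew 12% year over year to $4.2M."]
print(is_grounded("Revenue grew 12% in Q3.", sources))          # True
print(is_grounded("Revenue declined sharply in Q4.", sources))  # False
```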
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.
Open-source framework for evaluating AI agents.
Evaluation Framework for LLM applications in Java and Kotlin
smallevals — offline retrieval evaluation for RAG systems with tiny QA models; fast on CPU, blazing fast on GPU.
The E2E AI testing tool | No ML Overhead
Compares different Retrieval-Augmented Generation (RAG) frameworks in terms of speed and performance.
A framework for systematic evaluation of retrieval strategies and prompt engineering in RAG systems, featuring an interactive chat interface for document analysis.
Learn Retrieval-Augmented Generation (RAG) from scratch using LLMs from Hugging Face, with LangChain or plain Python.
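The core loop such a from-scratch tutorial walks through fits in a few lines: score chunks against the question, then assemble the top retrieved chunks into a prompt for the LLM. The scoring, chunks, and prompt template below are hypothetical stand-ins, not the tutorial's code:

```python
# Toy from-scratch retrieval step: bag-of-words cosine ranking plus prompt assembly.
# Chunks, question, and prompt template are made-up examples for illustration.
import math
import re
from collections import Counter

def tokens(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question under bag-of-words cosine."""
    q = tokens(question)
    return [c for _, c in sorted(((cosine(q, tokens(c)), c) for c in chunks), reverse=True)[:k]]

chunks = [
    "The warranty covers manufacturing defects for two years.",
    "Shipping usually takes three to five business days.",
    "Returns are accepted within 30 days of purchase.",
]
question = "How long does the warranty cover defects?"
context = "\n".join(retrieve(question, chunks))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)  # in a real pipeline this prompt would be sent to a Hugging Face or other LLM
```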
RAG Chatbot for Financial Analysis