Benchmarking

Preparing for V1 Release

As we prepare Spykio for the V1 release, we're conducting extensive benchmarking to assess and optimize its performance in long-context scenarios. This page presents our current benchmarking efforts and will be continuously updated with new results.

We're exploring how Spykio performs on several long-context LLM benchmarks, including:

LongBench v2 - Assesses the ability of LLMs to handle long-context problems that require deep understanding and reasoning across a range of realistic tasks.
OpenAI MRCR (Multi-Round Co-reference Resolution) - Tests whether a model can locate and reproduce the correct one of several similar items hidden in a long multi-turn conversation.
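To make MRCR scoring concrete, here is a minimal Python sketch. It assumes the grading scheme shown on the public openai/mrcr dataset card: the prompt asks the model to begin its answer with a random prefix string, any answer missing that prefix scores 0, and otherwise the remainder is compared with the reference using a string-similarity ratio. The function name `grade_mrcr` is ours and the sketch is illustrative only; the dataset card remains the authoritative grader.

```python
from difflib import SequenceMatcher


def grade_mrcr(response: str, answer: str, prefix: str) -> float:
    """Score one MRCR response against its reference answer (sketch).

    Assumed scheme: the reply must start with `prefix`; if it does not, the
    score is 0. Otherwise the prefix is stripped from both strings and the
    score is difflib's similarity ratio, a float in [0, 1].
    """
    if not response.startswith(prefix):
        return 0.0
    response = response.removeprefix(prefix)  # requires Python 3.9+
    answer = answer.removeprefix(prefix)
    return SequenceMatcher(None, response, answer).ratio()


if __name__ == "__main__":
    ref = "a1b2Roses bloom where the river bends."
    print(grade_mrcr("a1b2Roses bloom where the river bends.", ref, "a1b2"))  # 1.0
    print(grade_mrcr("Roses bloom where the river bends.", ref, "a1b2"))      # 0.0
```

The prefix check penalizes a model that finds the right passage but ignores the formatting instruction, while the similarity ratio gives partial credit for near-verbatim reproductions.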

We will publish comprehensive results here, comparing Spykio against standard vector database solutions such as Weaviate and Pinecone, as well as general long-context models like Gemini, OpenAI GPT-4o, Llama 4 Scout, and others.
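Comparisons like these are only meaningful if every system answers the same queries and is scored the same way. The sketch below shows one way to structure that. It assumes a hypothetical `Retriever` type (a function from a query to ranked document IDs) and assumes "correct document retrieval" means the gold document is ranked first; neither is taken from Spykio's actual test harness.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a retriever maps a query string to document IDs
# ranked from most to least relevant.
Retriever = Callable[[str], List[str]]


def top1_accuracy(retriever: Retriever, queries: List[Tuple[str, str]]) -> float:
    """Fraction of (query, gold_doc_id) pairs whose top-ranked result is the gold doc."""
    if not queries:
        raise ValueError("query set is empty")
    hits = 0
    for query, gold_doc in queries:
        ranked = retriever(query)
        if ranked and ranked[0] == gold_doc:
            hits += 1
    return hits / len(queries)


def compare(systems: Dict[str, Retriever], queries: List[Tuple[str, str]]) -> Dict[str, float]:
    """Run the identical query set against every system so scores are comparable."""
    return {name: top1_accuracy(fn, queries) for name, fn in systems.items()}


if __name__ == "__main__":
    # Toy stand-ins; real adapters would wrap each system's own client library.
    queries = [("q1", "doc_a"), ("q2", "doc_b"), ("q3", "doc_c"), ("q4", "doc_d")]
    lookup = {"q1": ["doc_a"], "q2": ["doc_b"], "q3": ["doc_x", "doc_c"], "q4": ["doc_d"]}
    systems = {
        "system_a": lambda q: lookup[q],
        "system_b": lambda q: ["doc_a"],
    }
    print(compare(systems, queries))  # {'system_a': 0.75, 'system_b': 0.25}
```

Keeping the scoring logic outside the adapters means a vector database and a long-context model can be swapped in without changing how "correct" is defined.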

Current Benchmark Results

We've conducted initial tests using a real-world dataset to evaluate Spykio's retrieval capabilities:

Kaggle PDF Dataset Benchmark

Dataset: Dataset of PDF Files (Kaggle)

Test setup: 800 PDF files, 80 queries

Performance Results

Standard mode: 83% correct document retrieval
Accurate mode: 94% correct document retrieval
Paragraph precision (Accurate mode): 92% hit rate on relevant paragraphs
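The paragraph figure is labeled a precision but described as a hit rate, and the two metrics can differ considerably. The sketch below computes both under assumed definitions: hit rate is the share of queries with at least one relevant paragraph retrieved, and precision is the share of all retrieved paragraphs that are relevant. These definitions and the function names are illustrative and are not necessarily how the 92% figure was computed.

```python
from typing import List, Set


def paragraph_hit_rate(retrieved: List[List[str]], relevant: List[Set[str]]) -> float:
    """Share of queries for which at least one retrieved paragraph is relevant."""
    if not retrieved or len(retrieved) != len(relevant):
        raise ValueError("inputs must be non-empty and aligned by query")
    hits = sum(1 for got, gold in zip(retrieved, relevant) if gold.intersection(got))
    return hits / len(retrieved)


def paragraph_precision(retrieved: List[List[str]], relevant: List[Set[str]]) -> float:
    """Micro-averaged precision: relevant retrieved paragraphs / all retrieved paragraphs."""
    if len(retrieved) != len(relevant):
        raise ValueError("inputs must be aligned by query")
    total = sum(len(got) for got in retrieved)
    if total == 0:
        return 0.0
    correct = sum(1 for got, gold in zip(retrieved, relevant) for p in got if p in gold)
    return correct / total


if __name__ == "__main__":
    retrieved = [["p1", "p7"], ["p3"], ["p9", "p2"]]  # paragraph IDs returned per query
    relevant = [{"p1"}, {"p4"}, {"p2", "p5"}]         # gold paragraph IDs per query
    print(round(paragraph_hit_rate(retrieved, relevant), 3))   # 0.667
    print(round(paragraph_precision(retrieved, relevant), 3))  # 0.4
```

On this toy input the hit rate is about 0.67 while precision is 0.4, which is why it matters which of the two a reported paragraph score refers to.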

Note: With only 80 queries, the current sample is small, so these results are preliminary. We will expand them with more comprehensive testing.
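To put the small sample in perspective, a Wilson score interval gives a rough sense of the uncertainty around each retrieval rate. This minimal sketch assumes each of the 80 queries is an independent pass/fail trial and uses z = 1.96 for an approximate 95% interval.

```python
import math
from typing import Tuple


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95% confidence)."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for label, rate in [("Standard", 0.83), ("Accurate", 0.94)]:
        lo, hi = wilson_interval(rate, n=80)
        print(f"{label}: {rate:.0%} observed, 95% CI {lo:.0%} to {hi:.0%}")
```

Run as-is, it prints intervals of roughly 73% to 90% (Standard) and 87% to 97% (Accurate), each more than ten percentage points wide. Comparing the two modes directly would call for a paired test such as McNemar's, since both run on the same queries.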

Ongoing Improvements

We are continuing to improve the Spykio engine's performance across a range of retrieval scenarios. Our team is currently working on:

Optimizing the knowledge graph generation for better contextual understanding
Improving the query processing pipeline
Enhancing document chunking strategies for more accurate retrieval
Expanding our test suite to cover more diverse document types and query patterns
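As an illustration of what a document chunking strategy involves, here is a minimal paragraph-based chunker with overlap. It is a generic technique, not Spykio's actual chunking, and `chunk_paragraphs`, `max_chars`, and `overlap` are hypothetical names chosen for the example.

```python
import re
from typing import List


def chunk_paragraphs(text: str, max_chars: int = 1000, overlap: int = 1) -> List[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    Consecutive chunks share up to `overlap` trailing paragraphs, so text near
    a boundary appears in both chunks. A single paragraph longer than
    max_chars is kept whole as its own oversized chunk rather than split.
    """
    if max_chars <= 0 or overlap < 0:
        raise ValueError("max_chars must be positive and overlap non-negative")
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry trailing paragraphs forward, dropping the oldest ones
            # until the carried context plus the new paragraph fits.
            carry = current[-overlap:] if overlap else []
            while carry and len("\n\n".join(carry + [para])) > max_chars:
                carry = carry[1:]
            current = carry
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    doc = "Alpha para.\n\nBeta para.\n\nGamma para."
    for chunk in chunk_paragraphs(doc, max_chars=25, overlap=1):
        print(repr(chunk))
    # 'Alpha para.\n\nBeta para.'
    # 'Beta para.\n\nGamma para.'
```

Splitting on paragraph boundaries keeps each chunk semantically coherent, and the overlap means a question whose answer straddles two paragraphs can still match a single chunk.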

Join Our Testing Effort

We invite you to test Spykio with your own data and use cases. Please contact us if you'd like additional credits for more advanced testing. Your feedback is invaluable as we refine our solution.

Future Benchmarks

Beyond our current tests, we plan to evaluate Spykio on further benchmarks, including:

Domain-specific legal document retrieval accuracy
Multi-lingual document understanding
Cross-document reference resolution
Time-based information retrieval performance

Check back regularly for updated benchmark results as we continue testing toward the V1 release.