Benchmarking
Preparing for V1 Release
As we prepare Spykio for the V1 release, we're conducting extensive benchmarking to assess and optimize its performance in long-context scenarios. This page presents our current benchmarking efforts and will be continuously updated with new results.
We're exploring how Spykio performs on several long-context LLM benchmarks, including:
- LongBench v2 - Designed to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world multitasks.
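- OpenAI MRCR - Multi-Round Co-reference Resolution, a benchmark that tests whether a model can tell apart several similar items ("needles") hidden in a long multi-turn conversation and retrieve the one that was asked for.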
We will publish comprehensive results here, comparing Spykio against vector database solutions such as Weaviate and Pinecone, as well as long-context models such as Gemini, OpenAI GPT-4o, and Llama 4 Scout.
Current Benchmark Results
We've conducted initial tests using a real-world dataset to evaluate Spykio's retrieval capabilities:
Kaggle PDF Dataset Benchmark
Dataset: Dataset of PDF Files (Kaggle)
Test setup: 800 PDF files, 80 queries
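This page does not state which metrics are used to score the queries. As an illustration only, the sketch below shows one common way to score a retrieval run of this shape (a set of queries, each with known relevant files), assuming recall@k and mean reciprocal rank (MRR). The function names, file names, and data layout are hypothetical and are not part of Spykio's API or of the actual test harness.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    # Use a set so a document returned twice is not counted twice.
    hits = set(ranked[:k]) & relevant
    return len(hits) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    results: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Average recall@k and MRR over every query that has gold labels."""
    if not gold:
        return {f"recall@{k}": 0.0, "mrr": 0.0}
    recalls = []
    rranks = []
    for query_id, relevant in gold.items():
        # A query with no returned results scores 0 on both metrics.
        ranked = results.get(query_id, [])
        recalls.append(recall_at_k(ranked, relevant, k))
        rranks.append(reciprocal_rank(ranked, relevant))
    n = len(gold)
    return {f"recall@{k}": sum(recalls) / n, "mrr": sum(rranks) / n}


if __name__ == "__main__":
    # Hypothetical ranked results and gold labels for two queries.
    results = {
        "q1": ["a.pdf", "b.pdf", "c.pdf"],
        "q2": ["x.pdf", "y.pdf"],
    }
    gold = {"q1": {"b.pdf"}, "q2": {"z.pdf"}}
    print(evaluate(results, gold, k=2))  # {'recall@2': 0.5, 'mrr': 0.25}
```

With only 80 queries, a single query changes either average by 1.25 percentage points, which is one reason small differences between systems should be read cautiously at this sample size.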
Performance Results
Note: The current sample size (80 queries over 800 files) is small. These results are preliminary, and we will extend them with more comprehensive testing.
Ongoing Improvements
We are continuously improving the Spykio engine to enhance its performance across various retrieval scenarios. Our team is working on:
- Optimizing the knowledge graph generation for better contextual understanding
- Improving the query processing pipeline
- Enhancing document chunking strategies for more accurate retrieval
- Expanding our test suite to cover more diverse document types and query patterns
Join Our Testing Effort
We invite you to test Spykio with your own data and use cases. Please contact us if you'd like additional credits for more advanced testing. Your feedback is invaluable as we refine our solution.
Future Benchmarks
Beyond our current testing, we plan to evaluate Spykio in further areas, including:
- Domain-specific legal document retrieval accuracy
- Multi-lingual document understanding
- Cross-document reference resolution
- Time-based information retrieval performance
Check back regularly for updated benchmark results as testing continues toward the V1 release.