Your lab's AI-powered institutional memory. Ask natural-language questions about every experiment, route, dataset, and protocol your team has ever run.
February 2026 · Rasyn AI
Chemistry labs generate enormous amounts of data—retrosynthesis sessions, executed protocols, analytical datasets, accepted routes, sample inventories. But this knowledge is scattered across tools, spreadsheets, and the memories of individual researchers. When someone leaves, their institutional knowledge walks out the door.
Arcade changes this. It automatically indexes every piece of structured data your lab produces and makes it searchable through natural language. Ask “What were the highest-yielding Suzuki coupling conditions we've ever used?” and get a grounded answer with citations back to the original experiments.
No manual tagging. No data migration. Arcade ingests from your existing Rasyn workflows—sessions, routes, execute runs, datasets, and samples—and builds a unified knowledge graph that your entire team can query.
Architecture
1. The user asks a natural-language question.
2. Claude Haiku parses intent, keywords, numeric filters, and chemical references.
3. BM25 full-text search, vector similarity, and numeric range filters run in parallel.
4. Paragraph-level retrieval pulls evidence from protocol steps, observations, and notes.
5. Claude Sonnet synthesizes an answer grounded in the retrieved evidence.
6. Each claim is linked back to its source cards with a full provenance chain.
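The retrieval stage runs full-text, vector, and numeric-filter queries in parallel. The sketch below shows one way to express that fan-out with Python's `asyncio.gather`. The three retriever functions are hypothetical stubs that return fixed card IDs in place of real PostgreSQL queries, and treating the numeric filter as an allow-list over the text and vector hits is an assumption, not a documented detail of Arcade.

```python
import asyncio

# Hypothetical stand-ins for the three retrievers; each would query PostgreSQL in practice.
async def text_search(question: str) -> list[str]:
    await asyncio.sleep(0)  # yield to the event loop, as a real I/O call would
    return ["card-1", "card-2"]

async def vector_search(question: str) -> list[str]:
    await asyncio.sleep(0)
    return ["card-2", "card-3"]

async def numeric_filter(question: str) -> list[str]:
    await asyncio.sleep(0)
    return ["card-2", "card-3", "card-4"]  # cards whose metrics satisfy the parsed ranges

async def retrieve(question: str) -> list[str]:
    # Run all three retrievers concurrently and wait for every result.
    text_ids, vector_ids, allowed_ids = await asyncio.gather(
        text_search(question), vector_search(question), numeric_filter(question)
    )
    allowed = set(allowed_ids)
    merged: list[str] = []
    # Keep text/vector hits that also pass the numeric filter, without duplicates.
    for card_id in text_ids + vector_ids:
        if card_id in allowed and card_id not in merged:
            merged.append(card_id)
    return merged
```

Calling `asyncio.run(retrieve("Suzuki couplings with yield > 70%"))` with these stubs returns `card-2` and `card-3`. In a production system the numeric filter could just as well be a `WHERE` clause on the other two queries; running it separately here only makes the parallelism visible.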
Retrieval
Full-text search: PostgreSQL GIN indexes over weighted fields (title A, summary B, tags B, content C), with English stemming and stop-word removal.
Vector search: OpenAI text-embedding-3-small (1536 dimensions) with an HNSW cosine index, capturing semantic meaning beyond exact keyword matches.
Numeric filters: range queries on yield, conversion, purity, scale (g), temperature (°C), and time (min), with the values stored in structured JSONB and indexed for fast access.
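As a sketch of how a numeric range filter over JSONB might be assembled, the function below turns `{metric: (low, high)}` into a parameterized SQL fragment using psycopg-style `%s` placeholders. The key names in `ALLOWED_METRICS` and the `key_metrics` layout are hypothetical, since the page does not document the JSONB schema. Bounds here are inclusive.

```python
from __future__ import annotations

# Hypothetical key names inside arcade_cards.key_metrics; the real schema is not documented.
ALLOWED_METRICS = {"yield", "conversion", "purity", "scale_g", "temperature_c", "time_min"}

def build_range_filter(
    ranges: dict[str, tuple[float | None, float | None]],
) -> tuple[str, list[float]]:
    """Turn {metric: (low, high)} into a parameterized SQL WHERE fragment.

    Bounds are inclusive; None leaves that side open. Placeholders use the
    %s style accepted by psycopg.
    """
    clauses: list[str] = []
    params: list[float] = []
    for metric, (low, high) in sorted(ranges.items()):
        if metric not in ALLOWED_METRICS:
            raise ValueError(f"unknown metric: {metric!r}")
        # The key is whitelisted, so inlining it is safe; a literal key also keeps the
        # expression identical to an expression index built on the same cast.
        expr = f"(key_metrics->>'{metric}')::numeric"
        if low is not None:
            clauses.append(f"{expr} >= %s")
            params.append(low)
        if high is not None:
            clauses.append(f"{expr} <= %s")
            params.append(high)
    return (" AND ".join(clauses) or "TRUE"), params
```

Only the bound values are passed as parameters. Whether Arcade backs these comparisons with per-metric expression indexes such as `CREATE INDEX ON arcade_cards (((key_metrics->>'yield')::numeric))` or with some other mechanism is not stated on this page.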
Data Sources
Full retrosynthesis planning sessions with target molecules, models used, and discovered routes
Curated synthesis routes with step-by-step reactions, conditions, and scoring
Executed experiments with protocols, reagent lists, reaction conditions, and QC reports
Analytical results (HPLC, NMR) with QC metrics and instrument metadata
Physical samples tracked in inventory with lot/batch numbers and storage info
Features
Combines BM25 full-text search, vector similarity (OpenAI embeddings), and numeric range filters in a single query. Text and vector scores are blended 40/60, a weighting chosen to perform well on chemistry-domain queries.
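A minimal sketch of the 40/60 blend follows. The page gives only the weights. Min-max normalizing each score list before blending, and treating a card missing from one list as scoring 0 there, are assumptions. Vector scores are taken as cosine similarity; pgvector's `<=>` operator returns cosine distance, so similarity would be `1 - distance`.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a single or all-equal score list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {card_id: 1.0 for card_id in scores}
    return {card_id: (s - lo) / (hi - lo) for card_id, s in scores.items()}

def hybrid_rank(
    text_scores: dict[str, float],
    vector_scores: dict[str, float],
    text_weight: float = 0.4,
    vector_weight: float = 0.6,
) -> list[tuple[str, float]]:
    """Blend normalized text and vector scores; highest fused score first."""
    text_norm = min_max(text_scores)
    vector_norm = min_max(vector_scores)
    fused = {
        card_id: text_weight * text_norm.get(card_id, 0.0)
        + vector_weight * vector_norm.get(card_id, 0.0)
        for card_id in text_norm.keys() | vector_norm.keys()
    }
    # Sort by score descending, breaking ties by card ID for stable output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

With text scores `{"exp-1": 12.0, "exp-2": 4.0}` and vector scores `{"exp-2": 0.91, "exp-3": 0.62}`, the order is `exp-2`, `exp-1`, `exp-3`: a card that is strong only on the vector side outranks one that is strong only on text, because vectors carry 60% of the weight.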
Claude parses natural-language questions into structured query plans, recognizing chemical names, SMILES notation, reaction types, and percentage ranges automatically.
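Because the planner's output is consumed by code, it is worth validating before any query runs. This sketch assumes Claude Haiku is prompted to return JSON with `keywords`, `reaction_types`, `smiles`, and `ranges` fields. That schema, the `QueryPlan` class, and the `parse_plan` helper are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

@dataclass
class QueryPlan:
    """Hypothetical structured plan; Arcade's actual plan schema is not published."""
    keywords: list[str]
    reaction_types: list[str] = field(default_factory=list)
    smiles: list[str] = field(default_factory=list)
    ranges: dict[str, tuple[float | None, float | None]] = field(default_factory=dict)

def parse_plan(raw: str) -> QueryPlan:
    """Validate the planner's JSON output and convert it into a QueryPlan."""
    data = json.loads(raw)
    ranges: dict[str, tuple[float | None, float | None]] = {}
    for metric, bounds in data.get("ranges", {}).items():
        low, high = bounds.get("min"), bounds.get("max")
        # Reject contradictory bounds instead of silently returning no results.
        if low is not None and high is not None and low > high:
            raise ValueError(f"empty range for {metric}: {low} > {high}")
        ranges[metric] = (low, high)
    return QueryPlan(
        keywords=[str(k) for k in data.get("keywords", [])],
        reaction_types=[str(r) for r in data.get("reaction_types", [])],
        smiles=[str(s) for s in data.get("smiles", [])],
        ranges=ranges,
    )
```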
Every answer is grounded in your actual lab records. Citations link back to source cards with full audit trails to the original experiment, route, or dataset.
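Once answers use a fixed citation syntax, grounding can be checked mechanically. The page says Arcade enforces a citation format but does not specify it, so the `[card:<id>]` marker, the sentence-splitting rule, and `check_citations` below are all hypothetical. The check flags citations to cards that were never retrieved and sentences that cite nothing.

```python
import re

# Hypothetical inline citation marker, e.g. "[card:exp-17]"; the real format is not documented.
CITATION = re.compile(r"\[card:([A-Za-z0-9_-]+)\]")
# Naive sentence boundary: terminal punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def check_citations(answer: str, retrieved_ids: set[str]) -> dict[str, list[str]]:
    """Report cited IDs, citations to cards that were never retrieved, and uncited sentences."""
    cited = CITATION.findall(answer)
    sentences = [s for s in SENTENCE_END.split(answer.strip()) if s]
    return {
        "cited": cited,
        "unknown": sorted(set(cited) - retrieved_ids),
        "uncited_sentences": [s for s in sentences if not CITATION.search(s)],
    }
```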
Background hooks automatically index new sessions, accepted routes, executed experiments, and analytical datasets. No manual ETL required.
Numeric range queries on yield, conversion, purity, scale, temperature, and reaction time. Ask "experiments with yield > 70% and temp < 80°C" and get precise results.
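The page notes that when Claude is unavailable, query planning falls back to regex-based extraction. The sketch below shows how numeric conditions like the example above could be pulled out with Python's `re` module. The alias table and metric names are hypothetical, and only simple `metric operator number` phrasings are handled.

```python
import re

# Hypothetical mapping from words users type to metric keys.
ALIASES = {
    "yield": "yield", "conversion": "conversion", "purity": "purity",
    "temperature": "temperature_c", "temp": "temperature_c",
    "scale": "scale_g", "time": "time_min",
}

# Metric word, comparison operator, number; units such as % or °C are ignored.
CONDITION = re.compile(
    r"\b(yield|conversion|purity|temperature|temp|scale|time)\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

def extract_conditions(question: str) -> list[tuple[str, str, float]]:
    """Return (metric, operator, value) triples found in the question."""
    return [
        (ALIASES[word.lower()], op, float(value))
        for word, op, value in CONDITION.findall(question)
    ]
```

Keeping the operator rather than collapsing it into a bound preserves the difference between `> 70` and `>= 70`.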
Canonical SMILES indexing and Morgan fingerprint similarity enable structural queries. Find every experiment that has ever used a specific molecule or its close analogues.
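Morgan fingerprints are bit vectors, and similarity between them is most commonly measured with the Tanimoto coefficient: shared on-bits divided by total distinct on-bits. The page does not name the metric Arcade uses, so Tanimoto, the 0.7 default threshold, and the int-as-bitset representation below are assumptions. In practice the fingerprints would come from a cheminformatics toolkit such as RDKit.

```python
def fingerprint_from_bits(on_bits: list[int], n_bits: int = 1024) -> int:
    """Pack the indices of set bits into a Python int used as a bitset."""
    fp = 0
    for bit in on_bits:
        if not 0 <= bit < n_bits:
            raise ValueError(f"bit {bit} outside 0..{n_bits - 1}")
        fp |= 1 << bit
    return fp

def popcount(x: int) -> int:
    return bin(x).count("1")

def tanimoto(fp_a: int, fp_b: int) -> float:
    """|A ∩ B| / |A ∪ B| over set bits; two empty fingerprints score 0."""
    union = popcount(fp_a | fp_b)
    return popcount(fp_a & fp_b) / union if union else 0.0

def similar_molecules(
    query: int, library: dict[str, int], threshold: float = 0.7
) -> list[tuple[str, float]]:
    """Molecules at or above the threshold, most similar first."""
    scored = [(mol_id, tanimoto(query, fp)) for mol_id, fp in library.items()]
    return sorted((hit for hit in scored if hit[1] >= threshold), key=lambda hit: -hit[1])
```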
Under the Hood
Arcade uses PostgreSQL with the pgvector extension, giving us full-text search (GIN indexes), vector similarity (HNSW with cosine distance), and structured JSONB queries in a single database. Six core tables handle the data model:
arcade_cards: Primary search documents. One card per canonical entity, with title, summary, content, embedding (1536-dim), key_metrics (JSONB), tags, and molecule references.
arcade_chunks: Paragraph-level evidence chunks for deep retrieval. Protocol steps are grouped in chunks of 3–5, each with its own embedding.
arcade_molecules: Canonical molecule entries with SMILES, InChIKey, computed properties, and Morgan fingerprints (1024-bit) for structural similarity.
arcade_events: Full audit log of every ingest, update, delete, reindex, and embed operation, with timing and error tracking.
arcade_interactions: User behavior tracking (views, clicks, pins, copies) for future ranking improvements.
arcade_conversations: Persistent multi-turn chat history with linked source cards for provenance.
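arcade_chunks groups protocol steps 3 to 5 at a time. One way to get chunk sizes in that range is to take the fewest chunks of at most 5 steps and spread the steps evenly across them. For any protocol with 3 or more steps, this keeps every chunk between 3 and 5 steps. The even-split strategy and the `chunk_steps` helper are assumptions, not Arcade's documented method.

```python
import math

def chunk_steps(steps: list[str], max_size: int = 5) -> list[list[str]]:
    """Split steps into the fewest chunks of at most max_size, sized as evenly as possible."""
    if not steps:
        return []
    n_chunks = math.ceil(len(steps) / max_size)
    base, extra = divmod(len(steps), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks take one more step
        chunks.append(steps[start:start + size])
        start += size
    return chunks
```

An 11-step protocol becomes chunks of 4, 4, and 3 steps. A naive "take 5 at a time" split would instead leave a 1-step tail with too little context to embed well.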
Arcade uses three AI models, each chosen for a specific role in the pipeline:
Query Planning: Claude Haiku
Parses natural language into structured query plans. Fast and cheap enough for high-frequency calls.
Embeddings: text-embedding-3-small
1536-dimension vectors at $0.02 per 1M tokens. Inputs are sent in batches of 100, each truncated to 30K characters.
Answer Generation: Claude Sonnet
Synthesizes grounded answers from retrieved evidence, with an enforced citation format and claims limited to what the evidence supports.
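The embedding settings translate into a simple batching loop. In this sketch `embed_fn` is a hypothetical callable that takes a list of strings and returns one vector per string; with the OpenAI Python SDK it would wrap `client.embeddings.create(model="text-embedding-3-small", input=batch)`. Truncating by characters is a cheap proxy for the model's token limit. The cost helper simply applies the $0.02 per 1M tokens rate quoted above.

```python
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 100            # inputs per embeddings request
MAX_CHARS = 30_000          # per-input character cap applied before sending
PRICE_PER_M_TOKENS = 0.02   # USD per 1M tokens for text-embedding-3-small

def batches(texts: Sequence[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive batches, truncating each text to MAX_CHARS."""
    for start in range(0, len(texts), size):
        yield [text[:MAX_CHARS] for text in texts[start:start + size]]

def embed_all(
    texts: Sequence[str], embed_fn: Callable[[list[str]], list[list[float]]]
) -> list[list[float]]:
    """Embed every text, preserving input order."""
    vectors: list[list[float]] = []
    for batch in batches(texts):
        result = embed_fn(batch)
        if len(result) != len(batch):
            raise RuntimeError("embedding service returned the wrong number of vectors")
        vectors.extend(result)
    return vectors

def embedding_cost_usd(total_tokens: int) -> float:
    """Estimated cost of embedding total_tokens at the listed rate."""
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS
```

For example, 250 cards produce three requests (100, 100, and 50 inputs), and embedding 5M tokens costs about $0.10.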
Arcade is designed to work even when external services are unavailable. If OpenAI's embedding API is down, search falls back to BM25-only text matching. If Claude is unavailable, query planning uses regex-based keyword extraction and answers return raw search results instead of synthesized responses. Ingestion continues even if individual cards fail to embed, and cards are flagged for re-embedding once the service recovers.
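The degradation rules above can be sketched as two small functions. `EmbeddingUnavailable`, `search`, and `ingest` are hypothetical names, and the retrievers are passed in as callables so the fallback logic stands alone. Hybrid results are merged here by simple de-duplicated concatenation; a real implementation would blend scores instead.

```python
from typing import Callable

class EmbeddingUnavailable(Exception):
    """Hypothetical error raised when the embedding API cannot be reached."""

def search(
    question: str,
    text_search: Callable[[str], list[str]],
    embed_query: Callable[[str], list[float]],
    vector_search: Callable[[list[float]], list[str]],
) -> tuple[list[str], str]:
    """Return (card IDs, mode); degrade to text-only when embeddings fail."""
    text_hits = text_search(question)
    try:
        query_vector = embed_query(question)
    except EmbeddingUnavailable:
        return text_hits, "text_only"
    # De-duplicate while keeping first-seen order.
    merged = list(dict.fromkeys(text_hits + vector_search(query_vector)))
    return merged, "hybrid"

def ingest(cards: list[dict], embed_fn: Callable[[str], list[float]]) -> list[str]:
    """Store cards even if embedding fails; return IDs flagged for re-embedding."""
    flagged: list[str] = []
    for card in cards:
        try:
            card["embedding"] = embed_fn(card["content"])
        except EmbeddingUnavailable:
            card["embedding"] = None  # card stays searchable via full-text
            flagged.append(card["id"])
    return flagged
```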
Arcade is available now on all Rasyn plans. Every experiment you run, every route you accept, every dataset you upload is automatically indexed and searchable.