Document Intelligence

Document intelligence covers a range of tasks that require understanding large collections of unstructured text: contract analysis, research synthesis, compliance checking, due diligence, and more.

Use cases

Contract analysis - find clauses across hundreds of contracts that match a specific pattern or risk type. “Find all contracts with limitation of liability clauses that cap damages below $1M.”

Research synthesis - index thousands of research papers, query by concept, and surface the most relevant work for a literature review.

Compliance checking - given a new policy document, find all existing documents that may conflict with it or need to be updated for it.

Due diligence - index a target company’s documents during M&A and query for specific risk factors, financial terms, or obligations.

Email/communication search - find past communications relevant to a current situation, even when the exact words aren’t known.
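The contract analysis query above pairs a semantic condition (limitation of liability) with a numeric one (below $1M). Vector similarity can surface the right kind of clause, but it does not compare dollar amounts, so one option is to post-filter the retrieved clause text. The sketch below is a rough heuristic under that assumption; `parseDollarAmounts` and `mentionsCapBelow` are hypothetical helpers, not part of any SDK, and a dollar figure found in a clause is not guaranteed to be the cap itself.

```typescript
// Hypothetical helpers for illustration; not part of any SDK.
// Extracts dollar amounts such as "$500,000", "$1M", or "$2.5 million".
function parseDollarAmounts(text: string): number[] {
  const pattern = /\$\s?(\d[\d,]*(?:\.\d+)?)\s?(million|thousand|m|k)?\b/gi;
  const amounts: number[] = [];
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(text)) !== null) {
    const base = parseFloat(match[1].replace(/,/g, ""));
    const suffix = (match[2] || "").toLowerCase();
    const multiplier =
      suffix === "million" || suffix === "m"
        ? 1e6
        : suffix === "thousand" || suffix === "k"
          ? 1e3
          : 1;
    amounts.push(base * multiplier);
  }
  return amounts;
}

// True when any dollar amount in the clause is strictly below the limit.
function mentionsCapBelow(clauseText: string, limit: number): boolean {
  return parseDollarAmounts(clauseText).some((amount) => amount < limit);
}
```

Applied to the clause text returned by a semantic search, `mentionsCapBelow(text, 1_000_000)` narrows the candidates for human review; it is not a substitute for reading the clause.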

Implementation pattern

Document intelligence applications follow the same retrieval-augmented generation (RAG) pattern as other retrieval use cases, with heavier emphasis on chunking strategy and per-chunk metadata.

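import { SolVec } from "@veclabs/solvec";

// embed() and batchEmbed() are not defined in this snippet. They stand in for
// your embedding provider and must return 1536-dimensional vectors to match
// the collection created below.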

const sv = new SolVec({ network: "devnet" });
const collection = sv.collection("contracts", { dimensions: 1536 });

// Shape of the metadata stored with each chunk (see indexContracts below)
interface ContractChunk {
  contractId: string;
  contractName: string;
  party: string;
  effectiveDate: string;
  text: string;
  chunkIndex: number;
}

async function indexContracts(
  contracts: Array<{
    id: string;
    name: string;
    party: string;
    date: string;
    fullText: string;
  }>,
) {
  for (const contract of contracts) {
    const chunks = chunkByParagraph(contract.fullText);
    const embeddings = await batchEmbed(chunks);

    await collection.upsert(
      chunks.map((chunk, i) => ({
        id: `${contract.id}__p${i}`,
        values: embeddings[i],
        metadata: {
          contractId: contract.id,
          contractName: contract.name,
          party: contract.party,
          effectiveDate: contract.date,
          text: chunk,
          chunkIndex: i,
        },
      })),
    );
  }
}

// Find relevant clauses across all contracts
async function findClauses(query: string, topK = 20) {
  const embedding = await embed(query);

  const results = await collection.query({
    vector: embedding,
    topK,
    minScore: 0.78, // high threshold for legal documents
  });

  // Group by contract
  const byContract = new Map<string, typeof results>();
  results.forEach((r) => {
    const existing = byContract.get(r.metadata.contractId) || [];
    byContract.set(r.metadata.contractId, [...existing, r]);
  });

  return Array.from(byContract.entries()).map(([contractId, chunks]) => ({
    contractId,
    contractName: chunks[0].metadata.contractName,
    party: chunks[0].metadata.party,
    relevantClauses: chunks.map((c) => c.metadata.text),
    maxScore: Math.max(...chunks.map((c) => c.score)),
  }));
}

function chunkByParagraph(text: string): string[] {
  return text
    .split(/\n\n+/)
    .map((p) => p.trim())
    .filter((p) => p.length > 50); // skip very short paragraphs
}
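
chunkByParagraph above drops very short paragraphs but passes long ones through unchanged, so a multi-page clause would be embedded as a single vector. One possible refinement, sketched here as an assumption and not as the SDK's recommended approach, is to split oversized paragraphs into windows of whole sentences that overlap by one sentence. `chunkWithOverlap`, `maxChars`, and `minChars` are hypothetical names, and the sentence splitter is deliberately naive (it will break on abbreviations such as "Inc.").

```typescript
// Hypothetical helper for illustration; not part of the SolVec SDK.
// Paragraphs up to maxChars are kept whole. Longer paragraphs are split into
// windows of whole sentences, each sharing one sentence with the previous
// window. maxChars is a soft limit: a window may exceed it after the overlap.
function chunkWithOverlap(text: string, maxChars = 1200, minChars = 50): string[] {
  const chunks: string[] = [];

  for (const paragraph of text.split(/\n\n+/).map((p) => p.trim())) {
    if (paragraph.length <= maxChars) {
      // Same rule as chunkByParagraph: skip very short paragraphs.
      if (paragraph.length > minChars) chunks.push(paragraph);
      continue;
    }

    // Naive sentence split: break on whitespace that follows ., ! or ?
    const sentences = paragraph.split(/(?<=[.!?])\s+/);
    let current: string[] = [];

    for (const sentence of sentences) {
      const candidate = [...current, sentence].join(" ");
      if (candidate.length > maxChars && current.length > 0) {
        chunks.push(current.join(" "));
        // Carry the last sentence forward so context spans the boundary.
        current = [current[current.length - 1], sentence];
      } else {
        current.push(sentence);
      }
    }
    if (current.length > 0) chunks.push(current.join(" "));
  }

  return chunks;
}
```

The overlap costs some extra vectors but reduces the chance that a clause whose meaning depends on the preceding sentence is split across two unrelated chunks.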

Proof of analysis

For legal and compliance work, the on-chain Merkle proof matters. After indexing, call collection.verify() to create a timestamped, immutable record of exactly which documents were in your analysis index:

const proof = await collection.verify();
// proof.solanaExplorerUrl - share this as evidence of your analysis corpus
// proof.onChainRoot - the cryptographic fingerprint of all indexed documents

This lets you demonstrate, years later, exactly which documents were included in an analysis and that none were added or removed afterward.
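
To see why a single root can stand in for an entire corpus, the sketch below computes a Merkle root over a list of document identifiers using Node's built-in crypto module. It is illustrative only: `merkleRoot` is a hypothetical function, and the scheme VecLabs actually uses (leaf encoding, ordering, handling of odd levels) is not described in this page and may differ.

```typescript
import { createHash } from "node:crypto";

// Illustrative only; not the hashing scheme used by collection.verify().
function sha256(data: Buffer): Buffer {
  return createHash("sha256").update(data).digest();
}

// Hypothetical helper: folds a list of leaves into a single hex-encoded root.
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) throw new Error("cannot hash an empty corpus");

  // Sort first so the root depends on the set of leaves, not insertion order.
  let level = [...leaves].sort().map((leaf) => sha256(Buffer.from(leaf, "utf8")));

  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // When a level has an odd number of nodes, pair the last one with itself.
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      next.push(sha256(Buffer.concat([level[i], right])));
    }
    level = next;
  }
  return level[0].toString("hex");
}
```

Because every leaf feeds into the root, adding or removing a single document produces a different root, which is the property the recorded proof relies on.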
