OKF Knowledge Base Migration Sprint

Migrate your support KB, Confluence, or Notion wiki into a conformant Open Knowledge Format bundle. A time-boxed migration sprint with editorial review.

Last updated 2026-06-26T00:00:00.000Z. Independent resource, not affiliated with Google.

We are an independent implementation agency for the Open Knowledge Format (OKF), the open specification Google Cloud published on 12 June 2026. We are not affiliated with or endorsed by Google. The migration sprint takes a knowledge base that is trapped inside a tool and turns it into a portable, agent-readable OKF bundle.

Who this is for

This sprint is for teams whose institutional knowledge already exists but lives in the wrong shape. Typically that means:

  • A support knowledge base that customers and agents read but machines cannot parse cleanly.
  • An internal wiki in Confluence, Notion, or SharePoint that has grown sprawling and duplicated.
  • A docs site that is good for humans but produces noisy results when fed to a retrieval pipeline.
  • A data catalogue whose descriptions are scattered and inconsistent.

If you want that knowledge to be usable by AI agents, portable between tools, and version-controlled, this is the starting point. If you are not sure your content is ready, begin with an OKF readiness audit.

The problem with tool-locked knowledge bases

Most knowledge bases are hostages to the platform they were written in.

  • Not portable. Export options are lossy. Your content is entangled with a vendor’s database, macros, and proprietary blocks.
  • Not diffable. You cannot see what changed, when, or why. There is no clean version history at the content level.
  • Agent-hostile. HTML wrappers, navigation chrome, and inconsistent structure make raw pages poor source material for retrieval. Agents ingest noise.
  • Duplicated. The same answer exists in four slightly different pages, and nobody knows which one is canonical.

An OKF bundle fixes the shape: a directory of UTF-8 Markdown files, each with YAML frontmatter and a required type, curated and stored in git. Portable, diffable, typed, and readable by both people and agents.

The migration approach

We run the sprint in a fixed sequence so scope stays controlled and quality stays high.

  1. Source inventory. We catalogue every source page, its traffic or usage, its last-updated date, and its owner. This is where pruning decisions begin.
  2. Concept-type mapping. We map content to OKF concept types and assign each file a type in frontmatter. A how-to is not a reference page is not a policy. Typing the corpus is what makes it agent-readable.
  3. Automated extraction and frontmatter generation. We extract clean Markdown from the source and generate recommended frontmatter: title, description, resource, tags, and timestamp.
  4. Dedupe and cleanup. We merge near-duplicates, pick canonical versions, strip platform chrome, and normalise formatting.
  5. Human editorial review. Automation does the heavy lifting; editors do the judgement. We check accuracy, merge decisions, and typing by hand.
  6. Conformance validation. Every file is validated against the spec: required type present, frontmatter well-formed, UTF-8 throughout, reserved files in place.
  7. Index and log structure. We build the reserved index.md for progressive disclosure and log.md to record provenance and change history.
  8. Delivery in version control. The finished bundle ships as a git repository your team owns outright.

What is migrated versus what is pruned

MigratedPruned
Live, accurate pages still in useStale drafts and superseded versions
The canonical version of each answerNear-duplicate copies of the same answer
Content that maps to a clear concept typeOrphaned pages with no owner or purpose
Reference data, how-tos, policies, FAQsNavigation chrome, macros, platform artefacts

The goal is a smaller, cleaner, typed corpus. A leaner bundle that an agent can actually use beats a complete archive of noise.

Handling scale

For large knowledge bases we batch the work by source section and run extraction in passes, with validation gating each batch before it moves to editorial review. Inventory and usage data drive the order: high-value, high-traffic content first, long-tail and low-use content triaged for prune-or-keep. This keeps a 5,000-page wiki tractable without a like-for-like dump.

Before and after

Tool-locked KBOKF bundle
Proprietary platform, lossy exportPlain UTF-8 Markdown, fully portable
No content-level version historyGit history, every change diffable
Untyped HTML pagesTyped files with required type frontmatter
Duplicated, conflicting answersDeduplicated, canonical sources
Agent-hostile, noisy for retrievalClean, curated source for RAG or direct context
No provenanceresource and timestamp on every file, plus log.md

Deliverables

  • A conformant, validated OKF bundle in a git repository you own.
  • Typed Markdown files with complete YAML frontmatter.
  • Reserved index.md (progressive disclosure) and log.md (provenance).
  • A migration report: what was migrated, what was pruned, and why.
  • A short maintenance guide so the bundle stays current.

Typical sprint timeline

PhaseDuration
Scoping and source inventory2 to 3 days
Concept-type mapping2 to 3 days
Automated extraction and frontmatter3 to 5 days
Dedupe, cleanup, editorial review4 to 7 days
Validation, index/log, delivery2 to 3 days

Most sprints complete in two to four weeks. Exact timing depends on volume and content quality, which we confirm after the audit.

Start the migration

A clean, agent-ready knowledge layer starts with knowing what you have. Book a scoping call via contact, review pricing, or begin with an OKF readiness audit. When the bundle is built, the next step is making it work for your agents: see OKF for AI agents.

Frequently asked questions

Which knowledge bases can you migrate?

Support KBs (Zendesk, Intercom, Freshdesk), internal wikis (Confluence, Notion, SharePoint), static docs sites (Docusaurus, GitBook, MkDocs), and data catalogues. If your content exports to HTML, Markdown, or a structured API, we can map it.

Do you migrate everything?

No. Part of the value is pruning. We migrate live, accurate, distinct content and deliberately leave behind duplicates, stale drafts, and superseded pages. You get a smaller, cleaner corpus, not a like-for-like copy.

Is the Open Knowledge Format a Google product, and are you affiliated with Google?

OKF is an open specification published by Google Cloud on 12 June 2026. We are an independent agency. We are not affiliated with, endorsed by, or partnered with Google. We implement the open spec.

What do we get at the end?

A validated OKF bundle delivered in a git repository: typed Markdown files with YAML frontmatter, a reserved index.md and log.md, plus a short maintenance guide so your team can keep it current.

How long does a sprint take?

A typical sprint runs two to four weeks depending on source volume and how much editorial cleanup the content needs. We scope this precisely after a readiness audit.