
Building a Git-Based CMS: How It Actually Works

architecture · github-api · cms · nextjs

Most content management systems store content in a database: PostgreSQL, MongoDB, or a proprietary cloud datastore. But there's an alternative architecture that uses Git repositories as the content backend, with the GitHub API as the interface layer.

This approach has real advantages: version history for free, branch-based workflows, no separate infrastructure to manage. It also has real challenges: API rate limits, file parsing complexity, and the difficulty of building a live preview on top of a version control system.

Below I'll walk through the architecture of a Git-based CMS, the technical problems you'll hit, and the trade-offs compared to database-backed systems.

The core idea

A Git-based CMS reads and writes content files through the GitHub API instead of a database. The content (JSON, YAML, Markdown, MDX) lives in a repository that's also the source code for the website. The CMS provides a visual interface for editing those files, and saves changes as commits or pull requests.

The architecture looks like this:

Browser (Editor UI)
    |
    v
CMS Application Server
    |
    v
GitHub REST / GraphQL API
    |
    v
Git Repository (content files)
    |
    v
CI/CD Pipeline (Vercel, Netlify, etc.)
    |
    v
Production Website

The CMS never stores content. It's a stateless proxy between the user and their Git repository.

Reading files from GitHub

The GitHub Contents API is the primary way to read files:

// Fetch a single file
const response = await fetch(
  "https://api.github.com/repos/owner/repo/contents/locales/en.json",
  {
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: "application/vnd.github.v3+json",
    },
  }
);

const data = await response.json();
// data.content is base64-encoded
const content = Buffer.from(data.content, "base64").toString("utf-8");
const parsed = JSON.parse(content);

For listing files in a directory:

// List directory contents
const response = await fetch(
  "https://api.github.com/repos/owner/repo/contents/locales",
  {
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: "application/vnd.github.v3+json",
    },
  }
);

const files = await response.json();
// Returns array of { name, path, sha, size, type }

The tree API for large repos

The Contents API has a limit: it won't return directories with more than 1,000 files. For large repos, use the Git Trees API instead:

// Get the full file tree recursively
const response = await fetch(
  "https://api.github.com/repos/owner/repo/git/trees/main?recursive=1",
  {
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: "application/vnd.github.v3+json",
    },
  }
);

const tree = await response.json();
// tree.tree is an array of { path, mode, type, sha, size }
// Filter for content files
const contentFiles = tree.tree.filter(
  (item: { type: string; path: string }) =>
    item.type === "blob" &&
    /\.(json|yaml|yml|md|mdx)$/.test(item.path)
);

This returns the entire repository tree in a single request, which is far more efficient than walking directories one by one.

Writing content back

There are two ways to write content back through the GitHub API, depending on how many files changed:

Option 1: Update a single file (simple)

The Contents API supports single-file updates directly:

await fetch(
  "https://api.github.com/repos/owner/repo/contents/locales/en.json",
  {
    method: "PUT",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: "application/vnd.github.v3+json",
    },
    body: JSON.stringify({
      message: "Update English translations",
      content: Buffer.from(JSON.stringify(updated, null, 2)).toString(
        "base64"
      ),
      sha: currentFileSha, // required when updating an existing file; rejects conflicting edits
      branch: "content-update-123",
    }),
  }
);

Option 2: Multi-file commit (complex)

When the user edits multiple files, you need to create a Git tree and commit directly:

// 1. Get the current commit SHA for the branch
const ref = await fetch(
  `https://api.github.com/repos/owner/repo/git/ref/heads/main`,
  { headers }
);
const currentCommitSha = (await ref.json()).object.sha;

// 2. Get the current tree
const commit = await fetch(
  `https://api.github.com/repos/owner/repo/git/commits/${currentCommitSha}`,
  { headers }
);
const baseTreeSha = (await commit.json()).tree.sha;

// 3. Create blobs for each changed file
const blobs = await Promise.all(
  changedFiles.map(async (file) => {
    const blob = await fetch(
      `https://api.github.com/repos/owner/repo/git/blobs`,
      {
        method: "POST",
        headers,
        body: JSON.stringify({
          content: file.content,
          encoding: "utf-8",
        }),
      }
    );
    return {
      path: file.path,
      mode: "100644" as const,
      type: "blob" as const,
      sha: (await blob.json()).sha,
    };
  })
);

// 4. Create a new tree
const newTree = await fetch(
  `https://api.github.com/repos/owner/repo/git/trees`,
  {
    method: "POST",
    headers,
    body: JSON.stringify({
      base_tree: baseTreeSha,
      tree: blobs,
    }),
  }
);

// 5. Create the commit
const newCommit = await fetch(
  `https://api.github.com/repos/owner/repo/git/commits`,
  {
    method: "POST",
    headers,
    body: JSON.stringify({
      message: "Update translations (en, fr, de)",
      tree: (await newTree.json()).sha,
      parents: [currentCommitSha],
    }),
  }
);

// 6. Update the branch reference
await fetch(
  `https://api.github.com/repos/owner/repo/git/refs/heads/content-update-123`,
  {
    method: "PATCH",
    headers,
    body: JSON.stringify({
      sha: (await newCommit.json()).sha,
    }),
  }
);

This is the REST-API equivalent of git add . && git commit -m "..." plus a push: create blobs, build a tree, write the commit, and move the branch pointer, all through HTTP calls.

File parsing challenges

A CMS that works with "any content file" needs to parse multiple formats reliably. Each has its own set of edge cases.

JSON

JSON is the most common format for translation files. Parsing is straightforward with JSON.parse(), but serialization matters:

  • Preserve formatting. If the original file uses 2-space indentation, don't write back with 4 spaces. Read the original, detect the style, and match it.
  • Key ordering. JSON objects are technically unordered, but in practice, everyone expects keys to stay in the same order. Use a parser that preserves insertion order.
  • Unicode. Translation files contain strings in every language. Make sure your parser handles UTF-8 correctly, including CJK characters, RTL text, and emoji.
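The formatting and ordering concerns above can be handled with a small serialization helper. This is a sketch: detectIndent is a hand-rolled illustration (libraries like detect-indent do this more robustly), and it assumes the file's indentation is visible on the first indented key.

```typescript
// Detect whether the original file uses tabs, 2 spaces, 4 spaces, etc.,
// by looking at the first indented key. Illustration only.
function detectIndent(source: string): string {
  const match = source.match(/^(\t+| +)"/m);
  return match ? match[1] : "  "; // default: 2 spaces
}

function serializeJson(original: string, updated: unknown): string {
  const indent = detectIndent(original);
  // Match the original file's trailing-newline convention too
  const trailingNewline = original.endsWith("\n") ? "\n" : "";
  return JSON.stringify(updated, null, indent) + trailingNewline;
}
```

JSON.stringify accepts a string as its indent argument, so tabs and spaces both round-trip, and it emits string keys in insertion order, which keeps diffs small.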

YAML

YAML is more complex. It supports multiple syntaxes for the same data:

# Block style
description: |
  This is a multiline
  string with newlines preserved.

# Flow style
tags: [i18n, cms, git]

# Anchors and aliases
defaults: &defaults
  theme: dark
  language: en

production:
  <<: *defaults
  debug: false

A CMS needs to parse all of these correctly and, critically, write them back in the same style. If a user has block-style strings, don't convert them to flow style on save. Libraries like js-yaml handle parsing well, but round-trip preservation requires extra care.

Markdown and MDX

Markdown files typically have YAML frontmatter followed by content:

---
title: "Getting Started"
date: 2026-01-15
draft: false
---

# Getting Started

Welcome to our documentation...

The CMS needs to:

  1. Split the file at the frontmatter boundary (---)
  2. Parse the frontmatter as YAML
  3. Parse the body as Markdown (or MDX)
  4. Present frontmatter fields as form inputs and the body as a rich text editor
  5. Reassemble the file without corrupting either section
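Steps 1 and 2 can be sketched as follows. The frontmatter parser here is deliberately naive (plain key: value lines, values kept as raw strings, quotes included); a real CMS would hand the captured block to a YAML library such as js-yaml instead.

```typescript
// Split a Markdown file into frontmatter and body. Assumes the file
// starts with a `---` fence; otherwise everything is body.
function splitFrontmatter(source: string): {
  frontmatter: Record<string, string>;
  body: string;
} {
  const match = source.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?/);
  if (!match) return { frontmatter: {}, body: source };

  // Naive key: value parsing; swap in a YAML parser for real use
  const frontmatter: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    frontmatter[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return { frontmatter, body: source.slice(match[0].length) };
}
```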

MDX adds another layer of complexity. It's Markdown with JSX components, and the CMS should preserve custom components it doesn't understand:

---
title: "Component Guide"
---

# Component Guide

Here's an interactive example:

<CodePlayground language="typescript">
  const x: number = 42;
</CodePlayground>

Regular markdown continues here.

If the CMS corrupts the <CodePlayground> block on save, the page breaks. Treating unknown components as opaque blocks is the safest approach.
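One way to sketch that opaque-block treatment: segment the file into editable markdown runs and component runs that pass through untouched. The regex here only handles simple, non-nested components; real MDX needs a proper parser (e.g. @mdx-js/mdx).

```typescript
type Segment = { kind: "markdown" | "component"; text: string };

// Split MDX into markdown segments (editable) and JSX-component
// segments (opaque, written back byte-for-byte). Naive: assumes
// capitalized, non-nested <Component>...</Component> or <Component />.
function segmentMdx(source: string): Segment[] {
  const pattern = /<([A-Z]\w*)[^>]*>[\s\S]*?<\/\1>|<[A-Z]\w*[^>]*\/>/g;
  const segments: Segment[] = [];
  let last = 0;
  for (const m of source.matchAll(pattern)) {
    if (m.index! > last)
      segments.push({ kind: "markdown", text: source.slice(last, m.index) });
    segments.push({ kind: "component", text: m[0] });
    last = m.index! + m[0].length;
  }
  if (last < source.length)
    segments.push({ kind: "markdown", text: source.slice(last) });
  return segments;
}
```

The editor then renders markdown segments in the rich text editor and component segments as locked, read-only blocks; on save, concatenating the segments reproduces the original file.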

Live preview architecture

Live preview is the hardest part to get right. The user edits content in the CMS, and a preview panel shows how the site will look with those changes, updating in real time without deploying.

Approach 1: iframe with injected content

The simplest approach: run the user's site in an iframe and inject updated content via postMessage:

// CMS editor
function updatePreview(key: string, value: string) {
  const iframe = document.getElementById("preview") as HTMLIFrameElement;
  iframe.contentWindow?.postMessage(
    { type: "content-update", key, value },
    "*"
  );
}

// Script injected into the preview iframe
window.addEventListener("message", (event) => {
  if (event.data.type === "content-update") {
    const { key, value } = event.data;
    // Find and update the DOM element displaying this key
    document.querySelectorAll(`[data-i18n-key="${key}"]`).forEach((el) => {
      el.textContent = value;
    });
  }
});

This is fast but fragile. It only works if you can map translation keys to DOM elements, and it doesn't handle layout changes, conditional rendering, or component re-renders.

Approach 2: virtual file system

A better approach: intercept the framework's file reads and serve modified content from memory.

For Next.js, this means running a development server where the i18n library reads from a virtual file system instead of disk. When the user changes a value in the CMS, you update the virtual file and trigger a hot reload.

This is the approach SkyBlobs uses for its live preview, running the user's actual framework with modified content files so the preview is pixel-perfect.
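The overlay itself can be quite small. A minimal sketch, with illustrative names rather than a real API: reads check an in-memory layer first and fall back to disk, and a change listener gives the dev server a hook to trigger hot reload.

```typescript
import { readFileSync } from "fs";

class VirtualFs {
  private overrides = new Map<string, string>();
  private listeners: Array<(path: string) => void> = [];

  // Reads see CMS edits first, then the real file on disk
  readFile(path: string): string {
    return this.overrides.get(path) ?? readFileSync(path, "utf-8");
  }

  // Called by the CMS when the user edits a value
  writeVirtual(path: string, content: string): void {
    this.overrides.set(path, content);
    this.listeners.forEach((fn) => fn(path)); // e.g. trigger HMR
  }

  onChange(fn: (path: string) => void): void {
    this.listeners.push(fn);
  }
}
```

The hard part in practice is wiring readFile into the framework's module resolution or i18n loader, which is framework-specific.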

Approach 3: server-side render on demand

For static sites, you can re-render individual pages server-side with the modified content:

// On each content change, re-render the affected page
const html = await renderPage("/pricing", {
  ...originalContent,
  "pricing.title": newValue,
});

// Send the rendered HTML to the preview iframe
iframe.srcdoc = html;

This gives an accurate preview but is slower, since each edit triggers a full page render.

Rate limits and caching

The GitHub API has a rate limit of 5,000 requests per hour for authenticated users. A CMS can burn through that quickly if it's not careful:

  • Cache aggressively. File contents don't change unless someone pushes a commit. Cache file contents by SHA (which is a content-addressable hash) and only re-fetch when the SHA changes.
  • Batch reads. Use the Trees API to get the full file tree in one request instead of walking directories.
  • Use conditional requests. The GitHub API supports If-None-Match headers with ETags. A 304 response doesn't count against your rate limit.
  • Minimize writes. Batch multiple file changes into a single commit instead of one commit per file.

SHA-based caching is the highest-leverage of these. Because a blob's SHA is derived from its content, a cached entry can never go stale:

// Cache file content by SHA (content never changes for a given SHA)
const fileCache = new Map<string, string>();

async function getFileContent(
  owner: string,
  repo: string,
  path: string,
  sha: string
): Promise<string> {
  if (fileCache.has(sha)) {
    return fileCache.get(sha)!;
  }

  const response = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/git/blobs/${sha}`,
    { headers }
  );
  const data = await response.json();
  const content = Buffer.from(data.content, "base64").toString("utf-8");
  fileCache.set(sha, content);
  return content;
}
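Conditional requests layer on top of this for endpoints where the SHA isn't known in advance. A sketch, assuming a hypothetical cachedGet helper with an in-memory ETag map:

```typescript
// Cache responses keyed by URL along with their ETags. A 304 Not
// Modified reply does not count against the GitHub rate limit.
const etagCache = new Map<string, { etag: string; body: unknown }>();

async function cachedGet(url: string, token: string): Promise<unknown> {
  const cached = etagCache.get(url);
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
    Accept: "application/vnd.github.v3+json",
  };
  if (cached) headers["If-None-Match"] = cached.etag;

  const response = await fetch(url, { headers });
  if (response.status === 304 && cached) {
    return cached.body; // served from cache, rate limit untouched
  }

  const body = await response.json();
  const etag = response.headers.get("etag");
  if (etag) etagCache.set(url, { etag, body });
  return body;
}
```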

Trade-offs vs database-backed CMS

Advantages of Git-based

  • No infrastructure. No database to provision, no server to manage, no backups to configure. GitHub is your datastore.
  • Version history. Every change is a Git commit with author, timestamp, and diff. Rollback is git revert.
  • Branch-based workflows. Content changes can live on a branch, go through review, and merge when ready. This is impossible with most database-backed CMS tools.
  • No vendor lock-in. Your content is plain files in a Git repo. If you stop using the CMS, your content is still there, unchanged.
  • Developer experience. Developers can still edit content files directly, use their own tools, and the CMS doesn't get in the way.

Disadvantages of Git-based

  • No real-time collaboration. Two people can't edit the same file simultaneously like in Google Docs. Git handles conflicts, but it's not instant.
  • API rate limits. GitHub's API has limits. A CMS serving many concurrent users needs careful caching and request management.
  • No relational data. If your content model has relationships (blog post -> author -> bio), you're managing those relationships in file references, not foreign keys.
  • Publish latency. Merging a PR triggers a build and deploy. That's 30 seconds to a few minutes, not the instant publish of a database-backed system.
  • File size limits. GitHub's API has a 100MB file size limit, and performance degrades with very large files. Not an issue for text content, but relevant for media assets.

When to choose Git-based

A Git-based CMS is the right choice when:

  • Your content already lives in files (JSON, YAML, Markdown)
  • Your team uses Git and GitHub as part of their daily workflow
  • You want content changes to go through code review
  • You don't want to manage CMS infrastructure
  • Your content model is file-shaped, not relational

A database-backed CMS is better when:

  • You need real-time multi-user editing
  • Your content model has complex relationships
  • You need instant publish (no build step)
  • You're managing thousands of content entries
  • Non-technical users need to create new pages, not just edit existing ones

Conclusion

Building a CMS on top of Git works, and it comes with clear trade-offs. The GitHub API provides a capable backend for reading, writing, and versioning content files. The challenges (file parsing, live preview, rate limits) are solvable with careful engineering.

For teams whose content already lives in Git, this architecture avoids the complexity of a separate content infrastructure while giving non-technical users the editing interface they need. It's not the right choice for every project, but for static sites with file-based content, it's increasingly the best one.
