If you or your team are too lazy to write documentation, this article shows how to build a small DocAgent in Python that can scan any repo, detect the project type, and generate:
- README.md
- optional additional docs like docs/HOW_TO_RUN.md and docs/PROJECT_OVERVIEW.md
The design is deliberately practical:
Scan → Detect → Summarize → Generate
- “Don’t guess”: write TODO if info isn’t provable from the repo
- Minimal dependencies: `requests` (plus the optional `openai` SDK)
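The four stages hand data to each other in a straight line. As a hedged sketch (these stub functions only illustrate the data flow; the real modules are built later in this article):

```python
# Minimal sketch of the Scan -> Detect -> Summarize -> Generate pipeline.
# These stubs stand in for the real modules built below.

def scan(repo_root: str) -> list[str]:
    # Real version: walk the tree and keep high-signal files.
    return ["README.md", "app.py", "requirements.txt"]

def detect(files: list[str]) -> str:
    # Real version: check marker files (*.csproj, package.json, pyproject.toml, ...).
    return "Python" if "requirements.txt" in files else "Unknown"

def summarize(files: list[str]) -> dict[str, str]:
    # Real version: ask a cheaper model to summarize each file.
    return {f: f"summary of {f}" for f in files}

def generate(language: str, summaries: dict[str, str]) -> str:
    # Real version: ask a stronger model to write the README from the summaries.
    return f"# {language} project\n" + "\n".join(summaries.values())

readme = generate(detect(scan(".")), summarize(scan(".")))
```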
What you’ll build
You’ll create a CLI like:
python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs"

And it will:
- detect whether the repo looks like .NET, Node, or Python
- detect whether it’s a Web app vs Console/Library
- summarize high-signal files with a cheaper model
- generate a README with a stronger model (and optional docs)
Step-by-step build
1) Create a folder
DocAgentPy/
docagent/
docagent_cli.py
requirements.txt

2) Install dependencies
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r tools\DocAgentPy\requirements.txt

3) Set your API key
$env:OPENAI_API_KEY="YOUR_KEY_HERE"
OR
$env:AZURE_OPENAI_API_KEY="YOUR_KEY_HERE"   # for Azure OpenAI

4) Run it
README only:
python docagent_cli.py --repo "C:\git\myproject"

README + docs:

python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs"

Dry-run:

python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs" --dry-run

Explain detection:

python docagent_cli.py --repo "C:\git\myproject" --explain-detection --dry-run

How detection works (high level)
We use a few reliable markers:
.NET
- *.csproj exists ⇒ .NET
- Microsoft.NET.Sdk.Web or Controllers/ ⇒ likely Web app
Node
- package.json exists ⇒ Node
- Next.js / React / Express dependencies ⇒ likely Web app
Python
- pyproject.toml / requirements.txt / setup.py exists ⇒ Python
- manage.py / wsgi.py / asgi.py ⇒ likely Web app
This detection is used to provide stack-specific guidance to the model (dotnet vs npm vs venv), but the model still must rely on repo evidence and use TODOs when uncertain.
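The marker checks above boil down to a few name tests. A hedged sketch (the full detector, project_detector.py, appears later in this article):

```python
from pathlib import PurePosixPath

# Sketch of the marker-based language detection described above.
# It only inspects file names, exactly like the article's heuristics.
def detect_language(repo_files: list[str]) -> str:
    names = {PurePosixPath(p.replace("\\", "/")).name.lower() for p in repo_files}
    if any(p.lower().endswith(".csproj") for p in repo_files):
        return "DotNet"   # *.csproj => .NET
    if "package.json" in names:
        return "Node"     # package.json => Node
    if names & {"pyproject.toml", "requirements.txt", "setup.py"}:
        return "Python"   # any common Python packaging marker
    return "Unknown"
```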
Important: OpenAI vs Azure OpenAI (and how to avoid 404s)
If your --api-base looks like:
- https://api.openai.com/v1 → OpenAI
- https://<resource>.openai.azure.com/ → Azure OpenAI
Azure OpenAI does not accept POST /v1/responses. It requires:
POST /openai/deployments/<deployment>/responses?api-version=<version>

with the header api-key: <your key>.
That’s why calling /responses directly on an Azure base URL returns 404 Resource not found.
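The URL difference can be captured in one helper. A sketch (resource, deployment, and api-version values in the example are placeholders, not real endpoints):

```python
# Build the Responses endpoint for either OpenAI or Azure OpenAI,
# following the two URL shapes described above.
def responses_url(api_base: str, deployment: str = "", api_version: str = "") -> str:
    base = api_base.rstrip("/")
    if "openai.azure.com" in base.lower():
        # Azure: route through the deployment and pass api-version;
        # authenticate with the api-key header.
        return f"{base}/openai/deployments/{deployment}/responses?api-version={api_version}"
    # OpenAI: plain /v1/responses; authenticate with Authorization: Bearer.
    return f"{base}/responses"
```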
Important: Some Azure resources/api-versions still return 404 for the Responses route even with the correct URL. If client.responses.create(...) 404s even in the Azure Portal sample, then /responses is not available for your resource yet.
Practical solution: use /chat/completions as a fallback (runtime switch)
To keep the agent useful across Azure environments, add a runtime flag:
- --azure-mode responses → use /responses only (fail on 404)
- --azure-mode chat → use /chat/completions only
- --azure-mode auto → try /responses, and if Azure returns 404, fall back to /chat/completions
This lets you keep the best route where available, and still generate docs in environments where /responses isn’t exposed yet.
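The routing decision itself is tiny. A sketch, with the two callables standing in for the real /responses and /chat/completions HTTP calls:

```python
# Minimal sketch of the --azure-mode routing described above.
class RouteNotFound(Exception):
    """Stands in for Azure answering 404 on the Responses route."""

def generate_text(mode: str, call_responses, call_chat) -> str:
    if mode == "chat":
        return call_chat()
    try:
        return call_responses()      # preferred route
    except RouteNotFound:
        if mode == "auto":
            return call_chat()       # graceful fallback on 404
        raise                        # mode == "responses": fail loudly
```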
In this Python DocAgent implementation:
- If api_base contains openai.azure.com, we automatically switch to Azure mode.
- The tool treats --model / --summarize-model as deployment names in Azure mode.
- If your deployment names differ, pass:
--deployment <name>
--summarize-deployment <name>
Example:
$env:AZURE_OPENAI_API_KEY="YOUR_AZURE_KEY"
python tools\DocAgentPy\docagent_cli.py --repo "C:\git\myproject" `
--api-base "https://<resource>.openai.azure.com/" `
--api-version "2024-12-01-preview" `
--deployment "<deployment-name>" `
--summarize-deployment "<summarize-deployment-name>" `
--azure-mode chat

Note: In my case, the Azure OpenAI /responses route was not enabled, hence I used chat mode.
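The deployment-name override boils down to one conditional, mirrored in the CLI code below:

```python
from typing import Optional

# Pick what to send as "model": in Azure mode the value on the wire
# must be a *deployment* name, so an explicit --deployment wins.
def resolve_model(api_base: str, model: str, deployment: Optional[str]) -> str:
    is_azure = "openai.azure.com" in api_base.lower()
    return deployment if (is_azure and deployment) else model
```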
Full copy/paste implementation (all files)
Everything below is intended to be copied into the same paths.
DocAgentPy/requirements.txt
requests==2.32.3
openai==1.60.0

DocAgentPy/docagent/__init__.py
__all__ = [
"openai_responses",
"repo_scanner",
"project_detector",
"generator",
]

DocAgentPy/docagent/openai_responses.py
from __future__ import annotations
import json
from dataclasses import dataclass
from typing import Any, Optional
import requests
try:
# Optional: official SDK supports AzureOpenAI and handles routing/versioning.
from openai import AzureOpenAI, OpenAI # type: ignore
except Exception: # pragma: no cover
AzureOpenAI = None # type: ignore
OpenAI = None # type: ignore
@dataclass(frozen=True)
class OpenAIConfig:
api_key: str
api_base: str = "https://api.openai.com/v1"
# Azure OpenAI support (optional). If api_base contains "openai.azure.com", Azure mode is auto-enabled.
api_version: str = "2024-12-01-preview"
# Azure routing strategy:
# - "responses": use Responses API only (fail if 404)
# - "chat": use Chat Completions only
# - "auto": try Responses, and if Azure returns 404, fall back to Chat Completions
azure_mode: str = "responses"
class OpenAIResponsesClient:
"""
Minimal client for OpenAI Responses API.
Uses the official OpenAI SDK when available; falls back to requests.
"""
def __init__(self, cfg: OpenAIConfig, session: Optional[requests.Session] = None) -> None:
self._cfg = cfg
self._session = session or requests.Session()
def create_response_text(self, *, model: str, system_prompt: str, user_prompt: str, timeout_s: int = 120) -> str:
is_azure = "openai.azure.com" in self._cfg.api_base.lower()
# Prefer the official SDK when available. It matches Azure Portal examples and avoids URL pitfalls.
if is_azure and AzureOpenAI is not None:
client = AzureOpenAI(azure_endpoint=self._cfg.api_base, api_key=self._cfg.api_key, api_version=self._cfg.api_version)
mode = (self._cfg.azure_mode or "responses").lower()
def do_chat() -> str:
chat = client.chat.completions.create(
model=model, # Azure: model == deployment name
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
)
content = getattr(chat.choices[0].message, "content", None)
if isinstance(content, str) and content.strip():
return content.strip()
raise RuntimeError("Azure chat completion returned no message content.")
if mode == "chat":
return do_chat()
if mode in {"responses", "auto"}:
try:
resp = client.responses.create(
model=model, # Azure: model == deployment name
input=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
)
return _extract_output_text_from_sdk(resp)
except Exception as e:
if mode == "auto" and "404" in str(e):
return do_chat()
raise
raise ValueError(f"Invalid azure_mode: {self._cfg.azure_mode!r}. Expected 'responses', 'chat', or 'auto'.")
if (not is_azure) and OpenAI is not None:
client = OpenAI(api_key=self._cfg.api_key, base_url=self._cfg.api_base)
resp = client.responses.create(
model=model,
input=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
)
return _extract_output_text_from_sdk(resp)
# Fallback: raw HTTP (works for OpenAI and for Azure when SDK isn't available)
if is_azure:
mode = (self._cfg.azure_mode or "responses").lower()
deployment = model
headers = {"api-key": self._cfg.api_key, "Content-Type": "application/json", "Accept": "application/json"}
def do_chat_http() -> str:
chat_url = self._cfg.api_base.rstrip("/") + f"/openai/deployments/{deployment}/chat/completions?api-version={self._cfg.api_version}"
chat_payload: dict[str, Any] = {
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
}
chat_resp = self._session.post(chat_url, headers=headers, data=json.dumps(chat_payload), timeout=timeout_s)
chat_text = chat_resp.text
if chat_resp.status_code // 100 != 2:
raise RuntimeError(f"OpenAI error ({chat_resp.status_code}) calling {chat_url}: {chat_text}")
chat_data = json.loads(chat_text)
choice0 = (chat_data.get("choices") or [{}])[0]
msg = choice0.get("message") or {}
content = msg.get("content")
if isinstance(content, str) and content.strip():
return content.strip()
raise RuntimeError(f"Azure chat completion returned no message content: {chat_text}")
if mode == "chat":
return do_chat_http()
url = self._cfg.api_base.rstrip("/") + f"/openai/deployments/{deployment}/responses?api-version={self._cfg.api_version}"
payload: dict[str, Any] = {
"input": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
}
else:
url = self._cfg.api_base.rstrip("/") + "/responses"
headers = {
"Authorization": f"Bearer {self._cfg.api_key}",
"Content-Type": "application/json",
"Accept": "application/json",
}
payload = {
"model": model,
"input": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
}
resp = self._session.post(url, headers=headers, data=json.dumps(payload), timeout=timeout_s)
text = resp.text
if resp.status_code // 100 == 2:
return _extract_output_text(text)
if is_azure and (self._cfg.azure_mode or "responses").lower() == "auto" and resp.status_code == 404:
return do_chat_http()
raise RuntimeError(f"OpenAI error ({resp.status_code}) calling {url}: {text}")
def _extract_output_text_from_sdk(resp: Any) -> str:
"""
Extract output text from OpenAI SDK response objects.
"""
# New SDKs expose output_text directly.
t = getattr(resp, "output_text", None)
if isinstance(t, str) and t.strip():
return t.strip()
# Otherwise, fall back to JSON parsing using our tolerant extractor.
raw: Optional[str] = None
if hasattr(resp, "model_dump_json"):
raw = resp.model_dump_json() # pydantic
elif hasattr(resp, "to_json"):
raw = resp.to_json()
if isinstance(raw, str) and raw.strip():
return _extract_output_text(raw)
# Last resort: stringify via __dict__ (may be incomplete).
return _extract_output_text(json.dumps(resp, default=lambda o: getattr(o, "__dict__", str(o))))
def _extract_output_text(raw_json: str) -> str:
"""
Extract concatenated output text from a Responses API payload.
Kept intentionally tolerant to minor schema changes.
"""
data = json.loads(raw_json)
# Common shape: output: [ { content: [ { type:"output_text", text:"..." }, ... ] }, ... ]
out = data.get("output")
if isinstance(out, list):
parts: list[str] = []
for item in out:
content = item.get("content") if isinstance(item, dict) else None
if not isinstance(content, list):
continue
for c in content:
if not isinstance(c, dict):
continue
if c.get("type") == "output_text" and isinstance(c.get("text"), str):
parts.append(c["text"])
joined = "".join(parts).strip()
if joined:
return joined
# Fallback: some payloads expose output_text directly
if isinstance(data.get("output_text"), str):
return data["output_text"].strip()
raise RuntimeError("Could not extract output text from OpenAI response.")

DocAgentPy/docagent/repo_scanner.py
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable
@dataclass(frozen=True)
class ScannedFile:
relative_path: str
size_bytes: int
content: str
@dataclass(frozen=True)
class ScannedRepo:
repo_root: str
tree: str
files: list[ScannedFile]
IGNORED_DIRS = {".git", ".vs", "bin", "obj", "node_modules", "packages", ".idea", "__pycache__"}
HIGH_SIGNAL_EXTS = {
# .NET
".csproj", ".sln", ".slnx", ".cs", ".fs", ".vb",
# Python
".py",
# Node
".js", ".mjs", ".cjs", ".ts", ".tsx",
# Docs/config
".md", ".json", ".yml", ".yaml", ".toml", ".ini", ".env",
# scripts
".ps1", ".sh", ".cmd", ".bat",
}
HIGH_SIGNAL_NAMES = {
# General
"README.md", "LICENSE", "Dockerfile", "docker-compose.yml", "docker-compose.yaml", "global.json",
# .NET
"Program.cs",
# Node
"package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "next.config.js", "next.config.mjs",
# Python
"pyproject.toml", "requirements.txt", "Pipfile", "Pipfile.lock", "poetry.lock", "setup.py", "setup.cfg", "manage.py",
# env
".env", ".env.example",
}
def list_repo_files(repo_root: str, max_files: int = 10_000) -> list[str]:
root = Path(repo_root).resolve()
out: list[str] = []
for p in _enumerate_files(root):
out.append(str(p.relative_to(root)))
if len(out) >= max_files:
break
return out
def scan_repo(repo_root: str, max_files: int = 80, max_chars_per_file: int = 18_000, tree_max_entries: int = 250) -> ScannedRepo:
root = Path(repo_root).resolve()
all_files = list(_enumerate_files(root))
ranked = sorted(all_files, key=lambda p: (-_score_file(root, p), len(str(p))))
ranked = ranked[:max_files]
scanned: list[ScannedFile] = []
for p in ranked:
rel = str(p.relative_to(root))
size = p.stat().st_size
scanned.append(ScannedFile(rel, size, _read_text_best_effort(p, max_chars=max_chars_per_file)))
return ScannedRepo(str(root), _build_tree(root, tree_max_entries), scanned)
def _enumerate_files(root: Path) -> Iterable[Path]:
stack = [root]
while stack:
d = stack.pop()
for child in d.iterdir():
if child.is_dir():
if child.name in IGNORED_DIRS:
continue
stack.append(child)
continue
if child.is_file() and not _is_ignored_file(child):
yield child
def _is_ignored_file(p: Path) -> bool:
name = p.name
if name.lower().endswith((".dll", ".exe", ".pdb", ".cache", ".user", ".suo")):
return True
ext = p.suffix.lower()
if ext in HIGH_SIGNAL_EXTS:
return False
if name in HIGH_SIGNAL_NAMES:
return False
return True
def _score_file(root: Path, p: Path) -> int:
rel = str(p.relative_to(root)).replace("\\", "/")
name = p.name
ext = p.suffix.lower()
score = 0
if name in HIGH_SIGNAL_NAMES:
score += 200
if name.lower().endswith(".csproj"):
score += 180
if name.lower().endswith((".sln", ".slnx")):
score += 170
if name == "Program.cs":
score += 140
if name.lower().startswith("appsettings"):
score += 120
if rel.startswith(".github/workflows/"):
score += 110
if "/docker" in rel.lower() or name.lower().startswith("docker"):
score += 90
if ext == ".md":
score += 60
if ext in {".cs", ".py", ".ts", ".js"}:
score += 50
if ext in {".json", ".yml", ".yaml", ".toml"}:
score += 40
# prefer top-level
score += max(0, 15 - rel.count("/"))
return score
def _read_text_best_effort(p: Path, max_chars: int) -> str:
try:
text = p.read_text(encoding="utf-8")
except UnicodeDecodeError:
text = p.read_text(encoding="utf-8", errors="replace")
except Exception:
return "<unreadable>"
if len(text) <= max_chars:
return text
return text[:max_chars] + "\n\n... (truncated)\n"
def _build_tree(root: Path, max_entries: int) -> str:
items: list[str] = []
for i, p in enumerate(sorted(_enumerate_files(root), key=lambda x: str(x).lower())):
if i >= max_entries:
items.append("... (tree truncated)")
break
items.append(str(p.relative_to(root)).replace("\\", "/"))
return "\n".join(items)

DocAgentPy/docagent/project_detector.py
from __future__ import annotations
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable
class PrimaryLanguage:
UNKNOWN = "Unknown"
DOTNET = "DotNet"
NODE = "Node"
PYTHON = "Python"
class AppKind:
UNKNOWN = "Unknown"
CONSOLE_OR_LIBRARY = "ConsoleOrLibrary"
WEB_APP = "WebApp"
@dataclass(frozen=True)
class DetectedProject:
language: str
kind: str
confidence: str
evidence: list[str]
def detect_project(repo_root: str, repo_files: Iterable[str]) -> DetectedProject:
repo_files = list(repo_files)
evidence: list[str] = []
# ---- .NET ----
csproj = next((p for p in repo_files if p.lower().endswith(".csproj")), None)
if csproj:
evidence.append(f"Found C# project file: {csproj}")
kind = AppKind.CONSOLE_OR_LIBRARY
try:
xml = Path(repo_root, csproj).read_text(encoding="utf-8", errors="replace")
if "Microsoft.NET.Sdk.Web".lower() in xml.lower():
kind = AppKind.WEB_APP
evidence.append("csproj uses Microsoft.NET.Sdk.Web (ASP.NET Core Web app).")
elif any("/Controllers/" in p.replace("\\", "/") for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found Controllers/ folder (likely ASP.NET Web API).")
except Exception:
pass
return DetectedProject(PrimaryLanguage.DOTNET, kind, "high", evidence)
# ---- Node ----
pkg = next((p for p in repo_files if Path(p).name.lower() == "package.json"), None)
if pkg:
evidence.append("Found package.json (Node project).")
kind = AppKind.CONSOLE_OR_LIBRARY
try:
raw = Path(repo_root, pkg).read_text(encoding="utf-8", errors="replace")
data = json.loads(raw)
deps = data.get("dependencies") or {}
deps_txt = json.dumps(deps).lower()
if any(Path(p).name.lower() in {"next.config.js", "next.config.mjs"} for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found next.config.* (Next.js web app).")
elif any("/src/app/" in p.replace("\\", "/").lower() or "/pages/" in p.replace("\\", "/").lower() for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found common web app folders (src/app or pages).")
elif any(x in deps_txt for x in ["react", "next", "express", "fastify"]):
kind = AppKind.WEB_APP
evidence.append("package.json dependencies indicate a web framework (react/next/express/fastify).")
except Exception:
pass
return DetectedProject(PrimaryLanguage.NODE, kind, "medium", evidence)
# ---- Python ----
has_pyproject = any(Path(p).name.lower() == "pyproject.toml" for p in repo_files)
has_requirements = any(Path(p).name.lower() == "requirements.txt" for p in repo_files)
has_setup_py = any(Path(p).name.lower() == "setup.py" for p in repo_files)
if has_pyproject or has_requirements or has_setup_py:
if has_pyproject:
evidence.append("Found pyproject.toml (Python project).")
if has_requirements:
evidence.append("Found requirements.txt (Python project).")
if has_setup_py:
evidence.append("Found setup.py (Python project).")
kind = AppKind.CONSOLE_OR_LIBRARY
if any(Path(p).name.lower() == "manage.py" for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found manage.py (likely Django web app).")
elif any(Path(p).name.lower() in {"app.py", "wsgi.py", "asgi.py"} for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found common Python web entrypoints (app.py/wsgi.py/asgi.py).")
return DetectedProject(PrimaryLanguage.PYTHON, kind, "medium", evidence)
return DetectedProject(PrimaryLanguage.UNKNOWN, AppKind.UNKNOWN, "low", ["No strong project markers found."])

DocAgentPy/docagent/generator.py
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional
from .openai_responses import OpenAIResponsesClient
from .project_detector import DetectedProject, PrimaryLanguage, AppKind, detect_project
from .repo_scanner import ScannedRepo, list_repo_files, scan_repo
@dataclass(frozen=True)
class GenerateRequest:
repo_root: str
readme_path: str
docs_dir: Optional[str]
readme_model: str
summarize_model: str
max_files: int
@dataclass(frozen=True)
class GeneratedDoc:
relative_path: str
markdown: str
@dataclass(frozen=True)
class GenerateResult:
readme_markdown: str
additional_docs: list[GeneratedDoc]
class DocumentationGenerator:
def __init__(self, openai: OpenAIResponsesClient) -> None:
self._openai = openai
def generate(self, req: GenerateRequest) -> GenerateResult:
repo_files = list_repo_files(req.repo_root)
detected = detect_project(req.repo_root, repo_files)
repo = scan_repo(req.repo_root, max_files=req.max_files)
summaries = self._summarize_files(repo, req.summarize_model)
readme = self._generate_readme(detected, repo, summaries, req.readme_model)
docs: list[GeneratedDoc] = []
if req.docs_dir:
docs.append(GeneratedDoc("PROJECT_OVERVIEW.md", self._generate_project_overview(detected, repo, summaries, req.readme_model)))
docs.append(GeneratedDoc("HOW_TO_RUN.md", self._generate_how_to_run(detected, repo, summaries, req.readme_model)))
return GenerateResult(readme, docs)
def _summarize_files(self, repo: ScannedRepo, model: str) -> list[tuple[str, str]]:
out: list[tuple[str, str]] = []
system = "You are a senior engineer summarizing repository files for documentation. Be accurate and concrete."
for f in repo.files:
user = f"""Summarize this file for documentation purposes.
Rules:
- Focus on what a reader needs for README/docs: what it does, how it’s used, key commands/config, gotchas.
- Keep it short (5-12 bullets). Include relevant command examples if present.
- If the file is clearly irrelevant to running/understanding the project, say so.
FILE: {f.relative_path}
SIZE_BYTES: {f.size_bytes}
CONTENT:
{f.content}
"""
summary = self._openai.create_response_text(model=model, system_prompt=system, user_prompt=user).strip()
out.append((f.relative_path, summary))
return out
def _generate_readme(self, detected: DetectedProject, repo: ScannedRepo, summaries: list[tuple[str, str]], model: str) -> str:
system = (
"You write accurate READMEs for software repos.\n"
"Do NOT guess. If info is missing, add a TODO section instead of hallucinating.\n"
"Prefer actionable steps and concrete commands.\n"
"Return ONLY markdown.\n"
)
user = f"""Create a README.md for this repository.
Detected project (heuristic):
- Language: {detected.language}
- Kind: {detected.kind}
- Confidence: {detected.confidence}
- Evidence:
{_format_evidence(detected)}
Repository root: {repo.repo_root}
Repository tree (truncated):
{repo.tree}
File summaries:
{_format_summaries(summaries)}
README requirements:
- Title + one-paragraph overview
- Features (bullets)
- Prerequisites
- Quickstart (commands)
- Configuration (env vars / config files if present)
- Project structure (top-level overview)
- Troubleshooting (common build/run issues)
- If a license file exists in the tree, mention it; otherwise omit.
- Add a TODO section ONLY when necessary (unknown ports/env vars/etc).
Stack-specific guidance (follow if applicable; otherwise use TODO):
{_stack_guidance(detected)}
Important: Keep it friendly and easy for a beginner to run.
"""
return self._openai.create_response_text(model=model, system_prompt=system, user_prompt=user).strip() + "\n"
def _generate_project_overview(self, detected: DetectedProject, repo: ScannedRepo, summaries: list[tuple[str, str]], model: str) -> str:
system = (
"You write an easy-to-understand project overview document.\n"
"Do NOT guess; call out unknowns as TODO.\n"
"Return ONLY markdown.\n"
)
user = f"""Write docs/PROJECT_OVERVIEW.md for this repository.
Detected project:
- Language: {detected.language}
- Kind: {detected.kind}
- Confidence: {detected.confidence}
Include:
- What problem it solves
- What’s inside (major components/files)
- Key flows / entrypoints
- How to extend it safely
Repo tree:
{repo.tree}
Summaries:
{_format_summaries(summaries)}
"""
return self._openai.create_response_text(model=model, system_prompt=system, user_prompt=user).strip() + "\n"
def _generate_how_to_run(self, detected: DetectedProject, repo: ScannedRepo, summaries: list[tuple[str, str]], model: str) -> str:
system = (
"You write a concise runbook for developers.\n"
"Do NOT guess. Prefer exact dotnet/npm/etc commands found in the repo summaries.\n"
"Return ONLY markdown.\n"
)
user = f"""Write docs/HOW_TO_RUN.md for this repository.
Include:
- Prerequisites
- Build
- Run
- Test (if applicable)
- Common issues and fixes
Detected project:
- Language: {detected.language}
- Kind: {detected.kind}
- Confidence: {detected.confidence}
Stack-specific guidance:
{_stack_guidance(detected)}
Repo tree:
{repo.tree}
Summaries:
{_format_summaries(summaries)}
"""
return self._openai.create_response_text(model=model, system_prompt=system, user_prompt=user).strip() + "\n"
def _format_summaries(summaries: list[tuple[str, str]]) -> str:
parts: list[str] = []
for path, summary in summaries:
parts.append(f"### {path}\n{summary}\n")
return "\n".join(parts).strip()
def _format_evidence(detected: DetectedProject) -> str:
if not detected.evidence:
return " - (none)"
return "\n".join([f" - {e}" for e in detected.evidence])
def _stack_guidance(detected: DetectedProject) -> str:
# Guidance, not facts — model must still rely on repo summaries and use TODO if uncertain.
if detected.language == PrimaryLanguage.DOTNET and detected.kind == AppKind.WEB_APP:
return (
"- Prefer ASP.NET Core style instructions.\n"
"- Typical commands (use only if confirmed by repo): `dotnet restore`, `dotnet build`, `dotnet run`, `dotnet test`.\n"
"- Mention `appsettings*.json`, `launchSettings.json`, and env vars if present.\n"
"- If ports/urls are unclear, add TODO rather than guessing.\n"
)
if detected.language == PrimaryLanguage.DOTNET:
return (
"- Prefer .NET style instructions.\n"
"- Typical commands (use only if confirmed by repo): `dotnet restore`, `dotnet build`, `dotnet run`, `dotnet test`.\n"
"- Mention TargetFramework if present.\n"
)
if detected.language == PrimaryLanguage.NODE and detected.kind == AppKind.WEB_APP:
return (
"- Prefer Node web app instructions.\n"
"- Detect package manager from lockfile: pnpm/yarn/npm.\n"
"- Typical commands (use only if confirmed by repo): install deps, run dev server, build, test.\n"
"- Mention `.env` / `.env.example` if present.\n"
)
if detected.language == PrimaryLanguage.NODE:
return (
"- Prefer Node app/library instructions.\n"
"- Detect package manager from lockfile: pnpm/yarn/npm.\n"
"- Typical commands (use only if confirmed by repo): install, build, test.\n"
)
if detected.language == PrimaryLanguage.PYTHON and detected.kind == AppKind.WEB_APP:
return (
"- Prefer Python web app instructions.\n"
"- Mention creating a venv and installing deps (`requirements.txt` or `pyproject.toml`).\n"
"- If framework is unclear, add TODO rather than guessing.\n"
)
if detected.language == PrimaryLanguage.PYTHON:
return (
"- Prefer Python app/library instructions.\n"
"- Mention creating a venv and installing deps (`requirements.txt` or `pyproject.toml`).\n"
)
return "- Language/framework is unclear: keep instructions generic and add TODOs for missing run commands.\n"

DocAgentPy/docagent_cli.py
from __future__ import annotations
import argparse
import os
from pathlib import Path
from docagent.generator import DocumentationGenerator, GenerateRequest
from docagent.openai_responses import OpenAIConfig, OpenAIResponsesClient
from docagent.project_detector import detect_project
from docagent.repo_scanner import list_repo_files
def main() -> int:
p = argparse.ArgumentParser(
prog="docagent",
description="DocAgent (Python): generate README/docs for a repository using OpenAI.",
)
p.add_argument("--repo", default=".", help="Repo/project root to document (default: current directory)")
p.add_argument("--readme", default=None, help="Output README path (default: <repo>/README.md)")
p.add_argument("--docs", default=None, help="Also write docs into this directory (e.g. <repo>/docs)")
p.add_argument("--api-key", default=None, help="OpenAI API key (default: OPENAI_API_KEY env var)")
p.add_argument("--api-base", default="https://api.openai.com/v1", help="API base URL (OpenAI: https://api.openai.com/v1, Azure: https://<resource>.openai.azure.com/)")
p.add_argument("--api-version", default="2024-12-01-preview", help="Azure OpenAI api-version (used only for *.openai.azure.com)")
p.add_argument(
"--azure-mode",
default="responses",
choices=["responses", "chat", "auto"],
help="Azure routing: responses|chat|auto (auto: try /responses then fall back to /chat/completions on 404).",
)
p.add_argument("--model", default="gpt-4.1", help="Model for README/docs generation")
p.add_argument("--summarize-model", default="gpt-4.1", help="Model for file summarization")
p.add_argument("--deployment", default=None, help="Azure OpenAI deployment name for generation (overrides --model in Azure mode)")
p.add_argument("--summarize-deployment", default=None, help="Azure OpenAI deployment name for summarization (overrides --summarize-model in Azure mode)")
p.add_argument("--max-files", type=int, default=80, help="Max files to summarize")
p.add_argument("--dry-run", action="store_true", help="Print markdown to stdout (don’t write files)")
p.add_argument("--explain-detection", action="store_true", help="Print detection evidence")
args = p.parse_args()
repo_root = str(Path(args.repo).resolve())
readme_path = str(Path(args.readme).resolve()) if args.readme else str(Path(repo_root, "README.md"))
docs_dir = str(Path(args.docs).resolve()) if args.docs else None
# Support either OpenAI or Azure OpenAI key via env vars.
api_key = args.api_key or os.environ.get("OPENAI_API_KEY") or os.environ.get("AZURE_OPENAI_API_KEY")
if not api_key:
print("Missing API key. Pass --api-key or set OPENAI_API_KEY (OpenAI) / AZURE_OPENAI_API_KEY (Azure).")
return 2
repo_files = list_repo_files(repo_root)
detected = detect_project(repo_root, repo_files)
print(f"Detected: {detected.language} / {detected.kind} (confidence: {detected.confidence})")
if args.explain_detection:
for e in detected.evidence:
print(f"- {e}")
# If using Azure OpenAI, treat "model" as deployment name unless explicitly overridden.
is_azure = "openai.azure.com" in str(args.api_base).lower()
readme_model = args.deployment if (is_azure and args.deployment) else args.model
summarize_model = args.summarize_deployment if (is_azure and args.summarize_deployment) else args.summarize_model
client = OpenAIResponsesClient(
OpenAIConfig(api_key=api_key, api_base=args.api_base, api_version=args.api_version, azure_mode=args.azure_mode)
)
generator = DocumentationGenerator(client)
result = generator.generate(
GenerateRequest(
repo_root=repo_root,
readme_path=readme_path,
docs_dir=docs_dir,
readme_model=readme_model,
summarize_model=summarize_model,
max_files=args.max_files,
)
)
if args.dry_run:
print(result.readme_markdown)
if result.additional_docs:
print("\n---- Additional docs ----")
for doc in result.additional_docs:
print(f"\n# {doc.relative_path}\n{doc.markdown}")
return 0
Path(readme_path).parent.mkdir(parents=True, exist_ok=True)
Path(readme_path).write_text(result.readme_markdown, encoding="utf-8")
print(f"Wrote {readme_path}")
if docs_dir:
Path(docs_dir).mkdir(parents=True, exist_ok=True)
for doc in result.additional_docs:
out_path = Path(docs_dir, doc.relative_path)
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(doc.markdown, encoding="utf-8")
print(f"Wrote {out_path}")
return 0
if __name__ == "__main__":
raise SystemExit(main())

DocAgentPy/README.md (optional)
# DocAgentPy (Python documentation generator)
This is a Python version of **DocAgent**. It scans a repo folder and generates:
- a `README.md`
- optional additional docs in a `docs/` directory
It uses the OpenAI **Responses API** via `requests` (no IDE plugins required).
## Setup
From the repository root:
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r tools\DocAgentPy\requirements.txt
```

Set your API key:
$env:OPENAI_API_KEY="YOUR_KEY_HERE"

## Run
Generate README for a repo:
python docagent_cli.py --repo "C:\git\myproject"

Generate README + extra docs:

python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs"

Dry run:

python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs" --dry-run

Explain detection:

python docagent_cli.py --repo "C:\git\myproject" --explain-detection --dry-run

Hope this helps you. Happy documenting!