If you or your team are too lazy to write documentation, this article shows how to build a small DocAgent in Python that can scan any repo, detect the project type, and generate:
- README.md
- optional additional docs like docs/HOW_TO_RUN.md and docs/PROJECT_OVERVIEW.md
The design is deliberately practical:
Scan → Detect → Summarize → Generate
- “Don’t guess”: write TODO if info isn’t provable from the repo
- Minimal dependencies: `requests` (plus the optional `openai` SDK)
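The four stages hand data to each other in a straight line. As a hedged sketch (these stub functions only illustrate the data flow; the real modules are built later in this article):

```python
# Minimal sketch of the Scan -> Detect -> Summarize -> Generate pipeline.
# These stubs stand in for the real modules built below.

def scan(repo_root: str) -> list[str]:
    # Real version: walk the tree and keep high-signal files.
    return ["README.md", "app.py", "requirements.txt"]

def detect(files: list[str]) -> str:
    # Real version: check marker files (*.csproj, package.json, pyproject.toml, ...).
    return "Python" if "requirements.txt" in files else "Unknown"

def summarize(files: list[str]) -> dict[str, str]:
    # Real version: ask a cheaper model to summarize each file.
    return {f: f"summary of {f}" for f in files}

def generate(language: str, summaries: dict[str, str]) -> str:
    # Real version: ask a stronger model to write the README from the summaries.
    return f"# {language} project\n" + "\n".join(summaries.values())

readme = generate(detect(scan(".")), summarize(scan(".")))
```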
What you’ll build
You’ll create a CLI like:
python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs"

And it will:
- detect whether the repo looks like .NET, Node, or Python
- detect whether it’s a Web app vs Console/Library
- summarize high-signal files with a cheaper model
- generate a README with a stronger model (and optional docs)
Step-by-step build
1) Create a folder
DocAgentPy/
docagent/
docagent_cli.py
requirements.txt

2) Install dependencies
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r tools\DocAgentPy\requirements.txt

3) Set your API key
$env:OPENAI_API_KEY="YOUR_KEY_HERE"
OR
$env:AZURE_OPENAI_API_KEY="YOUR_KEY_HERE"   # for Azure OpenAI

4) Run it
README only:
python docagent_cli.py --repo "C:\git\myproject"

README + docs:

python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs"

Dry-run:

python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs" --dry-run

Explain detection:

python docagent_cli.py --repo "C:\git\myproject" --explain-detection --dry-run

How detection works (high level)
We use a few reliable markers:
.NET
- *.csproj exists ⇒ .NET
- Microsoft.NET.Sdk.Web or Controllers/ ⇒ likely Web app
Node
- package.json exists ⇒ Node
- Next.js / React / Express dependencies ⇒ likely Web app
Python
- pyproject.toml / requirements.txt / setup.py exists ⇒ Python
- manage.py / wsgi.py / asgi.py ⇒ likely Web app
This detection is used to provide stack-specific guidance to the model (dotnet vs npm vs venv), but the model still must rely on repo evidence and use TODOs when uncertain.
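The marker checks above boil down to a few name tests. A hedged sketch (the full detector, project_detector.py, appears later in this article):

```python
from pathlib import PurePosixPath

# Sketch of the marker-based language detection described above.
# It only inspects file names, exactly like the article's heuristics.
def detect_language(repo_files: list[str]) -> str:
    names = {PurePosixPath(p.replace("\\", "/")).name.lower() for p in repo_files}
    if any(p.lower().endswith(".csproj") for p in repo_files):
        return "DotNet"   # *.csproj => .NET
    if "package.json" in names:
        return "Node"     # package.json => Node
    if names & {"pyproject.toml", "requirements.txt", "setup.py"}:
        return "Python"   # any common Python packaging marker
    return "Unknown"
```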
Important: OpenAI vs Azure OpenAI (and how to avoid 404s)
If your --api-base looks like:
- https://api.openai.com/v1 → OpenAI
- https://<resource>.openai.azure.com/ → Azure OpenAI
Azure OpenAI does not accept POST /v1/responses. It requires:
POST /openai/deployments/<deployment>/responses?api-version=<version>

with the header api-key: <your key>.
That’s why calling /responses directly on an Azure base URL returns 404 Resource not found.
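The URL difference can be captured in one helper. A sketch (resource, deployment, and api-version values in the example are placeholders, not real endpoints):

```python
# Build the Responses endpoint for either OpenAI or Azure OpenAI,
# following the two URL shapes described above.
def responses_url(api_base: str, deployment: str = "", api_version: str = "") -> str:
    base = api_base.rstrip("/")
    if "openai.azure.com" in base.lower():
        # Azure: route through the deployment and pass api-version;
        # authenticate with the api-key header.
        return f"{base}/openai/deployments/{deployment}/responses?api-version={api_version}"
    # OpenAI: plain /v1/responses; authenticate with Authorization: Bearer.
    return f"{base}/responses"
```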
Important: Some Azure resources/api-versions still return 404 for the Responses route even with the correct URL. If client.responses.create(...) 404s even in the Azure Portal sample, then /responses is not available for your resource yet.
Practical solution: use /chat/completions as a fallback (runtime switch)
To keep the agent useful across Azure environments, add a runtime flag:
- --azure-mode responses → use /responses only (fail on 404)
- --azure-mode chat → use /chat/completions only
- --azure-mode auto → try /responses, and if Azure returns 404, fall back to /chat/completions
This lets you keep the best route where available, and still generate docs in environments where /responses isn’t exposed yet.
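The routing decision itself is tiny. A sketch, with the two callables standing in for the real /responses and /chat/completions HTTP calls:

```python
# Minimal sketch of the --azure-mode routing described above.
class RouteNotFound(Exception):
    """Stands in for Azure answering 404 on the Responses route."""

def generate_text(mode: str, call_responses, call_chat) -> str:
    if mode == "chat":
        return call_chat()
    try:
        return call_responses()      # preferred route
    except RouteNotFound:
        if mode == "auto":
            return call_chat()       # graceful fallback on 404
        raise                        # mode == "responses": fail loudly
```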
In this Python DocAgent implementation:
- If api_base contains openai.azure.com, we automatically switch to Azure mode.
- The tool treats --model / --summarize-model as deployment names in Azure mode.
- If your deployment names differ, pass:
--deployment <name>
--summarize-deployment <name>
Example:
$env:AZURE_OPENAI_API_KEY="YOUR_AZURE_KEY"
python tools\DocAgentPy\docagent_cli.py --repo "C:\git\myproject" `
--api-base "https://<resource>.openai.azure.com/" `
--api-version "2024-12-01-preview" `
--deployment "<deployment-name>" `
--summarize-deployment "<summarize-deployment-name>" `
--azure-mode chat

Note: In my case, the Azure OpenAI /responses route was not enabled, hence I used chat mode.
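The deployment-name override boils down to one conditional, mirrored in the CLI code below:

```python
from typing import Optional

# Pick what to send as "model": in Azure mode the value on the wire
# must be a *deployment* name, so an explicit --deployment wins.
def resolve_model(api_base: str, model: str, deployment: Optional[str]) -> str:
    is_azure = "openai.azure.com" in api_base.lower()
    return deployment if (is_azure and deployment) else model
```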
Full copy/paste implementation (all files)
Everything below is intended to be copied into the same paths.
DocAgentPy/requirements.txt
requests==2.32.3
openai==1.60.0

DocAgentPy/docagent/__init__.py
__all__ = [
"openai_responses",
"repo_scanner",
"project_detector",
"generator",
]

DocAgentPy/docagent/openai_responses.py
from __future__ import annotations
import json
from dataclasses import dataclass
from typing import Any, Optional
import requests
try:
# Optional: official SDK supports AzureOpenAI and handles routing/versioning.
from openai import AzureOpenAI, OpenAI # type: ignore
except Exception: # pragma: no cover
AzureOpenAI = None # type: ignore
OpenAI = None # type: ignore
@dataclass(frozen=True)
class OpenAIConfig:
api_key: str
api_base: str = "https://api.openai.com/v1"
# Azure OpenAI support (optional). If api_base contains "openai.azure.com", Azure mode is auto-enabled.
api_version: str = "2024-12-01-preview"
# Azure routing strategy:
# - "responses": use Responses API only (fail if 404)
# - "chat": use Chat Completions only
# - "auto": try Responses, and if Azure returns 404, fall back to Chat Completions
azure_mode: str = "responses"
class OpenAIResponsesClient:
"""
Minimal client for OpenAI Responses API.
Uses the official OpenAI SDK when available; falls back to requests.
"""
def __init__(self, cfg: OpenAIConfig, session: Optional[requests.Session] = None) -> None:
self._cfg = cfg
self._session = session or requests.Session()
def create_response_text(self, *, model: str, system_prompt: str, user_prompt: str, timeout_s: int = 120) -> str:
is_azure = "openai.azure.com" in self._cfg.api_base.lower()
# Prefer the official SDK when available. It matches Azure Portal examples and avoids URL pitfalls.
if is_azure and AzureOpenAI is not None:
client = AzureOpenAI(azure_endpoint=self._cfg.api_base, api_key=self._cfg.api_key, api_version=self._cfg.api_version)
mode = (self._cfg.azure_mode or "responses").lower()
def do_chat() -> str:
chat = client.chat.completions.create(
model=model, # Azure: model == deployment name
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
)
content = getattr(chat.choices[0].message, "content", None)
if isinstance(content, str) and content.strip():
return content.strip()
raise RuntimeError("Azure chat completion returned no message content.")
if mode == "chat":
return do_chat()
if mode in {"responses", "auto"}:
try:
resp = client.responses.create(
model=model, # Azure: model == deployment name
input=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
)
return _extract_output_text_from_sdk(resp)
except Exception as e:
if mode == "auto" and "404" in str(e):
return do_chat()
raise
raise ValueError(f"Invalid azure_mode: {self._cfg.azure_mode!r}. Expected 'responses', 'chat', or 'auto'.")
if (not is_azure) and OpenAI is not None:
client = OpenAI(api_key=self._cfg.api_key, base_url=self._cfg.api_base)
resp = client.responses.create(
model=model,
input=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
)
return _extract_output_text_from_sdk(resp)
# Fallback: raw HTTP (works for OpenAI and for Azure when SDK isn't available)
if is_azure:
mode = (self._cfg.azure_mode or "responses").lower()
deployment = model
headers = {"api-key": self._cfg.api_key, "Content-Type": "application/json", "Accept": "application/json"}
def do_chat_http() -> str:
chat_url = self._cfg.api_base.rstrip("/") + f"/openai/deployments/{deployment}/chat/completions?api-version={self._cfg.api_version}"
chat_payload: dict[str, Any] = {
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
}
chat_resp = self._session.post(chat_url, headers=headers, data=json.dumps(chat_payload), timeout=timeout_s)
chat_text = chat_resp.text
if chat_resp.status_code // 100 != 2:
raise RuntimeError(f"OpenAI error ({chat_resp.status_code}) calling {chat_url}: {chat_text}")
chat_data = json.loads(chat_text)
choice0 = (chat_data.get("choices") or [{}])[0]
msg = choice0.get("message") or {}
content = msg.get("content")
if isinstance(content, str) and content.strip():
return content.strip()
raise RuntimeError(f"Azure chat completion returned no message content: {chat_text}")
if mode == "chat":
return do_chat_http()
url = self._cfg.api_base.rstrip("/") + f"/openai/deployments/{deployment}/responses?api-version={self._cfg.api_version}"
payload: dict[str, Any] = {
"input": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
}
else:
url = self._cfg.api_base.rstrip("/") + "/responses"
headers = {
"Authorization": f"Bearer {self._cfg.api_key}",
"Content-Type": "application/json",
"Accept": "application/json",
}
payload = {
"model": model,
"input": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
}
resp = self._session.post(url, headers=headers, data=json.dumps(payload), timeout=timeout_s)
text = resp.text
if resp.status_code // 100 == 2:
return _extract_output_text(text)
if is_azure and (self._cfg.azure_mode or "responses").lower() == "auto" and resp.status_code == 404:
return do_chat_http()
raise RuntimeError(f"OpenAI error ({resp.status_code}) calling {url}: {text}")
def _extract_output_text_from_sdk(resp: Any) -> str:
"""
Extract output text from OpenAI SDK response objects.
"""
# New SDKs expose output_text directly.
t = getattr(resp, "output_text", None)
if isinstance(t, str) and t.strip():
return t.strip()
# Otherwise, fall back to JSON parsing using our tolerant extractor.
raw: Optional[str] = None
if hasattr(resp, "model_dump_json"):
raw = resp.model_dump_json() # pydantic
elif hasattr(resp, "to_json"):
raw = resp.to_json()
if isinstance(raw, str) and raw.strip():
return _extract_output_text(raw)
# Last resort: stringify via __dict__ (may be incomplete).
return _extract_output_text(json.dumps(resp, default=lambda o: getattr(o, "__dict__", str(o))))
def _extract_output_text(raw_json: str) -> str:
"""
Extract concatenated output text from a Responses API payload.
Kept intentionally tolerant to minor schema changes.
"""
data = json.loads(raw_json)
# Common shape: output: [ { content: [ { type:"output_text", text:"..." }, ... ] }, ... ]
out = data.get("output")
if isinstance(out, list):
parts: list[str] = []
for item in out:
content = item.get("content") if isinstance(item, dict) else None
if not isinstance(content, list):
continue
for c in content:
if not isinstance(c, dict):
continue
if c.get("type") == "output_text" and isinstance(c.get("text"), str):
parts.append(c["text"])
joined = "".join(parts).strip()
if joined:
return joined
# Fallback: some payloads expose output_text directly
if isinstance(data.get("output_text"), str):
return data["output_text"].strip()
raise RuntimeError("Could not extract output text from OpenAI response.")

DocAgentPy/docagent/repo_scanner.py
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable
@dataclass(frozen=True)
class ScannedFile:
relative_path: str
size_bytes: int
content: str
@dataclass(frozen=True)
class ScannedRepo:
repo_root: str
tree: str
files: list[ScannedFile]
IGNORED_DIRS = {".git", ".vs", "bin", "obj", "node_modules", "packages", ".idea", "__pycache__"}
HIGH_SIGNAL_EXTS = {
# .NET
".csproj", ".sln", ".slnx", ".cs", ".fs", ".vb",
# Python
".py",
# Node
".js", ".mjs", ".cjs", ".ts", ".tsx",
# Docs/config
".md", ".json", ".yml", ".yaml", ".toml", ".ini", ".env",
# scripts
".ps1", ".sh", ".cmd", ".bat",
}
HIGH_SIGNAL_NAMES = {
# General
"README.md", "LICENSE", "Dockerfile", "docker-compose.yml", "docker-compose.yaml", "global.json",
# .NET
"Program.cs",
# Node
"package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "next.config.js", "next.config.mjs",
# Python
"pyproject.toml", "requirements.txt", "Pipfile", "Pipfile.lock", "poetry.lock", "setup.py", "setup.cfg", "manage.py",
# env
".env", ".env.example",
}
def list_repo_files(repo_root: str, max_files: int = 10_000) -> list[str]:
root = Path(repo_root).resolve()
out: list[str] = []
for p in _enumerate_files(root):
out.append(str(p.relative_to(root)))
if len(out) >= max_files:
break
return out
def scan_repo(repo_root: str, max_files: int = 80, max_chars_per_file: int = 18_000, tree_max_entries: int = 250) -> ScannedRepo:
root = Path(repo_root).resolve()
all_files = list(_enumerate_files(root))
ranked = sorted(all_files, key=lambda p: (-_score_file(root, p), len(str(p))))
ranked = ranked[:max_files]
scanned: list[ScannedFile] = []
for p in ranked:
rel = str(p.relative_to(root))
size = p.stat().st_size
scanned.append(ScannedFile(rel, size, _read_text_best_effort(p, max_chars=max_chars_per_file)))
return ScannedRepo(str(root), _build_tree(root, tree_max_entries), scanned)
def _enumerate_files(root: Path) -> Iterable[Path]:
stack = [root]
while stack:
d = stack.pop()
for child in d.iterdir():
if child.is_dir():
if child.name in IGNORED_DIRS:
continue
stack.append(child)
continue
if child.is_file() and not _is_ignored_file(child):
yield child
def _is_ignored_file(p: Path) -> bool:
name = p.name
if name.lower().endswith((".dll", ".exe", ".pdb", ".cache", ".user", ".suo")):
return True
ext = p.suffix.lower()
if ext in HIGH_SIGNAL_EXTS:
return False
if name in HIGH_SIGNAL_NAMES:
return False
return True
def _score_file(root: Path, p: Path) -> int:
rel = str(p.relative_to(root)).replace("\\", "/")
name = p.name
ext = p.suffix.lower()
score = 0
if name in HIGH_SIGNAL_NAMES:
score += 200
if name.lower().endswith(".csproj"):
score += 180
if name.lower().endswith((".sln", ".slnx")):
score += 170
if name == "Program.cs":
score += 140
if name.lower().startswith("appsettings"):
score += 120
if rel.startswith(".github/workflows/"):
score += 110
if "/docker" in rel.lower() or name.lower().startswith("docker"):
score += 90
if ext == ".md":
score += 60
if ext in {".cs", ".py", ".ts", ".js"}:
score += 50
if ext in {".json", ".yml", ".yaml", ".toml"}:
score += 40
# prefer top-level
score += max(0, 15 - rel.count("/"))
return score
def _read_text_best_effort(p: Path, max_chars: int) -> str:
try:
text = p.read_text(encoding="utf-8")
except UnicodeDecodeError:
text = p.read_text(encoding="utf-8", errors="replace")
except Exception:
return "<unreadable>"
if len(text) <= max_chars:
return text
return text[:max_chars] + "\n\n... (truncated)\n"
def _build_tree(root: Path, max_entries: int) -> str:
items: list[str] = []
for i, p in enumerate(sorted(_enumerate_files(root), key=lambda x: str(x).lower())):
if i >= max_entries:
items.append("... (tree truncated)")
break
items.append(str(p.relative_to(root)).replace("\\", "/"))
return "\n".join(items)

DocAgentPy/docagent/project_detector.py
from __future__ import annotations
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable
class PrimaryLanguage:
UNKNOWN = "Unknown"
DOTNET = "DotNet"
NODE = "Node"
PYTHON = "Python"
class AppKind:
UNKNOWN = "Unknown"
CONSOLE_OR_LIBRARY = "ConsoleOrLibrary"
WEB_APP = "WebApp"
@dataclass(frozen=True)
class DetectedProject:
language: str
kind: str
confidence: str
evidence: list[str]
def detect_project(repo_root: str, repo_files: Iterable[str]) -> DetectedProject:
repo_files = list(repo_files)
evidence: list[str] = []
# ---- .NET ----
csproj = next((p for p in repo_files if p.lower().endswith(".csproj")), None)
if csproj:
evidence.append(f"Found C# project file: {csproj}")
kind = AppKind.CONSOLE_OR_LIBRARY
try:
xml = Path(repo_root, csproj).read_text(encoding="utf-8", errors="replace")
if "Microsoft.NET.Sdk.Web".lower() in xml.lower():
kind = AppKind.WEB_APP
evidence.append("csproj uses Microsoft.NET.Sdk.Web (ASP.NET Core Web app).")
elif any("/Controllers/" in p.replace("\\", "/") for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found Controllers/ folder (likely ASP.NET Web API).")
except Exception:
pass
return DetectedProject(PrimaryLanguage.DOTNET, kind, "high", evidence)
# ---- Node ----
pkg = next((p for p in repo_files if Path(p).name.lower() == "package.json"), None)
if pkg:
evidence.append("Found package.json (Node project).")
kind = AppKind.CONSOLE_OR_LIBRARY
try:
raw = Path(repo_root, pkg).read_text(encoding="utf-8", errors="replace")
data = json.loads(raw)
deps = data.get("dependencies") or {}
deps_txt = json.dumps(deps).lower()
if any(Path(p).name.lower() in {"next.config.js", "next.config.mjs"} for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found next.config.* (Next.js web app).")
elif any("/src/app/" in p.replace("\\", "/").lower() or "/pages/" in p.replace("\\", "/").lower() for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found common web app folders (src/app or pages).")
elif any(x in deps_txt for x in ["react", "next", "express", "fastify"]):
kind = AppKind.WEB_APP
evidence.append("package.json dependencies indicate a web framework (react/next/express/fastify).")
except Exception:
pass
return DetectedProject(PrimaryLanguage.NODE, kind, "medium", evidence)
# ---- Python ----
has_pyproject = any(Path(p).name.lower() == "pyproject.toml" for p in repo_files)
has_requirements = any(Path(p).name.lower() == "requirements.txt" for p in repo_files)
has_setup_py = any(Path(p).name.lower() == "setup.py" for p in repo_files)
if has_pyproject or has_requirements or has_setup_py:
if has_pyproject:
evidence.append("Found pyproject.toml (Python project).")
if has_requirements:
evidence.append("Found requirements.txt (Python project).")
if has_setup_py:
evidence.append("Found setup.py (Python project).")
kind = AppKind.CONSOLE_OR_LIBRARY
if any(Path(p).name.lower() == "manage.py" for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found manage.py (likely Django web app).")
elif any(Path(p).name.lower() in {"app.py", "wsgi.py", "asgi.py"} for p in repo_files):
kind = AppKind.WEB_APP
evidence.append("Found common Python web entrypoints (app.py/wsgi.py/asgi.py).")
return DetectedProject(PrimaryLanguage.PYTHON, kind, "medium", evidence)
return DetectedProject(PrimaryLanguage.UNKNOWN, AppKind.UNKNOWN, "low", ["No strong project markers found."])

DocAgentPy/docagent/generator.py
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional
from .openai_responses import OpenAIResponsesClient
from .project_detector import DetectedProject, PrimaryLanguage, AppKind, detect_project
from .repo_scanner import ScannedRepo, list_repo_files, scan_repo
@dataclass(frozen=True)
class GenerateRequest:
repo_root: str
readme_path: str
docs_dir: Optional[str]
readme_model: str
summarize_model: str
max_files: int
@dataclass(frozen=True)
class GeneratedDoc:
relative_path: str
markdown: str
@dataclass(frozen=True)
class GenerateResult:
readme_markdown: str
additional_docs: list[GeneratedDoc]
class DocumentationGenerator:
def __init__(self, openai: OpenAIResponsesClient) -> None:
self._openai = openai
def generate(self, req: GenerateRequest) -> GenerateResult:
repo_files = list_repo_files(req.repo_root)
detected = detect_project(req.repo_root, repo_files)
repo = scan_repo(req.repo_root, max_files=req.max_files)
summaries = self._summarize_files(repo, req.summarize_model)
readme = self._generate_readme(detected, repo, summaries, req.readme_model)
docs: list[GeneratedDoc] = []
if req.docs_dir:
docs.append(GeneratedDoc("PROJECT_OVERVIEW.md", self._generate_project_overview(detected, repo, summaries, req.readme_model)))
docs.append(GeneratedDoc("HOW_TO_RUN.md", self._generate_how_to_run(detected, repo, summaries, req.readme_model)))
return GenerateResult(readme, docs)
def _summarize_files(self, repo: ScannedRepo, model: str) -> list[tuple[str, str]]:
out: list[tuple[str, str]] = []
system = "You are a senior engineer summarizing repository files for documentation. Be accurate and concrete."
for f in repo.files:
user = f"""Summarize this file for documentation purposes.
Rules:
- Focus on what a reader needs for README/docs: what it does, how it’s used, key commands/config, gotchas.
- Keep it short (5-12 bullets). Include relevant command examples if present.
- If the file is clearly irrelevant to running/understanding the project, say so.
FILE: {f.relative_path}
SIZE_BYTES: {f.size_bytes}
CONTENT:
{f.content}
"""
summary = self._openai.create_response_text(model=model, system_prompt=system, user_prompt=user).strip()
out.append((f.relative_path, summary))
return out
def _generate_readme(self, detected: DetectedProject, repo: ScannedRepo, summaries: list[tuple[str, str]], model: str) -> str:
system = (
"You write accurate READMEs for software repos.\n"
"Do NOT guess. If info is missing, add a TODO section instead of hallucinating.\n"
"Prefer actionable steps and concrete commands.\n"
"Return ONLY markdown.\n"
)
user = f"""Create a README.md for this repository.
Detected project (heuristic):
- Language: {detected.language}
- Kind: {detected.kind}
- Confidence: {detected.confidence}
- Evidence:
{_format_evidence(detected)}
Repository root: {repo.repo_root}
Repository tree (truncated):
{repo.tree}
File summaries:
{_format_summaries(summaries)}
README requirements:
- Title + one-paragraph overview
- Features (bullets)
- Prerequisites
- Quickstart (commands)
- Configuration (env vars / config files if present)
- Project structure (top-level overview)
- Troubleshooting (common build/run issues)
- If a license file exists in the tree, mention it; otherwise omit.
- Add a TODO section ONLY when necessary (unknown ports/env vars/etc).
Stack-specific guidance (follow if applicable; otherwise use TODO):
{_stack_guidance(detected)}
Important: Keep it friendly and easy for a beginner to run.
"""
return self._openai.create_response_text(model=model, system_prompt=system, user_prompt=user).strip() + "\n"
def _generate_project_overview(self, detected: DetectedProject, repo: ScannedRepo, summaries: list[tuple[str, str]], model: str) -> str:
system = (
"You write an easy-to-understand project overview document.\n"
"Do NOT guess; call out unknowns as TODO.\n"
"Return ONLY markdown.\n"
)
user = f"""Write docs/PROJECT_OVERVIEW.md for this repository.
Detected project:
- Language: {detected.language}
- Kind: {detected.kind}
- Confidence: {detected.confidence}
Include:
- What problem it solves
- What’s inside (major components/files)
- Key flows / entrypoints
- How to extend it safely
Repo tree:
{repo.tree}
Summaries:
{_format_summaries(summaries)}
"""
return self._openai.create_response_text(model=model, system_prompt=system, user_prompt=user).strip() + "\n"
def _generate_how_to_run(self, detected: DetectedProject, repo: ScannedRepo, summaries: list[tuple[str, str]], model: str) -> str:
system = (
"You write a concise runbook for developers.\n"
"Do NOT guess. Prefer exact dotnet/npm/etc commands found in the repo summaries.\n"
"Return ONLY markdown.\n"
)
user = f"""Write docs/HOW_TO_RUN.md for this repository.
Include:
- Prerequisites
- Build
- Run
- Test (if applicable)
- Common issues and fixes
Detected project:
- Language: {detected.language}
- Kind: {detected.kind}
- Confidence: {detected.confidence}
Stack-specific guidance:
{_stack_guidance(detected)}
Repo tree:
{repo.tree}
Summaries:
{_format_summaries(summaries)}
"""
return self._openai.create_response_text(model=model, system_prompt=system, user_prompt=user).strip() + "\n"
def _format_summaries(summaries: list[tuple[str, str]]) -> str:
parts: list[str] = []
for path, summary in summaries:
parts.append(f"### {path}\n{summary}\n")
return "\n".join(parts).strip()
def _format_evidence(detected: DetectedProject) -> str:
if not detected.evidence:
return " - (none)"
return "\n".join([f" - {e}" for e in detected.evidence])
def _stack_guidance(detected: DetectedProject) -> str:
# Guidance, not facts — model must still rely on repo summaries and use TODO if uncertain.
if detected.language == PrimaryLanguage.DOTNET and detected.kind == AppKind.WEB_APP:
return (
"- Prefer ASP.NET Core style instructions.\n"
"- Typical commands (use only if confirmed by repo): `dotnet restore`, `dotnet build`, `dotnet run`, `dotnet test`.\n"
"- Mention `appsettings*.json`, `launchSettings.json`, and env vars if present.\n"
"- If ports/urls are unclear, add TODO rather than guessing.\n"
)
if detected.language == PrimaryLanguage.DOTNET:
return (
"- Prefer .NET style instructions.\n"
"- Typical commands (use only if confirmed by repo): `dotnet restore`, `dotnet build`, `dotnet run`, `dotnet test`.\n"
"- Mention TargetFramework if present.\n"
)
if detected.language == PrimaryLanguage.NODE and detected.kind == AppKind.WEB_APP:
return (
"- Prefer Node web app instructions.\n"
"- Detect package manager from lockfile: pnpm/yarn/npm.\n"
"- Typical commands (use only if confirmed by repo): install deps, run dev server, build, test.\n"
"- Mention `.env` / `.env.example` if present.\n"
)
if detected.language == PrimaryLanguage.NODE:
return (
"- Prefer Node app/library instructions.\n"
"- Detect package manager from lockfile: pnpm/yarn/npm.\n"
"- Typical commands (use only if confirmed by repo): install, build, test.\n"
)
if detected.language == PrimaryLanguage.PYTHON and detected.kind == AppKind.WEB_APP:
return (
"- Prefer Python web app instructions.\n"
"- Mention creating a venv and installing deps (`requirements.txt` or `pyproject.toml`).\n"
"- If framework is unclear, add TODO rather than guessing.\n"
)
if detected.language == PrimaryLanguage.PYTHON:
return (
"- Prefer Python app/library instructions.\n"
"- Mention creating a venv and installing deps (`requirements.txt` or `pyproject.toml`).\n"
)
return "- Language/framework is unclear: keep instructions generic and add TODOs for missing run commands.\n"

DocAgentPy/docagent_cli.py
from __future__ import annotations
import argparse
import os
from pathlib import Path
from docagent.generator import DocumentationGenerator, GenerateRequest
from docagent.openai_responses import OpenAIConfig, OpenAIResponsesClient
from docagent.project_detector import detect_project
from docagent.repo_scanner import list_repo_files
def main() -> int:
p = argparse.ArgumentParser(
prog="docagent",
description="DocAgent (Python): generate README/docs for a repository using OpenAI.",
)
p.add_argument("--repo", default=".", help="Repo/project root to document (default: current directory)")
p.add_argument("--readme", default=None, help="Output README path (default: <repo>/README.md)")
p.add_argument("--docs", default=None, help="Also write docs into this directory (e.g. <repo>/docs)")
p.add_argument("--api-key", default=None, help="OpenAI API key (default: OPENAI_API_KEY env var)")
p.add_argument("--api-base", default="https://api.openai.com/v1", help="API base URL (OpenAI: https://api.openai.com/v1, Azure: https://<resource>.openai.azure.com/)")
p.add_argument("--api-version", default="2024-12-01-preview", help="Azure OpenAI api-version (used only for *.openai.azure.com)")
p.add_argument(
"--azure-mode",
default="responses",
choices=["responses", "chat", "auto"],
help="Azure routing: responses|chat|auto (auto: try /responses then fall back to /chat/completions on 404).",
)
p.add_argument("--model", default="gpt-4.1", help="Model for README/docs generation")
p.add_argument("--summarize-model", default="gpt-4.1", help="Model for file summarization")
p.add_argument("--deployment", default=None, help="Azure OpenAI deployment name for generation (overrides --model in Azure mode)")
p.add_argument("--summarize-deployment", default=None, help="Azure OpenAI deployment name for summarization (overrides --summarize-model in Azure mode)")
p.add_argument("--max-files", type=int, default=80, help="Max files to summarize")
p.add_argument("--dry-run", action="store_true", help="Print markdown to stdout (don’t write files)")
p.add_argument("--explain-detection", action="store_true", help="Print detection evidence")
args = p.parse_args()
repo_root = str(Path(args.repo).resolve())
readme_path = str(Path(args.readme).resolve()) if args.readme else str(Path(repo_root, "README.md"))
docs_dir = str(Path(args.docs).resolve()) if args.docs else None
# Support either OpenAI or Azure OpenAI key via env vars.
api_key = args.api_key or os.environ.get("OPENAI_API_KEY") or os.environ.get("AZURE_OPENAI_API_KEY")
if not api_key:
print("Missing API key. Pass --api-key or set OPENAI_API_KEY (OpenAI) / AZURE_OPENAI_API_KEY (Azure).")
return 2
repo_files = list_repo_files(repo_root)
detected = detect_project(repo_root, repo_files)
print(f"Detected: {detected.language} / {detected.kind} (confidence: {detected.confidence})")
if args.explain_detection:
for e in detected.evidence:
print(f"- {e}")
# If using Azure OpenAI, treat "model" as deployment name unless explicitly overridden.
is_azure = "openai.azure.com" in str(args.api_base).lower()
readme_model = args.deployment if (is_azure and args.deployment) else args.model
summarize_model = args.summarize_deployment if (is_azure and args.summarize_deployment) else args.summarize_model
client = OpenAIResponsesClient(
OpenAIConfig(api_key=api_key, api_base=args.api_base, api_version=args.api_version, azure_mode=args.azure_mode)
)
generator = DocumentationGenerator(client)
result = generator.generate(
GenerateRequest(
repo_root=repo_root,
readme_path=readme_path,
docs_dir=docs_dir,
readme_model=readme_model,
summarize_model=summarize_model,
max_files=args.max_files,
)
)
if args.dry_run:
print(result.readme_markdown)
if result.additional_docs:
print("\n---- Additional docs ----")
for doc in result.additional_docs:
print(f"\n# {doc.relative_path}\n{doc.markdown}")
return 0
Path(readme_path).parent.mkdir(parents=True, exist_ok=True)
Path(readme_path).write_text(result.readme_markdown, encoding="utf-8")
print(f"Wrote {readme_path}")
if docs_dir:
Path(docs_dir).mkdir(parents=True, exist_ok=True)
for doc in result.additional_docs:
out_path = Path(docs_dir, doc.relative_path)
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(doc.markdown, encoding="utf-8")
print(f"Wrote {out_path}")
return 0
if __name__ == "__main__":
raise SystemExit(main())

DocAgentPy/README.md (optional)
# DocAgentPy (Python documentation generator)
This is a Python version of **DocAgent**. It scans a repo folder and generates:
- a `README.md`
- optional additional docs in a `docs/` directory
It uses the OpenAI **Responses API** via `requests` (no IDE plugins required).
## Setup
From the repository root:
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r tools\DocAgentPy\requirements.txt
```

Set your API key:
$env:OPENAI_API_KEY="YOUR_KEY_HERE"

## Run
Generate README for a repo:
python docagent_cli.py --repo "C:\git\myproject"

Generate README + extra docs:

python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs"

Dry run:

python docagent_cli.py --repo "C:\git\myproject" --docs "C:\git\myproject\docs" --dry-run

Explain detection:

python docagent_cli.py --repo "C:\git\myproject" --explain-detection --dry-run

Hope this helps you. Happy documenting!