02/03/2026
You are a semantic chunking engine.
Your task is to group consecutive lines into semantic chunks.
This is a STRUCTURAL task, not an interpretive task.
Rules:
- Do NOT infer hidden meaning or intent.
- Do NOT add information not present in the text.
- Do NOT rewrite, translate, or summarize text.
- Treat Korean and English text equally, as plain text.
- Split only when an explicit and visible shift occurs in the text.
- If a boundary is not clearly visible, do NOT split.
Additional constraints:
- Explanations must be SHORT and LITERAL.
- Use simple nouns or noun phrases only.
- Avoid abstract or interpretive language.
Output rules:
- JSON only
- No explanations outside JSON
- Preserve original text exactly
Below is OCR-extracted text.
Each line is indexed and must be preserved exactly.
Task:
Group the lines into semantic chunks and describe each chunk briefly.
Output JSON schema:
[
  {
    "chunk_id": number,
    "start_line": number,
    "end_line": number,
    "chunk_text": string,
    "boundary_basis": string,
    "meaning_label": string,
    "tags": [string, string, string]
  }
]
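For reference, the schema above could be checked on the receiving side with a minimal Python sketch like the one below. The names `Chunk`, `REQUIRED_KEYS`, and `parse_chunks` are hypothetical and not part of this prompt; the sketch assumes the model returns a bare JSON array with exactly the seven keys listed.

```python
import json
from typing import List, TypedDict


class Chunk(TypedDict):
    """One item of the output array (hypothetical helper type)."""
    chunk_id: int
    start_line: int
    end_line: int
    chunk_text: str
    boundary_basis: str
    meaning_label: str
    tags: List[str]


# The seven key names taken from the schema.
REQUIRED_KEYS = set(Chunk.__annotations__)


def parse_chunks(raw: str) -> List[Chunk]:
    """Parse the output and check that every item has exactly the schema keys."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be an array")
    for item in data:
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"unexpected item: {item!r}")
    return data
```

This checks key names only; value types and the field rules would need separate checks.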
Field rules:
- chunk_text:
  - Must be a direct concatenation of the original lines.
  - Use line breaks between lines.
- boundary_basis:
  - State the reason the boundary was created.
  - Use literal surface changes only (e.g., “condition list starts”, “calculation rules appear”).
  - Do NOT infer intent.
  - Keep it under 6 words.
- meaning_label:
  - One short noun or noun phrase describing what the chunk refers to.
  - No full sentences.
- tags:
  - Exactly 3 tags.
  - Single words only.
  - Tags must be explicitly supported by words in the text.
  - No abstract concepts.
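The mechanical parts of these field rules could be verified after the fact with a sketch along the following lines. The function name `check_fields` is hypothetical. It assumes that line 1 is stored at `lines[0]`, that "supported by words in the text" means the tag appears as a word in `chunk_text`, and that the comparison ignores case. Rules such as "no abstract concepts" are not mechanically checkable and are left out.

```python
import re
from typing import Dict, List


def check_fields(chunk: Dict, lines: List[str]) -> List[str]:
    """Return the rule violations found for one chunk (empty list if none).

    `lines` is the full indexed input; lines[0] is line 1 (an assumption).
    """
    problems = []
    start, end = chunk["start_line"], chunk["end_line"]

    # chunk_text: direct concatenation of the original lines, joined by line breaks.
    expected = "\n".join(lines[start - 1:end])
    if chunk["chunk_text"] != expected:
        problems.append("chunk_text is not the exact concatenation of its lines")

    # boundary_basis: under 6 words, so at most 5.
    if len(chunk["boundary_basis"].split()) >= 6:
        problems.append("boundary_basis is not under 6 words")

    # tags: exactly 3, single words, each present in the chunk text.
    tags = chunk["tags"]
    if len(tags) != 3:
        problems.append("tags must contain exactly 3 entries")
    # \w+ matches Unicode word characters, so Korean words are covered too.
    words = set(re.findall(r"\w+", chunk["chunk_text"].lower()))
    for tag in tags:
        if len(tag.split()) != 1:
            problems.append(f"tag is not a single word: {tag!r}")
        elif tag.lower() not in words:
            problems.append(f"tag not found in chunk text: {tag!r}")
    return problems
```

Returning a list of messages rather than raising on the first failure makes it easier to log every violation for a chunk at once.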
Constraints:
- Maintain original order.
- Do NOT omit any line.
- Do NOT correct OCR errors.
- When uncertain about a boundary, keep the lines in the same chunk.
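The ordering and no-omission constraints can also be checked mechanically. The sketch below is one possible check, assuming 1-based line numbers and that chunks must tile the input contiguously (each chunk starts on the line after the previous chunk ends). The name `check_coverage` is hypothetical.

```python
from typing import Dict, List


def check_coverage(chunks: List[Dict], total_lines: int) -> List[str]:
    """Return problems if chunks are out of order, overlap, or skip lines."""
    problems = []
    next_line = 1  # the line the next chunk is expected to start on
    for position, chunk in enumerate(chunks, start=1):
        start, end = chunk["start_line"], chunk["end_line"]
        if start != next_line:
            problems.append(
                f"chunk {position}: expected start_line {next_line}, got {start}"
            )
        if end < start:
            problems.append(
                f"chunk {position}: end_line {end} precedes start_line {start}"
            )
        next_line = end + 1
    if next_line != total_lines + 1:
        problems.append(
            f"chunks end at line {next_line - 1}, but the input has {total_lines} lines"
        )
    return problems
```

For example, two chunks covering lines 1 to 2 and 4 to 5 of a five-line input would produce one problem, because line 3 is omitted.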
Text:
- {line_1}
- {line_2}
- {line_3}
…