Indian Court Arguments & Drafts for LLM Training
India’s e-courts ecosystem now makes large volumes of pleadings, written submissions, paper-books, affidavits, petitions and drafting templates openly obtainable—if you know where to look and how to extract them. Below is a detailed, step-by-step blueprint for sourcing, downloading, processing and legally using these high-value materials to fine-tune a Legal-Research LLM or populate a VectorDB.
Overview
Most Indian litigants file six core “argument” artefacts:
- Petitions / plaints / suits (initiate a proceeding)
- Counter-affidavits & replies
- Written submissions / synopses / list-of-dates (post-hearing road-maps of arguments)
- Compilations (authorities, case-law bundles)
- Paper-books (complete record: pleadings, evidence, orders)
- Misc. applications (interim reliefs, amendments, condonation, etc.)
All six are now posted digitally—either:
- Automatically by court registry (e-filing portals, 3-PDF paper-books, scanned record-room dumps) or
- Voluntarily by parties (Indian Kanoon, Bar & Bench, SpicyIP, NGO litigation pages).
The sections that follow explain where to obtain each artefact, access requirements, scraping tips, recommended preprocessing and licence notes.
1. Supreme Court of India – richest single trove of arguments
1.1 “3-PDF” Paper-Books (live matters)
| Feature | Details | Access route | Licensing |
|---|---|---|---|
| Content | PDF-Main (pleadings & impugned orders) + PDF-Additional (counter-affidavits, rejoinders, IAs) + PDF-ROP/OR (Record-of-Proceedings & Office Reports) | Available to every Advocate-on-Record (AOR) in e-Filing 2.0 dashboard; download as ZIP via “Paper Book (3-PDF)” link | © SCI OGL-v2 by default; free reuse with citation[1][2] |
| Coverage | All matters listed before Court No. 1 since Jan 2024 (full roll-out expected 2025) | Need AOR credentials → partner programme or employ in-house AOR | OGL v2[3] |
Scraping workflow
- Script a headless login that picks up fresh ZIPs daily (cookies + OTP).
- Extract PDFs; run OCR (Tesseract 5 with `--psm 1`) because some annexures are scans.
- Chunk via bookmarks (`/Title`) into logical segments: Facts, Questions, Grounds, Prayer.
- Store in object storage (e.g., S3) with metadata: `item_id`, `diary_no`, `party_role`, `doc_type`, `page_span`.
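The bookmark-chunking and metadata steps of this workflow might look like the following in Python. This is a minimal sketch: it assumes the bookmarks have already been read from the PDF as `(title, start_page)` pairs by whatever PDF library you use, and `bookmarks_to_segments` and `build_metadata` are hypothetical helper names, not part of any court API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    title: str
    first_page: int  # 1-indexed, inclusive
    last_page: int   # 1-indexed, inclusive


def bookmarks_to_segments(bookmarks, total_pages):
    """Turn (title, start_page) bookmarks into contiguous page spans."""
    ordered = sorted(bookmarks, key=lambda b: b[1])
    segments = []
    for i, (title, start) in enumerate(ordered):
        # A segment ends one page before the next bookmark starts,
        # or on the last page of the PDF for the final bookmark.
        if i + 1 < len(ordered):
            end = ordered[i + 1][1] - 1
        else:
            end = total_pages
        # Two bookmarks on the same page yield a one-page segment.
        segments.append(Segment(title.strip(), start, max(start, end)))
    return segments


def build_metadata(item_id, diary_no, party_role, doc_type, segment):
    """Assemble the metadata record stored next to each chunk."""
    return {
        "item_id": item_id,
        "diary_no": diary_no,
        "party_role": party_role,
        "doc_type": doc_type,
        "page_span": [segment.first_page, segment.last_page],
    }
```

Keeping `page_span` as an inclusive pair makes it straightforward to re-extract the exact pages from the stored original later.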
1.2 Record-Room Scans (disposed matters)
| Notice | What it means for you |
|---|---|
| 2015 Public Notice on permanent preservation of Part-I case-records[4][5] | Every SC civil appeal (1986-1993 scanned to date) can be ordered in certified e-form; registry is scanning forward chronologically. |
| How to order | File an Inspection / Copying Application under Order XII r.2 (SC Rules 2013). AOR receives a DVD or FTP link. |
| Volume | ~500,000 PDFs (~110 TB); includes complete paper-books with counsel drafts, trial pleadings, evidence lists. |
1.3 Written-Submission PDFs on main site
Many benches now direct “file concisely bookmarked written submissions within 7 days”, and the registry publishes them on its public-facing site[6][7].
Query pattern:
https://api.sci.gov.in/supremecourt/{yyyy}/{diary_no}/{diary_no}_{yyyy}_{X}_{YYY}_{ZZZ}_Order_{dd-MMM-yyyy}.pdf
Use simple keyword filters: "written submissions" filetype:pdf site:api.sci.gov.in.
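To index files found this way, the URL itself can be parsed back into its components. The sketch below only encodes the pattern as written above; the meaning of the `X`, `YYY` and `ZZZ` fields is not documented here, so they are kept as opaque strings, and `parse_order_url` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Mirrors the query pattern above; the diary number and year must repeat
# in the file name exactly as they appear in the path.
ORDER_URL_RE = re.compile(
    r"^https://api\.sci\.gov\.in/supremecourt/"
    r"(?P<year>\d{4})/(?P<diary_no>\d+)/"
    r"(?P=diary_no)_(?P=year)_(?P<x>\d+)_(?P<yyy>\d+)_(?P<zzz>\d+)_"
    r"Order_(?P<order_date>\d{2}-[A-Za-z]{3}-\d{4})\.pdf$"
)


def parse_order_url(url):
    """Return the URL's components as a dict, or None if it does not match."""
    match = ORDER_URL_RE.match(url)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # dd-MMM-yyyy, e.g. 08-Dec-2023
        fields["order_date"] = datetime.strptime(
            fields["order_date"], "%d-%b-%Y"
        ).date()
    except ValueError:
        return None
    fields["year"] = int(fields["year"])
    return fields
```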
2. High Courts & Tribunals
2.1 e-Filing Portals (nine HCs, all NCLT benches)
- Deliver docket PDFs for petitions, counter-affidavits, rejoinders once Scrutiny Cleared.
- Login via advocate credentials; portal URLs: `filing.ecourts.gov.in/`.
- JSON API endpoint `/efiling/viewDocument?docId={}` returns base-64 PDF.
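Decoding such a response could look like the sketch below. The shape of the JSON is an assumption: the field name `pdf` is hypothetical and should be replaced with whatever key the portal actually returns.

```python
import base64
import binascii
import json


def decode_pdf_payload(raw_json, field="pdf"):
    """Decode a base-64 PDF from a JSON response body.

    `field` is a hypothetical key name; adjust it to the real response.
    """
    payload = json.loads(raw_json)
    try:
        data = base64.b64decode(payload[field], validate=True)
    except (KeyError, TypeError, binascii.Error) as exc:
        raise ValueError(f"no valid base-64 data under {field!r}") from exc
    # Every PDF file starts with the "%PDF-" magic bytes.
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded payload is not a PDF")
    return data
```

Checking the magic bytes before writing to storage catches HTML error pages or login redirects that were returned in place of the document.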
2.2 Cause-List Paste-ins & “Case-Documents” Tabs
- Bombay HC & Delhi HC expose “Case Documents” in HTML—including written submissions.
- Scrape with Selenium, throttled to 1 req/3 s to avoid blocking.
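The 1 request per 3 seconds limit can be enforced with a small throttle that is called before every page load. This is an illustrative sketch, independent of Selenium; the `Throttle` class is a hypothetical helper, and the clock and sleep functions are injectable only so that it can be tested without waiting.

```python
import time


class Throttle:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a scraper loop, call `throttle.wait()` immediately before each `driver.get(url)`.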
2.3 NGT, SAT, NCLAT
Orders include the applicant’s annexed synopsis; paper-books are available via RTI under S. 4(1)(b) (proactive disclosure).
3. Voluntary Open-Source Repositories of Pleadings
| Host | Type of document | Highlights | Licence |
|---|---|---|---|
| Indian Kanoon “Petitions” tab | PDF of filed petitions, counter-affidavits in PILs | 35,000+ Supreme Court filings back to 2012 | CC-BY-SA[8] |
| Bar & Bench / LiveLaw / SC Observer | Full‐text petitions & counter-affidavits in constitutional matters | Often uploaded same-day for breaking litigation[9][10] | Public domain (filed in court) |
| SpicyIP Litigation Database | IPR writs, plaints & written statements | 3,800+ PDFs from Delhi HC, Madras HC[11] | CC-BY |
| NGOs (PUCL, CHRI, Centre for Policy Research) | PIL paper-books, expert reports | Structured zip archives | OGL or CC |
Scrape RSS feeds; push to litig-raw bucket; run nightly dedupe by SHA-256.
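The SHA-256 dedupe step can be sketched as follows. It treats two files as duplicates only when their bytes are identical; the function names and the in-memory `dict` input are illustrative assumptions, and a real job would stream files from the bucket instead.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used as the content fingerprint."""
    return hashlib.sha256(data).hexdigest()


def dedupe(docs: dict) -> tuple:
    """Split {name: bytes} into kept originals and exact duplicates.

    Returns (kept, duplicates), where `kept` maps digest -> first name seen
    and `duplicates` maps a duplicate's name -> the name it duplicates.
    """
    kept = {}
    duplicates = {}
    for name in sorted(docs):  # sorted for a deterministic "first seen"
        digest = sha256_hex(docs[name])
        if digest in kept:
            duplicates[name] = kept[digest]
        else:
            kept[digest] = name
    return kept, duplicates
```

Exact hashing will not catch the same petition re-scanned or re-compressed by a different host; those cases need text-level near-duplicate detection.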
4. Standard Draft Templates & Forms (for prompt-engineering examples)
| Form | Source | Why useful | Citation |
|---|---|---|---|
| Supreme Court SLP Form-28 & Order XVI Rule 4 skeleton | Official SCI forms page[12][13] | Perfect “boiler-plate” to teach LLM the fixed parts vs. variable sections | 47 |
| Model counter-affidavit format | SCI affidavit template[14] | Shows jurat, verification language, serial-paragraph structure | 50 |
| Writ-petition & PIL samples | Delhi HC PIL compendium[11] | | 58 |
| Civil plaint drafting guide | iPleaders article + sample plaint[15][16][17] | For training drafting assistant to auto-populate CPC-compliant plaints | 35 |
| “Written-submission” style sheet | Latest Court orders specify font, margin, bookmark rules[8][18] | Allows generation of court-ready PDF | 21 |
5. Acquisition Tactics When Documents Are Not Online
5.1 Certified-Copy / Inspection Rights
- SC Rules 2013 Order XII and CPC Order XI r.6–8 give any party or bona-fide researcher the right to inspect case-records on paying nominal fees.
- File physical or e-Filing application → registry emails scanning cost → pay online → receive sealed PDF (open licence per OGL).
5.2 RTI Act 2005
- High Courts & Tribunals are “public authorities”. Pleadings are “information held”. File RTI seeking soft-copies (ask for CD/pendrive under Rule 4(d) of RTI Fees).
5.3 Collaboration With Advocates-on-Record
- AORs already download paper-books daily; sign data-sharing MoU.
- An AOR can inspect any disposed SC record for ₹100.
6. Cleaning & Structuring Pipeline
- OCR & Text-Layer
  - Use `tesseract -l eng+hin+gur --oem 3 --psm 1`.
  - Deskew via `ocrmypdf --deskew --fast-web-view`.
- Segmentation
  - Regex on common headings: `SYNOPSIS`, `LIST OF DATES`, `GROUNDS`, `PRAYER`, `ANNEXURE P‐…`.
  - Store offsets to allow retrieval of a single argument passage.
- De-duplication
  - Many annexures repeat statutes already in your Acts-corpus. De-duplicate near-identical passages (e.g., SimHash fingerprints, or shingle-set Jaccard similarity > 0.85).
- Metadata Enrichment
  - Use SCI diary API to pull bench, judge, arguments concluded date, decision outcome.
  - Tag each paragraph with party side: `PETITIONER_ARGUMENT`, `RESPONDENT_REBUTTAL`.
- Citation Graph
  - Extract `(case_cited, paragraph_no)` pairs → build precedent-argument graph for Retrieval-Augmented Generation.
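The segmentation and de-duplication steps can be prototyped with the standard library alone. The sketch below makes two assumptions: headings sit alone on their own line, and near-duplicates are scored with plain Jaccard similarity over word shingles rather than SimHash. The `ANNEXURE` heading is left out because its full pattern is not given above.

```python
import re

HEADINGS = ["SYNOPSIS", "LIST OF DATES", "GROUNDS", "PRAYER"]
HEADING_RE = re.compile(
    r"^[ \t]*(" + "|".join(re.escape(h) for h in HEADINGS) + r")[ \t]*:?[ \t]*$",
    re.MULTILINE,
)


def segment(text):
    """Return one {heading, start, end} record per heading found.

    Offsets are character positions, so text[start:end] is the full section.
    """
    matches = list(HEADING_RE.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append({"heading": m.group(1), "start": m.start(), "end": end})
    return sections


def shingles(text, k=5):
    """Set of k-word shingles, lower-cased."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=5):
    """Jaccard similarity of the two texts' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

A passage would then be dropped when `jaccard(passage, statute_text) > 0.85`. Pairwise comparison does not scale to a large corpus, which is where fingerprinting schemes such as SimHash or MinHash come in.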
7. Suggested Vector Schema for Arguments DB
| Field | Type | Example |
|---|---|---|
| doc_id | UUID | SC_2025_SLPCiv_4621_WS_Petr |
| party_side | Enum | PET/SYNOPSIS |
| court_level | Enum | SC, DELHC |
| argument_text | Text (chunk 400 tokens) | -- |
| citations | Array | ["(2015)7SCC497","AIR1965SC1150"] |
| legal_issues | Array | ["Article 14","Promissory Estoppel"] |
| embedding | Vector | from instructor-large-legal |
| date_filed | Date | 2025-04-03 |
| outcome | Enum | ALLOWED/REJECTED/PARTLY |
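One way to pin this schema down in code is a small dataclass, sketched below. The field names follow the table; the class names, the choice to keep `party_side` as a free string, and the `to_record` helper are illustrative assumptions rather than a fixed design.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class CourtLevel(str, Enum):
    SC = "SC"
    DELHC = "DELHC"


class Outcome(str, Enum):
    ALLOWED = "ALLOWED"
    REJECTED = "REJECTED"
    PARTLY = "PARTLY"


@dataclass
class ArgumentChunk:
    doc_id: str
    party_side: str
    court_level: CourtLevel
    argument_text: str
    date_filed: date
    outcome: Optional[Outcome] = None  # unknown while the matter is pending
    citations: List[str] = field(default_factory=list)
    legal_issues: List[str] = field(default_factory=list)
    embedding: List[float] = field(default_factory=list)

    def to_record(self) -> dict:
        """Flatten to plain JSON-friendly types for insertion into a store."""
        rec = asdict(self)
        rec["court_level"] = self.court_level.value
        rec["outcome"] = self.outcome.value if self.outcome else None
        rec["date_filed"] = self.date_filed.isoformat()
        return rec
```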
8. Licensing & Ethical Compliance
- Government of India Open Government Licence v2 covers all materials filed in court records unless copyrighted otherwise—free commercial reuse with attribution[3].
- Personal data: scrub Aadhaar, phone, medical details to comply with SC Privacy Committee 2023 redaction policy.
- Keep hash-locked originals to prove authenticity; use redacted text for model ingestion.
- Cite court & diary number in every reproduced snippet.
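A first pass at the scrubbing and hash-locking steps might look like this. The patterns are assumptions: Aadhaar numbers are taken to be 12 digits starting with 2 to 9, optionally grouped in fours, and phone numbers to be 10-digit Indian mobiles starting with 6 to 9 with an optional `+91` or `0` prefix. Medical details cannot be caught by a regex and need a separate review step.

```python
import hashlib
import re

# Assumed formats; tune both patterns against real filings before relying on them.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[ -]?|0)?[6-9]\d{9}(?!\d)")
AADHAAR_RE = re.compile(r"\b[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}\b")


def redact(text):
    """Replace phone and Aadhaar numbers with placeholder tokens."""
    # Phones first, so "+91" followed by ten digits is not read as 12 Aadhaar digits.
    text = PHONE_RE.sub("[PHONE]", text)
    text = AADHAAR_RE.sub("[AADHAAR]", text)
    return text


def prepare_for_ingestion(original: bytes, encoding="utf-8"):
    """Return (redacted_text, sha256_of_original).

    The digest is stored alongside the untouched original to prove
    authenticity; only the redacted text goes to the model.
    """
    digest = hashlib.sha256(original).hexdigest()
    return redact(original.decode(encoding)), digest
```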
9. Rapid-Start Checklist
| Week | Milestone | Tools |
|---|---|---|
| 1 | Partner with AOR; obtain e-Filing credentials; set up S3 bucket. | Python 3.11, Playwright |
| 2 | Automate 3-PDF download & OCR; ingest two pilot cases. | Tesseract + OCRmyPDF |
| 3 | Build Mongo / Postgres schema; generate embeddings; test semantic search. | pgvector / Redis |
| 4 | File RTI for 25 historic paper-books; add to corpus. | RTI online portal |
| 5 | Fine-tune Llama-3-8B-Instruct on 10,000 paragraph pairs (issue ↔ argument). | HuggingFace TRL |
| 6 | Evaluate against manual drafting tasks; iterate prompt. | OpenAI evals |
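For the week-5 fine-tuning step, the issue ↔ argument pairs need to be serialised into a training file. The sketch below writes one JSON object per line; the `prompt` and `completion` keys and the prompt wording are assumptions, so match them to whatever format your training script expects.

```python
import json


def to_training_records(pairs):
    """Convert (issue, argument) pairs into prompt/completion records."""
    records = []
    for issue, argument in pairs:
        issue, argument = issue.strip(), argument.strip()
        if not issue or not argument:
            continue  # skip incomplete pairs rather than train on blanks
        records.append({
            "prompt": f"Issue: {issue}\nDraft the argument:",
            "completion": argument,
        })
    return records


def to_jsonl(records):
    """Serialise records as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```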
10. Conclusion
High-quality argument data—not just bare judgments—is now abundant in India’s digitised court system. By combining:
- Institutional feeds (Supreme Court 3-PDF paper-books, record-room scans),
- High-Court e-filing downloads,
- Voluntary open repositories (Indian Kanoon, media portals), and
- Targeted RTI / inspection requests,
you can assemble a legally compliant, richly annotated corpus of petitions, affidavits, written submissions and drafting templates. With robust OCR, segmentation and metadata-tagging, this corpus becomes prime fuel for:
- Training retrieval-augmented LLMs that draft court-ready documents,
- Suggesting winning argument structures,
- Predicting bench-specific preferences,
and ultimately delivering a next-generation Indian Legal-Research AI.
Harness these pipelines now, and your system will learn not merely what the courts decided, but how the best advocates persuaded them.
[1] https://efiling.sci.gov.in/uploaded_docs/user_manual/3pdf_user_manual.pdf
[2] https://cdnbbsr.s3waas.gov.in/s3ec0490f1f4972d133619a60c30f3559e/uploads/2024/01/2024012785-1.pdf
[3] https://cdnbbsr.s3waas.gov.in/s3ec0490f1f4972d133619a60c30f3559e/documents/misc/practice.pdf_0.pdf
[4] https://cdnbbsr.s3waas.gov.in/s3ec058844c5f00372df2c3c4ee857c245/uploads/2024/01/2024010135.pdf
[5] https://districts.ecourts.gov.in/sites/default/files/Supreme%20court.pdf
[6] https://api.sci.gov.in/supremecourt/2023/18499/18499_2023_11_54_48939_Order_08-Dec-2023.pdf
[7] https://api.sci.gov.in/supremecourt/2016/41681/41681_2016_8_27_50898_Order_27-Feb-2024.pdf
[8] https://api.sci.gov.in/supremecourt/2008/2423/2423_2008_16_102_45923_Order_09-Aug-2023.pdf
[9] https://www.scobserver.in/wp-content/uploads/2021/10/Petition_filed_by_Advocate_Reepak_Kansal.pdf
[10] https://images.assettype.com/barandbench/import/2018/09/Maharashtra-Govt-Counter-Affidavit-Bhima-Koregaon.pdf
[11] https://spicyip.com/wp-content/uploads/2018/01/04.-counter-affidavit-uoi.compressed.pdf
[12] https://main.sci.gov.in/pdf/Forms/SCI-SLP%20format.pdf
[13] https://main.sci.gov.in/pdf/Forms/slp%20format.pdf
[14] https://hctlsc.tripura.gov.in/sites/default/files/download_forms/affidavit_1.pdf
[15] https://blog.ipleaders.in/sample-plaint-civil-procedure-code/
[16] https://www.scribd.com/document/398703756/DRAFTING-OF-PLAINT-pdf
[17] https://www.studocu.com/in/document/chaudhary-charan-singh-university/llb/draft-file/39637925
[18] https://api.sci.gov.in/officereport/1996/76045/76045_1996_2024-05-14.pdf
[19] https://inbrief.nswbar.asn.au/posts/21dc6bda18d1036ec37b45178e141b4e/attachment/Practice%20Note%20SC%20CA%201.pdf
[20] https://main.sci.gov.in/php/FAQ/5_6246991526434439183.pdf
[21] https://cdnbbsr.s3waas.gov.in/s3ec0490f1f4972d133619a60c30f3559e/uploads/2024/01/2024011587-1.pdf
[22] https://lawcat.berkeley.edu/record/1219593
[23] https://cdnbbsr.s3waas.gov.in/s3ec047fcc48d22804dbbe9b66b607d513/uploads/2023/11/2023111326.pdf
[24] https://www.hcourt.gov.au/assets/cases/08-Sydney/s126-2022/Stanley-DPPNSW_Res.pdf
[25] https://api.sci.gov.in/supremecourt/2022/31008/31008_2022_1_1505_39620_Judgement_07-Nov-2022.pdf
[26] https://main.sci.gov.in/officereport/2021/5261/5261_2021_2024-01-29.pdf
[27] https://archive.org/details/unitedstatessup04publgoog
[28] https://www.legalaid.nsw.gov.au/my-problem-is-about/fines/fines-go-to-court/Going-to-court/your-submissions
[29] https://efiling.sci.gov.in/uploaded_docs/user_manual/eFM_2.0_Manual.pdf
[30] https://www.sci.gov.in/sci-get-pdf/?diary_no=501672023&type=o&order_date=2025-04-08&from=latest_judgements_order
[31] https://www.vitalsource.com/products/the-supreme-court-lawrence-baum-v9781071901731
[32] https://cdnbbsr.s3waas.gov.in/s3ec0490f1f4972d133619a60c30f3559e/uploads/2024/01/2024011765.pdf
[33] https://supremecourt.nsw.gov.au/documents/Practice-and-Procedure/Practice-Notes/cca-practice-notes/replaced/2021_05_01_PN_SC_CCA_01_-_General.pdf
[34] https://www.sci.gov.in/judgements-case-no/
[35] https://us.sagepub.com/en-us/nam/the-supreme-court-compendium/book244744
[36] https://www.scobserver.in/about/supreme-court-of-india/procedure/
[37] https://www.archives.gov/files/dc-metro/washington/m216.pdf
[38] https://www.scribd.com/document/806376403/9781544390109-the-supreme-court-14th-edition-original-pdf-1702344470
[39] https://www.courts.state.md.us/sites/default/files/import/coappeals/pdfs/informalpetitionfillablepdf.pdf
[40] https://www.supremecourt.gov/orders/journal/Jnl23.pdf
[41] https://www.thelawadvice.com/news/supreme-court-to-provide-all-scanned-paper-books-to-aors
[42] https://www.nycourts.gov/LegacyPDFS/courts/11jd/supreme/civilterm/CH-FORMS/verified_petition.pdf
[43] https://lawgicalshots.com/how-to-draft-a-plaint-check-sample-plaint/
[44] https://supremecourthistory.org/wp-content/uploads/2024/08/Index-1-48-Final.pdf
[45] https://www.cafc.uscourts.gov/wp-content/uploads/RulesProceduresAndForms/FilingResources/Petition_for_Writ_of_Certiorari_-_Information_Sheet.pdf
[46] https://www.supremecourt.gov/orders/journal/Jnl17.pdf
[47] https://supremecourtbc.ca/sites/default/files/2023-04/Petitions_0.pdf
[48] https://api.sci.gov.in/supremecourt/2025/33610/33610_2025_11_46_62205_Order_23-Jun-2025.pdf
[49] https://us.sagepub.com/sites/default/files/upm-assets/120077_book_item_120077.pdf
[50] https://www.jurist.org/news/wp-content/uploads/sites/4/2023/03/Final-Same-Sex-Marriage-Counter.pdf
[51] https://www.lawsenate.com/publications/articles/special-leave-petition-slp.pdf
[52] https://api.sci.gov.in/officereport/2023/8387/8387_2023_2025-03-06.pdf
[53] https://lawhelpline.in/wp-content/uploads/2024/01/Counter_Affidavits_in_Writ_Petitions.pdf
[54] https://api.sci.gov.in/officereport/2023/2012/2012_2023_2024-04-09.pdf
[55] https://law.duke.edu/sites/default/files/lib/scotus.pdf
[56] https://api.sci.gov.in/supremecourt/2020/9880/9880_2020_35_3_24965_Order_04-Dec-2020.pdf
[57] https://www.supremecourt.gov/opinions/datesofdecisions.pdf
[58] https://images.assettype.com/barandbench/2020-05/5fb0fe40-41bb-4b3c-9d60-4ff9618c176a/Centre_SLP_against_Orissa_HC_order_on_COVID_19.pdf
[59] https://www.scribd.com/document/402551077/2924-20190208153133-sample-format-of-special-leave-petition-before-supreme-court-1553109852577-docx