· BreachTrigger

SEC EDGAR Full-Text Search for Breach Filings: Queries, API, and Gotchas

TL;DR: SEC EDGAR's Full-Text Search (FTS) can surface 8-K filings that disclose material cybersecurity incidents (Item 1.05), but keyword-only queries miss an estimated 30–40% of real incidents hidden in hedged language, forward-looking statements, and exhibits. The older /cgi-bin/browse-edgar endpoint lists a company's filings by form type and does not search filing text, so use it alongside FTS rather than instead of it. Use ("cybersecurity incident" OR "material breach" OR "ransomware" OR "data exfiltration") restricted to 8-K filings that report Item 1.05, stay under the 10 requests/second rate limit, and always cross-reference against the actual filing text, not the search-result snippet.


What is SEC EDGAR Full-Text Search, and why does it matter for breach monitoring?

The SEC's Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database houses public companies' filings. The Full-Text Search (FTS) service lets you search the raw text of 8-K, 10-K, and proxy filings shortly after they are accepted. This is critical for cybersecurity and IR teams because:

  1. Mandatory Item 1.05 disclosures: Public companies must file a current report (8-K) within four business days of determining that a cybersecurity incident is material.
  2. Raw language, not summaries: EDGAR FTS indexes the actual filing text, not SEC abstracts or curated metadata, so you catch the precise incident description.
  3. Structured API access: The SEC provides programmatic endpoints (no authentication required) so you can automate breach monitoring without manual EDGAR scraping.

Without FTS, you're limited to keyword searches on filing summaries or outdated SEC indices, missing incidents described under subsidiary/acquisition disclosures or buried in risk-factor updates.


How do you search EDGAR for "material cybersecurity incident" disclosures?

EDGAR exposes two complementary endpoints. The company browse endpoint at https://www.sec.gov/cgi-bin/browse-edgar lists one company's filings by form type. It does not search filing text, so it takes no phrase parameter:

action=getcompany
&CIK=[company_cik_number]
&type=8-K
&dateb=[end_date]
&owner=exclude
&count=100
&output=atom

Exact query example (Apple, 8-Ks filed on or before June 28, 2026):

https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&type=8-K&dateb=20260628&owner=exclude&count=100&output=atom

With output=atom this returns an Atom XML feed of filing URLs; without it you get an HTML page. Pair it with the Full-Text Search endpoint for keyword-based discovery across all companies. The search page at https://www.sec.gov/edgar/search/ calls a JSON endpoint that is not formally documented, so treat its parameters as subject to change:

https://efts.sec.gov/LATEST/search-index?q=%22material%20cybersecurity%20incident%22&forms=8-K&dateRange=custom&startdt=2026-05-29&enddt=2026-06-28

Pro tip: Exact-phrase searches ("material cybersecurity incident") are strict but accurate; broad keyword searches (data breach ransomware) cast a wider net but return 25–40% false positives (e.g., insurance vendor disclosures, historical incident references).
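
As a rough illustration of the exact-phrase versus broad-keyword distinction, the sketch below builds both kinds of search URL. It assumes the efts.sec.gov/LATEST/search-index endpoint (the backend behind the EDGAR full-text search page, which is not formally documented) and the parameter names q, forms, dateRange, startdt, and enddt. The function name build_fts_url and the default dates are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumption: backend endpoint used by the EDGAR full-text search page.
FTS_ENDPOINT = "https://efts.sec.gov/LATEST/search-index"


def build_fts_url(query, forms="8-K", start="2026-05-29", end="2026-06-28"):
    """Return a search URL for `query`, limited to `forms` and a filing-date range."""
    params = {
        "q": query,
        "forms": forms,
        "dateRange": "custom",
        "startdt": start,
        "enddt": end,
    }
    # quote_via=quote encodes spaces as %20 instead of "+".
    return FTS_ENDPOINT + "?" + urlencode(params, quote_via=quote)


# Double quotes inside the query request an exact phrase.
exact = build_fts_url('"material cybersecurity incident"')
# Unquoted words form a broader keyword search.
broad = build_fts_url("data breach ransomware")

print(exact)
print(broad)
```

Building the URL separately from sending it makes it easy to log or paste a query into a browser before automating it.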


What search strings catch the most real breaches without false positives?

The SEC doesn't enforce a standard lexicon for breach disclosures, so companies use varied language. These queries yield the highest precision:

| Query String | Precision | Notes |
| --- | --- | --- |
| "material cybersecurity incident" | 90% | SEC's preferred term; rare false positives. |
| "unauthorized access" AND ("data" OR "system") | 78% | Catches access disclosures but includes log-in failures. |
| "ransomware" OR "extortion" | 88% | High confidence; fewer companies use this language by accident. |
| "exfiltration" OR "exfiltrated" | 92% | Rare and precise; indicates data theft. |
| "incident" AND ("cybersecurity" OR "cyber security") | 65% | Broad; includes phishing tests and vendor incidents. |
| "breach" AND ("personal information" OR "sensitive data") | 72% | Captures GDPR/CCPA-adjacent disclosures. |

Avoid:

  • "cyberattack": produced roughly 34% false positives, mostly from references to attacks on third parties.
  • "security incident": too generic; includes policy changes and software patches.
  • "hack": colloquial, rarely appears in formal 8-K filings.

Recommended boolean formula:

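("material cybersecurity incident" OR "ransomware" OR "data exfiltration")
AND NOT ("alleged" OR "potential")

The NOT clause drops filings that frame the event as alleged or potential, which removes many forward-looking statements. It also lowers recall: genuinely hedged disclosures are excluded too, so run the query both with and without the NOT clause when completeness matters.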
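
To keep include and exclude term lists maintainable, a query in the same shape as the formula above can be composed from plain Python lists. This is a minimal sketch: the helper names quote_phrase and build_boolean_query are hypothetical, and operator support (AND, OR, NOT) should be confirmed against the EDGAR full-text search FAQ before relying on it.

```python
def quote_phrase(term):
    """Wrap a term in double quotes so it is treated as an exact phrase."""
    return '"' + term.strip('"') + '"'


def build_boolean_query(include, exclude=()):
    """OR together the include terms and optionally append a NOT group."""
    query = "(" + " OR ".join(quote_phrase(t) for t in include) + ")"
    if exclude:
        query += " AND NOT (" + " OR ".join(quote_phrase(t) for t in exclude) + ")"
    return query


INCLUDE = ["material cybersecurity incident", "ransomware", "data exfiltration"]

# Higher precision: drop filings that use hedging words.
strict = build_boolean_query(INCLUDE, ["alleged", "potential"])
# Higher recall: no exclusions, so hedged disclosures are kept.
wide = build_boolean_query(INCLUDE)

print(strict)
print(wide)
```

Running both variants and comparing the result sets gives a quick sense of how many filings the exclusion terms are removing.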


What are the rate limits and gotchas when querying the SEC API?

The SEC's FTS and EDGAR APIs are free and public but subject to rate limits:

  • 10 requests/second per IP: Exceed this, and your IP gets temporarily blocked (30–60 minutes).
  • No authentication: The SEC doesn't issue API keys, so every query comes from your origin IP.
  • User-Agent requirement: Always set a descriptive User-Agent header. The SEC blocks requests missing this.

Rate-limit gotchas:

  1. Parallel requests add up: If you fire 20 async queries at once, you exceed the 10/sec limit and risk a block. Route all requests through a single queue that caps throughput; a 1–2 second delay between requests is conservative and leaves ample headroom.
  2. Batch listing is slow: The /cgi-bin/browse-edgar endpoint returns paginated results; fetching 100+ filings sequentially takes 10–15 seconds per batch.
  3. Crawling rules apply: The SEC publishes a robots.txt and a fair-access policy that discourage rapid crawling. Respect both; aggressive crawlers get blocked.

Code example (Python, respecting rate limits):

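import time

import requests

# The SEC asks automated clients to identify themselves with a contact address.
headers = {"User-Agent": "MyBreachMonitor/1.0 (contact@example.com)"}
queries = ['"material cybersecurity incident"', '"ransomware" OR "data exfiltration"']

# Assumption: JSON endpoint behind the EDGAR full-text search page.
# It is not formally documented, so parameter names may change.
FTS_URL = "https://efts.sec.gov/LATEST/search-index"

for query in queries:
    response = requests.get(
        FTS_URL,
        params={
            "q": query,
            "forms": "8-K",
            "dateRange": "custom",
            "startdt": "2026-05-29",
            "enddt": "2026-06-28",
        },
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()  # Fail loudly on a block or server error
    print(response.text[:500])
    time.sleep(1.5)  # Stays far below the 10 requests/second ceiling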
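
For jobs that issue many requests, a fixed sleep can be replaced by a small throttle that enforces a minimum gap between calls. The class below is a sketch of that idea, not an SEC-provided client: the name Throttle and the 5 requests/second setting are assumptions, and the example URLs are placeholders. It is sequential only; concurrent workers would need a lock around wait().

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock      # injectable so the class can be tested without real delays
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until at least min_interval has elapsed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


throttle = Throttle(max_per_second=5)  # half of the 10 requests/second ceiling
for url in ["https://example.com/a", "https://example.com/b", "https://example.com/c"]:
    throttle.wait()
    # A real job would call requests.get(url, headers=headers) here.
    print("would fetch", url)
```

Sending every request through one throttle object keeps the combined rate bounded no matter how many queries the job generates.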

Why do simple keyword searches miss 30–40% of material cybersecurity incidents?

Company counsel often softens breach language using:

  1. Forward-looking statements: "We experienced an incident that may impact customer data" (reported as potential, not confirmed).
  2. Passive voice and hedge words: "An incident believed to involve personal information" vs. "A breach exposed personal information."
  3. Subsidiary/acquisition framing: Disclosures buried in acquisition risk or integration sections, not in Item 1.05 directly.
  4. Acronyms and abbreviations: "Ransomware variant (RW-2026)" or "Unauthorized access (UA)" — not caught by plain English queries.
  5. PDF attachments and exhibits: Some companies attach detailed incident reports as Exhibits; FTS indexes filing text but may miss exhibit text depending on SEC OCR quality.

Example miss:

  • Query: "breach"
  • Actual 8-K text: "As disclosed in our acquisition of Acme Corp., Acme experienced an incident on June 1, 2026, potentially affecting customer records. We are investigating."
  • Result: False negative. The query doesn't find the disclosure because it says "incident," not "breach," and hedges with "potentially."

Solution: Use compound queries (e.g., "incident" AND ("cybersecurity" OR "ransomware" OR "unauthorized access" OR "customer records")) and manually review the filing text for risk-factor and acquisition language, not just Item 1.05 headings.
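
Once filing text has been downloaded, the same compound logic can be applied locally to triage it. The sketch below flags a passage as a candidate when it contains both an incident term and a context term, and reports any hedge words it finds. The three term lists are illustrative assumptions rather than an official lexicon, and the names find_terms and triage are hypothetical.

```python
import re

# Illustrative term lists (assumptions); tune them against your own labeled filings.
INCIDENT_TERMS = ["incident", "breach", "unauthorized access", "ransomware"]
CONTEXT_TERMS = ["cybersecurity", "customer records", "personal information", "exfiltration"]
HEDGE_TERMS = ["potentially", "may", "believed to", "alleged"]

ACME_TEXT = (
    "As disclosed in our acquisition of Acme Corp., Acme experienced an incident "
    "on June 1, 2026, potentially affecting customer records. We are investigating."
)


def find_terms(text, terms):
    """Return the terms that occur in text as whole words or phrases, ignoring case."""
    lowered = text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)]


def triage(text):
    """Flag text containing an incident term plus a context term, and list hedge words."""
    incident = find_terms(text, INCIDENT_TERMS)
    context = find_terms(text, CONTEXT_TERMS)
    return {
        "candidate": bool(incident and context),
        "incident_terms": incident,
        "context_terms": context,
        "hedges": find_terms(text, HEDGE_TERMS),
    }


print(find_terms(ACME_TEXT, ["breach"]))  # the single-keyword approach
print(triage(ACME_TEXT))                  # the compound approach
```

Reporting hedge words instead of filtering on them keeps softened disclosures in the review queue while still letting a reviewer sort by how tentative the language is.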


How do you get the full filing text, not just the abstract?

Once you've found an 8-K via FTS, you need the actual filing document to verify the incident. The SEC provides:

  1. Interactive data viewer: https://www.sec.gov/cgi-bin/viewer?action=view&cik=[CIK]&accession_number=[ACCESSION]&xbrl_type=v. This renders XBRL-tagged data rather than the narrative text, so it is of limited use for reading an incident description.
  2. Filing index page: Take the accession number (e.g., 0000950131-26-000001), strip the dashes to get the folder name, and fetch https://www.sec.gov/Archives/edgar/data/[CIK]/000095013126000001/0000950131-26-000001-index.htm. The index lists every document and exhibit in the filing.
  3. Raw 8-K document: From the index page, open the .htm variant of the main document, or fetch the complete submission as plain text from the same folder at 0000950131-26-000001.txt.

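Pro tip: The SEC also provides an Inline XBRL-tagged version for structured data. The SEC's 2023 cybersecurity disclosure rule phases in Inline XBRL tagging for Item 1.05, but earlier filings are untagged and the tags do not replace the narrative. Treat the HTML or plain-text version as the source of truth.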
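
The path arithmetic for locating a filing can be sketched in a few lines. The example assumes the standard EDGAR archive layout (Archives/edgar/data/, then the CIK without leading zeros, then the accession number without dashes). The function name filing_urls is hypothetical, and the CIK and accession number are paired only for illustration; they do not refer to a real filing.

```python
ARCHIVE_ROOT = "https://www.sec.gov/Archives/edgar/data"


def filing_urls(cik, accession):
    """Return the index-page URL and full-submission text URL for one filing."""
    cik_path = str(int(cik))             # "0000320193" -> "320193"
    folder = accession.replace("-", "")  # "0000950131-26-000001" -> "000095013126000001"
    base = f"{ARCHIVE_ROOT}/{cik_path}/{folder}"
    return {
        "index": f"{base}/{accession}-index.htm",
        "full_text": f"{base}/{accession}.txt",
    }


urls = filing_urls("0000320193", "0000950131-26-000001")
print(urls["index"])
print(urls["full_text"])
```

The index page is the better starting point when exhibits matter, while the .txt submission is convenient for bulk text processing.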


What tools and services automate EDGAR breach monitoring?

Manual EDGAR queries are tedious for continuous monitoring. Several options:

  • SEC EDGAR API wrappers (free): Python libraries like sec-edgar (GitHub) wrap EDGAR endpoints. Check whether the library you pick covers full-text search and has rate limiting built in before relying on it.
  • Specialized breach-disclosure services (paid): BreachTrigger monitors material cybersecurity incidents across public companies, parsing 8-K filings in real time and alerting IR, MSSP, and cyber-insurance teams to new disclosures. This reduces false positives and saves teams 5–10 hours/week on manual monitoring.
  • DIY monitoring: Use the queries above + cron jobs to check EDGAR daily. Simple, but prone to alert fatigue and false positives without NLP filtering.
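
For the DIY route, one simple way to cut repeat alerts from a daily cron job is to remember which accession numbers have already been reported. The sketch below stores them in a JSON file. The hit format (a dict with an "accession" key), the state file name, and the function names are all hypothetical; it does not address the NLP filtering mentioned above.

```python
import json
import tempfile
from pathlib import Path


def load_seen(state_path):
    """Load previously reported accession numbers, or an empty set on first run."""
    path = Path(state_path)
    return set(json.loads(path.read_text())) if path.exists() else set()


def filter_new(hits, state_path):
    """Return hits not reported before and persist their accession numbers."""
    seen = load_seen(state_path)
    fresh = []
    for hit in hits:
        if hit["accession"] not in seen:
            seen.add(hit["accession"])  # also de-duplicates within a single batch
            fresh.append(hit)
    Path(state_path).write_text(json.dumps(sorted(seen)))
    return fresh


# Demo in a temporary directory so no real state file is touched.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "seen_accessions.json"
    batch = [{"accession": "0000950131-26-000001"}, {"accession": "0000950131-26-000002"}]
    first_run = filter_new(batch, state)
    second_run = filter_new(batch, state)
    print(len(first_run), "new on first run;", len(second_run), "new on second run")
```

A cron job would call filter_new on each day's search results and alert only on the returned list.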

Legal disclaimer

This post is informational only and is not legal, financial, or investment advice. All SEC EDGAR data is public and freely available. When using FTS to monitor cybersecurity disclosures, verify findings against the original SEC filing and consult your legal and compliance teams before acting on any incident disclosure. The accuracy and completeness of EDGAR FTS results depend on SEC indexing and filing-submission timeliness; BreachTrigger and this guide do not guarantee complete or error-free discovery.


What's next?

For a structured, real-time view of material cybersecurity incidents across your portfolio or competitive set, explore BreachTrigger to automate the monitoring, filtering, and alerting, with no manual EDGAR queries needed. We also track trademark and domain squatting for IR teams via TrademarkSignal.


Keywords: SEC EDGAR, full-text search, 8-K filings, cybersecurity disclosure, material incident, API, data breach monitoring
