Overview
SEC-EDGAR-GPT is a 124-million-parameter GPT-2 language model trained on 1.55 billion tokens of SEC-EDGAR corporate filings using the nanoGPT framework. The model was trained on a single NVIDIA RTX 4070 GPU over approximately 8 hours, reaching a final validation loss of 2.28.
We evaluated the model's generation quality across multiple SEC filing sections including business descriptions, management discussion and analysis, risk factors, financial notes, and proxy statements. The model successfully learns SEC document structure, financial vocabulary, and boilerplate language patterns, but exhibits characteristic limitations in long-range coherence, numerical consistency, and table extension.
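To put the reported validation loss of 2.28 in more familiar terms, the short sketch below converts it to perplexity and bits per token. It assumes the loss is the mean cross-entropy in nats per token (the usual convention for this kind of training loop) and that the 1.55 billion figure counts training tokens; neither assumption is stated explicitly above.

```python
import math

# Value reported in the overview; ASSUMED to be mean cross-entropy in nats per token.
val_loss = 2.28

# Perplexity is the exponential of the per-token cross-entropy.
perplexity = math.exp(val_loss)

# The same loss expressed in bits per token.
bits_per_token = val_loss / math.log(2)

# Tokens per parameter, ASSUMING 1.55B is the number of training tokens.
tokens_per_param = 1.55e9 / 124e6

print(f"perplexity   ~ {perplexity:.2f}")
print(f"bits/token   ~ {bits_per_token:.2f}")
print(f"tokens/param ~ {tokens_per_param:.1f}")
```

Under these assumptions the perplexity comes out a little under 10, meaning the model is on average about as uncertain as a uniform choice among roughly ten tokens.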
Generation Quality Analysis
We used five prompts, each representing a different SEC filing section. Each prompt consists of 500–2,000 characters of authentic SEC text. For each prompt we generated 1,000 tokens at temperature 0.7 with top-k = 100. The evaluation prompts were generated with Hermes Agent (Nous Research).
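For readers unfamiliar with the sampling settings, here is a minimal, dependency-free sketch of temperature plus top-k sampling over a single logit vector. It is an illustration of the general technique, not the code used for this evaluation; the function name `sample_top_k` and the toy logits are hypothetical.

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=100, rng=random):
    """Sample one token id from raw logits using temperature and top-k filtering."""
    # Keep only the top_k highest-scoring token ids.
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Divide by temperature: values below 1.0 sharpen the distribution.
    scaled = [logits[i] / temperature for i in top_ids]
    # Numerically stable softmax weights over the surviving logits.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top_ids, weights=weights, k=1)[0]

# Toy example with a 4-token vocabulary (hypothetical values).
rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.5, -1.0]
token = sample_top_k(toy_logits, temperature=0.7, top_k=2, rng=rng)
print(token)
```

With `top_k=2` only token ids 0 and 1 can ever be returned. A temperature of 0.7 makes the most likely tokens more dominant than at 1.0, which may contribute to the repetition loops described in the prompt sections.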
Prompt 1: Business Description (Item 1)
Prompt (input text):
ITEM 1. BUSINESS
Overview
We are a leading provider of cloud-based enterprise software solutions for the healthcare industry. Our platform enables hospitals, clinics, and other healthcare organizations to streamline their operations, improve patient outcomes, and reduce costs. We serve over 2,500 customers across 48 states, including 18 of the top 25 hospital systems in the United States.
Our solutions address critical challenges in healthcare delivery, including electronic health records (EHR), practice management, revenue cycle management, and population health analytics. Our AI-powered clinical decision support tools help physicians make better-informed treatment decisions, while our interoperability platform enables seamless data exchange between disparate healthcare systems.
For the fiscal year ended December 31, 2023, we generated revenue of approximately $487.2 million, representing a 28% increase from the prior year. Our subscription revenue, which accounted for 78% of total revenue, grew 35% year-over-year. We had approximately 8,200 full-time employees as of December 31, 2023.
Generated continuation (first 500 tokens):
Our business is heavily dependent on the resources of our
clinical trials and the discovery and development of our
product candidates. Our technology costs are based on the
materials, manufacturing, test, and customer support costs.
We are a global leader in the healthcare industry, and the
care industry is undergoing a number of product development,
including the following:
- Our core product development, product development and
commercialization of our product candidates, which we
currently license agreements with third-party manufacturers.
- Our commercialization of X-Avent therapies, including
the following:
- Our commercialization of X-Zentib, including the following:
- Our commercialization of X-Zentib, including the following:
Structure: Maintains SEC filing format
Loop: "Commercialization of product candidates" repeats 47+ times
Hallucination: Invented drug names: X-Avent, X-Zentib, S-Zentib, Q-partnerib
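A loop of this kind can be flagged automatically by counting repeated lines in the generated text. The sketch below is one simple way to do that, not the method used to produce the counts above; `repeated_lines` is a hypothetical helper, and the sample is the last four lines of the continuation shown for this prompt.

```python
from collections import Counter

def repeated_lines(text, min_count=2):
    """Return {line: count} for normalized lines occurring at least min_count times."""
    # Normalize whitespace and case so trivially different copies still match.
    lines = [" ".join(line.split()).lower() for line in text.splitlines()]
    counts = Counter(line for line in lines if line)
    return {line: n for line, n in counts.items() if n >= min_count}

sample = """- Our commercialization of X-Avent therapies, including
the following:
- Our commercialization of X-Zentib, including the following:
- Our commercialization of X-Zentib, including the following:"""

print(repeated_lines(sample))
```

On this excerpt only the X-Zentib bullet is reported, with a count of 2. On a full 1,000-token continuation the same function would surface the dominant looping line.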
Prompt 2: MD&A Revenue Analysis
Prompt (input text):
ITEM 7. MANAGEMENT'S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF OPERATIONS
Results of Operations
Revenue
Total revenue increased by $105.4 million, or 28%, to $487.2 million for the year ended December 31, 2023, from $381.8 million for the year ended December 31, 2022. The increase was primarily attributable to growth in subscription revenue, which increased by $98.2 million, or 35%, driven by new customer acquisitions and expanded usage from existing customers. Professional services revenue increased by $7.2 million, or 8%, reflecting increased implementation activity.
Cost of Revenue and Gross Profit
Cost of revenue increased by $31.2 million, or 22%, to $173.4 million for the year ended December 31, 2023, from $142.2 million for the year ended December 31, 2022. The increase was primarily due to higher hosting costs associated with our cloud infrastructure and increased headcount in our customer support organization. Gross profit margin improved to 64.4% for the year ended December 31, 2023, compared to 62.7% for the prior year, reflecting economies of scale in our cloud infrastructure.
Generated continuation (first 400 tokens):
Cost of revenue increased by $32.1 million, or 26%, for
the year ended December 31, 2023, compared to $162.2
million for the prior year. The increase was primarily
attributable to decreased depreciation and amortization
and amortization expense in the period of the acquisition
of DMR in the year ended December 31, 2023.
Cost of revenue increased by $40.0 million, or 2%, to
$107.1 million for the year ended December 31, 2023,
from $162.8 million for the year ended December 31, 2022.
The increase was primarily attributable to the increase
in service revenue, partially offset by depreciation on
certain C-1 products, which increased by $73.4 million,
or 1%, from $96.0 million for the year ended December 31,
2023.
Cost of revenue increased by $20.5 million, or 2%, to
$1.8 million for the year ended December 31, 2023,
primarily due to the increase in service revenue and the
increased amortization of intangible assets.
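Template: The first paragraph follows the MD&A template closely and references an acquisition (DMR)
Loop: 10 consecutive "Cost of revenue increased/decreased" paragraphs
Math: Percentages and directions do not match the dollar amounts cited. For example, the second paragraph reports an "increase" of $40.0 million, "or 2%", to $107.1 million from $162.8 million, which is actually a decrease of $55.7 million (about 34%).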
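The numerical inconsistency can be checked mechanically by recomputing the change from the two period totals. The sketch below applies the same check to one statement from the prompt and one from the generated continuation; `check_change` and its tolerances are illustrative choices, not part of the original evaluation.

```python
def check_change(prior, current, stated_change, stated_pct,
                 tol_amount=0.1, tol_pct=1.0):
    """Compare a stated dollar change and percentage against the period totals."""
    actual_change = current - prior
    actual_pct = 100.0 * actual_change / prior
    consistent = (abs(actual_change - stated_change) <= tol_amount
                  and abs(actual_pct - stated_pct) <= tol_pct)
    return {
        "actual_change": round(actual_change, 1),
        "actual_pct": round(actual_pct, 1),
        "consistent": consistent,
    }

# Prompt: revenue increased by $105.4M, or 28%, to $487.2M from $381.8M.
prompt_check = check_change(381.8, 487.2, stated_change=105.4, stated_pct=28)

# Generated: "increased by $40.0 million, or 2%, to $107.1 million ... from $162.8 million".
generated_check = check_change(162.8, 107.1, stated_change=40.0, stated_pct=2)

print(prompt_check)
print(generated_check)
```

The prompt's figures pass the check within rounding, while the generated figures fail it on both the amount and the percentage.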
Prompt 3: Risk Factors
Prompt (input text):
RISK FACTORS
Risks Related to Our Business
We have a history of net losses and may not achieve or maintain profitability in the future.
We incurred net losses of $42.3 million, $67.8 million, and $89.1 million for the years ended December 31, 2023, 2022, and 2021, respectively. As of December 31, 2023, we had an accumulated deficit of $523.4 million. We expect to continue to incur significant expenses related to research and development, sales and marketing, and general administrative operations as we expand our business. We may not achieve or maintain profitability in the future.
Our quarterly operating results may fluctuate significantly, which could cause our stock price to decline.
Our quarterly revenue and operating results have varied in the past and may continue to vary significantly from quarter to quarter. Our revenue growth rate may decline in future periods as our revenue base increases. Factors that may cause our quarterly results to fluctuate include the timing of large enterprise contracts, seasonal purchasing patterns in the healthcare industry, and the impact of regulatory changes on customer buying decisions.
Generated continuation (first 500 tokens):
We have financed and expect to continue to incur losses
and we may not achieve profitability in the future.
We have incurred significant losses since inception and
expect our net losses since inception and expect to
continue to incur losses and we may never achieve
profitability through the development and
commercialization of our product candidates. Many of our
product candidates are in development, and we have not
yet generated any revenue from product sales.
We continue to develop and market new products and
technologies in the development and commercialization of
our product candidates, and as a result, our revenues
from product sales and marketing, are subject to product
commercialization and regulatory requirements.
Our product candidates may fail to develop, develop and
commercialize our product candidates may fail to develop
and commercialize our product candidates.
Structure: Maintains risk factor format (heading + explanation)
Loop: "Product candidates" appears 47 times in 25 lines
Self-reference: Recursive phrasing, e.g. "product candidates may fail to develop, develop and commercialize"
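Phrase-level repetition like this can be quantified with a phrase count and a distinct-n ratio (the fraction of word n-grams that are unique). Distinct-n is a common repetition metric brought in here for illustration; it is not claimed to be how the figures above were obtained. Both helper names are hypothetical, and the excerpt is the last paragraph of the continuation shown for this prompt.

```python
def phrase_count(text, phrase):
    """Count case-insensitive, non-overlapping occurrences of phrase in text."""
    # Collapse whitespace so phrases split across line breaks are still found.
    return " ".join(text.lower().split()).count(phrase.lower())

def distinct_n(text, n=3):
    """Fraction of word n-grams that are unique; lower means more repetition."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 1.0

excerpt = ("Our product candidates may fail to develop, develop and "
           "commercialize our product candidates may fail to develop "
           "and commercialize our product candidates.")

print(phrase_count(excerpt, "product candidates"))
print(distinct_n(excerpt, n=3))
```

Even on this single sentence the trigram ratio falls well below 1.0, which is the kind of signal one would expect to worsen over the full looping continuation.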