Documentation
About This Dataset
This directory provides easy access to hospital Machine-Readable Files (MRFs) as required by the CMS Hospital Price Transparency rule. Under this regulation, hospitals must publish standard charges for all items and services in a machine-readable format. The details and structure of these files are available at the CMSGov github repository.
What's Available
- Original MRF Files - Raw data files published by hospitals (JSON, CSV, or other formats)
- DuckDB Databases - Parsed and optimized versions for fast querying
- Hospital Metadata - Contact information, URLs, and file details
- MRF Metadata - Download timestamps, checksums, and file characteristics
Data Collection & Processing
CMS-HPT.txt files are regularly fetched from hospital websites, and the MRF files they point to are downloaded and parsed. Each hospital in the directory has the cms-hpt.txt file, if available, a link to the original MRF file, if available, and a parsed duckdb version of that MRF file, if we could successfuly parse it.
Why DuckDB?
The raw files that are published by hospitals can be large and difficult to work with directly. We felt that providing an easy way to query the data in a structure that most closely resembled the "Tall" csv format would make it easier for users to explore and analyze the data. We are huge fans of DuckDB. It is easy to install, incredibly fast and lightweight, and widely supported.
Update Frequency
Hospitals are required to update their pricing data at least annually. This directory's data is refreshed periodically to reflect the latest published information.
Consolidated Hospital Download
If you're looking for a consolidated database of all the MRF data that is available at the Hospital Data Directory, we've made that available.
Simply login to Oria (create an account if you don't have one, it's free!), and on the left menu you can navigate to Full Data Download. From there you can download a single archive that contains a duckdb file combined with parquet files that has all 5000+ hospital MRF data parsed and ready to query in one place.
Data Dictionary
Hospital Metadata Files
Each hospital folder contains a cms-hpt.txt file with the following fields:
| Field Name | Type | Description |
|---|---|---|
location_name | string | Official name of the healthcare facility |
source_page_url | URL | Link to the hospital's price transparency webpage |
mrf_url | URL | Direct download link to the MRF file |
contact_name | string | Contact person for pricing inquiries |
contact_email | Email address for pricing questions |
MRF Metadata (mrf-meta.txt)
Technical details about the downloaded MRF file:
| Field Name | Type | Description |
|---|---|---|
facility | string | Facility name extracted from MRF |
downloaded | timestamp | Date and time the file was retrieved |
file_name | string | Original filename of the MRF |
file_size | integer | Size of the file in bytes |
file_type | string | MIME type or format (e.g., application/json) |
md5_hash | string | MD5 checksum for file integrity verification |
modified_time | timestamp | Last modified date from the source server |
Original MRF File Format
The raw MRF files published by hospitals typically follow the CMS schema and contain pricing data in JSON or CSV format. These files are parsed and loaded into the DuckDB database format for easier querying. See the DuckDB Schema section below for the actual database structure.
Working with DuckDB Files
Each hospital's data has been parsed into a DuckDB database (*_parsed.duckdb) for
efficient querying.
DuckDB Database Schema
The parsed DuckDB files contain 7 main tables with normalized and enriched data. Below is the complete schema reference.
1. hospitals
Hospital metadata (one row per hospital)
| Column | Type | Description |
|---|---|---|
hospital_id | BIGINT | Unique hospital identifier (primary key) |
hospital_name | VARCHAR | Official hospital name |
hospital_address | VARCHAR | Primary hospital address |
hospital_location | VARCHAR | Hospital location/system designation |
last_updated_on | DATE | Date MRF was last updated |
version | VARCHAR | MRF schema version (e.g., "2.0.0") |
license_number | VARCHAR | State license number |
hospital_state | VARCHAR | Two-letter state code |
affirmation | VARCHAR | Affirmation statement if provided |
confirm_affirmation | BOOLEAN | Whether affirmation was confirmed |
financial_aid_policy | VARCHAR | URL or description of financial aid policy |
general_contract_provisions | VARCHAR | Contract provisions text or URL |
hospital_city | VARCHAR | City where hospital is located |
all_addresses | VARCHAR[] | Array of all addresses for multi-location facilities |
all_locations | VARCHAR[] | Array of all location names |
mrf_hospital_name | VARCHAR | Hospital name as it appears in MRF file |
hpt_hospital_name | VARCHAR | Hospital name as it appears in HPT file |
2. standard_charges
Service-level charges with aggregated payer statistics (one row per service)
| Column | Type | Description |
|---|---|---|
charge_id | BIGINT | Unique charge identifier |
charge_seq | INTEGER | Charge sequence number |
hospital_id | BIGINT | Foreign key to hospitals table |
description | VARCHAR | Service or item description |
gross_charge | DOUBLE | Gross charge (undiscounted price) |
discounted_cash | DOUBLE | Discounted cash price |
minimum | DOUBLE | Minimum charge |
maximum | DOUBLE | Maximum charge |
setting | VARCHAR | Setting: 'inpatient', 'outpatient', 'both' |
billing_class | VARCHAR | Billing class (e.g., 'professional', 'facility') |
cpt | VARCHAR | CPT code |
hcpcs | VARCHAR | HCPCS code |
ms_drg | VARCHAR | MS-DRG code |
rc | VARCHAR | Revenue code |
cdm | VARCHAR | Charge description master code |
ndc | VARCHAR | National Drug Code |
icd | VARCHAR | ICD diagnosis/procedure code |
modifiers | VARCHAR | Associated procedure modifiers |
additional_generic_notes | VARCHAR | Additional notes about the charge |
drug_unit | VARCHAR | Drug unit of measure (if applicable) |
drug_type | VARCHAR | Type of drug (brand, generic) |
other_code1 | VARCHAR | Additional code type 1 |
other_code1_type | VARCHAR | Type of other_code1 |
other_code2 | VARCHAR | Additional code type 2 |
other_code2_type | VARCHAR | Type of other_code2 |
all_codes | VARCHAR | Concatenated string of all codes |
payer_count | INTEGER | Total number of payer entries |
distinct_payer_count | INTEGER | Number of unique payers |
avg_negotiated_rate | DOUBLE | Average negotiated dollar amount |
min_negotiated_rate | DOUBLE | Minimum negotiated dollar amount |
max_negotiated_rate | DOUBLE | Maximum negotiated dollar amount |
negotiated_rate_stddev | DOUBLE | Standard deviation of negotiated rates |
3. standard_charge_details
Payer-level detail with negotiated rates (one row per payer per service)
| Column | Type | Description |
|---|---|---|
detail_id | BIGINT | Unique detail row identifier (primary key) |
charge_id | BIGINT | Foreign key to standard_charges |
charge_seq | INTEGER | Charge sequence number |
payer_seq | INTEGER | Payer sequence within charge |
hospital_id | BIGINT | Foreign key to hospitals table |
description | VARCHAR | Charge description |
gross_charge | DOUBLE | Gross charge amount |
discounted_cash | DOUBLE | Cash/self-pay price |
minimum | DOUBLE | Minimum negotiated rate |
maximum | DOUBLE | Maximum negotiated rate |
setting | VARCHAR | Care setting |
billing_class | VARCHAR | Billing classification |
additional_generic_notes | VARCHAR | Additional notes |
drug_unit | VARCHAR | Drug unit of measure |
drug_type | VARCHAR | Drug type |
cpt | VARCHAR | CPT code |
hcpcs | VARCHAR | HCPCS code |
ms_drg | VARCHAR | MS-DRG code |
rc | VARCHAR | Revenue code |
cdm | VARCHAR | CDM code |
ndc | VARCHAR | NDC code |
icd | VARCHAR | ICD code |
other_code1 | VARCHAR | Additional code 1 |
other_code1_type | VARCHAR | Type of additional code 1 |
other_code2 | VARCHAR | Additional code 2 |
other_code2_type | VARCHAR | Type of additional code 2 |
all_codes | VARCHAR | All codes concatenated |
payer_name | VARCHAR | Insurance payer name |
plan_name | VARCHAR | Specific plan name |
additional_payer_notes | VARCHAR | Notes specific to this payer |
standard_charge_dollar | DOUBLE | Negotiated rate in dollars |
standard_charge_percentage | DOUBLE | Negotiated rate as percentage of gross |
standard_charge_algorithm | VARCHAR | Algorithm used to determine rate |
estimated_amount | DOUBLE | Estimated amount for this payer |
methodology | VARCHAR | Methodology for rate calculation |
payer_group | VARCHAR | Payer grouping (Commercial, Medicare, Medicaid, etc.) |
payer_type | VARCHAR | Type of payer |
setting_normalized | VARCHAR | Normalized setting value |
billing_class_normalized | VARCHAR | Normalized billing class value |
methodology_normalized | VARCHAR | Normalized methodology value |
4. modifier_charges
CPT/HCPCS modifier summary (one row per modifier)
| Column | Type | Description |
|---|---|---|
modifier_id | BIGINT | Unique modifier identifier |
hospital_id | BIGINT | Foreign key to hospitals table |
description | VARCHAR | Common name of the modifier |
code | VARCHAR | Modifier code (e.g., '50', '22') |
payer_count | INTEGER | Total number of payer rows |
distinct_payer_count | INTEGER | Number of distinct payers |
Note: Modifiers indicate special circumstances that affect how a procedure should be reimbursed. Common examples include bilateral procedures, multiple procedures, or unusual circumstances.
5. modifier_charge_details
Payer-level modifier details (one row per payer per modifier)
| Column | Type | Description |
|---|---|---|
modifier_detail_id | BIGINT | Unique modifier detail identifier |
modifier_id | BIGINT | Foreign key to modifier_charges |
payer_seq | INTEGER | Payer sequence within modifier |
hospital_id | BIGINT | Foreign key to hospitals table |
modifier_description | VARCHAR | Modifier description |
modifier_code | VARCHAR | Modifier code |
payer_name | VARCHAR | Insurance payer name |
plan_name | VARCHAR | Insurance plan name |
description | VARCHAR | How modifier changes the charge |
Note: This table denormalizes modifier information for query optimization, similar to standard_charge_details. Additional columns may include payer groups, methodology, and negotiated rates.
6. cms_hpt_metadata
CMS-HPT file metadata (one row per hospital)
| Column | Type | Description |
|---|---|---|
contact_email | VARCHAR | Contact email from HPT file |
contact_name | VARCHAR | Contact name from HPT file |
location_name | VARCHAR | Location name |
mrf_url | VARCHAR | MRF file URL from HPT listing |
source_page_url | VARCHAR | CMS HPT source page URL |
Note: This table contains metadata from the CMS Hospital Price Transparency program listing. It links to the original CMS HPT listing and provides contact information.
7. mrf_metadata
MRF file download and parsing metadata (one row per hospital)
| Column | Type | Description |
|---|---|---|
facility | VARCHAR | Facility name |
source_url | VARCHAR | URL where MRF was downloaded from |
data_downloaded | VARCHAR | Download timestamp |
filename | VARCHAR | Original filename |
filesize | VARCHAR | File size (human readable) |
md5 | VARCHAR | MD5 checksum of file |
last_updated_on | VARCHAR | Last update date from MRF metadata |
Table Relationships
- hospitals (1) ↔ (many) standard_charges
- hospitals (1) ↔ (many) standard_charge_details
- standard_charges (1) ↔ (many) standard_charge_details (via charge_id)
- hospitals (1) ↔ (1) cms_hpt_metadata
- hospitals (1) ↔ (1) mrf_metadata
Schema Notes
- Denormalized Schema: We felt that a structure that most closely resembled the "Tall" csv format would make it easier for users to explore and analyze the data.
- Payer Standardization: Payer names are standardized and grouped (e.g., all Blue Cross variants → "BCBS" in
payer_group). Payer types classify as Commercial, Medicare, Medicaid, Medicare Advantage, or Other. - Computed Fields: Some columns like
hospital_city,payer_type, and other standardized fields are derived during processing to make querying easier.
Querying in Duck-UI
The easiest way to explore the data is through Duck-UI, a browser-based SQL interface. Click "Query in Duck-UI" on any hospital detail page to start querying immediately.
Common SQL Patterns
List all available tables
SHOW TABLES; View table schema
DESCRIBE table_name; Find charges for a specific CPT code
SELECT cpt, description, gross_charge
FROM standard_charge_details
WHERE cpt = '99213'
ORDER BY gross_charge DESC; Compare prices across payers
SELECT payer_name, AVG(standard_charge_dollar) as avg_rate
FROM standard_charge_details
WHERE cpt = '99213'
GROUP BY payer_name
ORDER BY avg_rate DESC; Get payer statistics for a hospital
SELECT payer_type,
COUNT(*) as charge_count,
AVG(standard_charge_dollar) as avg_rate,
MIN(standard_charge_dollar) as min_rate,
MAX(standard_charge_dollar) as max_rate
FROM standard_charge_details
WHERE standard_charge_dollar IS NOT NULL
GROUP BY payer_type
ORDER BY avg_rate DESC; Using DuckDB Locally
Download any .duckdb file and open it with the DuckDB CLI or Python:
Python Example
import duckdb
# Connect to the database
con = duckdb.connect('hospital_parsed.duckdb')
# Run a query
result = con.execute("SELECT * FROM standard_charges LIMIT 10").fetchdf()
print(result) Performance Tips
- Use
LIMITfor initial exploration to avoid loading huge result sets - DuckDB automatically optimizes queries - no need to create indexes manually
- For large exports, use
COPY TOinstead of fetching to memory
Frequently Asked Questions
What is a Machine-Readable File (MRF)?
An MRF is a structured data file (JSON or CSV) that hospitals are required to publish containing their standard charges for all items and services. The "machine-readable" format allows automated processing and analysis, unlike PDF documents.
How often is this data updated?
Hospitals must update their pricing data at least annually, though some update more frequently. This portal periodically refreshes data from hospital sources. Check the "Downloaded" timestamp on each hospital's detail page to see when the data was last retrieved.
What's the difference between gross charges and negotiated rates?
Gross charges (also called "standard charges" or "chargemaster prices") are the hospital's list prices before any discounts. Negotiated rates are the actual amounts that insurance companies have agreed to pay, which are typically much lower than gross charges.
How do I find a specific hospital?
Use the search bar at the top of any page to search by hospital name. You can also browse alphabetically using the letter index below the search bar, or view all hospitals on the Hospitals page.
Can I download all the data at once?
Currently, each hospital's data must be downloaded individually from their detail page. We intend to release a comprehensive dataset with all hospitals in one download in the near future.
What tools can I use to analyze this data?
The DuckDB files can be opened with:
- Duck-UI - Browser-based SQL interface (no installation required)
- DuckDB CLI - Command-line tool (download)
- Python - Using the
duckdbpackage - R - Using the
duckdbpackage - Jupyter - Great for exploratory analysis
Why are some hospitals missing?
This portal only includes hospitals that have published compliant MRF files that we were able to parse. Hospitals may be missing due to non-compliance, technical issues with their files, or because they haven't been processed yet.
How accurate is this data?
The data is exactly as published by each hospital. We do not modify pricing information, only parse and structure it for easier access. However, hospitals may have errors in their original files, and published prices may not reflect all discounts or negotiations.