Documentation

About This Dataset

This directory provides easy access to hospital Machine-Readable Files (MRFs) as required by the CMS Hospital Price Transparency rule. Under this regulation, hospitals must publish standard charges for all items and services in a machine-readable format. The details and structure of these files are available at the CMSGov github repository.

What's Available

  • Original MRF Files - Raw data files published by hospitals (JSON, CSV, or other formats)
  • DuckDB Databases - Parsed and optimized versions for fast querying
  • Hospital Metadata - Contact information, URLs, and file details
  • MRF Metadata - Download timestamps, checksums, and file characteristics

Data Collection & Processing

CMS-HPT.txt files are regularly fetched from hospital websites, and the MRF files they point to are downloaded and parsed. Each hospital in the directory has the cms-hpt.txt file, if available, a link to the original MRF file, if available, and a parsed duckdb version of that MRF file, if we could successfuly parse it.

Why DuckDB?

The raw files that are published by hospitals can be large and difficult to work with directly. We felt that providing an easy way to query the data in a structure that most closely resembled the "Tall" csv format would make it easier for users to explore and analyze the data. We are huge fans of DuckDB. It is easy to install, incredibly fast and lightweight, and widely supported.

Update Frequency

Hospitals are required to update their pricing data at least annually. This directory's data is refreshed periodically to reflect the latest published information.

Consolidated Hospital Download

If you're looking for a consolidated database of all the MRF data that is available at the Hospital Data Directory, we've made that available.

Simply login to Oria (create an account if you don't have one, it's free!), and on the left menu you can navigate to Full Data Download. From there you can download a single archive that contains a duckdb file combined with parquet files that has all 5000+ hospital MRF data parsed and ready to query in one place.

Data Dictionary

Hospital Metadata Files

Each hospital folder contains a cms-hpt.txt file with the following fields:

Field Name Type Description
location_name string Official name of the healthcare facility
source_page_url URL Link to the hospital's price transparency webpage
mrf_url URL Direct download link to the MRF file
contact_name string Contact person for pricing inquiries
contact_email email Email address for pricing questions

MRF Metadata (mrf-meta.txt)

Technical details about the downloaded MRF file:

Field Name Type Description
facility string Facility name extracted from MRF
downloaded timestamp Date and time the file was retrieved
file_name string Original filename of the MRF
file_size integer Size of the file in bytes
file_type string MIME type or format (e.g., application/json)
md5_hash string MD5 checksum for file integrity verification
modified_time timestamp Last modified date from the source server

Original MRF File Format

The raw MRF files published by hospitals typically follow the CMS schema and contain pricing data in JSON or CSV format. These files are parsed and loaded into the DuckDB database format for easier querying. See the DuckDB Schema section below for the actual database structure.

Working with DuckDB Files

Each hospital's data has been parsed into a DuckDB database (*_parsed.duckdb) for efficient querying.

DuckDB Database Schema

The parsed DuckDB files contain 7 main tables with normalized and enriched data. Below is the complete schema reference.

1. hospitals

Hospital metadata (one row per hospital)

Column Type Description
hospital_idBIGINTUnique hospital identifier (primary key)
hospital_nameVARCHAROfficial hospital name
hospital_addressVARCHARPrimary hospital address
hospital_locationVARCHARHospital location/system designation
last_updated_onDATEDate MRF was last updated
versionVARCHARMRF schema version (e.g., "2.0.0")
license_numberVARCHARState license number
hospital_stateVARCHARTwo-letter state code
affirmationVARCHARAffirmation statement if provided
confirm_affirmationBOOLEANWhether affirmation was confirmed
financial_aid_policyVARCHARURL or description of financial aid policy
general_contract_provisionsVARCHARContract provisions text or URL
hospital_cityVARCHARCity where hospital is located
all_addressesVARCHAR[]Array of all addresses for multi-location facilities
all_locationsVARCHAR[]Array of all location names
mrf_hospital_nameVARCHARHospital name as it appears in MRF file
hpt_hospital_nameVARCHARHospital name as it appears in HPT file

2. standard_charges

Service-level charges with aggregated payer statistics (one row per service)

Column Type Description
charge_idBIGINTUnique charge identifier
charge_seqINTEGERCharge sequence number
hospital_idBIGINTForeign key to hospitals table
descriptionVARCHARService or item description
gross_chargeDOUBLEGross charge (undiscounted price)
discounted_cashDOUBLEDiscounted cash price
minimumDOUBLEMinimum charge
maximumDOUBLEMaximum charge
settingVARCHARSetting: 'inpatient', 'outpatient', 'both'
billing_classVARCHARBilling class (e.g., 'professional', 'facility')
cptVARCHARCPT code
hcpcsVARCHARHCPCS code
ms_drgVARCHARMS-DRG code
rcVARCHARRevenue code
cdmVARCHARCharge description master code
ndcVARCHARNational Drug Code
icdVARCHARICD diagnosis/procedure code
modifiersVARCHARAssociated procedure modifiers
additional_generic_notesVARCHARAdditional notes about the charge
drug_unitVARCHARDrug unit of measure (if applicable)
drug_typeVARCHARType of drug (brand, generic)
other_code1VARCHARAdditional code type 1
other_code1_typeVARCHARType of other_code1
other_code2VARCHARAdditional code type 2
other_code2_typeVARCHARType of other_code2
all_codesVARCHARConcatenated string of all codes
payer_countINTEGERTotal number of payer entries
distinct_payer_countINTEGERNumber of unique payers
avg_negotiated_rateDOUBLEAverage negotiated dollar amount
min_negotiated_rateDOUBLEMinimum negotiated dollar amount
max_negotiated_rateDOUBLEMaximum negotiated dollar amount
negotiated_rate_stddevDOUBLEStandard deviation of negotiated rates

3. standard_charge_details

Payer-level detail with negotiated rates (one row per payer per service)

Column Type Description
detail_idBIGINTUnique detail row identifier (primary key)
charge_idBIGINTForeign key to standard_charges
charge_seqINTEGERCharge sequence number
payer_seqINTEGERPayer sequence within charge
hospital_idBIGINTForeign key to hospitals table
descriptionVARCHARCharge description
gross_chargeDOUBLEGross charge amount
discounted_cashDOUBLECash/self-pay price
minimumDOUBLEMinimum negotiated rate
maximumDOUBLEMaximum negotiated rate
settingVARCHARCare setting
billing_classVARCHARBilling classification
additional_generic_notesVARCHARAdditional notes
drug_unitVARCHARDrug unit of measure
drug_typeVARCHARDrug type
cptVARCHARCPT code
hcpcsVARCHARHCPCS code
ms_drgVARCHARMS-DRG code
rcVARCHARRevenue code
cdmVARCHARCDM code
ndcVARCHARNDC code
icdVARCHARICD code
other_code1VARCHARAdditional code 1
other_code1_typeVARCHARType of additional code 1
other_code2VARCHARAdditional code 2
other_code2_typeVARCHARType of additional code 2
all_codesVARCHARAll codes concatenated
payer_nameVARCHARInsurance payer name
plan_nameVARCHARSpecific plan name
additional_payer_notesVARCHARNotes specific to this payer
standard_charge_dollarDOUBLENegotiated rate in dollars
standard_charge_percentageDOUBLENegotiated rate as percentage of gross
standard_charge_algorithmVARCHARAlgorithm used to determine rate
estimated_amountDOUBLEEstimated amount for this payer
methodologyVARCHARMethodology for rate calculation
payer_groupVARCHARPayer grouping (Commercial, Medicare, Medicaid, etc.)
payer_typeVARCHARType of payer
setting_normalizedVARCHARNormalized setting value
billing_class_normalizedVARCHARNormalized billing class value
methodology_normalizedVARCHARNormalized methodology value

4. modifier_charges

CPT/HCPCS modifier summary (one row per modifier)

Column Type Description
modifier_idBIGINTUnique modifier identifier
hospital_idBIGINTForeign key to hospitals table
descriptionVARCHARCommon name of the modifier
codeVARCHARModifier code (e.g., '50', '22')
payer_countINTEGERTotal number of payer rows
distinct_payer_countINTEGERNumber of distinct payers

Note: Modifiers indicate special circumstances that affect how a procedure should be reimbursed. Common examples include bilateral procedures, multiple procedures, or unusual circumstances.

5. modifier_charge_details

Payer-level modifier details (one row per payer per modifier)

Column Type Description
modifier_detail_idBIGINTUnique modifier detail identifier
modifier_idBIGINTForeign key to modifier_charges
payer_seqINTEGERPayer sequence within modifier
hospital_idBIGINTForeign key to hospitals table
modifier_descriptionVARCHARModifier description
modifier_codeVARCHARModifier code
payer_nameVARCHARInsurance payer name
plan_nameVARCHARInsurance plan name
descriptionVARCHARHow modifier changes the charge

Note: This table denormalizes modifier information for query optimization, similar to standard_charge_details. Additional columns may include payer groups, methodology, and negotiated rates.

6. cms_hpt_metadata

CMS-HPT file metadata (one row per hospital)

Column Type Description
contact_emailVARCHARContact email from HPT file
contact_nameVARCHARContact name from HPT file
location_nameVARCHARLocation name
mrf_urlVARCHARMRF file URL from HPT listing
source_page_urlVARCHARCMS HPT source page URL

Note: This table contains metadata from the CMS Hospital Price Transparency program listing. It links to the original CMS HPT listing and provides contact information.

7. mrf_metadata

MRF file download and parsing metadata (one row per hospital)

Column Type Description
facilityVARCHARFacility name
source_urlVARCHARURL where MRF was downloaded from
data_downloadedVARCHARDownload timestamp
filenameVARCHAROriginal filename
filesizeVARCHARFile size (human readable)
md5VARCHARMD5 checksum of file
last_updated_onVARCHARLast update date from MRF metadata

Table Relationships

  • hospitals (1) ↔ (many) standard_charges
  • hospitals (1) ↔ (many) standard_charge_details
  • standard_charges (1) ↔ (many) standard_charge_details (via charge_id)
  • hospitals (1) ↔ (1) cms_hpt_metadata
  • hospitals (1) ↔ (1) mrf_metadata

Schema Notes

  • Denormalized Schema: We felt that a structure that most closely resembled the "Tall" csv format would make it easier for users to explore and analyze the data.
  • Payer Standardization: Payer names are standardized and grouped (e.g., all Blue Cross variants → "BCBS" in payer_group). Payer types classify as Commercial, Medicare, Medicaid, Medicare Advantage, or Other.
  • Computed Fields: Some columns like hospital_city, payer_type, and other standardized fields are derived during processing to make querying easier.

Querying in Duck-UI

The easiest way to explore the data is through Duck-UI, a browser-based SQL interface. Click "Query in Duck-UI" on any hospital detail page to start querying immediately.

Common SQL Patterns

List all available tables

SHOW TABLES;

View table schema

DESCRIBE table_name;

Find charges for a specific CPT code

SELECT cpt, description, gross_charge
FROM standard_charge_details
WHERE cpt = '99213'
ORDER BY gross_charge DESC;

Compare prices across payers

SELECT payer_name, AVG(standard_charge_dollar) as avg_rate
FROM standard_charge_details
WHERE cpt = '99213'
GROUP BY payer_name
ORDER BY avg_rate DESC;

Get payer statistics for a hospital

SELECT payer_type,
       COUNT(*) as charge_count,
       AVG(standard_charge_dollar) as avg_rate,
       MIN(standard_charge_dollar) as min_rate,
       MAX(standard_charge_dollar) as max_rate
FROM standard_charge_details
WHERE standard_charge_dollar IS NOT NULL
GROUP BY payer_type
ORDER BY avg_rate DESC;

Using DuckDB Locally

Download any .duckdb file and open it with the DuckDB CLI or Python:

Python Example

import duckdb

# Connect to the database
con = duckdb.connect('hospital_parsed.duckdb')

# Run a query
result = con.execute("SELECT * FROM standard_charges LIMIT 10").fetchdf()
print(result)

Performance Tips

  • Use LIMIT for initial exploration to avoid loading huge result sets
  • DuckDB automatically optimizes queries - no need to create indexes manually
  • For large exports, use COPY TO instead of fetching to memory

Frequently Asked Questions

What is a Machine-Readable File (MRF)?

An MRF is a structured data file (JSON or CSV) that hospitals are required to publish containing their standard charges for all items and services. The "machine-readable" format allows automated processing and analysis, unlike PDF documents.

How often is this data updated?

Hospitals must update their pricing data at least annually, though some update more frequently. This portal periodically refreshes data from hospital sources. Check the "Downloaded" timestamp on each hospital's detail page to see when the data was last retrieved.

What's the difference between gross charges and negotiated rates?

Gross charges (also called "standard charges" or "chargemaster prices") are the hospital's list prices before any discounts. Negotiated rates are the actual amounts that insurance companies have agreed to pay, which are typically much lower than gross charges.

How do I find a specific hospital?

Use the search bar at the top of any page to search by hospital name. You can also browse alphabetically using the letter index below the search bar, or view all hospitals on the Hospitals page.

Can I download all the data at once?

Currently, each hospital's data must be downloaded individually from their detail page. We intend to release a comprehensive dataset with all hospitals in one download in the near future.

What tools can I use to analyze this data?

The DuckDB files can be opened with:

  • Duck-UI - Browser-based SQL interface (no installation required)
  • DuckDB CLI - Command-line tool (download)
  • Python - Using the duckdb package
  • R - Using the duckdb package
  • Jupyter - Great for exploratory analysis

Why are some hospitals missing?

This portal only includes hospitals that have published compliant MRF files that we were able to parse. Hospitals may be missing due to non-compliance, technical issues with their files, or because they haven't been processed yet.

How accurate is this data?

The data is exactly as published by each hospital. We do not modify pricing information, only parse and structure it for easier access. However, hospitals may have errors in their original files, and published prices may not reflect all discounts or negotiations.

External Resources