UMLS and MySQL stats

These stats are from my experience loading the Unified Medical Language System (UMLS), maintained by the National Library of Medicine (NLM), into tables in MySQL version 3.23.39. Registered UMLS users can download the compressed 2001 UMLS Knowledge Sources here. The download is 645 MB as a Unix .TGZ archive or 642 MB as a PC .ZIP archive.

You can find more information about accessing the UMLS Knowledge Sources here on the UMLS Information page. Sample load scripts for putting the UMLS Metathesaurus into MySQL are available here.
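As a rough idea of what a load step can look like, here is a minimal Python sketch that only builds a MySQL LOAD DATA INFILE statement for one UMLS text file. It is not taken from the sample load scripts mentioned above. The function name, the file path, and the assumption that the table has already been created with matching columns are all hypothetical; the one thing it relies on is that the UMLS relational files are pipe-delimited, one record per line.

```python
def build_load_statement(text_file: str, table: str) -> str:
    """Build a LOAD DATA INFILE statement for one pipe-delimited UMLS file.

    Assumption: `table` already exists with columns in the same order
    as the fields in `text_file`.
    """
    return (
        f"LOAD DATA INFILE '{text_file}' "
        f"INTO TABLE {table} "
        "FIELDS TERMINATED BY '|' "   # UMLS fields are separated by '|'
        "LINES TERMINATED BY '\\n';"  # one record per line
    )


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever the uncompressed files live.
    print(build_load_statement("/data/umls/META/MRCON", "MRCON"))
```

The statement would then be run through the mysql client or any MySQL driver. Each UMLS record may also end with a trailing '|', in which case MySQL sees one extra empty field per line; whether to ignore it or fold it into the line terminator is a choice left to the load script.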

All file sizes are in bytes unless otherwise indicated.


                                             before MySQL             after MySQL
DIRECTORY NAME                  # OF TABLES  SIZE OF TEXT FILES       SIZE OF TABLE FILES
                                             (uncompressed)           (.MYD, .MYI, .frm)
META (Metathesaurus)            18           3,182,121,481            3,553,484,570
LEX (Lexicon)                   13           39,048,919               48,009,980
LEX/LEX_DB (Lexical databases)  3            581,585                  875,002
NET (Semantic Network)          6            587,709                  835,980
UMLS (total)                    40           3,235,105,035 (3 GB)     3,603,205,532 (3.3 GB)
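To make the before/after comparison easier to read, the per-directory growth can be worked out as a percentage. This is just a small Python sketch over the figures in the table above; the names SIZES and overhead_percent are my own for illustration.

```python
# (text file bytes, MySQL table file bytes) per directory, from the table above
SIZES = {
    "META": (3_182_121_481, 3_553_484_570),
    "LEX": (39_048_919, 48_009_980),
    "LEX/LEX_DB": (581_585, 875_002),
    "NET": (587_709, 835_980),
}


def overhead_percent(text_bytes: int, table_bytes: int) -> float:
    """Growth from text files to table files, as a percentage of the text size."""
    return 100.0 * (table_bytes - text_bytes) / text_bytes


for name, (text_bytes, table_bytes) in SIZES.items():
    print(f"{name:<12}{overhead_percent(text_bytes, table_bytes):6.1f}%")
```

By this measure the small directories grow the most in relative terms, while META, which dominates the total, grows the least.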



SIZES OF A FEW SPECIFIC FILES

            before MySQL             after MySQL
FILE NAME   SIZE OF TEXT FILES       SIZE OF TABLE FILES
            (uncompressed)           (.MYD, .MYI, .frm)
MRCOC       466,167,879              524,103,382
MRCON       123,355,529              129,604,508
MRSO        89,085,376               31,835,392 **
LRAGR       21,929,585               24,302,690
** Note: This is the size of table MRSO after it was compressed using the myisampack tool provided by MySQL. Before compression, the size of the table files was 87,739,980 bytes.
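For a sense of how much myisampack saved on MRSO, here is a short Python sketch using only the two sizes given in the note. The names are my own for illustration.

```python
MRSO_UNPACKED = 87_739_980  # table files before myisampack, in bytes
MRSO_PACKED = 31_835_392    # table files after myisampack, in bytes


def percent_saved(before: int, after: int) -> float:
    """Space saved by compression, as a percentage of the original size."""
    return 100.0 * (before - after) / before


print(f"myisampack saved {percent_saved(MRSO_UNPACKED, MRSO_PACKED):.1f}% on MRSO")
```

Keep in mind that a table packed with myisampack becomes read-only, so this kind of saving is only an option for tables that will not be updated after loading.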


UMLS Total Counts (from Section B.3 of the UMLS Documentation)
Concepts: 797,359
Terms: 1,485,241
Strings: 1,734,706
Source Strings: 1,877,059


Alice Hagens, August 2001
email: alice.hagens@dartmouth.edu