DATA FORMATS

 

Data come in many formats, and you will often need to convert your data before you can import them into Stata. The two types discussed here are ASCII and binary.

 

ASCII files store data as plain text, so they can be read by any text editor. ASCII data may be stored in fixed or free format. Fixed format places each variable in the same columns on every line. Free format separates values with a delimiter such as a comma, tab, or space.
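To make the difference between the two layouts concrete, here is a minimal sketch that writes the same two observations in each format. The file names, the variables (id, age, income), and the values are made up for illustration.

```stata
* Write the same two observations (id, age, income) in both layouts.
* File names and values are hypothetical.
tempname fh

* Free format: values are separated by commas; column position is irrelevant.
file open `fh' using "free_example.csv", write text replace
file write `fh' "1,34,52000" _n
file write `fh' "2,8,7500" _n
file close `fh'

* Fixed format: id in columns 1-2, age in columns 3-5, income in columns 6-11.
file open `fh' using "fixed_example.txt", write text replace
file write `fh' " 1 34 52000" _n
file write `fh' " 2  8  7500" _n
file close `fh'

* Display both files to compare the layouts.
type "free_example.csv"
type "fixed_example.txt"
```

In the fixed-format file the values line up vertically because each variable always occupies the same columns; in the free-format file only the commas mark where one value ends and the next begins.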

 

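Binary data must be decoded by software that understands the format. Binary formats such as Stata's can also store metadata, including labels and notes. Because binary formats can change between software releases, a file may become unreadable by a different version; for example, a dataset saved by a newer release of Stata may not open in an older one unless it is written with saveold.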
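As an illustration of metadata travelling with a binary file, the sketch below builds a tiny dataset, attaches a variable label and a note, saves it in Stata format, and reloads it. The variable names, values, label text, and file names are hypothetical.

```stata
* Build a tiny dataset, attach metadata, and save it in Stata's binary format.
clear
input id age
1 34
2 8
end
label variable age "Age in years"
notes age: Hypothetical values entered by hand
save "binary_example.dta", replace

* Reload the file: the label and the note come back with the data.
use "binary_example.dta", clear
describe
notes

* saveold writes a format intended for older Stata releases.
saveold "binary_example_old.dta", replace
```

An ASCII export of the same data would keep the values but not the label or the note, which is one reason to keep a copy in Stata format once the import has been verified.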

 

Stata also reads SAS transport format and, of course, Stata format.

 

IMPORTING DATA

Long (2009) specifies three ways to import data:

 

1. Stata will read data in Stata, ASCII, or SAS transport formats.

2. Data can first be imported into a program that allows users to export the data in Stata, ASCII, or SAS transport formats.

3. A program such as Stat/Transfer or DBMS/Copy can be used to convert the data into a format that Stata reads. See the video below for a demonstration of Stat/Transfer.

 

When importing ASCII files, Stata users have three command options:

 

1. insheet reads tab-delimited or comma-delimited files, which are often generated by spreadsheet programs. In Stata 13 and later, import delimited supersedes insheet, although insheet still works. For examples, see help insheet.

2. infile reads tab-delimited, comma-delimited, and space-delimited ASCII files stored in free format. It can also read fixed-format files when a dictionary specifies the variable locations. For examples, see help infile.

3. infix reads fixed-format ASCII files without a dictionary; the column locations of the variables are given in the command itself. For examples, see help infix.
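The three commands above can be sketched side by side. This is a minimal example under stated assumptions: the file names, the variables (id, age, income), and the column positions are invented, and the files are written first so that the example runs on its own.

```stata
* Hypothetical files: three variables (id, age, income), two observations.
tempname fh

* Comma-delimited file with a header row, as a spreadsheet might export it.
file open `fh' using "survey.csv", write text replace
file write `fh' "id,age,income" _n "1,34,52000" _n "2,8,7500" _n
file close `fh'

* Space-delimited free-format file without a header row.
file open `fh' using "survey_free.raw", write text replace
file write `fh' "1 34 52000" _n "2 8 7500" _n
file close `fh'

* Fixed-format file: id in columns 1-2, age in 3-5, income in 6-11.
file open `fh' using "survey_fixed.raw", write text replace
file write `fh' " 1 34 52000" _n " 2  8  7500" _n
file close `fh'

* 1. insheet takes the variable names from the header row.
insheet using "survey.csv", comma clear
list

* 2. infile (free format): name the variables in the order they appear.
infile id age income using "survey_free.raw", clear
list

* 3. infix: give each variable's column range; no dictionary file is needed.
infix id 1-2 age 3-5 income 6-11 using "survey_fixed.raw", clear
list
```

Each list should show the same two observations. With real data, the column ranges passed to infix come from the codebook supplied with the file.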

 

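When importing SAS transport files, Stata users can use the fdause command. In Stata 12 and later, the same functionality is documented under import sasxport (import sasxport5 as of Stata 16). For examples, see help fdause.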
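A round trip through the transport format might look like the sketch below. It assumes that fdasave and fdause are still accepted by your Stata release; if they are not, check help import sasxport or help import sasxport5 for the current equivalents. The output file name is hypothetical.

```stata
* Start from an example dataset shipped with Stata and keep variables whose
* names fit the eight-character limit of the transport format.
sysuse auto, clear
keep price mpg weight

* Write a SAS transport file, then read it back into Stata.
fdasave "auto_subset.xpt", replace
fdause "auto_subset.xpt", clear
describe
```

Restricting the example to short variable names without value labels sidesteps the renaming and format-file handling that the transport format otherwise requires.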

 

The following video demonstrates how to import ASCII files using dialog boxes.

 

This video demonstrates how to copy and paste Excel data into Stata.

 

VERIFYING DATA CONVERSION

IMPORTANT: Regardless of the method used to import the data, researchers should always verify that Stata has read their dataset accurately. Even seemingly foolproof methods sometimes fail.

 

Long (2009) outlines three steps for verifying that your data are accurate after they have been imported into Stata.

 

1. Compare descriptive statistics: Run frequency distributions on variables with limited categories and obtain summary statistics on more complex variables. Use codebook, compact to quickly obtain basic summary statistics on all of the variables in your dataset. When using another statistical package (e.g., SAS or SPSS), compare descriptive statistics across programs to ensure accuracy.

2. Check missing values: Compare missing values in the source program and in Stata to ensure they were converted properly. In Stata, add the missing option to tabulations so that missing values are displayed (e.g., tab1 varlist, missing).

3. Compare conversion methods: Import the same dataset into Stata using two different methods and compare the files using the cf command.
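The three steps above can be sketched in a single do-file. This is an illustrative example, not Long's own code: the file names, the variables (id, income, group), and the values are hypothetical, and the two import methods are insheet and infile.

```stata
* Hypothetical comma-delimited file with one missing income value.
tempname fh
file open `fh' using "check.csv", write text replace
file write `fh' "id,income,group" _n "1,52000,1" _n "2,,2" _n "3,7500,1" _n
file close `fh'

* Method A: insheet reads the header row; the empty field becomes missing.
insheet using "check.csv", comma clear

* Step 1: descriptive statistics to compare with the source program.
codebook, compact
summarize income

* Step 2: tabulate with the missing option so missing values are shown.
tab1 group income, missing

save "check_a.dta", replace

* Method B: infile on a free-format copy, with a period marking the
* missing value.
file open `fh' using "check_free.raw", write text replace
file write `fh' "1 52000 1" _n "2 . 2" _n "3 7500 1" _n
file close `fh'
infile id income group using "check_free.raw", clear

* Step 3: compare the data in memory with the file saved from method A.
cf _all using "check_a.dta", verbose
```

cf compares the dataset in memory with a dataset on disk, variable by variable, and exits with an error if any values differ, so a do-file that runs to completion indicates that the two methods agree.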
