Friday, 3 May 2019

How to use an if command inside a foreach loop in Stata?

I want to loop through several datasets that are similar except for one thing: in some of them each individual has a single observation, while in others individuals can have more than one observation.

In the loop, I first want to transpose these datasets into wide datasets. Transposed variables are renamed with a numerical suffix (i.e. variables A B C become A1 B1 C1 A2 B2 C2). When individuals have only one observation, the dataset only has variables A1 B1 C1. When at least some individuals have more than one observation, the dataset will contain variables A1 B1 C1 A2 B2 C2 etc. I would then like to run some commands that relate to variables A1 B1 C1, or to A1 B1 C1 A2 B2 C2 etc., where appropriate. However, inside the loop I cannot refer to variables like A2 that only exist in some of the datasets, as this returns an error when the loop reaches a dataset that only has A1 B1 C1.

How can I use the if command to direct part of my code to only some of the datasets? Or is there some other approach entirely?

I have the following type of code:

foreach year in 98 00 02 04 06 08 {
   use data_`year', clear
   bysort id: generate idcount=_n
   by id: egen idcountmax=max(idcount)
   reshape wide A B C ..., i(id idcountmax) j(idcount)


This means in datasets where individuals have only one observation, idcountmax=1 and in the transposed version there are only variables A1 B1 C1. But in datasets where individuals had more than one observation, idcountmax>1 and there are variables A1 B1 C1 A2 B2 C2 etc.

I want to do something of the sort:

   if idcountmax==1 {
      (commands that relate to A1 B1 C1)
   }
   if idcountmax==2 {
      (commands that relate to A1 B1 C1 A2 B2 C2)
   }
   if idcountmax==3 {
      (commands that relate to A1 B1 C1 A2 B2 C2 A3 B3 C3)
   }
}

But Stata's if command does not work this way. It evaluates the expression idcountmax==x using only the first observation of the dataset and then runs the enclosed commands either for all observations in the current dataset or for none. What I would like is for Stata to find all observations in the current dataset where idcountmax==x and perform the appropriate commands, then find all observations where idcountmax==y and perform those commands, and so on.
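
To illustrate the behaviour I am describing, here is a minimal sketch on toy data. The variable names flag_cmd and flag_qual are made up for this example and are not part of my real datasets. It contrasts the if command with the if qualifier:

```stata
* Toy data: three individuals with different values of idcountmax
clear
input id idcountmax
1 1
2 2
3 3
end

* if command: the expression is evaluated once, using the first
* observation (idcountmax[1]), so the block runs for all rows or not at all
if idcountmax == 1 {
    generate flag_cmd = 1
}

* if qualifier: the expression is evaluated separately for each observation
generate flag_qual = 1 if idcountmax == 1

list
```

Here flag_cmd is set to 1 in every row because the first observation happens to have idcountmax equal to 1, whereas flag_qual is set only in the rows where the condition actually holds.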

(I have used SAS previously; this would correspond to SAS's if ... then do; end;)

Note that I could of course find out which datasets have idcountmax==1, 2, and 3, and write a separate loop for each group, but this is definitely undesirable. I also do not wish to add A2-C3 to datasets that don't have them.
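
One direction that might work, given as an untested sketch rather than a confirmed solution: store the largest per-individual observation count in a local macro before reshaping, then branch on that macro, which holds a single value for the whole dataset. The macro name maxobs is hypothetical, A B C stands in for my full variable list, and the summarize calls are only placeholders for the real commands. The sketch also assumes that any other variables are constant within id, so that i(id) alone is enough for reshape.

```stata
foreach year in 98 00 02 04 06 08 {
    use data_`year', clear
    bysort id: generate idcount = _n

    * Largest number of observations held by any individual in this dataset
    summarize idcount, meanonly
    local maxobs = r(max)

    reshape wide A B C, i(id) j(idcount)

    * A1 B1 C1 always exist after the reshape
    summarize A1 B1 C1

    * These blocks run only when the suffixed variables exist
    if `maxobs' >= 2 {
        summarize A2 B2 C2
    }
    if `maxobs' >= 3 {
        summarize A3 B3 C3
    }
}
```

Because `maxobs' is expanded to a number before the if command is evaluated, the test does not depend on the first observation. If the same commands apply to every suffix, a forvalues k = 1/`maxobs' loop over A`k' B`k' C`k' might avoid the repeated blocks. Another possibility is to test for a variable directly with capture confirm variable A2 and then check _rc.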
