mardi 7 mai 2019

Is there a way to run a conditionally sum loop in Stata?

I have quite a large conflict dataset (71 million observations) with many variables and date (daily).

This is from the GDELT project for which the way the dataset is structured is that for each day, there is a target country and a source country of aggression. Namely, the first of January of 2000, many countries engaged in aggressive behaviour against others or themselves, and this dataset tracks this.

It looks like this:

clear

input long date_01 str18 source_01 str19 target_01 str4 cameocode_01
20000101 "AFG"    "AFGGOV" "2" 
20000101 "AFG"    "AFGGOV" "8"
20000101 "AFG"    "ARE"    "3" 
20000101 "AFG"    "CVL"    "4" 
20000101 "AFG"    "GOV"    "10" 
20000101 "AFG"    "GOV"    "4" 
20000101 "AFGGOV" "kasUAF" "3"
20000101 "FRA" "kasUAF"    "8" 
20000101 "AFG"    "IGOUNO" "3" 
20000101 "AFG"    "IND"    "4" 
20000101 "AFG"    "IND"    "12"
20000102 "AFG"    "IND"    "19"  
end

Date is the day, Source is the country that initiated aggression, Target is the victim, and cameocode is the variable of concern which states the degree of hostility or cooperation. If the number is between 10 and 10, that is a hostility event, 20 being the more hostile. IF the number is between 0 and 9, that indicates cooperation (good event), 9 being the friendliest.

I have managed with some help from this platform to isolate the event per country, namely to isolate the cameo codes involving a certain amount of countries (im interested in 30) to be able to understand their conflict evolution through time. To do so, I did this:

 foreach c in AFG IND ARE {
generate ind_`c' = cameocode_01 if strmatch(source_01, "`c'") |  ///
                                   strmatch(target_01, "`c'")
}

Which yields this, as I desired:

date         source       target        cameocode      ind_AFG   ind_IND   ind_ARE

1. 20000101         AFG      AFGGOV            2         2                    
2. 20000101         AFG         IND            4         4            4           
3. 20000101         AFG      AFGGOV            8         8                     
4. 20000101         AFG         ARE            3         3              36       
5. 20000101         AFG         CVL            4         4                     
6. 20000101         AFG         GOV            10       10                      
7. 20000101         AFG         GOV            4        4                      
8. 20000101      AFGGOV      kasUAF            3                               
9. 20000101      AFGGOV      kasUAF            8                                
10.20000101         AFG      IRQ               12       12                     

 11.20000102         AFG         IND           19       19           19           

Whenever a given country is involved as either recipient or initiator, I create a new variable isolating that specific event and its intensity for a given date.

So far so good.

Now what I want to do is to be able to create a standardized measure or ratio where for each date, the sum of conflict measures (numbers from 10 to 20) are divided to by the sum of the coopeartion measures (numbers from 1-9) for each country.

So my desired output for this table above for AFG 20000101 (5th column) would be (12+19)/(2+4+8+3+4+4)

And I would like to repeat this for each date for each of the variables ind_COUNTRY CODE, to in this way have one number per day per country.

Is there a way to do this? Thank you and I hope I was clear enough.

Aucun commentaire:

Enregistrer un commentaire