I have three arrays from which I would like to eliminate inconsistent data. There are two inconsistencies that I want to find:
- unavailable data, which is marked as ':' in the dataset
- the row data (e.g. a country) must exist in all arrays. If this is not the case, the data is not consistent for analysis
First, I tried to specify what the inconsistencies are in the arrays. Then, I tried to create three for-loops to analyse each array. Subsequently, I wanted to state when rows will be eliminated based on the found inconsistencies.
While trying this, I ran into two problems:
- The first problem concerns the length of the arrays. The three arrays have different lengths. Although I sorted the arrays alphabetically, it is difficult to find out whether a country exists in the other arrays, because it may sit at a different position in each one (e.g. i=12 and j=14). How can I check whether a country is present in an array regardless of its index?
- I assume I should use i, j, k in the loops, but I have no idea how to write the condition so that it finds the inconsistencies.
My code:
nodata = ':';
invalid = any(pop(:,1) ~= gdp(:,1) | pop(:,1) ~= fp(:,1) | gdp(:,1) ~= fp(:,1))
for i = 1:length(pop)
    for j = 1:length(gdp)
        for k = 1:length(fp)
            if (:,2:end == nodata) | (:,1 == invalid)
                % Delete entire row = []
            end
        end
    end
end
I know this code does not work. What it should do is eliminate every row that contains inconsistent data.
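To make the intended behaviour concrete, here is a minimal sketch of the elimination without any i, j, k loops. It rests on assumptions that are not stated above: that pop, gdp and fp are cell arrays, that column 1 holds the country name as a char vector, that the remaining columns hold either a number or the char ':', and that each country appears at most once per array. The sample values are made up for illustration only.

```matlab
% Hypothetical sample data (invented values). Column 1 is the country,
% the other columns are data; ':' marks an unavailable value.
pop = {'Austria', 8.4, 8.5; 'Belgium', ':', 11.1; 'Cyprus', 0.8, 0.9; 'Denmark', 5.6, 5.6};
gdp = {'Belgium', 380, 390; 'Austria', 310, 320; 'Denmark', 250, 260; 'Estonia', 18, 19};
fp  = {'Austria', 1.2, 1.3; 'Denmark', 0.9, 1.0; 'Belgium', 2.1, 2.2};

nodata = ':';

% True for every row whose data columns contain no ':' entry.
isComplete = @(c) ~any(cellfun(@(x) ischar(x) && strcmp(x, nodata), c(:, 2:end)), 2);

% Step 1: drop the rows with unavailable data from each array.
pop = pop(isComplete(pop), :);
gdp = gdp(isComplete(gdp), :);
fp  = fp(isComplete(fp), :);

% Step 2: countries that are still present in all three arrays.
% intersect compares the names themselves, so row positions do not matter.
common = intersect(intersect(pop(:, 1), gdp(:, 1)), fp(:, 1));

% Step 3: keep only those countries. The second output of ismember gives
% the row of each common country, so all arrays end up in the same order.
[~, ip] = ismember(common, pop(:, 1));
[~, ig] = ismember(common, gdp(:, 1));
[~, ik] = ismember(common, fp(:, 1));
pop = pop(ip, :);
gdp = gdp(ig, :);
fp  = fp(ik, :);
```

With this sample data, Belgium is removed because of the ':' in pop, and Cyprus and Estonia are removed because they do not exist in every array. If the real data is stored differently (for example as tables or string arrays), the indexing would need to be adapted.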