Friday, 22 November 2019

Calculating the shortest Euclidean distance from each ID to the other IDs in the same dataframe

I have a Python pandas dataframe that has two sets of coordinates: mx and my are the reference coordinates, and x and y are arbitrary coordinates.

ID mx my x y
1 1 3 1 3
2 2 4 5 2
2 2 4 1 2
2 2 4 2 5
3 3 2 6 2
3 3 2 7 1

Notice that each ID has a single reference point (mx, my), repeated on every row of that ID. What I'm trying to do is, for example, take the first ID (1), loop through all the arbitrary (x, y) points of the OTHER IDs (that is, excluding the ID I am currently looking at, in this case ID 1), and calculate the shortest Euclidean distance from ID 1's reference point to any of those points.

So for ID 1, the closest (x, y) point to its reference point (mx = 1, my = 3) is (x = 1, y = 2), for which the Euclidean distance is 5.385164. For ID 2, the nearest (x, y) point to its reference point (mx = 2, my = 4) is (x = 1, y = 3), for which the Euclidean distance is 7.615773.
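A minimal sketch of the computation described above, assuming the dataframe is named `df` and that mx and my are constant within each ID. The helper name `nearest_other_distance` is hypothetical, not something from pandas. For each ID it takes that ID's reference point, gathers the (x, y) points of every other ID, and keeps the smallest Euclidean distance.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 2, 2, 3, 3],
    "mx": [1, 2, 2, 2, 3, 3],
    "my": [3, 4, 4, 4, 2, 2],
    "x":  [1, 5, 1, 2, 6, 7],
    "y":  [3, 2, 2, 5, 2, 1],
})


def nearest_other_distance(frame):
    """Hypothetical helper: map each ID to the distance from its (mx, my)
    reference point to the closest (x, y) point belonging to any other ID."""
    result = {}
    for id_, group in frame.groupby("ID"):
        # Assumption: (mx, my) is the same on every row of an ID, so take the first.
        ref = group[["mx", "my"]].iloc[0].to_numpy(dtype=float)
        # All arbitrary points that belong to the other IDs.
        others = frame.loc[frame["ID"] != id_, ["x", "y"]].to_numpy(dtype=float)
        if len(others) == 0:
            # Only one ID in the frame: there is nothing to compare against.
            result[id_] = float("nan")
            continue
        # Euclidean distance from the reference point to each candidate point.
        distances = np.sqrt(((others - ref) ** 2).sum(axis=1))
        result[id_] = float(distances.min())
    return result


shortest = nearest_other_distance(df)
print(shortest)
```

On the six sample rows as printed, this gives 1.0 for ID 1, about 1.414 for ID 2, and 2.0 for ID 3. Those values describe the sample table only and need not match the figures quoted in the question, which presumably come from different or additional data.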

Doing this for the rest of the IDs as well, what I have in mind is a column that looks like this:

ID mx my x y Dist.
1 1 3 1 3 5.385164807
2 2 4 5 2 7.615773106
2 2 4 1 2 7.615773106
2 2 4 2 5 7.615773106
3 3 2 6 2 5.656854249
3 3 2 7 1 5.656854249

Since I have three unique identifiers, the output should contain only three distinct shortest distances. I'm trying to automate the process instead of having to manually input which ID to look at, so I would think that a loop and/or an if statement over the unique identifiers would be involved, but I have tried many times without success. I have also tried dictionaries, as well as splitting the data into two separate dataframes and comparing them; neither approach has worked so far.
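One way to avoid naming each ID by hand is to build a distance matrix between every reference point and every row, then blank out the entries where the row belongs to the same ID. The sketch below assumes the dataframe is named `df` and that mx and my are constant within each ID. The names `refs`, `per_id`, and the column label `Dist.` are illustrative choices. It uses only NumPy broadcasting and `Series.map`.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 2, 2, 3, 3],
    "mx": [1, 2, 2, 2, 3, 3],
    "my": [3, 4, 4, 4, 2, 2],
    "x":  [1, 5, 1, 2, 6, 7],
    "y":  [3, 2, 2, 5, 2, 1],
})

# One reference point per unique ID (assumes mx, my do not vary within an ID).
refs = df.drop_duplicates("ID").set_index("ID")[["mx", "my"]]

ref_xy = refs.to_numpy(dtype=float)          # shape (n_ids, 2)
pts = df[["x", "y"]].to_numpy(dtype=float)   # shape (n_rows, 2)

# dist[i, j] = distance from the reference point of ID i to the (x, y) of row j.
dist = np.sqrt(((ref_xy[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))

# Exclude rows that belong to the ID currently being looked at.
same_id = refs.index.to_numpy()[:, None] == df["ID"].to_numpy()[None, :]
dist[same_id] = np.inf

# One shortest distance per unique ID (inf if there is no other ID at all).
per_id = pd.Series(dist.min(axis=1), index=refs.index, name="Dist.")

# Broadcast the per-ID result back onto every row as a new column.
df["Dist."] = df["ID"].map(per_id)
print(per_id)
print(df)
```

`per_id` holds the three per-ID values, and `per_id.tolist()` or `per_id.to_numpy()` turns them into a plain list or array if a new column is not needed. The matrix has one entry per (ID, row) pair, so for very large frames a spatial index may be a better fit than this dense approach.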

The solution to my question need not be in the form of another column; it can be a list or an array too. Any way that works is fine. Any help would be much appreciated! Thank you!
