I have a project where I want to check whether an e-mail address exists in two or more CSV files. The number of files can vary, and so can their prefixes, but they will always be stored in the same directory.
I need help with the following:
- A method for finding a match in two or more files.
- Searching a whole directory at once.
- Writing all the rows where the matching address exists to a new file.
- A pointer on how to put this to use in a script, where I can use it with an "if" statement and together with a web app.
I have had a look at:
extracting rows from CSV file based on specific keywords
But that would require me to know which e-mail address I am looking for, which I don't.
For those with loads of time, the essay below shows what I have "achieved" so far, together with an example of the original file and the desired output.
Below is an example of an original file that will be checked. The number of rows can vary. The e-mail address can also occasionally appear in columns other than column 1, so perhaps a keyword-based method would be better? This is something I have not yet accomplished.
example.csv
IP ADDRESS, FIRST TIME LOGGED IN, LAST TIME LOGGED IN, USERNAME
192.168.1.1 , 2018-03-07 11:33:22, 2018-03-07 11:33:28, Federov
E-MAIL ADDRESS, FIRST TIME LOGGED IN, LAST TIME LOGGED IN, USERNAME,
schultz@mail.com, 2018-03-07 09:33:22, 2018-03-07 11:33:28, Boris Becker
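One possible way to handle addresses that are not in column 1 is to test every field of every row rather than only `$1`. The snippet below is a minimal sketch, not a tested solution: the second input row is hypothetical (it is not from my real files), and the "contains exactly one @ and no spaces" test is only a rough assumption about what counts as an e-mail address.

```shell
# First row is taken from example.csv above; the second row is a
# hypothetical one with the address in the third column.
addresses=$(printf '%s\n' \
  'schultz@mail.com, 2018-03-07 09:33:22, 2018-03-07 11:33:28, Boris Becker' \
  '192.168.1.1 , 2018-03-07 11:33:22, mcenroe@mail.com, McEnroe' |
  awk -F',' '{
    for (i = 1; i <= NF; i++) {
      field = $i
      gsub(/^[ \t]+|[ \t]+$/, "", field)         # trim spaces around the field
      if (field ~ /^[^@ ]+@[^@ ]+$/) print field # rough e-mail test
    }
  }')
printf '%s\n' "$addresses"
```

Because header lines such as "E-MAIL ADDRESS, ..." contain no "@", this approach would also skip them without a separate `/E-MAIL/` step.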
The desired outcome is something like the below, both for the saved file and for the web app.
Result.csv
Match
E-MAIL ADDRESS, FIRST TIME LOGGED IN, LAST TIME LOGGED IN, USERNAME
schultz@mail.com, 2018-03-07 09:33:22, 2018-03-07 11:33:28, Boris Becker
schultz@mail.com, 2017-01-07 14:56:12, 2018-01-18 18:44:03, McEnroe
This is what I have so far:
I tried putting my "step by step" method into a single command line. I ran it in a folder where I had two .csv files with one matching address. However, I received zero, nothing, nada: no error message and nothing in the output file. The command looks like the following:
awk '/E-MAIL/{y=1;next}y' *.csv | awk '{print $1}' FS="," | awk 'FNR==NR{arr[$1];next}$1 in arr{print $1,"match"}' > results.csv
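A likely reason for the empty output: the last awk in the pipeline reads one single stream from the pipe, so `FNR` and `NR` are always equal. The first block therefore runs for every line, `next` skips the rest, and the print is never reached. The sketch below reproduces that with three hypothetical addresses and then shows a counting variant that does work on one stream. Note that the counting variant is only an illustration: it reports an address the second time it is seen, even if both occurrences come from the same file.

```shell
# Single input stream: FNR==NR is true for every line, so nothing prints.
out=$(printf 'a@mail.com\nb@mail.com\na@mail.com\n' |
  awk 'FNR==NR{arr[$1];next} $1 in arr{print $1,"match"}')
echo "FNR==NR variant printed: '${out}'"

# Counting occurrences instead: print an address the second time it appears.
dups=$(printf 'a@mail.com\nb@mail.com\na@mail.com\n' |
  awk 'seen[$1]++ == 1 {print $1, "match"}')
echo "counting variant printed: '${dups}'"
```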
Step by step it works, but it's a grueling job doing this for every file. I also have to create new files to make it work.
awk '/E-MAIL/{y=1;next}y' file-0A.csv > /test/file-0B.csv
awk '{print $1}' FS="," file-0B.csv > /test/file-1A.csv
awk 'FNR==NR{arr[$1];next}$1 in arr{print $1,"match"}' file-1A.csv file-1B.csv > /test/results.csv
Apart from being ridiculously tedious and probably plain stupid, this method, at least in its current state, only allows a match between two files. Adding a third makes it look as if the match needs to be found in all three files, not in any two, which is what I require...
Also, the current method (if you can even call it a method) does not let me keep the additional information together with the e-mail address during the match step, since it would then also match on, for example, the date or time. Nor do I have the knowledge to use this output in an "if" statement.
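For reference, here is a rough sketch of how all of the requirements might fit into one awk call: scan every .csv in a directory, count in how many different files each address appears, keep the whole row, and set the exit status so the result can drive an "if" statement. Everything here is an assumption made for illustration: the temporary directory stands in for the real folder, the file names and the "nomatch" row are hypothetical, and the e-mail test is deliberately crude. It keeps all matching rows in memory, which should be fine for small log files.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary directory (stand-in for the real folder).
dir=$(mktemp -d)
cat > "$dir/file-0A.csv" <<'EOF'
IP ADDRESS, FIRST TIME LOGGED IN, LAST TIME LOGGED IN, USERNAME
192.168.1.1 , 2018-03-07 11:33:22, 2018-03-07 11:33:28, Federov
E-MAIL ADDRESS, FIRST TIME LOGGED IN, LAST TIME LOGGED IN, USERNAME,
schultz@mail.com, 2018-03-07 09:33:22, 2018-03-07 11:33:28, Boris Becker
EOF
cat > "$dir/file-1B.csv" <<'EOF'
E-MAIL ADDRESS, FIRST TIME LOGGED IN, LAST TIME LOGGED IN, USERNAME
schultz@mail.com, 2017-01-07 14:56:12, 2018-01-18 18:44:03, McEnroe
nomatch@mail.com, 2017-02-01 10:00:00, 2017-02-01 10:05:00, Lendl
EOF

# Write the result outside the scanned directory so that a later run
# does not pick up its own output through *.csv.
results=$(mktemp)

if awk -F',' '
  {
    for (i = 1; i <= NF; i++) {
      field = $i
      gsub(/^[ \t]+|[ \t]+$/, "", field)      # trim spaces around the field
      if (field ~ /^[^@ ]+@[^@ ]+$/) {        # rough e-mail test
        if (!((field, FILENAME) in seen)) {   # count each file once per address
          seen[field, FILENAME]
          nfiles[field]++
        }
        rows[field] = rows[field] $0 "\n"     # keep the whole row
      }
    }
  }
  END {
    for (addr in nfiles)
      if (nfiles[addr] >= 2) { printf "%s", rows[addr]; found = 1 }
    exit !found                               # 0 = match found, 1 = none
  }' "$dir"/*.csv > "$results"
then
  status="match"
else
  status="no match"
fi
echo "$status"
cat "$results"
```

Because the count is per address and per file, a third file would not change the rule: an address is reported as soon as it occurs in any two files. The exit status (0 for a match, 1 for none) is what the "if" tests, and presumably whatever calls the script could check the same status.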
The OS is Raspbian Stretch with root privileges.
I apologize if I have left out any vital information, misspelled anything, or put this question in the wrong way.
Any help is very much appreciated!