if-statement: Check if string exists in multiple csv files and write row to file

Thursday, 8 March 2018

I have a project in which I want to check whether an e-mail address exists in two or more CSV files. The number of files can vary, as can their prefixes, but they are always stored in the same directory.

I need help with the following:

A method for finding a match in two or more files.
A way to search a whole directory at once.
A way to write all rows containing a matching address to a new file.
A pointer towards using this in a script with an "if" statement and together with a web app.

I have had a look at

extracting rows from CSV file based on specific keywords

But that would require me to know which e-mail address I am looking for, which I don't.

For those with plenty of time, the essay below shows what I have "achieved" so far, along with an example of an original file and the desired output.

Below is an example of an original file that will be checked. The number of rows can vary, and the e-mail address can occasionally appear in a column other than column 1, so perhaps a keyword-based method would be better? That is something I have not yet accomplished.

example.csv
IP ADDRESS, FIRST TIME LOGGED IN, LAST TIME LOGGED IN, USERNAME
192.168.1.1 , 2018-03-07 11:33:22, 2018-03-07 11:33:28, Federov
E-MAIL ADDRESS, FIRST TIME LOGGED IN, LAST TIME LOGGED IN, USERNAME, 
schultz@mail.com, 2018-03-07 09:33:22, 2018-03-07 11:33:28, Boris Becker

The desired outcome is something like the below, both for the saved file and for the web app.

Result.csv
Match
E-MAIL ADDRESS, FIRST TIME LOGGED IN, LAST TIME LOGGED IN, USERNAME
schultz@mail.com, 2018-03-07 09:33:22, 2018-03-07 11:33:28, Boris Becker
schultz@mail.com, 2017-01-07 14:56:12, 2018-01-18 18:44:03, McEnroe

This is what I have so far:

I tried combining my "step by step" method into a single command. I ran it in a folder containing two .csv files with one matching address, but I got nothing at all: no error message and nothing in the output file. The command looks like this:

    awk '/E-MAIL/{y=1;next}y' *.csv | awk '{print $1}' FS="," | awk 'FNR==NR{arr[$1];next}$1 in arr{print $1,"match"}' > results.csv
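A likely reason for the empty output: the `FNR==NR` idiom in the last stage only works when awk is given two separate file operands. Fed from a pipe, awk sees one single stream, so `NR` always equals `FNR`, every line takes the first branch, and the second branch never runs. The sketch below reproduces that behaviour and shows one possible single-stream alternative. The function names `two_file_idiom` and `seen_twice` are made up for illustration, and the alternative also reports an address that appears twice within the same file, so it is not a full solution.

```shell
# Last stage of the one-liner, unchanged. With a single input stream
# NR always equals FNR, so every line is stored and skipped by "next";
# the second pattern is never reached and nothing is printed.
two_file_idiom() {
    awk 'FNR==NR{arr[$1];next} $1 in arr{print $1,"match"}'
}

# Hypothetical single-stream alternative: report an address the second
# time it is seen (and only then, so each address is reported once).
seen_twice() {
    awk '{ if (++count[$1] == 2) print $1, "match" }'
}
```

For example, `printf 'a@x.com\nb@x.com\na@x.com\n' | seen_twice` reports `a@x.com`, while the same input piped into `two_file_idiom` prints nothing.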

Step by step it works, but it is a grueling job doing this for every file. I also have to create new files to make it work.

    awk '/E-MAIL/{y=1;next}y' file-0A.csv > /test/file-0B.csv

    awk '{print $1}' FS="," file-0B.csv > /test/file-1A.csv

    awk 'FNR==NR{arr[$1];next}$1 in arr{print $1,"match"}' file-1A.csv file-1B.csv > /test/results.csv

Besides being ridiculously tedious and probably plain stupid, this method, at least in its current state, only allows a match between two files. Adding a third file makes it require the address to be present in all three files, rather than in any two of them, which is what I need.
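The "any two files" requirement could be sketched by reading every file in the directory twice in one awk call: a first pass counts in how many distinct files each address occurs, and a second pass prints the complete rows for addresses found in two or more files. This is only a minimal sketch under several assumptions: the function name `rows_with_shared_email` is hypothetical, fields are split on plain commas (no quoted fields), and anything shaped like `something@something.something` in any column counts as an e-mail address. Header lines and the "Match" line from the desired output are not printed.

```shell
# Print every row whose e-mail address occurs in two or more CSV files
# of the given directory (default: current directory).
rows_with_shared_email() {
    dir=${1:-.}
    awk -F',' '
        # Strip leading/trailing blanks (and a stray CR) from a field.
        function trim(s) { gsub(/^[ \t\r]+|[ \t\r]+$/, "", s); return s }
        {
            for (i = 1; i <= NF; i++) {
                f = trim($i)
                # Loose e-mail test, an assumption of this sketch.
                if (f ~ /^[^@ ]+@[^@ ]+\.[^@ ]+$/) {
                    if (pass == 1) {
                        # Pass 1: count each address once per file.
                        if (!((f, FILENAME) in seen)) {
                            seen[f, FILENAME]
                            files[f]++
                        }
                    } else if (files[f] >= 2) {
                        # Pass 2: print the whole row once, then move on.
                        print
                        next
                    }
                }
            }
        }
    ' pass=1 "$dir"/*.csv pass=2 "$dir"/*.csv
}
```

It might be called as `rows_with_shared_email /path/to/csvdir > /some/other/dir/results.csv`. The output file should live outside the scanned directory (or not end in `.csv`), otherwise a later run would read its own results as input.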

Also, the current method (if you can even call it a method) does not let me keep the additional information together with the e-mail address during the match step, since it would then also match on, for example, the date or time. I also do not have the knowledge to use this output in an "if" statement.

The OS is Raspbian Stretch with root privileges.

I apologize if I have left out any vital information, misspelled anything or phrased this question badly.
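As for the "if" statement, one simple possibility is to test whether the results file is non-empty once the matching rows have been written to it. The sketch below assumes exactly that and uses the shell's `[ -s FILE ]` test, which succeeds when the file exists and has a size greater than zero. The function names `has_matches` and `report_matches` are hypothetical.

```shell
# Succeeds (exit status 0) when the results file exists and is not empty.
has_matches() {
    [ -s "$1" ]
}

# Print "Match" followed by the rows, or "No match" when there are none.
report_matches() {
    if has_matches "$1"; then
        echo "Match"
        cat "$1"
    else
        echo "No match"
    fi
}
```

A web app could run a script built around `has_matches` and branch on its exit status, or simply display the output of `report_matches`.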

Any help is very much appreciated!
