Friday 30 October 2020

Execute a series of commands on each IP address in a file in bash

I am looking to analyse a log file for IP addresses which accessed a given number of web pages within a given time frame, and to append those IP addresses to a file.

The log file (output.csv) has been modified and uses the following format:

29/Oct/2020:07:41:42|111.111.111.111|200|/page-a/
29/Oct/2020:08:30:40|000.111.000.111|200|/page-a/ 
29/Oct/2020:08:30:44|000.111.000.111|200|/page-b/
29/Oct/2020:08:30:45|000.111.000.111|200|/page-c/
29/Oct/2020:08:30:47|000.111.000.111|200|/page-d/
29/Oct/2020:08:30:48|000.111.000.111|200|/page-e/

To get the time difference in seconds between a specific number of instances of an IP address, I used the following set of commands:

egrep "000.111.000.111" output.csv |
  awk 'BEGIN{FS="|"; ORS=" "} NR==1 || NR==5 {print $1,$2}' |
  sed -e 's/[\/:]/\ /g' -e 's/Jan/1/g' -e 's/Feb/2/g' -e 's/Mar/3/g' -e 's/Apr/4/g' -e 's/May/5/g' -e 's/Jun/6/g' -e 's/Jul/7/g' -e 's/Aug/8/g' -e 's/Sep/9/g' -e 's/Oct/10/g' -e 's/Nov/11/g' -e 's/Dec/12/g' |
  awk '{print $3,$2,$1,$4,$5,$6 "," $10,$9,$8,$11,$12,$13","$14}' |
  awk -F, '{d2=mktime($2);d1=mktime($1);print d2-d1, $3}' |
  awk '{if($1<15)print $2}' >> file.txt

What this is supposed to achieve:

  • search for IP in output.csv
  • where possible, show the 1st and 5th lines where this IP appears, printing the date/time & IP
  • remove separators "/" & ":" in date and time
  • change month names to a numerical format
  • reorder the date and time, and change format to read date/time, date/time, IP address
  • print the difference in seconds between the first and second date/time on each line
  • append the IP address to file.txt if the time (in seconds) is less than 15.

If 5 pages are accessed in under 15 seconds by the given IP address, the command above appends the IP to a file.

I would like to run this command on every IP address in the file.

The desired result is a file listing every IP address that accessed the server at a rate of 5 or more pages in under 15 seconds (the thresholds can be adjusted).
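One possible way to get that result without running the pipeline once per IP is a single awk pass over the log, sketched below. It is a minimal sketch under stated assumptions, not a definitive implementation: the function name find_fast_ips and its arguments are hypothetical, the log lines are assumed to be in chronological order, and all timestamps are assumed to share one time zone. Unlike the command above, which compares only the 1st and 5th occurrence of an IP, this version checks every run of 5 consecutive requests. It converts the timestamps with plain arithmetic instead of mktime, so it does not depend on gawk.

```shell
#!/usr/bin/env bash
# Sketch: print every IP that made at least `hits` requests in fewer than
# `window` seconds. find_fast_ips is a hypothetical name, not an existing tool.
# Assumes chronological log order and a single time zone.
find_fast_ips() {
    local log=$1 hits=${2:-5} window=${3:-15}
    awk -F'|' -v hits="$hits" -v window="$window" '
    BEGIN {
        # Map month abbreviations to numbers 1..12.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= 12; i++) month[names[i]] = i
    }
    # Convert "29/Oct/2020:08:30:40" to seconds since 1970-01-01,
    # using a days-from-civil-date calculation (no time zone handling).
    function to_seconds(stamp,    p, y, m, d, era, yoe, doy, doe) {
        split(stamp, p, "[/:]")
        d = p[1] + 0; m = month[p[2]]; y = p[3] + 0
        if (m <= 2) y--
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m + (m > 2 ? -3 : 9)) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return (era * 146097 + doe - 719468) * 86400 + p[4] * 3600 + p[5] * 60 + p[6]
    }
    NF >= 2 {
        ip = $2
        n = ++count[ip]
        t[ip, n] = to_seconds($1)
        # Compare this request with the one (hits - 1) requests earlier
        # for the same IP, and report each IP at most once.
        if (n >= hits && t[ip, n] - t[ip, n - hits + 1] < window && !(ip in seen)) {
            seen[ip] = 1
            print ip
        }
    }' "$log"
}
```

With the file names from the question, a call could look like: find_fast_ips output.csv 5 15 >> file.txt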

What I have tried...

I attempted to use egrep -f with a list of the IP addresses in the same sequence (a shot in the dark):

egrep -f ip-list output.csv | xargs

This failed miserably, as you might expect, with awk stating that it cannot find a file with the name of the given IP address.

I also created a separate file for each IP address in the list:

awk '{print > ("ip_" $1)}' ip-list.txt

...but alas, I had no luck iterating through them (I am a bit green at looping and bash scripting).
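For the looping part on its own, a while read loop over the distinct IPs is one common pattern, sketched below. The function name count_hits_per_ip is hypothetical, and the loop body only counts each IP's requests as a placeholder; the per-IP pipeline from the question could go in its place. The IP list is derived from the log itself here, on the assumption that field 2 always holds the IP, so no intermediate ip_ files are needed.

```shell
#!/usr/bin/env bash
# Sketch: run a per-IP step for every distinct IP found in the log.
# count_hits_per_ip is a hypothetical name; the loop body is a placeholder.
count_hits_per_ip() {
    local log=$1 ip
    # Field 2 of each pipe-delimited line is the IP; sort -u drops duplicates.
    cut -d'|' -f2 "$log" | sort -u |
    while IFS= read -r ip; do
        # Skip empty values, e.g. from blank lines in the log.
        [ -n "$ip" ] || continue
        # grep -F matches the dots literally, and the surrounding pipes stop
        # 1.2.3.4 from also matching 11.2.3.45.
        printf '%s %s\n' "$ip" "$(grep -cF "|$ip|" "$log")"
    done
}
```

Called as count_hits_per_ip output.csv, this prints one line per IP with its request count.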

Apologies in advance if I have worded my question badly or if my attempts are somewhat primitive or inefficient.

Help would be most appreciated.

Thank you.
