I am looking to analyse a log file for IP addresses which accessed a specific number of web pages in less than a specific time frame and append the IP addresses to a file.
The log file (output.csv) has been modified and uses the following format:
29/Oct/2020:07:41:42|111.111.111.111|200|/page-a/
29/Oct/2020:08:30:40|000.111.000.111|200|/page-a/
29/Oct/2020:08:30:44|000.111.000.111|200|/page-b/
29/Oct/2020:08:30:45|000.111.000.111|200|/page-c/
29/Oct/2020:08:30:47|000.111.000.111|200|/page-d/
29/Oct/2020:08:30:48|000.111.000.111|200|/page-e/
To get the time difference in seconds between a specific number of instances of an IP address, I used the following set of commands:
egrep "000.111.000.111" output.csv | awk 'BEGIN{FS="|"; ORS=" "} NR==1 || NR==5 {print $1,$2}' | sed -e 's/[\/:]/\ /g' -e 's/Jan/1/g' -e 's/Feb/2/g' -e 's/Mar/3/g' -e 's/Apr/4/g' -e 's/May/5/g' -e 's/Jun/6/g' -e 's/Jul/7/g' -e 's/Aug/8/g' -e 's/Sep/9/g' -e 's/Oct/10/g' -e 's/Nov/11/g' -e 's/Dec/12/g' | awk '{print $3,$2,$1,$4,$5,$6 "," $10,$9,$8,$11,$12,$13","$14}' | awk -F, '{d2=mktime($2);d1=mktime($1);print d2-d1, $3}' | awk '{if($1<15)print $2}' >> file.txt
What this is supposed to achieve:
- search for IP in output.csv
- where possible, show the 1st and 5th line where this IP appears, printing the date/time & IP
- remove separators "/" & ":" in date and time
- change dates to a numerical format
- reorder the date and time, and change format to read date/time, date/time, IP address
- print the difference in sec between the first and second date/time on each line
- append the IP address to file.txt if the time (in seconds) is less than 15.
If 5 pages are accessed in under 15 seconds by the given IP address, the command above appends the IP to a file.
I would like to run this command on every IP address in the file.
The desired result is a file listing every IP address that accessed the server at a rate of at least 5 pages in under 15 seconds (both the page count and the time window should be adjustable).
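One way this could be approached for every IP address in a single pass is sketched below. It is only a sketch under some assumptions: the log lines are in chronological order, the format is exactly time|IP|status|path as shown above, and the function name find_fast_ips and its arguments are made up for illustration. Unlike the pipeline above, which compares only the 1st and 5th hit, it checks every run of 5 consecutive hits per IP. It also converts the timestamps with plain arithmetic rather than mktime, so it should not depend on gawk.

```sh
# find_fast_ips FILE [HITS] [WINDOW]   (hypothetical helper name)
# Prints each IP that made HITS requests in fewer than WINDOW seconds.
find_fast_ips() {
    awk -F'|' -v hits="${2:-5}" -v window="${3:-15}" '
    BEGIN {
        # Map month abbreviations to month numbers.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= 12; i++) mon[names[i]] = i
    }
    # Convert "29/Oct/2020:07:41:42" to a running count of seconds.
    # Time zones are ignored; only differences between stamps are used.
    function to_secs(ts,    p, y, mo, d, days) {
        split(ts, p, "[/:]")
        d = p[1]; mo = mon[p[2]]; y = p[3]
        if (mo <= 2) { y--; mo += 12 }     # count Jan and Feb as months 13 and 14
        days = 365 * y + int(y / 4) - int(y / 100) + int(y / 400) \
             + int((153 * (mo - 3) + 2) / 5) + d
        return days * 86400 + p[4] * 3600 + p[5] * 60 + p[6]
    }
    {
        ip  = $2
        n   = ++seen[ip]
        now = to_secs($1)
        # Ring buffer per IP: slot n % hits holds the newest stamp; once
        # n >= hits, slot (n + 1) % hits holds the oldest stamp in the run.
        recent[ip, n % hits] = now
        if (n >= hits && !(ip in flagged) && now - recent[ip, (n + 1) % hits] < window) {
            flagged[ip] = 1
            print ip
        }
    }' "$1"
}
```

It could then be called as: find_fast_ips output.csv 5 15 >> file.txt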
What I have tried...
I attempted to use egrep -f with a list of the IP addresses in the same sequence (a shot in the dark):
egrep -f ip-list output.csv | xargs
This failed miserably, as you might expect, with awk stating that it cannot find a file with the name of the given IP address.
I also created a list of files for each set of IP addresses:
awk '{print > "ip_"$1}' ip-list.txt
...but alas, I had no luck iterating through them (I am a bit green at looping and bash scripting).
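For the looping part, a minimal sketch of iterating over a list of addresses with a while/read loop is shown here. It assumes the list file holds one IP address per line; the function name count_hits_per_ip is hypothetical, and the hit count merely stands in for whatever per-IP command would actually be run inside the loop.

```sh
# count_hits_per_ip LIST LOG   (hypothetical helper name)
# For each IP in LIST, print the IP and the number of lines in LOG containing it.
count_hits_per_ip() {
    list=$1
    log=$2
    while IFS= read -r ip; do
        [ -n "$ip" ] || continue                     # skip blank lines
        # -F matches the text literally, so the dots in the IP are not wildcards;
        # the surrounding pipes restrict the match to the IP field.
        hits=$(grep -cF -- "|$ip|" "$log" || true)   # grep exits 1 when nothing matches
        printf '%s %s\n' "$ip" "$hits"
    done < "$list"
}
```

It could be called as: count_hits_per_ip ip-list.txt output.csv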
Apologies in advance if I have worded my question badly or if my attempts are somewhat primitive or inefficient.
Help would be most appreciated.
Thank you.