Below is an example of what is in a log file. I'm just trying to read the
logs and dump the fields into a database.
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:44:21 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 AAE9C2CCFFE5E6DB REST.GET.ACL - "GET /?acl HTTP/1.1" 200 - 556 - 488 - "-" "-"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:44:24 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 66FB31B05AFA84E9 REST.GET.LOGGING_STATUS - "GET /?logging HTTP/1.1" 200 - 244 - 171 - "-" "-"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:44:56 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 40AC4747CFF7ACFD REST.GET.BUCKET - "GET / HTTP/1.1" 200 - 1298 - 15 12 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:44:56 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 5938B6855868E040 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1298 - 642 473 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:45:33 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 16F565F75362B5A8 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1298 - 508 293 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:45:33 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 D61C9201C46617CF REST.PUT.OBJECT testFile.zip "PUT /testFile.zip HTTP/1.1" 200 - - 17428 334 11 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:45:34 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 B2FEB30917A1F050 REST.GET.BUCKET - "GET / HTTP/1.1" 200 - 1634 - 181 15 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:45:34 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 B41FCF38CD590562 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1634 - 15 13 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:46:11 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 C42BF5C887E61F18 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1634 - 476 299 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:46:12 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 A590228971F16081 REST.PUT.OBJECT testFile.zip "PUT /testFile.zip HTTP/1.1" 200 - - 1487163 20298 48 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:46:32 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 6528418F2CCABB59 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1969 - 312 309 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:46:33 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 EE65B98BD633E32C REST.GET.BUCKET - "GET / HTTP/1.1" 200 - 1969 - 16 14 "-" "Amazon S3 CSharp Library"
Stanimir Stoyanov said:
I am sure there is a more elegant solution to the problem. Can you post a
sample of the log output? Also, do you want to get the individual words out
of the log?
E.g. if the log line is
[28/Oct/2008:21:44:21 +0000] Test with p~nctuat!ion word goes here!
would you like to have the timestamp, "Test", "with", etc. as separate
matches? If so, you could split the text using string.Split() once you have
the actual log text (see my previous code example for the 'log text' case).
--
Stanimir Stoyanov
http://stoyanoff.info
M1iS said:
I was hoping to avoid taking the time to create a regular expression, as
there are 17 fields per S3 record. It took me a while, but here is what I
ended up with:
(.*?)(\s+)(.*?)(\s+)(\[.*?\])(\s+)((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?![\d])(\s+)(.*?)(\s+)(.*?)(\s+)(.*?)(\s+)(.*)(\s+)(".*?")(\s+)(.*?)(\s+)(.*?)(\s+)(\d+|-)(\s+)(\d+|-)(\s+)(\d+|-)(\s+)(\d+|-)(\s+)(".*?")(\s+)(".*?")
Yuck, I'd rather be doing about a million other things, but oh well,
problem solved.
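A shorter pattern may also work here. The following is only a sketch, assuming every field is a bracketed run, a double-quoted run, or a run of non-space characters, and that quoted fields never contain an embedded double quote. The class name S3LogTokenizer and the method Tokenize are made up for illustration and are not part of any Amazon library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any S3 library.
public static class S3LogTokenizer
{
    // One token is either:
    //   \[[^\]]*\]   a run wrapped in [ and ] (the timestamp),
    //   "[^"]*"      a run wrapped in double quotes (request URI, referrer, user agent),
    //   \S+          any other run of non-space characters.
    // Assumption: quoted fields contain no embedded double quote.
    private static readonly Regex Token =
        new Regex(@"\[[^\]]*\]|""[^""]*""|\S+", RegexOptions.Compiled);

    // Returns the fields of one log record in order, delimiters included.
    public static List<string> Tokenize(string record)
    {
        var fields = new List<string>();
        foreach (Match m in Token.Matches(record))
        {
            fields.Add(m.Value);
        }
        return fields;
    }
}
```

Calling S3LogTokenizer.Tokenize(record) on one of the sample records above should give one list entry per field, with the timestamp kept together as a single entry. Checking that the list has the expected number of entries before writing to the database is a cheap way to catch records that break the assumptions.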
Stanimir Stoyanov said:
Hi Scott,
I personally would use Regular Expressions to split the words in a smart
way. Below is a sample console application to demonstrate it. The regular
expression \[.*\]\s*|.+ means that it can select from two alternatives:
a) Text wrapped inside [ and ]
b) Any other text (your actual server log)
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Alternative a) text wrapped inside [ and ]; alternative b) any other text.
        string expr = @"\[.*\]\s*|.+";
        string line = "[28/Oct/2008:21:44:21 +0000] Test with p~nctuat!ion word goes here!";
        Regex regex = new Regex(expr);
        foreach (Match m in regex.Matches(line))
        {
            string value = m.Value.Trim();
            if (value.StartsWith("[") && value.EndsWith("]"))
            {
                // This is part of the timestamp
                Console.WriteLine("TEST: time = " + value);
            }
            else
            {
                // This is an actual slice of the result
                Console.WriteLine("TEST: word = " + value);
            }
        }
        // Wait for a key press so the console window stays open.
        Console.Read();
    }
}
I’m trying to parse Amazon S3 server logs, which are space delimited.
However, the date fields are in the following form:
[28/Oct/2008:21:44:21 +0000]
When I try to use the following code to split the record on the spaces, it
also splits the date field:
string[] fields = record.Split(' ');
What can I do to get around this?
Scott
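One way around this without a regular expression is to walk the record character by character and ignore spaces while inside [ ] or double quotes. This is a minimal sketch under the assumption that brackets and quotes are never nested or escaped; the names BracketAwareSplitter and Split are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only.
public static class BracketAwareSplitter
{
    // Splits on spaces, except inside [...] or "...".
    // Assumption: delimiters are not nested and not escaped.
    public static List<string> Split(string record)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        char closer = '\0'; // closing delimiter being waited for; '\0' means none

        foreach (char c in record)
        {
            if (closer != '\0')
            {
                // Inside a bracketed or quoted run: keep everything, spaces included.
                current.Append(c);
                if (c == closer) closer = '\0';
            }
            else if (c == ' ')
            {
                // A space outside any run ends the current field.
                if (current.Length > 0)
                {
                    fields.Add(current.ToString());
                    current.Clear();
                }
            }
            else
            {
                current.Append(c);
                if (c == '[') closer = ']';
                else if (c == '"') closer = '"';
            }
        }

        // Flush the last field, which has no trailing space.
        if (current.Length > 0) fields.Add(current.ToString());
        return fields;
    }
}
```

With this in place, record.Split(' ') could be replaced by BracketAwareSplitter.Split(record), and the date field stays in one piece, brackets included.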