google-bigquery - google bigquery - Parsing fields out of textPayload in stackdriver logging - answerstu - answerstu.com answerstu

google bigquery - Parsing fields out of textPayload in stackdriver logging

Here is a line from tomcat access log:

127.0.0.1 - - [24/May/2016:17:53:05 -0700] "POST /users HTTP/1.1" 200 10676

Is there a way to parse out the various fields here like client IP, HTTP request method, request path, response code, etc., and load it into separate columns in a BigQuery table?

The table at the bottom of this page links to the fluent catch-all config, but I believe different logs should be parsed out and loaded differently in a configurable manner for ease of querying?

Or am I missing something fundamental here?

2 Answers

  1. Philip- Reply

    2019-11-14

    it might be not what you mean - but just guess :

    How about loading log into GBQ table so each log line becomes row in table and than parse it into another table with something like below (code is not pretending to be optimal - just to show idea)

    SELECT 
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){0} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){1} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){2} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){3} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){4} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){5} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){6} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){7} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){8} (.+?) '),
      REGEXP_EXTRACT(log_line, r'(?: (?:.+?)){9} (.+?) '),
    FROM (
      SELECT ' ' + REGEXP_REPLACE(log_line, r'[\[\]\"]', '') + ' ' AS log_line 
      FROM 
        (SELECT '127.0.0.1 - - [24/May/2016:17:53:05 -0700] "POST /users HTTP/1.1" 200 10676' AS log_line)
    )
    
  2. Phoebe- Reply

    2019-11-14

    Can you just import it as a "CSV"? BigQuery allows you to specify custom delimiters and quote characters.

    127.0.0.1 - - [24/May/2016:17:53:05 -0700] "POST /users HTTP/1.1" 200 10676

    It looks like you could provide a single space as a quote character and double-quote as the (optional) quote. I would expect the example line above to parse into:

    "127.0.0.1", "-", "-", "[24/May/2016:17:53:05", "-0700]", "POST /users HTTP/1.1", 200, 10676

    The timestamp + timezone gets mangled a bit and you'd need to initially import some of the fields as strings, but you could post-process with a query (as in Mikhail's answer) to fix it up and avoid having yet-another-tool to configure and manage.

Leave a Reply

Your email address will not be published. Required fields are marked *

You can use these HTML tags and attributes <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>