A regex disaster

Posted on 02/07/2021

So, I had a minor disaster. This is what went wrong and how I fixed it.

We use the amazing WordPress Redirection plugin. We recently started a survey and the best way to contact the participants was by letter. The letter had to include the survey URL, which people then typed in by hand, and some of them were mistyping it. For example:

/direct-payments-servey
/direct-payments-urvey
/directpayment-survery/
/directpaymentsurvey
/directpaymentssurvey

The correct URL is /direct-payments-survey/

To fix this, at 8:10am this morning, I threw up a very hasty regex redirect and went to have my breakfast, slapping myself on the back. It matches all the errors and, I thought, would catch most other typos. Here it is:

^/(directpayment|direct-payment).*

Problem is it also caught the target URL and an endless redirect ensued. The page was down for 9 hours.
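A quick way to see the loop coming is to run the pattern against both the typos and the target. This is a minimal sketch in Python, not the PHP regex engine the Redirection plugin runs on, so treat it as an approximation; the variable names are made up for illustration.

```python
import re

# The hasty pattern from the post.
hasty = re.compile(r"^/(directpayment|direct-payment).*")

typos = [
    "/direct-payments-servey",
    "/direct-payments-urvey",
    "/directpayment-survery/",
    "/directpaymentsurvey",
    "/directpaymentssurvey",
]
target = "/direct-payments-survey/"

# Every typo matches, which is what was wanted...
typo_hits = [bool(hasty.match(url)) for url in typos]

# ...but so does the target, so the redirect fires on its own destination.
target_hit = bool(hasty.match(target))

print(typo_hits, target_hit)
```

Because the target begins with /direct-payment just like the typos, the redirect keeps sending the browser back to a URL that triggers the same redirect again.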

After my trials a little while back, trying to get to grips with not matching strings in a regex, I had a good idea of how to fix it: a negative lookahead that rules out the target URL before the rest of the pattern is tried.

^/(?!direct-payments-survey)(directpayment|direct-payment).*

It was that easy. Here is the regex on regex101 as usual.
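The (?!...) part is a negative lookahead: it checks what follows the leading slash and abandons the match if that text starts with the real slug. A minimal sketch in Python, assuming its regex engine behaves like the plugin's for this pattern; should_redirect is a hypothetical helper name.

```python
import re

# The fixed pattern: (?!...) is a negative lookahead, so the match is
# abandoned whenever the path begins with the real slug.
fixed = re.compile(r"^/(?!direct-payments-survey)(directpayment|direct-payment).*")


def should_redirect(path: str) -> bool:
    """Return True if the pattern would trigger the redirect for this path."""
    return fixed.match(path) is not None


print(should_redirect("/direct-payments-servey"))   # a typo
print(should_redirect("/direct-payments-survey/"))  # the target
print(should_redirect("/direct-payments-survey"))   # target without trailing slash
```

One thing to be aware of: the lookahead excludes anything that begins with the correct slug, so a typo made after the word "survey" would not be caught by this rule.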

The silver lining here is that I now have a very reusable fix when I need to match something very close to the target URL. I’ve had this problem in the past and often just created a completely different URL. Even then this was not foolproof as WordPress keeps its own records of old URLs and redirects.

AWS Cloudwatch and fail2ban logs

Posted on 30/04/2021

Phil

Well, who knew Cloudwatch would be so much fun to tinker with?! Not me!

I have been slowly refining my Cloudwatch dashboard: adding new alarms, expanding the scope of the log watch, all that good stuff. It is very satisfying. Over the last week or so I have also set up fail2ban because (according to my audit log via Cloudwatch) sshd was getting hammered. As previously mentioned, this box is not well resourced, so I wanted to nip that in the bud. But does the cost of running fail2ban outweigh the benefits? Hard to say!

Anyway, I am getting quite a lot of email from fail2ban. This is good because I know it is working but I’d rather not have the email and still be able to easily check it was working… so Cloudwatch!

I added the fail2ban log to the config and used the Logs Insights tool to explore. This is a typical line:

2021-04-30 17:58:06.631, "2021-04-30 18:58:06,208 fail2ban.filter [100432]: INFO [sshd] Found 205.185.119.236 - 2021-04-30 18:58:05"

We could use the date/time a few more times, right? I decided this was the time to jump into the parse command in the CloudWatch Logs Insights query language (rolls off the tongue that). I knew I was going to need another regex within about 10 seconds. But, damn, if the examples aren’t thin on the ground. I googled and found virtually nothing although this post did help a bit.

So, to regex101.com I went. I exported a few lines from the log to test and I must be getting quite a lot better because I got the basics working pretty quickly:

\[sshd\]\ (?<action>[a-zA-Z]*)\ (?<ip_address>[\d\.]*)
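The same pattern can be tried outside regex101 against the sample line from the log. This is a minimal sketch in Python, which is an assumption of this example and not what Logs Insights runs; Python spells named groups (?P<name>...) where the query language uses (?<name>...).

```python
import re

# Same idea as the pattern above, with two changes for this sketch:
# Python spells named groups (?P<name>...), and the letter class is
# written [a-zA-Z] because a range ending in lowercase z (A-z) would
# also admit characters such as "[" and "_".
pattern = re.compile(r"\[sshd\] (?P<action>[a-zA-Z]*) (?P<ip_address>[\d.]*)")

line = ("2021-04-30 18:58:06,208 fail2ban.filter [100432]: "
        "INFO [sshd] Found 205.185.119.236 - 2021-04-30 18:58:05")

match = pattern.search(line)
fields = match.groupdict() if match else {}
print(fields)
```

The two named groups become the fields that the display command can then show as columns.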

Then this query in Cloudwatch Logs Insights did the job:

parse @message /\[sshd\]\ (?<action>[a-zA-Z]*)\ (?<ip_address>[\d\.]*)/
| display @timestamp, action, ip_address
| limit 200

Unfortunately, I find the @timestamp field pretty hard to read, so a bit more tinkering:

parse @message /(?<date>\d\d\d\d-\d\d-\d\d)\ (?<time>\d\d:\d\d:\d\d).*\[sshd\]\ (?<action>[a-zA-Z]*)\ (?<ip_address>[\d\.]*)/
| display date, time, action, ip_address
| limit 200

Excellent! It was running for about 5 minutes and it suddenly produced a blank line. Of course, [sshd] in the log refers to the jail. I have several set up so…

parse @message /(?<date>\d\d\d\d-\d\d-\d\d)\ (?<time>\d\d:\d\d:\d\d).*\[(?<jail>sshd|recidive|mysqld-auth)\]\ (?<action>[a-zA-Z]*)\ (?<ip_address>[\d\.]*)/
| display date, time, jail, action, ip_address
| limit 200

And that does the job nicely at the moment. You can find an explanation of the regex on regex101.
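To show how the jail alternation changes what gets parsed, here is a minimal sketch in Python (an assumption of this example; Logs Insights has its own engine). The first log line is the one from the post. The other two are made up for illustration, including the apache-auth jail name and the documentation-range IP addresses.

```python
import re

# The full pattern, translated to Python's (?P<name>...) group syntax.
PARSE = re.compile(
    r"(?P<date>\d\d\d\d-\d\d-\d\d) (?P<time>\d\d:\d\d:\d\d).*"
    r"\[(?P<jail>sshd|recidive|mysqld-auth)\] "
    r"(?P<action>[a-zA-Z]*) (?P<ip_address>[\d.]*)"
)

lines = [
    # Real line from the post.
    "2021-04-30 18:58:06,208 fail2ban.filter [100432]: INFO [sshd] "
    "Found 205.185.119.236 - 2021-04-30 18:58:05",
    # Made-up line for a second jail.
    "2021-04-30 19:02:11,004 fail2ban.actions [100432]: NOTICE [recidive] "
    "Ban 203.0.113.7",
    # Made-up line for a jail the pattern does not list.
    "2021-04-30 19:05:40,310 fail2ban.filter [100432]: INFO [apache-auth] "
    "Found 203.0.113.9 - 2021-04-30 19:05:40",
]

rows = []
for line in lines:
    m = PARSE.search(line)
    # A jail missing from the alternation gives no match at all.
    rows.append(m.groupdict() if m else None)

for row in rows:
    print(row)
```

The third line comes back as None, which is presumably the same effect as the blank line in the query results: any jail not named in the alternation needs adding to the pattern before its events show up.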

Once I am a bit more confident, I’ll start filtering on the action, so I can just see bans and unbans:

| filter action = "Ban" or action = "Unban"

More RegEx action

Posted on 28/04/2021

Phil

Had a bit of trouble with what should have been a simple regex today.

I removed a whole bunch of Pages from our site today, and they were all sat under the same parent. I wanted to redirect the now missing pages to the parent. I came up with this:

^\/dignity-care-reports\/[a-z-]*(\/?)

It worked fine in that it redirected the sub-pages but the parent was now in a redirect loop. So, I headed over to regex101.com to investigate. I quickly found that, yes, this regex did match the parent URL.

The great thing about regex101 is that it gives an explanation of what each of your tokens is doing and I quickly saw:

* matches the previous token between zero and unlimited times

And there’s the problem! Now I just needed any sort of match after that second / to stop the redirect loop and a ‘+’ does the job:

^\/dignity-care-reports\/[a-z-]+(\/?)

You can see the example here: https://regex101.com/r/IfIPpn/1

Now I just need to remember that * can match zero times!
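The difference between the two quantifiers is easy to check side by side. A minimal sketch in Python, assuming its regex engine agrees with the plugin's on this pattern; the sub-page slug example-report is hypothetical.

```python
import re

# Original pattern: * lets the character class match zero characters.
star = re.compile(r"^\/dignity-care-reports\/[a-z-]*(\/?)")
# Fixed pattern: + demands at least one character after the second slash.
plus = re.compile(r"^\/dignity-care-reports\/[a-z-]+(\/?)")

parent = "/dignity-care-reports/"
child = "/dignity-care-reports/example-report/"  # hypothetical sub-page slug

results = {
    "star_parent": bool(star.match(parent)),
    "star_child": bool(star.match(child)),
    "plus_parent": bool(plus.match(parent)),
    "plus_child": bool(plus.match(child)),
}
print(results)
```

Only the star version matches the parent, which is what sent the parent page into the redirect loop.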

Ruby-Quartz Glasses

A place to share solutions, aspirations and achievements

Tag: regex
