Tuesday, 25 August 2015

Regex if-then-else explanation

I'm sorting some of my old documents and trying to bring some order to them, possibly putting them into a more readable format, so I'm writing some scripts to fix them.

Since I only know basic regex, I thought this would be a great opportunity to learn more, but I'm kind of stuck.

Here are my data samples:

Adapter za uređaje sa navojem M52x0.75 (Dedal, Jahnke) - 34 mm
Prednja baza za nosač skidajući (Švenk) - Antonio Zoli: 1900
Prednja baza za nosač skidajući (Švenk) - CZ 550, 557, 537, ZKK, 600, 601, 602 (19 mm prism)
Redukcijski prsten za Dedal monokular M-54X - 44 mm
Prsteni - prizma 11 - 25.4 mm, matica, H12
Prsteni - prizma 16,5 - 25.4 (26) mm, matica, H14
Stražnja nožica za nosač skidajući (Švenk), H10.3
Prednja nožica za nosač skidajući (Švenk), H10, KR10
Nosač s etažom - prizma 16,5 (CZ 527) - 30 mm, ručica, H15
Nastavak za čišćenje (mesing) M4 - kalibar .22

And here is what I expect to get grouped together (groups marked with []):

[Adapter za uređaje sa navojem M52x0.75 (Dedal, Jahnke)] - [34 mm]
[Prednja baza za nosač skidajući (Švenk)] - [Antonio Zoli: 1900]
[Prednja baza za nosač skidajući (Švenk)] - [CZ 550, 557, 537, ZKK, 600, 601, 602 (19 mm prism)]
[Redukcijski prsten za Dedal monokular M-54X] - [44 mm]
[Prsteni - prizma 11 - 25.4 mm, matica], [H12]
[Prsteni - prizma 16,5 - 25.4 (26) mm, matica], [H14]
[Stražnja nožica za nosač skidajući (Švenk)], [H10.3]
[Prednja nožica za nosač skidajući (Švenk), H10], [KR10]
[Nosač s etažom - prizma 16,5 (CZ 527) - 30 mm, ručica], [H15]
[Nastavak za čišćenje (mesing) M4] - [kalibar .22]

Basically, the rule I have envisioned goes as follows:

  1. if the string ends with H(num) or KR(num), then select everything up to (H|KR)(num) in one group and (H|KR)(num) in another group

otherwise

  2. select everything up to the last occurrence of - or , in the first group and everything after it in the second group (a match with - takes priority over ,)

Here are my regexes for 1 and 2:

  1. (.+), ((?:H|KR)[0-9\.]+)$
  2. (.*)(?: -) (.*)|(.*)(?:,) *(.*)

Now I just need to test whether (?:H|KR)[0-9\.]+$ matches and then choose 1 or 2 accordingly, but I don't know how.
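For reference, the script-side version of that test could look roughly like the sketch below. It assumes Python and its standard re module (the post does not say which language the scripts use), and the names split_item, ENDS_WITH_CODE, RULE_1, RULE_2_DASH and RULE_2_COMMA are made up for illustration.

```python
import re

# Suffix test: the string ends in H<num> or KR<num>.
ENDS_WITH_CODE = re.compile(r"(?:H|KR)[0-9.]+$")
# Rule 1: everything before the final ", H<num>" / ", KR<num>", then the code.
RULE_1 = re.compile(r"(.+), ((?:H|KR)[0-9.]+)$")
# Rule 2: split at the last " - "; only if there is none, at the last ",".
RULE_2_DASH = re.compile(r"(.*) - (.*)")
RULE_2_COMMA = re.compile(r"(.*), *(.*)")


def split_item(line):
    """Return (first_group, second_group) for one line of the sample data."""
    if ENDS_WITH_CODE.search(line):
        m = RULE_1.match(line)
    else:
        # The dash pattern is tried first, so " - " takes priority over ",".
        m = RULE_2_DASH.match(line) or RULE_2_COMMA.match(line)
    # Assumption: a line that fits neither rule is returned unsplit.
    return m.groups() if m else (line, None)


print(split_item("Prsteni - prizma 11 - 25.4 mm, matica, H12"))
print(split_item("Redukcijski prsten za Dedal monokular M-54X - 44 mm"))
```

Because the first (.*) is greedy, each rule 2 pattern splits at the last separator rather than the first, which is what the expected groupings above need.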

I have found that it can be done with (?(?=regex)then|else) but when I incorporate my solution it doesn't work. Here it is:

(?(?=(?:H|KR)[0-9\.]+$)(.+), ((?:H|KR)[0-9\.]+)$|(?:(.*)(?: -) (.*)|(.*)(?:,) *(.*)))
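One likely reason the condition never fires: a lookahead is tested at the current position, which here is the start of the string, so (?=(?:H|KR)[0-9\.]+$) only succeeds if the whole string is an H/KR code. The sketch below checks just the lookahead, in Python, which is an assumption about the tooling. Python's re module accepts only group references inside (?(...)...), not lookahead conditions, so the full conditional cannot be tried there. In an engine that does support lookaround conditions (PCRE, for example), the condition would presumably need a leading .* as in the second pattern.

```python
import re

line = "Prsteni - prizma 11 - 25.4 mm, matica, H12"

# The lookahead as written in the condition: checked at position 0,
# where the text begins with "Prsteni", so it cannot succeed.
at_start = re.match(r"(?=(?:H|KR)[0-9.]+$)", line)

# The same lookahead with a leading .* so it can reach the end of the string.
scanning = re.match(r"(?=.*(?:H|KR)[0-9.]+$)", line)

print(at_start is None)      # True
print(scanning is not None)  # True
```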

I can do that test inside the script, but I thought I'd try it with regex. I would also appreciate a more elegant form of my solution if someone can find one :D
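One candidate for a simpler form, offered as a sketch and not a definitive answer: a backtracking engine tries alternatives left to right, so anchoring the pattern and listing the three cases in priority order may give the same result without any conditional. The example assumes Python's re module; COMBINED and split_combined are hypothetical names.

```python
import re

# Alternatives are tried in order: H/KR suffix, then " - ", then ",".
COMBINED = re.compile(
    r"^(?:(.+), ((?:H|KR)[0-9.]+)"  # rule 1: trailing H<num> or KR<num>
    r"|(.*) - (.*)"                 # rule 2a: last " - "
    r"|(.*), *(.*))$"               # rule 2b: last ","
)


def split_combined(line):
    """Return the two groups of whichever alternative matched."""
    m = COMBINED.match(line)
    if m is None:
        # Assumption: a line with no separator is returned unsplit.
        return line, None
    # Exactly one alternative takes part in a match; its two groups are
    # the only ones that are not None.
    first, second = (g for g in m.groups() if g is not None)
    return first, second


print(split_combined("Stražnja nožica za nosač skidajući (Švenk), H10.3"))
print(split_combined("Prednja baza za nosač skidajući (Švenk) - Antonio Zoli: 1900"))
```

The price is six capture groups instead of two, so the caller has to pick out the pair that took part in the match.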

Thanks!
