I'm sorting through some of my old documents and trying to bring some order to them, possibly putting them into a more readable format, so I'm writing some scripts to fix them.
Since I only know basic regex, I thought this would be a great opportunity to learn more, but I'm kind of stuck.
Here are my data samples:
Adapter za uređaje sa navojem M52x0.75 (Dedal, Jahnke) - 34 mm
Prednja baza za nosač skidajući (Švenk) - Antonio Zoli: 1900
Prednja baza za nosač skidajući (Švenk) - CZ 550, 557, 537, ZKK, 600, 601, 602 (19 mm prism)
Redukcijski prsten za Dedal monokular M-54X - 44 mm
Prsteni - prizma 11 - 25.4 mm, matica, H12
Prsteni - prizma 16,5 - 25.4 (26) mm, matica, H14
Stražnja nožica za nosač skidajući (Švenk), H10.3
Prednja nožica za nosač skidajući (Švenk), H10, KR10
Nosač s etažom - prizma 16,5 (CZ 527) - 30 mm, ručica, H15
Nastavak za čišćenje (mesing) M4 - kalibar .22
And here is what I expect to get grouped together (groups marked with []):
[Adapter za uređaje sa navojem M52x0.75 (Dedal, Jahnke)] - [34 mm]
[Prednja baza za nosač skidajući (Švenk)] - [Antonio Zoli: 1900]
[Prednja baza za nosač skidajući (Švenk)] - [CZ 550, 557, 537, ZKK, 600, 601, 602 (19 mm prism)]
[Redukcijski prsten za Dedal monokular M-54X] - [44 mm]
[Prsteni - prizma 11 - 25.4 mm, matica], [H12]
[Prsteni - prizma 16,5 - 25.4 (26) mm, matica], [H14]
[Stražnja nožica za nosač skidajući (Švenk)], [H10.3]
[Prednja nožica za nosač skidajući (Švenk), H10], [KR10]
[Nosač s etažom - prizma 16,5 (CZ 527) - 30 mm, ručica], [H15]
[Nastavak za čišćenje (mesing) M4] - [kalibar .22]
Basically, the rule I have envisioned goes as follows:
- if the string ends with H(num) or KR(num), then select everything up to (H|KR)(num) in one group and (H|KR)(num) in another group
- otherwise, select everything up to the last occurrence of - or , in the first group and everything after it in the second group (a match with - takes priority over ,)
Here are my regexes for 1 and 2:
(.+), ((?:H|KR)[0-9\.]+)$
(.*)(?: -) (.*)|(.*)(?:,) *(.*)
Now I just need to test whether (?:H|KR)[0-9\.]+$ matches and then choose 1 or 2 accordingly, but I don't know how.
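For reference, the "test, then choose 1 or 2" step can be done in a script. The following is only a minimal sketch in Python, assuming the two patterns above are used essentially as written; the names SUFFIX_RE, SEPARATOR_RE and split_item are made up for illustration.

```python
import re

# Pattern 1 from the post: a trailing ", H<num>" or ", KR<num>".
SUFFIX_RE = re.compile(r"(.+), ((?:H|KR)[0-9.]+)$")
# Pattern 2 from the post: split at the last " - ", else at the last ",".
SEPARATOR_RE = re.compile(r"(.*) - (.*)|(.*), *(.*)")


def split_item(line):
    """Return (first_group, second_group), or None if neither rule applies."""
    m = SUFFIX_RE.fullmatch(line)
    if m:
        # Rule 1 applies: the line ends with H(num) or KR(num).
        return m.group(1), m.group(2)
    m = SEPARATOR_RE.fullmatch(line)
    if m is None:
        return None
    # Only one alternative matched; the other pair of groups is None.
    groups = [g for g in m.groups() if g is not None]
    return groups[0], groups[1]


if __name__ == "__main__":
    for sample in (
        "Prsteni - prizma 11 - 25.4 mm, matica, H12",
        "Redukcijski prsten za Dedal monokular M-54X - 44 mm",
    ):
        print(split_item(sample))
```

Because both quantifiers are greedy, each pattern splits at the last possible separator, which is what the rule asks for.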
I have found that it can be done with (?(?=regex)then|else), but when I incorporate my solution it doesn't work. Here it is:
(?(?=(?:H|KR)[0-9\.]+$)(.+), ((?:H|KR)[0-9\.]+)$|(?:(.*)(?: -) (.*)|(.*)(?:,) *(.*)))
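A likely reason the conditional never takes its "then" branch is that the lookahead is tested at the current match position (the start of the line), where the text does not begin with H or KR; putting .* in front of the lookahead body would let it inspect the end of the line instead. Lookahead conditionals are available in PCRE, Perl and .NET, but Python's re module only supports conditionals on a group reference, so the sketch below uses plain ordered alternation to get the same priority. It is an illustration under those assumptions, and the names ITEM_RE and split_item_single are hypothetical.

```python
import re

# Alternatives are tried left to right, so rule 1 wins whenever it can
# match the whole line; rule 2 (" - " before ",") is only a fallback.
ITEM_RE = re.compile(
    r"(.+), ((?:H|KR)[0-9.]+)"  # rule 1: trailing ", H<num>" or ", KR<num>"
    r"|(.*) - (.*)"             # rule 2a: split at the last " - "
    r"|(.*), *(.*)"             # rule 2b: split at the last ","
)


def split_item_single(line):
    """Return (first_group, second_group), or None if nothing matches."""
    m = ITEM_RE.fullmatch(line)
    if m is None:
        return None
    # Exactly one alternative matched, so exactly two groups are not None.
    first, second = (g for g in m.groups() if g is not None)
    return first, second


if __name__ == "__main__":
    print(split_item_single("Nosač s etažom - prizma 16,5 (CZ 527) - 30 mm, ručica, H15"))
    print(split_item_single("Nastavak za čišćenje (mesing) M4 - kalibar .22"))
```

The trade-off is that the result lands in a different pair of numbered groups depending on which alternative matched, which is why the helper filters out the groups that are None.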
I can do that test inside the script, but I thought I'd try it with regex. I would also appreciate a more elegant form of my solution if someone can find one :D
Thanks!