I believe I have a simple problem. This is a data sample from a literary work that I would like to divide up:
WholeBook = "Random info - at beginning-man. "+ ...
"Random info still continues. "+ ...
"CHAPTER 1 " + ...
"1 This is sentence one of verse one, "+ ...
"This still sentence one of verse one. "+ ...
"2 This is sentence one of verse two. "+ ...
"This is sentence two of verse two. "+ ...
"3 This is sentence one of verse three; "+ ...
"this still sentence one of verse three. "+ ...
"CHAPTER 2 " + ...
"Random info in middle two. "+ ...
"Random info still continues again. "+ ...
"1 This is sentence four? "+ ...
"2 This is sentence five, "+ ...
"3 this still sentence five but verse three!"+ ...
"Random info at end's end.";
I would like to divide this data into a table like this (this is how the solution should look):
However, my current solution looks like this:
Thus, row 1 is incorrect, but row 2 is correct. In other words, my solution works if there is information after "CHAPTER #", but not if there is none. This is the code that produced this result:
[tokens, RandomInfoMiddle] = regexp(WholeBook, '(CHAPTER \d)\s*(.*?)1', 'tokens', 'match');
RandomInfoMiddle = RandomInfoMiddle';
RandomInfoMiddle = regexprep(RandomInfoMiddle,'CHAPTER \d+ (.+) \d$','$1'); % Strip the "CHAPTER n " prefix and the trailing " 1"
% To explain the regular expression (CHAPTER \d)\s*(.*?)1:
% (CHAPTER \d) matches CHAPTER followed by a single digit, and the () brackets surrounding it capture the match in the tokens variable.
% \s* matches any whitespace (possibly none)
% (.*?)1 captures any text up to the next 1 in the text. The question mark makes the match lazy; otherwise it would match all the text up to the last 1 in WholeBook.
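A minimal sketch of one way to handle the empty case without an if statement, assuming (as in the sample) that every chapter's first verse begins with "1 " (a 1 followed by a space). Instead of stripping the whole match with regexprep, it reads the second captured token directly, which is simply empty for CHAPTER 1, and it uses the lookahead (?=1 ) so the verse number is not consumed. The names Chapter and result are illustrative and not from the original code; the sample is rebuilt from char arrays so it does not rely on string arrays.

```matlab
% Hypothetical sketch: extract the text between each "CHAPTER n" label and that
% chapter's first verse, allowing the in-between text to be empty.
% Assumes each chapter's first verse starts with "1 " (a 1 followed by a space).

% Same sample as above, concatenated as char arrays ('' escapes the apostrophe)
WholeBook = ['Random info - at beginning-man. ', ...
    'Random info still continues. ', ...
    'CHAPTER 1 ', ...
    '1 This is sentence one of verse one, ', ...
    'This still sentence one of verse one. ', ...
    '2 This is sentence one of verse two. ', ...
    'This is sentence two of verse two. ', ...
    '3 This is sentence one of verse three; ', ...
    'this still sentence one of verse three. ', ...
    'CHAPTER 2 ', ...
    'Random info in middle two. ', ...
    'Random info still continues again. ', ...
    '1 This is sentence four? ', ...
    '2 This is sentence five, ', ...
    '3 this still sentence five but verse three!', ...
    'Random info at end''s end.'];

% (CHAPTER \d+)  captures the chapter label; \d+ also allows multi-digit numbers
% \s*            skips the whitespace after the label
% (.*?)          lazily captures the in-between text, which may be empty
% (?=1 )         lookahead: stops before "1 " without consuming it
tokens = regexp(WholeBook, '(CHAPTER \d+)\s*(.*?)(?=1 )', 'tokens');

% tokens{k}{1} is the label, tokens{k}{2} is the (possibly empty) in-between text
Chapter = cellfun(@(t) t{1}, tokens, 'UniformOutput', false)';
RandomInfoMiddle = cellfun(@(t) strtrim(t{2}), tokens, 'UniformOutput', false)';

% One row per chapter: {label, in-between text}
result = [Chapter, RandomInfoMiddle];
disp(result)
```

If a MATLAB table is preferred over a cell array, table(Chapter, RandomInfoMiddle) builds one with those two columns. The lookahead would stop early if the in-between text itself contained "1 ", so the pattern may need tightening for other data.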
Please help me find a solution as described in the first picture/table. (I suspect it needs an if statement combined with the correct regular expression.)
All help is appreciated.

