I have a diff result like the one below, produced by Python's difflib:
- "text": "abc xyz efg "
+ "text": "abc xyz efg"
- "header": true,
? ^^^
+ "header": false,
? ^^^^
- "text": "1.1 bacdefg"
- },
- {
- "header": false,
- "This is one example sentence which needs to be extracted."
? -------------- -
+ "This is one example sentence which needs to be extracted."
? + ++++++++++
- "This is one example sentence which needs to be extracted."
? -------------- -
+ "This is one example sentence which needs to be extracted."
? ++++++++++ +
+ "header": true,
+ "text": "some text"
I need to extract lines in three ways:
- lines that start with "-" and are followed by a line starting with "?", into a list named updated
- lines that start with "+" and are not followed by a line starting with "?", into a list named deleted
- lines that start with "-" and are not followed by a line starting with "?", into a list named inserted
The "header": true / false lines can be ignored.
I am new to Python. I managed to parse two PDFs into JSON and diff them with difflib, but I am unable to write a for loop and if condition that looks at consecutive lines for the text fields only.
EDIT:
    import difflib

    diffile = []
    diff = difflib.Differ()
    for line in diff.compare(f1_text, f2_text):
        # json.dump(line, f, indent=2)
        if line.startswith(("-", "+", "?")):
            diffile.append(line)

    updated = []
    for i in range(len(diffile) - 1):
        value = diffile[i:i+2]
        for line in value:
            if line.startswith("-") and line.startswith("?"):
                updated.append(line)
I did the above but I am not able to extract only the next line starting with "?" after "-". It is giving me all the lines starting with "?".
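One possible way to approach this, offered as a minimal sketch rather than a definitive answer: the condition in the loop above tests a single line for both "-" and "?", and one string cannot start with both, so the check has to compare a line with the line that follows it. The helper name classify is hypothetical. The three list names follow the rules exactly as stated in this question, not difflib terminology. Two assumptions are made that the question does not spell out: a "+" line that is followed by a "?" line is treated as the new side of an update and dropped, and header lines are skipped with a simple substring match on "header".

```python
import difflib


def classify(diff_lines):
    """Sort Differ output into (updated, deleted, inserted) per the question's rules."""
    # Keep only changed lines and their "?" hint lines, as in the question.
    lines = [l for l in diff_lines if l.startswith(("-", "+", "?"))]
    updated, deleted, inserted = [], [], []
    for i, line in enumerate(lines):
        # "?" lines are only hints; header lines are ignored (assumption: substring match).
        if line.startswith("?") or '"header"' in line:
            continue
        # Look at the immediately following line, if there is one.
        has_hint = i + 1 < len(lines) and lines[i + 1].startswith("?")
        text = line[2:].strip()  # drop the two-character Differ prefix
        if line.startswith("-"):
            (updated if has_hint else inserted).append(text)
        elif not has_hint:
            deleted.append(text)
        # "+" followed by "?" is dropped (assumption: new side of an update).
    return updated, deleted, inserted


if __name__ == "__main__":
    # Hypothetical stand-ins for f1_text and f2_text.
    old = ['"text": "abc xyz efg "', '"text": "1.1 bacdefg"']
    new = ['"text": "abc xyz efg"', '"text": "some text"']
    for name, items in zip(("updated", "deleted", "inserted"),
                           classify(difflib.Differ().compare(old, new))):
        print(name, items)
```

Because classify only needs an iterable of strings, it can be fed the diffile list built above or the generator returned by Differ.compare directly.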