Friday, August 14, 2020

Python: how can I use a for loop to skip rows?

My main problem is that every url gets visited using requests, while I want to skip urls whose post year, found inside the visited page, does not match my system's current year (datetime.year, which is of course 2020). Using for in combination with continue looks like a solution, since "the continue statement forces the loop to continue or execute the next iteration" (GeeksforGeeks.org).
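To make the quoted behaviour concrete, here is a minimal sketch with made-up values (the list `years` is only an illustration, not part of my data). Note that continue skips only the current item, so the loop still moves on to every later item:

```python
# Illustrative values only; in the real case each year is known only after visiting a url.
years = [2020, 2019, 2020]

kept = []
for year in years:
    if year != 2020:
        continue  # skip the rest of this iteration only; the loop goes on to the next item
    kept.append(year)

print(kept)
```

If the goal is to stop looking at the remaining items altogether, break (inside an inner loop) is probably closer to what is needed than continue.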

As an example take the following two datasets:

import pandas as pd

#df1 is the original frame with all urls ever posted in each series
df1=pd.DataFrame({'Series':['Economics','Economics','Economics',
                            'Business','Business',
                            'Tech','Tech','Tech'],
                 'Url':['Eurl1','Eurl2','Eurl3',
                        'Burl1','Burl2',
                        'Turl1','Turl2','Turl3']})

#df2 is the 'new' frame, now with Postyear added as each url is visited sequentially
df2=pd.DataFrame({'Series':['Economics','Economics','Economics',
                            'Business','Business',
                            'Tech','Tech','Tech'],
                 'Url':['Eurl1','Eurl2','Eurl3',
                        'Burl1','Burl2',
                        'Turl1','Turl2','Turl3'],
                 'Postyear':[2020,2019,2019,
                             2020,2019,
                             2020,2019,2019]})

What I am looking for is a conditional loop that checks Postyear as each url is visited: as soon as Postyear != 2020 (as in df2), the remaining rows of that series are skipped and the loop moves on to the next series in df1. I only know the urls beforehand; the post year is known only after visiting a url. So the final result will be (skipping, i.e. not visiting, Eurl3 and Turl3):

Series,Url,Postyear
Economics,Eurl1,2020
Business,Burl1,2020
Tech,Turl1,2020
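One possible way to sketch this is a nested loop: the outer loop walks over each series, the inner loop visits its urls in order and uses break to leave the series at the first post year that is not 2020. This assumes the urls within a series are ordered newest first. The function `fetch_postyear` and the dictionary `postyear_by_url` are hypothetical stand-ins for the real requests call and page parsing; they just replay the values from df2.

```python
import pandas as pd

df1 = pd.DataFrame({'Series': ['Economics', 'Economics', 'Economics',
                               'Business', 'Business',
                               'Tech', 'Tech', 'Tech'],
                    'Url': ['Eurl1', 'Eurl2', 'Eurl3',
                            'Burl1', 'Burl2',
                            'Turl1', 'Turl2', 'Turl3']})

# Hypothetical lookup replaying df2's Postyear values instead of real pages.
postyear_by_url = {'Eurl1': 2020, 'Eurl2': 2019, 'Eurl3': 2019,
                   'Burl1': 2020, 'Burl2': 2019,
                   'Turl1': 2020, 'Turl2': 2019, 'Turl3': 2019}

visited = []  # records which urls were actually "requested"

def fetch_postyear(url):
    """Hypothetical stand-in for requests.get(url) plus parsing the post year."""
    visited.append(url)
    return postyear_by_url[url]

# Fixed here to match the example; datetime.now().year would give the system year.
target_year = 2020

rows = []
# sort=False keeps the series in their original order of appearance.
for series, group in df1.groupby('Series', sort=False):
    for url in group['Url']:
        year = fetch_postyear(url)
        if year != target_year:
            break  # leave this series; its remaining urls are never visited
        rows.append({'Series': series, 'Url': url, 'Postyear': year})

result = pd.DataFrame(rows, columns=['Series', 'Url', 'Postyear'])
print(result)
print(visited)
```

With df2's values, Eurl2, Burl2 and Turl2 are visited (that is how their 2019 post year is discovered) but not kept, and Eurl3 and Turl3 are never visited at all.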
