mardi 30 mai 2017

Row by Row (pandas) - If Column A = 'Something' and Column B > 25 Then Column C = "Category"

I'm trying to have a script go Row by Row in Python using (pandas). I want it so that if Column A = 'Something' and Column B > 25 then write to Column C = "Category".

I have a Month Column and a Day Column. So, for example:

When Month = August and Day >= 25 then Week = August 25

I tried a couple of things, neither worked...

First I tried:

import os               ### OS library is imported.
import pandas as pd     ### Pandas library is imporated as 'pd'.

counter = 1             ### Counter starts at the first iteration.

while os.path.exists("CSV-Iteration-"'{0}'"/".format(counter)):     ### Runs the loop until all iteration's folders have been processed.

    a = pd.read_csv("output-"'{0}'".csv".format(counter))           ### Sets 'a' dataframe as holding data from a CSV file.
    a['Week'] = ""

    a[(a['Month'] is 'June') & (a['Day'] < 25)]['Week'] = 'June 18'
    a[(a['Month'] is 'June') & (a['Day'] >= 25)]['Week'] = 'June 25'
    a[(a['Month'] is 'July') & (a['Day'] < 2)]['Week'] = 'June 25'
    a[(a['Month'] is 'July') & (a['Day'] >= 2) & (a['Day'] < 9)]['Week'] = 'July 2'
    a[(a['Month'] is 'July') & (a['Day'] >= 9) & (a['Day'] < 16)]['Week'] = 'July 9'
    a[(a['Month'] is 'July') & (a['Day'] >= 16) & (a['Day'] < 23)]['Week'] = 'July 16'
    a[(a['Month'] is 'July') & (a['Day'] >= 23) & (a['Day'] < 30)]['Week'] = 'July 23'
    a[(a['Month'] is 'July') & (a['Day'] >= 31) & (a['Day'] < 16)]['Week'] = 'July 30'
    a[(a['Month'] is 'August') & (a['Day'] < 6)]['Week'] = 'July 30'

    a[(a['Month'] is 'August') & (a['Day'] >= 6) & (a['Day'] < 13)]['Week'] = 'August 6'
    a[(a['Month'] is 'August') & (a['Day'] >= 13) & (a['Day'] < 20)]['Week'] = 'August 13'
    a[(a['Month'] is 'August') & (a['Day'] >= 20) & (a['Day'] < 27)]['Week'] = 'August 20'
    a[(a['Month'] is 'August') & (a['Day'] >= 27)]['Week'] = 'August 27'
    a[(a['Month'] is 'September') & (a['Day'] < 3)]['Week'] = 'August 27'

    a[(a['Month'] is 'September') & (a['Day'] >= 3) & (a['Day'] < 10)]['Week'] = 'September 3'
    a[(a['Month'] is 'September') & (a['Day'] >= 10) & (a['Day'] < 17)]['Week'] = 'September 10'
    a[(a['Month'] is 'September') & (a['Day'] >= 17) & (a['Day'] < 24)]['Week'] = 'September 17'
    a[(a['Month'] is 'September') & (a['Day'] >= 24)] = 'September 24'

    a[(a['Month'] is 'October') & (a['Day'] >= 1) & (a['Day'] < 8)]['Week'] = 'October 1'
    a[(a['Month'] is 'October') & (a['Day'] >= 8) & (a['Day'] < 15)]['Week'] = 'October 8'
    a[(a['Month'] is 'October') & (a['Day'] >= 15) & (a['Day'] < 22)]['Week'] = 'October 15'
    a[(a['Month'] is 'October') & (a['Day'] >= 22) & (a['Day'] < 29)]['Week'] = 'October 22'
    a[(a['Month'] is 'October') & (a['Day'] >= 29)]['Week'] = 'October 29'
    a[(a['Month'] is 'November') & (a['Day'] < 5)]['Week'] = 'October 29'

    a[(a['Month'] is 'November') & (a['Day'] >= 5) & (a['Day'] < 12)]['Week'] = 'November 5'
    a[(a['Month'] is 'November') & (a['Day'] >= 12) & (a['Day'] < 19)]['Week'] = 'November 12'
    a[(a['Month'] is 'November') & (a['Day'] >= 19) & (a['Day'] < 26)]['Week'] = 'November 19'
    a[(a['Month'] is 'November') & (a['Day'] >= 26)]['Week'] = 'November 26'
    a[(a['Month'] is 'December') & (a['Day'] < 3)]['Week'] = 'November 26'

    a[(a['Month'] is 'December') & (a['Day'] >= 3) & (a['Day'] < 10)]['Week'] = 'December 3'
    a[(a['Month'] is 'December') & (a['Day'] >= 10) & (a['Day'] < 17)]['Week'] = 'December 10'
    a[(a['Month'] is 'December') & (a['Day'] >= 17) & (a['Day'] < 24)]['Week'] = 'December 17'
    a[(a['Month'] is 'December') & (a['Day'] >= 24) & (a['Day'] < 31)]['Week'] = 'December 24'
    a[(a['Month'] is 'December') & (a['Day'] >= 31)]['Week'] = 'December 31'
    a[(a['Month'] is 'January') & (a['Day'] < 7)]['Week'] = 'December 31'

    a[(a['Month'] is 'January') & (a['Day'] >= 7) & (a['Day'] < 14)]['Week'] = 'January 7'
    a[(a['Month'] is 'January') & (a['Day'] >= 14) & (a['Day'] < 21)]['Week'] = 'January 14'
    a[(a['Month'] is 'January') & (a['Day'] >= 21) & (a['Day'] < 28)]['Week'] = 'January 21'
    a[(a['Month'] is 'January') & (a['Day'] >= 28)]['Week'] = 'January 28'

    a.to_csv("TESToutput-"'{0}'".csv".format(counter), index=False)         ### 'a' dataframe becomes 'TESToutput-#.csv' and does not print fields for indexing (index=False).

    counter += 1        ### Adds 1 to the counter.

print 'Date Corrections - All Done!'

Then I tried:

import os               ### OS library is imported.
import pandas as pd     ### Pandas library is imporated as 'pd'.

counter = 1             ### Counter starts at the first iteration.

while os.path.exists("CSV-Iteration-"'{0}'"/".format(counter)):     ### Runs the loop until all iteration's folders have been processed.

    a = pd.read_csv("output-"'{0}'".csv".format(counter))           ### Sets 'a' dataframe as holding data from a CSV file.
    a['Week'] = ""

    def this_week (row):
        if row[(a['Month'] is 'June') + (a['Day'] < 25)]:
            return 'June 18'
        if row[(a['Month'] is 'June') + (a['Day'] >= 25)]:
            return 'June 25'
        if row[(a['Month'] is 'July') + (a['Day'] < 2)]:
            return 'June 25'
        if row[(a['Month'] is 'July') + (a['Day'] >= 2) + (a['Day'] < 9)]:
            return 'July 2'
        if row[(a['Month'] is 'July') + (a['Day'] >= 9) + (a['Day'] < 16)]:
            return 'July 9'
        if row[(a['Month'] is 'July') + (a['Day'] >= 16) + (a['Day'] < 23)]:
            return 'July 16'
        if row[(a['Month'] is 'July') + (a['Day'] >= 23) + (a['Day'] < 30)]:
            return 'July 23'
        if row[(a['Month'] is 'July') + (a['Day'] >= 31) + (a['Day'] < 16)]:
            return 'July 30'
        if row[(a['Month'] is 'August') + (a['Day'] < 6)]:
            return 'July 30'
        if row[(a['Month'] is 'August') + (a['Day'] >= 6) + (a['Day'] < 13)]:
            return 'August 6'
        if row[(a['Month'] is 'August') + (a['Day'] >= 13) + (a['Day'] < 20)]:
            return 'August 13'
        if row[(a['Month'] is 'August') + (a['Day'] >= 20) + (a['Day'] < 27)]:
            return 'August 20'
        if row[(a['Month'] is 'August') + (a['Day'] >= 27)]:
            return 'August 27'
        if row[(a['Month'] is 'September') + (a['Day'] < 3)]:
            return 'August 27'
        if row[(a['Month'] is 'September') + (a['Day'] >= 3) + (a['Day'] < 10)]:
            return 'September 3'
        if row[(a['Month'] is 'September') + (a['Day'] >= 10) + (a['Day'] < 17)]:
            return 'September 10'
        if row[(a['Month'] is 'September') + (a['Day'] >= 17) + (a['Day'] < 24)]:
            return 'September 17'
        if row[(a['Month'] is 'September') + (a['Day'] >= 24)]:
            return 'September 24'
        if row[(a['Month'] is 'October') + (a['Day'] >= 1) + (a['Day'] < 8)]:
            return 'October 1'
        if row[(a['Month'] is 'October') + (a['Day'] >= 8) + (a['Day'] < 15)]:
            return 'October 8'
        if row[(a['Month'] is 'October') + (a['Day'] >= 15) + (a['Day'] < 22)]:
            return 'October 15'
        if row[(a['Month'] is 'October') + (a['Day'] >= 22) + (a['Day'] < 29)]:
            return 'October 22'
        if row[(a['Month'] is 'October') + (a['Day'] >= 29)]:
            return 'October 29'
        if row[(a['Month'] is 'November') + (a['Day'] < 5)]:
            return 'October 29'
        if row[(a['Month'] is 'November') + (a['Day'] >= 5) + (a['Day'] < 12)]:
            return 'November 5'
        if row[(a['Month'] is 'November') + (a['Day'] >= 12) + (a['Day'] < 19)]:
            return 'November 12'
        if row[(a['Month'] is 'November') + (a['Day'] >= 19) + (a['Day'] < 26)]:
            return 'November 19'
        if row[(a['Month'] is 'November') + (a['Day'] >= 26)]:
            return 'November 26'
        if row[(a['Month'] is 'December') + (a['Day'] < 3)]:
            return 'November 26'
        if row[(a['Month'] is 'December') + (a['Day'] >= 3) + (a['Day'] < 10)]:
            return 'December 3'
        if row[(a['Month'] is 'December') + (a['Day'] >= 10) + (a['Day'] < 17)]:
            return 'December 10'
        if row[(a['Month'] is 'December') + (a['Day'] >= 17) + (a['Day'] < 24)]:
            return 'December 17'
        if row[(a['Month'] is 'December') + (a['Day'] >= 24) + (a['Day'] < 31)]:
            return 'December 24'
        if row[(a['Month'] is 'December') + (a['Day'] >= 31)]:
            return 'December 31'
        if row[(a['Month'] is 'January') + (a['Day'] < 7)]:
            return 'December 31'
        if row[(a['Month'] is 'January') + (a['Day'] >= 7) + (a['Day'] < 14)]:
            return 'January 7'
        if row[(a['Month'] is 'January') + (a['Day'] >= 14) + (a['Day'] < 21)]:
            return 'January 14'
        if row[(a['Month'] is 'January') + (a['Day'] >= 21) + (a['Day'] < 28)]:
            return 'January 21'
        if row[(a['Month'] is 'January') + (a['Day'] >= 28)]:
            return 'January 28'

    a['Week'] = a.apply (lambda row: this_week (row), axis=1)

    a.to_csv("TESToutput-"'{0}'".csv".format(counter), index=False)         ### 'a' dataframe becomes 'TESToutput-#.csv' and does not print fields for indexing (index=False).

    counter += 1        ### Adds 1 to the counter.

print 'Date Corrections - All Done!'

The second gives me this error: "IndexingError: ('Unalignable boolean Series key provided', u'occurred at index 0')"

I'm very new to Python so I put these together based on what I've read in the forums. Please let me know if there is a simpler way to do it or if there is a correction or addition to make one of these two scripts work.

Thanks!

Aucun commentaire:

Enregistrer un commentaire