samedi 30 septembre 2017

Does testing .includes() on flattened array before testing on various subarrays improve performance in js?

I am writing a program that checks whether particular keywords come back in a string, using .includes(). I have categorised the keywords in a json object, mostly because I want separate counts per category.

My first approach was to loop through each word in the text, and run an if-statement for each array in the keywords object. This resulted in 6 different if-statements for each word in the text, which I figured might not be very efficient, especially because many words in the text do not match any of the words in any of the keywords arrays.

I then decided to check whether it would be better to flatten my keywords object to a single array, and to check whether a word matched any of the words in the flattened array before moving on to the more specific arrays of keywords.

I have included a simplified example below:

The list of keywords:

{
  "category1": {
    "subcategory1": [
      "keyword",
      "keyword"
    ],
    "subcategory2": [
      "keyword"
    ]
  },
  "category2" : {
    "subcategory1" : [
      "keyword",
      "keyword",
      "keyword"
    ],
    "subcategory2" : [
      "keyword",
      "keyword"
    ]
  },
  "category3": [
    "keyword",
    "keyword",
    "keyword"    
  ]
}

Now, for the second approach, I flattened the json object (keywordList) to a number of arrays, then reducing it to a single array (keywordListArray) using reduce(). I then included an if-statement that would filter out any of the words that were in neither of the arrays, before executing more specific tests.

 for (let property in text) {

    if (keywordListArray.includes(property)) {
    // Will this improve performance?

        if (keywordList.category.subcategory.includes(property)) {
          result.category.subcategory ++;
        }
        if (keywordList.category.otherSubcategory.includes(property)) {
          result.category.subcategory ++;
        }

    }
  }

I then checked the execution time of each approach. I have provided a simplified example, but in my case, my keywords object consisted of 6 different array with about 10 keywords each. The input text is about 200 characters long, and will probably return some 15 matches with the keywords.

text of 200 words:

Execution time without flattened array and prior filter (approach 1): 10-12 ms Execution time with flattened array and prior filter (approach 2): 9-11 ms

I have also tested with 400 words, but there is barely any difference in execution time.

I was wondering which approach you guys would recommend, both in terms of writing 'good code' and in terms of performance?

Two assumptions to get started:

  • The more words of the text that match a keyword, the more redundant the prior filter using the flattened array.
  • The more categories (arrays) in the json object, the more if statements, and the slower the approach without prior filter.

Is this true, and do you guys expect there to be a large performance difference when employing this in a larger scale project?

Thanks in advance! Daan

Aucun commentaire:

Enregistrer un commentaire