How do I implement a fast way to filter a large array of 20k objects?

I’m building an Ionic app. The app contains an array of 20k “word” objects with the word and pronunciation. This data is all local and is used via a service.

There’s a search page with a search input; on each keydown it runs a new search and updates the DOM. The search function takes a few seconds depending on the term, so the user can’t even type multiple characters without delay.

I’ve tried just running the function without the DOM aspect and it’s still slow.

I’ve even tried rewriting my .filter() function as a for loop with various limits for the number of results, but if there isn’t a match, the whole 20k array of items will be searched.

Is there a faster way to make this search function, or do I just need to reduce my dataset? Is there a good tutorial for something like this?

Hello,
array functions are mostly fast, so it is difficult to say what slows your code down without seeing the code.

Maybe it helps, if you have a sorted array, to use array.findIndex(whatever) and show the following entries, or…

Best regards, anna-liebt
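A sketch of that idea (hypothetical code, not from this thread): with the dictionary sorted by word, a binary search finds the first possible match, and only the entries right after it need checking.

```javascript
// Hypothetical sketch of anna-liebt's suggestion: binary-search a
// dictionary sorted by `word` for the first word >= searchTerm,
// then collect the following entries while they still match.
function searchSorted(dictionary, searchTerm, limit = 10) {
  let lo = 0;
  let hi = dictionary.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (dictionary[mid].word < searchTerm) lo = mid + 1;
    else hi = mid;
  }
  const results = [];
  for (let i = lo; i < dictionary.length && results.length < limit; i++) {
    if (!dictionary[i].word.startsWith(searchTerm)) break;
    results.push(dictionary[i]);
  }
  return results;
}
```

Sorting once up front costs O(n log n), but each keystroke then touches only about log₂(20000) ≈ 15 entries plus the matches themselves.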

Maybe you can try using a slice of your array.
Look at this article, I think it can help you

You might want to look at the reselect package. Build dictionaries and MemoizedSelectors.
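The core of that suggestion is memoization. Stripped of the reselect API, a minimal hand-rolled sketch (hypothetical names, not reselect itself) could look like:

```javascript
// Hypothetical sketch: cache results per search term so repeated
// terms (e.g. backspacing and retyping) never re-filter the array.
function makeMemoizedSearch(dictionary) {
  const cache = new Map();
  return function search(term) {
    if (cache.has(term)) return cache.get(term);
    const results = dictionary.filter((e) => e.word.startsWith(term));
    cache.set(term, results);
    return results;
  };
}
```

reselect adds more on top (composable selectors, caching tuned for Redux), but this is the underlying trick.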

I wrote something like this (I’m trying a JS search engine now, so I don’t have the exact code.)

let dictionary = [];
let dictionaryItem = {
  word: "cat",
  type: "noun"
}

dictionary.push(dictionaryItem)

function search(searchTerm, limit=10) {
  let results = [];
  for(let i = 0; results.length < limit; i ++) {
    let item = this.dictionary[i]
    if(item["word"].startsWith(searchTerm)) {
      results.push(item)
    }
  }
  return results
}

Performance is awful when dictionary.length > 20k

The indexes of the search items are unknown. Not sure how .slice() helps me here.

Debounce the keystrokes

And split the array using the first character into chunks and let the search function method use those to create the resulting searchset

SmartDataset: Object = {
  A: [array of words starting with A],
  etc.
}

On average this way you only need to search 20,000 / 26 ≈ 770 items when in search mode; otherwise return the full set (kept in a separate data structure).

So you trade memory to improve CPU performance.
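A hypothetical sketch combining both suggestions (the names are made up): bucket the dictionary by first letter once at startup, and debounce the input handler so the search only fires after the user pauses.

```javascript
// Build the first-letter index once, up front (trades memory for CPU).
function buildBuckets(dictionary) {
  const buckets = {};
  for (const item of dictionary) {
    const key = item.word[0].toLowerCase();
    (buckets[key] = buckets[key] || []).push(item);
  }
  return buckets;
}

// Each search now scans only one bucket (~20,000 / 26 ≈ 770 items).
function searchBuckets(buckets, term) {
  const lower = term.toLowerCase();
  const bucket = buckets[lower[0]] || [];
  return bucket.filter((e) => e.word.toLowerCase().startsWith(lower));
}

// Debounce: run `fn` only after `ms` of inactivity between keystrokes.
function debounce(fn, ms) {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}
```

Wiring the input to something like `debounce(runSearch, 250)` means a fast typist triggers one search instead of one per keystroke.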

This does not make any sense!

Instead try:

function search(searchTerm) {
  let results = [];
  for (let i = 0; i < this.dictionary.length; i++) {
    if(this.dictionary[i].word.startsWith(searchTerm)) {
      results.push(this.dictionary[i])
    }
  }
  return results
}

I would imagine using filter would be more performant.

Something like

results = this.dictionary.filter(entry => entry.word.startsWith(searchTerm))

Or mapping it (though map keeps non-matches as undefined, so they need filtering out afterwards):

results = this.dictionary.map(entry => {
  if (entry.word.startsWith(searchTerm)) {
    return entry;
  }
}).filter(entry => entry !== undefined)

I’d guess filter would be fastest

Just noticed this. Apparently you already tried a plain old filter. Perhaps adding a guard would help:

results = this.dictionary.filter(entry => entry.word !== " " && entry.word.startsWith(searchTerm))

Not sure how my code doesn’t make any sense. The only weird thing is that when I pasted the code, the less-than sign in the for loop got rendered as &lt;. The limit is there to keep the results from getting too long.


Your question got me curious so I took another look and found this.
https://www.npmjs.com/package/aho-corasick

No idea how good an implementation that is, but your situation sounds ideal for the algorithm in general.
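Aho–Corasick is aimed at finding many patterns inside one text; for prefix search over a fixed word list, the trie it is built on may already be enough. A rough, hypothetical sketch:

```javascript
// Hypothetical sketch: a trie built once, queried on each keystroke.
function buildTrie(words) {
  const root = {};
  for (const w of words) {
    let node = root;
    for (const ch of w) node = node[ch] = node[ch] || {};
    node.$ = w; // mark end of word ('$' is reserved for this flag)
  }
  return root;
}

function prefixSearch(root, prefix, limit = 10) {
  // Walk down one node per character of the prefix.
  let node = root;
  for (const ch of prefix) {
    node = node[ch];
    if (!node) return [];
  }
  // Collect up to `limit` complete words below this node.
  const out = [];
  const stack = [node];
  while (stack.length && out.length < limit) {
    const n = stack.pop();
    if (n.$) out.push(n.$);
    for (const k of Object.keys(n)) if (k !== '$') stack.push(n[k]);
  }
  return out;
}
```

A lookup walks only prefix.length nodes regardless of dictionary size, so typing another character is effectively free.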

The code:

for(let i = 0; results.length < limit; i ++) {
    let item = this.dictionary[i]
    if(item["word"].startsWith(searchTerm)) {
      results.push(item)
    }
  }

is not looping through the whole array, but only through the first ten items (as defined by the limit).

Read it again. It loops until results has 10 items.


This is using 0 to the value of the limit (10) - that means it is only ever going to look at the first 11 items in the array.

Actually this would be my solution

1. Amend the dictionary array as follows:

{
  word: "Hippopotamus",
  type: "noun",
  search: "hippo"
}

This now includes an extra field with a fixed length (5), stored in lower case. The function would be:

function search(searchTerm) {
  var searchtext = searchTerm.substring(0, 5).toLowerCase();
  return this.dictionary.filter((item) => item.search === searchtext);
}
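Taking that idea one step further (a hypothetical extension, not code from this thread): build a Map from the fixed-length key to its entries once, so each search is a single O(1) lookup instead of a full filter.

```javascript
// Hypothetical sketch: index entries by a fixed-length lowercase key.
const KEY_LEN = 5;

function buildIndex(dictionary) {
  const index = new Map();
  for (const item of dictionary) {
    const key = item.word.toLowerCase().slice(0, KEY_LEN);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(item);
  }
  return index;
}

function searchIndex(index, searchTerm) {
  const key = searchTerm.toLowerCase().slice(0, KEY_LEN);
  return index.get(key) || [];
}
```

Like the original, this only answers queries of at least KEY_LEN characters; shorter terms would need shorter keys or a fallback scan.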

I also found myself curious after reading this post, and feel like the most viable solution is what @AaronSterling suggested. Maybe not that particular solution, but it seems like you would need assistance from some sort of library / helper npm package.

In the end, “filtering 20,000 objects” and “fast” doesn’t seem viable without some outside help.

@JAR19 might be on to something in the sense that adding some sort of additional property to streamline the process would be helpful. Maybe separating the arrays by “noun”, “verb”, etc.

Hello!

Sorry, I don’t get it :)
Here is small fiddle to demonstrate my test: https://jsfiddle.net/0v4p9yjv/

I add 20k random words to the dictionary, then look for a string of 3 letters. The result is generally found in less than 10 ms…

Here is the search function:

function findWord(start) {
  return dictionary.filter(function (e) {
    return e.word.startsWith(start);
  });
}

I even tried making all the words start with the same letters: the result is found in 20 ms, so still acceptable (and that’s without limiting the number of results, which would make it even faster).

Let me know


UPDATE FROM OP:
I moved my function from running on keydown on my search bar to a dedicated search button. I’m not sure why, but this drastically improved performance to under a second.

I ended up using ArnaudIze’s solution.


What if the word base is not 20 thousand entries but 50 or 100 thousand?
It is not necessary to keep such amounts of data on mobile devices.
Right now you search with startsWith, but what if you need a contains search? And how do you quickly ship updates of this database to end users?

I see here two options:

  1. Still store the data on a server and use, for example, Elasticsearch there; the client sends search requests, whether startsWith or contains, even for words spelled with errors (via n-grams).
  2. If you still want the data offline, you can store words in different tables, for example depending on the first letter (or the first two or three letters).
    You could also ship an n-gram base along with the word base and search on that.
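The n-gram idea in option 2 could be sketched like this (hypothetical code): index every word under each trigram it contains, so a “contains” query only has to verify the words that share its first trigram.

```javascript
// Hypothetical sketch: trigram index for substring ("contains") search.
function buildTrigramIndex(words, n = 3) {
  const index = new Map();
  for (const w of words) {
    const lw = w.toLowerCase();
    for (let i = 0; i + n <= lw.length; i++) {
      const gram = lw.slice(i, i + n);
      if (!index.has(gram)) index.set(gram, new Set());
      index.get(gram).add(w);
    }
  }
  return index;
}

function containsSearch(index, term, n = 3) {
  const lt = term.toLowerCase();
  if (lt.length < n) return []; // short terms need a fallback scan
  const candidates = index.get(lt.slice(0, n)) || new Set();
  // Verify candidates: sharing one trigram is not yet a full match.
  return [...candidates].filter((w) => w.toLowerCase().includes(lt));
}
```

The index is bigger than the word list itself, which is exactly the offline-size trade-off the post is warning about.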