
Select Rows Where Array Contains One of Several Values in BigQuery (Ideally with dbplyr)

I have a large set of tweets in BigQuery and want to keep only those that contain at least one hashtag from a given list. The hashtags are stored in an array column (uploaded from a li

Solution 1:

The difficult part here is that your hashtags column is of type list or array. According to a related question, dbplyr's translation of more advanced data types such as arrays does not appear to be well established.
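To make the target behaviour concrete, here is a minimal in-memory sketch in base R. It assumes a data frame `tweets` whose `hashtags` column is a list of character vectors; `tweets`, `hashtags`, `targets` and the sample values are all hypothetical. A row is kept when at least one of its hashtags appears in `targets`, which is the logic the BigQuery query has to reproduce on the server.

```r
# Hypothetical sample data: hashtags is a list column (one character vector per tweet)
tweets <- data.frame(id = 1:4)
tweets$hashtags <- list(
  c("rstats", "bigquery"),
  c("python"),
  character(0),            # a tweet without hashtags
  c("dbplyr", "sql")
)

# Hashtags we want to match (hypothetical)
targets <- c("bigquery", "dbplyr")

# TRUE when at least one of the tweet's hashtags is in targets;
# an empty vector gives any(logical(0)) == FALSE, so such tweets are dropped
has_target <- vapply(tweets$hashtags, function(h) any(h %in% targets), logical(1))

matched <- tweets[has_target, , drop = FALSE]
print(matched$id)
```

This only works once the data is in R memory, which is exactly what you want to avoid with a large table; it is meant as a reference for what the remote filter should return.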

Two alternative approaches:

  1. Convert your hashtags to a character string and use text search (grep).

  2. Write a BigQuery query as a character string in R and attach it to an existing connection. Here is an example:

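# connect to the database (connection arguments omitted)
db_connection <- DBI::dbConnect( ... )
# lazy reference to the remote table; dbplyr generates its SQL
remote_tbl <- dplyr::tbl(db_connection, from = "remote_table_name")
# wrap the generated SQL in a hand-written query; a WHERE clause can follow "alias"
sql_query <- glue::glue("SELECT *\n",
                        "FROM (\n",
                        "{dbplyr::sql_render(remote_tbl)}\n",
                        ") alias\n")
new_remote_table <- dplyr::tbl(db_connection, dbplyr::sql(sql_query))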
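The wrapped query in approach 2 still needs the actual filter. A minimal sketch of one way to write it in BigQuery Standard SQL is a correlated `EXISTS` over `UNNEST` of the array column. The helper names `bq_quote()` and `filter_array_contains_any()`, the column name `hashtags` and the table name are hypothetical, and the sketch uses base R `paste0()` rather than glue so it runs without a connection. The quoting helper is deliberately conservative and refuses values containing quotes or backslashes; for arbitrary input, quoting through the live connection with `DBI::dbQuoteString()` is the safer route.

```r
# Hypothetical helper: wrap R strings as single-quoted SQL literals.
# Deliberately conservative: refuses values that would need escaping.
bq_quote <- function(x) {
  stopifnot(!grepl("'", x, fixed = TRUE), !grepl("\\", x, fixed = TRUE))
  paste0("'", x, "'")
}

# Hypothetical helper: wrap an existing query and keep rows whose array
# column shares at least one element with `values`.
filter_array_contains_any <- function(inner_sql, column, values) {
  paste0(
    "SELECT *\n",
    "FROM (\n", inner_sql, "\n) alias\n",
    "WHERE EXISTS (\n",
    "  SELECT 1 FROM UNNEST(alias.", column, ") AS tag\n",
    "  WHERE tag IN (", paste(bq_quote(values), collapse = ", "), ")\n",
    ")"
  )
}

# In practice inner_sql would be dbplyr::sql_render(remote_tbl);
# a literal stands in here so the sketch runs without a connection.
query <- filter_array_contains_any(
  "SELECT * FROM `project.dataset.tweets`",
  column = "hashtags",
  values = c("rstats", "bigquery")
)
cat(query)
```

The resulting string can be passed to `dplyr::tbl(db_connection, dbplyr::sql(query))` in the same way as `sql_query` in the example. `EXISTS` is used instead of joining against the unnested array because a join would return one row per matching hashtag, duplicating tweets that match several of them.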
