python - How can I use a column as index to find words in another column using SparkSQL?


My dataframe has two array columns, words and top5, and I want to use the list of indexes in top5 to find the corresponding words in words.

For example, if in the first row words is [i, am, a, student, how, about, you] and top5 is [5, 4, 0, 1, 2], I want a new column built from the words at the index numbers in top5, so the result is about, how, i, am, a. How can I do this?

Since the number of values in top5 is fixed, you can use bracket notation or getItem. Using the example from the question:

from pyspark.sql.functions import col, array

# sc is an existing SparkContext
df = sc.parallelize([
    (["i", "am", "a", "student", "how", "about", "you"], [5, 4, 0, 1, 2])
]).toDF(["words", "top5"])

You can either:

df.select([col("words")[col("top5")[i]] for i in range(5)])

Or:

df.select([col("words").getItem(col("top5")[i]) for i in range(5)])

Both give the same result:

+--------------+--------------+--------------+--------------+--------------+
|words[top5[0]]|words[top5[1]]|words[top5[2]]|words[top5[3]]|words[top5[4]]|
+--------------+--------------+--------------+--------------+--------------+
|         about|           how|             i|            am|             a|
+--------------+--------------+--------------+--------------+--------------+

If you want an array column, wrap one of the above using the array function:

df.select(array(*[
    col("words").getItem(col("top5")[i]) for i in range(5)
]).alias("top5mapped"))

+----------------------+
|top5mapped            |
+----------------------+
|[about, how, i, am, a]|
+----------------------+
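To check the expected mapping without a Spark session, the per-row lookup can be sketched in plain Python. This is only an illustration of the semantics, not Spark code: the helper name map_indexes is hypothetical, and returning None for an out-of-range index is an assumption meant to mirror Spark returning null in non-ANSI mode (with ANSI mode enabled Spark may raise an error instead).

```python
def map_indexes(words, indexes):
    """Return words[i] for each i in indexes.

    Assumption: an index outside 0..len(words)-1 yields None, roughly
    mirroring the null Spark returns in non-ANSI mode.
    """
    return [words[i] if 0 <= i < len(words) else None for i in indexes]


# Same row as in the question.
words = ["i", "am", "a", "student", "how", "about", "you"]
top5 = [5, 4, 0, 1, 2]

print(map_indexes(words, top5))  # ['about', 'how', 'i', 'am', 'a']
```

Each element of the result corresponds to one of the words[top5[i]] columns in the select above, in the same order.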
