python - How can I use a column as an index to find words in another column using SparkSQL?
My dataframe looks like this: each row has a words column and a top5 column, and I want to use the list of indexes in top5 to find the corresponding words in words.

For example, in the first row, words is

[i, am, a, student, how, about, you]

and top5 is

[5, 4, 0, 1, 2]

I want a new column with the words from words at the index positions given by top5, i.e. the result [about, how, i, am, a]. How can I do that?
Since the number of values in top5 is fixed, you can use either bracket notation or getItem. Using the example from the question:
from pyspark.sql.functions import col, array

df = sc.parallelize([
    (["i", "am", "a", "student", "how", "about", "you"], [5, 4, 0, 1, 2])
]).toDF(["words", "top5"])
you can either:
df.select([col("words")[col("top5")[i]] for i in range(5)])
or:
df.select([col("words").getItem(col("top5")[i]) for i in range(5)])
with both giving the same result:
+--------------+--------------+--------------+--------------+--------------+
|words[top5[0]]|words[top5[1]]|words[top5[2]]|words[top5[3]]|words[top5[4]]|
+--------------+--------------+--------------+--------------+--------------+
|         about|           how|             i|            am|             a|
+--------------+--------------+--------------+--------------+--------------+
If you want an array column, wrap one of the above in the array function:
df.select(array(*[
    col("words").getItem(col("top5")[i]) for i in range(5)
]).alias("top5mapped"))
+----------------------+
|top5mapped            |
+----------------------+
|[about, how, i, am, a]|
+----------------------+
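For intuition, the per-row selection that Spark performs here is just plain list indexing; a minimal pure-Python sketch of the same logic (no Spark required, using the example row from the question):

```python
# The example row: a list of words and a list of index positions.
words = ["i", "am", "a", "student", "how", "about", "you"]
top5 = [5, 4, 0, 1, 2]

# For each index j in top5, pick words[j] — the same operation that
# col("words")[col("top5")[i]] expresses per row in the Spark query.
top5_mapped = [words[j] for j in top5]
print(top5_mapped)  # ['about', 'how', 'i', 'am', 'a']
```

Spark evaluates this lookup column-wise across all rows at once, but the result for each row matches this plain-Python version.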