Visualizing POS

Khulood Nasher
4 min read · Dec 19, 2020

In a previous blog, I explained what Part of Speech (POS) is. Please visit my blog on POS. Today, I will talk about how to visualize POS. To follow along with the Python code, please click on my GitHub. We can do the visualization work in spaCy through the displacy module. So we will import spaCy, load the English language model, and pass our text to it to create an nlp object. Then we will import ‘displacy’ from spaCy.

To visualize the POS, we render the dependency parse by passing ‘dep’ to the ‘style’ parameter. The ‘render’ function from displacy can display its output directly in a Jupyter notebook when the parameter ‘jupyter’ is set to ‘True’.

By setting the style to ‘dep’, which stands for dependency parse, the visualization shows the relations between the words of the text as well as its syntactic structure.

Here, I was only able to take a screenshot of part of my text visualization. To see the whole visualization, we can use the ‘options’ parameter of the ‘render’ function. By setting the distance between words in ‘options’ to a smaller value, for example 80 (the distance is measured in pixels, and the default is 175), I can fit the whole visualization in my screenshot.

Explaining the Syntactic Dependency

As we can see from the above visualization, the part of speech tag is shown for each word, and the relation between words, which is the dependency, is also shown. The relationships can be drawn as curves, as above, or as straight lines, as I will show later. The word ‘need’ is recognized as the root, the most important word in the sentence, so the arrows start from it and point toward other words on its right and on its left. The word ‘need’ is recognized as a verb, and the word ‘will’ is attached to it with the auxiliary dependency ‘aux’. The dependency between ‘you’ and ‘need’ is ‘nsubj’, meaning nominal subject. There is a ‘dobj’ dependency between the verb ‘need’ and the noun ‘shots’, which means that ‘shots’ is the direct object of ‘need’, while the word ‘two’ has a ‘nummod’ dependency, meaning numeric modifier, on the word ‘shots’. The words ‘most’ and ‘COVID-19’ each have an ‘amod’ dependency on the word ‘vaccines’, which means that both are adjectival modifiers of ‘vaccines’.

Besides the plot, we can print the coarse POS tag for each token, along with its dependency tag when one is assigned.

We can learn more about the dependency between words through the function ‘spacy.explain(token.dep_)’, which returns the definition of each dependency tag.

Visualization Options

Previously, we used the ‘options’ parameter of the rendering call to change the distance between tokens. In addition to the distance, spaCy has other options, such as the color of the text, the color of the background, the font of the text, whether the arcs are drawn as curves or compacted into a flat shape, and more. We have to define each option inside a dictionary. For example, I can set the distance between tokens to 90 and make the visualization compact by setting ‘compact’ to True. The text color ‘color’ and the background ‘bg’ can each take a color name or a hex code as a string. The font is given by name as a string, for example ‘Times’ for Times New Roman.

As we can see, I define the options as a dictionary and pass it to the rendering call. The plot is compacted to a flat shape, and the font, the text color, and the background are changed from the defaults according to my definition.

But what if I am given a large text? How can I visualize it?

Well, that is still possible with the spaCy library. I will cover it in a new blog about sentence segmentation.

Khulood Nasher

Data Scientist, Health Physicist, NLP Researcher, & Arabic Linguist. https://www.linkedin.com/in/khuloodnasher https://khuloodnasher1.wixsite.com/resume