Extract Domain Name from URL using Python
Simple useful code using Python
Introduction
I needed to extract the domain name from strings containing URLs. The problem was that the strings did not always contain a standard URL. A URL could appear in any of these forms:
http://www.domain.com/
https://www.domain.com/
www.domain.com/
domain.com/
Any of these forms could also be followed by a page path and a query string.
Some URLs were malformed as well, for example http.//www.domain.com or http;//www.domain.com.
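The malformed schemes mentioned above can be repaired before the string is parsed. The following is a minimal sketch and not part of the original code: `repair_scheme` is a hypothetical helper, and the set of typos it covers (a `.`, `;`, `,` or repeated `:` in place of the colon, or a wrong number of slashes) is an assumption based on the two examples given.

```python
import re

# "http" or "https" at the start of the string, followed by one or more
# punctuation characters and one or more slashes (e.g. ".//", ";//", ":/").
_SCHEME_SEPARATOR = re.compile(r"^(https?)[.;:,]+/+", re.IGNORECASE)


def repair_scheme(url):
    """Rewrite a mistyped scheme separator as '://' (hypothetical helper)."""
    url = url.strip()
    # Strings without an http/https prefix are returned unchanged.
    return _SCHEME_SEPARATOR.sub(lambda m: m.group(1).lower() + "://", url)


print(repair_scheme("http.//www.domain.com"))  # http://www.domain.com
print(repair_scheme("http;//www.domain.com"))  # http://www.domain.com
print(repair_scheme("www.domain.com/"))        # www.domain.com/
```

The pattern requires at least one slash after the punctuation, so a host that merely starts with "http", such as http.domain.com, is left alone.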
Code to extract the domain name
import tldextract

def extract_domain_from_string(url):
    try:
        # Use tldextract library to extract domain
        extracted = tldextract.extract(url)
        domain = extracted.registered_domain
        if domain:
            return domain
        else:
            raise ValueError("Invalid URL: {}".format(url))
    except Exception as e:
        print("Error:", e)
        return None
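Where installing tldextract is not an option, a rough fallback using only the standard library might look like the sketch below. It is an illustration under assumptions, not an equivalent: `extract_host_from_string` is a hypothetical name, and it returns the host name with a leading "www." removed rather than the registered domain. It knows nothing about public suffixes, so blog.example.com stays blog.example.com, and it does not repair mistyped schemes such as http.//.

```python
import re
from urllib.parse import urlsplit

# An optional scheme such as "http:" followed by "//".
_HAS_NETLOC_MARKER = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+.\-]*:)?//")


def extract_host_from_string(url):
    """Return the host without a leading 'www.', or None (hypothetical helper)."""
    url = url.strip()
    # urlsplit only recognises a host after "//", so add the marker for
    # inputs such as "www.domain.com/page" or "domain.com/".
    if not _HAS_NETLOC_MARKER.match(url):
        url = "//" + url
    host = urlsplit(url).hostname  # lowercased, with any port removed
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


for candidate in ("http://www.domain.com/", "www.domain.com/page?x=1", "domain.com/"):
    print(extract_host_from_string(candidate))  # domain.com each time
```

The trade-off is that without a public suffix list there is no reliable way to tell a subdomain from a multi-part suffix such as co.uk, which is the part tldextract handles.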
If you know of alternative or better code, please suggest it in the comments.