Booom...........

LabNotes and Random Thoughts...............

Sunday, November 13, 2016

high paying low stress jobs

Just for time pass, I parsed a slideshow article on time.com about high-paying, low-stress jobs and made a few seaborn plots to describe the data. From the article: 'Stress tolerance is measured by the Bureau of Labor Statistics and Occupational Information Network, with lower scores indicating less stress on the job.' I still need to check their website.


  • Taken from an article on time.com
In [1]:
import requests
import bs4
In [2]:
from string import punctuation
exclude = set(punctuation)
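The `exclude` set holds every ASCII punctuation character from `string.punctuation`. It is used further down to strip symbols such as `$` and `,` from the salary strings before calling `int()`. Here is a minimal stdlib sketch of the same idea using `str.translate`. The `$94,350` input format and the helper name `to_int` are assumptions for illustration.

```python
from string import punctuation

# translation table that deletes every ASCII punctuation character
STRIP_PUNCT = str.maketrans('', '', punctuation)

def to_int(text):
    """Hypothetical helper: drop punctuation (e.g. '$' and ',') and convert to int."""
    return int(text.translate(STRIP_PUNCT).strip())

print(to_int('$94,350'))  # 94350
```

The decimal point is punctuation too, so this works only for whole-dollar amounts, which is also true of the character filter used in the notebook.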
In [3]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set()
In [4]:
url="http://time.com/4081673/high-paying-low-stress-jobs/"
In [5]:
rs = requests.get(url)
In [6]:
rs.status_code
Out[6]:
200
In [7]:
#rs.content
In [8]:
# with open('high_paying_low_stress_jobs.html', 'wb') as fd:
#     for chunk in rs.iter_content():
#         fd.write(chunk)
    
In [9]:
# Make a soup obj
soup = bs4.BeautifulSoup(rs.text,'lxml')
In [10]:
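# each job title is an <h2 class="article-item-title">
h2 = soup.find_all('h2', {'class': 'article-item-title'})
# the p1, p2 and p4 paragraphs are collected but not used below
p1 = soup.find_all('p', {'class': 'p1'})
p2 = soup.find_all('p', {'class': 'p2'})
p4 = soup.find_all('p', {'class': 'p4'})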
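`find_all(name, {'class': ...})` returns every tag with that name whose class list contains the given value. To show what that filter does without BeautifulSoup, here is a rough stdlib sketch built on `html.parser.HTMLParser`. This is a swapped-in technique, not what the notebook uses. The class name `ClassTextCollector` and the sample markup are hypothetical and are not copied from time.com. Unlike BeautifulSoup, it does not report nested matches separately.

```python
from html.parser import HTMLParser

class ClassTextCollector(HTMLParser):
    """Hypothetical helper: collect the text of every <tag> whose class list contains cls."""

    def __init__(self, tag, cls):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.depth = 0      # > 0 while inside a matching element
        self.buf = []       # text pieces of the current match
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:      # same-name tag nested inside a match
            self.depth += 1
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.cls in classes:
            self.depth, self.buf = 1, []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append(''.join(self.buf).strip())

    def handle_data(self, data):
        if self.depth:
            self.buf.append(data)

sample = ('<h2 class="article-item-title">Mathematicians</h2>'
          '<p class="p3">Lower scores indicate less stress.</p>'
          '<h2 class="other">Ad</h2>')
parser = ClassTextCollector('h2', 'article-item-title')
parser.feed(sample)
print(parser.results)  # ['Mathematicians']
```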
In [11]:
# the p3 paragraph explains how stress tolerance is scored
soup.find_all('p',{'class':'p3'})[0].get_text()
Out[11]:
'Stress tolerance is measured by the Bureau of Labor Statistics and Occupational Information Network, with lower scores indicating less stress on the job.'
In [12]:
bd = soup.find_all('section', {'class':'article-item-body'})

There are 24 jobs listed in the Time article. The following fields are parsed into separate lists and then combined into a pandas DataFrame:

  • Job
  • Stress tolerance
  • Average salary
  • Remarks
In [13]:
jobs=[]
for i in h2:
    jobs.append( i.get_text())
In [14]:
stress = []   # stress-tolerance scores
aas = []      # average salaries
remarks = []  # "What ..." job descriptions
for section in bd:
    for line in section.get_text().split('\n'):
        if line.startswith("Stress tolerance:"):
            stress.append(float(line.split(": ")[1]))
        elif line.startswith("Average"):
            # drop punctuation such as '$' and ',' before converting to int
            aa = line.split(": ")[1]
            aas.append(int(''.join(ch for ch in aa if ch not in exclude)))
        elif line.startswith("What"):
            remarks.append(line.split(": ")[1])
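The loop above fills three independent lists. If any section were missing one of its lines, the lists would drift out of step with each other and with `jobs`, and `pd.DataFrame` would raise on the unequal lengths. Below is a sketch of an alternative that parses each section into one record, so the fields stay aligned. The line prefixes mirror the ones the loop tests for. The function name `parse_section` and the sample text are hypothetical. One difference from the loop: it splits only on the first `": "`, so a description containing a colon is kept whole.

```python
from string import punctuation

STRIP_PUNCT = str.maketrans('', '', punctuation)

def parse_section(text):
    """Hypothetical helper: parse one section's text into a single record.

    Fields that are not found stay None, so a missing line shows up
    instead of silently shifting the other columns.
    """
    record = {'ST': None, 'Sal': None, 'Remarks': None}
    for line in text.split('\n'):
        if ': ' not in line:
            continue
        value = line.split(': ', 1)[1]
        if line.startswith('Stress tolerance:'):
            record['ST'] = float(value)
        elif line.startswith('Average'):
            record['Sal'] = int(value.translate(STRIP_PUNCT))
        elif line.startswith('What'):
            record['Remarks'] = value
    return record

# hypothetical section text, laid out the way the loop above expects
sample = ("Stress tolerance: 57.2\n"
          "Average annual salary: $104,350\n"
          "What they do: Conduct research in fundamental mathematics.")
print(parse_section(sample))
```

With the soup from above, `pd.DataFrame([parse_section(s.get_text()) for s in bd])` would build the numeric and remark columns in one step. The job titles from `h2` would still need to be attached.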
In [15]:
df = pd.DataFrame(data={'J':jobs, 'ST':stress, 'Sal':aas, 'Remarks':remarks})
In [16]:
df.head()
Out[16]:
J Remarks ST Sal
0 Materials Scientists Research and study substances at the atomic an... 53.0 94350
1 Mathematicians Conduct research in fundamental mathematics or... 57.2 104350
2 Geographers Study the nature and use of areas of Earth’s s... 58.0 75610
3 Economists Economists study the production and distributi... 58.7 105290
4 Statisticians Use statistical methods to collect and analyze... 59.0 84010
In [17]:
df.describe()
Out[17]:
ST Sal
count 24.000000 24.000000
mean 62.683333 104860.416667
std 4.117795 25309.101212
min 53.000000 75440.000000
25% 60.800000 91607.500000
50% 62.250000 103060.000000
75% 65.125000 110230.000000
max 70.300000 201030.000000
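`describe()` computes the 25%, 50% and 75% rows by linear interpolation between sorted values, which is the pandas default. The standard library's `statistics.quantiles` with `method='inclusive'` uses the same rule, so it is a handy way to check a quartile by hand. The toy data below is made up and does not come from the article.

```python
import statistics

def quartiles(values):
    """25th, 50th and 75th percentiles using linear interpolation,
    the same rule pandas' describe() uses by default."""
    return statistics.quantiles(values, n=4, method='inclusive')

data = [1, 2, 3, 4]       # toy values, not from the article
print(quartiles(data))    # [1.75, 2.5, 3.25]
```

For the 24 jobs here, the 25% row sits at sorted position (24 - 1) * 0.25 = 5.75 (0-based). That is three quarters of the way from the 6th to the 7th smallest value.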
In [18]:
# new DataFrame indexed by job title, keeping the two numeric columns
df2 = df[['J', 'ST', 'Sal']].set_index('J')
In [19]:
df2.head()
Out[19]:
ST Sal
J
Materials Scientists 53.0 94350
Mathematicians 57.2 104350
Geographers 58.0 75610
Economists 58.7 105290
Statisticians 59.0 84010
In [20]:
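# newer seaborn expects x/y as keywords and calls the figure size `height` (formerly `size`)
g = sns.jointplot(x="ST", y="Sal", data=df2, kind="reg", color="r", height=6)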
In [21]:
ax = sns.regplot(x='Sal', y='ST', data=df2)
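Both `jointplot(kind="reg")` and `regplot` draw an ordinary least-squares line through the points. The sketch below does the same fit in plain Python so the slope, intercept and Pearson correlation are explicit. It uses the standard OLS formulas on toy data that does not come from the article. The function name `ols_fit` is hypothetical.

```python
from math import sqrt

def ols_fit(xs, ys):
    """Return (slope, intercept, pearson_r) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

print(ols_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0, 1.0)
```

With the DataFrame above, the call would be `ols_fit(df2.ST.tolist(), df2.Sal.tolist())`. Seaborn also draws a bootstrapped confidence band around the line, which this sketch leaves out.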
In [22]:
df2 = df2.sort_values(by = 'Sal')
ax = sns.barplot(x=df2.Sal, y=df2.index, data=df2 )
In [23]:
df2 = df2.sort_values(by ='ST')
ax = sns.barplot(x=df2.ST, y=df2.index, data=df2 )
In [24]:
# rows where the stress-tolerance score is > 65, sorted by salary
df3 = df[df.ST > 65].sort_values('Sal')
In [25]:
df3
Out[25]:
J Remarks ST Sal
18 Hydrologists Study how water moves across and through Earth... 65.5 81930
21 Art Directors Art directors are responsible for the visual s... 69.0 97850
22 Marine Engineers and Naval Architects Design, build, and maintain ships, including a... 69.6 99160
20 Computer Hardware Engineers Research, design, develop, or test computer or... 67.0 110650
23 Optometrists Optometrists perform eye exams to check for vi... 70.3 113010
19 Orthodontists Examine, diagnose, and treat dental misalignme... 67.0 201030
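`df[df.ST > 65]` builds a boolean mask over the rows and keeps those where it is `True`. `sort_values('Sal')` then orders the survivors by salary, lowest first. The sketch below runs the same two steps over plain tuples, using a few (job, ST, Sal) values copied from the tables above. The function name `high_stress_by_salary` is hypothetical.

```python
rows = [
    ("Materials Scientists", 53.0, 94350),
    ("Mathematicians", 57.2, 104350),
    ("Hydrologists", 65.5, 81930),
    ("Orthodontists", 67.0, 201030),
    ("Optometrists", 70.3, 113010),
    ("Art Directors", 69.0, 97850),
]

def high_stress_by_salary(rows, threshold=65):
    """Keep rows whose stress-tolerance score exceeds threshold, lowest salary first."""
    kept = [r for r in rows if r[1] > threshold]   # the boolean mask
    return sorted(kept, key=lambda r: r[2])        # sort_values('Sal')

for job, st, sal in high_stress_by_salary(rows):
    print(job, st, sal)
```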
In [26]:
for n, row in enumerate(df3.itertuples(), start=1):
    # look columns up by name; positional row[0]/row[1] depends on column order
    print(n, row.J, ":\t", row.Remarks, end='\n\n')
1 Hydrologists :  Study how water moves across and through Earth’s crust. They can use their expertise to solve problems in the areas of water quality or availability.

2 Art Directors :  Art directors are responsible for the visual style and images in magazines, newspapers, product packaging, and movie and television productions.

3 Marine Engineers and Naval Architects :  Design, build, and maintain ships, including aircraft carriers, submarines, sailboats, and tankers. Marine engineers work on the mechanical systems, such as propulsion and steering. Naval architects work on the basic design, including the form and stability of hulls.

4 Computer Hardware Engineers :  Research, design, develop, or test computer or computer-related equipment for commercial, industrial, military, or scientific use.

5 Optometrists :  Optometrists perform eye exams to check for vision problems and diseases. They prescribe eyeglasses or contact lenses as needed.

6 Orthodontists :  Examine, diagnose, and treat dental misalignments and oral cavity anomalies, design appliances to realign teeth and jaws.