lsys / forestplot
A Python package to make publication-ready but customizable coefficient plots.
Home Page: http://forestplot.rtfd.io
License: MIT License
Is there any way to pass an existing axis to forestplot? This would enable subplots.
There seems to be a bug where if you have duplicate values in separate groupings the plot does not show some of the rows.
import sys
import forestplot as fp
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
print(
    f"python version: {sys.version}",
    f"pandas version: {pd.__version__}",
    f"matplotlib version: {matplotlib.__version__}",
    f"forestplot version: {fp.__version__}",
    sep='\n'
)
# python version: 3.8.1 (default, Feb 3 2020, 12:44:18)
# [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
# pandas version: 1.4.3
# matplotlib version: 3.4.2
# forestplot version: 0.3.1
def create_data():
    # Two groups that share identical labels, estimates, and intervals
    group_a = pd.DataFrame({'name': ['name_a', 'name_b'], 'estimate': [1.1, 1.0]})
    group_a['Lower CI'] = group_a['estimate'] - 0.05
    group_a['Upper CI'] = group_a['estimate'] + 0.05
    group_a['group'] = "group_a"
    group_b = group_a.copy()
    group_b['group'] = 'group_b'
    groups = pd.concat([group_a, group_b], axis=0)
    # group_a["group"] = "group_a"
    return groups
df = create_data()
display(df)
print("Missing part of the plot")
fp.forestplot(df,
estimate='estimate', varlabel='name', ll="Lower CI", hl="Upper CI", groupvar="group")
plt.show()
# print("Still missing part of the plot")
# df.loc[df['group'] == 'group_b', ['estimate', "Lower CI", "Upper CI"]] += 0.001
# fp.forestplot(df,
# estimate='estimate', varlabel='name', ll="Lower CI", hl="Upper CI", groupvar="group")
# plt.show()
print("Now it works")
df.loc[df['group'] == 'group_b', ['estimate', "Lower CI", "Upper CI"]] += 0.01
fp.forestplot(df,
estimate='estimate', varlabel='name', ll="Lower CI", hl="Upper CI", groupvar="group")
plt.show()
Dear all,
Thanks so much for such an amazing tool.
My question is: how can I change the font size for groupvar?
When I change the fontsize value, nothing happens to the font size of groupvar.
Any suggestions, please?
Best
The "Confidence interval" ylabel and the "P-value" headers have different height and fontsize:
import pandas as pd
import forestplot as fp
df = fp.load_data("sleep")
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # columns containing conf. int. lower and higher limits
varlabel="label", # column containing variable label
pval="p-val", # Column of p-value to be reported on right
ylabel="Confidence interval", # ylabel to print
)
I would like to use different fonts (e.g. Helvetica, Arial) to match journal font requirements for publication-ready figures.
A potential fix could be to use matplotlib.pyplot.table to render the table text to allow left-align with any font.
The pandas append method used in the backend seems to have been deprecated:
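For context, DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, and pd.concat is the usual replacement. The sketch below is only an illustration of that swap on made-up rows; how forestplot's internals actually call append is not shown in this report, so the column names and values here are assumptions.

```python
import pandas as pd

# Two small frames that older code might have combined with DataFrame.append.
# The labels and values are made up for illustration.
rows = pd.DataFrame({"label": ["age"], "r": [0.09]})
extra = pd.DataFrame({"label": ["income"], "r": [-0.02]})

# pd.concat stacks the frames row-wise; ignore_index=True renumbers the index,
# which matches what rows.append(extra, ignore_index=True) used to return.
combined = pd.concat([rows, extra], ignore_index=True)
print(combined)
```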
When running the following code I get a plot without variable names and confidence intervals. I'm running matplotlib 3.6.2, numpy 1.22.4 and pandas 1.5.1.
import csv
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import forestplot as fp
N = 10
ids = ["Number is: " + str(x) for x in range(N)]
coef = np.random.uniform(0.4, 2, size=N)
lower = coef - np.random.uniform(size=N)
upper = coef + np.random.uniform(size=N)
data = pd.DataFrame({"c": coef, "lower": lower, "upper": upper, "name": ids})
fig = fp.forestplot(data, estimate="c", ll="lower", hl="upper", varlabel="name")
plt.savefig("debug.png")
Start some regression tests to ensure that old bugs do not resurface in new releases (e.g., the issue raised in #47).
At present, the estimates ("r" in the example dataframe) can only be plotted on a linear scale. Certain estimates (e.g. odds ratios (ORs)) are typically plotted on log scales.
For example, in forestplot:
Equivalent in 'R':
It would be great if forestplot allowed users to request a log scale and (ideally) to customize aspects of it.
I followed the install and quick start instructions in the README, but I'm having an issue with excess whitespace being added between the variable labels and the plot. Below are screenshots of the example as well as example with customization 5. The whitespace is added regardless of the width of the overall forest plot.
I am using Jupyter version 6.5.2 and Python 3.9.15 via Anaconda.
Add test from conda-forge installation to https://github.com/LSYS/forestplot/blob/patch/.github/workflows/nb-pkg.yml.
These 2 files are considered as an important part of any project so we should consider adding them.
Hello, I'm really enjoying this package but have one issue. The 'color_alt_rows' setting colors only the plot and not the text, which still makes it difficult to match up rows and read them easily. Would it be possible to shade the text as well? Thanks!
Hi,
Thanks for this great package. I was wondering if there is an argument to pass the x-tick labels. I'd like to plot odds ratios on the log scale, but label them with their actual OR values. For example, I'd like to mark the ticks at [math.log(0.5), math.log(1), math.log(1.5)] and label them as [0.5, 1, 1.5].
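The tick arithmetic in this request can be sketched independently of forestplot. The helper name log_ticks below is hypothetical, and whether forestplot exposes an argument for custom tick labels is not established here; the commented matplotlib calls assume you have an Axes object named ax to work with.

```python
import math

def log_ticks(odds_ratios):
    """Return (positions, labels) for ticks on a natural-log axis."""
    # Positions are where the ticks sit on the log scale.
    positions = [math.log(value) for value in odds_ratios]
    # Labels show the original odds-ratio values.
    labels = [f"{value:g}" for value in odds_ratios]
    return positions, labels

positions, labels = log_ticks([0.5, 1, 1.5])
print(positions)
print(labels)

# With a matplotlib Axes named `ax` (assumed to exist), the ticks could be applied as:
#   ax.set_xticks(positions)
#   ax.set_xticklabels(labels)
```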
I would like to propose a feature enhancement for the forestplot package that allows users to specify custom column names instead of the default "Variable" label. This feature would be handy for those looking to compare models or other items that require more descriptive column naming.
Currently, the forestplot package automatically assigns the name "Variable" to one of its columns. While this works well for general purposes, it needs more flexibility for cases where a different label might be more appropriate or informative (I could change it for my use cases, but it can be difficult for some users).
For example, in one of your sample codes, we can see that the first column of the forestplot is Variable:
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # lower & higher limits of conf. int.
varlabel="label", # column containing the varlabels to be printed on far left
capitalize="capitalize", # Capitalize labels
pval="p-val", # column containing p-values to be formatted
annote=["n", "power", "est_ci"], # columns to report on left of plot
annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"], # ^corresponding headers
rightannote=["formatted_pval", "group"], # columns to report on right of plot
right_annoteheaders=["P-value", "Variable group"], # ^corresponding headers
xlabel="Pearson correlation coefficient", # x-label title
table=True, # Format as a table
)
I suggest adding an option allowing users to specify their column names. This could be implemented as an additional argument in the relevant function(s), allowing users to choose a custom name that better fits their data or the context of their analysis.
For instance, in scenarios where users are comparing different models (e.g., prediction models like 'Regression', 'Random Forest', 'SVM'), having the ability to label the column as "Prediction Models" or a similar custom name would enhance the readability and relevance of the forestplot.
This would enhance the forestplot package, making it more versatile for various data analyses.
Thank you for considering this feature request. I believe it would be a valuable addition to the forestplot package and enhance its utility for a wide range of users.
Best regards,
@kamalakbari7
Add note in readme to save figures with the bbox_inches="tight" option.
Related to #38.
Example with customization #1 is outdated:
import forestplot as fp
df = fp.load_data("sleep")
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
moerror="moerror", # columns containing conf. int. margin of error
varlabel="label", # column containing variable label
capitalize="capitalize", # Capitalize labels
groupvar="group", # Add variable groupings
# group ordering
group_order=["labor factors", "occupation", "age", "health factors",
"family factors", "area of residence", "other factors"],
sort=True, # sort in ascending order (sorts within group if group is specified)
)
Thanks for this great library! I prefer to write my plots in SVG since it has a much better resolution than PNG or JPEG formats. I tried to use the plt.savefig() function but it writes an empty file. Is it not possible to save the plot outputs in SVG? It would be great if this feature were added.
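One common cause of an empty file is calling plt.savefig() after the figure has already been shown or closed, so pyplot saves a new, blank figure. A minimal sketch that avoids this by saving from the figure that owns the axes is given below. It uses a plain matplotlib errorbar plot as a stand-in; applying the same pattern to forestplot assumes that fp.forestplot returns the Axes it drew on, which is not confirmed in this report.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
# Stand-in for a forest plot: two estimates with symmetric intervals.
ax.errorbar([0.1, 0.3], [0, 1], xerr=[0.05, 0.1], fmt="o")

# Save from the figure that owns the axes, before any plt.show() or plt.close().
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="svg", bbox_inches="tight")
svg_text = buffer.getvalue().decode("utf-8")
print(len(svg_text))
```

Passing a filename such as "plot.svg" instead of the buffer writes the same content to disk; the format is then inferred from the extension.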
Freeze matplotlib-inline dependency to <= 0.1.3 in setup.py.
Forgot to do this in previous patch.
See #40.
Updating documentation to reflect new changes and fixing errors:
capitalize option to the examples
capitalize option to Table of API options
moerror option
"family factors'
examples/readme-examples.ipynb
Great package!
Where in the code should I look if I want to modify the plotting so that I can change the color of a line when the corresponding row is statistically significant? I don't want to show the p-values or the stars, I just want to change the line color.
When plotting hazard or odds ratios in forest plots it would be helpful to be able to change the location of x-reference line--to move it from x=0 to x=1.
Thanks for this neat package again!
When I add the "N" column to the table, it is automatically converted to have one decimal point (example: 100 -> 100.0).
I tried multiple workarounds to make it an integer but couldn't work it out. In the example shown in the manual, it does seem to work fine.
Any help is greatly appreciated.
Update:
data['n'] = data['n'].astype("string")
Fixed it.
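A note on why the string conversion helps, with a minimal sketch on made-up counts: if the column is stored as floats, converting it straight to strings keeps the trailing ".0", so casting to int first is one way to get clean labels. The column name n follows the snippet above; the values are assumptions, and a column with missing values would need different handling.

```python
import pandas as pd

# Counts stored as floats, as can happen after a merge or a CSV import.
data = pd.DataFrame({"n": [100.0, 250.0, 75.0]})

# Cast to int first so the string form has no decimal point,
# then to string so later formatting leaves the text untouched.
data["n"] = data["n"].astype(int).astype("string")
print(data["n"].tolist())
```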
kwargs for thresholds and symbols are not getting passed through to the star_pval formatter.
For example,
import pandas as pd
import forestplot as fp
df = fp.load_data("sleep")
fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # columns containing conf. int. lower and higher limits
varlabel="label", # column containing variable label
pval="p-val", # Column of p-value to be reported on right
color_alt_rows=True, # Gray alternate rows
ylabel="Est.(95% Conf. Int.)", # ylabel to print
decimal_precision=3,
**{"thresholds":(0.001, 0.01, 0.05)}
)
I have data of about 20 rows. The confidence intervals are between 0 and 5, except for two rows whose confidence intervals range between 1.5 and 40. Plotting them all together makes most intervals very narrow, because they are compressed by the wide intervals of those two rows. I wanted to limit the confidence interval range to make the plot readable, so I used xticks=[0,1,5, 10] and set_xlim(0, 10). But this makes the plot look bad.
This was the graph without adding set_xlim(0,10)
Hello! Thank you for building this awesome package.
I'm trying to visualize some odds-ratios and their confidence intervals, and I'm running into an issue where the top of the plot seems to get cut off.
I installed forestplot today via conda with conda install -c conda-forge forestplot, and am running it on Python 3.9.6.
>>> df
varname or low high
0 a 1.761471 0.926563 3.022285
1 b 1.443660 0.517059 3.130794
2 c 1.525227 0.603459 3.082003
3 d 0.895682 0.463703 1.660016
4 e 4.025670 1.196941 8.932753
5 f 1.086105 0.555171 2.045735
fp.forestplot(df, # the dataframe with results data
estimate='or', # col containing estimated effect size
ll='low', hl='high', # columns containing conf. int. lower and higher limits
varlabel='varname', # column containing variable label
ylabel='Confidence interval', # y-label title
xlabel='Odds Ratio', # x-label title
color_alt_rows=True,
figsize=(4,8),
**{
'xline' : 1.,
"xlinestyle": (0, (10, 5)), # long dash for x-reference line
}
)
plt.savefig('test_forest_plot.png', bbox_inches='tight')
What I'm seeing is the attached image, where the top of the plot (where variable "a" should be) is getting cut off. I've tried adjusting the plot size but the issue persists. Any insight or advice would be appreciated!
Hi @LSYS,
Thank you for this wonderful package!
I was wondering if it is possible to show P-values in scientific notation as very small P-values are displayed as 0.
Thank you!
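One possible workaround, sketched under assumptions: pre-format the p-values yourself into strings and display that column as an annotation (for instance through rightannote, as other examples on this page do with formatted_pval). The helper name format_pval and its threshold rule are hypothetical and are not part of forestplot.

```python
def format_pval(pval, decimals=3):
    """Format a p-value, switching to scientific notation for very small values."""
    # Anything below 10**-decimals would print as 0.000 with fixed decimals.
    threshold = 10 ** (-decimals)
    if pval < threshold:
        return f"{pval:.1e}"
    return f"{pval:.{decimals}f}"

for value in (0.0421, 3.2e-8):
    print(format_pval(value))
```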
Currently, Chinese characters do not show in the plot, and the same goes for Korean, Japanese, and so on. How can I change the font settings to support Chinese? Thanks in advance.
There is a line along the y-axis where x=0. I want this line to be where x=1 or to remove it completely. Is there a way to do this? I've been trying to access it using spines set_position but it didn't work.
Is there a way to move the xaxis line from 0 to 1 or add a new line in Xaxis?
See issue48-table-does-not-work-6rows-or-fewer.ipynb.
df = fp.load_data("sleep")
fp.forestplot(df.head(6), # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # lower & higher limits of conf. int.
varlabel="label", # column containing the varlabels to be printed on far left
capitalize="capitalize", # Capitalize labels
pval="p-val", # column containing p-values to be formatted
annote=["n", "power", "est_ci"], # columns to report on left of plot
annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"], # ^corresponding headers
rightannote=["formatted_pval", "group"], # columns to report on right of plot
right_annoteheaders=["P-value", "Variable group"], # ^corresponding headers
xlabel="Pearson correlation coefficient", # x-label title
table=True, # Format as a table
)
Thanks for your great work,
I tried running your code with the newest version of matplotlib (3.8), but unfortunately it fails.
Error from matplotlib:
*** AttributeError: 'YTick' object has no attribute 'label'
originating in forestplot/graph_utils.py, at the line:
pad = max( T.label.get_window_extent(renderer=fig.canvas.get_renderer()).width for T in yax.majorTicks )
Downgrading to matplotlib 3.7.3 solves the issue.
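The error fits with matplotlib having removed the long-deprecated Tick.label attribute in 3.8; Tick.label1 is the equivalent that works across versions. Below is a minimal standalone sketch of the same width measurement using label1. It is not the package's actual patch, and the tick labels are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_yticks([0, 1, 2])
ax.set_yticklabels(["age", "income", "education level"])
fig.canvas.draw()  # lay out the tick labels before measuring them

renderer = fig.canvas.get_renderer()
# Same measurement as the failing line, with Tick.label1 in place of Tick.label.
pad = max(
    tick.label1.get_window_extent(renderer=renderer).width
    for tick in ax.yaxis.majorTicks
)
print(pad)
```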
Got an install warning:
DEPRECATION: forestplot is being installed using the legacy 'setup.py install' method,
because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip
23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-
pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
It would be good to also push wheels to pypi; since this package is pure python it should be fairly straightforward. I would be happy to contribute to a PR if you would accept.
See https://myst-parser.readthedocs.io/en/stable/syntax/optional.html#syntax-header-anchors
Remove whitespaces at top of the plot (vertical whitespace) using the y-axis limits.
Related to #37.
Would be nice to allow no drawing of the confidence intervals directly from the API. (A hack is to convert the ll and hl to be the same as the estimate.)
One solution is to allow ll and hl to be None.
Another solution is a new option drawci that is True by default but can be set to False.
ll and hl no longer required(?)
import forestplot as fp
import pandas as pd
df = pd.read_csv("review_example.csv",sep=";") # companion example data
fp.forestplot(df, # the dataframe with results data
estimate='PCSA_Men_mean', # col containing estimated effect size
ll= 'PCSA_Men_Lower', hl='PCSA_Men_Upper', # columns containing conf. int. lower and higher limits
varlabel='Abbreviation', # column containing variable label
capitalize="capitalize", # Capitalize labels
annote=["Source", "Image modality", 'Sample_size',"Method", 'Position'], # columns to report on left of plot
annoteheaders=["Ref", "Modality", 'N',"PCSA", 'Pose'], # ^corresponding headers
rightannote=['Age', 'Height', 'Weight', 'Fiber_length', 'Pennation', "Info"], # columns to report on right of plot
right_annoteheaders=['Age[y]', 'Height[cm]', 'Weight[kg]', 'Fiber_length[cm]', 'Pennation[Deg]', "Note"], #corresponding headers
groupvar= "Agegroup", # column containing group labels
group_order=["Reference","Young Adults","Adults"],
xlabel="PCSA Ratio", # x-label title
xticks=[0,30,60], # x-ticks to be printed
table=True, # Format as a table
color_alt_rows=True, # Gray alternate rows
# Additional kwargs for customizations
**{"marker": "D", # set marker symbol as diamond
"markersize": 35, # adjust marker size
"xtick_size": 12, # adjust x-ticker fontsize
})
#plt.savefig("plot.jpg", bbox_inches="tight")
Issue raised at #47
There seems to be a better backend for CIs: https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.errorbar.html#matplotlib.axes.Axes.errorbar.
This should allow for asymmetrical CIs common in Odds ratios (#28).
Should affect only the backend.
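To make the idea concrete, here is a minimal sketch of asymmetric intervals drawn with Axes.errorbar. The odds ratios and limits are illustrative values, and this is not forestplot's current backend. The key detail is that xerr takes distances from the estimate, as a pair of sequences (lower distances first, then upper), rather than absolute limits.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative odds ratios with asymmetric lower and upper limits.
estimates = [1.76, 1.44, 1.53]
lower = [0.93, 0.52, 0.60]
upper = [3.02, 3.13, 3.08]

# errorbar expects distances from the estimate, not absolute limits:
# the first row holds the lower distances, the second the upper distances.
xerr = [
    [est - lo for est, lo in zip(estimates, lower)],
    [hi - est for est, hi in zip(estimates, upper)],
]

fig, ax = plt.subplots()
ax.errorbar(estimates, list(range(len(estimates))), xerr=xerr, fmt="o", capsize=3)
ax.axvline(1.0, linestyle="--")  # reference line at OR = 1
```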