twanvl / multiset Goto Github PK
View Code? Open in Web Editor NEWmultiset haskell package
License: Other
multiset haskell package
License: Other
There are a bunch of comments on the Ord
instances suggesting that it might be nice to use a different ordering. This shouldn't be a big deal, I don't think. Start by viewing the ends of each to see if they have the same key, in which case special logic is needed. Otherwise do the simple thing. Either way, use Data.Ord.Down
. @twanvl, I'll be happy to do this, but only if you want it. Please let me know.
The summary of MultiSet
An efficient implementation of multisets, also somtimes called bags.
Should be
An efficient implementation of multisets, also sometimes called bags.
Could you please upgrate multiset on stackage for GHC 8.4.* LTS and Nightly?
Thank you!
Even though on the docs it says it should work only with positive values, sometimes it is easy missed introducing a hard to find bug (at least It happened to me :/ ). As an example, this can be unintuitive fromListOccur [(x,1),(x,-1)] == fromListOccur [(x,0)] /= fromListOccur []
. I think it might be a good idea either:
error "negative or zero occurrences not supported"
.I think the second option is more interesting cause it eliminates the problem and makes the library useful for new use-cases; but it would slightly change the current semantics of some functions.
Could you please publish 0.3.4.1 to Hackage?
Could a new release be submitted to Hackage?
Despite the API change, it should be backwards compatible.
Due to changes in Prelude
and Data.Monoid
, compilation with GHC 7.10.2 results in the following warnings:
$ stack ghc -- --version
The Glorious Glasgow Haskell Compilation System, version 7.10.2
$ stack build
multiset-0.3.1: unregistering (local file changes: Data/IntMultiSet.hs Data/MultiSet.hs multiset.cabal test/Main.hs)
multiset-0.3.1: configure
Configuring multiset-0.3.1...
multiset-0.3.1: build
Preprocessing library multiset-0.3.1...
[1 of 2] Compiling Data.MultiSet ( Data/MultiSet.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.4.0/build/Data/MultiSet.o )
Data/MultiSet.hs:139:1: Warning:
Module ‘Prelude’ does not export ‘join’
Data/MultiSet.hs:144:1: Warning:
The import of ‘Data.Monoid’ is redundant
except perhaps to import instances from ‘Data.Monoid’
To import instances alone, use: import Data.Monoid()
Data/MultiSet.hs:146:1: Warning:
The qualified import of ‘List.foldr’
from module ‘Data.Foldable’ is redundant
[2 of 2] Compiling Data.IntMultiSet ( Data/IntMultiSet.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.4.0/build/Data/IntMultiSet.o )
Data/IntMultiSet.hs:133:1: Warning:
Module ‘Prelude’ does not export ‘join’
Data/IntMultiSet.hs:138:1: Warning:
The import of ‘Data.Monoid’ is redundant
except perhaps to import instances from ‘Data.Monoid’
To import instances alone, use: import Data.Monoid()
In-place registering multiset-0.3.1...
multiset-0.3.1: copy/register
Installing library in
/home/user/src/multiset/.stack-work/install/x86_64-linux/lts-3.20/7.10.2/lib/x86_64-linux-ghc-7.10.2/multiset-0.3.1-APke0rlS0Vt5EHV2CxbSF5
Registering multiset-0.3.1...
Data.Set
has:
elemAt :: Int -> Set a -> a -- O(log n)
deleteAt :: Int -> Set a -> Set a -- O(log n)
It would be nice to have similar functions for MultiSet
:
elemAt :: Int -> MultiSet a -> a -- O(n)
distinctElemAt :: Int -> MultiSet a -> a -- O(log n)
deleteAt :: Int -> MultiSet a -> MultiSet a -- O(n)
distinctDeleteAt :: Int -> MultiSet a -> MultiSet a -- O(log n)
distinctDeleteManyAt :: Int -> Occur -> MultiSet a -> MultiSet a -- O(log n)
distinctDeleteAllAt :: Int -> MultiSet a -> MultiSet a -- O(log n)
I'm mostly interested in elemAt
for my purposes (using MultiSet
as a bag to randomly take from).
You could theoretically bring down all the complexities to O(log n)
but it would require changing the internal structure.
This is a limitation of doctest and won't occur unless both these packages are in scope. I think it can be resolved by using PackageImports - assuming it matters to you. I'll disable this test-suite on stackage at least for now.
/tmp/stackage-build1/multiset-0.3.1/test/Main.hs:3:8:
Ambiguous module name ‘System.FilePath.Glob’:
it was found in multiple packages:
filemanip-0.3.6.3@filem_AFSulF8Y70TAA5R8pjLEdB Glob-0.7.5@Glob_J8eYhnaKSeW8dehUANn0F1
mapEither
isn't very general as a multiset operation. It generalizes nicely to
mapPair
:: Ord a
=> (a -> Occur -> ((b, Occur), (c, Occur)))
-> MultiSet a -> (MultiSet b, MultiSet c)
By digging into Data.Map
internals, it's also possible to write a very efficient version for when the passed function is strictly increasing (relative to the natural partial ordering of pairs).
mapPairAsc
:: (a -> Occur -> ((b, Occur), (c, Occur)))
-> MultiSet a -> (MultiSet b, MultiSet c)
In both MultiSet and IntMultiSet, the definitions of maxView
and minView
are identical. In maxView
, you should replace deleteFindMin
by deleteFindMax
.
I would like to use MultiSet
inside a data structure that has
deriving (Eq,Ord,Read,Show, Generic, NFData, Data)
.
This lead to an error * No instance for (NFData (MultiSet.MultiSet BExpr))
Could MultiSet
just like Set
be derived from NFData?
Is there a reason not to define fmap = MS.map
?
Here's the output from running stack test
:
$ stack test
multiset-0.3.1: unregistering (dependencies changed)
multiset-0.3.1: configure (lib + test)
Configuring multiset-0.3.1...
multiset-0.3.1: build (lib + test)
Preprocessing library multiset-0.3.1...
In-place registering multiset-0.3.1...
Preprocessing test suite 'doctests' for multiset-0.3.1...
[1 of 1] Compiling Main ( test/Main.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.4.0/build/doctests/doctests-tmp/Main.o )
Linking .stack-work/dist/x86_64-linux/Cabal-1.22.4.0/build/doctests/doctests ...
multiset-0.3.1: copy/register
Installing library in
/home/user/src/multiset/.stack-work/install/x86_64-linux/lts-3.20/7.10.2/lib/x86_64-linux-ghc-7.10.2/multiset-0.3.1-APke0rlS0Vt5EHV2CxbSF5
Registering multiset-0.3.1...
multiset-0.3.1: test (suite: doctests)
Progress: 1/2
In file included from /home/user/src/multiset/Data/MultiSet.hs:666:0:
/home/user/.stack/programs/x86_64-linux/ghc-7.10.2/lib/ghc-7.10.2/base_GDytRqRVSUX7zckgKqJjgw/include/Typeable.h:17:2:
warning: #warning <Typeable.h> is obsolete and will be removed in GHC 7.10 [-Wcpp]
#warning <Typeable.h> is obsolete and will be removed in GHC 7.10
^
In file included from /home/user/src/multiset/Data/IntMultiSet.hs:672:0:
/home/user/.stack/programs/x86_64-linux/ghc-7.10.2/lib/ghc-7.10.2/base_GDytRqRVSUX7zckgKqJjgw/include/Typeable.h:17:2:
warning: #warning <Typeable.h> is obsolete and will be removed in GHC 7.10 [-Wcpp]
#warning <Typeable.h> is obsolete and will be removed in GHC 7.10
^
In file included from /home/user/src/multiset/Data/MultiSet.hs:666:0:
/home/user/.stack/programs/x86_64-linux/ghc-7.10.2/lib/ghc-7.10.2/base_GDytRqRVSUX7zckgKqJjgw/include/Typeable.h:17:2:
warning: #warning <Typeable.h> is obsolete and will be removed in GHC 7.10 [-Wcpp]
#warning <Typeable.h> is obsolete and will be removed in GHC 7.10
^
In file included from /home/user/src/multiset/Data/IntMultiSet.hs:672:0:
/home/user/.stack/programs/x86_64-linux/ghc-7.10.2/lib/ghc-7.10.2/base_GDytRqRVSUX7zckgKqJjgw/include/Typeable.h:17:2:
warning: #warning <Typeable.h> is obsolete and will be removed in GHC 7.10 [-Wcpp]
#warning <Typeable.h> is obsolete and will be removed in GHC 7.10
^
Examples: 4 Tried: 4 Errors: 0 Failures: 0
Examples: 0 Tried: 0 Errors: 0 Failures: 0
Completed 2 action(s).
This is caused by doctest compiling Multiset.hs
and IntMultiset.hs
which includes Typeable.h
from the base
package via #include
directives, when - instead - they should be including the version supplied under the include
directory.
> /tmp/stackage-build8/multiset-0.3.2$ runghc -clear-package-db -global-package-db -package-db=/home/stackage/work/builds/nightly/pkgdb Setup configure --enable-tests --package-db=clear --package-db=global --package-db=/home/stackage/work/builds/nightly/pkgdb --libdir=/home/stackage/work/builds/nightly/lib --bindir=/home/stackage/work/builds/nightly/bin --datadir=/home/stackage/work/builds/nightly/share --libexecdir=/home/stackage/work/builds/nightly/libexec --sysconfdir=/home/stackage/work/builds/nightly/etc --docdir=/home/stackage/work/builds/nightly/doc/multiset-0.3.2 --htmldir=/home/stackage/work/builds/nightly/doc/multiset-0.3.2 --haddockdir=/home/stackage/work/builds/nightly/doc/multiset-0.3.2 --flags=
Configuring multiset-0.3.2...
> /tmp/stackage-build8/multiset-0.3.2$ runghc -clear-package-db -global-package-db -package-db=/home/stackage/work/builds/nightly/pkgdb Setup build
Building multiset-0.3.2...
Preprocessing library multiset-0.3.2...
[1 of 2] Compiling Data.MultiSet ( Data/MultiSet.hs, dist/build/Data/MultiSet.o )
Data/MultiSet.hs:139:1: warning: [-Wdodgy-imports]
Module ‘Prelude’ does not export ‘join’
Data/MultiSet.hs:419:1: warning: [-Wredundant-constraints]
• Redundant constraint: Ord a
• In the type signature for:
filter :: Ord a => (a -> Bool) -> MultiSet a -> MultiSet a
Data/MultiSet.hs:425:1: warning: [-Wredundant-constraints]
• Redundant constraint: Ord a
• In the type signature for:
partition :: Ord a =>
(a -> Bool) -> MultiSet a -> (MultiSet a, MultiSet a)
Data/MultiSet.hs:580:1: warning: [-Wredundant-constraints]
• Redundant constraint: Ord a
• In the type signature for:
fromMap :: Ord a => Map a Occur -> MultiSet a
[2 of 2] Compiling Data.IntMultiSet ( Data/IntMultiSet.hs, dist/build/Data/IntMultiSet.o )
Data/IntMultiSet.hs:133:1: warning: [-Wdodgy-imports]
Module ‘Prelude’ does not export ‘join’
Preprocessing test suite 'doctests' for multiset-0.3.2...
[1 of 1] Compiling Main ( test/Main.hs, dist/build/doctests/doctests-tmp/Main.o )
Linking dist/build/doctests/doctests ...
> /tmp/stackage-build8/multiset-0.3.2$ dist/build/doctests/doctests
/tmp/stackage-build8/multiset-0.3.2/Data/IntMultiSet.hs:671:0: error:
fatal error: Typeable.h: No such file or directory
#include "Typeable.h"
^
compilation terminated.
doctests: `gcc' failed in phase `C pre-processor'. (Exit code: 1)
This commit breaks the build on GHC 7.6. Please either depend on a base version high enough to exclude 7.6 or tweak the CPP so the GHC option is not used in 7.6.
See minView
and maxView
at https://hackage.haskell.org/package/multiset-0.3.2/docs/Data-MultiSet.html#g:10. The example is correctly executed by doctest but is not formatted correctly by Haddock:
This is caused by the absence of a blank line between "Examples:" and the beginning of the code sample.
As a user, I would like the monad version of map to be provided by the multiset library instead of having to define it locally each time I need it.
Consequently, when a better implementation is possible, I don't have to change my code in many places.
MultiSet already offers the map function of signature Ord b => (a-> b) -> MultiSet.MultiSet a -> MultiSet.MultiSet b
.
Could you also offer something like
{-# LANGUAGE ScopedTypeVariables #-}
mapM :: forall a b m . (Ord b, Monad m) => (a-> m b) -> MultiSet.MultiSet a -> m (MultiSet.MultiSet b)
mapM f a = MultiSet.fromOccurList <$> mapM f' (MultiSet.toOccurList a)
where f' :: (a, MultiSet.Occur) -> m (b, MultiSet.Occur)
f' (x,c) = do
v <- f x
return (v,c)
I use the naming convention:
A postfix 'M' always stands for a function in the Kleisli category: The monad type constructor m is added to function results (modulo currying) and nowhere else. So, for example,
filter :: (a -> Bool) -> [a] -> [a]
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
It first builds lists, then converts them lazily into multisets. I think it makes a lot more sense to accumulate those multisets eagerly.
Any chance we can get a version of this that works on 8.4? Specifically, it needs a Semigroup
instance.
Preprocessing test suite 'doctests' for multiset-0.3.2...
[1 of 1] Compiling Main ( test/Main.hs, dist/build/doctests/doctests-tmp/Main.o )
Linking dist/build/doctests/doctests ...
> /tmp/stackage-build8/multiset-0.3.2$ dist/build/doctests/doctests
/tmp/stackage-build8/multiset-0.3.2/Data/IntMultiSet.hs:671:0: error:
fatal error: Typeable.h: No such file or directory
#include "Typeable.h"
^
compilation terminated.
doctests: `gcc' failed in phase `C pre-processor'. (Exit code: 1)
Map
stopped using "hedge" algorithms several years ago: the much simpler divide-and-conquer ones have better-understood performance characteristics and seem to perform better in practice. I don't think IntMap
has ever used anything like that. At the same time, performance claims can be updated accordingly.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.